Mobroute: Development Guide

This development guide provides various pointers on local development. If something is missing from this guide, feel free to add it. If you have a question about something in this guide, please open a ticket on the ticket tracker or send a note to the mailing list.

Diagrams

For a baseline understanding of the overall system architecture, the diagrams on the diagrams doc page can be very helpful.

Internals: Architecture

The Mobroute project is composed of 3 separate codebases / smaller projects:

  • Mobsql: a GTFS-to-SQL ETL tool that pulls GTFS sources from the Mobility Database into a SQLite database. It exposes both a CLI & a Go library (consumed by Mobroute).
  • Mobroute: the core routing tool. This project interfaces with Mobsql via its public Go library and then, once the DB is loaded, opens the DB and performs routing calculations (currently via the Connection Scan Algorithm). It exposes both a CLI & a Go library (consumed by Transito).
  • Transito: a simple Android & Linux GUI app for using Mobroute. It allows querying from/to locations via Nominatim and performing routing calculations on the go via Mobroute's underlying routing mechanism (note that this implicitly integrates Mobsql as well).

Internals: What goes into a Routing Calculation? (i.e. the full data pipeline)

Routing dataflow follows a roughly 7-step process: 3 one-time dataload steps followed by 4 steps per routing library call. Upon submitting a routing request, the overall dataflow looks roughly like this:

One time (via RTDatabase, using the loadmdbgtfs/loadcustomgtfs ops followed by the compute op):

  • (1) Dataload: Mobility Database CSV Fetch
    • An HTTP request is made to fetch the Mobility Database CSV
    • This CSV contains thousands of potential GTFS sources to use
    • The CSV is imported into Mobsql's SQLite database
    • (Handled by Mobsql)
  • (2) Dataload: Upstream (Agency) GTFS Fetch & Load
    • Each feed ID in the routing request is downloaded from its source agency in raw ZIP form.
    • Each GTFS ZIP archive is imported into Mobsql's SQLite database, with each table mirroring the GTFS spec but with an added 'feed_id' column referring to the GTFS feed ID from the Mobility Database. This extra column allows 'multisource' storage of GTFS data & routing.
    • (Handled by Mobsql)
  • (3) Dataload: GTFS Computed Tables Calculations
    • The routing algorithm essentially takes several arrays as input params, each of which correlates directly with a particular SQL select statement pulling data from the SQLite database that houses the GTFS data.
    • The routing logic could operate directly on the GTFS data (i.e. with no preprocessing) by selecting directly from the core data extraction views; however, that would be quite inefficient.
    • Indexing and precomputation yield massive gains, so several views used by the routing API are stored as 'computed tables', which are essentially materialized tables indexed by MDBID.
    • In this step, the views are materialized into tables: a view named _vfoo becomes a table named _ctfoo (_ct is short for "computed table").
    • (Handled by Mobsql; computed tables are specified as ExtraSchema by Mobroute)
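The view-to-computed-table step above can be sketched in Go as follows. This is a minimal illustration, not Mobsql's actual ExtraSchema mechanism: the helper names computedTableName and materializeSQL, the per-feed delete-then-insert refresh strategy, and the index naming are all hypothetical. It only shows the _vfoo to _ctfoo naming convention and one way a view could be materialized into a table indexed on the feed_id column, so that a single feed can be recomputed without touching the others.

```go
package main

import (
	"fmt"
	"strings"
)

// computedTableName maps a view named _vfoo to its computed table _ctfoo.
// It reports false for names that do not follow the _v prefix convention.
func computedTableName(view string) (string, bool) {
	if !strings.HasPrefix(view, "_v") {
		return "", false
	}
	return "_ct" + strings.TrimPrefix(view, "_v"), true
}

// materializeSQL returns hypothetical SQLite statements that materialize one
// feed's rows of a view into its computed table. The "?" placeholders are
// meant to be bound to the feed ID by the caller.
func materializeSQL(view, feedCol string) ([]string, error) {
	ct, ok := computedTableName(view)
	if !ok {
		return nil, fmt.Errorf("view %q does not follow the _v naming convention", view)
	}
	return []string{
		// Create an empty table with the view's columns (WHERE 0 selects no rows).
		fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s AS SELECT * FROM %s WHERE 0", ct, view),
		// Index on the feed ID so per-feed lookups and refreshes are fast.
		fmt.Sprintf("CREATE INDEX IF NOT EXISTS idx%s_%s ON %s (%s)", ct, feedCol, ct, feedCol),
		// Replace only this feed's rows, leaving other feeds untouched.
		fmt.Sprintf("DELETE FROM %s WHERE %s = ?", ct, feedCol),
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s WHERE %s = ?", ct, view, feedCol),
	}, nil
}

func main() {
	stmts, err := materializeSQL("_vfoo", "feed_id")
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Running the DELETE and INSERT for a feed inside a single transaction would keep that feed's rows in the computed table consistent if a load is interrupted.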

Routing Library call (via RTRoute):

  • (4) Routing Library Function: Prep for Algorithm - Memload SQL Selects
    • This step runs the SQL selects and loads into memory the data needed to run the core routing algorithm.
    • "Loader" functions (see db/load*) translate SQL selects into arrays of structs for each datatype (Connections, Transfers, etc.)
    • At the completion of this step, all needed data is in memory.
  • (5) Routing Library Function: CSA Algorithm Execution
    • The CSA is run, passing the memload'd arrays from (4) as args to the main CSA function entrypoint.
    • The result is an array of 'connections' representing the most efficient route, narrowed down from the input array of connections. Each connection is essentially the 'quickest' way to reach each destination stop given the input criteria.
  • (6) Routing Library Function: Decoration / Memload SQL Connections Verbose
    • The array of connections from (5) is not in a user-friendly format and lacks details such as stop names, latitude/longitude, and similar.
    • Thus we go back to the DB, since the raw GTFS data has all this information, and pull the same connections (by UID) along with verbose metadata.
  • (7) Routing Library Function: Formatting
    • The result of (6) is formatted into different structures by pure functions, depending on user input.
    • The main format is 'legs', which is simply the steps for accomplishing a route (e.g. walk here, take a trip here at time x), in a much more verbose and data-processable format of course.
    • There is also a 'mapurl' formatter, which translates the route into GeoJSON and uses a map URL rendering service that speaks GeoJSON.
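The CSA execution and legs formatting steps above can be illustrated with a heavily simplified Go sketch of an earliest-arrival Connection Scan followed by grouping connections into legs. This is not Mobroute's implementation: the Connection and Leg types, their field names, and the functions earliestArrival and toLegs are hypothetical, and the sketch omits the footpaths/transfers and minimum transfer times a real router handles. It assumes times are DRUTC seconds and that every connection has ArrTime greater than DepTime.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Connection is a hypothetical, simplified connection: one vehicle hop
// between two consecutive stops, with times in DRUTC seconds.
type Connection struct {
	TripID           string
	DepStop, ArrStop string
	DepTime, ArrTime int
}

// Leg groups consecutive connections ridden on the same trip.
type Leg struct {
	TripID   string
	From, To string
	Dep, Arr int
}

// earliestArrival is a basic earliest-arrival Connection Scan with no
// footpaths and zero minimum transfer time. It returns the connections of a
// quickest journey from src (starting at t0) to dst, or nil if unreachable.
func earliestArrival(conns []Connection, src, dst string, t0 int) []Connection {
	// CSA scans connections in order of departure time.
	sorted := append([]Connection(nil), conns...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].DepTime < sorted[j].DepTime })

	arrival := map[string]int{src: t0} // earliest known arrival time per stop
	via := map[string]int{}            // index in sorted of the connection reaching each stop
	best := func(stop string) int {
		if t, ok := arrival[stop]; ok {
			return t
		}
		return math.MaxInt
	}

	for i, c := range sorted {
		if c.DepTime >= best(dst) {
			break // later connections cannot improve the arrival at dst
		}
		// Usable if we can be at the departure stop in time; kept only if it
		// improves the earliest arrival at its arrival stop.
		if best(c.DepStop) <= c.DepTime && c.ArrTime < best(c.ArrStop) {
			arrival[c.ArrStop] = c.ArrTime
			via[c.ArrStop] = i
		}
	}

	// Walk backwards from dst to src to recover the journey.
	path := []Connection{}
	for stop := dst; stop != src; {
		i, ok := via[stop]
		if !ok || len(path) >= len(sorted) {
			return nil
		}
		path = append([]Connection{sorted[i]}, path...)
		stop = sorted[i].DepStop
	}
	return path
}

// toLegs merges consecutive connections on the same trip into legs.
func toLegs(path []Connection) []Leg {
	var legs []Leg
	for _, c := range path {
		if n := len(legs); n > 0 && legs[n-1].TripID == c.TripID {
			legs[n-1].To, legs[n-1].Arr = c.ArrStop, c.ArrTime
			continue
		}
		legs = append(legs, Leg{c.TripID, c.DepStop, c.ArrStop, c.DepTime, c.ArrTime})
	}
	return legs
}

func main() {
	conns := []Connection{
		{"T1", "A", "B", 100, 200},
		{"T1", "B", "C", 210, 300},
		{"T2", "A", "C", 150, 400},
		{"T3", "C", "D", 310, 350},
	}
	for _, l := range toLegs(earliestArrival(conns, "A", "D", 0)) {
		fmt.Printf("%s: %s -> %s (%d-%d)\n", l.TripID, l.From, l.To, l.Dep, l.Arr)
	}
}
```

Because connections are sorted by departure time, each one is examined once in a single linear scan. With the sample data, the sketch prints `T1: A -> C (100-300)` followed by `T3: C -> D (310-350)`: riding T1 to C and transferring to T3 beats the direct but slower T2.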

Internals: Algorithm

The core of the routing system is based on the Connection Scan Algorithm methodology. See the following papers for more details:

Internals: Glossary

The following terms and abbreviations are used in the source code:

  • StopUID (Stop Unique ID): GTFS archives internally use 'stop id' to cross-reference stops. Since multiple GTFS archives can be handled in a single routing, schedule, etc. query in Mobroute, we dynamically create stop UIDs. In the current implementation, UIDs are always composed as {FEEDID}_{STOPID}.
  • DRUTCTime (Date-relative UTC Time): Refers to UTC time (in seconds) relative to the input date for a query. This is used primarily in the connections loader to abstract away timezones.
  • Feed ID: Refers to Mobroute's concept of a single GTFS archive correlating to an ID number. For Mobility Database sources this is always a positive number mapping to the MDB catalog origin feed ID; for custom feeds this is a negative, user-provided number.
  • MDBID (Mobility Database ID): Source ID pulled from the Mobility Database Catalog.

Testing: Using the route_tester.sh Script to Test GTFS Feeds

Mobroute potentially works with any GTFS source listed in the Mobility Database. While certain sources are tested by CI or known to be good, it may be helpful (either because a source is untested, or because you're too lazy to specify routing request parameters) to determine whether a source "is good and works for routing". The route_tester.sh script serves this purpose: it acts as a smoke-testing script, running a 'random' routing request with lax parameters to determine whether a particular MDBID can route properly. To use this script, first build the mobroute binary (see the section on building the mobroute binary) and then run: ./scripts/route_tester.sh MDBID

For example: ./scripts/route_tester.sh 1898

Testing: Running Unit & Integration Tests

Run (unit) tests:

./build.sh test

Run GTFS-based (integration) tests:

./build.sh testzipgtfs

Run unit & integration tests both:

./build.sh testall

Run individual unit test packages:

./build.sh test ./dbquery_test

Testing: Debugging Tips

Various debugging tips below:

  • Debugging via SQLite DB:
    • Run sqlite3 ~/.cache/mobroute/sqlite.db
    • For example, check that the calendar for today actually produces dates:
      • select * from _vcaltoservice where service_date = 20231220 and source = 1898
  • Clear Cache:
    • If you are working with multiple sources and paused the load process, it may be worth clearing the cache wholesale and retrying.
    • Run: rm -rf ~/.cache/mobroute

Testing: Profiling

  • Set the env var MOBROUTE_PPROF to a file path to write a pprof profile to that file.
  • Set the env var MOBROUTE_CFG to set the global JSON runtime config (e.g. to alter Mobroute MDB & Mobsql params).

Regenerate CLI Documentation doc_cli.md page

The CLI documentation page simply lists each subcommand of the mobroute binary and its usage; for each subcommand, this is equivalent to its help text. As such, a generator script creates this page. Regenerate the doc_cli.md page as follows:

./scripts/generate_cliguide.sh > doc/doc_cli.md

Regenerate the master mobroute_lib.go file

For end users, we provide the single package git.sr.ht/~mil/mobroute, from which all public functionality is exposed. Types and functions in this package are simply aliased from the constituent subpackages in api/. From a development standpoint, this aliasing allows subpackages to live in completely distinct and isolated namespaces, ensuring modularity.

Rather than manually aliasing all public functionality from each subpackage, we use a generator script to create the master library file. Regenerate the master mobroute_lib.go file as follows:

./scripts/generate_mobroutelibgo.sh > mobroute_lib.go
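The idea behind this generator can be sketched in Go with the standard go/parser and go/ast packages. This is not the repository's generate_mobroutelibgo.sh script: the genAliases function, the example package name exampleapi, and the choice of var X = pkg.X for functions are illustrative assumptions. The sketch emits type aliases for exported types and package-level variables for exported top-level functions. It skips methods (which come along with the aliased type) and generic declarations (which cannot be re-exported with these plain forms), and leaves out exported constants and variables for brevity.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// genAliases parses one Go source file of a subpackage and emits alias
// declarations re-exporting its exported, non-generic types and top-level
// functions under the given package name.
func genAliases(pkgName, src string) (string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, 0)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue
			}
			for _, spec := range d.Specs {
				ts := spec.(*ast.TypeSpec)
				if ts.Name.IsExported() && ts.TypeParams == nil {
					fmt.Fprintf(&b, "type %s = %s.%s\n", ts.Name.Name, pkgName, ts.Name.Name)
				}
			}
		case *ast.FuncDecl:
			// Methods (Recv != nil) are reachable through the aliased type.
			if d.Recv == nil && d.Name.IsExported() && d.Type.TypeParams == nil {
				fmt.Fprintf(&b, "var %s = %s.%s\n", d.Name.Name, pkgName, d.Name.Name)
			}
		}
	}
	return b.String(), nil
}

func main() {
	src := `package exampleapi

type Params struct{ From, To string }

type internalState int

func Run(p Params) error { return nil }

func helper() {}

func (p Params) Valid() bool { return p.From != "" }
`
	out, err := genAliases("exampleapi", src)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

This prints `type Params = exampleapi.Params` and `var Run = exampleapi.Run`. A complete generator would also need to emit the package clause and an import of each subpackage's full path.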