Mobroute: Development Guide
This development guide provides various pointers on local development. If something is missing from this guide, feel free to add it. If you have a question about anything in this guide, please open a ticket on the ticket tracker or send a note to the mailing list.
Diagrams
For a baseline understanding of the overall system architecture, the diagrams on the diagrams doc page can be very helpful.
Internals: Architecture
The Mobroute project is composed of 3 separate codebases / smaller projects:
- Mobsql: is a GTFS-to-SQL ETL tool which pulls GTFS sources from the MobilityDB into a SQLite database. It exposes both a CLI & a Go library (consumed by Mobroute).
- Mobroute: is the core routing tool. This project interfaces with Mobsql via its public Go library and, once the DB is loaded, opens the DB and performs routing calculations (currently via the Connection Scan Algorithm). It exposes both a CLI & a Go library (consumed by Transito).
- Transito: is a simple Android & Linux GUI app to use Mobroute. It allows querying from/to via Nominatim and performing routing calculations on-the-go via Mobroute's underlying routing mechanism (note this implicitly integrates Mobsql as well).
Internals: What goes into a Routing Calculation? (e.g. full data pipeline)
Routing dataflow follows a roughly 7-step process. Upon submitting a routing request, the overall dataflow looks like this:
One time (via RTDatabase via loadmdbgtfs/loadcustomgtfs and then compute op):
- (1) Dataload: Mobility Database CSV Fetch
- An HTTP request is made to fetch the Mobility Database CSV
- This CSV contains thousands of potential GTFS sources to use
- The CSV is imported into Mobsql's SQLite database
- (Handled by Mobsql)
- (2) Dataload: Upstream (Agency) GTFS Fetch & Load
- Each feed ID in the routing request is downloaded from its source agency in raw ZIP form.
- Each GTFS ZIP archive is imported into Mobsql's SQLite database, with each table mirroring the GTFS spec but with an added 'feed_id' column referring to the GTFS feed ID from the Mobility Database. This extra column allows 'multisource' storage of GTFS data & routing.
- (Handled by Mobsql)
- (3) Dataload: GTFS Computed Tables Calculations
- The routing algorithm takes several arrays as parameters, each of which corresponds directly to a particular SQL select statement pulling data from the SQLite database housing the GTFS data.
- The routing logic is able to operate directly on the GTFS data (i.e. with no preprocessing) by selecting directly from the core data extraction views; however, that would be quite inefficient.
- Indexing and precomputation yield massive gains, so several views used by the routing API are stored in 'computed tables', which are essentially just materialized tables indexed by MDBID.
- In this step, the views are translated to tables as follows: views named _vfoo are materialized into tables like _ctfoo; note _ct is short for "computed table".
- (Handled by Mobsql; computed tables spec'd as ExtraSchema by Mobroute)
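The _vfoo to _ctfoo naming convention described above can be sketched as a small helper that generates the SQL for materializing a view. This is only an illustration under stated assumptions: the function names (ComputedTableName, MaterializeSQL), the exact statements, and the index column passed in are hypothetical and are not taken from Mobsql or Mobroute; _vfoo is the placeholder name used in the text.

```go
package main

import (
	"fmt"
	"strings"
)

// ComputedTableName maps a view name such as "_vfoo" to the matching
// computed-table name "_ctfoo". ok is false when the "_v" prefix is missing.
// (Hypothetical helper, not part of Mobsql.)
func ComputedTableName(view string) (name string, ok bool) {
	if !strings.HasPrefix(view, "_v") || len(view) == len("_v") {
		return "", false
	}
	return "_ct" + strings.TrimPrefix(view, "_v"), true
}

// MaterializeSQL returns SQLite statements that copy a view into a plain
// table and index it. indexCol is an assumed column name chosen by the
// caller; the real schema may index different columns.
func MaterializeSQL(view, indexCol string) ([]string, error) {
	table, ok := ComputedTableName(view)
	if !ok {
		return nil, fmt.Errorf("%q is not a _v-prefixed view name", view)
	}
	return []string{
		fmt.Sprintf("DROP TABLE IF EXISTS %s;", table),
		fmt.Sprintf("CREATE TABLE %s AS SELECT * FROM %s;", table, view),
		fmt.Sprintf("CREATE INDEX %s_%s_idx ON %s (%s);", table, indexCol, table, indexCol),
	}, nil
}

func main() {
	stmts, err := MaterializeSQL("_vfoo", "feed_id")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Dropping and recreating the table keeps the sketch idempotent, so it could be rerun after new feeds are loaded; whether the real compute op rebuilds tables wholesale or per feed is not stated here.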
Routing Library call (via RTRoute):
- (4) Routing Library Function: Prep for Algorithm - Memload SQL selects
- This step handles SQL selecting & loading into memory the data needed to run the core routing algorithm.
- "Loader" functions (see db/load*) translate SQL selects into arrays of structs for each datatype (Connections, Transfers, etc.)
- At the completion of this step all data needed is in memory
- (5) Routing Library Function: CSA Algorithm Execution
- The CSA algorithm is run, passing the memload'd arrays from (4) as arguments to the main CSA function entrypoint.
- The result / return value is an array of 'connections' which represents the most efficient route, narrowed down from the input array of connections. Each connection is essentially the 'quickest' way to reach each destination stop given the input criteria.
- (6) Routing Library Function: Decoration / Memload SQL Connections Verbose
- The array of connections from (5) is not in a user-friendly format and lacks details such as stop names, latitude, longitude, and similar.
- Since the raw GTFS data has all this information, we go back to the DB and pull the same connections (by a UID) with verbose metadata.
- (7) Routing Library Function: Formatting
- The result of (6) is formatted into different structures by pure functions depending on user input.
- The main format is 'legs', which is just the steps for accomplishing a route, like walk here, take a trip here at x time, etc. (in a much more verbose and data-processable format, of course).
- There is also a 'mapurl' formatter which translates the route into GeoJSON and uses a Map URL rendering service which speaks GeoJSON.
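To make the algorithm execution step more concrete, here is a minimal sketch of an earliest-arrival Connection Scan over an in-memory array of connections, followed by a backward walk that recovers the connections used. It is a simplified textbook form of the technique and not Mobroute's implementation: the Connection struct, its field names, and both function names are assumptions, and transfers, walking, trips, and service calendars are deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Connection is a hypothetical, stripped-down connection: one vehicle hop
// between two stops. Times are seconds relative to the query date.
type Connection struct {
	From, To string
	Dep, Arr int
}

// EarliestArrival scans connections in departure order and records, for each
// reachable stop, the earliest arrival time and the connection that achieved it.
func EarliestArrival(conns []Connection, origin string, start int) (map[string]int, map[string]Connection) {
	sorted := append([]Connection(nil), conns...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Dep < sorted[j].Dep })

	arrival := map[string]int{origin: start}
	via := map[string]Connection{}
	for _, c := range sorted {
		if c.Arr < c.Dep {
			continue // malformed connection, ignore
		}
		reach, ok := arrival[c.From]
		if !ok || reach > c.Dep {
			continue // we cannot be at c.From in time to board
		}
		if best, seen := arrival[c.To]; !seen || c.Arr < best {
			arrival[c.To] = c.Arr
			via[c.To] = c
		}
	}
	return arrival, via
}

// Journey walks backwards from dest to origin and returns the connections
// in travel order, or nil when dest was not reached.
func Journey(via map[string]Connection, origin, dest string) []Connection {
	var legs []Connection
	for stop := dest; stop != origin; {
		c, ok := via[stop]
		if !ok {
			return nil
		}
		legs = append([]Connection{c}, legs...)
		stop = c.From
	}
	return legs
}

func main() {
	conns := []Connection{
		{"A", "B", 100, 200},
		{"A", "C", 150, 500},
		{"B", "C", 190, 300}, // leaves B before we can arrive there
		{"B", "C", 250, 400},
	}
	arrival, via := EarliestArrival(conns, "A", 90)
	fmt.Println("earliest arrival at C:", arrival["C"])
	for _, c := range Journey(via, "A", "C") {
		fmt.Printf("%s -> %s (dep %d, arr %d)\n", c.From, c.To, c.Dep, c.Arr)
	}
}
```

The single pass over a departure-sorted array is what makes precomputed, indexed inputs attractive: once the arrays are in memory, the scan itself needs no further database access.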
Internals: Algorithm
The core of the routing system is based on the Connection Scan Algorithm methodology. See the following papers for more details:
Internals: Glossary
These abbreviations are used in the source code; explanations are given below:
Abbreviation | Full Term | Explanation |
---|---|---|
StopUID | Stop Unique ID | GTFS archives internally use 'stop id' to cross-reference stops. Since multiple GTFS archives can be handled with a single routing, schedule, etc. query in Mobroute, we dynamically create stop UIDs. UIDs in the current implementation are always composed as {FEEDID}_{STOPID} |
DRUTCTime | Date-relative UTC Time | Refers to UTC time (in seconds) relative to the input date for a query. This is used primarily in the connections loader to abstract away timezones. |
Feed ID | Feed ID | Refers to Mobroute's concept of a single GTFS archive correlating to an ID number. For Mobility DB sources this is always a positive number mapping to the MDB catalog origin feed ID; for custom feed IDs this is a negative user-provided number. |
MDBID | Mobility Database ID | Source ID pulled from the Mobility Database Catalog. |
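As an illustration of the StopUID and Feed ID conventions in the glossary, the sketch below composes and splits a {FEEDID}_{STOPID} string and classifies a feed ID by its sign. The function names (StopUID, SplitStopUID, FeedKind) and the returned labels are hypothetical and chosen for this example; only the {FEEDID}_{STOPID} layout and the positive/negative rule come from the table above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StopUID composes a stop UID as {FEEDID}_{STOPID}.
func StopUID(feedID int, stopID string) string {
	return strconv.Itoa(feedID) + "_" + stopID
}

// SplitStopUID reverses StopUID. It splits on the first underscore only,
// since GTFS stop IDs may themselves contain underscores.
func SplitStopUID(uid string) (feedID int, stopID string, err error) {
	i := strings.Index(uid, "_")
	if i < 0 {
		return 0, "", fmt.Errorf("stop UID %q has no underscore", uid)
	}
	feedID, err = strconv.Atoi(uid[:i])
	if err != nil {
		return 0, "", fmt.Errorf("stop UID %q has a non-numeric feed ID: %w", uid, err)
	}
	return feedID, uid[i+1:], nil
}

// FeedKind classifies a feed ID by sign: positive IDs map to the Mobility
// Database catalog, negative IDs are user-provided custom feeds.
// The labels are assumptions made for this sketch.
func FeedKind(feedID int) string {
	switch {
	case feedID > 0:
		return "mdb"
	case feedID < 0:
		return "custom"
	default:
		return "invalid"
	}
}

func main() {
	uid := StopUID(1898, "stop_42")
	feedID, stopID, err := SplitStopUID(uid)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(uid, feedID, stopID, FeedKind(feedID), FeedKind(-3))
}
```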
Testing: Using the route_tester.sh Script to Test GTFS Feeds
Mobroute works with potentially any GTFS source as specified in the
Mobility Database. While certain sources are tested by CI or known good,
it may be helpful (either because a source is untested, or you're too
lazy to specify routing request parameters) to determine if a source
"is good and works for routing". The route_tester.sh script serves this
purpose, essentially acting as a smoke testing script that lets users
run a 'random' routing request with lax parameters to determine if a
particular MDBID can route properly. To use this script, first build
the mobroute binary and then run:
./scripts/route_tester.sh MDBID
For example:
./scripts/route_tester.sh 1898
Testing: Running Unit & Integration Tests
Run (unit) tests:
./build.sh test
Run GTFS-based (integration) tests:
./build.sh testzipgtfs
Run both unit & integration tests:
./build.sh testall
Run an individual unit test package:
./build.sh test ./dbquery_test
Testing: Debugging Tips
Various debugging tips below:
- Debugging via SQLite DB:
- Run:
sqlite3 ~/.cache/mobroute/sqlite.db
- One example, check that the calendar for today actually produces dates:
select * from _vcaltoservice where service_date = 20231220 and source = 1898
- Clear Cache:
- If you are working with multiple sources and paused the load process, it may be a good idea to clear the cache wholesale and retry.
- Run:
rm -rf ~/.cache/mobroute
Testing: Profiling
- Set env var MOBROUTE_PPROF to the file to write a pprof profile.
- Set env var MOBROUTE_CFG to set global JSON runtime config (e.g. can alter mobroute MDB & mobsql params)
Regenerate CLI Documentation doc_cli.md page
The CLI documentation page simply lists each subcommand of the mobroute
binary and its usage. For each subcommand, this is equivalent to the
output of its help text. As such, a generator script creates this page.
Regenerate the doc_cli.md page as follows:
./scripts/generate_cliguide.sh > doc/doc_cli.md
Regenerate the master mobroute_lib.go file
For end-users we provide the single package git.sr.ht/~mil/mobroute from
which all public functionality is exposed. Types and functions in this
package are just aliased from constituent subpackages in api/. From a
development standpoint, this aliasing allows subpackages to live in
completely distinct and isolated namespaces, ensuring modularity.
Rather than manually aliasing all public functionality from each
subpackage, we use a generator script to create the master library
file. Regenerate the master mobroute_lib.go file as follows:
./scripts/generate_mobroutelibgo.sh > mobroute_lib.go