Dude, Where’s My Real-Time Data?

Aug 28, 2019

One of the advances RTA made a few years back was the introduction of real-time vehicle location and stop time prediction. This gave our customers access to decision-making tools they didn't previously have available, and made choosing transit an easier option.

Unfortunately, real-time data depends on many moving parts, and all of those parts have to work in concert for the information to be accurate and available. Case in point: the recent Red Line disruption due to the repairs needed on the S-curve retaining wall.

When we very suddenly replaced that stretch of rail service with buses, and seemingly any time we made changes to that service, and then when we reverted it back to regular Red Line service on August 26, real-time information availability lagged behind what was actually happening on the street.

The short of it is this: there are a few different ways of presenting real-time information, which largely come down to real-time location and real-time prediction. The first, which is used in the NextConnect system that powers the real-time widget on the front page (and several other pages) of riderta.com, is not always as informative, but is less susceptible to data disruptions. The second, used in third-party services like TransitApp, can be more informative, but is more susceptible to back-end data issues. Here’s why:

The first part of being able to show real-time information is the actual scheduled time. After all, if you don’t know what time a vehicle is scheduled to be at a particular stop, how can you know whether it’s early or late? This is supplied in files formatted to what’s called the General Transit Feed Specification, or GTFS (the G used to stand for Google, who helped develop it in cooperation with transit professionals at TriMet in Portland, OR). These are static files, and usually change only when we’re making some sort of major service change.
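To make the idea of "scheduled time" concrete, here is a minimal sketch of reading scheduled arrivals out of a GTFS stop_times.txt file. The column names come from the GTFS specification; the trip and stop ids, the sample rows, and the helper names (to_seconds, load_schedule) are made up for illustration and are not taken from RTA's actual feed.

```python
import csv
import io

# Hypothetical excerpt of a GTFS stop_times.txt file (ids and times are invented).
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
trip_A,08:00:00,08:00:00,stop_1,1
trip_A,08:07:00,08:07:30,stop_2,2
trip_B,25:10:00,25:10:00,stop_1,1
"""


def to_seconds(hhmmss):
    """Convert a GTFS time string to seconds after the start of the service day.

    GTFS allows hours of 24 and above for trips that run past midnight,
    so this cannot be parsed as an ordinary clock time.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_schedule(text):
    """Map (trip_id, stop_id) to the scheduled arrival time in seconds."""
    schedule = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["trip_id"], row["stop_id"])
        schedule[key] = to_seconds(row["arrival_time"])
    return schedule


schedule = load_schedule(STOP_TIMES_TXT)
print(schedule[("trip_A", "stop_2")])  # 29220, i.e. 8:07 AM
```

Notice that every lookup is keyed on the trip id. That detail matters later on.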

The second part of real-time availability relies on the similarly-named, but structurally very different, GTFS-realtime (GTFS-RT) files. This data describes what's going on with the system at a given moment, in terms of traffic, road closures, or other events that can cause delays. This continuous stream of data is constantly updating – think of it like the "ticker" on any news or financial service.

Real-time location is simple, and can tell you where your vehicle is, compared to where it should be. This system has some limitations, though, in that it can only tell you a vehicle's status relative to a given stop, i.e., this bus is currently running 5 minutes late relative to stop X. However, if there is a road closure between where the bus is now, and where stop X is, those 5 minutes could easily turn into ten or more, and this system can't tell you that until the bus encounters that delay. Being a simpler system, it is more responsive to when we make wholesale changes to our data, because all it looks at is schedule data and current position.
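As a rough illustration of that comparison, the sketch below works out how early or late a vehicle is at one stop, given its scheduled time and the time it was actually observed there. The function names and the sample times are assumptions for the example, not how NextConnect is actually implemented.

```python
def schedule_deviation_minutes(scheduled_seconds, observed_seconds):
    """Positive means late, negative means early, zero means on time."""
    return (observed_seconds - scheduled_seconds) / 60


def describe(deviation_minutes):
    """Turn a deviation into the kind of message a rider would see."""
    if deviation_minutes > 0:
        return f"{deviation_minutes:.0f} min late"
    if deviation_minutes < 0:
        return f"{-deviation_minutes:.0f} min early"
    return "on time"


# Hypothetical example: scheduled at stop X at 8:07:00, observed there at 8:12:00.
scheduled = 8 * 3600 + 7 * 60
observed = 8 * 3600 + 12 * 60

deviation = schedule_deviation_minutes(scheduled, observed)
print(describe(deviation))  # 5 min late
```

Nothing here looks ahead. The calculation only knows the schedule and the current observation, which is why it can't warn you about a road closure the bus hasn't reached yet.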

The second type of real-time data is real-time prediction, which attempts to take a larger data set and better predict where and when the bus will be at all locations, as well as allowing for trip planning that takes this into account. This type of service is more susceptible to data disruptions, since it relies on everything being properly "in sync".

Remember those GTFS and GTFS-RT files I mentioned earlier? They both use something called a trip id (the trip_id field) to identify a particular piece of service we're providing. If the trip ids in those two file sets don't match, real-time prediction goes away. That's what happened this week, when we restored service on the Red Line. The GTFS files we're using were updated to a different set of trip ids, but the GTFS-RT files, due to technical limitations on how quickly our system can distribute the information, are still using the old trip ids. However, real-time location, since it doesn't rely on those trip ids, continues to function, so you can continue to get real-time location information through our website widget.
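Here is a small sketch of why mismatched trip ids make predictions disappear. It assumes a consumer (a third-party app, say) that can only use a real-time update if its trip id also exists in the static schedule. The ids, the dictionary layout, and the split_updates helper are all hypothetical. Real GTFS-RT feeds are protocol buffer messages, not Python dictionaries.

```python
def split_updates(static_trip_ids, realtime_updates):
    """Separate real-time updates into usable and unusable ones.

    An update is usable only if its trip id is present in the static
    schedule; otherwise there is no scheduled trip to attach it to.
    """
    matched = [u for u in realtime_updates if u["trip_id"] in static_trip_ids]
    orphaned = [u for u in realtime_updates if u["trip_id"] not in static_trip_ids]
    return matched, orphaned


# Static schedule was just updated to a new set of trip ids (invented values).
static_trip_ids = {"new_101", "new_102", "new_103"}

# The real-time feed is still reporting against the old ids.
realtime_updates = [
    {"trip_id": "old_101", "delay_seconds": 120},
    {"trip_id": "old_102", "delay_seconds": 300},
]

matched, orphaned = split_updates(static_trip_ids, realtime_updates)
print(len(matched), "usable updates,", len(orphaned), "orphaned")
# 0 usable updates, 2 orphaned
```

Once both sides agree on the ids again, the same updates land in the matched list and predictions come back.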

The good news is, this should all be corrected on September 1, when everything is back in sync again. The better news is that our ongoing radio upgrade project (which also has the side effect of providing the free wifi in our stations, and which will soon be on our vehicles) should allow for quicker distribution of this information, which means data disruptions like this will become less disruptive, with shorter (but not necessarily non-existent) lag times.

There's a lot to take in here, and a lot to understand. Want to know anything more about this topic? Get in touch with me at webmaster@gcrta.org.