MBTA Capstone Project: Creating a Better Revenue Model



Approximately two-thirds of the Massachusetts Bay Transportation Authority’s (MBTA) passenger revenue is generated through the sale of daily, weekly, and monthly passes that grant customers an unlimited number of rides on the system during a specified time window. The bundled nature of this revenue source, along with the multitude of pass types and discounts, makes it difficult to analyze and compare the amount of revenue attributable to a specific bus or subway line. Based on a dataset of more than 275 million bus and subway transactions from 2016 and archived bus and subway schedules, we built a revenue model that assigns an individualized “revenue-per-trip” value to every passenger boarding. To calculate the revenue attributable to a specific transaction, our model uses the customer’s specific pass price, the number of trips they took during the period for which their pass was valid, and the number of transfers between lines (if any) that occurred during a given trip. Based on this model, we built a visualization tool that enables the transit agency to analyze the factors that are correlated with increased passenger revenue by bus line, as well as to compare passenger and revenue trends between routes over time.


We used three data sets for our project:

Data Preprocessing

  1. We removed canceled transactions.
  2. For pass users, we imputed the cost of each transaction by distributing the revenue of daily, weekly, and monthly passes across each user’s transactions.
  3. We grouped transfers together, re-split revenue across all segments of a given trip, and assigned the resulting values back to the individual transactions.
  4. We corrected the route names in the dataset against the MBTA’s official route list.
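
The first two steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual pipeline: the record keys (`customer`, `pass_id`, `canceled`) and the `pass_prices` lookup are hypothetical names chosen for the example, and we assume each pass's price is split evenly over the boardings made with it.

```python
from collections import Counter


def impute_pass_revenue(transactions, pass_prices):
    """Drop canceled transactions, then spread each pass's price evenly
    over the boardings made with that pass.

    transactions: list of dicts with hypothetical keys
        'customer', 'pass_id', 'canceled'
    pass_prices: dict mapping pass_id -> presumed price of that pass
    """
    # Step 1: remove canceled transactions.
    valid = [t for t in transactions if not t["canceled"]]

    # Count how many boardings each (customer, pass) pair made.
    uses = Counter((t["customer"], t["pass_id"]) for t in valid)

    # Step 2: each boarding receives an equal share of the pass price.
    return [
        {**t, "revenue": pass_prices[t["pass_id"]] / uses[(t["customer"], t["pass_id"])]}
        for t in valid
    ]
```

With this sketch, a customer who boards twice on a pass priced at 20.00 is assigned 10.00 per boarding, while a customer who boards only once on the same pass type is assigned the full 20.00.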

Model Construction

Figure: Model Graph

Pass holders vs. Single Fare boardings


64% of MBTA bus and subway customers access the system via a daily, weekly, or monthly pass. While we did not have direct access to purchase records for these pass products, we were able to estimate the revenue they generated through the (hashed) customer ID associated with each boarding transaction in our dataset. For example, if we see a customer using a weekly pass to board a bus during week 37, we presume that the customer purchased a weekly pass during week 37 at the retail price of weekly passes at that time. (Note: A system-wide fare increase on July 1, 2016 raised the prices of both passes and single-fare tickets.) This estimation strategy likely leads us to undercount pass revenue slightly, especially for weekly passes: we presume that customers purchased the minimum number of weekly passes needed for the trips they took, while in reality customers are likely not perfectly efficient with their pass purchases. We believe these errors are small enough to be mostly ignored in our analysis.
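
The weekly-pass estimate described above can be sketched as follows. This is a simplified illustration under stated assumptions: one pass is presumed per distinct ISO calendar week with activity, each presumed purchase is priced by the date of the first boarding in that week, and `price_before` and `price_after` are placeholders for the pre- and post-increase prices rather than actual MBTA fares.

```python
from datetime import date

FARE_CHANGE = date(2016, 7, 1)  # system-wide fare increase noted in the text


def estimate_weekly_pass_revenue(boarding_dates, price_before, price_after):
    """Presume one weekly pass per distinct ISO week in which the
    customer boarded with a weekly pass, and sum the presumed prices."""
    # Keep the earliest boarding seen in each ISO (year, week).
    first_use = {}
    for d in boarding_dates:
        week = tuple(d.isocalendar())[:2]
        if week not in first_use or d < first_use[week]:
            first_use[week] = d

    # Price each presumed purchase at the fare in effect on first use.
    return sum(
        price_before if d < FARE_CHANGE else price_after
        for d in first_use.values()
    )
```

Because two boardings in the same week count as a single purchase, this sketch yields the minimum number of passes consistent with the observed boardings, which is the source of the slight undercount discussed above.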

Overall, the assumptions in our revenue model are as follows:

The last two assumptions have a significant impact on the aggregate revenues attributed to the bus-system overall, and warrant further explanation.

Four ways of calculating revenue

Of the approximately 5 million boardings that occur each week on the MBTA, 1.7 million are bus trips, while the remaining 3.2 million are subway trips. Although more than one-third of monthly trips are taken on buses, bus revenue accounts for only 25% of the $34.9 million in revenue generated by passenger fares each month for the bus and subway system.

When allocating revenue from pass products, assumptions about how much a particular trip or segment is “worth” can have a significant impact on the relative allocation of revenue between different transit services. For example, if a customer purchases a monthly LinkPass for $75 (which includes unlimited access to both the bus and subway systems) and then rides the subway 20 times and the bus 20 times during the month, how much money should be allocated to each service? On the one hand, allocating 50% to each of the bus and subway systems seems simple and straightforward. However, given that non-pass customers pay approximately 30% more for subway tickets than for bus tickets ($2.25 vs. $1.70, respectively), one could argue that pass customers likely place more value on their subway trips than on their bus trips, so allocating more than 50% of their pass revenue to the subway system may be justifiable. A second consideration is how transfers are treated for both pass and non-pass customers. If a customer purchases a subway fare and then transfers to a bus within 2 hours, the bus fare is free. If we allocate all of the revenue for this fare to the originating mode of transit, the bus line accrues no additional revenue; alternatively, we could redistribute the revenue generated across all segments of a trip.
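
To make the weighting question concrete, the sketch below works through the $75 pass example with and without weighting by single-ride fare. The function name and its arguments are illustrative only, and weighting each ride by its single-ride fare is one plausible reading of “weighted by transit type”, not a confirmed description of our implementation.

```python
def allocate_pass_revenue(pass_price, rides, weights=None):
    """Split a pass price across transit modes.

    rides: dict mapping mode -> number of rides taken on the pass
    weights: optional dict mapping mode -> single-ride fare used as a
             weight; None means every ride counts equally.
    """
    if weights is None:
        weights = {mode: 1.0 for mode in rides}

    # Weighted ride count per mode, then each mode's share of the total.
    weighted = {mode: n * weights[mode] for mode, n in rides.items()}
    total = sum(weighted.values())
    return {mode: pass_price * w / total for mode, w in weighted.items()}


rides = {"subway": 20, "bus": 20}
unweighted = allocate_pass_revenue(75, rides)
weighted = allocate_pass_revenue(75, rides, {"subway": 2.25, "bus": 1.70})
```

Unweighted, each mode receives $37.50. Weighted by the single-ride fares, the subway receives about $42.72 (45/79 of the pass price) and the bus about $32.28.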

The estimated differences in total revenue assigned to the bus system under different combinations of these two assumptions are displayed above. “Rebalanced by trip” means we redistributed all of the revenue generated by the set of services that a customer used during the 2-hour transfer window evenly across each of the services used. “Weighted by transit type” means we weighted pass users’ simulated fares by the equivalent non-pass cost, so that express bus and subway rides, which are more expensive for non-pass customers, are attributed more revenue than local bus rides.

For our model, we decided to rebalance transfer revenues across all segments of a passenger’s journey, but assume an even weighting between transit types (“Rebalanced by trip, unweighted by transit type” above).
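
A minimal sketch of the “rebalanced by trip” idea follows. It assumes the boardings belong to a single customer, that a trip consists of all boardings within two hours of the trip's first boarding, and that revenue is split evenly across a trip's segments. The tuple layout and the function name are hypothetical.

```python
from datetime import datetime, timedelta

TRANSFER_WINDOW = timedelta(hours=2)


def rebalance_by_trip(boardings):
    """Group one customer's boardings into trips and split each trip's
    revenue evenly across its segments.

    boardings: list of (timestamp, route, revenue) tuples.
    Returns a list of (timestamp, route, rebalanced_revenue) tuples.
    """
    trips = []
    for b in sorted(boardings, key=lambda b: b[0]):
        # A boarding within two hours of the trip's first boarding is
        # treated as a transfer on that trip (an assumption).
        if trips and b[0] - trips[-1][0][0] <= TRANSFER_WINDOW:
            trips[-1].append(b)
        else:
            trips.append([b])

    result = []
    for trip in trips:
        share = sum(rev for _, _, rev in trip) / len(trip)
        result.extend((ts, route, share) for ts, route, _ in trip)
    return result
```

For example, a $2.25 subway boarding at 8:00 followed by a free bus transfer at 8:40 would be rebalanced to $1.125 per segment, while a separate bus boarding later in the day keeps its own revenue.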


Revenues vs. Estimated Costs, by Week, 2016