Tamela Maciel


PhD astrophysicist. Data Scientist.
Space Communications Manager.
Physics editor. Science writer.

View My LinkedIn Profile

Find Me On Twitter

View My GitHub Profile

UK Railway Value for Money

By Tamela Maciel

ABOUT:

By train, how far can I get for the least amount of money?

This simple question led me to write a Python script that, using data from the UK’s National Rail website, lists the price of a single ticket on the morning of 1 October 2014 from a starting station to every other UK station. I was based in Cambridge at the time, so I first ran the script starting from Cambridge. I obtained the latitude and longitude of each station and calculated the ‘as the crow flies’ distance from Cambridge to that station. The resulting ‘Value for Money’ ratio divides the distance in miles from Cambridge by the ticket price in pounds. I repeated the process for other UK starting points, including London King’s Cross and Birmingham New Street.

Ticket price data is scraped from the National Rail website using the Python library BeautifulSoup. Prices were collected in July 2014 for a single advance ticket departing after 5am on Wednesday 1 October 2014.

Station postcodes come from the National Rail website, with amendments from Railway Station and Google.

Distance as the crow flies is calculated using the Python Geopy package.
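To illustrate what an ‘as the crow flies’ distance involves, here is a minimal sketch using the haversine formula and only the Python standard library. It is a stand-in for illustration, not the Geopy call used in rail_value.py. The function name `crow_flies_miles` and the station coordinates are assumptions chosen for the example (the coordinates are approximate).

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the Earth is treated as a sphere


def crow_flies_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates (assumed for illustration, not taken from the project data)
cambridge = (52.194, 0.137)
devonport = (50.378, -4.171)

print(round(crow_flies_miles(*cambridge, *devonport)), "miles")
```

With these approximate coordinates the result lands close to the 224 miles quoted for Devonport in the results below. A spherical model like this can differ slightly from an ellipsoidal calculation, which is one reason to prefer a dedicated package for the real analysis.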

Value for Money = distance (miles) / ticket price (£)
By this definition, a large Value for Money ratio means that £1 carries you further from the starting location, so that journey is particularly good value.
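As a worked example of the ratio, the sketch below applies the definition to the three best-value journeys listed in the results. The function name `value_for_money` is an assumption made for this example and may not match the name used in rail_value.py.

```python
def value_for_money(distance_miles, price_pounds):
    """Miles travelled (as the crow flies) per pound spent on the ticket."""
    if price_pounds <= 0:
        raise ValueError("ticket price must be positive")
    return distance_miles / price_pounds


# (origin, destination, distance in miles, price in pounds), from the results section
journeys = [
    ("Cambridge", "Devonport", 224, 25.60),
    ("London King's Cross", "Solihull", 93, 6.00),
    ("Birmingham New Street", "Sanquhar", 217, 12.50),
]

for origin, destination, miles, price in journeys:
    ratio = value_for_money(miles, price)
    print(f"{origin} -> {destination}: {ratio:.2f} miles per £")
```

The three ratios come out at 8.75, 15.50 and 17.36 miles per pound, so on these figures the Birmingham to Sanquhar ticket carries you furthest for each pound spent.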

Data is visualized using Tableau Public.

Caveat: These maps are simply intended to illustrate the regional railway values throughout the UK. While I have checked the majority of stations for accuracy in price (as retrieved in July 2014 for 1 October 2014) and location, small inaccuracies might still exist for individual stations.

HOW TO RUN:

Run rail_value.py.

python rail_value.py

The script contains a series of functions that scrape ticket price data from the National Rail website, calculate distances between stations, and compute the value for money ratio for each station.

The code first checks whether the data already exists in the directory. If it does, the website query is skipped.
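The check described above follows a common ‘load from disk, otherwise fetch and save’ pattern. The sketch below shows one way it could look. The helper `load_or_fetch`, the JSON file format and the file name are all assumptions for illustration and are not taken from rail_value.py.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached data from `path` if it exists; otherwise call `fetch` and cache the result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # data already on disk, so skip the website query
    data = fetch()
    with open(path, "w") as f:
        json.dump(data, f)
    return data


# Usage with a stand-in for the real scraper (hypothetical prices and file name)
def fake_scrape():
    print("querying website...")
    return {"KGX": 23.0}


with tempfile.TemporaryDirectory() as folder:
    cache_file = os.path.join(folder, "prices_from_CBG.json")
    first = load_or_fetch(cache_file, fake_scrape)   # runs the query and saves the file
    second = load_or_fetch(cache_file, fake_scrape)  # reads the saved file instead
```

Passing the scraper in as a function keeps the caching logic separate from the network code, so an interrupted overnight run can be restarted without repeating finished queries.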

Note that there is a built-in time delay between requests to the National Rail website, so this part of the code takes a long time to run. I normally leave it to run overnight.
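A delay between requests can be as simple as a pause inside the loop over destination stations. The sketch below is illustrative only: the function `fetch_politely`, its parameters and the 5 second default are assumptions, since the actual delay used in rail_value.py is not stated here.

```python
import time


def fetch_politely(destinations, fetch_one, delay_seconds=5.0, sleep=time.sleep):
    """Call `fetch_one` for each destination, pausing between consecutive requests."""
    results = {}
    for i, destination in enumerate(destinations):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results[destination] = fetch_one(destination)
    return results


# Usage with a stand-in for the real price lookup and no actual waiting
prices = fetch_politely(["KGX", "BHM"], lambda code: None, delay_seconds=0)
```

With a delay of a few seconds and a few thousand destination stations, a full run takes hours, which is consistent with leaving the script to run overnight.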

To re-generate station postcodes, run station_postcodes.py to create a data file “station_postcodes.txt”. Note this only needs to be done once.

RESULTS:

Starting from Cambridge (CBG) - Bursting the bubble

Best value journey: Devonport, Plymouth (224 miles for £25.60)

See Cambridge rail journey viz on Tableau Public

Notes on rail journeys from Cambridge

London King’s Cross (KGX) - Is it cheaper to start in central London?

Best value journey: Solihull, West Midlands (93 miles for £6)

See London King’s Cross rail journey viz on Tableau Public

Notes on rail journeys from London King’s Cross

Birmingham New Street (BHM) - Is it as cheap to leave as to arrive?

Best value journey: Sanquhar, Scotland (217 miles for £12.50)

See Birmingham New Street rail journey viz on Tableau Public

Notes on rail journeys from Birmingham

For more details see GitHub repository.