SNCF Open Data: train station attendance

Hugo Le Moine

Passengers per French train station in 2018

Data

Libraries

The following libraries are imported:

  • pandas and numpy for data processing
  • plotly.colors to use a specific color scale
  • plotly.graph_objects for data visualization

import pandas as pd
import numpy as np
import plotly.colors
import plotly.graph_objects as go

Processing

1. Reading csv files

df_frequentation = pd.read_csv('data/frequentation-gares.csv', sep=';')
df_gares = pd.read_csv('data/referentiel-gares-voyageurs.csv', sep=';')
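
Both files use a semicolon as the field separator, which is why sep=';' is passed. As a minimal sketch of what happens otherwise, the snippet below parses two rows taken from the df_frequentation sample in this post (reduced to three columns, and held in an in-memory string rather than a file) with and without the argument. The variable names are made up for the illustration.

```python
import io

import pandas as pd

# Two rows from the df_frequentation sample, reduced to three columns
raw = (
    "Nom de la gare;Code UIC complet;Total Voyageurs 2018\n"
    "Abancourt;87313759;40228\n"
    "Agay;87757559;15093\n"
)

# Default separator (comma): each line ends up in a single column
df_default = pd.read_csv(io.StringIO(raw))

# Semicolon separator: the three columns are recovered
df_semicolon = pd.read_csv(io.StringIO(raw), sep=';')

print(df_default.shape)
print(df_semicolon.shape)
```

Checking the shape right after loading is a cheap way to notice a wrong separator before it causes confusing errors later on.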

Sample data from df_frequentation

| | Nom de la gare | Code UIC complet | Code postal | Segmentation DRG 2018 | Total Voyageurs 2018 | Total Voyageurs + Non voyageurs 2018 | Total Voyageurs 2017 | Total Voyageurs + Non voyageurs 2017 | Total Voyageurs 2016 | Total Voyageurs + Non voyageurs 2016 | Total Voyageurs 2015 | Total Voyageurs + Non voyageurs 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abancourt | 87313759 | 60220 | c | 40228 | 40228 | 43760 | 43760 | 41096 | 41096.551614 | 39720 | 39720 |
| 1 | Agay | 87757559 | 83530 | c | 15093 | 15093 | 14154 | 14154 | 19240 | 19240.514370 | 19121 | 19121 |
| 2 | Agde | 87781278 | 34300 | a | 588297 | 735372 | 697091 | 871364 | 660656 | 825820.929253 | 662516 | 828146 |
| 3 | Agonac | 87595157 | 24460 | c | 1492 | 1492 | 1583 | 1583 | 1134 | 1134.699996 | 1127 | 1127 |
| 4 | Aigrefeuille Le Thou | 87485193 | 17290 | c | 18670 | 18670 | 14513 | 14513 | 266 | 266.157144 | 0 | 0 |

Sample data from df_gares

| | Code plate-forme | Intitulé gare | Intitulé fronton de gare | Gare DRG | Gare étrangère | Agence gare | Région SNCF | Unité gare | UT | Nbre plateformes | ... | Longitude WGS84 | Latitude WGS84 | Code UIC | TVS | Segment DRG | Niveau de service | SOP | RG | Date fin validité plateforme | WGS 84 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00007-1 | Bourg-Madame | Bourg-Madame | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | BOURG MADAME GARE | 1 | ... | 1.948670 | 42.432407 | 87784876 | BMD | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.4324069,1.9486704 |
| 1 | 00014-1 | Bolquère - Eyne | Bolquère - Eyne | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | BOLQUERE EYNE GARE | 1 | ... | 2.087559 | 42.497873 | 87784801 | BQE | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.4978734,2.0875591 |
| 2 | 00015-1 | Mont-Louis - La Cabanasse | Mont-Louis - La Cabanasse | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | MONT LOUIS LA CABANASSE GARE | 1 | ... | 2.113138 | 42.502090 | 87784793 | MTC | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.5020902,2.1131379 |
| 3 | 00020-1 | Thuès les Bains | Thuès les Bains | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | THUES LES BAINS GARE | 1 | ... | 2.249094 | 42.528801 | 87784744 | THB | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.5288009,2.249094 |

2. Merging dataframes

The UIC code is a unique ID for train stations. The column holding it is named differently in the two files, so the left_on and right_on arguments must be specified.

df = df_gares.merge(
    right=df_frequentation,
    left_on='Code UIC',
    right_on='Code UIC complet',
    how='inner')
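
An inner merge silently drops stations whose code appears in only one file. As a sketch of how this could be inspected, the toy frames below reuse UIC codes from the samples above; the frame names and the pairing of codes across the two frames are made up for illustration. The indicator and validate arguments of merge report where each row came from and check that the key is unique on both sides. Whether the real files satisfy one_to_one is an assumption that is not verified here; pandas raises a MergeError when it does not hold.

```python
import pandas as pd

# Toy frames: UIC codes come from the samples, the pairing is illustrative
df_gares_demo = pd.DataFrame({
    'Code UIC': [87784876, 87784801],
    'Intitulé gare': ['Bourg-Madame', 'Bolquère - Eyne'],
})
df_freq_demo = pd.DataFrame({
    'Code UIC complet': [87784876, 87313759],
    'Nom de la gare': ['Bourg-Madame', 'Abancourt'],
})

merged = df_gares_demo.merge(
    right=df_freq_demo,
    left_on='Code UIC',
    right_on='Code UIC complet',
    how='outer',            # keep unmatched rows so they can be inspected
    validate='one_to_one',  # fail loudly if a key is duplicated on either side
    indicator=True,         # adds a '_merge' column: both / left_only / right_only
)

# Number of matched and unmatched rows
print(merged['_merge'].value_counts())
```

Rows flagged left_only or right_only are the ones an inner merge would discard.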

3. Filtering

To leave out the smallest stations, I kept only those with more than 1,000 passengers in 2018. For visualization purposes, I also added a column holding the square root of the number of passengers per station.

df = df[df['Total Voyageurs 2018'] > 1000].copy()  # copy so new columns can be added safely
df['Total Voyageurs 2018 sqrt'] = np.sqrt(df['Total Voyageurs 2018'])

4. Adding a category column

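pandas.cut splits the data into categories according to the total number of passengers, so that each category can be plotted in its own color. Note that the first bin starts at 10,000: stations with 10,000 passengers or fewer pass the filter above but fall outside every bin, get NaN as their category, and therefore do not appear in any of the traces plotted below.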

df['category'] = pd.cut(df['Total Voyageurs 2018'], bins=[1e4, 1e5, 1e6, 1e7, np.inf])
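
To make the binning concrete, here is a small sketch that applies the same bins to the five 2018 passenger counts shown in the df_frequentation sample. The variable names are chosen for the illustration only.

```python
import numpy as np
import pandas as pd

# 2018 passenger counts from the df_frequentation sample
passengers = pd.Series(
    [40228, 15093, 588297, 1492, 18670],
    index=['Abancourt', 'Agay', 'Agde', 'Agonac', 'Aigrefeuille Le Thou'],
)

# Same bin edges as above; intervals are closed on the right by default
categories = pd.cut(passengers, bins=[1e4, 1e5, 1e6, 1e7, np.inf])
print(categories)

# Values outside every bin (here Agonac, 1492 passengers) become NaN
print(categories.isna())

# Labels built from the left edge of each interval
labels = [f'> {c.left:1.0e} passengers' for c in categories.cat.categories]
print(labels)
```

Four intervals are created, one per pair of consecutive bin edges, and the left edge of each interval is what ends up in the legend entry of the corresponding trace.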

Visualization

Plotly is a handy tool for creating interactive graphs and plots that can be embedded in other websites.

1. Scatter Mapbox

Data contain latitude and longitude: these will be used to plot train stations on the map. The size of the bubbles will depend on the square root of the number of passengers in 2018. A different trace is added for each of the categories defined above. Finally, information shown on mouse-hovering is defined using hovertemplate.

fig = go.Figure()
colors = plotly.colors.sequential.Viridis

for i, cat in enumerate(df.category.cat.categories):
    df_sub = df[df.category == cat]
    fig.add_trace(go.Scattermapbox(
        lat=df_sub['Latitude WGS84'],
        lon=df_sub['Longitude WGS84'],
        text=df_sub['Intitulé gare'],
        marker=dict(
            color=colors[2*i+1],
            size=np.sqrt(df_sub['Total Voyageurs 2018 sqrt']),
            sizemin=1,
            sizeref=15,
            sizemode='area',
            opacity=.8,
        ),
        meta=df_sub['Total Voyageurs 2018'],
        hovertemplate="%{text}" + "<br>" + "Passengers: %{meta}",
        name=f'> {cat.left:1.0e} passengers',
    ))
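
The sizeref value above is hard-coded. For sizemode='area', Plotly's bubble-chart documentation suggests the rule of thumb sizeref = 2 * max(size) / (desired maximum marker size)². The sketch below applies it to the five sample passenger counts. The helper name suggested_sizeref and the 40-pixel target are hypothetical choices, and it is assumed here that Scattermapbox scales markers the same way as ordinary scatter traces.

```python
import numpy as np


def suggested_sizeref(sizes, max_marker_px):
    """Rule of thumb from Plotly's bubble-chart docs for sizemode='area'."""
    return 2.0 * np.max(sizes) / (max_marker_px ** 2)


# Square roots of the 2018 passenger counts from the df_frequentation sample
sizes = np.sqrt(np.array([40228, 15093, 588297, 1492, 18670]))

# Hypothetical target: the largest bubble should be about 40 px wide
sizeref = suggested_sizeref(sizes, max_marker_px=40)
print(sizeref)
```

Deriving sizeref from the data keeps the largest bubble at a predictable size if the dataset or the size transformation changes.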

2. Layout

The last step is adding the background map, the title, margins around the plot, and the initial position & zoom.

fig.update_layout(
    mapbox_style="open-street-map",
    title='Passengers per French train station in 2018',
    margin={'l': 0, 'r': 0, 't': 50, 'b': 0},
    mapbox=dict(
        center={'lon': 2.39, 'lat': 47.09},
        zoom=4
    ),
)

Link to the Jupyter notebook.