SBB Network Analysis - Part 3

Networks
Author: Martin Sterchi

Published: April 1, 2025

After Part 1 and Part 2, where I demonstrated how to create different versions of static networks, I now want to show how to construct a temporal network representation of the Swiss railway network. If you’ve followed along with the first two parts of this series, the code here should be easy to understand.

The temporal network representation I develop here is based on the space-of-changes approach. In this representation, a directed edge connects each station to all subsequent stations for a given “Fahrt.” Instead of aggregating edges between the same pairs of stations, we retain all edges at different points in time, storing both the start time of each edge, \(t\), and the time required to traverse it, \(\delta t\). This is just one possible way to represent temporal edges (see, for instance, the 2012 overview paper by Petter Holme and Jari Saramäki).

With this temporal network model, finding time-respecting paths between any two nodes closely mirrors what the SBB (Swiss railway) app does when searching for the fastest connections between stations.
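To make the idea of a time-respecting path concrete, here is a minimal sketch of an earliest-arrival search over temporal edges of the form (origin, destination, start, duration). It is not the SBB app's algorithm and not code from this series: the function name `earliest_arrival` and the toy edges are hypothetical. The sketch assumes strictly positive durations and ignores minimum transfer times.

```python
import math

def earliest_arrival(edges, source, t_start):
    """Earliest arrival time at every station reachable from `source`.

    edges: iterable of (origin, destination, start, duration) tuples.
    Assumes duration > 0 and ignores transfer times (simplifying assumptions).
    """
    arrival = {source: t_start}
    # Process edges in order of departure time.
    for u, v, t, dt in sorted(edges, key=lambda e: e[2]):
        # An edge is usable only if we have already reached u when it departs,
        # and it is worth taking only if it improves the arrival time at v.
        if arrival.get(u, math.inf) <= t and t + dt < arrival.get(v, math.inf):
            arrival[v] = t + dt
    return arrival

# Hypothetical toy edges (times in minutes).
toy_edges = [
    ("A", "B", 10, 20),  # arrives at B at minute 30
    ("A", "C", 10, 50),  # direct connection, arrives at C at minute 60
    ("B", "C", 35, 10),  # reachable after A -> B, arrives at C at minute 45
    ("B", "C", 25, 10),  # departs before we reach B, so it cannot be used
]

result = earliest_arrival(toy_edges, "A", 0)
print(result)  # {'A': 0, 'B': 30, 'C': 45}
```

Changing at B beats the direct connection here because the later edge from B still departs after we arrive there, which is exactly the time-respecting condition.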

But let’s start the practical part now.

import pandas as pd
from collections import Counter, defaultdict

# Check versions of libraries.
print("Pandas version:", pd.__version__)

# Make sure there is no limit on the number of columns shown.
pd.set_option('display.max_columns', None)
Pandas version: 2.1.4

Temporal edgelist

As with the space-of-changes representation, we start by loading the already processed “Ist-Daten” from Part 1 and convert all date-time columns into the right format. We only need a few of the columns, so we reduce the dataframe to just four of them.

# Load the processed IST-DATEN.
df = pd.read_csv('ist-daten.csv', low_memory=False)

# Convert BETRIEBSTAG to date format
df['BETRIEBSTAG'] = pd.to_datetime(df['BETRIEBSTAG'])

# Convert ANKUNFTSZEIT, AN_PROGNOSE, ABFAHRTSZEIT, AB_PROGNOSE to datetime format
df['ANKUNFTSZEIT'] = pd.to_datetime(df['ANKUNFTSZEIT'])
df['AN_PROGNOSE'] = pd.to_datetime(df['AN_PROGNOSE'])
df['ABFAHRTSZEIT'] = pd.to_datetime(df['ABFAHRTSZEIT'])
df['AB_PROGNOSE'] = pd.to_datetime(df['AB_PROGNOSE'])

# Reduce to relevant columns.
df = df[["FAHRT_BEZEICHNER","STATION_NAME","ANKUNFTSZEIT","ABFAHRTSZEIT"]]

# Check the dataframe.
df.head()
FAHRT_BEZEICHNER STATION_NAME ANKUNFTSZEIT ABFAHRTSZEIT
0 60402-NZ-8503000-213400 Zürich HB NaT 2025-03-05 21:34:00
1 60402-NZ-8503000-213400 Basel SBB 2025-03-05 22:28:00 2025-03-05 23:13:00
2 60402-NZ-8503000-213400 Basel Bad Bf 2025-03-05 23:19:00 2025-03-05 23:23:00
3 60403-NZ-8400058-191500 Basel Bad Bf 2025-03-06 06:10:00 2025-03-06 06:13:00
4 60403-NZ-8400058-191500 Basel SBB 2025-03-06 06:20:00 2025-03-06 06:45:00

We now use almost the same function as for the space-of-changes representation in order to extract the edges between any station and all its subsequent stations in a given “Fahrt”.

The only difference is that we extract, as the third element of an edge, the start time measured in minutes since the start of the day (2025-03-05 00:00:00). So, a train that departs at one minute past midnight will have the start time 1, as the following code demonstrates:

(pd.to_datetime("2025-03-05 00:01:00") - pd.to_datetime("2025-03-05 00:00:00")).total_seconds() / 60
1.0
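One consequence of using a fixed reference midnight is that departures after midnight of the following day get start times above 1440. The sketch below illustrates this with the standard library only, using two departure times from the sample rows shown earlier. The helper names `minutes_since_day_start` and `to_clock` are hypothetical and not part of the original code.

```python
from datetime import datetime

# Reference midnight used throughout this post.
DAY_START = datetime(2025, 3, 5)

def minutes_since_day_start(ts):
    """Minutes elapsed between DAY_START and the timestamp ts."""
    return (ts - DAY_START).total_seconds() / 60

def to_clock(minutes):
    """Convert minutes since DAY_START into (day offset, 'HH:MM')."""
    days, rest = divmod(int(minutes), 1440)
    hours, mins = divmod(rest, 60)
    return days, f"{hours:02d}:{mins:02d}"

# Zürich HB departure on the reference day.
print(minutes_since_day_start(datetime(2025, 3, 5, 21, 34)))  # 1294.0
# Basel Bad Bf departure on the following morning.
print(minutes_since_day_start(datetime(2025, 3, 6, 6, 13)))   # 1813.0
print(to_clock(1813))                                         # (1, '06:13')
```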

Here is the function that we will apply to each “Fahrt”:

# Function to compute (directed) temporal edges according to the space-of-changes principle.
def get_edges_in_groups(group):
    # Empty list for results of a group.
    results = []
    # Loop over all rows in group.
    for i in range(len(group)):
        # Nested loop over all subsequent rows.
        for j in range(i + 1, len(group)):
            # Now, append edge to results list.
            results.append((
                group.iloc[i]["STATION_NAME"], # Station of origin
                group.iloc[j]["STATION_NAME"], # Station of destination
                # Time of departure in minutes since the day began.
                (group.iloc[i]["ABFAHRTSZEIT"] - pd.to_datetime("2025-03-05 00:00:00")).total_seconds() / 60,
                # Duration in minutes.
                (group.iloc[j]['ANKUNFTSZEIT'] - group.iloc[i]['ABFAHRTSZEIT']).total_seconds() / 60
            ))
    # Return list.
    return results

We apply this function group-wise, just as we did for the space-of-changes representation:

# Now apply that function group-wise.
edges_series = df.groupby("FAHRT_BEZEICHNER", group_keys=False).apply(get_edges_in_groups)

We can check the same “Fahrt” between Yverdon-les-Bains and Ste-Croix again.

# Let's check out one FAHRT.
edges_series["85:97:9:000"]
[('Yverdon-les-Bains', 'Vuiteboeuf', 333.0, 10.0),
 ('Yverdon-les-Bains', 'Baulmes', 333.0, 14.0),
 ('Yverdon-les-Bains', 'Six-Fontaines', 333.0, 18.0),
 ('Yverdon-les-Bains', 'Ste-Croix', 333.0, 33.0),
 ('Vuiteboeuf', 'Baulmes', 343.0, 4.0),
 ('Vuiteboeuf', 'Six-Fontaines', 343.0, 8.0),
 ('Vuiteboeuf', 'Ste-Croix', 343.0, 23.0),
 ('Baulmes', 'Six-Fontaines', 347.0, 4.0),
 ('Baulmes', 'Ste-Croix', 347.0, 19.0),
 ('Six-Fontaines', 'Ste-Croix', 351.0, 15.0)]

That train starts at 333 minutes past midnight (which is 05:33). The durations are the same as before in the space-of-changes representation.
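As a cross-check, the ten edges above can be rebuilt from the stop times of this one “Fahrt” without pandas. This is only a sketch: the arrival and departure minutes below are inferred from the printed edges (which imply identical arrival and departure times at the intermediate stops), not read from the raw data. It also shows that a “Fahrt” with n stops yields n(n-1)/2 temporal edges.

```python
from itertools import combinations

# (station, arrival, departure) in minutes since midnight, inferred from the
# edges printed above. None marks a value that does not exist.
stops = [
    ("Yverdon-les-Bains", None, 333),
    ("Vuiteboeuf", 343, 343),
    ("Baulmes", 347, 347),
    ("Six-Fontaines", 351, 351),
    ("Ste-Croix", 366, None),
]

# combinations() keeps the input order, so every pair is
# (earlier stop, later stop), just like the nested loop in the function.
edges = [
    (origin[0], dest[0], origin[2], dest[1] - origin[2])
    for origin, dest in combinations(stops, 2)
]

n = len(stops)
print(len(edges), n * (n - 1) // 2)  # 10 10
print(edges[0])   # ('Yverdon-les-Bains', 'Vuiteboeuf', 333, 10)
print(edges[-1])  # ('Six-Fontaines', 'Ste-Croix', 351, 15)
```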

The final step before getting the data ready for the export is to flatten all the edges that are currently organized in the form of a Pandas series of lists.

# Flatten the result into one edgelist.
edgelist = [edge for fahrt_edges in edges_series.values for edge in fahrt_edges]

print("Number of edges:", len(edgelist))
Number of edges: 1110834

In the space-of-changes representation, the code aggregated duplicate edges at this point. Crucially, we omit that step here because we want to keep the temporal representation of the edges. Our temporal representation of the network therefore has 1’110’834 edges.
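The difference between the two representations can be illustrated with a small, made-up example. The sketch below counts edges per ordered station pair, which is one possible way to aggregate (the aggregation used in the earlier part may differ). The station names and times are hypothetical.

```python
from collections import Counter

# Hypothetical temporal edges: (origin, destination, start, duration).
temporal_edges = [
    ("A", "B", 333, 10),
    ("A", "B", 393, 10),
    ("A", "B", 453, 11),
    ("B", "A", 360, 10),
]

# Static view: one weighted edge per ordered station pair,
# the weight being the number of connections.
static_edges = Counter((origin, dest) for origin, dest, _, _ in temporal_edges)

print(len(temporal_edges))       # 4 temporal edges are kept as they are
print(len(static_edges))         # 2 edges remain after aggregation
print(static_edges[("A", "B")])  # 3
```

The temporal edgelist keeps every departure, so it is larger, but it preserves the timing information that the aggregated network discards.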

The final steps are easy: we change the station names to their BPUIC numbers, we convert both the start time and the duration of an edge to integer values, and we export the dataframe as a CSV file.

# Load the nodelist.
nodes = pd.read_csv("nodelist.csv", sep = ";")

# Create a node dict with BPUIC as values
node_dict = dict(zip(nodes.STATION_NAME, nodes.BPUIC))
# Transform the edgelist into a nested list and replace all station names with their BPUIC
edges = [[node_dict[e[0]], node_dict[e[1]], int(e[2]), int(e[3])] for e in edgelist]

# Create a dataframe
edges = pd.DataFrame(edges, columns = ['BPUIC1','BPUIC2','START','DURATION'])

# Have a look
edges.head()

# Export edge list
# edges.to_csv("edgelist_temporal.csv", sep = ';', encoding = 'utf-8', index = False)
BPUIC1 BPUIC2 START DURATION
0 8503000 8500010 1294 54
1 8503000 8500090 1294 105
2 8500010 8500090 1393 6
3 8500090 8500010 1813 7
4 8500090 8503000 1813 112

You can download the result here: Temporal Edgelist (CSV).
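As a small usage sketch, the exported edgelist can be read back and queried with the standard library. The example assumes the downloadable file uses the same semicolon-separated layout as the commented-out export call above, and it works on the five rows shown there rather than on the full file. The helper `departures` is hypothetical.

```python
import csv
import io

# The five rows shown above, in the assumed ';'-separated layout.
sample = """BPUIC1;BPUIC2;START;DURATION
8503000;8500010;1294;54
8503000;8500090;1294;105
8500010;8500090;1393;6
8500090;8500010;1813;7
8500090;8503000;1813;112
"""

reader = csv.DictReader(io.StringIO(sample), delimiter=";")
rows = [
    (int(r["BPUIC1"]), int(r["BPUIC2"]), int(r["START"]), int(r["DURATION"]))
    for r in reader
]

def departures(rows, station, t_min, t_max):
    """Temporal edges leaving `station` with t_min <= START <= t_max."""
    return [r for r in rows if r[0] == station and t_min <= r[2] <= t_max]

# Edges leaving station 8503000 between 21:00 (1260) and 22:00 (1320).
print(departures(rows, 8503000, 1260, 1320))
# [(8503000, 8500010, 1294, 54), (8503000, 8500090, 1294, 105)]
```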

References

Holme, P., & Saramäki, J. (2012). Temporal networks. Physics Reports, 519(3), 97-125. https://doi.org/10.1016/j.physrep.2012.03.001

The title image has been created by Wikimedia user JoachimKohler-HB and is licensed under Creative Commons.