After Part 1 and Part 2, where I demonstrated how to create different versions of static networks, I now want to show how to construct a temporal network representation of the Swiss railway network. If you’ve followed along with the first two parts of this series, the code here should be easy to understand.
The temporal network representation I develop here is based on the space-of-changes approach. In this representation, a directed edge connects each station to all subsequent stations for a given “Fahrt.” Instead of aggregating edges between the same pairs of stations, we retain all edges at different points in time, storing both the start time of each edge, \(t\), and the time required to traverse it, \(\delta t\). This is just one possible way to represent temporal edges (see, for instance, the 2012 overview paper by Petter Holme and Jari Saramäki).
With this temporal network model, finding time-respecting paths between any two nodes closely mirrors what the SBB (Swiss railway) app does when searching for the fastest connections between stations.
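To make the idea of a time-respecting path concrete, here is a minimal sketch of an earliest-arrival search over temporal edges of the form (origin, destination, start, duration). It is not what the SBB app actually does, and it is not part of the original notebook: the function name `earliest_arrival`, the toy stations and the toy times are all made up for illustration. It also assumes that every duration is positive and that changing trains takes no time.

```python
from math import inf


def earliest_arrival(edges, source, t_start=0):
    """Earliest arrival time (in minutes) at every node reachable from `source`.

    `edges` is an iterable of (origin, destination, start, duration) tuples.
    Assumption: all durations are strictly positive.
    """
    arrival = {source: t_start}
    # Process edges in order of departure time; with positive durations
    # a single pass over the sorted edges is enough.
    for u, v, t, dt in sorted(edges, key=lambda e: e[2]):
        # An edge is usable only if we have reached u by the time it departs,
        # and it is useful only if it improves the arrival time at v.
        if arrival.get(u, inf) <= t and t + dt < arrival.get(v, inf):
            arrival[v] = t + dt
    return arrival


# Hypothetical toy edges (not taken from the real data).
toy_edges = [
    ("A", "B", 10, 20),  # departs A at minute 10, arrives at B at minute 30
    ("B", "C", 25, 10),  # departs B at minute 25: too early for a traveller from A
    ("B", "C", 40, 10),  # departs B at minute 40, arrives at C at minute 50
    ("A", "C", 15, 50),  # direct but slow: arrives at C at minute 65
]

result = earliest_arrival(toy_edges, "A", t_start=0)
print(result)  # {'A': 0, 'B': 30, 'C': 50}
```

Because each edge in this representation is a direct ride within one “Fahrt”, a path made of k edges corresponds to k - 1 changes.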
But let’s start the practical part now.
import pandas as pd
from collections import Counter, defaultdict

# Check versions of libraries.
print("Pandas version:", pd.__version__)

# Make sure there is no limit on the number of columns shown.
pd.set_option('display.max_columns', None)
Pandas version: 2.1.4
Temporal edgelist
As for the space-of-changes representation, we start by loading the already processed “Ist-Daten” from Part 1 and transform all date-time elements into the right format. Also, we only need a few of the columns, so we reduce the dataframe drastically to only 4 columns.
# Load the processed IST-DATEN.
df = pd.read_csv('ist-daten.csv', low_memory=False)

# Convert BETRIEBSTAG to date format.
df['BETRIEBSTAG'] = pd.to_datetime(df['BETRIEBSTAG'])

# Convert ANKUNFTSZEIT, AN_PROGNOSE, ABFAHRTSZEIT, AB_PROGNOSE to datetime format.
df['ANKUNFTSZEIT'] = pd.to_datetime(df['ANKUNFTSZEIT'])
df['AN_PROGNOSE'] = pd.to_datetime(df['AN_PROGNOSE'])
df['ABFAHRTSZEIT'] = pd.to_datetime(df['ABFAHRTSZEIT'])
df['AB_PROGNOSE'] = pd.to_datetime(df['AB_PROGNOSE'])

# Reduce to relevant columns.
df = df[["FAHRT_BEZEICHNER", "STATION_NAME", "ANKUNFTSZEIT", "ABFAHRTSZEIT"]]

# Check the dataframe.
df.head()
   FAHRT_BEZEICHNER         STATION_NAME  ANKUNFTSZEIT         ABFAHRTSZEIT
0  60402-NZ-8503000-213400  Zürich HB     NaT                  2025-03-05 21:34:00
1  60402-NZ-8503000-213400  Basel SBB     2025-03-05 22:28:00  2025-03-05 23:13:00
2  60402-NZ-8503000-213400  Basel Bad Bf  2025-03-05 23:19:00  2025-03-05 23:23:00
3  60403-NZ-8400058-191500  Basel Bad Bf  2025-03-06 06:10:00  2025-03-06 06:13:00
4  60403-NZ-8400058-191500  Basel SBB     2025-03-06 06:20:00  2025-03-06 06:45:00
We now use almost the same function as for the space-of-changes representation in order to extract the edges between any station and all its subsequent stations in a given “Fahrt”.
The only difference is that the third element of each edge is now the start time, measured in minutes since the start of the day (2025-03-05 00:00:00). A train that departs at one minute past midnight therefore has the start time 1.
Here is the function that we will use to iterate over the “Fahrten”:
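A minimal sketch of that calculation, using the same Pandas expressions as the function below. The variable names are my own; the second departure time is taken from the dataframe preview above and shows that a departure on the following day simply yields a value above 1440.

```python
import pandas as pd

# Reference point: the start of the operating day.
day_start = pd.to_datetime("2025-03-05 00:00:00")

# A departure one minute past midnight.
departure = pd.to_datetime("2025-03-05 00:01:00")
start_minutes = (departure - day_start).total_seconds() / 60
print(start_minutes)  # 1.0

# A departure on the next calendar day (06:13 on 2025-03-06).
next_day_departure = pd.to_datetime("2025-03-06 06:13:00")
next_day_minutes = (next_day_departure - day_start).total_seconds() / 60
print(next_day_minutes)  # 1813.0
```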
# Function to compute (directed) edges according to the space-of-changes principle.
def get_edges_in_groups(group):
    # Empty list for results of a group.
    results = []
    # Loop over all rows in group.
    for i in range(len(group)):
        # Nested loop over all subsequent rows.
        for j in range(i + 1, len(group)):
            # Now, append edge to results list.
            results.append((
                group.iloc[i]["STATION_NAME"],  # Station of origin
                group.iloc[j]["STATION_NAME"],  # Station of destination
                # Time of departure in minutes since the day began.
                (group.iloc[i]["ABFAHRTSZEIT"] - pd.to_datetime("2025-03-05 00:00:00")).total_seconds() / 60,
                # Duration in minutes.
                (group.iloc[j]["ANKUNFTSZEIT"] - group.iloc[i]["ABFAHRTSZEIT"]).total_seconds() / 60
            ))
    # Return list.
    return results
This function is applied as before for the space-of-changes representation:
# Now apply that function group-wise.
edges_series = df.groupby("FAHRT_BEZEICHNER", group_keys=False).apply(get_edges_in_groups)
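To see what such a list of temporal edges looks like, here is a small self-contained sketch for the first “Fahrt” in the dataframe preview above (Zürich HB, Basel SBB, Basel Bad Bf). It does not call `get_edges_in_groups` itself but uses a compact variant of my own, `temporal_edges`, built on `itertools.combinations`; it is meant to produce the same tuples in the same order, assuming the rows are already sorted in travel order.

```python
from itertools import combinations

import pandas as pd

DAY_START = pd.to_datetime("2025-03-05 00:00:00")

# The three stops of the first Fahrt shown in the preview above.
toy = pd.DataFrame({
    "STATION_NAME": ["Zürich HB", "Basel SBB", "Basel Bad Bf"],
    "ANKUNFTSZEIT": pd.to_datetime(
        [None, "2025-03-05 22:28:00", "2025-03-05 23:19:00"]),
    "ABFAHRTSZEIT": pd.to_datetime(
        ["2025-03-05 21:34:00", "2025-03-05 23:13:00", "2025-03-05 23:23:00"]),
})


def temporal_edges(group):
    """Return (origin, destination, start, duration) for every ordered stop pair."""
    rows = list(group.itertuples(index=False))
    return [
        (
            a.STATION_NAME,
            b.STATION_NAME,
            # Departure at the origin, in minutes since the start of the day.
            (a.ABFAHRTSZEIT - DAY_START).total_seconds() / 60,
            # Travel time from departure at the origin to arrival at the destination.
            (b.ANKUNFTSZEIT - a.ABFAHRTSZEIT).total_seconds() / 60,
        )
        for a, b in combinations(rows, 2)
    ]


edges_toy = temporal_edges(toy)
for edge in edges_toy:
    print(edge)
# ('Zürich HB', 'Basel SBB', 1294.0, 54.0)
# ('Zürich HB', 'Basel Bad Bf', 1294.0, 105.0)
# ('Basel SBB', 'Basel Bad Bf', 1393.0, 6.0)
```

Note that the missing arrival time (NaT) at the first stop never enters the calculation, because arrival times are only read for destination stations.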
We can check the same “Fahrt” between Yverdon-les-Bains and Ste-Croix again.
# Let's check out one FAHRT.
edges_series["85:97:9:000"]
That train starts at 333 minutes past midnight (which is 05:33). The durations are the same as before in the space-of-changes representation.
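The conversion from minutes since the start of the day back to a clock time is simple integer arithmetic. The helper below, `minutes_to_clock`, is a hypothetical convenience function of my own and not part of the original code; it also returns a day offset, since start times above 1440 belong to the following day.

```python
def minutes_to_clock(minutes):
    """Convert minutes since the start of the day to (day offset, 'HH:MM')."""
    days, rest = divmod(int(minutes), 24 * 60)
    hours, mins = divmod(rest, 60)
    return days, f"{hours:02d}:{mins:02d}"


print(minutes_to_clock(333))   # (0, '05:33')
print(minutes_to_clock(1813))  # (1, '06:13')
```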
The final step before getting the data ready for the export is to flatten all the edges that are currently organized in the form of a Pandas series of lists.
# Flatten the result into one edgelist.
edgelist = [x for l in edges_series.values for x in l]
print("Number of edges:", len(edgelist))
Number of edges: 1110834
At this point, the code for the space-of-changes representation aggregated duplicate edges. Crucially, we omit that step here because we want to keep the temporal representation of the edges. Thus, our temporal representation of the network will have 1’110’834 edges.
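The difference between the two representations can be illustrated with a small sketch. The toy edge list below is invented; it only shows that several temporal edges can connect the same ordered pair of stations, and that aggregating them (here with `Counter`) would collapse them into one static edge per pair.

```python
from collections import Counter

# Hypothetical temporal edges: (origin, destination, start, duration).
toy_edgelist = [
    ("A", "B", 333, 7),
    ("A", "B", 393, 7),
    ("A", "B", 453, 7),
    ("B", "A", 350, 8),
]

# Count how many temporal edges exist per ordered station pair.
pair_counts = Counter((u, v) for u, v, _, _ in toy_edgelist)

print(len(toy_edgelist))           # 4 temporal edges
print(len(pair_counts))            # 2 static edges after aggregation
print(pair_counts.most_common(1))  # [(('A', 'B'), 3)]
```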
The final steps are easy: we change the station names to their BPUIC numbers, we convert both the start time and the duration of an edge to integer values, and we export the dataframe as a CSV file.
# Load the nodelist.
nodes = pd.read_csv("nodelist.csv", sep=";")

# Create a node dict with BPUIC as values.
node_dict = dict(zip(nodes.STATION_NAME, nodes.BPUIC))
# Transform the edge list into a nested list and replace all station names with their BPUIC.
edges = [[node_dict[e[0]], node_dict[e[1]], int(e[2]), int(e[3])] for e in edgelist]

# Create a dataframe.
edges = pd.DataFrame(edges, columns=['BPUIC1', 'BPUIC2', 'START', 'DURATION'])

# Have a look.
edges.head()

# Export edge list.
# edges.to_csv("edgelist_temporal.csv", sep=';', encoding='utf-8', index=False)