Analytics

A focused set of workflows to extract signal from race data.

1. Segment z-scores

Normalize each station split relative to the field to see strengths and weaknesses.

import numpy as np
import pandas as pd

race = client.get_race(season=7, location="london", gender="male")

split_cols = [
    "skiErg_time",
    "sledPush_time",
    "sledPull_time",
    "burpeeBroadJump_time",
    "rowErg_time",
    "farmersCarry_time",
    "sandbagLunges_time",
    "wallBalls_time",
]

field = race[split_cols].astype(float)

means = field.mean()
stds = field.std(ddof=0).replace(0, np.nan)

z = (field - means) / stds
race_z = pd.concat([race[["name"]], z], axis=1)

2. Rank percentiles by total time

race = client.get_race(season=7, location="london")

race = race.sort_values("total_time")

race["percentile"] = race["total_time"].rank(pct=True)

3. Create pace buckets

bins = [0, 55, 60, 65, 70, 999]
labels = ["<55", "55-60", "60-65", "65-70", "70+"]

race["pace_bucket"] = pd.cut(race["total_time"], bins=bins, labels=labels)
summary = race.groupby("pace_bucket")["total_time"].agg(["count", "mean"])

4. Build a station share profile

Identify which stations dominate a given athlete's time.

athlete = client.get_athlete_in_race(
    season=7,
    location="london",
    athlete_name="surname, name",
)

split_cols = [
    "skiErg_time",
    "sledPush_time",
    "sledPull_time",
    "burpeeBroadJump_time",
    "rowErg_time",
    "farmersCarry_time",
    "sandbagLunges_time",
    "wallBalls_time",
]

row = athlete.iloc[0]
share = row[split_cols] / row[split_cols].sum()