Reproducible research¶

This guide documents the workflow for the notebook example_notebooks/impact_of_race_locations.ipynb. The goal is to make results repeatable, auditable, and easy to refresh.

What the notebook does¶

Pulls Season 7 Open Singles data.
Cleans and filters timing fields.
Compares per-event distributions of total time and work time, split by gender.
Fits a regression with event-level interaction effects.
Produces plots to visualize event differences.
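To make "event-level interaction effects" concrete, here is a minimal sketch of such a model using the statsmodels formula API. The guide does not show the notebook's actual formula or column names, so the columns (event, gender, total_time), the event labels, and the data below are all synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in data: every name and number here is made up for
# illustration and does not come from the Season 7 dataset.
events = ["event_a", "event_b", "event_c"]
genders = ["female", "male"]
rows = [
    {"event": e, "gender": g, "total_time": 80 + 3 * i + 6 * j + rng.normal(0, 5)}
    for i, e in enumerate(events)
    for j, g in enumerate(genders)
    for _ in range(50)
]
df = pd.DataFrame(rows)

# C(event) * C(gender) expands to both main effects plus their interaction,
# so each event gets its own gender gap relative to the baseline event.
model = smf.ols("total_time ~ C(event) * C(gender)", data=df).fit()
print(model.params)
```

With three events and two genders, the fit has six coefficients: an intercept, two event effects, one gender effect, and two event-by-gender interaction terms.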

Dependencies¶

The notebook relies on:

pyrox-client
pandas, numpy
matplotlib, seaborn
statsmodels

If you are using uv (recommended):

uv pip install -e .

If you do not already have Jupyter installed:

uv pip install jupyter
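Once the environment is installed, it can help to record the package versions that were actually resolved, so a later run can be compared against them. The sketch below uses only the standard library; the distribution names are taken from the dependency list above and are assumed to match the names the packages are published under.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in this guide (assumed to match the
# installed distribution names).
PACKAGES = ["pyrox-client", "pandas", "numpy", "matplotlib", "seaborn", "statsmodels"]


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Pasting the printed lines into the first notebook cell keeps the environment visible alongside the results.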

Run the notebook¶

From the repo root:

uv run jupyter notebook example_notebooks/impact_of_race_locations.ipynb

Alternatively, start JupyterLab:

uv run jupyter lab

Then open the notebook and run cells top to bottom.

Data determinism¶

The notebook pulls from the live CDN. To keep results stable across runs:

Avoid force_refresh=True unless you intend to refresh the dataset.
Keep cached data between runs. The default cache location is ~/.cache/pyrox.
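One way to confirm that two runs read the same cached data is to fingerprint the cache directory before each run and compare the results. This is a minimal sketch using only the standard library. It makes no assumption about how pyrox-client lays out its cache and simply hashes every file it finds. The helper name cache_manifest is hypothetical.

```python
import hashlib
from pathlib import Path


def cache_manifest(cache_dir: Path) -> dict[str, str]:
    """Map each file under cache_dir (relative path) to its SHA-256 hex digest."""
    manifest = {}
    if not cache_dir.is_dir():
        return manifest  # nothing cached yet
    for path in sorted(cache_dir.rglob("*")):
        if path.is_file():
            key = path.relative_to(cache_dir).as_posix()
            manifest[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


if __name__ == "__main__":
    # Default cache location stated in this guide.
    default_cache = Path.home() / ".cache" / "pyrox"
    for name, digest in cache_manifest(default_cache).items():
        print(digest[:12], name)
```

If the manifests from two runs are equal, both runs read byte-identical cache files.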

If you need a clean refresh, clear the cache before running:

python - <<'PY'
import pyrox
client = pyrox.PyroxClient()
client.clear_cache()
PY

Inputs and outputs¶

Inputs:

Season 7 race data (get_season(season=7, division="open")).
Event list in events_to_analyse.

Outputs:

Plots in the event_dists/ directory (created automatically).
Regression summary in the notebook output.

Repro tips¶

Keep a copy of the events_to_analyse list in the notebook output so the exact event set is recorded.
If you publish results, consider exporting the dataset to a dated parquet file to preserve the snapshot used for modeling.
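Both tips can be combined into one small helper that writes a dated parquet snapshot plus a JSON sidecar recording the event list. This is a sketch, not part of pyrox-client: the function name export_snapshot, the snapshots/ directory, and the season7_open file stem are hypothetical choices. The df argument is assumed to be a pandas DataFrame (anything with a to_parquet method works).

```python
import json
from datetime import date
from pathlib import Path


def export_snapshot(df, events, out_dir="snapshots", stem="season7_open", on=None):
    """Write df to <out_dir>/<stem>_<YYYY-MM-DD>.parquet and a JSON sidecar.

    The sidecar records the snapshot date and the exact event list used.
    Returns the path of the parquet file.
    """
    stamp = (on or date.today()).isoformat()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    parquet_path = out / f"{stem}_{stamp}.parquet"
    df.to_parquet(parquet_path, index=False)

    sidecar = parquet_path.with_suffix(".json")
    sidecar.write_text(
        json.dumps({"snapshot_date": stamp, "events": list(events)}, indent=2)
    )
    return parquet_path
```

In the notebook this would be called as export_snapshot(df, events_to_analyse) once the data has been loaded and filtered. Note that pandas needs a parquet engine such as pyarrow or fastparquet installed for to_parquet to work.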
