Reproducible research
This guide documents the workflow for the notebook
example_notebooks/impact_of_race_locations.ipynb. The goal is to make results
repeatable, auditable, and easy to refresh.
What the notebook does
- Pulls Season 7 Open Singles data.
- Cleans and filters timing fields.
- Compares event distributions for total/work time by gender.
- Fits a regression with event-level interaction effects.
- Produces plots to visualize event differences.
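As a rough illustration of the regression step, the sketch below fits an ordinary least squares model with a gender-by-event interaction using the statsmodels formula API. The data is synthetic, and the column names (event, gender, total_time) and event labels are placeholders chosen for this example; the notebook's actual columns and model specification may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data. Column names and values are illustrative only
# and are not taken from the notebook.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(
    {
        "event": rng.choice(["event_a", "event_b", "event_c"], size=n),
        "gender": rng.choice(["female", "male"], size=n),
    }
)
df["total_time"] = 90 + rng.normal(0, 10, size=n)

# C(gender) * C(event) expands to both main effects plus their interaction.
model = smf.ols("total_time ~ C(gender) * C(event)", data=df).fit()

# Interaction coefficients are the terms whose names contain a colon.
interaction_terms = [name for name in model.params.index if ":" in name]
print(interaction_terms)
```

Each interaction coefficient estimates how the gender gap at one event differs from the gap at the reference event.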
Dependencies
The notebook relies on:
- pyrox-client
- pandas, numpy
- matplotlib, seaborn
- statsmodels
If you are using uv (recommended):
uv pip install -e .
If you do not already have Jupyter installed:
uv pip install jupyter
Run the notebook
From the repo root:
uv run jupyter notebook example_notebooks/impact_of_race_locations.ipynb
Alternative:
uv run jupyter lab
Then open the notebook and run cells top to bottom.
Data determinism
The notebook pulls from the live CDN. To keep results stable across runs:
- Avoid force_refresh=True unless you intend to refresh the dataset.
- Keep cached data between runs. The default cache location is ~/.cache/pyrox.
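To confirm that cached data is actually present before a run, a small check along these lines may help. It is a minimal sketch using only the standard library: summarize_cache is a hypothetical helper, not part of pyrox, and it assumes the default cache location stated above.

```python
from pathlib import Path

# Assumed from the text above: the default cache location.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "pyrox"


def summarize_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> dict:
    """Return existence, file count, and total size in bytes for a cache directory.

    Hypothetical helper for illustration; it does not use any pyrox API.
    """
    if not cache_dir.is_dir():
        return {"exists": False, "files": 0, "bytes": 0}
    files = [p for p in cache_dir.rglob("*") if p.is_file()]
    return {
        "exists": True,
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
    }
```

Calling summarize_cache() before and after a run shows whether the notebook reused the cache or wrote new files.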
If you need a clean refresh, clear the cache before running:
python - <<'PY'
import pyrox
client = pyrox.PyroxClient()
client.clear_cache()
PY
Inputs and outputs
Inputs:
- Season 7 race data (get_season(season=7, division="open")).
- Event list in events_to_analyse.
Outputs:
- Plots in the event_dists/ directory (created automatically).
- Regression summary in the notebook output.
Repro tips
- Keep a copy of the events_to_analyse list in the notebook output so the exact event set is recorded.
- If you publish results, consider exporting the dataset to a dated Parquet file to preserve the snapshot used for modeling.
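One possible way to follow both tips is to give the export a dated file name and store a small manifest next to it. The sketch below uses only the standard library; snapshot_path and write_manifest are hypothetical helpers, and the file naming scheme is an assumption made for this example rather than a convention of the project.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def snapshot_path(out_dir: Path, season: int, division: str, day: date) -> Path:
    """Build a dated file name, e.g. season7_open_<YYYY-MM-DD>.parquet."""
    return out_dir / f"season{season}_{division}_{day.isoformat()}.parquet"


def write_manifest(data_file: Path, events: list[str]) -> Path:
    """Write a JSON manifest recording the event list and a SHA-256 of the data file."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    manifest = data_file.with_name(data_file.stem + ".manifest.json")
    payload = {"file": data_file.name, "sha256": digest, "events": sorted(events)}
    manifest.write_text(json.dumps(payload, indent=2))
    return manifest
```

In the notebook, the DataFrame could be written with pandas' DataFrame.to_parquet(path), which requires pyarrow or fastparquet to be installed, followed by write_manifest(path, events_to_analyse). The checksum lets a reader verify later that the published file is the one used for modeling.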