Testing My Bike Split Calculator Against Real Data
By Tom Norton
April 29th, 2026
A few weeks ago, to plan for some races I have this year, I built a free bike split calculator that estimates finish time from a GPX route, power targets, and a handful of physics parameters. It's based on a physical model that accounts for drag, weight, rolling resistance, elevation, and grade. It produces plausible-looking results, but I've been wondering:
Do its predictions match real rides?
So I pulled four of my own rides, fed each one into the calculator at my actual average power for the day, and compared the predicted finish time to the time I actually rode. To keep things honest I refactored the calculator into a pure function and called the same code from a Node script that parses FIT and GPX files, so the validation runs the same simulation the web page runs.
The setup
I picked four rides that varied in length and climbing density:
- A short hilly loop, 32 km
- Another short hilly loop, 35 km
- A medium-length climb-heavy ride, 56 km with one long climb
- A long mountainous ride, 101 km with a 30 km descent
Across all rides I held the parameters constant, apart from rider weight:
- FTP: 300W (used only for IF/TSS context; the prediction is driven by average power directly)
- Rider weight: 85-88kg depending on the ride
- Bike weight: 9kg
- CdA: 0.35 m² (my normal hoods position)
- Crr: 0.0035 (modern slicks)
- Drivetrain loss: 2%
- Air density: 1.16 kg/m³ (warm-ish days)
- Headwind: 0 (none reported)
For each ride I asked: at my actual average moving power, what finish time does the calculator predict? Then I compared that to the actual moving time from Strava (which excludes auto-paused time at traffic lights and food stops).
The four rides
| Ride | Distance | Climbing | Climbing density | Avg power | Moving time | Predicted | Δ |
|---|---|---|---|---|---|---|---|
| Getting the Suffering In | 35 km | +695 m | 20 m/km | 184W | 1:32:41 | 1:31:14 | −1.6% |
| Internal Center Lock Disc Brakes | 32 km | +667 m | 21 m/km | 220W | 1:17:43 | 1:15:03 | −3.4% |
| Monty P and More with the Girlies | 56 km | +1,261 m | 23 m/km | 169W | 2:49:31 | 2:36:47 | −7.5% |
| Here, There and Everywhere | 101 km | +1,472 m | 15 m/km | 175W | 4:13:18 | 3:55:33 | −7.0% |
The calculator is consistently optimistic on these rides. Never once did it predict a slower time than I actually rode. The error sits between 1.6% and 7.5%, and it is clearly larger on the two longer rides (7.0-7.5%) than on the two shorter ones (1.6-3.4%).
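The Δ column is just (predicted − actual) / actual. Here is a minimal sketch that recomputes it from the times in the table. The helper names are mine for illustration, not the calculator's.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def error_percent(actual: str, predicted: str) -> float:
    """Signed prediction error in percent; negative means predicted too fast."""
    actual_s = to_seconds(actual)
    return 100.0 * (to_seconds(predicted) - actual_s) / actual_s


# (actual moving time, predicted time), copied from the table above
rides = {
    "Getting the Suffering In": ("1:32:41", "1:31:14"),
    "Internal Center Lock Disc Brakes": ("1:17:43", "1:15:03"),
    "Monty P and More with the Girlies": ("2:49:31", "2:36:47"),
    "Here, There and Everywhere": ("4:13:18", "3:55:33"),
}

for name, (actual, predicted) in rides.items():
    print(f"{name}: {error_percent(actual, predicted):+.1f}%")
```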
What's driving the bias
The model is constant power plus physics. It doesn't see:
- Cornering on descents. A 7% descent at my weight and CdA hits ~70 km/h in the model. In real life I'm braking for hairpins, sight-line corners, and other traffic. On the 30 km descent in ride 4 I was averaging 50 km/h. The model expected 70.
- Stops inside "moving time". Strava's auto-pause kicks in at near-zero speed, but a 5 km/h crawl through a village still counts as moving and tanks your average. The model has no concept of villages.
- Fatigue. A 4-hour effort at 175W avg is hiding the fact that the last hour was probably 165W with the same perceived effort as the first hour at 185W. The calculator assumes you can hold the average forever.
- Variable pacing helping on hills. This one actually works in the calculator's favour on hilly rides. Surging on climbs at higher-than-average power saves more time than the same wattage saves on flats. The 7.5% gap on the Mont Pèlerin ride would be wider if I'd ridden it perfectly steady.
The first three pull predictions toward the optimistic side. The last pushes them slightly toward the pessimistic side. Add it up and you get the 2-8% optimistic gap I see on real rides.
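To sanity-check the ~70 km/h descent figure, here is a minimal sketch of a constant-power steady-state solve. It uses the textbook power balance (aero drag plus rolling resistance plus gravity, times speed, equals power at the wheel), which I'm assuming is close to what the calculator does. The function name and the bisection approach are illustrative, not lifted from the calculator's code.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def steady_speed(power_w, grade, mass_kg, cda=0.35, crr=0.0035,
                 rho=1.16, drivetrain_loss=0.02):
    """Steady-state speed (m/s) at constant power on a constant grade, no wind.

    Solves wheel_power = (aero + rolling + gravity) * v by bisection.
    Assumes power_w > 0, so there is exactly one positive root.
    """
    theta = math.atan(grade)
    wheel_power = power_w * (1.0 - drivetrain_loss)
    # Speed-independent forces: rolling resistance plus gravity along the slope
    constant_force = mass_kg * G * (crr * math.cos(theta) + math.sin(theta))

    def surplus(v):
        # Power the road demands at speed v, minus power delivered to the wheel
        return (0.5 * rho * cda * v * v + constant_force) * v - wheel_power

    lo, hi = 0.0, 100.0  # bracket in m/s: surplus(lo) < 0 < surplus(hi)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# 85 kg rider + 9 kg bike, 175 W, 7% descent (grade is negative downhill)
v = steady_speed(power_w=175, grade=-0.07, mass_kg=94)
print(f"{v * 3.6:.0f} km/h")
```

With the parameters from the setup section this should land in the high 60s km/h, the same ballpark as the model figure quoted above and well clear of the 50 km/h I actually averaged.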
The shape of the error
Two patterns from four rides. Small sample, but they line up:
- Error grows roughly with duration. The two rides of about 1.5 hours or less are within ~1.5-3.5%. A 4-hour ride lands closer to 7%. Real-world overhead doesn't compound, but it does accumulate. The longer you're out, the more time piles up in cornering, braking, and sitting up.
- Climbing density adds friction. The 56 km ride had the highest climbing density (23 m/km) and the worst error (−7.5%). The 101 km ride had only 15 m/km and needed almost an hour and a half more riding to reach a similar error. More climbs means more descents, and descents are where the model is weakest.
A reasonable heads-up to put on a prediction: expect 2% baseline error, plus ~1-1.5% per hour, plus more if the route is descent-heavy. The number you see on screen is the floor for what's possible if everything goes right. Real rides land slower.
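As a rough sketch of that rule of thumb, the snippet below pads a physics prediction with a baseline plus a per-hour overhead. I'm taking "per hour" to mean per predicted hour, and the extra pad for descent-heavy routes is left out because I haven't put a number on it. The function names are illustrative.

```python
def padded_time_s(physics_s: float, per_hour: float,
                  baseline: float = 0.02) -> float:
    """Pad a physics-only prediction (seconds) with real-world overhead.

    Overhead fraction = baseline + per_hour * predicted hours.
    """
    hours = physics_s / 3600.0
    return physics_s * (1.0 + baseline + per_hour * hours)


def fmt(seconds: float) -> str:
    """Format seconds as h:mm:ss."""
    s = round(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# The 101 km ride: physics prediction of 3:55:33 = 14133 s
physics_s = 3 * 3600 + 55 * 60 + 33
low = padded_time_s(physics_s, per_hour=0.010)   # ~1% per hour
high = padded_time_s(physics_s, per_hour=0.015)  # ~1.5% per hour
print(f"expect roughly {fmt(low)} to {fmt(high)}")
```

For that ride the band brackets the 4:13:18 I actually rode. It does not bracket every ride in the table, which is what you'd expect from a rule fitted by eye to four data points.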
What I did about it
1. Results are now a three-point prediction range. The headline shows a likely time with optimistic and worst-case times below it, instead of a single confident-looking number. The likely figure is the physics prediction plus a real-world overhead that grows with duration: roughly 2% baseline plus ~1.2% per predicted hour. The worst-case adds a larger pad for technical descents and bad days. The race plan, fueling calculator, and target-time check all run off the likely time so the planning numbers stay self-consistent. Section-by-section times in the breakdown table stay as the physics estimate, since those rows are there to guide pacing power.
2. Cornering is still the unfinished business. Modelling corner radius from GPX point geometry and capping local speed at sqrt(μ·g·r) is the biggest remaining lever for descent realism. It's also the hardest piece, so I'm sequencing it after the easier wins. Once it's in, the gap between optimistic and likely should narrow on technical courses.
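To make the cornering idea concrete, here is a minimal sketch of the two pieces: a corner radius from three consecutive track points (the circumscribed circle) and the sqrt(μ·g·r) speed cap. None of this is in the calculator yet. It assumes the points are already projected to planar metres, and the μ = 0.8 default is a placeholder I picked for illustration, not a measured value.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def corner_radius(p1, p2, p3):
    """Radius (m) of the circle through three (x, y) points given in metres.

    Returns math.inf when the points are collinear (a straight road).
    """
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    # Twice the triangle's area, from the 2D cross product
    cross = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    if cross < 1e-9:
        return math.inf
    return a * b * c / (2.0 * cross)  # R = abc / (4 * area)


def corner_speed_cap(radius_m, mu=0.8):
    """Maximum cornering speed (m/s) from v = sqrt(mu * g * r).

    mu is an assumed tyre-road friction coefficient (placeholder value).
    """
    return math.sqrt(mu * G * radius_m)


# Three points on a hairpin of 20 m radius
r = corner_radius((20.0, 0.0), (0.0, 20.0), (-20.0, 0.0))
print(f"radius {r:.1f} m, cap {corner_speed_cap(r) * 3.6:.0f} km/h")
```

Real GPX tracks are noisy, so raw three-point radii would likely need smoothing before being used as a speed cap. That is part of why this piece is the hard one.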
Try it yourself
If you want to validate the calculator against one of your own rides, it's here. Pull up a GPX from a ride where you had power data, enter your actual average power as the climb / flat / descent power, and see how the predicted time compares to what you actually did. If your error is meaningfully outside the 2-8% range I'm seeing, I'd like to hear about it.