QUANTITATIVE TOOL
Bailey, Borwein, López de Prado & Zhu (2017). Quantify the probability that your backtest result is an artifact of data mining, not genuine alpha. Combines selection bias and parameter overfitting into a single number.
— THE PROBLEM
Every parameter you tune and every configuration you try inflates your in-sample Sharpe. Test 50 strategies with 10 parameters each, and the winner will look spectacular even if none of them has any real edge. This calculator estimates how much of your observed performance is likely noise.
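The effect is easy to reproduce. The sketch below is a minimal illustration, not the calculator's own code: it generates 50 hypothetical strategies whose daily returns are zero-mean Gaussian noise (so none has an edge) and reports the best annualized Sharpe among them. The function names, the 1% daily volatility, the 500-day length, and the √252 annualization are all assumptions chosen for the example, and parameter tuning is ignored entirely.

```python
import math
import random
import statistics

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * math.sqrt(periods_per_year)

def best_of_n_noise(n_strategies=50, n_days=500, seed=0):
    """Best annualized Sharpe among strategies that are pure noise."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_strategies):
        # Zero-mean Gaussian daily returns: no true edge by construction.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(annualized_sharpe(returns))
    return max(sharpes)

best = best_of_n_noise()
print(f"Best annualized Sharpe among 50 zero-edge strategies: {best:.2f}")
```

Changing the seed changes the number, but the winner usually shows an annualized Sharpe above 1 despite having no skill at all.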
— CALCULATOR
The annualized Sharpe of your best backtest. This is the number you're hoping is real.
Number of trading days in the backtest (T).
Number of backtests run (N). Every time you changed settings and re-ran counts as one. Be honest: this is the whole point of the tool.
Number of free parameters (k). Things you tweaked: lookback window, entry threshold, stop-loss, position size, etc. Count each one.
Skewness of returns. 0 = symmetric. Negative = occasional large losses. Most strategies: between −1 and 0.
Kurtosis of returns. 3 = normal distribution. Above 3 = more extreme days than a normal distribution would produce. Most strategies: between 3 and 6.
— RESULTS
Probability that your strategy's edge is an artifact of data mining.
Verdict
High overfitting risk. The observed performance is likely explained by data mining. Do not allocate capital based on this backtest.
Haircut Sharpe
0.0000
Sharpe after removing selection bias and parameter inflation. What you should actually expect out-of-sample.
Selection bias
1.6176
Expected max SR from N backtests under null (zero skill).
Parameter inflation
1.5875
SR inflation from fitting k free parameters to T observations.
Total SR inflation
3.2051
Combined expected Sharpe inflation from testing multiple strategies and tuning parameters.
Minimum backtest length
6,711
Trading days needed for your SR to be significant at the 95% confidence level, given the current N and k.
— METHODOLOGY
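Testing N strategies and keeping the best one inflates the expected maximum Sharpe by ≈ √(2·ln(N)) / √T, measured in per-day (non-annualized) Sharpe units. With 100 backtests over 250 days, you'd expect a daily SR ≈ 0.19 purely by chance, from strategies with zero true edge. Annualized with √252, that is a Sharpe of about 3.0.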
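As a rough sketch of this selection-bias term, the code below computes the √(2·ln(N)) / √T approximation alongside a refined expected-maximum estimator of the form (1 − γ)·Φ⁻¹(1 − 1/N) + γ·Φ⁻¹(1 − 1/(N·e)), where γ is the Euler-Mascheroni constant. The function names are hypothetical, and which estimator the calculator itself applies is not stated on this page. The k = 0 column of the reference table is numerically consistent with the refined form.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_z_approx(n_trials):
    """Crude approximation sqrt(2 ln N) to the expected max of N standard normals."""
    return math.sqrt(2.0 * math.log(n_trials))

def expected_max_z_refined(n_trials):
    """Refined estimator of the expected max of N standard normals."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection
    inv = NormalDist().inv_cdf
    return ((1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n_trials)
            + EULER_GAMMA * inv(1.0 - 1.0 / (n_trials * math.e)))

def selection_bias(n_trials, n_obs, refined=False):
    """Expected best per-day Sharpe among N zero-skill backtests of T days."""
    if refined:
        z_max = expected_max_z_refined(n_trials)
    else:
        z_max = expected_max_z_approx(n_trials)
    # Under the null, a per-day Sharpe estimate has standard error ~ 1/sqrt(T).
    return z_max / math.sqrt(n_obs)

print(round(selection_bias(100, 250), 2))                # approximation from the text
print(round(selection_bias(100, 500, refined=True), 3))  # refined estimator, T = 500
```

The approximation is the more conservative of the two: it gives a higher spurious Sharpe for the same N and T, so it demands more of the observed result.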
Each free parameter your strategy uses (lookback window, threshold, moving-average length, etc.) fits to in-sample noise. The inflation is approximately √(k/T), again in per-day Sharpe units. A strategy with 10 parameters over 500 days inflates its daily SR by ≈ 0.14.
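The total Sharpe threshold SR₀ combines both biases. We test whether your observed Sharpe significantly exceeds this threshold using the Probabilistic Sharpe Ratio framework, which accounts for non-normal returns through their skewness and kurtosis. PBO = 1 − Φ(z_adjusted), where Φ is the standard normal CDF and z_adjusted is the test statistic for the gap between the observed Sharpe and SR₀.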
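One way to put these pieces together is sketched below. It assumes that z_adjusted is the standard Probabilistic Sharpe Ratio statistic, (SR − SR₀)·√(T − 1) / √(1 − skew·SR + (kurtosis − 1)/4·SR²), that SR₀ is the sum of the two approximations given above, and that the annualized input is converted to per-day units by dividing by √252. None of these details is confirmed by this page, and the function names are hypothetical, so the output will not necessarily match the calculator's.

```python
import math
from statistics import NormalDist

def spurious_sharpe(n_trials, n_params, n_obs):
    """SR0 in per-day units: selection bias plus parameter inflation (approximations)."""
    selection = math.sqrt(2.0 * math.log(n_trials)) / math.sqrt(n_obs)
    parameters = math.sqrt(n_params / n_obs)
    return selection + parameters

def overfit_probability(sr_annual, n_obs, n_trials, n_params,
                        skew=0.0, kurt=3.0, periods_per_year=252):
    """1 - Phi(z), where z is the PSR statistic of the observed Sharpe against SR0."""
    # Assumption: annualized Sharpe converts to per-day Sharpe via sqrt(252).
    sr_hat = sr_annual / math.sqrt(periods_per_year)
    sr0 = spurious_sharpe(n_trials, n_params, n_obs)
    # Skewness and kurtosis adjust the standard error of the Sharpe estimate.
    spread = math.sqrt(1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat ** 2)
    z = (sr_hat - sr0) * math.sqrt(n_obs - 1) / spread
    return 1.0 - NormalDist().cdf(z)

# Hypothetical inputs: annualized Sharpe 1.0, 500 days, 50 backtests, 5 parameters.
p = overfit_probability(1.0, n_obs=500, n_trials=50, n_params=5)
print(f"Estimated overfitting probability: {p:.4f}")
```

Under these assumptions, an annualized Sharpe of 1.0 found after 50 backtests on 500 days sits well below SR₀, so the estimated probability is close to 1.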
— REFERENCE TABLE
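Expected spurious Sharpe Ratio for different combinations of backtests (N) and parameters (k), with T = 500 days and normal returns. Each cell is the sum of the selection-bias term (the k = 0 column) and the parameter-inflation term (the N = 1 row). The k = 0 column is lower than √(2·ln(N)) / √T would give; it is consistent with the refined expected-maximum estimator (1 − γ)·Φ⁻¹(1 − 1/N) + γ·Φ⁻¹(1 − 1/(N·e)), divided by √T, where γ is the Euler-Mascheroni constant.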
| Backtests \ Params | k=0 | k=2 | k=5 | k=10 | k=20 |
|---|---|---|---|---|---|
| N=1 | 0.000 | 0.063 | 0.100 | 0.141 | 0.200 |
| N=5 | 0.053 | 0.117 | 0.153 | 0.195 | 0.253 |
| N=10 | 0.070 | 0.134 | 0.170 | 0.212 | 0.270 |
| N=50 | 0.102 | 0.165 | 0.202 | 0.243 | 0.302 |
| N=100 | 0.113 | 0.177 | 0.213 | 0.255 | 0.313 |
| N=500 | 0.137 | 0.200 | 0.237 | 0.278 | 0.337 |
Values = expected spurious Sharpe Ratio (SR₀) in per-day units. Your observed SR must exceed the relevant threshold before it counts as evidence of genuine edge.
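The table can be regenerated approximately with a few lines of code. This sketch assumes the selection term uses the refined expected-maximum estimator (1 − γ)·Φ⁻¹(1 − 1/N) + γ·Φ⁻¹(1 − 1/(N·e)) and that the two terms are simply added. That assumption is inferred from the table's values rather than documented, and `spurious_sharpe` is a hypothetical name.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def spurious_sharpe(n_trials, n_params, n_obs=500):
    """Expected spurious per-day Sharpe: selection bias + parameter inflation."""
    inv = NormalDist().inv_cdf
    if n_trials > 1:
        z_max = ((1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n_trials)
                 + EULER_GAMMA * inv(1.0 - 1.0 / (n_trials * math.e)))
    else:
        z_max = 0.0  # one backtest: nothing to select from
    return z_max / math.sqrt(n_obs) + math.sqrt(n_params / n_obs)

params = (0, 2, 5, 10, 20)
print("Backtests | " + " | ".join(f"k={k}" for k in params))
for n in (1, 5, 10, 50, 100, 500):
    row = " | ".join(f"{spurious_sharpe(n, k):.3f}" for k in params)
    print(f"N={n} | {row}")
```

Under these assumptions the output agrees with the table to within about 0.001 per cell; small differences in the last digit are expected from rounding and from the numerical inverse-CDF used.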