Backtesting Methodology
Introduction
Backtesting is the process of running a trading strategy against historical data to evaluate how it would have performed. Backtesting can't guarantee future results, but sound methodology helps separate strategies with a genuine edge from those that simply got lucky. This guide covers backtesting concepts, common pitfalls, and best practices.
What is Backtesting?
Definition: Simulating a trading strategy on historical data to see how it would have performed.
Purpose:
- Validate strategy logic
- Estimate potential returns
- Identify weaknesses
- Optimize parameters
- Build confidence before live trading
What Backtesting Can Do:
- ✅ Test if strategy logic works
- ✅ Identify optimal parameters
- ✅ Estimate risk metrics
- ✅ Compare different approaches
- ✅ Find edge in historical data
What Backtesting Cannot Do:
- ❌ Guarantee future performance
- ❌ Account for all market conditions
- ❌ Predict black swan events
- ❌ Replace live trading experience
- ❌ Eliminate all risk
Backtesting Process
1. Define Strategy
Clear Rules:
Entry: Price > EMA(50) AND RSI > 50
Exit: Price < EMA(50) OR Stop Loss hit
Position Size: 2% of capital
Stop Loss: 3% below entry price
Must Be Objective:
- No discretion
- No "if it looks good"
- Programmable rules
- Repeatable logic
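To illustrate what "programmable rules" means, the entry and exit conditions above could be written as pure functions. This is a minimal Python sketch, not a platform API: the `Bar` type and the function names are hypothetical, EMA(50) and RSI are assumed to arrive precomputed, and the stop is assumed to sit 3% below the entry price.

```python
from dataclasses import dataclass

STOP_LOSS_PCT = 3.0  # assumed to mean 3% below the entry price


@dataclass(frozen=True)
class Bar:
    """One completed candle with precomputed indicators (hypothetical type)."""
    close: float
    ema50: float
    rsi: float


def should_enter(bar: Bar) -> bool:
    # Entry: Price > EMA(50) AND RSI > 50
    return bar.close > bar.ema50 and bar.rsi > 50


def should_exit(bar: Bar, entry_price: float) -> bool:
    # Exit: Price < EMA(50) OR stop loss hit
    stop_price = entry_price * (1 - STOP_LOSS_PCT / 100)
    return bar.close < bar.ema50 or bar.close <= stop_price
```

Because each rule is a function of data only, the same inputs always give the same decision, which is what makes a backtest repeatable.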
2. Select Data
Data Requirements:
- Quality: Accurate, clean data
- Quantity: Sufficient history (2+ years)
- Granularity: Match trading timeframe
- Completeness: No gaps or missing data
Data Period:
Minimum: 1 year
Recommended: 2-3 years
Ideal: 5+ years (multiple market cycles)
Timeframe Selection:
Scalping: Tick or 1-minute data
Day Trading: 1-5 minute data
Swing Trading: 15-minute to hourly data
Position Trading: Daily data
3. Run Backtest
Configuration:
{
"startDate": "2021-01-01",
"endDate": "2023-12-31",
"initialBalance": 100000,
"symbols": ["NSE:RELIANCE", "NSE:TCS"],
"slippage": 0.1, // 0.1% slippage
"commission": 0.05 // 0.05% commission
}
Execution:
- Process each candle sequentially
- Apply entry/exit rules
- Track positions and P&L
- Record all trades
- Calculate metrics
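The execution steps above can be sketched as a simple loop. This is a simplified Python illustration under stated assumptions, not a production engine: `run_backtest` and `want_long` are hypothetical names, the strategy is long-only, each entry uses all available cash rather than a 2% position, and fills happen at the bar price adjusted for slippage.

```python
def run_backtest(prices, want_long, initial_balance=100_000.0,
                 slippage_pct=0.1, commission_pct=0.05):
    """Process bars in order; want_long[i] is the desired state at bar i."""
    cash = initial_balance
    shares = 0.0
    entry_cost = 0.0
    trades = []  # realized P&L of each closed trade

    for price, want in zip(prices, want_long):
        if want and shares == 0:
            # Buy: slippage worsens the fill, commission is paid on top.
            fill = price * (1 + slippage_pct / 100)
            shares = cash / (fill * (1 + commission_pct / 100))
            entry_cost = cash
            cash = 0.0
        elif not want and shares > 0:
            # Sell: slippage worsens the fill, commission reduces proceeds.
            fill = price * (1 - slippage_pct / 100)
            cash = shares * fill * (1 - commission_pct / 100)
            trades.append(cash - entry_cost)
            shares = 0.0

    # Mark any open position at the last price.
    final_equity = cash + shares * prices[-1]
    return final_equity, trades
```

Running the same inputs with both cost parameters set to zero shows how much of the result is consumed by slippage and commission.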
4. Analyze Results
Key Metrics:
- Total Return
- Win Rate
- Profit Factor
- Sharpe Ratio
- Maximum Drawdown
- Average Win/Loss
- Number of Trades
Performance Evaluation:
Good Strategy:
- Win Rate: 45-60%
- Profit Factor: >1.5
- Sharpe Ratio: >1.0
- Max Drawdown: <20%
- Consistent across periods
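Three of the key metrics can be computed directly from the recorded trades and the equity curve. The sketch below is a minimal Python illustration; the function names are hypothetical, and a real report would also handle empty inputs and break-even trades explicitly.

```python
def win_rate(pnls):
    """Fraction of trades with a positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls):
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, trades of `[100, -50, 200, -50]` give a 50% win rate and a profit factor of 3.0, and an equity curve of `[100, 120, 90, 130]` has a maximum drawdown of 25%.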
5. Validate
Out-of-Sample Testing:
In-Sample: 2021-2022 (optimize)
Out-of-Sample: 2023 (validate)
If OOS performance is similar to IS → Good
If OOS is much worse → Likely overfitted
Walk-Forward Analysis:
- Optimize on period 1
- Test on period 2
- Repeat rolling forward
- Helps confirm robustness across periods
Common Pitfalls
1. Look-Ahead Bias
Problem: Using future information not available at trade time
Examples:
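// Index convention: [0] is the current candle, [1] is the previous completed candle
// ❌ BAD: Acting during today's session on today's close
if (close[0] > sma[0]) {
    // Today's close (and an SMA that includes it) isn't known until the session ends
}
// ✅ GOOD: Using the previous completed candle
if (close[1] > sma[1]) {
    // Uses yesterday's data, which is available at today's open
}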
How to Avoid:
- Only use data available at decision time
- Be careful with indicators that "repaint"
- Use previous candle for signals
- Test with realistic execution timing
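When prices are stored as ordinary arrays (oldest first), the "use the previous candle" rule amounts to lagging the signal by one bar. The following Python sketch assumes that layout; `lagged_signals` is a hypothetical helper, not part of any platform.

```python
def lagged_signals(closes, sma):
    """Signal for bar i is computed from bar i-1 only.

    closes and sma are equal-length lists ordered oldest to newest.
    """
    signals = [False]  # the first bar has no completed predecessor
    for i in range(1, len(closes)):
        signals.append(closes[i - 1] > sma[i - 1])
    return signals
```

With this layout the decision for a given bar never reads that bar's own close, so the same code behaves identically in a backtest and at the open of a live session.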
2. Survivorship Bias
Problem: Only testing on stocks that still exist today
Example:
Testing on current NIFTY 50 constituents
Excludes companies that failed or delisted
Inflates historical returns
How to Avoid:
- Use point-in-time universe
- Include delisted stocks
- Test on index constituents as of each date
- Account for corporate actions
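A point-in-time universe can be represented as membership intervals and queried for each backtest date. The Python sketch below uses made-up symbols and dates purely for illustration; a real implementation would load historical constituent data from a vendor.

```python
from datetime import date

# Hypothetical records: (symbol, date added, date removed or None if still a member)
MEMBERSHIP = [
    ("AAA", date(2015, 1, 1), None),
    ("BBB", date(2015, 1, 1), date(2021, 6, 30)),  # later removed or delisted
    ("CCC", date(2021, 7, 1), None),
]


def universe_on(day, membership=MEMBERSHIP):
    """Symbols that were members on the given day."""
    return sorted(
        symbol
        for symbol, added, removed in membership
        if added <= day and (removed is None or day <= removed)
    )
```

Here `universe_on(date(2020, 1, 1))` includes `BBB` even though it is no longer a member today, which is exactly the case a current-constituents list would miss.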
3. Curve Fitting (Over-Optimization)
Problem: Finding parameters that work perfectly on historical data but fail in live trading
Example:
Testing RSI periods from 2 to 50
Finding RSI(17) gives best results
But RSI(16) and RSI(18) perform poorly
→ Likely curve-fitted
Signs of Curve Fitting:
- Too many parameters
- Very specific parameter values
- Performance degrades with small changes
- Perfect backtest results
- Complex rules
How to Avoid:
- Use standard parameters (14, 20, 50, 200)
- Limit optimization variables
- Test parameter robustness
- Prefer simple strategies
- Validate out-of-sample
4. Data Snooping
Problem: Testing multiple strategies until one works
Example:
Test 100 different strategies
One shows 50% annual return
But with 100 attempts, one strong result may be just random luck
How to Avoid:
- Define strategy before testing
- Limit strategy variations
- Use statistical significance tests
- Validate with Monte Carlo
- Be skeptical of amazing results
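The scale of the problem can be estimated with a short calculation. The Python sketch below assumes the strategy tests are independent, which is rarely exactly true, and uses a Bonferroni adjustment as one simple (and conservative) correction; the function names are hypothetical.

```python
def prob_any_false_positive(n_strategies, alpha=0.05):
    """Chance that at least one of n independent tests passes by luck alone."""
    return 1 - (1 - alpha) ** n_strategies


def bonferroni_threshold(n_strategies, alpha=0.05):
    """Per-test p-value threshold that keeps the overall error rate near alpha."""
    return alpha / n_strategies
```

With 100 strategies and a 0.05 threshold, the chance of at least one lucky "winner" is above 99%, so a single strategy would need a p-value below 0.0005 to remain convincing under this correction.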
5. Ignoring Transaction Costs
Problem: Not accounting for slippage, commissions, taxes
Reality:
Backtest: 30% annual return
After costs:
- Commissions: -2%
- Slippage: -3%
- Taxes: -5%
Actual: 20% annual return
How to Avoid:
- Include realistic commissions
- Add slippage (0.1-0.5%)
- Account for bid-ask spread
- Consider market impact
- Factor in taxes
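How much costs matter depends heavily on trading frequency and position size. As a rough Python sketch, yearly cost drag can be approximated from the number of round trips; the function name and the linear approximation (which ignores compounding and taxes) are assumptions.

```python
def annual_cost_drag_pct(round_trips_per_year, per_side_cost_pct, position_fraction):
    """Approximate yearly trading cost as a percentage of total capital.

    per_side_cost_pct: commission plus slippage for one buy or one sell.
    position_fraction: share of capital committed per trade (0.02 for 2%).
    """
    return round_trips_per_year * 2 * per_side_cost_pct * position_fraction
```

With 0.05% commission plus 0.1% slippage per side, 250 round trips a year cost about 1.5% of capital at a 2% position size, but about 75% of capital if every trade used the full account.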
6. Unrealistic Assumptions
Problem: Assuming perfect execution, infinite liquidity
Unrealistic:
- All orders filled at exact price
- No slippage on large orders
- Instant execution
- Trading any size
Realistic:
- Slippage on market orders
- Partial fills possible
- Execution delays
- Position size limits
Validation Techniques
1. Out-of-Sample Testing
Method:
Total Data: 2020-2023 (4 years)
Split:
In-Sample: 2020-2022 (3 years) - Optimize
Out-of-Sample: 2023 (1 year) - Validate
Never optimize on OOS data!
Evaluation:
IS Performance: 25% return, 15% drawdown
OOS Performance: 22% return, 18% drawdown
Similar results → Strategy is robust
OOS much worse → Strategy is overfitted
2. Walk-Forward Analysis
Method:
Period 1 (2020): Optimize → Test on 2021
Period 2 (2021): Optimize → Test on 2022
Period 3 (2022): Optimize → Test on 2023
Calculate Walk-Forward Efficiency (WFE)
WFE Calculation:
WFE = OOS Performance / IS Performance (both annualized, so windows of different lengths are comparable)
>0.7: Good (OOS is 70%+ of IS)
0.5-0.7: Acceptable
<0.5: Poor (significant degradation)
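The rolling schedule and the efficiency ratio can be sketched in a few lines of Python. The helper names are hypothetical, window lengths are counted in whole periods, and the optimization step itself is left out.

```python
def walk_forward_windows(periods, train_len=1, test_len=1):
    """Return (train, test) slices that roll forward through the periods."""
    windows = []
    last_start = len(periods) - train_len - test_len
    for start in range(0, last_start + 1, test_len):
        train = periods[start:start + train_len]
        test = periods[start + train_len:start + train_len + test_len]
        windows.append((train, test))
    return windows


def walk_forward_efficiency(is_return, oos_return):
    """WFE = OOS performance / IS performance (both annualized)."""
    if is_return <= 0:
        raise ValueError("WFE is only meaningful for a positive in-sample return")
    return oos_return / is_return
```

For the years 2020 to 2023 this produces the three optimize-then-test pairs listed above, and an in-sample return of 25% with an out-of-sample return of 22% gives a WFE of 0.88.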
3. Monte Carlo Simulation
Method:
- Take actual trades from backtest
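- Resample the trades: shuffle their order to study drawdown paths, or sample with replacement to get a distribution of total returns (reordering alone leaves the total return unchanged)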
- Run 1000+ simulations
- Analyze distribution of results
Purpose:
- Assess luck vs skill
- Calculate confidence intervals
- Identify robustness
- Estimate risk of ruin
Interpretation:
Original: 30% return
Monte Carlo: 25% median, 15-40% range
If original is in top 10% → Likely lucky
If original is near median → Robust strategy
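A return distribution like the one above requires resampling rather than reordering alone, since the same set of trades always sums to the same total. The Python sketch below therefore uses bootstrap resampling (sampling with replacement), which is one common variant and an assumption here; the function names are hypothetical, and per-trade returns are summed rather than compounded for simplicity.

```python
import random


def bootstrap_totals(trade_returns, n_sims=1000, seed=42):
    """Simulate total returns by resampling trades with replacement."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    n = len(trade_returns)
    totals = []
    for _ in range(n_sims):
        sample = rng.choices(trade_returns, k=n)
        totals.append(sum(sample))
    return sorted(totals)


def percentile(sorted_values, pct):
    """Nearest-rank style percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]
```

Comparing the original total with `percentile(totals, 50)` and `percentile(totals, 90)` gives a rough sense of whether the backtest result sits near the middle of the distribution or in its lucky tail.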
4. Different Market Conditions
Test Across:
- Bull markets (2020-2021)
- Bear markets (2022)
- Sideways markets (2019)
- High volatility (2020 crash)
- Low volatility (2017)
Consistent Performance:
Bull: 25% return
Bear: -5% return (better than market)
Sideways: 10% return
Strategy works in multiple conditions ✓
5. Multiple Instruments
Test On:
- Different stocks
- Different sectors
- Different market caps
- Different exchanges
Robust Strategy:
Works on:
- Large caps (RELIANCE, TCS)
- Mid caps (DIXON, POLYCAB)
- Different sectors (Tech, Finance, Energy)
Not just one lucky stock
Best Practices
1. Realistic Assumptions
Include:
{
"slippage": 0.1, // 0.1% per trade
"commission": 0.05, // 0.05% per trade
"minPrice": 10, // Avoid penny stocks
"maxPositionSize": 10, // Max 10% per position
"executionDelay": 1 // 1 candle delay
}
2. Sufficient Data
Minimum Requirements:
- 2+ years of data
- 100+ trades
- Multiple market conditions
- Complete data (no gaps)
More Data = Better:
- 5+ years ideal
- 500+ trades preferred
- Full market cycles
- Various volatility regimes
3. Simple Strategies
Prefer:
- 2-3 indicators maximum
- Standard parameters
- Clear logic
- Few rules
Avoid:
- 10+ conditions
- Highly specific parameters
- Complex combinations
- Many exceptions
4. Parameter Robustness
Test:
RSI(14): 25% return
RSI(13): 23% return
RSI(15): 24% return
Robust! Small changes don't break strategy.
vs.
RSI(14): 25% return
RSI(13): 5% return
RSI(15): 3% return
Curve-fitted! Only works with exact parameter.
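This neighbour comparison can be automated. In the Python sketch below, the function name and the 50% cutoff are assumptions chosen for illustration; any threshold should be set before looking at the results.

```python
def is_parameter_robust(results, chosen, min_fraction=0.5):
    """Check that adjacent parameter values keep most of the chosen value's return.

    results: dict mapping a parameter value to its backtest return (%).
    min_fraction: neighbours must reach at least this share of the chosen return.
    """
    base = results[chosen]
    neighbours = [results[p] for p in (chosen - 1, chosen + 1) if p in results]
    if base <= 0 or not neighbours:
        return False
    return all(r >= min_fraction * base for r in neighbours)
```

Applied to the two cases above, `{13: 23, 14: 25, 15: 24}` passes while `{13: 5, 14: 25, 15: 3}` fails.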
5. Statistical Significance
Minimum Requirements:
- 100+ trades
- 2+ years
- Positive expectancy
- Consistent across periods
T-Test:
Test whether the mean return per trade is statistically different from zero
p-value < 0.05 → Statistically significant
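A one-sample t-statistic on per-trade returns needs only the standard library. The Python sketch below approximates the p-value with a normal distribution, which is an assumption that is reasonable for the 100+ trades recommended above but too optimistic for small samples; the function names are hypothetical, and trades are treated as independent.

```python
import math
import statistics


def t_statistic(returns):
    """t = mean / (sample standard deviation / sqrt(n))."""
    n = len(returns)
    mean = statistics.fmean(returns)
    std_error = statistics.stdev(returns) / math.sqrt(n)
    return mean / std_error


def approx_two_sided_p(t_stat):
    """Two-sided p-value using a normal approximation to the t distribution."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
```

For exact small-sample p-values, a dedicated statistics package with a proper t distribution would be the safer choice.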
6. Risk-Adjusted Returns
Don't Just Look at Returns:
Strategy A: 40% return, 30% drawdown
Strategy B: 25% return, 10% drawdown
Strategy B is better on a risk-adjusted basis (return/drawdown of 2.5 vs about 1.3)
Use Sharpe Ratio:
Sharpe = (Annualized Return - Risk-Free Rate) / Annualized Std Deviation of Returns
>1.0: Good
>2.0: Excellent
>3.0: Outstanding
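In practice the ratio is usually computed from periodic returns and then annualized. The Python sketch below assumes daily returns and 252 trading days per year; both the function name and those defaults are assumptions that should be adjusted to the data actually used.

```python
import math
import statistics


def sharpe_ratio(period_returns, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (as fractions)."""
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in period_returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)
    # Scale the per-period ratio by sqrt(periods) to annualize it.
    return mean_excess / volatility * math.sqrt(periods_per_year)
```

A higher risk-free rate lowers the result for the same return series, which is why the rate used should always be reported alongside the ratio.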
7. Document Everything
Record:
- Strategy logic
- Parameters tested
- Optimization process
- Results and metrics
- Assumptions made
- Validation methods
Why:
- Reproducibility
- Learning from failures
- Avoiding repeated mistakes
- Compliance and auditing
Interpreting Results
Good Backtest Results
Characteristics:
- Consistent returns across periods
- Reasonable drawdowns (<20%)
- Sufficient number of trades (100+)
- Works on multiple instruments
- Robust to parameter changes
- Similar IS and OOS performance
- Positive risk-adjusted returns
Example:
Period: 2020-2023
Total Return: 80% (20% per year simple average, about 15.8% compounded)
Win Rate: 52%
Profit Factor: 1.8
Sharpe Ratio: 1.5
Max Drawdown: 15%
Number of Trades: 250
WFE: 0.75
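An annualized figure can be quoted as a simple average (total return divided by years) or as a compound rate, and the two differ noticeably over several years. The Python sketch below shows both; the function names are hypothetical.

```python
def simple_annual_return(total_return, years):
    """Total return divided evenly across the years."""
    return total_return / years


def compound_annual_return(total_return, years):
    """Compound annual growth rate: (1 + total) ** (1 / years) - 1."""
    return (1 + total_return) ** (1 / years) - 1
```

For an 80% total return over four years, the simple average is 20% per year while the compound rate is about 15.8%; comparisons between strategies should use the same convention throughout.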
Warning Signs
Red Flags:
- Too good to be true (>100% annual)
- Perfect or near-perfect win rate (>80%)
- Very few trades (<50)
- Works on only one stock
- Sensitive to parameters
- Large IS/OOS performance gap
- Inconsistent across periods
Example:
Period: 2020-2023
Total Return: 500% (125% per year simple average, about 57% compounded) ⚠️
Win Rate: 95% ⚠️
Profit Factor: 10.0 ⚠️
Number of Trades: 15 ⚠️
WFE: 0.2 ⚠️
Likely overfitted or data snooped!
Realistic Expectations
Annual Returns:
- Excellent: 30-50%
- Good: 20-30%
- Acceptable: 15-20%
- Poor: <15%
Win Rate:
- High: 55-65%
- Medium: 45-55%
- Low: 35-45%
Sharpe Ratio:
- Excellent: >2.0
- Good: 1.0-2.0
- Acceptable: 0.5-1.0
- Poor: <0.5
From Backtest to Live Trading
1. Paper Trading
Before Going Live:
- Run strategy in paper mode
- Monitor for 1-2 weeks minimum
- Verify execution logic
- Check for bugs
- Confirm performance matches backtest
2. Small Position Sizes
Start Conservative:
Backtest: 2% risk per trade
Live Start: 0.5% risk per trade
Gradually increase as confidence builds
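Risk per trade translates into a share quantity once the stop distance is known. The Python sketch below assumes a long trade with the stop below the entry price, and it ignores slippage and gaps through the stop; the function name is hypothetical.

```python
import math


def shares_for_risk(capital, risk_pct, entry_price, stop_price):
    """Whole shares such that a stop-out loses roughly risk_pct of capital."""
    risk_per_share = entry_price - stop_price
    if risk_per_share <= 0:
        raise ValueError("stop_price must be below entry_price for a long trade")
    risk_budget = capital * risk_pct / 100
    return math.floor(risk_budget / risk_per_share)
```

With 100,000 in capital, an entry at 100, and a stop at 97, a 0.5% risk level allows 166 shares, while the backtested 2% level would allow 666.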
3. Monitor Closely
Track:
- Actual vs expected performance
- Execution quality
- Slippage amounts
- Any unexpected behavior
4. Accept Variance
Understand:
- Live results will differ from backtest
- Short-term variance is normal
- Judge over 50+ trades minimum
- Some drawdown is expected
5. When to Stop
Stop If:
- Drawdown exceeds the backtest maximum drawdown by 50% (for example, more than 22.5% when the backtest maximum was 15%)
- Strategy logic appears broken
- Market conditions fundamentally changed
- Consistent underperformance (100+ trades)
Summary
Key Principles:
- Realistic Assumptions: Include costs, slippage, delays
- Avoid Biases: Look-ahead, survivorship, curve-fitting
- Validate Thoroughly: OOS, walk-forward, Monte Carlo
- Keep It Simple: Fewer parameters, standard values
- Test Robustness: Multiple periods, instruments, conditions
- Statistical Significance: 100+ trades, 2+ years
- Risk-Adjusted: Focus on Sharpe ratio, not just returns
- Document Process: Record everything for learning
- Be Skeptical: If it looks too good, it probably is
- Paper Trade First: Validate before risking real money
Backtesting Checklist:
- Strategy rules clearly defined
- Sufficient historical data (2+ years)
- Realistic costs included
- No look-ahead bias
- Out-of-sample validation
- Walk-forward analysis
- Monte Carlo simulation
- Multiple instruments tested
- Parameter robustness checked
- Results documented
- Paper trading completed
Related Documentation
- How to Run a Backtest - Step-by-step guide
- How to Interpret Backtest Results - Understanding metrics
- Walk-Forward Optimization - Advanced validation
- Monte Carlo Simulation - Robustness testing