Harnessing Statistical Arbitrage: A Rigorous Approach to Pairs Trading in the Indian Equity Market
Shant Tondon, a finance professional with a background encompassing financial markets analysis, consulting, and entrepreneurship, has developed and evaluated a sophisticated market-neutral pairs trading strategy. This initiative, rooted in his academic pursuits and professional experience, delves into the intricacies of exploiting temporary price divergences in the Indian equity market, specifically focusing on 25 large-cap stocks across key sectors. The project underscores a commitment to building a robust and transparent algorithmic trading prototype, meticulously addressing common pitfalls that can undermine the efficacy of such strategies.
The core of Tondon’s strategy lies in identifying and trading statistically cointegrated pairs of stocks. The approach aims for market-neutral exposure, meaning the portfolio’s performance is intended to be largely independent of broad market movements. By betting that the spread between two cointegrated stocks will revert to its long-run equilibrium, the strategy seeks to profit from short-term mispricings. The project’s data spans January 1, 2015, to June 30, 2025, a significant period of Indian market evolution.
Methodology: A Deep Dive into Pair Selection and Signal Generation
Tondon’s methodology is characterized by its emphasis on statistical rigor and the avoidance of common data-mining biases. The process begins with the selection of potential pairs from a universe of 25 NSE large-cap stocks, drawn from diverse sectors including Banking, Information Technology (IT), Pharmaceuticals, Cement, and Automobiles. Candidate pairs are screened with a residual stationarity test: the Augmented Dickey-Fuller (ADF) test is applied to the residuals of a regression of one stock’s price on the other’s, using the ADF(0) statistic (the test run with no lagged difference terms) and its associated MacKinnon p-value. The test separates pairs whose prices are merely correlated from pairs that are cointegrated, meaning a linear combination of the two prices is stationary and the pair shares a long-term equilibrium relationship.
To control for the increased likelihood of false positives when many statistical tests are run at once, Tondon applied the Benjamini-Hochberg False Discovery Rate (FDR) control procedure. Set at a 5% FDR, the procedure is designed to keep the expected share of false discoveries among the pairs declared cointegrated at or below 5%, filtering out spurious cointegration signals that might arise purely by chance.
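The Benjamini-Hochberg step-up rule can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten pair tests (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p))
```

In this example only the two smallest p-values survive, although five of the ten are below an uncorrected 0.05 cutoff.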
The trading strategy itself is built upon a rolling walk-forward validation framework. This approach is designed to simulate real-world trading conditions more accurately than a single in-sample backtest. A training window of 252 trading days (approximately one year) is used to estimate the statistical parameters, such as the hedge ratio, which defines the relative proportions of the two stocks in a pair. The parameters estimated on this window are then applied to generate trading signals over the following 21 trading days. The window then rolls forward by 21 days and the cycle of training and testing repeats, which helps to assess the strategy’s adaptability to changing market conditions.
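The rolling scheme can be illustrated with a small index generator. This is a sketch under the stated window sizes (252-day training, 21-day test and step); the function name `walk_forward_windows` and the half-open index convention are assumptions made for the example.

```python
def walk_forward_windows(n_obs, train=252, step=21):
    """Yield (train_start, train_end, test_end) index triples.

    Training rows are [train_start, train_end) and test rows are
    [train_end, test_end); both windows advance by `step` each iteration.
    """
    start = 0
    while start + train < n_obs:
        train_end = start + train
        test_end = min(train_end + step, n_obs)  # last test block may be shorter
        yield start, train_end, test_end
        start += step


# With 300 observations there is room for three train/test cycles.
windows = list(walk_forward_windows(300))
for train_start, train_end, test_end in windows:
    print(f"train [{train_start}, {train_end})  test [{train_end}, {test_end})")
```

Because the step equals the test length, the test blocks tile the out-of-sample period without overlapping.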
Hedge Ratio Estimation and Cointegration Testing
During the training phase, the hedge ratio (often denoted as beta, $\beta$) is estimated using Ordinary Least Squares (OLS) regression. The model typically regresses the price of one stock (Stock A) against the price of the other (Stock B). The resulting coefficient from this regression provides the estimated hedge ratio. The cointegration relationship is then confirmed by examining the stationarity of the residuals from this regression. The ADF test is applied to these residuals, and the p-value derived from the MacKinnon critical values is used to determine if the residuals are stationary, indicating a cointegrated relationship.
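The two-step procedure can be sketched on synthetic data as follows. This is an illustration rather than the project's code: it assumes an intercept in the first-stage regression, the helper names `ols_fit` and `adf0_statistic` are hypothetical, and it computes only the ADF(0) test statistic. Turning the statistic into a MacKinnon p-value needs tabulated response surfaces, which in practice would come from a library routine such as `statsmodels.tsa.stattools.coint`.

```python
import math
import random


def ols_fit(y, x):
    """Least-squares slope and intercept of y on x (intercept is an assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


def adf0_statistic(resid):
    """t-statistic of rho in  delta_e[t] = rho * e[t-1] + u[t]  (no lagged differences)."""
    lagged = resid[:-1]
    diffs = [cur - prev for prev, cur in zip(resid[:-1], resid[1:])]
    sxx = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, diffs)) / sxx
    rss = sum((d - rho * e) ** 2 for e, d in zip(lagged, diffs))
    std_err = math.sqrt(rss / (len(diffs) - 1) / sxx)
    return rho / std_err


# Synthetic data: B is a random walk, A tracks 2 * B + 5 plus stationary noise.
random.seed(0)
price_b = [100.0]
for _ in range(251):
    price_b.append(price_b[-1] + random.gauss(0, 1))
price_a = [2.0 * b + 5.0 + random.gauss(0, 1) for b in price_b]

beta, alpha = ols_fit(price_a, price_b)
residuals = [a - (alpha + beta * b) for a, b in zip(price_a, price_b)]
stat = adf0_statistic(residuals)
print(f"hedge ratio = {beta:.3f}, ADF(0) statistic = {stat:.2f}")
```

A strongly negative statistic points to stationary residuals. For a pair of unrelated random walks the statistic would typically sit much closer to zero.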
The application of the Benjamini-Hochberg FDR at 5% is a critical step in mitigating the risk of selecting pairs that appear cointegrated due to random chance. A universe of 25 stocks yields 25 × 24 / 2 = 300 distinct unordered pairs, so at an uncorrected 5% significance level roughly 15 pairs would be expected to pass the test by chance even if no pair were truly cointegrated. Without appropriate multiple-testing correction, many such pairs might show statistically significant cointegration that does not persist in out-of-sample trading.
Emergent Pairs and Signal Generation Logic
Following this rigorous selection process, Tondon’s framework identified three highly cointegrated pairs that formed the basis of the trading strategy:
- HDFCBANK.NS vs. KOTAKBANK.NS: Two of India’s largest private sector banks, often exhibiting strong correlation in their performance due to similar business models and market dynamics.
- HEROMOTOCO.NS vs. ULTRACEMCO.NS: A combination of an automotive manufacturer and a cement producer. While seemingly disparate, such pairings can emerge from complex sector-wide economic drivers or specific corporate events that influence their relative valuations.
- HCLTECH.NS vs. ICICIBANK.NS: An IT services company and another major Indian bank. This pairing highlights how inter-sectoral relationships can also lead to cointegration, perhaps driven by broader economic growth trends or capital flows.
The signal generation logic is designed to prevent look-ahead bias, a common pitfall where future information is inadvertently used in historical backtests. To achieve this, the rolling mean and standard deviation of the spread (the price difference between the two stocks, adjusted by the hedge ratio) are strictly shifted by one day. This ensures that the statistics used to standardize the current day’s spread are computed only from data available up to the previous trading day.
The core of the trading signal is the z-score of the spread. The z-score measures how many standard deviations the current spread is away from its historical mean. When the z-score crosses predefined entry thresholds (e.g., -1.5 for a long position in the spread, meaning Stock A is undervalued relative to Stock B, and +1.5 for a short position), a trade is initiated. The exit signal is typically triggered when the z-score reverts to zero, indicating the spread has returned to its mean.

Python Implementation Snippet
A conceptual Python snippet illustrates the mathematical underpinnings of the strategy:
import pandas as pd
import statsmodels.api as sm


def calculate_signals(train_data, test_data, stock_a, stock_b):
    # 1. Estimate the hedge ratio (beta) using OLS on the training window.
    # train_data is assumed to be a DataFrame with price columns stock_a and stock_b.
    # stock_a is regressed on stock_b; no constant is added, so the fit passes through the origin.
    model = sm.OLS(train_data[stock_a], train_data[stock_b]).fit()
    beta = model.params.iloc[0]  # positional access: params is indexed by column name

    # 2. Calculate the out-of-sample spread: S_t = A_t - beta * B_t
    # test_data is assumed to be a DataFrame with the same two columns.
    spread = test_data[stock_a] - (beta * test_data[stock_b])

    # 3. Calculate the z-score while strictly avoiding look-ahead bias.
    # The rolling mean and std dev are computed on the spread and then shifted by 1 day,
    # so the spread on day t is standardized with statistics that use data only up to day t-1.
    # The 30-day window is an example. test_data needs at least 31 rows before the first
    # non-NaN z-score appears, so a 21-day test block needs trailing history prepended.
    rolling_mean = spread.rolling(window=30).mean().shift(1)
    rolling_std = spread.rolling(window=30).std().shift(1)
    z_score = (spread - rolling_mean) / rolling_std

    # 4. Generate trading signals based on z-score thresholds.
    # Entry thresholds: absolute z-score > 1.5.
    # Exit threshold: the z-score crosses 0 (mean reversion), detected when the previous
    # day's and the current day's z-scores have opposite signs or either one is zero.
    long_entry = z_score < -1.5
    short_entry = z_score > 1.5
    exit_signal = (z_score.shift(1) * z_score) <= 0
    return z_score, long_entry, short_entry, exit_signal
This code snippet highlights the crucial steps: estimating the hedge ratio, calculating the spread, computing the z-score while carefully avoiding look-ahead bias, and defining entry and exit signals based on predefined z-score thresholds. The use of a rolling window for mean and standard deviation calculation, coupled with the shift(1) operation, is key to maintaining the integrity of the backtest.
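Entry and exit flags still have to be combined into a position that persists between signals. The sketch below shows one way to do that with the thresholds described above (enter beyond ±1.5, exit at a zero crossing). The function name `positions_from_zscores` and the one-transition-per-day rule are assumptions for illustration, not the author's implementation.

```python
def positions_from_zscores(z_scores, entry=1.5):
    """Map a z-score sequence to spread positions: +1 long, -1 short, 0 flat.

    NaN z-scores fail every comparison, so they leave the position unchanged.
    """
    position = 0
    positions = []
    for z in z_scores:
        if position == 0:
            if z < -entry:        # spread unusually low: buy A, sell B
                position = 1
            elif z > entry:       # spread unusually high: sell A, buy B
                position = -1
        elif position == 1 and z >= 0:
            position = 0          # long spread reverted to its mean
        elif position == -1 and z <= 0:
            position = 0          # short spread reverted to its mean
        positions.append(position)
    return positions


# Hypothetical z-score path (illustrative only).
z_path = [float("nan"), -0.5, -1.6, -2.0, -0.4, 0.1, 1.7, 0.9, -0.2, 0.3]
print(positions_from_zscores(z_path))
```

In a backtest, the position decided on day t would normally be applied to the return from day t to day t+1, so the trade is not assumed to fill at the same close that produced the signal.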
Portfolio and Risk Management
A critical aspect of Tondon’s project is the explicit inclusion of portfolio-level risk management. The strategy allocates a fixed capital of ₹5,00,000 to each active pair. Transaction costs are meticulously accounted for, with a charge of 5 basis points (bps) per leg per side, reflecting realistic trading expenses. This detailed approach to cost inclusion is vital for assessing the true profitability of a trading strategy.
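The cost rule translates into simple arithmetic. The sketch below charges 5 bps on each leg at both entry and exit. The write-up does not say how the ₹5,00,000 allocation is divided between the two legs, so the equal split used here is a hypothetical assumption, as is the function name `round_trip_cost`.

```python
def round_trip_cost(leg_notionals, bps_per_leg_per_side=5.0):
    """Total cost of opening and closing a pair trade, in rupees."""
    rate = bps_per_leg_per_side / 10_000  # 1 bp = 0.01%
    # Each leg is charged twice: once on entry and once on exit.
    return sum(2 * notional * rate for notional in leg_notionals)


# Assumption: the per-pair allocation is split equally across the two legs.
allocation = 500_000
legs = [allocation / 2, allocation / 2]
print(f"Round-trip cost per pair trade: INR {round_trip_cost(legs):.2f}")
```

Under that assumption a full round trip costs ₹500, or 10 bps of the pair's allocation.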
The out-of-sample backtest runs from January 11, 2016, to June 27, 2025. It begins roughly a year after the data starts, consistent with the first 252-day training window having to be filled before any signal can be generated. The period is long enough to observe the strategy’s performance across various market conditions, including periods of high volatility and relative calm.
Key Findings: Performance Metrics and Analysis
The out-of-sample backtest yielded a set of performance metrics that offer a comprehensive view of the strategy’s effectiveness:
- Capital Base: ₹15,00,000 (total capital allocated across the three pairs).
- Pairs Traded: 3, as identified through the rigorous selection process.
- Total Trades: 271 trades executed over the backtest period.
- Win Ratio: 63.47%. This indicates that a majority of the trades were profitable, a positive sign for a mean-reversion strategy that relies on frequent, smaller wins.
- Total PnL: ₹1,65,544.97. This represents the net profit generated by the strategy.
- PnL / Capital: 11.04%. This metric shows the total profit relative to the initial capital base.
- Annualized Return: 0.30%. This figure, while modest, reflects the strategy’s market-neutral nature, which typically aims for steadier, lower returns compared to directional strategies.
- Annualized Volatility: 13.34%. This measures the degree of price fluctuation in the strategy’s returns.
- Sharpe Ratio: 0.089. The Sharpe ratio quantifies risk-adjusted return, indicating the excess return per unit of risk. A ratio of 0.089 suggests that for the level of risk taken, the excess return was relatively low.
- Max Drawdown: -34.31%. This is a significant metric, representing the largest peak-to-trough decline in the portfolio’s value. A drawdown of this magnitude, especially relative to the annualized return, highlights a key area for improvement.
The performance snapshot reveals a strategy that successfully identifies trading opportunities and generates profits with a good win ratio. However, the low annualized return and high maximum drawdown suggest that while the strategy is statistically sound, its profitability and risk profile could be significantly enhanced.
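For readers who want to reproduce this kind of summary, the sketch below computes the four headline statistics from a daily return series. The conventions are assumptions, since the write-up does not state them: 252 periods per year, a zero risk-free rate, a geometric annualized return, and a Sharpe ratio built from the arithmetic mean. The function name `performance_summary` is hypothetical.

```python
import math
import statistics


def performance_summary(returns, periods_per_year=252, risk_free=0.0):
    """Annualized return, volatility, Sharpe ratio and max drawdown of a return series."""
    # Compound the returns into an equity curve that starts at 1.0.
    equity, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        equity.append(level)

    years = len(returns) / periods_per_year
    annualized_return = equity[-1] ** (1.0 / years) - 1.0           # geometric
    annualized_vol = statistics.stdev(returns) * math.sqrt(periods_per_year)
    mean_annual = statistics.mean(returns) * periods_per_year       # arithmetic
    sharpe = (mean_annual - risk_free) / annualized_vol

    # Max drawdown: worst percentage fall from a running peak.
    peak, max_drawdown = 1.0, 0.0
    for level in equity:
        peak = max(peak, level)
        max_drawdown = min(max_drawdown, level / peak - 1.0)

    return {
        "annualized_return": annualized_return,
        "annualized_volatility": annualized_vol,
        "sharpe": sharpe,
        "max_drawdown": max_drawdown,
    }


# Tiny illustrative series: three periods treated as one "year".
summary = performance_summary([0.10, -0.50, 0.20], periods_per_year=3)
print(summary)
```

As a consistency check on the reported figures (an inference from the numbers, not something the write-up states): Sharpe times volatility is 0.089 × 13.34% ≈ 1.19%, and subtracting a volatility drag of about half the variance, roughly 0.89%, leaves about 0.30%. That would match the reported annualized return if it is geometric while the Sharpe ratio uses the arithmetic mean.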
Challenges and Limitations
Despite the rigorous methodology, Tondon acknowledges several challenges and limitations inherent in such a project. One primary concern is the potential for survivorship bias. The fixed universe of 25 large-cap stocks might exclude companies that were delisted or merged during the observation period. If these excluded companies had performed poorly, their absence would artificially inflate the strategy’s historical performance. A more robust approach would involve using a dynamic universe, such as the constituents of a major index like the Nifty 50 or Nifty 100, which reflects actual market composition at any given point in time.
Another limitation is the inherent assumption of stable cointegration relationships. Market regimes can shift, and relationships that appear cointegrated during the training period may break down in the future. The relatively short 21-day test window may not be sufficient to capture longer-term regime shifts.
The fixed entry and exit thresholds (z-score of +/- 1.5 for entry, 0 for exit) are also a point of consideration. These static levels might not be optimal across all market conditions or for all pairs. Furthermore, the absence of explicit stop-loss rules beyond the mean-reversion exit could lead to larger losses on trades that move persistently against the expected convergence.
Next Steps: Enhancing Strategy Performance
Tondon outlines several strategic enhancements to improve the strategy’s risk-adjusted returns and its real-world applicability:
- Optimize ADF Lag Selection: Replacing the current ADF(0) shortcut with an information-criterion-based lag selector (such as AIC or BIC) for the ADF test would lead to more accurate identification of cointegration. This could reduce spurious signals and improve the reliability of pair selection, contributing to more stable trade entries.
- Expand the Universe and Diversify Pairs: Increasing the stock universe beyond the current 25 large-cap stocks to include mid-cap NSE stocks across sectors like Energy, Fast-Moving Consumer Goods (FMCG), and Metals would broaden the pool of potential cointegrated pairs. Enhanced diversification can reduce the impact of any single pair’s underperformance and improve overall portfolio stability.
- Introduce Dynamic Position Sizing: Moving away from a fixed allocation of ₹5,00,000 per pair towards volatility-scaled sizing (e.g., inverse-volatility weighting or Kelly criterion) would allow the strategy to allocate more capital to pairs exhibiting stronger mean-reversion signals or tighter spreads. This could significantly improve the Sharpe ratio and mitigate drawdowns.
- Refine Entry/Exit Thresholds Adaptively: The static z-score thresholds are a potential area for optimization. Implementing an adaptive threshold model, where entry and exit levels are calibrated to each pair’s rolling volatility or identified market regime (trending versus mean-reverting), could filter out lower-quality signals and potentially improve the win ratio beyond the current 63.47%.
- Incorporate Stop-Loss Rules: The substantial maximum drawdown of -34.31% relative to the annualized return necessitates the inclusion of stop-loss mechanisms. Implementing pair-level stop-losses, such as exiting a trade when the z-score breaches a higher threshold (e.g., +/- 3.0) or when an unrealized loss exceeds a predetermined percentage of allocated capital, would cap downside risk during adverse regime shifts and improve the Sharpe ratio.
- Address Survivorship Bias with a Rolling Universe: To ensure more realistic forward-looking performance estimates, the strategy should incorporate a rolling universe approach. This means using a point-in-time constituent list of major indices like the Nifty 50 or Nifty 100 for each training window, rather than a fixed set of stocks that may have survived the entire period. This would eliminate survivorship bias.
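Of the enhancements listed, inverse-volatility sizing is the simplest to sketch. The example below redistributes the ₹15,00,000 capital base so that pairs with calmer spreads receive more capital. The pair labels and volatility figures are placeholders, not estimates from the project, and the function name `inverse_volatility_weights` is hypothetical.

```python
def inverse_volatility_weights(spread_vols):
    """Weights proportional to 1 / volatility, normalized to sum to 1."""
    inverse = {pair: 1.0 / vol for pair, vol in spread_vols.items()}
    total = sum(inverse.values())
    return {pair: value / total for pair, value in inverse.items()}


# Placeholder spread volatilities (illustrative values only).
example_vols = {"PAIR_1": 0.10, "PAIR_2": 0.20, "PAIR_3": 0.20}
weights = inverse_volatility_weights(example_vols)

total_capital = 1_500_000
for pair, weight in weights.items():
    print(f"{pair}: weight {weight:.2f}, capital INR {weight * total_capital:,.0f}")
```

In a walk-forward setting the volatilities would be re-estimated on each training window, so the weights use only information available before the test period begins.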
Continuous Learning and Future Exploration
For individuals interested in building upon the concepts explored in this project—statistical arbitrage, cointegration testing, and mean-reversion strategy development—a structured learning path is recommended. Foundational application guides such as "Python for Trading Basics" and "Mean Reversion Trading Strategy" by Dr. Ernest P. Chan offer practical insights into building and evaluating statistical models in financial contexts.
For those seeking to move beyond supervised models and delve into more complex quantitative techniques, advanced learning tracks on Algorithmic Trading and Factor-Based Investing provide deeper understanding of strategies that adapt across market regimes. Enhancing modeling and evaluation skills can be achieved through courses on "Quantitative Portfolio Management" and "Backtesting Trading Strategies."
For hands-on learning with industry guidance, curated learning paths in "Quantitative Trading" and "Artificial Intelligence in Trading" offer end-to-end training from data handling to model deployment. For aspiring quantitative traders aiming to replicate such end-to-end projects with expert mentorship, the Executive Programme in Algorithmic Trading (EPAT) provides a comprehensive curriculum covering essential components like Python, statistics, machine learning, and real-world trading applications.
Disclaimer: The information presented in this project is accurate and complete to the best of the student’s knowledge. All recommendations are provided without guarantee from the student or QuantInsti®. Both the student and QuantInsti® disclaim any liability in connection with the use of this information. The content is for informational purposes only, and no guarantee is made that the provided guidance will result in a specific profit.



