In this tutorial, we discuss how to bring cointegration statistics into Amibroker using AmiPy and how to interpret the values returned by the Augmented Dickey-Fuller test.
Cointegration is used in statistical arbitrage to find the best pair of stocks (pair trading), typically competitive peers, so that a trader can go long one stock and short the other to generate returns.
Tools required to Compute Cointegration in Amibroker
1)Amipy v1.0.1 (64-bit) – Download Amibroker 64 bit Plugin
2)Amibroker (64 Bit) v6.3 or higher
3)Python 3.8 (64-bit) or higher
Python libraries that need to be installed
1)NumPy
2)pandas
3)statsmodels
Some time back I wrote a detailed AmiPy installation procedure that shows how to send data from Amibroker to a Python program, do complex statistical computations there, and return the values back to Amibroker.
What is Statistical Arbitrage?
Statistical arbitrage is, in essence, pair trading based on the relationship between two stocks.
Often, the stock prices of companies in the same sector or type of business follow one another very closely. Cointegration is a better way to observe the relationship between two stocks: a trader buys or sells whenever the relationship gets out of sync, acting on the assumption that the spread will revert to the mean.
What is Co-Integration?
Co-Integration helps in identifying the best stock pairs, that is, pairs whose spread is likely to revert to its mean value. It looks for a stationary pair where the mean of the spread is fixed. Whenever the spread deviates from the mean, it creates a trading opportunity, because the spread will possibly revert to the mean value.
What is Stationarity?
Most financial trading instruments are non-stationary, i.e. mostly unpredictable, whereas a stationary time series is more predictable and satisfies the following conditions:
1) has a constant mean
2) has a constant variance
3) has no observed seasonality
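As a rough illustration of the first two conditions, the sketch below splits a series into consecutive windows and compares the mean and variance of each window. It is only a toy check, not a statistical test: the helper name `window_stats` and both series are made up for this example and use nothing beyond the Python standard library.

```python
from statistics import fmean, pvariance


def window_stats(series, window):
    """Return (mean, variance) for consecutive non-overlapping windows."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((fmean(chunk), pvariance(chunk)))
    return stats


# Deterministic toy series, chosen only for illustration.
oscillating = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]  # flat mean
trending = [float(i) for i in range(40)]                        # drifting mean

print("oscillating:", window_stats(oscillating, 10))
print("trending:   ", window_stats(trending, 10))
```

The oscillating series reports the same mean and variance in every window, while the trending series shows a window mean that keeps rising, so it fails the constant-mean condition even though its window variance is unchanged.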
Let me explain Co-Integration with a funny example. “A drunken man is walking along the road with his dog on a leash tied to his hand. Because the man is drunk, he is expected to walk randomly, and the leashed dog is also expected to walk randomly (assume a small little puppy 🙂). The maximum distance between them is the length of the leash, which is fixed. Whenever the distance/spread between the drunken man and the dog gets close to this maximum, we can expect it to revert to the mean.” In simple words, the drunken man and the dog are co-integrated.
If two stocks are highly correlated, both will move in the same direction most of the time; however, the magnitude of the moves is unknown and the spread between them can keep widening indefinitely. Co-Integration, as in the example above, instead looks for mean reversion in the spread/distance, and such spreads are tradeable. The Augmented Dickey-Fuller test is generally used to identify, with a certain level of confidence, whether the spread between two stocks or time series is stationary (and the pair therefore cointegrated) or not.
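The drunken man and dog analogy can be turned into a small simulation. The sketch below is an assumption-laden toy, not the article's method: the function name `walk_with_dog`, the leash length and the Gaussian step size are all invented for illustration. Both positions wander randomly, but the leash clamps the dog to within a fixed distance of the man, so the spread between them stays bounded.

```python
import random


def walk_with_dog(steps, leash=3.0, seed=42):
    """Simulate two random walks where the second is held within `leash` of the first."""
    rng = random.Random(seed)
    man, dog = 0.0, 0.0
    path_man, path_dog = [], []
    for _ in range(steps):
        man += rng.gauss(0, 1)  # the man staggers randomly
        dog += rng.gauss(0, 1)  # the dog wanders randomly too
        # The leash pulls the dog back whenever it strays too far from the man.
        dog = max(man - leash, min(man + leash, dog))
        path_man.append(man)
        path_dog.append(dog)
    return path_man, path_dog


man, dog = walk_with_dog(1000)
spread = [m - d for m, d in zip(man, dog)]
print("largest |spread|:", max(abs(s) for s in spread))
print("final position of the man:", man[-1])
```

Each path on its own can drift anywhere, yet the spread never exceeds the leash length. That bounded spread is the intuition behind looking at the spread rather than at the individual prices.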
Augmented Dickey-Fuller (ADF) Test
The Augmented Dickey-Fuller test is a hypothesis test of whether a signal contains a unit root; we want to reject this hypothesis. The test gives a p-value: the lower this number, the more confident we can be that we have found a stationary signal. Pairs with p-values less than 0.05 are considered good mean-reverting stock pairs. Some experts even accept p-values up to 0.1. Pairs with p-values above 0.1 are likely to be non-stationary, and trading such stock pairs is not advisable.
The null hypothesis of the Augmented Dickey-Fuller test is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then we cannot reject that there is a unit root.
The p-values are obtained through regression surface approximation from MacKinnon 1994, but using the updated 2010 tables. If the p-value is close to significant, then the critical values should be used to judge whether to reject the null.
Values returned by the ADF test using the statsmodels Python library
Parameters | Interpretation |
adf | The test statistic. |
pvalue | MacKinnon's approximate p-value based on MacKinnon (1994, 2010) |
usedlag | The number of lags used. |
nobs | The number of observations used for the ADF regression and calculation of the critical values |
critical values | Critical values for the test statistic at the 1 %, 5 %, and 10 % levels. Based on MacKinnon (2010). |
icbest | The maximized information criterion if autolag is not None. |
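The values in the table come back as a plain tuple in the order listed, so each one is read by position (0 to 5). The sketch below gives those positions names with a `namedtuple`. The name `AdfResult` is an assumption for this example, and every number except the test statistic and the 5% critical value (both quoted in this article) is a placeholder, not real output.

```python
from collections import namedtuple

# Field order follows the table above; positions 0..5 are the index values
# used to pick a single item out of the returned tuple.
AdfResult = namedtuple(
    "AdfResult",
    ["adf", "pvalue", "usedlag", "nobs", "critical_values", "icbest"],
)

# Placeholder tuple shaped like the ADF return value (illustrative only).
sample = AdfResult(
    adf=-2.9379,                      # test statistic quoted in this article
    pvalue=0.04,                      # placeholder
    usedlag=0,                        # placeholder
    nobs=74,                          # placeholder
    critical_values={"1%": -3.5, "5%": -2.9020, "10%": -2.6},  # only 5% is from the article
    icbest=0.0,                       # placeholder
)

# With statsmodels installed, the same wrapper could be applied to a real result:
#   from statsmodels.tsa.stattools import adfuller
#   result = AdfResult(*adfuller(spread))

print("index 1 is the p-value:", sample[1] == sample.pvalue)
print("5% critical value:", sample.critical_values["5%"])
```

Naming the fields avoids having to remember that index 4 is the dictionary of critical values while every other index holds a single number.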
The above image shows the cointegration statistics between ICICI Bank and HDFC Bank over the last 75 trading sessions. The p-value is less than 0.05 (highly co-integrated), so the null hypothesis of a unit root is rejected and the spread is treated as stationary.
The dashboard also shows that the ADF test statistic of -2.9379 is lower (more negative) than the 5% critical value of -2.9020, which indicates a possibly good pair to watch for mean reversion in the spread.
The second example shows Infy and TCS hourly futures charts with a p-value of 0.25, which is greater than the 0.05 threshold and indicates that the data is non-stationary (that is, its behaviour depends on time).
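The two readings above boil down to two simple comparisons, sketched here with the figures quoted in the examples. The function names are assumptions made for this illustration. The unit-root null is rejected when the test statistic is more negative than the chosen critical value, or equivalently when the p-value falls below the chosen threshold.

```python
def rejects_unit_root(adf_stat, critical_value):
    """True when the ADF statistic is more negative than the critical value."""
    return adf_stat < critical_value


def passes_pvalue_threshold(pvalue, alpha=0.05):
    """True when the p-value is below the significance threshold."""
    return pvalue < alpha


# First example: statistic -2.9379 against the 5% critical value -2.9020.
print(rejects_unit_root(-2.9379, -2.9020))   # True: treat the spread as stationary

# Second example: p-value 0.25 against the 0.05 threshold.
print(passes_pvalue_threshold(0.25))         # False: cannot reject the unit root
```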
Computing Co-Integration in Amibroker
Since Co-Integration is a statistical model, it is relatively difficult to code in the AFL programming language. We therefore rely on the AmiPy 64-bit Amibroker plugin and statistical computing Python packages such as numpy (to handle arrays), pandas (to handle time-series data) and statsmodels (to do the ADF test). The Close arrays of the two stocks in the pair are passed from Amibroker, the cointegration statistics are computed in Python, and the results are returned to Amibroker.
Cointegration – Amibroker AFL
//Coded by Rajandran R
//Date : 27th Aug 2020
//Requirements
//Used Amipy v0.2.0 (64-bit) - Download from https://forum.amibroker.com/t/amipy-plug-in-python-integration/20337/32
//Amibroker (64 Bit) v6.3 or higher
//Python 3.8 or higher
//ADF Test Return Values
//adf (float) - The test statistic.
//pvalue (float) - MacKinnon's approximate p-value based on MacKinnon (1994, 2010).
//usedlag (int) - The number of lags used.
//nobs (int) - The number of observations used for the ADF regression and calculation of the critical values.
//critical values (dict) - Critical values for the test statistic at the 1 %, 5 %, and 10 % levels. Based on MacKinnon (2010).
//icbest (float) - The maximized information criterion if autolag is not None.
_SECTION_BEGIN("Cointegration");
Version(6.3);
SetChartOptions(0,chartShowArrows|chartShowDates);
SetBarsRequired(-2,-2);
//call the python file for execution
PyLoadFromFile("cointegration","C:\\Python\\Code\\cointegration.py");
_N( Symbol1= ParamStr("Symbol1", "HDFC.NS") );
_N( Symbol2= ParamStr("Symbol2", "HDBK.NS") );
Color1 = ParamColor("Color1",colorGreen);
Color2 = ParamColor("Color2",coloryellow);
//Get the Foreign Price values
SetForeign(symbol1);
symC1 = Close;
_N(sym1 = FullName());
RestorePriceArrays( True );
SetForeign(symbol2);
symC2 = Close;
_N(sym2 = FullName());
RestorePriceArrays( True );
period = Param("Cointegration Lookback",40,1,300,1); //lookback for cointegration
//pass the arrays and numbers to the python functions
adfteststatic = PyEvalFunction("cointegration","adf",symC1,symC2,period,0);
pvalue = PyEvalFunction("cointegration","adf",symC1,symC2,period,1);
usedlag = PyEvalFunction("cointegration","adf",symC1,symC2,period,2);
nobs = PyEvalFunction("cointegration","adf",symC1,symC2,period,3);
criticalvalues1 = PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"1%");
criticalvalues5= PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"5%");
criticalvalues10 = PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"10%");
icbest = PyEvalFunction("cointegration","adf",symC1,symC2,period,5);
//pvalue = PyEvalFunction("pvalue",symC1,symC2,period);
//Plot Correlation and CoIntegration in a Dashboard
GfxSetBkMode( 0 );
GfxSelectFont( "Tahoma", 13, 100 );
GfxSetTextColor( colorWhite );
GfxSelectPen( colorGreen, 2 );
GfxSelectSolidBrush( colorgreen );
GfxRectangle( 10, 20, 250, 250 );
GfxTextOut( "Cointegration Stats",23,23);
GfxTextOut( "ADF Test Statistic: " + NumToStr(adfteststatic,1.4,separator=false),23,48);
GfxTextOut( "P- Value : " + NumToStr(pvalue,1.4,separator=false),23,73);
GfxTextOut( "Lags: " + NumToStr(usedlag,1.0,separator=false),23,98);
GfxTextOut( "Observations : " + NumToStr(nobs,1.0,separator=false),23,123);
GfxTextOut( "Critical Values 1% : " + NumToStr(criticalvalues1,1.4,separator=false),23,148);
GfxTextOut( "Critical Values 5% : " + NumToStr(criticalvalues5,1.4,separator=false),23,173);
GfxTextOut( "Critical Values 10% : " + NumToStr(criticalvalues10,1.4,separator=false),23,198);
GfxTextOut( "icbest : " + NumToStr(icbest,1.4,separator=false),23,223);
Plot(symC1,sym1,Color1,styleLine|styleownscale);
Plot(symC2,sym2,Color2,styleLine|styleownscale);
_SECTION_END();
cointegration.py
#Python Code for cointegration, adftest, critical values - Values will be returned to Amibroker
import numpy as np
import pandas as pd
import statsmodels.api as stat
import statsmodels.tsa.stattools as ts


def adf(array1, array2, lookback, element):
    # Wrap the Close arrays received from Amibroker in DataFrames
    df1 = pd.DataFrame.from_records({'Close': array1})
    df2 = pd.DataFrame.from_records({'Close': array2})
    # Regress the last `lookback` closes of symbol 1 on symbol 2 (no intercept term)
    result = stat.OLS(df1[['Close']].tail(int(lookback)), df2[['Close']].tail(int(lookback))).fit()
    # Run the ADF test on the regression residuals (the spread)
    a = ts.adfuller(result.resid)
    # element: 0=adf, 1=pvalue, 2=usedlag, 3=nobs, 5=icbest
    return a[int(element)]


def criticalvalues(array1, array2, lookback, element, level):
    # Same computation as adf(); element 4 is the dict of critical values
    # and level is its key: "1%", "5%" or "10%"
    df1 = pd.DataFrame.from_records({'Close': array1})
    df2 = pd.DataFrame.from_records({'Close': array2})
    result = stat.OLS(df1[['Close']].tail(int(lookback)), df2[['Close']].tail(int(lookback))).fit()
    a = ts.adfuller(result.resid)
    return a[int(element)][level]
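The regression step in the listing above fits the closes of the first symbol on the closes of the second without an intercept, and the residuals of that fit are the spread passed to the ADF test. The standard-library sketch below shows the same arithmetic on made-up numbers. The helper names `hedge_ratio` and `spread` and the toy price lists are assumptions for illustration, and it is not a replacement for the statsmodels call.

```python
def hedge_ratio(y, x):
    """Slope of a least-squares fit of y on x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def spread(y, x, lookback):
    """Residuals y - beta * x over the last `lookback` observations."""
    y_tail, x_tail = list(y)[-lookback:], list(x)[-lookback:]
    beta = hedge_ratio(y_tail, x_tail)
    return [a - beta * b for a, b in zip(y_tail, x_tail)]


# Toy closes: the first series is exactly twice the second, so the fit is perfect.
closes_2 = [10.0, 11.0, 12.0, 13.0, 14.0]
closes_1 = [2.0 * c for c in closes_2]

print("hedge ratio:", hedge_ratio(closes_1, closes_2))
print("spread over last 3 bars:", spread(closes_1, closes_2, 3))
```

On real prices the residuals are not zero, and it is their tendency to hover around a fixed level that the ADF test measures.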
Sample IPython Notebook to compute cointegration using NSEPy, without Amibroker: https://www.marketcalls.in/wp-content/uploads/2020/08/Cointegration.html
Sample values are verified against Amibroker using Datalink as the data source for the HDFC and HDFC Bank pair.
Hope this article helps you learn more about co-integration.
References
Quantopian – Cointegration notebook