# Computing Cointegration and Augmented Dickey Fuller test in Amibroker using Python

In this tutorial, we discuss how to bring Cointegration statistics into Amibroker using AmiPy and how to interpret the values returned by the Augmented Dickey-Fuller (ADF) test.

Cointegration is used in Statistical Arbitrage to find the best pairs of stocks for Pair Trading: going long one stock and short another (typically a competitive peer) to generate returns.

Tools required to Compute Cointegration in Amibroker

1)AmiPy plugin (64 Bit) for Amibroker
2)Amibroker (64 Bit) v6.3 or higher
3)Python 3.8 (64-bit) or higher

Python libraries that need to be installed

1)NumPy
2)pandas
3)statsmodels

Some time back I wrote up a detailed AmiPy installation procedure. It covers sending data from Amibroker to a Python program for complex statistical computations and returning the values to Amibroker.

What is Statistical Arbitrage?

Statistical Arbitrage is, at its simplest, pair trading based on the relationship between two stocks.

Often, the stock prices of companies in the same sector or type of business follow one another very closely. Cointegration is a better way to observe the relationship between two stocks: a trader buys one and sells the other whenever the relationship gets out of sync, acting on the assumption that the spread will revert to the mean.

What is Co-Integration?

Co-Integration helps identify stock pairs whose spread tends to revert to a mean value. It looks for pairs whose spread is stationary, i.e. the mean of the spread is fixed. Whenever the spread deviates from that mean, it creates a trading opportunity, on the expectation that the spread will revert to the mean value.
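As a rough illustration of "deviating from the mean", the sketch below measures how far the latest spread sits from its average, in standard deviations. The prices, the hedge ratio of 2.0 and the helper name `spread_zscore` are all invented for this example and are not part of the original tutorial.

```python
from statistics import mean, stdev

def spread_zscore(prices_a, prices_b, hedge_ratio=1.0):
    """Distance of the latest spread from its mean, in standard deviations."""
    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    return (spread[-1] - mean(spread)) / stdev(spread)

# Hypothetical closing prices, made up for illustration only.
stock_a = [100.0, 101.0, 102.0, 101.5, 103.0, 104.0, 108.0]
stock_b = [50.0, 50.5, 51.0, 50.8, 51.4, 52.0, 52.2]

z = spread_zscore(stock_a, stock_b, hedge_ratio=2.0)
print(f"latest spread z-score: {z:.2f}")
```

Here the last bar pushes the spread well above its average, which is the kind of stretch a mean-reversion trader would watch. Whether it actually snaps back depends on the pair being cointegrated in the first place.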

What is Stationarity?

Most financial trading instruments are non-stationary, i.e. mostly unpredictable, whereas a stationary time series is more predictable and satisfies the following conditions:
1) it has a constant mean
2) it has a constant variance
3) it shows no seasonality
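The first two conditions can be checked informally by comparing the mean and variance of the first and second half of a series. The sketch below does this for simulated white noise (stationary) and a simulated random walk (non-stationary). The data, the seed and the helper `half_stats` are assumptions made for illustration, and this is only an eyeball check, not a statistical test.

```python
import random
from statistics import mean, pvariance

random.seed(42)  # fixed seed so the run is repeatable

n = 4000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]   # stationary: white noise

walk = []                                            # non-stationary: random walk
level = 0.0
for shock in noise:
    level += shock                                   # each step adds to the last level
    walk.append(level)

def half_stats(series):
    """Mean and variance of the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

for name, series in (("white noise", noise), ("random walk", walk)):
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name:12s} mean {m1:8.2f} -> {m2:8.2f}   variance {v1:8.2f} -> {v2:8.2f}")
```

The white-noise halves should look almost identical, while the random walk's level drifts, so its statistics depend on which stretch of time you look at.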

Let me explain Co-Integration with a funny example. “A drunken man is walking along the road with his dog on a leash tied to his hand. Because the man is drunk, he is expected to walk randomly, and the leashed dog is also expected to walk randomly (assume a small little puppy 🙂). The maximum distance between them is the length of the leash, which is always fixed. Whenever the distance/spread between the drunken man and the dog gets close to that maximum, we can expect the distance to revert toward its mean.” In simple words, the drunken man and the dog are Co-Integrated.
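The story can be played out as a small one-dimensional simulation. This is a toy sketch: the leash length, step sizes and seed are arbitrary choices made here, and real price pairs are not held together by a hard limit like a leash.

```python
import random

random.seed(7)
LEASH = 3.0   # hypothetical leash length

man, dog = 0.0, 0.0
man_path, gaps = [], []
for _ in range(5000):
    man += random.gauss(0.0, 1.0)          # the man staggers at random
    dog += random.gauss(0.0, 1.0)          # the dog wanders at random too
    # the leash pulls the dog back whenever it strays too far from the man
    dog = max(man - LEASH, min(man + LEASH, dog))
    man_path.append(man)
    gaps.append(dog - man)

print("man wandered between", round(min(man_path), 1), "and", round(max(man_path), 1))
print("largest man-dog gap:", round(max(abs(g) for g in gaps), 1))
```

Each path on its own is a random walk that can end up anywhere, but the gap between the two never exceeds the leash. That bounded gap plays the role of the stationary spread.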

If two stocks are highly correlated, both will move in the same direction most of the time. However, the magnitude of the moves is unknown, so the spread between them can keep widening indefinitely. Co-Integration, in contrast, looks for mean reversion in the spread/distance, as in the example above, and such spreads are tradeable. The Augmented Dickey-Fuller test is generally used to determine, with a certain level of confidence, whether the spread between two stocks or time series is stationary, and hence whether the pair is cointegrated.

The Augmented Dickey-Fuller test is a hypothesis test whose null hypothesis is that a signal contains a unit root; we want to reject this hypothesis. The test gives a p-value: the lower this number, the more confident we can be that we have found a stationary signal. P-values less than 0.05 are considered to indicate good mean-reverting stock pairs. Some practitioners are willing to work with P-values up to 0.1. Pairs with P-values above 0.1 are likely to be non-stationary, and trading such stock pairs is not advisable.

The null hypothesis of the Augmented Dickey-Fuller test is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then we cannot reject that there is a unit root.

The p-values are obtained through regression surface approximation from MacKinnon 1994, but using the updated 2010 tables. If the p-value is close to significant, then the critical values should be used to judge whether to reject the null.

Values returned by the ADF test using the statsmodels Python library

The image above shows Cointegration statistics between ICICI Bank and HDFC Bank over the last 75 trading sessions. The P-Value is less than 0.05, so the null hypothesis of a unit root is rejected: the spread is stationary and the pair is highly Co-Integrated.

The dashboard also shows that the ADF test statistic of -2.9379 is lower (more negative) than the 5% critical value of -2.9020, which indicates a possibly good pair in which to look for mean reversion in the spread.

The second example shows Infy and TCS hourly futures charts with a P-Value of 0.25, which is greater than the 0.05 threshold. The null hypothesis cannot be rejected, which indicates the data is non-stationary (that is, its statistical properties depend on time).
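The ADF test is left-tailed: the unit-root null is rejected when the test statistic falls below (is more negative than) the critical value. A minimal sketch of that comparison, using the two numbers quoted for this pair, is shown here. The helper name `rejects_unit_root` and the second test value of -2.5 are made up for illustration.

```python
def rejects_unit_root(adf_stat, critical_value):
    """True when the ADF statistic is more negative than the critical value."""
    return adf_stat < critical_value

adf_stat = -2.9379    # test statistic quoted for the ICICI Bank / HDFC Bank pair
crit_5pct = -2.9020   # 5% critical value quoted for the same pair

print(rejects_unit_root(adf_stat, crit_5pct))   # True: reject at the 5% level
print(rejects_unit_root(-2.5, crit_5pct))       # False: hypothetical weaker statistic
```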

Computing Co-Integration in Amibroker

Since Co-Integration is a statistical model, it is relatively difficult to code in the AFL programming language. We therefore rely on the AmiPy 64-bit Amibroker plugin and on statistical computing Python packages: numpy (to handle arrays), pandas (to handle time-series data) and statsmodels (to do the ADF test). The close arrays of the two stocks in the pair are passed from Amibroker, the Co-Integration statistics are computed by Python, and the results are returned to Amibroker.

Cointegration – Amibroker AFL

``````
//Coded by Rajandran R
//Date : 27th Aug 2020
//Requirements
//Amibroker (64 Bit) v6.3 or higher
//Python 3.8 or higher

//Values returned by the ADF test:
//pvalue (float) - MacKinnon's approximate p-value based on MacKinnon (1994, 2010).
//usedlag (int) - The number of lags used.
//nobs (int) - The number of observations used for the ADF regression and calculation of the critical values.
//critical values (dict) - Critical values for the test statistic at the 1 %, 5 %, and 10 % levels. Based on MacKinnon (2010).
//icbest (float) - The maximized information criterion if autolag is not None.

_SECTION_BEGIN("Cointegration");

Version(6.3);

SetChartOptions(0,chartShowArrows|chartShowDates);
SetBarsRequired(-2,-2);

//call the python file for execution
//NOTE: placeholder path - point it at your own copy of the Python file
PyLoadFromFile("cointegration", "C:\\path\\to\\Cointegration.py");

_N( Symbol1= ParamStr("Symbol1", "HDFC.NS") );
_N( Symbol2= ParamStr("Symbol2", "HDBK.NS") );

Color1 =  ParamColor("Color1",colorGreen);
Color2 =  ParamColor("Color2",coloryellow);

//Get the Foreign Price values

SetForeign(symbol1);
symC1 = Close;
_N(sym1 = FullName());
RestorePriceArrays( True );

SetForeign(symbol2);
symC2 = Close;
_N(sym2 = FullName());
RestorePriceArrays( True );

period = Param("Cointegration Lookback",40,1,300,1); //lookback for cointegration

//pass the arrays and numbers to the python functions
//the element index follows the order of the ADF result:
//1 = p-value, 2 = lags used, 3 = observations, 4 = critical values, 5 = icbest
pvalue = PyEvalFunction("cointegration","adftest",symC1,symC2,period,1);
usedlag = PyEvalFunction("cointegration","adftest",symC1,symC2,period,2);
nobs = PyEvalFunction("cointegration","adftest",symC1,symC2,period,3);
criticalvalues1 = PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"1%");
criticalvalues5= PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"5%");
criticalvalues10 = PyEvalFunction("cointegration","criticalvalues",symC1,symC2,period,4,"10%");
icbest = PyEvalFunction("cointegration","adftest",symC1,symC2,period,5);

//Plot Correlation and CoIntegration in a Dashboard

GfxSetBkMode( 0 );
GfxSelectFont( "Tahoma", 13, 100 );
GfxSetTextColor( colorWhite );

GfxSelectPen( colorGreen, 2 );
GfxSelectSolidBrush( colorgreen );
GfxRectangle( 10, 20, 250, 250 );

GfxTextOut( "Cointegration Stats",23,23);
GfxTextOut( "P- Value : " + NumToStr(pvalue,1.4,separator=false),23,73);
GfxTextOut( "Lags: " + NumToStr(usedlag,1.0,separator=false),23,98);
GfxTextOut( "Observations : " + NumToStr(nobs,1.0,separator=false),23,123);
GfxTextOut( "Critical Values 1% : " + NumToStr(criticalvalues1,1.4,separator=false),23,148);
GfxTextOut( "Critical Values 5% : " + NumToStr(criticalvalues5,1.4,separator=false),23,173);
GfxTextOut( "Critical Values 10% : " + NumToStr(criticalvalues10,1.4,separator=false),23,198);
GfxTextOut( "icbest : " + NumToStr(icbest,1.4,separator=false),23,223);

Plot(symC1,sym1,Color1,styleLine|styleownscale);
Plot(symC2,sym2,Color2,styleLine|styleownscale);

_SECTION_END();
``````

Cointegration.py

``````
#Python Code for cointegration, adftest, critical values - Values will be returned to Amibroker

import pandas as pd
import statsmodels.api as stat
import statsmodels.tsa.stattools as ts

def _adf_on_spread(array1, array2, lookback):
    #Regress series 1 on series 2 (no intercept) over the last `lookback` bars,
    #then run the ADF test on the regression residuals (the spread).
    n = int(lookback)
    df1 = pd.DataFrame({'Close': array1}).tail(n)
    df2 = pd.DataFrame({'Close': array2}).tail(n)
    result = stat.OLS(df1[['Close']], df2[['Close']]).fit()
    #adfuller returns (adf, pvalue, usedlag, nobs, critical values, icbest)
    return ts.adfuller(result.resid)

#Function name taken from the header comment above
def adftest(array1, array2, lookback, element):
    #element: index into the adfuller result, e.g. 1 = p-value, 2 = lags used
    a = _adf_on_spread(array1, array2, lookback)
    return float(a[int(element)])

def criticalvalues(array1, array2, lookback, element, level):
    #element must be 4 (the critical values dict); level is "1%", "5%" or "10%"
    a = _adf_on_spread(array1, array2, lookback)
    return float(a[int(element)][level])
``````

Sample IPython Notebook to compute Cointegration using NSEPy, without Amibroker:

The sample values were verified against Amibroker, using Datalink as the data source, for the HDFC and HDFC Bank pair.

References

Quantopian – Cointegration notebook

How to interpret adfuller test results?
