
Predicting Stock Price and Market Direction using XGBoost Machine Learning Algorithm


Forecasting the trajectory of the stock market remains an elusive endeavor for investors and traders alike. A myriad of methods and algorithms have emerged over time to tackle this complex problem, with varying levels of success. In this blog post, we will discuss the XGBoost algorithm and how it performs better than linear regression for predicting market direction. We will also provide a Python code example for predicting the next day’s NIFTY close and direction using XGBoost.

XGBoost Algorithm

XGBoost, short for eXtreme Gradient Boosting, is a powerful machine-learning algorithm that has been gaining significant attention in recent years. XGBoost is an ensemble technique that uses a collection of decision trees to make predictions. It is particularly effective in handling large datasets and can efficiently manage missing values, outliers, and multicollinearity.
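To see what “boosting” means in practice, here is a minimal, illustrative sketch of gradient boosting for regression (not XGBoost’s actual implementation, which adds gradient- and Hessian-based optimization plus regularization on top of this idea): each new tree is fit to the residual errors of the ensemble built so far.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boost(X, y, n_trees=50, learning_rate=0.1):
    """Fit shallow trees sequentially, each on the current residuals."""
    prediction = np.full(len(y), y.mean())  # start from the mean of the target
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction                     # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)  # nudge predictions toward the target
        trees.append(tree)
    return trees, prediction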

The dataset used for the implementation of the model is the NIFTY_EOD.csv file, which contains open, high, low, close, volume, PClose, lreg5, lreg7, lreg9, hma5, hma7, hma9, trsi, and atr values for the NIFTY index, along with a close_forecast column (the next day’s close) that serves as the prediction target.

Download the XGBoost Regression Features Dataset prepared using Amibroker

Download NIFTY EOD csv data set

Why XGBoost is Better than Linear Regression

Non-linearity: Unlike linear regression, which assumes a linear relationship between features and the target variable, XGBoost can model complex, non-linear relationships. This is particularly helpful for predicting stock market direction, as the underlying relationships between variables are often non-linear.

Robustness: XGBoost is more robust to noise and outliers in the data compared to linear regression. This makes the algorithm better suited for predicting market direction, which can be influenced by various factors that are not always apparent in the historical data.

Regularization: XGBoost includes regularization, which helps prevent overfitting by penalizing complex models. This helps the algorithm generalize better to new data, making it more reliable for predicting market direction.

Handling missing values: XGBoost can automatically handle missing values, making it easier to work with incomplete datasets. In contrast, linear regression often requires imputation or other preprocessing techniques to handle missing values.
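These last two advantages translate directly into XGBoost’s API. The snippet below is a minimal sketch with arbitrary, illustration-only parameter values: reg_alpha and reg_lambda set the L1/L2 penalties on leaf weights, and a feature matrix containing NaNs can be fitted without any imputation step.

import numpy as np
import xgboost as xgb

# reg_alpha (L1) and reg_lambda (L2) penalize leaf weights;
# gamma sets the minimum loss reduction required to make a split
model = xgb.XGBRegressor(reg_alpha=0.1, reg_lambda=1.0, gamma=0.0, max_depth=4)

# NaNs are routed down a learned default branch at each split,
# so no imputation is needed before fitting
X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
model.fit(X, y)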

Python Code for Predicting NIFTY Close and Direction

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score, explained_variance_score, mean_absolute_error
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
data = pd.read_csv("NIFTY_EOD.csv")

# Split the data into features and target
y = data['close_forecast']
X = data.drop(columns=['Ticker', 'Date/Time', 'close_forecast'])


# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
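# Note: a random split mixes past and future bars between training
# and testing; for a stricter time-series evaluation you could split
# chronologically instead (pass shuffle=False to train_test_split).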

# Train an XGBoost model
model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42, booster='gbtree')
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Add predicted prices to test data
predicted_prices = X_test.copy()
predicted_prices['Close'] = y_pred

# Calculate evaluation metrics
mape = mean_absolute_percentage_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5  # root mean squared error (avoids the deprecated squared=False argument)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
explained_var = explained_variance_score(y_test, y_pred)

# Print the evaluation metrics
print("MAPE:", mape)
print("Mean squared error:", mse)
print("Root mean squared error:", rmse)
print("Mean absolute error:", mae)
print("R-squared:", r2)
print("Explained variance:", explained_var)





# Predict the next day close and direction
next_day = X.tail(1)
next_day_pred = model.predict(next_day)
next_day_close = next_day_pred[0]
print("Next day predicted close:", next_day_close)




# Store the predicted vs actual values and direction in a separate csv
predictions = pd.DataFrame({'Date/Time': X_test.index, 'Actual': y_test.values, 'Predicted': y_pred})
predictions.to_csv("NIFTY_xgboost_predictions.csv", index=False)

# Plot feature importance using Matplotlib
fig, ax = plt.subplots(figsize=(10, 8))
xgb.plot_importance(model, ax=ax, importance_type='gain')
plt.title('Feature Importance')
plt.show()

The provided Python code imports the necessary libraries, loads the NIFTY historical data, and preprocesses the dataset. It then splits the data into training and test sets and trains an XGBoost model. The model’s performance is evaluated using various metrics, including mean absolute percentage error (MAPE), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), R-squared, and explained variance. Finally, the code predicts the next day’s NIFTY close; the expected direction follows from comparing the predicted close against the latest actual close, as sketched below.
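Since the script forecasts only the closing price, a small add-on can turn the forecast into a direction call and measure directional accuracy. The snippet below is a minimal sketch, assuming it runs after the code above (it reuses X, X_test, y_test, y_pred, and next_day_close) and that PClose holds the previous day’s close, as the dataset’s column list suggests:

import numpy as np

# Actual vs predicted direction on the test set
prev_close = X_test['PClose'].values
actual_dir = np.sign(y_test.values - prev_close)
predicted_dir = np.sign(y_pred - prev_close)

# Directional accuracy: fraction of test rows where the predicted
# up/down move matches the actual up/down move
directional_accuracy = (actual_dir == predicted_dir).mean()
print("Directional accuracy:", directional_accuracy)

# Direction call for the next-day forecast
last_close = X['close'].iloc[-1]
print("Next day direction:", "UP" if next_day_close > last_close else "DOWN")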

Python Output

MAPE: 0.012514069979435037
Mean squared error: 7843.5028405418425
Root mean squared error: 88.56355255149741
Mean absolute error: 52.471655232747395
R-squared: 0.9996397979447728
Explained variance: 0.9996400254819038
Next day predicted close: 17407.646

Output and Interpretation

The output metrics for the XGBoost prediction algorithm provide valuable insights into the model’s performance in predicting the NIFTY close prices and market direction. Let’s analyze these metrics in detail (a toy numeric check follows the list):

  1. MAPE (Mean Absolute Percentage Error): 0.012514069979435037
    • MAPE is a measure of prediction accuracy in a forecasting model, expressed as a percentage. It is calculated by taking the average of the absolute percentage errors. The value of 0.0125 means that, on average, the model’s predictions deviate by about 1.25% from the actual values.
  2. Mean Squared Error (MSE): 7843.5028405418425
    • MSE is a measure of the difference between the predicted and actual values. It is calculated by taking the average of the squared differences between the predictions and actual values. A lower MSE indicates a better fit of the model. In this case, the MSE is 7843.5.
  3. Root Mean Squared Error (RMSE): 88.56355255149741
    • RMSE is the square root of MSE. It is another measure of the differences between predicted and actual values, and it is useful because it has the same unit as the target variable. In this case, the RMSE is 88.56, which means that, on average, the model’s predictions are off by about 88.56 units from the actual values.
  4. Mean Absolute Error (MAE): 52.471655232747395
    • MAE is the average of the absolute differences between the predicted and actual values. It is a measure of prediction accuracy that is less sensitive to large errors than MSE or RMSE. In this case, the MAE is 52.47, which means that, on average, the model’s predictions are off by about 52.47 units from the actual values.
  5. R-squared: 0.9996397979447728
    • R-squared, also known as the coefficient of determination, is a measure of the proportion of the variance in the target variable that is predictable from the input features. It typically ranges from 0 to 1 (and can be negative for very poor fits), with 1 indicating a perfect fit. In this case, the R-squared value of 0.9996 suggests that the model explains approximately 99.96% of the variation in the target variable.
  6. Explained Variance: 0.9996400254819038
    • Explained variance is a measure of how well the model captures the variance in the target variable. It ranges from 0 to 1, with a higher value indicating better performance. In this case, the explained variance of 0.9996 suggests that the model captures approximately 99.96% of the variance in the target variable.
  7. Next day predicted close: 17407.646
    • This is the predicted closing price of the stock for the next day based on the model’s forecast.
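To make these definitions concrete, here is a toy check of the same sklearn metric functions on a tiny made-up array (the numbers are purely illustrative and unrelated to the NIFTY model):

import numpy as np
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, mean_absolute_error, r2_score)

# Made-up actual and predicted closes, for illustration only
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 101.0, 102.0, 104.0])

print(mean_absolute_percentage_error(actual, predicted))  # mean(|error| / actual) ≈ 0.0098
print(mean_squared_error(actual, predicted))              # mean(error^2) = 1.0
print(mean_squared_error(actual, predicted) ** 0.5)       # RMSE = 1.0
print(mean_absolute_error(actual, predicted))             # mean(|error|) = 1.0
print(r2_score(actual, predicted))                        # 1 - SS_res/SS_tot ≈ 0.714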


Feature Importance

Feature importance values represent the relative contribution of each feature to the model’s prediction. In XGBoost, they are computed from how often a feature is used to split the data across all the decision trees (“weight”) and how much each of those splits improves the model’s objective (“gain”).
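If you want the raw numbers behind the plot, the trained model exposes them directly. A quick sketch using the same model object from the code above:

# Gain-based importance per feature, matching the plot above
gain = model.get_booster().get_score(importance_type='gain')
for feature, score in sorted(gain.items(), key=lambda kv: kv[1], reverse=True):
    print(feature, round(score, 2))

# The sklearn-style attribute gives normalized importances,
# ordered the same way as X.columns
print(dict(zip(X.columns, model.feature_importances_)))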

The XGBoost algorithm offers significant advantages over linear regression for predicting stock prices and market direction, particularly in handling non-linear relationships and providing robustness against noise and outliers. The provided Python code demonstrates how to use XGBoost for predicting the next day’s NIFTY close, and while the model performs well on close prices, it may require further optimization, and stricter out-of-sample testing, to reliably predict market direction.
