
AdaBoost – Ensembling Methods in Machine Learning for Stock Market Prediction using Python


Predicting the stock market has been a challenging task due to its complex, dynamic, and non-linear nature. Many researchers have tried various machine learning techniques to improve the accuracy and reliability of stock market predictions. One promising approach is the use of ensembling methods, which combine multiple models to achieve better performance.

In this article, we will discuss how ensembling methods, specifically bagging, boosting, and stacking, can be applied to enhance stock market prediction. We will then look at how AdaBoost improves stock market prediction using a combination of three machine learning algorithms, Linear Regression (LR), K-Nearest Neighbors (KNN), and Support Vector Regression (SVR), and how these models are combined through the ensemble method AdaBoostRegressor to improve overall prediction accuracy.

Bagging

Bagging, or Bootstrap Aggregating, is an ensemble method that involves generating multiple models from different bootstrapped subsets of the training data. These models are trained independently, and their predictions are combined through averaging (for regression problems) or voting (for classification problems). Bagging helps reduce the variance in predictions by averaging out the errors from multiple models. In stock market prediction, bagging can be applied by training multiple models, such as decision trees or neural networks, on different subsets of historical stock data. The final prediction is obtained by aggregating the individual model predictions, resulting in a more stable and accurate forecast.
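
As a minimal sketch of this idea, scikit-learn's BaggingRegressor (parameter names here follow scikit-learn 1.2+) can bag decision trees; the feature matrix X and target y below are synthetic placeholders standing in for historical stock features:

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic placeholder data standing in for technical-indicator features
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 50 shallow trees, each fit on a different bootstrapped sample;
# predictions are averaged across all trees
bagging = BaggingRegressor(estimator=DecisionTreeRegressor(max_depth=5),
                           n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
y_pred = bagging.predict(X_test)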

Boosting

Boosting is another ensemble method that focuses on reducing bias in the model by iteratively adjusting the weights of misclassified data points. This technique creates a sequence of weak learners, each attempting to correct the errors made by its predecessor. The final prediction is a weighted combination of the individual weak learners. For stock market prediction, boosting techniques like AdaBoost or Gradient Boosting can be employed to train a series of models on historical stock data. The boosting algorithm assigns higher importance to instances where previous models have made incorrect predictions, ensuring that subsequent models focus on these challenging cases. This results in an overall improvement in prediction accuracy.
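
A quick sketch with scikit-learn's GradientBoostingRegressor (again on synthetic placeholder data) shows the idiom; each new shallow tree is fit to the residual errors of the ensemble built so far:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))            # placeholder features
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each shallow tree corrects the residual errors left by its predecessors
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=42)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)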

Stacking

Stacking, also known as Stacked Generalization, is an ensemble method that combines multiple models with different learning algorithms to maximize their complementary strengths. In stacking, base models are trained on the same dataset, and their predictions are used as input for a higher-level model, called the meta-model. The meta-model learns how to optimally combine the base model predictions to generate the final output. For stock market prediction, one can train various base models, such as linear regression, support vector machines, and neural networks, on historical stock data. A meta-model, like a logistic regression or another neural network, can then be trained on these base model predictions to achieve a more accurate and robust forecast.
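
A minimal stacking sketch with scikit-learn's StackingRegressor might look like the following (synthetic placeholder data again; the base models mirror those used later in this article):

import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))            # placeholder features
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [('lr', LinearRegression()),
               ('knn', KNeighborsRegressor(n_neighbors=3)),
               ('svr', SVR(kernel='rbf'))]
# Ridge acts as the meta-model, learning how to weight the base predictions
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)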

Ensembling methods in machine learning, such as bagging, boosting, and stacking, have shown great potential in improving the accuracy and reliability of stock market predictions. By combining multiple models and leveraging their complementary strengths, ensemble techniques can mitigate the shortcomings of individual models, resulting in a more robust and accurate prediction.

AdaBoost – Ensembling Method

AdaBoost, short for Adaptive Boosting, is an ensemble learning method that combines multiple weak learners to form a stronger, more accurate model. Initially designed for classification problems, it can be adapted for regression tasks like stock market price prediction. The algorithm works iteratively, training a sequence of weak learners (such as linear regression) and updating their weights based on the prediction errors. The final model is a weighted combination of these weak learners.
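
To make the mechanics concrete, below is a stripped-down, illustrative AdaBoost.R2-style training loop. It is not scikit-learn's exact implementation (which adds refinements such as weighted resampling and a choice of loss functions), but it shows how the sample weights adapt to prediction errors:

import numpy as np
from sklearn.linear_model import LinearRegression

def adaboost_r2_sketch(X, y, n_estimators=10):
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_estimators):
        model = LinearRegression()
        model.fit(X, y, sample_weight=w)  # weak learner fit on weighted data
        err = np.abs(model.predict(X) - y)
        err = err / (err.max() + 1e-12)   # normalised loss in [0, 1]
        avg_loss = np.sum(w * err)
        if avg_loss >= 0.5:               # learner no better than chance: stop
            break
        beta = avg_loss / (1.0 - avg_loss)
        w = w * beta ** (1.0 - err)       # shrink weights of well-fit points so
        w = w / w.sum()                   # hard cases carry relatively more weight
        learners.append(model)
        alphas.append(np.log(1.0 / beta)) # confidence weight of this learner
    return learners, np.array(alphas)

The canonical AdaBoost.R2 prediction is a weighted median of the learners' outputs; the weighted average used later in this article's ensemble_predict function is a common simplification.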

Here’s how AdaBoost can help in stock market price prediction:

  1. Enhancing predictive power: By combining multiple weak learners, AdaBoost can capture complex relationships in the stock market data, potentially resulting in more accurate predictions.
  2. Handling noisy data: Stock market data can be noisy, with many irrelevant features and outliers. AdaBoost’s adaptive learning mechanism can be more robust against noise, focusing on the most informative features and down-weighting the impact of outliers.
  3. Interpretability: The implicit feature selection performed by AdaBoost can result in a more interpretable model, making it easier to identify the most relevant factors driving stock market price movements.
  4. Versatility: AdaBoost can be combined with various base learners, making it a flexible method that can be tailored to different stock market prediction problems.
  5. Scalability: The algorithm can be parallelized and is relatively fast to train, making it scalable to large datasets.
[Figure: AdaBoost actual vs. predicted stock price]

Advantages of using AdaBoost for stock market price prediction include:

Improved accuracy: The ensemble approach can potentially provide better predictive accuracy compared to individual base models, reducing the chances of overfitting and capturing a broader range of patterns in the data.

Robustness to noise: The iterative nature of the AdaBoost algorithm enables it to be more robust against noise and outliers, improving the overall performance on diverse data distributions.

Adaptive learning: AdaBoost assigns higher weights to misclassified or poorly predicted instances in each iteration, encouraging subsequent models to focus more on these challenging examples.

Simple base learners: AdaBoost can work effectively with simple base models, such as linear regression, making the overall ensemble computationally efficient while still achieving good performance.

Feature selection: AdaBoost can implicitly perform feature selection by focusing on the most informative features during the learning process, resulting in a more interpretable and efficient final model.

AdaBoost can be sensitive to noisy data and outliers, so it’s crucial to preprocess and clean the data carefully before using it for prediction.
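
One illustrative way to do that (the function and column handling here are placeholders, not part of the dataset used below) is to drop missing rows and clip extreme values to quantile bounds before scaling:

import pandas as pd

def clean_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna()                      # drop rows with missing values
    numeric = df.select_dtypes(include='number').columns
    lower = df[numeric].quantile(0.01)    # 1st and 99th percentile bounds
    upper = df[numeric].quantile(0.99)
    df[numeric] = df[numeric].clip(lower=lower, upper=upper, axis=1)  # cap outliers
    return df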

AdaBoost Ensembling Using a Combination of Linear Regression, Support Vector Regression, and K-Nearest Neighbors – Python Source Code

This Python script uses various machine learning algorithms to predict the closing prices of a stock from its historical dataset of around 34 features (technical indicators), stored in the file NIFTY_EOD.csv.

The code imports the necessary libraries and modules for data manipulation, visualization, and machine learning. The primary algorithms used for prediction are Linear Regression, K-Nearest Neighbors (KNN), and Support Vector Regression (SVR).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import mean_absolute_percentage_error

from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import AdaBoostRegressor
import joblib
import os

# Seed NumPy's RNG so the train/test split and results are reproducible
np.random.seed(42)

# Load the data
data = pd.read_csv("NIFTY_EOD.csv")

# Split the data into features and target
y = data['close_forecast']
X = data.drop(columns=['Ticker','Date/Time','close_forecast'], axis=1)

X_copy = X  # keep the unscaled DataFrame for plotting later

# Normalize the features
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train one AdaBoost Regressor per base learner
# (scikit-learn >= 1.2 uses estimator=; older versions use base_estimator=)
lr_model = AdaBoostRegressor(estimator=LinearRegression(), n_estimators=50, random_state=42)
lr_model.fit(X_train, y_train)

knn_model = AdaBoostRegressor(estimator=KNeighborsRegressor(n_neighbors=3), n_estimators=50, random_state=42)
knn_model.fit(X_train, y_train)

svr_model = AdaBoostRegressor(estimator=SVR(kernel='rbf', C=1e3, gamma=0.3), n_estimators=50, random_state=42)
svr_model.fit(X_train, y_train)

# Create the output directory first so joblib.dump does not fail
os.makedirs('ensemble', exist_ok=True)
joblib.dump(lr_model, './ensemble/lr_model.joblib')
joblib.dump(knn_model, './ensemble/knn_model.joblib')
joblib.dump(svr_model, './ensemble/svr_model.joblib')

def ensemble_predict(X):
    lr_pred = lr_model.predict(X)
    knn_pred = knn_model.predict(X)
    svr_pred = svr_model.predict(X)
    # Get the importance weight of each regressor's first estimator
    lr_weight = lr_model.estimator_weights_[0]
    knn_weight = knn_model.estimator_weights_[0]
    svr_weight = svr_model.estimator_weights_[0]
    # Compute the weighted average of the three predictions
    weighted_pred = (lr_weight * lr_pred + knn_weight * knn_pred + svr_weight * svr_pred)
    return weighted_pred / (lr_weight + knn_weight + svr_weight)

# Calculate predictions
y_pred_close = ensemble_predict(X_test)

# Calculate accuracy metrics
mse = mean_squared_error(y_test, y_pred_close)
mape = mean_absolute_percentage_error(y_test, y_pred_close) * 100
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_close)

print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("Mean Absolute Percentage Error (MAPE):", mape)
print("R2 Score:", r2)


# Actual vs Predicted Results

train_split = int(X_copy.shape[0] * 0.8)
actual = X_copy[train_split:]['Close']  # last 20% of rows, unscaled
actual = actual.reset_index(drop=True)  # reset the index
actual = actual.to_frame(name='Close')  # convert the Series to a DataFrame named 'Close'

next_day_pred = X_copy[train_split:]
next_day_pred = scaler.transform(next_day_pred)  # apply the same scaling used in training
next_day_forecast = ensemble_predict(next_day_pred)
next_day_forecastdata = pd.DataFrame(next_day_forecast, columns=['predicted'])

# Print results
print("Previous Day Close:", actual.iloc[-1]['Close'])
print("Predicted Next Day Close:", next_day_forecastdata.iloc[-1]['predicted'])


#print("length - actual :" + str(actual.shape[0]) + " predicted :" + str(y_pred_close.shape[0]))

# Plot the predicted vs actual close values
plt.figure(figsize=(10, 5))
plt.plot(actual, label='Actual Close')
plt.plot(next_day_forecastdata, label='Predicted Close')
plt.xlabel('Time')
plt.ylabel('Price')
plt.title('Actual vs Predicted Close')
plt.legend()
plt.show()

The code uses the ensemble method to combine predictions from three different models (Linear Regression, K-Nearest Neighbors, and Support Vector Regression). The ensemble_predict function computes the weighted average of the predictions based on the importance weights of the models. Finally, the script visualizes the actual and predicted closing prices, allowing you to compare the model’s performance.

Python Output

Mean Squared Error (MSE): 7696.585379615255
Root Mean Squared Error (RMSE): 87.73018511102809
Mean Absolute Percentage Error (MAPE): 2.105069920226974
R2 Score: 0.9996662213566196
Previous Day Close: 17599.15
Predicted Next Day Close: 17715.394023985627

The Python script calculates several evaluation metrics to assess the performance of the ensemble model for predicting stock closing prices. Here’s an explanation of the output:

  1. Mean Squared Error (MSE): 7696.585379615255 MSE is the average of the squared differences between the actual and predicted closing prices. It’s a common measure for evaluating regression models’ performance. A lower value indicates better performance, with 0 being a perfect fit. In this case, the MSE is 7696.59.
  2. Root Mean Squared Error (RMSE): 87.73018511102809 RMSE is the square root of the MSE. It measures the average deviation of the predicted values from the actual values. The lower the RMSE, the better the model’s performance. In this case, the RMSE is 87.73, which means, on average, the predictions deviate from the actual values by 87.73 units.
  3. Mean Absolute Percentage Error (MAPE): 2.105069920226974 MAPE is the average of the absolute percentage errors between the actual and predicted closing prices. It’s expressed as a percentage and is useful for comparing errors across different scales. A lower MAPE indicates better performance. In this case, the MAPE is 2.11%, which means the predictions deviate from the actual values by an average of 2.11%.
  4. R2 Score: 0.9996662213566196 The R2 score, also known as the coefficient of determination, measures how well the predicted values fit the actual data. It typically ranges from 0 to 1 (and can be negative when a model fits worse than simply predicting the mean), with 1 indicating a perfect fit and 0 meaning the model explains none of the variability in the data. In this case, the R2 score is 0.9997, which suggests that the ensemble model explains approximately 99.97% of the variability in the closing prices.
  5. Previous Day Close: 17599.15 This is the actual closing price of the stock on the last day of the dataset.
  6. Predicted Next Day Close: 17715.394023985627 This is the predicted closing price of the stock for the next day, as estimated by the ensemble model.
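
For reference, all four error metrics can be computed directly with NumPy. This sketch assumes y_test and y_pred_close are the arrays produced by the script above:

import numpy as np

errors = np.asarray(y_test) - y_pred_close
mse = np.mean(errors ** 2)                    # average squared error
rmse = np.sqrt(mse)                           # same units as the price itself
mape = np.mean(np.abs(errors / np.asarray(y_test))) * 100  # percentage error
ss_res = np.sum(errors ** 2)                  # residual sum of squares
ss_tot = np.sum((np.asarray(y_test) - np.mean(y_test)) ** 2)
r2 = 1 - ss_res / ss_tot                      # coefficient of determination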

The output suggests that the ensemble model performs well in predicting stock closing prices, as evidenced by the low error metrics and high R2 score. The plot of actual vs. predicted closing prices would provide a visual representation of the model’s performance over time.



2 Replies to “AdaBoost – Ensembling Methods in Machine Learning for Stock…”

  1. Good Morning Sir,

    I'm getting the error: " File "C:\bktesT\ML\ADABoostRegressor.py", line 49, in
    joblib.dump(lr_model, './ensemble/lr_model.joblib') # <-- Change to lr_model
    FileNotFoundError: [Errno 2] No such file or directory: './ensemble/lr_model.joblib'". I've created the input from an AmiBroker exploration, renamed it to Nifty_EOD.csv, and saved both the Python script and the CSV in the same folder. Can you please help?

    Thanku

    1. The error message you’re encountering indicates that the file or directory you’re trying to access does not exist. In this case, the error occurs when you’re trying to save a machine learning model using joblib.

      Here’s a breakdown of the error message:

      File "C:\bktesT\ML\ADABoostRegressor.py", line 49: This tells you that the error is occurring in the file ADABoostRegressor.py at line 49.

      joblib.dump(lr_model, './ensemble/lr_model.joblib'): This line of code is where the error is originating. You are trying to save the lr_model using the joblib.dump function, and you are specifying the file path as './ensemble/lr_model.joblib'.

      FileNotFoundError: [Errno 2] No such file or directory: './ensemble/lr_model.joblib': This part of the error message indicates that the file or directory './ensemble/lr_model.joblib' does not exist.

      To resolve this issue, you need to make sure that the specified directory and file path exist before you try to save the model using joblib.dump. Here are a few steps you can take to address the problem:

      Check Directory Existence: Verify that the 'ensemble' directory exists in the current working directory where your script is being executed.

      Create Directory: If the 'ensemble' directory doesn't exist, you can create it using code similar to this:

      import os

      # Create the 'ensemble' directory if it doesn't exist
      os.makedirs('ensemble', exist_ok=True)

      Save Model: Once you’ve ensured that the directory exists, you can proceed to save the model:

      joblib.dump(lr_model, './ensemble/lr_model.joblib')

      Make sure you adapt the code to match your use case and directory structure. This should help you avoid the FileNotFoundError by ensuring that the necessary directory and file path exist before saving the model.

