Pandas is an open-source Python library that is widely used for data manipulation and analysis. One of the most popular features of Pandas is the DataFrame. It is a two-dimensional table-like data structure that allows you to store and manipulate data in a way that is similar to a spreadsheet.
In this tutorial, we will be learning how to retrieve the stock data in pandas dataframe format, dataframe structure, selecting columns, selecting rows, filtering data, adding a column, removing a column, retrieving current trading price & prev day close price, and visualizing data using a simple line chart.
A DataFrame consists of rows and columns, where each row represents a unique observation or record, and each column represents a variable or feature of that observation. You can think of a DataFrame as a spreadsheet, where each row is a record, and each column is a field.
In this tutorial, we will learn how to create and manipulate DataFrames using yfinance and plot using seaborn python library.
Before we begin, you will need to have the following:
- Python is installed on your computer.
- Jupyter Notebook (VS code Editor or Google Colab or any other IDE installed).
- Basic knowledge of Python syntax.
use the pip command as shown below to install the python libraries
pip install pandas yfinance seaborn matplotlib
Before we create a DataFrame, we need to import the necessary libraries. We will be using Pandas, yfinance seaborn, and matplotlib.
import pandas as pd import yfinance as yf import seaborn as sns import matplotlib.pyplot as plt
Creating a DataFrame
There are several ways to create a DataFrame. In this tutorial, In this tutorial, we will use yfinance to fetch stock historical data and create DataFrames.
Creating a DataFrame from yfinance seaborn
yfinance is a Python library that allows you to download financial data from Yahoo Finance. To create a DataFrame from yfinance , we can use the
df = yf.download('ZOMATO.NS', start='2021-01-01', end='2023-02-25') # Display the first 5 rows of the DataFrame df.head()
This will download the stock data for Zomato from Yahoo Finance and create a DataFrame.
To view the entire DataFrame, you can use the
This will display the entire DataFrame.
To select a specific column in the DataFrame, you can use the column name in square brackets.
# Select the 'close' column df['Close']
To select a specific row in the DataFrame, you can use the
To filter the data in the DataFrame, you can use boolean indexing.
# Filter the data where the 'Close' price is greater than 150 df[df['Close'] > 150]
This will filter the data where the ‘Close’ price is greater than 150.
Adding a Column
To add a new column to the DataFrame, you can use the square bracket notation to assign a new column name and the values for that column.
# Calculate the daily return and add it as a new column df['Daily Return'] = df['Close'].pct_change() # Display the first 5 rows of the DataFrame df.head()
This will calculate the daily return and add it as a new column to the DataFrame.
Print Column Headers
To print the column headers use df.columns
Resting the Index of the Dataframe
It seems that the ‘Date’ column is not present in your dataframe. This might be due to the fact that when we download data using the
yf.download method, the date column is automatically set as the index of the dataframe.
To confirm this, you can check the index of your dataframe by running:
This will print the index of the dataframe, which should include dates.
If you want to add a separate ‘Date’ column to your dataframe, you can reset the index of the dataframe using the
df = df.reset_index()
To visualize data in the DataFrame, you can use the
plt functions. Ensure Date and Close is present in the column where Date column is used for plotting x-axis and the close column as the y-axis co-ordinates.
# Create a line plot of the 'Close' price sns.lineplot(x='Date', y='Close', data=df) plt.show()
Retrieving the Current Close and Previous Closing Value
To retrieve the current close and previous closing value from the dataframe, we can use the
iloc function of pandas. The
iloc function is used to select rows and columns from the dataframe by integer index positions.
Here’s an example code snippet:
# Retrieve the current close and previous closing value current_close = df['Close'].iloc[-1] previous_close = df['Close'].iloc[-2] # Print the values print("Current Close:", current_close) print("Previous Close:", previous_close)
Current Close: 54.45000076293945 Previous Close: 54.95000076293945
In this code, we are first selecting the ‘Close’ column from the dataframe using
df['Close']. We then use the
iloc function to select the last row (
iloc[-1]) and second last row (
iloc[-2]) from this column to get the current and previous closing values respectively. Finally, we print the values using the
Note that the
iloc function uses 0-based indexing, so
iloc[-1] refers to the last row,
iloc[-2] refers to the second last row, and so on.
Alternatively, we can also use
stock = yf.Ticker("ZOMATO.NS") #retrieve daily data df = stock.history(period="max", interval="1d") close = df.Close.round(2) # Retrieve and print the current close and previous closing value and 5 day back close price print("Current Price :",close[-1]) print("Previous day closing Price :",close[-2]) print("5 Day Back closing Price :",close[-6])
Current Price : 54.45 Previous day closing Price : 54.95 5 Day Back closing Price : 51.8
df.drop function is used to remove the dataframe columns. The first print statement will print all the columns of the dataframe, including the ‘Date’ column. The second print statement will print the remaining columns after removing columns 6 and 7 (i.e. ‘Dividends’ and ‘Stock Splits’).
#Retrive Infy Data stock = yf.Ticker("INFY.NS") #retrieve daily data df = stock.history(period="max", interval="1d") close = df.Close.round(2) #reset the index df = df.reset_index() #print the columns print(df.columns) #remove the columns 6 & 7 df.drop(df.columns[[6,7]], axis=1, inplace=True) #print the remaining columns print(df.columns)