Pandas is an open-source Python library that is widely used for data manipulation and analysis. One of the most popular features of Pandas is the DataFrame. It is a two-dimensional table-like data structure that allows you to store and manipulate data in a way that is similar to a spreadsheet.
In this tutorial, we will be learning how to retrieve the stock data in pandas dataframe format, dataframe structure, selecting columns, selecting rows, filtering data, adding a column, removing a column, retrieving current trading price & prev day close price, and visualizing data using a simple line chart.
A DataFrame consists of rows and columns, where each row represents a unique observation or record, and each column represents a variable or feature of that observation. You can think of a DataFrame as a spreadsheet, where each row is a record, and each column is a field.
In this tutorial, we will learn how to create and manipulate DataFrames using yfinance and plot using seaborn python library.
Prerequisites
Before we begin, you will need to have the following:
- Python is installed on your computer.
- Jupyter Notebook (VS code Editor or Google Colab or any other IDE installed).
- Basic knowledge of Python syntax.
Installing Libraries
use the pip command as shown below to install the python libraries
pip install pandas yfinance seaborn matplotlib
Importing Libraries
Before we create a DataFrame, we need to import the necessary libraries. We will be using Pandas, yfinance seaborn, and matplotlib.
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
Creating a DataFrame
There are several ways to create a DataFrame. In this tutorial, In this tutorial, we will use yfinance to fetch stock historical data and create DataFrames.
Creating a DataFrame from yfinance seaborn
yfinance is a Python library that allows you to download financial data from Yahoo Finance. To create a DataFrame from yfinance , we can use the yf.download()
function.
df = yf.download('ZOMATO.NS', start='2021-01-01', end='2023-02-25')
# Display the first 5 rows of the DataFrame
df.head()
Output
This will download the stock data for Zomato from Yahoo Finance and create a DataFrame.
Viewing DataFrame
To view the entire DataFrame, you can use the df
function.
df
This will display the entire DataFrame.
Selecting Columns
To select a specific column in the DataFrame, you can use the column name in square brackets.
# Select the 'close' column
df['Close']
Selecting Rows
To select a specific row in the DataFrame, you can use the head
function.
Filtering Data
To filter the data in the DataFrame, you can use boolean indexing.
# Filter the data where the 'Close' price is greater than 150
df[df['Close'] > 150]
This will filter the data where the ‘Close’ price is greater than 150.
Adding a Column
To add a new column to the DataFrame, you can use the square bracket notation to assign a new column name and the values for that column.
# Calculate the daily return and add it as a new column
df['Daily Return'] = df['Close'].pct_change()
# Display the first 5 rows of the DataFrame
df.head()
This will calculate the daily return and add it as a new column to the DataFrame.
Print Column Headers
To print the column headers use df.columns
print(df.columns)
Resting the Index of the Dataframe
It seems that the ‘Date’ column is not present in your dataframe. This might be due to the fact that when we download data using the yf.download
method, the date column is automatically set as the index of the dataframe.
To confirm this, you can check the index of your dataframe by running:
print(df.index)
This will print the index of the dataframe, which should include dates.
If you want to add a separate ‘Date’ column to your dataframe, you can reset the index of the dataframe using the reset_index()
method:
df = df.reset_index()
Visualizing Data
To visualize data in the DataFrame, you can use the sns
and plt
functions. Ensure Date and Close is present in the column where Date column is used for plotting x-axis and the close column as the y-axis co-ordinates.
# Create a line plot of the 'Close' price
sns.lineplot(x='Date', y='Close', data=df)
plt.show()
Retrieving the Current Close and Previous Closing Value
To retrieve the current close and previous closing value from the dataframe, we can use the iloc
function of pandas. The iloc
function is used to select rows and columns from the dataframe by integer index positions.
Here’s an example code snippet:
# Retrieve the current close and previous closing value
current_close = df['Close'].iloc[-1]
previous_close = df['Close'].iloc[-2]
# Print the values
print("Current Close:", current_close)
print("Previous Close:", previous_close)
Output
Current Close: 54.45000076293945
Previous Close: 54.95000076293945
In this code, we are first selecting the ‘Close’ column from the dataframe using df['Close']
. We then use the iloc
function to select the last row (iloc[-1]
) and second last row (iloc[-2]
) from this column to get the current and previous closing values respectively. Finally, we print the values using the print()
function.
Note that the iloc
function uses 0-based indexing, so iloc[-1]
refers to the last row, iloc[-2]
refers to the second last row, and so on.
Alternatively, we can also use
stock = yf.Ticker("ZOMATO.NS")
#retrieve daily data
df = stock.history(period="max", interval="1d")
close = df.Close.round(2)
# Retrieve and print the current close and previous closing value and 5 day back close price
print("Current Price :",close[-1])
print("Previous day closing Price :",close[-2])
print("5 Day Back closing Price :",close[-6])
Output
Current Price : 54.45
Previous day closing Price : 54.95
5 Day Back closing Price : 51.8
Removing Columns
df.drop function is used to remove the dataframe columns. The first print statement will print all the columns of the dataframe, including the ‘Date’ column. The second print statement will print the remaining columns after removing columns 6 and 7 (i.e. ‘Dividends’ and ‘Stock Splits’).
#Retrive Infy Data
stock = yf.Ticker("INFY.NS")
#retrieve daily data
df = stock.history(period="max", interval="1d")
close = df.Close.round(2)
#reset the index
df = df.reset_index()
#print the columns
print(df.columns)
#remove the columns 6 & 7
df.drop(df.columns[[6,7]], axis=1, inplace=True)
#print the remaining columns
print(df.columns)