Web scraping is an invaluable technique for extracting data from websites, and Beautiful Soup is one of the most popular libraries in Python for this purpose. Whether you’re looking to gather financial data, product prices, or any other type of information, Beautiful Soup offers a robust and flexible solution. In this post, we’ll explore the capabilities of Beautiful Soup and demonstrate how to extract financial data from screener.in, a comprehensive stock analysis platform.
![](https://i0.wp.com/www.marketcalls.in/wp-content/uploads/2024/05/Reliane-Fundamental-Data.png?resize=1024%2C427&ssl=1)
What is Beautiful Soup?
Beautiful Soup is a Python library used for parsing HTML and XML documents. It creates a parse tree from page source code, enabling data extraction from HTML tags. Beautiful Soup works with a parser to provide Pythonic idioms for iterating, searching, and modifying the parse tree.
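To make the idea of a parse tree concrete, here is a minimal sketch that parses a small HTML string with Python’s built-in parser and walks it. The HTML fragment, its class names and the `ratios` id are made up for illustration; they are not taken from screener.in.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML fragment used only for illustration
html = """
<html><body>
  <h1 class="title">Example Ltd</h1>
  <ul id="ratios">
    <li><span class="name">Market Cap</span> <span class="number">1,000</span></li>
    <li><span class="name">Current Price</span> <span class="number">250</span></li>
  </ul>
</body></html>
"""

# Build the parse tree with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Pythonic attribute access: soup.h1 is the first <h1> tag in the tree
print(soup.h1.get_text(strip=True))  # Example Ltd

# Iterate over the <li> children of the <ul> and collect label/value pairs
ratios = {}
for li in soup.find("ul", id="ratios").find_all("li"):
    label = li.find("span", class_="name").get_text(strip=True)
    value = li.find("span", class_="number").get_text(strip=True)
    ratios[label] = value

print(ratios)  # {'Market Cap': '1,000', 'Current Price': '250'}
```

Note that the values come back as strings; Beautiful Soup only extracts text and leaves any numeric conversion to you.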
Key Features of Beautiful Soup
- Ease of Use: Beautiful Soup abstracts many of the complexities involved in parsing HTML, making it simple to use even for beginners.
- Integration with Parsers: It supports several parsers, including Python’s built-in HTML parser, lxml, and html5lib, giving you flexibility in how you process your documents.
- Navigating the Parse Tree: Beautiful Soup provides powerful methods for navigating the parse tree and finding elements using tags, attributes, text, and more.
- Modifying the Parse Tree: You can modify the document by adding, removing, or altering tags and content.
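The features listed above can be sketched in a few lines. This example is only illustrative: the HTML is invented, and it sticks to the built-in `html.parser` so it runs without extra packages. It shows searching by tag and class, by CSS selector and by exact text, then modifying the tree by removing one tag and adding another.

```python
from bs4 import BeautifulSoup

# A made-up fragment used only for illustration
html = (
    '<div class="profile">'
    '<p class="about">Makes widgets.</p>'
    '<script>track()</script>'
    '<a href="/docs">Docs</a>'
    '</div>'
)

# "html.parser" ships with Python; "lxml" or "html5lib" can be passed here
# instead once those packages are installed
soup = BeautifulSoup(html, "html.parser")

# Searching: by tag and class, by CSS selector, and by exact text
about = soup.find("p", class_="about")
link = soup.select_one("div.profile a[href]")
docs = soup.find("a", string="Docs")

print(about.get_text())  # Makes widgets.
print(link["href"])      # /docs
print(docs.get_text())   # Docs

# Modifying: drop the <script> tag and append a new <span> to the paragraph
soup.script.decompose()
note = soup.new_tag("span")
note["class"] = "note"
note.string = "edited"
about.append(note)

print(soup)
```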
Extracting Data from Screener.in
To demonstrate Beautiful Soup’s capabilities, we’ll extract key financial insights from a company’s page on screener.in. We’ll use the Python `requests` library to fetch the webpage and Beautiful Soup to parse and extract the desired data.
Here’s the step-by-step process:

1. Install Necessary Libraries: Ensure you have Beautiful Soup and `requests` installed by running `pip install beautifulsoup4 requests`.
2. Fetch the Webpage: Use `requests` to download the webpage content.
3. Parse the Webpage: Use Beautiful Soup to parse the HTML content.
4. Extract Data: Navigate the parse tree to find and extract the relevant data.
Python Code
Below is a complete example to fetch and extract data from screener.in:
```python
import sys

import requests
from bs4 import BeautifulSoup

stock_ticker = "RELIANCE"
url = f"https://www.screener.in/company/{stock_ticker}/consolidated/"

# Browser-like User-Agent; some sites reject the default python-requests agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:
    print("Successfully fetched the webpage")
else:
    # Stop here: parsing an error page would make the lookups below fail
    sys.exit(f"Failed to fetch the webpage. Status code: {response.status_code}")

soup = BeautifulSoup(response.content, 'html.parser')

def extract_key_insights(soup):
    # These class names match screener.in's markup at the time of writing;
    # if the site changes, find() returns None and .text raises AttributeError
    company_name = soup.find('h1', class_='h2 shrink-text').text.strip()
    current_price = soup.find('div', class_='font-size-18 strong line-height-14').find('span').text.strip()
    market_cap = soup.find('li', {'data-source': 'default'}).find('span', class_='number').text.strip()
    about_section = soup.find('div', class_='company-profile').find('div', class_='sub show-more-box about').text.strip()
    return {
        "Company Name": company_name,
        "Current Price": current_price,
        "Market Cap": market_cap,
        "About": about_section
    }

# Extract data
key_insights = extract_key_insights(soup)

# Print extracted data
print("Company Name:", key_insights['Company Name'])
print("Current Price:", key_insights['Current Price'])
print("Market Cap:", key_insights['Market Cap'])
print("About:", key_insights['About'])
```
Output
```
Successfully fetched the webpage
Company Name: Reliance Industries Ltd
Current Price: ₹ 2,870
Market Cap: 19,41,516
About: Reliance was founded by Dhirubhai Ambani and is now promoted and managed by his elder son, Mukesh Dhirubhai Ambani. Ambani's family has about 50% shareholding in the conglomerate.
```
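The price and market cap above are still text, with a currency symbol and Indian-style digit grouping (19,41,516). If you want to compute with them, a small conversion step helps. The helper `to_number` below is a hypothetical sketch, not part of Beautiful Soup; it simply keeps digits, the decimal point and minus signs, and it does not capture the unit the page displays next to the figure.

```python
import re

def to_number(text):
    """Convert a scraped figure such as '₹ 2,870' or '19,41,516' to a float.

    Keeps only digits, '.' and '-', so currency symbols and both Western
    and Indian comma grouping are discarded the same way. Returns None
    when the text contains no digits at all (e.g. 'N/A').
    """
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    if not re.search(r"\d", cleaned):
        return None
    return float(cleaned)

print(to_number("₹ 2,870"))    # 2870.0
print(to_number("19,41,516"))  # 1941516.0
```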
Explanation of the Code
- Import Libraries: We import `requests` for making HTTP requests and `BeautifulSoup` from `bs4` for parsing HTML.
- Define Variables: Set the stock ticker and construct the URL for the company’s page on screener.in.
- Set Headers: Define headers that mimic a browser request, which can help avoid the website blocking the request.
- Make the Request: Fetch the webpage content using `requests.get()` and check whether the request was successful.
- Parse the Content: Create a Beautiful Soup object to parse the HTML content.
- Extract Insights: Define a function `extract_key_insights` that navigates the parse tree and extracts the company name, current price, market cap, and about section.
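Chained calls such as `soup.find(...).find(...).text` fail with an `AttributeError` as soon as one element is missing, for example when the site changes its layout. One way to make extraction more forgiving is sketched below. The helper `text_or_none` and the trimmed sample HTML are hypothetical; the class names are copied from the code above and may no longer match the live site.

```python
from bs4 import BeautifulSoup

def text_or_none(parent, *steps):
    """Follow a chain of find() calls and return the stripped text, or None.

    Each step is a (tag_name, attrs) pair passed to find(). If any step
    matches nothing, return None instead of raising AttributeError.
    """
    node = parent
    for name, attrs in steps:
        node = node.find(name, attrs=attrs)
        if node is None:
            return None
    return node.get_text(strip=True)

# A trimmed, made-up page that has a price block but no about section
html = """
<h1 class="h2 shrink-text">Example Ltd</h1>
<div class="font-size-18 strong line-height-14"><span>₹ 250</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

insights = {
    "Company Name": text_or_none(soup, ("h1", {"class": "h2 shrink-text"})),
    "Current Price": text_or_none(
        soup,
        ("div", {"class": "font-size-18 strong line-height-14"}),
        ("span", {}),
    ),
    "About": text_or_none(
        soup,
        ("div", {"class": "company-profile"}),
        ("div", {"class": "sub show-more-box about"}),
    ),
}
print(insights)
```

With this approach a missing section shows up as `None` in the results rather than stopping the whole script, which makes layout changes easier to spot and handle.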
Beautiful Soup simplifies web scraping by providing an intuitive way to navigate and manipulate HTML documents. By following the steps outlined above, you can extract financial data from screener.in and adapt the same approach to other websites. As you become more familiar with Beautiful Soup, you’ll discover even more ways to harness its power for your data extraction needs. Happy scraping!