Beautiful Soup: Understanding Web Scraping Capabilities to Extract Data from Screener

Note:

This blog was updated in May 2024 with modern examples. This tutorial is for educational purposes only, to illustrate the capabilities of Beautiful Soup.

Web scraping is an invaluable technique for extracting data from websites, and Beautiful Soup is one of the most popular libraries in Python for this purpose. Whether you’re looking to gather financial data, product prices, or any other type of information, Beautiful Soup offers a robust and flexible solution. In this blog, we’ll explore the capabilities of Beautiful Soup and demonstrate how to extract financial data from screener.in, a comprehensive stock analysis platform.

What is Beautiful Soup?

Beautiful Soup is a Python library used for parsing HTML and XML documents. It creates a parse tree from page source code, enabling data extraction from HTML tags. Beautiful Soup works with a parser to provide Pythonic idioms for iterating, searching, and modifying the parse tree.
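As a rough illustration of what "creating a parse tree" means, here is a minimal sketch that parses a small, made-up HTML fragment (the company name and class names are invented for this example, not taken from any real page):

```python
from bs4 import BeautifulSoup

# A small, made-up HTML fragment used only for illustration.
html = """
<html>
  <body>
    <h1 class="title">Sample Company Ltd</h1>
    <p class="price">Price: <span>100</span></p>
  </body>
</html>
"""

# Build the parse tree with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Tags can be reached as attributes of the tree; .text returns the text inside a tag.
heading = soup.h1.text
price = soup.find("p", class_="price").span.text

print(heading)
print(price)
```

Once the tree is built, every tag in the document can be reached either by walking attributes (soup.h1) or by searching (soup.find).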

Key Features of Beautiful Soup

Ease of Use: Beautiful Soup abstracts many of the complexities involved in parsing HTML, making it simple to use even for beginners.
Integration with Parsers: It supports several parsers, including Python’s built-in HTML parser, lxml, and html5lib, giving you flexibility in how you process your documents.
Navigating the Parse Tree: Beautiful Soup provides powerful methods for navigating the parse tree and finding elements using tags, attributes, text, and more.
Modifying the Parse Tree: You can modify the document by adding, removing, or altering tags and content.
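The searching and modifying features listed above can be sketched on a toy document. The HTML, the data-name attribute, and the class names below are assumptions made up for this example:

```python
from bs4 import BeautifulSoup

# Made-up list of ratios, used only to demonstrate the API.
html = """
<ul id="ratios">
  <li data-name="pe"><span class="name">P/E</span> <span class="number">25.4</span></li>
  <li data-name="roe"><span class="name">ROE</span> <span class="number">9.2</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find_all returns every match; select accepts CSS selectors.
names = [span.text for span in soup.find_all("span", class_="name")]
numbers = [span.text for span in soup.select("li span.number")]

# Attribute lookup: match a tag by one of its attributes.
roe_item = soup.find("li", attrs={"data-name": "roe"})
roe_value = roe_item.find("span", class_="number").text

# Modifying: replace a tag's text, then remove another tag from the tree.
roe_item.find("span", class_="number").string = "10.0"
soup.find("li", attrs={"data-name": "pe"}).decompose()
remaining = [li["data-name"] for li in soup.find_all("li")]

print(names, numbers, roe_value, remaining)
```

The same calls work regardless of which parser built the tree, so swapping "html.parser" for "lxml" or "html5lib" (if installed) should not require changes to the search code.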

Extracting Data from Screener.in

To demonstrate Beautiful Soup’s capabilities, we’ll extract key financial insights from a company’s page on screener.in. We’ll use the Python requests library to fetch the webpage and Beautiful Soup to parse and extract the desired data.

Here’s the step-by-step process:

Install Necessary Libraries: Ensure you have Beautiful Soup and requests installed.

pip install beautifulsoup4 requests

Fetch the Webpage: Use requests to download the webpage content.

Parse the Webpage: Use Beautiful Soup to parse the HTML content.

Extract Data: Navigate the parse tree to find and extract relevant data.

Python Code

Below is a complete example to fetch and extract data from screener.in:

import requests
from bs4 import BeautifulSoup

stock_ticker = "RELIANCE"
url = f"https://www.screener.in/company/{stock_ticker}/consolidated/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    print("Successfully fetched the webpage")
else:
    # Stop here: the lookups below would fail on an error page.
    raise SystemExit(f"Failed to fetch the webpage. Status code: {response.status_code}")

soup = BeautifulSoup(response.content, 'html.parser')

def extract_key_insights(soup):
    company_name = soup.find('h1', class_='h2 shrink-text').text.strip()
    current_price = soup.find('div', class_='font-size-18 strong line-height-14').find('span').text.strip()
    market_cap = soup.find('li', {'data-source': 'default'}).find('span', class_='number').text.strip()
    about_section = soup.find('div', class_='company-profile').find('div', class_='sub show-more-box about').text.strip()

    return {
        "Company Name": company_name,
        "Current Price": current_price,
        "Market Cap": market_cap,
        "About": about_section
    }

# Extract data
key_insights = extract_key_insights(soup)

# Print extracted data
print("Company Name:", key_insights['Company Name'])
print("Current Price:", key_insights['Current Price'])
print("Market Cap:", key_insights['Market Cap'])
print("About:", key_insights['About'])

Output

Successfully fetched the webpage
Company Name: Reliance Industries Ltd
Current Price: ₹ 2,870
Market Cap: 19,41,516
About: Reliance was founded by Dhirubhai Ambani and is now promoted and managed by his elder son, Mukesh Dhirubhai Ambani. Ambani's family has about 50% shareholding in the conglomerate.
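The values printed above are plain strings, so any arithmetic on them needs a conversion step first. Here is a minimal sketch, assuming the formats shown in the sample output (a currency symbol and comma separators); to_number is a hypothetical helper written for this example, not part of Beautiful Soup:

```python
def to_number(text):
    """Convert a scraped string such as '₹ 2,870' or '19,41,516' to a float."""
    # Keep digits, the decimal point, and a minus sign; drop everything else.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ".-")
    return float(cleaned)

price = to_number("₹ 2,870")
market_cap = to_number("19,41,516")

print(price, market_cap)
```

This simple filter ignores units, so it is only suitable when the unit of each field is already known from the page.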

Explanation of the Code

Import Libraries: We import the necessary libraries: requests for making HTTP requests and BeautifulSoup from bs4 for parsing HTML.
Define Variables: Set the stock ticker and construct the URL for the company’s page on screener.in.
Set Headers: Define the headers to mimic a browser request, which helps in avoiding potential blocks by the website.
Make the Request: Fetch the webpage content using requests.get(). Check if the request was successful.
Parse the Content: Create a Beautiful Soup object to parse the HTML content.
Extract Insights: Define a function extract_key_insights to navigate the parse tree and extract the company name, current price, market cap, and about section.
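One caveat with the extraction step: soup.find() returns None when nothing matches, so a chained .text call raises AttributeError if the site's markup changes. A minimal sketch of a more defensive lookup follows; text_or_none is a hypothetical helper, and the HTML is a made-up fragment that only borrows class names from the code above:

```python
from bs4 import BeautifulSoup

def text_or_none(parent, tag_name, **kwargs):
    """Return the stripped text of the first matching tag, or None if absent."""
    if parent is None:
        return None
    tag = parent.find(tag_name, **kwargs)
    return tag.get_text(strip=True) if tag is not None else None

# Made-up fragment: it has a heading but no "about" block.
html = '<div class="company-profile"><h1 class="h2 shrink-text"> Sample Co </h1></div>'
soup = BeautifulSoup(html, "html.parser")

company_name = text_or_none(soup, "h1", class_="h2 shrink-text")
profile = soup.find("div", class_="company-profile")
about = text_or_none(profile, "div", class_="about")

print(company_name)
print(about)
```

With this pattern a missing element shows up as None in the result instead of stopping the script, which makes it easier to spot which selector needs updating.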

Beautiful Soup simplifies the process of web scraping by providing an intuitive way to navigate and manipulate HTML documents. By following the steps outlined above, you can efficiently extract valuable financial data from screener.in or any other website. As you become more familiar with Beautiful Soup, you’ll discover even more ways to harness its power for your data extraction needs. Happy scraping!

3 Replies to “Beautiful Soup: Understanding Web Scraping Capabilities to Extract Data…”

Rowdy007 says:

April 18, 2010 at 10:21 pm

Ok, how do we use it?

Daemonkane says:

April 21, 2010 at 8:18 pm

Ever heard of WebQL? Try that, it might help. You might then take the output from WebQL, do a bit of cleanup, and put it into MySQL. I guess iMacro is also something similar.

Rajandran R says:

April 22, 2010 at 4:26 pm

@Daemonkane

Thanks for the input, Sujith. It looks like WebQL can also aggregate data from the Web, PDF and Word documents, spreadsheets, email repositories, and corporate data stores.
