Extract data with Selenium - how to locate element?


7 Answers


This answer covers static and dynamic web scraping using Selenium and Python: the prerequisites for web scraping, the Selenium locators that can be used to find WebElements on the web page under test, and, once all modules are imported, the scraping itself.

$ pip install requests

Here, I show the Selenium methods for finding multiple elements in a web page [1]. These methods return lists.

!pip install selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
# Path to the local chromedriver executable
driver = webdriver.Chrome(r"C:\Users\Eugenia\Downloads\chromedriver_win32\chromedriver.exe")
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_greenhouse_gas_emissions'
# Load the page in the browser
driver.get(url)
# Column headers of the sortable table.
# Note: Selenium 4.3 removed the find_elements_by_* helpers; the equivalent
# call there is driver.find_elements('class name', 'headerSort').
titles = driver.find_elements_by_class_name('headerSort')
for t in titles:
    print(t.text)
# Empty DataFrame whose columns are the header texts
df = pd.DataFrame(columns=[t.text for t in titles])
# Row header cells (<th>) of the table body
states = driver.find_elements_by_xpath('//table[@class="wikitable sortable plainrowheaders jquery-tablesorter"]/tbody/tr/th')
for idx, s in enumerate(states):
    print('row {}:'.format(idx))
# Data cells of the table body, one list per column
col2 = driver.find_elements_by_xpath('//table[@class="wikitable sortable plainrowheaders jquery-tablesorter"]/tbody/tr/td[1]')
col3 = driver.find_elements_by_xpath('//table[@class="wikitable sortable plainrowheaders jquery-tablesorter"]/tbody/tr/td[2]')
col4 = driver.find_elements_by_xpath('//table[@class="wikitable sortable plainrowheaders jquery-tablesorter"]/tbody/tr/td[3]')
col5 = driver.find_elements_by_xpath('//table[@class="wikitable sortable plainrowheaders jquery-tablesorter"]/tbody/tr/td[4]')
# Fill each DataFrame column with the text of the matching cells
df[df.columns[0]] = [s.text for s in states]
df[df.columns[1]] = [s.text for s in col2]
df[df.columns[2]] = [s.text for s in col3]
df[df.columns[3]] = [s.text for s in col4]
df[df.columns[4]] = [s.text for s in col5]
df.head()
df.to_csv('greenhouse_gas_emis. . .
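
The column-by-column approach above relies on every XPath returning the same number of cells in the same order. A row-wise variant could be sketched as follows, assuming Selenium 4's `find_elements(by, value)` call style. The helper name `table_to_rows` is hypothetical, and the locator strategy is passed as the plain string "xpath", which is the value behind `By.XPATH`.

```python
def table_to_rows(driver, table_xpath):
    """Return the table body as a list of rows, each a list of cell texts.

    Assumes `driver` offers Selenium 4's find_elements(by, value).
    """
    rows = []
    # One element per table row; find_elements returns a list (possibly empty).
    for tr in driver.find_elements("xpath", table_xpath + "/tbody/tr"):
        # "./*" selects the direct children of the row: its <th> and <td> cells.
        cells = tr.find_elements("xpath", "./*")
        if cells:  # skip rows without any cells
            rows.append([cell.text for cell in cells])
    return rows
```

The resulting list of lists keeps each row's cells together, so it can be handed to `pd.DataFrame(rows, columns=...)` once the header texts are known and match the number of cells per row.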

Selenium works by automating a browser to load the website, retrieve the required data, and even take certain actions on the site. Here, we walk through a practical use case: using Selenium to extract data from a website.

from selenium import webdriver

driver = webdriver.Chrome('YOUR_PATH_TO_chromedriver.exe_FILE')
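
The load, retrieve, and clean-up cycle described above could look like the sketch below. It assumes Selenium 4's `find_elements(by, value)` call style; `scrape_texts` and its `make_driver` parameter are hypothetical names, and the driver is passed in as a factory so the sketch does not depend on a particular chromedriver path.

```python
def scrape_texts(make_driver, url, by, value):
    """Open a browser session, load `url`, and return the text of every match.

    `make_driver` is any zero-argument callable that returns a WebDriver.
    """
    driver = make_driver()
    try:
        driver.get(url)                             # load the website
        elements = driver.find_elements(by, value)  # retrieve the data
        return [element.text for element in elements]
    finally:
        driver.quit()                               # always close the browser
```

With Selenium installed, a call might look like `scrape_texts(lambda: webdriver.Chrome('YOUR_PATH_TO_chromedriver.exe_FILE'), url, "tag name", "h1")` on older Selenium versions; the URL and the locator here are placeholders.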

find_element_by_link_text: use the text value of a link to find an element.
find_element_by_xpath: use an XPath expression to find an element.

pip install selenium
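
The `find_element_by_*` helpers named above were removed in Selenium 4.3 in favour of `find_element(by, value)`. As a rough sketch, each legacy name corresponds to one locator strategy. The dictionary and the `find_legacy` helper below are hypothetical names introduced for illustration, and the strings are the values behind the constants of `selenium.webdriver.common.by.By`.

```python
# Legacy helper name -> locator strategy string (the value of the matching By constant).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def find_legacy(driver, legacy_name, value):
    """Locate one element using the strategy behind a legacy helper name."""
    return driver.find_element(LEGACY_TO_STRATEGY[legacy_name], value)
```

An unknown helper name raises `KeyError`, which keeps typos from silently falling back to a different strategy.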

Line 8: note that container1 is a class attribute value identifying two elements, and driver.find_element_by_class_name returns the first element found.

Element identified by id: Content 2 here
Element identified by class: Content here
Element identified by class: Content 3 here
Element identified by tag name: My First Heading
Element identified by xpath: My first paragraph.

Content here
Content 3 here
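
The difference between the first match and all matches can be sketched as below, assuming Selenium 4's `find_element` / `find_elements` call style. The helper name `first_and_all` is hypothetical, and "class name" is the string value of `By.CLASS_NAME`.

```python
def first_and_all(driver, class_name):
    """Return (text of the first match, texts of all matches) for a class name."""
    # find_element returns only the first matching element
    # (and raises NoSuchElementException when nothing matches).
    first = driver.find_element("class name", class_name)
    # find_elements returns every match as a list (empty when nothing matches).
    every = driver.find_elements("class name", class_name)
    return first.text, [element.text for element in every]
```

Judging by the output shown above, calling this with "container1" on that page would presumably yield the first container's text on its own, followed by the texts of both containers.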

How to extract text from a web page using,

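import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ExtractText {

   static WebDriver driver;

   public static void main(String[] args) throws IOException {

      // Point Selenium at the local chromedriver binary (placeholder path)
      System.setProperty("webdriver.chrome.driver", "chrome_driver_path");

      driver = new ChromeDriver();

      try {
         // Load the page before locating anything; "page_url" is a placeholder,
         // the URL is not shown in the original snippet
         driver.get("page_url");

         // Read the visible text of the element located by an absolute XPath
         String output = driver.findElement(By.xpath("/html/body/div[1]/div[5]/div/div/div[1]/div[2]/div[1]/div")).getText();

         // Write the extracted text to a file (placeholder path)
         File destFile = new File("extractedFilePath");
         FileUtils.writeStringToFile(destFile, output, StandardCharsets.UTF_8);
      } finally {
         // Close the browser even if extraction fails
         driver.quit();
      }
   }
}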

elem = driver.find_element_by_xpath('//a[@id="ext-comp-1015"]')
data = elem.text
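
Note that `elem.text` returns only the text that is rendered as visible, so it comes back empty for hidden elements. A possible fallback, sketched under the assumption that the raw DOM text is acceptable, reads the `textContent` attribute instead; `element_text` is a hypothetical helper name.

```python
def element_text(elem):
    """Return the visible text of an element, or its raw DOM text if none is visible."""
    text = elem.text
    if text.strip():
        return text
    # textContent also includes text that is not rendered (hidden via CSS, for example).
    raw = elem.get_attribute("textContent")
    return (raw or "").strip()
```

With the element located above, `data = element_text(elem)` would then replace the direct `elem.text` lookup.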