Scrape specific <td> in HTML table


4 Answers

90%

I am trying to scrape a table using PHP. I have managed to scrape it, but I get everything in the webpage's table, including the collapsed team information. I am unsure how to specify which TDs and/or TRs I want to scrape. It looks like this (not sure if a picture is the best way to post it, but I'm not sure how to show it another way; I highlighted the part that I actually want scraped):

Are there any specific ids associated with table rows? – Rehmat Nov 16 '15 at 14:04

I wrote this based on the manual at the link above; it might get you in the right direction:

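require "simple_html_dom.php";

// file_get_html() already returns a simple_html_dom object,
// so it does not need to be wrapped in a new one.
$html = file_get_html("http://www.premierleague.com/en-gb/matchday/league-table.html");
if (!$html) {
   die("Could not load the page");
}

$rows = array();
// Only the club rows, which skips the collapsed team information.
foreach ($html->find('table.leagueTable tr.club-row') as $tr) {
   $row = array();
   // Only the cells of interest, selected by their class names.
   foreach ($tr->find('td.col-club,td.col-p,td.col-w,td.col-l,td.col-gf,td.col-ga,td.col-gd,td.col-pts') as $td) {
      $row[] = $td->innertext;
   }
   $rows[] = $row;
}
var_dump($rows);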
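The same idea (keep only rows with a given class, then only cells with given classes) can be sketched in Python. This is a minimal sketch, not the answer's method: it uses only the standard-library html.parser so that it runs without extra packages, and the sample markup, club names, the `club-details` row class, and the `ClubRowParser` name are all assumptions made up for illustration. Only `leagueTable`, `club-row`, and the `col-*` class names come from the selectors above.

```python
from html.parser import HTMLParser

# Hypothetical markup that mimics the class names used in the selectors above.
SAMPLE = """
<table class="leagueTable">
  <tr class="club-row"><td class="col-pos">1</td><td class="col-club">Alpha FC</td><td class="col-p">12</td><td class="col-pts">29</td></tr>
  <tr class="club-details"><td class="col-club">collapsed team information</td></tr>
  <tr class="club-row"><td class="col-pos">2</td><td class="col-club">Beta United</td><td class="col-p">12</td><td class="col-pts">26</td></tr>
</table>
"""

# Cell classes we want to keep; everything else is ignored.
WANTED = {"col-club", "col-p", "col-pts"}


class ClubRowParser(HTMLParser):
    """Collects the text of wanted <td> cells inside <tr class="club-row">."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell texts per matching row
        self.in_row = False   # True while inside a tr.club-row
        self.in_cell = False  # True while inside a wanted td

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "tr" and "club-row" in classes:
            self.in_row = True
            self.rows.append([])
        elif tag == "td" and self.in_row and classes & WANTED:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr":
            self.in_row = False


parser = ClubRowParser()
parser.feed(SAMPLE)
rows = parser.rows
print(rows)
```

The row with class `club-details` and the `col-pos` cells are skipped because they match neither filter, which is the behaviour the question asks for.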
88%

An HTML table starts with a table tag, with each row defined by a tr tag and each cell by a td tag. Optionally, thead is used to group the header rows and tbody to group the content rows. To scrape data from an HTML table, we need to find the table we're interested in on the page and then, for each row, read the columns we want to get our data from. In the example table below, the first cell of each body row is a <th> rather than a <td>, so index 0 of a row's <td> list corresponds to the "First" column of the table.

In this case, the table is assigned the classes table and table-striped. Here's the actual HTML code for the table:

<table class="table table-striped">
   <thead>
      <tr>
         <th scope="col">#</th>
         <th scope="col">First</th>
         <th scope="col">Last</th>
         <th scope="col">Handle</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <th scope="row">1</th>
         <td>Mark</td>
         <td>Otto</td>
         <td>@mdo</td>
      </tr>
      <tr>
         <th scope="row">2</th>
         <td>Jacob</td>
         <td>Thornton</td>
         <td>@fat</td>
      </tr>
      <tr>
         <th scope="row">3</th>
         <td>Larry</td>
         <td>the Bird</td>
         <td>@twitter</td>
      </tr>
   </tbody>
</table>
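To make the index point concrete, here is a minimal Python sketch that pulls only the "Handle" column out of the table above. It is an illustration under stated assumptions: it uses the standard-library xml.etree.ElementTree, which works here only because this snippet happens to be well-formed XML; real pages usually need an HTML parser such as BeautifulSoup. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The table from above, compacted to one row per line.
TABLE = """
<table class="table table-striped">
  <thead>
    <tr><th scope="col">#</th><th scope="col">First</th><th scope="col">Last</th><th scope="col">Handle</th></tr>
  </thead>
  <tbody>
    <tr><th scope="row">1</th><td>Mark</td><td>Otto</td><td>@mdo</td></tr>
    <tr><th scope="row">2</th><td>Jacob</td><td>Thornton</td><td>@fat</td></tr>
    <tr><th scope="row">3</th><td>Larry</td><td>the Bird</td><td>@twitter</td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(TABLE)

handles = []
for tr in root.find("tbody").findall("tr"):
    cells = tr.findall("td")       # the leading <th> is not part of this list
    handles.append(cells[2].text)  # index 0 = First, 1 = Last, 2 = Handle

print(handles)
```

Because the row number sits in a <th>, the "Handle" column is at index 2 of the <td> list, not index 3.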
72%

I need to scrape a table off of a webpage and put it into a pandas data frame, but I am not able to do it. Let me first give you a hint of how the table is encoded in the HTML document.

This will give you all the values under <tr>:

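from bs4 import BeautifulSoup

# `data` is assumed to hold the HTML source of the page, fetched beforehand.
bs = BeautifulSoup(data, "lxml")
table_body = bs.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
   # Collect the stripped text of every <td> in this row.
   cols = [x.text.strip() for x in row.find_all('td')]
   print(cols)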
65%

To learn about any element that you wish to scrape, right-click on its text and inspect the element's tags and attributes. Analyzing the outer table, we can see that it has the class wikitable and two tr tags inside tbody. Moving on to the second tr tag of the outer table, let's get the content of all three tables by iterating over each table and its rows. Now, let's get all the links in the page along with their attributes, such as href and title, and their inner text.

<!DOCTYPE html>
<html>

<head>
   <meta charset="utf-8" />
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
</head>

<body>
   <h1 class="heading"> My first Web Scraping with Beautiful soup </h1>
   <p>Let's scrape the website using python. </p>
</body>

</html>
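Collecting every link together with its href, title, and inner text could look roughly like the sketch below. It is only an illustration: it uses the standard-library html.parser instead of Beautiful Soup so that it runs without extra packages, and the page fragment, URLs, titles, and the `LinkParser` name are made up, since the sample page above contains no links.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the URLs and titles are invented for illustration.
PAGE = """
<p>See <a href="https://example.com/a" title="Page A">the first page</a>
and <a href="/b">the <b>second</b> page</a>.</p>
"""


class LinkParser(HTMLParser):
    """Collects href, title and inner text for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []       # one dict per link, in document order
        self.in_link = False  # True while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            self.links.append({
                "href": attrs.get("href"),    # None if the attribute is absent
                "title": attrs.get("title"),
                "text": "",
            })
            self.in_link = True

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the link text.
        if self.in_link:
            self.links[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False


parser = LinkParser()
parser.feed(PAGE)
parser.close()
links = parser.links

for link in links:
    print(link["href"], link["title"], link["text"])
```

A missing attribute comes back as None rather than raising an error, which is convenient when only some links carry a title.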