Fetch data of variables inside script tag in Python or Content added from js


9 Answers

90%


This will do the trick, using the re module to extract the data and then loading it as JSON:

import json
import re
import urllib.request

from bs4 import BeautifulSoup

web = urllib.request.urlopen("http://www.nasdaq.com/quotes/nasdaq-financial-100-stocks.aspx")
soup = BeautifulSoup(web.read(), 'lxml')
# The index 19 is specific to this page's layout
data = soup.find_all("script")[19].string
# DOTALL lets the pattern span a multi-line array literal
p = re.compile(r'var table_body = (.*?);', re.DOTALL)
# search() scans the whole script; match() only matches at the very start
m = p.search(data)
stocks = json.loads(m.group(1))

>>> for stock in stocks:
...     print(stock)
...
['ASPS', 'Altisource Portfolio Solutions S.A.', 116.96, 2.2, 1.92, 86635, 'N', 'N']
['AGNC', 'American Capital Agency Corp.', 23.76, 0.13, 0.55, 3184303, 'N', 'N']
.
.
.
['ZION', 'Zions Bancorporation', 29.79, 0.46, 1.57, 2154017, 'N', 'N']

Otherwise, if you look at the actual HTML you will find that the data is available within the page in the following script tag:

<script type="text/javascript">
   var table_body = [
      ["ATVI", "Activision Blizzard, Inc", 20.92, 0.21, 1.01, 6182877, .1, "N", "N"],
      ["ADBE", "Adobe Systems Incorporated", 66.91, 1.44, 2.2, 3629837, .6, "N", "N"],
      ["AKAM", "Akamai Technologies, Inc.", 57.47, 1.57, 2.81, 2697834, .3, "N", "N"],
      ["ALXN", "Alexion Pharmaceuticals, Inc.", 170.2, 0.7, 0.41, 659817, .1, "N", "N"],
      ["ALTR", "Altera Corporation", 33.82, -0.06, -0.18, 1928706, .0, "N", "N"],
      ["AMZN", "Amazon.com, Inc.", 329.67, 6.1, 1.89, 5246300, 2.5, "N", "N"],
      ....
      ["YHOO", "Yahoo! Inc.", 35.92, 0.98, 2.8, 18705720, .9, "N", "N"]
   ];
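Because a fixed script index such as [19] depends on the page layout, one alternative is to search the raw HTML for the variable name itself. The following is a minimal sketch using only the standard library. The extract_js_array helper and the sample HTML are hypothetical, and it assumes the array literal is valid JSON (double-quoted strings, numbers with a leading digit).

```python
import json
import re


def extract_js_array(html, var_name):
    """Return the array assigned by `var <var_name> = [...];`, or None if absent."""
    # re.DOTALL lets "." span newlines, since the literal usually covers many lines.
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;",
        re.DOTALL,
    )
    match = pattern.search(html)  # search() scans the whole text
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical sample page, shaped like the script tag shown above.
sample_html = """
<script type="text/javascript">
   var table_body = [
      ["ATVI", "Activision Blizzard, Inc", 20.92, 0.21, 1.01, 6182877, "N", "N"],
      ["ADBE", "Adobe Systems Incorporated", 66.91, 1.44, 2.2, 3629837, "N", "N"]
   ];
</script>
"""

stocks = extract_js_array(sample_html, "table_body")
for stock in stocks:
    print(stock[0], stock[2])
```

Numbers written without a leading digit, such as .1, are valid JavaScript but not valid JSON, so json.loads would reject them. A page containing such values would need preprocessing or a JavaScript-aware parser.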
88%

Now I have two options: either get the table content or get the JS variables. Either one would accomplish my task, but unfortunately I don't know how to get them, so please tell me how I can solve either problem. As it stands, I don't know how to get the data of these JS variables.

The problem with this is that the script tag offset is hard-coded and there is no reliable way to locate it within the page. Changes to the page could break your code.

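I have also tried one more thing:

import urllib.request
from bs4 import BeautifulSoup

web = urllib.request.urlopen("my url")
html = web.read()
soup = BeautifulSoup(html, 'lxml')
js = soup.find("script")  # find() returns only the first <script> tag
ss = js.prettify()
print(ss)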

Result:

<script type="text/javascript">
   myPage = 'ETFs';
   sectionId = 'liQuotes'; //section tab
   breadCrumbId = 'qQuotes'; //page
   is_dartSite = "quotes";
   is_dartZone = "news";
   propVar = "ETFs";
</script>
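For simple string assignments like the ones printed above, a regular expression can turn the script text into a dictionary. This is a rough sketch, assuming every value is a plain quoted string with no escaped quotes inside it. The parse_string_assignments helper is a hypothetical name, not part of any library.

```python
import re

# Matches  name = 'value';  or  name = "value";  (optionally preceded by "var").
ASSIGNMENT = re.compile(r"""(?:var\s+)?(\w+)\s*=\s*(['"])(.*?)\2\s*;""")


def parse_string_assignments(script_text):
    """Map each assigned variable name to its string value."""
    return {name: value for name, _quote, value in ASSIGNMENT.findall(script_text)}


# The script body from the result above.
script_text = """
   myPage = 'ETFs';
   sectionId = 'liQuotes'; //section tab
   breadCrumbId = 'qQuotes'; //page
   is_dartSite = "quotes";
   is_dartZone = "news";
   propVar = "ETFs";
"""

variables = parse_string_assignments(script_text)
print(variables["sectionId"])
```

In the attempt above, the script text would come from js.string rather than a literal, and soup.find_all("script") would be needed to reach tags other than the first.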
72%

I'm scraping a bunch of simple HTML tables on a bunch of pages. The problem is that the server doesn't hand down the pages with the tables intact. Instead it hands down a blank table with all the data I need stored in array variables in a <script> tag. A function then puts all the data into the table client side after load.

The script tag with all the data happens to be the fifth script tag on the page, so I've been able to isolate it using targ_script = soup.findAll('script')[4], but I can't seem to drill down any further than that. Is there any way I can pull the values of those specific variables without resorting to gnarly string manipulation or forcing the script to run?

Inside the script it looks something like this...

var AHead0 = new Array('<b><font size=2>Data I need</font></b>','<b><font size=2 color=green>Data I need</font></b>','<b><font size=2>Data I need</font></b>');

var AHead1 = new Array('Data I need','Data I need','Data I need');

var AHead2 = new Array('<div id="blahblah1">Data I need</div>','<div id="blahblah2">Data I need</div>','<div id="blahblah3">Data I need</div>');

var AHead3 = new Array('So on and so forth','You get the idea');
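One way to pull those arrays out without running the script is to match each declaration and then collect its quoted items. The sketch below uses only the standard library and assumes every item is a single-quoted string with no escaped quotes and no ");" inside it, as in the sample above. The names parse_js_arrays and strip_tags are hypothetical helpers.

```python
import re

# One "var NAME = new Array(...);" declaration; DOTALL lets it span lines.
DECLARATION = re.compile(r"var\s+(\w+)\s*=\s*new\s+Array\((.*?)\)\s*;", re.DOTALL)
# One single-quoted item inside the argument list.
ITEM = re.compile(r"'([^']*)'")
# Any HTML tag, used to reduce an item to its visible text.
TAG = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove HTML tags and surrounding whitespace from one array item."""
    return TAG.sub("", fragment).strip()


def parse_js_arrays(script_text):
    """Map each array variable name to the list of its items, tags removed."""
    arrays = {}
    for name, body in DECLARATION.findall(script_text):
        arrays[name] = [strip_tags(item) for item in ITEM.findall(body)]
    return arrays


script_text = """
var AHead1 = new Array('Data I need','Data I need','Data I need');
var AHead2 = new Array('<div id="blahblah1">Data I need</div>','<div id="blahblah2">Data I need</div>');
var AHead3 = new Array('So on and so forth','You get the idea');
"""

arrays = parse_js_arrays(script_text)
print(arrays["AHead3"])
```

This is still string matching, so it only holds while the page keeps this exact declaration style. It does avoid both the hard-coded position of individual values and executing the script.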
65%

If the desired data is in embedded JavaScript code within a <script/> element, see Parsing JavaScript code. If the desired data is hardcoded in JavaScript, you first need to get the JavaScript code. If the response is JavaScript, or HTML with a <script/> element containing the desired data, see Parsing JavaScript code.

Use Scrapy's fetch command to download the webpage contents as seen by Scrapy:

scrapy fetch --nolog https://example.com > response.html
75%

JavaScript can handle many types of data, but for now, just think of numbers and strings. Using the dollar sign is not very common in JavaScript, but professional programmers often use it as an alias for the main function in a JavaScript library. Using the underscore is not very common in JavaScript, but a convention among professional programmers is to use it as an alias for "private (hidden)" variables. As with algebra, you can do arithmetic with JavaScript variables, using operators like = and +:

var = "";
40%

22%

The top level outside all your functions is called the global scope. Values defined in the global scope are accessible from everywhere in the code. Some functions require parameters to be specified when you are invoking them: these are values that need to be included inside the function parentheses, which the function needs to do its job properly.

let myText = 'I am a string';
let newString = myText.replace('string', 'sausage');
console.log(newString);
// the replace() string function takes a source string,
// and a target string and replaces the source string,
// with the target string, and returns the newly formed string
60%

Program 1: Program to add content inside the body of the HTML file through jQuery.
Program 3: This program contains different kinds of number data types, and they are printed in the HTML file.
Program 7: This program declares a function, then assigns it to a variable and calls the function through the variable.

The <script type="text/javascript"> tag informs the web browser that the JavaScript code is located here.

STEP 3: For the jQuery script to work in your HTML file, you will need to add the jQuery source to the HTML file, as in the following code:

<head>
   <script type='text/javascript' src='jquery-3.5.1.min.js'></script>
</head>
48%

import json

import requests
from bs4 import BeautifulSoup

link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')

# Locate the JSON-LD block by its type attribute
s = soup.find('script', type='application/ld+json')

# JUST THIS
data = json.loads(s.string)
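A page can contain more than one application/ld+json block, and soup.find returns None when there is none. As a rough sketch of the same idea using only the standard library (a swapped-in technique, not the BeautifulSoup approach above), the hypothetical JsonLdCollector class below gathers every such block from a made-up sample document.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while inside a JSON-LD script tag
        self._chunks = []      # text pieces of the current script body
        self.items = []        # one parsed object per JSON-LD block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.items.append(json.loads("".join(self._chunks)))
            self._inside = False


# Hypothetical sample page with one ordinary script and one JSON-LD block.
sample_html = """
<html><head>
<script type="text/javascript">var unrelated = 1;</script>
<script type="application/ld+json">{"@type": "ItemList", "numberOfItems": 2}</script>
</head><body></body></html>
"""

collector = JsonLdCollector()
collector.feed(sample_html)
collector.close()
print(collector.items)
```

An empty items list then signals that the page had no JSON-LD block, which is easier to check than catching an AttributeError on None.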
