Fetch data of variables inside script tag in Python or Content added from js
This will do the trick, using the re module to extract the data and then loading it as JSON:
    import json
    import re
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    web = urlopen("http://www.nasdaq.com/quotes/nasdaq-financial-100-stocks.aspx")
    soup = BeautifulSoup(web.read(), 'lxml')
    # Pick the <script> whose text mentions the variable instead of using a fixed index
    data = soup.find("script", string=re.compile(r"var table_body")).string
    p = re.compile(r"var table_body = (.*?);")
    m = p.search(data)  # search(), not match(): the assignment need not start the script
    stocks = json.loads(m.group(1))  # group(1) is the captured JSON text, not the groups() tuple

    >>> for stock in stocks:
    ...     print(stock)
    ...
    ['ASPS', 'Altisource Portfolio Solutions S.A.', 116.96, 2.2, 1.92, 86635, 'N', 'N']
    ['AGNC', 'American Capital Agency Corp.', 23.76, 0.13, 0.55, 3184303, 'N', 'N']
    ...
    ['ZION', 'Zions Bancorporation', 29.79, 0.46, 1.57, 2154017, 'N', 'N']
Otherwise, if you look at the actual HTML you will find that the data is available within the page in the following script tag:
Now I have two options: either get the table content or get the JS variables. Either one would do the job, but unfortunately I don't know how to get them, so please tell me how I can solve either problem. At the moment I don't know how to get the data out of these JS variables.

The problem with a hard-coded script-tag offset is that there is no reliable way to locate the tag within the page; changes to the page could break your code.
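Here is a sketch of locating the variable by its content rather than by the script tag's position, assuming you already have the text of every script tag (with BeautifulSoup, for example [s.string for s in soup.find_all("script")]). It swaps the non-greedy (.*?); regex for json.JSONDecoder.raw_decode, which parses one complete JSON value starting at a given index and stops there, so a semicolon inside a quoted string cannot cut the value short. It assumes the assigned value is valid JSON, which the json.loads approach already relies on; extract_js_json, the sample scripts, and the drawTable() call in them are hypothetical.

```python
import json
import re


def extract_js_json(script_texts, var_name):
    """Return the JSON value assigned by "var <var_name> = ..." in the first
    script text that defines it, or None if no script does.

    Assumes the right-hand side is valid JSON (double-quoted strings,
    no trailing commas or JavaScript expressions).
    """
    # Match "var NAME =" plus trailing whitespace; the value starts at m.end().
    pattern = re.compile(r"var\s+" + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for text in script_texts:
        if not text:
            continue  # e.g. <script src="..."></script> has no inline text
        m = pattern.search(text)
        if m:
            # raw_decode parses one complete JSON value and reports where it
            # ended, ignoring the ";" and any statements that follow it.
            value, _end = decoder.raw_decode(text, m.end())
            return value
    return None


# Hypothetical sample: the second row's name deliberately contains a ";".
scripts = [
    None,
    "var other = 1;",
    'var table_body = [["ASPS", "Altisource Portfolio Solutions S.A.", 116.96, 2.2, 1.92, 86635, "N", "N"],'
    ' ["ZZZZ", "Example; Corp.", 1.0, 0.0, 0.0, 1, "N", "N"]];\ndrawTable();',
]
for row in extract_js_json(scripts, "table_body"):
    print(row)
```

Because the lookup is keyed on the variable name, reordering or adding script tags on the page does not break it; only renaming the variable or changing its format would.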
I have also tried one more thing:
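    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    web = urlopen("my url")
    html = web.read()
    soup = BeautifulSoup(html, 'lxml')
    js = soup.find("script")  # find() returns only the first <script> tag on the page
    ss = js.prettify()
    print(ss)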
I'm scraping a bunch of simple HTML tables on a bunch of pages. The problem is that the server doesn't hand down the pages with the tables intact. Instead, it hands down a blank table with all the data I need stored in array variables in a <script> tag. Inside the script it looks something like this:

    var AHead0 = new Array('<b> <font size=2>Data I need</font> </b>','<b> <font size=2 color=green>Data I need</font> </b>','<b> <font size=2>Data I need</font> </b>');
    var AHead1 = new Array('Data I need','Data I need','Data I need');
    var AHead2 = new Array('<div id="blahblah1">Data I need</div>','<div id="blahblah2">Data I need</div>','<div id="blahblah3">Data I need</div>');
    var AHead3 = new Array('So on and so forth','You get the idea');

A function then fills the table with all of this data on the client side after the page loads.

The script tag with all the data happens to be the fifth script tag on the page, so I've been able to isolate it using targ_script = soup.findAll('script'), but I can't seem to drill down any further than that. Is there any way I can pull the values of those specific variables without resorting to gnarly string manipulation or forcing the script to run?
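For array variables like these, one way to avoid running the script is to let a regular expression find each new Array(...) call and let Python parse the argument list. This is a sketch that still relies on a regex, not a full JavaScript parser: it assumes every argument is a plain quoted string literal that Python's ast.literal_eval also accepts (no JavaScript-only escapes, no expressions, no ");" inside a string). extract_arrays, ARRAY_RE, and SCRIPT are hypothetical names; SCRIPT is a trimmed copy of the sample in the question.

```python
import ast
import re

# Trimmed copy of the script text shown in the question.
SCRIPT = """
var AHead1 = new Array('Data I need','Data I need','Data I need');
var AHead2 = new Array('<div id="blahblah1">Data I need</div>','<div id="blahblah2">Data I need</div>');
var AHead3 = new Array('So on and so forth','You get the idea');
"""

# Captures NAME and the raw argument list of "var NAME = new Array(...);".
ARRAY_RE = re.compile(r"var\s+(\w+)\s*=\s*new\s+Array\((.*?)\);", re.DOTALL)


def extract_arrays(script_text):
    """Map each variable name to the list of strings passed to new Array(...).

    Assumes every argument is a quoted string literal that Python's
    ast.literal_eval accepts too (no JS-only escapes or expressions).
    """
    arrays = {}
    for name, args in ARRAY_RE.findall(script_text):
        if not args.strip():
            arrays[name] = []  # new Array() with no arguments
            continue
        # A trailing comma makes even a one-element list parse as a tuple.
        arrays[name] = list(ast.literal_eval("(" + args + ",)"))
    return arrays


for name, values in extract_arrays(SCRIPT).items():
    print(name, values)
```

With BeautifulSoup, pass the text of the script tag you isolated, e.g. extract_arrays(tag.string), where tag is one element of soup.findAll('script'). ast.literal_eval only evaluates literals, so unlike eval it never executes anything from the page.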
    scrapy fetch --nolog https://example.com > response.html
    let myText = 'I am a string';
    let newString = myText.replace('string', 'sausage');
    console.log(newString); // "I am a sausage"
    // replace() returns a new string in which the first occurrence of the first
    // argument is replaced by the second argument; myText itself is unchanged.
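Python's str.replace differs in one detail worth knowing when porting this kind of string handling: with no count argument it replaces every occurrence, whereas JavaScript's replace() with a string pattern replaces only the first. A small illustration; the variable names simply mirror the JavaScript above, and the second string is a made-up example.

```python
my_text = 'I am a string'
new_string = my_text.replace('string', 'sausage')
print(new_string)  # I am a sausage

repeated = 'a string and another string'
print(repeated.replace('string', 'sausage'))     # replaces every occurrence
print(repeated.replace('string', 'sausage', 1))  # count=1 mimics JS: first occurrence only
```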
STEP 3: For the jQuery script to work in your HTML file, you will need to add the jQuery source to the HTML file, as in the following code:
    import json

    import requests
    from bs4 import BeautifulSoup

    link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    # The data is in a JSON-LD <script> block, so no JavaScript needs to run
    s = soup.find('script', type='application/ld+json')
    data = json.loads(s.string)  # don't call this "json": that would shadow the json module
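soup.find() returns only the first matching tag, and some pages carry several JSON-LD blocks. The sketch below collects all of them using the standard library's html.parser instead of BeautifulSoup; that is a swapped-in technique, not what the answer above uses. JsonLdCollector and extract_json_ld are hypothetical names, and each block is assumed to contain valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld_json = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign.
        if self._in_ld_json:
            self.blocks[-1] += data


def extract_json_ld(html):
    """Return the parsed JSON of every JSON-LD block in html, in document order."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return [json.loads(block) for block in parser.blocks]
```

Usage: extract_json_ld(r.text) with the response from the snippet above returns a list with one parsed object per block, or an empty list if the page has none.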