python - How to fetch resultant of a POST request in a web-scrapper? -

- April 15, 2013

I am trying to scrape data from this website.

There is a table there listing different organisations, and the name of each organisation is a link to a webpage with more information about that organisation.

Those links, instead of being hard-coded hyperlinks, call a JavaScript function, so the target is only computed when the function is called.

<a href="javascript:view_ngo('4309','','1','0')" class="bluelink11px">  biswasuk sevasram sangha      </a>

So it is not possible to scrape the information simply by following the links.

Is there a workaround to execute the JavaScript function and get the HTML of the resulting webpage? I am using Python 3, with Beautiful Soup as the web scraper.

First, JavaScript is not executed on the server side but on the client side. Here, you should use the debugging facilities of your browser (in Firefox, pressing F12 is enough) to see what happens when you click on one of the links. You will see that the JavaScript code prepares and sends a POST request.

So view_ngo(a, b, c, d) generates the following POST request:

POST http://ngo.india.gov.in/view_ngo_details_ngo.php

with the following data:

ngo_id=a&records_no=b&page_no=c&page_val=1&issueid=&ngo_black=d&records=
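As a minimal sketch of the mapping above, the helper below pulls the four arguments out of a `javascript:view_ngo(...)` href and builds the matching POST body. The names `VIEW_NGO_RE` and `view_ngo_post_body` are hypothetical, chosen for illustration, and the pattern assumes every link has the same four-quoted-argument shape as the example in the question.

```python
import re
from urllib.parse import urlencode

# Matches the four single-quoted arguments of view_ngo('a','b','c','d').
# Assumption: all links follow the shape of the sample link in the question.
VIEW_NGO_RE = re.compile(
    r"view_ngo\('([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\)")


def view_ngo_post_body(href):
    """Return the URL-encoded POST body for a view_ngo(...) href,
    or None if the href is not such a link."""
    match = VIEW_NGO_RE.search(href)
    if match is None:
        return None
    a, b, c, d = match.groups()
    # Field names and fixed values are taken from the observed POST data.
    fields = {
        'ngo_id': a, 'records_no': b, 'page_no': c, 'page_val': '1',
        'issueid': '', 'ngo_black': d, 'records': '',
    }
    return urlencode(fields).encode()


body = view_ngo_post_body("javascript:view_ngo('4309','','1','0')")
print(body.decode())
# ngo_id=4309&records_no=&page_no=1&page_val=1&issueid=&ngo_black=0&records=
```

The returned bytes can be passed directly as the `data` argument of an `urllib.request` opener, which turns the request into a POST.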

You can also see that it uses a session cookie, so you should make provision for that in your scraping code.

The scraping could look like:

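    import re
    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup

    # Keep the session cookie across requests.
    cookie_processor = urllib.request.HTTPCookieProcessor()
    opener = urllib.request.build_opener(cookie_processor)
    soup = BeautifulSoup(opener.open(
        'http://ngo.india.gov.in/sector_ngolist_ngo.php?psid=&records='), 'html.parser')

    # Find the relevant links and, for each view_ngo(a, b, c, d), replay the POST.
    for link in soup.find_all('a', href=re.compile(r'view_ngo\(')):
        a, b, c, d = re.findall(r"'([^']*)'", link['href'])
        data = urllib.parse.urlencode({'ngo_id': a, 'records_no': b, 'page_no': c,
            'page_val': '1', 'issueid': '', 'ngo_black': d, 'records': ''}).encode()
        soup2 = BeautifulSoup(opener.open(
            'http://ngo.india.gov.in/view_ngo_details_ngo.php', data), 'html.parser')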
