python - How to fetch the result of a POST request in a web scraper?
I am trying to scrape data from this website.
There is a table there listing different organisations, and the name of each organisation is a link to a webpage with more information about that organisation.
Instead of being hard-coded hyperlinks, those links call a JavaScript function, so the target page is only computed when the function is called:
<a href="javascript:view_ngo('4309','','1','0')" class="bluelink11px"> biswasuk sevasram sangha </a>
So it is not possible to scrape the information simply by following the links.
Is there a workaround to execute the JavaScript function and get the HTML of the resulting webpage? I am using Python 3, with Beautiful Soup as the web scraper.
First, JavaScript is executed not on the server side but on the client side. Here, you should use the debugging facilities of your browser (in Firefox, pressing F12 is enough) to see what happens when you click on one of the links. You will see that the JavaScript code prepares and sends a POST request.
So view_ngo(a, b, c, d)
generates the following POST request:
POST http://ngo.india.gov.in/view_ngo_details_ngo.php
with the following data:
ngo_id=a&records_no=b&page_no=c&page_val=1&issueid=&ngo_black=d&records=
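As a minimal sketch of that mapping, the four arguments can be pulled out of a link's href with a regular expression and turned into the form body shown above. The helper names parse_view_ngo_href and build_post_data are hypothetical, and the sketch assumes every argument is a single-quoted string, as in the example link from the question.

```python
import re
import urllib.parse

# Matches javascript:view_ngo('a','b','c','d') and captures the four arguments.
# Assumes each argument is single-quoted, as in the link from the question.
VIEW_NGO_RE = re.compile(
    r"view_ngo\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_view_ngo_href(href):
    """Return the (a, b, c, d) arguments of a view_ngo(...) link (hypothetical helper)."""
    match = VIEW_NGO_RE.search(href)
    if match is None:
        raise ValueError("not a view_ngo link: %r" % href)
    return match.groups()


def build_post_data(a, b, c, d):
    """Encode the form fields listed in the answer as a bytes body for a POST."""
    fields = {
        'ngo_id': a,
        'records_no': b,
        'page_no': c,
        'page_val': '1',
        'issueid': '',
        'ngo_black': d,
        'records': '',
    }
    return urllib.parse.urlencode(fields).encode()


args = parse_view_ngo_href("javascript:view_ngo('4309','','1','0')")
post_data = build_post_data(*args)
```

Passing a bytes body like post_data as the second argument of opener.open() is what makes urllib send a POST instead of a GET.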
You can also see that it uses a session cookie, so you should make provision for that in your scraping code.
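One way to handle the cookie, sketched here with the standard library only, is to give the opener an explicit cookie jar so that the cookie set by the first response is sent back on later requests and can be inspected while debugging. The function name make_session_opener is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Build an opener that stores and resends cookies (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session_opener()

# After a first request such as opener.open(list_page_url), any session
# cookie sent by the server would be held in `jar` and reused automatically:
# for cookie in jar:
#     print(cookie.name, cookie.value)
```

The jar is empty until a response actually sets a cookie, so the first request should be the list page and the POST should follow through the same opener.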
The scraping could then look like this:
    import urllib.parse
    import urllib.request

    from bs4 import BeautifulSoup

    # The cookie processor keeps the session cookie across requests
    cookie_processor = urllib.request.HTTPCookieProcessor()
    opener = urllib.request.build_opener(cookie_processor)

    soup = BeautifulSoup(opener.open(
        'http://ngo.india.gov.in/sector_ngolist_ngo.php?psid=&records='),
        'html.parser')

    # find the relevant links and iterate through them;
    # for each link of the form view_ngo(a, b, c, d):
    data = urllib.parse.urlencode({
        'ngo_id': a, 'records_no': b, 'page_no': c, 'page_val': '1',
        'issueid': '', 'ngo_black': d, 'records': ''
    }).encode()
    # passing data makes this a POST request
    soup2 = BeautifulSoup(opener.open(
        'http://ngo.india.gov.in/view_ngo_details_ngo.php', data),
        'html.parser')
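The "find the relevant links" step is left open above. A rough sketch of it follows, using the standard library's html.parser instead of Beautiful Soup so that it runs without extra dependencies. The class name ViewNgoLinkCollector and the sample HTML (including the about.php link) are made up for illustration; only the view_ngo anchor is taken from the question.

```python
from html.parser import HTMLParser


class ViewNgoLinkCollector(HTMLParser):
    """Collect href values of <a> tags that call view_ngo (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        if href.startswith('javascript:view_ngo('):
            self.hrefs.append(href)


# Illustrative page fragment; the second link is invented to show filtering.
sample_html = '''
<table>
<tr><td><a href="javascript:view_ngo('4309','','1','0')" class="bluelink11px"> biswasuk sevasram sangha </a></td></tr>
<tr><td><a href="about.php">About</a></td></tr>
</table>
'''

collector = ViewNgoLinkCollector()
collector.feed(sample_html)
```

With Beautiful Soup, the same filtering could be done by calling find_all('a') with a compiled regular expression for the href attribute. Each collected href then yields the four arguments for one POST request.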