Python - Unable to access the real page source code
This is not one of the standard issues where JavaScript modifies parts of the page source so that they are hidden after right click > View Page Source. The problem is different.
I am actually able to see the whole HTML code when I right click > View Page Source, but when I try to read the URL via BeautifulSoup, xml.parser, or open it with mechanize, at that point the page becomes kind of different and is missing important contents.
The only way I can get the real HTML code down is to manually copy/paste the whole content and save it as a file. When I do it automatically with Python, the content changes.
Essentially the site is in HTML, but I saw there is JavaScript, Flash and AJAX code too.
Do you guys have any ideas what can be done? I know it might be hard to figure out without seeing the source code, but I guess I'd be better off not posting the URL of the page I'm scraping from.
This may be due to the page sending a different response depending on the Referer or User-Agent header of the request.
Try setting the user agent by setting the headers to those of Firefox, for example:
    user_agent = "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
    headers = {'User-Agent': user_agent}
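To show how those headers might actually be sent, here is a minimal sketch using only the standard library's urllib.request. The helper names build_request and fetch_html, the optional referer parameter, and the 30 second timeout are my own assumptions for illustration and are not part of the answer above. The same headers can be set in mechanize or any other HTTP client.

```python
import urllib.request

# User-Agent string taken from the answer above.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"


def build_request(url, user_agent=USER_AGENT, referer=None):
    """Build a Request carrying browser-like headers (hypothetical helper)."""
    headers = {"User-Agent": user_agent}
    if referer is not None:
        # Only send a Referer when the caller supplies one.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, referer=None, timeout=30):
    """Fetch the page and return its body as text (hypothetical helper)."""
    request = build_request(url, referer=referer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with the page URL returns the HTML as a string, which can then be handed to BeautifulSoup. If the result still differs from what the browser shows, comparing it against the manually saved file should tell you which parts are missing.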