python - Unable to access the real page source code

March 15, 2012

This is not one of the standard issues where JavaScript modifies parts of the page so that they are hidden from right click > View Page Source. The problem is different.

I am actually able to see the whole HTML code when I right click > View Page Source, but when I try to read the URL via BeautifulSoup, xml.parser, or open it with mechanize, at some point the page becomes different and is missing important content.

The only way I can get the real HTML code is to manually copy/paste the whole content and save it to a file. When I do it automatically with Python, the content changes.

Essentially the site is in HTML, but I saw there is JavaScript, Flash, and AJAX code too.

Do you guys have any ideas what can be done? I know it might be hard to figure out without seeing the source code, but I guess I'd be better off not posting the URL of the page I'm scraping.

This could be due to the page sending a different response depending on the Referer or User-Agent header.

Try setting the User-Agent through the request headers, using Firefox's for example:

user_agent = "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
headers = { 'User-Agent' : user_agent }
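As a rough sketch of how such headers might be attached to a request, here is a version using urllib.request from the Python 3 standard library. This is a swap-in for illustration only: the question used BeautifulSoup and mechanize, and the helper names build_request and fetch_html, the optional Referer argument, and the timeout are assumptions, not part of the original answer.

```python
import urllib.request

# User-Agent string suggested in the answer; any browser-like string could be tried.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"


def build_request(url, referer=None):
    """Build a Request carrying browser-like headers (hypothetical helper)."""
    headers = {"User-Agent": USER_AGENT}
    if referer is not None:
        # Some sites also vary the response by Referer, so allow setting it.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, referer=None, timeout=30):
    """Download the page and decode it, assuming UTF-8 if no charset is declared."""
    with urllib.request.urlopen(build_request(url, referer), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_html with the page URL and passing the result to BeautifulSoup would then let you compare it against the manually saved copy. If the two still differ, the missing content is likely being added by JavaScript after the page loads, which changing headers alone will not fix.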
