web applications - Does the Google spider read robots.txt before accessing a resource?

August 15, 2011

I have been wondering about this for a while. I know that if, for instance, I link from here to a site example.com, and that domain hosts a robots.txt file that disallows everything, Google won't index it.

However, what happens if I, for instance, insert here on Stack Overflow a JS include (script src ...) that loads a resource from example.com? Will Google read the robots.txt first and then decide whether to load the resource, or is it going to load it anyway and just not index it? (This is the question.)

Basically this worries me because I have a tracker and I do not want spiders to increase the number of visits. Of course there is the possibility of blocking several spiders in code by checking the user-agent, but that is not a 100% valid measure, since many other spiders (not Google, or not search spiders) would still be counted.
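As a rough illustration of the user-agent check mentioned above, here is a minimal sketch in Python. The pattern list and the function names `is_probable_bot` and `count_visit` are hypothetical and not taken from any real tracker. As the question already notes, this kind of filter is incomplete by design: any spider that sends a browser-like user-agent will still be counted.

```python
import re

# Hypothetical, deliberately incomplete list of crawler markers.
# Real crawlers may use strings that are not covered here.
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|spider|crawler|bot\b", re.IGNORECASE)


def is_probable_bot(user_agent):
    """Return True if the User-Agent header looks like a crawler."""
    if not user_agent:
        # Assumption: a missing User-Agent is treated as non-human traffic.
        return True
    return BOT_PATTERN.search(user_agent) is not None


def count_visit(user_agent, counter):
    """Return the updated visit counter, ignoring probable crawlers."""
    if is_probable_bot(user_agent):
        return counter
    return counter + 1
```

A tracker endpoint would call `count_visit` with the request's User-Agent header before recording the hit.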

In theory, Google says it does not read resources excluded by robots.txt. In practice, Google has not been consistent on this. For several months this year (and in 2012), Google was not obeying robots.txt, and I am sure it received complaints (I provided mine!). However, as of mid-May 2013, Googlebot seems to be reading and respecting robots.txt. That means it reads robots.txt first, and then reads only the resources that are not excluded.
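To make the "read robots.txt first, then fetch only what is not excluded" behaviour concrete, here is a small sketch using Python's standard `urllib.robotparser`. It shows how a well-behaved crawler could make that decision; it is not a description of how Googlebot is actually implemented. The helper name `may_fetch` and the URL `http://example.com/tracker.js` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows everything, as described in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical resource referenced by a <script src=...> tag on another site.
allowed = may_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/tracker.js")
```

With the disallow-all file above, `allowed` is `False`, so a crawler that respects robots.txt would skip the request entirely and the tracker would never see the hit.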

Why was there a time when it did not respect robots.txt? Possible answers: bad programming. Maybe because Google has grown so much, it has become more difficult to enforce quality standards. Maybe they wanted more content. Google rarely, if ever, admits errors, so it is unlikely to admit to not respecting robots.txt; do not expect them to be forthright on this topic.
