web applications - Does the Google spider read robots.txt before accessing a resource?


I have been wondering about this for a while. I know that if, for instance, I link from here to a site such as example.com, and that domain hosts a robots.txt file that disallows everything, Google won't index it.
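As a rough illustration of what "a robots.txt file that disallows everything" means to a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `tracker.js` URL are assumptions made up for the example, and this only models a compliant crawler, not what Googlebot actually does internally.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of http://example.com/robots.txt: block every crawler from every path.
DISALLOW_ALL = [
    "User-agent: *",
    "Disallow: /",
]


def may_fetch(robots_lines, user_agent, url):
    """Return True if a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical resource URL, used only for illustration.
blocked = may_fetch(DISALLOW_ALL, "Googlebot", "http://example.com/tracker.js")
print(blocked)
```

Under these rules a compliant crawler would be expected to skip the resource entirely rather than load it and merely leave it out of the index.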

However, what happens if I, for instance, insert here on Stack Overflow a piece of JS (script src ...) that loads a resource from example.com? Will Google read robots.txt first and then decide whether to load the resource, or is it going to load it anyway and just not index it? (This is the question.)

Basically, this worries me because I have a tracker and I do not want spiders to inflate the number of visits. Of course, there is the possibility of blocking several spiders in code by checking the User-Agent, but that is not a 100% valid measure, since many other spiders (not Google, or not search spiders at all) would still be counted.
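The User-Agent check mentioned above could look roughly like the sketch below. The function name `is_known_spider` and the token list are assumptions chosen for illustration; the list is deliberately short and, as noted, can never be complete, so unknown or dishonest spiders will still slip through.

```python
# Illustrative, incomplete list of substrings found in some crawler User-Agent strings.
KNOWN_SPIDER_TOKENS = ("googlebot", "bingbot", "slurp")


def is_known_spider(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        return False  # a missing header tells us nothing either way
    lowered = user_agent.lower()
    return any(token in lowered for token in KNOWN_SPIDER_TOKENS)


def should_count_visit(user_agent):
    """Count a tracker hit only when the request does not look like a known spider."""
    return not is_known_spider(user_agent)


print(should_count_visit("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(should_count_visit("Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"))
```

A substring match is used because crawler User-Agent strings usually embed the bot name inside a longer browser-like string.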

In theory, Google says it does not read resources excluded by robots.txt. In practice, Google has not been consistent about this. For several months this year (and in 2012), Google was not obeying robots.txt, and I am sure it received complaints (I submitted one myself). However, as of mid-May 2013, Googlebot seems to be reading and respecting robots.txt. That means it reads robots.txt first, and then reads only the resources that are not excluded.
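One way to check this behaviour on your own site is to compare your access log against your robots.txt rules. The sketch below assumes the log has already been reduced to `(user_agent, path)` pairs; the function name `disallowed_hits`, the sample log entries, and the rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def disallowed_hits(requests, robots_lines, agent_token):
    """Return the paths a crawler requested even though robots.txt disallows them."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    hits = []
    for user_agent, path in requests:
        if agent_token.lower() not in user_agent.lower():
            continue  # request came from some other client
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is always expected
        if not parser.can_fetch(agent_token, path):
            hits.append(path)
    return hits


# Hypothetical data for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    (GOOGLEBOT_UA, "/robots.txt"),
    (GOOGLEBOT_UA, "/tracker.js"),
    ("Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0", "/tracker.js"),
]
rules = ["User-agent: *", "Disallow: /tracker.js"]

violations = disallowed_hits(sample_log, rules, "Googlebot")
print(violations)
```

An empty result over a long enough period would be consistent with the crawler respecting robots.txt; a non-empty one shows requests that the rules should have prevented. Since User-Agent strings can be spoofed, a hit is a hint rather than proof that the real Googlebot made the request.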

Why was there a time when it did not respect robots.txt? Possible answers: bad programming. Maybe, because Google has grown so much, it has become more difficult to enforce quality standards. Maybe it wanted more content. Google rarely, if ever, admits errors, so it is unlikely to admit to not respecting robots.txt, and I would not expect them to be forthright on this topic.

