CUTTS: We have a question from Leon in the UK. Leon asks, "Does Googlebot use inference when spidering--having crawled a URL /page1.htm and /page2.htm, can it guess at the existence of a page3 and crawl it?
Or does it stick entirely to what it finds via the link graph and/or Sitemaps/feeds?" Well, we also take submitted URLs, so that's another way we can crawl things, but we do use some inference.
So, for example, suppose you have a URL with three or four different parameters. We, at times, have the ability to say, "You know what? What if we drop one of those parameters?
Do we still get the exact same page back?"
And if you drop that parameter and you still get the same page back, then maybe we didn't really need that parameter. So, you're talking about that in terms of almost like probing, you know, known patterns.
But we certainly do use it in terms of removing parameters for some sites and seeing whether that makes a difference. And what that lets you do is end up with much cleaner, prettier URLs, without the parameters that didn't really matter.
So, I don't know, I'd have to double check whether we do the type of inference that you're talking about, but we do try to do smart crawling and say, "Maybe this isn't really a necessary parameter. We see it all of the time. So what happens if we leave it out? Would we still get a good page that looks exactly the same to us?"
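To make the parameter-dropping idea concrete, here is a minimal Python sketch. It is an illustration only, not Google's actual method: the function names, the `fake_fetch` stand-in for a real page fetch, and the example URL are all hypothetical, and "same page" is simplified to exact equality of the returned content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def variants_without_one_param(url):
    """Yield (param_name, candidate_url) pairs, each with one query parameter removed."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for i, (name, _value) in enumerate(params):
        remaining = params[:i] + params[i + 1:]
        yield name, urlunsplit(parts._replace(query=urlencode(remaining)))


def unnecessary_params(url, fetch):
    """Return the parameters whose removal leaves the fetched page unchanged.

    `fetch` is a hypothetical callable that maps a URL to page content.
    """
    original = fetch(url)
    return [name for name, candidate in variants_without_one_param(url)
            if fetch(candidate) == original]


def fake_fetch(url):
    """Stand-in for a real fetch: this pretend site only looks at the `id` parameter."""
    query = dict(parse_qsl(urlsplit(url).query))
    return "product page " + query.get("id", "missing")


print(unnecessary_params(
    "http://example.com/item?id=7&sessionid=abc&ref=home", fake_fetch))
# prints ['sessionid', 'ref']
```

A real crawler would need a fuzzier notion of "the same page" than exact equality, since timestamps or ads can change between fetches; that comparison is deliberately left out of this sketch.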
We try to crawl smarter and smarter every single year. So we've also added the ability to crawl through a form. Now, if you block that form out with robots.txt, then we'll never crawl it. And it's not like we're going to enter a credit card number into a shopping cart, but if you've got a form that's just a simple dropdown, for example, we might say,
"You know what? Let's go ahead and try crawling the URL that would result if we selected a value from that dropdown." So, Google tries to look for some of the dead ends that it finds on websites and get around some of those dead ends so that we can crawl more of your site and return that to users. But, of course, anytime you want to block us, just use robots.txt and then we won't crawl those pages.