Today's question comes from the middle of America, from Kansas.
John Heard wants to know: why does Google take so long to naturally remove 404 URLs?
It's a good question.
So in theory, a 404 could be transient, right?
A page could be missing and then come back later.
Technically, if you really want to signal that a page is completely gone and will never come back, there's an HTTP status code for that: 410.
But at least the last time we checked, back in 2007, we actually treated the two the same.
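On the server side, the difference between "missing, maybe for now" and "gone for good" comes down to which status code you send. The sketch below is illustrative only: the page sets and the `status_for` helper are hypothetical names, and it uses only Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus

# Hypothetical site data for this sketch: live pages, and pages retired on purpose.
LIVE_PAGES = {"/", "/about"}
RETIRED_PAGES = {"/old-promo"}

def status_for(path: str) -> HTTPStatus:
    """Pick the HTTP status a server might send for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK         # 200: page is served normally
    if path in RETIRED_PAGES:
        return HTTPStatus.GONE       # 410: removed deliberately, not coming back
    return HTTPStatus.NOT_FOUND      # 404: missing, possibly only temporarily

if __name__ == "__main__":
    for p in ["/", "/old-promo", "/typo"]:
        code = status_for(p)
        print(p, int(code), code.phrase)
```

Whether a search engine actually acts on 410 any differently from 404 is up to the crawler; as noted above, the two were treated the same at least as of 2007.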
But to get to the meat of your question: why does it take so long?
The answer is that webmasters can do kind of interesting things, and we sometimes see webmasters shoot themselves in the foot.
They'll completely remove their site from the search results, or their site will be down and returning 404 instead of something like a 503, which says "come back later."
And so rather than learning very quickly that a page is a 404 and making it drop out forever, you'd usually prefer to build in a little more leeway, so that if a webmaster makes a mistake, you can check a few times and make sure the page really is gone before you drop it from the index.
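To make the idea of "check a few times before dropping" concrete, here is a minimal sketch of such a policy. It is not Google's actual algorithm: the threshold `DROP_AFTER`, the `UrlRecord` type, and `record_fetch` are hypothetical, and the threshold value is arbitrary. It counts 404 and 410 identically, and treats server errors such as 503 as saying nothing about whether the page exists.

```python
from dataclasses import dataclass

# Hypothetical threshold: "gone" answers needed before a URL leaves the index.
# Illustrative value only, not a real search-engine setting.
DROP_AFTER = 3

GONE_CODES = {404, 410}            # counted the same way in this sketch
TRANSIENT_CODES = {500, 502, 503, 504}

@dataclass
class UrlRecord:
    url: str
    consecutive_gone: int = 0
    indexed: bool = True

def record_fetch(rec: UrlRecord, status: int) -> UrlRecord:
    """Update one URL's state after a single crawl attempt."""
    if status in GONE_CODES:
        rec.consecutive_gone += 1
        if rec.consecutive_gone >= DROP_AFTER:
            rec.indexed = False    # seen missing enough times: drop it
    elif 200 <= status < 300:
        rec.consecutive_gone = 0   # page answered normally: reset the streak
        rec.indexed = True
    # Transient codes (e.g. 503 "come back later") and anything else leave
    # the state unchanged, so a brief outage neither drops nor rescues the URL.
    return rec

if __name__ == "__main__":
    rec = UrlRecord("https://example.com/page")
    for code in [404, 503, 404, 200, 404, 410, 404]:
        record_fetch(rec, code)
        print(code, rec.consecutive_gone, rec.indexed)
```

The design choice here is that only a normal 2xx response resets the counter; a 503 in the middle of a run of 404s neither resets nor advances it.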
Now, it's always tricky, because if you get it wrong one way, people are unhappy, and if you get it wrong the other way, people are unhappy.
So we try to find a balance based on the feedback and complaints we hear, what people are happy about and what they're sad about, to arrive at something like: okay, maybe we'll try this page a few more times and make sure that it's really gone.
Otherwise, you would hate it if you had a temporary glitch with your web server and then Google didn't come back to check on that server for, like, three years or something like that.
So it's the sort of thing where we try to strike a good balance. Thanks for the feedback, though.
I can always talk to the crawl team and find out: do 410s really make things go away faster now, or are they still treated the same?
But at least for the time being, we try to build in that safety margin so that if webmasters do make a mistake, if their server is overloaded, or if their web host configures something incorrectly, it won't sabotage them.
It won't cause long-term damage.
And there will be a way to recover.
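The "come back later" signal mentioned earlier is the 503 status, and HTTP also defines a standard `Retry-After` header that can carry a delay in seconds. Below is a small sketch of how an overloaded server might answer with 503 instead of 404. The load limit `MAX_ACTIVE_REQUESTS`, the 120-second default, and the `overload_response` helper are all hypothetical; how long any particular crawler actually waits is not specified here.

```python
from __future__ import annotations

from http import HTTPStatus

# Hypothetical load limit for this sketch; tune to your own server.
MAX_ACTIVE_REQUESTS = 100

def overload_response(active_requests: int, retry_after_s: int = 120
                      ) -> tuple[HTTPStatus, dict[str, str]] | None:
    """Return (status, headers) when overloaded, or None to serve normally.

    Answering 503 with Retry-After, rather than 404, tells clients the
    outage is temporary and the page itself still exists.
    """
    if active_requests > MAX_ACTIVE_REQUESTS:
        return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": str(retry_after_s)}
    return None
```

A request handler would call this first and only fall through to normal routing (and any genuine 404 or 410) when it returns `None`.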