Something we’ve come across recently is a search query that checks which URLs are listed in Google’s main index. It appears to exclude any webpages which may be listed in Google’s supplemental/secondary index.
The query to find this is: site:domain.com/*
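To make the comparison concrete, here is a minimal sketch of how you might build the two search URLs – the regular site: query and the main-index variant with /* appended. The google_query_url helper is hypothetical, purely for illustration, and domain.com is a placeholder:

```python
from urllib.parse import urlencode

def google_query_url(query):
    """Build a Google web-search URL for a raw query string.
    (Hypothetical helper for illustration -- not an official API.)"""
    return "https://www.google.com/search?" + urlencode({"q": query})

# Full index: every page Google has indexed for the domain.
full_index_url = google_query_url("site:domain.com")

# Main index only (per the trick above): append /* to the site: operator.
main_index_url = google_query_url("site:domain.com/*")

print(full_index_url)
print(main_index_url)
```

Comparing the result counts of the two URLs is what gives the percentages discussed below.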
At the moment I’m unsure how accurate this is; the only reference I can find is a comment from Halfdeck on Jim Boykin’s How to Find if a Page is in Google’s Secret Supplemental Results from two and a half years ago. And since then, despite pleas to bring it back, Google have removed the queries which helped us to find supplemental pages.
But looking at the results, I think there is some truth to them, even if they are not entirely accurate.
Believable? It seems a bit low to me for a high-quality site like the BBC to have only 2.98% of its pages listed in Google’s main index – but then again, 1.43 million pages is still a lot of content. Amazon.co.uk is a similar story, with 3.22 million pages in the main index as opposed to 145 million in the full index.
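The arithmetic behind these percentages is easy to check. A quick sketch, using only the figures quoted above (the BBC’s implied total is derived from its 2.98% share, not quoted in the post):

```python
def main_index_share(main_count, full_count):
    """Percentage of a site's indexed pages that sit in the main index."""
    return 100.0 * main_count / full_count

# Amazon.co.uk figures quoted above: 3.22M main index vs 145M full index.
share = main_index_share(3_220_000, 145_000_000)
print(f"Amazon.co.uk main-index share: {share:.2f}%")  # roughly 2.22%

# The BBC's 2.98% share with 1.43M main-index pages implies a full-index
# total of roughly 48 million pages (derived, not stated in the post).
implied_total = 1_430_000 / 0.0298
print(f"Implied BBC full index: {implied_total:,.0f} pages")
```

So by this measure Amazon’s main-index share is actually slightly lower than the BBC’s.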
So is this one accurate? My honest answer is that I’m unsure. I posted earlier this week that we now have 877 posts on the blog; in addition to this there is the main site and other blog pages such as tags and categories. So I’d like to think we’d have a large percentage of high-quality content which is well valued by Google. In actual fact, though, when I look at the pages which have built up strong trust in Google – either via external links or via internal linking/navigation/site structure – this figure may be very close. The one thing I didn’t understand was that a large number of blog tag pages were listed, often for tags used only once or twice – so the number of links pointing to these pages is likely to be low, as is the quality of the content on them (which is duplicated from each main post).
And what are the factors which affect this?
If these results are accurate, I’d expect the following factors to have an impact here:
- Link reputation/PageRank to be a large factor for indexation
- Age of site
- History in the search engines
- Inbound links (internal and external) to individual webpages
- Duplicate content
- Volume of content / amount of unique content on-page
So what do you think – have you used this query before, and do the results seem accurate? I definitely don’t think they are 100% accurate, but it may be a useful indicator to keep an eye on – especially if you are having problems generating traffic to pages which are indexed for a regular site: command query, and the solution isn’t an obvious one.