Google Insider: Yes, PageRank Determines Your Indexation Cap


About a month ago, Rand posted about Google’s indexation cap. He wrote:

Google (very likely) has a limit it places on the number of URLs it will keep in its main index and potentially return in the search results for domains.

I have inside information that this is more than “very likely”: it is exactly right, at least since 2006. I got this information by accident; it was forwarded to me as part of a response to a question I posed via friendly channels at Google. The email quoted below is the full, unchanged response from a well-known SEO personality who worked at Google.

The problem that prompted my inquiry was almost exactly what Rand describes: a 40% traffic drop, rankings on “head” terms that were fine, traffic from thousands of long-tail searches gone, and the sinking realization that large parts of my site had gone Supplemental. Here’s the answer I got:

On XX/XX/06, XXXXXXXXXXXXX wrote:

The main issue is the dramatic decline of backlinks (which is likely causing the PageRank issue). The duplicate content issue isn’t that there are lots of pages that are exactly the same, but that each page isn’t unique enough (too much boilerplate that is the same from page to page and not enough content that is different). But fixing that is not going to help too much. It’s mostly the backlink problem.

If the issue on backlinks is that other sites are linking to 404 pages and not actual pages, putting in 301 redirects for every incorrect link (which they can get a list of in webmaster tools crawl errors) will help.

I’ll see if I can find out if the # of backlinks actually did drop or if we just changed our algorithms to discount many of them. I’m not sure what, if anything, we’ll be able to tell him about what I find out though.

Supplemental results aren’t results that don’t change much, they are results that don’t have enough PageRank to make it into our main index (we can’t tell him that, of course). [emphasis added]

To sum up, the key points are:

  • Google imposes a cap on the number of pages you can have in the Main index (unless you have infinite inbound links, see next point)
  • The cap is determined by the number of backlinks to each page (PageRank)
  • Google’s Main index includes only pages with “sufficient PageRank”
  • Everything else (pages with “insufficient PageRank”) goes into Google’s Supplemental Index
  • Google has never publicly confirmed this
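To make the claimed mechanism concrete, here is a minimal sketch under loudly stated assumptions: it runs the textbook PageRank power iteration (damping factor 0.85) over a made-up seven-page site, then applies a hypothetical cutoff, `MAIN_INDEX_THRESHOLD`, to split pages into a “main” and a “supplemental” bucket. The site, the threshold value, and the function names are invented for illustration; neither the email nor Google says how such a cutoff is computed, and Google’s production ranking is far more involved than this toy.

```python
from typing import Dict, List, Tuple

DAMPING = 0.85
# Hypothetical cutoff for illustration only; Google never published one.
MAIN_INDEX_THRESHOLD = 0.15

# Made-up site: page -> pages it links to (internal links only).
SITE: Dict[str, List[str]] = {
    "home": ["category", "about"],
    "category": ["product-a", "product-b", "home"],
    "product-a": ["home"],
    "product-b": ["home", "old-promo"],
    "about": ["home"],
    "old-promo": ["home"],
    "orphan": ["home"],  # nothing links here
}


def pagerank(links: Dict[str, List[str]], iterations: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration; the scores sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new_rank = {p: (1.0 - DAMPING) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly across its outlinks.
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over every page.
                for target in pages:
                    new_rank[target] += DAMPING * rank[page] / n
        rank = new_rank
    return rank


def split_index(
    rank: Dict[str, float], threshold: float = MAIN_INDEX_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Hypothetical rule: enough PageRank -> main index, else supplemental."""
    main = sorted(p for p, r in rank.items() if r >= threshold)
    supplemental = sorted(p for p, r in rank.items() if r < threshold)
    return main, supplemental


if __name__ == "__main__":
    ranks = pagerank(SITE)
    main, supplemental = split_index(ranks)
    print("main index:  ", main)
    print("supplemental:", supplemental)
```

In this toy graph, the pages that receive only a trickle of internal links (the product pages, the old promo, and the orphan) fall below the made-up cutoff, while the pages linked from the home page stay above it.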

Questions I still have:

  • In looking at “backlinks” with respect to the indexation question, is Google doing a simplistic link count, or is “backlinks” a euphemism for PageRank? (My guess is the latter: backlinks = PageRank.)
  • Does Google care whether the backlinks/PageRank to a page come from internal or external sources? (My guess is that internal links still influence the “backlinks” to a given page, and thus it’s possible for a site to influence which pages are in the Main index versus Supplemental by “managing” PageRank via its link graph.)
  • How might Google have changed this algorithm in the past 3 years?

So if your link graph influences which pages are in the Main index (and hopefully I’ve said that generally enough to avoid wading into the debate over nofollow’s efficacy), there are some very striking implications which I’m sure others will explore.

For starters, maybe you deprive some pages of PR (links) so you can concentrate PR on the more valuable pages you can actually “lift” out of the Supplemental results (i.e., pages associated with valuable keywords, or where the SERP is weak enough that a top-3 ranking is achievable). I’ll leave to others the tactical question of whether nofollow tags are an effective way to do this.
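The “concentrate the PR” idea can be illustrated with textbook PageRank on a hypothetical five-page site. In the second graph the home page stops linking to three thin pages, so the rank it used to split four ways flows to the one page you care about. This sketch simply deletes the links rather than nofollowing them, so it says nothing about how Google treats nofollow; the page names and both graphs are made up, and real rankings depend on far more than this calculation.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 100
) -> Dict[str, float]:
    """Textbook PageRank by power iteration; the scores sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # dangling page: spread rank everywhere
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Before: the home page splits its rank across one valuable page and three thin ones.
BEFORE: Dict[str, List[str]] = {
    "home": ["money-page", "thin-1", "thin-2", "thin-3"],
    "money-page": ["home"],
    "thin-1": ["home"],
    "thin-2": ["home"],
    "thin-3": ["home"],
}
# After: the thin pages still exist (and link home), but home no longer links to them.
AFTER: Dict[str, List[str]] = dict(BEFORE, home=["money-page"])

if __name__ == "__main__":
    before, after = pagerank(BEFORE), pagerank(AFTER)
    for page in BEFORE:
        print(f"{page:<11} {before[page]:.3f} -> {after[page]:.3f}")
```

The trade-off is visible in the output: the money page gains roughly what the thin pages lose, which is exactly the kind of choice the paragraph above suggests making deliberately.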

Scheduling Appointments in Google Calendar the Easy Way


Just wanted to announce a handy Google Gadget I created to help automate the task of scheduling appointments.

Install ‘When I’m Free’ Google Calendar Gadget

Background

I use Google Calendar and schedule appointments all the time with people who are outside my company and don’t use Google Calendar, so we can’t see each other’s calendars. That means I spend a lot of time looking at my calendar and suggesting available times. Each time, I spend a few minutes looking over my calendar to prepare an email that includes something like this:

Would any of these times work for you?
– Tues 11/17 3:00 – 5:00 PM
– Wed 11/18 10:00 AM – 2:00 PM
– Thurs 11/19 10:00 AM – 5:00 PM

That looks simple, right? Why do I need a tool to help do that? Because I do this all the time, and every time it takes 3-5 minutes to scan my calendar and suggest times. …AND, if I’m booking an appointment in a different time zone, I like to be nice and translate the times into the other party’s local time zone, which takes another second or two. …AND since I book a lot of appointments, my calendar is constantly changing, so every time I want to suggest some available times, I have to do it all over again. I’m not one to enjoy repetitive tasks, so I looked for a solution.

I’ve used Xobni for Outlook, which has a nice feature called “Schedule time with so and so” that looks at your calendar and automatically prepares an email to someone suggesting upcoming times when you’re available to meet. But I couldn’t find anything like this for Google Calendar, hence this Gadget.

What It Does

It’s a tool that looks at your calendar and creates a list of available times that you can easily copy and paste into an email message. All you do is install it and tell it how many days ahead you want to schedule, which time zone to use, and how much buffer you want between your existing appointments (to avoid getting booked into back-to-back meetings). You’ll then have a list of times ready to copy and paste into an email whenever you need it.
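The core of what the gadget does can be sketched in a few lines. This is not the gadget’s actual source; it is a minimal Python sketch assuming hypothetical names (`free_slots`, `format_slots`), 9-to-5 working hours, a 30-minute minimum slot, and busy times supplied as timezone-aware `datetime` pairs. It pads each existing appointment by the buffer, walks each day looking for gaps, and renders the gaps in the other party’s time zone.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def free_slots(
    busy: List[Interval],
    first_day: date,
    days_ahead: int,
    buffer: timedelta,
    tz: tzinfo,
    day_start: time = time(9, 0),
    day_end: time = time(17, 0),
    min_length: timedelta = timedelta(minutes=30),
) -> List[Interval]:
    """Open intervals within working hours, keeping `buffer` clear around
    every busy block. Busy datetimes must be timezone-aware."""
    # Pad each appointment so nothing gets booked back-to-back with it.
    padded = sorted((start - buffer, end + buffer) for start, end in busy)
    slots: List[Interval] = []
    for offset in range(days_ahead):
        day = first_day + timedelta(days=offset)
        cursor = datetime.combine(day, day_start, tzinfo=tz)
        close = datetime.combine(day, day_end, tzinfo=tz)
        gaps: List[Interval] = []
        for start, end in padded:
            if end <= cursor or start >= close:
                continue  # block lies outside the rest of this working day
            if start > cursor:
                gaps.append((cursor, start))
            cursor = max(cursor, end)
        if cursor < close:
            gaps.append((cursor, close))
        # Drop gaps too short to be worth suggesting.
        slots.extend(g for g in gaps if g[1] - g[0] >= min_length)
    return slots


def _clock(t: datetime) -> str:
    return t.strftime("%I:%M %p").lstrip("0")  # "02:15 PM" -> "2:15 PM"


def format_slots(slots: List[Interval], recipient_tz: tzinfo) -> List[str]:
    """Render slots as email-ready lines in the other party's time zone."""
    lines = []
    for start, end in slots:
        s, e = start.astimezone(recipient_tz), end.astimezone(recipient_tz)
        lines.append(f"– {s:%a %m/%d} {_clock(s)} – {_clock(e)}")
    return lines


if __name__ == "__main__":
    PST = timezone(timedelta(hours=-8), "PST")
    EST = timezone(timedelta(hours=-5), "EST")
    busy = [(datetime(2009, 11, 17, 10, 0, tzinfo=PST),
             datetime(2009, 11, 17, 11, 0, tzinfo=PST))]
    slots = free_slots(busy, date(2009, 11, 17), days_ahead=2,
                       buffer=timedelta(minutes=15), tz=PST)
    print("Would any of these times work for you?")
    print("\n".join(format_slots(slots, EST)))
```

Fixed UTC offsets keep the example self-contained; for real use, `zoneinfo.ZoneInfo("America/Los_Angeles")` handles daylight saving time. A slot that crosses midnight in the recipient’s time zone would need a full date on its end time as well.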