Googlebot as a Persona, for ProductCamp Austin


Here are the resources that will be useful to attendees of my presentation at ProductCamp Austin.  In my opinion, these should be required reading for any Product Manager responsible for driving a material portion of their company’s revenue from a consumer-facing web application.

Any questions? Get in touch!

No Seriously, Why Doesn’t The Wall Street Journal Link to Websites?


At SMX Advanced, I asked the following question of a panel which included Alex Bennert, In House SEO, Wall Street Journal:

“Why do WSJ journalists not link to a website they write about, even when the story is ABOUT the website?”

Ms. Bennert responded that she was not aware of any who were not linking and said, “they should be.”  Danny Sullivan, Editor-in-Chief of Search Engine Land, speculated that perhaps it was because the stories were behind the paywall.

Today, I double-checked, and I can confirm that they don’t link and that the stories aren’t behind the paywall.  I found three recent stories that exemplify the problem, and none are behind the paywall.

Speaking as both a WSJ online subscriber and an SEO, I think the Journal would be more usable if it consistently linked to the websites mentioned in its stories.  Here’s my suggestion: if you mention a company in a story, especially where the company is an online-only entity, please link to it. You could simply modify the software running your blog and publishing system to automatically link anything that ends with “.com” (or another TLD).
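To show how small that change could be, here is a minimal sketch of an auto-linker. It assumes the story body is available as plain text (not HTML that already contains links), and the function name `autolink`, the short `TLDS` list, and the `http://` prefix are my own illustrative choices, not anything the Journal's publishing system is known to use.

```python
import html
import re

# Hypothetical, deliberately short TLD list; a real system would need a fuller one.
TLDS = ("com", "net", "org")

# One or more dot-separated labels followed by a known TLD, e.g. "Smashwords.com".
DOMAIN_RE = re.compile(
    r"\b((?:[A-Za-z0-9-]+\.)+(?:%s))\b" % "|".join(TLDS),
    re.IGNORECASE,
)


def autolink(text: str) -> str:
    """Escape plain text for HTML, then wrap bare domain names in anchor tags."""
    escaped = html.escape(text)
    return DOMAIN_RE.sub(
        lambda m: '<a href="http://{0}">{1}</a>'.format(m.group(1).lower(), m.group(1)),
        escaped,
    )


print(autolink("Read Smashwords.com today."))
```

A pattern this simple will also match the domain part of an email address, so a production version would need a few more guards. The point is only that the basic behavior fits in a dozen lines.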

One other question I have for the Journal: what’s the deal with using the official corporate names of web companies (e.g., “Smashwords, FastPencil Inc. and Lulu Enterprises Inc.” [emphasis mine])?  To me, it’s unnecessary and, in the above case, inconsistently applied, which makes it distracting.  Why didn’t the Journal refer to Smashwords as “Smashwords, Inc.” when it included the “Inc.” for the other two companies mentioned?  I’m probably the only newspaper geek who notices this stuff, but it’s an unforced error in my book.

You may be wondering why I care.  A few years ago, the WSJ mentioned a website in a story, referring to it as “Apartment Ratings, Inc.” without a link.  So much for clicks from the article or Google seeing a quality signal from that story!  I was frustrated by that experience, and ever since I’ve taken note of the way the Journal writes about websites.  I hope Ms. Bennert can help the Journal correct some of these problems.

I also hope the search engineers at Google and Bing find a way to identify companies and websites mentioned in news stories and attribute link juice even if those mentions are not linked.  Seems like it would be not only smart but relatively straightforward for them to monitor authoritative news sources for companies mentioned and treat the mentions just like links.
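As a rough illustration of what "treating mentions like links" might look like, here is a minimal sketch. It assumes the search engine already has a lookup table from company names to domains; the function name `implied_links`, the table, and the `.example` domains are all hypothetical placeholders, and nothing here reflects how Google or Bing actually work.

```python
import re


def implied_links(article_text, known_companies):
    """Return the domains of known companies that are named in the text."""
    found = []
    for name, domain in known_companies.items():
        # Whole-word, case-insensitive match on the company name.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, article_text, re.IGNORECASE):
            found.append(domain)
    return sorted(found)


# Hypothetical lookup table; the domains are placeholders, not real sites.
companies = {
    "Smashwords": "smashwords.example",
    "FastPencil": "fastpencil.example",
    "Lulu": "lulu.example",
    "Acme Books": "acmebooks.example",
}

story = "Smashwords, FastPencil Inc. and Lulu Enterprises Inc. offer self-publishing."
print(implied_links(story, companies))
```

The hard part in practice would be disambiguation (company names that are also ordinary words), which this sketch ignores entirely.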

Google Insider: Yes, PageRank Determines Your Indexation Cap


Rand posted about a month ago on Google’s indexation cap.  He wrote,

Google (very likely) has a limit it places on the number of URLs it will keep in its main index and potentially return in the search results for domains.

I have inside information that this is more than “very likely”: it is exactly right, at least since 2006.  I got this information by accident; it was forwarded to me as part of a response to a question I posed via friendly channels at Google.   The email quoted below is the full, unchanged response from a well-known SEO personality who worked at Google.

The problem that inspired my inquiry was almost exactly what Rand describes: a 40% traffic drop, rankings on “head” terms that were fine, traffic from thousands of long-tail searches gone, and the sinking realization that large parts of my site had gone Supplemental.   Here’s the answer I got:


The main issue is the dramatic decline of backlinks (which is likely causing the PageRank issue). The duplicate content issue isn’t that there are lots of pages that are exactly the same, but that each page isn’t unique enough (too much boilerplate that is the same from page to page and not enough content that is different). But fixing that is not going to help too much. It’s mostly the backlink problem.

If the issue on backlinks is that other sites are linking to 404 pages and not actual pages, putting in 301 redirects for every incorrect link (which they can get a list of in webmaster tools crawl errors) will help.

I’ll see if I can find out if the # of backlinks actually did drop or if we just changed our algorithms to discount many of them. I’m not sure what, if anything, we’ll be able to tell him about what I find out though.

Supplemental results aren’t results that don’t change much, they are results that don’t have enough PageRank to make it into our main index (we can’t tell him that, of course). [emphasis added]
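The 301 advice in the email is the most actionable part, so here is a minimal sketch of how the mapping might be built. It assumes you have exported the 404 paths from the crawl-errors report and have a list of your live paths; `build_redirects`, the sample paths, and the 0.8 similarity cutoff are my own illustrative choices, and fuzzy matching with Python's `difflib` is just one way to guess which page a broken link was meant to reach.

```python
import difflib


def build_redirects(broken_paths, valid_paths, cutoff=0.8):
    """Map each broken path to its closest live path, if one is similar enough."""
    redirects = {}
    for path in broken_paths:
        # get_close_matches returns the best candidates at or above the cutoff.
        matches = difflib.get_close_matches(path, valid_paths, n=1, cutoff=cutoff)
        if matches:
            redirects[path] = matches[0]
    return redirects


# Hypothetical sample data standing in for a crawl-errors export.
broken = ["/reviews/austin.htm", "/xyz"]
live = ["/reviews/austin.html", "/about/"]

for old, new in build_redirects(broken, live).items():
    print("301:", old, "->", new)
```

Paths with no close match are left out of the mapping on purpose, so they can be reviewed by hand rather than redirected to the wrong page.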

So to sum up, the key points are:

  • Google imposes a cap on the number of pages you can have in the Main index (unless you have infinite inbound links, see next point)
  • The cap is determined by the number of backlinks to each page (PageRank)
  • Google’s Main Index includes only pages with “sufficient PageRank”
  • Everything else (pages with “insufficient PageRank”) goes into Google’s Supplemental Index
  • Google has never publicly confirmed this

Questions I still have:

  • In looking at “backlinks” with respect to the indexation question, is Google doing a simplistic link count, or is “backlinks” a euphemism for PageRank? (My guess is that it’s the latter: backlinks = PageRank.)
  • Does Google care whether the backlinks/PageRank to a page come from internal or external sources? (My guess is that internal links still influence the “backlinks” to a certain page and thus it’s possible for a site to influence which pages are in the Main index versus Supplemental by “managing” PageRank via their link graph).
  • How might Google have changed this algorithm in the past 3 years?

So if your link graph influences which pages are in the Main index (and hopefully I’ve said that generally enough to avoid wading into the debate over nofollow’s efficacy), there are some very striking implications which I’m sure others will explore.

For starters, maybe you deprive some pages of PR (links) so you can concentrate the PR on more valuable pages you can actually “lift” from the Supplemental results (i.e. pages associated with valuable keywords or where the SERP is weak enough to achieve a top 3 ranking).  I’ll leave the tactical question regarding whether nofollow tags are effective ways to do this to others.
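To make the idea concrete, here is a minimal sketch using the textbook PageRank power iteration on a toy four-page site. Everything in it is an assumption for illustration: the page names, the 0.85 damping factor, and especially the 0.2 "main index" threshold are hypothetical, and the textbook formula is not Google's production algorithm. The sketch also models removing internal links outright, which is not the same thing as adding nofollow.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def main_index(rank, threshold):
    """Pages whose score clears a hypothetical 'sufficient PageRank' bar."""
    return sorted(p for p, score in rank.items() if score >= threshold)


# Before: the home page splits its links three ways.
before = {"home": ["money", "about", "terms"],
          "money": ["home"], "about": ["home"], "terms": ["home"]}
# After: the home page links only to the valuable page.
after = {"home": ["money"],
         "money": ["home"], "about": ["home"], "terms": ["home"]}

THRESHOLD = 0.2  # hypothetical cutoff, chosen only to illustrate the effect
print(main_index(pagerank(before), THRESHOLD))
print(main_index(pagerank(after), THRESHOLD))
```

In this toy graph, concentrating the home page's links moves the "money" page above the cutoff while the "about" and "terms" pages fall to the minimum score. That is the trade-off described above: some pages are starved so that others can be lifted.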