Good Ranking at Google is about Balance

Ranking is a balance of several factors, the most important being title tag, (inbound) link text, and PageRank. Every page ranks based on its own measures of these factors sorted against all other pages in the index. Theoretically, every single page ranks for every single search query, but the vast majority don’t rank very well, so we never see this aspect.

The title tag is an on-page factor, and the most important of the on-page factors at all the engines, Google included. In fact, at Google I can get a blank page to rank in a competitive search so long as I use a good title tag and work hard on my text links. This would be much harder to pull off at Yahoo, where on-page text factors are more important.

The first of the off-page factors is link text. In their original technical papers the founders of Google told us that they take the link text pointing at a page and store it with the target page. This is huge. What it means is that the query engine has only to look at a page itself to analyze the links to that page. It also tells us that nothing other than the link text itself matters in a link between pages. There is some current argument that this is no longer true, but we’ll leave that for another day.
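To make that concrete, here is a minimal sketch of the idea in Python. This is my reconstruction for illustration, not Google’s actual code, and the URLs and anchor texts are made up:

    # Minimal sketch: during indexing, anchor text is stored with the
    # *target* page, so the query engine never has to walk the link
    # graph at query time. All URLs and anchors here are hypothetical.
    from collections import defaultdict

    # (source_url, anchor_text, target_url) triples found while spidering
    links = [
        ("http://example.com/a", "garden hammocks", "http://example.com/hammocks"),
        ("http://example.com/b", "rope hammocks",   "http://example.com/hammocks"),
    ]

    link_reputation = defaultdict(list)
    for source, anchor, target in links:
        # The anchor text is filed under the target, not the source.
        link_reputation[target].append(anchor)

    print(link_reputation["http://example.com/hammocks"])
    # ['garden hammocks', 'rope hammocks']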

This accumulation of link text to a target page did not have a name in the Spring of 2002 when I was completing OptiLink, so we had to come up with a name to tell people what we were doing. We decided to name this analysis of the links Link Reputation to differentiate it from Link Popularity which names the mere counting of links.

Link Reputation, because of the way Google handles links, is not "transitive": a link from page A to page B adds to the Reputation of page B, but none of the links to page A have any influence on the Reputation of page B. For example, suppose we get a link to our hammock site from a gardening site, and that gardening site has a bazillion links to it that say "gardening". Will our hammock site rank for gardening? No. This is easy to prove for yourself by looking at search results and doing some analysis with OptiLink.

But PageRank, by contrast, is transitive. It assigns a number to every page in the Google index using a recursive computation over the entire link graph of the index — a mathematical implementation of "what goes around, comes around". PageRank is one of those "elegant" ideas that takes some thinking to understand. If you are interested, take a look at my publications archive, my Dynamic Linking eBook, or my Mastering PageRank video.
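For a flavor of how that recursive computation works, here is a toy power iteration in Python over a made-up three-page graph. The damping value of 0.85 is the one from the original paper; everything else is simplified for illustration and omits real-world details like dangling pages and scale:

    damping = 0.85
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # page -> pages it links to

    pr = {page: 1.0 / len(graph) for page in graph}
    for _ in range(50):  # iterate until the values settle
        new_pr = {}
        for page in graph:
            # Sum the rank flowing in from every page that links here.
            incoming = sum(pr[p] / len(graph[p]) for p in graph if page in graph[p])
            new_pr[page] = (1 - damping) / len(graph) + damping * incoming
        pr = new_pr

    print(pr)  # rank flows transitively: links to A indirectly boost B and C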

But for now let’s talk some about balance.

We’ve all seen cases where a page with few links can beat a page with many links if it has higher PageRank (from its few links). The converse is also true. So how do we get top ranking? First, we do the best we can on each of the major factors (title, link text, PageRank), and then we look at the pages we are competing against and find their weakest aspect. It might be that we can create better link text; maybe their title is not as good as it could be; or maybe we have to build more PageRank. Whatever it is, the search results themselves tell us what works; we just have to do better.

Google Doesn’t Do Synonyms

Rarely do we want to rank for only one search phrase. Generally there are many different words and phrases that humans might use to get to your pages, different phrases that I refer to here very roughly as "synonyms".

But the notion of synonymy is a "semantic" concept — one that involves meaning — rather than a "syntactic" concept — one that relates only to the form of the words. Search engines don’t do semantics, only syntax.

For example, I know that a force transducer is the same thing as a load cell, but these semantically equivalent terms are syntactically disjoint, so ranking for one will not get you ranking for the other in today’s search engines.
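A purely syntactic matcher makes the point in a few lines. This sketch is mine, not anything from inside an engine:

    # Two semantically equivalent phrases share no tokens, so a
    # syntactic matcher sees no overlap at all.
    def tokens(phrase):
        return set(phrase.lower().split())

    print(tokens("load cell") & tokens("force transducer"))   # set() -- no match
    print(tokens("load cell") & tokens("load cell central"))  # {'load', 'cell'}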

This is especially problematic where your company name, which often gets used as link text, does not contain your preferred search term. Here’s a concrete example I found to illustrate this point, but you can find many of your own.

A company named Transducer Techniques sells load cells. Totally obvious to me, a total mystery to Google. Many of the links to this firm are from industry directories that use the company name, so they are top ranked for the word "transducer". But according to WordTracker, the real traffic is for "load cell", where this company ranks #4.

Now compare Transducer Techniques to Load Cell Central, the top-ranked page for "load cell", which is not even in the first 100 results for "transducer".

This is entirely a result of link text. If links like Transducer Techniques were replaced with links like The Load Cell Experts, the rankings would likely be very different, given the difference in PageRank between these two sites.

A Brief Outline of the Google Architecture

The very front end of Google is the spider function, which fetches pages and produces a queue of visited pages along with the content of each. This feeds the indexer, which matches these pages against existing pages in the index and either creates a new index entry or updates an existing one with the new content. Originally at least, and I think probably still, the indexing stage is also where link text is propagated from linking pages to destination pages, where it is stored as an augmentation to the target page. This is — IMHO — one of the great ideas inside Google, as it essentially reduces a key off-page analysis to one that can instead be conducted entirely on-page.

As an entirely separate process that uses the existing index as input, the PageRank algorithm is run to create a "side file" of PR values indexed by the same unique identifier used to access pages. The architecture makes no demands as to when PR is updated — it can be on any schedule they like.

What normal humans, not to be confused with SEOs :-), call "the search engine" is really the "query engine". This takes the index of content and the PageRank values and performs a computation to rank results as each user query is received and processed. Clearly, there can be no pre-processing of queries — there is just raw data to feed into this engine.
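As a rough illustration, here is a toy query engine in Python. The scoring formula is entirely invented; only the shape of the data, a content index plus a PR side file joined by document id, follows the description above:

    # Hypothetical query engine: combine a text-relevance score with the
    # precomputed PageRank side file, both keyed by document id.
    index = {  # docid -> searchable text (page content + accumulated link text)
        1: "load cell force transducer catalog",
        2: "load cell central the load cell experts",
    }
    pagerank = {1: 0.31, 2: 0.12}  # precomputed side file, keyed by docid

    def score(query, docid):
        words = index[docid].split()
        relevance = sum(words.count(term) for term in query.split())
        return relevance * pagerank[docid]  # assumed combination, not Google's

    results = sorted(index, key=lambda d: score("load cell", d), reverse=True)
    print(results)  # docids ranked at query time from precomputed data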

So what happens to a brand new page?

A new page doesn’t actually "exist" as far as Google is concerned until it is spidered and indexed. Once it is indexed, meaning you can find it at Google using a site: query on your domain, it is my best understanding that all Link Reputation effects are baked into the index and fully affect search queries. Since spidering and indexing is the fastest part of Google, this explains the similarly fast positioning changes that can be accomplished through link text alone.

PageRank is another matter and can take any amount of time. Nothing in the software architecture requires any particular schedule. In fact, my research suggests that the way PR is computed, and the schedule used to compute it, are drastically different today than they were 3 years ago.

While awaiting PageRank values for a new page, it can be very difficult for that page to rank for any significant searches. Likewise, making significant linking topology changes in an existing website, using my Dynamic Linking approach for example, will generally take quite a while to have a ranking impact because of how long it takes for PR values to be recomputed across your site.

Links that open a new window. Do they pass PageRank and Link Reputation?

Spiders are simple little animals, and the theory that guides the way search engines are built admits of few special cases, so generally speaking, a link is a link is a link. Let’s consider some cases.

The most frequent question is what happens in the target="_blank" case. This is very common, so it would have staggering consequences if it did not act as a "normal" link. Moreover, treating it differently is theoretically unsound, as linking into a new window is not conceptually different, in terms of citation considerations, from linking in the current window.

The question of the use of style classes and DOM ids also comes up. I am certain that these are simply ignored by all search spiders, but that is a subject for another day (why spiders are so dumb).

BTW, OptiLink and OptiSpider both treat all of these links the same. The only exception is the rel="nofollow" attribute (Google’s simplified approach to Dynamic Linking) which is optionally processed by both programs so that our computation of Link Reputation will follow what Google does.

What kinds of “links” are spiderable

When most webmasters talk about links, they almost always mean "navigation". There are several ways to connect your pages together so that humans can get from one to the next.

By contrast, there is precisely one way for spiders to get from one page to another, and that is the HTML <A> tag. Looking at a page in your browser, you will often be unable to tell which navigation is actually made of spiderable links. That’s one of the primary uses of OptiSpider.

OptiSpider, like the search engines, is only able to follow <A> tags — all other kinds of navigation, such as JavaScript and form buttons to name two examples, are completely invisible to OptiSpider.
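To illustrate, here is a simplified Python link extractor of my own. It is not OptiSpider’s code, but it behaves the same way on these cases: plain links and target="_blank" links are found, JavaScript navigation is invisible, and rel="nofollow" links can optionally be skipped:

    # Sketch of why only <A> tags are spiderable: an HTML parser that
    # extracts href values sees the plain links but not JavaScript
    # navigation. The sample page below is made up for illustration.
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        def __init__(self, honor_nofollow=True):
            super().__init__()
            self.links = []
            self.honor_nofollow = honor_nofollow

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and "href" in attrs:
                if self.honor_nofollow and attrs.get("rel") == "nofollow":
                    return  # optionally skip, as OptiLink/OptiSpider can
                self.links.append(attrs["href"])  # target/class/id are ignored

    page = """
    <a href="/plain.html">plain</a>
    <a href="/new-window.html" target="_blank">new window</a>
    <a href="/sponsored.html" rel="nofollow">sponsored</a>
    <span onclick="location='/js-only.html'">JavaScript nav</span>
    """
    extractor = LinkExtractor()
    extractor.feed(page)
    print(extractor.links)  # ['/plain.html', '/new-window.html'] -- no JS link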

So, if you are uncertain if your navigation is spiderable or not, just run OptiSpider and see which links are found and which are not.

What are the problems in ranking?

First, let’s define terms.

Dynamic content generally means that pages are generated by software reading data from a database. This is a common design pattern for big catalog-style websites.

But no matter how the content gets generated, it still ends up being just plain-ol’ HTML by the time it leaves your web server. What the browsers and search engine spiders see is no more nor less HTML than so-called "static" pages. So the content itself is certainly not a problem — the URLs might be.

To generate the right output, a catalog style site will typically use one or more parameters in the query string. For example, /product.jsp?model=12&color=green&style=7.

Historically, search engines have not liked these complex URLs because they will routinely lead to a (nearly) never-ending sequence of pages. A "plain" URL — one without a query string — tended to have an advantage over complex URLs.

This is not as much of a problem today as it once was, but it is still better to avoid complex URLs. Doing so requires the use of a URL rewriting engine like Apache’s mod_rewrite module, so it is fairly technical, but once done, a "dynamic" website can be made to look entirely "static".
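As a sketch of what a rewriting engine does, here is an illustrative Python stand-in, not mod_rewrite itself, mapping a "static"-looking path onto the dynamic URL from the example above:

    # Map /product/12/green/7 onto the query-string URL the application
    # expects. The URL pattern and parameter names follow the example above.
    import re

    RULE = re.compile(r"^/product/(\d+)/(\w+)/(\d+)$")

    def rewrite(path):
        m = RULE.match(path)
        if m:
            model, color, style = m.groups()
            return f"/product.jsp?model={model}&color={color}&style={style}"
        return path  # no rule matched; pass the path through unchanged

    print(rewrite("/product/12/green/7"))
    # /product.jsp?model=12&color=green&style=7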

Do the engines discount links from the same domain versus links from a different domain?

Not now. Not ever. To be sure, just follow the money.

Big sites involve big dollars, and it is big dollars that make the world go ’round, so you can bet that the now-public Google will have joined the other publicly financed engines in a gentle but certain catering to the American greenback.

Consider: If my name is Bill and I build a 10,000 page website to support my software business, I expect to have a high PageRank at Google and I expect to be able to control the Link Reputation of my home page using my thousands of internal links. Now suppose some disgruntled open-source weenie links to me with the text "Windows is Evil" and gets ranked ahead of me for my own product name? I would have good reason to be upset.

So, you can bet that at the very next social event for young billionaires, Bill will corner Larry and get it fixed.

But seriously, if internal links are significantly discounted relative to external links, then small sites always gain an advantage over large sites. This is very bad. In general, big sites actually do deserve to rank better than small sites, external linking being more or less equal, and counting internal links accomplishes that automatically.

If you rely instead solely on external links, what you will find is that the first to get a top rank will continue to keep top rank because it is top ranked pages that get most of the links. This would make it even harder for a large site to displace a small site that happens to get top ranking.

And finally, just go look at some search results with OptiLink. Big sites have a clear advantage. Do they have more external links than small sites? Only some of the time. It still appears that links from any source are sufficient, so they might as well be your own.

Is a MiniNet an effective strategy for ranking against a million competitors?

The simple answer is yes. If that’s all you need to know, stop now and get back to work! 😉 But if you want to know why…

I am often asked variations of this question, and the answer is always yes. Huh? Much like in the Hitchhiker’s Guide, it is the question itself that is wrong, leading therefore always to the same, unhelpful, answer.

The capacity to rank depends on the total number of pages plus the strategy used to link those pages together. Megasite, MiniNet, blog — it doesn’t matter — pages are pages, and the way they are organized into domains simply does not matter. It is the linking that matters.

Once you understand how to do the linking to make best use of the pages you do have, then you can get to the "right" question — the one the Vogons (almost) destroyed to build an intergalactic bypass. 😉 Fortunately I caught it just in time.

For any ranking task, the real question is "how many pages do I need?"

Pages are the ultimate source of ranking power. Smart linking allows you to make the best use of that power. If you are not ranking where you want to, you must either use what you have more effectively (via linking) or increase the raw power you have available (via more pages). Most ranking solutions involve some of both.

So back to those MiniNets…

Michael Campbell’s network structures are some of the best at using ranking power, so they are indeed a good place to start for most purposes (of course, there are always exceptions), leaving us only the real question of "how many pages". That is what OptiLink is designed to help answer. By examining the quantity and quality of linking employed by top-ranking pages, you can estimate what you will need to build to be top-ranked yourself.

Can I use Blogs alone to get good rankings?

Blog or not blog is not the real issue — it’s all about pages. Blogging software just happens to be a readily available content management tool that works fairly well from a linking perspective.

Content (pages) is what ultimately creates Google PageRank and provides places to create links to other pages. The more pages the better, and blogs happen to be pretty decent at creating pages from content. The Mastering PageRank video shows a diagram of why that is.

There are examples of folks making money online with nothing but blogs — just as there are examples of folks making money online completely without the use of blogs. Success is about pages plus linking.

Is too much nofollow a bad thing?

Some webmasters worry that Google will detect pages with lots of incoming links and very few outgoing links, or lots of nofollow links, or some other pattern that looks like nofollow is being used to game PageRank.

Certainly: Not yet. Probably: Not ever.

One of my clients ranks #4 in 3.2 million results and has religiously expunged nearly all off-site links to get there. This was done with the (classical?) JavaScript Dynamic Link rather than the newer nofollow link because the site in question predates nofollow. A nofollow implementation should work just as well.

Moreover, blogs that allow commenting, and that have nofollow enabled on comments, will look like PageRank is being gamed, when in fact, it is completely automated. This will become more common rather than less so, leading me to conclude that filtering on the use of nofollow is a non-starter.