The Google Top Three

Google depends primarily on three characteristics to rank pages. They are:

  • Page Title
    Having the search term in the title of the page you want to rank is key to getting ranked for that term. If it is a multi-word term, don’t break up the term with additional words. For example, ranking for Miami Vacation is more easily done with a title like Best Miami Vacation Packages than with Miami and Orlando Vacations. Only the title of the page you are trying to rank matters; the titles of linking pages and other pages on your site are not considered.
  • Inbound Link text
    The link text that refers to a page is very important in ranking the page. The link text is the text that occurs between the <a> and </a> tags in HTML. This will generally be displayed as blue underlined clickable text in a user’s browser. The alt text in images does not seem to be used by the engines; only the link text itself is used.
  • PageRank
    This is a feature only at Google, at least until the year 2011, and is a major factor in ranking. It is also fairly involved to manipulate and is the slowest changing aspect of ranking.
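
To make "link text" concrete, here is a minimal sketch in Python of pulling the text between <a> and </a> out of a page. The names LinkTextCollector and link_texts are mine, made up for illustration, and this is not how any engine actually does it. It ignores image alt text on purpose, to match the observation above.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._parts = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        # Only text nodes count; attributes such as alt are never seen here.
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.links.append((self._href, text))
            self._href = None


def link_texts(html):
    """Return a list of (href, link text) pairs found in an HTML string."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

A link that wraps only an image comes back with empty link text, no matter what the image's alt attribute says.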

By the way…
The other major engines, MSN and Yahoo, cannot use PageRank, but they do have other link "topology" based schemes. The simplest of these is "Link Popularity". We can be pretty sure that what MSN and Yahoo use is more advanced than this, but it does appear to be way easier to "game" than PageRank. That said, if you optimize for PageRank, you will often do what needs to be done to rank at the other engines as well.

The real meaning of Google’s Many Patents

Google has been keeping the patent office busy the last couple years, and reading all those patents has been keeping quite a few SEO gurus busy as well. But figuring out what all those patents are really about is actually pretty easy. In fact, you don’t even have to read them.

Google is acquiring and hoarding Intellectual Property (IP) as a means to create a barrier to entry. This is a marketing and business idea, not an engineering and technical one. By patenting everything under the sun, they tie up core algorithms so that would-be competitors are blocked from using them to build search services. Google is not the first to use this practice; Intel in particular is known for it.

Google also profits from the added advantage that it keeps SEOs confused and busy reading long-winded material that isn’t actually being used, and probably won’t be.

That said, some of these "disclosures" are worth a look and the ideas should at least be tested against the index to see if there is any sign of them being implemented. Don’t count on it. Measure for it. For any one of these patents it’s a far better bet that they are just sitting on it, and not actually using it.

How to write the nofollow attribute

A question came up from one of the owners of the Mastering PageRank video concerning the way I wrote the nofollow attribute. There is a general issue here that should be answered, so I’ll do so here.

The way a browser or a spider processes HTML will create an internal data structure that has all the information from the page, but the order of attributes will NOT be preserved, nor even recognized. So, as an example, writing an <a> tag with the href first and the rel="nofollow" second is no different than giving the attributes in the other order.

In fact, this is required by the HTML and XML specifications: attribute order is not significant.

The way I usually write nofollow links is to place it as the first attribute, like so: <a rel="nofollow" href=… because this allows me to easily find it in the source when checking my pages for errors. But that’s just me.
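
If you want to see this for yourself, here is a small sketch using Python's standard html.parser. The helper names parse_anchors and is_nofollow are made up for illustration. Once the attributes are in a dictionary, the order they were written in is simply gone.

```python
from html.parser import HTMLParser


class AnchorAttrs(HTMLParser):
    """Records the attributes of every <a> tag as a dictionary."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs; a dict
            # comparison does not care about the order of those pairs.
            self.anchors.append(dict(attrs))


def parse_anchors(html):
    """Return one attribute dictionary per <a> tag in the HTML string."""
    parser = AnchorAttrs()
    parser.feed(html)
    parser.close()
    return parser.anchors


def is_nofollow(attrs):
    """True if the rel attribute contains the nofollow token."""
    rel = attrs.get("rel") or ""
    return "nofollow" in rel.lower().split()
```

Feed it the same link written both ways and the two dictionaries compare equal, and both report nofollow.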

Relative vs. Absolute links

Rumored to have come from an advanced SEO seminar is this latest SEO myth stemming from a complete lack of understanding of the underlying technology. As the story is told, Absolute Links pass PageRank but Relative Links do not. Horse-hockey!

There are actually three varieties of links. An absolute link is one that includes complete domain name and path information, like http://www.windrosesoftware.com/index.html. A domain relative link is one lacking a domain name, but including absolute path information, such as /site/index.php. The final form is the path relative link, which lacks both the domain name and the leading ‘/’. For example, site/index.php.
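
Here is a rough sketch of telling the three varieties apart with Python's urllib.parse. The function name link_kind is hypothetical. It only covers the three forms described above, and as a simplifying assumption anything that names a host is treated as absolute.

```python
from urllib.parse import urlsplit


def link_kind(href):
    """Classify an href as absolute, domain relative, or path relative."""
    parts = urlsplit(href)
    if parts.netloc:
        # A host name is present. Assumption: scheme-relative links such
        # as //host/path are lumped in with absolute links here.
        return "absolute"
    if parts.path.startswith("/"):
        # No host, but the path starts from the root of the site.
        return "domain relative"
    # No host and no leading slash: relative to the current document.
    return "path relative"
```

With the examples above, http://www.windrosesoftware.com/index.html is absolute, /site/index.php is domain relative, and site/index.php is path relative.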

A Search Engine Spider, just like your desktop browser, is simply an HTTP/HTML client program. It makes a request via HTTP of a web server and processes the HTML text that is returned as a result. This is the entirety of the interaction with the server. All that is left is to process the HTML locally.
To resolve the links in the document, the spider/browser has to take two steps.

First, the "base URL" for the document must be determined. By default, this will be the absolute URL of the document itself. However, the <base> tag can be used in a document to override the base URL used for path relative links found within the document. All browsers and spiders must look for this tag and modify the base URL for the document appropriately before doing any link processing.

Second, with the base URL now in hand, each link must be "canonicalized". What that twenty dollar word means is "to put into standard form", which in the case of URLs is the same as saying "make all URLs absolute".

  • Absolute URLs obviously don’t change at all, since they are already canonical;
  • Domain relative URLs get the scheme and domain of the base URL added; and
  • Path relative URLs get the base URL, up to and including its last ‘/’, added as a prefix.
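
The two steps can be sketched with Python's standard urljoin, which follows the usual URL resolution rules (a path relative link replaces the last segment of the base path). The function canonicalize and the example.com URLs in the usage note are made up for illustration. This is just the same resolution every HTML client has to do, not any engine's actual code.

```python
from urllib.parse import urljoin


def canonicalize(href, document_url, base_href=None):
    """Turn any of the three link varieties into an absolute URL.

    document_url is the absolute URL the page was fetched from.
    base_href is the href of the page's <base> tag, or None if absent.
    """
    # Step 1: determine the base URL. A <base> tag overrides the default.
    base_url = urljoin(document_url, base_href) if base_href else document_url
    # Step 2: resolve the link against the base URL.
    return urljoin(base_url, href)
```

For a page at http://www.example.com/site/page.html, the link /site/index.php becomes http://www.example.com/site/index.php and other.php becomes http://www.example.com/site/other.php. An absolute link comes back unchanged.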

So why does it have to be this way? Because spiders deal in "pages", not "sites", there is no way to process non-canonicalized URLs. You can either process absolute URLs or carry around the base URL separately, because a relative URL is not meaningful in isolation from the document where it is found. This is so fundamental to the task of parsing HTML that the only sensible place for the search engines to canonicalize URLs is in the software that does the spidering of pages. Once that is done, links of every variety look identical.

Moreover, even absolute URLs have problems, owing to what I personally consider a bug in the HTTP specification, so even absolute URLs are not the basis of indexing within search engines. Google uses what the founders called a "docID" to uniquely identify the pages added to the Google index.

Somewhere early in the Google machine, all links are transformed from references via (absolute) URL to references involving the docID. For good technical reasons, the other engines will be similarly organized so that the (original) form of a URL will cease to be known to the algorithms downstream of the spidering application.
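
To picture what that transformation means, here is a toy sketch. The class DocIndex is entirely hypothetical and says nothing about how Google actually assigns docIDs. The point is only that once links are stored as pairs of IDs, nothing downstream can tell how the link was originally written.

```python
class DocIndex:
    """Toy index mapping canonical (absolute) URLs to integer doc IDs."""

    def __init__(self):
        self._ids = {}    # canonical URL -> integer ID
        self.links = []   # (source ID, target ID) pairs

    def doc_id(self, url):
        """Return the ID for a URL, assigning the next free one if new."""
        return self._ids.setdefault(url, len(self._ids))

    def add_link(self, source_url, target_url):
        """Record a link between two already-canonicalized URLs."""
        self.links.append((self.doc_id(source_url), self.doc_id(target_url)))
```

After two pages link to each other, the stored links are just (0, 1) and (1, 0). Whether the HTML said b.html or http://example.com/b.html is no longer recorded anywhere.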

Can linking to a non-related site hurt our PageRank?

All links divert PageRank based on the number of links on the page where the new link appears. The topic of the pages involved in the linking is of no importance to PageRank at all. PageRank only considers the linking structure. As far as PageRank is concerned, the pages could be blank.
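
Here is a minimal sketch of the textbook PageRank iteration, which may well differ from what Google runs in production. The function name, the 0.85 damping factor, and the handling of pages with no outbound links are assumptions taken from the commonly published formulation. Notice that the input is nothing but link structure: no titles, no text, no topics.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution.

    links maps each page name to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Assumption: a page with no links spreads its rank evenly.
            targets = outlinks or pages
            # Each link carries an equal share of the page's rank.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

Run it on a small graph such as {"home": ["a", "b"], "a": ["home"], "b": ["home"]} and the ranks always add up to 1. Adding one more link to a page shrinks the share that each of its existing links passes along.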

What characteristics increase PR?

Every page has a native PageRank. It is pages that create PR. Links only distribute it. The theoretical definition of the PR of a page is the probability that a ‘random surfer’ will access the page. PR is ‘conserved’, meaning that it is not lost and is not created by any means other than the creation (or destruction) of pages, and the sum of the PR of all pages across Google’s entire index is ‘100%’. There’s nowhere else for the random surfer to go. Well, except Yahoo or MSN ;-). Keep your eyes on the big three:

  1. Pages are like tickets to the game. The more tickets, the more chances to win. So, make more pages;
  2. Title your pages so that they include search phrases and put said titles on pages that provide meaningful info to humans that enter that search phrase; and
  3. Use link text to tie together your pages and make the link text include the search phrases that the target pages are about.

DON’T LINK TO GOOGLE!! Google is not some jealous God that requires prayer and supplication. All that link does is bleed PR. Kill it! ONLY link to pages that you want to help RANK. Are you trying to help Google rank?!?!

Are sites with the .php extension worth linking with?

Will they be spidered if they do not have an html extension?

Check to see if the page is indexed by the search engine. If it is, then absolutely, it is an effective link. Google in particular indexes Word files (.doc) and Acrobat files (.pdf) in addition to HTML. Moreover, php, asp, and jsp files just create HTML output, so by the time a browser or Google’s spider sees the page there is no difference. If you take a look at the HTTP headers for these pages (you can do this with OptiLink) you will notice that the content type is text/html just like any "regular" HTML file. The only thing unusual is the page filename extension, which bothers Google not in the least.

With really clever folks, you can’t even know what was executed on the server to produce the HTML output. For example, it is trivial to configure a server to execute PHP when .html files are served. One line in an .htaccess file is all that is required, so the filename extension really and truly does not matter. The only thing that browsers and spiders can, and do, trust is the Content-Type field in the HTTP header.
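
A small sketch of what "trust the header, not the extension" looks like in code. The function names are made up for illustration, and the headers are assumed to arrive as a plain dictionary. The URL is passed in but deliberately never looked at.

```python
def media_type(content_type_header):
    """Strip parameters such as charset and normalize the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_html_response(url, headers):
    """Decide whether a response is HTML from its headers alone.

    The url argument (and so the filename extension) is ignored on
    purpose: only the Content-Type header is consulted.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    return media_type(normalized.get("content-type", "")) == "text/html"
```

A page.php served as text/html; charset=UTF-8 counts as HTML. A report.html served as application/pdf does not.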

Are high keyword densities penalized?

When you see really high on-page keyword densities in OptiLink you’ll probably find that the pages are framed and the noframes tag has a very limited string of text in it. Since it is only the noframes tag that the spider is seeing, the keyword density can be very high. But at Google at least there does not appear to be a threshold where some penalty gets imposed. In fact, beyond the title tag, Google does not appear to care about on-page factors at all. At Google, linking is king.

Does Google frown upon pages of the same domain cross linking with each other?

Isn’t this just a normal site navbar? How would a human use a site that did not have such linking?

Or maybe you are referring to multiple subdomains linking to one another. See aboutus.com and howstuffworks.com for two very large, very well constructed, very heavily cross-linked and very well ranked examples.

No, it doesn’t look like there’s any frowning involved. 😉

Will I be penalized for linking to non related sites?

All outbound links are evil (because they bleed PR), but generally a necessary evil, and where they go does not in the least matter.