Archives for June 2005

What kinds of “links” are spiderable

When most webmasters talk about links, they almost always mean "navigation". There are several ways to connect your pages together so that humans can get from one to the next.

By contrast, there is precisely one way for spiders to get from one page to another: the HTML <A> tag. Looking at a page in your browser, you often cannot tell which parts of the navigation are actually spiderable links. Revealing that is one of the primary uses of OptiSpider.

OptiSpider, like the search engines, is only able to follow <A> tags. All other kinds of navigation, such as JavaScript navigation and form buttons, are completely invisible to OptiSpider.

So, if you are uncertain whether your navigation is spiderable, just run OptiSpider and see which links it finds and which it does not.
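OptiSpider's internals are not described here, but the rule itself is easy to illustrate. Below is a minimal Python sketch, using only the standard library's html.parser, of a spider that follows nothing but <a> tags. The sample page and the function name `spiderable_links` are made up for illustration.

```python
from html.parser import HTMLParser


class SpiderableLinkFinder(HTMLParser):
    """Collects href values from <a> tags only, the way a simple spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF> arrives as "a"/"href".
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def spiderable_links(html: str) -> list:
    finder = SpiderableLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


# Hypothetical page: four ways a human can navigate, only two of them <a> tags.
page = """
<a href="/products.html">Products</a>
<span onclick="location.href='/about.html'">About</span>
<form action="/contact.html"><button>Contact</button></form>
<A HREF="/faq.html">FAQ</A>
"""

print(spiderable_links(page))  # ['/products.html', '/faq.html']
```

The JavaScript "link" and the form button never show up, even though a human visitor could use both to reach another page.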

What are the problems in ranking dynamic content?

First, let’s define terms.

Dynamic content generally means that pages are generated by software that reads data from a database. This is a common design pattern for big catalog-style websites.

But no matter how the content gets generated, it still ends up being just plain-ol’ HTML by the time it leaves your web server. What browsers and search engine spiders see is no more and no less HTML than so-called "static" pages. So the content itself is certainly not a problem; the URLs might be.

To generate the right output, a catalog style site will typically use one or more parameters in the query string. For example, /product.jsp?model=12&color=green&style=7.

Historically, search engines have not liked these complex URLs because they will routinely lead to a (nearly) never-ending sequence of pages. A "plain" URL — one without a query string — tended to have an advantage over complex URLs.
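To see why complex URLs worry a spider, it helps to count. The sketch below assumes a hypothetical catalog with 50 models, 4 colors and 10 styles; the value lists are invented and only the parameter names come from the example URL. The 2,000 actual products become six times as many distinct URLs once links list the parameters in different orders, and that is before session IDs, sort orders or optional parameters multiply the count further.

```python
from itertools import permutations, product

# Hypothetical catalog: parameter names from the example URL, values made up.
params = {
    "model": [str(n) for n in range(1, 51)],      # 50 models
    "color": ["green", "red", "blue", "black"],   # 4 colors
    "style": [str(n) for n in range(1, 11)],      # 10 styles
}


def distinct_urls(params: dict) -> set:
    """Every URL a spider could meet if links may list the parameters in any order."""
    urls = set()
    for order in permutations(params):
        for values in product(*(params[name] for name in order)):
            query = "&".join(f"{name}={value}" for name, value in zip(order, values))
            urls.add(f"/product.jsp?{query}")
    return urls


pages = 50 * 4 * 10          # distinct products actually in the catalog
urls = distinct_urls(params)
print(pages, len(urls))      # 2000 12000
```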

This is not as much of a problem today as it once was, but it is still better to avoid complex URLs. Doing so requires a URL rewriting engine such as Apache’s mod_rewrite module, so it is fairly technical, but once it is done, a "dynamic" website can be made to look entirely "static".
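The job a rewriting engine does can be sketched in a few lines. The Python below is not mod_rewrite; it only illustrates the mapping, assuming a hypothetical URL scheme of /product/<model>/<color>/<style>.html that the server translates back into the query-string form the application expects.

```python
import re
from typing import Optional

# Hypothetical "static-looking" URL scheme: /product/<model>/<color>/<style>.html
STATIC_URL = re.compile(r"^/product/(\d+)/([a-z]+)/(\d+)\.html$")


def rewrite(path: str) -> Optional[str]:
    """Map a static-looking path onto the dynamic URL the application expects,
    roughly what a single rewrite rule does on the server side."""
    match = STATIC_URL.match(path)
    if match is None:
        return None  # not one of our product URLs; leave it alone
    model, color, style = match.groups()
    return f"/product.jsp?model={model}&color={color}&style={style}"


print(rewrite("/product/12/green/7.html"))  # /product.jsp?model=12&color=green&style=7
print(rewrite("/about.html"))               # None
```

In Apache, a comparable rule might look something like `RewriteRule ^product/(\d+)/([a-z]+)/(\d+)\.html$ /product.jsp?model=$1&color=$2&style=$3 [L]` (untested; inside an .htaccess file the pattern is matched without the leading slash). The site's own links then point at the static-looking form, so that is all a spider ever sees.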

Do the engines discount links from the same domain versus links from a different domain?

Not now. Not ever. To be sure, just follow the money.

Big sites involve big dollars, and it is big dollars that make the world go ’round, so you can bet that the now-public Google will have joined the other publicly financed engines in a gentle but certain catering to the American greenback.

Consider: If my name is Bill and I build a 10,000 page website to support my software business, I expect to have a high PageRank at Google and I expect to be able to control the Link Reputation of my home page using my thousands of internal links. Now suppose some disgruntled open-source weenie links to me with the text "Windows is Evil" and gets ranked ahead of me for my own product name? I would have good reason to be upset.

So, you can bet that at the very next social event for young billionaires, Bill will corner Larry and get it fixed.

But seriously, if internal links were significantly discounted relative to external links, small sites would always gain an advantage over large sites. This is very bad. In general, when external linking is more or less equal, big sites actually do deserve to rank better than small sites, and counting internal links accomplishes that automatically.
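The effect of counting internal links can be illustrated with the original PageRank formula, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A, C(T) is the number of links out of T, and d is a damping factor usually set to 0.85. The sketch below is a plain power iteration over that formula, not Google's production algorithm; the star-shaped sites are hypothetical, and it assumes, as argued above, that internal links count in full.

```python
def pagerank(links: dict, d: float = 0.85, iterations: int = 100) -> dict:
    """Original (non-normalized) PageRank: PR(A) = (1 - d) + d * sum(PR(T) / C(T)).
    `links` maps each page to the pages it links to; every target must also be a key."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its PageRank evenly among its outgoing links.
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


def star_site(n_pages: int) -> dict:
    """Hypothetical site: home links to every inner page, every inner page links back home."""
    inner = [f"page{i}" for i in range(1, n_pages)]
    links = {"home": inner}
    for page in inner:
        links[page] = ["home"]
    return links


small = pagerank(star_site(3))["home"]
large = pagerank(star_site(10))["home"]
print(round(small, 2), round(large, 2))  # 1.46 4.68
```

With no external links at all, the home page of the 10-page site ends up with roughly three times the PageRank of the 3-page site's home page, purely from its own internal links.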

If you rely instead solely on external links, what you will find is that the first to get a top rank will continue to keep top rank because it is top ranked pages that get most of the links. This would make it even harder for a large site to displace a small site that happens to get top ranking.

And finally, just go look at some search results with OptiLink. Big sites have a clear advantage. Do they have more external links than small sites? Only some of the time. It still appears that links from any source are sufficient, so they might as well be your own.

Is a MiniNet an effective strategy for ranking against a million competitors?

The simple answer is yes. If that’s all you need to know, stop now and get back to work! 😉 But if you want to know why…

I am often asked variations of this question, and the answer is always yes. Huh? Much like in The Hitchhiker’s Guide to the Galaxy, it is the question itself that is wrong, which always leads to the same unhelpful answer.

The capacity to rank depends on the total number of pages plus the strategy used to link those pages together. Megasite, MiniNet, Blog: it doesn’t matter. Pages are pages, and the way they are organized into domains simply does not matter. It is the linking that matters.

Once you understand how to do the linking to make best use of the pages you do have, then you can get to the "right" question — the one the Vogons (almost) destroyed to build an intergalactic bypass. 😉 Fortunately I caught it just in time.

For any ranking task, the real question is "how many pages do I need?"

Pages are the ultimate source of ranking power. Smart linking allows you to make the best use of that power. If you are not ranking where you want to, you must either use what you have more effectively (via linking) or increase the raw power you have available (via more pages). Most ranking solutions involve some of both.
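For one simple structure, the "how many pages" question can be worked out on paper. Assume, as a simplification, a home page H linked from n inner pages P that each link only to H, with H linking to each of them, no other links, and the original PageRank formula PR(A) = (1 - d) + d Σ PR(T)/C(T) with d = 0.85. These are raw PageRank values, not the 0 to 10 toolbar numbers, and real sites with external links will differ.

```latex
\begin{aligned}
PR(P) &= (1-d) + d\,\frac{PR(H)}{n} && \text{(each inner page } P\text{)}\\
PR(H) &= (1-d) + d\,n\,PR(P)\\
      &= (1-d) + d\,n\,(1-d) + d^{2}\,PR(H)\\
PR(H) &= \frac{(1-d)(1+dn)}{1-d^{2}} = \frac{1+dn}{1+d}\\
n &= \frac{(1+d)\,PR(H) - 1}{d}
\end{aligned}
```

With d = 0.85, a target of PR(H) = 10 gives n = (1.85 × 10 − 1) / 0.85 ≈ 20.6, so about 21 inner pages. Checking: (1 + 0.85 × 21) / 1.85 = 18.85 / 1.85 ≈ 10.19.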

So back to those MiniNets…
Michael Campbell’s network structures are some of the best at using ranking power, so they are indeed a good place to start for most purposes (of course, there are always exceptions), leaving us only the real question of "how many pages". That is what OptiLink is designed to help answer. By examining the quantity and quality of linking employed by top ranking pages, you can estimate what you will need to build to be top-ranked yourself.

Can I use Blogs alone to get good rankings?

Blog or not blog is not the real issue — it’s all about pages. Blogging software just happens to be a readily available content management tool that works fairly well from a linking perspective.

Content (pages) is what ultimately creates Google PageRank and provides places to create links to other pages. The more pages the better, and blogs happen to be pretty decent at creating pages from content. The Mastering PageRank video shows a diagram of why that is.

There are examples of folks making money online with nothing but blogs — just as there are examples of folks making money online completely without the use of blogs. Success is about pages plus linking.

Is too much nofollow a bad thing?

Some webmasters worry that Google is detecting patterns that look like using nofollow to game PageRank: pages with lots of incoming links and very few outgoing links, pages with lots of nofollow links, and the like.

Certainly: Not yet. Probably: Not ever.

One of my clients ranks #4 out of 3.2 million results and has religiously expunged nearly all off-site links to get there. This was done with the (classical?) JavaScript Dynamic Link rather than the newer nofollow link, because the site in question predates nofollow. A nofollow implementation should work just as well.

Moreover, blogs that allow commenting, and that have nofollow enabled on comments, will look like PageRank is being gamed, when in fact, it is completely automated. This will become more common rather than less so, leading me to conclude that filtering on the use of nofollow is a non-starter.

The Google Top Three

Google depends primarily on three characteristics to rank pages. They are:

  • Page Title
    Having the search term in the title of the page you want to rank is key to getting ranked for that term. If it is a multi-word term, don’t break up the term with additional words. For example, ranking for Miami Vacation is more easily done with a title like Best Miami Vacation Packages than with Miami and Orlando Vacations. Only the title of the page you are trying to rank matters; the titles of linking pages and other pages on your site are not considered.
  • Inbound Link text
    The link text that refers to a page is very important in ranking the page. The link text is the text that occurs between the <a> and </a> tags in HTML, and it is generally displayed as blue, underlined, clickable text in a user’s browser. The alt text of images does not seem to be used by the engines; only actual text is used.
  • PageRank
    This is a feature only at Google, at least until 2011, when the exclusive period of Google’s license to Stanford’s PageRank patent ends, and it is a major factor in ranking. It is also fairly involved to manipulate and is the slowest-changing aspect of ranking.
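The title and link-text rules above can be checked mechanically. The Python sketch below uses two hypothetical helpers: `title_has_exact_phrase` tests whether a multi-word term appears unbroken in a title (whole words, case-insensitive; a simplification, not a description of how Google matches), and `AnchorTextCollector` gathers the text between <a> and </a>, deliberately skipping image alt text to mirror the observation above.

```python
import re
from html.parser import HTMLParser


def title_has_exact_phrase(title: str, phrase: str) -> bool:
    """True if the words of `phrase` appear together, in order, in `title`."""
    words = re.findall(r"\w+", title.lower())
    target = re.findall(r"\w+", phrase.lower())
    k = len(target)
    return any(words[i:i + k] == target for i in range(len(words) - k + 1))


class AnchorTextCollector(HTMLParser):
    """Collects the text between <a> and </a>; alt text of images inside a link is ignored."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            # Join text fragments (e.g. around <b> tags) and normalize whitespace.
            self.texts.append(" ".join("".join(self._current).split()))
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self._current.append(data)


print(title_has_exact_phrase("Best Miami Vacation Packages", "Miami Vacation"))  # True
print(title_has_exact_phrase("Miami and Orlando Vacations", "Miami Vacation"))   # False

collector = AnchorTextCollector()
collector.feed('<a href="/miami.html">Miami <b>vacation</b> deals</a>'
               '<a href="/beach.html"><img src="beach.jpg" alt="Miami beach"></a>')
collector.close()
print(collector.texts)  # ['Miami vacation deals', '']
```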

By the way…
The other major engines, MSN and Yahoo, cannot use PageRank, but they do have other link "topology" based schemes. The simplest of these is "Link Popularity". We can be pretty sure that what MSN and Yahoo use is more advanced than this, but it does appear to be way easier to "game" than PageRank. That said, if you optimize for PageRank, you will often do what needs to be done to rank at the other engines as well.
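Link Popularity in its simplest form is just a count of inbound links. The sketch below assumes a hypothetical link graph and counts, for each page, how many distinct pages link to it. It does not reflect how MSN or Yahoo actually score links; it only shows why the simple version is easy to game: every linking page counts the same, so a batch of throwaway pages raises the count just as well as a few important ones. In PageRank, by contrast, a link is worth the linking page's own PageRank divided among its outgoing links.

```python
from collections import Counter


def link_popularity(links: dict) -> Counter:
    """Simplest Link Popularity: how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page once
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical link graph: page -> pages it links to
links = {
    "a.html": ["home.html", "b.html"],
    "b.html": ["home.html"],
    "c.html": ["home.html", "home.html", "b.html"],
    "home.html": ["a.html"],
}
print(link_popularity(links).most_common(2))  # [('home.html', 3), ('b.html', 2)]
```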

The real meaning of Google’s Many Patents

Google has been keeping the patent office busy the last couple of years, and reading all those patents has been keeping quite a few SEO gurus busy as well. But figuring out what all those patents are really about is actually pretty easy. In fact, you don’t even have to read them.

Google is acquiring and hoarding Intellectual Property (IP) as a means to create a barrier to entry. This is a marketing and business idea, not an engineering and technical one. By patenting everything under the sun, they tie up core algorithms so that would-be competitors are blocked from using them to build search services. Google is not the first to use this practice; Intel in particular is known for it.

Google also profits from the added advantage that it keeps SEOs confused and busy reading long-winded material that isn’t actually being used, and probably won’t be.

That said, some of these "disclosures" are worth a look and the ideas should at least be tested against the index to see if there is any sign of them being implemented. Don’t count on it. Measure for it. For any one of these patents it’s a far better bet that they are just sitting on it, and not actually using it.

How to write the nofollow attribute

A question came up from one of the owners of the Mastering PageRank video concerning the way I wrote the nofollow attribute. There is a general issue here that should be answered, so I’ll do so here.

When a browser or a spider processes HTML, it creates an internal data structure that holds all the information from the page, but the order of attributes is NOT preserved, nor even recognized. So, for example, writing an <a> tag with the href first and the rel="nofollow" second is no different from giving the attributes in the other order.

In fact, this is required by the HTML and XML specifications: attribute order is not significant.

The way I usually write nofollow links is to place it as the first attribute, like so: <a rel="nofollow" href=… because this allows me to easily find it in the source when checking my pages for errors. But that’s just me.
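Attribute order really does drop out once a page is parsed. In this minimal Python sketch, built on the standard library's html.parser, each <a> tag's attributes are stored in a dict, as a spider might store them; the helper names `parse_links` and `is_nofollow` are hypothetical. (html.parser itself happens to report attributes in source order, but dict equality ignores order.) The sketch also checks rel for nofollow, allowing for rel values that list several space-separated keywords.

```python
from html.parser import HTMLParser


class LinkAttributes(HTMLParser):
    """Stores each <a> tag's attributes as a dict, the way a spider might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append(dict(attrs))


def parse_links(html: str) -> list:
    parser = LinkAttributes()
    parser.feed(html)
    parser.close()
    return parser.links


def is_nofollow(link: dict) -> bool:
    # rel may hold several space-separated values, e.g. rel="external nofollow";
    # a bare <a rel> yields None, hence the `or ""`.
    return "nofollow" in (link.get("rel") or "").lower().split()


first = parse_links('<a href="http://example.com/" rel="nofollow">x</a>')[0]
second = parse_links('<a rel="nofollow" href="http://example.com/">x</a>')[0]
print(first == second, is_nofollow(first))  # True True
```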