
SEO as International Minutia Dealer

I spend a lot of time on the minutia of web publishing. I also work for International clients. Since most of what I do is minutia for International clients, I recently referred to myself as an “International Minutia Dealer” and the poor lady next to me in the group acted shocked and said “I think we have enough war in the world, thank you very much”. Whatever.

So one of the minutia I deal with is the trailing slash. When dealing with frameworks, front controllers, and other-than-Apache web servers, it can be tough to get absolute control over trailing slash minutia. If you don’t know what I mean, consider this SEO quiz:

Q: How many web resources are represented by the following list of URLs, according to Google?

  9. (or index.php or index.asp, your web server’s default)
  13. (your web server default is still index.html)

See what I mean about minutia? That’s 18 versions so far, and I left out some biggies. So what’s the answer? How many of those are unique URLs according to Google? And if we had another search engine today, what would the answer be for that search engine?

How about putting this into a little context. What if you have a page at and it is very, very popular. It has 2 million back links to that exact URL, and 2 million more to, half with a trailing slash ( and half without ( Your boss implements a new site, and your job is to migrate the site to the new server. Oh, and the new server uses a front controlling system such that all urls will look like  You have been reminded that “you were not hired for your web development or web design skillz, so stay out of that kitchen but because you know SEO make sure they don’t screw up and we don’t lose any rank, ok? It’s your only responsibility, so make sure it’s done right.” Nuff said.

That simple example of minutia drives an industry of highly-prized SEO consultants that work diligently while regular SEO “consultants” argue about whether SEO or PPC is the new sliced bread of marketing, or whether or not SEO is a Ron Popeil “set it and forget it” task.

Matt Cutts almost addressed this issue (from a “Google perspective”) last year on his blog. He was trying to clean up some of the mess around Google’s handling of redirects and Google’s use of the word “canonicalization”.

Those of us who suffered through courses on Linear Systems Control Theory in college know that a canonical form is an arrangement of a system such that you represent it with the fewest parts (yet it is still fully represented). Examples of canonicalization are everywhere, even if unlabeled. Here’s another one:

The car has wheels and wheels have wheel covers. If you need to draw (represent) a car, you need to draw it with circles for wheels, because if you drew the car with no wheels people would not say it was a car, but would probably say “it looks like a car with no wheels”, or “it looks like a car that has no wheels”, etc. Draw the wheels and everyone will say “it’s a car”. Did you need to draw the wheel covers? No. The canonical form of that representation (in this very specific example) could be the car body plus circles for wheels.

Google is staffed by a bunch o’ engineers who have probably all taken control systems theory or higher math and so when they wanted to label the idea of “how do we identify the web site resource without all the extra redundancy that might be present in default file names, extensions, meaningless subdomains like www, and trailing slashes”, they probably started simplifying with lingo like “what’s the canonical root”. Geek talk. Of course my example is a physical one for the non-Engineers. Systems theory is not about physical parts like cars and wheels but mathematical equations and representations, which can be mixed and blended as needed to come up with different forms (such as canonical forms). There are actually many kinds of canonical forms. Go figure.

By the way “canonical” is also sometimes defined as “according to the rules” (or canon), but since in this case there are no rules to follow, and the Google people were clearly trying to “figure this out” for a best way, I doubt that’s the source of the use.

Anyway Matt said this about the trailing slash:

Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:


But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.

Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to and the other half go to . Instead, pick the url you prefer and always use that format for your internal links.
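To make Matt’s “pick one and use it consistently” advice concrete, here is a minimal Python sketch of a helper you might run over your own internal links. Everything specific in it is my assumption for illustration, not anything Google has published: the domain `example.com`, the choice of the www host as the preferred one, and the list of default file names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed policy, for illustration only: prefer the "www" host and
# never link to a default file name.
PREFERRED_HOST = "www.example.com"                      # hypothetical domain
SITE_HOSTS = {"example.com", "www.example.com"}         # hypothetical aliases
DEFAULT_FILES = {"index.html", "index.htm", "index.php", "index.asp"}


def canonical_home_url(url: str) -> str:
    """Map the common variants of a URL onto the one form we chose to link to."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # The bare domain and the www host are treated as the same site.
    if host in SITE_HOSTS:
        host = PREFERRED_HOST
    # An empty path (no trailing slash after the domain) means the root.
    path = parts.path or "/"
    # Drop a default file name at the end of the path: /index.html -> /
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_FILES:
        path = head + "/"
    # Fragments never reach the server, so they are dropped here.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    for variant in ("http://example.com",
                    "http://www.example.com/index.html",
                    "http://example.com/store/index.php?x=1"):
        print(variant, "->", canonical_home_url(variant))
```

This only cleans up the links you control. It says nothing about how other people link to you, which is the part the rest of this post worries about.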

A commenter apparently is also a Minutia Dealer because he followed up with this good question:

Thanks Matt for the continued explanations and advice about this stuff. I have been reading up on Canonical issues for a while (suffering from one myself due to not knowing about them before hand and not using 301 protection), I have set up a 301 and on server name resolution so that all requests for the main index page go to (the trailing slash is always added anyway).

Google still shows and in the serps and is docking my PR due to it. Will the 301 be picked up by the main googlebot and remove the index.php reference from the results in due course?

Also I can’t fathom out why this sort of thing isn’t under the webmaster’s control? If I know that the result is WRONG then there should be a system to remove JUST that reference? Is this impossible?

As far as I know that follow up question remains unanswered, but that’s not surprising to me. Google has incorporated some automated “canonicalization” checkers which are pre-programmed to handle these minutia according to “the Google Algorithm”. Matt suggested that the examples are “technically different” but also says Google tries to pick the right ones to represent the web resource. More recently, Google people (was it Matt again?) have said that Google is “pretty good at that stuff” when discussing this very issue (I have to re-locate that reference… don’t have it handy ’cause it just doesn’t matter to me). Of course they are. But that’s not the question. The question is, what does Google do?

The commenter followed the rules and got stuck – he’s got duplicate entries in the index for the same web resource, due to the way Google spidered and indexed his site. He can’t remove the bad one. He can’t fix the problem.

As a professional SEO (International Minutia Dealer) I want to exercise 100% absolute control over how Google spiders, indexes, and serves up my content. I don’t want to “try and see”, and I don’t want to “find out” on a live site. And when Google changes The Algo, I want Google to change it correctly, not just from TheOldGoogleWay to TheNewGoogleWay. I am all in favor of Google getting better over time, but very much against Google just getting “different” over time. I will strive to be TheBest, and get all of my minutia in a row, and I want Google to evolve into TheBest, reward my orderly, technically-correct minutia with error-free, predictable spidering, indexing, and serving in the SERPs. Is that too much to ask?

Of course I recognize the advances Google has made with Webmaster Console (cue Vanessa?). Yes I know there is now a “www or non-www” option in Google sitemaps. That’s not the answer, however. The questions are there even when there are only a few, relatively advanced people asking them. Those questions should be answered. It is not enough to answer the most basic ones once the majority of people are encountering them (think www vs. non-www).

So about that SEO Quiz… what’s the right answer? The first twelve are almost always the same resource, although they do not have to be. Does Google assume they are? Today? The next six are a bit odd, but today’s frameworks make them more common and less unusual. Are they unique according to Google today? Will they be tomorrow? Does anybody know? Can anybody know?

It seems Google figured out the domain fairly well, so issues of http:// or not are non-issues, but https:// and http:// are different as they should be *unless* you mix them yourself and then I suspect https:// is fair game for indexing even if there is a robots exclusion. Google does a fair job of picking www or non-www from the way people link to you, the way you use it yourself for internal linking, and your preferences if you use webmaster console (in that order if you ask me, in reverse order based on my read of Google representatives’ suggestions). They still get it wrong sometimes. I still strongly recommend a hard 301 to your preferred default, and a very careful eye on your in-linking.
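For the “hard 301 to your preferred default” part, the decision logic is tiny. This is a rough Python sketch of it, not a drop-in for any particular server or framework. The host names are hypothetical, and the function name `redirect_for` is mine. Note that it deliberately leaves the scheme alone, so http:// and https:// stay distinct, and that it drops any explicit port for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"   # hypothetical preferred host
ALIASES = {"example.com"}            # hypothetical hosts that should redirect


def redirect_for(url: str):
    """Return (301, location) when the host is a non-preferred alias, else None."""
    parts = urlsplit(url)
    # .hostname is lowercased and has any port stripped.
    host = parts.hostname or ""
    if host not in ALIASES:
        return None
    # Scheme, path, and query are carried over untouched; only the host changes.
    location = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
    return 301, location


if __name__ == "__main__":
    print(redirect_for("http://example.com/page?a=1"))
    print(redirect_for("http://www.example.com/page?a=1"))
```

A request that already uses the preferred host gets `None`, meaning “serve it as is”, which avoids redirect loops.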

What about the harder question of trailing slashes on deep resources? It’s a toss-up. Keep in mind you wield some influence over Google by the way you self link and the way others link to you, but Matt Cutts said he thinks people commonly type in domains with trailing slashes so I still worry. I suspect there are more important issues on Google burners right now, and competitive SEO types will continue to build test sites and learn how TheGoogleAlgorithm works, for themselves and for their clients.
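Until somebody answers that, the safest move I know is to pick a trailing slash policy for deep URLs and enforce it yourself. Here is a small Python sketch of one such policy, “always add the slash unless the last path segment looks like a file”. The policy itself, the “has a dot in it” test for files, and the example URLs are all assumptions for illustration. The opposite policy (never a slash) is just as defensible, as long as you stick with it.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Apply an assumed 'directories end in a slash' policy to one URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # A last segment containing a dot (page.html, feed.xml) is treated as a
    # file and left alone; anything else gets the trailing slash.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in ("http://www.example.com/store",
                "http://www.example.com/store/",
                "http://www.example.com/store/page.html"):
        print(url, "->", with_trailing_slash(url))
```

Run the same function over your internal links and your redirect rules so that both agree on the one form.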


  1. Lea de Groot wrote:

    Interesting – I hadn’t considered vs as an issue as I generally work on Apache and it comes ‘free’ so to speak, but, yes, it’s obviously another one to check in a new situation.
    I note that canonicalisation is obviously becoming a more visible problem when packages like wordpress have extensions to make sure each post is only ever shown from the ‘correct’ URI :)

    Monday, March 19, 2007 at 5:23 pm | Permalink
  2. john andrews wrote:

    Update: Matt Cutts notes that “Last week there was an update to how we canonicalize a small number of urls.” He notes it’s a very minor issue for most people, but if you’ve been experimenting with trailing slashes and other “minutia” you might want to consider that something may have changed last week ’cause in experimentation land, such little changes usually make you start over :-)

    Matt also re-emphasizes how it is smart to set a standard for within-site use of the trailing slash and stick with it.


    Monday, March 26, 2007 at 10:00 pm | Permalink
  3. mrg wrote:

    19. index.htm

    I have seen examples of different pages at index.html and index.htm with different pr’s.

    Wednesday, March 28, 2007 at 8:17 am | Permalink
  4. Nirupam Roy wrote:


    After going through all the posts on canonicalization, i have only one thing to ask i.e. “Isn’t Google SMART enough to identify the right version of an url on its own ??”
    Being a Search Giant, everyone would expect Google to be clever enough to understand the right form of an url. Isn’t that so ???


    Friday, March 30, 2007 at 12:29 am | Permalink
  5. john andrews wrote:

    @Nirupam: I think you misunderstand. There are many valid ways to place a web resource. What is “the right version”? If you want to know who should be doing things the right way, it is the web master.

    Friday, March 30, 2007 at 1:33 am | Permalink
  6. Nirupam Roy wrote:


    I understood the problem. But my question is “how many webmasters are aware of canonicalization”? That’s why I said it would have been better had Google done it on its own. I am very sure they are aware of which urls should be considered for search engine listings and which should not.


    Wednesday, April 4, 2007 at 4:43 am | Permalink
  7. As far as I understand, the canonical problem is an ongoing one. Even Google, in its webmaster tools (i.e. the sitemap), has introduced the concept of making one url the main one, but I would still prefer to ask Google: are they themselves not aware which of them is most important to them? Why leave it to people who probably know little to nothing about canonical problems?

    Tuesday, April 10, 2007 at 3:33 am | Permalink
  8. Ubuntu Daily wrote:

    Fortunately WordPress and other software are programmed so that it does the right thing out of the box.

    Saturday, June 2, 2007 at 11:58 am | Permalink
  9. Artem wrote:

    Blogger and WordPress do not have this problem. But I can’t understand how one page can have different PageRank values. For example, why does w w w have PR5 but w w have PR4?

    Saturday, July 14, 2007 at 11:48 am | Permalink
  10. I hope with the introduction of Google Webmaster Tools things will be easier where you can actually help Google identify the right type of url for a site.

    Tuesday, August 28, 2007 at 2:57 am | Permalink
  11. The 301 redirect works very well, it also retains the history, and the back-links. What I have realized though, is that it takes a good six months for your rankings to get back, and for all of your site to get indexed.

    Monday, September 3, 2007 at 12:43 pm | Permalink
  12. Tim Fuchs wrote:


    Every single url gets its own pagerank from google. If you don’t want to have different pageranks for those two you mentioned, just link to / instead of /index.html, the webserver software will pick up the index.html automatically.

    John replies: Yes, they are unique pages and that is why it is a problem. While you may link to just one of them, that does not solve the issue as others can link to the other version, causing both to get indexed.

    Tuesday, September 18, 2007 at 9:13 am | Permalink
  13. Donetsk wrote:

    This may have been one of my biggest (most annoying) problems, whenever I changed shopping cart software it would reformat my url from say /store to /store/ or /store/index.php or /store/home.php HOW FRUSTRATING! Currently with static sites I use a couple of mod rewrites to change the htaccess moving everything to
    Thanks for the post & sharing my pain lol.
    Regards, Don.

    Wednesday, September 26, 2007 at 6:10 pm | Permalink
  14. Bree Falk wrote:

    I didn’t know there were so many different types of web address. My blog has the trailing “/” but I think that all WordPress blogs have this. I don’t know if there is a way to get rid of it permanently. Would be nice to just have the normal site address.

    Monday, October 1, 2007 at 4:37 am | Permalink
  15. Empresas wrote:

    Is there any update on this? Which is better for SEO purposes: ending with a slash or with a .html extension? Thanks.

    Thursday, June 26, 2008 at 8:50 am | Permalink
  16. Zola wrote:

    Recently I moved my blog from the www to the non-www version. I saw a drastic drop in image traffic from Google, as well as a drop from PR3 to PR2.

    @Zola: there are many factors involved… perhaps you had incoming links to www before, etc. 

    Sunday, January 11, 2009 at 4:38 am | Permalink
  17. spammer wrote:

    I definitely go for But I see more and more successful blogs without the “www”… I guess it is just for the readers… the URL seems shorter and easier to remember.

    However, I’m still wondering if Google sees and as 2 different websites…

    Saturday, May 9, 2009 at 7:37 pm | Permalink
