Update March 2010:
Google has advanced in its plan to impose new technical methods on webmasters in order to make AJAX accessible. In a new tutorial, the Google team describes a process of tagging URLs so that Google recognizes they load dynamic content via callback functions, and outlines an “HTML snapshot” method of presenting that content for indexing.
Still quite cumbersome. If you look at the current search results, some webmasters who have been experimenting since Google’s last pronouncement are getting hosed: improperly indexed pages and, in some cases, messed-up search results for those testing these new methods.
Read the latest descriptions from Google here.
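For reference, the core of that tagging scheme is a URL convention: “pretty” AJAX URLs carry a `#!` token, and the crawler requests an equivalent URL with the fragment moved into an `_escaped_fragment_` query parameter, under which the server returns the HTML snapshot. A rough sketch of the mapping (the example URL is invented, and I use full URL-encoding where Google’s proposal defines its own escaping rules):

```typescript
// Sketch of the crawler-side URL rewrite in Google's AJAX crawling
// proposal: "#!" marks a crawlable fragment, which gets moved into an
// _escaped_fragment_ query parameter.
function toEscapedFragmentUrl(ajaxUrl: string): string {
  const i = ajaxUrl.indexOf("#!");
  if (i < 0) return ajaxUrl; // no crawlable fragment to rewrite
  const base = ajaxUrl.slice(0, i);
  const fragment = ajaxUrl.slice(i + 2);
  const sep = base.indexOf("?") >= 0 ? "&" : "?";
  return base + sep + "_escaped_fragment_=" + encodeURIComponent(fragment);
}

// The pretty URL users see...
console.log(toEscapedFragmentUrl("http://example.com/page#!state=products"));
// → http://example.com/page?_escaped_fragment_=state%3Dproducts
// ...is the URL at which the server must answer with the HTML snapshot.
```

The burden the post complains about is the right-hand side: the publisher has to serve a meaningful static snapshot at every `_escaped_fragment_` URL.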
Update October 2009:
Google has released a blog post describing a proposed change to the way we publish information to the web, in order to more robustly support Google’s indexing efforts. In short, Google can’t properly index web pages which change dynamically (such as AJAX-based views) without the publisher helping out. What is the “final state” or perhaps better, what is the state of the page that the publisher wants Google to index?
That has always been the core question, but it also represents the sensitive trust point. If Google trusts publishers to say “this is what should be indexed”, many publishers will misrepresent their pages in order to rank higher in Google. Google calls this “search spam”. The act of misrepresenting to Google what is really in your web page view has historically been called “cloaking”. When you cloak Google, you show Google different page content than you show the users referred by Google search. Also historically, cloaking can get you banned from Google.
So if Google now wants publishers to use these new HTML constructs (specifically, new alternative URLs that tell Google which state of the AJAX-loaded page is to be considered the “static” state, or the one that should be indexed), how does Google defend against cloaking and search spam?
Perhaps more interestingly, what is the reward for compliance? Better ranking? Or simply indexing? In other words, if you don’t comply with these new Google-specific technical publishing requirements, do you fail to get indexed? And if you then cloak in order to get indexed (using standard HTML), have you broken guidelines and will you get banned?
The competitive webmaster sees all sorts of opportunities (and problems) lurking in this approach. I’m sure Google does as well, which is why Google needs to try to make this an HTML standard, or a web publishing standard. As long as it is a Google-specific technology, or even a search-engine-specific technology, it will suffer from that uncertain grey area of trust.
Read the announcement, and listen carefully to the future discussions. This is YEARS away, and in the meantime, I can’t ignore the fact that webmaster trust is becoming an essential ingredient in rankings.
Update early 2009: This is an old post, written when AJAX was a new concept. It addresses a philosophy of SEO vs. a culture of using AJAX… but today in 2009 we have advanced well beyond that stage. So to address the issue of SEO and AJAX, I want to point to a great article on “Hijax”, a progressive enhancement approach: design the system to work without AJAX, and actually implement AJAX at the end. The result is content that is accessible, plus a presentation layer based on a dynamic AJAX implementation. Check out DOM Scripting: Hijax. My original blog post on AJAX and SEO follows:
Subtitled: “Meet my 5 sons: all named George” or “Advanced SEO”
George Foreman has five sons, all named George. If you were to visit George and he were to introduce you to his sons, he would present each one in turn, call them by their name “George” and then maybe add a second label like (“my second George” or “the smart one” or whatever). He said he named them George because he wanted them to always know who their father was. They are not identical: they just have the same name. You can’t tell them apart by their names, but with the added second label or their looks, you could easily identify them. But not if you were blind. You’d need more info, right?
So how do you name your web pages?
Most webmasters know not to use frames because the frame set URL remains in place as each framed page is displayed. The result is like naming all of your kids George - they all have the same URL. A search engine will see just one “name” and list the one URL as a single page in the search index. Bad idea.
Most webmasters these days also know not to use a URL which is almost exactly the same for multiple pages, differing only by a long “id” value. If it isn’t sufficiently distinct to the search engine spider, it won’t be any different from The Other Page Also Named George. Most webmasters are also keeping page titles distinct, as a second label (“the smart George”). More advanced webmasters have learned to recognize overly “templatic” web pages, which can look nearly identical to other pages and suffer a similar identity crisis in the eyes of nearly blind search spiders (less severe consequences, but still sub-optimal).
But what about modern web applications deploying asynchronous user interface technologies like AJAX, where on-page content (or in-page content) can vary depending on context, not just URL? If the middle paragraph updates via asynchronous server calls, without a page refresh, then the URL hasn’t changed and the new resource (or “view”) becomes just another Web Page Named George. Sure it has different content, but it doesn’t exist as a separate entity according to the search engine labels (URL and page title). In other words, it won’t get indexed.
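To make the George problem concrete, here is a toy sketch (the URLs and content are invented): when an index is keyed by URL, two different views reached under one URL collapse into a single entry.

```typescript
// Toy model of the indexing problem: two distinct AJAX "views" that
// share a single URL.
type View = { url: string; content: string };

const views: View[] = [
  { url: "http://example.com/app", content: "Widgets overview" }, // view one
  { url: "http://example.com/app", content: "Gadget details" },   // view two, same URL
];

// A search index keyed by URL keeps one entry per name...
const index: Record<string, string> = {};
for (const v of views) {
  index[v.url] = v.content;
}

// ...so the second George simply overwrites the first.
console.log(Object.keys(index).length); // → 1
```

However rich the second view is, it never exists as a separate indexed resource.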
In the current search world, it is essential that each content view you consider a valuable, site-defining, user-facing page be indexed as a unique web resource. If you collate content into such views so they are relevant to visitors in a specific context (as we do when we optimize for SEO… matching views of content to referred search engine visitors), then your hard collating work is for naught if the view never gets labeled and indexed.
So what to do?
Many SEO Advice web sites suggest that you generate a second set of static views (or dynamic, but URL-unique views) to be fed to search engines. I don’t think that is a very creative solution. The Sitemaps protocol is also a way to define views to reflect your pre-defined collections, yet archaic anti-cloaking guidelines (like the one at Google) require that those URLs deliver the same content to both users and search spiders. That makes the sitemap little more than a representation of static content. Again, that’s not very creative. For truly asynchronous, data-based content, it’s also not very practical. There are simply too many possible views.
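For reference, a Sitemaps-protocol file is just a flat XML list of `<url>`/`<loc>` entries, one per view you want treated as an entry point. A sketch of emitting one from a list of pre-defined views (the URLs are placeholders):

```typescript
// Emit a minimal Sitemaps-protocol document for a set of view URLs.
// Only the required <loc> element is used; the protocol also allows
// optional <lastmod>, <changefreq>, and <priority> children.
function sitemapXml(urls: string[]): string {
  const entries = urls
    .map(function (u) { return "  <url><loc>" + u + "</loc></url>"; })
    .join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + "\n</urlset>";
}

console.log(sitemapXml([
  "http://example.com/products/widgets",
  "http://example.com/products/gadgets",
]));
```

The protocol itself is trivial; the hard (and interesting) part is deciding which views deserve to be on the list.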
If you are thinking this issue through analytically, seeking a practical solution that deploys asynchronous visitor views while also exposing a statically-labeled set of entry points (so that a traditional spidering of that content yields contextual indexing by labeled URL), go ahead and call yourself an SEO. If you get it done in a way that requires a substantial amount of extra work for a web designer, like creating a second static web site just for search engines, then call yourself a second-tier SEO and try to increase your R&D time or training budget. You’re stuck in old-skool SEO, and you’re going to need to update your skills sooner than you might like.
But if you have already mapped out your site’s user experience, and documented the intent and strategy that drive the UI designers to create the asynchronous interfaces they deploy, then rest assured you are doing well. You understand why page sections update asynchronously, the underlying data structures behind the scenes, and the link between view and content. You already know how to use that knowledge to define a set of defining views, to be sitemapped and exploited as landing pages. A second set of static pages? You betcha. A lot of extra work? On the contrary. It will probably be defining work for the marketing team, worthy of the effort even without search engines requiring it. And it won’t require top-dollar designer and developer hours, merely DHTML web designer hours. Once the basic infrastructure is in place to support your “sitemap”, you can optimize the site independently of the UI designers, the way nature intended :-)
If not, there’s still hope. If you have a good marketing team, you might end up delivering some clue packages to the web too dot oh development primadonnas, along with a proper specification for how they might get it a little more right with the next refactoring, in order to keep the SEO bill under twice the development bill (or under the first year’s PPC bill). I’m just sayin’, thaz all.
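As a closing sketch, the progressive-enhancement (“Hijax”) wiring mentioned in the 2009 update above looks roughly like this: links stay ordinary crawlable URLs, and a small script upgrades clicks into asynchronous loads. The `a.hijax` class, the `#content` container, and the `fragment=1` parameter are all my assumptions for illustration, not anything the Hijax article prescribes.

```typescript
// Pure helper: the endpoint that returns just the content fragment for
// a given crawlable href (the "fragment=1" parameter is an assumption).
function fragmentUrl(href: string): string {
  return href + (href.indexOf("?") >= 0 ? "&" : "?") + "fragment=1";
}

// Browser-only enhancement: intercept clicks on hijax-enabled links and
// load their content asynchronously. document/fetch are looked up
// dynamically so this sketch also loads outside a browser.
const doc: any = (globalThis as any).document;
if (doc) {
  doc.addEventListener("click", function (ev: any) {
    const a = ev.target && ev.target.closest && ev.target.closest("a.hijax");
    if (!a) return;
    ev.preventDefault(); // cancel the full-page navigation...
    (globalThis as any).fetch(fragmentUrl(a.getAttribute("href")))
      .then(function (r: any) { return r.text(); })
      .then(function (html: string) {
        // ...and swap only the content area; the plain href stays crawlable.
        doc.getElementById("content").innerHTML = html;
      });
  });
}
```

Markup like `<a class="hijax" href="/products/widgets">Widgets</a>` keeps working with scripting disabled, which is exactly what a search spider sees: every view has its own George-free URL.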