
SEO for AJAX

Update March 2010:

Google has advanced in its plan to impose new technical methods on webmasters in order to make AJAX accessible. In a new tutorial, the Google team describes a process of tagging URLs so that Google recognizes they use callback functions for dynamic content loading, and outlines an “HTML snapshot” method of presenting that content for indexing.

Still quite cumbersome. If you look at the current search results, some webmasters who have been experimenting since Google’s last pronouncement are getting hosed - improperly indexed pages and, in some cases, mangled search results.

Read the latest descriptions from Google here.
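
Roughly, the handshake works like this. The #! URL style and the _escaped_fragment_ parameter are Google's names from the spec; the toy Node.js server and its snapshot markup below are only placeholders, a sketch rather than a recommended implementation:

// Users (and links) see the AJAX URL:
//   http://example.com/books#!fiction/dickens
// Googlebot rewrites the fragment and fetches:
//   http://example.com/books?_escaped_fragment_=fiction/dickens
// and expects a static "HTML snapshot" of that AJAX state in the response.
var http = require('http');
var url = require('url');

http.createServer(function (req, res) {
  var fragment = url.parse(req.url, true).query['_escaped_fragment_'];
  res.writeHead(200, {'Content-Type': 'text/html'});
  if (fragment !== undefined) {
    // Crawler request: render the requested view to plain HTML on the server.
    res.end('<html><body><h1>Snapshot of view: ' + fragment + '</h1></body></html>');
  } else {
    // Human request: serve the AJAX shell; client-side script reads location.hash.
    res.end('<html><body><div id="view"></div><script src="/app.js"></script></body></html>');
  }
}).listen(8080);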

Update October 2009:

Google has released a blog post describing a proposed change to the way we publish information to the web, in order to more robustly support Google’s indexing efforts. In short, Google can’t properly index web pages which change dynamically (such as AJAX-based views) without the publisher helping out. What is the “final state” - or, better, which state of the page does the publisher want Google to index?

That has always been the core question, but it also represents the sensitive trust point. If Google trusts the publisher to say “this is what should be indexed”, many publishers will misrepresent in order to try to rank higher in Google. Google calls this “search spam”. The act of misrepresenting to Google what is really in your web page view has historically been called “cloaking”. When you cloak Google, you show Google different page content than you intend to show the users referred by Google search. Also historically, cloaking can get you banned from Google.

So if Google now wants publishers to use these new HTML constructs (specifically, new alternative URLs that tell Google which state of the AJAX-loaded page is to be considered the “static” state, or the one that should be indexed), how does Google defend against cloaking and search spam?

Perhaps more interestingly, what is the reward for compliance? Better ranking? Or simply indexing? In other words, if you don’t comply with these new Google-specific technical publishing requirements, do you fail to get indexed? And if you then cloak in order to get indexed (using standard HTML), have you broken guidelines and will you get banned?

The competitive webmaster sees all sorts of opportunities and problems lurking in this approach. I’m sure Google does as well, which is why Google needs to try to make this an HTML standard, or a web publishing standard. As long as it is a Google-specific technology, or even a search-engine-specific technology, it will suffer from that uncertain grey area of trust.

Read the announcement, and listen carefully to the future discussions. This is YEARS away, and in the meantime I can’t ignore the fact that webmaster trust is becoming an essential ingredient in rankings.

Update early 2009: This is an old post, written when AJAX was a new concept. It addresses a philosophy of SEO vs. a culture of using AJAX… but today in 2009 we have advanced well beyond that stage. So to address the issue of SEO and AJAX, I want to point to a great article on “Hijax”, a progressive enhancement approach: design the system as ordinary pages first, then implement the AJAX at the end. The result is content that is accessible, with a presentation layer based on a dynamic AJAX implementation. Check out DOM Scripting: Hijax. A quick sketch of the idea appears below; my original blog post on AJAX and SEO follows after that:
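
In miniature, Hijax looks something like this. This is only a sketch - the URL, element ids, and wiring are hypothetical - but the point is that every navigation link carries a real, crawlable href, and script hijacks the click to load that same URL asynchronously for visitors with JavaScript:

<a id="nav-dickens" href="/books/fiction/dickens">Fiction &gt; Dickens</a>
<div id="content"></div>

<script type="text/javascript">
// Hijax: the href above works with no JavaScript at all (crawlable, one
// unique URL per view). For everyone else, hijack the click and fetch the
// same URL asynchronously into the content div.
document.getElementById('nav-dickens').onclick = function () {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', this.href, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      document.getElementById('content').innerHTML = xhr.responseText;
    }
  };
  xhr.send(null);
  return false; // cancel the normal full-page navigation
};
</script>
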
Subtitled: “Meet my 5 sons: all named George” or “Advanced SEO”

George Foreman has five sons, all named George. If you were to visit George and he were to introduce you to his sons, he would present each one in turn, call them by their name “George” and then maybe add a second label (“my second George” or “the smart one” or whatever). He said he named them George because he wanted them to always know who their father was. They are not identical: they just have the same name. You can’t tell them apart by their names, but with the added second label or their looks, you could easily identify them. But not if you were blind. You’d need more info, right?

So how do you name your web pages?

Most webmasters know not to use frames, because the frameset URL remains in place as each framed page is displayed. The result is like naming all of your kids George - they all have the same URL. A search engine will see just one “name” and list that one URL as a single page in the search index. Bad idea.

Most webmasters these days also know not to use a URL which is almost exactly the same for multiple pages, differing only by a long “id” value. If it isn’t sufficiently distinct to the search engine spider, it won’t be any different from The Other Page Also Named George. Most webmasters are also keeping page titles distinct, as a second label (“the smart George”). More advanced webmasters have learned to recognize overly “templatic” web pages, which can look nearly identical to other pages and suffer a similar identity crisis in the eyes of nearly blind search spiders (less severe consequences, but still sub-optimal).

But what about modern web applications deploying asynchronous user interface technologies like AJAX, where on-page content (or in-page content) can vary depending on context, not just URL? If the middle paragraph updates via asynchronous server calls, without a page refresh, then the URL hasn’t changed and the new resource (or “view”) becomes just another Web Page Named George. Sure it has different content, but it doesn’t exist as a separate entity according to the search engine labels (URL and page title). In other words, it won’t get indexed.
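
A bare-bones illustration of the problem (the /views/ endpoint and the content div here are hypothetical): the markup swaps in place, but location.href never changes, so the spider never sees a new URL to label.

// Fetch a fragment of content and swap it into the current page.
function loadView(name) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/views/' + name, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      // Same URL, same title, new content: still just another George.
      document.getElementById('content').innerHTML = xhr.responseText;
    }
  };
  xhr.send(null);
}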

In the current search world, it is essential that each view of content you consider a valuable, user-facing, site-defining page be indexed as a unique web resource. If you collate content into such views so they are specifically relevant to visitors in a specific context (as we do when we optimize for search, matching views of content to referred search engine visitors), then all of that hard work collating is for naught if the view never gets labeled and indexed.

So what to do?

Many SEO Advice web sites suggest that you generate a second set of static views (or dynamic, but URL-unique views) to be fed to search engines. I don’t think that is a very creative solution. The Sitemaps protocol is also a way to define views to reflect your pre-defined collections, yet archaic anti-cloaking guidelines (like the one at Google) require that those URLs deliver the same content to both users and search spiders. That makes the sitemap little more than a representation of static content. Again, that’s not very creative. For truly asynchronous, data-based content, it’s also not very practical. There are simply too many possible views.
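
For reference, the Sitemaps-protocol piece is nothing more than a list of URL-unique entry points. A minimal example (the URLs below are placeholders) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one crawlable, content-delivering URL per view you want indexed -->
  <url>
    <loc>http://www.example.com/books/fiction/dickens</loc>
  </url>
  <url>
    <loc>http://www.example.com/books/fiction/austen</loc>
  </url>
</urlset>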

If you are thinking this issue through analytically, seeking a practical solution that deploys asynchronous views for visitors while also exposing a statically-labeled set of entry points, so that traditional spidering of that content yields contextual indexing by labeled URL, go ahead and call yourself an SEO. If you get it done in a way that requires a substantial amount of extra work for a web designer, like maybe creating a second static web site just for search engines, then call yourself a second-tier SEO and try to increase your R&D time or training budget. You’re stuck in old-skool SEO. You’re going to need to update your skills sooner than you might like.

But if you have already mapped out your site’s user experience, and documented the intent and strategy that drive the UI designers to create the asynchronous interfaces deployed, then rest assured you are doing well. You understand the reasons why page sections update asynchronously, the underlying data structures in use behind the scenes, and the link between view and content. You already know how you can use that to define a set of defining views, to be sitemapped and exploited as landing pages. A second set of static pages? You betcha. A lot of extra work? On the contrary. It will probably be defining work for the marketing team, worthy of the effort even without search engines requiring it. It won’t require top-dollar designer and developer hours, merely DHTML web designer hours. Once you get the basic infrastructure in place to support your “sitemap”, you can endeavor to optimize the site independently of the UI designers, the way nature intended :-)

Now if you find yourself working with a javascript team that has the patience required to document that interface, or one that actually had a strategy in place for the user experience before they built it, or one that is working with a data structure actually designed to support that strategically-defined user experience, consider yourself extremely lucky.

If not, there’s still hope. If you have a good marketing team, you might end up delivering some clue packages to the web too dot oh development prima donnas, along with a proper specification for how they might try to get it a little more right with the next refactoring - in order to keep the SEO bill less than twice the development bill (or less than the first year’s PPC bill). I’m just sayin’, thaz all.


10 Responses to “SEO for AJAX”

  1. matt Says:

    can you clarify please?

    “Many SEO Advice web sites suggest that you generate a second set of static views (or dynamic, but URL-unique views) to be fed to search engines. I don’t think that is a very creative solution.”

    and, later

    A second set of static pages? You betcha.

    I’m a developer. can you give me a use-case or real-world example?

  2. john andrews Says:

    Sure. Instead of creating static pages of your dynamic site (a la sitemap), you need static pages that are strategically designed for your site. It’s not a case of the developer caching pages, but the SEO crafting pages, and working with the developer to generate them.

  3. Matthew Brown Says:

    John,

    Big kudos for an elegantly written piece on a difficult concept. I spend lots of time trying to communicate the same thing to my fellow conscripts, but I’ve never managed to get it as right as you do above.

    Of course, I won’t even touch on how difficult it is to get marketing to do ANY sort of defining work ;)

  4. john andrews Says:

    Thanks Matthew. And your comment on marketing validates you in my book. I have my theories on why that is true, but I’m happy they are finally accepting technology so I won’t risk fracturing any egos at this delicate time ;-)

  5. Daniel R Says:

    John,

    Interesting post, but I’m not sure if that absolutely works for an outside agency coming to a client with a large AJAX/DWR/FLEX driven website.

    We are working on a case where a JS-driven navigation links to over 2,000 pages (or rather “content scenes”, since it’s all AJAX). It is AJAX-driven via a custom CMS, and it looks like the chief purpose of the asynchronous updates is to keep load time to a minimum (the “scenes” have plenty of copy and Flash).

    Strategically creating static pages is not an option because of the massive amount of content in AJAX and the CMS that already exists on the site. What we’ve opted for is creating a hybrid system that is quasi-sitemap, but not quite.

    Basically, the navigation system would be updated from

    Fiction > Dickens

    to

    Fiction > Dickens

    So now, the search engines follow the HREF, while the humans follow onClick.

    Look forward to hearing your thoughts.

  6. Daniel R Says:

    Sorry, I didn’t realize HTML codes were valid for comments. Here’s a retry:

    <p>Basically, the navigation system would be updated from</p>
    <p><a href="getContent('fiction','dickens');">Fiction &gt;
    Dickens</a></p>
    <p>to</p>
    <p><a href="http://client.com/books.jsp?scene=fiction&amp;author=dickens">Fiction
    &gt; Dickens</a></p>

  7. john andrews Says:

    @Daniel: It looks like you’ve assigned unique URLs to the content, but outside of context. That’s a necessary step, right? But you know the context for the bulk of the pages, if you know the rationale behind the asynchronous loading (as far as content inclusion goes).

    A step better is to clean up those URLs in typical SEO fashion, of course, using a front controller or rewrite mapper for your pseudo sitemap. But if possible, why not define views into that content that are context-aware? You’re almost there anyway.

    It’s difficult to suppose solutions outside of the context of the app. Some of the AJAX stuff I have seen really fits the term “mashup” because that’s what it is… a mish-mash. So I leave it alone, and index a pseudo-sitemap that is structured.

  8. johnon.com - John Andrews - » Does “Advanced SEO” Even Exist? Says:

    […] When I sit in a session at PubCon, the “SEO” panelists repeatedly say things like “you need to make your title tags unique” and “your non-www needs to 301 to your www” and I get so bored I choose to sit next to IncrediBill, just to keep things interesting. But when I work with SEO and AJAX, I get a headache from the depth of the challenge. […]

  9. john andrews Says:

    UPDATE: I never intended this post to be an SEO for AJAX tutorial. It’s a blog post from an SEO working in the field of SEO, on AJAX and other, advanced SEO topics. BUT, I can certainly clarify and will do so in another AJAX/SEO post on www.johnon.com.

  10. Jose M. Arranz Says:

    Another, and better, alternative to the Google approach: ItsNat

    With ItsNat you develop a Single Page Interface (AJAX-intensive) application, and (almost) automatically the same application works page-based when JavaScript is disabled or ignored (which is how search engine crawlers see your site).

    Take a look:

    The Single Page Interface Manifesto
    http://itsnat.sourceforge.net/php/spim/spi_manifesto_en.php

    Single Page Interface Web Site With ItsNat
    http://itsnat.sourceforge.net/index.php?_page=support.tutorial.spi_site

    SPI web site online demo
    http://www.innowhere.com:8080/spitut/
