John Andrews is a Competitive Webmaster and Search Engine Optimization Consultant in Seattle, Washington. This is John Andrews' blog on issues of interest to the SEO community and competitive webmasters.

johnon.com  Competitive Web & SEO

Latent Semantic Imaging (LSI)..3 years later

Most readers know I loathe giving away free advice. That is just one way that I protect my clients. I often wonder where the beef is in many SEO blogs for that very reason. WHY talk about fight club, if you are a successful member? It doesn’t make sense to me.

What is Latent Semantic Imaging? The words on your pages and their interlinking (including anchor text) create a latent image of the content you are targeting. Talk about developer and fixer and temperatures and exposures and bulbs and timers suggests “darkroom”. Just as chemical “developer” turns an exposed film into a negative of an actual image, Google turns your talk about X and Y into an appearance in the SERPs on the topic of X+Y (if there is one). An exposed film has a “latent image” on it, which is revealed by developing. A web page can create a “latent image” of a topic, exposed by Google’s analysis of the content.
Now, 3 and a half years after my close associates and clients heard me proclaiming the effectiveness of certain kinds of word analysis (which I called “latent semantic imaging” after the photographic concept of a latent image), webmasters and SEOs are discussing it and have even given it a new name.
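
To make the darkroom example concrete, here is a minimal Python sketch. It is not LSA proper (no SVD, no dimensionality reduction) and it is certainly not what Google runs; it is a crude co-occurrence count standing in for that kind of analysis. The five-document corpus, the function names, and the scoring formula are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical mini-corpus standing in for "the rest of the web".
CORPUS = [
    "darkroom developer fixer enlarger timer",
    "darkroom exposure bulb timer developer",
    "darkroom fixer temperature exposure",
    "kitchen oven timer temperature recipe",
    "kitchen recipe bulb oven",
]


def build_cooccurrence(docs):
    """Count, for each pair of distinct words, how many documents contain both."""
    pair_counts = defaultdict(Counter)
    doc_counts = Counter()
    for doc in docs:
        words = set(doc.split())
        doc_counts.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts, doc_counts


def latent_topic_scores(page, topics, pair_counts, doc_counts):
    """Score each candidate topic by how strongly the page's words co-occur with it.

    Each page word contributes the fraction of its corpus documents that also
    mention the topic; the score is the average over the page's distinct words.
    """
    page_words = set(page.split())
    scores = {}
    for topic in topics:
        total = 0.0
        for word in page_words:
            if word == topic or doc_counts[word] == 0:
                continue  # skip the topic itself and words the corpus never saw
            total += pair_counts[topic][word] / doc_counts[word]
        scores[topic] = total / len(page_words) if page_words else 0.0
    return scores


pair_counts, doc_counts = build_cooccurrence(CORPUS)

# The page never uses the word "darkroom".
page = "developer fixer temperature exposure bulb timer"
scores = latent_topic_scores(page, ["darkroom", "kitchen"], pair_counts, doc_counts)

for topic, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{topic}: {score:.3f}")
```

On this toy corpus the page scores higher for “darkroom” than for “kitchen”, even though the word “darkroom” never appears on it. That is the latent image: the supporting vocabulary exposes the topic, and the analysis develops it.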

And, the subject is concept density, it isn’t specific to Google, it’s based on word dependencies and LSA, and there’s no need to present synonyms or alternate spellings in the title, headers or body text. In fact, the less ‘bbq’ and its variants are used, the more important the word becomes. Rather than take ‘bbq’ and append words, take ‘bbq’ and find the word dependencies that bolster the concept of ‘bbq’.

That’s a snippet from “digitalghost” and he calls it “concept density” (Webmaster World doesn’t allow linking). He notes it is based on LSA (latent semantic analysis), which is great, but perhaps most importantly he and others note the difference between keyword stuffed content and latent semantic imaging:

Cut to the chase — I blended in many of these newly discovered themes, mixed them naturally into the page (some even became anchor text for internal links) and within a few days the url went to #6 from way down in the 30’s. Within a month it was #1. And given the realities of that market, it should always be #1 or #2, unless some other player enters the field.

That’s tedster (again no link, sorry - Webmaster World policy not mine!). He even states that he has known about this “ever since Google showed us the tilde operator” which is around late 2002-early 2003. See what I mean?

So why, competitive webmasters, is John talking about it now? What’s the angle? I am marveling, that’s all. When you see “how to get rich doing BLAH BLAH BLAH” it almost always means there is no more “mad money” to be made doing BLAH BLAH BLAH. The money in promotional books and make-money-fast schemes (MMFs) is EXCELLENT, but not “mad money”. So, when a method or technique stops earning mad money, and drops to the excellent level, it makes economic sense to sell the technique instead of using it. There are adjacent reasons as well… you want to be the first with an ebook, so you jump in a wee bit early. You also want to side-sell it, which is easiest if you have a first-mover advantage. You also can discredit the copycats, etc.

So it is interesting that now, in August of 2006, 3 1/2 years after I started using it, high-profile webmaster and SEO people are talking about a technique that has worked well in Google… until recently. GOTCHA. Yes, it has changed recently. Two months ago, Google started tuning the knobs on internal anchor text and semantic theming, or whatever you want to call it. Google also appears to have ramped up some internal human editing, and my test sites for latent semantic imaging show quite a bit of variability these days. No more “mad money”. Here come the MMFs?

In this case, latent semantic imaging is not dead by any stretch. But it is now more competitive than it was, and surely Google has a handle on much of its influence on the algorithms. Matt Cutts of Google has apparently chosen this area to begin his public SEO work. What does this mean for me and my clients? Nothing. I moved on over a year ago, out of skepticism that it could continue to be so easy. My lazy study of LSA (latent semantic analysis) 3 years ago told me it was computationally too difficult for real-time use, although it might be very useful as a tool in the Google algorithm. The only way I would be able to use it (if I were a Google search engineer) would be as described - as a latent imaging method. That is what led me to my research. But what we have been doing for these past 3 years has only exploited the lazy approach… certainly Google engineers have more time and talent for this stuff than I do. Surely after 3 years they will have learned to tune it.

It was simply not sustainable, so it was inevitable that it would get “tuned”, and in my view inevitable that it would be tuned improperly. To an honest SEO that’s par for the course… you use it for what it is worth, watch the search engine get a grip on it, and keep ahead of the curve. Tedster is a very well respected practitioner… and you can see he’s been “playing with” latent semantic imaging for years before admitting it now, and even now only because the cat appears to be out of the bag. C’mon folks, do you really think that after 3 years playing with the technique, he hasn’t learned more than he’s revealed?

New term? Concept? SEO technique? Nah. New attention, yes. Pay no attention to the little man behind the curtain…


5 Responses to “Latent Semantic Imaging (LSI)..3 years later”

  1. David Harry Says:

    Great stuff…. I am going to put a resource link to this page in an article I have on LSA/I on my site.

    Great stuff and keep up the great work.

    theGypsy (WMW,SEW)

  2. Peter Says:

    I always thought that LSI was an acronym for Latent Semantic Indexing . . .
    I think LSA has been around since the early 90’s in relation to natural language studies and was adopted by the SE’s for obvious reasons.

  3. DigitalGhost Says:

    The cat may very well be out of the bag, and the concept isn’t new. C’mon, why did Google buy Applied Semantics? I’ve been working with LSA for years, wrote about it 4 years ago, had discussions about it in many fora years ago, but most SEOs dismissed LSA as being too resource intensive for the engines to use. So, no one cared. And now, very few people care to listen, and some still refuse to believe that it has or ever will be used.

    Has Google “turned the knobs”? Or are they just getting better at implementing LSA? Who cares? What matters is that understanding LSA takes work and most people don’t want to work. They want easy solutions.

    This particular feline jumped out of the sack many years ago, has been discussed for years but simply didn’t get any attention, and likely still won’t. N-grams, word dependencies, concordances, Markov chains, etc. make people’s eyes glaze over.

  4. John Andrews Says:

    David: Thanks.
    Peter: Just because I called it Latent Semantic Imaging doesn’t mean that’s a new buzzword. I am not a linguistics researcher by any stretch. I tend to get visual when describing things to clients and the idea here is that the words on the page (via their impact on whatever part of LSA is being used, plus more) can cause Google to assign a new definition (perhaps not any of those words) for that page. So furball, hack, scratch, and meow on the page may influence Google to send you traffic for the query “cat” even though you never had the word “cat” on your page. I called it a “latent image of a cat” in Google’s eyes. See DG’s comment for why this is nothing new.

    DG: I totally agree with you and I think I said that above. Those discussions (and especially the links to references on your old blog, thanks) got me to look at LSA back then. I started ranking for terms not on the page (the latent content) and did so with no backlinks. As the SEO world went into the “links are everything” mode 3 years ago, I was seeing 20k uniques on small sites that had almost zero backlinks.

    As for all the work, you are a linguistic researcher (right?) and so those are your tools. LSA (the field) is your arena, but Google is not yet doing LSA as much as it is doing “something with LSA”. Are your tools required to address what Google is doing with LSA? I’m not sure. I’m pretty sure that while eventually Google will do more LSA (and LSA research will evolve to support SEs better), in the meantime SEO is all about competition. So unless you or some other talented linguistics person is working LSA magic in my niche, I don’t need to do the work you reference. I need to do only what matters.

    Of course I recognize the asymptotic projection of that path towards increased effectiveness of LSA, but I also recognize the infinite power of disruptive technology and market competition. My bets are on SEO.

  5. DigitalGhost Says:

    >>but Google is not yet doing LSA as much as it is doing “something with LSA”.

    That’s my current understanding as well. It’s my belief that Google, and other engines, will continue to integrate various aspects of LSA into their algorithms. To what extent? Only their engineers will ever know. The study of LSA is just one step that I feel is needed to anticipate how engines will operate in the future.

    I may be completely wrong in my anticipation, but the current condition suggests that LSA is a useful tool for concept matching. I would like to think, but I’m not sure, that LSA algorithms (when the technology improves and implementation is more thorough) will be more difficult to manipulate. Not because the technology raises the learning curve, but simply because manipulating (future) LSA algorithms will require more work.

    >>My bets are on SEO.

    As are mine.
