
Google GO words

With all the current chatter about LSI and LSA I decided to re-emphasize something I dropped in a forum years ago: GO words. Not the Go Words defined here, but the ones I defined for myself when first analyzing Google’s love of semantic analysis. Sorry for re-using what appears to be already defined in the search world, but I didn’t know, and I was contrasting “stop” words, which also seem to be named on top of an existing definition. (I didn’t say it would be easy.)

Anyway, in SEO “stop words” were words which, if they appeared on a page, would cause that page to *not* appear in Google (either for the desired search or for all searches). A stop word was something like “rape”, which triggered some sort of filter and wreaked havoc with search inclusion back then. In those older days of SEO there were clear cases of unrecognized “stop words” causing pages to be dropped, and we SEOs found ’em and removed them. There were secret lists of stop words, allusions to secret lists of stop words, and all sorts of miscshpellingsh of stop words in order to keep the concept in the on-page text without tripping the censor’s filters. Sometimes stop words applied to certain queries, where the presence or absence of the word influenced whether or not that particular page “qualified” for ranking for a specific query. It wasn’t semantic analysis but censorship filtering back then. Today, these stop words would be considered either hard-coded filter criteria or theme triggers that trip semantic set dynamics, such that whatever LSA Google is doing in the algo, it is influenced by the stop word. You can see hard-coded stop words in action today with AdSense, with a minimal amount of effort (no, I don’t think the search engine and AdSense censor the same ways).

Once Google disclosed the tilde operator, we could play around inside Google’s synonym engine and that is where I was investigating stop words when I discovered “go” words. Go words (to me, at that time) were words which, if added to your page, caused it to rank for a query or thematic set of queries. I’m not talking about keywords specifically, but related words. Page without it, rank #30. Add the “go word”, and it rises to #3. Repeatable; testable. Go words existed and when you found them and included them, you were rewarded.

Because I’m not doing much real “work” to write this post I don’t have any “go words” to show you. I won’t reveal those I work with currently, and since I have many years in the niche markets I work in, it is probably true that I still use most that I know about. In other words, I don’t have any throw-away competitive advantages to give you. I will say that it’s not too hard to find them, especially if you have SEO experience in your niche. The key, IMHO, is to know that they are out there, so that you can test without wasting your time. That is what I am offering here. No, I don’t mean to suggest that Google synonyms are “go” words. Synonyms are great for working with sets, finding overlaps, and testing pages against the current Google search index “corpus”, but in my experience go words are rare and not simply threshold-triggering synonyms. When you find some you can test that fairly easily and see if you agree.

Now, are “go words” hard-coded filter triggers, or do they merely tip the scales of LSA-like algorithmic features? My experience is they are hard-coded, because a few very specific instances are just amazing to witness. However, I really can’t tell a badly tuned algorithmic dependency (a.k.a. “sensitivity”) from a filter or filter threshold setting… nobody but Google can tell you those details. My view is they are truly “necessary yet not sufficient” conditions for ranking, at least in some cases. I would expect that as LSA etc. matures within Google, such things will go away. That will happen slowly.

It is refreshing to find a black/white “signal” like this in Google these days. Everything has become so graduated that when you find something with binary-like impact on the algo, it is fun to exploit. When hunting, keep in mind that it really doesn’t matter if you find a set overlap threshold you can cross with 4 specific words, or a hard-coded trigger tripped by a single word: you are after the effect – put in, rank; take out, lose rank. Don’t get academic and miss the benefit.


  1. aaron wall wrote:


    You can see I was excited to try to define words back in the day. I still think part of the initial Florida update effect was to suppress the rankings of pages that were too blatantly focused on a keyword phrase. The same way you mentioned A/B testing in this post, I did some around Florida on some pages and got similar results… by changing the proximity of some of the core keywords on a page.

    If the term “go words” was to pick up any steam, the way you mention it would surely be far more accurate (and useful) to the current search market than that page I threw up about a month after the Florida update.

    Another great post John :)

    Tuesday, September 5, 2006 at 12:31 am
  2. john andrews wrote:

    Thanks Aaron. I recognize the link bait value of coining terms, but obviously I haven’t gone after that stuff (hence all these references to the past). But I do need to communicate clearly with clients.

    As I recall those post-Florida days were the start of Google’s application of digital filtering to page content. To visualize this, think of taking the first “n” words on the page, moving over “m” words and taking another “n” word series, and continuing that through the portion of the page selected for analysis. Start with n=5 and m=1. Weight the center of the 5 word string differently than the “tails”, sort of like imposing a bell curve on the value of the words (the middle words get full value, and the edge words get very low value). Consider each of those overlapping 5 word series as a data point for the statistics (keyword density, etc.) instead of the usual whole-set analysis. The process enables you to run more advanced analysis on the selected text, very quickly (linear algebra), without as large a data set as you might otherwise need. Compare that to the corpus (index?), and keyword-stuffed content stands out like a sore thumb even if the aggregate word stats match the corpus.
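    The overlapping-window scheme described above can be sketched roughly like this. This is a toy illustration of the idea, not Google’s actual code; the triangular weighting is a simple stand-in for the “bell curve”, and all names here are made up for the example:

    ```python
    from collections import Counter

    def window_term_stats(words, n=5, m=1):
        """Slide an n-word window across the text in steps of m words,
        weighting the center of each window more heavily than the tails,
        and return the weighted frequency of each term."""
        # Triangular weights approximating a bell curve: the center word
        # gets full weight (1.0), words at the window edges get the least.
        center = n // 2
        weights = [1.0 - abs(i - center) / (center + 1) for i in range(n)]

        counts = Counter()
        for start in range(0, len(words) - n + 1, m):
            # Each overlapping window is one "data point": every word in it
            # contributes its position-dependent weight to the running count.
            for word, wt in zip(words[start:start + n], weights):
                counts[word] += wt

        total = sum(counts.values())
        return {word: c / total for word, c in counts.items()}

    # A keyword-stuffed snippet stands out: "cheap" and "widgets" dominate
    # the weighted window statistics.
    words = "cheap widgets cheap widgets cheap widgets buy cheap widgets now".split()
    stats = window_term_stats(words, n=5, m=1)
    ```

    Comparing a profile like `stats` against the same statistic computed over a reference corpus is the part that makes stuffed content stand out, even when the page’s aggregate keyword density looks normal.
    
    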

    That is how I interpreted it back then, and that perspective served me well. I didn’t see a need to try to reproduce it or prove the theory, and I don’t project this as a theory of how Google worked, but it is *the* way to digitally characterize serial data. The inverse of generating is deconstructing, or analyzing: a filter is an inverted generator, and vice versa. Digital signal processing (DSP) is a set of mathematics for doing exactly that in an efficient manner in Base 2 (binary). That is what Google does. Today, they have more resources (money & Ph.D.’s) and the world of semantic analysis is advancing rapidly. Who knows what looms?

    Tuesday, September 5, 2006 at 1:07 pm