Cloaking Google like a Lemming

And then a trap door opened, and one after the other the lemmings fell down the hole. So fascinated were they with the idea of following the lemming in front of them that they were unable to recognize their impending doom until it was too late.

I don’t advocate cloaking Google. But today a competitive webmaster asked me about cloaking. The question was (rephrased):

“I understand cloaking and its risks and rewards. I have successfully tested my setup and it works. But I feel uncomfortable simply because I can’t see how the competitive webmaster cloaker can be better than the typical cloaker. Everything I have set up, anyone else could also set up. I know from your blog that I should know more before I take the risk. What am I missing?”

What would you say? Without knowing his cloaking scripts or setup, I can say immediately that the weakness lies at the trust points. Where does he trust someone or something for the success of his cloaking? At those points, he is vulnerable, and at those points he needs some risk management.

Some ideas:

You trust that Google will keep doing what it has done, and not suddenly deviate from its normal behavior.

Google follows robots.txt; Google’s bots identify themselves by user-agent; Google comes from a Google IP; Google doesn’t process sophisticated JavaScript; Google sends a bot to check your landing pages. If you trust these to be true, then the moment Google breaks from this defined behavior, the trap door opens and the cloaking lemmings fall in one after the other. What will happen to your site if Google changes? Make sure that whatever happens is benign. Some of you know my favorite book of all time is “Systemantics”, which looks not at how things work, but at How Things Fail. How does your thing fail?

You trust your IP list to be accurate, comprehensive, and up to date.

If I were Google (or a Dirty Bastard anti-cloak detective), I’d send out a stealth bot to hit suspected cloaking sites. What would happen if Google arrived from an IP not on your list? What if your list was corrupted, or worse, co-opted? As soon as you start on a path to updating your cloaking based on an available IP list, you become a lemming. An IP is added to the list, and you start cloaking for that IP, just like everyone else. It seems pretty easy for Google to just launch a spider at known Black Hat sites from a new IP, wait a day, and then hit the sites on the suspect list to see if they cloak the new IP. That’s not anti-cloaking F.U.D. but scenario planning. It’s possible, so why not prepare for it? If you can’t think of or implement ways to protect yourself from this sort of thing, in my opinion you should not be cloaking Google. And if you’re cloaking AdWords landing pages, are you sure you can recognize Google editors by IP? What about part-time, telecommuting editors from all over the world?
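
To make the detective’s side of that scenario concrete, here is a minimal Python sketch of only the comparison step. It assumes the investigator has already fetched the same URL twice: once from the newly “published” IP after it has had time to spread onto cloakers’ lists, and once from an IP that no list knows about. The function name `looks_cloaked` and the 0.8 similarity cutoff are hypothetical illustrations, not anything Google is known to use; the point is only that one unexpected request is enough to expose the difference.

```python
from difflib import SequenceMatcher


def looks_cloaked(listed_ip_html: str, unlisted_html: str, threshold: float = 0.8) -> bool:
    """Flag a page whose content differs sharply between two fetches.

    listed_ip_html: body served to the request from the newly "published" IP.
    unlisted_html:  body served to an ordinary request from an unlisted IP.
    threshold:      hypothetical similarity cutoff (0.0 to 1.0); a real detector
                    would tune it and strip legitimately dynamic content first.
    """
    similarity = SequenceMatcher(None, listed_ip_html, unlisted_html).ratio()
    # Low similarity means the site served noticeably different content
    # depending on who it thought was asking.
    return similarity < threshold


if __name__ == "__main__":
    honest = "<h1>Blue Widgets</h1><p>Buy blue widgets here.</p>"
    cloaked = "<h1>Cheap Pills</h1><p>Click here now.</p>"
    print(looks_cloaked(honest, honest))
    print(looks_cloaked(honest, cloaked))
```

An honest page scores a similarity of 1.0 against itself and is not flagged. A cloaked page only has to be caught once, which is why relying on a shared IP list makes you a lemming.
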

You trust your web host environment

Are you sure the only way to your content (cloaked content as well as cloaking content) is through your cloaking script? Really sure? If you use one technology for cloaking and another for protecting your non-public content, you need to be very sure you are secure (I’ve seen CGI-based cloaking scripts used with .htaccess auth control and a sprinkling of PHP scripting, all mixed together). It’s not easy to be expert enough in all of those platform technologies to make sure they work together soundly. If you know what I mean and want to see how big a problem this is, search Google for cloaking content that isn’t supposed to be exposed. Skip the first result because that’s one of my test sites, but look at all the rest. Page after page. Wow. That much. Geez. And ironic how we can simply use Google to find it, eh? In the index?

You trust your privacy, with respect to your other domains

Are you trusting that Google doesn’t know about your association with DomainB, or DomainC, when you start cloaking on DomainA? I’d advise making some changes in advance, to put an arm’s length or so between them before starting. Are you sure that your cloaking script doesn’t leave a footprint? Even one uniquely named file on the public portion of your web server can give it away (see the reference above to using Google to find cloaking sites).

One of the most interesting aspects of computer simulation and modeling is the amazing power of large N. The more data you have, the easier it is to see “outliers”, those data points which differ from the rest by an unexplainable amount (unexplainable according to the model). Google has TONS of data about web site linking and the performance of AdWords/AdSense landing pages. If you base your cloaking on your model of how Google sees things, will your site still stand out as an outlier when Google compares it to a test collection of 200,000 similar websites? How would you even know?

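
Here is a small illustration of why large N matters, using a plain z-score test as a stand-in for whatever Google actually does (its real models are unknown; the z-score, the cutoff of 3, and the metric values below are all assumptions made up for the example). With the sample standard deviation, no point in a set of n values can have |z| larger than (n - 1)/√n, so in a tiny sample even a wildly different site cannot be flagged, while the same site among many peers stands out at once.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value against the sample mean and sample std dev."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (needs >= 2 values)
    if sigma == 0:
        return [0.0 for _ in values]  # all identical: nothing stands out
    return [(v - mu) / sigma for v in values]


def outliers(values, cutoff=3.0):
    """Indices whose |z| exceeds the (assumed) cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


if __name__ == "__main__":
    # Hypothetical per-site metric: four "normal" sites plus one odd one.
    print(outliers([1, 1, 1, 1, 5]))
    # The same odd site compared against twenty similar sites.
    print(outliers([1.0] * 20 + [5.0]))
```

With five sites the odd one reaches only z ≈ 1.79 and goes unflagged. With twenty-one sites the identical value reaches z ≈ 4.36 and is flagged. Google’s comparison set is far larger than either.
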
I am sure everyone can add some items to this list, provide a corresponding risk-management tactic, and assign a priority for effort based on probability estimates (of it happening), cost estimates (for adjusting afterwards or monitoring), and estimates of risk (a ban, a penalty, etc.). What I’m not sure of is whether everyone is paying attention to the important details, and to the importance of the important details. To remain competitive, you must adapt. To adapt, you must first sense your environment. Sometimes that is simply not possible unless you are Google.

Friendly Reminder: If you want to post a comment that says I’m a cheater or scammer or whatever because I am discussing cloaking, move along and find yourself a hate site to frequent. I don’t have time to debate the nature of competition with you, and this is not your forum for proselytizing. Okay?