Googlebot’s JavaScript random() function is deterministic

I was conducting some experiments on how Googlebot parses and renders JavaScript, and I came across a couple of interesting things about the way it does so. The first is that Googlebot’s Math.random() function produces an entirely deterministic series. I created a small script which uses this to identify Googlebot in an obfuscated fashion:


http://www.tomanthony.co.uk/fun/googlebot_puzzle.html

The first time Googlebot calls Math.random() the result will always be 0.14881141134537756, and the second call will always return 0.19426893815398216. The script I linked to above simply uses this fact, but obfuscates it a little and ‘seeds’ it with something that doesn’t look too arbitrary.
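
Stripped of the obfuscation, the idea can be sketched roughly as below. This is not the script linked above: the function name `looksLikeGooglebot` and its injectable `rand` parameter are made up for illustration, and the check assumes nothing else on the page has called Math.random() beforehand.

```javascript
// The first two values Googlebot's Math.random() was observed to return.
const GOOGLEBOT_FIRST = 0.14881141134537756;
const GOOGLEBOT_SECOND = 0.19426893815398216;

// Hypothetical helper: `rand` defaults to Math.random, but can be
// swapped for a stub so the check can be exercised outside Googlebot.
function looksLikeGooglebot(rand = Math.random) {
  const first = rand();   // must be the very first call on the page
  const second = rand();  // and the very next one
  return first === GOOGLEBOT_FIRST && second === GOOGLEBOT_SECOND;
}
```

In a normal browser the two calls will almost certainly return something else, so the function returns false there.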

Crawling at Google Scale

Consider the amount of work Google has to do to crawl the whole web AND now run JavaScript. Optimisations will need to be abundant, and I imagine that having a deterministic random number function is probably:

  1. Faster
  2. More secure
  3. Predictable – Googlebot can trust a page will render the same on each visit
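
To make ‘deterministic series’ concrete: any pseudorandom generator started from a fixed seed behaves this way. The sketch below uses mulberry32, a small 32-bit generator, purely as a stand-in. It is not a claim about which generator or seed Googlebot actually uses, and the seed values are arbitrary.

```javascript
// mulberry32: a small seeded PRNG, used here only as an illustration.
// Returns a function that yields floats in [0, 1), like Math.random().
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators built from the same (arbitrary) seed.
const visitOne = mulberry32(2006);
const visitTwo = mulberry32(2006);

// Every "visit" sees exactly the same series of values.
console.log(visitOne() === visitTwo()); // true
console.log(visitOne() === visitTwo()); // true
```

A renderer that always starts from the same seed gets the same page output on every visit, which is the predictability point in the list above.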

Speeding up the clock…

Googlebot also seems to run JavaScript with a sped-up clock, which makes sense: why actually wait 5 seconds when you are a bot? If you create a simple ticker script and put it through the Google Search Console ‘Fetch & Render’ function, it returns almost instantly, but with results looking like this:

The second date is a date from the future! Marty McFly would be proud.
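
A ticker of the kind described above might look something like this sketch. It is not the exact script used in the experiment: the name `runTicker` and the injectable `now` and `schedule` parameters are assumptions added so the behaviour can be checked without real delays.

```javascript
// Hypothetical ticker: records a timestamp, waits `intervalMs`, and
// repeats until `ticks` timestamps have been collected.
// `now` and `schedule` default to the real clock and setTimeout.
function runTicker(ticks, intervalMs, onDone,
                   now = () => new Date(), schedule = setTimeout) {
  const stamps = [];
  function tick() {
    stamps.push(now());
    if (stamps.length < ticks) {
      schedule(tick, intervalMs); // wait, then record the next timestamp
    } else {
      onDone(stamps);             // hand back all recorded dates
    }
  }
  tick();
}

// In a page, something like this would print two dates 5 seconds apart:
// runTicker(2, 5000, stamps => {
//   document.body.textContent = stamps.join("\n");
// });
```

In a normal browser the page takes 5 seconds to fill in. With a sped-up clock the callback fires almost immediately, yet the second date still reads 5 seconds later than the first.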

Since when?

I did wonder if the random number generation sometimes updates, but a Google search for 0.14881141134537756 turns up over 18,000 results, so it seems to be quite stable. After discovering this I Googled about a bit and found an old Hacker News comment by ‘KMag’:

At some point, some SEO figured out that random() was always returning 0.5. I’m not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed.

So it seems things have been like this for some time, but instead of random() always returning 0.5, it now returns a deterministic series. The date is actually accurate initially, but can go into the future, as seen above. KMag went on to say:

I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it’s deterministic but very difficult to game.

That doesn’t seem to be the case. I’m not sure this lets you do much that you couldn’t already do based on User-Agent or IP, but perhaps it does let you do it with some plausible deniability!

If you want to reach me hit me up on Twitter @TomAnthonySEO or at Distilled.

Comment

  1. What a neat trick to detect Google bot.
    How does speeding up JavaScript affect the website? I would expect it to break something like spans or an AJAX call, or whatever else may be time sensitive.
    Have you had time to test anything like that?

  2. I don’t understand what your concern is. Any pseudorandom number generator must be seeded. If you don’t seed it, it will have a default seed. Running the generator with the same seed multiple times will produce the exact same sequence of numbers.