Googlebot’s Javascript random() function is deterministic

I was conducting some experiments on how Googlebot parses and renders Javascript, and I came across a couple of interesting things about the way it does so. The first is that Googlebot’s Math.random() function produces an entirely deterministic series. I created a small script that uses this to identify Googlebot in an obfuscated fashion.
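To illustrate why a deterministic series is enough to fingerprint a client, here is a minimal sketch. It is not the script from the original post: `makeLcg` is a stand-in generator I am assuming purely for demonstration (it is not Googlebot's generator), and `matchesRecordedSequence` is a hypothetical helper that compares the first few outputs of a random source against values recorded earlier.

```javascript
// Stand-in deterministic generator (a simple 32-bit LCG). This is an
// assumption for demonstration only, NOT the generator Googlebot uses.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function () {
    // Advance the state modulo 2^32 and scale the result into [0, 1).
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Hypothetical helper: true if the next outputs of `randomFn` equal the
// previously recorded values (within a small tolerance).
function matchesRecordedSequence(randomFn, recorded, epsilon = 1e-12) {
  return recorded.every((expected) => Math.abs(randomFn() - expected) < epsilon);
}

// Usage idea: record the series once from the client you want to recognise,
// then compare on later visits, e.g.
//   matchesRecordedSequence(Math.random, recordedValues)
```

A normal browser seeds Math.random() unpredictably, so a check like this should fail there, while a client that always produces the same series would pass every time.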


How to confirm a Google user’s specific email address (Bug Bounty Submission)

I recently reported an issue to Google which allows an attacker to confirm whether a visitor to a web page is logged in to any one of a list of specific Google accounts (including GSuite accounts). It is possible to check about 1000 email addresses every 25 seconds. Google have confirmed that this is working as intended, and do not consider it a bug.

You can test it out yourself on this demo page.

Firstly, a video of a proof of concept, where I identify an account (myself) against a list of 20 accounts:


I’ve previously written about identifying whether a user is logged in to a certain social network, and this attack is a variation of that method (albeit more serious, IMHO).

Google login pages often pass a continue parameter in the URL that is used to redirect a user to their intended destination after they complete login. However, if you are already logged in then you just get redirected immediately to the URL specified in the continue parameter.

This fact can be abused to craft a URL that will redirect users who are logged in to an image file, and challenge users who are not logged in with a login page. If you now use this URL as the src attribute of an img tag, you can use the Javascript onload and onerror handlers to determine whether the image loaded correctly or not.

If the image loaded, then the user is logged in, and if it errored then the user is not logged in. This is a known issue but has limited capacity to cause any sort of problem.
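The basic (already known) login check described above can be sketched roughly as follows. The function names and the example.com URLs are placeholders I have made up for illustration; only the continue parameter comes from the description above. The image factory is injectable so the sketch also runs outside a browser.

```javascript
// Hypothetical helper: build a login URL whose `continue` parameter points at
// an image. Both URLs passed in are placeholders, not real endpoints.
function buildProbeUrl(loginUrl, imageUrl) {
  const url = new URL(loginUrl);
  url.searchParams.set('continue', imageUrl);
  return url.toString();
}

// Resolve to true if the image loads (user was redirected to the image, so
// they are logged in) and to false if it errors (a login page came back).
// `createImage` defaults to the browser's Image constructor.
function probeImage(url, createImage = () => new Image()) {
  return new Promise((resolve) => {
    const img = createImage();
    img.onload = () => resolve(true);
    img.onerror = () => resolve(false);
    img.src = url; // setting src starts the request; no DOM attachment needed
  });
}

// Browser usage (placeholder URLs):
//   probeImage(buildProbeUrl('https://accounts.example.com/login',
//                            'https://www.example.com/logo.png'))
//     .then((loggedIn) => console.log('logged in?', loggedIn));
```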

However, Google succumbs to a far more dangerous variation where the attacker can also supply an additional parameter specifying an email address. The redirect then fires if the email matches, but otherwise not.

At this point an attacker can just dynamically create loads of image tags with onload handlers (there is no need to even attach them to the DOM) and wait for a match. In my tests I could check about 1000 emails every 23-24 seconds or so. If a user is on your site for a couple of minutes then you could check many thousands of possible emails.

Combined with other ways to partially identify users (their geography via IP, social ads targeted demographically or otherwise, their corporate network, or many other methods), you could dynamically load lists of targets. You can then match these people against requests and record their IP address, location, device, and all sorts of other information.

You may then use that knowledge to set up the attack above and do some dynamic spear-phishing.

Disclosure Timeline

  • 14th July – I reported this to the security team via their form.
  • 17th July – I heard back it was triaged and awaiting attention.
  • 18th July – The team came back to me and asked me what my suggestions for handling this would be.
  • 18th July – I went back to them with my suggestion for some sort of nonce or salted hash of the email, such that the redirect only worked with that hash and email combo, to stop blind attacks.
  • 19th July – The security team confirms they are filing this as a bug.
  • 21st July – I sent over a copy of my blog post as additional explanation.
  • 9th August – Google team lets me know, after discussion, this is intended behaviour. They suggested there may be rate limiting at a higher rate (I’ve not tried to confirm), and don’t consider it a problem. No action to be taken.
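
My suggestion from 18th July could be sketched along these lines. This is only one interpretation: I am using an HMAC (a keyed hash) in place of the "salted hash" wording, and the function names, the server-side secret and the token format are all assumptions for illustration, not anything Google implemented.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Hypothetical: derive a token for an email using a secret known only to the
// server. Without the secret an attacker cannot compute tokens for a guess list.
function signEmail(email, secret) {
  return createHmac('sha256', secret).update(email.toLowerCase()).digest('hex');
}

// Hypothetical: only honour the email-specific redirect when the supplied
// token matches the email, so blind guessing of addresses no longer works.
function redirectAllowed(email, token, secret) {
  const expected = Buffer.from(signEmail(email, secret), 'hex');
  const given = Buffer.from(String(token), 'hex');
  // Length check first: timingSafeEqual throws on buffers of unequal length.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

With something like this in place, a link generated by the server for a known user would still redirect, but a third party cycling through a list of addresses would have no valid token for any of them.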

This bug is quite specific, in that you need to have a target or list of target victims. However, I did think it could be quite bad if used.

If you missed it you can test this out here: demo page.

Lastly, a big thanks and shout out to the Google security team, they were responsive, friendly and communicative. 🙂

Hacker News Discussion

Googlebot now accepting Cookies

In the last couple of days Google announced that they were going to start executing javascript on most pages they visit and thus rendering pages far more akin to how our browsers do it. It was inevitable they’d need to do this, so it is a welcome update.

Then today, they announced an updated tool in Webmaster Tools that extends the previous “Fetch as Googlebot” feature with an additional option which enables these new javascript capabilities and returns you a screenshot of the rendered page. Very cool!

In my experimentation I used this tool to confirm that this new headless browser Googlebot seems to accept and store cookies, and send them back to your server too (as well as Referer headers).

The Setup

I started by creating a simple javascript script to check for the navigator.cookieEnabled JS property, which I then visited in the new WMTs tool:
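
A minimal sketch of that kind of check (not the original script, which isn't reproduced here) might look like this. `cookieReport` is a name I have made up, and it takes the navigator object as an argument so it can also run outside a browser.

```javascript
// Hypothetical helper: describe cookie support from a navigator-like object.
function cookieReport(nav) {
  return nav && nav.cookieEnabled ? 'cookies enabled' : 'cookies disabled';
}

// In a browser, write the result into the page so it appears in the
// rendered screenshot. Skipped when there is no document (e.g. in Node).
if (typeof document !== 'undefined' && document.body) {
  document.body.textContent = cookieReport(navigator);
}
```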


So now the trick was to check whether Google would really store and return this cookie. I ended up settling on a simple setup with a PHP script that set a cookie and output a page containing an iframe. The document in that iframe was a second script, which output the HTTP headers Google sent with the request.


Sure enough, Googlebot sends back a Cookie header with the value I had set in the ‘outer’ script. It sends a Referer header too.
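
The original setup used PHP. For anyone wanting to repeat the test, here is a rough equivalent as a Node sketch rather than the original scripts. The paths /outer and /inner, the cookie name and value, and the port are all assumptions of mine.

```javascript
const http = require('http');

// Pure request handler so the logic can be exercised without a network.
// /outer sets a cookie and embeds /inner in an iframe; /inner echoes the
// request headers as plain text so they show up in the rendered screenshot.
function handle(path, headers) {
  if (path === '/outer') {
    return {
      status: 200,
      headers: { 'Content-Type': 'text/html', 'Set-Cookie': 'probe=hello; Path=/' },
      body: '<iframe src="/inner" width="600" height="400"></iframe>',
    };
  }
  if (path === '/inner') {
    const lines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
    return { status: 200, headers: { 'Content-Type': 'text/plain' }, body: lines.join('\n') };
  }
  return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'Not found' };
}

// Wire the handler into a real server. The query string is ignored.
function createServer() {
  return http.createServer((req, res) => {
    const { status, headers, body } = handle(req.url.split('?')[0], req.headers);
    res.writeHead(status, headers);
    res.end(body);
  });
}

// To run it: createServer().listen(8080);
```

If the client stores cookies, the header dump from /inner should include a cookie line carrying the value set by /outer, plus a referer line pointing at the outer page.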

What does this mean?!

So far it is too early to tell. Google doesn’t keep the cookies between runs of the tool, so it really depends on how they treat things when actively crawling pages. If a crawl of multiple pages behaves like a single run of the tool, then we should see cookie-affected results appearing in the index, but it might be that Google crawls each page separately in a different ‘session’, in which case this won’t change very much.

Google Exploit – Steal Account Login Email Addresses

tl;dr I found a bug that allowed me to find the login email address of anyone with a Google+ account (even if they chose not to share it). This could be used to target specific people or just crawl Google+ collecting emails, and tying them easily to other social accounts as step one of something nefarious (e.g. spear phishing, or other account compromise). This has now been fixed by Google’s security ninjas.


I often spend time poking around Google for security holes as it is a great way to learn and they have a nice bounty program, so it is win win. I’ve previously claimed a bounty for an authorisation bypass that required modifying the payload of a POST request to one of their backend systems, but today’s bug is far simpler to abuse.

I was poking around Google’s Dashboard, where you can monitor account activity etc., and as part of getting my bearings I was doing a first pass to see how the site responded to different unauthorised requests. I found that under certain conditions one such unauthorised request sent me to Google’s new “One account. All of Google” login screen with the login email address pre-filled, thus exposing the address of an account I was not authorised for.

So I was now able to select a target from their Google+ profile and find out their login email address. Alternatively, I could just crawl Google+ profile pages collecting email addresses (and other linked social accounts), forming a nice little database for spear phishing attacks.

The Bug

I found a URL from a Google email that linked to my Dashboard with an extra parameter:

I immediately recognised the long number as the one that appeared in my Google profile URL before I changed it to a pretty one. Lots of profiles are still in the format of:

My ‘pretty’ URL is:

And in the source code of my profile you’ll find that ID (114756468015607312300) over 400 times. So these are very much public.

Step One

Change the ID to someone else’s and see what happens:

You get redirected to a login page, as expected.

Step Two

Log in to your own account, and then try the same URL again:

I get 302 redirected to the ‘add session’ login rather than the primary login page, but what is that in the URL?

The URL contains the login email address for the account identified by your selected ID number in a hash fragment which is used to pre-populate the login field. I’m still learning this stuff, but I’m unsure how that would ever be a good idea.

I reported the bug before trying to dig into whether any additional account information was leaked.

Step Three

You can now visit your favourite celebrity’s page and get their email (well, probably their team’s email for most, but maybe the actual address for some of the more tech-savvy ones). Alternatively, and more likely, you can target other specific victims, or just write a basic crawler to crawl Google+, tying emails to account ID numbers and to the other social accounts (Twitter, LinkedIn, Facebook), websites etc. that users list in their profiles. This would be a great basis for spear phishing, or a first step in trying to compromise someone’s account.


Disclosure Timeline

  • March 4th – Initial report.
  • March 4th – Google replied within 4 hours seeking clarity.
  • March 5th – Google notified me the bug was triaged.
  • March 6th – Google emailed again to follow up.
  • March 6th – I asked about disclosure. Google asked me to wait until the fix was verified.
  • March 7th – Google let me know the bug was fixed and verified.
  • March 14th – Google contacted me to confirm a bounty of $1337. Thanks! 🙂

Google should let me know next week whether this qualifies for a bounty (the team votes on all reports at scheduled meetings); I’ll update this post when they do.

Thanks to the Google security team, who were quick to reply, fast to fix and very communicative.

Hacker News Discussion

Machine Learning for SEOs (on Moz)

Since the Panda and Penguin updates, the SEO community has been talking more and more about machine learning, and yet often the term still isn’t well understood. We know that it is the “magic” behind Panda and Penguin, but how does it work? Why didn’t they use it earlier? What does it have to do with the periodic “data refreshes” we see for both of these algorithms? I think that machine learning is going to be playing a bigger and bigger role in SEO, and so I think it is important that we have a basic understanding of how it works.

From Keywords to Contexts: the New Query Model (on Moz)

As SEOs we talk a lot about “search queries” (or simply “searches”), yet I think search has outgrown our definition of what exactly a search query is. In this post I’m going to explain how I think the old definition is fast becoming less and less useful to us, and also how I believe this is going to mean we’re going to talk about keywords less and less. Our understanding of what we mean when we say “query” has become too narrow.