Googlebot now accepting Cookies

In the last couple of days Google announced that they were going to start executing JavaScript on most pages they visit, and thus rendering pages far more like our browsers do. It was inevitable they’d need to do this, so it is a welcome update.

Then today, they announced an updated tool in Webmaster Tools that extends the previous “Fetch as Googlebot” feature with an additional option, which enables these new JavaScript capabilities and returns a screenshot of the rendered page. Very cool!

In my experimentation with this tool I confirmed that this new headless-browser Googlebot seems to accept and store cookies, and to send them back to your server too (along with Referer headers).

The Setup

I started by creating a simple JavaScript snippet to check the navigator.cookieEnabled property, and then visited the page in the new WMT tool:

[Image: google_cookies_js]

So now the trick was to check whether Google would really store and return a cookie. I settled on a simple setup: a PHP script that set a cookie and output a page containing an iframe. The document in that iframe was a second script, which output the HTTP headers Google sent with the request:

[Image: google_header]

You can see that, sure enough, Googlebot sends back a Cookie header with the value I had set in the ‘outer’ script. It sends a Referer header too.
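The original scripts are not reproduced here. As a rough illustration of the same two-page setup, here is a minimal sketch in Python rather than PHP, using only the standard library. The cookie name and value, the /headers path, and the port are all assumptions made up for the example, not details from the original test.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer

COOKIE_NAME = "gbot_test"          # hypothetical cookie name
COOKIE_VALUE = "hello-googlebot"   # hypothetical cookie value


def outer_page():
    """Outer page: sets the cookie and embeds the header-echo page in an iframe."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Set-Cookie", f"{COOKIE_NAME}={COOKIE_VALUE}; Path=/"),
    ]
    body = (
        "<html><body><h1>Outer page</h1>"
        '<iframe src="/headers" width="600" height="300"></iframe>'
        "</body></html>"
    )
    return headers, body


def headers_page(request_headers):
    """Inner page: prints every request header the client sent, one per line."""
    lines = [f"{name}: {value}" for name, value in request_headers]
    body = "<html><body><pre>" + html.escape("\n".join(lines)) + "</pre></body></html>"
    return [("Content-Type", "text/html; charset=utf-8")], body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/headers":
            headers, body = headers_page(self.headers.items())
        else:
            headers, body = outer_page()
        payload = body.encode("utf-8")
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main(port=8000):
    """Serve both pages until interrupted; call main() to start the server."""
    HTTPServer(("", port), Handler).serve_forever()
```

If the rendered iframe shows a Cookie line carrying the value set by the outer page, the client stored the cookie and sent it back on the second request.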

What does this mean?!

So far it is too early to tell. Google doesn’t keep the cookies between runs of the tool, so it really depends on how they treat things when actively crawling pages. If crawling multiple pages behaves like a single run of the tool, then we should see cookie-affected results appearing in the index. But it might be that Google crawls each page separately in a different ‘session’, in which case this won’t change very much.
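To make the two possibilities concrete, here is a toy simulation in Python. It involves no real crawling: fake_server is a made-up site where /a sets a cookie and /b changes its content depending on that cookie, and crawl either keeps one cookie jar for the whole run or starts a fresh one for each page. How Googlebot actually behaves is unknown; this only illustrates why the distinction matters.

```python
def fake_server(path, cookies):
    """Hypothetical site: /a sets a cookie, /b varies its content by that cookie."""
    if path == "/a":
        return "page A", {"seen_a": "1"}
    if path == "/b":
        if cookies.get("seen_a") == "1":
            return "page B (returning visitor)", {}
        return "page B (new visitor)", {}
    return "not found", {}


def crawl(paths, shared_session):
    """Fetch each path, with one cookie jar per run or a fresh jar per page."""
    jar = {}
    results = {}
    for path in paths:
        if not shared_session:
            jar = {}  # every page starts a new 'session'
        body, set_cookies = fake_server(path, dict(jar))
        jar.update(set_cookies)
        results[path] = body
    return results
```

With a shared session, /b is fetched as a returning visitor and the cookie-affected version is what would be indexed. With a fresh session per page, /b always looks like a first visit.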

Google Exploit – Steal Account Login Email Addresses

tl;dr I found a bug that allowed me to find the login email address of anyone with a Google+ account (even if they chose not to share it). This could be used to target specific people, or just to crawl Google+ collecting emails and tying them easily to other social accounts, as step one of something nefarious (e.g. spear phishing, or other account compromise). This has now been fixed by Google’s security ninjas.

Intro

I often spend time poking around Google for security holes, as it is a great way to learn and they have a nice bounty program, so it is win-win. I’ve previously claimed a bounty for an authorisation bypass that required modifying the payload of a POST request to one of their backend systems, but today’s bug is far simpler to abuse.

I was poking around Google’s Dashboard, where you can monitor account activity etc., and as part of getting my bearings I was doing a first pass to see how the site responded to different unauthorised requests. I found that under certain conditions one such unauthorised request prompted me to log in via Google’s new “One account. All of Google” login screen with the login email address already filled in, thus exposing the address of an account I was not authorised for.

So I was now able to select a target from their Google+ profile and find out their login email address. Alternatively, I could just crawl Google+ profile pages collecting email addresses (and other linked social accounts), forming a nice little database for spear phishing attacks.

The Bug

I found a URL from a Google email that linked to my Dashboard with an extra parameter:

https://www.google.com/settings/dashboard?uq=114756468015607312300

I immediately recognised the long number as the one that appeared in my Google profile URL before I changed it to a pretty one. Lots of profiles are still in the format of:

https://plus.google.com/103112588675637065591/posts

My ‘pretty’ URL is:

https://plus.google.com/+TomAnthony/posts

And in the source code of my profile you’ll find that ID (114756468015607312300) over 400 times. So these are very much public.
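As a small illustration of the two URL formats, here is a Python sketch that pulls the numeric ID out of an old-style profile URL and returns None for a ‘pretty’ one. The function name profile_id is made up for the example, and the sketch assumes the ID is always the first path segment, as in the URLs shown above.

```python
import re
from urllib.parse import urlparse


def profile_id(url):
    """Return the numeric ID from a plus.google.com profile URL, or None if absent."""
    parsed = urlparse(url)
    if parsed.netloc != "plus.google.com":
        return None
    # Assumption: the ID, when present, is the first path segment.
    first_segment = parsed.path.strip("/").split("/")[0]
    return first_segment if re.fullmatch(r"\d+", first_segment) else None
```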

Step One

Change the ID to someone else’s and see what happens:

https://www.google.com/settings/dashboard?uq=106858636547986185729

You get redirected to a login page, as expected.

Step Two

Log in to your own account, and then try the same URL again:

https://www.google.com/settings/dashboard?uq=106858636547986185729

I get 302 redirected to accounts.google.com, to the ‘add session’ login rather than the primary login page. But what is that in the URL?

https://accounts.google.com/AddSession?continue=https://www.google.com/settings/dashboard#Email=REDACTED_EMAIL_ADDRESS

The URL contains the login email address for the account identified by your selected ID number, in a hash fragment which is used to pre-populate the login field. I’m still learning this stuff, but I’m unsure how that would ever be a good idea.
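To show where the address sits in that URL, here is a minimal Python sketch that splits a redirect target and reads the Email key from its fragment. It uses the redacted URL from above as input; the function name email_from_redirect is made up for the example, and the sketch assumes the fragment is formatted like a query string, which is how it appears here.

```python
from urllib.parse import parse_qs, urlsplit


def email_from_redirect(location):
    """Return the value of the Email key in the URL's hash fragment, if any."""
    parts = urlsplit(location)
    # Assumption: the fragment uses key=value pairs, like a query string.
    params = parse_qs(parts.fragment)
    values = params.get("Email")
    return values[0] if values else None


redirect_url = (
    "https://accounts.google.com/AddSession"
    "?continue=https://www.google.com/settings/dashboard"
    "#Email=REDACTED_EMAIL_ADDRESS"
)
print(email_from_redirect(redirect_url))
```

Nothing beyond following the redirect and reading its target is needed, which is what made the bug so simple to abuse.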

I reported the bug before trying to dig into whether any additional account information was leaked.

Step Three

You can now visit your favourite celebrity’s page and get their email (well, most probably their team’s email, but for some of the more tech-savvy ones maybe their actual address). Alternatively, and more likely, you can target other specific victims, or just write a basic crawler to crawl Google+, tying emails to account ID numbers and to the other social accounts (such as Twitter, LinkedIn or Facebook) and websites that users list in their profiles. This would be a great basis for spear phishing, or a first step towards trying to compromise someone’s account.

Timeline

March 4th – Initial Report
March 4th – Google replied within 4 hours seeking clarity.
March 5th – Google notified me that the bug was triaged.
March 6th – Google emailed again, following up.
March 6th – I asked about disclosure. Google asked me to wait until the fix was verified.
March 7th – Google let me know the bug was fixed and verified.
March 14th – Google contacted me to confirm a bounty of $1337. Thanks! 🙂

Google should let me know next week whether this qualifies for a bounty (the team votes on all reports at scheduled meetings); I’ll update this post when they do.

Thanks to the Google security team, who were quick to reply, fast to fix and very communicative.


Machine Learning for SEOs (on Moz)

Since the Panda and Penguin updates, the SEO community has been talking more and more about machine learning, and yet often the term still isn’t well understood. We know that it is the “magic” behind Panda and Penguin, but how does it work? Why didn’t they use it earlier? What does it have to do with the periodic “data refreshes” we see for both of these algorithms? I think that machine learning is going to be playing a bigger and bigger role in SEO, and so I think it is important that we have a basic understanding of how it works.

From Keywords to Contexts: the New Query Model (on Moz)

As SEOs we talk a lot about “search queries” (or simply “searches”), yet I think search has outgrown our definition of what exactly a search query is. In this post I’m going to explain how I think the old definition is fast becoming less and less useful to us, and also how I believe this is going to mean we’re going to talk about keywords less and less. Our understanding of what we mean when we say “query” has become too narrow.

Linkgex: Tool to Get Links to Specific Subsets of Pages (on Distilled)

Recently I have found myself fairly frequently wanting to get links that point to a certain sub-section of a website (i.e. links to only certain pages on the domain). I tend to use a mix of OpenSiteExplorer, Majestic, and Ahrefs when I get backlinks, but currently none of these services actually allows me to get backlinks in such a fashion. I decided to put together a short proof-of-concept script to do this.

Change Tracking: Monitor Competitors’ Websites for SEO (on Distilled)

It is relatively standard practice nowadays to do keyword rank checking with tools such as SEOmoz, Authority Labs or Conductor. It just makes sense to us as SEOs to keep an eye on them, whether you are of the school that you should be reporting them to your clients/boss or not. But something I haven’t really done much of until now is tracking my competitors’ sites (their markup, structure and content). I think observing your competitors in a structured and routine fashion is something that absolutely makes sense, and doesn’t need to be a big task.