Facebook exploit – Confirm website visitor identities

Short version:

I discovered a bug that would let any web page identify a logged in FB user by confirming their ID. Facebook fixed it within 6-9 months and awarded a $1000 bounty.

In last year’s coverage of the Facebook / Cambridge Analytica privacy concerns, Mark Zuckerberg was asked to testify before Congress, and one of the questions asked was whether Facebook could track users even on other websites. There was a lot of news coverage around this aspect of Facebook, and a lot of people were up in arms. As one aspect of their response, Facebook launched a Data Abuse Bounty, with the aim of protecting user data from abuse.

So, having recently found a bug in Google’s search engine, I set out to see whether I could track or identify Facebook users when they were on other sites. After a few false starts, I managed to find a bug which allows me to identify whether a visitor is logged in to a specific Facebook account, and can check hundreds of identities per second (in the range of 500 p/s).

I have created a proof of concept of the attack (now fixed), which checks a small list of known IDs and also lets you enter an ID to confirm whether you are logged in to that account.

Method

Facebook has a lot of backend endpoints which are used for various AJAX requests across the site. Almost all of them are protected by access-control-allow-origin headers and magic prefixes on JSON responses that prevent JSON hijacking and other nasty attacks.
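To make the magic prefix idea concrete, here is a minimal sketch. It assumes the prefix is `for (;;);`, an infinite loop guard commonly seen on this kind of response; the exact string is an assumption here, and `parseGuardedJson` and `hasGuardPrefix` are hypothetical helper names. A same-origin client that reads the response as text can strip the prefix before parsing, whereas a page that pulls the response in as a script would just spin in the loop and never see the data.

```javascript
// Assumed guard string; the real prefix may differ.
const GUARD_PREFIX = "for (;;);";

// True when the response text starts with the guard prefix.
function hasGuardPrefix(text) {
  return text.startsWith(GUARD_PREFIX);
}

// Hypothetical helper: strip the guard (if present) and parse the JSON body.
function parseGuardedJson(text) {
  const body = hasGuardPrefix(text) ? text.slice(GUARD_PREFIX.length) : text;
  return JSON.parse(body);
}
```

For example, `parseGuardedJson('for (;;);{"a":1}')` and `parseGuardedJson('{"a":1}')` both give an object with `a` set to 1.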

I searched across the site looking for any endpoints that didn’t have these protections and which did pass my user id in the URL, looking for any way I may be able to parse a response from Facebook to confirm whether the UID in the URL was correct.

I also looked for any images that include the user ID in the URL and behave differently when the UID matches the logged in user (so I could do something similar to this method, but for specific IDs); the closest I got was an image that did behave differently but the URL also included Facebook’s well known fb_dtsg parameter that is unique for users (and changes regularly) which prevented it being abused.

In addition, I checked for any 301/302s in these URLs which might represent an opportunity to redirect to an image in a fashion that would allow the same trick as above.

After carefully checking dozens of these endpoints, I eventually found one with a slight inconsistency in its behaviour; a small gap, but a weakness nonetheless. It did have an access-control-allow-origin header, but it only included a magic prefix when the user ID (in the __user URL parameter) didn’t match the logged in user. When the user ID provided in the URL did match, the response was pure JSON.


However, because of the pesky access-control-allow-origin header, I couldn’t call this via an XHR request as the browser would block it. At this point I thought it may be another dead end, but I eventually realised I could use it as the src for a normal <script> block. This would of course fail, but importantly it fails in a different way in each of the two cases (due also to the content-type header), and in such a fashion that the difference can be detected via onload and onerror event handlers.
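The script tag trick can be sketched roughly as follows. This is an illustration rather than the code from my proof of concept: `probeScript` is a hypothetical name, and the helper only reports which event fired, since which of onload and onerror corresponds to a matching ID depends on the browser and the response headers. The document object is passed in as `doc` so the helper does not rely on globals.

```javascript
// Load `url` as a <script> element and report which event fired.
// `doc` is the browser's `document` in practice.
// `onResult` is called exactly once with "load" or "error".
function probeScript(doc, url, onResult) {
  const el = doc.createElement("script");
  let done = false;
  const finish = (outcome) => {
    if (done) return;
    done = true;
    // Detach handlers and remove the element so probes do not pile up.
    el.onload = null;
    el.onerror = null;
    if (el.parentNode) el.parentNode.removeChild(el);
    onResult(outcome);
  };
  el.onload = () => finish("load");
  el.onerror = () => finish("error");
  el.src = url;
  doc.head.appendChild(el);
}
```

In a page this would be called as `probeScript(document, someUrl, (outcome) => { ... })`, with the caller deciding what each outcome means.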

Here is an example of the URL for the endpoint:

https://www.facebook.com/ajax/pagelet/generic.php/TimelineEntStoryActivityLogPagelet?dpr=2&ajaxpipe=1&ajaxpipe_token=AXjeDM6DZ_aiAeG-&no_script_path=1&data=%7B%22year%22%3A2017%2C%22month%22%3A9%2C%22log_filter%22%3A%22hidden%22%2C%22profile_id%22%3A1159016196%7D&__user=XXXXXXXXXXXX&__a=1&__dyn=7AgNe-4amaxx2u7aJGeFxqeCwKyWzEy4aheC267UqwWhE98nwgU6C4UKK9wPGi2uUG4XzEeUK3uczobrzoeonVUkz8nxm1typ8S2m4pU5LxqrUGcwBx-1-wODBwzg7Gu4pHxx0MxK1Iz8d8vy8yeyES3m6ogUKexeEgy9EhxO2qfyZ1zx69wyQF8uhm3Ch4yEiyocUiVk48a8ky89kdGFUS&__req=fetchstream_8&__be=1&__pc=PHASED%3ADEFAULT&__rev=3832430&__spin_r=3832430&__spin_b=trunk&__spin_t=1524222703&__adt=8&ajaxpipe_fetch_stream=1

I was then able to craft a simple Javascript script that would take a list of user IDs and generate many script tags with callbacks to determine success or failure. Because the endpoint is served over HTTP/2, many of these requests can be in flight at once, which makes checking against large lists of IDs very quick. I did a small test here and was able to test 400-500 user IDs per second; if this was done in the background on a normal page that a user was on for a minute it would be possible to check thousands of IDs. There didn’t seem to be any sort of rate limiting on this endpoint.
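A rough sketch of the batching step is below. It is not the original script: `buildProbeUrl` and `checkIds` are hypothetical names, `probe(url, cb)` stands for any function that loads the URL as a script and calls `cb` with "load" or "error", and `matchOutcome` is whichever of those two outcomes signals a matching ID in the browser being used.

```javascript
// Build a probe URL by setting the `__user` query parameter on a template URL.
function buildProbeUrl(templateUrl, userId) {
  const url = new URL(templateUrl);
  url.searchParams.set("__user", String(userId));
  return url.toString();
}

// Fire one probe per ID (all at once) and collect the IDs whose outcome
// equals `matchOutcome`. `onDone` receives the list of matching IDs.
function checkIds(templateUrl, userIds, probe, matchOutcome, onDone) {
  const matches = [];
  let pending = userIds.length;
  if (pending === 0) {
    onDone(matches);
    return;
  }
  userIds.forEach((id) => {
    probe(buildProbeUrl(templateUrl, id), (outcome) => {
      if (outcome === matchOutcome) matches.push(id);
      pending -= 1;
      if (pending === 0) onDone(matches);
    });
  });
}
```

Starting every probe at once is the simplest design; a real page would probably cap the number of requests in flight.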

Demo

I have created a small demo which demonstrates the attack. It checks a small list of known user IDs automatically when you arrive on the page, and also allows you to enter an ID on the page and will confirm whether you are logged in to that account.

Impact

This is limited in that you need to be checking against a known list of users, rather than just being able to determine the user’s identity automatically. However, anyone affected by the Cambridge Analytica data situation, whose data is already known, could now be identified and tracked across websites even without using any Facebook APIs.

In addition, the most sinister exploiters (e.g. a repressive regime) of such a bug would likely have a list of people they cared about identifying (which they could also narrow down based on your location and other factors). A final example might be anyone on a corporate IP address or network, where the list of users is probably easy to harvest and fairly finite.

So the scope is fairly narrow, the impact on many may be small, but for some that impact could be high. This would certainly be a violation of privacy for any Facebook user who did get identified.

Disclosure Timeline

  • 20th April 2018 – I filed the initial bug report.
  • 20th April 2018 – Facebook replied letting me know this was being handed to the correct team to investigate.
  • 1st May 2018 – I requested an update.
  • 2nd May 2018 – FB replied – still investigating.
  • 23rd May 2018 – I requested an update, noticing it was fixed in Chrome but not Safari.
  • 23rd May 2018 – FB replied – they were investigating solutions.
  • 20th June 2018 – FB awarded a $1000 bounty.
  • 1st October 2018 – I requested permission to publish.
  • 1st October 2018 – FB replied they were still working on the fix, and they’d update me.
  • 19th February 2019 – I followed up and FB seemed happy for me to publish.

(It is unclear when the final fix rolled out – it looks like 6-9 months after I reported it.)

Hijack the Google Login flow (Bug Bounty Submission)

<short version> Google takes some user supplied data in a URL parameter which it expects to be a domain name, but does not validate it is a proper domain name or sanitise it. They then inject this value into the page inside an inline block of Javascript, controlling the next page you will visit (halfway through the login flow), which allows you to replace it with a path of your choosing. This path can be a logout URL which then uses a second user supplied parameter to control the ongoing redirect (cross-origin permitted). So essentially, because the user input is not treated properly an attacker can redirect someone who is halfway through logging in to a page of their choice. Google have now fixed it.</short version>

In July 2017, I reported an issue to Google that allows an attacker to inject a malicious link into a standard Google ‘account chooser’ screen, such that they can hijack the login process. Google fails to properly validate a URL parameter, expected to be a domain name, before placing it server side into some inline Javascript on the page.

Specifically, it allows an attacker to craft a URL to an authentic Google login page (where the user must pick from a list of their accounts) such that the destination for any one of those accounts is a page the attacker controls.

Method

For people with multiple Google accounts, such as myself, we are quite accustomed to being presented with a list of our Google accounts from which we need to select one to log in to. One form of this Google page embeds (for reasons lost on me) the list of email addresses to show inside the URL:


(It is normal for there to be only placeholder icons, hack or not)

I first discovered this page well over a year ago, but other than manipulating the email addresses to say rude words I couldn’t find much malicious to do with it. I recently re-discovered the page, and thought I would take another look at what I could do with it.

I discovered the page uses the parameters not only to present the list of email addresses but also as a factor in controlling where the login flow takes the user to when they select an account, which involves a failure in server-side validation.

Specifically, I found that if I presented emails in a certain format the section after the @ symbol would be used as the path for the following page. Because it is a GSuite page, some server side code attempts to parse the domain from the email and then uses that to form the path for the following page. However, it does not validate the domain correctly, allowing an attacker to form arbitrary paths.

The improperly parsed ‘domain’ is then embedded inside some inline Javascript on the page that controls the login flow:

(Values from the URL are embedded server side into inline Javascript, without being properly validated.)

The screenshot shows the value in the URL [marked 1], being embedded into inline Javascript [2]. This value is expected to be a domain, but the server side code does little or no validation. For example, it does not ensure the presence of a TLD, and it allows forward slashes.
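For contrast, here is a minimal sketch of the kind of check that would have rejected the values I used. It is purely illustrative and not Google’s code or their eventual fix: `isPlausibleDomain` is a hypothetical name, and the rules (ASCII labels only, at least one dot, an alphabetic final label) are assumptions that ignore internationalised domain names.

```javascript
// Hypothetical stricter check for a value expected to be a bare domain name.
function isPlausibleDomain(value) {
  if (typeof value !== "string" || value.length === 0 || value.length > 253) {
    return false;
  }
  const labels = value.split(".");
  // Require at least one dot, so that a TLD is present.
  if (labels.length < 2) return false;
  // Each label: letters, digits and hyphens, not starting or ending with a hyphen.
  const label = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
  // Final label (TLD): letters only in this simplified sketch.
  const tld = /^[a-z]{2,63}$/i;
  return labels.every((l) => label.test(l)) && tld.test(labels[labels.length - 1]);
}
```

Because a forward slash is not a permitted label character, a value that tries to smuggle in a path is rejected, as is a value with no dot at all.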

However, adding such a path did disfigure the listings on the page, such that it was obvious that something was amiss. To counteract this, I abused the fact that CSS was used to crop overly long email addresses: given that invalid email addresses are allowed through here, I simply put the full name of the user in front (as is standard on these pages most of the time anyway).

To get URL parameters to pass through, I simply put them in the URL before the list of u parameters specifying emails (marked [4] in the screenshot) and these parameters would then be added to the destination path. You can see marked [3] in the screenshot, that this worked because the window.location.pathname is used, which means the query string remains intact.

Now I was able to abuse an existing open redirect via a logout redirect page (so it doesn’t require users to be logged in already to work), chained through another Google open redirect, to forward the user to any page of my choosing. There are many such open redirectors on Google.com, so there are many options for abusing this.

Here is a demo:

The redirect I am using shows a redirect message, but I did later sidestep that. You could also redirect to a more misleading domain, of course. The result is that you start on an authentic Google account page, and when you select an account that you expect takes you to a password entry page, you arrive at one. However, this page is actually a malicious page being controlled by the attacker, who is now going to steal your login details. They can then redirect you back to an official Google page to cover their tracks.

I reported this to Google who said they already had a bug filed against it, but after some back and forth it seems that was something else. Unfortunately, Google deemed it “does not meet the bar for a financial reward”.

Disclosure Timeline

17th July 2017 – I reported this to the security team via their form.
17th July 2017 – I heard back it was triaged and awaiting attention.
20th July 2017 – I followed up to supply some more information. I was concerned it would be flagged purely as a phishing issue, or open redirect. I also thought they may think it was just the issue that an attacker can control the list of accounts and miss the fact that the URL parsing is broken such that malicious links can be injected.
20th July 2017 – The team came back to me and told me they were already aware of this issue, so it was not eligible for a bug bounty.
21st July 2017 – I shared a draft of this write up with the Google team.
3rd August 2017 – Google came back letting me know they are working on the issue, but in doing so also revealed that (due to my poor explanation!) they had mis-understood the primary issue I was reporting.
4th+7th August 2017 – I replied giving some more details on the exact issue.
11th August 2017 – I followed up.
17th August 2017 – Google replied saying it was actually different, and saying I should follow up in 2-3 weeks.
14th September 2017 – I followed up.
15th September 2017 – Google security team replied saying they hadn’t heard back from the team responsible, and said they would follow up.
2nd October 2017 – I followed up.
12th October 2017 – Google team said still no status update.
3rd November 2017 – I followed up.
20th November 2017 – I followed up.
15th December 2017 – Google team said still no status update.
25th January 2018 – I followed up.
13th March 2018 – Google team said still no status update.
14th April 2018 – I followed up, now 9 months since report.
15th May 2018 – Google let me know a fix is a few days out.
8th June 2018 – Google confirm I can publish.
2nd August 2018 – Google confirmed it “does not meet the bar for a financial reward”.

Similar to the other bug that I recently reported, this is quite a specific attack as it needs to target a specific user, but it is potentially high impact. Furthermore, it may be that the missing validation on the parameter allows for other nefarious uses, but I didn’t dig deeper once I identified the issue.

As usual, a big thanks to the Google team! 🙂

Google exploit via XML Sitemaps to manipulate search results

Short version:

For the $12 cost of a domain, I was able to rank in Google search results with Amazon, Walmart etc. for high value money terms in the US. The Adwords bid price for some of these terms is currently around $1 per click, and companies are spending tens of thousands of dollars a month to appear as ads on these search results, and I was appearing for free.

Google have now fixed the issue and awarded a bug bounty of $5000.

Google provides an open URL where you can ‘ping’ an XML sitemap which they will fetch and parse – this file can contain indexation directives. I discovered that for many sites it is possible to ping a sitemap that you (the attacker) are hosting in such a way that Google will trust the evil sitemap as belonging to the victim site.

I believe this is the first time they have awarded a bounty for a security issue in the actual search engine, which directly affects the ranking of sites.


Googlebot’s Javascript random() function is deterministic

I was conducting some experiments on how Googlebot parses and renders Javascript, and I came across a couple of interesting things about the way it does so. The first is that Googlebot’s Math.random() function produces an entirely deterministic series. I created a small script which uses this to identify Google in an obfuscated fashion:


http://www.tomanthony.co.uk/fun/googlebot_puzzle.html
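The general idea can be sketched as below. This is not the obfuscated script linked above, just an illustration under the assumption that the series is identical on every render: `recordSequence` and `matchesRecordedSequence` are hypothetical names, and the recorded values would have to be captured from a real crawl beforehand (none are given here).

```javascript
// Capture the first `n` values produced by a random function.
function recordSequence(rand, n) {
  return Array.from({ length: n }, () => rand());
}

// Compare the next values from `rand` with a previously recorded sequence.
// Returns false for an empty recording, since that would prove nothing.
function matchesRecordedSequence(rand, recorded, epsilon = 1e-12) {
  if (recorded.length === 0) return false;
  return recorded.every((expected) => Math.abs(rand() - expected) <= epsilon);
}
```

In a page, `matchesRecordedSequence(Math.random, recorded)` would need to run before anything else calls Math.random(), otherwise the series is already part way through.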


How to confirm a Google user’s specific email address (Bug Bounty Submission)

I recently reported an issue to Google, which allows an attacker to confirm whether a visitor to a web page is logged in to any one of a list of specific Google accounts (including GSuite accounts). It is possible to check about 1000 email addresses every 25 seconds. Google have confirmed this is working as intended, and do not consider it a bug.

You can test it out yourself on this demo page.

Firstly, a video of a proof of concept, where I identify an account (myself) against a list of 20 accounts:


Googlebot now accepting Cookies

In the last couple of days Google announced that they were going to start executing javascript on most pages they visit and thus rendering pages far more akin to how our browsers do it. It was inevitable they’d need to do this, so it is a welcome update.

Then today, they announced an updated tool in Webmaster Tools that extends the previous “Fetch as Googlebot” feature with an additional option which enables these new javascript capabilities and returns a screenshot of the rendered page. Very cool!
