in All Posts, Security, SEO/SEM, Web

Google exploit via XML Sitemaps to manipulate search results

Short version:

For the $12 cost of a domain, I was able to rank in Google search results with Amazon, Walmart etc. for high value money terms in the US. The Adwords bid price for some these terms is currently around $1 per click, and companies are spendings 10s of thousands of dollars a month to appear as ads on these search results, and I was appearing for free.

Google have now fixed the issue and awarded a bug bounty of $5000.

Google provides an open URL where you can ‘ping’ an XML sitemap which they will fetch and parse – this file can contain indexation directives. I discovered that for many sites it is possible to ping a sitemap that you (the attacker) are hosting in such a way that Google will trust the evil sitemap as belonging to the victim site.

I believe this is the first time they have awarded a bounty for a security issue in the actual search engine, which directly affects the ranking of sites.

As part of my regular research efforts, I recently discovered an issue to Google that allows an attacker to submit an XML sitemap to Google for a site for which they are not authenticated. As these files can contain indexation directives, such as hreflang, it allows an attacker to utilise these directives to help their own sites rank in the Google search results.

I spent $12 setting up my experiment and was ranking on the first page for high monetizable search terms, with a newly registered domain that had no inbound links.

XML Sitemap & Ping Mechanism

Google allows for the submission of an XML sitemap; these can help them discover URLs to crawl, but can also be used hreflang directives which they use to understand what other international versions of the same page may exist (i.e. “hey Google, this is the US page, but I have a German page on this URL…”). It is not known exactly how Google uses these directives (as with anything related to Google’s search algorithms), but it appears that hreflang allows for one URL to ‘borrow’ the link equity and trust from one URL and use it to rank another URL (i.e. most people link to the US .com version, and so the German version can ‘borrow’ the equity to rank better in Google.de).

You can submit XML sitemaps for your domain either via Google Search Console, inside your robots.txt or via a special ‘ping’ URL. Google’s own docs seem a bit conflicting; at the top of the page they refer to submitting sitemaps via the ping mechanism, but at the bottom of the page they have this warning:

However, in my experience you could absolutely submit new XML sitemaps via the ping mechanism, with Googlebot typically fetching the file within 10-15 seconds of the ping. Importantly, Google also mention a couple of times on the page that if you submit a sitemap via the ping mechanism it will not show up inside your Search Console:

As a related test, I tested whether I could add other known search directives (noindex, rel-canonical) via XML sitemaps (as well as trying a bunch of XML exploits), but Google didn’t seem to use them.

Google Search Console Submission

If you try to submit an XML sitemap in GSC, that includes URLs for another domain you are not authorised for, then GSC rejects them:

We’ll come back to this in a moment.

(Sorry, Jono!)

Open Redirects

Many sites use a URL parameter to control a redirect:

In this example I would be redirected (after login) to page.html. Some sites with poor hygiene allow for what are known as ‘open redirects’, where these parameters allow redirecting to a different domain:

Often these don’t need any interaction (like a login), so they just redirect the user right away:

Open redirects are very common, and often considered not too dangerous; Google does not include them in their bug bounty program for these reasons. However, where possible companies do try to protect against these, but often you can circumvent their protection:

Tesco are a UK retailer doing more than Β£50 billion in revenue, over a Β£1 billion of which via their website. I reported this example to Tesco (along with a number of others to other companies that I discovered during this research) and they have since fixed it.

Ping Sitemaps via Open Redirects 😱

At this point, you may have guessed where I’m going with this. In turns out that when you ping an XML sitemap, if the URL you submit is a redirect Google will follow that redirect, even if it is cross domain. Importantly, it seems to still associate that XML sitemap with the domain that did the redirect, and treat the sitemap it finds after the redirect as authorised for that domain. For example:

In this case, the evil.xml sitemap is hosted on blue.com, but Google associates it as belonging to, and being authoritative for, green.com. Using this you can submit XML sitemaps for sites you shouldn’t have control of, and send Google search directives.

Experiment: Using hreflang directive to ‘steal’ equity and rank for free

At this point I had the various moving parts, but I hadn’t confirmed that Google would really trust a cross-domain redirected XML sitemap, so I spun up an experiment to test it. I had done lots of smaller tests to understand various parts of this (as well as various dead ends), but didn’t expect this experiment to work as well as it did.

I created a fake domain for a UK based retail company that doesn’t operate in the USA, and spun up an AWS server that mimicked the site (primarily through harvesting the legit content and retooling it – i.e. changing currency / address etc.). I have anonymised the company (and industry) here to protect them, so lets just call them victim.com.

I now created a fake sitemap that was hosted on evil.com, but contained only URLs for victim.com. These URLs contained hreflang entries for each URL pointing to an equivalent URL on evil.com, indicating it was the US version of victim.com. I now submitted this sitemap via an open redirect URL on victim.com via Google’s ping mechanism.

Within a 48 hours the site started getting small amounts of traffic for long tail terms (SEMRush screenshot):

A couple more days passed and I started appearing for competitive terms on the 1st page, against the likes of Amazon & Walmart:

Furthermore, Google Search Console for evil.com indicated that victim.com was linking to evil.com, although this obviously was not the case:

At this point I found I was also able to submit XML sitemaps for victim.com inside GSC for evil.com:

It seemed that Google had linked the sites, and evil.com’s search console now had some capabilities to influence victim.com’s setup. I could now also track indexation for my submitted sitemaps (you can see I had thousands of pages indexed now).

Searchmetrics was showing the increasing value of the traffic:

Google Search Console was showing over a million search impressions and over 10,000 clicks from Google search; and at this point I had done nothing other than submit the XML sitemap!

You should note that I was not letting people check out on the evil site, but had I wanted to, at this point I could have either scammed people for a lot of money, or setup ads or otherwise have begun monetising this traffic. In my mind this posed a serious risk to Google visitors, as well as a risk to companies relying on Google search for traffic. Traffic was growing still, but I shut my experiment down and aborted my follow up experiments for fear of doing damage.

Discussion

This method is entirely undetectable for victim.com – the XML sitemaps don’t show up on their end, and if you are doing what I did and leveraging their link equity for a different country, then you could entirely fly under the radar. Competitors in the country are operating in would be left quite baffled by the performance of your site (see above where I’m in the search results as Amazon, Walmart & Target, who are all spending significant resources to be there).

In terms of Black Hat SEO, this had a clear usage, and furthermore is the first example I’m aware of of an outright exploit in the algorithm, rather than manipulating ranking factors. The severity of potential financial impact of the issue seems non-trivial – imagine the potential profit from targeting Tesco or similar (I had more tests to run to investigate this more but couldn’t without potentially causing damage).

Google have awarded a $5000 bounty for this, and the Google team were a pleasure to deal with, as always. Thanks to them.

If you have any questions, comments or information you can contact me at [email protected], on Twitter at @TomAnthonySEO, or via contacting me via Distilled.

Disclosure Timeline

  • 23rd September 2017 – I filed the initial bug report.
  • 25th September 2017 – Google responded – they had triaged the bug and were looking into it.
  • 2nd October 2017 – I sent some more details.
  • 9th October – 6th November – some back and forth status updates.
  • 6th November 2017 – Google said “This report has been somewhat hard to determine on what can be done to prevent this kind of behavior and the amount of it’s impact on our search results.I have reached out to team to get a final decision on your report. I know they have been sifting through the data to determine how prevalent the behavior that you described is and whether this is anything immediately that should be done about it.”
  • 6th November 2017 – I replied suggesting they don’t follow cross-domain redirects for pinged sitemaps – there is little good reason for it, and it could be a GSC only feature.
  • 3rd January 2018 – I asked for a status update.
  • 15th January 2018 – Googled replied “Apologies for the delay, I didn’t want to close this report earlier, because we were unable to get a definitive decision, if it would be possible to address this behavior with the redirect chain without breaking many legitimate use cases. I have reached back out to the team reviewing this report to get a final answer and will update you with their response this week.”
  • 15th February 2018 – Google updated to let me know a bug had been filed on the report, and the VRP board would discuss a bounty.
  • 6th March 2018 – Google let me know they had awarded a bounty of $1337.
  • 6th March 2018 – I shared a draft of this post with Google and asked for green light to disclose.
  • 12th March 2018 – Google let me know they hadn’t completed the fix, and asked me to hold off.
  • 25th March 2018 – Google confirmed the fix was live, and gave me green light to post.
  • 17th April 2018 – Google contacted me again to say they had upgraded the bounty amount to $5000. πŸ™‚

Write a Comment

Comment

84 Comments

    • Probably they’ve forgot about 3-4 zeros at the end… I don’t get their stinginess. He would earn more by using it for 1 day…

      • Exactly, this companies are getting really cheap with their bounties. And they wonder why there are more and more people trying to exploit this kind of things.
        With this low ball values, maybe next time he will sell this exploit for 6 months while they decide to fix it or not
        Btw good joob

  1. Hello,
    Amazing post πŸ™‚
    Black Hat still remain a vast world !
    Now after google has fixed the bug, should website owner fix their urls too (As Tesco ) ?

  2. This was super interesting and insightful, thanks for writing it Tom. Google was a little slow to this, if you were a blackhat you could have made a killing, they def should have given you more bounty πŸ˜‰ …

  3. Awesome find out and what an interesting read here. A little adventure and very intriguing. Glad that it falls into your hand and not others.

  4. Excellent write-up, thanks for sharing so much of the process. I am amazed that it took Google so long to make the fix, and that they paid such a nominal bounty on what could have been a major loophole. Nonetheless, very impressive work πŸ™‚

  5. Just Brilliant, It is something a lot of people in the industry are deliberately looking for and you let it pass to google. I strongly believe they should have pay you much higher than those thousand dollars.

  6. I would say that you are really dumb. You might be trying to present yourself as ethical.
    There was so much money to be made with this bug and no one would have cared about it.
    Lost opportunity.

    Before you commit such things in future, do share the bug with me. I will offer you more in terms of bounty than Google.

  7. Nice work Tom (it was great to see you at SearchLove) This is just awesome.

    “A couple more days passed and I started appearing for competitive terms on the 1st page, against the likes of Amazon & Walmart”

  8. I’ve done a lot of research on IDs in this regard, I have done a study on data mining for a long time, some data that I have achieved, sometimes successful, sometimes unsuccessful. It’s a crazy experiment and I need to congratulate you.

  9. Correction to: and companies are spendings 10s of thousands of dollars a month to appear as ads on these search results

    Its 10s of millions of dollars a month…

    • Good question. I don’t have confirmation of their final fix, no. My suggestion (which is somewhat hidden in the disclosure timeline) was they simply disallow cross-domain redirects for sitemaps that are submitted via the ping URL. Sitemaps can be submitted in robots.txt files, or inside Google’s Search Console platform – both of which could be used if you need a cross-domain redirect for your sitemap (which would be odd, but may happen).

      Such a fix wouldn’t prevent legitimate, authorised users from submitting such a sitemap but would simply disallow pinging of them. The ping mechanism could still be used for re-submission of sitemaps submitted by other means.

      • Yes, it is not easy for Google to fix this without breaking some use cases.
        maybe they just upgrade a higher level to recognize the URL in xml sitemap
        I think you make this possible, Google must have two bugs , 1: xml sitemap 2: hreflang.

        Is that possible to know how Google fix this issues? like email details of feedback

  10. Insightful research Tom. While I disagree about the usefulness of the cross-domain sitemap submit, it certainly should be restricted so that link equity is not hijacked. Blackhats could have used this method to make millions. Just… incredible.

  11. You should have resell it and get more money…

    Also seems google takes a lot time to fix bug why they didnt wait for Microsoft close the bug before publish?

    Tl;DR: Google is a shit company like Facebook

  12. Such a great post as you write ever and today you have nicely done about xml sitemap , you nailed it everything πŸ™‚
    Thank you for sharing awesome things with us

  13. I think Google must to pay you more for fix this bug. One thing that must be notice. No google employee founded such big loope whole. As per my experience in seo field. Amount they paid is too small then your ethical approache