Can Google Webmaster Tools Data be Trusted?

Google Webmaster Tools data is something everyone in the SEO industry takes for granted. But just how reliable is it? This article focuses on the link data provided by Google.

It’s a seldom explored feature, but as anyone with a penalty will know – what you can’t see can hurt you.

I tracked and monitored links data coming from GWMT over 10 sites for 2 months, and compared this data with that of other reliable backlink sources. The results were interesting.

In this article I’m going to explain:

  • How much of your backlink profile you can expect to find in a one-off download from Google
  • How to squeeze more data out of your samples by combining them
  • Why you should only download GWMT data twice a week
  • What amazing (and free!) tool you can use to download all your GWMT accounts at once

Profile Sizes

A big part of what I do here at WMG is forensically analyse backlinks to sift out any spam and remove Google penalties. One reason we’re so successful at penalty removal is the sheer detail we go into, and the bulletproof process we have for building and maintaining an accurate backlink profile.

I wanted to know just how reliable GWMT is as a data source, and how useful it is when used for spam hunting. Google provides its links data for free, which makes it a great addition to link sources if you’re on a budget. Ultimately, though, if you’re expecting to have all your links handed to you in a single download then you’re going to be disappointed.

Sample data is just that: a small sample of all recorded links pointing to your site. From what I’ve seen, samples can range from a single link all the way up to exactly 100,000 links, which may well be the cap on sample sizes. The size of your sample file is determined by the size of your backlink profile.

For example, a domain that Google says has 3,889 backlinks only had 1,326 links available to analyse in the sample data – so this was a sample of approximately 34% of the total profile. A much bigger domain had a profile of 32.2 million backlinks, and that sample contained only 99,228 links – 0.3% of the amount quoted by Google.
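The arithmetic behind those two percentages can be checked with a minimal sketch. The function name `sample_coverage` is hypothetical and only there for illustration; the figures are the ones quoted above.

```python
def sample_coverage(sample_links: int, reported_links: int) -> float:
    """Return the sample size as a percentage of the total Google reports."""
    if reported_links <= 0:
        raise ValueError("reported_links must be positive")
    return 100.0 * sample_links / reported_links


# Figures quoted in the article (small domain, then large domain)
small = sample_coverage(1_326, 3_889)
large = sample_coverage(99_228, 32_200_000)

print(f"{small:.1f}%")  # 34.1%
print(f"{large:.2f}%")  # 0.31%
```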

It’s not exactly clear how the number of links per sample is calculated, and even when analysing just a single site the sample sizes can still change significantly from week to week. To get a clearer picture I analysed 300 site profiles, measuring the difference between the number of links stated in Google Webmaster Tools and the number of links in the sample data.

The trend that emerged seems to be this: the bigger a website’s link profile, the smaller the proportion of it Google offers.

Sample size vs profile shown in GWMT
Scatter graph of the amount of links Google records in Webmaster Tools, vs. the links counted in downloaded samples.

So, if your site has more than 100 links you can expect an unreliable representation of your true backlink profile. Unfortunately, unless your site is brand new, you’re likely to have much more than just 100 links.

What about the number of unique linking domains?

Google never states the number of ULDs that point to your site, which would give us an accurate indicator of how much of a backlink profile is covered by sample downloads.

The ‘Links to Your Site’ section does provide the number of ‘top domains that link to your site’, but these are capped at 1,000 and are not an especially useful indicator of how much data you may be missing.

Combine Multiple Samples for a Clearer Profile

If you’ve ever had to analyse a backlink profile before then you’ve probably noticed your data goes out of date pretty quickly. Working from a single snapshot of links becomes unreliable, especially if you’re tackling a profile with millions of links over a period of weeks. Taking multiple samples over time (and not just from Google) is a good way of ensuring any newly discovered links are assimilated into a larger cache of historical data.

The Experiment

My method was simple: download a GWMT sample and break it down into links and domains. Repeat every day, and if any new domains pop up, add them to a master list.
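That loop can be sketched roughly as follows. This is not the exact process used in the experiment, just a minimal illustration: the helper names `linking_domain` and `merge_sample` are hypothetical, the example URLs are made up, and it assumes each sample row is a full URL including its scheme. It also treats every hostname as its own domain (apart from stripping a leading `www.`), which is a simplification.

```python
from urllib.parse import urlparse


def linking_domain(url: str) -> str:
    """Return the lowercased host of a link, without a leading 'www.'."""
    host = urlparse(url.strip()).hostname or ""
    return host[4:] if host.startswith("www.") else host


def merge_sample(master: set[str], sample_urls: list[str]) -> set[str]:
    """Add one day's sample to the master set and return the newly seen domains."""
    domains = {linking_domain(u) for u in sample_urls}
    domains.discard("")  # drop rows that could not be parsed as URLs
    new_domains = domains - master
    master |= new_domains
    return new_domains


# Made-up example data: two daily downloads
master: set[str] = set()
day1 = ["http://www.example.com/a", "http://blog.example.org/post"]
day2 = ["http://example.com/b", "https://forum.example.net/thread/1"]

merge_sample(master, day1)
new_today = merge_sample(master, day2)
print(sorted(new_today))  # ['forum.example.net']
```

Keeping the master list as a set means a domain that reappears in a later download is never counted twice.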

After two months of recording daily GWMT samples I had managed to build up an extensive backlink profile for each of the sites. A combined profile was substantially bigger than any single download, and on average contained around 23% more unique linking domains than an average sample.

For example, I compiled a list of 1,893 unique domains for a site with an average sample size of 1,622 domains – that’s 17% more data than I had to start with.

Some profiles revealed significantly more unique linking domains than others during the experiment, and there’s no way to predict how many additional domains you’ll uncover over time.

The result of long-term compilation is always the same: you get a bigger sample to base your analysis on, with newly discovered links included.

How much is too much?

I downloaded data once a day to give me a granular level of detail; this would be overkill for anyone wanting to build profiles over time. It became clear within the first week that the links data did not shift on a daily basis, but twice weekly. The graph below shows the fluctuation in the number of links recorded each day between 5th March 2014 and 2nd May 2014 (59 days):

Link Fluctuation Per Day.
A line graph showing the number of links lost or gained compared to the previous day

The number of links provided in each sample changes approximately twice a week. Sometimes there are more links than in the last sample, and other times there are fewer. A closer look at the data showed several things:

  • Data refreshes typically happen every Tuesday and Thursday
  • Links and Domains will be different with every refresh
  • New links and domains appear even on the days when the sample size shrinks
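One way to spot refresh days in your own downloads is to compare each day’s set of links with the previous day’s. The sketch below is an assumption about how that check could be done, not the method behind the graphs; `refresh_days` is a hypothetical helper and the sample data is invented. Comparing the sets themselves, rather than their sizes, also catches a refresh where the links change but the total stays the same.

```python
from datetime import date


def refresh_days(samples: dict[date, set[str]]) -> list[date]:
    """Return the dates on which the sample differs from the previous download."""
    changed = []
    previous = None
    for day in sorted(samples):
        current = samples[day]
        if previous is not None and current != previous:
            changed.append(day)
        previous = current
    return changed


# Invented data: one download per day, Monday to Thursday
samples = {
    date(2014, 3, 10): {"a", "b"},
    date(2014, 3, 11): {"a", "c"},       # Tuesday: same size, different links
    date(2014, 3, 12): {"a", "c"},       # Wednesday: unchanged
    date(2014, 3, 13): {"c", "d", "e"},  # Thursday: changed again
}

print([d.isoformat() for d in refresh_days(samples)])
# ['2014-03-11', '2014-03-13']
```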

Sample sizes will naturally change as a site’s links drop in and out of Google’s index. However, I uncovered domains halfway through the experiment I’d never seen before which were links placed many years ago.

It’s as if a random sample of old and new backlinks is generated at fixed intervals.

Whatever the case, if you regularly download your GWMT link data you can expect to uncover new linking domains twice a week, as illustrated in the graph below:

A line graph showing the number of unseen unique linking domains, per day.

Why use GWMT data at all?

Webmaster tools data often pales in comparison to other backlink sources. It offers a significantly smaller proportion of linking data than any of the trusted backlink sources typically used in the SEO industry, such as Majestic SEO or ahrefs.

However, after comparing link data from 7 of the best backlink sources money can buy, it became clear that sample size isn’t what’s important – it’s uniqueness.

When you’re battling to fight off a penalty every unique link counts. Like I said earlier, what you can’t see can hurt you. Missing a few thin content directories or a hacked forum means a failed reconsideration request, and another month of your domain being penalised. This can be a disaster for e-commerce site owners in particular.

After analysing a variety of different profiles I found that Webmaster Tools data contributes between 15% and 30% of all the unique links in a full profile, and a very similar share of ULDs. This means that without the Webmaster Tools account, any analysis would have missed up to around a third of the ULDs needed for a complete analysis.
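To make “uniqueness” concrete: it is the share of the combined profile that appears in one source and nowhere else. The sketch below is a minimal illustration under that definition; `unique_contribution` and the domain sets are hypothetical and not data from the analysis.

```python
def unique_contribution(source: set[str], others: list[set[str]]) -> float:
    """Percentage of the combined profile that is found only in `source`."""
    combined = source.union(*others)
    if not combined:
        return 0.0
    seen_elsewhere = set().union(*others)
    only_in_source = source - seen_elsewhere
    return 100.0 * len(only_in_source) / len(combined)


# Invented example: GWMT plus two paid sources
gwmt = {"a.com", "b.com", "c.com"}
paid_1 = {"a.com", "d.com", "e.com"}
paid_2 = {"d.com", "f.com", "g.com", "h.com"}

share = unique_contribution(gwmt, [paid_1, paid_2])
print(f"{share:.0f}%")  # 25%
```

Here the combined profile holds eight domains, two of which (`b.com` and `c.com`) appear only in the GWMT set.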

It’s likely that the uniqueness from Google samples is so high simply because the sample data is much fresher. What this implies is that:

  • Standard backlink sources you pay for aren’t as fresh as Google’s samples
  • Any analysis done without GWMT data is flawed – the most recent spam could be missed entirely, which guarantees a failed reconsideration request

Conclusion

If you’re trying to remove a manual penalty on a single site it’s a good idea to keep on top of your GWMT data. You can spot and remove any malicious links as they arrive, and keep going until the penalty is removed.

If you’re an agency looking after a number of different sites, it might be wise to start collecting data now. We like to be prepared, so a Doomsday Vault has been set up to store GWMT data, which is updated twice a week.

This way there’s always an extensive backlink profile on file with fresh and historic links should any analysis need to be completed 1 month, 6 months or a year in the future.

Is there something that can do this for me?

Yes! Sort of.

There’s a great free tool called SWAT (SEO Web Analysis Toolset) – and one of its features allows you to download ALL your GWMT data in one go. And that’s not just Links to Your Site, but all your internal links, top search queries, top search URLs, content keywords and more.

It was very kindly made by Tony McCreath who’s created a whole host of other useful SEO tools, and will no doubt continue to improve SWAT over time. There’s no automated tool, just yet, that can compile and analyse the sample data once you have it. This is something I might develop and share in the future.

My advice for large domains with a penalty would always be: get a professional to do your backlink analysis for you. It saves a lot of time and money in the long term.

Alex’s Key Takeaway points:

  • Sample size depends on the size of your backlink profile. The larger your profile, the smaller the share of it you’ll receive in samples
  • ‘Sample Data’ and ‘Latest Links’ contain exactly the same links. The only difference being that ‘Latest Links’ gives you a date of when each link was first discovered
  • Link data available via downloads refreshes twice weekly – typically on a Tuesday and a Thursday – so don’t bother downloading samples every day
  • The number of ‘top domains’ available to browse through the ‘Links to Your Site’ page will always be capped at 1000
  • Although the sample size is smaller, its uniqueness compared to commercial sources is significant. GWMT data should be included in every analysis
  • That said, it should go without saying that GWMT should be bolstered with other backlink sources, including historical data
