Did You Configure Your Site for the International Googlebot?

Lalit Sharma

Aug 21, 2015 · 4 min read

Every so often, the SEO world is subject to a major shift that shakes its virtual tectonic plates to the core. In January this year, one such tremor went largely unnoticed: a change that alters the foundations of international SEO best practice.

In January, Google enhanced Googlebot's ability to evaluate how content changes according to a user's location, denoted by the IP address, and preferred language settings, signaled through the Accept-Language HTTP header. Today, more and more businesses have restructured their websites to serve content dynamically according to a user's language or country.

How Google Crawls and Indexes International Content

Google itself has admitted that it has some issues identifying and indexing international content served this way in different languages. According to the Google Search Console help documentation, Google may in some instances fail to crawl, index or rank international content because its crawlers' default IP addresses originate in the US.

In addition, the Googlebot crawler sends its default HTTP requests without setting the Accept-Language header, and Google has described in detail how some locale-adaptive pages may be skipped as a result.
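
To see why the missing header matters, here is a minimal sketch (it does not reproduce Googlebot's actual behaviour) that fetches the same placeholder URL twice, once without an Accept-Language header, the way the default crawl does, and once asking for German, then compares the responses.

    # A minimal sketch, assuming a hypothetical locale-adaptive page at a
    # placeholder URL. Compare what comes back with and without Accept-Language.
    import requests

    URL = "https://example.com/"  # placeholder for a locale-adaptive page

    # Request 1: no Accept-Language header, like the default Googlebot crawl.
    default_resp = requests.get(URL, timeout=10)

    # Request 2: explicitly ask for German content.
    german_resp = requests.get(
        URL, headers={"Accept-Language": "de-DE,de;q=0.9"}, timeout=10
    )

    print("Content-Language without header:", default_resp.headers.get("Content-Language"))
    print("Content-Language asking for de-DE:", german_resp.headers.get("Content-Language"))
    print("Responses differ:", default_resp.text != german_resp.text)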

To crawl and index these locale-adaptive pages, Google now uses a locale-aware crawling pattern, which surfaces better content for searchers globally. Locale-aware crawling happens when Googlebot crawls under either or both of these configurations:

  1. Geo-distributed crawling

Googlebot appears to crawl from IP addresses located outside the US in addition to its long-established, US-based IP addresses. Google recommends treating a Googlebot that appears to come from a particular country exactly like a user coming from that country.

For instance, if users from Papua New Guinea are allowed to view certain content on your site but users from Germany are not, a Googlebot appearing to come from Papua New Guinea should be allowed through, while one appearing to come from Germany should be blocked. Note that the list of countries and IP addresses from which Googlebot appears to crawl will change over time.

  2. Language-dependent crawling

The bot crawls with the Accept-Language field set in the HTTP request header. Where Googlebot detects that a page adapts to language preferences, it recrawls it with a variety of Accept-Language values, which gives Google a better chance of discovering, indexing and ranking that page's content in each of the languages it supports.

Owners of locale-adaptive sites therefore need to confirm that their configuration supports both crawling patterns. At present, Googlebot relies on certain signals to assess whether a website serves locale-adaptive content, including the following (a rough server-side sketch of such a site follows the list):

  • Sites offering different content on unchanged URLs according to the user's geolocation (IP address)
  • Sites offering different content on unchanged URLs according to the Accept-Language setting in the user's browser
  • Sites that completely block access depending on the country from which requests come
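
As a hypothetical illustration of these signals, the Python sketch below (using Flask purely for illustration) serves a single URL that varies its content by the visitor's Accept-Language header, varies it again by the country looked up from the IP address, and blocks some countries outright. The lookup_country() helper, the blocked-country set and the content strings are all placeholders, not a real geo-IP API or real site content.

    # A hypothetical locale-adaptive page exhibiting all three signals above.
    from flask import Flask, abort, make_response, request

    app = Flask(__name__)

    BLOCKED_COUNTRIES = {"DE"}                             # placeholder: refuse requests from Germany
    COUNTRY_NOTES = {"PG": " (local delivery available)"}  # placeholder geo-based variation
    CONTENT = {"en": "Hello", "de": "Hallo", "fr": "Bonjour"}

    def lookup_country(ip):
        """Placeholder geo-IP lookup; a real site would query a geo-IP database."""
        return "US"

    @app.route("/")
    def home():
        country = lookup_country(request.remote_addr)

        # Third signal: block access entirely for requests from some countries.
        if country in BLOCKED_COUNTRIES:
            abort(403)

        # Second signal: choose a language from the Accept-Language header.
        lang = request.accept_languages.best_match(list(CONTENT)) or "en"

        # First signal: the same URL also varies its content by geolocation.
        body = CONTENT[lang] + COUNTRY_NOTES.get(country, "")

        resp = make_response(body)
        resp.headers["Content-Language"] = lang
        return resp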

SEOs whose clients generate content for international or non-English-speaking audiences therefore need to understand the impact of these changes to Googlebot crawling, and know how to check whether their sites are configured to support Google's international crawling.

Crawling From Non-US IP Addresses

Since January, Googlebot has been able to crawl from IP addresses in countries other than the US. Depending on the IP address location, Google can tell whether a particular page or site serves different content to users in different countries. It can also evaluate whether a newly discovered version of that page or site might be more relevant for users in a certain country.

This promises a remarkable improvement in the search experience of non-English-speaking users, by making sure the version of a site or page created for their country is the one visible in their search results.

Checking Your Site's Configuration

If your or your client's website serves locale-adaptive content dynamically according to the user's IP address, you can use international proxy services to check it. Most crawlers, such as Screaming Frog, permit proxy configuration, which enables you to automatically spot SEO shortcomings from international users' perspectives.
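
Without a full crawler, the same spot check can be scripted. The minimal sketch below fetches a placeholder URL directly and then through a proxy assumed to sit in another country, and compares the two responses; the proxy address stands in for whatever international proxy service you use.

    # A minimal sketch of checking locale-adaptive content through a foreign proxy.
    import requests

    URL = "https://example.com/"                                    # placeholder
    FOREIGN_PROXY = "http://user:password@proxy.example.net:8080"   # placeholder proxy

    direct = requests.get(URL, timeout=10)
    via_proxy = requests.get(
        URL, timeout=10, proxies={"http": FOREIGN_PROXY, "https": FOREIGN_PROXY}
    )

    print("Same content for US and foreign visitors:", direct.text == via_proxy.text)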

The Accept-Language Header

More and more sites use the Accept-Language header to change the language of their web content automatically. Under the locale-adaptive crawling scheme, Googlebot can now send such requests itself: it asks for a specific page on the site with a preferred language specified in the header.

The effect is similar to configuring your own browser's language preferences, for example via Chrome >> Preferences >> Languages in Google Chrome. (You may have to click "advanced settings" in the Preferences menu to find it.)

How to Test Your Configuration Settings

You can use the Locale-Adaptive Pages Testing Tool from Merkle and stipulate the specific languages you want to check for. You can enter up to 10 URLs, select a Google, Bing or normal user agent, and then run the tool.

From the results, you can tell which pages are properly configured, even if all of your content changes dynamically according to the Accept-Language header. For properly configured pages, the Accept-Language value (first column) should match the Content-Language value (fourth column).
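
If you want to script a similar check yourself, the sketch below approximates the tool's core comparison: it requests each URL with a given Accept-Language value and flags pages where the Content-Language response header does not match. The URLs and language list are placeholders.

    # A rough approximation of the Accept-Language vs Content-Language check.
    import requests

    URLS = ["https://example.com/", "https://example.com/products"]  # placeholders
    LANGUAGES = ["en", "de", "fr"]

    for url in URLS:
        for lang in LANGUAGES:
            resp = requests.get(url, headers={"Accept-Language": lang}, timeout=10)
            served = (resp.headers.get("Content-Language") or "").lower()
            status = "OK" if served.startswith(lang) else "MISMATCH"
            print(f"{status}: {url} requested={lang} served={served or 'none'}")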

Conclusion

The change to Googlebot's international crawling scheme may not have had a big impact on SEO yet, but it promises to have a dramatic one in the future.

Although locale-aware crawling addresses the issue of dynamically served content, it's important to note that Google still favors the use of separate URLs for the different content versions, properly annotated with rel="alternate" hreflang.
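
For reference, the separate-URL approach looks something like the sketch below, which prints the link annotations each language version would carry in its head section; the URLs are placeholders.

    # Placeholder URLs for three language versions plus an x-default fallback.
    ALTERNATES = {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
        "fr": "https://example.com/fr/",
        "x-default": "https://example.com/",
    }

    # Every language version would carry this full set of <link> tags in its <head>.
    for hreflang, href in ALTERNATES.items():
        print(f'<link rel="alternate" hreflang="{hreflang}" href="{href}" />')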

It's worth understanding the tension this presents. Why does Google favor separate URLs? Might it be that, as more sites jump on the bandwagon of dynamically served content, Google will find it harder to identify exactly what content exists? That possibility carries more than a little merit.

As more websites restructure to meet international users' needs by offering locale-adaptive content, Google has to develop more crawl configurations to stay informed about that content and see the full picture. While this is manageable on a small scale, the implications of locale-adaptiveness applied at large scale no doubt leave the search engine giant in a precarious position.
