Google API Documents Leak! What We Know So Far


Following the earlier leak of Yandex's search ranking factors, internal documentation for Google’s Content Warehouse API has now leaked, offering insights into Google’s search algorithms. The leak details how content, links, and user interactions are stored, but it lacks specifics on scoring functions.

The leak validates many long-held SEO beliefs and provides a clearer picture of Google’s ranking mechanisms, emphasizing the importance of quality content, user engagement, and strategic link building.

Many of the claims made by those who verified the leak’s authenticity directly contradict public statements made by Googlers over the years. In particular, Google has repeatedly denied that click-centric user signals are used, that subdomains are considered separately in rankings, that a sandbox exists for newer websites, and that a domain’s age is collected or considered, among other things.

The main points of the Google API leak include:

  • The documents describe more than 14K ranking features.
  • Google has a feature they compute called “siteAuthority”.
  • Navboost has a module focused entirely on click signals, which represents users as voters and stores their clicks as votes.
  • Google stores which result received the longest click during a session.
  • Google has an attribute called hostAge that is used specifically “to sandbox fresh spam in serving time”.
  • One of the modules related to page quality scores features a site-level measure of views from Chrome.
  • During the Covid-19 pandemic, Google employed whitelists for websites that could appear high in the results for Covid-related searches.
  • Similarly, during national elections, Google employed whitelists for sites that should be shown (or demoted) for election-related reasons.
  • Other minor factors are also considered during the quality evaluation process, including demotions and penalties for domain names that exactly match unbranded search queries, a newer “BabyPanda” score, and spam signals.
  • Small sites: another feature, smallPersonalSite, flags small personal sites or blogs. Google could use it to boost or demote such sites, but that remains an open question. As with the other features, we don’t know for certain how heavily it is weighted.

Uncertainties and what to do next:

  • Weighting of factors remains unknown. Although the leak reveals various ranking features, it doesn’t detail their relative importance. SEO professionals should continue to monitor search trends and experiment with different strategies.
  • Impact on specific website types. The leak mentions features like “smallPersonalSite,” but it’s unclear how these impact rankings. It’s essential to adapt SEO strategies based on website type and target audience.

Our Professional Opinion

  • Focus on quality content and user engagement is paramount. The leak emphasizes the importance of high-quality content that resonates with users. Clicks, time spent on a page, and overall user engagement now appear to be used as ranking signals, which strengthens the SEO strategy of creating valuable content that keeps users engaged.
  • Strategic link building remains crucial. The leak underscores the importance of acquiring backlinks from reputable websites. This reinforces the SEO strategy of building a strong backlink profile.
  • Transparency concerns. The leak reveals discrepancies between Google’s public statements and the leaked data. 

Follow Digipeak’s blog for the latest digital marketing news from around the world!

    Asem Mansour