In a perfect world, Google would crawl our entire site. Google would find the right pages to put before our target users, and it would do so at instant speed. But Googlebot doesn’t work like that. More specifically, your site’s crawl budget probably doesn’t allow it.
You can’t change how Googlebot works, but you can do something about your crawl budget.
Gary Illyes’s document, “What Crawl Budget Means for Googlebot,” explains Google’s definition of “crawl budget.” It also details how Googlebot uses a site’s crawl budget to determine what gets crawled and how much. We’ll go into more depth soon, but first we want to point out that the document seems to send two conflicting messages:
- Only large sites need to worry about the crawl budget; and
- Getting your site crawled is critical to website success
Sure, Illyes says right there in the document that crawl budget “is not something most publishers have to worry about.” But how can that be? Many of us may not have sites with thousands upon thousands of URLs, but we shouldn’t undervalue getting a more efficient crawl. Especially when we can get it by improving our website.
To improve our crawl budget, we first have to understand it. We need to learn what Googlebot looks at when determining it. Only then can we get a more efficient crawl for our site.
In case you forgot what Googlebot does: “Crawling is its main priority while making sure it doesn’t degrade the experience of users visiting the site.”
Googlebot’s job is to crawl your site. But it also wants to serve up useful pages for users to view. Crawl budget helps it do both.
So, just what is “crawl budget”?
We’ve seen many definitions of crawl budget over the years. And no offense to the ones who’ve done so before, but we’ll take Google’s definition over anyone else’s any day.
Google defines crawl budget as the number of URLs Googlebot can and wants to crawl.
There are two parts to this definition: 1) how many URLs Googlebot CAN crawl, and 2) how many it WANTS to. To understand this, we have to look at the factors that make up crawl budget, “crawl rate limit” and “crawl demand.”
Crawl Rate Limit
Although Googlebot’s main job is to crawl, crawl, crawl, it doesn’t want to overload your server and degrade the experience of users visiting the site. So Googlebot imposes a limit, a maximum fetching rate, which represents the “number of simultaneous parallel connections Googlebot may use to crawl the site.” This limit is based on two things:
- Crawl health
- The crawl rate limit set in Search Console
Yes, you can limit your site’s crawl rate, although you’d probably only want to lower it in extreme cases, such as when Googlebot hits your server with too many requests.
Crawl Demand
Crawl demand is what Googlebot wants to crawl. Two things determine it: popularity and staleness.
We hope you’ve got useful links. Popular URLs get crawled more often, so they stay fresh in the index. Google also actively works to “prevent URLs from becoming stale in the index.”
Crawl demand arguably matters more than your crawl rate limit. If your site doesn’t have links strong enough to make Googlebot want to crawl it, your crawl rate limit won’t matter, because Googlebot’s activity will drop off regardless.
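To see how much of this budget Googlebot actually spends on your site, one option is to count its requests in your server logs. The sketch below is a minimal example that assumes logs in the common "combined" format; the sample lines, paths, and the `googlebot_hits` helper are made up for illustration. User-agent strings can be spoofed, so treat the result as a rough count rather than a verified one.

```python
import re
from collections import Counter

# Matches the request path, status code, and user agent in a combined-format log line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_lines):
    """Count requests per path whose user agent mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Hypothetical sample data; in practice, read lines from your access log file.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jan/2024:10:00:05 +0000] "GET /products?sessionid=abc HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Jan/2024:10:00:09 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [10/Jan/2024:10:01:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
]

# Print the most-crawled paths first.
for path, count in googlebot_hits(sample_log).most_common():
    print(count, path)
```

If the most-crawled paths turn out to be pages you don’t care about, that’s a sign your crawl budget is going to the wrong places.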
What factors affect the crawl budget?
Illyes reports that “having many low-value-add URLs can negatively affect a site’s crawling and indexing.” In other words, weak or worthless URLs drain your crawl budget. Here are the kinds of low-value URLs you’ll want to avoid:
- Faceted navigation and session identifiers (think filters on e-commerce sites)
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
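Faceted navigation and session identifiers usually show up as extra query parameters on the same underlying page. As a rough way to spot them, the sketch below strips a set of ignorable parameters and groups URLs that collapse to the same address. The parameter names in `IGNORED_PARAMS` and the example URLs are assumptions for illustration; swap in the ones your site actually uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your site actually uses.
IGNORED_PARAMS = {"sessionid", "sort", "color", "size"}

def canonical_form(url):
    """Drop ignored query parameters and sort the rest so variants compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_variants(urls):
    """Return only the canonical URLs that have more than one crawled variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {canon: variants for canon, variants in groups.items() if len(variants) > 1}

# Hypothetical list of crawled URLs, e.g. exported from a site crawler.
crawled = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?sessionid=123",
    "https://example.com/shoes",
    "https://example.com/shoes?page=2",
]

for canon, variants in group_variants(crawled).items():
    print(canon, "has", len(variants), "variants")
```

Here, three of the four URLs lead to the same page, which means Googlebot could spend three fetches on content it only needs once.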
How to Get Googlebot to Crawl Your Site More Efficiently
You now know how Google defines crawl budget. And you know that links play a huge role in a healthy crawl. Now it’s time to find out what you can do to improve the crawl budget of your site.
Write A+ content (AKA not duplicate, not low quality). In case you needed yet another reminder, Google doesn’t like duplicate or spammy content. The content on your site should help answer a user’s query. So why waste Google’s time crawling the same content when you can expand your authority with a different answer?
Fix page errors. Pages with soft 404s and other errors can take up crawl budget meant for better pages. Identify which pages have errors and then work to fix them. For example, long redirect chains can slow down crawling (but you knew that already). Cut those down, and you’ll have a more efficient crawl that won’t get clogged up jumping from page to page.
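One way to find long redirect chains is to follow each redirect in a crawl export until it stops. The sketch below assumes you already have a mapping of source URLs to redirect targets (the `redirects` dictionary here is invented for illustration) and flags any chain with more than one hop, including loops.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:  # loop detected; stop following
            break
        seen.add(target)
    return chain

def long_chains(redirects, max_allowed_hops=1):
    """Return the chains that take more hops than max_allowed_hops."""
    report = {}
    for start in redirects:
        chain = redirect_chain(start, redirects)
        if len(chain) - 1 > max_allowed_hops:
            report[start] = chain
    return report

# Hypothetical crawl export: source URL -> redirect target.
redirects = {
    "/old-page": "/newer-page",
    "/newer-page": "/newest-page",
    "/newest-page": "/final-page",
    "/a": "/b",
    "/b": "/a",
}

for start, chain in long_chains(redirects).items():
    print(" -> ".join(chain))
```

Each flagged chain can usually be fixed by pointing the first URL straight at the final destination.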
Increase your site speed. Site speed directly affects crawl health, which is a factor in your site’s crawl rate limit. The faster your website responds, the more connections Googlebot can use to crawl it. And considering that many users will leave a website that doesn’t load within 3 seconds, you have plenty of reasons to make your site lightning-fast.
Budget Well and Benefit
Improving your crawl budget and having a good website are one and the same. Crawl budget optimization requires the same hard work that goes into making a good user experience. Since even small sites, not just large ones, should follow Google’s guidelines, everyone should be conscious of their crawl budget and how they can improve it.
Need help optimizing your crawl budget? Let us know how we can help, and we’ll get you on Googlebot’s good side.