
Semrush Backlinks Update 2021: Here’s How We Built a New Backlinks Tool

About a year and a half ago, we set a goal.

This goal was to build the biggest, fastest, and best backlink database for our clients, and to beat the main competitors in the market.

Now that we’ve reached our goal, we can’t wait for you to try it out for yourself!

Want to know exactly how we were able to build such a database?

All it took was approximately 16,722 cups of coffee paired with over 500 servers and 30,000 hours of work from our engineers and data scientists.

Sounds simple, right?

Just keep reading to see how much faster we are now.

Let’s talk about what’s new first, and then we’ll show you how we introduced these changes and which problems we solved.

With increased storage space and three times more crawlers, our backlink database now has the capacity to find, index, and grow even more. On average, we now crawl:

Before we get into the details of what has been improved, let’s review the basics of how our backlink database works.

First, we generate a queue of URLs that decides which pages will be crawled.

Then our crawlers go out onto the Internet and inspect these pages. Once our crawlers identify hyperlinks that point from these pages to another page on the Internet, they record that information.

Then there is temporary storage that keeps all this data for a while before dumping it into the public storage that any Semrush user can see within the tool.
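To make those four steps easier to picture, here is a minimal Python sketch of the same flow: a queue feeds the crawlers, the crawlers extract hyperlinks, the results land in a temporary buffer, and the buffer is flushed into user-visible storage. The URLs and storage names are invented for the example; this is a toy model, not Semrush's actual pipeline.

```python
# A toy model of the pipeline described above: queue -> crawl ->
# temporary storage -> public storage. Names here are invented.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Records the href of every <a> tag found on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

def crawl(queue, temp_storage):
    """Step 2: fetch each queued page and record (source, target) pairs."""
    while queue:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable pages are simply skipped
        extractor = LinkExtractor(url)
        extractor.feed(html)
        temp_storage.extend((url, target) for target in extractor.links)

def flush(temp_storage, public_storage):
    """Steps 3-4: dump the temporary buffer into user-visible storage."""
    public_storage.extend(temp_storage)
    temp_storage.clear()

queue = deque(["https://example.com/"])  # step 1: the crawl queue
temp_storage, public_storage = [], []
crawl(queue, temp_storage)
flush(temp_storage, public_storage)
```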

With our new release, we’ve virtually removed the temporary storage step, added three times more crawlers, and added a set of filters before the queue, so the whole process is much faster and more efficient.

To put it simply, there are too many pages to explore on the Internet.

Some need to be crawled more often, others not at all. Therefore, we use a queue that decides the order in which URLs will be crawled.
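Conceptually, this works like a priority queue: every URL gets a score, and the most urgent ones are crawled first. Here is a minimal sketch of that idea; the scores and URLs are invented for illustration, and the factors that would actually drive such scores are described below.

```python
# A toy crawl queue: each URL carries a priority score, and lower
# scores are popped (crawled) first. All scores and URLs are invented.
import heapq

crawl_queue = []
heapq.heappush(crawl_queue, (0.2, "https://news.example/fresh-story"))
heapq.heappush(crawl_queue, (0.9, "https://farm.example/near-duplicate"))
heapq.heappush(crawl_queue, (0.5, "https://blog.example/updated-post"))

while crawl_queue:
    score, url = heapq.heappop(crawl_queue)
    print(f"crawling {url} (score {score})")
# -> fresh-story is crawled first, near-duplicate last
```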

One common problem with this step is crawling too many similar, irrelevant URLs, which can lead people to see more spam and fewer unique referring domains.

So what did we do?

To optimize the queue, we added filters that prioritize unique content and higher-authority sites and protect against link farms. As a result, the system now finds more unique content and generates fewer reports with duplicate links.

Here are some examples of how it currently works (a sketch combining these rules into a single queue score follows the list):

  • To protect our queue from link farms, we check whether a large number of domains are coming from the same IP address. If we see too many domains on the same IP, their priority in the queue is lowered, allowing us to explore more domains from different IPs instead of getting stuck on a link farm.
  • To protect sites and avoid polluting our reports with similar links, we check whether too many URLs are coming from the same domain; if they are, they are not all crawled on the same day.
  • To ensure that we crawl updated pages as soon as possible, URLs that we have not crawled before are given higher priority.
  • Each page has its own hash code that helps us prioritize crawling of unique content.
  • We take into account how often new links are generated on the source page.
  • We take into account the authority score of a web page and a domain.
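To make these rules concrete, here is a hedged sketch of how factors like these could be folded into one queue score. Every field name, weight, and threshold below is an assumption made for illustration; the post does not disclose the actual formula.

```python
# Illustrative only: folds the factors listed above into one priority
# score (lower = crawled sooner). All weights/thresholds are invented.
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    url: str
    ip: str
    domain: str
    html: Optional[str]       # None if we have never crawled this URL
    new_links_per_day: float  # how often new links appear on the page
    authority: float          # 0-100 authority score of page/domain

def priority(c: Candidate, domains_per_ip: dict, urls_today: dict,
             seen_hashes: set) -> float:
    score = 1.0 - c.authority / 100         # higher authority -> sooner
    if domains_per_ip.get(c.ip, 0) > 50:    # link-farm guard: many domains, one IP
        score += 2.0
    if urls_today.get(c.domain, 0) > 1000:  # spread one domain over several days
        score += 1.0
    if c.html is None:                      # never-crawled URLs jump the queue
        score -= 0.5
    elif hashlib.sha1(c.html.encode()).hexdigest() in seen_hashes:
        score += 1.5                        # duplicate content hash -> deprioritize
    score -= min(c.new_links_per_day, 10) * 0.05  # fast-moving sources rise
    return score
```

A score like this could feed straight into the priority queue sketched earlier, with the lowest-scoring URLs crawled first.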

Queue Improvement Techniques

  • More than 10 different factors to filter out unnecessary links.
  • More unique, higher-quality pages thanks to new quality-control algorithms.

 
