Deep SEO: Using data to find the best links
Published 4th October 2014
Today, there are two big things I see going on across the Google landscape:
- Fewer and fewer links actually having a ranking effect, with the wrong links having a dampening effect on rankings.
- Sites which have obviously been flagged as trusted sites and get a free pass on links.
I have written this article to help unpack some big ideas and share the best of my knowledge with you.
Generally we know links (mostly) get rankings; what most people don’t know is which links work and how to find them. We look at the right data and metrics to use when making a decision on a link placement.
The other part of this article is looking into a fairly new phenomenon, where Google appears to give certain sites trust and authority (by their definition) and they rank off the back of virtually no links. We go into how to mould your site so you will hopefully become one of those trusted sites.
At this point, I’d like to reiterate a point I’ve consistently maintained throughout my 15 years in SEO. When it comes to event-driven digital marketing, especially SEO with its delayed response times to various actions based on an infinite number of variables and unknowns…
Link analysis: foundation concepts
PageRank reflects the number and quality of links pointing to your page, or as Wikipedia states: “PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.”
There is what I describe as “external PageRank”, i.e. links from third-party sites increasing PageRank on a page, and “internal PageRank”, which is the PageRank distribution across a site based on internal links.
For any given page, there will be a mix of PageRank coming from the internal links and from the external links pointing to that page. I talk about internal PageRank quite a lot, because it has a huge effect on the PageRank of a given page.
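To make the internal/external split concrete, here is a minimal sketch of the classic PageRank iteration run on a toy link graph. The page names, the damping factor of 0.85 and the number of iterations are illustrative assumptions; Google's production ranking is not public and is certainly more involved than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly across its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: one external link plus internal links on mysite.example
toy_links = {
    "external.example/article": ["mysite.example/guide"],
    "mysite.example/": ["mysite.example/guide", "mysite.example/about"],
    "mysite.example/about": ["mysite.example/"],
    "mysite.example/guide": ["mysite.example/"],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph the guide page gets both an internal link from the home page and an external link, so it ends up with more PageRank than the about page, which only has the internal link.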
TrustRank is effectively the trust Google has in the links to your site. Trust is determined by the number of links from sites known to be trusted.
Wikipedia says: “TrustRank method calls for selecting a small set of seed pages to be evaluated by an expert. Once the reputable seed pages are manually identified, a crawl extending outward from the seed set seeks out similarly reliable and trustworthy pages.
TrustRank’s reliability diminishes with increased distance between documents and the seed set.”
Or in other words, there is a seed set of sites which have been manually reviewed. Good sites tend to link to other good sites, and those good sites tend to link to other good sites, and so on.
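The idea that trust fades with distance from the seed set can be sketched in a few lines. This is a simplified distance-decay illustration, not the published TrustRank algorithm and not Majestic's Trust Flow; the site names and the decay of 0.5 per hop are assumptions made up for the example.

```python
from collections import deque


def propagate_trust(links, seeds, decay=0.5):
    """Give seeds a trust of 1.0 and multiply by `decay` for every hop away."""
    trust = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Breadth-first order means the first visit is the shortest path
            if target not in trust:
                trust[target] = trust[page] * decay
                queue.append(target)
    return trust


# Hypothetical link graph: seed -> a -> b -> c, plus a spam site linking to c
example_links = {
    "seed.example": ["a.example"],
    "a.example": ["b.example"],
    "b.example": ["c.example"],
    "spam.example": ["c.example"],
}
trust = propagate_trust(example_links, seeds=["seed.example"])
```

Sites that cannot be reached from a seed (like the spam site here) never receive any trust at all, no matter who they link to.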
Sites and Pages. Google thinks in pages and sees sites as groupings of pages. This is an important concept, because we often just think of getting links on good sites, rather than thinking of getting links on good pages on good sites. This is a small, but massively important difference. When looking for links, I typically think about sites as representing 25% of the overall importance, and pages 75%, i.e. I look for sites with nice metrics, but really look for pages on those sites with nice metrics.
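One way to put the 25/75 split into numbers is a simple weighted score. The formula below is my own illustration of that weighting rather than a fixed rule, and the metric values are invented.

```python
def link_score(site_metric, page_metric, site_weight=0.25, page_weight=0.75):
    """Blend a site-level and a page-level metric into one number."""
    return site_weight * site_metric + page_weight * page_metric


# Hypothetical prospects: (site Trust Flow, page Trust Flow)
weak_page_on_strong_site = link_score(40, 4)
strong_page_on_modest_site = link_score(20, 20)
```

With these made-up numbers the strong page on the modest site scores higher than the weak page on the strong site, which is the behaviour the 25/75 split is meant to capture.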
Relevance for links
This is a far bigger deal than ever before.
For many years, anchor text was seen as a good, easy way to determine the relevance of a link. But we (all of us in competitive SEO) messed this metric up because we spammed it. Now, based on my analysis, relevance seems to be determined by the relevance of the content surrounding the link and the relevance of that content to the content it’s linking to. It all makes perfect sense to me, because understanding content is core to Google being a great search engine. Algorithms like Panda and the fact you can search content by reading age validate this point.
Link analysis: practical framework
The next thing is to put this together in a practical framework so you can employ this information.
This is where the right tools come into play. Since I use Majestic, I’m going to refer to their metrics:
- Citation Flow (Majestic’s rough equivalent of PageRank)
- Trust Flow (Majestic’s rough equivalent of TrustRank)
- Topical Trust Flow (categorisation of the content on a page)
Note on Majestic: I have tried several sources of link data and for me Majestic is the one I trust the most.
SEMRUSH. I follow a simple theory with getting links. Does the domain I’m getting a link from rank anywhere? If it ranks, I assume any link from that site is a valuable one. To find this out I use a tool called SEMRUSH, which scrapes around 106 million key phrases and 71 million domains across 25 countries on Google and sometimes Bing, and so gives me large amounts of data about the phrases a site ranks for.
What I’m looking for in a good potential linking page:
- Domain Trust Flow above 17 (varies dependent on industry)
- Trust Flow on a page being above 4
- Citation Flow being similar to the Trust Flow number
- Topical Trust Flow categorisation for the linking page being similar to the target page
- Whether the domain ranks on Google
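The checklist above translates fairly directly into a filter. This is a sketch under assumptions: the `Prospect` fields are hypothetical names, not Majestic or SEMRUSH export columns. "Similar" Citation Flow and Trust Flow is read as neither being more than twice the other, and topical similarity is read as an exact category match. Adjust those to taste.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    url: str
    domain_trust_flow: int
    page_trust_flow: int
    page_citation_flow: int
    topic: str          # top Topical Trust Flow category of the linking page
    domain_ranks: bool  # True if the domain appears in the ranking data


def is_good_prospect(p, target_topic, min_domain_tf=17, min_page_tf=4, max_ratio=2.0):
    """Apply the five checks; the thresholds vary by industry."""
    if p.domain_trust_flow <= min_domain_tf:
        return False
    if p.page_trust_flow <= min_page_tf:
        return False
    # Assumption: "similar" means neither flow is more than max_ratio times the other
    low, high = sorted((p.page_trust_flow, p.page_citation_flow))
    if low == 0 or high / low > max_ratio:
        return False
    if p.topic != target_topic:
        return False
    return p.domain_ranks


# Invented examples
good = Prospect("https://news.example/casino-guide", 30, 12, 15, "Games/Gambling", True)
lopsided = Prospect("https://links.example/page", 30, 5, 40, "Games/Gambling", True)
print(is_good_prospect(good, "Games/Gambling"), is_good_prospect(lopsided, "Games/Gambling"))
```

The second prospect fails because its Citation Flow is far higher than its Trust Flow, which usually points to lots of links but little trust behind them.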
From here, it’s a case of sourcing the raw data and appending these metrics to the domains and pages you will review.
The next step is to get sources of sites that may be relevant to your link building campaign. Data Sources:
- Majestic backlink reports
- Citation Labs Link Prospector (scrape Google)
Citation Labs: Scraping Google
You can do it the hard way and type in a keyword, then manually search the results and spend your time copying and pasting into a document, or you can use a cheap and really effective tool like Citation Labs Link Prospector. It is a bulk scraper of Google search results (see Figure 1).
With this you can apply advanced filters:
- Guest Posting: Discover blogs that allow guest posts and offer your writing services.
- Links Pages: Find the resource and links pages that will add a link to your site.
- Content Promoters: Find writers who are likely to cover your story, or repost your infographics and articles.
- Reviews: Look for people who review products or services in your space.
- Giveaways: Offer goods and services for use in contests.
- Donations: Be charitable and earn links through sponsorship opportunities.
- Commenting: Find the posts relevant to your key phrases. Join the conversation, and link to your site.
- Expert Interviews: Identify domain experts you can interview, or pitch your own expertise.
- Directories: Identify quality directories to help you get your site listed quickly.
- Forums: Add value to the conversation and build brand equity.
- Blogs: Pinpoint the blogs that cover your industry, and are likely to post about your site.
- Professional Organizations: Locate trades and professional organizations you can join.
- Research (Content): Find top tips for writing great content.
- Custom: Use any amount of research phrases you like.
This tool will typically bring in 4,000 lines in a single report. The challenge is dealing with the amount of data, filtering it and making it useful.
Majestic back-link analysis
There is a simple idea here. If a site ranks in a competitive segment, then it’s because it has the right links, or it has been tagged as a trusted authority site, or a mixture of both. Therefore looking at the best backlinks for a domain makes a lot of sense. If they linked to a competitor site similar to yours, they may link to you.
It’s a case of picking domains you think are similar to yours which rank well, and analysing their whole link set. The key is to focus on page-level metrics rather than domain-level metrics.
The process is simple enough:
- You download all the best backlinks from a domain
- Filter out anything that is no follow
- Filter out anything that has a page Trust Flow of zero
- Add in domain ranking data from SEMRUSH and filter out domains which are not on their database
- Filter out irrelevant sites, e.g. a fashion site linking to a casino site
And you will have a distilled list of pages/link targets you can go for. Then you manually review them and set them up for pitching.
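The filtering steps can be sketched as a small pipeline. The row keys (`source_url`, `nofollow`, `page_trust_flow`, `topic`) are hypothetical names for columns you would map from your own exports, and `ranking_domains` stands in for whatever list of ranking domains you pull from SEMRUSH. The sketch treats a page Trust Flow of zero or below as failing, which is an assumption.

```python
from urllib.parse import urlparse


def distil_backlinks(rows, ranking_domains, relevant_topics):
    """Keep only followed, trusted, ranking and topically relevant linking pages."""
    kept = []
    for row in rows:
        domain = urlparse(row["source_url"]).netloc.lower()
        if row["nofollow"]:
            continue  # no follow links are filtered out
        if row["page_trust_flow"] <= 0:
            continue  # no page-level trust
        if domain not in ranking_domains:
            continue  # domain has no presence in the ranking data
        if row["topic"] not in relevant_topics:
            continue  # irrelevant, e.g. a fashion page linking to a casino
        kept.append(row)
    return kept


# Invented sample data
sample_rows = [
    {"source_url": "https://casino-news.example/review", "nofollow": False, "page_trust_flow": 12, "topic": "Gambling"},
    {"source_url": "https://forum.example/thread/1", "nofollow": True, "page_trust_flow": 20, "topic": "Gambling"},
    {"source_url": "https://style.example/shoes", "nofollow": False, "page_trust_flow": 15, "topic": "Fashion"},
    {"source_url": "https://unknown.example/post", "nofollow": False, "page_trust_flow": 9, "topic": "Gambling"},
    {"source_url": "https://casino-news.example/empty", "nofollow": False, "page_trust_flow": 0, "topic": "Gambling"},
]
targets = distil_backlinks(
    sample_rows,
    ranking_domains={"casino-news.example", "forum.example", "style.example"},
    relevant_topics={"Gambling"},
)
```

Only the first row survives here, which mirrors what you see with real data: most of a link profile drops out once every filter is applied.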
The key is finding the right sites to analyse. For this, I use the 90 SEO DataGrabber. I add a bunch of keywords for a particular keyword territory I’m interested in, and then I run the tool. I then have a list of sites in order of their total search visibility across all the keywords I’m analysing. From that I can pick out certain sites for further analysis.
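I can't show the internals of that tool, but the general idea of ordering sites by total search visibility can be sketched as follows. Scoring each ranking as 1/position is an arbitrary assumption of mine, and the keywords and domains are invented.

```python
from collections import defaultdict


def search_visibility(serps):
    """Sum a position-based score per domain across all keywords.

    serps maps each keyword to its ranked list of domains (position 1 first).
    """
    scores = defaultdict(float)
    for keyword, domains in serps.items():
        for position, domain in enumerate(domains, start=1):
            scores[domain] += 1.0 / position
    # Highest total visibility first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for three keywords in one territory
example_serps = {
    "keyword one": ["a.example", "b.example"],
    "keyword two": ["a.example", "c.example"],
    "keyword three": ["b.example", "a.example"],
}
visibility = search_visibility(example_serps)
```

The domains at the top of the resulting list are the ones worth pulling backlink reports for.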
You will see as you process a number of these domains, how few backlinks actually have real value. The important thing is to keep away from domains which have been flagged by Google as “untrusted”, because they will bring your rankings down. The domains act as a damper on all your other links. This is where SEMRUSH data is invaluable, because it will guide you to sites that actually have a presence on Google rankings and by (my) definition are trusted. But there is more…what about those sites that just rank on what seems to be no links? What is going on there? To give you more insight, let’s look at one such domain.
Sites which rank on no links
There is a new category of sites which are the trusted authority sites. They are the ones which seem to rank despite having no links of any quality.
There are possibly two explanations:
- the domain has huge numbers of redirected links, which are not picked up by tools like Majestic
- the domain actually doesn’t have many links at all, but has been ‘blessed’ by Google and ranks irrespective of its link base.
thebigfreechiplist.com is one such “blessed” site, and its rankings are climbing…why? (see Figure 2)
On the SEMRUSH database it ranks for 634 keyphrases like:
|Keyphrase|Position|
|---|---|
|no deposit casino|2|
|online casino bonuses|3|
|best casino bonuses|3|
|online casino bonuses|4|
|no deposit bonus|2|
|all slots casino|3|
|online casino bonus|11|
The site metrics are interesting:
- External Backlinks: 59,283
- Referring Domains: 754
- Referring IPs: 720
- Referring Subnets: 633
- Educational Ref. Backlinks: 28
- Governmental Ref. Backlinks: 60
- Educational Ref. Domains: 4
- Governmental Ref. Domains: 1
- Indexed URLs: 60,046
Something like 99.2% of its links are “no follow”, and as you know, no follow links don’t pass PageRank, and so don’t count towards Google rankings.
To analyse the data, I use these filters:
- Is not a no follow link
- Links have a page trust flow of over 1
- Has at least one ranking keyword on any one of the 20 SEMRUSH country databases
- Is still a live link
There are only nine linking domains left (see Figure 3).
And if we assume relevance is a factor, when we dig into the data some more, it’s clear that there are no good links here. There are some forum posts, a bit of comment spam and a few content-for-links posts, but nothing that logically justifies its really strong rankings.
My only logical conclusion is that this site and sites like it have been given a “trusted” classification on Google, and so have started to rank even without links.
In the 2014 leaked Google manual reviewer guidelines, there is far more weight on the trust of a domain. Here are examples of how Google determines this:
2.7 Website Reputation
A website’s reputation is based on the experience of real users, as well as the opinion of people who are experts in the topic of the website.
2.7.4 How to Search for Reputation Information
Look for articles, reviews, forum posts, discussions, etc. written by people about the website.
4.4 Positive Reputation
Reputation is an important consideration when using the High rating.
You can view the full manual reviewer guidelines at: http://canuckseo.com/blog_pdfs/google-quality-rater-guidelines-2014.pdf
Why thebigfreechiplist.com? Why did they get to be trusted?
When you look at the site, whilst it’s basic (sorry if it’s your site) and it’s loaded with hundreds of bonus codes, it is actually a really useful resource for anyone looking for a bonus code. You can see how much effort has gone into actually finding bonus codes for the site, despite its appearance (see Figure 4).
According to the Google reviewer guidelines:
“Keep in mind that there are ‘expert’ websites of all types, even gossip websites, fashion websites, humor websites, forum and Q&A pages, etc.” and “High quality pages are designed to achieve their purpose: they are well organized, use space effectively, and have a functional overall layout.” and “Some pages are ‘prettier’ or more professional looking than others, but you should not rate based on how ‘nice’ the page looks. A page can be very functional and achieve its purpose without being ‘pretty’.”
This describes thebigfreechiplist.com very well. I believe they have been given a “trusted” rating and have sailed up the rankings because of it.
Closing thoughts: if you are lucky enough to be trusted by Google, then keep doing what you were doing, i.e. making yourself a genuinely trusted source of information whilst making money for yourself. You will save money on links because you don’t need many, and you can probably enjoy life in your beachfront villa. For the rest of us, it’s a case of weighing up whether it’s easier to get more links and make the content as converting as possible, or go down the trusted route.
Step 1: Have a real purpose for the site which has benefit to the user
Step 2: Get the right links and start to rise in the rankings
Step 3: If you’re lucky, get ‘blessed’ by Google, ease off on link-building and make loads of money like thebigfreechiplist.com is.