Jul 3, 2011

Keyword Difficulty vs. Size of Domain

You might wonder whether Google prefers to rank big domains for competitive keywords. This question can be answered by taking a close look at the relationship between keyword difficulty and the size of the domain. The result matters because it leads to a deeper understanding of how Google's ranking algorithm treats competitive keywords for both small and large domains.

Definition of the Factors

To quantify keyword difficulty, you can start with search volume: in most cases, the higher the keyword search volume, the more competitive the keyword. Beyond search volume, you can also look at the number of pages indexed in Google that contain the exact term. The higher the number of competing pages, the more competitive the keyword tends to be.

Therefore, to give a mathematical definition, keyword difficulty can be approximated by the following formula:

Keyword Difficulty = Keyword Search Volume (Exact match) x Google competing pages

By this definition, a keyword with a very high search volume and many competing pages is a difficult keyword for which to rank in Google.
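As a minimal sketch, the formula above can be written as a small Python function. The function name and the sample numbers are made up for illustration and are not taken from the study's data.

```python
def keyword_difficulty(exact_search_volume: int, competing_pages: int) -> int:
    """Keyword Difficulty = exact-match search volume x Google competing pages."""
    if exact_search_volume < 0 or competing_pages < 0:
        raise ValueError("both inputs must be non-negative")
    return exact_search_volume * competing_pages


# Hypothetical inputs, chosen only to show how the score scales.
easy = keyword_difficulty(1_000, 50_000)
hard = keyword_difficulty(1_000_000, 90_000_000)
print(easy, hard)
```

Because the score is a plain product, it grows very quickly when both factors are large, which is why difficult keywords end up with values several orders of magnitude above easy ones.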

The size of the domain can be measured by the number of its URLs indexed in Google, which the site: operator reports:

site:domainname

For example, if the ranking URL is http://www.skydelinfotech.com, the site operator query will be:

site:skydelinfotech.com
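If you have many ranking URLs to process, building the site: query by hand gets tedious. The sketch below derives it from a URL with Python's standard library. The helper name site_query is hypothetical, and stripping a leading "www." is an assumption based on the example above.

```python
from urllib.parse import urlparse


def site_query(ranking_url: str) -> str:
    """Turn a ranking URL into a Google site: operator query."""
    host = urlparse(ranking_url.strip()).hostname
    if not host:
        raise ValueError("URL must include a scheme, e.g. http://")
    # Assumption: drop a leading "www." as in the article's example.
    if host.startswith("www."):
        host = host[len("www."):]
    return f"site:{host}"


print(site_query("http://www.skydelinfotech.com"))  # site:skydelinfotech.com
```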

Objective and Methodology of the Study

The main objective is to determine whether Google prefers to rank big websites for highly competitive terms. The study needs a statistically valid sample size, so a sample of 30 keywords is selected. The 30 randomly selected keywords break down as follows:

~10 highly competitive keywords
~10 medium competitive keywords
~10 very easy (non-competitive) keywords

Below is the procedure for gathering the data:

Step 1: Make a final list of the selected keywords -- a total of around 30 keywords based on the sampling distribution, with varying difficulty. Put the data in a table format for easier organization. Use the Google Keyword Tool and the Google search engine itself to get the raw data for keyword difficulty computation.
Step 2: Using Google.com, get the top 10 ranking URLs for each of those keywords on the list.
Step 3: Get the number of indexed URLs for each of those top 10 ranking domains, starting from position 1 and going through to position 10, removing duplicate domains and outliers (refer to spreadsheet for details; link provided in a later section). Tabulate the data in the Excel spreadsheet.
Step 4: Compute the average number of indexed URLs for each keyword from position 1 through position 10 domains.
Step 5: Make a correlation plot of average number of indexed URLs vs. keyword difficulty.
Step 6: Run a regression analysis on the data.
Step 7: Draw conclusions and make recommendations.
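
Steps 3 and 4 can be sketched in a few lines of Python, assuming the top 10 results for one keyword are already available as (domain, indexed URL count) pairs in ranking order. The function name and the sample data are hypothetical, and outlier removal is left out of this sketch.

```python
from statistics import mean


def average_indexed_urls(top10):
    """Average the indexed-URL counts of the ranking domains,
    counting each domain once even if it ranks several times."""
    seen = set()
    counts = []
    for domain, indexed in top10:
        if domain in seen:
            continue  # duplicate domain: skip, as in Step 3
        seen.add(domain)
        counts.append(indexed)
    return mean(counts)


# Made-up results for one keyword, purely for illustration.
results = [("a.example", 100), ("b.example", 300), ("a.example", 100)]
print(average_indexed_urls(results))
```

Repeating this for each of the roughly 30 keywords gives one average per keyword, which is the value plotted against keyword difficulty in Step 5.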
Keyword Selection List and Keyword Difficulty Data

Data was gathered according to the methodology set out in the previous section, and keyword difficulty was then calculated. Below you'll see a screen shot of the data gathered:

You've probably noticed that the highly competitive keywords (the first ten) have very high keyword difficulty values as compared to medium and easy keywords.

Google Top 10 Ranking URL Data for the Selected Keywords

After the top 10 domain name data is gathered from Google for the selected keywords, the number of indexed pages for each domain is gathered as well. The number of indexed pages serves as a proxy for the size of the domain.

However, to ensure the accuracy of the results, outliers are removed after the data has been gathered. Outliers are defined as data points lying beyond the following threshold:
Outlier threshold = average + standard deviation

In layman's terms, an outlier represents noise or a special case that is not considered "normally occurring." After removal of the outliers, the final average data will be used for the correlation plot. Below is a screen shot of the data table containing an outlier:
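
The outlier rule described above might be sketched as follows. Two points here are assumptions, since the article does not spell them out: "outside" is read as "greater than average + standard deviation," and the sample standard deviation is used (comparable to Excel's STDEV). The function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(values):
    """Drop values above average + standard deviation."""
    if len(values) < 2:
        return list(values)  # stdev needs at least two points
    # Assumption: sample standard deviation, one-sided upper cutoff.
    threshold = mean(values) + stdev(values)
    return [v for v in values if v <= threshold]


# Made-up indexed-URL counts; the last one is far larger than the rest.
counts = [10, 12, 11, 9, 1000]
kept = remove_outliers(counts)
print(kept, mean(kept))
```

With this cutoff, a single very large domain among otherwise small ones is dropped before the average is re-computed, so it does not dominate the result for that keyword.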

Data Tabulation of the Overall Results

Once the outlier has been removed and the average numbers of indexed pages are re-computed, a new column will be added to the original data table labeled "Average indexed URLs of Top Ranking domains."

For example, in the above screen shot, for the competitive keyword "books," the average number of indexed URLs across the top 10 ranking domains for "books" (position 1 to position 10) is 7,571,714. This number of indexed pages signifies "very large ranking domains" for Google's top 10 positions.

Since you are interested in finding out whether Google prefers to rank "big" sites for "difficult" keywords, you will need to make a correlation plot between these two factors/variables (keyword difficulty and average number of indexed URLs of top ranking domains).

All of the data sets, tables and computations used in this article are available in the spreadsheet for your own evaluation purposes.

Correlation and Regression Analysis of the Data

Finally, once the data table has been finalized, you are ready to make a correlation plot. Plotting the two variables with the MS Excel chart feature generates a chart that looks like the one below:

It is surprising to see that there exists a strong correlation (78%) between "average indexed URLs of Top Ranking domains" and "Keyword difficulty." In short, big domains are the ones ranking for difficult keywords. Stating this in another way, "Google typically prefers to rank big websites for highly competitive terms."

For easier keywords (low keyword difficulty value), Google returns smaller websites with a lower number of indexed URLs. However, for highly difficult keywords like "LCD" and "SEO," Google prefers ranking big websites.
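
If you would rather check the correlation and regression outside of Excel, the sketch below computes a Pearson correlation coefficient and a least-squares trend line in plain Python. The article does not state whether its 78% figure is r or r squared, so the sketch prints both. The function names and the data points are made up for illustration and are not the study's data.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and centered sums needed by both functions."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values: x = keyword difficulty, y = average indexed URLs.
difficulty = [1.0, 2.0, 3.0, 4.0]
avg_indexed = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(difficulty, avg_indexed)
print(r, r ** 2, least_squares_line(difficulty, avg_indexed))
```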

Search Engine Optimization | Anand Web Info