17 Jun 2013

Google Spell Check and How it Works: Types of “Refined” Results

Google’s spell check is a very old feature that Google has been constantly improving. Understanding how it works is essential for a better understanding of the keyword research process, as well as reputation and brand management.

So what does Google consider a misspelling?

Google is not using any standard grammar rules or dictionaries. Its misspelling identification process is entirely based on user behavior. For many years we’ve been wondering how much the actual web index (versus search volume) makes a difference:

Google figures out possible misspellings and their likely correct spellings by using words it finds while searching the web and processing user queries.

The article seems to imply that Google uses both the web index and a query-processing algorithm to decide whether the word you type needs a refinement. Our own experience suggests that it seldom has anything to do with the search index (note that in the search below, the correct spelling has about 107,000 results):

netmeg vs nutmeg

(It used to be even worse though)

One very old patent (it dates back to 2006 and was covered by SEO by the Sea) suggests Google is focusing primarily on query processing: Query revision using known highly-ranked queries.

This patent introduces query rank (QR), which is calculated from the frequency of a query (QF) and user satisfaction (US) with that query.

  • Frequency of the query = how often this exact query is searched
  • User satisfaction = how rarely the query gets revised (i.e. the less often a query is refined, the higher the user satisfaction)

Query rank
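To make the relationship concrete, here is a minimal sketch of how a score like QR could be computed. The exact formula is not given here, so the multiplication of frequency by satisfaction, the "share of unrevised searches" proxy for satisfaction, and all the numbers below are my own assumptions for illustration, not Google's actual method.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    searches: int   # QF: how often this exact query was searched
    revisions: int  # how often users refined the query afterwards


def user_satisfaction(stats: QueryStats) -> float:
    """Assumed proxy for US: share of searches NOT followed by a revision."""
    if stats.searches == 0:
        return 0.0
    return 1.0 - stats.revisions / stats.searches


def query_rank(stats: QueryStats) -> float:
    """Assumed combination: QR = QF * US (the real formula may differ)."""
    return stats.searches * user_satisfaction(stats)


# Invented counts, for illustration only.
nutmeg = QueryStats("nutmeg", searches=10_000, revisions=500)
netmeg = QueryStats("netmeg", searches=200, revisions=180)

for stats in (nutmeg, netmeg):
    print(stats.query, round(query_rank(stats), 1))
```

With these made-up counts, the rarely searched and frequently revised spelling ends up with a far lower score than the popular one, which is the behavior the bullets above describe.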

Based on the query rank, there are two types of queries:

  • Known highly-ranked queries (KHRQs): Those with high QR
  • Nearby queries (NQs)

“Nearby” queries are those that have a low QR but bear some similarity to a KHRQ. The similarity may be semantic (lexical), syntactic, behavioral or any combination of the above.
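A rough sketch of that split, assuming purely lexical similarity: the scores, both thresholds and the function names are hypothetical, and Python's `difflib.SequenceMatcher` stands in for whatever similarity measure is really used.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical query-rank scores; real values are not public.
QUERY_RANKS = {"nutmeg": 9500.0, "netmeg": 20.0, "paprika": 7000.0}

KHRQ_MIN_RANK = 1000.0  # assumed cut-off for a "high" QR
MIN_SIMILARITY = 0.8    # assumed cut-off for counting as "nearby"


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def suggest_revision(query: str) -> Optional[str]:
    """Return the most similar KHRQ for a low-QR query, or None."""
    if QUERY_RANKS.get(query, 0.0) >= KHRQ_MIN_RANK:
        return None  # already a known highly-ranked query
    khrqs = [q for q, qr in QUERY_RANKS.items() if qr >= KHRQ_MIN_RANK]
    best = max(khrqs, key=lambda k: lexical_similarity(query, k), default=None)
    if best is not None and lexical_similarity(query, best) >= MIN_SIMILARITY:
        return best  # the query is "nearby" this KHRQ
    return None


print(suggest_revision("netmeg"))  # low QR and close to a KHRQ
print(suggest_revision("nutmeg"))  # high QR, so no revision is suggested
```

A real system would presumably also weigh semantic and behavioral signals; the lexical-only version is just the easiest one to show in a few lines.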

This implies that:

The lower the QR, the more aggressive Google is in trying to fix your “typo”.

That brings us to another part of this article:

Types of Query Refinements We Are Observing Now

(I made up the names of the types; there may be better ways to name those…)

1. “Did You Mean” Refinement

How it looks in search: Google will return results for your probably-misspelled query, but it will suggest a refinement at the top.

This one occurs with the most frequently misspelled words. (This is one of the limitations of Google’s QR algorithm: because those words are so frequently searched as misspelled, their QR is higher than it probably should be.)

"Did You Mean" Refinement

Sometimes Google will even force “correctly” spelled results to the top of the search results, even though you never clicked the suggested “Did you mean” spelling:

"Did You Mean" Refinement - mixed

2. Immediate Refinement

How it looks in search: Sometimes Google doesn’t give the original query any chance at all: it searches for the “correct” spelling instead (while showing at the top how the query was refined). These queries may have the lowest QR.

Immediate Refinement

 

Why care?

The most obvious application of this knowledge is in choosing a business name. If your chosen name has too low a query rank, you’d probably better stay away from it, as it will be hard to break through.

Another interesting and tricky application: finding misspelled keywords with a QR high enough that they are treated as a proper (or almost proper) spelling can be a strong signal that the query has a huge search volume and a low refinement frequency (it may be worth pursuing those!).
