Implementing Google's "Did You Mean" Feature In Java - InfoQ

Maybe your like

InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

View an example

Enter your e-mail address Select your country Select a country I consent to InfoQ.com handling my data as explained in this Privacy Notice.

We protect your privacy.

Live Webinar and Q&A: Designing a Control Plane for Cloud Infrastructure: Governance, State, and Continuous Orchestration (Apr 16, 2026) Save Your Seat

InfoQ Homepage Articles Implementing Google's "Did you mean" Feature In Java

Java AI Agent Swarm Patterns (Webinar March 26th) Implementing Google's "Did you mean" Feature In Java

Apr 09, 2010 5 min read

Leandro Moreira

Write for InfoQ

Feed your curiosity. Help 550k+ global senior developers each month stay ahead.Get in touch Like

Reading list

Introduction

Search engine users often misspell search terms for a variety of reasons varying from keyboard problems (a key not working), to unfamiliar international names (for example Sigmund Freud), changing one letter accidentally (Sinpsons), or adding one more letter (Frusciaante). Google's search engine includes a feature now familiar to many web users - "Did you mean" - which provides alternative suggestions when you may have misspelled a search term.

Text search is common in a variety of applications including many eCommerce web sites where it is typically used to allow customers to search the product catalogue for available items. Here a user misspelling a term could directly result in lost sales. For example suppose you run an online store that sells DVDs. A fan of the actor Arnold Schwarzenegger enters your site to buy all the DVDs starring the actor. His first action is to enter the name of Schwarzenegger into the search field, but unfortunately he misspells the name, typing "Arnold Swuazeneger". The search returns no results and so he points his browser to another store and tries again there.

One solution for this would be an implementation of the "Did you mean" feature with some domain knowledge built in such that it could return "Did you mean Arnold Schwarzenegger". In this article we will explore a simple implementation of this feature in Java.

Edit distance algorithms

Related Sponsors

From Observability to Actionability: Designing Agentic AI for Autonomous SRE on AWS

Related Sponsor

Boost AWS effectiveness with Agentic AI — unify telemetry, reduce noise, and resolve incidents faster. Learn More.

In information theory and computer science, the edit distance between two strings of characters is the number of operations required to transform one of them into the other. There are several different ways to define an edit distance, and there are a number of algorithms which may be used to calculate its value using these various definitions. The major ones include Levenshtein, Jaro-Winkler and n-gram. The Jaro-Winkler is a variant of the Jaro distance metric and is mainly used in the area of record linkage (duplicate detection). In the Levenshtein algorithm the distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance approach in 1965. The n-gram model is a probability model for predicting the next item in a sequence and is used in various areas of statistical natural language processing and genetic sequence analysis.

For the purposes of this article, rather than implementing the algorithms from scratch, we can use a pre-existing implementation courtesy of the SpellChecker project in the Apache Lucene sandbox.

In simple terms the Lucene SpellChecker implementation comprises a main class called SpellChecker, which uses a Directory, Dictionary and one of three StringDistance algorithms. The class SpellChecker uses the Strategy Pattern (GoF) to allow you to pick which StringDistance algorithm to use, with built-in implementations for JaroWinklerDistance, LevenshteinDistance and NGramDistance, defaulting to LevenshteinDistance. You can also adjust the accuracy of the results using a value between 0 and 1 defaulted to 0.5. The accuracy setting has a significant impact on results and you will probably find you want to set a value higher than the default, but setting it too high can cause no results to be returned. Using my dictionary, for example, an accuracy figure of 0.749 produced the best results. The Dictionary interface has two direct implementations and also allows you to implement your own.

For our "Did you mean" implementation we search a subset of words in the dictionary, finding words that are 'near' based on the chosen string distance algorithm, and where that distance matches with your pre-defined accuracy setting. Picture 1 shows an overview class diagram of Lucene SpellChecker.

Example

Below is a simple code example. To run it you will need Java5 or later, lucene-core-3.0.0.jar, lucene-spellchecker-3.0.0.jar and a flat file called dictionary.txt (simple text file with words separated by lines - an example of this is below).

//directory creation //spell checker instantiation final SpellChecker sp = new SpellChecker(directory); //index the dictionary sp.indexDictionary(new PlainTextDictionary(new File("dictionary.txt"))); //your 'wrong' search String search = "Arnold Swuazeneger"; //number of suggestions final int suggestionNumber = 5; //get the suggested words String[] suggestions = sp.suggestSimilar(search, suggestionNumber); //show the results. System.out.println("Your Term:" + search); for (String word : suggestions) { System.out.println("Did you mean:" + word); } //creating another misspelled search search = "bava"; suggestions = sp.suggestSimilar(search, suggestionNumber); System.out.println("Your Term:" + search); for (String word : suggestions) { System.out.println("Did you mean:" + word); }

Given the following dictionary.txt file: Seth MacFarlane Arnold Schwarzenegger Scarlett Johansson Rodrigo Santoro java lava bullet

The program will output: Your Term: arnold swuazeneger Did you mean: Arnold Schwarzenegger Your Term: bava Did you mean: java Did you mean: lava Did you mean: bullet

Benchmarking

To get an idea of performance we ran the examples 15 times on a machine with the following configuration and took an average:

O.S.: Windows XP Professional SP3 Processor: Intel Core 2 Duo E6550 @2.33GHz RAM: 1.96GB

Tests

Test	Word Size	Dictionary Size	Accuracy	Algorithm	Indexing time	Suggesting time
T1	17	5	0,5	Levenshtein	73,0136214	25,036049
T2	17	81000	0,5	Levenshtein	3402,293693	27,7293112
T3	17	5	0,5	JaroWinkler	69,53269	24,232477
T4	17	81000	0,5	JaroWinkler	3356,016059	26,287849
T5	17	81000	0,5	NGram	3353,633583	26,580123
T6	17	81000	0,9	Levenshtein	3325,310027	26,96378
T7	17	81000	0,3	Levenshtein	3408,072786	24,723142
T8	4	81000	0,67	Levenshtein	3328,584784	25,363586
T9	28	81000	0,67	Levenshtein	3354,5943	31,284672

Graph

Where: Word size is the number of letters in a word. Dictionary is the number of lines in a file. Accuracy is the accuracy set by setAccuracy method.

On this basis of these tests we can conclude that the accuracy figure has little impact on the time. Word length is significant however - a word with four characters gets results in around 2ms. Of the three algorithms tested NGramDistance is slightly faster than the others. In my tests I also found that the JaroWinklerDistance algorithm produced the least accurate results.

Conclusion

As you can see, using a pre-existing algorithm makes the implementation details for "Did you mean" surprisingly straightforward. In a real-world scenario however much of the effort will be building up a dictionary with suitable domain specific terms to support your application.

References

http://lucene.apache.org/java/docs/
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
http://archsofty.blogspot.com/2009/12/adicione-o-recurso-voce-quis-dizer-nas.html
http://lucene.apache.org/java/3_0_0/api/contrib-spellchecker/index.html
http://en.wikipedia.org/wiki/Edit_distance
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Jaro-Winkler_distance

About the Author

Leandro R. Moreira has been a software developer since 2002, currently working as a Java developer for a government agency in Brazil. He also contributes to a number of open-source projects including Jpcsp. He has published an article in the Mundo Java - issue 30 "World OO: Implementing an Internal DS" and maintains a blog in his native Portuguese.

Rate this Article

Adoption Style Author Contacted

This content is in the Java topic

The InfoQ Newsletter

A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

Enter your e-mail address Select your country Select a country I consent to InfoQ.com handling my data as explained in this Privacy Notice.

We protect your privacy.

Tag » What Did You Mean By That

Implementing Google's "Did You Mean" Feature In Java - InfoQ

InfoQ Software Architects' Newsletter

Write for InfoQ

Introduction

Edit distance algorithms

Related Sponsors

From Observability to Actionability: Designing Agentic AI for Autonomous SRE on AWS

Related Sponsor

Example

Benchmarking

Conclusion

References

About the Author

Rate this Article

This content is in the Java topic

Related Topics:

Related Editorial

Popular across InfoQ

GitHub Data Shows AI Tools Creating "Convenience Loops" That Reshape Developer Language Choices

Google Publishes Scaling Principles for Agentic Architectures

AWS Launches Agent Plugins to Automate Cloud Deployment

New Research Reassesses the Value of AGENTS.md Files for AI Coding

From Central Control to Team Autonomy: Rethinking Infrastructure Delivery

[Video Podcast] AI Autonomy Is Redefining Architecture: Boundaries Now Matter Most

The InfoQ Newsletter

Definition What Do You Mean By That - Gymglish

When Should I Use 'What Did You Mean By This' Vs. 'What Do ... - Quora

Druski "What Do You Mean By That?" | Know Your Meme

What Do You Mean Vs What Did You Mean - WordReference Forums

What Did You Mean Or What You Meant? - English Forums

What Do You Mean By That? What Is Meant By That? Do These ... - Italki

What Did You Say? What Do You Mean?: An Illustrated Guide To ...

What Did You Mean Or What Did You Meant? - TextRanch

Did You Mean Or Did You Meant? - TextRanch

[Grammar] - Do You Mean/Did You Mean

What Did You Mean | English Examples In Context - ru

What Do You Mean By That? - Kevin Huhn

Did You Mean - Translation Into Spanish - Examples English

Contact