Searching parts of words with elasticsearch

I figured out how to search parts of words with elasticsearch. If you are not familiar with the terms used in the documentation, this can be quite a challenge. I will show you the key parts of the configuration you need.

First you need to know a little about the search engine. When you store documents, the text is indexed. This is done by tokenizing the text, basically chopping it up, and then running those chops through one or more filters. When searching, the same happens to the search terms (chop, filter, then use for the search). The tokenization and filtering for indexing and searching can be configured independently. Most of the time these configurations should be kept the same, as illustrated by this case that uses a lowercase filter:

  • text to be indexed: ‘Lorem Ipsum’
  • tokenized: ‘Lorem’, ‘Ipsum’
  • filtered: ‘lorem’, ‘ipsum’

If you search for this text (using the lowercase filter again):

  • text you search for: ‘Lorem’
  • tokenized: ‘Lorem’
  • filtered: ‘lorem’

Notice that both the indexed text and the search term are lowercase and will thus match. If you did not configure a lowercase filter at search time, this search would not return a result.
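The whole pipeline is simple enough to sketch in a few lines of Python (a toy stand-in for the real analyzer, assuming a whitespace tokenizer plus a lowercase filter):

```python
def analyze(text):
    # Tokenize: chop the text into words; filter: lowercase every token.
    return [token.lower() for token in text.split()]

indexed = analyze("Lorem Ipsum")   # ['lorem', 'ipsum']
query = analyze("Lorem")           # ['lorem']
print(query[0] in indexed)         # True: the search term matches
```

Because the same pipeline runs at index time and at search time, the tokens line up and the match succeeds.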

When you want to search for parts of words, however, the filters used at index time differ from the filters used at search time. Here is the elasticsearch configuration for creating an index that defines the filters to be used for indexing and for searching:
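The original configuration listing did not survive; what follows is a minimal sketch consistent with the text. The analyzer names (index_analyzer, search_analyzer), the tokenizer, and the gram sizes are assumptions; only the filter name translation and its type ngram come from the description. This is the settings body of a create-index request:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "translation"]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      },
      "filter": {
        "translation": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  }
}
```

Here (1) is the translation entry in the index_analyzer’s filter list, (2) is the search_analyzer, whose filter list deliberately contains no ngram filter, and (3) is the declaration of the translation filter itself.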

The numbers correspond to the critical parts of the configuration (explained below).

  1. At index time, we use the translation filter, which is of type ngram and is declared at (3).
  2. Notice that for searching we do NOT use the ngram filter.
  3. The declaration of the ngram filter.
The way the ngram filter works is that it explodes the token stream into more tokens. The token ‘lorem’, for example, is exploded into these tokens:
  • l
  • lo
  • lor
  • lore
  • lorem
All of these are indexed, making it possible to find any part of the word lorem. The configuration parameters min_gram and max_gram indicate the shortest and longest token to be extracted from the input token. A min_gram of 2 and a max_gram of 4 would result in:
  • lo
  • lor
  • lore
(Strictly speaking, this prefix-only output is what the edge_ngram type produces; the plain ngram type also emits inner substrings such as ‘or’, ‘rem’ and ‘orem’.)
This shortens the time required for indexing and keeps the index smaller, but limits the parts of the words you are able to find.
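To see what min_gram and max_gram do, here is a plain n-gram generator in Python (a toy illustration, not elasticsearch code). Note that the ngram filter type emits every substring in the length range, while the prefix-only lists above correspond to the edge_ngram variant:

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token with length between min_gram and max_gram."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)      # each allowed length
            for i in range(len(token) - n + 1)]          # each start position

print(ngrams("lorem", 2, 4))
# ['lo', 'or', 're', 'em', 'lor', 'ore', 'rem', 'lore', 'orem']
```

Raising min_gram or lowering max_gram shrinks this list, which is exactly the indexing-time/size trade-off described above.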
To use the defined filters, you would add this configuration to your elasticsearch mapping:
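The mapping listing is likewise missing; here is a sketch under the same assumptions (the type name article, the field title, and the analyzer names are made up), in the elasticsearch 1.x mapping syntax of the time:

```json
{
  "article": {
    "properties": {
      "title": {
        "type": "string",
        "index_analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      }
    }
  }
}
```

(In later elasticsearch versions the index-time setting is simply called analyzer.)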


8 thoughts on “Searching parts of words with elasticsearch”

  1. Pingback: elasticsearch minihowto | private blog

  2. Hi Gert,

    Thanks for the article, clarifies a bit the ES mess.
    One question though: which type of query do you use if you want to perform a full text search?

I am trying to do something similar without specifying one or many fields (i.e. falling back on the “_all” field), but apparently running queries on “_all” doesn’t take mappings into account at all. Any idea on this side? (Apart from listing all the possible fields in a ‘match’ query or ‘query string’ query.)


  3. Pingback: How to: Beginner's guide to ElasticSearch | SevenNet

  4. Can you please provide me with a code snippet for .NET elasticsearch tokenization?
    This is my current code:

    var node = new Uri("http://localhost:9200");

    var settings = new ConnectionSettings(node, defaultIndex: "study");

    var indexSettings = new IndexSettings();
    var custonAnalyzer = new CustomAnalyzer();

    custonAnalyzer.Tokenizer = "mynGram";
    custonAnalyzer.Filter = new List<string> { "lowercase" };
    indexSettings.Analysis.Analyzers.Add("mynGram", custonAnalyzer);
    indexSettings.Analysis.Tokenizers.Add("mynGram", new NGramTokenizer
    {
        MaxGram = 10,
        MinGram = 2
    });

    var client = new ElasticClient(settings);
    client.CreateIndex("study", c => c
        .Analysis(descriptor => descriptor
            .TokenFilters(bases => bases
                .Add("name_ngrams", new NgramTokenFilter
                {
                    MaxGram = 11,
                    MinGram = 3
                })
                .Add("punctuation_filter", new Nest.StopTokenFilter
                {
                    Stopwords = new List<string> { "." }
                }))
            .Analyzers(bases => bases
                .Add("partial_match", partialMatch)
                .Add("partial_match_no_punctuation", partialMatchNoPunctuation)
                .Add("full_match", fullMatch))));

    SqlConnection connection;
    SqlCommand command;
    string sql = null;
    SqlDataReader dataReader;

    string connectionString = "Data Source=localhost;Initial Catalog=XXX;User ID=XX;Password=XXX";

    sql = "select * from XXX";
    connection = new SqlConnection(connectionString);
    connection.Open();

    command = new SqlCommand(sql, connection);
    dataReader = command.ExecuteReader();
    while (dataReader.Read())
    {
        var study = new Study
        {
            Id = int.Parse(dataReader["Id"].ToString()),
            StudyName = dataReader["StudyName"].ToString(),
        };
        var studies = client.Index(study);
    }

    var s = new SearchDescriptor<Study>().Query(t => t.Term(c => c.StudyName, "test"));

    var results = client.Search<Study>(s);

  5. Pingback: How-to: Beginner's guide to ElasticSearch #dev #programming #it | IT Info
