Searching parts of words with elasticsearch

I figured out how to search parts of words with elasticsearch. If you are not familiar with the terms used in the documentation, this can be quite a challenge. I will show you the key parts of the configuration you need.

First you need to know a little about the search engine. When you store documents, the text is indexed. This is done by tokenizing the text (basically chopping it up into tokens) and then running those tokens through one or more filters. When searching, the same happens to the search terms: they are chopped up, filtered and then used for the search. The tokenization and filtering for indexing and searching can be configured independently. Most of the time these configurations should be kept the same, as illustrated by this case that uses a lowercase filter:

  • text to be indexed: ‘Lorem Ipsum’
  • tokenized: ‘Lorem’, ‘Ipsum’
  • filtered: ‘lorem’, ‘ipsum’

If you search for this text (using the lowercase filter again):

  • text you search for: ‘Lorem’
  • tokenized: ‘Lorem’
  • filtered: ‘lorem’

Notice that both the indexed text and the search term are lowercase and will thus match. If you did not configure a lowercase filter for searching, the search term would stay ‘Lorem’ and this search would not return a result.
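
The example above can be sketched in a few lines of JavaScript. This is a toy model of an analysis chain, not how elasticsearch is implemented; the function names (tokenize, lowercaseFilter, analyze, matches) are made up for this illustration.

```javascript
// Tokenizer: split the text on anything that is not a letter or digit.
function tokenize(text) {
  return text.split(/[^A-Za-z0-9]+/).filter(function (token) {
    return token.length > 0;
  });
}

// Filter: lowercase every token.
function lowercaseFilter(tokens) {
  return tokens.map(function (token) {
    return token.toLowerCase();
  });
}

// Analyze: run the tokenizer, then apply the given filters in order.
function analyze(text, filters) {
  return filters.reduce(function (tokens, filter) {
    return filter(tokens);
  }, tokenize(text));
}

// A search matches when every analyzed search term is among the indexed tokens.
function matches(indexedTokens, searchTokens) {
  return searchTokens.every(function (term) {
    return indexedTokens.indexOf(term) !== -1;
  });
}

var indexed = analyze("Lorem Ipsum", [lowercaseFilter]); // ["lorem", "ipsum"]

// Same filter at search time: 'Lorem' becomes 'lorem' and matches.
console.log(matches(indexed, analyze("Lorem", [lowercaseFilter]))); // true

// No lowercase filter at search time: 'Lorem' stays as it is and does not match.
console.log(matches(indexed, analyze("Lorem", []))); // false
```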

When you want to search for parts of words, however, the filters used at index time differ from the filters used at search time. Here is the elasticsearch configuration for creating an index that defines the filters to be used for indexing and for searching:
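
A minimal sketch of what such index settings could look like, written as a JavaScript object that can be serialized to the JSON body used when creating the index. Only the translation filter of type ngram is taken from the text. Everything else is an assumption: the analyzer names index_ngram_analyzer and search_term_analyzer, the standard tokenizer, the lowercase filter, the min_gram and max_gram values, and the max_ngram_diff setting (which recent elasticsearch versions need when max_gram minus min_gram is larger than 1). The comments (1), (2) and (3) mark the critical parts.

```javascript
var indexSettings = {
  settings: {
    index: {
      // Assumption: needed on recent versions because max_gram - min_gram > 1.
      max_ngram_diff: 4
    },
    analysis: {
      analyzer: {
        // Hypothetical name: the analyzer used at index time.
        index_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "translation"] // (1) ngram filter at index time
        },
        // Hypothetical name: the analyzer used at search time.
        search_term_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase"] // (2) no ngram filter for searching
        }
      },
      filter: {
        translation: { // (3) declaration of the ngram filter
          type: "ngram",
          min_gram: 1, // assumed value
          max_gram: 5  // assumed value
        }
      }
    }
  }
};

// Print the JSON body for the create-index request.
console.log(JSON.stringify(indexSettings, null, 2));
```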

The numbers correspond to the critical parts of the configuration (explained below).

  1. At index time, we use the translation filter, which is of type ngram and is declared at (3).
  2. Notice that for the search we DO NOT use the ngram filter.
  3. The declaration of the ngram filter.

The way the ngram filter works is that it explodes the token stream into more tokens. The token ‘lorem’, for example, is exploded into tokens like these:

  • l
  • lo
  • lor
  • lore
  • lorem

The filter also emits the substrings that start later in the word, such as ‘o’, ‘or’ and ‘rem’. All of these are indexed, making it possible to find any part of the word lorem. The configuration parameters min_gram and max_gram indicate the length of the shortest and longest token to be extracted from the input token. A min_gram of 2 and a max_gram of 4 would reduce the tokens that start at the first letter to:

  • lo
  • lor
  • lore

This shortens the time required for indexing, but limits the parts of the words you are able to find.

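
To make the effect of min_gram and max_gram concrete, here is a small JavaScript sketch. It only mimics the idea and is not elasticsearch code; the function names ngrams and edgeNgrams are made up. The lists above show the tokens that start at the first letter (which is what an edge ngram filter emits), while an ngram filter also emits the substrings that start later in the word.

```javascript
// All substrings of `token` whose length lies between minGram and maxGram.
function ngrams(token, minGram, maxGram) {
  var grams = [];
  for (var start = 0; start < token.length; start++) {
    for (var size = minGram; size <= maxGram && start + size <= token.length; size++) {
      grams.push(token.slice(start, start + size));
    }
  }
  return grams;
}

// Only the substrings that start at the first character.
function edgeNgrams(token, minGram, maxGram) {
  var grams = [];
  for (var size = minGram; size <= maxGram && size <= token.length; size++) {
    grams.push(token.slice(0, size));
  }
  return grams;
}

console.log(edgeNgrams("lorem", 2, 4));
// [ 'lo', 'lor', 'lore' ]

console.log(ngrams("lorem", 2, 4));
// [ 'lo', 'lor', 'lore', 'or', 'ore', 'orem', 're', 'rem', 'em' ]
```
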
To use the defined filters for a field, you would use this configuration in your elasticsearch mapping:
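
A minimal sketch of such a mapping, again as a JavaScript object to be serialized to JSON. It assumes a recent elasticsearch version (typeless mappings with the analyzer and search_analyzer parameters), a hypothetical field called title, and analyzers named index_ngram_analyzer and search_term_analyzer that are defined in the index settings. None of these names come from the original configuration.

```javascript
var indexMapping = {
  mappings: {
    properties: {
      // Hypothetical field name.
      title: {
        type: "text",
        // Used when documents are indexed: includes the ngram filter.
        analyzer: "index_ngram_analyzer",
        // Used for the search terms: does not include the ngram filter.
        search_analyzer: "search_term_analyzer"
      }
    }
  }
};

// Print the JSON fragment for the mapping.
console.log(JSON.stringify(indexMapping, null, 2));
```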

IT Quality – Less tools, more collaboration

Ok, so we all understand that building great software requires a lot more than just a bunch of super skilled nerds producing superb code (see my previous post on the subject). It’s all about teamwork – about dedicated people from all disciplines collaborating efficiently throughout the entire software project lifecycle.

But how is this best achieved when each discipline uses different tools?


Want versus need

The job of a requirements engineer is to find out what people need as opposed to what they want. Customers, however, don’t buy what they need; they buy what they want.

If your goal is to sell software solutions that solve the real needs of your customer (and it should be, if you are in it for a long-term relationship), make sure you are armed for the battle against the want. You start with a disadvantage, but if you resist, ultimately everyone wins.

5 tips to find a great software marketer

Software marketing must be the easiest thing to do. Everything from the offering to the actual delivery takes place online. And we’ve got awesome tools to reach out to people today. So why exactly is it so hard to find a good software marketer?

Here is a list of 5 things to consider when looking for a software marketer:

  • Start marketing before you hire the first marketer. Start using social media, blog, go to events, organize events, be creative. Fail often, it is the only way to learn what you are looking for in the person you are going to hire.
  • Don’t look for a social media rock star (only). Social media often is about shouting and it’s very hard to turn that into marketing.
  • Look for someone who understands your product or service. THAT is where the marketing starts: a good product. Success, including the success of the use of social media, is about the product.
  • Look for a facilitator. But also check if they are willing to help stuff the boxes if goodies need to be sent out or to make coffee for that important breakfast session that was organized. Someone that goes the extra mile (not to be confused with extra hours).
  • Don’t expect the marketer to do the marketing. What is that? We are looking for somebody to do the work, aren’t we? Well, she can’t. At least, not on her own. You are not excused when it comes to marketing tasks. Every single person on your team is part of the marketing.

Did these tips work for us? I can’t tell: we are still looking. If you are interested or know somebody who is, let me know!

If you need a bigger list, go see Seth Godin’s list.

Do user stories eliminate use cases?

Well… I say NO!

A user story can be a perfect short description of the use case, but it does not contain all the details you need for implementing, testing and maintaining the software.

A use case describes the what of the user story in much greater detail. And as a requirements engineer, when writing a use case description, you are forced to make it thoroughly logical – with no open ends.

So… user stories are great to uncover the who and the why, but you still very much need use cases to describe the what.

Keep your global namespace clean

A good piece of advice from our JavaScript tech day today: keep your global namespace clean. There are two options to make that happen: 1) good intentions, 2) unit test it.

I have a lot of good intentions, but unit tests seem to work better. Here is the Jasmine code:

describe("General tests", function() {
  describe("globals", function() {
    it("should expose only a certain amount of variables", function() {
      // The only global variables the application is allowed to create.
      var expectedGlobals = ["util", "KeyCode", "logger", "l", "langur"];
      // detectGlobals comes from the gist mentioned below.
      var exposedGlobals = detectGlobals.analyze();
      // _ is an Underscore-style utility library.
      var expectedButNotDetected = _.difference(expectedGlobals, exposedGlobals);
      var detectedButNotExpected = _.difference(exposedGlobals, expectedGlobals);
      expect(_.size(exposedGlobals)).toEqual(_.size(expectedGlobals));
      // Nothing is missing and nothing extra has leaked into the global namespace.
      expect(expectedButNotDetected.length).toEqual(0);
      expect(detectedButNotExpected.length).toEqual(0);
    });
  });
});

Note that you have to run this test at the very end of your test suite to detect any globals that are created during the execution of the tests.

Go here for the gist with the JavaScript that does the actual work (that gist was heavily inspired by this code; credits go to kangax (Juriy Zaytsev)).
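
To give an idea of what such a detector has to do, here is a minimal sketch. It is not the code from the gist: createGlobalsDetector is a made-up name, and the approach (remember the property names of the global object up front, then report what was added later) is only one simple way to do it. It assumes globalThis is available, as in modern browsers and Node.js.

```javascript
// Create a detector for the given root object (normally the global object).
function createGlobalsDetector(root) {
  // Remember which property names exist before the application code runs.
  var baseline = Object.getOwnPropertyNames(root);
  return {
    // Return the property names that were added after the baseline was taken.
    analyze: function () {
      return Object.getOwnPropertyNames(root).filter(function (name) {
        return baseline.indexOf(name) === -1;
      });
    }
  };
}

var detector = createGlobalsDetector(globalThis);

// Simulate a variable that leaks into the global namespace.
globalThis.leakedByAccident = 42;

var leaked = detector.analyze();
console.log(leaked); // includes 'leakedByAccident'

// Clean up the simulated leak.
delete globalThis.leakedByAccident;
```

Because the baseline is taken when the detector is created, the detector has to be created before any application code is loaded.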

Techdays

At Avisi, because we like to stay sharp and have fun, we organize what we call ‘techdays’ every month.

The goal is to create a relaxed setting where we can all learn about and/or share newfound technologies, techniques or methods. We either have speakers coming in or one of us does the presentation, and every member of the team can submit ideas for potential techdays.


Characteristics of a proper build

What are the characteristics of a proper build?

  • It is built on a build server, not your laptop (although now and then I break this rule, with no tangible damage so far).
  • The source code is then labeled.
  • You can branch off the source of a build.
  • The build is published somewhere, a network disk, ivy / maven repository, whatever. Just not your laptop.
  • You keep a record of what is in this particular build: a bill of materials with changes, known errors and installation instructions.
  • You do something with it, install it on a dev server at least.

The last one sounds pretty logical, but we have to give it good thought. It is the essence of the build. It is the sole reason to build. Someone is going to use that very build. It is not just the last box that you tick before you start working on the next build.