lunr.js

Simple full-text search in your browser

Get Started

Open your browser's developer tools on this page to follow along.

Set up an index for your notes:

  var index = lunr(function () {
    this.field('title', {boost: 10})
    this.field('body')
    this.ref('id')
  })

Add documents to your index:

  index.add({
    id: 1,
    title: 'Foo',
    body: 'Foo foo foo!'
  })

  index.add({
    id: 2,
    title: 'Bar',
    body: 'Bar bar bar!'
  })

Search your documents:

  index.search('foo')
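
Each result returned by search has a ref, matching the ref you configured for your documents, and a relevance score. As a small sketch of mapping results back onto the documents above (the documents lookup object here is your own bookkeeping, not something lunr provides):

  var documents = {
    1: {id: 1, title: 'Foo', body: 'Foo foo foo!'},
    2: {id: 2, title: 'Bar', body: 'Bar bar bar!'}
  }

  index.search('foo').forEach(function (result) {
    // result.ref is the document's id, result.score its relevance
    console.log(result.score, documents[result.ref].title)
  })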

About

lunr.js is a simple full-text search engine for your client-side applications. It is designed to be small, yet full featured, enabling you to provide a great search experience without the need for external, server-side search services.

lunr.js has no external dependencies, although it does require a modern browser with ES5 support. In older browsers you can use an ES5 shim, such as augment.js, to provide any missing JavaScript functionality.

Pipeline

Every document and search query that enters lunr is passed through a text processing pipeline. The pipeline is simply a stack of functions that perform some processing on the text. Pipeline functions act on the text one token at a time, and what they return is passed to the next function in the pipeline.

By default lunr adds a stop word filter and a stemmer to the pipeline. You can also add your own processors or remove the default ones depending on your requirements. The stemmer currently used is an English-language stemmer, which could be replaced with a non-English stemmer if required, or a metaphone processor could be added.

  var index = lunr(function () {
    this.pipeline.add(function (token, tokenIndex, tokens) {
      // text processing in here
    })

    this.pipeline.after(lunr.stopWordFilter, function (token, tokenIndex, tokens) {
      // text processing in here
    })
  })
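
The stemmer swap mentioned above can be sketched as follows, assuming your version of lunr exposes pipeline.remove; frenchStemmer is a hypothetical stand-in for a real non-English stemmer, which lunr does not bundle:

  var frenchStemmer = function (token) {
    // hypothetical: a real implementation would stem French tokens
    return token
  }

  var index = lunr(function () {
    this.field('title', {boost: 10})
    this.field('body')
    this.ref('id')

    // take out the default English stemmer and use the replacement instead
    this.pipeline.remove(lunr.stemmer)
    this.pipeline.add(frenchStemmer)
  })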

Functions in the pipeline are called with three arguments: the current token being processed, the index of that token in the array of tokens, and the whole list of tokens for the document or query being processed. This enables simple unigram processing of tokens as well as more sophisticated n-gram processing.

The function should return the processed version of the token, which will in turn be passed to the next function in the pipeline. Returning undefined will prevent any further processing of the token, and that token will not make it into the index.
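
For instance, a hypothetical pipeline function that drops very short tokens and normalises the rest might look like this:

  var shortWordFilter = function (token, tokenIndex, tokens) {
    // returning undefined drops the token; it never reaches the index
    if (token.length < 3) return undefined

    // the returned value is passed on to the next function in the pipeline
    return token.toLowerCase()
  }

  var index = lunr(function () {
    this.field('title', {boost: 10})
    this.field('body')
    this.ref('id')

    this.pipeline.add(shortWordFilter)
  })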

Tokenization

Tokenization is how lunr converts documents and searches into individual tokens, ready to be run through the text processing pipeline and entered or looked up in the index.

The default tokenizer included with lunr is designed to handle general English text well, although application- or language-specific tokenizers can be used instead.
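
The default tokenizer is exposed as lunr.tokenizer; roughly speaking, it splits the input on whitespace and lowercases each token, so for example:

  lunr.tokenizer('Simple Full Text Search')
  // => ['simple', 'full', 'text', 'search']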

Stemming

Stemming increases the recall of the search index by reducing related words down to their stem, so that non-exact search terms still match relevant documents. For example, 'search', 'searching' and 'searched' are all reduced to the stem 'search'.

lunr automatically includes a stemmer based on Martin Porter's stemming algorithm.
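
The stemmer is exposed as lunr.stemmer, so its effect on a single token can be checked directly in the console:

  lunr.stemmer('searching')  // => 'search'
  lunr.stemmer('searched')   // => 'search'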

Stop words

Stop words are words that are very common and are not useful in differentiating between documents. These are automatically removed by lunr. This helps to reduce the size of the index and improve search speed and accuracy.

The default stop word filter contains a large list of very common English words. For best results, a corpus-specific stop word filter can also be added to the pipeline. The search algorithm already penalises more common words, but keeping them out of the index entirely reduces index size and improves search speed.
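
A corpus-specific filter of the kind described above can be written as an ordinary pipeline function and added after the default stop word filter; a sketch, with the word list standing in for whatever terms are ubiquitous in your particular corpus:

  var corpusStopWords = ['javascript', 'browser', 'web']

  var corpusStopWordFilter = function (token) {
    // returning undefined drops the token, keeping it out of the index
    if (corpusStopWords.indexOf(token) === -1) return token
  }

  var index = lunr(function () {
    this.field('title', {boost: 10})
    this.field('body')
    this.ref('id')

    this.pipeline.after(lunr.stopWordFilter, corpusStopWordFilter)
  })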