Node Tutorial Part 18: Full Text Search

28 Mar 2011 | By Alex Young | Tags server node tutorials lmawa nodepad

Welcome to part 18 of Let’s Make a Web App, a tutorial series about building a web app with Node. This series will walk you through the major areas you’ll need to face when building your own applications. These tutorials are tagged with lmawa.


Full Text Search

Given that we’re making a document-based system, wouldn’t it be nice to have full text search? Mongo doesn’t explicitly support full text search, but simply saving a list of keywords with each document and querying against it will work.

The list of keywords can be modeled as an array of strings:

Document = new Schema({
  'title': { type: String, index: true },
  'data': String,
  'tags': [String],
  'keywords': [String],
  'user_id': ObjectId
});
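Storing keywords as an array works because MongoDB compares a scalar query value against each element of an array field, so a query like { keywords: 'node' } finds any document whose keywords array contains 'node'. The sketch below mimics that matching rule in plain JavaScript over an in-memory array. findByKeyword and the sample documents are hypothetical and only illustrate the behaviour, not how Mongo actually evaluates queries.

```javascript
// Hypothetical in-memory stand-in for Document.find({ keywords: term }).
// MongoDB matches a scalar against every element of an array field;
// Array.prototype.indexOf reproduces that for plain strings.
function findByKeyword(documents, term) {
  return documents.filter(function(doc) {
    return Array.isArray(doc.keywords) && doc.keywords.indexOf(term) !== -1;
  });
}

var docs = [
  { title: 'Shopping', keywords: ['milk', 'eggs', 'bread'] },
  { title: 'Notes', keywords: ['node', 'mongo', 'express'] },
  { title: 'Empty', keywords: [] }
];

console.log(findByKeyword(docs, 'mongo').map(function(d) { return d.title; }));
// → [ 'Notes' ]
```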

Next we need to extract the keywords from the document’s content. Mongoose middleware is a good fit, because a pre('save') hook runs every time a document is saved:

Document.pre('save', function(next) {
  this.keywords = extractKeywords(this.data);
  next();
});
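The next() call is what hands control on: Mongoose runs each pre('save') hook in turn and only writes the document once the last hook has called next() (passing an error to next aborts the save). The sketch below imitates that sequencing with a hypothetical runPreSave helper and a simplified stand-in for extractKeywords. It is not Mongoose's implementation, just an illustration of how the hook chain behaves.

```javascript
// Hypothetical sketch of a serial pre-save hook chain (not Mongoose's code).
// Each hook receives `next`; calling next(err) aborts, next() continues.
function runPreSave(doc, hooks, done) {
  var i = 0;
  function next(err) {
    if (err) return done(err);
    if (i === hooks.length) return done(null, doc); // all hooks ran: "save" now
    var hook = hooks[i++];
    hook.call(doc, next); // `this` inside the hook is the document
  }
  next();
}

// Simplified stand-in for the tutorial's extractKeywords (no de-duplication).
function extractKeywords(text) {
  if (!text) return [];
  return text.split(/\s+/).filter(function(v) { return v.length > 2; });
}

var doc = { title: 'Example', data: 'full text search with mongo' };

runPreSave(doc, [
  function(next) {
    this.keywords = extractKeywords(this.data);
    next();
  }
], function(err, saved) {
  if (err) throw err;
  console.log(saved.keywords); // → [ 'full', 'text', 'search', 'with', 'mongo' ]
});
```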

The question is, how should extractKeywords work? I’d usually rely on a full text indexer or a stemming library, but a simple function will be easier for now. Let’s use the following algorithm:

  • Split on white space
  • Find words longer than two characters
  • Remove duplicates

Implementing this is fairly easy with the filter iterator:

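function extractKeywords(text) {
  // Documents without content get no keywords
  if (!text) return [];

  return text.
    split(/\s+/).                                   // split on runs of white space
    filter(function(v) { return v.length > 2; }).   // keep words longer than two characters
    // keep a word only at the last position where it appears, dropping duplicates
    filter(function(v, i, a) { return a.lastIndexOf(v) === i; });
}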

The regular expression matches runs of white space, and the last filter removes duplicates by keeping a word only when its current index is the last position at which it appears in the array. This is a quick and dirty solution; you’d want to spend more time on it for a production system.
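As one example of that extra effort, the variant below lowercases the text and splits on anything that isn't an ASCII letter or digit, so "Mongo," and "mongo" produce the same keyword, and it keeps the first occurrence of each word rather than the last. extractKeywordsNormalized is a hypothetical name and these rules are an assumption, not part of Nodepad. Because the [a-z0-9] class only covers ASCII, accented letters would split words, and search terms would need the same lowercasing before querying.

```javascript
// Hypothetical variant of extractKeywords (not part of Nodepad):
// lowercase, split on runs of non-alphanumeric characters, keep words
// longer than two characters, and drop duplicates.
function extractKeywordsNormalized(text) {
  if (!text) return [];

  var seen = {};
  return text.
    toLowerCase().
    split(/[^a-z0-9]+/).
    filter(function(v) { return v.length > 2; }).
    filter(function(v) {
      // hasOwnProperty guards against words such as "constructor"
      // that already exist on Object.prototype.
      if (Object.prototype.hasOwnProperty.call(seen, v)) return false;
      seen[v] = true; // first occurrence wins
      return true;
    });
}

console.log(extractKeywordsNormalized('Mongo, mongo and MONGO: a Node app.'));
// → [ 'mongo', 'and', 'node', 'app' ]
```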

Express Action

I’ve added routes for /search and /documents/titles. The titles route just returns a list of every title and its ID, because the existing document index action returns whole documents, content included.

app.get('/documents/titles.json', loadUser, function(req, res) {
  Document.find({ user_id: req.currentUser.id },
                [], { sort: ['title', 'descending'] },
                function(err, documents) {
    res.send(documents.map(function(d) {
      return { title: d.title, _id: d._id };
    }));
  });
});

// Search
app.post('/search.:format?', loadUser, function(req, res) {
  Document.find({ user_id: req.currentUser.id, keywords: req.body.s ? req.body.s : null },
                [], { sort: ['title', 'descending'] },
                function(err, documents) {
    switch (req.params.format) {
      case 'json':
        res.send(documents.map(function(d) {
          return { title: d.title, _id: d._id };
        }));
      break;
    }
  });
});

The search action expects a POST request with an s parameter holding the search term.
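As written, the query compares the whole s string with individual keywords, so searching for two words at once won't match anything, since every stored keyword is a single word. One possible extension, sketched here as a hypothetical buildSearchQuery helper, splits the search string into terms and uses MongoDB's $all operator, which matches documents whose keywords array contains every listed term. The helper only builds the query object; wiring it into the route is an assumption left to the reader.

```javascript
// Hypothetical helper (not in Nodepad): turn a search string into a Mongo
// query that requires every term to appear in the document's keywords.
// Returns null when the string contains no usable terms.
function buildSearchQuery(userId, s) {
  var terms = String(s || '').
    split(/\s+/).
    filter(function(v) { return v.length > 2; }); // same length rule as extractKeywords

  if (terms.length === 0) return null;

  return { user_id: userId, keywords: { $all: terms } };
}

console.log(JSON.stringify(buildSearchQuery('4d8f', 'full text')));
// → {"user_id":"4d8f","keywords":{"$all":["full","text"]}}
```

In the route, the returned object would take the place of the inline query passed to Document.find, with an early empty response when the helper returns null.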

Interface

I’ve added a search bar on the top right. It’s a small bit of Jade in views/layout.jade:

#container
  #header
    ul
      li
        h1
          a(href='/') #{nameAndVersion(appName, version)}
      - if (typeof currentUser !== 'undefined')
        li.right
          a#logout(href='/sessions') Log Out
        li.right
          form.search(action='/search')
            input(name='s', value='Search')

With some Stylus:

form.users input[type=submit]
  margin-left 140px
  clear both

form.search
  margin-right 10px

#show-all
  color medium-grey

Now, this is where I start wishing we were already using Backbone.js. I’ve created a function for inserting documents into the list, and one to call the search method:

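// Search bar
// Appends a link for each result to #document-list.
// Titles are inserted with .text() so any HTML in a title is escaped.
function showDocuments(results) {
  for (var i = 0; i < results.length; i++) {
    var link = $('<a></a>')
      .attr('id', 'document-title-' + results[i]._id)
      .attr('href', '/documents/' + results[i]._id)
      .text(results[i].title);
    $('#document-list').append($('<li></li>').append(link));
  }
}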

function search(value) {
  $.post('/search.json', { s: value }, function(results) {
    $('#document-list').html('');
    $('#document-list').append('<li><a id="show-all" href="#">Show All</a></li>');

    if (results.length === 0) {
      alert('No results found');
    } else {
      showDocuments(results);
    }
  }, 'json');
}

The following handlers show and hide the “Search” placeholder text in the input, run a search when the form is submitted, and handle clicks on the “Show All” link:

$('input[name="s"]').focus(function() {
  var element = $(this);
  if (element.val() === 'Search')
    element.val('');
});

$('input[name="s"]').blur(function() {
  var element = $(this);
  if (element.val().length === 0)
    element.val('Search');
});

$('form.search').submit(function(e) {
  search($('input[name="s"]').val());
  e.preventDefault();
});

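// .live() also catches #show-all links inserted later (removed in jQuery 1.9; newer code uses .on())
$('#show-all').live('click', function(e) {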
  $.get('/documents/titles.json', function(results) {
    $('#document-list').html('');
    showDocuments(results);
    if (results.length > 0)
      $('#document-title-' + results[0]._id).click();
  });
  e.preventDefault();
});

The search function also inserts a “Show All” link at the top of the list. It’s styled slightly differently (see #show-all in the Stylus above) and calls /documents/titles.json to fetch all the titles.

Indexing

I’ve added a Jake task, run with jake index, that forces every document to be saved again. It’s just a convenience for readers, so that documents created before this change get their keywords generated.
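All the task needs to do is call save() on every document so the pre('save') middleware regenerates keywords. The sketch below shows one way to save documents one after another and report when they're all done. reindexAll and FakeDoc are hypothetical; a real task would presumably load the documents with Document.find and run inside Jake.

```javascript
// Hypothetical sketch: save each document in turn so its pre('save')
// hook runs, then call done(err, count). Documents only need a
// save(callback) method, like Mongoose models.
function reindexAll(docs, done) {
  var i = 0;
  function saveNext(err) {
    if (err) return done(err);               // stop at the first failure
    if (i === docs.length) return done(null, docs.length);
    docs[i++].save(saveNext);                // next save starts after this one finishes
  }
  saveNext();
}

// Fake documents standing in for Mongoose models.
function FakeDoc(data) { this.data = data; }
FakeDoc.prototype.save = function(callback) {
  // Stand-in for the keyword middleware.
  this.keywords = this.data.split(/\s+/).filter(function(v) { return v.length > 2; });
  setImmediate(callback); // save() is asynchronous in Mongoose
};

reindexAll([new FakeDoc('hello world'), new FakeDoc('full text search')], function(err, count) {
  if (err) throw err;
  console.log('Reindexed ' + count + ' documents'); // → Reindexed 2 documents
});
```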

Conclusion

Full text search with Mongo is fairly easy, but this implementation is far from perfect. The keyword extraction algorithm could do with stemming, and the interface isn’t as intuitive as I’d like.

This week’s code was commit ceb9b32.

