
Author: Rod Vagg

databases node leveldb

LevelDB and Node: Getting Up and Running


This is the second article in a three-part series on LevelDB and how it can be used in Node.

Our first article covered the basics of LevelDB and its internals. If you haven't already read it, you are encouraged to do so, as we will be building upon this knowledge as we introduce the Node interface in this article.


There are two primary libraries for using LevelDB in Node: LevelDOWN and LevelUP.

LevelDOWN is a pure C++ interface between Node.js and LevelDB. Its API provides limited sugar and is mostly a straight-forward mapping of LevelDB's operations into JavaScript. All I/O operations in LevelDOWN are asynchronous and take advantage of LevelDB's thread-safe nature to parallelise reads and writes.

LevelUP is the library that the majority of people will use to interface with LevelDB in Node. It wraps LevelDOWN to provide a more Node.js-style interface. Its API provides more sugar than LevelDOWN, with features such as optional arguments and deferred-till-open operations (i.e. if you begin operating on a database that is in the process of being opened, the operations will be queued until the open is complete).

LevelUP exposes iterators as Node.js-style object streams. A LevelUP ReadStream can be used to read sequential entries, forward or reverse, to and from any key.

LevelUP handles JSON and other encoding types for you. For example, when operating on a LevelUP instance with JSON value-encoding, you simply pass in your objects for writes and they are serialised for you. Likewise, when you read them, they are deserialised and passed back in their original form.

A simple LevelUP example

var levelup = require('levelup')

// open a data store
var db = levelup('/tmp/dprk.db')

// a simple Put operation
db.put('name', 'Kim Jong-un', function (err) {

  // a Batch operation made up of 3 Puts
  db.batch([
      { type: 'put', key: 'spouse', value: 'Ri Sol-ju' }
    , { type: 'put', key: 'dob', value: '8 January 1983' }
    , { type: 'put', key: 'occupation', value: 'Clown' }
  ], function (err) {

    // read the whole store as a stream and print each entry to stdout
    db.createReadStream()
      .on('data', console.log)
      .on('close', function () {
        // finished reading, close the store
        db.close()
      })
  })
})

Execute this application and you'll end up with this output:

{ key: 'dob', value: '8 January 1983' }
{ key: 'name', value: 'Kim Jong-un' }
{ key: 'occupation', value: 'Clown' }
{ key: 'spouse', value: 'Ri Sol-ju' }

Basic operations


There are two ways to create a new LevelDB store, or open an existing one:

levelup('/path/to/database', function (err, db) {
  /* use `db` */
})

// or

var db = levelup('/path/to/database')
/* use `db` */

The first version is a more standard Node-style async instantiation. You only start using the db when LevelDB is set up and ready.

The second version is a little more opaque. It looks like a synchronous operation but the actual open call is still asynchronous, although you get a LevelUP object back immediately to use. Any calls you make on that object that need to operate on the underlying LevelDB store are queued until the store is ready to accept calls. The actual open operation is very quick so the initial delay is generally not noticeable.
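As a minimal sketch of that deferred-open behaviour (using the same levelup API as above), the following put() is queued and replayed once the underlying store has finished opening:

var levelup = require('levelup')
var db = levelup('/tmp/deferred.db')

// the store is still opening at this point, but LevelUP queues this
// operation and executes it against LevelDB once the open completes
db.put('greeting', 'hello', function (err) {
  if (err) throw err
  console.log('written without waiting for an explicit open')
})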


To close a LevelDB store, simply call close() and your callback will be called when the underlying store is completely closed:

// close to clean up
db.close(function (err) { /* ... */ })  

Read, write and delete

Reading and writing are what you would expect for asynchronous Node methods:

db.put('key', 'value', function (err) { /* ... */ })

db.del('key', function (err) { /* ... */ })

db.get('key', function (err, value) { /* ... */ })  
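Note that get() reports a missing key as an error rather than as a null value. A small sketch, assuming LevelUP's convention of marking such errors with a truthy notFound property:

db.get('possibly-missing-key', function (err, value) {
  if (err) {
    if (err.notFound) {
      // the key simply isn't in the store; not a fatal condition
      return console.log('no such entry')
    }
    // a genuine I/O error
    return console.error('read failed:', err)
  }
  console.log('found:', value)
})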


As mentioned in the first article, LevelDB has a batch operation that performs atomic writes. These writes can be either put or delete operations.

LevelUP takes an array of operations to perform a batch; each element of the array is either a 'put' or a 'del':

var operations = [
    { type: 'put', key: 'Franciscus', value: 'Jorge Mario Bergoglio' }
  , { type: 'del', key: 'Benedictus XVI' }
]

db.batch(operations, function (err) { /* ... */ })


Streams

LevelUP turns LevelDB's Iterators into Node's readable streams, making them surprisingly powerful as a query mechanism.

LevelUP's ReadStreams share all the same characteristics as standard Node readable object streams, such as being able to pipe() to other streams. They also emit all of the expected events.

var rs = db.createReadStream()

// our new stream will emit a 'data' event for every entry in the store

rs.on('data' , function (data) { /* data.key & data.value */ })  
rs.on('error', function (err) { /* handle err */ })  
rs.on('close', function () { /* stream finished & cleaned up */ })  

But it's the various options for createReadStream(), combined with the fact that LevelDB sorts by key, that make it a powerful abstraction:

db.createReadStream({
    start     : 'somewheretostart'
  , end       : 'endkey'
  , limit     : 100           // maximum number of entries to read
  , reverse   : true          // flip direction
  , keys      : true          // see db.createKeyStream()
  , values    : true          // see db.createValueStream()
})

'start' and 'end' point to keys in the store. These don't even need to exist as actual keys because LevelDB will simply jump to the next existing key in lexicographical order. We'll see later why this is helpful when we explore namespacing and range queries.
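For instance, a "most recent entries" query is just a reversed, limited stream. A small sketch, assuming keys that sort chronologically (such as timestamps):

// fetch the 10 lexicographically greatest entries; if your keys are
// timestamp-based, these are the 10 newest
db.createReadStream({ reverse: true, limit: 10 })
  .on('data', function (data) { console.log(data.key, '=', data.value) })
  .on('error', function (err) { /* handle err */ })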

LevelUP also provides a WriteStream which maps write() operations to Puts or Batches.

Since ReadStream and WriteStream follow standard Node.js stream patterns, a copy database operation is simply a pipe() call:

function copy (srcdb, destdb, callback) {
  srcdb.createReadStream()
    .pipe(destdb.createWriteStream())
    .on('error', callback)
    .on('close', callback)
}


Encoding

LevelUP will accept most kinds of JavaScript objects, including Node's Buffers, as both keys and values for all its operations. LevelDB stores everything as simple byte arrays so most objects need to be encoded and decoded as they go in and come out of the store.

You can specify the encoding of a LevelUP instance and you can also specify the encoding of individual operations. This means that you can easily store text and binary data in the same store.
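A short sketch of mixing encodings, assuming LevelUP's support for per-operation options as described above:

// store-wide default is 'utf8' for both keys and values
var db = levelup('/tmp/mixed.db')

// override the value encoding for this one operation
db.put('profile', { name: 'Kim' }, { valueEncoding: 'json' }, function (err) {

  // read it back with the same per-operation override
  db.get('profile', { valueEncoding: 'json' }, function (err, value) {
    console.log(value.name) // 'Kim'
  })
})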

'utf8' is the default encoding but you can change that to any of the standard Node Buffer encodings. You can also use the special 'json' encoding:

var db = levelup('/tmp/dprk.db', { valueEncoding: 'json' })

db.put(
    'dprk'
  , {
        name       : 'Kim Jong-un'
      , spouse     : 'Ri Sol-ju'
      , dob        : '8 January 1983'
      , occupation : 'Clown'
    }
  , function (err) {
      db.get('dprk', function (err, value) {
        console.log('dprk:', value)
        db.close()
      })
    }
)

Gives you the following output:

dprk: { name: 'Kim Jong-un',  
  spouse: 'Ri Sol-ju',
  dob: '8 January 1983',
  occupation: 'Clown' }

Advanced example

In this example we assume the data store contains numeric data, where ranges of data are stored with prefixes on the keys. Our example function takes a LevelUP instance and a range key prefix and uses a ReadStream to calculate the variance of the values in that range using an online algorithm:

function variance (db, prefix, callback) {
  var n = 0, m2 = 0, mean = 0

  db.createReadStream({
        start : prefix          // jump to first key with the prefix
      , end   : prefix + '\xFF' // stop at the last key with the prefix
    })
    .on('data', function (data) {
      var delta = data.value - mean
      mean += delta / ++n
      m2 = m2 + delta * (data.value - mean)
    })
    .on('error', callback)
    .on('close', function () {
      callback(null, m2 / (n - 1))
    })
}

Let's say you were collecting temperature data and you stored your keys in the form: location~timestamp. Sampling approximately every 5 seconds and collecting temperatures in Celsius, we may have data that looks like this:

au_nsw_southcoast~1367487282112 = 18.23  
au_nsw_southcoast~1367487287114 = 18.22  
au_nsw_southcoast~1367487292118 = 18.23  
au_nsw_southcoast~1367487297120 = 18.23  
au_nsw_southcoast~1367487302124 = 18.24  
au_nsw_southcoast~1367487307127 = 18.24  

To calculate the variance, we can use our function while efficiently streaming values from our store, simply by calling:

variance(db, 'au_nsw_southcoast~', function (err, v) {
  /* v = variance */
})

Namespacing

The concept of namespacing keys will probably be familiar if you're used to using a key/value store of some kind. By separating keys by prefixes we create discrete buckets, much like a table in a traditional relational database is used to separate different kinds of data.

It may be tempting to create separate LevelDB stores for different buckets of data but you can take better advantage of LevelDB's caching mechanisms if you can keep the data organised in a single store.

Because LevelDB is sorted, choosing a namespace separator character can have an impact on the order of your entries. A commonly chosen namespace character often used in NoSQL databases is ':'. However, this character lands in the middle of the list of printable ASCII characters (character code 58), so your entries may not end up being sorted in a useful order.

Imagine you're implementing a web server session store with LevelDB and you're prefixing keys with usernames. You may have entries that look like this:

rod.vagg:default_theme = psychedelic
rod.vagg:last_login    = 1367487479499
rod1977:default_theme  = disco
rod1977:last_login     = 1367434022300
rod:default_theme      = funky
rod:last_login         = 1367488445080
roderick:default_theme = whoa
roderick:last_login    = 1367400900133

Note that these entries are sorted and that '.' (character code 46) and '1' (character code 49) come before ':'. This may or may not matter for your particular application, but there are better ways to approach namespacing.

Recommended delimiters

At the beginning of the printable ASCII character range is '!' (character code 33), and at the end we find '~' (character code 126). Using '!' as the delimiter, for example, our keys sort like this:

rod!default_theme      = funky
rod!last_login         = 1367488445080
rod.vagg!default_theme = psychedelic
rod.vagg!last_login    = 1367487479499
rod1977!default_theme  = disco
rod1977!last_login     = 1367434022300
roderick!default_theme = whoa
roderick!last_login    = 1367400900133

With '!' the bare 'rod' user sorts ahead of every username it is a prefix of; with '~' it would sort after all of them. Either way, each user's entries form one contiguous range that no other username can interrupt.
But why stick to the printable range? We can go right to the edges of the single-byte character range and use '\x00' (null) or '\xff' (ÿ).

For best sorting of your entries, choose '\x00' (or '!' if you really can't stomach it). But whatever delimiter you choose, you're still going to need to control the characters that can be used in keys. Allowing user input to determine your keys without stripping out your delimiter character could result in the NoSQL equivalent of an SQL injection attack (e.g. consider the unintended consequences that may arise with the dataset above if the delimiter were '!' and users were allowed that character in their usernames).
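One defensive approach is to route all key construction through a single helper that strips the delimiter from untrusted input. This is a hypothetical sketch, not part of LevelUP:

var DELIMITER = '\x00'

// hypothetical helper: build a namespaced key from untrusted parts,
// removing any embedded delimiter characters first
function makeKey (namespace, property) {
  return String(namespace).split(DELIMITER).join('')
       + DELIMITER
       + String(property).split(DELIMITER).join('')
}

// `username` here stands in for user-supplied input
db.put(makeKey(username, 'last_login'), Date.now().toString(), function (err) { /* ... */ })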

Range queries

LevelUP's ReadStream is the perfect range query mechanism. By combining 'start' and 'end', which just need to be approximations of actual keys, you can pluck out exactly the entries you want.

Using our namespaced dataset above, with '\x00' as the delimiter, we can fetch all entries for a single user by crafting a ReadStream range query:

var entries = []

db.createReadStream({ start: 'rod\x00', end: 'rod\x00\xff' })  
  .on('data', function (entry) { entries.push(entry) })
  .on('close', function () { console.log(entries) })

Would give us:

[ { key: 'rod\x00default_theme', value: 'funky' },
  { key: 'rod\x00last_login', value: '1367488445080' } ]

The '\xff' comes in handy here because it sorts after every other single-byte character: the range end of 'rod\x00\xff' therefore captures every key beginning with 'rod\x00', as long as the remainder of the key doesn't itself start with '\xff'. So again, you need to control the allowable characters in your keys in order to avoid surprises.
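If you find yourself writing prefix queries often, the start/end pair can be wrapped up in a tiny helper. This is a hypothetical convenience, not part of LevelUP:

// hypothetical helper: ReadStream options covering every key that
// begins with `prefix` (assumes key remainders never start with '\xff')
function rangeForPrefix (prefix) {
  return { start: prefix, end: prefix + '\xff' }
}

db.createReadStream(rangeForPrefix('rod\x00'))
  .on('data', function (entry) { console.log(entry.key, '=', entry.value) })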

Namespacing and range queries are heavily used by many of the libraries that extend LevelUP. In the final article in this series we'll be exploring some of the amazing ways that developers are extending LevelUP to provide additional features, applications and complete databases.

If you want to jump ahead, visit the Modules page on the LevelUP wiki.


databases node leveldb

LevelDB and Node: What is LevelDB Anyway?


This is the first article in a three-part series on LevelDB and how it can be used in Node.

This article will cover the LevelDB basics and internals to provide a foundation for the next two articles. The second and third articles will cover the core LevelDB Node libraries: LevelUP, LevelDOWN and the rest of the LevelDB ecosystem that's appearing in Node-land.


What is LevelDB?

LevelDB is an open-source, dependency-free, embedded key/value data store. It was developed in 2011 by Jeff Dean and Sanjay Ghemawat, researchers from Google. It's written in C++ although it has third-party bindings for most common programming languages, including JavaScript / Node.js of course.

LevelDB is based on ideas in Google's BigTable but does not share code with BigTable; this separation allows it to be licensed for open source release. Dean and Ghemawat developed LevelDB as a replacement for SQLite as the backing-store for Chrome's IndexedDB implementation.

It has since seen very wide adoption across the industry and serves as the back-end to a number of new databases and is now the recommended storage back-end for Riak.


  • Arbitrary byte arrays: both keys and values are treated as simple arrays of bytes, so content can be anything from ASCII strings to binary blobs.
  • Sorted by keys: by default, LevelDB stores entries lexicographically sorted by keys. The sorting is one of the main distinguishing features of LevelDB amongst similar embedded data storage libraries and comes in very useful for querying as we'll see later.
  • Compressed storage: Google's Snappy compression library is an optional dependency that can decrease the on-disk size of LevelDB stores with minimal sacrifice of speed. Snappy is highly optimised for fast compression and therefore does not provide particularly high compression ratios on common data.
  • Basic operations: Get(), Put(), Del(), Batch() (see the sketch below)
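To give a flavour of how these operations surface in JavaScript, here is a minimal sketch using the LevelDOWN binding that the following articles cover in detail (API assumed from those articles):

var leveldown = require('leveldown')
var db = leveldown('/tmp/intro.db')

db.open(function (err) {
  // Put
  db.put('name', 'LevelDB', function (err) {
    // Get: values come back as Buffers by default
    db.get('name', function (err, value) {
      console.log(value.toString()) // 'LevelDB'
      // Batch: an atomic group of puts and deletes
      db.batch([
          { type: 'put', key: 'year', value: '2011' }
        , { type: 'del', key: 'name' }
      ], function (err) { /* ... */ })
    })
  })
})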

Basic architecture

Log Structured Merge (LSM) tree


All writes to a LevelDB store go straight into a log and a "memtable". The log is regularly flushed into Sorted String Table (SST) files where the data has a more permanent home.

Reads on a data store merge these two distinct data structures, the log and the SST files. The SST files represent mature data and the log represents new data, including delete-operations.

A configurable cache is used to speed up common reads. The cache can potentially be large enough to fit an entire active working set in memory, depending on the application.

Sorted String Table files (SST)

Each SST file is limited to ~2MB, so a large LevelDB store will have many of these files. The SST file is divided internally into 4K blocks, each of which can be read in a single operation. The final block is an index that points to the start of each data block, along with the key of the entry at the start of that block. A Bloom filter is used to speed up lookups, allowing a quick scan of an index to find the block that may contain the desired entry.

Keys can have shared prefixes within blocks. Any common prefix for keys within a block will be stored once, with subsequent entries storing just the unique suffix. After a fixed number of entries within a block, the shared prefix is "reset"; much like a keyframe in a video codec. Shared prefixes mean that verbose namespacing of keys does not lead to excessive storage requirements.

Table file hierarchy

The table files are not stored in a simple sequence, rather, they are organised into a series of levels. This is the "Level" in LevelDB.

Entries that come straight from the log are organised into Level 0, a set of up to 4 files. When additional entries force Level 0 above the maximum of 4 files, one of the SST files is chosen and merged with the SST files that make up Level 1, which is a set of up to 10MB of files. This process continues, with levels overflowing and one file at a time being merged with the (up to 3) overlapping SST files in the next level. Each level beyond Level 1 is 10 times the size of the previous level.

Log: Max size of 4MB (configurable), then flushed into a set of Level 0 SST files
Level 0: Max of 4 SST files, then one file compacted into Level 1
Level 1: Max total size of 10MB, then one file compacted into Level 2
Level 2: Max total size of 100MB, then one file compacted into Level 3
Level 3+: Max total size of 10 x previous level, then one file compacted into next level

0 ↠ 4 SST, 1 ↠ 10M, 2 ↠ 100M, 3 ↠ 1G, 4 ↠ 10G, 5 ↠ 100G, 6 ↠ 1T, 7 ↠ 10T


This organisation into levels minimises the reorganisation that must take place as new entries are inserted into the middle of a range of keys. Each reorganisation, or "compaction", is restricted to just a small section of the data store. The hierarchical structure generally leads to data in the higher levels being the most mature data, with the fresher data being stored in the log and the initial levels. Since the initial levels are relatively small, overwriting and removing entries incurs less cost than when it occurs in the higher levels, but this matches the typical database where you have a large set of mature data and a more volatile set of fresh data (of course this is not always the case, so performance will vary for different data write and retrieve patterns).

A lookup operation must also traverse the levels to find the required entry. A read operation that requests a given key must first look in the log, if it is not found there it looks in Level 0, moving up to Level 1 and so forth. In this way, a lookup operation incurs a minimum of one read per level that must be searched before finding the required entry. A lookup for a key that does not exist must search every level before a definitive "NotFound" can be returned (unless a Del operation is recorded for that key in the log).

Advanced features

  • Batch operations: provide a collection of Put and/or Del operations that are atomic; that is, the whole collection of operations succeed or fail in a single Batch operation.
  • Bi-directional iterators: iterators can start at any key in a LevelDB store (even if that key does not exist, it will simply jump to the next lexical key) and can move forward and backwards through the store.
  • Snapshots: a snapshot provides a reference to the state of the database at a point in time. Read-queries (Get and iterators) can be made against specific snapshots to retrieve entries as they existed at the time the snapshot was created. Each iterator creates an implicit snapshot (unless it is requested against an explicitly created snapshot). This means that regardless of how long an iterator is alive and active, the data set it operates upon will always be the same as at the time the iterator was created (illustrated in the sketch below).
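As an illustration of the implicit-snapshot behaviour, a sketch using the raw iterator API exposed by the Node LevelDOWN binding (assumed here; details follow in the next articles):

// assume `db` is an open LevelDOWN instance already containing the key 'a'
var it = db.iterator() // the implicit snapshot is taken here

// this write happens after the snapshot, so this iterator never sees it
db.put('b', '2', function (err) {
  it.next(function (err, key, value) {
    console.log(key.toString()) // 'a'
    it.next(function (err, key, value) {
      console.log(key) // undefined: iteration ended, 'b' was invisible
      it.end(function (err) { /* clean up */ })
    })
  })
})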

Some details on these advanced features will be covered in the next two articles, when we turn to look at how LevelDB can be used to simplify data management in your Node application.

If you're keen to learn more and can't wait for the next article, see the LevelUP project on GitHub as this is the focus of much of the LevelDB activity in the Node community at the moment.


tutorials frameworks libraries modules ender

How Ender Bundles Libraries for the Browser


This is a contributed post by Rod Vagg. This work is licensed under a Creative Commons Attribution 3.0 Unported License.

I was asked an interesting Ender question on IRC (#enderjs on Freenode) and as I was answering it, it occurred to me that the subject would be an ideal way to explain how Ender's multi-library bundling works. So here is that explanation!

The original question went something like this:

When a browser first visits my page, they only get served Bonzo (a DOM manipulation library) as a stand-alone library, but on returning visits they are also served Qwery (a selector engine), Bean (an event manager) and a few other modules in an Ender build. Can I integrate Bonzo into the Ender build on the browser for repeat visitors?

What's Ender?

Let's step back a bit and start with some basics. The way I generally explain Ender to people is that it's two different things:

  1. It's a build tool, for bundling JavaScript libraries together into a single file. The resulting file constitutes a new "framework" based around the jQuery-style DOM element collection pattern: $('selector').method(). The constituent libraries provide the functionality for the methods and may also provide the selector engine functionality.
  2. It's an ecosystem of JavaScript libraries. Ender promotes a small collection of libraries as a base, called The Jeesh, which together provide a large portion of the functionality normally required of a JavaScript framework, but there are many more libraries compatible with Ender that add extra functionality. Many of the libraries available for Ender are also usable outside of Ender as stand-alone libraries.

The Jeesh is made up of the following libraries, each of these also works as a stand-alone library:

  • domReady: detects when the DOM is ready for manipulation. Provides $.domReady(callback) and $.ready(callback) methods.
  • Qwery: a small and fast CSS3-compatible selector engine. Does the work of looking up DOM elements when you call $('selector') and also provides $(elements).find('selector'), $(elements).and(elements) and $(elements).is('selector').
  • Bonzo: a DOM manipulation library, providing some of the most commonly used methods, such as $(elements).css('property', 'value'), $(elements).empty(), $(elements).after(elements||html), and many more.
  • Bean: an event manager, provides jQuery-style $(elements).bind('event', callback) and others.

The Jeesh gives you the features of these four libraries bundled into a neat package for only 11.7 kB minified and gzipped.
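To give a feel for how the four libraries combine behind the single $, here's a small sketch of Jeesh-style usage (the selector resolved by Qwery, css() and after() from Bonzo, bind() from Bean, domReady() from domReady; treat it as illustrative rather than canonical):

$.domReady(function () {
  // find the elements, style them, then wire up a click handler
  $('p.note')
    .css('color', 'red')
    .bind('click', function () {
      $(this).after('<p>clicked!</p>')
    })
})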

The Basics: Bonzo

Bonzo is a great way to start getting your head around Ender because it's so useful by itself. Let's include it in a page and do some really simple DOM manipulation with it.

<html lang="en-us">
<head>
  <meta http-equiv="Content-type" content="text/html; charset=utf-8">
  <title>Example 1</title>
</head>
<body>
  <script src="bonzo.js"></script>
  <script id="scr">
    // the contents of *this* script,
    var scr = document.getElementById('scr').innerHTML

    // create a <pre> element containing the script source
    var pre = bonzo.create('<pre>' + scr + '</pre>')

    // append it to the document body
    bonzo(pre).appendTo(document.body)
  </script>
</body>
</html>

[…] and we'd end up with two blocks, both responding to the click event.

Removing Bonzo

It's possible to pull Bonzo out of the Ender build and manually stitch it back together again. Just like we used to do with our toys when we were children! (Or was that just me?)

First, our Ender build is now created with: `ender build qwery bean` (or we could run `ender remove bonzo` to remove Bonzo from the previous example's `ender.js` file).  The new `ender.js` file will contain the selector engine goodness from Qwery, and event management from Bean, but not much else.

Bonzo can be loaded separately, but we'll need some special glue to do this. In Ender parlance, this glue is called an Ender Bridge.

The Ender Bridge

Ender follows the basic CommonJS Module pattern -- it sets up a simple module registry and gives each module a `module.exports` object and a `require()` method that can be used to fetch any other modules in the build. It also uses a `provide('name', module.exports)` method to insert exports into the registry with the name of your module. The exact details here aren't important and I'll cover how you can build your own Ender module in a later article, for now we just need a basic understanding of the module registry system.
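A bridge, then, is typically a small file that pulls a module out of the registry with require() and attaches its methods to the $ object. A minimal sketch of the common pattern, assuming the ender client library's $.ender() integration API:

!function ($) {
  var b = require('bonzo')

  // top-level utility: $.create('<div/>')
  $.ender({
    create: function (html) { return $(b.create(html)) }
  })

  // collection methods: $('p').css('color', 'red')
  $.ender({
    css: function (property, value) {
      return b(this).css(property, value)
    }
  }, true) // `true` binds these to the $() collection rather than to $
}(ender)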

Using our Qwery, Bean and Bonzo build, the file looks something like this:

|========================================|
| Ender initialisation & module registry |
| (we call this the 'client library')    |
|========================================|
| 'module.exports' setup                 |
| ...                                    |


frameworks libraries date history node modules time keyboard ender responsive

Ender Roundup: tablesort.js, Moment.js, jwerty, SelectNav.js, ender-events, ender-assert, Categorizr.js, Arbiter


You can send in your Ender-related projects for review through our contact form or @dailyjs. Be sure to also update the Ender package list page on the Ender wiki.


tablesort.js (GitHub: tristen/tablesort, npm / Ender: tablesort) by Tristen Brown is a dependency-free sorting library for HTML tables. tablesort.js can be invoked stand-alone via new Tablesort(document.getElementById('table-id')) or via the $('#table-id').tablesort() method from within Ender.

Olivier Vaillancourt has written a small review of tablesort.js for use in Ender on Twitter Bootstrap tables.


Moment.js (GitHub: timrwood/moment, npm / Ender: moment) by Tim Wood is a small, yet very comprehensive date and time handling library.


Moment.js was mentioned last year on DailyJS but it now has a simple Ender bridge allowing you to pack it neatly into Ender builds for use via $.ender(). Plus, it's an absolutely fantastic library for anything date/time related so it's worth mentioning again. Be sure to scan the docs to see just how much this library can do.

$.moment().add('hours', 1).fromNow(); // "in an hour"

// manipulate
$.moment().add('days', 7).subtract('months', 1).year(2009).hours(0).minutes(0).seconds(0);

// parse dates in different formats
var day = $.moment("12-25-1995", "MM-DD-YYYY");

var a = $.moment([2010, 1, 14, 15, 25, 50, 125]);  
a.format("dddd, MMMM Do YYYY, h:mm:ss a"); // "Sunday, February 14th 2010, 3:25:50 pm"  
a.format("ddd, hA"); // "Sun, 3PM"

// operate on different 'moment' objects
var a = $.moment([2008, 5]);
var b = $.moment([2007, 0]);
a.diff(b, 'years'); // 1
a.diff(b, 'years', true); // ~1.42

The project maintainers also follow a rigorous release methodology, making great use of git branches, something that is not often found on smaller open source libraries.


jwerty (GitHub: keithamus/jwerty, Licence: MIT, npm / Ender: jwerty) by Keith Cirkel is a small keyboard event handling library which can bind, fire and assert key combination strings against elements and events.

$.key('ctrl+shift+P', function () { [...] });
$.key('⌃+⇧+P', function () { [...] });

// specify optional keys
$.key('⌃+⇧+P/⌘+⇧+P', function () { [...] });

// key sequences
$.key('↑,↑,↓,↓,←,→,←,→,B,A,↩', function () { [...] });

// pass in a selector to bind a shortcut local to that element
$.key('⌃+⇧+P/⌘+⇧+P', function () { [...] }, 'input.email', '#myForm');

// use `$.event` as a decorator, to bind events your own way
$('#myinput').bind('keydown', $.keyEvent('⌃+⇧+P/⌘+⇧+P', function () { [...] }));

// use `$.isKey` to check a key combo against a keyboard event
function (event) {
    if ( $.isKey('⌃+⇧+P', event) ) { [...] }
}

// use `$.fireKey` to send keyboard events to other places
$.fireKey('enter', 'input:first-child', '#myForm');


SelectNav.js (GitHub: lukaszfiszer/selectnav.js, npm / Ender: selectnav.js) by Lukasz Fiszer is a small library that will convert your website's navigation into a <select> menu. Used together with media queries it helps you to create a space saving, responsive navigation for small screen devices. SelectNav.js is inspired by TinyNav.js for jQuery.

ender-events and ender-assert

ender-events (GitHub: amccollum/ender-events, Licence: MIT, npm / Ender: ender-events) and ender-assert (GitHub: amccollum/ender-assert, Licence: MIT, npm / Ender: ender-assert) are two packages by Andrew McCollum, previously bundled in his node-compat library. ender-events gives you an implementation of the NodeJS EventEmitter class in your browser, while ender-assert gives you a browser version of the NodeJS assert module.

Andrew also has a tiny extension to Bonzo, the DOM utility included in Ender's starter pack (The Jeesh), named ender-remove, that simply triggers a 'remove' event when nodes are removed from the DOM, which can be helpful for performing clean-up actions.


Categorizr.js (GitHub: Skookum/categorizr.js, Licence: MIT, npm / Ender: categorizr) by Dustan Kasten is a JavaScript port of the Categorizr PHP script by Brett Jankord.

Categorizr gives you $.isDesktop() $.isTablet() $.isTV() $.isMobile() methods to determine the current device.


Arbiter (GitHub: iamdustan/arbiter, Licence: MIT, npm / Ender: arbiter) also by Dustan Kasten is a tiny library for managing the HTML5 history interface via pushState(), using AJAX requests to load new content upon request.


ecmascript language ASI

JavaScript and Semicolons



In syntax terms, JavaScript is in the broad C-family of languages. The C-family is diverse and includes languages such as C (obviously), C++, Objective-C, Perl, Java, C# and the newer Go from Google and Rust from Mozilla. Common themes in these languages include:

  • The use of curly braces to surround blocks.
  • The general insignificance of white space (spaces, tabs, new lines) except in very limited cases. Indentation is optional and is therefore a matter of style and preference, plus programs can be written on as few or as many lines as you want.
  • The use of semicolons to end statements, expressions and other constructs. Semicolons serve the delimiting role that the new line character plays in white-space-significant languages.

JavaScript's rules for curly braces, white space and semicolons are consistent with the C-family, and its formal specification, known as the ECMAScript Language Specification, makes this clear:

Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons.

But it doesn't end there--JavaScript introduces what's known as Automatic Semicolon Insertion (ASI). The specification continues:

Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.

The general C-family rules for semicolons can be found in most teaching material for JavaScript and have been advocated by most of the prominent JavaScript personalities since 1995. In a recent post, JavaScript's inventor, Brendan Eich, described ASI as "a syntactic error correction procedure" (as in "parsing error", rather than "user error").

Recent Developments

There has been a growing trend in the last few years toward the omission of semicolons, in some cases avoiding them totally, perhaps largely inspired by the likes of CoffeeScript and Ruby, where semicolons are used only if you want to chain multiple statements on a single line. This view of semicolons could perhaps be summarised as: the semicolon character is optional in most situations and therefore introduces unnecessary syntactic noise--unnecessary syntax can (and maybe should) be avoided.

Unfortunately the division between the semicolon and semicolon-free crowds has become very pronounced and is leading to some angry exchanges. A recent lightning rod for semicolon-related controversy is the most watched project on GitHub, Twitter's Bootstrap, and the author of its JavaScript code, Jacob Thornton, who is a convert to the semicolon-free camp.

A short exchange this weekend between Thornton and a cranky Douglas Crockford (author of perhaps the most-read JavaScript book, JavaScript: The Good Parts) erupted on GitHub, Twitter and across the Internet.

The initial issue was a request for the addition of a semicolon in order to assist Crockford's JavaScript minifier tool, JSMin, to properly compress the code. Like Crockford's other popular JavaScript tool, JSLint, JSMin follows his rigid view of what the best parts of JavaScript are and rejects the other, bad, parts, including treating semicolons as optional.

Crockford, after reviewing the code in question stated:

That is insanely stupid code. I am not going to dumb down JSMin for this case. ... Learn to use semicolons properly. ! is not intended to be a statement separator. ; is.

To which Thornton replied:

i have learned to use them, that's why there isn't one present.

Rather than continue the debate, perhaps it's best to review the rules surrounding semicolons in JavaScript so we can make informed decisions about our own coding style preferences and learn to contend with code produced by programmers who have different ones.

As an aside, it should be noted that both Bootstrap and JSMin have been patched to resolve the relevant issues in both.

Rules of ASI

The ECMAScript Language Specification deals with ASI in section 7.8 of editions 1 (PDF) and 2 (PDF) and section 7.9 of editions 3 (PDF), 5 (PDF) and the current drafts of edition 6. The text has stayed roughly the same through the different editions except for the inclusion of continue, break and throw statements in special cases that previously just applied to return. More on this below.

Simply put, the first basic rule of ASI is:

  • If the parser encounters a new line or curly brace, and it is used to break up tokens that otherwise don't belong together, then it will insert a semicolon.

The new line character is the one most commonly used in taking advantage of ASI so we'll restrict ourselves mainly to this case. The most common situation where you'll see curly brace ASI occurring is in code such as: if (foo) { bar++ }. It should be noted, however, that you could surround all your statements, expressions, etc. in curly braces if you wanted to avoid semicolons, i.e. place everything in its own block; although this offers limited help in achieving the kinds of goals that the semicolon-free crowd advocate.

So, as a beginning example, the code:

a = b + c
foo()

has ASI applied because stringing the tokens together without the new line doesn't help. Otherwise, it would be interpreted as c foo() which isn't correct. The parser makes it look like this internally:

a = b + c;
foo()

// or

a = b + c; foo()

But here we find the most important alleged problems with taking advantage of ASI in your coding style. The important part of the first rule of ASI is that it will only apply if the parser needs to do so in order to make sense of the code in question. Consider the following code:

// example 1
a = b + c
[1].push(a)

// example 2
a = b + c  
(options || {}).foo ? bar() : baz()

In both of these cases, the parser doesn't need to apply ASI in order to have properly formed code. In the first example, it can ignore the new line and treat the [ as applying to c, likewise in the second example, the ( can apply to c. So we would end up running something quite different than we might be trying to achieve:

// example 1
a = b + c[1].push(a)  
// i.e. fetch the first element of 'c' and execute the 'push' function on what it finds

// example 2
a = b + c(options || {}).foo ? bar() : baz()  
// i.e. execute 'c' as a function and check for the existence of the property 'foo' on the returned object

Moving on in the language specification, there are a few special cases:

  • ASI is never performed if it would result in an "empty statement".

Empty statements ordinarily look like this (note the semicolons, there's a statement there, it's just "empty"):

for (counter = 0; counter < something(); counter++);
// or
if (condition);
else {
  /* ... */
}
This is perfectly legal JavaScript and may even be useful in certain situations. However, ASI will never help you achieve this so if you have constructs that would lead to empty statements (if there were semicolons) then you'll simply get an error:

if (condition)
else {
  /* ... */
}
  • ASI is not performed within the head of a for loop, where semicolons are an integral part of the construct.

So no ASI is applied in cases such as:

for (var i = 0;  
  i < a.length
// may as well be written as:
for (var i = 0; i < a.length; i++) ...  
  • ASI is performed if the parser reaches the end of a file and a semicolon will help the code make sense. In other words, even if you don't have a new line at the end of your file, it will perform ASI in the same way as if there was one.

There is one final rule in the language specification regarding ASI. This rule overrides the first rule in some special cases, called "restricted productions". I'll split this rule into two to separate out an uncommon case with a much more common case.

  • ASI is always performed on code where a new line character comes before a -- or a ++, even where removing the new line character would still make a valid construct.

This rule is arguably not so important to understand or remember because it doesn't affect code that the majority of programmers would write. Consider the following program where each section contains an identical sequence of 'tokens' except for the semicolons and white space characters. Try to predict the output for yourself:

var a, b, c;

// 1) plain
a = b = c = 1; a++; b--; c; console.log('1)', a, b, c)

// 2) extra spaces
a = b = c = 1; a ++ ; b -- ; c; console.log('2)', a, b, c)

// 3) new lines v1
a = b = c = 1
a ++
b --
c
console.log('3)', a, b, c)

// 4) new lines v2
a = b = c = 1
a
++ b
-- c
console.log('4)', a, b, c)

// 5) new lines v3
a = b = c = 1
a
++
b
--
c
console.log('5)', a, b, c)

As per our ASI rule, even though the new line character doesn't interrupt a valid code construct (a ++ would be the same as a\n++ in most C-family languages), the parser will always insert a semicolon when it encounters a ++ or -- after a new line character. So instead of being postfix operators as in a++ and b--, they become prefix operators on the next variables in the token stream: ++b and --c.

The output from the above program is:

1) 2 0 1  
2) 2 0 1  
3) 2 0 1  
4) 1 2 0  
5) 1 2 0  

The simple moral here is to follow standard C-family coding practice and keep your postfix and prefix operators attached to the variables they are applied to.

The second part of the final rule is more relevant as it can conflict with what you might encounter in standard C-family programs:

  • ASI is always performed where new line characters follow directly after any of the following statements: continue, break, return and throw.

While not common, both continue and break can be followed by a label, indicating where to jump to (labels with these two statements are the less 'evil' cousins of the much maligned goto found in many other languages). If you intend for the program to jump to a label then you must not separate the label from the continue or break by a new line character:

continue
foo;
// not the same as:
continue foo;
// actually interpreted as:
continue;
foo;

break
foo;
// not the same as:
break foo;
// actually interpreted as:
break;
foo;

The return and throw rules are much more interesting because, like the first rule of ASI, they can lead to non-obvious problems in common code. A stand-alone throw statement is a parse error, so you should find out fairly quickly that code like this is no good:

throw
  new Error('Aieee!')
// interpreted as:
throw;
new Error('Aieee!');

If you have a long line of code containing a throw statement and you want to improve readability by using new line characters, you cannot insert the new line straight after the throw or you'll end up with invalid code, i.e. a throw that doesn't have anything to throw. You'll have to rewrite your code or find a different place to introduce the new line character that fits your coding style. For example:

throw new Error(
  'a long and detailed message that is best kept on its own line'
)

The most commonly used of the four statements we are considering is return. It is quite common to try and append a complex series of tokens to a return statement or even use return as an 'early return' from a function, sometimes leading to long lines that we may be tempted to break up with new lines. Consider:

// a common Node construct, an 'early return' after a callback upon receiving an error:
if (err) {
  return
    callback('Error while processing something that takes a lot of words to describe: ' + err)
}
somethingElseHere()

As per the ASI rule, the new line character directly following the return leads to the insertion of a semicolon so we actually end up with our 'early return' being a bit too early and our function call becomes dead code. The above code is interpreted as something that is clearly not intended:

if (err) {
  return;
  callback('Error while processing something that takes a lot of words to describe: ' + err);
}
somethingElseHere();

The impact of ASI on this type of code is particularly sinister because it can be difficult to pick up. We were not intending to use return to actually return a value, but to halt execution of the current block. We may not need, or even use, a return value from the containing function--so discovering the error requires somethingElseHere() to have obvious side-effects, which is not always the case.

The same ASI procedure occurs when we try to fit too much into our return value and are tempted to break it up with new lines:

if (foo) {
  return
    (something + complex()) - (enough[1] / to) << tempt + us(to - linebreak)
}

It's clear here that we're intending to return a value calculated by the long code string and we've attempted to improve readability by breaking it up with new line characters, or perhaps you have a vertical line in your editor that tempts you to do this kind of breaking.

We end up with an empty return and some dead code:

if (foo) {
  return;
  (something + complex()) - (enough[1] / to) << tempt + us(to - linebreak);
}

It's important to remember that if you need to have long lines of code beginning with a return, you can't start with a new line straight after the return; you're going to have to find somewhere else to break if you really must break--or even better, avoid long return lines completely.

Gentle Advice

It should be clear, particularly from the last rule outlined above, that we don't need to be following a semicolon-free coding style to fall foul of JavaScript's ASI feature. Perfectly innocent line breaking can lead to semicolons being inserted into our token stream without our permission, so it's important to be aware of the kinds of situations where this can happen. Much of the recent debate about semicolons misses this point. Additionally, love or hate ASI, it's with us and is not going away, so perhaps it would be more productive to embrace it as a feature, use it where it suits us and work around it where it doesn't.

Regardless of your preference, ASI and other obscure rules in non-trivial languages such as JavaScript mean that our build tools should involve some kind of syntax checking mechanism. Strongly-typed languages such as Java have sophisticated editors that can understand the intent of your code and provide real-time feedback as you type. It's a little more complex in JavaScript but we do have excellent tools that can analyse our code and point out potential problems or code style that may lead to common hazards.

JSLint by Douglas Crockford is perhaps the best known syntax checking tool available for JavaScript. It will encourage you to follow Crockford's personal coding style, which he believes leads to fewer syntax-related errors.

JSHint was developed as a much-less-opinionated alternative to JSLint. It has many options that let you tailor it to your personal preferences while still steering you away from potential errors, ASI-related and otherwise.

These tools can be run across source files at build time (via a Makefile or as part of your test runner for example), or embedded directly in the text editors most commonly used by JavaScript programmers. Vim, TextMate and SublimeText all have ways of running JSLint or JSHint as you edit, providing quick feedback on any potential code problems. Even the most experienced JavaScript developers can bump into occasional ASI-related problems, having build tools that can point us in the right direction is just common sense.

Semicolon-free Best Practice

If you lean towards a semicolon-free style, there are some well-established conventions that can help produce less error-prone code. One of the largest JavaScript projects following a semicolon-free style is npm by Isaac Schlueter, who is now lead developer of the NodeJS project. npm is Node's package manager and its code follows a very particular style adopted by many who advocate semicolon-free code. Aside from minimal use of semicolons, this style is characterised by comma-first formatting, placing necessary syntax at the beginning of lines rather than at the end, where it can be easily overlooked.

To alleviate problems caused by ASI, Isaac advocates inserting a leading semicolon on lines that begin with syntax that could be interpreted as following on from the previous line. In particular the [ and ( characters. Our examples above involving these two characters can be rewritten as:

// example 1
a = b + c
;[1].push(a)

// example 2
a = b + c  
;(options || {}).foo ? bar() : baz()

By placing semicolons and commas at the beginning of the line, we elevate their importance in the token stream and potentially assist our own brains in identifying problems. Isaac has a great post dealing with this and other ASI issues.

Lastly, let's try and keep things in perspective. The JavaScript community has generated a lot of heat over a single, humble, character. There are bigger mountains to climb and we would be better off expending all that energy on building awesome things!