Mastering Node Streams: Part 1

2012-09-10 00:00:00 +0100 by Alex R. Young

Streams are one of the most underused data types in Node. If you're deep into Node, you've probably heard this before. But seeing several new modules pop up that are not taking advantage of streams or using them to their full potential, I feel the need to reiterate it.

The common pattern I see in modules which require input is this:

var foo = require('foo');

foo('/path/to/myfile', function onResult(err, results) {
  // do something with results

By only having your module's entry point be a path to a file, you are limiting the stream they could use on it to a readable file stream that they have no control over.

You might think that it's very common for your module to read from a file, but this does not consider the fact that a stream is not only a file stream. A stream could be several things. It could be a parser, HTTP request, or a child process. There are several other possibilities.

Only supporting file paths limits developers -- any other kind of stream will have to be written to the file system and then read later, which is less efficient. One reason for this is the extra memory it takes to store the stream. Secondly, it takes longer to stream the file to disk and then read the data the user needs.

To avoid this, the above foo module's API should be written this way

var stream = fs.createReadStream('/path/to/myfile');
foo(stream, function onResult(err, result) {
  // do something with results

Now foo can take in any type of stream, including a file stream. This is perhaps too long-winded when a file stream is passed; the fs module has to be required, and then a suitable stream must be created.

The solution is to allow both a stream and a file path as arguments.

foo(streamOrPath, function onResult(err, results) {
  // ...

Inside foo, it checks the type of streamOrPath, and will create a stream if needed.

module.exports = function foo(streamOrPath, callback) {
  if (typeof streamOrPath === 'string') {
    stream = fs.createReadStream(streamOrPath);
  } else if (streamOrPath.pipe && streamOrPath.readable) {
    stream = streamOrPath;
  } else {
    throw new TypeError('foo can only be called with a stream or a file path');

  // do whatever with `stream`

There you have it, really simple right? So simple I've created a module just for this common use case, called streamin.

var streamin = require('streamin');

module.exports = function foo(streamOrPath, callback) {
  var stream = streamin(streamOrPath);

  // do whatever with `stream`

Don't be fooled by its name, streamin works with writable streams too.

In the next part, I'll show you how modules like request return streams synchronously even when they're not immediately available.