DailyJS

The JavaScript blog.


Author: Nathan Sweet
node hbase thrift

Getting Started with HBase and Thrift for Node

Node and HBase

Why Node and HBase?

I think the title of this article is self-explanatory, but for a bit of background for people who may not know: NodeJS is a JavaScript server framework based on a very thin and fast I/O multiplexer. That means NodeJS is great for things like proxying lots of short-lived connections. Node's abilities make it ideally suited to acting as a proxy for a high-write database like HBase.

Traditionally the way to communicate with HBase using Node is to use the REST interface. This is both slow and not ideally suited for a scaling production environment. If we could leverage the write-scale of HBase with the proxy-scale of Node we could do some pretty cool things, like script map-reduce jobs for HBase in JavaScript, or create a service to do low latency writes. How can we do this? The answer is Thrift!

What is Thrift?

Thrift is a "software framework, for scalable cross-language services development...with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js..." Thrift allows us to create a thin API in NodeJS to communicate with HBase using a thin socket protocol. The advantages over REST are that the connection stays alive and the protocol is thinner than XML.

Enough analysis, let's get started.

Getting Started

First you will need to install HBase and Thrift. I'm not going to go over that here, as there are other great tutorials for that. Here's one for HBase and one for Thrift (you'll need some other things to get Thrift going). Once you have both HBase and Thrift installed, open your shell of choice, create a working directory, and make it your current directory.

mkdir node_hbase  
cd node_hbase  

Then generate the Thrift files for Node like so:

thrift --gen js:node [hbase-dir]/src/main/resources/org/apache/hadoop/hbase/thrift/HBase.thrift  

The tutorial for generating these files can be found here.

After generating these files it's time to create our Node file. Here is an example that we can use. Make sure this file is inside the folder we created:

// This section includes a fix from here:
// https://stackoverflow.com/questions/17415528/nodejs-hbase-thrift-weirdness/21141862#21141862
var thrift = require('thrift'),  
  HBase = require('./gen-nodejs/HBase.js'),
  HBaseTypes = require('./gen-nodejs/HBase_types.js'),
  connection = thrift.createConnection('localhost', 9090, {
    transport: thrift.TFramedTransport,
    protocol: thrift.TBinaryProtocol
  });

connection.on('connect', function() {  
  var client = thrift.createClient(HBase,connection);
  client.getTableNames(function(err,data) {
    if (err) {
      console.log('gettablenames error:', err);
    } else {
      console.log('hbase tables:', data);
    }
    connection.end();
  });
});

connection.on('error', function(err){  
  console.log('error', err);
});

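We don't want to run this yet. First, HBase itself and HBase's Thrift server/API both need to be running. Assuming HBase is already started (typically with [hbase-dir]/bin/start-hbase.sh), starting the Thrift server takes just one command. The -f flag selects the framed transport, matching the TFramedTransport option in the Node file: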

[hbase-dir]/bin/hbase-daemon.sh start thrift -f

Now we can run our Node file. You may want to go into the HBase shell and create a table, otherwise you will get an empty array back from the example above.

The HBase Thrift API is documented here. The Node versions of these methods differ from the documented signatures: they almost never return anything, and instead take a callback as the final argument, which receives an error (if any) followed by the result of the method.
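To make the calling convention concrete, here is a minimal sketch that runs without a Thrift server. The `fakeClient` object and its `failingCall` method are hypothetical stand-ins for the generated client, and the `handle` helper is invented for illustration; only the error-first callback shape is taken from the connection example above.

```javascript
// Hypothetical stand-in for the generated HBase client, so the sketch runs
// without a Thrift server. Only the calling convention is illustrated.
var fakeClient = {
  tables: ['users', 'events'],
  getTableNames: function (callback) {
    // A real client would answer asynchronously after a network round trip;
    // this stub calls back immediately.
    callback(null, this.tables);
  },
  failingCall: function (callback) {
    // Invented method that always reports an error.
    callback(new Error('simulated IOError'));
  }
};

// Collects one line per completed call.
var seen = [];

// Builds an error-first callback: the error is checked before the data is used.
function handle(label) {
  return function (err, data) {
    if (err) {
      seen.push(label + ' failed: ' + err.message);
      return;
    }
    seen.push(label + ' ok: ' + data.join(','));
  };
}

fakeClient.getTableNames(handle('getTableNames'));
fakeClient.failingCall(handle('failingCall'));

console.log(seen);
```

With a real client the same `handle('getTableNames')` callback could be passed to `client.getTableNames`, as long as the result is an array of names.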

I think Node, HBase, and Thrift as a stack is a winning combination and would be ideally suited for something like a custom analytics platform or a mapping platform.

I will do my best to keep up with any comments or questions people have below (until they close).

performance node ES5 Enumeration ES3 benchmarking maintainability

JavaScript for Node Part 1: Enumeration

JavaScript developers have been accustomed to a very scattered and incoherent API (the DOM) for some time. As a result, some of JavaScript's most common patterns are pretty weird and unnecessary when programming for a unified and coherent API like Node. It can be easy to forget that the entire ES5 specification is available to you, but there are some standard patterns that deserve to be rethought because of ES5's newer features.

Objects in ES5

Since no object in JavaScript can have duplicate keys at the same level, all objects can be thought of as hash tables. Indeed, V8 implements a hash function for object keys. This important concept did not go unnoticed in the ES5 draft, and so the method Object.keys was created to return an object's own keys as a JavaScript Array. In layman's terms, this means that Object.keys returns only the enumerable keys that belong to that object itself and NOT any properties that it may have inherited. This is a powerful and useful construct that can be utilized in Node when enumerating over an object.
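A small sketch of that difference, using made-up objects (`base`, `obj`, and the property names are illustrative only). It compares Object.keys with a bare for...in loop and with Object.getOwnPropertyNames, which also lists non-enumerable own properties:

```javascript
// An object that inherits one property and owns two, one of them hidden.
var base = { inherited: true };
var obj = Object.create(base);
obj.own = 1;
Object.defineProperty(obj, 'hidden', { value: 2, enumerable: false });

// Own, enumerable keys only.
var ownKeys = Object.keys(obj);

// for...in also walks the prototype chain (enumerable properties only).
var forInKeys = [];
for (var key in obj) {
  forInKeys.push(key);
}

// Own keys, including non-enumerable ones.
var allOwnNames = Object.getOwnPropertyNames(obj);

console.log(ownKeys);      // [ 'own' ]
console.log(forInKeys);    // [ 'own', 'inherited' ]
console.log(allOwnNames);  // [ 'own', 'hidden' ]
```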

The Old Way

Chances are you have run into the following looping pattern:

var key;  
for (key in obj) {  
  if (obj.hasOwnProperty(key))
    obj[key];
}

This was the only way to traverse an object in ES3 without going up an object's prototype chain.

A Better Way

In ES5 there is a better approach. Given that we can simply get the keys of an object and put them into an array, we can loop over an object, but only at the cost of looping over an array. First consider the following:

var keys = Object.keys(obj), i, l;

for (i = 0, l = keys.length; i < l; i++)  
  obj[keys[i]];

This is usually the fastest way of looping over an object in ES5 (at least in V8). However, this method has some drawbacks. If new variables are needed to make calculations, this approach starts to feel overly verbose. Consider the following:

function calculateAngularDistanceOfObject(obj) {  
  // typeof null is 'object', so guard against null explicitly
  if (typeof obj !== 'object' || obj === null) return;
  var keys = Object.keys(obj)
    , EARTH_RADIUS = 3959
    , RADIAN_CONST = Math.PI / 180
    , deltaLat
    , deltaLng
    , halfTheSquareChord
    , angularDistanceRad
    , temp
    , a, b, i, l
    ;

  for (i = 0, l = keys.length; i < l; i++) {
    temp = obj[keys[i]];
    a = temp.a;
    b = temp.b;
    deltaLat = a.subLat(b) * RADIAN_CONST;
    deltaLng = a.subLng(b) * RADIAN_CONST;
    halfTheSquareChord = Math.pow(Math.sin(deltaLat / 2), 2) + Math.pow(Math.sin(deltaLng / 2), 2) * Math.cos(a.lat * RADIAN_CONST) * Math.cos(b.lat * RADIAN_CONST);
    obj[keys[i]].angularDistance = 2 * Math.atan2(Math.sqrt(halfTheSquareChord), Math.sqrt(1 - halfTheSquareChord));
  }
}

An Even Better Way

In situations like this, instead of looping over the array of keys with a for loop, we can use Array’s native forEach method, which gives us a new function scope for the variables we are working with. This allows us to do our processing in a more encapsulated manner:

function calculateAngularDistanceOfObject(obj) {  
  // typeof null is 'object', so guard against null explicitly
  if (typeof obj !== 'object' || obj === null) return;

  var EARTH_RADIUS = 3959
    , RADIAN_CONST = Math.PI / 180;

  Object.keys(obj).forEach(function(key) {
    // All working variables are declared here, so nothing leaks into the global scope
    var temp = obj[key]
      , a = temp.a
      , b = temp.b
      , deltaLat = a.subLat(b) * RADIAN_CONST
      , deltaLng = a.subLng(b) * RADIAN_CONST
      , halfTheSquareChord;

    halfTheSquareChord = Math.pow(Math.sin(deltaLat / 2), 2) + Math.pow(Math.sin(deltaLng / 2), 2) * Math.cos(a.lat * RADIAN_CONST) * Math.cos(b.lat * RADIAN_CONST);
    obj[key].angularDistance = 2 * Math.atan2(Math.sqrt(halfTheSquareChord), Math.sqrt(1 - halfTheSquareChord));
  });
}
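The function above never shows what its input looks like, so here is a hedged usage sketch. The `Point` constructor, the `routes` object, and the variable `quarterTurnMiles` are assumptions made for illustration: the article only implies that each entry has `a` and `b` with a `lat` property plus `subLat` and `subLng` methods working in degrees. The function is repeated in compact form so the sketch runs on its own, and 3959 is assumed to be the Earth's radius in miles.

```javascript
// Hypothetical point type; coordinates are in degrees.
function Point(lat, lng) {
  this.lat = lat;
  this.lng = lng;
}
Point.prototype.subLat = function (other) { return this.lat - other.lat; };
Point.prototype.subLng = function (other) { return this.lng - other.lng; };

// Compact copy of the forEach version, storing the haversine angle in radians.
function calculateAngularDistanceOfObject(obj) {
  if (typeof obj !== 'object' || obj === null) return;
  var RADIAN_CONST = Math.PI / 180;

  Object.keys(obj).forEach(function (key) {
    var pair = obj[key]
      , a = pair.a
      , b = pair.b
      , deltaLat = a.subLat(b) * RADIAN_CONST
      , deltaLng = a.subLng(b) * RADIAN_CONST
      , h = Math.pow(Math.sin(deltaLat / 2), 2) +
            Math.pow(Math.sin(deltaLng / 2), 2) *
            Math.cos(a.lat * RADIAN_CONST) * Math.cos(b.lat * RADIAN_CONST);

    pair.angularDistance = 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
  });
}

// Made-up input: two pairs of points keyed by name.
var routes = {
  samePlace: { a: new Point(10, 20), b: new Point(10, 20) },
  quarterTurn: { a: new Point(0, 0), b: new Point(0, 90) }
};

calculateAngularDistanceOfObject(routes);

// Assumption: multiplying by the radius (miles) turns the angle into a distance.
var EARTH_RADIUS = 3959;
var quarterTurnMiles = EARTH_RADIUS * routes.quarterTurn.angularDistance;

console.log(routes.samePlace.angularDistance);
console.log(routes.quarterTurn.angularDistance, quarterTurnMiles);
```

Two points a quarter of the way around the equator should come out at roughly π/2 radians, which is a quick sanity check on the formula.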

Benchmarking

Choosing the right pattern depends on balancing maintainability with performance. Of the two patterns, forEach is generally considered more readable. Iterating over large arrays tends to be slower with forEach than with a plain for loop (although still faster than the old ES3 way), but it's important to benchmark your own code correctly before making a decision.

One popular solution for Node is node-bench (npm: bench), written by Isaac Schlueter. After installing it, here is something to start with:

var bench = require('bench')  
  , obj = { zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7, eight: 8, nine: 9 };

// This is to simulate the object having non-enumerable properties
Object.defineProperty(obj, 'z', { value: 26, enumerable: false });

exports.compare = {  
  'old way': function() {
    for (var name in obj) {
      if (obj.hasOwnProperty(name))
        obj[name];
    }
  },

  'loop array': function() {
    var keys = Object.keys(obj)
      , i
      , l;

    for (i = 0, l = keys.length; i < l; i++)
      obj[keys[i]];
  },

  'foreach loop': function() {
    Object.keys(obj).forEach(function(key) {
      obj[key];
    });
  }
};

// This is the number of iterations of each test we want to run
bench.COMPARE_COUNT = 8;  
bench.runMain();
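If installing a module is not an option, a rough comparison can be sketched with Node's built-in `process.hrtime.bigint()` (available in recent Node versions). This is not a substitute for node-bench: the `timeIt` helper, the `candidates` and `results` names, and the iteration counts are all assumptions chosen for illustration. Each candidate returns a sum so the loop bodies do real work, since an empty expression like `obj[key];` may be optimized away.

```javascript
var obj = { zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7, eight: 8, nine: 9 };

// Same non-enumerable property as in the node-bench example.
Object.defineProperty(obj, 'z', { value: 26, enumerable: false });

var candidates = {
  'old way': function () {
    var sum = 0;
    for (var name in obj) {
      if (Object.prototype.hasOwnProperty.call(obj, name)) sum += obj[name];
    }
    return sum;
  },

  'loop array': function () {
    var keys = Object.keys(obj), sum = 0, i, l;
    for (i = 0, l = keys.length; i < l; i++) sum += obj[keys[i]];
    return sum;
  },

  'foreach loop': function () {
    var sum = 0;
    Object.keys(obj).forEach(function (key) {
      sum += obj[key];
    });
    return sum;
  }
};

// Runs fn `iterations` times and reports elapsed milliseconds.
// `sink` accumulates the return values so the calls cannot be skipped.
function timeIt(fn, iterations) {
  var sink = 0;
  var start = process.hrtime.bigint();
  for (var i = 0; i < iterations; i++) sink += fn();
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { ms: elapsedMs, sink: sink };
}

var results = {};
Object.keys(candidates).forEach(function (name) {
  timeIt(candidates[name], 1000); // warm-up pass, result discarded
  results[name] = timeIt(candidates[name], 100000);
  console.log(name + ': ' + results[name].ms.toFixed(2) + ' ms');
});
```

The numbers vary between runs and machines, so treat them as a rough ordering at best and rerun several times before drawing conclusions.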