


Node Roundup: parse5, redis-timeseries, request-as-curl



parse5 (GitHub: inikulin / parse5, License: MIT, npm: parse5) by Ivan Nikulin is a new HTML5 parser, based on the WhatWG HTML5 standard. It was built for a commercial project called TestCafé, after the authors found existing HTML5 parsers to be too slow or inaccurate.

It's used like this:

var Parser = require('parse5').Parser;  
var parser = new Parser();  
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
var fragment = parser.parseFragment('<title>Parse5 is &#102;&#117;&#99;&#107;ing awesome!</title><h1>42</h1>');  

I had a look at the source, and it doesn't look like it was made with a parser generator. It has a preprocessor, tokenizer, and special UTF-8 handling. There are no dependencies, other than nodeunit for testing. The tests were derived from html5lib, and include over 8000 test cases.

If you want to use it, you'll probably need to write a "tree adapter". Ivan has included an example tree adapter, which reminds me of writing SAX parser callbacks.
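
To give an idea of the shape such an adapter takes, here's a hedged sketch -- the method names below are illustrative, not copied from parse5's bundled example, so check the repository for the real interface:

// Hypothetical tree adapter: the parser calls methods like these
// as it builds the document tree.
var treeAdapter = {
  createDocument: function() {
    return { nodeName: '#document', childNodes: [] };
  }
, createElement: function(tagName, attrs) {
    return { tagName: tagName, attrs: attrs, childNodes: [] };
  }
, appendChild: function(parent, node) {
    parent.childNodes.push(node);
  }
, insertText: function(parent, text) {
    parent.childNodes.push({ nodeName: '#text', value: text });
  }
};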

Ivan also sent in mods, which is a module system designed to need less boilerplate than AMD-style libraries.

Redis Time Series

Tony Sokhon sent in redis-timeseries (GitHub: tonyskn / node-redis-timeseries, License: MIT, npm: redis-timeseries), a project for managing time series data. I've used Redis a few times as a data sink for projects that need realtime statistics, and I always found it worked well for the modest amounts of data my projects generated. This project gives things a bit more structure -- you can create instances of time series and then record hits, then query them later.

A time series has a granularity, so you can store statistics at whatever resolution you require: ts.getHits('your_stats_key', '1second', ts.minutes(3), callback). This module is used by Tony's dashboard project, which can be used to make a realtime dashboard.
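
Going by the project's README at the time of writing, basic usage looks roughly like this -- recordHit stores a hit for each configured granularity, and getHits reads them back. Treat this as a sketch and check the README for the exact API:

var TimeSeries = require('redis-timeseries')
  , redis = require('redis').createClient()
  , ts = new TimeSeries(redis, 'stats');

// Record hits; calls are chainable and exec() flushes them to Redis
ts.recordHit('your_stats_key')
  .recordHit('another_key')
  .exec();

// Read back the last three minutes at one-second granularity
ts.getHits('your_stats_key', '1second', ts.minutes(3), function(err, data) {
  // data is a series of [timestamp, count] pairs
  console.log(data);
});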


request-as-curl (GitHub: azproduction / node-request-as-curl, License: BSD, npm: request-as-curl) by Mikhail Davydov serialises options for http.ClientRequest into an equivalent curl command. It also works for Express.

var request = require('request')
  , curlify = require('request-as-curl')
  , data = { data: 'data' };

// http.ClientRequest:
var req = request('http://google.com/', { method: 'POST', json: data }, function(error, response, expected) {
  curlify(req.req, data);
  // curl 'http://google.com' -H 'accept: application/json' -H 'content-type: application/json' -H 'connection: keep-alive' --data '{"data":"data"}' --compressed
});

// Express:
app.get('/', function(req) {
  curlify(req);
  // curl 'http://localhost/pewpew' -H 'x-real-ip:' -H etc...
});

I imagine Mikhail has been using this so he can replicate requests based on logs to aid in debugging.



Node Roundup: Fastworks.js, Probability.js, Colony


You can send in your Node projects for review through our contact form or @dailyjs.


Fastworks.js (License: GPL3, npm: fastworks) by Robee Shepherd is an alternative to Connect. It includes "stacks" for organising middleware, and middleware for routing, static files, compression, cookies, query strings, bodies in various formats (including JSON), and a lot more. It can also work with Connect modules.

StaticFile serves things like images, style sheets, and JavaScript files, using the pretty nifty Lactate node module. According to the author's benchmarks, it can handle more than twice the requests per second of Connect's send module.

That Lactate module sounds promising. On the subject of performance, one motivation for developing Fastworks.js was speed, but as yet it doesn't include benchmarks or tests. Hopefully the author will add these later so we can see how it shapes up against Connect.



Probability.js (License: MIT) by Florian Schäfer is a fascinating little project that helps call functions based on probabilities. Functions are paired with a probability, so they'll only be called some of the time.
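
The underlying idea is easy to sketch. This isn't Probability.js's actual API -- just a minimal illustration of calling one function from a weighted list:

// Minimal sketch: call one function from a list of { probability, fn }
// pairs, where the probabilities sum to 1.
function weightedCall(pairs) {
  var r = Math.random()
    , sum = 0;

  for (var i = 0; i < pairs.length; i++) {
    sum += pairs[i].probability;
    if (r < sum) return pairs[i].fn();
  }
}

weightedCall([
  { probability: 0.1, fn: function() { console.log('rare item drops'); } }
, { probability: 0.9, fn: function() { console.log('common item drops'); } }
]);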

That doesn't sound useful on the surface, but the author suggests it could be useful in game development. If you've played the recent XCOM game, though, you may be disillusioned with randomness in games -- a well-trodden topic in the game development community. Analysis: Games, Randomness And The Problem With Being Human by Mitu Khandaker is an interesting look at games and chance.



Colony (GitHub: hughsk / colony, License: MIT, npm: colony) by Hugh Kennedy displays network graphs of links between Node code and its dependencies, using D3.js.

The network can be navigated by clicking on files -- the relevant source will be displayed in a panel. Files are coloured in groups based on dependencies, so it's an intuitive way to navigate complex projects.



Brain Training Node


Game scraper

The other day a friend asked me about the validity of video game review scores. There was an accusation of payola against a well-known games magazine, and the gaming community was trying to work out how accurate the magazine's scores were. My programmer's brain immediately thought up ways to solve this -- would a naive Bayesian classifier be sufficient to predict review scores given enough reviews?

The answer to that particular question is beyond the scope of this article. If you're interested in statistical tests for detecting fraudulent data, then Benford's law is a better starting point.
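
For the curious: Benford's law predicts that the leading digit d of many naturally occurring data sets appears with probability log10(1 + 1/d), so data with suspiciously uniform leading digits may have been fabricated. The expected distribution is quick to compute:

// Expected leading-digit frequencies under Benford's law
for (var d = 1; d <= 9; d++) {
  var p = Math.log(1 + 1 / d) / Math.LN10;
  console.log(d + ':', (p * 100).toFixed(1) + '%');
}
// 1: 30.1%, 2: 17.6%, 3: 12.5%, ... 9: 4.6%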

Anyway, I couldn't resist writing some Bayes experiments in Node, and the result is this brief tutorial.

This tutorial introduces naive Bayes classifiers through the classifier module by Heather Arthur, and uses it to classify article text from the web through the power of scraping. It's purely educational rather than genuinely useful, but if you write something interesting based on it let me know in the comments and I'll check it out!


To complete this tutorial, the following things are required:

  • A working installation of Node
  • Basic Node and npm knowledge
  • Redis


Completing this tutorial will teach you:

  • The basics of Bayesian classification
  • How to use the classifier module
  • Web scraping

Getting Started

Like all Node projects, this one needs a package.json. Nothing fancy, but enough to express the project's dependencies:

  "author": "Alex R. Young"
, "name": "brain-training"
, "version": "0.0.1"
, "private": true
, "dependencies": {
    "classifier": "latest"
  , "request": "latest"
  , "cheerio": "latest"
, "devDependencies": {
    "mocha": "latest"
  "engines": {
    "node": "0.8.8"

The cheerio module implements a subset of jQuery, and a small DOM model. It's a handy way to parse web pages where accuracy isn't required. If you need a more accurate DOM simulation, the popular choice is JSDOM.
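
As a quick taste of how cheerio is used throughout this tutorial:

var cheerio = require('cheerio');

// Load an HTML fragment, then query it with jQuery-style selectors
var $ = cheerio.load('<h2 class="title">Hello world</h2>');
console.log($('.title').text()); // => 'Hello world'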

Core Module

The classifier module has an extremely simple API. It can work with in-memory data, but I wanted to persist data with Redis. To centralise this so we don't have to keep redefining the Redis configuration, the classifier module can be wrapped up like this:

var classifier = require('classifier')
  , bayes;

bayes = new classifier.Bayesian({
  backend: {
    type: 'Redis'
  , options: {
      hostname: 'localhost'
    , port: 6379
    , name: 'gamescores'
    }
  }
});

module.exports = {
  bayes: bayes
};
Now other scripts can load this file, and run train or classify as required. I called it core.js.

Naive Bayes Classifiers

The classifier itself implements a naive Bayes classifier. Such algorithms have been used as the core of many spam filtering solutions since the mid-1990s. Recently a book about Bayesian statistics, Think Bayes, was featured on Hacker News and garnered a lot of praise from the development community. It's a free book by Allen Downey and makes a difficult subject relatively digestible.

The spam filtering example is probably the easiest way to get started with Bayes. It works by assigning each word in an email a probability of being ham or spam. When an email is marked as spam, each word is weighted accordingly -- this process is known as training. When a new email arrives, the filter can add up the probabilities of each word, and if a certain threshold is reached the mail is marked as spam. This is known as classification.

What makes this type of filtering naive is that each word is considered an independent "event", but in reality the position of a word is important due to the grammatical rules of the language. Even with this arguably flawed assumption, naive classifiers perform well enough to help with a wide range of problems.
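
As a toy illustration of the arithmetic (this is not the classifier module's internals, and the counts below are invented): suppose "free" appeared in three of four training spams and one of four hams. Per-word probabilities are combined per category, and the highest score wins:

// Invented training counts, for illustration only
var counts = {
  spam: { free: 3, meeting: 0, total: 4 }
, ham:  { free: 1, meeting: 3, total: 4 }
};

// P(word | category), with add-one smoothing so unseen words
// don't zero out the whole product
function wordProb(word, category) {
  return ((counts[category][word] || 0) + 1) / (counts[category].total + 2);
}

// Unnormalised score: P(category) * product of P(word | category)
function classify(words) {
  var scores = { spam: 0.5, ham: 0.5 };
  Object.keys(scores).forEach(function(category) {
    words.forEach(function(word) {
      scores[category] *= wordProb(word, category);
    });
  });
  return scores.spam > scores.ham ? 'spam' : 'ham';
}

console.log(classify(['free']));    // => 'spam'
console.log(classify(['meeting'])); // => 'ham'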

The Wikipedia page for Bayesian spam filtering goes into more detail, relating spam filtering algorithms to the formulas required to calculate probabilities.


Create a new file called train.js as follows:

var cheerio = require('cheerio')
  , request = require('request')
  , bayes = require('./core').bayes;

function parseReview(html) {
  var $ = cheerio.load(html)
    , score
    , article;

  article = $('.copy .section p').text();
  score = $('[typeof="v:Rating"] [property="v:value"]').text();
  score = parseInt(score, 10);

  return { score: score, article: article };
}

function fetch(i) {
  var trained = 0;

  request('http://www.eurogamer.net/ajax.php?action=frontpage&page=' + i + '&type=review', function(err, response, body) {
    var $ = cheerio.load(body)
      , links = [];

    // Collect unique review links, stripping #comments fragments
    $('.article a').each(function(i, a) {
      var url;
      if (a.attribs) {
        url = 'http://www.eurogamer.net/' + a.attribs.href.split('#')[0];
        if (links.indexOf(url) === -1) {
          links.push(url);
        }
      }
    });

    var left = links.length;

    links.forEach(function(link) {
      console.log('Fetching:', link);
      request(link, function(err, response, body) {
        var review = parseReview(body)
          , category;

        if (review.score > 0 && review.score <= 5) {
          category = 'bad';
        } else if (review.score > 5 && review.score <= 10) {
          category = 'good';
        }

        if (category) {
          console.log(category + ':', review.score);
          bayes.train(review.article, category);
          trained++;
        }

        left--;
        if (left === 0) {
          console.log('Trained:', trained);
        }
      });
    });
  });
}

fetch(1);

This code is tailored for Eurogamer. If I wanted to write a production version, I'd separate out the scraping code from the training code. Here I just want to illustrate how to scrape and train the classifier.

The parseReview function uses the cheerio module to pull out the review's paragraph tags and extract the text. This is pretty easy because cheerio automatically operates on arrays of nodes, so $('.copy .section p').text() will return a block of text for each paragraph without any extra effort.

The fetch function could be adapted to call Eurogamer's article paginator recursively, but I thought if I put that in there they'd get angry if enough readers tried it out! In this example, fetch will download each article from the first page. I've tried to ensure unique links are requested by creating an array of links and then calling Array.prototype.indexOf to see if the link is already in the array. It also strips out links with hash URLs, because Eurogamer includes an extra #comments link.

Once the unique list of links has been generated, each one is downloaded. It's worth noting that I use Mikeal Rogers' request module here to simplify HTTP requests -- Node's built-in HTTP client library is fine, but Mikeal's module cuts down a bit of boilerplate code. I use it in a lot of projects, from web scrapers to crawlers, and interacting with RESTful APIs.

The scraper code in parseReview pulls the score out of the HTML, and fetch uses it to categorise each article: scores from 1 to 5 are categorised as 'bad', and 6 to 10 as 'good'.


To classify new text, we need to fetch it and call bayes.classify on it. The following script, classify.js, expects review URLs from Edge magazine. For example: Torchlight II review.

var request = require('request')
  , cheerio = require('cheerio')
  , bayes = require('./core').bayes;

request(process.argv[2], function(err, request, body) {
  if (err) {
    console.error('Unable to fetch URL:', err);
  } else {
    var $ = cheerio.load(body)
      , text = $('.post-page p').text();

    bayes.classify(text, function(category) {
      console.log('category:', category);
    });
  }
});
Again, cheerio is used to pull out article text, and then it's handed off to bayes.classify. Notice that the call to classify looks asynchronous -- I quite like the idea of building a simple reusable asynchronous Node Bayes classifier service using Redis.

This script can be run like this:

node classify.js http://www.edge-online.com/review/liberation-maiden-review/  


I've combined my interest in computer and video games with Node to attempt to use a naive Bayes classifier to determine if text about a given game is good or bad. Of course, this is a lot more subjective than the question of ham or spam, so the value is limited. However, hopefully you can see how easy the classifier module makes Bayesian statistics, and you should be able to adapt this code to work with other websites or plain text files.

Heather Arthur has also written brain, a neural network library. We've featured it before on DailyJS, but as there are only three dependents on npm I thought it was worth bringing it up again.



jQuery Roundup: jStat, Art Text Light, Portamento


Note: You can send your plugins and articles in for review through our contact form or @dailyjs.


jStat (GitHub, License: MIT) is a statistical library that includes general statistical functions and a large set of probability distributions, with a few more that appear to be works in progress. Each distribution has the following methods: pdf, cdf, inv, mean, median, mode, and variance.

jStat also includes many commonly used functions as static methods, and even includes tools for generating random numbers and manipulating arrays of values.
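
Here's a hedged sketch of the flavour of the API, based on jStat's documentation at the time -- double-check the method names against the current docs:

// Instance API: construct a distribution, then query it
var normal = jStat.normal(0, 1); // standard normal
console.log(normal.pdf(0));      // density at x = 0
console.log(normal.cdf(1.96));   // P(X <= 1.96)

// Static helpers operate on plain arrays of values
console.log(jStat.mean([1, 2, 3, 4])); // 2.5
console.log(jStat.sum([1, 2, 3, 4]));  // 10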

The reason I've diligently waded through the source linking each distribution to Wikipedia is to demonstrate the scope of the library, because documentation is currently scant. The source is easy to follow, though, so if you're interested in using the library for something serious, don't be afraid to check it out.

Although not a jQuery library, it's designed to work in browsers and can work with the jQuery Flot plotting plugin.

Art Text Light

Art Text Light (License: MIT) applies CSS to elements over a repeated interval. The author has used this to apply text-shadow, creating an interesting effect. Any CSS properties or time intervals could be used, so I can imagine using it to set up a fast, shimmering text effect on a gaming site to direct people to a button. Perhaps.


Portamento (GitHub: krisnoble / Portamento, License: GPLv3) by Kris Noble solves a problem that I had with DailyJS's page design. It allows panels to float vertically, with lots of customisation possibilities. For example, panels can be made to stop at a given point. It's used like this:

$('#sidebar').portamento({ wrapper: $('#wrapper') });

It also adapts well to small displays: if the user's viewport is too small to display the whole panel, Portamento behaves sensibly, so you don't need to worry about users not being able to see your important content.