The JavaScript blog.



Hello.js, ineed



Andrew Dodson sent in hello.js (GitHub: MrSwitch / hello.js, License: MIT, npm: hellojs), a client-side API wrapper for OAuth2-based REST APIs. It presents a unified API that normalizes paths and responses for Google Data Services, Facebook Graph and Windows Live Connect.

One of the advantages of hello.js is that it's modular. There are hello.js modules for Dropbox, LinkedIn, SoundCloud, and Yahoo.

The module API allows you to define things like JSONP functions, so it should be flexible enough to handle a lot of modern services.
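To make "normalizes paths and responses" a little more concrete, here's a rough sketch of how per-service modules could map a shared path onto each provider's endpoint and reshape its response. This is not hello.js's actual module format: the service names (alpha, beta), the .example URLs and the response fields are all made up for illustration.

```javascript
// Hypothetical modules: each maps shared path names onto a provider's own
// endpoints and reshapes that provider's response into one common format.
var modules = {
  alpha: {
    base: 'https://api.alpha.example/v1/',
    paths: { me: 'users/self' },
    wrap: function (res) {
      return { id: String(res.user_id), name: res.full_name };
    }
  },
  beta: {
    base: 'https://beta.example/api/',
    paths: { me: 'profile' },
    wrap: function (res) {
      return { id: String(res.id), name: res.first + ' ' + res.last };
    }
  }
};

// Turn a shared path such as 'me' into the provider-specific URL.
// Paths without a mapping are passed through unchanged.
function resolvePath(service, path) {
  var mod = modules[service];
  if (!mod) throw new Error('Unknown service: ' + service);
  var mapped = Object.prototype.hasOwnProperty.call(mod.paths, path) ? mod.paths[path] : path;
  return mod.base + mapped;
}

// Convert a raw provider response into the common shape.
function normalise(service, raw) {
  return modules[service].wrap(raw);
}

console.log(resolvePath('beta', 'me')); // https://beta.example/api/profile
console.log(normalise('alpha', { user_id: 42, full_name: 'Ada Lovelace' }));
```

Application code only ever asks for 'me' and reads id and name; the per-service differences live entirely in the modules.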

HelloJS has been on Hacker News, with a discussion on security, and endorsements from users:

HelloJS is great. I've used it in my last project. It just works. It's well tested, and well documented. There's very little option twiddling required. It just worked seamlessly when I was trying to setup Twitter, Google, LinkedIn and Facebook OAuth logins.


Ivan Nikulin wrote in to say parse5 has a new SAX-style HTML parser which powers the ineed project:

ineed allows you collect useful data from web pages using simple and nice API. Let's collect images, hyperlinks, scripts and stylesheets from www.google.com:

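var ineed = require('ineed');

ineed.collect.images.hyperlinks.scripts.stylesheets.from('http://www.google.com',
  function (err, response, result) {
    console.log(result);
  });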

Internally, ineed uses streams of HTML tokens so it doesn't have to spend time building and traversing a DOM tree. It seems like an ideal way to handle lots of otherwise awkward scraping tasks.
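The advantage of handling a token stream instead of a tree is easiest to see in a toy. The sketch below is not the parse5 or ineed API: tokenize is a deliberately naive, regex-based scanner for start tags (it ignores comments, script contents, unquoted attributes and much else that a real HTML5 tokenizer handles), and collectLinks reacts to each token as it arrives without ever building a DOM.

```javascript
// Toy tokenizer (hypothetical, not HTML5-compliant): emits one token per
// start tag, with attributes written in double or single quotes.
function tokenize(html, onToken) {
  var tagRe = /<([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/g;
  var attrRe = /([a-zA-Z-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;
  var tag, attr;
  while ((tag = tagRe.exec(html)) !== null) {
    var attrs = {};
    attrRe.lastIndex = 0; // reuse the global regex for each tag
    while ((attr = attrRe.exec(tag[2])) !== null) {
      attrs[attr[1].toLowerCase()] = attr[3] !== undefined ? attr[3] : attr[4];
    }
    onToken({ type: 'startTag', tagName: tag[1].toLowerCase(), attrs: attrs });
  }
}

// Keep only what we need from each token as it streams past; no tree is built.
function collectLinks(html) {
  var result = { images: [], hyperlinks: [] };
  tokenize(html, function (token) {
    if (token.tagName === 'img' && token.attrs.src) result.images.push(token.attrs.src);
    if (token.tagName === 'a' && token.attrs.href) result.hyperlinks.push(token.attrs.href);
  });
  return result;
}

console.log(collectLinks('<a href="/about">About</a><img src="logo.png">'));
```

Memory use stays proportional to what you collect rather than to the size of the document, which is the main attraction for scraping.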



Natural Language Parsing with Retext


Retext (GitHub: wooorm / retext, License: MIT, npm: retext) by Titus Wormer is an extensible module for analysing and manipulating natural language text. It's built on two other modules by the same author. One is TextOM, which provides an object system for manipulating text, and the other is ParseLatin.

Given some text, ParseLatin returns syntax trees:

parseLatin.parse('A simple sentence.');
/*
 * ˅ Object
 *    ˃ children: Array[1]
 *      type: "RootNode"
 *    ˃ __proto__: Object
 */

These trees can then be processed as required: you can iterate over nodes or search them for values. It's a bit like a DOM for plain text (or syntax/grammar).
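Here's a rough idea of what iterating over such a tree can look like. The tree is written by hand, and its shape (type, children, and a value on leaf nodes) along with the node type names are assumptions loosely modelled on the output above, so check them against ParseLatin's real output before relying on them.

```javascript
// Helper for the hand-written tree: a node whose only child is a text leaf.
function node(type, value) {
  return { type: type, children: [{ type: 'TextNode', value: value }] };
}

// Assumed shape: Root > Paragraph > Sentence > words, spaces and punctuation.
var tree = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        node('WordNode', 'A'), node('WhiteSpaceNode', ' '),
        node('WordNode', 'simple'), node('WhiteSpaceNode', ' '),
        node('WordNode', 'sentence'), node('PunctuationNode', '.')
      ]
    }]
  }]
};

// Depth-first, pre-order traversal calling visit(node) on every node.
function walk(n, visit) {
  visit(n);
  (n.children || []).forEach(function (child) {
    walk(child, visit);
  });
}

// Concatenate the values of all leaves below a node.
function toText(n) {
  var text = '';
  walk(n, function (m) {
    if (typeof m.value === 'string') text += m.value;
  });
  return text;
}

// Find every node of a given type and return its text.
function collect(root, type) {
  var found = [];
  walk(root, function (m) {
    if (m.type === type) found.push(toText(m));
  });
  return found;
}

console.log(collect(tree, 'WordNode')); // [ 'A', 'simple', 'sentence' ]
```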

The Retext module has lots of plugins. One example is retext-double-metaphone, an implementation of the Double Metaphone algorithm. There's also a short-code emoji parser, which shows that Retext can be used to build tightly focused text-processing modules. Another similar plugin is a typographic parsing library, which converts ASCII to HTML entities.

One cool use of Retext would be natural language date parsing, which is something that in my experience always ends up in a horrible mess of regular expressions. The author is still looking for a "retext-date" implementation, so it would be interesting to see what that looks like in Retext.



Node Roundup: parse5, redis-timeseries, request-as-curl



parse5 (GitHub: inikulin / parse5, License: MIT, npm: parse5) by Ivan Nikulin is a new HTML5 parser, based on the WHATWG HTML5 standard. It was built for a commercial project called TestCafé, when the authors found other HTML5 parsers to be too slow or inaccurate.

It's used like this:

var Parser = require('parse5').Parser;
var parser = new Parser();

// Parse a complete document...
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');

// ...or just a fragment of HTML.
var fragment = parser.parseFragment('<title>Parse5 is &#102;&#117;&#99;&#107;ing awesome!</title><h1>42</h1>');

I had a look at the source, and it doesn't look like it was made with a parser generator. It has a preprocessor, tokenizer, and special UTF-8 handling. There are no dependencies, other than nodeunit for testing. The tests were derived from html5lib, and include over 8000 test cases.

If you want to use it, you'll probably need to write a "tree adapter". Ivan has included an example tree adapter, which reminds me of writing SAX parser callbacks.
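To show why a tree adapter feels like writing SAX callbacks, here's a minimal sketch. The method names (createDocument, createElement, appendChild, insertText) echo hooks in parse5's tree adapter interface, but the real interface has more methods and its details have changed between versions, so treat this as an illustration rather than a drop-in adapter. The simulateParse function is a hypothetical stand-in for the parser.

```javascript
// Toy adapter: builds plain JSON objects instead of DOM nodes.
var jsonAdapter = {
  createDocument: function () {
    return { type: 'document', children: [] };
  },
  createElement: function (tagName, namespaceURI, attrs) {
    return { type: 'element', tagName: tagName, attrs: attrs, children: [] };
  },
  appendChild: function (parent, child) {
    parent.children.push(child);
  },
  insertText: function (parent, text) {
    parent.children.push({ type: 'text', value: text });
  }
};

// Hypothetical stand-in for the parser: a real parser would make these calls
// while tokenizing; here the calls for '<h1>42</h1>' are hard-coded.
function simulateParse(adapter) {
  var doc = adapter.createDocument();
  var h1 = adapter.createElement('h1', 'http://www.w3.org/1999/xhtml', []);
  adapter.appendChild(doc, h1);
  adapter.insertText(h1, '42');
  return doc;
}

var doc = simulateParse(jsonAdapter);
console.log(JSON.stringify(doc));
```

Because the parser only talks to the adapter, swapping jsonAdapter for one that creates real DOM nodes (or anything else) changes the output without touching the parsing code.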

Ivan also sent in mods, which is a module system designed to need less boilerplate than AMD-style libraries.

Redis Time Series

Tony Sokhon sent in redis-timeseries (GitHub: tonyskn / node-redis-timeseries, License: MIT, npm: redis-timeseries), a project for managing time series data. I've used Redis a few times as a data sink for projects that need realtime statistics, and I always found it worked well for the modest amounts of data my projects generated. This project gives things a bit more structure: you can create instances of time series, record hits, and query them later.

A time series has a granularity, so you can store statistics at whatever resolution you require: ts.getHits('your_stats_key', '1second', ts.minutes(3), callback). This module is used by Tony's dashboard project, a tool for building realtime dashboards.
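The granularity idea can be sketched without Redis at all: round each timestamp down to the start of its bucket and keep a counter per bucket. This is an in-memory illustration, not how redis-timeseries stores data; MemoryTimeSeries, its granularity names and the getHits arguments (a bucket count and a "now" timestamp in seconds) are hypothetical.

```javascript
// Hypothetical granularities: bucket size in seconds.
var GRANULARITIES = { '1second': 1, '1minute': 60, '1hour': 3600 };

// In-memory stand-in for a Redis-backed series: key -> granularity -> bucket -> hits.
function MemoryTimeSeries() {
  this.counts = {};
}

// Record one hit for `key` at `time` (seconds since the epoch) at every granularity.
MemoryTimeSeries.prototype.recordHit = function (key, time) {
  var byGran = this.counts[key] || (this.counts[key] = {});
  Object.keys(GRANULARITIES).forEach(function (gran) {
    var size = GRANULARITIES[gran];
    var bucket = Math.floor(time / size) * size; // start of the bucket
    var buckets = byGran[gran] || (byGran[gran] = {});
    buckets[bucket] = (buckets[bucket] || 0) + 1;
  });
  return this; // allow chaining
};

// Return [bucketStart, hits] pairs for the `count` buckets ending at `now`.
MemoryTimeSeries.prototype.getHits = function (key, gran, count, now) {
  var size = GRANULARITIES[gran];
  var buckets = (this.counts[key] || {})[gran] || {};
  var last = Math.floor(now / size) * size;
  var result = [];
  for (var t = last - (count - 1) * size; t <= last; t += size) {
    result.push([t, buckets[t] || 0]);
  }
  return result;
};

var stats = new MemoryTimeSeries();
var now = Math.floor(Date.now() / 1000);
stats.recordHit('your_stats_key', now).recordHit('your_stats_key', now);
console.log(stats.getHits('your_stats_key', '1second', 3, now));
```

Recording every hit at every granularity costs a little extra space, but it means any resolution can be queried later without re-aggregating.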


request-as-curl (GitHub: azproduction / node-request-as-curl, License: BSD, npm: request-as-curl) by Mikhail Davydov serialises options for http.ClientRequest into an equivalent curl command. It also works for Express.

var curlify = require('request-as-curl');
var request = require('request');
var express = require('express');
var app = express();
var data = {data: 'data'};

// http.ClientRequest:
var req = request('http://google.com/', {method: 'POST', json: data}, function (error, response, expected) {
  curlify(req.req, data);
  // curl 'http://google.com' -H 'accept: application/json' -H 'content-type: application/json' -H 'connection: keep-alive' --data '{"data":"data"}' --compressed
});

// Express:
app.get('/', function (req) {
  curlify(req);
  // curl 'http://localhost/pewpew' -H 'x-real-ip:' -H etc...
});

I imagine Mikhail has been using this so he can replicate requests based on logs to aid in debugging.
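Most of the work in a serialiser like this is building the arguments and quoting them safely for the shell. A minimal sketch, assuming a plain {url, method, headers, body} object rather than a real http.ClientRequest; toCurl and shellQuote are hypothetical helpers, not request-as-curl's implementation, and only POSIX single-quote escaping is handled.

```javascript
// Wrap a value in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical serialiser: plain request description -> curl command line.
function toCurl(options) {
  var parts = ['curl', shellQuote(options.url)];
  var method = (options.method || 'GET').toUpperCase();
  if (method !== 'GET') parts.push('-X', method);
  Object.keys(options.headers || {}).forEach(function (name) {
    parts.push('-H', shellQuote(name + ': ' + options.headers[name]));
  });
  if (options.body !== undefined) {
    // Objects are sent as JSON, strings as-is.
    var body = typeof options.body === 'string' ? options.body : JSON.stringify(options.body);
    parts.push('--data', shellQuote(body));
  }
  return parts.join(' ');
}

console.log(toCurl({ url: 'http://example.com/', headers: { accept: 'application/json' } }));
// curl 'http://example.com/' -H 'accept: application/json'
```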



memdiff, numerizerJS, Obfuscate.js



memdiff (GitHub: azer / memdiff, License: WTFPL, npm: memdiff) by Azer Koculu is a BDD-style memory leak tool based on memwatch. It can be used either by writing scripts with describe and it, and then running them with memdiff:

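function SimpleClass(){}
var leaks = [];

describe('SimpleClass', function() {
  it('is leaking', function() {
    // Instances stay reachable through `leaks`, so the heap keeps growing
    leaks.push(new SimpleClass);
  });

  it('is not leaking', function() {
    // Nothing references the instance, so it can be garbage collected
    new SimpleClass;
  });
});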

Or by loading memdiff with require and passing a callback to memdiff. The memwatch module itself has an event-based API, and includes a platform-independent native module -- so both of these projects are tied to Node and won't work in a browser.
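memwatch gets its data through a native module. A much cruder version of the same before-and-after idea can be sketched with Node's built-in process.memoryUsage(); heapGrowth is a hypothetical helper, and since garbage collection timing isn't deterministic its numbers are only a rough signal. Running Node with --expose-gc lets it force a collection before each measurement.

```javascript
// Force a collection if Node was started with --expose-gc; otherwise do nothing.
function maybeGC() {
  if (typeof global.gc === 'function') global.gc();
}

// Hypothetical helper: run `fn` `iterations` times and report heap growth in bytes.
function heapGrowth(fn, iterations) {
  maybeGC();
  var before = process.memoryUsage().heapUsed;
  for (var i = 0; i < iterations; i++) fn();
  maybeGC();
  return process.memoryUsage().heapUsed - before;
}

function SimpleClass() {}
var leaks = [];

var leaking = heapGrowth(function () { leaks.push(new SimpleClass()); }, 100000);
var notLeaking = heapGrowth(function () { new SimpleClass(); }, 100000);
console.log('leaking:', leaking, 'bytes; not leaking:', notLeaking, 'bytes');
```

With --expose-gc the leaking case should show clearly larger growth; without it, both numbers can be noisy.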


numerizerJS (GitHub: bolgovr / numerizerJS, License: MIT, npm: numerizer) by Roman Bolgov is a library for parsing English language string representations of numbers:

var numerizer = require('numerizer');  
numerizer('forty two'); // '42'  

It's currently very simple, and doesn't support browsers out of the box, but I like the fact that the author has included Mocha tests. It'd work well alongside other libraries like Moment.js for providing intuitive text-based interfaces.
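To give a feel for what parsing number words involves, here's a small sketch that handles units, teens, tens, "hundred" and "thousand". It isn't numerizerJS's algorithm or API (numerizer returns a string, while this hypothetical wordsToNumber returns a number), and it ignores ordinals, fractions and anything beyond the thousands.

```javascript
// Word values for everything below one hundred that has its own word.
var SMALL = {
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90
};

// Hypothetical parser: 'forty two' -> 42, 'three hundred and five' -> 305.
function wordsToNumber(text) {
  var total = 0;   // completed thousands
  var current = 0; // value below one thousand being built up
  text.toLowerCase().split(/[\s-]+/).forEach(function (word) {
    if (word === '' || word === 'and') return;
    if (Object.prototype.hasOwnProperty.call(SMALL, word)) {
      current += SMALL[word];
    } else if (word === 'hundred') {
      current *= 100;
    } else if (word === 'thousand') {
      total += current * 1000;
      current = 0;
    } else {
      throw new Error('Unknown number word: ' + word);
    }
  });
  return total + current;
}

console.log(wordsToNumber('forty two')); // 42
```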


Obfuscate.js (GitHub: miohtama / obfuscate.js, License: MIT) by Mikko Ohtamaa is a client-side script that replaces text on a page with nonsense, so private information isn't exposed. Mikko suggests this might be useful for making screenshots, because it removes the need to blur out personal information in post-processing. The obfuscate function takes an optional selector, so either the entire body of a document can be obfuscated, or just the contents of a given selector.

It walks through each child node looking for text nodes. It's lightweight and doesn't have any dependencies. It also tries to make the replacement text look similar (at a glance) to the original.
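The text-node walk can be sketched like this. It's not Obfuscate.js's code: obfuscateNode and scramble are hypothetical, and the rule used here (keep case, length, whitespace and punctuation; swap letters for letters and digits for digits) is just one way to keep the text looking similar at a glance. The function only uses nodeType, nodeName, childNodes and nodeValue, so it runs against a real DOM or, as in the usage at the end, plain objects with the same properties.

```javascript
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM
var LOWER = 'abcdefghijklmnopqrstuvwxyz';
var UPPER = LOWER.toUpperCase();
var DIGITS = '0123456789';

function randomChar(chars) {
  return chars.charAt(Math.floor(Math.random() * chars.length));
}

// Swap each letter or digit for a random one of the same kind, keeping case,
// length, whitespace and punctuation so the text keeps a similar shape.
function scramble(text) {
  return text.replace(/[A-Za-z0-9]/g, function (ch) {
    if (ch >= 'a' && ch <= 'z') return randomChar(LOWER);
    if (ch >= 'A' && ch <= 'Z') return randomChar(UPPER);
    return randomChar(DIGITS);
  });
}

// Hypothetical walker: scramble every text node under `node`, skipping
// <script> and <style> so the page's code and CSS keep working.
function obfuscateNode(node) {
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = scramble(node.nodeValue);
    return;
  }
  if (node.nodeName === 'SCRIPT' || node.nodeName === 'STYLE') return;
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    obfuscateNode(children[i]);
  }
}

// Usage with plain objects standing in for DOM nodes; in a browser you
// could call obfuscateNode(document.body) instead.
var fakeDom = {
  nodeType: 1,
  nodeName: 'P',
  childNodes: [
    { nodeType: 3, nodeValue: 'Call Jane on 555-0199' },
    { nodeType: 1, nodeName: 'STYLE', childNodes: [{ nodeType: 3, nodeValue: 'p{color:red}' }] }
  ]
};
obfuscateNode(fakeDom);
console.log(fakeDom.childNodes[0].nodeValue);
```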



Mystik Map Editor, Outcome.js, TypedJS


Mystik Map Editor

So, you want to build the next Ultima in JavaScript? As well as a game engine, you'll need tools. Mystik Map Editor (GitHub: UrbanTwitch / Mystik-Map-Editor) is an open source tile map editor. The client-side code is built with jQuery and jQuery UI. It supports a few drawing operations, like "brush" tile placement and a line tool, and shows whether or not a tile is walkable.

Pressing "Create Map" will output a JSON representation of the current map. To see an example of a game built with this tile editor, visit mystikrpg.com. Technically, the tile editor could be forked, hacked, and used for anything, so if you do build the next Ultima with server-side JavaScript, get in touch!
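The editor's exact JSON output isn't described here, so the format below is purely hypothetical: a grid of tile ids plus a list of walkable ids. It shows the kind of lookup a game might do with a map exported from a tile editor.

```javascript
// Hypothetical map format: rows of tile ids, plus the ids a player can walk on.
var map = {
  width: 4,
  height: 3,
  walkable: [0, 2], // e.g. 0 = grass, 2 = bridge
  tiles: [
    [0, 0, 1, 0],   // 1 = water
    [0, 1, 1, 0],
    [0, 2, 2, 0]
  ]
};

// True if (x, y) is inside the map and its tile id is walkable.
function isWalkable(m, x, y) {
  if (x < 0 || y < 0 || x >= m.width || y >= m.height) return false;
  return m.walkable.indexOf(m.tiles[y][x]) !== -1;
}

// Round-trip through JSON, as a saved map would be loaded by a game.
var loaded = JSON.parse(JSON.stringify(map));
console.log(isWalkable(loaded, 1, 2)); // true (bridge)
```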


Outcome.js (npm: outcome) by Craig Condon is a flow control library that focuses on error handling. Callbacks with the signature callback(err, result) can be wrapped, allowing error-related code to be grouped together.

It's quite hard to visualise the library without looking at Craig's example in the outcome.js README. Notice that rather than wrapping if statements around the errors returned in callbacks, on.success() is used to control execution.
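The general pattern can be sketched in a few lines. handleErrors is a hypothetical helper written for illustration, not outcome.js's implementation or API: it takes a single error handler and returns an object whose success(fn) method produces a Node-style (err, result) callback that sends errors to the shared handler and only calls fn when there's no error.

```javascript
// Hypothetical helper: one shared error handler, many success-only callbacks.
function handleErrors(onError) {
  return {
    success: function (fn) {
      return function (err, result) {
        if (err) return onError(err);
        fn(result);
      };
    }
  };
}

// Two hypothetical async functions following the (err, result) convention.
function readConfig(callback) {
  setTimeout(function () { callback(null, { port: 8080 }); }, 0);
}
function connect(port, callback) {
  setTimeout(function () { callback(new Error('connection refused on ' + port)); }, 0);
}

var on = handleErrors(function (err) {
  console.error('Failed:', err.message);
});

// No `if (err)` checks in the happy path; failures all go to the handler above.
readConfig(on.success(function (config) {
  connect(config.port, on.success(function () {
    console.log('connected');
  }));
}));
```

Each step in the chain only describes what happens on success, and every failure ends up in one place.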


TypedJS (License: MIT, GitHub: Proxino / TypedJS) by Ethan Fast allows functions to be annotated with type signatures. The library will then output useful logging during execution, allowing any mismatched types to be detected and potentially fixed.

To do this, comments and tests are used. Given a function with a suitable annotation:

var MyObj = {
  //+ MyObj.test_fun :: Number -> Number -> Number
  test_fun: function(num1, num2){
    return num1 + num2;
  }
};

Then the function can be tested using TypedJS.run_tests(). If TypedJS.run_tests(true) is used, TypedJS will wrap functions to actually check for type violations. This is currently aimed at client-side development and requires jQuery, but the author notes that it's early days for the library, so hopefully it'll be extended to run elsewhere.
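The runtime-checking side of this can be sketched by parsing a signature like Number -> Number -> Number and wrapping the function. typed is a hypothetical helper, not TypedJS's implementation: it takes only the part after the :: in the annotation, understands just the primitive names in its checks table, treats the last type as the return type, and throws a TypeError instead of logging.

```javascript
// Predicates for the small set of type names this sketch understands.
var checks = {
  Number: function (v) { return typeof v === 'number'; },
  String: function (v) { return typeof v === 'string'; },
  Boolean: function (v) { return typeof v === 'boolean'; }
};

// Hypothetical wrapper: check arguments and return value against `signature`,
// e.g. 'Number -> Number -> Number' (two Number arguments, Number result).
function typed(signature, fn) {
  var types = signature.split('->').map(function (t) { return t.trim(); });
  types.forEach(function (t) {
    if (!Object.prototype.hasOwnProperty.call(checks, t)) {
      throw new Error('Unsupported type: ' + t);
    }
  });
  var argTypes = types.slice(0, -1);
  var returnType = types[types.length - 1];
  return function () {
    for (var i = 0; i < argTypes.length; i++) {
      if (!checks[argTypes[i]](arguments[i])) {
        throw new TypeError('Argument ' + i + ' should be ' + argTypes[i]);
      }
    }
    var result = fn.apply(this, arguments);
    if (!checks[returnType](result)) {
      throw new TypeError('Return value should be ' + returnType);
    }
    return result;
  };
}

var add = typed('Number -> Number -> Number', function (num1, num2) {
  return num1 + num2;
});
console.log(add(1, 2)); // 3
```

Without the wrapper, add('1', 2) would quietly return the string '12'; with it, the mismatch is reported at the call site.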

Interestingly, this is built using Jison. For those interested in Jison grammars, check out typedjs_parser.jison.