The JavaScript blog.


libraries node modules scraping

Fuel UX Tutorials, JS Unconf

Posted on .

Fuel UX Tutorials

Stephen James from Salesforce Marketing Cloud sent in a tutorial about Fuel UX 3's datagrid. It's split up into sections with sample code and a demo, so it's easy to follow along:

In this month's tutorial, we introduce the repeater control (v3). The repeater is a datagrid that can do tasks such as paging, sorting, and searching. It supports multiple views and custom-rendered cells. This tutorial showcases the most basic setup with a static source of data. In production, you would likely get data through API requests. The tutorial guides the user in defining and manipulating a data-providing and a column-rendering function.
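As a taster, the data-providing function the tutorial describes looks roughly like this. This is a sketch based on my reading of the Fuel UX 3 repeater documentation, not code from the tutorial itself, so names like dataSource and the callback fields should be verified against it:

    // A sketch of a repeater with a static data source.
    // Option and field names follow the Fuel UX 3 repeater docs;
    // verify them against the tutorial before relying on this.
    var data = [
      { name: 'alpha', status: 'active' },
      { name: 'beta',  status: 'inactive' }
    ];

    $('#myRepeater').repeater({
      dataSource: function(options, callback) {
        // options carries the current paging, searching, and sorting state
        callback({
          count: data.length,
          items: data,
          page: options.pageIndex,
          pages: 1,
          columns: [
            { label: 'Name',   property: 'name',   sortable: true },
            { label: 'Status', property: 'status', sortable: true }
          ]
        });
      }
    });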

Stephen also sent in some general information about Fuel UX, which is actually a pretty cool set of extensions for Bootstrap 3 that you can use to build things like web apps, admin areas for content management systems, and so on:

Fuel UX (GitHub: ExactTarget / fuelux, License: BSD-3) extends Bootstrap 3 with additional lightweight controls including combobox, datepicker, infinite scroll, loader, pillbox, placard, repeater, scheduler, search, selectlist, spinbox, tree, and wizard. The extensive documentation site even has a form builder to speed up markup placement.

I've used Fuel UX for a large project and found it worked pretty well, so it's worth trying out if you're looking for a consistent set of controls like wizards and date pickers.

JS Unconf

Philipp Hinrichsen wrote in about JS Unconf, an event that will take place in April in Hamburg. Here are the full details:

  • Location: Universität Hamburg, Hamburg
  • Tickets: €79 (includes entrance to the conference, two parties, and food)
  • Size: 300 people

So, what is an "unconf"? Philipp sent in this definition:

Unconf means unconference. The JS Unconf is a non-profit unconference from the community for the community. There is no schedule or speaker list in advance. Everybody can apply with a talk. Everybody can vote for talks at the beginning of each day of the unconference. The talks which got the most votes from the attendees will be picked. The JS Unconf is organized by the BOOT e.V. and the FSR Informatik der Universität Hamburg.

If you want to contribute to the conference, you can find out how at contriboot.jsunconf.eu.


libraries node modules scraping

X-Ray: A Scraper by the Author of Cheerio

Posted on .


My favourite general-purpose web scraping utility has to be cheerio, a module that provides the simplicity of CSS selectors with fast and forgiving HTML parsing. The author, Matthew Mueller, has now released x-ray (GitHub: lapwinglabs/x-ray, License: MIT, npm: x-ray), a module specifically built for scraping.

The x-ray module depends on Cheerio, but includes some methods to automate common scraping tasks: it can convert collections of items into JavaScript objects, select objects based on flexible schemas, and it can even paginate based on CSS selectors.

Each of these features is well thought out: pagination supports a delay between requests, and you can set a limit to avoid following too many pages. It also integrates well with Node: you can stream to files, for example.

The data sources are flexible as well. If you need to scrape dynamic pages, then you can use x-ray's PhantomJS driver, otherwise superagent is used.
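Switching drivers looks something like this — a minimal sketch, assuming the driver API described in the x-ray-phantom README:

    var Xray = require('x-ray');
    var phantom = require('x-ray-phantom');

    // Render pages in PhantomJS before scraping, for JavaScript-heavy sites
    var xray = Xray().driver(phantom());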

Matthew's example fetches a user's "stars" page from GitHub and then extracts metadata like the repository link, description, and date:

    var Xray = require('x-ray');
    var xray = Xray();

    xray('https://github.com/stars/matthewmueller')
      .select([{
        $root: '.repo-list-item',
        title: '.repo-list-name',
        link: '.repo-list-name a[href]',
        description: '.repo-list-description',
        meta: {
          $root: '.repo-list-meta',
          starredOn: 'time'
        }
      }])
      .paginate('.pagination a:last-child[href]')
      .write('stars.json');

One other cool feature of x-ray is that it supports multiple output formats: you can easily download a page, structure it, and then generate JSON, XML, RSS, Atom, CSV, and so on.

When compared to using a chaotic mix of HTTP and DOM-parsing modules, x-ray gives scraping a more rigorous structure while remaining flexible. That means it would be possible to share x-ray schemas as Node modules, so the community could collaborate on wrapping useful websites that have poor or non-existent APIs. It's true that some people regard scraping as a dark art that borders on copyright theft (when done without permission), but Matthew deftly counters this notion with the slogan "structure any website".
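To illustrate the schema-sharing idea — this is a hypothetical sketch, not an existing package — a schema could simply be a module that exports a plain object:

    // github-stars.js -- a hypothetical shared schema module
    module.exports = {
      $root: '.repo-list-item',
      title: '.repo-list-name',
      link: '.repo-list-name a[href]',
      description: '.repo-list-description'
    };

    // Consumers could then reuse it:
    //   xray('https://github.com/stars/someuser')
    //     .select([require('./github-stars')])
    //     .write('stars.json');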


libraries node email modules scraping windows

Node Roundup: Mailman, trayballoon, unembed

Posted on .


Mailman (GitHub: vdemedes/mailman, License: MIT, npm: mailman) by Vadim Demedes is a module for sending emails that supports generators. It uses nodemailer for sending email, and consolidate.js for templates, which means it supports lots of different template languages.

Generators are used for sending emails, so you can do this:

    var mail = new UserMailer({ to: 'vadim@example.com' }).welcome();
    yield mail.deliver();

Mailman expects a specific directory layout for views, but the added level of structure might help if you've got a big mess of email-related code in your current projects.
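Note that yield only works inside a generator, so in practice the delivery code runs under a generator-based flow-control library like co. A minimal sketch, assuming UserMailer has been defined following Mailman's conventions:

    var co = require('co');

    co(function* () {
      // UserMailer is assumed to be a mailer built with Mailman,
      // with its templates living in the expected views directory
      var mail = new UserMailer({ to: 'vadim@example.com' }).welcome();
      yield mail.deliver();
    });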



trayballoon (GitHub: sindresorhus/trayballoon, License: MIT, npm: trayballoon) by Sindre Sorhus is a module for showing system tray balloons in Windows. You can set the text and image to display, and a callback that will run when the balloon disappears:

    var trayballoon = require('trayballoon');

    trayballoon({
      text: 'Unicorns and rainbows',
      icon: 'ponies.ico',
      timeout: 20000
    }, function() {
      console.log('Trayballoon disappeared');
    });

It also has a command-line tool which you could use to display notifications when things like tests fail. trayballoon works by bundling an executable called nircmdc.exe which is called with child_process.spawn.
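The underlying approach is just child_process.spawn. A simplified sketch of the technique — the argument list here is illustrative, not trayballoon's actual nircmdc invocation:

    var spawn = require('child_process').spawn;

    // Illustrative only: the real nircmdc.exe argument order may differ
    var child = spawn('nircmdc.exe', ['trayballoon', '20000', 'Unicorns and rainbows']);

    child.on('exit', function(code) {
      console.log('nircmdc exited with code %d', code);
    });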


Given some "embed code" for sites like YouTube and Vimeo, unembed (GitHub: colearnr/unembed, License: MIT, npm: unembed) by Prabhu Subramanian will extract the markup and produce a JSON representation. This might be useful if you're scraping sites that use embed codes, like blogs and forums.

I've never thought of applying the tiny modules philosophy to scraping, but it seems like a great way of sharing all of those hacks we use to extract data in a more structured way.


animation dom svg scraping

SVG Circus, Cheers

Posted on .

SVG Circus


SVG Circus by Alex Kaul is a site for generating SVG animations. You can use it to make spinners for loading indicators, or other animations if you get creative.

It's built with AngularJS and Bootstrap, and the Bootstrap customisation looks pretty cool. Animations can be exported as XML with embedded JavaScript to drive the animation.


Yesterday I mentioned ineed, a scraper API based around a streaming tokenizer. Most of my Node scraping work has been written with Cheerio, which is a small jQuery-inspired API for Node. Cheers (GitHub: fallanic / cheers, License: MIT, npm: cheers) by Fabien Allanic is a Cheerio-based scraper library:

The motivation behind this package is to provide a simple cheerio-based scraping tool, able to divide a website into blocks, and transform each block into a JSON object using CSS selectors.

It works by using configuration objects that describe metadata based on CSS selectors, so it may help you to be more pragmatic about how you scrape documents.
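As a rough illustration of that configuration style — the option and field names below are hypothetical, so check the cheers README for the real format:

    var cheers = require('cheers');

    // Hypothetical configuration: option and field names are illustrative
    cheers.scrape({
      url: 'http://example.com/articles',
      blockSelector: '.article',
      scrape: {
        title: '.article-title',
        author: '.byline a',
        link: '.article-title a'
      }
    }, function(err, results) {
      if (err) throw err;
      console.log(results); // one JSON object per block
    });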


node modules scraping social facebook

Node Roundup: hotcode, fbgraph, browser

Posted on .

You can send your node modules and articles in for review through our [contact form](/contact.html) or [@dailyjs](http://twitter.com/dailyjs).


hotcode (License: MIT, npm: hotcode) by Mathias Pettersson is a small Express app that watches for changes on a given path, then reloads an associated app when files change.

To use it, follow the instructions in the project's README file. When hotcode is run it'll print out a link to some JavaScript -- this will need to be added to your project to get automatic reloading in the browser.

Projects can be configured in ~/.hotcode so the path doesn't need to be entered each time hotcode is started. This can be set per-domain as well.

One of the interesting things about hotcode is that it can be run against practically any app in any language. If you're sick of having to restart your app and refresh the browser every time you make a change, then you're going to love this.


fbgraph (GitHub: criso / fbgraph, License: MIT, npm: fbgraph) by Cristiano Oliveira provides consistent access to the Facebook Graph API. According to the author:

All calls will return json. Facebook sometimes decides to just return a string or true or redirects directly to an image.

Given suitable configuration options, people can be authorised using the URL returned by graph.getOauthUrl:

    var graph = require('fbgraph')
      , authUrl = graph.getOauthUrl({
          'client_id':    conf.client_id
        , 'redirect_uri': conf.redirect_uri
        });

Once the Facebook dialog has been displayed, graph.authorize is called to complete the process and get the access token.
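In an Express-style callback route that might look like this — a sketch where the route and redirect are assumptions, but the graph.authorize call follows the fbgraph README:

    graph.authorize({
      'client_id':    conf.client_id
    , 'redirect_uri': conf.redirect_uri
    , 'code':         req.query.code
    }, function(err, facebookRes) {
      // fbgraph keeps the access token for subsequent calls
      res.redirect('/');
    });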

API calls are made with graph.get or graph.post, so most of the API is oriented around HTTP verbs:

    graph.get('zuck', function(err, res) {
      console.log(res); // { id: '4', name: 'Mark Zuckerberg'... }
    });

    graph.post(userId + '/feed', wallPost, function(err, res) {
      // returns the post id
      console.log(res); // { id: xxxxx }
    });

This is a clean, well-documented, and well-tested Facebook library, which is surprisingly refreshing.


browser (License: MIT, npm: browser) by Shin Suzuki is an event-based library for browsing and scraping URLs whilst maintaining cookies. Requests are built up using the library's $b object, and then triggered with .run():

    var $b = require('browser');

Building up a sequence of events is possible with the after method:

    var $b = require('browser');

    $b.browse('http://example.com/login');
    $b.browse('http://example.com/mypage')
      .after(); // browse after previously registered function
    $b.run();


Event callbacks can also be registered:

    $b.on('end', function(err, res) {
      // handle the final response here
    });

Now load up jsdom and you'll be able to scrape faster than you can say "deep-linking lawsuit"!
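For instance, a scraped HTML string can be handed to jsdom for DOM-based extraction — a minimal sketch, assuming html holds the markup you fetched:

    var jsdom = require('jsdom');

    // html is assumed to contain the page source fetched with $b
    jsdom.env(html, function(err, window) {
      if (err) throw err;
      var links = window.document.querySelectorAll('a');
      console.log('Found %d links', links.length);
    });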