Let's Make a Framework: Selectors

01 Apr 2010 | By Alex Young | Tags: web frameworks, tutorials, lmaf

Welcome to part 6 of Let’s Make a Framework, the ongoing series about building a JavaScript framework. This part explores CSS selectors.

If you haven’t been following along, these articles are tagged with lmaf. The project we’re creating is called Turing and is available on GitHub: turing.js. You can contribute! Fork and message me with your changes.

I’ve been adding more functional methods to the framework behind the scenes. If you want to keep up, follow the project on GitHub.

In this part I’ll explain the basics behind selector engines, explore the various popular ones, and examine their characteristics. This is a major area for JavaScript web frameworks — it’ll take me a few posts to do it justice.

History

Cross-browser selector APIs are important because browser implementations are inconsistent, whether the selectors are XPath or CSS.

To understand why a selector engine is important, imagine working in the late 90s without a JavaScript framework:

document.all                              // A proprietary property in IE4
document.getElementById('navigation');    // The standard DOM method

The first thing people realised was that this could be shortened. Prototype does this with $():

function $(element) {
  if (arguments.length > 1) {
    for (var i = 0, elements = [], length = arguments.length; i < length; i++)
      elements.push($(arguments[i]));
    return elements;
  }
  if (Object.isString(element))
    element = document.getElementById(element);
  return Element.extend(element);
}

Rather than simply aliasing document.getElementById, Prototype can search for multiple IDs and extends elements with its own features.
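
For example, calling it with several IDs (hypothetical ones here) returns an array of extended elements:

var elements = $('header', 'navigation', 'footer');
// elements is an array, with each element run through Element.extend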

What we really wanted, though, was getElementsBySelector. We don’t just think in terms of IDs or tag names; we want to apply operations to sets of elements. CSS selectors are already how we target sets of elements for styling, so it makes sense for similarly styled elements to behave in a similar way.

Simon Willison wrote getElementsBySelector back in 2003. This implementation is partially responsible for a revolution of DOM traversal and manipulation.
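
Its API looked like this (the selector is illustrative):

document.getElementsBySelector('#navigation a.external');
// returns an array of matching elements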

Browsers provide more than just getElementById: there’s also getElementsByClassName, getElementsByName, and lots of other DOM-related methods. Most non-IE browsers also support searching with XPath expressions through document.evaluate.
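
Here’s a rough sketch of what XPath searching looks like; the helper name and expression are mine, not part of any library:

function xpathQuery(expression) {
  // Collect the results of an XPath query into a plain array
  var results = [],
      query = document.evaluate(expression, document, null,
                                XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0; i < query.snapshotLength; i++) {
    results.push(query.snapshotItem(i));
  }
  return results;
}

// Find all links inside elements with a 'navigation' class
xpathQuery("//*[contains(@class, 'navigation')]//a");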

Modern browsers also support querySelector and querySelectorAll. Again, these methods are held back by patchy support in older versions of IE.
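
Here’s a simple sketch of how an engine can defer to the native method when it exists; the fallback function is hypothetical:

function query(selector) {
  if (document.querySelectorAll) {
    // Native support: copy the static NodeList into a plain array
    var nodes = document.querySelectorAll(selector), results = [];
    for (var i = 0; i < nodes.length; i++) {
      results.push(nodes[i]);
    }
    return results;
  }
  // No native support (older IE), so fall back to a hand-rolled engine
  return fallbackEngine(selector);
}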

Browser Support

Web developers consider browser support a necessary evil, but in a selector engine it’s a central concern: easing the pain of working with the DOM and getting reliable, consistent results is the whole point.

I’ve seen some fascinating ways to work around browser bugs. Look at this code from Sizzle:

// Check to see if the browser returns elements by name when
// querying by getElementById (and provide a workaround)
(function(){
  // We're going to inject a fake input element with a specified name
  var form = document.createElement("div"),
      id = "script" + (new Date()).getTime();
  form.innerHTML = "<a name='" + id + "'/>";

  // Inject it into the root element, check its status, and remove it quickly
  var root = document.documentElement;
  root.insertBefore( form, root.firstChild );

  // If getElementById finds the anchor, this browser wrongly matches
  // elements by name, and Sizzle installs a stricter ID check
  // (workaround code omitted here)

  root.removeChild( form );
})();

It creates a fake element to probe browser behaviour. Supporting browsers is a black art beyond the patience of most well-meaning JavaScript hackers.

Performance

Plucking elements from arbitrary places in the DOM is useful. So useful that developers do it a lot, which means selector engines need to be fast.

Falling back to native methods is one way of achieving this. Sizzle uses querySelectorAll when it’s available, with extra code to work around browser inconsistencies.

Another way is to use caching, which Sizzle and Prototype both do.
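
A minimal caching sketch memoises results keyed on the selector string, reusing the query() function from the sketch above. A real engine also has to invalidate the cache when the document changes, which this ignores:

var selectorCache = {};

function cachedQuery(selector) {
  // Return memoised results when we've seen this selector before
  if (selectorCache.hasOwnProperty(selector)) {
    return selectorCache[selector];
  }
  return selectorCache[selector] = query(selector);
}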

One way to measure selector performance is with a tool like slickspeed or a library like Woosh. These tools are useful, but you have to be careful when interpreting the results. Libraries like Prototype and MooTools extend the elements they return, whereas pure selector engines like Sizzle don’t. That extra work can make them look slow next to Sizzle, but they’re fully-fledged frameworks rather than pure selector engines.

Other Selector Engines

I’ve been referencing popular frameworks and Sizzle so far, but there are other Sizzle-alikes out there. Last year a burst of them appeared, perhaps in reaction to the first release of Sizzle. Sizzle has since been regaining popularity with framework implementors, so I’m not sure any of them really made an impact.

Peppy draws on inspiration from lots of libraries and includes some useful comments about dealing with caching. Sly has a lot of work on optimisation and allows you to expose the parsing of selectors. Both Sly and Peppy are fairly large chunks of code — anything comparable to Sizzle is not a trivial project.

API Design

Generally speaking, there are two kinds of APIs. One approach uses a function that returns elements that match a selector, but wraps them in a special class. This class can be used to chain calls that perform complex DOM searches or manipulations. This is how jQuery, Dojo and Glow work.

jQuery gives us $(), which can be used to query the document for nodes, then chain together calls on them.

Dojo has dojo.query, which returns an Array-like dojo.NodeList. Glow is similar, with glow.dom.get.
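
With jQuery, for example, the wrapper object is what makes chains like this possible (the selector and values are illustrative):

$('#navigation a')
  .addClass('active')            // returns the same jQuery object
  .css({ color: '#aabbcc' });    // so calls keep chaining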

The second approach is where elements are returned and extended. Prototype and MooTools do this.

Prototype has $() for getElementById lookups and $$() for querying with CSS or XPath selectors, and it extends the elements it returns with its own methods. MooTools behaves in much the same way. Both can work with strings or element references.
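
With Prototype, the extra methods live on the elements themselves once they’ve been extended (again, the selector is illustrative):

$$('.navigation a').each(function(element) {
  element.addClassName('active');  // addClassName was added by Element.extend
});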

Turing has been designed along the lines of the Glow-style libraries, so we’ll use this approach for our API. There’s a lazy aspect to this design that I like: it might be possible to return unprocessed elements wrapped in an object, then only deal with masking browser inconsistencies when the elements are actually manipulated:

turing.dom.find('.class')                   // return elements wrapped in a class without looking at each of them
  .find('a')                                // find elements that are links
  .css({ 'background-color': '#aabbcc' })   // apply the style by actually processing the elements
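
To make that concrete, here’s a hypothetical sketch of the lazy wrapper idea. None of these names are Turing’s yet, and the selector matching itself is omitted; the point is that per-element work only happens inside css():

function ElementList(elements) {
  this.elements = elements;  // raw, unprocessed DOM nodes
}

ElementList.prototype.find = function(selector) {
  // Narrowing the set would happen here, still without touching each node
  return this;
};

ElementList.prototype.css = function(styles) {
  // Browser inconsistencies only get masked now, when we finally
  // process each element
  for (var i = 0; i < this.elements.length; i++) {
    for (var property in styles) {
      if (styles.hasOwnProperty(property)) {
        // DOM style properties are camelCased: background-color -> backgroundColor
        this.elements[i].style[camelCase(property)] = styles[property];
      }
    }
  }
  return this;
};

function camelCase(property) {
  return property.replace(/-([a-z])/g, function(match, letter) {
    return letter.toUpperCase();
  });
}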

Goals

Because Turing exists as an educational framework, it’s probably wise if we keep it simple:

  • CSS selectors only (XPath could be a future upgrade or plugin)
  • Limited pseudo-selectors. Implementing one or two would be nice just to establish the API for them
  • Defer to native methods as quickly as possible
  • Cache for performance
  • Expose the parsing results like Sly does
  • Reuse the wisdom of existing selector engines for patching browser nightmares

Conclusion

I hope you’ve learned why selector engines are so complex and why we need them. Granted, it’s mostly due to browser bugs that don’t need to exist, but browsers and standards both evolve out of step, so sometimes it’s our job to patch the gaps while they catch up.

While it’s true that Sizzle is the mother of all selector engines, I think implementing a small, workable one will be a worthwhile project. Join me in part 7 where I’ll start implementing the selector engine.

