DailyJS

The JavaScript blog.


Tag: js101
Featured

commonjs modules amd js101 terminology basics

Terminology: Modules

Learning modern modular frameworks like Backbone.js and AngularJS involves mastering a large amount of terminology, even just to understand a Hello, World application. With that in mind, I wanted to take a break from higher-level libraries to answer the question: what is a module?

The Background Story

Client-side development has always been rife with techniques for patching missing behaviour in browsers. Even the humble <script> tag has been cajoled and beaten into submission to give us alternative ways to load scripts.

It all started with concatenation. Rather than loading many scripts on a page, we join them together to form a single file, and perhaps minimise it. One school of thought was that this is more efficient, because a single long HTTP request will ultimately perform better than many smaller requests.

That makes a lot of sense when loading libraries -- things that you want to be globally available. However, when writing your own code it somehow feels wrong to place objects and functions at the top level (the global scope).

If you're working with jQuery, you might organise your own code like this:

$(function() {
  function MyConstructor() {
  }

  MyConstructor.prototype = {
    myMethod: function() {
    }
  };

  var instance = new MyConstructor();
});

That neatly tucks everything away while also only running the code when the DOM is ready. That's great for a few weeks, until the file is bustling with dozens of objects and functions. That's when it seems like this monolithic file would benefit from being split up into multiple files.

To avoid the pitfalls caused by large files, we can split them up, then load them with <script> tags. The scripts can be placed at the end of the document, causing them to be loaded after the majority of the document has been parsed.

At this point we're back to the original problem: we're loading perhaps dozens of <script> tags inefficiently. Also, scripts are unable to express dependencies between each other. If dependencies between scripts can be expressed, then they can be shared between projects and loaded on demand more intelligently.

Loading, Optimising, and Dependencies

The <script> tag itself has an async attribute. This helps indicate which scripts can be loaded asynchronously, potentially decreasing the time the browser blocks when loading resources. If we're going to use an API to somehow express dependencies between scripts and load them quickly, then it should load scripts asynchronously when possible.

Five years ago this was surprisingly complicated, mainly due to legacy browsers. Then solutions like RequireJS appeared. Not only did RequireJS allow scripts to be loaded programmatically, but it also had an optimiser that could concatenate and minimise files. The lines between loading scripts, managing dependencies, and file optimisation are inherently blurred.

AMD

The problem with loading scripts is that it's asynchronous: there's no way to say load('/script.js') and have code that uses script.js directly afterwards. The CommonJS Modules/AsynchronousDefinition proposal, which became AMD (Asynchronous Module Definition), was designed to get around this. Rather than trying to create the illusion that scripts can be loaded synchronously, all scripts are wrapped in a call to a function named define. This is a global function provided by a suitable AMD implementation, like RequireJS.

The define function can be used to safely namespace code, express dependencies, and give the module a name (id) so it can be registered and loaded. Module names are "resolved" to script names using a well-defined format.
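To make the shape of define concrete, here is a minimal sketch. The registry and the load function are a toy, in-memory stand-in that I've made up for illustration: a real AMD implementation like RequireJS fetches scripts asynchronously and resolves ids to file names. The module ids models/user and app are hypothetical too.

```javascript
// Toy stand-in for an AMD loader (an assumption for illustration only;
// real loaders fetch scripts over the network before running factories).
var registry = {};

// define(id, dependencies, factory): register a named module
function define(id, deps, factory) {
  registry[id] = { deps: deps, factory: factory, loaded: false, exports: undefined };
}

// Resolve a module by running its factory with its dependencies' exports
function load(id) {
  var mod = registry[id];
  if (!mod) {
    throw new Error('Unknown module: ' + id);
  }
  if (!mod.loaded) {
    mod.exports = mod.factory.apply(null, mod.deps.map(load));
    mod.loaded = true;
  }
  return mod.exports;
}

// A module with no dependencies; nothing leaks into the global scope
define('models/user', [], function() {
  return {
    find: function(id) {
      return { id: id };
    }
  };
});

// A module that declares its dependency by id and receives it as an argument
define('app', ['models/user'], function(User) {
  return { currentUser: User.find(1) };
});

var app = load('app');
```

The important part is the call signature: an id, an array of dependency ids, and a factory function whose return value becomes the module.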

Although this means every module you write must be wrapped in a call to define, the authors of RequireJS realised it meant that build tools could easily interpret dependencies and generate optimised builds. So your development code can use RequireJS's client-side library to load the necessary scripts, then your production version can preload all scripts in one go, without having to change your HTML templates (r.js is used to do this in practice).

CommonJS

Meanwhile, Node was becoming popular. Node's module system is characterised by using the require function to return a value that contains the module:

var User = require('models/user');  
User.find(1);  

Can you imagine if every Node module had to be wrapped in a call to define? It might seem like an acceptable trade-off in client-side code, but it would feel like too much boilerplate in server-side scripting when compared to languages like Python.

There have been many projects to make this work in browsers. Most use a build tool to load all of the modules referenced by require up front -- they're stored in memory so require can simply return them, creating the illusion that scripts are being loaded synchronously.

Whenever you see require and exports you're looking at CommonJS Modules/1.1. You'll see this referred to as "CommonJS".

Now you've seen CommonJS modules, AMD, and where they came from, how are they being used by modern frameworks?

Modules in the Wild

Dojo uses AMD internally and for creating your own modules. It didn't originally -- it used to have its own module system. Dojo adopted AMD early on.

AngularJS uses its own module system that looks a lot like AMD, but with adaptations to support dependency injection.

RequireJS supports AMD, but it can load scripts and other resources without wrapping them in define. For example, a dependency between your own well-defined modules and a jQuery plugin that doesn't use AMD can be defined by using suitable configuration options when setting up RequireJS.
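For example, a configuration along these lines could describe such a plugin. The paths and shim options are part of RequireJS's configuration, but the plugin name jquery.tooltip and the vendor/ paths are hypothetical, so treat this as a sketch rather than something to copy verbatim.

```javascript
// Hypothetical RequireJS configuration for a jQuery plugin that doesn't call define
var config = {
  paths: {
    'jquery': 'vendor/jquery',
    'jquery.tooltip': 'vendor/jquery.tooltip' // made-up plugin file
  },
  shim: {
    'jquery.tooltip': {
      deps: ['jquery'],              // load jQuery before the plugin
      exports: 'jQuery.fn.tooltip'   // global value to use as the module
    }
  }
};

// In a page that has loaded RequireJS, this would be passed to:
//   requirejs.config(config);
```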

There's still a disparity between development and production builds. Even though RequireJS can be used to create serverless single page applications, most people still use a lightweight development server that serves raw JavaScript files, before deploying concatenated and minimised production builds.

The need for script loading and building, and tailoring for various environments (typically development, test, and production) has resulted in a new class of projects. Yeoman is a good example of this: it uses Grunt for managing builds and running a development server, Bower for defining the source of dependencies so they can be fetched, and then RequireJS for loading and managing dependencies in the browser. Yeoman generates skeleton projects that set up development and build environments so you can focus on writing code.

Hopefully now you know all about client-side modules, so the next time you hear RequireJS, AMD, or CommonJS, you know what people are talking about!

tutorials language js101 beginner

JS101: A Primer on Strings and String Encodings

What is a JavaScript string? It depends on the context. For instance, a string is a primitive value -- a value represented at the "lowest level" of the language's implementation.

Strings are also members of the type String. Strings can be created with the String constructor. Running new String('hello') creates an instance of String.

Finally, String literals are found in the program's source: var name = 'alex'.

Given that there are many ways to represent strings, what is the underlying encoding in JavaScript? Both the third and fifth editions of ECMAScript state that strings are represented as sequences of 16-bit unsigned integers:

Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.

String Encoding

Ultimately a string is just a sequence of characters. In other words, an array of units of information that correspond to digits, letters, and so on. Characters are represented as byte sequences.

When working on client-side JavaScript and HTML, we're used to seeing charset=UTF-8. UTF-8 is a system for encoding characters, and is actually "variable width", which means the bytes used to represent an individual character can vary in length.

I said earlier that JavaScript strings are 16-bit, so how does this relate to UTF-8? In extremely simplified terms for the purposes of a beginner's article, you can think about it like this: JavaScript engines use a fixed 16-bit representation of characters that makes it easier to manage strings internally.

So, even though a browser's JavaScript engine internally represents characters as 16-bit numbers, we don't usually need to know about this. Writing strings to form controls through the DOM, or sending them with XMLHttpRequest, should convert them to the right encoding. Ideally the server will have sent a Content-Type header with the charset set to UTF-8, so the browser will know what to do.

More About Encodings

Even if you're a client-side developer who doesn't care about string encodings, Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets is worth reading because it explains the history behind string encodings. Knowing the history makes what can be a frustrating topic easier to understand.

If you need to work on string encodings in JavaScript, Johan Sundström's post Encoding / decoding UTF8 in javascript from back in 2006 explains how to encode and decode UTF-8.

Monsur Hossain went on to write UTF-8 in JavaScript which goes into unescape and encodeURIComponent in more detail.

tutorials testing language js101 beginner

JS101: Deep Equal

Back in JS101: Equality I wrote about the difference between == and ===. This is one area of the language that quite clearly causes issues for beginners. In addition, there is another equality concept that can come in handy when writing tests: deep equal. It also illustrates some of the underlying mechanics of the language. As an intermediate JavaScript developer, you should have at least a passing familiarity with deepEqual and how it works.

Unit Testing/1.0

Deep equality is defined in CommonJS Unit Testing/1.0, under subsection 7. The algorithm assumes two arguments: expected and actual. The purpose of the algorithm is to determine if the values are equivalent. It supports both primitive values and objects.

  1. Strict equals (===) means the values are equivalent
  2. Compare dates using the getTime method
  3. If values are not objects, compare with ==
  4. Otherwise, compare each object's size, keys, and values

The fourth point is probably what you would assume deep equality actually means. The other stages reveal things about the way JavaScript works -- the third stage means values that are not objects can easily be compared with == because they're primitive values (Undefined, Null, Boolean, Number, or String).

The second step works because getTime is the most convenient way of comparing dates:

var assert = require('assert')  
  , a = new Date(2012, 1, 1)
  , b = new Date(2012, 1, 1)
  ;

assert.ok(a !== b);  
assert.ok(a != b);  
assert.ok(a.getTime() == b.getTime());  
assert.deepEqual(a, b);  

This script can be run in Node, or with a suitable CommonJS assertion library. It illustrates the point that dates are not considered equal using the equality or strict equality operators -- the easiest way to compare them is with getTime.

Object comparison implies recursion, as some values may also be objects. Also, key comparison isn't as simple as it might seem: real implementations sort keys, compare length, then compare each value.
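The four steps can be sketched in a few lines. This is a simplified illustration of the algorithm as summarised in this post, not the code from Node's assert module or any other library, and sketchDeepEqual is a made-up name. It ignores prototypes and the special cases covered in the next section.

```javascript
function isObject(value) {
  // typeof null is 'object', so null has to be excluded explicitly
  return value !== null && typeof value === 'object';
}

function sketchDeepEqual(actual, expected) {
  // 1. Strict equals means the values are equivalent
  if (actual === expected) {
    return true;
  }

  // 2. Compare dates using getTime
  if (actual instanceof Date && expected instanceof Date) {
    return actual.getTime() === expected.getTime();
  }

  // 3. If neither value is an object, compare with ==
  if (!isObject(actual) && !isObject(expected)) {
    return actual == expected;
  }
  if (!isObject(actual) || !isObject(expected)) {
    return false;
  }

  // 4. Compare size, keys, and values, recursing into nested objects
  var actualKeys = Object.keys(actual).sort();
  var expectedKeys = Object.keys(expected).sort();

  if (actualKeys.length !== expectedKeys.length) {
    return false;
  }

  for (var i = 0; i < actualKeys.length; i++) {
    if (actualKeys[i] !== expectedKeys[i]) {
      return false;
    }
    if (!sketchDeepEqual(actual[actualKeys[i]], expected[expectedKeys[i]])) {
      return false;
    }
  }

  return true;
}

sketchDeepEqual({ a: [1, 2, { b: 3 }] }, { a: [1, 2, { b: 3 }] }); // true
sketchDeepEqual({ a: 1 }, { a: 2 });                               // false
sketchDeepEqual(1, '1');                                           // true, because of ==
```

Sorting the keys first means two objects with the same properties in a different order are still treated as equivalent.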

Bugs

Bugs have been found in the Unit Testing/1.0 specification since it originally appeared. Two have been flagged up on the main Unit Testing page. The Node assert module addresses these points. For example, regular expressions are a special case in the deepEqual implementation:

return actual.source === expected.source &&  
       actual.global === expected.global &&
       actual.multiline === expected.multiline &&
       actual.lastIndex === expected.lastIndex &&
       actual.ignoreCase === expected.ignoreCase;

The source property has a string that represents the original regular expression, and then each flag has to be compared.

Object Comparison

The next time you're writing a test, or even just comparing objects, remember that == will only work for "shallow" comparisons. Testing other values like arrays, dates, regular expressions, and objects requires a little bit more effort.

tutorials language js101 beginner

JS101: __proto__

When I originally wrote about prototypes in JS101: Prototypes a few people were confused that I didn't mention the __proto__ property. One reason I didn't mention it is I was sticking to standard ECMAScript for the most part, using the Annotated ECMAScript 5.1 site as a reference. It's actually hard to talk about prototypes without referring to __proto__, though, because it serves a very specific and useful purpose.

Recall that objects are created using constructors:

function User() {  
}

var user = new User();  

The prototype property can be used to add properties to instances of User:

function User() {  
}

User.prototype.greet = function() {  
  return 'hello';
};

var user = new User();  
user.greet();  

So far so good. The original constructor can be referenced using the constructor property on an instance:

assert.equal(user.constructor, User);  

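However, user.prototype is not the same as User.prototype: instances don't get a prototype property of their own, so user.prototype is undefined. What if we wanted to get hold of the original prototype, where the greet method was defined, starting from an instance of User?

That's where __proto__ comes in: it refers to the prototype of the constructor that created the object. Given that, we know the following two statements to be true: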

assert.equal(user.constructor, User);  
assert.equal(user.__proto__, User.prototype);  

Unfortunately, __proto__ doesn't appear in ECMAScript 5 -- so where does it come from? As noted by the documentation on MDN it's a non-standard property. Or is it? It's included in ECMA-262 Edition 6, which means whether it's standard or not depends on the version of ECMAScript that you're using.

It follows that an instance's constructor should contain a reference to the constructor's prototype. If this is true, then we can test it using these assertions:

assert.equal(user.constructor.prototype, User.prototype);  
assert.equal(user.constructor.prototype, user.__proto__);  

The standards also define Object.getPrototypeOf -- this returns the value of an object's internal [[Prototype]] property. That means we can use it to access the constructor's prototype:

assert.equal(Object.getPrototypeOf(user), User.prototype);  

Putting all of this together gives this script which will pass in Node and Chrome (given a suitable assertion library):

var assert = require('assert');

function User() {  
}

var user = new User();

assert.equal(user.__proto__, User.prototype);  
assert.equal(user.constructor, User);  
assert.equal(user.constructor.prototype, User.prototype);  
assert.equal(user.constructor.prototype, user.__proto__);  
assert.equal(Object.getPrototypeOf(user), User.prototype);  

Internal Prototype

The confusion around __proto__ arises because of the term internal prototype:

All objects have an internal property called [[Prototype]]. The value of this property is either null or an object and is used for implementing inheritance.

Internally there has to be a way to access the constructor's prototype to correctly implement inheritance -- whether or not this is available to us is another matter. Why is accessing it useful to us? In the wild you'll occasionally see people setting an object's __proto__ property to make objects look like they inherit from another object. This used to be the case in Node's assertion module, but Node's util.inherits method is a more idiomatic way to do it:

// Compare to: assert.AssertionError.__proto__ = Error.prototype;
util.inherits(assert.AssertionError, Error);  

This was changed in assert: remove unnecessary use of __proto__.

The Constructor's Prototype

The User example's internal prototype is set to Function.prototype:

assert.equal(User.__proto__, Function.prototype);  

If you're about to put on your hat, pick up your briefcase, and walk right out the door: hold on a minute. You're coming to the end of the chain -- the prototype chain that is:

assert.equal(User.__proto__, Function.prototype);  
assert.equal(Function.prototype.__proto__, Object.prototype);  
assert.equal(Object.prototype.__proto__, null);  

Remember that the __proto__ property exposes the internal prototype -- this is how JavaScript's inheritance chain is implemented. The User constructor inherits from Function.prototype, which in turn inherits from Object.prototype, and Object.prototype's internal prototype is null, which tells the inheritance algorithm that it has reached the end of the chain. Instances of User have a chain of their own: it starts at User.prototype and also ends at Object.prototype.

Therefore, adding a method to Object.prototype will make it available to every object. Properties of the Object Prototype Object include toString, valueOf, and hasOwnProperty. That means instances of the User constructor in the previous example will have these methods.

Pithy Closing Remark

JavaScript's inheritance model is not class-based. Joost Diepenmaat's post, Constructors considered mildly confusing, summarises this as follows:

In a class-based object system, typically classes inherit from each other, and objects are instances of those classes. ... constructors do nothing like this: in fact constructors have their own [[Prototype]] chain completely separate from the [[Prototype]] chain of objects they initialize.

Rather than visualising JavaScript objects as "classes", try to think in terms of two parallel lines of prototype chains: one for constructors, and one for initialised objects.

tutorials language js101 beginner

JS101: Equality

There are four equality operators in JavaScript:

  • Equals: ==
  • Not equal: !=
  • Strict equal: ===
  • Strict not equal: !==

In JavaScript: The Good Parts, Douglas Crockford advises against using == and !=:

My advice is to never use the evil twins. Instead, always use === and !==.

The result of the equals operator is calculated based on The Abstract Equality Comparison Algorithm. This can lead to confusing results, and these examples are often cited:

'' == '0'           // false  
0 == ''             // true  
0 == '0'            // true

false == undefined  // false  
false == null       // false  
null == undefined   // true  

Fortunately, we can look at the algorithm to better understand these results. The first example is false due to this rule:

If Type(x) is String, then return true if x and y are exactly the same sequence of characters (same length and same characters in corresponding positions). Otherwise, return false.

Basically, the two strings are not the same sequence of characters. In the second example, the types are different, so this rule is used:

If Type(x) is Number and Type(y) is String, return the result of the comparison x == ToNumber(y).

This is where the behaviour of == starts to get seriously gnarly: behind the scenes, values and objects are converted to different types. The equality operator always tries to compare primitive values, whereas the strict equality operator will return false if the two values are not the same type. For reference, the underlying mechanism used by the strict equality operator is documented in The Strict Equality Comparison Algorithm section of the ECMAScript Specification.

Strict Equality Examples

Using the same example with the strict equality operator shows an arguably more intuitive result:

'' === '0'           // false  
0 === ''             // false  
0 === '0'            // false

false === undefined  // false  
false === null       // false  
null === undefined   // false  

Is this really how professional JavaScript developers write code? And if so, does === get used that often? Take a look at ajax.js from jQuery's source:

executeOnly = ( structure === prefilters );  
if ( typeof selection === "string" ) {  
} else if ( params && typeof params === "object" ) {

The strict equality operator is used almost everywhere, apart from here:

if ( s.crossDomain == null ) {  

In this case, both undefined and null will be equal, which is a case where == is often used in preference to the strict equivalent:

if ( s.crossDomain === null || s.crossDomain === undefined ) {  

Assertions

One place where the difference between equality and strict equality becomes apparent is in JavaScript unit tests. Most assertion libraries include a way to check 'shallow' equality and 'deep equality'. In CommonJS Unit Testing, these are known as assert.equal and assert.deepEqual.

In the case of deepEqual, there's specific handling for dates and arrays:

equivalence is determined by having the same number of owned properties (as verified with Object.prototype.hasOwnProperty.call), the same set of keys (although not necessarily the same order), equivalent values for every corresponding key, and an identical "prototype" property

Conclusion

To understand how equality and strict equality work in JavaScript, primitive values and JavaScript's implicit type conversion behaviour must be understood. In general, experienced developers advocate using ===, and this is good practice for beginners.

In recognising the confusion surrounding these operators, there is a significant amount of documentation on the topic. For example, Comparison Operators in Mozilla's JavaScript Reference.