DailyJS

The JavaScript blog.


Tag: beginner
Featured

tutorials language js101 beginner

JS101: A Primer on Strings and String Encodings

Posted on .

What is a JavaScript string? It depends on the context. For instance, a string is a primitive value -- a value represented at the "lowest level" of the language's implementation.

Strings are also members of the type String, and can be created with the String constructor. Running new String('hello') creates a String object that wraps the primitive value 'hello'. Calling String('hello') without new simply returns a primitive.

Finally, string literals are found in the program's source: var name = 'alex'.
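typeof makes the difference between these forms visible. This is a small sketch with arbitrary variable names; it runs in Node or a browser console:

```javascript
// A string literal produces a primitive value
var literal = 'hello';
// Calling String as a function (without new) also returns a primitive
var converted = String('hello');
// Calling String with new returns a String object that wraps the primitive
var wrapped = new String('hello');

console.log(typeof literal);   // string
console.log(typeof converted); // string
console.log(typeof wrapped);   // object

// The object is a different value from the primitive...
console.log(wrapped === literal); // false
// ...but == converts the object back to a primitive before comparing
console.log(wrapped == literal);  // true
// valueOf returns the wrapped primitive
console.log(wrapped.valueOf() === literal); // true
```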

Given that there are many ways to represent strings, what is the underlying encoding in JavaScript? Both the third and fifth editions of ECMAScript state that strings are represented as sequences of 16-bit unsigned integers:

Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.
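Because strings are sequences of 16-bit units, length and charCodeAt count those units rather than user-visible characters. This sketch uses U+1F600 (a grinning face emoji), which doesn't fit in 16 bits and is therefore stored as two units, known in UTF-16 as a surrogate pair:

```javascript
// 'A' fits in a single 16-bit unit
var letter = 'A';
// U+1F600 does not fit in 16 bits, so it is written as two units
var emoji = '\uD83D\uDE00';

console.log(letter.length);        // 1
console.log(letter.charCodeAt(0)); // 65

// Displays as one character, but occupies two 16-bit units
console.log(emoji.length);                     // 2
console.log(emoji.charCodeAt(0).toString(16)); // d83d
console.log(emoji.charCodeAt(1).toString(16)); // de00
```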

String Encoding

Ultimately a string is just a sequence of characters: units of information that correspond to letters, digits, punctuation, and so on. To be stored or transmitted, characters have to be represented as byte sequences, and a character encoding defines how that mapping works.

When working on client-side JavaScript and HTML, we're used to seeing charset=UTF-8. UTF-8 is a system for encoding characters, and is "variable width", which means the number of bytes used to represent an individual character varies: from one byte for ASCII characters up to four bytes.
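To see the variable width in practice, this sketch measures how many bytes a few characters take in UTF-8. It assumes a modern runtime with the TextEncoder API (current browsers, or Node 11 and later), which postdates the techniques discussed in this article; utf8ByteLength is a hypothetical helper name:

```javascript
// Hypothetical helper: count the bytes a string occupies once encoded as UTF-8.
// Assumes TextEncoder is available as a global.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength('a'));            // 1 (ASCII)
console.log(utf8ByteLength('\u00E9'));       // 2 (é)
console.log(utf8ByteLength('\u20AC'));       // 3 (euro sign)
console.log(utf8ByteLength('\uD83D\uDE00')); // 4 (U+1F600)
```

Note that '\u20AC'.length is 1 while its UTF-8 form takes three bytes: the 16-bit units a JavaScript engine counts and the bytes UTF-8 produces are separate things.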

I said earlier that JavaScript strings are 16-bit, so how does this relate to UTF-8? In extremely simplified terms for the purposes of a beginner's article, you can think about it like this: JavaScript engines represent strings as sequences of 16-bit units, which makes strings easier to manage internally. Most common characters fit in a single unit; characters outside that range, such as many emoji, take two.

So, even though a browser's JavaScript engine internally represents strings as 16-bit numbers, we don't usually need to know about this. Writing strings to form controls with the DOM, or sending them with XMLHttpRequest, should convert them to the right encoding. Ideally the server should send a Content-Type header that specifies UTF-8 (for example, Content-Type: text/html; charset=utf-8), so the browser knows how to decode the page.

More About Encodings

Even if you're a client-side developer who doesn't care about string encodings, Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets is worth reading because it explains the history behind string encodings, and knowing that history makes a frustrating topic easier to understand.

If you need to work with string encodings in JavaScript, Johan Sundström's post Encoding / decoding UTF8 in javascript from back in 2006 explains how to encode and decode UTF-8.

Monsur Hossain went on to write UTF-8 in JavaScript, which covers unescape and encodeURIComponent in more detail.
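The widely used idiom these posts discuss combines encodeURIComponent, which percent-encodes a string as UTF-8, with unescape, which turns each %XX escape back into a single character. The result is a "binary string" holding one character per UTF-8 byte. The function names below are hypothetical, and escape/unescape are legacy functions, so treat this as an illustration rather than a recommendation for new code:

```javascript
// Hypothetical helper: JavaScript string -> "binary string",
// one character (code 0-255) per UTF-8 byte
function encodeUtf8(str) {
  // encodeURIComponent produces UTF-8 percent escapes, e.g. '%E2%82%AC';
  // unescape turns each %XX back into a single character
  return unescape(encodeURIComponent(str));
}

// Hypothetical helper: reverses encodeUtf8
function decodeUtf8(bytes) {
  // escape re-creates the %XX escapes; decodeURIComponent reads them as UTF-8
  return decodeURIComponent(escape(bytes));
}

var euro = '\u20AC';
var encoded = encodeUtf8(euro);
console.log(encoded.length);                     // 3
console.log(encoded.charCodeAt(0).toString(16)); // e2
console.log(decodeUtf8(encoded) === euro);       // true
```

One caveat: encodeURIComponent throws a URIError if the string contains an unpaired surrogate, so input should be well-formed.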

Featured

tutorials testing language js101 beginner

JS101: Deep Equal

Posted on .

Back in JS101: Equality I wrote about the difference between == and ===. This is one area of the language that quite clearly causes issues for beginners. In addition, there is another equality concept that can come in handy when writing tests: deep equal. It also illustrates some of the underlying mechanics of the language. As an intermediate JavaScript developer, you should have at least a passing familiarity with deepEqual and how it works.

Unit Testing/1.0

Deep equality is defined in CommonJS Unit Testing/1.0, under subsection 7. The algorithm takes two arguments, actual and expected, and determines whether the values are equivalent. It supports both primitive values and objects.

  1. Strict equals (===) means the values are equivalent
  2. Compare dates using the getTime method
  3. If values are not objects, compare with ==
  4. Otherwise, compare each object's size, keys, and values

The fourth point is probably what you would assume deep equality actually means. The other stages reveal things about the way JavaScript works -- the third stage means values that are not objects can easily be compared with == because they're primitive values (Undefined, Null, Boolean, Number, or String).

The second step exists because two distinct Date objects are never equal under == or ===, so getTime is the most convenient way of comparing dates:

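var assert = require('assert')
  , a = new Date(2012, 1, 1)
  , b = new Date(2012, 1, 1)
  ;

// Two separate Date objects are never strictly equal...
assert.ok(a !== b);
// ...and == compares object references too, so they aren't equal either
assert.ok(a != b);
// Comparing the underlying time values works
assert.ok(a.getTime() == b.getTime());
// deepEqual follows step 2 and compares the time values
assert.deepEqual(a, b);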

This script can be run in Node, or with a suitable CommonJS assertion library. It illustrates the point that dates are not considered equal using the equality or strict equality operators -- the easiest way to compare them is with getTime.

Object comparison implies recursion, as some values may also be objects. Also, key comparison isn't as simple as it might seem: real implementations sort keys, compare length, then compare each value.
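The four steps, along with the recursion and key sorting just described, can be sketched as a small function. This is a simplified illustration, not Node's implementation: the name deepEqual is our own, it uses Object.keys (own enumerable properties) to approximate "owned properties", and it compares prototypes with Object.getPrototypeOf where the specification talks about an identical "prototype" property. There is no cycle detection.

```javascript
// Hypothetical deepEqual following the four CommonJS Unit Testing/1.0 steps.
// Simplified sketch: no special cases beyond dates, no cycle detection.
function deepEqual(actual, expected) {
  // Step 1: strictly equal values are equivalent
  if (actual === expected) {
    return true;
  }

  // Step 2: two dates are equivalent if they represent the same time
  if (actual instanceof Date && expected instanceof Date) {
    return actual.getTime() === expected.getTime();
  }

  // Step 3: if either value is not an object, fall back to ==
  if (typeof actual !== 'object' || typeof expected !== 'object') {
    return actual == expected;
  }

  // typeof null is 'object', so guard against null before reading keys
  if (actual === null || expected === null) {
    return false;
  }

  // Step 4: same prototype, same number of keys, same keys, equivalent values
  if (Object.getPrototypeOf(actual) !== Object.getPrototypeOf(expected)) {
    return false;
  }
  var actualKeys = Object.keys(actual).sort();
  var expectedKeys = Object.keys(expected).sort();
  if (actualKeys.length !== expectedKeys.length) {
    return false;
  }
  for (var i = 0; i < actualKeys.length; i++) {
    if (actualKeys[i] !== expectedKeys[i]) {
      return false;
    }
  }
  // Recurse, because values may themselves be objects
  for (var j = 0; j < actualKeys.length; j++) {
    var key = actualKeys[j];
    if (!deepEqual(actual[key], expected[key])) {
      return false;
    }
  }
  return true;
}

console.log(deepEqual({ a: [1, 2] }, { a: [1, 2] })); // true
console.log(deepEqual({ a: [1, 2] }, { a: [1, 3] })); // false
console.log(deepEqual(new Date(2012, 1, 1), new Date(2012, 1, 1))); // true
```

Like the original specification, this sketch treats any two regular expressions as equal, because a regular expression has no enumerable own properties. That's the kind of gap Node's assert module closes with a special case for regular expressions.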

Bugs

Bugs have been found in the Unit Testing/1.0 specification since it originally appeared. Two have been flagged up on the main Unit Testing page. The Node assert module addresses these points. For example, regular expressions are a special case in the deepEqual implementation:

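// Excerpt from Node's deepEqual, wrapped in a hypothetical function so it runs on its own
function regExpEquiv(actual, expected) {
  return actual.source === expected.source &&
         actual.global === expected.global &&
         actual.multiline === expected.multiline &&
         actual.lastIndex === expected.lastIndex &&
         actual.ignoreCase === expected.ignoreCase;
}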

The source property holds the text of the original regular expression, and then each flag (global, multiline, ignoreCase), along with lastIndex, has to be compared.

Object Comparison

The next time you're writing a test, or even just comparing objects, remember that == and === only check whether two objects are the same reference: they perform "shallow" comparisons. Comparing the contents of arrays, dates, regular expressions, and objects requires a little bit more effort.

Featured

tutorials language js101 beginner

JS101: __proto__

Posted on .

When I originally wrote about prototypes in JS101: Prototypes, a few people were confused that I didn't mention the __proto__ property. One reason I didn't mention it is that I was sticking to standard ECMAScript for the most part, using the Annotated ECMAScript 5.1 site as a reference. It's actually hard to talk about prototypes without referring to __proto__, though, because it serves a very specific and useful purpose.

Recall that objects can be created using constructors:

function User() {  
}

var user = new User();  

The prototype property can be used to add properties to instances of User:

function User() {  
}

User.prototype.greet = function() {  
  return 'hello';
};

var user = new User();  
user.greet();  

So far so good. The original constructor can be referenced using the constructor property on an instance:

assert.equal(user.constructor, User);  

However, instances don't get a prototype property of their own: user.prototype is undefined, not User.prototype. What if we wanted to get hold of the original prototype, where the greet method was defined, from an instance of User?

That's where __proto__ comes in: it points to the object an instance inherits from, so the following two statements are true:

assert.equal(user.constructor, User);  
assert.equal(user.__proto__, User.prototype);  

Unfortunately, __proto__ doesn't appear in ECMAScript 5, so where does it come from? As noted by the documentation on MDN, it's a non-standard property. Or is it? It's included in ECMA-262 Edition 6 (in Annex B, which covers legacy features for web browsers), so whether it's standard or not depends on the version of ECMAScript that you're using.

It follows that the prototype property of an instance's constructor should be the same object as the instance's __proto__. If this is true, then we can test it using these assertions:

assert.equal(user.constructor.prototype, User.prototype);  
assert.equal(user.constructor.prototype, user.__proto__);  

ES5 also defines Object.getPrototypeOf, which returns the value of an object's internal [[Prototype]] property. That gives us a standard way to get from an instance to its constructor's prototype:

assert.equal(Object.getPrototypeOf(user), User.prototype);  

Putting all of this together gives this script which will pass in Node and Chrome (given a suitable assertion library):

var assert = require('assert');

function User() {  
}

var user = new User();

assert.equal(user.__proto__, User.prototype);  
assert.equal(user.constructor, User);  
assert.equal(user.constructor.prototype, User.prototype);  
assert.equal(user.constructor.prototype, user.__proto__);  
assert.equal(Object.getPrototypeOf(user), User.prototype);  

Internal Prototype

The confusion around __proto__ arises because of the term internal prototype:

All objects have an internal property called [[Prototype]]. The value of this property is either null or an object and is used for implementing inheritance.

Internally there has to be a way to access the constructor's prototype to correctly implement inheritance -- whether or not this is available to us is another matter. Why is accessing it useful to us? In the wild you'll occasionally see people setting an object's __proto__ property to make objects look like they inherit from another object. This used to be the case in Node's assertion module, but Node's util.inherits method is a more idiomatic way to do it:

// Compare to: assert.AssertionError.__proto__ = Error.prototype;
util.inherits(assert.AssertionError, Error);  

This was changed in "assert: remove unnecessary use of __proto__".
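For comparison, here is util.inherits applied to a custom error type. ValidationError is a hypothetical constructor invented for this sketch; util.inherits is Node's real helper, though newer Node documentation discourages it in favour of class and extends:

```javascript
var util = require('util');

// Hypothetical error type, used only for illustration
function ValidationError(message) {
  this.name = 'ValidationError';
  this.message = message;
}

// Makes ValidationError.prototype inherit from Error.prototype
// and sets ValidationError.super_ to Error
util.inherits(ValidationError, Error);

var err = new ValidationError('name is required');
console.log(err instanceof ValidationError); // true
console.log(err instanceof Error);           // true
console.log(Object.getPrototypeOf(ValidationError.prototype) === Error.prototype); // true
console.log(ValidationError.super_ === Error); // true
```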

The Constructor's Prototype

The User constructor is itself a function, so its internal prototype is Function.prototype:

assert.equal(User.__proto__, Function.prototype);  

If you're about to put on your hat, pick up your briefcase, and walk right out the door: hold on a minute. You're coming to the end of the chain -- the prototype chain that is:

assert.equal(User.__proto__, Function.prototype);  
assert.equal(Function.prototype.__proto__, Object.prototype);  
assert.equal(Object.prototype.__proto__, null);  

Remember that the __proto__ property is the internal prototype -- this is how JavaScript's inheritance chain is implemented. The User function inherits from Function.prototype, which in turn inherits from Object.prototype, and Object.prototype's internal prototype is null, which tells the property lookup algorithm that it has reached the end of the chain.

Therefore, adding a method to Object.prototype will make it available to every object. Properties of the Object Prototype Object include toString, valueOf, and hasOwnProperty. That means instances of the User constructor in the previous example will have these methods.
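A short sketch of this lookup, reusing the empty User constructor from earlier. One exception is worth knowing about: an object created with Object.create(null) has no prototype, so it doesn't inherit anything from Object.prototype:

```javascript
function User() {
}

var user = new User();

// Found by walking the chain: user -> User.prototype -> Object.prototype
console.log(typeof user.toString);            // function
console.log(typeof user.hasOwnProperty);      // function
console.log(user.hasOwnProperty('toString')); // false: it's inherited

// Object.create(null) produces an object with no prototype at all,
// so it doesn't inherit these methods
var bare = Object.create(null);
console.log(Object.getPrototypeOf(bare)); // null
console.log(typeof bare.toString);        // undefined
```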

Pithy Closing Remark

JavaScript's inheritance model is not class-based. Joost Diepenmaat's post, Constructors considered mildly confusing, summarises this as follows:

In a class-based object system, typically classes inherit from each other, and objects are instances of those classes. ... constructors do nothing like this: in fact constructors have their own [[Prototype]] chain completely separate from the [[Prototype]] chain of objects they initialize.

Rather than visualising JavaScript objects as "classes", try to think in terms of two parallel lines of prototype chains: one for constructors, and one for initialised objects.
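One way to picture the two chains is to walk them with Object.getPrototypeOf. protoChain below is a hypothetical helper written for this sketch; it collects every object on a [[Prototype]] chain until it reaches null:

```javascript
// Hypothetical helper: collect the objects on a [[Prototype]] chain
function protoChain(obj) {
  var chain = [];
  var current = Object.getPrototypeOf(obj);
  while (current !== null) {
    chain.push(current);
    current = Object.getPrototypeOf(current);
  }
  return chain;
}

function User() {
}
var user = new User();

// The initialised object's chain: User.prototype -> Object.prototype
var objectChain = protoChain(user);
console.log(objectChain[0] === User.prototype);   // true
console.log(objectChain[1] === Object.prototype); // true

// The constructor's own chain: Function.prototype -> Object.prototype
var constructorChain = protoChain(User);
console.log(constructorChain[0] === Function.prototype); // true
console.log(constructorChain[1] === Object.prototype);   // true
```

In this example the two chains only meet at Object.prototype, just before the null that ends them both.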

Featured

tutorials language js101 beginner

JS101: Equality

Posted on .

There are four equality operators in JavaScript:

  • Equals: ==
  • Not equal: !=
  • Strict equal: ===
  • Strict not equal: !==

In JavaScript: The Good Parts, Douglas Crockford advises against using == and !=:

My advice is to never use the evil twins. Instead, always use === and !==.

The result of the equals operator is calculated based on The Abstract Equality Comparison Algorithm. This can lead to confusing results, and these examples are often cited:

'' == '0'           // false  
0 == ''             // true  
0 == '0'            // true

false == undefined  // false  
false == null       // false  
null == undefined   // true  

Fortunately, we can look at the algorithm to better understand these results. The first example is false due to this rule:

If Type(x) is String, then return true if x and y are exactly the same sequence of characters (same length and same characters in corresponding positions). Otherwise, return false.

Basically, the two strings don't contain the same sequence of characters. In the second example, the types are different, so this rule is used:

If Type(x) is Number and Type(y) is String, return the result of the comparison x == ToNumber(y).
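Applying this rule to the second example shows the conversion step by step. Number, called as a function rather than a constructor, performs the same ToNumber conversion the algorithm refers to, so it's a convenient way to check what the right-hand side becomes:

```javascript
// 0 == '': Type(x) is Number and Type(y) is String, so y goes through ToNumber
console.log(Number(''));      // 0
console.log(0 == Number('')); // true, which is why 0 == '' is true

// The same rule explains 0 == '0'
console.log(Number('0'));     // 0
console.log(0 == '0');        // true

// Two strings are compared character by character, with no conversion
console.log('' == '0');       // false
```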

This is where the behaviour of == starts to get seriously gnarly: behind the scenes, values and objects are converted to different types. When the types differ, the equality operator converts one or both operands (for example, objects to primitives and booleans to numbers) before comparing them, whereas the strict equality operator simply returns false if the two values are not the same type. For reference, the underlying mechanism used by the strict equality operator is documented in The Strict Equality Comparison Algorithm section of the ECMAScript Specification.

Strict Equality Examples

Using the same example with the strict equality operator shows an arguably more intuitive result:

'' === '0'           // false  
0 === ''             // false  
0 === '0'            // false

false === undefined  // false  
false === null       // false  
null === undefined   // false  

Is this really how professional JavaScript developers write code? And if so, does === get used that often? Take a look at ajax.js from jQuery's source:

executeOnly = ( structure === prefilters );
if ( typeof selection === "string" ) {
  // ...
} else if ( params && typeof params === "object" ) {
  // ...
}

The strict equality operator is used almost everywhere, apart from here:

if ( s.crossDomain == null ) {
  // ...
}

Here, == matches both undefined and null, which is a case where == is often preferred to the longer strict equivalent:

if ( s.crossDomain === null || s.crossDomain === undefined ) {
  // ...
}
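The same idiom is easy to wrap in a helper. isNil is a hypothetical name chosen for this sketch, not part of jQuery; it relies on null == undefined being true while other falsy values such as 0, '' and false are not equal to null:

```javascript
// Hypothetical helper: true for null and undefined, false for other values
function isNil(value) {
  // Under the Abstract Equality algorithm, null is only == to null and undefined
  return value == null;
}

console.log(isNil(null));      // true
console.log(isNil(undefined)); // true
console.log(isNil(0));         // false
console.log(isNil(''));        // false
console.log(isNil(false));     // false
```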

Assertions

One place where the difference between equality and strict equality becomes apparent is in JavaScript unit tests. Most assertion libraries include a way to check 'shallow' equality and 'deep' equality. In CommonJS Unit Testing, these are known as assert.equal and assert.deepEqual.

deepEqual has specific handling for dates, and for other objects, including arrays, the CommonJS specification states:

equivalence is determined by having the same number of owned properties (as verified with Object.prototype.hasOwnProperty.call), the same set of keys (although not necessarily the same order), equivalent values for every corresponding key, and an identical "prototype" property

Conclusion

To understand how equality and strict equality work in JavaScript, primitive values and JavaScript's implicit type conversion behaviour must be understood. In general, experienced developers advocate using ===, and this is good practice for beginners.

Because these operators cause so much confusion, there's a significant amount of documentation on the topic; see, for example, Comparison Operators in Mozilla's JavaScript Reference.

Featured

tutorials language js101 beginner

JS101: The Language Past and Present

Posted on .

We've covered a lot of ground since the first JS101, but the truth is I've missed out an important question: what is JavaScript, and who controls it?

I've answered this question, and covered a lot more, in the History of JavaScript series. This post is a brief introduction: after reading it you should know the basics of JavaScript and its relationship to ECMAScript.

Who Made JavaScript?

JavaScript was created by Brendan Eich in 1995 for Netscape. Netscape submitted JavaScript to Ecma International, which is a standards organization based in Geneva. The standardised version is known as ECMAScript.

What is ECMAScript?

Most articles on sites like DailyJS will refer to ECMA-262, ECMAScript 3, and ECMAScript 5. We usually abbreviate these terms to ES3 and ES5.

  • ECMA-262 is the name of the specification, of which there are five editions (the fourth edition was abandoned)
  • ECMAScript 3 (or ECMA-262, edition 3) was published in December 1999, and was supported by Netscape 6 and IE 5.5
  • ECMAScript 5 (or ECMA-262, edition 5) was published in December 2009 and is supported by Firefox 4+ and Safari 6 (see the ES5 compatibility table)

ECMAScript 5 adds a lot of features that we've already started to take for granted, including new array methods like forEach, new Object methods like Object.create, property attributes, function binding, and more.
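Here is a brief tour of those features, assuming an ES5-capable environment such as Node or any current browser. The object names (base, alex) are arbitrary:

```javascript
// Array.prototype.forEach: iterate without managing an index
var total = 0;
[1, 2, 3].forEach(function (n) {
  total += n;
});
console.log(total); // 6

// Object.create: make an object with a chosen prototype
var base = {
  greet: function () {
    return 'hello ' + this.name;
  }
};
var alex = Object.create(base);
alex.name = 'alex';
console.log(alex.greet()); // hello alex

// Property attributes: a read-only, non-enumerable property
Object.defineProperty(alex, 'id', { value: 42, writable: false, enumerable: false });
console.log(Object.keys(alex)); // [ 'name' ]

// Function binding: fix the value of this
var greetAlex = base.greet.bind(alex);
console.log(greetAlex()); // hello alex
```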

The Future

ECMAScript is still being actively developed. The next version is tentatively known as ES.next (ECMA-262 Edition 6), and ES.next working drafts can be downloaded from the ECMAScript wiki.

Proposals for the language are collected on the strawman wiki, and discussed in detail by several contributors on the wiki and mailing lists. To keep up with this, I often check the wiki's recent changes and the es-discuss mailing list.

Brendan Eich's blog covers a lot of the major developments as well. For example, his latest post mentions ES6 and the relationship between the standards makers and the enthusiastic JavaScript developer community.

The language will change, and as new standards become available, you'll need to know what your platform or browser supports.