The JavaScript blog.


libraries emscripten ocr

JavaScript OCR

What should happen.

Konrad Dzwinel sent in a JavaScript OCR demo. It uses getUserMedia to capture images from the camera, glfx.js and Jcrop for user-driven image correction, and ocrad.js for the character recognition.

The Ocrad.js demo managed to recognise the text in my sample image, although it didn't work with white-on-black text: I had to invert the image before the text was recognised correctly.
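The inversion step can be done client-side before the image reaches the recogniser. Here's a minimal sketch that inverts the RGB channels of an RGBA pixel buffer, which is the layout a canvas `getImageData()` call returns in its `data` property. The function name `invertPixels` is hypothetical, and I'm not claiming the demo does anything like this.

```javascript
// Inverts an RGBA pixel buffer in place, turning white-on-black text into
// black-on-white. Alpha (every fourth byte) is left untouched.
function invertPixels(data) {
  for (let i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];         // red
    data[i + 1] = 255 - data[i + 1]; // green
    data[i + 2] = 255 - data[i + 2]; // blue
  }
  return data;
}

// Browser usage, assuming `canvas` holds the captured frame and that
// ocrad.js's global OCRAD() accepts a canvas element:
//   const ctx = canvas.getContext('2d');
//   const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   invertPixels(image.data);
//   ctx.putImageData(image, 0, 0);
//   const text = OCRAD(canvas);
```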


Ocrad.js is an Emscripten-based translation of Ocrad by Antonio Diaz Diaz. Kevin Kwok, who put together Ocrad.js, also ported GOCR to JavaScript with Emscripten as gocr.js.

GOCR was started by Joerg Schulenburg, but has had other contributors since the original release back in 2000. Kevin compares both libraries and his experiences getting them running in JavaScript:

Anyway, I tried to compile GOCR first and was immediately struck by how easy and painless it had been. I was on a roll, and decided to do Ocrad as well. It wasn't particularly hard- sure it was slightly more involved but still hardly anything.

He also mentions Tesseract, a popular OCR system that is also widely known for being very large:

In fact, what's absolutely stunning is the sheer universality of Tesseract. Just about everything which claims to have text recognition as a feature is backed by it. At one point, I was hoping that Mathematica had some clever routine using morphology and symbolic new kinds of sciences and evolved automata pattern recognition. Nope! Nestled deep within the gigabytes of code lies the Chuck Testa of textadermies: Tesseract.

I thought Konrad's demo was cool: being able to adjust the brightness and contrast and crop the image is a nice use of client-side technology. However, so far I've run into the problem Kevin mentioned: occasional blocks of nothing or seemingly random text, then suddenly excellent results.
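Brightness and contrast adjustments are simple per-pixel operations. glfx.js runs its filters as WebGL shaders on the GPU; the following is a CPU-side sketch of one common formulation, not glfx.js's code. It assumes brightness and contrast both lie in the range -1 to 1, and `brightnessContrast` is a hypothetical name.

```javascript
// Adjusts brightness and contrast of an RGBA buffer in place.
// brightness and contrast are in [-1, 1]; 0 leaves a channel unchanged.
// Assumes contrast < 1 (contrast === 1 would divide by zero).
// Expects a Uint8ClampedArray, which is what canvas ImageData.data is.
function brightnessContrast(data, brightness, contrast) {
  for (let i = 0; i < data.length; i += 4) {
    for (let c = 0; c < 3; c++) {          // RGB only, skip alpha
      let v = data[i + c] / 255 + brightness; // shift toward white or black
      // Stretch (contrast > 0) or squash (contrast < 0) around mid-grey.
      v = contrast > 0
        ? (v - 0.5) / (1 - contrast) + 0.5
        : (v - 0.5) * (1 + contrast) + 0.5;
      data[i + c] = v * 255; // Uint8ClampedArray clamps to 0..255 and rounds
    }
  }
  return data;
}
```

Boosting contrast pushes faint strokes toward pure black or white, which is one plausible reason manual correction helps a recogniser like Ocrad.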


emulators games emscripten

JavaScript MESS and the Internet Archive

Atari 2600

if you program and want any longevity to your work, make a game. all else recycles, but people rewrite architectures to keep games alive. -- Why the Lucky Stiff

Archive.org has a section dedicated to software. Inside you'll find The Internet Archive Console Living Room, which has details on some major games consoles from the late 1970s and 1980s, including the Atari 2600 and the ColecoVision.

The great thing about this project is that they're trying to keep old software alive. You can browse through titles and play them in a browser. This is powered by jsmess (GitHub: jsmess / jsmess), an Emscripten-based emulator derived from MESS:

The JAVASCRIPT MESS project is a porting of the MESS emulator, a program that emulates hundreds of machine types, into the JavaScript language. The MESS program can emulate (or begin to emulate) a majority of home computers, and continues to be improved frequently. By porting this program into the standardized and cross-platform JavaScript language, it will be possible to turn computer history and experience into the same embeddable object as movies, documents, and audio.

Running a game binary requires a suitable BIOS, but the groundwork for lots of systems has been added to MESS:

MESS and MAME were started over a decade ago to provide ubiquitous, universal emulation of arcade/gaming machines (MAME) and general computer hardware (MESS). While specific emulation implementations exist that do specific machines better than MAME/MESS, no other project has the comprehensiveness and modularity. Modifications are consistently coming in, and emulation breadth and quality increases over time. In the case of MAME, pages exist listing machines it does not emulate.

Over the last two years there's been a flood of new browser-based emulators, supporting everything from the Amiga to the Game Boy Advance. Part of what makes these projects possible is recent technologies like Canvas, WebGL, Web Audio, and FileReader. But even less buzzwordy APIs like typed arrays can help get old games running smoothly.
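Typed arrays suit emulation because an 8-bit machine's memory and registers map directly onto a `Uint8Array`, which stores every value modulo 256, just as the hardware wraps. (Emscripten itself models the C heap as typed-array views over an `ArrayBuffer`.) Here's a toy sketch, not code from jsmess: a hypothetical 64 KB address space, a ROM image copied into it, and an 8-bit accumulator. The names `loadRom`, `read`, and `addToAccumulator` are made up for illustration.

```javascript
// A toy 8-bit machine: 64 KB of memory and an 8-bit accumulator.
const memory = new Uint8Array(0x10000); // 16-bit address space
const regs = new Uint8Array(1);         // regs[0] plays the accumulator

// Copy a ROM (e.g. a BIOS image read with FileReader) into memory.
// TypedArray.prototype.set throws a RangeError if it doesn't fit.
function loadRom(rom, baseAddress) {
  memory.set(rom, baseAddress);
}

// Read a byte, masking the address to 16 bits so it wraps like the hardware.
function read(address) {
  return memory[address & 0xffff];
}

// Add to the accumulator; Uint8Array storage wraps the result modulo 256.
// Returns the carry flag an emulated add instruction could use.
function addToAccumulator(value) {
  const sum = regs[0] + value;
  regs[0] = sum;     // stored as sum & 0xff
  return sum > 0xff; // carry out
}
```

No manual `& 0xff` is needed on writes, which removes a whole class of masking bugs from the hot path of an interpreter loop.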


editors vim emscripten


Surely the very reason Emscripten was created?

Finally, my worlds have collided! In case you didn't know, I regularly write a Vim blog. It's surprisingly easy to find things to say about this 22-year-old text editor, and it's been my main tool for writing code and articles for a long time. Vim.js (GitHub: coolwanglu / vim.js) by Lu Wang is an Emscripten port of Vim, allowing you to use Vim in a browser.

It runs pretty well on my computer -- it seems fast, and the commands I typically use work. Unlike the Vim emulation layers in IDEs and other editors, which miss certain motions, registers, and so on, it's basically Vim. Split windows and tabs work, but the help files aren't available (or I can't find them). In the browser, it renders each terminal character as a <span>, which means the 43x115 example window contains 4945 spans!
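To make the span-per-character approach concrete, here's a sketch of rendering a terminal grid to an HTML string. It assumes a hypothetical `renderGrid(rows, cols, getCell)` helper where `getCell` returns a character plus foreground and background colours; vim.js's actual renderer may well differ.

```javascript
// Escapes characters that have special meaning in HTML.
function escapeHtml(ch) {
  switch (ch) {
    case '&': return '&amp;';
    case '<': return '&lt;';
    case '>': return '&gt;';
    default: return ch;
  }
}

// Builds one <span> per terminal cell, one <div> per row.
// getCell(row, col) returns { ch, fg, bg } (a hypothetical shape).
// The container needs `white-space: pre` and a monospace font so that
// spaces survive and columns line up.
function renderGrid(rows, cols, getCell) {
  const lines = [];
  for (let r = 0; r < rows; r++) {
    let line = '';
    for (let c = 0; c < cols; c++) {
      const { ch, fg, bg } = getCell(r, c);
      line += `<span style="color:${fg};background:${bg}">${escapeHtml(ch)}</span>`;
    }
    lines.push(`<div>${line}</div>`);
  }
  return lines.join('\n');
}
```

Per-cell spans make colours and cursor highlighting trivial, at the cost of a large DOM; a renderer that patches only changed cells, or draws to a canvas, would trade that simplicity for speed.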

From a JavaScript point of view, I found the Sync to Async Transformation document interesting. The author is trying to figure out how to deal with JavaScript's asynchronous nature, given that Vim expects a synchronous, non-busy sleep() function:

Most works are automatically done by web/transform.js, read the comments inside for more detail. But there are a few left, mainly function pointers, which cannot be automatically identified. Whenever vim.js crashes and you see callback function is not specified! in the browser console, congratulations, you have found one more async function at large.

I wonder if there are any Node developers or ES6 experts who could help with this. If you're interested in the project, there's a TODO list with some Emscripten issues and client-side work that needs doing.
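ES6 generators are one possible angle on the sleep() problem: code written in a synchronous style can `yield` where it would have called sleep(), and a small driver resumes it from a timer. This is a minimal sketch of the idea, not how vim.js's web/transform.js works (judging by its error message, that tool rewrites functions into callback style). `runWithSleeps` and `editorLoop` are hypothetical names.

```javascript
// Drives a generator that yields delay requests in milliseconds.
// Each `yield ms` acts like a non-busy sleep(ms): the generator pauses,
// the event loop keeps running, and the driver resumes it later.
// `schedule` defaults to setTimeout but can be swapped out for testing.
function runWithSleeps(genFn, schedule = setTimeout) {
  const it = genFn();
  function step() {
    const { value, done } = it.next();
    if (!done) schedule(step, value);
  }
  step();
}

// A hypothetical editor loop written in synchronous style.
function* editorLoop(log) {
  log.push('redraw');
  yield 50; // stands in for sleep(50)
  log.push('poll input');
  yield 50;
  log.push('done');
}

// In a browser: runWithSleeps(() => editorLoop([]));
```

The catch is that any function that calls a sleeping function must itself become a generator and delegate with `yield*`, so the transformation spreads up the call stack, much like the unidentified async function pointers described above.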