The Ocrad.js demo managed to recognise the text in my sample image. I noticed it didn't work with white on black text -- it had to be inverted for the correct text to be recognised.
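Inverting the image is easy to do on the client before handing it to the OCR code. Here's a minimal sketch, assuming the pixels use the four-bytes-per-pixel RGBA layout of canvas ImageData. The invertPixels helper is a hypothetical name of my own, not part of Ocrad.js.

```typescript
// Invert the red, green and blue channels of an RGBA pixel buffer in place.
// Assumes 4 bytes per pixel (R, G, B, A), as in canvas ImageData.data.
// The alpha channel is left untouched.
function invertPixels(pixels: Uint8ClampedArray): Uint8ClampedArray {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = 255 - pixels[i];         // red
    pixels[i + 1] = 255 - pixels[i + 1]; // green
    pixels[i + 2] = 255 - pixels[i + 2]; // blue
  }
  return pixels;
}

// One white pixel followed by one half-transparent black pixel.
const sample = invertPixels(
  new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 128])
);
```

In a browser you could read the buffer with ctx.getImageData(0, 0, width, height), invert its data array, and write the ImageData back with ctx.putImageData before running recognition.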
Anyway, I tried to compile GOCR first and was immediately struck by how easy and painless it had been. I was on a roll, and decided to do Ocrad as well. It wasn't particularly hard -- sure, it was slightly more involved, but still hardly anything.
He also mentions Tesseract, which is a popular OCR system but also widely known to be very large:
In fact, what's absolutely stunning is the sheer universality of Tesseract. Just about everything which claims to have text recognition as a feature is backed by it. At one point, I was hoping that Mathematica had some clever routine using morphology and symbolic new kinds of sciences and evolved automata pattern recognition. Nope! Nestled deep within the gigabytes of code lies the Chuck Testa of textadermies: Tesseract.
I thought Konrad's demo was cool -- being able to adjust the brightness and contrast and crop the image was a nice use of client-side technology. However, so far I've had the problem Kevin mentioned: occasional blocks of nothing, or seemingly random text, then suddenly excellent results.
The great thing about this project is they're trying to keep old software alive. You can browse through titles and play them in a browser. This is powered by jsmess (GitHub: jsmess / jsmess), an Emscripten-based emulator derived from MESS:
Running a game binary requires a suitable BIOS, but the groundwork for lots of systems has been added to MESS:
MESS and MAME were started over a decade ago to provide ubiquitous, universal emulation of arcade/gaming machines (MAME) and general computer hardware (MESS). While specific emulation implementations exist that do specific machines better than MAME/MESS, no other project has the comprehensiveness and modularity. Modifications are consistently coming in, and emulation breadth and quality increases over time. In the case of MAME, pages exist listing machines it does not emulate.
Over the last two years there's been a flood of new browser-based emulators, supporting everything from the Amiga to the Game Boy Advance. Part of what makes these projects possible is recent technologies like Canvas, WebGL, WebAudio, and FileReader. But even seemingly less buzzwordy APIs like typed arrays can help get old games running smoothly.
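To give a feel for why typed arrays suit emulation, here's a rough sketch of a 64 KiB address space for an imaginary 8-bit machine. It is not taken from jsmess or any other emulator; the function names and the memory size are assumptions for illustration only.

```typescript
// Hypothetical 64 KiB address space for an imaginary 8-bit machine.
// A Uint8Array is a fixed-size block of bytes, so every cell is
// guaranteed to hold an integer in the range 0..255.
const memory = new Uint8Array(0x10000);

// Addresses are masked to 16 bits so out-of-range accesses wrap around.
function write8(addr: number, value: number): void {
  memory[addr & 0xffff] = value; // values are stored modulo 256
}

function read8(addr: number): number {
  return memory[addr & 0xffff];
}

// Read a little-endian 16-bit word from two consecutive bytes.
function read16le(addr: number): number {
  return read8(addr) | (read8(addr + 1) << 8);
}

write8(0x1000, 0x34);
write8(0x1001, 0x12);
const word = read16le(0x1000); // low byte first, so this is 0x1234
```

The byte wrapping and fixed size come for free from the typed array, which is exactly the kind of bookkeeping an emulator would otherwise do by hand with a plain array.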
Finally, my worlds have collided! In case you didn't know, I regularly write a Vim blog. It's surprisingly easy to find things to say about this 22-year-old text editor, and it's been my main tool for writing code and articles for a long time. Vim.js (GitHub: coolwanglu / vim.js) by Lu Wang is an Emscripten port of Vim, allowing you to use Vim in a browser.
It runs pretty well on my computer -- it seems fast, and the commands I typically use work. It's not like these Vim layers for IDEs and other editors that miss certain motions, registers, and so on: it's basically Vim. Split windows and tabs work, but the help files aren't available (or I can't find them). The way it works in the browser is to use a <span> for each terminal character, which means for the 43x115 example window there are 4945 spans!
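To show how the span count adds up, here's a small sketch that builds one span per terminal cell as an HTML string. It is not vim.js's actual rendering code. The renderGrid function and the cell id scheme are made up for illustration, and I'm assuming 43 rows by 115 columns.

```typescript
// Build a grid of terminal cells as HTML, one <span> per character cell.
// Hypothetical helper: the ids and the wrapping <div> per row are assumptions.
function renderGrid(rows: number, cols: number): string {
  const lines: string[] = [];
  for (let r = 0; r < rows; r++) {
    let line = "";
    for (let c = 0; c < cols; c++) {
      line += `<span id="cell-${r}-${c}"> </span>`;
    }
    lines.push(`<div>${line}</div>`);
  }
  return lines.join("\n");
}

const html = renderGrid(43, 115);

// Count the opening span tags: 43 * 115 = 4945.
const spanCount = (html.match(/<span/g) || []).length;
```

Giving each cell its own element makes it straightforward to update a single character, at the cost of a very large DOM.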
Most works are automatically done by web/transform.js, read the comments inside for more detail. But there are a few left, mainly function pointers, which cannot be automatically identified. Whenever vim.js crashes and you see callback function is not specified! in the browser console, congratulations, you have found one more async function at large.
I wonder if there are any Node developers or ES6 experts that can help with this? If you're interested in the project, there's a TODO which has some Emscripten issues and client-side work that needs doing.