Convert HTML to ENML for Evernote – a non-trivial process

As the title implies, this post is for Evernote hackers.

Technically, this function is one of the most crucial parts of Cheeatz, an editor to Evernote your code with Gists and Markdown.

So my use case is: convert the HTML generated by Gist’s JavaScript and by Markdown into ENML, and save it in Evernote.

(Converting Gist’s JavaScript and Markdown into HTML is another non-trivial process, which is outside the scope of this article.)

There are nice JavaScript libraries for working with Evernote, namely the official SDK. For manipulating ENML, I recommend enmljs by berryboy. It is a simple and handy utility.

enml.js has useful and well-named methods: enml.PlainTextOfENML, enml.HTMLOfENML, etc.

The only thing missing is ENMLOfHTML(), which is what I need.

enmlOfHtml

Github: enmlOfHtml

HTML to ENML: an <a> to an <a>, a <p> to a <p>. Sounds easy. Only later did I find that this is indeed a non-trivial process. I hacked it together anyway and put it on GitHub as above.

usage:

var enmlOfHtmljs = require('enmlOfHtml');

var html = '<html><p>put html here</p></html>';

//ENML is valid ENML that you can send to evernote for creation

enmlOfHtmljs.ENMLOfHTML(html,function(err,ENML){

    console.log(ENML);

});

You can go straight to trying it, but some understanding of how it works is highly recommended.

Before going into details, we need to understand the process of saving a note in Evernote.

I won’t cover auth, tokens, etc., which you can read about in their documentation, but will focus on ENML.

ENML

ENML is based on a subset of XHTML. There are rules and a schema to follow, with permitted and prohibited elements, all of which are described in Evernote’s ENML documentation.

What needs to be done to convert HTML into ENML for the Evernote server

From the documentation,

  1. Convert the document into valid XML
  2. Discard all tags that are not accepted by the ENML DTD
  3. Convert tags to the proper ENML equivalent (e.g. BODY becomes EN-NOTE)
  4. Validate against the ENML DTD
  5. Validate href and src values to be valid URLs and protocols

XML

As in step 1, the basic requirement is that you write XML.

Here I used xml-writer, which enmljs also uses.
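As a rough illustration of what the finished XML has to look like, here is a minimal sketch that builds the ENML envelope with plain strings instead of xml-writer. The helper name wrapInEnml is hypothetical and not part of enmlOfHtml; it also assumes the inner markup has already been cleaned up.

```javascript
// Hypothetical helper (not part of enmlOfHtml): wraps already-sanitized
// XHTML markup in the XML declaration, ENML doctype and <en-note> root.
function wrapInEnml(innerMarkup) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">',
    '<en-note>' + innerMarkup + '</en-note>'
  ].join('\n');
}

var enmlDocument = wrapInEnml('<p>Hello Evernote</p>');
console.log(enmlDocument);
```

Everything the note displays goes inside en-note; the declaration and doctype lines stay the same for every note.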

DOM or Not?

Some libraries write XML using a tree-like structure or a DOM-like API. In my experience there is a performance penalty for emulating the DOM on the Node side (e.g. with jsdom), so I chose to write the HTML out directly.

I have been trying libxmljs, but at the moment I don’t see an advantage in using it for building XML. I do believe it is nice for parsing, though.

Since this part uses purely regex, it should work on both the client side and the server side.

Don’t escape the HTML!

One caveat: you need to use writeRaw to write the characters, otherwise the HTML will be escaped.
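To see why this matters, here is a small self-contained sketch of the difference between writing a fragment as escaped text and writing it raw. The escapeXml function is a hypothetical stand-in for what an XML writer typically does to text content; it is not xml-writer's actual code.

```javascript
// Hypothetical stand-in for the escaping an XML writer applies to text nodes.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

var fragment = '<p>Hello</p>';

// Written as text: the markup becomes visible characters inside the note.
var escapedNote = '<en-note>' + escapeXml(fragment) + '</en-note>';

// Written raw: the markup stays markup, which is what we want here.
var rawNote = '<en-note>' + fragment + '</en-note>';

console.log(escapedNote);
console.log(rawNote);
```

The escaped version is still valid XML, so nothing fails loudly; the note just shows the tags as literal text.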

Clean up your HTML

Steps 2 and 3 are the tricky part. Doing them with regex alone could be painful, but luckily I found the node-resanitize module.

I modified the library to support options for which attributes to escape.

Also remember to replace body with en-note.
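The body to en-note replacement can be sketched with a few regexes. This is only an illustration under simple assumptions (well-formed input, a single body element); bodyToEnNote is a hypothetical name and not the code used in enmlOfHtml.

```javascript
// Hypothetical helper: strips the document wrappers ENML does not allow
// and renames <body> to <en-note>.
function bodyToEnNote(html) {
  return html
    // Drop the <head> section entirely, including its content.
    .replace(/<head(\s[^>]*)?>[\s\S]*?<\/head>/gi, '')
    // Drop the <html> wrapper tags but keep what is inside them.
    .replace(/<\/?html(\s[^>]*)?>/gi, '')
    // Rename <body ...> to <en-note>, discarding the body attributes.
    .replace(/<body(\s[^>]*)?>/gi, '<en-note>')
    .replace(/<\/body>/gi, '</en-note>');
}

var page = '<html><head><title>t</title></head><body class="x"><p>hi</p></body></html>';
var converted = bodyToEnNote(page);
console.log(converted);
```

If the input has no body element at all, this sketch adds no en-note wrapper, so a real implementation would need to handle that case separately.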

CSS!

This is one of the most non-trivial parts. The logic goes:

  1. There is a linked style sheet in the HTML (as in a gist).
  2. ENML doesn’t support the link tag.
  3. Luckily, the style attribute is supported on most tags.

Inline it!

So you need to extract that style sheet (downloading it if needed) and inline it as style attributes.

Luckily, there is a bigger audience for this problem. Another place with similar requirements is something you use day to day: email.

So there are some good libraries out there. Styliner is excellent.

Styliner uses Q, so its result is returned inside a callback, which is why enmlOfHtml passes its result to a callback as well.
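To make the idea of inlining concrete, here is a toy sketch that handles only class selectors. It is not how Styliner works. The rule table, the class names and the function name inlineClassStyles are all made up for illustration, and it assumes the elements have no style attribute of their own.

```javascript
// Hypothetical rule table: class name -> CSS declarations, as if already
// extracted from the linked style sheet.
var rules = {
  k: 'color: #0000ff;',
  c: 'color: #999988; font-style: italic;'
};

// Replace each class attribute with a style attribute built from the rules.
// Classes without a rule are dropped, since ENML does not allow class anyway.
function inlineClassStyles(html, ruleTable) {
  return html.replace(/\sclass="([^"]*)"/g, function (match, classNames) {
    var declarations = classNames
      .split(/\s+/)
      .filter(function (name) {
        return Object.prototype.hasOwnProperty.call(ruleTable, name);
      })
      .map(function (name) { return ruleTable[name]; })
      .join(' ');
    return declarations ? ' style="' + declarations + '"' : '';
  });
}

var highlighted = '<span class="k">var</span> <span class="c other">// note</span> <span class="x">y</span>';
var inlined = inlineClassStyles(highlighted, rules);
console.log(inlined);
```

A real inliner also has to deal with selector specificity, descendant selectors and existing style attributes, which is exactly why a dedicated library is the better choice.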

Note the 5th step – values in href and src must be valid URLs and protocols

This is what I missed, and it led to a bug.

At the time of writing, GitHub changed their JavaScript to render one of the links without the gist domain, which is actually a bug.

So instead of href="https://gist.github.com/vincent....", there is href="/vincent..."

Then when a user tries to create a note on my site, it fails, because the create-note API call returns an error:

{ errorCode: 11, 

parameter: 'Error processing document: Invalid a href attribute:vincentlaucy/5548010/raw/29e88cc4f84422df5febadf93b10227f4c894c9b/gistfile1.js' } 

By trial and error, I found that for Evernote to accept your ENML, the value must at least contain a protocol followed by ://

A similar implementation exists in Sanitize, where you can pass options for what to accept (e.g. ftp://, http://, etc.), but it is client-side only.

Such values should be either removed or prefixed with a default or the current domain to pass validation.

I put in a simple regex for that purpose.
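A regex along these lines could do the job. This is a minimal sketch, not the regex actually shipped in enmlOfHtml: the function name absolutizeUrls and the base URL argument are assumptions, and it only handles double-quoted attributes.

```javascript
// Hypothetical helper: prefixes href/src values that lack a protocol
// with a base URL so they pass Evernote's URL validation.
function absolutizeUrls(markup, baseUrl) {
  var base = baseUrl.replace(/\/+$/, ''); // avoid a double slash when joining
  return markup.replace(/\b(href|src)="([^"]*)"/gi, function (match, attr, value) {
    // Leave values that already start with protocol:// untouched.
    if (/^[a-z][a-z0-9+.-]*:\/\//i.test(value)) {
      return match;
    }
    return attr + '="' + base + '/' + value.replace(/^\/+/, '') + '"';
  });
}

var broken = '<a href="/vincentlaucy/5548010">gist</a> <img src="http://example.com/a.png"/>';
var fixed = absolutizeUrls(broken, 'https://gist.github.com/');
console.log(fixed);
```

Values such as mailto: links or protocol-relative URLs (//host/path) would be wrongly prefixed by this sketch, so a real version needs to decide what to do with them.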

Side-track: this is why you should always write a “learning test” against an external API.

Make it better: Local Validation

I didn’t mention step 4: validation.

As mentioned in Evernote’s documentation:

Note: While it is possible to rely on the Evernote Cloud API to validate the ENML of your notes, we recommend downloading the DTD file (linked above) and use it to validate your note’s XML within your app. A few reasons this is a good idea:

  • Note validation will be much faster when performed locally.
  • Note validation can be performed offline.
  • The results of validating your notes locally will be the same as if you were to rely on the Evernote Cloud API to validate your ENML.

So Evernote uses a DTD rather than an XSD. I googled a bit for DTD validation in Node, but there seems to be no JavaScript library available at the moment. Let me know if you find one.
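Until a real DTD validator turns up, a cheap pre-flight check can at least catch the most obvious problems before calling the API. The sketch below is not DTD validation. It only scans for a handful of the elements Evernote lists as prohibited, and the names PROHIBITED_TAGS and findProhibitedTags are made up for illustration.

```javascript
// A partial list of elements that ENML prohibits; not the complete list.
var PROHIBITED_TAGS = [
  'applet', 'body', 'button', 'embed', 'form', 'frame', 'head', 'html',
  'iframe', 'input', 'link', 'meta', 'object', 'script', 'style', 'textarea'
];

// Hypothetical helper: returns the prohibited tag names found in the markup,
// each listed once, in order of first appearance.
function findProhibitedTags(enml) {
  var found = [];
  var tagPattern = /<\s*([a-zA-Z][a-zA-Z0-9-]*)/g; // opening tags only
  var match;
  while ((match = tagPattern.exec(enml)) !== null) {
    var name = match[1].toLowerCase();
    if (PROHIBITED_TAGS.indexOf(name) !== -1 && found.indexOf(name) === -1) {
      found.push(name);
    }
  }
  return found;
}

var suspicious = '<en-note><p>ok</p><script>x()</script><link rel="x"/></en-note>';
var problems = findProhibitedTags(suspicious);
console.log(problems);
```

An empty result does not mean the note is valid, only that none of these particular tags are present.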

Make it better: more

So I put together a trivial implementation of this non-trivial process, but more is worth doing:

  • more test cases
  • make this module support requirejs
  • run it on the client side
  • find/create a module that is good at HTML sanitizing on both the client side and the server side, with generic options

Hope you find it useful.

Happy to tell you this blog post was produced using Cheeatz.

Introducing Cheeatz: Evernote your code with Gists and Markdown

For the last 2 months I have been working hard on Cheeatz.

By Cheeatz I mean cheat sheet (and I hope it sounds like cheese to you).

It is on the Evernote Dev Cup.

Basically right now it is what the pitch line says:

Evernote your code with Gist and Markdown

So with Cheeatz you can write in Markdown, embed your gist and save it into Evernote, with all the formatting intact.

Demo

Write markdown and save gist into evernote.

For HOW-TOs, [visit the site](cheeatz.com/editor) or watch this video.
It is good for both non-developers and developers.

Basically, with this web app you can:

  • write your Markdown with a live preview,
  • embed a gist by id whenever you want, if you are a developer writing about some code,
  • save to Evernote and open it later to modify,
  • and share it with your friends.

Use Cases

  • you are creating a cheat sheet
  • you are creating documentation
  • you are jotting notes against some code

Upcoming / Missing features:

  • create gist right away from the editor
  • Tagging
  • Code language-syntax highlighting in editor

Why Evernote

I am addicted to Evernote and I think it provides really nice features – esp synchronization and search, of course.

There are various ways to embed code in a blog post or documentation. Gist is great and gaining popularity: it has nice syntax highlighting and, definitely important, every gist is a git repo. The only thing missing is pull requests, and Cheeatz is a little bit into that.

Also, the Evernote DevCup is certainly very kind to developers, so I started building this prototype on Evernote’s API.

It is non-trivial

You may first wonder why I don’t just clip the gist and paste it into Evernote.

That works in some scenarios, but what if your gist changes? Or your Markdown changes? You definitely need synchronization.

Also, most blogging platforms allow you to embed a gist and show it in the browser, but we also render the gist view and save it to your Evernote.

You can view your gist in Evernote with all the highlighting, and search it, even offline.

But Why

This is a prototype; what I am aiming for is something bigger.

The current prototype is missing the most important features: searching for Cheeatz and collaborating on them.

Too many times when I code, I google for quick snippets. I look for stuff like java buffered input stream or unix kill process by name, but I need to read through many pages to get that simple code.

A recent example: I was working with a Java expert and I saw him literally google for a Thread.sleep() example and paste it into the code. What he got was the Thread.sleep with the try catch.

Admit it, We all cheat

Thanks to Google I can be a programmer despite my bad memory. “Cheating” for a solution gives you the correct syntax, a quick answer and, more importantly, best-practice insights on stuff that you never thought of.

Cheating Ecosystem

What we need is a platform that we can cheat better, together.

Again, gist is great. But there are too many gists. Trust the crowd: voting up and down will be helpful.

We will also need ways for people to contribute to the same cheat sheet. What if every cheeatz were a git repository as well? Then you get all the familiar clone and pull requests.

With Markdown you write less

I only discovered markdown this year but it is definitely game-changing.

I have been using different approaches to sync markdown and evernote, but for seamless integration I will need to create my own.

Stackoverflow is there

Stackoverflow is great. It literally saves days and years for developers. You know what I mean.

However, sometimes I feel that Stackoverflow should be the place for problems that require detailed, quality discussion. As its policy says, google and read the documentation before you ask a question. Questions like how to loop over an array in Java or how to wait for document ready in jQuery really belong somewhere else.

Last year I was learning Groovy at my company, and it was really Groovy Goodness by mrhaki that taught me how. Sometimes you know what you want to do, like loop over an array, and you just need to get the damn syntax right. Or sometimes you don’t know what you are doing, but you see the examples and you get some idea. That site is excellent: you google it and it provides examples and expected results in a precise manner. In many cases you just copy, paste and modify a little bit, and you get what you want. This is quite inspiring for me.

Save some brain power

Personally I am a little bit into cognitive science, and thus I believe that providing the most efficient information retrieval is really important. A unified format. Applicable platforms. Executable code with readily available results will ease our brains: time is saved and fewer neurons are distracted.

This is a little more serious than my earlier stuff; at least I bought my very first domain name.

Stack

At the moment, it is Node.js/express.js/require.js/Redis. Yeoman for dev. Trying to put AngularJs into the picture.

Thanks also go to many more Node libraries; I will write a little bit more on this later.

So at the very least this blog post is written in the editor of Cheeatz I created, and I think it helped.

Thanks to my friends @gilbertwat and @westerpantz for contributing and making this happen.

Now I need to rush for my other projects – but let me know your thoughts on this. I do agree it might be quite fragile and slow right now, but if I can learn that it is something really worth the effort I will definitely improve it, and I believe this is how everything started.