As the title implies, this post is for Evernote hackers.
This function is technically one of the most crucial parts of Cheeatz, an editor for saving your code to Evernote with Gists and Markdown.
My use case: convert the HTML generated by Gist's JavaScript and by Markdown into ENML, and save it in Evernote.
(Converting Gist's JavaScript and Markdown into HTML is another non-trivial process, which is not in the scope of this article.)
There are nice JavaScript libraries for working with Evernote, namely the official SDK. For manipulating ENML, I recommend enmljs by berryboy. It is a simple and handy utility.
enml.js has useful, well-named methods: enml.PlainTextOfENML, enml.HTMLOfENML, etc.
The only thing missing is ENMLOfHTML(), which is what I need.
enmlOfHtml
GitHub: enmlOfHtml
HTML to ENML: <a> to an <a>, <p> to a <p>. Sounds easy. Only later did I find that it is indeed a non-trivial process. I hacked it together anyway and published it at the link above.
Usage:
var enmlOfHtmljs = require('enmlOfHtml');
var html = '<html><p>put html here</p></html>';
// ENML is valid ENML that you can send to Evernote for note creation
enmlOfHtmljs.ENMLOfHTML(html, function (err, ENML) {
  if (err) {
    // conversion failed (e.g. a stylesheet could not be fetched)
    return console.error(err);
  }
  console.log(ENML);
});
You can go straight ahead and try it, but understanding how it works is highly recommended.
Before going into the details, we need to understand the process of saving a note in Evernote.
I won't cover auth, tokens, etc., which you can read about in their documentation; instead I'll focus on ENML.
ENML
ENML is based on a subset of XHTML. There are rules and a schema to follow, with permitted and prohibited elements, which can be read here.
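Concretely, a complete ENML document is an XML declaration, a DOCTYPE pointing at Evernote's ENML DTD, and a single en-note root element wrapping the content. Below is a minimal sketch of that envelope; wrapAsEnml is a hypothetical helper name, and it assumes the content passed in is already valid XHTML that only uses permitted elements.

```javascript
// XML declaration plus the DOCTYPE that points at the ENML 2 DTD.
const ENML_HEADER =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">';

// Hypothetical helper: wrap already-valid XHTML content in the ENML envelope.
// It does no escaping or checking of its own.
function wrapAsEnml(innerXhtml) {
  return `${ENML_HEADER}<en-note>${innerXhtml}</en-note>`;
}
```

Everything else in this post is about getting the string you pass in here to be something Evernote will accept.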
What needs to be done to convert HTML into ENML on the Evernote server
From the documentation:
1. Convert the document into valid XML
2. Discard all tags that are not accepted by the ENML DTD
3. Convert tags to the proper ENML equivalent (e.g. BODY becomes EN-NOTE)
4. Validate against the ENML DTD
5. Validate href and src values to be valid URLs and protocols
XML
As step 1 says, the basic requirement is that you write XML.
Here I used xml-writer, which enmljs also uses.
DOM or Not?
Some libraries write XML using a tree-like structure or a DOM-like API. In my experience there is a performance penalty for emulating the DOM on the Node side (e.g. with jsdom), so I chose to write the HTML directly.
I have also tried libxmljs, but at the moment I don't see an advantage in using it for building XML. For parsing, however, I believe it is nice.
Since this part uses purely regex, it should work on both the client side and the server side.
Don't escape that HTML!
One caveat: you need to use writeRaw to write the markup; otherwise the HTML will be escaped.
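To see why raw writing matters, here is a hypothetical mini-writer that mimics the two behaviours: a text-style method escapes markup, while a raw method inserts it verbatim. The names makeWriter, text and raw are assumptions for illustration and are not xml-writer's actual API.

```javascript
// Escape the characters that would otherwise be read as markup.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical mini-writer: text() escapes its input, raw() appends it untouched.
function makeWriter() {
  const parts = [];
  return {
    text(s) { parts.push(escapeXml(s)); return this; },
    raw(s) { parts.push(s); return this; },
    toString() { return parts.join(''); },
  };
}
```

Feeding already-converted HTML through the escaping path turns `<p>` into `&lt;p&gt;`, so the note would show the tags as literal text; generated markup has to go through the raw path, while user text still needs escaping.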
Clean up your HTML
Steps 2 and 3 are the tricky part. Doing them with regex alone could be painful, but luckily I found the module node-resanitize.
I modified the library to support options for which attributes to escape.
Also remember to replace body with en-note.
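For a feel of what steps 2 and 3 involve, here is a rough regex-only sketch. It is an illustration under several assumptions, not node-resanitize's behaviour: roughSanitize is a hypothetical name, it covers only a subset of the elements and attributes the ENML DTD rejects, it assumes double-quoted attributes, and because it strips class, any CSS inlining would have to run before it.

```javascript
// Subset of elements the ENML DTD rejects (Evernote's ENML docs list them all).
const DROP_WITH_CONTENT = ['script', 'style', 'noscript', 'iframe', 'object'];
const DROP_TAG_ONLY = ['html', 'head', 'meta', 'link', 'form', 'input', 'button'];

// Hypothetical helper: crude version of steps 2 and 3.
function roughSanitize(html) {
  let out = html;
  // Step 2a: remove elements whose content is meaningless in a note.
  for (const tag of DROP_WITH_CONTENT) {
    out = out.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, 'gi'), '');
  }
  // Step 2b: remove the tags themselves but keep whatever they wrap.
  for (const tag of DROP_TAG_ONLY) {
    out = out.replace(new RegExp(`</?${tag}\\b[^>]*>`, 'gi'), '');
  }
  // Step 2c: strip attributes ENML rejects: id, class and on* event handlers.
  out = out.replace(/\s(?:id|class|on[a-z]+)="[^"]*"/gi, '');
  // Step 3: <body> becomes <en-note>.
  return out.replace(/<body\b[^>]*>/gi, '<en-note>').replace(/<\/body>/gi, '</en-note>');
}
```

This is exactly the kind of code that gets painful quickly (nested or malformed tags, single-quoted attributes), which is why a real sanitizer is the better choice.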
CSS!
This is one of the most non-trivial parts. The logic is:
- there is a linked style sheet in the HTML (as in Gist)
- ENML doesn't support the link tag.
- Luckily, the style attribute is supported on most tags.
Inline it!
=> So you need to extract that style sheet (downloading it if needed) and inline its rules as style attributes.
Luckily, there is a bigger audience for this problem. Another place with similar requirements is something you use day to day: email.
So there are some good libraries out there. Styliner is excellent.
However, it uses Q and returns its result inside a callback, which is why enmlOfHtml also returns its result through a callback.
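For intuition about what an inliner like Styliner does, here is a sketch of the core idea for the simplest case only: single-class selectors such as `.line`. inlineClassStyles is a hypothetical name; it assumes double-quoted attributes and ignores tag, id, descendant and pseudo-class selectors, @media blocks and specificity, all of which a real inliner has to handle.

```javascript
// Hypothetical helper: inline rules for single-class selectors (".foo") only.
function inlineClassStyles(html, css) {
  // Trim and drop a trailing ";" so declaration lists can be joined with ";".
  const tidy = (decls) => decls.trim().replace(/;+\s*$/, '');

  // 1. Collect declarations per class name from "selector { declarations }" rules.
  const byClass = new Map();
  const ruleRe = /([^{}]+)\{([^}]*)\}/g;
  let m;
  while ((m = ruleRe.exec(css)) !== null) {
    const decls = tidy(m[2]);
    if (!decls) continue;
    for (const selector of m[1].split(',')) {
      const cls = /^\.([\w-]+)$/.exec(selector.trim());
      if (!cls) continue; // anything other than ".name" is ignored
      const prev = byClass.get(cls[1]);
      byClass.set(cls[1], prev ? prev + ';' + decls : decls);
    }
  }

  // 2. Rewrite each opening tag whose class attribute has matching rules.
  return html.replace(/<([a-zA-Z][\w-]*)(\s[^>]*?)(\/?)>/g, (tag, name, attrs, selfClose) => {
    const classAttr = /\sclass="([^"]*)"/.exec(attrs);
    if (!classAttr) return tag;
    const fromCss = classAttr[1].split(/\s+/).map((c) => byClass.get(c)).filter(Boolean);
    if (fromCss.length === 0) return tag;
    // An existing style attribute goes last so it still wins over stylesheet rules.
    const styleAttr = /\sstyle="([^"]*)"/.exec(attrs);
    const parts = styleAttr ? fromCss.concat(tidy(styleAttr[1])) : fromCss;
    const rest = attrs.replace(/\sstyle="[^"]*"/, '');
    return `<${name}${rest} style="${parts.join(';')}"${selfClose}>`;
  });
}
```

For example, `<div class="a b">` with `.a { color: red; } .b{font-weight:bold}` becomes `<div class="a b" style="color: red;font-weight:bold">`; the class attribute is left for the sanitizer to strip afterwards.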
Check the Link
Note the 5th step: values in href and src must be valid URLs and protocols.
This is what I missed, and it caused a bug.
At the time of writing, GitHub changed their JavaScript to render one of the links without the gist domain (actually a bug on their side),
so instead of href="https://gist.github.com/vincent....", there was href="/vincent...".
Then, when a user tried to create a note on my site, it failed, because calling the create note API returned an error:
{ errorCode: 11,
parameter: 'Error processing document: Invalid a href attribute:vincentlaucy/5548010/raw/29e88cc4f84422df5febadf93b10227f4c894c9b/gistfile1.js' }
By trial and error, I found that for Evernote to accept your ENML, each of these values must at least start with a protocol followed by ://.
A similar implementation exists in Sanitize, where you can pass options for which protocols to accept (e.g. ftp://, http://, etc.); it is just client side only.
Values that fail this check should either be removed or rewritten against a default or the current domain so they pass the validation.
I put in a simple regex for that purpose.
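A minimal sketch of that kind of fix, not the regex actually used in enmlOfHtml. It assumes every href/src value without a `scheme://` prefix should be resolved against the page it came from; absolutizeLinks and its baseUrl parameter are hypothetical names. It uses the WHATWG `URL` class built into Node, and drops the attribute when the value still lacks `://` after resolution (e.g. `javascript:` links) or cannot be resolved at all.

```javascript
// Hypothetical helper: make href/src values absolute so they pass step 5.
function absolutizeLinks(html, baseUrl) {
  // Matches values that already start with "scheme://", e.g. https:// or ftp://.
  const hasProtocol = /^[a-z][a-z0-9+.-]*:\/\//i;
  return html.replace(/\s(href|src)="([^"]*)"/gi, (attr, name, value) => {
    if (hasProtocol.test(value)) return attr; // already acceptable, keep as-is
    try {
      // Resolve relative values such as "/user/123/raw/file.js" against the page URL.
      const resolved = new URL(value, baseUrl).href;
      // Values like "javascript:..." still lack "://" after resolving, so drop them.
      return hasProtocol.test(resolved) ? ` ${name}="${resolved}"` : '';
    } catch (e) {
      return ''; // unresolvable (e.g. invalid baseUrl): drop the attribute
    }
  });
}
```

With baseUrl set to `https://gist.github.com`, a root-relative Gist link like the one above comes back with the gist domain prepended instead of being rejected by the API.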
Side-track: this is why you should always write “learning tests” against external APIs.
Make it better: Local Validation
I didn't mention step 4, validation.
As mentioned in Evernote's documentation:
Note: While it is possible to rely on the Evernote Cloud API to validate the ENML of your notes, we recommend downloading the DTD file (linked above) and use it to validate your note’s XML within your app. A few reasons this is a good idea:
- Note validation will be much faster when performed locally.
- Note validation can be performed offline.
- The results of validating your notes locally will be the same as if you were to rely on the Evernote Cloud API to validate your ENML.
So Evernote uses a DTD rather than an XSD. I googled a little for DTD validation in Node, but there seems to be no JavaScript library available at the moment. Let me know if you find one.
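Until a DTD validator turns up, one cheap local check is well-formedness (step 1), which catches a class of errors before a round trip to the API. The sketch below is only a tag-balance smoke test under that assumption: checkTagBalance is a hypothetical helper, it does not validate against the ENML DTD, and it ignores comments, CDATA and attribute values that contain `>`.

```javascript
// Hypothetical helper: a well-formedness smoke test, NOT validation against the ENML DTD.
function checkTagBalance(xml) {
  const open = [];
  const problems = [];
  // Open, close and self-closing tags; <?xml ...?>, <!DOCTYPE ...> and <!-- --> never match.
  const tagRe = /<(\/?)([a-zA-Z][\w:-]*)[^>]*?(\/?)>/g;
  let m;
  while ((m = tagRe.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = m;
    if (selfClosing) continue; // e.g. <br/>
    if (!closing) {
      open.push(name); // e.g. <p>
      continue;
    }
    const expected = open.pop(); // e.g. </p> should close the most recent open tag
    if (expected !== name) {
      problems.push(`</${name}> does not close <${expected || 'nothing'}>`);
    }
  }
  for (const name of open) problems.push(`<${name}> is never closed`);
  return problems;
}
```

Note that an HTML-style `<img src="a.png">` without the trailing slash is reported as never closed, which is the right answer for XML-based ENML.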
Make it better: more
So I put together a trivial implementation of this non-trivial process, but more is worth doing:
- more test cases
- make this module support RequireJS
- make it work on the client side
- find or create a module that is good at HTML sanitizing on both the client side and the server side, with generic options
Hope you find it useful.
Happy to tell you this blog post was produced using Cheeatz.