## Programmer Productivity
# Variations on the HTML Two-Step
The World Wide Web's hypertext markup language (HTML) has an inherent beauty all
its own:
* Both the author's words and the typographer's instructions are contained in a
single plain text file.
* Instructions are encoded using short, easily remembered mnemonics.
* Mnemonics are wrapped into pairs of less-than/greater-than tags which are easily
parsed by software.
* Tagged words are nested inside tagged paragraphs.
* Paragraphs are assembled into sections, and sections into the whole.
* The whole document is structured in a hierarchical way that just makes sense.
It's easy to get started with HTML. There's a handful of tags, with single-letter
mnemonics, used to surround significant words (<b>, <i>, <u>, ...).
There's another batch of tags to mark the beginning and ending of paragraphs, pull
quotes, sections and divisions (<p>, <blockquote>, <section>,
<div>, ...). And there are a few miscellaneous tags for proper document assembly
and interoperability (<title>, <meta>, <link>, <script>,
<style>). Beginners can successfully use HTML with just these.
With HTML, the author's work is light, because the browser does most of the
heavy lifting. Browsers read the tags, determine how to fit the composition to
the page, and how to style the typography and page decorations.
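
To make the nesting described above concrete, here is a minimal sketch of software reading tags and recovering the hierarchy. It relies only on Python's standard `html.parser` module; the names `OutlinePrinter` and `outline`, and the sample sentence, are my own illustration and not part of any browser or standard.

```python
from html.parser import HTMLParser

# A subset of HTML's void elements: tags that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class OutlinePrinter(HTMLParser):
    """Collects one indented line per tag or text run, showing the nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        if tag not in VOID_TAGS:
            self.depth += 1  # everything until the closing tag is a child

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + repr(text))


def outline(html: str) -> list[str]:
    """Return the document's structure as a list of indented lines."""
    parser = OutlinePrinter()
    parser.feed(html)
    parser.close()
    return parser.lines


sample = "<section><p>Tagged <b>words</b> sit inside paragraphs.</p></section>"
print("\n".join(outline(sample)))
```

The indentation of each printed line mirrors how deeply that tag or phrase is nested, which is roughly the tree a browser builds before it decides how to lay out the page.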
But this apparent ease of use and inherent beauty are deceiving. In practice,
HTML is neither easy to write nor easy to read. It's ironic for me to say this
because I *use* HTML daily. But sadly, I seldom *read* an HTML document outside
a browser (unless I'm troubleshooting a problem). And even more rarely do I
*write* an HTML document from scratch without the aid of a software app.
I know I'm not alone. It's not that HTML is impossible to read or write — it's
just awkward.
For starters there's the pinky finger problem. You know — the keyboard sequence
for creating a tag: the left pinky finger holds down the `Shift` key while the
right middle finger searches for the `<` key . . . then you release the `Shift` key
while typing the tag's mnemonic characters . . . then once again with the left
pinky `Shift` key thingy, while the right ring finger (or is it the one next to
it?) fumbles around for the `>` key.
Then — after composing the paragraph or phrase or word — the whole process is
repeated with the closing tag, with all the same keystrokes plus the ergonomic *coup
de grâce*: the right pinky hitting the `/` key *while remembering that the left
pinky should momentarily let up on pressing the `Shift` key!*
Try doing this for a sustained period and you'll soon be filing a workers'
compensation claim for ergonomic injury. It's like the party game Twister, where
your body parts get tied in knots just by following the rules.
So writing HTML is unpleasant, but what about reading it?
In actual practice, HTML tags seldom appear stand-alone. Attributes are
frequently applied to tags in order to associate them with styling rules (CSS).
And on dynamic documents, attributes are applied to tags for dynamic
manipulation (JavaScript). Both of these clutter the document significantly. But
the real problem is that tags and attributes aren't just lightly sprinkled here
and there; they're dumped all over the document in a downpour — like one of
those climate change rainfalls that floods the city so you can't see the streets
anymore.
Reading an HTML document, even one that has color syntax highlighting, makes
you wish there were a legal signal-to-noise limit, with offenders getting fined.
It's not uncommon to discover documents where the noise is so many times greater
than the signal that it's impossible to find the author's words.
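
For the curious, the signal-to-noise idea can be roughly measured. The sketch below is only one possible definition, and it is my assumption rather than any standard metric: it treats the visible text as signal and every other non-whitespace character in the file as noise. The names `TextCollector` and `signal_ratio` and both sample strings are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Accumulates the human-readable text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def signal_ratio(html: str) -> float:
    """Fraction of the file's non-whitespace characters that are author text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    signal = sum(len("".join(chunk.split())) for chunk in parser.chunks)
    total = len("".join(html.split()))
    return signal / total if total else 0.0


plain = "<p>Hello world</p>"
cluttered = (
    '<p class="lead" id="intro" data-track="hero">'
    '<span style="color:red">Hello world</span></p>'
)
print(round(signal_ratio(plain), 2), round(signal_ratio(cluttered), 2))
```

Both samples carry the same two words, yet the second scores far lower because attributes and wrapper tags dominate the file. The measure is approximate: character references such as `&amp;` count as one character of signal but several characters of source.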
If Clara Peller were still around, she'd be demanding to know "Where's the
content?"
HTML wasn't supposed to be like this.
---
Back in 1990, HTML and its sibling, the hypertext transfer protocol (HTTP),
were developed as a novel solution to an age-old problem. How can technical data
about an active project be written and distributed to all team members while
every piece of the project is in a continual state of flux?
One of HTML's unsung innovations was that documents were written using plain
text files. That meant that documents could be shared across different operating
systems and read on any computer without the need for any special word
processing software. At the time, plain text files with embedded instructions
that both humans and computers could understand were a significant new
feature.
But of course the World Wide Web became an entirely different thing. The concept
of hypertext linking became the star of the show, and the promise of
software-less authoring and revision soon fell by the wayside.
---
A few years ago all my notions about HTML were put to the test. Lisa, an
acquaintance, was quizzing me about software that would allow her to
self-publish books. Lisa is an accomplished writer, and she's no stranger to
tech. She's always looking to improve her craft and was ready to learn something
new if it held the promise of streamlining her work.
Now there's something you should know about Lisa. She's the kind of person who
likes things to be simple and distraction-free. So for her, word processing
software with ribbons of buttons was on the way out. She just wanted a clean
slate where her composition could come together, regardless of which device she
had at hand, or what task she was working on.
Somehow Lisa had gotten the notion that learning HTML was her ticket to eBook
publishing, so she was ready to go all in.
At first glance, HTML appears to be a writer's dream. Simple mnemonic marks
annotate the author's manuscript to describe what needs to be emphasized
(<em>, <strong>), or how to break a long document into sections
(<article>, <section>, <hr>), or how to place some paragraphs
out of the normal reading flow (<aside>, <blockquote>).
In addition to that, the model for HTML's tags closely follows the well-established
jargon used in books and journals. So Lisa thought it was a natural fit for her
writing. There's a tag for the document's title (<title>), tags for recording
information about the author, publisher, and copyright (<meta>), tags for
building a hierarchical table of contents (<h1>, <h2>, <h3>,
...), and even tags for editors to communicate necessary revisions (<ins>,
<del>). All of that without any need for special authoring software, or a
content management system, or even a database. So why not write the manuscript
directly using HTML?
The more she learned, the more questions she had. But increasingly these weren't
easy "how to" questions. They were the tough "why" questions — like the ones
delivered non-stop by a three-year-old trying to understand the world.
* Why isn't there a way to build a Table of Contents from the headings?
* Why can't the working sections of a manuscript be placed in separate files and
assembled into a finished document later?
* Why aren't hyperlinks bidirectional?
* Why can't the computer figure out that sentences within a list (<ul> and
<ol>) are "list items" without explicit <li> tags?
* Why can't rows and cells of a table be implied without all those <tr> and
<td> tags?
* Why do some tags need a complementary closing tag, while others don't?
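
The first of those questions is a fair one, because the information is already in the document. HTML itself offers no answer, but as a rough sketch, an external script can derive a table of contents from the heading tags. This uses Python's standard `html.parser`; the names `HeadingCollector` and `table_of_contents` and the sample manuscript are hypothetical, and the approach assumes headings are properly closed.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingCollector(HTMLParser):
    """Records (level, text) for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = HEADINGS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            # Join the pieces and collapse runs of whitespace.
            title = " ".join("".join(self._text).split())
            self.entries.append((self._level, title))
            self._level = None


def table_of_contents(html: str) -> list[str]:
    """Return one indented line per heading, two spaces per level."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return ["  " * (level - 1) + title for level, title in parser.entries]


manuscript = (
    "<h1>Book</h1><p>Preface text.</p>"
    "<h2>Chapter <em>One</em></h2><h3>Scene</h3>"
    "<h2>Chapter Two</h2>"
)
print("\n".join(table_of_contents(manuscript)))
```

That it takes a separate program to do this, rather than a tag, is exactly the kind of gap Lisa was pointing at.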
I became exasperated with Lisa's tirade and only managed to utter "just
because". The truth was: I had been in this game since the beginning and had
forgotten how to challenge my own assumptions.
---
Lisa's questions were too close to home. Why were so many people using WordPress
with its dependencies on PHP and MySQL — shouldn't authors be able to write blog
posts one document at a time, on any platform, without all that overhead?
And what about Markdown? Why have software developers resorted to writing their
README files using Markdown? Surely software developers know how to write HTML!
Why be restricted to Markdown's small subset of HTML (one that doesn't even have
support for tables), for such an important part of their project?
And then there's wikitext. Is it possible that Wikipedia owes its success, in part,
to the fact that it doesn't require contributors to directly deal with HTML?
Everywhere we look we see how directly writing HTML has been shunned. Look at
how Google Docs, Medium, Ghost — and all the other software apps whose focus is
on writing articles that target readers using a browser — have gone to great
lengths to shield authors from ever seeing HTML.
As an author and a software developer I took this as a challenge. Could I come
up with a way to keep the good parts of HTML while getting rid of the ergonomic
disaster that SGML imposed on it?
It took a lot of work, but eventually I settled on a design that was
comfortable, and an implementation that worked flawlessly. And by flawless, I
mean *lossless round-trip interoperability.* Document conversions between my
design and HTML could be conducted in either direction with complete fidelity.
There isn't space here to provide a tutorial for how the design works, but
curious readers can examine and compare it for themselves. Below are links to
four text documents hosted on GitHub that contain the entire text of this
article, word-for-word with typography and hypertext marks. You can compare for
yourself which you find most readable.
As for writing, I'll simply tell you that this article was not written using
Medium's famously lovable editor. I'll let you guess which of these four is the
original composition, and which are the converted knockoffs:
* BluePhrase
* Markdown
* HTML
* Wikitext
Here are convenient links to the documentation for each of the above formats to
help with the comparison:
* BluePhrase Syntax
* John Gruber's Markdown Syntax
* HTML Living Standard
* MediaWiki Cheatsheet
Finally, here is a safe playground where you can experiment with what I've come
up with: Blue Fiddle.
---
In 2004 Tim Berners-Lee, the original creator of the World Wide Web (HTML +
HTTP), was knighted by Queen Elizabeth II for his contributions to society. I
won't go so far as to say my little improvement deserves that much credit, but I
won't refuse the honor if Her Majesty offers it.