How to build rich, styled input on the web

Building rich text editing experiences on the web is tricky. Even adding simple features like bold and italics means digging into some of the weird, old corners of the HTML standard. This post walks through the basics of how it's done.

1 Getting to know contenteditable
2 Setting up a text input
2.1 Restricting input to plain text
2.2 Finding the cursor
2.3 Handling pasted content
3 Putting it together: building a rich text input area
3.1 Preserving cursor position
4 Next post: adding emoji / auto-complete

tl;dr: Here's an example. Here's the source.

I recently set out to build a rich input box, assuming some googling and a few Stack Overflow answers would be enough to get started. I found plenty of writing about how bad the browser's built-in tools are, and much less help on how to actually start building something. This is an attempt to document what I learned building simple rich text editing on the web.

There are two big parts of building a rich editing experience: detecting the content a user inputs, and displaying that content. I knew enough CSS and HTML to get content on a page, so this post focuses mostly on capturing input cleanly.

A kind-of-standard for managing user input is the contenteditable attribute, which, as far as I can tell, is an artifact of an Internet Explorer 5.5 feature for editing emails in Outlook Online. Other browsers sort-of-cloned the feature. It's bug-ridden at best and broken by design at worst. It's also very powerful if you can avoid the messy parts.

There are many great open source projects with a robust approach to fully-featured rich text editing. I was more interested in building some basic functionality (autocomplete and at-mentioning) without loading an entire text editor before page load.

1 Getting to know `contenteditable`

It's worth playing with contenteditable a little.

I've embedded a div with the contenteditable attribute set in an iframe; I'd recommend using the link beneath it to try it in a new tab. Here are some things to try:

Type into the box.
Copy and paste part of a website. Images, formatting, and potentially some JavaScript (onclick= attributes, for example) can come along with it.
Copy something with complex CSS (e.g., explicit positioning). Stuff can escape the box.

<html>
  <body>
    <div contenteditable="true">
      editable content!
    </div>
  </body>
</html>

example1.html ( full source )

2 Setting up a text input

My first goal is to build something that behaves like a textarea: pretty much any box that allows only text input and has a cursor location will work. I'm going to use a div with the contenteditable attribute set to manage the cursor, and then restrict its behavior to allow only plain text.

2.1 Restricting input to plain text

Starting with the box in the first example, I now want to restrict its contents to plain text. On some WebKit-based browsers the attribute contenteditable="plaintext-only" does this, but not all browsers are WebKit-based.
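Rather than guessing from the browser name, one option is feature detection. Below is a minimal sketch with a hypothetical helper, supports_plaintext_only, which assumes (as the HTML spec describes) that assigning an unrecognized value to an element's contentEditable property throws a SyntaxError. The createElement parameter is my own addition so the logic can be exercised outside a browser with a fake element.

```javascript
// Sketch: detect support for contenteditable="plaintext-only".
// Assumption: assigning an unsupported value to the contentEditable
// property throws (SyntaxError per the HTML spec), or does not stick.
// `createElement` is injectable (hypothetical) for testing outside a browser.
function supports_plaintext_only(createElement) {
  if (!createElement) {
    createElement = function (tag) { return document.createElement(tag); };
  }
  var probe = createElement("div");
  try {
    probe.contentEditable = "plaintext-only";
    // read the value back in case the assignment was silently ignored
    return probe.contentEditable === "plaintext-only";
  } catch (e) {
    return false; // the browser rejected the value
  }
}

// Usage in a browser (commented out so the sketch runs anywhere):
// editable_div.contentEditable =
//   supports_plaintext_only() ? "plaintext-only" : "true";
```

If it returns false, the fallback is contenteditable="true" plus the paste and drop handling described next.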

My approach is to capture any insertion that isn't a single character resulting from a keystroke and define some custom behavior for it. This means I need to detect drag-and-drop and paste events. According to the docs, these insertions always trigger a drop event; in practice, I need to listen for paste events as well.

function paste_event_handler(e){
  var raw_text, raw_data, paste_data;

  // prevent the paste from happening
  e.preventDefault();

  // try to get the data from a DragEvent
  raw_data = e.dataTransfer;

  // fall back to data from a ClipboardEvent
  paste_data = e.clipboardData;
  raw_data = raw_data || paste_data;

  // extract the plain text ("text/plain" is the standard
  // format name; "Text" is a legacy alias for it)
  raw_text = raw_data ? raw_data.getData("text/plain") : "";
  console.log(raw_text);
}

// set up listeners for `drop` and `paste` events
var editable_div = document.getElementById("my-editable-div");
editable_div.addEventListener("drop", paste_event_handler);
editable_div.addEventListener("paste", paste_event_handler);

example2.html ( full source )

Here's an experiment that suppresses any input that isn't normal typing and logs the content to the console instead. There's no need to open a dev console: the output is captured and displayed on the page. In most browsers, drag-and-drop from outside the iframe is suppressed.

2.2 Finding the cursor

According to the W3C working draft, when an editable element has focus it has a cursor or selection. You can check the position of the cursor or selection programmatically.

Before inserting content at the cursor, I needed to understand the structure of selection ranges and how to manipulate them. I'll review the basics, but feel free to skip ahead.

Nodes, Selections, and Ranges

The content of a div is composed of a set of Nodes (as is the rest of the DOM). For my purposes there are two types of nodes:

text nodes: blocks of text; they don't have any additional markup
elements: the DOM nodes we manipulate all the time (divs, spans, etc.)

If there is a cursor or selection on the page, it is represented as a Range. A range has a start location and an end location, called boundary points. A boundary point is defined by a node and an offset.

In a text node, a boundary point is effectively an index into a string: the boundary point sits immediately after the <offset>th character in the string (or before the first character if offset = 0).
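To make the text-node case concrete, here's a small model using plain JavaScript strings (no DOM). split_at_boundary is a hypothetical helper: it returns the text on either side of a boundary point, which is also where insertNode would split the node. One detail worth knowing: DOM offsets count UTF-16 code units, just like JavaScript string indices, so an emoji such as "😀" takes up two offsets.

```javascript
// Sketch: a boundary point in a text node is an index into its string.
// Offset 0 is before the first character; offset N is right after the
// Nth character. Plain strings only; this is not a DOM API.
function split_at_boundary(text, offset) {
  if (offset < 0 || offset > text.length) {
    // the DOM rejects out-of-range offsets too (IndexSizeError)
    throw new RangeError("offset " + offset + " is outside 0.." + text.length);
  }
  return [text.slice(0, offset), text.slice(offset)];
}

split_at_boundary("highlight", 0); // ["", "highlight"]
split_at_boundary("highlight", 4); // ["high", "light"]
split_at_boundary("highlight", 9); // ["highlight", ""]
```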

In an element, the boundary point lies between child nodes: immediately after the <offset>th child node (or before the first child if offset = 0).
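The element case works the same way, except that the offset counts child nodes instead of characters. Here's another plain-array sketch (hypothetical helper name, no DOM) that reports which children sit on either side of a boundary point; the three labels loosely mirror the top level of the example3 markup below: a text node, a span, and another text node.

```javascript
// Sketch: in an element, offset k sits after the kth child, i.e. just
// before children[k]. Children are modeled as an array of labels.
function neighbors_of_boundary(children, offset) {
  if (offset < 0 || offset > children.length) {
    throw new RangeError("offset " + offset + " is outside 0.." + children.length);
  }
  return {
    before: offset > 0 ? children[offset - 1] : null,         // child left of the point
    after: offset < children.length ? children[offset] : null // child right of the point
  };
}

var div_children = ["text(try)", "span", "text(this sentence)"];
neighbors_of_boundary(div_children, 0); // { before: null, after: "text(try)" }
neighbors_of_boundary(div_children, 1); // { before: "text(try)", after: "span" }
neighbors_of_boundary(div_children, 3); // { before: "text(this sentence)", after: null }
```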

I found this easier to understand after playing with it, so I've written a tool for experimenting with boundary points. Different browsers behave slightly differently along the boundaries between nodes, but the way indexing works should stay the same.

In this demo, there is a graphical representation of the nodes, with red and green dots at the boundary points when the box has focus.

<!-- raw html from example -->
  try
  <span style="color: red">
    <b> highlight</b>
    ing or clicking parts of
  </span>
  this sentence

example3.html

Notice that "this" and "sentence" are different nodes even though they look like the same chunk of HTML. Try typing, and using ctrl+b and ctrl+i to bold and italicize text. Depending on your browser, you may be able to position boundary points between text nodes or at element boundaries. Each browser is a little different in its implementation of this behavior.

The Node.normalize() method joins adjacent text nodes into a single node (and removes empty text nodes). You can call it on a parent node that contains multiple child nodes.
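Here's a sketch of what normalize does, on a plain-object model of the DOM rather than real nodes. The node shape ({ type, value } for text, { type, tag, children } for elements) and the name normalize_model are my own; unlike the real Node.normalize(), which mutates the tree in place, this returns a new tree so it's easy to inspect.

```javascript
// Sketch of Node.normalize() on a plain-object tree (hypothetical shape).
// Adjacent text children are merged, empty text nodes are dropped, and
// the same happens recursively inside every descendant element.
function normalize_model(node) {
  if (node.type !== "element") return node;
  var merged = [];
  node.children.forEach(function (child) {
    var normalized = normalize_model(child);
    if (normalized.type === "text") {
      if (normalized.value === "") return; // empty text nodes are removed
      var last = merged[merged.length - 1];
      if (last && last.type === "text") {
        // join with the preceding text node instead of appending
        merged[merged.length - 1] = { type: "text", value: last.value + normalized.value };
        return;
      }
    }
    merged.push(normalized);
  });
  return { type: "element", tag: node.tag, children: merged };
}

var before = {
  type: "element", tag: "div", children: [
    { type: "text", value: "this" },
    { type: "text", value: "" },
    { type: "text", value: " sentence" },
    { type: "element", tag: "b", children: [{ type: "text", value: "bold" }] }
  ]
};
normalize_model(before).children.length; // 2: "this sentence" and the <b>
```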

I need to manipulate selection ranges as I mutate the div to make the text "rich". The selection and range APIs can do a lot of things; these are the fields and functions I use over the rest of this post to work with selections.

List of useful commands for manipulating selection ranges

document.getSelection()
Returns a Selection, which represents the current selection in the document; it contains zero or more Ranges.
Selection.rangeCount
The number of Ranges in the selection. This is either 0 or 1 in almost every practical situation.
There are some rare cases when rangeCount > 1. Firefox, for example, lets a user hold Ctrl while selecting to add several ranges to one selection. For this post, I'll assume that there is at most one range. In practice you will have to handle higher rangeCounts, for example by checking which elements have focus.
Range.startContainer, Range.startOffset, Range.endContainer, Range.endOffset
The (node, offset) pairs defining a range.
Range.deleteContents()
Delete the content between the boundaries, removing any nodes that are contained completely between those two points.
Range.insertNode(node)
Inserts a node immediately after the start boundary, splitting any text nodes into multiple nodes and pushing any other boundary points to the end of the inserted node.
before insert - <start1><start2><end1> content <end2>
after insert - <start1> <INSERTED_NODE> <start2><end1> content <end2>
Range.setStart(node, offset), Range.setEnd(node, offset)
Adjust the boundaries.
Node.normalize()
Combine adjacent child text nodes into a single node.

2.3 Handling pasted content

Using this API, I add functionality for pasting and dropping content back into the input box. I only insert the text, ignoring styling and formatting.

function get_range(){
  var sel = document.getSelection();
  // rangeCount is 0 if nothing is selected (ie, we do
  // not have user focus)
  if (sel.rangeCount === 0) {
    return;
  }
  // if the browser allows multiple simultaneous selections,
  // much of this example needs to be fancier. Luckily most
  // browsers don't allow that while editing text.
  return sel.getRangeAt(0);
}

function insert_text_at_cursor(text){
  // get user selection, if there is any
  var range = get_range();
  if (!range) return;

  // delete the selection if needed
  range.deleteContents();

  // insert text
  var text_node = document.createTextNode(text);
  range.insertNode(text_node);

  // the "start" of our range is now before the inserted text;
  // collapse the range to its end so the cursor sits after it...
  range.collapse(false);

  // ...and then force user focus to that range
  var sel = document.getSelection();
  sel.removeAllRanges();
  sel.addRange(range);
}

example4.html ( full source )
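Stripped of the DOM, insert_text_at_cursor boils down to a string splice. This sketch assumes both boundary points sit in the same text node so they can be treated as plain string offsets (replace_selection is a hypothetical name): deleteContents() drops the selected characters, insertNode() adds the new text at the start point, and collapsing to the end leaves the caret right after it.

```javascript
// Sketch: insert_text_at_cursor modeled on a plain string.
// `start`/`end` stand in for the range's offsets (assumed same text node).
function replace_selection(value, start, end, text) {
  // deleteContents() removes [start, end); insertNode() adds text at start
  var next = value.slice(0, start) + text + value.slice(end);
  // collapsing to the end puts the caret right after the inserted text
  return { value: next, caret: start + text.length };
}

replace_selection("hello world", 6, 11, "there"); // { value: "hello there", caret: 11 }
replace_selection("ac", 1, 1, "b");               // { value: "abc", caret: 2 }
```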

Some things are not suppressed here: a user can still add emphasis to text on most user agents (ctrl+b on desktop, for example). Rather than catching those cases, I am going to add some heavy-handed formatting logic that overwrites any of the formatting behavior provided by the user agent.

3 Putting it together: building a rich text input area

Now that I have a "textarea" that mostly works, I'm ready to add some intelligence. For my demo I'd like to turn @-mentions blue.

As a starting point, I just reformat the text after each keystroke. This doesn't preserve the cursor position, so the textarea is almost unusable, with the cursor jumping all over the place.

Simple replacement code

var AT_MENTION_REGEX = /((?!\w)@[\w]+)/g;

/*
* Highlight @-mentions in the most naïve way
* possible - rebuild the entire div, clear
* and replace. Highly recommend you do something
* more efficient in practice :-)
*/
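function format_content(){
  // editable_div is the editable DOM element
  // (see example 2 and onwards)
  var raw_content = editable_div.textContent;
  // escape HTML special characters first: textContent is plain
  // text, so a typed "<img ...>" must not be parsed as markup
  var safe_content = raw_content
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  editable_div.innerHTML = safe_content.replace(
    AT_MENTION_REGEX,
    "<span style='color:cyan'>$1</span>"
  );
}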

editable_div.addEventListener("keyup", format_content);

example5.html ( full source )
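It's worth looking closely at the start of AT_MENTION_REGEX. (?!\w) is a lookahead, so it tests the character at the same position as the "@", and since "@" is never a word character it filters nothing out: "me@example.com" still produces a mention. The sketch below compares it with a lookbehind, (?<!\w), which checks the character before the "@", and with a capture-group form for engines that lack lookbehind (older Safari versions, for instance). The helper names are hypothetical.

```javascript
var LOOKAHEAD_REGEX = /((?!\w)@[\w]+)/g;   // same pattern as the code above
var LOOKBEHIND_REGEX = /((?<!\w)@[\w]+)/g; // ES2018 lookbehind

// Portable alternative (assumption: for engines without lookbehind):
// capture the preceding character, or the start of the string.
var PORTABLE_REGEX = /(^|[^\w])(@\w+)/g;

function find_mentions(text, regex) {
  return text.match(regex) || []; // match() returns null when nothing matches
}

function highlight_portable(text) {
  // $1 writes the captured preceding character back before the span
  return text.replace(PORTABLE_REGEX, "$1<span style='color:cyan'>$2</span>");
}

var sample = "hi @bob, mail me@example.com";
find_mentions(sample, LOOKAHEAD_REGEX);  // ["@bob", "@example"]
find_mentions(sample, LOOKBEHIND_REGEX); // ["@bob"]
highlight_portable(sample);
// "hi <span style='color:cyan'>@bob</span>, mail me@example.com"
```

An unsupported regex literal is a syntax error for the whole script, so pick one form for a given set of target browsers rather than shipping both.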

3.1 Preserving cursor position

One way to keep the cursor from jumping when mutating an editable region is to mark the cursor position in the text itself. As long as the text is preserved, the markers should be as well. I do this in three steps:

Mark the beginning and end of the selection ranges using unique characters.
Mutate the contents.
Restore the selection and remove the markers.
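The three steps can be sketched on a plain string first. This assumes the selection is given as two string offsets rather than DOM boundary points, and mark and restore are hypothetical names; the marker characters are the same ones used in the code below.

```javascript
// Sketch: mark, mutate, restore on a plain string.
var START = "\u0091";
var END = "\u0092";

function mark(text, start, end) {
  // insert END first: its offset is still the original one, and START
  // (inserted at start <= end) then lands before it
  var with_end = text.slice(0, end) + END + text.slice(end);
  return with_end.slice(0, start) + START + with_end.slice(start);
}

function restore(marked) {
  var start = marked.indexOf(START);
  if (start < 0) return null;
  // remove START first, then look for END in the updated string
  var without_start = marked.slice(0, start) + marked.slice(start + 1);
  var end = without_start.indexOf(END);
  if (end < 0) return null;
  var clean = without_start.slice(0, end) + without_start.slice(end + 1);
  return { text: clean, start: start, end: end };
}

// any mutation that keeps the markers in place (here: upper-casing) is safe
var marked = mark("hello @bob", 6, 10);
restore(marked.toUpperCase()); // { text: "HELLO @BOB", start: 6, end: 10 }
```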

This approach breaks if the marker characters appear elsewhere in the content being edited. I work around this problem by using the control characters U+0091 and U+0092 (named PRIVATE USE ONE and PRIVATE USE TWO) and enforcing that these two reserved characters are never used in the textarea, so I can safely use them as markers.
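Enforcing that rule mostly means cleaning text on the way in, for example the text pulled out of a paste or drop event before it's inserted. A minimal sketch (strip_reserved_markers is a hypothetical helper) using the U+0091 and U+0092 markers:

```javascript
// Sketch: remove the reserved marker characters from incoming text
var RESERVED_MARKERS_REGEX = /[\u0091\u0092]/g;

function strip_reserved_markers(text) {
  return text.replace(RESERVED_MARKERS_REGEX, "");
}

strip_reserved_markers("a\u0091b\u0092c"); // "abc"
```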

I start by adding calls to mark_cursor and restore_cursor to the code from above.

Add calls to mark and restore the cursor

// the reserved marker characters (U+0091 and U+0092)
var START_RANGE_MARKER = "\u0091";
var END_RANGE_MARKER = "\u0092";

// messy regex hack to stop the cursor from interfering
// with matching mentions - in practice we should remove
// the cursor markers before doing tokenization logic.
var AT_MENTION_REGEX = /((?!\w)@[\w\u0091\u0092]+)/g;

/*
* Highlight @-mentions in the most naïve way possible - rebuild
* the entire div, clear and replace. Highly recommend you
* do something more efficient :-)
*/
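function format_content(){
  var raw_content, safe_content;
  mark_cursor(); // implemented below
  raw_content = editable_div.textContent;
  // escape HTML special characters so typed markup stays text
  // (the marker characters are unaffected)
  safe_content = raw_content
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  editable_div.innerHTML = safe_content.replace(
    AT_MENTION_REGEX,
    "<span style='color:cyan'>$1</span>"
  );
  restore_cursor(); // implemented below
}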
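To see why the marker characters are part of the character class, consider a collapsed cursor sitting inside a mention while format_content runs: both markers are in the text at that point. This sketch leaves out the leading (?!\w) for brevity.

```javascript
var PLAIN_MENTION = /@\w+/g;
var MARKER_AWARE_MENTION = /@[\w\u0091\u0092]+/g;

var typing = "hi @bo\u0091\u0092b"; // cursor between "o" and "b"
typing.match(PLAIN_MENTION);        // ["@bo"]: the highlight would stop at the cursor
typing.match(MARKER_AWARE_MENTION); // ["@bo\u0091\u0092b"]: the whole mention
```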

I get the Selection and Range, and mark the range's boundaries with my reserved characters. Since format_content doesn't remove any characters, the markers remain even after the text is reformatted.

For this example I use a helper function to insert the markers; in practice I use the same helper function to handle pasted text.

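Cursor marking function

/*
* marks the current location of the cursor or selection
*/
function mark_cursor(){
  var range = get_range();
  if (!range) return; // no selection, nothing to mark

  // The order matters here! Inserting the end marker first
  // leaves the start boundary in place, so the start marker
  // lands before the end marker (even for a collapsed cursor).
  // See the notes on how Range.insertNode() works above.
  _insert_char(END_RANGE_MARKER,
    range.endContainer, range.endOffset);
  _insert_char(START_RANGE_MARKER,
    range.startContainer, range.startOffset);
}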

/*
* inserts a char into a text node (@container) at a given offset
*/
function _insert_char(char, container, offset){
  var cursor, node;
  cursor = document.createRange();
  cursor.setStart(container, offset);
  node = document.createTextNode(char);
  cursor.insertNode(node);
}

I use a helper method to find the position of a given character before removing it, and use this to restore the selection.

Cursor restoring function

/*
* restore the cursor or selection placed by `mark_cursor`
*/
function restore_cursor(){
  var start, end, range, sel;

  // remove the start marker first; the end marker's offset is
  // then computed on the already-updated text
  start = _find_and_remove_marker(START_RANGE_MARKER, editable_div);
  end = _find_and_remove_marker(END_RANGE_MARKER, editable_div);
  if (!start || !end) return; // markers missing (e.g. no focus)

  range = document.createRange();
  range.setStart(start[0], start[1]);
  range.setEnd(end[0], end[1]);

  sel = document.getSelection();
  sel.removeAllRanges();
  sel.addRange(range);
}

/*
* Note: TreeWalker provides a more succinct and efficient way to
* search the node tree. In an attempt to minimize the number of
* APIs used, I'm doing some simple recursion to walk the tree
*/
/*
* this method finds the first instance of @marker in @root_node,
* removes it, and returns the container node and offset of the
* location being marked as a tuple (or null if it isn't found)
*/
function _find_and_remove_marker(marker, root_node){
  var i, offset, result, children;
  if (root_node.nodeType === Node.TEXT_NODE){
    offset = root_node.nodeValue.indexOf(marker);
    if (offset >= 0) {
      // cut the marker character out of the text
      root_node.nodeValue =
        root_node.nodeValue.slice(0, offset) +
        root_node.nodeValue.slice(offset + 1);
      return [root_node, offset];
    }
  } else {
    children = root_node.childNodes;
    // use an index loop: for...in over a NodeList also visits
    // non-index properties such as "length" and "item"
    for (i = 0; i < children.length; i++){
      result = _find_and_remove_marker(marker, children[i]);
      if (result != null)
        return result;
    }
  }
  return null;
}

Finished "textarea" with highlighted at-mentions

example6.html ( full source )

Caveats and Gotchas

Instead of contenteditable="true", the proposed contenteditable="typing" may be the right choice, but I haven't played with it enough.
Undo is broken in these examples. There is a proposed API called UndoManager that could help, but it would need its own post, so I'm ignoring it here.
What happens when you hit enter (newline? <br>? <p>?) is TOTALLY DIFFERENT in different browsers. I don't handle line breaks correctly in these examples.
Every user agent has its own pile of legacy methods. The fiddle-till-it-works approach ends badly.
Selection.anchorNode is standard. Selection.baseNode is a non-standard alias that only some user agents provide.

4 Next post: adding emoji / auto-complete

Part 2 is coming, roughly "when I get around to writing it".
