Markdown Parser demo annotated source
A renderer for a custom flavor of markdown that renders live, with
every keystroke. I wrote the Marked component to be integrated into
my productivity apps (I'm rewriting my notes and todo apps soon),
but it also works well as a live editor by itself.
Bootstrap the required globals from Torus, since we're not bundling
for (const exportedName in Torus) {
    window[exportedName] = Torus[exportedName];
}

Like jdom.js, this is a unique object that identifies that a reader
has reached the last character/line to read. Used for parsing strings.
const READER_END = [];
These are the regular expressions (RE) that match things like headers, images, quotes, and list items.
const RE_HEADER = /^(#{1,6})\s*(.*)/;
const RE_IMAGE = /^%\s+(\S*)/;
const RE_QUOTE = /^(>+)\s*(.*)/;
const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;

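To see what each pattern captures, here's a small sketch that runs the four expressions against sample lines. The sample strings (including the example.com URL) are made up for illustration; the capture-group indices follow directly from the patterns above.

```javascript
const RE_HEADER = /^(#{1,6})\s*(.*)/;
const RE_IMAGE = /^%\s+(\S*)/;
const RE_QUOTE = /^(>+)\s*(.*)/;
const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;

// Group 1 is the run of hash signs, group 2 is the header text.
const header = RE_HEADER.exec('## Hash signs');
const headerTag = 'h' + header[1].length;

// Group 1 is the run of '>' characters, so its length is the nesting depth.
const quote = RE_QUOTE.exec('>> Nested quote');
const quoteDepth = quote[1].length;

// Groups are: leading whitespace, the bullet or number prefix, the item text.
const item = RE_LIST_ITEM.exec('    3. every other school');
const itemInfo = {indent: item[1].length, prefix: item[2], text: item[3]};

// Group 1 is the URL that follows the percent sign.
const image = RE_IMAGE.exec('% https://example.com/pic.jpg');
const imageURL = image[1];
```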
Delimiters for text styles. If you want the more standard flavor of markdown, you can change these delimiters to get 90% of the way there (minus the links).
const ITALIC_DELIMITER = '/';
const BOLD_DELIMITER = '*';
const STRIKE_DELIMITER = '~';
const CODE_DELIMITER = '`';
const LINK_DELIMITER_LEFT = '<';
const LINK_DELIMITER_RIGHT = '>';
const PRE_DELIMITER = '``';
const LITERAL_DELIMITER = '%%';

Some text expansions / replacements I find convenient.
const BODY_TEXT_TRANSFORMS = new Map([
    // RegExp: replacement
    [/--/g, '—'], // em-dash from two dashes
    [/(\?!|!\?)/g, '‽'], // interrobang!
    [/\$\$/g, '💵'],
    [/:\)/g, '🙂'],
    [/<3/g, '❤️'],
    [/:wave:/g, '👋'],
]);

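Here's a minimal sketch of how a map like this gets applied to a chunk of text. The `applyTransforms` helper and the shortened `TRANSFORMS` map are hypothetical names for this example; the parser itself does the equivalent loop inline when it commits its buffer.

```javascript
// A shortened copy of the transform table, for illustration only.
const TRANSFORMS = new Map([
    [/--/g, '—'],
    [/(\?!|!\?)/g, '‽'],
    [/:\)/g, '🙂'],
]);

// Apply every replacement in turn. A Map iterates in insertion order,
// so earlier entries run before later ones.
const applyTransforms = text => {
    let result = text;
    for (const [re, replacement] of TRANSFORMS) {
        result = result.replace(re, replacement);
    }
    return result;
};

const expanded = applyTransforms('Wait -- what?! :)');
// expanded is 'Wait — what‽ 🙂'
```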
This is the default input that the user sees when they first open the app. It demonstrates the basic syntax.
const INPUT_PLACEHOLDER = `# Write some markdown!

## Hash signs mark /headers/.

Here's some text, with /italics/, *bold*, ~strikethrough~, and \`monospace\` styles. We can also *~/combine/~* these things for */\`more emphasis\`/*.

Let's include some links. Here's one to <https://google.com/>.

> Quotes.
>> Nested quotes, like this...
>> ... even across lines.

We can include lists ...

- First
- Second
    - Third, which is indented
    - Fourth

We can also number lists, and mix both styles.

1. Cal Bears
2. Purdue Boilermakers
3. every other school
    - ???
4. Stanford... trees?


We can include code blocks.

\`\`
#include <stdio.h>

int main() {
    printf("Two backticks denote a code block");

    return 0;
}
\`\`

To include images, prefix the URL with a percent sign:

% https://www.ocf.berkeley.edu/~linuslee/pic.jpg

That's it! Happy markdowning :)

If you're curious about how this app works, you can check out the entire, annotated source code at <https://thesephist.github.io/torus/markdown-parser-demo>, where you'll find annotated JavaScript source files behind this and a few other apps.

This renderer was built with Torus, a UI framework for the web written by Linus, for his personal suite of productivity apps. You can find more information about Torus at <https://github.com/thesephist/torus/>, and you can find Linus at <https://linus.zone/now/>.
`;

A reader that yields characters from a string one at a time, used for parsing text.
class Reader {

    constructor(str) {
        this.str = str;
        this.idx = 0;
    }

    next() {
        return this.str[this.idx ++] || READER_END;
    }

Look ahead a character, but don't increment the position.
    ahead() {
        return this.str[this.idx] || READER_END;
    }

Reads the string until the first occurrence of a given character.
    until(char) {
        const sub = this.str.substr(this.idx);
        const nextIdx = sub.indexOf(char);
        // Take everything before the delimiter, then move past the delimiter itself.
        const part = sub.substr(0, nextIdx);
        this.idx += nextIdx + 1;
        return part;
    }

}

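A quick usage sketch of the reader, assuming the class as described above (it's repeated here so the snippet runs on its own). The input string is made up. Note that the end sentinel is compared by identity, which works because a fresh array is only ever equal to itself.

```javascript
const READER_END = [];

class Reader {
    constructor(str) {
        this.str = str;
        this.idx = 0;
    }
    next() {
        return this.str[this.idx++] || READER_END;
    }
    ahead() {
        return this.str[this.idx] || READER_END;
    }
    until(char) {
        const sub = this.str.substr(this.idx);
        const nextIdx = sub.indexOf(char);
        const part = sub.substr(0, nextIdx);
        this.idx += nextIdx + 1;
        return part;
    }
}

const reader = new Reader('<a.com> ok');
const first = reader.next();    // '<'
const url = reader.until('>');  // 'a.com', and the '>' is consumed
const peeked = reader.ahead();  // ' ', position unchanged

// Drain the rest of the string until the sentinel shows up.
const rest = [];
let char;
while ((char = reader.next()) !== READER_END) {
    rest.push(char);
}
```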
Like Reader, but for lines. It's used for things like parsing nested lists and block quotes.
class LineReader {

    constructor(lines) {
        this.lines = lines;
        this.idx = 0;
    }

    next() {
        if (this.idx < this.lines.length) {
            return this.lines[this.idx ++];
        } else {
            this.idx = this.lines.length;
            return READER_END;
        }
    }

Decrement the counter, so next() will return the same line once again.
    backtrack() {
        this.idx = this.idx - 1 < 0 ? 0 : this.idx - 1;
    }

}

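The block parsers further down lean on a "peek, then backtrack" pattern: read a line to inspect it, then step back so another function can read the same line again. Here's a small sketch of that pattern on its own, with the class repeated so the snippet is self-contained and with made-up input lines.

```javascript
const READER_END = [];

class LineReader {
    constructor(lines) {
        this.lines = lines;
        this.idx = 0;
    }
    next() {
        if (this.idx < this.lines.length) {
            return this.lines[this.idx++];
        }
        return READER_END;
    }
    backtrack() {
        // Never step back past the first line.
        this.idx = Math.max(0, this.idx - 1);
    }
}

const lines = new LineReader(['> quote', 'plain']);

const peeked = lines.next(); // inspect the first line...
lines.backtrack();           // ...then hand it back
const again = lines.next();  // the same line comes out again

const second = lines.next();
const end = lines.next();    // past the last line, we get the sentinel
```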
Parse "body text", which may include italics, bold text, strikethroughs, and inline code blocks. This also takes care of text expansions defined above.
const parseBody = (reader, tag, delimiter = '') => {
    const children = [];
    let buf = '';
Function to "commit" the text read into the buffer as a child of body text, so we can add other elements after it.
    const commitBuf = () => {
        for (const re of BODY_TEXT_TRANSFORMS.keys()) {
            buf = buf.replace(re, BODY_TEXT_TRANSFORMS.get(re));
        }
        children.push(buf);
        buf = '';
    }
    let char;
    let last = '';
Loop through each character. If there are delimiters, read until the end of the delimited chunk of text and parse the contents inside as the right tag.
    while (last = char, char = reader.next()) {
        switch (char) {
Backslash is an escape character, so anything that comes right after it is just read into the buffer.
            case '\\':
                buf += reader.next();
                break;
If we find the delimiter parseBody was called with, that means we've reached the end of the delimited sequence of text we were reading from reader and must return control flow to the calling function. A delimiter that comes right after a space is treated as literal text instead.
            case delimiter:
                if (last === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    return {
                        tag: tag,
                        children: children,
                    }
                }
                break;
If we reach the end of the body text, commit everything we've got so far and return the whole thing.
            case READER_END:
                commitBuf();
                return {
                    tag: tag,
                    children: children,
                }
Each of these delimiter cases checks if the next character is a space. If it is, it may just be that the user is trying to type, e.g. 3 < 10 or async / await. We don't count those characters as styling delimiters. That would be annoying for the user.
            case ITALIC_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'em', ITALIC_DELIMITER));
                }
                break;
            case BOLD_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'strong', BOLD_DELIMITER));
                }
                break;
            case STRIKE_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'strike', STRIKE_DELIMITER));
                }
                break;
            case CODE_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push({
                        tag: 'code',
Rather than recursively parsing the text inside a code block, we just take it verbatim. Otherwise symbols like * and / in code have to be escaped, which would be really annoying.
                        children: [reader.until(CODE_DELIMITER)],
                    });
                }
                break;
If we find a link, we read until the end of the link and return a JDOM object that's a clickable link tag that opens in another tab.
            case LINK_DELIMITER_LEFT:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    const url = reader.until(LINK_DELIMITER_RIGHT);
                    children.push({
                        tag: 'a',
                        attrs: {
                            href: url || '#',
                            rel: 'noopener',
                            target: '_blank',
                        },
                        children: [url],
                    });
                }
                break;
If none of the special cases matched, just add the character to the buffer we're reading to.
            default:
                buf += char;
                break;
        }
    }

    throw new Error('This should not happen while reading body text!');
}

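The recursion is easier to see in a stripped-down form. The sketch below is a simplified stand-in, not the parser above: it only knows the bold and italic delimiters, skips empty text chunks, and leaves out escapes, links, code spans and the "delimiter after a space" rule. `parseInline`, `STYLE_TAGS` and the sample input are hypothetical names for this example.

```javascript
const READER_END = [];

class Reader {
    constructor(str) {
        this.str = str;
        this.idx = 0;
    }
    next() {
        return this.str[this.idx++] || READER_END;
    }
    ahead() {
        return this.str[this.idx] || READER_END;
    }
}

// Only two delimiters, to keep the sketch short.
const STYLE_TAGS = new Map([['/', 'em'], ['*', 'strong']]);

const parseInline = (reader, tag, delimiter = '') => {
    const children = [];
    let buf = '';
    const commit = () => {
        if (buf) children.push(buf);
        buf = '';
    };
    let char;
    while ((char = reader.next()) !== READER_END) {
        if (char === delimiter) {
            // Closing delimiter: hand the finished node back to the caller.
            commit();
            return {tag, children};
        }
        if (STYLE_TAGS.has(char) && reader.ahead() !== ' ') {
            // Opening delimiter: recurse, sharing the same reader position.
            commit();
            children.push(parseInline(reader, STYLE_TAGS.get(char), char));
        } else {
            buf += char;
        }
    }
    commit();
    return {tag, children};
};

const tree = parseInline(new Reader('a *b /c/* d'), 'p');
```

Because every recursive call shares one reader, the child call consumes its own closing delimiter and the parent simply carries on from wherever the child stopped.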
Given a reader of lines, parse (potentially) nested lists recursively.
const parseList = lineReader => {
    const children = [];

We check out the first line in the sequence to determine how far indented we are, and what kind of list (number, bullet) it is.
    let line = lineReader.next();
    const [_, indent, prefix] = RE_LIST_ITEM.exec(line);
    const tag = prefix === '-' ? 'ul' : 'ol';
    const indentLevel = indent.length;
    lineReader.backtrack();

Loop through the next few lines from the reader.
    while ((line = lineReader.next()) !== READER_END) {
        const [_, _indent, prefix] = RE_LIST_ITEM.exec(line) || [];
If there's a valid list item prefix, we count it as a list item.
        if (prefix) {
We compare the indentation level of this line, versus the first line in the list.
            const thisIndentLevel = line.indexOf(prefix);
If it's indented less, we've stumbled upon the end of the list section. Backtrack and return control to the parent list or block.
            if (thisIndentLevel < indentLevel) {
                lineReader.backtrack();
                return {
                    tag: tag,
                    children: children,
                }
If it's the same indentation, treat it as the next item in the list. Parse the list content as body text, and add it to the list of children.
            } else if (thisIndentLevel === indentLevel) {
                const body = line.match(/\s*(?:\d+\.|-)\s*(.*)/)[1];
                children.push(parseBody(new Reader(body), 'li'));
If this line is indented farther than the first line, that means it's the start of a further-nested list. Call parseList recursively, and add the returned list as a child.
            } else { // thisIndentLevel > indentLevel
                lineReader.backtrack();
                children.push(parseList(lineReader));
            }
If there's no valid list item prefix, it's the end of the list.
        } else {
            lineReader.backtrack();
            return {
                tag: tag,
                children: children,
            }
        }
    }
    return {
        tag: tag,
        children: children,
    }
}

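Here's a condensed sketch of the same indentation-driven recursion that can run by itself. It's a simplification under a few assumptions: item text is kept as a plain string rather than parsed as body text, and the three exit paths are folded into a single `break`. `parseListSketch` and the sample lines are hypothetical.

```javascript
const READER_END = [];

class LineReader {
    constructor(lines) {
        this.lines = lines;
        this.idx = 0;
    }
    next() {
        return this.idx < this.lines.length ? this.lines[this.idx++] : READER_END;
    }
    backtrack() {
        this.idx = Math.max(0, this.idx - 1);
    }
}

const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;

const parseListSketch = reader => {
    // Peek at the first line to learn the list type and indentation.
    const [, indent, firstPrefix] = RE_LIST_ITEM.exec(reader.next());
    const tag = firstPrefix === '-' ? 'ul' : 'ol';
    const level = indent.length;
    reader.backtrack();

    const children = [];
    let line;
    while ((line = reader.next()) !== READER_END) {
        const match = RE_LIST_ITEM.exec(line);
        if (!match || match[1].length < level) {
            // Not ours: give the line back and let the caller deal with it.
            reader.backtrack();
            break;
        }
        if (match[1].length === level) {
            children.push({tag: 'li', children: [match[3]]});
        } else {
            // Deeper indentation starts a nested list at this same line.
            reader.backtrack();
            children.push(parseListSketch(reader));
        }
    }
    return {tag, children};
};

const reader = new LineReader(['- First', '- Second', '    - Third', 'plain text']);
const list = parseListSketch(reader);
```

After the call, the reader is positioned at `'plain text'`, so whatever called the list parser can pick up from the first line that wasn't part of the list.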
Like parseList, but for nested block quotes.
const parseQuote = lineReader => {
    const children = [];

Look ahead at the first line to determine how far nested we are.
    let line = lineReader.next();
    const [_, nestCount] = RE_QUOTE.exec(line);
    const nestLevel = nestCount.length;
    lineReader.backtrack();

Loop through each line in the block quote.
    while ((line = lineReader.next()) !== READER_END) {
        const [_, nestCount, quoteText] = RE_QUOTE.exec(line) || [];
If we're able to find a line matching the block quote regex, count it as another line in the block.
        if (quoteText !== undefined) {
            const thisNestLevel = nestCount.length;
If this line is nested less than the first line, it's the end of this block quote. Return control to the parent block quote.
            if (thisNestLevel < nestLevel) {
                lineReader.backtrack();
                return {
                    tag: 'q',
                    children: children,
                }
If this line is nested at the same level as the first line, continue reading the quote.
            } else if (thisNestLevel === nestLevel) {
                children.push(parseBody(new Reader(quoteText), 'p'));
If this line is nested further in, it's the start of another nested quote block. Call parseQuote recursively.
            } else { // thisNestLevel > nestLevel
                lineReader.backtrack();
                children.push(parseQuote(lineReader));
            }
If the line didn't match the block quote regex, it's the end of the block quote, so return what we have.
        } else {
            lineReader.backtrack();
            return {
                tag: 'q',
                children: children,
            }
        }
    }
    return {
        tag: 'q',
        children: children,
    }
}

Main Torus function component for the parser. This component takes a string input, parses it into JDOM (HTML elements), and returns it in a <div>.
const Markus = str => {

Make a new line reader that we'll pass to functions to read the input.
    const lineReader = new LineReader(str.split('\n'));

Various parsing state registers.
    let inCodeBlock = false;
    let codeBlockResult = '';
    let inLiteralBlock = false;
    let literalBlockResult = '';
    const result = [];

    let line;
    while ((line = lineReader.next()) !== READER_END) {
If we're in a code block, don't do more parsing and add the line directly to the code block
        if (inCodeBlock) {
            if (line === PRE_DELIMITER) {
                result.push({
                    tag: 'pre',
                    children: [codeBlockResult],
                });
                inCodeBlock = false;
                codeBlockResult = '';
            } else {
                if (!codeBlockResult) {
                    codeBlockResult = line.trimStart() + '\n';
                } else {
                    codeBlockResult += line + '\n';
                }
            }
... likewise for literal HTML blocks.
        } else if (inLiteralBlock) {
            if (line === LITERAL_DELIMITER) {
                const wrapper = document.createElement('div');
                wrapper.innerHTML = literalBlockResult;
                result.push(wrapper);
                inLiteralBlock = false;
                literalBlockResult = '';
            } else {
                literalBlockResult += line;
            }
If the line starts with a hash sign, it's a header! Parse it as such.
        } else if (line.startsWith('#')) {
            const [_, hashes, header] = RE_HEADER.exec(line);
The HTML tag is 'h' followed by the number of # signs.
            result.push(parseBody(new Reader(header), 'h' + hashes.length));
If the line matches the image line format, parse the URL out of the line and add a link that wraps the image, so it's clickable in the final result HTML.
        } else if (RE_IMAGE.exec(line)) {
            const [_, imageURL] = RE_IMAGE.exec(line);
            result.push({
                tag: 'a',
                attrs: {
                    href: imageURL || '#',
                    rel: 'noopener',
                    target: '_blank',
                    style: {cursor: 'pointer'},
                },
                children: [{
                    tag: 'img',
                    attrs: {
                        src: imageURL,
                        style: {maxWidth: '100%'},
                    },
                }],
            });
If the line matches a block quote format, backtrack and send the control off to the block quote parser, including the line we just read.
        } else if (RE_QUOTE.exec(line)) {
            lineReader.backtrack();
            result.push(parseQuote(lineReader));
Detect horizontal dividers and handle them.
        } else if (line === '- -') {
            result.push({tag: 'hr'});
Detect start of a code block
        } else if (line === PRE_DELIMITER) {
            inCodeBlock = true;
Detect start of a literal HTML block
        } else if (line === LITERAL_DELIMITER) {
            inLiteralBlock = true;
Detect list formats (numbered, bullet) and if they're found, send the control flow off to the list parsing function.
        } else if (RE_LIST_ITEM.exec(line)) {
            lineReader.backtrack();
            result.push(parseList(lineReader));
If none of the above match, it's a plain old boring paragraph. Read the line as a paragraph body.
        } else {
            result.push(parseBody(new Reader(line), 'p'));
        }
    }

Return the array of children wrapped in a <div>, with some padding at the bottom so it's freely scrollable during editing.
    return jdom`<div class="render" style="padding-bottom:75vh">${result}</div>`;
}

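The order of the checks in that loop matters. As a rough sketch, here's the same top-level dispatch reduced to a function that only names the kind of block a line starts. `classifyLine` and its labels are hypothetical, and the code-block and literal-block state handling is left out.

```javascript
const RE_IMAGE = /^%\s+(\S*)/;
const RE_QUOTE = /^(>+)\s*(.*)/;
const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;

// Checks run in the same order as the parser's if/else chain.
const classifyLine = line => {
    if (line.startsWith('#')) return 'header';
    if (RE_IMAGE.test(line)) return 'image';
    if (RE_QUOTE.test(line)) return 'quote';
    if (line === '- -') return 'divider';
    if (line === '``') return 'code fence';
    if (line === '%%') return 'literal fence';
    if (RE_LIST_ITEM.test(line)) return 'list';
    return 'paragraph';
};

const kinds = [
    '## Title',
    '% https://example.com/a.png',
    '>> nested',
    '- -',
    '- item',
    '3 < 10',
].map(classifyLine);
```

One consequence worth noticing: `'- -'` also matches the list item pattern (a dash, a space, then the text `-`), so the divider check has to come before the list check for dividers to work at all.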
Editor view modes
const MODE = {
0 -> two-up, preview and editor; default
    BOTH: 0,
1 -> editor only
    EDITOR: 1,
2 -> preview only
    PREVIEW: 2,
}

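Numbering the modes 0 through 2 means a toggle button can step through them with modular arithmetic. A minimal sketch, where `nextMode` and `MODE_COUNT` are hypothetical names for this example:

```javascript
const MODE = {
    BOTH: 0,
    EDITOR: 1,
    PREVIEW: 2,
};

// Number of modes, derived from the table so it stays in sync.
const MODE_COUNT = Object.keys(MODE).length;

// Advance to the next mode, wrapping from the last back to the first.
const nextMode = mode => (mode + 1) % MODE_COUNT;

const cycle = [MODE.BOTH];
for (let i = 0; i < 3; i++) {
    cycle.push(nextMode(cycle[cycle.length - 1]));
}
// cycle is [0, 1, 2, 0]
```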
The app component wraps the entire application and handles state.
class App extends StyledComponent {

    init() {
We start with the default two-up view
        this.mode = MODE.BOTH;
Temporary state used to store whether the save button should show the saving state indicator ("saved")
        this.showSavedIndicator = false;

If we've previously saved the user input, pull that back out. Otherwise, use the default placeholder.
        this.inputValue = window.localStorage.getItem('markusInput') || INPUT_PLACEHOLDER;

Bind a few methods we're using to handle input.
        this.handleInput = this.handleInput.bind(this);
        this.handleKeydown = this.handleKeydown.bind(this);
        this.handleToggleMode = this.handleToggleMode.bind(this);
        this.handleSave = this.save.bind(this, {showIndicator: true});

Before the user leaves the site, we want to save the user input to local storage so we can pull it back out later when the user visits the site again.
        window.addEventListener('beforeunload',
            this.save.bind(this, {showIndicator: false}));
    }

Callback to save current editor buffer contents to localStorage. This method takes an option to give feedback to the user, which is set to false when saving on the beforeunload event.
    save({showIndicator} = {}) {
        if (showIndicator) {
            this.showSavedIndicator = true;
            setTimeout(() => {
                this.showSavedIndicator = false;
                this.render();
            }, 1000);
            this.render();
        }
        window.localStorage.setItem('markusInput', this.inputValue);
    }

    styles() {
        return css`
        box-sizing: border-box;
        font-family: system-ui, sans-serif;
        display: flex;
        flex-direction: column;
        justify-content: space-between;
        align-items: flex-start;
        height: 100vh;
        width: 100%;
        max-width: 1600px;
        margin: 0 auto;
        overflow: hidden;

        header {
            padding: 20px 18px 0 18px;
            box-sizing: border-box;
            width: 100%;
            display: flex;
            flex-direction: row;
            align-items: center;
            justify-content: space-between;
        }
        .buttonGroup {
            display: flex;
            flex-direction: row;
            align-items: center;
            justify-content: space-between;

            button {
                margin: 0 6px;
                padding: 6px 10px;
                font-size: 1em;
                border-radius: 4px;
                background: #fff;
                box-shadow: 0 3px 8px -1px rgba(0, 0, 0, .3);
                border: 0;
                cursor: pointer;
                &:hover {
                    opacity: .7;
                }
            }
        }
        .title {
            margin: 0;
            font-weight: normal;
            color: #888;
            .dark {
                color: #000;
            }
        }
        .renderContainer {
            display: flex;
            flex-direction: row;
            justify-content: space-between;
            align-items: flex-start;
            height: calc(100% - 60px);
            width: 100%;
            padding: 16px;
            box-sizing: border-box;
        }
        .half {
            width: calc(50% - 8px);
        }
        .full {
            width: calc(100% - 8px);
        }
        .half, .full {
            height: 100%;
            box-sizing: border-box;
        }
        .render, textarea {
            box-sizing: border-box;
            border: 0;
            box-shadow: 0 3px 8px -1px rgba(0, 0, 0, .3);
            padding: 12px;
            border-radius: 6px;
            background: #fff;
            height: 100%;
            -webkit-overflow-scrolling: touch;
            overflow-y: auto;
        }
        textarea {
            font-family: 'Fira Code', 'Menlo', 'Monaco', monospace;
            width: 100%;
            resize: none;
            font-size: 14px;
            outline: none;
            color: #999;
            line-height: 1.5em;
            &:focus {
                color: #000;
            }
        }
        .result {
            height: 100%;
        }
        .render {
            p, pre, code {
                line-height: 1.5em;
            }
            li {
                margin-bottom: 6px;
            }
            pre {
                padding: 8px;
                overflow-x: auto;
            }
            code {
                padding: 1px 5px;
                margin: 0 2px;
            }
            pre, code {
                font-family: 'Menlo', 'Monaco', monospace;
                background: #eee;
                border-radius: 4px;
            }
            q {
                &::before, &::after {
                    content: '';
                }
                display: block;
                border-left: 3px solid #777;
                padding-left: 6px;
            }
        }
        `;
    }

When the input changes, set the new local state and queue up another render using requestAnimationFrame, to be efficient with when we render (not necessarily now, just before the next frame).
    handleInput(evt) {
        this.inputValue = evt.target.value;
        requestAnimationFrame(() => this.render());
    }

This is a way to make sure TAB keys can be used to enter four spaces (yay spaces instead of tabs!) instead of tabbing to the next input on the page. This makes the textarea behave like a text editor, allowing you to indent with tab.
    handleKeydown(evt) {
        if (evt.key === 'Tab') {
            evt.preventDefault();
            const idx = evt.target.selectionStart;
            if (idx !== null) {
                const front = this.inputValue.substr(0, idx);
                const back = this.inputValue.substr(idx);
                // Four spaces, matching the cursor offset of 4 used below.
                this.inputValue = front + '    ' + back;
                this.render();
Rendering the new input value will make us lose focus on the textarea, so we put the focus back by selecting the area the user was just editing.
                evt.target.setSelectionRange(idx + 4, idx + 4);
            }
        }
    }

    handleToggleMode() {
Increment the mode counter, wrapping around so it stays within the [0, 2] range.
        this.mode = ++ this.mode % 3;
        this.render();
    }

    compose() {
        let modeView = null; // unreachable

Provide the correct mode view to the app shell depending on the chosen view.
        switch (this.mode) {
            case MODE.EDITOR:
                modeView = jdom`<div class="full markdown">
                    <textarea autofocus value="${this.inputValue}" oninput="${this.handleInput}"
                        placeholder="Start writing ..." onkeydown="${this.handleKeydown}" />
                </div>`;
                break;
            case MODE.PREVIEW:
                modeView = jdom`<div class="full result">
                    ${Markus(this.inputValue)}
                </div>`;
                break;
            default:
                modeView = [
                    jdom`<div class="half result">
                        ${Markus(this.inputValue)}
                    </div>`,
                    jdom`<div class="half markdown">
                        <textarea autofocus value="${this.inputValue}" oninput="${this.handleInput}"
                            placeholder="Start writing ..." onkeydown="${this.handleKeydown}" />
                    </div>`,
                ];
                break;
        }

        return jdom`<main>
            <header>
                <h1 class="title">
                    <span class="dark">Markus</span>, a live markdown editor
                </h1>
                <div class="buttonGroup">
                    <button class="saveButton" onclick="${this.handleSave}">
                        ${this.showSavedIndicator ? 'saved' : 'save'}
                    </button>
                    <button class="showRenderedToggle" onclick="${this.handleToggleMode}">
                        mode ++
                    </button>
                </div>
            </header>
            <div class="renderContainer">${modeView}</div>
        </main>`;
    }

}

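The Tab handling in the keydown handler boils down to a small string operation: splice four spaces in at the cursor, then move the cursor past them. Here's a sketch of that step as a pure function, so it can be tried without a textarea. `insertTab` and `TAB` are hypothetical names for this example.

```javascript
// Four spaces stand in for a tab character.
const TAB = '    ';

// Given the current text and cursor position, return the new text
// and where the cursor should land afterwards.
const insertTab = (value, cursor) => ({
    value: value.slice(0, cursor) + TAB + value.slice(cursor),
    cursor: cursor + TAB.length,
});

const edited = insertTab('abcd', 2);
// edited.value is 'ab    cd', edited.cursor is 6
```

Keeping the inserted string and the cursor offset tied to the same constant avoids the two drifting apart if the indent width ever changes.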
Create an instance of the app and mount it to the page DOM.
const app = new App();
document.body.appendChild(app.node);
Basic grey background and reset of the default margin on <body>
document.body.style.backgroundColor = '#f8f8f8';
document.body.style.margin = '0';