Markdown Parser demo annotated source


A renderer for a custom flavor of markdown that renders live, on every keystroke. I wrote the Markus component to be integrated into my productivity apps (I'm rewriting my notes and todo apps soon), but it also works well as a live editor by itself.

Bootstrap the required globals from Torus, since we're not bundling.

for (const exportedName in Torus) {
    window[exportedName] = Torus[exportedName];
}

Like jdom.js, this is a unique object that identifies that a reader has reached the last character/line to read. Used for parsing strings.

const READER_END = [];
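
A minimal sketch of why an empty array works as an end marker: a fresh array is only ever strictly equal to itself, so no character or line from the input can be mistaken for it. The names `END` and `nextChar` are stand-ins made up for this illustration.

```javascript
// Stand-in for READER_END: a unique object compared by identity.
const END = [];

// Hypothetical helper: return the character at idx, or the sentinel past the end.
const nextChar = (str, idx) => str[idx] || END;

const atEnd = nextChar('ab', 2) === END;   // index 2 is past the last character
const notEnd = nextChar('ab', 1) === END;  // 'b' is a real character
const lookalike = [] === END;              // a different empty array is not the sentinel

console.log(atEnd, notEnd, lookalike); // true false false
```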

These are the regular expressions (RE) that match things like headers, images, and quotes.

const RE_HEADER = /^(#{1,6})\s*(.*)/;
const RE_IMAGE = /^%\s+(\S*)/;
const RE_QUOTE = /^(>+)\s*(.*)/;
const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;
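
To make the capture groups concrete, here is a small sketch that runs the four patterns against sample lines. The sample strings, including the example.com URL, are made up for illustration.

```javascript
const RE_HEADER = /^(#{1,6})\s*(.*)/;
const RE_IMAGE = /^%\s+(\S*)/;
const RE_QUOTE = /^(>+)\s*(.*)/;
const RE_LIST_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/;

// Group 1 holds the hashes, group 2 the header text.
const header = RE_HEADER.exec('## Hash signs');
// Group 1 holds the '>' run (its length is the nesting level), group 2 the text.
const quote = RE_QUOTE.exec('>> Nested quotes');
// Group 1 is the indent, group 2 the bullet or number, group 3 the item text.
const item = RE_LIST_ITEM.exec('    - Third');
// Group 1 is the image URL.
const image = RE_IMAGE.exec('% https://example.com/pic.jpg');
// Lines that don't fit a pattern yield null.
const noMatch = RE_LIST_ITEM.exec('plain text');

console.log(header[1].length, header[2]);   // 2 'Hash signs'
console.log(quote[1].length, quote[2]);     // 2 'Nested quotes'
console.log(item[1].length, item[2], item[3]); // 4 '-' 'Third'
console.log(image[1], noMatch);
```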

Delimiters for text styles. If you want the more standard flavor of markdown, you can change these delimiters to get 90% of the way there (minus the links).

const ITALIC_DELIMITER = '/';
const BOLD_DELIMITER = '*';
const STRIKE_DELIMITER = '~';
const CODE_DELIMITER = '`';
const LINK_DELIMITER_LEFT = '<';
const LINK_DELIMITER_RIGHT = '>';
const PRE_DELIMITER = '``';
const LITERAL_DELIMITER = '%%';

Some text expansions / replacements I find convenient.

const BODY_TEXT_TRANSFORMS = new Map([
    // RegExp: replacement
    [/--/g, '—'], // em-dash from two dashes
    [/(\?!|!\?)/g, '‽'], // interrobang!
    [/\$\$/g, '💵'],
    [/:\)/g, '🙂'],
    [/<3/g, '❤️'],
    [/:wave:/g, '👋'],
]);
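
As a rough sketch of how a table like this gets applied, the snippet below runs each pattern over a string in turn. `TRANSFORMS` is a shortened copy of the map with the replacement characters written as escapes, and `expand` is a hypothetical helper, not a function from this file.

```javascript
// Shortened copy of the transform table, using escapes for the replacements.
const TRANSFORMS = new Map([
    [/--/g, '\u2014'],           // em-dash from two dashes
    [/(\?!|!\?)/g, '\u203D'],    // interrobang
    [/:wave:/g, '\u{1F44B}'],    // waving hand
]);

// Hypothetical helper: apply every transform, in insertion order.
const expand = text => {
    let out = text;
    for (const [re, replacement] of TRANSFORMS) {
        out = out.replace(re, replacement);
    }
    return out;
};

const expanded = expand('Wait -- what?!');
const waved = expand(':wave: hi');
console.log(expanded, waved);
```

Because a Map iterates in insertion order, earlier rules run first, which matters if one rule's output could match a later rule.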

This is the default input that the user sees when they first open the app. It demonstrates the basic syntax.

const INPUT_PLACEHOLDER = `# Write some markdown!

## Hash signs mark /headers/.

Here's some text, with /italics/, *bold*, ~strikethrough~, and \`monospace\` styles. We can also *~/combine/~* these things for */\`more emphasis\`/*.

Let's include some links. Here's one to <https://google.com/>.

> Quotes.
>> Nested quotes, like this...
>> ... even across lines.

We can include lists ...

- First
- Second
    - Third, which is indented
    - Fourth

We can also number lists, and mix both styles.

1. Cal Bears
2. Purdue Boilermakers
3. every other school
    - ???
4. Stanford... trees?


We can include code blocks.

\`\`
#include <stdio.h>

int main() {
    printf("Two backticks denote a code block");

    return 0;
}
\`\`

To include images, prefix the URL with a percent sign:

% https://www.ocf.berkeley.edu/~linuslee/pic.jpg

That's it! Happy markdowning :)

If you're curious about how this app works, you can check out the entire, annotated source code at <https://thesephist.github.io/torus/markdown-parser-demo>, where you'll find annotated JavaScript source files behind this and a few other apps.

This renderer was built with Torus, a UI framework for the web written by Linus, for his personal suite of productivity apps. You can find more information about Torus at <https://github.com/thesephist/torus/>, and you can find Linus at <https://linus.zone/now/>.
`;

A reader that yields characters from a string one at a time, used for parsing text.

class Reader {

    constructor(str) {
        this.str = str;
        this.idx = 0;
    }

    next() {
        return this.str[this.idx ++] || READER_END;
    }

Look ahead a character, but don't increment the position.

    ahead() {
        return this.str[this.idx] || READER_END;
    }

Read from the current position up to (but not including) the first occurrence of a given character, then move the position just past that character.

    until(char) {
        const sub = this.str.substr(this.idx);
        const nextIdx = sub.indexOf(char);
        // substr takes (start, length): take the nextIdx characters before the match
        const part = sub.substr(0, nextIdx);
        this.idx += nextIdx + 1;
        return part;
    }

}
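
Here is a short usage sketch of the three methods working together. It uses a condensed copy of the class in which `until` passes 0 as the start index to `substr`; the sample string is made up.

```javascript
const READER_END = [];

// Condensed copy of Reader for a self-contained example.
class Reader {
    constructor(str) { this.str = str; this.idx = 0; }
    next() { return this.str[this.idx++] || READER_END; }
    ahead() { return this.str[this.idx] || READER_END; }
    until(char) {
        const sub = this.str.substr(this.idx);
        const nextIdx = sub.indexOf(char);
        const part = sub.substr(0, nextIdx);
        this.idx += nextIdx + 1;
        return part;
    }
}

const reader = new Reader('a `code` b');
const first = reader.next();       // 'a'
reader.next();                     // skip the space
reader.next();                     // consume the opening backtick
const code = reader.until('`');    // reads 'code' and skips the closing backtick
const peeked = reader.ahead();     // ' ' (position is not advanced)
const afterPeek = reader.next();   // ' ' again, since ahead() did not consume it
reader.next();                     // 'b'
const done = reader.next() === READER_END;

console.log(first, code, JSON.stringify(peeked), done);
```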

Like Reader, but for lines. It's used for things like parsing nested lists and block quotes.

class LineReader {

    constructor(lines) {
        this.lines = lines;
        this.idx = 0;
    }

    next() {
        if (this.idx < this.lines.length) {
            return this.lines[this.idx ++];
        } else {
            this.idx = this.lines.length;
            return READER_END;
        }
    }

Decrement the counter, so next() will return the same line once again.

    backtrack() {
        this.idx = this.idx - 1 < 0 ? 0 : this.idx - 1;
    }

}
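
The read-then-backtrack pattern is what lets a block parser stop at the first line that isn't its own and hand that line back to the caller. A minimal sketch, using a condensed copy of the class and a hypothetical `takeQuoteLines` helper that is not part of this file:

```javascript
const READER_END = [];

// Condensed copy of LineReader for a self-contained example.
class LineReader {
    constructor(lines) { this.lines = lines; this.idx = 0; }
    next() {
        return this.idx < this.lines.length ? this.lines[this.idx++] : READER_END;
    }
    backtrack() { this.idx = Math.max(0, this.idx - 1); }
}

// Hypothetical helper: collect consecutive lines that start with '>'.
const takeQuoteLines = lineReader => {
    const taken = [];
    let line;
    while ((line = lineReader.next()) !== READER_END) {
        if (!line.startsWith('>')) {
            lineReader.backtrack(); // leave the non-quote line for the caller
            break;
        }
        taken.push(line);
    }
    return taken;
};

const lines = new LineReader(['> one', '> two', 'plain']);
const quoted = takeQuoteLines(lines);
const following = lines.next(); // the caller still sees 'plain'
console.log(quoted, following);
```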

Parse "body text", which may include italics, bold text, strikethroughs, and inline code blocks. This also takes care of text expansions defined above.

const parseBody = (reader, tag, delimiter = '') => {
    const children = [];
    let buf = '';

Function to "commit" the text read into the buffer as a child of body text, so we can add other elements after it.

    const commitBuf = () => {
        for (const re of BODY_TEXT_TRANSFORMS.keys()) {
            buf = buf.replace(re, BODY_TEXT_TRANSFORMS.get(re));
        }
        children.push(buf);
        buf = '';
    }
    let char;
    let last = '';

Loop through each character. If there are delimiters, read until the end of the delimited chunk of text and parse the contents inside as the right tag.

    while (last = char, char = reader.next()) {
        switch (char) {

Backslash is an escape character, so anything that comes right after it is just read into the buffer.

            case '\\':
                buf += reader.next();
                break;

If we find the delimiter parseBody was called with, that means we've reached the end of the delimited sequence of text we were reading from reader and must return control flow to the calling function.

            case delimiter:
                if (last === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    return {
                        tag: tag,
                        children: children,
                    }
                }
                break;

If we reach the end of the body text, commit everything we've got so far and return the whole thing.

            case READER_END:
                commitBuf();
                return {
                    tag: tag,
                    children: children,
                }

Each of these delimiter cases checks whether the next character is a space. If it is, the user may just be typing something like 3 < 10 or async / await, so we don't treat the character as a styling delimiter. That would be annoying for the user.

            case ITALIC_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'em', ITALIC_DELIMITER));
                }
                break;
            case BOLD_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'strong', BOLD_DELIMITER));
                }
                break;
            case STRIKE_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push(parseBody(reader, 'strike', STRIKE_DELIMITER));
                }
                break;
            case CODE_DELIMITER:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    children.push({
                        tag: 'code',

Rather than recursively parsing the text inside a code block, we just take it verbatim. Otherwise symbols like * and / in code have to be escaped, which would be really annoying.

                        children: [reader.until(CODE_DELIMITER)],
                    });
                }
                break;

If we find a link, we read until the end of the link and return a JDOM object that's a clickable link tag that opens in another tab.

            case LINK_DELIMITER_LEFT:
                if (reader.ahead() === ' ') {
                    buf += char;
                } else {
                    commitBuf();
                    const url = reader.until(LINK_DELIMITER_RIGHT);
                    children.push({
                        tag: 'a',
                        attrs: {
                            href: url || '#',
                            rel: 'noopener',
                            target: '_blank',
                        },
                        children: [url],
                    });
                }
                break;

If none of the special cases matched, just add the character to the buffer we're reading to.

            default:
                buf += char;
                break;
        }
    }

    throw new Error('This should not happen while reading body text!');
}
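
To show the shape of the recursion without the full set of cases, here is a stripped-down sketch that only understands `*` and `/`. It is not the parser above: `parseInline` and `TAGS` are hypothetical names, it reads from an array of characters rather than a Reader, it skips escapes, links, code, and the space checks, and unlike commitBuf it does not push empty strings.

```javascript
// Which tag each delimiter opens in this reduced sketch.
const TAGS = new Map([['*', 'strong'], ['/', 'em']]);

// Hypothetical simplified parser: consumes characters from the shared `chars` array.
const parseInline = (chars, tag, delimiter = null) => {
    const children = [];
    let buf = '';
    const commit = () => {
        if (buf) children.push(buf);
        buf = '';
    };
    while (chars.length > 0) {
        const char = chars.shift();
        if (char === delimiter) {
            break; // closing delimiter: hand control back to the caller
        } else if (TAGS.has(char)) {
            commit();
            // Opening delimiter: parse the inside as a nested element.
            children.push(parseInline(chars, TAGS.get(char), char));
        } else {
            buf += char;
        }
    }
    commit();
    return {tag, children};
};

const tree = parseInline([...'a *b /c/* d'], 'p');
console.log(JSON.stringify(tree));
```

Because every nested call consumes from the same character source, the caller resumes exactly where the nested element ended, which is the same idea parseBody relies on by sharing one reader.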

Given a reader of lines, parse (potentially) nested lists recursively.

const parseList = lineReader => {
    const children = [];

We check out the first line in the sequence to determine how far indented we are, and what kind of list (number, bullet) it is.

    let line = lineReader.next();
    const [_, indent, prefix] = RE_LIST_ITEM.exec(line);
    const tag = prefix === '-' ? 'ul' : 'ol';
    const indentLevel = indent.length;
    lineReader.backtrack();

Loop through the next few lines from the reader.

    while ((line = lineReader.next()) !== READER_END) {
        const [_, _indent, prefix] = RE_LIST_ITEM.exec(line) || [];

If there's a valid list item prefix, we count it as a list item.

        if (prefix) {

We compare the indentation level of this line, versus the first line in the list.

            const thisIndentLevel = line.indexOf(prefix);

If it's indented less, we've stumbled upon the end of the list section. Backtrack and return control to the parent list or block.

            if (thisIndentLevel < indentLevel) {
                lineReader.backtrack();
                return {
                    tag: tag,
                    children: children,
                }

If it's the same indentation, treat it as the next item in the list. Parse the list content as body text, and add it to the list of children.

            } else if (thisIndentLevel === indentLevel) {
                const body = line.match(/\s*(?:\d+\.|-)\s*(.*)/)[1];
                children.push(parseBody(new Reader(body), 'li'));

If this line is indented farther than the first line, that means it's the start of a further-nested list. Call parseList recursively, and add the returned list as a child.

            } else { // thisIndentLevel > indentLevel
                lineReader.backtrack();
                children.push(parseList(lineReader));
            }

If there's no valid list item prefix, it's the end of the list.

        } else {
            lineReader.backtrack();
            return {
                tag: tag,
                children: children,
            }
        }
    }
    return {
        tag: tag,
        children: children,
    }
}
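
The indentation comparison is easier to follow in a reduced form. The sketch below is an assumption-laden simplification, not parseList itself: `parseNested` is a hypothetical function that uses a shared cursor object instead of a LineReader with backtracking, and stores item text as plain strings instead of parsed body text.

```javascript
const RE_ITEM = /^(\s*)(-|\d+\.)\s+(.*)/; // same pattern as RE_LIST_ITEM

// Hypothetical simplified parser: `pos` is a shared cursor, e.g. {idx: 0}.
const parseNested = (lines, pos) => {
    // The first line fixes this list's indent level and kind.
    const [, indent, prefix] = RE_ITEM.exec(lines[pos.idx]);
    const list = {tag: prefix === '-' ? 'ul' : 'ol', children: []};
    while (pos.idx < lines.length) {
        const match = RE_ITEM.exec(lines[pos.idx]);
        if (!match || match[1].length < indent.length) {
            break; // not a list item, or dedented: the parent handles this line
        } else if (match[1].length === indent.length) {
            list.children.push(match[3]); // same level: next item in this list
            pos.idx++;
        } else {
            list.children.push(parseNested(lines, pos)); // deeper indent: recurse
        }
    }
    return list;
};

const nested = parseNested(['1. Cal', '    - ???', '2. Stanford'], {idx: 0});
console.log(JSON.stringify(nested));
```

Not advancing the cursor before breaking plays the same role as the backtrack() calls above: the line that ended the list is left for whoever called us.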

Like parseList, but for nested block quotes.

const parseQuote = lineReader => {
    const children = [];

Look ahead at the first line to determine how far nested we are.

    let line = lineReader.next();
    const [_, nestCount] = RE_QUOTE.exec(line);
    const nestLevel = nestCount.length;
    lineReader.backtrack();

Loop through each line in the block quote.

    while ((line = lineReader.next()) !== READER_END) {
        const [_, nestCount, quoteText] = RE_QUOTE.exec(line) || [];

If we're able to find a line matching the block quote regex, count it as another line in the block.

        if (quoteText !== undefined) {
            const thisNestLevel = nestCount.length;

If this line is nested less than the first line, it's the end of this block quote. Return control to the parent block quote.

            if (thisNestLevel < nestLevel) {
                lineReader.backtrack();
                return {
                    tag: 'q',
                    children: children,
                }

If this line is nested at the same level as the first line, continue reading the quote.

            } else if (thisNestLevel === nestLevel) {
                children.push(parseBody(new Reader(quoteText), 'p'));

If this line is nested more deeply, it's the start of another nested quote block. Call parseQuote recursively.

            } else { // thisNestLevel > nestLevel
                lineReader.backtrack();
                children.push(parseQuote(lineReader));
            }

If the line didn't match the block quote regex, it's the end of the block quote, so return what we have.

        } else {
            lineReader.backtrack();
            return {
                tag: 'q',
                children: children,
            }
        }
    }
    return {
        tag: 'q',
        children: children,
    }
}

Main Torus function component for the parser. This component takes a string input, parses it into JDOM (HTML elements), and returns it in a <div>.

const Markus = str => {

Make a new line reader that we'll pass to functions to read the input.

    const lineReader = new LineReader(str.split('\n'));

Various parsing state registers.

    let inCodeBlock = false;
    let codeBlockResult = '';
    let inLiteralBlock = false;
    let literalBlockResult = '';
    const result = [];

    let line;
    while ((line = lineReader.next()) !== READER_END) {

If we're in a code block, don't do more parsing and add the line directly to the code block.

        if (inCodeBlock) {
            if (line === PRE_DELIMITER) {
                result.push({
                    tag: 'pre',
                    children: [codeBlockResult],
                });
                inCodeBlock = false;
                codeBlockResult = '';
            } else {
                if (!codeBlockResult) {
                    codeBlockResult = line.trimStart() + '\n';
                } else {
                    codeBlockResult += line + '\n';
                }
            }

... likewise for literal HTML blocks.

        } else if (inLiteralBlock) {
            if (line === LITERAL_DELIMITER) {
                const wrapper = document.createElement('div');
                wrapper.innerHTML = literalBlockResult;
                result.push(wrapper);
                inLiteralBlock = false;
                literalBlockResult = '';
            } else {
                literalBlockResult += line;
            }

If the line starts with a hash sign, it's a header! Parse it as such.

        } else if (line.startsWith('#')) {
            const [_, hashes, header] = RE_HEADER.exec(line);

The HTML tag is 'h' followed by the number of # signs.

            result.push(parseBody(new Reader(header), 'h' + hashes.length));

If the line matches the image line format, parse the URL out of the line and add a link that wraps the image, so it's clickable in the final result HTML.

        } else if (RE_IMAGE.exec(line)) {
            const [_, imageURL] = RE_IMAGE.exec(line);
            result.push({
                tag: 'a',
                attrs: {
                    href: imageURL || '#',
                    rel: 'noopener',
                    target: '_blank',
                    style: {cursor: 'pointer'},
                },
                children: [{
                    tag: 'img',
                    attrs: {
                        src: imageURL,
                        style: {maxWidth: '100%'},
                    },
                }],
            });

If the line matches a block quote format, backtrack and send the control off to the block quote parser, including the line we just read.

        } else if (RE_QUOTE.exec(line)) {
            lineReader.backtrack();
            result.push(parseQuote(lineReader));

Detect horizontal dividers and handle them.

        } else if (line === '- -') {
            result.push({tag: 'hr'});

Detect the start of a code block.

        } else if (line === PRE_DELIMITER) {
            inCodeBlock = true;

Detect the start of a literal HTML block.

        } else if (line === LITERAL_DELIMITER) {
            inLiteralBlock = true;

Detect list formats (numbered, bullet) and if they're found, send the control flow off to the list parsing function.

        } else if (RE_LIST_ITEM.exec(line)) {
            lineReader.backtrack();
            result.push(parseList(lineReader));

If none of the above match, it's a plain old boring paragraph. Read the line as a paragraph body.

        } else {
            result.push(parseBody(new Reader(line), 'p'));
        }
    }

Return the array of children wrapped in a <div>, with some padding at the bottom so it's freely scrollable during editing.

    return jdom`<div class="render" style="padding-bottom:75vh">${result}</div>`;
}
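
The main loop is a small state machine: a flag records whether we are inside a delimited block, and that flag decides how the next line is treated. A reduced sketch of just the code-block state, with everything else treated as a paragraph; `splitBlocks` and `FENCE` are hypothetical names, and the output uses plain strings rather than parsed body text:

```javascript
const FENCE = '``'; // same value as PRE_DELIMITER

// Hypothetical reduced version of the main loop: only paragraphs and code blocks.
const splitBlocks = str => {
    const result = [];
    let inCodeBlock = false;
    let code = '';
    for (const line of str.split('\n')) {
        if (inCodeBlock) {
            if (line === FENCE) {
                // Closing fence: emit the collected block and reset the state.
                result.push({tag: 'pre', children: [code]});
                inCodeBlock = false;
                code = '';
            } else {
                code += line + '\n'; // taken verbatim, no inline parsing
            }
        } else if (line === FENCE) {
            inCodeBlock = true; // opening fence: switch state, emit nothing yet
        } else {
            result.push({tag: 'p', children: [line]});
        }
    }
    return result;
};

const blocks = splitBlocks('intro\n``\nx = *1*\n``\noutro');
console.log(JSON.stringify(blocks));
```

Checking the in-block flag before anything else is what keeps markdown-looking text inside a code block from being parsed.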

Editor view modes

const MODE = {

0 -> two-up, preview and editor; default

    BOTH: 0,

1 -> editor only

    EDITOR: 1,

2 -> preview only

    PREVIEW: 2,
}
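
Since the modes are consecutive integers starting at 0, stepping to the next one is a modulo operation. A tiny sketch, where `nextMode` is a hypothetical helper and not a function in this file:

```javascript
const MODE = {BOTH: 0, EDITOR: 1, PREVIEW: 2};

// Hypothetical helper: advance to the next mode, wrapping from 2 back to 0.
const nextMode = mode => (mode + 1) % 3;

let mode = MODE.BOTH;
const visited = [];
for (let i = 0; i < 4; i++) {
    visited.push(mode);
    mode = nextMode(mode);
}
console.log(visited); // [ 0, 1, 2, 0 ]
```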

The app component wraps the entire application and handles state.

class App extends StyledComponent {

    init() {

We start with the default two-up view.

        this.mode = MODE.BOTH;

Temporary state used to store whether the save button should show the saving state indicator ("saved").

        this.showSavedIndicator = false;

If we've previously saved the user input, pull that back out. Otherwise, use the default placeholder.

        this.inputValue = window.localStorage.getItem('markusInput') || INPUT_PLACEHOLDER;

Bind a few methods we're using to handle input.

        this.handleInput = this.handleInput.bind(this);
        this.handleKeydown = this.handleKeydown.bind(this);
        this.handleToggleMode = this.handleToggleMode.bind(this);
        this.handleSave = this.save.bind(this, {showIndicator: true});

Before the user leaves the site, we want to save the user input to local storage so we can pull it back out later when the user visits the site again.

        window.addEventListener('beforeunload',
            this.save.bind(this, {showIndicator: false}));
    }

Callback to save the current editor buffer contents to localStorage. This method takes an option to give feedback to the user, which is set to false when saving on the beforeunload event.

    save({showIndicator} = {}) {
        if (showIndicator) {
            this.showSavedIndicator = true;
            setTimeout(() => {
                this.showSavedIndicator = false;
                this.render();
            }, 1000);
            this.render();
        }
        window.localStorage.setItem('markusInput', this.inputValue);
    }

    styles() {
        return css`
        box-sizing: border-box;
        font-family: system-ui, sans-serif;
        display: flex;
        flex-direction: column;
        justify-content: space-between;
        align-items: flex-start;
        height: 100vh;
        width: 100%;
        max-width: 1600px;
        margin: 0 auto;
        overflow: hidden;

        header {
            padding: 20px 18px 0 18px;
            box-sizing: border-box;
            width: 100%;
            display: flex;
            flex-direction: row;
            align-items: center;
            justify-content: space-between;
        }
        .buttonGroup {
            display: flex;
            flex-direction: row;
            align-items: center;
            justify-content: space-between;

            button {
                margin: 0 6px;
                padding: 6px 10px;
                font-size: 1em;
                border-radius: 4px;
                background: #fff;
                box-shadow: 0 3px 8px -1px rgba(0, 0, 0, .3);
                border: 0;
                cursor: pointer;
                &:hover {
                    opacity: .7;
                }
            }
        }
        .title {
            margin: 0;
            font-weight: normal;
            color: #888;
            .dark {
                color: #000;
            }
        }
        .renderContainer {
            display: flex;
            flex-direction: row;
            justify-content: space-between;
            align-items: flex-start;
            height: calc(100% - 60px);
            width: 100%;
            padding: 16px;
            box-sizing: border-box;
        }
        .half {
            width: calc(50% - 8px);
        }
        .full {
            width: calc(100% - 8px);
        }
        .half, .full {
            height: 100%;
            box-sizing: border-box;
        }
        .render, textarea {
            box-sizing: border-box;
            border: 0;
            box-shadow: 0 3px 8px -1px rgba(0, 0, 0, .3);
            padding: 12px;
            border-radius: 6px;
            background: #fff;
            height: 100%;
            -webkit-overflow-scrolling: touch;
            overflow-y: auto;
        }
        textarea {
            font-family: 'Fira Code', 'Menlo', 'Monaco', monospace;
            width: 100%;
            resize: none;
            font-size: 14px;
            outline: none;
            color: #999;
            line-height: 1.5em;
            &:focus {
                color: #000;
            }
        }
        .result {
            height: 100%;
        }
        .render {
            p, pre, code {
                line-height: 1.5em;
            }
            li {
                margin-bottom: 6px;
            }
            pre {
                padding: 8px;
                overflow-x: auto;
            }
            code {
                padding: 1px 5px;
                margin: 0 2px;
            }
            pre, code {
                font-family: 'Menlo', 'Monaco', monospace;
                background: #eee;
                border-radius: 4px;
            }
            q {
                &::before, &::after {
                    content: '';
                }
                display: block;
                border-left: 3px solid #777;
                padding-left: 6px;
            }
        }
        `;
    }

When the input changes, set the new local state and queue up another render using requestAnimationFrame, to be efficient with when we render (not necessarily now, just before the next frame).

    handleInput(evt) {
        this.inputValue = evt.target.value;
        requestAnimationFrame(() => this.render());
    }

This makes sure the Tab key inserts four spaces (yay spaces instead of tabs!) instead of moving focus to the next input on the page. It makes the textarea behave like a text editor, allowing you to indent with Tab.

    handleKeydown(evt) {
        if (evt.key === 'Tab') {
            evt.preventDefault();
            const idx = evt.target.selectionStart;
            if (idx !== null) {
                const front = this.inputValue.substr(0, idx);
                const back = this.inputValue.substr(idx);
                this.inputValue = front + '    ' + back;
                this.render();

Rendering the new input value will make us lose focus on the textarea, so we put the focus back by selecting the area the user was just editing.

                evt.target.setSelectionRange(idx + 4, idx + 4);
            }
        }
    }

    handleToggleMode() {

Advance the mode counter, wrapping around so it stays within the [0, 2] range.

        this.mode = (this.mode + 1) % 3;
        this.render();
    }

    compose() {
        let modeView = null; // placeholder; the switch below always assigns a view

Provide the correct mode view to the app shell depending on the chosen view.

        switch (this.mode) {
            case MODE.EDITOR:
                modeView = jdom`<div class="full markdown">
                    <textarea autofocus value="${this.inputValue}" oninput="${this.handleInput}"
                        placeholder="Start writing ..." onkeydown="${this.handleKeydown}" />
                </div>`;
                break;
            case MODE.PREVIEW:
                modeView = jdom`<div class="full result">
                    ${Markus(this.inputValue)}
                </div>`;
                break;
            default:
                modeView = [
                    jdom`<div class="half result">
                        ${Markus(this.inputValue)}
                    </div>`,
                    jdom`<div class="half markdown">
                        <textarea autofocus value="${this.inputValue}" oninput="${this.handleInput}"
                            placeholder="Start writing ..." onkeydown="${this.handleKeydown}" />
                    </div>`,
                ];
                break;
        }

        return jdom`<main>
            <header>
                <h1 class="title">
                    <span class="dark">Markus</span>, a live markdown editor
                </h1>
                <div class="buttonGroup">
                    <button class="saveButton" onclick="${this.handleSave}">
                        ${this.showSavedIndicator ? 'saved' : 'save'}
                    </button>
                    <button class="showRenderedToggle" onclick="${this.handleToggleMode}">
                        mode ++
                    </button>
                </div>
            </header>
            <div class="renderContainer">${modeView}</div>
        </main>`;
    }

}
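
The string surgery inside handleKeydown can be pulled out into a pure function, which makes it easy to check away from the DOM. A minimal sketch, assuming a hypothetical helper named `insertTab` that returns both the new text and where the cursor should land:

```javascript
const TAB = '    '; // four spaces, as in handleKeydown

// Hypothetical pure helper: insert a soft tab at the cursor position.
const insertTab = (value, cursor) => ({
    value: value.slice(0, cursor) + TAB + value.slice(cursor),
    cursor: cursor + TAB.length,
});

const atStart = insertTab('- item', 0);
const inMiddle = insertTab('ab', 1);
console.log(JSON.stringify(atStart), JSON.stringify(inMiddle));
```

In the component, the returned cursor would be what gets passed to setSelectionRange after the re-render.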

Create an instance of the app and mount it to the page DOM.

const app = new App();
document.body.appendChild(app.node);

Basic grey background and reset of the default margin on <body>.

document.body.style.backgroundColor = '#f8f8f8';
document.body.style.margin = '0';