1 Crunchy details
This package includes an extensive set of unit tests. In addition to preventing regressions, these nicely illustrate the printer’s expected behavior in a variety of edge cases.
1.1 HTML particulars
Escaping special characters: Any <, > and & characters in string elements are escaped, and any symbols or integers in element position are converted to character entities:
> (display (xexpr->html5 '(p "Entities: " nbsp 65)))
<p>Entities: &nbsp;&#65;</p>
> (display (xexpr->html5 '(p "Escaping < > &")))
<p>Escaping &lt; &gt; &amp;</p>
In attribute values, the " character is escaped in addition to <, > and & characters:
> (display (xexpr->html5 '(p [[data-desc "Escaping \" < > &"]] "Foo")))
<p data-desc="Escaping &quot; &lt; &gt; &amp;">Foo</p>
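The two escaping rules above (text content versus attribute values) can be sketched in plain Racket. This is only an illustration of the rules, not html-printer's own code; escape-text and escape-attr are hypothetical helper names.

```racket
#lang racket/base
;; Illustrative sketch of the escaping rules described above.
;; escape-text and escape-attr are hypothetical helpers, not part of
;; html-printer's API.
(require racket/string)

;; Text content: escape &, < and >. The & is replaced first so that the
;; entities produced for < and > are not escaped a second time.
(define (escape-text s)
  (let* ([s (string-replace s "&" "&amp;")]
         [s (string-replace s "<" "&lt;")]
         [s (string-replace s ">" "&gt;")])
    s))

;; Attribute values: the same three characters, plus the double quote.
(define (escape-attr s)
  (string-replace (escape-text s) "\"" "&quot;"))

(displayln (escape-text "Escaping < > &"))
(displayln (escape-attr "Escaping \" < > &"))
```

Running it prints Escaping &lt; &gt; &amp; and then Escaping &quot; &lt; &gt; &amp;.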
The contents of <style> and <script> tags are never escaped or wrapped; the contents of <pre> tags are escaped, but never wrapped.
> (display (xexpr->html5 '(body (style "/* No escaping! & < > \" */") (script "/* No escaping! & < > \" */") (pre "Escaping! & < > \""))))
<body>
<style>/* No escaping! & < > " */</style>
<script>/* No escaping! & < > " */</script>
<pre>Escaping! &amp; &lt; &gt; "</pre>
</body>
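One way to picture these per-element rules is as a small lookup from tag name to an escape/wrap policy. The sketch below assumes a hypothetical content-policy function and policy struct; the package's internals may be organized quite differently.

```racket
#lang racket/base
;; Hypothetical sketch: which elements get their text escaped and/or wrapped.
;; Not html-printer's actual implementation.
(struct policy (escape? wrap?) #:transparent)

(define (content-policy tag)
  (case tag
    [(style script) (policy #f #f)]   ; raw text: printed exactly as given
    [(pre)          (policy #t #f)]   ; escaped, but never wrapped
    [else           (policy #t #t)])) ; ordinary content: escaped and wrapped
```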
The printer can handle XML comment and cdata elements. Comments are line-wrapped and indented like everything else. CDATA content is never modified or escaped.
> (define com (comment "Behold, a hidden comment & < >"))
> (define cd (cdata #f #f "<![CDATA[Also some of this & < > ]]>"))
> (display (xexpr->html5 #:wrap 20 `(body (article (h1 "Title" ,com) (p ,cd " foo")))))
<body>
<article>
<h1>Title<!--
Behold, a hidden
comment & < >
--></h1>
<p>
<![CDATA[Also some of this & < > ]]>
foo</p>
</article>
</body>
Differences from XML/XHTML: Attributes that the HTML5 spec identifies as boolean attributes are printed using the HTML5 “short” syntax. For example, when '(disabled "true") is supplied as an attribute, it is printed as disabled rather than disabled="" or disabled="disabled".
> (display (xexpr->html5 '(label (input [[type "checkbox"] [disabled ""]]) " Cheese")))
<label><input type="checkbox" disabled> Cheese</label>
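A minimal sketch of how short-syntax boolean attributes might be printed. The boolean-attrs list here is a partial, illustrative subset of HTML's boolean attributes, not the package's actual table, and attrs->string is a hypothetical helper.

```racket
#lang racket/base
;; Hypothetical sketch of attribute printing with HTML5 boolean attributes.
;; boolean-attrs is a partial list for illustration only.
(require racket/string)

(define boolean-attrs
  '(checked disabled hidden multiple readonly required selected))

;; Escape an attribute value: &, <, > and the double quote.
(define (escape-attr s)
  (let* ([s (string-replace s "&" "&amp;")]
         [s (string-replace s "<" "&lt;")]
         [s (string-replace s ">" "&gt;")])
    (string-replace s "\"" "&quot;")))

;; attrs is a list of (name value) pairs, as in an X-expression.
(define (attrs->string attrs)
  (apply string-append
         (for/list ([a (in-list attrs)])
           (let ([name (car a)])
             (if (memq name boolean-attrs)
                 (format " ~a" name) ; short syntax: the value is dropped
                 (format " ~a=\"~a\"" name (escape-attr (cadr a))))))))

(displayln (attrs->string '([type "checkbox"] [disabled ""])))
```

Whatever value disabled carries, only its name is printed.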
HTML elements which cannot have content (void elements) are ended with > (rather than with /> as in XML):
> (display (xexpr->html5 '(div (img [[src "cat.webp"]]))))
<div>
<img src="cat.webp">
</div>
> (display (xexpr->html5 '(head (meta [[charset "UTF-8"]]))))
<head>
<meta charset="UTF-8">
</head>
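Whether an element is void comes down to a fixed list of element names. The sketch uses the void element list from the HTML Living Standard; childless-element->string is a hypothetical helper, not part of this package, and attributes are left out for brevity.

```racket
#lang racket/base
;; Void elements as listed in the HTML Living Standard.
(define void-elements
  '(area base br col embed hr img input link meta source track wbr))

;; Hypothetical sketch: render an element that has no children.
(define (childless-element->string tag)
  (if (memq tag void-elements)
      (format "<~a>" tag)             ; void: no end tag and no "/>"
      (format "<~a></~a>" tag tag)))  ; non-void: an explicit end tag is printed

(for-each (lambda (t) (displayln (childless-element->string t)))
          '(img meta div))
```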
1.2 Comparing with included Racket functions
Racket already includes a few functions for printing X-expressions in string form. These work fine for generic XML markup, but the markup they generate can be incorrect or suboptimal for use as HTML content.
In particular, all three of these functions will escape <, > and & characters inside <script> and <style> tags, which is likely to introduce JavaScript and CSS errors.
The xexpr->string function is the simplest. It does not offer line wrapping or indentation:
> (xexpr->string '(body (main (script "3 > 2"))))
"<body><main><script>3 &gt; 2</script></main></body>"
The display-xml/content function (in combination with xexpr->xml) offers options for indentation, but its documentation warns that in HTML applications this can introduce additional whitespace. It cannot wrap lines at a maximum width.
; Will render incorrectly as "Hello World"
; due to the added line break
> (display-xml/content (xexpr->xml '(body (article (p (b "Hello") (i "World"))))) #:indentation 'scan)
<body>
<article>
<p>
<b>Hello</b>
<i>World</i>
</p>
</article>
</body>
; HTML5 printer will leave lines long
; rather than add significant whitespace
> (display (xexpr->html5 #:wrap 20 '(body (article (p (b "Hello") (i "World"))))))
<body>
<article>
<p>
<b>Hello</b><i>World</i></p>
</article>
</body>
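The principle at work here is that a line may only be broken where the input already contains whitespace. Below is a deliberately simplified greedy wrapper: wrap-at-spaces is a hypothetical name, and the package's real algorithm also tracks tags, indentation and tokens that must stay attached to their neighbors. It shows how an unbreakable run like <b>Hello</b><i>World</i> ends up on an overlong line instead of being split.

```racket
#lang racket/base
(require racket/string)

;; Greedy wrap that only breaks where the input already has whitespace.
;; A token longer than width stays on its own overlong line rather than
;; being split, since splitting would add whitespace HTML treats as significant.
(define (wrap-at-spaces text width)
  (let loop ([words (string-split text)] [line #f] [lines '()])
    (cond
      [(null? words)                       ; done: emit the last line
       (reverse (if line (cons line lines) lines))]
      [(not line)                          ; first word of a new line
       (loop (cdr words) (car words) lines)]
      [(<= (+ (string-length line) 1 (string-length (car words))) width)
       (loop (cdr words) (string-append line " " (car words)) lines)]
      [else                                ; no room: start a new line
       (loop (cdr words) (car words) (cons line lines))])))

(for-each displayln (wrap-at-spaces "<p><b>Hello</b><i>World</i></p>" 20))
```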
The write-xexpr function shares the shortcomings already mentioned, and adds its own very odd optional line-wrapping scheme: it inserts a line break before the closing > of every opening tag.
> (write-xexpr '(body (article (p (b "Hello") (i "World")))))
<body><article><p><b>Hello</b><i>World</i></p></article></body>
> (write-xexpr '(body (article (p (b "Hello") (i "World")))) #:insert-newlines? #t)
<body
><article
><p
><b
>Hello</b><i
>World</i></p></article></body>
1.3 Comparing with xexpr->html
The txexpr package includes xexpr->html, which correctly avoids escaping special characters inside <script> and <style> tags. Its HTML output will always be correct and faithful to the input, but since it performs no wrapping or indentation, the output can be difficult to read without additional processing.
> (define xp '(html (head (style "/* < > & */")) (body (section (h1 "Beginning")) (section (h1 "End")))))
> (display (xexpr->html xp))
<html><head><style>/* < > & */</style></head><body><section><h1>Beginning</h1></section><section><h1>End</h1></section></body></html>
> (display (xexpr->html5 xp))
<!DOCTYPE html>
<html>
<head>
<style>/* < > & */</style>
</head>
<body>
<section>
<h1>Beginning</h1>
</section>
<section>
<h1>End</h1>
</section>
</body>
</html>
1.4 Comparing with HTML Tidy
The HTML Tidy console application has been the best available tool for linting, correcting and formatting HTML markup since its creation in 1994. Its original purpose was to correct errors in HTML files written by hand in text editors.
Tidy is a far more comprehensive and configurable tool than this one. It always produces correctly line-wrapped and indented HTML, though this is only part of its functionality.
In terms of formatting functionality specifically, there are only a couple of significant differences between Tidy and this package:
HTML Tidy still counts line width by characters rather than graphemes, so it may wrap lines earlier than necessary when they contain emoji or other multi-byte graphemes.
HTML Tidy has numerous configuration options for adjusting the output formatting and for pruning the output (such as removing empty elements that could otherwise have content). xexpr->html5 offers very few options for customizing the output, focusing instead on providing a reasonable set of defaults, and avoiding any meaningful transformation of the structure of the HTML input.
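The gap between counting characters and counting graphemes is easy to see in Racket itself, whose string-length counts Unicode code points. This sketch assumes Racket 8.6 or newer, where string-grapheme-count is available.

```racket
#lang racket/base
;; Requires Racket 8.6+ for string-grapheme-count.
;; U+1F44D THUMBS UP SIGN followed by U+1F3FD, a skin-tone modifier:
;; two code points that display as a single emoji.
(define thumbs-up "\U1F44D\U1F3FD")

(printf "code points: ~a\n" (string-length thumbs-up))
(printf "graphemes:   ~a\n" (string-grapheme-count thumbs-up))
```

A width count based on the first number would treat this one emoji as two columns of text.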
Note that macOS ships with a version of HTML Tidy that is too old for use with modern HTML.
This package includes unit tests that compare its output against that of HTML Tidy in some cases. When the tests are run (including at package installation time), they search for HTML Tidy version 5.8.0 or newer, first checking the HTML_TIDY_PATH environment variable, then the current PATH; if Tidy is found, these tests run normally. Otherwise, they pass without any comparison actually being made.
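The lookup just described might be sketched as follows. This is not the package's test code: find-tidy, tidy-banner and tidy-version-ok? are hypothetical names, and the sketch assumes that HTML_TIDY_PATH, when set, names the tidy executable itself and that the --version banner contains "version X.Y.Z".

```racket
#lang racket/base
;; Hypothetical sketch of locating HTML Tidy and checking its version.
(require racket/port racket/system)

;; Returns a path to tidy, or #f if none was found.
(define (find-tidy)
  (define from-env (getenv "HTML_TIDY_PATH"))
  (if (and from-env (file-exists? from-env))
      from-env
      (find-executable-path "tidy")))

;; Captures whatever `tidy --version` prints on standard output.
(define (tidy-banner tidy-path)
  (with-output-to-string
    (lambda () (system* tidy-path "--version"))))

;; Compare two equal-length lists of version numbers, component by component.
(define (version>=? a b)
  (cond
    [(null? a) #t]
    [(> (car a) (car b)) #t]
    [(< (car a) (car b)) #f]
    [else (version>=? (cdr a) (cdr b))]))

;; True when the banner contains "version X.Y.Z" with X.Y.Z >= 5.8.0.
(define (tidy-version-ok? banner)
  (define m (regexp-match #px"version (\\d+)\\.(\\d+)\\.(\\d+)" banner))
  (and m (version>=? (map string->number (cdr m)) '(5 8 0))))
```

A test suite could then run its comparisons only when (let ([t (find-tidy)]) (and t (tidy-version-ok? (tidy-banner t)))) is true. Comparing numbers rather than strings matters here: as strings, "5.10.0" would sort before "5.8.0".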
1.5 Probing and prodding
(require html-printer/debug)    package: html-printer
I lied at the beginning of these docs when I said this package only provides a single function. Here are a couple more, though they will only be interesting to people who really want to kick the tires.
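procedure
(proof x [wrap])
  x : xexpr?
  wrap : exact-positive-integer? = 20
procedure
(debug x [wrap])
  x : xexpr?
  wrap : exact-positive-integer? = 20
The proof function prints x as HTML5 wrapped at wrap columns, beneath a column ruler, with · marking each space and ¶ marking the end of each line: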
> (proof '(p "Chaucer, Rabelais and " (em "Balzac!")))
----|----1----|----2----|----3----|
<p>Chaucer,·Rabelais¶
and·<em>Balzac!</em></p>¶
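The two visual aids in proof's output, the column ruler and the visible whitespace marks, are simple to reproduce. This is a sketch with hypothetical ruler and show-whitespace helpers, not the package's implementation; a width of 35 reproduces the ruler shown above.

```racket
#lang racket/base
;; Hypothetical sketch of proof-style output decorations.
(require racket/string)

;; Column ruler: a digit at every 10th column, "|" at every other 5th
;; column, "-" elsewhere. Digits wrap after 9 to stay one character wide.
(define (ruler width)
  (build-string width
                (lambda (i)
                  (let ([col (add1 i)]) ; columns are 1-based
                    (cond
                      [(zero? (modulo col 10))
                       (integer->char (+ 48 (modulo (quotient col 10) 10)))]
                      [(zero? (modulo col 5)) #\|]
                      [else #\-])))))

;; Show each space as "·" and each line end as "¶".
(define (show-whitespace s)
  (string-replace (string-replace s " " "·") "\n" "¶\n"))

(displayln (ruler 35))
(display (show-whitespace "<p>Chaucer, Rabelais\nand <em>Balzac!</em></p>\n"))
```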
The debug function does the same thing but spits out an ungodly amount of gross logging on (current-error-port), for use in debugging the printing algorithm. (Note that all logging activity is disabled by default because of its huge performance penalty, but it gets temporarily enabled during calls to debug by way of parameterize.)
> (debug '(p "Chaucer, Rabelais and " (em "Balzac!")))
----|----1----|----2----|----3----|
<p>Chaucer,·Rabelais¶
and·<em>Balzac!</em></p>¶
html-printer: EXPR block starting… • tag:p prev-token:first
html-printer: └─ PRT indent start • col:1 indent-level:0
html-printer: └─ PRT indent end • col:1 indent-level:0
html-printer: └─ PRT put! start… • v:<p> col:1 accum-width:0 logical-line-start:#t indent-level:0
html-printer: └─ PRT put! …end • col:4 accum-width:0 logical-line-start:#f indent-level:0
html-printer: EXPR string starting… • prev-token:normal str:Chaucer, Rabelais and
html-printer: └─ PRT accum/wrap! start… • col:4 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{}
html-printer: └─ PRT accum! start… • col:4 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer: └─ PRT accum! …end • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer: └─ PRT accum/wrap! …end • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer: └─ PRT accum/wrap! start… • col:4 accum-width:8 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,"}
html-printer: └─ PRT accum! start… • col:4 accum-width:8 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"Chaucer,"}
html-printer: └─ PRT accum! …end • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer: └─ PRT accum/wrap! …end • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer: └─ PRT accum/wrap! start… • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer: └─ PRT flush start… • col:4 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{#<_bp>,"Chaucer,",#<_bp>," "}
html-printer: └─ PRT lop-accum-end _ • accum-width:9 lopped-len:1 which-end:right
html-printer: └─ PRT flush at_bp • col:4 buffer-width:0 held-whsp?:#f buffer:{}
html-printer: └─ PRT flush non-whsp • held-whsp?:#f buffer:{"Chaucer,"}
html-printer: └─ PRT flush at_bp • col:4 buffer-width:8 held-whsp?:#f buffer:{"Chaucer,"}
html-printer: └─ PRT flush printbuf… • held-whsp?:#f logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:Chaucer, col:4 accum-width:8 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:12 accum-width:8 logical-line-start:#f indent-level:0
html-printer: └─ PRT flush done-breaking • col:12 accum-width:8 logical-line-start:#f accumulator:{#<_bp>,"Chaucer,",#<_bp>}
html-printer: └─ PRT accum! start… • col:12 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#f accumulator:{}
html-printer: └─ PRT accum! …end • col:12 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{" "}
html-printer: └─ PRT flush done • col:12 accum-width:1 logical-line-start:#f accumulator:{" "}
html-printer: └─ PRT accum! start… • col:12 accum-width:1 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{" "}
html-printer: └─ PRT accum! …end • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer: └─ PRT accum/wrap! …end • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer: └─ PRT accum/wrap! start… • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer: └─ PRT flush start… • col:12 accum-width:9 logical-line-start:#f indent-level:0 accumulator:{" ",#<_bp>,"Rabelais"}
html-printer: └─ PRT flush whitespace • v:
html-printer: └─ PRT flush at_bp • col:12 buffer-width:1 held-whsp?:1 buffer:{}
html-printer: └─ PRT flush non-whsp • held-whsp?:1 buffer:{" ","Rabelais"}
html-printer: └─ PRT flush done-breaking • col:12 accum-width:9 logical-line-start:#f accumulator:{" ",#<_bp>,"Rabelais"}
html-printer: └─ PRT put! start… • v: col:12 accum-width:9 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:13 accum-width:9 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:Rabelais col:13 accum-width:9 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:21 accum-width:9 logical-line-start:#f indent-level:0
html-printer: └─ PRT accum! start… • col:21 accum-width:0 logical-line-start:#f indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer: └─ PRT accum! …end • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer: └─ PRT accum/wrap! …end • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer: └─ PRT accum/wrap! start… • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer: └─ PRT flush start… • col:21 accum-width:1 logical-line-start:#f indent-level:0 accumulator:{#<_bp>," "}
html-printer: └─ PRT lop-accum-end _ • accum-width:1 lopped-len:1 which-end:right
html-printer: └─ PRT flush at_bp • col:21 buffer-width:0 held-whsp?:#f buffer:{}
html-printer: └─ PRT break+indent! start… • col:21 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{#<_bp>}
html-printer: └─ PRT break+indent! …end • col:1 accum-width:0 logical-line-start:#t indent-level:0 accumulator:{#<_bp>}
html-printer: └─ PRT flush done-breaking • col:1 accum-width:0 logical-line-start:#t accumulator:{#<_bp>}
html-printer: └─ PRT accum! start… • col:1 accum-width:0 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{}
html-printer: └─ PRT accum! …end • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer: └─ PRT accum/wrap! …end • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer: └─ PRT accum/wrap! start… • col:1 accum-width:3 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and"}
html-printer: └─ PRT accum! start… • col:1 accum-width:3 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"and"}
html-printer: └─ PRT accum! …end • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer: └─ PRT accum/wrap! …end • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer: EXPR string end • last-word:
html-printer: EXPR inline start… • tag:em prev-token:normal
html-printer: └─ PRT accum/wrap! start… • col:1 accum-width:4 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer: └─ PRT accum! start… • col:1 accum-width:4 logical-line-start:#t indent-level:0 breakpoint-before?:#t accumulator:{#<_bp>,"and",#<_bp>," "}
html-printer: └─ PRT accum! …end • col:1 accum-width:8 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer: └─ PRT accum/wrap! …end • col:1 accum-width:8 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer: EXPR string starting… • prev-token:sticky str:Balzac!
html-printer: └─ PRT accum! start… • col:1 accum-width:8 logical-line-start:#t indent-level:0 breakpoint-before?:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>"}
html-printer: └─ PRT accum! …end • col:1 accum-width:15 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer: EXPR string end • last-word:Balzac!
html-printer: └─ PRT pop-whitespace _ • accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer: EXPR inline after • popped:#f tag:em
html-printer: EXPR inline …closing • tag:em last-token:sticky
html-printer: └─ PRT accum! start… • col:1 accum-width:15 logical-line-start:#t indent-level:0 breakpoint-before?:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!"}
html-printer: └─ PRT accum! …end • col:1 accum-width:20 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer: EXPR block …closing • tag:p last-tok:sticky
html-printer: └─ PRT check/flush col • accum-width:20 wrap-col:20 indent-level:0
html-printer: └─ PRT flush start… • col:1 accum-width:20 logical-line-start:#t indent-level:0 accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer: └─ PRT flush at_bp • col:1 buffer-width:0 held-whsp?:#f buffer:{}
html-printer: └─ PRT flush non-whsp • held-whsp?:#f buffer:{"and"}
html-printer: └─ PRT flush at_bp • col:1 buffer-width:3 held-whsp?:#f buffer:{"and"}
html-printer: └─ PRT flush printbuf… • held-whsp?:#f logical-line-start:#t indent-level:0
html-printer: └─ PRT put! start… • v:and col:1 accum-width:20 logical-line-start:#t indent-level:0
html-printer: └─ PRT put! …end • col:4 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT flush whitespace • v:
html-printer: └─ PRT flush at_bp • col:4 buffer-width:1 held-whsp?:1 buffer:{}
html-printer: └─ PRT flush non-whsp • held-whsp?:1 buffer:{" ","<em>"}
html-printer: └─ PRT flush non-whsp • held-whsp?:#f buffer:{" ","<em>","Balzac!"}
html-printer: └─ PRT flush non-whsp • held-whsp?:#f buffer:{" ","<em>","Balzac!","</em>"}
html-printer: └─ PRT flush done-breaking • col:4 accum-width:20 logical-line-start:#f accumulator:{#<_bp>,"and",#<_bp>," ",#<_bp>,"<em>","Balzac!","</em>"}
html-printer: └─ PRT put! start… • v: col:4 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:5 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:<em> col:5 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:9 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:Balzac! col:9 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:16 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:</em> col:16 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:21 accum-width:20 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! start… • v:</p> col:21 accum-width:0 logical-line-start:#f indent-level:0
html-printer: └─ PRT put! …end • col:25 accum-width:0 logical-line-start:#f indent-level:0
html-printer: └─ PRT break! start… • col:25 accum-width:0 logical-line-start:#f indent-level:0 accumulator:{}
html-printer: └─ PRT break! …end • col:1 accum-width:0 logical-line-start:#t indent-level:0 accumulator:{}
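The parenthetical note about debug describes a common pattern: logging that is off by default and switched on only for the dynamic extent of one call. A minimal sketch with a Racket parameter follows. The names logging-enabled?, log-step, place-word and debug-place-word are hypothetical, and the real package may route its messages through a Racket logger rather than writing to the error port directly.

```racket
#lang racket/base
;; Hypothetical sketch of parameter-gated debug logging.
(define logging-enabled? (make-parameter #f)) ; off by default

;; A macro, so the format arguments are not even evaluated when
;; logging is off; only the cheap parameter lookup is paid.
(define-syntax-rule (log-step fmt arg ...)
  (when (logging-enabled?)
    (eprintf fmt arg ...)))

;; Stand-in for a printing step that reports what it is doing.
(define (place-word w)
  (log-step "put! start… • v:~a\n" w)
  (string-length w))

;; Like debug: enable logging only while this call runs.
(define (debug-place-word w)
  (parameterize ([logging-enabled? #t])
    (place-word w)))
```

Because parameterize restores the old value when the call returns, logging is off again for every later call.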