Joel Dueck Dot Com: A lightweight Pollen replacement

I’m working on a Racket #lang called Punct that serves the same purpose as Pollen — a programming environment for published artifacts. Punct combines Markdown and free-form Racket code, producing a format-independent AST.

Why would I do this? I have enjoyed using Pollen very much. But after seven years of improving my Racket skills, my “Pollen projects” have been making less and less actual use of Pollen’s facilities and features. I’ve started to wish for a DSL that has only the pieces I need: something with fewer moving parts and a different approach to markup and rendering.

Punct is meant to be a good fit when you want very lightweight markup and want the language to handle paragraph detection and footnotes for you, while still being able to use functions as markup for things Markdown doesn’t provide.

I have designed this for my own use, perhaps only for a single project. I plan to make it publicly available, but not as a package on the package server, thus contributing to the Lisp Curse.

Differences from Pollen

- All (or nearly all) markup/tag functions are defined in and provided by the language. There is no equivalent of a pollen.rkt file that the language always has to search for, and no setup submodule in which to provide additional runtime options.
  - One or more module paths can be included on the #lang line to be auto-required for documents that need additional functions. (This is a syntactic convenience; of course normal require works as well.)
- The location of the “project root” is just the current directory.
- Punct has just one dialect. There are no preprocessor or “pagetree” flavors of the language.
- Punct source documents evaluate to a generic AST (in the form of an X-expression) rather than going directly to a specific output format. The AST is independent of the output format, so tag functions do not need to check for the “current” output format, and the source document always compiles to the same value. (Of course, this is already possible in Pollen but it is mandatory in Punct.)
- Punct provides some basic “renderers” for transforming the AST to HTML, LaTeX, etc. Projects can provide fallback functions for custom elements, or simply write their own renderers from scratch.
- Punct does not provide any command-line tools. Sources are compiled (possibly also cached) with raco make. Each project will need to provide its own script for actually applying templates and writing output files.
- Punct might not provide a project server for local testing. Any simple local web server will work to serve static files, and fswatch could be used to run make whenever a .rkt file changes. Alternatively, I could write a lightweight web server (possibly as a separate package) that runs make on every requested file before serving it.

New Features

Metadata block

Sources can optionally add metadata using key: value lines delimited by lines consisting only of consecutive hyphens:

  #lang punct
    
  ---
  title: Prepare to be amazed
  date: 2020-05-07
  ---

This is a syntactic convenience that comes with a few rules and limitations:

- The metadata block must be the first non-whitespace thing that follows the #lang line.
- The values will always be parsed as flat strings.
- The reader will not evaluate any escaped code inside the metadata block; all characters in the keys and values will be used verbatim.

If you want to use non-string values, or the results of expressions, in your metadata, you can use the set-meta function anywhere in the document or in code contained in other modules. Within the document body you can also use the ? macro as shorthand for set-meta.
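To make the metadata block rules concrete, here is a minimal sketch in plain Racket of how such a block could be split off from the lines that follow the #lang line. This is not Punct’s actual reader: the names parse-meta-block and hyphen-line? are hypothetical, and trimming whitespace around keys and values is an assumption of the sketch.

```racket
#lang racket/base
;; Hypothetical sketch only; Punct's real reader may work differently.
(require racket/list racket/string)

;; True for a line consisting only of consecutive hyphens.
(define (hyphen-line? line)
  (regexp-match? #px"^-+$" line))

;; parse-meta-block : (listof string) -> (values hash (listof string))
;; Takes the lines that follow the #lang line. Returns a hash of symbol keys
;; to flat string values, plus the remaining body lines.
(define (parse-meta-block lines)
  ;; The block must be the first non-whitespace thing, so skip blank lines.
  (define trimmed
    (dropf lines (λ (l) (string=? "" (string-trim l)))))
  (if (and (pair? trimmed) (hyphen-line? (car trimmed)))
      (let loop ([remaining (cdr trimmed)] [metas (hasheq)])
        (cond
          ;; No closing delimiter: treat the input as having no block.
          [(null? remaining) (values (hasheq) lines)]
          ;; Closing delimiter: everything after it is the document body.
          [(hyphen-line? (car remaining)) (values metas (cdr remaining))]
          [else
           ;; Split on the first colon; nothing is evaluated, and the value
           ;; is kept as a string. (Trimming is an assumption of the sketch.)
           (let ([m (regexp-match #px"^\\s*([^:]+?)\\s*:\\s*(.*?)\\s*$"
                                  (car remaining))])
             (loop (cdr remaining)
                   (if m
                       (hash-set metas (string->symbol (cadr m)) (caddr m))
                       metas)))]))
      (values (hasheq) lines)))

;; Example use with the block shown earlier.
(define-values (example-metas example-body)
  (parse-meta-block
   '("" "---" "title: Prepare to be amazed" "date: 2020-05-07" "---" ""
     "Body text.")))
```

Here the date comes back as the string "2020-05-07" rather than a date value, which is the situation where set-meta would be the better tool.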

Integrating Markdown

Markdown itself is too limited, but it would be nice to be able to use it as a starting point and add “tag functions” where richer markup is needed.

The language uses the commonmark package for Markdown processing. This package is ideal since it implements a thoroughly specified standard, is fast, and parses to an AST rather than directly to HTML.

Here’s how Markdown and Racket are combined:

1. The language parses the source file with a Pollen-like reader first. All the code is expanded and evaluated, and the results of all the expressions are converted to strings. X-expressions get special handling during this conversion so they can be reassembled later.
2. After all expressions are evaluated, the document is concatenated into a string that is parsed by commonmark, producing a document struct.
3. The document struct is then transformed into another AST in the form of an X-expression, using a custom renderer. During this process, any X-expressions from step 1 are recognized and reconstituted in place.
4. This final AST becomes the doc provided by the source document.

Layering two independent syntaxes on top of each other like this is tricky. The hard part is handling tag functions that emit something other than a stringish value, such as an X-expression that itself contains text that should be parsed as Markdown. For example, there might be a footnote reference in part of the caption of a figure tag.

The solution is to “flat pack” X-expressions before the CommonMark parsing pass: that is, to transform them recursively into flat strings delimited by HTML-style tags that preserve their attributes and elements. When parsing the combined string content, CommonMark will parse the delimiting tags as html and html-block values. Then, during the AST transformation (step 3 above), these values are recognized and used to reassemble the original X-expressions in place, with their original string content replaced by the parsed CommonMark elements.
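As a rough illustration of the flat-packing idea, the sketch below recursively turns an X-expression into a single string delimited by HTML-style tags. It is not Punct’s implementation: the names flatpack and pack are hypothetical, the exact tag encoding Punct uses may differ, and the sketch handles only strings and tagged elements (no attribute escaping, symbols, or numeric entities).

```racket
#lang racket/base
;; Hypothetical sketch only; Punct's actual encoding may differ.
(require racket/match racket/string)

;; flatpack : xexpr -> string
;; Recursively flattens an X-expression into one string. Text content is
;; left untouched so that a later Markdown pass can still parse it.
(define (flatpack x)
  (match x
    [(? string? s) s]
    ;; Element with an attribute list: (tag ((key "value") ...) elem ...)
    [(list (? symbol? tag) (list (list (? symbol? ks) (? string? vs)) ...) elems ...)
     (pack tag (map cons ks vs) elems)]
    ;; Element without attributes: (tag elem ...)
    [(list (? symbol? tag) elems ...)
     (pack tag '() elems)]))

;; pack : symbol (listof (cons symbol string)) (listof xexpr) -> string
;; Wraps the flattened children in opening and closing tags that preserve
;; the element's name and attributes.
(define (pack tag attrs elems)
  (string-append
   (format "<~a~a>"
           tag
           (string-append*
            (for/list ([a (in-list attrs)])
              (format " ~a=\"~a\"" (car a) (cdr a)))))
   (string-append* (map flatpack elems))
   (format "</~a>" tag)))

;; A figure whose caption contains a Markdown footnote reference.
(define packed
  (flatpack '(figure ((id "f1")) (caption "See note[^1]."))))
;; packed is "<figure id=\"f1\"><caption>See note[^1].</caption></figure>"
```

The footnote reference survives as ordinary text between the tags, so the Markdown parser still sees it, while the tags carry enough information to rebuild the figure and caption elements afterwards.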

Name ideas

~~#lang yarn~~
~~#lang dispatch~~
~~#lang interpunct~~
#lang punct

Example syntax

Racket code in Punct uses Scribble’s @ syntax, but with the • “bullet” character (Unicode U+2022) as the control character rather than @.

This is an example of syntax only; the note, attrib and other functions shown are not actually provided by the language (yet).

  #lang punct "my-additional-tags.rkt"
    
  ---
  title: Prepare to be amazed
  date: 2020-05-07
  ---

  This is a paragraph. **Bold text**, etc. — you know the Markdown drill.

  The `my-additional-tags.rkt` above is an example of an optional module path 
  that will be `require`d into the current document for additional tag functions.

  > Famous quotation.
  >
  > •attrib{Surly Buster, [_Fight to Win_][ftw] (2008)}

  The above is an example of a Markdown blockquote containing a tag function which
  in turn contains Markdown.

  •note[#:date "2020-05-07" #:by "A Reader" #:bylink "foo@msn.com"]{

    This is a note added to the document[^1].

    •poem[#:title "Institutions"]{
      ‘Ləh’
    }

    [^1]: It can contain its own footnotes and link references.
  }

  [ftw]: https://surly.guy/fight-to-win/ 'Book website'

HTML Rendering

To try out the HTML renderer, run a Punct program (in DrRacket, for example), then in the REPL:

  (require punct/render/html)
  (doc->string doc)

Prior Art

How to Create a Pollen Markup Alternative in 61 Lines by Sage Gerard. Sage’s take was more about a text markup format that you could send through eval rather than creating a proper #lang where the sources would behave like first-class modules. Still a very useful experiment.
