Documenting a language using a custom Sphinx domain and Pygments lexer



Recently I’ve been looking at the software engineering tools / techniques I used when engineering the MDCF Architect (see my original post). Today I’m going to talk about Sphinx and Pygments — tools used by my research lab for developer-facing documentation.  Both of these tools work great “out of the box” for most setups, but since my project uses the somewhat-obscure programming language AADL, quite a bit of extra configuration was needed for everything to work correctly.

Sphinx is a documentation-generating tool that was originally created for the python language documentation, though it can now support a number of languages / other features through the sphinx-contrib project.  It uses reStructuredText, which I found to be totally usable after I took some time to poke around at a few examples. Since your documentation will probably have lots of code examples, it uses Pygments to provide syntax highlighting. Pygments supports a crazy-huge number of languages, which is probably one reason why it’s one of the most popular programs for syntax highlighting.

But, what do you do when you want to document a language that isn’t supported by either Sphinx or Pygments?  You add your own support, of course! Though it took quite a bit of digging / trial-and-error, I added a custom domain to Sphinx and a custom lexer for Pygments, and integrated the whole process so generating documentation is still just one step.

A Custom Sphinx Domain

Before I get into discussing how I made a custom Sphinx domain, let me first back up and explain what exactly a domain (in Sphinx parlance) is.  A full explanation is available from the Sphinx website, but the short version is that a domain provides support for documenting a programming language — primarily by enabling grouping and cross-linking of the language’s constructs. So, for example, you could specify a function name and its parameters, and get a nicely formatted description in your documentation (the example formatting has been somewhat wordpress-ified, but it gives an idea):

Description: Threads correspond to individual units of work such as handling an incoming message or checking for an alarm condition
Contained Elements:
  • features (port) — The process-level ports this thread either reads from or writes to.

There isn’t a lot of documentation for creating a custom Sphinx domains, but there are a lot of examples in the sphinx-contrib project. All of these examples, though, are built to produce a standalone, installable package that will make the domain available for use on a particular machine.  Unfortunately, this would greatly complicate the distribution process of my software — anyone who wanted to build the project (including documentation) from source would have to install a bunch of extra stuff.  Plus, this installation would need to be repeated on each of the build machines my research lab uses (there are nearly 20 of them, and all installation has to go through the already overworked KSU CIS IT) and any changes would mean repeating the entire process. Instead, I decided to try and just hook the custom domain into my Sphinx installation, and it turned out this was pretty easy to do.  There are two steps: 1) develop the custom domain, and 2) add it to sphinx.

Developing the Domain

I got started by using the GNU Make domain, by Kay-Uwe Lorenz, as a template; I found it to be quite understandable. From there I sort of hacked in some dependencies from the sphinx-contrib project (and imported others) until I had enough to use the  custom_domain  class. Then it was just the configuration of a few names, template strings, and the fields used by the AADL elements I wanted to document.  Fields, which make up the bulk of the domain specification, come in three kinds — Fields, GroupedFields, and TypedFields. Fields are the most basic elements, GroupedFields add the ability for fields to be grouped together, and TypedFields enable both grouping and a type specification.  I didn’t find a lot of documentation online, but the source is available, and pretty illustrative if you’re stuck.

Now you can use elements from these domains in your documentation pretty easily:

A Custom Pygments Lexer

The Pygments documentation has a pretty thorough walkthrough of how to write your own lexer.  Using that (and the examples in the lexer file) I was able to write my own lexer with relatively little frustration.  When it came time to use my lexer in Sphinx, though, I ran into a problem similar to the one I had with the domain — in the typical use case, the lexer would have to be installed into an existing Pygments installation before the documentation could be built.  Fortunately, like domains, lexers can be provided directly to Sphinx (assuming Pygments is installed somewhere, that is).

Developing the Lexer

Pygments lexer development using the RegexLexer class is pretty straightforward — you essentially just define a state machine with regular expressions that govern transitions between the various tokens (ie, your lexemes). Here’s an excerpt of the full lexer:

Once available, using your lexer to describe an example is even more straightforward; you simply use the  :language:  directive:

Putting it all together

Once you have your domain and lexer built, you just need to make Sphinx aware of them.  Put the files somewhere accessible (I have mine in a util folder that sits at the top level of my documentation) and use the  sphinx.add_lexer(“name”, lexer)  and  sphinx.add_domain(domain)  functions in the  setup(sphinx)  function in your file:

You can see an example of what this all looks like over at the MDCF Architect documentation, and you can see the full domain and lexer files on the MDCF Architect github page.


4 responses to “Documenting a language using a custom Sphinx domain and Pygments lexer”

  1. […] (an eclipse plugin), I also built a number of supporting artifacts — things like developer-targeted documentation and testing with coverage information. Integrating these (and other) build features with Maven is […]

  2. Thanks for leaving this here – it may be just what I need.  I just found it in a google search that had me running in circles.  Lots and lots of documents explained how to add a new lexer to Pygments, but nothing mentioned how to do it properly in a managed environment where you’re supposed to behave as a normal unprivileged user, not as root.

    I do happen to have root access where it’s installed but I’m also trying to put this project on github and I want people to be able to generate the docs from instructions and scripts after they grab a copy of the source from there.  I don’t want those instructions to have to include the step, “Now become root and go make this permanent change to your system-wide installation of the Pygments utility just to run my one little thing on it”.

    I still have a hard time believing that that is the standard way all the Pygments documents tell you to do things, but it seems to be.

    1. Glad to hear it was useful!

  3. Joel Gerber Avatar
    Joel Gerber

    Exactly what I was looking for. Thanks so much for sharing!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.