Documenting a language using a custom Sphinx domain and Pygments lexer

Recently I’ve been looking at the software engineering tools / techniques I used when engineering the MDCF Architect (see my original post). Today I’m going to talk about Sphinx and Pygments — tools used by my research lab for developer-facing documentation. Both of these tools work great “out of the box” for most setups, but since my project uses the somewhat-obscure programming language AADL, quite a bit of extra configuration was needed for everything to work correctly.

Sphinx is a documentation generator that was originally created for the Python language documentation, though it now supports a number of other languages and features through the sphinx-contrib project. It uses reStructuredText, which I found to be totally usable after I took some time to poke around at a few examples. Since your documentation will probably have lots of code examples, Sphinx uses Pygments to provide syntax highlighting. Pygments supports a crazy-huge number of languages, which is probably one reason why it’s one of the most popular programs for syntax highlighting.

But, what do you do when you want to document a language that isn’t supported by either Sphinx or Pygments? You add your own support, of course! Though it took quite a bit of digging / trial-and-error, I added a custom domain to Sphinx and a custom lexer for Pygments, and integrated the whole process so generating documentation is still just one step.

A Custom Sphinx Domain

Before I get into discussing how I made a custom Sphinx domain, let me first back up and explain what exactly a domain (in Sphinx parlance) is. A full explanation is available from the Sphinx website, but the short version is that a domain provides support for documenting a programming language — primarily by enabling grouping and cross-linking of the language’s constructs. So, for example, you could specify a function name and its parameters, and get a nicely formatted description in your documentation (the example formatting has been somewhat wordpress-ified, but it gives an idea):

`thread`
Description:	Threads correspond to individual units of work such as handling an incoming message or checking for an alarm condition.
Contained Elements:
	features (`port`): The process-level ports this thread either reads from or writes to.
Properties:
	Period (`Timing_Properties::Period`): The task’s period. Ignored for sporadic tasks.
	Deadline (`Timing_Properties::Deadline`): The amount of time that can lapse between the task’s dispatch and completion.
	WCET (`Timing_Properties::Compute_Execution_Time`): A task’s worst case execution time is the most time it will take to complete after dispatch.
	Dispatch (`Thread_Properties::Dispatch_Protocol`): Either sporadic or periodic. Periodic tasks are dispatched once per period, while sporadic tasks are dispatched when a message arrives on their associated port.

There isn’t a lot of documentation for creating custom Sphinx domains, but there are a lot of examples in the sphinx-contrib project. All of these examples, though, are built to produce a standalone, installable package that will make the domain available for use on a particular machine. Unfortunately, this would greatly complicate the distribution process of my software: anyone who wanted to build the project (including documentation) from source would have to install a bunch of extra stuff. Plus, this installation would need to be repeated on each of the build machines my research lab uses (there are nearly 20 of them, and all installation has to go through the already overworked KSU CIS IT), and any changes would mean repeating the entire process. Instead, I decided to try to hook the custom domain directly into my Sphinx installation, and it turned out this was pretty easy to do. There are two steps: 1) develop the custom domain, and 2) add it to Sphinx.

Developing the Domain

I got started by using the GNU Make domain, by Kay-Uwe Lorenz, as a template; I found it to be quite understandable. From there I sort of hacked in some dependencies from the sphinx-contrib project (and imported others) until I had enough to use the custom_domain class. Then it was just the configuration of a few names, template strings, and the fields used by the AADL elements I wanted to document. Fields, which make up the bulk of the domain specification, come in three kinds — Fields, GroupedFields, and TypedFields. Fields are the most basic elements, GroupedFields add the ability for fields to be grouped together, and TypedFields enable both grouping and a type specification. I didn’t find a lot of documentation online, but the source is available, and pretty illustrative if you’re stuck.

from sphinx.util.docfields import GroupedField, TypedField
# custom_domain is the helper adapted from the sphinx-contrib examples (defined or imported elsewhere in this file)

AADLDomain = custom_domain('AADLDomain',
        name  = 'aadl',
        label = "AADL",

        elements = dict(
            construct = dict(
                objname = "AADL Construct",
                indextemplate = "pair: %s; AADL Construct",
                fields = [
                    TypedField('property',
                        label = "Properties",
                        names = ['property'],
                        typenames = ['type'],
                        can_collapse = True,
                    ),
                    TypedField('contained-element',
                        label = "Contained Elements",
                        names = ['contained-element'],
                        typenames = ['kind'],
                        can_collapse = True,
                    ),
                    TypedField('context',
                        label = "Contexts",
                        names = ['context'],
                        typenames = ['context-type'],
                        can_collapse = True,
                    ),
                    GroupedField('trigger-type',
                        label = "Triggers",
                        names = ['trigger-type'],
                        can_collapse = True,
                    )
                ],
            ),
            # Other elements here…
        ))

Now you can use elements from these domains in your documentation pretty easily:

.. construct:: thread

	Threads correspond to individual units of work such as handling an incoming message or checking for an alarm condition.
	
	:contained-element features: The process-level ports this thread either reads from or writes to.
	:kind features: :construct:`port`
	:property Period: |prop period|
	:property Deadline: |prop deadline|
	:property WCET: |prop wcet|
	:property Dispatch: |prop dispatch|
	:type Period: :property:`Timing_Properties::Period<period>`
	:type Deadline: :property:`Timing_Properties::Deadline<deadline>`
	:type WCET: :property:`Timing_Properties::Compute_Execution_Time<wcet>`
	:type Dispatch: :property:`Thread_Properties::Dispatch_Protocol<dispatch-protocol>`

A Custom Pygments Lexer

The Pygments documentation has a pretty thorough walkthrough of how to write your own lexer. Using that (and the examples in the other.py lexer file) I was able to write my own lexer with relatively little frustration. When it came time to use my lexer in Sphinx, though, I ran into a problem similar to the one I had with the domain — in the typical use case, the lexer would have to be installed into an existing Pygments installation before the documentation could be built. Fortunately, like domains, lexers can be provided directly to Sphinx (assuming Pygments is installed somewhere, that is).

Developing the Lexer

Pygments lexer development using the RegexLexer class is pretty straightforward: you essentially just define a state machine in which regular expressions recognize the various tokens (i.e., your lexemes) and govern the transitions between states. Here’s an excerpt of the full lexer:

import re
from pygments.lexer import RegexLexer, bygroups
from pygments.token import Comment, Generic, Keyword, Name, Operator, Punctuation, Text, Whitespace

class AADLLexer(RegexLexer):

    name = 'AADL'
    aliases = ['aadl']
    filenames = ['*.aadl']
    mimetypes = ['text/x-aadl']

    flags = re.MULTILINE | re.DOTALL

    iden_rex = r'[a-zA-Z_][a-zA-Z0-9_\.]*'
    with_tuple = (r'(with)(\s+)', bygroups(Keyword.Namespace, Text), 'with-list')
    # Other common regular expressions and tuples go here

    tokens = {
        'property-section' : [
            text_tuple,
            (class_iden_rex + r'(\s*)(=>)(\s*)', bygroups(Name.Class, Punctuation, Name.Class, Whitespace, Operator, Whitespace), 'property-section-property-value'),
            (r'(' + iden_rex + r')(\s*)(=>)(\s*)', bygroups(Name.Class, Whitespace, Operator, Whitespace), 'property-section-property-value'),
            # Nothing else matched: leave the property section
            (r'', Generic.Error, '#pop'),
        ],
        'root': [
            (r'(\n\s*|\t)', Whitespace),
            # AADL comments start with two hyphens and run to the end of the line
            (r'--.*?$', Comment.Single),
            (r'(package)(\s+)', bygroups(Keyword.Namespace, Text), 'packageOrSystem'),
            (r'(public|private)', Keyword.Namespace),
            with_tuple,
            (keyword_rex + r'(\s+)', bygroups(Keyword.Type, Text), 'package-declaration'),
            (r'(subcomponents|connections|features|flows)(\s+)', bygroups(Keyword.Namespace, Whitespace)),
            (definition_rex, bygroups(Name.Variable, Punctuation), 'declaration'),
            (r'(properties)(\s*)', bygroups(Keyword.Namespace, Whitespace), 'property-section'),
            (r'(end)(\s+)', bygroups(Keyword.Namespace, Whitespace), 'package-declaration'),
            (r'(property set)(\s+)', bygroups(Keyword.Namespace, Whitespace), 'property-set'),
        ],
        # More tokens and their transition-rule regexes go here
    }
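
Since the excerpt above leaves out several helper definitions, it won’t run on its own. If you want something to experiment with, here is a minimal, self-contained sketch of the same state-machine idea. To be clear, MiniAADLLexer, its two states, and the significant_tokens helper are made up for illustration and are far simpler than the real AADL lexer; the only assumption is that Pygments is installed.

```python
from pygments.lexer import RegexLexer, bygroups
from pygments.token import Comment, Keyword, Name, Text, Whitespace


class MiniAADLLexer(RegexLexer):
    """Hypothetical, much-reduced lexer; not the real AADLLexer."""

    name = 'MiniAADL'
    aliases = ['mini-aadl']

    tokens = {
        # Every RegexLexer starts in the 'root' state.
        'root': [
            (r'\s+', Whitespace),
            # Comments run from two hyphens to the end of the line.
            (r'--.*?$', Comment.Single),
            # Seeing "package" pushes a state that expects the package name.
            (r'(package)(\s+)', bygroups(Keyword.Namespace, Whitespace), 'package-name'),
            (r'(public|private|end)\b', Keyword.Namespace),
            (r'[a-zA-Z_][a-zA-Z0-9_.:]*', Name),
            # Anything else is emitted one character at a time as plain text.
            (r'.', Text),
        ],
        'package-name': [
            # Once the name has been read, pop back to the previous state.
            (r'[a-zA-Z_][a-zA-Z0-9_.:]*', Name.Namespace, '#pop'),
        ],
    }


def significant_tokens(source):
    """Return (token type, text) pairs for source, skipping whitespace."""
    return [(tok, val) for tok, val in MiniAADLLexer().get_tokens(source)
            if tok not in Whitespace]
```

Calling significant_tokens on a few lines of AADL is a quick way to check that each rule fires in the state you expect before you wire a lexer into Sphinx.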

Once available, using your lexer to highlight an example is even more straightforward; you simply set the :language: option on a directive such as literalinclude:

.. literalinclude:: snippets/logic.aadl
	:language: aadl
	:linenos:

Putting it all together

Once you have your domain and lexer built, you just need to make Sphinx aware of them. Put the files somewhere accessible (I have mine in a util folder that sits at the top level of my documentation) and use the sphinx.add_lexer("name", lexer) and sphinx.add_domain(domain) functions in the setup(sphinx) function in your conf.py file:

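import os
import sys

def setup(sphinx):
    # Make the local util folder importable without installing anything
    sys.path.insert(0, os.path.abspath('./util'))
    from AADLLexer import AADLLexer
    from AADLDomain import AADLDomain
    # Newer Sphinx releases expect the lexer class itself (AADLLexer) here rather than an instance
    sphinx.add_lexer("aadl", AADLLexer())
    sphinx.add_domain(AADLDomain)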
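
One thing to keep in mind is that a relative path like ./util is resolved against the current working directory, so the setup function works when the build is launched from the documentation folder. If your builds start from somewhere else, a small helper along these lines may be more robust. This is only a sketch: add_local_package_dir is a name I made up, and it assumes the domain and lexer live in a util folder next to conf.py.

```python
import os
import sys


def add_local_package_dir(conf_file, subdir='util'):
    """Put <directory of conf_file>/<subdir> at the front of sys.path.

    Returns the absolute path that was added, so the caller can log it.
    """
    conf_dir = os.path.dirname(os.path.abspath(conf_file))
    local_dir = os.path.join(conf_dir, subdir)
    # Avoid adding the same entry again if setup() runs more than once.
    if local_dir not in sys.path:
        sys.path.insert(0, local_dir)
    return local_dir
```

Inside setup(sphinx) you would call add_local_package_dir(__file__) in place of the sys.path.insert line, assuming __file__ is defined when Sphinx executes your conf.py.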

You can see an example of what this all looks like over at the MDCF Architect documentation, and you can see the full domain and lexer files on the MDCF Architect GitHub page.

Comments

4 responses to “Documenting a language using a custom Sphinx domain and Pygments lexer”

Automating all Aspects of a Build with Maven Plugins | Sam Procter's Website

June 27, 2014

[…] (an eclipse plugin), I also built a number of supporting artifacts — things like developer-targeted documentation and testing with coverage information. Integrating these (and other) build features with Maven is […]

Steven Mading

February 23, 2016

Thanks for leaving this here – it may be just what I need. I just found it in a google search that had me running in circles. Lots and lots of documents explained how to add a new lexer to Pygments, but nothing mentioned how to do it properly in a managed environment where you’re supposed to behave as a normal unprivileged user, not as root.

I do happen to have root access where it’s installed but I’m also trying to put this project on github and I want people to be able to generate the docs from instructions and scripts after they grab a copy of the source from there. I don’t want those instructions to have to include the step, “Now become root and go make this permanent change to your system-wide installation of the Pygments utility just to run my one little thing on it”.

I still have a hard time believing that that is the standard way all the Pygments documents tell you to do things, but it seems to be.

1. Sam
  
  March 9, 2016
  
  Glad to hear it was useful!
  
Joel Gerber

December 23, 2016

Exactly what I was looking for. Thanks so much for sharing!
