2.1. Introduction

This introduction gives a short overview of how to write and use Spicy parsers. For a complete list of available features, see the Reference.

Here’s a simple “Hello, World!” in Spicy:

module Test;

print "Hello, World!";

Assuming that’s stored in hello.spicy, we can compile and run it with spicy-driver like this:

> spicy-driver hello.spicy
Hello, World!

spicy-driver compiles the source code into native code on the fly and then runs it directly. Alternatively, hilti-build produces a stand-alone binary for subsequent execution:

> hilti-build -o a.out tools/spicy-driver/spicy-driver.cc hello.spicy
> ./a.out
Hello, World!

Note the inclusion of spicy-driver.cc here: Spicy-generated code cannot run on its own. It needs a driver program that provides a main function and, normally (though not in this trivial example), the input data for the generated parsers. spicy-driver.cc is a generic driver of this kind that can be used for testing and debugging parsers from the command line, without the need for any further host application. If you run the generated binary a.out with --help, you’ll see all the options that spicy-driver.cc provides.

Note

The spicy-driver tool is itself compiled from the same spicy-driver.cc code, with just a few small tweaks that switch the generic driver into JIT mode.

Note that the above hello-world program does not actually contain any parser specification; it contains only global code that is executed at initialization time. We’ll see below how to write actual parsers.

2.1.1. A Simple Parser

A Spicy parser specification describes the layout of a protocol data unit (PDU), along with semantic actions to perform when individual pieces are parsed. Here’s a simple example for parsing an HTTP-style request line, such as GET /index.html HTTP/1.0:


module Request;

const Token      = /[^ \t\r\n]+/;
const WhiteSpace = /[ \t]+/;
const NewLine    = /\r?\n/;

export type RequestLine = unit {
    method:  Token;
    :        WhiteSpace;
    uri:     Token;
    :        WhiteSpace;
    version: Version;
    :        NewLine;

    on %done {
        print self.method, self.uri, self.version.number;
    }
};

type Version = unit {
    :       /HTTP\//;
    number: /[0-9]+\.[0-9]+/;
};

In this example, you can see a number of things:

  • A specification must always start with a module statement defining a namespace.

  • The layout of a self-contained data unit is defined by creating a unit type, listing its individual elements in the order they are to be parsed. In the example, there are two such units defined, RequestLine and Version.

  • Each field inside a unit has a type and an optional name. The type defines how that field will be parsed from the raw input stream. In the example, all fields have a regular expression as their type, which means that the generated parser will match these expressions against the input stream in the order in which the fields are laid out. Note how the regular expressions can either be given directly as a field’s type (as in Version), or indirectly via globally defined constants (as in RequestLine). Also note that if a field has a regular expression as its type, the parsed value will later have a type of bytes.

    If a field has a name, that name can later be used to access its content. Consequently, all fields with semantic meaning have names in the example, while those that are unlikely to be relevant later don’t (e.g., the whitespace).

  • A unit field can have another unit as its type; here that’s the case for the version field in RequestLine. The meaning is straightforward: when parsing of the outer unit reaches that field, the parser first fully parses the sub-unit according to that unit’s layout specification before it continues with the outer unit.

  • We can specify code to be executed when a unit has been completely parsed by defining an on %done hook. The statements in the hook can refer to the current unit instance through the implicitly defined self identifier, and they can access the parsed fields using standard attribute notation (as in other languages, such as Python or C++). As the access to version shows, this also works for the fields of sub-units. In the example, we tell the generated parser to output three of the parsed fields when finished.

  • The export keyword declares the parser generated for a unit to be accessible from an external host application. Only exported units can later be the starting point for feeding in input; all other units cannot be used for parsing directly, only indirectly as sub-units of an exported unit.
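
To make the field-by-field matching described above concrete, here is a minimal Python sketch of what a parser for this grammar does conceptually. It is not Spicy's generated code and not part of any Spicy API: the names expect, parse_version, and parse_request_line, the ParseError class, and the use of Python's re module are all assumptions made purely for illustration.

```python
import re

# The same patterns as in the Spicy example, as byte regexes
# (fields parsed via a regular expression yield values of type bytes).
TOKEN = re.compile(rb"[^ \t\r\n]+")
WHITESPACE = re.compile(rb"[ \t]+")
NEWLINE = re.compile(rb"\r?\n")
HTTP_PREFIX = re.compile(rb"HTTP/")
NUMBER = re.compile(rb"[0-9]+\.[0-9]+")


class ParseError(Exception):
    """Hypothetical error type: a field's pattern did not match."""


def expect(pattern, data, pos):
    """Match pattern exactly at pos; return (matched bytes, new position)."""
    m = pattern.match(data, pos)
    if m is None:
        raise ParseError(f"{pattern.pattern!r} not found at offset {pos}")
    return m.group(0), m.end()


def parse_version(data, pos):
    """Sub-unit: consume 'HTTP/' (unnamed field), keep the number field."""
    _, pos = expect(HTTP_PREFIX, data, pos)
    number, pos = expect(NUMBER, data, pos)
    return {"number": number}, pos


def parse_request_line(data):
    """Outer unit: fields are matched strictly in declaration order."""
    pos = 0
    method, pos = expect(TOKEN, data, pos)
    _, pos = expect(WHITESPACE, data, pos)    # unnamed field, value discarded
    uri, pos = expect(TOKEN, data, pos)
    _, pos = expect(WHITESPACE, data, pos)
    version, pos = parse_version(data, pos)   # sub-unit is parsed completely first
    _, pos = expect(NEWLINE, data, pos)
    return {"method": method, "uri": uri, "version": version}


request = parse_request_line(b"GET /index.html HTTP/1.0\n")
# Counterpart of the on %done hook: output three of the parsed fields.
print(request["method"], request["uri"], request["version"]["number"])
```

The sketch keeps only named fields in the result, mirroring the point that unnamed fields are matched but cannot be referenced later.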

Now let’s see how we turn this into an actual parser. If we save the above specification into the file request.spicy, we can use spicy-driver to execute it:

> echo "GET /index.html HTTP/1.0" | spicy-driver request.spicy
GET /index.html 1.0

As you can see, parsing succeeds and the print statement writes out the three fields one would expect. If we pass in something that’s malformed, the parser will complain:

> echo "GET HTTP/1.0" | spicy-driver request.spicy
hilti: uncaught exception, ParseError with argument look-ahead symbol(s) [[ \t]+] not found
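
The error names the whitespace pattern rather than the missing URI. A plausible reading, sketched here in Python under the assumption that the fields are simply matched one after another, is that HTTP/1.0 is itself a valid Token and gets consumed as the uri field, so the first field that cannot match is the second WhiteSpace. The variable names below are illustrative only and do not correspond to anything in Spicy.

```python
import re

TOKEN = re.compile(rb"[^ \t\r\n]+")
WHITESPACE = re.compile(rb"[ \t]+")

data = b"GET HTTP/1.0\n"

method = TOKEN.match(data, 0)                    # method field: b"GET"
gap = WHITESPACE.match(data, method.end())       # the single space
uri = TOKEN.match(data, gap.end())               # b"HTTP/1.0" is taken as the URI
second_gap = WHITESPACE.match(data, uri.end())   # next byte is b"\n": no match

print(method.group(0), uri.group(0), second_gap)
```

Here second_gap is None, which corresponds to the point at which the generated parser gives up on the input.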

2.1.2. Current State

Please note that Spicy (and HILTI) are not yet production-ready, and there are a number of known problems. In particular:

  • Only 64-bit Linux and Mac OS are supported right now.
  • The Spicy compiler is not good at detecting malformed input. If there’s an error in a *.spicy file, chances are high that it will either give a largely incomprehensible error message or simply crash.
  • Many of Spicy’s features have not yet been exercised much other than via the unit tests in the test suite. Anything beyond that may or may not work ...
  • The language still lacks many basic features, including data types and operators. It’s generally rather easy to add them (in particular if HILTI already has the corresponding support), but the current set is simply driven by what’s been needed so far.

If you find any problems (including bugs, missing features, and unexpected or broken error handling), it would be most helpful if you could prepare a corresponding BTest unit test that demonstrates the issue; see tests/spicy/* for examples. Please then file the unit test with the GitHub tracker.

2.1.3. Exploring More

  • The Spicy Reference is slowly growing. Eventually, it will document all available features. Note that the Data Types section is auto-generated from the source code and hence comprehensively lists all currently available operators. Operations not in there aren’t supported yet.
  • There are some preliminary protocol parsers in libspicy/parsers/, and also in bro/spicy/.
  • Look at Spicy source files (*.spicy) across the tests/spicy/* subdirectories to see what Spicy grammars look like. In particular, the test/unit/*.spicy files show various features available for defining units.