Montag, Februar 19, 2018

TableGen #1: What has TableGen ever done for us?

This is the first entry in an on-going series. Here's a list of all entries:
  1. What has TableGen ever done for us?
  2. Functional Programming
  3. Bits
  4. Resolving variables
  5. DAGs
  6. to be continued
Anybody who has ever done serious backend work in LLVM has probably developed a love-hate relationship with TableGen. At its best it can be an extremely useful tool that saves a lot of manual work. At its worst, it will drive you mad with bizarre crashes, indecipherable error messages, and generally inscrutable failures to understand what you want from it.

TableGen is an internal tool of the LLVM compiler framework. It implements a domain-specific language that is used to describe many different kinds of structures. These descriptions are translated to read-only data tables that are used by LLVM during compilation.

For example, all of LLVM's intrinsics are described in TableGen files. Additionally, each backend describes its target machine's instructions, register file(s), and more in TableGen files.

The unit of description is the record. At its core, a record is a dictionary of key-value pairs. Additionally, records are typed by their superclass(es), and each record can have a name. So for example, the target machine descriptions typically contain one record for each supported instruction. The name of this record is the name of the enum value which is used to refer to the instruction. A specialized backend in the TableGen tool collects all records that subclass the Instruction class and generates instruction information tables that is used by the C++ code in the backend and the shared codegen infrastructure.

The main point of the TableGen DSL is to provide an ostensibly convenient way to generate a large set of records in a structured fashion that exploits regularities in the target machine architecture. To get an idea of the scope, the X86 backend description contains ~47k records generated by ~62k lines of TableGen. The AMDGPU backend description contains ~39k records generated by ~24k lines of TableGen.

To get an idea of what TableGen looks like, consider this simple example:
def Plain {
  int x = 5;

class Room<string name> {
  string Name = name;
  string WallColor = "white";

def lobby : Room<"Lobby">;

multiclass Floor<int num, string color> {
  let WallColor = color in {
    def _left : Room<num # "_left">;
    def _right : Room<num # "_right">;

defm first_floor : Floor<1, "yellow">;
defm second_floor : Floor
<2, "gray">;
This example defines 6 records in total. If you have an LLVM build around, just run the above through llvm-tblgen to see them for yourself. The first one has name Plain and contains a single value named x of value 5. The other 5 records have Room as a superclass and contain different values for Name and WallColor.

The first of those is the record of name lobby, whose Name value is "Lobby" (note the difference in capitalization) and whose WallColor is "white".

Then there are four records with the names first_floor_left, first_floor_right, second_floor_left, and second_floor_right. Each of those has Room as a superclass, but not Floor. Floor is a multiclass, and multiclasses are not classes (go figure!). Instead, they are simply collections of record prototypes. In this case, Floor has two record prototypes, _left and _right. They are instantiated by each of the defm directives. Note how even though def and defm look quite similar, they are conceptually different: one instantiates the prototypes in a multiclass (or several multiclasses), the other creates a record that may or may not have one or more superclasses.

The Name value of first_floor_left is "1_left" and its WallColor is "yellow", overriding the default. This demonstrates the late-binding nature of TableGen, which is quite useful for modeling exceptions to an otherwise regular structure:
class Foo {
  string salutation = "Hi";
  string message = salutation#", world!";

def : Foo {
salutation = "Hello";
The message of the anonymous record defined by the def-statement is "Hello, world!".

There is much more to TableGen. For example, a particularly surprising but extremely useful feature are the bit sets that are used to describe instruction encodings. But that's for another time.

For now, let me leave you with just one of the many ridiculous inconsistencies in TableGen:
class Tag<int num> {
  int Number = num;

class Test<int num> {
  int Number1 = Tag<5>.Number;
  int Number2 = Tag<num>.Number;
  Tag Tag1 = Tag<5>;
  Tag Tag2 = Tag<num>;

def : Test<5>;
What are the values in the anonymous record? It turns out that Number1 and Number2 are both 5, but Tag1 and Tag2 refer to different records. Tag1 refers to an anonymous record with superclass Tag and Number equal to 5, while Tag2 also refers to an anonymous record, but with the Number equal to an unresolved variable reference.

This clearly doesn't make sense at all and is the kind of thing that sometimes makes you want to just throw it all out of the window and build your own DSL with blackjack and Python hooks. The problem with that kind of approach is that even if the new thing looks nicer initially, it'd probably end up in a similarly messy state after another five years.

So when I ran into several problems like the above recently, I decided to take a deep dive into the internals of TableGen with the hope of just fixing a lot of the mess without reinventing the wheel. Over the next weeks, I plan to write a couple of focused entries on what I've learned and changed, starting with how a simple form of functional programming should be possible in TableGen.

Keine Kommentare: