Metaprogramming is less fun in D

Despite the speaker’s calm voice and the slides showing almost nothing but code in C++ (which isn’t what you’d normally expect from a keynote), Louis Dionne’s talk on metaprogramming at Meeting C++ 2016 was a truly exciting one.

It’s been years since last time I did some metaprogramming in C++, and this talk brought me back old memories of that feeling when I’d put together some incredibly clever pieces of template black magic and it finally worked. Oh, that was soo cunning. I’m the best hacker. I’m sure you know that feeling too.

I was happy to learn that the picture has changed significantly since that time, and cluttering the code with many nested levels of angle brackets adorned here and there with some double colons is no longer the thing. With Hana you basically write code that looks like your usual runtime function or operator calls, but under the hood these functions are generics that operate on the information carried in types of their arguments. There’s no run-time state involved in these operations, so in the generated machine code they are all optimized out. Impressive.

Case study: Event system

Let’s take a look at the example discussed in the talk (starting at about 38:30). If you’ve watched it already, you can skip this part. I’ll copy the entire example here so that it’s close at hand for reference in further discussion.

Consider an event system that identifies events by their names. We can register any number of callbacks (handlers) to be called when an event is triggered. Later, we trigger the events, expecting that all callbacks registered for a given event will execute.

int main() {
  event_system events({"foo", "bar", "baz"});

  events.on("foo", []() { std::cout << "foo triggered!" << '\n'; });
  events.on("foo", []() { std::cout << "foo again!" << '\n'; });
  events.on("bar", []() { std::cout << "bar triggered!" << '\n'; });
  events.on("baz", []() { std::cout << "baz triggered!" << '\n'; });

  events.trigger("foo");
  events.trigger("baz");
  // events.trigger("unknown"); // WOOPS! Runtime error!
}

We start with a Java-style run-time only implementation. We use a hash map to find a vector of functions to call given an event name. Initially, an empty vector is inserted into the map for each known event.

struct event_system {
  using Callback = std::function<void()>;
  std::unordered_map<std::string, std::vector<Callback>> map_;

  explicit event_system(std::initializer_list<std::string> events) {
    for (auto const& event : events)
      map_.insert({event, {}});
  }

Now, to register a callback, we find the right vector in the map, and push the callback at its end.

  template <typename F>
  void on(std::string const& event, F callback) {
    auto callbacks = map_.find(event);
    assert(callbacks != map_.end() &&
      "trying to add a callback to an unknown event");

    callbacks->second.push_back(callback);
  }

Finally, triggering an event causes calling all callbacks in the vector for the specified event.

  void trigger(std::string const& event) {
    auto callbacks = map_.find(event);
    assert(callbacks != map_.end() &&
      "trying to trigger an unknown event");

    for (auto& callback : callbacks->second)
      callback();
  }

That’s all well and good, but frequently it’s already known at design time what the possible events are and when they should be triggered. So why do we have to pay for the search in map each time we trigger an event? And worse, if we mistype the name of an event, we may be unlucky enough to only know it when it’s too late.

Compile-time lookup

Hana can save us such annoyances by allowing us to do the lookup at compile time, with only cosmetic changes to the above code. First, we update the call site with compile-time string literals in place of the run-time ones.

int main() {
  auto events = make_event_system("foo"_s, "bar"_s, "baz"_s);

  events.on("foo"_s, []() { std::cout << "foo triggered!" << '\n'; });
  events.on("foo"_s, []() { std::cout << "foo again!" << '\n'; });
  events.on("bar"_s, []() { std::cout << "bar triggered!" << '\n'; });
  events.on("baz"_s, []() { std::cout << "baz triggered!" << '\n'; });
  // events.on("unknown"_s, []() {}); // compiler error!

  events.trigger("foo"_s); // no overhead
  events.trigger("baz"_s);
  // events.trigger("unknown"_s); // compiler error!
}

Note the _s suffix on all event names. It requires a special string literal operator which will probably make it into the C++ standard in 2020, but it’s already implemented in Clang and GCC, so why not using it now? The operator builds a stateless object where the string itself is stored in the object’s type, e.g. "foo"_s becomes an instance of hana::string<'f', 'o', 'o'>.

Implementation with hana::map

Now let’s replace the run-time map with hana::map, declaring a vector of callbacks for each event with a bit of template parameter pack expansion magic.

template <typename ...Events>
struct event_system {
  using Callback = std::function<void()>;
  hana::map<hana::pair<Events, std::vector<Callback>>...> map_;

Now we can just default construct the event_system, which will default construct map_, and consequently all the vectors of callbacks it contains will be initialized to empty vectors.

template <typename ...Events>
event_system<Events...> make_event_system(Events ...events) {
  return {};
}

Finally, we replace the run-time lookup with its compile-time equivalent.

  template <typename Event, typename F>
  void on(Event e, F callback) {
    auto is_known_event = hana::contains(map_, e);
    static_assert(is_known_event,
      "trying to add a callback to an unknown event");

    map_[e].push_back(callback);
  }

  template <typename Event>
  void trigger(Event e) const {
    auto is_known_event = hana::contains(map_, e);
    static_assert(is_known_event,
      "trying to trigger an unknown event");

    for (auto& callback : map_[e])
      callback();
  }

What happens here is that the vector that should be accessed is determined at compile time, and each instantiation of the function templates above just accesses its own vector. We expect that there’s no additional run-time cost compared to hand written functions for each event, e.g.

  template <typename F>
  void trigger_foo(F callback) {
    for (auto& callback : callbacks_foo)
      callback();
  }

If your application triggers events frequently, changing from dynamic to static dispatch may result in a noticeable speedup. The chart below shows the time to call trigger for an event with exactly one callback registered. With compile-time lookup based on hana::map it’s about 14 times faster than run-time lookup in unordered_map, and only about 15% slower than just calling an std::function.

Event system performance: D vs. C++

You can have both at the same time

There are cases in which the event to be triggered will be decided only at run time, e.g.:

  std::string e = read_from_stdin();
  events.trigger(e);

Our event system can be easily extended to handle such cases. Just like in the first version with run-time only lookup, we use an unordered map. We don’t want to store the callback vectors twice, so the values in the map are pointers to vectors already stored inside the static map.

  std::unordered_map<std::string, std::vector<Callback>* const> dynamic_;

  event_system() {
    hana::for_each(hana::keys(map_), [&](auto event) {
      dynamic_.insert({event.c_str(), &map_[event]});
    });
  }

Being able to trigger the run-time-determined event is now just a matter of overloading the trigger method that does exactly the same as the one we had in the pure run-time implementation.

  void trigger(std::string const& event) {
    auto callbacks = dynamic_.find(event);
    assert(callbacks != dynamic_.end() &&
      "trying to trigger an unknown event");

    for (auto& callback : *callbacks->second)
      callback();
  }

Can we do better?

At this point in the talk (59:30) you’ll hear Louis saying:

All I need to support compile-time and run-time lookup is a single overload. That is pretty powerful, and I do not know any other language that allows me to do that.

It was obvious to me that D could do it better, and it took me about 10 minutes to sketch an equivalent implementation. But I can understand: if I were to advertise effects of years of hard work I’d done, I wouldn’t want people to look for something else. ;)

Don’t you know D? You could’ve heard of it as “C++ done right”, but that’s not entirely true. D also has its own baggage of bad decisions and quirks. Over time, it’s collected ad-hoc standard library additions in completely different styles. Its documentation is sometimes outdated or misses that one thing you’re looking for. Its community and the support it receives from the industry are incomparably smaller to that received by C++. But leaving all of that aside, after years of using D for various tasks, I must agree that it really lives up to its promise of being “a practical language for practical programmers who need to get the job done quickly, reliably, and leave behind maintainable, easy to understand code.”

So let’s see if D can do The Overload Trick and at least match the C++/Hana duo in expressiveness in this use case.

Interface

Let’s start with what we expect at the call site.

void main() {
  EventSystem!("foo", "bar", "baz") events;

  events.on!"foo"(() { writeln("foo triggered!"); });
  events.on!"foo"(() { writeln("foo again!"); });
  events.on!"bar"(() { writeln("bar triggered!"); });
  events.on!"baz"(() { writeln("baz triggered!"); });
  // events.on!"unknown"(() {}); // compile error!

  events.trigger!"foo";
  events.trigger!"baz";
  events.trigger("bar"); // overload for dynamic dispatch
  // events.trigger!"unknown"; // compile error!
}

It looks pretty similar to what we had in C++. The most important difference is that in D, there’s no special syntax for compile-time strings, and regular strings can be passed as template parameters. For this reason, the idea of stateless objects with strings encoded in types won’t add any value here (but it is technically possible to implement it, see below).

The EventSystem struct template can be instantiated by just passing the event names as arguments, and the default static initializers are sufficient for all its members, so there’s no need for a factory function.

Function templates on and trigger also accept compile-time strings. Since each string is a single token, the parentheses around the template argument lists can be skipped, just as those around an empty run-time argument list. This minor syntactic quirk turned trigger!("foo")() into less cluttered trigger!"foo".

The distiction between run-time and compile-time arguments is preserved with regular language rules. You don’t need to remember that in this particular case the _s suffix implies a compile-time entity. Note that we trigger foo and baz via static dispatch, but we also expect that an overload is available that accepts a run-time evaluated event name when we trigger bar.

Compile-time lookup

Now on to the implementation:

struct EventSystem(events...) {
  alias Callback = void delegate();
  Callback[][events.length] callbacks_;   // like std::array<std::vector<Callback>, sizeof...(events)> in C++

There’s no equivalent of hana::map in Phobos, but since all values in the map are of the same type, we can declare a fixed-length array of them and use staticIndexOf to map event names to indices in the array. The argument list that this struct template receives as events... may include various compile-time entities, including types, template names, constant values of different primitive or composite types. In particular, strings will do just fine.

The implementation of on and trigger also looks pretty similar to the C++ version, except that first we look for the index of the requested vector of callbacks, and then get the vector from the array via this index.

  void on(string event)(Callback c) {
    enum index = staticIndexOf!(event, events);
    static assert(index >= 0,
      "trying to add a callback to an unknown event: " ~ event);

    callbacks_[index] ~= c;
  }

  void trigger(string event)() {
    enum index = staticIndexOf!(event, events);
    static assert(index >= 0,
      "trying to trigger an unknown event: " ~ event);

    foreach (callback; callbacks_[index])
      callback();
  }

Note the enum in place where C++ version used auto. The type is also inferred automatically, but using enum forces index to be computed at compile time and only then it can be used in static assert.

The lookup (compile-time linear search with staticIndexOf) is done only once per instantiation (i.e. once per key), and there is no run-time cost associated with it. Also, indexing the array via statically known index doesn’t add any run-time overhead.

Overloading trigger()

And now, the overload of trigger accepting a dynamic key. There’s nothing unusual in having both template and non-template overloads side by side, so let’s focus only on the implementation of run-time lookup. One way would be to use a built-in associative array, populated during construction of the map object (you can try it yourself), but for a small number of keys, a linear search comparing the requested key with all known keys shouldn’t be much worse.

  void trigger(string event) {
    foreach (i, e; events) {
      if (event == e) {
        foreach (c; callbacks_[i])
          c();
        return;
      }
    }
    assert(false, "trying to trigger an unknown event: " ~ event);
  }

What’s going on here? The outer foreach iterates over events which is a compile-time tuple of strings. It’s not really a loop–the compiler pastes the body of this static foreach once for each element of the tuple, substituting that element for e and its index for index. The result is as if we have three ifs one after another:

      if (event == "foo") {
        foreach (callback; callbacks_[0])
          callback();
        return;
      }
      if (event == "bar") {
        foreach (callback; callbacks_[1])
          callback();
        return;
      }
      if (event == "baz") {
        foreach (callback; callbacks_[2])
          callback();
        return;
      }

One-on-one

That’s all. You can download both versions, stare at the code for a while and compile them. The D version can be compiled with DMD, GDC or LDC. On my machine, it consistently compiles noticeably faster than the C++ one (minimum of 10 consecutive attempts, all compiled on x86_64 with -O3):

	C++ (g++, clang++)	D (gdc, ldc2)
GCC/GDC 6.2.0	0.98s	0.45s
Clang 3.9.1/LDC 1.1.0	1.09s	0.67s

I wouldn’t draw any serious conclusions from the time it takes to compile a tiny toy program, but in general, D was designed for fast compilation, not to mention that the volume of library code brought in is much smaller.

And what if we mistyped a name of an event? The D frontend (common for all 3 major compilers) tries to balance between the volume and usefulness of error messages, and here it does the job very well. The only thing we’ll see is:

es.d(23): Error: static assert  "trying to trigger an unknown event: unknown"
es.d(102):        instantiated from here: trigger!"unknown"

And in C++? In my shell I had to scroll up through about a hundred lines of heavily punctuated messages to see that important one coming from the static_assert (although with Clang’s highlighting it wasn’t as bad as with GCC’s). There used to be a tool for transforming C++ error messages into something digestible for humans. Is it still around?

The last thing to check is if we didn’t lose performance. In the graph below you can see the comparison of D vs. C++ version for both static and dynamic dispatch. The D version seems to be slightly faster in both cases, but it’s only a limited microbenchmark. The D dynamic lookup might just perform better here because we use a different algorithm that favors small sets. And the ~10% speedup for the static lookup most likely comes from std::function doing a bit more work on each call than delegate does (check the assembly: D, C++.) It won’t be noticed in a larger application doing anything meaningful in the callback, so let’s just assume both versions perform equally well.

Event system performance: D vs. C++

In conclusion, with D we’ve achieved a similar result as in C++, but without resorting to a complex library. The D implementation looks straightforward, and uses only a few basic features of the language and standard library. There’s no feeling of being super-clever nor did I have to learn anything new. But isn’t it exactly what we expect from maintainable, easy to understand code?

So is it really better? There’s no absolute scale for that, but if other factors aren’t more important, I’d likely be in favor of a solution that doesn’t rely on an external library, compiles faster and produces less scary error messages if something goes wrong.

hana::map equivalent in D

But what if we wanted a data structure in D that behaves like hana::map? After all, we won’t always have values of the same type, and using an intermediate integer index feels a little bit unprofessional. Is it possible?

Prerequisites

It turns out, the only obstacle is that a compile-time entity (type or value) cannot be used as a run-time argument to overloaded indexing operators. To overcome this, the same technique as Hana uses can be applied in D. Consider a struct template parameterized with a single value or type. I don’t know how to name it so let it be an Entity.

struct Entity(arg...)
  if (arg.length == 1)
{
  static if (is(arg[0])) {
    alias Type = arg[0];
  } else static if (is(typeof(arg[0]) T)) {
    alias Type = T;
    enum value = arg[0];
  }
}

We can instantiate concrete struct types and create objects by writing e.g. Entity!"foo"() or Entity!double(). Such objects have no state, but their types will select different instantiations of template functions.

Unlike regular function calls, constructing a struct object requires parentheses, which makes it a bit more verbose than Hana’s _c and _s suffixes. There is a number of ways to make it more concise, parameterized enum being one of them:

enum c(arg...) = Entity!arg();

static assert(is(c!int.Type == int));
static assert(is(c!"foo".Type == string));
static assert(c!"foo".value == "foo");
static assert(c!42.value == 42);

Similar syntax for user-defined literals was reportedly used first in std.conv.octal.

Storage for values

Now let’s get to the map itself. We want something that can store values of different types (like std.typecons.Tuple, and where keys are types or compile-time values, instantiated like this:

struct Bar {}
Map!(
  "foo",  int,           // string "foo" maps to a value of type int
  Bar,    string,        // type Bar maps to a value of type string
  "한",   string[]) map; // string "한" maps to a value of type string[]

We need to separate the interleaving keys and value types and declare storage for the values themselves:

struct Map(spec...) {
  alias Keys = Even!spec;
  alias Values = Odd!spec;
  Values values;

Even and Odd aren’t standard things, but we can quickly implement them on our own:

template Stride(size_t first, size_t stride, A...) {
  static if (A.length > first)
    alias Stride = AliasSeq!(A[first], Stride!(stride, stride, A[first .. $]));
  else
    alias Stride = AliasSeq!();
}

alias Odd(A...) = Stride!(1, 2, A);
alias Even(A...) = Stride!(0, 2, A);

Now, values is a built-in tuple consisting of elements of all types in the type list Values. It can be indexed using a constant integer, e.g. map.values[2]. It can also be iterated over using the “static foreach” construct we saw before. That means we’ve got iteration over keys or values for free. Try:

  // initialize via Struct Literal syntax
  auto map =
    Map!("foo", int, Bar, string, "한", string[])(
      42, "baz", ["lorem", "ipsum", "dolor"]);

  // iterate over keys
  foreach (K; map.Keys)
    writeln(K.stringof);

  // iterate over types of values
  foreach (V; map.Values)
    writeln(V.stringof);

  // iterate over values
  foreach (value; map.values)
    writeln(value);

Operators

Hana offers free function template contains to check whether a given key is in the map. In D, the in operator is usually used. It can be overloaded by implementing opBinaryRight. Since it only depends on compile-time information (types of its arguments), it can be declared static:

  static bool opBinaryRight(string op, Key...)(Entity!Key key)
    if (op == "in")
  {
    enum index = staticIndexOf!(Key, Keys);
    return index >= 0;
  }

Let’s see if it works:

  static assert(c!"foo" in map);
  static assert(c!Bar in map);
  static assert(c!"한" in map);
  static assert(c!42 !in map);

To look up a value using a compile-time key, we use staticIndexOf and add a meaningful message if the key is not found:

  private template IndexOf(alias Key) {
    enum IndexOf = staticIndexOf!(Key, Keys);
    static assert(IndexOf >= 0,
      "trying to access a nonexistent key: " ~ Key);
  }

This can be used to implement indexing operators. Why not a single one? Unlike in C++, where operator[] returns an lvalue reference which can be further manipulated using e.g. assignment operators, in D the operators for read-only indexing, indexing with simple assignment, and indexing with compound assignment are overloaded separately.

  auto opIndex(Key...)(Entity!Key key) const {
    return values[IndexOf!Key];
  }

  auto opIndexAssign(T, Key...)(auto ref T value, Entity!Key key) {
    return values[IndexOf!Key] = value;
  }

  auto opIndexOpAssign(string op, T, Key...)(auto ref T value, Entity!Key key) {
    return mixin(`values[IndexOf!Key] ` ~ op ~ `= value`);
  }

In all cases, T and Key are inferred from the types of the arguments. Evaluation of IndexOf!Key and indexing in values are both done at compile time. Let’s test it:

  // compile-time lookup, run-time assignment
  map[c!"foo"] = 42;        // opIndexAssign
  map[c!Bar] = "baz";
  map[c!"한"] ~= "lorem";    // opIndexOpAssign!"~"
  map[c!"한"] ~= "ipsum";
  map[c!"한"] ~= "dolor";

  // compile-time lookup, run-time comparison
  assert(map[c!"foo"] == 42);  // opIndex
  assert(map[c!Bar] == "baz");
  assert(map[c!"한"] == ["lorem", "ipsum", "dolor"]);

Performance

That’s all! We have a map that allows lookup and iteration in compile time with run-time-like syntax, just like hana::map, and it’s nothing special–just a bunch of your everyday trivial one-liners.

You may want to try modifying EventSystem to use Map that we’ve just implemented. If you’re impatient, you can find one possible implementation in the same archive. I couldn’t notice any measurable difference in compilation time between this version and the previous one, and the run-time performance is also very similar, as shown in the graph below.

Event system based on static Map in D vs. previous solutions

The question is whether this would be the right design choice in D. Operators [] and in are just syntactic sugar that could easily be replaced with “normal” template functions like get and contains, making the awkward c!arg unnecessary. Iteration over a list of compile-time entities or elements of value tuples is a built-in language feature, and doesn’t need it either.

Summing up

This isn’t the first attempt at emulating Hana in D. See for example this post on type objects, with an interesting example of quicksort on types. Hana’s tricks making metafunctions look like regular run-time code are applicable in D, but thanks to D’s built-in metaprogramming features they do not substantially improve code readability or programmer productivity. Until the new CTFE engine is complete and merged, they will probably only hurt build times without giving much in return.

The talk concludes with a few examples of what will be possible with features proposed for C++20 – named arguments, foreach over a type list, and serialization to JSON with reflection. You probably won’t be surprised that all of this have been possible in D for years, but with a cleaner syntax and less intellectual effort required, stripping metaprogramming of all the fun you could have doing it in C++.