Natively Implemented Functions in Erlang

Companion code for this post available on Github

Erlang is a surprisingly performant language for something that provides so many high level primitives, but occasionally there comes a time when you need to perform a complex calculation fast, and the immutability of Erlang data or the slight overhead of the BEAM becomes an annoyance. As an example, I’ve recently been working on a project that requires calculating the edit distance between streaming pairs of data in real time. Erlang is an excellent fit for streaming data from the network to many parallel workers, but the Levenshtein distance calculation is somewhat taxing. As a first pass, I adapted a memoized distance calculation using plain Erlang:

erlang_leven(<<Bin/binary>>, <<Bin2/binary>>) ->
    {Ed, _Cache} = erlang_leven(Bin, Bin2, dict:new()),
    Ed.

erlang_leven(<<>>, <<Bin/binary>>, Cache) ->
    {byte_size(Bin), dict:store({<<>>, Bin}, byte_size(Bin), Cache)};
erlang_leven(<<Bin/binary>>, <<"">>, Cache) ->
    {byte_size(Bin), dict:store({Bin, <<>>}, byte_size(Bin), Cache)};
erlang_leven(<<B:8, B1/binary>>, <<B:8, B2/binary>>, Cache) ->
    erlang_leven(B1, B2, Cache);
erlang_leven(Bin1 = <<_:8, B1/binary>>, Bin2 = <<_:8, B2/binary>>, Cache) ->
    case dict:is_key({Bin1, Bin2}, Cache) of
        true -> {dict:fetch({Bin1, Bin2}, Cache), Cache};
        false ->
            {L1, C1} = erlang_leven(Bin1, B2, Cache),
            {L2, C2} = erlang_leven(B1, Bin2, C1),
            {L3, C3} = erlang_leven(B1, B2, C2),
            L = 1 + lists:min([L1, L2, L3]),
            {L, dict:store({Bin1, Bin2}, L, C3)}
    end.

If we create a simple test function, we can see how this performs with a midsize input:

perftest(Iterations, Method) ->
    Start = os:system_time(),
    method_loop(Iterations, Method),
    Diff = os:system_time() - Start,
    (Iterations / Diff) * 1000000000.

method_loop(0, _) -> ok;
method_loop(I, Method) ->
    Method(
        <<"7ab02d24-2d67-11e8-835d-0b1d27744c6d">>,
        <<"80393ca4-2d67-11e8-8f7f-c7a8488e4904">>
    ),
    method_loop(I - 1, Method).
8> c('src/perftest').
{ok,perftest}
9> perftest:perftest(1000, fun perftest:erlang_leven/2).
211.50049535298368

Slightly over 200 iterations per second - not stellar. Even with parallelism, this is going to struggle with even a moderate amount of real-time data. So it’s time to turn to NIFs.

Natively Implemented Functions

Erlang, like many languages, supports a foreign function interface (FFI) allowing simple integration of code written in C/C++ into the runtime. In order to add some native code to a project, let’s first create a simple library using a rebar template.

ross@mjolnir:/h/r/P/Erlang$ rebar3 new lib levenshtein
===> Writing levenshtein/src/levenshtein.erl
===> Writing levenshtein/src/levenshtein.app.src
===> Writing levenshtein/rebar.config
===> Writing levenshtein/.gitignore
===> Writing levenshtein/LICENSE
===> Writing levenshtein/README.md

Now, we need to update the project to reflect the fact that we’ll be including native code. Rebar has an invocation for this as well:

ross@mjolnir:/h/r/P/Erlang$ cd levenshtein
ross@mjolnir:/h/r/P/E/levenshtein$ rebar3 new cmake
===> Writing c_src/Makefile

We won’t touch this Makefile since the defaults work well enough, but if one wanted to this is where they’d specify any extra compiler/linker flags they may need. What we will need to do is start writing our C code though. So let’s create our files (c_src/levenshtein.{c,h}) and add some of the basic glue code.

// The first thing we'll want to do in our header (except for include guards)
// is include the methods and structures we'll need to interact with the Erlang
// runtime.
#include <erl_nif.h>

// The second thing will be to define the module callbacks - when a native
// module is registered, it may specify three optional callbacks that the
// system can invoke at different parts of the code lifecycle. These methods
// are not required, but if they are not specified then the ERTS will not allow
// the module to be upgraded in-place.
// Here, we do not keep any state in the module, so these methods will be
// no-ops.
int load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info);
int upgrade(ErlNifEnv* env, void** priv_data, void** old_priv_data,
            ERL_NIF_TERM load_info);
void unload(ErlNifEnv* env, void* priv_data);

// Now, what we came here for: our native method.
// All native methods have the same signature:
// - A pointer to the Erlang environment from which they have been called
// - The number of arguments they have been passed
// - The arguments themselves, all of which are of type ERL_NIF_TERM
// All methods must return a value, also of ERL_NIF_TERM.
// The actual argument verification must occur in the method itself.
static ERL_NIF_TERM erl_levenshtein(
    ErlNifEnv* env,
    int argc,
    const ERL_NIF_TERM argv[]
);

// Now, we define the exported methods for this native module, just as we
// would for a normal Erlang source file.
// In this case, the functions are a list of ErlNifFunc structs, each of which
// contains the name of the function, the number of arguments (arity),
// the function pointer and a flags field.
// This flags field can be left as zero for most cases, but can be used to
// indicate whether a NIF is dirty. For more info, see the Erlang docs on this:
// http://erlang.org/doc/man/erl_nif.html#dirty_nifs
static ErlNifFunc nif_funcs[] = {
    {"levenshtein", 2, erl_levenshtein, 0}
};

// Finally, we invoke the macro that seals the pact with the ERTS.
// This registers the module under the given name, with the export list and
// given callback methods.
ERL_NIF_INIT(levenshtein, nif_funcs, load, NULL, upgrade, unload);

So far so good, we have just about all the glue we need on the C side with only a handfule of lines. Now, all we have to do is implement the function itself. So let’s switch over to c_src/levenshtein.c and get started. For a full rundown of the Erlang C api, you can take a look a the man page for erl_nif.

static ERL_NIF_TERM erl_levenshtein(ErlNifEnv* env, int argc, const
                                    ERL_NIF_TERM argv[]) {
    // The very first thing we'll want to do in our function is check we've
    // been called with the correct arity.
    if (argc != 2) {
        // If we didn't get the two arguments we were expecting, we need to
        // raise a badarg error.
        // There is a provided function for precisely this:
        return enif_make_badarg(env);
    }

    // We now know that we have two arguments, but all arguments provided are
    // of type ERL_NIF_TERM, so we need to make sure that they are what we
    // expect - in this case, two binaries.
    if(!enif_is_binary(env, argv[0])
        || !enif_is_binary(env, argv[1])) {
        // If they aren't both binaries, we return an error tuple
        // {error, not_a_binary}. However, we need to possibly create those
        // atoms ourselves, as well as manually wrap them in a tuple.
        // The implementation of mk_atom can be seen below.
        return enif_make_tuple2(
            env, mk_atom(env, "error"), mk_atom(env, "not_a_binary"));
    }

    // At this point, we know we have both arguments and that both are
    // binaries. We can now safely query them as such, and load their data.
    // The ErlNifBinary struct contains only the length of the binary, and the
    // pointer to the bytes.
    ErlNifBinary binary1, binary2;
    enif_inspect_binary(env, argv[0], &binary1);
    enif_inspect_binary(env, argv[1], &binary2);

    // Now, we can call through to our actual C implementation of the
    // Levenshtein distance (omitted here).
    unsigned int edit_distance = levenshtein(
        binary1.data, binary1.size,
        binary2.data, binary2.size,
    )

    // We can't return the number directly though - we need to convert it to an
    // Erlang term first.
    return enif_make_int(env, editDistance);
}

// Convert a C string to an atom.
ERL_NIF_TERM mk_atom(ErlNifEnv* env, const char* atom) {
    ERL_NIF_TERM ret;
    // enif_make_existing_atom checks the atom table to see if we've already
    // created an atom with this string representation. If this atom already
    // exists, then it will return true and ret will contain the valid atom
    // data. Always try and fetch the atom first, and not make a new one each
    // time, or you risk exhausting the finite space allocated for the atom
    // table.
    if(!enif_make_existing_atom(env, atom, &ret, ERL_NIF_LATIN1)) {
        // If the atom was not already part of the atom table, then we can
        // safely create a new atom.
        return enif_make_atom(env, atom);
    }
    return ret;
}

With this, we’re now done with the C portion of our integration, and need to wire it up from the Erlang side. To do this, we’ll be taking advantage of the on_load directive, which specifies a method to be invoked when a module’s code is loaded by the ERTS. In this case, we want to load the .so file with our native code as soon as the module is added.

-module(levenshtein).
-export([levenshtein/2]).
-on_load(init/0).

% We need to include a dummy method, so that the compiler has something to work
% with. The implementation of it will be replaced when the native code is
% loaded, so it should just raise an error in the exceptional case that it
% actually gets invoked.
levenshtein(_, _) -> exit({not_loaded, [{module, ?MODULE}, {line, ?LINE}]}).

% Now, we define our code loading init method
init() ->
    % Find our priv dir, and the .so file with our module name
    SoName = case code:priv_dir(?MODULE) of
        {error, bad_name} -> exit({error, missing_priv_dir});
        Dir -> filename:join(Dir, ?MODULE)
    end,
    % Attempt to load the code.
    % The second parameter is passed to the module's load callback as
    % `load_info`, but we have no use for it here.
    ok = erlang:load_nif(SoName, 0).

With this in place, all that’s left is to open a shell and test it. Running rebar3 shell should successfully compile and link the C library and place it in the priv dir. Now, let’s see how much of a speed benefit we picked up from rewriting our algorithm natively:

1> c('src/levenshtein').
2> c('src/perftest').
3> perftest:perftest(1000000, fun levenshtein:levenshtein/2).
263647.94814025454

Three orders of magnitude faster! Much more like it. Hopefully this has shown that implementing something as a NIF is not only an excellent performance choice in some cases, it’s also not as daunting a task as you might think. If you want to take a full look at a natively implemented library, you can view the complete code for this project on Github.

Update: As a followup to this post, I have also written a little about how to make a NIF behave nicely with regards to the Erlang scheduler here.

Comments