Since I started working with Ruby, in some situations I miss some of the Clojure libraries – especially killer ones like Pathom.

Unfortunately, rewriting the whole library in a different language is completely out of the question: I would reintroduce some of the bugs that are already fixed, and I would be missing some of the more important features, like parallel processing and other stuff.

But it’s not an “end of the world” situation. GraalVM has a very important feature: it can compile Java code to native, using what is usually called “native-image” or, to be more precise, SubstrateVM. I also knew that you could generate shared libraries with this approach, but I had never gotten anywhere with it: every time I tried, the only result was lots of frustration and weird errors.

So that’s why I’m very proud and very surprised to announce that I have a Pathom working in Ruby right now.

But more on that later – it’s probably not yet ready to be used in production. What I want to share is how I actually made it work, the challenges I faced (and how to handle them), and how other people can do the same. Honestly, the whole thing was awfully hard, not because it’s inherently hard, but because the documentation is lacking so much that, at times, I even thought the things I did were not possible.

So first things first: you must be aware that I will be working with four different languages in this project. The code that I wanted to wrap is written in Clojure, but that doesn’t translate well to GraalVM, so I wrote a wrapper in Java. Compiling to a shared library generated a C header (and a .so file) that I had to somehow translate to Ruby, using the native gem approach. And finally, the C version of the Ruby library is not very friendly to use from Ruby code, so I made a wrapper in Ruby that is the “public API” for Pathom, which I gently called “Patao” for now. People who know Portuguese probably got the joke.

To start, I found that it’s actually pretty hard to serialize stuff between C and Java (which is basically the “bridge” between Clojure and Ruby – the full path is Clojure -> Java -> C -> Ruby). So I did the second best thing: I used the Transit library for Ruby and Clojure, and I’m basically serializing everything when I pass arguments between languages. This greatly simplifies things, but may be a performance hit (one that I’m willing to take – my approach in every weird experiment I do is to make it work, make it right, and then make it fast). Because I don’t need to “stream” anything, I made a to-str and a from-str in Ruby and Clojure, and I assumed that every parameter and result, with some exceptions, is a string.
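The "everything that crosses the boundary is a string" idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: the standard-library JSON module stands in for Transit so the snippet runs without extra gems, and `fake_native`, `encode`, `decode` and `call_across` are hypothetical names, with the lambda playing the role of the compiled library.

```ruby
require "json"

# Stand-ins for the Transit-based to-str / from-str helpers.
# JSON is used here only so the sketch runs with the standard library.
def encode(obj)
  JSON.generate(obj)
end

def decode(str)
  JSON.parse(str)
end

# Every call follows the same shape: encode, cross the boundary, decode.
# `native` is anything that takes a String and returns a String.
def call_across(native, value)
  decode(native.call(encode(value)))
end

# Hypothetical stand-in for the native library. It simply echoes the
# payload back, which is enough to exercise the round trip.
fake_native = ->(payload) { payload }

data = { "name" => "duck", "sizes" => [1, 2.5], "alive" => true, "owner" => nil }
puts call_across(fake_native, data) == data

# Plain JSON cannot tell a Ruby symbol from a string, so a symbol
# comes back as a string after the round trip.
p call_across(fake_native, [:keyword])
```

Plain JSON loses the distinction between symbols and strings, which is one reason to prefer a format like Transit here: it has a dedicated keyword type, so Clojure keywords have something to map to on the Ruby side.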

So, to begin everything, I did a proof-of-concept task: to receive a Ruby object, easily convert it to Clojure, and send it back:

(ns big-duck.core
  (:require [cognitect.transit :as transit])
  (:import [java.io ByteArrayInputStream ByteArrayOutputStream]))

(gen-class
 :name "big_duck.core"
 :methods [^:static [receive [String] String]])

(defn- from-string [^java.lang.String string]
  (let [in (ByteArrayInputStream. (.getBytes string))
        reader (transit/reader in :json)]
    (transit/read reader)))

(defn- ^java.lang.String to-string [obj]
  (let [out (ByteArrayOutputStream.)
        writer (transit/writer out :json)]
    (transit/write writer obj)
    (.toString out)))

(defn -receive [x]
  (to-string (println "hello, world!" (from-string x))))

And then I wrote some Java code to offer a better API for GraalVM:

package big_duck;

import org.graalvm.nativeimage.IsolateThread;
import org.graalvm.nativeimage.c.function.*;
import org.graalvm.nativeimage.c.type.*;
import org.graalvm.word.*;

import java.util.function.Function;

public final class LibPatao {
  @CEntryPoint(name = "receive")
  public static CCharPointer receive(IsolateThread thread, CCharPointer s) {
    String expr = CTypeConversion.toJavaString(s);
    String result = big_duck.core.receive(expr);
    CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(result);
    CCharPointer value = holder.get();
    return value;
  }
}

Amazing, right? And then I found my first problem.

Lein, and compilation steps

I decided to use Leiningen to manage the project, because… well, because it works out of the box. Don’t repeat yourself. Seriously. Anyway, Lein thinks the steps to make an uberjar are to compile the Java sources, then compile Clojure, and then pack everything. Which is true for 99% of projects, but not for this one: here the Java wrapper calls a class that only exists after the Clojure code is compiled. So what I did was to create a profile called :java:

(defproject big-duck "0.1.0-SNAPSHOT"
  ; ...
  :repl-options {:init-ns user} ; Speed up REPL loading
  :profiles {:uberjar {:aot :all
                       :jvm-opts ["-Dclojure.spec.skip-macros=true"
                                  "-Dclojure.compiler.direct-linking=true"]}
             :java {:java-source-paths ["src"]}})

And then compilation happens in two steps: generate the uberjar, then call lein with-profile +java javac. This generates the Java class inside target/classes, but that’s fine because we don’t actually need the .class file to be inside the uberjar.

Then, it’s time to compile things with GraalVM, with this weirdly large command:

$JAVA_HOME/bin/native-image  \
  --shared \
  --no-fallback \
  -jar target/big-duck-0.1.0-SNAPSHOT-standalone.jar \
  -cp target/classes/ \
  -H:+ReportExceptionStackTraces \
  -H:Name=libexample \
  -J-Dclojure.spec.skip-macros=true \
  -J-Dclojure.compiler.direct-linking=true \
  -H:ReflectionConfigurationFiles=reflection.json \
  --features=clj_easy.graal_build_time.InitClojureClasses

Notice the clj_easy.graal_build_time – I had to add this library to my classpath, and I did that by adding it as a dependency of the project.

This generated a .so, a bunch of .h files, etc. I moved all of them to an ext folder.
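The order of the three build steps matters, so it can help to write them down as a small script. The following is a minimal sketch, not the build setup actually used in this project: `build_steps` and `run_build` are hypothetical helper names, while the commands and flags are the ones shown above.

```ruby
# Build steps taken from the post, in the order they must run.
# java_home is passed in explicitly so nothing is read from the
# environment until the caller decides to.
def build_steps(java_home:)
  [
    # 1. AOT-compile the Clojure code and package the uberjar.
    %w[lein uberjar],
    # 2. Only then compile the Java wrapper, which refers to the
    #    class generated in step 1.
    %w[lein with-profile +java javac],
    # 3. Produce the shared library and the C headers.
    [
      File.join(java_home, "bin", "native-image"),
      "--shared",
      "--no-fallback",
      "-jar", "target/big-duck-0.1.0-SNAPSHOT-standalone.jar",
      "-cp", "target/classes/",
      "-H:+ReportExceptionStackTraces",
      "-H:Name=libexample",
      "-J-Dclojure.spec.skip-macros=true",
      "-J-Dclojure.compiler.direct-linking=true",
      "-H:ReflectionConfigurationFiles=reflection.json",
      "--features=clj_easy.graal_build_time.InitClojureClasses"
    ]
  ]
end

# Runs each step in order and stops at the first failure. The runner is
# injectable so the sequencing can be exercised without lein or GraalVM.
def run_build(steps, runner: ->(argv) { system(*argv) })
  steps.each do |argv|
    ok = runner.call(argv)
    raise "build step failed: #{argv.join(' ')}" unless ok
  end
  true
end

# Example invocation (not executed here):
#   run_build(build_steps(java_home: ENV.fetch("JAVA_HOME")))
```

Passing each command as an argument list rather than a single string avoids shell quoting issues, and raising on the first failed step prevents native-image from running against a stale jar.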

The file

This is the header file generated by GraalVM:

#ifndef __LIBEXAMPLE_H
#define __LIBEXAMPLE_H

#include <graal_isolate.h>


#if defined(__cplusplus)
extern "C" {
#endif

int run_main(int argc, char** argv);

char* receive(graal_isolatethread_t*, char*);

#if defined(__cplusplus)
}
#endif
#endif

Every function needs to receive this graal_isolatethread_t* parameter, which in my case was fine. It’s hard to control objects that live in Java-land and need to become C pointers, and to somehow make them survive Java’s garbage collector while also being cleaned up properly in Ruby-land. So I decided to make everything global in Clojure-land and create a new graal_isolatethread_t for every Pathom object, which is fine because usually we don’t have too many Pathom objects anyway (I HOPE!). Still, let’s continue the whole process.

Ruby gem, finally

With all of that, we need to define a Ruby class in the C extension. What I did was to make a C struct to hold the GraalVM isolate, allocate it on class instantiation, and free it when the object is destroyed:

#include <ruby.h>
#include <libexample.h>

typedef struct {
    graal_isolate_t *isolate;
    graal_isolatethread_t *thread;
} RubyClassData;

void free_struct(RubyClassData* self) {
  graal_tear_down_isolate(self->thread);
  self->isolate = NULL;
  self->thread = NULL;
  free(self);
  self = NULL;
}

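static VALUE ruby_class_allocate(VALUE klass) {
    RubyClassData *data = malloc(sizeof(RubyClassData));
    if (data == NULL) {
      rb_raise(rb_eNoMemError, "could not allocate RubyClassData");
    }
    if (graal_create_isolate(NULL, &data->isolate, &data->thread) != 0) {
      /* An alloc function must return a Ruby object, so signal failure by raising. */
      free(data);
      rb_raise(rb_eRuntimeError, "initialization error");
    }
    return Data_Wrap_Struct(klass, NULL, free_struct, data);
}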

static graal_isolatethread_t* get_thread(VALUE self) {
    RubyClassData *data;
    Data_Get_Struct(self, RubyClassData, data);
    return data->thread;
}

static VALUE example_receive(VALUE self, VALUE argument) {
  const char* result = receive(
    get_thread(self),
    StringValueCStr(argument)
  );
  return rb_str_new2(result);
}

void Init_example() {
  VALUE ExampleClass = rb_define_class("Example", rb_cObject);
  rb_define_alloc_func(ExampleClass, ruby_class_allocate);
  rb_define_method(ExampleClass, "receive", example_receive, 1);
}

This is quite a lot to handle, so let’s start with the basics: in the C API for writing gems, every Ruby value has the type VALUE. We have some tools to convert that to and from C stuff, and we’ll basically be using StringValueCStr, which converts a VALUE to a char*, and rb_str_new2, which converts a char* to a VALUE.

To create a class in Ruby-land we use the line VALUE ExampleClass = rb_define_class("Example", rb_cObject); The first argument is the class name as a char*, and the second is the “superclass”, which in this case is just Object. Then we define an “allocation function”, which will allocate the C struct for our code; that is registered with rb_define_alloc_func(ExampleClass, ruby_class_allocate);. In that function, we do a malloc (the C way to allocate memory) to allocate the structure with the Graal stuff, then we run graal_create_isolate, which basically creates the “Isolate” and the “Thread” that we need to pass to every function that we want to call in Java-land. We also define a free_struct, which, unsurprisingly, frees memory.

Finally, we define the method receive inside the class with rb_define_method, saying that the implementation of that method is the function example_receive and that it accepts one argument. The implementation is straightforward and, again, we only handle strings. So what now?

From string to complex objects

Remember how we implemented receive in Clojure-land:

(defn -receive [x]
  (to-string (println "hello, world!" (from-string x))))

We are going to do the same in Ruby-land, but NOT in C – because, honestly, why? So we’re going to create a Ruby class in Ruby itself:

require "transit"
require_relative './example'

class Patao < Example
  def receive(param)
    from_str super(to_str(param))
  end

  def to_str(obj)
    io = StringIO.new('', 'w+')
    writer = Transit::Writer.new(:json, io)
    writer.write(obj)
    io.string.tap { io.close }
  end

  def from_str(str)
    reader = Transit::Reader.new(:json, StringIO.new(str))
    reader.read
  end
end

And that’s all. We take whatever parameter we received, serialize it with Transit, and send it to Java-land, which passes it to Clojure-land, where it gets deserialized. There we print “hello, world!” followed by the actual parameter we sent. The result (which in this case will always be nil, because that is what println returns) is serialized back to Transit, then sent to the C API and back to Ruby, which calls from_str on it and gets nil. I tested with multiple situations – HashMaps, Arrays, Keywords, Strings, Numbers, true/false/nil – and they all serialize correctly, which is honestly AMAZING.

But… of course, that’s not all. I wanted to wrap Pathom, and in that library, I have “resolvers” – basically, I define a data structure that holds a callback and that is called when the resolver is needed. Which means… I had to serialize functions… from Ruby, to C, then to Java, then to Ruby….

…. which opens a whole can of worms that will become a different post 🙂