Last post I gave an introduction on how I’m working with GraalVM to port Clojure things to Ruby. Now, it’s time to handle the hard parts.

The first thing is – some stuff simply doesn’t work. Pathom3 (the library I wanted to port) depends on Guardrails, which requires core.async, which starts a thread pool, which GraalVM doesn’t like. At all.

And the second thing is how to serialize some weird stuff like callbacks. So let’s start with the first part – how to make a “Pathom3 shared library” without GraalVM screaming at us.

As soon as I added the Pathom requires to my core.clj file, I got the error `Detected a started Thread in the image heap.` Graal is very friendly here: it has an option to trace where the thread was created, and with it I found that Guardrails was the one starting it. To fix this issue, and others, I basically added some exclusions and bumped Guardrails to see if that made the problem go away – and it did. I know I was lucky, but there may be other things one can try to fix the problem. For completeness, here is the list of exclusions and the dependencies I added to project.clj:

[com.wsscode/pathom3 "2023.08.22-alpha" :exclusions [org.clojure/core.async
                                                     com.fulcrologic/guardrails
                                                     com.cognitect/transit-cljs]]
[org.clojure/core.async "1.6.681"]
[com.fulcrologic/guardrails "1.2.5"]

Still, it’s a good idea to document failures so other people don’t try to replicate them. Things that I did that didn’t work:

  1. Use requiring-resolve – this crashes with “can’t find the class” at runtime
  2. Use dynamic require (basically, use (require 'some-thing) in the function I want to use some-thing) – this crashes with the same error above
  3. Any combination of the above, together with informing lein to compile the class I wanted to use – same errors
  4. Compile the CLJ code and then try to load it with import like a Java class – might work, but calling compiled Clojure from Java is not that easy.

So, with that out of the way, let’s try to serialize callbacks between Ruby, C, Java, and Clojure.

Callbacks

To be able to pass callbacks between the Ruby and Clojure code, we need to decide on an interface. We could use clojure.lang.IFn, but it’s not that easy to implement the interface, so I ended up using Java’s functional interfaces – basically, Clojure is going to expect a parameter of the class java.util.function.Function, and then on the Java side… well, that’s actually a little more complicated. Don’t ask me why, but to be able to receive functions, we need to define an interface that extends CFunctionPointer. Then, we need to annotate a method of that interface with @InvokeCFunctionPointer, and that is going to be our “callback”. So far so good, so here’s what I did:

First attempt – single param

So, because the idea is to always serialize things, the first try was to make an interface with an `invoke` function that receives a `CCharPointer` and returns a `CCharPointer`:

interface Callback extends CFunctionPointer {
  @InvokeCFunctionPointer
  CCharPointer invoke(CCharPointer param);
}

And now comes the complicated part: to register a resolver, the first idea was to pass the list of resolvers that already exist plus the params of the new resolver, and return all the resolvers with the new one appended. Unfortunately, at the time I could not find a way to make it work, and the solution felt quite janky too, so I decided to keep a global list of resolvers on the Clojure side, and `gen_resolver` would just append the new resolver to the global list. That works, and it’s really easy to implement on the Clojure side:

(defn- make-resolver [resolver-name params fun]
  (let [params (->> params
                    (map (fn [[k v]] [(keyword "com.wsscode.pathom3.connect.operation" (name k))
                                      v]))
                    (into {}))
        resolver-fun (fn [_env input]
                       (let [result (fun (to-string input))]
                         (from-string result)))]
    (pco/resolver (symbol resolver-name) params resolver-fun)))

(def ^:private resolvers (atom []))

(defn- -gen_resolver [^java.util.function.Function fun name params]
  (swap! resolvers
         conj
         (make-resolver name (from-string params) #(.apply fun %))))

Remember that we’re receiving a `java.util.function.Function`, so we need to call `.apply` on it. We’re also namespacing all keywords from the resolver config, because Ruby doesn’t have the concept of “qualified keywords”, so forcing the user to type the full namespace name would be quite weird.
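To make the key-namespacing step concrete, here is a rough Ruby rendering of what the params transformation in make-resolver does. It is only an illustration: `namespace_keys` and the sample config are hypothetical names, and on the Clojure side the result is a map of real qualified keywords, not strings.

```ruby
# Namespace that Pathom3 uses for resolver configuration keys
# (taken from the Clojure snippet above).
PCO_NAMESPACE = "com.wsscode.pathom3.connect.operation"

# Hypothetical helper: prefixes every key of a plain Ruby hash with the
# Pathom namespace, mirroring (keyword ns (name k)) on the Clojure side.
def namespace_keys(params)
  params.to_h { |key, value| ["#{PCO_NAMESPACE}/#{key}", value] }
end

# Sample resolver config as a Ruby user might type it (made up for this sketch).
config = { input: [:"user/id"], output: [:"user/name"] }
namespaced = namespace_keys(config)
namespaced.each { |key, value| puts "#{key} => #{value.inspect}" }
```

The Ruby user only ever types the short key, and the long namespace is added in one place.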

So, things are implemented on the Clojure side; now it’s time for the Java side:

  @CEntryPoint(name = "gen_resolver")
  public static void gen_resolver(IsolateThread thread, Callback callback, CCharPointer name, CCharPointer params) {
    Function<String, String> adaptedCallback = (argument) -> {
      CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(argument);
      CCharPointer value = holder.get();
      CCharPointer result = callback.invoke(value);
      String r = CTypeConversion.toJavaString(result);
      return r;
    };

    Object result = big_duck.core.gen_resolver(
      adaptedCallback,
      CTypeConversion.toJavaString(name),
      CTypeConversion.toJavaString(params)
    );
  }

Lots of conversions in `adaptedCallback`, but that’s fine – it’s just a way to turn our function, which has type `Callback`, into a Java `Function`, doing all the conversions from `CCharPointer` to `String` and back, etc, etc… and finally, we go to the Ruby side:

static VALUE example_gen_resolver(VALUE self, VALUE name, VALUE params, VALUE block) {

  const char* call_function(const char* str_from_clojure) {
    VALUE str_param = rb_str_new2(str_from_clojure);
    VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
    const char* result = StringValueCStr(ret);
    return result;
  };

  void* callback = &call_function;
  gen_resolver(
    get_thread(self),
    callback,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  return Qnil;
}

This is a little complicated if you’re not familiar with C, but basically – this is defining a Ruby function with 3 arguments (one of which is a block – I’m passing the block as a regular argument here, not with the special block syntax) and then I’m defining a “pointer function” – basically, an “inner function” called `call_function` whose only reason to exist is to adapt a Ruby block to a function that expects a `const char*` and returns a `const char*`. That’s all.

I won’t show the Ruby code, because, well… this crashes everything with the infamous error “Segmentation Fault”, or “segfault” for short. It’s a known error for people that work with C that basically means we’re trying to access memory that we’re not supposed to… but why?

How pointer functions are made

I honestly tested this approach, and it did work; except that it only works if we pass a function to the Java world and immediately call it. If we don’t call the function immediately, then it segfaults.

For people familiar with C, the answer is obvious. For people that are not, the reason is simple – we declared `call_function` inside `example_gen_resolver` (nested functions are a GCC extension, not standard C), which means that as soon as we leave the outer function, the pointer to the inner `call_function` and the `block` variable it uses are no longer valid, and calling that pointer later touches memory we don’t own anymore, causing the crash.

This is a problem especially in this world where we have two languages that manipulate memory automatically and one that doesn’t (C). Still, we can solve this by moving the inner function to the outside… or can we?

Partial pointer functions

One thing to remember is that C is a low-level language. This means that most things we take for granted – Closures, partial functions, garbage collection, etc – don’t exist in C. Remember that call_function needs to receive a const char*, convert that to a Ruby String, and call the block. Now, how do I move that to the global scope and somehow make it “see” the block it’s supposed to call? Well, we actually only have a single option – to pass the block to the Java world, and then change the Callback interface to accept the block as an additional argument. The Java world doesn’t even need to know what the block even is – the argument is essentially opaque, because we’re only doing this to circumvent lots of limitations of the C language.
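A loose Ruby analogy of that trick, for readers who have never had to do partial application by hand. Nothing below touches C or GraalVM: `CALL_FUNCTION`, `FakeJavaSide`, and its `register` and `invoke` methods are hypothetical stand-ins. The point is only that the callback itself captures nothing, while the caller stores the context next to it and hands it back on every call.

```ruby
# Stand-in for the global C function: it captures no variables, so it has
# nothing that can go out of scope. Everything it needs arrives as arguments.
CALL_FUNCTION = lambda do |context, str_from_clojure|
  context.call(str_from_clojure)
end

# Hypothetical stand-in for the Java side. It never inspects `context`;
# it only stores it and passes it back when invoking the callback.
class FakeJavaSide
  def initialize
    @resolvers = {}
  end

  def register(name, callback, context)
    # "Partial application by hand": bind the opaque context to the callback.
    @resolvers[name] = ->(argument) { callback.call(context, argument) }
  end

  def invoke(name, argument)
    @resolvers.fetch(name).call(argument)
  end
end

java_side = FakeJavaSide.new
java_side.register("shout", CALL_FUNCTION, ->(s) { s.upcase })
java_side.register("flip", CALL_FUNCTION, ->(s) { s.reverse })

puts java_side.invoke("shout", "hello")
puts java_side.invoke("flip", "hello")
```

The same `CALL_FUNCTION` serves both resolvers. What differs is the context stored alongside it, which is the role the block plays in the real C code.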

The final code basically moves call_function outside of example_gen_resolver and makes it accept the block, like this:

const char* call_function(VALUE block, const char* str_from_clojure) {
  VALUE str_param = rb_str_new2(str_from_clojure);
  VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
  const char* result = StringValueCStr(ret);
  return result;
}

The only new thing here is the VALUE block; everything else is the same. Now, the Java part:

  @CEntryPoint(name = "gen_resolver")
  public static void gen_resolver(
      IsolateThread thread,
      Callback callback,
      // This is the block - it's opaque, Java has NO IDEA what this is,
      // as it'll only be used to send the info to the C function
      VoidPointer arg1,
      CCharPointer name,
      CCharPointer params
    ) {
    Function<String, String> adaptedCallback = (argument) -> {
      // The holder owns the C copy of the string; try-with-resources
      // releases it once the callback has returned
      try (CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(argument)) {
        CCharPointer result = callback.invoke(arg1, holder.get());
        return CTypeConversion.toJavaString(result);
      }
    };

    Object result = big_duck.core.gen_resolver(
      adaptedCallback,
      CTypeConversion.toJavaString(name),
      CTypeConversion.toJavaString(params)
    );
  }

  interface Callback extends CFunctionPointer {
    @InvokeCFunctionPointer
                        // We changed this signature
                        // to accept this "VoidPointer"
    CCharPointer invoke(VoidPointer arg1, CCharPointer output);
  }

With this, our “function pointer” is outside the gen_resolver code, so we won’t need to worry about it being out of scope and crashing the whole process. Of course, the gen_resolver function call is very weird if we call it directly from Ruby, so let’s wrap that into an easier version in Ruby:

  def gen_resolver(name, params, &b)
    new_proc = proc do |i|
      to_str b.call(from_str(i))
    end
    super(name, to_str(params), new_proc)
  end

This… sounds more complicated than it needs to be, right? But it’s actually not – the first thing we did was to generate new_proc because, remember, the resolver block will receive, from the C library, a String. But that’s not how we want to work – we want to receive arbitrary Clojure objects (that will become Ruby ones) and then return arbitrary Ruby objects (that need to be serialized to String). So new_proc will do this dance for us – it’ll deserialize the i parameter, that is a string, call the original block with the deserialized object, get the result that will be an arbitrary Ruby object and re-serialize that to string. Then we call super passing the resolver name, the params of the resolver, and the new_proc to this C code:

static VALUE example_gen_resolver(VALUE self, VALUE name, VALUE params, VALUE block) {
  void* callback = &call_function;
  gen_resolver(
    get_thread(self),
    callback,
    block,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  return Qnil;
}

The only difference with this version is that we don’t define call_function inside this function, and we pass the block as a parameter to the C library.
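The serialize/deserialize dance that new_proc performs can be tried in isolation. This is a minimal sketch that assumes JSON as the wire format. The post does not say what to_str and from_str actually do, so `serialize` and `deserialize` below are placeholders for them, and `wrap_block` is a hypothetical helper.

```ruby
require "json"

# Placeholder serializers standing in for to_str / from_str.
# JSON is an assumption made for this sketch only.
def serialize(object)
  JSON.generate(object)
end

def deserialize(string)
  JSON.parse(string)
end

# Hypothetical helper: turns a block that works with Ruby objects into a proc
# that takes a String and returns a String, which is all the C layer can carry.
def wrap_block(&block)
  proc { |serialized_input| serialize(block.call(deserialize(serialized_input))) }
end

# The user-facing block only ever sees and returns plain Ruby objects.
string_proc = wrap_block do |input|
  { "user/name" => "User #{input["user/id"]}" }
end

reply = string_proc.call('{"user/id": 42}')
puts reply
```

Keeping the conversion in one wrapper means the block the user writes never needs to know that strings are crossing the language boundary.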

Meme - now everything works, right? ... Right?

IRB Failures

So this code works, but… only if I don’t try to run it with IRB. Or pry. So… what was going on?

The reason, unsurprisingly, is the garbage collector. Let’s look again at the Ruby code:

  def gen_resolver(name, params, &b)
    new_proc = proc ...
    super(name, to_str(params), new_proc)
  end

What happens when gen_resolver returns? Where are we holding a reference to b, or even to new_proc for that matter? The answer, of course, is “nowhere”. Because of that, Ruby happily garbage collects the new_proc proc and the b block. If we’re lucky, the memory gets reused by another object like a String, and we’ll get an error like `undefined method 'call' for an instance of String`. If we’re not, it’ll segfault. Again. Please, don’t ask me how I found out about that – it’s a whole weird story about trying to run Qt UIs with Ruby… anyway, both cases are possible – it is possible to get a weird error like the first, or to crash the whole interpreter.

So the obvious solution is to keep a reference to the proc. Because Pathom doesn’t allow two resolvers with the same name, this is an opportunity to also check for that:

  def gen_resolver(name, params, &b)
    name = name.to_s
    if @blocks[name]
      raise ArgumentError, "Resolver #{name} is already declared"
    end

    new_proc = proc do |i|
      to_str b.call(from_str(i))
    end
    @blocks[name] = new_proc
    super(name, to_str(params), new_proc)
  end

So this fixes the issue, because now we have an instance variable @blocks keeping the reference for every block we use.
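Here is a self-contained way to see the effect of holding the reference, without any native code. `ResolverRegistry` is a hypothetical stand-in for whatever object owns @blocks, and `WeakRef` is used only to observe whether the proc is still alive after a GC run. The sketch makes no claim about when an unreferenced proc actually gets collected, since that depends on the Ruby implementation. It only shows that a registered one survives.

```ruby
require "weakref"

# Hypothetical registry: the Hash is what keeps every proc reachable, so the
# garbage collector cannot reclaim it while native code may still call it.
class ResolverRegistry
  def initialize
    @blocks = {}
  end

  def register(name, &block)
    name = name.to_s
    raise ArgumentError, "Resolver #{name} is already declared" if @blocks.key?(name)

    @blocks[name] = block
    WeakRef.new(block) # returned only so the caller can observe liveness
  end

  def call(name, input)
    @blocks.fetch(name).call(input)
  end
end

registry = ResolverRegistry.new
observer = registry.register(:greet) { |who| "hello, #{who}" }

GC.start
puts observer.weakref_alive? # the Hash still references the proc
puts registry.call("greet", "ruby")

begin
  registry.register("greet") { |who| who }
rescue ArgumentError => e
  puts e.message
end
```

Normalizing the name with `to_s` before the lookup means a Symbol and a String with the same text count as the same resolver, which matches the `name = name.to_s` line in the version above.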

The next steps are – wrap two more functions, and solve some issues with LD_LIBRARY_PATH. Because wrapping more functions won’t add anything to this post, let’s move to the last step: wrapping up.

Bundling the shared library

So… the last problem is – GraalVM generated a shared library and we want to dynamically link it. On Windows, this is easy – just put the .dll in the same directory you’re running your code from. On Linux… well, not that easy because we need to put it in a library path, and that usually requires root access. But there’s a way to fix that: use the -Wl,-rpath,<dir> option while compiling things. This also means we need to compile the lib when the user installs the gem, which is something I wanted to avoid, but for now, it’s ok.

So… here’s how to do it: we move the .c file, the shared library generated by GraalVM, and everything else, into a directory called ext. Then we make our extconf.rb like this:

require 'mkmf'

dir_config('<name-of-the-graal-shared-lib>', ['.'])
have_library('<name-of-the-graal-shared-lib>')

with_ldflags("-Wl,-rpath,#{__dir__}") do
  create_makefile('<name-of-the-ruby-native-library>')
end

Here are some considerations – when we compile the code with GraalVM, we must name the library libSomething – this “lib” prefix is mandatory. But inside our extconf.rb, we don’t use the prefix. So if we name our library libPatao we’ll use dir_config('patao', ['.']) and have_library('patao'). It’s weird, I know…

The with_ldflags call basically asks the Ruby native code to search for the shared library from GraalVM in the “current directory” – in this case, the ext directory. Now, we need to configure our gemspec file:

Gem::Specification.new do |s|
  s.name        = "patao"
  # ...
  s.extensions << 'ext/extconf.rb'
  s.files       = [
    "ext/libPatao.so",
    "ext/libPatao.h", "ext/graal_isolate.h",
    "ext/patao.c",
    "ext/extconf.rb",
    "lib/patao.rb"
  ]
end

I decided to keep the file list intact to show that, yes, you need to include every artifact that GraalVM produced for you, plus the extconf.rb and the Ruby C code, in the list of files – otherwise we won’t be able to compile the code on the user’s machine when the user installs the gem.

And… that’s all. For now, at least – I still need to configure a CI machine to build Windows, Mac and Linux binaries for this lib, then generate the gem with the .so, .dylib, and .dll so that it can be installed on any system. In some next step I might test a different serialization strategy, run performance tests, and everything, but for now… surprisingly, it works!