Recently, I’ve been missing Pathom (from Clojure) in the Ruby world. So I decided to check if I could expose an arbitrary Clojure library to Ruby, using GraalVM’s “Substrate VM” (popularly known as native-image).

The way I tried to do this was to expose a shared library to C, then make a Ruby extension that consumes this shared library, and finally “normalize stuff” into Ruby. This… honestly, worked way better than I expected. Because I needed a lot of help from AI tools, guesswork, and reading some very weirdly-written Java code from Oracle (and auto-generated Javadocs), I decided to make a post explaining the whole process.

The whole experience started weird, then got easy, then VERY hard, and right now… I’m not sure where it will go, because I’m kind of changing the whole project’s core ideas (the reasons will be clear while you read this post). So let’s dive in.

Prepare the Clojure code

To expose things to a shared library, you need some “special” Java classes. I say “special” because, it seems, they break what you expect from Java: there, you have a “top-level” Object class that everybody inherits implicitly. The “GraalVM” classes don’t do that – they somehow are not Object, which is weird, so things break if you try to use them as such (the worst offender being trying to debug them with System.out.println – this causes a compilation error for some reason, even if you try to call .toString on them).

Now… the ugly part is that I could not find a way to compile Clojure to a GraalVM-compatible native image. For some reason, it doesn’t detect classes like CEntryPoint and CCharPointer. So I decided to add a new profile to my project.clj file (yeah, I use Lein; if you want to suffer and try to make this work with the CLI, be my guest) and then make a .clj file with all the functions I want to expose to Graal. These all need their own method signatures, and because we’re in Clojure-land, we need to be sure they only deal in “Java” objects (so no exposing clojure.lang.<Anything>, really. It might be possible, but honestly, it’s easier to just not do it, for reasons that, again, will be clearer later). So, for example, avoid returning things that are not “primitives”, and if you do need to return some Java-thing, try to return an interface that Java already supports (like java.util.function.Function). You also want to declare the signatures for your functions and then generate a Java class. This is a simple example from my code:

(ns big-duck.core
  (:require [cognitect.transit :as transit]
            [com.wsscode.pathom3.connect.indexes :as indexes]
            [com.wsscode.pathom3.interface.eql :as eql]
            [com.wsscode.pathom3.connect.operation :as pco])
  (:import [java.io ByteArrayInputStream ByteArrayOutputStream]))

(gen-class
 :name "big_duck.core"
 :methods [^:static [gen_resolver [java.util.function.Function String String] Object]
           ^:static [query [String] String]])

(defn- from-string [^java.lang.String string]
  (let [in (ByteArrayInputStream. (.getBytes string))
        reader (transit/reader in :json)]
    (transit/read reader)))

(defn- ^java.lang.String to-string [obj]
  (let [out (ByteArrayOutputStream.)
        writer (transit/writer out :json)]
    (transit/write writer obj)
    (.toString out)))

(def ^:private resolvers (atom []))

(def ^:private eql (atom nil))

(defn- -gen_eql []
  (let [env (-> @resolvers
                indexes/register)]
    (reset! eql
            (fn eql
              ([query] (eql {} query))
              ([seed query]
               (eql/process (-> env
                                (assoc :com.wsscode.pathom3.error/lenient-mode? true))
                            seed
                            query))))))

(defn- -gen_resolver [^java.util.function.Function fun name params]
  (swap! resolvers conj (-gen_static_resolver fun name params))
  (-gen_eql))

(defn- -query [str]
  (to-string (@eql (from-string str))))

There are… kind of a lot of things here, but ignore them for now. All you need to know is that gen_resolver will generate a Pathom resolver, save it in the global vector resolvers, and regenerate the EQL function for us, and that query will call that EQL function and return the result.

Now, for the more detailed parts: from-string and to-string basically convert between a Transit-serialized JSON string and Clojure data (from-string deserializes, to-string serializes). This is used to avoid having to implement a huge API to generate Clojure maps and vectors and objects from C and vice-versa. Is it a bad idea? Probably. But it works for now, so that’s all that matters in a proof-of-concept.
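To give an idea of what travels through these strings: as far as I understand the Transit format, a vector becomes a JSON array and a keyword becomes a string prefixed with ~:, so an EQL query like [:person/name :person/age] would be sent as ["~:person/name","~:person/age"]. Below is a minimal C sketch of building such a string on the caller's side. The helper name transit_keyword_vector and the attribute names are made up for illustration, and it does no escaping, so it assumes plain keyword names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of Transit or of the library in this post):
// encodes keyword names as a Transit-JSON vector of keywords.
// Returns a malloc'ed string that the caller must free, or NULL on failure.
char *transit_keyword_vector(const char *const *names, size_t count) {
    assert(names != NULL || count == 0);
    // Per element: "~: (3 bytes) + name + " (1 byte) + a comma (1 byte).
    // The extra 3 bytes cover '[', ']' and the terminating NUL.
    size_t size = 3;
    for (size_t i = 0; i < count; i++) {
        size += strlen(names[i]) + 5;
    }
    char *out = malloc(size);
    if (out == NULL) return NULL;

    char *cursor = out;
    *cursor++ = '[';
    for (size_t i = 0; i < count; i++) {
        // "~:" is the prefix that marks a string as a keyword
        cursor += sprintf(cursor, "%s\"~:%s\"", i == 0 ? "" : ",", names[i]);
    }
    *cursor++ = ']';
    *cursor = '\0';
    return out;
}
```

The returned string is the kind of value that would be handed to query, and the result that comes back would need the reverse treatment (parsing the Transit JSON) on the caller's side.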

Java part

To make this work, we need to expose the right classes to GraalVM. I made a Java class that simply calls these functions, but exposing the right types:

package big_duck;

import org.graalvm.nativeimage.*;
import org.graalvm.nativeimage.c.function.*;
import org.graalvm.nativeimage.c.type.*;
import org.graalvm.word.*;

import java.util.function.Function;

public final class LibPatao {
  @CEntryPoint(name = "gen_resolver")
  public static void gen_resolver(IsolateThread thread, Callback callback, VoidPointer arg1, CCharPointer name, CCharPointer params) {
    AdaptedCallback adaptedCallback = new AdaptedCallback(callback, arg1);

    Object result = big_duck.core.gen_resolver(
      adaptedCallback,
      CTypeConversion.toJavaString(name),
      CTypeConversion.toJavaString(params)
    );
  }

  @CEntryPoint(name = "query")
  public static CCharPointer query(IsolateThread thread, CCharPointer query) {
    String result = big_duck.core.query(CTypeConversion.toJavaString(query));
    CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(result);
    CCharPointer value = holder.get();
    return value;
  }

  // This shouldn't be needed. Unfortunately, GraalVM refuses
  // to compile if I don't put this here
  public static void main(String[] a) {
    System.out.println("OK, I guess " + a);
  }
}

The only issue is that I first need to compile the Clojure code, and only then the Java one. To do that, I made a Lein profile containing :java-source-paths, and then I run lein uberjar, followed by lein with-profile +java javac. Just remember that you need to run all this with a GraalVM JDK (otherwise classes like CCharPointer won’t exist) and that’s it.

Compiling

I’m not going to say that I fully understand the compilation process. I basically copy-pasted a reflection.json file from another project, and then compiled the code with the following command:

$JAVA_HOME/bin/native-image  \
  --shared \
  --no-fallback \
  -jar target/big-duck-0.1.0-SNAPSHOT-standalone.jar \
  -cp target/classes/ \
  -H:+ReportExceptionStackTraces \
  -H:Name=libpatao \
  -J-Dclojure.spec.skip-macros=true \
  -J-Dclojure.compiler.direct-linking=true \
  -H:ReflectionConfigurationFiles=reflection.json \
  --features=clj_easy.graal_build_time.InitClojureClasses

The -jar points to the uberjar generated by lein, and the -cp is there because we compiled a new Java source for the entry point. Also, --features names a class that comes from the Maven dependency com.github.clj-easy/graal-build-time. We need this dependency in our uberjar, and the easiest way is to just add it to the :dependencies in your project.clj file.

This will generate some files, all starting with libpatao, as configured by the -H:Name option.

The easy parts

If your code only returns primitive-ish things (Strings, Floats, Integers, etc) then things are easy. One example of that is the query method.

Query expects a CCharPointer and returns a CCharPointer. The name literally means “C Char Pointer”, which translates to char * in C. The first argument of any “native library” method is an IsolateThread, and I would be lying if I told you I fully understand what this is. What I do understand is that you can have multiple “Graal Isolates” running in your C code at the same time, and that’s why our global resolvers are not a problem (for now): they can “live” in different isolates. This means that if I want to expose them to a Ruby class, everything works fine as long as I create a new isolate when the object is created and destroy that isolate when the object is GC-ed.

You have these CTypeConversion things that convert a CCharPointer to a Java String (CTypeConversion.toJavaString in this case) and these “holders” that take a Java string, “hold” it as a C pointer, and then hand you that pointer as a CCharPointer to pass back to C code. Should I actually free the C char pointer, from C or from Java? Honestly, I have no idea. I tried to find out if I had to, and AFAIK I do, but all the code that I found online only got me a segmentation fault. So for now I’m assuming that I have memory leaks in my code, and I’ll just move on.
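One way to at least contain the ownership question on the C side is to copy the returned bytes right away into memory that the caller clearly owns, so the original pointer can later be released by whichever side turns out to be responsible for it (rb_str_new2 makes a copy of this kind when it builds the Ruby string). A minimal sketch in plain C follows. The name copy_result is made up for illustration, and the sketch does not settle who should free the original pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: duplicates a NUL-terminated string returned by a
// foreign library into memory obtained from malloc, so the caller owns
// the copy and can free() it regardless of how the original was allocated.
char *copy_result(const char *borrowed) {
    if (borrowed == NULL) return NULL;
    size_t len = strlen(borrowed);
    char *owned = malloc(len + 1);
    if (owned == NULL) return NULL;
    memcpy(owned, borrowed, len + 1); // len + 1 also copies the terminating NUL
    assert(owned[len] == '\0');
    return owned;
}
```

With a copy in hand, the rest of the C code only ever deals with memory it allocated itself, which keeps the suspected leak confined to a single place.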

Now for the Ruby part: you can actually define a Ruby class in C very easily, and then define what we call an “allocator function”. This has two different responsibilities: first, to define how to wrap some arbitrary C structure, and second, how to “free” that structure. If you’re not familiar with C (or C++), these are languages where you have to manually handle memory, which means that if my Java code somehow defines some structure that needs to be initialized (as it does!) then you’ll also need to manually “clean up” said structure. In our case, the class needs to wrap the structure containing the GraalVM isolate and thread, and then free these when the object is gone:

#include <ruby.h>
// This comes from GraalVM after I compile the library:
#include <libpatao.h>

// This is the structure I will wrap:
typedef struct {
    graal_isolate_t *isolate;
    graal_isolatethread_t *thread;
} RubyClassData;

// Ruby expects the free function to take a void pointer
static void free_struct(void *ptr) {
    RubyClassData *self = ptr;
    // free the Graal objects
    graal_tear_down_isolate(self->thread);
    // set everything to NULL, which is a good practice always
    self->isolate = NULL;
    self->thread = NULL;
    // Free the "wrap object"
    free(self);
}

static VALUE ruby_class_allocate(VALUE klass) {
    // This allocates memory for the struct above...
    RubyClassData *data = malloc(sizeof(RubyClassData));
    if (data == NULL) {
        rb_raise(rb_eNoMemError, "could not allocate RubyClassData");
    }
    // This creates the "isolate" and "thread":
    if (graal_create_isolate(NULL, &data->isolate, &data->thread) != 0) {
        // Returning 1 would hand Ruby a bogus VALUE, so raise instead
        free(data);
        rb_raise(rb_eRuntimeError, "initialization error");
    }
    // And this will wrap around the PataoImpl object
    // free_struct is called to free the structure
    return Data_Wrap_Struct(klass, NULL, free_struct, data);
}

void Init_patao_impl() {
    VALUE PataoImplClass = rb_define_class("PataoImpl", rb_cObject);
    rb_define_alloc_func(PataoImplClass, ruby_class_allocate);
}

So now we only need to wrap the “query” function in Ruby. The idea is to be able to call something like PataoImpl.new.query(some_serialized_string) and get some_serialized_result back as a string:

static graal_isolatethread_t* get_thread(VALUE self) {
    RubyClassData *data;
    // This will get the wrapped data in the object
    Data_Get_Struct(self, RubyClassData, data);
    // And simply get the thread
    return data->thread;
}

static VALUE patao_impl_query(VALUE self, VALUE q) {
    // Convert the Ruby string to a C one
    const char* q2 = StringValueCStr(q);
    // Get the result of a query
    const char* result = query(get_thread(self), q2);
    // Wrap it back in a Ruby string
    // (probably I have a leak here)
    return rb_str_new2(result);
}

void Init_patao_impl() {
    ...
    rb_define_method(PataoImplClass, "query", patao_impl_query, 1);
}

In the next post, I’ll show how I basically decided to change everything because I found out that I don’t actually want different isolates for each object…