Exposing Clojure to Ruby and other languages

In the first part of this series I showed how to expose some Java methods (that delegate to Clojure functions) in a shared library, so that the shared library could be used from Ruby via the C bindings. That first implementation cut a lot of corners: for example, we only transmitted strings that were serialized and de-serialized between languages, and we used GraalVM “isolates” to allow multiple “VM” instances to exist, so we could fake having multiple objects. In this post, I will fix some of these issues, and then show how to expose complex objects.

So, here’s a quick overview of the Clojure library I want to expose: the library allows you to define “resolvers”, which are functions. The difference between a resolver and a “normal function” is that the resolver always expects and returns a “Map” (a Hash in Ruby). The resolver also has some “metadata” that explains, at least, its dependencies: which keys the “input” of the resolver must have, and which keys the “output” will return.

The implications of this are huge, although they might not appear so. Instead of wiring things up manually (for example, querying a database, getting the result, sending it to some external service, getting the result of the service, doing something else), you just define the “dependencies” as a “resolver”: for example, “I need the user’s login and email for this external service”. If the hash doesn’t have those keys, a different resolver says “Query the database and return the user’s login and email”, and so on.

Here’s how I envisioned the example above, in Ruby:

user_data = Resolver.new(inputs: [:user__id], outputs: [:user__login, :user__email]) do |inputs|
  # Inputs will ALWAYS be a Hash with {:user__id <something>} on it
  user = User.select(:login, :email).find(inputs[:user__id])
  { user__login: user.login, user__email: user.email }
end

external_data = Resolver.new(inputs: [:user__login, :user__email], outputs: [:oauth__token]) do |user|
  # Again, user will always be a Hash with {user__login: <login>, user__email: <email>}
  result = RestService.call(user[:user__login], user[:user__email])
  { oauth__token: result.body[:token] }
end
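To make the dependency idea concrete, here is a toy sketch in plain Ruby. It is not the library's algorithm, and `ToyResolver` and `toy_resolve` are hypothetical names invented for this illustration: it simply keeps running whichever resolver already has its inputs available until the wanted key shows up.

```ruby
# Hypothetical stand-in for the real library: a resolver is a block plus the
# keys it needs (inputs) and the keys it provides (outputs).
ToyResolver = Struct.new(:inputs, :outputs, :block)

# Run any resolver whose inputs are already available, until the wanted key
# is present or no remaining resolver can run.
def toy_resolve(resolvers, known, wanted)
  data = known.dup
  pending = resolvers.dup
  until data.key?(wanted)
    ready = pending.find { |r| r.inputs.all? { |k| data.key?(k) } }
    break if ready.nil?
    pending.delete(ready)
    # Each resolver only sees the keys it declared as inputs
    data.merge!(ready.block.call(data.slice(*ready.inputs)))
  end
  data[wanted]
end

user_data = ToyResolver.new([:user__id], [:user__login, :user__email], lambda { |inputs|
  # A real resolver would query the database here
  { user__login: "user#{inputs[:user__id]}", user__email: "user#{inputs[:user__id]}@example.com" }
})

external_data = ToyResolver.new([:user__login, :user__email], [:oauth__token], lambda { |user|
  # A real resolver would call the external service here
  { oauth__token: "token-for-#{user[:user__login]}" }
})

# The order of the resolvers does not matter: only their declared keys do.
token = toy_resolve([external_data, user_data], { user__id: 42 }, :oauth__token)
```

Nothing in the calling code says "query the user first, then call the service": that order falls out of the declared inputs and outputs, which is the whole point of resolvers.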

Here’s a problem: in the last post, we were only serializing data: hashes, arrays, keywords, strings, etc. But now… we need to take Ruby’s Proc objects, somehow expose them to the Java side (and then to the Clojure side) and finally, be able to call said blocks from Clojure (as if they were Clojure functions) and get the result. GraalVM offers CFunctionPointer, which is a way to call a C function from Java, and it’s honestly kinda weird to use, so here are the steps:

1. We first define a sub-interface of CFunctionPointer. We can’t send any arguments, but we can define as many inputs as we want.
2. We define a “callback” class. This “callback” will hold the CFunctionPointer sub-interface instance and the arguments we want to send to it, and it’ll be responsible for actually exposing our callback to Clojure somehow.
3. We serialize the input before sending it to the “callback” class, get the output back as a string, and de-serialize it on the receiving side.
4. We add some annotations so that everything works.

The Callback Dance

For the record, I can’t be bothered to remember all the correct packages for everything GraalVM-related, so I’m assuming we’re importing everything at the top of the file – so we’ll import:

import java.util.function.Function;
import org.graalvm.nativeimage.*;
import org.graalvm.nativeimage.c.function.*;
import org.graalvm.nativeimage.c.type.*;
import org.graalvm.word.*;

So, here are the things we need to be aware of: in Ruby, a “Callback” is a “Proc” object. Because Ruby is a dynamic language, in the C API a Proc appears as a VALUE type – exactly the same as a number, string, class, etc – everything is the same VALUE type.

We can’t pass a VALUE to GraalVM and expect it to work as a “function pointer”. We actually need to pass a real C function. We might be tempted to do this:

void Init_patao_impl() {
  // ... lots of "Patao" implementation here
  // We define a class "ResolverImpl"
  ResolverImplClass = rb_define_class_under(PataoModule, "ResolverImpl", rb_cObject);
  // And here we define the "initialize" function that will accept the "name", "params", and a "block" object
  rb_define_method(ResolverImplClass, "initialize", resolver_class_initialize, 3);
}

// This should be BEFORE the code above, but for the sake of simplicity, let's
// simply put it here so the code appears AFTER the explanation :D

static VALUE resolver_class_initialize(VALUE self, VALUE name, VALUE params, VALUE block) {
  const char* call_function(const char* str_from_clojure) {
    // We wrap the string that comes from the Clojure/Java land...
    VALUE str_param = rb_str_new2(str_from_clojure);
    // ... call the block that we passed as a parameter by invoking
    // the method "call" on it, sending the param as an argument
    VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
    // ...and get back the result as a string so we can send it back
    // to GraalVM
    const char* result = StringValueCStr(ret);
    return result;
  };

  void* callback = &call_function;
  void* resolver = gen_resolver(
    global_thread,
    &call_function,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  // ... some wrapping here ...
  return Qnil;
}

But this doesn’t work – in fact, it crashes with a dreadful “segmentation fault” (hopefully – if it doesn’t, it might pose a HUGE security risk). The reason is very simple: a function defined inside another function is a GCC extension rather than standard C, and because C is not a garbage-collected language, nothing keeps what it captured alive. As soon as resolver_class_initialize (which is the enclosing scope of call_function) exits, call_function and the block it captured are no longer valid. Because we hold a pointer to this inner function and send it to Java, when Java calls that function later it will be pointing to unallocated data (at best), causing the “segmentation fault”, or to something else entirely (at worst), possibly opening up a huge exploit in your code.

So, what can we do? Well, we can define a function that never goes out of scope, and to do that, we need to create it at the top-level scope. But here’s another problem – as you can see, we have this variable block, and its value will obviously be different for every invocation of the resolver_class_initialize function. If we create a top-level function, we can’t capture the block anymore. So here’s a technique that lots of C programmers use to send arbitrary data to a function: instead of the top-level function receiving a single parameter (the str_from_clojure data), we define a function that receives both the str_from_clojure and some “arbitrary data” that can be basically anything: in this case, it will be the block from Ruby.

Then, instead of sending only the “function pointer” to the Java world, we’ll also send the “block”, keep the “block” on the Java side as an “opaque object”, and then, when we want to “call back” the function pointer, we’ll send both the “block” and the “string” that we want to pass as a parameter. In the C language, we usually use void* as an “anything goes” data type, so it’s convenient that on GraalVM, “opaque arbitrary C data” is called… VoidPointer.

So here are the steps:

Step 1: Define an interface that reflects the C function pointer.

Because we changed our C code to look like this:

// We need to add this "block" argument here...
const char* call_function(VALUE block, const char* str_from_clojure) {
  VALUE str_param = rb_str_new2(str_from_clojure);
  VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
  const char* result = StringValueCStr(ret);
  return result;
}

static VALUE resolver_class_initialize(VALUE self, VALUE name, VALUE params, VALUE block) {
  void* callback = &call_function;
  void* resolver = gen_resolver(
    global_thread,
    &call_function,
    // and then cast it to void* here to send to Java-land
    (void*) block,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  // ... some wrapping here ...
  return Qnil;
}

We’ll now need to define an interface in the GraalVM “java wrapper code” that reflects call_function. We will call this interface… Callback. Because it sounds right:

public interface Callback extends CFunctionPointer {
  // This annotates the "invoke" function as a "callable" in the
  // GraalVM native shared library
  @InvokeCFunctionPointer
  // The "VoidPointer" here is the block, or anything else we
  // might want to pass; the CCharPointer is the serialized input
  CCharPointer invoke(VoidPointer arg1, CCharPointer input);
}

Now here’s a problem: in Clojure, we expect to call a “Clojure function”, not this weird CFunctionPointer. We also need to “store” the VoidPointer in some object (so we’ll need a regular class with instance fields instead of static ones), store it in such a way that we can call it later, pass that object to Clojure, and make Clojure call that object instead. Luckily for us, Java already offers an interface that defines something “function”-ish – it’s called java.util.function.Function, so we’ll import that and implement it so that it’ll always have the right types, it’ll have a custom constructor, and instead of just “calling back” what was sent, we’ll “call back” our Callback with the VoidPointer parameter:

public class AdaptedCallback implements Function<String, String> {
  // We define the VoidPointer here
  private VoidPointer arg1;
  // and the callback, both coming from C world
  private Callback callback;

  public AdaptedCallback(Callback callback, VoidPointer arg1) {
    // Very simple constructor, only stores the parameters
    this.arg1 = arg1;
    this.callback = callback;
  }

  @Override
  public String apply(String argument) {
    // We need to convert the String to a "C string", like always...
    // The holder is AutoCloseable, so try-with-resources releases it
    // as soon as the call is done
    try (CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(argument)) {
      CCharPointer value = holder.get();
      // Then we invoke the callback with the block and value
      CCharPointer result = callback.invoke(arg1, value);
      // And we get back a "const char*" from C, that we convert back to Java
      return CTypeConversion.toJavaString(result);
    }
  }
}

Finally, we can add some “wrapper code” in our static Java class, which will expose a C function to our Ruby library:

public final class LibPatao {
  // ... lots of code here ...
  @CEntryPoint(name = "gen_resolver")
  // we now receive the Callback, which is the "function pointer" from C,
  // and the VoidPointer, which is the block from Ruby
  public static ObjectHandle gen_resolver(IsolateThread thread, Callback callback, VoidPointer arg1, CCharPointer name, CCharPointer params) {
    // then we define the "adaptedCallback"
    AdaptedCallback adaptedCallback = new AdaptedCallback(callback, arg1);

    // and finally, call our Clojure code
    Object resolver = big_duck.core.gen_resolver(
      adaptedCallback,
      CTypeConversion.toJavaString(name),
      CTypeConversion.toJavaString(params)
    );
    // This will be left for later...
    return ObjectHandles.getGlobal().create(resolver);
  }
}

And… finally… we generate the Resolver from the “Clojure” code:

(defn- make-resolver [resolver-name params fun]
  (let [params (->> params
                    (map (fn [[k v]] [(keyword "com.wsscode.pathom3.connect.operation" (name k))
                                      v]))
                    (into {}))
        resolver-fun (fn [_env input]
                       (let [result (fun (to-string input))]
                         (from-string result)))]
    (pco/resolver (symbol resolver-name) params resolver-fun)))

(defn- -gen_resolver [^java.util.function.Function fun name params]
  (make-resolver name (from-string params) #(.apply fun %)))

This might seem convoluted, and indeed it is, but it’s the only way we can serialize arbitrary data between Ruby and Clojure, considering that we also have to account for the limitations of C and Java.

Now: if we try to generate a resolver in Ruby, our API calls will be very, very, very convoluted – because we’ll need to serialize the config parameter and pass it to the shared library; then, in the block, our “inputs” parameter will be a serialized string that we’ll need to deserialize, do whatever we want to do, serialize the resulting Hash back to a string, and send that string as the result. So, to avoid this, we can create a class called Resolver that just inherits from ResolverImpl:

module Patao
  class Resolver < ResolverImpl
    def initialize(name, params, &b)
      name = name.to_s
      new_proc = proc { |i| to_str b.call(from_str(i)) }
      super(name, to_str(params), new_proc)
    end
  end
end

What this does is very simple: first, it allows the constructor to receive a block via the “block argument” (instead of needing to manually create a Proc), and then it “normalizes” said Proc with from_str and to_str (helper functions that deserialize and serialize the strings). When generating a resolver like this, one of three things can happen: it can work, it can crash with a “Segmentation Fault” error, or it can crash with an error like “Undefined method ‘call’ for an instance of String”, which is… weird…

The reason is that Ruby, contrary to C, is a garbage-collected language. What this means is that the block we created (with new_proc) will be sent to C, then to Java, and stored in Java… but there’s nothing in Ruby keeping a reference to new_proc, so it can be garbage-collected. The Ruby interpreter reuses a lot of memory, so the memory position of new_proc might be replaced with a string (causing the latter error) or it might simply be unallocated (causing the segfault). Neither case is what we want, but there’s an easy fix: keep the Proc in an instance variable, like @block = new_proc for example, and that will guarantee that the block will only be collected when this instance is also collected.
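As a minimal sketch of that fix, the wrapper class can hold on to the Proc itself. A few things here are assumptions made only so the snippet runs without the native extension: the `Wire` module and its JSON format are hypothetical stand-ins for the real to_str and from_str helpers, and the `ResolverImpl` below is a stub that, like the C-backed one, keeps no Ruby-visible reference to the block.

```ruby
require "json"

module Patao
  # Assumption: JSON stands in for whatever wire format the real helpers use.
  module Wire
    def self.to_str(value)
      JSON.generate(value)
    end

    def self.from_str(string)
      JSON.parse(string, symbolize_names: true)
    end
  end

  # Hypothetical stub for the C-backed class. It ignores its arguments, so
  # nothing on this side keeps the block alive.
  class ResolverImpl
    def initialize(_name, _params, _block); end
  end

  class Resolver < ResolverImpl
    def initialize(name, params, &b)
      # Holding the Proc in an instance variable keeps it reachable for as
      # long as this Resolver instance is alive.
      @block = proc { |i| Wire.to_str(b.call(Wire.from_str(i))) }
      super(name.to_s, Wire.to_str(params), @block)
    end
  end
end

resolver = Patao::Resolver.new(:user_data, { input: [:user__id], output: [:user__login] }) do |inputs|
  { user__login: "user#{inputs[:user__id]}" }
end
```

The stored Proc is the same object that is handed to the native side, so as long as the Resolver instance is referenced somewhere, the pointer held in Java keeps pointing at a live Proc.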

Now, with all these changes, we can create a resolver; we can serialize and deserialize stuff; we can query code (although it doesn’t do anything yet, because we didn’t add any code to define the resolvers for a specific query); and all of that using only one GraalVM isolate (we will need to remove the “free” and “allocate” functions we made in our last post), so we can reuse resolvers between different objects.

We also introduced a lot of memory leaks that might not be so obvious, and still have no way to send a “list of resolvers” to our query. But that’s up for a future post.

Exposing Clojure to Ruby and other languages – Callbacks

Published by Maurício Szabo on 2025-04-21
