In the first part of this series, I showed how to expose some Java methods (which delegate to Clojure functions) through a shared library, so that Ruby could use that library via its C bindings. The first implementation cut a lot of corners: for example, we only transmitted strings, which were serialized and deserialized between languages, and we used GraalVM “isolates” (multiple independent instances of the “VM”) to fake having multiple objects. In this post, I will fix some of these issues, and then show how to expose complex objects.
So, here’s a quick overview of the Clojure library I want to expose: the library allows you to define “resolvers”, which are functions. The difference between a resolver and a “normal function” is that a resolver always expects and returns a “Map” (a Hash in Ruby). A resolver also has some “metadata” that describes, at least, its dependencies: which keys the resolver’s “input” must have, and which keys its “output” will return.
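To make the “function plus metadata” idea concrete, here is a minimal sketch in plain Java (the language hosting the Clojure side). The `SimpleResolver` class and its fields are hypothetical names made up for this illustration; it is not Pathom’s API, only the contract described above: check that the declared inputs are present, run the function, and check that the declared outputs came back.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a resolver, for illustration only (NOT Pathom's API):
// a map-to-map function plus metadata about the keys it needs and provides.
final class SimpleResolver {
    final String name;
    final List<String> inputs;   // keys the input map MUST contain
    final List<String> outputs;  // keys the output map promises to contain
    private final Function<Map<String, Object>, Map<String, Object>> body;

    SimpleResolver(String name, List<String> inputs, List<String> outputs,
                   Function<Map<String, Object>, Map<String, Object>> body) {
        this.name = name;
        this.inputs = List.copyOf(inputs);
        this.outputs = List.copyOf(outputs);
        this.body = body;
    }

    // The resolver can only run when every declared input key is present
    boolean canRun(Map<String, Object> available) {
        return available.keySet().containsAll(inputs);
    }

    Map<String, Object> run(Map<String, Object> available) {
        if (!canRun(available)) {
            throw new IllegalArgumentException("missing inputs for " + name + ": " + inputs);
        }
        Map<String, Object> result = body.apply(available);
        // Enforce the other half of the contract: declared outputs must be returned
        if (!result.keySet().containsAll(outputs)) {
            throw new IllegalStateException(name + " did not return " + outputs);
        }
        return result;
    }
}
```

Because the metadata is plain data, something other than the resolver itself can decide when it is allowed to run.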
The implications of this are huge, although they might not seem so. The idea is that, instead of wiring things up manually (for example: query a database, get the result, send it to some external service, get the service’s result, do something else), you just define the “dependencies”. For example, “I need the user’s login and email for this external service” becomes a “resolver”; if the hashmap doesn’t have the login and email, a different resolver says “query the database and return the user’s login and email”, and so on.
Here’s how I envisioned the example above, in Ruby:
user_data = Resolver.new(inputs: [:user__id], outputs: [:user__login, :user__email]) do |inputs|
  # Inputs will ALWAYS be a HashMap with {user__id: <something>} on it
  user = User.select(:login, :email).find(inputs[:user__id])
  { user__login: user.login, user__email: user.email }
end

external_data = Resolver.new(inputs: [:user__login, :user__email], outputs: [:oauth__token]) do |user|
  # Again, user will always be a HashMap with {user__login: <login>, user__email: <email>}
  result = RestService.call(user[:user__login], user[:user__email])
  { oauth__token: result.body[:token] }
end
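The two resolvers above can be chained without any explicit wiring. Below is a naive sketch of that idea, again in plain Java: a hypothetical `NaiveChain` class keeps running any resolver whose inputs are all known (and whose outputs are not yet known) until the wanted key appears or nothing new can be produced. Pathom’s real planner is far more sophisticated, so read this only as an illustration of resolving by dependencies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Naive dependency-driven resolution (hypothetical; NOT how Pathom plans queries)
final class NaiveChain {
    static final class Resolver {
        final Set<String> inputs;
        final Set<String> outputs;
        final Function<Map<String, Object>, Map<String, Object>> body;

        Resolver(Set<String> inputs, Set<String> outputs,
                 Function<Map<String, Object>, Map<String, Object>> body) {
            this.inputs = inputs;
            this.outputs = outputs;
            this.body = body;
        }
    }

    // Returns the value for `wanted`, or null if no chain of resolvers reaches it
    static Object resolve(String wanted, Map<String, Object> start, List<Resolver> resolvers) {
        Map<String, Object> known = new HashMap<>(start);
        boolean progress = true;
        while (!known.containsKey(wanted) && progress) {
            progress = false;
            for (Resolver r : resolvers) {
                boolean ready = known.keySet().containsAll(r.inputs);
                boolean useful = !known.keySet().containsAll(r.outputs);
                if (ready && useful) {
                    int before = known.size();
                    known.putAll(r.body.apply(known));
                    // Only count it as progress if new keys actually appeared
                    progress |= known.size() > before;
                }
            }
        }
        return known.get(wanted);
    }
}
```

The order of the resolver list does not matter: if the token resolver is listed first, it simply waits until a later pass, once the user resolver has produced the login and email.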
Here’s a problem: in the last post, we were only serializing data (hashmaps, arrays, keywords, strings, etc.). But now we need to take Ruby’s Proc objects, somehow expose them to the Java side (and then to the Clojure side), and finally be able to call those blocks from Clojure (as if they were Clojure functions) and get the result back. GraalVM offers CFunctionPointer, which is a way to call a C function from Java, and it’s honestly kinda weird to use, so here are the steps:
- We first define a sub-interface of CFunctionPointer. The base interface has no method to call, so our sub-interface declares an invoke method with as many parameters as we need.
- We define a “callback” class. This “callback” will hold the CFunctionPointer sub-interface instance and the arguments we want to send to it, and it’ll be responsible for actually exposing our callback to Clojure.
- We serialize the input before sending it to the “callback class”, get the output back, and pass strings in both directions, so that each side (Ruby and Clojure) can deserialize what it receives.
- We add some annotations so that everything works.
The Callback Dance
For the record, I can’t be bothered to remember the correct packages for everything GraalVM-related, so I’m assuming we’re importing everything at the top of the file. So we’ll import:
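// IsolateThread, ObjectHandle, ObjectHandles
import org.graalvm.nativeimage.*;
// CEntryPoint, CFunctionPointer, InvokeCFunctionPointer
import org.graalvm.nativeimage.c.function.*;
// CCharPointer, VoidPointer, CTypeConversion
import org.graalvm.nativeimage.c.type.*;
import org.graalvm.word.*;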
So, here are the things we need to be aware of: in Ruby, a “callback” is a Proc object. Because Ruby is a dynamic language, in the C API a Proc appears as a VALUE, exactly like a number, string, class, etc.: everything is the same VALUE type.
We can’t pass a VALUE to GraalVM and expect it to work as a “function pointer”: we need to pass an actual C function. We might be tempted to do this:
void Init_patao_impl() {
  // ... lots of "Patao" implementation here

  // We define a class "ResolverImpl"
  ResolverImplClass = rb_define_class_under(PataoModule, "ResolverImpl", rb_cObject);
  // And here we define the "initialize" function that will accept the "name", "params", and a "block" object
  rb_define_method(ResolverImplClass, "initialize", resolver_class_initialize, 3);
}

// This should be BEFORE the code above, but for the sake of simplicity, let's
// simply put it here so the code appears AFTER the explanation :D
static VALUE resolver_class_initialize(VALUE self, VALUE name, VALUE params, VALUE block) {
  const char* call_function(const char* str_from_clojure) {
    // We wrap the string that comes from the Clojure/Java land...
    VALUE str_param = rb_str_new2(str_from_clojure);
    // ... call the block that we passed as a parameter by invoking
    // the method "call" on it, sending the param as an argument
    VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
    // ...and get back the result as a string so we can send it back
    // to GraalVM
    const char* result = StringValueCStr(ret);
    return result;
  };

  void* callback = &call_function;
  void* resolver = gen_resolver(
    global_thread,
    &call_function,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  // ... some wrapping here ...
  return Qnil;
}
But this doesn’t work: in fact, it crashes with a dreadful “segmentation fault” (hopefully; if it doesn’t, it might pose a HUGE security risk). The reason is that C has no garbage collector and no closures that outlive their scope. Nested functions like call_function are a GCC extension, and when one of them uses a variable from its enclosing function (here, block), taking its address creates a small “trampoline” on the stack. As soon as resolver_class_initialize (the enclosing scope of call_function) returns, that trampoline and block are gone, but we have already handed a pointer to them to Java. When Java later calls that function, the pointer will at best point to memory that is no longer valid, causing the segmentation fault, or at worst to something else entirely, possibly opening up a huge exploit in your code.
So, what can we do? Well, we can define a function that never goes out of scope, and to do that, we need to create it at the top level. But here’s another problem: as you can see, we have this variable block, and its value will obviously be different for every invocation of resolver_class_initialize. If we create a top-level function, we can’t capture block anymore. So here’s a technique that lots of C programmers use to send arbitrary data to a function: instead of the top-level function receiving a single parameter (the str_from_clojure data), we define a function that receives both str_from_clojure and some “arbitrary data” that can be basically anything. In this case, it will be the block from Ruby.
Then, instead of sending only the “function pointer” to the Java world, we’ll also send the “block”, keep it on the Java side as an “opaque object”, and, when we want to call the function pointer back, we’ll send both the “block” and the “string” that we want to pass as a parameter. In C, we usually use void* as an “anything goes” pointer type, so it’s convenient that on GraalVM, “opaque arbitrary C data” is called… VoidPointer.
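The “one top-level function plus opaque data” trick is not specific to C. As a plain-Java analogy (not GraalVM code; `RawCallback` and `ContextPattern` are hypothetical names), a single static method can serve any number of “blocks” because it captures nothing and instead receives the block as an untyped argument, much like call_function receives a VALUE disguised as void*:

```java
import java.util.function.UnaryOperator;

// Plays the role of the C signature: const char* call_function(VALUE block, const char* str)
interface RawCallback {
    String invoke(Object context, String argument);
}

final class ContextPattern {
    // The single long-lived "top-level function". It captures nothing, so the
    // per-resolver state must arrive through the opaque `block` argument.
    @SuppressWarnings("unchecked")
    static String callFunction(Object block, String argument) {
        // Like casting the void* back to VALUE before calling rb_funcall
        UnaryOperator<String> fn = (UnaryOperator<String>) block;
        return fn.apply(argument);
    }

    // One pointer to the function, shared by every resolver
    static final RawCallback CALLBACK = ContextPattern::callFunction;
}
```

Two different “blocks” go through the same `CALLBACK`; only the context argument changes between calls, which is exactly what lets the C version live at the top level.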
So here are the steps:
Step 1: Define an interface that reflects the C function pointer.
Because we changed our C code to look like this:
// We need to add this "block" argument here...
static const char* call_function(VALUE block, const char* str_from_clojure) {
  VALUE str_param = rb_str_new2(str_from_clojure);
  VALUE ret = rb_funcall(block, rb_intern("call"), 1, str_param);
  // This pointer points into the Ruby string `ret`, so it is only valid while
  // Ruby keeps that string alive; the Java side copies it right away
  const char* result = StringValueCStr(ret);
  return result;
}

static VALUE resolver_class_initialize(VALUE self, VALUE name, VALUE params, VALUE block) {
  void* resolver = gen_resolver(
    global_thread,
    &call_function,
    // and then cast it to void* here to send to Java-land
    (void*) block,
    StringValueCStr(name),
    StringValueCStr(params)
  );
  // ... some wrapping here ...
  return Qnil;
}
We’ll now need to define an interface in the GraalVM “Java wrapper code” that reflects call_function. We will call this interface… Callback. Because it sounds right:
public interface Callback extends CFunctionPointer {
  // Marks "invoke" as the method that calls the C function
  // this pointer points to
  @InvokeCFunctionPointer
  // The "VoidPointer" here is the block, or anything else we
  // might want to pass; the CCharPointer is the string for the block
  CCharPointer invoke(VoidPointer arg1, CCharPointer input);
}
Now here’s a problem: in Clojure, we expect to call a “Clojure function”, not this weird CFunctionPointer. We also need to store the VoidPointer somewhere (so we need an object with its own instance fields, not static state), in such a way that we can call it later, pass that object to Clojure, and make Clojure call that object instead. Luckily for us, Java already offers an interface that represents a “function”: java.util.function.Function. So we’ll import it and implement it with fixed String types and a custom constructor, and instead of just “calling back” what was sent, our apply method will call our Callback with the VoidPointer parameter:
import java.util.function.Function;

public class AdaptedCallback implements Function<String, String> {
  // We define the VoidPointer here...
  private final VoidPointer arg1;
  // ...and the callback, both coming from the C world
  private final Callback callback;

  public AdaptedCallback(Callback callback, VoidPointer arg1) {
    // Very simple constructor, only stores the parameters
    this.arg1 = arg1;
    this.callback = callback;
  }

  @Override
  public String apply(String argument) {
    // We need to convert the String to a "C string", like always. The holder
    // owns native memory, so try-with-resources frees it when we're done
    try (CTypeConversion.CCharPointerHolder holder = CTypeConversion.toCString(argument)) {
      CCharPointer value = holder.get();
      // Then we invoke the callback with the block and value
      CCharPointer result = callback.invoke(arg1, value);
      // And we get back a "const char*" from C, which we copy into a Java String
      return CTypeConversion.toJavaString(result);
    }
  }
}
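To see why this adapter is useful, here is a plain-Java sketch with the GraalVM types swapped out: `FakeCallback` stands in for the Callback interface, an `Object` stands in for the VoidPointer, and Java Strings stand in for CCharPointer. All of these names are hypothetical. The point is that code on the “Clojure side” (`ConsumerSide` here) only ever sees a java.util.function.Function and never has to know that a C pointer is involved:

```java
import java.util.function.Function;

// Stand-in for the GraalVM Callback interface (hypothetical, no native types)
interface FakeCallback {
    String invoke(Object context, String argument);
}

// Same shape as AdaptedCallback: bundle the "pointer" with its context so that
// callers only see an ordinary java.util.function.Function
final class Adapted implements Function<String, String> {
    private final FakeCallback callback;
    private final Object context;

    Adapted(FakeCallback callback, Object context) {
        this.callback = callback;
        this.context = context;
    }

    @Override
    public String apply(String argument) {
        return callback.invoke(context, argument);
    }
}

// Stand-in for the Clojure side: it composes Functions and knows nothing about C
final class ConsumerSide {
    static String callTwice(Function<String, String> fun, String input) {
        return fun.andThen(fun).apply(input);
    }
}
```

Since `Adapted` is a real Function, standard combinators such as `andThen` work on it for free, and the same holds for Clojure’s `.apply` interop call.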
Finally, we can add some “wrapper code” to our static Java class, exposing a C function to our Ruby library:
public final class LibPatao {
  // ... lots of code here ...

  @CEntryPoint(name = "gen_resolver")
  // we now receive the Callback, which is the "function pointer" from C,
  // and the VoidPointer, which is the block from Ruby
  public static ObjectHandle gen_resolver(IsolateThread thread, Callback callback, VoidPointer arg1,
                                          CCharPointer name, CCharPointer params) {
    // then we define the "adaptedCallback"
    AdaptedCallback adaptedCallback = new AdaptedCallback(callback, arg1);
    // and finally, call our Clojure code
    Object resolver = big_duck.core.gen_resolver(
      adaptedCallback,
      CTypeConversion.toJavaString(name),
      CTypeConversion.toJavaString(params)
    );
    // This will be left for later...
    return ObjectHandles.getGlobal().create(resolver);
  }
}
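The `ObjectHandles.getGlobal().create(resolver)` call hands C an opaque handle instead of a raw Java reference, since C code cannot hold Java references directly and the Native Image garbage collector may move objects. The real ObjectHandles API also provides `get` and `destroy` for a handle. Conceptually, a handle table works like the hypothetical `HandleTable` below (a sketch, not GraalVM’s implementation): C only sees a number, the table keeps a strong reference so the object stays alive, and the entry lives until someone explicitly destroys it, which is also how forgotten handles turn into memory leaks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handle table in the spirit of ObjectHandles (NOT GraalVM's code)
final class HandleTable {
    private final Map<Long, Object> objects = new HashMap<>();
    private long next = 1; // 0 is reserved to mean "no handle", like NULL

    // Keeps a strong reference, so the GC cannot collect the object
    synchronized long create(Object obj) {
        long handle = next++;
        objects.put(handle, obj);
        return handle;
    }

    synchronized Object get(long handle) {
        Object obj = objects.get(handle);
        if (obj == null) {
            throw new IllegalArgumentException("invalid or released handle: " + handle);
        }
        return obj;
    }

    // Without this call the entry (and the object) lives forever: a leak
    synchronized void destroy(long handle) {
        objects.remove(handle);
    }

    synchronized int size() {
        return objects.size();
    }
}
```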
And… finally… we generate the Resolver from the “Clojure” code:
(defn- make-resolver [resolver-name params fun]
  (let [params (->> params
                    (map (fn [[k v]]
                           [(keyword "com.wsscode.pathom3.connect.operation" (name k)) v]))
                    (into {}))
        resolver-fun (fn [_env input]
                       (let [result (fun (to-string input))]
                         (from-string result)))]
    (pco/resolver (symbol resolver-name) params resolver-fun)))

(defn- -gen_resolver [^java.util.function.Function fun name params]
  (make-resolver name (from-string params) #(.apply fun %)))
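In case the Clojure is hard to follow, here is a rough Java analogue of the two things make-resolver does. It is a sketch with hypothetical names (`MakeResolverSketch`, `qualifyParams`, `liftToMaps`), and namespaced keywords are modeled as `"ns/name"` strings; the encoder and decoder are passed in because the actual to-string/from-string format comes from the previous post and isn’t shown here. First, every config key is moved into the com.wsscode.pathom3.connect.operation namespace, which is where Pathom’s options such as input and output live; then the string-only function (ultimately the Ruby block) is wrapped so that it takes and returns maps:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical Java analogue of make-resolver's two transformations
final class MakeResolverSketch {
    static final String OP_NS = "com.wsscode.pathom3.connect.operation";

    // Mirrors (keyword "com.wsscode.pathom3.connect.operation" (name k))
    static Map<String, Object> qualifyParams(Map<String, Object> params) {
        Map<String, Object> out = new HashMap<>();
        params.forEach((k, v) -> out.put(OP_NS + "/" + k, v));
        return out;
    }

    // Mirrors resolver-fun: encode the input map, call the string-only
    // function, then decode its answer back into a map
    static Function<Map<String, Object>, Map<String, Object>> liftToMaps(
            Function<String, String> stringFn,
            Function<Map<String, Object>, String> encode,
            Function<String, Map<String, Object>> decode) {
        return encode.andThen(stringFn).andThen(decode);
    }
}
```

The Ruby `Resolver` wrapper shown later does the mirror image: it decodes the incoming string, calls the user’s block, and encodes the result.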
This might seem convoluted, and indeed it is, but it’s the only way we can basically serialize arbitrary data between Ruby and Clojure, considering that we also have to account for the limitations of C and Java.
Now, if we try to generate a resolver in Ruby, our API calls will be very, very, very convoluted: we’ll need to serialize the config parameter and pass it to the shared library, and inside the block our “inputs” parameter will be a serialized string, which we’ll need to deserialize, do whatever we want with it, serialize a HashMap back to a string, and return that string as the result. To avoid this, we can create a class called Resolver that simply inherits from ResolverImpl:
module Patao
  class Resolver < ResolverImpl
    def initialize(name, params, &b)
      name = name.to_s
      new_proc = proc { |i| to_str b.call(from_str(i)) }
      super(name, to_str(params), new_proc)
    end
  end
end
What this does is very simple: first, it allows the constructor to receive a block via the “block argument” (instead of needing to manually create a Proc), and it “normalizes” that block with from_str and to_str (helper functions that deserialize and serialize the strings). When generating a resolver like this, one of three things can happen: it can work, but it can also crash with a “Segmentation Fault” error, or crash with an error like “Undefined method ‘call’ for an instance of String”, which is… weird…
The reason is that Ruby, unlike C, is a garbage-collected language. The block we created (new_proc) is sent to C, then to Java, and stored in Java… but nothing in Ruby keeps a reference to new_proc, so it will eventually be garbage-collected. The Ruby interpreter reuses a lot of memory, so the memory slot of new_proc might be reused for a string (causing the latter error), or it can simply be freed (causing the segfault). Neither is what we want, but there’s an easy fix: keep an instance variable, for example @block = new_proc, which guarantees that the block can only be collected once this Resolver instance is also collected.
Now, with all the changes, we can create a resolver; we can serialize and deserialize stuff; we can query code (but it doesn’t do anything yet, because we didn’t add any code to define the resolvers for a specific query); and all that using only one GraalVM isolate (we will need to remove the “free” and “allocate” functions we made in our last post) so we can reuse resolvers between different objects.
We also introduced a lot of memory leaks that might not be so obvious, and still have no way to send a “list of resolvers” to our query. But that’s up for a future post.