R6 Generator Maven Plugin: Getting started

A Maven plugin and annotation processor that writes the glue code needed for correctly annotated Java classes to be used within R as a set of R6 classes.

Rationale

R can use rJava or jsr223 to communicate with Java. R also has a class system called R6.

If you want to use a Java library in R, there is potentially a lot of glue code to write and R-specific packaging configuration to set up.

However, if you don’t mind writing an R-centric API in Java, you can generate all of this glue code using a few Java annotations and the normal Javadoc tags. This plugin provides an annotation processor that writes that glue code and creates a fairly transparent connection between Java code and R code, with a minimum of hard work. The focus is on streamlining the creation of R libraries by Java developers, rather than on allowing access to arbitrary Java code from R.

The ultimate aim of this plugin is to allow Java developers to provide simple APIs for their libraries, package their library using Maven, push it to GitHub, and have it become seamlessly available as an R library with minimal fuss.

Basic usage

Write a Java API:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import uk.co.terminological.rjava.RClass;
import uk.co.terminological.rjava.RMethod;
import uk.co.terminological.rjava.types.RDataframe;

/**
 * This class is a very basic example of the features of the r6-generator-maven-plugin. <br>
 * The class is annotated with an @RClass to identify it as part of the R API. <br>
 */
@RClass(
    exampleSetup = {
        "J = JavaApi$get()"
    },
    testSetup = {
        "J = JavaApi$get()",
    }
)
public class MinimalExample {

    static Logger log = LoggerFactory.getLogger(MinimalExample.class);
    
    @RMethod(examples = {
        "minExample = J$MinimalExample$new()",
        "minExample$demo(dataframe=tibble::tibble(input=c(1,2,3)), message='Hello world')"
    })
    /**
     * Documentation of the method can be done in JavaDoc and these will be present in the R documentation 
     * @param dataframe - a dataframe with an arbitrary number of columns
     * @param message - a message
     * @return the dataframe unchanged
     *  
     */
    public RDataframe demo(RDataframe dataframe, String message) {
        log.info("this dataframe has nrow="+dataframe.nrow());
        log.info(message);
        return dataframe;
    }
    
}

Key points:

  • You can annotate multiple classes with @RClass.
  • Only public methods annotated with @RMethod will feature in the R library.
  • You cannot overload methods or constructors. Only one method with a given name is supported, and only one annotated constructor.
  • Static and non-static Java methods are supported.
  • Objects that can be translated into R are returned by value.
  • Other objects are passed around in R by reference, as R6 objects mirroring the layout of the Java code.
  • Such objects can interact with each other in the same Java API engine (see below).
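
The rule that only public methods annotated with @RMethod are exposed can be illustrated with a small, self-contained sketch. It is not how the plugin works internally (the plugin is an annotation processor that runs at compile time); it uses runtime reflection purely to show which methods would qualify. The RMethod annotation declared here is a hypothetical stand-in for uk.co.terminological.rjava.RMethod, so that the code compiles without the runtime jar, and the Counter class, the ExposedMethodsSketch class and the exposedMethods helper are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for uk.co.terminological.rjava.RMethod, declared here
// only so that the sketch compiles without the r6-generator-runtime jar.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RMethod {}

// Hypothetical API class: the name and methods are illustrative only.
class Counter {
    private int count = 0;

    // Public and annotated: qualifies for the R library.
    @RMethod
    public int increment() { return ++count; }

    // Static methods qualify as well.
    @RMethod
    public static String describe() { return "a simple counter"; }

    // Public but not annotated: stays Java-only.
    public void reset() { count = 0; }

    // Annotated but not public: also skipped.
    @RMethod
    int peek() { return count; }
}

class ExposedMethodsSketch {

    // Returns the names of methods that are both public and annotated
    // with @RMethod, sorted alphabetically for a stable result.
    static List<String> exposedMethods(Class<?> apiClass) {
        List<String> names = new ArrayList<>();
        for (Method m : apiClass.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && m.isAnnotationPresent(RMethod.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(exposedMethods(Counter.class));
    }
}
```

Running main prints [describe, increment]: the unannotated reset and the non-public peek are left out.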

Package it:

Required Maven runtime dependency:

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <r6.version>1.1.0</r6.version>
    </properties>

    <groupId>io.github.terminological</groupId>
    <artifactId>r6-generator-docs</artifactId>
    <version>${r6.version}</version>
    <packaging>jar</packaging>

    <name>R6 Generator Maven Plugin Test</name>

    <dependencies>
        <dependency>
            <groupId>io.github.terminological</groupId>
            <artifactId>r6-generator-runtime</artifactId>
            <version>${r6.version}</version>
        </dependency>
...
    </dependencies>

Repository configuration, needed only if you want to use the unstable main-SNAPSHOT version of the r6-generator:

    <!-- Resolve SNAPSHOTS of the runtime library on GitHub Packages.
    Not needed if you are using a stable r6.version of the r6-generator-runtime 
    and r6-generator-maven-plugin from Maven Central rather than xx.xx.xx-SNAPSHOT. -->
    <repositories>
        <repository>
            <id>github</id>
            <url>https://maven.pkg.github.com/terminological/m2repo</url>
        </repository>
    </repositories>

    <!-- Resolve SNAPSHOTS of the maven plugin on Github packages -->
    <pluginRepositories>
        <pluginRepository>
            <id>github</id>
            <url>https://maven.pkg.github.com/terminological/m2repo</url>
        </pluginRepository>
    </pluginRepositories>
    
    <!-- N.B. for this to work with Github packages you need a personal access token
    defined in your ~/.m2/settings.xml file as a server with the id of `github`
    to match the above, e.g.
    
    <settings>
        <servers>
            <server>
                <id>github</id>
                <username>GITHUB_USERNAME</username>
                <password>GITHUB_TOKEN</password>
            </server>
        </servers>
    </settings>
    
    All of which is probably a good reason to only use stable releases from Maven
    Central.
    -->
    

Maven plugin example configuration:

    <build>
        <plugins>
...
            <plugin>
                <groupId>io.github.terminological</groupId>
                <artifactId>r6-generator-maven-plugin</artifactId>
                <version>${r6.version}</version>
                <configuration>
                    <packageData>
                        <!-- R package metadata: -->
                        <title>A test library</title>
                        
                        <!-- As this project is documenting the r6-generator-maven-plugin 
                        I am syncing R package version with the r6-generator version. 
                        This is unlikely to be what you want to do. -->
                        <version>${r6.version}</version>
                         
                        <!-- Instead you most likely want to sync the R package 
                        version to your Java artifact version in a "normal" project 
                        e.g.: -->
                        <!-- <version>${project.version}</version> -->
                        <!-- Note that -SNAPSHOT Java versions are rolled 
                        back to the previous patch version for the R package, 
                        so 0.1.1-SNAPSHOT (Java) becomes 0.1.0.9000 (R).
                        This is due to the difference between R and Java 
                        versioning strategies. R tools typically use 
                        non-standard semantic versioning. -->
                        
                        <!-- Alternatively you can manage the R package version 
                        manually by supplying an R-style version
                        of the format 0.1.0.9000, e.g. -->
                        <!-- <version>0.1.0.9000</version> -->
                        
                        <doi>10.5281/zenodo.6645134</doi>
                        <packageName>testRapi</packageName>
                        <githubOrganisation>terminological</githubOrganisation>
                        <githubRepository>r6-generator-docs</githubRepository>
                        <!-- often (but not in this case) the repository will be 
                        the same as the package, e.g.: --> 
                        <!-- <githubRepository>${packageName}</githubRepository> -->
                        
                        <license>MIT</license>
                        
                        <!-- This is the Description field in the R DESCRIPTION file.
                        CRAN specifies some standards for this: it should not 
                        start with the package name, it must pass spellchecks, 
                        and any references to other packages must be in 
                        single quotes. -->
                        <description>
                            Documents the features of the 'r6-generator-maven-plugin' 
                            by providing an example of an R package automatically 
                            generated from Java code by the plugin. It is not 
                            intended to be useful beyond testing, demonstrating 
                            and documenting the features of the r6 generator plugin.
                        </description>
                        <maintainerName>Rob</maintainerName>
                        <maintainerFamilyName>Challen</maintainerFamilyName>
                        <maintainerEmail>[email protected]</maintainerEmail>
                        <maintainerOrganisation>terminological ltd.</maintainerOrganisation>
                        
                        <!-- Build configuration options: -->
                        
                        <!-- starts the R library with Java code in remote 
                        debugging mode: -->
                        <debug>false</debug>
                        
                        <!-- Roxygen can integrate user supplied and generated R 
                        code, but requires a working R version on the system 
                        that generates the R package. This must be set if, like 
                        this package, you define some additional manual 
                        functions in your own `.R` files in the R directory 
                        beyond those generated by the package. This kind of 
                        hybrid java and R package must use devtools::document 
                        through this option to generate the correct NAMESPACE 
                        file and documentation. -->
                        <useRoxygen2>true</useRoxygen2> 
                        
                        <!-- Runs R CMD check as part of the Maven build and aborts on failure. -->
                        <useCmdCheck>true</useCmdCheck>
                        
                        <!-- pkgdown will generate a nice-looking site. If it fails, the build will abort. -->
                        <usePkgdown>true</usePkgdown>
                        
                        <!-- Install the library on the local machine when finished. Disable for CI. -->
                        <installLocal>true</installLocal> 
                        
                        <!-- building the javadocs into the documentation is nice but can add 
                        to the size of the package which is not helpful if submitting to CRAN -->
                        <useJavadoc>false</useJavadoc> 
                        
                        <!-- Pre-compiling the binary is probably the safest option: the compilation is done during the Maven build.
                        The alternative is to compile the Java from source code on first use of the library from within R.
                        This requires the user to have a JDK installed, and uses a Maven wrapper script. -->
                        <preCompileBinary>true</preCompileBinary> 
                        
                        <!-- Packaging all dependencies is the most robust option but 
                        results in a large package size that may not be accepted on CRAN.
                        However, this is the simplest approach if the main target is r-universe or deployment via GitHub.
                        The alternative is to deploy a minimal jar and fetch all dependencies on first library use.
                        This option only applies if the binary is precompiled in the previous option. -->
                        <packageAllDependencies>true</packageAllDependencies>
                        
                        <!-- Maven shade can minimise the size of JAR files by trimming bits that you don't actually use -->
                        <useShadePlugin>true</useShadePlugin>
                        
                        <!-- any rJava VM start up options can be added here -->
                        <rjavaOpts>
                            <!-- this example sets the maximum heap size -->
                            <rjavaOpt>-Xmx256M</rjavaOpt>
                        </rjavaOpts>
                        
                    </packageData>
                    <!-- the best place to put the R package is in the directory above the java code
                    and to have the java code in a `java` subdirectory of a github repo.
                    i.e. this file would be `java/pom.xml`. This makes R optimally 
                    happy and is the best layout for new projects. -->
                    <outputDirectory>${project.basedir}/..</outputDirectory>
                </configuration>
                <executions>
                    <execution>
                        <id>clean-r-library</id>
                        <goals>
                            <goal>clean-r-library</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>flatten-pom</id>
                        <goals>
                            <goal>flatten-pom</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>generate-r-library</id>
                        <goals>
                            <goal>generate-r-library</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
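
The version comment in the configuration above says that a -SNAPSHOT Java version is rolled back to the previous patch version for the R package, so 0.1.1-SNAPSHOT becomes 0.1.0.9000. A minimal sketch of that mapping, based only on that comment, might look like the following. The RVersionSketch class and toRVersion method are hypothetical names and not part of the plugin. How the plugin treats a patch number of 0, or versions that are not of the form major.minor.patch, is not stated here, so the sketch rejects those cases rather than guess. Passing release versions through unchanged is likewise an assumption.

```java
// Hypothetical helper illustrating the Java-to-R version mapping described
// in the configuration comment; it is not the plugin's actual implementation.
final class RVersionSketch {

    private static final String SNAPSHOT = "-SNAPSHOT";

    static String toRVersion(String javaVersion) {
        if (!javaVersion.endsWith(SNAPSHOT)) {
            // Assumption: release versions are used as-is for the R package.
            return javaVersion;
        }
        String base = javaVersion.substring(0, javaVersion.length() - SNAPSHOT.length());
        String[] parts = base.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch-SNAPSHOT: " + javaVersion);
        }
        int patch = Integer.parseInt(parts[2]);
        if (patch == 0) {
            // The source does not say what happens here, so the sketch refuses to guess.
            throw new IllegalArgumentException("no previous patch version for: " + javaVersion);
        }
        // Roll back to the previous patch and add the R development suffix.
        return parts[0] + "." + parts[1] + "." + (patch - 1) + ".9000";
    }

    public static void main(String[] args) {
        System.out.println(toRVersion("0.1.1-SNAPSHOT")); // prints 0.1.0.9000
    }
}
```

The .9000 suffix is the fourth version component that R tooling conventionally uses to mark a development version, which is why the result sorts after 0.1.0 but before 0.1.1.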

With this in place, a call to mvn package or mvn install will create your R library by adding files to your Java source tree in the configured output directory. Optionally, push your Java source tree to GitHub.

Run it from R:

# library(devtools)

# if you are using locally:
# devtools::install_local("~/Git/your-project-id")
# devtools::load_all("~/Git/your-project-id")
# OR if you pushed the project to github
# install_github("your-github-name/your-project-id")

# a basic smoke test

# the JavaApi class is the entry point for R to your Java code.
J <- testRapi::JavaApi$get()

# all the API classes and their methods are attached to the J Java API object
eg = J$MinimalExample$new()
df = eg$demo(dataframe = ggplot2::diamonds, message = "The diamonds dataframe")
nrow(df)
## [1] 53940

For basic info about the plugin see: https://github.com/terminological/r6-generator

For a more complete working example and further documentation see: https://github.com/terminological/r6-generator-docs