R6 Generator Maven Plugin: Metadata and Maven configuration

DRAFT

Maven pom.xml options

The sample pom.xml below shows the Maven configuration of this test project. Its key parts, each described in a later section, are:

  • Dependency on the r6-generator-runtime Java library.
  • GitHub Maven repository declarations for both code and plugins.
  • Plugin configuration for the r6-generator-maven-plugin.

Sample pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
...
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <r6.version>1.1.0</r6.version>
    </properties>

    <groupId>io.github.terminological</groupId>
    <artifactId>r6-generator-docs</artifactId>
    <version>${r6.version}</version>
    <packaging>jar</packaging>

    <name>R6 Generator Maven Plugin Test</name>

    <dependencies>
        <dependency>
            <groupId>io.github.terminological</groupId>
            <artifactId>r6-generator-runtime</artifactId>
            <version>${r6.version}</version>
        </dependency>
...
    </dependencies>
...
    <!-- Resolve SNAPSHOTS of the runtime library on GitHub Packages.
    Not needed if you are using a stable r6.version of the r6-generator-runtime
    and r6-generator-maven-plugin from Maven Central rather than xx.xx.xx-SNAPSHOT. -->
    <repositories>
        <repository>
            <id>github</id>
            <url>https://maven.pkg.github.com/terminological/m2repo</url>
        </repository>
    </repositories>

    <!-- Resolve SNAPSHOTS of the maven plugin on Github packages -->
    <pluginRepositories>
        <pluginRepository>
            <id>github</id>
            <url>https://maven.pkg.github.com/terminological/m2repo</url>
        </pluginRepository>
    </pluginRepositories>
    
    <!-- N.B. for this to work with Github packages you need a personal access token
    defined in your ~/.m2/settings.xml file as a server with the id of `github`
    to match the above, e.g.
    
    <settings>
        <servers>
            <server>
                <id>github</id>
                <username>GITHUB_USERNAME</username>
                <password>GITHUB_TOKEN</password>
            </server>
        </servers>
    </settings>
    
    All of which is probably a good reason to only use stable releases from Maven
    Central.
    -->
    
...
    <build>
        <plugins>
...
            <plugin>
                <groupId>io.github.terminological</groupId>
                <artifactId>r6-generator-maven-plugin</artifactId>
                <version>${r6.version}</version>
                <configuration>
                    <packageData>
                        <!-- R package metadata: -->
                        <title>A test library</title>
                        
                        <!-- As this project is documenting the r6-generator-maven-plugin 
                        I am syncing R package version with the r6-generator version. 
                        This is unlikely to be what you want to do. -->
                        <version>${r6.version}</version>
                         
                        <!-- Instead you most likely want to sync the R package 
                        version to your Java artifact version in a "normal" project 
                        e.g.: -->
                        <!-- <version>${project.version}</version> -->
                        <!-- Note that -SNAPSHOT java versions will be rolled 
                        back to previous patch version for the R package 
                        so 0.1.1-SNAPSHOT (Java) becomes 0.1.0.9000 (R)
                        this is due to the difference between R and Java versioning 
                        strategies. R tools typically use non-standard semantic 
                        versioning. -->
                        
                        <!-- Alternatively you can manage the R package version 
                        manually by putting a R style version
                        of the format 0.1.0.9000.  e.g. -->
                        <!-- <version>0.1.0.9000</version> -->
                        
                        <doi>10.5281/zenodo.6645134</doi>
                        <packageName>testRapi</packageName>
                        <githubOrganisation>terminological</githubOrganisation>
                        <githubRepository>r6-generator-docs</githubRepository>
                        <!-- often (but not in this case) the repository will be 
                        the same as the package, e.g.: --> 
                        <!-- <githubRepository>${packageName}</githubRepository> -->
                        
                        <license>MIT</license>
                        
                        <!-- This is the Description field in the R DESCRIPTION file.
                        CRAN specifies some standards for this: it should not
                        start with the package name, it must pass spellchecks,
                        and any references to other packages must be in
                        single quotes. -->
                        <description>
                            Documents the features of the 'r6-generator-maven-plugin' 
                            by providing an example of an R package automatically 
                            generated from Java code by the plugin. It is not 
                            intended to be useful beyond testing, demonstrating 
                            and documenting the features of the r6 generator plugin.
                        </description>
                        <maintainerName>Rob</maintainerName>
                        <maintainerFamilyName>Challen</maintainerFamilyName>
                        <maintainerEmail>[email protected]</maintainerEmail>
                        <maintainerOrganisation>terminological ltd.</maintainerOrganisation>
                        
                        <!-- Build configuration options: -->
                        
                        <!-- starts the R library with Java code in remote 
                        debugging mode: -->
                        <debug>false</debug>
                        
                        <!-- Roxygen can integrate user supplied and generated R 
                        code, but requires a working R version on the system 
                        that generates the R package. This must be set if, like 
                        this package, you define some additional manual 
                        functions in your own `.R` files in the R directory 
                        beyond those generated by the package. This kind of 
                        hybrid java and R package must use devtools::document 
                        through this option to generate the correct NAMESPACE 
                        file and documentation. -->
                        <useRoxygen2>true</useRoxygen2> 
                        
                        <!-- Runs R CMD check as part of the Maven build and aborts on failure. -->
                        <useCmdCheck>true</useCmdCheck>
                        
                        <!-- Pkgdown will generate a nice looking site. If it fails the build will abort. -->
                        <usePkgdown>true</usePkgdown>
                        
                        <!-- Install the library on the local machine when finished. Disable for CI. -->
                        <installLocal>true</installLocal> 
                        
                        <!-- building the javadocs into the documentation is nice but can add 
                        to the size of the package which is not helpful if submitting to CRAN -->
                        <useJavadoc>false</useJavadoc> 
                        
                        <!-- Pre-compiling the binary is probably the safest option: the compilation is done during the Maven build.
                        The alternative is to compile the Java from source code on first use of the library from within R;
                        this requires the user to have a JDK installed, and uses a Maven wrapper script. -->
                        <preCompileBinary>true</preCompileBinary> 
                        
                        <!-- Packaging all dependencies is the most robust option but
                        results in a large package size that may not be accepted on CRAN.
                        However, it is the simplest if the main target is r-universe or deployment via GitHub.
                        The alternative is to deploy a minimal jar and fetch all dependencies on first library use.
                        This option only applies if the binary is precompiled in the previous option. -->
                        <packageAllDependencies>true</packageAllDependencies>
                        
                        <!-- Maven shade can minimise the size of JAR files by trimming bits that you don't actually use -->
                        <useShadePlugin>true</useShadePlugin>
                        
                        <!-- any rJava VM start up options can be added here -->
                        <rjavaOpts>
                            <!-- this example sets the maximum heap size -->
                            <rjavaOpt>-Xmx256M</rjavaOpt>
                        </rjavaOpts>
                        
                    </packageData>
                    <!-- the best place to put the R package is in the directory above the java code
                    and to have the java code in a `java` subdirectory of a github repo.
                    i.e. this file would be `java/pom.xml`. This makes R optimally 
                    happy and is the best layout for new projects. -->
                    <outputDirectory>${project.basedir}/..</outputDirectory>
                </configuration>
                <executions>
                    <execution>
                        <id>clean-r-library</id>
                        <goals>
                            <goal>clean-r-library</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>flatten-pom</id>
                        <goals>
                            <goal>flatten-pom</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>generate-r-library</id>
                        <goals>
                            <goal>generate-r-library</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
...
</project>

The r6-generator-runtime Java library

The runtime Java library contains the code needed to develop R-compatible libraries in Java: the annotations that identify a Java class as part of the R API (uk.co.terminological.rjava.RClass and uk.co.terminological.rjava.RMethod), specialised data types that can be round-tripped between R and Java (in the package uk.co.terminological.rjava.types.*), and various data conversion helpers (e.g. uk.co.terminological.rjava.RConverter) that can be used to accelerate development of an R-centric API in Java.

GitHub Maven repository declarations for both code and plugins

Both the core Maven plugin and the runtime library are available through the Maven Central repository. In the sample above the ${r6.version} property is set to 1.1.0, a stable release; setting it to an xx.xx.xx-SNAPSHOT version instead retrieves the development version of the plugin. The most recent stable release version numbers are available on the GitHub releases page or on Maven Central.

Unstable releases are managed on GitHub Packages. Using GitHub SNAPSHOT releases is only possible if you have set up a GitHub personal access token (classic). The process is described in the GitHub documentation, and tokens are created in your GitHub account settings. For package read access, ticking the box for read:packages is sufficient. The generated token must then be copied into your ~/.m2/settings.xml, as a server entry with the id github (see the comment in the sample pom.xml above), to allow Maven to pick up the SNAPSHOTs. It is generally easier to just use Maven Central releases.
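
As a rough sketch of the stable-release setup, the fragment below declares the runtime dependency and the plugin without any GitHub repository sections, so that both resolve from Maven Central. The version 1.1.0 is simply the value used in the sample pom.xml above, not necessarily the latest release, and the plugin configuration is elided because it is covered in the following sections.

```xml
<!-- Sketch only: stable releases resolved from Maven Central, so no
     repositories or pluginRepositories sections are declared. -->
<properties>
    <!-- value taken from the sample above; check for the current release -->
    <r6.version>1.1.0</r6.version>
</properties>

<dependencies>
    <dependency>
        <groupId>io.github.terminological</groupId>
        <artifactId>r6-generator-runtime</artifactId>
        <version>${r6.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>io.github.terminological</groupId>
            <artifactId>r6-generator-maven-plugin</artifactId>
            <version>${r6.version}</version>
            <!-- configuration and executions as in the sample pom.xml -->
        </plugin>
    </plugins>
</build>
```

With this setup no personal access token or ~/.m2/settings.xml entry should be needed.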

Plugin configuration for the r6-generator-maven-plugin

outputDirectory

The plugin needs to know where to output the R library. In the example above this is <outputDirectory>${project.basedir}/..</outputDirectory>, which specifies that the generated R package will reside in the directory above the Java pom.xml. This implies that your Java code should be in a subdirectory of the R project (and hence of the root of the Git repository). We recommend using the java directory for this. Keeping the Java source code within the R project is the best option, as the resulting R package can be committed to GitHub with the Java source in the java directory (which is where R CMD Check expects sources to be). This may however be impossible if the Java library requires that it is in the root of the GitHub project. In this case the R and Java code can co-exist, and the pom.xml can be in the project root. This will work; however, if you plan on submitting your library to CRAN it will generate a few NOTEs that will need to be explained.

If the R package is generated in a sub-directory of the Java code this has implications for the ability to compile that code from source from within R, and is generally a bad idea. In this case the fat JAR must be distributed with the R package, and the fact that the R package is in a sub-directory will need to be specified if the library is intended to be installed via GitHub (e.g. using devtools::install_github("my-org/my-project")). It is equally possible to place the generated code into the root directory of the project (${project.basedir}) or indeed into a completely separate directory.
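
The layout choices above come down to the value of outputDirectory. The fragment below is a sketch of the alternatives, with only the recommended one active; the directory name r-package is hypothetical and chosen purely for illustration.

```xml
<configuration>
    <!-- Recommended layout: this pom.xml is java/pom.xml and the R package
         is generated into the repository root, one level up. -->
    <outputDirectory>${project.basedir}/..</outputDirectory>

    <!-- Alternative: pom.xml and R package share the project root.
         Expect a few NOTEs from R CMD check if submitting to CRAN. -->
    <!-- <outputDirectory>${project.basedir}</outputDirectory> -->

    <!-- Alternative: R package in a sub-directory of the Java project
         (hypothetical name "r-package"); generally a bad idea, see above. -->
    <!-- <outputDirectory>${project.basedir}/r-package</outputDirectory> -->

    <!-- packageData omitted here -->
</configuration>
```

For the sub-directory layout, devtools::install_github has a subdir argument that can point at the location of the package within the repository.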

packageData

The package data section includes metadata that describes the R library and a few control flags:

  • title
    • the title is used in description files and man pages
  • version (optional - defaults to ${project.version})
    • the version of the R package that will be generated. This can be specified either as a Java/Maven style semantic version (e.g. xx.xx.xx-SNAPSHOT, by reusing the Maven ${project.version}) or as an R style version (yy.yy.yy.9000). If given as a Java/Maven style -SNAPSHOT version the generated R package will be one patch version less, so 1.0.0-SNAPSHOT will generate an R package with version 0.99.99.9000. This maintains consistent ordering between the two versioning styles when you are syncing the R package to Maven versions.
  • debug (optional - defaults to false)
    • a true or false value that determines whether remote Java debugging should be compiled into the R package. Remote debugging affects performance and can prevent the R package from loading if previous versions have not been unloaded correctly. It is useful only for debugging R to Java integration problems.
  • usePkgdown (optional - defaults to false)
    • If the generated R package is working and loads correctly we can use pkgdown to generate online documentation. This will be generated in a docs sub-folder of the project outputDirectory. This can be pushed to GitHub and used as online documentation of the generated R library. Regardless of online documentation, the standard R man pages are generated for interactive help on the package. Pkgdown documentation will not be generated if the project is in debug mode.
  • useRoxygen2 (optional - defaults to false)
    • If the generated R package is working and loads correctly we can use devtools to generate configuration files and man pages for the package using Roxygen2 annotations. This also allows co-existence of manually written and generated functions, within the same package. This is a fairly experimental feature and I can’t promise that generated code will not accidentally over-write manually written code. Use it at your own risk.
  • useJavadoc (optional - defaults to false)
    • Alongside the R documentation it may be helpful to provide Javadocs of the Java back end of the library. These are generated into the docs/javadoc sub-folder. Javadoc documentation will not be generated if the project is in debug mode.
  • installLocal (optional - defaults to false)
    • If building the R package is successful, Maven will try to use pak::local_install to install it and any dependencies on the local machine. This is useful during development as it gets around some of the potential version inconsistencies that crop up during iterative development. It is not intended for use outside of development, e.g. in a CI environment.
  • preCompileBinary (defaults to true)
    • Usually we wish to distribute a package with a compiled jar file of the Java code. In exceptional circumstances this can be replaced with a sources-only distribution. In this case the Java code will be compiled automatically on first use on the user's machine. To make this possible the user must have a Java development kit installed.
  • packageAllDependencies (defaults to true, only relevant if preCompileBinary is true)
    • A single jar file is created including all dependencies in one “uber-jar”. This is the most reliable option but transitive dependencies result in large Java libraries which will almost always exceed CRAN’s stringent 4Mb limits. This limit is not an issue if you plan on deploying to r-universe, which is how I package my projects. If this option is set to false then the resulting jar file will need to fetch dependencies from the internet on first use. This can be time consuming and is a potential point of failure.
  • useShadePlugin (defaults to false, only relevant if packageAllDependencies is true)
    • The Maven shade plugin can be used to minimise the size of an “uber-jar” but sometimes removes essential pieces. This could work to reduce the size of a packaged jar file but thorough testing of the result is recommended.
  • rjavaOpts\rjavaOpt
    • A list of JVM flags can be supplied, which are passed to the Java Virtual Machine (JVM) when it is initialised. In the example here the JVM is given a maximum heap size of 256Mb (-Xmx256M). Only one JVM is initialised regardless of how many different Java based R packages are used, and only the first one gets to decide the initial parameters. As such these options may be ignored by R if a JVM has already been created.
  • packageName
    • the most important entry - the desired name of the package forms the name space for the package so it is best that it is short and has not already been used for an R package - e.g. don’t call it “stats”.
  • license
    • a license specification - it is best to use a CRAN friendly license. N.B. This should be MIT rather than MIT + file LICENSE, the additional license file is added if needed.
  • maintainerName, maintainerFamilyName, maintainerEmail, maintainerOrganisation
    • the author details used here are supplemented by any author details found in @author Javadoc tags in the code.
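
To show how the flags above combine, here is a sketch of a packageData block aimed at keeping the package small: the jar is pre-compiled, but dependencies are fetched on first use instead of being bundled. The package name, title, description and maintainer details are placeholders, and which of these entries the plugin strictly requires is not stated above, so treat the selection as an assumption.

```xml
<packageData>
    <!-- placeholder metadata, replace with your own -->
    <title>An example library</title>
    <packageName>examplePkg</packageName>
    <!-- sync the R package version to the Maven artifact version -->
    <version>${project.version}</version>
    <license>MIT</license>
    <description>
        Provides an example R interface to a Java library.
    </description>
    <maintainerName>Jane</maintainerName>
    <maintainerFamilyName>Doe</maintainerFamilyName>
    <maintainerEmail>jane.doe@example.com</maintainerEmail>
    <maintainerOrganisation>Example Org</maintainerOrganisation>

    <!-- ship a compiled jar, but keep it small: dependencies are
         downloaded from the internet on first use -->
    <preCompileBinary>true</preCompileBinary>
    <packageAllDependencies>false</packageAllDependencies>

    <!-- options left at the defaults listed above, shown for clarity -->
    <debug>false</debug>
    <useJavadoc>false</useJavadoc>
    <installLocal>false</installLocal>
</packageData>
```

The trade-off is the one described for packageAllDependencies: a smaller package at the cost of a download step, and a possible point of failure, on first use.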

Goals

The plugin has three goals: clean-r-library, flatten-pom and generate-r-library.

  • clean-r-library (binds to Maven clean phase)
    • clear out old generated files leaving user modified files in place.
  • flatten-pom (binds to Maven process-resources phase)
    • create a simplified pom, collapsing any details inherited from parent pom and removing dependencies on parent poms
  • generate-r-library (binds to Maven install phase)
    • run the code generator and install compiled jar into the R package.

In general all three goals should be run.