---
title: "R6 Generator Maven Plugin: Key features"
author: "Rob Challen"
date: "19/10/2020"
output: 
  html_vignette
header-includes:
  \usepackage{amsmath}
  \usepackage{minted}
  \usemintedstyle{emacs}
  \setminted[java]{fontsize=\footnotesize,tabsize=3}
  \setminted[xml]{fontsize=\footnotesize,tabsize=3}
vignette: >
  %\VignetteIndexEntry{R6 Generator Maven Plugin: Key features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = TRUE, warning = TRUE, error = TRUE)
library(tidyverse)

here::i_am("vignettes/R6-generator-features.Rmd")
source(here::here("vignettes/codeSnip.R"))
```

## Features

The `FeatureTest` Java class is designed to showcase the main aspects of the R6 Generator Maven Plugin, and serves as a quick guide to Java programmers wishing to use the plugin. The source of the `FeatureTest` class is shown below, where the use of the Java annotations `@RClass` and `@RMethod` tag a class, and specific methods in that class for use in R. The code structure, parameters and return values or the tagged classes and methods are used to create an equivalent R6 class structure in an R library. In general Javadoc comments and tags are used to document the library, and where there are no applicable tags specific fields in the `@RClass` and `@RMethod` annotations can been used to specify needed imports, suggested R package dependencies and provide specific example code if needed. 

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_1","END_SNIP_1") #,35,51)
```

The packaging of this class into an R library is described [elsewhere](./R6-generator-intro). The package name (in this case `testRapi`), the directory of the library (in this example `~/Git/r6-generator-maven-plugin-test/r-library/`) and other metadata such as author and license details are defined in the Maven plugin configuration (in a file named `pom.xml`). This configuration is described in detail [elsewhere](./R6-generator-maven-config). For the purposes of this we assume the Java code has been compiled, generating the `testRapi` R package which is ready for installation.

## Installation and instantiation

The generated R package can be installed into R in more or less the same way as any other R library, depending on how it is deployed. Typical scenarios would be pushing the whole Java project to Github and installing to R from Github using `devtools::install_github()`, installing directly from the local filesystem, with `devtools::install()`, or submitting the R library sub-directory as a project to CRAN and installing from there, using `install.packages()`. 

```{r echo=TRUE, eval=FALSE}
# not run
# remove installed versions
try(detach("package:testRapi", unload = TRUE),silent = TRUE)
remove.packages("testRapi")
rm(list = ls())
```

Restarting R maybe also required if there was a running java VM.

```{r echo=TRUE, eval=FALSE}
# locally compiled
devtools::install("~/Git/r6-generator-docs", upgrade = "never")
# pushed to github
# devtools::install_github("terminological/r6-generator-docs", upgrade = "never")
# submitted to CRAN
# install.packages("testRapi")

```

The R6 api to the Java classes requires a running instance of the Java Virtual Machine and JNI bridge provided by `r cran("rJava")`. It also requires Java classpath dependencies to be loaded and application logging to be initialised. This is all managed by a specific generated R6 class called `JavaApi` and creating a singleton instance of this is the first step to using the library in R. In these examples the singleton instance `J` is referred to as the "root" of the api, as all the functions of the API stem from it.

```{r message=TRUE}
J = testRapi::JavaApi$get(logLevel = "WARN")
J$changeLogLevel("DEBUG")
J$.log$debug("prove the logger is working and outputting debug statements...")
J$printMessages()
```

Using the `FeatureTest` class above requires a creating a new instance of the class. This is done through the root of the api as follows, and the `FeatureTest` constructor simply logs the `logMessage` parameter's value.

```{r}
feat1 = J$FeatureTest$new(logMessage = "Hello world. Creating a new object")
```

## Predictable data type conversion

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_2","END_SNIP_2")
```

The `FeatureClass.doHelloWorld()` method takes no arguments and returns a value to R. A detailed discussion of R and Java data types is to be found elsewhere but our approach has involved developing a specific set of Java datatypes that have close relationships to the native R datatypes. This enables loss-less round tripping of data from R to Java and back again, but requires the mapping of Java data types to R. This is handled by the `uk.co.terminological.rjava.RConverter` class which provides a range of datatype transformers, and the `uk.co.terminological.rjava.types.*` classes which specify Java equivalents to R data types. These are needed as R's dynamic datatypes contain concepts which are not readily represented in the primitive Java datatypes that are transferred across the JNI. Thus some marshaling is required on both sides to ensure translation is 100% accurate, including for example, conversion of R logical vectors containing NA values, to Java `List<Boolean>` via JNI primitive arrays, or support for typed NA values (e.g. `NA_int_` versus `NA_logical_`). 

The `doHelloWorld()` function returns a character vector, The `doSum()` function expects 2 R numeric values and seamlessly handles both datatype coercion and NA values.


```{r}
feat1$doHelloWorld()
class(feat1$doHelloWorld())

feat1$doSum(3L, 4.1)
class(feat1$doSum(3L, 4.1))

feat1$doSum(3.0, NA_real_)
class(feat1$doSum(3.0, NA_real_))
```

Wrapping and unwrapping every datatype is inconvenient for the Java programmer so some single valued primitive types are supported as parameters and return types of Java functions, particularly `int`, `char`, `double`, `boolean`, and `java.lang.String`, but these come with constraints on use, particularly around NA values in R, and use in asynchronous code.

```{r}
feat1$doSum2(3L, 4L)
class(feat1$doSum2(3L, 4L))

# casts inputs to integer
feat1$doSum2(3.0,4.0)
class(feat1$doSum2(3.0,4.0))

# fails as expects an integer
try(feat1$doSum2(3.0,4.5))

# fails as NA values are not supported by primitive types
try(feat1$doSum2(3L,NA_integer_))
```

Default values in R are demonstrated here with the `@RDefault` annotation which has a string of valid R code producing the value that you want as the default value when this method is called from R. Any valid R code that produces an input that can be coerced to the correct type is allowed here but string values must be double quoted and double escaped if needs be. (I.e. the R string `hello..<newline>...world` would be `"hello...\n...world"` in R so must be given as `@RDefault(value="\"hello...\\n...world\"")` here in an annotation).

Static Java methods are also supported. R6 does not have a concept of static methods, so to get the same look and feel as the object interface in Java, we use the root of the JavaApi as a place to hold the static methods. This enables auto-completion for static methods. In this example the static method `demoStatic` nothing (an R NULL is invisibly returned), but logs its input.

```{r}
J$FeatureTest$demoStatic("Hello, static world, in a Java-like interface.")
```

As static methods are stateless they can also be implemented as more regular package functions, for which exactly the same functionality as the format above is made. For this to work all the static functions declared in the API must have different names. At the moment this is up to the developer to ensure this is the case, although at some point I will make a check for it. To differentiate the object style of call above from the function style more common in R packages we have converted static Java method names from camel case to snake case. Therefore the same exact same call as above in the functional style is as follows. Both functional and object oriented interfaces are generated for all static methods:

```{r}
testRapi::demo_static("Hello, static world, in a more regular R-like interface.")
```

## More complex objects

The generated API has support for the loss-less bi-directional transfer of a range of R data types into Java and back to R. Extensive tests are available elsewhere but in general support for vectors, dataframes and lists is mostly complete, including factors, but matrices and arrays are not yet implemented. Dataframes with named rows are also not yet supported. Dataframes as well as other objects can be serialised in Java and de-serialised. This serialisation has been done for the ggplot2::diamonds data set, and the resulting de-serialisation shown here. Factor levels and ordering are preserved when the factor is part of a vector or dataframe.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_5","END_SNIP_5")
```

The basic smoke tests of this are as follows

```{r}
feat1$doSomethingWithDataFrame(ggplot2::diamonds)
feat1$generateDataFrame() %>% glimpse()
J$FeatureTest$diamonds() %>% glimpse()

if (identical(J$FeatureTest$diamonds(), ggplot2::diamonds)) {
  message("PASS: round tripping ggplot2::diamonds including java serialisation and deserialisation works")
} else {
  stop("FAIL: serialised diamonds from Java should be identical to the ggplot source")
}
```

## Objects, fluent apis and factory methods

The generated R6 code can handle return of Java objects to R, as long as they are a part of the api and annotated with `@RClass`. A common use case for this is fluent Apis, where the Java object is manipulated by a method and returns itself. 

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_3","END_SNIP_3")
```

The `JavaApi` root manages R's perspective on the identity of objects in Java. This allows for fluent api methods, and method chaining. This is not flawless but should work for most common scenarios. It is possible that complex edge cases may appear equal in Java but not identical in R, so true equality should rely on the Java `equals()` method. 

```{r}

feat1$getMessage()
feat2 = feat1$fluentSetMessage("Hello world. updating message.")
feat2$getMessage()

if(identical(feat1,feat2)) {
  message("PASS: the return value of a fluent setter returns the same object as the original")
} else {
  print(feat1$.jobj)
  print(feat2$.jobj)
  print(feat1$.jobj$equals(feat2$.jobj))
  stop("FAIL: these should have been identical")
}

if (feat1$equals(feat2)) {
  message("PASS: java based equality detection is supported")
} else {
  stop("FAIL: these should have been equal")
}

feat1$getMessage()
# Operations on feat2 are occurring on feat1 as they are the same underlying object
feat2$fluentSetMessage("Hello world. updating message again.")
feat1$getMessage()
```

Factory methods allow java methods to create and return Java objects. This is supported as long as the objects are a part of the api and annotated with `@RClass`. Arbitrary Java objects are not supported as return types and Java code that tries to return such objects will throw an exception during the maven packaging phase. This is by design to enforce formal contracts between the Java code and the R api. If you want dynamic manipulation of the Java objects then the `r cran("jsr223")` plugin is more appropriate for you.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_4","END_SNIP_4")
```
This Java code from refers to another class - `MoreFeatureTest` which has the following basic structure:

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/MoreFeatureTest.java"),"START_SNIP_1","END_SNIP_1")
```
The `FeatureTest.factoryMethod(a,b)` method allows us to construct instances of another class. This enables builder patterns in the R api. The `MoreFeatureTest.create(message1,message2)` method demonstrates static factory methods, which return instances of the same class. Static methods are implemented as methods in the `JavaApi` root, as demonstrated here, and accessed through the root object `J`:

```{r}
# factory method from builder class
moreFeat1 = feat1$factoryMethod("Hello","World")
# static factory method accessed through the root of the API
moreFeat2 = J$MoreFeatureTest$create("Ola","El Mundo")
# either of these can be passed as a parameter
feat1$objectAsParameter(moreFeat1)
```

## Logging, printing and exceptions

The logging sub-system is based on `slf4j` with a `log4j2` implementation. These are specified in the `r6-generator-runtime` dependency `pom.xml`, so anything that imports that will have them as a transitive dependency. These are needed as dynamic alteration of the logging level from R is dependent on implementation details of `log4j`. This is maybe possible to remove in the future. 

Exceptions thrown from Java are handled in the same way as `r cran("rJava")`, and printed messages are seen on the R console as expected. However `rJava` does something strange to messages from `System.out` that means they do not appear in knitr output. To resolve this a unsightly workaround (hack) has been adopted that collects messages from system out and prints them after the Java method has completed. This has the potential to cause all sorts of issues, which I think I have mostly resolved, but it is best described as a work in progress.

The logging level can be controlled at runtime by a function in the `JavaApi` root. Logging can be configured dynamically with a `log4j` properties file (not shown) to enable file based logging, for example.


```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/MoreFeatureTest.java"),"START_SNIP_2","END_SNIP_2")
```

```{r}
# System.out printing
moreFeat1$printMessage()

# Testing logging levels
J$changeLogLevel("ALL")
moreFeat1$testLogging()

# Suppressing errors
try(moreFeat1$throwCatchable(),silent = TRUE)

# Handling errors
tryCatch(
  {
    moreFeat1$throwRuntime()
  }, 
  error = function(e) {
    message("the error object has a set of classes: ",paste0(class(e),collapse=";"))
    warning("error: ",e$message)
    # the e$jobj entry gives native access to the throwable java object thanks to rJava.
    e$jobj$printStackTrace()
  }, 
  finally = print("finally")
)

J$changeLogLevel("ERROR")
moreFeat1$testLogging()

# J$reconfigureLog("/path/to/log4j.prop")
```

## Finalising and clean up

The Java objects bound to R instances will stay in memory whilst they are needed. When they go out of scope they should automatically be garbage collected as a native feature of `rJava`. R6 object finalizers are also generated when specified by the code and these are triggered during release of the Java objects, and may call any closing code needed in the Java library (e.g. closing input streams etc.). 

```{r}
feat1 = J$FeatureTest$new(logMessage = "Hello world. Creating a new object")
feat1$doHelloWorld()
```

When an object goes out of scope the finalizer will be called.
This can happen much later, and any errors thrown by the finalizer code
could cause issues. Code run in these finalizers can throw unchecked exceptions which are ignored and converted to logged errors.

```{r}

feat1 = NULL
gc()

```

The finalizer should also be called implicitly when the R6 object goes out of scope in R.

## Support for debugging

Debugging compiled Java code running in the context of a R is not for the faint-hearted. It definitely makes sense to test and debug the Java code in Java first. To make this possible it is useful to be able to serialise some test data in the exact format in which it will arrive in Java from R. To that end all the Java structures supported can be serialised, and de-serialised for testing purposes. The `testRapi` library presented here has a set of functions that facilitate this as static methods of `J$Serialiser`.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/Serialiser.java"),"SNIPPET_1","SNIPPET_2")
```

```{r}
s = tempfile(pattern = "diamonds", fileext = ".ser")
J$Serialiser$serialiseDataframe(dataframe = ggplot2::diamonds, filename = s)
J$Serialiser$deserialiseDataframe(filename=s) %>% glimpse()
```

With serialised test data, as dataframes, lists or named lists, development of Java functions and unit tests can be created that output values of the correct `RObject` datatype. Correct packaging and integration with R is a question of running `mvn install` to compile the Java into a jar file and generate R library code, then using `devtools::install` to install the generated R library. As you iterate development I have found it necessary to install the package and restart the session for R to pick up new changes in the compiled Java files. There is probably a cleaner way to do this but I haven't found it yet.

```BASH
# compile Java code and package R library using `mvn install` command
cd ~/Git/r6-generator-docs
mvn install
```

```R
setwd("~/Git/r6-generator-docs")
# remove previously installed versions
try(detach("package:testRapi", unload = TRUE),silent = TRUE)
remove.packages("testRapi")
# rm(list = ls()) may be required to clear old versions of the library code
# Restarting R maybe also required if there was a running java VM otherwise changes to the jars on the classpath are not picked up.
# install locally compiled R library:
devtools::install("~/Git/r6-generator-docs", upgrade = "never")
# N.B. devtools::load_all() does not appear to always successfully pick up changes in the compiled java code
```

For initial integration testing there is a debug flag in the maven `pom.xml`
that enables remote Java debugging to the initialized when the library is first
loaded in R. When set to true a Java debugging session on port 8998 is opened
which can be connected to as a remote Java application. This allows breakpoints
to be set on Java code and the state of the JVM to be inspected when Java code
is executed from R, however Java code changes cannot be hot-swapped into the
running JVM, and so debugging integration issues is relatively limited. For more
details see the [Maven configuration vignette](./R6-generator-maven-config).

There are other limitations with enabling Java debugging, not least being the
potential for port conflicts with multiple running instances of the development
library, and caching issues between running and loaded versions of the Java
code. Whilst not too painful (compared to the alternatives) this is very
definitely not a REPL experience and reserved for the last stage of debugging.
Part of the purpose of strongly enforcing a datatype conversion contract between
Java and R, and extensive use of code generation, is to decouple Java and R
development as much as possible (N.B. do as I say - not as I do).

## Asychronous and long running code

Java code that takes a long time to complete or requires interaction from the
user creates a problem for `rJava` as the program control is passed completely
to Java during the code execution. This can lock the R session until the Java
code is finished. The fact that the R session is blocked pending the result from
Java means there is no obvious way to terminate a running Java process from
within R, and if a Java process goes rogue then the R session hangs.

We have approached this by creating a `RFuture` class which is bundled in any R
package built with `r6-generator-maven-plugin`, and some Java infrastructure to
allow a Java method call, initiated by R, to be run in its own thread. The
thread is monitored using the `R6` `RFuture` class. This allows instantaneous
return from the Java call which executes asynchronously in the background,
freeing up the R process to continue. The `RFuture` class has functions to
`cancel()` a thread, or check whether it is complete (`isDone()`), cancelled
(`isCanceled()`), or to wait for the result and `get()` it.

The `RFuture` thread wrapper is used for Java methods annotated with `@RAsync`
instead of `@RMethod`.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_7","END_SNIP_7") #,35,51)
```

A basic test of this follows which starts the execution of a 10 second countdown in Java. The countdown 

```{r}
# J = testRapi::JavaApi$get(logLevel = "WARN")
featAsyn = J$FeatureTest$new("Async testing")
# The asyncCountdown resets a timer in the FeatureTest class
tmp = featAsyn$asyncCountdown()
message("Control returned immediately.")

Sys.sleep(4)
# The countdown is not finished
if (tmp$isDone()){
  stop("FAIL: Too soon for the countdown to have finished..!")
} else {
  message("PASS: 4 seconds later the countdown is still running.")
}

Sys.sleep(8)
if (!tmp$isDone()) {
  stop("FAIL: It should have been finished by now!")
} else {
  message("PASS: the countdown is finished.")
  # in this case getting the result returns nothing as the java method is void
  # but it should trigger printing the java output.
}
```
System output from asynchronous code can be very confusing if it appears out of
sequence to other code. The system output of Java code running asynchronously 
is cached and only displayed when the result is retrieved via `get()`

```{r}
tmp$get()
```

`RFuture` does not ensure thread safety, which in general is up to the Java programmer
however in the situation where you are annotating a non thread safe class that might be
used in an `@RAsync` annotated method there is a basic locking mechanism that 
prevents multiple synchronous calls of the same method in the same object.

```{r}
# Potential for race condition is prevented by the sychronise=true annotation
tmp = featAsyn$asyncCountdown()
tmp2 = featAsyn$asyncCountdown()
Sys.sleep(5)
if (tmp$cancel()) print("First counter cancelled.")
```

Although both counters were triggered at the same time the second one is 
waiting to obtain a lock. In this example we cancel the first call after 5
seconds:

```{r}
system.time({
  try(tmp$get())
})
```

After which the second call starts processing. If you are running this 
interactively you will notice a progress indicator appears.

```{r}
system.time({
  tmp2$get()
})
```
If the default  `@RAsync(synchronise=false)` is used then race conditions may 
occur if the Java method changes the state of other objects. 
This is demonstrated here where both methods are altering the underlying 
counter alternating. As before, the output is only displayed when the result is 
requested:

```{r}
# Potential for race condition is prevented by the sychronise=true annotation
system.time({
  tmp = featAsyn$asyncRaceCountdown()
  tmp2 = featAsyn$asyncRaceCountdown()
  tmp$get()
})
```

In this case the execution takes far less that 10 seconds as both countdowns are
running in parallel and using the same timer. The output from the second 
function

```{r}
system.time({
  tmp2$get()
})
```

The `RFuture` class is also useful to prevent lock-ups due to Java code entering
an infinite loop or waiting on external input that never arrives. Sometimes
blocking the R process is useful, as long as the Java process can be terminated
at the same time as the R process, so that we can be sure that a Java process is
finished. This is supported by the `@RBlocking` annotation which places the Java
method call in a thread that can be cleanly interrupted from R, but otherwise
makes R wait for Java to finish.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_8","END_SNIP_8") #,35,51)
```

```{r}
tmp = featAsyn$blockingCountdown()
```

Static methods are more likely to be type safe. Async methods can be static in
which case there is no potential for race conditions and we don't need to check
for them.

```{r results='asis', echo=FALSE}
codeSnip("java",
         filename=here::here("java/src/main/java/uk/co/terminological/rjava/test/FeatureTest.java"),"START_SNIP_9","END_SNIP_9") #,35,51)
```

```{r}
# debug(J$FeatureTest$asyncStaticCountdown)
tmp = J$FeatureTest$asyncStaticCountdown("hello 1",4)
tmp2 = J$FeatureTest$asyncStaticCountdown("hello 2",4)
Sys.sleep(5)
tmp$get()
tmp2$get()
```

## Parameters and return types in asynchronous methods

ASync and blocking methods are handled slightly different internally. When
writing a Java method you cannot use inputs that are primitives. All parameters 
must be subtypes of `RObject` such as `RInteger` rather than the primitive 
equivalent `int`. This is a result of dynamic type checking using reflection 
when calling the java method and may be dealt with in the future. Async
methods can happily return Java objects annotated with `@RClass` which will be
appropriately passed to R wrapped in an `R6` class.

```{r}
tmp3 = J$FeatureTest$asyncFactory()
result = tmp3$get()
result$generateFactorVec()
```

## Monitoring the status of long running operations

As long running jobs are in the background the status of all long running jobs 
may need to be queried. The status may be "cancelled", "in progress", "result ready"
or if the result has been already retrieved by `get()` it may be "processed".

```{r}
tmp = featAsyn$asyncCountdown()
status = testRapi::.background_status()
status
```

Previous results can be retrieved from this list using the id.

```{r}

oldFut = testRapi::.background_get_by_id(status$id[1])
oldFut$get()

```

Releasing old results may be necessary if memory is an issue. The tidy up clears
all processed and cancelled background tasks, and frees up associated JVM memory.

```{r}
testRapi::.background_tidy_up()
testRapi::.background_status()
```

# Summary

The `r6-generator-maven-plugin` can be used to generate an R package with `R6` classes that exposes selected Java methods to R. Given enough detail in Java the resulting generated R package can be quite feature rich and setup in a format ready to deploy to `r-universe`. The aim is to make the process of creating R clients for Java libraries easy and maintainable.