R and Java have quite different philosophies on data types, which means that lossless round-tripping of data through the limited expressiveness of the JNI bridge is a non-trivial exercise. This is why a code generation library is valuable: it removes the need for developers to understand the grim complexity of the details. The following areas proved to be the most difficult:
By inspecting the Java code at compilation time, and by imposing
constraints on the types of data transferred between R and Java, it is
possible to use a combination of convention, data-type coercion, and
type checking to ensure that inputs to Java code from R are type safe,
and (almost) 100% faithful copies of their R equivalents and vice-versa.
This differs in approach from the CRAN package jsr223, which performs
dynamic data conversion from R types to generic R data structures. This
is harder to make consistent and requires a degree of introspection
during the marshalling and unmarshalling process.
Enforcing rigid type systems on the interface between R and Java allows
the simpler transformation to be made, based on the strongly typed Java
code at compile time, which should result in faster but less flexible
data transfer.
R data types converted into Java.
Ensuring an R NULL value is correctly returned requires a placeholder
class in Java. This is uk.co.terminological.rjava.types.RNull, and it
enforces some of the identity constraints. Java void types can be best
represented as invisible(NULL), which is almost the same as not
returning anything from a method.
@RMethod
public RNull bounceNull(RNull x) {
    log.info("java: "+x);
    System.out.println(x.rCode());
    return x;
}
@RMethod
public void bounceVoid() {
    log.info("java void");
}
## Initialised testRapi
## Tests the round tripping of supported datatypes
## java: NULL
## NULL
## java void
Strings are the least contentious format, with high similarity between
R and Java so long as both sides use UTF-8. An R character vector can
be transferred to Java seamlessly, except for the fact that it is
transferred as an array of characters over JNI. A Java equivalent,
uk.co.terminological.rjava.types.RCharacter, is provided to mark the
object as an R-compatible datatype, although in general
java.lang.String can be used instead for singleton strings. For
character vectors there is
uk.co.terminological.rjava.types.RCharacterVector, which is a
specialised Java collection of
uk.co.terminological.rjava.types.RCharacter.
@RMethod
public String bounceString(String x) {
    log.info("java: "+x);
    return x;
}
@RMethod
public RCharacter bounceCharacter(RCharacter x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RCharacterVector bounceCharacterVector(RCharacterVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
## java: Hello
## [1] "Hello"
## java: Hello
## [1] "Hello"
## java: <rcharacter[2]>{Hello, World}
## [1] "Hello" "World"
Numerics in R can be represented in a number of ways in Java, with
different degrees of precision. Various Java types, such as
java.lang.Double, java.lang.Float, java.math.BigDecimal and
java.lang.Long, and their primitive counterparts double, float and
long, are all best represented by R numeric values. The
uk.co.terminological.rjava.types.RNumeric type allows Java programmers
to dynamically convert input R numeric values to a specific native Java
type. The uk.co.terminological.rjava.RConverter class provides various
methods to convert native Java types to
uk.co.terminological.rjava.types.RNumeric for output. These functions
also handle the special values Inf, -Inf, NaN and NA_real_ and their
equivalent values in Java. To support the use of native Java, singleton
RNumeric values can be substituted with the primitive Java double in
method signatures, as long as the inputs can never be NA_real_ and the
code is not run asynchronously. Vector numeric inputs from R are
specified by the uk.co.terminological.rjava.types.RNumericVector class,
which is a collection of RNumeric values.
@RMethod
public double bounceDouble(double x) {
    log.info("java: "+x);
    return x;
}
@RMethod
public RNumeric bounceNumeric(RNumeric x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RNumericVector bounceNumericVector(RNumericVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
## java: 1.23
## [1] 1.23
## Error in self$.api$.toJava$double(x) :
## cant use NA as input to java double
## java: 4.60
## [1] 4.6
## java: NA
## [1] NA
## java: Infinity
## [1] Inf
## java: -Infinity
## [1] -Inf
## java: NaN
## [1] NaN
## java: <rnumeric[3]>{2.00, 5.00, 34.0}
## [1] 2 5 34
## java: <rnumeric[7]>{2.30, 4.60, NA, 34.0, NaN, Infinity, -Infinity}
## [1] 2.3 4.6 NA 34.0 NaN Inf -Inf
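The restriction on primitive double inputs follows from Java giving NaN, Infinity and -Infinity first-class double values while having no separate notion of a missing value. As a minimal sketch of the idea (the NaAwareDouble class below is hypothetical and is not the library's RNumeric implementation), a wrapper can use a null reference to stand in for NA and refuse to unbox it:

```java
import java.util.Optional;

/** Hypothetical NA-aware numeric wrapper; not the library's RNumeric. */
final class NaAwareDouble {
    // A null reference stands in for R's NA_real_ (an assumption of this sketch).
    private final Double value;

    private NaAwareDouble(Double value) { this.value = value; }

    static NaAwareDouble of(double x) { return new NaAwareDouble(x); }
    static NaAwareDouble na() { return new NaAwareDouble(null); }

    boolean isNa() { return value == null; }

    /** Unboxes to a primitive, failing loudly because this sketch has no primitive NA. */
    double toPrimitive() {
        if (value == null) {
            throw new IllegalStateException("NA cannot be converted to a primitive double");
        }
        return value;
    }

    /** Safe access: empty when the value is NA. */
    Optional<Double> opt() { return Optional.ofNullable(value); }

    @Override
    public String toString() { return value == null ? "NA" : value.toString(); }

    public static void main(String[] args) {
        System.out.println(NaAwareDouble.of(Double.NaN));               // NaN is an ordinary value
        System.out.println(NaAwareDouble.of(Double.POSITIVE_INFINITY)); // so is Infinity
        System.out.println(NaAwareDouble.na());                         // NA needs the wrapper
    }
}
```

A method that accepts the wrapper can branch on isNa(), whereas a method that accepts a bare double has to reject NA at the boundary, which matches the error shown in the output above.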
In an identical way to the above,
uk.co.terminological.rjava.types.RInteger types hold integer values. R
inputs to Java functions are coerced to integers, and an error is thrown
if this is not possible. The primitive Java type int can be used for
singletons if they are not NA, and vectors are handled in the same way
as before with a specialist
uk.co.terminological.rjava.types.RIntegerVector. Non-integer numeric
values are not coerced to integer; instead a runtime error is thrown to
the user if they try to use the wrong type.
@RMethod
public int bounceInt(int x) {
    log.info("java: "+x);
    return x;
}
@RMethod
public RInteger bounceInteger(RInteger x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RIntegerVector bounceIntegerVector(RIntegerVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
## java: 1
## [1] 1
## java: 3
## [1] 3
## java: <rinteger[3]>{2, 3, 4}
## [1] 2 3 4
## java: <rinteger[3]>{2, 3, 4}
## [1] 2 3 4
## java: <rinteger[3]>{2, NA, 4}
## [1] 2 NA 4
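Exactly which numeric inputs can be treated as integers is not spelled out here. One plausible rule, sketched below as an assumption rather than as the library's actual check, is to accept a double only when it is a finite whole number within int range and to raise an error otherwise. The StrictInt class and its method name are hypothetical.

```java
/** Hypothetical strict double-to-int conversion; the library's real rule may differ. */
final class StrictInt {
    static int fromDouble(double x) {
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            throw new IllegalArgumentException("not a finite number: " + x);
        }
        if (x != Math.rint(x)) {
            throw new IllegalArgumentException("has a fractional part: " + x);
        }
        if (x < Integer.MIN_VALUE || x > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("out of int range: " + x);
        }
        return (int) x;
    }

    public static void main(String[] args) {
        System.out.println(fromDouble(3.0)); // whole number, accepted
        try {
            fromDouble(3.5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```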
Factors are somewhat complicated, as an individual factor value only
makes sense in the context of a vector of possible options. However, the
uk.co.terminological.rjava.types.RFactor type and
uk.co.terminological.rjava.types.RFactorVector collection allow
information to be retained about the values and labels of R factors.
uk.co.terminological.rjava.RConverter provides support for mapping R
factors to Java Enum classes, and for creating R ordered factors from
Enums.
@RMethod
public RFactor bounceFactor(RFactor x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RFactorVector bounceFactorVector(RFactorVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
## java: a
## [1] "a"
## java: <rfactor[5]>{a, b, c, b, a}
## [1] a b c b a
## Levels: a < b < c
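The correspondence between an ordered factor and a Java Enum can be pictured as follows. This is a sketch using only the standard library, not RConverter itself: the levels are taken from the enum constants in declaration order, and each value is given a 1-based code, as R does for factor values. The Level enum and the helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class EnumFactorSketch {
    /** Illustrative enum standing in for the levels a < b < c. */
    enum Level { a, b, c }

    /** Level labels in declaration order, which gives the factor ordering. */
    static <E extends Enum<E>> List<String> levels(Class<E> type) {
        List<String> out = new ArrayList<>();
        for (E constant : type.getEnumConstants()) {
            out.add(constant.name());
        }
        return out;
    }

    /** 1-based integer code for a value, mirroring how R stores factor values. */
    static int code(Enum<?> value) {
        return value.ordinal() + 1;
    }

    /** Looks a label up in the enum; throws IllegalArgumentException if it is not a level. */
    static <E extends Enum<E>> E fromLabel(Class<E> type, String label) {
        return Enum.valueOf(type, label);
    }

    public static void main(String[] args) {
        System.out.println(levels(Level.class));          // [a, b, c]
        System.out.println(code(Level.c));                // 3
        System.out.println(fromLabel(Level.class, "b"));  // b
    }
}
```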
Date support is provided by uk.co.terminological.rjava.types.RDate,
which allows R Date and POSIXt types to be represented in Java as
java.time.LocalDate values. Vectors of dates are supported as before,
through the uk.co.terminological.rjava.types.RDateVector class. There
is currently no support for datetime classes, but this is a possible
enhancement.
@RMethod
public RDate bounceDate(RDate x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RDateVector bounceDateVector(RDateVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
## java: 2001-02-03
## [1] "2001-02-03"
## java: 2001-02-03
## [1] "Date"
## java: <rdate[3]>{2001-02-03, 2001-02-04, 2001-02-05}
## [1] "2001-02-03" "2001-02-04" "2001-02-05"
## java: <rdate[3]>{2001-02-03, NA, 2001-02-05}
## [1] "2001-02-03" NA "2001-02-05"
## dates smaller than 0001-01-01 will be converted to NA
## java: NA
## [1] NA
## dates smaller than 0001-01-01 will be converted to NA
## java: <rdate[5]>{NA, 0001-01-01, 0011-01-01, 0101-01-01, 1001-01-01}
## [1] NA "1-01-01" "11-01-01" "101-01-01" "1001-01-01"
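R stores a Date as a number of days since 1970-01-01, which lines up directly with the epoch-day methods on java.time.LocalDate. The following sketch shows that correspondence. It represents NA as null, which is an assumption of the example and not necessarily how RDate does it, and the DateSketch class is hypothetical.

```java
import java.time.LocalDate;

final class DateSketch {
    /** Days since 1970-01-01 (R's Date representation) to LocalDate; null stands in for NA. */
    static LocalDate fromRDate(Double daysSinceEpoch) {
        if (daysSinceEpoch == null) {
            return null;
        }
        // Fractional days are floored, so the result is the calendar day containing the value.
        return LocalDate.ofEpochDay((long) Math.floor(daysSinceEpoch));
    }

    /** LocalDate back to days since 1970-01-01; null (NA) is passed through. */
    static Double toRDate(LocalDate date) {
        if (date == null) {
            return null;
        }
        return (double) date.toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate d = fromRDate(11356.0);
        System.out.println(d);          // 2001-02-03
        System.out.println(toRDate(d)); // 11356.0
    }
}
```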
R logicals are mapped to uk.co.terminological.rjava.types.RLogical
objects, which can represent NA_logical_ values faithfully. If NA
values are not needed then primitive boolean types can be substituted
as before, and vectors work as before.
@RMethod
public RLogical bounceLogical(RLogical x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RLogicalVector bounceLogicalVector(RLogicalVector x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    return x;
}
@RMethod
public RFile bounceFile(RFile x) {
    log.info("java: "+x);
    return x;
}
## java: true
## [1] TRUE
## java: <rlogical[3]>{true, true, false}
## [1] TRUE TRUE FALSE
## java: <rlogical[3]>{true, NA, false}
## [1] TRUE NA FALSE
Files in R need to be converted to absolute paths before they can be used in Java. Tilde expansion is also needed to correctly pick up the user's home directory. Relative paths are taken to be relative to the current working directory in R at the time the function is called. All of these edge cases can be ignored in Java. However, we do not enforce that the parent directory must exist; that is up to the Java developer.
## java: /github/home/tmp/test1
## /github/home/tmp/test1
## java: /tmp/RtmpJO3Sfa/Rbuild194b22d2c815/testRapi/tmp/test2
## /tmp/RtmpJO3Sfa/Rbuild194b22d2c815/testRapi/tmp/test2
## java: /tmp/RtmpJO3Sfa/Rbuild194b22d2c815/testRapi/vignettes/tmp/test3
## /tmp/RtmpJO3Sfa/Rbuild194b22d2c815/testRapi/vignettes/tmp/test3
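The normalisation described for file paths happens before a path reaches Java, but its effect can be sketched with java.nio.file. In this illustration the home and working directories are passed in explicitly so that the behaviour is easy to see. The PathSketch class and its method are hypothetical, and the real conversion may handle more cases than this.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathSketch {
    /**
     * Expands a leading "~" against home, resolves relative paths against
     * workingDir, and removes "." and ".." segments. The file need not exist.
     */
    static Path normalise(String input, Path home, Path workingDir) {
        Path p;
        if (input.equals("~")) {
            p = home;
        } else if (input.startsWith("~/")) {
            p = home.resolve(input.substring(2));
        } else {
            p = Paths.get(input);
        }
        if (!p.isAbsolute()) {
            p = workingDir.resolve(p);
        }
        return p.normalize();
    }

    public static void main(String[] args) {
        Path home = Paths.get(System.getProperty("user.home"));
        Path cwd = Paths.get("").toAbsolutePath();
        System.out.println(normalise("~/tmp/test1", home, cwd));
        System.out.println(normalise("tmp/test2", home, cwd));
        System.out.println(normalise("../tmp/test3", home, cwd));
    }
}
```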
In Java, R dataframes are modelled as a named list of
uk.co.terminological.rjava.types.RVector<?>, each holding columnar data
of unspecified type. This is represented internally as a column-wise
Map, but the uk.co.terminological.rjava.types.RDataframe class contains
a number of methods to make using dataframes intuitive in Java. These
include support for the Iterable and Stream interfaces, operating
row-wise over the data. The
uk.co.terminological.rjava.types.RBoundDataframe can map typed columns
to a stream of proxy objects satisfying an interface specification.
This can be used to convert an RDataframe into a stream of custom POJOs
(more examples TBD). The dataframe can support any column with the
vector data types mentioned above. At present, however, it does not
support named rows, as the focus is on tidy dataframes, nor does it
support purrr-style list columns.
@RMethod
public RDataframe bounceDataframe(RDataframe x) {
    log.info("java: "+x);
    //System.out.println(x.rCode());
    //System.out.println(x.rConversion());
    return x;
}
testDf = tibble::tibble(
grp = c("A","A","A","B","B","B"),
x=c(0,1,2,4,5,6),
y=c(3L,2L,1L,-1L,-2L,-3L)
)
testDf = dplyr::group_by(testDf,grp,x)
b$bounceDataframe(testDf)
## java: groups: [grp, x]
## grp: <rcharacter[6]>{A, A, A, B, B, B}
## x: <rnumeric[6]>{0.00, 1.00, 2.00, 4.00, 5.00, 6.00}
## y: <rinteger[6]>{3, 2, 1, -1, -2, -3}
## # A tibble: 6 × 3
## # Groups: grp, x [6]
## grp x y
## <chr> <dbl> <int>
## 1 A 0 3
## 2 A 1 2
## 3 A 2 1
## 4 B 4 -1
## 5 B 5 -2
## 6 B 6 -3
b$bounceDataframe(tibble::tibble(
u=factorVec[1:3],
v=c(TRUE,NA,FALSE),
w=c("alpha",NA,"gamma"),
x=c(0,1,2),
y=c(3L,2L,1L),
z=as.Date(c("2001-02-03",NA,"2001-02-05"))
))
## java: groups: []
## u: <rfactor[3]>{a, b, c}
## v: <rlogical[3]>{true, NA, false}
## w: <rcharacter[3]>{alpha, NA, gamma}
## x: <rnumeric[3]>{0.00, 1.00, 2.00}
## y: <rinteger[3]>{3, 2, 1}
## z: <rdate[3]>{2001-02-03, NA, 2001-02-05}
## # A tibble: 3 × 6
## u v w x y z
## <ord> <lgl> <chr> <dbl> <int> <date>
## 1 a TRUE alpha 0 3 2001-02-03
## 2 b NA <NA> 1 2 NA
## 3 c FALSE gamma 2 1 2001-02-05
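The idea of storing columns while iterating over rows can be illustrated with plain collections. The sketch below is not RDataframe: it keeps a LinkedHashMap from column name to a list of values and exposes a row-wise Stream. ColumnStore and its methods are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class ColumnStore {
    // Column name -> column values; insertion order keeps the columns in order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();

    /** Adds a column, checking that it is as long as the columns already present. */
    ColumnStore withCol(String name, Object... values) {
        if (!columns.isEmpty() && values.length != nrow()) {
            throw new IllegalArgumentException("column " + name + " has the wrong length");
        }
        columns.put(name, new ArrayList<>(Arrays.asList(values)));
        return this;
    }

    int nrow() {
        return columns.isEmpty() ? 0 : columns.values().iterator().next().size();
    }

    /** Row-wise view: each row is a map from column name to that row's value. */
    Stream<Map<String, Object>> rows() {
        return IntStream.range(0, nrow()).mapToObj(i -> {
            Map<String, Object> row = new LinkedHashMap<>();
            columns.forEach((name, col) -> row.put(name, col.get(i)));
            return row;
        });
    }

    public static void main(String[] args) {
        ColumnStore df = new ColumnStore()
            .withCol("grp", "A", "A", "B")
            .withCol("x", 0.0, 1.0, 4.0)
            .withCol("y", 3, 2, -1);
        df.rows().forEach(System.out::println); // first row prints {grp=A, x=0.0, y=3}
    }
}
```

Keeping the data column-wise matches how R holds a dataframe, while the row-wise stream is the shape most Java code wants to consume.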
## java: <rnumeric[10]>{
## <rnumeric[10]>{0.00, 0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900},
## <rnumeric[10]>{1.00, 1.10, 1.20, 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90},
## <rnumeric[10]>{2.00, 2.10, 2.20, 2.30, 2.40, 2.50, 2.60, 2.70, 2.80, 2.90},
## <rnumeric[10]>{3.00, 3.10, 3.20, 3.30, 3.40, 3.50, 3.60, 3.70, 3.80, 3.90},
## <rnumeric[10]>{4.00, 4.10, 4.20, 4.30, 4.40, 4.50, 4.60, 4.70, 4.80, 4.90},
## <rnumeric[10]>{5.00, 5.10, 5.20, 5.30, 5.40, 5.50, 5.60, 5.70, 5.80, 5.90},
## <rnumeric[10]>{6.00, 6.10, 6.20, 6.30, 6.40, 6.50, 6.60, 6.70, 6.80, 6.90},
## <rnumeric[10]>{7.00, 7.10, 7.20, 7.30, 7.40, 7.50, 7.60, 7.70, 7.80, 7.90},
## <rnumeric[10]>{8.00, 8.10, 8.20, 8.30, 8.40, 8.50, 8.60, 8.70, 8.80, 8.90},
## <rnumeric[10]>{9.00, 9.10, 9.20, 9.30, 9.40, 9.50, 9.60, 9.70, 9.80, 9.90}
## }
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
## [2,] 0.1 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1
## [3,] 0.2 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2
## [4,] 0.3 1.3 2.3 3.3 4.3 5.3 6.3 7.3 8.3 9.3
## [5,] 0.4 1.4 2.4 3.4 4.4 5.4 6.4 7.4 8.4 9.4
## [6,] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
## [7,] 0.6 1.6 2.6 3.6 4.6 5.6 6.6 7.6 8.6 9.6
## [8,] 0.7 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7
## [9,] 0.8 1.8 2.8 3.8 4.8 5.8 6.8 7.8 8.8 9.8
## [10,] 0.9 1.9 2.9 3.9 4.9 5.9 6.9 7.9 8.9 9.9
## java: <rnumeric[2]>{
## <rnumeric[4]>{
## <rnumeric[8]>{0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00},
## <rnumeric[8]>{8.00, 9.00, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0},
## <rnumeric[8]>{16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0},
## <rnumeric[8]>{24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0}
## },
## <rnumeric[4]>{
## <rnumeric[8]>{32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0},
## <rnumeric[8]>{40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0},
## <rnumeric[8]>{48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0},
## <rnumeric[8]>{56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0}
## }
## }
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 0 8 16 24
## [2,] 1 9 17 25
## [3,] 2 10 18 26
## [4,] 3 11 19 27
## [5,] 4 12 20 28
## [6,] 5 13 21 29
## [7,] 6 14 22 30
## [8,] 7 15 23 31
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 32 40 48 56
## [2,] 33 41 49 57
## [3,] 34 42 50 58
## [4,] 35 43 51 59
## [5,] 36 44 52 60
## [6,] 37 45 53 61
## [7,] 38 46 54 62
## [8,] 39 47 55 63
In R, lists and named lists are complex objects holding optionally
named sequences of arbitrarily typed data. They are analogous to JSON
objects, and it is tempting to serialise all R lists to JSON and use a
JSON library to interpret them in Java. This would be possible but would
lose some of the support built into the R classes mentioned above. As
such we took a hybrid approach, where R lists and named lists are
dynamically and recursively mapped to collection types from R to Java,
and exported back from Java to R serialised as a character string
containing R code, which is evaluated by the R interpreter. Despite
being somewhat hacky, this does a surprisingly good job of transferring
lists from R to Java and back to R faithfully. However, it is probably
not well suited to very large lists, and it definitely could not support
lists that have cyclical structures in the object graph. To support
fluent use of R lists in Java, all classes that derive from RObject
support the visitor pattern, which can be used to select out datatypes
of interest relatively simply. Support for an XPath-like syntax to
access specific elements of nested lists is planned.
@RMethod
public RList bounceList(RList x) {
    log.info("java: "+x);
    System.out.println(x.rCode());
    return x;
}
@RMethod
public RNamedList bounceNamedList(RNamedList x) {
    log.info("java: "+x);
    System.out.println(x.rCode());
    return x;
}
## java: <rlist>{
## a,
## b,
## c,
## <rnumeric[3]>{1.00, 2.00, 3.00}
## }
## list('a', 'b', 'c', c(1.0, 2.0, 3.0))
## [[1]]
## [1] "a"
##
## [[2]]
## [1] "b"
##
## [[3]]
## [1] "c"
##
## [[4]]
## [1] 1 2 3
## java: <rlist>{
## a,
## <rlist>{
## b,
## 1.00,
## z
## },
## c,
## <rnumeric[3]>{1.00, 2.00, 3.00}
## }
## list('a', list('b', 1.0, 'z'), 'c', c(1.0, 2.0, 3.0))
## java: <rnamedlist>{
## x: a,
## b: <rcharacter[3]>{a, NA, c},
## c: 1.00
## }
## list(x='a', b=c('a', NA, 'c'), c=1.0)
## $x
## [1] "a"
##
## $b
## [1] "a" NA "c"
##
## $c
## [1] 1
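Exporting a nested structure as R source text amounts to a recursive pretty-printer. The following is a simplified sketch of that idea over plain Java collections, not the library's rCode() implementation: strings are single-quoted, double[] becomes c(...), a List becomes list(...) and a Map becomes a named list. It assumes map keys are syntactically valid R names and, like any naive recursion, it would not terminate on a cyclic structure. The RCodeSketch class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RCodeSketch {
    /** Renders plain Java values as R source text; unsupported types are rejected. */
    static String toRCode(Object o) {
        if (o == null) {
            return "NULL";
        } else if (o instanceof String) {
            // Escape backslashes first, then single quotes.
            String s = ((String) o).replace("\\", "\\\\").replace("'", "\\'");
            return "'" + s + "'";
        } else if (o instanceof Double) {
            return num((Double) o);
        } else if (o instanceof Integer) {
            return o + "L";
        } else if (o instanceof Boolean) {
            return ((Boolean) o) ? "TRUE" : "FALSE";
        } else if (o instanceof double[]) {
            return Arrays.stream((double[]) o)
                .mapToObj(RCodeSketch::num)
                .collect(Collectors.joining(", ", "c(", ")"));
        } else if (o instanceof List) {
            return ((List<?>) o).stream()
                .map(RCodeSketch::toRCode)
                .collect(Collectors.joining(", ", "list(", ")"));
        } else if (o instanceof Map) {
            return ((Map<?, ?>) o).entrySet().stream()
                .map(e -> e.getKey() + "=" + toRCode(e.getValue()))
                .collect(Collectors.joining(", ", "list(", ")"));
        }
        throw new IllegalArgumentException("unsupported type: " + o.getClass());
    }

    /** Java spells the special values differently from R, so translate them. */
    private static String num(double d) {
        if (Double.isNaN(d)) {
            return "NaN";
        }
        if (Double.isInfinite(d)) {
            return d > 0 ? "Inf" : "-Inf";
        }
        return Double.toString(d);
    }

    public static void main(String[] args) {
        Object nested = Arrays.asList(
            "a", Arrays.asList("b", 1.0, "z"), "c", new double[] {1.0, 2.0, 3.0});
        System.out.println(toRCode(nested));
        // list('a', list('b', 1.0, 'z'), 'c', c(1.0, 2.0, 3.0))
    }
}
```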
So far we have concentrated on the use case of transferring data from
R to Java and back again. However, we also wish to be able to rapidly
create data in Java that is going to be faithfully preserved in R. To
this end we have created a number of type converters, builder functions
and collectors that help to marshal native Java data into RObjects.
The RPrimitive interface possesses a range of factory methods to
generate appropriately typed RPrimitives from Java primitives, boxed
types and Enums.
@RMethod
public RCharacter generateCharacter() {
    return RPrimitive.of("Hello world");
}
@RMethod
public RNumeric generateNumeric() {
    return RPrimitive.of(123.0);
}
@RMethod
public RInteger generateInteger() {
    return RPrimitive.of(345);
}
public static enum Test {
    ONE,TWO,THREE
}
@RMethod
public RFactor generateFactor() {
    return RPrimitive.of(Test.ONE);
}
@RMethod
public RLogical generateLogical() {
    return RPrimitive.of(true);
}
## Tests the java creation of supported datatypes
## [1] "Hello world"
## [1] 123
## [1] 345
## [1] "ONE"
## [1] TRUE
Similarly, the RVector class supports a range of fluent builder
methods which allow de novo creation of correctly typed RVectors. The
RConverter class also provides a range of collectors that facilitate
mapping Java Streams to R vectors.
@RMethod
public RCharacterVector generateCharacterVec() {
    return RVector.with("Hello world","Ola el mundo","Bonjour le monde", null);
}
@RMethod
public RNumericVector generateNumericVec() {
    return DoubleStream
        .of(3.0, 4.3, 2.1, 2.3).boxed()
        .collect(RConverter.doubleCollector());
}
@RMethod
public RIntegerVector generateIntegerVec() {
    return RVector.with(345, 678, null, 89);
}
@RMethod
public RFactorVector generateFactorVec() {
    return RVector.with(Test.ONE,Test.THREE,null,Test.TWO);
}
@RMethod
public RLogicalVector generateLogicalVec() {
    return RVector.with(true,false,null);
}
## [1] ONE THREE <NA> TWO
## Levels: ONE < TWO < THREE
## [1] 345 678 NA 89
## [1] "Hello world" "Ola el mundo" "Bonjour le monde" NA
## [1] 3.0 4.3 2.1 2.3
## [1] TRUE FALSE NA
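A collector of the kind used for the numeric vector can be built with Collector.of from the standard library. The sketch below collects a stream into an unmodifiable list, with null standing in for NA. It illustrates the general mechanism only and does not reproduce RConverter.doubleCollector(); the VectorCollectors class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class VectorCollectors {
    /** Collects elements, nulls included, into an unmodifiable list. */
    static <T> Collector<T, List<T>, List<T>> toVector() {
        return Collector.of(
            ArrayList::new,                 // supplier: empty accumulator
            List::add,                      // accumulator: append one element
            (left, right) -> {              // combiner: merge partial results
                left.addAll(right);
                return left;
            },
            Collections::unmodifiableList); // finisher: freeze the result
    }

    public static void main(String[] args) {
        List<Double> v = Stream.of(3.0, 4.3, null, 2.3).collect(toVector());
        System.out.println(v); // [3.0, 4.3, null, 2.3]
    }
}
```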
Both the RDataframe and RList classes implement fluent methods to
allow the creation of complex data structures in a manner familiar to
Java programmers. Again, RConverter provides specialised collectors to
map a stream of objects representing sequential rows of data to the
columnar format of the RDataframe, using MapRules specified with
functional lambda syntax to define the mapping from object to dataframe
column.
@RMethod
public RDataframe generateDataframe() {
    return RDataframe.create()
        .withCol("A", RVector.with(3.0, 4.3, 2.1))
        .withCol("B", RVector.with(Test.ONE,Test.THREE,Test.TWO))
        .withCol("C", RVector.with("Hello world","Ola el mundo","Bonjour le monde"));
}
@RMethod
public RDataframe generateStreamDataframe() {
    return
        Arrays.asList("Hello","World","Stream","Support","in","Java")
            .stream()
            .collect(RConverter.dataframeCollector(
                RConverter.mapping("original", s-> s),
                RConverter.mapping("lowercase", s-> s.toLowerCase()),
                RConverter.mapping("uppercase", s-> s.toUpperCase()),
                RConverter.mapping("subst", s-> s.substring(0,Math.min(3,s.length()))),
                RConverter.mapping("length", s-> s.length())
            ));
}
## # A tibble: 3 × 3
## A B C
## <dbl> <ord> <chr>
## 1 3 ONE Hello world
## 2 4.3 THREE Ola el mundo
## 3 2.1 TWO Bonjour le monde
## # A tibble: 6 × 5
## original lowercase uppercase subst length
## <chr> <chr> <chr> <chr> <int>
## 1 Hello hello HELLO Hel 5
## 2 World world WORLD Wor 5
## 3 Stream stream STREAM Str 6
## 4 Support support SUPPORT Sup 7
## 5 in in IN in 2
## 6 Java java JAVA Jav 4
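Turning a stream of row objects into columns can be sketched with named extractor functions. The code below is a stand-alone illustration that collects into a map from column name to list of values. It is not RConverter.dataframeCollector, and the ColumnCollector class with its mapping and toColumns helpers is invented for the example. The lambdas are explicitly typed here simply to keep type inference straightforward.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class ColumnCollector {
    /** Pairs a column name with the function that extracts its value from a row object. */
    static <T> Map.Entry<String, Function<T, Object>> mapping(String name, Function<T, Object> f) {
        return new SimpleEntry<>(name, f);
    }

    /** Builds a collector that applies every rule to every row and appends to that rule's column. */
    @SafeVarargs
    static <T> Collector<T, ?, Map<String, List<Object>>> toColumns(
            Map.Entry<String, Function<T, Object>>... rules) {
        return Collector.<T, Map<String, List<Object>>>of(
            () -> {
                // One empty column per rule, in the order the rules were given.
                Map<String, List<Object>> cols = new LinkedHashMap<>();
                for (Map.Entry<String, Function<T, Object>> r : rules) {
                    cols.put(r.getKey(), new ArrayList<>());
                }
                return cols;
            },
            (cols, row) -> {
                for (Map.Entry<String, Function<T, Object>> r : rules) {
                    cols.get(r.getKey()).add(r.getValue().apply(row));
                }
            },
            (a, b) -> {
                // Merge partial results column by column.
                b.forEach((name, values) -> a.get(name).addAll(values));
                return a;
            });
    }

    public static void main(String[] args) {
        Map<String, List<Object>> cols = Stream.of("Hello", "World", "in")
            .collect(toColumns(
                mapping("original", (String s) -> s),
                mapping("uppercase", (String s) -> s.toUpperCase()),
                mapping("length", (String s) -> s.length())));
        System.out.println(cols);
        // {original=[Hello, World, in], uppercase=[HELLO, WORLD, IN], length=[5, 5, 2]}
    }
}
```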
Finally, a note on list generation: Enum values in Java lists cannot
always be converted to factors in R. If this is not possible then the
conversion will fall back to a character string containing the label of
the factor value.
/**
 * Lists are much harder to type check than vectors hence RList builder methods throw checked exceptions
 * @return a RList containing the supplied Java objects converted into RObjects
 * @throws UnconvertableTypeException if those objects are not themselves or cannot be converted into an RObject
 */
@RMethod
public RList generateList() throws UnconvertableTypeException {
    return RList.withRaw("one", Test.TWO, 3.0);
}
@RMethod
public RNamedList generateNamedList() throws UnconvertableTypeException {
    return RNamedList
        .withRaw("A","one")
        .andRaw("B", Test.TWO)
        .andRaw("C", RVector.with(3.0, 4.3, 2.1));
}
## [[1]]
## [1] "one"
##
## [[2]]
## [1] "TWO"
##
## [[3]]
## [1] 3
## $A
## [1] "one"
##
## $B
## [1] "TWO"
##
## $C
## [1] 3.0 4.3 2.1