The runtime Java library provides a number of utility classes to facilitate data marshalling between Java and R. This is an incomplete overview of some of the patterns you can use to convert Java data to an R-friendly form.
The main imports that are useful in Java code that provides an R API are:
import uk.co.terminological.rjava.types.*;
import static uk.co.terminological.rjava.RConverter.*;
import static uk.co.terminological.rjava.Rule.MapRule.*;
import uk.co.terminological.rjava.IncompatibleTypeException;
import uk.co.terminological.rjava.NameNotFoundException;
import uk.co.terminological.rjava.RConverter;
import uk.co.terminological.rjava.utils.RFunctions;
import uk.co.terminological.rjava.RName;
Unwrapping data that arrives from R is usually a matter of calling .get() on the RObject, which should give you the standard Java object. For primitives this will be in their boxed form, with NA values represented as null. A .opt() call will attempt to coerce the input to a specific Java class and give you the result as an Optional if it can do so. In this case an empty Optional may mean either an NA or a value in R that is incompatible with the requested class.
RDate date = RDate.from("2020-02-02");
assertTrue(date.opt(LocalDate.class).isPresent());
assertTrue(date.get().toString().equals("2020-02-02"));
// RPrimitive.opt(Class<?>) returns an Optional. NA or incompatible types both come out as Optional.empty()
RNumeric num = RNumeric.from(123.456);
assertTrue(num.opt(BigDecimal.class).isPresent());
assertTrue(num.opt(Float.class).isPresent());
assertTrue(num.opt(Long.class).isPresent());
// Will return an Optional.empty() as not a String
assertTrue(!num.opt(String.class).isPresent());
// RPrimitive.get() will return the main underlying implementation type (in this case a Double) or a null for a NA
assertTrue(num.get().doubleValue() == 123.456);
assertTrue(num.get(Float.class).equals(123.456F));
// NA values are returned as Optional.empty() by RPrimitive.opt()
assertTrue(!RInteger.NA.opt().isPresent());
assertTrue(RInteger.from(234).opt().isPresent());
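The get() and opt() contract shown above can be mimicked with plain JDK types. The following is a minimal sketch under stated assumptions, not the library's implementation: NumericBox and its methods are hypothetical stand-ins that hold a boxed Double (with null playing the role of NA) and only know how to coerce to a handful of numeric classes.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical stand-in for an R numeric wrapper; not part of the rjava runtime.
final class NumericBox {
    private final Double value; // null plays the role of R's NA

    NumericBox(Double value) {
        this.value = value;
    }

    // Returns the underlying boxed value, or null for NA.
    Double get() {
        return value;
    }

    // Returns the value coerced to the requested class, or empty for NA
    // and for classes this sketch does not know how to produce.
    <T> Optional<T> opt(Class<T> type) {
        if (value == null) return Optional.empty();
        Object out;
        if (type == Double.class) out = value;
        else if (type == Float.class) out = value.floatValue();
        else if (type == Long.class) out = value.longValue();
        else if (type == BigDecimal.class) out = BigDecimal.valueOf(value);
        else return Optional.empty();
        return Optional.of(type.cast(out));
    }
}
```

As in the library example, a caller of this sketch cannot tell from an empty Optional whether the value was missing or merely not convertible; a caller that needs to distinguish the two has to check for NA separately.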
RVectors can be created from Java collections, arrays or Streams, using either static methods on the RVector class, the RConverter static collector methods, or the convert() method.
// Java boxed arrays can be directly converted to vector from RVector
// Integer.MIN_VALUE inputs are converted to nulls silently
Integer[] tmp = {1,3,5,7, null, Integer.MIN_VALUE, Integer.MIN_VALUE+1, 0};
RIntegerVector col0 = RVector.with(tmp);
RIntegerVector col1 = convert(tmp);
assertTrue(col1.get(4).isNa());
assertTrue(col1.get(5).isNa());
assertFalse(col1.get(6).isNa());
assertTrue(!col1.get(4).opt().isPresent());
// Alternatively RConverter has a set of collectors for Streams of convertible types.
RIntegerVector col2 = Stream.of(tmp).collect(integerCollector());
assertTrue(col1.equals(col0));
assertTrue(col1.equals(col2));
// Conversion using a CollectionConverter to convert arbitrary collection types
RIntegerVector col3 = using(integerCollector()).convert(tmp);
RIntegerVector col4 = using(integerCollector()).convert(Arrays.asList(tmp));
RIntegerVector col5 = using(integerCollector()).convert(Arrays.asList(tmp).iterator());
RIntegerVector col6 = using(integerCollector()).convert(Stream.of(tmp));
// These should all be the same
assertTrue(col1.equals(col3));
assertTrue(col1.equals(col4));
assertTrue(col1.equals(col5));
assertTrue(col1.equals(col6));
// But Converting collectors can also handle singleton conversions
RIntegerVector i1 = using(integerCollector()).convert(tmp[0]);
assertTrue(i1.get(0) instanceof RInteger);
assertTrue(col1.get(0).equals(i1.get(0)));
// Primitive arrays are slightly less flexible but RConverter can process them
// nulls are not allowed but the "equivalent" to NA is Integer.MIN_VALUE = RInteger.NA_INT
// If they are converted to a boxed form by an IntStream they can be collected as above
int[] tmp2 = {1,3,5,7, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE+1, 0};
RIntegerVector col7 = convert(tmp2);
RIntegerVector col8 = IntStream.of(tmp2).boxed().collect(integerCollector());
assertTrue(col1.equals(col7));
assertTrue(col1.equals(col8));
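The NA handling in the example above rests on one convention: Integer.MIN_VALUE is the primitive stand-in for NA, and null is its boxed counterpart. A minimal sketch of that round trip using only the JDK follows. The class and method names are hypothetical and are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical helper illustrating the NA sentinel convention; not the rjava API.
final class NaSentinelSketch {
    static final int NA_INT = Integer.MIN_VALUE; // the value R uses for NA_integer_

    // Boxed view of a primitive array: the sentinel becomes null.
    static List<Integer> box(int[] values) {
        List<Integer> out = new ArrayList<>(values.length);
        for (int v : values) out.add(v == NA_INT ? null : v);
        return out;
    }

    // Collector from boxed values (possibly null) back to a primitive array,
    // writing the sentinel wherever a null was seen.
    static Collector<Integer, List<Integer>, int[]> toIntArray() {
        return Collector.of(
            ArrayList::new,
            List::add,
            (a, b) -> { a.addAll(b); return a; },
            list -> list.stream().mapToInt(i -> i == null ? NA_INT : i).toArray()
        );
    }
}
```

Note that the sentinel makes a genuine Integer.MIN_VALUE indistinguishable from NA, which is why Integer.MIN_VALUE + 1 is the smallest value that survives the round trip.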
Other RVector examples show the use of the RConverter collector methods for different data types:
// Dates: the difficulty is obtaining a stream that may contain null dates
// Collect streams of other data types:
RDateVector rdv = Stream
.of("2020-03-01","2020-03-02","2020-03-03",null,"2020-03-05")
.collect(dateFromStringCollector());
RLogicalVector rlv1 = rdv.stream().map(d -> d.isNa()).collect(booleanCollector());
//or use RConverter.convert on arrays
boolean[] bools = {false,false,false,true,false};
boolean[] bools2 = {false,false,false,true,true};
assertTrue(rlv1.equals(convert(bools)));
assertTrue(!rlv1.equals(convert(bools2)));
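A collector such as dateFromStringCollector() has to cope with null entries in the stream. As a rough JDK-only illustration of that idea, nulls can be carried through the parse step and then turned into an NA mask. The names below are hypothetical, and the sketch assumes the input strings are ISO-8601 dates.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the rjava runtime.
final class DateMaskSketch {
    // Parses ISO-8601 date strings, keeping null (NA) entries as null.
    static List<LocalDate> parse(Stream<String> dates) {
        return dates
            .map(s -> s == null ? null : LocalDate.parse(s))
            .collect(Collectors.toList());
    }

    // One boolean per element: true where the date is missing.
    static boolean[] naMask(List<LocalDate> dates) {
        boolean[] mask = new boolean[dates.size()];
        for (int i = 0; i < mask.length; i++) {
            mask[i] = dates.get(i) == null;
        }
        return mask;
    }
}
```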
When inputting data, it is possible to convert dataframe input by defining an interface using @RName annotations where the name matches the column name. The type of the data must also be given, as the columns in an RDataframe class are loosely typed.
import uk.co.terminological.rjava.RName;
import uk.co.terminological.rjava.types.RFactor;
import uk.co.terminological.rjava.types.RInteger;
import uk.co.terminological.rjava.types.RNumeric;
public interface DiamondPOJO {
@RName("carat") public RNumeric getCarat();
@RName("depth") public RNumeric getDepth();
@RName("table") public RNumeric getTable();
@RName("price") public RInteger getPrice();
@RName("clarity") public RFactor getClarity();
}
Once an interface is defined, binding it to a dataframe and streaming the result gives a stream of objects implementing that interface:
Stream<DiamondPOJO> bound = FeatureTest
.diamonds()
.attach(DiamondPOJO.class)
.streamCoerce();
double averageDepth = bound.mapToDouble(d -> d.getDepth().get()).average().getAsDouble();
System.out.print("average depth: "+averageDepth);
// mean(diamonds$depth)
// [1] 61.7494
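One way annotation-driven binding of this kind can work is with java.lang.reflect.Proxy. The following is only a sketch of the general technique under stated assumptions, not the library's implementation: the Column annotation, the Row interface and the RowBinder class are hypothetical, and a row is modelled as a plain Map from column name to value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical analogue of @RName: names the column a getter reads from.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Column {
    String value();
}

// Hypothetical row interface with boxed return types.
interface Row {
    @Column("depth") Double getDepth();
    Integer price(); // no annotation: the method name is used as the column name
}

final class RowBinder {
    // Returns a proxy whose methods look up values in the given row map.
    // Only the interface's own methods are supported in this sketch.
    static <T> T bind(Class<T> type, Map<String, Object> row) {
        Object proxy = Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] { type },
            (self, method, args) -> {
                Column col = method.getAnnotation(Column.class);
                String name = col != null ? col.value() : method.getName();
                if (!row.containsKey(name)) {
                    throw new IllegalArgumentException("no column named " + name);
                }
                // Throws ClassCastException if the stored value has the wrong type.
                return method.getReturnType().cast(row.get(name));
            });
        return type.cast(proxy);
    }
}
```

The type check happens only when a getter is called, which mirrors the point above that the declared return type is what gives a loosely typed column its Java type.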
In another example you can see that the @RName annotation is only needed if the method name does not match the column name exactly (which it will not if you are following POJO getter conventions).
The interface can define default methods which can be used to transform the data.
// N.B. interface specification must be public
public static interface Diamonds {
@RName("carat") public RNumeric getCarats();
@RName("cut") public RFactor getCuts();
public RInteger price(); // no @RName needed when the method name matches the column name
public default void print() {
System.out.println("price: "+this.price()+"; carats: "+this.getCarats() + "; cut: "+this.getCuts());
}
}
@Test
final void testDataframeCoercion() throws IOException, UnconvertableTypeException {
RDataframe dia = getDiamonds();
//Test object binding and default interface methods:
dia.stream(Diamonds.class).limit(10).forEach(Diamonds::print);
System.out.println(""+dia.pull("price",RIntegerVector.class).get().collect(Collectors.averagingDouble(x -> (double) x)));
dia.attach(Diamonds.class).getCoercedRow(100).getCuts().get();
dia.attach(Diamonds.class).getRow(100).lag().coerce().getCuts().get();
}
A stream of Java objects can be collected into an R data frame using a dataframe collector. This defines the mappings from object to value associated with each column name. The result is an RDataframe that can be passed back to R as the result of a Java method.
//Use a stream + dataframe collector to generate data frame:
Arrays.asList("Hello","World","Stream","Support","in","Java")
.stream()
.collect(dataframeCollector(
mapping("original", s-> s),
mapping("lowercase", s-> s.toLowerCase()),
mapping("uppercase", s-> s.toUpperCase()),
mapping("subst", s-> s.substring(0,Math.min(3,s.length()))),
mapping("length", s-> s.length())
));
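To make the idea of a column-mapping collector concrete, here is a small JDK-only sketch. It is an illustration under stated assumptions rather than the library's dataframeCollector: ColumnCollectorSketch, Mapping, mapping and toColumns are hypothetical names, and the "data frame" is just an insertion-ordered map of equal-length columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;

// Hypothetical sketch; not part of the rjava runtime.
final class ColumnCollectorSketch {
    // A column name paired with the function that computes its value per element.
    static final class Mapping<T> {
        final String name;
        final Function<? super T, ?> fn;

        Mapping(String name, Function<? super T, ?> fn) {
            this.name = name;
            this.fn = fn;
        }
    }

    static <T> Mapping<T> mapping(String name, Function<? super T, ?> fn) {
        return new Mapping<>(name, fn);
    }

    // Collects a stream into one list per mapping, keyed by column name.
    @SafeVarargs
    static <T> Collector<T, ?, Map<String, List<Object>>> toColumns(Mapping<T>... mappings) {
        return Collector.of(
            () -> {
                Map<String, List<Object>> cols = new LinkedHashMap<>();
                for (Mapping<T> m : mappings) cols.put(m.name, new ArrayList<>());
                return cols;
            },
            (cols, item) -> {
                for (Mapping<T> m : mappings) cols.get(m.name).add(m.fn.apply(item));
            },
            (left, right) -> {
                for (Mapping<T> m : mappings) left.get(m.name).addAll(right.get(m.name));
                return left;
            });
    }
}
```

Each element of the stream contributes exactly one value to every column, so the columns always stay the same length, which is the invariant a data frame needs.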