R6 Generator Runtime: Using R Datatypes in Java

There are a number of utility classes in the runtime Java library to facilitate data marshalling between Java and R. This is an incomplete overview of some of the patterns you can use to convert Java data to an R-friendly form.

Imports

The main imports that are useful in Java code that provides an R API are:

import uk.co.terminological.rjava.types.*;
import static uk.co.terminological.rjava.RConverter.*;
import static uk.co.terminological.rjava.Rule.MapRule.*;

import uk.co.terminological.rjava.IncompatibleTypeException;
import uk.co.terminological.rjava.NameNotFoundException;
import uk.co.terminological.rjava.RConverter;
import uk.co.terminological.rjava.utils.RFunctions;
import uk.co.terminological.rjava.RName;

Wrapping and unwrapping

Unwrapping data that arrives from R is usually a matter of calling .get() on the RObject class, which gives you the standard Java object. For primitives this is the boxed form, with NA values represented as null. A .opt() call attempts to coerce the input to a specific Java class and returns the result as an Optional if it can do so. An empty Optional therefore means that the R value was either NA or incompatible with the requested class.

        RDate date = RDate.from("2020-02-02");
        assertTrue(date.opt(LocalDate.class).isPresent());
        assertTrue(date.get().toString().equals("2020-02-02"));
        
        // RPrimitive.opt(Class<?>) returns an Optional. NA values and incompatible types both come out as Optional.empty()
        RNumeric num = RNumeric.from(123.456);
        assertTrue(num.opt(BigDecimal.class).isPresent());
        assertTrue(num.opt(Float.class).isPresent());
        assertTrue(num.opt(Long.class).isPresent());
        
        // Returns Optional.empty() because a numeric value cannot be coerced to a String
        assertTrue(!num.opt(String.class).isPresent());
        
        // RPrimitive.get() returns the underlying implementation type (in this case a Double), or null for NA
        assertTrue(num.get().doubleValue() == 123.456);
        assertTrue(num.get(Float.class).equals(123.456F));
        
        // NA values are returned as Optional.empty() by RPrimitive.opt()
        assertTrue(!RInteger.NA.opt().isPresent());
        assertTrue(RInteger.from(234).opt().isPresent());
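
To make the get()/opt() contract concrete, here is a minimal, self-contained sketch in plain Java. It does not use the runtime library: SketchNumeric is a hypothetical stand-in for RNumeric, with null playing the role of NA, and opt() returning Optional.empty() both for NA and for target classes it cannot produce. The set of supported target classes is an assumption chosen for illustration.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical stand-in for an R numeric scalar (not the library's RNumeric).
final class SketchNumeric {
    private final Double value; // null represents R's NA

    SketchNumeric(Double value) {
        this.value = value;
    }

    // Analogue of get(): the boxed Java value, or null for NA
    Double get() {
        return value;
    }

    // Analogue of opt(Class): coerce when possible, otherwise Optional.empty()
    <X> Optional<X> opt(Class<X> type) {
        if (value == null) {
            return Optional.empty(); // NA
        }
        Object converted;
        if (type == Double.class) {
            converted = value;
        } else if (type == Float.class) {
            converted = value.floatValue();
        } else if (type == Long.class) {
            converted = value.longValue(); // truncates towards zero
        } else if (type == BigDecimal.class) {
            converted = BigDecimal.valueOf(value);
        } else {
            return Optional.empty(); // incompatible target class, e.g. String
        }
        return Optional.of(type.cast(converted));
    }
}
```

Folding "NA" and "cannot convert" into the same empty Optional keeps call sites short, at the cost of not telling the caller which of the two happened.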
        

RVector creation and transformation

RVectors can be created from Java collections, arrays or Streams, using static methods on the RVector class, the static collector methods in RConverter, or the RConverter.convert() method.

        
        // Boxed Java arrays can be converted directly to a vector with RVector.with() or convert()
        // Integer.MIN_VALUE inputs are silently converted to NA, just like nulls
        Integer[] tmp = {1,3,5,7, null, Integer.MIN_VALUE, Integer.MIN_VALUE+1, 0};
        RIntegerVector col0 = RVector.with(tmp);
        RIntegerVector col1 = convert(tmp);
        
        assertTrue(col1.get(4).isNa());
        assertTrue(col1.get(5).isNa());
        assertFalse(col1.get(6).isNa());
        
        assertTrue(!col1.get(4).opt().isPresent());
        
        // Alternatively, RConverter has a set of collectors for Streams of convertible types.
        RIntegerVector col2 = Stream.of(tmp).collect(integerCollector());
        assertTrue(col1.equals(col0));
        assertTrue(col1.equals(col2));
        
        // A CollectionConverter, created with using(), converts arbitrary collection types
        RIntegerVector col3 = using(integerCollector()).convert(tmp);
        RIntegerVector col4 = using(integerCollector()).convert(Arrays.asList(tmp));
        RIntegerVector col5 = using(integerCollector()).convert(Arrays.asList(tmp).iterator());
        RIntegerVector col6 = using(integerCollector()).convert(Stream.of(tmp));
        // These should all be the same     
        assertTrue(col1.equals(col3));
        assertTrue(col1.equals(col4));
        assertTrue(col1.equals(col5));
        assertTrue(col1.equals(col6));
                
        // Converting collectors can also handle single values (singleton conversions)
        RIntegerVector i1 = using(integerCollector()).convert(tmp[0]);
        assertTrue(i1.get(0) instanceof RInteger);
        assertTrue(col1.get(0).equals(i1.get(0)));
                        
        // Primitive arrays are slightly less flexible, but RConverter can still process them.
        // nulls are not possible, so NA is represented by Integer.MIN_VALUE (= RInteger.NA_INT).
        // Boxing them through an IntStream allows them to be collected as above.
        int[] tmp2 = {1,3,5,7, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE+1, 0};
        RIntegerVector col7 = convert(tmp2);
        RIntegerVector col8 = IntStream.of(tmp2).boxed().collect(integerCollector());
        assertTrue(col1.equals(col7));
        assertTrue(col1.equals(col8));
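
The pattern above, several entry points funnelling into a single collector that also normalises the two NA encodings, can be sketched without the library. IntConvertSketch and its methods are hypothetical and produce a plain List<Integer> with null for NA rather than an RIntegerVector; the point is only to show why the boxed array, primitive array, collection, iterator, stream and singleton inputs can all compare equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch, not the library's RConverter: every overload funnels into one collector.
final class IntConvertSketch {
    static final int NA_INT = Integer.MIN_VALUE; // R's integer NA value

    // Maps both null and Integer.MIN_VALUE to null, so the two NA encodings compare equal
    static Collector<Integer, ?, List<Integer>> integerCollector() {
        return Collector.of(
            ArrayList<Integer>::new,
            (list, v) -> list.add((v == null || v == NA_INT) ? null : v),
            (left, right) -> {
                left.addAll(right);
                return left;
            },
            Collections::unmodifiableList);
    }

    static List<Integer> convert(Stream<Integer> values) {
        return values.collect(integerCollector());
    }

    static List<Integer> convert(Integer[] values) {
        return convert(Arrays.stream(values));
    }

    static List<Integer> convert(Collection<Integer> values) {
        return convert(values.stream());
    }

    static List<Integer> convert(Iterator<Integer> values) {
        return convert(StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(values, Spliterator.ORDERED), false));
    }

    // A single value becomes a length-1 list
    static List<Integer> convert(Integer value) {
        return convert(Stream.of(value));
    }

    // Primitive arrays cannot hold null, so NA can only arrive as Integer.MIN_VALUE
    static List<Integer> convert(int[] values) {
        return convert(IntStream.of(values).boxed());
    }
}
```

Putting the NA normalisation inside the collector means every overload gets identical NA handling without repeating it.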
        

The RConverter collector methods also support other data types:

        
        // Dates: the difficulty is getting a Stream of possibly-null dates, so here
        // date strings (null for NA) are collected, then the NA flags are collected as logicals:
        RDateVector rdv = Stream
                .of("2020-03-01","2020-03-02","2020-03-03",null,"2020-03-05")
                .collect(dateFromStringCollector());
        RLogicalVector rlv1 = rdv.stream().map(d -> d.isNa()).collect(booleanCollector());
        //or use RConverter.convert on arrays
        boolean[] bools = {false,false,false,true,false};
        boolean[] bools2 = {false,false,false,true,true};
        assertTrue(rlv1.equals(convert(bools)));
        assertTrue(!rlv1.equals(convert(bools2)));
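
A collector in the spirit of dateFromStringCollector() can be approximated with Collector.of from the standard library. This hypothetical DateCollectorSketch parses strings with LocalDate.parse (so it assumes ISO-8601 dates such as "2020-03-01") and keeps null entries in place as NA; it returns a List<LocalDate> rather than an RDateVector and makes no claim about how the library itself parses dates.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;

final class DateCollectorSketch {
    // Hypothetical collector: parses ISO-8601 date strings, keeping null (NA) entries in place
    static Collector<String, ?, List<LocalDate>> dateFromString() {
        return Collector.of(
            ArrayList<LocalDate>::new,
            (list, s) -> list.add(s == null ? null : LocalDate.parse(s)),
            (left, right) -> {
                left.addAll(right);
                return left;
            },
            Collections::unmodifiableList);
    }

    // Hypothetical helper: one NA flag per element, mirroring d -> d.isNa() above
    static List<Boolean> naFlags(List<LocalDate> dates) {
        List<Boolean> flags = new ArrayList<>();
        for (LocalDate d : dates) {
            flags.add(d == null);
        }
        return flags;
    }
}
```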
        

RDataframe binding to POJOs

When accepting data from R, a dataframe can be bound to Java objects by defining an interface whose methods carry @RName annotations matching the column names. Each method must also declare the column's R type (e.g. RNumeric, RInteger, RFactor), because the columns of an RDataframe are loosely typed.

import uk.co.terminological.rjava.RName;
import uk.co.terminological.rjava.types.RFactor;
import uk.co.terminological.rjava.types.RInteger;
import uk.co.terminological.rjava.types.RNumeric;

public interface DiamondPOJO {

    @RName("carat") public RNumeric getCarat();
    @RName("depth") public RNumeric getDepth();
    @RName("table") public RNumeric getTable();
    @RName("price") public RInteger getPrice();
    @RName("clarity") public RFactor getClarity();
    
}

Once the interface is defined, binding it to a dataframe and streaming the result gives a Stream<DiamondPOJO> that can be processed with ordinary Java stream operations:

            Stream<DiamondPOJO> bound = FeatureTest
                .diamonds()
                .attach(DiamondPOJO.class)
                .streamCoerce();
            
            double averageDepth = bound.mapToDouble(d -> d.getDepth().get()).average().getAsDouble();
            System.out.print("average depth: "+averageDepth);
            
            // mean(diamonds$depth)
            // [1] 61.7494
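
One way such a binding can work is with a dynamic proxy that looks up each method's column name from an annotation, falling back to the method name. The sketch below is an assumption about the technique, not the library's implementation: ColumnName, DiamondRow and RowBinderSketch are hypothetical, a row is modelled as a Map from column name to value, and methods must declare boxed return types so that NA can be null.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical annotation playing the role of @RName
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ColumnName {
    String value();
}

// Hypothetical row interface: getDepth() is annotated, price() matches its column name
interface DiamondRow {
    @ColumnName("depth") Double getDepth();
    Integer price();
}

final class RowBinderSketch {
    // Binds one row (column name -> value) to the interface using a dynamic proxy
    static <T> T bind(Class<T> type, Map<String, Object> row) {
        Object proxy = Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] {type},
            (self, method, args) -> {
                if (method.getDeclaringClass() == Object.class) {
                    // Minimal identity semantics for equals/hashCode/toString
                    if (method.getName().equals("equals")) return self == args[0];
                    if (method.getName().equals("hashCode")) return System.identityHashCode(self);
                    return type.getSimpleName() + row;
                }
                ColumnName ann = method.getAnnotation(ColumnName.class);
                // Fall back to the method name when there is no annotation
                String column = (ann != null) ? ann.value() : method.getName();
                if (!row.containsKey(column)) {
                    throw new IllegalArgumentException("No column named " + column);
                }
                // Checked cast to the declared (boxed) return type; null stands for NA
                return method.getReturnType().cast(row.get(column));
            });
        return type.cast(proxy);
    }

    // Binds every row, analogous to streaming a bound dataframe
    static <T> List<T> bindAll(Class<T> type, List<Map<String, Object>> rows) {
        return rows.stream().map(r -> bind(type, r)).collect(Collectors.toList());
    }
}
```

Failing fast on a missing column or a mismatched type surfaces interface/dataframe mismatches at the first access rather than deep inside later processing.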

As the next example shows, the @RName annotation is only needed when the method name does not exactly match the column name (which, if you follow POJO getter conventions such as getCarat(), it usually will not).

The interface can also define default methods, which can be used to transform or present the data.

    // N.B. interface specification must be public
    public static interface Diamonds {
        @RName("carat") public RNumeric getCarats();
        @RName("cut") public RFactor getCuts();
        public RInteger price(); // no @RName needed: the method name matches the column name exactly
        public default void print() {
            System.out.println("price: "+this.price()+"; carats: "+this.getCarats() + "; cut: "+this.getCuts());
        }
    }
    
    @Test
    final void testDataframeCoercion() throws IOException, UnconvertableTypeException {
        RDataframe dia = getDiamonds();
        //Test object binding and default interface methods:
        dia.stream(Diamonds.class).limit(10).forEach(Diamonds::print);
        
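        // Mean price: pull() extracts the "price" column as an RIntegerVector
        System.out.println(""+dia.pull("price",RIntegerVector.class).get().collect(Collectors.averagingDouble(x -> (double) x)));
        
        // Bind the dataframe to the interface and coerce a single row
        dia.attach(Diamonds.class).getCoercedRow(100).getCuts().get();
        // Navigate from row 100 with lag() before coercing
        dia.attach(Diamonds.class).getRow(100).lag().coerce().getCuts().get();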
    }
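
Default methods need no special support from a proxy-based binding beyond forwarding them to the interface's own implementation. As a hedged sketch, a dynamic proxy can do this with InvocationHandler.invokeDefault, which requires Java 16 or later. PricedRow and DefaultMethodSketch are hypothetical, and the abstract methods here simply read the column with the same name as the method.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical row interface mixing bound columns with a derived default method
interface PricedRow {
    Integer price();
    Double carat();

    // Derived value computed from the bound columns (throws if either is NA/null)
    default double pricePerCarat() {
        return price() / carat();
    }
}

final class DefaultMethodSketch {
    // Requires Java 16+ for InvocationHandler.invokeDefault
    static <T> T bind(Class<T> type, Map<String, Object> row) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isDefault()) {
                // Run the interface's own default implementation against this proxy
                return InvocationHandler.invokeDefault(proxy, method, args);
            }
            if (method.getDeclaringClass() == Object.class) {
                if (method.getName().equals("equals")) return proxy == args[0];
                if (method.getName().equals("hashCode")) return System.identityHashCode(proxy);
                return type.getSimpleName() + row;
            }
            // Abstract methods read the column with the same name as the method
            return row.get(method.getName());
        };
        return type.cast(Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] {type}, handler));
    }
}
```

Because the default method calls price() and carat() on the proxy itself, those calls route back through the handler, so derived values always see the bound column data.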

Streaming POJOs to RDataframes

A stream of Java objects can be collected into an R dataframe using a dataframe collector. The collector is given one mapping per column, each pairing a column name with a function that extracts that column's value from an object. The result is an RDataframe that can be passed back to R as the return value of a Java method.

        // Use a stream + dataframe collector to generate a data frame:
        RDataframe df = Arrays.asList("Hello","World","Stream","Support","in","Java")
        .stream()
        .collect(dataframeCollector(
            mapping("original", s-> s),
            mapping("lowercase", s-> s.toLowerCase()),
            mapping("uppercase", s-> s.toUpperCase()),
            mapping("subst", s-> s.substring(0,Math.min(3,s.length()))),
            mapping("length", s-> s.length())
        ));
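
The column-wise accumulation behind a dataframe collector can be sketched with Collector.of: each mapping contributes one named column, and every stream element appends one value to each column. ColumnCollectorSketch, Mapping and columns() are hypothetical names, and the result is a plain LinkedHashMap of column lists rather than an RDataframe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;

final class ColumnCollectorSketch {
    // Hypothetical pairing of a column name with a value extractor
    static final class Mapping<T> {
        final String name;
        final Function<? super T, ?> extractor;

        Mapping(String name, Function<? super T, ?> extractor) {
            this.name = name;
            this.extractor = extractor;
        }
    }

    static <T> Mapping<T> mapping(String name, Function<? super T, ?> extractor) {
        return new Mapping<>(name, extractor);
    }

    // Collects objects into named columns; LinkedHashMap keeps the mapping order
    @SafeVarargs
    static <T> Collector<T, ?, Map<String, List<Object>>> columns(Mapping<T>... mappings) {
        return Collector.of(
            () -> {
                Map<String, List<Object>> cols = new LinkedHashMap<>();
                for (Mapping<T> m : mappings) {
                    cols.put(m.name, new ArrayList<>());
                }
                return cols;
            },
            // Each element appends exactly one value to every column
            (cols, item) -> {
                for (Mapping<T> m : mappings) {
                    cols.get(m.name).add(m.extractor.apply(item));
                }
            },
            // Partial results from parallel streams are joined column by column
            (left, right) -> {
                left.forEach((name, values) -> values.addAll(right.get(name)));
                return left;
            });
    }
}
```

Declaring each mapping with an explicit type, e.g. `Mapping<String> length = mapping("length", s -> s.length());`, lets the compiler infer the lambda parameter types before the mappings are passed to columns().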

Lists and named lists