HTML and CSS do a good job of automatically laying out and styling content, particularly tables; however, they are not designed for pagination. This library converts HTML content into PDF and PNG formats for embedding into LaTeX documents, within the constraints of a given page size. It allows tables laid out by HTML-first libraries such as gt and huxtable to be used within LaTeX documents or presentations, where they appear the same as the HTML versions of those tables. HTML content can grow in width up to the page dimensions without overflowing them, and without forcing the table layout to be any wider than it would normally be. This heuristic calculation of an output size that fits within set limits is one of the differentiators between this and other HTML to PDF converters.
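The width rule described above can be pictured with a small base R sketch. This is not html2pdfr's implementation: the function name fit_width and the idea of a known "natural width" are assumptions used only to show the behaviour, in which narrow content keeps its natural width and wide content is capped at the limit.

```r
# Hypothetical illustration of "grow up to, but not beyond, the limit".
# naturalWidthInches: width the content would take with no constraint (assumed known)
# maxWidthInches:     the page or bounding-box limit
fit_width <- function(naturalWidthInches, maxWidthInches) {
  stopifnot(is.numeric(naturalWidthInches), is.numeric(maxWidthInches))
  # Narrow content is left alone; wide content is capped at the limit,
  # which in a real layout engine means wrapping text within cells.
  pmin(naturalWidthInches, maxWidthInches)
}

fit_width(c(3.5, 12), maxWidthInches = 8)
```

The key point is that the limit is an upper bound rather than a target, so a narrow table is never stretched to fill the page.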
PDF images produced by html2pdfr can be included in LaTeX files using an includegraphics directive, in exactly the same way as figures. Although focussed on tabular content, html2pdfr can convert other simple HTML, including SVG and MathML content, with varying success.
The R package is a wrapper around the Java openhtmltopdf library (https://github.com/danfickle/openhtmltopdf) and requires a working installation of Java and rJava. All other dependencies are resolved automatically at runtime. It does not require a graphical display and would suit running on a headless server. The underlying Java library does not support JavaScript, which is required for D3 content or for rendering Shiny apps; for these, webshot or webshot2 would be a better option.
The library relies on locally installed fonts, and the paths to local .ttf files must be supplied; this is managed by the systemfonts package. The library can resolve local and remote CSS and image files specified relative to the HTML, and locate them without the need for a web server.
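html2pdfr handles font lookup itself through systemfonts, but it can be useful to see which local .ttf files are candidates. The sketch below uses systemfonts::system_fonts(), which returns one row per installed font face including a path column; the helper name local_ttf_paths is hypothetical, and how html2pdfr chooses among these files is not shown here.

```r
# List the paths of locally installed TrueType (.ttf) fonts using systemfonts.
# Returns an empty character vector if systemfonts is not installed.
local_ttf_paths <- function() {
  if (!requireNamespace("systemfonts", quietly = TRUE)) return(character(0))
  fonts <- systemfonts::system_fonts()   # one row per installed font face
  paths <- fonts$path[grepl("\\.ttf$", fonts$path, ignore.case = TRUE)]
  unique(paths)
}

head(local_ttf_paths())
```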
html2pdfr is based on a Java library and needs working installations of Java and rJava. The following commands can be used to check that your rJava installation is working.
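The original check commands are not reproduced in this document. As a minimal sketch, assuming only the standard rJava functions .jinit() and .jcall(), the following starts a JVM and asks it for its version; the wrapper name java_version is hypothetical. If the check fails on Linux, re-running R CMD javareconf and reinstalling rJava is a common fix.

```r
# Check that rJava can start a JVM and report the Java version.
# Returns NA (with a message) if rJava is missing or the JVM cannot start.
java_version <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    message("rJava is not installed: try install.packages('rJava')")
    return(NA_character_)
  }
  tryCatch({
    rJava::.jinit()   # start (or attach to) the JVM
    # Call the static method java.lang.System.getProperty("java.version")
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) {
    message("rJava could not start Java: ", conditionMessage(e))
    NA_character_
  })
}

java_version()
```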
Binary packages of html2pdfr are available on the r-universe for macOS and Windows. html2pdfr can be installed from source on Linux. html2pdfr has been tested on R versions 3.6, 4.0, 4.1 and 4.2.
# Enable the terminological r-universe repository alongside CRAN
options(repos = c(
terminological = 'https://terminological.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
# Download and install html2pdfr in R
install.packages('html2pdfr')
# Browse the html2pdfr manual pages
help(package = 'html2pdfr')
Unstable versions are available, but on Windows the build may fail if the multi-arch option is set. Windows users will also need RTools 4.2.
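The commands for installing the unstable version are not given here. As a hedged sketch, assuming the development source is hosted in the terminological/html2pdfr GitHub repository (an assumption, not confirmed above) and that the remotes package is installed, the multi-arch problem can be avoided by passing --no-multiarch to R CMD INSTALL through INSTALL_opts, which remotes forwards to utils::install.packages. The helper install_flags is hypothetical.

```r
# Extra R CMD INSTALL flags: build for the current architecture only on Windows,
# which avoids the multi-arch build failure mentioned above.
install_flags <- function(os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) "--no-multiarch" else character(0)
}

# Set to TRUE to install the development version (needs network access,
# the remotes package and, on Windows, RTools 4.2).
install_dev <- FALSE
if (install_dev) {
  # ASSUMPTION: repository name; check the project page before running.
  remotes::install_github("terminological/html2pdfr", INSTALL_opts = install_flags())
}
```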
The Java libraries used by html2pdfr total 29 MB, which is too large for CRAN.
On first use, the main Java library dependencies of this project must be downloaded and cached. This can take some time but only needs to be done once. The following basic initialisation code sets up the library:
# this produces a verbose output which can be hidden with suppressMessages:
conv = html2pdfr::html_converter()
Once this is complete, the conv object provides the useful functions of the package.
PDF rendering of HTML can be done directly from a URL or from a locally stored HTML file. Converting a URL to PDF is done like so:
html2pdfr::url_to_pdf(
htmlUrl = "https://cran.r-project.org/banner.shtml",
outFile = out("docs/articles/example-output.pdf")
)
## [1] "/tmp/RtmplBnpyW/docs/articles/example-output.pdf"
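The example above writes to out(...), a helper that is not defined in this document. Judging from the printed path, it resolves into the R session's temporary directory. A minimal stand-in with that behaviour (hypothetical, not part of html2pdfr) is:

```r
# ASSUMPTION: a stand-in for the out() helper used in these examples.
# It maps a relative path into R's session temporary directory and makes
# sure the parent folder exists, so the converter can write the file there.
out <- function(path) {
  full <- file.path(tempdir(), path)
  dir.create(dirname(full), recursive = TRUE, showWarnings = FALSE)
  full
}

out("docs/articles/example-output.pdf")
```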
The resulting PDF is here. Your success rendering HTML will vary, as complex web pages (including, in this example, frames) are not supported by the underlying engine. The focus of html2pdfr is on simpler static HTML content rather than complex pages, for which alternatives already exist (see webshot2, for example).
In the following, more typical, example the HTML is generated within R (as you might get from a tabular data library such as huxtable or gt) and passed to the converter with some target page dimensions. The converter lays out the table within the maximum space available, overflowing onto new pages wherever required.
library(magrittr) # provides the %>% pipe used below
irisHtml = iris[c("Species","Sepal.Width")] %>% huxtable::as_hux() %>% huxtable::theme_article() %>% huxtable::to_html()
html2pdfr::html_fragment_to_pdf(
htmlFragment = irisHtml,
maxWidthInches = 8, maxHeightInches = 8,
outFile = out("docs/articles/example-output-2.pdf")
)
## [1] "/tmp/RtmplBnpyW/docs/articles/example-output-2.pdf"
The resulting PDF of the generated HTML is here. No page of this document should be more than 8 inches high. The width in this case is determined by the content, which is much narrower than the maximum of 8 inches. If there were very wide content, the converter would wrap content within cells to stay within the specified bounding box. This bounding box behaviour means that the generated PDF can be inserted into a LaTeX document without risk of overfull boxes.
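The height limit can be pictured with a simple greedy rule: rows are placed on the current page until the next row would exceed the maximum height, at which point a new page starts. This is a simplified model for illustration, not the library's layout algorithm; the function name paginate_rows and the fixed row heights are assumptions.

```r
# Hypothetical sketch: greedily assign table rows to pages so that no page
# is taller than maxHeightInches. Row heights are assumed known in advance;
# a real layout engine measures them while laying out the content.
paginate_rows <- function(rowHeightsInches, maxHeightInches) {
  page <- integer(length(rowHeightsInches))
  current <- 1L
  used <- 0
  for (i in seq_along(rowHeightsInches)) {
    h <- rowHeightsInches[i]
    # Start a new page if this row would overflow the current one
    # (a row taller than a whole page still gets a page to itself).
    if (used > 0 && used + h > maxHeightInches) {
      current <- current + 1L
      used <- 0
    }
    page[i] <- current
    used <- used + h
  }
  page
}

# 151 rows (header plus 150 iris rows), each assumed 0.25 inches high, 8 inch pages
pages <- paginate_rows(rep(0.25, 151), maxHeightInches = 8)
max(pages)
```

Under these assumptions each page holds 32 rows, so the table spills onto a fifth page.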
The layout engine should support simple SVG and MathML content. However, it does not execute JavaScript, so it is not able to lay out D3 content. If this is something you need, then webshot2, which wraps a whole Chrome instance, may be a better option.
# JavaScript is not executed, so this D3 example does not render and is not run:
# html2pdfr::url_to_pdf(
#   htmlUrl = "https://bl.ocks.org/mbostock/raw/1389927/?raw=true",
#   outFile = out("docs/articles/example-d3.pdf")
# )
# MathML does work.
html2pdfr::url_to_pdf(
htmlUrl = "https://fred-wang.github.io/MathFonts/mozilla_mathml_test/",
outFile = out("docs/articles/example-mathml.pdf")
)
## [1] "/tmp/RtmplBnpyW/docs/articles/example-mathml.pdf"
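SVG and MathML can also be embedded directly in generated HTML rather than fetched from a URL. The sketch below builds a small fragment containing an inline SVG circle and a MathML expression and passes it to html_fragment_to_pdf with the same arguments as the huxtable example. Whether a given fragment renders correctly is not guaranteed (success varies, as noted above), and the output file name is illustrative only.

```r
# A small HTML fragment mixing inline SVG and MathML (x squared).
mixedHtml <- paste0(
  "<p>An inline SVG circle:</p>",
  "<svg xmlns='http://www.w3.org/2000/svg' width='60' height='60'>",
  "<circle cx='30' cy='30' r='25' fill='steelblue'/></svg>",
  "<p>A MathML expression:</p>",
  "<math xmlns='http://www.w3.org/1998/Math/MathML'>",
  "<msup><mi>x</mi><mn>2</mn></msup></math>"
)

# Convert only if html2pdfr is installed (it also needs a working Java/rJava)
if (requireNamespace("html2pdfr", quietly = TRUE)) {
  html2pdfr::html_fragment_to_pdf(
    htmlFragment = mixedHtml,
    maxWidthInches = 4, maxHeightInches = 4,
    outFile = file.path(tempdir(), "example-svg-mathml.pdf")
  )
}
```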
One likely output of the package, when passed a large table, is a multipage PDF whose pages are small enough to fit into the overall flow of a LaTeX document. Such a PDF can be included in a LaTeX document by including each page separately, as below. In this way an HTML table can be converted to a multipage PDF and embedded into a parent LaTeX document, even in landscape as here, while keeping consistent page furniture. The snippet needs the graphicx (for \includegraphics), rotating (for sidewaysfigure) and caption or capt-of (for \captionof) packages:
\begingroup
\begin{sidewaysfigure}
\begin{center}
%\fbox{
\includegraphics[page=1, width=\linewidth]{multipageTable.pdf}%}
\end{center}
\end{sidewaysfigure}
\begin{sidewaysfigure}
\begin{center}
%\fbox{
\includegraphics[page=2, width=\linewidth]{multipageTable.pdf}%}
\captionof{table}{Caption}
\label{your_label}
\end{center}
\end{sidewaysfigure}
\endgroup
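Writing one sidewaysfigure per page by hand becomes tedious for long tables. The sketch below generates the LaTeX above for any number of pages from R, putting the caption and label on the last page only. The helper name sideways_pages_latex is hypothetical; the page count must be supplied, for example from pdftools::pdf_info(pdfFile)$pages if the pdftools package is available.

```r
# Hypothetical helper: build the LaTeX for a multipage PDF, one sidewaysfigure
# per page, putting the caption and label on the last page only.
sideways_pages_latex <- function(pdfFile, nPages, caption, label) {
  stopifnot(nPages >= 1)
  pages <- vapply(seq_len(nPages), function(p) {
    last <- p == nPages
    paste0(
      "\\begin{sidewaysfigure}\n\\begin{center}\n",
      sprintf("\\includegraphics[page=%d, width=\\linewidth]{%s}\n", p, pdfFile),
      if (last) sprintf("\\captionof{table}{%s}\n\\label{%s}\n", caption, label) else "",
      "\\end{center}\n\\end{sidewaysfigure}"
    )
  }, character(1))
  # Wrap all pages in a group, as in the hand-written version above
  paste(c("\\begingroup", pages, "\\endgroup"), collapse = "\n")
}

cat(sideways_pages_latex("multipageTable.pdf", 2, "Caption", "your_label"))
```

The output can be written to a .tex file with writeLines() and pulled into the main document with \input.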