`ggplot2::ggplot_build()` makes the dark magic I worked on past midnight obsolete

# install needed R packages
remotes::update_packages(c('gginnards', 'tibble', 'ggplot2'), upgrade = TRUE)

1 Beginning

While doing my rotation in the Lu Lab, I'm assigned to write some R code for the matrix processing and plotting parts of the exSeek project. Along the way, I want to reproduce a plot in R, i.e., the second plot here.

2 Development

The problem is that I want to put a text label above each violin, but there isn't an easy way to get the highest y coordinate. median() is okay, but calculating the value with something like median() + 1.5 * quantile(...) is too complicated and ugly. The value has already been calculated inside the plot, so why bother doing it again? 1

After inspecting the ggplot object (treating it as a list) for a while, I find a package, gginnards. It can show you the internal aesthetics computed by ggplot.
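A minimal sketch of what that looks like, assuming the default arguments of `gginnards::stat_debug_panel()` and reusing the `mtcars` scatter plot that appears later in this post. The name `p_debug` is only for illustration, and the exact printed summary may differ between gginnards versions.

```r
library(ggplot2)

# Add a debug layer with its default arguments. When the plot is built,
# the stat prints (a summary of) the data it receives for each panel.
p_debug <- ggplot(mtcars, aes(wt, mpg)) +
    geom_point() +
    gginnards::stat_debug_panel()

# Printing the plot triggers the build, and with it the debug output
print(p_debug)
```

The output only goes to the console; nothing is returned to the caller.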

3 Climax

However, it can only print the values, while I want to get hold of them. So I write the following function (I worked past 0:00, maybe close to 01:00):

#' @title Get computed variables of a ggplot object
#'
#' @description
#' Access the computed variables, like `density` of [ggplot2::geom_density()]
#'
#' @param ggplot_obj ggplot object. [ggplot2::ggplot()] must contain the default x and y aesthetics.
#'
#' @return tibble.
#'
#' @keywords internal
#'
#' @examples
#' get_ggplot_data(ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point())
#' \dontrun{
#' # ggplot() must contain the default x, y aesthetics.
#' get_ggplot_data(ggplot2::ggplot(mtcars) + ggplot2::geom_point(ggplot2::aes(wt, mpg)))
#' # fails with geom_density
#' }

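get_ggplot_data <- function(ggplot_obj) {
    # Scratch files that swallow the printed output and the rendered image
    sink_file <- tempfile()
    png_file <- tempfile(fileext = ".png")
    on.exit(unlink(c(sink_file, png_file)), add = TRUE)

    # The debug stat calls this on the computed data; stash a copy in `env`
    env <- new.env()
    assign_to_data <- function(x, env) {
        print(x)
        env$data <- tibble::as_tibble(x)
    }

    ggplot_obj2 <- ggplot_obj +
        gginnards::stat_debug_panel(
            summary.fun = assign_to_data,
            summary.fun.args = list(env = env)
        )

    # Render the plot so that the stat is evaluated, discarding all output
    sink(sink_file)
    png(png_file)
    print(ggplot_obj2)
    dev.off()
    sink()

    env$data
}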

Let’s give it a try:

ggplot_obj <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()

get_ggplot_data(ggplot_obj)
## # A tibble: 32 x 4
##        x     y PANEL group
##    <dbl> <dbl> <fct> <int>
##  1  2.62  21   1        -1
##  2  2.88  21   1        -1
##  3  2.32  22.8 1        -1
##  4  3.22  21.4 1        -1
##  5  3.44  18.7 1        -1
##  6  3.46  18.1 1        -1
##  7  3.57  14.3 1        -1
##  8  3.19  24.4 1        -1
##  9  3.15  22.8 1        -1
## 10  3.44  19.2 1        -1
## # … with 22 more rows

4 Epilogue

When I tidy up the code this morning, I find that ggplot2 actually provides an official interface for this purpose: ggplot2::ggplot_build(). Bang, my heart breaks.
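For comparison, here is a rough sketch of the same lookup done with `ggplot_build()`, followed by the violin-labelling use case that started all this. The violin plot of `mtcars` is only a stand-in for the real exSeek figure, and the names `p_point`, `p_violin`, `tops` and `p_labelled` are made up for illustration.

```r
library(ggplot2)

# The scatter plot from above: ggplot_build() returns the computed data,
# one data frame per layer, in the `data` element
p_point <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
point_data <- ggplot_build(p_point)$data[[1]]
head(point_data[, c("x", "y", "PANEL", "group")])

# A violin plot (illustrative stand-in, not the exSeek data)
p_violin <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_violin()
violin_data <- ggplot_build(p_violin)$data[[1]]

# Each violin is one group; its top is the largest y on its density grid
tops <- data.frame(
    x = as.numeric(tapply(as.numeric(violin_data$x), violin_data$group, min)),
    y = as.numeric(tapply(violin_data$y, violin_data$group, max))
)

# Put a text label just above each violin
p_labelled <- p_violin +
    annotate("text", x = tops$x, y = tops$y,
             label = format(round(tops$y, 1)), vjust = -0.5)
```

No temporary files, no sink(), and no requirement that the aesthetics be set in ggplot() itself.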

As usual, I'm moving the code here as a memorial.


  1. In fact, when my collaborator creates the plot in Python, he uses median() plus a constant.