Perform hyperparameter tuning using R and caret on Vertex AI

Producing a sufficiently accurate machine learning model usually requires fitting parameters and tuning hyperparameters. Your model’s parameters are the variables that your chosen machine learning technique adjusts to fit your data, such as the weights a neural network learns in order to minimize loss. Hyperparameters are variables that control the training process itself. For example, in a multilayer perceptron, altering the number and size of hidden layers can have a profound effect on your model’s performance, as can the maximum depth or the minimum number of observations per node in a decision tree.

Hyperparameter tuning can be a costly endeavor, especially when done manually or when using exhaustive grid search over a large hyperparameter space.
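To make the cost of exhaustive grid search concrete, here is a minimal sketch in base R. The candidate values are hypothetical and chosen only for illustration; they are not the search ranges used later in this post.

```r
# Hypothetical candidate values for three gbm hyperparameters.
grid <- expand.grid(
  n.trees = c(50, 100, 250, 500, 1000),
  interaction.depth = 1:10,
  n.minobsinnode = c(1, 5, 10, 20)
)

# Under exhaustive grid search, every row is one full training run.
n_combinations <- nrow(grid)

# Adding a fourth hyperparameter with five candidate values
# multiplies the number of training runs by five.
n_with_shrinkage <- n_combinations * 5

cat("grid size:", n_combinations, "\n")
cat("grid size with a fourth hyperparameter:", n_with_shrinkage, "\n")
```

The number of runs grows multiplicatively with each added hyperparameter, which is why a search strategy that learns from earlier trials can be attractive.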

In 2017, Google introduced Vizier, a technique used internally at Google for performing black-box optimization. Vizier is used to optimize many of our own machine learning models, and is also available in Vertex AI, Google Cloud’s machine learning platform. Vertex AI hyperparameter tuning for custom training is a built-in feature that uses Vertex AI Vizier for training jobs. It helps determine the best hyperparameter settings for an ML model.


In this blog post, you will learn how to perform hyperparameter tuning of your custom R models through Vertex AI.

Since many R users prefer to use Vertex AI from RStudio programmatically, you will interact with Vertex AI through the Vertex AI SDK via the reticulate package. 

The process of tuning your custom R models on Vertex AI comprises the following steps:

  1. Enable Google Cloud Platform (GCP) APIs and set up the local environment

  2. Create a custom R script for training a model using a specific set of hyperparameters

  3. Create a Docker container that supports training R models with Cloud Build and Artifact Registry

  4. Train and tune a model using hyperparameter tuning jobs on Vertex AI Training


To showcase this process, you train a simple boosted tree model to predict housing prices on the California housing data set. The data contains information from the 1990 California census. The data set is publicly available from Google Cloud Storage at gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv

The tree model will predict a median housing price, given a longitude and latitude along with data from the corresponding census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

Environment setup

This blog post assumes that you are either using Vertex AI Workbench with an R kernel or RStudio. Your environment should include the following requirements:

  • The Google Cloud SDK

  • Git

  • R

  • Python 3

  • Virtualenv

To execute shell commands, define a helper function:

```r
library(glue)
library(IRdisplay)

# Run a shell command. If no args are given, interpolate {variables} with glue
# and split the resulting string on spaces into the command and its arguments.
sh <- function(cmd, args = c(), intern = FALSE) {
  if (is.null(args)) {
    cmd <- glue(cmd)
    s <- strsplit(cmd, " ")[[1]]
    cmd <- s[1]
    args <- s[2:length(s)]
  }
  ret <- system2(cmd, args, stdout = TRUE, stderr = TRUE)
  if ("errmsg" %in% attributes(attributes(ret))$names) cat(attr(ret, "errmsg"), "\n")
  if (intern) return(ret) else cat(paste(ret, collapse = "\n"))
}
```

You should also install a few R packages and update the SDK for Vertex AI:

```r
install.packages(c("reticulate", "glue"))
sh("pip install --upgrade google-cloud-aiplatform")
```

Next, you define variables to support the training and deployment process, namely:

  • PROJECT_ID: Your Google Cloud Platform Project ID

  • REGION: Currently, the regions us-central1, europe-west4, and asia-east1 are supported for Vertex AI; it is recommended that you choose the region closest to you

  • BUCKET_URI: The staging bucket where all the data associated with your dataset and model resources are stored

  • DOCKER_REPO: The Docker repository name to store container artifacts

  • IMAGE_NAME: The name of the container image

  • IMAGE_TAG: The image tag that Vertex AI will use

  • IMAGE_URI: The complete URI of the container image

```r
PROJECT_ID <- "YOUR_PROJECT_ID"
REGION <- "us-central1"
BUCKET_URI <- glue("gs://{PROJECT_ID}-vertex-r")
DOCKER_REPO <- "vertex-r"
IMAGE_NAME <- "vertex-r"
IMAGE_TAG <- "latest"
# Artifact Registry image path: LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
IMAGE_URI <- glue("{REGION}-docker.pkg.dev/{PROJECT_ID}/{DOCKER_REPO}/{IMAGE_NAME}:{IMAGE_TAG}")
```

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

```r
sh("gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}")
```

Finally, you import and initialize the reticulate R package to interface with the Vertex AI SDK, which is written in Python.

```r
library(reticulate)
library(glue)
use_python(Sys.which("python3"))

# Import the Vertex AI SDK for Python through reticulate
aiplatform <- import("google.cloud.aiplatform")
aiplatform$init(project = PROJECT_ID, location = REGION, staging_bucket = BUCKET_URI)
```

Create container images for training and tuning models

The Dockerfile for your custom container is built on top of the Deep Learning container — the same container that is also used for Vertex AI Workbench. You just add an R script for model training and tuning.

Before creating such a container, you create a Docker repository in Artifact Registry and configure Docker to authenticate requests to it in your region.

```r
sh("gcloud artifacts repositories create {DOCKER_REPO} --repository-format=docker --location={REGION} --description=\"Docker repository\"")
# configure-docker expects the Artifact Registry hostname for the region
sh("gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet")
```

Next, create a Dockerfile.

```dockerfile
# filename: Dockerfile - container specifications for using R in Vertex AI
# Placeholder: replace with the URI of the Deep Learning container image for R.
FROM <DEEP_LEARNING_CONTAINER_IMAGE>

# Assumed working directory, so that "Rscript train.R" finds /root/train.R
WORKDIR /root

COPY train.R /root/train.R

# Install Fortran
RUN apt-get update
RUN apt-get install gfortran -yy

# Install R packages
RUN Rscript -e "install.packages('plumber')"
RUN Rscript -e "install.packages('argparser')"
RUN Rscript -e "install.packages('gbm')"
RUN Rscript -e "install.packages('caret')"
RUN Rscript -e "install.packages('reticulate')"

RUN pip install cloudml-hypertune
```

Next, create the file train.R, which is used to train your R model. The script trains a gbm model (generalized boosted regression model) on the California Housing dataset. Vertex AI sets environment variables that you can utilize, and the hyperparameters for each trial are passed as command line arguments. The trained model artifacts are then stored in your Cloud Storage bucket. The results of your training script are communicated back to Vertex AI using the hypertune package, which stores a JSON file to /tmp/hypertune/output.metrics. Vertex AI uses this information to come up with a hyperparameter configuration for the next trial, and to assess which trial was responsible for the best overall result.

```r
#!/usr/bin/env Rscript
# filename: train.R - perform hyperparameter tuning on a boosted tree model using Vertex AI

library(tidyverse)
library(data.table)
library(argparser)
library(jsonlite)
library(reticulate)
library(caret)

# The GCP Project ID
project_id <- Sys.getenv("CLOUD_ML_PROJECT_ID")

# The GCP Region
location <- Sys.getenv("CLOUD_ML_REGION")

# The Cloud Storage URI to upload the trained model artifact to
model_dir <- Sys.getenv("AIP_MODEL_DIR")

# The trial ID
trial_id <- Sys.getenv("CLOUD_ML_TRIAL_ID", "0")

# The JSON file that the hypertune package saves metric results to
metric_file <- "/tmp/hypertune/output.metrics"

# Read hyperparameters for this trial
p <- arg_parser("California Housing Model") %>%
  add_argument("--n.trees", default = "100", help = "number of trees to fit", type = "integer") %>%
  add_argument("--interaction.depth", default = 3, help = "maximum depth of each tree") %>%
  add_argument("--n.minobsinnode", default = 10, help = "minimum number of observations in terminal node") %>%
  add_argument("--shrinkage", default = 0.1, help = "learning rate") %>%
  add_argument("--data", help = "path to the training data in GCS")

dir.create(dirname(metric_file), showWarnings = FALSE)
# Arguments arrive as "--name=value"; split them into separate tokens
argv <- parse_args(p, unlist(strsplit(commandArgs(trailingOnly = TRUE), "=")))


# Read housing dataset
system2("gsutil", c("cp", argv$data, "./data.csv"))
data <- fread("data.csv")
print(data)


# Start model training with the hyperparameters for the trial
print("Starting Model Training")
tuneGrid <- expand.grid(
  interaction.depth = as.integer(argv$interaction.depth),
  n.trees = as.integer(argv$n.trees),
  n.minobsinnode = as.integer(argv$n.minobsinnode),
  shrinkage = as.numeric(argv$shrinkage)
)
print(tuneGrid)
fitControl <- trainControl(method = "cv", number = 3)
set.seed(42)
fit <- train(median_house_value ~ .,
  method = "gbm",
  trControl = fitControl,
  tuneGrid = tuneGrid,
  metric = "MAE",
  data = data
)

# Average the mean absolute error over the cross-validation folds
mean_absolute_error <- mean(fit$resample$MAE)
cat(paste("mean absolute error:", mean_absolute_error, "\n"))


# Report hyperparameter tuning metric to Vertex AI for picking
# hyperparameter configuration for the next trial
hypertune <- import("hypertune")
hpt <- hypertune$HyperTune()
hpt$report_hyperparameter_tuning_metric(
  hyperparameter_metric_tag = "mean_absolute_error",
  metric_value = as.numeric(mean_absolute_error),
  global_step = 1000)


# Save model to Cloud Storage bucket
saveRDS(fit$finalModel, "gbm.rds")
system2("gsutil", c("cp", "gbm.rds", model_dir))
```
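The way the training script turns command line arguments into hyperparameter values can be sketched in base R, without argparser. The argument values and the bucket path below are hypothetical; the sketch only assumes that each tuned hyperparameter arrives in the form `--name=value`, which is what the split on `=` in train.R implies.

```r
# Hypothetical arguments for one trial: a static "--data <path>" pair
# plus one "--name=value" token per tuned hyperparameter.
raw_args <- c(
  "--data", "gs://example-bucket/data.csv",
  "--n.trees=250",
  "--interaction.depth=4",
  "--n.minobsinnode=10"
)

# Splitting on "=" yields alternating flag/value tokens.
tokens <- unlist(strsplit(raw_args, "="))
print(tokens)

# Pair each flag with the value that follows it.
flags <- tokens[seq(1, length(tokens), by = 2)]
values <- tokens[seq(2, length(tokens), by = 2)]
hyperparameters <- setNames(as.list(values), sub("^--", "", flags))

# Values arrive as strings and need explicit conversion before use.
n_trees <- as.integer(hyperparameters[["n.trees"]])
print(n_trees)
```

Because every value arrives as a string, the explicit `as.integer()` and `as.numeric()` conversions in train.R matter before the values are handed to `expand.grid()`.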

Finally, you build the Docker container image on Cloud Build – the serverless CI/CD platform.  Building the Docker container image may take 10 to 15 minutes.

```r
sh("gcloud builds submit --region={REGION} --tag={IMAGE_URI} --timeout=1h")
```

Tune custom R model

Once your training application is containerized, you define the machine specifications for the tuning job. In this example, you use n1-standard-4 instances.

```r
worker_pool_specs <- list(
  list(
    "machine_spec" = list(
      "accelerator_count" = as.integer(0),
      "machine_type" = "n1-standard-4"
    ),
    "container_spec" = list(
      "image_uri" = IMAGE_URI,
      "command" = c("Rscript", "train.R"),
      "args" = list("--data", "gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv")
    ),
    "replica_count" = 1
  )
)
```

This specification is then used in a CustomJob.

```r
MODEL_DIR <- glue("{BUCKET_URI}/aiplatform-custom-job-hpt")
custom_job <- aiplatform$CustomJob(
  display_name = "california-custom-job",
  worker_pool_specs = worker_pool_specs,
  base_output_dir = MODEL_DIR
)
```

Hyperparameter tuning jobs search for the best combination of hyperparameters to optimize your metrics. Hyperparameter tuning jobs do this by running multiple trials of your training application with different sets of hyperparameters.

You can control the job in the following ways:

  • max_trial_count: Decide how many trials you want to allow the service to run. Increasing the number of trials generally yields better results, but it is not always so. Usually, there is a point of diminishing returns after which additional trials have little or no effect on the accuracy. Before starting a job with a large number of trials, you may want to start with a small number of trials to gauge the effect your chosen hyperparameters have on your model’s accuracy. To get the most out of hyperparameter tuning, you shouldn’t set your maximum value lower than ten times the number of hyperparameters you use.

  • parallel_trial_count: You can specify how many trials can run in parallel. Running parallel trials has the benefit of reducing the time the training job takes (real time — the total processing time required is not typically changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials start without having the benefit of the results of any trials still running.
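The trade-off between these two settings can be explored with a small sketch in base R. The helper names are hypothetical, and the calculation assumes, as a simplification, that trials run in equally sized batches that each take about the same time.

```r
# Rule of thumb from the text: at least ten trials per tuned hyperparameter.
min_recommended_trials <- function(n_hyperparameters) {
  10L * as.integer(n_hyperparameters)
}

# Number of sequential batches needed to finish all trials.
# Fewer batches means less wall-clock time, but also fewer opportunities
# for the service to learn from completed trials before starting new ones.
sequential_rounds <- function(max_trial_count, parallel_trial_count) {
  ceiling(max_trial_count / parallel_trial_count)
}

trials <- min_recommended_trials(3)
rounds_low_parallelism <- sequential_rounds(trials, 2)
rounds_high_parallelism <- sequential_rounds(trials, 5)

cat("recommended minimum trials:", trials, "\n")
cat("rounds with 2 parallel trials:", rounds_low_parallelism, "\n")
cat("rounds with 5 parallel trials:", rounds_high_parallelism, "\n")
```

With three tuned hyperparameters, the rule of thumb suggests at least 30 trials. Raising the parallelism from 2 to 5 shortens the job from 15 rounds to 6, at the price of giving the optimizer fewer completed results to build on between rounds.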

In addition, you also need to specify which hyperparameters to tune. There is little universal advice to give about how to choose which hyperparameters you should tune. If you have experience with the machine learning technique that you’re using, you may have insight into how its hyperparameters behave. You may also be able to find advice from machine learning communities.

However you choose them, it’s important to understand the implications. Every hyperparameter that you choose to tune has the potential to increase the number of trials required for a successful tuning job. When you run a hyperparameter tuning job on Vertex AI, the amount you are charged is based on the duration of the trials initiated by your hyperparameter tuning job. A careful choice of hyperparameters to tune can reduce the time and cost of your hyperparameter tuning job.

Vertex AI supports several data types for hyperparameter tuning jobs.

```r
hpt_job <- aiplatform$HyperparameterTuningJob(
  display_name = "california-hpt-job",
  custom_job = custom_job,
  max_trial_count = as.integer(14),
  parallel_trial_count = as.integer(2),
  metric_spec = list(
    "mean_absolute_error" = "minimize"
  ),
  parameter_spec = list(
    "n.trees" = aiplatform$hyperparameter_tuning$IntegerParameterSpec(
      min = as.integer(10), max = as.integer(1000), scale = "linear"
    ),
    "interaction.depth" = aiplatform$hyperparameter_tuning$IntegerParameterSpec(
      min = as.integer(1), max = as.integer(10), scale = "linear"
    ),
    "n.minobsinnode" = aiplatform$hyperparameter_tuning$IntegerParameterSpec(
      min = as.integer(1), max = as.integer(20), scale = "linear"
    )
  )
)
```

To tune the model, you call the method run().

```r
hpt_job$run()
hpt_job
```

Finally, to list all trials and their respective results, you can inspect hpt_job$trials.

```r
# Collect the trial ID and final metric value of each trial.
# metrics is a Python-side object, so it is indexed from 0.
results <- lapply(hpt_job$trials,
  function(x) { c(as.integer(x$id), as.numeric(x$final_measurement$metrics[[0]]$value)) }
)
results <- as.data.frame(do.call(rbind, results))
colnames(results) <- c("id", "metric")
```

And find the trial with the lowest error.

```r
best_trial <- results[results$metric == min(results$metric), ]$id
hpt_job$trials[[best_trial]]
```
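If two trials happen to report exactly the same metric value, selecting rows by equality with the minimum returns more than one ID. A minimal sketch of a tie-safe alternative follows, using a hypothetical results table with made-up values in place of the output of a real tuning job.

```r
# Hypothetical results table with the same columns as above; values are made up.
results <- data.frame(
  id = c(1L, 2L, 3L, 4L),
  metric = c(41250.3, 39870.9, 40110.5, 39870.9)
)

# which.min() returns the index of the first minimum,
# so a tie still resolves to a single trial.
best_row <- results[which.min(results$metric), ]
best_trial <- best_row$id

# Rank all trials from lowest to highest error for a quick overview.
ranked <- results[order(results$metric), ]
print(ranked)
print(best_trial)
```

Ranking the full table is also a convenient way to see how close the runner-up trials are to the best one before deciding which model to deploy.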

The results of this tuning job can also be inspected from the Vertex AI Console.

In this blog post, you have gone through tuning a custom R model using Vertex AI. For easier reproducibility, you can refer to this notebook on GitHub. You can deploy the resultant model from the best trial on Vertex AI Prediction following the article here.
