+This outlines how to propose a change to torch. For more detailed information about contributing to this and other tidyverse packages, please see the development contributing guide.
+You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the source file. This generally means you’ll need to edit roxygen2 comments in an .R file, not an .Rd file. You can find the .R file that generates the .Rd by reading the comment in the first line.
See also the Documentation section.
+If you find a bug in torch, please open an issue here. Please provide detailed information on how to reproduce the bug; ideally, also provide a reprex.
Feel free to open issues here with the feature-request tag. Before opening one, search for an existing issue for your feature request; if one exists, it’s better to comment on it or upvote it instead of opening a new one.
We welcome contributed examples. Feel free to open a PR with new examples; they should be placed in the vignettes/examples folder.
Each example should consist of an .R file and an .Rmd file with the same name; the .Rmd file just renders the code.
+See mnist-mlp.R and mnist-mlp.Rmd
+One must be able to run the example without manually downloading any dataset/file. You should also add an entry to the _pkgdown.yaml file.
We have many open issues in the GitHub repo. If there’s one you want to work on, comment on it and ask for directions.
+New R code should follow the tidyverse style guide. You can use the styler package to apply these styles, or simply run the tools/style.sh script, which also formats the code and removes extraneous whitespace. Please don’t restyle code that has nothing to do with your PR.
New C/C++ code should follow the Google C++ style guide. You can use clang-format to apply these styles, or simply run the tools/style.sh script, which also formats the code and removes extraneous whitespace. Please don’t restyle code that has nothing to do with your PR.
We use roxygen2, with Markdown syntax, to build all documentation for the package.
We use testthat for unit tests. Contributions with test cases included are easier to accept.
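A minimal sketch of what an accompanying test might look like, placed in a file under tests/testthat/ (the file name and the behavior being tested are only illustrative):

```r
# tests/testthat/test-tensor-conversion.R  (hypothetical file name)
library(testthat)
library(torch)

test_that("a numeric R vector round-trips through torch_tensor()", {
  x <- torch_tensor(c(1, 2, 3))
  # R doubles become float32 tensors; 1, 2 and 3 are exactly representable
  expect_equal(as.numeric(x), c(1, 2, 3))
  # a 1-D tensor with three elements
  expect_equal(x$shape, 3)
})
```

Running devtools::test() picks up every test-*.R file in that folder.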
devtools package
We use devtools as the development toolchain, but a few setup steps are required first.
The first time you clone the repository, you must run:
+
+source("tools/buildlantern.R")
This compiles the Lantern binaries, downloads LibTorch, and copies the binaries to the deps folder in the working directory.
This command must be run every time you modify Lantern code, i.e., code that lives in lantern/src.
You can then run
+
+devtools::load_all()
to load torch and test interactively, or
+
+devtools::test()
to run the test suite.
+YEAR: 2020
+COPYRIGHT HOLDER: Daniel Falbel
Copyright (c) 2020 Daniel Falbel
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+library(torch)
+torch_manual_seed(1) # setting seed for reproducibility
This vignette showcases the basic functionality of distributions in torch. The distributions modules are currently considered ‘work in progress’ and are still experimental features in the torch package. You can follow the progress at this link.
+The distributions modules in torch are modelled after PyTorch’s distributions module which in turn is based on the TensorFlow Distributions package.
+This vignette is based on TensorFlow’s distributions tutorial.
+Let’s start and create a new instance of a normal distribution:
+
+n <- distr_normal(loc = 0, scale = 1)
+n
+#> torch_Normal ()
We can draw samples from it with:
+
+n$sample()
+#> torch_tensor
+#> -0.2807
+#> [ CPUFloatType{1} ]
or draw multiple samples:
+
+n$sample(3)
+#> torch_tensor
+#> -1.6389
+#> -0.0983
+#> 1.1163
+#> [ CPUFloatType{3,1} ]
We can evaluate the log probability of values:
+
+n$log_prob(0)
+#> torch_tensor
+#> -0.9189
+#> [ CPUFloatType{1} ]
+log(dnorm(0)) # equivalent R code
+#> [1] -0.9189385
or evaluate multiple log probabilities:
+
+n$log_prob(c(0, 2, 4))
+#> torch_tensor
+#> -0.9189
+#> -2.9189
+#> -8.9189
+#> [ CPUFloatType{3} ]
A distribution can take a tensor as its parameters:
+
+b <- distr_bernoulli(probs = torch_tensor(c(0.25, 0.5, 0.75)))
+b
+#> torch_Bernoulli ()
This object represents 3 independent Bernoulli distributions, one for each element of the tensor.
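Because the three distributions are independent, log_prob() evaluated on a tensor with one value per distribution returns one log probability per element. A minimal sketch, assuming that element-wise behavior: the probability of observing a 1 under each distribution should recover the probs used to build it.

```r
library(torch)

b <- distr_bernoulli(probs = torch_tensor(c(0.25, 0.5, 0.75)))

# one value per distribution: the log probability of observing a 1 under each
lp <- b$log_prob(torch_tensor(c(1, 1, 1)))

# back on the probability scale
p_one <- exp(as.numeric(lp))
p_one  # approximately 0.25 0.50 0.75
```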
+We can sample a single observation:
+
+b$sample()
+#> torch_tensor
+#> 0
+#> 1
+#> 1
+#> [ CPUFloatType{3} ]
or a batch of n observations:
+b$sample(6)
+#> torch_tensor
+#> 0 1 0
+#> 0 0 1
+#> 0 0 1
+#> 0 1 1
+#> 0 0 1
+#> 0 0 1
+#> [ CPUFloatType{6,3} ]
The log_prob method of distributions is differentiable, so distributions can be used to train models in torch.
Let’s implement a Gaussian linear model, but first let’s simulate some data:
+
+x <- torch_randn(100, 1)
+y <- 2*x + 1 + torch_randn(100, 1)
and plot:
+
+plot(as.numeric(x), as.numeric(y))
We can now define our model:
+
+GaussianLinear <- nn_module(
+ initialize = function() {
+ # this linear predictor will estimate the mean of the normal distribution
+ self$linear <- nn_linear(1, 1)
+ # this parameter will hold the estimate of the variability
+ self$scale <- nn_parameter(torch_ones(1))
+ },
+ forward = function(x) {
+ # we estimate the mean
+ loc <- self$linear(x)
+ # return a normal distribution
+ distr_normal(loc, self$scale)
+ }
+)
+
+model <- GaussianLinear()
We can now train our model with:
+
+opt <- optim_sgd(model$parameters, lr = 0.1)
+
+for (i in 1:100) {
+ opt$zero_grad()
+ d <- model(x)
+ loss <- torch_mean(-d$log_prob(y))
+ loss$backward()
+ opt$step()
+ if (i %% 10 == 0)
+ cat("iter: ", i, " loss: ", loss$item(), "\n")
+}
+#> iter: 10 loss: 1.809854
+#> iter: 20 loss: 1.606322
+#> iter: 30 loss: 1.46065
+#> iter: 40 loss: 1.399433
+#> iter: 50 loss: 1.39078
+#> iter: 60 loss: 1.390136
+#> iter: 70 loss: 1.390089
+#> iter: 80 loss: 1.390086
+#> iter: 90 loss: 1.390085
+#> iter: 100 loss: 1.390085
We can see the parameter estimates with:
+
+model$parameters
+#> $linear.weight
+#> torch_tensor
+#> 1.9772
+#> [ CPUFloatType{1,1} ][ requires_grad = TRUE ]
+#>
+#> $linear.bias
+#> torch_tensor
+#> 1.0390
+#> [ CPUFloatType{1} ][ requires_grad = TRUE ]
+#>
+#> $scale
+#> torch_tensor
+#> 0.9716
+#> [ CPUFloatType{1} ][ requires_grad = TRUE ]
and quickly compare with the glm() function:
+summary(glm(as.numeric(y) ~ as.numeric(x)))
+#>
+#> Call:
+#> glm(formula = as.numeric(y) ~ as.numeric(x))
+#>
+#> Deviance Residuals:
+#> Min 1Q Median 3Q Max
+#> -2.5311 -0.6277 -0.1177 0.5544 3.3037
+#>
+#> Coefficients:
+#> Estimate Std. Error t value Pr(>|t|)
+#> (Intercept) 1.03900 0.09844 10.55 <2e-16 ***
+#> as.numeric(x) 1.97723 0.09392 21.05 <2e-16 ***
+#> ---
+#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+#>
+#> (Dispersion parameter for gaussian family taken to be 0.963191)
+#>
+#> Null deviance: 521.260 on 99 degrees of freedom
+#> Residual deviance: 94.393 on 98 degrees of freedom
+#> AIC: 284.02
+#>
+#> Number of Fisher Scoring iterations: 2
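The scale estimated by torch (0.9716) is not the square root of glm’s dispersion (sqrt(0.963191) ≈ 0.9814). Minimizing the mean negative log-likelihood gives the maximum-likelihood scale, sqrt(RSS / n), whereas glm divides the residual sum of squares by n - 2; with the output above, sqrt(94.393 / 100) ≈ 0.9716. The base-R sketch below checks this relationship on freshly simulated data, so its numbers will not match the tensors above.

```r
# fresh simulated data in the spirit of the example above (different draws)
set.seed(1)
x_sim <- rnorm(100)
y_sim <- 2 * x_sim + 1 + rnorm(100)

fit <- glm(y_sim ~ x_sim)
n_obs <- length(y_sim)

# for the gaussian family the deviance is the residual sum of squares
rss <- deviance(fit)
# glm's dispersion estimate divides by the residual degrees of freedom, n - 2
dispersion <- summary(fit)$dispersion
# the maximum-likelihood scale divides by n instead
sigma_mle <- sqrt(rss / n_obs)

c(sqrt_dispersion = sqrt(dispersion), sigma_mle = sigma_mle)
```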
+library(torch)
+
+# creates example tensors. x requires_grad = TRUE tells that
+# we are going to take derivatives over it.
+x <- torch_tensor(3, requires_grad = TRUE)
+y <- torch_tensor(2)
+
+# executes the forward operation x^2
+o <- x^2
+
+# computes the backward operation for each tensor that is marked with
+# requires_grad = TRUE
+o$backward()
+
+# get do/dx = 2 * x (at x = 3)
+x$grad
+## torch_tensor
+## 6
+## [ CPUFloatType{1} ]
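In the example above, y is created but never used. A slightly extended sketch in which y enters the computation as a constant: since y does not require gradients, only x$grad is populated, and it equals 2 * x * y.

```r
library(torch)

x <- torch_tensor(3, requires_grad = TRUE)
y <- torch_tensor(2)  # requires_grad defaults to FALSE

# o = x^2 * y, so do/dx = 2 * x * y
o <- x^2 * y
o$backward()

x$grad  # 2 * 3 * 2 = 12
```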
+
+library(torch)
+
+# nn_module() creates a new module generator: `initialize` sets up the
+# parameters and `forward` defines the computation.
+dense <- nn_module(
+ classname = "dense",
+ # the initialize function runs whenever we instantiate the model
+ initialize = function(in_features, out_features) {
+
+ # just for you to see when this function is called
+ cat("Calling initialize!")
+
+ # we use nn_parameter to indicate that those tensors are special
+ # and should be treated as parameters by `nn_module`.
+ self$w <- nn_parameter(torch_randn(in_features, out_features))
+ self$b <- nn_parameter(torch_zeros(out_features))
+
+ },
+ # this function is called whenever we call our model on input.
+ forward = function(x) {
+ cat("Calling forward!")
+ torch_mm(x, self$w) + self$b
+ }
+)
+
+model <- dense(3, 1)
+## Calling initialize!
+
+# you can get all parameters
+model$parameters
+## $w
+## torch_tensor
+## -1.5142
+## 0.5371
+## -2.1216
+## [ CPUFloatType{3,1} ][ requires_grad = TRUE ]
+##
+## $b
+## torch_tensor
+## 0
+## [ CPUFloatType{1} ][ requires_grad = TRUE ]
+
+# or individually
+model$w
+## torch_tensor
+## -1.5142
+## 0.5371
+## -2.1216
+## [ CPUFloatType{3,1} ][ requires_grad = TRUE ]
+
+model$b
+## torch_tensor
+## 0
+## [ CPUFloatType{1} ][ requires_grad = TRUE ]
+
+# create an input tensor
+x <- torch_randn(10, 3)
+y_pred <- model(x)
+## Calling forward!
+
+y_pred
+## torch_tensor
+## -3.5633
+## 0.2356
+## 0.8124
+## 1.0558
+## 0.6703
+## -2.9113
+## -3.8835
+## 1.8210
+## 4.3455
+## 0.8677
+## [ CPUFloatType{10,1} ][ grad_fn = <AddBackward0> ]
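Because w and b are registered with nn_parameter(), calling backward() on a scalar built from the model’s output fills in their $grad fields. A minimal sketch, using a hypothetical copy of dense without the cat() calls: for loss = sum(x %*% w + b), the gradient with respect to w is the column sums of x, and the gradient with respect to b is the number of rows.

```r
library(torch)

# hypothetical quiet copy of the `dense` module above
dense_quiet <- nn_module(
  classname = "dense_quiet",
  initialize = function(in_features, out_features) {
    self$w <- nn_parameter(torch_randn(in_features, out_features))
    self$b <- nn_parameter(torch_zeros(out_features))
  },
  forward = function(x) {
    torch_mm(x, self$w) + self$b
  }
)

model <- dense_quiet(3, 1)
x <- torch_randn(10, 3)

# reduce the output to a scalar and backpropagate
loss <- model(x)$sum()
loss$backward()

grad_w <- as.numeric(model$w$grad)  # should equal colSums of x
grad_b <- as.numeric(model$b$grad)  # should equal 10, one per row
```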
+
+library(torch)
+
+# In deep learning models you don't usually have all your data in RAM
+# because you are usually training using mini-batch gradient descent
+# so only one mini-batch needs to be in RAM at a time.
+
+# In torch we use the `datasets` abstraction to define the process of
+# loading data. Once you have defined your dataset you can use torch
+# dataloaders that allow you to iterate over this dataset in batches.
+
+# Note that datasets are optional in torch. They are just there as a
+# recommended way to load data.
+
+# Below you will see an example of how to create a simple torch dataset
+# that pre-processes a data.frame into tensors so you can feed them to
+# a model.
+
+df_dataset <- dataset(
+ "mydataset",
+
+ # the input data to your dataset goes in the initialize function.
+ # our dataset will take a dataframe and the name of the response
+ # variable.
+ initialize = function(df, response_variable) {
+ self$df <- df[,-which(names(df) == response_variable)]
+ self$response_variable <- df[[response_variable]]
+ },
+
+ # the .getitem method takes an index as input and returns the
+ # corresponding item from the dataset.
+ # the index could be anything. the dataframe could have many
+ # rows for each index and the .getitem method would do some
+ # kind of aggregation before returning the element.
+ # in our case the index will be a row of the data.frame,
+ .getitem = function(index) {
+ response <- torch_tensor(self$response_variable[index])
+ x <- torch_tensor(as.numeric(self$df[index,]))
+
+ # note that the dataloaders will automatically stack tensors
+ # creating a new dimension
+ list(x = x, y = response)
+ },
+
+ # It's optional, but helpful to define the .length method returning
+ # the number of elements in the dataset. This is needed if you want
+ # to shuffle your dataset.
+ .length = function() {
+ length(self$response_variable)
+ }
+
+)
+
+
+# we can now initialize an instance of our dataset.
+# for example
+mtcars_dataset <- df_dataset(mtcars, "mpg")
+
+# now we can get an item with
+mtcars_dataset$.getitem(1)
+## $x
+## torch_tensor
+## 6.0000
+## 160.0000
+## 110.0000
+## 3.9000
+## 2.6200
+## 16.4600
+## 0.0000
+## 1.0000
+## 4.0000
+## 4.0000
+## [ CPUFloatType{10} ]
+##
+## $y
+## torch_tensor
+## 21
+## [ CPUFloatType{1} ]
+
+# Given a dataset you can create a dataloader with
+dl <- dataloader(mtcars_dataset, batch_size = 15, shuffle = TRUE)
+
+# we can then loop through the elements of the dataloader with
+coro::loop(for(batch in dl) {
+ cat("X size: ")
+ print(batch[[1]]$size())
+ cat("Y size: ")
+ print(batch[[2]]$size())
+})
+## X size: [1] 15 10
+## Y size: [1] 15 1
+## X size: [1] 15 10
+## Y size: [1] 15 1
+## X size: [1] 2 10
+## Y size: [1] 2 1
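The last batch above has only 2 rows because the 32 rows of mtcars do not divide evenly into batches of 15. A minimal sketch, assuming torch’s tensor_dataset() as a stand-in for mtcars_dataset, showing how the drop_last argument of dataloader() changes the number of batches:

```r
library(torch)

# stand-in for mtcars_dataset: predictors and response as tensors
x <- torch_tensor(as.matrix(mtcars[, -1]))
y <- torch_tensor(mtcars$mpg)
ds <- tensor_dataset(x, y)

# keep the incomplete final batch (the default)
dl_all <- dataloader(ds, batch_size = 15, drop_last = FALSE)
# discard it
dl_full <- dataloader(ds, batch_size = 15, drop_last = TRUE)

dl_all$.length()   # ceiling(32 / 15) = 3 batches
dl_full$.length()  # floor(32 / 15) = 2 batches
```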
+ Gallery of scripts demonstrating torch functionality.
| Examples |
|---|
| basic-autograd |
| basic-nn-module |
| dataset |
Adding operations to autograd requires implementing a new autograd_function for each operation. Recall that autograd_functions are what autograd uses to compute the results and gradients, and to encode the operation history. Every new function requires you to implement two methods:
forward() - the code that performs the operation. It can take as many arguments as you want, with some of them being optional if you specify default values. All kinds of R objects are accepted here. Tensor arguments that track history (i.e., with requires_grad=TRUE) will be converted to ones that don’t track history before the call, and their use will be registered in the graph. Note that this logic won’t traverse lists or any other data structures and will only consider tensors that are direct arguments to the call. You can return either a single tensor output or a list of tensors if there are multiple outputs. Also, please refer to the docs of autograd_function to find descriptions of useful methods that can be called only from forward().
backward() - the gradient formula. It will be given as many tensor arguments as there were outputs, with each of them representing the gradient w.r.t. that output. It should return as many tensors as there were tensors that required gradients in forward, with each of them containing the gradient w.r.t. its corresponding input.
It’s the user’s responsibility to use the special functions in the forward’s ctx properly in order to ensure that the new autograd_function works properly with the autograd engine.
save_for_backward() must be used when saving inputs or outputs of the forward to be used later in the backward.
mark_dirty() must be used to mark any input that is modified in-place by the forward function.
mark_non_differentiable() must be used to tell the engine if an output is not differentiable.
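As a small illustration of save_for_backward(), here is a sketch of a hypothetical custom square function whose backward() reuses the saved input. Its gradient can be compared with the one autograd computes for the built-in x^2.

```r
library(torch)

# hypothetical custom op: forward computes x^2, backward returns 2 * x * grad_output
square <- autograd_function(
  forward = function(ctx, x) {
    # keep the input around for the gradient formula
    ctx$save_for_backward(x = x)
    x^2
  },
  backward = function(ctx, grad_output) {
    s <- ctx$saved_variables
    # gradients are returned in a list named after the forward arguments
    list(x = 2 * s$x * grad_output)
  }
)

x1 <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)
square(x1)$sum()$backward()

x2 <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)
(x2^2)$sum()$backward()

custom_grad <- as.numeric(x1$grad)   # 2 4 6
builtin_grad <- as.numeric(x2$grad)  # 2 4 6
```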
Below you can find code for a linear function:
+
+linear <- autograd_function(
+ forward = function(ctx, input, weight, bias = NULL) {
+ ctx$save_for_backward(input = input, weight = weight, bias = bias)
+ output <- input$mm(weight$t())
+ if (!is.null(bias))
+ output <- output + bias$unsqueeze(1)$expand_as(output) # dims are 1-based in R torch
+
+ output
+ },
+ backward = function(ctx, grad_output) {
+
+ s <- ctx$saved_variables
+
+ grads <- list(
+ input = NULL,
+ weight = NULL,
+ bias = NULL
+ )
+
+ if (ctx$needs_input_grad$input)
+ grads$input <- grad_output$mm(s$weight)
+
+ if (ctx$needs_input_grad$weight)
+ grads$weight <- grad_output$t()$mm(s$input)
+
+ if (!is.null(s$bias) && ctx$needs_input_grad$bias)
+ grads$bias <- grad_output$sum(dim = 1) # sum over the batch dimension (1-based)
+
+ grads
+ }
+)
Here is an additional example of a function that is parametrized by non-tensor arguments:
+
+mul_constant <- autograd_function(
+ forward = function(ctx, tensor, constant) {
+ ctx$save_for_backward(constant = constant)
+ tensor * constant
+ },
+ backward = function(ctx, grad_output) {
+ v <- ctx$saved_variables
+ list(
+ tensor = grad_output * v$constant
+ )
+ }
+)
+x <- torch_tensor(1, requires_grad = TRUE)
+o <- mul_constant(x, 2)
+o$backward()
+x$grad
+#> torch_tensor
+#> 2
+#> [ CPUFloatType{1} ]
In this article we describe the indexing operator for torch tensors and how it compares to the R indexing operator for arrays.
+Torch’s indexing semantics are closer to numpy’s semantics than R’s. You will find a lot of similarities between this article and the numpy indexing article available here.
Single element indexing for 1-D tensors works mostly as expected. Like R, it is 1-based. Unlike R, though, it accepts negative indices for indexing from the end of the array. (In R, negative indices are used to remove elements.)
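A side-by-side sketch of the difference, comparing a plain R vector with a tensor holding the same values:

```r
library(torch)

r_vec <- 1:10
t_vec <- torch_tensor(1:10)

r_vec[-1]              # R: removes the first element, giving 2 3 4 5 6 7 8 9 10
as.numeric(t_vec[-1])  # torch: selects the last element, giving 10
```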
+
+x <- torch_tensor(1:10)
+x[1]
+#> torch_tensor
+#> 1
+#> [ CPULongType{} ]
+x[-1]
+#> torch_tensor
+#> 10
+#> [ CPULongType{} ]
You can also subset matrices and higher-dimensional arrays using the same syntax:
+
+x <- x$reshape(shape = c(2,5))
+x
+#> torch_tensor
+#> 1 2 3 4 5
+#> 6 7 8 9 10
+#> [ CPULongType{2,5} ]
+x[1,3]
+#> torch_tensor
+#> 3
+#> [ CPULongType{} ]
+x[1,-1]
+#> torch_tensor
+#> 5
+#> [ CPULongType{} ]
Note that if one indexes a multidimensional tensor with fewer indices than dimensions, torch returns the remaining dimensions in full (here, the whole first row), unlike R, which would treat the array as a flattened vector. For example:
+
+x[1]
+#> torch_tensor
+#> 1
+#> 2
+#> 3
+#> 4
+#> 5
+#> [ CPULongType{5} ]
It is possible to slice and stride arrays to extract sub-arrays with the same number of dimensions, but of different sizes than the original. This is best illustrated by a few examples:
+
+x <- torch_tensor(1:10)
+x
+#> torch_tensor
+#> 1
+#> 2
+#> 3
+#> 4
+#> 5
+#> 6
+#> 7
+#> 8
+#> 9
+#> 10
+#> [ CPULongType{10} ]
+x[2:5]
+#> torch_tensor
+#> 2
+#> 3
+#> 4
+#> 5
+#> [ CPULongType{4} ]
+x[1:(-7)]
+#> torch_tensor
+#> 1
+#> 2
+#> 3
+#> 4
+#> [ CPULongType{4} ]
You can also use the 1:10:2 syntax, which means: in the range from 1 to 10, take every second item. For example:
+x[1:5:2]
+#> torch_tensor
+#> 1
+#> 3
+#> 5
+#> [ CPULongType{3} ]
Another special syntax is N, meaning the size of the specified dimension.
+x[5:N]
+#> torch_tensor
+#> 5
+#> 6
+#> 7
+#> 8
+#> 9
+#> 10
+#> [ CPULongType{6} ]
Note: the slicing behavior relies on non-standard evaluation. It requires that the expression itself is passed to `[`, not the resulting R vector.
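When the bounds are only known at run time, the slice can be built explicitly with slc(), so the bounds are read from ordinary R variables. A minimal sketch (the variable names from, to and by are only illustrative):

```r
library(torch)

x <- torch_tensor(1:10)

# bounds computed at run time, e.g. passed in as function arguments
from <- 2
to <- 8
by <- 3

# slc() creates the slice object directly; the end is inclusive
sliced <- x[slc(start = from, end = to, step = by)]
as.numeric(sliced)  # 2 5 8
```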
To allow dynamic indices, you can create a new slice using the slc function. For example:
+x[1:5:2]
+#> torch_tensor
+#> 1
+#> 3
+#> 5
+#> [ CPULongType{3} ]
is equivalent to:
+
+x[slc(start = 1, end = 5, step = 2)]
+#> torch_tensor
+#> 1
+#> 3
+#> 5
+#> [ CPULongType{3} ]
Like in R, you can take all elements in a dimension by leaving an index empty.
+Consider a matrix:
+
+x <- torch_randn(2, 3)
+x
+#> torch_tensor
+#> -1.3812 0.8185 -0.6109
+#> -1.5838 0.8854 1.4161
+#> [ CPUFloatType{2,3} ]
The following syntax will give you the first row:
+
+x[1,]
+#> torch_tensor
+#> -1.3812
+#> 0.8185
+#> -0.6109
+#> [ CPUFloatType{3} ]
And this would give you the first 2 columns:
+
+x[,1:2]
+#> torch_tensor
+#> -1.3812 0.8185
+#> -1.5838 0.8854
+#> [ CPUFloatType{2,2} ]
By default, when indexing by a single integer, this dimension will be dropped to avoid the singleton dimension:
+
+x <- torch_randn(2, 3)
+x[1,]$shape
+#> [1] 3
You can optionally use the drop = FALSE argument to avoid dropping the dimension.
+x[1,,drop = FALSE]$shape
+#> [1] 1 3
It’s possible to add a new dimension to a tensor using index-like syntax:
+
+x <- torch_tensor(c(10))
+x$shape
+#> [1] 1
+x[, newaxis]$shape
+#> [1] 1 1
+x[, newaxis, newaxis]$shape
+#> [1] 1 1 1
You can also use NULL instead of newaxis:
+x[,NULL]$shape
+#> [1] 1 1
Sometimes we don’t know how many dimensions a tensor has, but we do know what to do with the last available dimension, or the first one. To subsume all others, we can use ..:
+z <- torch_tensor(1:125)$reshape(c(5,5,5))
+z[1,..]
+#> torch_tensor
+#> 1 2 3 4 5
+#> 6 7 8 9 10
+#> 11 12 13 14 15
+#> 16 17 18 19 20
+#> 21 22 23 24 25
+#> [ CPULongType{5,5} ]
+z[..,1]
+#> torch_tensor
+#> 1 6 11 16 21
+#> 26 31 36 41 46
+#> 51 56 61 66 71
+#> 76 81 86 91 96
+#> 101 106 111 116 121
+#> [ CPULongType{5,5} ]
Vector indexing is also supported, but care must be taken regarding performance, as it is generally much slower than slice-based indexing.
Note: starting from version 0.5.0, vector indexing in torch follows R semantics; prior to that, the behavior was similar to numpy’s advanced indexing. To use the old behavior, consider using ?torch_index, ?torch_index_put or torch_index_put_.
+x <- torch_randn(4,4)
+x[c(1,3), c(1,3)]
+#> torch_tensor
+#> -1.9185 0.5525
+#> -1.1085 0.4425
+#> [ CPUFloatType{2,2} ]
You can also use boolean vectors, for example:
+
+x[c(TRUE, FALSE, TRUE, FALSE), c(TRUE, FALSE, TRUE, FALSE)]
+#> torch_tensor
+#> -1.9185 0.5525
+#> -1.1085 0.4425
+#> [ CPUFloatType{2,2} ]
The above examples also work if the indices are long or boolean tensors instead of R vectors. It’s also possible to index with multi-dimensional boolean tensors:
+
+x <- torch_tensor(rbind(
+ c(1,2,3),
+ c(4,5,6)
+))
+x[x>3]
+#> torch_tensor
+#> 4
+#> 5
+#> 6
+#> [ CPUFloatType{3} ]
After the usual R package installation, torch requires installing two other libraries: LibTorch and LibLantern. If you are using torch in interactive mode, they are installed automatically, based on information detected about your OS. If you are running torch in a non-interactive environment, you need to set the TORCH_INSTALL env var to 1 so they are installed automatically, or manually call torch::install_torch().
We provide pre-compiled binaries for all major platforms; you can find specific installation instructions below.
+If you don’t have a GPU or want to install the CPU version of torch, you can install with:
+install.packages("torch")
Some Windows distributions don’t have the Visual Studio runtime pre-installed, and you will observe an error like:
+Error in cpp_lantern_init(normalizePath(install_path())): C:\Users\User\Documents\R\R-4.0.2\library\torch\deps\lantern.dll - The specified module could not be found.
+See here for instructions on how to install it.
+Since version 0.1.1, torch supports GPU installation on Windows. In order to use GPUs with torch you need to:
+Have a CUDA compatible NVIDIA GPU. You can check whether you have a CUDA compatible GPU here.
Have properly installed the NVIDIA CUDA toolkit version 10.2 or 11.1. For CUDA v11.1, follow the installation instructions here. Note: we currently do not support the latest CUDA 11 release.
Have installed cuDNN version 7.6 (if using CUDA v10.2) or cuDNN 8.0 (if using CUDA 11.1). Follow the installation instructions available here.
Once you have installed all pre-requisites you can install torch with:
+install.packages("torch")
If you have followed the default installation locations, we will detect that you have CUDA software installed and automatically download the GPU-enabled Lantern binaries. You can also set the CUDA env var with something like Sys.setenv(CUDA="11.1") if you want to force a specific version of the CUDA toolkit.
We only support CPU builds of torch on macOS. On macOS you can install torch with:
+
+install.packages("torch")
To install the GPU version of torch on Linux you must verify that:
You have an NVIDIA CUDA compatible GPU. You can check whether you have a CUDA compatible GPU here.
You have correctly installed the NVIDIA CUDA Toolkit version 10.2 or 11.1. Follow the instructions here.
You have installed cuDNN version 7.6 (for CUDA 10.2) or 8.0 (for CUDA 11.1). Follow the installation instructions available here.
Once you have installed all pre-requisites you can install torch with:
+install.packages("torch")
If you have followed the default installation locations, we will detect that you have CUDA software installed and automatically download the GPU-enabled Lantern binaries. You can also set the CUDA env var with something like Sys.setenv(CUDA="10.2") if you want to force a specific version of the CUDA toolkit.
If the library download times out, or if downloads end after a while with a warning such as:
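After installation, cuda_is_available() offers a quick way to check whether the GPU build can see a device. A minimal sketch that creates a tensor on the GPU when one is found and falls back to the CPU otherwise:

```r
library(torch)

if (cuda_is_available()) {
  # GPU-enabled Lantern binaries were installed and a device is visible
  x <- torch_randn(2, 2, device = "cuda")
} else {
  # CPU-only installation, or no compatible GPU detected
  x <- torch_randn(2, 2)
}

x$device
```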
+Warning messages:
+1: In utils::download.file(library_url, temp_file) :
+ downloaded length 44901568 != reported length 141774525
+2: In utils::download.file(library_url, temp_file) :
+ URL '...': Timeout of 60 seconds was reached
+3: Failed to install Torch, manually run install_torch(). download from 'https://download.pytorch.org/libtorch/cpu/libtorch-macos-1.7.1.zip' failed
+This means the download timed out. You should then increase the timeout value in install_torch(), like:
+install_torch(timeout = 600)
In cases where you cannot reach the download servers from the machine you intend to install torch on, the last resort is to install the Torch and Lantern libraries from files. This is done in 3 steps:
+1- get the download URLs of the files.
+
+get_install_libs_url(type = "10.2")
2- save those files into the machine’s filesystem. We will use /tmp/ here as an example.
3- install torch from files
+
+install_torch_from_file(libtorch = "file:///tmp/libtorch-cxx11-abi-shared-with-deps-1.7.1%2Bcu101.zip",
+ liblantern = "file:///tmp/Linux-gpu-101.zip")
Central to data ingestion and preprocessing are datasets and data loaders. A dataset is an object that holds the data to use, while a data loader is an object that loads the data from a dataset, providing a way to access subsets of the data. By using datasets and data loaders you will have a process for clearly organizing your data and passing it to other components of the torch package, such as model training.
+Built into torch are premade datasets that are commonly used in machine learning, such as the MNIST handwriting dataset (mnist_dataset()). Most of the prebuilt datasets relate to image recognition and natural language processing.
Below is an example of how you would use the MNIST dataset with a dataloader. First, the mnist_dataset() function is used to create ds, which is a Dataset object. Then a dataloader dl is created to query that data. Finally, that dataloader is used in a coro::loop() to iterate over batches of that data:
# Hypothetical download location: any writable directory will do
dir <- tempdir()
# Create a dataset from one of the prebuilt datasets (downloaded on first use)
+ds <- mnist_dataset(
+ dir,
+ download = TRUE,
+ transform = function(x) {
+ x <- x$to(dtype = torch_float())/256
+ x[newaxis,..]
+ }
+)
+
+# Create the loader to query the data in batches
+dl <- dataloader(ds, batch_size = 32, shuffle = TRUE)
+
+coro::loop(for (b in dl) {
+# use the data from each batch `b` here
+# ...
+})
See vignettes/examples/mnist-cnn.R for a complete example.
In the more common situation where you have a unique set of data that isn’t included with the package, you’ll need to make a custom Dataset subclass using the dataset() function. The custom Dataset subclass is an abstract R6 container for the data. It needs to know some information about the particular dataset, such as how to iterate over it.
At a minimum, when using dataset() to create a custom Dataset class you’ll want to define the following:
name - for convenience, keeps track of what type of data it is.
initialize - a member function defining how to create an object of that class. It can take no parameters, when all objects of that class will be the same, or specific parameters, usually when different objects should hold different data.
.getitem - this member function is called to retrieve a single item by its index whenever the data loader assembles a new batch. You can include preprocessing in this function if needed. Note that the function will be called extremely frequently, so it’s advantageous to make it fast.
.length - this returns the number of items in the dataset, which is helpful for users.
While this may sound complicated, the base logic is only a few steps; the complexity often comes from the data itself and how involved your preprocessing is. Here we show how to create your own Dataset class to train on Allison Horst’s penguins.
| Component | Dataset R6 class | Dataset object | DataLoader object | batch |
|---|---|---|---|---|
| Description | Output of dataset(). When calling dataset() it should have at least a name, initialize, .getitem, and .length. Output is a Dataset generator. | Object created by using the custom Dataset generator. Actually stores the data | Object that queries the Dataset object to pull batches of data | The subsample of data used for things like model training |
| Penguin example | penguins_dataset | tuxes | dl | b |
+
+library(palmerpenguins)
+library(magrittr)
+
+penguins
+#> # A tibble: 344 × 8
+#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
+#> <fct> <fct> <dbl> <dbl> <int> <int>
+#> 1 Adelie Torgersen 39.1 18.7 181 3750
+#> 2 Adelie Torgersen 39.5 17.4 186 3800
+#> 3 Adelie Torgersen 40.3 18 195 3250
+#> 4 Adelie Torgersen NA NA NA NA
+#> 5 Adelie Torgersen 36.7 19.3 193 3450
+#> 6 Adelie Torgersen 39.3 20.6 190 3650
+#> 7 Adelie Torgersen 38.9 17.8 181 3625
+#> 8 Adelie Torgersen 39.2 19.6 195 4675
+#> 9 Adelie Torgersen 34.1 18.1 193 3475
+#> 10 Adelie Torgersen 42 20.2 190 4250
+#> # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
In addition, any number of helper functions can be defined.
+Here, we assume the penguins have already been loaded, and all preprocessing consists of removing rows with NA values, transforming factors to numbers starting from 1, and converting from R data types to torch tensors.
In .getitem, we essentially decide how this data is going to be used: All variables besides species go into x, the predictor, and species will constitute y, the target. Predictor and target are returned in a list, to be accessed as batch[[1]] and batch[[2]] during training.
+penguins_dataset <- dataset(
+
+ name = "penguins_dataset",
+
+ initialize = function() {
+ self$data <- self$prepare_penguin_data()
+ },
+
+ .getitem = function(index) {
+
+ x <- self$data[index, 2:-1]
+ y <- self$data[index, 1]$to(torch_long())
+
+ list(x, y)
+ },
+
+ .length = function() {
+ self$data$size()[[1]]
+ },
+
+ prepare_penguin_data = function() {
+
+ input <- na.omit(penguins)
+ # conveniently, the categorical data are already factors
+ input$species <- as.numeric(input$species)
+ input$island <- as.numeric(input$island)
+ input$sex <- as.numeric(input$sex)
+
+ input <- as.matrix(input)
+ torch_tensor(input)
+ }
+)
Let’s create the dataset, query its length, and look at its first item:
+
+tuxes <- penguins_dataset()
+tuxes$.length()
+#> [1] 333
+tuxes$.getitem(1)
+#> [[1]]
+#> torch_tensor
+#> 3.0000
+#> 39.1000
+#> 18.7000
+#> 181.0000
+#> 3750.0000
+#> 2.0000
+#> 2007.0000
+#> [ CPUFloatType{7} ]
+#>
+#> [[2]]
+#> torch_tensor
+#> 1
+#> [ CPULongType{} ]
To be able to iterate over tuxes, we need a data loader (we override the default batch size of 1):
+dl <- tuxes %>% dataloader(batch_size = 8)
Calling .length() on a data loader (as opposed to a dataset) will return the number of batches we have:
+dl$.length()
+#> [1] 42
And we can create an iterator to inspect the first batch:
+
+iter <- dl$.iter()
+b <- iter$.next()
+b
+#> [[1]]
+#> torch_tensor
+#> 3.0000 39.1000 18.7000 181.0000 3750.0000 2.0000 2007.0000
+#> 3.0000 39.5000 17.4000 186.0000 3800.0000 1.0000 2007.0000
+#> 3.0000 40.3000 18.0000 195.0000 3250.0000 1.0000 2007.0000
+#> 3.0000 36.7000 19.3000 193.0000 3450.0000 1.0000 2007.0000
+#> 3.0000 39.3000 20.6000 190.0000 3650.0000 2.0000 2007.0000
+#> 3.0000 38.9000 17.8000 181.0000 3625.0000 1.0000 2007.0000
+#> 3.0000 39.2000 19.6000 195.0000 4675.0000 2.0000 2007.0000
+#> 3.0000 41.1000 17.6000 182.0000 3200.0000 1.0000 2007.0000
+#> [ CPUFloatType{8,7} ]
+#>
+#> [[2]]
+#> torch_tensor
+#> 1
+#> 1
+#> 1
+#> 1
+#> 1
+#> 1
+#> 1
+#> 1
+#> [ CPULongType{8} ]
To train a network, we can use coro::loop() to iterate over batches.
Our example network is very simple. (In reality, we would want to treat island as the categorical variable it is, and either one-hot-encode or embed it.)
+net <- nn_module(
+ "PenguinNet",
+ initialize = function() {
+ self$fc1 <- nn_linear(7, 32)
+ self$fc2 <- nn_linear(32, 3)
+ },
+ forward = function(x) {
+ x %>%
+ self$fc1() %>%
+ nnf_relu() %>%
+ self$fc2() %>%
+ nnf_log_softmax(dim = 2) # dim 1 is the batch, dim 2 holds the class scores
+ }
+)
+
+model <- net()
We still need an optimizer:
+
+optimizer <- optim_sgd(model$parameters, lr = 0.01)
And we’re ready to train:
+
+for (epoch in 1:10) {
+
+ l <- c()
+
+ coro::loop(for (b in dl) {
+ optimizer$zero_grad()
+ output <- model(b[[1]])
+ loss <- nnf_nll_loss(output, b[[2]])
+ loss$backward()
+ optimizer$step()
+ l <- c(l, loss$item())
+ })
+
+ cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
+}
Through this example we have trained a deep learning model, using dataset() to define a custom class and a data loader to load it in batches. By using the dataset and data loader, we were able to write code that separates the data preprocessing and setup from the model training itself.
When using datasets and data loaders, you may find that under certain conditions your code runs more slowly than you’d expect, because the overhead of dataloaders and datasets can impact overall performance. This may change as the R/C++ integration of torch improves, but for now there are some workarounds:
+.getbatch() instead of .getitem()
By default, a dataloader uses the .getitem() member function to fetch each data point individually. You can speed this up by implementing .getbatch() instead, which receives the indices of a whole batch and fetches all of its data points at once:
+penguins_dataset_batching <- dataset(
+
+ name = "penguins_dataset_batching",
+
+ initialize = function() {
+ self$data <- self$prepare_penguin_data()
+ },
+
+ # the only change is that this went from .getitem to .getbatch
+ .getbatch = function(index) {
+
+ x <- self$data[index, 2:-1]
+ y <- self$data[index, 1]$to(torch_long())
+
+ list(x, y)
+ },
+
+ .length = function() {
+ self$data$size()[[1]]
+ },
+
+ prepare_penguin_data = function() {
+
+ input <- na.omit(penguins)
+ # conveniently, the categorical data are already factors
+ input$species <- as.numeric(input$species)
+ input$island <- as.numeric(input$island)
+ input$sex <- as.numeric(input$sex)
+
+ input <- as.matrix(input)
+ torch_tensor(input)
+ }
+)
In many instances, the only change needed is to rename .getitem to .getbatch, because .getitem is often written in a way that already handles vectors of indices. In this penguins example, .getitem used the index to select rows, which works just as well with a vector of indices.
If switching to .getbatch does not provide the benefit you were expecting, you could also remove the dataset entirely and pass the data manually. At that point you are trading code readability and convenience for speed.
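The dataloader used earlier keeps the final, smaller batch because its drop_last argument defaults to FALSE, so it yields ceiling(n / batch_size) batches. A hand-written loop based on floor(), such as the one below, skips the leftover rows in every epoch. Here is a quick base R check of that arithmetic, taking n = 333 from the dataset length printed above:

```r
n <- 333          # rows left after na.omit(penguins), as printed above
batch_size <- 8

# dataloader() default (drop_last = FALSE): the last, smaller batch is kept
n_batches_kept <- ceiling(n / batch_size)

# a manual loop that uses floor() drops that partial batch
n_batches_floor <- floor(n / batch_size)
n_skipped <- n - n_batches_floor * batch_size

cat(n_batches_kept, n_batches_floor, n_skipped, "\n")
```

This matches the 42 batches reported by dl$.length(). With a floor-based loop, 5 observations are left out of each epoch.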
+input <- na.omit(penguins)
+# conveniently, the categorical data are already factors
+input$species <- as.numeric(input$species)
+input$island <- as.numeric(input$island)
+input$sex <- as.numeric(input$sex)
+
+input <- as.matrix(input)
+input <- torch_tensor(input)
+
+data_x <- input[, 2:-1]
+data_y <- input[, 1]$to(torch_long())
+
+batch_size <- 8
+num_data_points <- data_y$size(1)
+num_batches <- floor(num_data_points/batch_size)
+
+for(epoch in 1:10){
+
+ # reset the per-epoch loss record
+ l <- c()
+
+ # rearrange the data each epoch
+ permute <- torch_randperm(num_data_points) + 1L
+ data_x <- data_x[permute]
+ data_y <- data_y[permute]
+
+ # manually loop through the batches
+ for(batch_idx in 1:num_batches){
+
+ # here index is a vector of the indices in the batch
+ index <- (batch_size*(batch_idx-1) + 1):(batch_idx*batch_size)
+
+ x <- data_x[index]
+ y <- data_y[index]$to(torch_long())
+
+ optimizer$zero_grad()
+ output <- model(x)
+ loss <- nnf_nll_loss(output, y)
+ loss$backward()
+ optimizer$step()
+ l <- c(l, loss$item())
+ }
+
+ cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
+}
Currently the only way to load models from Python is to rewrite the model architecture in R, and all parameter names must be identical. A complete example from Python to R is shown below; it extends the Serialization vignette.
+An artificial neural net is implemented below in Python. Note the final line which uses torch.save().
+import torch
+import numpy as np
+
+#Make up data
+
+madeUpData_x = np.random.rand(1000,100)
+madeUpData_y = np.random.rand(1000)
+
+#Round to 0/1 to get binary class labels
+madeUpData_y = madeUpData_y.round()
+
+train_py_X = torch.from_numpy(madeUpData_x).float()
+
+train_py_Y = torch.from_numpy(madeUpData_y).float()
+
+#Note that this class must be replicated identically in R
+class simpleMLP(torch.nn.Module):
+ def __init__(self):
+ super(simpleMLP, self).__init__()
+ self.modelFit = torch.nn.Sequential(
+ torch.nn.Linear(100,20),
+ torch.nn.ReLU(),
+ torch.nn.Linear(20,1),
+ torch.nn.Sigmoid())
+
+ def forward(self, x):
+ x =self.modelFit(x)
+
+ return x
+
+model = simpleMLP()
+
+
+def modelTrainer(data_X,data_Y,model):
+ criterion = torch.nn.BCELoss()
+ optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
+
+ for epoch in range(100):
+
+ optimizer.zero_grad()
+
+ yhat = model(data_X)
+
+ loss = criterion(yhat,data_Y.unsqueeze(1))
+
+ loss.backward()
+ optimizer.step()
+
+modelTrainer(data_X = train_py_X,data_Y = train_py_Y,model = model)
+
+#-----------------------------------------------------------------
+#save the model
+
+#Note that model.state_dict() comes out as an ordered dictionary
+#The code below converts to a dictionary
+stateDict = dict(model.state_dict())
+
+#Note the argument _use_new_zipfile_serialization
+torch.save(stateDict,f="path/babyTest.pth",
+ _use_new_zipfile_serialization=True)
Once we have a saved .pth file, we can load it into R. An example use case is training a model in Python and then using Shiny to build a GUI that serves predictions from the trained model.
+
+library(torch)
+
+#Make up some test data
+#note that proper installation of torch will yield no errors when we run
+#this code
+y <- torch_tensor(array(runif(8),dim = c(2,2,2)),dtype = torch_float64())
+
+#Note the identical names between the Python class definition and our
+#class definition
+simpleMLP <- torch::nn_module(
+ "simpleMLP",
+ initialize = function(){
+
+ self$modelFit <- nn_sequential(nn_linear(100,20),
+ nn_relu(),
+ nn_linear(20,1),
+ nn_sigmoid())
+
+ },
+ forward = function(x){
+ self$modelFit(x)
+ }
+)
+
+
+model <- simpleMLP()
+
+state_dict <- torch::load_state_dict("path/babyTest.pth")
+model$load_state_dict(state_dict)
+
+#Note that the dtype set in R has to match the made up data from Python
+#More generally if reading new data into R you must ensure that it matches the
+#dtype that the model was trained with in Python
+newData <- torch_tensor(array(rnorm(n=1000),dim=c(10,100)),dtype=torch_float32())
+
+predictMe <- model(newData)
Torch tensors in R are pointers to tensors allocated by LibTorch. This has one major consequence for serialization: one cannot simply use saveRDS to serialize tensors, because it would save the pointer but not the data itself. When reloading a tensor saved with saveRDS, the pointer might have been deleted in LibTorch and you would get wrong results.
To solve this problem, torch implements specialized functions for serializing tensors to disk:
torch_save(): to save tensors and models to disk.
torch_load(): to load the models or tensors back into the session.
Please note that this format is still experimental and you shouldn’t use it for long-term storage.
+You can save any object of type torch_tensor to disk using:
+x <- torch_randn(10, 10)
+torch_save(x, "tensor.pt")
+x_ <- torch_load("tensor.pt")
+
+torch_allclose(x, x_)
+#> [1] TRUE
The torch_save and torch_load functions also work for nn_module objects.
When saving an nn_module, the whole object is serialized, including the model structure and its state.
+module <- nn_module(
+ "my_module",
+ initialize = function() {
+ self$fc1 <- nn_linear(10, 10)
+ self$fc2 <- nn_linear(10, 1)
+ },
+ forward = function(x) {
+ x %>%
+ self$fc1() %>%
+ self$fc2()
+ }
+)
+
+model <- module()
+torch_save(model, "model.pt")
+model_ <- torch_load("model.pt")
+
+# input tensor
+x <- torch_randn(50, 10)
+torch_allclose(model(x), model_(x))
+#> [1] TRUE
Currently the only way to load models from Python is to rewrite the model architecture in R. All the parameter names must be identical.
+In Python, save the model’s state_dict using:
+torch.save(model.state_dict(), fpath, _use_new_zipfile_serialization=True)
+Then read the state dict in R and load it into the model with:
+
+state_dict <- load_state_dict(fpath)
+model <- Model()
+model$load_state_dict(state_dict)
You can find working examples in torchvision. For example, this is what we do for the AlexNet model.
You can save the state of optimizers so you can continue training from the exact same position.
+In order to do this, we use the state_dict() and load_state_dict() methods from the optimizer combined with torch_save:
+model <- nn_linear(1, 1)
+opt <- optim_adam(model$parameters)
+
+train_x <- torch_randn(100, 1)
+train_y <- torch_randn(100, 1)
+
+loss <- nnf_mse_loss(model(train_x), train_y)
+loss$backward()
+opt$step()
+#> NULL
+
+# Now let's save the optimizer state
+tmp <- tempfile()
+torch_save(opt$state_dict(), tmp)
+
+# And now let's create a new optimizer and load back
+opt2 <- optim_adam(model$parameters)
+opt2$load_state_dict(torch_load(tmp))
In this article we describe various ways of creating torch tensors in R.
You can create tensors from R objects using the torch_tensor function. The torch_tensor function takes an R vector, matrix or array and creates an equivalent torch_tensor.
You can see a few examples below:
+
+torch_tensor(c(1,2,3))
+#> torch_tensor
+#> 1
+#> 2
+#> 3
+#> [ CPUFloatType{3} ]
+
+# conform to row-major indexing used in torch
+torch_tensor(matrix(1:10, ncol = 5, nrow = 2, byrow = TRUE))
+#> torch_tensor
+#> 1 2 3 4 5
+#> 6 7 8 9 10
+#> [ CPULongType{2,5} ]
+torch_tensor(array(runif(12), dim = c(2, 2, 3)))
+#> torch_tensor
+#> (1,.,.) =
+#> 0.9033 0.5918 0.7076
+#> 0.3192 0.7748 0.5755
+#>
+#> (2,.,.) =
+#> 0.7412 0.5693 0.5732
+#> 0.5785 0.9144 0.0358
+#> [ CPUFloatType{2,2,3} ]
By default, tensors are created on the CPU device, and their R data type is converted to the corresponding torch dtype.
+Note: currently, only numeric and boolean types are supported.
+
You can always modify dtype and device when converting an R object to a torch tensor. For example:
+torch_tensor(1, dtype = torch_long())
+#> torch_tensor
+#> 1
+#> [ CPULongType{1} ]
+torch_tensor(1, device = "cpu", dtype = torch_float64())
+#> torch_tensor
+#> 1
+#> [ CPUDoubleType{1} ]
Other options available when creating a tensor are:
+requires_grad: boolean indicating if you want autograd to record operations on them for automatic differentiation.
pin_memory: if set, the tensor returned would be allocated in pinned memory. Works only for CPU tensors.
These options are available for all functions that can be used to create new tensors, including the factory functions listed in the next section.
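Here is a short sketch of the requires_grad option in action. Because operations on x are recorded, calling backward() on a result fills in x$grad. The variable names are illustrative, and pin_memory is not shown because it only matters for CPU-to-GPU transfers.

```r
library(torch)

# record operations on x so gradients can be computed later
x <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)
y <- (x * 2)$sum()
y$backward()

x$requires_grad     # TRUE
as_array(x$grad)    # 2 2 2, i.e. d(sum(2 * x))/dx
```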
+You can also use the torch_* functions listed below to create torch tensors using some algorithm.
For example, the torch_randn function will create tensors using the normal distribution with mean 0 and standard deviation 1. You can use the ... argument to pass the size of the dimensions. For example, the code below will create a normally distributed tensor with shape 5x3.
+x <- torch_randn(5, 3)
+x
+#> torch_tensor
+#> 0.5579 0.2416 -0.8984
+#> 0.5693 -0.9027 -2.7836
+#> -1.7356 -0.4453 -0.7002
+#> -0.6925 0.9486 0.3937
+#> 0.3937 1.7807 -0.6612
+#> [ CPUFloatType{5,3} ]
Another example is torch_ones, which creates a tensor filled with ones.
+x <- torch_ones(2, 4, dtype = torch_int64(), device = "cpu")
+x
+#> torch_tensor
+#> 1 1 1 1
+#> 1 1 1 1
+#> [ CPULongType{2,4} ]
Here is the full list of functions that can be used to bulk-create tensors in torch:
+torch_arange: Returns a tensor with a sequence of integers,
torch_empty: Returns a tensor with uninitialized values,
torch_eye: Returns an identity matrix,
torch_full: Returns a tensor filled with a single value,
torch_linspace: Returns a tensor with values linearly spaced in some interval,
torch_logspace: Returns a tensor with values logarithmically spaced in some interval,
torch_ones: Returns a tensor filled with all ones,
torch_rand: Returns a tensor filled with values drawn from a uniform distribution on [0, 1),
torch_randint: Returns a tensor with integers randomly drawn from an interval,
torch_randn: Returns a tensor filled with values drawn from a unit normal distribution,
torch_randperm: Returns a tensor filled with a random permutation of integers in some interval,
torch_zeros: Returns a tensor filled with all zeros.
Once a tensor exists, you can convert between dtypes and move it to a different device with the $to method. For example:
+x <- torch_tensor(1)
+y <- x$to(dtype = torch_int32())
+x
+#> torch_tensor
+#> 1
+#> [ CPUFloatType{1} ]
+y
+#> torch_tensor
+#> 1
+#> [ CPUIntType{1} ]
You can also copy a tensor to the GPU using:
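Here is a minimal sketch, assuming a CUDA-enabled build of torch. It uses cuda_is_available() to check for a GPU and otherwise leaves the tensor on the CPU, so the snippet also runs on machines without one.

```r
library(torch)

x <- torch_tensor(c(1, 2, 3))

# only move to the GPU when a CUDA device is actually available
if (cuda_is_available()) {
  y <- x$to(device = "cuda")  # x$cuda() is an equivalent shortcut
} else {
  y <- x                      # fall back to the CPU
}
y$device
```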
+Central to torch are torch_tensor objects. torch_tensor objects are R objects very similar to R6 instances. Tensors have a large number of methods that can be called using the $ operator.
Following is a list of all methods that can be called by tensor objects and their documentation. You can also look at PyTorch’s documentation for additional details.
+Returns this tensor with its dimensions reversed.
+If n is the number of dimensions in x, x$numpy_T() is equivalent to x$permute(n, n-1, ..., 1).
add(other, *, alpha=1) -> Tensor
+Add a scalar or tensor to self tensor. If both alpha and other are specified, each element of other is scaled by alpha before being used.
When other is a tensor, the shape of other must be broadcastable with the shape of the underlying tensor.
See ?torch_add
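A small sketch of $add() with and without alpha, using made-up values. With alpha = 2, each element of other is doubled before being added. A scalar other is broadcast to every element.

```r
library(torch)

x <- torch_tensor(c(1, 2, 3))
other <- torch_tensor(c(10, 20, 30))

# each element of `other` is multiplied by alpha before being added: x + 2 * other
y <- x$add(other, alpha = 2)
as_array(y)  # 21 42 63

# a scalar `other` is broadcast to every element
z <- x$add(1)
as_array(z)  # 2 3 4
```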
align_as(other) -> Tensor
+Permutes the dimensions of the self tensor to match the dimension order in the other tensor, adding size-one dims for any new names.
This operation is useful for explicit broadcasting by names (see examples).
+All of the dims of self must be named in order to use this method. The resulting tensor is a view on the original tensor.
All dimension names of self must be present in other$names. other may contain named dimensions that are not in self$names; the output tensor has a size-one dimension for each of those new names.
To align a tensor to a specific order, use $align_to.
+# Example 1: Applying a mask
+mask <- torch_randint(low = 0, high = 2, size = c(127, 128), dtype=torch_bool())$refine_names(c('W', 'H'))
+imgs <- torch_randn(32, 128, 127, 3, names=c('N', 'H', 'W', 'C'))
+imgs$masked_fill_(mask$align_as(imgs), 0)
+
+# Example 2: Applying a per-channel-scale
+scale_channels <- function(input, scale) {
+ scale <- scale$refine_names("C")
+ input * scale$align_as(input)
+}
+
+num_channels <- 3
+scale <- torch_randn(num_channels, names='C')
+imgs <- torch_rand(32, 128, 128, num_channels, names=c('N', 'H', 'W', 'C'))
+more_imgs <- torch_rand(32, num_channels, 128, 128, names=c('N', 'C', 'H', 'W'))
+videos <- torch_randn(3, num_channels, 128, 128, 128, names=c('N', 'C', 'H', 'W', 'D'))
+
+# scale_channels is agnostic to the dimension order of the input
+scale_channels(imgs, scale)
+scale_channels(more_imgs, scale)
+scale_channels(videos, scale)
Permutes the dimensions of the self tensor to match the order specified in names, adding size-one dims for any new names.
All of the dims of self must be named in order to use this method. The resulting tensor is a view on the original tensor.
All dimension names of self must be present in names. names may contain additional names that are not in self$names; the output tensor has a size-one dimension for each of those new names.
all() -> bool
+Returns TRUE if all elements in the tensor are TRUE, FALSE otherwise.
+
+a <- torch_rand(1, 2)$to(dtype = torch_bool())
+a
+a$all()
all(dim, keepdim=FALSE, out=NULL) -> Tensor
+Returns TRUE if all elements in each row of the tensor in the given dimension dim are TRUE, FALSE otherwise.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension dim where it is of size 1. Otherwise, dim is squeezed (see ?torch_squeeze()), resulting in the output tensor having 1 fewer dimension than input.
keepdim: whether the output tensor has dim retained or not.
+a <- torch_rand(4, 2)$to(dtype = torch_bool())
+a
+a$all(dim=2)
+a$all(dim=1)
allclose(other, rtol=1e-05, atol=1e-08, equal_nan=FALSE) -> Tensor
+See ?torch_allclose
any() -> bool
+Returns TRUE if any elements in the tensor are TRUE, FALSE otherwise.
+
+a <- torch_rand(1, 2)$to(dtype = torch_bool())
+a
+a$any()
any(dim, keepdim=FALSE, out=NULL) -> Tensor
+Returns TRUE if any elements in each row of the tensor in the given dimension dim are TRUE, FALSE otherwise.
If keepdim is TRUE, the output tensor is of the same size as input except in the dimension dim where it is of size 1. Otherwise, dim is squeezed (see ?torch_squeeze()), resulting in the output tensor having 1 fewer dimension than input.
keepdim: whether the output tensor has dim retained or not.
+a <- torch_randn(4, 2) < 0
+a
+a$any(2)
+a$any(1)
apply_(callable) -> Tensor
+Applies the function callable to each element in the tensor, replacing each element with the value returned by callable.
as_subclass(cls) -> Tensor
+Makes a cls instance with the same data pointer as self. Changes in the output mirror changes in self, and the output stays attached to the autograd graph. cls must be a subclass of Tensor.
Computes the gradient of current tensor w.r.t. graph leaves.
+The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying gradient. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. self.
This function accumulates gradients in the leaves, so you might need to zero $grad attributes or set them to NULL before calling it. See the PyTorch documentation on default gradient layouts for details on the memory layout of accumulated gradients.
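The accumulation behaviour is easy to see with a toy function. In this sketch, two backward passes over sum(x^2) without resetting leave twice the gradient in x$grad, and zeroing it in place restores a clean slate. The function and values are arbitrary choices for illustration.

```r
library(torch)

x <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)

# first backward pass: d/dx sum(x^2) = 2 * x
(x^2)$sum()$backward()
g1 <- as_array(x$grad)   # 2 4 6

# second pass without resetting: the new gradient is added to the old one
(x^2)$sum()$backward()
g2 <- as_array(x$grad)   # 4 8 12

# reset the accumulated gradient in place before the next independent pass
x$grad$zero_()
g3 <- as_array(x$grad)   # 0 0 0
```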
gradient: gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a tensor that does not require grad unless create_graph is TRUE. NULL values can be specified for scalar tensors or ones that don’t require grad. If a NULL value would be acceptable, then this argument is optional.
retain_graph: if FALSE, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to TRUE is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph: if TRUE, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Defaults to FALSE.
bernoulli(*, generator=NULL) -> Tensor
+Returns a result tensor where each \(\texttt{result[i]}\) is independently sampled from \(\text{Bernoulli}(\texttt{self[i]})\). self must have floating point dtype, and the result will have the same dtype.
See ?torch_bernoulli
bernoulli_(p=0.5, *, generator=NULL) -> Tensor
+Fills each location of self with an independent sample from \(\text{Bernoulli}(\texttt{p})\). self can have integral dtype.
bernoulli_(p_tensor, *, generator=NULL) -> Tensor
+p_tensor should be a tensor containing probabilities to be used for drawing the binary random number.
The \(\text{i}^{th}\) element of self tensor will be set to a value sampled from \(\text{Bernoulli}(\texttt{p\_tensor[i]})\).
self can have integral dtype, but p_tensor must have floating point dtype.
See also $bernoulli and ?torch_bernoulli
bfloat16(memory_format=torch_preserve_format) -> Tensor
self$bfloat16() is equivalent to self$to(torch_bfloat16). See [to()].
bool(memory_format=torch_preserve_format) -> Tensor
+self$bool() is equivalent to self$to(torch_bool). See [to()].
byte(memory_format=torch_preserve_format) -> Tensor
+self$byte() is equivalent to self$to(torch_uint8). See [to()].
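These shortcut conversions behave like $to() with the matching dtype. This sketch uses arbitrary values to show three details. Converting floats to an integer type truncates toward zero. Converting to bool maps every non-zero value to TRUE. $double() matches $to(dtype = torch_float64()).

```r
library(torch)

x <- torch_tensor(c(1.7, -2.3, 0))

# float -> int32: fractional parts are truncated toward zero
as_array(x$int())     # 1 -2 0

# any non-zero value becomes TRUE when converted to bool
as_array(x$bool())    # TRUE TRUE FALSE

# the shortcut methods are equivalent to calling $to() with the matching dtype
torch_equal(x$double(), x$to(dtype = torch_float64()))
```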
cauchy_(median=0, sigma=1, *, generator=NULL) -> Tensor
+Fills the tensor with numbers drawn from the Cauchy distribution:
+\[ f(x) = \dfrac{1}{\pi} \dfrac{\sigma}{(x - \text{median})^2 + \sigma^2} \]
+char(memory_format=torch_preserve_format) -> Tensor
+self$char() is equivalent to self$to(torch_int8). See [to()].
clone(memory_format=torch_preserve_format()) -> Tensor
+Returns a copy of the self tensor. The copy has the same size and data type as self.
+x <- torch_tensor(1)
+y <- x$clone()
+
+x$add_(1)
+y
contiguous(memory_format=torch_contiguous_format) -> Tensor
+Returns a contiguous in memory tensor containing the same data as self tensor. If self tensor is already in the specified memory format, this function returns the self tensor.
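A transpose is a typical way to end up with a non-contiguous tensor, because $t() only swaps strides and moves no data. The sketch below shows that $contiguous() produces a row-major copy with the same values. The shape 2 x 3 is an arbitrary choice.

```r
library(torch)

x <- torch_randn(2, 3)
y <- x$t()                 # transposing only swaps strides, no data is moved

y$is_contiguous()          # FALSE
z <- y$contiguous()        # copies the data into row-major order
z$is_contiguous()          # TRUE

# calling contiguous() on an already contiguous tensor returns it unchanged
torch_equal(x$contiguous(), x)
```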
copy_(src, non_blocking=FALSE) -> Tensor
+Copies the elements from src into self tensor and returns self.
The src tensor must be broadcastable with the self tensor. It may be of a different data type or reside on a different device.
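As an illustration of both rules above, this sketch copies a length-3 integer tensor into a 2 x 3 float tensor. The source is broadcast across the rows and converted to the destination’s dtype. The values are arbitrary.

```r
library(torch)

dst <- torch_zeros(2, 3)                # float32 destination
src <- torch_tensor(c(1L, 2L, 3L))      # integer source of shape (3)

# src is broadcast across the rows of dst and converted to dst's dtype
dst$copy_(src)
as_array(dst)
```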
cpu(memory_format=torch_preserve_format) -> Tensor
+Returns a copy of this object in CPU memory.
+If this object is already in CPU memory and on the correct device, then no copy is performed and the original object is returned.
+cuda(device=NULL, non_blocking=FALSE, memory_format=torch_preserve_format) -> Tensor
+Returns a copy of this object in CUDA memory.
+If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned.
+device (torch_device): The destination GPU device. Defaults to the current CUDA device.
non_blocking (bool): If TRUE and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: FALSE.
memory_format (torch_memory_format, optional): the desired memory format of the returned tensor. Default: torch_preserve_format.
dense_dim() -> int
+If self is a sparse COO tensor (i.e., with torch_sparse_coo layout), this returns the number of dense dimensions. Otherwise, this throws an error.
See also $sparse_dim.
dequantize() -> Tensor
+Given a quantized Tensor, dequantize it and return the dequantized float Tensor.
+Returns a new Tensor, detached from the current graph.
+The result will never require gradient.
+Returned Tensor shares the same storage with the original one.
+In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
+IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error.
For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.
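To illustrate the shared storage described above, this sketch detaches a tensor that requires grad and then modifies the detached copy in place. The change shows up in the original. The values are arbitrary.

```r
library(torch)

x <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)
y <- x$detach()

y$requires_grad        # FALSE: y is outside the autograd graph

# y shares storage with x, so an in-place change to y is visible through x
y$add_(10)
as_array(x$detach())   # 11 12 13
```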
Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place.
+double(memory_format=torch_preserve_format) -> Tensor
+self$double() is equivalent to self$to(torch_float64). See [to()].
element_size() -> int
+Returns the size in bytes of an individual element.
+
+torch_tensor(c(1))$element_size()
expand(*sizes) -> Tensor
+Returns a new view of the self tensor with singleton dimensions expanded to a larger size.
Passing -1 as the size for a dimension means not changing the size of that dimension.
+Tensor can be also expanded to a larger number of dimensions, and the new ones will be appended at the front. For the new dimensions, the size cannot be set to -1.
+Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory.
More than one element of an expanded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.
+
+x <- torch_tensor(matrix(c(1,2,3), ncol = 1))
+x$size()
+x$expand(c(3, 4))
+x$expand(c(-1, 4)) # -1 means not changing the size of that dimension
expand_as(other) -> Tensor
+Expand this tensor to the same size as other. self$expand_as(other) is equivalent to self$expand(other$size()).
Please see $expand for more information about expand.
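A short sketch mirroring the $expand example above: a 3 x 1 column is expanded to the shape of a 3 x 4 tensor. The zeros tensor is only used for its size.

```r
library(torch)

x <- torch_tensor(matrix(c(1, 2, 3), ncol = 1))  # shape (3, 1)
other <- torch_zeros(3, 4)

# same result as x$expand(other$size())
y <- x$expand_as(other)
y$size()   # 3 4
as_array(y)
```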
exponential_(lambd=1, *, generator=NULL) -> Tensor
+Fills self tensor with elements drawn from the exponential distribution:
\[ f(x) = \lambda e^{-\lambda x} \]
+fill_diagonal_(fill_value, wrap=FALSE) -> Tensor
+Fill the main diagonal of a tensor that has at least 2-dimensions. When dims>2, all dimensions of input must be of equal length. This function modifies the input tensor in-place, and returns the input tensor.
+
+a <- torch_zeros(3, 3)
+a$fill_diagonal_(5)
+b <- torch_zeros(7, 3)
+b$fill_diagonal_(5)
+c <- torch_zeros(7, 3)
+c$fill_diagonal_(5, wrap=TRUE)
float(memory_format=torch_preserve_format) -> Tensor
+self$float() is equivalent to self$to(torch_float32). See [to()].
geometric_(p, *, generator=NULL) -> Tensor
+Fills self tensor with elements drawn from the geometric distribution:
\[ f(X=k) = (1 - p)^{k - 1} p \]
+get_device() -> Device ordinal (Integer)
+For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown.
+
+x <- torch_randn(3, 4, 5, device='cuda:0')
+x$get_device()
+x$cpu()$get_device() # RuntimeError: get_device is not implemented for type torch_FloatTensor
This attribute is NULL by default and becomes a Tensor the first time a call to backward computes gradients for self. The attribute will then contain the gradients computed and future calls to [backward()] will accumulate (add) gradients into it.
half(memory_format=torch_preserve_format) -> Tensor
+self$half() is equivalent to self$to(torch_float16). See [to()].
Returns a new tensor containing imaginary values of the self tensor. The returned tensor and self share the same underlying storage.
+x <- torch_randn(4, dtype=torch_cfloat())
+x
+x$imag
index_add(tensor1, dim, index, tensor2) -> Tensor
+Out-of-place version of $index_add_. tensor1 corresponds to self in $index_add_.
index_add_(dim, index, tensor) -> Tensor
+Accumulate the elements of tensor into the self tensor by adding to the indices in the order given in index. For example, if dim == 1 and index[i] == j, then the i-th row of tensor is added to the j-th row of self.
The dim th dimension of tensor must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch_backends.cudnn.deterministic = TRUE.
index: indices of tensor to select from.
+x <- torch_ones(5, 3)
+t <- torch_tensor(matrix(1:9, ncol = 3), dtype=torch_float())
+index <- torch_tensor(c(1L, 4L, 3L))
+x$index_add_(1, index, t)
index_copy(tensor1, dim, index, tensor2) -> Tensor
+Out-of-place version of $index_copy_. tensor1 corresponds to self in $index_copy_.
index_copy_(dim, index, tensor) -> Tensor
+Copies the elements of tensor into the self tensor by selecting the indices in the order given in index. For example, if dim == 1 and index[i] == j, then the i-th row of tensor is copied to the j-th row of self.
The dim th dimension of tensor must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised.
index: indices of tensor to select from.
+x <- torch_zeros(5, 3)
+t <- torch_tensor(matrix(1:9, ncol = 3), dtype=torch_float())
+index <- torch_tensor(c(1, 5, 3), dtype = torch_long())
+x$index_copy_(1, index, t)
index_fill(tensor1, dim, index, value) -> Tensor
+Out-of-place version of $index_fill_. tensor1 corresponds to self in $index_fill_.
index_fill_(dim, index, val) -> Tensor
+Fills the elements of the self tensor with value val by selecting the indices in the order given in index.
index: indices of self tensor to fill in.
+x <- torch_tensor(matrix(1:9, ncol = 3), dtype=torch_float())
+index <- torch_tensor(c(1, 3), dtype = torch_long())
+x$index_fill_(1, index, -1)
index_put(tensor1, indices, value, accumulate=FALSE) -> Tensor
+Out-of-place version of $index_put_. tensor1 corresponds to self in $index_put_.
index_put_(indices, value, accumulate=FALSE) -> Tensor
+Puts values from the tensor value into the tensor self using the indices specified in indices (which is a tuple of Tensors). The expression tensor.index_put_(indices, value) is equivalent to tensor[indices] = value. Returns self.
If accumulate is TRUE, the elements in value are added to self. If accumulate is FALSE, the behavior is undefined if indices contain duplicate elements.
indices() -> Tensor
+If self is a sparse COO tensor (i.e., with torch_sparse_coo layout), this returns a view of the contained indices tensor. Otherwise, this throws an error.
See also $values.
int(memory_format=torch_preserve_format) -> Tensor
+self$int() is equivalent to self$to(torch_int32). See [to()].
int_repr() -> Tensor
+Given a quantized Tensor, self$int_repr() returns a CPU Tensor with uint8_t as data type that stores the underlying uint8_t values of the given Tensor.
irfft(signal_ndim, normalized=FALSE, onesided=TRUE, signal_sizes=NULL) -> Tensor
+See ?torch_irfft
is_complex() -> bool
+Returns TRUE if the data type of self is a complex data type.
is_contiguous(memory_format=torch_contiguous_format) -> bool
+Returns TRUE if self tensor is contiguous in memory in the order specified by memory format.
is_floating_point() -> bool
+Returns TRUE if the data type of self is a floating point data type.
All Tensors that have requires_grad which is FALSE will be leaf Tensors by convention.
For Tensors that have requires_grad which is TRUE, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is NULL.
Only leaf Tensors will have their grad populated during a call to [backward()]. To get grad populated for non-leaf Tensors, you can use [retain_grad()].
+a <- torch_rand(10, requires_grad=TRUE)
+a$is_leaf
+# TRUE
+
+# b <- torch_rand(10, requires_grad=TRUE)$cuda()
+# b$is_leaf
+# FALSE
+# b was created by the operation that cast a cpu Tensor into a cuda Tensor
+
+c <- torch_rand(10, requires_grad=TRUE) + 2
+c$is_leaf
+# FALSE
+# c was created by the addition operation
+
+# d <- torch_rand(10)$cuda()
+# d$is_leaf
+# TRUE
+# d does not require gradients and so has no operation creating it (that is tracked by the autograd engine)
+
+# e <- torch_rand(10)$cuda()$requires_grad_()
+# e$is_leaf
+# TRUE
+# e requires gradients and has no operations creating it
+
+# f <- torch_rand(10, requires_grad=TRUE, device="cuda")
+# f$is_leaf
+# TRUE
+# f requires grad, has no operation creating it
Is TRUE if the Tensor is a meta tensor, FALSE otherwise. Meta tensors are like normal tensors, but they carry no data.
is_set_to(tensor) -> bool
+Returns TRUE if this object refers to the same THTensor object from the Torch C API as the given tensor.
See ?torch_istft
item() -> number
+Returns the value of this tensor as a standard R scalar. This only works for tensors with one element. For other cases, use as_array().
This operation is not differentiable.
+
+x <- torch_tensor(1.0)
+x$item()
log_normal_(mean=1, std=2, *, generator=NULL)
+Fills self tensor with numbers sampled from the log-normal distribution parameterized by the given mean \(\mu\) and standard deviation \(\sigma\). Note that mean and std are the mean and standard deviation of the underlying normal distribution, and not of the returned distribution:
\[ f(x) = \dfrac{1}{x \sigma \sqrt{2\pi}}\ e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \]
+long(memory_format=torch_preserve_format) -> Tensor
+self$long() is equivalent to self$to(dtype = torch_int64()). See [to()].
map_(tensor, callable)
+Applies callable for each element in self tensor and the given tensor and stores the results in self tensor. self tensor and the given tensor must be broadcastable.
The callable should have the signature:
callable(a, b) -> number
masked_fill_(mask, value)
+Fills elements of self tensor with value where mask is TRUE. The shape of mask must be broadcastable with the shape of the underlying tensor.
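A minimal sketch of the broadcasting behaviour, assuming torch is attached: the mask below has one entry per column, so it is broadcast across both rows of x before the fill.

```r
library(torch)

x <- torch_zeros(2, 3)
mask <- torch_tensor(c(TRUE, FALSE, TRUE)) # shape (3) broadcasts against (2, 3)
x$masked_fill_(mask, 5)                    # in place: columns 1 and 3 become 5
x
```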
masked_scatter(mask, tensor) -> Tensor
+Out-of-place version of $masked_scatter_
masked_scatter_(mask, source)
+Copies elements from source into self tensor at positions where the mask is TRUE. The shape of mask must be broadcastable with the shape of the underlying tensor. The source should have at least as many elements as the number of TRUE values in mask.
Stores names for each of this tensor’s dimensions.
+names[idx] corresponds to the name of tensor dimension idx. Names are either a string if the dimension is named or NULL if the dimension is unnamed.
Dimension names may contain letters or underscores. Furthermore, a dimension name must be a valid Python variable name (i.e., it does not start with an underscore).
+Tensors may not have two named dimensions with the same name.
+narrow(dimension, start, length) -> Tensor
+See ?torch_narrow
+x <- torch_tensor(matrix(1:9, ncol = 3))
+x$narrow(1, 1, 3)
+x$narrow(1, 1, 2)
narrow_copy(dimension, start, length) -> Tensor
+Same as $narrow, except that it returns a copy rather than sharing storage. This is primarily for sparse tensors, which do not have a shared-storage narrow method. Calling narrow_copy with dimension > self$sparse_dim() will return a copy with the relevant dense dimension narrowed, and self$shape updated accordingly.
new_empty(size, dtype=NULL, device=NULL, requires_grad=FALSE) -> Tensor
+Returns a Tensor of size size filled with uninitialized data. By default, the returned Tensor has the same torch_dtype and torch_device as this tensor.
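dtype (torch_dtype, optional): the desired type of returned tensor. Default: if NULL, same torch_dtype as this tensor.
device (torch_device, optional): the desired device of returned tensor. Default: if NULL, same torch_device as this tensor.
requires_grad (bool, optional): if autograd should record operations on the returned tensor. Default: FALSE.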
+tensor <- torch_ones(5)
+tensor$new_empty(c(2, 3))
new_full(size, fill_value, dtype=NULL, device=NULL, requires_grad=FALSE) -> Tensor
+Returns a Tensor of size size filled with fill_value. By default, the returned Tensor has the same torch_dtype and torch_device as this tensor.
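dtype (torch_dtype, optional): the desired type of returned tensor. Default: if NULL, same torch_dtype as this tensor.
device (torch_device, optional): the desired device of returned tensor. Default: if NULL, same torch_device as this tensor.
requires_grad (bool, optional): if autograd should record operations on the returned tensor. Default: FALSE.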
+tensor <- torch_ones(c(2), dtype=torch_float64())
+tensor$new_full(c(3, 4), 3.141592)
new_ones(size, dtype=NULL, device=NULL, requires_grad=FALSE) -> Tensor
+Returns a Tensor of size size filled with 1. By default, the returned Tensor has the same torch_dtype and torch_device as this tensor.
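size: a vector of integers defining the shape of the output tensor.
dtype (torch_dtype, optional): the desired type of returned tensor. Default: if NULL, same torch_dtype as this tensor.
device (torch_device, optional): the desired device of returned tensor. Default: if NULL, same torch_device as this tensor.
requires_grad (bool, optional): if autograd should record operations on the returned tensor. Default: FALSE.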
+tensor <- torch_tensor(c(2), dtype=torch_int32())
+tensor$new_ones(c(2, 3))
new_tensor(data, dtype=NULL, device=NULL, requires_grad=FALSE) -> Tensor
+Returns a new Tensor with data as the tensor data. By default, the returned Tensor has the same torch_dtype and torch_device as this tensor.
[new_tensor()] always copies data. If you have a Tensor data and want to avoid a copy, use [$requires_grad_()] or [$detach()]. If you have a numpy array and want to avoid a copy, use [torch_from_numpy()].
When data is a tensor x, [new_tensor()] reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore tensor$new_tensor(x) is equivalent to x$clone()$detach() and tensor$new_tensor(x, requires_grad=TRUE) is equivalent to x$clone()$detach()$requires_grad_(TRUE). The equivalents using clone() and detach() are recommended.
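data: the returned Tensor copies data.
dtype (torch_dtype, optional): the desired type of returned tensor. Default: if NULL, same torch_dtype as this tensor.
device (torch_device, optional): the desired device of returned tensor. Default: if NULL, same torch_device as this tensor.
requires_grad (bool, optional): if autograd should record operations on the returned tensor. Default: FALSE.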
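The clone()/detach() equivalents recommended above can be sketched as follows, assuming torch is attached. The copy does not share memory with x, and it only tracks gradients when requires_grad_() is called on it.

```r
library(torch)

x <- torch_tensor(c(1, 2, 3), requires_grad = TRUE)

y <- x$clone()$detach()                      # like tensor$new_tensor(x): a copy without grad tracking
z <- x$clone()$detach()$requires_grad_(TRUE) # like tensor$new_tensor(x, requires_grad = TRUE)

y$add_(10) # modifying the copy in place leaves x untouched
x
y
```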
+tensor <- torch_ones(c(2), dtype=torch_int8())
+data <- matrix(1:4, ncol = 2)
+tensor$new_tensor(data)
new_zeros(size, dtype=NULL, device=NULL, requires_grad=FALSE) -> Tensor
+Returns a Tensor of size size filled with 0. By default, the returned Tensor has the same torch_dtype and torch_device as this tensor.
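size: a vector of integers defining the shape of the output tensor.
dtype (torch_dtype, optional): the desired type of returned tensor. Default: if NULL, same torch_dtype as this tensor.
device (torch_device, optional): the desired device of returned tensor. Default: if NULL, same torch_device as this tensor.
requires_grad (bool, optional): if autograd should record operations on the returned tensor. Default: FALSE.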
+tensor <- torch_tensor(c(1), dtype=torch_float64())
+tensor$new_zeros(c(2, 3))
See ?torch_norm
## normal_
normal_(mean=0, std=1, *, generator=NULL) -> Tensor
+Fills self tensor with elements sampled from the normal distribution parameterized by mean and std.
numpy() -> numpy.ndarray
+Returns self tensor as a NumPy ndarray. This tensor and the returned ndarray share the same underlying storage. Changes to self tensor will be reflected in the ndarray and vice versa.
permute(*dims) -> Tensor
+Returns a view of the original tensor with its dimensions permuted.
+x <- torch_randn(2, 3, 5)
+x$size()
+x$permute(c(3, 1, 2))$size()
pin_memory() -> Tensor
+Copies the tensor to pinned memory, if it’s not already pinned.
+put_(indices, tensor, accumulate=FALSE) -> Tensor
+Copies the elements from tensor into the positions specified by indices. For the purpose of indexing, the self tensor is treated as if it were a 1-D tensor.
If accumulate is TRUE, the elements in tensor are added to self. If accumulate is FALSE, the behavior is undefined if indices contain duplicate elements.
+src <- torch_tensor(matrix(3:8, ncol = 3))
+src$put_(torch_tensor(1:2), torch_tensor(9:10))
q_per_channel_axis() -> int
+Given a Tensor quantized by linear (affine) per-channel quantization, returns the index of dimension on which per-channel quantization is applied.
+q_per_channel_scales() -> Tensor
+Given a Tensor quantized by linear (affine) per-channel quantization, returns a Tensor of scales of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor.
+q_per_channel_zero_points() -> Tensor
+Given a Tensor quantized by linear (affine) per-channel quantization, returns a tensor of zero_points of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor.
+q_scale() -> float
+Given a Tensor quantized by linear (affine) quantization, returns the scale of the underlying quantizer.
+q_zero_point() -> int
+Given a Tensor quantized by linear (affine) quantization, returns the zero_point of the underlying quantizer.
+random_(from=0, to=NULL, *, generator=NULL) -> Tensor
+Fills self tensor with numbers sampled from the discrete uniform distribution over [from, to - 1]. If not specified, the values are usually only bounded by self tensor’s data type. However, for floating point types, if unspecified, the range will be [0, 2^mantissa] to ensure that every value is representable. For example, torch_tensor(1, dtype = torch_double())$random_() will be uniform in [0, 2^53].
Returns a new tensor containing real values of the self tensor. The returned tensor and self share the same underlying storage.
+x <- torch_randn(4, dtype=torch_cfloat())
+x
+x$real
record_stream(stream)
+Ensures that the tensor memory is not reused for another tensor until all current work queued on stream is complete.
The caching allocator is aware of only the stream where a tensor was allocated. Because of this, it already correctly manages the life cycle of tensors that are used on only one stream. But if a tensor is used on a stream different from the stream of origin, the allocator might reuse the memory unexpectedly. Calling this method lets the allocator know which streams have used the tensor.
+Refines the dimension names of self according to names.
Refining is a special case of renaming that “lifts” unnamed dimensions. A NULL dim can be refined to have any name; a named dim can only be refined to have the same name.
Because named tensors can coexist with unnamed tensors, refining names gives a nice way to write named-tensor-aware code that works with both named and unnamed tensors.
+names may contain up to one Ellipsis (...). The Ellipsis is expanded greedily; it is expanded in-place to fill names to the same length as self$dim() using names from the corresponding indices of self$names.
+imgs <- torch_randn(32, 3, 128, 128)
+named_imgs <- imgs$refine_names(c('N', 'C', 'H', 'W'))
+named_imgs$names
Registers a backward hook.
+The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature:
+hook(grad) -> Tensor or NULL
+The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of grad.
This function returns a handle with a method handle$remove() that removes the hook from the tensor.
+v <- torch_tensor(c(0., 0., 0.), requires_grad=TRUE)
+h <- v$register_hook(function(grad) grad * 2) # double the gradient
+v$backward(torch_tensor(c(1., 2., 3.)))
+v$grad
+h$remove()
Renames dimension names of self.
There are two main usages:
+self$rename(**rename_map) returns a view on tensor that has dims renamed as specified in the mapping rename_map.
self$rename(*names) returns a view on tensor, renaming all dimensions positionally using names. Use self$rename(NULL) to drop names on a tensor.
One cannot specify both positional args names and keyword args rename_map.
+imgs <- torch_rand(2, 3, 5, 7, names=c('N', 'C', 'H', 'W'))
+renamed_imgs <- imgs$rename(c("Batch", "Channels", "Height", "Width"))
repeat(*sizes) -> Tensor
+Repeats this tensor along the specified dimensions.
+Unlike $expand, this function copies the tensor’s data.
+x <- torch_tensor(c(1, 2, 3))
+x$`repeat`(c(4, 2))
+x$`repeat`(c(4, 2, 1))$size()
repeat_interleave(repeats, dim=NULL) -> Tensor
+See [torch_repeat_interleave()].
+requires_grad_(requires_grad=TRUE) -> Tensor
+Changes whether autograd should record operations on this tensor: sets this tensor’s requires_grad attribute in-place. Returns this tensor.
[requires_grad_()]’s main use case is to tell autograd to begin recording operations on a Tensor tensor. If tensor has requires_grad=FALSE (because it was obtained through a DataLoader, or required preprocessing or initialization), tensor$requires_grad_() makes it so that autograd will begin to record operations on tensor.
requires_grad (bool): if autograd should record operations on this tensor. Default: TRUE.
+# Let's say we want to preprocess some saved weights and use
+# the result as new weights.
+saved_weights <- c(0.1, 0.2, 0.3, 0.25)
+loaded_weights <- torch_tensor(saved_weights)
+# preprocess() is a hypothetical stand-in for your own preprocessing step
+preprocess <- function(w) w / w$sum()
+weights <- preprocess(loaded_weights)
+weights
+
+# Now, start to record operations done to weights
+weights$requires_grad_()
+out <- weights$pow(2)$sum()
+out$backward()
+weights$grad
reshape(*shape) -> Tensor
+Returns a tensor with the same data and number of elements as self but with the specified shape. This method returns a view if shape is compatible with the current shape. See $view on when it is possible to return a view.
See ?torch_reshape
reshape_as(other) -> Tensor
+Returns this tensor with the same shape as other. self$reshape_as(other) is equivalent to self$reshape(other$size()). This method returns a view if other$size() is compatible with the current shape. See $view on when it is possible to return a view.
Please see $reshape for more information about reshape.
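The difference between returning a view and copying shows up with non-contiguous tensors. In this sketch, assuming torch is attached, the transpose of a 2 x 3 tensor cannot be viewed as a flat vector, so $view() raises an error, while $reshape() falls back to copying the data.

```r
library(torch)

x <- torch_tensor(1:6)$view(c(2, 3)) # contiguous 2 x 3 tensor, rows (1, 2, 3) and (4, 5, 6)
xt <- x$t()                          # 3 x 2 transposed view, no longer contiguous
xt$is_contiguous()                   # FALSE

flat_view <- try(xt$view(6), silent = TRUE) # fails: strides are incompatible with a flat view
flat_copy <- xt$reshape(6)                  # succeeds by copying: 1 4 2 5 3 6
flat_copy
```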
resize_(*sizes, memory_format=torch_contiguous_format) -> Tensor
+Resizes self tensor to the specified size. If the number of elements is larger than the current storage size, then the underlying storage is resized to fit the new number of elements. If the number of elements is smaller, the underlying storage is not changed. Existing elements are preserved but any new memory is uninitialized.
This is a low-level method. The storage is reinterpreted as C-contiguous, ignoring the current strides (unless the target size equals the current size, in which case the tensor is left unchanged). For most purposes, you will instead want to use $view(), which checks for contiguity, or $reshape(), which copies data if needed. To change the size in-place with custom strides, see $set_().
memory_format (torch_memory_format, optional): the desired memory format of Tensor. Default: torch_contiguous_format. Note that the memory format of self is unaffected if self$size() matches sizes.
+x <- torch_tensor(matrix(1:6, ncol = 2))
+x$resize_(c(2, 2))
resize_as_(tensor, memory_format=torch_contiguous_format) -> Tensor
+Resizes the self tensor to be the same size as the specified tensor. This is equivalent to self$resize_(tensor$size()).
scatter_(dim, index, src) -> Tensor
+Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
For a 3-D tensor, self is updated as:
self[index[i, j, k], j, k] = src[i, j, k] # if dim == 1
+self[i, index[i, j, k], k] = src[i, j, k] # if dim == 2
+self[i, j, index[i, j, k]] = src[i, j, k] # if dim == 3
+This is the reverse operation of the manner described in $gather.
self, index and src (if it is a Tensor) should have the same number of dimensions. It is also required that index$size(d) <= src$size(d) for all dimensions d, and that index$size(d) <= self$size(d) for all dimensions d != dim.
Moreover, as for $gather, the values of index must be between 1 and self$size(dim) inclusive, and all values in a row along the specified dimension dim must be unique.
src (Tensor): the source element(s) to scatter, in case value is not specified.
value (float): the source element(s) to scatter, in case src is not specified.
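To make the update rule concrete, here is a base-R emulation of scatter_ for 2-D matrices, using the same 1-based index matrix as the first example and a src with known integer values. scatter_2d() is a hypothetical helper written only for illustration; it is not how torch implements the operation.

```r
# Hypothetical helper: emulates self$scatter_(dim, index, src) for 2-D inputs,
# using 1-based indices.
scatter_2d <- function(self, dim, index, src) {
  for (i in seq_len(nrow(index))) {
    for (j in seq_len(ncol(index))) {
      if (dim == 1) {
        self[index[i, j], j] <- src[i, j] # row taken from index, column kept
      } else {
        self[i, index[i, j]] <- src[i, j] # row kept, column taken from index
      }
    }
  }
  self
}

src <- matrix(1:10, nrow = 2, byrow = TRUE) # rows 1..5 and 6..10
index <- rbind(c(2, 3, 3, 1, 1), c(3, 1, 1, 2, 3))
out <- scatter_2d(matrix(0, nrow = 3, ncol = 5), dim = 1, index, src)
out
```

Each column of src is written into the rows named by the matching column of index; positions that no index points to keep their original value.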
+x <- torch_rand(2, 5)
+x
+torch_zeros(3, 5)$scatter_(
+  1,
+  torch_tensor(rbind(c(2, 3, 3, 1, 1), c(3, 1, 1, 2, 3)), dtype = torch_long()),
+  x
+)
+
+z <- torch_zeros(2, 4)$scatter_(
+ 2,
+ torch_tensor(matrix(3:4, ncol = 1)), 1.23
+)
scatter_add_(dim, index, src) -> Tensor
+Adds all values from the tensor src into self at the indices specified in the index tensor, in a similar fashion to $scatter_. Each value in src is added to an index in self which is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
For a 3-D tensor, self is updated as:
self[index[i, j, k], j, k] += src[i, j, k] # if dim == 1
+self[i, index[i, j, k], k] += src[i, j, k] # if dim == 2
+self[i, j, index[i, j, k]] += src[i, j, k] # if dim == 3
+self, index and src should have the same number of dimensions. It is also required that index$size(d) <= src$size(d) for all dimensions d, and that index$size(d) <= self$size(d) for all dimensions d != dim.
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch_backends.cudnn.deterministic = TRUE.
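Unlike scatter_, which requires the indices along dim to be unique, scatter_add_ accumulates: when several entries of index point at the same position, their contributions are summed. A base-R sketch of the dim = 1 case, using a hypothetical helper scatter_add_rows() and 1-based indices:

```r
# Hypothetical helper: emulates self$scatter_add_(1, index, src) for 2-D inputs.
scatter_add_rows <- function(self, index, src) {
  for (i in seq_len(nrow(index))) {
    for (j in seq_len(ncol(index))) {
      # accumulate instead of overwriting, so duplicate targets add up
      self[index[i, j], j] <- self[index[i, j], j] + src[i, j]
    }
  }
  self
}

src <- rbind(c(1, 2, 3), c(4, 5, 6))
index <- rbind(c(1, 2, 1), c(1, 1, 2)) # column 1 sends both rows of src to row 1
out <- scatter_add_rows(matrix(0, nrow = 2, ncol = 3), index, src)
out # row 1: 5 5 3, row 2: 0 2 6
```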
+x <- torch_rand(2, 5)
+x
+torch_ones(3, 5)$scatter_add_(1, torch_tensor(rbind(c(1, 2, 3, 1, 1), c(3, 1, 1, 2, 3)), dtype = torch_long()), x)
select(dim, index) -> Tensor
+Slices the self tensor along the selected dimension at the given index. This function returns a view of the original tensor with the given dimension removed.
set_(source=NULL, storage_offset=0, size=NULL, stride=NULL) -> Tensor
+Sets the underlying storage, size, and strides. If source is a tensor, self tensor will share the same storage and have the same size and strides as source. Changes to elements in one tensor will be reflected in the other.
Moves the underlying storage to shared memory.
+This is a no-op if the underlying storage is already in shared memory and for CUDA tensors. Tensors in shared memory cannot be resized.
+short(memory_format=torch_preserve_format) -> Tensor
+self$short() is equivalent to self$to(dtype = torch_int16()). See [to()].
size() -> torch_Size
+Returns the size of the self tensor. In R, the returned value is an integer vector.
+torch_empty(3, 4, 5)$size()
sparse_dim() -> int
+If self is a sparse COO tensor (i.e., with torch_sparse_coo layout), this returns the number of sparse dimensions. Otherwise, this throws an error.
See also $dense_dim.
sparse_mask(input, mask) -> Tensor
+Returns a new sparse tensor with values from the Tensor input, filtered by the indices of mask; the values of mask are ignored. input and mask must have the same shape.
See ?torch_split
See ?torch_stft
storage_offset() -> int
+Returns self tensor’s offset in the underlying storage in terms of number of storage elements (not bytes).
+x <- torch_tensor(c(1, 2, 3, 4, 5))
+x$storage_offset()
+x[3:N]$storage_offset()
stride(dim) -> tuple or int
+Returns the stride of self tensor.
Stride is the jump necessary to go from one element to the next one in the specified dimension dim. A vector of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension dim.
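For a contiguous (row-major) tensor, each stride is the product of the sizes of all later dimensions. The base-R sketch below computes this; row_major_strides() is a hypothetical helper, and tensors that are views (for example after a transpose) can have other strides.

```r
# Hypothetical helper: strides of a contiguous (row-major) tensor with the given shape.
row_major_strides <- function(shape) {
  n <- length(shape)
  strides <- numeric(n)
  strides[n] <- 1
  if (n > 1) {
    for (i in (n - 1):1) {
      # one step along dimension i skips a whole block of the later dimensions
      strides[i] <- strides[i + 1] * shape[i + 1]
    }
  }
  strides
}

row_major_strides(c(2, 5))    # 5 1
row_major_strides(c(2, 3, 4)) # 12 4 1
```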
+x <- torch_tensor(matrix(1:10, nrow = 2))
+x$stride()
+x$stride(1)
+x$stride(-1)
sub(other, *, alpha=1) -> Tensor
+Subtracts a scalar or tensor from self tensor. If both alpha and other are specified, each element of other is scaled by alpha before being used.
When other is a tensor, the shape of other must be broadcastable with the shape of the underlying tensor.
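A short sketch of the alpha scaling and of broadcasting a scalar, assuming torch is attached: with alpha = 2, each element of b is doubled before it is subtracted.

```r
library(torch)

a <- torch_tensor(c(10, 10, 10))
b <- torch_tensor(c(1, 2, 3))

a$sub(b, alpha = 2) # 10 - 2 * b, i.e. 8 6 4
a$sub(1)            # a scalar is broadcast to every element: 9 9 9
```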
sum_to_size(*size) -> Tensor
+Sum this tensor to size. size must be broadcastable to this tensor size.
to(*args, **kwargs) -> Tensor
+Performs Tensor dtype and/or device conversion. A torch_dtype and torch_device are inferred from the arguments of self$to(*args, **kwargs).
If the self Tensor already has the correct torch_dtype and torch_device, then self is returned. Otherwise, the returned tensor is a copy of self with the desired torch_dtype and torch_device.
Here are the ways to call to:
to(dtype, non_blocking=FALSE, copy=FALSE, memory_format=torch_preserve_format) -> Tensor
+Returns a Tensor with the specified dtype
memory_format (torch_memory_format, optional): the desired memory format of returned Tensor. Default: torch_preserve_format.
to(device=NULL, dtype=NULL, non_blocking=FALSE, copy=FALSE, memory_format=torch_preserve_format) -> Tensor
+Returns a Tensor with the specified device and (optional) dtype. If dtype is NULL it is inferred to be self$dtype. When non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor.
When copy is set, a new Tensor is created even when the Tensor already matches the desired conversion.
memory_format (torch_memory_format, optional): the desired memory format of returned Tensor. Default: torch_preserve_format.
to(other, non_blocking=FALSE, copy=FALSE) -> Tensor
+Returns a Tensor with the same torch_dtype and torch_device as the Tensor other. When non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor.
When copy is set, a new Tensor is created even when the Tensor already matches the desired conversion.
+tensor <- torch_randn(2, 2) # Initially dtype=float32, device=cpu
+tensor$to(dtype = torch_float64())
+
+other <- torch_randn(1, dtype=torch_float64())
+tensor$to(other = other, non_blocking=TRUE)
to_sparse(sparseDims) -> Tensor
Returns a sparse copy of the tensor. PyTorch supports sparse tensors in coordinate format.
tolist() -> list or number
+Returns the tensor as a (nested) list. For scalars, a standard R number is returned, just like with $item. Tensors are automatically moved to the CPU first if necessary.
This operation is not differentiable.
+triangular_solve(A, upper=TRUE, transpose=FALSE, unitriangular=FALSE) -> (Tensor, Tensor)
+See [torch_triangular_solve()]
+type(dtype=NULL, non_blocking=FALSE, **kwargs) -> str or Tensor
Returns the type if dtype is not provided, else casts this object to the specified type.
If this is already of the correct type, no copy is performed and the original object is returned.
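dtype (type or string): the desired type.
non_blocking (bool): if TRUE, and the source is in pinned memory and the destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.
**kwargs: for compatibility, may contain the key async in place of the non_blocking argument. The async arg is deprecated.
type_as(tensor) -> Tensor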
+Returns this tensor cast to the type of the given tensor.
+This is a no-op if the tensor is already of the correct type. This is equivalent to self$type(tensor$type()).
Unflattens the named dimension dim, viewing it in the shape specified by namedshape.
unfold(dimension, size, step) -> Tensor
+Returns a view of the original tensor which contains all slices of size size from self tensor in the dimension dimension.
Step between two slices is given by step.
If sizedim is the size of dimension dimension for self, the size of dimension dimension in the returned tensor will be floor((sizedim - size) / step) + 1.
An additional dimension of size size is appended in the returned tensor.
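A base-R sketch of what unfold() produces along a single dimension, using a hypothetical helper unfold_1d() and 1-based positions: each row is one window of length size, and consecutive windows start step elements apart. With 7 elements, size = 2 and step = 2 there are floor((7 - 2) / 2) + 1 = 3 windows, and the trailing element is dropped.

```r
# Hypothetical helper: all windows of length `size` taken every `step` elements,
# one window per row (mirroring the extra dimension that unfold() appends).
unfold_1d <- function(x, size, step) {
  starts <- seq(1, length(x) - size + 1, by = step)
  windows <- vapply(starts, function(s) x[s:(s + size - 1)], numeric(size))
  t(matrix(windows, nrow = size)) # matrix() keeps the shape even when size == 1
}

m <- unfold_1d(as.numeric(1:7), size = 2, step = 2)
m # rows: (1, 2), (3, 4), (5, 6)
```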
uniform_(from=0, to=1) -> Tensor
+Fills self tensor with numbers sampled from the continuous uniform distribution:
\[
P(x) = \dfrac{1}{\text{to} - \text{from}}
\]
+Eliminates all but the first element from every consecutive group of equivalent elements.
+See [torch_unique_consecutive()]
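For intuition only, the same idea on a plain R vector can be sketched with base R's rle(), which keeps one value per run of consecutive equal elements. This is an analogy, not how torch_unique_consecutive() is implemented.

```r
x <- c(1, 1, 2, 2, 3, 1, 1, 2)
runs <- rle(x)
runs$values  # first element of each consecutive group: 1 2 3 1 2
runs$lengths # size of each group: 2 2 1 2 1
```

Note that 1 and 2 appear twice in the result, because only consecutive duplicates are removed.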
+values() -> Tensor
+If self is a sparse COO tensor (i.e., with torch_sparse_coo layout), this returns a view of the contained values tensor. Otherwise, this throws an error.
view(*shape) -> Tensor
+Returns a new tensor with the same data as the self tensor but of a different shape.
The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions \(d, d+1, \dots, d+k\) that satisfy the following contiguity-like condition that \(\forall i = d, \dots, d+k-1\),
\[
\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]
\]
+Otherwise, it will not be possible to view self tensor as shape without copying it (e.g., via contiguous). When it is unclear whether a view can be performed, it is advisable to use $reshape, which returns a view if the shapes are compatible, and copies (equivalent to calling contiguous) otherwise.
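The contiguity-like condition can be checked directly from sizes and strides. can_merge() below is a hypothetical base-R helper that tests whether dimensions d, ..., d + k (1-based) can be flattened into a single view dimension.

```r
# Hypothetical helper: TRUE if dims d..(d + k) satisfy
# stride[i] == stride[i + 1] * size[i + 1] for every i in d..(d + k - 1).
can_merge <- function(size, stride, d, k) {
  if (k == 0) return(TRUE) # a single dimension can always be kept as is
  i <- d:(d + k - 1)
  all(stride[i] == stride[i + 1] * size[i + 1])
}

# contiguous 2 x 3 x 4 tensor, strides (12, 4, 1): can be viewed as 24 elements
can_merge(size = c(2, 3, 4), stride = c(12, 4, 1), d = 1, k = 2)
# transpose of a contiguous 2 x 3 tensor: size (3, 2), strides (1, 3)
can_merge(size = c(3, 2), stride = c(1, 3), d = 1, k = 1)
```

The second call returns FALSE because 1 != 3 * 2, which is why a transposed tensor has to be copied (for example with reshape) before it can be flattened.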
view_as(other) -> Tensor
+View this tensor as the same size as other. self$view_as(other) is equivalent to self$view(other$size()).
Please see $view for more information about view.
where(condition, y) -> Tensor
+self$where(condition, y) is equivalent to torch_where(condition, self, y). See ?torch_where
TorchScript is a statically typed subset of Python that can be interpreted by LibTorch without any Python dependency. The torch R package provides interfaces to create, serialize, load and execute TorchScript programs.
+Advantages of using TorchScript are:
+TorchScript code can be invoked in its own interpreter, which is basically a restricted Python interpreter. This interpreter does not acquire the Global Interpreter Lock, and so many requests can be processed on the same instance simultaneously.
This format allows us to save the whole model to disk and load it into another environment, such as a server written in a language other than R.
TorchScript gives us a representation in which we can do compiler optimizations on the code to make execution more efficient.
TorchScript allows us to interface with many backend/device runtimes that require a broader view of the program than individual operators.
TorchScript programs can be created from R using tracing. When using tracing, code is automatically converted into this subset of Python by recording only the actual operators on tensors and simply executing and discarding the other surrounding R code.
+Currently tracing is the only supported way to create TorchScript programs from R code.
+For example, let’s use the jit_trace function to create a TorchScript program. We pass a regular R function and example inputs.
+fn <- function(x) {
+ torch_relu(x)
+}
+
+traced_fn <- jit_trace(fn, torch_tensor(c(-1, 0, 1)))
The jit_trace function has executed the R function with the example input and recorded all torch operations that occurred during execution to create a graph. graph is what we call the intermediate representation of TorchScript programs, and it can be inspected with:
+traced_fn$graph
+#> graph(%0 : Float(3, strides=[1], requires_grad=0, device=cpu)):
+#> %1 : Float(3, strides=[1], requires_grad=0, device=cpu) = aten::relu(%0)
+#> return (%1)
The traced function can now be invoked as a regular R function:
+
+traced_fn(torch_randn(3))
+#> torch_tensor
+#> 0
+#> 0
+#> 0
+#> [ CPUFloatType{3} ]
It’s also possible to trace nn_modules() defined in R, for example:
+module <- nn_module(
+ initialize = function() {
+ self$linear1 <- nn_linear(10, 10)
+ self$linear2 <- nn_linear(10, 1)
+ },
+ forward = function(x) {
+ x %>%
+ self$linear1() %>%
+ nnf_relu() %>%
+ self$linear2()
+ }
+)
+traced_module <- jit_trace(module(), torch_randn(10, 10))
When using jit_trace with a nn_module, only the forward method is traced. You can use the jit_trace_module function to pass example inputs to other methods. Traced modules look like normal nn_modules() and can be called the same way:
+traced_module(torch_randn(3, 10))
+#> torch_tensor
+#> 0.01 *
+#> 3.1033
+#> -11.5480
+#> -17.3729
+#> [ CPUFloatType{3,1} ][ grad_fn = <AddBackward0> ]
+# fn does an operation for each slice along the first dimension of a tensor
+fn <- function(x) {
+ x %>%
+ torch_unbind(dim = 1) %>%
+ lapply(function(x) x$sum()) %>%
+ torch_stack(dim = 1)
+}
+# we trace using a tensor of size (10, 5, 5) as the example input
+traced_fn <- jit_trace(fn, torch_randn(10, 5, 5))
+# calling it on a tensor with a different size returns an error.
+traced_fn(torch_randn(11, 5, 5))
+#> Error in cpp_call_traced_fn(ptr, inputs): The following operation failed in the TorchScript interpreter.
+#> Traceback of TorchScript (most recent call last):
+#> RuntimeError: Expected 10 elements in a list but found 11
In the returned ScriptModule, operations that have different behaviors in training and eval modes will always behave as if they were in the mode they were in during tracing, no matter which mode the ScriptModule is in. For example:
+traced_dropout <- jit_trace(nn_dropout(), torch_ones(5,5))
+traced_dropout(torch_ones(3,3))
+#> torch_tensor
+#> 0 0 0
+#> 0 0 0
+#> 0 2 2
+#> [ CPUFloatType{3,3} ]
+traced_dropout$eval()
+# even after setting to eval mode, dropout is applied
+traced_dropout(torch_ones(3,3))
+#> torch_tensor
+#> 2 0 0
+#> 2 0 2
+#> 2 2 2
+#> [ CPUFloatType{3,3} ]
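Since the traced module keeps the mode that was active during tracing, one way to get inference behaviour is to switch the module to eval mode before calling jit_trace(). The sketch below assumes torch is attached and that nn_dropout() is the only mode-dependent layer involved; dropout then leaves its input unchanged in the traced program.

```r
library(torch)

dropout <- nn_dropout(p = 0.5)
dropout$eval() # switch to eval mode before tracing, not after

traced_eval <- jit_trace(dropout, torch_ones(5, 5))
out <- traced_eval(torch_ones(3, 3))
out # all ones: no elements are zeroed or rescaled
```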
+fn <- function(x, y) {
+ x + y
+}
+jit_trace(fn, torch_tensor(1), 1)
+#> Error in cpp_trace_function(tr_fn, list(...), .compilation_unit, strict, : Only tensors or (possibly nested) dict or tuples of tensors can be inputs to traced functions. Got float
+#> Exception raised from addInput at ../torch/csrc/jit/frontend/tracer.cpp:408 (most recent call first):
+#> :
+It’s also possible to create TorchScript programs by compiling TorchScript code. TorchScript code looks a lot like standard Python code. For example:
+
+tr <- jit_compile("
+def fn (x: Tensor):
+ return torch.relu(x)
+
+")
+tr$fn(torch_tensor(c(-1, 0, 1)))
+#> torch_tensor
+#> 0
+#> 0
+#> 1
+#> [ CPUFloatType{3} ]
+TorchScript programs can be serialized with jit_save() and loaded back from disk with jit_load(). For example:
+
+fn <- function(x) {
+ torch_relu(x)
+}
+tr_fn <- jit_trace(fn, torch_tensor(1))
+jit_save(tr_fn, "path.pt")
+loaded <- jit_load("path.pt")
+Loaded programs can be executed as usual:
+
+loaded(torch_tensor(c(-1, 0, 1)))
+#> torch_tensor
+#> 0
+#> 0
+#> 1
+#> [ CPUFloatType{3} ]
+Note: You can load TorchScript programs that were created by libraries other than torch for R. For example, a TorchScript program can be created in PyTorch with torch.jit.trace or torch.jit.script and run from R.
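Whichever library produced a program, a save/load round trip can be checked by comparing the loaded program's output with the original's. The sketch below reuses the relu example. It writes to a file created by base R's tempfile() instead of "path.pt" so that nothing is left in the working directory. The variable names are illustrative only.

```r
library(torch)

# Trace the same relu function used in the example above
fn <- function(x) torch_relu(x)
tr_fn <- jit_trace(fn, torch_tensor(1))

# Save to a temporary file instead of the working directory
path <- tempfile(fileext = ".pt")
jit_save(tr_fn, path)
loaded <- jit_load(path)

# The loaded program should reproduce the traced program's output
x <- torch_tensor(c(-1, 0, 1))
original_out <- as_array(tr_fn(x))
loaded_out <- as_array(loaded(x))
```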
R objects are automatically converted to their TorchScript counterparts following the Types table in this document. However, sometimes it’s necessary to add type annotations with jit_tuple() and jit_scalar() to disambiguate the conversion.
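Here is a minimal sketch of both annotations. The TorchScript functions sum_pair and scale are hypothetical and exist only for illustration. Following the types table, a plain R list would be converted to a List rather than a Tuple, and a bare numeric value would be treated as a vector. Wrapping the arguments in jit_tuple() and jit_scalar() selects the Tuple and float conversions instead.

```r
library(torch)

# Two hypothetical TorchScript functions compiled from source
tr <- jit_compile("
def sum_pair(x: Tuple[Tensor, Tensor]):
    return x[0] + x[1]

def scale(x: Tensor, k: float):
    return x * k
")

a <- torch_tensor(c(1, 2))
b <- torch_tensor(c(10, 20))

# jit_tuple() marks the R list as a TorchScript Tuple
pair_sum <- tr$sum_pair(jit_tuple(list(a, b)))

# jit_scalar() marks the length-1 numeric as a float scalar
scaled <- tr$scale(a, jit_scalar(3))
```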
The following table lists all TorchScript types and how to convert them to and from R.
+| TorchScript Type | R Description |
+|---|---|
+| Tensor | A torch_tensor with any shape, dtype, or backend. |
+| Tuple[T0, T1, ..., TN] | A list() containing subtypes T0, T1, etc., wrapped with jit_tuple(). |
+| bool | A scalar logical value created using jit_scalar(). |
+| int | A scalar integer value created using jit_scalar(). |
+| float | A scalar floating-point value created using jit_scalar(). |
+| str | A string (i.e., a character vector of length 1) wrapped in jit_scalar(). |
+| List[T] | An R list whose elements are all of type T, or an R vector (numeric, logical, etc.). |
+| Optional[T] | Not yet supported. |
+| Dict[str, V] | A named list with values of type V. Only str keys are currently supported. |
+| T | Not yet supported. |
+| E | Not yet supported. |
+| NamedTuple[T0, T1, ...] | A named list containing subtypes T0, T1, etc., wrapped in jit_tuple(). |
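The List[T] and Dict[str, V] rows need no annotation. Per the table, an unnamed R list of tensors maps to List[Tensor] and a named R list maps to Dict[str, Tensor]. The sketch below shows both. The function names stack_all and add_entries are hypothetical.

```r
library(torch)

# Two hypothetical TorchScript functions taking a List and a Dict
tr <- jit_compile("
def stack_all(xs: List[Tensor]):
    return torch.stack(xs)

def add_entries(d: Dict[str, Tensor]):
    return d['a'] + d['b']
")

# Unnamed list of tensors -> List[Tensor]; stacked into a 2 x 2 tensor
stacked <- tr$stack_all(list(torch_tensor(c(1, 2)), torch_tensor(c(3, 4))))

# Named list of tensors -> Dict[str, Tensor]
added <- tr$add_entries(list(a = torch_tensor(1), b = torch_tensor(2)))
```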
+So far, all we’ve been using from torch is tensors, and we’ve been performing all calculations ourselves: computing the predictions, the loss, the gradients (and thus, the necessary updates to the weights), and the new weight values. In this chapter, we’ll make a significant change: we spare ourselves the cumbersome calculation of gradients and have torch do it for us.
+Before we see that in action, let’s get some more background.
+Torch uses a module called autograd to record operations performed on tensors and to store what has to be done to obtain the respective gradients. These actions are stored as functions, and those functions are applied in order when the gradient of the output (normally, the loss) with respect to those tensors is calculated: starting from the output node and propagating gradients back through the network. This is a form of reverse-mode automatic differentiation.
+As users, we can see a bit of this implementation. As a prerequisite for this “recording” to happen, tensors have to be created with requires_grad = TRUE. For example:
+x <- torch_ones(2, 2, requires_grad = TRUE)
+To be clear, this is a tensor with respect to which gradients have to be calculated: normally, a tensor representing a weight or a bias, not the input data 1. If we now perform some operation on that tensor, assigning the result to y
+y <- x$mean()
+we find that y now has a non-empty grad_fn that tells torch how to compute the gradient of y with respect to x:
+y$grad_fn
+#> MeanBackward0
+Actual computation of gradients is triggered by calling backward() on the output tensor.
+y$backward()
+After this call, x has a non-empty field grad that stores the gradient of y with respect to x:
+x$grad
+#> torch_tensor
+#> 0.2500 0.2500
+#> 0.2500 0.2500
+#> [ CPUFloatType{2,2} ]
+With a longer chain of computations, we can peek at how torch builds up a graph of backward operations.
+Here is a slightly more complex example. We call retain_grad() on y and z just for demonstration purposes; by default, intermediate gradients are computed but not stored, in order to save memory.
+x1 <- torch_ones(2,2, requires_grad = TRUE)
+x2 <- torch_tensor(1.1, requires_grad = TRUE)
+y <- x1 * (x2 + 2)
+y$retain_grad()
+z <- y$pow(2) * 3
+z$retain_grad()
+out <- z$mean()
+Starting from out$grad_fn, we can follow the graph all the way back to the leaf nodes:
+# how to compute the gradient for mean, the last operation executed
+out$grad_fn
+#> MeanBackward0
+# how to compute the gradient for the multiplication by 3 in z = y$pow(2) * 3
+out$grad_fn$next_functions
+#> [[1]]
+#> MulBackward1
+# how to compute the gradient for pow in z = y$pow(2) * 3
+out$grad_fn$next_functions[[1]]$next_functions
+#> [[1]]
+#> PowBackward0
+# how to compute the gradient for the multiplication in y = x1 * (x2 + 2)
+out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions
+#> [[1]]
+#> MulBackward0
+# how to compute the gradient for the two branches of y = x1 * (x2 + 2),
+# where the left branch is a leaf node (AccumulateGrad for x1)
+out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions[[1]]$next_functions
+#> [[1]]
+#> torch::autograd::AccumulateGrad
+#> [[2]]
+#> AddBackward1
+# here we arrive at the other leaf node (AccumulateGrad for x2)
+out$grad_fn$next_functions[[1]]$next_functions[[1]]$next_functions[[1]]$next_functions[[2]]$next_functions
+#> [[1]]
+#> torch::autograd::AccumulateGrad
+After calling out$backward(), the leaf tensors in the graph, plus any tensors on which we called retain_grad(), will have their gradients populated. Without our calls to retain_grad() above, z$grad and y$grad would be empty:
+out$backward()
+z$grad
+#> torch_tensor
+#> 0.2500 0.2500
+#> 0.2500 0.2500
+#> [ CPUFloatType{2,2} ]
+y$grad
+#> torch_tensor
+#> 4.6500 4.6500
+#> 4.6500 4.6500
+#> [ CPUFloatType{2,2} ]
+x2$grad
+#> torch_tensor
+#> 18.6000
+#> [ CPUFloatType{1} ]
+x1$grad
+#> torch_tensor
+#> 14.4150 14.4150
+#> 14.4150 14.4150
+#> [ CPUFloatType{2,2} ]
+Thus acquainted with autograd, we’re ready to modify our example.
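The gradients printed for the retain_grad() example can be checked by hand with the chain rule. Every entry of x1 is 1 and x2 = 1.1, so each entry of y is 1 · (1.1 + 2) = 3.1. The output out is the mean of the four entries of z = 3y²:

$$
\begin{aligned}
\frac{\partial\,\mathrm{out}}{\partial z_{ij}} &= \frac{1}{4} = 0.25 \\
\frac{\partial\,\mathrm{out}}{\partial y_{ij}} &= \frac{1}{4} \cdot 6\,y_{ij} = 1.5 \cdot 3.1 = 4.65 \\
\frac{\partial\,\mathrm{out}}{\partial (x_1)_{ij}} &= \frac{\partial\,\mathrm{out}}{\partial y_{ij}}\,(x_2 + 2) = 4.65 \cdot 3.1 = 14.415 \\
\frac{\partial\,\mathrm{out}}{\partial x_2} &= \sum_{i,j} \frac{\partial\,\mathrm{out}}{\partial y_{ij}}\,(x_1)_{ij} = 4 \cdot 4.65 = 18.6
\end{aligned}
$$

The sum in the last line explains why x2$grad holds a single value that collects contributions from all four entries of y.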
+In exchange for a single new line calling loss$backward(), the lines that did manual backpropagation are gone:
+### generate training data -----------------------------------------------------
+# input dimensionality (number of input features)
+d_in <- 3
+# output dimensionality (number of predicted features)
+d_out <- 1
+# number of observations in training set
+n <- 100
+# create random data
+x <- torch_randn(n, d_in)
+y <- x[, 1] * 0.2 - x[, 2] * 1.3 - x[, 3] * 0.5 + torch_randn(n)
+# add a trailing dimension (dims are 1-based) so y has shape (n, 1), like y_pred
+y <- y$unsqueeze(dim = 2)
+### initialize weights ---------------------------------------------------------
+# dimensionality of hidden layer
+d_hidden <- 32
+# weights connecting input to hidden layer
+w1 <- torch_randn(d_in, d_hidden, requires_grad = TRUE)
+# weights connecting hidden to output layer
+w2 <- torch_randn(d_hidden, d_out, requires_grad = TRUE)
+# hidden layer bias
+b1 <- torch_zeros(1, d_hidden, requires_grad = TRUE)
+# output layer bias
+b2 <- torch_zeros(1, d_out, requires_grad = TRUE)
+### network parameters ---------------------------------------------------------
+learning_rate <- 1e-4
+### training loop --------------------------------------------------------------
+for (t in 1:200) {
+
+ ### -------- Forward pass --------
+ y_pred <- x$mm(w1)$add(b1)$clamp(min = 0)$mm(w2)$add(b2)
+ ### -------- compute loss --------
+ loss <- (y_pred - y)$pow(2)$mean()
+ if (t %% 10 == 0) cat(t, as_array(loss), "\n")
+ ### -------- Backpropagation --------
+ # compute the gradient of loss with respect to all tensors with requires_grad = TRUE
+ loss$backward()
+
+ ### -------- Update weights --------
+
+ # Wrap in with_no_grad() because this is a part we DON'T want to record for automatic gradient computation
+ with_no_grad({
+
+ w1$sub_(learning_rate * w1$grad)
+ w2$sub_(learning_rate * w2$grad)
+ b1$sub_(learning_rate * b1$grad)
+ b2$sub_(learning_rate * b2$grad)
+
+ # Zero the gradients after every pass, because they'd accumulate otherwise
+ w1$grad$zero_()
+ w2$grad$zero_()
+ b1$grad$zero_()
+ b2$grad$zero_()
+
+ })
+
+}
+#> 10 106.7842
+#> 20 92.94109
+#> 30 81.24539
+#> 40 71.34554
+#> 50 62.94669
+#> 60 55.81044
+#> 70 49.71231
+#> 80 44.48481
+#> 90 39.98558
+#> 100 36.11638
+#> 110 32.76665
+#> 120 29.85979
+#> 130 27.33166
+#> 140 25.13169
+#> 150 23.20926
+#> 160 21.52449
+#> 170 20.04912
+#> 180 18.75175
+#> 190 17.60774
+#> 200 16.59894
+We still manually compute the forward pass, and we still manually update the weights. In the last two chapters of this section, we’ll see how these parts of the logic can be made more modular and reusable, as well.
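Two details of the training loop are worth seeing in isolation. First, the loop zeroes each gradient after the update because backward() adds to an existing grad rather than replacing it. Second, it wraps the update in with_no_grad() so that the update itself is not recorded. The small sketch below shows both behaviors on a toy tensor. The names first, second, reset, w, and tracked are illustrative only.

```r
library(torch)

x <- torch_ones(2, 2, requires_grad = TRUE)

# First backward pass: the gradient of mean(x) is 1/4 for every entry
x$mean()$backward()
first <- as_array(x$grad)

# A second backward pass on a freshly built graph adds to x$grad
x$mean()$backward()
second <- as_array(x$grad)

# Reset in place, as the training loop does after each update
x$grad$zero_()
reset <- as_array(x$grad)

# Operations inside with_no_grad() are not tracked by autograd
with_no_grad({
  w <- x * 2
})
tracked <- w$requires_grad
```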
+