--- title: "GPFR example" output: rmarkdown::html_vignette header-includes: - \usepackage{bm} vignette: > %\VignetteIndexEntry{gpfr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", echo = TRUE, results = 'hold', warning=F, cache=F, eval=T, #dev = 'pdf', message=F, fig.width=5, fig.height=5, tidy.opts=list(width.cutoff=75), tidy=FALSE ) old <- options(scipen = 1, digits = 4) ``` Suppose we have a functional response variable $y_m(t), \ m=1,\dots,M$, a functional covariate $x_m(t)$ and also a set of $p=2$ scalar covariates $\textbf{u}_m = (u_{m0},u_{m1})^\top$. A Gaussian process functional regression (GPFR) model used in this example is defined by $y_m(t) = \mu_m(t) + \tau_m(x_m(t)) + \varepsilon_m(t)$, where $\mu_m(t) = \textbf{u}_m^\top \boldsymbol{\beta}(t)$ is the mean function model across different curves and $\tau_m(x_m(t))$ is a Gaussian process with zero mean and covariance function $k_m(\boldsymbol{\theta}|x_m(t))$. That is, $\tau_m(x_m(t))$ defines the covariance structure of $y_m(t)$ for the different data points within the same curve. The error term can be assumed to be $\varepsilon_m(t) \sim N(0, \sigma_\varepsilon^2)$, where the noise variance $\sigma_\varepsilon^2$ can be estimated as a hyperparameter of the Gaussian process. In the example below, the training data consist of $M=20$ realisations on $[-4,4]$ with $n=50$ points for each curve. We assume regression coefficient functions $\beta_0(t)=1$, $\beta_1(t)=\sin((0.5 t)^3)$, scalar covariates $u_{m0} \sim N(0,1)$ and $u_{m1} \sim N(10,5^2)$ and a functional covariate $x_m(t) = \exp(t) + v$, where $v \sim N(0, 0.1^2)$. The term $\tau_m(x_m(t))$ is a zero mean Gaussian process with exponential covariance kernel and $\sigma_\varepsilon^2 = 1$. We also simulate an $(M+1)$th realisation which is used to assess predictions obtained by the model estimated by using the training data of size $M$. The $y_{M+1}(t)$ and $x_{M+1}(t)$ curves are observed on equally spaced $60$ time points on $[-4,4]$. 
```{r setup}
library(GPFDA)
require(MASS)  # for mvrnorm
```

```{r}
set.seed(100)
M <- 20
n <- 50
p <- 2  # number of scalar covariates
hp <- list('pow.ex.v' = log(10), 'pow.ex.w' = log(1), 'vv' = log(1))

## Training data: M realisations -----------------
tt <- seq(-4, 4, len = n)
b <- sin((0.5*tt)^3)
scalar_train <- matrix(NA, M, p)
t_train <- matrix(NA, M, n)
x_train <- matrix(NA, M, n)
response_train <- matrix(NA, M, n)
for(i in 1:M){
  u0 <- rnorm(1)
  u1 <- rnorm(1, 10, 5)
  x <- exp(tt) + rnorm(n, 0, 0.1)
  Sigma <- cov.pow.ex(hyper = hp, input = x, gamma = 1)
  diag(Sigma) <- diag(Sigma) + exp(hp$vv)
  y <- u0 + u1*b + mvrnorm(n = 1, mu = rep(0, n), Sigma = Sigma)
  scalar_train[i,] <- c(u0, u1)
  t_train[i,] <- tt
  x_train[i,] <- x
  response_train[i,] <- y
}

## Test data: (M+1)-th realisation ------------------
n_new <- 60
t_new <- seq(-4, 4, len = n_new)
b_new <- sin((0.5*t_new)^3)
u0_new <- rnorm(1)
u1_new <- rnorm(1, 10, 5)
scalar_new <- cbind(u0_new, u1_new)
x_new <- exp(t_new) + rnorm(n_new, 0, 0.1)
Sigma_new <- cov.pow.ex(hyper = hp, input = x_new, gamma = 1)
diag(Sigma_new) <- diag(Sigma_new) + exp(hp$vv)
response_new <- u0_new + u1_new*b_new +
  mvrnorm(n = 1, mu = rep(0, n_new), Sigma = Sigma_new)
```

```{r, include = FALSE, eval = FALSE}
dataExampleGPFR <- list(tt = tt, response_train = response_train,
                        x_train = x_train, scalar_train = scalar_train,
                        t_new = t_new, response_new = response_new,
                        x_new = x_new, scalar_new = scalar_new)
save(dataExampleGPFR, file = "data/dataExampleGPFR.rda")
```

The estimation of the mean and covariance functions in the GPFR model is done using the `gpfr` function:

```{r, results = FALSE}
a1 <- gpfr(response = response_train, time = tt, uReg = scalar_train,
           fxReg = NULL, gpReg = x_train,
           fyList = list(nbasis = 23, lambda = 0.0001),
           uCoefList = list(list(lambda = 0.0001, nbasis = 23)),
           Cov = 'pow.ex', gamma = 1, fitting = TRUE)
```

Note that the estimated covariance function hyperparameters are close to the true values:

```{r}
unlist(lapply(a1$hyper, exp))
```

### Plot of raw data

To visualise all $M$ realisations of the training data:

```{r}
plot(a1, type = 'raw')
```

To visualise three realisations of the training data:

```{r}
plot(a1, type = 'raw', realisations = 1:3)
```

### FR fit for training data

The in-sample fit obtained using only the mean function from the FR model can be visualised as follows:

```{r}
plot(a1, type = 'meanFunction', realisations = 1:3)
```

### GPFR fit for training data

The GPFR model fit to the training data is visualised by using:

```{r}
plot(a1, type = 'fitted', realisations = 1:3)
```

### Type I prediction: $y_{M+1}$ observed

If $y_{M+1}(t)$ is observed over the whole domain of $t$, the Type I prediction is obtained as follows:

```{r, results = FALSE}
b1 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL,
                  gpReg = list('response' = response_new,
                               'input' = x_new,
                               'time' = t_new))
plot(b1, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19,
      cex = 0.6, lty = 3, lwd = 2)
```

### Type I prediction: $y_{M+1}$ partially observed

If we assume that $y_{M+1}(t)$ is only partially observed (here, on the first $20$ of the $60$ time points), we can obtain Type I predictions via:

```{r, results = FALSE}
b2 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL,
                  gpReg = list('response' = response_new[1:20],
                               'input' = x_new[1:20],
                               'time' = t_new[1:20]))
plot(b2, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19,
      cex = 0.6, lty = 3, lwd = 2)
```
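To see how the Type I prediction improves as more of $y_{M+1}(t)$ is observed, the partial-observation prediction can be repeated for an increasing number of observed points. The following is a minimal sketch that re-uses only the `gpfrPredict` arguments introduced above; the object name `bPart` and the cut-offs $10$, $30$ and $50$ are arbitrary choices for illustration:

```{r, results = FALSE}
# Repeat the Type I prediction while varying how many points of the
# test curve are treated as observed; cut-offs are illustrative.
for(n_obs in c(10, 30, 50)){
  bPart <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                       uReg = scalar_new, fxReg = NULL,
                       gpReg = list('response' = response_new[1:n_obs],
                                    'input' = x_new[1:n_obs],
                                    'time' = t_new[1:n_obs]))
  plot(bPart, type = 'prediction', colourTrain = 'pink')
  lines(t_new, response_new, type = 'b', col = 4, pch = 19,
        cex = 0.6, lty = 3, lwd = 2)
}
```

The uncertainty bands should narrow over the observed portion of the curve and widen where no response values are available.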
### Type II prediction: $y_{M+1}$ not observed

Type II prediction, which does not use any information about $y_{M+1}(t)$, is visualised below.

```{r, results = FALSE}
b3 <- gpfrPredict(a1, testInputGP = x_new, testTime = t_new,
                  uReg = scalar_new, fxReg = NULL,
                  gpReg = NULL)
plot(b3, type = 'prediction', colourTrain = 'pink')
lines(t_new, response_new, type = 'b', col = 4, pch = 19,
      cex = 0.6, lty = 3, lwd = 2)
```

```{r, include = FALSE}
options(old)
```
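Finally, the three types of prediction can be compared numerically against the simulated truth. The sketch below rests on an assumption about the structure of the object returned by `gpfrPredict`: the predictive mean is assumed to be stored in an `ypred.mean` element. Since this slot name may differ across versions of GPFDA, the chunk is not evaluated here; inspect `str(b1)` to confirm before running it.

```{r, eval = FALSE}
# Root mean squared error of each prediction against the simulated truth.
# Assumption: the predictive mean is stored in the 'ypred.mean' element
# of the object returned by gpfrPredict(); check str(b1) if this differs.
rmse <- function(fit) sqrt(mean((as.numeric(fit$ypred.mean) - response_new)^2))
c('Type I (fully observed)'     = rmse(b1),
  'Type I (partially observed)' = rmse(b2),
  'Type II'                     = rmse(b3))
```

As one would expect, predictions that use more of the observed test response should have smaller errors, with Type II typically the least accurate of the three.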