---
title: "MGPR example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mgpr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
        echo = TRUE, results = 'hold', warning=F, cache=F, #dev = 'pdf', 
  message=F, 
  fig.width=5, fig.height=5,
  tidy.opts=list(width.cutoff=75), tidy=FALSE
)
old <- options(scipen = 1, digits = 4)
```

\newcommand{\Cov}{\text{Cov}}

```{r setup}
library(GPFDA)
require(MASS)
```

## Simulating data from a multiple GP with three outputs

In this example, we simulate data from a multivariate (convolved) GP model. 
See details of this model in Chapter 8 of Shi, J. Q., and Choi, T. (2011), 
"Gaussian Process Regression Analysis for Functional Data", CRC Press.

We simulate $30$ realisations of three dependent outputs, with $250$ time 
points on $[0,1]$ for each output.
```{r}
set.seed(123)
nrep <- 30
n1 <- 250
n2 <- 250
n3 <- 250
N <- 3
n <- n1+n2+n3
input1 <- sapply(1:n1, function(x) (x - min(1:n1))/max(1:n1 - min(1:n1)))
input2 <- input1
input3 <- input1

# storing input vectors in a list
Data <- list()
Data$input <- list(input1, input2, input3)

# true hyperparameter values
nu0s <- c(6, 4, 2)
nu1s <- c(0.1, 0.05, 0.01)
a0s <- c(500, 500, 500)
a1s <- c(100, 100, 100)
sigm <- 0.05
hp <- c(nu0s, log(nu1s), log(a0s), log(a1s), log(sigm))

# Calculate covariance matrix
Psi <- mgpCovMat(Data=Data, hp=hp)
```


We need an index vector identifying to which output the data corresponds:
```{r}
ns <- sapply(Data$input, length)
idx <- c(unlist(sapply(1:N, function(i) rep(i, ns[i])))) 
```

Covariance functions $\Cov \big[X_j(t), X_\ell(0) \big]$ can be plotted as follows.
The arguments `output` and `outputp` correspond to $j$ and $\ell$, respectively. 

Given the hyperparameters `hp`, we can plot the auto- and cross-covariance 
functions as follows:
```{r}
# Plotting an auto-covariance function
plotmgpCovFun(type="Cov", output=1, outputp=1, Data=Data, hp=hp, idx=idx)
# Plotting a cross-covariance function
plotmgpCovFun(type="Cov", output=1, outputp=2, Data=Data, hp=hp, idx=idx)
```

Corresponding correlation functions can be plotted by setting `type=Cor`:
```{r}
# Plotting an auto-correlation function
plotmgpCovFun(type="Cor", output=1, outputp=1, Data=Data, hp=hp, idx=idx)
# Plotting a cross-correlation function
plotmgpCovFun(type="Cor", output=1, outputp=2, Data=Data, hp=hp, idx=idx)
```


We assume that the mean functions for each output are 
$\mu_1(t) = 5t$, $\mu_2(t) = 10t$, and $\mu_3(t) = -3t$ and simulate the data as follows
```{r}
mu <- c( 5*input1, 10*input2, -3*input3)
Y <- t(mvrnorm(n=nrep, mu=mu, Sigma=Psi))
response <- list()
for(j in 1:N){
  response[[j]] <- Y[idx==j,,drop=F]
}
# storing the response in the list
Data$response <- response
```

```{r, include=F, eval=F}
dataExampleMGPR <- Data
save(dataExampleMGPR, file = "data/dataExampleMGPR.rda")
```


Below we estimate the mean and covariance functions using a subset of data 
including $m=100$ observations (out of $750$ of the sample) aiming for a faster estimation.
These $m$ observations are chosen randomly. For the mean functions, we choose 
the linear model by settting `meanModel = 't'`.
```{r}
res <- mgpr(Data=Data, m=100, meanModel = 't')
```


Next, based on the estimated model, we want to predict the values of the three 
outputs at new time points:
```{r}
n_star <- 60*N
input1star <- seq(min(input1), max(input1), length.out = n_star/N)
input2star <- seq(min(input2), max(input2), length.out = n_star/N)
input3star <- seq(min(input3), max(input3), length.out = n_star/N)
DataNew <- list()
DataNew$input <- list(input1star, input2star, input3star)
```

We have trained the model using $m$ time points. However, for visualisation purposes, 
it is more interesting to see predictions based on very few data points.
Therefore, let's use a very small subset of observations and make predictions given this 
small subset. We will use observations from the fifth multivariate realisation stored in `Data'.
```{r}
realisation <- 5

obsSet <- list()
obsSet[[1]] <- c(5, 10, 23, 50, 80, 200)
obsSet[[2]] <- c(10, 23, 180)
obsSet[[3]] <- c(3, 11, 30, 240)

DataObs <- list()
DataObs$input[[1]] <- Data$input[[1]][obsSet[[1]]]
DataObs$input[[2]] <- Data$input[[2]][obsSet[[2]]]
DataObs$input[[3]] <- Data$input[[3]][obsSet[[3]]]
DataObs$response[[1]] <- Data$response[[1]][obsSet[[1]], realisation]
DataObs$response[[2]] <- Data$response[[2]][obsSet[[2]], realisation]
DataObs$response[[3]] <- Data$response[[3]][obsSet[[3]], realisation]
```


The `mgprPredict` function returns a list containing the predictive mean and 
standard deviation for the curves of each output at the new time points.
```{r}
# Calculate predictions for the test set given some observations
predCGP <- mgprPredict(train=res, DataObs=DataObs, DataNew=DataNew)
str(predCGP)
```


The predictions (with 95\% confidence inverval) for the $5$th curve at the new 
time points can be visualised by using the model estimated by the `mgpr` function:
```{r, fig.width=9, fig.height=4}
plot(res, DataObs=DataObs, DataNew=DataNew)
```


Let's assume that we have additional information for the first two functions by 
also including their 100th and 150th observations:

```{r}
obsSet[[1]] <- c(5, 10, 23, 50, 80, 100, 150, 200)
obsSet[[2]] <- c(10, 23, 100, 150, 180)

DataObs$input[[1]] <- Data$input[[1]][obsSet[[1]]]
DataObs$input[[2]] <- Data$input[[2]][obsSet[[2]]]
DataObs$response[[1]] <- Data$response[[1]][obsSet[[1]], realisation]
DataObs$response[[2]] <- Data$response[[2]][obsSet[[2]], realisation]
```


```{r}
predCGP <- mgprPredict(train=res, DataObs=DataObs, DataNew=DataNew)
```

Now notice how predictions for the third function are affected by the information 
added to the other functions.
```{r, fig.width=9, fig.height=4}
plot(res, DataObs=DataObs, DataNew=DataNew)
```


```{r, include = FALSE}
options(old)
```