September 14, 2017

## Introduction

• GVSU Math / UMich Bioinformatics
• Worked with Prof. Aaron King to build biological models in R
• I was an R novice a year and a half ago!
• ashtonsb@umich.edu
• github.com/ashtonbaker

• Much more than the Markov Chains you might have seen in Linear Algebra
• Markov models are great for time series!
• They allow us to use our understanding of the mechanisms that are generating data.

## Markov Chain

• A stochastic process that satisfies the Markov property
• All the information about the future state of the system is contained in the present state of the system.
• States of the system at sequential time points are related by a stochastic process with parameters $$\theta$$.

$X_0 \overset{\theta}{\rightarrow} X_1 \overset{\theta}{\rightarrow} X_2 \overset{\theta}{\rightarrow} \cdots$

## Markov Chain

• A stochastic process that satisfies the Markov property
• All the information about the future state of the system is contained in the present state of the system.
• States of the system at sequential time points are related by a stochastic process with parameters $$\theta$$.

$\begin{bmatrix}A_0\\B_0\\C_0\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_1\\B_1\\C_1\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_2\\B_2\\C_2\end{bmatrix} \overset{\theta}{\rightarrow} \cdots$

## Markov Chain Example

• The state variables in this system are binary-valued:
##   X.bull. X.bear. X.stagnant.
## 1       1       0           0
## 2       0       1           0
## 3       1       0           0
## 4       0       0           1
## 5       1       0           0
## 6       0       1           0
• Best way to estimate parameters in this case is to calculate MLE for the transition probabilities

## Hidden Markov Model

• The underlying system is a Markov process, with unobserved states.
• Data is generated by a (stochastic) observation process, and comprise "observed" states.
• Observation states and Process states may have any of a variety of relationships.

$\begin{matrix} X_0 & \rightarrow & X_1 & \rightarrow & X_2 & \rightarrow & \cdots & \text{Signal} \\ \downarrow & & \downarrow & & \downarrow & & & \\ Y_0 & & Y_1 & & Y_2 & & \cdots & \text{Observations} \end{matrix}$

## Transcription Factor Binding Site Prediction

• Hidden state: position in binding site.
• Observed state: nucleic acid sequence

$\begin{array} \text{B} & B & B & B & 1 & 2 & 3 & 4 & B & \text{Signal} \\ A & C & G & A & A & C & T & C & A & \text{Observations} \end{array}$

Mathelier, A., & Wasserman, W. W. (2013). The next generation of transcription factor binding site prediction. PLoS computational biology, 9(9), e1003214.

## Using pomp

• We can use the pomp package to encode Hidden Markov Models as R objects.

## The structure of a pomp model

• Our goal is to use pomp to perform "simulation-based inference"
• We need to encode everything that we need to simulate the process that we hypothesize

## Ricker model

• A Ricker model is a population model that relates the population at time $$t$$ to the population at time $$t + 1$$. $N_{t+1} =r N_t e^{\left(-\frac{N_t}{k}\right)}$

• $$r$$ is the intrinsic growth rate

• $$k$$ is the carrying capacity

## A stochastic Ricker Model

• If we assume that the growth rate $$r$$ is log-normally distributed, then $N_{t+1} = r N_t \exp(-N_t/k + \epsilon_t),$ where $$\epsilon_t \sim \text{Normal}(0, \sigma)$$.

• In any real system, we need to model measurement error: $y_t \sim \text{Poisson}(\phi N_t)$

## Ricker model

library(ggplot2)
ggplot(data, aes(time, pop)) + geom_line()

## Using pomp

• pomp requires a few things to build a minimal model:
library(pomp)

rickerModel <- pomp(data = data,
times = 'time',
t0 = 0)
• Specifying a few more arguments will help us in the next step:
rickerModel <- pomp(rickerModel,
statenames = c('N'),
obsnames = c('pop'),
paramnames = c('r','k','sigma'))

## Initialization process

• In our model, the initial values for $$N$$ and $$\epsilon$$ are parameters.
init <- function (params, t0,...){
return(c('N' = params[['N.0']] ))
}

rickerModel <- pomp(rickerModel, initializer = init)

## Specifying the stochastic process

rproc <- discrete.time.sim(
step.fun = function(x, t, params, delta.t, ...) {
sigma <- params[['sigma']]
r <- params[['r']]
k <- params[['k']]
N <- x[['N']]
e <- rnorm(1, mean=0, sd=sigma)

result <- c('N' = r * N * exp(e - N/k))

return(result)
},
delta.t = 1
)

rickerModel <- pomp(rickerModel, rprocess = rproc)

## Specifying the measurement model

$$y_t \sim \text{Poisson}(\phi N_t)$$

rmeas <- function (x, t, params, ...) {
N <- x[['N']]
phi <- params[['phi']]
result <- c('pop' = rpois(n=1, lambda=phi * N))

return(result)
}

rickerModel <- pomp(rickerModel, rmeasure = rmeas)

## Specifying the measurement model distribution function

• dmeasure should be the PDF of the Poisson distribution we used for rmeasure. This is available in R as the dpois function.
dmeas <- function(y, x, t, params, log, ...) {
N <- x[['N']]
pop <- y[['pop']]
phi <- params[['phi']]

return(dpois(pop, lambda = phi*N, log=log))
}

rickerModel <- pomp(rickerModel, dmeasure = dmeas)

## Simulation

p <- c(N.0=1, r=20, k=1,sigma=0.1,phi=200)
sim <- simulate(rickerModel, params = p, as.data.frame = T)
ggplot(sim, aes(time, pop)) + geom_line()

## Comparing simulations to the data

sim <- simulate(rickerModel, params = p, nsim=8, as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sim,aes(x=time,y=pop,group=sim,color=(sim=="data")))+
geom_line()+guides(color=FALSE)+
facet_wrap(~sim,ncol=3)