September 14, 2017

Introduction

About me

  • GVSU Math / UMich Bioinformatics
  • Worked with Prof. Aaron King to build biological models in R
  • I was an R novice a year and a half ago!
  • ashtonsb@umich.edu
  • github.com/ashtonbaker

About Markov Models

  • Much more than the Markov Chains you might have seen in Linear Algebra
  • Markov models are great for time series!
  • They allow us to use our understanding of the mechanisms that are generating data.

Markov Models

Markov Chain

  • A stochastic process that satisfies the Markov property
  • All the information about the future state of the system is contained in the present state of the system.
  • States of the system at sequential time points are related by a stochastic process with parameters \(\theta\).

\[X_0 \overset{\theta}{\rightarrow} X_1 \overset{\theta}{\rightarrow} X_2 \overset{\theta}{\rightarrow} \cdots\]

Markov Chain

  • A stochastic process that satisfies the Markov property
  • All the information about the future state of the system is contained in the present state of the system.
  • States of the system at sequential time points are related by a stochastic process with parameters \(\theta\).

\[\begin{bmatrix}A_0\\B_0\\C_0\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_1\\B_1\\C_1\end{bmatrix} \overset{\theta}{\rightarrow} \begin{bmatrix}A_2\\B_2\\C_2\end{bmatrix} \overset{\theta}{\rightarrow} \cdots\]
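
  • A minimal R sketch of simulating such a chain (the transition matrix \(\theta\) below is made up for illustration):

set.seed(1)
states <- c('A', 'B', 'C')
theta  <- matrix(c(0.6, 0.3, 0.1,
                   0.2, 0.5, 0.3,
                   0.1, 0.4, 0.5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(states, states))

x <- character(20)
x[1] <- 'A'
for (t in 2:20) {
  # the next state depends only on the current state (Markov property)
  x[t] <- sample(states, size = 1, prob = theta[x[t - 1], ])
}
x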

Markov Chain Example

  • The state variables in this system are binary-valued:
##   X.bull. X.bear. X.stagnant.
## 1       1       0           0
## 2       0       1           0
## 3       1       0           0
## 4       0       0           1
## 5       1       0           0
## 6       0       1           0
  • The best way to estimate the parameters in this case is to compute the maximum-likelihood estimates (MLEs) of the transition probabilities, as sketched below.
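
  • A minimal sketch of that calculation, assuming the observed sequence is stored as a character vector x of state labels: count the transitions between consecutive states and normalize each row.

# observed sequence of states (from the data above)
x <- c('bull', 'bear', 'bull', 'stagnant', 'bull', 'bear')

# count transitions from each state to the next, then normalize rows
trans <- table(from = head(x, -1), to = tail(x, -1))
mle   <- prop.table(trans, margin = 1)
mle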

Hidden Markov Model

  • The underlying system is a Markov process, with unobserved states.
  • Data are generated by a (stochastic) observation process and comprise the "observed" states.
  • The observation states and the process states can be related in a variety of ways.

\[ \begin{matrix} X_0 & \rightarrow & X_1 & \rightarrow & X_2 & \rightarrow & \cdots & \text{Signal} \\ \downarrow & & \downarrow & & \downarrow & & & \\ Y_0 & & Y_1 & & Y_2 & & \cdots & \text{Observations} \end{matrix} \]
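
  • A minimal, self-contained sketch of this structure (all numbers are illustrative): the hidden signal X follows a Markov chain, and each observation Y depends only on the current hidden state.

set.seed(2)
theta <- matrix(c(0.8, 0.2,
                  0.3, 0.7), nrow = 2, byrow = TRUE)

X <- numeric(50)
X[1] <- 1
for (t in 2:50) {
  X[t] <- sample(1:2, size = 1, prob = theta[X[t - 1], ])  # hidden signal
}
Y <- rnorm(50, mean = c(0, 5)[X], sd = 1)                   # noisy observations of X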

Transcription Factor Binding Site Prediction

  • Hidden state: position in binding site.
  • Observed state: nucleic acid sequence.

\[\begin{array}{cccccccccc} B & B & B & B & 1 & 2 & 3 & 4 & B & \text{Signal} \\ A & C & G & A & A & C & T & C & A & \text{Observations} \end{array}\]

Mathelier, A., & Wasserman, W. W. (2013). The next generation of transcription factor binding site prediction. PLoS computational biology, 9(9), e1003214.

Using pomp

  • We can use the pomp package to encode Hidden Markov Models as R objects.

The structure of a pomp model

  • Our goal is to use pomp to perform "simulation-based inference".
  • We need to encode everything required to simulate the process we hypothesize is generating the data.

Ricker model

  • A Ricker model is a population model that relates the population at time \(t\) to the population at time \(t + 1\): \[N_{t+1} = r N_t e^{-N_t/k}\]

  • \(r\) is the intrinsic growth rate

  • \(k\) is the carrying capacity
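
  • A minimal R sketch of iterating the deterministic map, using the same \(r\) and \(k\) values as the simulation later on:

ricker_step <- function(N, r, k) r * N * exp(-N / k)

N <- numeric(50)
N[1] <- 1
for (t in 1:49) N[t + 1] <- ricker_step(N[t], r = 20, k = 1)
plot(N, type = 'l', xlab = 'time', ylab = 'N')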

A stochastic Ricker Model

  • If we assume that the growth rate is log-normally distributed around \(r\), then \[N_{t+1} = r N_t \exp(-N_t/k + \epsilon_t),\] where \(\epsilon_t\) is Gaussian noise with mean 0 and standard deviation \(\sigma\).

  • In any real system we also need to model measurement error; here the observed counts are \[y_t \sim \text{Poisson}(\phi N_t),\] where \(\phi\) scales the true population to the expected observed count.
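
  • Before encoding this in pomp, here is a minimal plain-R sketch of the full model (latent stochastic Ricker dynamics plus Poisson observations), using the same parameter values as the simulation later on:

set.seed(3)
r <- 20; k <- 1; sigma <- 0.1; phi <- 200

N <- numeric(50)
N[1] <- 1
for (t in 1:49) {
  e <- rnorm(1, mean = 0, sd = sigma)        # process noise
  N[t + 1] <- r * N[t] * exp(-N[t] / k + e)  # latent population
}
y <- rpois(50, lambda = phi * N)             # observed counts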

Ricker model

library(ggplot2)
data <- read.csv("./data/rickerdata.csv")
ggplot(data, aes(time, pop)) + geom_line()

Using pomp

  • pomp requires a few things to build a minimal model:
library(pomp)

rickerModel <- pomp(data = data,
                    times = 'time',
                    t0 = 0)
  • Specifying a few more arguments will help us in the next step:
rickerModel <- pomp(rickerModel,
                    statenames = c('N'),
                    obsnames = c('pop'),
                    paramnames = c('r', 'k', 'sigma', 'phi', 'N.0'))

Initialization process

  • In our model, the initial population size is itself a parameter, N.0, which the initializer copies into the state vector.
init <- function (params, t0, ...) {
  # return the state vector at time t0; the initial population size
  # is supplied as the parameter N.0
  return(c('N' = params[['N.0']]))
}

rickerModel <- pomp(rickerModel, initializer = init)

Specifying the stochastic process

rproc <- discrete.time.sim(
  step.fun = function(x, t, params, delta.t, ...) {
    sigma <- params[['sigma']]
    r <- params[['r']]
    k <- params[['k']]
    N <- x[['N']]
    # process noise on the growth rate
    e <- rnorm(1, mean = 0, sd = sigma)

    # one step of the stochastic Ricker map
    result <- c('N' = r * N * exp(e - N/k))

    return(result)
  },
  delta.t = 1
)

rickerModel <- pomp(rickerModel, rprocess = rproc)

Specifying the measurement model

\(y_t \sim \text{Poisson}(\phi N_t)\)

rmeas <- function (x, t, params, ...) {
  N <- x[['N']]
  phi <- params[['phi']]
  # one simulated observation: a Poisson count with mean phi * N
  result <- c('pop' = rpois(n = 1, lambda = phi * N))

  return(result)
}

rickerModel <- pomp(rickerModel, rmeasure = rmeas)

Specifying the measurement model distribution function

  • dmeasure should evaluate the probability mass function of the Poisson distribution we used in rmeasure at the observed data. This is available in R as the dpois function.
dmeas <- function(y, x, t, params, log, ...) {
  N <- x[['N']]
  pop <- y[['pop']]
  phi <- params[['phi']]

  # probability (mass) of observing `pop` given the latent state N;
  # pomp sets log = TRUE when it wants the value on the log scale
  return(dpois(pop, lambda = phi * N, log = log))
}

rickerModel <- pomp(rickerModel, dmeasure = dmeas)
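
  • Once dmeasure is specified, the model also supports likelihood evaluation; a minimal sketch using pomp's particle filter (pfilter), with the same parameter values as the simulation on the next slide:

pf <- pfilter(rickerModel,
              params = c(N.0 = 1, r = 20, k = 1, sigma = 0.1, phi = 200),
              Np = 1000)
logLik(pf)   # log-likelihood estimate from 1000 particles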

Simulation

p <- c(N.0 = 1, r = 20, k = 1, sigma = 0.1, phi = 200)
sim <- simulate(rickerModel, params = p, as.data.frame = TRUE)
ggplot(sim, aes(time, pop)) + geom_line()

Comparing simulations to the data

sim <- simulate(rickerModel, params = p, nsim = 8,
                as.data.frame = TRUE, include.data = TRUE)
ggplot(data = sim, aes(x = time, y = pop, group = sim, color = (sim == "data"))) +
  geom_line() + guides(color = FALSE) +
  facet_wrap(~sim, ncol = 3)