Author: Alan O’Callaghan (alan.b.ocallaghan@gmail.com)

1 Introduction

Regularised regression is a form of statistical analysis commonly used in biological research. A detailed treatment is beyond the scope of this vignette; in brief, regularisation adds a penalty term on the model coefficients to the fitting objective, and this penalty may be applied in several ways.

LASSO (Least Absolute Shrinkage and Selection Operator), or L1 regularisation, applies a penalty to the absolute values of the coefficients. Increasing the degree of penalisation tends to shrink coefficients exactly to zero, effectively removing those features from the model.

L2 regularisation, or ridge regression, instead applies a penalty to the squared values of the coefficients. In this case, increasing the penalty does not remove features, but shrinks the magnitude of all coefficients towards zero. This can be used to reduce the variance of the model and to prevent overfitting.

Finally, the “elastic net” is a weighted combination of these two penalties. The proportion of each penalty type applied is controlled by the alpha parameter, which may vary between zero (pure ridge regression) and one (pure LASSO).
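
For reference, the elastic net penalty as parameterised in glmnet (a standard formulation, not specific to the data used here) may be written as

P_\alpha(\beta) = \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]

so that alpha = 0 recovers the ridge penalty, alpha = 1 recovers the LASSO penalty, and lambda controls the overall strength of penalisation.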

The glmnet package implements the elastic net for many common generalised linear models, including linear regression, logistic regression, Poisson regression, and the Cox proportional hazards model. Here, we will examine some simple plots of Cox proportional hazards models, though the techniques described may be applied to any form of elastic net regression.

2 Lambda plot from glmnet package

First, we must fit a set of penalised regression models, using example data from the glmnet package. To visualise which coefficients are selected, we can plot coefficient magnitude against the lambda penalty parameter.

suppressPackageStartupMessages(library(glmnet))
suppressPackageStartupMessages(library(heatmaply))

## Example data from the glmnet package (provides the x matrix and y response)
data(CoxExample)

## Fit ridge regression model
fit1 <- glmnet(x, y, 
  family="cox", 
  nlambda=500, 
  alpha=0)

## Visualise coefficients at different values of lambda
plot(fit1, "lambda")

## Fit LASSO regression model
fit2 <- glmnet(x, y, 
  family="cox", 
  nlambda=500, 
  alpha=1)

## Visualise coefficients at different values of lambda
plot(fit2, "lambda")

These plots are very useful. In LASSO models, one may observe how many coefficients drop to zero at different values of lambda, which may indicate whether some subset of the predictors holds most of the predictive power within the initial predictor set. However, it can be difficult to identify which features are being retained in the model, particularly when there are large numbers of features. In ridge regression models, one may see the relative magnitudes of the different predictors; however, overplotting is very common, as many coefficients take similar paths through lambda space.
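
If one wishes to list the retained features directly, the coefficients at a chosen value of lambda can be extracted with coef(). A minimal sketch, using fit2 from above (the value s = 0.05 is arbitrary, purely for illustration):

## Coefficients of the LASSO model at lambda = 0.05
cf <- as.matrix(coef(fit2, s = 0.05))
## Names of the features with non-zero coefficients at this penalty
rownames(cf)[cf[, 1] != 0]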

3 heatmaply plot of coefficients against lambda

heatmaply allows a similarly useful visualisation of coefficient magnitude against lambda, but in this case it may be easier to identify coefficients with interesting behaviour over the range of lambda values used. In the present example the feature names are meaningless; with real data, however, this method would allow the user to rapidly identify features which are being retained, or which appear to be unstable (changing coefficient sign). The user may zoom in to see individual values, and row annotations can be added to show extra information about the included predictors (see the sketch after the heatmaps below). In the case of large matrices (large numbers of features and/or lambda values), it is useful to disable clustering of the columns (lambda values). Here, we will retain row clustering in order to group similar coefficients together.

beta1 <- as.matrix(fit1$beta)
heatmaply(beta1, dendrogram = "row",
  main = "Heatmap of ridge regression model coefficients over range of lambda values",
  margins = c(50, 40),
  limits = c(-max(abs(beta1)), max(abs(beta1))), # so that 0 is at the centre of the colour scale
  colors = cool_warm, grid_gap = 0.5, k_row = 5)

beta2 <- as.matrix(fit2$beta)
heatmaply(beta2, dendrogram = "row", plot_method = "plotly",
  main = "Heatmap of LASSO model coefficients over range of lambda values",
  margins = c(50, 40),
  limits = c(-max(abs(beta2)), max(abs(beta2))), # so that 0 is at the centre of the colour scale
  colors = cool_warm, grid_gap = 0.5, k_row = 5)
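
As a sketch of the row annotations mentioned above (the "retained"/"dropped" labelling is hypothetical, chosen purely for illustration), one could flag the features that remain non-zero halfway along the lambda path of the LASSO fit, and display this as a side colour strip via heatmaply's row_side_colors argument:

## Hypothetical annotation: flag features that are still non-zero at the
## middle column of the LASSO coefficient matrix
status <- ifelse(beta2[, ncol(beta2) %/% 2] != 0, "retained", "dropped")
heatmaply(beta2, dendrogram = "row",
  row_side_colors = data.frame(status = status),
  limits = c(-max(abs(beta2)), max(abs(beta2))),
  colors = cool_warm, grid_gap = 0.5, k_row = 5)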

4 sessionInfo

sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255   
#> [3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                  
#> [5] LC_TIME=Hebrew_Israel.1255    
#> 
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices datasets  utils     methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] gplots_3.0.1            limma_3.22.7           
#>  [3] ALL_1.7.1               Biobase_2.26.0         
#>  [5] BiocGenerics_0.12.1     dendextend_1.6.0       
#>  [7] glmnet_2.0-10           foreach_1.4.3          
#>  [9] Matrix_1.2-10           bindrcpp_0.2           
#> [11] knitr_1.16              heatmaplyExamples_0.1.0
#> [13] heatmaply_0.11.0        viridis_0.4.0          
#> [15] viridisLite_0.2.0       plotly_4.7.1           
#> [17] ggplot2_2.2.1.9000      installr_0.19.0        
#> [19] stringr_1.2.0          
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.2.1         tidyr_0.6.3        jsonlite_1.5      
#>  [4] gtools_3.5.0       shiny_1.0.3        assertthat_0.2.0  
#>  [7] stats4_3.4.1       yaml_2.1.14        robustbase_0.92-7 
#> [10] backports_1.1.0    lattice_0.20-35    glue_1.1.1        
#> [13] digest_0.6.12      RColorBrewer_1.1-2 colorspace_1.3-2  
#> [16] httpuv_1.3.5       htmltools_0.3.6    plyr_1.8.4        
#> [19] devtools_1.13.2    pkgconfig_2.0.1    xtable_1.8-2      
#> [22] purrr_0.2.2.2      mvtnorm_1.0-6      scales_0.5.0      
#> [25] gdata_2.18.0       whisker_0.3-2      tibble_1.3.4      
#> [28] mgcv_1.8-17        withr_2.0.0        nnet_7.3-12       
#> [31] lazyeval_0.2.0     mime_0.5           magrittr_1.5      
#> [34] mclust_5.3         memoise_1.1.0      evaluate_0.10.1   
#> [37] nlme_3.1-131       MASS_7.3-47        class_7.3-14      
#> [40] tools_3.4.1        registry_0.3       data.table_1.10.4 
#> [43] trimcluster_0.1-2  kernlab_0.9-25     munsell_0.4.3     
#> [46] cluster_2.0.6      fpc_2.1-10         compiler_3.4.1    
#> [49] caTools_1.17.1     rlang_0.1.2        grid_3.4.1        
#> [52] iterators_1.0.8    htmlwidgets_0.9    crosstalk_1.0.0   
#> [55] labeling_0.3       bitops_1.0-6       rmarkdown_1.6     
#> [58] gtable_0.2.0       codetools_0.2-15   flexmix_2.3-14    
#> [61] reshape2_1.4.2     TSP_1.1-5          R6_2.2.2          
#> [64] seriation_1.2-2    gridExtra_2.2.1    prabclus_2.2-6    
#> [67] dplyr_0.7.3.9000   bindr_0.1          rprojroot_1.2     
#> [70] KernSmooth_2.23-15 modeltools_0.2-21  stringi_1.1.5     
#> [73] Rcpp_0.12.12       gclus_1.3.1        DEoptimR_1.0-8    
#> [76] diptest_0.75-7