Last updated: 2018-08-28
workflowr checks: (Click a bullet for more information) ✖ R Markdown file: uncommitted changes
The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish
to commit the R Markdown file and build the HTML.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(12345)
The command set.seed(12345)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 241c630
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: analysis/.DS_Store
Ignored: data/.DS_Store
Ignored: data/aux_info/
Ignored: data/hg_38/
Ignored: data/libParams/
Ignored: output/.DS_Store
Untracked files:
Untracked: _workflowr.yml
Untracked: analysis/Collection_dates.Rmd
Untracked: analysis/Converting_IDs.Rmd
Untracked: analysis/Global_variation.Rmd
Untracked: analysis/Preliminary_clinical_covariate.Rmd
Untracked: analysis/VennDiagram2018-07-24_06-55-46.log
Untracked: analysis/VennDiagram2018-07-24_06-56-13.log
Untracked: analysis/VennDiagram2018-07-24_06-56-50.log
Untracked: analysis/VennDiagram2018-07-24_06-58-41.log
Untracked: analysis/VennDiagram2018-07-24_07-00-07.log
Untracked: analysis/VennDiagram2018-07-24_07-00-42.log
Untracked: analysis/VennDiagram2018-07-24_07-01-08.log
Untracked: analysis/VennDiagram2018-08-17_15-13-24.log
Untracked: analysis/VennDiagram2018-08-17_15-13-30.log
Untracked: analysis/VennDiagram2018-08-17_15-15-06.log
Untracked: analysis/VennDiagram2018-08-17_15-16-01.log
Untracked: analysis/VennDiagram2018-08-17_15-17-51.log
Untracked: analysis/VennDiagram2018-08-17_15-18-42.log
Untracked: analysis/VennDiagram2018-08-17_15-19-21.log
Untracked: analysis/VennDiagram2018-08-20_09-07-57.log
Untracked: analysis/VennDiagram2018-08-20_09-08-37.log
Untracked: analysis/VennDiagram2018-08-26_19-54-03.log
Untracked: analysis/VennDiagram2018-08-26_20-47-08.log
Untracked: analysis/VennDiagram2018-08-26_20-49-49.log
Untracked: analysis/VennDiagram2018-08-27_00-04-36.log
Untracked: analysis/VennDiagram2018-08-27_00-09-27.log
Untracked: analysis/VennDiagram2018-08-27_00-13-57.log
Untracked: analysis/VennDiagram2018-08-27_00-16-32.log
Untracked: analysis/VennDiagram2018-08-27_10-00-25.log
Untracked: analysis/VennDiagram2018-08-28_06-03-13.log
Untracked: analysis/VennDiagram2018-08-28_06-03-14.log
Untracked: analysis/VennDiagram2018-08-28_06-05-50.log
Untracked: analysis/VennDiagram2018-08-28_06-06-58.log
Untracked: analysis/VennDiagram2018-08-28_06-10-12.log
Untracked: analysis/VennDiagram2018-08-28_06-10-13.log
Untracked: analysis/background_dds_david.csv
Untracked: analysis/correlations_bet_covariates.Rmd
Untracked: analysis/correlations_over_time.Rmd
Untracked: analysis/genocode_annotation_info.Rmd
Untracked: analysis/genotypes.Rmd
Untracked: analysis/import_transcript_level_estimates.Rmd
Untracked: analysis/test_dds_david.csv
Untracked: analysis/tximport.Rmd
Untracked: analysis/unnormalized_data.Rmd
Untracked: analysis/variables_by_time.Rmd
Untracked: analysis/voom_limma.Rmd
Untracked: analysis/voom_limma_weight_change.Rmd
Untracked: data/BAN2 Dates_T1_T2.xlsx
Untracked: data/BAN_DATES.csv
Untracked: data/BAN_DATES.xlsx
Untracked: data/BAN_DATES_txt.csv
Untracked: data/Ban_geno.csv
Untracked: data/Ban_geno.xlsx
Untracked: data/Blood_dates.txt
Untracked: data/DAVID_background.txt
Untracked: data/DAVID_list_T1T2.txt
Untracked: data/DAVID_list_T1T2_weight.txt
Untracked: data/DAVID_list_T2T3.txt
Untracked: data/DAVID_list_T2T3_weight.txt
Untracked: data/DAVID_results/
Untracked: data/DAVID_top100_list_T1T2.txt
Untracked: data/DAVID_top100_list_T1T2_weight.txt
Untracked: data/DAVID_top100_list_T2T3.txt
Untracked: data/DAVID_top100_list_T2T3_weight.txt
Untracked: data/Eigengenes/
Untracked: data/FemaleWeightRestoration-01-dataInput.RData
Untracked: data/FemaleWeightRestoration-resid-01-dataInput.RData
Untracked: data/FemaleWeightRestoration-resid-T1T2-01-dataInput.RData
Untracked: data/HTSF_IDs.sav
Untracked: data/Homo_sapiens.GRCh38.v22_table.txt
Untracked: data/Labels.csv
Untracked: data/Labels.xlsx
Untracked: data/RIN.xlsx
Untracked: data/RIN_over_time.csv
Untracked: data/RIN_over_time.xlsx
Untracked: data/T0_consolid.csv
Untracked: data/T0_consolid.xlsx
Untracked: data/age_t1.txt
Untracked: data/birthday_age.csv
Untracked: data/birthday_age.xlsx
Untracked: data/cmd_info.json
Untracked: data/counts_hg38_gc.RData
Untracked: data/counts_hg38_gc_dds.RData
Untracked: data/counts_hg38_gc_txsalmon.RData
Untracked: data/covar_lm.csv
Untracked: data/covar_lm_missing.csv
Untracked: data/eigengenes_T1_T2_cov_adj_exp_5_modules.txt
Untracked: data/eigengenes_T1_T2_module_background.txt
Untracked: data/eigengenes_adj_exp_7_modules.txt
Untracked: data/eigengenes_cov_adj_exp_14_modules.txt
Untracked: data/eigengenes_module_background.txt
Untracked: data/eigengenes_unadj_exp_10_modules.txt
Untracked: data/eigengenes_unadj_exp_6_modules.txt
Untracked: data/eigengenes_unadj_exp_9_modules.txt
Untracked: data/files_list.txt
Untracked: data/final_covariates.csv
Untracked: data/gene_exp_values_2202.txt
Untracked: data/gene_exp_values_2209.txt
Untracked: data/gene_exp_values_2218.txt
Untracked: data/gene_exp_values_2220.txt
Untracked: data/gene_exp_values_2226.txt
Untracked: data/gene_exp_values_2228.txt
Untracked: data/gene_names_58387.txt
Untracked: data/gene_to_tran.txt
Untracked: data/lm_covar_fixed_random.csv
Untracked: data/logs/
Untracked: data/module_T1T2_cov_adj_blue.txt
Untracked: data/module_T1T2_cov_adj_brown.txt
Untracked: data/module_T1T2_cov_adj_turquoise.txt
Untracked: data/module_T1T2_cov_adj_yellow.txt
Untracked: data/module_adj_cov_merged_blue.txt
Untracked: data/module_adj_cov_merged_brown.txt
Untracked: data/module_adj_cov_merged_cyan.txt
Untracked: data/module_adj_cov_merged_green.txt
Untracked: data/module_adj_cov_merged_greenyellow.txt
Untracked: data/module_adj_cov_merged_magenta.txt
Untracked: data/module_adj_cov_merged_red.txt
Untracked: data/module_adj_cov_merged_salmon.txt
Untracked: data/module_adj_cov_merged_tan.txt
Untracked: data/module_adj_cov_merged_yellow.txt
Untracked: data/module_black.txt
Untracked: data/module_blue.txt
Untracked: data/module_brown.txt
Untracked: data/module_cov_adj_black.txt
Untracked: data/module_cov_adj_blue.txt
Untracked: data/module_cov_adj_brown.txt
Untracked: data/module_cov_adj_cyan.txt
Untracked: data/module_cov_adj_green.txt
Untracked: data/module_cov_adj_greenyellow.txt
Untracked: data/module_cov_adj_magenta.txt
Untracked: data/module_cov_adj_pink.txt
Untracked: data/module_cov_adj_purple.txt
Untracked: data/module_cov_adj_red.txt
Untracked: data/module_cov_adj_salmon.txt
Untracked: data/module_cov_adj_tan.txt
Untracked: data/module_cov_adj_turquoise.txt
Untracked: data/module_cov_adj_yellow.txt
Untracked: data/module_cyan.txt
Untracked: data/module_green.txt
Untracked: data/module_greenyellow.txt
Untracked: data/module_magenta.txt
Untracked: data/module_merged_black.txt
Untracked: data/module_merged_blue.txt
Untracked: data/module_merged_brown.txt
Untracked: data/module_merged_cyan.txt
Untracked: data/module_merged_green.txt
Untracked: data/module_merged_greenyellow.txt
Untracked: data/module_merged_magenta.txt
Untracked: data/module_merged_pink.txt
Untracked: data/module_merged_purple.txt
Untracked: data/module_merged_red.txt
Untracked: data/module_merged_salmon.txt
Untracked: data/module_merged_tan.txt
Untracked: data/module_merged_turquoise.txt
Untracked: data/module_merged_yellow.txt
Untracked: data/module_pink.txt
Untracked: data/module_purple.txt
Untracked: data/module_red.txt
Untracked: data/module_salmon.txt
Untracked: data/module_tan.txt
Untracked: data/module_turquoise.txt
Untracked: data/module_yellow.txt
Untracked: data/notimecovariates.csv
Untracked: data/only_individuals_biomarkers_weight_restoration_study.xlsx
Untracked: data/pcs_genes.csv
Untracked: data/pcs_genes.txt
Untracked: data/rest1t2_BI_hg37.rds
Untracked: data/rest1t2_BI_hg38.rds
Untracked: data/rest1t2_hg37.rds
Untracked: data/rest1t2_psych_meds_BMI_hg37.rds
Untracked: data/rest1t2_psych_meds_hg37.rds
Untracked: data/rest2t3_BI_hg37.rds
Untracked: data/rest2t3_BI_hg38.rds
Untracked: data/rest2t3_hg37.rds
Untracked: data/rest2t3_psych_meds_BMI_hg37.rds
Untracked: data/rest2t3_psych_meds_hg37.rds
Untracked: data/salmon_gene_matrix_bak_reorder_time.txt
Untracked: data/technical_sample_info.csv
Untracked: data/tx_to_gene.txt
Untracked: data/tx_to_gene_37.txt
Untracked: data/usa2.pcawithref.menv.mds_cov
Untracked: data/~$Labels.xlsx
Untracked: data/~$T0_consolid.xlsx
Untracked: docs/VennDiagram2018-07-24_06-55-46.log
Untracked: docs/VennDiagram2018-07-24_06-56-13.log
Untracked: docs/VennDiagram2018-07-24_06-56-50.log
Untracked: docs/VennDiagram2018-07-24_06-58-41.log
Untracked: docs/VennDiagram2018-07-24_07-00-07.log
Untracked: docs/VennDiagram2018-07-24_07-00-42.log
Untracked: docs/VennDiagram2018-07-24_07-01-08.log
Untracked: docs/figure/
Unstaged changes:
Modified: analysis/_site.yml
Modified: analysis/about.Rmd
Deleted: analysis/chunks.R
Modified: analysis/index.Rmd
Modified: analysis/license.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
# Library
library(edgeR)
Loading required package: limma
library(limma)
library(VennDiagram)
Warning: package 'VennDiagram' was built under R version 3.4.4
Loading required package: grid
Loading required package: futile.logger
library(ggplot2)
Warning: package 'ggplot2' was built under R version 3.4.4
library(cowplot)
Warning: package 'cowplot' was built under R version 3.4.4
Attaching package: 'cowplot'
The following object is masked from 'package:ggplot2':
ggsave
# Read in the data
tx.salmon <- readRDS("../data/counts_hg38_gc_txsalmon.RData")
salmon_counts<- as.data.frame(tx.salmon$counts)
#tx.salmon <- readRDS("../data/counts_hg38_gc_dds.RData")
#salmon_counts<- as.data.frame(tx.salmon)
# Subset to T1-T3
salmon_counts <- salmon_counts[,1:144]
# Read in the clinical covariates
clinical_sample_info <- read.csv("../data/lm_covar_fixed_random.csv")
dim(clinical_sample_info)
[1] 156 13
# Subset to T1-T3
clinical_sample <- clinical_sample_info[1:144,(-12)]
dim(clinical_sample)
[1] 144 12
# Filter lowly expressed reads
cpm <- cpm(salmon_counts, log=TRUE)
expr_cutoff <- 1.5
hist(cpm, main = "log2(CPM) values in unfiltered data", breaks = 100, xlab = "log2(CPM) values")
abline(v = expr_cutoff, col = "red", lwd = 3)
hist(cpm, main = "log2(CPM) values in unfiltered data", breaks = 100, xlab = "log2(CPM) values", ylim = c(0, 100000))
abline(v = expr_cutoff, col = "red", lwd = 3)
# Basic filtering
cpm_filtered <- (rowSums(cpm > 1.5) > 72)
genes_in_cutoff <- cpm[cpm_filtered==TRUE,]
hist(as.numeric(unlist(genes_in_cutoff)), main = "log2(CPM) values in filtered data", breaks = 100, xlab = "log2(CPM) values")
# Find the original counts of all of the genes that fit the criteria
counts_genes_in_cutoff <- salmon_counts[cpm_filtered==TRUE,]
dim(counts_genes_in_cutoff)
[1] 11619 144
# Filter out hemoglobin
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBB" ),]
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBA2" ),]
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBA1" ),]
# Take the TMM of the counts only for the genes that remain after filtering
dge_in_cutoff <- DGEList(counts=as.matrix(counts_genes_in_cutoff), genes=rownames(counts_genes_in_cutoff), group = as.character(t(clinical_sample$Individual)))
dge_in_cutoff <- calcNormFactors(dge_in_cutoff)
cpm_in_cutoff <- cpm(dge_in_cutoff, normalized.lib.sizes=TRUE, log=TRUE)
# Run PCA on the cpm data
pca_genes <- prcomp(t(cpm_in_cutoff), scale = T, retx = TRUE, center = TRUE)
matrixpca <- pca_genes$x
PC1 <- matrixpca[,1]
PC2 <- matrixpca[,2]
pc3 <- matrixpca[,3]
pc4 <- matrixpca[,4]
pc5 <- matrixpca[,5]
pcs <- data.frame(PC1, PC2, pc3, pc4, pc5)
summary <- summary(pca_genes)
head(summary$importance[2,1:5])
PC1 PC2 PC3 PC4 PC5
0.25012 0.13000 0.08757 0.05524 0.03285
norm_count <- ggplot(data=pcs, aes(x=PC1, y=PC2, color= as.factor(clinical_sample$Time))) + geom_point(aes(colour = as.factor(clinical_sample$Time))) + ggtitle("PCA of normalized counts") + scale_color_discrete(name = "Time")
plot_grid(norm_count)
# Make sure certain clinical factors are factors (e.g. presence of psychiatric meds or not)
clinical_sample[,1] <- as.factor(clinical_sample[,1])
clinical_sample[,2] <- as.factor(clinical_sample[,2])
clinical_sample[,4] <- as.factor(clinical_sample[,4])
clinical_sample[,5] <- as.factor(clinical_sample[,5])
clinical_sample[,6] <- as.factor(clinical_sample[,6])
# Create the design matrix
# Use the standard treatment-contrasts parametrization. See Ch. 9 of limma
# User's Guide.
design <- model.matrix(~as.factor(Time) + Age + as.factor(Race) + as.factor(BE_GROUP) + as.factor(psychmeds) + RBC + AN + AE + AL + RIN, data = clinical_sample)
colnames(design) <- c("Intercept", "Time2", "Time3", "Race3", "Race5", "Age", "BE", "Psychmeds", "RBC", "AN", "AE", "AL", "RIN")
# Fit model
# Model individual as a random effect.
# Recommended to run both voom and duplicateCorrelation twice.
# https://support.bioconductor.org/p/59700/#67620
cpm.voom <- voom(dge_in_cutoff, design, normalize.method="none")
#check_rel <- duplicateCorrelation(cpm.voom, design, block = clinical_sample$Individual)
check_rel_correlation <- 0.1179835
cpm.voom.corfit <- voom(dge_in_cutoff, design, normalize.method="none", plot = TRUE, block = clinical_sample$Individual, correlation = check_rel_correlation)
#check_rel <- duplicateCorrelation(cpm.voom.corfit, design, block = clinical_sample$Individual)
check_rel_correlation <- 0.1188083
# Look at the densities of the gene expression data
plotDensities(cpm.voom.corfit[,1])
plotDensities(cpm.voom.corfit[,2])
plotDensities(cpm.voom.corfit[,3])
# PCA on the filtered, normalized data
pca_genes <- prcomp(t(cpm.voom.corfit$E), scale = T, retx = TRUE, center = TRUE)
matrixpca <- pca_genes$x
PC1 <- matrixpca[,1]
PC2 <- matrixpca[,2]
pc3 <- matrixpca[,3]
pc4 <- matrixpca[,4]
pc5 <- matrixpca[,5]
pcs <- data.frame(PC1, PC2, pc3, pc4, pc5)
summary <- summary(pca_genes)
head(summary$importance[2,1:5])
PC1 PC2 PC3 PC4 PC5
0.25066 0.13028 0.08774 0.05501 0.03289
ggplot(data=pcs, aes(x=PC1, y=PC2, color=clinical_sample$Time)) + geom_point(aes(colour = as.factor(clinical_sample$Time))) + ggtitle("PCA of normalized counts") + scale_color_discrete(name = "Time")
# Run lmFit and eBayes in limma
fit <- lmFit(cpm.voom.corfit, design, block=clinical_sample$Individual, correlation=check_rel_correlation)
# In the contrast matrix, have the time points
cm1 <- makeContrasts(Time1v2 = Time2, Time2v3 = Time3 - Time2, levels = design)
#cm1 <- makeContrasts(Time1v2 = Time2, Time2v3 = Time3, levels = design)
# Fit the new model
diff_species <- contrasts.fit(fit, cm1)
fit1 <- eBayes(diff_species)
# Pull the limma output for all genes
FDR_level <- 0.05
Time1v2 =topTable(fit1, coef=1, adjust="BH", number=Inf, sort.by="none")
Time2v3 =topTable(fit1, coef=2, adjust="BH", number=Inf, sort.by="none")
# Compare genes from T1-T2 to genes fromo T2-T3
plot(Time1v2$logFC, Time2v3$logFC)
plot(Time1v2$t, Time2v3$t)
plot(Time1v2$adj.P.Val, Time2v3$adj.P.Val)
# Look at the DE genes
dim(Time1v2[which(Time1v2$adj.P.Val < FDR_level),])
[1] 551 7
dim(Time2v3[which(Time2v3$adj.P.Val < FDR_level),])
[1] 2284 7
head(topTable(fit1, coef=1, adjust="BH", number=100, sort.by="T"))
genes logFC AveExpr t P.Value adj.P.Val
DCAF6 DCAF6 0.6657584 5.448487 7.392736 1.332299e-11 1.547599e-07
RNF10 RNF10 1.0376273 8.824150 6.845550 2.395666e-10 9.398454e-07
CTNNAL1 CTNNAL1 1.4082454 1.961155 6.843024 2.427287e-10 9.398454e-07
ALAS2 ALAS2 1.9586412 10.196972 6.733417 4.279062e-10 1.242640e-06
ABALON ABALON 1.4481459 2.170769 6.602659 8.369152e-10 1.744599e-06
GPR146 GPR146 1.4308371 4.684580 6.588174 9.011357e-10 1.744599e-06
B
DCAF6 15.98307
RNF10 13.20469
CTNNAL1 12.22306
ALAS2 12.60339
ABALON 11.42340
GPR146 11.94909
head(topTable(fit1, coef=2, adjust="BH", number=100, sort.by="T"))
genes logFC AveExpr t P.Value
MPHOSPH8 MPHOSPH8 0.7231271 6.149241 9.988373 5.945896e-18
CYP20A1 CYP20A1 0.6462985 4.471779 9.930597 8.318459e-18
DFFA DFFA 0.5373614 4.550302 9.190698 5.959160e-16
RTF1 RTF1 0.5818325 5.568985 8.748933 7.390434e-15
RP11-83A24.2 RP11-83A24.2 1.0230461 2.905414 8.211697 1.511135e-13
CTA-292E10.9 CTA-292E10.9 0.8009923 3.931612 8.091686 2.942350e-13
adj.P.Val B
MPHOSPH8 4.831361e-14 30.11951
CYP20A1 4.831361e-14 29.66214
DFFA 2.307387e-12 25.59082
RTF1 2.146182e-11 23.24499
RP11-83A24.2 3.510668e-10 19.94151
CTA-292E10.9 5.696390e-10 19.59297
Time12 <- rownames(Time1v2[which(Time1v2$adj.P.Val < FDR_level),])
Time23 <- rownames(Time2v3[which(Time2v3$adj.P.Val < FDR_level),])
mylist <- list()
mylist[["DE T1 to T2"]] <- Time12
mylist[["DE T2 to T3"]] <- Time23
# Make as pdf
Four_comp <- venn.diagram(mylist, filename= NULL, main="DE genes between timepoints (5% FDR)", cex=1.5 , fill = NULL, lty=1, height=2000, width=2000, rotation.degree = 180, scaled = FALSE, cat.pos = c(0,0))
grid.draw(Four_comp)
dev.off()
null device
1
#pdf(file = "~/Dropbox/Figures/DET1_T2.pdf")
# grid.draw(Four_comp)
#dev.off()
Step 1: Use awk command to go from annotation file to ENSG and gene name only
cat gencode.v22.annotation.gtf | awk ‘BEGIN{FS=“”}{split($9,a,“;”); if($3~“gene”) print a[1]“”a[3]“”$1“:”$4“-”$5“”$7}’ | sed ‘s/gene_id “//’ | sed ’s/gene_id”//’ | sed ‘s/gene_biotype “//’| sed ’s/gene_name”//’ | sed ’s/“//g’ > Homo_sapiens.GRCh38.v22_table.txt
Step 2: Get ENSEMBL Gene IDs for background
# Get gene names after filtering
genes=as.data.frame(rownames(counts_genes_in_cutoff))
colnames(genes) <- c("Gene_name")
# Gene to ID conversion document
gene_id <- read.table("../data/Homo_sapiens.GRCh38.v22_table.txt", stringsAsFactors = FALSE)
colnames(gene_id) <- c("ENSEMBL", "Gene_name")
# Eliminate the feature after the period (for DAVID)
check <- gsub("\\..*","",gene_id$ENSEMBL)
new_gene_id <- cbind(check, gene_id$Gene_name)
gene_id <- new_gene_id
colnames(gene_id) <- c("ENSEMBL", "Gene_name")
# Get gene names of the background
comb_background <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_background$Gene_name))
Mode FALSE TRUE
logical 11616 387
comb_background1 <- comb_background[!duplicated(comb_background$Gene_name),]
dim(comb_background1)
[1] 11616 2
write.table(comb_background1$ENSEMBL, "../data/DAVID_background.txt", quote = F, row.names = F, col.names = F)
Step 3: Get ENSEMBL Gene IDs for T1 to T2 list
# Get gene names after filtering
genes=as.data.frame(rownames(Time1v2[which(Time1v2$adj.P.Val < FDR_level),]))
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 551 1
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
dim(comb_list)
[1] 551 2
write.table(comb_list$ENSEMBL, "../data/DAVID_list_T1T2.txt", quote = F, row.names = F, col.names = F)
# Kim et al used the top 100 genes
genes=as.data.frame(rownames(topTable(fit1, coef=1, adjust="BH", number=100, sort.by="T")))
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE
logical 100
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
dim(comb_list)
[1] 100 2
write.table(comb_list$ENSEMBL, "../data/DAVID_top100_list_T1T2.txt", quote = F, row.names = F, col.names = F)
Step 4: Get ENSEMBL Gene IDs for T2 to T3 list
# Get gene names after filtering
genes=as.data.frame(rownames(Time2v3[which(Time2v3$adj.P.Val < FDR_level),]))
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 2284 17
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
dim(comb_list)
[1] 2284 2
write.table(comb_list$ENSEMBL, "../data/DAVID_list_T2T3.txt", quote = F, row.names = F, col.names = F)
# Kim et al used the top 100 genes
genes=as.data.frame(rownames(topTable(fit1, coef=2, adjust="BH", number=100, sort.by="T")))
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 100 1
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
dim(comb_list)
[1] 100 2
write.table(comb_list$ENSEMBL, "../data/DAVID_top100_list_T2T3.txt", quote = F, row.names = F, col.names = F)
Step 5: Upload or copy and paste the gene lists to DAVID at https://david.ncifcrf.gov/conversion.jsp. Note, larger gene lists are more likely to need to be copied and pasted in the “Upload” tab, rather than uploaded.
# Load the package
library(WGCNA)
Loading required package: dynamicTreeCut
Loading required package: fastcluster
Warning: package 'fastcluster' was built under R version 3.4.4
Attaching package: 'fastcluster'
The following object is masked from 'package:stats':
hclust
==========================================================================
*
* Package WGCNA 1.63 loaded.
*
==========================================================================
Attaching package: 'WGCNA'
The following object is masked from 'package:stats':
cor
library(AnnotationDbi)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from 'package:limma':
plotMA
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, append, as.data.frame, cbind, colMeans,
colnames, colSums, do.call, duplicated, eval, evalq, Filter,
Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
rowMeans, rownames, rowSums, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following object is masked from 'package:base':
expand.grid
library(anRichment)
Loading required package: GO.db
Attaching package: 'anRichment'
The following object is masked from 'package:WGCNA':
userListEnrichment
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
enableWGCNAThreads()
Allowing parallel execution with up to 7 working processes.
allowWGCNAThreads()
Allowing multi-threading with up to 8 threads.
# Pull the normalized gene expression for DE genes
genes12=as.data.frame(rownames(Time1v2[which(Time1v2$adj.P.Val < FDR_level),]))
colnames(genes12) <- c("DE_genes")
genes23=as.data.frame(rownames(Time2v3[which(Time2v3$adj.P.Val < FDR_level),]))
colnames(genes23) <- c("DE_genes")
de_genes <- rbind(genes12, genes23)
check_gene <- rownames(cpm.voom.corfit$E) %in% de_genes$DE_genes
check_gene <- as.data.frame(check_gene)
pair_gene <- cbind(cpm.voom.corfit$E, check_gene)
norm_exp_T1T2 <- pair_gene[which(pair_gene$check_gene == TRUE),1:144]
dim(norm_exp_T1T2) #There are fewer rows than total DE genes because some genes are DE between T1 and T2 as well as T2 and T3
[1] 2586 144
# Reshape the data so that we have 1 gene measurement per column and each sample name per row.
transposed_norm_exp_T1T2 <- as.data.frame(t(norm_exp_T1T2))
dim(transposed_norm_exp_T1T2)
[1] 144 2586
Here, we have RNA seq data from 144 samples.
This step is present in the WGCNA tutorial and is useful when using microarray data or unfiltered RNA seq data(which we are not).
# Rename so it matches
datExpr0 <- transposed_norm_exp_T1T2
gsg = goodSamplesGenes(datExpr0, verbose = 3);
Flagging genes and samples with too many missing values...
..step 1
gsg$allOK
[1] TRUE
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
All of our samples passed this filtering step.
This clustering is based on gene expression data for each sample.
sampleTree = hclust(dist(datExpr0), method = "average");
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 15, col = "red")
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 1)
table(clust)
clust
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# NOTE: In the WGCNA tutorial, cutHeight = 15. To start, we want to use all of the samples. Therefore, we will use the command below to maintain all samples
datExpr = datExpr0
datTraits = clinical_sample
datTraits[,1] <- as.numeric(datTraits[,1])
datTraits[,2] <- as.numeric(datTraits[,2])
datTraits[,4] <- as.numeric(datTraits[,4])
datTraits[,5] <- as.numeric(datTraits[,5])
datTraits[,6] <- as.numeric(datTraits[,6])
dim(datTraits)
[1] 144 12
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE);
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")
save(datExpr, datTraits, file = "../data/FemaleWeightRestoration-01-dataInput.RData")
Legend: white means low, red means high, grey means missing entry
# Load the data saved in the previous section
lnames = load(file = "../data/FemaleWeightRestoration-01-dataInput.RData")
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 2586.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 2586 of 2586
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.3930 1.680 0.856 706.00 703.000 1080.0
2 2 0.0227 -0.248 0.801 290.00 287.000 613.0
3 3 0.3240 -0.953 0.887 145.00 138.000 394.0
4 4 0.5150 -1.270 0.921 82.10 72.300 271.0
5 5 0.5990 -1.460 0.911 50.50 40.900 194.0
6 6 0.6360 -1.460 0.890 33.00 24.100 143.0
7 7 0.7890 -1.280 0.945 22.60 14.600 109.0
8 8 0.9040 -1.250 0.989 16.10 9.240 86.9
9 9 0.9260 -1.350 0.996 11.80 6.030 76.4
10 10 0.9360 -1.400 0.993 8.89 4.010 67.7
11 12 0.9550 -1.470 0.986 5.36 1.850 54.1
12 14 0.9720 -1.470 0.984 3.44 0.893 43.9
13 16 0.9740 -1.460 0.980 2.32 0.462 36.2
14 18 0.9830 -1.450 0.988 1.63 0.246 30.3
15 20 0.9770 -1.420 0.979 1.18 0.128 25.7
# Plot the results:
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers,col="red")
softPower = 8;
adjacency = adjacency(datExpr, power = softPower)
# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency);
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
dissTOM = 1-TOM
geneTree = hclust(as.dist(dissTOM), method = "average");
# Plot the resulting clustering tree (dendrogram)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04)
# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30;
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
..cutHeight not given, setting it to 0.997 ===> 99% of the (truncated) height range in dendro.
..done.
table(dynamicMods)
dynamicMods
0 1 2 3 4 5 6 7 8 9 10 11 12 13
56 395 390 266 224 186 185 164 155 150 137 108 95 75
# Give each module a color
# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
dynamicColors
black blue brown green greenyellow grey
164 390 266 186 108 56
magenta pink purple red salmon tan
150 155 137 185 75 95
turquoise yellow
395 224
# Plot the dendrogram and colors underneath
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Gene dendrogram and module colors")
module_colors= setdiff(unique(dynamicColors), "grey")
for (color in module_colors){
module=colnames(datExpr0)[which(dynamicColors==color)]
# Convert to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(module)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
comb_list <- comb_list$ENSEMBL
write.table(comb_list, paste("../data/module_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
Each color represents a different module.
# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
rownames(MEs) <- rownames(datExpr)
# Save the eigengenes
write.table(MEs, "../data/eigengenes_unadj_exp_9_modules.txt", quote = F)
# Calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# Plot the result
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")
MEDissThres = 0.25
# Plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
mergeCloseModules: Merging modules whose distance is less than 0.25
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 14 module eigengenes in given set.
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 11 module eigengenes in given set.
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 10 module eigengenes in given set.
Calculating new MEs...
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 10 module eigengenes in given set.
# The merged module colors
mergedColors = merge$colors;
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs;
#pdf(file = "Plots/geneDendro-3.pdf", wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
# Get the number of genes per
summary(as.factor(merge$colors))
black blue brown green greenyellow grey
164 390 490 861 108 56
magenta pink purple salmon
150 155 137 75
# Save the eigengenes
rownames(mergedMEs) <- rownames(datExpr)
write.table(mergedMEs, "../data/eigengenes_unadj_exp_10_modules.txt", quote = F)
# Convert the genes in each cluster, then save the genes in each cluster
module_colors= setdiff(unique(merge$colors), "grey")
for (color in module_colors){
module=colnames(datExpr0)[which(merge$colors==color)]
# Convert to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(module)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
comb_list <- comb_list$ENSEMBL
write.table(comb_list, paste("../data/module_merged_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
# Make background
wgcna_background <- rownames(norm_exp_T1T2)
# Convert background to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(wgcna_background)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 2586 17
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
write.table(comb_list$ENSEMBL, "../data/eigengenes_module_background.txt", quote = F, row.names = F, col.names = F)
Follow directions for using DAVID, using the genes in each module as the gene list and the eigengenes_module_background as the background.
# Pull the normalized gene expression for DE genes
genes12=as.data.frame(rownames(Time1v2[which(Time1v2$adj.P.Val < FDR_level),]))
colnames(genes12) <- c("DE_genes")
genes23=as.data.frame(rownames(Time2v3[which(Time2v3$adj.P.Val < FDR_level),]))
colnames(genes23) <- c("DE_genes")
de_genes <- rbind(genes12, genes23)
check_gene <- rownames(cpm.voom.corfit$E) %in% de_genes$DE_genes
check_gene <- as.data.frame(check_gene)
pair_gene <- cbind(cpm.voom.corfit$E, check_gene)
norm_exp_T1T2 <- pair_gene[which(pair_gene$check_gene == TRUE),1:144]
dim(norm_exp_T1T2) #There are fewer rows than total DE genes because some genes are DE between T1 and T2 as well as T2 and T3
[1] 2586 144
# Reshape the data so that we have 1 gene measurement per column and each sample name per row.
transposed_norm_exp_T1T2 <- as.data.frame(t(norm_exp_T1T2))
dim(transposed_norm_exp_T1T2)
[1] 144 2586
resid_exprs <- array(0, dim = dim(norm_exp_T1T2))
for (i in 1:nrow(norm_exp_T1T2)){
resid_exprs[i,] <- lm(t(norm_exp_T1T2[i,]) ~ as.factor(clinical_sample$Race) + clinical_sample$Age + as.factor(clinical_sample$BE) + as.factor(clinical_sample$psychmeds) + clinical_sample$RBC + clinical_sample$AN + clinical_sample$AE + clinical_sample$AL + clinical_sample$RIN)$resid
}
rownames(resid_exprs) <- colnames(transposed_norm_exp_T1T2)
# Transpose so in the correct format
transposed_norm_exp_T1T2 <- as.data.frame(t(resid_exprs))
Here, we have RNA seq data from 144 samples.
# Rename so it matches
datExpr0 <- transposed_norm_exp_T1T2
gsg = goodSamplesGenes(datExpr0, verbose = 3);
Flagging genes and samples with too many missing values...
..step 1
gsg$allOK
[1] TRUE
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
All of our samples passed this filtering step.
This clustering is based on gene expression data for each sample.
sampleTree = hclust(dist(datExpr0), method = "average");
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 15, col = "red")
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 1)
table(clust)
clust
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# NOTE: In the WGCNA tutorial, cutHeight = 15. To start, we want to use all of the samples. Therefore, we will use the command below to maintain all samples
datExpr = datExpr0
datTraits = clinical_sample
datTraits[,1] <- as.numeric(datTraits[,1])
datTraits[,2] <- as.numeric(datTraits[,2])
datTraits[,4] <- as.numeric(datTraits[,4])
datTraits[,5] <- as.numeric(datTraits[,5])
datTraits[,6] <- as.numeric(datTraits[,6])
dim(datTraits)
[1] 144 12
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE);
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")
save(datExpr, datTraits, file = "../data/FemaleWeightRestoration-resid-01-dataInput.RData")
Legend: white means low, red means high, grey means missing entry
# Load the data saved in the previous section
lnames = load(file = "../data/FemaleWeightRestoration-resid-01-dataInput.RData")
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 2586.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 2586 of 2586
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.2750 1.19 0.890 700.000 691.0000 1120.0
2 2 0.0984 -0.49 0.818 286.000 272.0000 646.0
3 3 0.4400 -1.13 0.890 142.000 131.0000 420.0
4 4 0.5870 -1.32 0.945 80.100 69.7000 291.0
5 5 0.6680 -1.48 0.945 48.900 38.9000 211.0
6 6 0.7050 -1.54 0.940 31.700 22.3000 157.0
7 7 0.7420 -1.52 0.941 21.600 13.5000 119.0
8 8 0.8000 -1.44 0.948 15.200 8.3300 92.1
9 9 0.9020 -1.32 0.983 11.000 5.3100 72.2
10 10 0.9090 -1.36 0.974 8.230 3.4100 61.6
11 12 0.9400 -1.43 0.993 4.860 1.4800 48.6
12 14 0.9490 -1.46 0.988 3.060 0.6960 39.1
13 16 0.9480 -1.50 0.975 2.030 0.3370 31.9
14 18 0.9590 -1.48 0.980 1.400 0.1720 26.3
15 20 0.9540 -1.46 0.966 0.998 0.0896 21.9
# Plot the results:
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers,col="red")
softPower = 9;
adjacency = adjacency(datExpr, power = softPower)
# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency);
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
dissTOM = 1-TOM
geneTree = hclust(as.dist(dissTOM), method = "average");
# Plot the resulting clustering tree (dendrogram)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04)
# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30;
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
..cutHeight not given, setting it to 0.998 ===> 99% of the (truncated) height range in dendro.
..done.
table(dynamicMods)
dynamicMods
0 1 2 3 4 5 6 7 8 9 10 11 12 13
180 574 370 280 170 159 157 138 131 109 100 78 77 63
# Give each module a color
# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
dynamicColors
black blue brown green greenyellow grey
138 370 280 159 78 180
magenta pink purple red salmon tan
109 131 100 157 63 77
turquoise yellow
574 170
# Plot the dendrogram and colors underneath
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Gene dendrogram and module colors")
module_colors= setdiff(unique(dynamicColors), "grey")
for (color in module_colors){
module=colnames(datExpr0)[which(dynamicColors==color)]
# Convert to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(module)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
comb_list <- comb_list$ENSEMBL
write.table(comb_list, paste("../data/module_cov_adj_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
Each color represents a different module.
# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
rownames(MEs) <- colnames(norm_exp_T1T2)
# Save the eigengenes
write.table(MEs, "../data/eigengenes_cov_adj_exp_14_modules.txt", quote = F)
# Calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# Plot the result
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")
MEDissThres = 0.25
# Plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
mergeCloseModules: Merging modules whose distance is less than 0.25
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 14 module eigengenes in given set.
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 9 module eigengenes in given set.
Calculating new MEs...
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 9 module eigengenes in given set.
# The merged module colors
mergedColors = merge$colors;
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs;
#pdf(file = "Plots/geneDendro-3.pdf", wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
# Get the number of genes per
summary(as.factor(merge$colors))
blue brown green greenyellow grey red
370 280 290 187 180 731
salmon tan yellow
301 77 170
# Save the eigengenes
rownames(mergedMEs) <- colnames(norm_exp_T1T2)
write.table(mergedMEs, "../data/eigengenes_adj_exp_7_modules.txt", quote = F)
# Convert the genes in each cluster, then save the genes in each cluster
module_colors= setdiff(unique(merge$colors), "grey")
for (color in module_colors){
module=colnames(datExpr0)[which(merge$colors==color)]
# Convert to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(module)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
comb_list <- comb_list$ENSEMBL
write.table(comb_list, paste("../data/module_adj_cov_merged_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
# Make background
wgcna_background <- rownames(norm_exp_T1T2)
# Convert background to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(wgcna_background)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 2586 17
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
write.table(comb_list$ENSEMBL, "../data/eigengenes_module_background.txt", quote = F, row.names = F, col.names = F)
# Pull the normalized gene expression for DE genes
genes12=as.data.frame(rownames(Time1v2[which(Time1v2$adj.P.Val < FDR_level),]))
colnames(genes12) <- c("DE_genes")
de_genes <- genes12
check_gene <- rownames(cpm.voom.corfit$E) %in% de_genes$DE_genes
check_gene <- as.data.frame(check_gene)
pair_gene <- cbind(cpm.voom.corfit$E, check_gene)
norm_exp_T1T2 <- pair_gene[which(pair_gene$check_gene == TRUE),1:144]
dim(norm_exp_T1T2) #There are fewer rows than total DE genes because some genes are DE between T1 and T2 as well as T2 and T3
[1] 551 144
# Reshape the data so that we have 1 gene measurement per column and each sample name per row.
transposed_norm_exp_T1T2 <- as.data.frame(t(norm_exp_T1T2))
dim(transposed_norm_exp_T1T2)
[1] 144 551
resid_exprs <- array(0, dim = dim(norm_exp_T1T2))
for (i in 1:nrow(norm_exp_T1T2)){
resid_exprs[i,] <- lm(t(norm_exp_T1T2[i,]) ~ as.factor(clinical_sample$Race) + clinical_sample$Age + as.factor(clinical_sample$BE) + as.factor(clinical_sample$psychmeds) + clinical_sample$RBC + clinical_sample$AN + clinical_sample$AE + clinical_sample$AL + clinical_sample$RIN)$resid
}
rownames(resid_exprs) <- colnames(transposed_norm_exp_T1T2)
# Transpose so in the correct format
transposed_norm_exp_T1T2 <- as.data.frame(t(resid_exprs))
Here, we have RNA seq data from 144 samples.
# Rename so it matches
datExpr0 <- transposed_norm_exp_T1T2
gsg = goodSamplesGenes(datExpr0, verbose = 3);
Flagging genes and samples with too many missing values...
..step 1
gsg$allOK
[1] TRUE
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
All of our samples passed this filtering step.
This clustering is based on gene expression data for each sample.
sampleTree = hclust(dist(datExpr0), method = "average");
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 15, col = "red")
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 1)
table(clust)
clust
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
23 21 19 13 11 8 4 3 3 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# NOTE: In the WGCNA tutorial, cutHeight = 15. To start, we want to use all of the samples. Therefore, we will use the command below to maintain all samples
datExpr = datExpr0
datTraits = clinical_sample
datTraits[,1] <- as.numeric(datTraits[,1])
datTraits[,2] <- as.numeric(datTraits[,2])
datTraits[,4] <- as.numeric(datTraits[,4])
datTraits[,5] <- as.numeric(datTraits[,5])
datTraits[,6] <- as.numeric(datTraits[,6])
dim(datTraits)
[1] 144 12
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE);
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")
save(datExpr, datTraits, file = "../data/FemaleWeightRestoration-resid-T1T2-01-dataInput.RData")
Legend: white means low, red means high, grey means missing entry
# Load the data saved in the previous section
lnames = load(file = "../data/FemaleWeightRestoration-resid-T1T2-01-dataInput.RData")
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 551.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 551 of 551
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.841 1.320 0.8080 235.00 257.000 341.0
2 2 0.234 0.207 0.0838 134.00 146.000 244.0
3 3 0.284 -0.197 0.5280 87.60 88.500 189.0
4 4 0.571 -0.368 0.6240 61.50 57.100 153.0
5 5 0.816 -0.466 0.8080 45.20 37.100 127.0
6 6 0.896 -0.535 0.8740 34.30 25.300 107.0
7 7 0.925 -0.624 0.9030 26.70 17.300 91.8
8 8 0.947 -0.673 0.9350 21.20 12.200 79.7
9 9 0.866 -0.755 0.8510 17.00 8.630 69.8
10 10 0.882 -0.800 0.8810 13.90 6.230 61.5
11 12 0.933 -0.869 0.9420 9.52 3.500 48.6
12 14 0.925 -0.919 0.9420 6.75 2.010 39.1
13 16 0.902 -0.996 0.9190 4.92 1.180 31.9
14 18 0.919 -1.020 0.9470 3.66 0.727 26.3
15 20 0.915 -1.030 0.9360 2.78 0.444 21.9
# Plot the results:
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers,col="red")
softPower = 6;
adjacency = adjacency(datExpr, power = softPower)
# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency);
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
dissTOM = 1-TOM
geneTree = hclust(as.dist(dissTOM), method = "average");
# Plot the resulting clustering tree (dendrogram)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04)
# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30;
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
..cutHeight not given, setting it to 0.996 ===> 99% of the (truncated) height range in dendro.
..done.
table(dynamicMods)
dynamicMods
0 1 2 3
6 426 63 56
# Give each module a color
# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
dynamicColors
blue brown grey turquoise
63 56 6 426
# Plot the dendrogram and colors underneath
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Gene dendrogram and module colors")
module_colors= setdiff(unique(dynamicColors), "grey")
for (color in module_colors){
module=colnames(datExpr0)[which(dynamicColors==color)]
# Convert to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(module)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
comb_list <- comb_list$ENSEMBL
write.table(comb_list, paste("../data/module_T1T2_cov_adj_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
Each color represents a different module.
# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
rownames(MEs) <- colnames(norm_exp_T1T2)
# Save the eigengenes
write.table(MEs, "../data/eigengenes_T1_T2_cov_adj_exp_5_modules.txt", quote = F)
# Make background
wgcna_background <- rownames(norm_exp_T1T2)
# Convert background to Ensembl gene ID
# Get gene names after filtering
genes=as.data.frame(wgcna_background)
colnames(genes) <- c("Gene_name")
# Get gene names of the list
comb_list <- merge(genes, gene_id, by = c("Gene_name"))
summary(duplicated(comb_list$Gene_name))
Mode FALSE TRUE
logical 551 1
comb_list <- comb_list[!duplicated(comb_list$Gene_name),]
write.table(comb_list$ENSEMBL, "../data/eigengenes_T1_T2_module_background.txt", quote = F, row.names = F, col.names = F)
# Read in the data
tx.salmon <- readRDS("../data/counts_hg38_gc_txsalmon.RData")
salmon_counts<- as.data.frame(tx.salmon$counts)
select_samples <- c(4, 5, 6, 145, 146, 23, 24, 25, 147, 148, 37, 38, 39, 149, 150, 43, 44, 45, 151, 152, 54, 55, 56, 153, 154, 57, 58, 59, 155, 156)
#salmon_counts <- salmon_counts[,select_samples]
# Read in the clinical covariates
clinical_sample_info <- read.csv("../data/lm_covar_fixed_random.csv")
dim(clinical_sample_info)
[1] 156 13
# Subset to T1-T5
clinical_sample <- clinical_sample_info[,(-12)]
dim(clinical_sample)
[1] 156 12
# Filter lowly expressed reads
cpm <- cpm(salmon_counts, log=TRUE)
expr_cutoff <- 1.5
hist(cpm, main = "log2(CPM) values in unfiltered data", breaks = 100, xlab = "log2(CPM) values")
abline(v = expr_cutoff, col = "red", lwd = 3)
hist(cpm, main = "log2(CPM) values in unfiltered data", breaks = 100, xlab = "log2(CPM) values", ylim = c(0, 100000))
abline(v = expr_cutoff, col = "red", lwd = 3)
# Basic filtering
cpm_filtered <- (rowSums(cpm > 1.5) > 78)
genes_in_cutoff <- cpm[cpm_filtered==TRUE,]
hist(as.numeric(unlist(genes_in_cutoff)), main = "log2(CPM) values in filtered data", breaks = 100, xlab = "log2(CPM) values")
# Find the original counts of all of the genes that fit the criteria
counts_genes_in_cutoff <- salmon_counts[cpm_filtered==TRUE,]
dim(counts_genes_in_cutoff)
[1] 11507 156
# Filter out hemoglobin
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBB" ),]
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBA2" ),]
counts_genes_in_cutoff <- counts_genes_in_cutoff[which( rownames(counts_genes_in_cutoff) != "HBA1" ),]
# Take the TMM of the counts only for the genes that remain after filtering
dge_in_cutoff <- DGEList(counts=as.matrix(counts_genes_in_cutoff), genes=rownames(counts_genes_in_cutoff), group = as.character(t(clinical_sample$Individual)))
dge_in_cutoff <- calcNormFactors(dge_in_cutoff)
cpm_in_cutoff <- cpm(dge_in_cutoff, normalized.lib.sizes=TRUE, log=TRUE)
# Create the design matrix
# Use the standard treatment-contrasts parametrization. See Ch. 9 of limma
# User's Guide.
design <- model.matrix(~as.factor(Time), data = clinical_sample)
colnames(design) <- c("Intercept", "Time2", "Time3", "Time4", "Time5")
# Fit model
# Model individual as a random effect.
# Recommended to run both voom and duplicateCorrelation twice.
# https://support.bioconductor.org/p/59700/#67620
cpm.voom <- voom(dge_in_cutoff, design, normalize.method="none")
#check_rel <- duplicateCorrelation(cpm.voom, design, block = clinical_sample$Individual)
check_rel_correlation <- 0.1708146
cpm.voom.corfit <- voom(dge_in_cutoff, design, normalize.method="none", block = clinical_sample$Individual, correlation = check_rel_correlation)
# Pull the gene expression values
gene_exp_values <- cpm.voom.corfit$E
gene_exp_values12 <- gene_exp_values[,select_samples]
# Get the first
gene_exp_values_2202 <- gene_exp_values12[,1:5]
gene_exp_values_2202 <- cbind(rownames(gene_exp_values_2202), gene_exp_values_2202)
colnames(gene_exp_values_2202) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
gene_exp_values_2209 <- gene_exp_values12[,6:10]
gene_exp_values_2209 <- cbind(rownames(gene_exp_values_2209), gene_exp_values_2209)
colnames(gene_exp_values_2209) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
gene_exp_values_2218 <- gene_exp_values12[,11:15]
gene_exp_values_2218 <- cbind(rownames(gene_exp_values_2218), gene_exp_values_2218)
colnames(gene_exp_values_2218) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
gene_exp_values_2220 <- gene_exp_values12[,16:20]
gene_exp_values_2220 <- cbind(rownames(gene_exp_values_2220), gene_exp_values_2220)
colnames(gene_exp_values_2220) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
gene_exp_values_2226 <- gene_exp_values12[,21:25]
gene_exp_values_2226 <- cbind(rownames(gene_exp_values_2226), gene_exp_values_2226)
colnames(gene_exp_values_2226) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
gene_exp_values_2228 <- gene_exp_values12[,26:30]
gene_exp_values_2228 <- cbind(rownames(gene_exp_values_2228), gene_exp_values_2228)
colnames(gene_exp_values_2228) <- c("Gene Symbol", "T1", "T2", "T3", "T4", "T5")
write.table(gene_exp_values_2202, "../data/gene_exp_values_2202.txt", row.names = FALSE, quote = FALSE, sep = "\t")
write.table(gene_exp_values_2209, "../data/gene_exp_values_2209.txt", row.names = FALSE, quote = FALSE, sep = "\t")
write.table(gene_exp_values_2218, "../data/gene_exp_values_2218.txt", row.names = FALSE, quote = FALSE, sep = "\t")
write.table(gene_exp_values_2220, "../data/gene_exp_values_2220.txt", row.names = FALSE, quote = FALSE, sep = "\t")
write.table(gene_exp_values_2226, "../data/gene_exp_values_2226.txt", row.names = FALSE, quote = FALSE, sep = "\t")
write.table(gene_exp_values_2228, "../data/gene_exp_values_2228.txt", row.names = FALSE, quote = FALSE, sep = "\t")
# Pull the clinical values
clinical_sample12 <- clinical_sample[select_samples,]
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] anRichment_0.82-1 GO.db_3.5.0 AnnotationDbi_1.40.0
[4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
[7] BiocGenerics_0.24.0 WGCNA_1.63 fastcluster_1.1.25
[10] dynamicTreeCut_1.63-1 cowplot_0.9.3 ggplot2_3.0.0
[13] VennDiagram_1.6.20 futile.logger_1.4.3 edgeR_3.20.9
[16] limma_3.34.9
loaded via a namespace (and not attached):
[1] matrixStats_0.54.0 fit.models_0.5-14 robust_0.4-18
[4] bit64_0.9-7 doParallel_1.0.11 RColorBrewer_1.1-2
[7] rprojroot_1.3-2 tools_3.4.3 backports_1.1.2
[10] R6_2.2.2 rpart_4.1-13 Hmisc_4.1-1
[13] DBI_1.0.0 lazyeval_0.2.1 colorspace_1.3-2
[16] nnet_7.3-12 withr_2.1.2 tidyselect_0.2.4
[19] gridExtra_2.3 bit_1.1-14 compiler_3.4.3
[22] git2r_0.23.0 preprocessCore_1.40.0 formatR_1.5
[25] htmlTable_1.12 labeling_0.3 scales_1.0.0
[28] checkmate_1.8.5 mvtnorm_1.0-8 DEoptimR_1.0-8
[31] robustbase_0.93-1.1 stringr_1.3.1 digest_0.6.16
[34] foreign_0.8-71 rmarkdown_1.10 R.utils_2.6.0
[37] rrcov_1.4-4 base64enc_0.1-3 pkgconfig_2.0.2
[40] htmltools_0.3.6 htmlwidgets_1.2 rlang_0.2.2
[43] rstudioapi_0.7 RSQLite_2.1.1 impute_1.52.0
[46] bindr_0.1.1 acepack_1.4.1 dplyr_0.7.6
[49] R.oo_1.22.0 magrittr_1.5 Formula_1.2-3
[52] Matrix_1.2-14 Rcpp_0.12.18 munsell_0.5.0
[55] R.methodsS3_1.7.1 stringi_1.2.4 whisker_0.3-2
[58] yaml_2.2.0 MASS_7.3-50 plyr_1.8.4
[61] blob_1.1.1 crayon_1.3.4 lattice_0.20-35
[64] splines_3.4.3 locfit_1.5-9.1 knitr_1.20
[67] pillar_1.3.0 codetools_0.2-15 futile.options_1.0.1
[70] glue_1.3.0 evaluate_0.11 latticeExtra_0.6-28
[73] lambda.r_1.2.3 data.table_1.11.4 foreach_1.4.4
[76] gtable_0.2.0 purrr_0.2.5 assertthat_0.2.0
[79] pcaPP_1.9-73 survival_2.42-6 tibble_1.4.2
[82] iterators_1.0.10 memoise_1.1.0 workflowr_1.1.1
[85] bindrcpp_0.2.2 cluster_2.0.7-1
This reproducible R Markdown analysis was created with workflowr 1.1.1