library(dplyr)
library(ggplot2)
library(magrittr)
library(DT)

options(stringsAsFactors = FALSE)

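# Write a tab-delimited text file without quoting or row names; NAs become empty strings
write.delim <- function(x, file, sep = '\t', quote = FALSE, row.names = FALSE, na = '', ...) {
  write.table(x = x, file = file, sep = sep, quote = quote, row.names = row.names, na = na, ...)
}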

Here, we parse the MEDI resource (1), which is available for download. The publication describes the resource as follows (unless noted, blockquotes are quoted from the MEDI publication):

We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected for manual review for accuracy using dual-physician review. We identified a subset of medication–indication pairs that optimizes recall while maintaining high precision.

High precision subset (HPS)

The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13,304 unique indication pairs regarding 2,136 medications. The mean±SD number of indications for each medication in MEDI-HPS is 6.22±6.09. The estimated precision of MEDI-HPS is 92%.
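The inclusion rule quoted above (RxNorm support, or support from at least two of the three other resources) can be sketched on a toy table. This is only an illustration: the logical columns `in_rxnorm`, `in_sider`, `in_medlineplus`, and `in_wikipedia` and all of the rows are hypothetical, and we make no claim about how the MEDI authors implemented the filter.

```r
library(dplyr)

# Hypothetical toy table: one row per medication-indication pair, with one
# logical column per source resource. Names and values are illustrative only.
toy.df <- data.frame(
  rxnorm_id      = c('1', '1', '2', '3'),
  disease_icd9   = c('250', '401', '250', '493'),
  in_rxnorm      = c(TRUE,  FALSE, FALSE, FALSE),
  in_sider       = c(FALSE, TRUE,  TRUE,  FALSE),
  in_medlineplus = c(FALSE, TRUE,  FALSE, FALSE),
  in_wikipedia   = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep a pair if RxNorm supports it, or if at least two of the other
# three resources support it
toy_hps.df <- toy.df %>%
  dplyr::mutate(other_support = in_sider + in_medlineplus + in_wikipedia) %>%
  dplyr::filter(in_rxnorm | other_support >= 2)
```

In this toy example only the two pairs for medication `1` pass the filter: the first through RxNorm, the second through two of the other resources.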

# Read the HPS dataset
# All columns are read as character so that identifiers and ICD9 codes stay as text
hps.df <- file.path('download', 'MEDI_01212013_HPS_0.csv') %>%
  read.csv(blank.lines.skip = TRUE, colClasses = 'character') %>%
  # Keep the columns of interest under consistent snake_case names
  dplyr::transmute(
    rxnorm_id = RXCUI_IN,
    drug_name = DRUG_DESC,
    disease_icd9 = ICD9,
    disease_name = INDICATION_DESCRIPTION,
    on_label = as.integer(POSSIBLE_LABEL_USE)
  )

# Calculate HPS counts: distinct medications, ICD9 codes, and medication-ICD9 pairs
hps_count.df <- hps.df %>%
  dplyr::distinct(rxnorm_id, disease_icd9) %>%
  dplyr::summarize(
    resource = 'hps',
    medications = n_distinct(rxnorm_id),
    diseases = n_distinct(disease_icd9),
    indications = n())

# Display the HPS
hps.df %>% DT::datatable()
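The publication reports a mean±SD number of indications per medication. A comparable summary could be computed from a table shaped like `hps.df`. The sketch below uses a small hypothetical stand-in (`toy.df`) so that it runs without the download. The names `per_drug.df` and `per_drug_summary.df` are ours, and counting distinct ICD9 codes per RxNorm ingredient is an assumption that may not match how the publication counted indications.

```r
library(dplyr)

# Hypothetical stand-in for hps.df with the same key columns;
# the third row duplicates the second on purpose
toy.df <- data.frame(
  rxnorm_id    = c('1', '1', '1', '2', '2', '3'),
  disease_icd9 = c('250', '401', '401', '250', '493', '493'),
  stringsAsFactors = FALSE
)

# Count distinct indications per medication
per_drug.df <- toy.df %>%
  dplyr::distinct(rxnorm_id, disease_icd9) %>%
  dplyr::count(rxnorm_id, name = 'n_indications')

# Summarize the per-medication counts
per_drug_summary.df <- per_drug.df %>%
  dplyr::summarize(
    mean_indications = mean(n_indications),
    sd_indications = sd(n_indications))
```

Replacing `toy.df` with `hps.df` would give the corresponding figures for the parsed dataset, which may differ from the published values if the counting units differ.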