Introduction

For an introduction to the upbm package, please see the quick start vignette (vignette("upbm-quickstart")). Here, we provide details on the various testing procedures briefly referenced in the quick start vignette. We assume basic knowledge of the PBMExperiment class (see vignette("upbm-classes")). The same example HOXC9 dataset from the upbmData package is also used in this vignette to illustrate the various functions.

The upbm package includes three testing functions which all take the output of upbm::kmerFit and return a table of 8-mer-level effect sizes and significance estimates. In the quick start vignette, only a single testing procedure was covered, namely, differential affinity testing using upbm::kmerTestContrast. The three testing procedures are:

preferential affinity testing (upbm::kmerTestAffinity),
differential affinity testing (upbm::kmerTestContrast),
differential specificity testing (upbm::kmerTestSpecificity).

We cover each procedure in detail in this vignette.

suppressPackageStartupMessages(library("upbm"))

Throughout, we will be using broom::tidy to tidy data stored in PBMExperiment and SummarizedExperiment objects for interactive analysis and plotting (see vignette("other-tidydata")). We load the dplyr package to help with interactive analysis.

suppressPackageStartupMessages(library("dplyr"))
suppressPackageStartupMessages(library("ggplot2"))

HOXC9 Dataset

For details on the example HOXC9 dataset, see the quick start vignette in this package or the upbmData package vignette. Here, we will use both Alexa488 and Cy3 scans.

data(hoxc9alexa, package = "upbmData")
data(hoxc9cy3, package = "upbmData")

Here, we again use just single PMT gain scan for the Alexa488 scans.

alexa_subset <- hoxc9alexa[, colData(hoxc9alexa)$pmt == 450]

We also load the reference Cy3 dataset.

data(refcy3_8x60k_v1, package = "upbmAux")

Sample Processing

We will perform the earlier steps of pre-processing using the upbm::pmbPreprocess wrapper. Details on pre-processing steps are covered in an earlier vignette (see vignette("analysis-preprocessing").

alexa_subset <- upbmPreprocess(pe = alexa_subset,
                               cy3pe = hoxc9cy3,
                               cy3refpe = refcy3_8x60k_v1,
                               verbose = FALSE)

We will further summarize results to 8-mer-level affinity estimates with HOXC9-REF as the baseline condition. As above, details on summarization steps are covered in an earlier vignette (see vignette("analysis-summarization").

alexa_pfit <- probeFit(alexa_subset)
alexa_kfit <- kmerFit(alexa_pfit, baseline = "HOXC9-REF")

Preferential Affinity

In contrast to the two other testing procedures which consider differential behavior across conditions, preferential affinity considers each condition separately. For each condition, we apply a testing procedure to determine the set of 8-mers preferentially bound by the condition. The upbm::kmerTestAffinity function only takes the single PBMExperiment returned by upbm::kmerFit as input.

alexa_pa <- kmerTestAffinity(alexa_kfit)
alexa_pa

## class: SummarizedExperiment 
## dim: 32896 4 
## metadata(0):
## assays(6): affinityEstimate affinityVariance ... affinityP
##   affinityQ
## rownames: NULL
## rowData names(1): seq
## colnames(4): HOXC9-K195R HOXC9-R193K HOXC9-R222W HOXC9-REF
## colData names(0):

The returned PBMExperiment object contains the results of affinity testing in several assays along with many of the same assays as the input.

assayNames(alexa_pa)

## [1] "affinityEstimate" "affinityVariance" "affinitySignal"  
## [4] "affinityZ"        "affinityP"        "affinityQ"

Importantly, the new affinitySignal assay contains the estimated “signal” component of the K-mer affinity (affinityEstimate assay). The signal component is computed by fitting a normal and expontential convolution to the K-mer affinity estimates and taking the expected exponential component. This is similar to a step included in the RMA approach to microarray data normalization (Irizarry et al. 2003). The affinityZ, affinityP and affinityQ, contain the z-score, un-adjusted and adjusted p-values from testing the null hypothesis that affinitySignal is zero.

The distribution of affinityEstimates can be colored by the adjusted p-values (affinityQ) to show the distribution of preferentially bound 8-mers in each condition. We bin the significance values for cleaner plotting.

pa_dat <- broom::tidy(alexa_pa, c("affinityEstimate", "affinityQ"))

pa_dat %>%
    dplyr::mutate(affinityQ = cut(affinityQ, c(1e0, 1e-2, 1e-3, 1e-6, 0),
                                  include.lowest = TRUE),
                  affinityQ = factor(affinityQ, levels = rev(levels(affinityQ)))) %>%
    ggplot(aes(x = affinityEstimate, fill = affinityQ)) +
    geom_histogram(binwidth = .1, boundary = 10, color = 'black', position = "stack", alpha = 1) +
    scale_fill_brewer("q-value", palette = "Spectral", direction = -1, drop = FALSE,
                      na.value = "black") +
    theme_bw() +
    facet_wrap( ~ cname, nrow = 1, scales = "free_y") +
    ggtitle("HOXC9 8-mer preferential affinity")

Differential Affinity

Differential affinity is likely to most intuitive of the three testing procedures. With differential affinity, we are interested in identifying the set of 8-mers with differential signal across PBM samples between the specified baseline and all other conditions. The underlying hypothesis we test here is whether the estimated affinity differences are zero or not.

The upbm::kmerTestContrast, again, only takes the single PBMExperiment returned by upbm::kmerFit as input.

alexa_da <- kmerTestContrast(alexa_kfit)
alexa_da

## class: SummarizedExperiment 
## dim: 32896 4 
## metadata(0):
## assays(6): contrastAverage contrastDifference ... contrastP
##   contrastQ
## rownames: NULL
## rowData names(1): seq
## colnames(4): HOXC9-K195R HOXC9-R193K HOXC9-R222W HOXC9-REF
## colData names(0):

The returned PBMExperiment object contains the results of differential affinity testing a few new assays. The contrastZ, contrastP, and contrastQ contain the z-score, and unadjusted and adjusted p-values for the test at each 8-mer.

assayNames(alexa_da)

## [1] "contrastAverage"    "contrastDifference" "contrastVariance"  
## [4] "contrastZ"          "contrastP"          "contrastQ"

Here, the z-score is simply the ratio of the contrastDifference and contrastVariance computed during the upbm::kmerFit call. We can again plot the results as an MA plot and color 8-mers by significance level. We also remember to exclude the baseline condition, HOXC9-REF, from the table as all of the values are NAs.

da_dat <- broom::tidy(alexa_da, c("contrastAverage", "contrastDifference",
                                  "contrastQ"))

da_dat <- da_dat %>%
    dplyr::filter(cname != "HOXC9-REF") %>%
    dplyr::mutate(contrastQ = cut(contrastQ, c(1e0, 1e-2, 1e-3, 1e-6, 0),
                                  include.lowest = TRUE),
                  contrastQ = factor(contrastQ, levels = rev(levels(contrastQ))))

da_dat %>%
    ggplot(aes(x = contrastAverage, y = contrastDifference,
               color = contrastQ)) +
    geom_point(alpha = 1/10) +
    scale_color_brewer("q-value", palette = "Spectral", direction = -1, drop = FALSE,
                       na.value = "black") +
    geom_hline(color = 'gray', yintercept = 0) + 
    theme_bw() +
    guides(color = guide_legend(override.aes = list(alpha = 1))) +
    facet_grid(. ~ cname) +
    ggtitle("HOXC9 8-mer differential affinity")

We can see that few 8-mers are called differential for most of the conditions. More formally, we can count up the number of 8-mers at each of the different significance thresholds above.

da_dat %>%
    dplyr::count(cname, contrastQ) %>%
    tidyr::spread(contrastQ, n, fill = 0)

## # A tibble: 3 x 5
##   cname       `(0.01,1]` `(0.001,0.01]` `(1e-06,0.001]` `[0,1e-06]`
##   <chr>            <dbl>          <dbl>           <dbl>       <dbl>
## 1 HOXC9-K195R      32885              7               3           1
## 2 HOXC9-R193K      32882             14               0           0
## 3 HOXC9-R222W      20623           2991            4075        5207

Differential Specificity

Differential specificity, while similar to affinity is treated separately. In some comparisons, the two proteins being contrasted may have completely different binding preferences, preferring different subsets of 8-mers. We call this phenomenon differential specificity.

In the MA plot above, we see that the general trend for all contrasts is a single trajectory, even for the R222W allele. This is the result of shared specificity, even with decreased affinity. In contrast, when differential specificity is present, rather than follow a single trajectory, the MA plot will appear to split, with 8-mers preferential in the two condition spreading.

Intuitively, testing for differential specificity is performed by first fitting a single trajectory to the MA plot, followed by testing for deviations from the trajectory. Since deviations from the baseline condition are often of more interest, rather than using the average between conditions as the x-axis, the trajectory is fit with the reference affinity along the x-axis. This can be changed by setting useref = FALSE in the call to upbm::kmerTestSpecificity.

alexa_ds <- kmerTestSpecificity(alexa_kfit)
alexa_ds

## class: SummarizedExperiment 
## dim: 32896 4 
## metadata(0):
## assays(9): contrastAverage contrastDifference ... specificityQ
##   specificityAxis
## rownames: NULL
## rowData names(1): seq
## colnames(4): HOXC9-K195R HOXC9-R193K HOXC9-R222W HOXC9-REF
## colData names(0):

By default, the trajectory is fit using cubic B-splines with 3 knots placed evenly spaced along the range of the x-axis. Alternatively, a LOESS smoother can be fit by specifying method = "loess".

The returned PBMExperiment includes several new assays.

assayNames(alexa_ds)

## [1] "contrastAverage"    "contrastDifference" "contrastVariance"  
## [4] "contrastFit"        "contrastResidual"   "specificityZ"      
## [7] "specificityP"       "specificityQ"       "specificityAxis"

The x-axis used for fitting the specificity trend is stored in the specificityAxis assay. The fit and residuals are included in the contrastFit and contrastResidual assays. Finally, the specificityZ, `specificityP, and specificityQ assays contain the z-score, unadjusted and adjusted p-values from testing the null hypothesis that contrastResidual is zero.

We can plot the data along with the contrastFit to see the B-spline trajectories fit to the data.

ds_dat <- broom::tidy(alexa_ds, c("specificityAxis", "specificityQ",
                                  "contrastDifference", "contrastFit",
                                  "contrastResidual"))

ds_dat <- ds_dat %>%
    dplyr::filter(cname != "HOXC9-REF") %>%
    dplyr::mutate(specificityQ = cut(specificityQ, c(1e0, 1e-2, 1e-3, 1e-6, 0),
                                  include.lowest = TRUE),
                  specificityQ = factor(specificityQ, levels = rev(levels(specificityQ))))

ds_dat %>%
    ggplot(aes(x = specificityAxis)) +
    geom_point(aes(y = contrastDifference, color = specificityQ), alpha = 1/10) +
    theme_bw() +
    geom_line(aes(y = contrastFit), color = 'black') + 
    scale_color_brewer("q-value", palette = "Spectral", direction = -1, drop = FALSE,
                       na.value = "black") +
    geom_hline(yintercept = 0, color = 'gray') + 
    guides(color = guide_legend(override.aes = list(alpha = 1))) +
    facet_grid(. ~ cname) +
    ggtitle("HOXC9 8-mer differential specificity")

We see that unlike differential affinity, very few 8-mers are called in the differential specificity analysis for all alleles.

ds_dat %>%
    dplyr::count(cname, specificityQ) %>%
    tidyr::spread(specificityQ, n, fill = 0)

## # A tibble: 3 x 5
##   cname       `(0.01,1]` `(0.001,0.01]` `(1e-06,0.001]` `[0,1e-06]`
##   <chr>            <dbl>          <dbl>           <dbl>       <dbl>
## 1 HOXC9-K195R      32837             36              22           1
## 2 HOXC9-R193K      32896              0               0           0
## 3 HOXC9-R222W      32825             31              34           6

We can similarly plot the residuals as well.

ds_dat %>%
    ggplot(aes(x = specificityAxis)) +
    geom_point(aes(y = contrastResidual, color = specificityQ), alpha = 1/10) +
    geom_hline(yintercept = 0, color = 'gray') + 
    theme_bw() +
    scale_color_brewer("q-value", palette = "Spectral", direction = -1, drop = FALSE,
                       na.value = "black") +
    guides(color = guide_legend(override.aes = list(alpha = 1))) +
    facet_grid(. ~ cname) +
    ggtitle("HOXC9 8-mer differential specificity (residuals)")

References

Irizarry, Rafael A, Bridget Hobbs, Francois Collin, Yasmin D Beazer-Barclay, Kristen J Antonellis, Uwe Scherf, and Terence P Speed. 2003. “Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data.” Biostatistics 4 (2): 249–64.

upbm: Inference Details

Patrick K. Kimes

15 February 2020