Normalize across replicates — normalizeAcrossReplicates • upbm

Universal PBM experiments are often performed with several conditions of interest, e.g. allelic variants, assayed on separate arrays of the same plate with few replicates. Within and across plates, probe intensities can vary for biologically uninteresting reasons, such as concentration differences. To explicitly correct for these differences, normalization is performed in two steps.

First, normalization is performed within replicates (plates). More detail on this procedure can be found in the normalizeWithinReplicates documentation.

Second, normalization is performed across replicates (plates) with the assumption that biologically uninteresting differences between replicates affect probe intensities both multiplicatively and additively on the log-scale. A single log-scale multiplicative normalization factor is first estimated for all samples within a replicate. Then, a log-scale additive normalization is estimated such that the median intensities of the baseline samples in each replicate are equal. More details on this calculation are provided below.

normalizeAcrossReplicates(
  pe,
  assay = SummarizedExperiment::assayNames(pe)[1],
  group = "id",
  stratify = "condition",
  baseline = NULL,
  verbose = FALSE
)

Arguments

pe	SummarizedExperiment object containing GPR intensity information.
assay	a string name of the assay to normalize. (default = `SummarizedExperiment::assayNames(pe)[1]`)
group	a character string specifying a column in `colData(pe)` to use for grouping replicates. (default = `"id"`)
stratify	a character string specifying a column in `colData(pe)` to use for determining the unique baseline scan within each `group` and to match samples across values of `group`. (default = `"condition"`)
baseline	a character string specifying the baseline condition in the `stratify` column to normalize other conditions against within each `group`. If not specified (NULL), the baseline value is guessed by looking for values in the `stratify` column ending in "ref". If multiple unique matching values are found, a warning is thrown and the first matching sample is used. (default = NULL)
verbose	a logical value whether to print verbose output during analysis. (default = FALSE)
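
The baseline guessing described for the `baseline` argument can be illustrated with a small base R sketch. This is not the package's internal code: the helper name `guess_baseline`, the example condition names, the case-sensitive match, and the error raised when nothing matches are all assumptions made for illustration.

```r
# Hypothetical helper: guess the baseline from values of the `stratify` column.
guess_baseline <- function(conditions) {
  # Unique values ending in "ref" (case-sensitive here, which is an assumption).
  candidates <- unique(grep("ref$", conditions, value = TRUE))
  if (length(candidates) == 0) {
    # Behavior with no match is not documented; stopping is an assumption.
    stop("no value ending in 'ref' found; specify `baseline` explicitly")
  }
  if (length(candidates) > 1) {
    warning("multiple candidate baselines found; using the first: ", candidates[1])
  }
  candidates[1]
}

# Hypothetical condition labels, e.g. as they might appear in colData(pe)$condition.
conditions <- c("TF1-ref", "TF1-mutA", "TF1-ref", "TF1-mutB")
baseline <- guess_baseline(conditions)
print(baseline)
```

Passing `baseline` explicitly avoids relying on this naming convention.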

Value

Original PBMExperiment object with assay containing cross-replicate normalized intensities ("normalized") and new columns added to the colData, "acrossRepMultScale" and "acrossRepAddScale", containing the inverse of the log-scale multiplicative and additive scaling factors used to normalize intensities. If an assay with the same name is already included in the object, it will be overwritten.

Details

The following procedure is used to estimate the log-scale multiplicative factor for each replicate. First, a cross-replicate reference is computed for each baseline condition (specified by stratify= and baseline=) by taking the cross-replicate mean quantiles of the observed log2 intensities. Next, a per-replicate log multiplicative scaling factor is computed by taking the median ratio of the rank-ordered and median-centered log-probe intensities between the baseline samples in each replicate and the reference distribution. Visually, this can be interpreted as the approximate slope of the quantile-quantile (QQ) plot generated using log-scale intensities. To reduce the impact of outlier probes, scaling factors are estimated using only the middle 80% of probe intensities.
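
The estimation of the multiplicative factor can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated intensities, the variable names, and the exact way the middle 80% of rank-ordered probes is selected are all hypothetical.

```r
set.seed(1)
n_probes <- 1000

# Simulated log2 intensities of the baseline sample in three replicates,
# differing only in their log-scale spread (hypothetical values).
true_sd <- c(rep1 = 0.8, rep2 = 1.0, rep3 = 1.3)
log_int <- sapply(true_sd, function(s) rnorm(n_probes, mean = 10, sd = s))

# Rank-order each replicate so that row i holds its i-th quantile.
sorted <- apply(log_int, 2, sort)

# Cross-replicate reference: mean of each quantile across replicates.
reference <- rowMeans(sorted)

# Median-center each replicate and the reference.
centered <- sweep(sorted, 2, apply(sorted, 2, median))
ref_centered <- reference - median(reference)

# Keep only the middle 80% of rank-ordered probes (assumed trimming rule).
keep <- seq(floor(0.1 * n_probes) + 1, ceiling(0.9 * n_probes))

# Per-probe ratio of replicate to reference; the median ratio approximates
# the slope of the QQ plot of each replicate against the reference.
ratios <- centered[keep, , drop = FALSE] / ref_centered[keep]
ratios[!is.finite(ratios)] <- NA
mult_scale <- apply(ratios, 2, median, na.rm = TRUE)
print(mult_scale)
```

In this simulation, replicates with a wider log-scale spread receive a larger factor, which is what the QQ-slope interpretation suggests.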

After log-scale multiplicative factors have been estimated to correct for differences in log-scale variance across replicates, a second log-scale additive factor is estimated for each replicate to correct for differences in log-scale shift. A "global median" intensity is first calculated across replicates by taking the geometric mean of the median intensities in all baseline samples across replicates. This "global median" is computed using the input probe intensities, i.e. without any cross-replicate normalization. The log-scale additive factor is then estimated as the difference between the median normalized probe intensity of the baseline sample in each replicate and the "global median". While the log-scale additive factor is estimated using only baseline samples, the normalization is applied to all samples in the replicate.
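
The additive step can be sketched in the same way. This is an illustration only: the simulated data and variable names are hypothetical, and for simplicity the sketch skips the multiplicative step, so the "normalized" medians used here equal the input medians.

```r
set.seed(2)

# Simulated log2 baseline intensities for three replicates that differ
# only by a log-scale shift (hypothetical values; one column per replicate).
shift <- c(rep1 = 9.5, rep2 = 10, rep3 = 11)
log_int <- sapply(shift, function(m) rnorm(501, mean = m, sd = 1))

# "Global median": geometric mean of the per-replicate median intensities,
# computed on the raw (non-log) input scale.
raw_medians <- apply(2^log_int, 2, median)
global_median <- exp(mean(log(raw_medians)))
log_global_median <- log2(global_median)

# Additive factor: difference between each replicate's median log2
# baseline intensity and the log2 global median.
add_scale <- apply(log_int, 2, median) - log_global_median

# Applying the shift aligns the baseline medians across replicates.
# In the procedure described above, the same shift would be applied to
# every sample in the replicate, not just the baseline.
normalized <- sweep(log_int, 2, add_scale, FUN = "-")
print(apply(normalized, 2, median))
```

On the log2 scale, the geometric mean of the medians corresponds to the arithmetic mean of the log2 medians, so the additive factors in this sketch sum to approximately zero.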

Cross-replicate normalization is first carried out for replicates containing a baseline sample as described above. Replicates without a baseline sample are then normalized to already normalized replicates using overlapping conditions in the stratify= column.
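
The two-pass ordering can be illustrated by working out, from a sample table, which replicates contain a baseline and which conditions the remaining replicates share with them. The table, its values, and the variable names below are hypothetical; the sketch only identifies the overlap and does not perform the normalization itself.

```r
# Hypothetical sample annotations, mimicking the `group` ("id") and
# `stratify` ("condition") columns of colData(pe).
samples <- data.frame(
  id = c("plate1", "plate1", "plate2", "plate2", "plate3", "plate3"),
  condition = c("TF1-ref", "TF1-mutA", "TF1-ref", "TF1-mutB",
                "TF1-mutA", "TF1-mutB"),
  stringsAsFactors = FALSE
)
baseline <- "TF1-ref"

# Conditions present in each replicate.
by_group <- split(samples$condition, samples$id)

# First pass: replicates that contain a baseline sample.
has_baseline <- vapply(by_group, function(x) baseline %in% x, logical(1))
first_pass <- names(by_group)[has_baseline]

# Second pass: replicates without a baseline sample.
second_pass <- names(by_group)[!has_baseline]

# Conditions shared with replicates normalized in the first pass; these
# are the samples that could anchor the second-pass normalization.
anchor_conditions <- unique(unlist(by_group[first_pass]))
overlap <- lapply(by_group[second_pass], intersect, y = anchor_conditions)
print(overlap)
```

A replicate with neither a baseline sample nor an overlapping condition would have nothing to anchor against in this scheme.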