vignettes/upbm-classes.Rmd
upbm-classes.Rmd
For an introduction to the upbm package, please see the quick start vignette (vignette("upbm-quickstart")
). Here, we provide details on the two classes introduced in the package, PBMDesign and PBMExperiment. The PBMDesign class acts as a container for protein binding microarray (PBM) design information, while the PBMExperiment class serves as the container for PBM experimental data.
suppressPackageStartupMessages(library("upbm"))
PBMDesign objects for standard universial PBM (uPBM) designs are included in the upbmAux package. Additionally, example PBMExperiment objects are included in the upbmData package containing data from uPBM experiments performed using the HOXC9 transcription factor. We use these objects to help illustrate the simple structure of each class.
suppressPackageStartupMessages(library("upbmAux"))
suppressPackageStartupMessages(library("upbmData"))
The PBMDesign class is used to organize the probe information for PBM experiments. PBMDesign objects are composed of 3 components (slots):
design
: a data.frame of probe IDs, sequences, and other metadata,probeFilter
: an optional list of filtering rules to distinguish signal probes from control spots,probeTrim
: an optional pair of integers specifying the start and end of the probe sequences to use for analysis.We can take a look at the pbm_8x60k_v1
design object from the upbmAux package to get a better sense of these components.
data(pbm_8x60k_v1, package = "upbmAux")
pbm_8x60k_v1
## class: PBMDesign
## design dim: 62976 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36
From the object description, we can see that the design contains 62976 rows and 4 columns. Each row is a probe in the PBM array. Two columns, "probeID"
and "Sequence"
must be defined for the object to be a valid PBMDesign. The other two columns, "Row"
and "Column"
are optional information which are needed to perform spatial adjustment with upbm::spatiallyAdjust
.
head(design(pbm_8x60k_v1))
## Column Row probeID
## 1 1 1 GE_BrightCorner
## 2 2 1 GE_BrightCorner
## 3 3 1 DarkCorner
## 4 4 1 DarkCorner
## 5 5 1 Cbf_5b_Hi_n359_o2_r1
## 6 6 1 dBr_14334_Jan07
## Sequence
## 1 #N/A
## 2 #N/A
## 3 #N/A
## 4 #N/A
## 5 GAAGCTATTCAGATCGACGTGACATGTATATAGTAGGTCTGTGTTCCGTTGTCCGTGCTG
## 6 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTG
In addition to dsDNA probes which measure protein binding, PBMs typically include control probes designed by Agilent or the user. These probes should be excluded for certain normalization and downstream analysis steps (e.g. Cy3 normalization and spatial adjustment).
The probeFilter
slot is a list of functions which are used to distinguish target probes from control probes. The list must be named with names matching columns in the design shown above. We can get and set the probeFilter
slot by calling upbm::probeFilter
.
names(probeFilter(pbm_8x60k_v1))
## [1] "probeID"
Here, the PBMDesign object includes a single probe filtering rule associated with the "probeID"
column.
probeFilter(pbm_8x60k_v1)$probeID
## function(x) { grepl("^dBr", x) }
## <bytecode: 0x7fc402f1a898>
The rule must be a function that takes the specified column ("probeID"
) as input and returns a logical vector of the same length. Only rows which return TRUE
for all rules are kept for downstream analysis. In the pbm_8x60k_v1
design, the single filter checks the probe IDs and returns TRUE
for the subset prefixed with dBr_
, corresponding to the subset of de Bruijn sequence probes on the array. Filtering can be performed using upbm::pbmFilterProbes
.
pbm_8x60k_filtered <- pbmFilterProbes(pbm_8x60k_v1)
pbm_8x60k_filtered
## class: PBMDesign
## design dim: 41944 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(0):
## probeTrim: 1 36
Notice that the returned PBMDesign has fewer probes than the original design. We can explicitly verify that this smaller number of probes matches the number of probes containing the dBr
prefix in the "probeID"
column.
##
## FALSE TRUE
## 21032 41944
After filtering has been performed, the probeFilter
slot of the returned PBMDesign is cleared to prevent iterative application of the filtering critera.
Finally, the probeTrim
slot is an optional numeric vector of length 2. Again, this can be accessed and set using the upbm::probeTrim
function.
probeTrim(pbm_8x60k_filtered)
## [1] 1 36
Revisiting the design of the probes, notice that the probe sequences all end in a common 24nt primer sequence.
head(design(pbm_8x60k_filtered)$Sequence)
## [1] "GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTG"
## [2] "CAGTCTAAGTTTTCGGATTACCATTAGAAATTGATGGTCTGTGTTCCGTTGTCCGTGCTG"
## [3] "CTTTTTAAAGACCTAGGAATCATTGCATTCTTATTGGTCTGTGTTCCGTTGTCCGTGCTG"
## [4] "CAGCTACGGAAGAGTAAAGGTTGTCGAAGCCGTGCAGTCTGTGTTCCGTTGTCCGTGCTG"
## [5] "GCTTCGAACGTGGGCGATGGAACAAGTCGCAGTCATGTCTGTGTTCCGTTGTCCGTGCTG"
## [6] "CGCCCGTGGTAACAGTTACGAAAATTACATGCGAAAGTCTGTGTTCCGTTGTCCGTGCTG"
The probeTrim
slot specifies the start and end positions in the sequences that should be kept to map K-mers to probes. By default, this is set to (1, 36)
in the uPBM designs in upbmAux to exclude the common 24nt primer sequence at the end of all probes. While the values may be modified to include some or all of the primer sequence, this has not been extensively evaluated. Probe trimming can be performed using upbm::pbmTrimProbes
.
pbm_8x60k_trimmed <- pbmTrimProbes(pbm_8x60k_filtered)
head(design(pbm_8x60k_trimmed))
## Column Row probeID Sequence
## 6 6 1 dBr_14334_Jan07 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGT
## 7 7 1 dBr_06208_Jan07 CAGTCTAAGTTTTCGGATTACCATTAGAAATTGATG
## 8 8 1 dBr_39317_Jan07 CTTTTTAAAGACCTAGGAATCATTGCATTCTTATTG
## 9 9 1 dBr_06987_Jan07 CAGCTACGGAAGAGTAAAGGTTGTCGAAGCCGTGCA
## 10 10 1 dBr_05074_Jan07 GCTTCGAACGTGGGCGATGGAACAAGTCGCAGTCAT
## 15 15 1 dBr_16182_Jan07 CGCCCGTGGTAACAGTTACGAAAATTACATGCGAAA
All of the slots described above can be modified or updated. For example, we can add an additional (redundant) rule to the probeFilter
list of the example PBMDesign.
probeFilter(pbm_8x60k_v1) <- c(probeFilter(pbm_8x60k_v1),
Sequence = function(x) { x != "#N/A" })
We can also easily change the trimming boundaries.
## class: PBMDesign
## design dim: 41944 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(0):
## probeTrim: 1 50
A new PBMDesign object can be created by passing a data.frame of probe designs to the upbm::PBMDesign
constructor. As stated above, at minimum, the data.frame must contain two columns, "probeID"
and "Sequence"
.
new_design <- data.frame(probeID = LETTERS[1:3],
Sequence = rep("AAA", 3))
pbmDesign <- PBMDesign(new_design)
pbmDesign
## class: PBMDesign
## design dim: 3 2
## design colnames(2): probeID Sequence
## probeFilter names(0):
## probeTrim:
The probeFilter
and probeTrim
slots can also be specified directly to the upbm::PBMDesign
constructor or modified after constructing the PBMDesign object, as shown above.
Probe-level PBM data is stored using the PBMExperiment class, an extension of the SummarizedExperiment class with additional slots to support working with PBMDesign objects.
An example PBMExperiment object containing data from a series of HOXC9 experiments is included in the upbmData package. We will only look at the PBMExperiment containing Alexa488 scans. However, the probe-level data from Cy3 scans are similarly stored and included as a separate PBMExperiment object in the package.
data(hoxc9alexa, package = "upbmData")
hoxc9alexa
## class: PBMExperiment
## dim: 62976 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36
A SummarizedExperiment object is made up of three parts: 1. assays (primary data, e.g. probe intensities), 2. column metadata, and 3. row metadata. Here, rows correspond to probes and columns correspond to scans. For more details on the SummarizedExperiment class, see the corresponding package vignette.
In addition to these three components, PBMExperiment objects also include probeFilter
, probeTrim
and probeCols
slots. The probeFilter
and probeTrim
slots of the PBMExperiment class are identical to the slots described above as part of the PBMDesign class. The final probeCols
slot is a character vector specifying the columns in the row metadata corresponding to the design
of a PBMDesign object. Rather than storing the design
as a separate slot in PBMExperiment objects, the design is kept in the row metadata and the corresponding columns are recorded in probeCols
.
As with PBMDesign objects, these slots can be both accessed and modified. The hoxc9alexa
object was constructed with the pbm_8x60k_v1
probe design introduced above. We can see that the probeTrim
and probeFilter
values match what we saw above.
probeFilter(hoxc9alexa)
## $probeID
## function(x) { grepl("^dBr", x) }
## <bytecode: 0x7fc3f0784878>
probeTrim(hoxc9alexa)
## [1] 1 36
Additionally, we can see that the probeCols
columns of the row metadata match the design
slot of pbm_8x60k_v1
.
probeCols(hoxc9alexa)
## [1] "Column" "Row" "probeID" "Sequence"
## DataFrame with 6 rows and 4 columns
## Column Row probeID
## <integer> <integer> <character>
## 1 1 1 GE_BrightCorner
## 2 2 1 GE_BrightCorner
## 3 3 1 DarkCorner
## 4 4 1 DarkCorner
## 5 5 1 Cbf_5b_Hi_n359_o2_r1
## 6 6 1 dBr_14334_Jan07
## Sequence
## <character>
## 1 #N/A
## 2 #N/A
## 3 #N/A
## 4 #N/A
## 5 GAAGCTATTCAGATCGACGTGACATGTATATAGTAGGTCTGTGTTCCGTTGTCCGTGCTG
## 6 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTG
Finally, the corresponding PBMDesign object of a PBMExperiment can be extracted by passing the PBMExperiment object to upbm::PBMDesign
.
PBMDesign(hoxc9alexa)
## class: PBMDesign
## design dim: 62976 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36
We can check that this is identical to the PBMDesign from above.
## [1] FALSE
The PBMDesign associated values can be quickly modified or added using the associated setter function.
PBMDesign(hoxc9alexa) <- pbm_8x60k_v1
This can be useful when the design information needs to be added after the PBMExperiment object has been created. However, this should be performed with caution as all prior design associated values will be overwritten.
As with PBMDesign, calls to upbm::pbmFilterProbes
and upbm::pbmTrimProbes
can be used to filter and trim probes in the PBMExperiment object.
pbmFilterProbes(hoxc9alexa)
## class: PBMExperiment
## dim: 41944 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(0):
## probeTrim: 1 36
It is also important to note that whenever broom::tidy
is called to convert assay data to a tidy format, upbm::pbmFilterProbes
and upbm::pbmTrimProbes
run unless process = FALSE
is specified.
## [1] 41944
## [1] 62976
The easiest way to create a new PBMExperiment object is by reading scan data from GPR files using upbm::gpr2PBMExperiment
. Given a table of GPR file metadata, including paths to the files, the function will return the parsed GPR data as a PBMExperiment object.
In the case of the example dataset, the table used to read the GPR files is also included in the upbmData package.
## # A tibble: 5 x 11
## gpr date version id reuse type pmt idx target condition
## <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 ./PB… 170606 v14 226 1 Alexa 400 2 HOXC9 HOXC9-REF
## 2 ./PB… 170606 v14 226 1 Alexa 400 3 HOXC9 HOXC9-R1…
## 3 ./PB… 170606 v14 226 1 Alexa 400 6 HOXC9 HOXC9-K1…
## 4 ./PB… 170606 v14 226 1 Alexa 450 2 HOXC9 HOXC9-REF
## 5 ./PB… 170606 v14 226 1 Alexa 450 3 HOXC9 HOXC9-R1…
## # … with 1 more variable: id_idx <chr>
At a minimum, the full path to each GPR file must be specified in a column named gpr
. The data.frame should also include any relevant metadata about the scan and sample (e.g. scan parameters or properties of the assayed protein).
Both Alexa488 and Cy3 scans for the example dataset in the upbmData package are provided in hoxc9table
. We will subset to just the Alexa488 scans.
hoxc9table_alexa <- dplyr::filter(hoxc9table, type == "Alexa")
In addition to the sample table the corresponding PBMDesign should also be specified when calling upbm::gpr2PBMExperiment
. While not run here (since raw GPR files are not included with the package), the following call was made to generate the PBMExperiment object included in the upbmData package.
hoxc9alexa <- gpr2PBMExperiment(scans = hoxc9table_alexa,
probes = pbm_8x60k_v1)
We can see that metadata included in the sample table are stored in the column metadata of the returned PBMExperiment object.
## DataFrame with 5 rows and 10 columns
## date version id reuse type pmt
## <numeric> <character> <numeric> <numeric> <character> <numeric>
## s1 170606 v14 226 1 Alexa 400
## s2 170606 v14 226 1 Alexa 400
## s3 170606 v14 226 1 Alexa 400
## s4 170606 v14 226 1 Alexa 450
## s5 170606 v14 226 1 Alexa 450
## idx target condition id_idx
## <numeric> <character> <character> <character>
## s1 2 HOXC9 HOXC9-REF 226_i2
## s2 3 HOXC9 HOXC9-R193K 226_i3
## s3 6 HOXC9 HOXC9-K195R 226_i6
## s4 2 HOXC9 HOXC9-REF 226_i2
## s5 3 HOXC9 HOXC9-R193K 226_i3
For PBMExperiment objects created from GPR files, probe-level foreground intensities are stored in the "fore"
assay. By default, (but optionally) a second assay, "back"
(background) is also read in from the GPRs with the background intensities for each probe and sample.
## DataFrame with 5 rows and 5 columns
## s1 s2 s3 s4 s5
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 4446 2191 2209 9780 4681
## 2 4057 1978 2041 8967 4297
## 3 5326 3443 4059 11924 7476
## 4 5686 3410 4012 12573 7458
## 5 4401 3444 2353 9943 7499
## DataFrame with 5 rows and 5 columns
## s1 s2 s3 s4 s5
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 197 76 65 405 132
## 2 169 76 58 324 134
## 3 161 76 63 324 136
## 4 172 83 85 354 141
## 5 210 100 92 430 177
Alternatively, a new PBMExperiment object can be constructed from existing SummarizedExperiment and PBMDesign objects. To demonstrate this, we can coerce hoxc9alexa
to a SummarizedExperiment object, stripping away all PBMExperiment-specific information.
## class: SummarizedExperiment
## dim: 62976 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
Notice that the probe annotations are still kept in the row metadata. We can additionally remove this if we would like.
## class: SummarizedExperiment
## dim: 62976 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(0):
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
With the SummarizedExperiment object, we can now create a new PBMExperiment object using the PBMExperiment(..)
constructor. As with gpr2PBMExperiment(..)
, a PBMDesign can be specified as an optional parameter.
PBMExperiment(se)
## class: PBMExperiment
## dim: 62976 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(2): Sequence probeID
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(2): Sequence probeID
## probeFilter names(0):
## probeTrim:
PBMExperiment(se, pbmDesign = pbm_8x60k_v1)
## class: PBMExperiment
## dim: 62976 30
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(2): probeID Sequence
## probeTrim: 1 36