vignettes/upbm-classes.Rmd
      upbm-classes.RmdFor an introduction to the upbm package, please see the quick start vignette (vignette("upbm-quickstart")). Here, we provide details on the two classes introduced in the package, PBMDesign and PBMExperiment. The PBMDesign class acts as a container for protein binding microarray (PBM) design information, while the PBMExperiment class serves as the container for PBM experimental data.
suppressPackageStartupMessages(library("upbm"))PBMDesign objects for standard universial PBM (uPBM) designs are included in the upbmAux package. Additionally, example PBMExperiment objects are included in the upbmData package containing data from uPBM experiments performed using the HOXC9 transcription factor. We use these objects to help illustrate the simple structure of each class.
suppressPackageStartupMessages(library("upbmAux"))
suppressPackageStartupMessages(library("upbmData"))The PBMDesign class is used to organize the probe information for PBM experiments. PBMDesign objects are composed of 3 components (slots):
design: a data.frame of probe IDs, sequences, and other metadata,probeFilter: an optional list of filtering rules to distinguish signal probes from control spots,probeTrim: an optional pair of integers specifying the start and end of the probe sequences to use for analysis.We can take a look at the pbm_8x60k_v1 design object from the upbmAux package to get a better sense of these components.
data(pbm_8x60k_v1, package = "upbmAux")
pbm_8x60k_v1## class: PBMDesign
## design dim: 62976 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36From the object description, we can see that the design contains 62976 rows and 4 columns. Each row is a probe in the PBM array. Two columns, "probeID" and "Sequence" must be defined for the object to be a valid PBMDesign. The other two columns, "Row" and "Column" are optional information which are needed to perform spatial adjustment with upbm::spatiallyAdjust.
head(design(pbm_8x60k_v1))##   Column Row              probeID
## 1      1   1      GE_BrightCorner
## 2      2   1      GE_BrightCorner
## 3      3   1           DarkCorner
## 4      4   1           DarkCorner
## 5      5   1 Cbf_5b_Hi_n359_o2_r1
## 6      6   1      dBr_14334_Jan07
##                                                       Sequence
## 1                                                         #N/A
## 2                                                         #N/A
## 3                                                         #N/A
## 4                                                         #N/A
## 5 GAAGCTATTCAGATCGACGTGACATGTATATAGTAGGTCTGTGTTCCGTTGTCCGTGCTG
## 6 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTGIn addition to dsDNA probes which measure protein binding, PBMs typically include control probes designed by Agilent or the user. These probes should be excluded for certain normalization and downstream analysis steps (e.g. Cy3 normalization and spatial adjustment).
The probeFilter slot is a list of functions which are used to distinguish target probes from control probes. The list must be named with names matching columns in the design shown above. We can get and set the probeFilter slot by calling upbm::probeFilter.
names(probeFilter(pbm_8x60k_v1))## [1] "probeID"Here, the PBMDesign object includes a single probe filtering rule associated with the "probeID" column.
probeFilter(pbm_8x60k_v1)$probeID## function(x) { grepl("^dBr", x) }
## <bytecode: 0x7fc402f1a898>The rule must be a function that takes the specified column ("probeID") as input and returns a logical vector of the same length. Only rows which return TRUE for all rules are kept for downstream analysis. In the pbm_8x60k_v1 design, the single filter checks the probe IDs and returns TRUE for the subset prefixed with dBr_, corresponding to the subset of de Bruijn sequence probes on the array. Filtering can be performed using upbm::pbmFilterProbes.
pbm_8x60k_filtered <- pbmFilterProbes(pbm_8x60k_v1)
pbm_8x60k_filtered## class: PBMDesign
## design dim: 41944 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(0): 
## probeTrim: 1 36Notice that the returned PBMDesign has fewer probes than the original design. We can explicitly verify that this smaller number of probes matches the number of probes containing the dBr prefix in the "probeID" column.
## 
## FALSE  TRUE 
## 21032 41944After filtering has been performed, the probeFilter slot of the returned PBMDesign is cleared to prevent iterative application of the filtering critera.
Finally, the probeTrim slot is an optional numeric vector of length 2. Again, this can be accessed and set using the upbm::probeTrim function.
probeTrim(pbm_8x60k_filtered)## [1]  1 36Revisiting the design of the probes, notice that the probe sequences all end in a common 24nt primer sequence.
head(design(pbm_8x60k_filtered)$Sequence)## [1] "GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTG"
## [2] "CAGTCTAAGTTTTCGGATTACCATTAGAAATTGATGGTCTGTGTTCCGTTGTCCGTGCTG"
## [3] "CTTTTTAAAGACCTAGGAATCATTGCATTCTTATTGGTCTGTGTTCCGTTGTCCGTGCTG"
## [4] "CAGCTACGGAAGAGTAAAGGTTGTCGAAGCCGTGCAGTCTGTGTTCCGTTGTCCGTGCTG"
## [5] "GCTTCGAACGTGGGCGATGGAACAAGTCGCAGTCATGTCTGTGTTCCGTTGTCCGTGCTG"
## [6] "CGCCCGTGGTAACAGTTACGAAAATTACATGCGAAAGTCTGTGTTCCGTTGTCCGTGCTG"The probeTrim slot specifies the start and end positions in the sequences that should be kept to map K-mers to probes. By default, this is set to (1, 36) in the uPBM designs in upbmAux to exclude the common 24nt primer sequence at the end of all probes. While the values may be modified to include some or all of the primer sequence, this has not been extensively evaluated. Probe trimming can be performed using upbm::pbmTrimProbes.
pbm_8x60k_trimmed <- pbmTrimProbes(pbm_8x60k_filtered)
head(design(pbm_8x60k_trimmed))##    Column Row         probeID                             Sequence
## 6       6   1 dBr_14334_Jan07 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGT
## 7       7   1 dBr_06208_Jan07 CAGTCTAAGTTTTCGGATTACCATTAGAAATTGATG
## 8       8   1 dBr_39317_Jan07 CTTTTTAAAGACCTAGGAATCATTGCATTCTTATTG
## 9       9   1 dBr_06987_Jan07 CAGCTACGGAAGAGTAAAGGTTGTCGAAGCCGTGCA
## 10     10   1 dBr_05074_Jan07 GCTTCGAACGTGGGCGATGGAACAAGTCGCAGTCAT
## 15     15   1 dBr_16182_Jan07 CGCCCGTGGTAACAGTTACGAAAATTACATGCGAAAAll of the slots described above can be modified or updated. For example, we can add an additional (redundant) rule to the probeFilter list of the example PBMDesign.
probeFilter(pbm_8x60k_v1) <- c(probeFilter(pbm_8x60k_v1),
                               Sequence = function(x) { x != "#N/A" })We can also easily change the trimming boundaries.
## class: PBMDesign
## design dim: 41944 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(0): 
## probeTrim: 1 50A new PBMDesign object can be created by passing a data.frame of probe designs to the upbm::PBMDesign constructor. As stated above, at minimum, the data.frame must contain two columns, "probeID" and "Sequence".
new_design <- data.frame(probeID = LETTERS[1:3],
                         Sequence = rep("AAA", 3))
pbmDesign <- PBMDesign(new_design)
pbmDesign## class: PBMDesign
## design dim: 3 2
## design colnames(2): probeID Sequence
## probeFilter names(0): 
## probeTrim:The probeFilter and probeTrim slots can also be specified directly to the upbm::PBMDesign constructor or modified after constructing the PBMDesign object, as shown above.
Probe-level PBM data is stored using the PBMExperiment class, an extension of the SummarizedExperiment class with additional slots to support working with PBMDesign objects.
An example PBMExperiment object containing data from a series of HOXC9 experiments is included in the upbmData package. We will only look at the PBMExperiment containing Alexa488 scans. However, the probe-level data from Cy3 scans are similarly stored and included as a separate PBMExperiment object in the package.
data(hoxc9alexa, package = "upbmData")
hoxc9alexa## class: PBMExperiment 
## dim: 62976 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36A SummarizedExperiment object is made up of three parts: 1. assays (primary data, e.g. probe intensities), 2. column metadata, and 3. row metadata. Here, rows correspond to probes and columns correspond to scans. For more details on the SummarizedExperiment class, see the corresponding package vignette.
In addition to these three components, PBMExperiment objects also include probeFilter, probeTrim and probeCols slots. The probeFilter and probeTrim slots of the PBMExperiment class are identical to the slots described above as part of the PBMDesign class. The final probeCols slot is a character vector specifying the columns in the row metadata corresponding to the design of a PBMDesign object. Rather than storing the design as a separate slot in PBMExperiment objects, the design is kept in the row metadata and the corresponding columns are recorded in probeCols.
As with PBMDesign objects, these slots can be both accessed and modified. The hoxc9alexa object was constructed with the pbm_8x60k_v1 probe design introduced above. We can see that the probeTrim and probeFilter values match what we saw above.
probeFilter(hoxc9alexa)## $probeID
## function(x) { grepl("^dBr", x) }
## <bytecode: 0x7fc3f0784878>probeTrim(hoxc9alexa)## [1]  1 36Additionally, we can see that the probeCols columns of the row metadata match the design slot of pbm_8x60k_v1.
probeCols(hoxc9alexa)## [1] "Column"   "Row"      "probeID"  "Sequence"## DataFrame with 6 rows and 4 columns
##      Column       Row              probeID
##   <integer> <integer>          <character>
## 1         1         1      GE_BrightCorner
## 2         2         1      GE_BrightCorner
## 3         3         1           DarkCorner
## 4         4         1           DarkCorner
## 5         5         1 Cbf_5b_Hi_n359_o2_r1
## 6         6         1      dBr_14334_Jan07
##                                                       Sequence
##                                                    <character>
## 1                                                         #N/A
## 2                                                         #N/A
## 3                                                         #N/A
## 4                                                         #N/A
## 5 GAAGCTATTCAGATCGACGTGACATGTATATAGTAGGTCTGTGTTCCGTTGTCCGTGCTG
## 6 GGTGTGAGTCCATTTCGTCAAACCAACGCAACAGGTGTCTGTGTTCCGTTGTCCGTGCTGFinally, the corresponding PBMDesign object of a PBMExperiment can be extracted by passing the PBMExperiment object to upbm::PBMDesign.
PBMDesign(hoxc9alexa)## class: PBMDesign
## design dim: 62976 4
## design colnames(4): Column Row probeID Sequence
## probeFilter names(1): probeID
## probeTrim: 1 36We can check that this is identical to the PBMDesign from above.
## [1] FALSEThe PBMDesign associated values can be quickly modified or added using the associated setter function.
PBMDesign(hoxc9alexa) <- pbm_8x60k_v1This can be useful when the design information needs to be added after the PBMExperiment object has been created. However, this should be performed with caution as all prior design associated values will be overwritten.
As with PBMDesign, calls to upbm::pbmFilterProbes and upbm::pbmTrimProbes can be used to filter and trim probes in the PBMExperiment object.
pbmFilterProbes(hoxc9alexa)## class: PBMExperiment 
## dim: 41944 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(0): 
## probeTrim: 1 36It is also important to note that whenever broom::tidy is called to convert assay data to a tidy format, upbm::pbmFilterProbes and upbm::pbmTrimProbes run unless process = FALSE is specified.
## [1] 41944## [1] 62976The easiest way to create a new PBMExperiment object is by reading scan data from GPR files using upbm::gpr2PBMExperiment. Given a table of GPR file metadata, including paths to the files, the function will return the parsed GPR data as a PBMExperiment object.
In the case of the example dataset, the table used to read the GPR files is also included in the upbmData package.
## # A tibble: 5 x 11
##   gpr     date version    id reuse type    pmt   idx target condition
##   <chr>  <dbl> <chr>   <dbl> <dbl> <chr> <dbl> <dbl> <chr>  <chr>    
## 1 ./PB… 170606 v14       226     1 Alexa   400     2 HOXC9  HOXC9-REF
## 2 ./PB… 170606 v14       226     1 Alexa   400     3 HOXC9  HOXC9-R1…
## 3 ./PB… 170606 v14       226     1 Alexa   400     6 HOXC9  HOXC9-K1…
## 4 ./PB… 170606 v14       226     1 Alexa   450     2 HOXC9  HOXC9-REF
## 5 ./PB… 170606 v14       226     1 Alexa   450     3 HOXC9  HOXC9-R1…
## # … with 1 more variable: id_idx <chr>At a minimum, the full path to each GPR file must be specified in a column named gpr. The data.frame should also include any relevant metadata about the scan and sample (e.g. scan parameters or properties of the assayed protein).
Both Alexa488 and Cy3 scans for the example dataset in the upbmData package are provided in hoxc9table. We will subset to just the Alexa488 scans.
hoxc9table_alexa <- dplyr::filter(hoxc9table, type == "Alexa")In addition to the sample table the corresponding PBMDesign should also be specified when calling upbm::gpr2PBMExperiment. While not run here (since raw GPR files are not included with the package), the following call was made to generate the PBMExperiment object included in the upbmData package.
hoxc9alexa <- gpr2PBMExperiment(scans = hoxc9table_alexa,
                                probes = pbm_8x60k_v1)We can see that metadata included in the sample table are stored in the column metadata of the returned PBMExperiment object.
## DataFrame with 5 rows and 10 columns
##         date     version        id     reuse        type       pmt
##    <numeric> <character> <numeric> <numeric> <character> <numeric>
## s1    170606         v14       226         1       Alexa       400
## s2    170606         v14       226         1       Alexa       400
## s3    170606         v14       226         1       Alexa       400
## s4    170606         v14       226         1       Alexa       450
## s5    170606         v14       226         1       Alexa       450
##          idx      target   condition      id_idx
##    <numeric> <character> <character> <character>
## s1         2       HOXC9   HOXC9-REF      226_i2
## s2         3       HOXC9 HOXC9-R193K      226_i3
## s3         6       HOXC9 HOXC9-K195R      226_i6
## s4         2       HOXC9   HOXC9-REF      226_i2
## s5         3       HOXC9 HOXC9-R193K      226_i3For PBMExperiment objects created from GPR files, probe-level foreground intensities are stored in the "fore" assay. By default, (but optionally) a second assay, "back" (background) is also read in from the GPRs with the background intensities for each probe and sample.
## DataFrame with 5 rows and 5 columns
##          s1        s2        s3        s4        s5
##   <numeric> <numeric> <numeric> <numeric> <numeric>
## 1      4446      2191      2209      9780      4681
## 2      4057      1978      2041      8967      4297
## 3      5326      3443      4059     11924      7476
## 4      5686      3410      4012     12573      7458
## 5      4401      3444      2353      9943      7499## DataFrame with 5 rows and 5 columns
##          s1        s2        s3        s4        s5
##   <numeric> <numeric> <numeric> <numeric> <numeric>
## 1       197        76        65       405       132
## 2       169        76        58       324       134
## 3       161        76        63       324       136
## 4       172        83        85       354       141
## 5       210       100        92       430       177Alternatively, a new PBMExperiment object can be constructed from existing SummarizedExperiment and PBMDesign objects. To demonstrate this, we can coerce hoxc9alexa to a SummarizedExperiment object, stripping away all PBMExperiment-specific information.
## class: SummarizedExperiment 
## dim: 62976 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idxNotice that the probe annotations are still kept in the row metadata. We can additionally remove this if we would like.
## class: SummarizedExperiment 
## dim: 62976 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(0):
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idxWith the SummarizedExperiment object, we can now create a new PBMExperiment object using the PBMExperiment(..) constructor. As with gpr2PBMExperiment(..), a PBMDesign can be specified as an optional parameter.
PBMExperiment(se)## class: PBMExperiment 
## dim: 62976 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(2): Sequence probeID
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(2): Sequence probeID
## probeFilter names(0): 
## probeTrim:PBMExperiment(se, pbmDesign = pbm_8x60k_v1)## class: PBMExperiment 
## dim: 62976 30 
## metadata(0):
## assays(2): fore back
## rownames: NULL
## rowData names(4): Column Row probeID Sequence
## colnames(30): s1 s2 ... s35 s36
## colData names(10): date version ... condition id_idx
## probeCols(4): Column Row probeID Sequence
## probeFilter names(2): probeID Sequence
## probeTrim: 1 36