How To Analyze Multi-RLF Merge Data In ROSALIND

By Praveer Sharma (Bay Area FAS, NanoString)

MultiRLF Merge data can be imported into Rosalind by exporting the normalized merged data from nSolver. This exported data differs from standard (i.e. non-merged) data in two ways. First, it lacks the synthetic ERCC controls entirely. Rosalind will not import data without the synthetic controls, so these must be added to the file before import. Second, as codesets are developed over time, the exact probe sequences are changed and refined, so different RLFs can have different probes for the same target, and these duplicates will be present in the normalized file exported from nSolver. Rosalind will not import data in which housekeepers are duplicated; it will import duplicated endogenous genes, but will not perform statistics on them properly (because of the way Rosalind reads the import file). The three concerns to address are therefore: 1) missing synthetic controls, 2) duplicated housekeepers, and 3) duplicated endogenous genes.

Perform the MultiRLF Merge in nSolver and export the normalized data, following the instructions starting on page 91 of the nSolver User Manual.
Copy the rows for the negative and positive controls, starting from the annotation (i.e. Negative or Positive) and including the labels such as NEG_A or POS_F, and insert them directly above the endogenous counts. These values can be taken from either of the original (i.e. pre-merge) panels. Keep in mind that Rosalind will perform QC on these values, so if any samples in the chosen panel failed QC, that failure will be carried over. Conversely, if a sample failed QC only in the other panel, that flag will not show up in Rosalind. It is at the user's discretion whether to keep samples that failed QC when importing into Rosalind. The result should look like this:
Locate duplicated genes. This can be done in Excel by selecting the gene names, choosing Conditional Formatting, then choosing Highlight Cells Rules -> Duplicate Values. The result should look like this:
Rosalind requires each housekeeping gene to be present in only one copy. In most cases, the normalized counts for a duplicated housekeeper will be roughly similar in both RLFs (less than a 2-fold difference). For such genes, the recommended approach is to create a new row containing the arithmetic mean of the counts in the duplicate rows, then delete the original rows. The result will be:
However, if the duplicate counts for a housekeeper are highly divergent, follow the steps suggested for endogenous genes.
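For readers who would rather script the duplicate search and the housekeeper averaging than do them by hand in Excel, the following is a minimal sketch, not an nSolver or Rosalind tool. It assumes a simplified, hypothetical table in which each row is a (class, name, counts) tuple; a real nSolver export contains additional header rows and annotation columns that would need to be parsed first. The class label "Housekeeping", the example gene names, and the choice to apply the 2-fold test per sample are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, simplified rows: (class, name, normalized counts per sample).
rows = [
    ("Endogenous", "GENE_A", [120.0, 95.0]),
    ("Endogenous", "GENE_A", [310.0, 40.0]),
    ("Endogenous", "GENE_B", [55.0, 60.0]),
    ("Housekeeping", "HK_1", [1000.0, 900.0]),
    ("Housekeeping", "HK_1", [1100.0, 950.0]),
]


def find_duplicates(rows):
    """Return {name: [row indices]} for every name that occurs more than once."""
    seen = defaultdict(list)
    for i, (_, name, _) in enumerate(rows):
        seen[name].append(i)
    return {name: idx for name, idx in seen.items() if len(idx) > 1}


def within_fold(copies, max_fold=2.0):
    """True if, in every sample, the highest copy is below max_fold times the lowest."""
    for values in zip(*copies):
        lo, hi = min(values), max(values)
        if lo <= 0 or hi / lo >= max_fold:
            return False
    return True


def merge_housekeepers(rows, max_fold=2.0):
    """Replace similar duplicated housekeeper rows with one row of per-sample means.

    Divergent duplicates are left untouched so they can be handled manually.
    """
    dups = find_duplicates(rows)
    out, handled = [], set()
    for cls, name, counts in rows:
        if cls != "Housekeeping" or name not in dups:
            out.append((cls, name, counts))
            continue
        if name in handled:
            continue  # this housekeeper was already processed at its first copy
        handled.add(name)
        copies = [rows[i][2] for i in dups[name] if rows[i][0] == "Housekeeping"]
        if len(copies) > 1 and within_fold(copies, max_fold):
            out.append((cls, name, [mean(v) for v in zip(*copies)]))
        else:
            out.extend((cls, name, c) for c in copies)
    return out


merged = merge_housekeepers(rows)
print(find_duplicates(rows))
print(merged[-1])
```

In this toy data the two HK_1 rows agree within 2-fold in both samples, so they collapse into a single row of means, while the duplicated endogenous GENE_A rows are reported but left unchanged.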
Duplicated endogenous genes can be addressed in a few ways. The most straightforward is to rename them so that the different copies can be tracked. The renaming can be as simple as appending _copy1 and _copy2 to the gene names, or it can incorporate the name of the panel each copy came from. To do the latter, first determine the NS Probe ID for each copy by opening the merged data in the table view in nSolver and noting the Probe IDs, then open the individual RLF data (raw or normalized) and check which Probe ID is associated with which RLF. We would thus convert:
to:
Alternatively, one copy can be deleted. However, this must be done carefully and with biological justification.
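The renaming approach for duplicated endogenous genes could also be scripted. The sketch below is an illustration only: the function name `rename_duplicates`, the example gene names, and the panel labels PanelX and PanelY are hypothetical, and the mapping from row position to panel would have to come from the Probe ID lookup in nSolver described above.

```python
from collections import Counter


def rename_duplicates(names, panel_by_index=None):
    """Append a suffix to every name that occurs more than once.

    By default the suffix is copy1, copy2, ... in order of appearance.
    If panel_by_index maps a row position to a panel label, that label
    is used as the suffix for that row instead.
    """
    totals = Counter(names)
    seen = Counter()
    renamed = []
    for i, name in enumerate(names):
        if totals[name] == 1:
            renamed.append(name)  # unique names are left unchanged
            continue
        seen[name] += 1
        if panel_by_index and i in panel_by_index:
            suffix = panel_by_index[i]
        else:
            suffix = f"copy{seen[name]}"
        renamed.append(f"{name}_{suffix}")
    return renamed


names = ["GENE_A", "GENE_B", "GENE_A", "GENE_C"]
print(rename_duplicates(names))
# ['GENE_A_copy1', 'GENE_B', 'GENE_A_copy2', 'GENE_C']
print(rename_duplicates(names, {0: "PanelX", 2: "PanelY"}))
# ['GENE_A_PanelX', 'GENE_B', 'GENE_A_PanelY', 'GENE_C']
```

Panel-based suffixes make it easier to trace a result back to a specific probe later, at the cost of the extra lookup step.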
Once these changes have been made, the saved .csv file can be imported into Rosalind (as a normalized file), and it should process correctly.
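Before importing, it may help to confirm that the three concerns listed at the top of this guide have been addressed. The following is a rough sketch under stated assumptions, not a reproduction of Rosalind's own validation: it works on a hypothetical list of (class, name) pairs taken from the edited file, and while the labels "Negative" and "Positive" follow the annotations mentioned above, "Housekeeping" and "Endogenous" are assumed labels that may differ in an actual export.

```python
from collections import Counter


def check_before_import(rows):
    """Return a list of problems, covering only the three concerns in this guide."""
    problems = []

    # Concern 1: the synthetic control rows must be present.
    classes = {cls for cls, _ in rows}
    for required in ("Negative", "Positive"):
        if required not in classes:
            problems.append(f"missing {required} control rows")

    # Concerns 2 and 3: no name should be repeated within a class.
    for cls in ("Housekeeping", "Endogenous"):
        counts = Counter(name for c, name in rows if c == cls)
        for name, n in sorted(counts.items()):
            if n > 1:
                problems.append(f"duplicated {cls} name: {name}")
    return problems


# Hypothetical example: Negative controls were forgotten and GENE_A is still duplicated.
example = [
    ("Positive", "POS_A"),
    ("Endogenous", "GENE_A"),
    ("Endogenous", "GENE_A"),
    ("Housekeeping", "HK_1"),
]
for problem in check_before_import(example):
    print(problem)
```

An empty list means only that these three checks passed; Rosalind performs its own QC on import.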