How is my Gene Expression nCounter data normalized in ROSALIND? How does ROSALIND calculate differential expression for my Gene Expression nCounter data?

Short answer: we use the same normalization methods as nSolver.

Gene Expression RCC Normalization

ROSALIND® follows the NanoString nCounter® Advanced Analysis protocol for data normalization of Gene Expression nCounter RCC Analysis.

Normalization for run-to-run and sample-to-sample variability is done by dividing counts within a lane by the geometric mean of the normalizer probes from the same lane. Normalizer probes are selected by the geNorm algorithm as implemented in the Bioconductor package NormqPCR. While the expression of a good housekeeping gene may vary between samples in non-normalized data, the ratio between the housekeepers should be stable. geNorm relies on the behavior of housekeepers rising and falling together to iteratively remove candidate housekeepers with the least stable expression relative to other candidates.
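As a rough illustration of the idea behind geNorm, the sketch below computes a stability value for each candidate housekeeper (the average standard deviation of its log2 ratios against every other candidate) and repeatedly drops the least stable one. It is not the NormqPCR implementation or ROSALIND's code. The function names, the `keep` stopping parameter, and the example counts are assumptions made for illustration; the source does not say how many normalizer probes are retained.

```python
import math
from statistics import stdev

def stability_values(expr, genes):
    """Return a stability value per gene: the mean standard deviation of its
    log2 expression ratios against each other candidate (lower = more stable)."""
    values = {}
    for j in genes:
        sds = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            sds.append(stdev(log_ratios))
        values[j] = sum(sds) / len(sds)
    return values

def select_housekeepers(expr, keep=2):
    """Iteratively remove the least stable candidate until `keep` remain.
    `keep` is a hypothetical stopping rule chosen for this sketch."""
    genes = list(expr)
    while len(genes) > keep:
        values = stability_values(expr, genes)
        worst = max(genes, key=values.get)  # highest value = least stable
        genes.remove(worst)
    return genes

# Made-up counts across three samples. HK1, HK2 and HK3 rise and fall together,
# so their ratios are constant; NOISY does not track them.
expr = {
    "HK1": [100, 200, 400],
    "HK2": [50, 100, 200],
    "HK3": [80, 160, 320],
    "NOISY": [100, 50, 400],
}
selected = select_housekeepers(expr, keep=3)
```

In this example the raw counts of HK1, HK2 and HK3 differ fourfold between samples, yet their ratios to one another never change, so the candidate that breaks the pattern is the one removed.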

For scenarios where a user has more than one lot of the same panel, or for PlexSet data, the user can define reference or calibration samples in ROSALIND during experiment setup. These samples are used to quantify and adjust for variability in probe efficiency across batches or lanes. Calibration factors are calculated on a per-lot basis and are multiplied across all probes in that lot.

How does ROSALIND calculate differential expression for my data?

Differential gene expression analysis in ROSALIND is currently run with the “Optimal” method, one of the two methods defined by NanoString and available in nSolver Advanced Analysis. To maintain consistency with the nSolver Advanced Analysis default settings, which have been used for many years, we are converting the default in ROSALIND to the “Fast” method.

The Optimal method for calculating differential expression was initially designed to manage special cases for low-count data by using an additional component in the model to estimate noise dispersion. After extensive use, the Fast method has been demonstrated to successfully analyze a wide array of datasets, including low-count data, and has been adopted as the default and more widely used method. With this history of use, we have concluded that the estimation of noise dispersion in the Optimal method is an unnecessary addition for the analysis of low-count data.

For Gene Expression nCounter data, ROSALIND follows the nCounter® Advanced Analysis protocol to identify the targets that show significantly increased or decreased expression. Differential expression is calculated based on user-specified groups. In ROSALIND, users can set up comparisons based on sample attributes or by selecting specific samples for each comparison of interest. P-value adjustment is performed using the Benjamini-Hochberg method of estimating false discovery rates (FDR).
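For readers unfamiliar with the Benjamini-Hochberg adjustment, the following minimal sketch shows the standard calculation on a list of raw p-values. It is an illustration of the published procedure, not ROSALIND's code; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four targets.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, which is why a target with a small raw p-value can still end up with an adjusted p-value above the chosen FDR threshold when many targets are tested.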

ROSALIND uses the same procedure as nSolver to generate normalization factors:

1. Calculate the geometric mean of the selected probes in each lane.

2. Calculate the arithmetic mean of these geometric means for all sample lanes.

3. Divide the arithmetic mean by the geometric mean of each lane to generate a lane-specific normalization factor.

4. Multiply the counts for every probe in a lane by that lane's normalization factor.
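The four steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not ROSALIND's or nSolver's implementation: the function names, the dictionary layout, the probe names, and the counts are all hypothetical, and the selected probes are assumed to have strictly positive counts.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

def normalize_lanes(counts, selected_probes):
    """counts maps lane -> {probe: raw count}.
    Returns (lane-specific factors, normalized counts)."""
    # Step 1: geometric mean of the selected probes in each lane.
    lane_geomeans = {
        lane: geometric_mean([probes[p] for p in selected_probes])
        for lane, probes in counts.items()
    }
    # Step 2: arithmetic mean of those geometric means across all lanes.
    overall_mean = sum(lane_geomeans.values()) / len(lane_geomeans)
    # Step 3: lane-specific factor = arithmetic mean / lane geometric mean.
    factors = {lane: overall_mean / gm for lane, gm in lane_geomeans.items()}
    # Step 4: multiply every probe count in a lane by that lane's factor.
    normalized = {
        lane: {probe: count * factors[lane] for probe, count in probes.items()}
        for lane, probes in counts.items()
    }
    return factors, normalized

# Made-up counts for two lanes; HK1 and HK2 stand in for the selected probes.
counts = {
    "lane1": {"HK1": 100, "HK2": 400, "GENE_A": 50},
    "lane2": {"HK1": 200, "HK2": 800, "GENE_A": 60},
}
factors, normalized = normalize_lanes(counts, ["HK1", "HK2"])
```

Here the geometric means are 200 for lane1 and 400 for lane2, so the arithmetic mean is 300 and the factors are 1.5 and 0.75. After normalization the selected probes have the same geometric mean in both lanes, and GENE_A becomes 75 in lane1 and 45 in lane2.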