Title: | Finder of Rare Entities (FiRE) |
---|---|
Description: | The algorithm assigns a rareness/outlierness score to every sample in voluminous datasets. It makes multiple estimations of the proximity between a pair of samples in low-dimensional spaces. To compute proximity, FiRE uses Sketching, a variant of locality-sensitive hashing. For more details: Jindal, A., Gupta, P., Jayadeva and Sengupta, D., 2018. Discovery of rare cells from voluminous single cell expression data. Nature Communications, 9(1), p.4719. <doi:10.1038/s41467-018-07234-6>. |
Authors: | Prashant Gupta [aut, cre], Aashi Jindal [aut], Jayadeva [aut], Debarka Sengupta [aut] |
Maintainer: | Prashant Gupta <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-06 04:11:25 UTC |
Source: | https://github.com/cran/FiRE |
This package sorts samples by their rareness/outlierness. Instead of making dichotomized decisions, FiRE assigns a rareness/outlierness score to every sample. These scores can then be used to identify rare samples or outliers, with varying degrees of rareness. To compute the scores, FiRE takes multiple estimations of the proximity between a pair of samples in low-dimensional spaces.
FiRE is written in C++ and wrapped with Rcpp to create an R interface. To use FiRE, create an object with the number of estimators (L), the number of dimensions to be sampled per estimator (M), the number of bins per estimator (H = 1017881), the seed for the random number generator (seed = 0) and the verbose level (verbose = 0/1). Once the object is created, call the fit function to hash all the samples into bins. Once hashing is done, call the score function to retrieve the score for every sample. Sample commands may be seen in the Examples section. The resulting model has four parameters that may be accessed: dimensions (d) of size M sampled per estimator, thresholds (ths) of size M per d, weights (w) of size M generated per estimator, and bins (b) of size H per estimator.
Prashant Gupta, [email protected]
Aashi Jindal, [email protected]
Jayadeva, [email protected]
Debarka Sengupta, [email protected]
Maintainer: Prashant Gupta <[email protected]>
[1] Jindal, A., Gupta, P., Jayadeva and Sengupta, D., 2018. Discovery of rare cells from voluminous single cell expression data. Nature Communications, 9(1), p.4719.
[2] Wang, Z., Dong, W., Josephson, W., Lv, Q., Charikar, M. and Li, K., 2007, June. Sizing sketches: a rank-based analysis for similarity search. In ACM SIGMETRICS Performance Evaluation Review (Vol. 35, No. 1, pp. 157-168). ACM.

    ## L <- number of estimators
    ## M <- Number of dims to sample
    ## H <- Number of bins
    ## seed <- seed for random number generator
    ## verbose <- verbose level

    library('FiRE')

    data(sample_data) #Samples * Features
    data(sample_label)

    ## Samples with label '1' represent abundant,
    ## while samples with label '2' represent rare.

    ## Saving rownames and colnames
    rnames <- rownames(sample_data)
    cnames <- colnames(sample_data)

    ## Converting data.frame to matrix
    sample_data <- as.matrix(sample_data)
    sample_label <- as.matrix(sample_label)

    L <- 100 # Number of estimators
    M <- 50 # Dims to be sampled

    # Model creation without optional parameter
    model <- new(FiRE::FiRE, L, M)

    ## There are 3 more optional parameters they can be passed as
    ## model <- new(FiRE::FiRE, L, M, H, seed, verbose)

    ## Hashing all samples
    model$fit(sample_data)

    ## Computing score of each sample
    rareness_score <- model$score(sample_data)

    ## Apply IQR-based criteria to identify rare samples for further downstream analysis.
    q3 <- quantile(rareness_score, 0.75)
    iqr <- IQR(rareness_score)
    th <- q3 + (1.5*iqr)

    ## Select indexes that satisfy IQR-based thresholding criteria.
    indIqr <- which(rareness_score >= th)

    ## Create a vector for binary predictions
    predictions <- rep(1, dim(sample_data)[1])
    predictions[indIqr] <- 2 #Replace predictions for rare samples with '2'.

    ## To access model parameters
    ## Parameters - generated weights
    # model$w

    ## Parameters - sample dimensions
    # model$d

    ## Parameters - generated thresholds
    # model$ths

    ## Parameters - Bins
    # model$b

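The IQR-based thresholding step in the example above produces binary predictions that can be compared against known labels. The following is a minimal sketch of that comparison in base R. It runs on synthetic scores rather than on output from model$score, and flag_rare, toy_score and toy_label are hypothetical names introduced only for illustration; they are not part of FiRE.

```r
## Hypothetical helper (not part of FiRE): label a sample 2 (rare) when its
## score is at least Q3 + k * IQR, and 1 (abundant) otherwise.
flag_rare <- function(scores, k = 1.5) {
  th <- unname(quantile(scores, 0.75)) + k * IQR(scores)
  ifelse(scores >= th, 2L, 1L)
}

## Synthetic stand-ins for the output of model$score() and for sample_label.
set.seed(1)
toy_score <- c(rnorm(95, mean = 10, sd = 1), rnorm(5, mean = 30, sd = 1))
toy_label <- c(rep(1L, 95), rep(2L, 5))

toy_pred <- flag_rare(toy_score)

## Confusion table: rows are true labels, columns are predictions.
confusion <- table(truth = factor(toy_label, levels = 1:2),
                   predicted = factor(toy_pred, levels = 1:2))

## Precision, recall and F1 with respect to the rare class (label 2).
tp <- confusion["2", "2"]
precision <- tp / sum(confusion[, "2"])
recall <- tp / sum(confusion["2", ])
f1 <- 2 * precision * recall / (precision + recall)
```

With real data, toy_score would be replaced by rareness_score and toy_label by the labels in sample_label.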
Constructor to create an instance of the FiRE class. The constructor can be called in two ways: with only the required arguments (L, M), or with all arguments (L, M, H, seed, verbose).
L | Number of estimators. |
---|---|
M | Number of dimensions to be sampled per estimator. |
H | Number of bins. Default=1017881 |
seed | Seed for random number generator. Default=0 |
verbose | Verbose level. Default=1 |
For usage see example.

    L <- 100
    M <- 50
    H <- 107881
    seed <- 0
    verbose <- 1

    ## Creating class object with required arguments
    model <- new(FiRE::FiRE, L, M)

    ## Creating class object with all arguments
    model <- new(FiRE::FiRE, L, M, H, seed, verbose)

Hashes samples into bins, using Sketching as the hash function. This function repeats the hashing process L times, each time sampling M dimensions and hashing every sample into one of H bins.
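To make the hashing step more concrete, here is a conceptual sketch in base R. It is not the package's C++ implementation. The way thresholds and weights are drawn below is an assumption made for illustration, and sketch_hash is a hypothetical name. Only the overall shape (L estimators, M sampled dimensions with one threshold and one weight each, H bins) follows the description above.

```r
## Illustrative sketching hash in base R; not the package's C++ code.
sketch_hash <- function(X, L, M, H, seed = 0) {
  set.seed(seed)
  n <- nrow(X)
  bins <- matrix(0L, nrow = n, ncol = L)
  for (l in seq_len(L)) {
    ## Sample M dimensions (columns) for this estimator.
    d <- sample.int(ncol(X), M, replace = TRUE)
    sub <- X[, d, drop = FALSE]
    ## Assumption: one threshold per sampled dimension, drawn uniformly
    ## between that dimension's minimum and maximum.
    ths <- runif(M, min = apply(sub, 2, min), max = apply(sub, 2, max))
    ## Assumption: integer weights drawn uniformly from 0 .. H - 1.
    w <- sample.int(H, M, replace = TRUE) - 1L
    ## Threshold each value to a 0/1 bit, giving an n x M bit matrix.
    bits <- sweep(sub, 2, ths, FUN = ">") * 1
    ## Weighted sum of bits, reduced modulo H, gives a bin index in 1 .. H.
    bins[, l] <- as.integer((bits %*% w) %% H) + 1L
  }
  bins
}

## Toy data: 20 samples, 10 features; sample 2 duplicates sample 1.
set.seed(42)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
X[2, ] <- X[1, ]
bins <- sketch_hash(X, L = 5, M = 4, H = 101)
```

In this sketch, identical samples always land in the same bin for every estimator, while dissimilar samples tend to be separated by at least one threshold.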
data | On which rarity score needs to be computed. Required to be a |
For usage see example.
This function does not do any preprocessing, so preprocessed data must be passed if required.
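Since fit expects data that has already been preprocessed, a preprocessing step may be needed first. The sketch below shows one common choice for count data (scaling every sample to the median library size, then a log2 transform). This choice is an assumption for illustration and is not prescribed by the package; preprocess_counts and toy_counts are hypothetical names.

```r
## Hypothetical preprocessing helper; FiRE itself does not prescribe this.
preprocess_counts <- function(counts) {
  counts <- as.matrix(counts)      # samples in rows, features in columns
  lib_size <- rowSums(counts)      # total count per sample
  stopifnot(all(lib_size > 0))
  ## Scale every sample to the median library size, then log-transform.
  scaled <- counts / lib_size * median(lib_size)
  log2(scaled + 1)
}

## Made-up counts for 3 samples and 3 features.
toy_counts <- data.frame(g1 = c(10, 0, 5),
                         g2 = c(0, 20, 5),
                         g3 = c(10, 20, 10))
toy_data <- preprocess_counts(toy_counts)
```

The resulting matrix (here toy_data) is what would then be passed to model$fit.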

    ## Not run:
    ## Creating class object with required arguments
    model <- new(FiRE::FiRE, L, M)
    model$fit(data)

    ## End(Not run)

This data set has been generated from 293T and Jurkat cell data containing a total of ~3200 cells, with an almost equal number of representative transcriptomes of each type. The cells were mixed in vitro at equal proportions. Authors of the study resolved the cell types bioinformatically exploiting their SNV profiles. With this data, we mimicked the rare cell phenomenon by bioinformatically diluting Jurkat cell proportion to 2.5% in the data.
data(sample_data)
A matrix containing 1580 observations (in rows), with 1000 features per observation (in columns).
10xgenomics, https://support.10xgenomics.com/single-cell-gene-expression/datasets
Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
This data set contains labels for 1580 observations. There are two clusters: one is abundant (denoted by 1) and the other is rare (denoted by 2). The rare cluster consists of Jurkat cells, which form 2.5% of the total population. The abundant cluster consists of 293T cells.
data(sample_label)
A vector containing 1580 observations.
10xgenomics, https://support.10xgenomics.com/single-cell-gene-expression/datasets
Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Once hashing is done using fit, compute the score via proximity estimation.
data | On which rarity score needs to be computed. |
For usage see example.
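As a rough illustration of how bin occupancy can be turned into a rareness score, the sketch below sums, over estimators, -2 times the log of the fraction of samples that share a sample's bin. This follows our reading of the formulation in reference [1] and should be treated as an assumption, not as the package's exact computation. rareness_from_bins and toy_bins are hypothetical names.

```r
## Illustrative rareness score from bin assignments; an assumption-based
## sketch, not the package's implementation.
rareness_from_bins <- function(bins) {
  n <- nrow(bins)
  score <- numeric(n)
  for (l in seq_len(ncol(bins))) {
    ## Number of samples that fall into each bin for estimator l.
    occupancy <- table(bins[, l])
    ## Fraction of all samples sharing each sample's bin.
    p <- as.numeric(occupancy[as.character(bins[, l])]) / n
    ## Sparsely populated bins contribute a larger score.
    score <- score - 2 * log(p)
  }
  score
}

## Toy bin assignments for 6 samples and 2 estimators (made-up values).
toy_bins <- cbind(c(1, 1, 1, 1, 1, 7),
                  c(3, 3, 3, 3, 9, 9))
toy_score <- rareness_from_bins(toy_bins)
```

Here the sixth sample sits in sparsely populated bins under both estimators, so it receives the highest score.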

    ## Not run:
    ## Creating class object with required arguments
    model <- new(FiRE::FiRE, L, M)
    model$fit(data)
    score <- model$score(data)

    ## End(Not run)
