Title: | Fast, Sensitive, and Accurate Integration of Single Cell Data |
---|---|
Description: | Implementation of the Harmony algorithm for single-cell integration, described in Korsunsky et al. <doi:10.1038/s41592-019-0619-0>. The package includes a standalone Harmony function and interfaces to external frameworks. |
Authors: | Ilya Korsunsky [cre, aut] , Martin Hemberg [aut] , Nikolaos Patikas [aut, ctb] , Hongcheng Yao [aut, ctb] , Nghia Millard [aut] , Jean Fan [aut, ctb] , Kamil Slowikowski [aut, ctb] , Miles Smith [ctb], Soumya Raychaudhuri [aut] |
Maintainer: | Ilya Korsunsky <[email protected]> |
License: | GPL-3 |
Version: | 1.2.3 |
Built: | 2024-11-11 22:55:46 UTC |
Source: | https://github.com/immunogenomics/harmony |
List of metadata table and scaled PCs matrix
cell_lines
meta_data: data.table of 9478 rows (cells), with columns defining dataset and cell_type
scaled_pcs: data.table of 9478 rows (cells) and 20 columns (PCs)
Same as cell_lines but smaller (300 cells).
cell_lines_small
An object of class list of length 2.
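To experiment without the package data, a toy list can mimic the layout described above: a per-cell metadata table plus a cells x PCs matrix. This is a minimal sketch under stated assumptions: it uses a base data.frame in place of data.table, hypothetical batch and cell-type labels, and a hypothetical helper orient_cells_as_rows() that checks which dimension of an embedding matrix matches nrow(meta_data). It illustrates the row alignment that the data_mat orientation inference in RunHarmony.default() relies on; it is not the package's own code.

```r
set.seed(1)

# Hypothetical toy stand-in with the same two-element layout as cell_lines_small
toy <- list(
  meta_data = data.frame(
    dataset   = rep(c("batch1", "batch2"), each = 5),
    cell_type = rep(c("typeA", "typeB"), times = 5)
  ),
  scaled_pcs = matrix(rnorm(10 * 3), nrow = 10, ncol = 3,
                      dimnames = list(NULL, paste0("PC", 1:3)))
)

# Hypothetical helper: return embeddings with cells as rows, deciding the
# orientation by matching a dimension against the number of metadata rows
orient_cells_as_rows <- function(emb, meta_data) {
  n <- nrow(meta_data)
  if (nrow(emb) == n) return(emb)
  if (ncol(emb) == n) return(t(emb))
  stop("no dimension of the embedding matrix matches nrow(meta_data)")
}

# A transposed (PCs x cells) input is flipped back to cells x PCs
emb <- orient_cells_as_rows(t(toy$scaled_pcs), toy$meta_data)
```

Matching on nrow(meta_data) only works when the number of cells differs from the number of PCs, which is the usual situation for real data.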
Algorithm for single cell integration.
Use ?RunHarmony to run Harmony on a cell embeddings matrix, a Seurat object, or a SingleCellExperiment object.
Report bugs at https://github.com/immunogenomics/harmony/issues
Read the manuscript doi:10.1038/s41592-019-0619-0
Set advanced parameters for RunHarmony
harmony_options( alpha = 0.2, tau = 0, block.size = 0.05, max.iter.cluster = 20, epsilon.cluster = 0.001, epsilon.harmony = 0.01 )
alpha |
When lambda = NULL (lambda estimation mode), lambda is determined from the expected number of cells, assuming independence between batches and clusters, i.e., lambda = alpha * expected number of cells. Default 0.2; alpha must satisfy 0 < alpha < 1. |
tau |
Protection against overclustering small datasets with large ones. 'tau' is the expected number of cells per cluster. |
block.size |
What proportion of cells to update during clustering, between 0 and 1. Default 0.05. Larger values may be faster but less accurate. |
max.iter.cluster |
Maximum number of rounds to run clustering at each round of Harmony. |
epsilon.cluster |
Convergence tolerance for clustering round of Harmony. Set to -Inf to never stop early. |
epsilon.harmony |
Convergence tolerance for Harmony. Set to -Inf to never stop early. When 'epsilon.harmony' is not NULL, any user-supplied value of 'early_stop' is ignored. |
Returns a list to pass to the '.options' argument of 'RunHarmony'.
## To set max.iter.cluster to 100:
## Not run:
RunHarmony(data_meta, meta_data, vars_use,
           .options = harmony_options(max.iter.cluster = 100))
## End(Not run)
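The lambda estimation rule described under alpha can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes a K x N soft cluster assignment matrix R (clusters by cells) and a single batch variable, takes the expected count for cluster k and batch b as (soft size of cluster k) x (fraction of cells in batch b), and returns alpha times that K x B table. The function name estimate_lambda is hypothetical.

```r
# Hypothetical helper: lambda = alpha * expected cells per (cluster, batch),
# assuming batches and clusters are independent.
estimate_lambda <- function(R, batch, alpha = 0.2) {
  stopifnot(alpha > 0, alpha < 1, ncol(R) == length(batch))
  batch <- factor(batch)
  pr_b <- as.numeric(table(batch)) / length(batch)  # fraction of cells per batch
  n_k <- rowSums(R)                                  # soft size of each cluster
  E <- outer(n_k, pr_b)                              # expected cells, K x B
  colnames(E) <- levels(batch)
  alpha * E
}

# Toy example: 6 cells (columns), 2 clusters (rows), 2 batches
R <- matrix(c(0.9, 0.1,  0.8, 0.2,  0.7, 0.3,
              0.6, 0.4,  0.2, 0.8,  0.2, 0.8), nrow = 2)
batch <- c("a", "a", "a", "a", "b", "b")
lam <- estimate_lambda(R, batch)
```

Because lambda scales with the expected count in this sketch, larger clusters and larger batches receive proportionally stronger ridge penalties.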
Deprecated; use RunHarmony() instead. HarmonyMatrix() keeps the old function name for backward compatibility with version 0 of harmony; however, its API is not backward compatible with version 0. This function will be removed in a later version of Harmony.
HarmonyMatrix(...)
... |
Arguments passed on to RunHarmony(). |
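A compatibility shim like this follows a common R pattern: the old name signals a deprecation warning and then forwards `...` to the new function. Because every argument is passed through unchanged, the old name accepts only the new signature, which is why such a wrapper keeps the name but not the old API. The sketch below uses hypothetical function names (run_new, run_old) and base R's .Deprecated(); it is an illustration, not the package's source.

```r
# Hypothetical current API
run_new <- function(data_mat, scale = 1) data_mat * scale

# Hypothetical old name kept for compatibility: it signals a deprecation
# warning, then forwards all arguments unchanged to the new function.
run_old <- function(...) {
  .Deprecated("run_new")
  run_new(...)
}

# Same result as run_new(2, scale = 3); the warning is silenced here
res <- suppressWarnings(run_old(2, scale = 3))
```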
Utility function to get ridge regression coefficients from trained Harmony object
moe_ridge_get_betas(harmonyObj)
harmonyObj |
Trained harmony object. Obtain it by running RunHarmony() with return_object = TRUE. |
Returns nothing; modifies the object in place.
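The coefficients behind moe_ridge_get_betas() come from a cluster-weighted ridge regression. As a rough base-R sketch, following our reading of the formulation in the Harmony manuscript (one cluster k at a time, an unpenalized intercept, cells weighted by their soft membership r_k), the betas for one cluster solve (Phi1 diag(r_k) Phi1^T + Lambda) W = Phi1 diag(r_k) Z^T, where Phi1 is the one-hot batch design with an intercept row prepended. The helper name ridge_betas_one_cluster and the toy data are hypothetical, and the unpenalized intercept is an assumption.

```r
# Hypothetical helper: ridge coefficients for a single cluster k.
#   Z   : d x N embeddings (features x cells)
#   phi : B x N one-hot batch design (batches x cells)
#   r_k : length-N soft membership of each cell in cluster k
ridge_betas_one_cluster <- function(Z, phi, r_k, lambda = 1) {
  phi1 <- rbind(intercept = 1, phi)                  # prepend an intercept row
  Lambda <- diag(c(0, rep(lambda, nrow(phi))))       # intercept is not penalized
  phi_r <- phi1 %*% diag(r_k)                        # weight cells by membership
  solve(phi_r %*% t(phi1) + Lambda, phi_r %*% t(Z))  # (B + 1) x d betas
}

# Toy example: one feature, two cells per batch, batch b shifted by +2
Z   <- matrix(c(0, 0, 2, 2), nrow = 1)
phi <- rbind(a = c(1, 1, 0, 0), b = c(0, 0, 1, 1))
W   <- ridge_betas_one_cluster(Z, phi, r_k = rep(1, 4), lambda = 1)

# Removing the batch terms (not the intercept) gives corrected embeddings
Z_corr <- Z - t(W[-1, , drop = FALSE]) %*% phi
```

With lambda = 1 the batch offset of 2 is only partly removed (the batches end up at 2/3 and 4/3). Larger lambda values shrink the batch coefficients further toward zero, so less of the offset is removed, which matches the description of lambda as protection against overcorrection.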
Gene expression data of control PBMCs from Kang et al. (2017). Contains a sample of 1000 cells from the control condition and is used in the Seurat vignette.
pbmc.ctrl
An object of class dgCMatrix with 9015 rows and 1000 columns.
Gene expression data of stimulated PBMCs from Kang et al. (2017). Contains a sample of 1000 cells from the stimulated condition and is used in the Seurat vignette.
pbmc.stim
An object of class dgCMatrix with 9015 rows and 1000 columns.
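Because pbmc.ctrl and pbmc.stim are separate genes x cells dgCMatrix objects with the same number of rows, a typical first step before integration is to combine them column-wise and record which condition each cell came from. The sketch below does this with the Matrix package on tiny hypothetical stand-ins; the column name stim and the labels CTRL/STIM are assumptions, and normalization, PCA, and any Seurat steps are not shown.

```r
library(Matrix)

# Hypothetical tiny stand-ins for pbmc.ctrl and pbmc.stim (genes x cells)
genes <- paste0("gene", 1:3)
ctrl <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 1, 2), dims = c(3, 2),
                     dimnames = list(genes, paste0("ctrl_", 1:2)))
stim <- sparseMatrix(i = c(2, 3), j = c(1, 3), x = c(5, 3), dims = c(3, 3),
                     dimnames = list(genes, paste0("stim_", 1:3)))

# Both matrices must list the same genes in the same order before combining
stopifnot(identical(rownames(ctrl), rownames(stim)))
counts <- cbind(ctrl, stim)  # stays sparse: 3 genes x 5 cells

# One metadata row per cell; "stim" is an assumed name for the batch column
meta_data <- data.frame(
  stim = rep(c("CTRL", "STIM"), times = c(ncol(ctrl), ncol(stim))),
  row.names = colnames(counts)
)
```

A per-cell table like meta_data is the kind of input whose batch column would later be named in vars_use (or group.by.vars for the object interfaces).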
RunHarmony is a generic function that runs the main Harmony algorithm. If working with single-cell R objects, please refer to the documentation of the appropriate method (RunHarmony.Seurat() or RunHarmony.SingleCellExperiment()). Users working with other forms of cell embeddings can pass them directly to Harmony using the RunHarmony.default() method. All the function arguments listed here are common to all RunHarmony interfaces.
RunHarmony(...)
... |
Arguments passed on to RunHarmony.default(). |
If used with single-cell objects, it returns the updated single-cell object. For standalone operation, it returns the corrected cell embeddings or the R6 harmony object (see RunHarmony.default()).
Other RunHarmony: RunHarmony.Seurat(), RunHarmony.SingleCellExperiment(), RunHarmony.default()
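The routing described above is ordinary S3 dispatch. The toy below shows the pattern with hypothetical names: a generic calls UseMethod(), the default method works on a bare matrix and returns corrected embeddings, and a container method pulls embeddings out, delegates to the default method, and returns the updated container with the result stored under a new name, much as reduction.save does. Per-batch mean centering stands in for the correction step; it is not Harmony.

```r
# Hypothetical generic mirroring the RunHarmony dispatch pattern
toy_correct <- function(object, ...) UseMethod("toy_correct")

# Matrix path (cells x features): returns corrected embeddings.
# Per-batch mean centering is a stand-in here; it is NOT Harmony.
toy_correct.default <- function(object, batch, ...) {
  for (b in unique(batch)) {
    idx <- batch == b
    object[idx, ] <- sweep(object[idx, , drop = FALSE], 2,
                           colMeans(object[idx, , drop = FALSE]))
  }
  object
}

# Container path: extract embeddings, delegate to the default method,
# and store the result under a new name (cf. reduction.save)
toy_correct.toy_sc <- function(object, batch_col, reduction.save = "corrected", ...) {
  emb <- object$reductions$pca
  object$reductions[[reduction.save]] <-
    toy_correct(emb, batch = object$meta[[batch_col]], ...)
  object
}

sc <- structure(
  list(reductions = list(pca = matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)),
       meta = data.frame(donor = c("a", "a", "b", "b"))),
  class = "toy_sc"
)
sc <- toy_correct(sc, "donor")  # dispatches to toy_correct.toy_sc
```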
Use this method with a cell embeddings matrix, a metadata table, and one or more categorical covariates to run the Harmony algorithm directly on a cell embedding matrix.
## Default S3 method:
RunHarmony(
  data_mat,
  meta_data,
  vars_use,
  theta = NULL,
  sigma = 0.1,
  lambda = 1,
  nclust = NULL,
  max_iter = 10,
  early_stop = TRUE,
  ncores = 1,
  plot_convergence = FALSE,
  return_object = FALSE,
  verbose = TRUE,
  .options = harmony_options(),
  ...
)
data_mat |
Matrix of cell embeddings. Cells can be either rows or columns; the orientation is inferred by matching against the number of rows of meta_data. |
meta_data |
Either (1) a data frame with the variables to integrate or (2) a vector of labels. |
vars_use |
If meta_data is a data frame, a character vector naming the variable(s) whose effects to remove. |
theta |
Diversity clustering penalty parameter. Specify one value for each variable in vars_use. Default theta=2. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters. |
sigma |
Width of soft k-means clusters. Default sigma=0.1. Sigma scales the distance from a cell to cluster centroids. Larger values of sigma result in cells being assigned to more clusters. Smaller values of sigma make soft k-means clustering approach hard clustering. |
lambda |
Ridge regression penalty. Default lambda=1. Bigger values protect against overcorrection. If several covariates are specified, lambda can also be a vector whose length equals the number of variables to be corrected; each covariate's level group is then assigned the corresponding user-specified scalar. If set to NULL, Harmony enters lambda estimation mode, determining lambdas automatically to try to minimize overcorrection (use with caution; still in beta testing). |
nclust |
Number of clusters in the model. nclust=1 is equivalent to simple linear regression. |
max_iter |
Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step. |
early_stop |
Enable early stopping for Harmony. The harmonization process stops when the change in the objective function between corrections drops below 1e-4. |
ncores |
Number of processors to use for math operations when an optimized BLAS is available. If the BLAS does not support multithreading, this option has no effect. By default, ncores=1, which runs as a single-threaded process. Although Harmony supports multiple cores, it is not optimized for multithreading; increase this number for large datasets only if single-core performance is not adequate. |
plot_convergence |
Whether to print the convergence plot of the clustering objective function. TRUE to plot, FALSE to suppress. This can be useful for debugging. |
return_object |
(Advanced Usage) Whether to return the Harmony object or only the corrected PCA embeddings. |
verbose |
Whether to print progress messages. TRUE to print, FALSE to suppress. |
.options |
Setting advanced parameters of RunHarmony. This must be the result from a call to 'harmony_options'. See ?'harmony_options' for parameters not listed above and more details. |
... |
Other parameters that are not part of the API. |
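The roles of sigma and theta can be illustrated with a simplified one-shot soft k-means assignment. This is a sketch under stated assumptions, not Harmony's clustering step: it assumes L2-normalized embeddings (so the distance is the cosine distance 2(1 - y'z)), a single one-hot batch design phi, and a diversity penalty of the form (E / (O + 1))^theta, where O and E are the observed and expected cells per cluster and batch. Harmony itself updates assignments in blocks and iterates; the function names soft_assign and l2_normalize are hypothetical.

```r
normalize_cols <- function(M) sweep(M, 2, colSums(M), "/")
l2_normalize   <- function(M) sweep(M, 2, sqrt(colSums(M^2)), "/")

# Hypothetical sketch of one soft k-means assignment step.
#   Z   : d x N embeddings, columns L2-normalized
#   Y   : d x K centroids, columns L2-normalized
#   phi : optional B x N one-hot batch design
soft_assign <- function(Z, Y, sigma = 0.1, phi = NULL, theta = 0) {
  dist <- 2 * (1 - crossprod(Y, Z))               # K x N cosine distances
  dist <- sweep(dist, 2, apply(dist, 2, min))     # shift per cell for numerical stability
  R <- normalize_cols(exp(-dist / sigma))         # smaller sigma -> harder assignments
  if (!is.null(phi) && theta > 0) {
    O <- R %*% t(phi)                                 # K x B observed (soft) counts
    E <- outer(rowSums(R), rowSums(phi) / ncol(phi))  # K x B expected counts
    penalty <- ((E / (O + 1))^theta) %*% phi          # K x N, one value per cell's batch
    R <- normalize_cols(R * penalty)              # down-weight batch-dominated clusters
  }
  R                                               # each cell's memberships sum to 1
}

Y <- diag(2)                                                  # two centroids
Z <- l2_normalize(matrix(c(1, 0.2, 0.3, 1, 1, 1), nrow = 2))  # three cells
R_soft <- soft_assign(Z, Y, sigma = 1)
R_hard <- soft_assign(Z, Y, sigma = 0.01)
R_div  <- soft_assign(Z, Y, sigma = 1, phi = rbind(c(1, 1, 0), c(0, 0, 1)), theta = 2)
```

With sigma = 1 the first cell keeps noticeable membership in the farther cluster, while sigma = 0.01 assigns it almost entirely to the nearer centroid, in line with the description of sigma above.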
By default, matrix with corrected PCA embeddings. If return_object is TRUE, returns the full Harmony object (R6 reference class type).
Other RunHarmony: RunHarmony.Seurat(), RunHarmony.SingleCellExperiment(), RunHarmony()
## By default, Harmony takes a cell embedding matrix as input
## Not run:
harmony_embeddings <- RunHarmony(cell_embeddings, meta_data, 'dataset')
## End(Not run)

## If PCA is the input, the PCs need to be scaled
data(cell_lines_small)
pca_matrix <- cell_lines_small$scaled_pcs
meta_data <- cell_lines_small$meta_data
harmony_embeddings <- RunHarmony(pca_matrix, meta_data, 'dataset')

## Output is a matrix of corrected PC embeddings
dim(harmony_embeddings)
harmony_embeddings[seq_len(5), seq_len(5)]

## Finally, we can return an object with all the underlying data structures
harmony_object <- RunHarmony(pca_matrix, meta_data, 'dataset', return_object=TRUE)
dim(harmony_object$Y)       ## cluster centroids
dim(harmony_object$R)       ## soft cluster assignment
dim(harmony_object$Z_corr)  ## corrected PCA embeddings
head(harmony_object$O)      ## batch by cluster co-occurrence matrix
Applies Harmony to the cell embeddings of a Seurat object.
## S3 method for class 'Seurat'
RunHarmony(
  object,
  group.by.vars,
  reduction.use = "pca",
  dims.use = NULL,
  reduction.save = "harmony",
  project.dim = TRUE,
  ...
)
object |
The Seurat object. It must have the cell embeddings named by reduction.use precomputed. |
group.by.vars |
The name(s) of the covariate(s) whose effects Harmony will remove from the data. |
reduction.use |
Name of dimension reduction to use. Default is pca. |
dims.use |
Indices of the cell embedding features to use. |
reduction.save |
The name of the new reduction slot that Harmony creates. Default "harmony". |
project.dim |
Project dimension reduction loadings. Default TRUE. |
... |
Arguments passed on to RunHarmony.default(). |
Seurat object. The Harmony dimensions are placed into a new slot of the Seurat object named by reduction.save. For downstream Seurat analyses, use reduction='harmony' (the default value of reduction.save).
Other RunHarmony: RunHarmony.SingleCellExperiment(), RunHarmony.default(), RunHarmony()
## Not run:
## seu is a Seurat single-cell R object
seu <- RunHarmony(seu, "donor_id")
## End(Not run)
Applies harmony on PCA cell embeddings of a SingleCellExperiment.
## S3 method for class 'SingleCellExperiment'
RunHarmony(
  object,
  group.by.vars,
  dims.use = NULL,
  verbose = TRUE,
  reduction.save = "HARMONY",
  ...
)
object |
SingleCellExperiment with the PCA reducedDim cell embeddings populated |
group.by.vars |
The name(s) of the covariate(s) whose effects Harmony will remove from the data. |
dims.use |
A vector of indices selecting which cell embedding features to use. |
verbose |
Whether to enable verbose progress messages. |
reduction.save |
The name of the new reducedDim slot that Harmony creates. Default "HARMONY". |
... |
Arguments passed on to RunHarmony.default(). |
SingleCellExperiment object. After running RunHarmony, the corrected cell embeddings can be accessed with reducedDim(object, "HARMONY"), or with the name given in reduction.save.
Other RunHarmony: RunHarmony.Seurat(), RunHarmony.default(), RunHarmony()
## Not run:
## sce is a SingleCellExperiment R object
sce <- RunHarmony(sce, "donor_id")
## End(Not run)