When to perform SNP-analysis
To compare closely related genomes, it is common to analyze SNPs (Single Nucleotide Polymorphisms) - sites in the genome which differ between two or more genomes. This method works fairly well for comparing, e.g., human genomes; for microbes, this analysis is typically restricted to highly similar genomes or subsets of genomes. SNP-analysis can also be use to separate cells from different donors.
Citing
Bascet wraps cellsnp-lite for SNP-calling, so please cite this paper if you use it:
Xianjie Huang, Yuanhua Huang, Cellsnp-lite: an efficient tool for genotyping single cells, Bioinformatics, Volume 37, Issue 23, December 2021, Pages 4569–4571, https://doi.org/10.1093/bioinformatics/btab358
Alignment
SNP-analysis relies on having data aligned to a reference, to which SNPs (differences) are called. First follow the alignment workflow to generate the needed BAM-files.
SNP-calling
You can now check how each cells genome deviate from the reference and obtain SNPs. This is done by calling the integrated CellSNP-lite. The input data must be position sorted reads.
BascetRunCellSNP(
bascetRoot,
inputName="myref_aligned",
numLocalThreads=5,
runner=SlurmRunner(bascet_runner.default, ncpu="5")
)Loading the count matrix
The heavy work is now over. You can load the count matrix like this:
# get aligned counts
cnt_myref <- ReadBascetCountMatrix(bascetRoot,"cnt_myref")
# Compute fraction aligned
mapped_reads <- Matrix::rowSums(cnt_myref@X)
cnt_myref@obs$tot_reads <- as.double(cnt_myref@obs$unclassified_reads + mapped_reads)
cnt_myref@obs$frac_mapped <- as.double(mapped_reads / cnt_myref@obs$tot_reads)Subsetting cells
If you do microbial analysis and only want cells that are similar enough for comparision, you can filter out other cells using the following procedure.
First produce a kneeplot of how much cells deviate SNP-wise:
### Knee plot of alignment
df <- data.frame(
frac_mapped = sort(cnt_myref$obs$frac_mapped, decreasing = TRUE),
index = 1:nrow(cnt_myref$obs)
)
ggplot(df, aes(index, frac_mapped)) +
geom_line() +
scale_x_log10() +
theme_bw()You can then subset as follows:
cnt_myref <- cnt_myref[cnt_myref$obs$frac_mapped>0.8,]Clustering by SNPs
You can produce UMAPs and clusters based on the SNP matrix. Load the CellSNP-lite counts as follows:
snp_ad <- ReadCellSNPmatrix(
bascetRoot,
"cellsnp"
)Create a Seurat object from the counts:
adata <- CreateSeuratObject(counts = t(snp_ad), project = "adata3k", min.cells = 3, min.features = 0)
#transfer count metadata (optional)
adata$frac_mapped <- cnt_mutans$obs[colnames(adata),]$frac_mapped
adata$tot_reads <- cnt_mutans$obs[colnames(adata),]$tot_readsDo the usual transformations and clustering (another option is to normalize by sequencing depth of cell, not SNP count; or run ATAC-seq-like normalization. Best practices are yet to be established):
### Normalize data, PCA etc
adata <- NormalizeData(adata, normalization.method = "LogNormalize", scale.factor = 10000)
adata <- NormalizeData(adata)
adata <- FindVariableFeatures(adata, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(adata)
adata <- ScaleData(adata, features = all.genes)
adata <- RunPCA(adata, features = VariableFeatures(object = adata))
adata <- FindNeighbors(adata, dims = 1:10)
adata <- FindClusters(adata, resolution = 0.5)
adata <- RunUMAP(adata, dims = 1:10)
adata$snp_count <- rowSums(adata@assays$RNA@layers$counts)Do some quick plots to check for clusters:
DimPlot(adata, reduction = "umap")
FeaturePlot(adata, reduction = "umap", features = c("snp_count"))
FeaturePlot(adata, reduction = "umap", features = c("frac_mapped"))
FeaturePlot(adata, reduction = "umap", features = c("tot_reads"))Find differential SNPs using the Seurat marker tools
adata.markers <- FindAllMarkers(adata, only.pos = TRUE)
adata.markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC > 2) %>%
slice_head(n = 100) %>%
ungroup() -> top10
as.data.frame(top10)Plot SNPs to your liking
FeaturePlot(adata, reduction = "umap", features = c("NZ-CP077404.1-7148-T-to-C"))That’s it!
SNP-based deconvolution of cells from different donors
The output files from SNP-analysis are directly compatible with Vireo. See their tutorials for how to perform this analysis.