Seurat单细胞处理流程之二:数据质控

1.简介

单细胞RNA测序(scRNA-seq)数据质量控制(Quality Control, QC)是分析流程中的关键步骤。由于单细胞测序技术存在一定的噪音和技术误差,质控的目的是去除低质量细胞和异常数据,提高后续分析的可靠性和生物学意义。

2.数据读入

以pbmc示例数据为例:

rm(list = ls())
setwd("/mnt/DEV_8T/zhaozm/seurat全流程/数据质控")

##禁止转化为因子
options(stringsAsFactors = FALSE)
library(Seurat)
library(dplyr)
library(readr)
library(Matrix)
library(ggplot2)
library(patchwork)
library(ggplot2)
载入需要的程序包:SeuratObject

载入需要的程序包:sp


载入程序包:‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t



载入程序包:‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
#读入数据
pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")
## 创建seurat对象
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
#初步查看Seurat对象
pbmc
An object of class Seurat 
13714 features across 2700 samples within 1 assay 
Active assay: RNA (13714 features, 0 variable features)
 1 layer present: counts

3.质控数据及可视化

## 计算线粒体比例
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-") 
#人源的数据为MT,鼠源的需要换成mt
head(pbmc@meta.data,5)
A data.frame: 5 × 4
orig.identnCount_RNAnFeature_RNApercent.mt
<fct><dbl><int><dbl>
AAACATACAACCAC-1pbmc3k2419 7793.0177759
AAACATTGAGCTAC-1pbmc3k490313523.7935958
AAACATTGATCAGC-1pbmc3k314711290.8897363
AAACCGTGCTTCCG-1pbmc3k2639 9601.7430845
AAACCGTGTATGCG-1pbmc3k 980 5211.2244898
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”

质控过滤_7_1.png


plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
if(!require(patchwork))install.packages("patchwork")
CombinePlots(plots = list(plot1, plot2)) 
Warning message in CombinePlots(plots = list(plot1, plot2)):
“CombinePlots is being deprecated. Plots should now be combined using the patchwork system.”

质控过滤_8_1.png

## 过滤
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)   
ncol(as.data.frame(GetAssayData(pbmc, slot = "counts")))
## [1] 2638
Warning message:
“The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.”

2638

## 过滤之后的图片
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
if(!require(patchwork))install.packages("patchwork")
#CombinePlots这步需要你的绘图窗口足够大
CombinePlots(plots = list(plot1, plot2)) 

Warning message in CombinePlots(plots = list(plot1, plot2)):
“CombinePlots is being deprecated. Plots should now be combined using the patchwork system.”

质控过滤_10_1.png

VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3) 
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”

质控过滤_11_1.png