Differential expression and GO enrichment analysis¶
This tutorial demonstrates how to perform Differential Expression (DE) analysis and Gene Ontology (GO) enrichment analysis on the high-definition clustering results obtained from SpaHDmap. We use the results from Analysis of multi-section ST data tutorial as an illustrative example.
1. Import necessary libraries¶
First, we load the necessary R libraries, including Seurat and clusterProfiler for data analysis.
[1]:
library(Seurat)
library(ggplot2)
library(clusterProfiler)
library(org.Hs.eg.db)
Warning message:
"package 'Seurat' was built under R version 4.4.3"
Loading required package: SeuratObject
Loading required package: sp
Warning message:
"package 'sp' was built under R version 4.4.3"
Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':
intersect, t
clusterProfiler v4.12.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
Please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan,
X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141
Attaching package: 'clusterProfiler'
The following object is masked from 'package:stats':
filter
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following object is masked from 'package:SeuratObject':
intersect
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
tapply, union, unique, unsplit, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following object is masked from 'package:clusterProfiler':
rename
The following object is masked from 'package:utils':
findMatches
The following objects are masked from 'package:base':
expand.grid, I, unname
Attaching package: 'IRanges'
The following object is masked from 'package:clusterProfiler':
slice
The following object is masked from 'package:sp':
%over%
The following object is masked from 'package:grDevices':
windows
Attaching package: 'AnnotationDbi'
The following object is masked from 'package:clusterProfiler':
select
2. Load the data and pre-process¶
We will load the data from multiple samples, normalize it, and assign cluster labels based on the provided metadata. The threshold variable is used to filter out spots with low confidence in their clustering.
Note that the spot’s clustering metadata is processed by average the pixel cluster labels in this spot.
Load and process expression data¶
[2]:
# Set the threshold for label assignment
threshold <- 0.7
[3]:
# Load data from multiple samples and perform normalization
data_list <- list()
for (sample in c("Palbocilib_A","Palbocilib_B","Control_C","Control_D")){
data <- Load10X_Spatial(data.dir = paste0("data/",sample), filename = "raw_feature_bc_matrix.h5")
data <- NormalizeData(data)
# Load clustering metadata
clustering <- read.csv(paste0("data/", sample, "/", sample, "_meta.csv"), row.names = 1)
label <- apply(clustering,1,function(x){which.max(x)[1]})
label[apply(clustering,1,max)<=threshold] <- NA
data <- AddMetaData(data,label,"cluster")
data_list[[sample]] <- data
}
Warning message:
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
Normalizing layer: counts
Warning message:
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
Normalizing layer: counts
Warning message:
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
Normalizing layer: counts
Warning message:
"Feature names cannot have underscores ('_'), replacing with dashes ('-')"
Normalizing layer: counts
[14]:
# Merge the data from all samples into a single Seurat object
data <- scCustomize::Merge_Seurat_List(
data_list,
add.cell.ids = names(data_list),
merge.data = TRUE,
project = "SeuratProject"
)
data <- JoinLayers(data)
Warning message:
"Key 'slice1_' taken, using 'slice12_' instead"
Warning message:
"Key 'slice1_' taken, using 'slice12_' instead"
Warning message:
"Key 'slice1_' taken, using 'slice12_' instead"
Refine cluster labels¶
We map the numeric cluster labels to more descriptive names (e.g., T1, T2) and handle spots that were not confidently clustered by labeling them as ‘Normal’ for the DEG analysis.
[15]:
data@meta.data$cluster <- plyr::mapvalues(data@meta.data$cluster,
c("3","4","1","2","6","10","13","15"),
c("T1","T2","T3","T4","T5","T6","T7","T8"))
data@meta.data$cluster[!stringr::str_detect(data@meta.data$cluster,"T")] <- NA
data@meta.data$spot_cluster_forDEG <- data@meta.data$cluster
data@meta.data$spot_cluster_forDEG[is.na(data@meta.data$spot_cluster_forDEG)] <- "Normal"
3. Perform differential expression analysis and GO enrichment analysis¶
Differential expression analysis¶
Using FindMarkers from Seurat, we identify genes that are differentially expressed between each tumor cluster and the ‘Normal’ spots.
[16]:
DEG_list <- list()
cluster_list <- paste0("T",1:8)
data <- data[stringr::str_detect(rownames(data), "hg38-"),]
rownames(data) <- stringr::str_remove(rownames(data), "hg38-")
for (cluster in cluster_list){
DEG_list[[cluster]] <- FindMarkers(data,ident.1 = cluster,ident.2 = "Normal",
group.by = "spot_cluster_forDEG", logfc.threshold=0,
only.pos = T)
}
Warning message:
"Not validating Centroids objects"
Warning message:
"Not validating Centroids objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating Seurat objects"
Warning message:
"Not validating Centroids objects"
Warning message:
"Not validating Centroids objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating FOV objects"
Warning message:
"Not validating Seurat objects"
GO enrichment analysis¶
Next, we use the enricher function in the clusterProfiler package to perform GO enrichment analysis on the differentially expressed genes.
[17]:
geneset <- readRDS("data/GenesetForGO.rds")
DEG_tidy <- list()
for (cluster in cluster_list){
DEG_tidy[[cluster]] <- rownames(DEG_list[[cluster]])
}
# Perform enrichment analysis using the `enricher` function from `clusterProfiler`
ck <- compareCluster(geneCluster = DEG_tidy, fun = enricher, TERM2GENE = geneset, pvalueCutoff = 0.1)
Visualize Enrichment Results¶
Finally, we create a dot plot to visualize the results of the enrichment analysis, focusing on specific pathways of interest.
[18]:
ck_copy <- ck
ck_copy@compareClusterResult <- ck_copy@compareClusterResult[ck_copy@compareClusterResult$ID %in% c("SHH.A","SHH.B","SHH.C","NEURON_DIFFERENTIATION","GOBP_POSITIVE_REGULATION_OF_CELL_CYCLE","GOBP_REGULATION_OF_MITOTIC_NUCLEAR_DIVISION","GOBP_REGULATION_OF_TRANS_SYNAPTIC_SIGNALING","GOBP_SYNAPSE_ASSEMBLY","GOBP_CELL_CYCLE_DNA_REPLICATION"),]
options(repr.plot.width = 10, repr.plot.height = 8)
dotplot(ck_copy,showCategory = 10)
[ ]: