Differential Methylation analysis and functional enrichment

In [1]:

Copied!





import pandas as pd
import numpy as np
import scanpy as sc
import scMethtools as scm
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scanpy as sc
import scMethtools as scm
import matplotlib.pyplot as plt

 ####   ####  #    # ##### #    # #####  ####   ####  #       ####  
 #      #    # ##  ##   #   #    #   #   #    # #    # #      #      
  ####  #      # ## #   #   ######   #   #    # #    # #       ####  
      # #      #    #   #   #    #   #   #    # #    # #           # 
 #    # #    # #    #   #   #    #   #   #    # #    # #      #    # 
  ####   ####  #    #   #   #    #   #    ####   ####  ######  ####                                                

Version: 1.0.0, Tutorials: https://ngdc.cncb.ac.cn/methbank/scm

In [5]:

Copied!

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

Import Data and settings¶

In [3]:

Copied!

#import anndata as ad
# adata = ad.read('/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/scbs/after/promoters.h5ad')
adata = scm.read('/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/scbs/after/promoters.h5ad')
#import anndata as ad
# adata = ad.read('/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/scbs/after/promoters.h5ad')
adata = scm.read('/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/scbs/after/promoters.h5ad')

/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/anndata/__init__.py:51: FutureWarning: `anndata.read` is deprecated, use `anndata.read_h5ad` instead. `ad.read` will be removed in mid 2024.
  warnings.warn(

In [2]:

Copied!

#gloabel  rcParams parameters
scm.settings.set_figure_params(dpi=150,figsize=[3,3],fontsize=8,font='Arial')
#gloabel  rcParams parameters
scm.settings.set_figure_params(dpi=150,figsize=[3,3],fontsize=8,font='Arial')

Differential analysis requires known cell sample labels. For methylation data, low-throughput datasets (e.g., those obtained using mouth pipetting) typically have known labels, such as sampling site and cell type.
For high-throughput methods based on microfluidics and cell barcoding, cell labels may sometimes be unknown, requiring clustering and even cell type annotation beforehand.

In the following example, we use the existing labels in the demo data for differential identification. These labels do not represent the true cell types but are used for comparison purposes only.

In [4]:

Copied!

scm.pl.set_colors(adata, 'Treatment', palette=scm.pl.palette(), n_colors=5)
scm.pl.set_colors(adata, 'Treatment', palette=scm.pl.palette(), n_colors=5)

Out[4]:

AnnData object with n_obs × n_vars = 45 × 38794
    obs: 'cell_id', 'cell_name', 'sites', 'meth', 'n_total', 'global_meth_level', 'Series', 'condition', 'sample', 'Run', 'Organism', 'Cell_ID', 'Source_name', 'Cell_type', 'total_cell_type', 'subgroup', 'Development_stage', 'Strain', 'Tissue', 'Cell_line', 'Genotype', 'Treatment', 'Age', 'Sex', 'Race', 'other', 'Disease', 'CenterName', 'avgLength', 'LibraryStrategy', 'LibraryLayout', 'Platform', 'Model', 'Total_bases.Gb', 'Total_reads', 'Reads_After_trim', 'Uniq_mapping.reads', 'Mapping_ratio', 'Conversion_ratio', 'TotalCpGs(1X)', 'CpGsTotalUnique(1X)', 'CpGsMeanCov(1X)', 'CpGsMethRatio(1X)', 'CpGsMethReadRatio(1X)', 'TotalCpGs(3X)', 'CpGsTotalUnique(3X)', 'CpGsMeanCov(3X)', 'CpGsMethRatio(3X)', 'CpGsMethReadRatio(3X)', 'TotalCpGs(5X)', 'CpGsTotalUnique(5X)', 'CpGsMeanCov(5X)', 'CpGsMethRatio(5X)', 'CpGsMethReadRatio(5X)', 'Label', 'n_features', 'pct_peaks', 'leiden'
    var: 'chromosome', 'start', 'end', 'covered_cell', 'var', 'sr_var', 'mean', 'pct_cell', 'feature_select_sr', 'feature_select_var', 'Accession', 'Gene', 'Distance'
    uns: 'Cell_type_colors', 'Label_colors', 'Treatment_colors', 'dendrogram_leiden', 'label_color', 'leiden', 'leiden_colors', 'motif_analysis', 'neighbors', 'relative_pca_var_ratios', 'treatment', 'umap', 'workdir'
    obsm: 'X_tsne', 'X_umap', 'relative_pca'
    layers: 'relative'
    obsp: 'connectivities', 'distances'

In [6]:

Copied!

#scm.pl.set_colors(adata, 'Treatment', palette=scm.pl.palette(), n_colors=5)
scm.pl.grouped_value_boxplot(adata,color_by='Treatment',value_column='global_meth_level')
#scm.pl.set_colors(adata, 'Treatment', palette=scm.pl.palette(), n_colors=5)
scm.pl.grouped_value_boxplot(adata,color_by='Treatment',value_column='global_meth_level')

Get colors from adata.unsTreatment_colors ...

/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scMethtools/plotting/basic.py:233: UserWarning: Numpy array is not a supported type for `palette`. Please convert your palette to a list. This will become an error in v0.14
  boxplot = sns.boxplot(x=color_by, y=value_column, data=adata.obs,
/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scMethtools/plotting/basic.py:233: UserWarning: The palette list has more values (5) than needed (3), which may not be intended.
  boxplot = sns.boxplot(x=color_by, y=value_column, data=adata.obs,
/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scMethtools/plotting/basic.py:238: UserWarning: Numpy array is not a supported type for `palette`. Please convert your palette to a list. This will become an error in v0.14
  sns.stripplot(x=color_by, y=value_column, data=adata.obs,
/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scMethtools/plotting/basic.py:238: UserWarning: The palette list has more values (5) than needed (3), which may not be intended.
  sns.stripplot(x=color_by, y=value_column, data=adata.obs,

No description has been provided for this image

Out[6]:

(<Figure size 450x450 with 1 Axes>,
 <Axes: title={'center': 'Grouped Boxplot of global_meth_level by Treatment'}, xlabel='Treatment', ylabel='global_meth_level'>)

Differentially methylated region calling¶

one vs one compare¶

We can also do pairwise comparisons between two groups. For instance, clusters 1 & 2 are both mESC cells but different treatment.

In [8]:

Copied!

scm.dm.mdiff_pairwise(adata, group_by='Treatment', cluster='2i', reference='serum/LIF', key_added = "Treatment")
scm.dm.mdiff_pairwise(adata, group_by='Treatment', cluster='2i', reference='serum/LIF', key_added = "Treatment")

... Running differential methylation analysis between 2i and reference serum/LIF

In [9]:

Copied!

df_pairwise = sc.get.rank_genes_groups_df(adata,group=None,key='Treatment')
df_pairwise = sc.get.rank_genes_groups_df(adata,group=None,key='Treatment')

In [10]:

Copied!

df_pairwise.head()
df_pairwise.head()

Out[10]:

	names	scores	logfoldchanges	pvals	pvals_adj
0	chr12:69360258-69362258	2.028904	1.587244	0.058618	1.0
1	chr16:64850652-64852652	2.011422	1.679350	0.065555	1.0
2	chr12:69360339-69362339	1.927094	1.426012	0.068754	1.0
3	chr12:69360481-69362481	1.801351	1.302340	0.086583	1.0
4	chr11:20542253-20544253	1.738439	2.164821	0.106372	1.0

In [12]:

Copied!

scm.pl.plot_volcano(adata,key='Treatment', group='2i',log2fc_min=-10, show_name=False)
scm.pl.plot_volcano(adata,key='Treatment', group='2i',log2fc_min=-10, show_name=False)

It can be observed that mESCs treated with 2i exhibit differential hypomethylation compared to those treated with serum/LIF, with nearly all DMRs showing hypomethylation.

one vs rest compare¶

We can also do comparisons of individual clusters on one vs many clusters to find specific DMR

In [13]:

Copied!

scm.dm.mdiff_specific(adata,group_by='Treatment', key_added='Treatment_all')
scm.dm.mdiff_specific(adata,group_by='Treatment', key_added='Treatment_all')

... Running differential methylation analysis between ['2i', 'Unknown', 'serum/LIF'] and reference rest

In [14]:

Copied!

df = sc.get.rank_genes_groups_df(adata,key='Treatment_all',group=None)
df = sc.get.rank_genes_groups_df(adata,key='Treatment_all',group=None)

In [15]:

Copied!

df.head()
df.head()

Out[15]:

	group	names	scores	logfoldchanges	pvals	pvals_adj
0	2i	chr3:96219168-96221168	2.810465	NaN	0.004947	0.009682
1	2i	chr11:20542253-20544253	2.784799	2.394727	0.005356	0.010361
2	2i	chr12:56535908-56537908	2.707800	NaN	0.006773	0.012659
3	2i	chr3:96218865-96220865	2.463970	NaN	0.013741	0.023136
4	2i	chr12:69360258-69362258	2.220139	1.572072	0.026409	0.040364

In [16]:

Copied!

df['group'].unique().tolist()
df['group'].unique().tolist()

Out[16]:

['2i', 'Unknown', 'serum/LIF']

plot marker for cluster

In [25]:

Copied!

scm.pl.plot_marker(adata,top=10,basis='umap',key_added='Treatment_all',cluster='2i', size=30,ncols=5)
scm.pl.plot_marker(adata,top=10,basis='umap',key_added='Treatment_all',cluster='2i', size=30,ncols=5)

In [28]:

Copied!

var_names = adata.uns['Treatment_all']['names']['2i'].tolist()
sc.pl.stacked_violin(adata, var_names[:20], groupby='Treatment',save='test_stack.png',cmap='Greens')
var_names = adata.uns['Treatment_all']['names']['2i'].tolist()
sc.pl.stacked_violin(adata, var_names[:20], groupby='Treatment',save='test_stack.png',cmap='Greens')

/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scanpy/plotting/_stacked_violin.py:461: UserWarning: Numpy array is not a supported type for `palette`. Please convert your palette to a list. This will become an error in v0.14
  row_ax = sns.violinplot(
/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scanpy/plotting/_stacked_violin.py:461: UserWarning: Numpy array is not a supported type for `palette`. Please convert your palette to a list. This will become an error in v0.14
  row_ax = sns.violinplot(
/p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scanpy/plotting/_stacked_violin.py:461: UserWarning: Numpy array is not a supported type for `palette`. Please convert your palette to a list. This will become an error in v0.14
  row_ax = sns.violinplot(

WARNING: saving figure to file figures/stacked_violin_test_stack.png

In [31]:

Copied!

scm.pl.heatmap(adata, var_names=var_names[:50],swap_axes=True,annotation=['Cell_type','Treatment','Label'],groupby='leiden',cmap="GnBu",show_gene_labels=True,dendrogram=True)
scm.pl.heatmap(adata, var_names=var_names[:50],swap_axes=True,annotation=['Cell_type','Treatment','Label'],groupby='leiden',cmap="GnBu",show_gene_labels=True,dendrogram=True)

dendrogram_key by  dendrogram_leiden
Get colors from adata.unsCell_type_colors ...
Get colors from adata.unsTreatment_colors ...
Get colors from adata.unsLabel_colors ...

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[31], line 1
----> 1 scm.pl.heatmap(adata, var_names=var_names[:50],swap_axes=True,annotation=['Cell_type','Treatment','Label'],groupby='leiden',cmap="GnBu",show_gene_labels=True,dendrogram=True)

File /p300s/baoym_group/zongwt/zongwt/software/miniconda3/envs/py39/lib/python3.9/site-packages/scMethtools/plotting/heatmap.py:698, in heatmap(adata, var_names, groupby, annotation, use_raw, log, num_categories, dendrogram, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, standard_scale, swap_axes, show_gene_labels, show, save, figsize, vmin, vmax, vcenter, norm, **kwds)
    695 if var_group_positions is not None and len(var_group_positions) > 0:
    696     return_ax_dict["gene_groups_ax"] = gene_groups_ax
--> 698 _utils.savefig_or_show("heatmap", show=show, save=save)
    699 show = settings.autoshow if show is None else show
    700 if show:

AttributeError: module 'scanpy._utils' has no attribute 'savefig_or_show'

Gene Annotation of Differential Regions¶

To investigate the potential functions of the differential regions, DMRs can be annotated to the nearest gene.

In [ ]:

Copied!

#数据下载
#!wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz -O data/data_tutorial_buenrostro/gencode.v19.annotation.gtf.gz
gtf_file='/xtdisk/methbank_baoym/zongwt/single/genome/mouse/Mus_musculus.GRCm39.111.chr.gtf'
#数据下载
#!wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz -O data/data_tutorial_buenrostro/gencode.v19.annotation.gtf.gz
gtf_file='/xtdisk/methbank_baoym/zongwt/single/genome/mouse/Mus_musculus.GRCm39.111.chr.gtf'

The annotation process may take a few minutes.

In [56]:

Copied!

reference = scm.dm.parse_gtf(gtf_file)
scm.dm.annotation(adata,ref=reference,gene_type='protein_coding')
reference = scm.dm.parse_gtf(gtf_file)
scm.dm.annotation(adata,ref=reference,gene_type='protein_coding')

... Loading gene references
... protein_coding gene will be annotated.
... Done
... Loading regions info
... Overlapping genes with regions
...Overlapping finish

In [19]:

Copied!

adata.var.head()
adata.var.head()

Out[19]:

	chromosome	start	end	covered_cell	var	sr_var	mean	pct_cell	feature_select_sr	feature_select_var	Accession	Gene	Distance
index
chr1:58391898-58393898	chr1	58391898	58393898	42	0.005050	0.001915	0.030952	0.933333	False	False	ENSMUSG00000079554	Aox2	0
chr1:87189607-87191607	chr1	87189607	87191607	43	0.119925	0.093893	0.453605	0.955556	False	False	ENSMUSG00000026255	Efhd1	478
chr1:91610406-91612406	chr1	91610406	91612406	43	0.115410	0.091781	0.349442	0.955556	False	False	ENSMUSG00000007805	Twist2	116777
chr1:180356649-180358649	chr1	180356649	180358649	43	0.173246	0.121766	0.501581	0.955556	True	True	ENSMUSG00000026496	Parp1	37840
chr1:123755558-123757558	chr1	123755558	123757558	33	0.171086	0.069639	0.566970	0.733333	False	False	ENSMUSG00000036815	Dpp10	0

In [20]:

Copied!

scm.pl.umap(adata,frameon=False, color=['Safb2','Ccdc69'],gene_symbols='Gene',size=50,cmap='Greens')
scm.pl.umap(adata,frameon=False, color=['Safb2','Ccdc69'],gene_symbols='Gene',size=50,cmap='Greens')

Note: Multiple regions are matched for Gene,  mean methylation value will be plotted 
Note: Multiple regions are matched for Gene,  mean methylation value will be plotted

Functional Enrichment analysis¶

In [21]:

Copied!

df_anno = sc.get.rank_genes_groups_df(adata,key='Treatment_all',group=None, gene_symbols='Gene')
df_anno = sc.get.rank_genes_groups_df(adata,key='Treatment_all',group=None, gene_symbols='Gene')

In [22]:

Copied!

df_anno.head()
df_anno.head()

Out[22]:

	group	names	scores	logfoldchanges	pvals	pvals_adj	Gene
0	2i	chr3:96219168-96221168	2.810465	NaN	0.004947	0.009682	Fcgr1
1	2i	chr11:20542253-20544253	2.784799	2.394727	0.005356	0.010361	Sertad2
2	2i	chr12:56535908-56537908	2.707800	NaN	0.006773	0.012659	Nkx2-1
3	2i	chr3:96218865-96220865	2.463970	NaN	0.013741	0.023136	Fcgr1
4	2i	chr12:69360258-69362258	2.220139	1.572072	0.026409	0.040364	Nemf

In [33]:

Copied!

#filter
gene_list = df_anno['Gene'].unique().to_list()
#filter
gene_list = df_anno['Gene'].unique().to_list()

In [42]:

Copied!

gene_list = scm.get.get_dmr_genes(adata,key_added='Treatment_all',direction='up',groups='2i')
gene_list = scm.get.get_dmr_genes(adata,key_added='Treatment_all',direction='up',groups='2i')

In [46]:

Copied!

gene_list['2i'][:5] #gene name should be upper for enrich
gene_list['2i'][:5] #gene name should be upper for enrich

Out[46]:

['SERTAD2', 'NEMF', 'NEMF', 'ESRP1', 'NRG3']

Functional enrichment analysis requires corresponding gene and pathway annotation files. If the runtime environment has internet access, online analysis can be performed by specifying library files.
Built-in libraries:
1. GO_Biological_Process_2018.gmt
2. HDSigDB_Human_2021.gmt
3. KEGG_2019_Human.gmt
4. WikiPathways_2024_Human.gmt
5. GO_Biological_Process_2025.gmt
6. HDSigDB_Mouse_2021.gmt
7. KEGG_2019_Mouse.gmt
8. WikiPathways_2024_Mouse.gmt
If the runtime environment does not have internet access, a limited set of built-in libraries can be used.
Users can also download the required libraries and specify the file path via parameters for analysis, but the files must be in .gmt format.

download .gmt file wget http://maayanlab.cloud/Enrichr/geneSetLibrary?mode=text&libraryName=KEGG_2021_Human -O KEGG_2021_Human.gmt

In [50]:

Copied!

pathways = scm.dm.enrich(gene_list['2i'], gene_sets='/xtdisk/methbank_baoym/zongwt/single/genome/enrich_library/WikiPathways_2024_Mouse.gmt')
pathways = scm.dm.enrich(gene_list['2i'], gene_sets='/xtdisk/methbank_baoym/zongwt/single/genome/enrich_library/WikiPathways_2024_Mouse.gmt')

2025-03-26 13:47:06,848 [INFO] User defined gene sets is given: /xtdisk/methbank_baoym/zongwt/single/genome/enrich_library/WikiPathways_2024_Mouse.gmt
2025-03-26 13:47:06,854 [INFO] Run: WikiPathways_2024_Mouse.gmt 
2025-03-26 13:47:06,883 [INFO] Background is not set! Use all 4494 genes in WikiPathways_2024_Mouse.gmt.
2025-03-26 13:47:06,936 [INFO] Done.

In [51]:

Copied!

scm.pl.plot_enrichment(pathways,figsize=(8,4),dpi=200,palette='Reds',save='test_enrich.png')
scm.pl.plot_enrichment(pathways,figsize=(8,4),dpi=200,palette='Reds',save='test_enrich.png')

Saving figure to : ./figures/test_enrich.png

In [ ]:

Copied!

pathways = scm.dm.enrich(gene_list['2i'], gene_sets='GO_Biological_Process_2018')
pathways = scm.dm.enrich(gene_list['2i'], gene_sets='GO_Biological_Process_2018')

Motif Analysis¶

Transcription factor binding motifs are often enriched with cis-regulatory elements, which can reveal potential regulatory mechanisms. The first step in studying these DNA motifs is to scan for their occurrences within genomic regions.

First, we need the FASTA sequence of the organism's genome, which can be downloaded from the UCSC repository.

In [ ]:

Copied!

!mkdir -p data
!wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz -O data/mm10.fa.gz
!cd data/ && gzip -d -f mm10.fa.gz
!mkdir -p data
!wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz -O data/mm10.fa.gz
!cd data/ && gzip -d -f mm10.fa.gz

1. Motif Scan¶

Using Position Weight Matrix (PWM) to predict transcription factor (TF) binding sites in DNA sequences.

for backgroung regions, we use all regions passed filter before, you can specified your own background regions

In [58]:

Copied!

all_regions = scm.get.get_var(adata)
all_regions = scm.get.get_var(adata)

In [59]:

Copied!

len(all_regions)
len(all_regions)

Out[59]:

In [ ]:

Copied!

scm.dm.motif_scan(adata, regions=all_regions, genome='/p300s/baoym_group/zongwt/zongwt/genome/mm10/mm10.fa')
scm.dm.motif_scan(adata, regions=all_regions, genome='/p300s/baoym_group/zongwt/zongwt/genome/mm10/mm10.fa')

It takes approximately a few tens of minutes to scan motifs across the whole genome regions.

2. Motif Enrichment¶

The enrichment of transcription factors (TFs) in target regions (DMRs) is statistically analyzed to infer whether TFs regulate specific genes.

In [124]:

Copied!

motif_result = motif_enrichment(adata,key_added='Treatment_all',direction='up',groups=None)
motif_result = motif_enrichment(adata,key_added='Treatment_all',direction='up',groups=None)

Calculating enrichment for group 2i ...

Finding enrichments: 100%|██████████| 1646/1646 [00:01<00:00, 1011.17it/s]

Calculating enrichment for group Unknown ...

Finding enrichments: 100%|██████████| 1646/1646 [00:01<00:00, 916.20it/s]

Calculating enrichment for group serum/LIF ...

Finding enrichments: 100%|██████████| 1646/1646 [00:01<00:00, 834.87it/s]

In [120]:

Copied!

motif_result
motif_result

Out[120]:

	2i	Unknown	serum/LIF
EN1	9.960413e-01	9.863188e-01	9.996782e-01
GFI1	1.000000e+00	1.000000e+00	8.308049e-01
TEAD1	9.999999e-01	9.999987e-01	9.999472e-01
OVO	1.000000e+00	1.000000e+00	1.000000e+00
VSX1	1.000000e+00	1.000000e+00	1.000000e+00
...	...	...	...
KLF17	1.474806e-39	1.946404e-29	5.903653e-13
MSGN1	1.000000e+00	1.000000e+00	9.999996e-01
OVOL1	7.372146e-01	5.817805e-01	3.373687e-01
FOXF1	1.000000e+00	1.000000e+00	9.999999e-01
PTF1A.VAR.3	6.600707e-01	2.905419e-01	9.999988e-01

1646 rows × 3 columns

In [125]:

Copied!

scm.pl.plot_motif(motif_result,top_n=5)
scm.pl.plot_motif(motif_result,top_n=5)

Out[125]:

(<Figure size 450x450 with 2 Axes>,
 <Axes: title={'center': 'Motif Enrichment Heatmap'}, xlabel='Group', ylabel='Motif'>)

Methylation Pattern for DMR¶

Do not use too long regions,this process is memory consumed

In [144]:

Copied!





gtf_file = "/xtdisk/methbank_baoym/zongwt/single/genome/mouse/Mus_musculus.GRCm39.111.chr.gtf"
scm.pl.profile(adata,
            npz_file="/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/sscm/chr7.npz",
            region="7:97591403-97593402",
            annotation_file=gtf_file,
            clusters=["2i"],group_by="Treatment",
            sample=None)
gtf_file = "/xtdisk/methbank_baoym/zongwt/single/genome/mouse/Mus_musculus.GRCm39.111.chr.gtf"
scm.pl.profile(adata,
            npz_file="/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/sscm/chr7.npz",
            region="7:97591403-97593402",
            annotation_file=gtf_file,
            clusters=["2i"],group_by="Treatment",
            sample=None)

2i  shape: 12
Others  shape: 33
['Gene Annotation', 'Profile', '2i', 'Others']

In [ ]: