
Input Data

Since scMethtools is designed for the analysis of post-alignment data, its input is a pre-processed single-cell methylation file. Each row in the input file corresponds to a cytosine site in the genome and must include at least four essential columns: chromosome, base position, base context, and methylation status. Methylation files produced by upstream software typically contain additional columns. For example, the methylation data files stored in the scMethBank database include seven columns: chromosome, strand (positive/negative), base position, base context, C coverage, methylation level, and methylated C.

| Column Index | Column Name | Example | Description |
|---|---|---|---|
| 0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
| 1 | position | 135006 | Cytosine position (1-based) |
| 2 | strand | + | Positive or negative strand |
| 3 | context | CGT | Base context |
| 4 | mc | 1 | Methylation level |
| 5 | cov | 2 | Site coverage |
| 6 | methylated | 1 | Number of methylated cytosines recorded |
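
As an illustration of the seven-column layout in the table above, the following minimal sketch reads such rows using only the Python standard library. It is not part of scMethtools: the `SiteRecord` type and the `read_sites` helper are hypothetical names, and the sketch assumes a tab-separated file without a header line.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class SiteRecord:
    """One cytosine site, following the seven-column table above (hypothetical type)."""
    chromosome: str
    position: int      # 1-based cytosine position
    strand: str        # "+" or "-"
    context: str       # base context, e.g. "CGT"
    mc: float          # methylation level
    cov: int           # site coverage
    methylated: int    # number of methylated cytosines


def read_sites(handle: TextIO, sep: str = "\t") -> Iterator[SiteRecord]:
    """Yield one SiteRecord per row; assumes no header line (an assumption)."""
    for row in csv.reader(handle, delimiter=sep):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, pos, strand, context, mc, cov, meth = row[:7]
        yield SiteRecord(chrom, int(pos), strand, context,
                         float(mc), int(cov), int(meth))


# Usage with the example row from the table above.
example = io.StringIO("Chr1\t135006\t+\tCGT\t1\t2\t1\n")
sites = list(read_sites(example))
print(sites[0])
```

Rows with fewer than seven fields raise a `ValueError` in this sketch, which makes a mismatched format visible early rather than silently producing wrong values.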

Methylation file formats vary between upstream analysis tools. To accommodate them, scMethtools has built-in parsers for files generated by tools such as Bismark (Krueger & Andrews, 2011), BSseeker2 (Guo et al., 2018), and MethylPy (Schultz et al., 2015). For files produced by other software packages and command-line tools, scMethtools also supports custom format parsing through parameters: the field positions are specified as a colon-separated string.

Common Format

Bismark

| Column Index | Column Name | Example | Description |
|---|---|---|---|
| 0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
| 1 | position | 135006 | Cytosine position (1-based) |
| 2 | strand | + | Positive or negative strand |
| 3 | context | CGT | Base context |
| 4 | mc | 1 | Methylation level |
| 5 | cov | 2 | Site coverage |
| 6 | methylated | 1 | Number of methylated cytosines recorded |

MethylPy

| Column Index | Column Name | Example | Description |
|---|---|---|---|
| 0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
| 1 | position | 135006 | Cytosine position (1-based) |
| 2 | strand | + | Positive or negative strand |
| 3 | context | CGT | Base context |
| 4 | mc | 1 | Methylation level |
| 5 | cov | 2 | Site coverage |
| 6 | methylated | 1 | Number of methylated cytosines recorded |

User-defined Format

To read a custom format, pass a custom order string through the pipeline parameter when importing cells, for example:

scm.pp.import_cells(data_dir="/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/demo_test_dir/raw", output_dir="/xtdisk/methbank_baoym/zongwt/single/data/GSE56789/demo_test_dir/", pipeline="1:2:5:6c:4:\t:0")
- Custom order string, for example "1:2:3:6c:5:\t:0": the fields are chrom:position:methylated_C:coverage (c) or unmethylated_C (u):context:sep:header.
- Note: column indices in this string are 1-based.
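
To make the field order concrete, here is a minimal sketch of how such a colon-separated string could be decoded and applied to one row. It is not scMethtools' actual parser: `ColumnSpec`, `parse_order_string`, and `extract_site` are hypothetical names, the example row is made up, and the header field is stored as raw text because its meaning is not described here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Decoded order string (hypothetical type); column indices are 0-based."""
    chrom_col: int
    position_col: int
    methylated_col: int
    count_col: int
    count_kind: str   # "c" = coverage, "u" = unmethylated C
    context_col: int
    sep: str
    header: str       # kept as raw text, not interpreted in this sketch


def parse_order_string(spec: str) -> ColumnSpec:
    """Split chrom:position:methylated_C:coverage(c)/unmethylated_C(u):context:sep:header."""
    parts = spec.split(":")
    if len(parts) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(parts)}")
    chrom, pos, meth, count, context, sep, header = parts
    kind = count[-1:]
    if kind not in ("c", "u"):
        raise ValueError("fourth field must end in 'c' or 'u', e.g. '6c'")
    if sep == r"\t":
        sep = "\t"  # accept a literal backslash-t as well as a real tab

    def to_index(text: str) -> int:
        return int(text) - 1  # the order string uses 1-based column numbers

    return ColumnSpec(to_index(chrom), to_index(pos), to_index(meth),
                      to_index(count[:-1]), kind, to_index(context), sep, header)


def extract_site(line: str, spec: ColumnSpec) -> Tuple[str, int, str, int, int]:
    """Return (chromosome, position, context, methylated, coverage) for one row."""
    fields = line.rstrip("\n").split(spec.sep)
    methylated = int(fields[spec.methylated_col])
    count = int(fields[spec.count_col])
    # With "u" the column holds unmethylated counts, so coverage is the sum.
    coverage = count if spec.count_kind == "c" else methylated + count
    return (fields[spec.chrom_col], int(fields[spec.position_col]),
            fields[spec.context_col], methylated, coverage)


spec = parse_order_string("1:2:3:6c:5:\t:0")
site = extract_site("Chr1\t135006\t1\t+\tCGT\t2\n", spec)  # made-up row
print(site)
```

The sketch converts the 1-based column numbers of the order string to 0-based indices once, at parse time, so that row handling can index the split fields directly. It does not handle a separator that is itself a colon.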