Input Data
Since scMethtools is designed for the analysis of post-alignment data, the tool's input file is a pre-processed single-cell methylation file. Each row in the input file corresponds to a cytosine site in the genome and must include at least four essential columns:

- chromosome
- base position
- base context
- methylation status

However, methylation files processed by upstream software typically contain additional columns of information. For example, the methylation data files stored in the scMethBank database include seven columns:

- chromosome
- strand information (positive/negative)
- base position
- base context
- C coverage
- methylation level
- methylated C

Column Index | Column Name | Example | Description |
---|---|---|---|
0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
1 | position | 135006 | Cytosine position (1-based) |
2 | strand | + | Positive or negative strand |
3 | context | CGT | Base context |
4 | mc | 1 | Methylation level |
5 | cov | 2 | Site coverage |
6 | methylated | 1 | Number of methylated cytosines recorded |
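
As a minimal sketch of how one row in the seven-column layout above could be read, the snippet below splits a line and converts each field to a typed value. The names `SiteRecord` and `parse_site` are hypothetical and not part of scMethtools, and the tab separator is an assumption; the column order follows the table.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """One cytosine site, in the column order of the table above (hypothetical type)."""
    chromosome: str
    position: int     # 1-based cytosine position
    strand: str       # "+" or "-"
    context: str      # base context, e.g. "CGT"
    mc: float         # methylation level
    cov: int          # site coverage
    methylated: int   # number of methylated cytosines recorded


def parse_site(line: str, sep: str = "\t") -> SiteRecord:
    """Parse one row; the tab separator is an assumption, not a documented default."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    chrom, pos, strand, context, mc, cov, methylated = fields[:7]
    return SiteRecord(chrom, int(pos), strand, context,
                      float(mc), int(cov), int(methylated))


# Example row built from the "Example" column of the table.
record = parse_site("Chr1\t135006\t+\tCGT\t1\t2\t1")
print(record)
```

Rows with fewer than seven columns are rejected early so that a mismatched format fails loudly instead of producing shifted values.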
Methylation file formats vary across upstream analysis software. To accommodate them, scMethtools includes built-in format parsing for methylation files generated by different upstream processing tools, such as Bismark (Krueger & Andrews, 2011), BSseeker2 (Guo et al., 2018), MethylPy (Schultz et al., 2015), and others. For files produced by other software packages and command-line tools, scMethtools also supports custom format parsing through parameters, in which the field positions are specified as a colon-separated string.
Common Format
Bismark
Column Index | Column Name | Example | Description |
---|---|---|---|
0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
1 | position | 135006 | Cytosine position (1-based) |
2 | strand | + | Positive or negative strand |
3 | context | CGT | Base context |
4 | mc | 1 | Methylation level |
5 | cov | 2 | Site coverage |
6 | methylated | 1 | Number of methylated cytosines recorded |
MethylPy
Column Index | Column Name | Example | Description |
---|---|---|---|
0 | chromosome | Chr1 | Chromosome name (consistent with the reference genome) |
1 | position | 135006 | Cytosine position (1-based) |
2 | strand | + | Positive or negative strand |
3 | context | CGT | Base context |
4 | mc | 1 | Methylation level |
5 | cov | 2 | Site coverage |
6 | methylated | 1 | Number of methylated cytosines recorded |
User-defined Format
The format can be customized at read time by specifying pipeline parameters, for example:
- custom order string: "1:2:3:6c:5:\t:0" (chrom:position:methylated_C:coverage(c)/unmethylated_C(u):context:sep:header). Note: column indices are 1-based.
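
To illustrate how such an order string might be interpreted, here is a minimal sketch, not scMethtools' actual parser. The names `ColumnSpec`, `parse_order`, and `extract_site` are hypothetical. It assumes that the `c`/`u` suffix marks the fourth field as total coverage or as the unmethylated count, that `sep` may be written as the two characters `\t`, and that `header` is the number of header lines to skip.

```python
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    """Parsed order string; column indices are stored 0-based (hypothetical type)."""
    chrom: int
    pos: int
    meth: int
    count: int
    count_is_coverage: bool  # True for suffix "c", False for suffix "u"
    context: int
    sep: str
    header: int              # assumed: number of header lines to skip


def parse_order(spec: str) -> ColumnSpec:
    parts = spec.split(":")
    if len(parts) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(parts)}")
    chrom, pos, meth, count, context, sep, header = parts
    if len(count) < 2 or count[-1] not in ("c", "u"):
        raise ValueError("fourth field must be a column index ending in 'c' or 'u'")
    # Accept either a real tab or the escaped two-character form "\t".
    sep = "\t" if sep == r"\t" else sep
    return ColumnSpec(
        chrom=int(chrom) - 1,      # convert 1-based indices to 0-based
        pos=int(pos) - 1,
        meth=int(meth) - 1,
        count=int(count[:-1]) - 1,
        count_is_coverage=(count[-1] == "c"),
        context=int(context) - 1,
        sep=sep,
        header=int(header),
    )


def extract_site(line: str, spec: ColumnSpec):
    """Return (chrom, position, methylated, coverage, context) for one row."""
    fields = line.rstrip("\n").split(spec.sep)
    methylated = int(fields[spec.meth])
    count = int(fields[spec.count])
    # With suffix "u" the column holds unmethylated reads, so coverage is the sum.
    coverage = count if spec.count_is_coverage else methylated + count
    return (fields[spec.chrom], int(fields[spec.pos]),
            methylated, coverage, fields[spec.context])


spec = parse_order(r"1:2:3:6c:5:\t:0")
site = extract_site("chr1\t135006\t1\tNA\tCGT\t2", spec)
print(site)
```

Because the string is split on colons, this sketch cannot represent a colon as the separator itself.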