SAM (file format) is a text-based format for storing biological sequences aligned to a reference sequence developed by Heng Li. The acronym SAM stands for Sequence Alignment/Map. It is widely used for storing data, such as nucleotide sequences, generated by Next generation sequencing technologies. "The format supports short and long reads (up to 128Mbp) produced by different sequencing platforms and is used to hold mapped data within the GATK and across the Broad Institute, the Sanger Centre, and throughout the 1000 Genomes project. Sequence Alignment/Map (SAM) format for alignment of nucleotide sequences (e.g. sequencing reads) to (a) reference sequence(s). May contain base-call and alignment qualities and other data."
Contents
Format
The SAM format consists of a header and an alignment section. The binary representation of a SAM file is a BAM file, which is a compressed SAM file. SAM files can be analysed and edited with the software SAMtools. The header section must be prior to the alignment section if it is present. Heading's begin with the '@' symbol, which distinguishes them from the alignment section. Alignment sections have 11 mandatory fields, as well as a variable number of optional fields.
Link to EDAM ontology term
EDAM ontology term for SAM: http://purl.bioontology.org/ontology/EDAM?conceptid=http%3A%2F%2Fedamontology.org%2Fformat_2573