In mathematics, a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. For example, the sequence
A subsequence should not be confused with a substring, which must consist of consecutive elements of the original sequence.
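The definition can be checked mechanically. The following is a minimal Python sketch, not a canonical implementation; the function name `is_subsequence` and the sample strings are illustrative assumptions. It scans the original sequence once and consumes elements until each element of the candidate has been found in order.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be obtained from `sequence`
    by deleting zero or more elements without reordering the rest."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds `item`,
    # so each later item is searched for only after the previous match.
    return all(item in remaining for item in candidate)


# Illustrative values (assumed for this sketch).
print(is_subsequence("ABD", "ABCDEF"))  # True: delete C, E and F
print(is_subsequence("ADB", "ABCDEF"))  # False: the order is not preserved
print("ABD" in "ABCDEF")                # False: "ABD" is not a substring
```

The last line shows the distinction from a substring: the elements A, B, D occur in order in the longer string, but not consecutively.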
Common subsequence
Given two sequences X and Y, a sequence Z is said to be a common subsequence of X and Y, if Z is a subsequence of both X and Y. For example, if
then a common subsequence of X and Y could be
This would not be the longest common subsequence, since Z only has length 3, and the common subsequence
Applications
Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.
Take two sequences of DNA containing 37 elements, say:
SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
The longest common subsequence of sequences 1 and 2 is:
LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA
This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences:
SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
Another way to show this is to align the two sequences, i.e., to position elements of the longest common subsequence in the same column (indicated by the vertical bar) and to introduce a special character (here, a dash) in one sequence when two elements in the same column differ:
SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
        | || ||| ||||| |  | |  | || |  || | || |  |||
SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.
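A longest common subsequence of two DNA strings such as SEQ1 and SEQ2 can be computed with the standard dynamic-programming approach. The sketch below is one possible implementation, assuming the sequences are plain Python strings; the function name `longest_common_subsequence` is an assumption made for this example. Two sequences can have several longest common subsequences, so the string returned may differ from the one given above while having the same length.

```python
def longest_common_subsequence(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one solution.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


SEQ1 = "ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA"
SEQ2 = "CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA"

lcs = longest_common_subsequence(SEQ1, SEQ2)
print(lcs, len(lcs))
```

The table uses time and memory proportional to the product of the two lengths, which is negligible for 37-element sequences but becomes significant for long genomic sequences.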