Rahul Sharma (Editor)

Subsequence

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit

In mathematics, a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. For example, the sequence A , B , D is a subsequence of A , B , C , D , E , F obtained after removal of elements C , E , and F . The relation of one sequence being the subsequence of another is a preorder.

Contents

The subsequence should not be confused with substring A , B , C , D which can be derived from the above string A , B , C , D , E , F by deleting substring E , F . The substring is a refinement of the subsequence.

Common subsequence

Given two sequences X and Y, a sequence Z is said to be a common subsequence of X and Y, if Z is a subsequence of both X and Y. For example, if

X = A , C , B , D , E , G , C , E , D , B , G and Y = B , E , G , C , F , E , U , B , K

then a common subsequence of X and Y could be

Z = B , E , E .

This would not be the longest common subsequence, since Z only has length 3, and the common subsequence B , E , E , B has length 4. The longest common subsequence of X and Y is B , E , G , C , E , B .

Applications

Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.

Take two sequences of DNA containing 37 elements, say:

SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA

This can be illustrated by highlighting the 27 elements of the longest common subsequence into the initial sequences:

SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

Another way to show this is to align the two sequences, i.e., to position elements of the longest common subsequence in a same column (indicated by the vertical bar) and to introduce a special character (here, a dash) in one sequence when two elements in the same column differ:

SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-         | || ||| ||||| |  | |  | || |  || | || |  ||| SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.

Theorems

  • Every infinite sequence of real numbers has an infinite monotone subsequence (This is a lemma used in the proof of the Bolzano–Weierstrass theorem).
  • Every bounded infinite sequence in Rn has a convergent subsequence (This is the Bolzano–Weierstrass theorem).
  • For every integers r and s, every finite sequence of length at least (r − 1)(s − 1) + 1 contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s (This is the Erdős–Szekeres theorem).
  • References

    Subsequence Wikipedia