Exercise: DNA sequencing


  • Create a file called dna_sequencing.py
  • A, C, T, G are called bases or nucleotides
  • Accept a sequence on the command line like this: python dna_sequencing.py ACCGXXCXXGTTACTGGGCXTTGTXX
  • Given a sequence such as the one above (some nucleotides mixed up with other elements represented by an X)
  • First return the sequences containing only ACTG. The above string can will be changed to ['ACCG', 'C', 'GTTACTGGGC', 'TTGT'].
  • Then sort them by lenght. Expected result: ['GTTACTGGGC', 'ACCG', 'TTGT', 'C']
  • Create a file called extended_dna_sequencing.py
  • In this case the original string contains more than on type of foreign elements: e.g. 'ACCGXXTXXYYGTTQRACQQTGGGCXTTGTXX'.
  • Expected output: ['TGGGC', 'ACCG', 'TTGT', 'GTT', 'AC', 'T']
  • Ask for a sequence on the Standard Input (STDIN) like this:
python extended_dna_sequencing.py
Please type in a sequence: