Need help with DNA program

Hello, I need some guidance on the questions below:

Here’s a description of a protein:
A protein has the following qualities:

  1. It begins with a “start codon”: ATG.
  2. It ends with a “stop codon”: TGA.
  3. In between, each additional codon is a sequence of three nucleotides.

Code:
// Find out which string is a protein

public class DNA {

  public static void main(String[] args) {

    String dna1 = "ATGCGATACGCTTGA";
    String dna2 = "ATGCGATACGTGA";
    String dna3 = "ATTAATATGTACTGA";
    String dna = dna3;
    int start = dna.indexOf("ATG");  // index where the start codon begins, or -1
    int stop = dna.indexOf("TGA");   // index where the stop codon begins, or -1

    if (start != -1 && stop != -1 && (start - stop) % 3 == 0) {

      String protein = dna.substring(start, stop + 3);
      System.out.println("Protein: " + protein);
      System.out.println("Condition 1 and 2 and 3 are satisfied.");
    } else {
      System.out.println("No Protein");
    }
  }
}

Why does the substring need the additional +3? I know that without it the program won't print the full DNA, but why is it 3 rather than some other number?

Also, the description states that a protein begins with the start codon “ATG”, but dna3 has ATG in the middle rather than at the beginning. The program still reports it as a protein. Is that really accurate?

Thanks! :slight_smile:

int stop = dna.indexOf("TGA");

This gives the index of the stop codon. If the string “TGA” is found in dna, the indexOf method returns the index where its first occurrence begins; if it is not found, indexOf returns -1.

In the dna3 "ATTAATATGTACTGA" string, "TGA" is at the very end of the string.
The "T" is at index 12, "G" at index 13, and "A" at index 14.
The indexOf method will return 12 because this is the index where the string "TGA" begins.
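
To see those indices for yourself, here is a minimal sketch (the class name IndexOfDemo and the helper stopIndex are made up for illustration, they are not part of the original program):

```java
class IndexOfDemo {
    // Hypothetical helper: index where the first "TGA" begins, or -1 if there is none.
    static int stopIndex(String dna) {
        return dna.indexOf("TGA");
    }

    public static void main(String[] args) {
        String dna3 = "ATTAATATGTACTGA";
        System.out.println(dna3.indexOf("ATG"));  // 6
        System.out.println(stopIndex(dna3));      // 12
        System.out.println(stopIndex("ATTAAT"));  // -1, no stop codon present
    }
}
```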

We want to include the stop codon “TGA” in the protein. The second argument of substring is exclusive, and the stop codon consists of three characters (T, G, A), so 3 is added to the stop index.
In the dna3 case, the start codon is found at index 6, and the stop codon is at index 12.

// Would select the substring "ATGTAC"
String protein = dna.substring(start, stop);

// Would select the substring "ATGTACTGA"
String protein = dna.substring(start, stop + 3);
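
If it helps to see why 3 and not some other number, here is a small sketch using the dna3 indices from above (SubstringDemo and withStopCodon are names I made up for the example). The end index passed to substring is exclusive, so adding less than 3 cuts the stop codon short:

```java
class SubstringDemo {
    // Hypothetical helper: slice from start up to and including the 3-letter codon at stop.
    static String withStopCodon(String dna, int start, int stop) {
        return dna.substring(start, stop + 3);
    }

    public static void main(String[] args) {
        String dna3 = "ATTAATATGTACTGA";
        System.out.println(dna3.substring(6, 12));       // ATGTAC    (stop codon missing)
        System.out.println(dna3.substring(6, 13));       // ATGTACT   (only the T of TGA)
        System.out.println(withStopCodon(dna3, 6, 12));  // ATGTACTGA (full stop codon)
    }
}
```

Adding more than 3 would run past the end of dna3 and throw a StringIndexOutOfBoundsException, since the string is only 15 characters long.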

The start codon is not constrained to occur at the beginning of the dna string. Similarly, the stop codon doesn’t have to be at the very end of the dna string. As long as the start codon is found somewhere in the dna, and the stop codon is found somewhere after the start codon, that is sufficient to fulfill the first two conditions. However, if the length of the sequence between the codons is not a multiple of 3, then condition 3 is violated. This happens for dna2: the start codon is at index 0 and the stop codon is at index 10, which leaves 7 nucleotides between them.
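
Putting the three conditions together, here is one possible sketch that runs all three strings. The names ProteinCheck and findProtein are hypothetical. Note one assumption that goes beyond the posted code: it checks stop > start explicitly. The original condition only checks that both codons exist, so a string like "TGACCCATG" (stop codon before the start codon) would pass the test and then make substring throw.

```java
class ProteinCheck {
    // Hypothetical helper: returns the protein, or null if any condition fails.
    static String findProtein(String dna) {
        int start = dna.indexOf("ATG");  // condition 1: a start codon exists
        int stop = dna.indexOf("TGA");   // condition 2: a stop codon exists

        // stop > start also rules out stop == -1, because start >= 0 here.
        if (start != -1 && stop > start && (stop - start) % 3 == 0) {
            return dna.substring(start, stop + 3);  // +3 keeps the stop codon
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {"ATGCGATACGCTTGA", "ATGCGATACGTGA", "ATTAATATGTACTGA"};
        for (String dna : samples) {
            String protein = findProtein(dna);
            System.out.println(dna + " -> " + (protein == null ? "No Protein" : protein));
        }
    }
}
```

With these inputs, dna1 is a protein in full, dna2 gives "No Protein", and dna3 gives "ATGTACTGA". Like the original, this only looks at the first "ATG" and the first "TGA" that indexOf finds, so it is a simplification rather than a complete codon-by-codon scan.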
