I’m working on a Python question and need a sample draft to help me learn. The explanation is long but simple.

Example: Suppose that your program receives the …

Patient Sample
11223344 GGTCGGTAGACAGGTCGGTAGACAGGTCGGTAGACA
22233344 TTTCAGAATTAGACTGTTTAGAGAAACTAGACCACA
33344455 CCTAGTATGCACTATTGAAATGCTCGTTGATAGACA
55667788 TGCTCGTTAGTGAACACTTAGACTGTTTAGGACCAC

You know from a that these DNA sequences convert to…

  • Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine
  • Phenylalanine,Glutamine,Asparagine,.,Threonine,Valine,.,Arginine,Asparagine,.,Threonine,Threonine
  • Proline,Serine,Methionine,Histidine,Tyrosine,.,Asparagine,Alanine,Arginine,.,.,Threonine
  • Cysteine,Serine,Leucine,Valine,Asparagine,Threonine,.,Threonine,Valine,.,Aspartic acid,Histidine

These are amino acid sequences, and your program’s job is to let the user search for patients whose amino acid sequences contain a user-specified sequence.

The program would prompt the user to enter an amino acid sequence, and the user might enter “Methionine,Histidine,Tyrosine,.” Your program would then iterate through the CSV, convert each DNA sequence to an amino acid sequence, and check whether each patient’s amino acid sequence contains the user’s string. Here, only the third patient’s sequence contains the input, so your program would print “33344455” (that patient’s ID, according to the CSV) on a line by itself. If other patients also matched, your program would print their IDs as well, one per line.

As another example, suppose the user entered “Threonine,Valine,.” Then the second and fourth patients above would match, so the program would print two lines of output: “22233344” and then “55667788”.
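The matching rule in these two examples is a plain substring test. A minimal sketch, assuming the four amino acid strings listed above have already been produced (the `translated` dictionary and the helper name `matching_patients` are illustrative, not part of the assignment):

```python
# Amino acid strings copied from the four example patients above.
translated = {
    "11223344": "Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine",
    "22233344": "Phenylalanine,Glutamine,Asparagine,.,Threonine,Valine,.,Arginine,Asparagine,.,Threonine,Threonine",
    "33344455": "Proline,Serine,Methionine,Histidine,Tyrosine,.,Asparagine,Alanine,Arginine,.,.,Threonine",
    "55667788": "Cysteine,Serine,Leucine,Valine,Asparagine,Threonine,.,Threonine,Valine,.,Aspartic acid,Histidine",
}


def matching_patients(query, sequences):
    """Return, in order, the patient IDs whose amino acid string contains query."""
    # `query in amino` is True when query appears anywhere inside amino;
    # amino.find(query) != -1 would be an equivalent test.
    return [patient for patient, amino in sequences.items() if query in amino]


print(matching_patients("Methionine,Histidine,Tyrosine,.", translated))  # ['33344455']
print(matching_patients("Threonine,Valine,.", translated))  # ['22233344', '55667788']
```

Because this is a raw substring test, a short query can also match inside a longer name (for example, `Leucine` is a substring of `Isoleucine`), which is worth keeping in mind when choosing your own test queries.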

You can succeed in this assignment by doing four things.

1. Use your text editor to create a .py file containing a Python program that prompts the user to enter an amino acid sequence. Then, the program should call input() a single time to read the specified sequence into a variable called query. For example, the user might enter “Methionine,Histidine,Tyrosine,.” as in the example above.
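Step 1 amounts to a single `input()` call. A minimal sketch, wrapped in a hypothetical helper `read_query()` so it can be tried on its own; the prompt wording is an assumption, since the assignment does not specify one:

```python
def read_query():
    """Prompt the user once and return the amino acid sequence they typed."""
    # input() is called exactly once, as step 1 requires. strip() removes only
    # leading/trailing whitespace, so inner spaces (as in "Aspartic acid") survive.
    query = input("Enter an amino acid sequence: ").strip()
    return query
```

In the finished script, writing `query = read_query()` (or the `input()` line directly) at the top level keeps the variable name the rubric looks for.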

2. Extend your program by adding code that reads in a CSV named “samples.csv” containing two columns: Patient and Sample. After the header row, every row’s Patient column will contain an 8-digit number, and the corresponding Sample column will contain a DNA sequence consisting of letters in the set T, G, A and C. The program should iterate through the rows of the CSV. You can use to test that your program can successfully iterate through the rows.
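Step 2 maps naturally onto the standard-library `csv` module. A minimal sketch, assuming `samples.csv` is comma-separated with a header row of exactly `Patient,Sample` (the helper name `read_samples` is hypothetical):

```python
import csv


def read_samples(path="samples.csv"):
    """Yield (patient, sample) string pairs from a CSV with Patient and Sample columns."""
    # newline="" is how the csv module documentation recommends opening CSV files.
    with open(path, newline="") as f:
        # DictReader consumes the header row and exposes each later row by column name.
        for row in csv.DictReader(f):
            # Keep the ID as a string so it prints exactly as it appears in the file.
            yield row["Patient"].strip(), row["Sample"].strip()
```

A quick check that iteration works is `for patient, sample in read_samples(): print(patient, sample)`, which should print one line per data row and skip the header.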

3. Near the top of your program, write a function called translate(dna) that takes one parameter called dna containing a DNA sequence and returns a string of the corresponding amino acid sequence. Your function should break the DNA sequence into chunks of 3 characters, convert each 3-character chunk into an amino acid, and then join the amino acids into a single string with one comma between consecutive amino acids (in the examples above, the stop codons TAG and TGA appear as “.”). Refer to your prior to remind yourself of the conversion from DNA chunks (called codons) to amino acids. Test the function by passing some of the example DNA strings above into translate() and checking that it returns the right output for each.
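Step 3 is a lookup from each codon to an amino acid name. A minimal sketch, assuming the standard genetic code and the naming style of the examples above (full names, “Aspartic acid” with a space, and “.” for the stop codons TAA, TAG and TGA). Names for amino acids that never appear in the examples, such as “Glutamic acid” or “Tryptophan”, are an assumption, so check them against the conversion table from your course:

```python
# Standard genetic code. Stop codons map to "." to match the example output.
CODON_TABLE = {
    "TTT": "Phenylalanine", "TTC": "Phenylalanine", "TTA": "Leucine", "TTG": "Leucine",
    "TCT": "Serine", "TCC": "Serine", "TCA": "Serine", "TCG": "Serine",
    "TAT": "Tyrosine", "TAC": "Tyrosine", "TAA": ".", "TAG": ".",
    "TGT": "Cysteine", "TGC": "Cysteine", "TGA": ".", "TGG": "Tryptophan",
    "CTT": "Leucine", "CTC": "Leucine", "CTA": "Leucine", "CTG": "Leucine",
    "CCT": "Proline", "CCC": "Proline", "CCA": "Proline", "CCG": "Proline",
    "CAT": "Histidine", "CAC": "Histidine", "CAA": "Glutamine", "CAG": "Glutamine",
    "CGT": "Arginine", "CGC": "Arginine", "CGA": "Arginine", "CGG": "Arginine",
    "ATT": "Isoleucine", "ATC": "Isoleucine", "ATA": "Isoleucine", "ATG": "Methionine",
    "ACT": "Threonine", "ACC": "Threonine", "ACA": "Threonine", "ACG": "Threonine",
    "AAT": "Asparagine", "AAC": "Asparagine", "AAA": "Lysine", "AAG": "Lysine",
    "AGT": "Serine", "AGC": "Serine", "AGA": "Arginine", "AGG": "Arginine",
    "GTT": "Valine", "GTC": "Valine", "GTA": "Valine", "GTG": "Valine",
    "GCT": "Alanine", "GCC": "Alanine", "GCA": "Alanine", "GCG": "Alanine",
    "GAT": "Aspartic acid", "GAC": "Aspartic acid", "GAA": "Glutamic acid", "GAG": "Glutamic acid",
    "GGT": "Glycine", "GGC": "Glycine", "GGA": "Glycine", "GGG": "Glycine",
}


def translate(dna):
    """Return the comma-separated amino acid names for a DNA string."""
    dna = dna.strip().upper()
    # Assumption: a trailing partial codon (1 or 2 leftover bases) is ignored.
    usable = len(dna) - len(dna) % 3
    # Slice out each 3-character codon and look up its amino acid.
    amino_acids = [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]
    # Join with a single comma and no spaces, as in the examples.
    return ",".join(amino_acids)


print(translate("GGTCGGTAGACAGGTCGGTAGACAGGTCGGTAGACA"))
# Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine,Glycine,Arginine,.,Threonine
```

A codon containing anything other than T, G, A or C raises `KeyError`, which seems reasonable here because step 2 says the samples use only those four letters.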

4. Go back to the part of your program where it iterates through the rows of the CSV and, inside the loop, pass each patient’s DNA sequence to your translate(dna) function. For each patient, you now have the patient’s number, the patient’s DNA (as a string), and the patient’s corresponding amino acid sequence (as a string). Write a conditional to check whether the patient’s amino acid sequence contains the string stored in query. (Hint: use the . Or use the find() method.) If the patient’s amino acid sequence contains the query, print that patient’s number on its own line.
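Putting steps 2 to 4 together, here is a minimal end-to-end sketch. It repeats the codon table so that it runs on its own, uses the `in` operator for the containment test, and assumes the same comma-separated `samples.csv` layout with a `Patient,Sample` header; the function name `print_matching_patients` is hypothetical:

```python
import csv

# Standard genetic code; stop codons map to "." as in the examples.
CODON_TABLE = {
    "TTT": "Phenylalanine", "TTC": "Phenylalanine", "TTA": "Leucine", "TTG": "Leucine",
    "TCT": "Serine", "TCC": "Serine", "TCA": "Serine", "TCG": "Serine",
    "TAT": "Tyrosine", "TAC": "Tyrosine", "TAA": ".", "TAG": ".",
    "TGT": "Cysteine", "TGC": "Cysteine", "TGA": ".", "TGG": "Tryptophan",
    "CTT": "Leucine", "CTC": "Leucine", "CTA": "Leucine", "CTG": "Leucine",
    "CCT": "Proline", "CCC": "Proline", "CCA": "Proline", "CCG": "Proline",
    "CAT": "Histidine", "CAC": "Histidine", "CAA": "Glutamine", "CAG": "Glutamine",
    "CGT": "Arginine", "CGC": "Arginine", "CGA": "Arginine", "CGG": "Arginine",
    "ATT": "Isoleucine", "ATC": "Isoleucine", "ATA": "Isoleucine", "ATG": "Methionine",
    "ACT": "Threonine", "ACC": "Threonine", "ACA": "Threonine", "ACG": "Threonine",
    "AAT": "Asparagine", "AAC": "Asparagine", "AAA": "Lysine", "AAG": "Lysine",
    "AGT": "Serine", "AGC": "Serine", "AGA": "Arginine", "AGG": "Arginine",
    "GTT": "Valine", "GTC": "Valine", "GTA": "Valine", "GTG": "Valine",
    "GCT": "Alanine", "GCC": "Alanine", "GCA": "Alanine", "GCG": "Alanine",
    "GAT": "Aspartic acid", "GAC": "Aspartic acid", "GAA": "Glutamic acid", "GAG": "Glutamic acid",
    "GGT": "Glycine", "GGC": "Glycine", "GGA": "Glycine", "GGG": "Glycine",
}


def translate(dna):
    """Return the comma-separated amino acid names for a DNA string."""
    dna = dna.strip().upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon (assumption)
    return ",".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def print_matching_patients(query, csv_path="samples.csv"):
    """Print, one per line, each patient whose translated sample contains query."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            patient = row["Patient"].strip()
            amino = translate(row["Sample"])  # step 4: translate inside the loop
            if query in amino:  # same result as amino.find(query) != -1
                print(patient)
```

The script’s last two lines would then be `query = input("Enter an amino acid sequence: ")` followed by `print_matching_patients(query)`. With the four example rows in `samples.csv`, the query “Threonine,Valine,.” should print 22233344 and then 55667788.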

This assignment is worth 70 points:

  • 10 points for reading the amino acid sequence into a variable called query
  • 10 points for creating a function called translate(dna) near the top of your program
  • 10 points if the translate(dna) function gives the correct outputs for the following inputs:
    • GGTCGGTAGACAGGTCGGTAGACAGGTCGGTAGACA
    • TTTCAGAATTAGACTGTTTAGAGAAACTAGACCACA
    • CCTAGTATGCACTATTGAAATGCTCGTTGATAGACA
    • TGCTCGTTAGTGAACACTTAGACTGTTTAGGACCAC
  • 10 points if the translate(dna) function gives the correct output for one other DNA string that you don’t know in advance (it’s a secret)
  • 10 points if your program handles the following queries correctly for :
    • It prints one line containing “33344455” when given a query of “Methionine,Histidine,Tyrosine,.”
    • It prints two lines containing “22233344” and “55667788” when given the input “Threonine,Valine,.”
  • 20 points if your program handles another different CSV with many rows, and a query that you don’t know in advance (it’s a secret)