22131_Trinucleotide

2022-5-16 18:21 | Posted by: Hocassian | Views: 23 | Comments: 0 | Original author: 肇庆学院ACM合集 (Zhaoqing University ACM Collection)

Pro.ID: 22131

Title: Trinucleotide

Title link: http://10.20.2.8/oj/exercise/problem?problem_id=22131

AC: 0

Submit: 80

Ratio: 0.00%

Time & memory limits

  • Time Limit: 2000/1000 MS (Java/Others)     Memory Limit: 65536/65536 K (Java/Others)
  • Description

    Bioinformatics is a young field that has developed rapidly, and research on DNA (deoxyribonucleic acid) is very popular today. DNA is made up of nucleotides, denoted 'A', 'T', 'C', and 'G'.

    From the literature we know that the "Genomic Signature", another hot topic in bioinformatics, is preserved in short DNA fragments. This problem therefore focuses on the "Trinucleotide": a DNA fragment of length 3.

    There are 64 kinds of trinucleotide: "AAA", "AAG", ..., "GGG". A DNA sequence of length L contains (L-2) trinucleotides. We analyze them statistically as follows:

    We label these (L-2) trinucleotides from 1 to L-2, in the order they appear in the sequence.

    We consider every pair of trinucleotides; there are (L-2)*(L-3)/2 pairs in total. If the two trinucleotides in a pair are identical, we record their distance, defined as the difference between their labels.

    The recorded distances form the "Sample Data", and we need to calculate its "Variance": S^2 = [(x1-X)^2 + (x2-X)^2 + ... + (xn-X)^2]/n, where X = (x1+x2+...+xn)/n. If the sample size n=0, we take S^2 = X = 0.

    For example, for the DNA sequence ATATATA:

    We label the trinucleotides: L1: ATA, L2: TAT, L3: ATA, L4: TAT, L5: ATA

    (L1, L3)=2, (L1, L5)=4, (L3, L5)=2, (L2, L4)=2, so the sample data are 2, 4, 2, 2.

    The average X = (2+4+2+2)/4 = 2.5.

    The variance S^2 = [(2-2.5)^2 + (4-2.5)^2 + (2-2.5)^2 + (2-2.5)^2]/4 = 0.75
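
    The definition above can be checked directly with a brute-force sketch. The function name `variance_bruteforce` is hypothetical and not part of the problem statement; the code compares every pair of trinucleotides, so it is only suitable for short sequences such as the example. It uses exact fractions to avoid floating-point error.

```python
from fractions import Fraction


def variance_bruteforce(dna: str) -> Fraction:
    """Variance of label distances between identical trinucleotides (O(L^2))."""
    # Index 0 in this list corresponds to label 1; distances are unaffected.
    trinucleotides = [dna[i:i + 3] for i in range(len(dna) - 2)]
    distances = [
        j - i
        for i in range(len(trinucleotides))
        for j in range(i + 1, len(trinucleotides))
        if trinucleotides[i] == trinucleotides[j]
    ]
    n = len(distances)
    if n == 0:
        return Fraction(0)  # the statement defines the variance as 0 when n = 0
    mean = Fraction(sum(distances), n)
    return sum((d - mean) ** 2 for d in distances) / n


print(variance_bruteforce("ATATATA"))  # 3/4
```

    For ATATATA this reproduces the distances 2, 4, 2, 2 and the variance 3/4 = 0.75 from the worked example.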

    Given a DNA sequence, calculate the variance described above.

    Input

    The first line of input contains one integer T (T ≤ 100), the number of test cases. Each test case consists of one string over 'A', 'T', 'C', 'G', which is the DNA sequence. The length of the string is at least 3 and at most 100000.
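
    With lengths up to 100000, a sequence of one repeated nucleotide produces roughly 5e9 matching pairs, so enumerating pairs is not practical. One possible approach, sketched below under that assumption, keeps three running totals per trinucleotide (count, sum of labels, sum of squared labels) and uses S^2 = (sum of x^2)/n - X^2. The function name `variance_linear` is hypothetical, and this is not claimed to be the judge's reference solution. Python integers are unbounded, so the large intermediate sums stay exact.

```python
from collections import defaultdict
from fractions import Fraction


def variance_linear(dna: str) -> Fraction:
    """Variance of distances between identical trinucleotides in O(L) time."""
    # Per trinucleotide: [count, sum of labels, sum of squared labels] seen so far.
    stats = defaultdict(lambda: [0, 0, 0])
    n = 0          # number of matching pairs
    total = 0      # sum of distances
    total_sq = 0   # sum of squared distances
    for label in range(1, len(dna) - 1):  # labels 1 .. L-2
        tri = dna[label - 1:label + 2]
        count, s, sq = stats[tri]
        # Pair the current label p with every earlier label q of the same kind:
        # sum(p - q) = count*p - s, sum((p - q)^2) = count*p^2 - 2*p*s + sq.
        n += count
        total += count * label - s
        total_sq += count * label * label - 2 * label * s + sq
        stats[tri] = [count + 1, s + label, sq + label * label]
    if n == 0:
        return Fraction(0)
    # E[x^2] - E[x]^2, kept as one exact fraction.
    return Fraction(n * total_sq - total * total, n * n)


print(variance_linear("ATATATA"))  # 3/4
```

    Keeping the result as an exact fraction until output avoids the cancellation that can occur when E[x^2] and E[x]^2 are both large floating-point numbers.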

    Output

    For each test case, output one line containing the variance, rounded to six decimal places (1e-6). An answer is accepted if its relative error against the standard output is less than 1e-8.
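
    Because the tolerance is tight, it may help to round an exact value rather than a float. The sketch below formats a non-negative `Fraction` to six decimal places using integer arithmetic only. The helper name `format_fixed6` is hypothetical, and Python's `round` on a `Fraction` rounds ties to the nearest even integer, which is an assumption about what the judge expects rather than something the statement specifies.

```python
from fractions import Fraction


def format_fixed6(value: Fraction) -> str:
    """Format a non-negative exact value with six digits after the decimal point."""
    scaled = round(value * 10**6)          # nearest integer, ties to even
    whole, frac = divmod(scaled, 10**6)    # split into integer and fractional parts
    return f"{whole}.{frac:06d}"


print(format_fixed6(Fraction(3, 4)))  # 0.750000
```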

    Sample Input

    1
    ATATATA

    Sample Output

    0.750000
