HCV Database
HCV sequence database
 



To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but we cannot guarantee help with using this site, and data will no longer be manually curated.


TreeRate Explanation

Options for multiple timepoints

Specify time points and group taxa

Below is an example of grouped taxon names. Groups are specified as lists of taxon names. Each taxon must be on a separate line, and groups are separated by an empty line. The first item in a group is taken as the time point of the group; it should be numeric and end in ':'. If the first item of a group is 'Discard:' instead of a time point, the taxa belonging to that group will not be considered in the calculation. Any taxa that are not present in any group are treated as members of the 'Discard:' group.

The following can be pasted in with the Sample Input for testing the Grouped Taxon Names option:

1990:
B.US.90.5_
B.US.90.2_
B.US.90.3_

1981:
B.US.81.7_
B.US.81.5_
B.US.81.2_
B.US.81.6_

Discard:
B.US.81.1_
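
The format above can be read with a few lines of code. The following is a minimal sketch in Python, not the tool's own parser; the function name `parse_groups` and the error handling are assumptions made for illustration.

```python
def parse_groups(text):
    """Parse grouped taxon names into {label: [taxon, ...]}.

    An empty line closes a group; the first line of each group is its
    label (a time point or 'Discard') and must end in ':'.
    """
    groups = {}
    current = None  # label of the group currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # empty line: the next non-empty line is a label
            continue
        if current is None:
            if not line.endswith(":"):
                raise ValueError(f"group label must end in ':': {line!r}")
            current = line[:-1]
            groups.setdefault(current, [])
        else:
            groups[current].append(line)
    return groups


sample = """1990:
B.US.90.5_
B.US.90.2_
B.US.90.3_

1981:
B.US.81.7_
B.US.81.5_
B.US.81.2_
B.US.81.6_

Discard:
B.US.81.1_
"""

groups = parse_groups(sample)
```

Here `groups` maps each label ('1990', '1981', 'Discard') to its list of taxon names.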

Discard taxa

You can remove sequences, or groups of sequences, from the analysis without removing them from the Newick tree. Taxa to be discarded can be included in the file of grouped taxon names (as shown above), or submitted as a separate file in the "Discard taxa" input box:

Example:

Discard:
B.US.90.1_
B.US.81.7_

or just taxon names:

B.US.90.1_
B.US.81.7_
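
Both forms of the discard list carry the same information. As a rough sketch (the helper name `parse_discard` is hypothetical and not part of the tool), they can be normalized to one list of taxon names in Python:

```python
def parse_discard(text):
    """Return the taxon names from a discard list.

    Accepts the list with or without a leading 'Discard:' line.
    """
    names = [line.strip() for line in text.splitlines() if line.strip()]
    if names and names[0] == "Discard:":
        names = names[1:]  # drop the optional header
    return names


with_header = "Discard:\nB.US.90.1_\nB.US.81.7_\n"
without_header = "B.US.90.1_\nB.US.81.7_\n"

discard_a = parse_discard(with_header)
discard_b = parse_discard(without_header)
```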

Values of time points are 2-digit years

If your sequences are named with 2-digit years (for example, B.US.08.sequence_name), select this option so that the year values are interpreted as calendar years and used as the actual time distance between samples.

For example, if left unchecked, the timepoint values 99, 00, 01 would be placed in numerical order as 00, 01, 99 rather than in chronological order (99, 00, 01).
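
The difference between the two orderings can be illustrated with a short Python sketch. The century cutoff (`pivot=30`) and the function name are assumptions chosen for this example; this page does not state which rule the tool uses to expand 2-digit years.

```python
def expand_two_digit_year(yy, pivot=30):
    """Map a 2-digit year to a 4-digit year.

    ASSUMPTION: values below `pivot` are read as 20xx, all others as 19xx.
    The tool's actual rule is not documented here.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


labels = ["99", "00", "01"]

# Option unchecked: plain numerical ordering of the labels.
numeric_order = sorted(labels, key=int)

# Option checked: ordering by the calendar year each label stands for.
chronological_order = sorted(labels, key=lambda s: expand_two_digit_year(int(s)))
```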

Calculate evolutionary rate

Select this option to calculate an evolutionary rate. You must define the numerical time distance between your grouped timepoints, unless this information is encoded in your timepoint names.

For example, if your sampling timepoints were named A, B, C, you need to provide a data file that defines the actual time distance numerically. If your timepoints were named 5.0, 6.6, 8.2, and if these numbers correspond to the actual number of years (or months or days), then you do NOT need to provide any additional input here. See sample file.

Sequence length to calculate Jukes-Cantor error

Enter the length of the alignment that was used to generate the tree. This number is required to generate the approximate confidence interval of Δd.
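
To see why the alignment length matters, the sketch below uses the standard large-sample variance of a Jukes-Cantor distance, which shrinks as the number of sites grows. This is an illustration in Python, not the tool's code: how TreeRate combines such variances into the interval for Δd is not described on this page, and the 1.96 multiplier (a normal-approximation 95% interval) and the example values are assumptions.

```python
import math


def jc_variance(d, seq_len):
    """Approximate sampling variance of a Jukes-Cantor distance d
    estimated from an alignment of seq_len sites."""
    # Proportion of differing sites implied by the JC distance d.
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return p * (1.0 - p) / (seq_len * (1.0 - 4.0 * p / 3.0) ** 2)


d = 0.05          # hypothetical distance, substitutions per site
seq_len = 1000    # hypothetical alignment length

se = math.sqrt(jc_variance(d, seq_len))
interval = (d - 1.96 * se, d + 1.96 * se)  # ASSUMPTION: normal approximation
```

Doubling `seq_len` halves the variance, so longer alignments give narrower intervals.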

Trim out discarded taxa

If checked, this option will remove discarded taxa from your tree (and from the resulting treefile that you can pass to other tools). If unchecked, discarded taxa will not be used in the analysis, but will still appear in your tree.

Calculations for multiple time points

Based on the user input, the tool roots the input tree in all possible ways. For each rooting point, the tool estimates the average distance from the root to the Timepoint 1 taxa (x1) and the average distance from the root to the Timepoint 2 taxa (x2). The difference between these two averages (x2 - x1) gives a Δd value for that rooting point. There can be two or more timepoints (defined by the user's groupings). The tool then calculates, for each rooting point, the sum of variances of the taxa in the different timepoints. The Δd from the rooting point with the lowest sum of variances gives the best estimate of the evolutionary rate for the chosen time points in the tree.

Figure. Illustration of average distance (x) and difference (Δd) values for 2 time points.
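
The procedure described above can be sketched in Python for two time points. This is a simplified illustration under stated assumptions, not the tool's implementation: the tree, branch lengths, and names are hypothetical; only internal nodes are tried as rooting points (the tool may also root along branches); and "variance" is taken to mean the population variance of root-to-taxon distances within each group.

```python
from statistics import mean, pvariance

# Hypothetical unrooted tree given as (node, node, branch length) edges.
edges = [
    ("n1", "a1", 0.010), ("n1", "a2", 0.010), ("n1", "n2", 0.004),
    ("n2", "a3", 0.006), ("n2", "n3", 0.020),
    ("n3", "b1", 0.010), ("n3", "b2", 0.010),
]
tree = {}
for u, v, length in edges:
    tree.setdefault(u, {})[v] = length
    tree.setdefault(v, {})[u] = length

# Hypothetical grouping: t1 is the earlier time point, t2 the later one.
groups = {"t1": ["a1", "a2", "a3"], "t2": ["b1", "b2"]}


def distances_from(tree, root):
    """Path length from root to every node (depth-first traversal)."""
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbor, length in tree[node].items():
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def score_rooting(tree, root, groups):
    """Return (delta_d, sum of within-group variances) for one rooting point."""
    dist = distances_from(tree, root)
    x1 = mean(dist[t] for t in groups["t1"])
    x2 = mean(dist[t] for t in groups["t2"])
    var_sum = sum(pvariance([dist[t] for t in taxa]) for taxa in groups.values())
    return x2 - x1, var_sum


# ASSUMPTION: candidate rooting points are the internal nodes only.
internal_nodes = [n for n in tree if len(tree[n]) > 1]
results = {root: score_rooting(tree, root, groups) for root in internal_nodes}

# Keep the rooting point with the lowest sum of variances.
best_root = min(results, key=lambda r: results[r][1])
delta_d = results[best_root][0]
```

In this toy tree the taxa of each group are equidistant from `n1`, so that rooting point has the lowest sum of variances and supplies the Δd.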

To calculate the evolutionary rate, the tool computes an average time for each group. The differences between the average times give Δt values. The evolutionary rate for each Δt is calculated by dividing Δd by Δt, and is presented as substitutions per site per unit time (in whatever units were used in the dates input file). The evolutionary rate is calculated for every rooting point of the input tree; the best estimate is the one from the rooting point with the lowest sum of variances.
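
As a small worked example of the Δd / Δt step, here is a sketch in Python. The function name, the Δd value, and the sampling years are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def evolutionary_rate(delta_d, times_1, times_2):
    """Rate = delta_d / delta_t, where delta_t is the difference between
    the average sampling times of the two groups."""
    delta_t = mean(times_2) - mean(times_1)
    if delta_t == 0:
        raise ValueError("the two groups have the same average time")
    return delta_d / delta_t


# Hypothetical values: delta_d in substitutions per site, times in years.
delta_d = 0.024
rate = evolutionary_rate(delta_d, [1981, 1981, 1981, 1981], [1990, 1990, 1990])
```

With times given in years, `rate` is in substitutions per site per year; here Δt is 9 years.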

Output for multiple time points

 




Questions or comments? Contact us at seq-info@lanl.gov