HCV Database
HCV sequence database
 



To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but unfortunately we cannot guarantee that we can provide help with using this site. Data will no longer be manually curated either.


Consensus Maker takes an input file of aligned sequences in table, mase, or fasta format and calculates a consensus sequence for those sequences. The program returns either the consensus alone or the consensus combined with the original alignment. A copy of the output file can be downloaded. If the input alignment contains blocks of sequences (e.g., HCV sequences grouped by genotype), the program can calculate a consensus for each sequence block.

A good way to understand the options available in this program is to click the blue Sample Input button at the top of the submission page. This causes a simple, hypothetical alignment (in table format) to be loaded into the form. You can then calculate the consensus of this alignment under varying input options to see the results of those options. Each column of the Sample Input has been chosen to illustrate the workings of the various options. Col. 1: unanimity, Col. 2: majority, Col. 3: no majority letter but resolvable by common character, Col. 4: gaps, Col. 5: irresolvable tie in consensus, Col. 6: X character, Cols. 7-9: missing information (trailing blanks).

Consensus options

  • Format of input alignment. There are three acceptable input formats -- fasta (default), mase (Intelligenetics), or table. Example formats.
  • Do consensus for each block. If the input contains blocks of sequences, calculate a consensus for each block, not just a single consensus for the alignment as a whole. Default = false. If false, only a single consensus is computed for the entire alignment. If true, you must leave a blank line between sequence blocks. Example.
  • Min. no. seqs. for consensus. If a block contains fewer than "n" sequences, then don't calculate a consensus for that block. Default = 3.
  • Do consensus of consensuses. If consensuses are to be computed for each block in the alignment, also calculate a consensus of these consensuses. Default = false.
  • Consensus + alignment. Results will show consensus appended to the top of the user's alignment. Default = true. When false, the output consists of the consensus alone.
  • Unanimous value. The fraction of characters in a column of the alignment needed to establish unanimity (shown as a capital letter) for that column. For example, if unanimous = 1.0 then all characters in a column must be the same in order for the consensus to show a capital letter. A value of 0.9 requires 90% agreement to show a capital. Default = 1.0
  • Majority value. The fraction of characters in a column of the alignment needed to establish majority (shown as a lowercase letter) for that column. For example, if majority = 0.5 then at least half the characters in a column must agree in order for the consensus to show a lowercase letter. If a column has no majority letter, the consensus shows either a ? or the most common character, depending on the "Use most common character" option. Default = 0.5
  • Use most common character. This option determines what symbol to enter in the consensus for a column that has no majority character. Suppose a column contains the letters AAAGGTTC. Should that column be represented in the consensus by "a" (i.e., the most common letter)? If so, set this value to its default, true. Or should that column be represented in the consensus by "?" (i.e., no letter forms a majority)? If so, set this value to false. If there is a tie between the two most common letters, e.g., AAAGGGT, the program puts a "?" into the consensus. If multiple blocks are present in the alignment and there is a tie between two letters in one block, the program will try to resolve the tie by looking at that column of the alignment in all other blocks as well. For example, if column 1 of block 1 is AAAGGGT, and column 1 of block 2 is AAAAG, then the consensus for column 1 of block 1 will be "a", not "?".
  • Characters to count when making consensus. This is the set of characters ("letters") that the program considers when making a consensus. The default for nucleotide alignments is the set of valid nucleotide characters plus the gap character: "ACGTU-". Using these defaults, the alignment column AAAAAXAA would have a consensus of "A" because the "X" character is ignored -- it's not in the set of valid characters. If we edit the ACGTU- set by adding "X" to it, then the consensus for that column would be "a" (majority A, not unanimous). A similar set of amino acid codes, also editable, is defined on the input form. You should first run your alignment with the default character sets to see if that produces the consensus you want. If not, you can edit the character sets so the resulting consensus matches your intent.
  • Make a consensus of IUPAC ambiguity codes for an alignment, or make multiple consensuses of blocks of sequences in an alignment. This site works only for nucleotide alignments. Characters considered are ACGTU. You can specify a percentage presence that any character must meet if it is to be considered in computing the consensus for a column. For example, the column AAAAAAAAAAG would be scored as "R" if the percentage is 0, but as "A" if the percentage is 10%. In the latter case the G is ignored since it makes up <10% of the total column (1 of 11 characters, about 9%).
  • Use any character when making consensus. Finally, if you want to consider ALL characters (including blanks, *, x, $, etc.) when making a consensus, check this box.
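
The threshold rules in the options above can be sketched in a few lines of Python. This is only an illustration of the documented behavior for a single column, not the site's actual code: the function names column_consensus and iupac_consensus are hypothetical, returning "?" for a column with no counted characters is an assumption, treating U as T in the IUPAC lookup is an assumption, and the cross-block tie resolution described above is not implemented.

```python
from collections import Counter


def column_consensus(column, unanimous=1.0, majority=0.5,
                     use_most_common=True, valid="ACGTU-"):
    """Return one consensus symbol for a single alignment column (sketch)."""
    # Only characters in the valid set are counted; all others are ignored.
    counted = [c for c in column.upper() if c in valid]
    if not counted:
        return "?"  # assumption: nothing countable in this column
    ranked = Counter(counted).most_common()
    top_char, top_count = ranked[0]
    fraction = top_count / len(counted)
    if fraction >= unanimous:
        return top_char  # capital letter: unanimity threshold met
    # A tie for first place is reported as "?" (no cross-block lookup here).
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "?"
    if fraction >= majority:
        return top_char.lower()  # lowercase letter: majority threshold met
    # No majority: fall back to the most common character, or "?".
    return top_char.lower() if use_most_common else "?"


# Standard IUPAC nucleotide ambiguity codes, keyed by sorted base sets.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def iupac_consensus(column, min_percent=0.0):
    """Return the IUPAC code for the bases that meet min_percent (sketch)."""
    # Assumption: U is folded into T before looking up the ambiguity code.
    bases = ["T" if c == "U" else c for c in column.upper() if c in "ACGTU"]
    if not bases:
        return "?"
    counts = Counter(bases)
    kept = sorted(b for b, n in counts.items()
                  if 100.0 * n / len(bases) >= min_percent)
    return IUPAC_CODES.get("".join(kept), "?")


print(column_consensus("AAAAAXAA"))                     # A
print(column_consensus("AAAAAXAA", valid="ACGTU-X"))    # a
print(column_consensus("AAAGGTTC"))                     # a
print(column_consensus("AAAGGGT"))                      # ?
print(iupac_consensus("AAAAAAAAAAG", min_percent=0))    # R
print(iupac_consensus("AAAAAAAAAAG", min_percent=10))   # A
```

The printed values reproduce the examples given in the option descriptions.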

Example file formats:

fasta format (for the purposes of this program, the blank line following sequence 2 divides the alignment into 2 blocks)

>SequenceName1
aaactatcgtagctagctagctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc
>SequenceName2
aaactatcgtagctagctag------gatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc

>SequenceName3
aaactatcgtagctagctttctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc
>SequenceName4
acactatcgtagctagctagctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctttctagctgatcgcgcgagcgctacgagc
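
As a rough illustration of how a fasta file like the one above splits into blocks, the following Python sketch groups sequences by blank lines. The function name read_fasta_blocks is hypothetical and this is not the site's own parser; it assumes that any blank line ends the current block and that sequence lines belonging to one name are simply concatenated.

```python
def read_fasta_blocks(text):
    """Parse fasta text into blocks of (name, sequence) pairs (sketch).

    Assumption: a blank line ends the current block.
    """
    blocks, current = [], []
    name, parts = None, []

    def flush_sequence():
        # Store the sequence collected so far, if any, in the current block.
        nonlocal name, parts
        if name is not None:
            current.append((name, "".join(parts)))
        name, parts = None, []

    for line in text.splitlines():
        line = line.strip()
        if not line:
            flush_sequence()
            if current:
                blocks.append(current)
                current = []
        elif line.startswith(">"):
            flush_sequence()
            name = line[1:]
        else:
            parts.append(line)  # continuation line of the current sequence

    flush_sequence()
    if current:
        blocks.append(current)
    return blocks


example = """>SequenceName1
aaact
atcag
>SequenceName2
aaact
atcag

>SequenceName3
aaact
atcag
"""
for number, block in enumerate(read_fasta_blocks(example), start=1):
    print(number, [seq_name for seq_name, _ in block])
```

Each block can then be checked against the "Min. no. seqs. for consensus" setting before a consensus is computed for it.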

mase format (for the purposes of this program, the blank line following sequence 2 divides the alignment into 2 blocks)

;
SequenceName1
aaactatcgtagctagctagctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc
;
SequenceName2
aaactatcgtagctagctag------gatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc

;
SequenceName3
aaactatcgtagctagctttctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctagctagctgatcgcgcgagcgctacgagc
;
SequenceName4
acactatcgtagctagctagctgatcgatgctagctgatcgaccggcgagagcgatgatctactatc
atcagcgagcatcgacgagctttctagctgatcgcgcgagcgctacgagc

table format: one sequence per line, with a "tab" character between name and sequence. The blank line following sequence 2 divides the alignment into 2 blocks.

SequenceName1 aaactatcgtagctagctagctgatcgatgctagctgatcg.... etc
SequenceName2 aaactatcgtagctagctag------gatgctagctgatcg.... etc

SequenceName3 aaactatcgtagctagctttctgatcgatgctagctgatcg.... etc
SequenceName4 acactatcgtagctagctagctgatcgatgctagctgatcg.... etc
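
The table layout above is simple enough to read with a short Python sketch. The names read_table_blocks and columns_per_block are hypothetical and this is not the site's own code; the sketch assumes a single tab separates name and sequence, pads short rows with blanks (standing in for missing information), and skips blocks with fewer sequences than the "Min. no. seqs. for consensus" setting.

```python
from itertools import zip_longest


def read_table_blocks(text):
    """Parse table-format text into blocks of (name, sequence) pairs (sketch)."""
    blocks, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current block.
            if current:
                blocks.append(current)
                current = []
            continue
        # Split at the first tab only; trailing blanks in the sequence are kept.
        name, _, seq = line.partition("\t")
        current.append((name.strip(), seq))
    if current:
        blocks.append(current)
    return blocks


def columns_per_block(blocks, min_seqs=3):
    """Yield (block_index, columns) for blocks with at least min_seqs rows."""
    for index, block in enumerate(blocks):
        if len(block) < min_seqs:
            continue  # too few sequences: no consensus for this block
        seqs = [seq for _, seq in block]
        # Pad short rows with blanks so every column has one entry per row.
        columns = ["".join(col) for col in zip_longest(*seqs, fillvalue=" ")]
        yield index, columns


example = "s1\tACGT\ns2\tACG-\n\ns3\tAC\ns4\tAAGT\ns5\tACGT\n"
for index, columns in columns_per_block(read_table_blocks(example)):
    print(index, columns)
```

With the default minimum of 3, the first block in this example (two sequences) is skipped and only the second block yields columns.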



Questions or comments? Contact us at hcv-info@lanl.gov