HCV Database
HCV sequence database
 



To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but unfortunately we cannot guarantee help with using this site. The data will not be manually curated either.


Explanation of QuickAlign Options

Provide query sequence(s)

Provide one or more nucleotide or protein sequences in any standard format, and the tool will align each one to the selected premade alignment. This tool can be used to align epitopes, functional domains, primers, or any region of interest. If multiple input sequences are used, they do not need to be aligned or to come from the same region. The output will be trimmed to the region(s) of the provided sequences. If nucleotide sequences are provided, the reverse complement of each sequence will also be considered when finding the best match. If the reverse-complement sequence has a better match score than the original query sequence, then the aligned position of the reverse-complement sequence, rather than that of the direct sequence match, will be used to retrieve the alignment.
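The strand choice described above can be sketched in Python. This is only an illustration: the scoring method QuickAlign actually uses is not described on this page, so the sketch substitutes a simple count of matching positions in an ungapped sliding window. The function names (reverse_complement, best_ungapped_hit, choose_orientation) and the rule that ties keep the direct strand are assumptions made for the example.

```python
# Illustrative sketch only. The match score here (identical positions in an
# ungapped window) is an assumption, not QuickAlign's documented scoring.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_ungapped_hit(query: str, reference: str) -> tuple[int, int]:
    """Slide the query along the reference; return (matches, start) of the best window."""
    best = (-1, 0)
    for start in range(len(reference) - len(query) + 1):
        window = reference[start:start + len(query)]
        matches = sum(q == r for q, r in zip(query, window))
        if matches > best[0]:
            best = (matches, start)
    return best


def choose_orientation(query: str, reference: str) -> tuple[str, int]:
    """Pick the strand with the better score; a tie keeps the direct strand (assumption)."""
    query = query.upper()
    reference = reference.upper()
    direct = best_ungapped_hit(query, reference)
    revcomp = best_ungapped_hit(reverse_complement(query), reference)
    if revcomp[0] > direct[0]:
        return "reverse-complement", revcomp[1]
    return "direct", direct[1]
```

In this sketch, a query entered on the opposite strand is located at the same position as its direct counterpart, which mirrors the behaviour described above of using the reverse-complement position to retrieve the alignment.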

Choose region coordinates

Instead of providing a query sequence, QuickAlign can be used to simply show the alignment of any particular region from our premade alignments. Enter coordinate values (either coordinates relative to the complete HCV sequence, or relative to any gene) and the tool will extract an alignment encompassing just the region of these coordinates. To show an entire gene or region, enter the start value as "1" and the end value as "end".
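As a rough illustration of how a start value of "1" and an end value of "end" could be interpreted, the Python sketch below cuts a column range out of a small alignment. It is a simplification: the real tool maps coordinates relative to the complete HCV sequence or to a gene, whereas this example treats the values directly as 1-based alignment columns. The names parse_region and extract_region are hypothetical.

```python
# Illustrative sketch only. Coordinates are treated as 1-based, inclusive
# alignment columns, which is a simplifying assumption.
def parse_region(start: str, end: str, length: int) -> tuple[int, int]:
    """Convert 1-based inclusive form values into a Python slice (begin, stop)."""
    begin = int(start) - 1
    stop = length if end.strip().lower() == "end" else int(end)
    if not (0 <= begin < stop <= length):
        raise ValueError(f"region {start}..{end} is outside 1..{length}")
    return begin, stop


def extract_region(alignment: dict[str, str], start: str, end: str) -> dict[str, str]:
    """Cut the same column range out of every row of an alignment."""
    length = len(next(iter(alignment.values())))
    begin, stop = parse_region(start, end, length)
    return {name: row[begin:stop] for name, row in alignment.items()}
```

With these definitions, a request for start "1" and end "end" returns every row unchanged, while start "2" and end "4" returns columns 2 through 4 of each row.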

LANL database alignment type

Several choices are provided. See the Alignments page for additional details on the premade alignment types offered.

Delete Gaps and Shift

If the "Delete gaps and shift" option is selected, then gaps placed to bring sequences into alignment will be squeezed out and the alignment shifted rightwards (toward the C-terminal end). For example, suppose your query has a one-amino-acid insertion relative to most other sequences. Then the following alignment:

QUERY  VARELHP
REF    VAR-LHP
seq2   VAR-LHP
seq3   VAR-LFP
seq4   VAR-LMP

would be presented like this with gaps deleted:

QUERY  VARELHP
REF    QVARLHP
seq2   QVARLHP
seq3   QVARLFP
seq4   QVARLMP

Q is the amino acid one position to the left of the V. As a result of squeezing gaps and shifting characters rightward, alignments in gappy regions will look "bad."

NOTE: The delete gaps option is useful for aligning immunologically reactive epitopes, because in such cases it is particularly important to maintain the alignment of the C-terminal anchor residues.
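The squeeze-and-shift behaviour in the example above can be reproduced with a short Python sketch. This is a minimal illustration, not QuickAlign's implementation: the function name delete_gaps_and_shift and its left_flank argument (the residues immediately to the left of the displayed region in the same sequence) are hypothetical, and the padding used when the flank is too short is an assumption.

```python
# Illustrative sketch only; names and edge-case handling are assumptions.
def delete_gaps_and_shift(region: str, left_flank: str, gap: str = "-") -> str:
    """Remove gaps from one aligned row and right-justify it to the original width.

    The vacated positions on the left are filled with the residues that
    precede the region in the same sequence (left_flank). If the flank is
    too short, the remaining positions are padded with gap characters.
    """
    squeezed = region.replace(gap, "")
    missing = len(region) - len(squeezed)
    flank = left_flank.replace(gap, "")
    fill = flank[-missing:] if missing else ""
    return fill.rjust(missing, gap) + squeezed


rows = {"QUERY": "VARELHP", "REF": "VAR-LHP", "seq3": "VAR-LFP"}
# Assumes every sequence has Q immediately to the left of the V, as in the example.
shifted = {name: delete_gaps_and_shift(row, "Q") for name, row in rows.items()}
# shifted["REF"] is "QVARLHP"; the gap-free QUERY row is unchanged.
```

Because each row is right-justified independently, the C-terminal residues stay in the same columns, which is the property the note above relies on for anchor residues.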

Wide Output

For ease of reading, QuickAlign presents its alignment result in groups of 10 characters, with a maximum line width of 50 characters. If the query is longer than 50 characters, it continues on the following lines. However, you can force lines to be longer than 50 characters by checking "Yes".
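The layout described above can be approximated in a few lines of Python. The sketch rests on several assumptions: the 50-character limit is taken to count residues rather than the spaces between groups, wide output is assumed to keep the groups of 10 while placing everything on one line, and the function name format_row and the 8-character name column are invented for the example.

```python
# Illustrative sketch only; the exact spacing of real QuickAlign output may differ.
def format_row(name: str, seq: str, group: int = 10, width: int = 50,
               wide: bool = False) -> list[str]:
    """Split a sequence into lines of `width` residues, shown in groups of `group`."""
    line_len = max(len(seq) if wide else width, 1)
    lines = []
    for start in range(0, len(seq), line_len):
        chunk = seq[start:start + line_len]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        lines.append(f"{name:<8}" + " ".join(groups))
    return lines
```

For a 60-residue query, the default settings give two lines (50 residues, then 10), while wide=True gives a single line of six groups.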

See sample output on Explanation of QuickAlign Results.

Calculate frequency by position

On the Results page, you will see buttons for "Summarize All" and "Summarize by Subtype". If you have selected "Calculate frequency by position", these summaries will include data showing the frequency of each nucleotide or amino acid at each position.

If "Calculate frequency by position" is selected, the Summarize pages will also contain links to "See full raw counts", which will show you the full residue counts without applying any cutoff.

Below the frequency table is a Sequence Logo (frequency graph) that shows a visual representation of the frequency of each residue at each position. The height of the letters indicates the relative frequency of each residue at each position. The width of a stack of letters is proportional to the fraction of valid residues in that position, i.e., columns with many gaps or unknown residues are narrow. These graphs are produced by WebLogo 3.
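To make the two quantities behind the table and the logo concrete, the following Python sketch computes, for each alignment column, the relative frequency of each residue and the fraction of valid residues (the quantity that sets the stack width). It is an illustration under stated assumptions: frequencies are computed among valid residues only, and the characters treated as invalid ("-", "?" and "X") are a guess suited to protein alignments. The function name column_profile is hypothetical, and the sketch does not call WebLogo.

```python
from collections import Counter


# Illustrative sketch only; the set of "invalid" characters is an assumption.
def column_profile(rows: list[str], invalid: str = "-?X") -> list[tuple[dict[str, float], float]]:
    """For each column, return (residue frequencies among valid residues, valid fraction)."""
    profile = []
    for column in zip(*rows):
        valid = [c for c in column if c not in invalid]
        counts = Counter(valid)
        freqs = {res: n / len(valid) for res, n in counts.items()} if valid else {}
        profile.append((freqs, len(valid) / len(column)))
    return profile


rows = ["VARLHP", "VARLFP", "VARLMP", "VA-LHP"]
profile = column_profile(rows)
# Column 3 has one gap in four rows, so its valid fraction is 0.75.
```

A column in which a quarter of the rows are gaps therefore keeps full-height letters for the residues that are present but would be drawn at three quarters of the full width.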

See sample output on Explanation of QuickAlign Results.

Cut-off for calculating frequency by position

The frequency table will show only the residues with the highest representation(s), as determined by a cutoff. If the cutoff is 100%, all residue frequencies will be shown. If the cutoff is 95%, the most frequent residues will be shown, up to a cumulative total of 95%, then all others will be presented as "other". Lumping together the infrequent residues can be a useful simplification, particularly in the case of protein sequences.
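One possible reading of the cutoff rule is sketched below in Python. The page does not say whether the residue that crosses the threshold is itself shown, so this sketch assumes it is: residues are added from most to least frequent until the running total reaches the cutoff, and everything left is reported as "other". The function name apply_cutoff and the alphabetical tie-breaking are assumptions.

```python
# Illustrative sketch of one interpretation of the cutoff; not the tool's code.
def apply_cutoff(counts: dict[str, int], cutoff_percent: float) -> dict[str, float]:
    """Return percentages for the most frequent residues, lumping the rest as 'other'."""
    total = sum(counts.values())
    shown: dict[str, float] = {}
    cumulative = 0
    # Most frequent first; ties are broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for residue, n in ranked:
        if cumulative * 100 >= cutoff_percent * total:
            break
        shown[residue] = 100 * n / total
        cumulative += n
    if cumulative < total:
        shown["other"] = 100 * (total - cumulative) / total
    return shown


counts = {"H": 90, "F": 6, "M": 3, "Y": 1}
table_95 = apply_cutoff(counts, 95)    # H and F shown, M and Y lumped as "other"
table_100 = apply_cutoff(counts, 100)  # every residue shown, no "other" entry
```

With a 95% cutoff on these counts, H (90%) and F (6%) are listed and the remaining 4% appears as "other"; with a 100% cutoff all four residues are listed.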

Include surrounding region

If checked, this option will display an additional 15 residues on each side of the query. These residues are taken from the appropriate reference sequence.
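A minimal Python sketch of this option follows, assuming that the query has already been located at 1-based, inclusive positions on an ungapped reference sequence and that the flanks are simply truncated near the ends of the reference. The function name with_surrounding is hypothetical; how QuickAlign handles the sequence ends is not stated on this page.

```python
# Illustrative sketch only; end-of-sequence handling is an assumption.
def with_surrounding(reference: str, start: int, end: int,
                     flank: int = 15) -> tuple[str, str, str]:
    """Return (left flank, matched region, right flank) from the reference.

    start and end are 1-based, inclusive positions on the ungapped reference.
    Flanks are shorter than `flank` when the region lies near either end.
    """
    begin = start - 1
    left = reference[max(0, begin - flank):begin]
    right = reference[end:end + flank]
    return left, reference[begin:end], right
```

The three parts are returned separately so that a caller can display the flanking residues differently from the region that matched the query.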





Questions or comments? Contact us at hcv-info@lanl.gov