HCV Database
HCV sequence database
 



To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but we cannot guarantee help with using this site, and the data are no longer manually curated.


Highlighter Explanation

Data input

Mismatches

This analysis enables you to find the nucleotides/amino acids in a query sequence that do not match with those in a SINGLE master sequence. The nucleotides/amino acids that do not match with the master are assigned a color. The example below describes mismatches in a nucleotide sequence:

Color key (one color per nucleotide): T, A, C, G

Consider the following sequences,

Master  AGTTAG
Query   TAGCAG
Result  ||||

In the above example, the query sequence differs from the master at positions 1, 2, 3, and 4. Each of these changes is indicated by a "|" whose color depends on the nucleotide present at that position in the query.
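The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function name `find_mismatches` is hypothetical, and the sketch assumes two aligned sequences of equal length with no special handling of gaps or IUPAC codes.

```python
def find_mismatches(master: str, query: str) -> list[tuple[int, str]]:
    """Return (1-based position, query character) for every mismatch."""
    if len(master) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, q)
        for pos, (m, q) in enumerate(zip(master, query), start=1)
        if m != q
    ]


# The query character decides the color of the "|" at each position.
print(find_mismatches("AGTTAG", "TAGCAG"))
# [(1, 'T'), (2, 'A'), (3, 'G'), (4, 'C')]
```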

Mark potential glycosylation sites (amino acid input only)

In a Mismatches analysis with amino acid sequences, this option will mark glycosylation motifs. A pink dot will mark glycosylation motifs in the master sequence; a pink diamond will mark sites where the query has an additional glycosylation motif; a blue diamond will mark sites where the query has lost a glycosylation motif, compared to the master.
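As a rough illustration of how gained and lost motifs can be found, the sketch below assumes the standard N-linked sequon N-X-[S/T], where X is any residue except proline. The page does not state which motif definition the tool uses, so the pattern, the function names, and the example sequences are all assumptions. The sketch also assumes ungapped sequences of equal length.

```python
import re

# Assumed motif: N, then any residue except P, then S or T.
# The lookahead lets overlapping motifs be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_starts(protein: str) -> set[int]:
    """Return the 1-based start positions of every sequon in the sequence."""
    return {m.start() + 1 for m in SEQUON.finditer(protein)}


def compare_sequons(master: str, query: str) -> dict[str, list[int]]:
    """Classify sequon positions as in master, gained by query, or lost by query."""
    in_master = sequon_starts(master)
    in_query = sequon_starts(query)
    return {
        "master": sorted(in_master),            # pink dot
        "gained": sorted(in_query - in_master),  # pink diamond
        "lost": sorted(in_master - in_query),    # blue diamond
    }


# Hypothetical sequences, chosen only to show one gain and one loss.
print(compare_sequons("ANGTKNPSLNIT", "ANGAKNCSLNIT"))
# {'master': [2, 10], 'gained': [6], 'lost': [2]}
```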

Transitions and transversions

This analysis compares your NUCLEOTIDE query sequences with a single master sequence and highlights transitions and transversions.

Color key: gold = Transitions, pink = Transversions

Consider the following sequences:

Master  ATGCGTM
Query   AAAGGCG
Result   ||| ||

In the above example, all transitions (A<->G or C<->T) are marked in gold, and all transversions (any change between a purine and a pyrimidine, such as A<->T or C<->G) are marked in pink.

For positions with IUPAC codes (such as the last position in the example above), the result you see for this position will vary depending on what option you select for handling IUPAC codes. If "Use codes to compare" is selected, and the position could be either a transition or transversion, it is marked as a transition, as above. See below for details on IUPAC handling.
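The classification rule can be sketched as follows. A change within the purines (A, G) or within the pyrimidines (C, T) is a transition; any change between the two classes is a transversion. The function name is hypothetical, and this sketch simply skips positions that contain an IUPAC code or a gap, which corresponds to only one of the IUPAC handling options.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_changes(master: str, query: str) -> list[tuple[int, str]]:
    """Return (1-based position, 'transition' or 'transversion') per change."""
    changes = []
    for pos, (m, q) in enumerate(zip(master, query), start=1):
        if m == q:
            continue
        pair = {m, q}
        if not pair <= PURINES | PYRIMIDINES:
            continue  # IUPAC code or gap: skipped in this sketch
        same_class = pair <= PURINES or pair <= PYRIMIDINES
        changes.append((pos, "transition" if same_class else "transversion"))
    return changes


print(classify_changes("ATGCGTM", "AAAGGCG"))
# [(2, 'transversion'), (3, 'transition'), (4, 'transversion'), (6, 'transition')]
```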

Silent and non-silent mutations

This analysis enables you to compare query NUCLEOTIDE sequences with the master and highlight silent and non-silent mutations with the following colors.

Color key: one color for Silent mutations, another for Non-silent mutations

Consider the following sequences,

Master  ATGACTAATTAG
Query   ATGACCGTTTAA
Result       |||   |

The tool translates the nucleotide sequences into the corresponding amino acid sequences and highlights silent and non-silent mutations. In the above example, the second codon in the master (ACT) encodes threonine, and the corresponding codon in the query (ACC) also encodes threonine, so this change is shown as a silent mutation. In contrast, the third codon in the master (AAT) encodes asparagine, while the corresponding codon in the query (GTT) encodes valine, so this change is shown as a non-silent mutation.

This tool uses SNAP to calculate the statistics. It only compares the Master sequence with the other sequences and does not compare all pairs of sequences. For a more detailed analysis of silent and non-silent mutations, please use SNAP.
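The codon-by-codon idea can be illustrated with a small Python sketch. This is not SNAP and does not reproduce its statistics; it only translates each codon with the standard genetic code and labels each changed codon. The function name is hypothetical, and the sketch assumes both sequences are in frame starting at position 1 and skips codons containing gaps or IUPAC codes.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codons(master: str, query: str) -> list[tuple[int, str, str, str]]:
    """Return (codon number, master codon, query codon, label) per changed codon."""
    results = []
    for i in range(0, min(len(master), len(query)) - 2, 3):
        m, q = master[i:i + 3], query[i:i + 3]
        if m == q or m not in CODON_TABLE or q not in CODON_TABLE:
            continue  # unchanged, or contains a gap / IUPAC code
        label = "silent" if CODON_TABLE[m] == CODON_TABLE[q] else "non-silent"
        results.append((i // 3 + 1, m, q, label))
    return results


for row in classify_codons("ATGACTAATTAG", "ATGACCGTTTAA"):
    print(row)
# (2, 'ACT', 'ACC', 'silent')
# (3, 'AAT', 'GTT', 'non-silent')
# (4, 'TAG', 'TAA', 'silent')
```

In this sketch the fourth codon is labeled silent because TAG and TAA are both stop codons.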

Matches

This analysis enables you to identify the matching nucleotides/amino acids between a single or multiple masters and a query sequence. If the number of masters is entered as 2, the top 2 sequences in the file will be considered as master sequences.

Each of the masters is assigned a unique color and is matched to each of the query sequences. The nucleotide/amino acid matches in the query are highlighted in the color of the master that it matched.

Consider the following example:

Master1  ATTGGC
Master2  AGGCAT
Query1   AGTTAG
Result   A||T|G

In the above example, the G in position 2 of the query matches Master2 and is indicated by a green "|" at that position in the result. The query also matches the T of Master1 in position 3, which is indicated by a red "|". The query matches both Master1 and Master2 in position 1, so this position is left uncolored, because only unique matches are displayed in the result. At positions 4 and 6 the query matches neither master; each is treated as a polymorphic site and is either left uncolored or colored black, depending on whether "Mark black for unique" is selected. For more information on marking black for unique, see below.
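The matching step can be sketched as a lookup of which masters agree with the query at each position. The function name `match_masters` is hypothetical, and the sketch assumes all sequences are aligned to the same length. A position is colored only when exactly one master appears in its list.

```python
def match_masters(masters: list[str], query: str) -> list[list[int]]:
    """For each query position, list the 1-based masters that match it."""
    return [
        [i for i, master in enumerate(masters, start=1) if master[pos] == q]
        for pos, q in enumerate(query)
    ]


hits = match_masters(["ATTGGC", "AGGCAT"], "AGTTAG")
print(hits)
# [[1, 2], [2], [1], [], [2], []]
```

Here position 1 matches both masters (left uncolored), positions 2 and 5 match only Master2, position 3 matches only Master1, and positions 4 and 6 match neither.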

Mark potential glycosylation sites (Matches only; amino acids only)

In a Matches analysis with amino acid sequences, this option will mark motifs that match potential N-linked glycosylation sites in the master sequence(s). A pink dot will mark sites where the glycosylation motif is unique to the query; a pink diamond will mark sites where the glycosylation motif is shared by the query and at least one master.

Mark black for unique (Matches only)

If selected, this option will make a black tic mark for each position where the query is unique, i.e., does not match ANY of the masters.

Mark gray for a match to multiple masters (Matches only)

This option may be selected only when you have 3 or more masters. If selected, a gray tic mark will appear when the query matches 2 or more masters (but not if it matches ALL masters). For example, if you have 4 masters and this option is selected, the gray mark will appear when the query matches 2 or 3 of them.

Consider the following example:

Master1  ATTGGC
Master2  ATGCAT
Master3  AGGCAT
Query1   AGTTAT
Result   A||T||

In the above example, the last two positions in the query match two masters, and this option causes them to be marked with gray bars. (Without this option selected, they would be unmarked.) The first position matches ALL masters and is unmarked.
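The two marking options for a Matches analysis can be sketched together. The function and parameter names below are hypothetical, and the labels ("master3", "gray", "black") stand in for the colors the tool draws. The sketch assumes aligned sequences of equal length.

```python
def mark_matches(masters, query, mark_unique=False, mark_multiple=False):
    """Return one label per query position, or None where nothing is drawn."""
    marks = []
    for pos, q in enumerate(query):
        hits = [i for i, m in enumerate(masters, start=1) if m[pos] == q]
        if len(hits) == 1:
            marks.append(f"master{hits[0]}")   # color of the single master
        elif not hits:
            marks.append("black" if mark_unique else None)
        elif len(hits) < len(masters) and mark_multiple and len(masters) >= 3:
            marks.append("gray")               # 2 or more, but not all
        else:
            marks.append(None)                 # matches all masters
    return marks


masters = ["ATTGGC", "ATGCAT", "AGGCAT"]
print(mark_matches(masters, "AGTTAT", mark_multiple=True))
# [None, 'master3', 'master1', None, 'gray', 'gray']
```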

Change masters

This feature enables you to select the sequence(s) that will act as master.

By default, the top sequence will be taken as the master. When multiple masters are required (for Matches analysis), the top n sequences will be taken as masters, where n is the number you specify. To select a different master(s), click the box for "Change masters", and you will be given a list to select from. All master sequences must be the same length.

Ignore alphabet validation (amino acids only)

If unchecked, each sequence is evaluated to be nucleotide, amino acid, or indeterminate. If more than 2% of the characters are unambiguous amino acids [QEILFP], then the sequence is evaluated as protein. If more than 94% of the characters are ATGCUNRY, it is evaluated to be a nucleotide sequence. Otherwise, it is considered ambiguous.
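A minimal sketch of these thresholds is shown below. The function name is hypothetical, and two details are assumptions that the page does not specify: non-letter characters such as gaps are excluded from the counts, and the protein test is applied before the nucleotide test.

```python
PROTEIN_ONLY = set("QEILFP")      # letters that cannot be nucleotides
NUCLEOTIDE = set("ATGCUNRY")


def guess_alphabet(sequence: str) -> str:
    """Classify a sequence as 'protein', 'nucleotide', or 'indeterminate'."""
    letters = [c for c in sequence.upper() if c.isalpha()]  # assumption: skip gaps
    if not letters:
        return "indeterminate"
    protein_fraction = sum(c in PROTEIN_ONLY for c in letters) / len(letters)
    nucleotide_fraction = sum(c in NUCLEOTIDE for c in letters) / len(letters)
    if protein_fraction > 0.02:
        return "protein"
    if nucleotide_fraction > 0.94:
        return "nucleotide"
    return "indeterminate"


print(guess_alphabet("ATGACTAATTAG"))   # nucleotide
print(guess_alphabet("TFQPSSGGDLEI"))   # protein
print(guess_alphabet("--D------D--"))   # indeterminate
```

The last call shows why a sequence written mostly as dashes cannot be classified reliably under these rules.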

Checking the box allows you to submit sequences that have dash marks to indicate identity. For example:

Master  TFQPSSGGDLEI
Seq1    --M---------
Seq2    --D------D--

This input can be used only if 'Ignore alphabet validation' is checked.
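To show how the dash notation is read, the sketch below expands each dash into the master residue at the same position. The function name is hypothetical, and whether the tool performs such an expansion internally is not stated on this page; the code only illustrates the meaning of the notation.

```python
def expand_identity_dashes(master: str, sequence: str) -> str:
    """Replace each '-' in the sequence with the master residue at that position."""
    if len(master) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(m if s == "-" else s for m, s in zip(master, sequence))


master = "TFQPSSGGDLEI"
print(expand_identity_dashes(master, "--M---------"))  # TFMPSSGGDLEI
print(expand_identity_dashes(master, "--D------D--"))  # TFDPSSGGDDEI
```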

Treat gaps as character

Matches:
When a Matches analysis is done with the option "treat gaps as character", the gaps are treated as a "5th nucleotide" (or "21st amino acid"), and a gap in the query is matched with a gap in the master in the same position.

Consider the following example:

Master1  -TTGG-
Master2  AGGCA-
Query1   -GTTA-
Result   |||T|-

In the above example, when "treat gaps as character" is selected, the gap in the 1st position of the query is matched with the gap in the first position of master 1. However, the gap in the 6th position is not taken into account and is ignored because it matches with more than one master.

Mismatch, Transition & transversion, and Silent & non-silent analyses:
When any of these analyses is run with "treat gaps as character", a gap IN THE QUERY is highlighted if there is no gap in the master sequence at the same position. If the "treat gaps as character" option is not chosen, such a gap is ignored.
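For a Mismatches analysis, the effect of the option can be sketched as below. The function and parameter names are hypothetical, and the example sequences are made up. The page describes only gaps in the query, so leaving a position unmarked when the master has a gap and the query does not is an assumption of this sketch.

```python
def find_mismatches(master: str, query: str, gaps_as_character: bool = False):
    """Return (1-based position, query character) for each highlighted position."""
    highlighted = []
    for pos, (m, q) in enumerate(zip(master, query), start=1):
        if m == q:
            continue
        if q == "-":
            # Query gap opposite a master residue: shown only with the option on.
            if gaps_as_character:
                highlighted.append((pos, q))
            continue
        if m == "-":
            continue  # assumption: not described on this page, left unmarked
        highlighted.append((pos, q))
    return highlighted


print(find_mismatches("AC-TG", "A-GTC", gaps_as_character=True))   # [(2, '-'), (5, 'C')]
print(find_mismatches("AC-TG", "A-GTC", gaps_as_character=False))  # [(5, 'C')]
```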

Handling of IUPAC codes

IUPAC codes may occur in your nucleotide sequences:

Character  Meaning
M          A or C
R          A or G
W          A or T
S          C or G
Y          C or T
K          G or T
B          C or G or T
H          A or C or T
V          A or C or G
D          A or G or T
N          A or C or G or T
?          Any state or nothing

Ignore

When the ignore option is selected, the tool skips over the IUPAC code and does not perform any comparison at that position.

Use codes to compare

Treat IUPAC codes as characters

In this case, the tool treats IUPAC codes as regular characters without using the nucleotides they stand for while comparing. For example, although R stands for A and G, while using this option it will match ONLY another R.

Mark as unknown

When this option is selected, all positions with an IUPAC code in either the master or the query will be marked with a black dot.
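The three options described in this section can be sketched as a per-position comparison. The function name and the mode strings ("ignore", "characters", "unknown") are hypothetical labels for the "Ignore", "Treat IUPAC codes as characters", and "Mark as unknown" options. The "Use codes to compare" option is not covered here.

```python
IUPAC_CODES = set("MRWSYKBHVDN?")


def compare_position(master_char: str, query_char: str, mode: str) -> str:
    """Return 'match', 'mismatch', 'skipped', or 'unknown' for one position."""
    has_code = master_char in IUPAC_CODES or query_char in IUPAC_CODES
    if has_code and mode == "ignore":
        return "skipped"      # no comparison at this position
    if has_code and mode == "unknown":
        return "unknown"      # drawn as a black dot
    # mode == "characters", or no IUPAC code present: literal comparison
    return "match" if master_char == query_char else "mismatch"


print(compare_position("R", "A", "ignore"))      # skipped
print(compare_position("R", "A", "characters"))  # mismatch
print(compare_position("R", "R", "characters"))  # match
print(compare_position("R", "R", "unknown"))     # unknown
```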

Sort sequences

By similarity

When this option is chosen, the sequences are compared to the master sequence and are sorted according to their similarity with this sequence. The most similar sequence is placed at the top of the result graph, and the least similar at the bottom. When there are multiple masters, the sequences are sorted according to their similarity with the first master in the alignment file.
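Sorting by similarity can be sketched as below. The page does not say how similarity is measured, so the fraction of identical aligned positions used here is an assumption, as are the function names and the example sequence names.

```python
def similarity(master: str, sequence: str) -> float:
    """Fraction of aligned positions identical to the master (assumed measure)."""
    return sum(m == s for m, s in zip(master, sequence)) / len(master)


def sort_by_similarity(master: str, sequences: dict[str, str]) -> list[str]:
    """Return sequence names ordered from most to least similar to the master."""
    return sorted(
        sequences,
        key=lambda name: similarity(master, sequences[name]),
        reverse=True,
    )


queries = {"q1": "TAGCAG", "q2": "AGTTAA", "q3": "AGCCAG"}
print(sort_by_similarity("AGTTAG", queries))
# ['q2', 'q3', 'q1']
```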

By tree

When this option is chosen, the sequences are sorted based on their evolutionary relationship. You may supply your own tree file, or the program will generate one using PAUP*.

Do not sort

When this option is chosen, the sequences in the result set will appear in the same order they were in the alignment file.

Options for coloring matches/mismatches (amino acids only)

1. Standard

His
Asp, Glu
Lys, Asn, Gln, Arg
Met
Ile, Leu, Val
Phe, Trp, Tyr
Cys
Ala, Gly, Ser, Thr
Pro
Gap
Other

2. Se-Al (default)

Ala, Gly, Pro, Ser, Thr
His, Lys, Arg
Asp, Glu, Asn, Gln
Cys
Ile, Leu, Met, Val
Phe, Trp, Tyr
Gap
Other

3. Se-Al (polar/non-polar)

Ala, Phe, Ile, Leu, Met, Pro, Val, Trp
Cys, Gly, Asn, Gln, Ser, Thr, Tyr
Asp, Glu
His, Lys, Arg
Gap
Other

4. BioEdit

Ala Gly Pro Ser
Asp Glu Trp Tyr
His Lys Arg Ile
Leu Met Val Asn
Gln Thr Phe Cys
Gap Other

Display of PostScript files for Mac users

The default application for the display of PostScript files on Mac computers is the Preview application. We noticed that this application does not always display the generated PostScript output properly. Specifically, the diamonds and pink-filled circles that denote G->A conversions and APOBEC signatures are affected. If you find that these symbols are offset, please use the downloadable PNG file instead.

How to cite this tool

When referencing Highlighter in publications, please cite the tool name and the following reference:

Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping LH, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, Shaw GM.
Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection.
Proc Natl Acad Sci U S A. 2008 May 27;105(21):7552-7.
PMID: 18490657






Questions or comments? Contact us at hcv-info@lanl.gov