HCV Database

To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but unfortunately we cannot guarantee help with using this site. The data will not be manually curated either.


Highlighter Explanation


Format permitted

Highlighter takes only nucleotide alignments as input. The alignments can be in any one of the Common Sequence Formats. The input should be codon aligned if you wish to use the silent and non-silent statistics.

Change masters

This feature enables you to select the sequence(s) that will act as master.

If the Change masters box is not checked, the top sequence is taken as the master by default. When the match option requires multiple masters, the top n sequences are taken as masters, where n is a number specified by the user. When Change masters is selected, however, the sequences are extracted from the input file and a checkbox is displayed for each one. Use these checkboxes to select the sequences you want to use as masters.

NOTE: When there are multiple selections but only one master is needed, any one of the selections will be used as master.
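
The default and checkbox-driven selection described above can be sketched in Python as follows. This is only an illustration, not Highlighter's own code: the function name `split_masters`, the (name, sequence) tuple layout, and the choice to take the first selected sequence in file order when only one master is needed are all assumptions.

```python
def split_masters(records, n_masters=1, selected=None):
    """Split aligned (name, sequence) records into masters and queries.

    records   -- list of (name, sequence) tuples in file order
    n_masters -- how many masters the chosen option needs
    selected  -- names ticked via the checkboxes, or None for the default
    """
    if selected:
        # Keep file order and use only as many selections as are needed
        # (assumption: the source only says "any one" is used).
        masters = [r for r in records if r[0] in selected][:n_masters]
    else:
        # Default: the top n sequences act as masters.
        masters = records[:n_masters]
    queries = [r for r in records if r not in masters]
    return masters, queries


records = [("s1", "AGTTAG"), ("s2", "TAGCAG"), ("s3", "AGTTAA")]
print(split_masters(records))
print(split_masters(records, n_masters=1, selected={"s2", "s3"}))
```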

Mismatch

This option enables you to find the nucleotides in a query sequence that do not match with those in a SINGLE master sequence. The nucleotides that do not match with the master are assigned a color as given below:

[Color key: T, A, C and G are each shown in their own color.]

Consider the following sequences,

Master  AGTTAG
Query   TAGCAG
Result  ||||

In the above example, the query sequence differs from the master at positions 1, 2, 3 and 4. These changes are indicated by a "|" whose color depends on which nucleotide is present at that position in the query.
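
As a minimal sketch of the comparison just described (not the tool's actual code), the following Python function reports each differing column together with the query base that would determine the color. The function name `mismatches` is a hypothetical one chosen for this illustration.

```python
def mismatches(master: str, query: str) -> list[tuple[int, str]]:
    """Return (1-based position, query base) for every differing column."""
    if len(master) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, q)
        for pos, (m, q) in enumerate(zip(master, query), start=1)
        if m != q
    ]


print(mismatches("AGTTAG", "TAGCAG"))
# [(1, 'T'), (2, 'A'), (3, 'G'), (4, 'C')]
```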

Transitions and transversions

This option enables you to compare query sequences with the master and highlight transitions, transversions and A<->G transitions with the following colors:

[Color key: transversions (pink), C<->T transitions, A<->G transitions (light blue), transition or transversion (green).]

Consider the following sequences,

Master  ATGCATM
Query   -AAGGCG
Result  |||||||

In the above example, all transitions except the one in the 6th position are A<->G transitions and are hence marked in light blue. All the transversions are marked in pink. In the last position (7), the M in the master represents A or C and the query has a G in the corresponding position; this could therefore be either a transition or a transversion, and is marked in green.
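
The classification above can be sketched in Python as shown here. This is an illustration under stated assumptions rather than the tool's implementation: the names `IUPAC`, `substitution` and `classify` are hypothetical, gaps are simply reported as "gap", "?" is not handled, and an ambiguity code is expanded to every base it can stand for.

```python
# Bases each symbol can stand for (standard IUPAC nucleotide codes).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def substitution(m, q):
    """Classify a change between two different unambiguous bases."""
    if {m, q} == {"A", "G"}:
        return "A<->G transition"
    if {m, q} == {"C", "T"}:
        return "C<->T transition"
    return "transversion"


def classify(master_base, query_base):
    """Return a label for one alignment column, or None if nothing changed."""
    if master_base == query_base:
        return None
    if "-" in (master_base, query_base):
        return "gap"
    m_set, q_set = set(IUPAC[master_base]), set(IUPAC[query_base])
    if m_set & q_set:
        return None  # a shared base means no difference is reported
    kinds = {substitution(m, q) for m in m_set for q in q_set}
    return kinds.pop() if len(kinds) == 1 else "transition or transversion"


for pos, (m, q) in enumerate(zip("ATGCATM", "-AAGGCG"), start=1):
    print(pos, m, q, classify(m, q))
```

For the example sequences, positions 3 and 5 come out as A<->G transitions, position 6 as a C<->T transition, positions 2 and 4 as transversions, and position 7 as "transition or transversion".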

Silent and non-silent mutations

This option enables you to compare query NUCLEOTIDE sequences with the master and highlight silent and non-silent mutations with the following colors.

[Color key: silent and non-silent mutations are each shown in their own color.]

Consider the following sequences,

Master  ATGACTAATTAG
Query   ATGACCGTTTAA
Result       |||   |

The tool converts the nucleotide sequences to the corresponding amino acid sequences and highlights silent and non-silent mutations. In the above example, the second codon (ACT) in the master encodes threonine, and the corresponding codon in the query (ACC) also encodes threonine; hence this is shown as a silent mutation. In contrast, the 3rd codon in the master (AAT) codes for asparagine, while the corresponding codon in the query (GTT) codes for valine; hence this is shown as a non-silent mutation.

This tool uses SNAP to calculate the statistics. It only compares the Master sequence with the other sequences and does not compare all pairs of sequences. For a more detailed analysis of silent and non-silent mutations, please use SNAP.
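
The codon-by-codon comparison can be illustrated with the short Python sketch below. It is not SNAP and does not reproduce SNAP's statistics; it only labels each changed nucleotide by whether its codon still encodes the same amino acid under the standard genetic code. The names `CODON_TABLE` and `silent_nonsilent` are hypothetical, and skipping codons that contain gaps or ambiguity codes is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def silent_nonsilent(master, query):
    """Return (1-based nucleotide position, label) for each changed base."""
    calls = []
    for start in range(0, min(len(master), len(query)) - 2, 3):
        m_codon = master[start:start + 3]
        q_codon = query[start:start + 3]
        if m_codon == q_codon:
            continue
        m_aa, q_aa = CODON_TABLE.get(m_codon), CODON_TABLE.get(q_codon)
        if m_aa is None or q_aa is None:
            continue  # assumption: skip codons with gaps or ambiguity codes
        label = "silent" if m_aa == q_aa else "non-silent"
        for offset in range(3):
            if m_codon[offset] != q_codon[offset]:
                calls.append((start + offset + 1, label))
    return calls


print(silent_nonsilent("ATGACTAATTAG", "ATGACCGTTTAA"))
# [(6, 'silent'), (7, 'non-silent'), (8, 'non-silent'), (12, 'silent')]
```

The last codon pair (TAG and TAA) counts as silent here because both are stop codons in the standard code.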

Match

This option enables you to identify the matching nucleotides between a single or multiple masters and a query sequence. If the number of masters is entered as 2, the top 2 sequences in the file will be considered as master sequences.

Each of the masters is assigned a unique color and is matched to each of the query sequences. Nucleotides in the query that match a master are highlighted in the color of that master. If a nucleotide in the query matches more than one master, the match is ignored; only unique matches are colored.

Consider the following example,

Master1  ATTGGC
Master2  AGGCAT
Query1   AGTTAG
Result   A||T|G

In the above example, the G in position 2 of the query matches master2 and is indicated by a green "|" in the corresponding position of the result. The query also matches the T of master1 in position 3, which is indicated by a red "|". The query matches both master1 and master2 in position 1, so this position is left uncolored, as only unique matches are displayed in the result. Positions 4 and 6 match neither master, so they are treated as polymorphic sites and, depending on the option chosen for labeling polymorphisms, are either left uncolored or colored black. For more information on labeling polymorphisms, see below.
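
The unique-match rule can be sketched in Python as follows. This is an illustrative reading of the description above, not the tool's code; the function name `unique_matches` and the labels "shared" and "none" are hypothetical.

```python
def unique_matches(masters, query):
    """Label each query column with the single master it matches, if any."""
    calls = []
    for i, q in enumerate(query):
        hits = [n for n, m in enumerate(masters, start=1) if m[i] == q]
        if len(hits) == 1:
            calls.append(f"master{hits[0]}")  # unique match: gets a color
        elif hits:
            calls.append("shared")            # matches several masters: ignored
        else:
            calls.append("none")              # matches no master: polymorphic
    return calls


print(unique_matches(["ATTGGC", "AGGCAT"], "AGTTAG"))
# ['shared', 'master2', 'master1', 'none', 'master2', 'none']
```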

Label polymorphisms

Consider the following sequences,

Master1: ATTGATA
Master2: ATTGTTA
Query1 : ATTGCTA

All the sequences above are identical except for the nucleotides in the 5th position. While the master sequences have an A and T in their respective positions, Query1 has a C. By selecting the option to label polymorphisms, these mismatches will also be indicated in black, while they will be ignored if this option is not selected.
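
A minimal Python sketch of which columns would be labeled, assuming a polymorphic site is simply a column where the query matches none of the masters (the function name `polymorphic_positions` is hypothetical):

```python
def polymorphic_positions(masters, query):
    """Return 1-based positions where the query matches no master."""
    return [
        pos
        for pos, q in enumerate(query, start=1)
        if all(m[pos - 1] != q for m in masters)
    ]


print(polymorphic_positions(["ATTGATA", "ATTGTTA"], "ATTGCTA"))
# [5]
```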

Indicate successive matches

This option highlights successive matches with bars as shown below:

Master1  ATTGGC
Master2  AGGCAT
Query1   ATTTAG
Result   ===T|G

(=== stands for the single long bar covering positions 1 to 3.)

In the example, the first three successive matches are represented as a single long bar as shown when the 'Use bars to indicate successive matches' option is selected. If the option is not selected, the matches are represented as regular vertical lines as shown in position 5.
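
Merging successive matches into one bar amounts to run-length grouping of per-position labels. The Python sketch below shows the idea; the function name `merge_runs` is hypothetical, and the input labels are entered by hand to mirror the Result row above rather than computed from the sequences.

```python
from itertools import groupby


def merge_runs(labels):
    """Collapse consecutive identical labels into (start, end, label) bars.

    Positions are 1-based and inclusive; None marks an unmatched column.
    """
    runs = []
    pos = 1
    for label, group in groupby(labels):
        length = len(list(group))
        if label is not None:
            runs.append((pos, pos + length - 1, label))
        pos += length
    return runs


labels = ["master1", "master1", "master1", None, "master2", None]
print(merge_runs(labels))
# [(1, 3, 'master1'), (5, 5, 'master2')]
```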

Treat gaps as character

Match:
When a match is done with the option "treat gaps as character", the gaps are treated as a "fifth nucleotide" and a gap in the query is matched with a gap in the master in the same position.

Consider the following example:

Master1  -TTGG-
Master2  AGGCA-
Query1   -GTTA-
Result   |||T|-

In the above example, when "treat gaps as character" is selected, the gap in the 1st position of the query is matched with the gap in the first position of master 1. However, the gap in the 6th position is not taken into account and is ignored because it matches with more than one master. For more details on match see above.
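
The effect of the option on a match can be sketched in Python as follows. This is an illustration only: the function name `match_calls` and its labels are hypothetical, and reporting query gaps as "ignored" when the option is off is an assumption.

```python
def match_calls(masters, query, treat_gaps_as_character=False):
    """Label each query column; gaps take part only when the option is on."""
    calls = []
    for i, q in enumerate(query):
        if q == "-" and not treat_gaps_as_character:
            calls.append("ignored")
            continue
        hits = [n for n, m in enumerate(masters, start=1) if m[i] == q]
        if len(hits) == 1:
            calls.append(f"master{hits[0]}")
        else:
            calls.append("shared" if hits else "none")
    return calls


masters = ["-TTGG-", "AGGCA-"]
print(match_calls(masters, "-GTTA-", treat_gaps_as_character=True))
# ['master1', 'master2', 'master1', 'none', 'master2', 'shared']
print(match_calls(masters, "-GTTA-"))
# ['ignored', 'master2', 'master1', 'none', 'master2', 'ignored']
```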

Mismatch, Transition & transversion, and Silent & non-silent:
When any of the above options is run with the option "treat gaps as character", a gap IN THE QUERY is highlighted in gray if there is no gap in the master sequence at the same position. If the "treat gaps as character" option is not chosen, such a gap is ignored.
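
For the mismatch case, the rule can be sketched in Python as shown below. The function name `mismatch_calls` and the example sequences are made up for this illustration, and treating a base in the query opposite a gap in the master as an ordinary mismatch is an assumption.

```python
def mismatch_calls(master, query, treat_gaps_as_character=False):
    """Return (1-based position, mark) pairs for one master/query pair."""
    calls = []
    for pos, (m, q) in enumerate(zip(master, query), start=1):
        if q == "-":
            # A query gap is marked (gray) only when the option is on
            # and the master has no gap in the same column.
            if treat_gaps_as_character and m != "-":
                calls.append((pos, "gap"))
        elif m != q:
            calls.append((pos, q))
    return calls


print(mismatch_calls("AGTTAG", "A-TCAG", treat_gaps_as_character=True))
# [(2, 'gap'), (4, 'C')]
print(mismatch_calls("AGTTAG", "A-TCAG"))
# [(4, 'C')]
```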

Handling of IUPAC codes

When the option "match IUPAC codes" is selected, the following IUPAC codes are also considered during a match or mismatch:

Character  Meaning
M          A and C
R          A and G
W          A and T
S          C and G
Y          C and T
K          G and T
B          C and G and T
H          A and C and T
V          A and C and G
D          A and G and T
N          A or C or G or T
?          Any state or nothing

Ignore

When the ignore option is selected, the tool skips over the IUPAC code in the sequence and does not perform any comparison in that position.

Treat IUPAC codes as characters

In this case, the tool treats IUPAC codes as regular characters without using the nucleotides they stand for while comparing. For example, although R stands for A and G, while using this option it will match ONLY another R.
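
The difference between the "ignore" and "treat IUPAC codes as characters" settings can be sketched in Python like this. The function name `compare`, the mode strings, and the three-valued return are hypothetical choices made for the illustration.

```python
AMBIGUITY_CODES = set("MRWSYKBHVDN?")


def compare(master_base, query_base, mode):
    """Return True (same), False (different) or None (position skipped)."""
    if mode == "ignore":
        # Skip the column entirely if either side is an ambiguity code.
        if master_base in AMBIGUITY_CODES or query_base in AMBIGUITY_CODES:
            return None
        return master_base == query_base
    if mode == "as_characters":
        # Literal comparison: R matches only another R.
        return master_base == query_base
    raise ValueError(f"unknown mode: {mode}")


print(compare("R", "R", "as_characters"))  # True
print(compare("R", "A", "as_characters"))  # False
print(compare("R", "A", "ignore"))         # None
```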

Use IUPAC codes to compare

Match:
When IUPAC codes are included in the match, the codes are also matched based on the nucleotides they represent. For example, if the master had an M in the 2nd position and the query had an R in the corresponding position, then this is considered a match because M could be an A and so could R. Whereas, if the master had a C in the 2nd position and the query had a D in the same position then this would not be a match because D includes everything but C.

Consider the following example:

Master1  AMRW?TGC??----?
Master2  MHND?Y?T?G-AA-T
Query1   BTACAT?M?T--??-
Result   |||||||||||||||

In the above example, in the first position, the query matches Master 2 because the B in the query and the M in Master 2 can both represent C. Position 2 shows a similar case. In position 3, the A in the query matches both Master 1 and Master 2, hence this match is not shown. In position 4, the C in the query matches neither W (A and T) nor D (A and G and T). If the label polymorphisms option is selected, this position is labeled black; otherwise it is ignored. To learn about labeling polymorphisms, see the Label polymorphisms section above. When a question mark is present in the query during a match, it is considered a match with all the masters and is ignored. If instead it is present in a master and the query does not match any other master in the corresponding position, this is considered a match and is shown. This is demonstrated in the example in positions 7, 9, 13, 14 and 15.
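
Matching on IUPAC codes comes down to asking whether two symbols share at least one possible base. The Python sketch below illustrates this under stated assumptions and is not the tool's code: the names `IUPAC`, `compatible` and `iupac_match` are hypothetical, "?" is modeled as any base or a gap, and a gap is modeled as a symbol that matches only another gap or "?".

```python
# Bases each symbol can stand for; "-" and "?" entries are assumptions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
    "-": "-", "?": "ACGT-",
}


def compatible(a, b):
    """True if the two symbols share at least one possible state."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def iupac_match(masters, query):
    """Label each query column with the single master it is compatible with."""
    calls = []
    for i, q in enumerate(query):
        hits = [n for n, m in enumerate(masters, start=1) if compatible(m[i], q)]
        if len(hits) == 1:
            calls.append(f"master{hits[0]}")
        else:
            calls.append("shared" if hits else "none")
    return calls


print(compatible("M", "R"))  # True: both can be A
print(compatible("C", "D"))  # False: D excludes C
masters = ["AMRW?TGC??----?", "MHND?Y?T?G-AA-T"]
print(iupac_match(masters, "BTACAT?M?T--??-")[:4])
# ['master2', 'master2', 'shared', 'none']
```

The first four labels correspond to the walk-through of positions 1 to 4 above; the remaining columns depend on how "?" and gaps are modeled, so they are not printed here.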

Mismatch, transitions & transversions and silent & non-silent:
When IUPAC codes are included in any of the above cases, a difference is shown only if there are no common nucleotides between the IUPAC codes. Consider the following example during mismatch:

Master  RMATGC-D??
Query   AHWGGDABA?
Result  AHW|G||BA?

In the above example, the first three positions of the query match with the first three positions of the master and hence are not shown. Whereas at the 4th position, the T in the master does not match with the G in the query and this is shown in yellow. Similarly at position 6, the master has a C while the query has a D which represents the nucleotides A, G and T. Hence this is considered a mismatch.
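
The "no common nucleotides" rule for mismatches can be sketched in Python as follows. As before, this is an illustration under assumptions: the names `IUPAC` and `iupac_mismatches` are hypothetical, "?" is modeled as any base or a gap, and a gap is modeled as its own state.

```python
# Bases each symbol can stand for; "-" and "?" entries are assumptions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
    "-": "-", "?": "ACGT-",
}


def iupac_mismatches(master, query):
    """Return 1-based positions where the two symbols share no state."""
    return [
        pos
        for pos, (m, q) in enumerate(zip(master, query), start=1)
        if not set(IUPAC[m]) & set(IUPAC[q])
    ]


print(iupac_mismatches("RMATGC-D??", "AHWGGDABA?"))
# [4, 6, 7]
```

Positions 4, 6 and 7 are the columns drawn as "|" in the Result row of the example.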

Sort sequences by similarity

When the sort sequences by similarity option is chosen, the sequences are compared to the first sequence in the alignment file and sorted by their similarity to it. The most similar sequence is placed at the top of the result graph and the least similar at the bottom. This also holds when there are multiple masters: sorting is still based on similarity to the first sequence in the alignment file.
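
A minimal Python sketch of such a sort is shown below. The page does not say how similarity is measured, so the fraction of identical columns used here is an assumption, as are the function name `sort_by_similarity` and the example records.

```python
def sort_by_similarity(records):
    """Order (name, sequence) records by identity to the first sequence."""
    reference = records[0][1]

    def identity(seq):
        # Assumed measure: fraction of columns identical to the reference.
        same = sum(a == b for a, b in zip(reference, seq))
        return same / len(reference)

    # The first sequence stays on top; the rest go from most to least similar.
    rest = sorted(records[1:], key=lambda rec: identity(rec[1]), reverse=True)
    return [records[0]] + rest


records = [
    ("Master", "AGTTAG"),
    ("Q1", "TAGCAG"),  # 2 of 6 columns identical
    ("Q2", "AGTTAA"),  # 5 of 6 columns identical
    ("Q3", "AGCCAG"),  # 4 of 6 columns identical
]
print([name for name, _ in sort_by_similarity(records)])
# ['Master', 'Q2', 'Q3', 'Q1']
```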




Questions or comments? Contact us at hcv-info@lanl.gov