HCV Database
HCV sequence database

To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but unfortunately we cannot guarantee that we can provide help with using this site. Data will not be manually curated either.

HCV databases: Archived news

The 2011 Reference Alignments are now available online. 31 October 2011

We have added a new tool, GenBank Entry Generation, that generates Sequin files for HCV sequences. This tool automatically generates protein annotations and includes them in the Sequin file, together with related data, for submission to GenBank. 05 October 2011

Highlighter is a tool to visualize mutations: matches, mismatches, transition and transversion mutations, and silent and non-silent mutations in a set of nucleotide sequences that are aligned and in frame. It is particularly useful for allowing visualization of potential recombination and SNPs in closely related sequences, such as those isolated from a single patient. Highlighter now also works with amino acid sequences. 24 May 2011
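
For readers who want to see how transitions and transversions are told apart, here is a minimal Python sketch. It is not Highlighter's code; the function names and the "other" label for gaps and ambiguity codes are our own assumptions for illustration.

```python
# Illustrative sketch only; names and the "other" category are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_site(ref_base, query_base):
    """Classify one aligned nucleotide pair."""
    ref_base, query_base = ref_base.upper(), query_base.upper()
    if ref_base == query_base:
        return "match"
    pair = {ref_base, query_base}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    # Any other change between unambiguous bases is a transversion.
    if pair <= PURINES | PYRIMIDINES:
        return "transversion"
    return "other"  # gaps, N, and other IUPAC codes


def classify_alignment(reference, query):
    """Classify every column of two aligned, equal-length sequences."""
    return [classify_site(r, q) for r, q in zip(reference, query)]
```

Calling classify_alignment on a reference and each query sequence in turn gives the per-position labels that a plot could then colour.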

We are proud to announce the third member in our viral sequence database family, the Hemorrhagic Fever Virus database at http://hfv.lanl.gov. The HFV database contains published sequences from the Arena-, Bunya-, Filo-, Flavi-, Paramyxo-, and Togaviridae, and offers much the same capabilities as the HCV database. Rather than one reference sequence, the HFV database uses different reference sequences for each viral species. 20 May 2011

A beta version of a new tool, Protein Feature Accent, is now available. This user-friendly tool allows a user to highlight regions and features of interest (e.g. resistance mutations or immune epitopes) on protein structures, to calculate and plot entropy, and to display the protein in many different ways. The tool uses Jmol, and automatically checks for and lists all 3D protein structures available in PDB. 26 February 2010

Geographic distribution of search results. With this new feature of our main search interface, you can display the results of your sequence search as a map, with pies to display the geographical origins of the sequences. Use the 'Geography' button on the search results page. 21 October 2009

Good news! The new HCV database is now available, including new sequences, new annotation and a new search interface. The new database will be updated continuously and annotated as much as possible. We gratefully acknowledge the financial support of Roche Pharmaceuticals for this work. The new search interface gives access to many more fields and uses a "flip-open" design to hide fields that are not needed for your search. There may be some small bugs we haven't found while testing; if you find a problem, please let us know. As always, suggestions for improvement are welcome. 05 August 2009

A new tool, PepMap, is available. The tool maps an input set of peptides onto the H77 reference sequence and generates a table with peptide locations and peptide maps, similar to epitope maps. 23 February 2009

We now provide an interface to PhyML, which generates much better trees than our simple Treemaker tool. The ML method is resource intensive, so please try not to overload it. 16 January 2009

We have a new tool called Pixel that visualizes large alignments as an image, using 1 pixel per residue. It can be very helpful when the alignments get large enough that you lose the overview, and quickly shows problems and misalignments. It works for both nucleotide and amino acid alignments. 25 October 2008

The Codon alignment tool can restore codons in your alignment so that the resulting alignment can be immediately translated. Currently this tool requires an input reading frame (or does its work for all 3 frames). In the future we will connect it to the reference sequence and the 'locate' tool so that it can find the correct reading frame automatically. 21 June 2008

There are new ready-made alignments for all categories (all, consensus, and genotype reference). They are called '2008 alignments' and are now the default for the alignment interface. 26 May 2008

We added a beta version of the new tool HCValign, which uses our HCV HMM alignment model to align user sequences. It can also codon-align the sequences, and separate individual HCV genes, and thus is a near-successor to Gene cutter. Please note that the tool has not yet been extensively tested. We would appreciate bug reports and other feedback. 07 May 2008

Try out Phyloplace! This tool was designed to help users decide whether their sequence fits inside a currently known genotype and/or subtype, or would be better classified as a new one. It can use either an intuitive distance-based method or the phylogenetic tree-based Branching index, and produces user-friendly graphical output for both. The tool also shows some promise for easily finding potential recombinants. 14 January 2008

The Treemaker tool now lets you download your input sequences sorted in the order of the tree, which makes it much easier to select sequences in the alignment based on their phylogenetic behavior. 06 November 2007

We have added a new tool called ElimDupes, for Eliminate Duplicates. It will take your alignment and remove duplicate sequences. Several options can be set, and the sequences can be automatically divided into groups (e.g. if there are data from several patients with multiple sequences each). In the future we will add an option to also remove sequences that are more than x% similar where x<=100. 31 October 2007
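
As a rough illustration of the idea (not the ElimDupes implementation), the Python sketch below removes exact duplicates from a list of named sequences. The function name, the case-insensitive comparison, and the choice to keep the first occurrence are assumptions made for this example.

```python
# Illustrative sketch only; the name and the "keep first occurrence"
# policy are assumptions, not the behaviour of ElimDupes itself.
def eliminate_duplicates(records):
    """Remove exact duplicate sequences.

    records: list of (name, sequence) tuples.
    Returns (kept, duplicates), where kept is the list of unique records
    and duplicates maps each kept name to the names that were dropped
    because they carried the same sequence.
    """
    first_name_for = {}  # sequence (upper case) -> name of first record
    kept = []
    duplicates = {}
    for name, sequence in records:
        key = sequence.upper()
        if key in first_name_for:
            duplicates[first_name_for[key]].append(name)
        else:
            first_name_for[key] = name
            duplicates[name] = []
            kept.append((name, sequence))
    return kept, duplicates
```

To handle data from several patients, the same function could be applied to each group of records separately.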

We will keep a list of tools we are working on, and tools for which the problems we are aware of have been fixed. The list will be updated frequently. You can find it here. 31 October 2007

We will be making some infrastructural changes to the HCV database website in the coming week. This may break some links and bookmarks. We hope to minimize the impact, but please bear with us while we are at work. Please let us know of any problems at hcv-info@lanl.gov. 22 October 2007

Regretfully, NIH has decided to stop funding this project. The funding has been moved to the Viral Bioinformatics Resource Center, which will be maintaining a different HCV site at http://www.hcvdb.org/.

Because of this, we can no longer maintain the work-intensive HCV immunology database. The HCV immunology database will remain accessible for the foreseeable future, but due to lack of resources, no new information will be added. We will display this information at the top of our immunology web pages. For new epitope information, users of this database can try the Immune Epitope Database (http://www.immuneepitope.org).

The HCV sequence database is still being maintained, although not quite as diligently as before. The website will be tied even more closely to the HIV site, so that all new HIV tools will also be available for HCV. When we feel (or you let us know) that the annotation quality begins to significantly deteriorate, this site will be closed.

The VBRC people are working hard to provide a worthy successor to this database. Please take a look at what they are doing, and give them a chance to come up to speed. If you still think the disappearance of our database and website will seriously affect your work, you may be able to help in two ways. First, if you can contribute financially, this will help us to free up resources to keep the database and website alive longer. Second, you can send feedback to Dr Caroline Heilman, the director of DMID, the NIH program office that distributes the HCV funds, and/or to Dr Valentina DiFrancisco, the new HCV database program officer (their contact information is here). And please CC us at hcv-info@lanl.gov, so we can keep a record.

21 October 2007

In the coming weeks we will be rolling out our new site design. We think it is a vast improvement over the old one. We will try hard to minimize the inconvenience, but while we are updating it you may notice a few glitches. If you find a problem, please send an email to hcv-info@lanl.gov. This will put it at the top of our "to be fixed" pile, so it will be solved sooner. 12 October 2006

The Principal Coordinate Analysis tool PCOORD has been improved by the addition of options to strip gaps from the alignment and to calculate distances using either ID or Smith amino acid matrix scoring. The program works with both amino acid and nucleotide alignments; the tool identifies them automatically. 11 October 2006

The new version of Gene cutter can align your nucleotide sequences, codon-align the coding regions, clip pre-defined regions from a sequence or alignment, and provide alternate translations for codons with ambiguity codes. It no longer requires the reference sequence to be in the alignment. 30 August 2006

Finally we have added a simple tool to translate nucleotide to amino acid sequences. 20 July 2006
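
A translation step of this kind can be sketched in a few lines of Python. This is not the code behind our tool; it assumes the standard genetic code, reports any codon containing an ambiguity code or gap as "X", and uses a frame parameter that we introduce here for illustration.

```python
# Illustrative sketch only; assumes the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, frame=0):
    """Translate a nucleotide string starting at offset frame (0, 1 or 2).

    Codons that are not in the table (ambiguity codes, gaps) become "X";
    an incomplete codon at the end is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```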

We have added a site map with a concise overview of (and links to) all the available tools. 19 July 2006

The curated alignments have been updated, and separate alignments are now also available for the putative ARF-P (alternate reading frame) protein and for the Okamoto region. 18 July 2006

We have created a page that lists several primer sets that people have used successfully to amplify different HCV genotypes. On this page you can also sign up for a mailing list that can be used to ask (and answer!) HCV primer-related questions. 21 February 2006

The HCV search interface will now automatically pad your downloaded alignments with gaps, so all sequences will have the same length. Superfluous columns containing only gaps will be removed ("squeezed") by default, but this option can be switched off (for example if it is needed to maintain the reading frame). 09 December 2005
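
The padding step can be pictured with the short Python sketch below. It is a hypothetical illustration, not the search interface's code: it right-pads each sequence with the gap character until all sequences are as long as the longest one.

```python
# Illustrative sketch only; the function name is an assumption.
def pad_alignment(sequences, gap="-"):
    """Right-pad every sequence with gaps to the length of the longest."""
    width = max((len(s) for s in sequences), default=0)
    return [s.ljust(width, gap) for s in sequences]
```

For example, pad_alignment(["ACGT", "AC", "ACG"]) returns ["ACGT", "AC--", "ACG-"].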

A new tool, Branchlength, is available, again based on a Perl program initially created by Bette Korber. You provide a Newick treefile, and the tool draws a 'clickable' tree and shows cumulative branch lengths from the selected node. 08 December 2005

A web interface is now available for Bette Korber's program VESPA (Viral Epidemiology Signature Pattern Analysis). This program compares two alignments and identifies the most consistent differences ("signature patterns") between them. It can be used to find amino acid or nucleotide positions that best distinguish two groups of sequences. 23 November 2005

A 'Last modification date' field has been added to the search interface. You can find it in the 'Other fields' pulldown menu, and it takes the same >MM-DD-YYYY (don't forget the dashes) format as the Download date field. The field is updated any time any field in the database that is associated with that sequence is changed. 19 October 2005

The search interface can now take aligned user sequences as input. The search will be automatically limited to database sequences that span the same region. Trees can be built via the search interface that include the user provided alignment, search results, and genotype reference sequences. 11 October 2005

Since the publication of the new HCV nomenclature proposal, the HCV database has switched to using the H77 sequence as a reference sequence, instead of HCV-H. The main advantage of H77 is that its 3' UTR is much longer. All genes and proteins in the rest of the genome are equally long in both strains, so the coordinates do not change except in the 3' UTR. The changes affect the search interface and the Sequence Locator, Primalign and Epilign tools. 11 October 2005

Another consequence of the new HCV nomenclature proposal is the division of genotype assignments into "provisional" and "confirmed". By default, all genotypes are included in search results, but the search interface offers the possibility to limit the search to only confirmed genotypes. More information is here. 11 October 2005

You can now include the genotype reference sequence alignment when you download sequences from the database. The reference alignment for the correct region will be selected automatically; if your search criteria did not include a genomic region, the complete genome reference alignment will be used. 16 August 2005

The search interface now includes a Download date field, which can be used to search only sequences that were downloaded from Genbank before or after that date. The field can be found in the Other fields menu; help is available by clicking any of the search interface field names. 15 August 2005

NIH has agreed to fund the sequencing of a number of complete genomes of unusual HCV variants. We are looking for samples for a large number of genotypes that have been found in at least three unrelated patients, and for which fewer than three complete genomes have been sequenced; they are listed here. If you would like to collaborate on this project by donating samples, please contact us at hcv-info@t10.lanl.gov. 25 June 2005

Try out the new search interface! It lets you build phylogenetic trees directly from database sequences that you have retrieved, including reference sequences. You can also
  • download the sequences in many different formats
  • create sequence names so they contain information from many fields in the database, and determine the separators and missing value characters
  • download background information
  • easily select groups of sequences
  • sort the sequences on start and end coordinates
  • and more...
24 June 2005

The antibody section of the HCV immunology database is now available. It contains multi-part entries for HCV-specific antibodies with references and notes, antibody epitopes summary table, antibody epitopes maps, antibody index by name and antibody index by binding type. 26 May 2005

You can now automatically retrieve the "Okamoto region", a frequently sequenced region of ~300nt in NS5B, using the search interface. Select the "Okamoto region" from the "genome region" menu. Also read the background information about retrieving this region. 15 April 2005

We have added protein F (ARFP, the alternate reading frame protein) to the search interface; you can now search for this protein, and (perhaps more importantly) download it aligned. Also see the help text about this feature. 14 April 2005

NIH has agreed to fund the sequencing of a number of complete genomes of unusual HCV variants. We are looking for samples for a large number of genotypes that have been found in at least three unrelated patients, and for which fewer than three complete genomes have been sequenced; they are listed here. If you would like to collaborate on this project by donating samples, please contact us at hcv-info@t10.lanl.gov. 08 April 2005

An updated HLA anchor residue motif scanner, Motif Scan, is now available to scan for possible epitopes in any protein sequence or alignment, or in predefined HCV consensus sequences. A new feature also allows you to scan simultaneously for all possible motifs and potential epitopes in a protein sequence. 30 March 2005

The Epitope Location Finder (ELF) can quickly find probable epitopes within a protein sequence, based on HLA anchor motifs or on previously described epitopes stored in the immunology database. It can also help identify potentially missed CTL reactivities due to variations in the sequence strain selected as a basis for the peptides used to test the response. The tool is meant to be a workbench for experimentalists who want a fast summary of their peptide CTL reactivity results. 29 March 2005

The search interface has two new 'exclude' checkboxes, to automatically exclude synthetic sequences (from the Genbank SYN database, sequences that result from laboratory manipulation) and "bad" sequences which either contain more than 10% N's or IUPAC codes, or which the HCV database staff judged to be 'suspicious'. See the search interface help page for more information. 22 March 2005
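
The 10% rule for "bad" sequences can be illustrated with the Python sketch below. It is not the database's own filter: the function names are hypothetical, and we assume here that gap characters are ignored and that every character other than A, C, G and T counts as an ambiguity code. The manual 'suspicious' judgement is of course not reproduced.

```python
# Illustrative sketch only; gap handling and names are assumptions.
UNAMBIGUOUS = set("ACGT")
GAP_CHARS = "-."


def ambiguity_fraction(sequence):
    """Fraction of non-gap characters that are N or another IUPAC code."""
    residues = [c for c in sequence.upper() if c not in GAP_CHARS]
    if not residues:
        return 0.0
    ambiguous = sum(1 for c in residues if c not in UNAMBIGUOUS)
    return ambiguous / len(residues)


def is_bad(sequence, threshold=0.10):
    """True if more than threshold of the sequence is ambiguous."""
    return ambiguity_fraction(sequence) > threshold
```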

A 'gapsqueeze' option has been added to our Gapstrip tool. 'Squeezing' gaps means removing all columns that contain only gaps. The tool now also automatically reads any valid file format, and returns an output file in the same format. 17 March 2005
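
For illustration, 'squeezing' can be written in Python as follows. This is a minimal sketch rather than the Gapstrip code; it assumes that all rows have the same length and that "-" and "." are the gap characters.

```python
# Illustrative sketch only; assumes equal-length rows.
def gap_squeeze(alignment, gap_chars="-."):
    """Remove every column that contains only gap characters."""
    if not alignment:
        return []
    keep = [
        i
        for i, column in enumerate(zip(*alignment))
        if any(c not in gap_chars for c in column)
    ]
    return ["".join(row[i] for i in keep) for row in alignment]
```

Columns in which at least one sequence has a residue are kept unchanged, so the relative alignment of the sequences is preserved.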

Try out Synchaligns, a new tool that can synchronize (or "merge") two alignments:
Alignment 1: seq1  JKLMNYOPQR-ST    Alignment 2: seq3  HIJK-LMN-P
             seq2  JKLMNYOPQRYST                 seq4  --JK-LMNOP
                                                 seq5  HIJKXLMNOP
Alignment 2 after synchronization:
seq3  HIJK-LMN--P-----
seq4  --JK-LMN-OP-----
seq5  HIJKXLMN-OP-----
16 March 2005

Our BLAST interface now lets you Blast your sequences against a subset of all genotyped sequences; you can select each genotype or any combination of genotypes. 31 January 2005

The HCV sequence database will collect data on, and provide access to external sequence sets that have not been deposited in Genbank. These sets are often generated for purposes of genotyping or other clinical use and are often unannotated, but they can still be useful for some types of analysis. Contact information is available on request. We invite people who have such sets and are willing to provide them to researchers upon request to contact us so that these sets can be added to the list. 01 December 2004

The 3' UTR alignments have been added to the ready-made alignments page. 30 November 2004

Bill Bruno provided the code for FindModel, which is similar to Posada and Crandall's Modeltest script, but uses Ziheng Yang's PAML as a back end. FindModel analyzes your sequence alignment to determine which evolutionary model fits it best; you can then use this model to build a better tree. 03 November 2004

The ENTROPY program calculates and plots the Shannon entropy (a measure of variability) for each position in an alignment. It can also compare the entropy of all positions in two alignments, and perform a permutation-based statistical significance test to find positions with different variability. 17 September 2004
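
The per-position calculation can be sketched in Python as below. This is not the ENTROPY program itself: we assume base-2 logarithms (entropy in bits), count gap characters like any other symbol, and leave out the permutation test. For a column in which symbol $i$ has frequency $p_i$, the value computed is $H = \sum_i p_i \log_2(1/p_i)$.

```python
# Illustrative sketch only; log base and gap handling are assumptions.
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(alignment):
    """Entropy for every column of an alignment with equal-length rows."""
    return [column_entropy(column) for column in zip(*alignment)]
```

A fully conserved column scores 0, and a nucleotide column in which all four bases are equally frequent scores 2 bits.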

The CTL and Helper sections of the HCV immunology database are available! 04 September 2004

The data that were used for the HCV variability graphs can now be viewed or downloaded. 03 September 2004

Two new annotated fields have been added to the database: HIV coinfection (confirmed/excluded/unknown) and HBV coinfection (same values). They can be searched using the "Other fields" pulldown menu in the search interface. 01 September 2004

It is now possible to download the listed search results as a tab-delimited file, with or without the actual sequence. Scroll down to the bottom of the search results page. 23 August 2004

A new MotifScan is available that is adapted for immunological motifs, and can be used to locate HLA anchor motifs in a protein sequence or alignment, or in predefined HCV-H and HCV consensus sequences. 12 July 2004

The updated HCV curated alignments are available. Pre-made consensus sequences have been added to the collection. 08 July 2004

The alignments you retrieve using the search interface will now also be codon-aligned, and in general should translate to amino acids without moving any gaps. This may mean that the alignments are not exactly the same as alignments retrieved previously; please let us know if this creates a problem. We have also added the option to include HCV-H in any alignment you retrieve. 23 June 2004

The search interface now lets you select multiple (adjacent) genes. If you select two non-adjacent genes, for example Core and NS5B, the genes in between will be selected automatically. 22 June 2004

Epilign now includes a SUMMARIZE function that shows the frequency of each variant of your epitope. 04 June 2004
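
A summary of this kind amounts to counting identical strings. The Python sketch below is a hypothetical illustration, not Epilign's code; it takes the epitope region cut from each aligned sequence and returns every variant with its count and frequency, most common first.

```python
# Illustrative sketch only; the function name is an assumption.
from collections import Counter


def summarize_variants(epitope_rows):
    """Return (variant, count, frequency) tuples, most common first."""
    counts = Counter(row.upper() for row in epitope_rows)
    total = sum(counts.values())
    return [(variant, n, n / total) for variant, n in counts.most_common()]
```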

The sequence locator tool now also does reverse lookup: you can give it coordinates in HCV-H and it will find the corresponding amino acid sequence. If you want to do the same for nucleotides, please use the search interface. 05 May 2004

We have a new and versatile consensus tool. Among other things, you can set different thresholds for unanimity and majority, divide your alignment into blocks and automatically calculate a consensus for each block as well as a consensus-of-consensuses, and flexibly deal with gaps and non-standard characters. 03 May 2004
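
To give an impression of what a thresholded consensus looks like, here is a small Python sketch. It does not reproduce the tool or its options: the output convention (upper case for unanimous columns, lower case for a majority above the threshold, "?" otherwise) and the parameter names are assumptions chosen for this example, and block-wise consensus is left out.

```python
# Illustrative sketch only; the output convention is an assumption.
from collections import Counter


def consensus(alignment, majority=0.5, ambiguous="?"):
    """Column-wise consensus of an alignment with equal-length rows."""
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        frequency = count / len(column)
        if frequency == 1.0:
            result.append(residue.upper())   # unanimous column
        elif frequency > majority:
            result.append(residue.lower())   # majority, not unanimous
        else:
            result.append(ambiguous)         # no residue above threshold
    return "".join(result)
```

A consensus-of-consensuses could be obtained by calling the same function on the list of per-block consensus strings.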

We have created a series of graphs showing the genotype and subtype variability of all HCV proteins. Three variability measures are provided: a histogram of the frequency of pairwise distances within genotypes and subtypes; a sliding window graph of the entropy of each protein, and a position-by-position plot of the estimated ds (synonymous changes) and dn (non-synonymous changes) of each protein. 29 April 2004

We have a new version of BLAST that also accepts protein sequences. It automatically recognizes those, and when a protein sequence is submitted, TBLASTN is used to search the BLAST database. This version also offers the option of excluding sequences that do not have a genotype, which can be convenient if your query sequence has many matches that are ungenotyped and you want to check its genotype. 28 April 2004

We have added a page of downloadable database software. You can use this software if the datafile you want to analyze is very large, or if you want to run batch analyses. The page can be found here, and we have added a link to it on the navigation bar. 06 April 2004

We provide an overview of assigned genotype/subtype designations. If you think you have found a new genotype or subtype, please consult this table to avoid conflicts with existing designations. If you have already assigned geno/subtypes that are not in this list, please contact us and we will add them immediately. 18 March 2004

We have an updated version of Sequence Locator. This tool finds the coordinates of your input sequence(s) relative to the start of the HCV-H genome, the CDS of the polyprotein, and each individual protein it overlaps. It also produces a map showing the location of your sequence, and if you submit a protein sequence, it lists the corresponding nucleotide sequence in HCV-H. 26 February 2004

It is now possible to retrieve "clean" sequence sets that include only one sequence per patient (or cluster of epidemiologically related patients). When you use the search interface to retrieve sequences and include the genomic region in your search, a button will show up on the results page that says "Exclude related". When you click this button, a clean sequence set will be returned. This function only works when the genomic region field is included in the search, as it makes no sense to delete a sequence from one region because another region from the same patient is already in the set. In conjunction with this, clusters of epidemiologically related sequences have been defined. You can search for a cluster, or get an overview of existing clusters by going to the search interface, selecting 'Cluster name' from the 'other fields' menu, and typing a '_' in that field; this will list all sequences with a defined cluster. More information is here. 20 February 2004

The HCV sequence website is now searchable; click on the "Enter the HCV sequence database" link above and use the search box in the navigation frame. 18 November 2003

We have created an alignment of flavivirus complete genomes. The alignments are provided on an "as is" and "let the user beware" basis. We are still working on improving them; significant improvements will be announced here. 12 November 2003

Please try the new automatic format converter, OmniRead. The program is based on the Readseq and Fmtseq programs, and attempts to capture possible conversion errors. It does NOT do a perfect job of recognizing formats, but can be used for many of the most common formats, including several that SeqConvert does not read, such as phylip. It also produces some output formats that SeqConvert does not offer, for example PAUP/Nexus. 30 October 2003

Several new fields have been added to the search interface. It is now possible to search on ALT level and whether or not the person was drug naive at the time of sampling (in the 'Other fields' pulldown menu), as well as on the therapy response at the end of the study. Also, sequences obtained from known non-human hosts and sequences used for patent applications (which are usually badly annotated and often duplicates of other sequences in the database) can be automatically excluded from the retrieval. 22 October 2003

Treemaker now allows different distance models to create a tree. 21 October 2003

Epilign, the amino acid/epitope equivalent of Primalign, is now available for HCV. This program takes a user input sequence (amino acids in the case of Epilign, nucleotides in the case of Primalign) and aligns it to the corresponding region in the complete genome alignment. The program is designed to produce a quick overview of how conserved your epitope is. 02 October 2003

The HCV BLAST interface now accepts accession numbers as well as sequences as input. 23 September 2003

A new (and improved) Primalign program is now available. This program takes a user input sequence and aligns it to the corresponding region in the complete genome alignment, so it gives a quick overview of how conserved the region is that your primer covers. It also takes reverse complements. 27 August 2003

You can submit comments and suggestions to the HCV database. 18 August 2003

The sequence locator tool now also searches for reverse complements. 13 August 2003
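
The reverse-complement lookup can be pictured with the Python sketch below. It is a simplified stand-in, not the sequence locator's method: it uses exact string matching, and the function names and the 1-based (start, end, strand) return value are assumptions made for this example.

```python
# Illustrative sketch only; exact matching stands in for the real search.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Reverse complement of a nucleotide string (U is read as T)."""
    return sequence.translate(COMPLEMENT)[::-1]


def locate(reference, query):
    """Find query, or its reverse complement, in reference.

    Returns (start, end, strand) with 1-based inclusive coordinates on the
    forward strand of the reference, or None if neither orientation matches.
    """
    ref = reference.upper()
    q = query.upper()
    for strand, candidate in (("+", q), ("-", reverse_complement(q))):
        position = ref.find(candidate)
        if position != -1:
            return position + 1, position + len(candidate), strand
    return None
```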

Annotation of the HCV genomic map has been expanded. 18 July 2003

A bug in Motifscan has been fixed, for now by deleting the epitope alignments; these will be reinstated when the HCV Immunology Database is up, and epitope alignments are available. 16 July 2003

The ready-made alignments of (near-)complete HCV genes and genomes are now available. We provide both alignments of all available genes, and 'genotype reference' sets which contain a few representatives of all genotypes in the database. For most of these sets, not all genotypes are well-represented in the database. The alignments can be downloaded here. 15 July 2003

Questions or comments? Contact us at hcv-info@lanl.gov