HCV Database
HCV sequence database

To our users: Please note that the HCV database site is no longer funded. We try to keep the database updated and the tools running, but unfortunately we cannot guarantee help with using this site. The data will no longer be manually curated either.

Gene Cutter Help


Gene Cutter is useful for extracting protein sequences from viral DNA sequences. It uses a reference sequence to decide where to cut the sequence into genes (details below). It appropriately handles introns and overlapping genes.

Gene Cutter is also useful for annotating coding regions, as required when depositing DNA sequences in GenBank. For HIV-1, HIV-2, and SIV, we provide a GenBank Entry Generation tool, which incorporates Gene Cutter results into a Sequin file ready for deposit.

Gene Cutter does not align non-coding regions or LTRs. You may need to use other alignment tools to correctly handle these regions. Our HCVAlign tool provides similar functions to Gene Cutter, and its MAFFT option will align untranslated regions appropriately.



Regions to align and extract

Gene Cutter can give you just one gene or region, or all genes that your input touches.

Reference options

Codon align the region

This option will insert gaps into your alignment so that it stays in the correct reading frame, even if your sequences contain frameshifts.

Output format and translation options

If you want your results as nucleotides, choose "Do not translate". If you want your output as amino acids, you have 3 choices, described below. If your sequences contain no IUPAC codes, you may select any of the 3 translation options and your results will be identical.

If you request translated output and your sequences contain IUPAC (ambiguity) codes, they can be translated in 3 possible ways:

Note: regardless of which translation option is selected, the presence of IUPAC characters may result in a translation that cannot be read by sequence editors and analysis programs!
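To illustrate why IUPAC codes complicate translation, here is a minimal Python sketch that expands an ambiguous codon into every amino acid it could encode. It is not Gene Cutter's code and does not reproduce any of its translation options; the function name `possible_amino_acids` is made up for this example, and the table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_amino_acids(codon):
    """Return the set of residues a codon with IUPAC codes could encode.

    Hypothetical helper for illustration; "*" stands for a stop codon here.
    """
    codon = codon.upper().replace("U", "T")
    choices = [IUPAC[base] for base in codon]
    return {CODON_TABLE["".join(combo)] for combo in product(*choices)}


print(possible_amino_acids("GGN"))  # every expansion encodes glycine
print(possible_amino_acids("RAT"))  # AAT or GAT: asparagine or aspartate
```

A codon such as GGN still resolves to a single residue, while RAT does not, so a translation has to either pick one residue, report several, or emit a placeholder. Any of those choices can yield characters that downstream programs do not expect.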

Symbols in output

Translations are in the standard 1-letter amino acid alphabet.

# = frame shift or partial codon
$ = stop codon (in nucleotide output)
^ = stop codon (in amino acid output)

Note: codons containing "-" are always translated to either "-" (gap) or "#" (partial codon).
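The symbol rules above can be sketched in Python as follows. This is an illustration only, under stated assumptions: an all-gap codon becomes "-", any codon that is incomplete or mixes gaps and bases becomes "#", and the stop symbol is a parameter that defaults to the amino acid output symbol listed above. The function name `translate_aligned` and the "X" placeholder for codons the table cannot resolve are choices made for this example, not documented Gene Cutter behavior.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_aligned(seq, stop_symbol="^"):
    """Translate a codon-aligned nucleotide string, one symbol per codon.

    Hypothetical helper: "-" for an all-gap codon, "#" for a partial codon,
    stop_symbol for a stop codon, "X" for anything else the table lacks.
    """
    seq = seq.upper()
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon == "---":
            out.append("-")  # whole codon is a gap
        elif len(codon) < 3 or "-" in codon:
            out.append("#")  # partial codon: some bases missing
        else:
            residue = CODON_TABLE.get(codon, "X")  # "X" is this sketch's placeholder
            out.append(stop_symbol if residue == "*" else residue)
    return "".join(out)


print(translate_aligned("ATG---GC-TAA"))  # M-#^
```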

Large Jobs

Gene Cutter has no limit on the number of input sequences, but please observe these suggestions!

How it Works

How Gene Cutter aligns the sequences

Because it aligns against an internal reference sequence, Gene Cutter frequently gives a better multiple alignment than purely computational alignment programs. (Gene Cutter uses HMMER v2.3.2, trained on the full-length genome alignment.)

Note: misalignments at the ends of a coding region may result in a few amino acids or bases not appearing in the output.

How Gene Cutter finds the genes and proteins

Gene Cutter clips the coding regions from a nucleotide alignment and (optionally) codon aligns the sequences. To define the boundaries of genes or domains of interest, and to codon-align the sequences, Gene Cutter uses the coordinates from the HCV reference sequence H77.
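As a rough illustration of clipping by reference coordinates, the Python sketch below maps ungapped reference positions to alignment columns and slices every sequence to the same columns. The function names, the toy alignment, and the coordinates are all invented for this example; they are not H77 coordinates and this is not Gene Cutter's implementation.

```python
def reference_to_columns(aligned_ref):
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    columns = {}
    position = 0
    for col, base in enumerate(aligned_ref):
        if base != "-":
            position += 1
            columns[position] = col
    return columns


def clip_region(alignment, ref_name, start, end):
    """Slice each aligned sequence to reference positions start..end (inclusive).

    Hypothetical helper: `alignment` maps sequence names to equal-length
    gapped strings, and `ref_name` names the reference row.
    """
    columns = reference_to_columns(alignment[ref_name])
    first, last = columns[start], columns[end]
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Toy alignment: the query has one extra base relative to the reference.
toy = {
    "ref": "AC-GTACG",
    "seq1": "ACTGTA-G",
}
print(clip_region(toy, "ref", 2, 4))  # {'ref': 'C-GT', 'seq1': 'CTGT'}
```

Because the coordinates refer to the reference rather than to alignment columns, an insertion in a query sequence widens the clipped region without shifting its boundaries.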

How Gene Cutter codon-aligns

The sequences in the alignment are internally aligned to the H77 reference sequence (provided by the program). This reference sequence is annotated with the correct reading frame for all genes, so the program knows where to start the translation. Gaps will be inserted in groups of 3, or shifted to form groups of 3, and are inserted only between codons, not in the middle of a codon.

In some sequences, insertions are compensated within a short distance by a deletion, or vice versa. Because these frameshifts may not inactivate the protein, if a compensating mutation is within 5 amino acids of an initial frameshift, Gene Cutter will shift it so that the reading frame is left intact. Otherwise, the frame shift is marked in the output with the hash symbol (#), and the translation is continued in the correct reading frame beyond that codon. Stop codons are marked by a dollar sign ($).
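The compensating-frameshift rule can be pictured with a short Python sketch. It is a simplified model, not Gene Cutter's algorithm: it assumes indels are given as (position, length) pairs in nucleotide coordinates, with positive lengths for insertions and negative lengths for deletions, and it measures the 5-amino-acid window as 15 nucleotides between indel positions. The function name `classify_indels` and the labels are made up for this example.

```python
def classify_indels(indels, window_codons=5):
    """Label each indel as 'in-frame', 'compensated', or 'frameshift'.

    Hypothetical helper. `indels` is a list of (position, length) pairs with
    unique positions; length > 0 is an insertion, length < 0 a deletion.
    """
    window = 3 * window_codons  # window size in nucleotides (assumption)
    labels = {}
    pending = None  # frameshifting indel still waiting for a partner
    for position, length in sorted(indels):
        if length % 3 == 0:
            labels[position] = "in-frame"  # whole codons gained or lost
            continue
        if (pending is not None
                and position - pending[0] <= window
                and (pending[1] + length) % 3 == 0):
            # The pair restores the reading frame within the window.
            labels[pending[0]] = labels[position] = "compensated"
            pending = None
        else:
            if pending is not None:
                labels[pending[0]] = "frameshift"
            pending = (position, length)
    if pending is not None:
        labels[pending[0]] = "frameshift"
    return labels


events = [(10, 1), (19, -1), (100, 2), (200, -3)]
print(classify_indels(events))
```

In this model the insertion at position 10 and the deletion at position 19 cancel each other, the 2-base insertion at position 100 has no partner and would be marked with "#", and the 3-base deletion at position 200 never disturbs the reading frame.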


Questions or comments? Contact us at hcv-info@lanl.gov