Project Documentation & Protocols: Maize Gene Discovery Project: ESTs: Assembly
In 2002, the Maize Gene Discovery Project replaced the ZmDBAssembler with the PaCE software, which is also what MaizeGDB uses for clustering; read more on the sequence page.
EST contig assembly was previously done with the ZmDBAssembler. The ZmDBAssembler relies on two external programs, blastn and CAP3: blastn is used to cluster the ESTs, and CAP3 performs the final assembly and derives the consensus sequences. The simple diagram below illustrates the assembly scheme used by the ZmDBAssembler:
Contigs are EST clusters with two or more member ESTs. Singlets are ESTs that are not significantly similar to any other EST. Together, the contigs and singlets presumably represent a set of unique maize EST clusters. For now, we call them TUGs (Tentative Unique Genes). We also call the contigs TUCs (Tentative Unique Contigs) and the singlets TUSs (Tentative Unique Singlets), so TUG = TUC + TUS. "Tentative" indicates that they are subject to frequent change as new ESTs are added to the assembly.
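The relationship TUG = TUC + TUS can be illustrated with a minimal Python sketch. It assumes each cluster is simply a list of EST identifiers; the function name `split_tugs` and the sample identifiers are hypothetical and do not come from the ZmDBAssembler itself.

```python
def split_tugs(clusters):
    """Split EST clusters into contigs (TUC) and singlets (TUS).

    Each cluster is assumed to be a list of EST identifiers (an
    illustrative representation, not the ZmDBAssembler's own format).
    """
    tuc = [c for c in clusters if len(c) >= 2]  # contigs: two or more member ESTs
    tus = [c for c in clusters if len(c) == 1]  # singlets: a single EST
    return tuc, tus


# Hypothetical example data.
clusters = [["est_a", "est_b"], ["est_c"], ["est_d", "est_e", "est_f"]]
tuc, tus = split_tugs(clusters)

# Every TUG is either a TUC or a TUS.
assert len(tuc) + len(tus) == len(clusters)
```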
- A blast hit is defined as a TUG having over 95% sequence identity with the query EST (the new EST) over a 40 bp region.
- If a hit is a singlet, the singlet is put into a cluster with the new EST.
- If a hit is a contig, all of its member ESTs are put into the cluster. The contig is removed from the TUGs.
- The CAP3 program is used to assemble the ESTs in the cluster and derive the consensus sequence(s). Assembly criteria: an overlap percent identity cutoff of 95 and an overlap length cutoff of 30.
- Newly assembled contigs are given names that reflect the date of the assembly. Old contigs keep their existing names, which reflect the date of their own assembly. This preserves a degree of consistency between assemblies. For example, the current TUC collection may contain contigs named TUC04-05-****, assembled on 04/05/00, alongside contigs named TUC07-14-****, assembled on 07/14/00.
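The steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the ZmDBAssembler's actual code: the `Hit` record, the dictionary of TUGs, and all function names are hypothetical. The hit definition is read here as "identity strictly above 95% over an aligned region of at least 40 bp". The sketch also assumes that a matched singlet is removed from the TUG set along with matched contigs, since its EST re-enters through the new assembly; the source states this only for contigs. The portion of a contig name after the date is not specified in the source, so only the date prefix is generated. The `-p` and `-o` arguments are CAP3's overlap percent identity and overlap length cutoff options; whether the ZmDBAssembler invoked CAP3 exactly this way is an assumption.

```python
from dataclasses import dataclass
from datetime import date

MIN_IDENTITY = 95.0  # percent identity; hits must exceed this
MIN_REGION = 40      # minimum aligned region in bp (interpretation of the text)


@dataclass(frozen=True)
class Hit:
    """One blastn match between the new EST and an existing TUG (hypothetical record)."""
    tug_id: str
    percent_identity: float
    align_length: int


def is_hit(h):
    """Apply the blast hit definition to one match."""
    return h.percent_identity > MIN_IDENTITY and h.align_length >= MIN_REGION


def build_cluster(new_est, hits, tugs):
    """Collect the ESTs to hand to CAP3 and drop the absorbed TUGs.

    `tugs` maps a TUG name to its member EST ids: one member for a
    singlet, two or more for a contig. It is modified in place.
    """
    cluster = [new_est]
    for h in hits:
        if is_hit(h) and h.tug_id in tugs:
            # Singlet: contributes its one EST. Contig: contributes all members.
            cluster.extend(tugs.pop(h.tug_id))
    return cluster


def cap3_command(fasta_path):
    """Command line for CAP3 with the stated criteria (identity 95, overlap length 30)."""
    return ["cap3", fasta_path, "-p", "95", "-o", "30"]


def contig_prefix(assembly_date):
    """Date-based name prefix, e.g. 'TUC04-05-' for an assembly on 04/05/00."""
    return f"TUC{assembly_date.month:02d}-{assembly_date.day:02d}-"
```

Contigs that are not hit by the new EST never leave `tugs`, so they keep their original date-based names, which is how the naming stays consistent between assemblies.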
You can search the latest versions of these sequences at PlantGDB.