cnag.bsc.es

Simulated Genome Released to Participants

For the purpose of self-evaluation, we are providing a link to the simulated genome sequence. You may download the genome here.
Last Updated on Friday, 25 March 2011 15:31
 

What is dnGASP?

dnGASP is a collaborative effort among researchers to compare and evaluate methods and strategies for de novo genome assembly using data from second-generation sequencing platforms. It is being organized by the National Center for Genome Analysis (CNAG) in Barcelona, Spain. A sister project dubbed RGASP3 (the third incarnation of the RNA-Seq Genome Annotation Assessment Project) is focused on evaluating RNA-Seq read alignment algorithms and will be organized separately by the Centre for Genomic Regulation and the Wellcome Trust Sanger Institute. Both projects will culminate in a joint workshop in Barcelona April 5-7, 2011, organized in partnership with the International Center for Scientific Debate (ICSD), an initiative of Biocat with support from “la Caixa” Welfare Projects, and supported by additional funds from READNA.

This web portal is dedicated to providing the primary point of information and data exchange for dnGASP. We encourage those groups interested in participating in RGASP3 to retrieve the relevant information at the CRG.

All groups interested in testing their algorithms for assembly of large genomes from second-generation sequencing data are invited to participate in dnGASP. The format of the project will follow the tradition of the previous "GASP" workshops (CASP, GASP, EGASP, NGASP, RGASP, etc.): a dataset will be provided, submissions of the processed dataset will be solicited (deadline: Feb 15, 2011), the submissions will be evaluated by CNAG, and the aforementioned workshop will be held to discuss the results.

The Genome

The reference genome is an unidentified naturally composed eukaryotic genome of known sequence with the following characteristics:

Ploidy: diploid (SNP frequency ~1/1000)
Genome size: ~1.8Gb
Chromosome number: 14
GC content: ~42% (36-50%)
Repeat content: similar to vertebrate repeat content
Derivation: Our "genome" sequence is derived from sequence assembled by a traditional combination of WGS and clone-based approaches using Sanger technology. An undisclosed transformation was applied to the sequences to mask their identity, and finally alleles were simulated based on a realistic SNP distribution (SNPs and small indels). The genome additionally contains a minimal amount of purely artificial sequence introduced by the evaluation committee.

Read Data

A total of 64x coverage of the simulated genome has been simulated as Illumina GAIIx reads. Sequencing errors were introduced according to quality values (empirically determined from real reads). The 64x coverage is subdivided into four libraries with the following characteristics:

read length   insert size   kind               coverage
114nt x 2     500bp         paired ends (PE)   44x
36nt x 2      3kb           mate pairs (MP)    8x
36nt x 2      5kb           mate pairs (MP)    8x
36nt x 2      10kb          mate pairs (MP)    4x
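
As a rough illustration of what these coverage figures imply, the number of read pairs per library can be estimated as coverage x genome size / (2 x read length). The sketch below is not part of the dnGASP tooling. It assumes a genome size of exactly 1.8 Gb (the page only says "~1.8Gb") and that coverage is stated relative to that size; the names GENOME_SIZE, LIBRARIES and estimated_read_pairs are invented for this example.

```python
# Assumption: "~1.8Gb" is taken as exactly 1.8e9 bases for this estimate.
GENOME_SIZE = 1_800_000_000

# (library name, read length in nt, coverage) as listed in the table above.
LIBRARIES = [
    ("PE 500bp", 114, 44),
    ("MP 3kb", 36, 8),
    ("MP 5kb", 36, 8),
    ("MP 10kb", 36, 4),
]


def estimated_read_pairs(coverage, read_length, genome_size=GENOME_SIZE):
    """Estimate read pairs: each pair contributes 2 * read_length bases."""
    return round(coverage * genome_size / (2 * read_length))


# The four libraries should add up to the stated total coverage.
total_coverage = sum(cov for _, _, cov in LIBRARIES)

pair_counts = {
    name: estimated_read_pairs(cov, length) for name, length, cov in LIBRARIES
}

if __name__ == "__main__":
    print(f"total coverage: {total_coverage}x")
    for name, pairs in pair_counts.items():
        print(f"{name}: ~{pairs:,} read pairs")
```

Under these assumptions the paired-end library accounts for most of the bases, while each mate-pair library contributes on the order of 100 to 200 million pairs.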

The read data can be downloaded here: DOWNLOAD
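
The statement that sequencing errors were introduced according to quality values can be illustrated with the standard Phred definition, P(error) = 10^(-Q/10). The sketch below is only an assumption about how such a step might look, not the simulator actually used for dnGASP: it models substitutions only, picks the replacement base uniformly, and takes quality values as plain integers. The function names are invented for this example.

```python
import random


def phred_to_error_prob(q):
    """Standard Phred relation: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def add_errors(read, quals, rng):
    """Return a copy of `read` with substitution errors.

    Each base is replaced with probability given by its Phred quality.
    Assumption: the wrong base is chosen uniformly among the other three.
    """
    out = []
    for base, q in zip(read, quals):
        if rng.random() < phred_to_error_prob(q):
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded for reproducibility
    print(add_errors("ACGTACGTAC", [30, 30, 2, 30, 30, 2, 30, 30, 2, 30], rng))
```

Low-quality positions (small Q) are far more likely to be altered than high-quality ones, which is the property the read simulation relies on.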

Submissions

We will ask that one assembly be submitted for each of the following:

  • level 1: PE
  • level 2: PE + 3kb MP
  • level 3: PE + 3kb + 5kb MP
  • level 4: PE + 3kb + 5kb + 10kb MP
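
For orientation, the total read coverage available at each level follows directly from the library coverages listed under Read Data. A minimal sketch, with dictionary names chosen for this example only:

```python
# Coverage per library, as given in the Read Data section.
LIBRARY_COVERAGE = {"PE": 44, "3kb MP": 8, "5kb MP": 8, "10kb MP": 4}

# Libraries allowed at each submission level.
LEVELS = {
    1: ["PE"],
    2: ["PE", "3kb MP"],
    3: ["PE", "3kb MP", "5kb MP"],
    4: ["PE", "3kb MP", "5kb MP", "10kb MP"],
}

# Sum the coverage of every library permitted at a level.
level_coverage = {
    level: sum(LIBRARY_COVERAGE[lib] for lib in libs)
    for level, libs in LEVELS.items()
}

if __name__ == "__main__":
    for level, cov in level_coverage.items():
        print(f"level {level}: {cov}x")
```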

Each assembly submission will consist of a single multifasta file of scaffolds (including all scaffolded and unscaffolded contigs). Gaps of estimated size are to be represented by strings of N’s of length equal to the most likely gap size. Scaffolds which do not contain at least one contig of length 115 nt (read length plus one) or greater will not be considered for evaluation. We reserve the right to apply an even higher length threshold during evaluation if it is deemed necessary to make the analysis fairer or more tractable.
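
Participants may wish to check their files against the length rule before uploading. The following is a minimal sketch of such a check, not the evaluation code used by CNAG. It assumes that contigs within a scaffold are separated by runs of one or more N characters; the function names are invented for this example.

```python
import re

MIN_CONTIG_LEN = 115  # read length (114 nt) plus one, per the rule above


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_of(scaffold):
    """Split a scaffold into contigs at runs of N (assumed gap marker)."""
    return [c for c in re.split(r"[Nn]+", scaffold) if c]


def is_evaluable(scaffold, min_len=MIN_CONTIG_LEN):
    """True if at least one contig in the scaffold reaches min_len."""
    return any(len(c) >= min_len for c in contigs_of(scaffold))


if __name__ == "__main__":
    import sys

    # Usage: python check_scaffolds.py assembly.fa
    with open(sys.argv[1]) as handle:
        for name, seq in parse_fasta(handle):
            if not is_evaluable(seq):
                print(f"would be ignored: {name}")
```

Note that a long scaffold can still fail the rule if it consists only of short contigs joined by gaps.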

Deadline

March 1st 23:59 GMT. Files are to be uploaded through the project web site.

Evaluation

We will evaluate all submitted assemblies up to a maximum of one per level per participant according to several criteria. In addition to using the standard measures of assembly quality (N50/N90 and the largest/mean/median contig/scaffold size), we will leverage the fact that the reference genome sequence is absolutely known and the position from which each read was simulated is known. Assemblies will be aligned to the reference genome using established alignment methods and various measures will be calculated to quantify the level of completeness (e.g. coverage, number of gaps closed, N50 of aligned contigs, etc.) and correctness (e.g. synteny, accuracy of gap size estimation). Additionally, we will measure the ability to bridge different types of repeats (both “naturally” occurring and artificially constructed and embedded) varying in repeat unit size, total length, copy number within the genome and amount of variation. Finally, we will analyze other factors which might impact assembly quality such as variation in GC content or SNP rate.
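
For reference, N50 and N90 are commonly defined as the length L such that contigs (or scaffolds) of length L or greater contain at least 50% or 90% of the total assembled bases. The sketch below follows that common definition; it is an illustration, not the evaluation code, and the function name nxx is invented here.

```python
def nxx(lengths, percent):
    """Return the Nxx statistic (e.g. percent=50 for N50).

    Sequences are taken from longest to shortest until they cover at
    least `percent` of the total length; the length of the last sequence
    taken is the result. Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return length
    return 0


if __name__ == "__main__":
    contig_lengths = [80, 70, 50, 40, 30, 20, 10]
    print("N50:", nxx(contig_lengths, 50))
    print("N90:", nxx(contig_lengths, 90))
```

These statistics depend only on the length distribution, which is why the evaluation supplements them with alignment-based measures of completeness and correctness.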

Last Updated on Wednesday, 13 April 2011 09:06
 

Sponsorship

International Center for Scientific Debate (ICSD)

The International Center for Scientific Debate (ICSD) is an initiative of Biocat, with the support of “la Caixa” Welfare Projects, which aims to drive first-rate international scientific events to promote dialogue, collaboration and open exchange of ideas, projects and knowledge among experts of national and international renown. The ICSD aims to generate advanced debate on the various disciplines linked to the life sciences and their repercussions for society, contributing to Catalonia’s position as a country of scientific excellence. The dnGASP/RGASP3 workshop is one of the ICSD debates planned for 2011.

Additional support is provided by READNA.

Last Updated on Tuesday, 15 March 2011 15:46
 
