Genome Assembly

Downloadable File
Organism: 
Seriola dorsalis (Yellowtail Amberjack)
Program, Pipeline Name or Method Name: 
MaSuRCA
Program, Pipeline or Method version: 
2.3.2
Source Name: 
Seriola dorsalis assembly version 1 in fasta format
Source Version: 
1
Time Executed: 
09/09/2016
Materials & Methods (Description and/or Program Settings): 
Paired-end and mate-pair reads in FASTQ format were used for the assembly. The MaSuRCA assembler (version 2.3.2) (Zimin et al. 2013) was used to assemble the raw data into scaffolds. To obtain a more manageable assembly for visualization in JBrowse, the scaffolds were filtered as follows: scaffolds shorter than 800 bases, or with 90% of their length contained in a larger scaffold, were removed, and the remaining scaffolds (Sedor_35K.fasta) had to contain a gene or be larger than 10,000 bases (BioProject ID PRJNA319656), resulting in 4,717 scaffolds. These scaffolds were then screened for contamination. NCBI Reference Sequence NC_001422.1 was used as a BLAST query against the genome assembly to identify PhiX contamination; one scaffold (scaffold_26907) was identified and removed. Blobtools (Kumar et al. 2013) was used to identify another 277 scaffolds (contamination277.txt) that appear to be contamination from the phytoplankton Emiliania huxleyi. The final assembly contains 4,439 scaffolds. The quality of the final assembly was assessed using BUSCO (Simão et al. 2015). The genome contains 2,848 of 3,023 BUSCO groups (about 94%).
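The length and gene filter and the contamination removal described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this assembly: the function name filter_scaffolds, the constants, and the toy scaffold names are all hypothetical, and the 90% containment filter is left out because it requires scaffold-to-scaffold alignments. The last lines simply re-derive the reported scaffold count and BUSCO fraction from the numbers in the text.

```python
from typing import Dict, Iterable, Set

MIN_LENGTH = 800          # scaffolds shorter than this are dropped
LARGE_SCAFFOLD = 10_000   # scaffolds larger than this are kept even without a gene


def filter_scaffolds(
    lengths: Dict[str, int],
    with_gene: Set[str],
    contaminants: Iterable[str] = (),
) -> Dict[str, int]:
    """Keep scaffolds that pass the length/gene filter and are not flagged as contamination.

    Hypothetical helper: the containment filter (90% of length inside a
    larger scaffold) is not implemented here.
    """
    flagged = set(contaminants)
    kept = {}
    for name, length in lengths.items():
        if length < MIN_LENGTH:
            continue  # too short
        if name not in with_gene and length <= LARGE_SCAFFOLD:
            continue  # no gene and not large enough
        if name in flagged:
            continue  # e.g. PhiX or Emiliania huxleyi hits
        kept[name] = length
    return kept


# Toy data, invented for illustration only.
toy_lengths = {
    "scaffold_a": 500,
    "scaffold_b": 5_000,
    "scaffold_c": 5_000,
    "scaffold_d": 25_000,
    "scaffold_e": 40_000,
}
toy_with_gene = {"scaffold_b"}
toy_contaminants = ["scaffold_e"]

kept = filter_scaffolds(toy_lengths, toy_with_gene, toy_contaminants)
print(sorted(kept))  # ['scaffold_b', 'scaffold_d']

# Scaffold counts as reported in the text.
after_filtering = 4_717
phix_removed = 1
blobtools_removed = 277
final_count = after_filtering - phix_removed - blobtools_removed
print(final_count)  # 4439

# BUSCO groups found out of the total searched, as reported in the text.
busco_fraction = 2_848 / 3_023
print(f"{busco_fraction:.1%}")  # 94.2%
```

In practice the scaffold lengths would be read from the FASTA file and the contaminant IDs from a list such as contamination277.txt; both inputs are passed here as plain Python collections to keep the sketch self-contained.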