Amplicon Pipeline Description

The ACE mitag pipeline (AMP) currently works with the following genes: SSU (16S/18S) rRNA, LSU (23S/28S) rRNA, and fungal ITS. It consists of a quality control module, amplicon clustering with QIIME, and taxonomy assignment of representative OTU sequences using BLAST. The pipeline currently runs on the forward (R1) read only. For fungal ITS data there is an option to perform an overlap and trim, as these products are typically shorter than the SSU or LSU products.

Quality Control

All fastq files are processed with FastQC and the reports are made available. The first 20 bases of every read are then trimmed to remove primer sequence. Reads are then quality trimmed with Trimmomatic, using a sliding window of 4 bases and cutting the read where the average base quality in the window drops below 15. All reads are then hard trimmed to 250 bases, and any read shorter than 250 bases is excluded. The fastq files are then converted to fasta files.
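The trimming steps above can be sketched in a few lines of Python. This is a simplified illustration of the logic only, not Trimmomatic's actual implementation. The function names `sliding_window_trim` and `qc_read` are hypothetical, and quality scores are assumed to be already decoded to integer Phred values.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Cut the read at the first window whose mean quality is below min_avg.

    Simplified approximation of a sliding-window quality trim.
    """
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def qc_read(seq, quals, head_crop=20, keep_len=250):
    """Return the trimmed sequence, or None if the read is excluded."""
    # Remove the first 20 bases (primer sequence).
    seq, quals = seq[head_crop:], quals[head_crop:]
    # Quality trim with a 4-base sliding window.
    seq, quals = sliding_window_trim(seq, quals)
    # Exclude reads shorter than 250 bases, then hard trim to 250.
    if len(seq) < keep_len:
        return None
    return seq[:keep_len]
```

A read must therefore keep at least 270 good bases from its 5' end (20 primer bases plus 250 retained bases) to survive quality control.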

Read Clustering and Taxonomy Assignment

Fasta files are processed using QIIME's workflow with default parameters (97% similarity), with the taxonomy assignment and alignment features suppressed. The resulting OTU table is filtered to remove any OTU with an abundance of less than 0.05%. Representative OTU sequences are then searched with BLAST against the relevant reference database (Greengenes version 2013/05 for 16S, Silva version 119 for LSU, and the UNITE release of 04/07/2014, singletons included, for fungal ITS amplicons). The main analysis output is an OTU table that gives, for each OTU, the taxonomic classification of the best database match and a representative sequence.
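The abundance filter can be illustrated with a minimal Python sketch. It assumes that "abundance" means an OTU's share of all reads in the table, summed across samples. The text does not say whether the threshold is applied per sample or to the whole table, so treat that as an assumption. The function name `filter_otus` and the dictionary layout are hypothetical.

```python
def filter_otus(table, min_fraction=0.0005):
    """Drop OTUs whose share of all reads is below min_fraction (0.05%).

    table maps an OTU ID to a list of per-sample read counts.
    Assumption: abundance is measured against the total of the whole table.
    """
    total = sum(sum(counts) for counts in table.values())
    if total == 0:
        return {}
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts) / total >= min_fraction
    }
```

For example, in a table of 10,000 reads, an OTU with 4 reads (0.04%) would be removed and an OTU with 6 reads (0.06%) would be kept.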

What we provide

We provide the FastQC reports, the OTU tables (a QIIME BIOM file, a filtered raw count table, and a filtered fraction table), a processing statistics file with an associated bar plot, and a file listing software versions from the QIIME pipeline.

The count and fraction OTU tables have the following columns:

  • OTU ID
  • Sample 1 Abundance (counts or fraction)
  • Sample 2 Abundance (counts or fraction)
  • … Sample N Abundance (counts or fraction)
  • BLAST (best match) hit name and description for representative OTU
  • Number of HSPs in best match hit
  • Length of HSPs in best match hit
  • Percentage identity of HSPs in best match hit
  • Representative sequence ID
  • Representative Sequence
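The relationship between the count table and the fraction table can be sketched as follows. This assumes each fraction is an OTU's count divided by the total count of that sample's column. The pipeline's exact normalisation is not stated here, so this is an illustration only, and the function name `counts_to_fractions` is hypothetical.

```python
def counts_to_fractions(counts):
    """Convert per-sample counts to per-sample fractions.

    counts maps an OTU ID to a list of counts, one per sample.
    Assumption: each sample column is normalised by its own total.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        otu: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for otu, row in counts.items()
    }
```

Under this assumption, each sample column of the fraction table sums to 1 whenever the sample has at least one read.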

The processing statistics file (jobid_run_statistics.csv) contains the following columns for each sample:

  • Sample
  • Total Read Count R1
  • Total Read Count R2
  • Reads Passing QC R1
  • Reads Passing QC R2
  • QIIME pre-filtered reads R1 (reads with < 60% match to any sequence in the database)
  • QIIME unclustered singletons R1 (reads that do not cluster with any other reads)
  • Reads in complete OTU table R1
  • Reads in filtered OTU table (>0.05% abundance) R1
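As a rough sketch of how the statistics file might be used, the Python below computes the fraction of R1 reads that end up in the filtered OTU table for each sample. The exact header strings in the CSV are not given above, so the column names are passed in as parameters and the ones shown in the usage lines are assumptions. The function name `retained_fraction` is hypothetical and the numbers are made up for illustration.

```python
import csv
import io


def retained_fraction(handle, total_col, kept_col, sample_col="Sample"):
    """Map each sample to kept reads divided by total reads.

    handle is an open file (or file-like object) containing the CSV.
    Column names are assumptions and must match the real header row.
    """
    result = {}
    for row in csv.DictReader(handle):
        total = int(row[total_col])
        result[row[sample_col]] = int(row[kept_col]) / total if total else 0.0
    return result


# Illustrative, made-up data with assumed header names.
example = io.StringIO(
    "Sample,Total Read Count R1,Reads in filtered OTU table R1\n"
    "S1,1000,500\n"
    "S2,2000,500\n"
)
fractions = retained_fraction(
    example, "Total Read Count R1", "Reads in filtered OTU table R1"
)
```

A low retained fraction for one sample relative to the others can point to reads lost at quality control, pre-filtering, or singleton removal, which the intermediate columns help to distinguish.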

Kind Regards, The ACE Team

amplicon_pipeline_readme.txt · Last modified: 2016/02/03 04:08 by brian