Implementing an RNA-Seq pipeline on DNAnexus

Blog Authors

The DNAnexus science internship has sought to mentor students, from undergraduates to Ph.D. candidates, in bioinformatics since 2015. This project was performed by Adeline Petersen during her internship with DNAnexus under the supervision of Samantha Zarate.

We are currently recruiting summer 2019 interns. If you are interested in interning with the DNAnexus science team, please apply here.

RNA-sequencing, or RNA-seq, is an important tool used to characterize and analyze the RNA transcripts present in a sample. Built on next-generation sequencing (NGS) technology, RNA-seq offers higher sensitivity and precision in measuring gene expression than earlier hybridization-based transcriptome-profiling approaches. Given the decreasing cost of NGS, the popularity of RNA-seq is on the rise: publications on PubMed mentioning RNA-seq increased from only 6 articles in 2008 to 3,580 in 2017, and the targeted RNA-seq market has been projected as a major growth area through 2024. The clinical applications of RNA-seq range from detecting mutations in tumors and inherited diseases to comparative gene expression analysis. Accordingly, interest among our customers in the questions that RNA-seq workflows can answer has increased. Here we present to the community an RNA-seq workflow that we developed to work out of the box, as well as benchmarks demonstrating our workflow’s speed and accuracy.

Introduction

Given the rise in popularity of RNA-seq, many research institutions have published best-practices RNA-seq pipelines. The ExaScience Life Lab in Leuven, Belgium has published Halvade-RNA, a parallel, multi-node pipeline that primarily focuses on variant calling based on the GATK best practices; the Broad Institute has also published an RNA-seq best-practices short-variant-calling workflow. These variant-calling workflows let users focus on identifying variants in expressed genes, as RNA editing can lead to variation between the genes encoded in DNA and those expressed in RNA. ENCODE has similarly published several RNA-seq read quantification pipelines, all of which leverage STAR for read alignment and RSEM for read quantification. Measuring read quantification can help researchers establish a healthy baseline for gene expression and therefore identify disease progression over time. Both variant calling and read quantification are common uses of RNA-seq data, so we wanted to implement both tasks in our workflow.

The QuickRNASeq model, developed by Pfizer, focuses on both variant calling and read quantification using large-scale RNA-seq data. QuickRNASeq generates accessible data analyses and visualizations by using STAR for read alignment, featureCounts from the Subread package for read quantification, VarScan for germline variant calling, and RSeQC for quality control of RNA-seq data. We chose to model our first attempt at a DNAnexus-recommended RNA-seq workflow after the QuickRNASeq model because of its versatility, accuracy, runtime, and emphasis on large-scale data input.


Figure 1: Visualization of the proposed workflow. We utilized STAR for read alignment, featureCounts for read quantification, VarScan for germline variant calling, and RSeQC for quality control of RNA-seq data.

Features of the Workflow

STAR

STAR, or Spliced Transcripts Alignment to a Reference, is used to map reads to a reference genome. In particular, STAR lends itself well to RNA-seq analysis by accepting an annotations file in GTF format, allowing it to detect splice junctions within a read.
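As an illustration, a minimal Python sketch of the two STAR stages follows; the file names, thread count, and --sjdbOverhang value are placeholders rather than our app's exact settings:

```python
import subprocess

# Step 1: build the genome index once. The GTF annotation lets STAR model
# splice junctions. File names and thread counts are placeholders.
subprocess.run([
    "STAR", "--runMode", "genomeGenerate",
    "--genomeDir", "star_index",
    "--genomeFastaFiles", "reference.fa",
    "--sjdbGTFfile", "annotation.gtf",
    "--sjdbOverhang", "99",          # read length minus 1, here for 100 bp reads
    "--runThreadN", "8",
], check=True)

# Step 2: map paired-end reads, writing a coordinate-sorted BAM
# (Aligned.sortedByCoord.out.bam) for the downstream tools.
subprocess.run([
    "STAR", "--genomeDir", "star_index",
    "--readFilesIn", "reads_1.fastq", "reads_2.fastq",
    "--outSAMtype", "BAM", "SortedByCoordinate",
    "--runThreadN", "8",
], check=True)
```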

We compared STAR v2.6.1a, released in August 2018, with v2.5.3a and found that runtimes were nearly identical for both the genome-index generation and mapping steps (Figure 2).


Figure 2: Runtime comparison between STAR versions 2.5.3a and 2.6.1a. The runtimes for the generate genome index and mapping steps are nearly identical for both versions.

featureCounts

We chose featureCounts v1.6.2 from the Subread package as the application for read quantification. featureCounts is an efficient, general-purpose read summarization program that assigns mapped reads to genomic features and counts them. Furthermore, featureCounts is both accurate and specific, accounting for insertions/deletions, junctions, and structural variants. As input, featureCounts takes the alignment file from STAR (the genome BAM file) and an annotation file; it outputs a read counts file and a read counts summary file.
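A sketch of the corresponding featureCounts call, using the sorted BAM that STAR writes by default; the annotation name and thread count are placeholders:

```python
import subprocess

# Assign mapped read pairs to genes and count them; featureCounts writes
# counts.txt plus counts.txt.summary with assignment statistics.
subprocess.run([
    "featureCounts",
    "-p",                                  # paired-end: count fragments, not reads
    "-T", "8",                             # worker threads
    "-a", "annotation.gtf",                # annotation file (placeholder name)
    "-o", "counts.txt",
    "Aligned.sortedByCoord.out.bam",       # sorted BAM produced by STAR
], check=True)
```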

VarScan

We chose VarScan v2.4.3 as the germline variant calling application. VarScan is a platform-independent variant calling program that utilizes samtools mpileup. It can call germline variants, somatic variants, copy number variants, and de novo mutations. For our application, we used the germline variant calling functionality of VarScan to identify SNPs and insertions/deletions present in the RNA-seq input data compared to the reference genome. This application takes the genome BAM from STAR, a reference genome, and an annotation file as inputs; its primary output is a VCF file containing SNPs and indels.
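Concretely, the germline-calling step amounts to piping a samtools mpileup stream into VarScan. A minimal sketch, assuming VarScan is installed as a jar; file names are placeholders:

```python
import subprocess

# Stream a pileup from the STAR BAM into VarScan, keeping only variant sites
# and writing VCF. mpileup2snp / mpileup2indel can be used instead to split
# SNPs and indels into separate files.
mpileup = subprocess.Popen(
    ["samtools", "mpileup", "-f", "reference.fa", "Aligned.sortedByCoord.out.bam"],
    stdout=subprocess.PIPE,
)
with open("variants.vcf", "w") as vcf:
    subprocess.run(
        ["java", "-jar", "VarScan.jar", "mpileup2cns",
         "--variants", "1",      # report variant positions only
         "--output-vcf", "1"],   # emit VCF rather than VarScan's native format
        stdin=mpileup.stdout, stdout=vcf, check=True,
    )
mpileup.stdout.close()
mpileup.wait()
```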

RSeQC

We chose RSeQC v2.6.5 as the quality control application for the workflow. Quality control is particularly important for RNA-seq analyses because the transcripts present in a sample can vary drastically over time, between cell types in a sample, and between physiological conditions of a sample. Therefore, verifying the quality of the RNA-seq data through reproducibility, as is typically done with DNA sequencing, is challenging. RSeQC includes a number of metrics to ensure the quality of alignment, sequencing, and library preparation. It also assesses the feasibility of performing alternative splicing analysis. RSeQC takes the genome BAM file from STAR as an input, along with a 12-column BED file as the annotation file. RSeQC outputs a PDF file containing metrics and plots for analyzing alignment and sequencing quality, a sampling of which is shown in Figure 3.
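RSeQC is distributed as a collection of per-metric scripts. A minimal sketch of the calls behind panels like those in Figure 3, with placeholder file names and output prefixes:

```python
import subprocess

bam = "Aligned.sortedByCoord.out.bam"   # alignment from STAR
bed = "annotation.bed"                  # 12-column BED annotation

# One RSeQC script per metric; each writes its plots under the given prefix.
subprocess.run(["clipping_profile.py", "-i", bam, "-s", "PE", "-o", "qc/clipping"], check=True)
subprocess.run(["geneBody_coverage.py", "-i", bam, "-r", bed, "-o", "qc/genebody"], check=True)
subprocess.run(["read_quality.py", "-i", bam, "-o", "qc/quality"], check=True)
subprocess.run(["junction_saturation.py", "-i", bam, "-r", bed, "-o", "qc/junction"], check=True)
```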


Figure 3: Example sample graphics from the RSeQC quality control report.

3a: clipping profile showing clipped read alignment.

3b: gene body coverage plot showing high average coverage over the transcriptome.

3c: heat map of Phred quality scores along the read, where blue represents low nucleotide density and red represents high nucleotide density.

3d: junction saturation plot, showing medium RNA-seq data saturation.

Benchmarking

A “truth set” of read quantities that could serve as a benchmark does not yet exist. Because ENCODE has been leading the effort toward uniform processing pipelines, we therefore benchmarked our workflow against the ENCODE RNA-seq workflow, which had also been implemented on DNAnexus and which uses a different version of STAR for alignment and RSEM for read quantification. To provide a more even comparison, we pared our own workflow (Figure 1) down to STAR for alignment and featureCounts for read quantification.

We found that the read quantification results correlated closely between our workflow and the ENCODE workflow, with a mean correlation coefficient of 0.895 and a minimum of 60,000 mapped reads (Figure 4). The workflow we developed was nearly twice as fast as the ENCODE workflow, with much of the gain coming from our implementation of featureCounts (Figure 5). As a result, we can be confident in both the results and the efficiency of our workflow.

However, we must note that RSEM and featureCounts are not identical programs: RSEM performs transcript and read quantification, whereas featureCounts performs read quantification only, meaning that RSEM can give users more information than what we used to perform this read quantification analysis. This may explain the differences in runtime between our workflow and that of ENCODE.


Figure 4: Correlation scatter plots between read counts generated by our workflow (x-axis) and read counts generated by the ENCODE workflow (y-axis). The four scatter plots represent the outputs from the samples BT474, SKBR3, SRR064286, and SRR064287.


Figure 5: Runtime comparison between the workflow we developed and the ENCODE workflow. Our workflow runs at least 2X faster than the ENCODE workflow, though this may be due to RSEM’s ability to perform isoform as well as read quantification.

Final Notes and Conclusion

DNAnexus has worked with many customers on RNA-seq workflows. Through this experience, we decided that developing a workflow to make publicly available and to universally recommend to customers would benefit the scientific community as a whole. We are committed to providing users with the most up-to-date tools and standards available, and this preliminary effort represents our first foray into providing best-practices workflows.

We generated the data presented in these benchmarks and figures using the SRA database (SRR064286 and SRR064287) as well as BT474 and SKBR3 samples.

The workflow is currently available upon request; please contact support@dnanexus.com to obtain access. In the future, we plan on releasing or updating all applications mentioned here on the DNAnexus app library and creating a Featured Project on the platform showcasing the workflow and giving users sample data on which they can run this workflow. We also plan on investigating other RNA-seq analysis tools such as Sailfish, Kallisto, and Salmon in the future.

This research was performed by Adeline Petersen as part of her internship with DNAnexus. The project was supervised by Samantha Zarate and assisted by Yih-Chii Hwang, Ph.D., Alpha Diallo, Ph.D., Naina Thangaraj, Steve Osazuwa, Brett Hannigan, Ph.D., and Andrew Carroll, Ph.D.

Battling Aligners: Benchmarking Tools on Mosaic with Mock Metagenomes

Blog Authors

The DNAnexus science internship has sought to mentor students, from undergraduates to PhD students, in bioinformatics since 2015. This project was performed by Diem-Trang Pham during her internship with DNAnexus under the supervision of Sam Westreich.

We are currently recruiting summer 2019 interns. If you are interested in interning with the DNAnexus science team, please apply here.

The Problem

Investigation of microbiomes, the different types of bacteria found living together in an environment, has grown increasingly popular in recent years. With decreasing costs of sequencing and a growing number of programs available for analyzing the resulting data, more researchers are turning to metagenomics – sequencing all the DNA extracted from all microbes within an environment – to examine the microbiome.

Although a number of different approaches for aligning the reads of a metagenome to a reference database have emerged over the last decade, it is tough to decide which program to use for an analysis. Each program claims to be the best option, but the authors often evaluate their tools on their own criteria and/or test datasets. An impartial, third-party, easily replicated comparison is needed to determine how different aligners perform; no such established, objective measure currently exists.

Several papers and groups have attempted third-party analyses comparing different metagenome tools. Groups like CAMI hold community challenges to compare different metagenome tools. However, even in these cases, researchers must trust that the analyses were performed correctly, and there is no opportunity to update the published results with new versions of tools, different tool parameters, or different reference datasets. Individuals can download the CAMI raw files for testing, but must use their own machines for processing – which can introduce even more variation. CAMI challenges and other benchmarking papers provide an important starting point, but they are difficult to replicate, tough to update, and of little use to researchers without access to their own computing cluster.

In order to allow for more easily replicated tests, we built new applications on the Mosaic data analytics platform.  Mosaic is a community-oriented platform, powered by DNAnexus, for microbiome data analysis that allows users to create custom third-party applications.  These applications run on AWS-hosted instances in the cloud, and can be run with a wide range of available hard drive space, RAM, and number of CPUs.  Because users write the application code that is executed, parameters can be changed and customized for any program, and run with various combinations of RAM and available CPUs.

To ensure we could accurately score the results of different alignment tools on our metagenomes, we synthetically generated FASTQ datasets mimicking the bacteria present in a real-life gut environment. We generated low-, medium-, and high-complexity metagenomes based on the most abundant bacteria found within the human gut, using a modified version of the program metaART. Using these synthetic metagenomes, we ran several different aligners: Bowtie2, BURST, CUSHAW3, MOSAIK, MINIMAP2, and the Burrows-Wheeler Aligner (BWA). The synthetic metagenomes were compared against two references – one built from their parent genomes, and one built from 952 genomes of NCBI-identified human-associated microbes.

Creating Test Data

Creating a Mock Microbial Community Metagenome

We decided to build mock microbial community metagenomes to simulate gut microbiome data, in order to make confident scoring assessments of the results. We chose organisms from a list of 57 of the most common human gut microbial species, originally published by Qin et al. (2010). Qin’s paper includes both a list of species and their average relative abundances within samples. We created low-, medium-, and high-complexity datasets by varying the number of species included in each mock metagenome:

  • The low-complexity metagenomes contain reads from the 19 most abundant species (top ⅓ of list);
  • The medium-complexity metagenomes contain reads from the 28 most abundant species (top ½ of list);
  • The high-complexity metagenomes contain reads from all 57 species on the list.

For each complexity level, synthetic metagenomes were produced with three different levels of quality, created by varying the insertion rate, deletion rate, and overall base quality:

Quality   Insertion rate        Deletion rate         Base quality
Low       0.00027 – 0.00045     0.00033 – 0.00069     10 – 20
Med       0.00018 – 0.00030     0.00022 – 0.00046     10 – 30
High      0.00009 – 0.00015     0.00011 – 0.00023     10 – 40

Source: Qin et al. (2010), https://www.nature.com/articles/nature08821

We used the program metaART to create simulated synthetic sequencing reads from our starting genomes.  metaART is a wrapper for the art_illumina binary, part of the ART simulation tool.  The original version of metaART was able to rapidly create simulated Illumina paired-end reads from a starting genome, but lacked the additional ability to provide custom parameters for base quality and indel rate.  We made modifications to the metaART code to allow for these additional parameters to be specified and provided different values to create low, medium, and high quality synthetic metagenomes. The modified version of metaART, named “magic-metaART”, is available as an application on the Mosaic platform.
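The sketch below illustrates the idea behind magic-metaART rather than its exact code: each genome’s read count is scaled by its relative abundance, and art_illumina’s -ir/-dr/-qL/-qU options set the indel rates and base-quality bounds. The genomes and abundances shown are hypothetical; the rate and quality values are the low-quality settings from the table above.

```python
import subprocess

# Hypothetical community: (genome FASTA, relative abundance) pairs.
community = [
    ("Bacteroides_vulgatus.fa", 0.12),
    ("Faecalibacterium_prausnitzii.fa", 0.08),
    ("Eubacterium_rectale.fa", 0.05),
]
total_pairs = 1_000_000   # read pairs in the finished metagenome

for fasta, abundance in community:
    n_pairs = int(total_pairs * abundance)   # scale reads by species abundance
    subprocess.run([
        "art_illumina",
        "-i", fasta, "-p",                   # paired-end reads from this genome
        "-l", "100",                         # read length
        "-c", str(n_pairs),                  # number of read pairs to simulate
        "-m", "400", "-s", "30",             # fragment size mean / sd (placeholders)
        "-ir", "0.00045", "-dr", "0.00069",  # insertion / deletion rates ("low" quality)
        "-qL", "10", "-qU", "20",            # base-quality bounds ("low" quality)
        "-o", fasta.replace(".fa", "_sim_"), # output prefix per genome
    ], check=True)
```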

Using magic-metaART, we generated eighteen simulated metagenome datasets – nine with 100-bp paired-end reads at varying (low, medium, high) levels of quality and/or complexity, and nine with 150-bp paired-end reads at varying levels of quality and/or complexity.  These simulated metagenome datasets are publicly available in Mosaic workspaces, along with the starting genomes used.

Scoring against Aligners

We aligned our simulated metagenomes of varying complexity, quality and read length against the databases of starting genomes using a variety of different alignment programs.  All of these programs have been created as applications on Mosaic and are publicly available. All programs were run with default parameters; for BURST, the cutoff threshold was set at 0.97.
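For illustration, representative default-parameter invocations for three of the aligners look like the following sketch; index and file names are placeholders:

```python
import subprocess

r1, r2 = "metagenome_1.fastq", "metagenome_2.fastq"   # simulated paired-end reads

# Bowtie2 against a prebuilt index, default (--sensitive) settings.
with open("bowtie2.sam", "w") as out:
    subprocess.run(["bowtie2", "-x", "known_bacteria", "-1", r1, "-2", r2],
                   stdout=out, check=True)

# BWA-MEM with default seeding and scoring parameters.
with open("bwa.sam", "w") as out:
    subprocess.run(["bwa", "mem", "known_bacteria.fa", r1, r2],
                   stdout=out, check=True)

# minimap2 with its short-read preset, otherwise defaults.
with open("minimap2.sam", "w") as out:
    subprocess.run(["minimap2", "-ax", "sr", "known_bacteria.fa", r1, r2],
                   stdout=out, check=True)
```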


Figure 1: 100 bp reads, low quality data, aligned against known (starting) bacterial genomes


Figure 2: 150 bp reads, low quality data, aligned against known (starting) bacterial genomes

For most of these tools, the accuracy when aligning against the known bacterial genomes (the genomes used to originally construct these metagenomes) was quite high.  We saw slightly better alignments when using 150 bp paired-end reads [Figure 2] when compared to 100 bp paired-end [Figure 1]. CUSHAW3 had the highest accuracy at all complexity levels, while BURST had the highest number of false negatives (reads that were not aligned to their genome of origin).

We also aligned both the 100 bp paired-end reads [Figure 3] and 150 bp paired-end reads [Figure 4] against a larger database, consisting of 952 complete genomes of known human-associated microbial species, derived from NCBI’s reference genomes collection.


Figure 3: 100 bp reads, low quality data, aligned against the NCBI 952 reference genomes


Figure 4: 150 bp reads, low quality data, aligned against the NCBI 952 reference genomes

For all tools except CUSHAW3, we observed approximately a 20-30% decrease in true positives when aligning against the larger NCBI database, with an accompanying increase in the number of false positives.  Because we considered a match to any organism beyond the exact species that provided the read to be a false positive, the introduction of closely related species is likely responsible for the decreased accuracy.  Once again, increased complexity led to slight decreases in the number of true positives and slight increases in the number of false positives.
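Scoring therefore reduces to comparing each alignment’s target sequence against the genome recorded in the simulated read’s name. A minimal sketch using pysam, assuming read names are prefixed with a source-genome identifier (the exact naming scheme in our datasets may differ):

```python
import pysam  # third-party; pip install pysam

def score_alignments(sam_path):
    """Tally true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for read in pysam.AlignmentFile(sam_path):
        if read.is_secondary or read.is_supplementary:
            continue                 # score each read once, by its primary alignment
        origin = read.query_name.split("-")[0]   # source genome encoded in the name
        if read.is_unmapped:
            fn += 1                  # read not aligned to anything
        elif read.reference_name.startswith(origin):
            tp += 1                  # aligned to its genome of origin
        else:
            fp += 1                  # aligned, but to a different species
    return tp, fp, fn

print(score_alignments("bowtie2.sam"))
```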

Overall, we saw few performance differences between low-quality and high-quality datasets.  We did note that longer reads (150 bp paired-end vs. 100 bp paired-end) increased the number of true positives.  We also observed that increases in complexity led to slight decreases in the number of true positives. Against both reference databases, CUSHAW3 correctly aligned the highest percentage of reads, while BURST had a higher number of false negatives than the other aligners.

Evaluating the Resource Requirements of Aligners

We observed the amount of time and memory needed to run each aligner for two steps: indexing the reference, and aligning metagenomic reads against the reference.  Building an index is often more intensive, due to the amount of data that must be read and processed, although this usually only needs to be done once. After a reference is indexed, it can be used for an unlimited number of searches.
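For reference, building an index is a single command for most of these tools; a sketch for three of them, with a placeholder reference name:

```python
import subprocess

ref = "ncbi_952_genomes.fa"   # concatenated reference FASTA (placeholder name)

subprocess.run(["bowtie2-build", ref, "ncbi_952"], check=True)        # Bowtie2 FM-index
subprocess.run(["bwa", "index", ref], check=True)                     # BWA index, written next to the FASTA
subprocess.run(["minimap2", "-d", "ncbi_952.mmi", ref], check=True)   # minimap2 minimizer index
```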


Figure 5: Memory (gigabytes) needed to build a searchable index from the 952 NCBI reference genomes.


Figure 6: Time (seconds) needed to build a searchable index from the 952 NCBI reference genomes.

Comparing the six aligners, we saw large differences in the memory and time required [Figures 5 & 6]. MOSAIK is not displayed in the graphs, as it failed to finish building an index before the instance timed out (>24 hours). BURST required the most memory to build an index, while BWA and CUSHAW3 both had very low memory requirements. Of the aligners that successfully built an index, Bowtie2 was much slower than the others, while MINIMAP2 was the fastest by a significant margin.


Figure 7: Time (minutes) needed to align low, medium, or high complexity simulated metagenomes (100bp, low quality reads) against the Known Bacteria reference.

When aligning the simulated metagenomes against each reference, we saw a linear increase in run time as the complexity of the metagenome increased [Figure 7], but no change in run time due to the quality of the simulated metagenome.  When aligning against the index of known bacteria used to generate the simulated metagenomes, MOSAIK required the longest time to align reads to the reference, while all other tools finished the alignments in under 200 minutes. MINIMAP2 performed the fastest of the 6 tools, finishing even the high-complexity metagenome in under two hours.

Relative Tool Performance versus Cost


Figure 8: F1 scores for each aligner (color) at each complexity level (shape), graphed versus cost (dollars) to run each program on the Mosaic platform.

To provide an “overall” assessment of the performance of the different aligners, we looked at the F1 score (the harmonic mean of precision and recall) for each tool compared to its cost to run on the Mosaic platform [Figure 8]. CUSHAW3 had the highest F1 score, but also proved to be the most expensive to run, due largely to its long run time on high-complexity simulated metagenomes. BURST had the lowest F1 score, lagging behind the other aligners due to an increased number of false negatives. MINIMAP2 proved to be the cheapest aligner to run, thanks to its speed and reasonable memory demands.
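For clarity, writing TP, FP, and FN for true positives, false positives, and false negatives, precision, recall, and the F1 score are defined as

\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R},
\]

so a high F1 score requires an aligner both to place most reads (high recall) and to place them on the correct genome (high precision).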

Abundance Profile Accuracy After Alignment

Although the percentage of aligned reads is a useful measure, the true question is how well the results of each aligner predict the actual abundance profile of the simulated metagenomes. In other words, how well do the calculated abundance profiles match the original abundances of the genomes used to create the simulated data?
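We quantify this agreement with cosine similarity: if a is the vector of true relative abundances and b is the abundance vector estimated from an aligner’s output, then

\[
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
           = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},
\]

which equals 1 when the two profiles are identical up to scale and falls toward 0 as they diverge.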


Figure 9: Cosine similarity for abundance profiles calculated from each aligner’s results, run against the known bacteria reference.

When we compare the estimated abundances against the original abundances for the results matched against the database of known bacteria [Figure 9], we see that the variance increases for all tools as complexity increases. This increase in variance leads to a decrease in cosine similarity for all tools; CUSHAW3 shows the least decrease, while MOSAIK’s results are the most dissimilar from the truth. Surprisingly, although BURST had the highest number of discarded reads, it still provides a fairly accurate estimate of the abundances of the different organisms within the simulated metagenome.


Figure 10: Cosine similarity for abundance profiles calculated from each aligner’s results, run against the NCBI bacteria reference.

When we examine the estimated abundances from alignment against the NCBI database of bacteria [Figure 10], we see an interesting story: MINIMAP2, BWA, and Bowtie2 all perform at a very similar level, while BURST and CUSHAW3 lag slightly behind. In contrast to the abundance estimates against solely the known bacteria, these results increase in accuracy as the complexity of the simulated metagenome increases; we hypothesize that some reads always map to incorrect annotations, so adding more diversity reduces their negative impact on the cosine similarity score.

Conclusions

Although it is extremely difficult to conclude that any one alignment tool is objectively better than another, we have shown that, through the use of Mosaic applications, we can evaluate the performance of multiple alignment tools on simulated metagenomic datasets. We created simulated metagenomic datasets of varying quality and complexity, modeled after the most abundant organisms found within the gut microbiome. We aligned these simulated metagenomes against two references of different sizes and observed the accuracy, memory requirements, and runtime of each program.

Overall, we observed the highest alignment accuracy from CUSHAW3, and the highest speed from MINIMAP2.  Bowtie2 and BWA both lagged in accuracy when compared to CUSHAW3, and ran slower than MINIMAP2 without offering any additional accuracy.  BURST had the highest number of false negatives in alignments.

When we examined how the number of annotations impacted estimated abundance, we found that, although BURST drops significantly more reads than the other aligners, these dropped reads do not cause an overly large decrease in abundance accuracy estimation compared to other tools.

In conclusion, we show that different aligners vary in accuracy, speed, and memory usage. We strongly encourage users developing an alignment-based pipeline for metagenomic data to test multiple aligners and compare the results to find the best balance of speed and accuracy.

Program and File Resources

Availability of Mock Metagenomes

The 100bp simulated gut metagenomes are available here.

The 150bp simulated gut metagenomes are available here.

All of the alignment tools are publicly available to run on Mosaic.

Tool versions: ART – 2.5.8; Bowtie2 – 2.3.4.1; BWA – 0.7.17; BURST – 0.99.7; MOSAIK – 2.2.26; CUSHAW3 – 3.0.3; SAMtools – 1.9; MINIMAP2 – 2.12

Reference

Qin, J., et al. (2010). A human gut microbial gene catalogue established by metagenomic sequencing. Nature, 464, 59–65. https://www.nature.com/articles/nature08821

DNAnexus at ASHG: Elevating Translational Informatics


We are gearing up for the annual American Society of Human Genetics (ASHG) meeting next week in San Diego, and are especially excited to debut our new Apollo™ platform for multi-omic and clinical data science exploration, analysis, and discovery. Apollo™ provides translational researchers with a scalable cloud environment, flexible data models, and intuitive analysis and visualization tools to simplify research workflows for R&D teams globally and dramatically improve the efficiency of research organizations.

Visit DNAnexus in booth 622 to learn more about how to leverage our new Apollo™ platform to inform decision making, save time, and maximize value at each step of the drug discovery process. Stop by our booth anytime during the conference, or email us to schedule a meeting with a member of our science team.

Lunchtime Talk:

Leveraging Translational Informatics for the Advancement of Drug Discovery & Improved Clinical Outcomes

Thursday, October 18th, 12:30pm – 1:45pm
San Diego Convention Center, Upper Level, Room 30E

Join us to learn how biopharma customer MedImmune and academic medical center Baylor College of Medicine’s Human Genome Sequencing Center are leveraging massive volumes of biomedical data to gain better insights into the biological, environmental, and behavioral factors that influence health.

Speakers:

  • David Fenstermacher, PhD, Vice President, R&D & Bioinformatics, MedImmune
  • Will Salerno, PhD, Director of Genome Informatics, Baylor College of Medicine Human Genome Sequencing Center
  • Brady Davis, VP Strategy, DNAnexus

Lunch will be provided; RSVP to reserve your spot!

Activities in DNAnexus Booth 622:

Visualization Hour – Come Explore GWAS data in 3-D, using virtual reality!

Human brains are wired for spatial reasoning, making virtual reality a potentially powerful way for scientists to achieve an intuitive understanding of data. To test VR on genomic data, we combined two iconic visualizations in genomics, the Manhattan plot and the circos plot, into a fully immersive data exploration experience called BigTop. Come explore the GWAS circus with us and learn about other exciting visualization projects at DNAnexus!

  • Wednesday, 10/17, 12-1pm
  • Thursday, 10/18, 12-1pm
  • Friday, 10/19, 12-1pm

Translational Informatics Hour – Elevating Translational Informatics

Come demo the new DNAnexus Apollo™ and hear how biopharma customer MedImmune is using it to inform decision making, save time, and maximize value at each step of the drug discovery process.

  • Wednesday, 10/17, 2-3pm
  • Thursday, 10/18, 2-3pm
  • Friday, 10/19, 2-3pm

Meet the xVantage Group

Come meet members of xVantage Group, a dedicated team with deep technical and scientific expertise, ready to create innovative and tailored solutions for customers and partners. From data ingestion to pipeline and software development, learn more about the broad range of services and customers xVantage is supporting.

  • Wednesday, 10/17, 10-11am
  • Thursday, 10/18, 10-11am

Customer Talks:

Wednesday, October 17th

  • 5:30 PM: Sequencing of whole genome, exome and transcriptome for pediatric precision oncology: Somatic variants and actionable findings from 253 patients enrolled in the Genomes for Kids study. Scott Newman, St. Jude Children’s Research Hospital. Session #25 – Integrated Variant Analysis in Cancer Genomics; Ballroom 20BC
  • 6:15 PM: Structural variation across human populations and families in more than 37,000 whole-genomes. Will Salerno, Human Genome Sequencing Center, Baylor College of Medicine. Session #33 – Characterization of Structural Variation in Population Controls and Disease; Ballroom 20A

Posters Featuring DNAnexus:

Wednesday, October 17th

2:00pm-3:00pm
  • PgmNr 1761: Variant identification from whole genome sequencing at the UPMC Genome Center. UPMC Genome Center

3:00pm-4:00pm
  • PgmNr 1998: Hi-C-based characterization of the landscape of physically interacting regions and interaction mechanisms across six human cell lines using HiPPIE2. University of Pennsylvania; DNAnexus
  • PgmNr 1554: A high-quality benchmark dataset of SV calls from multiple technologies. Illumina; Baylor College of Medicine
  • PgmNr 3186: Identification of novel structural variations affecting common and complex disease risks with >16,000 whole genome sequences from ARIC and HCHS/SOL. Human Genetics Center, University of Texas Health Science Center; Baylor College of Medicine HGSC; Albert Einstein College of Medicine; DNAnexus; Fred Hutchinson Cancer Research Center; University of Washington; Johns Hopkins University

Thursday, October 18th, 3:00pm-4:00pm
  • PgmNr 1222: Novel mutations of SCN9A gene in a patient with congenital insensitivity to pain identified by whole genome sequencing. Intermountain Healthcare
  • PgmNr 1648: How well can you detect structural variants: Towards a standard framework to benchmark human structural variation. NIST; NHGRI; Genome in a Bottle Consortium; Baylor College of Medicine HGSC; PacBio; Spiral Genetics; NCBI; BioNano Genomics; 10x Genomics; Max Planck Institute; USC; Boston University Medical School; DNAnexus; Joint Initiative for Metrology in Biology

Friday, October 19th, 2:00pm-3:00pm
  • PgmNr 1439: Are we close to constructing a fully diploid view of the human genome? DNAnexus
  • PgmNr 1505: How well can we create phased, diploid, human genomes? An assessment of FALCON-Unzip phasing using a human trio. DNAnexus
  • PgmNr 2732: A population genetics approach to discover genome-wide saturation of structural variants from 22,600 human genomes. Center for Integrative Bioinformatics Vienna, University of Vienna; Baylor College of Medicine HGSC