About the project
In the 1980s and 1990s big, buttery Chardonnay was the white wine to drink. A backlash eventually occurred and some drinkers began to ask for “ABC — anything but Chardonnay”.
Since then, the styles of Chardonnay available in Australia have broadened significantly and Chardonnay is making a deserved comeback. This is a testament to the great malleability of the variety and to the improved quality and diversity of available plant material. Plantings of this grape in Australia far outweigh those of any other white grape variety, in part because Chardonnay is an important component of sparkling wine.
As the world's fourth largest exporter of wine, Australia’s wine industry is very important to the nation’s economic health. With 18 per cent of all grape plantings in Australia being Chardonnay, the variety contributes significantly to the economy and viability of the wine industry.
The Australian Wine Research Institute (AWRI) in South Australia and Canada’s BC Genome Sciences Centre at the University of British Columbia are investigating the genetic make-up of the Chardonnay grape to help the industry better understand the variety.
“Unraveling the underlying genetic complexities of grapevine and how genetic variation shapes wine quality is critical. It can facilitate vine selection and enable the tailoring of wines by winemakers, allowing them to meet ever-changing consumer demands and access new markets,” said Dr Simon Schmidt, an AWRI Senior Research Scientist.
The trans-Pacific project team, led by the AWRI’s Dr Anthony Borneman and Dr Schmidt, is creating a de novo reference genome for Chardonnay. Chardonnay, like all wine grapes, is propagated from cuttings rather than grown from seed. This means that diversity within a variety comes from clonal selection rather than breeding. Having a high-quality reference genome for Chardonnay will enable the identification and mapping of genetic variation between commercially selected clones of this cultivar. A better understanding of the available genetic variation in Chardonnay, and in wine grapes more generally, will help with planting decisions and could potentially be used to identify clonal type in the field.
Chardonnay has an unusual genetic heritage, resulting from a cross, centuries ago in northeastern France, between Pinot Noir and Gouais Blanc. There are now many clones of this variety, exhibiting remarkable variations in fruit composition, flavour, aroma, colour, ripening time, flower morphology (leading to seedless grapes), bunch morphology, and yield.
Until very recently, only two de novo grapevine sequencing projects had been completed, both on the Pinot Noir variety. Recent advances in sequencing have made such projects far more achievable and have brought an order-of-magnitude improvement in the quality of the resulting assemblies.
Last month saw the release of a new de novo Cabernet Sauvignon genome assembly by a U.S. group, as well as the completion of the Chardonnay genome assembly described here, both of exceptional quality. The understanding of the relationship between grapevine genotype and phenotype is growing rapidly as a result of this work, and these new assemblies should greatly aid in understanding the molecular basis of grapevine inter-varietal and inter-clonal variation.
To crunch the numbers, the AWRI is using a QRIScloud large memory instance, which bioinformatician Dr Michael Roach described as “the workhorse for the project”. It is a virtual Ubuntu instance running 60 cores with 900 GB RAM and two storage volumes: an initial 2 TB volume and a larger 5 TB volume added later. The team is also using the Nectar-funded Genomics Virtual Laboratory (GVL) to test software and assemblies.
“De novo genome assemblies on this scale require an enormous amount of compute time and memory as the assembly involves comparing trillions of bases worth of sequencing reads and piecing them together,” said Dr Roach. “To make matters more difficult the genome we are assembling is diploid and highly heterozygous; assembling such genomes is sort of like trying to simultaneously do two very similar looking jigsaw puzzles with all the pieces jumbled together.”
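Dr Roach's jigsaw analogy can be illustrated with a toy sketch. The Python below is not the project's assembly software; the function names and the short reads are hypothetical, made up purely for illustration. It joins two reads when the end of one exactly matches the start of the other, then shows how a read from the second copy of the genome, differing by a single base, fails to fit. Real assemblers use error-tolerant alignment and far more sophisticated, haplotype-aware logic.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their exact overlap, or return None if they do not fit."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


# Hypothetical reads, invented for illustration only.
read_1 = "ACGTTGCA"
read_2 = "TGCAGGTC"      # continues read_1 on the same haplotype
read_2_alt = "TGCTGGTC"  # same region on the other haplotype, one base different

merged = merge(read_1, read_2)          # the pieces fit together
merged_alt = merge(read_1, read_2_alt)  # the near-identical piece does not fit

print(merged)      # ACGTTGCAGGTC
print(merged_alt)  # None
```

In a highly heterozygous diploid genome, an assembler constantly has to decide whether a near-match like the second one is a sequencing error or a piece of the other puzzle, which is part of why the task demands so much compute time and memory.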
As it lacked the in-house computing resources to complete a task of this scale, the AWRI turned to QRIScloud. Upgrading hardware might have done the trick, but doing so would have tied up the AWRI’s resources for weeks. “QRIScloud primarily allowed us to complete an assembly for ourselves without it impacting on our other bioinformatics projects,” said Dr Roach.
The AWRI found QRIScloud’s support team to be a tremendous help in tailoring the instance to the project’s needs, including preloading most of the required software, and increasing memory capacity relatively quickly when needed.
“Because it was a virtual instance made just for us we were given root access to install anything the system lacked. The resources were all assigned to the instance, which meant we didn’t have to worry about working with job queuing systems etc., (but could if we wanted to). All this meant that setting up and running the assembly was very easy and straightforward from our perspective, compared to how it would have been on a classical style shared high-performance computing cluster.
“Another huge benefit came later on when we ran into trouble with our first assembly attempt. An update to the program caused it to generate a lot more raw assembly data than in previous versions. We had also underestimated the volume of HDD [hard disk drive] space we would need for an assembly on this scale (getting accurate predictions was difficult). As such we promptly ran out of HDD space and had to abort the first assembly attempt. We asked if it were possible to get more HDD space on the instance and within a week we had a shiny new 5 TB volume, which allowed us to successfully complete the assembly.”
“If we didn’t have the GVL to test the software and assemblies then we would have had to spend the first few days or so on the [QRIScloud] large memory instance doing these tests etc., which is not ideal. The trial runs on the GVL were invaluable to giving us a much better idea of the sort of resources we would need for the large assembly and the timeframe for which we’d need them. This capability was especially useful given the ongoing developments in assembly software supporting the newer sequencing technologies.
“We only really used it [GVL] as a compute server and didn’t use the Galaxy or the SMRT analysis suites that much, but had we needed to do any routine bioinformatics tasks the tools were there ready for use. The nature of the virtual instance allowed us to have root access and hence the same freedom we have with our own server, which just made things easier. Our impressions are that the GVL is a very capable alternative to maintaining an in-house computing cluster.”
Completion of the Chardonnay reference genome primary assembly by the AWRI has been expedited through the use of QRIScloud. “Of course there remains the polishing and curation tasks,” said Dr Schmidt; “however, the combination of PacBio long-read sequencing, QRIScloud bioinformatics infrastructure and newly developed haplotype-aware assembly software has already provided a high-quality draft genome. By using this high-quality draft, we have been able to fast-track the mapping of Chardonnay clone sequence data which will bring a better understanding of structural and sequence variation in this clonally propagated woody plant species.” All of this should, hopefully, result in the wine industry producing exceptional Chardonnay vintages in the years to come.