Table 2

Roadblocks encountered during deployment of the Chan Zuckerberg ID (CZ ID) SARS-CoV-2 consensus genome pipeline on AKU's existing infrastructure, and the strategies used to counter them

| Roadblock | Description | Risk mitigation strategy | Links |
| --- | --- | --- | --- |
| Docker downloading issue | While downloading the essential files (reference, primer, Kraken and VADR) with Docker from the Amazon S3 server, the reference genome failed to download. | The issue was specific to the Linux environment; hg38.fa.gz was downloaded in a Windows environment and then transferred to the Linux workspace. | https://github.com/chanzuckerberg/idseq-workflows/issues/177 |
| Export mini-WDL cache folder | The essential input files for the CZ ID pipeline are downloaded and stored in the ‘mini-wdl_download_cache’ folder. This should be a one-time download, which took ~4 hours to complete. In our case, however, the files were re-downloaded at every run, overwriting the ‘mini-wdl_download_cache’ folder each time. | A query was posted on the CZ ID GitHub page. The issue was solved by adding parameters to the main mini-WDL run command. | https://github.com/chanzuckerberg/idseq-workflows/issues/179 |
| Creation of extraneous control characters | A folder named with six extraneous control characters was created, which aborted the pipeline mid-run. | A query was posted on the CZ ID GitHub page. The issue was solved by defining the path for storing cache files. The developers then updated their GitHub page accordingly. | https://github.com/chanzuckerberg/idseq-workflows/issues/142 |
| Updating CZ ID consensus genome workflow | The developers changed the main code, lowering the iVar consensus threshold from 0.9 to 0.75 (a base must reach 75% frequency to be called; otherwise the position is reported as a mixed site). | The CZ ID pipeline was updated locally to incorporate the new changes. | |
| CZ ID multi-sample run | Processing all samples at once to obtain a combined consensus genome file and a combined VCF file is currently not possible through the mini-WDL consensus genome workflow. | An in-house bash script was written to process multiple samples with a single command. | |
| Timestamp directory | The mini-WDL pipeline creates a timestamp-named consensus genome directory for every sample, in which all intermediate and final output folders are stored. Because samples were run in batches, the timestamp directories made it difficult to match output files to specific sample IDs. | The -d parameter of the mini-WDL run command, given a path ending in a trailing slash and dot, creates and uses a folder with the chosen name. In this way, each directory was named by sample ID instead of timestamp. | https://github.com/chanzuckerberg/czid-workflows/issues/33 |
  • AKU, Aga Khan University; CZ ID, Chan Zuckerberg ID; VADR, Viral Annotation DefineR; WDL, Workflow Description Language.
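
The last two rows of the table (multi-sample run and timestamp directory) can be illustrated together. The following is a minimal sketch in Python, not the in-house bash script itself, which is not reproduced here. It assumes the workflow is launched with `miniwdl run` (the tool the table calls mini-WDL) and that `-d <dir>/.` makes the run use the named directory directly, as described in the table. The workflow path, the input keys (`sample`, `fastqs_0`, `technology`), the sample IDs and the `results` output root are hypothetical placeholders.

```python
from pathlib import PurePosixPath
import shlex


def build_run_command(wdl_path, sample_id, inputs, out_root="results"):
    """Return the argument list for one per-sample workflow run."""
    # The trailing "/." asks the runner to use this directory itself rather
    # than creating a timestamp-named subdirectory inside it.
    run_dir = f"{PurePosixPath(out_root) / sample_id}/."
    cmd = ["miniwdl", "run", "-d", run_dir, str(wdl_path)]
    # Workflow inputs are passed as key=value pairs. The keys used by the
    # caller below are hypothetical placeholders, not confirmed input names.
    cmd += [f"{key}={value}" for key, value in inputs.items()]
    return cmd


def build_batch(wdl_path, samples, shared_inputs, out_root="results"):
    """Build one command per sample; `samples` maps sample ID to FASTQ path."""
    commands = {}
    for sample_id, fastq in samples.items():
        inputs = {"sample": sample_id, "fastqs_0": fastq, **shared_inputs}
        commands[sample_id] = build_run_command(
            wdl_path, sample_id, inputs, out_root
        )
    return commands


if __name__ == "__main__":
    # Hypothetical workflow path, sample IDs and read files.
    batch = build_batch(
        "consensus-genome/run.wdl",
        {"S001": "reads/S001_R1.fastq.gz", "S002": "reads/S002_R1.fastq.gz"},
        {"technology": "Illumina"},
    )
    for sample_id, cmd in batch.items():
        # Print rather than execute; pass `cmd` to subprocess.run to launch.
        print(shlex.join(cmd))
```

Because every run directory is derived from the sample ID, output files can be matched to samples by path alone. Printing the commands first also allows a batch to be checked before any run is started.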