CANCERSIGN: a user-friendly and robust tool for identification and classification of mutational signatures and patterns in cancer genomes

Title: CANCERSIGN: a user-friendly and robust tool for identification and classification of mutational signatures and patterns in cancer genomes
Publication Type: Journal Article
Year of Publication: 2020
Authors: Bayati, M., H. R. Rabiee, and H. Alinejad-Rokny
Journal: Scientific Reports
Date Published: 01/2020
Type of Article: Open Access
Accession Number: 1890
Abstract: Analysis of cancer mutational signatures has been instrumental in identifying the responsible endogenous and exogenous molecular processes in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly interface for analysis of cancer mutational signatures is necessary. In this manuscript we introduce CANCERSIGN, which enables users to identify 3-mer and 5-mer mutational signatures within whole genome, whole exome or pooled samples. Additionally, this tool enables users to cluster tumor samples based on the proportion of mutational signatures in each sample. Using CANCERSIGN, we analysed all the whole genome somatic mutation datasets profiled by the International Cancer Genome Consortium (ICGC) and identified a number of novel signatures. By examining signatures found in exonic and non-exonic regions of the genome using WGS, and comparing these to signatures found in WES data, we observe that WGS can identify additional non-exonic signatures that are enriched in the non-coding regions of the genome, while the deeper sequencing of WES may help identify weak signatures that are otherwise missed in shallower WGS data.
Full Text

Introduction

Aberrant somatic changes in DNA resulting from endogenous sources (e.g. APOBEC-induced mutagenesis and DNA repair defects) and exogenous factors (e.g. tobacco smoking and UV radiation) are a hallmark of cancer. These alterations in DNA may take different forms, ranging from gross chromosomal rearrangements to single base substitutions [1]. Whole genome sequencing of tumor cells has shown that the number of mutations varies from fewer than one hundred per genome to hundreds of thousands, depending on the cancer type and patient. Moreover, the type and sequence context of many cancer mutations are not random. For instance, the C-to-T mutation within the CG (a.k.a. CpG) dinucleotide is prevalent in cancer and, because its abundance is proportional to the patient's age, it is referred to as an “aging” signature [2]. Many cancers also have a large number of C-to-T and C-to-G mutations within TCA and TCT trinucleotides [3]. These mutations are attributed to aberrant changes in the level and activity of APOBEC enzymes. The mutational landscape of each cancer genome is thus the cumulative result of multiple mutational signatures, each caused by a unique process such as methylation, APOBEC-mediated changes, etc. [1].

Typically, signatures of mutational processes are determined by considering the trinucleotide context of single base substitutions. If all mutations are presented based on changes in the same DNA strand, there are 96 possible types of mutations within trinucleotide motifs (6 substitution classes × 4 possible 5′ bases × 4 possible 3′ bases) [4]. In 2013, Alexandrov et al. proposed a mathematical framework for analysing mutational signatures [4] based on these 96 types of mutations. Using a matrix factorization algorithm, the authors uncovered 30 independent mutational signatures. They have recently updated the cancer mutational signature profiles by identifying 67 single base substitution mutational signatures [5].
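The count of 96 mutation types can be checked by enumerating them. The following is a minimal sketch, not code from CANCERSIGN: the function names `mutation_types` and `normalize` and the `T[C>T]A` label format are illustrative assumptions, and the sketch assumes the common convention of reporting the mutated base on the pyrimidine strand (C or T). The `flank` parameter is likewise hypothetical and only makes the context width explicit.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_types(flank=1):
    """List every substitution type with `flank` bases on each side.

    Assumption: the mutated base is reported on the pyrimidine strand,
    giving 6 substitution classes: C>A, C>G, C>T, T>A, T>C, T>G.
    """
    types = []
    for ref in "CT":
        for alt in BASES:
            if alt == ref:
                continue
            for left in product(BASES, repeat=flank):
                for right in product(BASES, repeat=flank):
                    types.append(f"{''.join(left)}[{ref}>{alt}]{''.join(right)}")
    return types


def normalize(context, alt):
    """Map a substitution onto the pyrimidine strand and return its label.

    `context` is the reference sequence centred on the mutated base
    (odd length); `alt` is the mutant base on the same strand.
    """
    mid = len(context) // 2
    if context[mid] in "AG":  # purine reference: switch to the reverse complement
        context = "".join(COMPLEMENT[b] for b in reversed(context))
        alt = COMPLEMENT[alt]
    return f"{context[:mid]}[{context[mid]}>{alt}]{context[mid + 1:]}"


trinucleotide_types = mutation_types(flank=1)
print(len(trinucleotide_types))  # 6 classes x 4 x 4 = 96
print(normalize("TGA", "A"))     # G>A in TGA is reported as T[C>T]A
```

Folding purine-reference mutations onto the opposite strand is what keeps the catalogue at 96 categories rather than 192.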
Details of these signatures, including their prevalence in each cancer type and potential etiology, are available at the COSMIC database [6].

The discovery of mutational signatures was a breakthrough in the field of cancer research. The mathematical framework developed by Alexandrov et al. [4] is therefore now routinely used to identify novel mutational signatures and to study the processes involved in different cancers and in different patients. To help the progress of this field, we have developed a computational tool, CANCERSIGN, which enables users to easily apply a matrix factorization analysis to cancer mutation datasets and receive a complete set of mutational signatures. Compared to the previously developed packages in R [7,8,9], CANCERSIGN is unique in that it is a stand-alone package (i.e. it does not require additional software programming). Therefore, no programming skills are required to use it. Additionally, it enables users to perform de novo mutational signature analyses. Application of CANCERSIGN is not limited to extracting mutational signatures based on the nucleotides immediately flanking the mutated site (i.e. tri-nucleotide motifs). It allows users to extend the analysis to two bases on each side of the mutated base (i.e. penta-nucleotide motifs). According to a recent study [10], taking larger sequence contexts into consideration provides greater power to explain variability in genomic substitution probabilities. In addition, CANCERSIGN allows the user to select trinucleotides of interest and determine their penta-nucleotide mutational signatures. Furthermore, it has a built-in clustering option to study the groupings of cancer samples based on the raw mutation counts and/or the composition of mutational signatures.

In this manuscript, we introduce CANCERSIGN and show the new mutational signatures obtained from a de novo analysis of the whole genome ICGC dataset.
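To make the matrix factorization step concrete, here is a minimal sketch of non-negative matrix factorization (NMF), the factorization underlying the Alexandrov et al. framework. It is not CANCERSIGN's implementation: the multiplicative-update solver, the function names, and the tiny 4-type by 3-sample count matrix are all assumptions chosen for brevity. The matrix `V` (mutation types × samples) is approximated as `W @ H`, where the columns of `W` are signatures and `H` holds each signature's contribution to each sample.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W*H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr)) ** 0.5


def nmf(V, k, iterations=200, seed=0, eps=1e-12):
    """Factorize V (types x samples) into W (types x k) and H (k x samples)."""
    rng = random.Random(seed)
    n_types, n_samples = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_types)]
    H = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    # Rescale so each signature (column of W) sums to 1; H absorbs the scale.
    col_sums = [sum(W[i][j] for i in range(n_types)) for j in range(k)]
    W = [[W[i][j] / col_sums[j] for j in range(k)] for i in range(n_types)]
    H = [[H[j][s] * col_sums[j] for s in range(n_samples)] for j in range(k)]
    return W, H


def signature_proportions(H):
    """Per-sample share of each signature (every column sums to 1)."""
    totals = [sum(col) for col in zip(*H)]
    return [[h / t for h, t in zip(row, totals)] for row in H]


# Hypothetical toy counts: 4 mutation types (rows) x 3 samples (columns).
V = [
    [70, 0, 35],
    [30, 0, 15],
    [0, 40, 25],
    [0, 40, 25],
]
W, H = nmf(V, k=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 3))
print("per-sample proportions:", signature_proportions(H))
```

The per-sample proportions returned by `signature_proportions` are the kind of input a clustering step on signature composition would consume. In a real analysis `V` would have 96 rows for tri-nucleotide motifs or 1536 rows (6 × 16 × 16) for penta-nucleotide motifs.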
This analysis was performed for each cancer type separately and resulted in 77 mutational signatures. Each of the obtained signatures was shown to be highly similar to at least one of the 67 signatures discovered recently by Alexandrov et al. [5], except for two signatures, which can potentially be considered novel.
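A comparison of this kind can be sketched as follows. The text above does not state which similarity measure was used, so this sketch assumes cosine similarity, a common choice for comparing signature profiles. The 0.8 cutoff, the function names, and the 4-channel toy vectors (`REF_A`, `REF_B`, `sig_1` to `sig_3`) are hypothetical and are not taken from the paper or from COSMIC.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(signature, reference):
    """Return (name, similarity) of the closest reference signature."""
    return max(
        ((name, cosine_similarity(signature, ref)) for name, ref in reference.items()),
        key=lambda pair: pair[1],
    )


def flag_novel(extracted, reference, threshold=0.8):
    """Names of extracted signatures whose best match falls below `threshold`.

    The threshold is an illustrative assumption, not a published cutoff.
    """
    return [
        name
        for name, signature in extracted.items()
        if best_match(signature, reference)[1] < threshold
    ]


# Hypothetical 4-channel profiles; real signatures would have 96 channels.
reference = {
    "REF_A": [0.7, 0.3, 0.0, 0.0],
    "REF_B": [0.0, 0.0, 0.5, 0.5],
}
extracted = {
    "sig_1": [0.68, 0.32, 0.0, 0.0],
    "sig_2": [0.0, 0.05, 0.45, 0.5],
    "sig_3": [0.25, 0.25, 0.25, 0.25],
}

for name, signature in extracted.items():
    print(name, best_match(signature, reference))
print("candidate novel signatures:", flag_novel(extracted, reference))
```

A signature with no close counterpart in the reference set is only a candidate. Whether it reflects a distinct mutational process still requires biological validation.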