2022. “CapsProm: A capsule network for promoter prediction.” Computers in Biology and Medicine, Pp. 105627.
Locating the promoter region in DNA sequences is of paramount importance in bioinformatics. This problem has been widely studied in the literature, but it has not yet been fully resolved. Some researchers have shown remarkable results using convolutional networks that automatically extract features from a DNA sequence. However, no single architecture has yet been shown to learn the promoter prediction task competitively for several organisms. Thus, researchers must seek a new architecture, either by hand or through Neural Architecture Search, for each newly evaluated organism dataset. This work proposes a versatile architecture based on a capsule network that can accurately identify promoter sequences in raw DNA data from five different organisms, both eukaryotic and prokaryotic. Our architecture, CapsProm, could help create models that learn the promoter identification task across different datasets with minimal effort. Furthermore, CapsProm showed competitive results, outperforming the baseline method in F1-score on five of the seven tested datasets. The models and source code are available at https://github.com/lauromoraes/CapsNet-promoter.
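As a rough illustration of two ideas the abstract relies on, the sketch below one-hot encodes a raw DNA string and applies the "squash" nonlinearity from the original capsule network formulation. It is not taken from the CapsProm repository: the `ACGT` ordering, the all-zero row for unknown bases, and the function names `one_hot` and `squash` are assumptions made for this example, and the abstract does not state which input encoding or capsule layer CapsProm actually uses.

```python
import math

# Hypothetical nucleotide ordering; the abstract does not specify an encoding.
ALPHABET = "ACGT"

def one_hot(sequence):
    """Encode a DNA string as a list of 4-element one-hot rows.

    Symbols outside ALPHABET (for example 'N') become an all-zero row.
    This handling is an assumption, not CapsProm's documented behaviour.
    """
    rows = []
    for base in sequence.upper():
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows

def squash(vector, eps=1e-9):
    """Capsule 'squash' nonlinearity.

    Keeps the direction of the vector and maps its length into [0, 1),
    so the length can be read as a presence probability.
    """
    sq_norm = sum(v * v for v in vector)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * v for v in vector]

encoded = one_hot("ACGTN")          # 5 rows, last one all zeros
capsule = squash([3.0, 4.0])        # input length 5 is squashed to 25/26
length = math.sqrt(sum(v * v for v in capsule))
```

In a capsule network, the squashed length of an output capsule is typically read as the probability that the corresponding class (here, promoter or non-promoter) is present.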