Abstract:
The accurate identification of promoter regions in DNA sequences holds significant importance in the field of bioinformatics. While this problem has garnered substantial attention in the literature, it remains unresolved. Several researchers have achieved notable outcomes by employing diverse machine-learning techniques to predict promoter regions. However, only a few have thoroughly explored the utilization of features derived from the physicochemical properties of DNA across various organism types. This study investigates the advantages of incorporating these features in the training of machine-learning models. The research evaluates and compares the performance of multiple metrics on diverse datasets encompassing both prokaryotic and eukaryotic organisms. The state-of-the-art CNNProm method is employed as the baseline for our experiments. The models and source code associated with this study can be accessed at the following URL of the project's repository:
https://anonymous.4open.science/r/bracis-paper-1458/.