TY - GEN
T1 - Data Fusion Analysis for Determining Localization of Proteins Associated to Escherichia coli
AU - Orjuela-Canon, Alvaro David
AU - Rodriguez Burbano, Diana C.
AU - Perdomo, Oscar
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/7/27
Y1 - 2022/7/27
AB - In recent years, interest in protein analysis based on biomolecular features has grown rapidly. This has led to exploring machine learning models as an alternative for addressing the problems associated with these analyses. Support vector machines, artificial neural networks, and random forests were compared for the prediction of protein localization. Two main sources of data were used to train the models to determine the localization sites of the protein: information from the targeting signal and information from the protein sequences. A third scenario fused both sources of data. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forest and support vector machines.
UR - http://www.scopus.com/inward/record.url?scp=85141436488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141436488&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/c31d5447-f255-3143-9639-625e98bb937a/
U2 - 10.1109/colcaci56938.2022.9905354
DO - 10.1109/colcaci56938.2022.9905354
M3 - Conference contribution
AN - SCOPUS:85141436488
SN - 9781665474702
T3 - 2022 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI)
BT - 2022 IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022 - Proceedings
A2 - Orjuela-Canon, Alvaro David
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022
Y2 - 27 July 2022 through 29 July 2022
ER -