Breast Cancer Detection Using Ensemble Classifiers for Accuracy Improvement
Engineering Management and Soft Computing
Article 6, Volume 8, Issue 2 - Serial Number 15, Mehr 1401 (September-October 2022), Pages 92-109
Authors
Mahboubeh Shamsi* 1; Mohadaseh Karimian 2; Marziyeh Karimian 3
1 Assistant Professor, Faculty of Electrical and Computer Engineering, Qom University of Technology, Qom, Iran. Email: shamsi@qut.ac.ir
2 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian90@gmail.com
3 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian64@gmail.com
Abstract
Early detection of breast cancer plays a key role in patient treatment. Data mining algorithms can now provide intelligent methods for the health system that detect breast cancer with high accuracy. The aim of this study is to detect breast cancer using ensemble classifiers on preprocessed versions of the WBC and WDBC datasets. On the WBC dataset, our proposed model (feature reduction with CFS + sample optimization with the Resample method + an ensemble classifier combining KStar, random forest, Bayesian network, and naive Bayes) achieves the best detection accuracy (100%), with an implementation time of 0 seconds and no error. On the WDBC dataset, the proposed model (feature reduction with CFS + sample optimization with the Resample method + an ensemble classifier combining the IBK algorithm, Bayesian network, naive Bayes, and KStar) achieves an accuracy of 99.29%, an implementation time of 0 seconds, and a mean absolute error of 0.007. The results show that ensemble classification methods applied to a preprocessed dataset can be used to design new systems that assist physicians and facilitate diagnostic and treatment processes.
Keywords
Feature selection; Accuracy improvement; Data mining; Ensemble classifiers; Sampling
Article Title [English]
Breast Cancer Detection Using Ensemble Classifiers for Accuracy Improvement
Authors [English]
Mahboubeh Shamsi 1; Mohadaseh Karimian 2; Marziyeh Karimian 3
1 Assistant Professor, Faculty of Electrical and Computer Engineering, Qom University of Technology, Qom, Iran. Email: shamsi@qut.ac.ir
2 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian90@gmail.com
3 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian64@gmail.com
Abstract [English]
Early diagnosis of breast cancer plays a crucial role in treating the patient. Data mining algorithms can now provide intelligent methods for the health care system that detect breast cancer accurately. The purpose of this study is to detect breast cancer using ensemble classifiers on the prepared WBC and WDBC databases. On the WBC database, our proposed model (feature reduction with CFS + sample optimization with Resample + an ensemble classifier combining KStar, random forest, naive Bayes, and Bayes network) has the best detection accuracy (100%), an implementation time of 0 seconds, and no errors. On the WDBC database, the proposed model (feature reduction with CFS + sample optimization with Resample + an ensemble classifier combining the IBK algorithm, naive Bayes, Bayes network, and KStar) has an accuracy of 99.29%, an implementation time of 0 seconds, and a mean absolute error of 0.007. The results of this study show that ensemble classifier methods applied to the prepared databases can be used to design new systems that help physicians and facilitate diagnostic and treatment processes.
Keywords [English]
Accuracy Improvement, Data Mining, Ensemble Classifiers, Feature Selection, Sampling
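The ensemble step described in the abstract can be illustrated with a minimal sketch. It assumes soft voting, that is, averaging the class probabilities of the base classifiers; the abstract does not state which combination rule the authors used. The three base classifiers below are hypothetical threshold stubs standing in for learners such as KStar or naive Bayes, and the samples, labels, and function names are illustrative only, not data or code from the study. The mean absolute error is computed here as the average absolute difference between predicted class probabilities and the 0/1 class indicators, one common definition that may differ from the one behind the reported 0.007.

```python
from typing import Callable, Dict, List, Sequence

Sample = Sequence[float]
# A base classifier maps a feature vector to class probabilities.
Classifier = Callable[[Sample], Dict[str, float]]


def average_vote(classifiers: List[Classifier], x: Sample) -> Dict[str, float]:
    """Average the class-probability estimates of all base classifiers."""
    totals: Dict[str, float] = {}
    for clf in classifiers:
        for label, p in clf(x).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(classifiers) for label, total in totals.items()}


def predict(classifiers: List[Classifier], x: Sample) -> str:
    """Return the class with the highest averaged probability."""
    probs = average_vote(classifiers, x)
    return max(probs, key=probs.get)


def accuracy(classifiers: List[Classifier], samples: List[Sample], labels: List[str]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    hits = sum(predict(classifiers, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


def mean_absolute_error(classifiers: List[Classifier], samples: List[Sample], labels: List[str]) -> float:
    """Mean |predicted probability - 0/1 indicator| over all samples and classes."""
    total, count = 0.0, 0
    for x, y in zip(samples, labels):
        for label, p in average_vote(classifiers, x).items():
            total += abs(p - (1.0 if label == y else 0.0))
            count += 1
    return total / count


# Hypothetical stand-ins for trained base learners (assumption, not from the paper).
def stub_a(x: Sample) -> Dict[str, float]:
    p = 0.9 if x[0] > 5 else 0.2
    return {"malignant": p, "benign": 1 - p}


def stub_b(x: Sample) -> Dict[str, float]:
    p = 0.8 if x[1] > 4 else 0.1
    return {"malignant": p, "benign": 1 - p}


def stub_c(x: Sample) -> Dict[str, float]:
    p = 0.7 if x[0] + x[1] > 8 else 0.3
    return {"malignant": p, "benign": 1 - p}


ensemble = [stub_a, stub_b, stub_c]
# Toy two-feature samples with made-up values, for illustration only.
samples = [(8, 7), (2, 1), (6, 3), (1, 2)]
labels = ["malignant", "benign", "malignant", "benign"]

print("accuracy:", accuracy(ensemble, samples, labels))
print("MAE:", mean_absolute_error(ensemble, samples, labels))
```

In this toy data the third sample is misjudged by one stub, yet the averaged vote still assigns it to the correct class, which is the effect an ensemble is meant to provide. The feature selection (CFS) and Resample steps named in the abstract are not reproduced here.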