Abstract |
Mortality among children under five years of age remains a pressing global health challenge. This study aims to develop a machine learning model to predict under-five mortality in South Africa and to identify its key determinants. Data from the 2016 South Africa Demographic and Health Survey were used to find the model that best predicts under-five mortality. A chi-square test and analysis of variance were used for feature selection, and the synthetic minority oversampling technique (SMOTE) was used to address class imbalance. The models were evaluated on multiple evaluation metrics, and the best-performing models were used to determine the key predictors of child mortality. Among the models tested, random forest, XGBoost and logistic regression performed best. Breastfeeding status and the number of children under five years in the household were identified as the most important predictors. Other influential variables were being a twin, the total number of children born to the mother, and access to clean drinking water. The results show the potential of machine learning models to predict under-five mortality and identify key risk factors. They point to the need for targeted policy interventions that promote breastfeeding, improve access to basic services, and support larger families with more children under the age of five in the household. The results provide policymakers with insights for designing strategies that will help the country achieve Sustainable Development Goal 3. |