Reflections on Feature Engineering and Design Using Causal Machine Learning (CML) for African Swine Fever (ASF) Diagnosis

Authors

  • Steven Lububu
  • Boniface Kabaso

Keywords:

Feature Engineering, Causal Machine Learning (CML), Accuracy

Abstract

Feature engineering is a crucial step in machine learning, in which raw data is transformed into features that effectively represent the underlying patterns and relationships in the data. The goal is to improve the performance of machine learning models by providing them with more informative input features. Automated feature engineering techniques, such as genetic algorithms, can also be used to generate and optimise features automatically. These methods search a space of potential features and select or create features based on their impact on the model's performance. Feature engineering thus enables models to exploit the most relevant and informative aspects of the data, thereby improving their accuracy, robustness, and interpretability. This paper reports empirical studies aimed at demonstrating which types of engineered features are best suited to establishing relationships between ASF viruses and clinical symptoms in order to diagnose ASF accurately. Various machine learning models, such as neural networks, decision trees, random forests, linear regression, and Bayesian regression, accept ASF features and provide predictions. The experiment demonstrates the extent to which each machine learning model can establish correlations between ASF viruses and clinical symptoms by independently analysing the required features. Data for the study was collected from the European Union Reference Laboratory for African swine fever (ASF). This paper provides essential information on ASF datasets, based on the interpretation of results obtained using appropriate samples and validated tests, in combination with laboratory information on ASF epidemiology, scenarios, clinical signs, and lesions caused by strains of differing virulence.
The study proposes using causal ML to establish relationships between ASF viruses and symptoms in order to improve the accuracy of ASF diagnosis. The performance and validation of the models were measured using R-squared, mean absolute error (MAE), and mean squared error (MSE).
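
For readers less familiar with the three metrics named above, the following is a minimal sketch of how R-squared, MAE, and MSE are conventionally computed from observed and predicted values. It is illustrative only and is not the authors' evaluation code; the function name `regression_metrics` and the sample values are assumptions made for this example.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R-squared, MAE, and MSE for paired observations and predictions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Mean squared error: average squared residual (penalises large errors more).
    mse = sum(e * e for e in errors) / n

    # R-squared: 1 minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "mse": mse}


# Made-up values, used only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

Lower MAE and MSE indicate smaller prediction errors, while an R-squared closer to 1 indicates that the model explains more of the variance in the target.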

https://doi.org/10.59200/ICARTI.2023.004

Published

2023-12-10