A Convolutional Neural Network Model for Setswana Named Entity Recognition

Authors

  • Shumile Chabalala
  • Pius Owolawi
  • Sunday Ojo

Keywords:

Natural Language Processing, Named Entity Recognition, Convolutional Neural Network, Setswana

Abstract

Named entity recognition (NER) is a core task in natural language processing (NLP). Since the 2000s, neural networks have been used to represent language, improving entity recognition results. Neural networks, and convolutional neural networks (CNNs) in particular, have not previously been applied to Setswana. CNNs have recently been used to address a range of NLP problems with promising results. They are widely used in NLP because they are easy to train and perform strongly on sequence-labelling tasks, and their convolutional filters capture dependencies among neighbouring words. Given the difficulty of recognising named entities in South African languages such as Setswana, and the scarcity of resources for these languages, this research proposes a CNN model for Setswana NER. The results are benchmarked against traditional methods such as conditional random fields (CRFs), and performance metrics such as the F1-score are used to assess the reliability of the proposed model. The model is evaluated on the Setswana NER dataset from the South African Centre for Digital Language Resources. On the test data, the proposed model achieves an F1-score of 94.0%, compared with 78.0% for the existing CRF model.
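The results are reported as F1-scores. As a hedged illustration of how such a score is commonly computed for NER, the sketch below derives entity-level precision, recall and F1 from BIO tag sequences, counting an entity as correct only when both its type and its boundaries match exactly, in the style of the CoNLL shared-task evaluation. The BIO tagging scheme, the tag names (PER, LOC, ORG) and the helper names `bio_to_spans` and `entity_f1` are assumptions for illustration only; the exact evaluation protocol used for the Setswana dataset (entity-level or token-level scoring, micro or macro averaging) is not stated here.

```python
from typing import List, Set, Tuple

# Assumed span representation: (entity type, start index, end index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags (assumed "O", "B-X", "I-X") into typed spans."""
    spans: Set[Span] = set()
    etype, start = None, None
    # A sentinel "O" at the end flushes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag of the same type as the open entity simply extends it.
        if tag.startswith("I-") and tag[2:] == etype:
            continue
        # Any other tag closes the currently open entity, if there is one.
        if etype is not None:
            spans.add((etype, start, i))
            etype, start = None, None
        # B- tags (and, leniently, stray I- tags) open a new entity.
        if tag != "O":
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("each gold/pred sentence pair must have equal length")
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)   # exact type and boundary match
        fp += len(pred_spans - gold_spans)   # predicted entities not in gold
        fn += len(gold_spans - pred_spans)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tags for one four-token sentence: the person is found,
    # but the location is mislabelled as an organisation.
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(entity_f1(gold, pred))  # → (0.5, 0.5, 0.5)
```

Pooling true positives, false positives and false negatives across all sentences before dividing (micro-averaging) keeps frequent entity types from being underweighted; a per-type (macro) average would instead compute F1 separately for each type and average the results.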

https://doi.org/10.59200/ICARTI.2023.024

Published

2023-12-10