
1 INTRODUCTION
COVID-19, an infection of the upper respiratory tract and lungs, first appeared in Wuhan, China, in late 2019. It is transmitted mainly through the airways and by contact, and it can cause severe damage to the lungs of infected patients. The virus has since spread rapidly and has become a global pandemic. The number of cases and related deaths continues to increase every day.1 One of the major steps in preventing the spread of COVID-19 is to screen people suspected of being infected effectively. In this context, any technological tool that allows fast and accurate screening for COVID-19 infection is useful to health professionals. X-ray imaging of the lungs is one of the most widely used and accessible modalities for rapid examination of lung conditions.2 Chest x-ray images can be analyzed quickly by radiologists and have proved useful in monitoring the effects of COVID-19 on lung tissue. Therefore, chest x-ray images can be used to diagnose COVID-19. Chest radiography is also an effective tool for triaging non-COVID-19 patients with pneumonia so that hospital resources are allocated efficiently. However, x-ray images of COVID-19 share many features with those of pneumonia caused by other viral infections, such as common influenza.3, 4 These shared findings include ground-glass opacities, air space consolidation, bronchovascular thickening, vascular thickening, and bronchial wall thickening.5 This similarity makes it difficult for radiologists to diagnose COVID-19 cases. A reliable method for classifying COVID-19 and non-COVID-19 chest x-ray images can accelerate the triage of non-COVID-19 cases and free more hospital resources for COVID-19 cases.
In machine learning (ML), real-world data (such as chest x-ray images) are given to models in the form of feature vectors, which are extracted from the raw data. Feature engineering is the process of extracting relevant features from data for an ML model.6 The number of features is as important as the features themselves. If the feature vectors representing the real-world data do not contain enough features, the model will not be able to perform its main task. If a feature vector contains more features than necessary, or unrelated features, the model will still not produce accurate results. In the ML process, not only the model but also the features that represent the data are selected. Well-selected features facilitate subsequent modeling steps and increase the resulting model’s ability to complete the desired task. If the features are not selected properly, however, a much more complex model may be required to achieve the same level of performance. Thus, in this study we examine the performance of feature extraction techniques and feature vectors for chest x-ray images.
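The mapping from a raw image to a fixed-length feature vector can be illustrated with a minimal sketch. The function name `handcrafted_features`, the choice of an intensity histogram plus two global statistics, and the synthetic image are assumptions made for illustration only; they are not the features used in this study.

```python
import numpy as np

def handcrafted_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Map a grayscale image (2-D uint8 array) to a fixed-length feature vector."""
    # Normalized intensity histogram: `bins` values that sum to 1.
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    hist = hist / image.size
    # Two global statistics, scaled to the range [0, 1].
    stats = np.array([image.mean() / 255.0, image.std() / 255.0])
    return np.concatenate([hist, stats])

# Synthetic stand-in for a chest x-ray; a real image would be loaded from disk.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

features = handcrafted_features(image, bins=8)
print(features.shape)  # (10,)
```

Changing `bins` changes the length of the vector, which is one simple way to see the trade-off described above: too few features may not separate the classes, while too many add dimensions that the model must learn to ignore.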
- In this study, a general framework (HANDEFU) that can be used in different computer vision and ML problems is developed.
- This original framework supports handcrafted, deep, and fusion-based feature extraction techniques for feature engineering.
- Any feature extraction technique or model can be added dynamically to the software library at a later time upon request.
- This framework is utilized for diagnosing COVID-19 from chest x-ray images.
- The user can select a feature extraction method and a classifier to train the model. All performance evaluations on test data are then performed with this software.
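One common way to support extractors that are added later and then fused is a name-based registry with fusion by concatenation. The sketch below illustrates that pattern under stated assumptions: the names `EXTRACTORS`, `register`, and `fuse` are hypothetical, the two toy extractors stand in for real handcrafted or deep extractors, and nothing here is taken from the HANDEFU source code.

```python
from typing import Callable, Dict, List, Sequence

Image = Sequence[Sequence[float]]           # grayscale image as rows of pixel values
Extractor = Callable[[Image], List[float]]

EXTRACTORS: Dict[str, Extractor] = {}       # hypothetical registry, filled at run time

def register(name: str) -> Callable[[Extractor], Extractor]:
    """Decorator that adds an extractor to the registry under `name`."""
    def wrap(fn: Extractor) -> Extractor:
        EXTRACTORS[name] = fn
        return fn
    return wrap

@register("mean_intensity")
def mean_intensity(image: Image) -> List[float]:
    # Toy stand-in for a handcrafted extractor: one global statistic.
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]

@register("row_means")
def row_means(image: Image) -> List[float]:
    # Toy stand-in for a second extractor: one value per image row.
    return [sum(row) / len(row) for row in image]

def fuse(image: Image, names: Sequence[str]) -> List[float]:
    """Fusion by concatenation: join the outputs of the named extractors."""
    fused: List[float] = []
    for name in names:
        fused.extend(EXTRACTORS[name](image))
    return fused

image = [[0.0, 2.0], [4.0, 6.0]]
print(fuse(image, ["mean_intensity", "row_means"]))  # [3.0, 1.0, 5.0]
```

In such a design, a deep extractor (for example, the activations of a pretrained CNN layer) would be registered in the same way as the toy functions, and the fused vector would be passed to whichever classifier the user selects.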
The rest of this article is organized as follows. Section 2 presents related work in the literature. Section 3 explains the methodology, including the stages of the framework. Section 4 describes the dataset used for testing. Section 5 presents the experimental results and discusses the findings. Finally, Section 6 concludes the article and presents some opportunities for future work.
2 RELATED WORK
In the literature, there are numerous approaches to diagnosing COVID-19 from chest x-ray images based on various ML, deep learning, and hybrid methods. Studies based on machine and deep learning techniques are gaining popularity for radiology images. An examination of the related studies shows that the most common and most serious problem is the lack of a sufficiently large dataset, and studies have generally tried to overcome this problem with various methods. Related studies that use the same open-access dataset7 as this study are summarized first. Because the dataset has grown over time, some studies may have drawn differently sized samples of each class from it at different times. Therefore, similar techniques can show different performance across studies. For instance, Chowdhury et al.8 created this dataset, made it available for open access, and increased its size over time. They studied the utility of artificial intelligence for rapid and accurate detection of COVID-19 from chest x-ray images. They obtained 99.7% classification accuracy with the DenseNet201 model (the dataset used contained 423 COVID-19, 1579 normal, and 1485 viral pneumonia chest x-ray images). Ahammed et al.9 used a CNN model to detect COVID-19-positive patients at an early phase. Their model achieved 94.03% accuracy. The limitation of this study is that only a small number of samples (285 COVID-19 images) was used for training, which is not adequate for deep learning and COVID-19 prediction. Aslan et al.10 proposed a hybrid model with CNN-based transfer learning and BiLSTM. Their hybrid architecture gives 98.70% accuracy for COVID-19 infection detection on chest x-ray images from this dataset. Gupta et al.11 proposed an integrated stacked deep CNN model named InstaCovNet-19, which is composed of various pre-trained models. Their model gives accuracies of 99.08% and 99.52% on three-class and two-class classification, respectively. Ouchicha et al.5 proposed CVDNet, a deep CNN model that distinguishes COVID-19 cases from other cases (normal or pneumonia) using chest x-ray images. They achieved an average accuracy of 97.20% for two-class (binary) and 96.69% for three-class (multiple) classification. Similarly, Asif et al.12 proposed detecting COVID-19 pneumonia patients using a deep CNN on the same dataset (which consisted of 1345 viral pneumonia, 864 COVID-19, and 1341 normal images). Their best result was a classification accuracy of more than 98%. Cavallo et al.13 manually segmented the lung areas in images from this dataset using polygonal regions of interest (ROIs). They extracted texture features from the gray-level histogram and the co-occurrence matrix. They extracted a total of 308 features per ROI, and 110 COVID-19 chest x-ray images were selected for the final analysis. They obtained 91.8% accuracy with ensemble ML and 92.9% accuracy with an artificial neural network. The common limitation of these studies is the small amount of data, in particular the small number of COVID-19 samples. The shortage of images in the COVID-19 class hampers learning and makes COVID-19 more difficult to detect.
Other related studies can be summarized as follows. Rasheed et al.14 proposed the use of logistic regression and a CNN for the diagnosis of COVID-19. They evaluated their method with and without principal component analysis (PCA) to obtain high accuracy. They obtained an overall identification accuracy between 95.2% and 97.6% without PCA, and between 97.6% and 100% with PCA. To detect COVID-19 from chest x-ray images, Ahmed et al.15 proposed a COVID-Net model that combines a residual network with parallel convolution. They achieved an accuracy of 97.99%. Sharifrazia et al.16 proposed a fusion of a CNN, an SVM, and a Sobel filter to detect COVID-19 using x-ray images. They used a high-pass filter with the Sobel operator to obtain the edges of the images. The highest classification accuracy achieved with this method is 99.02%. Shankar and Perumal17 proposed the FM-HCF-DLF model, a novel method that fuses handcrafted and deep learning features for COVID-19 classification and diagnosis. Their model achieved superior performance, with an accuracy of 94.08%. Joshi et al.18 proposed a deep learning-based system that detects COVID-19 in chest x-ray images. An average test accuracy of 97.11% was achieved for multi-class and 99.81% for binary classification. Jain et al.19 studied deep learning-based CNN models for the detection of COVID-19 in chest x-ray images. They compared the accuracy of the Xception, Inception V3, and ResNeXt models, and Xception gave the highest accuracy, 97.97%. To detect COVID-19 patients, Sarker et al.20 proposed a deep learning-based approach using the DenseNet-121 model. A test accuracy of 96.49% was achieved for binary classification and 93.71% for multi-class classification. Turkoglu21 proposed the COVIDetectioNet model for the diagnosis of COVID-19. This model uses features selected from a combination of deep features. A transfer learning approach was used with a pretrained CNN-based AlexNet architecture, and an SVM was used for classification. In the experimental results, this model achieved an accuracy of 99.18%. Because the data were imbalanced in some studies, image augmentation was used to increase the diversity of the training data. For instance, Rahman et al.22 achieved 95.11% classification accuracy with the DenseNet201 model (using a large dataset of 18,479 images with 3616 COVID-19, 6012 non-COVID, and 8851 normal images). They used an augmentation technique based on image rotation to generate training images for the CNN models. Similarly, Luz et al.23 used this augmentation technique for effective training. They achieved 93.5% classification accuracy with the EfficientNet architecture (using a large dataset of 13,770 images with 183 COVID-19, 5521 viral pneumonia, and 8066 normal images).
The common shortcoming of state-of-the-art studies is that most deep learning-based models are trained on unbalanced datasets and may therefore lack robustness. Only a small number of studies address the creation of a scalable deep learning model using image enhancement/preprocessing techniques and efficient feature extraction. When unbalanced data are used and the necessary features cannot be extracted from the images, the classification accuracy does not reach the desired level. Also, the models proposed in the literature cannot guarantee that their promising results will be reproduced when they are evaluated on a larger dataset. A further handicap is that running a CNN architecture for many iterations on a small dataset leads to overfitting. To overcome these limitations, this framework combines image preprocessing with handcrafted, deep, and fusion-based feature extraction techniques to create a scalable model.
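A short calculation shows why accuracy alone can hide a lack of robustness on unbalanced data. The sketch below uses the class counts reported above for the dataset of Luz et al.23 and a hypothetical degenerate classifier that never predicts COVID-19; it assumes, purely for illustration, that evaluation follows the same class proportions and that the other two classes are classified perfectly. It does not describe the behavior of any cited model.

```python
# Class counts reported above for the dataset used by Luz et al.
counts = {"COVID-19": 183, "viral pneumonia": 5521, "normal": 8066}
total = sum(counts.values())  # 13770

# Hypothetical degenerate classifier: never predicts COVID-19 and is
# assumed to be perfect on the other two classes.
correct = total - counts["COVID-19"]
accuracy = correct / total
covid_recall = 0 / counts["COVID-19"]  # no COVID-19 case is ever detected

print(f"accuracy = {accuracy:.4f}, COVID-19 recall = {covid_recall:.1f}")
# prints: accuracy = 0.9867, COVID-19 recall = 0.0
```

Under these assumptions the classifier reaches about 98.7% accuracy while detecting no COVID-19 case at all, which is why per-class measures such as recall are informative alongside overall accuracy when classes are unbalanced.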
3 METHODOLOGY
In this study, a general framework (HANDEFU) including different…
A deep and handcrafted features‐based framework for diagnosis of COVID‐19 from chest x‐ray
