Lung Cancer Prediction and Detection Using Image Processing Mechanisms: An Overview

ABSTRACT

Cancer is now regarded as one of the most hazardous diseases, and Lung-Cancer in particular affects many people. Cancer is a disease in which cells grow rapidly and abnormally, which is why treating it can be difficult in some cases; however, it can be controlled if it is detected at an early stage. Image Processing Mechanisms play a vital role in predicting and recognizing both benign and malignant cells with the help of classifiers such as Decision-Tree (D-Tree), A-NN, Support-Vector-Machine, and Naïve-Bayes, which are widely used in the biomedical field to distinguish normal from abnormal cells. This study reviews the best-known Image Processing Mechanisms for Lung-Cancer Detection and Prediction. It briefly describes the main Image Processing stages needed to build an effective system: Image Acquisition, image pre-processing (noise elimination and enhancement), Segmentation, Feature Extraction, and Binarization. Several researchers' works are reviewed, and the research papers that proposed models for recognizing and estimating Lung-Cancer nodules are compared in terms of the Image Processing Mechanisms, classifier, and accuracy reported in each.

Introduction
Currently, cancer is one of the most hazardous diseases facing humanity and a leading cause of death [1]. Cancer is the abnormal growth of human cells that turns them into a tumor [2]. Among the various forms of cancer, Lung-Cancer is considered the deadliest around the globe. According to physicians, the most important cause of Lung-Cancer is tobacco, and the survival rate of patients increases if the cancerous tumor is detected at an early stage [3]. Men are more affected by Lung-Cancer than women because of their higher rate of smoking [4]. For instance, one report estimated 116,470 new cases among males and 109,690 among females, with 87,750 male and 72,590 female deaths from Lung-Cancer. In the world, about one in five deaths is due to smoking and tobacco use.
There are two major types of Lung-Cancer [2] [5] [6]. The first is Non-Small-Cell Cancer of Lung (NSCCL), which is further divided into three subtypes: Squamous-Cell Carcinoma, which accounts for approximately 25% to 30% of all Lung-Cancer; Adenocarcinoma, which accounts for nearly 40% of Lung-Cancer and is the most common type seen in non-smokers, especially women and younger people; and Large-Cell Carcinoma, which accounts for about 10% to 15% of Non-Small-Cell Cancer of Lung. The second type is Small-Cell Cancer of Lung (SCCL), which makes up only 10% to 15% of cases and mostly affects smokers [7]. Lung-Cancer can be diagnosed by several methods, such as Chest-Radiography 'X-Ray', CT-Scan, and Magnetic-Resonance-Imaging 'MRI-Scan' [8]. However, most of these methods are costly and time-consuming. Moreover, they often identify Lung-Cancer only in its advanced stages, when the survival rate is quite low. Consequently, new automated tools are needed to diagnose Lung-Cancer in its early stages [9].
Image processing mechanisms are frequently used to predict Lung-Cancer and to detect it early [8]. They include several stages, such as image pre-processing, noise elimination, enhancement, and segmentation [10]. Numerous computer-aided diagnosis 'CAD' systems have been proposed by researchers for this purpose, but it remains a wide area of research. To predict Lung-Cancer, several features must be extracted from the image and then classified by a classifier to indicate normality or abnormality [11].
The major aims of this study are to: focus on Image Processing Mechanisms for the detection and prediction of Lung-Cancer; review several systems proposed by researchers to predict and detect both normal and abnormal tumors in the human lung; and compare the image processing mechanisms and classifiers used to classify lung tumors as normal or abnormal in terms of accuracy.
The remainder of this study is organized as follows. Section 2 reviews other researchers' work. Section 3 discusses the main steps of Image Processing Mechanisms for building an effective system to predict and detect Lung-Cancer. Section 4 compares and discusses the reviewed papers. Finally, Section 5 concludes the paper. Statistics about the forms of Lung-Cancer are shown in Figure 1.

Literature Review
Recently, many researchers have conducted significant investigations and proposed automated systems for detecting and predicting abnormal tumors in Lung CT-Scan images, using various image processing techniques and algorithms together with machine learning algorithms such as SVM, ANN, and FFNN. Some of the related research in this field, published between 2010 and 2020, is reviewed below.
Azamimi et al., 2010, [12] proposed an efficient system based on the Cellular Neural Network 'CNN' algorithm to help physicians identify suspected cancerous areas in 'X-Ray' images. The objective was a low-cost, automated method for Lung-Cancer recognition using 'CNN' templates such as median, logic AND operation, smoothing, and edge detection. The outcomes were compared with the assessments of physicians, and they indicated that the proposed system can successfully detect Lung-Cancer tumors.
Sharma et al., 2011, [13] investigated the problem of developing an automatic scheme for identifying abnormal nodules in Lung CT-Scan images. A key requirement of such a scheme is identifying nodules at an early stage, when they are still tiny. The study presented a cancer recognition scheme that relied on texture features extracted from 'DICOM' Lung CT-Scan images to detect tumorous nodules. To obtain more accurate results, the Lung CT-Scan images passed through three main image processing steps. The first step was pre-processing, in which the images were enhanced through contrast enhancement, thresholding, noise elimination, and blob analysis.
The second step separated the suspected nodule areas 'SNA' from the image by thresholding-based segmentation using Otsu's algorithm. The last step extracted texture features, which played an important role in distinguishing malignant from benign images. To separate malignant from benign nodules accurately, a well-known and powerful classifier, the Artificial Neural Net 'A-NN', was used. The classifier was trained with the Back-Propagation algorithm and tested on various images from the 'DICOM' Lung-Image-Database-Consortium 'LIDC' dataset. The proposed scheme achieved 85% accuracy.
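The text does not include Sharma et al.'s code. As a minimal sketch only, assuming NumPy, the following shows how a one-hidden-layer A-NN can be trained with back-propagation on extracted feature vectors. The names `train_mlp` and `predict_mlp`, the hidden-layer size, learning rate, epoch count, and the synthetic two-feature data are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=1.0, epochs=3000, seed=0):
    """Train a one-hidden-layer network by back-propagation (full-batch gradient
    descent on mean binary cross-entropy). X: (n, d) features; y: (n,) labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backward pass: for a sigmoid output with cross-entropy, dLoss/dz2 = (p - t) / n
        dz2 = (p - t) / n
        dz1 = (dz2 @ W2.T) * h * (1.0 - h)  # uses W2 before it is updated
        # Gradient-descent updates
        W2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum(axis=0)
        W1 -= lr * (X.T @ dz1)
        b1 -= lr * dz1.sum(axis=0)
    return W1, b1, W2, b2

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return (p.ravel() >= 0.5).astype(int)

# Hypothetical demo: two synthetic clusters of 2-D "texture feature" vectors
rng = np.random.default_rng(1)
benign = rng.normal([0.2, 0.3], 0.05, size=(50, 2))
malignant = rng.normal([0.7, 0.8], 0.05, size=(50, 2))
X = np.vstack([benign, malignant])
y = np.array([0] * 50 + [1] * 50)
params = train_mlp(X, y)
train_accuracy = (predict_mlp(params, X) == y).mean()
```

On real CT features the inputs would normally be standardized first, and accuracy would be measured on held-out images rather than on the training set.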
Taher et al., 2011, [14] used a segmentation approach with two techniques, the Hopfield Neural Network 'H-NN' and Fuzzy C-Mean 'F-CM', to segment sputum color images 'SCI' and identify Lung-Cancer tumors in their early stages. Manual analysis of sputum samples is time-consuming and error-prone, and it requires a highly trained individual to avoid diagnostic errors. The segmentation outcomes are intended for a Computer-Aided-Diagnosis 'CAD' scheme for early recognition of Lung-Cancer, which would raise the survival rate. However, extreme differences in gray level and contrast between the images make the outcomes less accurate. Because the quantitative techniques rely on nuclear features, a thresholding method was applied to each image in the pre-processing stage to extract the nuclei and cytoplasm areas; it succeeded both in extracting these areas and in defining the best threshold values. The 'H-NN' and 'F-CM' techniques were then used to classify the 'N' pixels of each image into several classes. More than 1000 sputum color images were used to test both techniques. 'H-NN' gave better classification results than 'F-CM' and successfully extracted the nuclei and cytoplasm areas. Overall, the thresholding classifier obtained the highest accuracy, 98%, with a sensitivity of 83% and a specificity of 99%.
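The F-CM part of this approach can be illustrated with a minimal NumPy sketch of standard Fuzzy C-Means clustering on pixel intensities. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier m = 2, the use of gray-level intensity as the only feature (the images in [14] are color images), and the tiny made-up image are all assumptions, and the Hopfield network is not shown.

```python
import numpy as np

def fuzzy_c_means(values, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-Means on 1-D pixel intensities.
    Returns (centers, U) where U[k, i] is the membership of pixel i in cluster k."""
    x = np.asarray(values, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    U = rng.random((c, x.size))
    U /= U.sum(axis=0)  # memberships of each pixel sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ x) / Um.sum(axis=1)  # membership-weighted cluster means
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12  # avoid division by zero
        # Standard FCM update: u_ki = 1 / sum_j (d_ki / d_ji) ** (2 / (m - 1))
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Made-up 3 x 3 gray-level patch: dark background with a bright region
img = np.array([[10, 12, 200],
                [11, 205, 198],
                [9, 13, 202]])
centers, U = fuzzy_c_means(img, c=2)
labels = U.argmax(axis=0).reshape(img.shape)  # hard assignment: highest membership
bright = labels == centers.argmax()           # pixels in the brighter cluster
```

Unlike hard clustering, every pixel keeps a membership degree in each cluster, which is what makes F-CM tolerant of gradual intensity transitions at region borders.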
Patil et al., 2012, [15] applied texture feature algorithms to 'X-Ray' images of several lung conditions: Small Cell 'SC' and Non-Small Cell 'NSC' Lung-Cancer, and tuberculosis 'TB'. Features were first obtained from the 'X-Ray' images using image processing techniques such as pre-processing, segmentation, and binarization. These features were then used as input to a Feed-Forward Neural Net 'FF-NN' classifier, which was trained to distinguish cancerous ('SC'/'NSC') from non-cancerous ('TB') cases. The resulting system helped doctors identify cancerous tumors more accurately and in a short time, with an accuracy of about 83%. The pre-processing was done mostly in MATLAB. All image samples were scanned and stored at 512 × 512 pixels. During scanning, image quality is commonly affected by artifacts such as noise, motion, and non-uniform intensity. The main goal of pre-processing was to remove these artifacts from the scanned images without affecting image details, using filters such as the median, min, max, and mean filters, which target noise such as impulse (salt-and-pepper) noise.
Kumar et al., 2015, [16] proposed an effective Computer-Aided System 'CAD' that used Deep Features Extraction 'DFE' from an auto-encoder to classify lung nodules as cancerous or non-cancerous. For this purpose, approximately (4303) samples containing (4323) nodules were taken from the National Cancer Institute 'NCI' Lung-Image-Database-Consortium 'LIDC' dataset. The 'CAD' system achieved 75.01% accuracy with a sensitivity of 83.35%.
Fernandes et al., 2017, [17] developed an early Lung-Cancer recognition system to help physicians identify malignant tumors effectively, at minimum cost, and in a short time. Computer-Aided-Diagnosis 'CAD' systems are important tools that help doctors diagnose cancer cells, and Computed Tomography 'CT' is one of the key ways to image the internal parts of the human body, especially the lungs. Recent advances in imaging support recording both 2-Dimensional and 3-Dimensional lung images. The major medical challenge is to develop an appropriate 'CAD' system that extracts and analyzes the tumor from 2-Dimensional and 3-Dimensional images. Therefore, it is vital to develop an automatic scheme capable of recognizing, classifying, and quantifying lung tumors. In this work, the abnormal lung region was segmented using a wavelet method, and the segmented region of interest was then classified by a Novel-Classifier Unit 'NCU'. The classifier achieved an accuracy of 78.03%.
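The wavelet method in [17] is not specified in detail here. As a rough illustration only, the sketch below computes one level of a 2-D Haar wavelet decomposition with NumPy, which splits an image into a low-frequency approximation and three detail sub-bands in which edges such as nodule boundaries appear. The function name, the averaging scaling (rather than the orthonormal divide-by-√2 form), and the sub-band naming are assumptions; conventions for the LH and HL bands differ between references.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform in averaging/differencing form.
    Returns (LL, LH, HL, HH), each half the height and width of the input."""
    a = np.asarray(img, dtype=float)
    if a.shape[0] % 2 or a.shape[1] % 2:
        raise ValueError("image height and width must be even")
    # Horizontal pass: average and half-difference of neighbouring columns
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # Vertical pass on both results: average and half-difference of neighbouring rows
    LL = (lo[0::2] + lo[1::2]) / 2  # approximation (smoothed, down-sampled image)
    LH = (lo[0::2] - lo[1::2]) / 2  # responds to horizontal edges
    HL = (hi[0::2] + hi[1::2]) / 2  # responds to vertical edges
    HH = (hi[0::2] - hi[1::2]) / 2  # diagonal detail
    return LL, LH, HL, HH

# Made-up 4 x 4 image with a vertical edge between columns 0 and 1
img = np.array([[0, 100, 100, 100]] * 4)
LL, LH, HL, HH = haar_dwt2(img)
```

Here only the HL band is non-zero, and only in the block that straddles the edge, which is the property a wavelet-based segmentation can exploit to localize boundaries.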
Makaju et al., 2018, [18] presented a scheme to distinguish cancerous nodules in Lung CT-Scan images, using watershed segmentation for detection and 'SVM' for classifying each nodule as cancerous or non-cancerous. The proposed scheme identified cancer with an accuracy of 92%, higher than the 86.6% accuracy of the traditional model, so it improved on the traditional scheme. However, it did not classify the tumor into cancer stages 1, 2, 3, and 4.
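The classification stage in [18] uses an 'SVM'. The sketch below is a minimal linear soft-margin SVM trained by sub-gradient descent on the hinge loss, written in NumPy. It is not the implementation used by Makaju et al. (which is not described here); the learning rate, epoch count, C, and the synthetic two-feature nodule data are hypothetical. Practical systems typically rely on a library solver and often a non-linear kernel.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=2000):
    """Minimise 0.5*||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b)) by full-batch
    sub-gradient descent. Labels y must be -1 (non-cancerous) or +1 (cancerous)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, X):
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)

# Hypothetical demo on synthetic 2-D feature vectors
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.2, 0.3], 0.05, size=(50, 2)),
               rng.normal([0.7, 0.8], 0.05, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
svm_accuracy = (predict_svm(w, b, X) == y).mean()
```

The hinge loss only penalizes samples that fall inside the margin, so after a while the decision boundary is determined by the few nodules closest to it (the support vectors).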
Ashwini et al., 2019, [19] proposed image processing mechanisms for identifying and estimating Lung-Cancer cells, with a multi support-vector machine classifier to differentiate nodules as malignant or benign. The main image processing stages (image enhancement, noise elimination, image segmentation, and feature extraction) were all implemented in MATLAB. Segmentation was a critical step in the proposed scheme because it helps identify the exact size and shape of the tumor and its region. The CT image was first improved by pre-processing steps such as color conversion, noise removal, and histogram equalization. Next, the image was segmented into regions using the 'K-Mean' algorithm. Then, texture features were extracted with the Gray-Level Co-Occurrence-Matrix 'GL-CM'. Finally, the multi 'SVM' classifier classified each cell as normal or abnormal. The proposed scheme achieved an accuracy of 97%.
Shakeel et al., 2019, [20] proposed an efficient model for identifying and predicting lung cancer using the Improved Profuse Clustering Technique 'IPCT' and Deep-Learning with Instantaneously Trained Neural-Networks 'DITNN'. First, lung CT images were gathered from the well-known public Cancer Imaging Archive 'CIA' dataset, which contains (5043) images in DICOM format, separated into (3000) training images and (2043) testing images. Second, the quality of the CT images was enhanced by Weighted Mean Histogram Equalization 'WMHE', which replaces each pixel value using the probability distribution and cumulative distribution of the image intensities.
Third, the enhanced CT images were segmented with the Improved Profuse Clustering Technique 'IPCT'. Fourth, numerous spectral features were extracted from the segmented region by computing pixel similarity values. Finally, the extracted spectral features were fed to Deep-Learning with Instantaneously Trained Neural-Networks 'DITNN' for training and for classifying cancerous and non-cancerous lung tumors. The experimental results showed that the method predicts cancerous tumors in lung CT images with up to 98.42% accuracy and a low classification error of 0.038.
Shanthi et al., 2020, [21] proposed an efficient automated lung cancer prediction and detection system based on several phases: data collection, feature extraction, feature selection, classification, and identification. Lung CT images were collected from the Cancer Genome Atlas 'CGA' database, which consists of (140) normal and (130) abnormal CT images. Next, the Grey Level Co-Occurrence Matrix 'GLCM' together with a Gabor Filter 'GF' was employed for feature extraction. The extracted features were then reduced by a Modified Stochastic Diffusion Search 'SDS' feature selection algorithm, which selects optimal feature subsets to increase the classifiers' accuracy. Cancerous tumors in the lung CT images were then identified with three classifiers: Neural Network 'NN', Naïve Bayes 'NB', and Decision Tree 'DT'. The experiments showed that the system predicts and detects cancerous lung CT images reliably, with accuracies of 87.41% for 'DT', 88.52% for 'NB', and 89.63% for 'NN'; 'NN' was therefore the best of the three classifiers.
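Of the three classifiers compared in [21], Naïve Bayes is the simplest to write out. Below is a minimal Gaussian Naïve Bayes sketch in NumPy, assuming each feature (for example a GLCM-derived texture value) is normally distributed within each class and that features are independent given the class. The class name `SimpleGaussianNB`, the variance-smoothing constant, and the synthetic data are hypothetical, and this is not the authors' code.

```python
import numpy as np

class SimpleGaussianNB:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps variances strictly positive
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Log-likelihood per (sample, class): sum of per-feature Gaussian log-densities
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                     + diff2 / self.var_[None, :, :]).sum(axis=2)
        # Posterior is proportional to likelihood times prior; pick the largest
        return self.classes_[np.argmax(ll + self.log_prior_[None, :], axis=1)]

# Hypothetical demo: 0 = normal, 1 = abnormal, two synthetic features
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.2, 0.3], 0.05, size=(50, 2)),
               rng.normal([0.7, 0.8], 0.05, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = SimpleGaussianNB().fit(X, y)
nb_accuracy = (model.predict(X) == y).mean()
```

Working in log space avoids numerical underflow when many features are multiplied together.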
The literature review shows that image pre-processing mechanisms and techniques play a significant role in medical imaging, specifically in detecting and predicting cancerous lung nodules, which helps doctors detect them at an early stage. Among the reviewed works, Deep-Learning with Instantaneously Trained Neural-Networks 'DITNN' obtained the highest accuracy in classifying Lung-Cancer nodules (98.42%), using Weighted Mean Histogram Equalization 'WMHE' as the Image Processing Mechanism. It was followed by the Hopfield Neural Net with Fuzzy C-Mean, which achieved 98% accuracy using a Thresholding algorithm as the Image Processing Technique.

The Proposed System for Predicting and Detecting Lung-Cancer
This section describes in detail the primary stages required to build an efficient automated Lung-Cancer prediction and detection model. Figure 2 depicts the main stages of the Proposed System for recognizing and estimating Lung-Cancer using Image Processing Mechanisms.

The Acquisition of Image
This is the first step of the proposed system: gathering CT-Scan images from an image database. The images are then loaded into MATLAB and represented as gray-scale images. Lung CT-Scan images have little noise compared to other modalities such as MRI-Scan images. Consequently, CT-Scan images are widely used in such systems because of their superior quality and low noise and distortion. For experimental purposes, CT-Scans can be stored in the image database in any format, such as the JPEG and PNG image standards [23].

The Pre-Processing of Image
This stage usually consists of Image Denoising ('Noise Elimination') and normalisation against illumination. Digital images generally take one of three main forms: Binary, Gray-Scale, and Color. A Binary image has only two colors, black and white; usually '0' denotes black and '1' denotes white. In a Gray-Scale image, each pixel holds a single 8-bit value, so it can represent 256 different gray levels. In a Color image, each pixel has three components, one each for red, green, and blue (RGB), and each component is represented by a single byte (8 bits) [22].
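The three image forms can be related with a short NumPy sketch. It converts an RGB image to an 8-bit gray-scale image using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which is one common convention among several, and then produces a binary image with a fixed threshold. The function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def rgb_to_gray(rgb):
    """H x W x 3 RGB image (one byte per channel) -> H x W 8-bit gray-scale image."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights, summing to 1
    gray = np.asarray(rgb, dtype=float) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

def gray_to_binary(gray, t):
    """Binary image: 1 (white) where the gray level exceeds t, else 0 (black)."""
    return (np.asarray(gray) > t).astype(np.uint8)

# A made-up 1 x 3 color image: white, pure red, black
rgb = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
gray = rgb_to_gray(rgb)  # one 8-bit value per pixel, 256 possible levels
binary = gray_to_binary(gray, 128)
```

Each step discards information: the color image needs 24 bits per pixel, the gray-scale image 8, and the binary image only 1.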

A. Noise Elimination
Noise appears in most digital images, introduced either during Image Acquisition or when the image is transmitted. The most widely known noise types are Gaussian, Photon (Poisson), Speckle, and Salt-and-Pepper noise. Therefore, before working on a digital image, some filtering is required to eliminate noise. Several filters can be used for this purpose, such as the Min-Filter, Max-Filter, and Median-Filter. The most widely used is the Median-Filter, which removes noise by smoothing the image and reducing the intensity variation between neighboring pixels. In the Median-Filter, each pixel value is replaced by the median of its neighborhood: the pixel values in the window are sorted in ascending order, and the middle value replaces the center pixel. This filter is efficient for reducing impulse noise [24] [25].
ISSN 2714-6677
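The median-filter procedure can be written directly in NumPy. This is a minimal, unoptimised sketch assuming a square, odd-sized window and edge-replicated borders (border handling is a design choice; toolboxes such as MATLAB offer several options). The function name and test images are hypothetical.

```python
import numpy as np

def median_filter(img, size=3):
    """Replace every pixel by the median of its size x size neighbourhood (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    a = np.asarray(img)
    r = size // 2
    padded = np.pad(a, r, mode="edge")  # replicate border pixels so every window is full
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            window = padded[i:i + size, j:j + size]
            out[i, j] = np.median(window)  # sort the window values and take the middle one
    return out

# Flat 5 x 5 region corrupted by one "salt" and one "pepper" pixel
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
noisy[1, 1] = 0
clean = median_filter(noisy)
```

Because the median ignores extreme values instead of averaging them in, isolated impulse pixels disappear while a sharp step edge passes through unchanged, which a mean filter would blur.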

B. The Enhancement of Image
Image enhancement is simply an approach for improving image quality, so that the resulting image is better than the original. Enhancement can include making the image brighter or darker, which can be done in MATLAB. Its main goal is to improve the appearance of the digital image or to provide a better transformed representation of it. This step is very significant because many kinds of images, such as medical, satellite, and aerial images, suffer from poor contrast and noise. It is therefore essential to improve the contrast and eliminate the noise to improve image quality [23].
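A common way to improve contrast is global histogram equalization, which spreads the occupied gray levels over the full 0 to 255 range using the cumulative histogram. The sketch below is a minimal NumPy version for 8-bit gray-scale images. It is plain histogram equalization, not the weighted variant (WMHE) used in [20], and the function name and sample values are hypothetical.

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Global histogram equalisation of an 8-bit gray-scale image."""
    a = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(a.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest occupied gray level
    if cdf[-1] == cdf_min:     # constant image: nothing to stretch
        return a.copy()
    # Map each level through the normalised CDF onto the full [0, levels - 1] range
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[a]

# Low-contrast made-up patch: all values squeezed into 100..103
img = np.array([[100, 101], [102, 103]], dtype=np.uint8)
out = equalize_histogram(img)
```

The four nearly identical gray levels are mapped to 0, 85, 170, and 255, so differences that were barely visible become clearly separated.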

The Segmentation of Image
In computer vision, Segmentation is the process of subdividing a digital image into multiple regions. Image segmentation is normally used to locate objects and boundaries in images. More precisely, image segmentation assigns a label to each pixel in the image such that pixels with the same label share certain visual characteristics. Its major aim is to simplify or change the representation of an image into something easier to analyze. The result of segmentation is a set of regions that together cover the whole image, or a set of contours extracted from it. Pixels within a region are similar with respect to some characteristic, such as color, intensity, or texture. The main objective when processing a digital image is to find suitable features that clearly distinguish the object of interest from everything else. Once the image is segmented, each pixel can easily be checked to see whether it belongs to the object, which produces a binary-image: a pixel value of '1' means that the pixel belongs to the object, and '0' means that it does not [26].
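As a minimal illustration of labelling every pixel and then deriving a binary object mask, the following NumPy sketch clusters pixel intensities with K-means, the algorithm used for segmentation in [19]. Initialising the centres at evenly spaced intensity quantiles (instead of at random) is an assumption made to keep the result deterministic; the function name and the toy image are hypothetical.

```python
import numpy as np

def kmeans_segment(img, k=3, iters=50):
    """Cluster pixel intensities into k groups with K-means (Lloyd's algorithm).
    Returns (label image, centres), with label 0 = darkest cluster, k-1 = brightest."""
    x = np.asarray(img, dtype=float).ravel()
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic initial centres
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)  # nearest centre
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment so labels match the returned centres
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)  # rank[j] = position of cluster j when sorted by brightness
    return rank[labels].reshape(np.shape(img)), centers[order]

img = np.array([[10, 12, 120],
                [118, 240, 245],
                [11, 122, 250]])
labels, centers = kmeans_segment(img, k=3)
mask = (labels == 2).astype(np.uint8)  # 1 = pixel belongs to the brightest object, 0 = not
```

Clustering on intensity alone ignores spatial position; real systems often add pixel coordinates or texture as extra features, or clean up the mask afterwards.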

A. Thresholding Method
Thresholding relies on a threshold value 'T' to convert a gray-scale image into a binary-image (a black-and-white image); the conversion is done by choosing a suitable value of 'T'. The main benefit of Thresholding is separating the foreground from the background, and the resulting binary-image should contain all of the essential information about the location and shape of the objects. The most common approach is to choose a single threshold value 'T': all gray-scale values below 'T' are assigned black '0', and those above 'T' are assigned white '1'. The most common thresholding method is Otsu's method, which selects 'T' automatically from the gray-level histogram using a statistical criterion: it picks the value that maximizes the between-class variance of the two resulting pixel classes [27] [28].
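Otsu's criterion can be sketched in a few lines of NumPy. For every candidate T it computes the class probability ω(T), the cumulative mean μ(T), and the between-class variance σ_B²(T) = (μ_T·ω(T) − μ(T))² / (ω(T)·(1 − ω(T))), where μ_T is the global mean, and keeps the T that maximises it. The function names and the synthetic bimodal image are assumptions for illustration; in this sketch, pixels equal to T are assigned to the black class.

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Return the threshold T that maximises the between-class variance (Otsu's method)."""
    a = np.asarray(img, dtype=np.uint8).ravel()
    p = np.bincount(a, minlength=levels).astype(float)
    p /= p.sum()                            # normalised gray-level histogram
    omega = np.cumsum(p)                    # probability of the class with levels <= T
    mu = np.cumsum(p * np.arange(levels))   # cumulative mean up to T
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0)  # empty-class cases
    return int(np.argmax(sigma_b2))

def binarize(img, t):
    """Pixels above T become white (1); pixels at or below T become black (0)."""
    return (np.asarray(img) > t).astype(np.uint8)

# Synthetic bimodal image: half the pixels dark (30), half bright (200)
img = np.array([30] * 50 + [200] * 50, dtype=np.uint8).reshape(10, 10)
T = otsu_threshold(img)
binary = binarize(img, T)
```

Because T is derived from the histogram, the same code adapts to scans with different overall brightness, which a fixed hand-picked threshold does not.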

The Extraction of Feature
Feature extraction is considered the most significant step in image processing mechanisms: it uses algorithms and methods to identify and isolate desired portions or features of an image. After the Lung area has been segmented, features are extracted from it, and analysis rules are computed to identify malignant nodules in the Lungs precisely. These diagnostic rules can eliminate false detections of malignant nodules produced by segmentation and thus improve diagnosis [29].
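One feature-extraction method used by several reviewed systems ([19], [21]) is the Gray-Level Co-Occurrence Matrix (GLCM). The sketch below builds a symmetric, normalised GLCM for a single non-negative pixel offset and computes three common texture features. Quantising to 8 gray levels, the single offset, and the function names are assumptions, and libraries differ in details (for example, some report energy as the square root of the angular second moment).

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Symmetric, normalised GLCM for the offset (dy, dx), with dx, dy >= 0.
    The 8-bit image is first quantised to `levels` gray levels."""
    if dx < 0 or dy < 0:
        raise ValueError("this sketch only supports non-negative offsets")
    a = np.asarray(img, dtype=np.uint8).astype(int) * levels // 256
    h, w = a.shape
    ref = a[:h - dy, :w - dx]  # reference pixels
    nbr = a[dy:, dx:]          # their neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)  # count co-occurring level pairs
    P = P + P.T                # count each pair in both directions
    return P / P.sum()

def glcm_features(P):
    i, j = np.indices(P.shape)
    return {
        "contrast": float((P * (i - j) ** 2).sum()),
        "asm": float((P ** 2).sum()),  # angular second moment ("energy" in some texts)
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
    }

stripes = np.array([[0, 255, 0, 255]] * 4, dtype=np.uint8)  # alternating columns
flat = np.full((4, 4), 100, dtype=np.uint8)
f_stripes = glcm_features(glcm(stripes))
f_flat = glcm_features(glcm(flat))
```

The striped patch yields high contrast and low homogeneity, while the flat patch gives zero contrast and maximal homogeneity; vectors of such values are what classifiers like SVM or Naïve Bayes receive.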

A. Binarization Method
Binarization converts a gray-scale image into a black-and-white image (binary-image), assigning '0' to black pixels and '1' to white pixels. The method relies on the number of black and white pixels: if the number of black ('0') pixels is greater than the threshold 'T', the image is considered standard or normal; if it is less than 'T', the image is considered abnormal. This approach is regarded as the easiest way to identify and estimate Lung-Cancer [29] [30].
Vol. 1, No. 3, November 2019, pp. 20-31 Ahmed (Lung Cancer Prediction and Detection Using Image Processing Mechanisms: An Overview)
Table 1 presents the comparison among the recently reviewed research papers. This survey has comprehensively reviewed up-to-date research papers by researchers in this field, and the main steps of image processing techniques for identifying and predicting Lung-Cancer tumors are image acquisition, pre-processing, segmentation, and feature extraction. This section compares the systems reviewed in the literature review for detecting and predicting Lung-Cancer and classifying nodules as cancerous or non-cancerous, in terms of image processing mechanisms, classifiers, and accuracy. The survey indicates that the proposed systems with the highest accuracy were those of Shakeel, Taher, and Ashwini, who used Weighted Mean Histogram Equalization 'WMHE', Thresholding, and Grey Level Co-Occurrence Matrix 'GLCM' with K-means clustering, respectively, as the key image processing mechanisms for identifying and recognizing Lung tumors.
The accuracy of 98.42% obtained in study [20] is the highest among the reviewed studies; it was achieved with the Deep-Learning with Instantaneously Trained Neural-Networks 'DITNN' classifier. Likewise, study [14] attained a high accuracy of 98% using the Hopfield Neural Net and Fuzzy C-Mean. Note that the reviewed studies were evaluated on different datasets (for example, LIDC, the Cancer Imaging Archive, and sputum color images), so their accuracies are not directly comparable. Based on the results reported in the reviewed papers, image processing techniques play a significant role in the medical field.

Conclusion
Cancer is one of the deadliest diseases. Lung-Cancer is the most widespread form of cancer, affecting many people around the world, and it can be controlled if it is recognized at an early stage. Image Processing Mechanisms play a critical role in the medical field in recognizing and estimating various diseases, including Lung-Cancer. An accurate system for predicting and detecting cancerous lung tumors can be built by implementing the main phases of Image Processing: noise removal, image enhancement, image segmentation, feature extraction, and classification of the features to determine whether a nodule is normal or abnormal. Many such systems have been proposed by researchers, and a number of them have been reviewed in this survey paper. These systems predict and detect cancerous lung nodules using various classifiers, such as Neural Net, Support-Vector-Machine, and D-Tree. Among them, Deep-Learning with Instantaneously Trained Neural-Networks 'DITNN' obtained the highest accuracy in classifying Lung-Cancer nodules (98.42%), using Weighted Mean Histogram Equalization 'WMHE' as the Image Processing Mechanism; the Hopfield Neural Net with Fuzzy C-Mean achieved 98% accuracy using a Thresholding algorithm; and the multi 'SVM' classifier achieved 97% accuracy using the Grey Level Co-Occurrence Matrix 'GLCM' and K-means clustering as Image Processing Techniques.
For future work, I intend to propose an efficient Lung-Cancer detection and prediction system that applies image pre-processing with a weighted median filter, image segmentation with the watershed technique, feature extraction with a swarm intelligence algorithm such as shark smell optimization, and nodule classification with the Light Gradient Boosting Machine (LightGBM).