Mangrove Forest Classification in Drone Images Using HSV Color Moment and Haralick Features Extraction with K-Nearest Neighbor

Uncontrolled changes in mangrove forest area are a growing problem. Mangrove forests are sometimes cut down and converted to aquaculture, agriculture, or development projects [1]. Other losses are caused by natural disasters, anthropogenic forces, and uncontrolled population growth [2]. Yet mangrove forests provide habitat for several species of mammals, birds, crustaceans, fish, and others. Apart from being an animal habitat, mangroves can reduce the impact of sea waves, tsunamis, and wind erosion [3]. In addition, mangrove deforestation releases a large amount of carbon into the atmosphere. Management and supervision of mangrove forests are therefore needed to maintain their function, and can also be used to estimate carbon stocks [4] [5].

Because of the problems caused by uncontrolled changes in mangrove forests, management and supervision of forest function are required. The form of mangrove forest management carried out in this study is measuring the area of mangrove forests by observing them with drones, or crewless aircraft. Drones are used to take photos because they can capture vast mangrove forests at high resolution. The drone was flown above the mangrove forest and took several photos. The method used in this study extracts color features using the mean, standard deviation, and skewness in the HSV color space, and extracts texture features using Haralick features. The classification method is k-nearest neighbor. This study conducted three tests: the accuracy of the system, the distance method used in the k-nearest neighbor classifier, and the k value. Three conclusions were obtained from these tests. First, the classification system produces an accuracy of 84%. Second, the distance method used in the k-nearest neighbor classifier influences the accuracy of the system; the Euclidean distance method produces the highest accuracy, 84%. Third, the k value used in the k-nearest neighbor classifier influences the accuracy of the system; k = 3 produces the highest accuracy, 84%.
The form of mangrove forest management carried out in this study is measuring the area of mangrove forests. Because a mangrove area consists not only of mangrove trees but also of land and water, the area covered by mangroves alone must be calculated to obtain a more accurate value. In addition, distinguishing the extent of natural mangrove forests from that of planted mangrove forests helps researchers measure the success of mangrove planting.
The research observes mangrove forests using drones, or crewless aircraft. The drone was flown above the mangrove forest and took several photos. The photos are used as data to calculate the area of mangrove forests [6]. In each photo, the areas of natural mangroves, planted mangroves, land, and water are calculated. Drones are used to take photos because they can capture vast mangrove forests at high resolution [7]. Drones are preferred over manual surveys because they are more efficient in cost and time [8].
This system was created to assist researchers in measuring the area of mangrove forests in the images taken. The system can classify natural mangrove forests, planted mangrove forests, land areas, and water areas. Each image is classified part by part, so that natural mangrove forests, planted mangrove forests, land areas, and water areas can be distinguished. To classify drone images automatically, the characteristics of each of these four classes must be recognized. Characteristics that can be used as differentiators are color, shape, and texture [9] [10].
In previous studies, the color representation found suitable for forest imagery was HSV (Hue, Saturation, Value). The Hue value is the pure color value and is represented by a number from 0 to 360. The Saturation value shows the percentage of gray in the color and is represented by a decimal number from 0 to 1. The Value is the brightness and is also represented by a decimal number from 0 to 1 [11]. Three parameters can be used to describe color characteristics, namely mean, standard deviation, and skewness. In addition to color characteristics, texture characteristics can also be used. One method that can be used is Haralick features. In a previous study, Haralick texture features were used to extract features from high-resolution satellite imagery, resulting in an accuracy of 93.29% [12].
In addition to extracting the values of color and texture features, classification is also needed. There are several classification methods, such as K-Nearest Neighbor, Support Vector Machine (SVM), Naive Bayes, and others [13]. According to previous research, the k-nearest neighbor method is highly time-efficient and uses less memory [14]. Previous research also reports that the k-nearest neighbor method produces higher accuracy than the SVM method when classifying with Haralick texture features [12].
Based on the need for management and supervision of mangrove forest areas and on previous studies, this study develops an application that classifies mangrove trees in drone images using HSV color features and Haralick texture features with the k-nearest neighbor classification method.

Related Work
There are several previous studies relating to the feature extraction method and the classification method used in this study. These studies are used as a reference in solving problems in this study.
The first study concerns the recognition of chicken embryo development images using GLCM texture features with the k-nearest neighbor method. This study classifies images of chicken embryo development and uses the GLCM to extract texture features. The GLCM is built in four angular directions, namely 0°, 45°, 90°, and 135°. The features extracted from the GLCM are homogeneity, contrast, correlation, energy, and entropy. The classification method is k-nearest neighbor with the Euclidean distance. The highest accuracy obtained in this study was 93.33% [15].
The second study examines the effect of color models on the performance of Content-based Image Retrieval (CBIR) systems based on color moments. It measures how the color space in which the color moments are taken affects accuracy. The color features are the mean, standard deviation, and skewness. The study conducted three tests on different color spaces, namely the HSV color space, the RGB color space, and the combined RGB and HSV color space. The best result was an accuracy of 91% in the HSV color space. The combined RGB and HSV color space produced an accuracy of 90%, and the RGB color space produced an accuracy of 88% [16].
The third study concerns the classification of satellite images using the SVM classification method. This study classifies objects captured in satellite images using color features and Haralick texture features. The Haralick texture features used are homogeneity, contrast, correlation, local homogeneity, and entropy. The study conducted several tests to measure the effect of color features and texture features on accuracy. The best result, an accuracy of 93.68%, was obtained using both color and texture features. Using only the color features gave an accuracy of 83.19%, and using only the texture features gave an accuracy of 87.27% [17].
The fourth study concerns the diagnosis of glaucoma using Haralick texture features. This study uses retinal images to determine whether a patient has glaucoma. The features used are the 14 Haralick texture features: angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, the two information measures of correlation, and maximum correlation coefficient. The classification method is k-nearest neighbor. The study obtained an accuracy above 98% in diagnosing retinal images [18].
Based on the four previous studies, the most suitable texture feature extraction method for classification in satellite imagery is Haralick features, as evidenced by the fourth study. The most appropriate color feature is the HSV color moment, as evidenced by the second study. The appropriate classification method is k-nearest neighbor, based on a comparison of the accuracy levels in the third and fourth studies. Details of the studies used in this paper can be seen in Table 1.

HSV Color Moment Feature Extraction
Color features are an essential feature of images because colors can be seen visually by the human eye [18] [19]. The color feature has three parameters that can be used for value extraction, which are mean, standard deviation, and skewness. One technique that can be used to extract color feature values is to use a color histogram. Color histograms often have noise values that reduce accuracy, so a color moment technique is needed to overcome them [20].
The Hue value is the true color value and is represented by a number from 0 to 360. The Saturation value shows the percentage of gray in the color and is represented by a decimal number from 0 to 1. The Value is the brightness and is represented by a decimal number from 0 to 1 [21]. The formula for calculating the Hue value can be seen in Equation 1 [22] [23].
Where Max is the maximum of the red, green, and blue color values, Min is the minimum of the red, green, and blue color values, R is the red color value, G is the green color value, and B is the blue color value.
The formula for calculating the Saturation value can be seen in Equation 2 [22] [23].
The formula for calculating the Value can be seen in Equation 3 [22] [23].
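For reference, a commonly used form of the RGB-to-HSV conversion is given below. It assumes that R, G, and B are scaled to [0, 1] and introduces Δ = Max − Min as shorthand; the notation in the cited sources [22] [23] may differ in detail.

```latex
\begin{align}
\Delta &= \mathit{Max} - \mathit{Min} \\
H &= \begin{cases}
0^\circ & \text{if } \Delta = 0 \\
60^\circ \times \left( \dfrac{G - B}{\Delta} \bmod 6 \right) & \text{if } \mathit{Max} = R \\
60^\circ \times \left( \dfrac{B - R}{\Delta} + 2 \right) & \text{if } \mathit{Max} = G \\
60^\circ \times \left( \dfrac{R - G}{\Delta} + 4 \right) & \text{if } \mathit{Max} = B
\end{cases} \\
S &= \begin{cases}
0 & \text{if } \mathit{Max} = 0 \\
\dfrac{\Delta}{\mathit{Max}} & \text{otherwise}
\end{cases} \\
V &= \mathit{Max}
\end{align}
```

With this form, H falls in the range 0 to 360 and S and V fall in the range 0 to 1, which matches the ranges stated above.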
The color moment is a representation of the color characteristics stored in the image. Three moments are extracted: mean, standard deviation, and skewness [24] [25]. The mean is the average color value of the image, the standard deviation is the square root of the variance, and the skewness measures the degree of asymmetry. The formula for calculating the mean can be seen in Equation 4 [26]. The formula for calculating the standard deviation can be seen in Equation 5 [26]. The formula for calculating the skewness can be seen in Equation 6 [26].
Where Pij is the value of the i-th color component at the j-th pixel, and N is the number of pixels in the image.
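The three moments can be illustrated with a minimal Python sketch. It assumes the common definitions of color moments (mean, square root of the second central moment, and signed cube root of the third central moment, each normalized by N); the normalization in Equations 4 to 6 may differ. The function name `hsv_color_moments` and the 0 to 255 input range are assumptions made for illustration, and the conversion relies on the standard library `colorsys` module.

```python
import colorsys
import math


def hsv_color_moments(rgb_pixels):
    """Return [mean, std, skewness] for H, S, and V (9 values in total).

    rgb_pixels: sequence of (R, G, B) tuples with components in 0..255
    (an assumed input format).
    """
    # colorsys returns H, S, V in 0..1, so H is rescaled to degrees
    # (0..360) to match the ranges used in the text.
    hsv = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hsv.append((h * 360.0, s, v))

    n = len(hsv)
    features = []
    for channel in range(3):
        values = [pixel[channel] for pixel in hsv]
        mean = sum(values) / n
        second = sum((x - mean) ** 2 for x in values) / n
        third = sum((x - mean) ** 3 for x in values) / n
        std = math.sqrt(second)
        # Signed cube root, so that negative skew stays negative.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, std, skew])
    return features
```

For example, `hsv_color_moments([(255, 0, 0), (0, 255, 0)])` describes one pure red pixel and one pure green pixel. Note that this sketch treats hue as an ordinary number and ignores its circular nature.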

Haralick Features Extraction
Texture features are visual patterns that show homogeneity in the image. The texture feature has essential information regarding the pattern of image structure and its relationship to the environment around the image [27]. Some techniques that can be used to extract texture features are Local Binary Pattern (LBP), Gray-level Co-occurrence Matrix (GLCM), and Haralick features [28] [29] [30].
Haralick features are one method that can be used to extract texture features from images. Haralick features are computed from the co-occurrence matrix, which stores how often pairs of gray levels occur next to each other. According to previous research, five Haralick features can be used, namely Contrast, Correlation, Energy, Entropy, and Local Homogeneity. The co-occurrence matrix is built at angles of 0°, 45°, 90°, and 135°, and the five features are calculated for each angle [31]. The formula to calculate the value of the five features can be seen in
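The following Python sketch illustrates how a co-occurrence matrix and the five features could be computed. Several choices here are assumptions rather than the paper's stated method: a pixel distance of 1, a symmetric and normalized matrix, energy taken as the sum of squared entries, entropy with the natural logarithm, and local homogeneity taken as the inverse difference moment. The names `glcm` and `haralick_five` are hypothetical.

```python
import math

# Pixel offsets (row, column) for the four angles named in the text.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, angle):
    """Symmetric, normalized co-occurrence matrix at distance 1.

    image: 2-D list of integer gray levels in 0..levels-1.
    """
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def haralick_five(p):
    """Contrast, correlation, energy, entropy, homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined for a constant image; 0.0 is used here.
        "correlation": covariance / (sd_i * sd_j) if sd_i * sd_j > 0 else 0.0,
        "energy": sum(v * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for i, j, v in cells if v > 0),
        "homogeneity": sum(v / (1.0 + (i - j) ** 2) for i, j, v in cells),
    }
```

Calling `haralick_five(glcm(image, levels, angle))` for each of the four angles gives 20 texture values per image, which could be concatenated or averaged across angles; the text does not say which.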

K-Nearest Neighbor
The k-nearest neighbor algorithm is one of the classification algorithms used in pattern recognition [32]. It assigns a sample to the class that has the lowest distance value, or the highest similarity value, compared with the other classes. The algorithm has the advantages of being fast, easy to learn, robust against noise, and effective for large training sets. Its weakness is that it produces low accuracy if the training data contain irrelevant values [14]. The k-nearest neighbor method requires a way to measure distance or similarity, such as Average Distance, Euclidean Distance, Manhattan Distance, or Maximum Distance. The formulas for Average Distance, Euclidean Distance, Manhattan Distance, and Maximum Distance are given in Equations 12, 13, 14, and 15, respectively.
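A minimal Python sketch of k-nearest neighbor classification with three of these distance measures is shown below. It uses the standard definitions of the Euclidean, Manhattan, and Maximum (Chebyshev) distances. Average Distance is left out because its formula is not given in this text. The function names and the `(feature_vector, label)` training format are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum_distance(a, b):
    """Largest absolute difference over all features."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=3, distance=euclidean_distance):
    """Return the majority label among the k training samples nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the distance function is passed as a parameter, the same classifier can be run once per distance method, which mirrors the comparison carried out in the tests. In practice the features would usually be scaled to comparable ranges first, since hue (0 to 360) would otherwise dominate values in the range 0 to 1.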

Proposed Method
The steps of the method used in this study are as follows:
1. Input the training and testing data. The data used in this study are divided into four classes, namely natural, replanted, soil, and river. Natural is mangrove forest that grows naturally, replanted is mangrove forest planted by humans, soil is an area of land, and river is an area of water. Examples of the data used in this study can be seen in Figure 1.
2. Extract the HSV color moment features and the Haralick texture features.
3. Classify by applying the K-Nearest Neighbor method.

These steps combine the methods used in the source studies listed in Table 1. Combining these methods is expected to give better accuracy in image classification.

Testing for Distance Methods
The accuracy of the distance methods is tested using four scenarios:
• The first scenario classifies with K-Nearest Neighbor using the Average Distance method.
• The second scenario classifies with K-Nearest Neighbor using the Euclidean Distance method.
• The third scenario classifies with K-Nearest Neighbor using the Manhattan Distance method.
• The fourth scenario classifies with K-Nearest Neighbor using the Maximum Distance method.
The test was carried out using 20 training data and 1 test data. Each test image is divided into 25 equal parts, so there are 25 classifications. The k value used in the k-nearest neighbor classification method is 3.
The results of system testing using the Average Distance method can be seen in Table 2. Based on Table 2, the Average Distance method classified 17 of the 25 images correctly, consisting of 7 natural class images, 9 replanted class images, 1 soil class image, and 0 river class images. The accuracy of this test is 68%.
Table 3 shows the results of system testing using the Euclidean Distance method. Based on Table 3, the Euclidean Distance method classified 21 of the 25 images correctly, consisting of 7 natural class images, 12 replanted class images, 2 soil class images, and 0 river class images. The accuracy of this test is 84%.
Table 4 shows the results of system testing using the Manhattan Distance method. Based on Table 4, the Manhattan Distance method classified 18 of the 25 images correctly, consisting of 7 natural class images, 9 replanted class images, 2 soil class images, and 0 river class images. The accuracy of this test is 72%.
The results of system testing using the Maximum Distance method can be seen in Table 5. Based on Table 5, the Maximum Distance method classified 12 of the 25 images correctly, consisting of 7 natural class images, 5 replanted class images, 0 soil class images, and 0 river class images. The accuracy of this test is 48%.
The four tests above produce different results. The accuracy produced by the Average Distance, Euclidean Distance, Manhattan Distance, and Maximum Distance methods can be seen in Figure 2.

Fig. 2. Comparison of Accuracy of Distance Method
Based on the system accuracy shown in Figure 2, the Euclidean Distance method produces higher accuracy than the Average Distance, Manhattan Distance, and Maximum Distance methods. The highest accuracy produced is 84%. This test shows that the distance method used in the k-nearest neighbor classification method influences the accuracy of this classification system.
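The accuracy figures follow directly from the counts of correctly classified images reported for each distance method. The short Python sketch below reproduces that arithmetic; the variable names are chosen for illustration.

```python
# Correctly classified images out of 25, as reported for each method.
correct_by_method = {
    "Average": 17,
    "Euclidean": 21,
    "Manhattan": 18,
    "Maximum": 12,
}
TOTAL_IMAGES = 25

# Accuracy in percent = correct / total * 100.
accuracy = {
    method: 100.0 * correct / TOTAL_IMAGES
    for method, correct in correct_by_method.items()
}
best_method = max(accuracy, key=accuracy.get)

for method, value in accuracy.items():
    print(f"{method}: {value:.0f}%")
print("Best:", best_method)
```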

Testing for K-Value
The effect of the k value on the accuracy of the system is tested using k values of 3, 5, 7, and 9. The distance method used is Euclidean Distance. This test uses 20 training data and 3 test data. Each test image is divided into 25 equal parts, so there are 75 classifications.
The results of testing the system using k = 3 can be seen in Table 6. Based on Table 6, with k = 3, 63 of the 75 images were classified correctly, consisting of 18 natural class images, 37 replanted class images, 4 soil class images, and 4 river class images. The accuracy of this test is 84%.
The results of testing the system using k = 5 can be seen in Table 7. Based on Table 7, with k = 5, 55 of the 75 images were classified correctly, consisting of 16 natural class images, 33 replanted class images, 3 soil class images, and 3 river class images. The accuracy of this test is 74.67%.
The results of testing the system using k = 7 can be seen in Table 8. Based on Table 8, with k = 7, 53 of the 75 images were classified correctly, consisting of 15 natural class images, 32 replanted class images, 3 soil class images, and 3 river class images. The accuracy of this test is 70.67%.
The results of testing the system using k = 9 can be seen in Table 9.
Table 9. Confusion Matrix Testing K = 9
Based on Table 9, with k = 9, 47 of the 75 images were classified correctly, consisting of 11 natural class images, 30 replanted class images, 3 soil class images, and 3 river class images. The accuracy of this test is 62.67%.
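The division of each test image into 25 equal parts can be sketched in Python as follows. The 5 × 5 grid layout is an assumption, since the text only states that there are 25 equal parts, and the function name `split_into_tiles` is hypothetical. Pixels that do not fit evenly into the grid are dropped in this sketch.

```python
def split_into_tiles(image, grid=5):
    """Split a 2-D list of pixels into grid x grid equal tiles (row-major)."""
    rows, cols = len(image), len(image[0])
    tile_h, tile_w = rows // grid, cols // grid
    tiles = []
    for gr in range(grid):
        for gc in range(grid):
            # Take the rows of this tile, then the columns within each row.
            tile = [
                row[gc * tile_w:(gc + 1) * tile_w]
                for row in image[gr * tile_h:(gr + 1) * tile_h]
            ]
            tiles.append(tile)
    return tiles


# Example: a 10 x 10 image whose pixel value encodes its position.
example_image = [[r * 10 + c for c in range(10)] for r in range(10)]
example_tiles = split_into_tiles(example_image)
```

Each tile would then be passed through feature extraction and classified separately, so three test images give 3 × 25 = 75 classifications.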

The four tests above produce different results. The accuracy produced by each k value can be seen in Table 10. Based on Table 10, testing with k = 3 gives higher accuracy than testing with k = 5, k = 7, or k = 9. The highest accuracy produced is 84%. This test shows that the value of k influences the accuracy of this classification system.

Conclusion
Based on the testing that has been done, the following conclusion is obtained: the classification system using HSV color moment features and Haralick texture features with the k-nearest neighbor classification method produces an accuracy of 84% using the Euclidean distance with k = 3.
With good accuracy, it is expected that this system can be used in managing mangrove forest areas in order to maintain the function of mangrove forests and avoid undesirable things, such as