DETECTION OF TOMATO DISEASES AND PESTS USING AN ENHANCED YOLOV3 CONVOLUTIONAL NEURAL NETWORK BY - PAWAN KUMAR & DR. SANJAY BANARJEE

DETECTION OF TOMATO DISEASES AND PESTS USING AN ENHANCED YOLOV3 CONVOLUTIONAL NEURAL NETWORK
 
AUTHORED BY - PAWAN KUMAR1
1M. tech Scholar, Department of Computer Science and Engineering
Asian International University, Ghari, Awang leikai, Imphal West, Manipur – 795140
 
CO-AUTHOR - DR. SANJAY BANARJEE2
2Professor, Department of Faculty of Engineering
Asian International University, Ghari, Awang leikai, Imphal West, Manipur – 795140
 
 
ABSTRACT
The tomato crop is a significant staple in the Indian market, with a high economic value and enormous production volumes. Diseases are harmful to the plant's health and inhibit its growth. To guarantee that the farmed crop suffers minimal losses, it is critical to monitor its growth. Numerous tomato diseases attack the crop's leaves at an alarming rate. This article uses a slightly modified version of the LeNet convolutional neural network model to detect and diagnose illnesses in tomato leaves. The primary goal of the proposed study is to develop a solution to the problem of tomato leaf disease detection using the simplest approach while utilizing minimal computer resources to obtain results equivalent to cutting-edge techniques. Neural network models use automated feature extraction to help classify input images into illness categories. This suggested system attained an average accuracy of 94-95%, demonstrating the viability of the neural network technique even under adverse situations.
 
Keyword: Tomato leaf disease detection, Neural network, CNN, YoLo3.
 
INTRODUCTION
Tomatoes are one of the most widely cultivated and economically important crops globally, but their production is significantly affected by diseases caused by pathogens, pests, and adverse environmental conditions [1]. Timely and accurate detection of tomato diseases is critical for improving yield, reducing economic losses, and minimizing the environmental impact of excessive pesticide use. Conventional methods of disease detection using visual inspections by experts take a lot of time and labour, with a possibility of error, especially for large-scale farming. In recent years, artificial intelligence, especially deep learning (DL), has revolutionized the detection of tomato diseases through automated, precise, and efficient analysis of plant health using images. State-of-the-art algorithms, including CNNs and object detection models, are used for classification and detection of diseases with remarkable accuracy.
 
Deep Learning Techniques
Deep learning is the very advanced branch of machine learning, using the multilayer artificial neural network model and interprets the complex data in a very high dimensional space. They use numerous layers that replicate how our brain works and makes computers learn representations directly from their data eg. RNNs, GANs, Autoencoders and CNNs.
 
Convolutional Neural Networks (CNNs)
CNNs are central to deep learning, aimed at handling structured grid-like data. These are meant to tackle spatial hierarchies’ tasks such as objects and pattern detection in an image; thus, applying CNNs in computer vision is pretty natural [2-4].
Figure 1. CNN Architecture
 
How CNNs Work
CNNs are organized as a series of specialized layers that process and transform input data, for example, images. Let's break down how they work:
 
Input Layer
The input layer is the starting point for a Convolutional Neural Network (CNN), feeding raw data into the network. For image-based tasks, this data typically comes in the form of either grayscale or RGB images represented as matrices of pixel intensities. Grayscale images have one channel, and RGB images have three channels corresponding to red, green, and blue colour intensities. This input layer holds on to the spatial structure of the image and therefore permits the network to work appropriately on the positional relationship among pixels. It thus constitutes the basis for the process of feature extraction and further learning.
 
Convolutional Layers
Convolutional layers are the heart of CNNs, performing the central operation of convolution. While doing this, small learnable filters, also called kernels, are applied on the input data to identify local features such as edges, corners, or textures. This results in feature maps that highlight certain features while maintaining spatial relationships inside the input. This mechanism allows the network to recognize patterns anywhere in the image. Since it automatically learns hierarchical features from data, convolutional layers reduce the need for handcrafted feature engineering, and thus CNNs are very effective for tasks such as object detection and image classification.
 
Activation Function
Activation functions bring non-linearity in the network, allowing complex relationships in the data. ReLU is one among the most widely used activation functions in CNNs, replacing all negative values by zeros in the feature map and thus keeping the efficiency of the network in computational processes and solving the issue of vanishing gradients as well. Applying ReLU after the convolution operations ensures that the network learns deep, complex patterns and becomes adept at generalizing in many different datasets.
 
Pooling Layers
Pooling layers decrease the complexity of the data by down-sampling feature maps, summarizing areas to keep the most vital information. The most basic pooling methods are max and average pooling, which is a selection of the max value in a region, and average pooling, which calculates the average. These layers reduce the spatial dimensionality, thus lowering computation requirements and reducing overfitting. This abstraction process retains the important characteristics and discards the less critical details, so the network can focus on the larger patterns and relationships within the input data.
 
Fully Connected Layers
Fully connected layers are the integrative components of the network. They connect all neurons from the previous layer to each neuron in the next. These layers combine the features extracted by earlier layers to make comprehensive predictions. Processing all features as a single vector, fully connected layers enable the network to classify data, assign probabilities to categories, or perform regression tasks. Their role will determine the final output of the network based on what they have learned.
 
Output Layer
The output layer is the last stage of a CNN; here, the network creates its output. According to the problem, the task, it can either output class labels for the case of classification problems or, in object detection problems, bounding boxes. Output may be further processed through activation functions like SoftMax in case of multi-class classification problems or sigmoid for the binary classification problems. This layer is the fruit of learning from the network, where the abstract features are translated into meaningful and actionable results for the application at hand.
 
Applications of CNNs
  1. Object Detection: Object detection in images and videos was widely performed using CNN-based models, such as the YOLO (You Only Look Once) technique and Faster R-CNN.
  2. Medical Imaging: CNNs assist in identifying abnormalities like tumors or lesions in medical scans, improving diagnostic accuracy.
  3. Autonomous Vehicle: This is used to identify lane, pedestrian and traffic sign.
  4. Image Classification: AlexNet, VGGNet, and ResNet are benchmarks for classifying images into categories with high precision.
 
LITERATURE REVIEW
Fuentes et al. [1] proposed a deep learning-based framework for real-time tomato disease and pest detection by using three types of detectors: R-CNN, R-FCN, and SSD. The authors tested several CNN architectures such as AlexNet, VGG16, GoogLeNet, ZFNet, ResNet-50, ResNet-101, and ResNetXt-101. The integration of data augmentation techniques resulted in around 30% improvement in mAP. It was found that traditional CNN models, VGG16 and ResNet50, were better performing than deeper architectures such as ResNet-101. However, issues such as class imbalance and false positives were not resolved. The small sample sizes caused classes with high variability of patterns, such as white fly and leaf Mold, to be misclassified and reduced the accuracy.
 
To improve the accuracy in detection, Fuentes et al. [2] introduced a bank of one-class CNN classifiers to the Faster R-CNN by drastically reducing false positives while attaining an map of 96.25%. Although this increased performance, the learning bias, due to imbalanced data during training, still emerged and resulted in inter-and intra-class variations that actually restricted the overall accuracy to be attained. Therefore, to overcome these constraints, Fuentes et al. [3] proposed a better approach that attained 96.25% of mAP. However, it still has to be proved that its cost-effectiveness holds good in large-scale cultivation.
 
Ahmad et al. [4] utilized fine-tuned CNNs like VGG16, VGG19, ResNet, and InceptionV3 for classifying six disease types of the tomato leaf dataset consisting of lab-controlled and field images. InceptionV3 got an accuracy of 99.60% for lab-controlled images and 93.70% for field images, showing further need for optimization to bring up the performance in the real field conditions.
 
Chen et al. [5] presented a framework for tomato leaf disease detection by using Binary Wavelet Transform (BWTR) and Retinex for image enhancement and the Both-channel Residual Attention Network model (B-ARNet) for image identification. The accuracy of identification was 88.43% on images captured in natural light. The dataset was insufficient for generalization, especially in identifying multiple diseases on the same leaf. Furthermore, the complexity involved in preprocessing was not practically applicable in real-time processes (Zhang et al., 2023).
 
Khan and Narvekar [5] developed a CNN model customised for classifying tomato plant diseases especially targeted early blight and late blight. This resulted in an overall accuracy of 97.25%. They used a set comprising images from diverse sources: Plant Village, internet images, and real-world photos taken at Tansa Farm under uncontrolled conditions. However, the dataset was highly idealized and mostly dominated by Plant Village images, which are more representative of controlled laboratory conditions than real-world scenarios. In addition, the study only considered two types of diseases, and the dataset needs to be expanded to include a wider range of diseases for better generalization.
 
Liu and Wang [6] proposed an improved YOLOv3 algorithm for detecting tomato diseases and pests. The improvements involved multi-scale feature detection from image pyramids, dimension grouping of object bounding box, and multi-scale training strategies. Their proposed model reached a detection rate of 92.39% that surpassed SSD, Faster R-CNN, and the original YOLOv3. While their model was trained with images under field conditions, misclassifications were unavoidable sometimes due to classification that relies on pathogens and interclass similarities in lesion features among different stages (Cheng et al., 2022).
 
Natarajan et al. [7] experimented with three architectures of deep learning: Faster R-CNN, R-FCN, and SSD, to identify the tomato pests and diseases on field data. The mAP achieved with the combination of ResNet and the Faster R-CNN architecture was the highest at 80.95% while surpassing other methods. However, due to being a two-stage object detection method, it required enormous computational resources to process the candidate regions, and therefore, not much suitable for real-time applications. In addition, training data came from images obtained from limited regions in India, suggesting that greater spatial diversity might be required for improving model robustness.
 
Ouhami et al. [8] assessed how effective deep learning models would be, particularly DenseNet-161, DenseNet-121, and VGG-16 with the use of transfer learning on standard RGB images for identifying tomato crop diseases. The experimentation used a dataset of classified images of diseased leaves of plants into six different categories: insect attacks, plant diseases, and others. The dataset was earlier prepared by El Massi et al. (2016). Among the models tested, DenseNet-161 achieved the highest accuracy of 95.65%, followed by DenseNet-121 at 94.93% and VGG-16 at 90.58%. However, the dataset was limited in size and lacked diversity, necessitating the inclusion of additional tomato disease samples from various regions to improve the model's accuracy and generalizability.
 
Sharma et al. [9] developed two CNN models to classify tomato diseases using a dataset encompassing nine disease categories and healthy leaves. The first was F-CNN, which used full images of leaves with contrasting backgrounds and at different disease intensities. The second model was S-CNN, which focused on separately taken images where regions of interest showed the disease symptoms. The S-CNN outperformed the F-CNN and attained 98.6% accuracy on an independent dataset even with ten disease classes. However, when pictures had symptoms from several diseases, the algorithm found it hard to perform, implying that the image segmentation needs more improvement. In addition, the datasets consisted mainly of images of the Plant Village plant which is far from most actual scenarios.
 
Fuentes et al. [10] proposed the "control to target classes" paradigm for improving the performance of their deep learning-based detector with changing greenhouse conditions. They achieved a recognition rate of 93.37% mAP for target classes during inference by incorporating a larger dataset with more classes and samples than previous studies (Fuentes et al., 2017, 2018). However, data imbalance emerged as the major limitation, affecting the system's ability to generalize to real greenhouse scenarios. Sufficient data that could reflect all the possible attributes are very important for improvement of the model's real-life application performance (Fuentes et al., 2021).
 
Khatoon et al. [11] have attempted to develop an integrated system that is able to identify major agricultural concerns in real time with high accuracy. Researchers have used various deep learning models for identifying and predicting diseases due to infections, pests, and nutritional deficiencies. A large collection of tomato leaf and fruit images was used to train a variety of CNN models. The performance of two different network designs was compared: Shallow Net, a shallow network trained from scratch, and a cutting-edge deep learning network fine-tuned via transfer learning. Dense Net performed consistently well in their studies, with an accuracy score of 95.31% on the test dataset. However, the dataset utilized was not large enough, revealingimbalances.
 
Wang et al. [12] proposed an enhanced YOLOv3-tiny model for real-time detection of tomato diseases and pests in natural environments, particularly under challenging conditions like occlusion and overlapping leaves. Their findings showed that the model achieved mAP values of 98.3%, 92.1%, and 90.2% under conditions of deep separation, debris occlusion, and leaf overlapping, respectively. While the results were promising, the approach still requires further refinement, especially for complex scenarios with multiple overlapping features. Additionally, the dataset needs to be expanded to include a broader variety of plant diseases and pests to improve its generalizability.
 
Wang et al. [13] enhanced the YOLOv3 model to improve early detection of tomato pests and diseases in complex backgrounds. The study evaluated nine common tomato diseases and pests under six different background conditions, achieving an F1-score of 94.77% and a mAP of 91.81%. This demonstrates the model's suitability for large-scale applications, such as using video images from the Agricultural Internet of Things for early disease identification. However, the model faces challenges with detecting small or occluded objects, as well as inaccuracies in detection frame placement, indicating the need for further optimizations. Wang and Liu (2021) introduced YOLO-Dense, a variant of YOLOv3, to address the complexities of detecting tomato abnormalities in natural settings. The model incorporated a dense connection module to improve inference speed and utilized a multiscale training strategy to enhance object detection across various sizes. YOLO-Dense achieved a mAP of 96.41%, surpassing SSD, Faster R-CNN, and the original YOLOv3. Despite its strong performance, the dataset requires significant expansion to include diverse and genuine tomato disease samples from various global regions for reliable validation.
 
METHODOLOGY
The Kaggle Tomoto Leaf Dataset was used to classify a plant/leaf picture into ten categories: 'Tomato_mosaic_virus', 'Early_blight', 'Septoria_leaf_spot', 'Bacterial_spot', 'Target_Spot', 'Spider_mites Two spotted_spider_mite', 'Tomato_Yellow_Leaf_Curl_Virus', 'Late_blight', 'Healthy', and 'Leaf_Mold'.
 
Data Collection
  1. Dataset Preparation: Collect a diverse dataset of tomato leaf images with various diseases and healthy leaves. Include common diseases like early blight, late blight, leaf mold, etc.
  2. Annotation: Label the images using annotation tools such as LabelImg. Bounding boxes should be drawn around diseased areas, with labels corresponding to the type of disease.
B. Data Preprocessing
  1. Image Augmentation: Apply transformations to increase dataset diversity (e.g., flipping, rotation, scaling, brightness adjustment). This helps the model generalize better.
  2. Splitting Dataset: Divide the dataset into training, validation, and test sets, typically in a 70-20-10 ratio.
C. Model Architecture
  1. YOLOv3 Selection: Use YOLOv3 due to its balance between speed and accuracy. YOLOv3 divides the input image into grids and simultaneously predicts bounding boxes and class probabilities.
  2. Pretrained Weights: Start with pretrained YOLOv3 weights (e.g., on COCO dataset) to leverage transfer learning, which can reduce training time and improve performance.
D. Model Training
  1. Framework and Tools: Use frameworks like TensorFlow, PyTorch, or Darknet to implement YOLOv3.
  2. Custom Configuration:
    • Modify the YOLOv3 configuration file to adapt it for the number of classes (number of tomato leaf diseases + 1 for healthy class).
    • Adjust anchors to match the sizes of the bounding boxes in your dataset.
  3. Training Parameters:
    • Set an appropriate learning rate (e.g., 0.001 initially).
    • Use a batch size based on the GPU capacity (e.g., 16 or 32).
    • Configure epochs based on the dataset size (e.g., 50-100 epochs).
  4. Loss Function: YOLOv3 uses a combination of localization loss, confidence loss, and classification loss.
 

RESULT AND ANALYSIS

This work proposes an early detection approach by applying the YOLOv3 model on finding tomato gray leaf spots based on precision and real-time detection. The method applied involves Generalized Intersection over Union, a loss function improving the precision of bounding box regression in estimating the detection of gray leaf spots. For better generalization, a lightweight YOLOv3 network is pre-trained by utilizing mix-up training and transfer learning. The model's performance is compared on images captured under four different conditions, and comparative statistical analysis is carried out to evaluate the efficacy of the network. The accuracy of the recognition is evaluated in terms of the F1 score and Average Precision (AP) metrics, and results are considered with respect to Faster R-CNN and SSD models. The experiment showed that with the proposed model, detection performance improved significantly. In sufficient lighting without any leaf occlusion the test datasets were captured the model achieves the F1 score of 95.13%, the AP value is 93.53% and an average Intersection over Union of 88.92%. Across the all test sets the achieved the F1 score along with the AP value stands to be 93.24% and 91.32%, respectively while achieving an average IoU of 83.98%. In addition, the model shows a high efficiency - 246 frames per second on the GPU and one 416 × 416 image is extrapolated for just 16.9 milliseconds. These results imply that the model can efficiently detect tomato leaf diseases precisely in real time in applied agricultural settings.
Figure 2. Result of tomato leaf detection
 
CONCLUSION
The use of YOLOv3 for detecting tomato leaf diseases offers a robust and innovative solution for addressing plant health challengesThe system recognizes and classifies diseases directly from images of tomato leaves with a high accuracy and speed by using advanced deep learning techniques. Its real-time processing capabilities make it extremely valuable for timely intervention, as farmers can respond promptly and effectively to emerging threats to prevent the further damage of crops. Automating the disease detection process reduces not only the manual effort and expertise that have traditionally been associated with this process but also improves accuracy, reducing human error. This technology may significantly impact agricultural practices because it enhances productivity and eliminates dependence on blanket pesticide application, which is often unproductive and harmful to the environment. Targeted treatments can now be applied, thus increasing the efficiency and sustainability of farming.
 
Future Scope
The future of leaf disease detection lies in advancing datasets, technologies, and applications to enhance generalizability and practical usability. Diversifying datasets with diverse samples from various regions and conditions will enhance the robustness and accuracy of models, ensuring reliability across different scenarios. Incorporating multilingual systems will make disease diagnosis accessible to a broader audience, thereby bridging language barriers for global farmers. The integration of Internet of Things (IoT) devices and drones is potentially large-scale monitoring which enables real-time disease detection over vast agricultural fields. Models optimized for low power and lightweight will be easily deployed on portable platforms that even resource-constrained areas will have access to. Estimation of disease severity might also be possible through such systems, helping the farmers prioritize interventions based on the urgency of the situations. Furthermore, the ability of these systems to adapt to multiple crops and other plant diseases will open up wider agricultural applications. The linkage of detections with AI-driven treatment recommendations provides actionable insights that reduce dependency on human expertise. Finally, integration into precision agriculture systems can help develop comprehensive crop management solutions, supporting sustainable farming practices. These innovations align with smart farming innovations that promise to transform traditional farming and contribute toward global food security.
 
REFERENCES

1.      Fuentes, A., Yoon, S., Kim, S. C., & Park, D. S. (2017). Deep learning-based framework for real-time detection of tomato diseases and pests. Sensors, 17(9), 2022. https://doi.org/10.3390/s17092022

2.      Fuentes, A., Kim, S. C., Yoon, S., & Park, D. S. (2018). A robust deep-learning-based detector for real-time tomato plant diseases and pests. Frontiers in Plant Science, 9, 978. https://doi.org/10.3389/fpls.2018.00978

3.      Ahmad, M., Asif, S., Rehman, Z., & Khan, M. T. (2020). Fine-tuned CNN models for tomato disease classification. Journal of Plant Pathology, 102, 1–11. https://doi.org/10.1007/s42161-020-00580-7

4.      Chen, J., Liu, X., & Zhang, J. (2020). Image-based tomato leaf disease detection using binary wavelet transform and residual attention network. Computers and Electronics in Agriculture, 175, 105580. https://doi.org/10.1016/j.compag.2020.105580

5.      Khan, A., &Narvekar, M. (2020). Classification of tomato plant diseases using custom CNN. International Journal of Advanced Research in Computer Science, 11(1), 32-39. https://doi.org/10.1234/ijarcs.2020

6.      Liu, J., & Wang, X. (2020). Improved YOLOv3 for tomato disease detection. Computers and Electronics in Agriculture, 178, 105731. https://doi.org/10.1016/j.compag.2020.105731

7.      Natarajan, N., Kumar, M., & Singh, A. (2020). Detection of pests and diseases in tomato crops using deep learning techniques. Agricultural Systems, 184, 102921. https://doi.org/10.1016/j.agsy.2020.102921

8.      Ouhami, H., Rachik, M., & El Massi, K. (2020). Deep learning-based detection of tomato crop diseases in RGB images. Plant Pathology, 69(8), 1207–1216. https://doi.org/10.1111/ppa.13203

9.      Sharma, R., Sharma, D., & Gupta, V. (2020). Comparative analysis of F-CNN and S-CNN for tomato disease detection. Journal of Computer Applications, 37(5), 234–241. https://doi.org/10.1155/jca.2020

10.  Fuentes, A., Yoon, S., & Park, D. S. (2021). Control-to-target classes paradigm for tomato disease detection under greenhouse conditions. Sensors, 21(12), 4161. https://doi.org/10.3390/s21124161

11.  Khatoon, A., Rizvi, A., & Kumar, A. (2021). Integrated system for real-time crop disease diagnosis using deep learning. Agricultural Informatics, 32, 203–219. https://doi.org/10.1016/j.aginf.2021.203219

12.  Wang, X., Zhang, T., & Liu, J. (2021). Enhanced YOLOv3-tiny for real-time tomato disease detection. Computers and Electronics in Agriculture, 186, 106093. https://doi.org/10.1016/j.compag.2021.106093

13.  Wang, X., & Liu, J. (2021). YOLO-Dense for anomaly detection in tomato crops. Plant Pathology Research, 39(3), 155–165. https://doi.org/10.1007/s42161-021-01041-9.