DETECTION OF TOMATO DISEASES AND PESTS USING AN ENHANCED YOLOV3 CONVOLUTIONAL NEURAL NETWORK BY - PAWAN KUMAR & DR. SANJAY BANARJEE
DETECTION
OF TOMATO DISEASES AND PESTS USING AN ENHANCED YOLOV3 CONVOLUTIONAL NEURAL
NETWORK
AUTHORED BY
- PAWAN KUMAR1
1M. tech Scholar, Department of Computer Science and
Engineering
Asian International University,
Ghari, Awang leikai, Imphal West, Manipur – 795140
CO-AUTHOR -
DR. SANJAY BANARJEE2
2Professor, Department of Faculty of Engineering
Asian International University,
Ghari, Awang leikai, Imphal West, Manipur – 795140
ABSTRACT
The tomato crop is a significant
staple in the Indian market, with a high economic value and enormous production
volumes. Diseases are harmful to the plant's health and inhibit its growth. To
guarantee that the farmed crop suffers minimal losses, it is critical to
monitor its growth. Numerous tomato diseases attack the crop's leaves at an
alarming rate. This article uses a slightly modified version of the LeNet
convolutional neural network model to detect and diagnose illnesses in tomato
leaves. The primary goal of the proposed study is to develop a solution to the
problem of tomato leaf disease detection using the simplest approach while
utilizing minimal computer resources to obtain results equivalent to
cutting-edge techniques. Neural network models use automated feature extraction
to help classify input images into illness categories. This suggested system
attained an average accuracy of 94-95%, demonstrating the viability of the
neural network technique even under adverse situations.
Keyword: Tomato leaf disease
detection, Neural
network, CNN, YoLo3.
INTRODUCTION
Tomatoes are one of the most widely
cultivated and economically important crops globally, but their production is
significantly affected by diseases caused by pathogens, pests, and adverse
environmental conditions [1]. Timely and accurate detection of tomato diseases
is critical for improving yield, reducing economic losses, and minimizing the
environmental impact of excessive pesticide use. Conventional methods of
disease detection using visual inspections by experts take a lot of time and labour,
with a possibility of error, especially for large-scale farming. In recent
years, artificial intelligence, especially deep learning (DL), has
revolutionized the detection of tomato diseases through automated, precise, and
efficient analysis of plant health using images. State-of-the-art algorithms,
including CNNs and object detection models, are used for classification and
detection of diseases with remarkable accuracy.
Deep Learning Techniques
Deep learning is the very advanced
branch of machine learning, using the multilayer artificial neural network
model and interprets the complex data in a very high dimensional space. They
use numerous layers that replicate how our brain works and makes computers
learn representations directly from their data eg. RNNs, GANs, Autoencoders and
CNNs.
Convolutional Neural Networks (CNNs)
CNNs are central to deep learning,
aimed at handling structured grid-like data. These are meant to tackle spatial
hierarchies’ tasks such as objects and pattern detection in an image; thus,
applying CNNs in computer vision is pretty natural [2-4].
Figure 1. CNN Architecture
How CNNs Work
CNNs are organized as a series of
specialized layers that process and transform input data, for example, images.
Let's break down how they work:
Input Layer
The input layer is the starting point
for a Convolutional Neural Network (CNN), feeding raw data into the network.
For image-based tasks, this data typically comes in the form of either
grayscale or RGB images represented as matrices of pixel intensities. Grayscale
images have one channel, and RGB images have three channels corresponding to
red, green, and blue colour intensities. This input layer holds on to the
spatial structure of the image and therefore permits the network to work
appropriately on the positional relationship among pixels. It thus constitutes
the basis for the process of feature extraction and further learning.
Convolutional Layers
Convolutional layers are the heart of
CNNs, performing the central operation of convolution. While doing this, small
learnable filters, also called kernels, are applied on the input data to
identify local features such as edges, corners, or textures. This results in
feature maps that highlight certain features while maintaining spatial
relationships inside the input. This mechanism allows the network to recognize
patterns anywhere in the image. Since it automatically learns hierarchical
features from data, convolutional layers reduce the need for handcrafted
feature engineering, and thus CNNs are very effective for tasks such as object
detection and image classification.
Activation Function
Activation functions bring
non-linearity in the network, allowing complex relationships in the data. ReLU
is one among the most widely used activation functions in CNNs, replacing all
negative values by zeros in the feature map and thus keeping the efficiency of
the network in computational processes and solving the issue of vanishing
gradients as well. Applying ReLU after the convolution operations ensures that
the network learns deep, complex patterns and becomes adept at generalizing in
many different datasets.
Pooling Layers
Pooling layers decrease the
complexity of the data by down-sampling feature maps, summarizing areas to keep
the most vital information. The most basic pooling methods are max and average
pooling, which is a selection of the max value in a region, and average
pooling, which calculates the average. These layers reduce the spatial
dimensionality, thus lowering computation requirements and reducing
overfitting. This abstraction process retains the important characteristics and
discards the less critical details, so the network can focus on the larger
patterns and relationships within the input data.
Fully Connected Layers
Fully connected layers are the
integrative components of the network. They connect all neurons from the
previous layer to each neuron in the next. These layers combine the features
extracted by earlier layers to make comprehensive predictions. Processing all
features as a single vector, fully connected layers enable the network to
classify data, assign probabilities to categories, or perform regression tasks.
Their role will determine the final output of the network based on what they
have learned.
Output Layer
The output layer is the last stage of
a CNN; here, the network creates its output. According to the problem, the
task, it can either output class labels for the case of classification problems
or, in object detection problems, bounding boxes. Output may be further
processed through activation functions like SoftMax in case of multi-class
classification problems or sigmoid for the binary classification problems. This
layer is the fruit of learning from the network, where the abstract features
are translated into meaningful and actionable results for the application at
hand.
Applications of CNNs
- Object Detection: Object detection in images and videos was widely
performed using CNN-based models, such as the YOLO (You Only Look Once)
technique and Faster R-CNN.
- Medical Imaging: CNNs assist in identifying abnormalities like tumors or
lesions in medical scans, improving diagnostic accuracy.
- Autonomous Vehicle: This is used to identify lane, pedestrian and
traffic sign.
- Image Classification: AlexNet, VGGNet, and ResNet are benchmarks for
classifying images into categories with high precision.
LITERATURE REVIEW
Fuentes et al. [1] proposed a deep
learning-based framework for real-time tomato disease and pest detection by
using three types of detectors: R-CNN, R-FCN, and SSD. The authors tested
several CNN architectures such as AlexNet, VGG16, GoogLeNet, ZFNet, ResNet-50,
ResNet-101, and ResNetXt-101. The integration of data augmentation techniques
resulted in around 30% improvement in mAP. It was found that traditional CNN
models, VGG16 and ResNet50, were better performing than deeper architectures
such as ResNet-101. However, issues such as class imbalance and false positives
were not resolved. The small sample sizes caused classes with high variability
of patterns, such as white fly and leaf Mold, to be misclassified and reduced
the accuracy.
To improve the accuracy in detection,
Fuentes et al. [2] introduced a bank of one-class CNN classifiers to the Faster
R-CNN by drastically reducing false positives while attaining an map of 96.25%.
Although this increased performance, the learning bias, due to imbalanced data
during training, still emerged and resulted in inter-and intra-class variations
that actually restricted the overall accuracy to be attained. Therefore, to
overcome these constraints, Fuentes et al. [3] proposed a better approach that
attained 96.25% of mAP. However, it still has to be proved that its
cost-effectiveness holds good in large-scale cultivation.
Ahmad et al. [4] utilized fine-tuned
CNNs like VGG16, VGG19, ResNet, and InceptionV3 for classifying six disease
types of the tomato leaf dataset consisting of lab-controlled and field images.
InceptionV3 got an accuracy of 99.60% for lab-controlled images and 93.70% for
field images, showing further need for optimization to bring up the performance
in the real field conditions.
Chen et al. [5] presented a framework
for tomato leaf disease detection by using Binary Wavelet Transform (BWTR) and
Retinex for image enhancement and the Both-channel Residual Attention Network
model (B-ARNet) for image identification. The accuracy of identification was
88.43% on images captured in natural light. The dataset was insufficient for
generalization, especially in identifying multiple diseases on the same leaf.
Furthermore, the complexity involved in preprocessing was not practically
applicable in real-time processes (Zhang et al., 2023).
Khan and Narvekar [5] developed a CNN
model customised for classifying tomato plant diseases especially targeted
early blight and late blight. This resulted in an overall accuracy of 97.25%.
They used a set comprising images from diverse sources: Plant Village, internet
images, and real-world photos taken at Tansa Farm under uncontrolled
conditions. However, the dataset was highly idealized and mostly dominated by Plant
Village images, which are more representative of controlled laboratory
conditions than real-world scenarios. In addition, the study only considered
two types of diseases, and the dataset needs to be expanded to include a wider
range of diseases for better generalization.
Liu and Wang [6] proposed an improved
YOLOv3 algorithm for detecting tomato diseases and pests. The improvements
involved multi-scale feature detection from image pyramids, dimension grouping
of object bounding box, and multi-scale training strategies. Their proposed
model reached a detection rate of 92.39% that surpassed SSD, Faster R-CNN, and
the original YOLOv3. While their model was trained with images under field
conditions, misclassifications were unavoidable sometimes due to classification
that relies on pathogens and interclass similarities in lesion features among
different stages (Cheng et al., 2022).
Natarajan et al. [7] experimented
with three architectures of deep learning: Faster R-CNN, R-FCN, and SSD, to
identify the tomato pests and diseases on field data. The mAP achieved with the
combination of ResNet and the Faster R-CNN architecture was the highest at
80.95% while surpassing other methods. However, due to being a two-stage object
detection method, it required enormous computational resources to process the
candidate regions, and therefore, not much suitable for real-time applications.
In addition, training data came from images obtained from limited regions in
India, suggesting that greater spatial diversity might be required for improving
model robustness.
Ouhami et al. [8] assessed how
effective deep learning models would be, particularly DenseNet-161,
DenseNet-121, and VGG-16 with the use of transfer learning on standard RGB
images for identifying tomato crop diseases. The experimentation used a dataset
of classified images of diseased leaves of plants into six different
categories: insect attacks, plant diseases, and others. The dataset was earlier
prepared by El Massi et al. (2016). Among the models tested, DenseNet-161 achieved
the highest accuracy of 95.65%, followed by DenseNet-121 at 94.93% and VGG-16
at 90.58%. However, the dataset was limited in size and lacked diversity,
necessitating the inclusion of additional tomato disease samples from various
regions to improve the model's accuracy and generalizability.
Sharma et al. [9] developed two CNN
models to classify tomato diseases using a dataset encompassing nine disease
categories and healthy leaves. The first was F-CNN, which used full images of
leaves with contrasting backgrounds and at different disease intensities. The
second model was S-CNN, which focused on separately taken images where regions
of interest showed the disease symptoms. The S-CNN outperformed the F-CNN and
attained 98.6% accuracy on an independent dataset even with ten disease
classes. However, when pictures had symptoms from several diseases, the
algorithm found it hard to perform, implying that the image segmentation needs
more improvement. In addition, the datasets consisted mainly of images of the Plant
Village plant which is far from most actual scenarios.
Fuentes et al. [10] proposed the
"control to target classes" paradigm for improving the performance of
their deep learning-based detector with changing greenhouse conditions. They
achieved a recognition rate of 93.37% mAP for target classes during inference
by incorporating a larger dataset with more classes and samples than previous
studies (Fuentes et al., 2017, 2018). However, data imbalance emerged as the
major limitation, affecting the system's ability to generalize to real
greenhouse scenarios. Sufficient data that could reflect all the possible
attributes are very important for improvement of the model's real-life
application performance (Fuentes et al., 2021).
Khatoon et al. [11] have attempted to
develop an integrated system that is able to identify major agricultural
concerns in real time with high accuracy. Researchers have used various deep
learning models for identifying and predicting diseases due to infections,
pests, and nutritional deficiencies. A large collection of tomato leaf and
fruit images was used to train a variety of CNN models. The performance of two
different network designs was compared: Shallow Net, a shallow network trained
from scratch, and a cutting-edge deep learning network fine-tuned via transfer
learning. Dense Net performed consistently well in their studies, with an
accuracy score of 95.31% on the test dataset. However, the dataset utilized was
not large enough, revealingimbalances.
Wang et al. [12] proposed an enhanced
YOLOv3-tiny model for real-time detection of tomato diseases and pests in
natural environments, particularly under challenging conditions like occlusion
and overlapping leaves. Their findings showed that the model achieved mAP
values of 98.3%, 92.1%, and 90.2% under conditions of deep separation, debris
occlusion, and leaf overlapping, respectively. While the results were
promising, the approach still requires further refinement, especially for
complex scenarios with multiple overlapping features. Additionally, the dataset
needs to be expanded to include a broader variety of plant diseases and pests
to improve its generalizability.
Wang et al. [13] enhanced the YOLOv3
model to improve early detection of tomato pests and diseases in complex
backgrounds. The study evaluated nine common tomato diseases and pests under
six different background conditions, achieving an F1-score of 94.77% and a mAP
of 91.81%. This demonstrates the model's suitability for large-scale
applications, such as using video images from the Agricultural Internet of
Things for early disease identification. However, the model faces challenges
with detecting small or occluded objects, as well as inaccuracies in detection
frame placement, indicating the need for further optimizations. Wang and Liu
(2021) introduced YOLO-Dense, a variant of YOLOv3, to address the complexities
of detecting tomato abnormalities in natural settings. The model incorporated a
dense connection module to improve inference speed and utilized a multiscale
training strategy to enhance object detection across various sizes. YOLO-Dense
achieved a mAP of 96.41%, surpassing SSD, Faster R-CNN, and the original
YOLOv3. Despite its strong performance, the dataset requires significant
expansion to include diverse and genuine tomato disease samples from various
global regions for reliable validation.
METHODOLOGY
The Kaggle Tomoto Leaf Dataset was
used to classify a plant/leaf picture into ten categories:
'Tomato_mosaic_virus', 'Early_blight', 'Septoria_leaf_spot', 'Bacterial_spot',
'Target_Spot', 'Spider_mites Two spotted_spider_mite',
'Tomato_Yellow_Leaf_Curl_Virus', 'Late_blight', 'Healthy', and 'Leaf_Mold'.
Data
Collection
- Dataset Preparation:
Collect a diverse dataset of tomato leaf images with various diseases and
healthy leaves. Include common diseases like early blight, late blight, leaf
mold, etc.
- Annotation:
Label the images using annotation tools such as LabelImg. Bounding boxes
should be drawn around diseased areas, with labels corresponding to the
type of disease.
B.
Data Preprocessing
- Image Augmentation:
Apply transformations to increase dataset diversity (e.g., flipping,
rotation, scaling, brightness adjustment). This helps the model generalize
better.
- Splitting Dataset:
Divide the dataset into training, validation, and test sets, typically in
a 70-20-10 ratio.
C.
Model Architecture
- YOLOv3 Selection: Use
YOLOv3 due to its balance between speed and accuracy. YOLOv3 divides the
input image into grids and simultaneously predicts bounding boxes and
class probabilities.
- Pretrained Weights:
Start with pretrained YOLOv3 weights (e.g., on COCO dataset) to leverage
transfer learning, which can reduce training time and improve performance.
D.
Model Training
- Framework and Tools: Use
frameworks like TensorFlow, PyTorch, or Darknet to implement YOLOv3.
- Custom Configuration:
- Modify the YOLOv3
configuration file to adapt it for the number of classes (number of
tomato leaf diseases + 1 for healthy class).
- Adjust anchors to match the
sizes of the bounding boxes in your dataset.
- Training Parameters:
- Set an appropriate learning
rate (e.g., 0.001 initially).
- Use a batch size based on
the GPU capacity (e.g., 16 or 32).
- Configure epochs based on
the dataset size (e.g., 50-100 epochs).
- Loss Function:
YOLOv3 uses a combination of localization loss, confidence loss, and
classification loss.
RESULT AND
ANALYSIS
This work
proposes an early detection approach by applying the YOLOv3 model on finding
tomato gray leaf spots based on precision and real-time detection. The method
applied involves Generalized Intersection over Union, a loss function improving
the precision of bounding box regression in estimating the detection of gray
leaf spots. For better generalization, a lightweight YOLOv3 network is
pre-trained by utilizing mix-up training and transfer learning. The model's
performance is compared on images captured under four different conditions, and
comparative statistical analysis is carried out to evaluate the efficacy of the
network. The accuracy of the recognition is evaluated in terms of the F1 score
and Average Precision (AP) metrics, and results are considered with respect to
Faster R-CNN and SSD models. The experiment showed that with the proposed
model, detection performance improved significantly. In sufficient lighting
without any leaf occlusion the test datasets were captured the model achieves
the F1 score of 95.13%, the AP value is 93.53% and an average Intersection over
Union of 88.92%. Across the all test sets the achieved the F1 score along with
the AP value stands to be 93.24% and 91.32%, respectively while achieving an
average IoU of 83.98%. In addition, the model shows a high efficiency - 246
frames per second on the GPU and one 416 × 416 image is extrapolated for just
16.9 milliseconds. These results imply that the model can efficiently detect tomato
leaf diseases precisely in real time in applied agricultural settings.
Figure 2. Result of tomato leaf detection
CONCLUSION
The use of
YOLOv3 for detecting tomato leaf diseases offers a robust and innovative
solution for addressing plant health challengesThe system recognizes and
classifies diseases directly from images of tomato leaves with a high accuracy
and speed by using advanced deep learning techniques. Its real-time processing
capabilities make it extremely valuable for timely intervention, as farmers can
respond promptly and effectively to emerging threats to prevent the further
damage of crops. Automating the disease detection process reduces not only the
manual effort and expertise that have traditionally been associated with this
process but also improves accuracy, reducing human error. This technology may
significantly impact agricultural practices because it enhances productivity
and eliminates dependence on blanket pesticide application, which is often
unproductive and harmful to the environment. Targeted treatments can now be
applied, thus increasing the efficiency and sustainability of farming.
Future
Scope
The future of leaf disease detection
lies in advancing datasets, technologies, and applications to enhance generalizability
and practical usability. Diversifying datasets with diverse samples from
various regions and conditions will enhance the robustness and accuracy of
models, ensuring reliability across different scenarios. Incorporating
multilingual systems will make disease diagnosis accessible to a broader
audience, thereby bridging language barriers for global farmers. The
integration of Internet of Things (IoT) devices and drones is potentially
large-scale monitoring which enables real-time disease detection over vast
agricultural fields. Models optimized for low power and lightweight will be
easily deployed on portable platforms that even resource-constrained areas will
have access to. Estimation of disease severity might also be possible through
such systems, helping the farmers prioritize interventions based on the urgency
of the situations. Furthermore, the ability of these systems to adapt to multiple
crops and other plant diseases will open up wider agricultural applications.
The linkage of detections with AI-driven treatment recommendations provides
actionable insights that reduce dependency on human expertise. Finally,
integration into precision agriculture systems can help develop comprehensive
crop management solutions, supporting sustainable farming practices. These
innovations align with smart farming innovations that promise to transform
traditional farming and contribute toward global food security.
REFERENCES