How to classify crop types and crop-growing phases based on satellite imagery?
ML and DL algorithms
Theory driven: fitting data into an equation
Data driven: deriving an equation from data
The theory-driven approach is rooted in the principles of established scientific theories and concepts. It involves forming hypotheses based on these theories and then designing experiments or studies to test these hypotheses. For example, Newton's laws of motion were conceptualized based on theoretical reasoning and then validated through numerous experimental tests. The strength of a theory-driven approach lies in its basis in established scientific principles, giving the results credibility and allowing them to be integrated into broader theoretical frameworks. Its limitations, however, may become apparent when dealing with highly complex systems where the theory may not be comprehensive or completely accurate.
The data-driven approach shines in situations where data collection is significantly more cost-effective than physical experimentation or direct observation. In this approach, massive amounts of data are gathered and analyzed with the help of algorithms, which can identify patterns and correlations without the need for pre-existing theories or hypotheses. For example, predicting crop types based on satellite imagery using machine learning algorithms can be much more cost-effective than physically inspecting each field. These algorithms can make sense of large amounts of data, identifying patterns that might not be visible to the human eye, and making accurate predictions based on these patterns. The data-driven approach excels in its ability to handle vast quantities of data and uncover unexpected patterns, providing novel insights. However, it might not always provide the underlying causal relationships or be as easily interpretable, as it's primarily focused on correlations in the data.
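The contrast between "fitting data into an equation" and "deriving an equation from data" can be sketched with a toy example that is not taken from the slides: noisy free-fall measurements. The theory-driven fit assumes the law d = 0.5·g·t² and estimates only g. The data-driven fit assumes only a generic quadratic and lets the data pick every coefficient. The measurement data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.1, 2.0, 30)                          # time (s)
d = 0.5 * 9.81 * t**2 + rng.normal(0.0, 0.05, t.size)  # synthetic noisy fall distance (m)

# Theory-driven: start from the known law d = 0.5 * g * t^2 and estimate only g.
# Least squares for d = k * t^2 gives k = sum(t^2 * d) / sum(t^4), so g = 2k.
g_hat = 2 * np.sum(t**2 * d) / np.sum(t**4)

# Data-driven: assume only a generic quadratic a*t^2 + b*t + c and let the data
# choose every coefficient; no physical meaning is built into the model.
a, b, c = np.polyfit(t, d, deg=2)

print(f"theory-driven estimate: g = {g_hat:.2f} m/s^2")
print(f"data-driven fit: {a:.2f} t^2 + {b:.2f} t + {c:.2f}")
```

Both fits describe the data well, but only the theory-driven one returns a physically meaningful parameter (g); the data-driven coefficients have to be interpreted after the fact.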
Theory-Driven Approach
Data-Driven Approach
supervised learning
unsupervised learning
The term "Artificial Intelligence" is coined by John McCarthy at the Dartmouth Conference.
The "nearest neighbor" rule is formulated, a foundational concept for what would become known as the field of pattern recognition in machine learning.
The concept of backpropagation begins to be explored, but it doesn't get much attention until the mid-1980s.
Yann LeCun develops LeNet-1, one of the earliest convolutional neural networks, which recognizes handwritten digits in ZIP codes; its successor, LeNet-5, is later used to read handwritten cheques.
Support Vector Machines (SVMs) are developed, providing a robust approach for supervised learning tasks.
Hinton and Salakhutdinov's work on training deep neural networks helps popularize the term "Deep Learning" in the machine learning community.
AlexNet, a convolutional neural network designed by Krizhevsky, Sutskever, and Hinton, achieves a top-5 error rate of 15.3% in the ImageNet 2012 challenge, significantly better than previous designs, bringing convolutional neural networks and deep learning to the forefront.
RGB is one of the most common color compositions for displaying pixels, and it gives the image a natural-color look.
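A minimal sketch of building an RGB (true-color) composite from three separate satellite bands, assuming the bands are already loaded as 2-D NumPy arrays of reflectance. The function name `true_color_composite` and the 2 to 98 percentile stretch are illustrative assumptions, not part of the original workflow.

```python
import numpy as np

def true_color_composite(red, green, blue, low_pct=2, high_pct=98):
    """Stack three single-band arrays into an H x W x 3 uint8 RGB image.

    Each band is linearly stretched between its low/high percentiles so that
    the reflectance values fill the 0-255 display range.
    """
    channels = []
    for band in (red, green, blue):
        lo, hi = np.percentile(band, [low_pct, high_pct])
        scaled = (band - lo) / max(hi - lo, 1e-12)  # avoid division by zero on flat bands
        channels.append(np.clip(scaled, 0.0, 1.0))
    return (np.dstack(channels) * 255).astype(np.uint8)

# Hypothetical 4 x 5 reflectance bands with values between 0 and 0.3
rng = np.random.default_rng(0)
red, green, blue = (rng.uniform(0.0, 0.3, size=(4, 5)) for _ in range(3))
rgb = true_color_composite(red, green, blue)
print(rgb.shape, rgb.dtype)  # (4, 5, 3) uint8
```

The percentile stretch is one common display choice; a plain min-max scaling would also work but is more sensitive to a few very bright or dark pixels.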
The clustering step aims to separate paddy rice from the rest of the image, and three scenarios can segment it correctly.
This processing step uses a simple if-else statement to cluster the rice paddies, which appear as black masks.
Cutoff >= 200
Based on the intensity distribution, we can choose various cutoffs to cluster the data. This traditional technique is simple but error-prone when applied to a different domain or dataset.
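A minimal sketch of the single-cutoff rule (intensity >= 200), assuming an 8-bit single-band image held in a NumPy array. The tiny `scene` array is made up, and the darker copy (every value multiplied by 0.8) is a hypothetical stand-in for a new dataset with different illumination or sensor calibration. It shows why a fixed cutoff does not transfer well.

```python
import numpy as np

def threshold_mask(image, cutoff=200):
    """Return a boolean mask that is True where pixel intensity >= cutoff."""
    return image >= cutoff

# Toy 8-bit image: bright pixels stand in for rice paddies
scene = np.array([[210, 50, 220],
                  [30, 205, 40]], dtype=np.uint8)
print(threshold_mask(scene).sum())   # 3 pixels pass the cutoff

# Same field in a hypothetical darker acquisition (all values scaled by 0.8)
darker = scene * 0.8
print(threshold_mask(darker).sum())  # 0 pixels pass: the fixed cutoff no longer fits
```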
One of the simplest techniques for clustering data is to use if-else statements that create generalized rules for grouping the scattered data. The previous work used a simple cutoff range of 0.5 to 1.3 to cluster the rice paddies. However, when the data cover a large area, more conditions are needed, which increases the depth of the decision tree.
yes
no
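The if-else rule above can be sketched as a small nested decision function, assuming each pixel has a single index value (the slides do not name the index). The second function adds one more condition on a hypothetical extra feature (`extra`, `extra_cutoff` are placeholders) to show how covering a larger area makes the rule tree deeper.

```python
def classify_pixel(value, low=0.5, high=1.3):
    """Nested if-else rule: a pixel is 'rice' only if low <= value <= high."""
    if value >= low:
        if value <= high:
            return "rice"
        return "other"
    return "other"

def classify_pixel_deeper(value, extra, extra_cutoff=0.3):
    """Deeper rule for a larger area: the range test plus one more condition
    on a second, hypothetical feature."""
    if classify_pixel(value) == "rice":
        if extra >= extra_cutoff:
            return "rice"
    return "other"

values = [0.2, 0.8, 1.1, 1.6]
print([classify_pixel(v) for v in values])
# ['other', 'rice', 'rice', 'other']
print([classify_pixel_deeper(v, e) for v, e in zip(values, [0.5, 0.5, 0.1, 0.5])])
# ['other', 'rice', 'other', 'other']
```

Every new condition adds another level of branching, which is exactly the growth in tree depth that learned models such as decision trees automate.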
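Find the shortest distance between each data point and the centroids, assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points.
Yes/no condition: the program ends when the centroids stop moving, i.e., when the change in centroid positions between iterations falls below a threshold (near 0).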
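The two steps above correspond to the k-means loop. Below is a minimal NumPy sketch, assuming points are rows of a 2-D array; the random initialization and the tolerance `tol` are illustrative choices, and the two-blob dataset is synthetic.

```python
import numpy as np

def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance from every point to every centroid: shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to its nearest centroid
        # Recompute centroids; keep the old one if a cluster happens to be empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stop once the centroids no longer move
            break
    return labels, centroids

# Two synthetic, well-separated groups of 20 points each
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(points, k=2)
print(labels)
```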
QC Data → Data Processing → Algorithm Selection → Model Training → Model Evaluation
Our goal for this project is to develop a machine learning model that accurately identifies rice paddies in satellite imagery. We'll use the XGBoost algorithm, a supervised learning technique based on gradient-boosted decision trees. To achieve this, we'll label different sections of the image to train the model. Training teaches the model to categorize each segment of the image correctly, determining whether or not it represents a rice paddy.
ML is a data-driven approach that requires large amounts of data to build a model. Part of the input data is used for training and part for validation; this technique is called data splitting. So which part of the data should be used for training, and which for validation?
training = A+B+C+D
validation = E
test = F
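One way to realize the A to F scheme is to cut the image into spatial blocks and assign whole blocks to each set, so that neighbouring (and therefore similar) pixels do not leak between training and evaluation. This sketch assumes the six sections are vertical strips of equal width; the actual layout of A to F in the original figure is not recoverable, and `block_holdout` is a hypothetical helper.

```python
import numpy as np

def block_holdout(features, labels, n_blocks=6):
    """Split an image into vertical strips A..F and return train/val/test pixel sets.

    features: (H, W, n_bands) array; labels: (H, W) array.
    Strips A-D -> training, E -> validation, F -> test.
    """
    feat_strips = np.array_split(features, n_blocks, axis=1)
    label_strips = np.array_split(labels, n_blocks, axis=1)

    def flatten(strips_f, strips_l):
        # Turn each strip into a table of pixels (rows) by bands (columns)
        X = np.concatenate([s.reshape(-1, s.shape[-1]) for s in strips_f])
        y = np.concatenate([s.ravel() for s in strips_l])
        return X, y

    train = flatten(feat_strips[:4], label_strips[:4])    # A + B + C + D
    val = flatten(feat_strips[4:5], label_strips[4:5])    # E
    test = flatten(feat_strips[5:], label_strips[5:])     # F
    return train, val, test

# Hypothetical 10 x 12 image with 3 bands
features = np.zeros((10, 12, 3))
labels = np.zeros((10, 12))
(train_X, train_y), (val_X, val_y), (test_X, test_y) = block_holdout(features, labels)
print(train_X.shape, val_X.shape, test_X.shape)  # (80, 3) (20, 3) (20, 3)
```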
Which section of the data should be used for training, validation, and testing?
Or is there a better way?
So which combination of input data and preprocessing is best?
hold-out
k-fold
repeated k-fold
stratified k-fold
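The four splitting schemes can be compared with scikit-learn's splitters, assuming a small synthetic pixel table with imbalanced labels (6 rice, 14 other). The sketch counts how many rice samples land in each validation fold: plain k-fold can produce folds with few or no rice samples, while stratified k-fold keeps the class ratio roughly constant, and repeated k-fold reruns k-fold with different shuffles.

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedKFold, StratifiedKFold, train_test_split

# Hypothetical pixel table: 20 samples, 2 features, imbalanced labels
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([1] * 6 + [0] * 14)  # 1 = rice, 0 = other

# Hold-out: a single fixed train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

splitters = {
    "k-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "repeated k-fold": RepeatedKFold(n_splits=5, n_repeats=3, random_state=0),
    "stratified k-fold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
}
for name, cv in splitters.items():
    # Number of rice samples in each validation fold
    rice_per_fold = [int(y[val].sum()) for _, val in cv.split(X, y)]
    print(f"{name}: {len(rice_per_fold)} folds, rice per fold = {rice_per_fold}")
```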
XGBoost has several hyperparameters that require trial and error to find the best parameter set. The most straightforward technique is grid search, which tries every combination of candidate values. Random search instead samples combinations at random from the search space, which often finds good settings with fewer trials; still, both methods consume a lot of trial and error. A more recent method is Bayesian optimization, which uses the results of earlier trials to automatically narrow the search toward the most promising range of values.
[Figure: input data over iterations 1, 2, …, k, with the best window]
Which iterated function has the lowest sum of values?
Forecasting
Predicting crop yields based on previous seasons and current growth indicators.
Estimating water requirements for upcoming days or weeks based on forecasted weather patterns and historical crop water usage.
Anomaly Detection
Identifying areas of a field that might be under stress due to pests, diseases, or water-logging.
Spotting unexpected changes in land usage or identifying fallow land.
Classification
Classifying types of crops in a region based on the spectral signature from satellite imagery.
Distinguishing between healthy and stressed plants based on their reflectance properties.
Clustering
Grouping similar regions or fields based on crop types, farming practices, or soil health.
Segmenting regions based on their response to different weather events.
Change Point Detection
Identifying shifts in land use, like the transition from one crop to another or from agricultural land to non-agricultural land.
Detecting onset of specific agricultural events like irrigation, flowering, or harvesting.
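As one concrete example of change point detection, the sketch below finds a single abrupt shift in a vegetation-index time series by choosing the split that minimizes the within-segment squared error (a simple mean-shift model). The NDVI values are synthetic, modelling a hypothetical harvest after the 12th observation; real series usually need smoothing and methods that handle multiple change points.

```python
import numpy as np

def single_change_point(series):
    """Return the index that best splits the series into two constant-mean
    segments, i.e. the split with the lowest total within-segment squared error."""
    series = np.asarray(series, dtype=float)
    best_idx, best_cost = None, np.inf
    for i in range(1, len(series)):
        left, right = series[:i], series[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Synthetic NDVI: vegetated field (~0.7) for 12 observations, then bare (~0.2)
rng = np.random.default_rng(3)
ndvi = np.concatenate([0.7 + rng.normal(0, 0.02, 12), 0.2 + rng.normal(0, 0.02, 8)])
print(single_change_point(ndvi))
```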