How did farming affect your day today? If you live in a city, you might feel disconnected from the farms and fields that produce your food. Agriculture is a core piece of our lives, but we often take it for granted.
Farmers today face a huge challenge: feeding a growing global population with less available land. The world’s population is expected to reach nearly 10 billion by 2050, increasing global food demand by 50%. As demand for food grows, land, water, and other resources will come under even more pressure.
The variability inherent in farming, such as changing weather conditions, and threats like weeds and pests also directly affect a farmer’s ability to produce food. The only way to produce more food while using fewer resources is through agricultural robots that can help farmers with difficult jobs, offering more consistency, precision, and efficiency.
Agricultural robots use PyTorch
At Blue River Technology, we are building the next generation of smart machines. Farmers use our tools to control weeds and reduce costs in a way that promotes agricultural sustainability. Our weeding robot integrates cameras, computer vision, machine learning and robotics to make an intelligent sprayer that drives through fields (using AutoTrac to minimize the load on the driver) and quickly targets and sprays weeds, leaving the crops intact.
The machine needs to make real-time decisions about what is a crop and what is a weed. As the machine drives through the field, high-resolution cameras collect imagery at a high frame rate. We developed a convolutional neural network (CNN) using PyTorch to analyze each frame and produce a pixel-accurate map of where the crops and weeds are. Once the plants are all identified, each weed and crop is mapped to a field location, and the robot sprays only the weeds. This entire process happens in milliseconds, allowing the farmer to cover as much ground as possible, since efficiency matters. Our See & Spray video explains the process in more detail.
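The per-frame step described above can be sketched in a few lines of PyTorch. The tiny network, the class count, and the input size below are purely illustrative stand-ins, not our production model, which is far larger:

```python
import torch
import torch.nn as nn

# Minimal sketch: a tiny fully-convolutional network mapping an RGB frame
# to per-pixel class logits, then taking an argmax to get a pixel-accurate
# class map. NUM_CLASSES and the architecture are hypothetical.
NUM_CLASSES = 3  # e.g. background, crop, weed

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, NUM_CLASSES, kernel_size=1),  # 1x1 conv produces per-pixel logits
)

frame = torch.rand(1, 3, 64, 64)          # stand-in for a camera frame
with torch.no_grad():
    logits = model(frame)                  # shape: (1, NUM_CLASSES, 64, 64)
    class_map = logits.argmax(dim=1)       # shape: (1, 64, 64), one class id per pixel
```

The class map is what gets registered to field coordinates so the sprayer can target only the weed pixels.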
To support the machine learning (ML) and robotics stack, we built an impressive compute unit based on the NVIDIA Jetson AGX Xavier System on Module (SOM), an edge AI computer. Since all our inference happens in real time, uploading to the cloud would take too long, so we bring the server farms to the field. The total compute power onboard the robot dedicated just to visual inference and spray robotics is on par with IBM’s Blue Gene supercomputer (2007). This makes it a machine with some of the highest compute capacity of any moving machinery in the world!
Building weed detection models
My team of researchers and engineers is responsible for training the neural network model that identifies crops and weeds. This is a challenging problem because many weeds look just like crops. Professional agronomists and weed scientists train our labeling workforce to label the images correctly – can you spot the weeds below?
In the image below, the cotton plants are in green and the weeds are in red.
Machine learning stack uses PyTorch for training
On the machine learning front, we have a sophisticated stack. We use PyTorch for training all our models. We have built a set of internal libraries on top of PyTorch which allow us to perform repeatable machine learning experiments. The responsibilities of my team fall into three categories:
- Build production models to deploy onto the robots
- Perform machine learning experiments and research with the goal of continually improving model performance
- Data analysis / data science related to machine learning, A/B testing, process improvement, software engineering
We chose PyTorch because it’s very flexible and easy to debug. New team members can quickly get up to speed, and the documentation is thorough. Before working with PyTorch, our team used Caffe and TensorFlow extensively. In 2019, we made the decision to switch to PyTorch, and the transition was seamless. The framework gives us the ability to support production model workflows and research workflows simultaneously. For example, we use the torchvision library for image and tensor transforms. It contains some basic functionality, and it also integrates nicely with sophisticated augmentation packages like imgaug. The transforms object in torchvision is a piece of cake to integrate with imgaug.
Below is a code example using the Fashion MNIST dataset. A class called CustomAugmentor initializes the iaa.Sequential object in its constructor, then calls augment_image() in its __call__ method. CustomAugmentor() is then added to the call to transforms.Compose(), prior to ToTensor(). Now the train and val data loaders will apply the augmentations defined in CustomAugmentor() when the batches are loaded for training and validation.
Additionally, PyTorch has emerged as a favorite tool in the computer vision ecosystem (looking at Papers With Code, PyTorch is a common submission). This makes it easy for us to try out new techniques like Debiased Contrastive Learning for semi-supervised training.
On the model training front, we have two standard workflows: production and research. For research applications, our team runs PyTorch on an internal, on-premises compute cluster. Jobs executed on the on-premises cluster are managed by Slurm, an HPC batch-job scheduler. It is free, easy to set up and maintain, and provides all the functionality our group needs for running thousands of machine learning jobs. For our production-based workflows, we utilize Argo workflows on top of a Kubernetes (K8s) cluster hosted in AWS. Our PyTorch training code is deployed to the cloud using Docker.
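A Slurm submission for one training job might look like the sketch below. The partition name, resource requests, and script path are hypothetical placeholders, not our actual cluster configuration:

```shell
#!/bin/bash
# Hypothetical Slurm batch script for a single PyTorch training run.
#SBATCH --job-name=weed-seg-train
#SBATCH --partition=gpu        # placeholder partition name
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00

srun python train.py --config configs/weed_segmentation.yaml
```

Submitting thousands of such jobs is just a loop over `sbatch` calls, which is what makes Slurm convenient for large experiment sweeps.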
Deploying models on field robots
For production deployment, one of our top priorities is high-speed inference on the edge computing device. If the robot needs to drive more slowly to wait for inferences, it can’t be as efficient in the fields. To this end, we use TensorRT to convert the network to an Xavier-optimized model. TensorRT doesn’t accept JIT models as input, so we use ONNX to convert from JIT to the ONNX format, and from there we use TensorRT to build a TensorRT engine file that we deploy directly to the device. As the tool stack evolves, we expect this process to improve as well. Our models are pushed to Artifactory by a Jenkins build process and deployed to remote machines in the field by pulling from Artifactory.
To monitor and evaluate our machine learning runs, we have found the Weights & Biases platform to be the best solution. Their API makes it fast to integrate W&B logging into an existing codebase. We use W&B to monitor training runs in progress, including live curves of the training and validation loss.
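Integrating W&B into a training loop typically takes only a few calls. The project name and metric names below are illustrative, and `wandb` is imported inside the function so the sketch stands alone:

```python
import torch

def train_with_logging(model, train_loader, val_loader, optimizer, loss_fn, epochs=10):
    """Sketch of a training loop with W&B logging; names are illustrative."""
    import wandb
    run = wandb.init(project="weed-segmentation", config={"epochs": epochs})
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
            wandb.log({"train_loss": loss.item()})  # live training-loss curve
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        wandb.log({"val_loss": val_loss / len(val_loader), "epoch": epoch})
    run.finish()
```

With this in place, the W&B dashboard shows the training and validation curves updating while the run is still in progress.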
SGD vs Adam Project
As an example of using PyTorch and W&B together, I will run an experiment and compare the results of using different solvers in PyTorch. There are a number of different solvers in PyTorch, and the obvious question is: which one should you pick? A popular choice is Adam. It often gives good results without requiring much hyperparameter tuning and is our usual choice for our models. In PyTorch, this solver is available as torch.optim.Adam.
Another popular choice of solver among machine learning researchers is Stochastic Gradient Descent (SGD), available in PyTorch as torch.optim.SGD. Momentum is an important concept in machine learning: it can help the solver find better solutions by avoiding getting stuck in local minima in the optimization space. Using SGD with momentum, the question is this: can I find a momentum setting for SGD that beats Adam?
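The shape of such a comparison can be sketched on a toy problem. The quadratic objective, learning rates, and momentum value below are illustrative defaults, not the settings from our actual experiment:

```python
import torch

# Toy least-squares problem: fit w to minimize mean((Xw - y)^2).
torch.manual_seed(0)
X = torch.randn(100, 5)
true_w = torch.randn(5)
y = X @ true_w

def train(opt_factory, steps=200):
    """Run one optimizer from the same start and return the final loss."""
    w = torch.zeros(5, requires_grad=True)
    opt = opt_factory([w])
    for _ in range(steps):
        opt.zero_grad()
        loss = ((X @ w - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

# Same problem, two solvers; 0.9 is just a common momentum default.
adam_loss = train(lambda params: torch.optim.Adam(params, lr=0.1))
sgd_loss = train(lambda params: torch.optim.SGD(params, lr=0.01, momentum=0.9))
```

In practice we run this kind of sweep over momentum values on our real models, logging each run to W&B so the loss curves can be compared side by side.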