Why and how to run machine learning algorithms on edge devices

Intel’s Neural Compute Stick 2 is an example of machine learning hardware for edge devices. Source: Intel

Analyzing large amounts of data with complex machine learning algorithms requires significant computational capabilities. Therefore, much data processing takes place in on-premises data centers or cloud-based infrastructure. However, with the arrival of powerful, low-power Internet of Things (IoT) devices, computations can now be executed on the edge devices themselves, such as robots. This has given rise to the era of deploying advanced machine learning methods such as convolutional neural networks, or CNNs, at the edges of the network for “edge-based” ML.

The following sections focus on the industries that will benefit the most from edge-based ML, and on the existing hardware, software, and machine learning methods that are implemented at the network edge.

Edge devices in healthcare

The need for on-device data analysis arises in cases where decisions based on data processing have to be made immediately. For example, there may not be sufficient time for data to be transferred to back-end servers, or there is no connectivity at all.

Intensive care is an area that could benefit from edge-based ML, where real-time data processing and decision making are important for closed-loop systems that must maintain critical physiological parameters, such as blood glucose level or blood pressure, within a specific range of values.

As the hardware and machine learning methods become more sophisticated, more complex parameters can be monitored and analyzed by edge devices, like neurological activity or cardiac rhythms.

Another area that may benefit from edge-based data processing is “ambient intelligence” (AmI). AmI refers to edge devices that are sensitive and responsive to the presence of people. It could enhance how people and environments interact with each other.

Daily activity monitoring for elderly people is an example of AmI. The main objective of a smart environment for assisted living is to quickly detect anomalies such as a fall or a fire and take immediate action by calling for emergency help.

Edge devices include smart watches, stationary microphones and cameras (or those on mobile robots), and wearable gyroscopes or accelerometers. Each type of edge device or sensor technology has its advantages and disadvantages, such as privacy concerns for cameras or regular charging for wearables.
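
As an illustration of how a wearable accelerometer could flag a possible fall on the device itself, the sketch below looks for a spike in acceleration magnitude followed by a short period of stillness. The thresholds, the function name, and the sample format are assumptions chosen for this example; they are not taken from any deployed or clinically validated system.

```python
import math

# Hypothetical thresholds (in g); a real system would tune these on labeled data.
IMPACT_G = 2.5      # magnitude spike suggesting an impact
STILL_TOL_G = 0.15  # allowed deviation from 1 g while the wearer is motionless


def detect_fall(samples, still_window=3):
    """Return the index of a suspected impact, or None.

    samples: list of (x, y, z) accelerometer readings in g.
    A fall is flagged when an impact spike is followed by `still_window`
    consecutive readings whose magnitude stays close to 1 g (no movement).
    """
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    for i, m in enumerate(mags):
        if m < IMPACT_G:
            continue
        after = mags[i + 1 : i + 1 + still_window]
        if len(after) == still_window and all(
            abs(a - 1.0) <= STILL_TOL_G for a in after
        ):
            return i
    return None


walking = [(0.0, 0.0, 1.0), (0.2, 0.1, 1.1), (0.1, 0.0, 0.9), (0.0, 0.1, 1.0)]
fall = [
    (0.0, 0.0, 1.0),
    (1.8, 1.5, 2.0),   # impact spike
    (0.0, 1.0, 0.05),  # lying still, gravity now along a different axis
    (0.0, 1.0, 0.0),
    (0.05, 1.0, 0.0),
]
print(detect_fall(walking))  # None
print(detect_fall(fall))     # 1
```

A rule like this is cheap enough for a microcontroller, which is the point of running it at the edge; a production system would more likely replace the fixed thresholds with a trained model.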

Mining, oil and gas, and industrial automation

The business value of edge-based ML becomes obvious in the oil, gas, and mining industries, where employees work at sites far from populated areas and connectivity is often nonexistent. Sensors on edge devices such as robots can capture large amounts of data and accurately predict things like the pressure across pumps or detect operating parameters outside their normal range of values.
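
To make the idea of spotting out-of-range operating parameters concrete, here is a minimal sketch that compares each new sensor reading with the statistics of the readings just before it. It uses a simple rolling z-score rather than a trained ML model, and the window size, limit, and pressure values are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(readings, window=5, z_limit=3.0):
    """Return indices of readings that deviate strongly from the recent past.

    Each reading is compared with the mean and standard deviation of the
    previous `window` accepted readings. Flagged readings are not added to
    the history, so a single spike does not distort the baseline.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_limit:
                anomalies.append(i)
                continue
        history.append(value)
    return anomalies


# Hypothetical pump pressure readings in bar.
pressure = [5.0, 5.1, 4.9, 5.0, 5.2, 5.1, 9.5, 5.0, 4.9]
print(find_anomalies(pressure))  # [6]
```

Because the check needs only a handful of recent values, it can run on a device with no connectivity and raise a local alarm immediately.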

Connectivity is also an issue in manufacturing, where predictive maintenance of machinery can reduce unnecessary costs and extend the life of industrial assets. Traditionally, factories take machinery offline at regular intervals, and they conduct full inspections as per the specifications of the equipment manufacturers. However, this approach is expensive and inefficient, and it does not consider the special operating conditions of every machine.

Alternatively, embedded sensors of all machines inside a factory or warehouse can take readings and apply deep learning to still images, video, or audio in order to identify patterns that are indicative of future equipment breakdown.

Edge devices and ML frameworks

The table below lists some of the most popular ML frameworks that run on edge devices. Most of these frameworks provide pre-trained models for speech recognition, object detection, natural language processing (NLP), and image recognition and classification, among others. They also let data scientists leverage transfer learning or develop a custom ML model from scratch.

Popular ML frameworks for IoT edge devices

Framework name	Edge device requirements
TensorFlow Lite – Google	Android, iOS, Linux, microcontrollers (ARM Cortex-M, ESP32)
ML Kit for Firebase – Google	Android, iOS
PyTorch Mobile – Facebook	Android, iOS
Core ML 3 – Apple	iOS
Embedded Learning Library (ELL) – Microsoft	Raspberry Pi, Arduino, micro:bit
Apache MXNet – Apache Software Foundation (ASF)	Linux, Raspberry Pi, NVIDIA Jetson

TensorFlow Lite was developed by Google and has application programming interfaces (APIs) for many programming languages, including Java, C++, Python, Swift, and Objective-C. It is optimized for on-device applications and provides an interpreter tuned for on-device ML. Custom models are converted to the TensorFlow Lite format, and their size is optimized to increase efficiency.
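
One common way to shrink a model for on-device use is to store its weights as 8-bit integers instead of 32-bit floats, which cuts the weight storage to roughly a quarter. The sketch below shows the idea in plain Python with a symmetric scheme; it is an illustration of the general technique, not TensorFlow Lite's converter code, and the function names and weight values are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: floats -> integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # float value of one integer step
    return [round(w / scale) for w in weights], scale


def dequantize(codes, scale):
    """Recover approximate float weights from their 8-bit codes."""
    return [c * scale for c in codes]


weights = [-0.82, -0.11, 0.0, 0.27, 0.64]  # hypothetical layer weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)      # small integers that fit in one byte each
print(max_error)  # at most about half a quantization step
```

The trade-off is a small loss of precision per weight in exchange for a smaller, faster model, which is usually acceptable for inference on constrained hardware.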

ML Kit for Firebase was also developed by Google. It targets mobile platforms and uses TensorFlow Lite, the Google Cloud Vision API, and the Android Neural Networks API to provide on-device ML features, such as face detection, barcode scanning, and object detection, among others.

PyTorch Mobile was developed by Facebook. The current, experimental release targets the two major mobile platforms and deploys models to mobile devices after they have been trained and saved as TorchScript models.

Core ML 3 comes from Apple and is the biggest update to Core ML since its original release, supporting several ML methods, especially related to deep neural networks.

ELL is a software library from Microsoft that deploys ML algorithms on small, single-board computers and has APIs for Python and C++. Models are compiled on a computer and then deployed and invoked on the edge devices.

Finally, Apache MXNet supports many programming languages (Python, Scala, R, Julia, C++, and Clojure, among others), with the Python API offering most of the improvements for training models.

Edge device hardware

In most real-life use cases, the tasks that edge devices are asked to complete are image and speech recognition, natural language processing, and anomaly detection. For tasks like these, the best machine learning algorithms fall under the area of deep learning, where multiple layers are used to compute the output parameters from the input.

Due to the nature of the deep learning algorithms that require large parallel matrix multiplications, the optimal hardware to use for the edge devices includes application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), RISC-based processors and embedded graphics processing units (GPUs).
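
To see where those matrix multiplications come from, consider a single fully connected layer. The sketch below is a plain-Python forward pass with illustrative values (the function names and numbers are assumptions for this example); each output is an independent dot product, which is the kind of work that ASICs, FPGAs, and GPUs can spread across many multiply-accumulate units at once.

```python
def dense(inputs, weights, biases):
    """Forward pass of one fully connected layer with ReLU activation.

    inputs:  list of n_in floats
    weights: n_out rows, each a list of n_in floats
    biases:  list of n_out floats
    """
    outputs = []
    for row, b in zip(weights, biases):
        acc = b
        for x, w in zip(inputs, row):
            acc += x * w  # one multiply-accumulate (MAC)
        outputs.append(max(0.0, acc))  # ReLU
    return outputs


def mac_count(n_in, n_out):
    """Number of multiply-accumulate operations in one dense layer."""
    return n_in * n_out


x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.5], [1.0, 1.0, 1.0]]
b = [0.0, -1.0]
print(dense(x, W, b))         # [0.0, 5.0]
print(mac_count(1024, 1024))  # 1048576
```

Even one modest 1,024-by-1,024 layer needs over a million multiply-accumulates per input, so a general-purpose CPU that executes them largely one after another quickly becomes the bottleneck.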

Table 2 summarizes some popular edge devices with the corresponding hardware specs.

Popular edge devices and their hardware specs

Edge device	GPU	CPU	ML software support
Coral SoM – Google	Vivante GC7000Lite	Quad ARM Cortex-A53 + Cortex-M4F	TensorFlow Lite, AutoML Vision Edge
Intel NCS2	Movidius Myriad X VPU (not GPU)		TensorFlow, Caffe, OpenVINO toolkit
Raspberry Pi 4	VideoCore VC6	Quad ARM Cortex-A72	TensorFlow, TensorFlow Lite
NVIDIA Jetson TX2	NVIDIA Pascal	Dual Denver 2 64-bit + quad ARM A57	TensorFlow, Caffe
RISC-V GAP8			TensorFlow
ARM Ethos N-77	8 NPUs in cluster, 64 NPUs in mesh		TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, ONNX
ECM3531 A – Eta Compute	ARM Cortex-M3 + NXP CoolFlux DSP		TensorFlow, Caffe

Coral System-on-Module (SoM) by Google is a fully integrated system for ML applications that includes a CPU, a GPU, and an Edge Tensor Processing Unit (TPU). The Edge TPU is an ASIC that accelerates execution of deep learning networks and is capable of performing 4 trillion operations (tera-operations) per second (TOPS).
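
A TOPS rating can be turned into a rough feel for inference speed with simple arithmetic. The sketch below computes a best-case latency from a peak rate; the 1.2 billion operations per inference is a hypothetical model size, and real throughput is typically below the peak figure because of memory transfers and operations the accelerator cannot run.

```python
def min_latency_ms(ops_per_inference, peak_tops):
    """Lower bound on inference time in milliseconds at a given peak rate."""
    ops_per_second = peak_tops * 1e12  # 1 TOPS = 10^12 operations per second
    return ops_per_inference / ops_per_second * 1e3


# Hypothetical model needing 1.2 billion operations per inference,
# run on an accelerator with a 4 TOPS peak rate: about 0.3 ms at best.
print(min_latency_ms(1.2e9, 4))
```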

The Intel Neural Compute Stick 2 (NCS2) looks like a standard USB thumb drive and is built on the Intel Movidius Myriad X Vision Processing Unit (VPU), a system-on-chip (SoC) with a dedicated Neural Compute Engine for accelerating deep learning inference.

Raspberry Pi 4 is a single-board computer based on the Broadcom BCM2711 SoC, running its own version of the Debian OS (Raspbian). ML algorithms can be accelerated if a Coral USB Accelerator is connected to one of its USB 3.0 ports.

NVIDIA Jetson TX2 is an embedded SoC used for deploying computer vision and deep learning algorithms. The company also offers the Jetson Xavier NX.

RISC-V GAP8 is designed by GreenWaves Technologies and is an ultra-low-power, eight-core, RISC-V-based processor optimized to execute algorithms used for image and audio recognition. Models have to be ported to TensorFlow via the Open Neural Network Exchange (ONNX) open format before being deployed.

ARM Ethos N-77 is a multi-core Neural Processing Unit (NPU) that is part of the ML-focused ARM Ethos family. It delivers up to 4 TOPS of performance and supports several ML algorithms used for image, speech, and sound recognition.

ECM3531 is an ASIC by Eta Compute, based on the ARM Cortex-M3 architecture, that can run deep learning algorithms while drawing only a few milliwatts. Programmers can choose to run deep neural networks on the DSP, which reduces power consumption even further.

Conclusions

Due to the limited memory and computation resources of edge devices, training models on large amounts of data on the devices themselves is not feasible most of the time. Instead, deep learning models are trained on powerful on-premises or cloud server instances and then deployed to the edge devices.

Developers can use several methods to tackle this issue: designing power-efficient ML algorithms, developing better and more specialized hardware, and inventing new distributed-learning algorithms where all IoT devices communicate and share data.
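
One way to picture distributed learning across devices is a scheme in which each device trains locally and reports its model weights, and a coordinator averages them in proportion to how much data each device saw. The sketch below follows that federated-averaging style; it is one possible approach rather than the only one, and it shares model weights instead of raw data. All names and numbers are hypothetical.

```python
def federated_average(updates):
    """Combine per-device weights into one global weight vector.

    updates: list of (weights, n_samples) pairs, one per device.
    Each device's weights count in proportion to its number of samples.
    """
    total = sum(n for _, n in updates)
    size = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(size)]


# Hypothetical weights reported by three devices after local training.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30), ([5.0, 6.0], 60)]
print(federated_average(updates))  # [4.0, 5.0]
```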

The last approach is limited by network bandwidth. Future 5G networks, which provide ultra-reliable, low-latency communication services, will therefore help immensely in the area of edge computing.

In addition, edge-based ML has been shown to enhance the privacy and security of the data sets that the edge devices capture, since they can be programmed to discard the sensitive data fields. Overall system response times are improved due to the edge devices processing the data, enriching them (by adding metadata) and then sending them to the backend systems.
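
The discard-and-enrich step described above can be sketched in a few lines. The field names, device identifier, and function name below are hypothetical and serve only to illustrate the pattern.

```python
import time

SENSITIVE_FIELDS = {"patient_name", "face_image"}  # hypothetical field names


def prepare_for_upload(record, device_id, now=None):
    """Drop sensitive fields and attach metadata before sending upstream."""
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    cleaned["device_id"] = device_id
    # Use the supplied timestamp if given, otherwise the current time.
    cleaned["captured_at"] = time.time() if now is None else now
    return cleaned


reading = {"patient_name": "Jane Doe", "heart_rate": 72, "face_image": b"..."}
print(prepare_for_upload(reading, device_id="edge-01", now=1700000000.0))
# {'heart_rate': 72, 'device_id': 'edge-01', 'captured_at': 1700000000.0}
```

Since the sensitive fields never leave the device, the back-end systems receive only what they need, and the payload is smaller as a side effect.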

I believe that further advances in device hardware and in the design of ML algorithms will bring innovations to many industries and will truly demonstrate the transformational power of edge-based machine learning.

About the author

Fotis Konstantinidis is managing director and head of AI and digital transformation at Stout Risius Ross LLC. He has more than 15 years of experience in data mining, advanced analytics, digital strategy, and integration of digital technologies in enterprises.

Konstantinidis started applying data mining techniques as a brain researcher at the Laboratory of Neuro-Imaging at UCLA, focusing on identifying data patterns for patients with Alzheimer’s disease. He was also one of the leads in applying machine learning techniques in the field of genome evolution. Konstantinidis has implemented AI in a number of industries, including banking, retail, automotive, and energy.

Prior to joining Stout, Konstantinidis held leadership positions leading AI-driven products and services at CO-OP Financial Services, McKinsey & Co., Visa, and Accenture.