Researchers have developed a sparse convolutional neural network (CNN) processor architecture and training algorithms that enable seamless integration of CNN models on edge devices.
With advances in computing and storage technologies, computational services are migrating from the cloud to the ‘edge’, allowing algorithms and data to be processed locally on the device itself. This kind of architecture enables faster and smarter Internet of Things (IoT) applications that perform complex tasks such as image processing.
Convolutional Neural Networks (CNNs) have enabled various Artificial Intelligence (AI) based applications such as image processing. CNNs are the standard approach for performing such complex tasks. Accurate CNNs involve hundreds of layers and thousands of channels, resulting in increased computation time and memory use. Therefore, implementing CNNs on low-power edge devices is challenging because of resource requirements.
Researchers from Tokyo Institute of Technology have solved this problem by developing an efficient sparse CNN processor architecture and training algorithms that enable seamless integration of CNN models on edge devices.
Sparse CNNs are obtained by removing the parts of a model that do not contribute significantly to its performance. This approach reduces the computation cost significantly while still maintaining the model’s accuracy. However, sparse techniques limit reusability and result in irregular data structures, making them inefficient in real-world settings.
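To make the idea of sparsity and its irregular data structures concrete, here is a minimal Python sketch. It assumes simple magnitude-based pruning, which is a common technique but is not stated to be the training algorithm used in this study; the helper names `prune_by_magnitude` and `compress` are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (hypothetical helper)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def compress(weights):
    """Keep only the nonzero weights as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Assumed toy weights, not data from the study.
weights = [0.8, -0.05, 0.0, 0.3, -0.6, 0.02, 0.1, -0.9]
pruned = prune_by_magnitude(weights, 0.5)
packed = compress(pruned)
print(pruned)
print(packed)
```

The surviving weights sit at scattered positions (0, 3, 4 and 7 in this toy case), which is the kind of irregular layout that makes sparse models awkward to map onto a regular parallel array.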
The researchers have developed a 40 nm sparse CNN chip that achieves both high accuracy and efficiency, using a Cartesian-product MAC (multiply-and-accumulate) array and pipelined activation aligners that shift the set of input/output values onto the regular Cartesian MAC array.
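The general idea behind a Cartesian-product computation can be illustrated in software. The following Python sketch is an assumption-laden toy example of a one-dimensional convolution, not a model of the chip's actual dataflow: every nonzero weight is multiplied with every activation, and each product is then routed to the output position it belongs to. The function names and the index shift `j - k` are illustrative choices only.

```python
def dense_conv1d(x, w):
    """Reference 'valid' 1D convolution (cross-correlation form)."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def cartesian_sparse_conv1d(x, w):
    """Form all (nonzero weight, activation) products, then align each to its output."""
    n_out = len(x) - len(w) + 1
    # Zero weights are dropped from both storage and computation.
    nz_weights = [(k, v) for k, v in enumerate(w) if v != 0]
    out = [0] * n_out
    for k, wv in nz_weights:          # one axis of the product grid
        for j, xv in enumerate(x):    # the other axis of the product grid
            i = j - k                 # shift the product to its output index
            if 0 <= i < n_out:
                out[i] += wv * xv
    return out


# Assumed toy data: a 4-tap filter with two zero weights.
x = [1, 2, 3, 4, 5, 6]
w = [2, 0, 0, -1]
print(cartesian_sparse_conv1d(x, w))
print(dense_conv1d(x, w))
```

In this sketch the inner loops form a regular, dense grid of multiplications regardless of where the nonzero weights sit, while the index shift plays a role loosely analogous to aligning values onto that grid.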
“Regular and dense computations on a parallel computational array are more efficient than irregular or sparse ones. With our novel architecture employing MAC array and activation aligners, we were able to achieve dense computing of sparse convolution,” says Prof. Ando, the principal researcher, explaining the significance of the study. He adds, “Moreover, zero weights could be eliminated from both storage and computation, resulting in better resource utilization.” The findings will be presented at the 33rd Annual Hot Chips Symposium.
“The proposed architecture and its efficient sparse CNN training algorithm enable advanced CNN models to be integrated into low-power edge devices. With a range of applications, from smartphones to industrial IoTs, our study could pave the way for a paradigm shift in edge AI,” Prof. Motomura says.