The new Neural Processing Unit boosts performance fourfold and enhances power efficiency, supporting complex machine learning applications.
Arm has introduced the Ethos-U85, its latest Neural Processing Unit (NPU). Compared with its predecessor, it delivers up to four times the performance and improved power efficiency, and it scales from 128 to 2048 Multiply-Accumulate (MAC) units to match different machine learning workloads. The Ethos-U85 targets applications such as factory automation and smart cameras, and its toolchain lets partners reuse previous development investments. It supports AI frameworks such as TensorFlow Lite and PyTorch.
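Deploying a model through TensorFlow Lite to an int8 NPU like this typically involves quantizing floating-point weights to 8-bit integers. The sketch below illustrates the general idea with symmetric per-tensor quantization in plain Python; the function names are illustrative and not part of Arm's or TensorFlow's tooling.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Maps floats to the int8 range [-128, 127] using a single scale
    derived from the largest absolute value in the tensor.
    """
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 + scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_int8(weights)
print(q)  # [50, -127, 0, 100]
```

Real toolchains (e.g. TensorFlow Lite's post-training quantization) add per-channel scales, zero points for asymmetric ranges, and calibration over representative data, but the core float-to-int8 mapping is the same.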
As AI on edge devices becomes more complex, silicon designers face challenges in system complexity and performance expectations. At the same time, software developers need smooth, unified development environments and easy integration with new AI frameworks and libraries.
The Ethos-U85 supports Transformer Networks and Convolutional Neural Networks (CNNs), enhancing AI inference at the edge. These technologies enable new applications in vision and generative AI, such as improved video analytics, image enhancement, and real-time image classification and object detection. When used with Arm’s Armv9 Cortex-A CPUs, the Ethos-U85 drives machine learning tasks and extends edge inference across various hardware platforms in industries like industrial machine vision, wearables, and robotics.
Key features of the Ethos-U85 include:
- Configurations from 128 to 2048 MACs/cycle, delivering 256 GOPS to 4 TOPS at 1 GHz.
- Support for int8 weights and int8 or int16 activations.
- Support for transformer architecture networks, as well as CNNs and RNNs.
- Native hardware support for 2:4 structured sparsity, doubling throughput.
- Internal SRAM of 29 KB to 267 KB and up to six 128-bit AXI5 interfaces.
- Support for weight compression with both standard and fast-weight decoders.
- Support for extended compression.
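The quoted throughput figures follow directly from the MAC counts: each MAC performs two operations per cycle (a multiply and an accumulate), so peak throughput is MACs × 2 × clock frequency, doubled again when 2:4 sparsity lets the hardware skip zero weights. A quick sketch of the arithmetic (the function is illustrative, not part of Arm's tooling):

```python
def npu_peak_ops(macs_per_cycle: int, clock_hz: float, sparse: bool = False) -> float:
    """Peak operations/second: each MAC counts as 2 ops (multiply + accumulate).

    With 2:4 structured sparsity the hardware skips zero weights,
    doubling effective throughput per the Ethos-U85 feature list.
    """
    ops = macs_per_cycle * 2 * clock_hz
    return ops * 2 if sparse else ops

# Smallest configuration: 128 MACs at 1 GHz
print(npu_peak_ops(128, 1e9) / 1e9)    # 256.0 GOPS
# Largest configuration: 2048 MACs at 1 GHz
print(npu_peak_ops(2048, 1e9) / 1e12)  # 4.096 TOPS (~4 TOPS)
```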
The Ethos-U85 includes native hardware support for transformer networks and for DeepLabV3 semantic-segmentation operations such as TRANSPOSE, GATHER, MATMUL, RESIZE_BILINEAR, and ARGMAX. It also supports elementwise operator chaining, which fuses an elementwise operation with the preceding operation so the intermediate tensor never has to be written to SRAM and read back. This improves NPU efficiency by reducing data transfer between the NPU and memory.
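Operator chaining can be pictured as fusing an elementwise operation into its producer so the intermediate tensor is never materialized. The pure-Python sketch below contrasts the two schedules; it is a conceptual illustration, not Arm's implementation.

```python
def conv1d(x, w):
    """Toy 1-D convolution (valid padding), standing in for any NPU op."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def unchained(x, w, bias):
    # Without chaining: the conv output is written out as a full
    # intermediate tensor, then read back for the elementwise add.
    intermediate = conv1d(x, w)              # write to SRAM
    return [v + bias for v in intermediate]  # read back, then add

def chained(x, w, bias):
    # With chaining: the elementwise add is applied while each conv
    # result is still on-chip, so no intermediate tensor is stored.
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

x, w = [1, 2, 3, 4], [1, 0, -1]
assert unchained(x, w, 10) == chained(x, w, 10)  # same result, less traffic
```

Both schedules compute identical outputs; the chained form simply avoids the round trip through memory, which is where the efficiency gain comes from.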