The Tensor Unit for AI offers high computational throughput, handles matrix workloads, and integrates with vector-enabled systems while maintaining energy efficiency.
Semidynamics has launched its RISC-V Tensor Unit, designed for high-speed AI applications and built on its adaptable 64-bit cores. Modern machine learning models, such as LLaMa-2 or ChatGPT, have billions of parameters and demand hardware capable of several trillion operations per second. Balancing this immense processing demand with energy efficiency is a formidable hardware-engineering challenge. The Tensor Unit addresses it, offering substantial computational throughput for AI applications with insatiable performance demands. The primary calculations in Large Language Models (LLMs) are centred on fully connected layers, which are optimally executed as matrix multiplications. With a design honed for matrix multiplication, the Tensor Unit delivers a significant performance boost for AI workloads.
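To illustrate why fully connected layers reduce to matrix multiplication, here is a minimal sketch in plain Python. The function name, shapes, and values are illustrative assumptions, not part of Semidynamics' software stack.

```python
# Hypothetical sketch: a fully connected (dense) layer is a matrix-vector
# multiplication plus a bias add -- the operation a matmul accelerator
# like the Tensor Unit targets. Shapes and values are illustrative only.

def fully_connected(W, x, b):
    """Compute y = W @ x + b with plain Python lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

W = [[1.0, 2.0],
     [3.0, 4.0]]   # 2x2 weight matrix
x = [1.0, 1.0]     # input activations
b = [0.5, -0.5]    # bias vector

print(fully_connected(W, x, b))  # -> [3.5, 6.5]
```

In a real LLM these matrices have thousands of rows and columns, which is why dedicated matmul hardware pays off.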
The Tensor Unit is built on Semidynamics' RVV 1.0 Vector Processing Unit, reusing the existing vector registers for matrix storage. This integration lets the Tensor Unit handle layers that demand matrix multiplication, such as Fully Connected and Convolution layers, while the Vector Unit handles activation function layers. This is a notable advantage over standalone Neural Processing Units (NPUs), which often struggle to manage activation layers. Furthermore, because the Tensor Unit stores its data in the vector registers and introduces no new architecturally visible state, it integrates seamlessly with any RISC-V vector-enabled Linux without requiring modifications.
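The division of labour described above can be sketched as follows, with the matrix multiplication standing in for the Tensor Unit's workload and the element-wise activation for the Vector Unit's. This is a plain-Python illustration under assumed shapes, not Semidynamics code.

```python
# Hypothetical sketch of the split described above: matrix multiplication
# (the Tensor Unit's job) followed by an element-wise activation
# (the Vector Unit's job). Pure Python, illustrative values only.

def matmul(A, B):
    """Naive matrix multiply: the work mapped onto the Tensor Unit."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def relu(M):
    """Element-wise activation: the work mapped onto the Vector Unit."""
    return [[max(0.0, v) for v in row] for row in M]

A = [[1.0, -2.0],
     [3.0,  4.0]]
B = [[1.0, 0.0],
     [0.0, 1.0]]   # identity matrix, so A @ B == A

print(relu(matmul(A, B)))  # -> [[1.0, 0.0], [3.0, 4.0]]
```

Because both stages operate on the same vector registers, no data movement between separate register files is needed between the matmul and the activation step.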
Semidynamics’ CEO and founder, Roger Espasa, said, “This new Tensor Unit is designed to fully integrate with our other innovative technologies to provide solutions with outstanding AI performance. First, at the heart, is our 64-bit, fully customisable RISC-V core. Then our Vector Unit, which is constantly fed data by our Gazzillion technology so that there are no data misses. And then the Tensor Unit that does the matrix multiplications required by AI. Every stage of this solution has been designed to be fully integrated with the others for optimal AI performance and very easy programming. The result is a performance increase of 128x compared to just running the AI software on the scalar core. The world wants super-fast AI solutions and that is what our unique set of technologies can now provide.”