LeapMind offers AI inference accelerator IP

Author: EIS Release Date: Apr 29, 2020

In the autumn, LeapMind, the Tokyo AI startup, plans to ship “Efficiera”, an ultra-low power AI inference accelerator IP for ASICs and FPGAs.

“Efficiera” is an AI inference accelerator IP specialized for convolutional neural network (CNN) inference; it functions as a circuit in an FPGA or ASIC device.

Its “extreme low bit quantization” technology reduces the number of quantized bits to 1–2 bits. This maximizes the power and space efficiency of convolution operations, which account for the majority of inference processing, without requiring cutting-edge semiconductor manufacturing processes or specialized cell libraries.
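
As a rough illustration of why bit width matters, the following Python sketch computes how much storage the weights of one convolution layer would need at several bit widths. The layer shape (3×3 kernel, 128 input and 128 output channels) and the function name conv_weight_bytes are hypothetical choices for this example, not figures from LeapMind.

```python
def conv_weight_bytes(out_ch, in_ch, kernel, bits):
    """Bytes needed to store one convolution layer's weights at a given bit width."""
    n_weights = out_ch * in_ch * kernel * kernel
    return n_weights * bits / 8


# Hypothetical layer: 3x3 kernel, 128 input channels, 128 output channels.
for bits in (32, 16, 2, 1):
    kib = conv_weight_bytes(128, 128, 3, bits) / 1024
    print(f"{bits:>2}-bit weights: {kib:6.1f} KiB")
```

For this assumed layer, moving from 32-bit to 1-bit weights cuts weight storage by a factor of 32, from 576 KiB to 18 KiB.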

This product makes it possible to add deep learning capabilities to edge devices that are constrained by power consumption and cost, such as consumer appliances (household electrical goods), industrial machinery (construction equipment), surveillance cameras, and broadcasting equipment, as well as miniature machinery and robots with limited heat dissipation capabilities.

LeapMind is simultaneously launching several related products and services: “Efficiera SDK,” a software development tool providing a dedicated learning and development environment for Efficiera; the “Efficiera Deep Learning Model,” for efficient training of deep learning models; and “Efficiera Professional Services,” an application-specific semi-custom model building service, based on LeapMind’s expertise, that enables customers to build extreme low bit quantized deep learning models suited to their own requirements.

Power savings: The power required for convolutional processing is reduced by minimizing the amount of data transfer and the number of bits.

Performance: Simplifying the calculation logic reduces the number of calculation cycles, which improves performance both relative to area and on a per-cycle basis.
Space savings: Using 1–2 bit quantization reduces the calculation logic and minimizes the area per computing unit, so silicon area shrinks while performance is maintained.
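
To see how low bit widths can simplify the calculation logic, consider a dot product, the core operation of a convolution. With weights restricted to -1 or +1 and activations restricted to 2-bit codes (0 to 3), it can be computed with only AND, bit counting, shifts, and additions, with no multipliers. The sketch below shows this generic bit-plane technique in Python. It is an assumption for illustration only; the article does not describe Efficiera's internal circuit, and all function names here are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer (element i becomes bit i)."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def popcount(x):
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")


def dot_1bit_2bit(weights, activations):
    """Dot product of weights in {-1, +1} with activations in {0, 1, 2, 3}.

    Weights are encoded as one bit each (1 means +1, 0 means -1), and the
    activations are split into two bit-planes. For each plane, the sum of the
    selected weights is 2 * popcount(W & A) - popcount(A).
    """
    w_word = pack_bits([1 if w > 0 else 0 for w in weights])
    total = 0
    for k in range(2):  # bit-plane 0 (weight 1) and bit-plane 1 (weight 2)
        a_word = pack_bits([(a >> k) & 1 for a in activations])
        plane_sum = 2 * popcount(w_word & a_word) - popcount(a_word)
        total += plane_sum * (1 << k)
    return total


# Made-up example values, checked against the ordinary multiply-and-add form.
weights = [1, -1, 1, 1, -1, -1, 1, -1]
activations = [3, 0, 2, 1, 1, 2, 0, 1]
reference = sum(w * a for w, a in zip(weights, activations))
assert dot_1bit_2bit(weights, activations) == reference
print(dot_1bit_2bit(weights, activations))
```

In hardware, the AND and bit-count steps map to far smaller logic than a 16-bit or 32-bit floating-point multiplier, which is one plausible reading of the per-area and per-cycle gains described above.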

In general, using numerical representations with wide bit widths of 16 or 32 bits (the FP16 and FP32 data types) improves the accuracy of inference results; however, it also increases the size (area) of the calculation circuits, the processing time, and the power consumption.

Conversely, when a numerical representation with a low bit width of 1–2 bits is used, the circuit scale, processing time, and power consumption are all reduced; however, accuracy decreases, which is an issue when attempting to reduce power and area. Using its extreme low bit quantization technology, LeapMind provides extreme quantization, such as a 1-bit weight (weight coefficient) and a 2-bit activation (intermediate data), while maintaining accuracy and achieving a significant reduction in the model area, thereby maximizing speed, power efficiency, and space efficiency.
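
What a 1-bit weight and a 2-bit activation mean in practice can be sketched with two simple quantizers in Python. These are generic textbook-style mappings (sign for weights, clipped uniform rounding for activations), assumed here purely for illustration. The article does not disclose LeapMind's actual quantization scheme or how it preserves accuracy, and the function names, the max_value parameter, and the sample values are all hypothetical.

```python
def quantize_weight_1bit(w):
    """Map a real-valued weight to one of two values: -1 or +1 (its sign)."""
    return 1 if w >= 0 else -1


def quantize_activation_2bit(x, max_value=1.0):
    """Clip x to [0, max_value], then round to one of 4 evenly spaced codes (0..3)."""
    clipped = min(max(x, 0.0), max_value)
    return round(clipped / max_value * 3)


# Made-up sample values.
weights = [0.42, -0.07, -1.3, 0.9]
activations = [0.05, 0.4, 0.75, 1.6]

print([quantize_weight_1bit(w) for w in weights])          # [1, -1, -1, 1]
print([quantize_activation_2bit(x) for x in activations])  # [0, 1, 2, 3]
```

Each weight then needs a single bit and each activation two bits. The rounding error this introduces is the accuracy cost that such a scheme has to compensate for.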
