Updated: More on: Maxim’s AI chip for battery-powered products also adds Risc-V

Author: EIS | Release Date: Oct 19, 2020


Earlier today, Maxim announced an AI processing chip for battery-powered devices needing convolutional neural networks (CNNs).

MAX78000: Maxim's view of CNN AI
What the company has done is put custom CNN processing hardware alongside a conventional 100MHz Arm Cortex-M4F core – the 'F' denotes the floating-point M4 – and squeeze in the added surprise of a 60MHz 32bit RISC-V core for low-power signal processing, all on a chip called the MAX78000, part of Maxim's Darwin family.
MAX78000 block diagram
Compared with running a given CNN on a Cortex-M4F, the company claims that its custom neural network processor consumes 1,100x less power on the MNIST benchmark and 600x less power while keyword spotting. On top of this, it claims a 400x speed-up on MNIST and 200x faster keyword spotting compared with a 96MHz Cortex-M4F.
“The entire neural network accelerator is designed specifically for AI processing,” Maxim director of microcontrollers Kris Ardis told Electronics Weekly. “In particular we’ve tried to really help convolutional neural networks run smoothly. There is hardware support for 1D and 2D convolutions, activation functions – ReLU, Abs, pooling, and much more. Once started, the accelerator runs independently of the microcontroller cores – the Arm and RISC-V cores aren’t involved except to configure the network and load data. The CNN accelerator is a state machine that will execute the neural network on its own and isn’t in any way an extension of a microcontroller. It is a big portion of the chip, and it’s a highly optimised hardware engine made to execute networks in as little energy as possible.”

The neural net accelerator offers:

  • Capacity for 442,000 8bit weights, with weight widths of 1, 2, 4 or 8bit – so up to roughly 3.5 million 1bit weights
  • Per-layer choice of weight width
  • Programmable input image size up to 1,024 x 1,024 pixels
  • Programmable network depth up to 64 layers
  • Programmable per layer network width up to 1,024 channels
  • 1D and 2D convolution processing
  • A streaming mode

  • Support for other network types, including MLPs (multi-layer perceptrons) and recurrent neural networks
“The CNN weight memory is SRAM-based, so AI network updates can be made on the fly,” according to the company. “The CNN engine also has 512kbyte of data memory, and the architecture is flexible, allowing networks to be trained in conventional toolsets like PyTorch and TensorFlow, then converted for execution on the MAX78000 using tools provided by Maxim.”
Spoken keyword analysis using <40% of resources
What can the chip do?
The company has provided some examples of what the CNN accelerator can do (see photos), and the comparison with the Cortex-M4F above, but pinning down performance is difficult because embedded AI is in its infancy.
“There aren’t any common benchmarks or metrics for AI in the embedded space,” said Ardis. “Things like pJ/MAC can be misleading because it might not include energy spent on data movement, which we’ve found to be significant and was a major focus of our design to minimise. One comparison method that seems to be gaining popularity in the embedded space is a relative comparison to a pure software implementation: if you run the same neural network on a Cortex-M4F or Cortex-M7, then run it on your device, how much faster and how much lower energy are you? This is the path we’ve chosen to use in our communications because we think it is the most meaningful for the embedded universe. Of course, the best way to communicate is to share actual measured latency and energy numbers on a given network.”
How about power consumption per task?
Ardis said a focus on energy is more realistic: “That actually takes the clock rate and power out of the equation a bit. We’re a digital architecture, so to a first order, if you halve the clock rate, then you halve the power, but the energy stays about the same – minus some static current that doesn’t scale with clock rate. Another difficulty in benchmarks in the AI space is that you might say ‘I can run an MNIST using 4uJ’ while another person might have an MNIST implementation at 8uJ. You still don’t know which one is better because the accuracy of the lower energy network might be poor. Network architecture is an important factor in understanding these performance metrics, just as it is in the software universe.”
And what about the rest of the silicon?
While the neural network accelerator is the only thing that is specifically made for AI processing, the other peripherals in the chip are intended to participate in connecting the CNN to the real world, according to Ardis.
“The micro cores not only provide system control, but are meant to get outside world data from the peripherals into the CNN as efficiently as possible,” he said. “The I/O peripherals are meant to provide plenty of options to connect to external devices: cameras, audio inputs, sensors, etc. The security blocks are intended to make sure new neural network updates are authenticated, and that any data or results can be communicated safely.”
MCU section

  • 100MHz Arm Cortex-M4F (has floating point unit)
  • 512kbyte flash
  • 128kbyte ram
  • 16kbyte instruction cache
  • Ram error correction code (ECC-SEC-DED)
  • 60MHz 32bit RISC-V co-processor
  • Up to 52 GPIO pins
  • 12bit parallel camera interface
  • I2S digital audio interface

Image analysis using <40% of resources
For security, there are optional blocks including secure boot, AES 128/192/256 hardware acceleration and a true random number generator for seed generation.
For battery operation, the device works from 2.0 to 3.6V, and includes its own single-inductor multiple-output (SIMO) dc-dc converter to produce appropriate rails for the internal logic, plus dynamic voltage scaling to minimise core dissipation. Power can be as low as 22.2μA/MHz (Arm core only, while-loop execution at 3.0V from cache). Low-power modes are available, with choices on SRAM retention and real-time clock operation.
Packaging is 8 x 8mm 81pin CTBGA (0.8mm pitch) or 4.6 x 3.7mm 130pin WLP (0.35mm pitch).
What AI applications are reasonably foreseen for this device?
“We’ve had conversations and engagements with customers across many segments: industrial, consumer, medical, payment and others,” said Ardis. “Each new customer conversation seems to bring us a new use case for the technology. I’m especially excited about the vision and audio applications that we are engaged with – counting things like people and pills, and recognising things like animals, events or people – because we’re bringing new functionality to applications; these are things embedded systems and microcontrollers could never really achieve before.”
The company’s list of potential applications includes: object detection and classification, spoken multi-keyword recognition, sound classification, noise cancellation, facial recognition, heart rate analysis, health sign analysis, multi-sensor analysis and predictive maintenance.