Arm’s Cortex-R82 embedded processor for big-memory products

Author: EIS
Release Date: Sep 14, 2020


Arm has announced its first 64bit, Linux-capable Cortex-R processor, designed for computational storage solutions.


Called the Cortex-R82, it is a successor to the 32bit Cortex-R5 and Cortex-R8 processors used in solid-state drives.

“These systems have historically required less than 4Gbyte of DRAM and addressable space, and have not had a need to run Linux,” according to the company. “With continually increasing storage capacities and performance requirements to saturate increasing throughput of storage host interfaces, the 4Gbyte limit and inability to run Linux are adding complexity, and in some cases, becoming barriers.”

And moving to 64bit gives native addressing capability beyond 4Gbyte.

In this case, it offers 40bit addressing for up to 1Tbyte.
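The arithmetic behind both limits is easy to check. The following is only an illustrative sketch: the function name `addressable_bytes` is made up for this example and has nothing to do with Arm's tooling, and it assumes a flat, byte-addressed space.

```python
GIB = 2 ** 30  # one Gbyte in the binary sense used for address spaces
TIB = 2 ** 40  # one Tbyte in the same sense


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable in a flat byte-addressed space of the given width.

    Hypothetical helper for illustration only.
    """
    return 2 ** address_bits


# A 32bit address reaches 4Gbyte, the ceiling quoted for earlier Cortex-R parts.
print(addressable_bytes(32) // GIB)  # 4

# A 40bit address reaches 1Tbyte, the figure quoted for Cortex-R82.
print(addressable_bytes(40) // TIB)  # 1
```

The extra eight address bits multiply the reachable space by 256.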

As a real-time processor, the new core retains Cortex-R deterministic response times and low-latency ports, as well as having ports for peripherals and memories including tightly-coupled memories (TCMs) and caches.

Compared with Cortex-R8, R82 shows a 74-125% performance increase running ‘customer code benchmarks’, according to Arm, and 21% uplift over Cortex-A55 when running SPECINT2006 benchmarks, and 23% improvement on SPECFP2006.

An optional memory management unit (MMU) is available for Linux.

“In traditional Cortex-R real-time behaviour,” said Arm, “a Cortex-R82 core can be configured with a memory protection unit to run bare metal and RTOS. In Cortex-R82, that same core can also be configured with an optional MMU to allow a high-level operating system, like Linux, to execute. Both the real-time and MMU contexts can be handled by the same core simultaneously, or selected cores in a cluster can be dedicated to real-time or Linux. This choice is handled by software, and can even be changed dynamically.”

Having Linux support “paves the way for simplified computational storage architectures and flexible SoC designs that can reallocate compute resources dynamically based upon changing workloads or different products,” said Arm. “Cortex-R82 leverages the Arm Linux ecosystem. Linux, or any other high-level operating system, that today work on Arm Cortex-A series processors, will seamlessly work on Cortex-R82.”

For machine learning applications, “that will be at the heart of computational storage applications”, said Arm, Cortex-R82 optionally supports Arm’s Neon. The company’s Compute Library and NN (neural network) library can be accelerated by Neon – perhaps to search for a specific image in a drive full of images.

The company has estimated figures for a typical four-core cluster implementation of the Cortex-R82 processor on a mainstream low-power 5nm process with standard-performance cell libraries.

If each core has:

  • 32kbyte L1 instruction cache
  • 32kbyte L1 data cache
  • 32kbyte of ITCM
  • 32kbyte of DTCM
  • floating-point and SIMD engine

And the cluster has a 1Mbyte shared L2 cache.

Then maximum clock frequency will be above 1.8GHz, and 3.41 / 4.32 / 8.67DMIPS/MHz, 30DMIPS/mW and 5.82CoreMark/MHz will be available. Total area, including cluster, cores, RAM and routing, will be as small as 2mm². The company puts caveats on all these figures.
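For a rough sense of scale, the quoted configuration can be turned into totals with a few lines of arithmetic. This is a back-of-envelope sketch, not Arm's method. It assumes the DMIPS/MHz figures scale linearly with clock and takes 1.8GHz as the floor. The article does not say what distinguishes the three DMIPS/MHz figures, nor whether they are per core, so the results are indicative at best. All names in the code are hypothetical.

```python
CORES = 4
# Per-core memories from the quoted configuration, in kbyte.
PER_CORE_KBYTE = {"L1I": 32, "L1D": 32, "ITCM": 32, "DTCM": 32}
L2_KBYTE = 1024  # 1Mbyte shared L2 cache
MIN_CLOCK_MHZ = 1800  # "above 1.8GHz", so treated here as a lower bound
DMIPS_PER_MHZ = (3.41, 4.32, 8.67)  # the three quoted figures


def cluster_memory_kbyte() -> int:
    """Total cache plus TCM in the cluster (hypothetical helper)."""
    return CORES * sum(PER_CORE_KBYTE.values()) + L2_KBYTE


def dmips_at(clock_mhz: float, dmips_per_mhz: float) -> float:
    """Assumes throughput scales linearly with clock frequency."""
    return clock_mhz * dmips_per_mhz


print(cluster_memory_kbyte())  # 1536
for figure in DMIPS_PER_MHZ:
    # Rounded because the inputs are only quoted to two decimal places.
    print(round(dmips_at(MIN_CLOCK_MHZ, figure)))  # 6138, 7776, 15606
```

On these assumptions the cluster carries 1.5Mbyte of cache and TCM, and the lowest quoted figure corresponds to a little over 6,000DMIPS at 1.8GHz.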