Wafer scale is back

Author: EIS
Release Date: Aug 20, 2019


Wafer scale is back, with US start-up Cerebras Systems unveiling an 8-inch by 9-inch wafer-scale device designed for AI applications.

Sean Lie, co-founder and Chief Hardware Architect of Cerebras, shows off the device below:

In AI, chip size is profoundly important, says Cerebras. Big chips process information more quickly, producing answers in less time. Reducing the time-to-insight, or “training time,” allows researchers to test more ideas, use more data, and solve new problems.

Google, Facebook, OpenAI, Tencent, Baidu, and many others argue that the fundamental limitation to today’s AI is that it takes too long to train models. Reducing training time removes a major bottleneck to industry-wide progress.

“Designed from the ground up for AI work, the Cerebras WSE contains fundamental innovations that advance the state of the art by solving decades-old technical challenges that limited chip size, such as cross-reticle connectivity, yield, power delivery, and packaging,” says Andrew Feldman, co-founder and CEO of Cerebras Systems. “Every architectural decision was made to optimize performance for AI work. The result is that the Cerebras WSE delivers, depending on workload, hundreds or thousands of times the performance of existing solutions at a tiny fraction of the power draw and space.”

These performance gains are accomplished by accelerating all the elements of neural network training. A neural network is a multistage computational feedback loop. The faster inputs move through the loop, the faster the loop learns or “trains.” The way to move inputs through the loop faster is to accelerate the calculation and communication within the loop.
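In its simplest form, that feedback loop looks like the sketch below: forward pass, loss, backward pass, weight update, repeated until the model converges. This is a generic, plain-NumPy illustration of the cycle the article describes, not Cerebras code, and every name in it is invented for the example.

```python
# Minimal sketch of the training "feedback loop": forward pass -> loss ->
# backward pass -> weight update, repeated. Single device, purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))             # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0])  # ground-truth weights to recover
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(4)   # model parameters
lr = 0.1          # learning rate

for step in range(200):
    # Calculation: forward pass and loss
    pred = X @ w
    err = pred - y
    loss = float(np.mean(err ** 2))

    # Calculation: backward pass (gradient of the loss w.r.t. w)
    grad = 2.0 * X.T @ err / len(y)

    # On a multi-device system this is where gradients would be communicated
    # and aggregated; faster communication shortens each lap of the loop.
    w -= lr * grad  # weight update closes the loop

print("learned weights:", np.round(w, 2), "final loss:", round(loss, 5))
```

The faster each lap of this loop completes, the sooner the loop converges, which is the "training time" the article refers to.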

With an exclusive focus on AI, the Cerebras Wafer Scale Engine accelerates calculation and communication and thereby reduces training time.

The approach is straightforward and is a function of the size of the WSE: With 56.7 times more silicon area than the largest graphics processing unit, the WSE provides more cores to do calculations and more memory closer to the cores so the cores can operate efficiently.
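The 56.7x figure is consistent with a comparison against a GPU die of roughly 815 square millimeters, about the size of the largest GPU shipping at the time; the GPU area below is an assumption used for a back-of-envelope check, not a number from the article.

```python
# Back-of-envelope check of the quoted 56.7x area ratio.
wse_area_mm2 = 46_225   # Cerebras WSE silicon area quoted in the article
gpu_area_mm2 = 815      # assumed large-GPU die area (not from the article)

print(wse_area_mm2 / gpu_area_mm2)   # -> ~56.7
```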

Because this vast array of cores and memory is on a single chip, all communication is kept on-silicon. This means the WSE’s low-latency communication bandwidth is immense, so groups of cores can collaborate with maximum efficiency, and memory bandwidth is no longer a bottleneck.

The 46,225 square millimeters of silicon in the Cerebras WSE house 400,000 AI-optimized, no-cache, no-overhead compute cores and 18 gigabytes of local, distributed, superfast SRAM as the one and only level of the memory hierarchy.

Memory bandwidth is 9 petabytes per second. The cores are linked together with a fine-grained, all-hardware, on-chip mesh-connected communication network that delivers an aggregate bandwidth of 100 petabits per second.
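Dividing those totals across the cores gives a rough sense of scale. The figures below are a back-of-envelope derivation from the article's numbers, assuming decimal units; they are not Cerebras specifications.

```python
# Rough per-core figures implied by the quoted totals (decimal units assumed).
cores = 400_000
sram_total_bytes = 18e9     # 18 GB of on-chip SRAM
mem_bw_bytes_s = 9e15       # 9 PB/s memory bandwidth
fabric_bw_bits_s = 100e15   # 100 petabits/s fabric bandwidth

print(f"SRAM per core:        {sram_total_bytes / cores / 1e3:.0f} kB")   # ~45 kB
print(f"Memory BW per core:   {mem_bw_bytes_s / cores / 1e9:.1f} GB/s")   # ~22.5 GB/s
print(f"Fabric BW, aggregate: {fabric_bw_bits_s / 8 / 1e15:.1f} PB/s")    # 12.5 PB/s
```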

More cores, more local memory and a low-latency, high-bandwidth fabric together create the optimal architecture for accelerating AI work.

The oblong wafer-scale device is reminiscent of Gene Amdahl’s wafer-scale venture Trilogy Systems, which raised $290 million but could never get the technology to work.

The Cerebras device is water-cooled via a cold plate mounted above it that contains multiple water pipes.

Cerebras claims the chip has been sold to customers, is already running customer workloads and can reduce the time it takes to process some complex data from months to minutes.

Among individual investors in Cerebras are: Sam Altman, co-founder of OpenAI and former president of Y Combinator; Andy Bechtolsheim, co-founder of Sun Microsystems, Granite, Arista and DSSD; David “Dadi” Perlmutter, former EVP, Chief Product Officer and GM of Intel’s Architecture Group; Pradeep Sindhu, founder of Juniper Networks; Ilya Sutskever, co-founder and Chief Scientist of OpenAI; Lip-Bu Tan, CEO of Cadence; and Fred Weber, former CTO of AMD.

Among VCs investing in the company are: Coatue, Benchmark, Altimeter, Vy Capital, Foundation Capital and Eclipse.

Cerebras has raised $120 million and employs 150 engineers. The company’s most recent valuation was $860 million.