Hsieh Department Of Electrical Engineering

Device, Circuit, and Architecture Challenges for Super Conducting Chips
Full Day Tutorial on Super Conducting Chips
HPCA 2019
Feb 17th 2019


Prof. Murali Annavaram, University of Southern California
Prof. Coenrad Fourie, Stellenbosch University, South Africa
Prof. Akira Fujimaki, Nagoya University, Japan
Prof. Massoud Pedram, University of Southern California

Who should attend:

  • Ever wondered about post-CMOS technology solutions to build processors?
  • What is superconductive electronics (SCE) and how does it work?
  • How can we build a processor to run at 30GHz (and still not burn the house down)?
  • What is the principle behind the operation of Josephson junctions (JJ)?
  • What should a micro-architect know about SCE?
If the above questions intrigue you then you are the audience for this tutorial.

Why Superconductive Electronics?

Superconductive electronics (SCE) based on the single flux quantum (SFQ) family of logic cells has emerged as a potent and within-reach "beyond-CMOS" technology. With proven switching speeds in the hundreds of GHz and energy dissipation approaching Joules per transition (and lower for the adiabatic family), it is one of the most promising post-CMOS technologies that can break the current performance limit of roughly 4 GHz for CMOS processors, delivering 30 GHz single-threaded performance for an SCE processor. Several challenges on the technology, CAD, design tool, and (micro)architecture fronts must be overcome to bring SCE processors to the market.

List of SCE challenges:

This tutorial aims to bring SCE design challenges to the forefront of the processor design community and to introduce that community to recent advances in the field.

To list a few of the challenges:

  • The state of the art in libraries, simulation and analysis, compact modeling, synthesis, and physical design of SFQ-based logic is far behind that of CMOS: semi-manual designs of 16-bit SFQ adders, simple filters and ADCs, and bit-serial processors define the current state of the art.

  • These chips operate at cryogenic temperatures and their power source is the biasing currents. Managing and providing large amounts of current to a microprocessor is a non-trivial challenge.

  • There is no notion of a wire as understood in traditional CMOS. These devices must use specialized superconductive transmission lines (i.e., active ones comprising Josephson junctions (JJs) and inductors, or passive ones requiring full strip lines with associated drivers and receivers) to transport data and control bits across the chip.

  • In an SFQ design, most circuit elements (including all logic cells) are clocked elements, i.e., each logic cell becomes a pipeline stage, and hence even simple circuit blocks such as 64-bit adders may have a dozen pipeline stages. As a result, while the pipelined throughput may not suffer, dependent instructions may encounter the full impact of large execution latencies.

  • For correct operation of a logic cell, it is typically necessary for the different inputs of the cell to arrive in the same clock cycle. Hence, when different inputs traverse paths of different lengths, they must be explicitly path balanced using clocked padding delays (i.e., D flip-flops).

  • Splitting an output signal into more than 2 or 3 fanout paths is a fundamentally challenging task, and hence large-fanout designs are infeasible (or at the least very inefficient), requiring clockless splitter trees with an immediate adverse impact on the peak clock frequency.

  • Cells are physically large (on the order of 40 µm by 50 µm), and hence worst-case transmission line lengths can be in the range of millimeters, resulting in large delays even though signals propagate at a third to half the speed of light.

  • To support traditional architectural pipelining, one must utilize two different clocks, a fast micropipeline clock applied to all logic cells (say at 50GHz) and a slower macropipeline clock applied to architectural pipeline registers (say at 10GHz); this is a challenging task, requiring advanced clock network synthesis and clock distribution.

  • All on-chip memories are designed as flip-flops, and hence the size of the memory is a stringent constraint. At the same time, read outs are generally destructive, so flip-flops lose their value once read. Non-destructive read out flip-flops, on the other hand, are much more area and power hungry. Hence, the conventional reliance on producing a data output and storing it in registers for multiple reads calls for rethinking data and computation reuse strategies.
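
To put rough numbers on three of the challenges above (path balancing, limited fanout, and transmission line delay), the following Python sketch may help. It is an illustration under simplifying assumptions, not a description of any tool covered in the tutorial: every gate is assumed to take exactly one clock cycle, all primary inputs are assumed to arrive in cycle 0, each edge is padded independently (no sharing of D flip-flop chains), and splitters are assumed to be 1-to-2. All function names and the example netlist are hypothetical.

```python
# Illustrative sketch only; names and the example netlist are hypothetical.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def logic_levels(gates, inputs):
    """Return the clock cycle (level) in which each node produces its output.

    gates:  dict mapping gate name -> list of fanin names
    inputs: iterable of primary input names (assumed to arrive at level 0)
    """
    level = {name: 0 for name in inputs}

    def visit(name):
        if name not in level:
            # A clocked gate fires one cycle after its latest fanin.
            level[name] = 1 + max(visit(f) for f in gates[name])
        return level[name]

    for gate in gates:
        visit(gate)
    return level


def balancing_dffs(gates, inputs):
    """Return {(fanin, gate): number of padding D flip-flops} for unbalanced edges."""
    level = logic_levels(gates, inputs)
    padding = {}
    for gate, fanins in gates.items():
        for fanin in fanins:
            gap = level[gate] - level[fanin] - 1
            if gap > 0:
                padding[(fanin, gate)] = gap
    return padding


def splitters_needed(fanout):
    """Number of 1-to-2 splitters in a tree that drives `fanout` sinks."""
    return max(fanout - 1, 0)


def splitter_tree_depth(fanout):
    """Depth of a balanced 1-to-2 splitter tree, i.e. ceil(log2(fanout))."""
    return max(fanout - 1, 0).bit_length()


def wire_delay_cycles(length_mm, fraction_of_c, clock_ghz):
    """Transmission line delay expressed in clock cycles."""
    delay_s = (length_mm * 1e-3) / (fraction_of_c * SPEED_OF_LIGHT_M_PER_S)
    return delay_s * clock_ghz * 1e9


if __name__ == "__main__":
    # g1 = AND(a, b); g2 = OR(g1, c). Input c reaches g2 one cycle early.
    gates = {"g1": ["a", "b"], "g2": ["g1", "c"]}
    print(balancing_dffs(gates, ["a", "b", "c"]))  # {('c', 'g2'): 1}
    print(splitters_needed(8), splitter_tree_depth(8))  # 7 3
    print(wire_delay_cycles(1.0, 1.0 / 3.0, 50.0))  # about half a cycle
```

Under these assumptions, a 1 mm line at a third of the speed of light costs roughly 10 ps, which is about half a clock period at 50 GHz, and a fanout of 8 already needs 7 splitters arranged 3 levels deep.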

Topics covered in the tutorial:

This tutorial provides a comprehensive understanding of the following topics.

  • First, it provides a basic overview of SCE SFQ technology, starting from JJ device operation and the primary characteristics of these devices, particularly from a microarchitecture viewpoint. It also explains the power efficiency of superconductive electronics in spite of the energy dissipation needed to cool these circuits to cryogenic temperatures.

  • The tutorial then explains how these devices may be modeled and simulated to understand their behavior under various operating conditions.

  • The device modeling will be followed by a discussion of the compact modeling of logic cells and superconductive transmission lines, and of specialized logic synthesis, clock tree synthesis, bias distribution, and place-and-route engines.

  • Circuit and layout techniques relating to path balancing and large fanout challenges will be presented.

  • Interfacing issues related to feeding inputs from a conventional room-temperature processor to a cryogenic superconductive co-processor, and vice versa, will be discussed.

  • Finally, it goes on to cover architecture design challenges that emanate from the SCE technology and SFQ logic families, including very small cache sizes, gate-level pipelining, high execution latencies, etc.
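
As a small illustration of why gate-level pipelining leads to high execution latencies, the sketch below compares a stream of independent operations with a chain of dependent ones on a deeply pipelined unit. The 12-stage depth echoes the "dozen pipeline stages" mentioned earlier for a 64-bit adder; the model is an assumption made for illustration (no data forwarding, one operation issued per cycle when independent, one cycle per stage), and the function names are hypothetical.

```python
# Illustrative sketch only; the pipeline model and names are hypothetical.

def cycles_independent(n_ops, depth):
    """Cycles to finish n_ops independent operations issued back to back."""
    return depth + n_ops - 1 if n_ops > 0 else 0


def cycles_dependent(n_ops, depth):
    """Cycles to finish n_ops operations where each one needs the previous result."""
    return n_ops * depth


def time_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


if __name__ == "__main__":
    depth, n_ops, clock_ghz = 12, 100, 30.0
    print(cycles_independent(n_ops, depth))  # 111
    print(cycles_dependent(n_ops, depth))    # 1200
    print(time_ns(cycles_dependent(n_ops, depth), clock_ghz))
```

In this simplified model the dependent chain takes roughly ten times as many cycles as the independent stream, so throughput is preserved only when enough independent work is available.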

Presenter Bios:

Prof. Massoud Pedram, USC:

Prof. Pedram leads the ColdFlux project at USC, a large multi-university and multi-country IARPA-supported effort. The goal of the ColdFlux project is to enable very large scale integration (VLSI) design of Superconductive Electronics (SCE) as a step toward the development of energy-efficient, scalable high performance computers. Toward this end, the ColdFlux team seeks to develop a comprehensive set of open-source EDA tools to enable VLSI design and verification of SCE from at least the register-transfer level (RTL) description to mask tooling data, demonstrate open-source physics-based technology computer-aided design (CAD) tools to enable device and process simulations and device parameter extractions for better design-to-hardware fidelity, and develop open, interoperable (cell) library formats and example cell libraries.

Prof. Pedram's interests span the areas of computer-aided-design (CAD) of VLSI circuits and systems with emphasis on developing methodologies and techniques for low power design, dynamic power management in electronic systems, smart battery technology and design, and quantum computing.

Prof. Coenrad Fourie, Stellenbosch University:

Prof. Fourie's group's primary research is in superconductive electronics and applied superconductivity, with a focus on circuit extraction, device simulation, compact modeling, logic cell design, and layout synthesis.

Prof. Murali Annavaram, USC:

Relating to this tutorial, Prof. Annavaram's group works on the microarchitecture challenges of superconducting chips, such as destructive read outs in register files and the lack of data forwarding paths.

Prof. Pascal Febvre, Universite Savoie Mont Blanc:

Prof. Febvre's primary research deals with the design and experimental characterization of analogue and digital superconducting electronics devices and systems, focusing in particular on JJ device modeling and simulation and on the development of high-frequency and magnetometry applications.