IEEE/ACM 2022 INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN 41st Edition

### A Scalable Methodology for Agile Chip Development with Open-Source Hardware Components

<u>Maico Cassel dos Santos<sup>1,\*</sup>,</u>Tianyu Jia<sup>2,\*</sup>, Martin Cochet<sup>3</sup>, Karthik Swaminathan<sup>3</sup>, Joseph Zuckerman<sup>1</sup>, Paolo Mantovani<sup>1</sup>, Davide Giri<sup>1</sup>, Jeff Jun Zhang<sup>2</sup>, Erik Jens Loscalzo<sup>1</sup>, Gabriele Tombesi<sup>1</sup>, Kevin Tien<sup>3</sup>, Nandhini Chandramoorthy<sup>3</sup>, John-David Wellman<sup>3</sup>, David Brooks<sup>2</sup>, Gu-Yeon Wei<sup>2</sup>, Kenneth Shepard<sup>1</sup>, Luca Carloni<sup>1</sup>, Pradip Bose<sup>3</sup>

\* This authors have equal contributions

<sup>1</sup>Columbia University <sup>2</sup>Harvard University <sup>3</sup>IBM Research

# Technology limitations and heterogeneity



John L. Hennessy, David A. Patterson, Communications of the ACM, February 2019

|             | 4.65               |                    |                 |  |
|-------------|--------------------|--------------------|-----------------|--|
| 10          | Acc.               | Acc.               | Acc.            |  |
| Acc.        | RISC-V<br>(Ariane) | (Ariane)           |                 |  |
| FFT<br>Acc. | RISC-V<br>(Ariane) | RISC-V<br>(Ariane) | MEM             |  |
| FFT<br>Acc. |                    | LLC<br>MEM         | Viterbi<br>Acc. |  |
|             | C Routing          |                    |                 |  |

The EPOCHS-0 Heterogeneous Multi-core SoC



# The need for design-reuse





# A light at the end of the tunnel?



# Physical design flow key properties:



### Flexibility

Smooth integration of OSH Adaptable to different technologies and EDA tools



### Robustness

Achieves QoR metrics Verifies correctness in all implementation stages



### Scalability

Handles growth in size and complexity with a sublinear growth in computation infrastructure, engineering effort and design time



# Our methodology for agile chip development





# Outline

- ESP: background and enhancements for ASIC design
- Tile-based physical design flow
- SoC physical integration
- Verification and testing
- Experience with two chip designs
- Conclusion



NoC routers











































# Outline

• ESP: background and enhancements for ASIC design

- Tile-based physical design flow
- SoC physical integration
- Verification and testing
- Experience with two chip designs
- Conclusion







• Top-level floorplan



Custom shapes



• Top-level floorplan





Fixed shapes



• Top-level floorplan







#### **Regular shapes**



• Timing constraints



*NoC clock frequency* 

Ethernet external delays







Power Strategy

#### For a 13-layer process

| Layers                         | IP                | NoC              |
|--------------------------------|-------------------|------------------|
| M1-4                           | Vip/Vss           | Vnoc/Vss         |
| M5-9                           | Vip/Vmem/<br>Vss  | Vnoc/Vip/<br>Vss |
| M10-11                         | Vip/Vmem/Vnoc/Vss |                  |
| M12-M13 reserved for top level |                   |                  |

#### For a 6-layer process

| Layers | IP           | NoC      |
|--------|--------------|----------|
| M1     | Vip/Vss      | Vnoc/Vss |
| M2-4   | Vip/Vss      | Vnoc/Vss |
| M5-6   | Vip/Vnoc/Vss |          |



Power Strategy

#### For a 13-layer process

| Layers                         | IP                | NoC              |
|--------------------------------|-------------------|------------------|
| M1-4                           | Vip/Vss           | Vnoc/Vss         |
| M5-9                           | Vip/Vmem/<br>Vss  | Vnoc/Vip/<br>Vss |
| M10-11                         | Vip/Vmem/Vnoc/Vss |                  |
| M12-M13 reserved for top level |                   |                  |

#### For a 6-layer process

| Layers | IP           | NoC      |
|--------|--------------|----------|
| M1     | Vip/Vss      | Vnoc/Vss |
| M2-4   | Vip/Vss      | Vnoc/Vss |
| M5-6   | Vip/Vnoc/Vss |          |



28

Power Strategy

#### For a 13-layer process

| Layers                         | IP                | NoC              |
|--------------------------------|-------------------|------------------|
| M1-4                           | Vip/Vss           | Vnoc/Vss         |
| M5-9                           | Vip/Vmem/<br>Vss  | Vnoc/Vip/<br>Vss |
| M10-11                         | Vip/Vmem/Vnoc/Vss |                  |
| M12-M13 reserved for top level |                   |                  |

#### For a 6-layer process

| Layers | IP           | NoC      |
|--------|--------------|----------|
| M1     | Vip/Vss      | Vnoc/Vss |
| M2-4   | Vip/Vss      | Vnoc/Vss |
| M5-6   | Vip/Vnoc/Vss |          |



29

Power Strategy

#### For a 13-layer process

| Layers                         | IP                | NoC              |
|--------------------------------|-------------------|------------------|
| M1-4                           | Vip/Vss           | Vnoc/Vss         |
| M5-9                           | Vip/Vmem/<br>Vss  | Vnoc/Vip/<br>Vss |
| M10-11                         | Vip/Vmem/Vnoc/Vss |                  |
| M12-M13 reserved for top level |                   |                  |

#### For a 6-layer process

| Layers | IP           | NoC      |
|--------|--------------|----------|
| M1     | Vip/Vss      | Vnoc/Vss |
| M2-4   | Vip/Vss      | Vnoc/Vss |
| M5-6   | Vip/Vnoc/Vss |          |



Power Strategy

#### For a 13-layer process

| Layers                         | IP                | NoC              |
|--------------------------------|-------------------|------------------|
| M1-4                           | Vip/Vss           | Vnoc/Vss         |
| M5-9                           | Vip/Vmem/<br>Vss  | Vnoc/Vip/<br>Vss |
| M10-11                         | Vip/Vmem/Vnoc/Vss |                  |
| M12-M13 reserved for top level |                   |                  |

#### For a 6-layer process

| Layers | IP           | NoC      |
|--------|--------------|----------|
| M1     | Vip/Vss      | Vnoc/Vss |
| M2-4   | Vip/Vss      | Vnoc/Vss |
| M5-6   | Vip/Vnoc/Vss |          |



# Outline

- ESP: background and enhancements for ASIC design
- Tile-based physical design flow
- SoC physical integration
- Verification and testing
- Experience with two chip designs
- Conclusion



# SoC physical integration





# SoC physical integration

SoC Power Allocation





\*Vglobal and Vss are distributed globally













# SoC physical integration

• Timing Closure





Only need to constrain neighboring tiles!



# Outline

- ESP: background and enhancements for ASIC design
- Tile-based physical design flow
- SoC physical integration
- Verification and testing
- Experience with two chip designs
- Conclusion



- Focus on system-level
  - Integration of new components
  - System functionality
- Assumptions
  - OSH components are thoroughly verified
  - ESP NoC, sockets, buses, and platform services are pre-verified
- Four verification approaches
  - RTL simulation using bare-metal applications
  - Netlist simulation with simplified bare-metal applications
  - Full system FPGA emulation for longer tests
  - Logic Equivalence Checking





### **Bare-metal**



CPU executes bootloader from Aux tile





### **Bare-metal**

1

CPU executes bootloader from Aux tile

2 CPU executes program from main memory, sets up data for accelerator





### **Bare-metal**

1

2

3

CPU executes bootloader from Aux tile

CPU executes program from main memory, sets up data for accelerator

CPU configures and starts accelerator





### **Bare-metal**

1

3

4

- CPU executes bootloader from Aux tile
- 2 CPU executes program from main memory, sets up data for accelerator
  - CPU configures and starts accelerator
  - Accelerator performs DMA to main memory





### **Bare-metal**

1

3

5

- CPU executes bootloader from Aux tile
- 2 CPU executes program from main memory, sets up data for accelerator
  - CPU configures and starts accelerator
  - Accelerator performs DMA to main memory
  - CPU validates accelerator outputs





### **Bare-metal**

1

3

- CPU executes bootloader from Aux tile
- 2 CPU executes program from main memory, sets up data for accelerator
  - CPU configures and starts accelerator
  - Accelerator performs DMA to main memory
  - CPU validates accelerator outputs
- 6 CPU writes outputs through UART





### **Simplified bare-metal**

- CPU executes bootloader from Aux tile
- CPU executes program from main memory, sets up data for accelerator
- CPU configures and starts accelerator
- Accelerator performs DMA to main memory
- CPU validates accelerator outputs
- 6 CPU writes outputs through UART

The tile under test uses the netlist The rest of the system uses RTL





2

### **FPGA Emulation**

- Boot Linux OS
- Run Linux applications
- Emulates full system application
- Performance estimates

For longer tests









# Outline

- ESP: background and enhancements for ASIC design
- Tile-based physical design flow
- SoC physical integration
- Verification and testing
- Experience with two chip designs
- Conclusion





T. Jia et al. (ESSCIRC 2022)





M. Cassel dos Santos et al

PUTER-AIDE





• Top: 51 hours in 64-core 376 GB RAM machine

Top: 66 hours in 64-core 376 GB RAM machine





## Conclusion

- Scalable, flexible and robust methodology for agile chip development with open-source hardware components
- Fully push-button OSH integration for ASIC prototyping
- Minimizes manual steps of physical design flow
- Smooth top-level integration
- Limits design computer resources as design scales
- Comprehensive verification in all design steps
- Rapid, user-friendly testing environment
- Increased productivity as demonstrated by two complex tape-outs



IEEE/ACM 2022 INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN 41st Edition

# Thank you!

Maico Cassel dos Santos mcassel@cs.columbia.edu