

# Architectural Considerations for CPU and Network Interface Integration

C.D. Cranor, R. Gopalakrishnan, P. Z. Onufryk

AT&T Labs - Research Florham Park, NJ {chuck,gopal,pzo}@research.att.com



## Communications Processor ≠ CPU + NIC

## **Communications Processors**

#### Single Chip

- Die size: 25 to 100 mm<sup>2</sup>

#### Shared Functionality

- Simple data link interfaces
- Shared DMA controller

#### Low Latency

- Single bus
- Tightly integrated memory controller

#### Small Buffers

- Motorola MPC860T 10/100 Ethernet interface has 448 bytes total for TX/RX FIFOs

#### Small Burst Transfers

## **PCs/Workstations**

#### System

- CPU die size 75 to 500 mm<sup>2</sup>
- Popular ATM 25/155 SAR, 50 mm<sup>2</sup> in 0.5 μm
- Intel 82559 10/100 Ethernet controller, 34 mm<sup>2</sup> in 0.35 μm

#### Replicated Functionality

- Standalone network interfaces
- Each NIC has its own DMA controller

#### High Latency

- Multiple buses (CPU/memory and I/O)
- Bus bridges

#### Large Buffers

- AMD Am79C975 10/100 Ethernet controller has 12K bytes total for TX/RX FIFOs
#### Large Burst Transfers
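
To put the two FIFO sizes quoted above in perspective, the rough sketch below estimates how long each buffer can absorb 100 Mb/s traffic before it must be drained, which bounds the bus latency and burst size the interface can tolerate. It assumes the entire FIFO is available to one direction and that "12K" means 12 × 1024 bytes; neither assumption comes from the slides, and the function name is illustrative.

```python
def fifo_fill_time_us(fifo_bytes: int, line_rate_mbps: float) -> float:
    """Microseconds of line-rate traffic a FIFO of fifo_bytes can absorb."""
    bits = fifo_bytes * 8
    # 1 Mb/s is exactly 1 bit per microsecond, so bits / Mb/s yields microseconds.
    return bits / line_rate_mbps


# Assumption: the whole FIFO serves a single direction at 100 Mb/s.
mpc860t_us = fifo_fill_time_us(448, 100)          # communications processor
am79c975_us = fifo_fill_time_us(12 * 1024, 100)   # PC/workstation NIC

print(f"MPC860T  (448 B): {mpc860t_us:.2f} us")
print(f"Am79C975 (12 KB): {am79c975_us:.2f} us")
```

Under these assumptions the small FIFO covers about 36 µs of traffic and the large one close to 1 ms, which is consistent with the low-latency, small-burst versus high-latency, large-burst contrast drawn here.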

## **Recipe for a Communications Processor**

- Define system architecture
- Develop application specific blocks
- License CPU core
  - Popular choices: MIPS, ARM and PowerPC
- License commodity blocks
  - 10/100 Ethernet MAC
  - USB
  - HDLC
- Design a multi-channel DMA controller
- Tie the whole system together with on-chip bus
  - Emerging standards (e.g., AMBA, CoreConnect, IP Bus) will simplify this task









## **Data Movement**

- Problems with PIO
  - Consumes twice the bus bandwidth of fly-by DMA
  - Inefficient unaligned address transfers
  - Difficult to generate burst transfers
  - Pollutes data cache
  - Ties up CPU
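
The bus bandwidth point can be illustrated with a simple count. In the sketch below, PIO moves each byte across the bus twice (device to CPU register, then CPU register to memory), while a fly-by transfer moves it once because the device drives the data bus as memory latches it. This is a simplified model that ignores address phases, arbitration, and wait states; the function name and mode strings are hypothetical.

```python
def bus_bytes(payload_bytes: int, mode: str) -> int:
    """Bytes that cross the data bus to move payload_bytes from device to memory."""
    if mode == "pio":
        # One bus read (device -> CPU register) plus one bus write
        # (CPU register -> memory) for every byte of payload.
        return 2 * payload_bytes
    if mode == "flyby":
        # The device places data on the bus and memory captures it
        # in the same transaction, so the payload crosses only once.
        return payload_bytes
    raise ValueError(f"unknown mode: {mode!r}")


frame = 1500  # bytes, e.g. a maximum-size Ethernet payload
print("PIO    :", bus_bytes(frame, "pio"), "bytes on the bus")
print("Fly-by :", bus_bytes(frame, "flyby"), "bytes on the bus")
```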

### UNUM integrates data movement instructions into CPU

- Data movement instructions support:
  - Fly-by transfers
  - Unaligned address transfers
  - Burst transfers
- Software implementation of data movement allows:
  - Customization of descriptor structure and information
  - Interface specific processing
- Concurrent execution with CPU pipeline possible (like multiplier)
  - CPU keeps executing as long as it takes no cache misses
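
As an illustration of how unaligned and burst transfers combine, the sketch below splits a transfer at an arbitrary byte address into an unaligned head, a run of full aligned bursts, and a tail. It is not UNUM's actual instruction behavior: the 16-byte burst size, the function name, and the choice to treat head and tail as one partial transfer each are all assumptions made for the example.

```python
from typing import List, Tuple


def plan_transfer(addr: int, length: int, burst: int = 16) -> List[Tuple[int, int]]:
    """Split [addr, addr + length) into (address, size) bus operations.

    Assumed policy: one partial transfer up to the next burst boundary,
    then full aligned bursts, then one partial transfer for the remainder.
    """
    ops: List[Tuple[int, int]] = []
    end = addr + length

    # Head: from addr up to the next burst-aligned address (or end of data).
    next_aligned = -(-addr // burst) * burst  # round addr up to a multiple of burst
    head_end = min(end, next_aligned)
    if head_end > addr:
        ops.append((addr, head_end - addr))

    # Body: as many full, aligned bursts as fit.
    cur = head_end
    while end - cur >= burst:
        ops.append((cur, burst))
        cur += burst

    # Tail: whatever is left after the last full burst.
    if cur < end:
        ops.append((cur, end - cur))
    return ops


for op_addr, size in plan_transfer(0x1003, 40):
    print(f"0x{op_addr:04x}: {size} bytes")
```

Keeping this split in software is one way to read the customization point above: the policy for head and tail handling can change per interface without changing hardware.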







## **Conclusions**

### UNUM - Add CPU features for data movement and interface specific processing

- Eliminates DMA
  - Speeds time-to-market
  - Provides "full featured" data movement
- Increases flexibility
- Fast events + data movement ➡ fly-by processing
  - Soft interfaces (e.g., ATM SAR)
  - Applications: Encryption, coding, overload control, packet classification, packet telephony
- UNUM targeted at low-cost SOC designs for consumer devices
  - May not be appropriate for other applications
    - PCs/workstations or parallel processors/SANs
    - High speed packet processors for switches and routers
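
The idea of fly-by processing, touching the data once while it is being moved, can be sketched as a single loop that transforms each byte and accumulates a checksum in the same pass. This is a toy model only: the XOR transform stands in for encryption or coding and is not a real cipher, the 16-bit additive checksum is arbitrary, and `flyby_copy` is a hypothetical name rather than anything defined by UNUM.

```python
from typing import Tuple


def flyby_copy(src: bytes, key: int) -> Tuple[bytes, int]:
    """Copy src to a new buffer, processing each byte as it passes through."""
    dst = bytearray()
    checksum = 0
    for byte in src:
        # Per-byte processing happens in the same pass as the move,
        # so the data is never revisited after it lands in dst.
        checksum = (checksum + byte) & 0xFFFF   # checksum over the input bytes
        dst.append(byte ^ key)                  # toy stand-in for encryption/coding
    return bytes(dst), checksum


moved, csum = flyby_copy(b"\x01\x02\x03", 0xFF)
print(moved.hex(), csum)
```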