# System Cache v1.00.a

**Product Guide** 

PG031 April 24, 2012




# **Table of Contents**

#### **Chapter 1: Overview**

- Feature Summary
- Applications
- Unsupported Features
- Licensing

#### **Chapter 2: Product Specification**

- Standards Compliance
- Performance
- Resource Utilization
- Port Descriptions
- Register Space

#### **Chapter 3: Customizing and Generating the Core**

- GUI
- Parameter Values

#### **Chapter 4: Designing with the Core**

#### Appendix A: Additional Resources

- Xilinx Resources
- Solution Centers
- References
- Technical Support
- Ordering Information
- Revision History
- Notice of Disclaimer
- Automotive Applications Disclaimer


## Introduction

The LogiCORE<sup>™</sup> System Cache provides system level caching capability to an AMBA<sup>®</sup> AXI4 system. The System Cache resides in front of the external memory controller and is seen as a Level 2 Cache from the MicroBlaze<sup>™</sup> processor point of view.

## Features

- Dedicated AXI4 slave ports for MicroBlaze
- Connects up to 4 MicroBlaze processors
- Generic AXI4 slave port for other AXI4 masters
- AXI4 master port connecting the external memory controller
- Highly configurable cache
- Optional AXI4-Lite Statistics and Control port

| LogiCORE IP Facts Table                      |                                                                                                                               |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| Core Specifics                               |                                                                                                                               |
| Supported<br>Device<br>Family <sup>(1)</sup> | Virtex<sup>®</sup>-6, Spartan<sup>®</sup>-6, Virtex-7, Kintex<sup>™</sup>-7,<br>Artix<sup>™</sup>-7, Zynq<sup>™</sup>-7000 |
| Supported<br>User Interfaces                 | AXI4                                                                                                                          |
| Resources                                    | See Table 2-7                                                                                                                 |
| Provided with Core                           |                                                                                                                               |
| Design Files                                 | VHDL                                                                                                                          |
| Example<br>Design                            | Not Provided                                                                                                                  |
| Test Bench                                   | Not Provided                                                                                                                  |
| Constraints<br>File                          | Not Provided                                                                                                                  |
| Simulation<br>Model                          | Not Provided                                                                                                                  |
| Supported<br>S/W Driver                      | N/A                                                                                                                           |
| Tested Design Tools                          |                                                                                                                               |
| Design Entry<br>Tools                        | Xilinx Platform Studio (XPS)                                                                                                  |
| Simulation <sup>(2)</sup>                    | ModelSim                                                                                                                      |
| Synthesis<br>Tools <sup>(2)</sup>            | ISE 14.1                                                                                                                      |
| Support                                      |                                                                                                                               |
| Provided by Xilinx                           | <u>www.xilinx.com/support</u>                                                                                                 |

#### Notes:

1. For a complete listing of supported devices, see the <u>release notes</u> for this core.
2. For the supported versions of the tools, see the <u>ISE Design Suite 14: Release Notes Guide</u>.



# Overview

# **Feature Summary**

The System Cache can be added to an AXI system to improve overall system computing performance for accesses to external memory. The System Cache is typically used in a MicroBlaze™ system implementing a Level 2 Cache with up to four MicroBlaze processors. The generic AXI4 interface provides access to the caching capability for all other AXI4 masters in the system.

#### Performance

The effect of the System Cache on performance is highly system and application dependent. Performance improvements can be expected for the following application and system characteristics:

- Applications with repeated access of data occupying a certain address range, for example, when external memory is used to buffer data during computations. In particular, performance improvements are achieved when the data set exceeds the capacity of the MicroBlaze internal data cache.
- Systems with small MicroBlaze caches, for example, when the MicroBlaze implementation is tuned to achieve as high a frequency as possible. In this case, the increased system frequency contributes to the performance improvements, and the System Cache alleviates the performance loss incurred by the reduced size of the MicroBlaze internal caches.

### **Typical Systems**

In a typical system with one MicroBlaze processor, shown in Figure 1-1, the instruction and data cache interfaces (M\_AXI\_IC and M\_AXI\_DC) are connected to dedicated AXI4 interfaces optimized for MicroBlaze on the System Cache. The System Cache often makes it possible to reduce the MicroBlaze internal cache sizes, without reducing system performance. Non-MicroBlaze AXI4 masters are connected to the generic AXI4 slave interface of the System Cache through an AXI interconnect.



Figure 1-1: Typical System With a Single Processor

The System Cache can also be used in a system without any MicroBlaze processor, as illustrated in Figure 1-2.



Figure 1-2: System Without Processor

The System Cache has eight cache interfaces optimized for MicroBlaze, enabling direct connection of up to four MicroBlaze processors, depicted in Figure 1-3.



Figure 1-3: Typical System With Multiple MicroBlaze Processors

#### **MicroBlaze Optimized AXI4 Slave Interface**

The System Cache has eight AXI4 interfaces optimized for accesses performed by the cache interfaces on MicroBlaze. Because MicroBlaze has one AXI4 interface for the instruction cache and one for the data cache, this means that systems with up to four MicroBlaze processors are supported.

By only using a 1:1 AXI interconnect to directly connect MicroBlaze and the System Cache, access latency for MicroBlaze cache misses is reduced, which improves performance. The optimization to only handle the types of AXI4 accesses issued by MicroBlaze simplifies the implementation, saving area resources as well as improving performance. The data widths of the MicroBlaze optimized interfaces are parameterized to match the data widths of the connected MicroBlaze processors. With wide interfaces the MicroBlaze cache line length normally determines the data width.

The Optimized AXI4 slave interfaces are compliant with a subset of the AXI4 interface specification. The interface includes the following features and exceptions:

- Support for 32-, 128-, 256-, and 512-bit data widths
- Support for some AXI4 burst types and sizes
  - No support for FIXED bursts
  - WRAP bursts corresponding to the MicroBlaze cache line length, that is, either 4 beats or 8 beats
  - Single beat INCR burst, or either 4 beats or 8 beats corresponding to the MicroBlaze cache line length
  - Exclusive accesses are treated as normal accesses, never returning EXOKAY
  - Only support for native transaction size, that is, same as data width for the port
- Support for burst sizes that are less than the data width, with either 32-, 128-, 256-, or 512-bits
- AXI user signals are not necessary or supported
- All transactions executed in order regardless of thread ID value. No read reordering or write reordering is implemented.
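
The burst rules in the list above can be summarized as a small check. This is a minimal sketch, not taken from the core's VHDL: the function name `optimized_port_accepts` and the constants are hypothetical, and it simplifies the cache line rule by accepting either 4 or 8 beats, whereas a real system would use the line length configured in the connected MicroBlaze.

```python
# Sketch only: encodes the bullet list above; names are hypothetical.
SUPPORTED_WIDTHS = {32, 128, 256, 512}  # optimized port data widths, in bits
CACHE_LINE_BEATS = {4, 8}               # MicroBlaze cache line lengths, in beats


def optimized_port_accepts(burst_type: str, beats: int, data_width: int) -> bool:
    """Return True if a burst fits the subset listed for the optimized ports."""
    if data_width not in SUPPORTED_WIDTHS:
        return False
    if burst_type == "WRAP":
        # WRAP bursts must correspond to a cache line.
        return beats in CACHE_LINE_BEATS
    if burst_type == "INCR":
        # INCR bursts are either a single beat or a full cache line.
        return beats == 1 or beats in CACHE_LINE_BEATS
    # FIXED bursts (and any unknown type) are not supported.
    return False
```

For example, `optimized_port_accepts("WRAP", 8, 32)` is accepted, while a FIXED burst or a 2-beat INCR burst is rejected.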

#### **Generic AXI4 Slave Interface**

To handle several AXI4 masters in a system an AXI interconnect is used to share the single generic AXI4 slave interface on the System Cache. The generic AXI4 interface has a configurable data width to efficiently match the connected AXI4 masters. This ensures that both the system area and the AXI4 access latency are reduced.

The Generic AXI4 slave interface is compliant with the full AXI4 interface specification. The interface includes the following features and exceptions:

- Support for 32-, 64-, 128-, 256-, and 512-bit data widths
- Support for all AXI4 burst types and sizes
  - FIXED bursts are handled as INCR type burst operations (no QUEUE burst capability)
  - 16 beats for WRAP bursts
  - 16 beats for FIXED bursts (treated as INCR burst type)
  - 256 beats for INCR burst
  - Exclusive accesses are treated as normal accesses, never returning EXOKAY
- Support for burst sizes that are less than the data width, *narrow* bursts
- AXI user signals are not necessary or supported
- All transactions executed in order regardless of thread ID value. No read reordering or write reordering is implemented.

#### Memory Controller AXI4 Master Interface

The AXI4 master interface is used to connect the external memory controller. The data width of the interface can be parameterized to match the data width of the AXI4 slave interface on the memory controller. For best performance and resource usage, the parameters on the interface and the Memory Controller should match.

The Memory Controller AXI4 master interface is compliant with the AXI4 interface specification. The interface includes the following features:

- Support for 32-, 64-, 128-, 256-, and 512-bit data widths
- Generates the following AXI4 burst types and sizes
  - 2 to 16 beats for WRAP bursts
  - 1 to 16 beats for INCR bursts
- AXI user signals are not provided
- A single thread ID value is generated

#### **Cache Memory**

The Cache memory provides the actual cache functionality in the System Cache. The cache is configurable in terms of size and associativity.

The cache size can be configured with the parameter C\_CACHE\_SIZE according to Table 3-1. The selected size is a trade-off between performance and resource usage, in particular the number of Block RAMs.

The associativity can be configured with the parameter C\_NUM\_SETS according to Table 3-1. Increased associativity generally provides better hit rate, which gives better performance but requires more area resources.

The correspondence between selected parameters and used Block RAMs is listed in Table 2-7.

#### **Statistics and Control**

The optional Statistics and Control block can be used to collect cache statistics such as cache hit rate and access latency. The statistics are primarily intended for internal Xilinx use, but can also be used to tailor the configuration of the System Cache to meet the needs of a specific application.

The following types of statistics are collected:

- Port statistics for each slave interface
  - Total Read and Write transaction counts
  - Port queue usage for the six transaction queues associated with each port
  - Read and Write transaction latency
- Arbitration statistics
- Functional unit statistics
  - Cache hit rates for read and write
  - Stall cycles
  - Internal queue usage
- Port statistics for the master interface
  - Read and write latency

For details on the registers used to read statistics and control how statistics are gathered, see Chapter 2, Register Space.

# **Applications**

An example of an Ethernet communication system is given in Figure 1-4. The system consists of a MicroBlaze processor connected point-to-point to two optimized ports of the System Cache. A DMA controller is connected to the generic port of the System Cache through a 3:1 AXI interconnect, since the DMA controller has three AXI master ports. The DMA in turn is connected to the Ethernet IP by AXI4-Stream links. Standard peripheral functions like UART, timer, interrupt controller as well as the DMA controller control port are connected to the MicroBlaze peripheral data port (M\_AXI\_DP) for register configuration and control.

With this partitioning the bandwidth critical interfaces are connected directly to the System Cache and kept completely separated from the AXI4-Lite based configuration and control connections.



This system is used as an example throughout the documentation.

Figure 1-4: Ethernet System

In this example MicroBlaze is configured for high performance, while still being able to reach a high maximum frequency. The MicroBlaze frequency is mainly improved due to small cache sizes, implemented using distributed RAM.

The lower hit rate from small caches is mitigated by the higher system frequency and the use of the System Cache. The decreased hit rate in the MicroBlaze caches is compensated by cache hits in the System Cache, which incur less penalty than accesses to external memory.

Write-back data cache is enabled in MicroBlaze, which in the majority of cases gives higher performance than using the default write-through cache.

Finally, the victim cache is enabled for both the MicroBlaze instruction and data caches, which improves the hit rate by storing the most recently discarded cache lines.

All AXI data widths on the System Cache ports are matched to the AXI data widths of the connecting modules to avoid data width conversions, which minimizes the AXI Interconnect area overhead. The AXI 1:1 connections are only implemented as routing without any logic in this case.

All AXI ports are clocked using the same clock, which means that there is no need for clock conversion within the AXI interconnects. Avoiding clock conversion gives minimal area and latency for the AXI interconnects.

| Parameter              | Value |
|------------------------|-------|
| C_CACHE_BYTE_SIZE      | 512   |
| C_ICACHE_ALWAYS_USED   | 1     |
| C_ICACHE_LINE_LEN      | 8     |
| C_ICACHE_STREAMS       | 1     |
| C_ICACHE_VICTIMS       | 8     |
| C_DCACHE_BYTE_SIZE     | 512   |
| C_DCACHE_ALWAYS_USED   | 1     |
| C_DCACHE_LINE_LEN      | 8     |
| C_DCACHE_USE_WRITEBACK | 1     |
| C_DCACHE_VICTIMS       | 8     |

Table 1-1: MicroBlaze Parameter Settings for the Ethernet System

#### Table 1-2: System Cache Parameter Settings for the Ethernet System

| Parameter             | Value |
|-----------------------|-------|
| C_NUM_OPTIMIZED_PORTS | 2     |
| C_NUM_GENERIC_PORTS   | 1     |
| C_NUM_SETS            | 4     |
| C_CACHE_SIZE          | 65536 |
| C_M_AXI_DATA_WIDTH    | 32    |

## **Unsupported Features**

The System Cache provides no support for coherency between the MicroBlaze internal caches.

This means that software must ensure coherency for data exchanged between the processors. When the MicroBlaze processors use write-back data caches, all processors need to flush their caches to ensure that correct data is exchanged. For write-through caches, only the processors reading data need to flush their caches.

# Licensing

The System Cache IP core does not require a license key. The System Cache core is provided under the terms of the Xilinx End User License Agreement.



# **Product Specification**

# **Standards Compliance**

The System Cache adheres to the AMBA<sup>®</sup> AXI4 Interface standard (see ARM<sup>®</sup> AMBA AXI Protocol Specification, Version 2.0 ARM IHI 0022C).

# Performance

The perceived performance depends on many factors, such as frequency, latency, and throughput. Which factor dominates is application specific. The performance factors are also correlated: for example, achieving a high frequency can add latency, and wide datapaths for throughput can reduce the achievable frequency.

#### **Maximum Frequencies**

The following are clock frequencies for the target families. The maximum achievable clock frequency can vary. The maximum achievable clock frequency and all resource counts can be affected by other tool options, additional logic in the FPGA, using a different version of Xilinx tools, and other factors.

| Architecture | -1L | -1  | -2L | -2  | -3  | -4  |
|--------------|-----|-----|-----|-----|-----|-----|
| Spartan®-6   | 85  | N/A | N/A | 120 | 140 | 150 |
| Virtex®-6    | 170 | 170 | N/A | 210 | 230 | N/A |
| Artix™-7     | N/A | 140 | 120 | 155 | 180 | N/A |
| Kintex™-7    | N/A | 175 | 175 | 210 | 240 | N/A |
| Virtex-7     | N/A | 170 | 170 | 210 | 240 | N/A |

Table 2-1: Maximum Frequencies (MHz) per Speed Grade

#### Cache Latency

Read latency is defined as the number of clock cycles from when the read address is accepted by the System Cache until read data is available.

Write latency is defined as the number of clock cycles from when the write address is accepted by the System Cache until the response is valid.

The latency depends on many factors, such as traffic from other ports and conflicts with earlier transactions. The numbers listed here assume a completely idle System Cache and no write data delay for transactions on one of the optimized ports.

For transactions using the Generic AXI port, two additional clock cycles of latency are added.

| Type            | Optimized Port Latency                                                                                                                                             |
|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Read Hit        | 4                                                                                                                                                                  |
| Read Miss       | 6 + latency added by memory subsystem                                                                                                                              |
| Read Miss Dirty | Maximum of:<br>6 + latency added by memory subsystem<br>6 + latency added for evicting dirty data (cache line length * 32 / M_AXI Data Width)                      |
| Write Hit       | 4 + burst length                                                                                                                                                   |
| Write Miss      | 6 + latency added by memory subsystem for writing data                                                                                                             |

Table 2-2: System Cache Latencies for Optimized Port
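
The entries of Table 2-2, together with the two extra cycles for the Generic port, can be expressed as a small model. The sketch below is illustrative only: the function and parameter names are hypothetical, the cache line length is assumed to be given in 32-bit words, and the eviction term is rounded up to whole cycles, which the table does not state.

```python
def optimized_port_latency(kind, memory_latency=0, burst_length=1,
                           line_length_words=8, m_axi_data_width=32):
    """Idle-cache latency in clock cycles for an optimized port (Table 2-2)."""
    if kind == "read_hit":
        return 4
    if kind in ("read_miss", "write_miss"):
        # memory_latency is whatever the memory subsystem adds (reading or writing).
        return 6 + memory_latency
    if kind == "read_miss_dirty":
        # Eviction cost: cache line length * 32 / M_AXI data width.
        # Assumption: line length is in 32-bit words; result rounded up.
        evict = -(-(line_length_words * 32) // m_axi_data_width)
        return max(6 + memory_latency, 6 + evict)
    if kind == "write_hit":
        return 4 + burst_length
    raise ValueError(f"unknown transaction kind: {kind}")


def generic_port_latency(kind, **kwargs):
    """Generic port transactions add two clock cycles to the optimized figure."""
    return optimized_port_latency(kind, **kwargs) + 2
```

With these assumptions, a read hit on the Generic port evaluates to 6 cycles, and a dirty read miss with an 8-word line on a 32-bit M_AXI is bounded below by 14 cycles.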

The numbers for an actual application vary depending on access patterns, hit/miss ratio and other factors. Below are example values from a system running the iperf network testing tool with the LWIP TCP/IP stack in raw mode. Table 2-3 contains the hit rate for transactions from all ports. Table 2-4, Table 2-5 and Table 2-6 show per port latencies for the three active ports.

Table 2-3: Application Total Hit Rates

| Type  | Hit rate |
|-------|----------|
| Read  | 99.8%    |
| Write | 83.8%    |

#### Table 2-4: System Cache Latencies for MicroBlaze D-Side Port

| Type  | Min | Max | Average | Standard Deviation |
|-------|-----|-----|---------|--------------------|
| Read  | 4   | 198 | 9       | 5                  |
| Write | 6   | 301 | 21      | 7                  |

Table 2-5: System Cache Latencies for MicroBlaze I-Side Port

| Type  | Min | Max | Average | Standard Deviation |
|-------|-----|-----|---------|--------------------|
| Read  | 4   | 279 | 12      | 5                  |
| Write | N/A | N/A | N/A     | N/A                |

Table 2-6: System Cache Latencies for Generic Port

| Type  | Min | Max | Average | Standard Deviation |
|-------|-----|-----|---------|--------------------|
| Read  | 6   | 186 | 11      | 8                  |
| Write | 9   | 210 | 39      | 10                 |

#### Throughput

The System Cache is fully pipelined. Its theoretical maximum transaction rate is one read or write hit data beat, concurrent with one read miss and one write miss data beat, per clock cycle, provided there are no conflicts with earlier transactions.

This theoretical limit is subject to memory subsystem bandwidth, intra-transaction conflicts and cache hit detection overhead, which will reduce the achieved throughput to less than three data beats per clock cycle.

### **Resource Utilization**

Resources required for the System Cache core have been estimated for the Kintex<sup>™</sup>-7 FPGA (Table 2-7). These values were generated using the Xilinx® ISE® tools, version 14.1. They are derived from post-synthesis reports, and might be changed by MAP and PAR.

| C_NUM_OPTIMIZED_PORTS | C_NUM_GENERIC_PORTS | C_NUM_SETS | C_S0_AXI_DATA_WIDTH | C_M_AXI_DATA_WIDTH | C_CACHE_SIZE | LUTs | FFs  | Block<br>RAMs |
|-----------------------|---------------------|------------|---------------------|--------------------|--------------|------|------|---------------|
| 1                     | 0                   | 2          | 32                  | 32                 | 32kB         | 1430 | 806  | 10            |
| 2                     | 0                   | 2          | 32                  | 32                 | 32kB         | 1745 | 913  | 10            |
| 4                     | 0                   | 2          | 32                  | 32                 | 32kB         | 2264 | 1110 | 10            |
| 8                     | 0                   | 2          | 32                  | 32                 | 32kB         | 3424 | 1497 | 10            |
| 0                     | 1                   | 2          | 32                  | 32                 | 32kB         | 1933 | 1133 | 10            |
| 2                     | 1                   | 2          | 32                  | 32                 | 32kB         | 2492 | 1348 | 10            |
| 2                     | 0                   | 4          | 32                  | 32                 | 32kB         | 2166 | 1083 | 9             |
| 2                     | 0                   | 2          | 32                  | 32                 | 64kB         | 1782 | 911  | 18            |
| 2                     | 0                   | 2          | 32                  | 32                 | 128kB        | 1785 | 908  | 34            |
| 2                     | 0                   | 2          | 32                  | 512                | 128kB        | 8038 | 2234 | 34            |
| 2                     | 0                   | 2          | 512                 | 512                | 128kB        | 8480 | 3078 | 34            |

Table 2-7: Kintex-7 System Cache FPGA Resource Estimates

## **Port Descriptions**

All System Cache interfaces are compliant with AXI4. The input signals ACLK and ARESETN implement clock and reset for the entire System Cache.





Table 2-8: System Cache I/O Interfaces

| Interface Name      | Type            | Description                       |
|---------------------|-----------------|-----------------------------------|
| ACLK                | Input           | Clock for System Cache            |
| ARESETN             | Input           | Synchronous reset of System Cache |
| Sx_AXI <sup>1</sup> | AXI4 Slave      | MicroBlaze Optimized Cache Port   |
| S0_AXI_GEN          | AXI4 Slave      | Generic Cache Port                |
| M_AXI               | AXI4 Master     | Memory Controller Master Port     |
| S_AXI_CTRL          | AXI4-Lite Slave | Control port                      |

1. x = 0 - 7

# **Register Space**

All registers in the optional Statistics module are 64 bits wide. The address map is structured according to Table 2-9.

Table 2-9: Address Structure

| Field        | Category | Port Number | Functionality | Register | High/Low | Always "00" |
|--------------|----------|-------------|---------------|----------|----------|-------------|
| Address bits | 16-14    | 12-10       | 9-5           | 4-3      | 2        | 1-0         |

The address coding of all functional units in the System Cache with statistic gathering capability is defined by Table 2-10.

| Address (binary)      | Category and Port<br>number | Description                                                                        |
|-----------------------|-----------------------------|------------------------------------------------------------------------------------|
| 0_0000_00xx_xxxx_xx00 | Optimized port 0            | All statistics for Optimized port #0 defined in Table 2-11 when used, 0 otherwise. |
| 0_0000_01xx_xxxx_xx00 | Optimized port 1            | All statistics for Optimized port #1 defined in Table 2-11 when used, 0 otherwise. |
| 0_0000_10xx_xxxx_xx00 | Optimized port 2            | All statistics for Optimized port #2 defined in Table 2-11 when used, 0 otherwise. |
| 0_0000_11xx_xxxx_xx00 | Optimized port 3            | All statistics for Optimized port #3 defined in Table 2-11 when used, 0 otherwise. |
| 0_0001_00xx_xxxx_xx00 | Optimized port 4            | All statistics for Optimized port #4 defined in Table 2-11 when used, 0 otherwise. |
| 0_0001_01xx_xxxx_xx00 | Optimized port 5            | All statistics for Optimized port #5 defined in Table 2-11 when used, 0 otherwise. |
| 0_0001_10xx_xxxx_xx00 | Optimized port 6            | All statistics for Optimized port #6 defined in Table 2-11 when used, 0 otherwise. |
| 0_0001_11xx_xxxx_xx00 | Optimized port 7            | All statistics for Optimized port #7 defined in Table 2-11 when used, 0 otherwise. |
| 0_0100_00xx_xxxx_xx00 | Generic port                | All statistics for the Generic port defined in Table 2-12 when used, 0 otherwise.  |
| 0_1000_00xx_xxxx_xx00 | Arbiter                     | Statistics available in arbiter stage defined in Table 2-13                        |
| 0_1100_00xx_xxxx_xx00 | Access                      | Statistics available in access stage defined in Table 2-14                         |
| 1_0000_00xx_xxxx_xx00 | Lookup                      | Statistics available in lookup stage defined in Table 2-15                         |
| 1_0100_00xx_xxxx_xx00 | Update                      | Statistics available in update stage defined in Table 2-16                         |
| 1_1000_00xx_xxxx_xx00 | Backend                     | Statistics available in backend stage defined in Table 2-17                        |
| 1_1100_00xx_xxxx_xx00 | Reserved                    | Reserved                                                                           |

Table 2-10: System Cache Address Map, Category and Port Number Field
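
As an illustration of how the fields in Table 2-9 combine with the category codes in Table 2-10, the following sketch assembles a register address offset. The helper name `stat_address` and the `CATEGORY` dictionary are hypothetical; the category values are read off the binary addresses in Table 2-10. The meaning of the Register and High/Low fields is defined elsewhere in the guide, so they are passed through here as raw bit values.

```python
# Category codes (address bits 16..14), derived from Table 2-10.
CATEGORY = {
    "optimized": 0b000,
    "generic": 0b001,
    "arbiter": 0b010,
    "access": 0b011,
    "lookup": 0b100,
    "update": 0b101,
    "backend": 0b110,
}


def stat_address(category, port=0, functionality=0, register=0, high_low=0):
    """Build a statistics register offset from the Table 2-9 fields."""
    if not 0 <= port <= 7:
        raise ValueError("port number must be 0..7")
    if not 0 <= functionality <= 0b11111:
        raise ValueError("functionality is a 5-bit field")
    if not 0 <= register <= 0b11:
        raise ValueError("register is a 2-bit field")
    if high_low not in (0, 1):
        raise ValueError("high_low is a single bit")
    return (
        (CATEGORY[category] << 14)  # bits 16..14
        | (port << 10)              # bits 12..10
        | (functionality << 5)      # bits 9..5
        | (register << 3)           # bits 4..3
        | (high_low << 2)           # bit 2; bits 1..0 are always "00"
    )
```

For instance, `stat_address("optimized", port=3)` gives the base of the Optimized port 3 range, and `stat_address("generic")` gives the base of the Generic port range.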

The address decoding of the MicroBlaze<sup>™</sup> optimized ports statistics functionality is according to Table 2-11.

| Address (binary)      | Functionality                  | R/W | Statistics<br>Format | Description                                                                                                                |
|-----------------------|--------------------------------|-----|----------------------|----------------------------------------------------------------------------------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Read Segments                  | R   | COUNT <sup>1</sup>   | Number of segments per read transaction                                                                                    |
| x_xxxx_xx00_001x_xx00 | Write Segments                 | R   | COUNT <sup>1</sup>   | Number of segments per write transaction                                                                                   |
| x_xxxx_xx00_010x_xx00 | RIP                            | R   | QUEUE <sup>2</sup>   | Read Information Port queue statistics                                                                                     |
| x_xxxx_xx00_011x_xx00 | R                              | R   | QUEUE <sup>2</sup>   | Read data queue statistics                                                                                                 |
| x_xxxx_xx00_100x_xx00 | BIP                            | R   | QUEUE <sup>2</sup>   | BRESP Information Port queue statistics                                                                                    |
| x_xxxx_xx00_101x_xx00 | BP                             | R   | QUEUE <sup>2</sup>   | BRESP Port queue statistics                                                                                                |
| x_xxxx_xx00_110x_xx00 | WIP                            | R   | QUEUE <sup>2</sup>   | Write Information Port queue statistics                                                                                    |
| x_xxxx_xx00_111x_xx00 | W                              | R   | QUEUE <sup>2</sup>   | Write data queue statistics                                                                                                |
| x_xxxx_xx01_000x_xx00 | Read Blocked                   | R   | COUNT <sup>1</sup>   | Number of cycles a read was prohibited from taking part in arbitration                                                     |
| x_xxxx_xx01_001x_xx00 | Read Latency                   | R   | COUNT <sup>1</sup>   | Read latency statistics                                                                                                    |
| x_xxxx_xx01_010x_xx00 | Write Latency                  | R   | COUNT <sup>1</sup>   | Write latency statistics                                                                                                   |
| x_xxxx_xx01_011x_xx00 | Read Latency<br>Configuration  | R/W | LONGINT <sup>3</sup> | Configuration for read latency<br>statistics collection. Default<br>value 0. Available modes are<br>defined in Table 2-25. |
| x_xxxx_xx01_100x_xx00 | Write Latency<br>Configuration | R/W | LONGINT <sup>3</sup> | Configuration for write latency<br>statistics collection. Default<br>value 4. Available modes are<br>defined in Table 2-26. |

1. See Table 2-18 for the COUNT register fields.

2. See Table 2-19 for the QUEUE register fields.

3. See Table 2-20 for the LONGINT register fields.

The address decoding to the statistics functionality in the Generic ports is according to Table 2-12.

Table 2-12: System Cache Address Map, Statistics Field for Generic Port

| Address (binary)      | Functionality                  | R/W | Statistics<br>Format | Description                                                                                                           |
|-----------------------|--------------------------------|-----|----------------------|-----------------------------------------------------------------------------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Read Segments                  | R   | COUNT <sup>1</sup>   | Number of segments per read transaction                                                                               |
| x_xxxx_xx00_001x_xx00 | Write Segments                 | R   | COUNT <sup>1</sup>   | Number of segments per write transaction                                                                              |
| x_xxxx_xx00_010x_xx00 | RIP                            | R   | QUEUE <sup>2</sup>   | Read Information Port queue statistics                                                                                |
| x_xxxx_xx00_011x_xx00 | R                              | R   | QUEUE <sup>2</sup>   | Read data queue statistics                                                                                            |
| x_xxxx_xx00_100x_xx00 | BIP                            | R   | QUEUE <sup>2</sup>   | BRESP Information Port queue statistics                                                                               |
| x_xxxx_xx00_101x_xx00 | BP                             | R   | QUEUE <sup>2</sup>   | BRESP Port queue statistics                                                                                           |
| x_xxxx_xx00_110x_xx00 | WIP                            | R   | QUEUE <sup>2</sup>   | Write Information Port queue statistics                                                                               |
| x_xxxx_xx00_111x_xx00 | W                              | R   | QUEUE <sup>2</sup>   | Write data queue statistics                                                                                           |
| x_xxxx_xx01_000x_xx00 | Read Blocked                   | R   | COUNT <sup>1</sup>   | Number of cycles a read was<br>prohibited from taking part in<br>arbitration                                          |
| x_xxxx_xx01_001x_xx00 | Read Latency                   | R   | COUNT <sup>1</sup>   | Read latency statistics                                                                                               |
| x_xxxx_xx01_010x_xx00 | Write Latency                  | R   | COUNT <sup>1</sup>   | Write latency statistics                                                                                              |
| x_xxxx_xx01_011x_xx00 | Read Latency<br>Configuration  | R/W | LONGINT <sup>3</sup> | Configuration for read latency<br>statistics collection. Default<br>value 0. Available modes are<br>defined in Table 2-25. |
| x_xxxx_xx01_100x_xx00 | Write Latency<br>Configuration | R/W | LONGINT <sup>3</sup> | Configuration for write latency<br>statistics collection. Default<br>value 4. Available modes are<br>defined in Table 2-26. |

1. See Table 2-18 for the COUNT register fields.

2. See Table 2-19 for the QUEUE register fields.

3. See Table 2-20 for the LONGINT register fields.

The address decoding to the statistics functionality in the Arbiter functional unit is according to Table 2-13.

Table 2-13: System Cache Address Map, Statistics Field for Arbiter

| Address (binary)      | Functionality | R/W | Statistics<br>Format | Description                                                            |
|-----------------------|---------------|-----|----------------------|------------------------------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Valid         | R   | COUNT <sup>1</sup>   | The number of clock cycles a transaction takes after being arbitrated  |
| x_xxxx_xx00_001x_xx00 | Concurrent    | R   | COUNT <sup>1</sup>   | Number of transactions<br>available to select from when<br>arbitrating |

1. See Table 2-18 for the COUNT register fields.

The address decoding to the statistic functionality in the Access functional unit is according to Table 2-14.

Table 2-14: System Cache Address Map, Statistics Field for Access

| Address (binary)      | Functionality | R/W | Statistics<br>Format | Description                                                                         |
|-----------------------|---------------|-----|----------------------|-------------------------------------------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Valid         | R   | COUNT <sup>1</sup>   | The number of clock cycles a<br>transaction takes after passing<br>the access stage |

1. See Table 2-18 for the COUNT register fields.

The address decoding to the statistics functionality in the Lookup functional unit is according to Table 2-15.

Table 2-15: System Cache Address Map, Statistics Field for Lookup

| Address (binary)      | Functionality    | R/W | Statistics<br>Format | Description                           |
|-----------------------|------------------|-----|----------------------|---------------------------------------|
| x_xxxx_xx00_000x_xx00 | Write Hit        | R   | COUNT <sup>1</sup>   | Number of write hits                  |
| x_xxxx_xx00_001x_xx00 | Write Miss       | R   | COUNT <sup>1</sup>   | Number of write misses                |
| x_xxxx_xx00_010x_xx00 | Write Miss Dirty | R   | COUNT <sup>1</sup>   | Number of dirty write misses          |
| x_xxxx_xx00_011x_xx00 | Read Hit         | R   | COUNT <sup>1</sup>   | Number of read hits                   |
| x_xxxx_xx00_100x_xx00 | Read Miss        | R   | COUNT <sup>1</sup>   | Number of read misses                 |
| x_xxxx_xx00_101x_xx00 | Read Miss Dirty  | R   | COUNT <sup>1</sup>   | Number of dirty read misses           |
| x_xxxx_xx00_110x_xx00 | Locked Write Hit | R   | COUNT <sup>1</sup>   | Number of locked write hits           |
| x_xxxx_xx00_111x_xx00 | Locked Read Hit  | R   | COUNT <sup>1</sup>   | Number of locked read hits            |
| x_xxxx_xx01_000x_xx00 | First Write Hit  | R   | COUNT <sup>1</sup>   | Number of first write hits            |
| x_xxxx_xx01_001x_xx00 | Fetch Stall      | R   | COUNT <sup>1</sup>   | Time fetch stalls because of conflict |
| x_xxxx_xx01_010x_xx00 | Mem Stall       | R   | COUNT <sup>1</sup>   | Time mem stalls because of conflict |
| x_xxxx_xx01_011x_xx00 | Data Stall      | R   | COUNT <sup>1</sup>   | Time stalled due to memory access   |
| x_xxxx_xx01_100x_xx00 | Data Hit Stall  | R   | COUNT <sup>1</sup>   | Time stalled due to conflict        |
| x_xxxx_xx01_101x_xx00 | Data Miss Stall | R   | COUNT <sup>1</sup>   | Time stalled due to full buffers    |

1. See Table 2-18 for the COUNT register fields.
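
As an illustration of how the Lookup counters might be used, the following minimal Python sketch derives a hit rate from the Events values of a hit record and the matching miss record. The function name `hit_rate` is hypothetical, and the sketch assumes that hits and misses are counted as disjoint events; this guide does not state how the dirty-miss and locked-hit counters overlap with them.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when nothing was counted.

    Assumption: `hits` and `misses` are the Events values read from a
    hit record (for example Read Hit) and the matching miss record
    (for example Read Miss), and the two counts do not overlap.
    """
    total = hits + misses
    return hits / total if total else 0.0


# Made-up counter values for illustration only (not measured data).
read_hit_rate = hit_rate(hits=900, misses=100)
print(f"read hit rate: {read_hit_rate:.2%}")
```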

The address decoding to the statistic functionality in the Update functional unit is according to Table 2-16.

Table 2-16: System Cache Address Map, Statistics Field for Update

| Address (binary)      | Functionality       | R/W | Statistics<br>Format | Description                                    |
|-----------------------|---------------------|-----|----------------------|------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Stall               | R   | COUNT <sup>1</sup>   | Cycles transactions are stalled                |
| x_xxxx_xx00_001x_xx00 | Tag Free            | R   | COUNT <sup>1</sup>   | Cycles tag is free                             |
| x_xxxx_xx00_010x_xx00 | Data free           | R   | COUNT <sup>1</sup>   | Cycles data is free                            |
| x_xxxx_xx00_011x_xx00 | Read Information    | R   | QUEUE <sup>2</sup>   | Queue statistics for read<br>transactions      |
| x_xxxx_xx00_100x_xx00 | Read Data           | R   | QUEUE <sup>2</sup>   | Queue statistics for read data                 |
| x_xxxx_xx00_101x_xx00 | Evict               | R   | QUEUE <sup>2</sup>   | Queue statistics for evict information         |
| x_xxxx_xx00_110x_xx00 | BRESP Source        | R   | QUEUE <sup>2</sup>   | Queue statistics for BRESP source information  |
| x_xxxx_xx00_111x_xx00 | Write Miss          | R   | QUEUE <sup>2</sup>   | Queue statistics for write miss information    |
| x_xxxx_xx01_000x_xx00 | Write Miss Allocate | R   | QUEUE <sup>2</sup>   | Queue statistics for allocated write miss data |

1. See Table 2-18 for the COUNT register fields.

2. See Table 2-19 for the QUEUE register fields.

The address decoding to the statistic functionality in the Backend functional unit is according to Table 2-17.

Table 2-17: System Cache Address Map, Statistics Field for Backend

| Address (binary)      | Functionality                  | R/W | Statistics<br>Format | Description                                                                                                               |
|-----------------------|--------------------------------|-----|----------------------|---------------------------------------------------------------------------------------------------------------------------|
| x_xxxx_xx00_000x_xx00 | Write Address                  | R   | QUEUE <sup>1</sup>   | Queue statistics for write address channel information                                                                    |
| x_xxxx_xx00_001x_xx00 | Write Data                     | R   | QUEUE <sup>1</sup>   | Queue statistics for write channel data                                                                                   |
| x_xxxx_xx00_010x_xx00 | Read Address                   | R   | QUEUE <sup>1</sup>   | Queue statistics for read address channel information                                                                     |
| x_xxxx_xx00_011x_xx00 | Search Depth                   | R   | COUNT <sup>2</sup>   | Transaction search depth for read access before released                                                                  |
| x_xxxx_xx00_100x_xx00 | Read Stall                     | R   | COUNT <sup>2</sup>   | Cycles stall due to search                                                                                                |
| x_xxxx_xx00_101x_xx00 | Read Protected Stall           | R   | COUNT <sup>2</sup>   | Cycles stall due to conflict                                                                                              |
| x_xxxx_xx00_110x_xx00 | Read Latency                   | R   | COUNT <sup>2</sup>   | Read latency statistics for external transactions to memory                                                               |
| x_xxxx_xx00_111x_xx00 | Write Latency                  | R   | COUNT <sup>2</sup>   | Write latency statistics for<br>external transactions to memory                                                           |
| x_xxxx_xx01_000x_xx00 | Read Latency<br>Configuration  | R/W | LONGINT <sup>3</sup> | Configuration for read latency<br>statistics collection. Default<br>value 0. Available modes are<br>defined in Table 2-25. |
| x_xxxx_xx01_001x_xx00 | Write Latency<br>Configuration | R/W | LONGINT <sup>3</sup> | Configuration for write latency<br>statistics collection. Default<br>value 4. Available modes are<br>defined in Table 2-26. |

1. See Table 2-19 for the QUEUE register fields.

2. See Table 2-18 for the COUNT register fields.

3. See Table 2-20 for the LONGINT register fields.

The address decoding to the different registers in a statistic record of type COUNT is according to Table 2-18.

Table 2-18: System Cache Address Map, Register Field for COUNT

| Address (binary)      | Register         | R/W | Format               | Description                                                     |
|-----------------------|------------------|-----|----------------------|-----------------------------------------------------------------|
| x_xxxx_xxxx_xxx0_0x00 | Events           | R   | LONGINT <sup>1</sup> | Number of times the event has been triggered                    |
| x_xxxx_xxxx_xxx0_1x00 | Min Max Status   | R   | MINMAX <sup>2</sup>  | Min, max and status information defined according to Table 2-23 |
| x_xxxx_xxxx_xxx1_0x00 | Sum              | R   | LONGINT <sup>1</sup> | Sum of measured data                                            |
| x_xxxx_xxxx_xxx1_1x00 | Sum <sup>2</sup> | R   | LONGINT <sup>1</sup> | Sum of measured data squared                                    |

1. See Table 2-20 for the LONGINT register fields.

2. See Table 2-21 for the MINMAX register fields.
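
The Events, Sum, and Sum <sup>2</sup> registers are sufficient to compute an average and a standard deviation for a COUNT record. The following Python sketch shows one way to do this with the standard population formulas; the function name `count_summary` is hypothetical and the guide does not prescribe this calculation.

```python
import math


def count_summary(events: int, total: int, total_sq: int) -> tuple[float, float]:
    """Return (mean, standard deviation) for one COUNT record.

    events   -- value of the Events register
    total    -- value of the Sum register
    total_sq -- value of the Sum of squares register
    """
    if events == 0:
        return 0.0, 0.0
    mean = total / events
    # Population variance E[x^2] - (E[x])^2, clamped at zero so that
    # floating-point rounding cannot produce a negative value.
    variance = max(total_sq / events - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Three made-up measurements 2, 4 and 6: Events = 3, Sum = 12, Sum of squares = 56.
mean, std_dev = count_summary(events=3, total=12, total_sq=56)
print(mean, std_dev)
```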

The address decoding to the different registers in a statistic record of type QUEUE is according to Table 2-19.

Table 2-19: System Cache Address Map, Register Field for QUEUE

| Address (binary)      | Register      | R/W | Format               | Description                                          |
|-----------------------|---------------|-----|----------------------|------------------------------------------------------|
| x_xxxx_xxxx_xxx0_0x00 | Empty Cycles  | R   | LONGINT <sup>1</sup> | Clock cycles the queue has been idle                 |
| x_xxxx_xxxx_xxx0_1x00 | Index Updates | R   | LONGINT <sup>1</sup> | Number of times updated with push or pop             |
| x_xxxx_xxxx_xxx1_0x00 | Index Max     | R   | MINMAX <sup>2</sup>  | Maximum depth for queue<br>(only maximum field used) |
| x_xxxx_xxxx_xxx1_1x00 | Index Sum     | R   | LONGINT <sup>1</sup> | Sum of queue depth when updated                      |

1. See Table 2-20 for the LONGINT register fields.

2. See Table 2-21 for the MINMAX register fields.
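
A QUEUE record can be summarized in a similar way. The Python sketch below derives an average depth per update from Index Sum and Index Updates, and an idle fraction from Empty Cycles. The function name `queue_summary` and the `elapsed_cycles` argument are assumptions: the total number of elapsed clock cycles is not part of the QUEUE record and would have to be measured by other means.

```python
def queue_summary(empty_cycles: int, index_updates: int, index_max: int,
                  index_sum: int, elapsed_cycles: int) -> dict:
    """Summarize one QUEUE record.

    index_max is the already extracted maximum field of the Index Max
    register; elapsed_cycles is supplied by the caller (assumption).
    """
    avg_depth = index_sum / index_updates if index_updates else 0.0
    idle_fraction = empty_cycles / elapsed_cycles if elapsed_cycles else 0.0
    return {
        "average_depth": avg_depth,
        "max_depth": index_max,
        "idle_fraction": idle_fraction,
    }


# Made-up register values for illustration only.
print(queue_summary(empty_cycles=250, index_updates=100, index_max=7,
                    index_sum=300, elapsed_cycles=1000))
```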

The address decoding of the 64-bit vector LONGINT is according to Table 2-20.

Table 2-20: System Cache Address Map, High-Low Field for LONG INT

| Address (binary)      | High Low | Description                               |
|-----------------------|----------|-------------------------------------------|
| x_xxxx_xxxx_xxxx_x000 | LOW      | LONGINT Bits 31-0, least significant half |
| x_xxxx_xxxx_xxxx_x100 | HIGH     | LONGINT Bits 63-32, most significant half |

The address decoding of the 64-bit vector MIN MAX is according to Table 2-21.

Table 2-21: System Cache Address Map, High-Low Field for MIN MAX

| Address (binary)      | High Low | Description        |
|-----------------------|----------|--------------------|
| x_xxxx_xxxx_xxxx_x000 | LOW      | MIN MAX Bits 31-0  |
| x_xxxx_xxxx_xxxx_x100 | HIGH     | MIN MAX Bits 63-32 |

Bit field definition of the LONG INT register is according to Table 2-22.

#### Table 2-22: LONG INT Register Bit Allocation

| Bits 63-0    |
|--------------|
| Long Integer |

Bit field definition of the MIN MAX register is according to Table 2-23.

#### Table 2-23: MIN MAX Register Bit Allocation

| Bits 63-48 | Bits 47-32 | Bits 31-2 | Bit 1 | Bit 0    |
|------------|------------|-----------|-------|----------|
| Min        | Max        | Reserved  | Full  | Overflow |

Field definitions for the MIN MAX register type are according to Table 2-24.

#### Table 2-24: MIN MAX Field Definition

| Field    | Description                                                                                                                                                                           |
|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Min      | Minimum unsigned measurement encountered                                                                                                                                              |
| Max      | Maximum unsigned measurement encountered, saturates when 0xFFFF is reached                                                                                                            |
| Full     | Flag if number of concurrent events of the measured type has been reached, indicating that the resulting statistics are inaccurate.                                                   |
| Overflow | Flag if measurements have been saturated; this means the statistics results are less accurate. Both average and standard deviation measurements will be lower than the actual values. |
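
The following Python sketch unpacks a 64-bit MIN MAX value into its fields. It assumes that Min occupies bits 63-48 and Max bits 47-32 (16 bits each, consistent with Max saturating at 0xFFFF), with Full in bit 1 and Overflow in bit 0. The `MinMax` type and `decode_minmax` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class MinMax(NamedTuple):
    minimum: int
    maximum: int
    full: bool
    overflow: bool


def decode_minmax(value: int) -> MinMax:
    """Split a 64-bit MIN MAX register value into its fields.

    Assumed layout: Min = bits 63-48, Max = bits 47-32,
    bits 31-2 reserved, Full = bit 1, Overflow = bit 0.
    """
    return MinMax(
        minimum=(value >> 48) & 0xFFFF,
        maximum=(value >> 32) & 0xFFFF,
        full=bool((value >> 1) & 1),
        overflow=bool(value & 1),
    )


# Made-up register value: Min = 5, Max saturated, both flags set.
fields = decode_minmax((5 << 48) | (0xFFFF << 32) | 0b11)
if fields.overflow or fields.full:
    print("statistics for this record are less accurate:", fields)
```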

Mode definitions for read latency measurements are according to Table 2-25.

#### Table 2-25: Read Latency Measurement

| Value | Description                                              |
|-------|----------------------------------------------------------|
| 0     | AR channel valid until first data is acknowledged        |
| 1     | AR channel acknowledged until first data is acknowledged |
| 2     | AR channel valid until last data is acknowledged         |
| 3     | AR channel acknowledged until last data is acknowledged  |

Mode definitions for write latency measurements are according to Table 2-26.

Table 2-26: Write Latency Measurement

| Value | Description                                         |
|-------|-----------------------------------------------------|
| 0     | AW channel valid until first data is written        |
| 1     | AW channel acknowledged until first data is written |
| 2     | AW channel valid until last data is written         |
| 3     | AW channel acknowledged until last data is written  |
| 4     | AW channel valid until BRESP is acknowledged        |
| 5     | AW channel acknowledged until BRESP is acknowledged |
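
Software that programs the latency configuration registers may find it convenient to name the modes. The Python sketch below mirrors the values in Table 2-25 and Table 2-26 and the default values stated in the address map tables; the enumeration and member names are hypothetical.

```python
from enum import IntEnum


class ReadLatencyMode(IntEnum):
    AR_VALID_TO_FIRST_DATA = 0
    AR_ACK_TO_FIRST_DATA = 1
    AR_VALID_TO_LAST_DATA = 2
    AR_ACK_TO_LAST_DATA = 3


class WriteLatencyMode(IntEnum):
    AW_VALID_TO_FIRST_DATA = 0
    AW_ACK_TO_FIRST_DATA = 1
    AW_VALID_TO_LAST_DATA = 2
    AW_ACK_TO_LAST_DATA = 3
    AW_VALID_TO_BRESP = 4
    AW_ACK_TO_BRESP = 5


# Default register values as listed in the address map tables.
READ_LATENCY_DEFAULT = ReadLatencyMode(0)
WRITE_LATENCY_DEFAULT = WriteLatencyMode(4)

# Converting a raw register value raises ValueError for undefined modes.
print(READ_LATENCY_DEFAULT.name, WRITE_LATENCY_DEFAULT.name)
```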



# Customizing and Generating the Core

This chapter includes information on using Xilinx® tools to customize and generate the core.

# GUI

The System Cache parameters are divided into three categories: core, system, and interconnect related. See Table 3-1 for allowed values.

The core parameter tab showing all the parameters is illustrated in Figure 3-1.

| Setting                        | Value shown |
|--------------------------------|-------------|
| **Ports**                      |             |
| Number of Optimized AXI4 Ports | 2           |
| Use Generic AXI4 Port          |             |
| **Associative Sets**           |             |
| Number of Associative Sets     | 2           |
| **Cache Settings**             |             |
| Data Width                     | 32          |
| Line Length                    | 16          |
| Size                           | 32k         |
| **Statistics**                 |             |
| Enable AXI Control Interface   |             |

Figure 3-1: Core Parameter Tab

- Number of Optimized AXI4 Ports: Sets the number of optimized ports that are available to connect to a MicroBlaze<sup>™</sup> or an equivalent IP core in terms of AXI4 transaction support.
- Use Generic AXI4 Port: Sets whether the Generic AXI4 port is available for IP cores that do not adhere to the AXI4 subset required for the optimized ports, such as DMA.
- Number of Associative Sets: Specifies how many sets the associativity uses.
- Data Width: The internal data width is automatically calculated from the M\_AXI interface.
- Line Length: The System Cache cache line length is fixed to 16.
- Size: Sets the size of the System Cache in bytes.
- Enable AXI Control Interface: Sets whether the statistics interface is available.

The system parameter tab is shown in Figure 3-2 with the address parameters and S0 interface parameters visible.

| Setting                        | Value shown |
|--------------------------------|-------------|
| **Addresses**                  |             |
| Base Address                   | 0xc000000   |
| High Address                   | 0xfffffff   |
| Control Interface Base Address | 0xFFFFFFF   |
| Control Interface High Address | 0x0000000   |
| **S0_AXI**                     |             |
| S0_AXI Data Width              | 32          |
| S0_AXI Address Width           | 32          |
| S0_AXI ID Width                | 1           |
| AXI4 protocol                  | AXI4        |
| Frequency of AXI4 Slave        | Auto        |
| **S1_AXI**                     |             |
| **S2_AXI**                     |             |
| **S3_AXI**                     |             |

Figure 3-2: System Parameter Tab

- Base/High Address: Sets the address range for the cacheable area.
- Control Interface Base/High Address: Sets the address range for the control interface area that contains all statistics and control registers. Only available when the control interface is enabled.
- Sx\_AXI Data Width: Sets the data width of the Optimized ports individually.
- S0\_AXI\_GEN Data Width: Sets the data width of the Generic port.
- M\_AXI Data Width: Sets the data width of the master interface that is connected to the memory subsystem.

The interconnect parameter tab is illustrated in Figure 3-3, showing the first few parameters.

| Setting                                   | Value shown |
|-------------------------------------------|-------------|
| **M_AXI**                                 |             |
| Unique Master ID                          | Auto        |
| Is ACLK Asynchronous to Interconnect_ACLK | Auto        |
| ACLK Frequency Ratio                      | 10000000    |
| Arbitration Priority                      |             |
| Use register slice on AW channel          | BYPASS      |
| Use register slice on AR channel          | BYPASS      |
| Use register slice on W channel           | BYPASS      |
| Use register slice on R channel           | BYPASS      |
| Use register slice on B channel           | BYPASS      |
| Write Data FIFO Depth                     | 0 (None)    |
| Write Data FIFO Burst Delay               |             |

Figure 3-3: Interconnect Parameter Tab

All parameters on this tab configure how the interconnect of each AXI interface should be customized to get the desired system level performance and achieve timing closure.

### **Parameter Values**

Certain parameters are only available in some configurations; others impose restrictions that IP cores connected to the System Cache must adhere to. All these restrictions are enforced by Design Rule Checks to guarantee a valid configuration.

The parameter restrictions are:

- Internal cache data width must either be 32 or a multiple of the cache line length of masters connected to the optimized ports (C\_CACHE\_DATA\_WIDTH = 32 or C\_CACHE\_DATA\_WIDTH = n \* 32 \* C\_Lx\_CACHE\_LINE\_LENGTH).
- All Optimized slave port data widths must be less than or equal to the internal cache data width (C\_Sx\_AXI\_DATA\_WIDTH ≤ C\_CACHE\_DATA\_WIDTH).
- Generic slave port data width must be less than or equal to the internal cache data width (C\_S0\_AXI\_GEN\_DATA\_WIDTH ≤ C\_CACHE\_DATA\_WIDTH).

- The master port data width must be greater than or equal to the internal cache data width (C\_CACHE\_DATA\_WIDTH ≤ C\_M\_AXI\_DATA\_WIDTH).
- The internal cache line length must be greater than or equal to the corresponding cache line length of the AXI masters connected to the optimized port (C\_CACHE\_LINE\_LENGTH ≥ C\_Lx\_CACHE\_LINE\_LENGTH).
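
The restrictions above can be expressed as a small checker. The Python sketch below is only an illustration of the listed rules, not the Design Rule Checks implemented by the tools; the function name `check_restrictions` and its argument names are hypothetical.

```python
def check_restrictions(cache_data_width, cache_line_length, m_axi_data_width,
                       opt_port_widths, opt_line_lengths, gen_port_width=None):
    """Return a list of violated restrictions (empty when all rules hold).

    opt_port_widths  -- C_Sx_AXI_DATA_WIDTH for each optimized port
    opt_line_lengths -- C_Lx_CACHE_LINE_LENGTH for each optimized port
    gen_port_width   -- C_S0_AXI_GEN_DATA_WIDTH, or None if the port is unused
    """
    errors = []
    for x, lx in enumerate(opt_line_lengths):
        # Width must be 32 or n * 32 * C_Lx_CACHE_LINE_LENGTH with n >= 1.
        if cache_data_width != 32 and cache_data_width % (32 * lx) != 0:
            errors.append(f"C_CACHE_DATA_WIDTH not 32 or a multiple of 32*{lx} (port {x})")
        if cache_line_length < lx:
            errors.append(f"C_CACHE_LINE_LENGTH < C_L{x}_CACHE_LINE_LENGTH")
    for x, width in enumerate(opt_port_widths):
        if width > cache_data_width:
            errors.append(f"C_S{x}_AXI_DATA_WIDTH > C_CACHE_DATA_WIDTH")
    if gen_port_width is not None and gen_port_width > cache_data_width:
        errors.append("C_S0_AXI_GEN_DATA_WIDTH > C_CACHE_DATA_WIDTH")
    if m_axi_data_width < cache_data_width:
        errors.append("C_M_AXI_DATA_WIDTH < C_CACHE_DATA_WIDTH")
    return errors


# A configuration that satisfies every listed rule yields an empty list.
print(check_restrictions(128, 16, 128, [32, 128], [4, 4]))
```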

Table 3-1: System Cache I/O Interfaces

| Parameter Name                   | Feature/Description                                                                                             | Allowable Values        | Default<br>Value | VHDL<br>Type         |
|----------------------------------|-----------------------------------------------------------------------------------------------------------------|-------------------------|------------------|----------------------|
| C_FAMILY                         | FPGA Architecture                                                                                               | Supported architectures | "virtex6"        | string               |
| C_INSTANCE                       | Instance Name                                                                                                   | Any instance name       | system_cache     | string               |
| C_BASEADDR                       | Cacheable area base address                                                                                     |                         | 0xFFFFFFFF       | std_logic_vector     |
| C_HIGHADDR                       | Cacheable area high address.<br>Minimum size is 32kB                                                            |                         | 0x00000000       | std_logic_vector     |
| C_ENABLE_CTRL                    | Enable implementation of Statistics<br>and Control function                                                     | 0, 1                    | 0                | natural              |
| C_NUM_OPTIMIZED_PORTS            | Number of ports optimized for<br>MicroBlaze cache connection                                                    | 0 - 8                   | 1                | natural              |
| C_NUM_GENERIC_PORTS              | Number of ports supporting full AXI4                                                                            | 0, 1                    | 0                | natural              |
| C_NUM_SETS                       | Cache associativity                                                                                             | 2, 4                    | 2                | natural              |
| C_CACHE_DATA_WIDTH               | Cache data width used internally.<br>Automatically calculated to match<br>AXI master interface                  | 32, 64, 128, 256, 512   | 32               | natural              |
| C_CACHE_LINE_LENGTH              | Cache line length. Constant value.                                                                              | 16                      | 16               | natural              |
| C_CACHE_SIZE                     | Cache size in bytes                                                                                             | 32768, 65536, 131072    | 32768            | natural              |
| C_Lx_CACHE_LINE_LENGTH           | Cache line length on masters<br>connected to optimized ports.<br>Automatically assigned with manual<br>override | 4, 8                    | 4                | natural              |
| MicroBlaze cache optimized AXI4 slave interface parameters |  |  |  |  |
| C_Sx_AXI_ADDR_WIDTH <sup>1</sup> | Address width. Constant value.                                                                                  | 32                      | 32               | natural              |
| C_Sx_AXI_DATA_WIDTH <sup>1</sup> | Data width                                                                                                      | 32, 128, 256, 512       | 32               | natural              |
| C_Sx_AXI_ID_WIDTH <sup>1</sup>   | ID width, automatically assigned                                                                                | 1 - 32                  | 1                | natural              |
| Generic AXI4 slave interface parameters |  |  |  |  |
| C_S0_AXI_GEN_ADDR_WIDTH          | Address Width. Constant value.                                                                                  | 32                      | 32               | natural              |
| C_S0_AXI_GEN_DATA_WIDTH          | Data Width                                                                                                      | 32, 64, 128, 256, 512   | 32               | natural              |
| C_S0_AXI_GEN_ID_WIDTH            | ID width, automatically assigned                                                                                | 1 - 32                  | 1                | natural              |
| Statistics and Control AXI4-Lite slave interface parameters |  |  |  |  |
| C_S_AXI_CTRL_BASEADDR            | Control area base address                                                                                       |                         | 0xFFFFFFFF       | std_logic_vector     |
| C_S_AXI_CTRL_HIGHADDR            | Control area high address. Minimum<br>size is 128kB                                                             |                         | 0x00000000       | std_logic_vector     |
| C_S_AXI_CTRL_ADDR_WIDTH                            | Address Width. Constant value.                           | 32                | 32               | natural              |
| C_S_AXI_CTRL_DATA_WIDTH                            | Data Width. Constant value.                              | 32                | 32               | natural              |
| Memory Controller AXI4 master interface parameters |                                                          |                   |                  |                      |
| C_M_AXI_ADDR_WIDTH                                 | Address Width. Constant value.                           | 32                | 32               | natural              |
| C_M_AXI_DATA_WIDTH                                 | Data Width                                               | 32, 128, 256, 512 | 32               | natural              |
| C_M_AXI_THREAD_ID_WIDTH                            | ID width. Automatically assigned<br>with manual override | 1 - 32            | 1                | natural              |

1. x = 0 - 7
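The control area address parameters above can be sanity-checked before generating the core. The following is a minimal Python sketch, not part of the core or its tools. It assumes that the base and high addresses are inclusive bounds, that 128 kB means 128 × 1024 bytes, and that a range with the high address below the base address (as with the default values) is treated as unassigned. The function names are hypothetical.

```python
KIB = 1024
MIN_CTRL_SIZE = 128 * KIB  # minimum control area size stated in the table


def ctrl_range_size(base_addr: int, high_addr: int) -> int:
    """Size in bytes of the inclusive range [base_addr, high_addr].

    Returns 0 when high_addr is below base_addr (assumed to mean
    that the range has not been assigned).
    """
    if high_addr < base_addr:
        return 0
    return high_addr - base_addr + 1


def ctrl_range_is_valid(base_addr: int, high_addr: int) -> bool:
    """True when the range is at least the minimum control area size."""
    return ctrl_range_size(base_addr, high_addr) >= MIN_CTRL_SIZE
```

With the default values (base 0xFFFFFFFF, high 0x00000000) the check fails, which is consistent with the range needing an explicit assignment when the control port is used. A range such as 0x40000000 to 0x4001FFFF (an arbitrary example address) spans exactly 128 kB and passes.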



# Designing with the Core

This chapter includes guidelines and additional information to make designing with the core easier.

# **General Design Guidelines**

There are no golden settings that achieve maximum performance in all cases, because performance depends on the application and the system. This chapter contains general guidelines that should be considered when configuring the System Cache and other IP cores to improve performance.

#### **AXI Data Widths**

AXI Data widths should match wherever possible. Matching widths results in minimal area overhead and latency for the AXI interconnects.

### **AXI Clocking**

The System Cache is fully synchronous. Using the same clock for all the AXI ports removes the need for clock conversion blocks and results in minimal area overhead and latency for the AXI interconnects.

#### **Frequency and Hit Rate**

Increased cache hit rate results in higher performance.

The System Cache size should be configured to be larger than the connected L1 caches to achieve any improvement. Increasing the System Cache size increases the hit rate and has a positive effect on performance. The downside of increasing the System Cache size is that more FPGA resources are used. Higher set associativity usually increases the hit rate and the application performance.
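The effect of cache size and associativity on hit rate can be explored with a simple software model before committing FPGA resources. The sketch below is a generic set-associative cache model with LRU replacement. It is not a model of the System Cache implementation: the replacement policy, the 32-byte line size, and the function name `hit_rate` are assumptions made for illustration.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes, ways, line_bytes=32):
    """Fraction of accesses that hit in a set-associative LRU cache.

    addresses   : iterable of byte addresses, in access order
    cache_bytes : total cache capacity in bytes
    ways        : set associativity
    line_bytes  : cache line size in bytes (assumed value)
    """
    addresses = list(addresses)
    num_sets = cache_bytes // (ways * line_bytes)
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            hits += 1
            entries.move_to_end(tag)         # mark as most recently used
        else:
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[tag] = True
    return hits / len(addresses)


# Word-by-word sweep over a 64 KiB working set, repeated four times.
trace = [a for _ in range(4) for a in range(0, 64 * 1024, 4)]
small = hit_rate(trace, cache_bytes=32 * 1024, ways=4)
large = hit_rate(trace, cache_bytes=128 * 1024, ways=4)
```

For this synthetic trace the 32 KiB configuration only hits on the remaining words of each freshly loaded line, while the 128 KiB configuration holds the whole working set after the first pass. Real application traces behave differently, so the model is useful for comparing configurations rather than predicting absolute numbers.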

The maximum frequency of MicroBlaze<sup>™</sup> is affected by its cache sizes. Smaller MicroBlaze cache sizes usually mean that MicroBlaze can meet higher frequency targets. The sweet spot for the frequency versus cache size trade-off when using the System Cache occurs when configuring MicroBlaze caches to either 256 or 512 bytes, depending on other MicroBlaze configuration settings. The key to improving frequency is to implement MicroBlaze cache tags with distributed RAM.

Enabling the MicroBlaze Branch Target Cache (BTC) can improve performance but might reduce the maximum obtainable frequency. Depending on the rest of the MicroBlaze configuration, smaller BTC sizes, such as 32 entries (C\_BRANCH\_TARGET\_CACHE\_SIZE = 3), should be considered.

Enabling MicroBlaze victim caches increases MicroBlaze cache hit rates, which improves performance. In some cases, however, enabling victim caches can reduce the MicroBlaze maximum frequency.

MicroBlaze performance is often improved by using 8-word cache lines on the Instruction Cache and Data Cache.

#### Bandwidth

Using wider AXI interfaces increases data bandwidth, but also increases FPGA resource usage. Using the widest possible common AXI data width between the System Cache AXI Master and the external memory gives the highest possible bandwidth. The same applies to the AXI connection between the MicroBlaze caches and the System Cache.
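As a rough illustration of the width trade-off, the theoretical peak bandwidth of an AXI data channel is the data width in bytes multiplied by the clock frequency, assuming one data beat per clock cycle. The Python sketch below computes this figure and picks the widest width that two interfaces have in common. The function names, the 100 MHz clock, and the memory controller width list are hypothetical values chosen for the example.

```python
def peak_bandwidth(data_width_bits: int, clock_hz: int) -> int:
    """Theoretical peak in bytes per second, assuming one beat per cycle."""
    return (data_width_bits // 8) * clock_hz


def widest_common_width(widths_a, widths_b):
    """Largest data width supported by both sides, or None if none is shared."""
    common = set(widths_a) & set(widths_b)
    return max(common) if common else None


# Allowable C_M_AXI_DATA_WIDTH values from the parameter table.
cache_master_widths = [32, 128, 256, 512]
# Hypothetical widths offered by an external memory controller.
memory_widths = [32, 64, 128]

width = widest_common_width(cache_master_widths, memory_widths)
bandwidth = peak_bandwidth(width, 100_000_000)  # assumed 100 MHz ACLK
```

Sustained bandwidth is lower than this peak in practice, because it also depends on burst lengths, arbitration, and memory controller latency.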

### Arbitration

The System Cache arbitration scheme is round-robin. When the selected port does not have a pending transaction, the first port with an available transaction is scheduled, considering the optimized ports in ascending numeric order and finally the generic port.

Only one read request per port is processed at a time. While a port has a read in progress, no other reads from that port are scheduled. A write from any port, or a read from any other port that has no read in progress, can still be arbitrated during this time.
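The arbitration rules above can be illustrated with a small behavioral model. The Python sketch below is an interpretation of the description, not the core's implementation. It assumes that the round-robin pointer advances by one port on every arbitration slot, and that a write queued behind a blocked read on the same port may be granted ahead of it. The class and method names are hypothetical.

```python
class Arbiter:
    """Behavioral sketch of round-robin arbitration with fixed-order fallback."""

    def __init__(self, num_optimized):
        # Port order: optimized ports 0..N-1, then the generic port last.
        self.names = [f"opt{i}" for i in range(num_optimized)] + ["generic"]
        self.queues = [[] for _ in self.names]    # pending "R"/"W" per port
        self.read_busy = [False] * len(self.names)
        self.pointer = 0                          # round-robin selection

    def submit(self, port, kind):
        """Queue a read ("R") or write ("W") transaction on a port."""
        self.queues[port].append(kind)

    def _first_eligible(self, port):
        # A read is blocked while the same port has a read in progress.
        for i, kind in enumerate(self.queues[port]):
            if kind == "W" or not self.read_busy[port]:
                return i
        return None

    def grant(self):
        """Return (port name, kind) for the next transaction, or None."""
        n = len(self.names)
        # Try the round-robin selection first, then fall back to scanning
        # optimized ports in ascending order and finally the generic port.
        candidates = [self.pointer] + list(range(n))
        self.pointer = (self.pointer + 1) % n     # assumed advance rule
        for port in candidates:
            i = self._first_eligible(port)
            if i is not None:
                kind = self.queues[port].pop(i)
                if kind == "R":
                    self.read_busy[port] = True
                return self.names[port], kind
        return None

    def read_done(self, port):
        """Mark the read in progress on a port as completed."""
        self.read_busy[port] = False
```

For example, with two optimized ports, two reads queued on port 0, and one write on the generic port, the first grant goes to the read on `opt0`. The second read on `opt0` is then held back until `read_done(0)` is called, while the write on the generic port can be granted in the meantime.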

# Clocking

The System Cache is fully synchronous, with all interfaces and the internal function clocked by the ACLK input signal. It is advisable to avoid asynchronous clock transitions in the system, as they add latency and consume area resources.

## Resets

The System Cache is reset by the ARESETN input signal. ARESETN is synchronous to ACLK and needs to be asserted for one ACLK cycle to take effect. The System Cache is ready for operation two ACLK cycles after ARESETN is deasserted.

# **Protocol Description**

All interfaces to the System Cache adhere to the AXI4 protocol.



## Appendix A

# **Additional Resources**

## **Xilinx Resources**

For support resources such as Answers, Documentation, Downloads, and Forums, see the Xilinx Support website at:

www.xilinx.com/support.

For a glossary of technical terms used in Xilinx documentation, see:

www.xilinx.com/company/terms.htm.

## **Solution Centers**

See the <u>Xilinx Solution Centers</u> for support on devices, software tools, and intellectual property at all stages of the design cycle. Topics include design assistance, advisories, and troubleshooting tips.

### References

This document provides supplemental material useful with this product guide:

AMBA AXI4 Interface Protocol

# **Technical Support**

Xilinx provides technical support at <u>www.xilinx.com/support</u> for this LogiCORE<sup>™</sup> IP product when used as described in the product documentation. Xilinx cannot guarantee timing, functionality, or support of product if implemented in devices that are not defined in the documentation, if customized beyond that allowed in the product documentation, or if changes are made to any section of the design labeled DO NOT MODIFY.

# **Ordering Information**

This Xilinx LogiCORE IP module is provided at no additional cost with the Xilinx ISE® Design Suite Embedded Edition software under the terms of the Xilinx End User License Agreement and is included in the Platform Studio and Embedded Development Kit (EDK).

Contact your local Xilinx <u>sales representative</u> for pricing and availability of additional Xilinx LogiCORE IP modules and software. Information about additional Xilinx LogiCORE IP modules is available on the Xilinx <u>IP Center</u>.

### **Revision History**

The following table shows the revision history for this document.

| Date     | Version | Revision                |
|----------|---------|-------------------------|
| 04/24/12 | 1.0     | Initial Xilinx release. |

# **Notice of Disclaimer**

The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior written consent. Certain products are subject to the terms and conditions of the Limited Warranties which can be viewed at <a href="http://www.xilinx.com/warranty.htm">http://www.xilinx.com/warranty.htm</a>; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx products in Critical Applications: <a href="http://www.xilinx.com/warranty.htm#critapps">http://www.xilinx.com/warranty.htm#critapps</a>.

© Copyright 2012 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. ARM is a registered trademark of ARM in the EU and other countries. The AMBA trademark is a registered trademark of ARM Limited. All other trademarks are the property of their respective owners.

#### **Automotive Applications Disclaimer**

XILINX PRODUCTS ARE NOT DESIGNED OR INTENDED TO BE FAIL-SAFE, OR FOR USE IN ANY APPLICATION REQUIRING FAIL-SAFE PERFORMANCE, SUCH AS APPLICATIONS RELATED TO: (I) THE DEPLOYMENT OF AIRBAGS, (II) CONTROL OF A VEHICLE, UNLESS THERE IS A FAIL-SAFE OR REDUNDANCY FEATURE (WHICH DOES NOT INCLUDE USE OF SOFTWARE IN THE XILINX DEVICE TO IMPLEMENT THE REDUNDANCY) AND A WARNING SIGNAL UPON FAILURE TO THE OPERATOR, OR (III) USES THAT COULD LEAD TO DEATH OR PERSONAL INJURY. CUSTOMER ASSUMES THE SOLE RISK AND LIABILITY OF ANY USE OF XILINX PRODUCTS IN SUCH APPLICATIONS.