AR# 61241


Soft Error Mitigation IP Guidance for testing with error injection

Description

This article supplements existing SEM Product Guides, and covers how to evaluate SEM IP functionality by manually injecting errors.

Solution

General guidance:

Always use the Linear Frame Address (LFA) rather than the Physical Frame Address (PFA): LFA values form a single consecutive range of addresses, whereas PFA values do not.

The IP core reports the range of valid LFAs in response to the "status" command, which is available via the monitor interface. The relevant line has the following format:

MF {8-digit hex value} Maximum Frame (linear count)
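As a rough illustration, the MF line can be picked out of a captured status report and used as a bound check before injecting. This is a minimal sketch: the function names and the `margin` parameter are hypothetical, the sample value in the usage note is made up, and the exact maximum injectable frame (MF-1 or MF-2) must be taken from the product guide for the target family.

```python
import re

# Matches the documented status line "MF {8-digit hex value}".
MF_LINE = re.compile(r"^MF\s+([0-9A-Fa-f]{8})\b")


def parse_max_frame(status_report: str) -> int:
    """Return the Maximum Frame (linear count) found in a status report."""
    for line in status_report.splitlines():
        match = MF_LINE.match(line.strip())
        if match:
            return int(match.group(1), 16)
    raise ValueError("no MF line found in status report")


def lfa_in_range(lfa: int, max_frame: int, margin: int = 1) -> bool:
    """Check that lfa does not exceed max_frame - margin.

    margin is an assumption (1 or 2 depending on the family);
    consult the product guide for the correct value.
    """
    return 0 <= lfa <= max_frame - margin
```

For example, `parse_max_frame("MF 00001234")` yields `0x1234` for this invented report line.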

Alternatively, the documents below list the valid LFA range for each family:


Spartan-6:
(Xilinx Answer 61736) SEM IP Soft Error Mitigation - What is the valid range of addresses for error injection by LFA targeting Spartan 6 devices?

Virtex-6 or 7 Series:

(Xilinx Answer 65539) What is the valid range of addresses for error injection by LFA targeting Virtex-6, 7 series, and Zynq-7000 devices?

UltraScale:

SEM IP Product Guide, Table 2-2, page 14

http://www.xilinx.com/cgi-bin/docs/ipdoc?c=sem_ultra;v=latest;d=pg187-ultrascale-sem.pdf

Basic error injection testing:

Goal:

  • To test that the IP detects and corrects errors
  • To test that the system correctly logs detected and corrected errors, or to verify that the system can reconfigure the device when an error or uncorrectable error is found
  • NOT to test a design's behavior when an error affects the design's function

Procedure:

  • Inject the error into the ECC word of a frame: pick one of the ECC parity bits as the target bit for injection
  • An error injected this way never interferes with the design's function, because the ECC word only holds parity for the configuration bits

ECC word location in a configuration frame for each FPGA family:


Spartan-6: the ECC word is the 66th of 66 words (16-bit words)

Example of injecting a 1-bit error into the LSB of the last (66th) word of frame 0xF, using 36-bit linear frame address injection:

> N C0000F820

Virtex-6: the ECC word is the 41st of 81 words (32-bit words)

Example of injecting a 1-bit error into the LSB of the 41st word of frame 0xF, using 36-bit linear frame address injection:

> N C0000F500


7 Series: the ECC word is the 51st of 101 words (32-bit words)

Example of injecting a 1-bit error into the LSB of the 51st word of frame 0xF, using 40-bit linear frame address injection:

> N C00000F640

UltraScale: the ECC word is the 62nd of 123 words (32-bit words)

Example of injecting a 1-bit error into the LSB of the 62nd word of frame 0xF, using 40-bit linear frame address injection:

> N C00000F7A0

Target only the lowest byte of the ECC word, because those bits are always populated with ECC parity.

Note that not all addresses exist, so the injections above might not be detected.

Refer to the SEM IP documentation for the corresponding FPGA family for the exact command format.
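The four example commands share a pattern that can be sketched in code. The field layout below (leading hex digit C, frame address starting at bit 12, 7-bit zero-based word index at bits 11:5, 5-bit bit index at bits 4:0) is inferred from the examples in this article only and is an assumption, not the documented command format. The function name and the `ECC_TARGETS` table are hypothetical. Verify the layout against the product guide before relying on it.

```python
# Zero-based ECC word index and LFA command width per family,
# derived from the examples above (e.g. the 51st word is index 50).
ECC_TARGETS = {
    "Spartan-6": (65, 36),
    "Virtex-6": (40, 36),
    "7 Series": (50, 40),
    "UltraScale": (61, 40),
}


def lfa_injection_command(frame: int, word: int, bit: int, address_bits: int) -> str:
    """Build an 'N <hex>' command string using the inferred field layout."""
    if address_bits not in (36, 40):
        raise ValueError("address_bits must be 36 or 40")
    if not 0 <= word < 128:
        raise ValueError("word index must fit in 7 bits")
    if not 0 <= bit < 32:
        raise ValueError("bit index must fit in 5 bits")
    # Assumption: the frame field may use every bit between the
    # leading hex digit and the 12 word/bit positions.
    if not 0 <= frame < (1 << (address_bits - 16)):
        raise ValueError("frame address does not fit in the command")
    value = (0xC << (address_bits - 4)) | (frame << 12) | (word << 5) | bit
    return f"N {value:0{address_bits // 4}X}"
```

With these assumptions, `lfa_injection_command(0xF, *ECC_TARGETS["7 Series"][:1], 0, 40)` reproduces the 7 Series example, and the other three families reproduce their examples in the same way.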


Randomized error injection testing:

Goal:


Injecting into a random location within a configuration frame is possible, but you need to anticipate any of the following undesired results:

  1. The user design might stop functioning
  2. The IP can freeze, hang, or misbehave
  3. The error might not be detected, because many configuration bit locations are masked or do not exist in the actual configuration memory
  4. Short cascades of error detection and correction can occur, often associated with multi-bit error injections

Note: Spartan-6 is unique in that masked frames are masked for reads only and can still be written. An error injected into such a frame can change the design's behavior without SEM IP detecting it.

This testing can be used to mimic how a real design and system respond to an SEU. You are advised to understand these scenarios, plan how the system should react to each of them, and tune the system response so that it degrades as gracefully as possible.

When performing random error injection, Xilinx does not support analyzing any specific customer design outcome or associating the error location with the design. Random error injection is supported only as-is, for customers who cannot gain access to beam testing facilities.
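Choosing random targets can be sketched as follows. This is an illustrative sketch only: the function name and parameters are hypothetical, the frame count must come from the status report or the referenced documents, and the words-per-frame and bits-per-word values are the family figures listed earlier in this article (for example 101 words of 32 bits for 7 Series). A fixed seed makes a test run repeatable.

```python
import random


def random_targets(max_frame, words_per_frame, bits_per_word, count, seed=None):
    """Return `count` random (frame, word, bit) tuples, all zero-based.

    Frames are drawn from 0 .. max_frame - 1. Whether the upper bound
    should be tightened further is family dependent (an assumption here).
    """
    rng = random.Random(seed)
    return [
        (
            rng.randrange(max_frame),
            rng.randrange(words_per_frame),
            rng.randrange(bits_per_word),
        )
        for _ in range(count)
    ]
```

Many of the generated locations may be masked or non-existent, so an undetected injection is an expected outcome rather than a failure of the test harness.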


General guidance on injecting errors using SEM IP monitor and error injection interface:

  • Error injection commands work only when the feature was enabled at IP generation.
  • Perform error injection only when the IP is in IDLE (confirm this via the monitor port ASCII output or the status_* signals). You must explicitly transition the IP to the IDLE state before applying the injection.
  • Check that the state transitions to status_injection and back to IDLE to validate that the IP accepted and completed the command. This does not guarantee that the injection was successful.
  • To inject multiple bits, inject one bit error at a time and confirm that the IP is in IDLE again before the next injection. If the IP is not in IDLE, error injection instructions can be dropped or lost.
  • Inject errors only within the reported valid LFA range; the IP ignores injections outside it.
    Note: check the core product guide for the absolute maximum frame, as this varies across families (e.g., MF-1 or MF-2).
  • We recommend using the monitor/UART interface to perform and monitor error injection, as it provides the most verbose information: it shows every state change, and it does not echo a command that is invalid (wrong syntax, wrong number of ASCII characters, etc.).
  • When using the command interface, monitor the status interface to verify that the IP is in IDLE before injecting any error, and to verify that the IP transitions to the error injection state.
  • It is best practice to reconfigure the FPGA after each set of injections if any unexpected result is seen via the monitor interface or status ports.
  • Accumulating random error injections over and over is not supported. It does not reflect real-life conditions, as the chance of a single FPGA design being hit by many errors is almost nil.
  • UltraScale only: Use the Query command before and after error injection to confirm that the configuration data has actually changed. This is a good way to validate whether the injection was successful. If the data shows no change, the configuration frame might be masked or might correspond to a non-existent memory location. In such cases the injection is unsuccessful and no error will be detected. This is normal behavior.
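The state check described in the list above (IDLE, then the injection state, then IDLE again) can be sketched as a test on a recorded sequence of states. This is a minimal sketch: how the states are sampled (status_* signals or monitor output) is outside its scope, and the function name and the state labels "IDLE" and "INJECTION" are hypothetical names chosen for this example.

```python
def injection_accepted(states) -> bool:
    """Return True if the trace contains IDLE -> INJECTION -> IDLE.

    `states` is an ordered sequence of sampled state labels. Repeated
    consecutive samples of the same state are collapsed first.
    """
    states = list(states)
    collapsed = [s for i, s in enumerate(states) if i == 0 or s != states[i - 1]]
    for i in range(len(collapsed) - 2):
        if collapsed[i:i + 3] == ["IDLE", "INJECTION", "IDLE"]:
            return True
    return False
```

A True result only shows that the command was accepted and completed. As noted above, it does not show that the targeted bit actually changed.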

Special considerations when injecting errors:

When SEM has reported an uncorrectable error condition, the recovery method is to reconfigure the device. 

Reconfiguration is also necessary if SEM has frozen, hung, or begun to misbehave. If the user design malfunctions, recovery by reconfiguration will always succeed. If the user design supports recovery through a logic-level reset, that method is also possible.

Uncorrectable Error Reported -- reconfigure the device before further injection attempts.

User design stops functioning -- reconfiguration always works. Alternatively, if correction succeeds, a logic-level reset may be enough when the design supports it.

If the IP freezes, hangs or misbehaves, reconfigure the device and discard the previous injection or correction result.

  • For Zynq, some LFAs correspond to the PS region and do not respond to error injection. We suggest using addresses in the middle of the total LFA range.
  • Due to masked and unimplemented frames, it is possible that an error will not be injected, and hence will not be detected or corrected.
  • Masked frames correspond to dynamic memory (DRP, SRL, etc.).
  • After a one-bit error is injected, an explicit transition to OBSERVATION is necessary for the IP to attempt detection or correction.

Linked Answer Records

Child Answer Records

Answer Number: 66570
Answer Title: UltraScale Architecture Soft Error Mitigation Controller - Guidance for testing with error injection
Version Found: N/A
Version Resolved: N/A
Date 05/09/2016
Status Active
Type General Article
Devices IP