AR# 66570


UltraScale Architecture Soft Error Mitigation Controller - Guidance for testing with error injection

Description

This article supplements the information in (PG187) LogiCORE IP UltraScale Architecture Soft Error Mitigation Controller Product Guide on how to evaluate the functionality of the Soft Error Mitigation (SEM) IP by manually injecting errors.

Solution

General guidance:

  • Always use Linear Frame Addresses (LFA), because they form a contiguous range of addresses, unlike Physical Frame Addresses (PFA).
  • The IP can translate any valid LFA to a PFA using the translate command.
  • The IP core reports the range of valid LFAs when you issue the "status" command. The result is available on the monitor interface in the following format:

MF {8-digit hex value} Maximum Frame (linear count)

Alternatively, you can find the valid LFA range in the SEM IP product guide:

(PG187), table 2-2, page 14
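The following Python sketch shows one way to extract the MF value from captured monitor output and check a candidate LFA against it. How the monitor text is captured (for example, from a UART) is left out, and treating MF as an inclusive upper bound on valid LFAs is an assumption to confirm in PG187.

```python
import re

# A status line of the form "MF {8-digit hex value}", as described above.
_MF_LINE = re.compile(r"^MF\s+([0-9A-Fa-f]{8})\b")


def parse_max_frame(status_text: str) -> int:
    """Return the Maximum Frame (linear count) from captured "status" output."""
    for line in status_text.splitlines():
        match = _MF_LINE.match(line.strip())
        if match:
            return int(match.group(1), 16)
    raise ValueError("no MF line found in the status output")


def lfa_in_range(lfa: int, max_frame: int) -> bool:
    # ASSUMPTION: valid LFAs run from 0 up to and including max_frame.
    # Confirm in PG187 whether the bound is inclusive.
    return 0 <= lfa <= max_frame
```

For example, parse_max_frame("MF 0000ABCD") returns 0xABCD.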

Basic error injection testing:

Goal:

  • To test that the IP detects and corrects errors
  • To test whether the system correctly logs errors that have been detected and corrected, or to verify that the system is able to reconfigure the design upon detecting an error or an uncorrectable error
  • NOT to test the design's behavior when an error affects the design's function

Recommendation:

Procedure:

  • Inject an error into the ECC word of a frame. Select one of the ECC parity bits as the target bit for injection.
  • The injected error will not interfere with the design's function, because the ECC word only holds parity for the configuration bits.

ECC word location in a configuration frame:

UltraScale devices use the 62nd of the 123 32-bit words in each frame (word index 61, counting from 0).

For Example:

Injecting a 1-bit error into the LSB of the 62nd word of frame 0xF using a 40-bit linear frame address:

> N C00000F 7A0

Target only the lowest byte of this word, so that the injection hits ECC bits that are always populated.

Also note that not all frame addresses exist, so the injection above might not be detected.
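To make the example command easier to reproduce for other frames, the Python sketch below rebuilds it from its fields. The low 12 bits follow directly from the example (0x7A0 is word 61, bit 0), and the leading 0xC matches its first digit. Placing the frame address in bits [35:12] with the SLR field left at 0 is an assumption, so check the exact field layout in PG187 before relying on it. The helper names are hypothetical, and the sketch prints the address as one contiguous 10-digit hex string.

```python
# Field layout used by this sketch (low 12 bits taken from the example "N C00000F 7A0"):
#   bits [39:36] = 0xC   leading digit of the example linear-address command
#   bits [35:12] = frame address; ASSUMPTION: SLR field is 0, so this is just the LFA
#   bits [11:5]  = word index within the frame (0-based)
#   bits [4:0]   = bit index within the word (0-based)
ECC_WORD_INDEX = 61      # the 62nd of the 123 32-bit words in an UltraScale frame
WORDS_PER_FRAME = 123


def lfa_injection_command(lfa: int, word: int = ECC_WORD_INDEX, bit: int = 0) -> str:
    """Build an "N" command with a 40-bit linear frame address (hypothetical helper)."""
    if not 0 <= lfa < (1 << 24):
        raise ValueError("LFA does not fit in bits [35:12]")
    if not 0 <= word < WORDS_PER_FRAME:
        raise ValueError("word index must be 0-122")
    if not 0 <= bit < 32:
        raise ValueError("bit index must be 0-31")
    address = (0xC << 36) | (lfa << 12) | (word << 5) | bit
    return f"N {address:010X}"


def targets_ecc_low_byte(word: int, bit: int) -> bool:
    # The guidance above: target only the lowest byte of the ECC word.
    return word == ECC_WORD_INDEX and 0 <= bit < 8
```

For example, lfa_injection_command(0xF) returns "N C00000F7A0", which is the example command without the space.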

Randomized error injection testing:

Goal:

  • Injection into a random location within a configuration frame is possible, but you must anticipate the following undesired results:
  1. The design might stop functioning.
  2. The IP can freeze, hang, or misbehave.
  3. The error might not be detected, because many of the configuration bit locations are masked or do not exist in actual configuration memory.
  4. Short cascades of error detection and correction can occur, often associated with multi-bit error injections.

This testing can be used to mimic real-life design and system responses to an SEU.

Customers are advised to understand these scenarios, estimate how their system should react, and design the system response to be as graceful as possible.

When performing random error injection, Xilinx does not support analysis of any specific customer design outcome, nor tracing the error location back to the design.

Random error injection is only supported "as is" for customers that cannot gain access to beam testing facilities.
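If you do run randomized injection, a sketch like the following Python function picks a random single-bit target within the reported LFA range. It assumes 123 words of 32 bits per frame and an inclusive MF bound; the function name is hypothetical.

```python
import random
from typing import Tuple

WORDS_PER_FRAME = 123  # UltraScale: 123 32-bit words per frame
BITS_PER_WORD = 32


def random_injection_target(max_frame: int, rng: random.Random) -> Tuple[int, int, int]:
    """Pick a random (lfa, word, bit) triple for a single-bit injection.

    ASSUMPTION: valid LFAs are 0..max_frame inclusive, where max_frame is the
    MF value reported by the "status" command.
    """
    lfa = rng.randint(0, max_frame)       # randint includes both end points
    word = rng.randrange(WORDS_PER_FRAME)  # 0..122
    bit = rng.randrange(BITS_PER_WORD)     # 0..31
    return lfa, word, bit
```

Passing a seeded random.Random instance and recording the seed lets you replay the same sequence of targets when a run produces an unexpected result.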

General guidance on injecting errors using the SEM IP monitor and error injection interfaces:

  • Error injection commands work only if error injection was enabled when the IP was generated.
  • Perform error injection only when the IP is in IDLE (confirm this with the monitor port ASCII output or the status_* signals). You must explicitly transition the IP to the IDLE state before applying an injection.
  • Check that the state transitions to status_injection and back to status_idle to confirm that the IP has accepted and completed the command. This does not, however, mean that the injection was successful.
  • If you want to inject multiple bits, inject one bit error at a time and confirm that the IP is back in IDLE before the next injection. If the IP is not in IDLE, the error injection command might be dropped.
  • Inject errors only within the reported valid LFA range; otherwise the IP ignores them.
  • Use the monitor/UART interface to perform and monitor error injection where possible, because it provides the most verbose information: it reports state changes, and it does not echo a command that is invalid (for example, wrong syntax or the wrong number of ASCII characters).
  • When using the command interface, monitor the status interface to verify that the IP is in IDLE before injecting any error, and that the IP then transitions to the error injection state.
  • As best practice, reconfigure the FPGA after each set of injections if the monitor interface or status ports show any unexpected result.
  • Repeatedly accumulating random error injections is not supported. It does not reflect real-life conditions, because a single FPGA design being hit by many errors almost never happens. As a result, you should not attempt to test this scenario.
  • Use the Query command before and after error injection to confirm that the configuration data actually changed. This is a way to validate whether the injection was successful.
    If the data shows no change, the configuration frame might be masked or correspond to a non-existent memory location. In that case, the injection is unsuccessful and no error will be detected. This is normal.
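The per-injection checks in the list above can be sketched as a single Python function. It is not tied to any real driver: current_state, state_changes, send_command, and query_frame are hypothetical hooks that you would implement on top of your own monitor or status interface handling, and the state names "IDLE" and "INJECTION" are placeholder labels.

```python
from typing import Callable, Iterator


def inject_one_bit(
    current_state: Callable[[], str],
    state_changes: Iterator[str],
    send_command: Callable[[str], None],
    query_frame: Callable[[], str],
    command: str,
) -> bool:
    """Inject one bit and return True only if the queried frame data changed.

    All four hooks are HYPOTHETICAL: current_state() returns the IP state now,
    state_changes yields subsequent state changes as reported by the IP,
    send_command() issues one injection command, and query_frame() returns the
    Query result for the target frame.
    """
    if current_state() != "IDLE":
        raise RuntimeError("IP is not in IDLE; transition it to IDLE before injecting")
    before = query_frame()
    send_command(command)
    # Expect INJECTION then IDLE. This shows only that the command was
    # accepted and completed, not that the injection succeeded.
    for expected in ("INJECTION", "IDLE"):
        seen = next(state_changes, None)
        if seen != expected:
            raise RuntimeError(f"expected {expected}, saw {seen}; reconfigure the FPGA")
    after = query_frame()
    # Unchanged data: the frame is masked or non-existent, so no error will be detected.
    return before != after
```

A False return corresponds to the normal case of a masked or non-existent frame; an exception corresponds to an unexpected result, after which the guidance above is to reconfigure the FPGA.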

Special considerations when injecting errors:

When SEM has reported an uncorrectable error condition, the recovery method is to reconfigure the device. Reconfiguration is also necessary if SEM has frozen, hung, or begins to misbehave.

If the user design malfunctions, recovery by reconfiguration will always succeed. If the user design supports recovery through logic-level reset, this method is also possible.

  • Uncorrectable error reported - reconfigure the device before further injection attempts.
  • User design stops functioning - reconfiguration always works; if correction succeeded, a logic-level reset is also an option if the design supports it.

If the IP freezes, hangs, or misbehaves, reconfigure the device and discard the previous injection or correction result.
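The recovery rules above can be summarized as a small decision function. This Python sketch only illustrates those rules; the function and flag names are hypothetical, and how each condition is detected is up to your system.

```python
from enum import Enum


class Recovery(Enum):
    NONE = "continue testing"
    LOGIC_RESET = "logic-level reset"
    RECONFIGURE = "reconfigure the device"


def recovery_action(uncorrectable: bool, ip_misbehaving: bool,
                    design_malfunction: bool, correction_ok: bool,
                    design_supports_reset: bool) -> Recovery:
    """Map observed conditions to a recovery method (hypothetical helper)."""
    if uncorrectable or ip_misbehaving:
        # Reconfiguration is required; if the IP misbehaved, also discard
        # the previous injection or correction result.
        return Recovery.RECONFIGURE
    if design_malfunction:
        if correction_ok and design_supports_reset:
            return Recovery.LOGIC_RESET
        return Recovery.RECONFIGURE  # reconfiguration always works
    return Recovery.NONE
```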

  • For Zynq devices, some LFAs correspond to the PS (processing system) region and do not respond to error injection. The recommendation is to use addresses in the middle of the total LFA range.
  • Because of masked and unimplemented frames, an error might not be injected, in which case it is neither detected nor corrected.
  • Masked frames are those related to dynamic memory (DRP, SRL, etc.).
  • After a single-bit error is injected, an explicit transition to OBSERVATION is necessary for the IP to attempt detection or correction.
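For Zynq devices, a helper along these lines can restrict injection targets to the middle of the LFA range. The size of the window (half the range by default) is an arbitrary, hypothetical choice, and treating MF as an inclusive bound is an assumption; the guidance above only says to prefer the middle of the range.

```python
def middle_lfa_window(max_frame: int, fraction: float = 0.5) -> range:
    """Return a window of LFAs centred on the middle of 0..max_frame.

    ASSUMPTIONS: LFAs run from 0 to max_frame inclusive, and 'fraction'
    (the share of the range kept) is an illustrative choice only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = max_frame + 1
    width = max(1, int(total * fraction))
    start = (total - width) // 2
    return range(start, start + width)
```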

Linked Answer Records

Master Answer Records

Answer Number | Answer Title | Version Found | Version Resolved
61241 | Soft Error Mitigation IP Guidance for testing with error injection | N/A | N/A
Date: 12/02/2016
Status: Active
Type: General Article