This Xilinx Answer discusses the following topics:
Error Management Hardware
Zynq MPSoC has a dedicated Error Manager that aggregates the fatal errors from across the SoC and handles them. Refer to the TRM/Architecture Spec for details.
To summarize, each fatal error routed to the Error Manager can be configured either to be handled by hardware (triggering an SRST or PoR) or to raise an interrupt to the PMU.
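The routing choice can be pictured as a set of per-error enable masks. The C sketch below is a host-side model only: the struct, the mask names, and the bit numbering are hypothetical and do not reproduce the real Error Manager register layout, which is documented in the TRM. Each helper keeps an error on exactly one route, either a hardware reset or a PMU interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n of each mask corresponds to error n.
 * This is NOT the real Error Manager register layout (see the TRM). */
typedef struct {
    uint32_t por_en;  /* errors handled in hardware by a PoR        */
    uint32_t srst_en; /* errors handled in hardware by an SRST      */
    uint32_t int_en;  /* errors forwarded to the PMU as interrupts  */
} em_routing_t;

/* Responses requested for a given set of asserted error bits. */
typedef struct {
    bool por;
    bool srst;
    bool pmu_irq;
} em_response_t;

/* Route error `bit` to the PMU interrupt and remove any hardware action. */
static void em_route_to_pmu(em_routing_t *r, unsigned bit) {
    assert(bit < 32);
    uint32_t m = UINT32_C(1) << bit;
    r->por_en &= ~m;
    r->srst_en &= ~m;
    r->int_en |= m;
}

/* Route error `bit` to a hardware reset (PoR or SRST) and remove the interrupt. */
static void em_route_to_hw(em_routing_t *r, unsigned bit, bool use_por) {
    assert(bit < 32);
    uint32_t m = UINT32_C(1) << bit;
    r->int_en &= ~m;
    if (use_por) {
        r->por_en |= m;
        r->srst_en &= ~m;
    } else {
        r->srst_en |= m;
        r->por_en &= ~m;
    }
}

/* Evaluate which responses a set of asserted errors would request. */
static em_response_t em_evaluate(const em_routing_t *r, uint32_t asserted) {
    em_response_t resp = {
        .por = (asserted & r->por_en) != 0,
        .srst = (asserted & r->srst_en) != 0,
        .pmu_irq = (asserted & r->int_en) != 0,
    };
    return resp;
}
```

For example, routing error bit 3 to hardware with `use_por` set and error bit 5 to the PMU means that asserting bit 3 requests only a PoR, while asserting bit 5 requests only a PMU interrupt.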
Error Management in PMU Firmware
The PMU Firmware (PMUFW) provides APIs to register custom error handlers or to assign a default SRST/PoR action in response to an error. A dedicated module, xpfw_mod_em.c, is provided in the PMUFW and is enabled by default.
All error handling code should reside in this module, which already contains a couple of examples of handling WDT errors.
Actions for each error can be set up using the XPfw_EmSetAction API:
The following actions are supported for the parameter ActionId:
Action ID | Description |
---|---|
EM_ACTION_POR | Trigger a Power-On-Reset |
EM_ACTION_SRST | Trigger a System Reset |
EM_ACTION_CUSTOM | Call the custom handler passed as the ErrorHandler parameter |
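The action table above can be modeled on the host to show how a registered action is looked up when an error arrives. In this sketch, `em_set_action`, `em_dispatch`, the `ACT_*` constants, and the 32-entry table are hypothetical stand-ins that mirror XPfw_EmSetAction(ErrorId, ActionId, ErrorHandler) and the EM_ACTION_* values; they are not the PMUFW definitions. It also assumes that a custom handler receives the error ID as its only argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for EM_ACTION_POR/SRST/CUSTOM; ACT_NONE marks an
 * error with no action registered yet. Not the PMUFW definitions. */
enum { ACT_NONE = 0, ACT_POR, ACT_SRST, ACT_CUSTOM };
enum { NUM_ERRORS = 32 }; /* assumed table size for the model */

typedef void (*err_handler_t)(uint8_t error_id);

typedef struct {
    uint8_t action;
    err_handler_t handler; /* only used when action == ACT_CUSTOM */
} err_entry_t;

static err_entry_t err_table[NUM_ERRORS]; /* zero-initialized: ACT_NONE */

/* Register an action; returns 0 on success, -1 on invalid arguments. */
static int em_set_action(uint8_t error_id, uint8_t action, err_handler_t handler) {
    if (error_id >= NUM_ERRORS || action == ACT_NONE || action > ACT_CUSTOM)
        return -1;
    if (action == ACT_CUSTOM && handler == NULL)
        return -1; /* a custom action is meaningless without a handler */
    err_table[error_id].action = action;
    err_table[error_id].handler = (action == ACT_CUSTOM) ? handler : NULL;
    return 0;
}

/* Model of the PMU receiving an error interrupt. Returns the action taken;
 * real firmware would trigger the reset here for ACT_POR/ACT_SRST. */
static uint8_t em_dispatch(uint8_t error_id) {
    if (error_id >= NUM_ERRORS)
        return ACT_NONE;
    const err_entry_t *e = &err_table[error_id];
    if (e->action == ACT_CUSTOM) {
        assert(e->handler != NULL); /* guaranteed by em_set_action */
        e->handler(error_id);
    }
    return e->action;
}

/* Example custom handler: records and prints the error it was called for. */
static int last_handled_id = -1;
static void print_handler(uint8_t error_id) {
    last_handled_id = error_id;
    printf("custom handler: error %u\n", (unsigned)error_id);
}
```

Rejecting a custom action without a handler at registration time means the dispatch path never has to cope with a missing callback.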
Below is a list of Error IDs for the ErrorId parameter:
Error ID | Description |
---|---|
EM_ERR_ID_CSU_ROM | Errors logged by CSU Boot ROM (CBR) |
EM_ERR_ID_PMU_PB | Errors logged by PMU Boot ROM (PBR) in the pre-boot stage |
EM_ERR_ID_PMU_SERVICE | Errors logged by PBR in service mode |
EM_ERR_ID_PMU_FW | Errors logged by PMUFW |
EM_ERR_ID_PMU_UC | Uncorrectable Errors logged by PMU HW. This includes PMU ROM validation Error, PMU TMR Error, uncorrectable PMU RAM ECC Error, and PMU Local Register Address Error. |
EM_ERR_ID_CSU | CSU Hardware related Errors |
EM_ERR_ID_PLL_LOCK | Errors set when a PLL loses lock (enable these only after the PLL has locked) |
EM_ERR_ID_PL | PL Generic Errors passed to PS |
EM_ERR_ID_TO | All Time-out Errors [FPS_TO, LPS_TO] |
EM_ERR_ID_AUX3 | Auxiliary Error 3 |
EM_ERR_ID_AUX2 | Auxiliary Error 2 |
EM_ERR_ID_AUX1 | Auxiliary Error 1 |
EM_ERR_ID_AUX0 | Auxiliary Error 0 |
EM_ERR_ID_DFT | Error associated with the unexpected enablement of DFT features |
EM_ERR_ID_CLK_MON | Clock Monitor Error |
EM_ERR_ID_XMPU | XMPU Errors [LPS XMPU, FPS XMPU] |
EM_ERR_ID_PWR_SUPPLY | Supply Detection Failure Errors |
EM_ERR_ID_FPD_SWDT | FPD System Watchdog Timer Error |
EM_ERR_ID_LPD_SWDT | LPD System Watchdog Timer Error |
EM_ERR_ID_RPU_CCF | Asserted if any of the RPU CCF errors are generated |
EM_ERR_ID_RPU_LS | Asserted if an RPU lock-step error is generated |
EM_ERR_ID_FPD_TEMP | FPD Temperature Shutdown Alert |
EM_ERR_ID_LPD_TEMP | LPD Temperature Shutdown Alert |
EM_ERR_ID_RPU1 | RPU1 Error including both Correctable and Uncorrectable Errors |
EM_ERR_ID_RPU0 | RPU0 Error including both Correctable and Uncorrectable Errors |
EM_ERR_ID_OCM_ECC | OCM Uncorrectable ECC Error |
EM_ERR_ID_DDR_ECC | DDR Uncorrectable ECC Error |
Example for Error Management (Custom Handler)
In the example below, the OCM uncorrectable ECC error (EM_ERR_ID_OCM_ECC) is considered.
A custom handler is registered for this error in the PMUFW. Here the handler only clears the error and prints a message; in a more realistic case, the corrupted memory might also be reloaded.
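A handler of the kind described above could look like the following host-side sketch. The status register is simulated, and its bit position and write-1-to-clear behavior are assumptions for illustration; take the real OCM register definitions from the TRM. In the PMUFW, such a function would be passed as the ErrorHandler argument of XPfw_EmSetAction together with EM_ERR_ID_OCM_ECC and EM_ACTION_CUSTOM; the uint8_t error-ID parameter is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Host-side simulation of an OCM interrupt status register. The register,
 * its bit position and its write-1-to-clear behavior are ASSUMPTIONS for
 * illustration only; use the real definitions from the TRM. */
#define OCM_UE_BIT (UINT32_C(1) << 7) /* hypothetical "uncorrectable ECC" bit */

static uint32_t sim_ocm_isr; /* stands in for the memory-mapped status register */

/* Write-1-to-clear: bits written as 1 are cleared, others are untouched. */
static void sim_ocm_isr_w1c(uint32_t mask) { sim_ocm_isr &= ~mask; }

static unsigned handled_count;

/* Hypothetical custom handler for EM_ERR_ID_OCM_ECC: clear the error and
 * print a message. A production handler might also reload the affected
 * memory region. */
static void ocm_ecc_handler(uint8_t error_id) {
    if (sim_ocm_isr & OCM_UE_BIT) {
        sim_ocm_isr_w1c(OCM_UE_BIT);
        assert((sim_ocm_isr & OCM_UE_BIT) == 0); /* W1C must have cleared it */
        handled_count++;
        printf("OCM uncorrectable ECC error (error ID %u) cleared\n",
               (unsigned)error_id);
    }
}
```

If the status bit is not set when the handler runs, it does nothing, so a spurious call leaves the simulated register unchanged.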
Adding an Error Handler in the PMUFW:
Diff for xpfw_mod_em.c
To inject an OCM ECC error, execute from an R5/A53 target in XSDB:
Tip:
The above code is written in Tcl for debugging. The same code can easily be ported to C by replacing the XSDB mwr/mrd commands with Xil_Out32/Xil_In32.
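Following the tip above, each XSDB `mwr <addr> <value>` maps to `Xil_Out32(<addr>, <value>)` and each `mrd <addr>` maps to `Xil_In32(<addr>)`. The sketch below is host-only: it defines stand-ins for Xil_Out32/Xil_In32 backed by a small simulated register window, and the addresses and values are placeholders, not the actual OCM ECC injection sequence. On the target, remove the stand-ins and include xil_io.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-only stand-ins. On target, delete these and #include "xil_io.h",
 * whose Xil_Out32(Addr, Value) / Xil_In32(Addr) access real registers. */
typedef uintptr_t UINTPTR;
#define SIM_BASE  ((UINTPTR)0x1000u) /* placeholder base of the simulated window */
#define SIM_WORDS 16u
static uint32_t sim_regs[SIM_WORDS];

/* Map a simulated address to its backing word (word-aligned, in range). */
static uint32_t *sim_reg(UINTPTR addr) {
    assert(addr >= SIM_BASE && (addr - SIM_BASE) % 4u == 0);
    assert((addr - SIM_BASE) / 4u < SIM_WORDS);
    return &sim_regs[(addr - SIM_BASE) / 4u];
}
static void Xil_Out32(UINTPTR addr, uint32_t value) { *sim_reg(addr) = value; }
static uint32_t Xil_In32(UINTPTR addr) { return *sim_reg(addr); }

/* One transcribed "mwr <addr> <value>" line from the Tcl script. */
typedef struct {
    UINTPTR addr;
    uint32_t value;
} reg_write_t;

/* Replay a transcribed mwr sequence in order, then return the value that a
 * trailing "mrd <status_addr>" would have printed. */
static uint32_t replay_writes(const reg_write_t *seq, size_t n, UINTPTR status_addr) {
    for (size_t i = 0; i < n; i++)
        Xil_Out32(seq[i].addr, seq[i].value);
    return Xil_In32(status_addr);
}
```

Keeping the transcribed writes in a table makes it easy to check the C port line by line against the Tcl script it came from.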
Example for Error Management (PoR as a Response to an Error)
Some errors are severe enough that the system cannot recover from them without a reset of the entire system.
In such cases, PoR or SRST can be used as the action. In this example, a PoR is used as the response to an OCM ECC double-bit error.
Here is the code that adds the PoR as an action:
Diff for xpfw_mod_em.c
To inject an OCM ECC error, use the same Tcl script as in the previous example.
Once you trigger the error, a PoR occurs and all processors are left in a reset state, just as after a fresh power-on.
PMU RAM is also cleared during a PoR, so the PMUFW must be reloaded.
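The consequence described above can be modeled in a few lines: the error-action table lives in PMU RAM, so after the PoR it is gone until the PMUFW is reloaded and its initialization runs again. This is a hypothetical host-side sketch; `pmu_ram_t`, `pmufw_init`, `trigger_por`, `handle_error`, and the error ID value are illustrative names, not PMUFW code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for EM_ACTION_* (0 means nothing registered). */
enum { ACT_NONE = 0, ACT_POR, ACT_SRST, ACT_CUSTOM };
enum { ERR_OCM_ECC = 1, NUM_ERRORS = 32 }; /* illustrative IDs only */

/* Everything the modeled PMUFW keeps in PMU RAM. */
typedef struct {
    bool fw_loaded;
    uint8_t action[NUM_ERRORS];
} pmu_ram_t;

static pmu_ram_t pmu_ram;

/* Models firmware load plus the module init registering PoR for the OCM error. */
static void pmufw_init(void) {
    assert(!pmu_ram.fw_loaded); /* the model only loads into cleared PMU RAM */
    pmu_ram.fw_loaded = true;
    pmu_ram.action[ERR_OCM_ECC] = ACT_POR;
}

/* Models a PoR: processors are reset and PMU RAM is cleared. */
static void trigger_por(void) { memset(&pmu_ram, 0, sizeof pmu_ram); }

/* Returns the action taken for an error, or ACT_NONE if no firmware (and
 * hence no action table) is loaded. A PoR action wipes PMU RAM. */
static uint8_t handle_error(uint8_t id) {
    if (!pmu_ram.fw_loaded || id >= NUM_ERRORS)
        return ACT_NONE;
    uint8_t act = pmu_ram.action[id];
    if (act == ACT_POR)
        trigger_por(); /* clears the table this function just read */
    return act;
}
```

In the model, a second OCM error after the PoR finds no table until `pmufw_init` runs again, which mirrors the need to reload the PMUFW.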
AR# 67820 | |
---|---|
Date | 11/15/2016 |
Status | Active |
Type | General Article |
Devices | |