AR# 67820

Zynq UltraScale+ MPSoC: 2016.3 PMUFW, Error Management

Description

This Xilinx Answer discusses the following topics:

  • Error Management Hardware
  • Error Management in PMU Firmware
  • Example for Error Management (Custom Handler)
  • Example for Error Management (PoR as a Response to an Error)

Solution

Error Management Hardware

The Zynq UltraScale+ MPSoC has a dedicated error handler, the Error Manager, that aggregates all of the fatal errors across the SoC and handles them. Refer to the TRM/Architecture Spec for details.

To summarize, each fatal error routed to the Error Manager can be configured either to be handled in hardware (triggering an SRST or PoR) or to raise an interrupt to the PMU.

Error Management in PMU Firmware

The PMU Firmware (PMUFW) provides APIs to register custom error handlers or assign a default SRST/PoR action in response to an Error. There is a specific module (xpfw_mod_em.c) already provided in the PMUFW and it is enabled by default.

All error handling code should reside in this module, which already includes a couple of example handlers for WDT errors.

Actions for each error can be set up using the XPfw_EmSetAction API:

 /**
 * Set action to be taken when a specific error occurs
 *
 * @param ErrorId is the ID for error as defined in this file
 * @param ActionId is one of the actions defined in this file
 * @param ErrorHandler is the handler to be called in case of custom action
 *
 * @return XST_SUCCESS if the action was successfully registered
 *         XST_FAILURE if the registration fails
 */
s32 XPfw_EmSetAction(u8 ErrorId, u8 ActionId, XPfw_ErrorHandler_t ErrorHandler);
 

The following actions are supported for the parameter ActionId:

Action ID           Description
EM_ACTION_POR       Trigger a Power-On-Reset
EM_ACTION_SRST      Trigger a System Reset
EM_ACTION_CUSTOM    Call the custom handler registered as the ErrorHandler parameter
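
The registration and dispatch behaviour described above can be illustrated with a small host-side model. This is only a sketch and not the PMUFW implementation: every Mock_ and MOCK_ name, the numeric values, and the rule that a custom action must come with a non-NULL handler are assumptions made for illustration. Resets are recorded in a variable instead of being performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are NOT taken from the PMUFW headers. */
enum { MOCK_ACTION_NONE, MOCK_ACTION_POR, MOCK_ACTION_SRST, MOCK_ACTION_CUSTOM };
enum { MOCK_ERR_OCM_ECC, MOCK_ERR_DDR_ECC, MOCK_ERR_COUNT };
enum { MOCK_SUCCESS = 0, MOCK_FAILURE = 1 };

typedef void (*MockErrorHandler)(uint8_t ErrorId);

typedef struct {
    uint8_t Action;            /* one of the MOCK_ACTION_* values */
    MockErrorHandler Handler;  /* used only for MOCK_ACTION_CUSTOM */
} MockErrorEntry;

MockErrorEntry ErrorTable[MOCK_ERR_COUNT]; /* zero-initialized: no action */
uint8_t LastResetRequested;                /* records PoR/SRST requests */
unsigned CustomCalls;                      /* counts custom handler calls */

/* Register an action for an error (assumed rule: custom needs a handler). */
int Mock_EmSetAction(uint8_t ErrorId, uint8_t ActionId, MockErrorHandler Handler)
{
    if (ErrorId >= MOCK_ERR_COUNT) {
        return MOCK_FAILURE;
    }
    if (ActionId != MOCK_ACTION_POR && ActionId != MOCK_ACTION_SRST &&
        ActionId != MOCK_ACTION_CUSTOM) {
        return MOCK_FAILURE;
    }
    if (ActionId == MOCK_ACTION_CUSTOM && Handler == NULL) {
        return MOCK_FAILURE;
    }
    ErrorTable[ErrorId].Action = ActionId;
    ErrorTable[ErrorId].Handler =
        (ActionId == MOCK_ACTION_CUSTOM) ? Handler : NULL;
    return MOCK_SUCCESS;
}

/* Apply the registered action when an error is reported. */
void Mock_EmProcessError(uint8_t ErrorId)
{
    if (ErrorId >= MOCK_ERR_COUNT) {
        return;
    }
    const MockErrorEntry *Entry = &ErrorTable[ErrorId];
    switch (Entry->Action) {
    case MOCK_ACTION_CUSTOM:
        assert(Entry->Handler != NULL); /* guaranteed by Mock_EmSetAction */
        Entry->Handler(ErrorId);
        break;
    case MOCK_ACTION_POR:
    case MOCK_ACTION_SRST:
        LastResetRequested = Entry->Action; /* a real system would reset here */
        break;
    default:
        break; /* no action registered */
    }
}

/* Example custom handler: only counts how often it was called. */
void Mock_CountingHandler(uint8_t ErrorId)
{
    (void)ErrorId;
    CustomCalls++;
}
```

In the real firmware the equivalent call is XPfw_EmSetAction, and its return value (XST_SUCCESS or XST_FAILURE) can be checked in the same way as the mock's return value here.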


Below is a list of Error IDs for the ErrorId parameter:

Error ID                Description
EM_ERR_ID_CSU_ROM       Errors logged by CSU Boot ROM (CBR)
EM_ERR_ID_PMU_PB        Errors logged by PMU Boot ROM (PBR) in the pre-boot stage
EM_ERR_ID_PMU_SERVICE   Errors logged by PBR in service mode
EM_ERR_ID_PMU_FW        Errors logged by PMUFW
EM_ERR_ID_PMU_UC        Uncorrectable Errors logged by PMU HW. This includes PMU ROM validation Error, PMU TMR Error, uncorrectable PMU RAM ECC Error, and PMU Local Register Address Error.
EM_ERR_ID_CSU           CSU Hardware related Errors
EM_ERR_ID_PLL_LOCK      Errors set when a PLL loses lock (enable these only after the PLL has locked)
EM_ERR_ID_PL            PL Generic Errors passed to PS
EM_ERR_ID_TO            All Time-out Errors [FPS_TO, LPS_TO]
EM_ERR_ID_AUX3          Auxiliary Error 3
EM_ERR_ID_AUX2          Auxiliary Error 2
EM_ERR_ID_AUX1          Auxiliary Error 1
EM_ERR_ID_AUX0          Auxiliary Error 0
EM_ERR_ID_DFT           Error associated with the unexpected enablement of DFT features
EM_ERR_ID_CLK_MON       Clock Monitor Error
EM_ERR_ID_XMPU          XMPU Errors [LPS XMPU, FPS XMPU]
EM_ERR_ID_PWR_SUPPLY    Supply Detection Failure Errors
EM_ERR_ID_FPD_SWDT      FPD System Watch-Dog Timer Error
EM_ERR_ID_LPD_SWDT      LPD System Watch-Dog Timer Error
EM_ERR_ID_RPU_CCF       Asserted if any of the RPU CCF errors are generated
EM_ERR_ID_RPU_LS        RPU Lockstep Errors
EM_ERR_ID_FPD_TEMP      FPD Temperature Shutdown Alert
EM_ERR_ID_LPD_TEMP      LPD Temperature Shutdown Alert
EM_ERR_ID_RPU1          RPU1 Error including both Correctable and Uncorrectable Errors
EM_ERR_ID_RPU0          RPU0 Error including both Correctable and Uncorrectable Errors
EM_ERR_ID_OCM_ECC       OCM Uncorrectable ECC Error
EM_ERR_ID_DDR_ECC       DDR Uncorrectable ECC Error
 

Example for Error Management (Custom Handler)

This example considers an OCM uncorrectable ECC error (EM_ERR_ID_OCM_ECC).

A custom handler is registered for this error in the PMUFW and the handler in this case just prints out the error message. In a more realistic case, the corrupted memory might be reloaded, but this example is limited to clearing the error and printing a message.

Adding an Error Handler in the PMUFW:

Diff for xpfw_mod_em.c

@@ -88,6 +88,15 @@ static void FpdSwdtHandler(u8 ErrorId)                                 
        }                                                                                 
 }                                                                                        
                                                                                          
+/* OCM Uncorrectable Error Handler */                                                    
+static void OcmErrHandler(u8 ErrorId)                                                    
+{                                                                                        
+       fw_printf("EM: OCM ECC error detected\n");                                        
+       /* Clear the Error Status in OCM registers */                                     
+       XPfw_Write32(0xFF960004,BIT(7));                                                  
+                                                                                         
+}                                                                                        
+                                                                                         
 /* CfgInit Handler */                                                                    
 static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,                   
                u32 Len)                                                                  
@@ -102,6 +111,7 @@ static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,
        XPfw_EmSetAction(EM_ERR_ID_RPU_LS, EM_ACTION_CUSTOM, RpuLsHandler);               
        XPfw_EmSetAction(EM_ERR_ID_LPD_SWDT, EM_ACTION_CUSTOM, LpdSwdtHandler);           
        XPfw_EmSetAction(EM_ERR_ID_FPD_SWDT, EM_ACTION_CUSTOM, FpdSwdtHandler);           
+       XPfw_EmSetAction(EM_ERR_ID_OCM_ECC, EM_ACTION_CUSTOM, OcmErrHandler);             
                                                                                          
        fw_printf("EM Module (MOD-%d): Initialized.\r\n",                                 
                        ModPtr->ModId);

Injecting an Error using the debugger (XSDB):

Run the following commands from an R5 or A53 target in XSDB:

 
# Enable ECC_UE interrupt in OCM_IEN
mwr -force 0xFF96000C [expr 1 << 7 ]
 
# Write to the Fault Injection Data 0 register (OCM_FI_D0)
# Errors are injected in the bits that are set, here bit 0 and bit 1
mwr -force 0xFF96004C 3
 
# Enable ECC and Fault Injection
mwr -force 0xFF960014 1
 
# Clear the Count Register: OCM_FI_CNTR
mwr -force 0xFF960074 0

# Now write data to the OCM so that the fault is injected
# Because the OCM performs a read-modify-write for 32-bit transactions, this write should trigger the error
mwr -force 0xFFFE0000 0x1234
 
# Read back to trigger error again
mrd -force 0xFFFE0000

Tip:

The above code is written in Tcl for use in the debugger. The same sequence can be ported to C source by replacing mwr/mrd with Xil_Out32/Xil_In32.
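
As a rough illustration of that port, the sketch below replays the register sequence from the Tcl script with Xil_Out32/Xil_In32. The addresses and values are copied from the script; the macro names, the InjectOcmEccError function, and the USE_XIL_IO switch are assumptions introduced here. When USE_XIL_IO is not defined, host stand-ins log each access instead of touching hardware, so the sequence can be checked off-target. Target-side concerns such as caching of the OCM region are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses copied from the Tcl script; the macro names are assumed. */
#define OCM_IEN_ADDR      0xFF96000CU /* OCM_IEN: interrupt enable */
#define OCM_FI_D0_ADDR    0xFF96004CU /* OCM_FI_D0: fault injection data 0 */
#define OCM_ECC_CTRL_ADDR 0xFF960014U /* enables ECC and fault injection */
#define OCM_FI_CNTR_ADDR  0xFF960074U /* OCM_FI_CNTR: fault injection count */
#define OCM_TEST_ADDR     0xFFFE0000U /* OCM location used for the test */
#define OCM_ECC_UE_MASK   (1U << 7)   /* ECC_UE bit, as in the script */

#ifdef USE_XIL_IO
#include "xil_io.h" /* real Xil_Out32/Xil_In32 when building for the target */
#else
/* Host stand-ins (illustration only): log accesses instead of performing them. */
#define ACCESS_LOG_MAX 16
typedef struct {
    uintptr_t Addr;
    uint32_t Value;
    int IsWrite;
} Access;

Access AccessLog[ACCESS_LOG_MAX];
size_t AccessCount;

static void LogAccess(uintptr_t Addr, uint32_t Value, int IsWrite)
{
    assert(AccessCount < ACCESS_LOG_MAX);
    AccessLog[AccessCount].Addr = Addr;
    AccessLog[AccessCount].Value = Value;
    AccessLog[AccessCount].IsWrite = IsWrite;
    AccessCount++;
}

static void Xil_Out32(uintptr_t Addr, uint32_t Value)
{
    LogAccess(Addr, Value, 1);
}

static uint32_t Xil_In32(uintptr_t Addr)
{
    LogAccess(Addr, 0U, 0);
    return 0U; /* the stand-in has no memory behind it */
}
#endif

/* Replays the injection sequence; returns the value read back from OCM. */
uint32_t InjectOcmEccError(void)
{
    Xil_Out32(OCM_IEN_ADDR, OCM_ECC_UE_MASK); /* enable ECC_UE interrupt */
    Xil_Out32(OCM_FI_D0_ADDR, 0x3U);          /* inject on bit 0 and bit 1 */
    Xil_Out32(OCM_ECC_CTRL_ADDR, 0x1U);       /* enable ECC, fault injection */
    Xil_Out32(OCM_FI_CNTR_ADDR, 0x0U);        /* clear the count register */
    Xil_Out32(OCM_TEST_ADDR, 0x1234U);        /* write that injects the fault */
    return Xil_In32(OCM_TEST_ADDR);           /* read back to trigger again */
}
```

On the host build, calling InjectOcmEccError() once leaves six entries in AccessLog, in the same order as the commands in the Tcl script.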

Example for Error Management (PoR as a Response to an Error)

Some errors are severe enough that recovery is not feasible without resetting the entire system.

In such cases, PoR or SRST can be used as the action. This example uses PoR as the response to the OCM ECC double-bit error.

Here is the code that adds the PoR as an action:

Diff for xpfw_mod_em.c

 @@ -102,6 +102,7 @@ static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,  
        XPfw_EmSetAction(EM_ERR_ID_RPU_LS, EM_ACTION_CUSTOM, RpuLsHandler);                 
        XPfw_EmSetAction(EM_ERR_ID_LPD_SWDT, EM_ACTION_CUSTOM, LpdSwdtHandler);             
        XPfw_EmSetAction(EM_ERR_ID_FPD_SWDT, EM_ACTION_CUSTOM, FpdSwdtHandler);             
+       XPfw_EmSetAction(EM_ERR_ID_OCM_ECC, EM_ACTION_POR, NULL);                           
                                                                                            
        fw_printf("EM Module (MOD-%d): Initialized.\r\n",                                   

The Tcl script used to inject an OCM ECC error is the same as in the previous example.

Once the error is triggered, a PoR occurs and all processors are in a reset state, just as they would be after a fresh power-on.

The PMU RAM is also cleared during a PoR, so the PMUFW needs to be reloaded.

Date 11/15/2016
Status Active
Type General Article
Devices