AR# 67820

Zynq UltraScale+ MPSoC: 2016.3 PMUFW, Error Management

Description

This Xilinx Answer discusses the following topics:

  • Error Management Hardware
  • Error Management in PMU Firmware
  • Example for Error Management (Custom Handler)
  • Example for Error Management (PoR as a response to Error)

Solution

Error Management Hardware

Zynq MPSoC has a dedicated error handler to aggregate all of the fatal errors across the SoC and handle them. Refer to the TRM/Architecture Spec for details.

To summarize, each fatal error routed to the Error Manager can be configured either to be handled by hardware (which triggers an SRST/PoR) or to raise an interrupt to the PMU.

Error Management in PMU Firmware

The PMU Firmware (PMUFW) provides APIs to register custom error handlers or assign a default SRST/PoR action in response to an Error. There is a specific module (xpfw_mod_em.c) already provided in the PMUFW and it is enabled by default.

All error handling code should reside in this module and there are already a couple of examples for handling WDT errors.

Actions for each error can be set up using the XPfw_EmSetAction API:

 /**
 * Set action to be taken when a specific error occurs
 *
 * @param ErrorId is the ID for error as defined in this file
 * @param ActionId is one of the actions defined in this file
 * @param ErrorHandler is the handler to be called in case of custom action
 *
 * @return XST_SUCCESS if the action was successfully registered
 *         XST_FAILURE if the registration fails
 */
s32 XPfw_EmSetAction(u8 ErrorId, u8 ActionId, XPfw_ErrorHandler_t ErrorHandler);
 

The following actions are supported for the parameter ActionId:

Action ID          Description
EM_ACTION_POR      Trigger a Power-On-Reset
EM_ACTION_SRST     Trigger a System Reset
EM_ACTION_CUSTOM   Call the custom handler registered as ErrorHandler parameter
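
As an illustration of the API above, the following is a minimal host-side sketch that registers one error for each action type and checks the return value. Everything in the "stand-in" section is an assumption made so that the sketch compiles outside the PMUFW tree: the numeric values of the action and error IDs are placeholders, and the mock XPfw_EmSetAction (including its rule that a custom action needs a non-NULL handler) only approximates the real function. DdrEccHandler and RegisterErrorActions are hypothetical names. In actual firmware, the definitions come from the PMUFW headers and the calls belong in xpfw_mod_em.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ---- Host-side stand-ins (assumptions, NOT the real PMUFW headers) ---- */
typedef uint8_t u8;
typedef int32_t s32;
typedef void (*XPfw_ErrorHandler_t)(u8 ErrorId);

#define XST_SUCCESS 0
#define XST_FAILURE 1

/* Placeholder values; on target, use the definitions from the PMUFW headers */
#define EM_ACTION_POR      1U
#define EM_ACTION_SRST     2U
#define EM_ACTION_CUSTOM   3U

#define EM_ERR_ID_LPD_SWDT 1U
#define EM_ERR_ID_OCM_ECC  2U
#define EM_ERR_ID_DDR_ECC  3U
#define MOCK_ERR_ID_COUNT  4U

/* Mock registration table so the sketch can run on a host machine */
static struct {
    u8 ActionId;
    XPfw_ErrorHandler_t Handler;
} MockActionTable[MOCK_ERR_ID_COUNT];

/* Mock of the API: validates the arguments and records the registration */
s32 XPfw_EmSetAction(u8 ErrorId, u8 ActionId, XPfw_ErrorHandler_t ErrorHandler)
{
    if ((ErrorId == 0U) || (ErrorId >= MOCK_ERR_ID_COUNT)) {
        return XST_FAILURE;
    }
    if ((ActionId < EM_ACTION_POR) || (ActionId > EM_ACTION_CUSTOM)) {
        return XST_FAILURE;
    }
    /* Mock rule (assumption): a custom action requires a handler */
    if ((ActionId == EM_ACTION_CUSTOM) && (ErrorHandler == NULL)) {
        return XST_FAILURE;
    }
    MockActionTable[ErrorId].ActionId = ActionId;
    MockActionTable[ErrorId].Handler = ErrorHandler;
    return XST_SUCCESS;
}

/* ---- Usage sketch ---- */

/* Hypothetical custom handler: only reports the error */
static void DdrEccHandler(u8 ErrorId)
{
    printf("EM: DDR ECC error (ErrorId %u)\n", (unsigned)ErrorId);
}

/* Register one error per action type; returns the number of failed calls */
int RegisterErrorActions(void)
{
    int Failures = 0;

    /* The reset actions do not use a handler, so NULL is passed */
    if (XPfw_EmSetAction(EM_ERR_ID_OCM_ECC, EM_ACTION_POR, NULL) != XST_SUCCESS) {
        Failures++;
    }
    if (XPfw_EmSetAction(EM_ERR_ID_LPD_SWDT, EM_ACTION_SRST, NULL) != XST_SUCCESS) {
        Failures++;
    }
    /* The custom action calls the handler supplied as the third argument */
    if (XPfw_EmSetAction(EM_ERR_ID_DDR_ECC, EM_ACTION_CUSTOM, DdrEccHandler) != XST_SUCCESS) {
        Failures++;
    }
    return Failures;
}
```

Checking the return value is worthwhile because a registration that fails silently would leave the error without the intended response.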


Below is a list of Error IDs for the ErrorId parameter:

Error ID               Description
EM_ERR_ID_CSU_ROM      Errors logged by CSU Boot ROM (CBR)
EM_ERR_ID_PMU_PB       Errors logged by PMU Boot ROM (PBR) in the pre-boot stage
EM_ERR_ID_PMU_SERVICE  Errors logged by PBR in service mode
EM_ERR_ID_PMU_FW       Errors logged by PMUFW
EM_ERR_ID_PMU_UC       Uncorrectable Errors logged by PMU HW. This includes PMU ROM validation Error, PMU TMR Error, uncorrectable PMU RAM ECC Error, and PMU Local Register Address Error.
EM_ERR_ID_CSU          CSU Hardware related Errors
EM_ERR_ID_PLL_LOCK     Errors set when a PLL loses lock (enable these only after the PLL has locked)
EM_ERR_ID_PL           PL Generic Errors passed to PS
EM_ERR_ID_TO           All Time-out Errors [FPS_TO, LPS_TO]
EM_ERR_ID_AUX3         Auxiliary Error 3
EM_ERR_ID_AUX2         Auxiliary Error 2
EM_ERR_ID_AUX1         Auxiliary Error 1
EM_ERR_ID_AUX0         Auxiliary Error 0
EM_ERR_ID_DFT          Error associated with the unexpected enablement of DFT features
EM_ERR_ID_CLK_MON      Clock Monitor Error
EM_ERR_ID_XMPU         XMPU Errors [LPS XMPU, FPS XMPU]
EM_ERR_ID_PWR_SUPPLY   Supply Detection Failure Errors
EM_ERR_ID_FPD_SWDT     FPD System Watch-Dog Timer Error
EM_ERR_ID_LPD_SWDT     LPD System Watch-Dog Timer Error
EM_ERR_ID_RPU_CCF      Asserted if any of the RPU CCF errors are generated
EM_ERR_ID_RPU_LS       Asserted for RPU lock-step errors
EM_ERR_ID_FPD_TEMP     FPD Temperature Shutdown Alert
EM_ERR_ID_LPD_TEMP     LPD Temperature Shutdown Alert
EM_ERR_ID_RPU1         RPU1 Error including both Correctable and Uncorrectable Errors
EM_ERR_ID_RPU0         RPU0 Error including both Correctable and Uncorrectable Errors
EM_ERR_ID_OCM_ECC      OCM Uncorrectable ECC Error
EM_ERR_ID_DDR_ECC      DDR Uncorrectable ECC Error
 

Example for Error Management (Custom Handler)

The example below handles an OCM uncorrectable ECC error (EM_ERR_ID_OCM_ECC).

A custom handler is registered for this error in the PMUFW. In a more realistic case the handler might reload the corrupted memory, but this example only clears the error status and prints a message.

Adding an Error Handler in the PMUFW:

Diff for xpfw_mod_em.c

@@ -88,6 +88,15 @@ static void FpdSwdtHandler(u8 ErrorId)                                 
        }                                                                                 
 }                                                                                        
                                                                                          
+/* OCM Uncorrectable Error Handler */                                                    
+static void OcmErrHandler(u8 ErrorId)                                                    
+{                                                                                        
+       fw_printf("EM: OCM ECC error detected\n");                                        
+       /* Clear the Error Status in OCM registers */                                     
+       XPfw_Write32(0xFF960004,BIT(7));                                                  
+                                                                                         
+}                                                                                        
+                                                                                         
 /* CfgInit Handler */                                                                    
 static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,                   
                u32 Len)                                                                  
@@ -102,6 +111,7 @@ static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,
        XPfw_EmSetAction(EM_ERR_ID_RPU_LS, EM_ACTION_CUSTOM, RpuLsHandler);               
        XPfw_EmSetAction(EM_ERR_ID_LPD_SWDT, EM_ACTION_CUSTOM, LpdSwdtHandler);           
        XPfw_EmSetAction(EM_ERR_ID_FPD_SWDT, EM_ACTION_CUSTOM, FpdSwdtHandler);           
+       XPfw_EmSetAction(EM_ERR_ID_OCM_ECC, EM_ACTION_CUSTOM, OcmErrHandler);             
                                                                                          
        fw_printf("EM Module (MOD-%d): Initialized.\r\n",                                 
                        ModPtr->ModId);

Injecting an Error using the debugger (xsdb):

Run the following commands from an R5 or A53 target in XSDB:

 
# Enable ECC_UE interrupt in OCM_IEN
mwr -force 0xFF96000C [expr 1 << 7 ]
 
# Write to the Fault Injection Data 0 register (OCM_FI_D0)
# Errors are injected in the bits that are set; here, bit 0 and bit 1
mwr -force 0xFF96004C 3
 
# Enable ECC and Fault Injection
mwr -force 0xFF960014 1
 
# Clear the Count Register: OCM_FI_CNTR
mwr -force 0xFF960074 0

# Now write data to the OCM so that the fault is injected
# Because the OCM does a read-modify-write (RMW) for 32-bit transactions,
# this write should already trigger the error
mwr -force 0xFFFE0000 0x1234
 
# Read back to trigger error again
mrd -force 0xFFFE0000

Tip:

The code above is written in Tcl for use in the debugger. The same sequence can easily be ported to C source by replacing mwr/mrd with Xil_Out32/Xil_In32.
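
A minimal sketch of such a port is shown below. The register addresses and values are copied from the Tcl script above. The macro name OCM_ECC_FI_CTRL, the function name InjectOcmEccError, and the logging stubs for Xil_Out32/Xil_In32 are assumptions introduced so that the sketch can be compiled and checked on a host machine. On the target, remove the stubs and include xil_io.h, which provides the real accessors.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uintptr_t UINTPTR;

/* Addresses taken from the Tcl script above */
#define OCM_IEN          0xFF96000CU /* interrupt enable register */
#define OCM_ECC_FI_CTRL  0xFF960014U /* ECC and fault injection enable (name assumed) */
#define OCM_FI_D0        0xFF96004CU /* Fault Injection Data 0 */
#define OCM_FI_CNTR      0xFF960074U /* fault injection count register */
#define OCM_TEST_ADDR    0xFFFE0000U /* OCM location used for the test write */

/* ---- Host-side stubs (assumptions); on target use xil_io.h instead ---- */
#define MOCK_LOG_MAX 16U

static struct {
    UINTPTR Addr;
    u32 Value;
} WriteLog[MOCK_LOG_MAX];
static size_t WriteCount;
static size_t ReadCount;

/* Records each write instead of touching hardware */
static void Xil_Out32(UINTPTR Addr, u32 Value)
{
    if (WriteCount < MOCK_LOG_MAX) {
        WriteLog[WriteCount].Addr = Addr;
        WriteLog[WriteCount].Value = Value;
        WriteCount++;
    }
}

/* Counts reads and returns a dummy value */
static u32 Xil_In32(UINTPTR Addr)
{
    (void)Addr;
    ReadCount++;
    return 0U;
}

/* ---- Port of the Tcl injection sequence ---- */
u32 InjectOcmEccError(void)
{
    /* Enable the ECC_UE interrupt in OCM_IEN (bit 7) */
    Xil_Out32(OCM_IEN, 1U << 7);

    /* Inject errors in bit 0 and bit 1 of the data */
    Xil_Out32(OCM_FI_D0, 3U);

    /* Enable ECC and fault injection */
    Xil_Out32(OCM_ECC_FI_CTRL, 1U);

    /* Clear the fault injection count register */
    Xil_Out32(OCM_FI_CNTR, 0U);

    /* Write to the OCM so that the fault is injected */
    Xil_Out32(OCM_TEST_ADDR, 0x1234U);

    /* Read back to trigger the error again */
    return Xil_In32(OCM_TEST_ADDR);
}
```

The order of the accesses matches the script line for line, so the two versions can be compared directly when debugging.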

Example for Error Management (PoR as a response to Error)

Some errors are severe enough that the system cannot feasibly recover from them without a reset of the entire system.

In such cases, PoR or SRST can be used as the action. This example uses a PoR as the response to the OCM ECC double-bit error.

Here is the code that adds the PoR as an action:

Diff for xpfw_mod_em.c

@@ -102,6 +102,7 @@ static void EmCfgInit(const XPfw_Module_t *ModPtr, const u32 *CfgData,
        XPfw_EmSetAction(EM_ERR_ID_RPU_LS, EM_ACTION_CUSTOM, RpuLsHandler);
        XPfw_EmSetAction(EM_ERR_ID_LPD_SWDT, EM_ACTION_CUSTOM, LpdSwdtHandler);
        XPfw_EmSetAction(EM_ERR_ID_FPD_SWDT, EM_ACTION_CUSTOM, FpdSwdtHandler);
+       XPfw_EmSetAction(EM_ERR_ID_OCM_ECC, EM_ACTION_POR, NULL);
 
        fw_printf("EM Module (MOD-%d): Initialized.\r\n",
                        ModPtr->ModId);

To inject an OCM ECC error, use the same Tcl script as in the previous example.

Once you trigger the error, a PoR occurs and you can see that all processors are in a reset state, similar to how they would be in a fresh power-on state.

The PMU RAM is also cleared during a PoR, so the PMUFW needs to be reloaded.

Date Created 09/06/2016
Last Updated 11/15/2016
Status Active
Type General Article
Devices
  • Zynq UltraScale+ MPSoC