When using the standalone BSP, or an operating system that is built on it (such as FreeRTOS), synchronization barriers and peripheral accesses can take several microseconds to complete with no obvious cause.
What could cause this behavior?
To support execution from QSPI (via execute-in-place) or from external SMC devices, the standalone BSP marks the address ranges of these peripherals as "Normal" memory by default in translation_table.S.
However, a side effect of this memory setting is that the processor is allowed to issue speculative accesses to these peripherals, even when they are not in use.
While such a speculative access is outstanding, a memory barrier or another peripheral access can be stalled until the slow access completes.
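To check which entries are affected, it can help to decode the attribute bits of a translation table entry. The following is a minimal sketch, assuming an ARMv7-A core (such as the Cortex-A9) using the short-descriptor section format with TEX remap disabled. The function names and the example attribute values (0xc02, 0xc06, 0xc0a, 0xc0e) are illustrative assumptions, not values quoted from any particular BSP release; compare them with the translation_table.S in your own BSP.

```c
#include <assert.h>
#include <stdint.h>

/* Memory types defined by the ARMv7-A architecture (TEX remap disabled). */
typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL,
    MEM_RESERVED          /* reserved or implementation-defined encodings */
} mem_type_t;

/* Decode the TEX[14:12], C[3] and B[2] bits of a section descriptor. */
mem_type_t classify_section(uint32_t attr)
{
    uint32_t tex = (attr >> 12) & 7u;
    uint32_t c   = (attr >> 3) & 1u;
    uint32_t b   = (attr >> 2) & 1u;

    if (tex & 4u) {
        return MEM_NORMAL;            /* TEX=1xx: cacheable Normal memory */
    }
    switch ((tex << 2) | (c << 1) | b) {
    case 0x0: return MEM_STRONGLY_ORDERED; /* TEX=000 C=0 B=0 */
    case 0x1: return MEM_DEVICE;           /* TEX=000 C=0 B=1, shareable */
    case 0x2:                              /* TEX=000 C=1 B=0, write-through */
    case 0x3:                              /* TEX=000 C=1 B=1, write-back */
    case 0x4:                              /* TEX=001 C=0 B=0, non-cacheable */
    case 0x7:                              /* TEX=001 C=1 B=1, write-back/allocate */
        return MEM_NORMAL;
    case 0x8: return MEM_DEVICE;           /* TEX=010 C=0 B=0, non-shareable */
    default:  return MEM_RESERVED;
    }
}

/* Only Normal memory may be read speculatively by the processor. */
int permits_speculative_reads(uint32_t attr)
{
    return classify_section(attr) == MEM_NORMAL;
}

/* Illustrative self-check using assumed example attribute values. */
void check_examples(void)
{
    assert(permits_speculative_reads(0xc0au));   /* C=1, B=0: Normal */
    assert(!permits_speculative_reads(0xc06u));  /* C=0, B=1: Device */
}
```

Any entry for which permits_speculative_reads() returns non-zero is a candidate for the stalls described above if it maps a slow or unused peripheral.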
To resolve this issue, consider changing these memory ranges to Device-type memory.
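As one possible alternative to editing translation_table.S, the ranges could be remapped at run time with the BSP function Xil_SetTlbAttributes() from xil_mmu.h. The sketch below assumes a Cortex-A9 target with 1 MB sections, an attribute value of 0xC06 for shareable Device memory, and the linear QSPI window at 0xFC000000 to 0xFDFFFFFF. The USE_XIL_BSP macro and the helper names are hypothetical; without that macro the code uses a host-side stand-in so the loop logic can be tried off-target. Verify the addresses and attribute value against your device documentation and BSP version.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_XIL_BSP                 /* hypothetical macro: define on target */
#include "xil_mmu.h"               /* declares Xil_SetTlbAttributes() */
#define set_section_attr(addr, attr) Xil_SetTlbAttributes((addr), (attr))
#else
/* Host-side stand-in that only records what would have been remapped. */
static uint32_t remap_calls;
static uint32_t last_addr;
static uint32_t last_attr;
static void set_section_attr(uint32_t addr, uint32_t attr)
{
    remap_calls++;
    last_addr = addr;
    last_attr = attr;
}
#endif

#define SECTION_SIZE       0x100000u  /* one short-descriptor section: 1 MB */
#define ATTR_DEVICE_SHARED 0xC06u     /* assumed: AP=b11, TEX=b000, C=0, B=1 */

/* Mark every 1 MB section in [base, base + size) as Device memory.
 * Returns the number of sections that were updated. */
uint32_t mark_range_as_device(uint32_t base, uint32_t size)
{
    uint32_t sections = 0;

    assert((base % SECTION_SIZE) == 0u);  /* must start on a section boundary */
    assert((size % SECTION_SIZE) == 0u);  /* must cover whole sections */

    for (uint32_t off = 0; off < size; off += SECTION_SIZE) {
        set_section_attr(base + off, ATTR_DEVICE_SHARED);
        sections++;
    }
    return sections;
}
```

For example, calling mark_range_as_device(0xFC000000u, 0x02000000u) early in main(), before any code relies on execute-in-place, would cover the assumed linear QSPI window. Do not apply this to a range that the application actually executes from.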
For more information, see (Xilinx Answer 52486). As of this writing, the Zynq UltraScale+ translation table marks these regions as Device ranges, and so is currently unaffected.