Under certain timing circumstances, a data or unified cache line maintenance operation by MVA that targets an Inner Shareable memory region can fail to proceed up to either the Point of Coherency or the Point of Unification of the system. This is likely to affect self-modifying code. For this problem to occur, both processors must be operating in SMP mode with broadcasting of CP15 maintenance operations enabled. This issue has a known work-around.
Follow the recommended procedure; see Work-around Details below for more information.
Systems that use one or both ARM processors in SMP mode.
Device Revision(s) Affected: All, no plan to fix. Refer to (Xilinx Answer 47916) - Design Advisory Master Answer Record for Zynq-7000 Devices.
Under certain timing circumstances, a data or unified cache line maintenance operation by MVA targeting an Inner Shareable memory region can fail to proceed up to either the Point of Coherency or to the Point of Unification of the system. This is likely to affect self-modifying code.
For this problem to occur, both processors must be operating in SMP mode with broadcasting of CP15 maintenance operations enabled.
The following scenario illustrates how the issue can happen:
One CPU performs a data or unified cache line maintenance operation by MVA targeting a memory region which is locally dirty.
A second CPU issues a memory request targeting this same memory location within the same time frame.
A race condition can occur, resulting in the cache operation not being performed up to the specified Point, either Coherency or Unification.
The following maintenance operations are affected:
DCIMVAC: Invalidate data or unified cache line by MVA to PoC
DCCMVAC: Clean data or unified cache line by MVA to PoC
DCCMVAU: Clean data or unified cache line by MVA to PoU
DCCIMVAC: Clean and invalidate data or unified cache line by MVA to PoC
The issue can arise when the second CPU is performing any of the following:
A read request resulting from any Load instruction; the Load can be a speculative one.
A write request resulting from any Store instruction.
A data Pre-fetch resulting from a PLD instruction; the PLD can be a speculative one.
Since the cache maintenance operation is not guaranteed to be performed to either the Point of Unification or the Point of Coherency, stale data can remain in the data cache and fail to become visible to other cache agents that should observe it.
As such, self-modifying code can fail, because the new code sequence written into the Data Cache has not been made visible to the Instruction Cache.
Note that the data remains coherent on the L1 Data side. Any data read by the other processor in the Cortex-A9 MPCore cluster, or through the ACP, would see the correct data. Likewise, any write from another processor in the Cortex-A9 MPCore cluster, or from the ACP, to the same cache line will not cause data corruption or loss of data.
Note that false sharing on a memory region used for self-modifying code is extremely unlikely. As such, a write operation targeting the same cache line as the cache maintenance operation, within the timing window required to trigger this issue, might not represent a real case. The trigger in the self-modifying-code case is therefore probably restricted to read operations resulting from either a speculative load or a blind PLD instruction.
In addition, production of data for an agent external to the coherency domain can fail; in particular, the data targeted by the cache maintenance operation might not have been made visible to an external DMA engine when the operation completes. Again, false sharing on a memory region also accessed by an external agent such as a DMA engine is extremely unlikely. As such, the trigger when producing data for an external DMA agent is probably restricted to read operations resulting from either a speculative load or a blind PLD instruction.
To work around this issue, ARM recommends implementing the following:
Ensure there is no false sharing (at cache line size alignment) of memory regions used either for self-modifying code or for data to be cleaned to an external agent such as a DMA engine.
Set bit 0 in the undocumented SCU diagnostic control register located at offset 0x30 from the PERIPHBASE address. Setting this bit disables the migratory bit feature, which forces a dirty cache line to be evicted to the lower memory subsystem (both the Point of Coherency and the Point of Unification) when it is read by another processor.
Insert a DSB instruction immediately before the cache maintenance operation. Note that if the cache maintenance operation is executed within a loop with no other memory operations, ARM recommends adding only a single DSB prior to entering the loop.
This reduces the probability of the issue occurring, but does not eliminate it. Therefore, Xilinx proposes another work-around, under which the issue has never been observed. In this work-around, a double-flush operation is executed:
STR > DSB > DCC_MVA > DSB > DCC_MVA