From mboxrd@z Thu Jan 1 00:00:00 1970 From: will.deacon@arm.com (Will Deacon) Date: Thu, 13 Sep 2018 14:17:12 +0100 Subject: Clarifying dma_wmb behavior in presence of non-coherent masters and outer caches In-Reply-To: <72f67412-a011-6552-03be-6730a9302ebb@pengutronix.de> References: <1530275290.22468.69.camel@pengutronix.de> <20180629142539.GH17271@n2100.armlinux.org.uk> <20180629162248.GB20010@arm.com> <20180629164801.GJ17271@n2100.armlinux.org.uk> <20180629174321.GA20909@arm.com> <1530539366.22468.89.camel@pengutronix.de> <20180702174552.GD23687@arm.com> <20180706122610.GA6993@arm.com> <72f67412-a011-6552-03be-6730a9302ebb@pengutronix.de> Message-ID: <20180913131711.GB13242@arm.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Mon, Jul 09, 2018 at 08:20:22AM +0200, Oleksij Rempel wrote: > On 06.07.2018 14:26, Will Deacon wrote: > > On Mon, Jul 02, 2018 at 06:45:52PM +0100, Will Deacon wrote: > >> On Mon, Jul 02, 2018 at 03:49:26PM +0200, Lucas Stach wrote: > >>> Also the section "External caches", "Before AMBA4 ACE" seems to suggest > >>> that the barriers aren't fully propagated to the PL310 write-caches and > >>> master ports. I'm unsure if this is just an artifact of the mentioned > >>> MMIO access, so handling of normal vs. device memory transactions in > >>> the PL310. > >> > >> I'll check with the hardware folks about the PL310, but I don't see how > >> you could propagate barriers over AXI3 anyway because I don't think this > >> stuff existed back then. > > > > Damn. As I feared, you can't propagate the barrier to the PL310, so the > > A9 is unable to enforce ordering beyond the responses it gets back. The > > PL310 will then re-order normal memory accesses (including non-cacheable) > > in its store buffer, so we do actually need a bloody CACHE SYNC here. > > > > Patch below. Does it help? > > Suddenly no. I just noticed that I'm still carrying this patch in my tree. Are you saying it doesn't help on your system? It all went quiet, so I assume you fixed this issue one way or the other? Will > > From 5a2769b45e94b212d21bc44f5940ed5f3bde5cc2 Mon Sep 17 00:00:00 2001 > > From: Will Deacon > > Date: Wed, 4 Jul 2018 10:56:22 +0100 > > Subject: [PATCH] ARM: barrier: Kick outer-cache and SoC fabric in dma_wmb() > > > > Prior to AMBA4-ACE, the AXI bus protocol had no support for barrier > > transactions or shareability domains and was therefore unable to enforce > > ordering between memory accesses either side of a barrier instruction, > > requiring the CPU to handle all of the ordering itself. > > > > Unfortunately, IP such as the venerable PL310 will re-order stores to > > Normal memory inside its store-buffer, meaning that accesses to a coherent > > DMA buffer (i.e. one allocated by dma_alloc_coherent()) can be observed > > out-of-order by a non-coherent DMA engine unless the PL310 is kicked with > > a CACHE_SYNC command between buffering the accesses. Whilst this is already > > the case for mb() and wmb(), dma_wmb() just expands to a DMB OSHST, leading > > to sequences such as: > > > > STR Rdata, [Rbuf, #DATA_OFFSET] > > DMB OSHST > > STR Rvalid, [Rbuf, #VALID_OFFSET] > > > > being reordered such that the DMA device sees the write of valid but not > > the write of data. > > > > This patch ensures that the SoC fabric and the outer-cache are kicked > > by dma_wmb() if they are able to buffer transactions outside of the CPU > > but before the Point of Coherency. > > > > Reported-by: Lucas Stach > > Signed-off-by: Will Deacon > > --- > > arch/arm/include/asm/barrier.h | 14 ++++++++++---- > > arch/arm/include/asm/outercache.h | 6 ++++++ > > arch/arm/mm/flush.c | 24 +++++++++++++++++++++++- > > 3 files changed, 39 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h > > index 69772e742a0a..e31833f72b69 100644 > > --- a/arch/arm/include/asm/barrier.h > > +++ b/arch/arm/include/asm/barrier.h > > @@ -53,17 +53,23 @@ > > #ifdef CONFIG_ARM_HEAVY_MB > > extern void (*soc_mb)(void); > > extern void arm_heavy_mb(void); > > -#define __arm_heavy_mb(x...) do { dsb(x); arm_heavy_mb(); } while (0) > > +extern void arm_heavy_wmb(void); > > +extern void arm_heavy_dma_wmb(void); > > +#define __arm_heavy_mb() arm_heavy_mb() > > +#define __arm_heavy_wmb() arm_heavy_wmb() > > +#define __arm_heavy_dma_wmb() arm_heavy_dma_wmb() > > #else > > -#define __arm_heavy_mb(x...) dsb(x) > > +#define __arm_heavy_mb() dsb() > > +#define __arm_heavy_wmb() dsb(st) > > +#define __arm_heavy_dma_wmb() dmb(oshst) > > #endif > > > > #if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) > > #define mb() __arm_heavy_mb() > > #define rmb() dsb() > > -#define wmb() __arm_heavy_mb(st) > > +#define wmb() __arm_heavy_wmb() > > #define dma_rmb() dmb(osh) > > -#define dma_wmb() dmb(oshst) > > +#define dma_wmb() __arm_heavy_dma_wmb() > > #else > > #define mb() barrier() > > #define rmb() barrier() > > diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h > > index c2bf24f40177..e744efe6e2a7 100644 > > --- a/arch/arm/include/asm/outercache.h > > +++ b/arch/arm/include/asm/outercache.h > > @@ -43,6 +43,12 @@ struct outer_cache_fns { > > > > extern struct outer_cache_fns outer_cache; > > > > +#ifdef CONFIG_OUTER_CACHE_SYNC > > +static inline bool outer_cache_has_sync(void) { return outer_cache.sync; } > > +#else > > +static inline bool outer_cache_has_sync(void) { return 0; } > > +#endif > > + > > #ifdef CONFIG_OUTER_CACHE > > /** > > * outer_inv_range - invalidate range of outer cache lines > > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c > > index 58469623b015..1252689362d8 100644 > > --- a/arch/arm/mm/flush.c > > +++ b/arch/arm/mm/flush.c > > @@ -24,7 +24,7 @@ > > #ifdef CONFIG_ARM_HEAVY_MB > > void (*soc_mb)(void); > > > > -void arm_heavy_mb(void) > > +static void __arm_heavy_sync(void) > > { > > #ifdef CONFIG_OUTER_CACHE_SYNC > > if (outer_cache.sync) > > @@ -33,7 +33,29 @@ void arm_heavy_mb(void) > > if (soc_mb) > > soc_mb(); > > } > > + > > +void arm_heavy_mb(void) > > +{ > > + dsb(); > > + __arm_heavy_sync(); > > +} > > EXPORT_SYMBOL(arm_heavy_mb); > > + > > +void arm_heavy_wmb(void) > > +{ > > + dsb(st); > > + __arm_heavy_sync(); > > +} > > +EXPORT_SYMBOL(arm_heavy_wmb); > > + > > +void arm_heavy_dma_wmb(void) > > +{ > > + if (outer_cache_has_sync() || soc_mb) > > + arm_heavy_wmb(); > > + else > > + dmb(oshst); > > +} > > +EXPORT_SYMBOL(arm_heavy_dma_wmb); > > #endif > > > > #ifdef CONFIG_CPU_CACHE_VIPT > > >