From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Clarifying dma_wmb behavior in presence of non-coherent masters and outer caches
Date: Thu, 13 Sep 2018 14:17:12 +0100
Message-ID: <20180913131711.GB13242@arm.com>
In-Reply-To: <72f67412-a011-6552-03be-6730a9302ebb@pengutronix.de>
On Mon, Jul 09, 2018 at 08:20:22AM +0200, Oleksij Rempel wrote:
> On 06.07.2018 14:26, Will Deacon wrote:
> > On Mon, Jul 02, 2018 at 06:45:52PM +0100, Will Deacon wrote:
> >> On Mon, Jul 02, 2018 at 03:49:26PM +0200, Lucas Stach wrote:
> >>> Also the section "External caches", "Before AMBA4 ACE" seems to suggest
> >>> that the barriers aren't fully propagated to the PL310 write-caches and
> >>> master ports. I'm unsure if this is just an artifact of the mentioned
> >>> MMIO access, so handling of normal vs. device memory transactions in
> >>> the PL310.
> >>
> >> I'll check with the hardware folks about the PL310, but I don't see how
> >> you could propagate barriers over AXI3 anyway because I don't think this
> >> stuff existed back then.
> >
> > Damn. As I feared, you can't propagate the barrier to the PL310, so the
> > A9 is unable to enforce ordering beyond the responses it gets back. The
> > PL310 will then re-order normal memory accesses (including non-cacheable)
> > in its store buffer, so we do actually need a bloody CACHE SYNC here.
> >
> > Patch below. Does it help?
>
> Unfortunately, no.
I just noticed that I'm still carrying this patch in my tree. Are you saying
it doesn't help on your system? It all went quiet, so I assume you fixed
this issue one way or the other?
Will
> > From 5a2769b45e94b212d21bc44f5940ed5f3bde5cc2 Mon Sep 17 00:00:00 2001
> > From: Will Deacon <will.deacon@arm.com>
> > Date: Wed, 4 Jul 2018 10:56:22 +0100
> > Subject: [PATCH] ARM: barrier: Kick outer-cache and SoC fabric in dma_wmb()
> >
> > Prior to AMBA4-ACE, the AXI bus protocol had no support for barrier
> > transactions or shareability domains and was therefore unable to enforce
> > ordering between memory accesses on either side of a barrier instruction,
> > requiring the CPU to handle all of the ordering itself.
> >
> > Unfortunately, IP such as the venerable PL310 will re-order stores to
> > Normal memory inside its store-buffer, meaning that accesses to a coherent
> > DMA buffer (i.e. one allocated by dma_alloc_coherent()) can be observed
> > out-of-order by a non-coherent DMA engine unless the PL310 is kicked with
> > a CACHE_SYNC command in between the accesses. Whilst this is already
> > the case for mb() and wmb(), dma_wmb() just expands to a DMB OSHST, leading
> > to sequences such as:
> >
> > STR Rdata, [Rbuf, #DATA_OFFSET]
> > DMB OSHST
> > STR Rvalid, [Rbuf, #VALID_OFFSET]
> >
> > being reordered such that the DMA device sees the write of valid but not
> > the write of data.
> >
> > This patch ensures that the SoC fabric and the outer-cache are kicked
> > by dma_wmb() if they are able to buffer transactions outside of the CPU
> > but before the Point of Coherency.
> >
> > Reported-by: Lucas Stach <l.stach@pengutronix.de>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> > arch/arm/include/asm/barrier.h | 14 ++++++++++----
> > arch/arm/include/asm/outercache.h | 6 ++++++
> > arch/arm/mm/flush.c | 24 +++++++++++++++++++++++-
> > 3 files changed, 39 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
> > index 69772e742a0a..e31833f72b69 100644
> > --- a/arch/arm/include/asm/barrier.h
> > +++ b/arch/arm/include/asm/barrier.h
> > @@ -53,17 +53,23 @@
> > #ifdef CONFIG_ARM_HEAVY_MB
> > extern void (*soc_mb)(void);
> > extern void arm_heavy_mb(void);
> > -#define __arm_heavy_mb(x...) do { dsb(x); arm_heavy_mb(); } while (0)
> > +extern void arm_heavy_wmb(void);
> > +extern void arm_heavy_dma_wmb(void);
> > +#define __arm_heavy_mb() arm_heavy_mb()
> > +#define __arm_heavy_wmb() arm_heavy_wmb()
> > +#define __arm_heavy_dma_wmb() arm_heavy_dma_wmb()
> > #else
> > -#define __arm_heavy_mb(x...) dsb(x)
> > +#define __arm_heavy_mb() dsb()
> > +#define __arm_heavy_wmb() dsb(st)
> > +#define __arm_heavy_dma_wmb() dmb(oshst)
> > #endif
> >
> > #if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
> > #define mb() __arm_heavy_mb()
> > #define rmb() dsb()
> > -#define wmb() __arm_heavy_mb(st)
> > +#define wmb() __arm_heavy_wmb()
> > #define dma_rmb() dmb(osh)
> > -#define dma_wmb() dmb(oshst)
> > +#define dma_wmb() __arm_heavy_dma_wmb()
> > #else
> > #define mb() barrier()
> > #define rmb() barrier()
> > diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
> > index c2bf24f40177..e744efe6e2a7 100644
> > --- a/arch/arm/include/asm/outercache.h
> > +++ b/arch/arm/include/asm/outercache.h
> > @@ -43,6 +43,12 @@ struct outer_cache_fns {
> >
> > extern struct outer_cache_fns outer_cache;
> >
> > +#ifdef CONFIG_OUTER_CACHE_SYNC
> > +static inline bool outer_cache_has_sync(void) { return outer_cache.sync; }
> > +#else
> > +static inline bool outer_cache_has_sync(void) { return false; }
> > +#endif
> > +
> > #ifdef CONFIG_OUTER_CACHE
> > /**
> > * outer_inv_range - invalidate range of outer cache lines
> > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> > index 58469623b015..1252689362d8 100644
> > --- a/arch/arm/mm/flush.c
> > +++ b/arch/arm/mm/flush.c
> > @@ -24,7 +24,7 @@
> > #ifdef CONFIG_ARM_HEAVY_MB
> > void (*soc_mb)(void);
> >
> > -void arm_heavy_mb(void)
> > +static void __arm_heavy_sync(void)
> > {
> > #ifdef CONFIG_OUTER_CACHE_SYNC
> > if (outer_cache.sync)
> > @@ -33,7 +33,29 @@ void arm_heavy_mb(void)
> > if (soc_mb)
> > soc_mb();
> > }
> > +
> > +void arm_heavy_mb(void)
> > +{
> > + dsb();
> > + __arm_heavy_sync();
> > +}
> > EXPORT_SYMBOL(arm_heavy_mb);
> > +
> > +void arm_heavy_wmb(void)
> > +{
> > + dsb(st);
> > + __arm_heavy_sync();
> > +}
> > +EXPORT_SYMBOL(arm_heavy_wmb);
> > +
> > +void arm_heavy_dma_wmb(void)
> > +{
> > + if (outer_cache_has_sync() || soc_mb)
> > + arm_heavy_wmb();
> > + else
> > + dmb(oshst);
> > +}
> > +EXPORT_SYMBOL(arm_heavy_dma_wmb);
> > #endif
> >
> > #ifdef CONFIG_CPU_CACHE_VIPT
> >
>