From: Will Deacon <will.deacon@arm.com>
To: Alexander Duyck <alexander.h.duyck@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mathieu.desnoyers@polymtl.ca" <mathieu.desnoyers@polymtl.ca>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"heiko.carstens@de.ibm.com" <heiko.carstens@de.ibm.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"mikey@neuling.org" <mikey@neuling.org>,
	"linux@arm.linux.org.uk" <linux@arm.linux.org.uk>,
	"donald.c.skidmore@intel.com" <donald.c.skidmore@intel.com>,
	"matthew.vick@intel.com" <matthew.vick@intel.com>,
	"geert@linux-m68k.org" <geert@linux-m68k.org>,
	"jeffrey.t.kirsher@intel.com" <jeffrey.t.kirsher@intel.com>,
	"romieu@fr.zoreil.com" <romieu@fr.zoreil.com>,
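	"paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>,
	"nic_swsd@realtek.com" <nic_swsd@realtek.com>,
	"michael@ellerman.id.au" <michael@ellerman.id.au>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"oleg@redhat.com" <oleg@redhat.com>,
	"schwidefsky@de.ibm.com" <schwidefsky@de.ibm.com>,
	"fweisbec@gmail.com" <fweisbec@gmail.com>,
	"davem@davemloft.net" <davem@davemloft.net>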
Subject: Re: [PATCH 2/4] arch: Add lightweight memory barriers fast_rmb() and fast_wmb()
Date: Tue, 18 Nov 2014 11:58:36 +0000
Message-ID: <20141118115836.GL18842@arm.com>
In-Reply-To: <546AB959.1020602@redhat.com>

On Tue, Nov 18, 2014 at 03:13:29AM +0000, Alexander Duyck wrote:
> On 11/17/2014 04:39 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-11-17 at 12:24 -0800, Alexander Duyck wrote:
> >> Yes and no.  So for example on ARM I used the dmb() operation, however I
> >> have to use the barrier at the system level instead of just the inner
> >> shared domain.  However on many other architectures they are just the
> >> same as the smp_* variants.
> >>
> >> Basically the resultant code is somewhere between the smp and non-smp
> >> barriers in terms of what they cover.
> > There I don't quite follow you. You need to explain better especially in
> > the documentation because otherwise people will get it wrong...
> >
> > If it's ordering in the coherent domain, I fail to see how a DMA agent
> > is different than another processor when it comes to barriers, so I fail
> > to see the difference with smp_*
> >
> > I understand the MMIO vs. memory issue, we do have the same on powerpc,
> > but that other aspect eludes me.
> >
> 
> ARM adds some funky things.  They have two different types of 
> primitives, a dmb() which is a data memory barrier, and a dsb() which is 
> a data synchronization barrier.  Then with each of those they have the 
> "domains" the barriers are effective within.
> 
> So for example on ARM a rmb() is dsb(sy) which means it is a system wide 
> synchronization barrier which stops execution on the CPU core until the 
> read completes.  However the smp_rmb() is a dmb(ish) which means it is 
> only a barrier as far as the inner shareable domain which I believe only 
> goes as far as the local shared cache hierarchy and only guarantees read 
> ordering without necessarily halting the CPU or stopping in-order 
> speculative reads.  So what a coherent_rmb() would be in my setup is
> dmb(sy) which means the barrier runs all the way out to memory, and it
> is allowed to read speculatively as long as it does so in order.
> 
> If it is still unclear you might check out Will Deacon's talk on the 
> topic at https://www.youtube.com/watch?v=6ORn6_35kKo, at about 7:00 in 
> he explains the whole domains thing, and at 13:30 he explains dmb()/dsb().

So actually, this is an interesting case where the barrier would like to
know whether the memory returned by dma_alloc_coherent is h/w coherent
(normal, cacheable) or s/w coherent (normal, non-cacheable). I think Ben
is thinking of the h/w coherent case (i.e. actual snooping into the CPU
caches by the DMA master).

For the former, we could use inner-shareable barriers. For the latter, we'd
need to use outer-shareable barriers.

If we can't tell, then these should be dmb(osh), which will work for both.
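
As a rough illustration of the choice above (a sketch only, not code from
the patch series): the enum, dmb_option_for() and coherent_dmb() are
hypothetical names invented for this example, and the non-ARM fallback is
an assumption so that the snippet builds on other hosts.

```c
/*
 * Sketch only: which DMB shareability domain is sufficient for memory
 * returned by dma_alloc_coherent(), depending on how it is kept coherent.
 * All names below are hypothetical, not kernel API.
 */
enum dma_coherency {
	DMA_HW_COHERENT,	/* normal, cacheable: DMA master snoops CPU caches */
	DMA_SW_COHERENT,	/* normal, non-cacheable */
	DMA_COHERENCY_UNKNOWN,	/* cannot tell which of the two applies */
};

/* Return the DMB option that is sufficient for the given case. */
static const char *dmb_option_for(enum dma_coherency c)
{
	switch (c) {
	case DMA_HW_COHERENT:
		return "ish";	/* inner-shareable is enough */
	case DMA_SW_COHERENT:
	case DMA_COHERENCY_UNKNOWN:
	default:
		return "osh";	/* outer-shareable works for both */
	}
}

/*
 * Conservative barrier for the "can't tell" case: dmb osh on ARMv7+/arm64.
 * Elsewhere, fall back to a full fence so the sketch still compiles
 * (assumption; the fallback is not equivalent to an ARM dmb).
 */
#if defined(__aarch64__) || \
	(defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
#define coherent_dmb()	__asm__ __volatile__("dmb osh" ::: "memory")
#else
#define coherent_dmb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif
```

A driver-side read barrier built this way would sit between the check of
a descriptor's ownership bit and the reads of the rest of the descriptor.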

Will
