* memory barriers in rte_ring
From: Olivier MATZ @ 2014-03-27 16:48 UTC
To: dev-VfR2kkLFssw@public.gmane.org

Hi,

The commit 286bd05bf7 [1] removed the memory barriers in the ring
functions. This patch has been present in DPDK since version 1.4.0r0,
so I guess it does not cause any issue.

But after checking the excellent Linux kernel documentation about memory
barriers [2], I'm wondering why memory barriers would not be required in
that case.

To illustrate the previous behavior (before dpdk 1.4):

ring_enqueue()
  - move producer_head to reserve space in ring (atomically if
    multi producers)
  - write objects between producer_head and producer_tail
  - wmb() to ensure that STORE operations are issued
  - write producer_tail

ring_dequeue()
  - move consumer_head (atomically if multi consumers)
  - rmb() to ensure that LOAD operations are issued: the read of
    consumer_head must occur before the reading of objects ptrs.
    In fact, rmb() is probably not needed here because knowing the
    value of consumer_head is required before reading the objects
    table.
  - read objects between consumer_head and consumer_tail
  - write consumer_tail

The memory barriers have been removed, but in my understanding at least
the wmb() would be needed according to the generic memory barrier
documentation. Maybe this is not needed on the newest Intel processors?
Could anyone from Intel enlighten me on this?

Thanks & regards,
Olivier

[1] http://dpdk.org/browse/dpdk/commit/lib/librte_ring/rte_ring.h?id=286bd05bf70d1da1b6017007276c267a1e012c1d
[2] http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt
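[Editor's sketch] The pre-1.4 ordering described in the message above can
be illustrated with a single-producer/single-consumer ring. This is a
minimal sketch, not the DPDK code: every demo_* name and the ring size are
hypothetical, and C11 fences stand in for wmb()/rmb() (a release fence is
somewhat stronger than a pure store barrier). In the single-producer case
the "reserve" step reduces to reading the current index. The acquire fence
is placed between the read of the producer index and the read of the
object slot, which is the pairing the release fence needs; that placement
is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_RING_SIZE 8u                     /* hypothetical; power of two */
#define DEMO_RING_MASK (DEMO_RING_SIZE - 1u)

struct demo_ring {
    atomic_uint prod_tail;                    /* published by the producer */
    atomic_uint cons_tail;                    /* published by the consumer */
    void *objs[DEMO_RING_SIZE];
};

/* Single-producer enqueue: returns 0 on success, -1 if the ring is full. */
static int demo_enqueue(struct demo_ring *r, void *obj)
{
    unsigned head = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
    /* Acquire: the consumer must be done with a slot before we reuse it. */
    unsigned cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

    if (head - cons == DEMO_RING_SIZE)
        return -1;

    r->objs[head & DEMO_RING_MASK] = obj;          /* write the object     */
    atomic_thread_fence(memory_order_release);     /* stands in for wmb()  */
    atomic_store_explicit(&r->prod_tail, head + 1, /* write producer tail  */
                          memory_order_relaxed);
    return 0;
}

/* Single-consumer dequeue: returns 0 on success, -1 if the ring is empty. */
static int demo_dequeue(struct demo_ring *r, void **obj)
{
    assert(obj != NULL);

    unsigned head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
    unsigned prod = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

    if (prod == head)
        return -1;

    atomic_thread_fence(memory_order_acquire);     /* stands in for rmb()  */
    *obj = r->objs[head & DEMO_RING_MASK];         /* read the object      */
    atomic_store_explicit(&r->cons_tail, head + 1, /* write consumer tail  */
                          memory_order_release);
    return 0;
}
```

Without the release fence, a weakly ordered CPU could make the new
prod_tail visible before the object pointer it covers, which is the
concern raised about removing wmb().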
* Re: memory barriers in rte_ring
From: Stephen Hemminger @ 2014-03-27 19:06 UTC
To: Olivier MATZ; +Cc: dev-VfR2kkLFssw@public.gmane.org

On Thu, 27 Mar 2014 17:48:21 +0100
Olivier MATZ <olivier.matz-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:

> The memory barriers have been removed, but in my understanding at least
> the wmb() would be needed according to the generic memory barrier
> documentation. Maybe this is not needed on newest Intel processors?
> Could anyone from Intel enlight me on this?

Short answer: only a compiler barrier is necessary.

Long answer: for a ring accessed by multiple CPUs, the barriers are
equivalent to smp_wmb and smp_rmb in the Linux kernel. On x86, where DPDK
is used, these can normally be replaced by a simpler compiler barrier.
The kernel has a special flag, X86_OOSTORE, which is enabled only for a
few special cases; in most cases it is not set. When the CPU does not
perform out-of-order stores, there is no case where another CPU will see
a wrong state.
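[Editor's sketch] The compiler-barrier approach described in the reply
above can be sketched as follows. This assumes GCC or Clang (the empty
asm statement with a "memory" clobber is their idiom) and an x86 target
using ordinary write-back memory; the demo_* names are hypothetical. It
relies on the hardware keeping stores in order with other stores and
loads in order with other loads, so it is not portable to weakly ordered
CPUs, and strictly speaking ISO C treats this sharing pattern as a data
race, which is why portable code would use atomics instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits no instruction; only stops the compiler from moving memory
 * accesses across this point (GCC/Clang idiom). */
#define demo_compiler_barrier() __asm__ __volatile__("" ::: "memory")

struct demo_slot {
    uint32_t payload;
    volatile uint32_t ready;      /* hypothetical tail-like publish flag */
};

/* Producer side: the payload store must not sink below the flag store. */
static void demo_publish(struct demo_slot *s, uint32_t value)
{
    s->payload = value;
    demo_compiler_barrier();      /* takes the place of wmb() on x86 */
    s->ready = 1;
}

/* Consumer side: returns 1 and fills *value once the flag is set. */
static int demo_consume(struct demo_slot *s, uint32_t *value)
{
    assert(value != NULL);

    if (!s->ready)
        return 0;
    demo_compiler_barrier();      /* takes the place of rmb() on x86 */
    *value = s->payload;
    return 1;
}
```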
* Re: memory barriers in rte_ring
From: Olivier MATZ @ 2014-03-27 19:47 UTC
To: Stephen Hemminger; +Cc: dev-VfR2kkLFssw@public.gmane.org

Hi Stephen,

On 03/27/2014 08:06 PM, Stephen Hemminger wrote:
> Long answer: for the multiple CPU access ring, it is equivalent to smp_wmb and smp_rmb
> in Linux kernel. For x86 where DPDK is used, this can normally be replaced by simpler
> compiler barrier. In kernel there is a special flag X86_OOSTORE which is only enabled
> for a few special cases, for most cases it is not. When cpu doesn't do out of order
> stores, there are no cases where other cpu will see wrong state.

Thank you for this clarification.

So, if I understand properly, all usages of rte_*mb() that sequence
memory operations between CPUs could be replaced by a compiler barrier.
On the other hand, if the memory is also accessed by a device, a memory
barrier has to be used.

Olivier
* Re: memory barriers in rte_ring
From: Stephen Hemminger @ 2014-03-27 20:20 UTC
To: Olivier MATZ; +Cc: dev-VfR2kkLFssw@public.gmane.org

On Thu, 27 Mar 2014 20:47:37 +0100
Olivier MATZ <olivier.matz-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> wrote:

> So, if I understand properly, all usages of rte_*mb() sequencing memory
> operations between CPUs could be replaced by a compiler barrier. On the
> other hand, if the memory is also accessed by a device, a memory
> barrier has to be used.

I think so, for the current architecture that DPDK runs on. It might be
good to abstract this in some way for eventual users in other
environments.
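[Editor's sketch] One possible shape for the abstraction suggested above
is a pair of macros that cost nothing beyond a compiler barrier on x86
and fall back to real fences elsewhere. The demo_smp_wmb/demo_smp_rmb
names and the mailbox type are hypothetical and are not DPDK API; the
fallback uses C11 release/acquire fences, which are a conservative
(stronger) mapping for store-store and load-load barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
/* Strongly ordered CPU: only the compiler has to be restrained. */
#define demo_smp_wmb() __asm__ __volatile__("" ::: "memory")
#define demo_smp_rmb() __asm__ __volatile__("" ::: "memory")
#else
/* Other CPUs: emit a real fence (conservative choice). */
#define demo_smp_wmb() atomic_thread_fence(memory_order_release)
#define demo_smp_rmb() atomic_thread_fence(memory_order_acquire)
#endif

struct demo_mailbox {
    int data;
    atomic_int posted;            /* set to 1 once data is valid */
};

/* Writer: store the data, order it, then raise the flag. */
static void demo_post(struct demo_mailbox *m, int value)
{
    m->data = value;
    demo_smp_wmb();
    atomic_store_explicit(&m->posted, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *value if something has been posted. */
static int demo_poll(struct demo_mailbox *m, int *value)
{
    assert(value != NULL);

    if (atomic_load_explicit(&m->posted, memory_order_relaxed) == 0)
        return 0;
    demo_smp_rmb();
    *value = m->data;
    return 1;
}
```

Callers then write demo_smp_wmb() where the algorithm needs store
ordering, and the per-architecture cost is decided in one place.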
* Re: memory barriers in rte_ring
From: Venkatesan, Venky @ 2014-03-27 23:53 UTC
To: Stephen Hemminger, Olivier MATZ; +Cc: dev-VfR2kkLFssw@public.gmane.org

One caveat: a compiler_barrier should be enough when both sides are using
strongly-ordered memory operations (as in the case of the rings). Weakly
ordered operations will still need fencing.

-Venky
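[Editor's sketch] As an illustration of the caveat above, non-temporal
(streaming) stores are one example of weakly ordered operations on x86:
they may become visible after later ordinary stores unless a store fence
separates them. The sketch below assumes that this is the kind of
operation meant; the demo_* names are hypothetical. On x86-64 it uses the
SSE2 intrinsic _mm_stream_si32 followed by _mm_sfence; on other targets
it falls back to a plain store and a C11 release fence so the file still
builds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__x86_64__)
#include <xmmintrin.h>            /* _mm_sfence */
#include <emmintrin.h>            /* _mm_stream_si32 (SSE2) */
#endif

static int demo_buf[4];
static atomic_int demo_nt_ready;

/* Producer: non-temporal store, real fence, then publish the flag. */
static void demo_publish_nt(int value)
{
#if defined(__x86_64__)
    _mm_stream_si32(&demo_buf[0], value);  /* weakly ordered store */
    _mm_sfence();          /* a compiler barrier alone is not enough here */
#else
    demo_buf[0] = value;
    atomic_thread_fence(memory_order_release);
#endif
    atomic_store_explicit(&demo_nt_ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *value once the flag is visible. */
static int demo_consume_nt(int *value)
{
    assert(value != NULL);

    if (atomic_load_explicit(&demo_nt_ready, memory_order_acquire) == 0)
        return 0;
    *value = demo_buf[0];
    return 1;
}
```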