From: Peter Zijlstra <peterz@infradead.org>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
linux-kernel@vger.kernel.org,
Linux PPC dev <linuxppc-dev@ozlabs.org>,
anton@samba.org, Victor Kaplansky <VICTORK@il.ibm.com>
Subject: Re: perf events ring buffer memory barrier on powerpc
Date: Fri, 25 Oct 2013 19:37:49 +0200 [thread overview]
Message-ID: <20131025173749.GG19466@laptop.lan> (raw)
In-Reply-To: <20131023141948.GB3566@localhost.localdomain>
On Wed, Oct 23, 2013 at 03:19:51PM +0100, Frederic Weisbecker wrote:
> On Wed, Oct 23, 2013 at 10:54:54AM +1100, Michael Neuling wrote:
> > Frederic,
> >
> > The comment says atomic_dec_and_test() but the code is
> > local_dec_and_test().
> >
> > On powerpc, local_dec_and_test() doesn't have a memory barrier but
> > atomic_dec_and_test() does. Is the comment wrong, or is
> > local_dec_and_test() suppose to imply a memory barrier too and we have
> > it wrongly implemented in powerpc?
My bad; I converted from atomic to local without actually thinking it
seems. Your implementation of the local primitives is fine.
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index cd55144..95768c6 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -87,10 +87,10 @@ again:
> > goto out;
> >
> > /*
> > - * Publish the known good head. Rely on the full barrier implied
> > - * by atomic_dec_and_test() order the rb->head read and this
> > - * write.
> > + * Publish the known good head. We need a memory barrier to order the
> > + * order the rb->head read and this write.
> > */
> > + smp_mb ();
> > rb->user_page->data_head = head;
> >
> > /*
Right; so that would indeed be what the comment suggests it should be.
However I think the comment is now actively wrong too :-)
Since on the kernel side the buffer is strictly per-cpu, we don't need
memory barriers there.
> I think we want this ordering:
>
> Kernel User
>
> READ rb->user_page->data_tail READ rb->user_page->data_head
> smp_mb() smp_mb()
> WRITE rb data READ rb data
> smp_mb() smp_mb()
> rb->user_page->data_head WRITE rb->user_page->data_tail
>
I would argue for:
READ ->data_tail READ ->data_head
smp_rmb() (A) smp_rmb() (C)
WRITE $data READ $data
smp_wmb() (B) smp_mb() (D)
STORE ->data_head WRITE ->data_tail
Where A pairs with D, and B pairs with C.
I don't think A needs to be a full barrier because we won't in fact
write data until we see the store from userspace. So we simply don't
issue the data WRITE until we observe it.
OTOH, D needs to be a full barrier since it separates the data READ from
the tail WRITE.
For B a WMB is sufficient since it separates two WRITEs, and for C an
RMB is sufficient since it separates two READs.
---
kernel/events/ring_buffer.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index cd55144270b5..c91274ef4e23 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -87,10 +87,31 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
goto out;
/*
- * Publish the known good head. Rely on the full barrier implied
- * by atomic_dec_and_test() order the rb->head read and this
- * write.
+ * Since the mmap() consumer (userspace) can run on a different CPU:
+ *
+ * kernel user
+ *
+ * READ ->data_tail READ ->data_head
+ * smp_rmb() (A) smp_rmb() (C)
+ * WRITE $data READ $data
+ * smp_wmb() (B) smp_mb() (D)
+ * STORE ->data_head WRITE ->data_tail
+ *
+ * Where A pairs with D, and B pairs with C.
+ *
+ * I don't think A needs to be a full barrier because we won't in fact
+ * write data until we see the store from userspace. So we simply don't
+ * issue the data WRITE until we observe it.
+ *
+ * OTOH, D needs to be a full barrier since it separates the data READ
+ * from the tail WRITE.
+ *
+ * For B a WMB is sufficient since it separates two WRITEs, and for C
+ * an RMB is sufficient since it separates two READs.
+ *
+ * See perf_output_begin().
*/
+ smp_wmb();
rb->user_page->data_head = head;
/*
@@ -154,6 +175,8 @@ int perf_output_begin(struct perf_output_handle *handle,
* Userspace could choose to issue a mb() before updating the
* tail pointer. So that all reads will be completed before the
* write is issued.
+ *
+ * See perf_output_put_handle().
*/
tail = ACCESS_ONCE(rb->user_page->data_tail);
smp_rmb();
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>,
benh@kernel.crashing.org, anton@samba.org,
linux-kernel@vger.kernel.org,
Linux PPC dev <linuxppc-dev@ozlabs.org>,
Victor Kaplansky <VICTORK@il.ibm.com>,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
michael@ellerman.id.au
Subject: Re: perf events ring buffer memory barrier on powerpc
Date: Fri, 25 Oct 2013 19:37:49 +0200 [thread overview]
Message-ID: <20131025173749.GG19466@laptop.lan> (raw)
In-Reply-To: <20131023141948.GB3566@localhost.localdomain>
On Wed, Oct 23, 2013 at 03:19:51PM +0100, Frederic Weisbecker wrote:
> On Wed, Oct 23, 2013 at 10:54:54AM +1100, Michael Neuling wrote:
> > Frederic,
> >
> > The comment says atomic_dec_and_test() but the code is
> > local_dec_and_test().
> >
> > On powerpc, local_dec_and_test() doesn't have a memory barrier but
> > atomic_dec_and_test() does. Is the comment wrong, or is
> > local_dec_and_test() suppose to imply a memory barrier too and we have
> > it wrongly implemented in powerpc?
My bad; I converted from atomic to local without actually thinking it
seems. Your implementation of the local primitives is fine.
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index cd55144..95768c6 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -87,10 +87,10 @@ again:
> > goto out;
> >
> > /*
> > - * Publish the known good head. Rely on the full barrier implied
> > - * by atomic_dec_and_test() order the rb->head read and this
> > - * write.
> > + * Publish the known good head. We need a memory barrier to order the
> > + * order the rb->head read and this write.
> > */
> > + smp_mb ();
> > rb->user_page->data_head = head;
> >
> > /*
Right; so that would indeed be what the comment suggests it should be.
However I think the comment is now actively wrong too :-)
Since on the kernel side the buffer is strictly per-cpu, we don't need
memory barriers there.
> I think we want this ordering:
>
> Kernel User
>
> READ rb->user_page->data_tail READ rb->user_page->data_head
> smp_mb() smp_mb()
> WRITE rb data READ rb data
> smp_mb() smp_mb()
> rb->user_page->data_head WRITE rb->user_page->data_tail
>
I would argue for:
READ ->data_tail READ ->data_head
smp_rmb() (A) smp_rmb() (C)
WRITE $data READ $data
smp_wmb() (B) smp_mb() (D)
STORE ->data_head WRITE ->data_tail
Where A pairs with D, and B pairs with C.
I don't think A needs to be a full barrier because we won't in fact
write data until we see the store from userspace. So we simply don't
issue the data WRITE until we observe it.
OTOH, D needs to be a full barrier since it separates the data READ from
the tail WRITE.
For B a WMB is sufficient since it separates two WRITEs, and for C an
RMB is sufficient since it separates two READs.
---
kernel/events/ring_buffer.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index cd55144270b5..c91274ef4e23 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -87,10 +87,31 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
goto out;
/*
- * Publish the known good head. Rely on the full barrier implied
- * by atomic_dec_and_test() order the rb->head read and this
- * write.
+ * Since the mmap() consumer (userspace) can run on a different CPU:
+ *
+ * kernel user
+ *
+ * READ ->data_tail READ ->data_head
+ * smp_rmb() (A) smp_rmb() (C)
+ * WRITE $data READ $data
+ * smp_wmb() (B) smp_mb() (D)
+ * STORE ->data_head WRITE ->data_tail
+ *
+ * Where A pairs with D, and B pairs with C.
+ *
+ * I don't think A needs to be a full barrier because we won't in fact
+ * write data until we see the store from userspace. So we simply don't
+ * issue the data WRITE until we observe it.
+ *
+ * OTOH, D needs to be a full barrier since it separates the data READ
+ * from the tail WRITE.
+ *
+ * For B a WMB is sufficient since it separates two WRITEs, and for C
+ * an RMB is sufficient since it separates two READs.
+ *
+ * See perf_output_begin().
*/
+ smp_wmb();
rb->user_page->data_head = head;
/*
@@ -154,6 +175,8 @@ int perf_output_begin(struct perf_output_handle *handle,
* Userspace could choose to issue a mb() before updating the
* tail pointer. So that all reads will be completed before the
* write is issued.
+ *
+ * See perf_output_put_handle().
*/
tail = ACCESS_ONCE(rb->user_page->data_tail);
smp_rmb();
next prev parent reply other threads:[~2013-10-25 17:38 UTC|newest]
Thread overview: 215+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-22 23:54 perf events ring buffer memory barrier on powerpc Michael Neuling
2013-10-23 7:39 ` Victor Kaplansky
2013-10-23 7:39 ` Victor Kaplansky
2013-10-23 14:19 ` Frederic Weisbecker
2013-10-23 14:19 ` Frederic Weisbecker
2013-10-23 14:25 ` Frederic Weisbecker
2013-10-23 14:25 ` Frederic Weisbecker
2013-10-25 17:37 ` Peter Zijlstra [this message]
2013-10-25 17:37 ` Peter Zijlstra
2013-10-25 20:31 ` Michael Neuling
2013-10-25 20:31 ` Michael Neuling
2013-10-27 9:00 ` Victor Kaplansky
2013-10-27 9:00 ` Victor Kaplansky
2013-10-28 9:22 ` Peter Zijlstra
2013-10-28 9:22 ` Peter Zijlstra
2013-10-28 10:02 ` Frederic Weisbecker
2013-10-28 10:02 ` Frederic Weisbecker
2013-10-28 12:38 ` Victor Kaplansky
2013-10-28 12:38 ` Victor Kaplansky
2013-10-28 13:26 ` Peter Zijlstra
2013-10-28 13:26 ` Peter Zijlstra
2013-10-28 16:34 ` Paul E. McKenney
2013-10-28 16:34 ` Paul E. McKenney
2013-10-28 20:17 ` Oleg Nesterov
2013-10-28 20:17 ` Oleg Nesterov
2013-10-28 20:58 ` Victor Kaplansky
2013-10-28 20:58 ` Victor Kaplansky
2013-10-29 10:21 ` Peter Zijlstra
2013-10-29 10:21 ` Peter Zijlstra
2013-10-29 10:30 ` Peter Zijlstra
2013-10-29 10:30 ` Peter Zijlstra
2013-10-29 10:35 ` Peter Zijlstra
2013-10-29 10:35 ` Peter Zijlstra
2013-10-29 20:15 ` Oleg Nesterov
2013-10-29 20:15 ` Oleg Nesterov
2013-10-29 19:27 ` Vince Weaver
2013-10-29 19:27 ` Vince Weaver
2013-10-30 10:42 ` Peter Zijlstra
2013-10-30 10:42 ` Peter Zijlstra
2013-10-30 11:48 ` James Hogan
2013-10-30 11:48 ` James Hogan
2013-10-30 12:48 ` Peter Zijlstra
2013-10-30 12:48 ` Peter Zijlstra
2013-11-06 13:19 ` [tip:perf/core] tools/perf: Add required memory barriers tip-bot for Peter Zijlstra
2013-11-06 13:50 ` Vince Weaver
2013-11-06 14:00 ` Peter Zijlstra
2013-11-06 14:28 ` Peter Zijlstra
2013-11-06 14:55 ` Vince Weaver
2013-11-06 15:10 ` Peter Zijlstra
2013-11-06 15:23 ` Peter Zijlstra
2013-11-06 14:44 ` Peter Zijlstra
2013-11-06 16:07 ` Peter Zijlstra
2013-11-06 17:31 ` Vince Weaver
2013-11-06 18:24 ` Peter Zijlstra
2013-11-07 8:21 ` Ingo Molnar
2013-11-07 14:27 ` Vince Weaver
2013-11-07 15:55 ` Ingo Molnar
2013-11-11 16:24 ` Peter Zijlstra
2013-11-11 21:10 ` Ingo Molnar
2013-10-29 21:23 ` perf events ring buffer memory barrier on powerpc Michael Neuling
2013-10-29 21:23 ` Michael Neuling
2013-10-30 9:27 ` Paul E. McKenney
2013-10-30 9:27 ` Paul E. McKenney
2013-10-30 11:25 ` Peter Zijlstra
2013-10-30 11:25 ` Peter Zijlstra
2013-10-30 14:52 ` Victor Kaplansky
2013-10-30 14:52 ` Victor Kaplansky
2013-10-30 15:39 ` Peter Zijlstra
2013-10-30 15:39 ` Peter Zijlstra
2013-10-30 17:14 ` Victor Kaplansky
2013-10-30 17:14 ` Victor Kaplansky
2013-10-30 17:44 ` Peter Zijlstra
2013-10-30 17:44 ` Peter Zijlstra
2013-10-31 6:16 ` Paul E. McKenney
2013-10-31 6:16 ` Paul E. McKenney
2013-11-01 13:12 ` Victor Kaplansky
2013-11-01 13:12 ` Victor Kaplansky
2013-11-02 16:36 ` Paul E. McKenney
2013-11-02 16:36 ` Paul E. McKenney
2013-11-02 17:26 ` Paul E. McKenney
2013-11-02 17:26 ` Paul E. McKenney
2013-10-31 6:40 ` Paul E. McKenney
2013-10-31 6:40 ` Paul E. McKenney
2013-11-01 14:25 ` Victor Kaplansky
2013-11-01 14:25 ` Victor Kaplansky
2013-11-02 17:28 ` Paul E. McKenney
2013-11-02 17:28 ` Paul E. McKenney
2013-11-01 14:56 ` Peter Zijlstra
2013-11-01 14:56 ` Peter Zijlstra
2013-11-02 17:32 ` Paul E. McKenney
2013-11-02 17:32 ` Paul E. McKenney
2013-11-03 14:40 ` Paul E. McKenney
2013-11-03 14:40 ` Paul E. McKenney
2013-11-03 15:17 ` [RFC] arch: Introduce new TSO memory barrier smp_tmb() Peter Zijlstra
2013-11-03 15:17 ` Peter Zijlstra
2013-11-03 18:08 ` Linus Torvalds
2013-11-03 18:08 ` Linus Torvalds
2013-11-03 20:01 ` Peter Zijlstra
2013-11-03 20:01 ` Peter Zijlstra
2013-11-03 22:42 ` Paul E. McKenney
2013-11-03 22:42 ` Paul E. McKenney
2013-11-03 23:34 ` Linus Torvalds
2013-11-03 23:34 ` Linus Torvalds
2013-11-04 10:51 ` Paul E. McKenney
2013-11-04 10:51 ` Paul E. McKenney
2013-11-04 11:22 ` Peter Zijlstra
2013-11-04 11:22 ` Peter Zijlstra
2013-11-04 16:27 ` Paul E. McKenney
2013-11-04 16:27 ` Paul E. McKenney
2013-11-04 16:48 ` Peter Zijlstra
2013-11-04 16:48 ` Peter Zijlstra
2013-11-04 19:11 ` Peter Zijlstra
2013-11-04 19:11 ` Peter Zijlstra
2013-11-04 19:18 ` Peter Zijlstra
2013-11-04 19:18 ` Peter Zijlstra
2013-11-04 20:54 ` Paul E. McKenney
2013-11-04 20:54 ` Paul E. McKenney
2013-11-04 20:53 ` Paul E. McKenney
2013-11-04 20:53 ` Paul E. McKenney
2013-11-05 14:05 ` Will Deacon
2013-11-05 14:05 ` Will Deacon
2013-11-05 14:49 ` Paul E. McKenney
2013-11-05 14:49 ` Paul E. McKenney
2013-11-05 18:49 ` Peter Zijlstra
2013-11-05 18:49 ` Peter Zijlstra
2013-11-06 11:00 ` Will Deacon
2013-11-06 11:00 ` Will Deacon
2013-11-06 12:39 ` Peter Zijlstra
2013-11-06 12:39 ` Peter Zijlstra
2013-11-06 12:51 ` Geert Uytterhoeven
2013-11-06 12:51 ` Geert Uytterhoeven
2013-11-06 13:57 ` Peter Zijlstra
2013-11-06 13:57 ` Peter Zijlstra
2013-11-06 18:48 ` Paul E. McKenney
2013-11-06 18:48 ` Paul E. McKenney
2013-11-06 19:42 ` Peter Zijlstra
2013-11-06 19:42 ` Peter Zijlstra
2013-11-07 11:17 ` Will Deacon
2013-11-07 11:17 ` Will Deacon
2013-11-07 13:36 ` Peter Zijlstra
2013-11-07 13:36 ` Peter Zijlstra
2013-11-07 23:50 ` Mathieu Desnoyers
2013-11-07 23:50 ` Mathieu Desnoyers
2013-11-04 11:05 ` Will Deacon
2013-11-04 11:05 ` Will Deacon
2013-11-04 16:34 ` Paul E. McKenney
2013-11-04 16:34 ` Paul E. McKenney
2013-11-03 20:59 ` Benjamin Herrenschmidt
2013-11-03 20:59 ` Benjamin Herrenschmidt
2013-11-03 22:43 ` Paul E. McKenney
2013-11-03 22:43 ` Paul E. McKenney
2013-11-03 17:07 ` perf events ring buffer memory barrier on powerpc Will Deacon
2013-11-03 22:47 ` Paul E. McKenney
2013-11-04 9:57 ` Will Deacon
2013-11-04 10:52 ` Paul E. McKenney
2013-11-01 16:11 ` Peter Zijlstra
2013-11-01 16:11 ` Peter Zijlstra
2013-11-02 17:46 ` Paul E. McKenney
2013-11-02 17:46 ` Paul E. McKenney
2013-11-01 16:18 ` Peter Zijlstra
2013-11-01 16:18 ` Peter Zijlstra
2013-11-02 17:49 ` Paul E. McKenney
2013-11-02 17:49 ` Paul E. McKenney
2013-10-30 13:28 ` Victor Kaplansky
2013-10-30 13:28 ` Victor Kaplansky
2013-10-30 15:51 ` Peter Zijlstra
2013-10-30 15:51 ` Peter Zijlstra
2013-10-30 18:29 ` Peter Zijlstra
2013-10-30 18:29 ` Peter Zijlstra
2013-10-30 19:11 ` Peter Zijlstra
2013-10-30 19:11 ` Peter Zijlstra
2013-10-31 4:33 ` Paul E. McKenney
2013-10-31 4:33 ` Paul E. McKenney
2013-10-31 4:32 ` Paul E. McKenney
2013-10-31 4:32 ` Paul E. McKenney
2013-10-31 9:04 ` Peter Zijlstra
2013-10-31 9:04 ` Peter Zijlstra
2013-10-31 15:07 ` Paul E. McKenney
2013-10-31 15:07 ` Paul E. McKenney
2013-10-31 15:19 ` Peter Zijlstra
2013-10-31 15:19 ` Peter Zijlstra
2013-11-01 9:28 ` Paul E. McKenney
2013-11-01 9:28 ` Paul E. McKenney
2013-11-01 10:30 ` Peter Zijlstra
2013-11-01 10:30 ` Peter Zijlstra
2013-11-02 15:20 ` Paul E. McKenney
2013-11-02 15:20 ` Paul E. McKenney
2013-11-04 9:07 ` Peter Zijlstra
2013-11-04 9:07 ` Peter Zijlstra
2013-11-04 10:00 ` Paul E. McKenney
2013-11-04 10:00 ` Paul E. McKenney
2013-10-31 9:59 ` Victor Kaplansky
2013-10-31 9:59 ` Victor Kaplansky
2013-10-31 12:28 ` David Laight
2013-10-31 12:28 ` David Laight
2013-10-31 12:55 ` Victor Kaplansky
2013-10-31 12:55 ` Victor Kaplansky
2013-10-31 15:25 ` Paul E. McKenney
2013-10-31 15:25 ` Paul E. McKenney
2013-11-01 16:06 ` Victor Kaplansky
2013-11-01 16:06 ` Victor Kaplansky
2013-11-01 16:25 ` David Laight
2013-11-01 16:25 ` David Laight
2013-11-01 16:30 ` Victor Kaplansky
2013-11-01 16:30 ` Victor Kaplansky
2013-11-03 20:57 ` Benjamin Herrenschmidt
2013-11-03 20:57 ` Benjamin Herrenschmidt
2013-11-02 15:46 ` Paul E. McKenney
2013-11-02 15:46 ` Paul E. McKenney
2013-10-28 19:09 ` Oleg Nesterov
2013-10-28 19:09 ` Oleg Nesterov
2013-10-29 14:06 ` [tip:perf/urgent] perf: Fix perf ring buffer memory ordering tip-bot for Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2014-05-08 20:46 perf events ring buffer memory barrier on powerpc Mikulas Patocka
[not found] ` <OF667059AA.7F151BCC-ONC2257CD3.0036CFEB-C2257CD3.003BBF01@il.ibm.com>
2014-05-09 12:20 ` Mikulas Patocka
2014-05-09 13:47 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131025173749.GG19466@laptop.lan \
--to=peterz@infradead.org \
--cc=VICTORK@il.ibm.com \
--cc=anton@samba.org \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mikey@neuling.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.