From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Zijlstra
Subject: Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
Date: Thu, 18 Oct 2018 10:14:34 +0200
Message-ID: <20181018081434.GT3121@hirez.programming.kicks-ass.net>
References: <20181017144156.16639-1-daniel@iogearbox.net> <20181017144156.16639-3-daniel@iogearbox.net> <20181017155050.GM3121@hirez.programming.kicks-ass.net> <55f86215-44a8-2bb8-b1d0-a77a142dc697@iogearbox.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: alexei.starovoitov@gmail.com, paulmck@linux.vnet.ibm.com, will.deacon@arm.com, acme@redhat.com, yhs@fb.com, john.fastabend@gmail.com, netdev@vger.kernel.org
To: Daniel Borkmann
Return-path:
Received: from merlin.infradead.org ([205.233.59.134]:39366 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726323AbeJRQOp (ORCPT ); Thu, 18 Oct 2018 12:14:45 -0400
Content-Disposition: inline
In-Reply-To: <55f86215-44a8-2bb8-b1d0-a77a142dc697@iogearbox.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, Oct 18, 2018 at 01:10:15AM +0200, Daniel Borkmann wrote:

> Wouldn't this then also allow the kernel side to use smp_store_release()
> when it updates the head? We'd be pretty much at the model as described
> in Documentation/core-api/circular-buffers.rst.
>
> Meaning, rough pseudo-code diff would look as:
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5d3cf40..3d96275 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -84,8 +84,9 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>  	 *
>  	 * See perf_output_begin().
>  	 */
> -	smp_wmb(); /* B, matches C */
> -	rb->user_page->data_head = head;
> +
> +	/* B, matches C */
> +	smp_store_release(&rb->user_page->data_head, head);

Yes, this would be correct.
The reason we didn't do this is that smp_store_release() ends up being
smp_mb() + WRITE_ONCE() on a fair number of platforms, even when they have
a cheaper smp_wmb(); most notably ARM. (ARM64, OTOH, would like to have
smp_store_release() there, I imagine, while x86 doesn't care either way.)

A similar concern exists for the smp_load_acquire() I proposed for the
userspace side: ARM would have to resort to smp_mb() in that situation,
instead of the cheaper smp_rmb().

The smp_store_release() on the userspace side will actually be of equal
cost or cheaper, since that side already has an smp_mb() there. Most
notably, x86 can avoid the barrier entirely, because TSO doesn't allow the
LOAD-STORE reorder (it only allows the STORE-LOAD reorder). And PowerPC
can use LWSYNC instead of SYNC.
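
To make the pairing concrete, here is a minimal userspace sketch of an acquire/release ring buffer written with C11 atomics. It is an illustration only, not the perf ring buffer code: `struct ring`, `ring_put()`, `ring_get()` and `RING_SIZE` are hypothetical names, `memory_order_release` / `memory_order_acquire` stand in for smp_store_release() / smp_load_acquire(), and the producer uses an acquire load of the tail for simplicity, which need not match what the kernel side does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define RING_SIZE 8u
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0,
              "RING_SIZE must be a power of two");

/* Hypothetical single-producer/single-consumer ring. */
struct ring {
    _Atomic uint64_t head;      /* written only by the producer (cf. data_head) */
    _Atomic uint64_t tail;      /* written only by the consumer (cf. data_tail) */
    uint32_t data[RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
static bool ring_put(struct ring *rb, uint32_t value)
{
    uint64_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    /* Acquire pairs with the consumer's release store to tail, so the
     * consumer's read of a slot is complete before we overwrite it. */
    uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;

    rb->data[head & (RING_SIZE - 1u)] = value;
    /* Release: the data store above is ordered before the head update
     * (the role smp_wmb() + plain store plays in the quoted diff). */
    atomic_store_explicit(&rb->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_get(struct ring *rb, uint32_t *out)
{
    uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    /* Acquire pairs with the producer's release store to head, so the
     * data read below observes what the producer wrote. */
    uint64_t head = atomic_load_explicit(&rb->head, memory_order_acquire);

    if (head == tail)
        return false;

    *out = rb->data[tail & (RING_SIZE - 1u)];
    /* Release: the data load above is ordered before the tail update.
     * Only LOAD-STORE ordering is needed here, which is why this can be
     * cheaper than a full barrier on some architectures. */
    atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
    return true;
}
```

What each of these orderings costs depends on the compiler and target; the sketch only shows which accesses pair with which.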