From mboxrd@z Thu Jan  1 00:00:00 1970
From: Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers
 instead of {rmb,mb}
Date: Thu, 18 Oct 2018 10:14:34 +0200
Message-ID: <20181018081434.GT3121@hirez.programming.kicks-ass.net>
References: <20181017144156.16639-1-daniel@iogearbox.net>
 <20181017144156.16639-3-daniel@iogearbox.net>
 <20181017155050.GM3121@hirez.programming.kicks-ass.net>
 <55f86215-44a8-2bb8-b1d0-a77a142dc697@iogearbox.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: alexei.starovoitov@gmail.com, paulmck@linux.vnet.ibm.com,
        will.deacon@arm.com, acme@redhat.com, yhs@fb.com,
        john.fastabend@gmail.com, netdev@vger.kernel.org
To: Daniel Borkmann <daniel@iogearbox.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from merlin.infradead.org ([205.233.59.134]:39366 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726323AbeJRQOp (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 18 Oct 2018 12:14:45 -0400
Content-Disposition: inline
In-Reply-To: <55f86215-44a8-2bb8-b1d0-a77a142dc697@iogearbox.net>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Thu, Oct 18, 2018 at 01:10:15AM +0200, Daniel Borkmann wrote:

> Wouldn't this then also allow the kernel side to use smp_store_release()
> when it updates the head? We'd be pretty much at the model as described
> in Documentation/core-api/circular-buffers.rst.
> 
> Meaning, rough pseudo-code diff would look as:
> 
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5d3cf40..3d96275 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -84,8 +84,9 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>  	 *
>  	 * See perf_output_begin().
>  	 */
> -	smp_wmb(); /* B, matches C */
> -	rb->user_page->data_head = head;
> +
> +	/* B, matches C */
> +	smp_store_release(&rb->user_page->data_head, head);

Yes, this would be correct.

The reason we didn't do this is that smp_store_release() ends up
being smp_mb() + WRITE_ONCE() on a fair number of platforms, even
those that have a cheaper smp_wmb(), most notably ARM.

(ARM64, OTOH, would probably prefer smp_store_release() there, while
x86 doesn't care either way.)
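
For illustration, here is a minimal userspace sketch of the
release/acquire pairing from circular-buffers.rst. It uses C11 atomics
rather than the kernel's smp_*() helpers. The ring, RING_SIZE and the
ring_push()/ring_pop() names are hypothetical. The A-D labels follow
the style of the perf ring-buffer comment, with B and C as in the
quoted diff. The producer's tail read (A) uses an acquire load. That
is a portable choice and may be stronger than what the kernel relies
on at that point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPSC ring; RING_SIZE must be a power of two. */
#define RING_SIZE 8u

struct ring {
	_Atomic uint64_t head;	/* written only by the producer (cf. data_head) */
	_Atomic uint64_t tail;	/* written only by the consumer (cf. data_tail) */
	int data[RING_SIZE];
};

/* Producer: returns false if the ring is full. */
static bool ring_push(struct ring *rb, int value)
{
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	/* A: pairs with D, so the consumer is done with the slot we reuse. */
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	rb->data[head & (RING_SIZE - 1)] = value;
	/* B: publish the data before the new head; pairs with C. */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: returns false if the ring is empty. */
static bool ring_pop(struct ring *rb, int *value)
{
	/* C: pairs with B, so data written before head is visible. */
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);

	assert(head - tail <= RING_SIZE);
	if (head == tail)
		return false;

	*value = rb->data[tail & (RING_SIZE - 1)];
	/* D: finish reading the slot before handing it back; pairs with A. */
	atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
	return true;
}
```

The orderings are only exercised when ring_push() and ring_pop() run
on separate threads. Single-threaded use only checks the index
arithmetic.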

A similar concern exists for the smp_load_acquire() I proposed for the
userspace side: ARM would have to resort to smp_mb() there, instead of
the cheaper smp_rmb().
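
The two consumer-side options can be sketched the same way. This is a
C11 analogue, not the perf tools code, and the helper names are
hypothetical. C11 has no load-only fence matching smp_rmb(), so the
second variant uses an acquire fence instead, which also orders later
stores. In both variants, once the new head is observed, the consumer
may read the data in [tail, head). What each compiles to on ARM depends
on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Variant 1: acquire load of the head, the analogue of smp_load_acquire(). */
static uint64_t avail_acquire(_Atomic uint64_t *head, uint64_t tail)
{
	uint64_t h = atomic_load_explicit(head, memory_order_acquire);

	assert(h >= tail);	/* the head never falls behind our tail */
	return h - tail;	/* data in [tail, h) may now be read */
}

/*
 * Variant 2: relaxed load plus a fence, in the spirit of
 * READ_ONCE() + smp_rmb().  The acquire fence is stronger than
 * smp_rmb() because it also orders later stores.
 */
static uint64_t avail_fenced(_Atomic uint64_t *head, uint64_t tail)
{
	uint64_t h = atomic_load_explicit(head, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	assert(h >= tail);
	return h - tail;
}
```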

The smp_store_release() on the userspace side, however, will cost the
same or less, since it would replace an smp_mb() that is already there.
Most notably, x86 can avoid the barrier entirely, because TSO doesn't
allow LOAD-STORE reordering (it only allows STORE-LOAD reordering). And
PowerPC can use LWSYNC instead of SYNC.
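
Likewise for the tail update, here is a sketch with hypothetical
helper names. Both variants order the consumer's earlier reads of the
data before the new tail becomes visible. The release store simply
doesn't order the tail store against later loads, which this protocol
doesn't need. With the usual C11 mappings, the release store is a
plain MOV on x86-64, versus MFENCE or a locked instruction for the
seq_cst fence. On PowerPC it is LWSYNC plus the store, versus SYNC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Existing pattern: full barrier, then the tail store (smp_mb() + WRITE_ONCE()). */
static void set_tail_full_fence(_Atomic uint64_t *tail, uint64_t new_tail)
{
	/* Only the consumer writes tail, so a relaxed read of it is fine. */
	assert(new_tail >= atomic_load_explicit(tail, memory_order_relaxed));
	atomic_thread_fence(memory_order_seq_cst);	/* roughly smp_mb() */
	atomic_store_explicit(tail, new_tail, memory_order_relaxed);
}

/* Proposed pattern: a release store of the tail (smp_store_release()). */
static void set_tail_release(_Atomic uint64_t *tail, uint64_t new_tail)
{
	assert(new_tail >= atomic_load_explicit(tail, memory_order_relaxed));
	atomic_store_explicit(tail, new_tail, memory_order_release);
}
```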