From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753518AbbEHOsY (ORCPT <rfc822;w@1wt.eu>);
	Fri, 8 May 2015 10:48:24 -0400
Received: from foss.arm.com ([217.140.101.70]:53226 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751340AbbEHOsX (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 8 May 2015 10:48:23 -0400
Date: Fri, 8 May 2015 15:48:20 +0100
From: Will Deacon <will.deacon@arm.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        David Ahern <dsahern@gmail.com>, Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@gmail.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Question about barriers for ARM on tools/perf/
Message-ID: <20150508144820.GD25587@arm.com>
References: <20150508140459.GI7862@kernel.org>
 <20150508142107.GA25587@arm.com>
 <20150508142513.GM27504@twins.programming.kicks-ass.net>
 <20150508143729.GJ7862@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150508143729.GJ7862@kernel.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2015 at 03:37:29PM +0100, Arnaldo Carvalho de Melo wrote:
> On Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra wrote:
> > On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > > Wouldn't it be better to go the other way, and use compiler builtins for
> > > the memory barriers instead of relying on the kernel? It looks like the
> > > perf_mmap__{read,write}_head functions are basically just acquire/release
> > > operations and could therefore be implemented using something like
> > > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
>  
> > He wants to do SMP refcounting, which needs atomic_inc() /
> > atomic_inc_not_zero() / atomic_dec_return() etc.
> 
> Right, Will concentrated on what we use those barriers for right now in
> tools/perf.
> 
> What I am doing right now is to expose what we use in perf to a wider
> audience, i.e. code being developed in tools/, with the current intent
> of implementing reference counting for multithreaded tools/perf/ tools,
> right now only 'perf top', but there are patches floating around to load
> a perf.data file using as many CPUs as one would like, IIRC initially one
> per available CPU.
> 
> I am using the gcc intrinsics () as a fallback, but I've heard that I
> should rather not use those, although they seemed to work well for
> x86_64 and sparc64:

Do you know what the objection to the intrinsics was? I believe that
the __sync versions are deprecated in favour of the C11-like __atomic
flavours, so if that was the only objection then we could use whichever
of the two the compiler supports.
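
For illustration, a minimal sketch of such a fallback, reusing the helper
names from the snippet quoted below. The atomic_t layout and the use of the
predefined __ATOMIC_SEQ_CST macro as the feature test are assumptions on my
part, not something taken from the perf sources. SEQ_CST is picked only to
match the full-barrier behaviour of the __sync builtins; weaker orderings
would be a separate discussion.

```c
/* Assumed layout, for illustration only; not copied from any kernel header. */
typedef struct {
	int counter;
} atomic_t;

/*
 * Assumption: compilers that implement the __atomic builtins also predefine
 * the __ATOMIC_* memory-order macros, so their presence is used as the
 * feature test. Otherwise fall back to the legacy __sync builtins.
 */
static inline void atomic_inc(atomic_t *v)
{
#ifdef __ATOMIC_SEQ_CST
	__atomic_add_fetch(&v->counter, 1, __ATOMIC_SEQ_CST);
#else
	__sync_add_and_fetch(&v->counter, 1);
#endif
}

/* Returns non-zero only when the decrement brought the counter to 0. */
static inline int atomic_dec_and_test(atomic_t *v)
{
#ifdef __ATOMIC_SEQ_CST
	return __atomic_sub_fetch(&v->counter, 1, __ATOMIC_SEQ_CST) == 0;
#else
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
#endif
}
```

A refcount user would call atomic_inc() when taking a reference and free
the object only when atomic_dec_and_test() returns non-zero.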

> -------------------------------------------
> 
> /**
>  * atomic_inc - increment atomic variable
>  * @v: pointer of type atomic_t
>  *
>  * Atomically increments @v by 1.
>  */
> static inline void atomic_inc(atomic_t *v)
> {
>        __sync_add_and_fetch(&v->counter, 1);
> }
> 
> /**
>  * atomic_dec_and_test - decrement and test
>  * @v: pointer of type atomic_t
>  *
>  * Atomically decrements @v by 1 and
>  * returns true if the result is 0, or false for all other
>  * cases.
>  */
> static inline int atomic_dec_and_test(atomic_t *v)
> {
>        return __sync_sub_and_fetch(&v->counter, 1) == 0;
> }
> 
> -------------------------------------------
> 
> One of my hopes for a byproduct was to take advantage of improvements
> made to that code in the kernel, etc.
> 
> At the least I will use the same API, i.e. barrier(), mb(), rmb(), wmb(),
> atomic_{inc,dec_and_test,read_init}; sharing the whole shebang would be
> even cooler.

Perhaps, but including atomic.h sounds pretty fragile to me. Sure, if we
define the right set of macros we may get it to work today, but we could
easily get subtle breakages as the kernel sources move forward and we might
not easily notice/diagnose the failures in the perf tool.

Will