linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC] perf: fix building for ARCv1
       [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
@ 2015-10-17 14:19 ` Vineet Gupta
  2015-10-18 11:15   ` Alexey Brodkin
  2015-10-30  6:21 ` Vineet Gupta
  1 sibling, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-17 14:19 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Peter Zijlstra, Aabid Rushdi, lkml,
	linux-perf-users@vger.kernel.org, Darren Hart

On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC achitecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ----------------->8-----------------
> undefined reference to `__sync_add_and_fetch_4'
> ----------------->8-----------------
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.

Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
that support.
So what we are saying is that any arch (or a configuration thereof) which doesn't
support atomic r-m-w can't even build perf now - that sucks !

A better way would be to do feature test for __sync_xyz and make atomic_xxx
wrappers call __sync_xyz) vs. an empty stub.
So atleast such arches can build and do "some" perf work !

-Vineet

> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
@ 2015-10-18 11:15   ` Alexey Brodkin
  2015-10-18 23:14     ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Brodkin @ 2015-10-18 11:15 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org,
	linux-perf-users@vger.kernel.org, dvhart@linux.intel.com,
	dsahern@gmail.com, acme@redhat.com

Hi Vineet,

Looks like this time atomics are a must. And that really sucks!

See these commits that introduce usage of atomic_xxx() all around the perf and tools it uses:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f812d3045c2385ac16237e68b156859c4005526e
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3a7c489c7fd2463e3b2c3a2179c7be879dd9cb4
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7143849a5d6a5c623d81790d92f0033507c5b14f
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=59a51c1dc9fbb3fb4af928b852d7b35df83edd74
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1ed3a5b87ed6759e16ec93f16aae83d2cc77ca2

and that's the one that introduced usage of the following generic gcc's atomics
(__sync_add_and_fetch/__sync_sub_and_fetch):
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da6d8567512df11e0473b710c07de87efde5709c

So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
perf building.

-Alexey

On Sat, 2015-10-17 at 14:19 +0000, Vineet Gupta wrote:
> On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> > Perf uses atomic options and so it is required to have atomics enabled
> > in toolchain.
> > 
> > In case of ARC atomics are enabled by default for ARCv2 but disabled for
> > ARCv1. Now we explicitly enable atomics for either ARC achitecture
> > version so perf could be successfully built.
> > 
> > Currently on attempt to build perf for ARCv1 you'll see tons of:
> > ----------------->8-----------------
> > undefined reference to `__sync_add_and_fetch_4'
> > ----------------->8-----------------
> > 
> > Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> > execution once "llock" instruction is attempted to be executed.
> 
> Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
> that support.
> So what we are saying is that any arch (or a configuration thereof) which doesn't
> support atomic r-m-w can't even build perf now - that sucks !
> 
> A better way would be to do feature test for __sync_xyz and make atomic_xxx
> wrappers call __sync_xyz) vs. an empty stub.
> So atleast such arches can build and do "some" perf work !
> 
> -Vineet
> 
> > Cc: Vineet Gupta <vgupta@synopsys.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> > ---
> >  tools/perf/config/Makefile | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> > index 38a0853..dc7c0a8 100644
> > --- a/tools/perf/config/Makefile
> > +++ b/tools/perf/config/Makefile
> > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
> >    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
> >  endif
> >  
> > +# Additional ARCH settings for ARC
> > +ifeq ($(ARCH),arc)
> > +  CFLAGS += -matomic
> > +endif
> > +
> >  ifeq ($(NO_PERF_REGS),0)
> >    $(call detected,CONFIG_PERF_REGS)
> >  endif
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-18 11:15   ` Alexey Brodkin
@ 2015-10-18 23:14     ` Andi Kleen
  2015-10-19  4:58       ` Vineet Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2015-10-18 23:14 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Vineet Gupta, Aabid Rushdi, linux-kernel@vger.kernel.org,
	peterz@infradead.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes:
>
> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
> perf building.

If your target is single cpu only you can always simulate them in C.

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-18 23:14     ` Andi Kleen
@ 2015-10-19  4:58       ` Vineet Gupta
  2015-10-19  5:49         ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-19  4:58 UTC (permalink / raw)
  To: Andi Kleen, Alexey Brodkin
  Cc: Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org,
	linux-perf-users@vger.kernel.org, dvhart@linux.intel.com,
	dsahern@gmail.com, acme@redhat.com

On Monday 19 October 2015 04:45 AM, Andi Kleen wrote:
> Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes:
>> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
>> perf building.
> If your target is single cpu only you can always simulate them in C.
>
> -Andi

But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
C just by itself.

-Vineet

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  4:58       ` Vineet Gupta
@ 2015-10-19  5:49         ` Andi Kleen
  2015-10-19  9:28           ` Vineet Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2015-10-19  5:49 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org,
	peterz@infradead.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>
> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
> C just by itself.

It matters when you access the perf ring buffer which is updated by kernel.
Also perf is now multi threaded to some degree.

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  5:49         ` Andi Kleen
@ 2015-10-19  9:28           ` Vineet Gupta
  2015-10-19  9:35             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-19  9:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org,
	peterz@infradead.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>> C just by itself.
> It matters when you access the perf ring buffer which is updated by kernel.

That's part of the problem. The issue is with atomic_* APIs proliferation in perf
user space code which assumes native atomix r-m-w support which is not always
true. So I think we still need a feature detection mechanism and if absent leave
the ball in arch court by calling arch_atomic_* which can use creative or half
working measures so perf will work to some extent atleast and not bomb outright.

Also can u please elaborate a bit on "simulate them in C" - u mean just simple
unprotected LD, OP, ST or do u fancy usage of futex etc?

> Also perf is now multi threaded to some degree.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:28           ` Vineet Gupta
@ 2015-10-19  9:35             ` Peter Zijlstra
  2015-10-19  9:46               ` Vineet Gupta
  2016-10-18 18:58               ` Vineet Gupta
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2015-10-19  9:35 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
> >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
> >> C just by itself.
> > It matters when you access the perf ring buffer which is updated by kernel.
> 
> That's part of the problem. The issue is with atomic_* APIs proliferation in perf
> user space code which assumes native atomix r-m-w support which is not always
> true. So I think we still need a feature detection mechanism and if absent leave
> the ball in arch court by calling arch_atomic_* which can use creative or half
> working measures so perf will work to some extent atleast and not bomb outright.
> 
> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
> unprotected LD, OP, ST or do u fancy usage of futex etc?

Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
IRQ-disabled load-op-store sequence.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:35             ` Peter Zijlstra
@ 2015-10-19  9:46               ` Vineet Gupta
  2015-10-19  9:51                 ` Peter Zijlstra
  2016-10-18 18:58               ` Vineet Gupta
  1 sibling, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-19  9:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Monday 19 October 2015 03:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
>> > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>>>> > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> > >> C just by itself.
>>> > > It matters when you access the perf ring buffer which is updated by kernel.
>> > 
>> > That's part of the problem. The issue is with atomic_* APIs proliferation in perf
>> > user space code which assumes native atomix r-m-w support which is not always
>> > true. So I think we still need a feature detection mechanism and if absent leave
>> > the ball in arch court by calling arch_atomic_* which can use creative or half
>> > working measures so perf will work to some extent atleast and not bomb outright.
>> > 
>> > Also can u please elaborate a bit on "simulate them in C" - u mean just simple
>> > unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

Yeah I remember seeing some syscall like that in ARM.

On ARC we could use the atomic EXchange to implement a user space only binary
semaphore - these atomic ops will be small duration so it is OK to spin wait for a
little bit. That's how the old pthread library worked for ARC w/o any atomic support.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:46               ` Vineet Gupta
@ 2015-10-19  9:51                 ` Peter Zijlstra
  2015-10-19 10:04                   ` Vineet Gupta
  2015-10-20  8:00                   ` Vineet Gupta
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2015-10-19  9:51 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
> On ARC we could use the atomic EXchange to implement a user space only binary
> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
> little bit. That's how the old pthread library worked for ARC w/o any atomic support.

That has the obvious problem of lock-holder-preemption and the horrible
performance issues that result from that.

I think the syscall at least has deterministic behaviour, whereas that
userspace spin loop has this abysmal worst case thing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:51                 ` Peter Zijlstra
@ 2015-10-19 10:04                   ` Vineet Gupta
  2015-10-20  8:00                   ` Vineet Gupta
  1 sibling, 0 replies; 22+ messages in thread
From: Vineet Gupta @ 2015-10-19 10:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
>> > On ARC we could use the atomic EXchange to implement a user space only binary
>> > semaphore - these atomic ops will be small duration so it is OK to spin wait for a
>> > little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I agree - we can add that syscall trivially and use it based on build time feature
detection for atomics !

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:51                 ` Peter Zijlstra
  2015-10-19 10:04                   ` Vineet Gupta
@ 2015-10-20  8:00                   ` Vineet Gupta
  2015-10-20 10:11                     ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-20  8:00 UTC (permalink / raw)
  To: Peter Zijlstra, Noam Camus
  Cc: Andi Kleen, Alexey Brodkin, Gilad Ben Yossef,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com

On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
>> On ARC we could use the atomic EXchange to implement a user space only binary
>> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
>> little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I don't have issue with adding the syscall per-se. But that comes with it's own
headaches of ABI change - more importantly it requires several things to match,
libc, kernel...  It would be easier if change was confined to say perf.

Can we use existing syscall(s) - again this is what our good old pthread library
code did.

static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  READ_MEMORY_BARRIER();

  while (testandset(spinlock)) {   <---- atomic EXchange
    if (cnt < 50) {
      sched_yield();
      cnt++;
    } else {
      tm.tv_sec = 0;
      tm.tv_nsec = 2000001;
      nanosleep(&tm, ((void *)0));
      cnt = 0;
    }
  }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20  8:00                   ` Vineet Gupta
@ 2015-10-20 10:11                     ` Peter Zijlstra
  2015-10-20 10:45                       ` Vineet Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-10-20 10:11 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com,
	Thomas Gleixner

On Tue, Oct 20, 2015 at 08:00:46AM +0000, Vineet Gupta wrote:
> On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
> >> On ARC we could use the atomic EXchange to implement a user space only binary
> >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
> >> little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> > That has the obvious problem of lock-holder-preemption and the horrible
> > performance issues that result from that.
> >
> > I think the syscall at least has deterministic behaviour, whereas that
> > userspace spin loop has this abysmal worst case thing.
> 
> I don't have issue with adding the syscall per-se. But that comes with it's own
> headaches of ABI change - more importantly it requires several things to match,
> libc, kernel...  It would be easier if change was confined to say perf.

OTOH fixing all those would get you a 'sane' system :-)

> Can we use existing syscall(s) - again this is what our good old pthread library
> code did.
> 
> static void __pthread_acquire(int * spinlock)
> {
>   int cnt = 0;
>   struct timespec tm;
> 
>   READ_MEMORY_BARRIER();
> 
>   while (testandset(spinlock)) {   <---- atomic EXchange
>     if (cnt < 50) {
>       sched_yield();
>       cnt++;
>     } else {
>       tm.tv_sec = 0;
>       tm.tv_nsec = 2000001;
>       nanosleep(&tm, ((void *)0));
>       cnt = 0;
>     }
>   }

*shudder* that is quite horrible.

This means all your 'atomics' are broken for anything SCHED_FIFO and the
like. You simply _cannot_ run a realtime system.

(also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
test-and-set control dependency.)

But its your arch..

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20 10:11                     ` Peter Zijlstra
@ 2015-10-20 10:45                       ` Vineet Gupta
  2015-10-29 15:58                         ` Alexey Brodkin
  0 siblings, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2015-10-20 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com,
	Thomas Gleixner

On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>> > Can we use existing syscall(s) - again this is what our good old pthread library
>> > code did.
>> > 
>> > static void __pthread_acquire(int * spinlock)
>> > {
>> >   int cnt = 0;
>> >   struct timespec tm;
>> > 
>> >   READ_MEMORY_BARRIER();
>> > 
>> >   while (testandset(spinlock)) {   <---- atomic EXchange
>> >     if (cnt < 50) {
>> >       sched_yield();
>> >       cnt++;
>> >     } else {
>> >       tm.tv_sec = 0;
>> >       tm.tv_nsec = 2000001;
>> >       nanosleep(&tm, ((void *)0));
>> >       cnt = 0;
>> >     }
>> >   }
> *shudder* that is quite horrible.
>
> This means all your 'atomics' are broken for anything SCHED_FIFO and the
> like. You simply _cannot_ run a realtime system.

The code above is from uClibc old threading library which we don't use anymore.
The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
change we are talking about is only for the atomics in perf itself. I do
understand your POV though.

> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> test-and-set control dependency.)

Absolutely and in this case it will have to be added both inside the loop and one
at the end to cover both the scenarios !

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20 10:45                       ` Vineet Gupta
@ 2015-10-29 15:58                         ` Alexey Brodkin
  2015-10-30  6:19                           ` Vineet Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Brodkin @ 2015-10-29 15:58 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	tglx@linutronix.de, andi@firstfloor.org,
	linux-snps-arc@lists.infradead.org, acme@redhat.com,
	linux-perf-users@vger.kernel.org, giladb@ezchip.com,
	dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com

Hi Vineet,

On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > Can we use existing syscall(s) - again this is what our good old pthread library
> > > > code did.
> > > > 
> > > > static void __pthread_acquire(int * spinlock)
> > > > {
> > > >   int cnt = 0;
> > > >   struct timespec tm;
> > > > 
> > > >   READ_MEMORY_BARRIER();
> > > > 
> > > >   while (testandset(spinlock)) {   <---- atomic EXchange
> > > >     if (cnt < 50) {
> > > >       sched_yield();
> > > >       cnt++;
> > > >     } else {
> > > >       tm.tv_sec = 0;
> > > >       tm.tv_nsec = 2000001;
> > > >       nanosleep(&tm, ((void *)0));
> > > >       cnt = 0;
> > > >     }
> > > >   }
> > *shudder* that is quite horrible.
> > 
> > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > like. You simply _cannot_ run a realtime system.
> 
> The code above is from uClibc old threading library which we don't use anymore.
> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
> change we are talking about is only for the atomics in perf itself. I do
> understand your POV though.
> 
> > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > test-and-set control dependency.)
> 
> Absolutely and in this case it will have to be added both inside the loop and one
> at the end to cover both the scenarios !
> 

I'm wondering what are our plans for now?
Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
or we'll try to come up with more general solution?

-Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-29 15:58                         ` Alexey Brodkin
@ 2015-10-30  6:19                           ` Vineet Gupta
  2016-02-03 16:20                             ` Alexey Brodkin
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Vineet Gupta @ 2015-10-30  6:19 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	tglx@linutronix.de, andi@firstfloor.org,
	linux-snps-arc@lists.infradead.org, acme@redhat.com,
	linux-perf-users@vger.kernel.org, giladb@ezchip.com,
	dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com

On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> Hi Vineet,
>
> On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
>> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>>>>> Can we use existing syscall(s) - again this is what our good old pthread library
>>>>> code did.
>>>>>
>>>>> static void __pthread_acquire(int * spinlock)
>>>>> {
>>>>>   int cnt = 0;
>>>>>   struct timespec tm;
>>>>>
>>>>>   READ_MEMORY_BARRIER();
>>>>>
>>>>>   while (testandset(spinlock)) {   <---- atomic EXchange
>>>>>     if (cnt < 50) {
>>>>>       sched_yield();
>>>>>       cnt++;
>>>>>     } else {
>>>>>       tm.tv_sec = 0;
>>>>>       tm.tv_nsec = 2000001;
>>>>>       nanosleep(&tm, ((void *)0));
>>>>>       cnt = 0;
>>>>>     }
>>>>>   }
>>> *shudder* that is quite horrible.
>>>
>>> This means all your 'atomics' are broken for anything SCHED_FIFO and the
>>> like. You simply _cannot_ run a realtime system.
>> The code above is from uClibc old threading library which we don't use anymore.
>> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
>> change we are talking about is only for the atomics in perf itself. I do
>> understand your POV though.
>>
>>> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
>>> test-and-set control dependency.)
>> Absolutely and in this case it will have to be added both inside the loop and one
>> at the end to cover both the scenarios !
>>
> I'm wondering what are our plans for now?
> Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
> or we'll try to come up with more general solution?

I agree with the current solution to add -atomic to for arc700 builds.
Although making that default for arc700 tools will be better but that will not fix
things before next release of tools etc.

But we *do* need to improve generic solution
1. Add atomics detection in perf to add fall back arch stubs
2. ARC needs to add syscall for facilitating atomic r-m-w !

-Vineet

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
       [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
  2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
@ 2015-10-30  6:21 ` Vineet Gupta
  1 sibling, 0 replies; 22+ messages in thread
From: Vineet Gupta @ 2015-10-30  6:21 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Peter Zijlstra, linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho de Melo, lkml,
	linux-snps-arc@lists.infradead.org

On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC achitecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ----------------->8-----------------
> undefined reference to `__sync_add_and_fetch_4'
> ----------------->8-----------------
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.
>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>

Acked-by: Vineet Gupta <vgupta@synopsys.com>


> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-30  6:19                           ` Vineet Gupta
@ 2016-02-03 16:20                             ` Alexey Brodkin
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Alexey Brodkin @ 2016-02-03 16:20 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	tglx@linutronix.de, andi@firstfloor.org,
	linux-snps-arc@lists.infradead.org, acme@redhat.com,
	linux-perf-users@vger.kernel.org, giladb@ezchip.com,
	dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com

Hi Vineet,

On Fri, 2015-10-30 at 06:19 +0000, Vineet Gupta wrote:
> On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> > Hi Vineet,
> > 
> > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
> > > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > > > Can we use existing syscall(s) - again this is what our good old pthread library
> > > > > > code did.
> > > > > > 
> > > > > > static void __pthread_acquire(int * spinlock)
> > > > > > {
> > > > > >   int cnt = 0;
> > > > > >   struct timespec tm;
> > > > > > 
> > > > > >   READ_MEMORY_BARRIER();
> > > > > > 
> > > > > >   while (testandset(spinlock)) {   <---- atomic EXchange
> > > > > >     if (cnt < 50) {
> > > > > >       sched_yield();
> > > > > >       cnt++;
> > > > > >     } else {
> > > > > >       tm.tv_sec = 0;
> > > > > >       tm.tv_nsec = 2000001;
> > > > > >       nanosleep(&tm, ((void *)0));
> > > > > >       cnt = 0;
> > > > > >     }
> > > > > >   }
> > > > *shudder* that is quite horrible.
> > > > 
> > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > > > like. You simply _cannot_ run a realtime system.
> > > The code above is from uClibc old threading library which we don't use anymore.
> > > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
> > > change we are talking about is only for the atomics in perf itself. I do
> > > understand your POV though.
> > > 
> > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > > > test-and-set control dependency.)
> > > Absolutely and in this case it will have to be added both inside the loop and one
> > > at the end to cover both the scenarios !
> > > 
> > I'm wondering what are our plans for now?
> > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
> > or we'll try to come up with more general solution?
> 
> I agree with the current solution to add -atomic to for arc700 builds.
> Although making that default for arc700 tools will be better but that will not fix
> things before next release of tools etc.
> 
> But we *do* need to improve generic solution
> 1. Add atomics detection in perf to add fall back arch stubs
> 2. ARC needs to add syscall for facilitating atomic r-m-w !

So the most recent ARC GNU tools (2015.12) were just released yesterday
and still atomics are disabled by default for ARCv1.
That means perf will continue to fail on building for now.

Do you think we may apply my initial fix to 4.5 while it is in development and
then to stable trees as well?

-Alexey

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
@ 2016-02-04  4:13                               ` Vineet Gupta
  2016-02-05 11:18                                 ` Noam Camus
  0 siblings, 1 reply; 22+ messages in thread
From: Vineet Gupta @ 2016-02-04  4:13 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-perf-users@vger.kernel.org, peterz@infradead.org,
	dvhart@linux.intel.com, linux-kernel@vger.kernel.org,
	acme@redhat.com, andi@firstfloor.org, noamc@ezchip.com,
	dsahern@gmail.com, tglx@linutronix.de,
	linux-snps-arc@lists.infradead.org, giladb@ezchip.com

+CC Noam

On Wednesday 03 February 2016 09:50 PM, Alexey Brodkin wrote:
>> I agree with the current solution to add -atomic to for arc700 builds.
>> > Although making that default for arc700 tools will be better but that will not fix
>> > things before next release of tools etc.
>> > 
>> > But we *do* need to improve generic solution
>> > 1. Add atomics detection in perf to add fall back arch stubs
>> > 2. ARC needs to add syscall for facilitating atomic r-m-w !
> So the most recent ARC GNU tools (2015.12) were just released yesterday
> and still atomics are disabled by default for ARCv1.
> That means perf will continue to fail on building for now.
> 
> Do you think we may apply my initial fix to 4.5 while it is in development and
> then to stable trees as well?

Noam, what's the atomic story for EZChip. Do you support such things for user
space in GNU tools. If -atomic is added to perf user space builds are you guys OK!

-Vineet

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2016-02-04  4:13                               ` Vineet Gupta
@ 2016-02-05 11:18                                 ` Noam Camus
  2016-02-05 16:10                                   ` acme
  0 siblings, 1 reply; 22+ messages in thread
From: Noam Camus @ 2016-02-05 11:18 UTC (permalink / raw)
  To: Vineet Gupta, Alexey Brodkin
  Cc: linux-perf-users@vger.kernel.org, peterz@infradead.org,
	dvhart@linux.intel.com, linux-kernel@vger.kernel.org,
	acme@redhat.com, andi@firstfloor.org, dsahern@gmail.com,
	tglx@linutronix.de, linux-snps-arc@lists.infradead.org,
	Gilad Ben Yossef



________________________________________
>From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
>Sent: Thursday, February 4, 2016 6:13 AM
>Noam, what's the atomic story for EZChip. Do you support such things for user
space in GNU tools. If -atomic is added to perf user space builds are you guys OK!

Well here for EZchip I also see the:
undefined reference to `__sync_add_and_fetch_4'
undefined reference to `__sync_sub_and_fetch_4'

This is since at file tools/include/asm/atomic.h we use the generic implementation
If for ARC I could use just like x86 my own header file then functions like:
atomic_inc()
atomic_dec_and_test()
Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform.

You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*.
I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC.

So seem that either I have my own header at kernel or that I need to change the GCC implementation for __sync* to use my atomic instructions.
I am personally tend to the x86 solution and not the generic one since changing GCC will require to have new compiler dependency.
 
-Noam

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2016-02-05 11:18                                 ` Noam Camus
@ 2016-02-05 16:10                                   ` acme
  2016-02-10  3:09                                     ` Vineet Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: acme @ 2016-02-05 16:10 UTC (permalink / raw)
  To: Noam Camus
  Cc: peterz@infradead.org, Vineet Gupta, Alexey Brodkin,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	andi@firstfloor.org, dsahern@gmail.com, tglx@linutronix.de,
	linux-snps-arc@lists.infradead.org, dvhart@linux.intel.com,
	Gilad Ben Yossef

Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu:
> Well here for EZchip I also see the:
> undefined reference to `__sync_add_and_fetch_4'
> undefined reference to `__sync_sub_and_fetch_4'

Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't
you guys adapt arch/arc/include/asm/atomic.h to use in userspace?

- Arnaldo
 
> This is since at file tools/include/asm/atomic.h we use the generic implementation
> If for ARC I could use just like x86 my own header file then functions like:
> atomic_inc()
> atomic_dec_and_test()
> Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform.
> 
> You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*.
> I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC.

> So seem that either I have my own header at kernel or that I need to
> change the GCC implementation for __sync* to use my atomic
> instructions.  I am personally tend to the x86 solution and not the
> generic one since changing GCC will require to have new compiler
> dependency.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2016-02-05 16:10                                   ` acme
@ 2016-02-10  3:09                                     ` Vineet Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Vineet Gupta @ 2016-02-10  3:09 UTC (permalink / raw)
  To: acme@redhat.com, Noam Camus
  Cc: peterz@infradead.org, Alexey Brodkin,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	andi@firstfloor.org, dsahern@gmail.com, tglx@linutronix.de,
	linux-snps-arc@lists.infradead.org, dvhart@linux.intel.com,
	Gilad Ben Yossef

On Friday 05 February 2016 09:40 PM, acme@redhat.com wrote:
> Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu:
>> Well here for EZchip I also see the:
>> undefined reference to `__sync_add_and_fetch_4'
>> undefined reference to `__sync_sub_and_fetch_4'
> 
> Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't
> you guys adapt arch/arc/include/asm/atomic.h to use in userspace?

Sure - however we need to support 3 variants: LLSC, !LLSC, EZCHIP

If needed, latter 2 could be done using a new atomic assist syscall

I presume kernel Kconfig items are no go in this header so this diversity
management needs to use toolchain defined macros e.g. __ezchip__


> 
> - Arnaldo
>  
>> This is since at file tools/include/asm/atomic.h we use the generic implementation
>> If for ARC I could use just like x86 my own header file then functions like:
>> atomic_inc()
>> atomic_dec_and_test()
>> Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform.
>>
>> You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*.
>> I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC.
> 
>> So seem that either I have my own header at kernel or that I need to
>> change the GCC implementation for __sync* to use my atomic
>> instructions.  I am personally tend to the x86 solution and not the
>> generic one since changing GCC will require to have new compiler
>> dependency.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:35             ` Peter Zijlstra
  2015-10-19  9:46               ` Vineet Gupta
@ 2016-10-18 18:58               ` Vineet Gupta
  1 sibling, 0 replies; 22+ messages in thread
From: Vineet Gupta @ 2016-10-18 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexey Brodkin, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, Russell King, arcml,
	acme@redhat.com

On 10/19/2015 02:35 AM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
>> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>>>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> C just by itself.
>>> It matters when you access the perf ring buffer which is updated by kernel.
>> That's part of the problem. The issue is with atomic_* APIs proliferation in perf
>> user space code which assumes native atomix r-m-w support which is not always
>> true. So I think we still need a feature detection mechanism and if absent leave
>> the ball in arch court by calling arch_atomic_* which can use creative or half
>> working measures so perf will work to some extent atleast and not bomb outright.
>>
>> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
>> unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

So I got around to addressing this - now that someone actually is trying to use
NPTL (which uses llock/scond) on ARC700 lacking those instructions. However given
that we are going this route, FWIW ARM kernel got rid of this syscall with
db695c0509d6ec ("ARM: remove user cmpxchg syscall") citing some security hole.
Even of we were to disregard, the code at the time had some open code MM trickery,
which I'd rather not replicate. My use case is simple - I only need to support UP
config - and a simple {get,put}_user would suffice - given that that can
potentially take a TLB refill Miss or worse still a full page fault. I'm going to
cook that patch to add that syscall, but wanted to get some thoughts ahead of that.

-Vineet

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-10-18 18:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
2015-10-18 11:15   ` Alexey Brodkin
2015-10-18 23:14     ` Andi Kleen
2015-10-19  4:58       ` Vineet Gupta
2015-10-19  5:49         ` Andi Kleen
2015-10-19  9:28           ` Vineet Gupta
2015-10-19  9:35             ` Peter Zijlstra
2015-10-19  9:46               ` Vineet Gupta
2015-10-19  9:51                 ` Peter Zijlstra
2015-10-19 10:04                   ` Vineet Gupta
2015-10-20  8:00                   ` Vineet Gupta
2015-10-20 10:11                     ` Peter Zijlstra
2015-10-20 10:45                       ` Vineet Gupta
2015-10-29 15:58                         ` Alexey Brodkin
2015-10-30  6:19                           ` Vineet Gupta
2016-02-03 16:20                             ` Alexey Brodkin
     [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
2016-02-04  4:13                               ` Vineet Gupta
2016-02-05 11:18                                 ` Noam Camus
2016-02-05 16:10                                   ` acme
2016-02-10  3:09                                     ` Vineet Gupta
2016-10-18 18:58               ` Vineet Gupta
2015-10-30  6:21 ` Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).