* Re: [RFC] perf: fix building for ARCv1 [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> @ 2015-10-17 14:19 ` Vineet Gupta 2015-10-18 11:15 ` Alexey Brodkin 2015-10-30 6:21 ` Vineet Gupta 1 sibling, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-17 14:19 UTC (permalink / raw) To: Alexey Brodkin Cc: Peter Zijlstra, Aabid Rushdi, lkml, linux-perf-users@vger.kernel.org, Darren Hart On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > Perf uses atomic options and so it is required to have atomics enabled > in toolchain. > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > ARCv1. Now we explicitly enable atomics for either ARC achitecture > version so perf could be successfully built. > > Currently on attempt to build perf for ARCv1 you'll see tons of: > ----------------->8----------------- > undefined reference to `__sync_add_and_fetch_4' > ----------------->8----------------- > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > execution once "llock" instruction is attempted to be executed. Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o that support. So what we are saying is that any arch (or a configuration thereof) which doesn't support atomic r-m-w can't even build perf now - that sucks ! A better way would be to do feature test for __sync_xyz and make atomic_xxx wrappers call __sync_xyz) vs. an empty stub. So atleast such arches can build and do "some" perf work ! -Vineet > Cc: Vineet Gupta <vgupta@synopsys.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> > --- > tools/perf/config/Makefile | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 38a0853..dc7c0a8 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > endif > > +# Additional ARCH settings for ARC > +ifeq ($(ARCH),arc) > + CFLAGS += -matomic > +endif > + > ifeq ($(NO_PERF_REGS),0) > $(call detected,CONFIG_PERF_REGS) > endif ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta @ 2015-10-18 11:15 ` Alexey Brodkin 2015-10-18 23:14 ` Andi Kleen 0 siblings, 1 reply; 22+ messages in thread From: Alexey Brodkin @ 2015-10-18 11:15 UTC (permalink / raw) To: Vineet Gupta Cc: Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com Hi Vineet, Looks like this time atomics are a must. And that really sucks! See these commits that introduce usage of atomic_xxx() all around the perf and tools it uses: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f812d3045c2385ac16237e68b156859c4005526e http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3a7c489c7fd2463e3b2c3a2179c7be879dd9cb4 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7143849a5d6a5c623d81790d92f0033507c5b14f http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=59a51c1dc9fbb3fb4af928b852d7b35df83edd74 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1ed3a5b87ed6759e16ec93f16aae83d2cc77ca2 and that's the one that introduced usage of the following generic gcc's atomics (__sync_add_and_fetch/__sync_sub_and_fetch): http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da6d8567512df11e0473b710c07de87efde5709c So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with perf building. -Alexey On Sat, 2015-10-17 at 14:19 +0000, Vineet Gupta wrote: > On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > > Perf uses atomic options and so it is required to have atomics enabled > > in toolchain. > > > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > > ARCv1. Now we explicitly enable atomics for either ARC achitecture > > version so perf could be successfully built. > > > > Currently on attempt to build perf for ARCv1 you'll see tons of: > > ----------------->8----------------- > > undefined reference to `__sync_add_and_fetch_4' > > ----------------->8----------------- > > > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > > execution once "llock" instruction is attempted to be executed. > > Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o > that support. > So what we are saying is that any arch (or a configuration thereof) which doesn't > support atomic r-m-w can't even build perf now - that sucks ! > > A better way would be to do feature test for __sync_xyz and make atomic_xxx > wrappers call __sync_xyz) vs. an empty stub. > So atleast such arches can build and do "some" perf work ! > > -Vineet > > > Cc: Vineet Gupta <vgupta@synopsys.com> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> > > --- > > tools/perf/config/Makefile | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > > index 38a0853..dc7c0a8 100644 > > --- a/tools/perf/config/Makefile > > +++ b/tools/perf/config/Makefile > > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > > endif > > > > +# Additional ARCH settings for ARC > > +ifeq ($(ARCH),arc) > > + CFLAGS += -matomic > > +endif > > + > > ifeq ($(NO_PERF_REGS),0) > > $(call detected,CONFIG_PERF_REGS) > > endif > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-18 11:15 ` Alexey Brodkin @ 2015-10-18 23:14 ` Andi Kleen 2015-10-19 4:58 ` Vineet Gupta 0 siblings, 1 reply; 22+ messages in thread From: Andi Kleen @ 2015-10-18 23:14 UTC (permalink / raw) To: Alexey Brodkin Cc: Vineet Gupta, Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes: > > So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with > perf building. If your target is single cpu only you can always simulate them in C. -Andi ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-18 23:14 ` Andi Kleen @ 2015-10-19 4:58 ` Vineet Gupta 2015-10-19 5:49 ` Andi Kleen 0 siblings, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-19 4:58 UTC (permalink / raw) To: Andi Kleen, Alexey Brodkin Cc: Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Monday 19 October 2015 04:45 AM, Andi Kleen wrote: > Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes: >> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with >> perf building. > If your target is single cpu only you can always simulate them in C. > > -Andi But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in C just by itself. -Vineet ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 4:58 ` Vineet Gupta @ 2015-10-19 5:49 ` Andi Kleen 2015-10-19 9:28 ` Vineet Gupta 0 siblings, 1 reply; 22+ messages in thread From: Andi Kleen @ 2015-10-19 5:49 UTC (permalink / raw) To: Vineet Gupta Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: > > But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in > C just by itself. It matters when you access the perf ring buffer which is updated by kernel. Also perf is now multi threaded to some degree. -Andi ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 5:49 ` Andi Kleen @ 2015-10-19 9:28 ` Vineet Gupta 2015-10-19 9:35 ` Peter Zijlstra 0 siblings, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-19 9:28 UTC (permalink / raw) To: Andi Kleen Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, peterz@infradead.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >> C just by itself. > It matters when you access the perf ring buffer which is updated by kernel. That's part of the problem. The issue is with atomic_* APIs proliferation in perf user space code which assumes native atomix r-m-w support which is not always true. So I think we still need a feature detection mechanism and if absent leave the ball in arch court by calling arch_atomic_* which can use creative or half working measures so perf will work to some extent atleast and not bomb outright. Also can u please elaborate a bit on "simulate them in C" - u mean just simple unprotected LD, OP, ST or do u fancy usage of futex etc? > Also perf is now multi threaded to some degree. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:28 ` Vineet Gupta @ 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta 2016-10-18 18:58 ` Vineet Gupta 0 siblings, 2 replies; 22+ messages in thread From: Peter Zijlstra @ 2015-10-19 9:35 UTC (permalink / raw) To: Vineet Gupta Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in > >> C just by itself. > > It matters when you access the perf ring buffer which is updated by kernel. > > That's part of the problem. The issue is with atomic_* APIs proliferation in perf > user space code which assumes native atomix r-m-w support which is not always > true. So I think we still need a feature detection mechanism and if absent leave > the ball in arch court by calling arch_atomic_* which can use creative or half > working measures so perf will work to some extent atleast and not bomb outright. > > Also can u please elaborate a bit on "simulate them in C" - u mean just simple > unprotected LD, OP, ST or do u fancy usage of futex etc? Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an IRQ-disabled load-op-store sequence. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:35 ` Peter Zijlstra @ 2015-10-19 9:46 ` Vineet Gupta 2015-10-19 9:51 ` Peter Zijlstra 2016-10-18 18:58 ` Vineet Gupta 1 sibling, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-19 9:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Monday 19 October 2015 03:05 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: >> > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: >>> > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >>>> > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >>>> > >> C just by itself. >>> > > It matters when you access the perf ring buffer which is updated by kernel. >> > >> > That's part of the problem. The issue is with atomic_* APIs proliferation in perf >> > user space code which assumes native atomix r-m-w support which is not always >> > true. So I think we still need a feature detection mechanism and if absent leave >> > the ball in arch court by calling arch_atomic_* which can use creative or half >> > working measures so perf will work to some extent atleast and not bomb outright. >> > >> > Also can u please elaborate a bit on "simulate them in C" - u mean just simple >> > unprotected LD, OP, ST or do u fancy usage of futex etc? > Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an > IRQ-disabled load-op-store sequence. Yeah I remember seeing some syscall like that in ARM. On ARC we could use the atomic EXchange to implement a user space only binary semaphore - these atomic ops will be small duration so it is OK to spin wait for a little bit. That's how the old pthread library worked for ARC w/o any atomic support. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:46 ` Vineet Gupta @ 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 0 siblings, 2 replies; 22+ messages in thread From: Peter Zijlstra @ 2015-10-19 9:51 UTC (permalink / raw) To: Vineet Gupta Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: > On ARC we could use the atomic EXchange to implement a user space only binary > semaphore - these atomic ops will be small duration so it is OK to spin wait for a > little bit. That's how the old pthread library worked for ARC w/o any atomic support. That has the obvious problem of lock-holder-preemption and the horrible performance issues that result from that. I think the syscall at least has deterministic behaviour, whereas that userspace spin loop has this abysmal worst case thing. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:51 ` Peter Zijlstra @ 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 1 sibling, 0 replies; 22+ messages in thread From: Vineet Gupta @ 2015-10-19 10:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: >> > On ARC we could use the atomic EXchange to implement a user space only binary >> > semaphore - these atomic ops will be small duration so it is OK to spin wait for a >> > little bit. That's how the old pthread library worked for ARC w/o any atomic support. > That has the obvious problem of lock-holder-preemption and the horrible > performance issues that result from that. > > I think the syscall at least has deterministic behaviour, whereas that > userspace spin loop has this abysmal worst case thing. I agree - we can add that syscall trivially and use it based on build time feature detection for atomics ! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta @ 2015-10-20 8:00 ` Vineet Gupta 2015-10-20 10:11 ` Peter Zijlstra 1 sibling, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-20 8:00 UTC (permalink / raw) To: Peter Zijlstra, Noam Camus Cc: Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: >> On ARC we could use the atomic EXchange to implement a user space only binary >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a >> little bit. That's how the old pthread library worked for ARC w/o any atomic support. > That has the obvious problem of lock-holder-preemption and the horrible > performance issues that result from that. > > I think the syscall at least has deterministic behaviour, whereas that > userspace spin loop has this abysmal worst case thing. I don't have issue with adding the syscall per-se. But that comes with it's own headaches of ABI change - more importantly it requires several things to match, libc, kernel... It would be easier if change was confined to say perf. Can we use existing syscall(s) - again this is what our good old pthread library code did. static void __pthread_acquire(int * spinlock) { int cnt = 0; struct timespec tm; READ_MEMORY_BARRIER(); while (testandset(spinlock)) { <---- atomic EXchange if (cnt < 50) { sched_yield(); cnt++; } else { tm.tv_sec = 0; tm.tv_nsec = 2000001; nanosleep(&tm, ((void *)0)); cnt = 0; } } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 8:00 ` Vineet Gupta @ 2015-10-20 10:11 ` Peter Zijlstra 2015-10-20 10:45 ` Vineet Gupta 0 siblings, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2015-10-20 10:11 UTC (permalink / raw) To: Vineet Gupta Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com, Thomas Gleixner On Tue, Oct 20, 2015 at 08:00:46AM +0000, Vineet Gupta wrote: > On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: > >> On ARC we could use the atomic EXchange to implement a user space only binary > >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a > >> little bit. That's how the old pthread library worked for ARC w/o any atomic support. > > That has the obvious problem of lock-holder-preemption and the horrible > > performance issues that result from that. > > > > I think the syscall at least has deterministic behaviour, whereas that > > userspace spin loop has this abysmal worst case thing. > > I don't have issue with adding the syscall per-se. But that comes with it's own > headaches of ABI change - more importantly it requires several things to match, > libc, kernel... It would be easier if change was confined to say perf. OTOH fixing all those would get you a 'sane' system :-) > Can we use existing syscall(s) - again this is what our good old pthread library > code did. > > static void __pthread_acquire(int * spinlock) > { > int cnt = 0; > struct timespec tm; > > READ_MEMORY_BARRIER(); > > while (testandset(spinlock)) { <---- atomic EXchange > if (cnt < 50) { > sched_yield(); > cnt++; > } else { > tm.tv_sec = 0; > tm.tv_nsec = 2000001; > nanosleep(&tm, ((void *)0)); > cnt = 0; > } > } *shudder* that is quite horrible. This means all your 'atomics' are broken for anything SCHED_FIFO and the like. You simply _cannot_ run a realtime system. (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the test-and-set control dependency.) But its your arch.. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 10:11 ` Peter Zijlstra @ 2015-10-20 10:45 ` Vineet Gupta 2015-10-29 15:58 ` Alexey Brodkin 0 siblings, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2015-10-20 10:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, dvhart@linux.intel.com, dsahern@gmail.com, acme@redhat.com, Thomas Gleixner On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: >> > Can we use existing syscall(s) - again this is what our good old pthread library >> > code did. >> > >> > static void __pthread_acquire(int * spinlock) >> > { >> > int cnt = 0; >> > struct timespec tm; >> > >> > READ_MEMORY_BARRIER(); >> > >> > while (testandset(spinlock)) { <---- atomic EXchange >> > if (cnt < 50) { >> > sched_yield(); >> > cnt++; >> > } else { >> > tm.tv_sec = 0; >> > tm.tv_nsec = 2000001; >> > nanosleep(&tm, ((void *)0)); >> > cnt = 0; >> > } >> > } > *shudder* that is quite horrible. > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > like. You simply _cannot_ run a realtime system. The code above is from uClibc old threading library which we don't use anymore. The NPTL version doesn't have all of this song-n-dance and relies on futexes. The change we are talking about is only for the atomics in perf itself. I do understand your POV though. > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > test-and-set control dependency.) Absolutely and in this case it will have to be added both inside the loop and one at the end to cover both the scenarios ! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 10:45 ` Vineet Gupta @ 2015-10-29 15:58 ` Alexey Brodkin 2015-10-30 6:19 ` Vineet Gupta 0 siblings, 1 reply; 22+ messages in thread From: Alexey Brodkin @ 2015-10-29 15:58 UTC (permalink / raw) To: Vineet Gupta Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, andi@firstfloor.org, linux-snps-arc@lists.infradead.org, acme@redhat.com, linux-perf-users@vger.kernel.org, giladb@ezchip.com, dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com Hi Vineet, On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: > > > > Can we use existing syscall(s) - again this is what our good old pthread library > > > > code did. > > > > > > > > static void __pthread_acquire(int * spinlock) > > > > { > > > > int cnt = 0; > > > > struct timespec tm; > > > > > > > > READ_MEMORY_BARRIER(); > > > > > > > > while (testandset(spinlock)) { <---- atomic EXchange > > > > if (cnt < 50) { > > > > sched_yield(); > > > > cnt++; > > > > } else { > > > > tm.tv_sec = 0; > > > > tm.tv_nsec = 2000001; > > > > nanosleep(&tm, ((void *)0)); > > > > cnt = 0; > > > > } > > > > } > > *shudder* that is quite horrible. > > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > > like. You simply _cannot_ run a realtime system. > > The code above is from uClibc old threading library which we don't use anymore. > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The > change we are talking about is only for the atomics in perf itself. I do > understand your POV though. > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > > test-and-set control dependency.) > > Absolutely and in this case it will have to be added both inside the loop and one > at the end to cover both the scenarios ! > I'm wondering what are our plans for now? Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) or we'll try to come up with more general solution? -Alexey ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-29 15:58 ` Alexey Brodkin @ 2015-10-30 6:19 ` Vineet Gupta 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 0 siblings, 2 replies; 22+ messages in thread From: Vineet Gupta @ 2015-10-30 6:19 UTC (permalink / raw) To: Alexey Brodkin Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, andi@firstfloor.org, linux-snps-arc@lists.infradead.org, acme@redhat.com, linux-perf-users@vger.kernel.org, giladb@ezchip.com, dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote: > Hi Vineet, > > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: >> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: >>>>> Can we use existing syscall(s) - again this is what our good old pthread library >>>>> code did. >>>>> >>>>> static void __pthread_acquire(int * spinlock) >>>>> { >>>>> int cnt = 0; >>>>> struct timespec tm; >>>>> >>>>> READ_MEMORY_BARRIER(); >>>>> >>>>> while (testandset(spinlock)) { <---- atomic EXchange >>>>> if (cnt < 50) { >>>>> sched_yield(); >>>>> cnt++; >>>>> } else { >>>>> tm.tv_sec = 0; >>>>> tm.tv_nsec = 2000001; >>>>> nanosleep(&tm, ((void *)0)); >>>>> cnt = 0; >>>>> } >>>>> } >>> *shudder* that is quite horrible. >>> >>> This means all your 'atomics' are broken for anything SCHED_FIFO and the >>> like. You simply _cannot_ run a realtime system. >> The code above is from uClibc old threading library which we don't use anymore. >> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The >> change we are talking about is only for the atomics in perf itself. I do >> understand your POV though. >> >>> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the >>> test-and-set control dependency.) >> Absolutely and in this case it will have to be added both inside the loop and one >> at the end to cover both the scenarios ! >> > I'm wondering what are our plans for now? > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) > or we'll try to come up with more general solution? I agree with the current solution to add -atomic to for arc700 builds. Although making that default for arc700 tools will be better but that will not fix things before next release of tools etc. But we *do* need to improve generic solution 1. Add atomics detection in perf to add fall back arch stubs 2. ARC needs to add syscall for facilitating atomic r-m-w ! -Vineet ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-30 6:19 ` Vineet Gupta @ 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 1 sibling, 0 replies; 22+ messages in thread From: Alexey Brodkin @ 2016-02-03 16:20 UTC (permalink / raw) To: Vineet Gupta Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, andi@firstfloor.org, linux-snps-arc@lists.infradead.org, acme@redhat.com, linux-perf-users@vger.kernel.org, giladb@ezchip.com, dsahern@gmail.com, dvhart@linux.intel.com, noamc@ezchip.com Hi Vineet, On Fri, 2015-10-30 at 06:19 +0000, Vineet Gupta wrote: > On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote: > > Hi Vineet, > > > > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: > > > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: > > > > > > Can we use existing syscall(s) - again this is what our good old pthread library > > > > > > code did. > > > > > > > > > > > > static void __pthread_acquire(int * spinlock) > > > > > > { > > > > > > int cnt = 0; > > > > > > struct timespec tm; > > > > > > > > > > > > READ_MEMORY_BARRIER(); > > > > > > > > > > > > while (testandset(spinlock)) { <---- atomic EXchange > > > > > > if (cnt < 50) { > > > > > > sched_yield(); > > > > > > cnt++; > > > > > > } else { > > > > > > tm.tv_sec = 0; > > > > > > tm.tv_nsec = 2000001; > > > > > > nanosleep(&tm, ((void *)0)); > > > > > > cnt = 0; > > > > > > } > > > > > > } > > > > *shudder* that is quite horrible. > > > > > > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > > > > like. You simply _cannot_ run a realtime system. > > > The code above is from uClibc old threading library which we don't use anymore. > > > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The > > > change we are talking about is only for the atomics in perf itself. I do > > > understand your POV though. > > > > > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > > > > test-and-set control dependency.) > > > Absolutely and in this case it will have to be added both inside the loop and one > > > at the end to cover both the scenarios ! > > > > > I'm wondering what are our plans for now? > > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) > > or we'll try to come up with more general solution? > > I agree with the current solution to add -atomic to for arc700 builds. > Although making that default for arc700 tools will be better but that will not fix > things before next release of tools etc. > > But we *do* need to improve generic solution > 1. Add atomics detection in perf to add fall back arch stubs > 2. ARC needs to add syscall for facilitating atomic r-m-w ! So the most recent ARC GNU tools (2015.12) were just released yesterday and still atomics are disabled by default for ARCv1. That means perf will continue to fail on building for now. Do you think we may apply my initial fix to 4.5 while it is in development and then to stable trees as well? -Alexey ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>]
* Re: [RFC] perf: fix building for ARCv1 [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> @ 2016-02-04 4:13 ` Vineet Gupta 2016-02-05 11:18 ` Noam Camus 0 siblings, 1 reply; 22+ messages in thread From: Vineet Gupta @ 2016-02-04 4:13 UTC (permalink / raw) To: Alexey Brodkin Cc: linux-perf-users@vger.kernel.org, peterz@infradead.org, dvhart@linux.intel.com, linux-kernel@vger.kernel.org, acme@redhat.com, andi@firstfloor.org, noamc@ezchip.com, dsahern@gmail.com, tglx@linutronix.de, linux-snps-arc@lists.infradead.org, giladb@ezchip.com +CC Noam On Wednesday 03 February 2016 09:50 PM, Alexey Brodkin wrote: >> I agree with the current solution to add -atomic to for arc700 builds. >> > Although making that default for arc700 tools will be better but that will not fix >> > things before next release of tools etc. >> > >> > But we *do* need to improve generic solution >> > 1. Add atomics detection in perf to add fall back arch stubs >> > 2. ARC needs to add syscall for facilitating atomic r-m-w ! > So the most recent ARC GNU tools (2015.12) were just released yesterday > and still atomics are disabled by default for ARCv1. > That means perf will continue to fail on building for now. > > Do you think we may apply my initial fix to 4.5 while it is in development and > then to stable trees as well? Noam, what's the atomic story for EZChip. Do you support such things for user space in GNU tools. If -atomic is added to perf user space builds are you guys OK! -Vineet ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2016-02-04 4:13 ` Vineet Gupta @ 2016-02-05 11:18 ` Noam Camus 2016-02-05 16:10 ` acme 0 siblings, 1 reply; 22+ messages in thread From: Noam Camus @ 2016-02-05 11:18 UTC (permalink / raw) To: Vineet Gupta, Alexey Brodkin Cc: linux-perf-users@vger.kernel.org, peterz@infradead.org, dvhart@linux.intel.com, linux-kernel@vger.kernel.org, acme@redhat.com, andi@firstfloor.org, dsahern@gmail.com, tglx@linutronix.de, linux-snps-arc@lists.infradead.org, Gilad Ben Yossef ________________________________________ >From: Vineet Gupta <Vineet.Gupta1@synopsys.com> >Sent: Thursday, February 4, 2016 6:13 AM >Noam, what's the atomic story for EZChip. Do you support such things for user space in GNU tools. If -atomic is added to perf user space builds are you guys OK! Well here for EZchip I also see the: undefined reference to `__sync_add_and_fetch_4' undefined reference to `__sync_sub_and_fetch_4' This is since at file tools/include/asm/atomic.h we use the generic implementation If for ARC I could use just like x86 my own header file then functions like: atomic_inc() atomic_dec_and_test() Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform. You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*. I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC. So seem that either I have my own header at kernel or that I need to change the GCC implementation for __sync* to use my atomic instructions. I am personally tend to the x86 solution and not the generic one since changing GCC will require to have new compiler dependency. -Noam ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2016-02-05 11:18 ` Noam Camus @ 2016-02-05 16:10 ` acme 2016-02-10 3:09 ` Vineet Gupta 0 siblings, 1 reply; 22+ messages in thread From: acme @ 2016-02-05 16:10 UTC (permalink / raw) To: Noam Camus Cc: peterz@infradead.org, Vineet Gupta, Alexey Brodkin, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, andi@firstfloor.org, dsahern@gmail.com, tglx@linutronix.de, linux-snps-arc@lists.infradead.org, dvhart@linux.intel.com, Gilad Ben Yossef Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu: > Well here for EZchip I also see the: > undefined reference to `__sync_add_and_fetch_4' > undefined reference to `__sync_sub_and_fetch_4' Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't you guys adapt arch/arc/include/asm/atomic.h to use in userspace? - Arnaldo > This is since at file tools/include/asm/atomic.h we use the generic implementation > If for ARC I could use just like x86 my own header file then functions like: > atomic_inc() > atomic_dec_and_test() > Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform. > > You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*. > I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC. > So seem that either I have my own header at kernel or that I need to > change the GCC implementation for __sync* to use my atomic > instructions. I am personally tend to the x86 solution and not the > generic one since changing GCC will require to have new compiler > dependency. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2016-02-05 16:10 ` acme @ 2016-02-10 3:09 ` Vineet Gupta 0 siblings, 0 replies; 22+ messages in thread From: Vineet Gupta @ 2016-02-10 3:09 UTC (permalink / raw) To: acme@redhat.com, Noam Camus Cc: peterz@infradead.org, Alexey Brodkin, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, andi@firstfloor.org, dsahern@gmail.com, tglx@linutronix.de, linux-snps-arc@lists.infradead.org, dvhart@linux.intel.com, Gilad Ben Yossef On Friday 05 February 2016 09:40 PM, acme@redhat.com wrote: > Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu: >> Well here for EZchip I also see the: >> undefined reference to `__sync_add_and_fetch_4' >> undefined reference to `__sync_sub_and_fetch_4' > > Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't > you guys adapt arch/arc/include/asm/atomic.h to use in userspace? Sure - however we need to support 3 variants: LLSC, !LLSC, EZCHIP If needed, latter 2 could be done using a new atomic assist syscall I presume kernel Kconfig items are no go in this header so this diversity management needs to use toolchain defined macros e.g. __ezchip__ > > - Arnaldo > >> This is since at file tools/include/asm/atomic.h we use the generic implementation >> If for ARC I could use just like x86 my own header file then functions like: >> atomic_inc() >> atomic_dec_and_test() >> Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform. >> >> You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*. >> I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC. > >> So seem that either I have my own header at kernel or that I need to >> change the GCC implementation for __sync* to use my atomic >> instructions. I am personally tend to the x86 solution and not the >> generic one since changing GCC will require to have new compiler >> dependency. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta @ 2016-10-18 18:58 ` Vineet Gupta 1 sibling, 0 replies; 22+ messages in thread From: Vineet Gupta @ 2016-10-18 18:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Alexey Brodkin, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Russell King, arcml, acme@redhat.com On 10/19/2015 02:35 AM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: >> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: >>> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >>>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >>>> C just by itself. >>> It matters when you access the perf ring buffer which is updated by kernel. >> That's part of the problem. The issue is with atomic_* APIs proliferation in perf >> user space code which assumes native atomix r-m-w support which is not always >> true. So I think we still need a feature detection mechanism and if absent leave >> the ball in arch court by calling arch_atomic_* which can use creative or half >> working measures so perf will work to some extent atleast and not bomb outright. >> >> Also can u please elaborate a bit on "simulate them in C" - u mean just simple >> unprotected LD, OP, ST or do u fancy usage of futex etc? > Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an > IRQ-disabled load-op-store sequence. So I got around to addressing this - now that someone actually is trying to use NPTL (which uses llock/scond) on ARC700 lacking those instructions. However given that we are going this route, FWIW ARM kernel got rid of this syscall with db695c0509d6ec ("ARM: remove user cmpxchg syscall") citing some security hole. Even of we were to disregard, the code at the time had some open code MM trickery, which I'd rather not replicate. My use case is simple - I only need to support UP config - and a simple {get,put}_user would suffice - given that that can potentially take a TLB refill Miss or worse still a full page fault. I'm going to cook that patch to add that syscall, but wanted to get some thoughts ahead of that. -Vineet ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta @ 2015-10-30 6:21 ` Vineet Gupta 1 sibling, 0 replies; 22+ messages in thread From: Vineet Gupta @ 2015-10-30 6:21 UTC (permalink / raw) To: Alexey Brodkin Cc: Peter Zijlstra, linux-perf-users@vger.kernel.org, Arnaldo Carvalho de Melo, lkml, linux-snps-arc@lists.infradead.org On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > Perf uses atomic options and so it is required to have atomics enabled > in toolchain. > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > ARCv1. Now we explicitly enable atomics for either ARC achitecture > version so perf could be successfully built. > > Currently on attempt to build perf for ARCv1 you'll see tons of: > ----------------->8----------------- > undefined reference to `__sync_add_and_fetch_4' > ----------------->8----------------- > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > execution once "llock" instruction is attempted to be executed. > > Cc: Vineet Gupta <vgupta@synopsys.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> > --- > tools/perf/config/Makefile | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 38a0853..dc7c0a8 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > endif > > +# Additional ARCH settings for ARC > +ifeq ($(ARCH),arc) > + CFLAGS += -matomic > +endif > + > ifeq ($(NO_PERF_REGS),0) > $(call detected,CONFIG_PERF_REGS) > endif ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-10-18 18:58 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta 2015-10-18 11:15 ` Alexey Brodkin 2015-10-18 23:14 ` Andi Kleen 2015-10-19 4:58 ` Vineet Gupta 2015-10-19 5:49 ` Andi Kleen 2015-10-19 9:28 ` Vineet Gupta 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 2015-10-20 10:11 ` Peter Zijlstra 2015-10-20 10:45 ` Vineet Gupta 2015-10-29 15:58 ` Alexey Brodkin 2015-10-30 6:19 ` Vineet Gupta 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 2016-02-04 4:13 ` Vineet Gupta 2016-02-05 11:18 ` Noam Camus 2016-02-05 16:10 ` acme 2016-02-10 3:09 ` Vineet Gupta 2016-10-18 18:58 ` Vineet Gupta 2015-10-30 6:21 ` Vineet Gupta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).