* ia64 atomic_dec_and_lock() patch
@ 2003-12-10 15:37 jerome.marchand
2003-12-10 15:44 ` Christoph Hellwig
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: jerome.marchand @ 2003-12-10 15:37 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3139 bytes --]
I have run a benchmark that heavily loads the VFS on a 16-CPU Itanium
machine. When using lockmeter, I noticed that dcache_lock induces
significant contention when called from dput. I observed a case in which
80% of CPU time was spent in spin-wait!
The ia64 kernel wastes all this time because there is no ia64-specific
implementation of atomic_dec_and_lock(), so the kernel uses the generic
function instead.
I wrote the ia64 atomic_dec_and_lock() function; since then, dcache_lock
never takes more than 0.01% of CPU time, and I have encountered no
problems. The patch follows.
Does anyone know why this function was not implemented before, even
though it is implemented for ia32, ppc, ppc64, sparc64 and alpha?
Jerome Marchand
PS: I have also attached the lockmeter patch to this mail.
diff -urN linux-2.6.0-test11.orig/arch/ia64/Kconfig linux-2.6.0-test11/arch/ia64/Kconfig
--- linux-2.6.0-test11.orig/arch/ia64/Kconfig	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/Kconfig	2003-12-09 11:34:09.000000000 +0100
@@ -375,6 +375,11 @@
depends on IA32_SUPPORT
default y
+config HAVE_DEC_LOCK
+ bool
+ depends on (SMP || PREEMPT)
+ default y
+
config PERFMON
bool "Performance monitor support"
help
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/Makefile linux-2.6.0-test11/arch/ia64/lib/Makefile
--- linux-2.6.0-test11.orig/arch/ia64/lib/Makefile	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/Makefile	2003-12-09 11:32:05.000000000 +0100
@@ -13,6 +13,7 @@
lib-$(CONFIG_MCKINLEY) += copy_page_mck.o memcpy_mck.o
lib-$(CONFIG_PERFMON) += carta_random.o
lib-$(CONFIG_MD_RAID5) += xor.o
+lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o
AFLAGS___divdi3.o =
AFLAGS___udivdi3.o = -DUNSIGNED
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c
--- linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c	2003-12-09 11:31:23.000000000 +0100
@@ -0,0 +1,42 @@
+/*
+ * ia64 version of "atomic_dec_and_lock()" using
+ * the atomic "cmpxchg" instruction.
+ * This code is an adaptation of the x86 version
+ * of "atomic_dec_and_lock()".
+ */
+
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+
+#ifndef ATOMIC_DEC_AND_LOCK
+int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
+{
+ int counter;
+ int newcount;
+
+repeat:
+ counter = atomic_read(atomic);
+ newcount = counter-1;
+
+ if (!newcount)
+ goto slow_path;
+
+ asm volatile("mov ar.ccv=%1;;\n\t"
+ "cmpxchg4.acq %0=%2,%3,ar.ccv;;"
+ :"=r" (newcount)
+ :"r" (counter), "m" (atomic->counter), "r" (newcount)
+ :"ar.ccv");
+
+ if (newcount != counter)
+ goto repeat;
+ return 0;
+
+slow_path:
+ spin_lock(lock);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ spin_unlock(lock);
+ return 0;
+}
+#endif
[-- Attachment #2: lockmeter ia64 patch --]
[-- Type: TEXT/PLAIN, Size: 1480 bytes --]
diff -urN linux-2.6.0-test11.lockmeter.orig/include/asm-ia64/spinlock.h linux-2.6.0-test11.lockmeter/include/asm-ia64/spinlock.h
--- linux-2.6.0-test11.lockmeter.orig/include/asm-ia64/spinlock.h 2003-12-09 13:03:36.000000000 +0100
+++ linux-2.6.0-test11.lockmeter/include/asm-ia64/spinlock.h 2003-12-09 13:08:24.000000000 +0100
@@ -247,13 +247,33 @@
extern void _metered_spin_unlock(spinlock_t *lock);
/*
- * Use a less efficient, and inline, atomic_dec_and_lock() if lockmetering
- * so we can see the callerPC of who is actually doing the spin_lock().
- * Otherwise, all we see is the generic rollup of all locks done by
- * atomic_dec_and_lock().
+ * Matches what is in arch/ia64/lib/dec_and_lock.c, except this one is
+ * "static inline" so that the spin_lock(), if actually invoked, is charged
+ * against the real caller, not against the catch-all atomic_dec_and_lock
*/
static inline int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
+ int counter;
+ int newcount;
+
+repeat:
+ counter = atomic_read(atomic);
+ newcount = counter-1;
+
+ if (!newcount)
+ goto slow_path;
+
+ asm volatile("mov ar.ccv=%1;;\n\t"
+ "cmpxchg4.acq %0=%2,%3,ar.ccv;;"
+ :"=r" (newcount)
+ :"r" (counter), "m" (atomic->counter), "r" (newcount)
+ :"ar.ccv");
+
+ if (newcount != counter)
+ goto repeat;
+ return 0;
+
+slow_path:
_metered_spin_lock(lock);
if (atomic_dec_and_test(atomic))
return 1;
* Re: ia64 atomic_dec_and_lock() patch
2003-12-10 15:37 ia64 atomic_dec_and_lock() patch jerome.marchand
@ 2003-12-10 15:44 ` Christoph Hellwig
2003-12-10 16:06 ` jerome.marchand
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2003-12-10 15:44 UTC (permalink / raw)
To: linux-ia64
> --- linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c	1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c	2003-12-09 11:31:23.000000000 +0100
> @@ -0,0 +1,42 @@
> +/*
> + * ia64 version of "atomic_dec_and_lock()" using
> + * the atomic "cmpxchg" instruction.
> + * This code is an adaptation of the x86 version
> + * of "atomic_dec_and_lock()".
> + */
You should probably add a copyright here and the license (i.e. GPLv2)
> +
> +#include <linux/spinlock.h>
> +#include <asm/atomic.h>
> +
> +#ifndef ATOMIC_DEC_AND_LOCK
Why this ifdef?
* Re: ia64 atomic_dec_and_lock() patch
2003-12-10 15:37 ia64 atomic_dec_and_lock() patch jerome.marchand
2003-12-10 15:44 ` Christoph Hellwig
@ 2003-12-10 16:06 ` jerome.marchand
2003-12-10 17:33 ` David Mosberger
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: jerome.marchand @ 2003-12-10 16:06 UTC (permalink / raw)
To: linux-ia64
On Wed, 10 Dec 2003, Christoph Hellwig wrote:
> You should probably add a copyright here and the license (i.e. GPLv2)
Right.
> > +
> > +#include <linux/spinlock.h>
> > +#include <asm/atomic.h>
> > +
> > +#ifndef ATOMIC_DEC_AND_LOCK
>
> Why this ifdef?
Oops! That is pollution from the lockmeter patch. This ifdef should be in
the ia64-lockmeter patch.
* Re: ia64 atomic_dec_and_lock() patch
2003-12-10 15:37 ia64 atomic_dec_and_lock() patch jerome.marchand
2003-12-10 15:44 ` Christoph Hellwig
2003-12-10 16:06 ` jerome.marchand
@ 2003-12-10 17:33 ` David Mosberger
2003-12-11 9:58 ` jerome.marchand
2003-12-11 20:28 ` David Mosberger
4 siblings, 0 replies; 6+ messages in thread
From: David Mosberger @ 2003-12-10 17:33 UTC (permalink / raw)
To: linux-ia64
Jerome,
>>>>> On Wed, 10 Dec 2003 16:37:22 +0100 (NFT), jerome.marchand@ext.bull.net said:
Jerome> I have run a benchmark that heavily loads the VFS on a 16-CPU
Jerome> Itanium machine. When using lockmeter, I noticed that
Jerome> dcache_lock induces significant contention when called from
Jerome> dput. I observed a case in which 80% of CPU time was spent in
Jerome> spin-wait! The ia64 kernel wastes all this time because there
Jerome> is no ia64-specific implementation of atomic_dec_and_lock(),
Jerome> so the kernel uses the generic function instead. I wrote the
Jerome> ia64 atomic_dec_and_lock() function; since then, dcache_lock
Jerome> never takes more than 0.01% of CPU time, and I have
Jerome> encountered no problems. The patch follows. Does anyone know
Jerome> why this function was not implemented before, even though it
Jerome> is implemented for ia32, ppc, ppc64, sparc64 and alpha?
Because nobody so far has demonstrated a need for it, or nobody has
gotten around to it?
Could you try replacing the inline asm with the cmpxchg() function? That
way, you won't break compilation with Intel's compiler.
--david
* Re: ia64 atomic_dec_and_lock() patch
2003-12-10 15:37 ia64 atomic_dec_and_lock() patch jerome.marchand
` (2 preceding siblings ...)
2003-12-10 17:33 ` David Mosberger
@ 2003-12-11 9:58 ` jerome.marchand
2003-12-11 20:28 ` David Mosberger
4 siblings, 0 replies; 6+ messages in thread
From: jerome.marchand @ 2003-12-11 9:58 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2490 bytes --]
On Wed, 10 Dec 2003, David Mosberger wrote:
> Could you try replacing the inline-asm with cmpxchg() function? That
> way, you won't break compilation with Intel's compiler.
OK, here's the new patch.
I have also attached the corrected lockmeter patch.
diff -urN linux-2.6.0-test11.orig/arch/ia64/Kconfig linux-2.6.0-test11/arch/ia64/Kconfig
--- linux-2.6.0-test11.orig/arch/ia64/Kconfig	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/Kconfig	2003-12-09 11:34:09.000000000 +0100
@@ -375,6 +375,11 @@
depends on IA32_SUPPORT
default y
+config HAVE_DEC_LOCK
+ bool
+ depends on (SMP || PREEMPT)
+ default y
+
config PERFMON
bool "Performance monitor support"
help
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/Makefile linux-2.6.0-test11/arch/ia64/lib/Makefile
--- linux-2.6.0-test11.orig/arch/ia64/lib/Makefile	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/Makefile	2003-12-09 11:32:05.000000000 +0100
@@ -13,6 +13,7 @@
lib-$(CONFIG_MCKINLEY) += copy_page_mck.o memcpy_mck.o
lib-$(CONFIG_PERFMON) += carta_random.o
lib-$(CONFIG_MD_RAID5) += xor.o
+lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o
AFLAGS___divdi3.o =
AFLAGS___udivdi3.o = -DUNSIGNED
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c
--- linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c	2003-12-11 10:42:49.000000000 +0100
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2003 Jerome Marchand, Bull S.A.
+ *
+ * This file is released under the GPLv2, or at
+ * your option any later version.
+ *
+ * ia64 version of "atomic_dec_and_lock()" using
+ * the atomic "cmpxchg" instruction.
+ * This code is an adaptation of the x86 version
+ * of "atomic_dec_and_lock()".
+ */
+
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+
+int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
+{
+ int counter;
+ int newcount;
+
+repeat:
+ counter = atomic_read(atomic);
+ newcount = counter-1;
+
+ if (!newcount)
+ goto slow_path;
+
+ if(cmpxchg(&atomic->counter, counter, newcount) != counter)
+ goto repeat;
+ return 0;
+
+slow_path:
+ spin_lock(lock);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ spin_unlock(lock);
+ return 0;
+}
[-- Attachment #2: lockmeter ia64 patch --]
[-- Type: TEXT/PLAIN, Size: 1914 bytes --]
diff -urN linux-2.6.0-test11.lockmeter.orig/arch/ia64/lib/dec_and_lock.c linux-2.6.0-test11.lockmeter/arch/ia64/lib/dec_and_lock.c
--- linux-2.6.0-test11.lockmeter.orig/arch/ia64/lib/dec_and_lock.c 2003-12-11 10:44:10.000000000 +0100
+++ linux-2.6.0-test11.lockmeter/arch/ia64/lib/dec_and_lock.c 2003-12-11 10:43:47.000000000 +0100
@@ -13,6 +13,7 @@
#include <linux/spinlock.h>
#include <asm/atomic.h>
+#ifndef ATOMIC_DEC_AND_LOCK
int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
int counter;
@@ -36,3 +37,4 @@
spin_unlock(lock);
return 0;
}
+#endif
diff -urN linux-2.6.0-test11.lockmeter.orig/include/asm-ia64/spinlock.h linux-2.6.0-test11.lockmeter/include/asm-ia64/spinlock.h
--- linux-2.6.0-test11.lockmeter.orig/include/asm-ia64/spinlock.h 2003-12-11 10:09:42.000000000 +0100
+++ linux-2.6.0-test11.lockmeter/include/asm-ia64/spinlock.h 2003-12-11 10:15:45.000000000 +0100
@@ -247,13 +247,27 @@
extern void _metered_spin_unlock(spinlock_t *lock);
/*
- * Use a less efficient, and inline, atomic_dec_and_lock() if lockmetering
- * so we can see the callerPC of who is actually doing the spin_lock().
- * Otherwise, all we see is the generic rollup of all locks done by
- * atomic_dec_and_lock().
+ * Matches what is in arch/ia64/lib/dec_and_lock.c, except this one is
+ * "static inline" so that the spin_lock(), if actually invoked, is charged
+ * against the real caller, not against the catch-all atomic_dec_and_lock
*/
static inline int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
{
+ int counter;
+ int newcount;
+
+repeat:
+ counter = atomic_read(atomic);
+ newcount = counter-1;
+
+ if (!newcount)
+ goto slow_path;
+
+ if(cmpxchg(&atomic->counter, counter, newcount) != counter)
+ goto repeat;
+ return 0;
+
+slow_path:
_metered_spin_lock(lock);
if (atomic_dec_and_test(atomic))
return 1;
* Re: ia64 atomic_dec_and_lock() patch
2003-12-10 15:37 ia64 atomic_dec_and_lock() patch jerome.marchand
` (3 preceding siblings ...)
2003-12-11 9:58 ` jerome.marchand
@ 2003-12-11 20:28 ` David Mosberger
4 siblings, 0 replies; 6+ messages in thread
From: David Mosberger @ 2003-12-11 20:28 UTC (permalink / raw)
To: linux-ia64
Jerome,
>>>>> On Thu, 11 Dec 2003 10:58:22 +0100 (NFT), jerome.marchand@ext.bull.net said:
Jerome> On Wed, 10 Dec 2003, David Mosberger wrote:
>> Could you try replacing the inline-asm with cmpxchg() function?
>> That way, you won't break compilation with Intel's compiler.
Jerome> OK, here's the new patch.
Thanks for switching to cmpxchg.
Jerome> I have also attached the corrected lockmeter patch.
You lost me here. The patch seems all messed up to me, because the
standard kernel doesn't include lockmeter and I really don't want any
lockmeter-specific code in the normal kernel.
Secondly, I really dislike using gotos to teach the compiler about
branch probabilities. The likely()/unlikely() macros lead to much more
readable code (yes, I know you simply mirrored the i386 version...).
Third, I think the EXPORT_SYMBOL() was missing for atomic_dec_and_lock().
Anyhow, rather than going through another iteration, I just fixed up
the code myself. See:
http://lia64.bkbits.net:8080/to-linus-2.5/cset@1.1503
With this code, GCC pre3.4 emits this for the fast path:
alloc r2=ar.pfs,2,2,0
1: ld4.acq r15=[r32];;
cmp4.eq p9,p8=1,r15
adds r16=-1,r15
zxt4 r14=r15
(p09) br.cond.dpnt.few slow_path
mov.m ar.ccv=r14;;
cmpxchg4.acq r8=[r32],r16,ar.ccv;;
cmp4.eq p10,p11=r8,r15
(p11) br.cond.dptk.few 1b
mov r8=r0
mov.i ar.pfs=r2
br.ret.sptk.many b0
I think that's about as good as it gets, apart from not being able to
schedule around latencies. If you want to play with this some more,
it might be worthwhile to inline the fast path by moving it into
spinlock.h. But that would only make sense if the
(performance-critical) users of atomic_dec_and_lock are not leaf
functions.
Thanks,
--david