I have run a benchmark that puts a heavy load on the VFS on a 16-CPU Itanium machine. Using lockmeter, I noticed that dcache_lock causes significant contention when taken from dput(): I observed a case in which 80% of CPU time was spent spin-waiting! The ia64 kernel wastes all this time because there is no ia64-specific implementation of atomic_dec_and_lock(), so it falls back to the generic function. I wrote an ia64 atomic_dec_and_lock(); with it, dcache_lock never uses more than 0.01% of CPU time, and I have encountered no problems. The patch is below.

Does anyone know why this function was not implemented before, when it is implemented for ia32, ppc, ppc64, sparc64 and alpha processors?

Jerome Marchand

PS: I have also attached the lockmeter patch to this mail.

diff -urN linux-2.6.0-test11.orig/arch/ia64/Kconfig linux-2.6.0-test11/arch/ia64/Kconfig
--- linux-2.6.0-test11.orig/arch/ia64/Kconfig	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/Kconfig	2003-12-09 11:34:09.000000000 +0100
@@ -375,6 +375,11 @@
 	depends on IA32_SUPPORT
 	default y
 
+config HAVE_DEC_LOCK
+	bool
+	depends on (SMP || PREEMPT)
+	default y
+
 config PERFMON
 	bool "Performance monitor support"
 	help
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/Makefile linux-2.6.0-test11/arch/ia64/lib/Makefile
--- linux-2.6.0-test11.orig/arch/ia64/lib/Makefile	2003-12-09 11:26:58.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/Makefile	2003-12-09 11:32:05.000000000 +0100
@@ -13,6 +13,7 @@
 lib-$(CONFIG_MCKINLEY)	+= copy_page_mck.o memcpy_mck.o
 lib-$(CONFIG_PERFMON)	+= carta_random.o
 lib-$(CONFIG_MD_RAID5)	+= xor.o
+lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o
 
 AFLAGS___divdi3.o	=
 AFLAGS___udivdi3.o	= -DUNSIGNED
diff -urN linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c
--- linux-2.6.0-test11.orig/arch/ia64/lib/dec_and_lock.c	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test11/arch/ia64/lib/dec_and_lock.c	2003-12-09 11:31:23.000000000 +0100
@@ -0,0 +1,42 @@
+/*
+ * ia64 version of "atomic_dec_and_lock()" using
+ * the atomic "cmpxchg" instruction.
+ * This code is an adaptation of the x86 version
+ * of "atomic_dec_and_lock()".
+ */
+
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+
+#ifndef ATOMIC_DEC_AND_LOCK
+int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
+{
+	int counter;
+	int newcount;
+
+repeat:
+	counter = atomic_read(atomic);
+	newcount = counter - 1;
+
+	if (!newcount)
+		goto slow_path;
+
+	asm volatile("mov ar.ccv=%1;;\n\t"
+		     "cmpxchg4.acq %0=%2,%3,ar.ccv;;"
+		     : "=r" (newcount)
+		     : "r" (counter), "m" (atomic->counter), "r" (newcount)
+		     : "ar.ccv");
+
+	if (newcount != counter)
+		goto repeat;
+	return 0;
+
+slow_path:
+	spin_lock(lock);
+	if (atomic_dec_and_test(atomic))
+		return 1;
+	spin_unlock(lock);
+	return 0;
+}
+#endif