All of lore.kernel.org
 help / color / mirror / Atom feed
* [IA64] Reduce __clear_bit_unlock overhead
@ 2007-10-19  3:38 Christoph Lameter
  2007-10-19  4:34 ` Nick Piggin
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Christoph Lameter @ 2007-10-19  3:38 UTC (permalink / raw)
  To: linux-ia64

On Fri, 19 Oct 2007, Nick Piggin wrote:

> I'm not sure, I had an idea it was relatively expensive on ia64,
> but I didn't really test with a good workload (a microbenchmark
> probably isn't that good because it won't generate too much out
> of order memory traffic that needs to be fenced).

Its expensive on IA64. Is it any less expensive on x86?

> > Where can I find your patchset? I looked through lkml but did not see it.
> 
> Infrastructure in -mm, starting at bitops-introduce-lock-ops.patch.
> bit_spin_lock-use-lock-bitops.patch and ia64-lock-bitops.patch are
> ones to look at.

ia64-lock-bitops.patch defines:

static __inline__ void
clear_bit_unlock (int nr, volatile void *addr)
{
       __u32 mask, old, new;
       volatile __u32 *m;
       CMPXCHG_BUGCHECK_DECL

       m = (volatile __u32 *) addr + (nr >> 5);
       mask = ~(1 << (nr & 31));
       do {
               CMPXCHG_BUGCHECK(m);
               old = *m;
               new = old & mask;
       } while (cmpxchg_rel(m, old, new) != old);
}

/**
 * __clear_bit_unlock - Non-atomically clear a bit with release
 *
 * This is like clear_bit_unlock, but the implementation may use a non-atomic
 * store (this one uses an atomic, however).
 */
#define __clear_bit_unlock clear_bit_unlock


A non atomic store is a misaligned store on IA64. That is not 
relevant here. The data is properly aligned. I guess it was intended to
refer to the cmpxchg.

How about this patch? [Works fine on IA64 simulator...]




IA64: Slim down __clear_bit_unlock

__clear_bit_unlock does not need to perform atomic operations on the variable.
Avoid a cmpxchg and simply do a store with release semantics. Add a barrier to
be safe that the compiler does not do funky things.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-ia64/bitops.h |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux-2.6.23-mm1/include/asm-ia64/bitops.h
=================================--- linux-2.6.23-mm1.orig/include/asm-ia64/bitops.h	2007-10-18 19:37:22.000000000 -0700
+++ linux-2.6.23-mm1/include/asm-ia64/bitops.h	2007-10-18 19:50:22.000000000 -0700
@@ -124,10 +124,21 @@ clear_bit_unlock (int nr, volatile void 
 /**
  * __clear_bit_unlock - Non-atomically clear a bit with release
  *
- * This is like clear_bit_unlock, but the implementation may use a non-atomic
- * store (this one uses an atomic, however).
+ * This is like clear_bit_unlock, but the implementation uses a store
+ * with release semantics. See also __raw_spin_unlock().
  */
-#define __clear_bit_unlock clear_bit_unlock
+static __inline__ void
+__clear_bit_unlock (int nr, volatile void *addr)
+{
+	__u32 mask, new;
+	volatile __u32 *m;
+
+	m = (volatile __u32 *) addr + (nr >> 5);
+	mask = ~(1 << (nr & 31));
+	new = *m & mask;
+	barrier();
+	asm volatile ("st4.rel.nta [%0] = %1\n\t" :: "r"(m), "r"(new));
+}
 
 /**
  * __clear_bit - Clears a bit in memory (non-atomic version)

^ permalink raw reply	[flat|nested] 11+ messages in thread
* SLUB: Avoid atomic operation for slab_unlock
@ 2007-10-18 22:15 Christoph Lameter
  2007-10-19  1:56 ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Lameter @ 2007-10-18 22:15 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm

Currently page flags are only modified in SLUB under page lock. This means
that we do not need an atomic operation to release the lock since there
is nothing we can race against that is modifying page flags. We can simply
clear the bit without the use of an atomic operation and make sure that this
change becomes visible after the other changes to slab metadata through
a memory barrier.

The performance of slab_free() increases 10-15% (SMP configuration doing
a long series of remote frees).

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-10-18 14:12:59.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-10-18 14:24:43.000000000 -0700
@@ -1180,9 +1180,22 @@ static __always_inline void slab_lock(st
 	bit_spin_lock(PG_locked, &page->flags);
 }
 
+/*
+ * Slab unlock version that avoids having to use atomic operations
+ * (echos some of the code of bit_spin_unlock!)
+ */
 static __always_inline void slab_unlock(struct page *page)
 {
-	bit_spin_unlock(PG_locked, &page->flags);
+#ifdef CONFIG_SMP
+	unsigned long flags;
+
+	flags = page->flags & ~(1 << PG_locked);
+
+	smp_wmb();
+	page->flags = flags;
+#endif
+	preempt_enable();
+	__release(bitlock);
 }
 
 static __always_inline int slab_trylock(struct page *page)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-10-21  4:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-19  3:38 [IA64] Reduce __clear_bit_unlock overhead Christoph Lameter
2007-10-19  4:34 ` Nick Piggin
2007-10-19  9:14 ` Zoltan Menyhart
2007-10-19  9:28 ` Nick Piggin
2007-10-19 10:58 ` Christoph Lameter
2007-10-19 11:12 ` Christoph Lameter
2007-10-19 14:15 ` Zoltan Menyhart
2007-10-19 17:44 ` Christoph Lameter
2007-10-21  4:43 ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2007-10-18 22:15 SLUB: Avoid atomic operation for slab_unlock Christoph Lameter
2007-10-19  1:56 ` Nick Piggin
2007-10-19  2:01   ` Christoph Lameter
2007-10-19  2:12     ` Nick Piggin
2007-10-19  3:26       ` [IA64] Reduce __clear_bit_unlock overhead Christoph Lameter
2007-10-19  3:26         ` Christoph Lameter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.