[slubllv4 10/16] slub: Invert locking and avoid slab lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christoph Lameter <cl@linux.com>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Subject: [slubllv4 10/16] slub: Invert locking and avoid slab lock
Date: Fri, 06 May 2011 13:05:51 -0500	[thread overview]
Message-ID: <20110506180700.002723242@linux.com> (raw)
In-Reply-To: 20110506180541.990069206@linux.com

[-- Attachment #1: slab_lock_subsume --]
[-- Type: text/plain, Size: 10314 bytes --]

Locking slabs is no longer necesary if the arch supports cmpxchg operations
and if no debuggin features are used on a slab. If the arch does not support
cmpxchg then we fallback to use the slab lock to do a cmpxchg like operation.

The patch also changes the lock order. Slab locks are subsumed to the node lock
now. With that approach slab_trylocking is no longer necessary.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |  131 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 53 insertions(+), 78 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-06 12:54:35.000000000 -0500
+++ linux-2.6/mm/slub.c	2011-05-06 12:54:36.000000000 -0500
@@ -2,10 +2,11 @@
  * SLUB: A slab allocator that limits cache line use instead of queuing
  * objects in per cpu and per node lists.
  *
- * The allocator synchronizes using per slab locks and only
- * uses a centralized lock to manage a pool of partial slabs.
+ * The allocator synchronizes using per slab locks or atomic operatios
+ * and only uses a centralized lock to manage a pool of partial slabs.
  *
  * (C) 2007 SGI, Christoph Lameter
+ * (C) 2011 Linux Foundation, Christoph Lameter
  */
 
 #include <linux/mm.h>
@@ -32,15 +33,27 @@
 
 /*
  * Lock order:
- *   1. slab_lock(page)
- *   2. slab->list_lock
- *
- *   The slab_lock protects operations on the object of a particular
- *   slab and its metadata in the page struct. If the slab lock
- *   has been taken then no allocations nor frees can be performed
- *   on the objects in the slab nor can the slab be added or removed
- *   from the partial or full lists since this would mean modifying
- *   the page_struct of the slab.
+ *   1. slub_lock (Global Semaphore)
+ *   2. node->list_lock
+ *   3. slab_lock(page) (Only on some arches and for debugging)
+ *
+ *   slub_lock
+ *
+ *   The role of the slub_lock is to protect the list of all the slabs
+ *   and to synchronize major metadata changes to slab cache structures.
+ *
+ *   The slab_lock is only used for debugging and on arches that do not
+ *   have the ability to do a cmpxchg_double. It only protects the second
+ *   double word in the page struct. Meaning
+ *	A. page->freelist	-> List of object free in a page
+ *	B. page->counters	-> Counters of objects
+ *	C. page->frozen		-> frozen state
+ *
+ *   If a slab is frozen then it is exempt from list management. It is not
+ *   on any list. The processor that froze the slab is the one who can
+ *   perform list operations on the page. Other processors may put objects
+ *   onto the freelist but the processor that froze the slab is the only
+ *   one that can retrieve the objects from the page's freelist.
  *
  *   The list_lock protects the partial and full list on each node and
  *   the partial slab counter. If taken then no new slabs may be added or
@@ -53,20 +66,6 @@
  *   slabs, operations can continue without any centralized lock. F.e.
  *   allocating a long series of objects that fill up slabs does not require
  *   the list lock.
- *
- *   The lock order is sometimes inverted when we are trying to get a slab
- *   off a list. We take the list_lock and then look for a page on the list
- *   to use. While we do that objects in the slabs may be freed. We can
- *   only operate on the slab if we have also taken the slab_lock. So we use
- *   a slab_trylock() on the slab. If trylock was successful then no frees
- *   can occur anymore and we can use the slab for allocations etc. If the
- *   slab_trylock() does not succeed then frees are in progress in the slab and
- *   we must stay away from it for a while since we may cause a bouncing
- *   cacheline if we try to acquire the lock. So go onto the next slab.
- *   If all pages are busy then we may allocate a new slab instead of reusing
- *   a partial slab. A new slab has no one operating on it and thus there is
- *   no danger of cacheline contention.
- *
  *   Interrupts are disabled during allocation and deallocation in order to
  *   make the slab allocator safe to use in the context of an irq. In addition
  *   interrupts are disabled to ensure that the processor does not change
@@ -330,6 +329,19 @@ static inline int oo_objects(struct kmem
 	return x.x & OO_MASK;
 }
 
+/*
+ * Per slab locking using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+	bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+	__bit_spin_unlock(PG_locked, &page->flags);
+}
+
 static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
 		void *freelist_new, unsigned long counters_new,
@@ -344,11 +356,14 @@ static inline bool cmpxchg_double_slab(s
 	} else
 #endif
 	{
+		slab_lock(page);
 		if (page->freelist == freelist_old && page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
+			slab_unlock(page);
 			return 1;
 		}
+		slab_unlock(page);
 	}
 
 	cpu_relax();
@@ -364,7 +379,7 @@ static inline bool cmpxchg_double_slab(s
 /*
  * Determine a map of object in use on a page.
  *
- * Slab lock or node listlock must be held to guarantee that the page does
+ * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
 static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
@@ -796,10 +811,11 @@ static int check_slab(struct kmem_cache
 static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 {
 	int nr = 0;
-	void *fp = page->freelist;
+	void *fp;
 	void *object = NULL;
 	unsigned long max_objects;
 
+	fp = page->freelist;
 	while (fp && nr <= page->objects) {
 		if (fp == search)
 			return 1;
@@ -1007,6 +1023,8 @@ bad:
 static noinline int free_debug_processing(struct kmem_cache *s,
 		 struct page *page, void *object, unsigned long addr)
 {
+	slab_lock(page);
+
 	if (!check_slab(s, page))
 		goto fail;
 
@@ -1042,10 +1060,12 @@ static noinline int free_debug_processin
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
 	init_object(s, object, SLUB_RED_INACTIVE);
+	slab_unlock(page);
 	return 1;
 
 fail:
 	slab_fix(s, "Object at 0x%p not freed", object);
+	slab_unlock(page);
 	return 0;
 }
 
@@ -1365,27 +1385,6 @@ static void discard_slab(struct kmem_cac
 }
 
 /*
- * Per slab locking using the pagelock
- */
-static __always_inline void slab_lock(struct page *page)
-{
-	bit_spin_lock(PG_locked, &page->flags);
-}
-
-static __always_inline void slab_unlock(struct page *page)
-{
-	__bit_spin_unlock(PG_locked, &page->flags);
-}
-
-static __always_inline int slab_trylock(struct page *page)
-{
-	int rc = 1;
-
-	rc = bit_spin_trylock(PG_locked, &page->flags);
-	return rc;
-}
-
-/*
  * Management of partially allocated slabs
  */
 static inline void add_partial(struct kmem_cache_node *n,
@@ -1411,17 +1410,13 @@ static inline void remove_partial(struct
  *
  * Must hold list_lock.
  */
-static inline int lock_and_freeze_slab(struct kmem_cache *s,
+static inline int acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page)
 {
 	void *freelist;
 	unsigned long counters;
 	struct page new;
 
-
-	if (!slab_trylock(page))
-		return 0;
-
 	/*
 	 * Zap the freelist and set the frozen bit.
 	 * The old freelist is the list of objects for the
@@ -1457,7 +1452,6 @@ static inline int lock_and_freeze_slab(s
 		 */
 		printk(KERN_ERR "SLUB: %s : Page without available objects on"
 			" partial list\n", s->name);
-		slab_unlock(page);
 		return 0;
 	}
 }
@@ -1481,7 +1475,7 @@ static struct page *get_partial_node(str
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(s, n, page))
+		if (acquire_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
@@ -1770,8 +1764,6 @@ redo:
 				"unfreezing slab"))
 		goto redo;
 
-	slab_unlock(page);
-
 	if (lock)
 		spin_unlock(&n->list_lock);
 
@@ -1784,7 +1776,6 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	slab_lock(c->page);
 	deactivate_slab(s, c);
 }
 
@@ -1931,7 +1922,6 @@ static void *__slab_alloc(struct kmem_ca
 	if (!page)
 		goto new_slab;
 
-	slab_lock(page);
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
@@ -1960,8 +1950,6 @@ load_freelist:
 	if (unlikely(!object))
 		goto another_slab;
 
-	slab_unlock(page);
-
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
@@ -2009,7 +1997,6 @@ load_from_page:
 		c->page = page;
 
 		stat(s, ALLOC_SLAB);
-		slab_lock(page);
 
 		goto load_from_page;
 	}
@@ -2198,7 +2185,6 @@ static void __slab_free(struct kmem_cach
 	unsigned long flags;
 
 	local_irq_save(flags);
-	slab_lock(page);
 	stat(s, FREE_SLOWPATH);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
@@ -2264,7 +2250,6 @@ static void __slab_free(struct kmem_cach
 	spin_unlock(&n->list_lock);
 
 out_unlock:
-	slab_unlock(page);
 	local_irq_restore(flags);
 	return;
 
@@ -2278,7 +2263,6 @@ slab_empty:
 	}
 
 	spin_unlock(&n->list_lock);
-	slab_unlock(page);
 	local_irq_restore(flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
@@ -3199,14 +3183,8 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse && slab_trylock(page)) {
-				/*
-				 * Must hold slab lock here because slab_free
-				 * may have freed the last object and be
-				 * waiting to release the slab.
-				 */
+			if (!page->inuse) {
 				remove_partial(n, page);
-				slab_unlock(page);
 				discard_slab(s, page);
 			} else {
 				list_move(&page->lru,
@@ -3794,12 +3772,9 @@ static int validate_slab(struct kmem_cac
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
-	if (slab_trylock(page)) {
-		validate_slab(s, page, map);
-		slab_unlock(page);
-	} else
-		printk(KERN_INFO "SLUB %s: Skipped busy slab 0x%p\n",
-			s->name, page);
+	slab_lock(page);
+	validate_slab(s, page, map);
+	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,

next prev parent reply	other threads:[~2011-05-06 18:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-06 18:05 [slubllv4 00/16] SLUB: Lockless freelists for objects V4 Christoph Lameter
2011-05-06 18:05 ` [slubllv4 01/16] slub: Per object NUMA support Christoph Lameter
2011-05-06 18:05 ` [slubllv4 02/16] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
2011-05-06 18:05 ` [slubllv4 03/16] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
2011-05-06 18:05 ` [slubllv4 04/16] x86: Add support for cmpxchg_double Christoph Lameter
2011-05-06 18:05 ` [slubllv4 05/16] mm: Rearrange struct page Christoph Lameter
2011-05-06 18:05 ` [slubllv4 06/16] slub: Add cmpxchg_double_slab() Christoph Lameter
2011-05-06 18:05 ` [slubllv4 07/16] slub: explicit list_lock taking Christoph Lameter
2011-05-06 18:05 ` [slubllv4 08/16] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
2011-05-06 18:05 ` [slubllv4 09/16] slub: Rework allocator fastpaths Christoph Lameter
2011-05-06 18:05 ` Christoph Lameter [this message]
2011-05-06 18:05 ` [slubllv4 11/16] slub: Disable interrupts in free_debug processing Christoph Lameter
2011-05-06 18:05 ` [slubllv4 12/16] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
2011-05-06 18:05 ` [slubllv4 13/16] slub: Get rid of the another_slab label Christoph Lameter
2011-05-06 18:05 ` [slubllv4 14/16] slub: fast release on full slab Christoph Lameter
2011-05-06 18:05 ` [slubllv4 15/16] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
2011-05-06 18:05 ` [slubllv4 16/16] slub: update statistics for cmpxchg handling Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110506180700.002723242@linux.com \
    --to=cl@linux.com \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.