[slubllv5 13/25] slub: Invert locking and avoid slab lock

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Christoph Lameter <cl@linux.com>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-mm@kvack.org, Thomas Gleixner <tglx@linutronix.de>
Subject: [slubllv5 13/25] slub: Invert locking and avoid slab lock
Date: Mon, 16 May 2011 15:26:18 -0500	[thread overview]
Message-ID: <20110516202628.699365728@linux.com> (raw)
In-Reply-To: 20110516202605.274023469@linux.com

[-- Attachment #1: slab_lock_subsume --]
[-- Type: text/plain, Size: 10665 bytes --]

Locking slabs is no longer necesary if the arch supports cmpxchg operations
and if no debuggin features are used on a slab. If the arch does not support
cmpxchg then we fallback to use the slab lock to do a cmpxchg like operation.

The patch also changes the lock order. Slab locks are subsumed to the node lock
now. With that approach slab_trylocking is no longer necessary.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |  131 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 53 insertions(+), 78 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-16 12:45:33.241458954 -0500
+++ linux-2.6/mm/slub.c	2011-05-16 12:45:39.451458948 -0500
@@ -2,10 +2,11 @@
  * SLUB: A slab allocator that limits cache line use instead of queuing
  * objects in per cpu and per node lists.
  *
- * The allocator synchronizes using per slab locks and only
- * uses a centralized lock to manage a pool of partial slabs.
+ * The allocator synchronizes using per slab locks or atomic operatios
+ * and only uses a centralized lock to manage a pool of partial slabs.
  *
  * (C) 2007 SGI, Christoph Lameter
+ * (C) 2011 Linux Foundation, Christoph Lameter
  */
 
 #include <linux/mm.h>
@@ -32,15 +33,27 @@
 
 /*
  * Lock order:
- *   1. slab_lock(page)
- *   2. slab->list_lock
- *
- *   The slab_lock protects operations on the object of a particular
- *   slab and its metadata in the page struct. If the slab lock
- *   has been taken then no allocations nor frees can be performed
- *   on the objects in the slab nor can the slab be added or removed
- *   from the partial or full lists since this would mean modifying
- *   the page_struct of the slab.
+ *   1. slub_lock (Global Semaphore)
+ *   2. node->list_lock
+ *   3. slab_lock(page) (Only on some arches and for debugging)
+ *
+ *   slub_lock
+ *
+ *   The role of the slub_lock is to protect the list of all the slabs
+ *   and to synchronize major metadata changes to slab cache structures.
+ *
+ *   The slab_lock is only used for debugging and on arches that do not
+ *   have the ability to do a cmpxchg_double. It only protects the second
+ *   double word in the page struct. Meaning
+ *	A. page->freelist	-> List of object free in a page
+ *	B. page->counters	-> Counters of objects
+ *	C. page->frozen		-> frozen state
+ *
+ *   If a slab is frozen then it is exempt from list management. It is not
+ *   on any list. The processor that froze the slab is the one who can
+ *   perform list operations on the page. Other processors may put objects
+ *   onto the freelist but the processor that froze the slab is the only
+ *   one that can retrieve the objects from the page's freelist.
  *
  *   The list_lock protects the partial and full list on each node and
  *   the partial slab counter. If taken then no new slabs may be added or
@@ -53,20 +66,6 @@
  *   slabs, operations can continue without any centralized lock. F.e.
  *   allocating a long series of objects that fill up slabs does not require
  *   the list lock.
- *
- *   The lock order is sometimes inverted when we are trying to get a slab
- *   off a list. We take the list_lock and then look for a page on the list
- *   to use. While we do that objects in the slabs may be freed. We can
- *   only operate on the slab if we have also taken the slab_lock. So we use
- *   a slab_trylock() on the slab. If trylock was successful then no frees
- *   can occur anymore and we can use the slab for allocations etc. If the
- *   slab_trylock() does not succeed then frees are in progress in the slab and
- *   we must stay away from it for a while since we may cause a bouncing
- *   cacheline if we try to acquire the lock. So go onto the next slab.
- *   If all pages are busy then we may allocate a new slab instead of reusing
- *   a partial slab. A new slab has no one operating on it and thus there is
- *   no danger of cacheline contention.
- *
  *   Interrupts are disabled during allocation and deallocation in order to
  *   make the slab allocator safe to use in the context of an irq. In addition
  *   interrupts are disabled to ensure that the processor does not change
@@ -346,7 +345,7 @@ static inline int oo_objects(struct kmem
 /*
  * Determine a map of object in use on a page.
  *
- * Slab lock or node listlock must be held to guarantee that the page does
+ * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
 static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
@@ -358,6 +357,19 @@ static void get_map(struct kmem_cache *s
 		set_bit(slab_index(p, s, addr), map);
 }
 
+/*
+ * Per slab locking using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+	bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+	__bit_spin_unlock(PG_locked, &page->flags);
+}
+
 static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
 		void *freelist_new, unsigned long counters_new,
@@ -372,11 +384,14 @@ static inline bool cmpxchg_double_slab(s
 	} else
 #endif
 	{
+		slab_lock(page);
 		if (page->freelist == freelist_old && page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
+			slab_unlock(page);
 			return 1;
 		}
+		slab_unlock(page);
 	}
 
 	cpu_relax();
@@ -808,10 +823,11 @@ static int check_slab(struct kmem_cache
 static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 {
 	int nr = 0;
-	void *fp = page->freelist;
+	void *fp;
 	void *object = NULL;
 	unsigned long max_objects;
 
+	fp = page->freelist;
 	while (fp && nr <= page->objects) {
 		if (fp == search)
 			return 1;
@@ -1019,6 +1035,8 @@ bad:
 static noinline int free_debug_processing(struct kmem_cache *s,
 		 struct page *page, void *object, unsigned long addr)
 {
+	slab_lock(page);
+
 	if (!check_slab(s, page))
 		goto fail;
 
@@ -1054,10 +1072,12 @@ static noinline int free_debug_processin
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
 	init_object(s, object, SLUB_RED_INACTIVE);
+	slab_unlock(page);
 	return 1;
 
 fail:
 	slab_fix(s, "Object at 0x%p not freed", object);
+	slab_unlock(page);
 	return 0;
 }
 
@@ -1385,27 +1405,6 @@ static void discard_slab(struct kmem_cac
 }
 
 /*
- * Per slab locking using the pagelock
- */
-static __always_inline void slab_lock(struct page *page)
-{
-	bit_spin_lock(PG_locked, &page->flags);
-}
-
-static __always_inline void slab_unlock(struct page *page)
-{
-	__bit_spin_unlock(PG_locked, &page->flags);
-}
-
-static __always_inline int slab_trylock(struct page *page)
-{
-	int rc = 1;
-
-	rc = bit_spin_trylock(PG_locked, &page->flags);
-	return rc;
-}
-
-/*
  * Management of partially allocated slabs
  */
 static inline void add_partial(struct kmem_cache_node *n,
@@ -1431,17 +1430,13 @@ static inline void remove_partial(struct
  *
  * Must hold list_lock.
  */
-static inline int lock_and_freeze_slab(struct kmem_cache *s,
+static inline int acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page)
 {
 	void *freelist;
 	unsigned long counters;
 	struct page new;
 
-
-	if (!slab_trylock(page))
-		return 0;
-
 	/*
 	 * Zap the freelist and set the frozen bit.
 	 * The old freelist is the list of objects for the
@@ -1477,7 +1472,6 @@ static inline int lock_and_freeze_slab(s
 		 */
 		printk(KERN_ERR "SLUB: %s : Page without available objects on"
 			" partial list\n", s->name);
-		slab_unlock(page);
 		return 0;
 	}
 }
@@ -1501,7 +1495,7 @@ static struct page *get_partial_node(str
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(s, n, page))
+		if (acquire_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
@@ -1790,8 +1784,6 @@ redo:
 				"unfreezing slab"))
 		goto redo;
 
-	slab_unlock(page);
-
 	if (lock)
 		spin_unlock(&n->list_lock);
 
@@ -1805,7 +1797,6 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	slab_lock(c->page);
 	deactivate_slab(s, c);
 }
 
@@ -1954,7 +1945,6 @@ static void *__slab_alloc(struct kmem_ca
 	if (!page)
 		goto new_slab;
 
-	slab_lock(page);
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
@@ -1980,8 +1970,6 @@ load_freelist:
 
 	stat(s, ALLOC_REFILL);
 
-	slab_unlock(page);
-
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
@@ -2017,7 +2005,6 @@ new_slab:
 		page->inuse = page->objects;
 
 		stat(s, ALLOC_SLAB);
-		slab_lock(page);
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -2190,7 +2177,6 @@ static void __slab_free(struct kmem_cach
 	unsigned long flags;
 
 	local_irq_save(flags);
-	slab_lock(page);
 	stat(s, FREE_SLOWPATH);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
@@ -2256,7 +2242,6 @@ static void __slab_free(struct kmem_cach
 	spin_unlock(&n->list_lock);
 
 out_unlock:
-	slab_unlock(page);
 	local_irq_restore(flags);
 	return;
 
@@ -2270,7 +2255,6 @@ slab_empty:
 	}
 
 	spin_unlock(&n->list_lock);
-	slab_unlock(page);
 	local_irq_restore(flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
@@ -3191,14 +3175,8 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse && slab_trylock(page)) {
-				/*
-				 * Must hold slab lock here because slab_free
-				 * may have freed the last object and be
-				 * waiting to release the slab.
-				 */
+			if (!page->inuse) {
 				remove_partial(n, page);
-				slab_unlock(page);
 				discard_slab(s, page);
 			} else {
 				list_move(&page->lru,
@@ -3786,12 +3764,9 @@ static int validate_slab(struct kmem_cac
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
-	if (slab_trylock(page)) {
-		validate_slab(s, page, map);
-		slab_unlock(page);
-	} else
-		printk(KERN_INFO "SLUB %s: Skipped busy slab 0x%p\n",
-			s->name, page);
+	slab_lock(page);
+	validate_slab(s, page, map);
+	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2011-05-16 20:26 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-16 20:26 [slubllv5 00/25] SLUB: Lockless freelists for objects V5 Christoph Lameter
2011-05-16 20:26 ` [slubllv5 01/25] slub: Avoid warning for !CONFIG_SLUB_DEBUG Christoph Lameter
2011-05-16 20:26 ` [slubllv5 02/25] slub: Fix control flow in slab_alloc Christoph Lameter
2011-05-16 20:26 ` [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath Christoph Lameter
2011-05-17  4:52   ` Eric Dumazet
2011-05-17 13:46     ` Christoph Lameter
2011-05-17 19:22       ` Pekka Enberg
2011-05-16 20:26 ` [slubllv5 04/25] slub: Push irq disable into allocate_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 05/25] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
2011-05-16 20:26 ` [slubllv5 06/25] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
2011-05-16 20:26 ` [slubllv5 07/25] x86: Add support for cmpxchg_double Christoph Lameter
2011-05-26 17:57   ` Pekka Enberg
2011-05-26 18:02     ` Christoph Lameter
2011-05-26 18:05   ` H. Peter Anvin
2011-05-26 18:17     ` Christoph Lameter
2011-05-26 18:29       ` H. Peter Anvin
2011-05-26 18:42         ` Christoph Lameter
2011-05-26 21:16         ` Christoph Lameter
2011-05-26 21:21           ` H. Peter Anvin
2011-05-26 21:25           ` Eric Dumazet
2011-05-26 21:31             ` H. Peter Anvin
2011-05-26 21:45               ` Eric Dumazet
2011-05-27  0:49                 ` H. Peter Anvin
2011-05-31 15:13             ` Christoph Lameter
2011-05-31 15:16               ` H. Peter Anvin
2011-05-31 16:53                 ` Christoph Lameter
2011-05-31 23:16                   ` H. Peter Anvin
2011-05-31 23:49                     ` Christoph Lameter
2011-05-31 23:54                       ` H. Peter Anvin
2011-06-01 14:13                         ` Christoph Lameter
2011-06-01 14:46                           ` Christoph Lameter
2011-06-01 15:42                             ` H. Peter Anvin
2011-06-01 16:08                               ` Christoph Lameter
2011-06-01 15:41                           ` H. Peter Anvin
2011-05-27  0:50           ` H. Peter Anvin
2011-05-31 15:10             ` Christoph Lameter
2011-05-16 20:26 ` [slubllv5 08/25] mm: Rearrange struct page Christoph Lameter
2011-05-16 20:26 ` [slubllv5 09/25] slub: Add cmpxchg_double_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 10/25] slub: explicit list_lock taking Christoph Lameter
2011-05-16 20:26 ` [slubllv5 11/25] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
2011-05-16 20:26 ` [slubllv5 12/25] slub: Rework allocator fastpaths Christoph Lameter
2011-05-16 20:26 ` Christoph Lameter [this message]
2011-05-16 20:26 ` [slubllv5 14/25] slub: Disable interrupts in free_debug processing Christoph Lameter
2011-05-16 20:26 ` [slubllv5 15/25] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
2011-05-16 20:26 ` [slubllv5 16/25] slub: Get rid of the another_slab label Christoph Lameter
2011-05-16 20:26 ` [slubllv5 17/25] slub: Add statistics for the case that the current slab does not match the node Christoph Lameter
2011-05-16 20:26 ` [slubllv5 18/25] slub: fast release on full slab Christoph Lameter
2011-05-16 20:26 ` [slubllv5 19/25] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
2011-05-16 20:26 ` [slubllv5 20/25] slub: slabinfo update for cmpxchg handling Christoph Lameter
2011-05-16 20:26 ` [slubllv5 21/25] slub: Prepare inuse field in new_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 22/25] slub: pass kmem_cache_cpu pointer to get_partial() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 23/25] slub: return object pointer from get_partial() / new_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 24/25] slub: Remove gotos from __slab_free() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 25/25] slub: Remove gotos from __slab_alloc() Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110516202628.699365728@linux.com \
    --to=cl@linux.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-mm@kvack.org \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).