From: Christoph Lameter <cl@linux.com>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-mm@kvack.org, Thomas Gleixner <tglx@linutronix.de>
Subject: [slubllv5 13/25] slub: Invert locking and avoid slab lock
Date: Mon, 16 May 2011 15:26:18 -0500
Message-ID: <20110516202628.699365728@linux.com>
In-Reply-To: 20110516202605.274023469@linux.com

[-- Attachment #1: slab_lock_subsume --]
[-- Type: text/plain, Size: 10665 bytes --]

Locking slabs is no longer necessary if the arch supports cmpxchg_double
and no debugging features are enabled for the slab cache. If the arch does not
support cmpxchg_double, we fall back to using the slab lock to perform a
cmpxchg-like operation.
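To illustrate the paragraph above, here is a minimal user-space sketch in C. It is not kernel code. It assumes a hypothetical `struct fake_page` whose freelist and counters are shrunk to 32 bits each and packed into one 64-bit word, so that a single C11 `atomic_compare_exchange_strong()` can stand in for `cmpxchg_double`. The `have_cmpxchg` flag and the `page_lock()`/`page_unlock()` spinlock are illustrative stand-ins for the arch check and the `PG_locked` bit spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the "second double word" of struct page:
 * a 32-bit freelist index and 32-bit counters packed into one word so a
 * single-width compare-exchange can model cmpxchg_double.
 */
struct fake_page {
	_Atomic uint64_t word;   /* (freelist << 32) | counters */
	atomic_bool locked;      /* models the PG_locked bit spinlock */
};

static uint64_t pack(uint32_t freelist, uint32_t counters)
{
	return ((uint64_t)freelist << 32) | counters;
}

static void fake_page_init(struct fake_page *p, uint32_t freelist, uint32_t counters)
{
	atomic_init(&p->word, pack(freelist, counters));
	atomic_init(&p->locked, false);
}

/* Test-and-set spinlock, analogous to bit_spin_lock()/__bit_spin_unlock(). */
static void page_lock(struct fake_page *p)
{
	while (atomic_exchange_explicit(&p->locked, true, memory_order_acquire))
		;
}

static void page_unlock(struct fake_page *p)
{
	atomic_store_explicit(&p->locked, false, memory_order_release);
}

/*
 * Replace (freelist, counters) only if both still hold the expected values.
 * have_cmpxchg models the arch check; a real system uses one path only.
 */
static bool cmpxchg_double_model(struct fake_page *p, bool have_cmpxchg,
				 uint32_t fl_old, uint32_t cnt_old,
				 uint32_t fl_new, uint32_t cnt_new)
{
	uint64_t expected = pack(fl_old, cnt_old);
	uint64_t desired = pack(fl_new, cnt_new);

	if (have_cmpxchg)
		return atomic_compare_exchange_strong(&p->word, &expected, desired);

	/* Fallback: emulate the compare-and-swap under the per-page lock. */
	page_lock(p);
	if (atomic_load_explicit(&p->word, memory_order_relaxed) == expected) {
		atomic_store_explicit(&p->word, desired, memory_order_relaxed);
		page_unlock(p);
		return true;
	}
	page_unlock(p);
	return false;
}
```

For example, after `fake_page_init(&p, 5, 3)`, a call that expects `(5, 3)` succeeds and installs the new pair. A second call that still expects `(5, 3)` fails, and its caller would reload and retry, in the same way that the kernel's callers loop on `cmpxchg_double_slab()`. The locked path is only correct if every updater of the word takes the same lock. For that reason an arch uses one path or the other, never both.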

The patch also changes the lock order. Slab locks are now subsumed under the
node lock (list_lock), so slab_trylock() is no longer necessary.
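Here is a rough model of why the trylock goes away. With `list_lock` as the outer lock, a partial slab can be frozen by a single compare-exchange that takes its whole freelist. If a concurrent free changed the word in the meantime, the compare-exchange simply retries. The sketch below is a user-space approximation, not the patch's `acquire_slab()`. The names `struct model_page` and `struct model_node`, the `FROZEN` bit position, and the convention that index 0 means an empty freelist are all hypothetical. Unlike the kernel, it leaves a page with an empty freelist untouched instead of reporting an error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FROZEN 0x80000000u  /* hypothetical frozen bit in the counters */

/* Hypothetical page: freelist index (0 = empty) and counters in one word. */
struct model_page {
	_Atomic uint64_t word;    /* (freelist << 32) | counters */
	struct model_page *next;  /* partial-list link, protected by list_lock */
};

struct model_node {
	atomic_bool list_lock;    /* the node lock, now the outer lock */
	struct model_page *partial;
};

static void list_lock_acquire(atomic_bool *l)
{
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;
}

static void list_lock_release(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

/*
 * Called with list_lock held. Freeze the page and take its whole freelist
 * in one compare-exchange; a concurrent free only makes the CAS retry, so
 * no per-page trylock is needed. Returns 0 if there was nothing to take.
 */
static uint32_t acquire_model(struct model_page *page)
{
	uint64_t old = atomic_load(&page->word);
	uint64_t desired;

	do {
		if ((old >> 32) == 0)
			return 0;  /* empty freelist: leave the page alone */
		desired = (uint64_t)((uint32_t)old | FROZEN);  /* freelist = 0 */
	} while (!atomic_compare_exchange_weak(&page->word, &old, desired));

	return (uint32_t)(old >> 32);
}

/* Walk the partial list under list_lock and unlink the first usable page. */
static struct model_page *get_partial_model(struct model_node *n,
					    uint32_t *freelist)
{
	struct model_page **pp;
	struct model_page *page = NULL;

	list_lock_acquire(&n->list_lock);
	for (pp = &n->partial; *pp; pp = &(*pp)->next) {
		*freelist = acquire_model(*pp);
		if (*freelist) {
			page = *pp;
			*pp = page->next;  /* like remove_partial() */
			page->next = NULL;
			break;
		}
	}
	list_lock_release(&n->list_lock);
	return page;
}
```

Only one lock is ever held here, and it is always the node lock. This ordering is what removes the need for the old slab_trylock() dance.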

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |  131 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 53 insertions(+), 78 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-05-16 12:45:33.241458954 -0500
+++ linux-2.6/mm/slub.c	2011-05-16 12:45:39.451458948 -0500
@@ -2,10 +2,11 @@
  * SLUB: A slab allocator that limits cache line use instead of queuing
  * objects in per cpu and per node lists.
  *
- * The allocator synchronizes using per slab locks and only
- * uses a centralized lock to manage a pool of partial slabs.
+ * The allocator synchronizes using per slab locks or atomic operations
+ * and only uses a centralized lock to manage a pool of partial slabs.
  *
  * (C) 2007 SGI, Christoph Lameter
+ * (C) 2011 Linux Foundation, Christoph Lameter
  */
 
 #include <linux/mm.h>
@@ -32,15 +33,27 @@
 
 /*
  * Lock order:
- *   1. slab_lock(page)
- *   2. slab->list_lock
- *
- *   The slab_lock protects operations on the object of a particular
- *   slab and its metadata in the page struct. If the slab lock
- *   has been taken then no allocations nor frees can be performed
- *   on the objects in the slab nor can the slab be added or removed
- *   from the partial or full lists since this would mean modifying
- *   the page_struct of the slab.
+ *   1. slub_lock (Global Semaphore)
+ *   2. node->list_lock
+ *   3. slab_lock(page) (Only on some arches and for debugging)
+ *
+ *   slub_lock
+ *
+ *   The role of the slub_lock is to protect the list of all the slabs
+ *   and to synchronize major metadata changes to slab cache structures.
+ *
+ *   The slab_lock is only used for debugging and on arches that do not
+ *   have the ability to do a cmpxchg_double. It only protects the second
+ *   double word in the page struct, namely:
+ *	A. page->freelist	-> List of free objects in a page
+ *	B. page->counters	-> Counters of objects
+ *	C. page->frozen		-> frozen state
+ *
+ *   If a slab is frozen then it is exempt from list management. It is not
+ *   on any list. The processor that froze the slab is the one who can
+ *   perform list operations on the page. Other processors may put objects
+ *   onto the freelist but the processor that froze the slab is the only
+ *   one that can retrieve the objects from the page's freelist.
  *
  *   The list_lock protects the partial and full list on each node and
  *   the partial slab counter. If taken then no new slabs may be added or
@@ -53,20 +66,6 @@
  *   slabs, operations can continue without any centralized lock. F.e.
  *   allocating a long series of objects that fill up slabs does not require
  *   the list lock.
- *
- *   The lock order is sometimes inverted when we are trying to get a slab
- *   off a list. We take the list_lock and then look for a page on the list
- *   to use. While we do that objects in the slabs may be freed. We can
- *   only operate on the slab if we have also taken the slab_lock. So we use
- *   a slab_trylock() on the slab. If trylock was successful then no frees
- *   can occur anymore and we can use the slab for allocations etc. If the
- *   slab_trylock() does not succeed then frees are in progress in the slab and
- *   we must stay away from it for a while since we may cause a bouncing
- *   cacheline if we try to acquire the lock. So go onto the next slab.
- *   If all pages are busy then we may allocate a new slab instead of reusing
- *   a partial slab. A new slab has no one operating on it and thus there is
- *   no danger of cacheline contention.
- *
  *   Interrupts are disabled during allocation and deallocation in order to
  *   make the slab allocator safe to use in the context of an irq. In addition
  *   interrupts are disabled to ensure that the processor does not change
@@ -346,7 +345,7 @@ static inline int oo_objects(struct kmem
 /*
  * Determine a map of object in use on a page.
  *
- * Slab lock or node listlock must be held to guarantee that the page does
+ * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
 static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
@@ -358,6 +357,19 @@ static void get_map(struct kmem_cache *s
 		set_bit(slab_index(p, s, addr), map);
 }
 
+/*
+ * Per slab locking using the pagelock
+ */
+static __always_inline void slab_lock(struct page *page)
+{
+	bit_spin_lock(PG_locked, &page->flags);
+}
+
+static __always_inline void slab_unlock(struct page *page)
+{
+	__bit_spin_unlock(PG_locked, &page->flags);
+}
+
 static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
 		void *freelist_old, unsigned long counters_old,
 		void *freelist_new, unsigned long counters_new,
@@ -372,11 +384,14 @@ static inline bool cmpxchg_double_slab(s
 	} else
 #endif
 	{
+		slab_lock(page);
 		if (page->freelist == freelist_old && page->counters == counters_old) {
 			page->freelist = freelist_new;
 			page->counters = counters_new;
+			slab_unlock(page);
 			return 1;
 		}
+		slab_unlock(page);
 	}
 
 	cpu_relax();
@@ -808,10 +823,11 @@ static int check_slab(struct kmem_cache
 static int on_freelist(struct kmem_cache *s, struct page *page, void *search)
 {
 	int nr = 0;
-	void *fp = page->freelist;
+	void *fp;
 	void *object = NULL;
 	unsigned long max_objects;
 
+	fp = page->freelist;
 	while (fp && nr <= page->objects) {
 		if (fp == search)
 			return 1;
@@ -1019,6 +1035,8 @@ bad:
 static noinline int free_debug_processing(struct kmem_cache *s,
 		 struct page *page, void *object, unsigned long addr)
 {
+	slab_lock(page);
+
 	if (!check_slab(s, page))
 		goto fail;
 
@@ -1054,10 +1072,12 @@ static noinline int free_debug_processin
 		set_track(s, object, TRACK_FREE, addr);
 	trace(s, page, object, 0);
 	init_object(s, object, SLUB_RED_INACTIVE);
+	slab_unlock(page);
 	return 1;
 
 fail:
 	slab_fix(s, "Object at 0x%p not freed", object);
+	slab_unlock(page);
 	return 0;
 }
 
@@ -1385,27 +1405,6 @@ static void discard_slab(struct kmem_cac
 }
 
 /*
- * Per slab locking using the pagelock
- */
-static __always_inline void slab_lock(struct page *page)
-{
-	bit_spin_lock(PG_locked, &page->flags);
-}
-
-static __always_inline void slab_unlock(struct page *page)
-{
-	__bit_spin_unlock(PG_locked, &page->flags);
-}
-
-static __always_inline int slab_trylock(struct page *page)
-{
-	int rc = 1;
-
-	rc = bit_spin_trylock(PG_locked, &page->flags);
-	return rc;
-}
-
-/*
  * Management of partially allocated slabs
  */
 static inline void add_partial(struct kmem_cache_node *n,
@@ -1431,17 +1430,13 @@ static inline void remove_partial(struct
  *
  * Must hold list_lock.
  */
-static inline int lock_and_freeze_slab(struct kmem_cache *s,
+static inline int acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page)
 {
 	void *freelist;
 	unsigned long counters;
 	struct page new;
 
-
-	if (!slab_trylock(page))
-		return 0;
-
 	/*
 	 * Zap the freelist and set the frozen bit.
 	 * The old freelist is the list of objects for the
@@ -1477,7 +1472,6 @@ static inline int lock_and_freeze_slab(s
 		 */
 		printk(KERN_ERR "SLUB: %s : Page without available objects on"
 			" partial list\n", s->name);
-		slab_unlock(page);
 		return 0;
 	}
 }
@@ -1501,7 +1495,7 @@ static struct page *get_partial_node(str
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(s, n, page))
+		if (acquire_slab(s, n, page))
 			goto out;
 	page = NULL;
 out:
@@ -1790,8 +1784,6 @@ redo:
 				"unfreezing slab"))
 		goto redo;
 
-	slab_unlock(page);
-
 	if (lock)
 		spin_unlock(&n->list_lock);
 
@@ -1805,7 +1797,6 @@ redo:
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	stat(s, CPUSLAB_FLUSH);
-	slab_lock(c->page);
 	deactivate_slab(s, c);
 }
 
@@ -1954,7 +1945,6 @@ static void *__slab_alloc(struct kmem_ca
 	if (!page)
 		goto new_slab;
 
-	slab_lock(page);
 	if (unlikely(!node_match(c, node)))
 		goto another_slab;
 
@@ -1980,8 +1970,6 @@ load_freelist:
 
 	stat(s, ALLOC_REFILL);
 
-	slab_unlock(page);
-
 	c->freelist = get_freepointer(s, object);
 	c->tid = next_tid(c->tid);
 	local_irq_restore(flags);
@@ -2017,7 +2005,6 @@ new_slab:
 		page->inuse = page->objects;
 
 		stat(s, ALLOC_SLAB);
-		slab_lock(page);
 		c->node = page_to_nid(page);
 		c->page = page;
 		goto load_freelist;
@@ -2190,7 +2177,6 @@ static void __slab_free(struct kmem_cach
 	unsigned long flags;
 
 	local_irq_save(flags);
-	slab_lock(page);
 	stat(s, FREE_SLOWPATH);
 
 	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
@@ -2256,7 +2242,6 @@ static void __slab_free(struct kmem_cach
 	spin_unlock(&n->list_lock);
 
 out_unlock:
-	slab_unlock(page);
 	local_irq_restore(flags);
 	return;
 
@@ -2270,7 +2255,6 @@ slab_empty:
 	}
 
 	spin_unlock(&n->list_lock);
-	slab_unlock(page);
 	local_irq_restore(flags);
 	stat(s, FREE_SLAB);
 	discard_slab(s, page);
@@ -3191,14 +3175,8 @@ int kmem_cache_shrink(struct kmem_cache
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse && slab_trylock(page)) {
-				/*
-				 * Must hold slab lock here because slab_free
-				 * may have freed the last object and be
-				 * waiting to release the slab.
-				 */
+			if (!page->inuse) {
 				remove_partial(n, page);
-				slab_unlock(page);
 				discard_slab(s, page);
 			} else {
 				list_move(&page->lru,
@@ -3786,12 +3764,9 @@ static int validate_slab(struct kmem_cac
 static void validate_slab_slab(struct kmem_cache *s, struct page *page,
 						unsigned long *map)
 {
-	if (slab_trylock(page)) {
-		validate_slab(s, page, map);
-		slab_unlock(page);
-	} else
-		printk(KERN_INFO "SLUB %s: Skipped busy slab 0x%p\n",
-			s->name, page);
+	slab_lock(page);
+	validate_slab(s, page, map);
+	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,

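The frozen-slab rule in the new lock-order comment says that other processors may add objects to a frozen slab's freelist, but only the processor that froze it may take them. This rule can be sketched in user-space C as a lock-free push plus a take-all exchange. This is a simplified single-word model that uses assumed names (`struct frozen_slab`, `remote_free()`, `owner_take_all()`). The patch itself updates `page->freelist` and `page->counters` together through `cmpxchg_double_slab()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct object {
	struct object *next;  /* free pointer, stored in the free object */
};

/* Hypothetical frozen slab: only its freelist head is modelled. */
struct frozen_slab {
	_Atomic(struct object *) freelist;
};

/* Any processor: free an object by pushing it onto the slab's freelist. */
static void remote_free(struct frozen_slab *s, struct object *obj)
{
	struct object *head = atomic_load_explicit(&s->freelist,
						   memory_order_relaxed);

	do {
		obj->next = head;
	} while (!atomic_compare_exchange_weak_explicit(&s->freelist, &head, obj,
			memory_order_release, memory_order_relaxed));
}

/* Owning processor only: take every object freed so far in one step. */
static struct object *owner_take_all(struct frozen_slab *s)
{
	return atomic_exchange_explicit(&s->freelist, NULL, memory_order_acquire);
}
```

Only the owner ever removes objects, and it removes all of them at once. As a result, the push loop does not suffer from the ABA problem that a general lock-free stack pop would.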


Thread overview: 54+ messages
2011-05-16 20:26 [slubllv5 00/25] SLUB: Lockless freelists for objects V5 Christoph Lameter
2011-05-16 20:26 ` [slubllv5 01/25] slub: Avoid warning for !CONFIG_SLUB_DEBUG Christoph Lameter
2011-05-16 20:26 ` [slubllv5 02/25] slub: Fix control flow in slab_alloc Christoph Lameter
2011-05-16 20:26 ` [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath Christoph Lameter
2011-05-17  4:52   ` Eric Dumazet
2011-05-17 13:46     ` Christoph Lameter
2011-05-17 19:22       ` Pekka Enberg
2011-05-16 20:26 ` [slubllv5 04/25] slub: Push irq disable into allocate_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 05/25] slub: Do not use frozen page flag but a bit in the page counters Christoph Lameter
2011-05-16 20:26 ` [slubllv5 06/25] slub: Move page->frozen handling near where the page->freelist handling occurs Christoph Lameter
2011-05-16 20:26 ` [slubllv5 07/25] x86: Add support for cmpxchg_double Christoph Lameter
2011-05-26 17:57   ` Pekka Enberg
2011-05-26 18:02     ` Christoph Lameter
2011-05-26 18:05   ` H. Peter Anvin
2011-05-26 18:17     ` Christoph Lameter
2011-05-26 18:29       ` H. Peter Anvin
2011-05-26 18:42         ` Christoph Lameter
2011-05-26 21:16         ` Christoph Lameter
2011-05-26 21:21           ` H. Peter Anvin
2011-05-26 21:25           ` Eric Dumazet
2011-05-26 21:31             ` H. Peter Anvin
2011-05-26 21:45               ` Eric Dumazet
2011-05-27  0:49                 ` H. Peter Anvin
2011-05-31 15:13             ` Christoph Lameter
2011-05-31 15:16               ` H. Peter Anvin
2011-05-31 16:53                 ` Christoph Lameter
2011-05-31 23:16                   ` H. Peter Anvin
2011-05-31 23:49                     ` Christoph Lameter
2011-05-31 23:54                       ` H. Peter Anvin
2011-06-01 14:13                         ` Christoph Lameter
2011-06-01 14:46                           ` Christoph Lameter
2011-06-01 15:42                             ` H. Peter Anvin
2011-06-01 16:08                               ` Christoph Lameter
2011-06-01 15:41                           ` H. Peter Anvin
2011-05-27  0:50           ` H. Peter Anvin
2011-05-31 15:10             ` Christoph Lameter
2011-05-16 20:26 ` [slubllv5 08/25] mm: Rearrange struct page Christoph Lameter
2011-05-16 20:26 ` [slubllv5 09/25] slub: Add cmpxchg_double_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 10/25] slub: explicit list_lock taking Christoph Lameter
2011-05-16 20:26 ` [slubllv5 11/25] slub: Pass kmem_cache struct to lock and freeze slab Christoph Lameter
2011-05-16 20:26 ` [slubllv5 12/25] slub: Rework allocator fastpaths Christoph Lameter
2011-05-16 20:26 ` Christoph Lameter [this message]
2011-05-16 20:26 ` [slubllv5 14/25] slub: Disable interrupts in free_debug processing Christoph Lameter
2011-05-16 20:26 ` [slubllv5 15/25] slub: Avoid disabling interrupts in free slowpath Christoph Lameter
2011-05-16 20:26 ` [slubllv5 16/25] slub: Get rid of the another_slab label Christoph Lameter
2011-05-16 20:26 ` [slubllv5 17/25] slub: Add statistics for the case that the current slab does not match the node Christoph Lameter
2011-05-16 20:26 ` [slubllv5 18/25] slub: fast release on full slab Christoph Lameter
2011-05-16 20:26 ` [slubllv5 19/25] slub: Not necessary to check for empty slab on load_freelist Christoph Lameter
2011-05-16 20:26 ` [slubllv5 20/25] slub: slabinfo update for cmpxchg handling Christoph Lameter
2011-05-16 20:26 ` [slubllv5 21/25] slub: Prepare inuse field in new_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 22/25] slub: pass kmem_cache_cpu pointer to get_partial() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 23/25] slub: return object pointer from get_partial() / new_slab() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 24/25] slub: Remove gotos from __slab_free() Christoph Lameter
2011-05-16 20:26 ` [slubllv5 25/25] slub: Remove gotos from __slab_alloc() Christoph Lameter
