From: Manfred Spraul <manfred@colorfullife.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: linux-kernel@vger.kernel.org, Alan Cox <alan@lxorguk.ukuu.org.uk>,
torvalds@transmeta.com
Subject: Re: Purpose of the mm/slab.c changes
Date: Tue, 11 Sep 2001 22:43:40 +0200 [thread overview]
Message-ID: <3B9E777C.BA29918B@colorfullife.com> (raw)
In-Reply-To: <3B9B4CFE.E09D6743@colorfullife.com> <20010909162613.Q11329@athlon.random> <001201c13942$b1bec9a0$010411ac@local> <20010909173313.V11329@athlon.random> <004101c13af1$6c099060$010411ac@local> <20010911213602.C715@athlon.random>
[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]
Andrea Arcangeli wrote:
>
> >
> > Ok, so you agree that your changes are only beneficial in one case:
> >
> > kmem_cache_free(), uniprocessor or SMP not-per-cpu cached.
>
> I wouldn't say "not only in such case" but "mainly in such case"
> (there's not infinite ram in the per-cpu caches).
>
Large objects were LIFO even without your patch.
Small objects on UP with partial slabs were LIFO without your patch.
Small objects on UP without any partial slabs were not LIFO, but it was
simple to fix that (patch attached, not yet fully tested).
Small objects on SMP are managed in the per-cpu arrays and transferred
in batches (by default 30-120 objects, depending on the size) to the
backend slab lists. The backend was not LIFO (also fixed in the patch).
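To illustrate why that ordering matters (a minimal standalone sketch, hypothetical names, not the actual slab.c code): a LIFO cache always hands back the most recently freed object, i.e. the one most likely still hot in the cpu cache:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a per-cpu style object stack: frees push on top,
 * allocations pop from the top, so the most recently freed
 * (cache-hot) object is handed out first - LIFO. */
struct obj {
	struct obj *next;
};

struct cpu_cache {
	struct obj *top;	/* head of the LIFO stack */
	int avail;		/* objects currently cached */
};

static void cache_free(struct cpu_cache *cc, struct obj *o)
{
	o->next = cc->top;
	cc->top = o;
	cc->avail++;
}

static struct obj *cache_alloc(struct cpu_cache *cc)
{
	struct obj *o = cc->top;
	if (o) {
		cc->top = o->next;
		cc->avail--;
	}
	return o;
}
```

Freeing a then b and allocating twice returns b first, then a - exactly the ordering the backend lists lacked before the patch.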
>
> Also I believe there are more interesting parts to optimize on the
> design side rather than making the slab.c implementation more complex
> with the object of microoptimization for microbenchmarks (as you told me
> you couldn't measure any decrease in performance in real life in a
> sendfile benchmark, infact the first run with the patch was a little
> faster and the second a little slower).
>
Interesting: you lose all microbenchmarks, your patch doesn't improve
LIFO ordering, and you still think your patch is better? Could you
explain why?
Btw, I didn't merge the current slab.c directly: Ingo Molnar compared it
with his own version and sent it to Linus because it was faster than his
version (8-way, tux optimization).
I agree that a redesign is probably a good idea, but that's 2.5 stuff:
* microoptimization: try to avoid the division in kmem_cache_free_one.
Might give a 20 % boost on UP i386, even more on cpus without an
efficient hardware division. I have an (ugly) patch. Mark Hemment's
version avoided the division in some cases, and that really helps.
* redesign: the code relies on a fast virt_to_page(). Is that realistic
with NUMA/CONFIG_DISCONTIGMEM?
* redesign: inode and dcache currently drain the per-cpu caches very
often to reduce the fragmentation - perhaps another, stronger
defragmenting allocator for inode & dcache? CPU crosscalls can't be the
perfect solution.
* redesign: NUMA.
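For illustration, a minimal sketch of the division microoptimization from the first point (hypothetical helpers, not my actual patch): kmem_cache_free_one needs the object index within the slab, and for power-of-two object sizes the divide can become a shift, which is much cheaper on cpus without fast hardware division:

```c
#include <assert.h>
#include <stdint.h>

/* objnr = (objp - s_mem) / objsize - the plain form with a divide. */
static unsigned int obj_index_div(uintptr_t objp, uintptr_t s_mem,
				  unsigned int objsize)
{
	return (unsigned int)((objp - s_mem) / objsize);
}

/* Same index for power-of-two object sizes, using a precomputed
 * shift (objsize == 1 << objsize_shift) instead of the divide. */
static unsigned int obj_index_shift(uintptr_t objp, uintptr_t s_mem,
				    unsigned int objsize_shift)
{
	return (unsigned int)((objp - s_mem) >> objsize_shift);
}
```

Mark Hemment's version avoided the divide only in some cases; caching the shift in the cache descriptor would avoid it for all power-of-two sizes.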
But that redesign - virt_to_page is the first function called in
kmem_cache_free and kfree - won't be a backend simplification, more
a rewrite.
--
Manfred
[-- Attachment #2: patch-slab-lifo --]
[-- Type: text/plain, Size: 3478 bytes --]
--- 2.4/mm/slab.c Tue Sep 11 21:32:23 2001
+++ build-2.4/mm/slab.c Tue Sep 11 22:39:54 2001
@@ -85,9 +85,9 @@
* FORCED_DEBUG - 1 enables SLAB_RED_ZONE and SLAB_POISON (if possible)
*/
-#define DEBUG 0
-#define STATS 0
-#define FORCED_DEBUG 0
+#define DEBUG 1
+#define STATS 1
+#define FORCED_DEBUG 1
/*
* Parameters for kmem_cache_reap
@@ -448,7 +448,7 @@
/* Inc off-slab bufctl limit until the ceiling is hit. */
if (!(OFF_SLAB(sizes->cs_cachep))) {
offslab_limit = sizes->cs_size-sizeof(slab_t);
- offslab_limit /= 2;
+ offslab_limit /= sizeof(kmem_bufctl_t);
}
sprintf(name, "size-%Zd(DMA)",sizes->cs_size);
sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
@@ -1411,16 +1411,31 @@
moveslab_free:
/*
* was partial, now empty.
- * c_firstnotfull might point to slabp
- * FIXME: optimize
+ * c_firstnotfull might point to slabp.
+ * The code ensures LIFO ordering if there are no partial slabs.
+ * (allocation from partial slabs has higher priority than LIFO
+ * - we just found a freeable page!)
*/
{
- struct list_head *t = cachep->firstnotfull->prev;
-
- list_del(&slabp->list);
- list_add_tail(&slabp->list, &cachep->slabs);
- if (cachep->firstnotfull == &slabp->list)
- cachep->firstnotfull = t->next;
+ slab_t* next = list_entry(slabp->list.next, slab_t, list);
+ if (&next->list != &cachep->slabs) {
+ if (next->inuse != cachep->num) {
+ if (&slabp->list == cachep->firstnotfull)
+ cachep->firstnotfull = &next->list;
+ list_del(&slabp->list);
+ list_add_tail(&slabp->list, &cachep->slabs);
+ } /* else {
+ The next slab is a free slab. That means
+ the slab with the freed object is in its
+ correct position: behind all partial slabs,
+ in front of all other free slabs to ensure
+ LIFO.
+ } */
+ }/* else {
+ the slab the freed object was in was the last slab in
+ the cache. That means it's already in the correct
+ position: behind all partial slabs (if any).
+ } */
return;
}
}
@@ -1473,6 +1488,46 @@
#endif
}
+#if DEBUG
+static void kmem_slabchain_test(kmem_cache_t *cachep)
+{
+ int pos = 0;
+ slab_t* walk;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cachep->spinlock, flags);
+
+ walk = list_entry(cachep->slabs.next, slab_t, list);
+ while(&walk->list != &cachep->slabs) {
+ if (walk->inuse == cachep->num) {
+ if (pos > 0)
+ BUG();
+ } else if (walk->inuse > 0) {
+ if (pos == 0) {
+ if (cachep->firstnotfull != &walk->list)
+ BUG();
+ }
+ if (pos > 1)
+ BUG();
+ pos = 1; /* found partial slabp */
+ } else {
+ if (pos == 0) {
+ if (cachep->firstnotfull != &walk->list)
+ BUG();
+ }
+ pos = 2; /* found free slabp */
+ }
+ walk = list_entry(walk->list.next, slab_t, list);
+ }
+ if (pos == 0) {
+ if (cachep->firstnotfull != &cachep->slabs)
+ BUG();
+ }
+ spin_unlock_irqrestore(&cachep->spinlock, flags);
+}
+#else
+#define kmem_slabchain_test(cachep) do { } while(0)
+#endif
/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
@@ -1540,6 +1595,7 @@
local_irq_save(flags);
__kmem_cache_free(cachep, objp);
local_irq_restore(flags);
+ kmem_slabchain_test(cachep);
}
/**
@@ -1561,6 +1617,7 @@
c = GET_PAGE_CACHE(virt_to_page(objp));
__kmem_cache_free(c, (void*)objp);
local_irq_restore(flags);
+ kmem_slabchain_test(c);
}
kmem_cache_t * kmem_find_general_cachep (size_t size, int gfpflags)