From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
Linux Memory Management List <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lin Ming <ming.m.lin@intel.com>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [patch] SLQB slab allocator
Date: Fri, 23 Jan 2009 04:55:03 +0100 [thread overview]
Message-ID: <20090123035503.GD20098@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0901211705570.7020@blonde.anvils>
On Wed, Jan 21, 2009 at 06:10:12PM +0000, Hugh Dickins wrote:
> On Wed, 21 Jan 2009, Nick Piggin wrote:
> >
> > Since last posted, I've cleaned up a few bits and pieces, (hopefully)
> > fixed a known bug where it wouldn't boot on memoryless nodes (I don't
> > have a system to test with), and improved performance and reduced
> > locking somewhat for node-specific and interleaved allocations.
>
> I haven't reviewed your postings, but I did give the previous version
> of your patch a try on all my machines. Some observations and one patch.
Great, thanks!
> I was initially _very_ impressed by how well it did on my venerable
> tmpfs loop swapping loads, where I'd expected next to no effect; but
> that turned out to be because on three machines I'd been using SLUB,
> without remembering how default slub_max_order got raised from 1 to 3
> in 2.6.26 (hmm, and Documentation/vm/slub.txt not updated).
>
> That's been making SLUB behave pretty badly (e.g. elapsed time 30%
> more than SLAB) with swapping loads on most of my machines. Though
> oddly one seems immune, and another takes four times as long: guess
> it depends on how close to thrashing, but probably more to investigate
> there. I think my original SLUB versus SLAB comparisons were done on
> the immune one: as I remember, SLUB and SLAB were equivalent on those
> loads when SLUB came in, but even with boot option slub_max_order=1,
> SLUB is still slower than SLAB on such tests (e.g. 2% slower).
> FWIW - swapping loads are not what anybody should tune for.
Yeah, that's to be expected with higher order allocations I think. Does
your immune machine simply have fewer CPUs and thus doesn't use such
high order allocations?
> So in fact SLQB comes in very much like SLAB, as I think you'd expect:
> slightly ahead of it on most of the machines, but probably in the noise.
> (SLOB behaves decently: not a winner, but no catastrophic behaviour.)
>
> What I love most about SLUB is the way you can reasonably build with
> CONFIG_SLUB_DEBUG=y, very little impact, then switch on the specific
> debugging you want with a boot option when you want it. That was a
> great stride forward, which you've followed in SLQB: so I'd have to
> prefer SLQB to SLAB (on debuggability) and to SLUB (on high orders).
It is nice. All credit to Christoph for that (and the fine grained
sysfs code).
> I do hate the name SLQB. Despite having no experience of databases,
> I find it almost impossible to type, coming out as SQLB most times.
> Wish you'd invented a plausible vowel instead of the Q; but probably
> too late for that.
Yeah, apologies for the name :P
> init/Kconfig describes it as "Qeued allocator": should say "Queued".
Thanks.
> Documentation/vm/slqbinfo.c gives several compilation warnings:
> I'd rather leave it to you to fix them, maybe the unused variables
> are about to be used, or maybe there's much worse wrong with it
> than a few compilation warnings, I didn't investigate.
OK.
> The only bug I found (but you'll probably want to change the patch
> - which I've rediffed to today's slqb.c, but not retested).
>
> On fake NUMA I hit kernel BUG at mm/slqb.c:1107! claim_remote_free_list()
> is doing several things without remote_free.lock: that VM_BUG_ON is unsafe
> for one, and even if others are somehow safe today, it will be more robust
> to take the lock sooner.
Good catch, thanks. The BUG should be OK where it is if we only
claim the remote free list when remote_free_check is is set, but
some of the periodic reaping and teardown code calls it unconditionally.
But it's not critical so it should definitely go inside the lock.
> I moved the prefetchw(head) down to where we know it's going to be the head,
> and replaced the offending VM_BUG_ON by a later WARN_ON which you'd probably
> better remove altogether: once we got the lock, it's hardly interesting.
Right, I'll probably do that. Thanks!
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
>
> mm/slqb.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> --- slqb/mm/slqb.c.orig 2009-01-21 15:23:54.000000000 +0000
> +++ slqb/mm/slqb.c 2009-01-21 15:32:44.000000000 +0000
> @@ -1115,17 +1115,12 @@ static void claim_remote_free_list(struc
> void **head, **tail;
> int nr;
>
> - VM_BUG_ON(!l->remote_free.list.head != !l->remote_free.list.tail);
> -
> if (!l->remote_free.list.nr)
> return;
>
> + spin_lock(&l->remote_free.lock);
> l->remote_free_check = 0;
> head = l->remote_free.list.head;
> - /* Get the head hot for the likely subsequent allocation or flush */
> - prefetchw(head);
> -
> - spin_lock(&l->remote_free.lock);
> l->remote_free.list.head = NULL;
> tail = l->remote_free.list.tail;
> l->remote_free.list.tail = NULL;
> @@ -1133,9 +1128,15 @@ static void claim_remote_free_list(struc
> l->remote_free.list.nr = 0;
> spin_unlock(&l->remote_free.lock);
>
> - if (!l->freelist.nr)
> + WARN_ON(!head + !tail != !nr + !nr);
> + if (!nr)
> + return;
> +
> + if (!l->freelist.nr) {
> + /* Get head hot for likely subsequent allocation or flush */
> + prefetchw(head);
> l->freelist.head = head;
> - else
> + } else
> set_freepointer(s, l->freelist.tail, head);
> l->freelist.tail = tail;
>
WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
Linux Memory Management List <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lin Ming <ming.m.lin@intel.com>,
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [patch] SLQB slab allocator
Date: Fri, 23 Jan 2009 04:55:03 +0100 [thread overview]
Message-ID: <20090123035503.GD20098@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0901211705570.7020@blonde.anvils>
On Wed, Jan 21, 2009 at 06:10:12PM +0000, Hugh Dickins wrote:
> On Wed, 21 Jan 2009, Nick Piggin wrote:
> >
> > Since last posted, I've cleaned up a few bits and pieces, (hopefully)
> > fixed a known bug where it wouldn't boot on memoryless nodes (I don't
> > have a system to test with), and improved performance and reduced
> > locking somewhat for node-specific and interleaved allocations.
>
> I haven't reviewed your postings, but I did give the previous version
> of your patch a try on all my machines. Some observations and one patch.
Great, thanks!
> I was initially _very_ impressed by how well it did on my venerable
> tmpfs loop swapping loads, where I'd expected next to no effect; but
> that turned out to be because on three machines I'd been using SLUB,
> without remembering how default slub_max_order got raised from 1 to 3
> in 2.6.26 (hmm, and Documentation/vm/slub.txt not updated).
>
> That's been making SLUB behave pretty badly (e.g. elapsed time 30%
> more than SLAB) with swapping loads on most of my machines. Though
> oddly one seems immune, and another takes four times as long: guess
> it depends on how close to thrashing, but probably more to investigate
> there. I think my original SLUB versus SLAB comparisons were done on
> the immune one: as I remember, SLUB and SLAB were equivalent on those
> loads when SLUB came in, but even with boot option slub_max_order=1,
> SLUB is still slower than SLAB on such tests (e.g. 2% slower).
> FWIW - swapping loads are not what anybody should tune for.
Yeah, that's to be expected with higher order allocations I think. Does
your immune machine simply have fewer CPUs and thus doesn't use such
high order allocations?
> So in fact SLQB comes in very much like SLAB, as I think you'd expect:
> slightly ahead of it on most of the machines, but probably in the noise.
> (SLOB behaves decently: not a winner, but no catastrophic behaviour.)
>
> What I love most about SLUB is the way you can reasonably build with
> CONFIG_SLUB_DEBUG=y, very little impact, then switch on the specific
> debugging you want with a boot option when you want it. That was a
> great stride forward, which you've followed in SLQB: so I'd have to
> prefer SLQB to SLAB (on debuggability) and to SLUB (on high orders).
It is nice. All credit to Christoph for that (and the fine grained
sysfs code).
> I do hate the name SLQB. Despite having no experience of databases,
> I find it almost impossible to type, coming out as SQLB most times.
> Wish you'd invented a plausible vowel instead of the Q; but probably
> too late for that.
Yeah, apologies for the name :P
> init/Kconfig describes it as "Qeued allocator": should say "Queued".
Thanks.
> Documentation/vm/slqbinfo.c gives several compilation warnings:
> I'd rather leave it to you to fix them, maybe the unused variables
> are about to be used, or maybe there's much worse wrong with it
> than a few compilation warnings, I didn't investigate.
OK.
> The only bug I found (but you'll probably want to change the patch
> - which I've rediffed to today's slqb.c, but not retested).
>
> On fake NUMA I hit kernel BUG at mm/slqb.c:1107! claim_remote_free_list()
> is doing several things without remote_free.lock: that VM_BUG_ON is unsafe
> for one, and even if others are somehow safe today, it will be more robust
> to take the lock sooner.
Good catch, thanks. The BUG should be OK where it is if we only
claim the remote free list when remote_free_check is is set, but
some of the periodic reaping and teardown code calls it unconditionally.
But it's not critical so it should definitely go inside the lock.
> I moved the prefetchw(head) down to where we know it's going to be the head,
> and replaced the offending VM_BUG_ON by a later WARN_ON which you'd probably
> better remove altogether: once we got the lock, it's hardly interesting.
Right, I'll probably do that. Thanks!
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
>
> mm/slqb.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> --- slqb/mm/slqb.c.orig 2009-01-21 15:23:54.000000000 +0000
> +++ slqb/mm/slqb.c 2009-01-21 15:32:44.000000000 +0000
> @@ -1115,17 +1115,12 @@ static void claim_remote_free_list(struc
> void **head, **tail;
> int nr;
>
> - VM_BUG_ON(!l->remote_free.list.head != !l->remote_free.list.tail);
> -
> if (!l->remote_free.list.nr)
> return;
>
> + spin_lock(&l->remote_free.lock);
> l->remote_free_check = 0;
> head = l->remote_free.list.head;
> - /* Get the head hot for the likely subsequent allocation or flush */
> - prefetchw(head);
> -
> - spin_lock(&l->remote_free.lock);
> l->remote_free.list.head = NULL;
> tail = l->remote_free.list.tail;
> l->remote_free.list.tail = NULL;
> @@ -1133,9 +1128,15 @@ static void claim_remote_free_list(struc
> l->remote_free.list.nr = 0;
> spin_unlock(&l->remote_free.lock);
>
> - if (!l->freelist.nr)
> + WARN_ON(!head + !tail != !nr + !nr);
> + if (!nr)
> + return;
> +
> + if (!l->freelist.nr) {
> + /* Get head hot for likely subsequent allocation or flush */
> + prefetchw(head);
> l->freelist.head = head;
> - else
> + } else
> set_freepointer(s, l->freelist.tail, head);
> l->freelist.tail = tail;
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-01-23 3:55 UTC|newest]
Thread overview: 197+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-21 14:30 [patch] SLQB slab allocator Nick Piggin
2009-01-21 14:30 ` Nick Piggin
2009-01-21 14:59 ` Ingo Molnar
2009-01-21 14:59 ` Ingo Molnar
2009-01-21 15:17 ` Nick Piggin
2009-01-21 15:17 ` Nick Piggin
2009-01-21 16:56 ` Nick Piggin
2009-01-21 16:56 ` Nick Piggin
2009-01-21 17:40 ` Ingo Molnar
2009-01-21 17:40 ` Ingo Molnar
2009-01-23 3:31 ` Nick Piggin
2009-01-23 3:31 ` Nick Piggin
2009-01-23 6:14 ` Nick Piggin
2009-01-23 6:14 ` Nick Piggin
2009-01-23 12:56 ` Ingo Molnar
2009-01-23 12:56 ` Ingo Molnar
2009-01-21 17:59 ` Joe Perches
2009-01-21 17:59 ` Joe Perches
2009-01-23 3:35 ` Nick Piggin
2009-01-23 3:35 ` Nick Piggin
2009-01-23 4:00 ` Joe Perches
2009-01-23 4:00 ` Joe Perches
2009-01-21 18:10 ` Hugh Dickins
2009-01-21 18:10 ` Hugh Dickins
2009-01-22 10:01 ` Pekka Enberg
2009-01-22 10:01 ` Pekka Enberg
2009-01-22 12:47 ` Hugh Dickins
2009-01-22 12:47 ` Hugh Dickins
2009-01-23 14:23 ` Hugh Dickins
2009-01-23 14:23 ` Hugh Dickins
2009-01-23 14:30 ` Pekka Enberg
2009-01-23 14:30 ` Pekka Enberg
2009-02-02 3:38 ` Zhang, Yanmin
2009-02-02 3:38 ` Zhang, Yanmin
2009-02-02 9:00 ` Pekka Enberg
2009-02-02 9:00 ` Pekka Enberg
2009-02-02 15:00 ` Christoph Lameter
2009-02-02 15:00 ` Christoph Lameter
2009-02-03 1:34 ` Zhang, Yanmin
2009-02-03 1:34 ` Zhang, Yanmin
2009-02-03 7:29 ` Zhang, Yanmin
2009-02-03 7:29 ` Zhang, Yanmin
2009-02-03 12:18 ` Hugh Dickins
2009-02-03 12:18 ` Hugh Dickins
2009-02-04 2:21 ` Zhang, Yanmin
2009-02-04 2:21 ` Zhang, Yanmin
2009-02-05 19:04 ` Hugh Dickins
2009-02-05 19:04 ` Hugh Dickins
2009-02-06 0:47 ` Zhang, Yanmin
2009-02-06 0:47 ` Zhang, Yanmin
2009-02-06 8:57 ` Pekka Enberg
2009-02-06 8:57 ` Pekka Enberg
2009-02-06 12:33 ` Hugh Dickins
2009-02-06 12:33 ` Hugh Dickins
2009-02-10 8:56 ` Zhang, Yanmin
2009-02-10 8:56 ` Zhang, Yanmin
2009-02-02 11:50 ` Hugh Dickins
2009-01-23 3:55 ` Nick Piggin [this message]
2009-01-23 3:55 ` Nick Piggin
2009-01-23 13:57 ` Hugh Dickins
2009-01-23 13:57 ` Hugh Dickins
2009-01-22 8:45 ` Zhang, Yanmin
2009-01-22 8:45 ` Zhang, Yanmin
2009-01-23 3:57 ` Nick Piggin
2009-01-23 3:57 ` Nick Piggin
2009-01-23 9:00 ` Nick Piggin
2009-01-23 9:00 ` Nick Piggin
2009-01-23 13:34 ` Hugh Dickins
2009-01-23 13:34 ` Hugh Dickins
2009-01-23 13:44 ` Nick Piggin
2009-01-23 13:44 ` Nick Piggin
2009-01-23 9:55 ` Andi Kleen
2009-01-23 9:55 ` Andi Kleen
2009-01-23 10:13 ` Pekka Enberg
2009-01-23 10:13 ` Pekka Enberg
2009-01-23 11:25 ` Nick Piggin
2009-01-23 11:25 ` Nick Piggin
2009-01-23 11:57 ` Andi Kleen
2009-01-23 11:57 ` Andi Kleen
2009-01-23 13:18 ` Nick Piggin
2009-01-23 13:18 ` Nick Piggin
2009-01-23 14:04 ` Andi Kleen
2009-01-23 14:04 ` Andi Kleen
2009-01-23 14:27 ` Nick Piggin
2009-01-23 14:27 ` Nick Piggin
2009-01-23 15:06 ` Andi Kleen
2009-01-23 15:06 ` Andi Kleen
2009-01-23 15:15 ` Nick Piggin
2009-01-23 15:15 ` Nick Piggin
2009-01-23 12:55 ` Nick Piggin
2009-01-23 12:55 ` Nick Piggin
-- strict thread matches above, loose matches on Subject: below --
2009-01-14 9:04 Nick Piggin
2009-01-14 9:04 ` Nick Piggin
2009-01-14 10:53 ` Pekka Enberg
2009-01-14 10:53 ` Pekka Enberg
2009-01-14 11:47 ` Nick Piggin
2009-01-14 11:47 ` Nick Piggin
2009-01-14 13:44 ` Pekka Enberg
2009-01-14 13:44 ` Pekka Enberg
2009-01-14 14:22 ` Nick Piggin
2009-01-14 14:22 ` Nick Piggin
2009-01-14 14:45 ` Pekka Enberg
2009-01-14 14:45 ` Pekka Enberg
2009-01-14 15:09 ` Nick Piggin
2009-01-14 15:09 ` Nick Piggin
2009-01-14 15:22 ` Nick Piggin
2009-01-14 15:22 ` Nick Piggin
2009-01-14 15:30 ` Pekka Enberg
2009-01-14 15:30 ` Pekka Enberg
2009-01-14 15:59 ` Nick Piggin
2009-01-14 15:59 ` Nick Piggin
2009-01-14 18:40 ` Christoph Lameter
2009-01-14 18:40 ` Christoph Lameter
2009-01-15 6:19 ` Nick Piggin
2009-01-15 6:19 ` Nick Piggin
2009-01-15 20:47 ` Christoph Lameter
2009-01-15 20:47 ` Christoph Lameter
2009-01-16 3:43 ` Nick Piggin
2009-01-16 3:43 ` Nick Piggin
2009-01-16 21:25 ` Christoph Lameter
2009-01-16 21:25 ` Christoph Lameter
2009-01-19 6:18 ` Nick Piggin
2009-01-19 6:18 ` Nick Piggin
2009-01-22 0:13 ` Christoph Lameter
2009-01-22 0:13 ` Christoph Lameter
2009-01-22 9:27 ` Pekka Enberg
2009-01-22 9:27 ` Pekka Enberg
2009-01-22 9:30 ` Zhang, Yanmin
2009-01-22 9:30 ` Zhang, Yanmin
2009-01-22 9:33 ` Pekka Enberg
2009-01-22 9:33 ` Pekka Enberg
2009-01-23 15:32 ` Christoph Lameter
2009-01-23 15:32 ` Christoph Lameter
2009-01-23 15:37 ` Pekka Enberg
2009-01-23 15:37 ` Pekka Enberg
2009-01-23 15:42 ` Christoph Lameter
2009-01-23 15:42 ` Christoph Lameter
2009-01-23 15:32 ` Christoph Lameter
2009-01-23 15:32 ` Christoph Lameter
2009-01-23 4:09 ` Nick Piggin
2009-01-23 4:09 ` Nick Piggin
2009-01-23 15:41 ` Christoph Lameter
2009-01-23 15:41 ` Christoph Lameter
2009-01-23 15:53 ` Nick Piggin
2009-01-23 15:53 ` Nick Piggin
2009-01-26 17:28 ` Christoph Lameter
2009-01-26 17:28 ` Christoph Lameter
2009-02-03 1:53 ` Nick Piggin
2009-02-03 1:53 ` Nick Piggin
2009-02-03 17:33 ` Christoph Lameter
2009-02-03 17:33 ` Christoph Lameter
2009-02-03 18:42 ` Pekka Enberg
2009-02-03 18:42 ` Pekka Enberg
2009-02-03 18:47 ` Pekka Enberg
2009-02-03 18:47 ` Pekka Enberg
2009-02-04 4:22 ` Nick Piggin
2009-02-04 4:22 ` Nick Piggin
2009-02-04 20:09 ` Christoph Lameter
2009-02-04 20:09 ` Christoph Lameter
2009-02-05 3:18 ` Nick Piggin
2009-02-05 3:18 ` Nick Piggin
2009-02-04 20:10 ` Christoph Lameter
2009-02-04 20:10 ` Christoph Lameter
2009-02-05 3:14 ` Nick Piggin
2009-02-05 3:14 ` Nick Piggin
2009-02-04 4:07 ` Nick Piggin
2009-02-04 4:07 ` Nick Piggin
2009-01-14 18:01 ` Christoph Lameter
2009-01-14 18:01 ` Christoph Lameter
2009-01-15 6:03 ` Nick Piggin
2009-01-15 6:03 ` Nick Piggin
2009-01-15 20:05 ` Christoph Lameter
2009-01-15 20:05 ` Christoph Lameter
2009-01-16 3:19 ` Nick Piggin
2009-01-16 3:19 ` Nick Piggin
2009-01-16 21:07 ` Christoph Lameter
2009-01-16 21:07 ` Christoph Lameter
2009-01-19 5:47 ` Nick Piggin
2009-01-19 5:47 ` Nick Piggin
2009-01-22 0:19 ` Christoph Lameter
2009-01-22 0:19 ` Christoph Lameter
2009-01-23 4:17 ` Nick Piggin
2009-01-23 4:17 ` Nick Piggin
2009-01-23 15:52 ` Christoph Lameter
2009-01-23 15:52 ` Christoph Lameter
2009-01-23 16:10 ` Nick Piggin
2009-01-23 16:10 ` Nick Piggin
2009-01-23 17:09 ` Nick Piggin
2009-01-23 17:09 ` Nick Piggin
2009-01-26 17:46 ` Christoph Lameter
2009-01-26 17:46 ` Christoph Lameter
2009-02-03 1:42 ` Nick Piggin
2009-02-03 1:42 ` Nick Piggin
2009-01-26 17:34 ` Christoph Lameter
2009-01-26 17:34 ` Christoph Lameter
2009-02-03 1:48 ` Nick Piggin
2009-02-03 1:48 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090123035503.GD20098@wotan.suse.de \
--to=npiggin@suse.de \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ming.m.lin@intel.com \
--cc=penberg@cs.helsinki.fi \
--cc=yanmin_zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.