From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Torsten Kaiser <just.for.lkml@googlemail.com>,
Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Christoph Lameter <clameter@sgi.com>
Subject: Re: Linux 2.6.25-rc2
Date: Tue, 19 Feb 2008 09:02:30 -0500
Message-ID: <20080219140230.GA32236@Krystal>
In-Reply-To: <84144f020802182321x452888bai639c71ea2a5067da@mail.gmail.com>
* Pekka Enberg (penberg@cs.helsinki.fi) wrote:
> On Feb 19, 2008 8:54 AM, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> > > > [ 5282.056415] ------------[ cut here ]------------
> > > > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> > > > [ 5282.062055] invalid opcode: 0000 [1] SMP
> > > > [ 5282.062055] CPU 3
> > >
> > > hm. Your crashes do seem to span multiple subsystems, but it always
> > > seems to be around the SLUB code. Could you try the patch below? The
> > > SLUB code has a new optimization and i'm not 100% sure about it. [the
> > > hack below switches the SLUB optimization off by disabling the CPU
> > > feature it relies on.]
> > >
> > > Ingo
> > >
> > > ------------->
> > > arch/x86/Kconfig | 4 ----
> > > 1 file changed, 4 deletions(-)
> > >
> > > Index: linux/arch/x86/Kconfig
> > > ===================================================================
> > > --- linux.orig/arch/x86/Kconfig
> > > +++ linux/arch/x86/Kconfig
> > > @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
> > >  config SEMAPHORE_SLEEPERS
> > >  	def_bool y
> > >
> > > -config FAST_CMPXCHG_LOCAL
> > > -	bool
> > > -	default y
> > > -
> > >  config MMU
> > >  	def_bool y
> > >
> >
> > $ grep FAST_CMPXCHG_LOCAL */.config
> > linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> > linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> >
> > -rc2-mm1 still worked for me.
> >
> > Did you mean the new SLUB_FASTPATH?
> > $ grep "define SLUB_FASTPATH" */mm/slub.c
> > linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
> > linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
> > linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH
> >
> > The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...
> >
> > On the other hand:
> > From the crash in 2.6.25-rc2-mm1:
> > [59987.116182] RIP [<ffffffff8029f83d>] kmem_cache_alloc_node+0x6d/0xa0
> >
> > (gdb) list *0xffffffff8029f83d
> > 0xffffffff8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
> > 1641            if (unlikely(is_end(object) || !node_match(c, node))) {
> > 1642                    object = __slab_alloc(s, gfpflags, node, addr, c);
> > 1643                    break;
> > 1644            }
> > 1645            stat(c, ALLOC_FASTPATH);
> > 1646    } while (cmpxchg_local(&c->freelist, object, object[c->offset])
> > 1647                                                            != object);
> > 1648 #else
> > 1649    unsigned long flags;
> > 1650
> >
> > That code is part of SLUB_FASTPATH.
> >
> > I'm willing to test the patch, but don't know how fast I can find the
> > time to do it, so my answer if your patch helps might be delayed until
> > the weekend.
>
> Mathieu, Christoph is on vacation and I'm not all that familiar
> with this cmpxchg_local() optimization, so if you could take a peek at
> this bug report to see if you can spot something obviously wrong with
> it, I would much appreciate that.

Sure,

Initial thoughts:

I'd like to get the complete config causing this bug. I suspect one of:

- A race between the lockless algo and an IRQ in a driver allocating
  memory.
- stat(c, ALLOC_FASTPATH); seems to be using a var++, which is not
  reentrant unless IRQs are disabled. Since those are only stats, I
  guess it's ok, but still weird.
- CPU hotplug problem.
  http://bugzilla.kernel.org/attachment.cgi?id=14877&action=view shows
  last sysfs file:
  /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
  -- is this linked to a cpu up/down event?
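
To illustrate the stat() point, here is a minimal user-space sketch of
how a plain var++ can lose an update when an interrupt handler nests
between its load and its store. All names in it (alloc_fastpath,
stat_inc(), the irq hook) are hypothetical stand-ins, not the SLUB code,
and the "interrupt" is simulated with an ordinary function call. Whether
the real stat() is exposed depends on how the compiler emits the
increment: a single memory-destination increment instruction on x86
cannot be interrupted half-way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU statistics counter (stand-in, not kernel code). */
static unsigned long alloc_fastpath;

/*
 * var++ written out as load / add / store, with an optional hook that
 * simulates an interrupt arriving between the load and the store.
 */
static void stat_inc(void (*irq)(void))
{
	unsigned long tmp = alloc_fastpath;	/* load */

	if (irq)
		irq();				/* simulated nested interrupt */
	alloc_fastpath = tmp + 1;		/* store: overwrites the nested update */
}

/* Simulated interrupt handler that bumps the same counter. */
static void nested_stat(void)
{
	stat_inc(NULL);
}

/* Runs two increments (one nested) and returns the final counter value. */
unsigned long demo_lost_increment(void)
{
	alloc_fastpath = 0;
	stat_inc(nested_stat);
	assert(alloc_fastpath <= 2);
	return alloc_fastpath;
}
```

In this simulation two increments run but demo_lost_increment() returns
1: the nested update is lost. For a statistics counter that is only an
accounting error.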

Since this shows up mostly with network card drivers, I think the most
plausible cause is an IRQ nesting over kmem_cache_alloc_node and
calling it in turn.
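
One way such nesting could go wrong is an ABA-style sequence: the
interrupted fastpath has already read the next pointer of the head
object, the nested handler allocates two objects and frees the first
one, and the compare-and-exchange then succeeds against a stale next
pointer. The user-space sketch below simulates that. It is an assumption
about the failure mode, not a diagnosis established here; struct object,
cmpxchg_local_sim(), pop(), push() and the irq hook are hypothetical
stand-ins for the SLUB per-CPU freelist, and the interrupt is a plain
function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free object: the link lives inside the object itself. */
struct object { struct object *next; };

static struct object *freelist;		/* stand-in for c->freelist */
static struct object *irq_kept;		/* object the "handler" keeps allocated */

/* Models cmpxchg_local(): compare and swap, returning the old value. */
static struct object *cmpxchg_local_sim(struct object **p,
					struct object *old, struct object *new)
{
	struct object *cur = *p;

	if (cur == old)
		*p = new;
	return cur;
}

static void push(struct object *obj)
{
	obj->next = freelist;
	freelist = obj;
}

/* Lockless pop; irq (if set) fires once between the read and the cmpxchg. */
static struct object *pop(void (*irq)(void))
{
	struct object *obj, *next;

	do {
		obj = freelist;
		if (!obj)
			return NULL;
		next = obj->next;	/* read before the exchange */
		if (irq) {
			void (*fire)(void) = irq;

			irq = NULL;
			fire();		/* simulated nested interrupt */
		}
	} while (cmpxchg_local_sim(&freelist, obj, next) != obj);
	return obj;
}

/* Nested handler: allocates A and B, then frees A (A is the head again). */
static void nested_irq(void)
{
	struct object *a = pop(NULL);

	irq_kept = pop(NULL);
	push(a);
}

/* Returns 1 if the freelist ends up pointing at a still-allocated object. */
int demo_nested_alloc(void)
{
	static struct object a, b, c;

	freelist = NULL;
	push(&c);
	push(&b);
	push(&a);			/* freelist: a -> b -> c */

	struct object *got = pop(nested_irq);

	assert(got == &a);
	return freelist == irq_kept;	/* head is b, which the handler holds */
}
```

In the simulation demo_nested_alloc() returns 1: the exchange succeeds
because the head is A again, but the new head is B, which the handler
still owns, so the next allocation would hand out B a second time.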
Will dig further...
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68