From: Dave Jones <davej@redhat.com>
To: Jens Axboe <axboe@suse.de>
Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: .17rc5 cfq slab corruption.
Date: Tue, 30 May 2006 12:12:32 -0400
Message-ID: <20060530161232.GA17218@redhat.com>
In-Reply-To: <20060530131728.GX4199@suse.de>

On Tue, May 30, 2006 at 03:17:28PM +0200, Jens Axboe wrote:
> On Sat, May 27 2006, Dave Jones wrote:
> > On Sat, May 27, 2006 at 09:07:24AM +0200, Jens Axboe wrote:
> > > On Fri, May 26 2006, Andrew Morton wrote:
> > > > Dave Jones <davej@redhat.com> wrote:
> > > > >
> > > > > Was playing with Google's new Picasa toy, which hammered the disks
> > > > > hunting out every image file it could find, when this popped out:
> > > > >
> > > > > Slab corruption: (Not tainted) start=ffff810012b998c8, len=168
> > > > > Redzone: 0x5a2cf071/0x5a2cf071.
> > > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
> > > > > 090: 10 bd 28 1b 00 81 ff ff 6b 6b 6b 6b 6b 6b 6b 6b
> > > > > Prev obj: start=ffff810012b99808, len=168
> > > > > Redzone: 0x5a2cf071/0x5a2cf071.
> > > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
> > > > > 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > > > 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > > > Next obj: start=ffff810012b99988, len=168
> > > > > Redzone: 0x5a2cf071/0x5a2cf071.
> > > > > Last user: [<ffffffff8032c319>](cfq_free_io_context+0x2f/0x74)
> > > > > 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > > > 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > >
> > > Pretty baffling... cfq has been hammered pretty thoroughly over the
> > > last months and _nothing_ has shown up except some performance anomalies
> > > that are now fixed. Since Dave's case (at least) seems to be
> > > use-after-free, I'll see if I can reproduce with some contrived case.
> > > I'm assuming that picasa forks and exits a lot with submitted io in
> > > between that may not have finished at exit.
> >
> > The second time I hit it was actually during boot up.
>
> Dave, do you have any io scheduler switching going on?
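
For anyone reading the slab dump quoted above: freed objects are filled with
the 0x6b poison byte (ending in 0xa5), so the only interesting line is the one
at offset 0x90, where eight bytes are no longer poison. Read little-endian,
those bytes look like a kernel pointer written into the object after it was
freed. Below is a rough user-space sketch of that reading. The helper names
(poison_obj, first_bad_offset, le64, poison_demo) are made up for
illustration and are not the kernel's slab code; only the poison values and
the bytes from the dump are taken as given.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* fill byte for freed slab objects */
#define POISON_END  0xa5	/* last byte of a poisoned object */

/* Fill an object the way slab debugging does when it is freed. */
static void poison_obj(unsigned char *obj, size_t len)
{
	memset(obj, POISON_FREE, len - 1);
	obj[len - 1] = POISON_END;
}

/* Return the offset of the first byte that is not poison, or -1. */
static long first_bad_offset(const unsigned char *obj, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char want = (i == len - 1) ? POISON_END : POISON_FREE;

		if (obj[i] != want)
			return (long)i;
	}
	return -1;
}

/* Assemble eight little-endian bytes into a 64-bit value. */
static uint64_t le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical demo: recreate the 168-byte object from the report,
 * overwrite offset 0x90 with the eight bytes shown in the dump, and
 * report where the damage is and what value was written.
 */
uint64_t poison_demo(long *off)
{
	static const unsigned char stray[8] = {
		0x10, 0xbd, 0x28, 0x1b, 0x00, 0x81, 0xff, 0xff
	};
	unsigned char obj[168];

	poison_obj(obj, sizeof(obj));
	memcpy(obj + 0x90, stray, sizeof(stray));	/* the write after free */
	*off = first_bad_offset(obj, sizeof(obj));
	return le64(obj + *off);
}
```

The value that comes back is 0xffff81001b28bd10, which is in the same address
range as the object itself (start=ffff810012b998c8). That is consistent with
someone storing a pointer through a stale reference, though it does not say
who did it.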

Here's something interesting (possibly unrelated).
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193534

I added this patch to our devel kernel (based on 17rc5-git5 right now).
It's similar to the list_head debugging patch from -mm.

--- linux-2.6.12/include/linux/list.h~	2005-08-08 15:34:50.000000000 -0400
+++ linux-2.6.12/include/linux/list.h	2005-08-08 15:35:22.000000000 -0400
@@ -5,7 +5,9 @@
 #include <linux/stddef.h>
 #include <linux/prefetch.h>
+#include <linux/kernel.h>
 #include <asm/system.h>
+#include <asm/bug.h>
 /*
  * These are non-NULL pointers that will result in page faults
@@ -52,6 +52,16 @@ static inline void __list_add(struct lis
 			      struct list_head *prev,
 			      struct list_head *next)
 {
+	if (next->prev != prev) {
+		printk("List corruption. next->prev should be %p, but was %p\n",
+			prev, next->prev);
+		BUG();
+	}
+	if (prev->next != next) {
+		printk("List corruption. prev->next should be %p, but was %p\n",
+			next, prev->next);
+		BUG();
+	}
 	next->prev = new;
 	new->next = next;
 	new->prev = prev;
@@ -162,6 +162,16 @@ static inline void __list_del(struct lis
  */
 static inline void list_del(struct list_head *entry)
 {
+	if (entry->prev->next != entry) {
+		printk("List corruption. prev->next should be %p, but was %p\n",
+			entry, entry->prev->next);
+		BUG();
+	}
+	if (entry->next->prev != entry) {
+		printk("List corruption. next->prev should be %p, but was %p\n",
+			entry, entry->next->prev);
+		BUG();
+	}
 	__list_del(entry->prev, entry->next);
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;

And then it turned up this:

List corruption. next->prev should be f74a5e2c, but was ea7ed31c

Pointing at cfq_set_request.

Now, *anything* could have corrupted that list, not necessarily cfq,
but it's something of a coincidence.
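
To show what the checks catch, here is a small user-space sketch of the same
idea. It is not the kernel code: list_add_checked, list_del_checked and
list_check_demo are names invented for this example, and the functions
return false instead of calling BUG(). The demo simulates a stray write to
one entry's prev pointer and then tries an add and a delete.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head, refusing if the neighbours disagree. */
static bool list_add_checked(struct list_head *item, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;

	if (next->prev != prev) {
		fprintf(stderr, "List corruption. next->prev should be %p, but was %p\n",
			(void *)prev, (void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "List corruption. prev->next should be %p, but was %p\n",
			(void *)next, (void *)prev->next);
		return false;
	}
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
	return true;
}

/* Unlink entry, refusing if either neighbour no longer points back at it. */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr, "List corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "List corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	return true;
}

/* Returns 1 if both the add and the delete notice the corruption. */
int list_check_demo(void)
{
	struct list_head head, a, b, stray;

	list_init(&head);
	list_init(&stray);
	if (!list_add_checked(&a, &head))
		return -1;

	a.prev = &stray;	/* simulate a stray write into entry a */

	bool add_caught = !list_add_checked(&b, &head);
	bool del_caught = !list_del_checked(&a);

	return (add_caught && del_caught) ? 1 : 0;
}
```

Note that a check like this only tells you which list operation tripped over
the damage, not which code did the damage, which is why the message above is
a hint rather than proof.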
Dave
--
http://www.codemonkey.org.uk