All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Eric Paris <eparis@redhat.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	James Morris <jmorris@namei.org>, Thomas Liu <tliu@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
Date: Tue, 15 Sep 2009 09:24:33 +0200	[thread overview]
Message-ID: <20090915072433.GS14984@kernel.dk> (raw)
In-Reply-To: <20090915071158.GA31392@elte.hu>

On Tue, Sep 15 2009, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > Hard to tell whether it's BDI, RCU or something else - sadly this is 
> > the only incident i've managed to log so far. (We'd be all much 
> > happier if boxes crashed left and right! ;)
> > 
> > -tip's been carrying the RCU changes for a long(er) time which would 
> > reduce the chance of this being RCU related. [ It's still possible 
> > though: if it's a bug with a probability of hitting this box on these 
> > workloads with a chance of 1:20,000 or worse. ]
> > 
> > Plus it triggered shortly after i updated -tip to latest -git which 
> > had the BDI bits - which would indicate the BDI stuff - or just about 
> > anything else in -git for that matter - or something older in -tip. 
> > Every day without having hit this crash once more broadens the range 
> > of plausible possibilities.
> > 
> > In any case, i'll refrain from trying to fit a line on a single point 
> > of measurement ;-)
> 
> Ha! I should have checked all logs of today before writing that, not 
> just that box's logs.
> 
> Another testbox triggered the SLAB corruption yesternight:
> 
> [   13.598011] Freeing unused kernel memory: 2820k freed
> [   13.602011] Write protecting the kernel read-only data: 13528k
> [   13.649011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
> [   14.391012] =============================================================================
> [   14.391012] BUG kmalloc-96: Poison overwritten
> [   14.391012] -----------------------------------------------------------------------------
> [   14.391012] 
> [   14.391012] INFO: 0xffff88003da4a950-0xffff88003da4a988. First byte 0x0 instead of 0x6b
> [   14.391012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=6 cpu=0 pid=3191
> [   14.391012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
> [   14.391012] INFO: Slab 0xffffea000190ae10 objects=24 used=13 fp=0xffff88003da4a930 flags=0x200000000000c3
> [   14.391012] INFO: Object 0xffff88003da4a930 @offset=2352 fp=0xffff88003da4a888
> [   14.391012] 
> [   14.391012] Bytes b4 0xffff88003da4a920:  52 a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a R¤ûÿ....ZZZZZZZZ
> [   14.391012]   Object 0xffff88003da4a930:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [   14.391012]   Object 0xffff88003da4a940:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [   14.391012]   Object 0xffff88003da4a950:  00 0c 47 3d 00 88 ff ff be 22 11 81 ff ff ff ff ..G=..ÿÿ¾"..ÿÿÿÿ
> [   14.391012]   Object 0xffff88003da4a960:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [   14.391012]   Object 0xffff88003da4a970:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [   14.391012]   Object 0xffff88003da4a980:  6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk¥
> [   14.391012]  Redzone 0xffff88003da4a990:  bb bb bb bb bb bb bb bb                         »»»»»»»»        
> [   14.391012]  Padding 0xffff88003da4a9d0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
> [   14.391012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02377-g78907f0-dirty #88876
> [   14.391012] Call Trace:
> [   14.391012]  [<ffffffff810e6bef>] print_trailer+0x140/0x149
> [   14.391012]  [<ffffffff810e70db>] check_bytes_and_report+0xb7/0xf7
> [   14.391012]  [<ffffffff810e71ec>] check_object+0xd1/0x1b4
> [   14.391012]  [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [   14.391012]  [<ffffffff810e7bee>] alloc_debug_processing+0x7b/0xf7
> [   14.391012]  [<ffffffff810e9842>] __slab_alloc+0x23e/0x282
> [   14.391012]  [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [   14.391012]  [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [   14.391012]  [<ffffffff810e9a8f>] kmem_cache_alloc+0xa1/0x13f
> [   14.391012]  [<ffffffff81112078>] bdi_alloc_work+0x20/0x83
> [   14.391012]  [<ffffffff81112ae2>] bdi_writeback_all+0x66/0x133
> [   14.391012]  [<ffffffff810788b8>] ? mark_held_locks+0x4d/0x6b
> [   14.391012]  [<ffffffff81872268>] ? __mutex_unlock_slowpath+0x12d/0x163
> [   14.391012]  [<ffffffff81078b45>] ? trace_hardirqs_on_caller+0x11c/0x140
> [   14.391012]  [<ffffffff815c35e5>] ? usbfs_fill_super+0x0/0xa8
> [   14.391012]  [<ffffffff81112c87>] writeback_inodes_sb+0x75/0x83
> [   14.391012]  [<ffffffff8111652d>] __sync_filesystem+0x30/0x6b
> [   14.391012]  [<ffffffff81116705>] sync_filesystem+0x3a/0x51
> [   14.391012]  [<ffffffff810f4b23>] do_remount_sb+0x5b/0x11f
> [   14.391012]  [<ffffffff810f59a3>] get_sb_single+0x92/0xad
> [   14.391012]  [<ffffffff815c2f01>] usb_get_sb+0x1b/0x1d
> [   14.391012]  [<ffffffff810f5786>] vfs_kern_mount+0x9e/0x122
> [   14.391012]  [<ffffffff810f5871>] do_kern_mount+0x4c/0xec
> [   14.391012]  [<ffffffff8110dd15>] do_mount+0x1e9/0x236
> [   14.391012]  [<ffffffff8110dde6>] sys_mount+0x84/0xc6
> [   14.391012]  [<ffffffff8187333a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [   14.391012]  [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
> [   14.391012] FIX kmalloc-96: Restoring 0xffff88003da4a950-0xffff88003da4a988=0x6b
> [   14.391012] 
> [   14.391012] FIX kmalloc-96: Marking all objects used
> [   20.597016] usb usb2: uevent
> [   20.606016] usb 2-0:1.0: uevent
> [   20.615016] usb usb3: uevent
> [   20.622016] usb 3-0:1.0: uevent
> [   20.631016] usb usb4: uevent
> 
> Different hardware, different config, but still in bdi_alloc_work().
> 
> - Which excludes cosmic rays and freak hardware from the list of 
>   possibilities.
> 
> - I'd also say RCU is out too as this incident was 500 iterations after 
>   the BDI merge - preceded by a streak of 3000+ successful iterations on 
>   that same box with all of -tip (including the RCU changes).
> 
> - Random memory corruption is probably out as well - the chance of 
>   hitting a BDI data structure twice accidentally is low.
> 
> - It's also two completely different versions of distros - the 
>   user-space of the two testboxes affected is 2 years apart or so.
> 
> - It's a single CPU box - SMP races are out as well.
> 
> This points towards this being a BDI bug with about ~80% confidence 
> statistically - or [with a lower probability] a SLAB bug. (both failing 
> configs had CONFIG_SLUB=y. But SLUB is the best in detecting corrupted 
> data structures so that alone does not tell much.)

Hmmm, at least this reproduces more consistently. Can I talk you into
trying to pull in:

  git://git.kernel.dk/linux-2.6-block.git writeback

and see if it reproduces there? That path has been cleaned up
considerably there.

-- 
Jens Axboe


  reply	other threads:[~2009-09-15  7:24 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-04 17:08 [PATCH] IMA: update ima_counts_put Mimi Zohar
2009-09-06 21:59 ` Eric Paris
2009-09-07  2:17 ` [GIT] IMA regression fix James Morris
2009-09-12  7:24   ` [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux" Ingo Molnar
2009-09-12  7:58     ` [origin tree boot crash #2] kernel BUG at kernel/cred.c:855! Ingo Molnar
2009-09-12  8:19       ` Ingo Molnar
2009-09-12  8:40         ` [PATCH] out-of-tree: Whack warning off in kernel/cred.c Ingo Molnar
2009-09-12  9:58       ` [origin tree boot crash #2] kernel BUG at kernel/cred.c:855! Eric Paris
2009-09-12  9:46     ` [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux" Eric Paris
2009-09-12 10:43       ` Ingo Molnar
2009-09-12 13:58         ` [origin tree boot hang] lockup in key_schedule_gc() Ingo Molnar
2009-09-12 20:27           ` Eric Paris
2009-09-14  6:15             ` Ingo Molnar
2009-09-14 14:38             ` David Howells
2009-09-13  2:28     ` [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux" Eric Paris
2009-09-13 23:03       ` Eric Paris
2009-09-14  7:16       ` [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514 Ingo Molnar
2009-09-14  7:57         ` Pekka Enberg
2009-09-14  9:20           ` Jens Axboe
2009-09-14  9:23             ` Pekka Enberg
2009-09-14 14:40         ` Linus Torvalds
2009-09-14 16:29           ` Paul E. McKenney
2009-09-14 17:10             ` Jens Axboe
2009-09-15  6:57               ` Ingo Molnar
2009-09-15  7:00                 ` Jens Axboe
2009-09-15  7:11                 ` [origin tree SLAB corruption #2] " Ingo Molnar
2009-09-15  7:24                   ` Jens Axboe [this message]
2009-09-15  7:44                     ` Ingo Molnar
2009-09-15  7:48                       ` Ingo Molnar
2009-09-15  7:51                         ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090915072433.GS14984@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=eparis@redhat.com \
    --cc=jmorris@namei.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=penberg@cs.helsinki.fi \
    --cc=tliu@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.