From: Christoph Hellwig <hch@infradead.org>
To: linux-kernel@vger.kernel.org, aradford@gmail.com
Cc: xfs@oss.sgi.com
Subject: Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
Date: Tue, 11 Oct 2011 09:34:48 -0400
Message-ID: <20111011133448.GA10692@infradead.org>
In-Reply-To: <20111011091757.GA32589@otto.nzcorp.net>
On Tue, Oct 11, 2011 at 11:17:57AM +0200, Anders Ossowicki wrote:
> We seem to have hit a bug on our brand-new disk with an XFS filesystem on the
> 2.6.38.8 kernel. The disk is 2 Dell MD1220 enclosures with Intel SSDs daisy
> chained behind an LSI MegaRAID SAS 9285-8e raid controller. It was under heavy
> I/O load, 1-200 MB/s r/w from postgres for about a week before the bug showed
> up. The system itself is a Dell PowerEdge R815 with 32 cpu cores and 256G
> memory.
>
> Support for the 9285-8e controller was introduced as part of a series of
> patches for drivers/scsi/megaraid in 2.6.38 (0d49016b..cd50ba8e). Given that
> the megaraid driver support for the 9285-8e controller is so new it might be
> the real source of the issue, but this is pure speculation on my part. Any
> suggestions would be most welcome.
>
> The full dmesg is available at
> http://dev.exherbo.org/~arkanoid/kat-dmesg-2011-10.txt
>
> BUG: unable to handle kernel paging request at 000000000040403c
> IP: [<ffffffff810f8d71>] find_get_pages+0x61/0x110
> PGD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
> CPU 11
> Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs
> minix ntfs vfat msdos fat jfs xfs reiserfs nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc autofs4 psmouse serio_raw joydev ixgbe lp amd64_edac_mod
> i2c_piix4 dca parport edac_core bnx2 power_meter dcdbas mdio edac_mce_amd ses
> enclosure usbhid hid ahci mpt2sas libahci scsi_transport_sas megaraid_sas
> raid_class
>
> Pid: 27512, comm: flush-8:32 Tainted: G W 2.6.38.8 #1 Dell Inc.
> PowerEdge R815/04Y8PT
> RIP: 0010:[<ffffffff810f8d71>] [<ffffffff810f8d71>] find_get_pages+0x61/0x110
find_get_pages is core VM code, and it operates purely on on-stack variables
except for the page cache radix tree nodes and pages. So this could either be
a core VM bug that no one has noticed yet, or memory corruption. Can you
run memtest86 on the box?
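The register dump quoted below is consistent with that reading. This is a rough sketch that is not part of the original reply. It assumes the bytes at RIP, "44 8b 47 08", decode to "mov r8d, DWORD PTR [rdi+0x8]". Under that assumption, the faulting address in CR2 is exactly RDI + 8. The bad value would then be the pointer just loaded from a page cache radix tree slot, not anything on the stack. The OOPS_* constants are copied from the oops. ASSUMED_PAGE_OFFSET, the instruction decoding, and the meaning of bit 0 are assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values copied verbatim from the oops quoted in this thread. */
#define OOPS_RDI 0x0000000000404034ULL /* value loaded from a radix tree slot (assumed) */
#define OOPS_CR2 0x000000000040403cULL /* faulting address */

/* Assumption: lowest kernel direct-map address on x86_64 in 2.6.38
 * (PAGE_OFFSET). Values that look like real struct page pointers sit even
 * higher, in the vmemmap area, e.g. R10 = ffffea00a0ff7788 in the same dump. */
#define ASSUMED_PAGE_OFFSET 0xffff880000000000ULL

/* Assumption: the bytes at RIP, "44 8b 47 08", decode to
 *   mov r8d, DWORD PTR [rdi+0x8]
 * i.e. a 32-bit load from offset 8 of whatever RDI points to. */
#define LOAD_OFFSET 0x8ULL

/* Compile-time check (C11) that the arithmetic matches the dump. */
static_assert(OOPS_RDI + LOAD_OFFSET == OOPS_CR2,
              "RDI + 8 should equal the faulting address in CR2");

/* Hypothetical helper: the address the faulting load touches for a given RDI. */
static inline uint64_t faulting_address(uint64_t rdi)
{
    return rdi + LOAD_OFFSET;
}

/* Hypothetical helper: could this value plausibly be a kernel pointer at all? */
static inline bool looks_like_kernel_pointer(uint64_t p)
{
    return p >= ASSUMED_PAGE_OFFSET;
}

/* The preceding "test dil,1" (bytes 40 f6 c7 01) appears to check bit 0,
 * which the radix tree uses to tag indirect/retry entries (assumption).
 * A value with bit 0 clear passes that check and is then dereferenced. */
static inline bool passes_indirect_check(uint64_t p)
{
    return (p & 1ULL) == 0;
}
```

If this is built as C11, the file compiles only when the arithmetic lines up. Calling faulting_address(OOPS_RDI) from a small harness yields OOPS_CR2. This only localizes where the bad pointer came from. It cannot tell a VM bug apart from hardware memory corruption, which is what the memtest86 run is for.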
> RSP: 0018:ffff881fdee55800 EFLAGS: 00010246
> RAX: ffff8814a66d7000 RBX: ffff881fdee558c0 RCX: 000000000000000e
> RDX: 0000000000000005 RSI: 0000000000000001 RDI: 0000000000404034
> RBP: ffff881fdee55850 R08: 0000000000000001 R09: 0000000000000002
> R10: ffffea00a0ff7788 R11: ffff88129306ac88 R12: 0000000000031535
> R13: 000000000000000e R14: ffff881fdee558e8 R15: 0000000000000005
> FS: 00007fec9ce13720(0000) GS:ffff88181fc80000(0000) knlGS:00000000f744d6d0
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000040403c CR3: 0000000001a03000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process flush-8:32 (pid: 27512, threadinfo ffff881fdee54000, task ffff881fdf4adb80)
> Stack:
> 0000000000000000 0000000000000000 0000000000000000 ffff8832e7edf6e0
> 0000000000000000 ffff881fdee558b0 ffffea008b443c18 0000000000031535
> ffff8832e7edf590 ffff881fdee55d20 ffff881fdee55870 ffffffff81101f92
> Call Trace:
> [<ffffffff81101f92>] pagevec_lookup+0x22/0x30
> [<ffffffffa033e00d>] xfs_cluster_write+0xad/0x180 [xfs]
> [<ffffffffa033e4f4>] xfs_vm_writepage+0x414/0x4f0 [xfs]
> [<ffffffff810ffb77>] __writepage+0x17/0x40
> [<ffffffff81100d95>] write_cache_pages+0x1c5/0x4a0
> [<ffffffff810ffb60>] ? __writepage+0x0/0x40
> [<ffffffff81101094>] generic_writepages+0x24/0x30
> [<ffffffffa033d5dd>] xfs_vm_writepages+0x5d/0x80 [xfs]
> [<ffffffff811010c1>] do_writepages+0x21/0x40
> [<ffffffff811730bf>] writeback_single_inode+0x9f/0x250
> [<ffffffff8117370b>] writeback_sb_inodes+0xcb/0x170
> [<ffffffff81174174>] writeback_inodes_wb+0xa4/0x170
> [<ffffffff8117450b>] wb_writeback+0x2cb/0x440
> [<ffffffff81035bb9>] ? default_spin_lock_flags+0x9/0x10
> [<ffffffff8158b3af>] ? _raw_spin_lock_irqsave+0x2f/0x40
> [<ffffffff811748ac>] wb_do_writeback+0x22c/0x280
> [<ffffffff811749aa>] bdi_writeback_thread+0xaa/0x260
> [<ffffffff81174900>] ? bdi_writeback_thread+0x0/0x260
> [<ffffffff81081b76>] kthread+0x96/0xa0
> [<ffffffff8100cda4>] kernel_thread_helper+0x4/0x10
> [<ffffffff81081ae0>] ? kthread+0x0/0xa0
> [<ffffffff8100cda0>] ? kernel_thread_helper+0x0/0x10
> Code: 4e 1c 00 85 c0 89 c1 0f 84 a7 00 00 00 49 89 de 45 31 ff 31 d2 0f 1f 44
> 00 00 49 8b 06 48 8b 38 48 85 ff 74 3d 40 f6 c7 01 75 54 <44> 8b 47 08 4c 8d 57
> 08 45 85 c0 74 e5 45 8d 48 01 44 89 c0 f0
> RIP [<ffffffff810f8d71>] find_get_pages+0x61/0x110
> RSP <ffff881fdee55800>
> CR2: 000000000040403c
> ---[ end trace 84193c2a431ae14b ]---