* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  [not found] <20111011091757.GA32589@otto.nzcorp.net>
@ 2011-10-11 13:34 ` Christoph Hellwig
  2011-10-11 14:13   ` Anders Ossowicki
  2011-10-17 12:40   ` jesper
  0 siblings, 2 replies; 8+ messages in thread

From: Christoph Hellwig @ 2011-10-11 13:34 UTC (permalink / raw)
To: linux-kernel, aradford; +Cc: xfs

On Tue, Oct 11, 2011 at 11:17:57AM +0200, Anders Ossowicki wrote:
> We seem to have hit a bug on our brand-new disk with an XFS filesystem on the
> 2.6.38.8 kernel. The disk is 2 Dell MD1220 enclosures with Intel SSDs daisy
> chained behind an LSI MegaRAID SAS 9285-8e raid controller. It was under heavy
> I/O load, 1-200 MB/s r/w from postgres for about a week before the bug showed
> up. The system itself is a Dell PowerEdge R815 with 32 cpu cores and 256G
> memory.
>
> Support for the 9285-8e controller was introduced as part of a series of
> patches for drivers/scsi/megaraid in 2.6.38 (0d49016b..cd50ba8e). Given that
> the megaraid driver support for the 9285-8e controller is so new it might be
> the real source of the issue, but this is pure speculation on my part. Any
> suggestions would be most welcome.
>
> The full dmesg is available at
> http://dev.exherbo.org/~arkanoid/kat-dmesg-2011-10.txt
>
> BUG: unable to handle kernel paging request at 000000000040403c
> IP: [<ffffffff810f8d71>] find_get_pages+0x61/0x110
> PGD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
> CPU 11
> Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs
> minix ntfs vfat msdos fat jfs xfs reiserfs nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc autofs4 psmouse serio_raw joydev ixgbe lp amd64_edac_mod
> i2c_piix4 dca parport edac_core bnx2 power_meter dcdbas mdio edac_mce_amd ses
> enclosure usbhid hid ahci mpt2sas libahci scsi_transport_sas megaraid_sas
> raid_class
>
> Pid: 27512, comm: flush-8:32 Tainted: G        W   2.6.38.8 #1 Dell Inc. PowerEdge R815/04Y8PT
> RIP: 0010:[<ffffffff810f8d71>]  [<ffffffff810f8d71>] find_get_pages+0x61/0x110

This is core VM code, and operates purely on on-stack variables except
for the page cache radix tree nodes / pages.  So this either could be a
core VM bug that no one has noticed yet, or memory corruption.  Can you
run memtest86 on the box?

> RSP: 0018:ffff881fdee55800  EFLAGS: 00010246
> RAX: ffff8814a66d7000 RBX: ffff881fdee558c0 RCX: 000000000000000e
> RDX: 0000000000000005 RSI: 0000000000000001 RDI: 0000000000404034
> RBP: ffff881fdee55850 R08: 0000000000000001 R09: 0000000000000002
> R10: ffffea00a0ff7788 R11: ffff88129306ac88 R12: 0000000000031535
> R13: 000000000000000e R14: ffff881fdee558e8 R15: 0000000000000005
> FS:  00007fec9ce13720(0000) GS:ffff88181fc80000(0000) knlGS:00000000f744d6d0
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000040403c CR3: 0000000001a03000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process flush-8:32 (pid: 27512, threadinfo ffff881fdee54000, task ffff881fdf4adb80)
> Stack:
>  0000000000000000 0000000000000000 0000000000000000 ffff8832e7edf6e0
>  0000000000000000 ffff881fdee558b0 ffffea008b443c18 0000000000031535
>  ffff8832e7edf590 ffff881fdee55d20 ffff881fdee55870 ffffffff81101f92
> Call Trace:
>  [<ffffffff81101f92>] pagevec_lookup+0x22/0x30
>  [<ffffffffa033e00d>] xfs_cluster_write+0xad/0x180 [xfs]
>  [<ffffffffa033e4f4>] xfs_vm_writepage+0x414/0x4f0 [xfs]
>  [<ffffffff810ffb77>] __writepage+0x17/0x40
>  [<ffffffff81100d95>] write_cache_pages+0x1c5/0x4a0
>  [<ffffffff810ffb60>] ? __writepage+0x0/0x40
>  [<ffffffff81101094>] generic_writepages+0x24/0x30
>  [<ffffffffa033d5dd>] xfs_vm_writepages+0x5d/0x80 [xfs]
>  [<ffffffff811010c1>] do_writepages+0x21/0x40
>  [<ffffffff811730bf>] writeback_single_inode+0x9f/0x250
>  [<ffffffff8117370b>] writeback_sb_inodes+0xcb/0x170
>  [<ffffffff81174174>] writeback_inodes_wb+0xa4/0x170
>  [<ffffffff8117450b>] wb_writeback+0x2cb/0x440
>  [<ffffffff81035bb9>] ? default_spin_lock_flags+0x9/0x10
>  [<ffffffff8158b3af>] ? _raw_spin_lock_irqsave+0x2f/0x40
>  [<ffffffff811748ac>] wb_do_writeback+0x22c/0x280
>  [<ffffffff811749aa>] bdi_writeback_thread+0xaa/0x260
>  [<ffffffff81174900>] ? bdi_writeback_thread+0x0/0x260
>  [<ffffffff81081b76>] kthread+0x96/0xa0
>  [<ffffffff8100cda4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81081ae0>] ? kthread+0x0/0xa0
>  [<ffffffff8100cda0>] ? kernel_thread_helper+0x0/0x10
> Code: 4e 1c 00 85 c0 89 c1 0f 84 a7 00 00 00 49 89 de 45 31 ff 31 d2 0f 1f 44
> 00 00 49 8b 06 48 8b 38 48 85 ff 74 3d 40 f6 c7 01 75 54 <44> 8b 47 08 4c 8d 57
> 08 45 85 c0 74 e5 45 8d 48 01 44 89 c0 f0
> RIP  [<ffffffff810f8d71>] find_get_pages+0x61/0x110
>  RSP <ffff881fdee55800>
> CR2: 000000000040403c
> ---[ end trace 84193c2a431ae14b ]---

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-11 13:34 ` 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load Christoph Hellwig
@ 2011-10-11 14:13   ` Anders Ossowicki
  2011-10-11 16:07     ` Jesper Krogh
  2011-10-12  0:35     ` Dave Chinner
  2011-10-17 12:40   ` jesper
  1 sibling, 2 replies; 8+ messages in thread

From: Anders Ossowicki @ 2011-10-11 14:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: aradford, linux-kernel, xfs

On Tue, Oct 11, 2011 at 03:34:48PM +0200, Christoph Hellwig wrote:
> This is core VM code, and operates purely on on-stack variables except
> for the page cache radix tree nodes / pages.  So this either could be a
> core VM bug that no one has noticed yet, or memory corruption.  Can you
> run memtest86 on the box?

Unfortunately not, as it is a production server. Pulling it out to memtest
256G properly would take too long. But it seems unlikely to me that this is
memory corruption. The machine has been running with the same (ECC) memory
for more than a year, and neither the service processor nor the kernel
(according to dmesg) has caught anything before this. It would be a rare
(though I admit not impossible) coincidence if we got catastrophic,
undetected memory corruption a week after attaching a new raid controller
with a new disk array.

--
Anders Ossowicki
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-11 14:13 ` Anders Ossowicki
@ 2011-10-11 16:07   ` Jesper Krogh
  2011-10-12  0:35   ` Dave Chinner
  1 sibling, 0 replies; 8+ messages in thread

From: Jesper Krogh @ 2011-10-11 16:07 UTC (permalink / raw)
To: Christoph Hellwig, linux-kernel, aradford, xfs

On 2011-10-11 16:13, Anders Ossowicki wrote:
> On Tue, Oct 11, 2011 at 03:34:48PM +0200, Christoph Hellwig wrote:
>> This is core VM code, and operates purely on on-stack variables except
>> for the page cache radix tree nodes / pages.  So this either could be a
>> core VM bug that no one has noticed yet, or memory corruption.  Can you
>> run memtest86 on the box?
> Unfortunately not, as it is a production server. Pulling it out to memtest 256G
> properly would take too long. But it seems unlikely to me that it should be
> memory corruption. The machine has been running with the same (ecc) memory for
> more than a year and neither the service processor nor the kernel (according to
> dmesg) has caught anything before this. It would be a rare (though I admit not
> impossible) coincidence if we got catastrophic, undetected memory corruption a
> week after attaching a new raid controller with a new disk array.

A sidenote that Anders forgot: the system was stable for a very long time,
but on a 2.6.37 kernel. We upgraded to 2.6.38 to get the raid-controller
support, and then it crashed. Now we're trying to get the new hardware in
the air on 2.6.37 with a backpatched megaraid driver for the RAID
controller.

--
Jesper
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-11 14:13 ` Anders Ossowicki
  2011-10-11 16:07   ` Jesper Krogh
@ 2011-10-12  0:35   ` Dave Chinner
  2011-10-12  4:13     ` Stan Hoeppner
  2011-10-12 12:29     ` Anders Ossowicki
  1 sibling, 2 replies; 8+ messages in thread

From: Dave Chinner @ 2011-10-12 0:35 UTC (permalink / raw)
To: Christoph Hellwig, linux-kernel, aradford, xfs

On Tue, Oct 11, 2011 at 04:13:38PM +0200, Anders Ossowicki wrote:
> On Tue, Oct 11, 2011 at 03:34:48PM +0200, Christoph Hellwig wrote:
> > This is core VM code, and operates purely on on-stack variables except
> > for the page cache radix tree nodes / pages.  So this either could be a
> > core VM bug that no one has noticed yet, or memory corruption.  Can you
> > run memtest86 on the box?
>
> Unfortunately not, as it is a production server. Pulling it out to memtest 256G
> properly would take too long. But it seems unlikely to me that it should be
> memory corruption. The machine has been running with the same (ecc) memory for
> more than a year and neither the service processor nor the kernel (according to
> dmesg) has caught anything before this. It would be a rare (though I admit not
> impossible) coincidence if we got catastrophic, undetected memory corruption a
> week after attaching a new raid controller with a new disk array.

Memory corruption can be caused by more than just a bad memory
stick. You've got a brand new driver running your brand new
controller, and it may still have bugs - it might be scribbling over
memory it doesn't own because of off-by-one index errors, etc. It's
much more likely that the new hardware or driver code is the cause
of your problem than an undetected ECC memory error or a core VM
problem.

FWIW, if it's a repeatable problem, you might want to update the
driver and controller firmware to something more recent and see if
that solves the problem....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-12  0:35 ` Dave Chinner
@ 2011-10-12  4:13   ` Stan Hoeppner
  0 siblings, 0 replies; 8+ messages in thread

From: Stan Hoeppner @ 2011-10-12 4:13 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, aradford, linux-kernel, xfs

On 10/11/2011 7:35 PM, Dave Chinner wrote:
> FWIW, if it's a repeatable problem, you might want to update the
> driver and controller firmware to something more recent and see if
> that solves the problem....

New firmware was released on 08/11/2011 for the SAS2208 dual-core ASIC
based cards, the 9265-8i and 9285-8i. I'd say the odds are good that the
OP's card(s) shipped with the original firmware, given this one was
released exactly two months ago today.

Complete list of bug fixes and enhancements in this firmware:
http://kb.lsi.com/KnowledgebaseArticle16557.aspx

I didn't count them, but it looks like a few hundred, which makes this a
major product refresh and seems to indicate the original firmware for
these two cards was a bit buggy. This is the 2nd firmware release.

Firmware file:
http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/23.1.1-0004_SAS_FW_Image_3.140.15-1320.zip

--
Stan
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-12  0:35 ` Dave Chinner
  2011-10-12  4:13   ` Stan Hoeppner
@ 2011-10-12 12:29   ` Anders Ossowicki
  1 sibling, 0 replies; 8+ messages in thread

From: Anders Ossowicki @ 2011-10-12 12:29 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, aradford@gmail.com, linux-kernel@vger.kernel.org,
    xfs@oss.sgi.com

On Wed, Oct 12, 2011 at 02:35:26AM +0200, Dave Chinner wrote:
> Memory corruption can be caused by more than just a bad memory
> stick. You've got a brand new driver running your brand new
> controller and it may still have bugs - it might be scribbling over
> memory it doesn't own because of off-by-one index errors, etc. It's
> much more likely that that new hardware or driver code is the cause
> of your problem than an undetected ECC memory error or core VM
> problem.

Ah, now that I agree on. A few more observations from today's experiments:

First of all, there are two MegaRAID controllers in the machine: the
old'n'reliable 8888ELP and the new'n'wonky 9285-8e. Both are using the
megaraid driver, and the 8888ELP card ran without a hitch for about a
year with the megaraid driver prior to the refactoring that introduced
support for the 9285-8e.

We've gotten to a point where we can reliably reproduce this by running
certain queries in postgresql when data from the disk is cached, e.g.:

foo=# select count(*) from foo.sequence;
ERROR:  invalid page header in block 529134 of relation base/16385/58318945

If we echo 3 > /proc/sys/vm/drop_caches and reload postgres, the same
queries work. This does indeed smell like memory corruption. The 9285-8e
controller has FastPath enabled.

> FWIW, if it's a repeatable problem, you might want to update the
> driver and controller firmware to something more recent and see if
> that solves the problem....

I upgraded the firmware (post-accident) but we're still seeing the
corruption.
--
Anders
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-11 13:34 ` Christoph Hellwig
  2011-10-11 14:13   ` Anders Ossowicki
@ 2011-10-17 12:40   ` jesper
  2011-10-24 16:45     ` Michael Monnerie
  1 sibling, 1 reply; 8+ messages in thread

From: jesper @ 2011-10-17 12:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: aradford, linux-kernel, xfs

> On Tue, Oct 11, 2011 at 11:17:57AM +0200, Anders Ossowicki wrote:
>> Pid: 27512, comm: flush-8:32 Tainted: G        W   2.6.38.8 #1 Dell Inc.
>> PowerEdge R815/04Y8PT
>> RIP: 0010:[<ffffffff810f8d71>]  [<ffffffff810f8d71>]
>> find_get_pages+0x61/0x110
>
> This is core VM code, and operates purely on on-stack variables except
> for the page cache radix tree nodes / pages.  So this either could be a
> core VM bug that no one has noticed yet, or memory corruption.  Can you
> run memtest86 on the box?

Over the weekend we ran memtest for 4 hours (50% of the complete test
run, according to memtest) and it didn't find anything.

We've also backpatched the raid driver into 2.6.37 and the problem
continued, so it seems to be related to the driver and/or its
combination with the hardware.

Jesper

--
Jesper
* Re: 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load
  2011-10-17 12:40 ` jesper
@ 2011-10-24 16:45   ` Michael Monnerie
  0 siblings, 0 replies; 8+ messages in thread

From: Michael Monnerie @ 2011-10-24 16:45 UTC (permalink / raw)
To: xfs; +Cc: Christoph Hellwig, aradford, linux-kernel, jesper

[-- Attachment #1.1: Type: Text/Plain, Size: 853 bytes --]

On Monday, 17 October 2011, jesper@krogh.cc wrote:
> Over the weekend we ran memtest for 4 hours (50% of the complete test
> run, according to memtest) and it didn't find anything.

This is a bit OT, but you *must* run at least one full loop of memtest
in order to get significant output. I've had PCs that only broke on
test 7, and sometimes a PC could run 2-3 loops before a bad bit was
found on the 4th loop or so. Only if there's a nasty error, like a
physically broken mainboard, will you get errors quickly.

That said, I think your problem is not memory-based, as you said you
use ECC, and I'd assume you've turned on background scrubbing.

--
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
end of thread, other threads: [~2011-10-24 16:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20111011091757.GA32589@otto.nzcorp.net>
2011-10-11 13:34 ` 2.6.38.8 kernel bug in XFS or megaraid driver with heavy I/O load Christoph Hellwig
2011-10-11 14:13 ` Anders Ossowicki
2011-10-11 16:07 ` Jesper Krogh
2011-10-12 0:35 ` Dave Chinner
2011-10-12 4:13 ` Stan Hoeppner
2011-10-12 12:29 ` Anders Ossowicki
2011-10-17 12:40 ` jesper
2011-10-24 16:45 ` Michael Monnerie