From: David Chinner <dgc@sgi.com>
To: Emmanuel Florac <eflorac@intellique.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS crash on linux raid
Date: Fri, 4 May 2007 10:59:22 +1000
Message-ID: <20070504005922.GC32602149@melbourne.sgi.com>
In-Reply-To: <20070503164521.16efe075@harpe.intellique.com>
On Thu, May 03, 2007 at 04:45:21PM +0200, Emmanuel Florac wrote:
>
> Hello,
> Apparently quite a lot of people do encounter the same problem from
> time to time, but I couldn't find any solution.
>
> When writing quite a lot to the filesystem (heavy load on the
> fileserver), the filesystem crashes when filled at 2.5~3TB (varies from
> time to time). The filesystems tested were always running on a software
> raid 0 with barriers disabled. I tend to think the disabled write
> barriers are causing the crash, but I'll run more tests to be sure.
>
> I met this problem for the first time on 12/23 (yup... merry
> christmas :) when a 13 TB filesystem went belly up:
>
> Dec 23 01:38:10 storiq1 -- MARK --
> Dec 23 01:58:10 storiq1 -- MARK --
> Dec 23 02:10:29 storiq1 kernel: xfs_iunlink_remove: xfs_itobp()
> returned an error 990 on md0. Returning error.
> Dec 23 02:10:29 storiq1 kernel: xfs_inactive:^Ixfs_ifree() returned an
> error = 990 on md0
> Dec 23 02:10:29 storiq1 kernel: xfs_force_shutdown(md0,0x1) called from
> line 1763 of file fs/xfs/xfs_vnodeops.c. Return address = 0xc027f78b
> Dec 23 02:38:11 storiq1 -- MARK --
> Dec 23 02:58:11 storiq1 -- MARK --
So the filesystem found on-disk corruption while trying to remove an
inode, and shut itself down.
Were there any I/O errors reported before the shutdown?
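One quick way to check, sketched below (the log path and message patterns are assumptions; they vary by distro, and `dmesg` works on a live box):

```shell
# Scan the kernel log for block-layer I/O errors in the run-up to the
# XFS shutdown. /var/log/kern.log is a guess at the log location.
grep -E 'I/O error|end_request|xfs_force_shutdown' /var/log/kern.log
```

An `end_request: I/O error` line shortly before the `xfs_force_shutdown` message would point at the storage rather than the filesystem.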
> When mounting, it did this:
>
> Filesystem "md0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md0
> Starting XFS recovery on filesystem: md0 (logdev: internal)
> Filesystem "md0": xfs_inode_recover: Bad inode magic number, dino ptr =
> 0xf7196600, dino bp = 0xf718e980, ino = 119318
> Filesystem "md0": XFS
Which was found again during log recovery.
> The system was running vanilla 2.6.17.9, and md0 was made of 3 striped
> RAID-5 on 3 3Ware-9550 cards, each hardware RAID-5 made of 8 750 GB
> drives.
>
> On a similar hardware with 2 3Ware-9550 16x750GB striped together, but
> running 2.6.17.13, I had a similar fs crash last week. Unfortunately I
> don't have the logs at hand, but we were able to reproduce the crash
> several times at home:
Hmm - 750GB drives are brand new. I wouldn't rule out media issues
at this point...
> Filesystem "md0": XFS internal error xfs_btree_check_sblock at line 336
> of file fs/xfs/xfs_btree.c. Caller 0xc01fb282 <c0214568>
Memory corruption?
> line 1151 of file fs/xfs/xfs_trans.c. Return address = 0xc025f7b9
> Filesystem "md0": Corruption of in-memory data detected. Shutting down
> filesystem: md0 Please umount the filesystem, and rectify the
> problem(s) xfs_force_shutdown(md0,0x1) called from line 338 of file
> fs/xfs/xfs_rw.c. Return address = 0xc025f7b9
> xfs_force_shutdown(md0,0x1) called from line 338 of file
> fs/xfs/xfs_rw.c. Return address = 0xc025f7b9
>
> After xfs_repair, the fs is fine. However, it crashes again after
> writing a couple of GBs of data. The crash reproduces under 2.6.17.13,
> 2.6.17.13 SMP, 2.6.18.8, 2.6.16.36...
>
> Out of curiosity, I've tried to use reiserfs (just to see how it
> compares regarding this). Reiserfs crashed before even writing 100MB!
That indicates there's something wrong other than the filesystem.
I'd suggest making sure your RAID arrays, memory, etc. are all
functioning correctly first.
What platform are you running on? Are you running ia32 with 4k stacks?
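If it is an ia32 kernel, the build config usually answers that; a sketch, assuming the config was installed under /boot:

```shell
# CONFIG_4KSTACKS=y means the kernel uses 4k kernel stacks, which deep
# XFS-on-md-on-driver call chains were known to overflow on ia32.
grep CONFIG_4KSTACKS "/boot/config-$(uname -r)"
```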
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Thread overview: 34+ messages
2007-05-03 14:45 XFS crash on linux raid Emmanuel Florac
2007-05-03 23:02 ` Chris Wedgwood
2007-05-04 0:59 ` David Chinner [this message]
2007-05-04 7:06 ` Emmanuel Florac
2007-05-04 7:33 ` David Chinner
2007-05-04 13:25 ` Emmanuel Florac
2007-05-04 14:55 ` Eric Sandeen
2007-05-04 15:30 ` Emmanuel Florac
2007-05-04 23:20 ` Chris Wedgwood
2007-05-05 15:19 ` Emmanuel Florac
2007-05-05 16:50 ` Eric Sandeen
2007-05-05 20:35 ` Chris Wedgwood
2007-05-05 20:58 ` Emmanuel Florac
2007-05-05 22:12 ` Chris Wedgwood
2007-05-06 17:21 ` Emmanuel Florac
2007-05-05 20:57 ` Emmanuel Florac
2007-11-19 18:10 ` Alexander Bergolth
2007-11-19 23:44 ` Chris Wedgwood
2007-11-21 15:39 ` Alexander 'Leo' Bergolth
2007-05-04 15:58 ` Martin Steigerwald
2007-05-04 21:43 ` Emmanuel Florac
2007-05-05 4:49 ` Eric Sandeen
2007-05-05 15:18 ` Emmanuel Florac
2007-05-05 16:47 ` Eric Sandeen
2007-05-05 20:56 ` Emmanuel Florac
[not found] ` <20070505210002.GC17112@tuatara.stupidest.org>
2007-05-06 17:21 ` Emmanuel Florac
2007-05-06 17:26 ` Chris Wedgwood
2007-05-06 18:36 ` Emmanuel Florac
2007-05-05 20:56 ` Chris Wedgwood
2007-05-06 17:19 ` Emmanuel Florac
2007-05-06 17:56 ` Martin Steigerwald
2007-05-07 2:11 ` David Chinner
2007-05-07 10:07 ` Emmanuel Florac
2007-07-30 4:07 ` richid