From: Eric Sandeen <sandeen@sandeen.net>
To: Emmanuel Florac <eflorac@intellique.com>, xfs@oss.sgi.com
Subject: Re: easily reproducible filesystem crash on rebuilding array
Date: Thu, 11 Dec 2014 09:52:55 -0600
Message-ID: <5489BDD7.10602@sandeen.net>
In-Reply-To: <20141211123936.1f3d713d@harpe.intellique.com>
On 12/11/14 5:39 AM, Emmanuel Florac wrote:
>
> Here's the setup: hardware RAID controller (Adaptec 7xx5 series, latest
> firmware), RAID-6 array (the problem occurred with different RAID widths,
> sizes, and disk configurations), and different kernels from 3.2.x to
> 3.16.x.
>
> What happens: while the array is rebuilding, simultaneously reading and
> writing is a sure way to break the filesystem and at times, corrupt
> data.
>
> If the array is NOT rebuilding, nothing ever happens. When using the
> array in read-only mode while it rebuilds, nothing ever happens.
> However, while the array is rebuilding, relatively heavy IO almost
> always produces something like the following:
>
> Dec 10 17:00:56 TEST-ADAPTEC kernel: <1<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repai<<<<<<<1<1<1>XFS (dm-0): Unmount and <<<<1<<1<1<1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:00:56 TEST-ADAPTEC kernel: <1<<<<<<1<1<1>XFS (dm-0): Unmount and run xf<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs<<<<<<<1<1<1>XFS (dm-0): Unmount and run<<<<<<<1<1><1>XFS (dm-0): Unmount and run<<<<<<<1><1<1>XFS (dm-0): Unmount and<<<<<<<1<1<1>XFS (dm-0): Unmount<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:00:56 TEST-ADAPTEC kernel: <1<<<1<1<1>XFS (dm-0): Unmount and run xfs_<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:00:56 TEST-ADAPTEC kernel: <1<<<1<1<1>XF<1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:00:58 TEST-ADAPTEC kernel: <1<<<<<<1<1>XFS (dm-0): Unmount and run xf<<<<1<1>XFS (dm-0): Unmount and run xfs_repa<<<<<<<1<1><1>XFS (dm-0): Unmount and run xfs_re<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_r<<<<<<<1<1><1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:01:01 TEST-ADAPTEC kernel: <<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair
> Dec 10 17:01:01 TEST-ADAPTEC kernel: <<<<<<<1<1<1>XFS (dm-0): Unmount and run<<<<<<<1<1<1>XFS (dm-0): Unmount and run xfs_repair
wow, that's a mess...
> Dec 10 17:01:02 TEST-ADAPTEC kernel: CPU: 6 PID: 16818 Comm: cp Tainted: G O 3.16.7-storiq64-opteron #1
> Dec 10 17:01:02 TEST-ADAPTEC kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.0a 05/07/2013
> Dec 10 17:01:02 TEST-ADAPTEC kernel: 0000000000000000 0000000000000001 ffffffff814ca287 ffff88040404a4f8
> Dec 10 17:01:02 TEST-ADAPTEC kernel: ffffffff81213f7d ffffffff81230203 ffff880200000001 ffff8802009ce703
> Dec 10 17:01:02 TEST-ADAPTEC kernel: ffff8802aa193560 0000000000000001 0000000000000002 0000000000000000
> Dec 10 17:01:02 TEST-ADAPTEC kernel: Call Trace:
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff814ca287>] ? dump_stack+0x41/0x51
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81213f7d>] ? xfs_alloc_fixup_trees+0x2dd/0x390
The actual WANT_CORRUPTED_GOTO message isn't shown, but the trace through
xfs_alloc_fixup_trees suggests XFS found the allocation btrees in a bad state.
Given that this only happens when your raid array is under duress, I'd lay
odds on it being a storage problem, not a filesystem problem.
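One rough way to test that theory is to write known data during a rebuild and
check that it reads back unchanged. The sketch below is only an illustration:
the function name, the file naming, and the sizes are all made up here, and it
is no substitute for a proper block-level test. It writes random files, fsyncs
them, then re-reads them and compares SHA-256 digests.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, n_files=8, size=1 << 20):
    """Write n_files of random data, fsync each, then re-read and compare digests.

    Returns the list of file paths whose contents differ from what was written.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    expected = {}
    for i in range(n_files):
        name = os.path.join(directory, f"pattern-{i:04d}.bin")
        data = os.urandom(size)
        with open(name, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        expected[name] = hashlib.sha256(data).hexdigest()

    mismatched = []
    for name, digest in expected.items():
        with open(name, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != digest:
            mismatched.append(name)
    return mismatched
```

Call it as write_and_verify("/path/on/the/array") while the rebuild is running;
an empty list means every file read back intact. Note that an immediate re-read
is normally served from the page cache, so to actually exercise the controller
you would need to drop caches or remount between the write and verify phases.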
-Eric
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81230203>] ? xfs_btree_get_rec+0x53/0x90
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff812168a5>] ? xfs_alloc_ag_vextent_near+0x8a5/0xae0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81216ba5>] ? xfs_alloc_ag_vextent+0xc5/0x100
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff812178c1>] ? xfs_alloc_vextent+0x441/0x5f0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8121f573>] ? xfs_bmap_btalloc_nullfb+0x73/0xe0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81226aa1>] ? xfs_bmap_btalloc+0x481/0x720
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff812277ad>] ? xfs_bmapi_write+0x55d/0x9f0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8122a857>] ? xfs_btree_read_buf_block.constprop.28+0x87/0xc0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81231976>] ? xfs_da_grow_inode_int+0xd6/0x360
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8109669d>] ? up+0xd/0x40
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811fba30>] ? xfs_buf_unlock+0x10/0x60
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811fb49e>] ? xfs_buf_rele+0x4e/0x170
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8112d246>] ? cache_alloc_refill+0x96/0x2d0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8124b32f>] ? xfs_iread+0x11f/0x410
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8123508f>] ? xfs_dir2_grow_inode+0x6f/0x130
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff812372b9>] ? xfs_dir2_sf_to_block+0xb9/0x5b0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff812137be>] ? kmem_zone_alloc+0x6e/0xf0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8114ee0a>] ? unlock_new_inode+0x3a/0x60
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8124544b>] ? xfs_ialloc+0x29b/0x530
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8123edc3>] ? xfs_dir2_sf_addname+0x113/0x5d0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81235938>] ? xfs_dir_createname+0x168/0x1a0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81245f87>] ? xfs_create+0x547/0x710
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8120981c>] ? xfs_generic_create+0xdc/0x250
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811445c1>] ? vfs_create+0x71/0xc0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81144d45>] ? do_last.isra.62+0x735/0xd00
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811415d1>] ? link_path_walk+0x61/0x7e0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811453de>] ? path_openat+0xce/0x5f0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81145a8b>] ? user_path_at_empty+0x6b/0xb0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81145b97>] ? do_filp_open+0x47/0xb0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811519da>] ? __alloc_fd+0x3a/0x100
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81135bc0>] ? do_sys_open+0x140/0x230
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff814d08a9>] ? system_call_fastpath+0x16/0x1b
> Dec 10 17:01:02 TEST-ADAPTEC kernel: CPU: 6 PID: 16818 Comm: cp Tainted: G O 3.16.7-storiq64-opteron #1
> Dec 10 17:01:02 TEST-ADAPTEC kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.0a 05/07/2013
> Dec 10 17:01:02 TEST-ADAPTEC kernel: 0000000000000000 000000000000000c ffffffff814ca287 ffff88040cde45c8
> Dec 10 17:01:02 TEST-ADAPTEC kernel: ffffffff81212fdf ffff8803201b1000 ffff8802aa193c68 ffff88040be30000
> Dec 10 17:01:02 TEST-ADAPTEC kernel: ffffffff81245d8b 0000000000000023 ffff8802aa193ba8 ffff8802aa193ba4
> Dec 10 17:01:02 TEST-ADAPTEC kernel: Call Trace:
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff814ca287>] ? dump_stack+0x41/0x51
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81212fdf>] ? xfs_trans_cancel+0xef/0x110
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81245d8b>] ? xfs_create+0x34b/0x710
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff8120981c>] ? xfs_generic_create+0xdc/0x250
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811445c1>] ? vfs_create+0x71/0xc0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81144d45>] ? do_last.isra.62+0x735/0xd00
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811415d1>] ? link_path_walk+0x61/0x7e0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811453de>] ? path_openat+0xce/0x5f0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81145a8b>] ? user_path_at_empty+0x6b/0xb0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81145b97>] ? do_filp_open+0x47/0xb0
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff811519da>] ? __alloc_fd+0x3a/0x100
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff81135bc0>] ? do_sys_open+0x140/0x230
> Dec 10 17:01:02 TEST-ADAPTEC kernel: [<ffffffff814d08a9>] ? system_call_fastpath+0x16/0x1b
> Dec 10 17:01:02 TEST-ADAPTEC kernel: XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 959 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff81212ff8
> Dec 10 17:01:25 TEST-ADAPTEC kernel: XFS (dm-0): xfs_log_force: error 5 returned.
> Dec 10 17:01:55 TEST-ADAPTEC kernel: XFS (dm-0): xfs_log_force: error 5 returned.
> Dec 10 17:02:55 TEST-ADAPTEC last message repeated 2 times
>
> Any idea is welcome...
>