* xfs and raid5 - "Structure needs cleaning for directory open"

From: Rainer Fuegenstein @ 2010-05-09 18:48 UTC
To: xfs; +Cc: linux-raid

This morning some daemon processes terminated because of errors in the XFS
file system on top of a software RAID5 consisting of 4*1.5TB WD Caviar Green
SATA disks. The current OS is CentOS 5.4; the kernel is:

Linux alfred 2.6.18-164.15.1.el5xen #1 SMP Wed Mar 17 12:04:23 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

The history: this RAID was originally created on an ASUS M2N-X Plus mainboard
with all 4 drives connected to the on-board controller (CentOS 5.4, current
i386 kernel). It worked fine at first, but after some months problems
occurred when copying files via SMB; in these situations dmesg showed a stack
trace, starting with an interrupt problem deep in the kernel and reaching up
into the XFS filesystem code. A few months ago the weekly RAID check
(/etc/cron.weekly/99-raid-check) started a re-sync of the RAID, which (on the
M2N-X board) took about 2.5 to 3 days to complete.

To overcome the interrupt problems, I recently bought an Intel D510 Atom
mainboard and a "Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)"
SATA controller, reinstalled CentOS 5.4 from scratch (x86_64 version) and
attached the 4 SATA disks, which worked fine until this Sunday night.

The 99-raid-check started again at 4:00 in the morning and lasted until just
now (19:00). Around 12:00 noon (resync at about 50%) I noticed the first
problems, namely "Structure needs cleaning for directory open" messages. At
this time, a "du -sh *" revealed that around 50% of the data stored on the
XFS was lost (due to directories that couldn't be read because of the "needs
cleaning ..." error). A daring xfs_repair on the unmounted, but still
syncing, filesystem revealed & fixed no errors (see output below).

After painfully waiting 7 hours for the resync to complete, it looks like the
filesystem is OK and back to normal again: du shows the expected 3.5TB usage,
there are no more "needs cleaning ..." errors, and a quick check of the
previously lost directories suggests that the files contained within are OK.

I wonder what caused this behaviour (and how to prevent it in the future):

1) damage done to the XFS filesystem on the old board? Shouldn't xfs_repair
   find & repair it?
2) does a re-syncing RAID deliver bad/corrupt data to the filesystem layer
   above?
3) could this be a hardware/memory problem, since XFS reports "Corruption of
   in-memory data detected"?
4) is the Promise SATA controller to blame?

Here's some output that may help; please let me know if you need more:

*** this is where it started:

May 9 04:22:01 alfred kernel: md: syncing RAID array md0
May 9 04:22:01 alfred kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
May 9 04:22:01 alfred kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
May 9 04:22:01 alfred kernel: md: using 128k window, over a total of 1465135936 blocks.
May 9 04:24:06 alfred kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4565 of file fs/xfs/xfs_bmap.c.  Caller 0xffffffff8835dba8
May 9 04:24:06 alfred kernel:
May 9 04:24:06 alfred kernel: Call Trace:
May 9 04:24:06 alfred kernel: [<ffffffff8833f15e>] :xfs:xfs_bmap_read_extents+0x361/0x384
May 9 04:24:06 alfred kernel: [<ffffffff8835dba8>] :xfs:xfs_iread_extents+0xac/0xc8
May 9 04:24:06 alfred kernel: [<ffffffff883448c3>] :xfs:xfs_bmapi+0x226/0xe79
May 9 04:24:06 alfred kernel: [<ffffffff8021c4c6>] generic_make_request+0x211/0x228
May 9 04:24:06 alfred kernel: [<ffffffff882edd2e>] :raid456:handle_stripe+0x20a6/0x21ff
May 9 04:24:06 alfred kernel: [<ffffffff88361a2b>] :xfs:xfs_iomap+0x144/0x2a5
May 9 04:24:06 alfred kernel: [<ffffffff88376c38>] :xfs:__xfs_get_blocks+0x7a/0x1bf
May 9 04:24:06 alfred kernel: [<ffffffff882eebdb>] :raid456:make_request+0x4ba/0x4f4
May 9 04:24:06 alfred kernel: [<ffffffff8029bfc3>] autoremove_wake_function+0x0/0x2e
May 9 04:24:06 alfred kernel: [<ffffffff80228a95>] do_mpage_readpage+0x167/0x474
May 9 04:24:06 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 04:24:06 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 04:24:06 alfred kernel: [<ffffffff8020cc70>] add_to_page_cache+0xb9/0xc5
May 9 04:24:06 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 04:24:06 alfred kernel: [<ffffffff8023a3d8>] mpage_readpages+0x91/0xd9
May 9 04:24:06 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 04:24:06 alfred kernel: [<ffffffff8020f66f>] __alloc_pages+0x65/0x2ce
May 9 04:24:06 alfred kernel: [<ffffffff802137d1>] __do_page_cache_readahead+0x130/0x1ab
May 9 04:24:06 alfred kernel: [<ffffffff802336c8>] blockable_page_cache_readahead+0x53/0xb2
May 9 04:24:06 alfred kernel: [<ffffffff802147a4>] page_cache_readahead+0xd6/0x1af
May 9 04:24:06 alfred kernel: [<ffffffff8020c6d7>] do_generic_mapping_read+0xc6/0x38a
May 9 04:24:06 alfred kernel: [<ffffffff8020d693>] file_read_actor+0x0/0x101
May 9 04:24:06 alfred kernel: [<ffffffff8020cae7>] __generic_file_aio_read+0x14c/0x198
May 9 04:24:06 alfred kernel: [<ffffffff8837d7de>] :xfs:xfs_read+0x187/0x209
May 9 04:24:06 alfred kernel: [<ffffffff8837a4d8>] :xfs:xfs_file_aio_read+0x63/0x6b
May 9 04:24:06 alfred kernel: [<ffffffff8020d3d2>] do_sync_read+0xc7/0x104
May 9 04:24:06 alfred kernel: [<ffffffff8021ecec>] __dentry_open+0x101/0x1dc
May 9 04:24:06 alfred kernel: [<ffffffff8029bfc3>] autoremove_wake_function+0x0/0x2e
May 9 04:24:06 alfred kernel: [<ffffffff80227a40>] do_filp_open+0x2a/0x38
May 9 04:24:06 alfred kernel: [<ffffffff8020bbaf>] vfs_read+0xcb/0x171
May 9 04:24:06 alfred kernel: [<ffffffff80212495>] sys_read+0x45/0x6e
May 9 04:24:06 alfred kernel: [<ffffffff8026168d>] ia32_sysret+0x0/0x5
May 9 04:24:06 alfred kernel:
May 9 04:24:06 alfred kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4565 of file fs/xfs/xfs_bmap.c.  Caller 0xffffffff8835dba8

*** (many, many more)

May 9 06:19:16 alfred kernel: Filesystem "md0": corrupt dinode 1610637790, (btree extents).  Unmount and run xfs_repair.
May 9 06:19:16 alfred kernel: Filesystem "md0": XFS internal error xfs_bmap_read_extents(1) at line 4560 of file fs/xfs/xfs_bmap.c.  Caller 0xffffffff8835dba8
May 9 06:19:16 alfred kernel:
May 9 06:19:16 alfred kernel: Call Trace:
May 9 06:19:16 alfred kernel: [<ffffffff8833f15e>] :xfs:xfs_bmap_read_extents+0x361/0x384
May 9 06:19:16 alfred kernel: [<ffffffff8835dba8>] :xfs:xfs_iread_extents+0xac/0xc8
May 9 06:19:16 alfred kernel: [<ffffffff883448c3>] :xfs:xfs_bmapi+0x226/0xe79
May 9 06:19:16 alfred kernel: [<ffffffff8866ac47>] :ip_conntrack:tcp_pkt_to_tuple+0x0/0x61
May 9 06:19:16 alfred kernel: [<ffffffff8866883d>] :ip_conntrack:__ip_conntrack_find+0xd/0xb7
May 9 06:19:16 alfred kernel: [<ffffffff8023f750>] lock_timer_base+0x1b/0x3c
May 9 06:19:16 alfred kernel: [<ffffffff8021ce99>] __mod_timer+0xb0/0xbe
May 9 06:19:16 alfred kernel: [<ffffffff88668e71>] :ip_conntrack:__ip_ct_refresh_acct+0x10f/0x152
May 9 06:19:16 alfred kernel: [<ffffffff8866b8a8>] :ip_conntrack:tcp_packet+0xa5f/0xa9f
May 9 06:19:16 alfred kernel: [<ffffffff88361a2b>] :xfs:xfs_iomap+0x144/0x2a5
May 9 06:19:16 alfred kernel: [<ffffffff88376c38>] :xfs:__xfs_get_blocks+0x7a/0x1bf
May 9 06:19:16 alfred kernel: [<ffffffff802235ae>] alloc_buffer_head+0x31/0x36
May 9 06:19:16 alfred kernel: [<ffffffff8022fa7a>] alloc_page_buffers+0x81/0xd3
May 9 06:19:16 alfred kernel: [<ffffffff8020ea95>] __block_prepare_write+0x1ad/0x375
May 9 06:19:16 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 06:19:16 alfred kernel: [<ffffffff802bda81>] add_to_page_cache_lru+0x1c/0x22
May 9 06:19:16 alfred kernel: [<ffffffff802d3456>] block_write_begin+0x80/0xcf
May 9 06:19:16 alfred kernel: [<ffffffff8837637d>] :xfs:xfs_vm_write_begin+0x19/0x1e
May 9 06:19:16 alfred kernel: [<ffffffff88376d8e>] :xfs:xfs_get_blocks+0x0/0xe
May 9 06:19:16 alfred kernel: [<ffffffff8021072e>] generic_file_buffered_write+0x14b/0x60c
May 9 06:19:16 alfred kernel: [<ffffffff80209e60>] __d_lookup+0xb0/0xff
May 9 06:19:16 alfred kernel: [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
May 9 06:19:16 alfred kernel: [<ffffffff8837dcfe>] :xfs:xfs_write+0x49e/0x69e
May 9 06:19:16 alfred kernel: [<ffffffff8022d090>] mntput_no_expire+0x19/0x89
May 9 06:19:16 alfred kernel: [<ffffffff8020edf0>] link_path_walk+0xa6/0xb2
May 9 06:19:16 alfred kernel: [<ffffffff8837a470>] :xfs:xfs_file_aio_write+0x65/0x6a
May 9 06:19:16 alfred kernel: [<ffffffff802185e8>] do_sync_write+0xc7/0x104
May 9 06:19:16 alfred kernel: [<ffffffff8021ecec>] __dentry_open+0x101/0x1dc
May 9 06:19:16 alfred kernel: [<ffffffff8029bfc3>] autoremove_wake_function+0x0/0x2e
May 9 06:19:16 alfred kernel: [<ffffffff80227a40>] do_filp_open+0x2a/0x38
May 9 06:19:16 alfred kernel: [<ffffffff802171aa>] vfs_write+0xce/0x174
May 9 06:19:16 alfred kernel: [<ffffffff802179e2>] sys_write+0x45/0x6e
May 9 06:19:16 alfred kernel: [<ffffffff8026168d>] ia32_sysret+0x0/0x5

*** also many, many more, always the same dinode

May 9 12:53:32 alfred kernel: Filesystem "md0": XFS internal error xfs_btree_check_sblock at line 307 of file fs/xfs/xfs_btree.c.  Caller 0xffffffff88358eb7
May 9 12:53:32 alfred kernel:
May 9 12:53:32 alfred kernel: Call Trace:
May 9 12:53:32 alfred kernel: [<ffffffff88349bee>] :xfs:xfs_btree_check_sblock+0xaf/0xbe
May 9 12:53:32 alfred kernel: [<ffffffff88358eb7>] :xfs:xfs_inobt_increment+0x156/0x17e
May 9 12:53:32 alfred kernel: [<ffffffff88358920>] :xfs:xfs_dialloc+0x4d0/0x80c
May 9 12:53:32 alfred kernel: [<ffffffff802260ff>] find_or_create_page+0x3f/0xab
May 9 12:53:32 alfred kernel: [<ffffffff8835eafc>] :xfs:xfs_ialloc+0x5f/0x57f
May 9 12:53:32 alfred kernel: [<ffffffff8805c02a>] :ext3:ext3_get_acl+0x63/0x310
May 9 12:53:32 alfred kernel: [<ffffffff8020b242>] kmem_cache_alloc+0x62/0x6d
May 9 12:53:32 alfred kernel: [<ffffffff88370b23>] :xfs:xfs_dir_ialloc+0x86/0x2b7
May 9 12:53:32 alfred kernel: [<ffffffff883654c0>] :xfs:xlog_grant_log_space+0x204/0x25c
May 9 12:53:32 alfred kernel: [<ffffffff883735f8>] :xfs:xfs_create+0x237/0x45c
May 9 12:53:32 alfred kernel: [<ffffffff88338d47>] :xfs:xfs_attr_get+0x8e/0x9f
May 9 12:53:32 alfred kernel: [<ffffffff8837cd38>] :xfs:xfs_vn_mknod+0x144/0x215
May 9 12:53:32 alfred kernel: [<ffffffff8023bdcb>] vfs_create+0xe6/0x158
May 9 12:53:32 alfred kernel: [<ffffffff8021b38f>] open_namei+0x1a1/0x6ed
May 9 12:53:32 alfred kernel: [<ffffffff80227a32>] do_filp_open+0x1c/0x38
May 9 12:53:32 alfred kernel: [<ffffffff8021a1a0>] do_sys_open+0x44/0xbe
May 9 12:53:32 alfred kernel: [<ffffffff8026168d>] ia32_sysret+0x0/0x5
May 9 12:53:32 alfred kernel:

*** also many, many more

May 9 13:44:35 alfred kernel: 00000000: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ÿÿÿÿÿÿÿÿ........
May 9 13:44:35 alfred kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffff8834b82e
May 9 13:44:35 alfred kernel:
May 9 13:44:35 alfred kernel: Call Trace:
May 9 13:44:35 alfred kernel: [<ffffffff8834b72d>] :xfs:xfs_da_do_buf+0x503/0x5b1
May 9 13:44:35 alfred kernel: [<ffffffff8834b82e>] :xfs:xfs_da_read_buf+0x16/0x1b
May 9 13:44:35 alfred kernel: [<ffffffff8020cb6c>] _atomic_dec_and_lock+0x39/0x57
May 9 13:44:35 alfred kernel: [<ffffffff8834b82e>] :xfs:xfs_da_read_buf+0x16/0x1b
May 9 13:44:35 alfred kernel: [<ffffffff88350b0c>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
May 9 13:44:35 alfred kernel: [<ffffffff88350b0c>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
May 9 13:44:35 alfred kernel: [<ffffffff88379c6c>] :xfs:xfs_hack_filldir+0x0/0x5b
May 9 13:44:35 alfred kernel: [<ffffffff88379c6c>] :xfs:xfs_hack_filldir+0x0/0x5b
May 9 13:44:35 alfred kernel: [<ffffffff8834d868>] :xfs:xfs_readdir+0xa7/0xb6
May 9 13:44:35 alfred kernel: [<ffffffff8837a301>] :xfs:xfs_file_readdir+0xff/0x14c
May 9 13:44:35 alfred kernel: [<ffffffff80225d93>] filldir+0x0/0xb7
May 9 13:44:35 alfred kernel: [<ffffffff80225d93>] filldir+0x0/0xb7
May 9 13:44:35 alfred kernel: [<ffffffff802366f7>] vfs_readdir+0x77/0xa9
May 9 13:44:35 alfred kernel: [<ffffffff80239f2a>] sys_getdents+0x75/0xbd
May 9 13:44:35 alfred kernel: [<ffffffff80260295>] tracesys+0x47/0xb6
May 9 13:44:35 alfred kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
May 9 13:44:35 alfred kernel:
May 9 13:51:24 alfred kernel: Filesystem "md0": Disabling barriers, trial barrier write failed
May 9 13:51:24 alfred kernel: XFS mounting filesystem md0

*** these xfs_da_do_buf errors appeared at a rate of about 5 per second until 14:40, then stopped; the filesystem was still mounted, maybe one daemon was still accessing it.
*** xfs_repair performed when raid was at 50% resync and filesystem was corrupted:

[root@alfred ~]# xfs_repair /dev/md0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
[...]
        - agno = 62
        - agno = 63
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
[...]
        - agno = 61
        - agno = 62
        - agno = 63
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

raid output after sync was finished:

[root@alfred md]# cat /sys/block/md0/md/array_state
clean
[root@alfred md]# cat /sys/block/md0/md/mismatch_cnt
0

tnx & cu
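For reference, a minimal sketch of the checks discussed in the mail above: confirm whether md0 is still checking or resyncing before trusting repair output, and run xfs_repair in its read-only mode first so it only reports. The sysfs paths are the ones shown in the output; the mount point is a hypothetical placeholder.

    # is a check/resync still running on md0?
    cat /proc/mdstat
    cat /sys/block/md0/md/sync_action     # "idle" when no check/resync/repair is active
    cat /sys/block/md0/md/mismatch_cnt    # parity mismatches found by the last check

    # dry-run repair of the unmounted filesystem (-n = no modify, report only)
    umount /mnt/raid                      # hypothetical mount point
    xfs_repair -n /dev/md0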
* Re: xfs and raid5 - "Structure needs cleaning for directory open"

From: Dave Chinner @ 2010-05-10 2:20 UTC
To: Rainer Fuegenstein; +Cc: xfs, linux-raid

On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
>
> This morning some daemon processes terminated because of errors in
> the XFS file system on top of a software RAID5 consisting of 4*1.5TB
> WD Caviar Green SATA disks.

Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
would fit the symptoms of btree corruption showing up under heavy IO
load but no corruption on disk. However, I can't seem to find any
references to it at the moment (can't remember the bug title), but
perhaps your distro doesn't have the fix in it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
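One way to follow up on the "perhaps your distro doesn't have the fix" suggestion, sketched under the assumption that the stock CentOS kernel-xen RPM is installed: search the running kernel package's changelog for readahead- or raid5-related fixes and compare against the newest errata kernel.

    # which kernel package is running?
    uname -r                    # e.g. 2.6.18-164.15.1.el5xen
    rpm -q kernel-xen

    # scan its changelog for readahead / raid5 fixes (the changelog wording may differ)
    rpm -q --changelog kernel-xen | grep -iE 'readahead|raid5'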
* Re: xfs and raid5 - "Structure needs cleaning for directory open"

From: Doug Ledford @ 2010-05-17 21:28 UTC
To: Dave Chinner; +Cc: Rainer Fuegenstein, xfs, linux-raid

On 05/09/2010 10:20 PM, Dave Chinner wrote:
> On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
>>
>> This morning some daemon processes terminated because of errors in
>> the XFS file system on top of a software RAID5 consisting of 4*1.5TB
>> WD Caviar Green SATA disks.
>
> Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
> would fit the symptoms of btree corruption showing up under heavy IO
> load but no corruption on disk. However, I can't seem to find any
> references to it at the moment (can't remember the bug title), but
> perhaps your distro doesn't have the fix in it?
>
> Cheers,
>
> Dave.

That sounds plausible, as does a hardware error. A memory bit flip under
heavy load would cause the in-memory data to be corrupt while the
on-disk data is good. By waiting to check it until later, the bad memory
was flushed at some point, and when the data was reloaded it came in OK
this time.

-- 
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
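If the hardware theory is to be tested rather than assumed, a rough sketch of the usual first checks; the tool and device names (/dev/sd[abcd]) are assumptions, and the packages may need installing on CentOS 5.

    # SMART health status and logged errors for each RAID member
    for d in /dev/sd[abcd]; do
        echo "== $d =="
        smartctl -H -l error "$d"
    done

    # quick userspace RAM test (a boot-time memtest86+ run is more thorough)
    memtester 512M 3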
* Re: xfs and raid5 - "Structure needs cleaning for directory open"

From: Dave Chinner @ 2010-05-17 21:45 UTC
To: Doug Ledford; +Cc: Rainer Fuegenstein, xfs, linux-raid

On Mon, May 17, 2010 at 05:28:30PM -0400, Doug Ledford wrote:
> On 05/09/2010 10:20 PM, Dave Chinner wrote:
> > On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
> >>
> >> This morning some daemon processes terminated because of errors in
> >> the XFS file system on top of a software RAID5 consisting of 4*1.5TB
> >> WD Caviar Green SATA disks.
> >
> > Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
> > would fit the symptoms of btree corruption showing up under heavy IO
> > load but no corruption on disk. However, I can't seem to find any
> > references to it at the moment (can't remember the bug title), but
> > perhaps your distro doesn't have the fix in it?
>
> That sounds plausible, as does a hardware error. A memory bit flip under
> heavy load would cause the in-memory data to be corrupt while the
> on-disk data is good.

The data dumps from the bad blocks weren't wrong by a single bit -
they were unrecognisable garbage - so it is very unlikely to be
a memory error causing the problem.

> By waiting to check it until later, the bad memory
> was flushed at some point, and when the data was reloaded it came in OK
> this time.

Yup - XFS needs to do a better job of catching this case - the
prototype metadata checksumming patch caught most of these cases...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
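To check the point that the on-disk copy was fine, one could dump the inode the kernel kept complaining about straight from disk with xfs_db; a read-only sketch using the dinode number from the earlier log excerpt:

    # print the on-disk contents of inode 1610637790 without modifying anything
    xfs_db -r -c "inode 1610637790" -c "print" /dev/md0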
* Re: xfs and raid5 - "Structure needs cleaning for directory open"

From: Doug Ledford @ 2010-05-17 22:18 UTC
To: Dave Chinner; +Cc: Rainer Fuegenstein, xfs, linux-raid

On 05/17/2010 05:45 PM, Dave Chinner wrote:
> On Mon, May 17, 2010 at 05:28:30PM -0400, Doug Ledford wrote:
>> On 05/09/2010 10:20 PM, Dave Chinner wrote:
>>> On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
>>>>
>>>> This morning some daemon processes terminated because of errors in
>>>> the XFS file system on top of a software RAID5 consisting of 4*1.5TB
>>>> WD Caviar Green SATA disks.
>>>
>>> Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
>>> would fit the symptoms of btree corruption showing up under heavy IO
>>> load but no corruption on disk. However, I can't seem to find any
>>> references to it at the moment (can't remember the bug title), but
>>> perhaps your distro doesn't have the fix in it?
>>
>> That sounds plausible, as does a hardware error. A memory bit flip under
>> heavy load would cause the in-memory data to be corrupt while the
>> on-disk data is good.
>
> The data dumps from the bad blocks weren't wrong by a single bit -
> they were unrecognisable garbage - so it is very unlikely to be
> a memory error causing the problem.

Not true. It can still be a single bit error, just a single bit error
higher up in the chain: e.g. a single bit error in the SCSI command to
read various sectors, and then you read in all sorts of wrong data and
everything from there is totally whacked.

>> By waiting to check it until later, the bad memory
>> was flushed at some point, and when the data was reloaded it came in OK
>> this time.
>
> Yup - XFS needs to do a better job of catching this case - the
> prototype metadata checksumming patch caught most of these cases...
>
> Cheers,
>
> Dave.

-- 
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
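Transport-level corruption of the kind described here usually leaves a trace in the drives' interface CRC counters; a quick, read-only way to look (attribute names vary by vendor, device names are assumptions):

    for d in /dev/sd[abcd]; do
        echo "== $d =="
        smartctl -A "$d" | grep -i crc    # e.g. UDMA_CRC_Error_Count
    done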
* Re: xfs and raid5 - "Structure needs cleaning for directory open"

From: Dave Chinner @ 2010-05-17 23:04 UTC
To: Doug Ledford; +Cc: Rainer Fuegenstein, xfs, linux-raid

On Mon, May 17, 2010 at 06:18:28PM -0400, Doug Ledford wrote:
> On 05/17/2010 05:45 PM, Dave Chinner wrote:
> > On Mon, May 17, 2010 at 05:28:30PM -0400, Doug Ledford wrote:
> >> On 05/09/2010 10:20 PM, Dave Chinner wrote:
> >>> On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
> >>>>
> >>>> This morning some daemon processes terminated because of errors in
> >>>> the XFS file system on top of a software RAID5 consisting of 4*1.5TB
> >>>> WD Caviar Green SATA disks.
> >>>
> >>> Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
> >>> would fit the symptoms of btree corruption showing up under heavy IO
> >>> load but no corruption on disk. However, I can't seem to find any
> >>> references to it at the moment (can't remember the bug title), but
> >>> perhaps your distro doesn't have the fix in it?
> >>
> >> That sounds plausible, as does a hardware error. A memory bit flip under
> >> heavy load would cause the in-memory data to be corrupt while the
> >> on-disk data is good.
> >
> > The data dumps from the bad blocks weren't wrong by a single bit -
> > they were unrecognisable garbage - so it is very unlikely to be
> > a memory error causing the problem.
>
> Not true. It can still be a single bit error, just a single bit error
> higher up in the chain: e.g. a single bit error in the SCSI command to
> read various sectors, and then you read in all sorts of wrong data and
> everything from there is totally whacked.

I didn't say it *couldn't be* a bit error, just that it was _very
unlikely_. Hardware errors that result only in repeated XFS btree
corruption in memory, without causing other errors in the system, are
something I've never seen, even on machines with known bad memory,
HBAs, interconnects, etc.

Applying Occam's Razor to this case indicates that it is going to be
caused by a software problem. Yes, it's still possible that it's a
hardware issue, just very, very unlikely. And if it is hardware and you
can prove that it was the cause, then I suggest we all buy a lottery
ticket.... ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com