* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
  [not found] <46094344.4090007@j-o-a.de>
From: David Chinner @ 2007-03-28 11:31 UTC
To: Oliver Joa; +Cc: linux-kernel, xfs-oss

On Tue, Mar 27, 2007 at 06:16:04PM +0200, Oliver Joa wrote:
> Hi,
>
> since some weeks i try to get my new hardware running:
>
> Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
> Intel DP965LT Mainboard
> Seagate SATA-Harddisk in AHCI-Mode
>
> After some hours of running or after some heavy file-i/o
> (find / | cpio -padm /test) I always get a corrupted
> XFS-filesystem.

What is the corruption message in the log from XFS?
Can you please post that? Without it we really can't help you.

Also, please check to see if there are any I/O errors
in the log around the time the corruption message appears.

> I used already the following Kernels:
> 2.6.19.2
> 2.6.19.7
> 2.6.20.2
> 2.6.20.4
>
> After xfs_repair I get damaged files in lost+found.
>
> I read in newsgroups that the write-cache of the harddisk
> should be turned off, but the messages are all very old.

That's really only an issue for crashes, not runtime failures.

> I also often get a sata-bus-reset with the kernels 2.6.19.2
> and 2.6.20.2.

I/O errors. That's what we need to isolate first. The reports in
your logs are the first thing we need to see.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel 2007-03-28 11:31 ` Corrupt XFS -Filesystems on new Hardware and Kernel David Chinner @ 2007-03-28 12:42 ` Oliver Joa 2007-03-28 14:56 ` Eric Sandeen 2007-03-28 23:46 ` David Chinner 2007-04-11 7:36 ` Oliver Joa 1 sibling, 2 replies; 11+ messages in thread From: Oliver Joa @ 2007-03-28 12:42 UTC (permalink / raw) To: David Chinner; +Cc: linux-kernel, xfs-oss Hi, David Chinner wrote: [...] > What is the corruption message in the log from XFS? > Can you please post that? Without it we really can't help you. > > Also, please check to see if there are any I/O errors > in the log around the time the corruption message appears. Ok, here is a test: test:/# find / -xdev | cpio -padm /test/ cpio: /usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt: Structure needs cleaning 3648371 blocks test:/# test:/home/olli# uname -a Linux test 2.6.20.4-majestix-1 #1 SMP PREEMPT Tue Mar 27 12:15:41 CEST 2007 i686 GNU/Linux dmesg gives the following: [15442.935941] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 [15442.936003] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [15442.936039] [<c0211f94>] xfs_iget+0x2e4/0x714 [15442.936071] [<c0211f94>] xfs_iget+0x2e4/0x714 [15442.936101] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [15442.936135] [<c022cc6b>] xfs_lookup+0x52/0x78 [15442.936167] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [15442.936201] [<c0153e6d>] do_lookup+0xa3/0x140 [15442.936234] [<c015578e>] __link_path_walk+0x73d/0xb5e [15442.936278] [<c0211655>] xfs_iunlock+0x51/0x6d [15442.936309] [<c0155bf3>] link_path_walk+0x44/0xb3 [15442.936342] [<c0155efe>] do_path_lookup+0x176/0x191 [15442.936373] [<c0154ef8>] getname+0x59/0x8f [15442.936402] [<c01566b8>] __user_walk_fd+0x2f/0x45 [15442.936431] [<c0150a09>] vfs_lstat_fd+0x16/0x3d [15442.936461] [<c0150a75>] sys_lstat64+0xf/0x23 [15442.936490] [<c0102bd8>] syscall_call+0x7/0xb [15442.936519] ======================= And after this command: test:/# rm /usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt rm: cannot remove `/usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt': Structure needs cleaning test:/# I got: [18359.750604] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 [18359.750701] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [18359.750755] [<c0211f94>] xfs_iget+0x2e4/0x714 [18359.750802] [<c0211f94>] xfs_iget+0x2e4/0x714 [18359.750849] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [18359.750897] [<c022cc6b>] xfs_lookup+0x52/0x78 [18359.750943] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [18359.750990] [<c0153e6d>] do_lookup+0xa3/0x140 [18359.751036] [<c015578e>] __link_path_walk+0x73d/0xb5e [18359.751086] [<c0155bf3>] link_path_walk+0x44/0xb3 [18359.751133] [<c0252afc>] rb_insert_color+0x4c/0xad [18359.751180] [<c0142044>] vma_link+0x54/0xcd [18359.751226] [<c0155efe>] do_path_lookup+0x176/0x191 [18359.751273] [<c0154ef8>] getname+0x59/0x8f [18359.751318] [<c01566b8>] __user_walk_fd+0x2f/0x45 [18359.751364] [<c0150a09>] vfs_lstat_fd+0x16/0x3d [18359.751410] [<c0252afc>] rb_insert_color+0x4c/0xad [18359.751457] [<c0142044>] vma_link+0x54/0xcd [18359.751501] [<c0150a75>] sys_lstat64+0xf/0x23 [18359.751546] [<c0110545>] do_page_fault+0x277/0x526 [18359.751595] [<c01102ce>] do_page_fault+0x0/0x526 [18359.751640] [<c0102bd8>] syscall_call+0x7/0xb [18359.751686] [<c0360033>] rsc_parse+0x6f/0x37f [18359.751732] ======================= [18359.751784] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. Caller 0xc0211f94 [18359.751859] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [18359.751906] [<c0211f94>] xfs_iget+0x2e4/0x714 [18359.751952] [<c0211f94>] xfs_iget+0x2e4/0x714 [18359.751998] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [18359.752047] [<c022cc6b>] xfs_lookup+0x52/0x78 [18359.752094] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [18359.752140] [<c0154bcf>] __lookup_hash+0xb1/0xe1 [18359.752191] [<c0156241>] do_unlinkat+0x5f/0x126 [18359.752237] [<c0110545>] do_page_fault+0x277/0x526 [18359.752285] [<c0102bd8>] syscall_call+0x7/0xb [18359.752331] [<c0360033>] rsc_parse+0x6f/0x37f [18359.752376] ======================= Thanks a Lot Oliver ^ permalink raw reply [flat|nested] 11+ messages in thread
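Dave's request to look for I/O errors around the time of the corruption message can be approached with a simple log filter. The following is only a minimal sketch and was not posted in the thread: `scan_kernel_log` is a hypothetical helper name, and the patterns for I/O errors and ATA resets are assumptions about typical kernel message wording, so adjust them to what your kernel actually prints.

```shell
# Hypothetical helper: print log lines that mention XFS internal errors,
# block-layer I/O errors, or ATA resets/exceptions, prefixed with their
# line numbers so the surrounding context can be looked up afterwards.
# The match patterns are assumptions, not an exhaustive list.
scan_kernel_log() {
    # With no file arguments grep reads standard input.
    grep -n -i -E 'XFS internal error|I/O error|ata[0-9]+.*(reset|exception)' "$@"
}

# Usage (either form):
#   dmesg | scan_kernel_log
#   scan_kernel_log /var/log/kern.log
```

If the filter shows resets or I/O errors shortly before the first "XFS internal error" line, that points at the storage path rather than at the filesystem code. An empty result does not rule out hardware problems.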
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel 2007-03-28 12:42 ` Oliver Joa @ 2007-03-28 14:56 ` Eric Sandeen 2007-03-28 19:56 ` Oliver Joa 2007-03-28 23:46 ` David Chinner 1 sibling, 1 reply; 11+ messages in thread From: Eric Sandeen @ 2007-03-28 14:56 UTC (permalink / raw) To: Oliver Joa; +Cc: David Chinner, linux-kernel, xfs-oss Oliver Joa wrote: > Ok, here is a test: > > test:/# find / -xdev | cpio -padm /test/ > cpio: /usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt: > Structure needs cleaning > 3648371 blocks > test:/# That, cryptically enough, means that the filesystem has detected a problem and has shut down. > test:/home/olli# uname -a > Linux test 2.6.20.4-majestix-1 #1 SMP PREEMPT Tue Mar 27 12:15:41 CEST > 2007 i686 GNU/Linux > > dmesg gives the following: > [15442.935941] Filesystem "sda3": XFS internal error xfs_iformat(6) at > line 492 of file fs/xfs/xfs_inode.c. Caller 0xc0211f94 > [15442.936003] [<c0216dba>] xfs_iread+0x4ee/0x6e8 > [15442.936039] [<c0211f94>] xfs_iget+0x2e4/0x714 > [15442.936071] [<c0211f94>] xfs_iget+0x2e4/0x714 > [15442.936101] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 > [15442.936135] [<c022cc6b>] xfs_lookup+0x52/0x78 > [15442.936167] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 > [15442.936201] [<c0153e6d>] do_lookup+0xa3/0x140 > [15442.936234] [<c015578e>] __link_path_walk+0x73d/0xb5e > [15442.936278] [<c0211655>] xfs_iunlock+0x51/0x6d > [15442.936309] [<c0155bf3>] link_path_walk+0x44/0xb3 > [15442.936342] [<c0155efe>] do_path_lookup+0x176/0x191 > [15442.936373] [<c0154ef8>] getname+0x59/0x8f > [15442.936402] [<c01566b8>] __user_walk_fd+0x2f/0x45 > [15442.936431] [<c0150a09>] vfs_lstat_fd+0x16/0x3d > [15442.936461] [<c0150a75>] sys_lstat64+0xf/0x23 > [15442.936490] [<c0102bd8>] syscall_call+0x7/0xb > [15442.936519] ======================= For one reason or another, xfs has detected a corrupted on-disk inode format which it cannot recognize, and shuts down. 
It is likely the result of something which has gone wrong previously. xfs_repair should fix it. Are there other non-xfs messages in your logs indicating other problems prior to this? -Eric ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel 2007-03-28 14:56 ` Eric Sandeen @ 2007-03-28 19:56 ` Oliver Joa 2007-03-29 0:21 ` Linda Walsh 0 siblings, 1 reply; 11+ messages in thread From: Oliver Joa @ 2007-03-28 19:56 UTC (permalink / raw) To: Eric Sandeen; +Cc: David Chinner, linux-kernel, xfs-oss Hi, Eric Sandeen wrote: [...] > For one reason or another, xfs has detected a corrupted on-disk inode > format which it cannot recognize, and shuts down. It is likely the > result of something which has gone wrong previously. xfs_repair should > fix it. Are there other non-xfs messages in your logs indicating other > problems prior to this? i sent already the dmesg output to the list. there is nothing else. I made a xfs_repair. Now I have some Files in lost+found. So I tried it again with a new cable: test:/# find / -xdev | cpio -padm /test/ 3648526 blocks test:/# rm -rf test test:/# find / -xdev | cpio -padm /test/ find: /usr/src/linux-2.6.19.2/arch/sh/kernel/cpufreq.c: Structure needs cleaning find: /usr/src/linux-2.6.19.2/arch/sh/kernel/head.S: Structure needs cleaning find: /usr/src/linux-2.6.19.2/arch/sh/kernel/irq.c: Structure needs cleaning 3653268 blocks test:/# Since the reboot I did not get any bus-reset, but the following: [ 1878.777203] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 [ 1878.777264] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [ 1878.777298] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.777451] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.777513] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [ 1878.777576] [<c022cc6b>] xfs_lookup+0x52/0x78 [ 1878.777636] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [ 1878.777696] [<c0153e6d>] do_lookup+0xa3/0x140 [ 1878.777757] [<c015578e>] __link_path_walk+0x73d/0xb5e [ 1878.777819] [<c015ff0e>] mntput_no_expire+0x11/0x63 [ 1878.777879] [<c0155c58>] link_path_walk+0xa9/0xb3 [ 1878.777941] [<c0155bf3>] link_path_walk+0x44/0xb3 [ 1878.778001] [<c014ca81>] nameidata_to_filp+0x24/0x33 [ 1878.778074] [<c014cac2>] do_filp_open+0x32/0x39 [ 1878.778145] [<c0155efe>] do_path_lookup+0x176/0x191 [ 1878.778209] [<c0154ef8>] getname+0x59/0x8f [ 1878.778270] [<c01566b8>] __user_walk_fd+0x2f/0x45 [ 1878.778334] [<c0150a09>] vfs_lstat_fd+0x16/0x3d [ 1878.778397] [<c014ca81>] nameidata_to_filp+0x24/0x33 [ 1878.778461] [<c014cac2>] do_filp_open+0x32/0x39 [ 1878.778524] [<c0150a75>] sys_lstat64+0xf/0x23 [ 1878.778585] [<c014ecb7>] __fput+0x112/0x13c [ 1878.778647] [<c015ff0e>] mntput_no_expire+0x11/0x63 [ 1878.778709] [<c014c793>] filp_close+0x51/0x58 [ 1878.778771] [<c014d772>] sys_close+0x67/0x9e [ 1878.778832] [<c0102bd8>] syscall_call+0x7/0xb [ 1878.778895] ======================= [ 1878.974434] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 [ 1878.974493] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [ 1878.974599] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.974692] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.974759] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [ 1878.974799] [<c022cc6b>] xfs_lookup+0x52/0x78 [ 1878.974888] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [ 1878.974950] [<c0153e6d>] do_lookup+0xa3/0x140 [ 1878.975015] [<c015578e>] __link_path_walk+0x73d/0xb5e [ 1878.975080] [<c03645e7>] _spin_unlock_irqrestore+0xf/0x23 [ 1878.975145] [<c0285ad3>] n_tty_receive_buf+0xc77/0xd1a [ 1878.975210] [<c0155bf3>] link_path_walk+0x44/0xb3 [ 1878.975275] [<c0155efe>] do_path_lookup+0x176/0x191 [ 1878.975338] [<c0154ef8>] getname+0x59/0x8f [ 1878.975399] [<c01566b8>] __user_walk_fd+0x2f/0x45 [ 1878.975461] [<c0150a09>] vfs_lstat_fd+0x16/0x3d [ 1878.975525] [<c0150a75>] sys_lstat64+0xf/0x23 [ 1878.975588] [<c0102bd8>] syscall_call+0x7/0xb [ 1878.975650] [<c0360033>] rsc_parse+0x6f/0x37f [ 1878.975712] ======================= [ 1878.975956] Filesystem "sda3": XFS internal error xfs_iformat(6) at line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 [ 1878.976012] [<c0216dba>] xfs_iread+0x4ee/0x6e8 [ 1878.976111] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.976184] [<c0211f94>] xfs_iget+0x2e4/0x714 [ 1878.976249] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 [ 1878.976314] [<c022cc6b>] xfs_lookup+0x52/0x78 [ 1878.976376] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 [ 1878.976438] [<c0153e6d>] do_lookup+0xa3/0x140 [ 1878.976500] [<c015578e>] __link_path_walk+0x73d/0xb5e [ 1878.976564] [<c03645e7>] _spin_unlock_irqrestore+0xf/0x23 [ 1878.976629] [<c0285ad3>] n_tty_receive_buf+0xc77/0xd1a [ 1878.976701] [<c0155bf3>] link_path_walk+0x44/0xb3 [ 1878.976766] [<c0155efe>] do_path_lookup+0x176/0x191 [ 1878.976835] [<c0154ef8>] getname+0x59/0x8f [ 1878.976898] [<c01566b8>] __user_walk_fd+0x2f/0x45 [ 1878.976961] [<c0150a09>] vfs_lstat_fd+0x16/0x3d [ 1878.977024] [<c0150a75>] sys_lstat64+0xf/0x23 [ 1878.977088] [<c0102bd8>] syscall_call+0x7/0xb [ 1878.977150] [<c0360033>] rsc_parse+0x6f/0x37f [ 1878.977212] ======================= Thanks Oliver ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
From: Linda Walsh @ 2007-03-29 0:21 UTC
To: Oliver Joa; +Cc: Eric Sandeen, David Chinner, linux-kernel, xfs-oss

Oliver Joa wrote:
>> For one reason or another, xfs has detected a corrupted on-disk inode format
>> which it cannot recognize, and shuts down. It is likely the result
>> of something which has gone wrong previously. xfs_repair should fix
>> it. Are there other non-xfs messages in your logs indicating other
>> problems prior to this?
> i sent already the dmesg output to the list. there is nothing else.
> I made a xfs_repair. Now I have some Files in lost+found.
> So I tried it again with a new cable:
---
I doubt it has changed significantly, but xfs was designed for stable
hardware. That doesn't mean you can't pull the plug, but if you are
getting SATA resets, you may be getting some writes aborted, with
subsequent writes going through (speculation).

I know when I had a flaky SCSI disk problem (it was a cable or connector
in my case), I'd get a rare XFS corruption (out of ~10 years of XFS use,
maybe 2-3 corruptions, all caused by loose connections, cables, etc).

I'd strongly suggest you get to the bottom of the SATA reset problem.
After that is fixed, then try to clean up your XFS disks (or restore
from backups). Sometimes, after some intermittent hardware problems,
my xfs file system was too corrupt for me to repair (at least with the
default xfs_repair options). That doesn't mean it was irreparable, just
that I didn't know how to proceed, and it was easier to restore from a
daily backup than to attempt to manually repair the damage.

The above is based solely on my own experience. I use xfs with
max(8?) logbufs, and noatime/nodiratime, and find it to have among the
best performance characteristics of any file system (overall; the lowest
performance aspect was file delete).

XFS has a low fragmentation rate, due to how it allocates space and
can delay writes. Even so, it is also one of the few file systems
(only?) that comes with a "defragmenter" (xfs_fsr (file system
reorganizer)). SGI used to ship systems with xfs_fsr configured to run
weekly to "watch out for" rare, degenerate cases (important for some
real-time video apps). My cron runs it nightly, but often it will pass
through all file systems making no changes.

Fix the flaky hw -- then see if your xfs probs don't "magically" go
away...however, YMMV...

Linda
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
From: Linda Walsh @ 2007-03-29 2:34 UTC
To: Linda Walsh
Cc: Oliver Joa, Eric Sandeen, David Chinner, linux-kernel, xfs-oss

Oliver Joa wrote:
>> For one reason or another, xfs has detected a corrupted on-disk inode format
>> which it cannot recognize, and shuts down.
----
Oh, one other thing that may not apply in your case, but may.
Does your SATA disk support write caching? Does it support
something called a barrier function? (I'm not really clear on all
the ways this can go wrong, but I believe barriers are supposed
to guarantee that previous data has been fixed on disk, not just in
the write cache.) If the SATA controller issues a reset, it may very
well purge the write cache. Theoretically, I can think of a
_possibility_ that the reset disk would purge the write cache and the
barrier indicator would tell xfs to resume writing. From a recent
thread on the xfs list, it would appear this could be a "bad" thing
(like crossing the streams ala "ghostbusters", but in a data-integrity
context).

Just a "shot in the dark" -- absent knowing anything specific
about your hardware or situation...

If that's the case, you might want to turn off write caching, since
when xfs thinks "barriers" work, it turns off some "protection", which
can enable a significant speedup in some situations. As an aside, some
disks, I gather, may "claim" to support barriers, but really don't.
Xfs tries to verify the barrier claim, but I don't know that a reset
issued to the disk will have deterministic behavior across all
manufacturers' disks.

A bunch of "coulds" and "maybes", but just thinking off the top of my
head...

Linda
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
From: Jan Kara @ 2007-03-29 9:34 UTC
To: Linda Walsh
Cc: Oliver Joa, Eric Sandeen, David Chinner, linux-kernel, xfs-oss

> Oliver Joa wrote:
> >>For one reason or another, xfs has detected a corrupted on-disk inode format
> >>which it cannot recognize, and shuts down.
> ----
> Oh, one other thing that may not apply in your case, but may.
> Does your SATA disk support write caching? Does it support
> something called a barrier function? (not real clear on all
> the ways this can go wrong, but I believe barriers are supposed
> to guarantee previous data has been fixed on disk (not in write
> cache). If the SATA controller issues a reset, it may very well
> purge the write cache. Theoretically, I can think of a _possibility_,
> that the reset disk would purge the write cache and the barrier
> indicator would tell xfs to resume writing. From a recent thread
> on the xfs list, it would appear this could be a "bad" thing (like
> crossing the streams ala "ghostbusters", but in a data-integrity
> context).

As far as I can remember, a barrier does not mean that data is fixed on
disk. It is only a command that forces all the writes before the barrier
to be performed before all the writes after the barrier. So this is more
an ordering restriction than a data integrity thing...

Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
From: Jens Axboe @ 2007-03-29 11:14 UTC
To: Jan Kara
Cc: Linda Walsh, Oliver Joa, Eric Sandeen, David Chinner, linux-kernel, xfs-oss

On Thu, Mar 29 2007, Jan Kara wrote:
> > Oliver Joa wrote:
> > >>For one reason or another, xfs has detected a corrupted on-disk inode format
> > >>which it cannot recognize, and shuts down.
> > ----
> > Oh, one other thing that may not apply in your case, but may.
> > Does your SATA disk support write caching? Does it support
> > something called a barrier function? (not real clear on all
> > the ways this can go wrong, but I believe barriers are supposed
> > to guarantee previous data has been fixed on disk (not in write
> > cache). If the SATA controller issues a reset, it may very well
> > purge the write cache. Theoretically, I can think of a _possibility_,
> > that the reset disk would purge the write cache and the barrier
> > indicator would tell xfs to resume writing. From a recent thread
> > on the xfs list, it would appear this could be a "bad" thing (like
> > crossing the streams ala "ghostbusters", but in a data-integrity
> > context).
> As far as I can remember, barrier does not mean that data is fixed on
> disk. It is only a command that forces all the writes before the barrier
> to be performed before all the writes after the barrier. So this is more
> an ordering restriction than a data integrity thing...

A barrier write guarantees that both the data before the barrier and the
barrier itself are on disk when completion is signalled.

--
Jens Axboe
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel 2007-03-28 12:42 ` Oliver Joa 2007-03-28 14:56 ` Eric Sandeen @ 2007-03-28 23:46 ` David Chinner 2007-03-30 13:45 ` Oliver Joa 1 sibling, 1 reply; 11+ messages in thread From: David Chinner @ 2007-03-28 23:46 UTC (permalink / raw) To: Oliver Joa; +Cc: David Chinner, linux-kernel, xfs-oss On Wed, Mar 28, 2007 at 02:42:00PM +0200, Oliver Joa wrote: > Hi, > > David Chinner wrote: > > [...] > > >What is the corruption message in the log from XFS? > >Can you please post that? Without it we really can't help you. > > > >Also, please check to see if there are any I/O errors > >in the log around the time the corruption message appears. > > Ok, here is a test: > > test:/# find / -xdev | cpio -padm /test/ > cpio: /usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt: > Structure needs cleaning > 3648371 blocks > test:/# > > test:/home/olli# uname -a > Linux test 2.6.20.4-majestix-1 #1 SMP PREEMPT Tue Mar 27 12:15:41 CEST > 2007 i686 GNU/Linux > > dmesg gives the following: > [15442.935941] Filesystem "sda3": XFS internal error xfs_iformat(6) at > line 492 of file fs/xfs/xfs_inode.c. Caller 0xc0211f94 > [15442.936003] [<c0216dba>] xfs_iread+0x4ee/0x6e8 > [15442.936039] [<c0211f94>] xfs_iget+0x2e4/0x714 > [15442.936071] [<c0211f94>] xfs_iget+0x2e4/0x714 > [15442.936101] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 So we have a corrupt inode. The error tells me that the corrupted inode is either a regular file, directory or link. Unfortunately it doesn't tell us the inode number that is corrupted. > test:/# rm /usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt > rm: cannot remove > `/usr/src/linux-2.6.20.2/Documentation/networking/NAPI_HOWTO.txt': > Structure needs cleaning > test:/# Once the filesystem shuts down this will happen to every operation. Next time you get a shutdown, can you unmount the filesystems and run xfs_check and then "xfs_repair -n" on the filesystem. 
These will tell you the inode numbers that are bad. Can you post the errors reported by these tools? Once you have the bad inode numbers, can you run the following on the bad inodes: # xfs_db -r -c "inode <inum>" -c "p" <device> E.g.: # xfs_db -r -c "inode 128" -c p /dev/sdb8 core.magic = 0x494e core.mode = 040755 core.version = 2 core.format = 2 (extents) ...... and post the output for us? That will enable us to see exactly what the corruption is on the inode. Cheers, Dave. > > I got: > > [18359.750604] Filesystem "sda3": XFS internal error xfs_iformat(6) at > line 492 of file fs/xfs/xfs_inode.c. Caller 0xc0211f94 > [18359.750701] [<c0216dba>] xfs_iread+0x4ee/0x6e8 > [18359.750755] [<c0211f94>] xfs_iget+0x2e4/0x714 > [18359.750802] [<c0211f94>] xfs_iget+0x2e4/0x714 > [18359.750849] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 > [18359.750897] [<c022cc6b>] xfs_lookup+0x52/0x78 > [18359.750943] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 > [18359.750990] [<c0153e6d>] do_lookup+0xa3/0x140 > [18359.751036] [<c015578e>] __link_path_walk+0x73d/0xb5e > [18359.751086] [<c0155bf3>] link_path_walk+0x44/0xb3 > [18359.751133] [<c0252afc>] rb_insert_color+0x4c/0xad > [18359.751180] [<c0142044>] vma_link+0x54/0xcd > [18359.751226] [<c0155efe>] do_path_lookup+0x176/0x191 > [18359.751273] [<c0154ef8>] getname+0x59/0x8f > [18359.751318] [<c01566b8>] __user_walk_fd+0x2f/0x45 > [18359.751364] [<c0150a09>] vfs_lstat_fd+0x16/0x3d > [18359.751410] [<c0252afc>] rb_insert_color+0x4c/0xad > [18359.751457] [<c0142044>] vma_link+0x54/0xcd > [18359.751501] [<c0150a75>] sys_lstat64+0xf/0x23 > [18359.751546] [<c0110545>] do_page_fault+0x277/0x526 > [18359.751595] [<c01102ce>] do_page_fault+0x0/0x526 > [18359.751640] [<c0102bd8>] syscall_call+0x7/0xb > [18359.751686] [<c0360033>] rsc_parse+0x6f/0x37f > [18359.751732] ======================= > [18359.751784] Filesystem "sda3": XFS internal error xfs_iformat(6) at > line 492 of file fs/xfs/xfs_inode.c. 
Caller 0xc0211f94 > [18359.751859] [<c0216dba>] xfs_iread+0x4ee/0x6e8 > [18359.751906] [<c0211f94>] xfs_iget+0x2e4/0x714 > [18359.751952] [<c0211f94>] xfs_iget+0x2e4/0x714 > [18359.751998] [<c02293be>] xfs_dir_lookup_int+0x7d/0xd4 > [18359.752047] [<c022cc6b>] xfs_lookup+0x52/0x78 > [18359.752094] [<c0238a22>] xfs_vn_lookup+0x3b/0x70 > [18359.752140] [<c0154bcf>] __lookup_hash+0xb1/0xe1 > [18359.752191] [<c0156241>] do_unlinkat+0x5f/0x126 > [18359.752237] [<c0110545>] do_page_fault+0x277/0x526 > [18359.752285] [<c0102bd8>] syscall_call+0x7/0xb > [18359.752331] [<c0360033>] rsc_parse+0x6f/0x37f > [18359.752376] ======================= > > > > Thanks a Lot > > Oliver -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 11+ messages in thread
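Dave's last step, dumping each bad inode with xfs_db, is easy to get wrong when several inodes are involved. As a hedged sketch (the helper name `print_xfs_db_cmds` is hypothetical and not part of xfsprogs), the function below only prints the read-only xfs_db invocation from his message for each inode number, so the commands can be reviewed before anything is run against the unmounted device.

```shell
# Hypothetical helper: print (do not execute) one read-only xfs_db command
# per inode number, using the invocation quoted in the message above.
# Usage: print_xfs_db_cmds <device> <inum> [<inum> ...]
print_xfs_db_cmds() {
    dev=$1
    shift
    for inum in "$@"; do
        # -r opens the device read-only; "inode N" selects the inode and
        # "p" prints its fields.
        printf 'xfs_db -r -c "inode %s" -c "p" %s\n' "$inum" "$dev"
    done
}

# Example: print_xfs_db_cmds /dev/sda3 8458341 8458344
```

Printing rather than executing keeps the sketch free of side effects. The device name and inode numbers in the example are taken from this thread and will differ on other systems.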
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel 2007-03-28 23:46 ` David Chinner @ 2007-03-30 13:45 ` Oliver Joa 0 siblings, 0 replies; 11+ messages in thread From: Oliver Joa @ 2007-03-30 13:45 UTC (permalink / raw) To: David Chinner; +Cc: linux-kernel, xfs-oss Hi, David Chinner wrote: [...] > Next time you get a shutdown, can you unmount the filesystems and > run xfs_check and then "xfs_repair -n" on the filesystem. These will > tell you the inode numbers that are bad. Can you post the errors > reported by these tools? xfs_check gives this: bad format 0 for inode 8458341 type 0100000 bad format 0 for inode 8458344 type 0100000 bad format 0 for inode 8458348 type 0100000 block 1/4962 type unknown not expected block 1/4963 type unknown not expected block 1/4970 type unknown not expected block 1/4975 type unknown not expected block 1/4976 type unknown not expected link count mismatch for inode 8458341 (name ?), nlink 0, counted 1 link count mismatch for inode 8458344 (name ?), nlink 0, counted 1 link count mismatch for inode 8458348 (name ?), nlink 0, counted 1 xfs_repair -n gives this: Phase 1 - find and verify superblock... Phase 2 - using internal log - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 bad inode format in inode 8458341 bad inode format in inode 8458344 bad inode format in inode 8458348 bad inode format in inode 8458341 would have cleared inode 8458341 bad inode format in inode 8458344 would have cleared inode 8458344 bad inode format in inode 8458348 would have cleared inode 8458348 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - agno = 8 - agno = 9 - agno = 10 - agno = 11 - agno = 12 - agno = 13 - agno = 14 - agno = 15 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... 
- check for inodes claiming duplicate blocks... - agno = 0 - agno = 1 entry "cpufreq.c" at block 0 offset 152 in directory inode 8458336 references free inode 8458341 would clear inode number in entry at offset 152... entry "head.S" at block 0 offset 232 in directory inode 8458336 references free inode 8458344 would clear inode number in entry at offset 232... entry "irq.c" at block 0 offset 320 in directory inode 8458336 references free inode 8458348 would clear inode number in entry at offset 320... bad inode format in inode 8458341 would have cleared inode 8458341 bad inode format in inode 8458344 would have cleared inode 8458344 bad inode format in inode 8458348 would have cleared inode 8458348 - agno = 2 - agno = 3 - agno = 4 - agno = 5 - agno = 6 - agno = 7 - agno = 8 - agno = 9 - agno = 10 - agno = 11 - agno = 12 - agno = 13 - agno = 14 - agno = 15 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem starting at / ... entry "cpufreq.c" in directory inode 8458336 points to free inode 8458341, would junk entry entry "head.S" in directory inode 8458336 points to free inode 8458344, would junk entry entry "irq.c" in directory inode 8458336 points to free inode 8458348, would junk entry - traversal finished ... - traversing all unattached subtrees ... - traversals finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting. 
> Once you have the bad inode numbers, can you run the following > on the bad inodes: > > # xfs_db -r -c "inode <inum>" -c "p" <device> xfs_db on inode 8458341 gives: core.magic = 0x494e core.mode = 0100644 core.version = 1 core.format = 0 (dev) core.nlinkv1 = 1 core.uid = 0 core.gid = 0 core.flushiter = 6 core.atime.sec = Tue Jan 30 22:42:51 2007 core.atime.nsec = 000000000 core.mtime.sec = Wed Jan 10 19:10:37 2007 core.mtime.nsec = 000000000 core.ctime.sec = Wed Mar 28 18:15:36 2007 core.ctime.nsec = 612718490 core.size = 6209 core.nblocks = 2 core.extsize = 0 core.nextents = 1 core.naextents = 0 core.forkoff = 0 core.aformat = 2 (extents) core.dmevmask = 0 core.dmstate = 0 core.newrtbm = 0 core.prealloc = 0 core.realtime = 0 core.immutable = 0 core.append = 0 core.sync = 0 core.noatime = 0 core.nodump = 0 core.rtinherit = 0 core.projinherit = 0 core.nosymlinks = 0 core.extsz = 0 core.extszinherit = 0 core.nodefrag = 0 core.gen = 0 next_unlinked = null u.dev = 0 xfs_db on inode 8458344 gives: core.magic = 0x494e core.mode = 0100644 core.version = 1 core.format = 0 (dev) core.nlinkv1 = 1 core.uid = 0 core.gid = 0 core.flushiter = 6 core.atime.sec = Tue Jan 30 22:42:51 2007 core.atime.nsec = 000000000 core.mtime.sec = Wed Jan 10 19:10:37 2007 core.mtime.nsec = 000000000 core.ctime.sec = Wed Mar 28 18:15:36 2007 core.ctime.nsec = 612849562 core.size = 2326 core.nblocks = 1 core.extsize = 0 core.nextents = 1 core.naextents = 0 core.forkoff = 0 core.aformat = 2 (extents) core.dmevmask = 0 core.dmstate = 0 core.newrtbm = 0 core.prealloc = 0 core.realtime = 0 core.immutable = 0 core.append = 0 core.sync = 0 core.noatime = 0 core.nodump = 0 core.rtinherit = 0 core.projinherit = 0 core.nosymlinks = 0 core.extsz = 0 core.extszinherit = 0 core.nodefrag = 0 core.gen = 0 next_unlinked = null u.dev = 0 xfs_db on inode 8458336 gives: core.magic = 0x494e core.mode = 040755 core.version = 1 core.format = 2 (extents) core.nlinkv1 = 5 core.uid = 0 core.gid = 0 core.flushiter = 1 
core.atime.sec = Tue Jan 30 22:42:51 2007 core.atime.nsec = 906063000 core.mtime.sec = Wed Jan 10 19:10:37 2007 core.mtime.nsec = 000000000 core.ctime.sec = Tue Jan 30 22:44:48 2007 core.ctime.nsec = 428077021 core.size = 4096 core.nblocks = 1 core.extsize = 0 core.nextents = 1 core.naextents = 0 core.forkoff = 0 core.aformat = 2 (extents) core.dmevmask = 0 core.dmstate = 0 core.newrtbm = 0 core.prealloc = 0 core.realtime = 0 core.immutable = 0 core.append = 0 core.sync = 0 core.noatime = 0 core.nodump = 0 core.rtinherit = 0 core.projinherit = 0 core.nosymlinks = 0 core.extsz = 0 core.extszinherit = 0 core.nodefrag = 0 core.gen = 0 next_unlinked = null u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,528704,1,0] [...] > and post the output for us? That will enable us to see exactly what > the corruption is on the inode. Here is it... Thanks a lot... Olli ^ permalink raw reply [flat|nested] 11+ messages in thread
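The `xfs_repair -n` output above repeats each bad inode several times across phases. A small filter can reduce it to a unique list of inode numbers to feed into xfs_db. This is only a sketch: `extract_bad_inodes` is a hypothetical name, and the pattern assumes the "bad inode format in inode N" wording seen in Oliver's output; other kinds of damage are reported with different messages and would not be caught.

```shell
# Hypothetical helper: read xfs_repair -n output on standard input and print
# the unique inode numbers reported as "bad inode format in inode N".
# Only this one message wording is matched (an assumption based on the
# output shown in this thread).
extract_bad_inodes() {
    grep -o 'bad inode format in inode [0-9][0-9]*' |
        awk '{ print $NF }' |   # keep only the trailing inode number
        sort -n -u              # numeric order, duplicates removed
}

# Usage: xfs_repair -n /dev/sda3 2>&1 | extract_bad_inodes
```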
* Re: Corrupt XFS -Filesystems on new Hardware and Kernel
From: Oliver Joa @ 2007-04-11 7:36 UTC
To: xfs-oss; +Cc: linux-kernel

Hi,

David Chinner wrote:
> On Tue, Mar 27, 2007 at 06:16:04PM +0200, Oliver Joa wrote:
>> Hi,
>>
>> since some weeks i try to get my new hardware running:
>>
>> Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
>> Intel DP965LT Mainboard
>> Seagate SATA-Harddisk in AHCI-Mode
>>
>> After some hours of running or after some heavy file-i/o
>> (find / | cpio -padm /test) I always get a corrupted
>> XFS-filesystem.

I solved the problem:

I ran a memtest and found a lot of memory errors. Then I bought
another brand of memory, and now everything is working fine. The first
memory I used was brand new; I bought it together with the board and
processor. It was from Kingston. Now I have memory from Crucial, which
seems to work fine.

Thanks to everyone for the help

Olli