* XFS breakage in 2.6.18-rc1
From: Torsten Landschoff @ 2006-07-18 22:29 UTC
To: linux-kernel
0 siblings, 2 replies; 45+ messages in thread

Hi friends,

I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
from my /var/log/kern.log), which ultimately led me to reinstall my
system:

Jul 17 07:10:12 pulsar kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jul 17 07:10:12 pulsar kernel: Linux version 2.6.18-rc1 (torsten@pulsar) (gcc version 4.1.2 20060630 (prerelease) (Debian 4.1.1-6)) #18 SMP PREEMPT Fri Jul 14 07:58:49 CEST 2006
...
Jul 17 07:10:32 pulsar kernel: agpgart: Putting AGP V3 device at 0000:03:00.0 into 4x mode
Jul 17 07:10:32 pulsar kernel: [drm] Setting GART location based on new memory map
Jul 17 07:10:32 pulsar kernel: [drm] Loading R200 Microcode
Jul 17 07:10:32 pulsar kernel: [drm] writeback test succeeded in 1 usecs
Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c.
Caller 0xf8a837d0
Jul 17 07:33:53 pulsar kernel:  [<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a89215>] xfs_dir2_data_log_unused+0x55/0x70 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a87337>] xfs_dir_removename+0xf7/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a9720d>] xfs_ilock_nowait+0xcd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab9783>] xfs_remove+0x393/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel:  [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel:  [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel:  [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel:  [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel:  [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel:  [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error
xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xf8ab97d7
Jul 17 07:33:53 pulsar kernel:  [<f8aaf91d>] xfs_trans_cancel+0xdd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel:  [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel:  [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel:  [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel:  [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel:  [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel:  [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: xfs_force_shutdown(dm-6,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xf8ac77bc
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": Corruption of in-memory data detected. Shutting down filesystem: dm-6
Jul 17 07:33:53 pulsar kernel: Please umount the filesystem, and rectify the problem(s)
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 32K
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 8K

That problem occurred during a dist-upgrade; dm-6 is my /usr partition.
Funnily enough, this happened a few months after I finally replaced my
ancient disk with a RAID1 array to make sure I do not lose data ;) In
any case, it seems like the XFS driver in 2.6.18-rc1 is decently broken.

After booting into 2.6.17 again, I could use /usr again, but random
files contain null bytes, firefox segfaults instead of starting up, and
a number of programs fail in mysterious ways. I tried to recover using
xfs_repair, but I feel that my partition is thoroughly borked. Of course
no data was lost, thanks to backups, but I'd still like this bug to be
fixed ;-)

If more information from my logs is required, I can make it available
(and any part of the partition if required).

Greetings

	Torsten

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
From: Nathan Scott @ 2006-07-18 22:57 UTC
To: Torsten Landschoff; +Cc: linux-kernel, xfs

On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> Hi friends,

Hi Torsten,

> I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
>
> Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> Jul 17 07:33:53 pulsar kernel: dir: inode 54526538

I suspect you had some residual directory corruption from using the
2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
fixed in the latest -stable point release).

> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thorougly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)

2.6.18-rc1 should be fine (contains the corruption fix). Did you
mkfs and restore? Or at least get a full repair run? If you did,
and you still see issues in .18-rc1, please let me know asap.

thanks.

--
Nathan
* Re: XFS breakage in 2.6.18-rc1
From: Alistair John Strachan @ 2006-07-19 8:08 UTC
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs

On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
[snip]
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thorougly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.

Just out of interest, I've got a few XFS volumes that were created 24
months ago on a machine that I upgraded to 2.6.17 about a month ago. I
haven't seen any crashes so far.

Assuming I get the newest XFS repair tools on there, what's the
disadvantage of repairing versus creating a new filesystem? What
special circumstances are required to cause a crash?

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: XFS breakage in 2.6.18-rc1
From: Nathan Scott @ 2006-07-19 22:56 UTC
To: Alistair John Strachan; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> [snip]
> > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > but I feel that my partition is thorougly borked. Of course no data was
> > > lost due to backups but still I'd like this bug to be fixed ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
>
> Just out of interest, I've got a few XFS volumes that were created 24 months
> ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> any crashes so far.
>
> Assuming I get the newest XFS repair tools on there, what's the disadvantage
> of repairing versus creating a new filesystem? What special circumstances are
> required to cause a crash?

There should be no disadvantage to repairing. I will update the FAQ
shortly to describe all the details of the problem, recommendations
on how to address it, which kernel version is affected, etc.

cheers.

--
Nathan
* Re: XFS breakage in 2.6.18-rc1
From: Kasper Sandberg @ 2006-07-20 10:29 UTC
To: Nathan Scott
Cc: Alistair John Strachan, Torsten Landschoff, linux-kernel, xfs

On Thu, 2006-07-20 at 08:56 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> > On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> > [snip]
> > > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > > but I feel that my partition is thorougly borked. Of course no data was
> > > > lost due to backups but still I'd like this bug to be fixed ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> >
> > Just out of interest, I've got a few XFS volumes that were created 24 months
> > ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> > any crashes so far.
> >
> > Assuming I get the newest XFS repair tools on there, what's the disadvantage
> > of repairing versus creating a new filesystem? What special circumstances are
> > required to cause a crash?
>
> There should be no disadvantage to repairing. I will update the FAQ
> shortly to describe all the details of the problem, recommendations
> on how to address it, which kernel version is affected, etc.

This FAQ -- is it this one: http://oss.sgi.com/projects/xfs/faq.html#dir2 ?
(BTW, it seems that while it appears in the TOC only once, you have the
same entry about 2.6.17 twice..)

Which version of xfsprogs should I use while doing the xfs_check?
* Re: XFS breakage in 2.6.18-rc1
From: Kasper Sandberg @ 2006-07-19 10:21 UTC
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > Hi friends,
>
> Hi Torsten,
>
> > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > from my /var/log/kern.log), which ultimately led me to reinstall my
> > system:
> >
> > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
>
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).

This has me very worried.

I just upgraded to .18-rc1-git5 when it came out; I used .17-rc3
before. Does this mean my .17-rc3 may have corrupted my filesystem?

What action do you suggest I take now?

> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thorougly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
>
> thanks.
* Re: XFS breakage in 2.6.18-rc1
From: Alistair John Strachan @ 2006-07-19 12:43 UTC
To: Kasper Sandberg; +Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs

On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
>
> > > of programs fail in mysterious ways. I tried to recover using
> > > xfs_repair but I feel that my partition is thorougly borked. Of course
> > > no data was lost due to backups but still I'd like this bug to be fixed
> > > ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> >
> > thanks.

According to another thread Nathan just responded to, it sounds like we
need to wait for a new version of the xfsprogs package, and then run
xfs_repair on the affected filesystems. I wouldn't worry about it too
much if you've not had any crashes. The damage can be repaired, just
not right now.

I'm still waiting for a crash on a machine that has been under heavy
load for 28 days, so it's obviously not _that_ easy to trigger.

--
Cheers,
Alistair.

Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: XFS breakage in 2.6.18-rc1
From: Kasper Sandberg @ 2006-07-19 15:25 UTC
To: Alistair John Strachan
Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs

On Wed, 2006-07-19 at 13:43 +0100, Alistair John Strachan wrote:
> On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > > Hi friends,
> > >
> > > Hi Torsten,
> > >
> > > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > > system:
> > > >
> > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > >
> > > I suspect you had some residual directory corruption from using the
> > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > fixed in the latest -stable point release).
> >
> > This has me very worried.
> >
> > i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> > does this mean my .17-rc3 may have corrupted my filesystem?
> >
> > what action do you suggest i do now?
> >
> > > > of programs fail in mysterious ways. I tried to recover using
> > > > xfs_repair but I feel that my partition is thorougly borked. Of course
> > > > no data was lost due to backups but still I'd like this bug to be fixed
> > > > ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> > >
> > > thanks.
>
> According to another thread Nathan just responded to, it sounds like we need
> to wait for a new version of the xfsprogs package, and then run xfs_repair on
> the affected filesystems. I wouldn't worry about it too much if you've not
> had any crashes. The damage can be repaired, just not right now.

Without ANY loss? Because even though it would be a bit painful for me
to do, I do have the option of putting in a new drive, copying
everything over, and reinitializing my filesystem.

> I'm still waiting for a crash on a machine that has been under heavy load for
> 28 days, so it's obviously not _that_ easy to trigger.

So basically, if I upgrade to a safe kernel before I get these errors,
I'm good?
* Re: XFS breakage in 2.6.18-rc1
From: Nathan Scott @ 2006-07-19 22:59 UTC
To: Kasper Sandberg; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?

The odds are decent that you're unaffected. You can check your
filesystem using xfs_check or xfs_repair -n; these will give you a
good indication as to whether further action is required.

cheers.

--
Nathan
* FAQ updated (was Re: XFS breakage...)
From: Nathan Scott @ 2006-07-20 7:13 UTC
To: Kasper Sandberg, Justin Piszcz, Torsten Landschoff; +Cc: linux-kernel, xfs

On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).

Correction there - no -stable exists with this yet; I guess that'll
be 2.6.17.7 once it's out though.

> what action do you suggest i do now?

I've captured the state of this issue here, with options and ways
to correct the problem:
http://oss.sgi.com/projects/xfs/faq.html#dir2

Hope this helps.

cheers.

--
Nathan
* Re: FAQ updated (was Re: XFS breakage...)
From: Hans-Peter Jansen @ 2006-07-20 12:42 UTC
To: Nathan Scott; +Cc: linux-kernel, xfs

Hi Nathan,

On Thursday, 20 July 2006 at 09:13, Nathan Scott wrote:
>
> I've captured the state of this issue here, with options and ways
> to correct the problem:
> http://oss.sgi.com/projects/xfs/faq.html#dir2

Thanks for the pointer. I think it is valuable for all XFS users, but
reading the FAQ with a decent web browser on Linux (Konqueror 3.5.3 and
Firefox 1.5.0.4 in my case) is very painful due to the overlong lines,
not to speak of printing such texts. Try it yourself: load that page
into Konqueror, hit print, select preview and 'have fun', hmm, suffer..

Pete
* Re: FAQ updated (was Re: XFS breakage...)
From: David Greaves @ 2006-07-20 13:28 UTC
To: Nathan Scott
Cc: Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel, xfs, cw, ml, radsaq

Nathan Scott wrote:
> Correction there - no -stable exists with this yet, I guess that'll
> be 2.6.17.7 once its out though.
>
>> what action do you suggest i do now?
>
> I've captured the state of this issue here, with options and ways
> to correct the problem:
> http://oss.sgi.com/projects/xfs/faq.html#dir2
>
> Hope this helps.

It does, thanks :)

Does this problem exist in 2.6.16.x?

From various comments like:

  "Unless 2.6.16.x is a dead-end could we please also have this patch
   put into there?"
and
  "a result (I believe) of the corruption bug that was in 2.6.16/17."
and
  "I just want to confirm this bug as well and unfortunately it was my
   system disk too who had to take the hit. Im running 2.6.16"

I assume it does. But the FAQ says:

  Q: What is the issue with directory corruption in Linux 2.6.17?
  In the Linux kernel 2.6.17 release a subtle bug...

which implies it's not... HELP

So given this is from 2.6.16.9:

	/*
	 * One less used entry in the free table.
	 */
	INT_MOD(free->hdr.nused, ARCH_CONVERT, -1);
	xfs_dir2_free_log_header(tp, fbp);

and it looks awfully similar to the patch which says:

	--- linux-2.6.17.2.orig/fs/xfs/xfs_dir2_node.c
	+++ linux-2.6.17.2/fs/xfs/xfs_dir2_node.c
	@@ -970,7 +970,7 @@ xfs_dir2_leafn_remove(
	 	/*
	 	 * One less used entry in the free table.
	 	 */
	-	free->hdr.nused = cpu_to_be32(-1);
	+	be32_add(&free->hdr.nused, -1);
	 	xfs_dir2_free_log_header(tp, fbp);

should 2.6.16.x replace

	INT_MOD(free->hdr.nused, ARCH_CONVERT, -1);

with

	be32_add(&free->hdr.nused, -1);

I hope so, because I assumed there simply wasn't a patch for 2.6.16 and
applied this 'best guess' to my servers and rebooted/remounted
successfully.

David
* Re: FAQ updated (was Re: XFS breakage...)
From: Chris Wedgwood @ 2006-07-20 16:11 UTC
To: David Greaves
Cc: Nathan Scott, Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:

> Does this problem exist in 2.16.6.x??

The change was merged after 2.6.16.x was branched; I was mistaken in
how long I thought the bug had been about.

> I hope so because I assumed there simply wasn't a patch for 2.6.16 and
> applied this 'best guess' to my servers and rebooted/remounted successfully.

Doing the correct change to 2.6.16.x won't hurt, but it's not
necessary.
* Re: FAQ updated (was Re: XFS breakage...)
From: Nathan Scott @ 2006-07-20 22:14 UTC
To: Chris Wedgwood
Cc: David Greaves, Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

On Thu, Jul 20, 2006 at 09:11:21AM -0700, Chris Wedgwood wrote:
> On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:
>
> > Does this problem exist in 2.16.6.x??
>
> The change was merged after 2.6.16.x was branched, I was mistaken
> in how long I thought the bug has been about.
>
> > I hope so because I assumed there simply wasn't a patch for 2.6.16 and
> > applied this 'best guess' to my servers and rebooted/remounted successfully.
>
> Doing the correct change to 2.6.16.x won't hurt, but it's not
> necessary.

Yep. As Chris said, 2.6.17 is the only affected kernel. I've fixed up
the wacky HTML formatting and my merge error (thanks to all for
reporting those), so it's a bit more readable now.

cheers.

--
Nathan
* Re: FAQ updated (was Re: XFS breakage...)
From: Justin Piszcz @ 2006-07-20 22:18 UTC
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

Nathan,

Does the bug only occur during a crash? I have been running 2.6.17.x
for awhile now (multiple XFS filesystems, all on UPS) - no issue?

Justin.

On Fri, 21 Jul 2006, Nathan Scott wrote:

> On Thu, Jul 20, 2006 at 09:11:21AM -0700, Chris Wedgwood wrote:
>> On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:
>>
>>> Does this problem exist in 2.16.6.x??
>>
>> The change was merged after 2.6.16.x was branched, I was mistaken
>> in how long I thought the bug has been about.
>>
>>> I hope so because I assumed there simply wasn't a patch for 2.6.16 and
>>> applied this 'best guess' to my servers and rebooted/remounted successfully.
>>
>> Doing the correct change to 2.6.16.x won't hurt, but it's not
>> necessary.
>
> Yep. As Chris said, 2.6.17 is the only affected kernel. I've
> fixed up the whacky html formatting and my merge error (thanks
> to all for reporting those) so its a bit more readable now.
>
> cheers.
>
> --
> Nathan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
* Re: FAQ updated (was Re: XFS breakage...)
From: Nathan Scott @ 2006-07-20 22:24 UTC
To: Justin Piszcz
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

On Thu, Jul 20, 2006 at 06:18:14PM -0400, Justin Piszcz wrote:
> Nathan,
>
> Does the bug only occur during a crash?

No, it's unrelated to crashing. It occurs only when adding/removing
entries in a directory that is in a specific node/btree format (many
entries), and only under a specific set of conditions (like what
directory entry names were used, which blocks they've hashed to, how
they ended up being allocated, and in what order each block gets
removed from the directory).

> I have been running 2.6.17.x for awhile now (multiple XFS filesystems, all
> on UPS) - no issue?

Could be an issue, could be none. xfs_check it to be sure.

cheers.

--
Nathan
* Re: FAQ updated (was Re: XFS breakage...)
From: Justin Piszcz @ 2006-07-20 22:43 UTC
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

On Fri, 21 Jul 2006, Nathan Scott wrote:

> On Thu, Jul 20, 2006 at 06:18:14PM -0400, Justin Piszcz wrote:
>> Nathan,
>>
>> Does the bug only occur during a crash?
>
> No, its unrelated to crashing. Only when adding/removing from a
> directory that is in a specific node/btree format (many entries),
> and only under a specific set of conditions (like what directory
> entry names were used, which blocks they've hashed to and how they
> ended up being allocated and in what order each block gets removed
> from the directory).
>
>> I have been running 2.6.17.x for awhile now (multiple XFS filesystems, all
>> on UPS) - no issue?
>
> Could be an issue, could be none. xfs_check it to be sure.
>
> cheers.
>
> --
> Nathan

p34:~# xfs_check -v /dev/md3
xfs_check: out of memory
p34:~#

D'oh... 1GB RAM, 2GB swap, trying to check a 2.6T fs - no dice.

As long as it mounted ok with the patched kernel, should one be ok?
* Re: FAQ updated (was Re: XFS breakage...)
From: Nathan Scott @ 2006-07-20 22:52 UTC
To: Justin Piszcz
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote:
> p34:~# xfs_check -v /dev/md3
> xfs_check: out of memory
> p34:~#
>
> D'oh...

xfs_repair -n is another option; it has a cheaper (memory-wise,
usually) checking algorithm.

> As long as it mounted ok with the patched kernel, should one be ok?

Not necessarily, no - mount will only read the root inode.

cheers.

--
Nathan
* Re: FAQ updated (was Re: XFS breakage...)
From: Justin Piszcz @ 2006-07-20 22:55 UTC
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq

Nasty!

        - agno = 37
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ...
free block 16777216 for directory inode 2684356622 bad nused
free block 16777216 for directory inode 2147485710 bad nused
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
p34:~#

I applied the "one line fix" - I should be ok now?

On Fri, 21 Jul 2006, Nathan Scott wrote:

> On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote:
>> p34:~# xfs_check -v /dev/md3
>> xfs_check: out of memory
>> p34:~#
>>
>> D'oh...
>
> xfs_repair -n is another option, it has a cheaper (memory wise,
> usually) checking algorithm.
>
>> As long as it mounted ok with the patched kernel, should one be ok?
>
> Not necessarily, no - mount will only read the root inode.
>
> cheers.
>
> --
> Nathan
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 22:55 ` Justin Piszcz @ 2006-07-20 22:57 ` Justin Piszcz 2006-07-20 23:00 ` Nathan Scott 1 sibling, 0 replies; 45+ messages in thread From: Justin Piszcz @ 2006-07-20 22:57 UTC (permalink / raw) To: Nathan Scott Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq Erm, the xfs_repair -n only prints out what it needs to fix, I read somewhere that xfs_repair may make things worse? What is the 'correct' fix? On Thu, 20 Jul 2006, Justin Piszcz wrote: > Nasty! > > - agno = 37 > No modify flag set, skipping phase 5 > Phase 6 - check inode connectivity... > - traversing filesystem starting at / ... > free block 16777216 for directory inode 2684356622 bad nused > free block 16777216 for directory inode 2147485710 bad nused > - traversal finished ... > - traversing all unattached subtrees ... > - traversals finished ... > - moving disconnected inodes to lost+found ... > Phase 7 - verify link counts... > No modify flag set, skipping filesystem flush and exiting. > p34:~# > > I applied the "one line fix" - I should be ok now? > > > > On Fri, 21 Jul 2006, Nathan Scott wrote: > >> On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote: >>> p34:~# xfs_check -v /dev/md3 >>> xfs_check: out of memory >>> p34:~# >>> >>> D'oh... >> >> xfs_repair -n is another option, it has a cheaper (memory wise, >> usually) checking algorithm. >> >>> As long as it mounted ok with the patched kernel, should one be ok? >> >> Not necessarily, no - mount will only read the root inode. >> >> cheers. >> >> -- >> Nathan >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 22:55 ` Justin Piszcz 2006-07-20 22:57 ` Justin Piszcz @ 2006-07-20 23:00 ` Nathan Scott 2006-07-20 23:10 ` Justin Piszcz 1 sibling, 1 reply; 45+ messages in thread From: Nathan Scott @ 2006-07-20 23:00 UTC (permalink / raw) To: Justin Piszcz Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq On Thu, Jul 20, 2006 at 06:55:51PM -0400, Justin Piszcz wrote: > Phase 6 - check inode connectivity... > - traversing filesystem starting at / ... > free block 16777216 for directory inode 2684356622 bad nused > free block 16777216 for directory inode 2147485710 bad nused > - traversal finished ... > ... > I applied the "one line fix" - I should be ok now? You have two corrupt directory inodes (caused by this bug, that is exactly the signature I'd expect - it was a nused field that was affected by the dodgey endian change). The two inodes need to be fixed - consult the FAQ for details. Once fixed, and with a patched kernel, you're set. cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
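The FAQ fix Nathan points to can be sketched as follows. This is a hedged dry run, reconstructed from the "write core.mode 0" step mentioned later in the thread: the helper echoes the commands instead of executing them, and the inode numbers are the two reported by xfs_repair above. Take a backup before attempting this on a real device.

```shell
# Sketch of the FAQ#dir2 recovery procedure discussed above: zero the
# mode of each corrupt directory inode with xfs_db, then let a full
# xfs_repair run clean up the orphans. Commands are echoed (dry run);
# the exact xfs_db invocation is an assumption based on the thread.
fix_inode() {
    dev="$1"; ino="$2"
    # -x enables expert (write) mode; writing core.mode 0 clears the inode
    echo "xfs_db -x -c 'inode $ino' -c 'write core.mode 0' $dev"
}
fix_inode /dev/md3 2684356622
fix_inode /dev/md3 2147485710
echo "xfs_repair /dev/md3"   # then a real repair pass (no -n this time)
```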
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 23:00 ` Nathan Scott @ 2006-07-20 23:10 ` Justin Piszcz 2006-07-20 23:12 ` Chris Wedgwood 0 siblings, 1 reply; 45+ messages in thread From: Justin Piszcz @ 2006-07-20 23:10 UTC (permalink / raw) To: Nathan Scott Cc: Chris Wedgwood, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq Nathan, Running xfs_repair multiple times (after following the FAQ for the write core.mode 0 fix), I get this: - agno = 3 - agno = 4 - agno = 5 - agno = 6 entry ".." at block 0 offset 1352 in directory inode 3221227534 references free inode 2112 clearing inode number in entry at offset 1352... no .. entry for directory 3221227534 - agno = 7 - agno = 8 - agno = 9 disconnected inode 2684386082, moving to lost+found disconnected inode 2684386083, moving to lost+found disconnected inode 2684386084, moving to lost+found disconnected inode 2684386085, moving to lost+found disconnected inode 2684386086, moving to lost+found disconnected inode 2684386087, moving to lost+found disconnected inode 2684386088, moving to lost+found disconnected inode 2684386089, moving to lost+found disconnected inode 2684386090, moving to lost+found disconnected inode 2684386091, moving to lost+found disconnected inode 2684386092, moving to lost+found disconnected inode 2684386093, moving to lost+found disconnected inode 2684386094, moving to lost+found disconnected inode 2684386095, moving to lost+found disconnected inode 2684386096, moving to lost+found disconnected inode 2684386097, moving to lost+found disconnected inode 2684386098, moving to lost+found disconnected inode 2684386099, moving to lost+found disconnected inode 2684386100, moving to lost+found disconnected inode 2684386101, moving to lost+found disconnected inode 2684386102, moving to lost+found disconnected inode 2684386103, moving to lost+found disconnected inode 2684386104, moving to lost+found disconnected inode 2684386105, moving to lost+found disconnected 
inode 2684386106, moving to lost+found disconnected inode 2684386107, moving to lost+found disconnected inode 2684386108, moving to lost+found disconnected inode 2684386109, moving to lost+found disconnected inode 2684386110, moving to lost+found disconnected inode 2684386111, moving to lost+found disconnected inode 2684386112, moving to lost+found disconnected inode 2684386113, moving to lost+found disconnected inode 2684386114, moving to lost+found disconnected inode 2684386115, moving to lost+found disconnected inode 2684386116, moving to lost+found disconnected inode 2684386117, moving to lost+found disconnected inode 2684386118, moving to lost+found disconnected inode 2684386119, moving to lost+found disconnected inode 2684386120, moving to lost+found disconnected inode 2684386121, moving to lost+found disconnected inode 2684386122, moving to lost+found disconnected inode 2684386123, moving to lost+found disconnected inode 2684386124, moving to lost+found disconnected inode 2684386125, moving to lost+found disconnected inode 2684386126, moving to lost+found disconnected inode 2684386127, moving to lost+found disconnected inode 2684386128, moving to lost+found disconnected inode 2684386129, moving to lost+found disconnected inode 2684386130, moving to lost+found disconnected inode 2684386131, moving to lost+found disconnected inode 2684386132, moving to lost+found disconnected inode 2684386133, moving to lost+found disconnected inode 2684386134, moving to lost+found disconnected inode 2684386135, moving to lost+found disconnected inode 2684386136, moving to lost+found disconnected inode 2684386137, moving to lost+found disconnected inode 2684386138, moving to lost+found disconnected inode 2684386139, moving to lost+found disconnected inode 2684386140, moving to lost+found disconnected inode 2684386141, moving to lost+found disconnected inode 2684386142, moving to lost+found disconnected inode 2684386143, moving to lost+found disconnected inode 2684386144, 
moving to lost+found disconnected inode 2684386145, moving to lost+found disconnected inode 2684386146, moving to lost+found disconnected inode 2684386147, moving to lost+found disconnected inode 2684386148, moving to lost+found disconnected inode 2684386149, moving to lost+found disconnected inode 2684386150, moving to lost+found disconnected inode 2684386151, moving to lost+found disconnected inode 2684386152, moving to lost+found disconnected inode 2684386153, moving to lost+found disconnected inode 2684386154, moving to lost+found disconnected inode 2684386155, moving to lost+found disconnected inode 2684386156, moving to lost+found disconnected inode 2684386157, moving to lost+found disconnected inode 2684386158, moving to lost+found disconnected inode 2684386159, moving to lost+found disconnected inode 2684386160, moving to lost+found disconnected inode 2684386161, moving to lost+found disconnected inode 2684386162, moving to lost+found disconnected inode 2684386163, moving to lost+found disconnected inode 2684386164, moving to lost+found disconnected inode 2684386165, moving to lost+found disconnected inode 2684653605, moving to lost+found disconnected dir inode 3221227534, moving to lost+found Phase 7 - verify and correct link counts... resetting inode 3221227534 nlinks from 3 to 2 done p34:~# I can run this over and over, and the result is the same? On Fri, 21 Jul 2006, Nathan Scott wrote: > On Thu, Jul 20, 2006 at 06:55:51PM -0400, Justin Piszcz wrote: >> Phase 6 - check inode connectivity... >> - traversing filesystem starting at / ... >> free block 16777216 for directory inode 2684356622 bad nused >> free block 16777216 for directory inode 2147485710 bad nused >> - traversal finished ... >> ... >> I applied the "one line fix" - I should be ok now? > > You have two corrupt directory inodes (caused by this bug, that > is exactly the signature I'd expect - it was a nused field that > was affected by the dodgey endian change). 
The two inodes need > to be fixed - consult the FAQ for details. > > Once fixed, and with a patched kernel, you're set. > > cheers. > > -- > Nathan > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 23:10 ` Justin Piszcz @ 2006-07-20 23:12 ` Chris Wedgwood 2006-07-20 23:15 ` Justin Piszcz 2006-07-20 23:19 ` Nathan Scott 0 siblings, 2 replies; 45+ messages in thread From: Chris Wedgwood @ 2006-07-20 23:12 UTC (permalink / raw) To: Justin Piszcz Cc: Nathan Scott, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote: > I can run this over and over, and the result is the same? lost+found is recreated every time, rename it and you'll get less output ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 23:12 ` Chris Wedgwood @ 2006-07-20 23:15 ` Justin Piszcz 0 siblings, 0 replies; 45+ messages in thread From: Justin Piszcz @ 2006-07-20 23:15 UTC (permalink / raw) To: Chris Wedgwood Cc: Nathan Scott, David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq Thanks, that was it: after removing the lost+found directory and re-running xfs_repair, I no longer have any errors, on that device anyway. On Thu, 20 Jul 2006, Chris Wedgwood wrote: > On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote: > >> I can run this over and over, and the result is the same? > > lost+found is recreated every time, rename it and you'll get less > output > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 23:12 ` Chris Wedgwood 2006-07-20 23:15 ` Justin Piszcz @ 2006-07-20 23:19 ` Nathan Scott 1 sibling, 0 replies; 45+ messages in thread From: Nathan Scott @ 2006-07-20 23:19 UTC (permalink / raw) To: Justin Piszcz, Chris Wedgwood Cc: David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel, xfs, ml, radsaq On Thu, Jul 20, 2006 at 04:12:46PM -0700, Chris Wedgwood wrote: > On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote: > > > I can run this over and over, and the result is the same? > > lost+found is recreated every time, rename it and you'll get less > output Yes, this is the current xfs_repair behaviour (any previously unlinked inodes will be found as unlinked on each successive run, due to lost+found being recreated). This will likely be rethought soon, since it confuses everyone. So it's all good - xfs_repair has fixed things and you're all set now. cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott 2006-07-20 12:42 ` Hans-Peter Jansen 2006-07-20 13:28 ` David Greaves @ 2006-07-20 15:13 ` Kevin Radloff 2006-07-20 16:51 ` Alistair John Strachan 2006-07-31 16:25 ` Jan Kasprzak 3 siblings, 1 reply; 45+ messages in thread From: Kevin Radloff @ 2006-07-20 15:13 UTC (permalink / raw) To: Nathan Scott Cc: Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel, xfs On 7/20/06, Nathan Scott <nathans@sgi.com> wrote: > On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote: > > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote: > > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote: > > > > > > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216 > > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538 > > > > > > I suspect you had some residual directory corruption from using the > > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue, > > > fixed in the latest -stable point release). > > Correction there - no -stable exists with this yet, I guess that'll > be 2.6.17.7 once its out though. > > > what action do you suggest i do now? > > I've captured the state of this issue here, with options and ways > to correct the problem: > http://oss.sgi.com/projects/xfs/faq.html#dir2 > > Hope this helps. I actually tried the xfs_db method to fix my / filesystem (as you had outlined in http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2), and while it's quite possible that I screwed it up, after a subsequent xfs_repair run (which completed successfully and moved lots of stuff to /lost+found, as I would expect), the XFS code had serious problems with various parts of my filesystem (like "ls /lost+found", which would cause lots of errors to be logged, although not a complete fs shutdown). 
Another run through xfs_repair resulted in a filesystem that would no longer even boot successfully. Unfortunately it was a mostly-full 74GB big-/ partition on my primary machine (a laptop), so I don't have a dump of it for you and my report is probably pretty useless. :( But on the bright side, virtually all of the filesystem was otherwise intact and I was able to get all my data off before rebuilding my system. -- Kevin 'radsaq' Radloff radsaq@gmail.com http://thesaq.com/ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 15:13 ` Kevin Radloff @ 2006-07-20 16:51 ` Alistair John Strachan 0 siblings, 0 replies; 45+ messages in thread From: Alistair John Strachan @ 2006-07-20 16:51 UTC (permalink / raw) To: Kevin Radloff Cc: Nathan Scott, Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel, xfs On Thursday 20 July 2006 16:13, Kevin Radloff wrote: > On 7/20/06, Nathan Scott <nathans@sgi.com> wrote: > > On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote: > > > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote: > > > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote: > > > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216 > > > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538 > > > > > > > > I suspect you had some residual directory corruption from using the > > > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue, > > > > fixed in the latest -stable point release). > > > > Correction there - no -stable exists with this yet, I guess that'll > > be 2.6.17.7 once its out though. > > > > > what action do you suggest i do now? > > > > I've captured the state of this issue here, with options and ways > > to correct the problem: > > http://oss.sgi.com/projects/xfs/faq.html#dir2 > > > > Hope this helps. > > I actually tried the xfs_db method to fix my / filesystem (as you had > outlined in > http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2), and > while it's quite possible that I screwed it up, after a subsequent > xfs_repair run (which completed successfully and moved lots of stuff to > /lost+found, as I would expect), the XFS code had serious problems with > various parts of my filesystem (like "ls /lost+found", which > would cause lots of errors to be logged, although not a complete fs > shutdown). After another run through xfs_repair resulted in a > filesystem that would no longer even successfully boot. 
> > Unfortunately it was a mostly-full 74GB big-/ partition on my primary > machine (a laptop), so I don't have a dump of it for you and my report > is probably pretty useless. :( But on the bright side, virtually all > of the filesystem was otherwise intact and I was able to get all my > data off before rebuilding my system. I've been hit by this on my root filesystem today, and when I followed the instructions, I was able to retrieve my data. It turned out that the only corrupted inode was /, which unfortunately meant I had to repair the filesystem with a boot-cd. However, it was obvious which inodes corresponded to which directories, and I was able to repair it. I'm not sure this advice is sound, but it seems to me that if you're running an affected 2.6.17 kernel (or ever have) on an XFS volume, it's not worth risking destruction if you haven't had any oopses. The filesystem will get worse, hopefully in a non-fatal way, and the XFS guys should soon have an xfs_repair that works. Right now I'd highly recommend copying as much as possible from the corrupted filesystem (after following the instructions) to a new filesystem (with an unaffected kernel, of course) and destroying the old one. I still have inconsistencies on the filesystem I "repaired", and it was a fairly painful process. -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott ` (2 preceding siblings ...) 2006-07-20 15:13 ` Kevin Radloff @ 2006-07-31 16:25 ` Jan Kasprzak 2006-07-31 16:38 ` Justin Piszcz 2006-08-02 4:32 ` Nathan Scott 3 siblings, 2 replies; 45+ messages in thread From: Jan Kasprzak @ 2006-07-31 16:25 UTC (permalink / raw) To: Nathan Scott; +Cc: linux-kernel, xfs Nathan Scott wrote: : I've captured the state of this issue here, with options and ways : to correct the problem: : http://oss.sgi.com/projects/xfs/faq.html#dir2 : : Hope this helps. I have been hit with this bug as well - I tried to clear the two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran xfs_repair (lots of files ended up in lost+found), but apparently the volume is still not OK - when I tried to use it (this volume is a public FTP archive), I got the following traces: Jul 30 16:04:49 odysseus kernel: Filesystem "md5": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. 
Caller 0xffffffff80324221 Jul 30 16:04:49 odysseus kernel: Jul 30 16:04:49 odysseus kernel: Call Trace: <ffffffff803331ac>{xfs_corruption_error+228} Jul 30 16:04:49 odysseus kernel: <ffffffff8035630e>{kmem_zone_alloc+86} <ffffffff803240f0>{xfs_da_do_buf+1359} Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff80323aba>{xfs_da_buf_make+31} Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff803263e7>{xfs_da_node_lookup_int+112} Jul 30 16:04:49 odysseus kernel: <ffffffff803263e7>{xfs_da_node_lookup_int+112} <ffffffff8032c7b8>{xfs_dir2_node_lookup+70} Jul 30 16:04:50 odysseus kernel: <ffffffff80327b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256} Jul 30 16:04:51 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79} Jul 30 16:04:51 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lo7b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256} Jul 30 16:04:52 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79} Jul 30 16:04:53 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lookup+48} <ffffffff80270b45>{do_lookup+196} Jul 30 16:04:53 odysseus rpc.statd[3145]: Caught signal 15, un-registering and exiting. 
Jul 30 16:04:53 odysseus kernel: <ffffffff802729c6>{__link_path_walk+2435} <ffffffff80272f40>{link_path_walk+89} Jul 30 16:04:53 odysseus kernel: <ffffffff8049c0d2>{__sched_text_start+290} <ffffffff80273396>{do_path_lookup+614} Jul 30 16:04:53 odysseus kernel: <ffffffff80271e47>{getname+347} <ffffffff80273bd6>{__user_walk_fd+55} Jul 30 16:04:53 odysseus kernel: <ffffffff8026cba7>{vfs_lstat_fd+21} <ffffffff8049c0d2>{__sched_text_start+290} Jul 30 16:04:53 odysseus kernel: <ffffffff8026cd92>{sys_newlstat+25} <ffffffff80265023>{vfs_write+283} Jul 30 16:04:53 odysseus kernel: <ffffffff8026554c>{sys_write+69} <ffffffff80209826>{system_call+126} Jul 30 16:04:53 odysseus kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 This is 2.6.17.7 dual x86_64 (Fedora Core 5). It has been unfortunately running 2.6.17.1 for some time. I will probably have to recreate the volume and restore its contents from backups. Or is there any better solution? Thanks, -Yenya -- | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> | | GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E | | http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ | > I will never go to meetings again because I think face to face meetings < > are the biggest waste of time you can ever have. --Linus Torvalds < ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-31 16:25 ` Jan Kasprzak @ 2006-07-31 16:38 ` Justin Piszcz 2006-08-02 4:32 ` Nathan Scott 1 sibling, 0 replies; 45+ messages in thread From: Justin Piszcz @ 2006-07-31 16:38 UTC (permalink / raw) To: Jan Kasprzak; +Cc: Nathan Scott, linux-kernel, xfs On Mon, 31 Jul 2006, Jan Kasprzak wrote: > Nathan Scott wrote: > : I've captured the state of this issue here, with options and ways > : to correct the problem: > : http://oss.sgi.com/projects/xfs/faq.html#dir2 > : > : Hope this helps. > > I have been hit with this bug as well - I tried to clear the > two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran > xfs_repair (lots of files ended up in lost+found), but apparently > the volume is still not OK - when I tried to use it (this volume > is a public FTP archive), I got the following traces: > > Jul 30 16:04:49 odysseus kernel: Filesystem "md5": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. 
Caller 0xffffffff80324221 > Jul 30 16:04:49 odysseus kernel: > Jul 30 16:04:49 odysseus kernel: Call Trace: <ffffffff803331ac>{xfs_corruption_error+228} > Jul 30 16:04:49 odysseus kernel: <ffffffff8035630e>{kmem_zone_alloc+86} <ffffffff803240f0>{xfs_da_do_buf+1359} > Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff80323aba>{xfs_da_buf_make+31} > Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff803263e7>{xfs_da_node_lookup_int+112} > Jul 30 16:04:49 odysseus kernel: <ffffffff803263e7>{xfs_da_node_lookup_int+112} <ffffffff8032c7b8>{xfs_dir2_node_lookup+70} > Jul 30 16:04:50 odysseus kernel: <ffffffff80327b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256} > Jul 30 16:04:51 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79} > Jul 30 16:04:51 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lo7b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256} > Jul 30 16:04:52 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79} > Jul 30 16:04:53 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lookup+48} <ffffffff80270b45>{do_lookup+196} > Jul 30 16:04:53 odysseus rpc.statd[3145]: Caught signal 15, un-registering and exiting. 
> Jul 30 16:04:53 odysseus kernel: <ffffffff802729c6>{__link_path_walk+2435} <ffffffff80272f40>{link_path_walk+89} > Jul 30 16:04:53 odysseus kernel: <ffffffff8049c0d2>{__sched_text_start+290} <ffffffff80273396>{do_path_lookup+614} > Jul 30 16:04:53 odysseus kernel: <ffffffff80271e47>{getname+347} <ffffffff80273bd6>{__user_walk_fd+55} > Jul 30 16:04:53 odysseus kernel: <ffffffff8026cba7>{vfs_lstat_fd+21} <ffffffff8049c0d2>{__sched_text_start+290} > Jul 30 16:04:53 odysseus kernel: <ffffffff8026cd92>{sys_newlstat+25} <ffffffff80265023>{vfs_write+283} > Jul 30 16:04:53 odysseus kernel: <ffffffff8026554c>{sys_write+69} <ffffffff80209826>{system_call+126} > Jul 30 16:04:53 odysseus kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > > This is 2.6.17.7 dual x86_64 (Fedora Core 5). It has been unfortunately > running 2.6.17.1 for some time. > > I will probably have to recreate the volume and restore its > contents from backups. Or is there any better solution? > > Thanks, > > -Yenya > > -- > | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> | > | GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E | > | http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ | >> I will never go to meetings again because I think face to face meetings < >> are the biggest waste of time you can ever have. --Linus Torvalds < > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > If you unmount, xfs_repair -n /dev/md5, what does it show currently? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...) 2006-07-31 16:25 ` Jan Kasprzak 2006-07-31 16:38 ` Justin Piszcz @ 2006-08-02 4:32 ` Nathan Scott 1 sibling, 0 replies; 45+ messages in thread From: Nathan Scott @ 2006-08-02 4:32 UTC (permalink / raw) To: Jan Kasprzak; +Cc: linux-kernel, xfs On Mon, Jul 31, 2006 at 06:25:35PM +0200, Jan Kasprzak wrote: > Nathan Scott wrote: > : I've captured the state of this issue here, with options and ways > : to correct the problem: > : http://oss.sgi.com/projects/xfs/faq.html#dir2 > : > : Hope this helps. > > I have been hit with this bug as well - I tried to clear the > two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran > xfs_repair (lots of files ended up in lost+found), but apparently > the volume is still not OK - when I tried to use it (this volume > is a public FTP archive), I got the following traces: There is now a fixed version of xfs_repair available - it's in xfsprogs-2.8.10; source is on oss.sgi.com in the XFS ftp area. A number of people have reported success with Barry's earlier patch, and no one has reported anything bad, so 2.8.10 is out now with the fix merged. cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-18 22:57 ` Nathan Scott 2006-07-19 8:08 ` Alistair John Strachan 2006-07-19 10:21 ` Kasper Sandberg @ 2006-07-19 21:14 ` Torsten Landschoff 2006-07-19 23:09 ` Nathan Scott 2006-07-22 16:27 ` Christian Kujau 3 siblings, 1 reply; 45+ messages in thread From: Torsten Landschoff @ 2006-07-19 21:14 UTC (permalink / raw) To: Nathan Scott; +Cc: linux-kernel, xfs [-- Attachment #1: Type: text/plain, Size: 719 bytes --] Hi Nathan, On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote: > I suspect you had some residual directory corruption from using the > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue, > fixed in the latest -stable point release). That's probably the cause of my problem. Thanks for the info! BTW: I think there was nothing important on the broken filesystems, but I'd like to keep what's still there anyway, just in case... How would you suggest I copy that data? I fear just mounting and using cp might break and shut down the FS again; would xfsdump be more appropriate? Thanks for XFS, I have been using it for years in production servers! Greetings Torsten [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff @ 2006-07-19 23:09 ` Nathan Scott 0 siblings, 0 replies; 45+ messages in thread From: Nathan Scott @ 2006-07-19 23:09 UTC (permalink / raw) To: Torsten Landschoff; +Cc: linux-kernel, xfs On Wed, Jul 19, 2006 at 11:14:02PM +0200, Torsten Landschoff wrote: > On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote: > > I suspect you had some residual directory corruption from using the > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue, > > fixed in the latest -stable point release). > > That probably the cause of my problem. Thanks for the info! > > BTW: I think there was nothing important on the broken filesystems, but > I'd like to keep what's still there anyway just in case... How would you > suggest should I copy that data? I fear, just mounting and using cp > might break and shutdown the FS again, would xfsdump be more > appropriate? Yeah, xfsdump's not a bad idea; the interfaces it uses may well be able to avoid the cases that trigger shutdown. Otherwise it is a case of identifying the problem directory inode (the inum is reported in the shutdown trace) and avoiding that path when cp'ing - you can match inum to path via xfs_ncheck. > Thanks for XFS, I am using it for years in production servers! Thanks for the kind words, they're much appreciated at times like these. :-] cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
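Nathan's inum-to-path suggestion can be sketched as a dry run. The inode number 54526538 is the one from Torsten's original trace; the /dev/dm-6 device path and /mnt destination are assumptions for illustration, and the commands are echoed rather than executed.

```shell
# Map the inode number reported in the shutdown trace to a pathname with
# xfs_ncheck, so the bad directory can be avoided while copying data off.
# find_path is a hypothetical dry-run helper; it echoes the command line.
find_path() {
    dev="$1"; ino="$2"
    # xfs_ncheck lists inode-number/pathname pairs for the filesystem
    echo "xfs_ncheck $dev | grep -w $ino"
}
find_path /dev/dm-6 54526538
# xfsdump as the alternative Nathan mentions: a level-0 (full) dump of
# the mounted filesystem to a file, avoiding plain cp over the bad path.
echo "xfsdump -l 0 -f /backup/dm-6.dump /mnt"
```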
* Re: XFS breakage in 2.6.18-rc1 2006-07-18 22:57 ` Nathan Scott ` (2 preceding siblings ...) 2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff @ 2006-07-22 16:27 ` Christian Kujau 2006-07-23 23:01 ` Nathan Scott 3 siblings, 1 reply; 45+ messages in thread From: Christian Kujau @ 2006-07-22 16:27 UTC (permalink / raw) To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs Hi folks, On Wed, 19 Jul 2006, Nathan Scott wrote: > 2.6.18-rc1 should be fine (contains the corruption fix). Did you > mkfs and restore? Or at least get a full repair run? If you did, > and you still see issues in .18-rc1, please let me know asap. well, at least for me, corruption/errors *started* with 2.6.18-rc1: http://oss.sgi.com/archives/xfs/2006-07/msg00151.html I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to 2.6.18-rc2 and see the same errors: xfs_da_do_buf: bno 16777216 dir: inode 24472381 Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xc0219230 Filesystem "md0": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xc024d717 Please see the whole error/.config/logs here: http://nerdbynature.de/bits/2.6.18-rc2/ Thanks, Christian. -- BOFH excuse #38: secretary plugged hairdryer into UPS ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-22 16:27 ` Christian Kujau @ 2006-07-23 23:01 ` Nathan Scott 2006-07-28 17:01 ` Christian Kujau 0 siblings, 1 reply; 45+ messages in thread From: Nathan Scott @ 2006-07-23 23:01 UTC (permalink / raw) To: Christian Kujau; +Cc: linux-kernel, xfs On Sat, Jul 22, 2006 at 05:27:24PM +0100, Christian Kujau wrote: > On Wed, 19 Jul 2006, Nathan Scott wrote: > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you > > mkfs and restore? Or at least get a full repair run? If you did, > > and you still see issues in .18-rc1, please let me know asap. > > well, at least for me, corruption/errors *started* with 2.6.18-rc1: > ... > I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to > 2.6.18-rc2 and see the same errors: > > xfs_da_do_buf: bno 16777216 > dir: inode 24472381 This is an ondisk corruption - downgrading the kernel will not resolve it. The problem must be triggered by a combination of operations on a directory; I'm certain that if you access inode 24472381 on your filesystem on 2.6.17, that it'll shutdown your filesystem too. See the FAQ entry for a description on how to translate inums to paths, and also the repair -n step to detect any corruption ondisk. cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-23 23:01 ` Nathan Scott @ 2006-07-28 17:01 ` Christian Kujau 2006-07-28 21:48 ` Nathan Scott 0 siblings, 1 reply; 45+ messages in thread From: Christian Kujau @ 2006-07-28 17:01 UTC (permalink / raw) To: Nathan Scott; +Cc: linux-kernel, xfs Hello again, On Mon, 24 Jul 2006, Nathan Scott wrote: > filesystem too. See the FAQ entry for a description on how to > translate inums to paths, and also the repair -n step to detect > any corruption ondisk. I had two xfs filesystems, and I first noticed that /data/Scratch had been hit by this bug. I did not care much about this (hence the name :)) and wanted to postpone the xfs_db surgery. Unfortunately I forgot that "/" was also XFS, and it crashed yesterday. Remounting read-only helped a bit (no process could attempt to write to it; however, cp'ing from the ro-mounted xfs sometimes hung, unkillable). I set up a mini-root somewhere else and followed the instructions in the FAQ. It did not go too well: lots of stuff was moved to lost+found, and every subsequent xfs_repair run found more and more errors. I decided to mkfs the partition and make use of my backups. My other "scratch" partition is still XFS but mounted ro, and I'll try the xfsprogs fixes Nathan published on this one. Oh, and I dd'ed the corrupt xfs filesystem to a file, so I can play around with that one as well. If anyone is interested, here are the typescripts from the horrible xfs_repair runs: http://nerdbynature.de/bits/2.6.18-rc2/log/ cheers, Christian. -- BOFH excuse #21: POSIX compliance problem ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-28 17:01 ` Christian Kujau @ 2006-07-28 21:48 ` Nathan Scott 2006-07-29 20:22 ` Ralf Hildebrandt 0 siblings, 1 reply; 45+ messages in thread From: Nathan Scott @ 2006-07-28 21:48 UTC (permalink / raw) To: Christian Kujau; +Cc: linux-kernel, xfs On Fri, Jul 28, 2006 at 05:01:24PM +0000, Christian Kujau wrote: > I had two xfs filesystems and I first noticed that /data/Scratch was > befallen from this bug. I did not care much about this (hence the > name :)) and I wanted to postpone the xfs_db surgery. > ... > found more and more errors. I decided to mkfs the partition and make use > of my backups. my other "scratch" partition is still XFS but mounted ro > and I'll try the xfsprogs fixes Nathan published on this one. Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com list yesterday; please give that a go and let us know how it fares. cheers. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-28 21:48 ` Nathan Scott @ 2006-07-29 20:22 ` Ralf Hildebrandt 2006-07-29 22:28 ` David Chatterton 0 siblings, 1 reply; 45+ messages in thread From: Ralf Hildebrandt @ 2006-07-29 20:22 UTC (permalink / raw) To: Nathan Scott; +Cc: Christian Kujau, linux-kernel, xfs * Nathan Scott <nathans@sgi.com>: > Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com > list yesterday; please give that a go and let us know how it fares. Just to let you know, I did a cvs checkout of xfs-cmds as described on http://oss.sgi.com/projects/xfs/source.html Then I saved the patch from http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the "Original" link on that page. I built an xfs_repair binary using that, transferred it onto an old KLAX boot CD I had, and repaired the XFS root on my laptop. I got 5000 files in lost+found, mostly the whole manpages from my system. Had to reinstall a few packages to restore lost binaries, but that's all. When will that horrible bug be fixed in 2.6.x? -- Ralf Hildebrandt (i.A. des IT-Zentrums) Ralf.Hildebrandt@charite.de Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155 Gemeinsame Einrichtung von FU- und HU-Berlin Fax. +49 (0)30-450 570-962 IT-Zentrum Standort CBF send no mail to spamtrap@charite.de ^ permalink raw reply [flat|nested] 45+ messages in thread
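For anyone retracing these steps, the save-patch/apply/rebuild workflow Ralf describes looks roughly like the miniature below. It is a self-contained stand-in: the directory, file name, and patch content are invented so the example runs anywhere; in the real case the tree is the xfs-cmds CVS checkout and the patch is the one Barry posted to the list.

```shell
# Miniature of the apply-a-posted-patch workflow, using a throwaway tree.
mkdir -p xfs-cmds-demo && cd xfs-cmds-demo
printf 'old repair logic\n' > repair.c

# Stand-in for the patch saved via the archive's "Original" link.
cat > fix.patch <<'EOF'
--- repair.c
+++ repair.c
@@ -1 +1 @@
-old repair logic
+fixed repair logic
EOF

patch -p0 < fix.patch   # apply the saved patch to the tree
cat repair.c            # the file now carries the fixed line
```

After the real `patch` step one would run the tree's build (`make configure && make` in the xfs-cmds checkout of that era) to get the fixed xfs_repair binary.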
* Re: XFS breakage in 2.6.18-rc1 2006-07-29 20:22 ` Ralf Hildebrandt @ 2006-07-29 22:28 ` David Chatterton 0 siblings, 0 replies; 45+ messages in thread From: David Chatterton @ 2006-07-29 22:28 UTC (permalink / raw) To: Nathan Scott, Christian Kujau, linux-kernel, xfs Ralf Hildebrandt wrote: > * Nathan Scott <nathans@sgi.com>: > >> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com >> list yesterday; please give that a go and let us know how it fares. > > Just to let you know, I did a cvs checkout of xfs-cmds > as described on http://oss.sgi.com/projects/xfs/source.html > > Then I saved the patch from > http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the > "Original" link on hat page. > > I build a xfs_Repair binary using that, transferred it onto an old > KLAX boot cd I had and repaired the XFS root on my laptop. > > I got 5000 files in lost and found, mostly the whole manpages from my > system. Had to reinstall a few packages to restore lost binaries, but > that's all. > > When will that horrible bug be fixed in 2.6.x? > The bug is fixed in 2.6.17.7. David ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-18 22:29 XFS breakage in 2.6.18-rc1 Torsten Landschoff 2006-07-18 22:57 ` Nathan Scott @ 2006-07-18 23:06 ` Kevin Radloff 1 sibling, 0 replies; 45+ messages in thread From: Kevin Radloff @ 2006-07-18 23:06 UTC (permalink / raw) To: Torsten Landschoff; +Cc: linux-kernel On 7/18/06, Torsten Landschoff <torsten@debian.org> wrote: > Hi friends, > > I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken > from my /var/log/kern.log), which ultimately led me to reinstall my > system: [snip] > That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily > enough, this happened a few months after I finally replaced my ancient disk > with a RAID1 array to make sure I do not lose data ;) > > > In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken. > After booting into 2.6.17 again, I could use /usr again, but random files > contain null bytes, firefox segfaults instead of starting up, and a number > of programs fail in mysterious ways. I tried to recover using xfs_repair, > but I feel that my partition is thoroughly borked. Of course no data was > lost, thanks to backups, but still I'd like this bug to be fixed ;-) > > If more information from my logs is required, I can make it available (and any > part of the partition if required). That looks like the death knell of my /, which succumbed on Friday as a result (I believe) of the corruption bug that was in 2.6.16/17. Ironically enough, I also saw the problem during an aptitude upgrade. Also see this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2 -- Kevin 'radsaq' Radloff radsaq@gmail.com http://thesaq.com/ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
@ 2006-07-19 14:17 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 21:09 ` Torsten Landschoff
0 siblings, 2 replies; 45+ messages in thread
From: Mattias Hedenskog @ 2006-07-19 14:17 UTC (permalink / raw)
To: linux-kernel
> That looks like the death knell of my /, which succumbed on Friday as
> a result (I believe) of the corruption bug that was in 2.6.16/17.
> Ironically enough, I also saw the problem during an aptitude upgrade.
Hi all,
I just want to confirm this bug as well; unfortunately, it was my
system disk that had to take the hit here too. I'm running 2.6.16, and it's
reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
the fs I got the same error as in the previous post, running xfsprogs
2.8.4. I haven't had time to debug this issue further because the
box is quite critical, but I'll keep an eye on the other disks on the
system still running xfs.
Regards,
Mattias Hedenskog
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 14:17 Mattias Hedenskog @ 2006-07-19 14:59 ` Jeffrey E. Hundstad 2006-07-19 23:01 ` Nathan Scott 2006-07-19 21:09 ` Torsten Landschoff 1 sibling, 1 reply; 45+ messages in thread From: Jeffrey E. Hundstad @ 2006-07-19 14:59 UTC (permalink / raw) To: Mattias Hedenskog; +Cc: linux-kernel I did try xfs_repair 2.8.4 on a volume running on 2.6.17.4, and it annihilated the volume. This volume was not showing signs of crashing. So... I guess I would certainly not run xfs_repair unless there is good reason. -- Jeffrey Hundstad PS. ...yes, I had a recent backup ;-) Mattias Hedenskog wrote: >> That looks like the death knell of my /, which succumbed on Friday as >> a result (I believe) of the corruption bug that was in 2.6.16/17. >> Ironically enough, I also saw the problem during an aptitude upgrade. > > Hi all, > > I just want to confirm this bug as well and unfortunately it was my > system disk too who had to take the hit. Im running 2.6.16 and its > reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair > the fs I got the same error as in the previous post, running xfsprogs > 2.8.4. I haven't had the time to debug this issue further because the > box is quite critical but I'll keep an eye on the other disks on the > system still running xfs. > > Regards, > Mattias Hedenskog > - > To unsubscribe from this list: send the line "unsubscribe > linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 14:59 ` Jeffrey E. Hundstad @ 2006-07-19 23:01 ` Nathan Scott 2006-07-20 5:51 ` Jeffrey Hundstad 0 siblings, 1 reply; 45+ messages in thread From: Nathan Scott @ 2006-07-19 23:01 UTC (permalink / raw) To: Jeffrey E. Hundstad; +Cc: Mattias Hedenskog, linux-kernel, xfs On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote: > I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it > annihilated the volume. This volume was not showing signs of crashing. > So... I guess I would certainly not run xfs_repair unless there is good > reason. Erm, wha..? Can you expand on "annihilated" a bit? (please send me the full xfs_repair output if you still have it). thanks. -- Nathan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 23:01 ` Nathan Scott @ 2006-07-20 5:51 ` Jeffrey Hundstad 0 siblings, 0 replies; 45+ messages in thread From: Jeffrey Hundstad @ 2006-07-20 5:51 UTC (permalink / raw) To: Nathan Scott; +Cc: Mattias Hedenskog, linux-kernel, xfs Nathan Scott wrote: > On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote: > >> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it >> annihilated the volume. This volume was not showing signs of crashing. >> So... I guess I would certainly not run xfs_repair unless there is good >> reason. >> > > Erm, wha..? Can you expand on "annihilated" a bit? (please send > me the full xfs_repair output if you still have it). > Nathan Scott, I'm very sorry; I don't have the output anymore. By annihilated I mean that there were several directory trees that /didn't work/. If you tried to cd into a directory, take a directory listing, or use a file that you knew was in those directories, you'd get pages of debug messages on the console, and no usable data. I re-ran xfs_repair several times, but the condition never seemed to improve - or get worse, for that matter. I /incorrectly/ figured it was a known issue, or I'd have saved the output. Sorry again. -- Jeffrey Hundstad ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 14:17 Mattias Hedenskog 2006-07-19 14:59 ` Jeffrey E. Hundstad @ 2006-07-19 21:09 ` Torsten Landschoff 2006-07-20 10:46 ` Jan Engelhardt 1 sibling, 1 reply; 45+ messages in thread From: Torsten Landschoff @ 2006-07-19 21:09 UTC (permalink / raw) To: Mattias Hedenskog; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 902 bytes --] On Wed, Jul 19, 2006 at 04:17:50PM +0200, Mattias Hedenskog wrote: > reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair > the fs I got the same error as in the previous post, running xfsprogs > 2.8.4. I haven't had the time to debug this issue further because the > box is quite critical but I'll keep an eye on the other disks on the > system still running xfs. I too would not run xfs_repair without cause. My /home had survived the XFS problems, but I ran xfs_repair "just to be sure". Now that partition shows the same problem and is mostly unreadable. :( So, do not run xfs_repair without a cause ;-) For reference, I think it was xfsprogs 2.7.14 that I was using, the latest in Debian. FYI: nothing important on /home, I think - I cannot be sure, since I back up only selectively (I do not have proper backup media) :( Greetings Torsten [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1 2006-07-19 21:09 ` Torsten Landschoff @ 2006-07-20 10:46 ` Jan Engelhardt 0 siblings, 0 replies; 45+ messages in thread From: Jan Engelhardt @ 2006-07-20 10:46 UTC (permalink / raw) To: Torsten Landschoff; +Cc: Mattias Hedenskog, linux-kernel > >> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair >> the fs I got the same error as in the previous post, running xfsprogs >> 2.8.4. I haven't had the time to debug this issue further because the >> box is quite critical but I'll keep an eye on the other disks on the >> system still running xfs. I think my experience is worth adding too: the xfs filesystem (of one box, that is) was created under 2.6.16, IIRC, and has survived 2.6.17 and 2.6.18-rc1 so far... Jan Engelhardt -- ^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2006-08-02 4:32 UTC | newest] Thread overview: 45+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-07-18 22:29 XFS breakage in 2.6.18-rc1 Torsten Landschoff 2006-07-18 22:57 ` Nathan Scott 2006-07-19 8:08 ` Alistair John Strachan 2006-07-19 22:56 ` Nathan Scott 2006-07-20 10:29 ` Kasper Sandberg 2006-07-19 10:21 ` Kasper Sandberg 2006-07-19 12:43 ` Alistair John Strachan 2006-07-19 15:25 ` Kasper Sandberg 2006-07-19 22:59 ` Nathan Scott 2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott 2006-07-20 12:42 ` Hans-Peter Jansen 2006-07-20 13:28 ` David Greaves 2006-07-20 16:11 ` Chris Wedgwood 2006-07-20 22:14 ` Nathan Scott 2006-07-20 22:18 ` Justin Piszcz 2006-07-20 22:24 ` Nathan Scott 2006-07-20 22:43 ` Justin Piszcz 2006-07-20 22:52 ` Nathan Scott 2006-07-20 22:55 ` Justin Piszcz 2006-07-20 22:57 ` Justin Piszcz 2006-07-20 23:00 ` Nathan Scott 2006-07-20 23:10 ` Justin Piszcz 2006-07-20 23:12 ` Chris Wedgwood 2006-07-20 23:15 ` Justin Piszcz 2006-07-20 23:19 ` Nathan Scott 2006-07-20 15:13 ` Kevin Radloff 2006-07-20 16:51 ` Alistair John Strachan 2006-07-31 16:25 ` Jan Kasprzak 2006-07-31 16:38 ` Justin Piszcz 2006-08-02 4:32 ` Nathan Scott 2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff 2006-07-19 23:09 ` Nathan Scott 2006-07-22 16:27 ` Christian Kujau 2006-07-23 23:01 ` Nathan Scott 2006-07-28 17:01 ` Christian Kujau 2006-07-28 21:48 ` Nathan Scott 2006-07-29 20:22 ` Ralf Hildebrandt 2006-07-29 22:28 ` David Chatterton 2006-07-18 23:06 ` Kevin Radloff -- strict thread matches above, loose matches on Subject: below -- 2006-07-19 14:17 Mattias Hedenskog 2006-07-19 14:59 ` Jeffrey E. Hundstad 2006-07-19 23:01 ` Nathan Scott 2006-07-20 5:51 ` Jeffrey Hundstad 2006-07-19 21:09 ` Torsten Landschoff 2006-07-20 10:46 ` Jan Engelhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox