* XFS breakage in 2.6.18-rc1
@ 2006-07-18 22:29 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
2006-07-18 23:06 ` Kevin Radloff
0 siblings, 2 replies; 45+ messages in thread
From: Torsten Landschoff @ 2006-07-18 22:29 UTC (permalink / raw)
To: linux-kernel
Hi friends,
I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
from my /var/log/kern.log), which ultimately led me to reinstall my
system:
Jul 17 07:10:12 pulsar kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jul 17 07:10:12 pulsar kernel: Linux version 2.6.18-rc1 (torsten@pulsar) (gcc version 4.1.2 20060630 (prerelease) (Debian 4.1.1-6)) #18 SMP PREEMPT Fri Jul 14 07:58:49 CEST 2006
...
Jul 17 07:10:32 pulsar kernel: agpgart: Putting AGP V3 device at 0000:03:00.0 into 4x mode
Jul 17 07:10:32 pulsar kernel: [drm] Setting GART location based on new memory map
Jul 17 07:10:32 pulsar kernel: [drm] Loading R200 Microcode
Jul 17 07:10:32 pulsar kernel: [drm] writeback test succeeded in 1 usecs
Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xf8a837d0
Jul 17 07:33:53 pulsar kernel: [<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a89215>] xfs_dir2_data_log_unused+0x55/0x70 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a87337>] xfs_dir_removename+0xf7/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a9720d>] xfs_ilock_nowait+0xcd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab9783>] xfs_remove+0x393/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel: [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel: [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel: [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel: [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel: [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel: [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xf8ab97d7
Jul 17 07:33:53 pulsar kernel: [<f8aaf91d>] xfs_trans_cancel+0xdd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel: [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel: [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel: [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel: [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel: [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel: [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: xfs_force_shutdown(dm-6,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xf8ac77bc
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": Corruption of in-memory data detected. Shutting down filesystem: dm-6
Jul 17 07:33:53 pulsar kernel: Please umount the filesystem, and rectify the problem(s)
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 32K
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 8K
That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily
enough, this happened a few months after I finally replaced my ancient disk
with a RAID1 array to make sure I do not lose data ;)
In any case, it seems the XFS driver in 2.6.18-rc1 is quite broken.
After booting back into 2.6.17, I could use /usr again, but random files
contain null bytes, Firefox segfaults instead of starting up, and a number
of programs fail in mysterious ways. I tried to recover using xfs_repair,
but I feel that my partition is thoroughly borked. Of course no data was
lost, thanks to backups, but I'd still like this bug to be fixed ;-)
If more information from my logs is needed, I can make it available (and
any part of the partition as well, if required).
Greetings
Torsten
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:29 XFS breakage in 2.6.18-rc1 Torsten Landschoff
@ 2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
` (3 more replies)
2006-07-18 23:06 ` Kevin Radloff
1 sibling, 4 replies; 45+ messages in thread
From: Nathan Scott @ 2006-07-18 22:57 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel, xfs
On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> Hi friends,
Hi Torsten,
> I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
>
> Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
I suspect you had some residual directory corruption from using the
2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
fixed in the latest -stable point release).
> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thorougly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)
2.6.18-rc1 should be fine (contains the corruption fix). Did you
mkfs and restore? Or at least get a full repair run? If you did,
and you still see issues in .18-rc1, please let me know ASAP.
thanks.
--
Nathan
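The "full repair run" Nathan asks about can be sketched as a shell sequence. The device and mount point below are hypothetical placeholders, and the commands are echoed rather than executed, so the sketch is safe to paste and adapt:

```shell
# Hypothetical XFS volume; substitute your own device and mount point.
DEV=/dev/mapper/vg0-usr
MNT=/usr

# Echo each step instead of running it, so this sketch makes no changes.
run() { echo "+ $*"; }

run umount "$MNT"          # xfs_repair requires an unmounted filesystem
run xfs_repair -n "$DEV"   # "no modify" pass: report damage without writing
run xfs_repair "$DEV"      # actual repair, only if the -n pass found problems
run mount "$MNT"           # remount and re-check the affected paths
```

The `-n` pass first is the conservative order: it shows what a real repair would touch before anything is written.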
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:29 XFS breakage in 2.6.18-rc1 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
@ 2006-07-18 23:06 ` Kevin Radloff
1 sibling, 0 replies; 45+ messages in thread
From: Kevin Radloff @ 2006-07-18 23:06 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel
On 7/18/06, Torsten Landschoff <torsten@debian.org> wrote:
> Hi friends,
>
> I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
[snip]
> That problem occured during a dist-upgrade, dm-6 is my /usr partition. Funny
> enough this happened a few months after finally replaced my ancient disk
> with a RAID1 array to make sure I do not lose data ;)
>
>
> In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken.
> After booting into 2.6.17 again, I could use /usr again but random files
> contain null bytes, firefox segfaults instead of starting up and a number
> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thorougly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)
>
> If more information from my logs is required, I can make it available (and any
> part of the partition if required).
That looks like the death knell of my /, which succumbed on Friday as
a result (I believe) of the corruption bug that was in 2.6.16/17.
Ironically enough, I also saw the problem during an aptitude upgrade.
Also see this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2
--
Kevin 'radsaq' Radloff
radsaq@gmail.com
http://thesaq.com/
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
@ 2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 22:56 ` Nathan Scott
2006-07-19 10:21 ` Kasper Sandberg
` (2 subsequent siblings)
3 siblings, 1 reply; 45+ messages in thread
From: Alistair John Strachan @ 2006-07-19 8:08 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
[snip]
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thorougly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
Just out of interest, I've got a few XFS volumes that were created 24 months
ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
any crashes so far.
Assuming I get the newest XFS repair tools on there, what's the disadvantage
of repairing versus creating a new filesystem? What special circumstances are
required to cause a crash?
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
@ 2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
` (2 more replies)
2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff
2006-07-22 16:27 ` Christian Kujau
3 siblings, 3 replies; 45+ messages in thread
From: Kasper Sandberg @ 2006-07-19 10:21 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > Hi friends,
>
> Hi Torsten,
>
> > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > from my /var/log/kern.log), which ultimately led me to reinstall my
> > system:
> >
> > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
>
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).
This has me very worried.
I just upgraded to .18-rc1-git5 when it came out; I used .17-rc3 before.
Does this mean .17-rc3 may have corrupted my filesystem?
What action do you suggest I take now?
>
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thorougly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
>
> thanks.
>
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 10:21 ` Kasper Sandberg
@ 2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 15:25 ` Kasper Sandberg
2006-07-19 22:59 ` Nathan Scott
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
2 siblings, 1 reply; 45+ messages in thread
From: Alistair John Strachan @ 2006-07-19 12:43 UTC (permalink / raw)
To: Kasper Sandberg; +Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs
On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
>
> > > of programs fail in mysterious ways. I tried to recover using
> > > xfs_repair but I feel that my partition is thorougly borked. Of course
> > > no data was lost due to backups but still I'd like this bug to be fixed
> > > ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> >
> > thanks.
According to another thread Nathan just responded to, it sounds like we need
to wait for a new version of the xfsprogs package, and then run xfs_repair on
the affected filesystems. I wouldn't worry about it too much if you've not
had any crashes. The damage can be repaired, just not right now.
I'm still waiting for a crash on a machine that has been under heavy load for
28 days, so it's obviously not _that_ easy to trigger.
--
Cheers,
Alistair.
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
* Re: XFS breakage in 2.6.18-rc1
@ 2006-07-19 14:17 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 21:09 ` Torsten Landschoff
0 siblings, 2 replies; 45+ messages in thread
From: Mattias Hedenskog @ 2006-07-19 14:17 UTC (permalink / raw)
To: linux-kernel
> That looks like the death knell of my /, which succumbed on Friday as
> a result (I believe) of the corruption bug that was in 2.6.16/17.
> Ironically enough, I also saw the problem during an aptitude upgrade.
Hi all,
I just want to confirm this bug as well; unfortunately it was my
system disk that had to take the hit. I'm running 2.6.16, and it's
reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
the fs I got the same error as in the previous post, running xfsprogs
2.8.4. I haven't had time to debug this issue further because the
box is quite critical, but I'll keep an eye on the other disks on the
system still running XFS.
Regards,
Mattias Hedenskog
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:17 Mattias Hedenskog
@ 2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 23:01 ` Nathan Scott
2006-07-19 21:09 ` Torsten Landschoff
1 sibling, 1 reply; 45+ messages in thread
From: Jeffrey E. Hundstad @ 2006-07-19 14:59 UTC (permalink / raw)
To: Mattias Hedenskog; +Cc: linux-kernel
I tried xfs_repair 2.8.4 on a volume running on 2.6.17.4, and it
annihilated the volume. This volume was not showing signs of crashing.
So... I guess I would certainly not run xfs_repair unless there is good
reason.
--
Jeffrey Hundstad
PS. ...yes, I had a recent backup ;-)
Mattias Hedenskog wrote:
>> That looks like the death knell of my /, which succumbed on Friday as
>> a result (I believe) of the corruption bug that was in 2.6.16/17.
>> Ironically enough, I also saw the problem during an aptitude upgrade.
>
> Hi all,
>
> I just want to confirm this bug as well and unfortunately it was my
> system disk too who had to take the hit. Im running 2.6.16 and its
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
>
> Regards,
> Mattias Hedenskog
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 12:43 ` Alistair John Strachan
@ 2006-07-19 15:25 ` Kasper Sandberg
0 siblings, 0 replies; 45+ messages in thread
From: Kasper Sandberg @ 2006-07-19 15:25 UTC (permalink / raw)
To: Alistair John Strachan
Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs
On Wed, 2006-07-19 at 13:43 +0100, Alistair John Strachan wrote:
> On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > > Hi friends,
> > >
> > > Hi Torsten,
> > >
> > > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > > system:
> > > >
> > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > >
> > > I suspect you had some residual directory corruption from using the
> > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > fixed in the latest -stable point release).
> >
> > This has me very worried.
> >
> > i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> > does this mean my .17-rc3 may have corrupted my filesystem?
> >
> > what action do you suggest i do now?
> >
> > > > of programs fail in mysterious ways. I tried to recover using
> > > > xfs_repair but I feel that my partition is thorougly borked. Of course
> > > > no data was lost due to backups but still I'd like this bug to be fixed
> > > > ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> > >
> > > thanks.
>
> According to another thread Nathan just responded to, it sounds like we need
> to wait for a new version of the xfsprogs package, and then run xfs_repair on
> the affected filesystems. I wouldn't worry about it too much if you've not
> had any crashes. The damage can be repaired, just not right now.
Without ANY loss? Because even though it would be a bit painful for me
to do, I do have the option of putting in a new drive, copying everything,
and reinitializing my filesystem.
>
> I'm still waiting for a crash on a machine that has been under heavy load for
> 28 days, so it's obviously not _that_ easy to trigger.
So basically, if I upgrade to a safe kernel before I get these errors,
I'm good?
>
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:17 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 21:09 ` Torsten Landschoff
2006-07-20 10:46 ` Jan Engelhardt
1 sibling, 1 reply; 45+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:09 UTC (permalink / raw)
To: Mattias Hedenskog; +Cc: linux-kernel
On Wed, Jul 19, 2006 at 04:17:50PM +0200, Mattias Hedenskog wrote:
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
I would also not try running xfs_repair without cause. My /home did
survive the XFS problems, but I ran xfs_repair "just to be sure". Now that
partition shows the same problem and is mostly unreadable. :( So, do not
run xfs_repair without cause ;-)
For reference, I think it was xfsprogs 2.7.14 that I was using, the
latest in Debian.
FYI: nothing important was on /home, I think - I cannot be sure, since I
back up only selectively for lack of proper backup media :(
Greetings
Torsten
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 10:21 ` Kasper Sandberg
@ 2006-07-19 21:14 ` Torsten Landschoff
2006-07-19 23:09 ` Nathan Scott
2006-07-22 16:27 ` Christian Kujau
3 siblings, 1 reply; 45+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:14 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Hi Nathan,
On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).
That is probably the cause of my problem. Thanks for the info!
BTW: I think there was nothing important on the broken filesystems, but
I'd like to keep what's still there anyway, just in case... How would you
suggest I copy that data? I fear that just mounting and using cp
might break and shut down the FS again; would xfsdump be more
appropriate?
Thanks for XFS, I have been using it for years on production servers!
Greetings
Torsten
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 8:08 ` Alistair John Strachan
@ 2006-07-19 22:56 ` Nathan Scott
2006-07-20 10:29 ` Kasper Sandberg
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-19 22:56 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> [snip]
> > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > but I feel that my partition is thorougly borked. Of course no data was
> > > lost due to backups but still I'd like this bug to be fixed ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
>
> Just out of interest, I've got a few XFS volumes that were created 24 months
> ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> any crashes so far.
>
> Assuming I get the newest XFS repair tools on there, what's the disadvantage
> of repairing versus creating a new filesystem? What special circumstances are
> required to cause a crash?
There should be no disadvantage to repairing. I will update the FAQ
shortly to describe all the details of the problem, recommendations
on how to address it, which kernel version is affected, etc.
cheers.
--
Nathan
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
@ 2006-07-19 22:59 ` Nathan Scott
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
2 siblings, 0 replies; 45+ messages in thread
From: Nathan Scott @ 2006-07-19 22:59 UTC (permalink / raw)
To: Kasper Sandberg; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
The odds are decent that you're unaffected. You can check your filesystem
using xfs_check or xfs_repair -n, and these will give you a good indication
as to whether further action is required.
cheers.
--
Nathan
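The non-destructive check Nathan suggests could look like the sketch below. The device name is hypothetical, and the commands are only echoed, not executed, since they need a real unmounted XFS volume:

```shell
# Hypothetical XFS volume to be checked; substitute your own.
DEV=/dev/mapper/vg0-home

# Print the commands rather than running them.
run() { echo "+ $*"; }

# Both tools read the filesystem without modifying it; a clean xfs_check
# run and an xfs_repair -n pass that reports nothing to fix suggest that
# no further action is required.
run xfs_check "$DEV"
run xfs_repair -n "$DEV"
```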
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 23:01 ` Nathan Scott
2006-07-20 5:51 ` Jeffrey Hundstad
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-19 23:01 UTC (permalink / raw)
To: Jeffrey E. Hundstad; +Cc: Mattias Hedenskog, linux-kernel, xfs
On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it
> annihilated the volume. This volume was not showing signs of crashing.
> So... I guess I would certainly not run xfs_repair unless there is good
> reason.
Erm, wha..? Can you expand on "annihilated" a bit? (please send
me the full xfs_repair output if you still have it).
thanks.
--
Nathan
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff
@ 2006-07-19 23:09 ` Nathan Scott
0 siblings, 0 replies; 45+ messages in thread
From: Nathan Scott @ 2006-07-19 23:09 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel, xfs
On Wed, Jul 19, 2006 at 11:14:02PM +0200, Torsten Landschoff wrote:
> On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> That probably the cause of my problem. Thanks for the info!
>
> BTW: I think there was nothing important on the broken filesystems, but
> I'd like to keep what's still there anyway just in case... How would you
> suggest should I copy that data? I fear, just mounting and using cp
> might break and shutdown the FS again, would xfsdump be more
> appropriate?
Yeah, xfsdump's not a bad idea; the interfaces it uses may well
be able to avoid the cases that trigger shutdown. Otherwise it
is a case of identifying the problem directory inode (the inum
is reported in the shutdown trace) and avoiding that path when
cp'ing - you can match inum to path via xfs_ncheck.
> Thanks for XFS, I am using it for years in production servers!
Thanks for the kind words, they're much appreciated at times
like these. :-]
cheers.
--
Nathan
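Nathan's inum-to-path procedure can be sketched as follows. The inode number is the one from the kernel log earlier in the thread; the device, mount point, and dump destination are hypothetical, and the commands are echoed rather than executed:

```shell
# Hypothetical names; substitute your own volume, mount point, and backup target.
DEV=/dev/mapper/vg0-usr
MNT=/usr
INUM=54526538   # inode number reported in the shutdown trace ("dir: inode ...")

# Echo each step instead of running it.
run() { echo "+ $*"; }

# xfs_ncheck lists "inum pathname" pairs, so filtering on the first field
# reveals the damaged directory to avoid when cp'ing.
run "xfs_ncheck $DEV | awk -v i=$INUM '\$1 == i'"

# Alternatively, a level-0 xfsdump to a file on another volume may succeed
# where cp triggers the shutdown again.
run xfsdump -l 0 -f /backup/usr.xfsdump "$MNT"
```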
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 23:01 ` Nathan Scott
@ 2006-07-20 5:51 ` Jeffrey Hundstad
0 siblings, 0 replies; 45+ messages in thread
From: Jeffrey Hundstad @ 2006-07-20 5:51 UTC (permalink / raw)
To: Nathan Scott; +Cc: Mattias Hedenskog, linux-kernel, xfs
Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
>
>> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it
>> annihilated the volume. This volume was not showing signs of crashing.
>> So... I guess I would certainly not run xfs_repair unless there is good
>> reason.
>>
>
> Erm, wha..? Can you expand on "annihilated" a bit? (please send
> me the full xfs_repair output if you still have it).
>
Nathan Scott,
I'm very sorry; I don't have the output anymore. By "annihilated" I mean
that there were several directory trees that /didn't work/. If you
tried to cd into a directory, take a directory listing, or use a
file that you knew was in one of those directories, you'd get pages
of debug messages on the console, and no usable data. I re-ran
xfs_repair and retried several times, but the condition never seemed to
improve - or get worse, for that matter.
I /incorrectly/ figured it was a known issue or I'd have saved the
output. Sorry again.
--
Jeffrey Hundstad
* FAQ updated (was Re: XFS breakage...)
2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 22:59 ` Nathan Scott
@ 2006-07-20 7:13 ` Nathan Scott
2006-07-20 12:42 ` Hans-Peter Jansen
` (3 more replies)
2 siblings, 4 replies; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 7:13 UTC (permalink / raw)
To: Kasper Sandberg, Justin Piszcz, Torsten Landschoff; +Cc: linux-kernel, xfs
On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
Correction there - no -stable exists with this yet; I guess that'll
be 2.6.17.7 once it's out, though.
> what action do you suggest i do now?
I've captured the state of this issue here, with options and ways
to correct the problem:
http://oss.sgi.com/projects/xfs/faq.html#dir2
Hope this helps.
cheers.
--
Nathan
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 22:56 ` Nathan Scott
@ 2006-07-20 10:29 ` Kasper Sandberg
0 siblings, 0 replies; 45+ messages in thread
From: Kasper Sandberg @ 2006-07-20 10:29 UTC (permalink / raw)
To: Nathan Scott
Cc: Alistair John Strachan, Torsten Landschoff, linux-kernel, xfs
On Thu, 2006-07-20 at 08:56 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> > On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> > [snip]
> > > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > > but I feel that my partition is thorougly borked. Of course no data was
> > > > lost due to backups but still I'd like this bug to be fixed ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> >
> > Just out of interest, I've got a few XFS volumes that were created 24 months
> > ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> > any crashes so far.
> >
> > Assuming I get the newest XFS repair tools on there, what's the disadvantage
> > of repairing versus creating a new filesystem? What special circumstances are
> > required to cause a crash?
>
> There should be no disadvantage to repairing. I will update the FAQ
> shortly to describe all the details of the problem, recommendations
> on how to address it, which kernel version is affected, etc.
this FAQ, is it this: http://oss.sgi.com/projects/xfs/faq.html#dir2 ?
(btw, it seems that while it appears only once in the TOC, the same
entry about 2.6.17 shows up twice..)..
which version of xfsprogs should I use while doing the xfs_check?
>
> cheers.
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 21:09 ` Torsten Landschoff
@ 2006-07-20 10:46 ` Jan Engelhardt
0 siblings, 0 replies; 45+ messages in thread
From: Jan Engelhardt @ 2006-07-20 10:46 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: Mattias Hedenskog, linux-kernel
>
>> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
>> the fs I got the same error as in the previous post, running xfsprogs
>> 2.8.4. I haven't had the time to debug this issue further because the
>> box is quite critical but I'll keep an eye on the other disks on the
>> system still running xfs.
I think my experience is worth mentioning too: the xfs filesystem (of
one box, that is) was created IIRC under 2.6.16, and has survived
throughout 2.6.17 and 2.6.18-rc1 so far...
Jan Engelhardt
--
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
@ 2006-07-20 12:42 ` Hans-Peter Jansen
2006-07-20 13:28 ` David Greaves
` (2 subsequent siblings)
3 siblings, 0 replies; 45+ messages in thread
From: Hans-Peter Jansen @ 2006-07-20 12:42 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Hi Nathan,
Am Donnerstag, 20. Juli 2006 09:13 schrieb Nathan Scott:
>
> I've captured the state of this issue here, with options and ways
> to correct the problem:
> http://oss.sgi.com/projects/xfs/faq.html#dir2
Thanks for the pointer. I think it is valuable for all XFS users, but
reading the FAQ with a decent web browser on linux (konqueror 3.5.3 and
firefox 1.5.0.4 in my case) is very painful due to the overlong lines,
not to speak of printing such texts. Try it yourself: load that page into
konqueror, hit print, select preview and 'have fun', hmm, suffer..
Pete
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
2006-07-20 12:42 ` Hans-Peter Jansen
@ 2006-07-20 13:28 ` David Greaves
2006-07-20 16:11 ` Chris Wedgwood
2006-07-20 15:13 ` Kevin Radloff
2006-07-31 16:25 ` Jan Kasprzak
3 siblings, 1 reply; 45+ messages in thread
From: David Greaves @ 2006-07-20 13:28 UTC (permalink / raw)
To: Nathan Scott
Cc: Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel,
xfs, cw, ml, radsaq
Nathan Scott wrote:
> Correction there - no -stable exists with this yet, I guess that'll
> be 2.6.17.7 once its out though.
>
>> what action do you suggest i do now?
>
> I've captured the state of this issue here, with options and ways
> to correct the problem:
> http://oss.sgi.com/projects/xfs/faq.html#dir2
>
> Hope this helps.
It does, thanks :)
Does this problem exist in 2.6.16.x??
From various comments like:
Unless 2.6.16.x is a dead-end could we please also have this patch put
into there?
and
a result (I believe) of the corruption bug that was in 2.6.16/17.
and
I just want to confirm this bug as well, and unfortunately it was my
system disk that had to take the hit. I'm running 2.6.16
I assume it does.
But the FAQ says:
Q: What is the issue with directory corruption in Linux 2.6.17?
In the Linux kernel 2.6.17 release a subtle bug...
which implies it's not...
HELP
So given this is from 2.6.16.9:
/*
* One less used entry in the free table.
*/
INT_MOD(free->hdr.nused, ARCH_CONVERT, -1);
xfs_dir2_free_log_header(tp, fbp);
and it looks awfully similar to the patch which says:
--- linux-2.6.17.2.orig/fs/xfs/xfs_dir2_node.c
+++ linux-2.6.17.2/fs/xfs/xfs_dir2_node.c
@@ -970,7 +970,7 @@ xfs_dir2_leafn_remove(
/*
* One less used entry in the free table.
*/
- free->hdr.nused = cpu_to_be32(-1);
+ be32_add(&free->hdr.nused, -1);
xfs_dir2_free_log_header(tp, fbp);
Should 2.6.16.x replace
INT_MOD(free->hdr.nused, ARCH_CONVERT, -1);
with
be32_add(&free->hdr.nused, -1);
I hope so because I assumed there simply wasn't a patch for 2.6.16 and
applied this 'best guess' to my servers and rebooted/remounted successfully.
David
--
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
2006-07-20 12:42 ` Hans-Peter Jansen
2006-07-20 13:28 ` David Greaves
@ 2006-07-20 15:13 ` Kevin Radloff
2006-07-20 16:51 ` Alistair John Strachan
2006-07-31 16:25 ` Jan Kasprzak
3 siblings, 1 reply; 45+ messages in thread
From: Kevin Radloff @ 2006-07-20 15:13 UTC (permalink / raw)
To: Nathan Scott
Cc: Kasper Sandberg, Justin Piszcz, Torsten Landschoff, linux-kernel,
xfs
On 7/20/06, Nathan Scott <nathans@sgi.com> wrote:
> On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > >
> > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > >
> > > I suspect you had some residual directory corruption from using the
> > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > fixed in the latest -stable point release).
>
> Correction there - no -stable exists with this yet, I guess that'll
> be 2.6.17.7 once its out though.
>
> > what action do you suggest i do now?
>
> I've captured the state of this issue here, with options and ways
> to correct the problem:
> http://oss.sgi.com/projects/xfs/faq.html#dir2
>
> Hope this helps.
I actually tried the xfs_db method to fix my / filesystem (as you had
outlined in http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2),
and while it's quite possible that I screwed it up, after a subsequent
xfs_repair run (which completed successfully and moved lots of stuff
to /lost+found, as I would expect), the XFS code had serious problems
with various parts of my filesystem (like "ls /lost+found", which
would cause lots of errors to be logged, although not a complete fs
shutdown). Another run through xfs_repair resulted in a
filesystem that would no longer even boot successfully.
Unfortunately it was a mostly-full 74GB big-/ partition on my primary
machine (a laptop), so I don't have a dump of it for you and my report
is probably pretty useless. :( But on the bright side, virtually all
of the filesystem was otherwise intact and I was able to get all my
data off before rebuilding my system.
--
Kevin 'radsaq' Radloff
radsaq@gmail.com
http://thesaq.com/
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 13:28 ` David Greaves
@ 2006-07-20 16:11 ` Chris Wedgwood
2006-07-20 22:14 ` Nathan Scott
0 siblings, 1 reply; 45+ messages in thread
From: Chris Wedgwood @ 2006-07-20 16:11 UTC (permalink / raw)
To: David Greaves
Cc: Nathan Scott, Kasper Sandberg, Justin Piszcz, Torsten Landschoff,
linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:
> Does this problem exist in 2.6.16.x??
The change was merged after 2.6.16.x was branched; I was mistaken
about how long the bug had been around.
> I hope so because I assumed there simply wasn't a patch for 2.6.16 and
> applied this 'best guess' to my servers and rebooted/remounted successfully.
Doing the correct change to 2.6.16.x won't hurt, but it's not
necessary.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 15:13 ` Kevin Radloff
@ 2006-07-20 16:51 ` Alistair John Strachan
0 siblings, 0 replies; 45+ messages in thread
From: Alistair John Strachan @ 2006-07-20 16:51 UTC (permalink / raw)
To: Kevin Radloff
Cc: Nathan Scott, Kasper Sandberg, Justin Piszcz, Torsten Landschoff,
linux-kernel, xfs
On Thursday 20 July 2006 16:13, Kevin Radloff wrote:
> On 7/20/06, Nathan Scott <nathans@sgi.com> wrote:
> > On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> > > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > > >
> > > > I suspect you had some residual directory corruption from using the
> > > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > > fixed in the latest -stable point release).
> >
> > Correction there - no -stable exists with this yet, I guess that'll
> > be 2.6.17.7 once its out though.
> >
> > > what action do you suggest i do now?
> >
> > I've captured the state of this issue here, with options and ways
> > to correct the problem:
> > http://oss.sgi.com/projects/xfs/faq.html#dir2
> >
> > Hope this helps.
>
> I actually tried the xfs_db method to fix my / filesystem (as you had
> outlined in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2), and
> while it's quite possible that I screwed it up, after a subsequent
> xfs_repair run (which completed successfully and moved lots of stuff to
> /lost+found, as I would expect), the XFS code had serious problems with
> various parts of my filesystem (like "ls /lost+found", which
> would cause lots of errors to be logged, although not a complete fs
> shutdown). Another run through xfs_repair resulted in a
> filesystem that would no longer even boot successfully.
>
> Unfortunately it was a mostly-full 74GB big-/ partition on my primary
> machine (a laptop), so I don't have a dump of it for you and my report
> is probably pretty useless. :( But on the bright side, virtually all
> of the filesystem was otherwise intact and I was able to get all my
> data off before rebuilding my system.
I've been hit by this on my root filesystem today, and when I followed the
instructions, I was able to retrieve my data. It turned out that the only
corrupted inode was /, which unfortunately meant I had to repair the
filesystem with a boot-cd. However, it was obvious which inodes corresponded
to which directories, and I was able to repair it.
I'm not sure this advice is sound, but it seems to me that if you're running
an affected 2.6.17 kernel (or ever have) on an XFS volume, it's not worth
risking destruction if you haven't had any oopses. The filesystem will get
worse, but hopefully in a non-fatal way, and the XFS guys will hopefully have
an xfs_repair up that works, soon.
Right now I'd highly recommend copying as much as possible from the corrupted
filesystem (after following the instructions) to a new filesystem (with an
unaffected kernel, of course) and destroying the old one. I still have
inconsistencies on the filesystem I "repaired", and it was a fairly painful
process.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 16:11 ` Chris Wedgwood
@ 2006-07-20 22:14 ` Nathan Scott
2006-07-20 22:18 ` Justin Piszcz
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 22:14 UTC (permalink / raw)
To: Chris Wedgwood
Cc: David Greaves, Kasper Sandberg, Justin Piszcz, Torsten Landschoff,
linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 09:11:21AM -0700, Chris Wedgwood wrote:
> On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:
>
> > Does this problem exist in 2.6.16.x??
>
> The change was merged after 2.6.16.x was branched, I was mistaken
> in how long I thought the bug has been about.
>
> > I hope so because I assumed there simply wasn't a patch for 2.6.16 and
> > applied this 'best guess' to my servers and rebooted/remounted successfully.
>
> Doing the correct change to 2.6.16.x won't hurt, but it's not
> necessary.
Yep. As Chris said, 2.6.17 is the only affected kernel. I've
fixed up the wacky html formatting and my merge error (thanks
to all for reporting those) so it's a bit more readable now.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:14 ` Nathan Scott
@ 2006-07-20 22:18 ` Justin Piszcz
2006-07-20 22:24 ` Nathan Scott
0 siblings, 1 reply; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 22:18 UTC (permalink / raw)
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
Nathan,
Does the bug only occur during a crash?
I have been running 2.6.17.x for a while now (multiple XFS filesystems, all
on UPS) - no issues?
Justin.
On Fri, 21 Jul 2006, Nathan Scott wrote:
> On Thu, Jul 20, 2006 at 09:11:21AM -0700, Chris Wedgwood wrote:
>> On Thu, Jul 20, 2006 at 02:28:32PM +0100, David Greaves wrote:
>>
>>> Does this problem exist in 2.6.16.x??
>>
>> The change was merged after 2.6.16.x was branched, I was mistaken
>> in how long I thought the bug has been about.
>>
>>> I hope so because I assumed there simply wasn't a patch for 2.6.16 and
>>> applied this 'best guess' to my servers and rebooted/remounted successfully.
>>
>> Doing the correct change to 2.6.16.x won't hurt, but it's not
>> necessary.
>
> Yep. As Chris said, 2.6.17 is the only affected kernel. I've
> fixed up the whacky html formatting and my merge error (thanks
> to all for reporting those) so its a bit more readable now.
>
> cheers.
>
> --
> Nathan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:18 ` Justin Piszcz
@ 2006-07-20 22:24 ` Nathan Scott
2006-07-20 22:43 ` Justin Piszcz
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 22:24 UTC (permalink / raw)
To: Justin Piszcz
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 06:18:14PM -0400, Justin Piszcz wrote:
> Nathan,
>
> Does the bug only occur during a crash?
No, it's unrelated to crashing. Only when adding/removing from a
directory that is in a specific node/btree format (many entries),
and only under a specific set of conditions (like what directory
entry names were used, which blocks they've hashed to and how they
ended up being allocated and in what order each block gets removed
from the directory).
> I have been running 2.6.17.x for awhile now (multiple XFS filesystems, all
> on UPS) - no issue?
Could be an issue, could be none. xfs_check it to be sure.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:24 ` Nathan Scott
@ 2006-07-20 22:43 ` Justin Piszcz
2006-07-20 22:52 ` Nathan Scott
0 siblings, 1 reply; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 22:43 UTC (permalink / raw)
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
On Fri, 21 Jul 2006, Nathan Scott wrote:
> On Thu, Jul 20, 2006 at 06:18:14PM -0400, Justin Piszcz wrote:
>> Nathan,
>>
>> Does the bug only occur during a crash?
>
> No, its unrelated to crashing. Only when adding/removing from a
> directory that is in a specific node/btree format (many entries),
> and only under a specific set of conditions (like what directory
> entry names were used, which blocks they've hashed to and how they
> ended up being allocated and in what order each block gets removed
> from the directory).
>
>> I have been running 2.6.17.x for awhile now (multiple XFS filesystems, all
>> on UPS) - no issue?
>
> Could be an issue, could be none. xfs_check it to be sure.
>
> cheers.
>
> --
> Nathan
>
>
p34:~# xfs_check -v /dev/md3
xfs_check: out of memory
p34:~#
D'oh...
1GB RAM, 2GB swap trying to check a 2.6T fs, no dice.
As long as it mounted ok with the patched kernel, should one be ok?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:43 ` Justin Piszcz
@ 2006-07-20 22:52 ` Nathan Scott
2006-07-20 22:55 ` Justin Piszcz
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 22:52 UTC (permalink / raw)
To: Justin Piszcz
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote:
> p34:~# xfs_check -v /dev/md3
> xfs_check: out of memory
> p34:~#
>
> D'oh...
xfs_repair -n is another option; it has a cheaper (memory-wise,
usually) checking algorithm.
> As long as it mounted ok with the patched kernel, should one be ok?
Not necessarily, no - mount will only read the root inode.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:52 ` Nathan Scott
@ 2006-07-20 22:55 ` Justin Piszcz
2006-07-20 22:57 ` Justin Piszcz
2006-07-20 23:00 ` Nathan Scott
0 siblings, 2 replies; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 22:55 UTC (permalink / raw)
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
Nasty!
- agno = 37
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem starting at / ...
free block 16777216 for directory inode 2684356622 bad nused
free block 16777216 for directory inode 2147485710 bad nused
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
p34:~#
I applied the "one line fix" - I should be ok now?
On Fri, 21 Jul 2006, Nathan Scott wrote:
> On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote:
>> p34:~# xfs_check -v /dev/md3
>> xfs_check: out of memory
>> p34:~#
>>
>> D'oh...
>
> xfs_repair -n is another option, it has a cheaper (memory wise,
> usually) checking algorithm.
>
>> As long as it mounted ok with the patched kernel, should one be ok?
>
> Not necessarily, no - mount will only read the root inode.
>
> cheers.
>
> --
> Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:55 ` Justin Piszcz
@ 2006-07-20 22:57 ` Justin Piszcz
2006-07-20 23:00 ` Nathan Scott
1 sibling, 0 replies; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 22:57 UTC (permalink / raw)
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
Erm, xfs_repair -n only prints out what it needs to fix; I read
somewhere that xfs_repair may make things worse?
What is the 'correct' fix?
On Thu, 20 Jul 2006, Justin Piszcz wrote:
> Nasty!
>
> - agno = 37
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem starting at / ...
> free block 16777216 for directory inode 2684356622 bad nused
> free block 16777216 for directory inode 2147485710 bad nused
> - traversal finished ...
> - traversing all unattached subtrees ...
> - traversals finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> p34:~#
>
> I applied the "one line fix" - I should be ok now?
>
>
>
> On Fri, 21 Jul 2006, Nathan Scott wrote:
>
>> On Thu, Jul 20, 2006 at 06:43:34PM -0400, Justin Piszcz wrote:
>>> p34:~# xfs_check -v /dev/md3
>>> xfs_check: out of memory
>>> p34:~#
>>>
>>> D'oh...
>>
>> xfs_repair -n is another option, it has a cheaper (memory wise,
>> usually) checking algorithm.
>>
>>> As long as it mounted ok with the patched kernel, should one be ok?
>>
>> Not necessarily, no - mount will only read the root inode.
>>
>> cheers.
>>
>> --
>> Nathan
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 22:55 ` Justin Piszcz
2006-07-20 22:57 ` Justin Piszcz
@ 2006-07-20 23:00 ` Nathan Scott
2006-07-20 23:10 ` Justin Piszcz
1 sibling, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 23:00 UTC (permalink / raw)
To: Justin Piszcz
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 06:55:51PM -0400, Justin Piszcz wrote:
> Phase 6 - check inode connectivity...
> - traversing filesystem starting at / ...
> free block 16777216 for directory inode 2684356622 bad nused
> free block 16777216 for directory inode 2147485710 bad nused
> - traversal finished ...
> ...
> I applied the "one line fix" - I should be ok now?
You have two corrupt directory inodes (caused by this bug; that
is exactly the signature I'd expect - it was a nused field that
was affected by the dodgy endian change). The two inodes need
to be fixed - consult the FAQ for details.
Once fixed, and with a patched kernel, you're set.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 23:00 ` Nathan Scott
@ 2006-07-20 23:10 ` Justin Piszcz
2006-07-20 23:12 ` Chris Wedgwood
0 siblings, 1 reply; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 23:10 UTC (permalink / raw)
To: Nathan Scott
Cc: Chris Wedgwood, David Greaves, Kasper Sandberg,
Torsten Landschoff, linux-kernel, xfs, ml, radsaq
Nathan,
Running xfs_repair multiple times (after following the FAQ for the
write core.mode 0 fix), I get this:
- agno = 3
- agno = 4
- agno = 5
- agno = 6
entry ".." at block 0 offset 1352 in directory inode 3221227534 references
free inode 2112
clearing inode number in entry at offset 1352...
no .. entry for directory 3221227534
- agno = 7
- agno = 8
- agno = 9
disconnected inode 2684386082, moving to lost+found
disconnected inode 2684386083, moving to lost+found
disconnected inode 2684386084, moving to lost+found
disconnected inode 2684386085, moving to lost+found
disconnected inode 2684386086, moving to lost+found
disconnected inode 2684386087, moving to lost+found
disconnected inode 2684386088, moving to lost+found
disconnected inode 2684386089, moving to lost+found
disconnected inode 2684386090, moving to lost+found
disconnected inode 2684386091, moving to lost+found
disconnected inode 2684386092, moving to lost+found
disconnected inode 2684386093, moving to lost+found
disconnected inode 2684386094, moving to lost+found
disconnected inode 2684386095, moving to lost+found
disconnected inode 2684386096, moving to lost+found
disconnected inode 2684386097, moving to lost+found
disconnected inode 2684386098, moving to lost+found
disconnected inode 2684386099, moving to lost+found
disconnected inode 2684386100, moving to lost+found
disconnected inode 2684386101, moving to lost+found
disconnected inode 2684386102, moving to lost+found
disconnected inode 2684386103, moving to lost+found
disconnected inode 2684386104, moving to lost+found
disconnected inode 2684386105, moving to lost+found
disconnected inode 2684386106, moving to lost+found
disconnected inode 2684386107, moving to lost+found
disconnected inode 2684386108, moving to lost+found
disconnected inode 2684386109, moving to lost+found
disconnected inode 2684386110, moving to lost+found
disconnected inode 2684386111, moving to lost+found
disconnected inode 2684386112, moving to lost+found
disconnected inode 2684386113, moving to lost+found
disconnected inode 2684386114, moving to lost+found
disconnected inode 2684386115, moving to lost+found
disconnected inode 2684386116, moving to lost+found
disconnected inode 2684386117, moving to lost+found
disconnected inode 2684386118, moving to lost+found
disconnected inode 2684386119, moving to lost+found
disconnected inode 2684386120, moving to lost+found
disconnected inode 2684386121, moving to lost+found
disconnected inode 2684386122, moving to lost+found
disconnected inode 2684386123, moving to lost+found
disconnected inode 2684386124, moving to lost+found
disconnected inode 2684386125, moving to lost+found
disconnected inode 2684386126, moving to lost+found
disconnected inode 2684386127, moving to lost+found
disconnected inode 2684386128, moving to lost+found
disconnected inode 2684386129, moving to lost+found
disconnected inode 2684386130, moving to lost+found
disconnected inode 2684386131, moving to lost+found
disconnected inode 2684386132, moving to lost+found
disconnected inode 2684386133, moving to lost+found
disconnected inode 2684386134, moving to lost+found
disconnected inode 2684386135, moving to lost+found
disconnected inode 2684386136, moving to lost+found
disconnected inode 2684386137, moving to lost+found
disconnected inode 2684386138, moving to lost+found
disconnected inode 2684386139, moving to lost+found
disconnected inode 2684386140, moving to lost+found
disconnected inode 2684386141, moving to lost+found
disconnected inode 2684386142, moving to lost+found
disconnected inode 2684386143, moving to lost+found
disconnected inode 2684386144, moving to lost+found
disconnected inode 2684386145, moving to lost+found
disconnected inode 2684386146, moving to lost+found
disconnected inode 2684386147, moving to lost+found
disconnected inode 2684386148, moving to lost+found
disconnected inode 2684386149, moving to lost+found
disconnected inode 2684386150, moving to lost+found
disconnected inode 2684386151, moving to lost+found
disconnected inode 2684386152, moving to lost+found
disconnected inode 2684386153, moving to lost+found
disconnected inode 2684386154, moving to lost+found
disconnected inode 2684386155, moving to lost+found
disconnected inode 2684386156, moving to lost+found
disconnected inode 2684386157, moving to lost+found
disconnected inode 2684386158, moving to lost+found
disconnected inode 2684386159, moving to lost+found
disconnected inode 2684386160, moving to lost+found
disconnected inode 2684386161, moving to lost+found
disconnected inode 2684386162, moving to lost+found
disconnected inode 2684386163, moving to lost+found
disconnected inode 2684386164, moving to lost+found
disconnected inode 2684386165, moving to lost+found
disconnected inode 2684653605, moving to lost+found
disconnected dir inode 3221227534, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 3221227534 nlinks from 3 to 2
done
p34:~#
I can run this over and over, and the result is the same?
On Fri, 21 Jul 2006, Nathan Scott wrote:
> On Thu, Jul 20, 2006 at 06:55:51PM -0400, Justin Piszcz wrote:
>> Phase 6 - check inode connectivity...
>> - traversing filesystem starting at / ...
>> free block 16777216 for directory inode 2684356622 bad nused
>> free block 16777216 for directory inode 2147485710 bad nused
>> - traversal finished ...
>> ...
>> I applied the "one line fix" - I should be ok now?
>
> You have two corrupt directory inodes (caused by this bug, that
> is exactly the signature I'd expect - it was a nused field that
> was affected by the dodgey endian change). The two inodes need
> to be fixed - consult the FAQ for details.
>
> Once fixed, and with a patched kernel, you're set.
>
> cheers.
>
> --
> Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 23:10 ` Justin Piszcz
@ 2006-07-20 23:12 ` Chris Wedgwood
2006-07-20 23:15 ` Justin Piszcz
2006-07-20 23:19 ` Nathan Scott
0 siblings, 2 replies; 45+ messages in thread
From: Chris Wedgwood @ 2006-07-20 23:12 UTC (permalink / raw)
To: Justin Piszcz
Cc: Nathan Scott, David Greaves, Kasper Sandberg, Torsten Landschoff,
linux-kernel, xfs, ml, radsaq
On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote:
> I can run this over and over, and the result is the same?
lost+found is recreated every time; rename it and you'll get less
output
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 23:12 ` Chris Wedgwood
@ 2006-07-20 23:15 ` Justin Piszcz
2006-07-20 23:19 ` Nathan Scott
1 sibling, 0 replies; 45+ messages in thread
From: Justin Piszcz @ 2006-07-20 23:15 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Nathan Scott, David Greaves, Kasper Sandberg, Torsten Landschoff,
linux-kernel, xfs, ml, radsaq
Thanks, that was it. After removing the lost+found directory & re-running
xfs_repair, I no longer have any errors, on that device anyway.
On Thu, 20 Jul 2006, Chris Wedgwood wrote:
> On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote:
>
>> I can run this over and over, and the result is the same?
>
> lost+found is recreated every time, rename it and you'll get less
> output
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 23:12 ` Chris Wedgwood
2006-07-20 23:15 ` Justin Piszcz
@ 2006-07-20 23:19 ` Nathan Scott
1 sibling, 0 replies; 45+ messages in thread
From: Nathan Scott @ 2006-07-20 23:19 UTC (permalink / raw)
To: Justin Piszcz, Chris Wedgwood
Cc: David Greaves, Kasper Sandberg, Torsten Landschoff, linux-kernel,
xfs, ml, radsaq
On Thu, Jul 20, 2006 at 04:12:46PM -0700, Chris Wedgwood wrote:
> On Thu, Jul 20, 2006 at 07:10:46PM -0400, Justin Piszcz wrote:
>
> > I can run this over and over, and the result is the same?
>
> lost+found is recreated every time, rename it and you'll get less
> output
Yes, this is the current xfs_repair behaviour (any previously
unlinked inodes will be found as unlinked on each successive
run, due to lost+found being recreated). This will likely
be rethought soon (not far off), since it confuses everyone.
So, it's all good - xfs_repair has fixed things and you're all
set now.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
` (2 preceding siblings ...)
2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff
@ 2006-07-22 16:27 ` Christian Kujau
2006-07-23 23:01 ` Nathan Scott
3 siblings, 1 reply; 45+ messages in thread
From: Christian Kujau @ 2006-07-22 16:27 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
Hi folks,
On Wed, 19 Jul 2006, Nathan Scott wrote:
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
well, at least for me, corruption/errors *started* with 2.6.18-rc1:
http://oss.sgi.com/archives/xfs/2006-07/msg00151.html
I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to
2.6.18-rc2 and see the same errors:
xfs_da_do_buf: bno 16777216
dir: inode 24472381
Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xc0219230
Filesystem "md0": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xc024d717
Please see the whole error/.config/logs here:
http://nerdbynature.de/bits/2.6.18-rc2/
Thanks,
Christian.
--
BOFH excuse #38:
secretary plugged hairdryer into UPS
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-22 16:27 ` Christian Kujau
@ 2006-07-23 23:01 ` Nathan Scott
2006-07-28 17:01 ` Christian Kujau
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-23 23:01 UTC (permalink / raw)
To: Christian Kujau; +Cc: linux-kernel, xfs
On Sat, Jul 22, 2006 at 05:27:24PM +0100, Christian Kujau wrote:
> On Wed, 19 Jul 2006, Nathan Scott wrote:
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
>
> well, at least for me, corruption/errors *started* with 2.6.18-rc1:
> ...
> I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to
> 2.6.18-rc2 and see the same errors:
>
> xfs_da_do_buf: bno 16777216
> dir: inode 24472381
This is on-disk corruption - downgrading the kernel will not
resolve it. The problem must be triggered by a combination of
operations on a directory; I'm certain that if you access inode
24472381 on your filesystem on 2.6.17, it'll shut down your
filesystem too. See the FAQ entry for a description of how to
translate inums to paths, and also the repair -n step to detect
any on-disk corruption.
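The FAQ steps Nathan refers to boil down to something like the following (a sketch; the exact tooling is described in the dir2 FAQ entry and may differ by xfsprogs version - the device and the inode number from the log above are examples):

```shell
# Dry-run check: report any on-disk corruption without modifying anything.
umount /dev/md0
xfs_repair -n /dev/md0

# Translate an inode number from the kernel error message to a path.
# xfs_ncheck lists inode-number/pathname pairs for the filesystem;
# grep for the inum reported in the log (24472381 in this thread).
xfs_ncheck /dev/md0 | grep 24472381
```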
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-23 23:01 ` Nathan Scott
@ 2006-07-28 17:01 ` Christian Kujau
2006-07-28 21:48 ` Nathan Scott
0 siblings, 1 reply; 45+ messages in thread
From: Christian Kujau @ 2006-07-28 17:01 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Hello again,
On Mon, 24 Jul 2006, Nathan Scott wrote:
> filesystem too. See the FAQ entry for a description on how to
> translate inums to paths, and also the repair -n step to detect
> any corruption ondisk.
I had two XFS filesystems, and I first noticed that /data/Scratch
was affected by this bug. I did not care much about this (hence the
name :)) and wanted to postpone the xfs_db surgery.
Unfortunately I forgot that "/" was also XFS, and it crashed
yesterday. Remounting it read-only helped a bit (so no process
attempted to write to it; however, cp'ing from the ro-mounted XFS
sometimes hung, unkillably). I set up a mini-root somewhere else and
followed the instructions in the FAQ. It did not go too well: lots of
stuff was moved to lost+found, and every subsequent xfs_repair run
found more and more errors. I decided to mkfs the partition and make use
of my backups. My other "scratch" partition is still XFS but mounted ro,
and I'll try the xfsprogs fixes Nathan published on this one.
Oh, and I dd'ed the corrupt XFS filesystem to a file, so I can play
around with this one as well.
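Working from a dd image like that is a safe way to experiment, since the repair tools operate on the copy rather than the original device. A sketch (device and file paths are examples):

```shell
# Copy the damaged filesystem into an image file for offline analysis.
dd if=/dev/md0 of=/var/tmp/broken-xfs.img bs=1M

# xfs_repair (and xfs_db) accept a regular file as the target,
# so the original device stays untouched.
xfs_repair -n /var/tmp/broken-xfs.img   # dry run first
xfs_repair /var/tmp/broken-xfs.img      # then actually repair the copy

# Inspect the repaired image via a read-only loopback mount.
mount -o loop,ro /var/tmp/broken-xfs.img /mnt
```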
If anyone is interested, here are the typescripts from the horrible
xfs_repair runs: http://nerdbynature.de/bits/2.6.18-rc2/log/
cheers,
Christian.
--
BOFH excuse #21:
POSIX compliance problem
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-28 17:01 ` Christian Kujau
@ 2006-07-28 21:48 ` Nathan Scott
2006-07-29 20:22 ` Ralf Hildebrandt
0 siblings, 1 reply; 45+ messages in thread
From: Nathan Scott @ 2006-07-28 21:48 UTC (permalink / raw)
To: Christian Kujau; +Cc: linux-kernel, xfs
On Fri, Jul 28, 2006 at 05:01:24PM +0000, Christian Kujau wrote:
> I had two xfs filesystems and I first noticed that /data/Scratch was
> befallen from this bug. I did not care much about this (hence the
> name :)) and I wanted to postpone the xfs_db surgery.
> ...
> found more and more errors. I decided to mkfs the partition and make use
> of my backups. my other "scratch" partition is still XFS but mounted ro
> and I'll try the xfsprogs fixes Nathan published on this one.
Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
list yesterday; please give that a go and let us know how it fares.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-28 21:48 ` Nathan Scott
@ 2006-07-29 20:22 ` Ralf Hildebrandt
2006-07-29 22:28 ` David Chatterton
0 siblings, 1 reply; 45+ messages in thread
From: Ralf Hildebrandt @ 2006-07-29 20:22 UTC (permalink / raw)
To: Nathan Scott; +Cc: Christian Kujau, linux-kernel, xfs
* Nathan Scott <nathans@sgi.com>:
> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
> list yesterday; please give that a go and let us know how it fares.
Just to let you know, I did a CVS checkout of xfs-cmds
as described on http://oss.sgi.com/projects/xfs/source.html
Then I saved the patch from
http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
"Original" link on that page.
I built an xfs_repair binary using that, transferred it onto an old
KLAX boot CD I had, and repaired the XFS root on my laptop.
I got 5000 files in lost+found, mostly the whole set of manpages from
my system. I had to reinstall a few packages to restore lost binaries,
but that's all.
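For anyone following along, the checkout-patch-build sequence described above looks roughly like this (a sketch; the CVS server details are those documented on the source page linked above, and the patch filename is just a placeholder for whatever you saved from the archive):

```shell
# Anonymous CVS checkout of the userspace tools, per
# http://oss.sgi.com/projects/xfs/source.html
cvs -d :pserver:cvs@oss.sgi.com:/cvs checkout xfs-cmds

# Apply Barry's xfs_repair patch, saved via the "Original" link at
# http://oss.sgi.com/archives/xfs/2006-07/msg00374.html
# (the local filename here is an example).
cd xfs-cmds/xfsprogs
patch -p1 < ../../xfs_repair-dir2.patch

# Build; the patched xfs_repair binary lands under repair/.
make
```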
When will that horrible bug be fixed in 2.6.x?
--
Ralf Hildebrandt (i.A. des IT-Zentrums) Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin Fax. +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to spamtrap@charite.de
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-29 20:22 ` Ralf Hildebrandt
@ 2006-07-29 22:28 ` David Chatterton
0 siblings, 0 replies; 45+ messages in thread
From: David Chatterton @ 2006-07-29 22:28 UTC (permalink / raw)
To: Nathan Scott, Christian Kujau, linux-kernel, xfs
Ralf Hildebrandt wrote:
> * Nathan Scott <nathans@sgi.com>:
>
>> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
>> list yesterday; please give that a go and let us know how it fares.
>
> Just to let you know, I did a cvs checkout of xfs-cmds
> as described on http://oss.sgi.com/projects/xfs/source.html
>
> Then I saved the patch from
> http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
> "Original" link on hat page.
>
> I build a xfs_Repair binary using that, transferred it onto an old
> KLAX boot cd I had and repaired the XFS root on my laptop.
>
> I got 5000 files in lost and found, mostly the whole manpages from my
> system. Had to reinstall a few packages to restore lost binaries, but
> that's all.
>
> When will that horrible bug be fixed in 2.6.x?
>
The bug is fixed in 2.6.17.7.
David
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
` (2 preceding siblings ...)
2006-07-20 15:13 ` Kevin Radloff
@ 2006-07-31 16:25 ` Jan Kasprzak
2006-07-31 16:38 ` Justin Piszcz
2006-08-02 4:32 ` Nathan Scott
3 siblings, 2 replies; 45+ messages in thread
From: Jan Kasprzak @ 2006-07-31 16:25 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Nathan Scott wrote:
: I've captured the state of this issue here, with options and ways
: to correct the problem:
: http://oss.sgi.com/projects/xfs/faq.html#dir2
:
: Hope this helps.
I have been hit by this bug as well - I tried to clear the
two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran
xfs_repair (lots of files ended up in lost+found), but apparently
the volume is still not OK - when I tried to use it (this volume
is a public FTP archive), I got the following traces:
Jul 30 16:04:49 odysseus kernel: Filesystem "md5": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff80324221
Jul 30 16:04:49 odysseus kernel:
Jul 30 16:04:49 odysseus kernel: Call Trace: <ffffffff803331ac>{xfs_corruption_error+228}
Jul 30 16:04:49 odysseus kernel: <ffffffff8035630e>{kmem_zone_alloc+86} <ffffffff803240f0>{xfs_da_do_buf+1359}
Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff80323aba>{xfs_da_buf_make+31}
Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff803263e7>{xfs_da_node_lookup_int+112}
Jul 30 16:04:49 odysseus kernel: <ffffffff803263e7>{xfs_da_node_lookup_int+112} <ffffffff8032c7b8>{xfs_dir2_node_lookup+70}
Jul 30 16:04:50 odysseus kernel: <ffffffff80327b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256}
Jul 30 16:04:51 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79}
Jul 30 16:04:51 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lo7b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256}
Jul 30 16:04:52 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79}
Jul 30 16:04:53 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lookup+48} <ffffffff80270b45>{do_lookup+196}
Jul 30 16:04:53 odysseus rpc.statd[3145]: Caught signal 15, un-registering and exiting.
Jul 30 16:04:53 odysseus kernel: <ffffffff802729c6>{__link_path_walk+2435} <ffffffff80272f40>{link_path_walk+89}
Jul 30 16:04:53 odysseus kernel: <ffffffff8049c0d2>{__sched_text_start+290} <ffffffff80273396>{do_path_lookup+614}
Jul 30 16:04:53 odysseus kernel: <ffffffff80271e47>{getname+347} <ffffffff80273bd6>{__user_walk_fd+55}
Jul 30 16:04:53 odysseus kernel: <ffffffff8026cba7>{vfs_lstat_fd+21} <ffffffff8049c0d2>{__sched_text_start+290}
Jul 30 16:04:53 odysseus kernel: <ffffffff8026cd92>{sys_newlstat+25} <ffffffff80265023>{vfs_write+283}
Jul 30 16:04:53 odysseus kernel: <ffffffff8026554c>{sys_write+69} <ffffffff80209826>{system_call+126}
Jul 30 16:04:53 odysseus kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
This is 2.6.17.7, dual x86_64 (Fedora Core 5). Unfortunately it had
been running 2.6.17.1 for some time.
I will probably have to recreate the volume and restore its
contents from backups. Or is there any better solution?
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
> I will never go to meetings again because I think face to face meetings <
> are the biggest waste of time you can ever have. --Linus Torvalds <
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-31 16:25 ` Jan Kasprzak
@ 2006-07-31 16:38 ` Justin Piszcz
2006-08-02 4:32 ` Nathan Scott
1 sibling, 0 replies; 45+ messages in thread
From: Justin Piszcz @ 2006-07-31 16:38 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: Nathan Scott, linux-kernel, xfs
On Mon, 31 Jul 2006, Jan Kasprzak wrote:
> Nathan Scott wrote:
> : I've captured the state of this issue here, with options and ways
> : to correct the problem:
> : http://oss.sgi.com/projects/xfs/faq.html#dir2
> :
> : Hope this helps.
>
> I have been hit with this bug as well - I tried to clear the
> two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran
> xfs_repair (lots of files ended up in lost+found), but apparently
> the volume is still not OK - when I tried to use it (this volume
> is a public FTP archive), I got the following traces:
>
> Jul 30 16:04:49 odysseus kernel: Filesystem "md5": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff80324221
> Jul 30 16:04:49 odysseus kernel:
> Jul 30 16:04:49 odysseus kernel: Call Trace: <ffffffff803331ac>{xfs_corruption_error+228}
> Jul 30 16:04:49 odysseus kernel: <ffffffff8035630e>{kmem_zone_alloc+86} <ffffffff803240f0>{xfs_da_do_buf+1359}
> Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff80323aba>{xfs_da_buf_make+31}
> Jul 30 16:04:49 odysseus kernel: <ffffffff80324221>{xfs_da_read_buf+22} <ffffffff803263e7>{xfs_da_node_lookup_int+112}
> Jul 30 16:04:49 odysseus kernel: <ffffffff803263e7>{xfs_da_node_lookup_int+112} <ffffffff8032c7b8>{xfs_dir2_node_lookup+70}
> Jul 30 16:04:50 odysseus kernel: <ffffffff80327b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256}
> Jul 30 16:04:51 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79}
> Jul 30 16:04:51 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lo7b35>{xfs_dir2_isleaf+25} <ffffffff803280d6>{xfs_dir2_lookup+256}
> Jul 30 16:04:52 odysseus kernel: <ffffffff8034dd10>{xfs_dir_lookup_int+55} <ffffffff803511af>{xfs_lookup+79}
> Jul 30 16:04:53 odysseus kernel: <ffffffff8035c95a>{xfs_vn_lookup+48} <ffffffff80270b45>{do_lookup+196}
> Jul 30 16:04:53 odysseus rpc.statd[3145]: Caught signal 15, un-registering and exiting.
> Jul 30 16:04:53 odysseus kernel: <ffffffff802729c6>{__link_path_walk+2435} <ffffffff80272f40>{link_path_walk+89}
> Jul 30 16:04:53 odysseus kernel: <ffffffff8049c0d2>{__sched_text_start+290} <ffffffff80273396>{do_path_lookup+614}
> Jul 30 16:04:53 odysseus kernel: <ffffffff80271e47>{getname+347} <ffffffff80273bd6>{__user_walk_fd+55}
> Jul 30 16:04:53 odysseus kernel: <ffffffff8026cba7>{vfs_lstat_fd+21} <ffffffff8049c0d2>{__sched_text_start+290}
> Jul 30 16:04:53 odysseus kernel: <ffffffff8026cd92>{sys_newlstat+25} <ffffffff80265023>{vfs_write+283}
> Jul 30 16:04:53 odysseus kernel: <ffffffff8026554c>{sys_write+69} <ffffffff80209826>{system_call+126}
> Jul 30 16:04:53 odysseus kernel: 0x0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00
>
> This is 2.6.17.7 dual x86_64 (Fedora Core 5). It has been unfortunately
> running 2.6.17.1 for some time.
>
> I will probably have to recreate the volume and restore its
> contents from backups. Or is there any better solution?
>
> Thanks,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
> | http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
>> I will never go to meetings again because I think face to face meetings <
>> are the biggest waste of time you can ever have. --Linus Torvalds <
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
If you unmount and run xfs_repair -n /dev/md5, what does it show currently?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: FAQ updated (was Re: XFS breakage...)
2006-07-31 16:25 ` Jan Kasprzak
2006-07-31 16:38 ` Justin Piszcz
@ 2006-08-02 4:32 ` Nathan Scott
1 sibling, 0 replies; 45+ messages in thread
From: Nathan Scott @ 2006-08-02 4:32 UTC (permalink / raw)
To: Jan Kasprzak; +Cc: linux-kernel, xfs
On Mon, Jul 31, 2006 at 06:25:35PM +0200, Jan Kasprzak wrote:
> Nathan Scott wrote:
> : I've captured the state of this issue here, with options and ways
> : to correct the problem:
> : http://oss.sgi.com/projects/xfs/faq.html#dir2
> :
> : Hope this helps.
>
> I have been hit with this bug as well - I tried to clear the
> two corrupted directory inodes with xfs_db (as the FAQ entry says), then ran
> xfs_repair (lots of files ended up in lost+found), but apparently
> the volume is still not OK - when I tried to use it (this volume
> is a public FTP archive), I got the following traces:
There is now a fixed version of xfs_repair available - it's in
xfsprogs-2.8.10; source is on oss.sgi.com in the XFS ftp area.
A number of people have reported success with Barry's earlier
patch, and no one has reported anything bad, so 2.8.10 is out now
with the fix merged.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2006-08-02 4:32 UTC | newest]
Thread overview: 45+ messages
2006-07-18 22:29 XFS breakage in 2.6.18-rc1 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 22:56 ` Nathan Scott
2006-07-20 10:29 ` Kasper Sandberg
2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 15:25 ` Kasper Sandberg
2006-07-19 22:59 ` Nathan Scott
2006-07-20 7:13 ` FAQ updated (was Re: XFS breakage...) Nathan Scott
2006-07-20 12:42 ` Hans-Peter Jansen
2006-07-20 13:28 ` David Greaves
2006-07-20 16:11 ` Chris Wedgwood
2006-07-20 22:14 ` Nathan Scott
2006-07-20 22:18 ` Justin Piszcz
2006-07-20 22:24 ` Nathan Scott
2006-07-20 22:43 ` Justin Piszcz
2006-07-20 22:52 ` Nathan Scott
2006-07-20 22:55 ` Justin Piszcz
2006-07-20 22:57 ` Justin Piszcz
2006-07-20 23:00 ` Nathan Scott
2006-07-20 23:10 ` Justin Piszcz
2006-07-20 23:12 ` Chris Wedgwood
2006-07-20 23:15 ` Justin Piszcz
2006-07-20 23:19 ` Nathan Scott
2006-07-20 15:13 ` Kevin Radloff
2006-07-20 16:51 ` Alistair John Strachan
2006-07-31 16:25 ` Jan Kasprzak
2006-07-31 16:38 ` Justin Piszcz
2006-08-02 4:32 ` Nathan Scott
2006-07-19 21:14 ` XFS breakage in 2.6.18-rc1 Torsten Landschoff
2006-07-19 23:09 ` Nathan Scott
2006-07-22 16:27 ` Christian Kujau
2006-07-23 23:01 ` Nathan Scott
2006-07-28 17:01 ` Christian Kujau
2006-07-28 21:48 ` Nathan Scott
2006-07-29 20:22 ` Ralf Hildebrandt
2006-07-29 22:28 ` David Chatterton
2006-07-18 23:06 ` Kevin Radloff
-- strict thread matches above, loose matches on Subject: below --
2006-07-19 14:17 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 23:01 ` Nathan Scott
2006-07-20 5:51 ` Jeffrey Hundstad
2006-07-19 21:09 ` Torsten Landschoff
2006-07-20 10:46 ` Jan Engelhardt