public inbox for linux-kernel@vger.kernel.org
* XFS breakage in 2.6.18-rc1
@ 2006-07-18 22:29 Torsten Landschoff
  2006-07-18 22:57 ` Nathan Scott
  2006-07-18 23:06 ` Kevin Radloff
  0 siblings, 2 replies; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-18 22:29 UTC (permalink / raw)
  To: linux-kernel

Hi friends, 

I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
from my /var/log/kern.log), which ultimately led me to reinstall my 
system:

Jul 17 07:10:12 pulsar kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jul 17 07:10:12 pulsar kernel: Linux version 2.6.18-rc1 (torsten@pulsar) (gcc version 4.1.2 20060630 (prerelease) (Debian 4.1.1-6)) #18 SMP PREEMPT Fri Jul 14 07:58:49 CEST 2006
...
Jul 17 07:10:32 pulsar kernel: agpgart: Putting AGP V3 device at 0000:03:00.0 into 4x mode
Jul 17 07:10:32 pulsar kernel: [drm] Setting GART location based on new memory map
Jul 17 07:10:32 pulsar kernel: [drm] Loading R200 Microcode
Jul 17 07:10:32 pulsar kernel: [drm] writeback test succeeded in 1 usecs
Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c.  Caller 0xf8a837d0
Jul 17 07:33:53 pulsar kernel:  [<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a89215>] xfs_dir2_data_log_unused+0x55/0x70 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a87337>] xfs_dir_removename+0xf7/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8a9720d>] xfs_ilock_nowait+0xcd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab9783>] xfs_remove+0x393/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel:  [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel:  [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel:  [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel:  [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel:  [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel:  [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xf8ab97d7
Jul 17 07:33:53 pulsar kernel:  [<f8aaf91d>] xfs_trans_cancel+0xdd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel:  [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel:  [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel:  [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel:  [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel:  [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel:  [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel:  [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: xfs_force_shutdown(dm-6,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address = 0xf8ac77bc
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": Corruption of in-memory data detected.  Shutting down filesystem: dm-6
Jul 17 07:33:53 pulsar kernel: Please umount the filesystem, and rectify the problem(s)
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 32K
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 8K

That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily
enough, this happened a few months after I finally replaced my ancient disk
with a RAID1 array to make sure I do not lose data ;)


In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken.
After booting into 2.6.17 again, I could use /usr again but random files
contain null bytes, firefox segfaults instead of starting up and a number 
of programs fail in mysterious ways. I tried to recover using xfs_repair
but I feel that my partition is thoroughly borked. Of course no data was
lost due to backups but still I'd like this bug to be fixed ;-)

If more information from my logs is required, I can make it available (and any
part of the partition if required).

Greetings

	Torsten

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:29 Torsten Landschoff
@ 2006-07-18 22:57 ` Nathan Scott
  2006-07-19  8:08   ` Alistair John Strachan
                     ` (3 more replies)
  2006-07-18 23:06 ` Kevin Radloff
  1 sibling, 4 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-18 22:57 UTC (permalink / raw)
  To: Torsten Landschoff; +Cc: linux-kernel, xfs

On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> Hi friends, 

Hi Torsten,

> I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my 
> system:
> 
> Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> Jul 17 07:33:53 pulsar kernel: dir: inode 54526538

I suspect you had some residual directory corruption from using the
2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
fixed in the latest -stable point release).
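
A side note on the block number in the log above: 16777216 is exactly 2^24. Under two assumptions that this thread does not state (4 KiB directory blocks, and a dir2 layout whose data, leaf, and free-index sections start at byte offsets 0, 32 GiB, and 64 GiB within the directory), that would be the first block of the free-index section. The Python sketch below only checks that arithmetic; the constants and the helper name section_of are illustrative and are not taken from the kernel source.

```python
# Assumptions (NOT stated in the thread): 4 KiB directory blocks, and a dir2
# layout where the data, leaf, and free-index sections each span 32 GiB of
# directory address space, in that order.
BLOCK_SIZE = 4096
SPACE_SIZE = 1 << 35  # 32 GiB per section (assumed layout constant)


def section_of(bno, block_size=BLOCK_SIZE):
    """Name the assumed dir2 section that contains directory block bno."""
    index = (bno * block_size) // SPACE_SIZE
    return ["data", "leaf", "free"][index] if index < 3 else "out of range"


bno = 16777216  # from the "xfs_da_do_buf: bno 16777216" log line
print(bno == 1 << 24, section_of(bno))
```

If the assumptions hold, this would be consistent with the failing read coming from a directory removal path, but the thread itself does not confirm that reading.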

> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thoroughly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)

2.6.18-rc1 should be fine (contains the corruption fix).  Did you
mkfs and restore?  Or at least get a full repair run?  If you did,
and you still see issues in .18-rc1, please let me know asap.

thanks.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:29 Torsten Landschoff
  2006-07-18 22:57 ` Nathan Scott
@ 2006-07-18 23:06 ` Kevin Radloff
  1 sibling, 0 replies; 24+ messages in thread
From: Kevin Radloff @ 2006-07-18 23:06 UTC (permalink / raw)
  To: Torsten Landschoff; +Cc: linux-kernel

On 7/18/06, Torsten Landschoff <torsten@debian.org> wrote:
> Hi friends,
>
> I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
[snip]
> That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily
> enough, this happened a few months after I finally replaced my ancient disk
> with a RAID1 array to make sure I do not lose data ;)
>
>
> In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken.
> After booting into 2.6.17 again, I could use /usr again but random files
> contain null bytes, firefox segfaults instead of starting up and a number
> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thoroughly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)
>
> If more information from my logs is required, I can make it available (and any
> part of the partition if required).

That looks like the death knell of my /, which succumbed on Friday as
a result (I believe) of the corruption bug that was in 2.6.16/17.
Ironically enough, I also saw the problem during an aptitude upgrade.

Also see this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2

-- 
Kevin 'radsaq' Radloff
radsaq@gmail.com
http://thesaq.com/

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:57 ` Nathan Scott
@ 2006-07-19  8:08   ` Alistair John Strachan
  2006-07-19 22:56     ` Nathan Scott
  2006-07-19 10:21   ` Kasper Sandberg
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Alistair John Strachan @ 2006-07-19  8:08 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs

On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
[snip]
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thoroughly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> mkfs and restore?  Or at least get a full repair run?  If you did,
> and you still see issues in .18-rc1, please let me know asap.

Just out of interest, I've got a few XFS volumes that were created 24 months 
ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen 
any crashes so far.

Assuming I get the newest XFS repair tools on there, what's the disadvantage 
of repairing versus creating a new filesystem? What special circumstances are 
required to cause a crash?

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:57 ` Nathan Scott
  2006-07-19  8:08   ` Alistair John Strachan
@ 2006-07-19 10:21   ` Kasper Sandberg
  2006-07-19 12:43     ` Alistair John Strachan
  2006-07-19 22:59     ` Nathan Scott
  2006-07-19 21:14   ` Torsten Landschoff
  2006-07-22 16:27   ` Christian Kujau
  3 siblings, 2 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-19 10:21 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > Hi friends, 
> 
> Hi Torsten,
> 
> > I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> > from my /var/log/kern.log), which ultimately led me to reinstall my 
> > system:
> > 
> > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> 
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).
This has me very worried.

i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
does this mean my .17-rc3 may have corrupted my filesystem?

what action do you suggest i do now?

> 
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thoroughly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
> 
> 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> mkfs and restore?  Or at least get a full repair run?  If you did,
> and you still see issues in .18-rc1, please let me know asap.
> 
> thanks.
> 


* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 10:21   ` Kasper Sandberg
@ 2006-07-19 12:43     ` Alistair John Strachan
  2006-07-19 15:25       ` Kasper Sandberg
  2006-07-19 22:59     ` Nathan Scott
  1 sibling, 1 reply; 24+ messages in thread
From: Alistair John Strachan @ 2006-07-19 12:43 UTC (permalink / raw)
  To: Kasper Sandberg; +Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs

On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
>
> > > of programs fail in mysterious ways. I tried to recover using
> > > xfs_repair but I feel that my partition is thoroughly borked. Of course
> > > no data was lost due to backups but still I'd like this bug to be fixed
> > > ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> > mkfs and restore?  Or at least get a full repair run?  If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> >
> > thanks.

According to another thread Nathan just responded to, it sounds like we need 
to wait for a new version of the xfsprogs package, and then run xfs_repair on 
the affected filesystems. I wouldn't worry about it too much if you've not 
had any crashes. The damage can be repaired, just not right now.

I'm still waiting for a crash on a machine that has been under heavy load for 
28 days, so it's obviously not _that_ easy to trigger.

-- 
Cheers,
Alistair.

Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: XFS breakage in 2.6.18-rc1
@ 2006-07-19 14:17 Mattias Hedenskog
  2006-07-19 14:59 ` Jeffrey E. Hundstad
  2006-07-19 21:09 ` Torsten Landschoff
  0 siblings, 2 replies; 24+ messages in thread
From: Mattias Hedenskog @ 2006-07-19 14:17 UTC (permalink / raw)
  To: linux-kernel

> That looks like the death knell of my /, which succumbed on Friday as
> a result (I believe) of the corruption bug that was in 2.6.16/17.
> Ironically enough, I also saw the problem during an aptitude upgrade.

Hi all,

I just want to confirm this bug as well, and unfortunately it was my
system disk, too, that had to take the hit. I'm running 2.6.16 and it's
reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
the fs I got the same error as in the previous post, running xfsprogs
2.8.4. I haven't had the time to debug this issue further because the
box is quite critical but I'll keep an eye on the other disks on the
system still running xfs.

Regards,
Mattias Hedenskog

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
@ 2006-07-19 14:59 ` Jeffrey E. Hundstad
  2006-07-19 23:01   ` Nathan Scott
  2006-07-19 21:09 ` Torsten Landschoff
  1 sibling, 1 reply; 24+ messages in thread
From: Jeffrey E. Hundstad @ 2006-07-19 14:59 UTC (permalink / raw)
  To: Mattias Hedenskog; +Cc: linux-kernel

I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it 
annihilated the volume.  This volume was not showing signs of crashing.  
So... I guess I would certainly not run xfs_repair unless there is good 
reason.

-- 
Jeffrey Hundstad
PS. ...yes, I had a recent backup ;-)

Mattias Hedenskog wrote:
>> That looks like the death knell of my /, which succumbed on Friday as
>> a result (I believe) of the corruption bug that was in 2.6.16/17.
>> Ironically enough, I also saw the problem during an aptitude upgrade.
>
> Hi all,
>
> I just want to confirm this bug as well, and unfortunately it was my
> system disk, too, that had to take the hit. I'm running 2.6.16 and it's
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
>
> Regards,
> Mattias Hedenskog

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 12:43     ` Alistair John Strachan
@ 2006-07-19 15:25       ` Kasper Sandberg
  0 siblings, 0 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-19 15:25 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs

On Wed, 2006-07-19 at 13:43 +0100, Alistair John Strachan wrote:
> On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > > Hi friends,
> > >
> > > Hi Torsten,
> > >
> > > > I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> > > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > > system:
> > > >
> > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > >
> > > I suspect you had some residual directory corruption from using the
> > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > fixed in the latest -stable point release).
> >
> > This has me very worried.
> >
> > i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> > does this mean my .17-rc3 may have corrupted my filesystem?
> >
> > what action do you suggest i do now?
> >
> > > > of programs fail in mysterious ways. I tried to recover using
> > > > xfs_repair but I feel that my partition is thoroughly borked. Of course
> > > > no data was lost due to backups but still I'd like this bug to be fixed
> > > > ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> > > mkfs and restore?  Or at least get a full repair run?  If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> > >
> > > thanks.
> 
> According to another thread Nathan just responded to, it sounds like we need 
> to wait for a new version of the xfsprogs package, and then run xfs_repair on 
> the affected filesystems. I wouldn't worry about it too much if you've not 
> had any crashes. The damage can be repaired, just not right now.
without ANY loss? because even though it would be a bit painful for me to
do, i do have the option of smashing in a new drive, copy everything,
and reinitialize my filesystem.
> 
> I'm still waiting for a crash on a machine that has been under heavy load for 
> 28 days, so it's obviously not _that_ easy to trigger.
so basically if i upgrade to a safe kernel before i do get these errors,
im good?


> 


* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
  2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 21:09 ` Torsten Landschoff
  2006-07-20 10:46   ` Jan Engelhardt
  1 sibling, 1 reply; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:09 UTC (permalink / raw)
  To: Mattias Hedenskog; +Cc: linux-kernel

On Wed, Jul 19, 2006 at 04:17:50PM +0200, Mattias Hedenskog wrote:
 
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
 
I would not run xfs_repair without cause either. My /home did
survive the XFS problems, but I ran xfs_repair "just to be sure". Now that
partition has the same problem and is mostly unreadable. :( So, do not run
xfs_repair without a cause ;-)

For reference, I think it was xfsprogs 2.7.14 that I was using, the 
latest in Debian.

FYI: Nothing important on /home, I think. I cannot be sure, since I
back up only selectively because I do not have proper backup media :(

Greetings

	Torsten

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:57 ` Nathan Scott
  2006-07-19  8:08   ` Alistair John Strachan
  2006-07-19 10:21   ` Kasper Sandberg
@ 2006-07-19 21:14   ` Torsten Landschoff
  2006-07-19 23:09     ` Nathan Scott
  2006-07-22 16:27   ` Christian Kujau
  3 siblings, 1 reply; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:14 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

Hi Nathan, 

On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
 
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).

That is probably the cause of my problem. Thanks for the info!

BTW: I think there was nothing important on the broken filesystems, but
I'd like to keep what's still there anyway, just in case... How would you
suggest I copy that data? I fear that just mounting and using cp
might break and shut down the FS again. Would xfsdump be more
appropriate?

Thanks for XFS, I have been using it for years on production servers!

Greetings

	Torsten

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19  8:08   ` Alistair John Strachan
@ 2006-07-19 22:56     ` Nathan Scott
  2006-07-20 10:29       ` Kasper Sandberg
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 22:56 UTC (permalink / raw)
  To: Alistair John Strachan; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> [snip]
> > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > but I feel that my partition is thoroughly borked. Of course no data was
> > > lost due to backups but still I'd like this bug to be fixed ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> > mkfs and restore?  Or at least get a full repair run?  If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> 
> Just out of interest, I've got a few XFS volumes that were created 24 months 
> ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen 
> any crashes so far.
> 
> Assuming I get the newest XFS repair tools on there, what's the disadvantage 
> of repairing versus creating a new filesystem? What special circumstances are 
> required to cause a crash?

There should be no disadvantage to repairing.  I will update the FAQ
shortly to describe all the details of the problem, recommendations
on how to address it, which kernel version is affected, etc.

cheers.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 10:21   ` Kasper Sandberg
  2006-07-19 12:43     ` Alistair John Strachan
@ 2006-07-19 22:59     ` Nathan Scott
  1 sibling, 0 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 22:59 UTC (permalink / raw)
  To: Kasper Sandberg; +Cc: Torsten Landschoff, linux-kernel, xfs

On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends, 
> > 
> > Hi Torsten,
> > 
> > > I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my 
> > > system:
> > > 
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > 
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
> This has me very worried.
> 
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
> 
> what action do you suggest i do now?

The odds are decent that you're unaffected.  You can check your filesystem
using xfs_check or xfs_repair -n and these will give you a good indication
as to whether further action is required.
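
The read-only check suggested above can be scripted. What follows is a minimal Python sketch, assuming the filesystem is unmounted and that xfs_repair is on the PATH; the device path is a made-up placeholder, and the helper names are illustrative. It relies only on the -n (no-modify) option named in this message. How to interpret the exit status is left to the xfs_repair man page for the installed xfsprogs version.

```python
import subprocess


def no_modify_check_command(device):
    """Build the read-only invocation: -n asks xfs_repair not to modify anything."""
    return ["xfs_repair", "-n", device]


def run_no_modify_check(device):
    """Run the check and return (exit status, combined stdout/stderr text)."""
    result = subprocess.run(
        no_modify_check_command(device),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical device path; substitute the real (unmounted) device.
    print(" ".join(no_modify_check_command("/dev/mapper/example-usr")))
```

Keeping the command builder separate from the runner makes it easy to confirm that -n is present before anything touches the disk, which matters given the reports elsewhere in this thread of a plain xfs_repair run making things worse.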

cheers.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 23:01   ` Nathan Scott
  2006-07-20  5:51     ` Jeffrey Hundstad
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 23:01 UTC (permalink / raw)
  To: Jeffrey E. Hundstad; +Cc: Mattias Hedenskog, linux-kernel, xfs

On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it 
> annihilated the volume.  This volume was not showing signs of crashing.  
> So... I guess I would certainly not run xfs_repair unless there is good 
> reason.

Erm, wha..?  Can you expand on "annihilated" a bit?  (please send
me the full xfs_repair output if you still have it).

thanks.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 21:14   ` Torsten Landschoff
@ 2006-07-19 23:09     ` Nathan Scott
  0 siblings, 0 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 23:09 UTC (permalink / raw)
  To: Torsten Landschoff; +Cc: linux-kernel, xfs

On Wed, Jul 19, 2006 at 11:14:02PM +0200, Torsten Landschoff wrote:
> On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
> 
> That is probably the cause of my problem. Thanks for the info!
> 
> BTW: I think there was nothing important on the broken filesystems, but
> I'd like to keep what's still there anyway, just in case... How would you
> suggest I copy that data? I fear that just mounting and using cp
> might break and shut down the FS again. Would xfsdump be more
> appropriate?

Yeah, xfsdump's not a bad idea, the interfaces it uses may well
be able to avoid the cases that trigger shutdown.  Otherwise it
is a case of identifying the problem directory inode (the inum
is reported in the shutdown trace) and avoiding that path when
cp'ing - you can match inum to path via xfs_ncheck.
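
The inode lookup described above can be sketched in Python. This is only an illustration: it pulls the directory inode numbers out of kernel log text in the format shown earlier in the thread, then builds (without running) an xfs_ncheck command line. The use of one -i option per inode reflects my understanding of xfs_ncheck and should be checked against its man page; the device path and function names are hypothetical.

```python
import re

# Matches the "dir: inode N" line logged alongside the xfs_da_do_buf error.
DIR_INODE_RE = re.compile(r"dir: inode (\d+)")


def find_bad_dir_inodes(log_text):
    """Return the sorted, de-duplicated directory inode numbers in log_text."""
    return sorted({int(m.group(1)) for m in DIR_INODE_RE.finditer(log_text)})


def ncheck_command(device, inodes):
    """Build (not run) an xfs_ncheck command that maps inodes to paths."""
    cmd = ["xfs_ncheck"]
    for ino in inodes:
        cmd += ["-i", str(ino)]  # assumed option: one -i per inode
    cmd.append(device)
    return cmd


sample = """\
Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
"""

inodes = find_bad_dir_inodes(sample)
# Hypothetical device path; substitute the real device.
print(ncheck_command("/dev/mapper/example-usr", inodes))
```

The paths reported for those inodes are then the ones to skip when copying data off the filesystem.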

> Thanks for XFS, I have been using it for years on production servers!

Thanks for the kind words, they're much appreciated at times
like these. :-]

cheers.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 23:01   ` Nathan Scott
@ 2006-07-20  5:51     ` Jeffrey Hundstad
  0 siblings, 0 replies; 24+ messages in thread
From: Jeffrey Hundstad @ 2006-07-20  5:51 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Mattias Hedenskog, linux-kernel, xfs

Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
>   
>> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it 
>> annihilated the volume.  This volume was not showing signs of crashing.  
>> So... I guess I would certainly not run xfs_repair unless there is good 
>> reason.
>>     
>
> Erm, wha..?  Can you expand on "annihilated" a bit?  (please send
> me the full xfs_repair output if you still have it).
>   

Nathan Scott,

I'm very sorry; I don't have the output anymore.  By annihilated I mean 
that there were several directory trees that /didn't work/.  If you 
tried to cd into the directory or take a directory listing... or used a 
file that you knew was in these certain directories then you'd get pages 
of debug messages to the console; and no usable data.  I re-ran 
xfs_repair and retried several times but the condition never seemed to 
improve or get worse for that matter.

I /incorrectly/ figured it was a known issue or I'd have saved the 
output.  Sorry again.

-- 
Jeffrey Hundstad


* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 22:56     ` Nathan Scott
@ 2006-07-20 10:29       ` Kasper Sandberg
  0 siblings, 0 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-20 10:29 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Alistair John Strachan, Torsten Landschoff, linux-kernel, xfs

On Thu, 2006-07-20 at 08:56 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> > On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> > [snip]
> > > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > > but I feel that my partition is thoroughly borked. Of course no data was
> > > > lost due to backups but still I'd like this bug to be fixed ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> > > mkfs and restore?  Or at least get a full repair run?  If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> > 
> > Just out of interest, I've got a few XFS volumes that were created 24 months 
> > ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen 
> > any crashes so far.
> > 
> > Assuming I get the newest XFS repair tools on there, what's the disadvantage 
> > of repairing versus creating a new filesystem? What special circumstances are 
> > required to cause a crash?
> 
> There should be no disadvantage to repairing.  I will update the FAQ
> shortly to describe all the details of the problem, recommendations
> on how to address it, which kernel version is affected, etc.
this FAQ, is it this: http://oss.sgi.com/projects/xfs/faq.html#dir2 ?
(btw, it seems that while only in the TOC once, you have the same about
2.6.17 twice..)..

which version of xfsprogs should i use while doing the xfs_check ?

> 
> cheers.
> 


* Re: XFS breakage in 2.6.18-rc1
  2006-07-19 21:09 ` Torsten Landschoff
@ 2006-07-20 10:46   ` Jan Engelhardt
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Engelhardt @ 2006-07-20 10:46 UTC (permalink / raw)
  To: Torsten Landschoff; +Cc: Mattias Hedenskog, linux-kernel

> 
>> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
>> the fs I got the same error as in the previous post, running xfsprogs
>> 2.8.4. I haven't had the time to debug this issue further because the
>> box is quite critical but I'll keep an eye on the other disks on the
>> system still running xfs.

I think my experience is worth mentioning too: the xfs filesystem on one
of my boxes was created (IIRC) under 2.6.16 and has survived 2.6.17
and 2.6.18-rc1 so far...


Jan Engelhardt
-- 

* Re: XFS breakage in 2.6.18-rc1
  2006-07-18 22:57 ` Nathan Scott
                     ` (2 preceding siblings ...)
  2006-07-19 21:14   ` Torsten Landschoff
@ 2006-07-22 16:27   ` Christian Kujau
  2006-07-23 23:01     ` Nathan Scott
  3 siblings, 1 reply; 24+ messages in thread
From: Christian Kujau @ 2006-07-22 16:27 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs

Hi folks,

On Wed, 19 Jul 2006, Nathan Scott wrote:
> 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> mkfs and restore?  Or at least get a full repair run?  If you did,
> and you still see issues in .18-rc1, please let me know asap.

well, at least for me, corruption/errors *started* with 2.6.18-rc1:

http://oss.sgi.com/archives/xfs/2006-07/msg00151.html

I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to 
2.6.18-rc2 and see the same errors:

xfs_da_do_buf: bno 16777216
dir: inode 24472381
Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c.  Caller 0xc0219230
Filesystem "md0": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xc024d717

Please see the whole error/.config/logs here:

http://nerdbynature.de/bits/2.6.18-rc2/

Thanks,
Christian.
-- 
BOFH excuse #38:

secretary plugged hairdryer into UPS

* Re: XFS breakage in 2.6.18-rc1
  2006-07-22 16:27   ` Christian Kujau
@ 2006-07-23 23:01     ` Nathan Scott
  2006-07-28 17:01       ` Christian Kujau
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-23 23:01 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel, xfs

On Sat, Jul 22, 2006 at 05:27:24PM +0100, Christian Kujau wrote:
> On Wed, 19 Jul 2006, Nathan Scott wrote:
> > 2.6.18-rc1 should be fine (contains the corruption fix).  Did you
> > mkfs and restore?  Or at least get a full repair run?  If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> 
> well, at least for me, corruption/errors *started* with 2.6.18-rc1:
> ...
> I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to 
> 2.6.18-rc2 and see the same errors:
> 
> xfs_da_do_buf: bno 16777216
> dir: inode 24472381

This is an ondisk corruption - downgrading the kernel will not
resolve it.  The problem must be triggered by a combination of
operations on a directory; I'm certain that if you access inode
24472381 on your filesystem on 2.6.17, that it'll shut down your
filesystem too.  See the FAQ entry for a description on how to
translate inums to paths, and also the repair -n step to detect
any corruption ondisk.
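
A minimal sketch of the first step in Python (not part of the original
mail): it pulls the filesystem name and directory inode out of kernel log
lines like the ones quoted above and builds the commands one might then run
by hand. The function names, the mount point and the device path are
hypothetical, and `find -xdev -inum` is just one generic way to map an inum
to a path; the FAQ may describe a different one. Nothing is executed here,
since touching the damaged directory can itself shut the filesystem down.

```python
import re

# "dir: inode N" is printed next to the xfs_da_do_buf error in the log.
INODE_RE = re.compile(r"dir: inode (\d+)")
# Filesystem name as it appears in: Filesystem "md0": XFS internal error ...
FS_RE = re.compile(r'Filesystem "([^"]+)": XFS internal error')


def parse_xfs_errors(log_text):
    """Return (filesystem names, inode numbers) found in log_text, sorted."""
    inodes = sorted({int(n) for n in INODE_RE.findall(log_text)})
    filesystems = sorted(set(FS_RE.findall(log_text)))
    return filesystems, inodes


def suggested_commands(mountpoint, device, inodes):
    """Build, but do not run, the lookup and check commands.

    mountpoint and device are supplied by the caller (hypothetical values);
    the kernel log only names the block device, not where it is mounted.
    """
    # One find per inode: stay on this filesystem and match the inode number.
    cmds = [["find", mountpoint, "-xdev", "-inum", str(i)] for i in inodes]
    # xfs_repair -n only reports problems; it does not modify the filesystem.
    cmds.append(["xfs_repair", "-n", device])
    return cmds
```

The commands come back as argument lists so they can be reviewed first;
xfs_repair is normally run against an unmounted filesystem.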

cheers.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-23 23:01     ` Nathan Scott
@ 2006-07-28 17:01       ` Christian Kujau
  2006-07-28 21:48         ` Nathan Scott
  0 siblings, 1 reply; 24+ messages in thread
From: Christian Kujau @ 2006-07-28 17:01 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, xfs

Hello again,

On Mon, 24 Jul 2006, Nathan Scott wrote:
> filesystem too.  See the FAQ entry for a description on how to
> translate inums to paths, and also the repair -n step to detect
> any corruption ondisk.

I had two xfs filesystems, and I first noticed that /data/Scratch was 
hit by this bug. I did not care much about it (hence the
name :)) and wanted to postpone the xfs_db surgery.

Unfortunately I forgot that "/" was also XFS, and it crashed 
yesterday. Remounting ro helped a bit (no process attempted to write 
to it any more; however, cp'ing from the ro-mounted xfs sometimes hung, 
unkillable). I set up a mini-root somewhere else and followed the
instructions in the FAQ. It did not go too well: lots of 
stuff was moved to lost+found, and every subsequent xfs_repair run 
found more and more errors. I decided to mkfs the partition and make use 
of my backups. My other "scratch" partition is still XFS but mounted ro, 
and I'll try the xfsprogs fixes Nathan published on that one.

Oh, and I dd'ed the corrupt xfs-filesystem to a file, so I can play 
around with this one as well.

If anyone is interested, here are the typescripts from the horrible 
xfs_repair runs: http://nerdbynature.de/bits/2.6.18-rc2/log/

cheers,
Christian.
-- 
BOFH excuse #21:

POSIX compliance problem

* Re: XFS breakage in 2.6.18-rc1
  2006-07-28 17:01       ` Christian Kujau
@ 2006-07-28 21:48         ` Nathan Scott
  2006-07-29 20:22           ` Ralf Hildebrandt
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-28 21:48 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel, xfs

On Fri, Jul 28, 2006 at 05:01:24PM +0000, Christian Kujau wrote:
> I had two xfs filesystems and I first noticed that /data/Scratch was 
> befallen from this bug. I did not care much about this (hence the
> name :)) and I wanted to postpone the xfs_db surgery.
> ...
> found more and more errors. I decided to mkfs the partition and make use 
> of my backups. my other "scratch" partition is still XFS but mounted ro 
> and I'll try the xfsprogs fixes Nathan published on this one.

Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
list yesterday; please give that a go and let us know how it fares.

cheers.

-- 
Nathan

* Re: XFS breakage in 2.6.18-rc1
  2006-07-28 21:48         ` Nathan Scott
@ 2006-07-29 20:22           ` Ralf Hildebrandt
  2006-07-29 22:28             ` David Chatterton
  0 siblings, 1 reply; 24+ messages in thread
From: Ralf Hildebrandt @ 2006-07-29 20:22 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Christian Kujau, linux-kernel, xfs

* Nathan Scott <nathans@sgi.com>:

> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
> list yesterday; please give that a go and let us know how it fares.

Just to let you know, I did a cvs checkout of xfs-cmds
as described on http://oss.sgi.com/projects/xfs/source.html

Then I saved the patch from
http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
"Original" link on that page.

I built an xfs_repair binary using that, transferred it onto an old
KLAX boot cd I had and repaired the XFS root on my laptop.

I got 5000 files in lost and found, mostly the whole manpages from my
system. Had to reinstall a few packages to restore lost binaries, but
that's all.

When will that horrible bug be fixed in 2.6.x? 

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums)         Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

* Re: XFS breakage in 2.6.18-rc1
  2006-07-29 20:22           ` Ralf Hildebrandt
@ 2006-07-29 22:28             ` David Chatterton
  0 siblings, 0 replies; 24+ messages in thread
From: David Chatterton @ 2006-07-29 22:28 UTC (permalink / raw)
  To: Nathan Scott, Christian Kujau, linux-kernel, xfs



Ralf Hildebrandt wrote:
> * Nathan Scott <nathans@sgi.com>:
> 
>> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
>> list yesterday; please give that a go and let us know how it fares.
> 
> Just to let you know, I did a cvs checkout of xfs-cmds
> as described on http://oss.sgi.com/projects/xfs/source.html
> 
> Then I saved the patch from
> http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
> "Original" link on that page.
> 
> I built an xfs_repair binary using that, transferred it onto an old
> KLAX boot cd I had and repaired the XFS root on my laptop.
> 
> I got 5000 files in lost and found, mostly the whole manpages from my
> system. Had to reinstall a few packages to restore lost binaries, but
> that's all.
> 
> When will that horrible bug be fixed in 2.6.x? 
> 

The bug is fixed in 2.6.17.7.
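
A rough sketch of that version check in Python (not part of the original
mail). The upper bound, 2.6.17.7, is the fixed release named above; the
lower bound, 2.6.17, is an assumption based on the FAQ discussion earlier in
this thread, and both function names are hypothetical.

```python
def parse_version(version):
    """Turn '2.6.17.5' into (2, 6, 17, 5); a '-rcN' suffix is dropped."""
    base = version.split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


FIRST_AFFECTED = (2, 6, 17)    # assumption, not stated in this message
FIRST_FIXED = (2, 6, 17, 7)    # "The bug is fixed in 2.6.17.7."


def may_have_written_corruption(version):
    """True if a kernel of this version falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

Because the "-rcN" suffix is dropped, 2.6.17 release candidates are treated
like 2.6.17 itself. The check only covers the kernel: as noted earlier in
the thread, the corruption is on disk, so a filesystem that was written by
an affected kernel still needs a repair run after upgrading.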

David

Thread overview: 24+ messages
2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 23:01   ` Nathan Scott
2006-07-20  5:51     ` Jeffrey Hundstad
2006-07-19 21:09 ` Torsten Landschoff
2006-07-20 10:46   ` Jan Engelhardt
  -- strict thread matches above, loose matches on Subject: below --
2006-07-18 22:29 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
2006-07-19  8:08   ` Alistair John Strachan
2006-07-19 22:56     ` Nathan Scott
2006-07-20 10:29       ` Kasper Sandberg
2006-07-19 10:21   ` Kasper Sandberg
2006-07-19 12:43     ` Alistair John Strachan
2006-07-19 15:25       ` Kasper Sandberg
2006-07-19 22:59     ` Nathan Scott
2006-07-19 21:14   ` Torsten Landschoff
2006-07-19 23:09     ` Nathan Scott
2006-07-22 16:27   ` Christian Kujau
2006-07-23 23:01     ` Nathan Scott
2006-07-28 17:01       ` Christian Kujau
2006-07-28 21:48         ` Nathan Scott
2006-07-29 20:22           ` Ralf Hildebrandt
2006-07-29 22:28             ` David Chatterton
2006-07-18 23:06 ` Kevin Radloff
