* XFS breakage in 2.6.18-rc1
@ 2006-07-18 22:29 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
2006-07-18 23:06 ` Kevin Radloff
0 siblings, 2 replies; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-18 22:29 UTC (permalink / raw)
To: linux-kernel
Hi friends,
I upgraded to 2.6.18-rc1 on Sunday, with the following results (taken
from my /var/log/kern.log), which ultimately led me to reinstall my
system:
Jul 17 07:10:12 pulsar kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jul 17 07:10:12 pulsar kernel: Linux version 2.6.18-rc1 (torsten@pulsar) (gcc version 4.1.2 20060630 (prerelease) (Debian 4.1.1-6)) #18 SMP PREEMPT Fri Jul 14 07:58:49 CEST 2006
...
Jul 17 07:10:32 pulsar kernel: agpgart: Putting AGP V3 device at 0000:03:00.0 into 4x mode
Jul 17 07:10:32 pulsar kernel: [drm] Setting GART location based on new memory map
Jul 17 07:10:32 pulsar kernel: [drm] Loading R200 Microcode
Jul 17 07:10:32 pulsar kernel: [drm] writeback test succeeded in 1 usecs
Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xf8a837d0
Jul 17 07:33:53 pulsar kernel: [<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8e0cf>] xfs_dir2_leafn_lookup_int+0x28f/0x520 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a89215>] xfs_dir2_data_log_unused+0x55/0x70 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a8c782>] xfs_dir2_node_removename+0x312/0x500 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a87337>] xfs_dir_removename+0xf7/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8a9720d>] xfs_ilock_nowait+0xcd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab9783>] xfs_remove+0x393/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel: [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel: [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel: [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel: [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel: [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel: [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xf8ab97d7
Jul 17 07:33:53 pulsar kernel: [<f8aaf91d>] xfs_trans_cancel+0xdd/0x100 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab97d7>] xfs_remove+0x3e7/0x4c0 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac4123>] xfs_vn_unlink+0x23/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c017a223>] mntput_no_expire+0x13/0x70
Jul 17 07:33:53 pulsar kernel: [<c016e0c1>] link_path_walk+0x71/0xf0
Jul 17 07:33:53 pulsar kernel: [<f8ab0638>] xfs_trans_unlocked_item+0x38/0x60 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ab63ff>] xfs_access+0x3f/0x50 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43ef>] xfs_vn_permission+0xf/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<f8ac43e0>] xfs_vn_permission+0x0/0x20 [xfs]
Jul 17 07:33:53 pulsar kernel: [<c016bdca>] permission+0x8a/0xc0
Jul 17 07:33:53 pulsar kernel: [<c016c3e9>] may_delete+0x39/0x120
Jul 17 07:33:53 pulsar kernel: [<c016c957>] vfs_unlink+0x87/0xe0
Jul 17 07:33:53 pulsar kernel: [<c016e96c>] do_unlinkat+0xcc/0x150
Jul 17 07:33:53 pulsar kernel: [<c0102fbf>] syscall_call+0x7/0xb
Jul 17 07:33:53 pulsar kernel: xfs_force_shutdown(dm-6,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xf8ac77bc
Jul 17 07:33:53 pulsar kernel: Filesystem "dm-6": Corruption of in-memory data detected. Shutting down filesystem: dm-6
Jul 17 07:33:53 pulsar kernel: Please umount the filesystem, and rectify the problem(s)
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 32K
Jul 17 07:39:32 pulsar kernel: Reducing readahead size to 8K
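For readers who want to condense a trace like the one above, here is a minimal sketch that pulls the function name and module out of each call-trace frame. It assumes the frame layout shown in this log ("[<address>] function+0xoffset/0xsize [module]"); the names FRAME_RE and parse_frames are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Matches a frame such as "[<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]".
# The trailing "[module]" part is optional (core kernel frames have none).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return a (function, module) tuple for every frame found in log_text.

    module is None for frames that carry no "[module]" suffix.
    """
    return [(m.group("func"), m.group("module"))
            for m in FRAME_RE.finditer(log_text)]


# Three lines copied from the log above, used as sample input.
SAMPLE = (
    "Jul 17 07:33:53 pulsar kernel: [<f8a83313>] xfs_da_do_buf+0x4d3/0x900 [xfs]\n"
    "Jul 17 07:33:53 pulsar kernel: [<f8a837d0>] xfs_da_read_buf+0x30/0x40 [xfs]\n"
    "Jul 17 07:33:53 pulsar kernel: [<c016c957>] vfs_unlink+0x87/0xe0\n"
)

for func, module in parse_frames(SAMPLE):
    print(func, module or "(core kernel)")
```

Adjacent duplicate frames, as seen in this trace, are kept as-is by the sketch; whether to collapse them is left to the reader.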
That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily
enough, this happened a few months after I finally replaced my ancient disk
with a RAID1 array to make sure I do not lose data ;)
In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken.
After booting into 2.6.17 again, I could use /usr again but random files
contain null bytes, firefox segfaults instead of starting up and a number
of programs fail in mysterious ways. I tried to recover using xfs_repair
but I feel that my partition is thoroughly borked. Of course no data was
lost due to backups but still I'd like this bug to be fixed ;-)
If more information from my logs is required, I can make it available (and any
part of the partition if required).
Greetings
Torsten
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:29 Torsten Landschoff
@ 2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
` (3 more replies)
2006-07-18 23:06 ` Kevin Radloff
1 sibling, 4 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-18 22:57 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel, xfs
On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> Hi friends,
Hi Torsten,
> I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
>
> Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
I suspect you had some residual directory corruption from using the
2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
fixed in the latest -stable point release).
> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thoroughly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)
2.6.18-rc1 should be fine (contains the corruption fix). Did you
mkfs and restore? Or at least get a full repair run? If you did,
and you still see issues in .18-rc1, please let me know asap.
thanks.
--
Nathan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:29 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
@ 2006-07-18 23:06 ` Kevin Radloff
1 sibling, 0 replies; 24+ messages in thread
From: Kevin Radloff @ 2006-07-18 23:06 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel
On 7/18/06, Torsten Landschoff <torsten@debian.org> wrote:
> Hi friends,
>
> I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> from my /var/log/kern.log), which ultimately led me to reinstall my
> system:
[snip]
> That problem occurred during a dist-upgrade; dm-6 is my /usr partition. Funnily
> enough, this happened a few months after I finally replaced my ancient disk
> with a RAID1 array to make sure I do not lose data ;)
>
>
> In any case it seems like the XFS driver in 2.6.18-rc1 is decently broken.
> After booting into 2.6.17 again, I could use /usr again but random files
> contain null bytes, firefox segfaults instead of starting up and a number
> of programs fail in mysterious ways. I tried to recover using xfs_repair
> but I feel that my partition is thoroughly borked. Of course no data was
> lost due to backups but still I'd like this bug to be fixed ;-)
>
> If more information from my logs is required, I can make it available (and any
> part of the partition if required).
That looks like the death knell of my /, which succumbed on Friday as
a result (I believe) of the corruption bug that was in 2.6.16/17.
Ironically enough, I also saw the problem during an aptitude upgrade.
Also see this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115070320401919&w=2
--
Kevin 'radsaq' Radloff
radsaq@gmail.com
http://thesaq.com/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
@ 2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 22:56 ` Nathan Scott
2006-07-19 10:21 ` Kasper Sandberg
` (2 subsequent siblings)
3 siblings, 1 reply; 24+ messages in thread
From: Alistair John Strachan @ 2006-07-19 8:08 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
[snip]
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thoroughly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
Just out of interest, I've got a few XFS volumes that were created 24 months
ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
any crashes so far.
Assuming I get the newest XFS repair tools on there, what's the disadvantage
of repairing versus creating a new filesystem? What special circumstances are
required to cause a crash?
--
Cheers,
Alistair.
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
@ 2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 22:59 ` Nathan Scott
2006-07-19 21:14 ` Torsten Landschoff
2006-07-22 16:27 ` Christian Kujau
3 siblings, 2 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-19 10:21 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > Hi friends,
>
> Hi Torsten,
>
> > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > from my /var/log/kern.log), which ultimately led me to reinstall my
> > system:
> >
> > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
>
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).
This has me very worried.
i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
does this mean my .17-rc3 may have corrupted my filesystem?
what action do you suggest i do now?
>
> > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > but I feel that my partition is thoroughly borked. Of course no data was
> > lost due to backups but still I'd like this bug to be fixed ;-)
>
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
>
> thanks.
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 10:21 ` Kasper Sandberg
@ 2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 15:25 ` Kasper Sandberg
2006-07-19 22:59 ` Nathan Scott
1 sibling, 1 reply; 24+ messages in thread
From: Alistair John Strachan @ 2006-07-19 12:43 UTC (permalink / raw)
To: Kasper Sandberg; +Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs
On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
>
> > > of programs fail in mysterious ways. I tried to recover using
> > > xfs_repair but I feel that my partition is thoroughly borked. Of course
> > > no data was lost due to backups but still I'd like this bug to be fixed
> > > ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
> >
> > thanks.
According to another thread Nathan just responded to, it sounds like we need
to wait for a new version of the xfsprogs package, and then run xfs_repair on
the affected filesystems. I wouldn't worry about it too much if you've not
had any crashes. The damage can be repaired, just not right now.
I'm still waiting for a crash on a machine that has been under heavy load for
28 days, so it's obviously not _that_ easy to trigger.
--
Cheers,
Alistair.
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
@ 2006-07-19 14:17 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 21:09 ` Torsten Landschoff
0 siblings, 2 replies; 24+ messages in thread
From: Mattias Hedenskog @ 2006-07-19 14:17 UTC (permalink / raw)
To: linux-kernel
> That looks like the death knell of my /, which succumbed on Friday as
> a result (I believe) of the corruption bug that was in 2.6.16/17.
> Ironically enough, I also saw the problem during an aptitude upgrade.
Hi all,
I just want to confirm this bug as well and unfortunately it was my
system disk too that had to take the hit. I'm running 2.6.16 and it's
reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
the fs I got the same error as in the previous post, running xfsprogs
2.8.4. I haven't had the time to debug this issue further because the
box is quite critical but I'll keep an eye on the other disks on the
system still running xfs.
Regards,
Mattias Hedenskog
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
@ 2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 23:01 ` Nathan Scott
2006-07-19 21:09 ` Torsten Landschoff
1 sibling, 1 reply; 24+ messages in thread
From: Jeffrey E. Hundstad @ 2006-07-19 14:59 UTC (permalink / raw)
To: Mattias Hedenskog; +Cc: linux-kernel
I did try xfs_repair 2.8.4 on a volume running on 2.6.17.4 and it
annihilated the volume. This volume was not showing signs of crashing.
So... I guess I would certainly not run xfs_repair unless there is good
reason.
--
Jeffrey Hundstad
PS. ...yes, I had a recent backup ;-)
Mattias Hedenskog wrote:
>> That looks like the death knell of my /, which succumbed on Friday as
>> a result (I believe) of the corruption bug that was in 2.6.16/17.
>> Ironically enough, I also saw the problem during an aptitude upgrade.
>
> Hi all,
>
> I just want to confirm this bug as well and unfortunately it was my
> system disk too that had to take the hit. I'm running 2.6.16 and it's
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
>
> Regards,
> Mattias Hedenskog
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 12:43 ` Alistair John Strachan
@ 2006-07-19 15:25 ` Kasper Sandberg
0 siblings, 0 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-19 15:25 UTC (permalink / raw)
To: Alistair John Strachan
Cc: Nathan Scott, Torsten Landschoff, linux-kernel, xfs
On Wed, 2006-07-19 at 13:43 +0100, Alistair John Strachan wrote:
> On Wednesday 19 July 2006 11:21, Kasper Sandberg wrote:
> > On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > > Hi friends,
> > >
> > > Hi Torsten,
> > >
> > > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > > system:
> > > >
> > > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> > >
> > > I suspect you had some residual directory corruption from using the
> > > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > > fixed in the latest -stable point release).
> >
> > This has me very worried.
> >
> > i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> > does this mean my .17-rc3 may have corrupted my filesystem?
> >
> > what action do you suggest i do now?
> >
> > > > of programs fail in mysterious ways. I tried to recover using
> > > > xfs_repair but I feel that my partition is thoroughly borked. Of course
> > > > no data was lost due to backups but still I'd like this bug to be fixed
> > > > ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> > >
> > > thanks.
>
> According to another thread Nathan just responded to, it sounds like we need
> to wait for a new version of the xfsprogs package, and then run xfs_repair on
> the affected filesystems. I wouldn't worry about it too much if you've not
> had any crashes. The damage can be repaired, just not right now.
without ANY loss? because even though it would be a bit painful for me to
do, i do have the option of smashing in a new drive, copy everything,
and reinitialize my filesystem.
>
> I'm still waiting for a crash on a machine that has been under heavy load for
> 28 days, so it's obviously not _that_ easy to trigger.
so basically if i upgrade to a safe kernel before i do get these errors,
im good?
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 21:09 ` Torsten Landschoff
2006-07-20 10:46 ` Jan Engelhardt
1 sibling, 1 reply; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:09 UTC (permalink / raw)
To: Mattias Hedenskog; +Cc: linux-kernel
On Wed, Jul 19, 2006 at 04:17:50PM +0200, Mattias Hedenskog wrote:
> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
> the fs I got the same error as in the previous post, running xfsprogs
> 2.8.4. I haven't had the time to debug this issue further because the
> box is quite critical but I'll keep an eye on the other disks on the
> system still running xfs.
I would not try running xfs_repair without cause either. My /home did
survive the XFS problems but I ran xfs_repair "just to be sure". Now the
same problem on that partition, mostly unreadable. :( So, do not run
xfs_repair without a cause ;-)
For reference, I think it was xfsprogs 2.7.14 that I was using, the
latest in Debian.
FYI: Nothing important on /home, I think - I cannot be sure, because I
back up only selectively, as I do not have proper backup media :(
Greetings
Torsten
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 10:21 ` Kasper Sandberg
@ 2006-07-19 21:14 ` Torsten Landschoff
2006-07-19 23:09 ` Nathan Scott
2006-07-22 16:27 ` Christian Kujau
3 siblings, 1 reply; 24+ messages in thread
From: Torsten Landschoff @ 2006-07-19 21:14 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Hi Nathan,
On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
> I suspect you had some residual directory corruption from using the
> 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> fixed in the latest -stable point release).
That is probably the cause of my problem. Thanks for the info!
BTW: I think there was nothing important on the broken filesystems, but
I'd like to keep what's still there anyway just in case... How would you
suggest I copy that data? I fear that just mounting and using cp
might break and shut down the FS again; would xfsdump be more
appropriate?
Thanks for XFS, I have been using it for years on production servers!
Greetings
Torsten
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 8:08 ` Alistair John Strachan
@ 2006-07-19 22:56 ` Nathan Scott
2006-07-20 10:29 ` Kasper Sandberg
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 22:56 UTC (permalink / raw)
To: Alistair John Strachan; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> [snip]
> > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > but I feel that my partition is thoroughly borked. Of course no data was
> > > lost due to backups but still I'd like this bug to be fixed ;-)
> >
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
>
> Just out of interest, I've got a few XFS volumes that were created 24 months
> ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> any crashes so far.
>
> Assuming I get the newest XFS repair tools on there, what's the disadvantage
> of repairing versus creating a new filesystem? What special circumstances are
> required to cause a crash?
There should be no disadvantage to repairing. I will update the FAQ
shortly to describe all the details of the problem, recommendations
on how to address it, which kernel version is affected, etc.
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
@ 2006-07-19 22:59 ` Nathan Scott
1 sibling, 0 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 22:59 UTC (permalink / raw)
To: Kasper Sandberg; +Cc: Torsten Landschoff, linux-kernel, xfs
On Wed, Jul 19, 2006 at 12:21:08PM +0200, Kasper Sandberg wrote:
> On Wed, 2006-07-19 at 08:57 +1000, Nathan Scott wrote:
> > On Wed, Jul 19, 2006 at 12:29:41AM +0200, Torsten Landschoff wrote:
> > > Hi friends,
> >
> > Hi Torsten,
> >
> > > I upgraded to 2.6.18-rc1 on sunday, with the following results (taken
> > > from my /var/log/kern.log), which ultimately led me to reinstall my
> > > system:
> > >
> > > Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216
> > > Jul 17 07:33:53 pulsar kernel: dir: inode 54526538
> >
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
> This has me very worried.
>
> i just upgraded to .18-rc1-git5 when it came out, i used .17-rc3 before.
> does this mean my .17-rc3 may have corrupted my filesystem?
>
> what action do you suggest i do now?
The odds are decent that you're unaffected. You can check your filesystem
using xfs_check or xfs_repair -n and these will give you a good indication
as to whether further action is required.
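A minimal sketch of the check described above, wrapped in Python. It uses only the no-modify invocation named in this mail (xfs_repair -n); the device path is a hypothetical placeholder, the helper names are invented for this example, and the filesystem is assumed to be unmounted. As far as I understand the xfs_repair(8) manual page, a non-zero exit status in -n mode means corruption was detected, but please verify that against the version you have installed.

```python
import subprocess


def no_modify_check_cmd(device):
    """Build the argv for a read-only check: xfs_repair -n <device>."""
    return ["xfs_repair", "-n", device]


def check_filesystem(device):
    """Run the no-modify check and return (exit_status, combined_output).

    Nothing is written to the device, because -n is always passed.
    """
    result = subprocess.run(
        no_modify_check_cmd(device),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


# Hypothetical device name, shown only to illustrate the resulting command.
print(" ".join(no_modify_check_cmd("/dev/mapper/example")))
```

check_filesystem() is deliberately not called here; saving its output to a file before any real repair attempt would make later bug reports easier.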
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 14:59 ` Jeffrey E. Hundstad
@ 2006-07-19 23:01 ` Nathan Scott
2006-07-20 5:51 ` Jeffrey Hundstad
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 23:01 UTC (permalink / raw)
To: Jeffrey E. Hundstad; +Cc: Mattias Hedenskog, linux-kernel, xfs
On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it
> annihilated the volume. This volume was not showing signs of crashing.
> So... I guess I would certainly not run xfs_repair unless there is good
> reason.
Erm, wha..? Can you expand on "annihilated" a bit? (please send
me the full xfs_repair output if you still have it).
thanks.
--
Nathan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 21:14 ` Torsten Landschoff
@ 2006-07-19 23:09 ` Nathan Scott
0 siblings, 0 replies; 24+ messages in thread
From: Nathan Scott @ 2006-07-19 23:09 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: linux-kernel, xfs
On Wed, Jul 19, 2006 at 11:14:02PM +0200, Torsten Landschoff wrote:
> On Wed, Jul 19, 2006 at 08:57:31AM +1000, Nathan Scott wrote:
> > I suspect you had some residual directory corruption from using the
> > 2.6.17 XFS (which is known to have a lurking dir2 corruption issue,
> > fixed in the latest -stable point release).
>
> That is probably the cause of my problem. Thanks for the info!
>
> BTW: I think there was nothing important on the broken filesystems, but
> I'd like to keep what's still there anyway just in case... How would you
> suggest I copy that data? I fear that just mounting and using cp
> might break and shut down the FS again; would xfsdump be more
> appropriate?
Yeah, xfsdump's not a bad idea; the interfaces it uses may well
be able to avoid the cases that trigger shutdown. Otherwise it
is a case of identifying the problem directory inode (the inum
is reported in the shutdown trace) and avoiding that path when
cp'ing - you can match inum to path via xfs_ncheck.
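The second approach can be sketched as follows: collect the directory inode numbers from the "dir: inode N" lines of the shutdown trace, then build an xfs_ncheck command limited to those inodes. The helper names and the device path are hypothetical; the -i option is taken from my reading of the xfs_ncheck(8) manual page, so treat it as an assumption and check it locally.

```python
import re

# The trace reports the directory as e.g. "dir: inode 54526538".
DIR_INODE_RE = re.compile(r"\bdir: inode (\d+)")


def problem_dir_inodes(log_text):
    """Return the distinct directory inode numbers from the trace, in order."""
    seen = []
    for match in DIR_INODE_RE.finditer(log_text):
        inum = int(match.group(1))
        if inum not in seen:
            seen.append(inum)
    return seen


def ncheck_cmd(device, inums):
    """Build an xfs_ncheck argv that limits the report to the given inodes."""
    cmd = ["xfs_ncheck"]
    for inum in inums:
        cmd += ["-i", str(inum)]
    cmd.append(device)
    return cmd


# Two lines from the original report, used as sample input.
LOG = (
    "Jul 17 07:33:53 pulsar kernel: xfs_da_do_buf: bno 16777216\n"
    "Jul 17 07:33:53 pulsar kernel: dir: inode 54526538\n"
)

# "/dev/mapper/example" is a placeholder, not a device from this thread.
print(" ".join(ncheck_cmd("/dev/mapper/example", problem_dir_inodes(LOG))))
```

The paths that come back are the ones to leave out when copying with cp.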
> Thanks for XFS, I have been using it for years on production servers!
Thanks for the kind words, they're much appreciated at times
like these. :-]
cheers.
--
Nathan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 23:01 ` Nathan Scott
@ 2006-07-20 5:51 ` Jeffrey Hundstad
0 siblings, 0 replies; 24+ messages in thread
From: Jeffrey Hundstad @ 2006-07-20 5:51 UTC (permalink / raw)
To: Nathan Scott; +Cc: Mattias Hedenskog, linux-kernel, xfs
Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:59:33AM -0500, Jeffrey E. Hundstad wrote:
>
>> I did try the xfs_repair 2.8.4 for a volume running on 2.6.17.4 and it
>> annihilated the volume. This volume was not showing signs of crashing.
>> So... I guess I would certainly not run xfs_repair unless there is good
>> reason.
>>
>
> Erm, wha..? Can you expand on "annihilated" a bit? (please send
> me the full xfs_repair output if you still have it).
>
Nathan Scott,
I'm very sorry; I don't have the output anymore. By annihilated I mean
that there were several directory trees that /didn't work/. If you
tried to cd into the directory or take a directory listing... or used a
file that you knew was in these certain directories then you'd get pages
of debug messages on the console, and no usable data. I re-ran
xfs_repair and retried several times but the condition never seemed to
improve or get worse for that matter.
I /incorrectly/ figured it was a known issue or I'd have saved the
output. Sorry again.
--
Jeffrey Hundstad
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 22:56 ` Nathan Scott
@ 2006-07-20 10:29 ` Kasper Sandberg
0 siblings, 0 replies; 24+ messages in thread
From: Kasper Sandberg @ 2006-07-20 10:29 UTC (permalink / raw)
To: Nathan Scott
Cc: Alistair John Strachan, Torsten Landschoff, linux-kernel, xfs
On Thu, 2006-07-20 at 08:56 +1000, Nathan Scott wrote:
> On Wed, Jul 19, 2006 at 09:08:30AM +0100, Alistair John Strachan wrote:
> > On Tuesday 18 July 2006 23:57, Nathan Scott wrote:
> > [snip]
> > > > of programs fail in mysterious ways. I tried to recover using xfs_repair
> > > > but I feel that my partition is thoroughly borked. Of course no data was
> > > > lost due to backups but still I'd like this bug to be fixed ;-)
> > >
> > > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > > mkfs and restore? Or at least get a full repair run? If you did,
> > > and you still see issues in .18-rc1, please let me know asap.
> >
> > Just out of interest, I've got a few XFS volumes that were created 24 months
> > ago on a machine that I upgraded to 2.6.17 about a month ago. I haven't seen
> > any crashes so far.
> >
> > Assuming I get the newest XFS repair tools on there, what's the disadvantage
> > of repairing versus creating a new filesystem? What special circumstances are
> > required to cause a crash?
>
> There should be no disadvantage to repairing. I will update the FAQ
> shortly to describe all the details of the problem, recommendations
> on how to address it, which kernel version is affected, etc.
this FAQ, is it this: http://oss.sgi.com/projects/xfs/faq.html#dir2 ?
(btw, it seems that while only in the TOC once, you have the same about
2.6.17 twice..)..
which version of xfsprogs should i use while doing the xfs_check ?
>
> cheers.
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-19 21:09 ` Torsten Landschoff
@ 2006-07-20 10:46 ` Jan Engelhardt
0 siblings, 0 replies; 24+ messages in thread
From: Jan Engelhardt @ 2006-07-20 10:46 UTC (permalink / raw)
To: Torsten Landschoff; +Cc: Mattias Hedenskog, linux-kernel
>
>> reproducible in 2.6.17 and 2.6.18-rc1 as well. When I tried to repair
>> the fs I got the same error as in the previous post, running xfsprogs
>> 2.8.4. I haven't had the time to debug this issue further because the
>> box is quite critical but I'll keep an eye on the other disks on the
>> system still running xfs.
I think my experience is worth mentioning too: the xfs filesystem (on one
box, that is) was created IIRC under 2.6.16, and has survived throughout
2.6.17 and 2.6.18-rc1 so far...
Jan Engelhardt
--
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: XFS breakage in 2.6.18-rc1
2006-07-18 22:57 ` Nathan Scott
` (2 preceding siblings ...)
2006-07-19 21:14 ` Torsten Landschoff
@ 2006-07-22 16:27 ` Christian Kujau
2006-07-23 23:01 ` Nathan Scott
3 siblings, 1 reply; 24+ messages in thread
From: Christian Kujau @ 2006-07-22 16:27 UTC (permalink / raw)
To: Nathan Scott; +Cc: Torsten Landschoff, linux-kernel, xfs
Hi folks,
On Wed, 19 Jul 2006, Nathan Scott wrote:
> 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> mkfs and restore? Or at least get a full repair run? If you did,
> and you still see issues in .18-rc1, please let me know asap.
well, at least for me, corruption/errors *started* with 2.6.18-rc1:
http://oss.sgi.com/archives/xfs/2006-07/msg00151.html
I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to
2.6.18-rc2 and see the same errors:
xfs_da_do_buf: bno 16777216
dir: inode 24472381
Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xc0219230
Filesystem "md0": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xc024d717
Please see the whole error/.config/logs here:
http://nerdbynature.de/bits/2.6.18-rc2/
Thanks,
Christian.
--
BOFH excuse #38:
secretary plugged hairdryer into UPS
* Re: XFS breakage in 2.6.18-rc1
2006-07-22 16:27 ` Christian Kujau
@ 2006-07-23 23:01 ` Nathan Scott
2006-07-28 17:01 ` Christian Kujau
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-23 23:01 UTC (permalink / raw)
To: Christian Kujau; +Cc: linux-kernel, xfs
On Sat, Jul 22, 2006 at 05:27:24PM +0100, Christian Kujau wrote:
> On Wed, 19 Jul 2006, Nathan Scott wrote:
> > 2.6.18-rc1 should be fine (contains the corruption fix). Did you
> > mkfs and restore? Or at least get a full repair run? If you did,
> > and you still see issues in .18-rc1, please let me know asap.
>
> well, at least for me, corruption/errors *started* with 2.6.18-rc1:
> ...
> I downgraded to 2.6.17.5 and the errors stopped. Now I've upgraded to
> 2.6.18-rc2 and see the same errors:
>
> xfs_da_do_buf: bno 16777216
> dir: inode 24472381
This is an on-disk corruption, so downgrading the kernel will not
resolve it. The problem must be triggered by a combination of
operations on a directory; I'm certain that if you access inode
24472381 on your filesystem on 2.6.17, it'll shut down your
filesystem too. See the FAQ entry for a description on how to
translate inums to paths, and also the repair -n step to detect
any corruption ondisk.
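
As a rough illustration of the two steps mentioned above, the sketch below pulls the filesystem name and directory inode out of the log lines quoted in this thread and prints commands one might then run by hand. It is only a sketch under stated assumptions: the helper names, the /dev/<name> mapping and the /mnt/scratch mount point are hypothetical, and the FAQ's own procedure may differ. `xfs_repair -n` only reports problems without modifying anything, and `find -inum` maps an inode number back to a path; note that walking the tree reads directories and may itself trip the error, so a read-only mount is the safer place to try it.

```python
import re

# Patterns for the two log lines quoted in this thread.
INODE_RE = re.compile(r"dir: inode (\d+)")
FS_RE = re.compile(r'Filesystem "([^"]+)": XFS internal error')


def parse_xfs_errors(log_text):
    """Return (filesystem names, directory inode numbers) found in log_text."""
    filesystems = sorted(set(FS_RE.findall(log_text)))
    inodes = sorted({int(n) for n in INODE_RE.findall(log_text)})
    return filesystems, inodes


def suggest_commands(fs_name, inodes, mountpoint):
    """Build diagnostic command strings; nothing is executed here.

    Assumption: the name in the log maps to /dev/<fs_name>; adjust as needed.
    """
    device = "/dev/" + fs_name
    # xfs_repair -n is the no-modify mode; run it on an unmounted filesystem.
    commands = ["xfs_repair -n " + device]
    for ino in inodes:
        # find -inum looks up the path that owns a given inode number.
        commands.append("find %s -xdev -inum %d" % (mountpoint, ino))
    return commands


SAMPLE_LOG = """\
xfs_da_do_buf: bno 16777216
dir: inode 24472381
Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xc0219230
"""

if __name__ == "__main__":
    names, inodes = parse_xfs_errors(SAMPLE_LOG)
    for name in names:
        # "/mnt/scratch" is a hypothetical mount point for illustration.
        for cmd in suggest_commands(name, inodes, "/mnt/scratch"):
            print(cmd)
```
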
cheers.
--
Nathan
* Re: XFS breakage in 2.6.18-rc1
2006-07-23 23:01 ` Nathan Scott
@ 2006-07-28 17:01 ` Christian Kujau
2006-07-28 21:48 ` Nathan Scott
0 siblings, 1 reply; 24+ messages in thread
From: Christian Kujau @ 2006-07-28 17:01 UTC (permalink / raw)
To: Nathan Scott; +Cc: linux-kernel, xfs
Hello again,
On Mon, 24 Jul 2006, Nathan Scott wrote:
> filesystem too. See the FAQ entry for a description on how to
> translate inums to paths, and also the repair -n step to detect
> any corruption ondisk.
I had two XFS filesystems, and I first noticed that /data/Scratch had
been hit by this bug. I did not care much about this (hence the
name :)) and I wanted to postpone the xfs_db surgery.
Unfortunately I forgot that "/" was also XFS, and it crashed
yesterday. Remounting it ro helped a bit (so no process attempted to
write to it; however, cp'ing from the ro-mounted XFS sometimes hung,
unkillable). I set up a mini-root somewhere else and followed the
instructions in the FAQ. It did not go too well: lots of
stuff was moved to lost+found, and every subsequent xfs_repair run
found more and more errors. I decided to mkfs the partition and make use
of my backups. My other "scratch" partition is still XFS but mounted ro,
and I'll try the xfsprogs fixes Nathan published on this one.
Oh, and I dd'ed the corrupt XFS filesystem to a file, so I can play
around with this one as well.
If anyone is interested, here are the typescripts from the horrible
xfs_repair runs: http://nerdbynature.de/bits/2.6.18-rc2/log/
cheers,
Christian.
--
BOFH excuse #21:
POSIX compliance problem
* Re: XFS breakage in 2.6.18-rc1
2006-07-28 17:01 ` Christian Kujau
@ 2006-07-28 21:48 ` Nathan Scott
2006-07-29 20:22 ` Ralf Hildebrandt
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Scott @ 2006-07-28 21:48 UTC (permalink / raw)
To: Christian Kujau; +Cc: linux-kernel, xfs
On Fri, Jul 28, 2006 at 05:01:24PM +0000, Christian Kujau wrote:
> I had two xfs filesystems and I first noticed that /data/Scratch was
> befallen from this bug. I did not care much about this (hence the
> name :)) and I wanted to postpone the xfs_db surgery.
> ...
> found more and more errors. I decided to mkfs the partition and make use
> of my backups. my other "scratch" partition is still XFS but mounted ro
> and I'll try the xfsprogs fixes Nathan published on this one.
Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
list yesterday; please give that a go and let us know how it fares.
cheers.
--
Nathan
* Re: XFS breakage in 2.6.18-rc1
2006-07-28 21:48 ` Nathan Scott
@ 2006-07-29 20:22 ` Ralf Hildebrandt
2006-07-29 22:28 ` David Chatterton
0 siblings, 1 reply; 24+ messages in thread
From: Ralf Hildebrandt @ 2006-07-29 20:22 UTC (permalink / raw)
To: Nathan Scott; +Cc: Christian Kujau, linux-kernel, xfs
* Nathan Scott <nathans@sgi.com>:
> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
> list yesterday; please give that a go and let us know how it fares.
Just to let you know, I did a cvs checkout of xfs-cmds
as described on http://oss.sgi.com/projects/xfs/source.html
Then I saved the patch from
http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
"Original" link on that page.
I built an xfs_repair binary using that, transferred it onto an old
KLAX boot CD I had, and repaired the XFS root on my laptop.
I got 5000 files in lost and found, mostly the whole manpages from my
system. Had to reinstall a few packages to restore lost binaries, but
that's all.
When will that horrible bug be fixed in 2.6.x?
--
Ralf Hildebrandt (i.A. des IT-Zentrums) Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin Fax. +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to spamtrap@charite.de
* Re: XFS breakage in 2.6.18-rc1
2006-07-29 20:22 ` Ralf Hildebrandt
@ 2006-07-29 22:28 ` David Chatterton
0 siblings, 0 replies; 24+ messages in thread
From: David Chatterton @ 2006-07-29 22:28 UTC (permalink / raw)
To: Nathan Scott, Christian Kujau, linux-kernel, xfs
Ralf Hildebrandt wrote:
> * Nathan Scott <nathans@sgi.com>:
>
>> Barry sent an xfs_repair patch to resolve this issue to the xfs@oss.sgi.com
>> list yesterday; please give that a go and let us know how it fares.
>
> Just to let you know, I did a cvs checkout of xfs-cmds
> as described on http://oss.sgi.com/projects/xfs/source.html
>
> Then I saved the patch from
> http://oss.sgi.com/archives/xfs/2006-07/msg00374.html using the
> "Original" link on hat page.
>
> I build a xfs_Repair binary using that, transferred it onto an old
> KLAX boot cd I had and repaired the XFS root on my laptop.
>
> I got 5000 files in lost and found, mostly the whole manpages from my
> system. Had to reinstall a few packages to restore lost binaries, but
> that's all.
>
> When will that horrible bug be fixed in 2.6.x?
>
The bug is fixed in 2.6.17.7.
David
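
For anyone auditing several machines, a small sketch like the following can flag kernel releases that fall before the fixed version named above. The upper bound (2.6.17.7) comes from this message; treating plain 2.6.17 as the first affected release is an assumption drawn from the rest of the thread, and the function names are hypothetical. A result of False says nothing about corruption already written to disk, which still needs a repair run.

```python
import re

FIXED_IN = (2, 6, 17, 7)        # stated in the message above
FIRST_AFFECTED = (2, 6, 17, 0)  # assumption: the 2.6.17 series introduced it


def parse_kernel_version(release):
    """Turn '2.6.17.5' or '2.6.18-rc1' into a 4-tuple of ints.

    A missing fourth component is treated as 0; suffixes such as '-rc1'
    are ignored, so release candidates compare equal to the final release.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def may_write_corruption(release):
    """True if the release lies in the assumed affected range."""
    return FIRST_AFFECTED <= parse_kernel_version(release) < FIXED_IN


if __name__ == "__main__":
    for rel in ("2.6.16.20", "2.6.17.5", "2.6.17.7", "2.6.18-rc1"):
        print(rel, may_write_corruption(rel))
```
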
Thread overview: 24+ messages
2006-07-19 14:17 XFS breakage in 2.6.18-rc1 Mattias Hedenskog
2006-07-19 14:59 ` Jeffrey E. Hundstad
2006-07-19 23:01 ` Nathan Scott
2006-07-20 5:51 ` Jeffrey Hundstad
2006-07-19 21:09 ` Torsten Landschoff
2006-07-20 10:46 ` Jan Engelhardt
-- strict thread matches above, loose matches on Subject: below --
2006-07-18 22:29 Torsten Landschoff
2006-07-18 22:57 ` Nathan Scott
2006-07-19 8:08 ` Alistair John Strachan
2006-07-19 22:56 ` Nathan Scott
2006-07-20 10:29 ` Kasper Sandberg
2006-07-19 10:21 ` Kasper Sandberg
2006-07-19 12:43 ` Alistair John Strachan
2006-07-19 15:25 ` Kasper Sandberg
2006-07-19 22:59 ` Nathan Scott
2006-07-19 21:14 ` Torsten Landschoff
2006-07-19 23:09 ` Nathan Scott
2006-07-22 16:27 ` Christian Kujau
2006-07-23 23:01 ` Nathan Scott
2006-07-28 17:01 ` Christian Kujau
2006-07-28 21:48 ` Nathan Scott
2006-07-29 20:22 ` Ralf Hildebrandt
2006-07-29 22:28 ` David Chatterton
2006-07-18 23:06 ` Kevin Radloff