public inbox for linux-xfs@vger.kernel.org
* "Corrupt dinode 6242615, (btree extents).  This is a bug."
@ 2011-08-08 10:05 Hanne Munkholm
  2011-08-08 11:58 ` Michael Monnerie
  2011-08-08 23:58 ` "Corrupt dinode 6242615, (btree extents). This is a bug." Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Hanne Munkholm @ 2011-08-08 10:05 UTC (permalink / raw)
  To: xfs

Hi list.

I have an xfs file system which got damaged due to not being
properly unmounted before the iSCSI connection terminated (I
think. Corrupted it is).

I cannot mount it. 
mount: wrong fs type, bad option, bad superblock on /dev/sdd,
         missing codepage or helper program, or other error
         In some cases useful info is found in syslog - try
         dmesg | tail  or so

xfs_check suggests running xfs_repair -L. 
ERROR: The filesystem has valuable metadata changes in a log
which needs to be replayed.  Mount the filesystem to replay the log, and
unmount it before re-running xfs_check.  If you are unable to mount the
filesystem, then use the xfs_repair -L option to destroy the log and attempt a
repair.  Note that destroying the log may cause corruption -- please
attempt a mount of the filesystem before doing this.

I haven't done that yet.

Instead I ran
xfs_repair -n.
I got a lot of output that looks promising for a repair IMO; at
least it acknowledges an xfs system being there:

xfs_repair -n /dev/sdd
Phase 1 - find and verify superblock...
Phase 2 - using internal log
          - scan filesystem freespace and inode maps...
          - found root inode chunk
Phase 3 - for each AG...
          - scan (but don't clear) agi unlinked lists...
          - process known inodes and perform inode discovery...
          - agno = 0
bad nblocks 952 for inode 6242615, would reset to 972
bad nextents 182 for inode 6242615, would reset to 185
imap claims a free inode 13352640 is in use, would correct imap and clear inode
imap claims a free inode 13352641 is in use, would correct imap and clear inode
<snip>
         - agno = 1
          - agno = 2
          - agno = 3
          - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
          - setting up duplicate extent list...
          - check for inodes claiming duplicate blocks...
          - agno = 0
          - agno = 3
          - agno = 2
          - agno = 1
bad nblocks 952 for inode 6242615, would reset to 972
bad nextents 182 for inode 6242615, would reset to 185
entry "sample_000001299840_0_0.000000.pdb" at block 764 offset
2512 in directory inode 6242615 references free inode 13352640
  	would clear inode number in entry at offset 2512...
entry "sample_000001299860_0_0.000000.pdb" at block 764 offset
2560 in directory inode 6242615 references free inode 13352641
<snip>
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
          - traversing filesystem ...
corrupt dinode 6242615, (btree extents).  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
corrupt dinode 6242615, (btree extents).  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
corrupt dinode 6242615, (btree extents).  This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
Segmentation fault

But then it fails in phase 6, as shown above.

My questions are now:

a) Does it look like clearing the log with -L would be a good
idea? To me it looks like a lot of errors that are not log
errors are found?

b) What's with the segfault? What happens when the "real" repair
with no -n gets to the segfault? Is it dangerous to try it if it
segfaults somewhere halfway? (More dangerous than it would
normally be).

It is a 6TB file system and I am running a terribly old Xen
kernel: 2.6.26-2-xen-amd64 #1 SMP Mon Jun 13 18:44:16 UTC 2011
x86_64 GNU/Linux

I have placed an xfs_metadump here:
http://people.binf.ku.dk/hanne/tmp/metadata.gz

Thanks in advance for some help in interpreting what the xfs
tools are telling me.

PS We do have some backup, so I am not interested in smug
comments about backup :) only in technical help in repairing
this file system or concluding that we cannot.


Med venlig hilsen / Best regards
--
Hanne Munkholm                      Email: hanne@binf.ku.dk
Systemadministrator                 Tlf: +45 35 32 13 49

Bioinformatik-centret
Københavns Biocenter, Biologisk Institut
Ole Maaløes Vej 5, 2200 København N

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: "Corrupt dinode 6242615, (btree extents).  This is a bug."
  2011-08-08 10:05 "Corrupt dinode 6242615, (btree extents). This is a bug." Hanne Munkholm
@ 2011-08-08 11:58 ` Michael Monnerie
  2011-08-08 12:53   ` "Corrupt dinode 6242615, (btree extents). This is a bug."<o Hanne Munkholm
  2011-08-08 23:58 ` "Corrupt dinode 6242615, (btree extents). This is a bug." Dave Chinner
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Monnerie @ 2011-08-08 11:58 UTC (permalink / raw)
  To: xfs; +Cc: Hanne Munkholm



On Monday, 8 August 2011, Hanne Munkholm wrote:
> a) Does it look like clearing the log with -L would be a good
> idea? To me it looks like a lot of errors that are not log
> errors are found?

Yes, mount -L/umount once to replay the log, when you are sure the block 
device works correctly again.
 
> b) What's with the segfault? What happens when the "real" repair
> with no -n gets to the segfault? Is it dangerous to try it if it
> segfaults somewhere halfway? (More dangerous than it would
> normally be).
> 
> It is a 6TB file system and I am running a terribly old Xen
> kernel: 2.6.26-2-xen-amd64 #1 SMP Mon Jun 13 18:44:16 UTC 2011
> x86_64 GNU/Linux

Try to get the newest xfsprogs, maybe that's a bug in xfs_repair itself 
that has been fixed. If that doesn't help, a newer kernel might solve 
your problem.

Generally I've been able to fix all problems, and if not, xfsprogs has 
received an update to fix a bug. But it's been a long time since the 
last bug, so a current xfs_repair might help you.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// House for sale: http://zmi.at/langegg/


* Re: "Corrupt dinode 6242615, (btree extents).  This is a bug."<o
  2011-08-08 11:58 ` Michael Monnerie
@ 2011-08-08 12:53   ` Hanne Munkholm
  2011-08-08 13:12     ` Michael Monnerie
  0 siblings, 1 reply; 6+ messages in thread
From: Hanne Munkholm @ 2011-08-08 12:53 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On Mon, 8 Aug 2011, Michael Monnerie wrote:
> Yes, mount -L/umount once to replay the log, when you are sure the block
> device works correctly again.

Thank you very much for your reply.

I don't see an -L option to the mount command meaning "replay
log". I think if I were able to mount it, it would replay the log
by itself? However, I cannot mount it.

<snip>

> Try to get the newest xfsprogs, maybe that's a bug in xfs_repair itself
> that has been fixed. If that doesn't help, a newer kernel might solve
> your problem.

That is a good idea. I downloaded xfsprogs 3.1.5, but it gave the
same results, provided that it's OK to run it from the folder I
compiled it in and that it is not somehow using libraries from the
old version.

Trying a newer kernel is possible but some trouble; is it
likely to help?

If I run an xfs_repair, will it get me anywhere even if it
segfaults at the same point?

Do you recommend that I run an xfs_repair with or without -L, or
should I definitely try a new kernel first?


Med venlig hilsen / Best regards
--
Hanne Munkholm                      Email: hanne@binf.ku.dk
Systemadministrator                 Tlf: +45 35 32 13 49

Bioinformatik-centret
Københavns Biocenter, Biologisk Institut
Ole Maaløes Vej 5, 2200 København N


* Re: "Corrupt dinode 6242615, (btree extents).  This is a bug."<o
  2011-08-08 12:53   ` "Corrupt dinode 6242615, (btree extents). This is a bug."<o Hanne Munkholm
@ 2011-08-08 13:12     ` Michael Monnerie
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Monnerie @ 2011-08-08 13:12 UTC (permalink / raw)
  To: Hanne Munkholm; +Cc: xfs



On Monday, 8 August 2011, Hanne Munkholm wrote:
> On Mon, 8 Aug 2011, Michael Monnerie wrote:
> > Yes, mount -L/umount once to replay the log, when you are sure the
> > block device works correctly again.
> 
> Thank you very much for your reply.
> 
> I don't see an -L option to the mount command meaning "replay
> log", I think if I was able to mount it, it would replay the log
> by itself? However, I cannot mount it.

Sorry, I mixed that up with xfs_repair -L. If mount doesn't work, then try 
xfs_repair -L.

> Trying with a newer kernel is possible but some trouble, is it
> likely to help?

Could be. Dave Chinner knows more. I'd say if a newer kernel is 
problematic, try xfs_repair -L first, and keep the output for review. As 
you have a backup anyway, you're on the safe side. I've needed 
xfs_repair often and never had a problem because of it. YMMV though.
 
> If I run an xfs_repair, will it get me anywhere even if it
> segfaults at the same point?

Yes, sometimes running it several times solves all problems.
 
> Do you recommend that I run an xfs_repair with or without -L, or
> should I definitely try a new kernel first?

If you can quickly download a "live" CD with a current kernel, do that 
and try it. If that is problematic, e.g. you don't have access to the 
server, just run xfs_repair -L.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// House for sale: http://zmi.at/langegg/


* Re: "Corrupt dinode 6242615, (btree extents).  This is a bug."
  2011-08-08 10:05 "Corrupt dinode 6242615, (btree extents). This is a bug." Hanne Munkholm
  2011-08-08 11:58 ` Michael Monnerie
@ 2011-08-08 23:58 ` Dave Chinner
  2011-08-09  8:47   ` Hanne Munkholm
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2011-08-08 23:58 UTC (permalink / raw)
  To: Hanne Munkholm; +Cc: xfs

On Mon, Aug 08, 2011 at 12:05:09PM +0200, Hanne Munkholm wrote:
> Hi list.
> 
> I have an xfs file system which got damaged due to not being
> properly unmounted before the iSCSI connection terminated (I
> think. Corrupted it is).
> 
> I cannot mount it. mount: wrong fs type, bad option, bad superblock
> on /dev/sdd,
>         missing codepage or helper program, or other error
>         In some cases useful info is found in syslog - try
>         dmesg | tail  or so

That is the default error message from mount when the kernel throws
an error. The error message in dmesg will tell you exactly what the
error was - can you post that?

> xfs_check suggests running xfs_repair -L. ERROR: The filesystem has
> valuable metadata changes in a log
> which needs to be replayed.  Mount the filesystem to replay the log, and
> unmount it before re-running xfs_check.  If you are unable to mount the
> filesystem, then use the xfs_repair -L option to destroy the log and attempt a
> repair.  Note that destroying the log may cause corruption -- please
> attempt a mount of the filesystem before doing this.
> 
> I haven't done that yet.

You won't be able to because mounting is failing. Hence your only
option for recovery is to use xfs_repair -L to zero the log.

> Instead I ran
> xfs_repair -n.
> I got a lot of output that looks promising for a repair IMO; at
> least it acknowledges an xfs system being there:
> 
> xfs_repair -n /dev/sdd
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>          - scan filesystem freespace and inode maps...
>          - found root inode chunk
> Phase 3 - for each AG...
>          - scan (but don't clear) agi unlinked lists...
>          - process known inodes and perform inode discovery...
>          - agno = 0
> bad nblocks 952 for inode 6242615, would reset to 972
> bad nextents 182 for inode 6242615, would reset to 185
> imap claims a free inode 13352640 is in use, would correct imap and clear inode
> imap claims a free inode 13352641 is in use, would correct imap and clear inode
> <snip>
>         - agno = 1
>          - agno = 2
>          - agno = 3
>          - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>          - setting up duplicate extent list...
>          - check for inodes claiming duplicate blocks...
>          - agno = 0
>          - agno = 3
>          - agno = 2
>          - agno = 1
> bad nblocks 952 for inode 6242615, would reset to 972
> bad nextents 182 for inode 6242615, would reset to 185
> entry "sample_000001299840_0_0.000000.pdb" at block 764 offset
> 2512 in directory inode 6242615 references free inode 13352640
>  	would clear inode number in entry at offset 2512...
> entry "sample_000001299860_0_0.000000.pdb" at block 764 offset
> 2560 in directory inode 6242615 references free inode 13352641
> <snip>
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>          - traversing filesystem ...
> corrupt dinode 6242615, (btree extents).  This is a bug.

That's one of the inodes that has already been found to be bad, and
would have had parts of it fixed before getting to phase 6. Hence
this problem may have already been fixed by this stage.

> Please capture the filesystem metadata with xfs_metadump and
> report it to xfs@oss.sgi.com.
> corrupt dinode 6242615, (btree extents).  This is a bug.
> Please capture the filesystem metadata with xfs_metadump and
> report it to xfs@oss.sgi.com.
> corrupt dinode 6242615, (btree extents).  This is a bug.
> Please capture the filesystem metadata with xfs_metadump and
> report it to xfs@oss.sgi.com.
> Segmentation fault

And the chance is that this won't happen.
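
The reasoning above can be checked mechanically against a saved -n
log: is the inode in the Phase 6 "corrupt dinode" message one that an
earlier phase already said it "would" fix? What follows is a minimal
sketch, not part of xfsprogs. The function name and the regular
expressions are assumptions derived only from the output lines quoted
in this thread, and real xfs_repair output may contain other message
forms.

```python
import re

# Sample lines copied from the xfs_repair -n output quoted in this thread.
SAMPLE = """\
Phase 3 - for each AG...
bad nblocks 952 for inode 6242615, would reset to 972
bad nextents 182 for inode 6242615, would reset to 185
imap claims a free inode 13352640 is in use, would correct imap and clear inode
Phase 6 - check inode connectivity...
corrupt dinode 6242615, (btree extents).  This is a bug.
"""

# Patterns are assumptions based on the messages seen above.
PHASE = re.compile(r"^Phase (\d+) ")
WOULD_FIX = re.compile(r"\binode (\d+)\b.*\bwould\b")
CORRUPT = re.compile(r"^corrupt dinode (\d+),")


def corrupt_but_already_flagged(text):
    """Return inodes reported corrupt that an earlier phase would have fixed."""
    phase = 0
    flagged = {}  # inode -> phase in which a "would ..." fix was first reported
    hits = set()
    for line in text.splitlines():
        m = PHASE.match(line)
        if m:
            phase = int(m.group(1))
            continue
        m = CORRUPT.match(line)
        if m:
            inode = int(m.group(1))
            if inode in flagged and flagged[inode] < phase:
                hits.add(inode)
            continue
        m = WOULD_FIX.search(line)
        if m:
            flagged.setdefault(int(m.group(1)), phase)
    return sorted(hits)


print(corrupt_but_already_flagged(SAMPLE))
```

An inode that shows up in the result was left unfixed only because -n
skips the modifications, which is consistent with the later failure
being an artifact of the dry run.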

> I have placed an xfs_metadump here:
> http://people.binf.ku.dk/hanne/tmp/metadata.gz

Downloading it now. It's about 550MB, so will take a little while...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: "Corrupt dinode 6242615, (btree extents).  This is a bug."
  2011-08-08 23:58 ` "Corrupt dinode 6242615, (btree extents). This is a bug." Dave Chinner
@ 2011-08-09  8:47   ` Hanne Munkholm
  0 siblings, 0 replies; 6+ messages in thread
From: Hanne Munkholm @ 2011-08-09  8:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Thank you very much.

I have my file system running again.

The real problem turned out to be that the device had changed
its name from sdc to sdd. I have seen that before and should
have noticed. It refused to mount the file system with the same
ID "again" even after xfs_repair because it had not really been
unmounted from sdc.

My dmesg was cluttered by a lot of kernel traces instead
of the revealing "Filesystem "sdc": xfs_log_force: error 5
returned." until after the xfs_repair, however, I could see that
the device name had changed and should have known what it meant.
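
For anyone who finds this later, here is a minimal sketch of how that
hint could be pulled out of a cluttered dmesg: look for XFS error
lines that name a device other than the one you are trying to mount.
The function name is hypothetical, and the regular expression only
assumes the message format quoted above; other kernel versions may
word it differently. Error 5 is EIO (I/O error) on Linux.

```python
import re

# Format assumed from the single dmesg line quoted in this message.
XFS_ERR = re.compile(
    r'Filesystem "(?P<dev>[^"]+)": (?P<func>\w+): error (?P<errno>\d+) returned'
)


def stale_xfs_devices(dmesg_text, current_dev):
    """Return device names in XFS error lines that differ from current_dev."""
    seen = []
    for m in XFS_ERR.finditer(dmesg_text):
        dev = m.group("dev")
        if dev != current_dev and dev not in seen:
            seen.append(dev)
    return seen


sample = 'Filesystem "sdc": xfs_log_force: error 5 returned.\n'
print(stale_xfs_devices(sample, "sdd"))
```

A non-empty result while mounting sdd suggests the kernel still holds
the file system under its old device name.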

I had to reboot to fix the problem. I now wonder whether a reboot
right away would have done it, with xfs happily recovering, or
whether the xfs_repair was needed. In both cases, the log would
have been lost anyway.

It was the segfault that made me write to the list. I can see
why I got the segfault when running in -n mode. I suspected that
myself, but I wasn't sure; I needed someone to tell me that I
should not panic :).

Now I have one more experience with xfs, and next time someone
googles this they might not have to ask.

Thank you very much for your time.

Med venlig hilsen / Best regards
--
Hanne Munkholm                      Email: hanne@binf.ku.dk
Systemadministrator                 Tlf: +45 35 32 13 49

Bioinformatik-centret
Københavns Biocenter, Biologisk Institut
Ole Maaløes Vej 5, 2200 København N




