Should xfs_repair take this long?

public inbox for linux-xfs@vger.kernel.org

* Should xfs_repair take this long?
@ 2007-03-15 11:27 Thomas Walker
  2007-03-15 14:04 ` Emmanuel Florac
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-15 11:27 UTC (permalink / raw)
  To: xfs

   I am trying to restore a corrupt XFS partition.  It is 6TB in total, an
LVM volume built from two 3TB Fibre Channel SAN volumes.  The host is running
RHEL4, 2.6.9-42.0.2.ELsmp, and the version of xfsprogs is
xfsprogs-2.6.13-2.  The host has four threaded AMD Opterons, 4GB of RAM
and 2GB of swap located on an internal SCSI disk.  It is unclear how the
XFS partition was damaged, but it reports a bad superblock and will not
mount.  I am running this command:

xfs_repair -o assume_xfs /dev/mapper/vg0-hladata3

   This command has been running for two days now.  There is cpu 
activity and i/o activity on the physical SAN.  There is some swapping 
but not an unusual amount and swapon -s shows only a small amount in 
use.  I have seen information implying xfs_repair needs a large amount 
of memory to work well, otherwise it will take a long time.  My question 
is, given my setup, is there an estimate of how long I should wait
before expecting a result?  Should I add swap space?  Is there anything 
else I should do?

    thanks in advance for any help.

    Thomas Walker


* Re: Should xfs_repair take this long?
  2007-03-15 11:27 Should xfs_repair take this long? Thomas Walker
@ 2007-03-15 14:04 ` Emmanuel Florac
  2007-03-15 14:06   ` Thomas Walker
  0 siblings, 1 reply; 15+ messages in thread
From: Emmanuel Florac @ 2007-03-15 14:04 UTC (permalink / raw)
  To: Thomas Walker; +Cc: xfs

On Thu, 15 Mar 2007 07:27:08 -0400,
Thomas Walker <walker@stsci.edu> wrote:

> xfs_repair -o assume_xfs /dev/mapper/vg0-hladata3
> 
>    This command has been running for two days now. 

Is there any output from xfs_repair? This doesn't sound good. I've run
xfs_repair on some badly corrupted fs up to 13 TB, and it never took
more than a couple of minutes.

-- 
----------------------------------------
Emmanuel Florac     |   Intellique
----------------------------------------


* Re: Should xfs_repair take this long?
  2007-03-15 14:04 ` Emmanuel Florac
@ 2007-03-15 14:06   ` Thomas Walker
       [not found]     ` <20070315160309.652a6e0c@harpe.intellique.com>
  2007-03-15 23:10     ` David Chinner
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Walker @ 2007-03-15 14:06 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


     The terminal shows a lot of "." dots running across the screen 
quickly, and every few hours it says this;


.....................................................found candidate secondary superblock...
unable to verify superblock, continuing...
found candidate secondary superblock...
unable to verify superblock, continuing...

      Thomas Walker



[parent not found: <20070315160309.652a6e0c@harpe.intellique.com>]

[parent not found: <45F96150.50001@stsci.edu>]

* Re: Should xfs_repair take this long?
       [not found]       ` <45F96150.50001@stsci.edu>
@ 2007-03-15 15:23         ` Emmanuel Florac
  2007-03-15 15:27           ` Thomas Walker
  0 siblings, 1 reply; 15+ messages in thread
From: Emmanuel Florac @ 2007-03-15 15:23 UTC (permalink / raw)
  To: Thomas Walker, xfs

On Thu, 15 Mar 2007 11:08:00 -0400,
Thomas Walker <walker@stsci.edu> wrote:

>     So if I see I/O activity and cpu activity, which I do, should I 
> assume that eventually the repair should return? 

The repair _MAY_ return, unfortunately...

>  We are thinking of 
> interrupting it and trying to add more memory and restarting it.
> Maybe we should just let it go. 

Yes, let it go now...

> If you think it might finish some
> time, even if it's going to be another day or two, then I'm willing
> to be patient. I'm just worried it might be going around in circles.
> 

It should finish if you're testing the right device, and the LV is
properly assembled. If it reaches the end of the device and finds nothing,
you should restart LVM first to check that your PV/VG/LV are correctly
set up, and retry.
If it doesn't work after that, your backups will be your last friend.

-- 
----------------------------------------
Emmanuel Florac     |   Intellique
----------------------------------------


* Re: Should xfs_repair take this long?
  2007-03-15 15:23         ` Emmanuel Florac
@ 2007-03-15 15:27           ` Thomas Walker
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Walker @ 2007-03-15 15:27 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


    I checked the status of the LVM before starting xfs_repair.  I can't 
promise it's all in order, but at least the various pvdisplay, 
vgdisplay, lvdisplay, etc all came back normal.  So... I'll just wait a 
few days and hope that xfs_repair comes back with something eventually.

    thanks for at least taking an interest, too bad that there's nothing 
we can really do about it.

    Thomas Walker


* Re: Should xfs_repair take this long?
  2007-03-15 14:06   ` Thomas Walker
       [not found]     ` <20070315160309.652a6e0c@harpe.intellique.com>
@ 2007-03-15 23:10     ` David Chinner
  2007-03-16 15:15       ` Thomas Walker
  1 sibling, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-03-15 23:10 UTC (permalink / raw)
  To: Thomas Walker; +Cc: Emmanuel Florac, xfs

On Thu, Mar 15, 2007 at 10:06:42AM -0400, Thomas Walker wrote:
> 
>     The terminal shows a lot of "." dots running across the screen 
> quickly, and every few hours it says this;
> 
> 
> .....................................................found candidate 
> secondary superblock...
> unable to verify superblock, continuing...
> found candidate secondary superblock...
> unable to verify superblock, continuing...

The primary superblock is not good, and it's trying to find a valid
secondary superblock. Doesn't sound promising so far - repair can't
start until a valid superblock is found....

Can you dump the first sector of the device the filesystem is
on:

# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x

So we can see if that really holds a primary XFS superblock?
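
For readers following along, here is a rough Python counterpart to the dd | od check above. It is a minimal sketch, assuming read access to the device or to an image of it. The function name and the returned strings are illustrative and are not part of xfsprogs. The only format fact it relies on is that an XFS superblock starts with the four ASCII bytes "XFSB".

```python
import io

XFS_MAGIC = b"XFSB"   # first four bytes of an XFS superblock
SECTOR_SIZE = 512     # matches bs=512 in the dd command above


def classify_first_sector(stream):
    """Read one sector from the start of `stream` and describe what it holds."""
    stream.seek(0)
    sector = stream.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE:
        return "short read"
    if sector.startswith(XFS_MAGIC):
        return "XFS superblock magic present"
    if not any(sector):
        return "sector is all zeros"
    # Hex is printed in on-disk byte order, so no word swapping is needed.
    return "no XFS magic; first bytes: " + sector[:16].hex()


if __name__ == "__main__":
    # Demo on an in-memory image. For a real volume, open it read-only,
    # e.g. open("/dev/mapper/vg0-hladata3", "rb") (path taken from the thread).
    fake = io.BytesIO(XFS_MAGIC + bytes(SECTOR_SIZE - 4))
    print(classify_first_sector(fake))
```

Because the file is opened in plain read mode, the sketch does not depend on the iflag=direct option of dd.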

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Should xfs_repair take this long?
  2007-03-15 23:10     ` David Chinner
@ 2007-03-16 15:15       ` Thomas Walker
  2007-03-16 19:37         ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16 15:15 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

     I see in other posts that "parted" is sometimes a culprit in these
problems.  Indeed, in my case I did use "parted" to create a GPT
partition table on these XFS volumes and I was asked by parted to use
the secondary signature.

--- snip ---

i also remember something about parted (maybe...) finding a backup gpt
signature at the end of a disk, and "helpfully" copying it over the
front end if so.  This was a bug.  sgi guys do you remember?

But for this one has to invoke parted, and commit the operations done,
am I right?

if I recall, even invoking parted could do this.
--- snip ---

   So maybe I got bit the same way.  parted may have overwritten something
at the head of the volume.  Is there any way to repair the superblock
though?  It seems that everyone agrees XFS can't do anything until it
has a superblock somewhere and I don't seem to have one.  If there's no
way to repair, then what about recovery?  I see mention of possibly
doing an xfsdump to another disk, reformatting the original volume, and
then doing an xfsrestore back.  Is there any online procedure for how to
do that if it applies to me here?

   Thomas Walker


* Re: Should xfs_repair take this long?
  2007-03-16 15:15       ` Thomas Walker
@ 2007-03-16 19:37         ` David Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: David Chinner @ 2007-03-16 19:37 UTC (permalink / raw)
  To: Thomas Walker; +Cc: David Chinner, xfs

On Fri, Mar 16, 2007 at 11:15:12AM -0400, Thomas Walker wrote:
>   So maybe I got bit the same way.  parted may have overwritten something 
> at the head of the volume.

Doesn't look like partition blocks at the start of each volume, though.

> Is there any way to repair the super block 
> though?  It seems that everyone agrees xfs can't do anything until it 
> has a super block somewhere and I don't seem to have one.

That's because repair can't work out where things are supposed to
be without a superblock to tell it critical information.
Manually trying to find and repair a superblock is a hit and miss
affair - at this point we don't even know if the primary superblocks
have been overwritten or whether something else is wrong with LVM...

> If there's no 
> way to repair, then what about recovery? 

In a word: backups.

> I see mention of possibly 
> doing an xfs dump to another disk, reformat the original volume, and 
> then xfs restore back.  Is there any online procedure for how to do that 
> if it applies to me here?

You need to be able to mount the filesystem to dump it, so until you
can run repair there's no simple recovery option.

If the LVM config is correct and repair cannot find a valid
secondary superblock, then you really need to start doing dangerous
things to try to recover. I'd suggest taking a copy of the LVM
volumes before doing anything else.

Then, find a secondary superblock in the volume (the first 4 bytes of
the sector are the ASCII string "XFSB", i.e. 0x58465342) and copy that
sector to block zero of the filesystem. If repair still won't do its
stuff, then you need to use xfs_db to modify that superblock until it
does.  Then when repair runs, you get to look in lost+found and try to
work out what all the broken bits are.....
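
The search step described above can be sketched as a read-only scan for sectors that begin with the magic bytes. This is an illustration under stated assumptions, not how xfs_repair implements its search: the function name, the 512-byte sector size and the chunk size are choices made here. A hit is only a candidate, since the thread shows that repair rejected the candidates it found.

```python
import io

XFS_MAGIC = b"XFSB"
SECTOR_SIZE = 512  # assumed sector size, as in the dd commands in this thread


def find_candidate_superblocks(stream, chunk_sectors=2048):
    """Yield byte offsets of sectors that start with the XFS magic.

    Read-only: nothing is ever written to `stream`.
    """
    chunk_size = chunk_sectors * SECTOR_SIZE
    offset = 0
    stream.seek(0)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Only sector-aligned positions inside the chunk are checked.
        for pos in range(0, len(chunk) - 3, SECTOR_SIZE):
            if chunk[pos:pos + 4] == XFS_MAGIC:
                yield offset + pos
        offset += len(chunk)


if __name__ == "__main__":
    # Eight-sector in-memory image with the magic planted in sectors 3 and 6.
    image = bytearray(8 * SECTOR_SIZE)
    image[3 * SECTOR_SIZE:3 * SECTOR_SIZE + 4] = XFS_MAGIC
    image[6 * SECTOR_SIZE:6 * SECTOR_SIZE + 4] = XFS_MAGIC
    print(list(find_candidate_superblocks(io.BytesIO(bytes(image)))))
```

Run against a copy of the volume rather than the original where possible, in line with the advice above to copy the LVM volumes first.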

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Should xfs_repair take this long?
@ 2007-03-16  0:20 Thomas Walker
  2007-03-16  1:32 ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16  0:20 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs


  Ok, here's the output of the command you wanted.  I ran it on both of the xfs file systems we have, both say bad superblock when trying to mount;

[root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
000000

[root@hla-ags ~]# dd if=/dev/mapper/vg1-hladata2 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
000000

   [root@hla-ags ~]# mount /hladata2
mount: wrong fs type, bad option, bad superblock on /dev/vg0/hladata3,
       or too many mounted file systems

  Thomas Walker



* Re: Should xfs_repair take this long?
  2007-03-16  0:20 Thomas Walker
@ 2007-03-16  1:32 ` David Chinner
  2007-03-16 11:15   ` Thomas Walker
  0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-03-16  1:32 UTC (permalink / raw)
  To: Thomas Walker; +Cc: David Chinner, xfs

On Thu, Mar 15, 2007 at 08:20:27PM -0400, Thomas Walker wrote:
> 
>   Ok, here's the output of the command you wanted.  I ran it on both of the xfs file systems we have, both say bad superblock when trying to mount;
> 
> [root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
> 000000

That failed - the output should be like:

 # dd if=/dev/mapper/test_vg-fred bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
000000 4658 4253 0000 0010 0000 0000 1000 0000
000010 0000 0000 0000 0000 0000 0000 0000 0000
000020 34a8 5343 d8e3 8d46 01a5 b1e4 3a76 ac05
000030 0000 0000 0800 0400 0000 0000 0000 8000
000040 0000 0000 0000 8100 0000 0000 0000 8200
000050 0000 0100 0200 0000 0000 0800 0000 0000
000060 0000 000a b430 0002 0001 1000 0000 0000
000070 0000 0000 0000 0000 090c 0408 0011 1900
000080 0000 0000 0000 803c 0000 0000 0000 0606
000090 0000 0000 0c00 16f5 0000 0000 0000 0000
0000a0 0000 0000 0000 0000 0000 0000 0000 0000
0000b0 0000 0000 0000 0200 0000 0000 0000 0000
0000c0 0000 0000 0000 0000 0000 0000 0000 0000
*
000200

Can you remove the redirect to /dev/null so we can see the error message?

Sorry about that.
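
The sample dump above can be read field by field. od -x prints 16-bit words in host byte order, so on x86 the words "4658 4253" are the bytes 58 46 53 42, which is "XFSB". The sketch below unpacks a few leading fields from the sample after undoing that swap. The field offsets are an assumption based on the classic XFS on-disk superblock layout (big-endian: magic at byte 0, block size at 4, data block count at 8, AG size at 84, AG count at 88), and the function name is made up for the example, so cross-check any real result with xfs_db.

```python
import struct

# First 96 bytes of the sample sector, transcribed into on-disk byte order.
SAMPLE = bytes.fromhex(
    "58465342" "00001000" "0000000000100000"      # magic, blocksize, dblocks
    "0000000000000000" "0000000000000000"          # rblocks, rextents
    "a8344353e3d8468da501e4b1763a05ac"             # uuid
    "0000000000080004" "0000000000000080"          # logstart, rootino
    "0000000000000081" "0000000000000082"          # rbmino, rsumino
    "00000001" "00020000" "00000008" "00000000"    # rextsize, agblocks, agcount, rbmblocks
)


def parse_geometry(sector):
    """Extract basic geometry from the start of an XFS superblock sector."""
    magic, blocksize, dblocks = struct.unpack_from(">4sIQ", sector, 0)
    agblocks, agcount = struct.unpack_from(">II", sector, 84)
    if magic != b"XFSB":
        raise ValueError("not an XFS superblock")
    return {
        "blocksize": blocksize,
        "dblocks": dblocks,
        "agblocks": agblocks,
        "agcount": agcount,
        # Each allocation group starts with its own superblock copy.
        "ag_offsets": [n * agblocks * blocksize for n in range(agcount)],
    }


if __name__ == "__main__":
    geometry = parse_geometry(SAMPLE)
    print(geometry["blocksize"], geometry["agcount"], geometry["ag_offsets"][:3])
```

For the sample this gives 4096-byte blocks and 8 allocation groups of 131072 blocks each, which is consistent with the data block count of 1048576 in the same sector.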

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Should xfs_repair take this long?
  2007-03-16  1:32 ` David Chinner
@ 2007-03-16 11:15   ` Thomas Walker
  2007-03-16 19:20     ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16 11:15 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs


    No problem.  The error was with the iflag=direct, apparently RHEL4 
doesn't like that option so I took it out.  Here's the output from each 
of the xfs volumes;

[root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 | od -Ax -x
1+0 records in
1+0 records out
000000 0000 6419 0000 0c00 0000 7419 0000 0c00
000010 0000 8419 0000 0c00 0000 9419 0000 0c00
000020 0000 a419 0000 0c00 0000 b419 0000 0c00
000030 0000 c419 0000 0c00 0000 d419 0000 0c00
000040 0000 e419 0000 0c00 0000 f419 0000 0c00
000050 0000 041a 0000 0c00 0000 141a 0000 0c00
000060 0000 241a 0000 0c00 0000 341a 0000 0c00
000070 0000 441a 0000 0c00 0000 541a 0000 0c00
000080 0000 641a 0000 0c00 0000 741a 0000 0c00
000090 0000 841a 0000 0c00 0000 941a 0000 0c00
0000a0 0000 a41a 0000 0c00 0000 b41a 0000 0c00
0000b0 0000 c41a 0000 0c00 0000 d41a 0000 0c00
0000c0 0000 e41a 0000 0c00 0000 f41a 0000 0c00
0000d0 0000 041b 0000 0c00 0000 141b 0000 0c00
0000e0 0000 241b 0000 0c00 0000 341b 0000 0c00
0000f0 0000 441b 0000 0c00 0000 541b 0000 0c00
000100 0000 641b 0000 0c00 0000 741b 0000 0c00
000110 0000 841b 0000 0c00 0000 941b 0000 0c00
000120 0000 a41b 0000 0c00 0000 b41b 0000 0c00
000130 0000 c41b 0000 0c00 0000 d41b 0000 0c00
000140 0000 e41b 0000 0c00 0000 f41b 0000 0c00
000150 0000 041c 0000 0c00 0000 141c 0000 0c00
000160 0000 241c 0000 0c00 0000 341c 0000 0c00
000170 0000 441c 0000 0c00 0000 541c 0000 0c00
000180 0000 641c 0000 0c00 0000 741c 0000 0c00
000190 0000 841c 0000 0c00 0000 941c 0000 0c00
0001a0 0000 a41c 0000 0c00 0000 b41c 0000 0c00
0001b0 0000 c41c 0000 0c00 0000 d41c 0000 0c00
0001c0 0000 e41c 0000 0c00 0000 f41c 0000 0c00
0001d0 0000 041d 0000 0c00 0000 141d 0000 0c00
0001e0 0000 241d 0000 0c00 0000 341d 0000 0c00
0001f0 0000 441d 0000 0c00 0000 541d 0000 0c00
000200



[root@hla-ags ~]# dd if=/dev/mapper/vg1-hladata2 bs=512 count=1 | od -Ax -x
1+0 records in
000000 7970 6f72 746f 203a 2030 0a2f 500a 414c
1+0 records out
000010 4e49 4b0a 3120 410a 560a 3120 0a34 6964
000020 2072 2e31 2e30 3572 392f 3830 4b0a 3420
000030 690a 746f 0a61 2056 3531 660a 6c69 2065
000040 2e6b 2e30 3472 362f 3833 450a 444e 450a
000050 444e 4552 0a50 6469 203a 2e30 2e30 3572
000060 312f 3131 0a30 7974 6570 203a 6964 0a72
000070 7270 6465 203a 2e30 2e30 3472 382f 3034
000080 630a 756f 746e 203a 0a35 6574 7478 203a
000090 2035 3031 3733 3620 2030 3036 3220 6262
0000a0 3538 6237 3138 3165 3235 6134 3939 6163
0000b0 6337 6165 3163 3636 3433 6663 0a66 7063
0000c0 7461 3a68 2f20 630a 706f 7279 6f6f 3a74
0000d0 3020 2f20 0a0a 2e63 2e30 3372 392f 3135
0000e0 6420 6c65 7465 2065 6166 736c 2065 6166
0000f0 736c 2065 412f 442f 472f 722f 6f68 0a0a
000100 2e6c 2e30 3474 312d 6d20 646f 6669 2079
000110 7274 6575 6620 6c61 6573 2f20 2f41 2f43
000120 7065 6973 6f6c 0a6e 0a0a 3131 3031 3120
000130 3332 0a38 0000 0000 0000 0000 0000 0000
000140 0000 0000 0000 0000 0000 0000 0000 0000
*
000200


         Does that help with a diag?

     Thomas Walker

* Re: Should xfs_repair take this long?
  2007-03-16 11:15   ` Thomas Walker
@ 2007-03-16 19:20     ` David Chinner
  2007-03-16 19:30       ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-03-16 19:20 UTC (permalink / raw)
  To: Thomas Walker; +Cc: xfs

On Fri, Mar 16, 2007 at 07:15:03AM -0400, Thomas Walker wrote:
> 
>    No problem.  The error was with the iflag=direct, apparently RHEL4 
> doesn't like that option so I took it out.  Here's the output from each 
> of the xfs volumes;
> 
> [root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 | od -Ax -x
> 1+0 records in
> 1+0 records out
> 000000 0000 6419 0000 0c00 0000 7419 0000 0c00
> 000010 0000 8419 0000 0c00 0000 9419 0000 0c00

That's not an XFS superblock :(

> [root@hla-ags ~]# dd if=/dev/mapper/vg1-hladata2 bs=512 count=1 | od -Ax -x
> 1+0 records in
> 000000 7970 6f72 746f 203a 2030 0a2f 500a 414c
> 000010 4e49 4b0a 3120 410a 560a 3120 0a34 6964
> 000020 2072 2e31 2e30 3572 392f 3830 4b0a 3420
> 000030 690a 746f 0a61 2056 3531 660a 6c69 2065

Neither is that - it's a bunch of text that doesn't
make much sense to me....

This would explain why xfs_repair is having trouble.

>         Does that help with a diag?

It tells us that something has either overwritten the start of the
partitions, or the LVM volumes have been put together incorrectly so
the superblocks are not where they should be. I'd check that the LVM
config is correct (again)....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Should xfs_repair take this long?
  2007-03-16 19:20     ` David Chinner
@ 2007-03-16 19:30       ` Eric Sandeen
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2007-03-16 19:30 UTC (permalink / raw)
  To: David Chinner; +Cc: Thomas Walker, xfs

David Chinner wrote:
> On Fri, Mar 16, 2007 at 07:15:03AM -0400, Thomas Walker wrote:
>>    No problem.  The error was with the iflag=direct, apparently RHEL4 
>> doesn't like that option so I took it out.  Here's the output from each 
>> of the xfs volumes;
>>
>> [root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 | od -Ax -x
>> 1+0 records in
>> 1+0 records out
>> 000000 0000 6419 0000 0c00 0000 7419 0000 0c00
>> 000010 0000 8419 0000 0c00 0000 9419 0000 0c00
> 
> That's not an XFS superblock :(

lmdd pattern?

>> [root@hla-ags ~]# dd if=/dev/mapper/vg1-hladata2 bs=512 count=1 | od -Ax -x
>> 1+0 records in
>> 000000 7970 6f72 746f 203a 2030 0a2f 500a 414c
>> 000010 4e49 4b0a 3120 410a 560a 3120 0a34 6964
>> 000020 2072 2e31 2e30 3572 392f 3830 4b0a 3420
>> 000030 690a 746f 0a61 2056 3531 660a 6c69 2065
> 
> Neither is that - it's a bunch of text that doesn't
> make much sense to me....

[esandeen@neon ~]$ echo "pyroot" | hexdump
0000000 7970 6f72 746f 000a

python?

*shrug*
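
The same byte-swap reasoning can be applied to whole dumps. A minimal sketch, assuming the dump came from a little-endian host (true for the x86 machines in this thread), that turns `od -Ax -x` output back into bytes. The helper name is hypothetical. Lines that are not dump lines (for example the "1+0 records in" noise from dd) are skipped, and runs elided by od with "*" are not expanded.

```python
import re

# One dump line: a 6-digit hex offset followed by one or more 4-digit hex words.
LINE_RE = re.compile(r"[0-9a-f]{6}((?: [0-9a-f]{4})+)")


def od_x_to_bytes(dump):
    """Convert `od -Ax -x` text (little-endian 16-bit words) back to bytes."""
    out = bytearray()
    for line in dump.splitlines():
        match = LINE_RE.fullmatch(line.strip())
        if match is None:
            continue  # offset-only lines, '*' markers, dd status lines
        for word in match.group(1).split():
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


if __name__ == "__main__":
    first_line = "000000 7970 6f72 746f 203a 2030 0a2f 500a 414c"
    print(od_x_to_bytes(first_line))
```

Feeding in more lines of the second dump makes the text readable without swapping byte pairs by hand.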


* Re: Should xfs_repair take this long?
@ 2007-03-16 20:09 Thomas Walker
  2007-03-16 20:52 ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16 20:09 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

  I already had xfs_repair scan the entire 6TB (took it 56 hours, which is the reason for the subject line).  So it couldn't find a SB anywhere on that volume and it walked all over it.  Therefore I guess the SB has been overwritten by something, maybe parted.  As for the LVM physicals being in the wrong order, I can try to reverse them but I'm really pretty sure I have it right.  Still, since the scan by xfs_repair couldn't find a SB anywhere I don't know what I would gain.
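
The 56-hour figure is roughly what one sequential pass over the whole volume would cost at a modest read rate. A back-of-the-envelope check, assuming "6TB" means 6 x 10^12 bytes and that the superblock search read every sector once (both are assumptions; the thread does not say):

```python
def scan_rate_mb_per_s(size_bytes, hours):
    """Average read rate implied by scanning `size_bytes` in `hours`."""
    return size_bytes / (hours * 3600) / 1e6


if __name__ == "__main__":
    rate = scan_rate_mb_per_s(6 * 10**12, 56)
    print(f"{rate:.1f} MB/s")  # about 29.8 MB/s
```

At that rate the duration reflects the exhaustive search for a secondary superblock rather than the repair phases themselves.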

   We don't have a backup of these volumes, but I'm told by the user that almost all the data can be retrieved again from our archive, it's just a pain in the neck to do that.  So while it would be nice to recover it won't be critical.

   Before wrapping this up, if you could just clarify a couple of things.  If I look at the bytes at the beginning of each physical part of the LVMs, what am I looking for?  "XFSB"?  If I do find that byte string, why couldn't xfs_repair find it when it did the scan, and what do I do with it if I do find one?  We see a software product called ufsexplorer that claims to be able to recover data without an XFS superblock; has anybody tried it?

    I appreciate your help and time,

  Thomas Walker


* Re: Should xfs_repair take this long?
  2007-03-16 20:09 Thomas Walker
@ 2007-03-16 20:52 ` David Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: David Chinner @ 2007-03-16 20:52 UTC (permalink / raw)
  To: Thomas Walker; +Cc: xfs

On Fri, Mar 16, 2007 at 04:09:00PM -0400, Thomas Walker wrote:
> 
>   I already had xfs_repair scan the entire 6TB (took it 56 hours, which is
>   the reason for the subject line).  So it couldn't find a SB anywhere on
>   that volume and it walked all over it.  Therefore I guess the SB has been
>   overwritten by something, maybe parted.  As for the LVM physicals being in
>   the wrong order, I can try to reverse them but I'm really pretty sure I
>   have it right.  Still, since the scan by xfs_repair couldn't find a SB
>   anywhere I don't know what I would gain.

xfs_repair did find candidate secondary superblocks - it discarded them
for some reason or another. If one had been OK, all repair would have done
is copy it to block zero and then continue.

I'm suggesting that you manually do this step, and then see if repair
will run.

>    Before wrapping this up, if you could just clarify a couple things.  If I
>    look at the bytes at the beginning of each physical part of the LVM's,
>    what am I looking for?  "XFSB"?

yes.

>    If I do find that byte string, why
>    couldn't xfs_repair find it when it did the scan and what do I do with it
>    if I do find one?

As I said above, xfs_repair did find some, but rejected them for some
(unknown) reason. If you find one, copy it over block zero of the partition,
and see if repair will run. Like I said, though, you'll probably want to
back up the partition first, or at least run repair in no-modify mode
(xfs_repair -n).
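
The manual step above can be sketched as follows. It is deliberately written to work on an image copy of the volume, not on the live device, and it saves the old first sector before overwriting it. The function name, the paths and the 512-byte sector size are assumptions made for the example. It copies the candidate sector verbatim, as suggested above, and does not try to judge whether the candidate is sane.

```python
import os
import tempfile

SECTOR_SIZE = 512
XFS_MAGIC = b"XFSB"


def copy_candidate_to_sector_zero(image_path, candidate_offset, backup_path):
    """Copy the sector at `candidate_offset` over sector 0 of `image_path`.

    The previous contents of sector 0 are written to `backup_path` first.
    """
    with open(image_path, "r+b") as img:
        img.seek(0)
        original = img.read(SECTOR_SIZE)
        img.seek(candidate_offset)
        candidate = img.read(SECTOR_SIZE)
        if len(candidate) != SECTOR_SIZE or not candidate.startswith(XFS_MAGIC):
            raise ValueError("no XFS magic at candidate offset")
        with open(backup_path, "wb") as bak:
            bak.write(original)
        img.seek(0)
        img.write(candidate)


if __name__ == "__main__":
    # Demo on a throwaway four-sector image with a candidate in sector 2.
    workdir = tempfile.mkdtemp()
    image = os.path.join(workdir, "copy.img")
    with open(image, "wb") as f:
        f.write(bytes(2 * SECTOR_SIZE) + XFS_MAGIC + bytes(2 * SECTOR_SIZE - 4))
    copy_candidate_to_sector_zero(image, 2 * SECTOR_SIZE,
                                  os.path.join(workdir, "sector0.bak"))
    with open(image, "rb") as f:
        print(f.read(4))
```

After a step like this, running xfs_repair with -n against the copy shows what repair would do without modifying anything.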

>    We see a software product called ufsexplorer that claims
>    to be able to recover data without an XFS superblock; has anybody tried it?

Given that a) it runs on Windows, and b) XFS support was apparently added
only a week ago, I doubt there are many people here who have tried it....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
