public inbox for linux-xfs@vger.kernel.org
* Should xfs_repair take this long?
@ 2007-03-15 11:27 Thomas Walker
  2007-03-15 14:04 ` Emmanuel Florac
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-15 11:27 UTC (permalink / raw)
  To: xfs

 
   I am trying to restore a corrupt xfs partition.  It is 6TB total, it 
is an LVM of two 3TB fiber channel SAN volumes.  The host is running 
RHEL4, 2.6.9-42.0.2.ELsmp, and the version of xfsprogs is 
xfsprogs-2.6.13-2.  The host has four threaded AMD Opterons, 4GB of RAM 
and 2GB of swap located on an internal SCSI disk.  It is unclear how the 
xfs partition was damaged, but it reports a bad superblock and will not 
mount.  I am running this command:

xfs_repair -o assume_xfs /dev/mapper/vg0-hladata3

   This command has been running for two days now.  There is cpu 
activity and i/o activity on the physical SAN.  There is some swapping 
but not an unusual amount and swapon -s shows only a small amount in 
use.  I have seen information implying xfs_repair needs a large amount 
of memory to work well, otherwise it will take a long time.  My question 
is, given my setup, is there an estimate of how long I should wait 
before expecting a result?  Should I add swap space?  Is there anything 
else I should do?

    thanks in advance for any help.

    Thomas Walker

* Re: Should xfs_repair take this long?
@ 2007-03-16  0:20 Thomas Walker
  2007-03-16  1:32 ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16  0:20 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs


  Ok, here's the output of the command you wanted.  I ran it on both of the xfs file systems we have; both report a bad superblock when I try to mount them:

[root@hla-ags ~]# dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
000000

[root@hla-ags ~]# dd if=/dev/mapper/vg1-hladata2 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
000000

   [root@hla-ags ~]# mount /hladata2
mount: wrong fs type, bad option, bad superblock on /dev/vg0/hladata3,
       or too many mounted file systems

  Thomas Walker


---- Original message ----
>Date: Fri, 16 Mar 2007 10:10:31 +1100
>From: David Chinner <dgc@sgi.com>  
>Subject: Re: Should xfs_repair take this long?  
>To: Thomas Walker <walker@stsci.edu>
>Cc: Emmanuel Florac <eflorac@intellique.com>, xfs@oss.sgi.com
>
>On Thu, Mar 15, 2007 at 10:06:42AM -0400, Thomas Walker wrote:
>> 
>>     The terminal shows a lot of "." dots running across the screen 
>> quickly, and every few hours it says this;
>> 
>> 
>> .....................................................found candidate 
>> secondary superblock...
>> unable to verify superblock, continuing...
>> found candidate secondary superblock...
>> unable to verify superblock, continuing...
>
>The primary superblock is not good, and it's trying to find a valid
>secondary superblock. Doesn't sound promising so far - repair can't
>start until a valid superblock is found....
>
>Can you dump the first sector of the device the filesystem is
>on:
>
># dd if=/dev/mapper/vg0-hladata3 bs=512 count=1 iflag=direct 2> /dev/null | od -Ax -x
>
>So we can see if that really holds a primary XFS superblock?
>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>Principal Engineer
>SGI Australian Software Group

* Re: Should xfs_repair take this long?
@ 2007-03-16 20:09 Thomas Walker
  2007-03-16 20:52 ` David Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Walker @ 2007-03-16 20:09 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs


  I already had xfs_repair scan the entire 6TB (took it 56 hours, which is the reason for the subject line).  So it couldn't find a SB anywhere on that volume and it walked all over it.  Therefore I guess the SB has been overwritten by something, maybe parted.  As for the LVM physicals being in the wrong order, I can try to reverse them but I'm really pretty sure I have it right.  Still, since the scan by xfs_repair couldn't find a SB anywhere I don't know what I would gain.

   We don't have a backup of these volumes, but I'm told by the user that almost all the data can be retrieved again from our archive, it's just a pain in the neck to do that.  So while it would be nice to recover it won't be critical.

   Before wrapping this up, could you clarify a couple of things?  If I look at the bytes at the beginning of each physical part of the LVM, what am I looking for?  "XFSB"?  If I do find that byte string, why couldn't xfs_repair find it when it did the scan, and what do I do with it if I do find one?  We have also seen a software product called ufsexplorer that claims to be able to recover data without an XFS super block; has anybody tried it?

    I appreciate your help and time,

  Thomas Walker

---- Original message ----
>Date: Sat, 17 Mar 2007 06:37:22 +1100
>From: David Chinner <dgc@sgi.com>  
>Subject: Re: Should xfs_repair take this long?  
>To: Thomas Walker <walker@stsci.edu>
>Cc: David Chinner <dgc@sgi.com>, xfs@oss.sgi.com
>
>On Fri, Mar 16, 2007 at 11:15:12AM -0400, Thomas Walker wrote:
>>   So maybe I got bit the same way.  parted may have overwritten something 
>> at the head of the volume.
>
>Doesn't look like partition blocks at the start of each volume, though.
>
>> Is there any way to repair the super block 
>> though?  It seems that everyone agrees xfs can't do anything until it 
>> has a super block somewhere and I don't seem to have one.
>
>That's because repair can't work out where things are supposed to
>be without a superblock to tell it critical information.
>Manually trying to find and repair a superblock is a hit and miss
>affair - at this point we don't even know if the primary superblocks
>have been overwritten or whether something else is wrong with LVM...
>
>> If there's no 
>> way to repair, then what about recovery? 
>
>In a word: backups.
>
>> I see mention of possibly 
>> doing an xfs dump to another disk, reformat the original volume, and 
>> then xfs restore back.  Is there any online procedure for how to do that 
>> if it applies to me here?
>
>You need to be able to mount the filesystem to dump it, so until you
>can run repair there's no simple recovery option.
>
>If the lvm config is correct and repair cannot find a valid
>secondary superblock, then you really need to start doing dangerous
>things to try to recover. I'd suggest taking a copy of the lvm
>volumes before doing anything else.
>
>Then, find a secondary superblock in the volume (the first 4 bytes
>of the sector are "XFSB", which is 58 46 53 42 in hex) and copy that
>sector to block zero of the filesystem. If repair still won't do its
>stuff, then you need to use xfs_db to modify that superblock until it
>does.  Then when repair runs, you get to look in lost+found and try
>to work out what all the broken bits are.....
>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>Principal Engineer
>SGI Australian Software Group
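
The manual procedure described above (find a sector that starts with
"XFSB", then copy it to block zero) can be sketched as follows. This is an
illustration under assumptions, not a tested recovery tool: the function
names find_sb_candidates and copy_sector_to_zero are hypothetical, a
512-byte sector size is assumed, and the grep options -o, -b, -U and -a
are the GNU grep ones. The grep scan is only practical on small test
images, since it will be slow and memory-hungry on a multi-terabyte
volume. In line with the advice to take a copy first, the write goes to a
copy and never to the original.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread). Sector size of 512 bytes
# is an assumption.

# Print the sector number of every 512-byte sector that begins with the
# bytes "XFSB". Matches at unaligned offsets are ignored.
find_sb_candidates() {
    LC_ALL=C grep -obUa 'XFSB' "$1" | while IFS=: read -r off _match; do
        if [ $((off % 512)) -eq 0 ]; then
            echo $((off / 512))
        fi
    done
}

# Copy one sector from src into sector 0 of dst. conv=notrunc keeps the
# rest of dst intact. dst should be a copy, not the damaged original.
copy_sector_to_zero() {
    src=$1
    sector=$2
    dst=$3
    dd if="$src" of="$dst" bs=512 skip="$sector" seek=0 count=1 \
        conv=notrunc 2>/dev/null
}

# Example (paths are placeholders):
#   find_sb_candidates volume-copy.img
#   copy_sector_to_zero volume-copy.img 2 working-copy.img
```

A sector that starts with "XFSB" is only a candidate. As the message above
notes, the copied superblock may still need adjusting with xfs_db before
repair will accept it.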


Thread overview: 15+ messages
2007-03-15 11:27 Should xfs_repair take this long? Thomas Walker
2007-03-15 14:04 ` Emmanuel Florac
2007-03-15 14:06   ` Thomas Walker
     [not found]     ` <20070315160309.652a6e0c@harpe.intellique.com>
     [not found]       ` <45F96150.50001@stsci.edu>
2007-03-15 15:23         ` Emmanuel Florac
2007-03-15 15:27           ` Thomas Walker
2007-03-15 23:10     ` David Chinner
2007-03-16 15:15       ` Thomas Walker
2007-03-16 19:37         ` David Chinner
  -- strict thread matches above, loose matches on Subject: below --
2007-03-16  0:20 Thomas Walker
2007-03-16  1:32 ` David Chinner
2007-03-16 11:15   ` Thomas Walker
2007-03-16 19:20     ` David Chinner
2007-03-16 19:30       ` Eric Sandeen
2007-03-16 20:09 Thomas Walker
2007-03-16 20:52 ` David Chinner
