Lost Superblock and need help recovering

public inbox for linux-xfs@vger.kernel.org

* Lost Superblock and need help recovering
From: Javier Gomez @ 2008-05-26  2:40 UTC
  To: xfs



    We are currently running a few Coraid AoE devices, which we have
formatted using RAID-5 and the XFS filesystem.  The devices were shut
down abruptly, causing what looks like some data issues.  We are running
a Red Hat 5 head unit connected to the disk array.  When the devices came
back up we were unable to remount them.  The checking tools report that
neither the primary superblock nor any secondary superblock can be found.
We have two devices, each with 13 TB of disk space, and both show what
seems to be the same issue.  These devices were used as backup storage,
so they are the backup, but this historical information is very critical
to us.  We ran "xfs_repair -nv /dev/etherd/e4.1p1" (xfs_repair version
2.9.8) to see if it found the potential issues.  It came back with the
output noted below.  Does anyone have any suggestions for pulling this
information off the drive and/or correcting this issue?  What other
tools should I run to get more information?  Thanks for any support or
suggestions you can provide.


 > xfs_repair -nv /dev/etherd/e4.1p1
---------------------------------------------------------------
Phase 1 - find and verify superblock...
error reading superblock 4 -- seek to offset 1219003957248 failed
couldn't verify primary superblock - bad magic number !!!

attempting to find secondary superblock...
................................................................................................
......................................................
..................
..................
............Sorry, could not find valid secondary superblock
Exiting now.
---------------------------------------------------------------


* Re: Lost Superblock and need help recovering
From: Eric Sandeen @ 2008-05-26  3:40 UTC
  To: Javier Gomez; +Cc: xfs

Javier Gomez wrote:

>  > xfs_repair -nv /dev/etherd/e4.1p1
> ---------------------------------------------------------------
> Phase 1 - find and verify superblock...
> error reading superblock 4 -- seek to offset 1219003957248 failed
> couldn't verify primary superblock - bad magic number !!!

Looks to me like you still have storage problems.

1219003957248 is just over 1 terabyte... why can't repair seek to that
location if it's a 13T device?

What does /proc/partitions say about this block device (or do AoE
devices go there?)

-Eric


* Re: Lost Superblock and need help recovering
From: Javier Gomez @ 2008-05-26 10:35 UTC
  To: Eric Sandeen; +Cc: xfs


    The two devices having issues are /dev/etherd/e5.1p1 and
/dev/etherd/e4.1p1.

    You make a very valid point.  Notice that the main device shows the
full size (one has 12.6 TB and the other 9.5 TB).  Each of these two
devices contains a single partition that is supposed to take up the full
size of the device, but both "1p1" partitions are showing up far short
of that.  Note that for devices /dev/etherd/e3.1, /dev/etherd/e7.1 and
/dev/etherd/e7.2 we formatted the xfs filesystem directly on the device.
The groups on the net had noted that it could be done either way, but
that it might be a little safer to format xfs directly on the device
(not sure if this is valid).  In this case /dev/etherd/e3 and
/dev/etherd/e7 both came up just fine after the hard shutdown, while
/dev/etherd/e4 and /dev/etherd/e5 both have this superblock issue.  All
of these devices are running the same setup, except that /dev/etherd/e5
is slightly smaller than the others in disk space.  See the information
below; do you have any suggestions for recovering from this?  Is there
any way to remap the partition description to fill the entire size
correctly so that xfs_repair can complete its job?
   
    Thanks again for any help...
                      Javier

[root@seer proc]# cat partitions
major minor  #blocks  name

   8     0  243163136 sda
   8     1     104391 sda1
   8     2  243055417 sda2
 253     0  241008640 dm-0
 253     1    2031616 dm-1
 152     0 12697913278 etherd/e4.1
 152     1 1960494281 etherd/e4.1p1
 152    16 12697913278 etherd/e3.1
 152    32 12697913278 etherd/e7.1
 152    48 9523468862 etherd/e5.1
 152    49  933533929 etherd/e5.1p1
 152    64  976762558 etherd/e7.2






* Re: Lost Superblock and need help recovering
From: Eric Sandeen @ 2008-05-26 14:49 UTC
  To: Javier Gomez; +Cc: xfs

Javier Gomez wrote:
> 
>     The two devices having issues are /dev/etherd/e5.1p1    and   
> /dev/etherd/e4.1p1
> 
>     You make a very valid point.  Notice the main device shows the full
> size (one has 12.6 TB and the other is 9.5 TB).  Each of these two
> devices contain a single complete partition on it taking up the full
> size of the device.  It looks like both of these are short on the size
> for the actual partition "1p1".  

Yep....

> Note that for device /dev/etherd/e3.1
> and /dev/etherd/e7.1  and /dev/etherd/e7.2 we formatted the xfs
> filesystem directly on the device.  The groups on the net had noted that
> it could be done either way, but it might be a little safer to do it
> with the xfs formatted directly on the device (not sure if this is
> valid).  

From the xfs perspective, it does not really matter.

> In this case /dev/etherd/e3 and /dev/etherd/e7 both came up
> just fine after the hard shutdown while the /dev/etherd/e4 and
> /dev/etherd/e5 both have this superblock issue.  

If we look at those devices in /proc/partitions:

>  152     0 12697913278 etherd/e4.1	<-- 11.8TiB
>  152     1 1960494281 etherd/e4.1p1	<--  1.8TiB
>  152    48 9523468862 etherd/e5.1	<--  8.8TiB
>  152    49  933533929 etherd/e5.1p1	<--  0.9TiB

you can see that the partitions don't actually seem to span much of the
device.  I don't know how that happened, but it's unlikely to be an xfs
problem.... perhaps if you can figure out what went wrong there, and get
your partitions back to the right(?) size, xfs will see a consistent
filesystem.
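
(An illustrative sketch, not part of the original message.)  The sizes
can be double-checked with a few lines of Python.  /proc/partitions
reports its #blocks column in 1 KiB units, so the figures above work out
to TiB.  The dictionary and the function name kib_to_tib are made up for
this example:

```python
# Sizes copied from the /proc/partitions listing in this thread.
# /proc/partitions reports #blocks in units of 1 KiB (1024 bytes).
PARTITIONS_KIB = {
    "etherd/e4.1": 12697913278,
    "etherd/e4.1p1": 1960494281,
    "etherd/e5.1": 9523468862,
    "etherd/e5.1p1": 933533929,
}

KIB_PER_TIB = 2**30  # 1 TiB = 2**40 bytes = 2**30 KiB


def kib_to_tib(blocks_kib: int) -> float:
    """Convert a /proc/partitions block count (1 KiB units) to TiB."""
    return blocks_kib / KIB_PER_TIB


if __name__ == "__main__":
    for name, blocks in PARTITIONS_KIB.items():
        print(f"{name:15s} {kib_to_tib(blocks):6.2f} TiB")
```

Each partition comes out at a small fraction of its parent device, which
is the mismatch being pointed at here.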

> Each of these devices
> are running the same stuff except that /dev/etherd/e5 is slightly
> smaller than the other ones in disk space.  See this information below,
> do you have any suggestions to recover from it?  Is there any way to
> remap the partition description to fill the entire size correctly so
> that the xfs_repair can complete its job?

What sort of partition tables are on the devices?  I'll hazard a guess
that they're dos partition tables made with parted?  Hmm, yep, from
looking at the sizes of your devices and partitions, it does appear that
the high bits of the size have been lost.

If so, then you've been bitten by a parted bug that lets you "create" dos
partitions larger than the table can actually store on-disk (2T IIRC),
so that when you reboot, the partition appears to be truncated.  However,
the xfs data is still there, if so.
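
(An illustrative sketch, not part of the original message.)  One way to
test the "lost high bits" guess is to see whether the space missing from
each partition is close to a whole multiple of 2 TiB.  This assumes
512-byte sectors and a 32-bit sector count in the dos table; the
function name lost_high_bits is made up for this example:

```python
SECTOR_BYTES = 512  # assumed sector size

# A DOS (MBR) partition entry stores its length as a 32-bit sector count,
# so a larger size wraps modulo 2**32 sectors (2 TiB with 512-byte sectors).
WRAP_KIB = (2**32 * SECTOR_BYTES) // 1024


def lost_high_bits(device_kib: int, partition_kib: int):
    """Split the missing space into (whole 2 TiB wraps, leftover KiB)."""
    return divmod(device_kib - partition_kib, WRAP_KIB)


if __name__ == "__main__":
    # Sizes in 1 KiB blocks, from the /proc/partitions listing above.
    print("e4.1:", lost_high_bits(12697913278, 1960494281))
    print("e5.1:", lost_high_bits(9523468862, 933533929))
```

For both devices the leftover is well under 1 MiB, which fits a
partition that was meant to span nearly the whole device and then had
its size wrapped.  It is consistent with the guess above, not proof of
it.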

Depending on how big the dos partition table is I think some people have
successfully replaced it with a GPT table, which can handle these larger
sizes.  Doing that is a little tricky, and backing up the old table with
dd is well-advised.

-Eric


* Re: Lost Superblock and need help recovering
From: Javier Gomez @ 2008-05-26 15:13 UTC
  To: Eric Sandeen, xfs


    Thanks for the feedback.  You're right on the mark.  We did use
"parted" to create the partitions on this device.  That would explain
the issue we are having right now.  Do you have any suggestions on what
to do next to correct this issue?  I have not seen any clear information
on the net about it.  The information on these devices is very important
to us, and it is critical that we get it up again prior to tomorrow
(or reasonably soon after).

    How would you suggest we try to repair the partition table?  Also,
given that it's 13 TB, a "dd" to back up the device will take a long
time, and I am not sure what dd command to run that will grab the data
correctly given the bad partition information currently in place.
                Javier







* Re: Lost Superblock and need help recovering
From: Eric Sandeen @ 2008-05-26 16:25 UTC
  To: Javier Gomez; +Cc: xfs

Javier Gomez wrote:
> 
>     Thanks for the feedback.  You're right on the mark.  We did use
> "parted" to create the partitions on this device.  That would explain
> the issue we are having right now.  Do you have any suggestions on what
> to do next to correct this issue?  I have not seen any clear information
> on the net about this issue.  The information on these devices is very
> important to us and very critical we get it up again prior to tomorrow
> (or reasonably soon after).
> 
>     How would you suggest we try to repair the partition table?  Also
> given that it's 13 TB a "dd" to backup the device will take a long time
> and I am also not sure what dd command to run that will grab the data
> correctly given the bad partition information currently in place.
>                 Javier

Basically, you want to replace the dos partition table with a GPT
partition table, without overwriting any of your filesystem (on dos
partition #1).  I can give you a basic walkthrough, but do your own
thinking and don't assume that what I'm saying here is 100% perfect and
infallible.  This is the general idea.

For the backup, I'm just recommending backing up the partition table.

So I would use parted, and set the units to "sectors" :

(parted) unit s

print it out and you'll see where it starts:

Number  Start    End        Size       Type     File system  Flags
 1      63s      XXXXXXs    xxxxxxs    primary  ext3

So the original partition starts at sector #63; therefore I'd back up
the first 64 sectors:

dd if=/dev/etherd/e4.1 bs=512 count=64 of=e4.1.table.backup

Actually if it were me I'd probably back up a bit more in case something
goes wrong in the next steps, e.g. count=256 or so.  That'll get the dos
table and the first part of the fs, in case it were to get overwritten.
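
(An illustrative sketch, not part of the original message.)  The same
backup can be done from Python if dd is not convenient.  The function
name backup_head and the output file name are made up for this example,
and 512-byte sectors are assumed.  It only reads from the device:

```python
def backup_head(device_path, backup_path, sectors=256, sector_bytes=512):
    """Copy the first `sectors` sectors of device_path into backup_path.

    Roughly what `dd if=DEVICE bs=512 count=SECTORS of=BACKUP` does.
    Returns the number of bytes copied so the caller can check for a
    short read.
    """
    wanted = sectors * sector_bytes
    # The device is opened read-only; only the backup file is written.
    with open(device_path, "rb") as src, open(backup_path, "wb") as dst:
        data = src.read(wanted)
        dst.write(data)
    return len(data)


if __name__ == "__main__":
    import sys

    # Example: python backup_head.py /dev/etherd/e4.1 e4.1.table.backup
    copied = backup_head(sys.argv[1], sys.argv[2])
    print(f"copied {copied} bytes")
```

Store the backup file on a different disk from the one being repaired,
and confirm that the byte count matches sectors * 512 before going on.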

Now you want to remove the dos partition table and add a gpt partition
table.  Essentially, what you have now is:

1: [  dos table ][xfs filesystem data ... ]

then remove the dos partition table with parted to get:

2: [    empty   ][xfs filesystem data ... ]

And add a new gpt table with parted, with the first partition at exactly
the same start-point (63s) but this time extending to the end of the device:

3: [gpt][ empty ][xfs filesystem data ... ]

but this requires 3 things:

* the gpt table must fit in the first 63 sectors to not overwrite the
xfs filesystem (IIRC it should fit).
* the gpt table must point to a first-partition start point exactly the
same as what the dos table pointed to (sector 63) (I assume this starts
at 0 so sector 63 is the 64th sector; in any case you'd tell parted to
start at "63s" AFAIK.)
* the gpt table doesn't write anything to the *end* of the device, or if
it does, it's not clobbering any of the filesystem.

The last part is probably the trickiest; IIRC gpt can write backup
tables at the end of the device; however it's possible that your
filesystem doesn't actually extend that far.  I suppose I would use dd
to copy out the last few sectors of your pristine device as well, to
keep a copy before you do this.  Then I'd probably strace parted and
save output to a file to see where it actually wrote data when I created
the gpt table.  Maybe this is all overkill but it'd be safest.
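
(An illustrative sketch, not part of the original message.)  The first
and third requirements can be estimated ahead of time.  This assumes
512-byte sectors and the usual GPT defaults of 128 entries of 128 bytes
each, which put the primary table in sectors 0-33 and the backup table
in the last 33 sectors of the device.  It does not show what parted
actually writes, so it complements the strace check rather than
replacing it.  All names here are made up for this example:

```python
SECTOR_BYTES = 512      # assumed logical sector size
GPT_ENTRIES = 128       # usual default number of partition entries
GPT_ENTRY_BYTES = 128   # usual default size of one entry


def gpt_regions(device_sectors):
    """Return inclusive (first, last) sector ranges used by the primary
    and the backup GPT structures on a device of the given size."""
    entry_sectors = (GPT_ENTRIES * GPT_ENTRY_BYTES) // SECTOR_BYTES  # 32
    # Sector 0: protective MBR, sector 1: GPT header, then the entry array.
    primary = (0, 1 + entry_sectors)
    # Backup entry array, followed by the backup header in the last sector.
    last = device_sectors - 1
    backup = (last - entry_sectors, last)
    return primary, backup


def overlaps(region, first, last):
    """True if the inclusive range `region` intersects [first, last]."""
    return region[0] <= last and first <= region[1]


def gpt_is_clear_of(device_sectors, data_first, data_last):
    """True if neither GPT copy touches sectors data_first..data_last."""
    primary, backup = gpt_regions(device_sectors)
    return not (overlaps(primary, data_first, data_last)
                or overlaps(backup, data_first, data_last))


if __name__ == "__main__":
    # e4.1 is 12697913278 KiB in /proc/partitions, i.e. twice as many sectors.
    device_sectors = 12697913278 * 2
    print(gpt_regions(device_sectors))
    # Where the filesystem really ends must be read from the filesystem
    # itself; the value below is a placeholder, not a measurement.
    hypothetical_fs_last = device_sectors - 100
    print(gpt_is_clear_of(device_sectors, 63, hypothetical_fs_last))
```

With a start at sector 63 the primary table has room to spare, so the
open question is only whether the filesystem reaches into the last 33
sectors.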

After you've written the gpt table and convinced yourself that it didn't
overwrite any of the filesystem, I'd probably try an xfs_repair -n to
see if all looks well...

-Eric (who thinks maybe this is a common enough problem that it warrants
a faq, and maybe even automatic recovery script...)

p.s.

I suppose one other alternative, which is less involved but isn't a 100%
fix, would be to simply delete and recreate the *dos* partition table
with parted.  This would get you back the whole partition size for this
session, but it'd get lost on reboot.  This works for the current
session because parted actually pokes the too-large partition size
directly into the kernel when it writes the table, even though it can't
re-read it from disk on the next boot.


* Re: Lost Superblock and need help recovering
From: Javier Gomez @ 2008-05-26 19:46 UTC
  To: Eric Sandeen; +Cc: xfs



    Thank you very much for the help on this one.  As of right now we
are back up and running.  We actually followed your secondary suggestion
to just "simply delete and recreate the *dos* partition table".  This
worked great for the first drive, e4.1, but we had a number of issues
with e5.1.  With a lot of guesswork, we finally got the e5.1 partition
to come up as well, but we also had to run xfs_repair on that drive.
Now that the devices are running we are going to move all of the data
off to another device and then just reformat the entire unit again
(without parted this time).  Again, thank you very much for your
suggestions on this issue.
                Javier




