public inbox for linux-xfs@vger.kernel.org
* xfs_repair force_geometry
@ 2013-05-13 12:24 Benedikt Schmidt
  2013-05-13 16:58 ` Michael L. Semon
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Benedikt Schmidt @ 2013-05-13 12:24 UTC (permalink / raw)
  To: xfs

Hi,
currently I'm looking for the correct usage of the force_geometry option 
of xfs_repair. I wasn't able to find any documentation on this option 
besides the fact that it exists. Could somebody please explain it to me?

For a more detailed description of my problem: I've got a hard disk 
which is dying at the moment, so I copied all of its content with dd_rescue 
to a new and bigger one. Using xfs_copy wasn't possible, as the 
filesystem was already corrupted. So now I've got nearly everything on 
the second hard disk (dd_rescue couldn't copy roughly 6 or 7 
MB), but I cannot mount the filesystem or even run xfs_repair on it, as 
it fails to find a superblock. I think the problem lies in the fact that 
the new disk has a different geometry than the previous one.

Kind regards,
Benedikt Schmidt

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs_repair force_geometry
  2013-05-13 12:24 xfs_repair force_geometry Benedikt Schmidt
@ 2013-05-13 16:58 ` Michael L. Semon
  2013-05-13 17:56 ` Stan Hoeppner
  2013-05-13 22:15 ` Eric Sandeen
  2 siblings, 0 replies; 9+ messages in thread
From: Michael L. Semon @ 2013-05-13 16:58 UTC (permalink / raw)
  To: benediktibk; +Cc: Benedikt Schmidt, xfs

Ordinary user comments follow...

On 05/13/2013 08:24 AM, Benedikt Schmidt wrote:
> Hi,
> currently I'm looking for the correct usage of the force_geometry option
> of xfs_repair. I wasn't able to find any documentation on this option
> besides the fact that it exists. Could somebody please explain it to me?

According to the xfs_repair man page, it simply means to skip the geometry 
test.  In other words, it's not a place to supply CHS values or specify 
that your new drive has 4k sectors.  Usage is '-o force_geometry'.

> For a more detailed description of my problem: I've got a hard disk
> which is dying at the moment, so I copied all of its content with dd_rescue
> to a new and bigger one. Using xfs_copy wasn't possible, as the
> filesystem was already corrupted. So now I've got nearly everything on
> the second hard disk (dd_rescue couldn't copy roughly 6 or 7
> MB), but I cannot mount the filesystem or even run xfs_repair on it, as
> it fails to find a superblock. I think the problem lies in the fact that
> the new disk has a different geometry than the previous one.
>
> Kind regards,
> Benedikt Schmidt

Did you ddrescue to a regular file or to a new partition?  In my case, 
ddrescue didn't let me do a simple `ddrescue /dev/sda8 /dev/sdb6`, so I 
passed on using ddrescue's suggested '--force' flag and instead did 
something like this:

root@plbearer:~# ddrescue /dev/sda8 /mnt/xfstests-test/fs_as_file

GNU ddrescue 1.16
Press Ctrl-C to interrupt
rescued:   671088 kB,  errsize:       0 B,  current rate:    9043 kB/s
    ipos:   671023 kB,   errors:       0,    average rate:   31956 kB/s
    opos:   671023 kB,     time since last successful read:       0 s
Finished

# -f means "the filesystem image is stored in a regular file"
# -n means "no modify: scan the filesystem, but don't write to it"
root@plbearer:~# xfs_repair -f -n -o force_geometry 
/mnt/xfstests-test/fs_as_file
Phase 1 - find and verify superblock...
Phase 2 - using internal log
         - scan filesystem freespace and inode maps...
         - found root inode chunk
Phase 3 - for each AG...
         - scan (but don't clear) agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

root@plbearer:~# mount -t xfs -o loop /mnt/xfstests-test/fs_as_file 
/mnt/loopback/

With the regular file mounted loopback, this is the part where I would 
somehow get the data to a freshly-made XFS filesystem somewhere else, 
then verify that the data that I need is indeed still intact and valid. 
There are lots of tools to do such operations.  I lean on 
xfsdump/xfsrestore because I have longstanding good experience with the 
programs and see such situations as a really bad time to learn new tools.
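As a sketch of what that step can look like (the mount points here are 
placeholders from my example above, not anything from your setup):

```shell
# Stream everything from the loop-mounted rescue copy straight into a
# freshly made XFS filesystem mounted elsewhere; -J on both ends skips
# the dump/restore inventory bookkeeping, which is the usual choice for
# a one-shot filesystem-to-filesystem copy.
xfsdump -J - /mnt/loopback | xfsrestore -J - /mnt/newfs
```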

If this is your last copy of important data, you should a) use the '-n' 
flag of xfs_repair before deciding to modify the ddrescued filesystem, 
and/or b) duplicate your recovered filesystem to another place so you 
have at least one good emergency backup.

Good luck, and be careful!

Michael


* Re: xfs_repair force_geometry
  2013-05-13 12:24 xfs_repair force_geometry Benedikt Schmidt
  2013-05-13 16:58 ` Michael L. Semon
@ 2013-05-13 17:56 ` Stan Hoeppner
  2013-05-13 22:15 ` Eric Sandeen
  2 siblings, 0 replies; 9+ messages in thread
From: Stan Hoeppner @ 2013-05-13 17:56 UTC (permalink / raw)
  To: xfs

On 5/13/2013 7:24 AM, Benedikt Schmidt wrote:
> Hi,
> currently I'm looking for the correct usage of the force_geometry option
> of xfs_repair. I wasn't able to find any documentation on this option
> besides the fact that it exists. Could somebody please explain it to me?

> For a more detailed description of my problem: I've got a hard disk
> which is dying at the moment, so I copied all of its content with dd_rescue
> to a new and bigger one. Using xfs_copy wasn't possible, as the
> filesystem was already corrupted. 

This is a standalone disk which wholly contains an XFS filesystem, yes?
 The filesystem is corrupted and cannot be repaired?  And it won't mount?

> So now I've got nearly everything on
> the second hard disk (dd_rescue couldn't copy roughly 6 or 7
> MB), but I cannot mount the filesystem or even run xfs_repair on it, as
> it fails to find a superblock. I think the problem lies in the fact that
> the new disk has a different geometry than the previous one.

Given the fact that the filesystem is already corrupted and not
repairable, and the disk is in a state of failure, it's not surprising
you can't repair or mount after the ddrescue copy, which now contains
the same corrupted structures.

If the original can't be repaired nor mounted, why would you believe a
sector-by-sector copy could be?

-- 
Stan


* Re: xfs_repair force_geometry
  2013-05-13 12:24 xfs_repair force_geometry Benedikt Schmidt
  2013-05-13 16:58 ` Michael L. Semon
  2013-05-13 17:56 ` Stan Hoeppner
@ 2013-05-13 22:15 ` Eric Sandeen
  2013-05-14  5:11   ` Benedikt Schmidt
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2013-05-13 22:15 UTC (permalink / raw)
  To: benediktibk; +Cc: Benedikt Schmidt, xfs

On 5/13/13 7:24 AM, Benedikt Schmidt wrote:
> Hi, currently I'm looking for the correct usage of the force_geometry
> option of xfs_repair. I wasn't able to find any documentation on
> this option besides the fact that it exists. Could somebody please
> explain it to me?
> 
> For a more detailed description of my problem: I've got here a hard
> disk which is dying at the moment, so I copied all the content with
> dd_rescue to a new and bigger one. Using xfs_copy wasn't possible, as
> the filesystem was already corrupted. So now I've got nearly
> everything on the second hard disk (dd_rescue couldn't copy roughly
> 6 or 7 MB), but I cannot mount the filesystem or even run
> xfs_repair on it, as it fails to find a superblock. I think the
> problem lies in the fact that the new disk has a different geometry
> than the previous one.

The geometry in "force_geometry" refers to the filesystem geometry,
not the CHS geometry of your disk.

It's only needed if the fs has only 2 allocation groups and they don't
match, or if the fs has only a single allocation group (and therefore
has nothing to test against).

So I don't think that's the option you need.
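If you're curious what that check compares, the allocation group fields 
can be inspected read-only with xfs_db; something along these lines (the 
device name below is just an example):

```shell
# Print the AG geometry recorded in the primary superblock (sb 0) and in
# the first backup superblock (sb 1), opening the device read-only (-r).
# On a healthy fs, agcount/agblocks/blocksize agree across all copies.
xfs_db -r -c "sb 0" -c "print agcount agblocks blocksize" /dev/sdc1
xfs_db -r -c "sb 1" -c "print agcount agblocks blocksize" /dev/sdc1
```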

I don't know what you copied the fs to, but perhaps you copied the
entire disk, not the partition.  

How did you invoke dd_rescue?

If you dd_rescued to a file, what does:

# file <file you dd'd to>

say?

Or, if you dd_rescued to a device, what does

# file -s <dev you dd'd to>

say?

-Eric


* Re: xfs_repair force_geometry
  2013-05-13 22:15 ` Eric Sandeen
@ 2013-05-14  5:11   ` Benedikt Schmidt
  2013-05-14  7:50     ` Michael L. Semon
  0 siblings, 1 reply; 9+ messages in thread
From: Benedikt Schmidt @ 2013-05-14  5:11 UTC (permalink / raw)
  To: xfs



First of all: Thanks for your very fast and helpful response.

I actually copied only the partition, not the whole disk: dd_rescue 
--force -r1 /dev/sdd1 /dev/sdc1
The reason for this is that I don't have enough space left on any other 
device to store a whole copy of the faulty disk. I thought it would be 
possible, as in some examples I found with Google, to rescue 
a partition directly.

file -s /dev/sdc1 says:
/dev/sdc1: data

The disks look like this (fdisk -l):
Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcba506ee

   Device  Boot      Start         End     Blocks   Id System
/dev/sdc1             256   732566645   366283195   83 Linux

Disk /dev/sdd: 2000.4 GB, 2000397852160 bytes, 3907027055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3c34826b

   Device  Boot      Start         End     Blocks   Id System
/dev/sdd1              63  3907024064  1953512001   83 Linux

If it is not possible to rescue the partition this way, I will have to 
extend my RAID5 so that I can put a copy of the faulty disk on it, 
as Michael explained in his answer. I had just hoped I could avoid 
this, because it would save me more than 100 EUR.

One last piece of information: the content of this copy would not be 
totally lost; actually, only the last few files I added would be. All the 
other stuff is already stored on the RAID5; only the latest files are 
not contained in that backup. So I won't lose everything if something 
goes wrong (at least that's something :-) ).

Kind regards,
Benedikt


* Re: xfs_repair force_geometry
  2013-05-14  5:11   ` Benedikt Schmidt
@ 2013-05-14  7:50     ` Michael L. Semon
  2013-05-14  8:56       ` Benedikt Schmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Michael L. Semon @ 2013-05-14  7:50 UTC (permalink / raw)
  To: benediktibk; +Cc: Benedikt Schmidt, xfs

On 05/14/2013 01:11 AM, Benedikt Schmidt wrote:
> First of all: Thanks for your very fast and helpful response.
>
> I actually copied only the partition, not the whole disk: dd_rescue
> --force -r1 /dev/sdd1 /dev/sdc1
> The reason for this is that I don't have enough space left on any other
> device to store a whole copy of the faulty disk. I thought it would be
> possible, as in some examples I found with Google, to rescue
> a partition directly.

Understood.  This seems like a valid option.  Had fdisk, cfdisk, and 
gdisk been more cooperative over the past year, this would have been my 
first option.

> file -s /dev/sdc1 says:
> /dev/sdc1: data

This is different from what I got, but maybe Eric sees something in your 
answer.

> The disks look like this (fdisk -l):
> Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0xcba506ee
>
>    Device  Boot      Start         End     Blocks   Id System
> /dev/sdc1             256   732566645   366283195   83 Linux
>
> Disk /dev/sdd: 2000.4 GB, 2000397852160 bytes, 3907027055 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x3c34826b
>
>    Device  Boot      Start         End     Blocks   Id System
> /dev/sdd1              63  3907024064  1953512001   83 Linux
>
> If it is not possible to rescue the partition this way, I will have to
> extend my RAID5 so that I can put a copy of the faulty disk on it,
> as Michael explained in his answer. I had just hoped I could avoid
> this, because it would save me more than 100 EUR.

I didn't have much information to go on, so I set up the safest possible 
scenario and hoped that you were getting results close to that.

If the few extra files that you're rescuing aren't worth 100 euros, then 
it's not worth 100 euros to make a duplicate of a dump of an 
already-damaged filesystem.

The crazy, reckless guide is this:

1) use `xfs_repair -n /dev/sdc1`.  If that looks nice,...

2) use `xfs_repair /dev/sdc1`...

   a) A repaired partition is a good sign.  Mount that partition!

   b) If the "attempting to find secondary superblock" search ends in 
"Sorry, could not find valid secondary superblock," then maybe something 
went wrong in the original data transfer.  You might have to give this 
step some time to complete, and it will print dots for a while.  Either 
that or the failures in your hard drive really did hit all of the 
superblocks.

   c) If the "attempting to find secondary superblock" finds something, 
it might make everything well but spit some files into lost+found.  If 
the repair goes badly, there's a chance you'll be using dd to look for 
your data.

   d) If it's something else (xfs_repair segfaults, needs to be run 
again, whatever), mention it, and at least you'll be closer to the real 
answer.

3) If all else fails, and especially when a backup is handy, you could 
try `xfs_repair -L` to zero the log.  This helps when xfs_repair asks 
you to mount the filesystem to allow metadata updates to happen, but 
Linux has an oops as the filesystem is mounted.  In many other 
scenarios, it can work against you.  This is the second-to-last resort.

> One last piece of information: the content of this copy would not be
> totally lost; actually, only the last few files I added would be. All the
> other stuff is already stored on the RAID5; only the latest files are
> not contained in that backup. So I won't lose everything if something
> goes wrong (at least that's something :-) ).

Really, it becomes a question of whether it would be faster to search 
for the data using dd and grep, use xfs_repair and hope it works, or 
recreate the data from scratch.

The hope is that dd_rescue does a credible job for you, and that XFS can 
be made to mount something, somewhere, so that you can grab those last 
few files.  The very last resort would be to do all of this repair stuff 
on the original damaged partition, but the safety net goes away after that.

Michael


* Re: xfs_repair force_geometry
  2013-05-14  7:50     ` Michael L. Semon
@ 2013-05-14  8:56       ` Benedikt Schmidt
  2013-05-14 12:35         ` Stan Hoeppner
  0 siblings, 1 reply; 9+ messages in thread
From: Benedikt Schmidt @ 2013-05-14  8:56 UTC (permalink / raw)
  To: xfs

On 14.05.2013 09:50, Michael L. Semon wrote:
> On 05/14/2013 01:11 AM, Benedikt Schmidt wrote:
>> First of all: Thanks for your very fast and helpful response.
>>
>> I actually copied only the partition, not the whole disk: dd_rescue
>> --force -r1 /dev/sdd1 /dev/sdc1
>> The reason for this is that I don't have enough space left on any other
>> device to store a whole copy of the faulty disk. I thought it would be
>> possible, as in some examples I found with Google, to rescue
>> a partition directly.
>
> Understood.  This seems like a valid option.  Had fdisk, cfdisk, and 
> gdisk been more cooperative over the past year, this would have been 
> my first option.
>
>> file -s /dev/sdc1 says:
>> /dev/sdc1: data
>
> This is different from what I got, but maybe Eric sees something in 
> your answer.
>
>> The disks look like this (fdisk -l):
>> Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0xcba506ee
>>
>>    Device  Boot      Start         End     Blocks   Id System
>> /dev/sdc1             256   732566645   366283195   83 Linux
>>
>> Disk /dev/sdd: 2000.4 GB, 2000397852160 bytes, 3907027055 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>> Disk identifier: 0x3c34826b
>>
>>    Device  Boot      Start         End     Blocks   Id System
>> /dev/sdd1              63  3907024064  1953512001   83 Linux
>>
>> If it is not possible to rescue the partition this way, I will have to
>> extend my RAID5 so that I can put a copy of the faulty disk on it,
>> as Michael explained in his answer. I had just hoped I could avoid
>> this, because it would save me more than 100 EUR.
>
> I didn't have much information to go on, so I set up the safest possible 
> scenario and hoped that you were getting results close to that.
>
> If the few extra files that you're rescuing aren't worth 100 euros, 
> then it's not worth 100 euros to make a duplicate of a dump of an 
> already-damaged filesystem.
>
> The crazy, reckless guide is this:
>
> 1) use `xfs_repair -n /dev/sdc1`.  If that looks nice,...
>
> 2) use `xfs_repair /dev/sdc1`...
>
>   a) A repaired partition is a good sign.  Mount that partition!
>
>   b) If the "attempting to find secondary superblock" search ends in 
> "Sorry, could not find valid secondary superblock," then maybe 
> something went wrong in the original data transfer.  You might have to 
> give this step some time to complete, and it will print dots for a 
> while.  Either that or the failures in your hard drive really did hit 
> all of the superblocks.
>
>   c) If the "attempting to find secondary superblock" finds something, 
> it might make everything well but spit some files into lost+found.  If 
> the repair goes badly, there's a chance you'll be using dd to look for 
> your data.
>
>   d) If it's something else (xfs_repair segfaults, needs to be run 
> again, whatever), mention it, and at least you'll be closer to the real 
> answer.
>
> 3) If all else fails, and especially when a backup is handy, you could 
> try `xfs_repair -L` to zero the log.  This helps when xfs_repair asks 
> you to mount the filesystem to allow metadata updates to happen, but 
> Linux has an oops as the filesystem is mounted.  In many other 
> scenarios, it can work against you.  This is the second-to-last resort.
>
>> One last piece of information: the content of this copy would not be
>> totally lost; actually, only the last few files I added would be. All the
>> other stuff is already stored on the RAID5; only the latest files are
>> not contained in that backup. So I won't lose everything if something
>> goes wrong (at least that's something :-) ).
>
> Really, it becomes a question of whether it would be faster to search 
> for the data using dd and grep, use xfs_repair and hope it works, or 
> recreate the data from scratch.
>
> The hope is that dd_rescue does a credible job for you, and that XFS 
> can be made to mount something, somewhere, so that you can grab those 
> last few files.  The very last resort would be to do all of this 
> repair stuff on the original damaged partition, but the safety net 
> goes away after that.
>
> Michael

I see, I should have mentioned this earlier. I already tried xfs_repair 
and it failed to find a secondary superblock. Because I am still able to 
mount the original disk and most parts of it, I guessed that xfs_repair 
was confused by the different disk geometries. What I had also already 
tried, naturally, was to copy the whole thing with, for example, cp or 
xfs_copy, but both failed because of filesystem errors. The only program 
which didn't fail to copy the data was dd_rescue, which can handle the 
errors. That is why I used it; it was my only option (as far as I can see).

Kind regards,
Benedikt


* Re: xfs_repair force_geometry
  2013-05-14  8:56       ` Benedikt Schmidt
@ 2013-05-14 12:35         ` Stan Hoeppner
  2013-05-14 17:54           ` Michael L. Semon
  0 siblings, 1 reply; 9+ messages in thread
From: Stan Hoeppner @ 2013-05-14 12:35 UTC (permalink / raw)
  To: xfs

On 5/14/2013 3:56 AM, Benedikt Schmidt wrote:
...
> I see, I should have mentioned this earlier. I already tried xfs_repair
> and it failed to find a secondary superblock. Because I am still able to
> mount the original disk and most parts of it, I guessed that xfs_repair
> was confused by the different disk geometries. What I had also already
> tried, naturally, was to copy the whole thing with, for example, cp or
> xfs_copy, but both failed because of filesystem errors. The only program
> which didn't fail to copy the data was dd_rescue, which can handle the
> errors. That is why I used it; it was my only option (as far as I can see).

You are able to mount the XFS on the original disk which means the
superblocks are apparently intact and the log section isn't corrupt.
But when you attempt to copy files from that XFS to another
disk/filesystem you get what you describe as filesystem errors.  How far
did the cp/xfs_copy progress before you received the filesystem errors?
 What is the result of running xfs_repair -n on the original filesystem?

The point of these questions is to reveal whether the original disk
simply has media surface errors toward the end of the disk where you
wrote those few most recent files, *or* if the problem with the disk is
electrical or mechanical.

The fact that cp/xfs_copy fail, yet ddrescue completes by retrying
(though possibly while ignoring some sectors due to retry limit of 1),
would tend to suggest the problem is electrical or mechanical, not
platter surface defects.  From what you've described so far it sounds
like the more load you put on the drive the more errors it throws.  This
is typical when the internal power supply circuit on a drive is failing.

While the drive is idle, I would suggest you use xfs_db on the original
XFS to locate the positions of those few files that are not backed up.
Unmount the XFS and use dd with skip/seek to copy only these files to
another location.  Do one file at a time to put as little load on the
drive as possible.  Give it some resting time between dd operations.  If
this works it eliminates the need to expand your RAID5 or attempt more
full partition copies to the new 2TB drive.  If this doesn't work, it
also eliminates the need for either of these steps, as it will
demonstrate it's simply not possible to recover the data.
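A rough sketch of that per-file approach, using xfs_bmap (simpler than 
raw xfs_db when the filesystem still mounts); every path, block number, 
and extent below is a made-up example, not from this thread:

```shell
# While the filesystem is still mountable read-only, list the extents of
# one of the recent files.  xfs_bmap -v reports BLOCK-RANGE in 512-byte
# basic blocks relative to the start of the filesystem (here: the
# partition), e.g. an extent at blocks 96..351 (256 blocks).
mount -o ro /dev/sdd1 /mnt/dying
xfs_bmap -v /mnt/dying/path/to/recent-file
umount /mnt/dying

# Then copy exactly that range with dd, one extent at a time, giving the
# drive rest between operations as suggested above.
dd if=/dev/sdd1 of=/safe/recent-file.part0 bs=512 skip=96 count=256
```

Files with several extents need one dd per extent, concatenated in 
FILE-OFFSET order.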

-- 
Stan


* Re: xfs_repair force_geometry
  2013-05-14 12:35         ` Stan Hoeppner
@ 2013-05-14 17:54           ` Michael L. Semon
  0 siblings, 0 replies; 9+ messages in thread
From: Michael L. Semon @ 2013-05-14 17:54 UTC (permalink / raw)
  To: stan; +Cc: xfs

On 05/14/2013 08:35 AM, Stan Hoeppner wrote:
> On 5/14/2013 3:56 AM, Benedikt Schmidt wrote:
> ...
>> I see, I should have mentioned this earlier. I already tried xfs_repair
>> and it failed to find a secondary superblock. Because I am still able to
>> mount the original disk and most parts of it, I guessed that xfs_repair
>> was confused by the different disk geometries. What I had also already
>> tried, naturally, was to copy the whole thing with, for example, cp or
>> xfs_copy, but both failed because of filesystem errors. The only program
>> which didn't fail to copy the data was dd_rescue, which can handle the
>> errors. That is why I used it; it was my only option (as far as I can see).
>
> You are able to mount the XFS on the original disk which means the
> superblocks are apparently intact and the log section isn't corrupt.
> But when you attempt to copy files from that XFS to another
> disk/filesystem you get what you describe as filesystem errors.  How far
> did the cp/xfs_copy progress before you received the filesystem errors?
>   What is the result of running xfs_repair -n on the original filesystem?
>
> The point of these questions is to reveal whether the original disk
> simply has media surface errors toward the end of the disk where you
> wrote those few most recent files, *or* if the problem with the disk is
> electrical or mechanical.
>
> The fact that cp/xfs_copy fail, yet ddrescue completes by retrying
> (though possibly while ignoring some sectors due to retry limit of 1),
> would tend to suggest the problem is electrical or mechanical, not
> platter surface defects.  From what you've described so far it sounds
> like the more load you put on the drive the more errors it throws.  This
> is typical when the internal power supply circuit on a drive is failing.
>
> While the drive is idle, I would suggest you use xfs_db on the original
> XFS to locate the positions of those few files that are not backed up.
> Unmount the XFS and use dd with skip/seek to copy only these files to
> another location.  Do one file at a time to put as little load on the
> drive as possible.  Give it some resting time between dd operations.  If
> this works it eliminates the need to expand your RAID5 or attempt more
> full partition copies to the new 2TB drive.  If this doesn't work, it
> also eliminates the need for either of these steps, as it will
> demonstrate it's simply not possible to recover the data.
>

I've been hesitant to suggest using the smartmontools to aid in this 
quest.  In the event of surface errors, `smartctl -a /dev/sdd` may or 
may not show the exact error locations.  The read error rate numbers 
might be helpful, too.  However, smartctl has extra features that might 
cause SMART to remap sectors that could be read one last time. `smartctl 
--test=long /dev/sdd` should be a no-no at this point.  At any rate, I 
wouldn't want that SMART initialization clunk noise to be the drive's 
last dying gasp.  Thoughts?

Michael

