Re: Need help to recover root filesystem after a power supply issue


From: Andrey Zhunev <a-j@a-j.ru>
To: Eric Sandeen <sandeen@sandeen.net>, linux-xfs@vger.kernel.org
Subject: Re: Need help to recover root filesystem after a power supply issue
Date: Wed, 10 Jul 2019 18:02:30 +0300
Message-ID: <15810023599.20190710180230@a-j.ru>
In-Reply-To: <8bef8d1e-2f5f-a8bd-08d3-fff0dce1256e@sandeen.net>

Wednesday, July 10, 2019, 5:23:41 PM, you wrote:



> On 7/10/19 8:58 AM, Andrey Zhunev wrote:
>> Wednesday, July 10, 2019, 4:26:14 PM, you wrote:
>> 
>>> On 7/10/19 4:56 AM, Andrey Zhunev wrote:
>>>> Hello All,
>>>>
>>>> I am struggling to recover my system after a PSU failure, and I was
>>>> advised to ask here for support.
>>>>
>>>> One of the hard drives throws some read errors, and that happens to be
>>>> my root drive...
>>>> My system is CentOS 7, and the root partition is a part of LVM.
>>>>
>>>> [root@mgmt ~]# lvscan
>>>>   ACTIVE            '/dev/centos/root' [<98.83 GiB] inherit
>>>>   ACTIVE            '/dev/centos/home' [<638.31 GiB] inherit
>>>>   ACTIVE            '/dev/centos/swap' [<7.52 GiB] inherit
>>>> [root@mgmt ~]#
>>>>
>>>> [root@tftp ~]# file -s /dev/centos/root
>>>> /dev/centos/root: symbolic link to `../dm-3'
>>>> [root@tftp ~]# file -s /dev/centos/home
>>>> /dev/centos/home: symbolic link to `../dm-4'
>>>> [root@tftp ~]# file -s /dev/dm-3
>>>> /dev/dm-3: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)
>>>> [root@tftp ~]# file -s /dev/dm-4
>>>> /dev/dm-4: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)
>>>>
>>>>
>>>> [root@tftp ~]# xfs_repair /dev/centos/root
>>>> Phase 1 - find and verify superblock...
>>>> superblock read failed, offset 53057945600, size 131072, ag 2, rval -1
>>>>
>>>> fatal error -- Input/output error
>> 
>>> look at dmesg, see what the kernel says about the read failure.
>> 
>>> You might be able to use https://www.gnu.org/software/ddrescue/ 
>>> to read as many sectors off the device into an image file as possible,
>>> and that image might be enough to work with for recovery.  That would be
>>> my first approach:
>> 
>>> 1) use ddrescue to create an image file of the device
>>> 2) make a copy of that image file
>>> 3) run xfs_repair -n on the copy to see what it would do
>>> 4) if that looks reasonable run xfs_repair on the copy
>>> 5) mount the copy and see what you get
>> 
>>> But if your drive simply cannot be read at all, this is not a filesystem
>>> problem, it is a hardware problem. If this is critical data you may wish
>>> to hire a data recovery service.
>> 
>>> -Eric
>> 
>> 
>> Hi Eric,
>> 
>> Thanks for your message!
>> I already started to copy the failing drive with ddrescue. This is a
>> large drive, so it takes some time to complete...
>> 
>> When I tried to run xfs_repair on the original (failing) drive,
>> xfs_repair was unable to read the superblock and then just quit
>> with an 'io error'.
>> Do you think it can behave differently on a copied image?

> As I said, look at dmesg to see what failed on the original drive read
> attempt.

> ddrescue will fill unreadable sectors with 0, and then of course that
> can be read from the image file.


Oops, I forgot to paste the error message from dmesg.
Here it is:

Jul 10 11:48:05 mgmt kernel: ata1.00: exception Emask 0x0 SAct 0x180000 SErr 0x0 action 0x0
Jul 10 11:48:05 mgmt kernel: ata1.00: irq_stat 0x40000008
Jul 10 11:48:05 mgmt kernel: ata1.00: failed command: READ FPDMA QUEUED
Jul 10 11:48:05 mgmt kernel: ata1.00: cmd 60/00:98:28:ac:3e/01:00:03:00:00/40 tag 19 ncq 131072 in
         res 41/40:00:08:ad:3e/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jul 10 11:48:05 mgmt kernel: ata1.00: status: { DRDY ERR }
Jul 10 11:48:05 mgmt kernel: ata1.00: error: { UNC }
Jul 10 11:48:05 mgmt kernel: ata1.00: configured for UDMA/133
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 Sense Key : Medium Error [current] [descriptor]
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 Add. Sense: Unrecovered read error - auto reallocate failed
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 CDB: Read(16) 88 00 00 00 00 00 03 3e ac 28 00 00 01 00 00 00
Jul 10 11:48:05 mgmt kernel: blk_update_request: I/O error, dev sda, sector 54439176
Jul 10 11:48:05 mgmt kernel: ata1: EH complete

There are several of these.
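
For reference, the numbers in the log entry above are consistent with each
other. A minimal sketch of the arithmetic, assuming 512-byte logical sectors
(the log does not state the sector size) and assuming my reading of the
libata "res" field order is right:

```python
# Decode the SCSI Read(16) CDB from the kernel log and compare it with the
# sector reported by blk_update_request.
# Assumption: 512-byte logical sectors (not stated in the log).
SECTOR_SIZE = 512

# "CDB: Read(16) 88 00 00 00 00 00 03 3e ac 28 00 00 01 00 00 00"
cdb = bytes.fromhex("88 00 00 00 00 00 03 3e ac 28 00 00 01 00 00 00")

opcode = cdb[0]                                  # 0x88 is READ(16)
start_lba = int.from_bytes(cdb[2:10], "big")     # 8-byte starting LBA
num_blocks = int.from_bytes(cdb[10:14], "big")   # 4-byte transfer length
request_bytes = num_blocks * SECTOR_SIZE

# "blk_update_request: I/O error, dev sda, sector 54439176"
failed_sector = 54439176
offset_in_request = failed_sector - start_lba

# LBA bytes from the "res" line (08:ad:3e low half, 00:00:03 high half),
# reordered most significant byte first. The field order is an assumption.
res_lba = int.from_bytes(bytes.fromhex("00 00 03 3e ad 08"), "big")

print("opcode:            0x%02x" % opcode)
print("start LBA:        ", start_lba)
print("sectors requested:", num_blocks)
print("request size:     ", request_bytes, "bytes")
print("failed sector:    ", failed_sector, "(offset", offset_in_request, "in request)")
print("res LBA:          ", res_lba)
```

So this is one 256-sector (131072-byte) read starting at LBA 54438952 that
failed 224 sectors in, at sector 54439176. The request size is the same
131072 bytes that xfs_repair reported for its failed superblock read.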
At the moment ddrescue reports 22 read errors (with 35% of the data
copied to new storage). If I remember correctly, the LVM volume with my
root partition is at the end of the drive. This means more errors will
likely come... :(
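
Since ddrescue fills unreadable sectors with zeros in the image, it helps to
know where those areas are before running xfs_repair on the copy. A rough
sketch for listing them from a ddrescue mapfile, assuming a mapfile was given
on the ddrescue command line. The sample contents below are made up for
illustration and are not from my drive; positions in a real mapfile are
relative to whatever input device ddrescue was pointed at.

```python
# Parse a GNU ddrescue mapfile and report areas that are not (yet) recovered.
# Status characters: '?' non-tried, '*' non-trimmed, '/' non-scraped,
# '-' bad sector(s), '+' finished.

# Hypothetical sample, NOT real data from the failing drive.
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00030000     ?               1
#      pos        size  status
0x00000000  0x00010000  +
0x00010000  0x00000200  -
0x00010200  0x0001FE00  +
0x00030000  0x00010000  ?
"""

def parse_mapfile(text):
    """Return a list of (pos, size, status) tuples from mapfile text."""
    blocks = []
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip comments and blank lines
        if not seen_status_line:
            seen_status_line = True       # first data line is current_pos/status
            continue
        pos, size, status = line.split()[:3]
        blocks.append((int(pos, 0), int(size, 0), status))
    return blocks

def unrecovered(blocks):
    """Blocks whose data is not in the image (anything other than '+')."""
    return [b for b in blocks if b[2] != "+"]

def status_at(blocks, offset):
    """Status character covering a byte offset, or None if outside the map."""
    for pos, size, status in blocks:
        if pos <= offset < pos + size:
            return status
    return None

blocks = parse_mapfile(SAMPLE_MAPFILE)
for pos, size, status in unrecovered(blocks):
    print("0x%08x  %8d bytes  status %s" % (pos, size, status))
```

With the real mapfile, status_at() could be used to check whether a given
byte offset (for example one reported by xfs_repair, after translating it to
a position on the copied device) falls in an area that ddrescue could not
read.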

The way I interpret the dmesg message, it is just a media read error. I'm
not sure, but maybe a complete wipe of the drive would overwrite and
clear these unreadable sectors.
Well, that's something to check after the copy process finishes.


---
Best regards,
 Andrey
