From: Dmitry Katsubo <dmitry.katsubo@gmail.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Kernel crash if both devices in raid1 are failing
Date: Mon, 18 Apr 2016 01:18:31 +0200
Message-ID: <571419C7.6070709@gmail.com>
In-Reply-To: <570FFDFE.3050305@gmail.com>

On 2016-04-14 22:30, Dmitry Katsubo wrote:
> Dear btrfs community,
> 
> I have the following setup:
> 
> # btrfs fi show /home
> Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> 	Total devices 3 FS bytes used 55.68GiB
> 	devid    1 size 52.91GiB used 0.00B path /dev/sdd2
> 	devid    2 size 232.89GiB used 59.03GiB path /dev/sda
> 	devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1
> 
> btrfs volume was created in raid1 mode both for data and metadata and mounted
> with compress=lzo option.
> 
> Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
> leads to system crash if I start the system in runlevel 3 (see crash1.log).
> 
> After I have started the system in single mode, volume can be mounted in rw
> mode and I can write some data into it. Unfortunately when I tried to read
> a certain file, the system crashed (see crash2.log).
> 
> I have started scrub on the volume and here is the report:
> 
> # btrfs scrub status /home
> scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> 	scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
> 	total bytes scrubbed: 55.68GiB with 1767 errors
> 	error details: verify=175 csum=1592
> 	corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
> 
> Obviously, some data is lost. However, because of the crash above, I cannot
> simply copy the data off the volume. I would expect to still be able to access
> the data, with reads of the files whose data is lost resulting in an I/O error
> (I would then recover those files from my backup).
> 
> I have decided to attach another drive and remove failing devices one-by-one.
> However that does not work:
> 
> # btrfs dev delete /dev/sda /home
> [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  168.684236] ata3.00: BMDMA stat 0x25
> [  168.688464] ata3.00: failed command: READ DMA
> [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  168.701281] ata3.00: status: { DRDY ERR }
> [  168.705600] ata3.00: error: { UNC }
> [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
> [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  172.828651] ata3.00: BMDMA stat 0x25
> [  172.833281] ata3.00: failed command: READ DMA
> [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  172.847296] ata3.00: status: { DRDY ERR }
> [  172.852054] ata3.00: error: { UNC }
> [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
> ERROR: error removing device '/dev/sda': Input/output error
> 
> The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> btrfs "force" option so that btrfs balances only chunks that are accessible? I
> can potentially physically disconnect /dev/sda, but the loss will be greater
> I believe.
> 
> How can I proceed except btrfs restore?
> 
> During scrub operation the following was recorded in the logs:
> 
> [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
> 
> If I collect all the messages like this, will it give a full picture of damaged files?
> 
> Many thanks in advance.
> 
> P.S. Linux kernel v4.4.2, btrfs-progs v4.4.
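
On the question above about collecting the scrub warnings: a minimal sketch,
assuming the kernel log lines keep the format of the checksum warning quoted
above, could pull the affected paths out of a saved log like this. The function
name damaged_paths and the regular expression are illustrative only and not
part of btrfs-progs. Errors that the kernel does not resolve to a file path
would not show up, so the resulting list may well be incomplete.

```python
import re

# Matches a scrub warning of the form quoted above, e.g.
#   BTRFS warning (device sdc1): checksum error at logical ... (path: lib/mysql/ibdata1)
# and captures the text after "path: " up to the closing parenthesis.
CSUM_RE = re.compile(
    r"BTRFS warning \(device [^)]+\): checksum error at logical \d+ "
    r".*\(path: (.+)\)\s*$"
)

def damaged_paths(lines):
    """Return the sorted, de-duplicated paths named in checksum warnings."""
    paths = set()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            paths.add(match.group(1))
    return sorted(paths)
```

It could be fed the lines of a saved kernel log, for example
damaged_paths(open("kernel.log")), where kernel.log is a hypothetical file name.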

I decided to try "btrfs restore" and ran into two usability issues with it:

1. I cannot run this utility as follows:

btrfs restore -i /dev/sda /mnt/usb &> log

because the command is interactive and may read input from the terminal.
It would be nice to have a -y flag (answer "yes" to all questions) so that
no input is required from the user. An example of such a question is:

We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]

This question puzzles me in general. What does it mean? As far as I understand,
it keeps btrfs restore from looping forever. Should I consider the affected
files lost? I have also hit the problem discussed in [1]: answering
"a" (always) does not stop the question from being asked again.

2. btrfs restore does not print final statistics: how many files were
successfully restored and how many failed.
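
As a partial workaround for the missing statistics, one could count what
actually landed in the target directory after the run. This is only a sketch:
count_restored is a hypothetical helper, and it says nothing about files that
failed to restore or were restored with damaged contents.

```python
import os

def count_restored(target_dir):
    """Count regular files (symlinks excluded) under target_dir and sum their sizes."""
    files = 0
    total_bytes = 0
    for root, _dirs, names in os.walk(target_dir):
        for name in names:
            path = os.path.join(root, name)
            # Skip symlinks and special files; only regular files are counted.
            if os.path.isfile(path) and not os.path.islink(path):
                files += 1
                total_bytes += os.path.getsize(path)
    return files, total_bytes
```

For the command above, this would be called as count_restored("/mnt/usb").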

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html

-- 
With best regards,
Dmitry
