linux-btrfs.vger.kernel.org archive mirror
From: Liu Bo <bo.li.liu@oracle.com>
To: Dmitry Katsubo <dmitry.katsubo@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Kernel crash if both devices in raid1 are failing
Date: Wed, 20 Apr 2016 20:45:24 -0700
Message-ID: <20160421034524.GA26182@localhost.localdomain>
In-Reply-To: <571419C7.6070709@gmail.com>

On Mon, Apr 18, 2016 at 01:18:31AM +0200, Dmitry Katsubo wrote:
> On 2016-04-14 22:30, Dmitry Katsubo wrote:
> > Dear btrfs community,
> > 
> > I have the following setup:
> > 
> > # btrfs fi show /home
> > Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> > 	Total devices 3 FS bytes used 55.68GiB
> > 	devid    1 size 52.91GiB used 0.00B path /dev/sdd2
> > 	devid    2 size 232.89GiB used 59.03GiB path /dev/sda
> > 	devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1
> > 
> > The btrfs volume was created in raid1 mode for both data and metadata and
> > mounted with the compress=lzo option.
> > 
> > Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
> > leads to a system crash if I boot into runlevel 3 (see crash1.log).
> > 
> > After booting into single-user mode, the volume can be mounted in rw
> > mode and I can write some data to it. However, when I tried to read
> > a certain file, the system crashed (see crash2.log).
> > 
> > I have started scrub on the volume and here is the report:
> > 
> > # btrfs scrub status /home
> > scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> > 	scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
> > 	total bytes scrubbed: 55.68GiB with 1767 errors
> > 	error details: verify=175 csum=1592
> > 	corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
> > 
> > Obviously, some data is lost. However, because of the crash above, I cannot
> > simply copy the data off the volume. I would expect to still be able to access
> > the data, with reads of the damaged files returning an I/O error (I would then
> > recover those files from my backup).
> > 
> > I have decided to attach another drive and remove failing devices one-by-one.
> > However that does not work:
> > 
> > # btrfs dev delete /dev/sda /home
> > [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [  168.684236] ata3.00: BMDMA stat 0x25
> > [  168.688464] ata3.00: failed command: READ DMA
> > [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> > [  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> > [  168.701281] ata3.00: status: { DRDY ERR }
> > [  168.705600] ata3.00: error: { UNC }
> > [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> > [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
> > [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [  172.828651] ata3.00: BMDMA stat 0x25
> > [  172.833281] ata3.00: failed command: READ DMA
> > [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> > [  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> > [  172.847296] ata3.00: status: { DRDY ERR }
> > [  172.852054] ata3.00: error: { UNC }
> > [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> > [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
> > ERROR: error removing device '/dev/sda': Input/output error
> > 
> > The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> > btrfs "force" option so that btrfs balances only the chunks that are accessible?
> > I could physically disconnect /dev/sda, but I believe the loss would be
> > greater.
> > 
> > How can I proceed, other than with btrfs restore?
> > 
> > During scrub operation the following was recorded in the logs:
> > 
> > [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
> > 
> > If I collect all the messages like this, will it give a full picture of damaged files?
> > 
> > Many thanks in advance.
> > 
> > P.S. Linux kernel v4.4.2, btrfs-progs v4.4.
> 
> I have decided to try "btrfs restore". I have found two usability issues
> with it:
> 
> 1. I cannot run this utility as follows:
> 
> btrfs restore -i /dev/sda /mnt/usb &> log
> 
> because this command is interactive and may read from the terminal.
> It would be nice to have a -y flag (answer "yes" to all questions) so that
> no input is required from the user. An example of such a question is:
> 
> We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]
> 
> In general this question puzzles me. What does it mean? As far as I understand,
> it prevents btrfs restore from looping forever. Should I consider those files
> lost? I have also hit the problem discussed in [1]: answering
> "a" (always) still causes the question to be asked.
> 
> 2. btrfs restore does not print final statistics: how many files were
> restored successfully and how many failed.

Thanks for trying 'restore', but I was wondering, does btrfsck work for you?

Thanks,

-liubo

> 
> [1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html
> 
> -- 
> With best regards,
> Dmitry
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
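
On the question quoted above about collecting the scrub warnings from the
kernel log: the following is only a minimal sketch of how such lines could be
gathered into a per-file list. The regular expression is modelled on the single
"checksum error at logical ..." line quoted in this message, and the function
name damaged_files is hypothetical. It only sees errors that scrub resolved to
a path, so it may well not give a full picture (for example, messages in a
different format, such as metadata errors, would be skipped).

```python
import re
from collections import defaultdict

# Pattern modelled on the one warning line quoted in this thread; other
# kernel versions may word the message differently (assumption).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<fsdev>[^)]+)\): "
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links (?P<links>\d+) "
    r"\(path: (?P<path>.*)\)\s*$"
)


def damaged_files(log_lines):
    """Map (root id, path) to the sorted file offsets reported as bad."""
    hits = defaultdict(set)
    for line in log_lines:
        match = CSUM_RE.search(line)
        if match:
            key = (int(match["root"]), match["path"])
            hits[key].add(int(match["offset"]))
    return {key: sorted(offsets) for key, offsets in hits.items()}
```

Any iterable of log lines can be passed in, for example an open file holding
saved dmesg output. The path is relative to the subvolume identified by the
root id, so the same file name under two roots is kept as two entries.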

