Re: Kernel crash if both devices in raid1 are failing

linux-btrfs.vger.kernel.org archive mirror

From: Dmitry Katsubo <dmitry.katsubo@gmail.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Kernel crash if both devices in raid1 are failing
Date: Thu, 21 Apr 2016 00:02:52 +0200
Message-ID: <5717FC8C.5090006@gmail.com>
In-Reply-To: <pan$4b158$295907b$46fdde88$24c8df73@cox.net>

On 2016-04-19 09:58, Duncan wrote:
> Dmitry Katsubo posted on Tue, 19 Apr 2016 07:45:40 +0200 as excerpted:
> 
>> Actually btrfs restore has recovered many files, however I was not able
>> to run in fully unattended mode as it complains about "looping a lot".
>> Does it mean that files are corrupted / not correctly restored?
> 
> As long as you tell it to keep going each time, the loop complaints 
> shouldn't be an issue.  The problem is that the loop counter is measuring 
> loops on a particular directory, because that's what it has available to 
> measure.  But if you had a whole bunch of files in that dir, it's /going/ 
> to loop a lot, to restore all of them.
> 
> I have one cache directory with over 200K files in it.  They're all text 
> messages from various technical lists and newsgroups (like this list, 
> which I view as a newsgroup using gmane.org's list2news service) so 
> they're quite small, about 5 KiB on average by my quick calculation, but 
> that's still a LOT of files for a single dir, even if they're only using 
> just over a GiB of space.
> 
> I ended up doing a btrfs restore on that filesystem (/home), because 
> while I had a backup, restore was getting more recent copies of stuff 
> back, and that dir looped a *LOT* the first time it happened, now several 
> years ago, before they actually added the always option.

I am in the same situation: I have a backup, but I would prefer to recover
the most recent modifications to the files.

> The second time it happened, about a year ago, restore worked much 
> better, and I was able to use the always option.  But AFAIK, always only 
> applies to that dir.  If you have multiple dirs with the problem, you'll 
> still get asked for the next one.  But it did vastly improve the 
> situation for me, giving me only a handful of prompts instead of the very 
> many I had before the option was there.

Yes, this is exactly the problem discussed a while ago. It would be nice if
"btrfs restore -i" applied the "(a)lways" answer to all questions, or if
there were a separate option for that ("-y").
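
To illustrate the behaviour being asked for, here is a minimal Python sketch
in which an "a" answer is remembered for every later directory instead of
only the current one. It is not btrfs-progs code: the class name
ContinuePrompt, the method should_continue() and the prompt wording are all
hypothetical.

```python
class ContinuePrompt:
    """Ask whether to keep going; an 'a' answer suppresses all later prompts."""

    def __init__(self, ask=input):
        self._ask = ask        # injectable so the sketch can be exercised without a terminal
        self._always = False   # once set, applies to every directory, not just one

    def should_continue(self, directory):
        if self._always:
            return True
        # Illustrative prompt text only, not the tool's actual message.
        answer = self._ask(f"Looping a lot in {directory}; continue? (y/N/a): ")
        answer = answer.strip().lower()
        if answer == "a":
            self._always = True
            return True
        return answer == "y"
```

With this shape, a hypothetical "-y" switch would amount to starting with
the "always" flag already set.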

For me personally, "looping" is too low-level a notion. The system
administrators who are going to use this utility should be able to work in
more meaningful terms. If "looping" is a proxy for "time consumption", then I
would say that time does not matter so much during a restore: I am ready to
wait for a minute until a specific file is restored. So I think it is the
time spent, not the number of loops, that should be measured.
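
As a rough sketch of a time-based limit in place of a loop counter (again in
Python, and not how btrfs restore is implemented): the work is cut off when a
wall-clock budget runs out, however many iterations that takes. The function
name restore_with_time_budget and its parameters are assumptions made for
illustration.

```python
import time


def restore_with_time_budget(steps, budget_seconds, clock=time.monotonic):
    """Run each step until all are done or the time budget is exhausted.

    `steps` is an iterable of zero-argument callables (one unit of restore
    work each). Returns (completed_count, timed_out).
    """
    deadline = clock() + budget_seconds
    completed = 0
    for step in steps:
        # The limit is on elapsed time, not on the number of iterations.
        if clock() >= deadline:
            return completed, True
        step()
        completed += 1
    return completed, False
```

A directory with 200K small files would then be restored without any prompt
as long as it finishes within the budget, for example 60 seconds per file or
per directory.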

Also, I have difficulty finding out which files have not been restored
due to uncorrectable errors. As I cannot redirect the output of
"btrfs restore" and it does not print final statistics, I cannot tell which
files have to be restored from backup.
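
One possible workaround, sketched below under the assumption that both the
backup and the restore target are available as ordinary directory trees, is
to compare the two trees afterwards and list the files that exist in the
backup but not in the restored copy. The function names are hypothetical.
This only detects files that are entirely absent: it does not detect files
that were restored with damaged content, and it cannot find files that were
created after the backup was taken.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_from_restore(backup_root, restored_root):
    """List files present in the backup tree but absent from the restored tree."""
    return sorted(relative_files(backup_root) - relative_files(restored_root))
```

Calling missing_from_restore() with the backup directory and the restore
target gives a sorted list of relative paths that are candidates for copying
back from the backup.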

> (The main problem triggering the need to run restore for me, turned out 
> to be hardware.  I've had no issues since I replaced that failing ssd, 
> and with a bit of luck, won't be running restore again for a few years, 
> now.)

I would be happy if I were able to replace the failing drive on the fly,
without stopping the system. Unfortunately I cannot do that because of the
kernel crashes :( btrfs is still not robust in these corner cases.

-- 
With best regards,
Dmitry

Thread overview: 9+ messages
2016-04-14 20:30 Kernel crash if both devices in raid1 are failing Dmitry Katsubo
2016-04-17 23:18 ` Dmitry Katsubo
2016-04-21  3:45   ` Liu Bo
     [not found]     ` <571DC34A.50509@gmail.com>
2016-04-27  2:44       ` Dmitry Katsubo
2016-05-02 20:51         ` Dmitry Katsubo
2016-04-18  0:19 ` Chris Murphy
2016-04-19  5:45   ` Dmitry Katsubo
2016-04-19  7:58     ` Duncan
2016-04-20 22:02       ` Dmitry Katsubo [this message]
