linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
Date: Mon, 2 Jul 2018 08:03:10 -0400	[thread overview]
Message-ID: <6118a37a-487d-cc04-8686-db8a691b164c@gmail.com> (raw)
In-Reply-To: <pan$d561f$1f780d41$c875436f$9e055ce0@cox.net>

On 2018-06-30 02:33, Duncan wrote:
> Austin S. Hemmelgarn posted on Fri, 29 Jun 2018 14:31:04 -0400 as
> excerpted:
> 
>> On 2018-06-29 13:58, james harvey wrote:
>>> On Fri, Jun 29, 2018 at 1:09 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2018-06-29 11:15, james harvey wrote:
>>>>>
>>>>> On Thu, Jun 28, 2018 at 6:27 PM, Chris Murphy
>>>>> <lists@colorremedies.com>
>>>>> wrote:
>>>>>>
>>>>>> And an open question I have about scrub is whether it only ever is
>>>>>> checking csums, meaning nodatacow files are never scrubbed, or if
>>>>>> the copies are at least compared to each other?
>>>>>
>>>>>
>>>>> Scrub never looks at nodatacow files.  It does not compare the copies
>>>>> to each other.
>>>>>
>>>>> Qu submitted a patch to make check compare the copies:
>>>>> https://patchwork.kernel.org/patch/10434509/
>>>>>
>>>>> This hasn't been added to btrfs-progs git yet.
>>>>>
>>>>> IMO the offline check should look at nodatacow copies like
>>>>> this, but I still think this also needs to be added to scrub.  In the
>>>>> patch thread, I discuss my reasons why.  In brief: it allows online
>>>>> scanning; it matches the user's expectation that scrub ensures
>>>>> mirrored data integrity; and the recommendation to run scrub on a
>>>>> periodic basis tells me it's the right place to put it.
>>>>
>>>> That said, it can't sanely fix things if there is a mismatch. At
>>>> least,
>>>> not unless BTRFS gets proper generational tracking to handle
>>>> temporarily missing devices.  As of right now, sanely fixing things
>>>> requires significant manual intervention, as you have to bypass the
>>>> device read selection algorithm to be able to look at the state of the
>>>> individual copies so that you can pick one to use and forcibly rewrite
>>>> the whole file by hand.
>>>
>>> Absolutely.  User would need to use manual intervention as you
>>> describe, or restore the single file(s) from backup.  But, it's a good
>>> opportunity to tell the user they had partial data corruption, even if
>>> it can't be auto-fixed.  Otherwise they get intermittent data
>>> corruption, depending on which copies are read.
> 
>> The thing is though, as things stand right now, you need to manually
>> edit the on-disk data directly or restore the file from a backup to fix
>> it.  While it's technically true that you can manually repair this
>> type of thing, in both of the cases for doing it without those patches
>> I mentioned, it's functionally impossible for a regular user to do so
>> without potentially losing some data.
> 
> [Usual backups rant, user vs. admin variant, nowcow/tmpfs edition.
> Regulars can skip as the rest is already predicted from past posts, for
> them. =;^]
> 
> "Regular user"?
> 
> "Regular users" don't need to bother with this level of detail.  They
> simply get their "admin" to do it, even if that "admin" is their kid, or
> the kid from next door that's good with computers, or the geek squad (aka
> nsa-agent-squad) guy/gal, doing it... or telling them to install "a real
> OS", meaning whatever MS/Apple/Google something that they know how to
> deal with.
> 
> If the "user" is dealing with setting nocow, choosing btrfs in the first
> place, etc, then they're _not_ a "regular user" by definition, they're
> already an admin.

I'd argue that that's not always true.  'Regular users' also blindly 
follow advice they find online about how to make their system run 
better, and quite often don't keep backups.
> 
> And as any admin learns rather quickly, the value of data is defined by
> the number of backups it's worth having of that data.
> 
> Which means it's not a problem.  Either the data had a backup and it's
> (reasonably) trivial to restore the data from that backup, or the data
> was defined by lack of having that backup as of only trivial value, so
> low as to not be worth the time/trouble/resources necessary to make that
> backup in the first place.
> 
> Which of course means what was defined as of most value, either the data
> if there was a backup, or the time/trouble/resources that would have gone
> into creating it if not, is *always* saved.
> 
> (And of course the same goes for "I had a backup, but it's old", except
> in this case it's the value of the data delta between the backup and
> current.  As soon as it's worth more than the time/trouble/hassle of
> updating the backup, it will by definition be updated.  Not having a
> newer backup available thus simply means the value of the data that
> changed between the last backup and current was simply not enough to
> justify updating the backup, and again, what was of most value is
> *always* saved, either the data, or the time that would have otherwise
> gone into making the newer backup.)
> 
> Because while a "regular user" may not know it because it's not his /job/
> to know it, if there's anything an admin knows *well* it's that the
> working copy of data **WILL** be damaged.  It's not a matter of if, but
> of when, and of whether it'll be a fat-finger mistake, or a hardware or
> software failure, or wetware (theft, ransomware, etc), or weather (flood,
> fire and the damage from the water that put it out, etc), tho none of that
> actually matters after all, because in the end, the only thing that
> matters was how the value of that data was defined by the number of
> backups made of it, and how quickly and conveniently at least one of
> those backups can be retrieved and restored.
> 
> 
> Meanwhile, an admin worth the label will also know the relative risk
> associated with various options they might use, including nocow.  Knowing
> that nocow downgrades the stability rating of the storage to approximately
> the same degree that raid0 does, they'll already be aware that in such a
> case the working copy can only be defined as "throw-away" level in case
> of problems in the first place.  They will thus not even consider their
> working copy to be a permanent copy at all, just a temporary garbage
> copy, only slightly more reliable than one stored on tmpfs, and will
> instead consider the first backup thereof the true working copy, with an
> additional level of backup beyond what they'd normally have thrown in to
> account for that fact.
> 
> So in case of problems people can simply restore nocow files from a near-
> line stable working copy, much as they'd do after reboot or a umount/
> remount cycle for a file stored in tmpfs.  And if they didn't have even a
> stable working copy let alone a backup... well, much like that file in
> tmpfs, what did they expect?  They *really* defined that data as of no
> more than trivial value, didn't they?
> 
> 
> All that said, making the NOCOW warning labels a bit more bold print
> couldn't hurt; and making scrub in the nocow case at least compare copies
> and report differences, simply makes it easier for people to know they
> need to reach for that near-line stable working copy, or mkfs and start
> from scratch if they defined the data value as not worth the trouble of
> (in this case) even a stable working copy, let alone a backup, so that'd
> be a good thing too. =:^)
> 
There are two things this rant ignores though:

1. Restoring from a backup is usually slow, even with a good backup 
system.  As a specific example, where I work it takes me about 5 minutes 
just to find a single file in our backups.  Beyond that, the backup 
software has to pull together the whole archive from the individual 
pieces, decompress it, and then extract the file.  On average, for a 
file the size of a VM image, this all takes at least half an hour.

2. Backups are usually daily.  In most cases, it's much preferred to not 
lose all the day's work on a given file.

Given both points, I'd much rather be able to take 90 seconds to fix a 
file and have it probably work, with the ability to restore from a 
backup if it doesn't.  Currently, despite the fact that I know (just 
barely) enough to fix this particular type of issue by hand, I end up 
just restoring files from backup all the time, because that 30-minute 
wait is still better than the hour-plus it takes me to repair the file 
by hand.
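For what it's worth, the comparison half of that manual process can be 
sketched in a few lines.  The snippet below is a hypothetical 
illustration, not existing btrfs tooling: it assumes you have already 
dug the physical byte offset of the nodatacow extent on each device out 
of the chunk tree by hand (e.g. via `btrfs inspect-internal dump-tree`), 
and it simply reports which blocks differ between the two raid1 copies 
so you can decide which one to trust:

```python
BLOCK = 4096  # typical btrfs block size


def compare_copies(dev_a, dev_b, off_a, off_b, length, block=BLOCK):
    """Compare one extent's two raid1 copies block by block.

    dev_a/dev_b are paths to the raw devices (or plain files, for
    testing), off_a/off_b the physical byte offsets of the extent on
    each device, and length the extent length in bytes.  Returns the
    list of block indices whose contents differ between the copies.
    """
    diverged = []
    with open(dev_a, 'rb') as a, open(dev_b, 'rb') as b:
        a.seek(off_a)
        b.seek(off_b)
        for idx in range((length + block - 1) // block):
            if a.read(block) != b.read(block):
                diverged.append(idx)
    return diverged
```

If the returned list is non-empty, the copies have drifted apart and the 
file needs to be rewritten in full from whichever copy (or backup) you 
trust; the offsets here are placeholders you'd have to work out yourself, 
which is exactly the part that makes the by-hand repair so slow.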


Thread overview: 28+ messages
2018-06-28  1:42 Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files Remi Gauvin
2018-06-28  1:58 ` Qu Wenruo
2018-06-28  2:10   ` Remi Gauvin
2018-06-28  2:55     ` Qu Wenruo
2018-06-28  3:14       ` remi
2018-06-28  5:39         ` Qu Wenruo
2018-06-28  8:16           ` Andrei Borzenkov
2018-06-28  8:20             ` Andrei Borzenkov
2018-06-28  9:15             ` Qu Wenruo
2018-06-28 11:12               ` Austin S. Hemmelgarn
2018-06-28 11:46                 ` Qu Wenruo
2018-06-28 12:20                   ` Austin S. Hemmelgarn
2018-06-28 17:10               ` Andrei Borzenkov
2018-06-29  0:07                 ` Qu Wenruo
2018-06-28 22:00               ` Remi Gauvin
2018-06-28 13:24 ` Anand Jain
2018-06-28 14:17   ` Chris Murphy
2018-06-28 15:37     ` Remi Gauvin
2018-06-28 22:04       ` Chris Murphy
2018-06-28 17:37     ` Goffredo Baroncelli
2018-06-28 22:27       ` Chris Murphy
2018-06-29 15:15         ` james harvey
2018-06-29 17:09           ` Austin S. Hemmelgarn
2018-06-29 17:58             ` james harvey
2018-06-29 18:31               ` Austin S. Hemmelgarn
2018-06-30  6:33                 ` Duncan
2018-07-02 12:03                   ` Austin S. Hemmelgarn [this message]
2018-06-29 18:40           ` Chris Murphy
