Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
Date: Sat, 30 Jun 2018 06:33:35 +0000 (UTC)
Message-ID: <pan$d561f$1f780d41$c875436f$9e055ce0@cox.net>
In-Reply-To: f76a93ba-05c8-967f-d7c0-8fdc0a618a7e@gmail.com

Austin S. Hemmelgarn posted on Fri, 29 Jun 2018 14:31:04 -0400 as
excerpted:

> On 2018-06-29 13:58, james harvey wrote:
>> On Fri, Jun 29, 2018 at 1:09 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2018-06-29 11:15, james harvey wrote:
>>>>
>>>> On Thu, Jun 28, 2018 at 6:27 PM, Chris Murphy
>>>> <lists@colorremedies.com>
>>>> wrote:
>>>>>
>>>>> And an open question I have about scrub is whether it only ever is
>>>>> checking csums, meaning nodatacow files are never scrubbed, or if
>>>>> the copies are at least compared to each other?
>>>>
>>>>
>>>> Scrub never looks at nodatacow files.  It does not compare the copies
>>>> to each other.
>>>>
>>>> Qu submitted a patch to make check compare the copies:
>>>> https://patchwork.kernel.org/patch/10434509/
>>>>
>>>> This hasn't been added to btrfs-progs git yet.
>>>>
>>>> IMO, I think the offline check should look at nodatacow copies like
>>>> this, but I still think this also needs to be added to scrub.  In the
>>>> patch thread, I discuss my reasons why.  In brief: online scanning;
>>>> this goes along with user's expectation of scrub ensuring mirrored
>>>> data integrity; and recommendations to set up scrub on periodic basis
>>>> to me means it's the place to put it.
>>>
>>> That said, it can't sanely fix things if there is a mismatch. At
>>> least,
>>> not unless BTRFS gets proper generational tracking to handle
>>> temporarily missing devices.  As of right now, sanely fixing things
>>> requires significant manual intervention, as you have to bypass the
>>> device read selection algorithm to be able to look at the state of the
>>> individual copies so that you can pick one to use and forcibly rewrite
>>> the whole file by hand.
>> 
>> Absolutely.  User would need to use manual intervention as you
>> describe, or restore the single file(s) from backup.  But, it's a good
>> opportunity to tell the user they had partial data corruption, even if
>> it can't be auto-fixed.  Otherwise they get intermittent data
>> corruption, depending on which copies are read.

> The thing is though, as things stand right now, you need to manually
> edit the data on-disk directly or restore the file from a backup to fix
> the file.  While it's technically true that you can manually repair this
> type of thing, both of the cases for doing it without those patches I
> mentioned, it's functionally impossible for a regular user to do it
> without potentially losing some data.

[Usual backups rant, user vs. admin variant, nocow/tmpfs edition.
Regulars can skip this, as the rest is already predictable from past
posts. =;^]

"Regular user"?  

"Regular users" don't need to bother with this level of detail.  They 
simply get their "admin" to do it, even if that "admin" is their kid, or 
the kid from next door that's good with computers, or the geek squad (aka 
nsa-agent-squad) guy/gal, doing it... or telling them to install "a real 
OS", meaning whatever MS/Apple/Google something that they know how to 
deal with.

If the "user" is dealing with setting nocow, choosing btrfs in the first 
place, etc, then they're _not_ a "regular user" by definition, they're 
already an admin.

And as any admin learns rather quickly, the value of data is defined by 
the number of backups it's worth having of that data.

Which means it's not a problem.  Either the data had a backup and it's 
(reasonably) trivial to restore the data from that backup, or the data 
was defined by lack of having that backup as of only trivial value, so 
low as to not be worth the time/trouble/resources necessary to make that 
backup in the first place.

Which of course means that whatever was defined as of most value, either
the data if there was a backup, or the time/trouble/resources that would
have gone into creating the backup if not, is *always* saved.

(And of course the same goes for "I had a backup, but it's old", except 
in this case it's the value of the data delta between the backup and 
current.  As soon as it's worth more than the time/trouble/hassle of 
updating the backup, it will by definition be updated.  Not having a 
newer backup available thus simply means the value of the data that 
changed between the last backup and current was simply not enough to 
justify updating the backup, and again, what was of most value is 
*always* saved, either the data, or the time that would have otherwise 
gone into making the newer backup.)

Because while a "regular user" may not know it because it's not his /job/
to know it, if there's anything an admin knows *well* it's that the
working copy of data **WILL** be damaged.  It's not a matter of if, but
of when, and of whether it'll be a fat-finger mistake, a hardware or
software failure, wetware (theft, ransomware, etc), or natural disaster
(flood, fire and the damage from the water that put it out, etc).  Tho
none of that actually matters after all, because in the end, the only
thing that matters is how the value of that data was defined by the
number of backups made of it, and how quickly and conveniently at least
one of those backups can be retrieved and restored.

Meanwhile, an admin worth the label will also know the relative risk
associated with the various options they might use, including nocow.
Knowing that nocow downgrades the stability rating of the storage to
approximately the same degree that raid0 does, they'll already be aware
that in such a case the working copy can only be defined as "throw-away"
level in case of problems in the first place.  They will thus not even
consider their working copy to be a permanent copy at all, just a
temporary garbage copy, only slightly more reliable than one stored on
tmpfs.  Instead they'll consider the first backup thereof the true
working copy, with an additional level of backup beyond what they'd
normally have, thrown in to account for that fact.

So in case of problems people can simply restore nocow files from a near-
line stable working copy, much as they'd do after reboot or a umount/
remount cycle for a file stored in tmpfs.  And if they didn't have even a 
stable working copy let alone a backup... well, much like that file in 
tmpfs, what did they expect?  They *really* defined that data as of no 
more than trivial value, didn't they?

All that said, making the NOCOW warning labels a bit more bold print
couldn't hurt.  And making scrub in the nocow case at least compare copies
and report differences would simply make it easier for people to know
they need to reach for that near-line stable working copy, or to mkfs and
start from scratch if they defined the data value as not worth the
trouble of (in this case) even a stable working copy, let alone a backup.
So that'd be a good thing too. =:^)
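
For illustration only, here is a minimal Python sketch of what "compare
copies and report differences" amounts to.  It assumes the two mirror
copies are already readable as two ordinary files, which is precisely the
part that takes manual intervention today, per the discussion quoted
above.  The function name, the 4096-byte block size and the demo file
names are made up for the example, and this is in no way how scrub itself
is implemented.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed comparison granularity, chosen for illustration


def mismatched_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the byte offsets of blocks that differ between two files.

    Without checksums there is no way to say which copy is the good one,
    so all this can do is report where the copies disagree.
    """
    bad = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                bad.append(offset)
            offset += block_size
    return bad


def demo():
    """Build two fake 'copies' that differ in their second block."""
    good = b"A" * (3 * BLOCK_SIZE)
    stale = bytearray(good)
    stale[BLOCK_SIZE + 10] = ord("B")  # simulate one copy missing a write
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "copy_a")  # hypothetical names
        path_b = os.path.join(tmp, "copy_b")
        with open(path_a, "wb") as f:
            f.write(good)
        with open(path_b, "wb") as f:
            f.write(stale)
        return mismatched_blocks(path_a, path_b)


if __name__ == "__main__":
    for off in demo():
        print(f"copies differ in block at byte offset {off}")
```

Note that the sketch only reports; it picks no winner.  That matches the
point made upthread: detection is the easy part, and the fix is still
restoring from the backup or the stable working copy.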

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
