Re: Unrecoverable fs corruption?

linux-btrfs.vger.kernel.org archive mirror

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Unrecoverable fs corruption?
Date: Wed, 6 Jan 2016 07:35:53 +0000 (UTC)
Message-ID: <pan$239bc$38bcd0c3$4c5a6275$65deddb4@cox.net>
In-Reply-To: 1451865902.6411.6.camel@scientia.net

Christoph Anton Mitterer posted on Mon, 04 Jan 2016 01:05:02 +0100 as
excerpted:

> On Sun, 2016-01-03 at 15:00 +0000, Duncan wrote:
>> But now that I think about it, balance does read the chunk in order
>> to rewrite its contents, and that read, like all reads, should normally
>> be checksum verified
> That was my idea.... :)
> 
>>  (except of course in the case of nodatasum, which nocow
>> of course implies).
> Though I haven't had the time so far to reply on the most recent posts
> in that thread,... I still haven't given up on the quest for
> checksumming of nodatacow'ed data ;-)

Following the lines of the btrfs-convert discussion elsewhere, I don't 
believe the current devs are much interested in this at present, tho 
maybe in the "bluesky" timeframe, beyond five years out and likely more 
like ten, because most of them consider it cost/benefit impractical to 
work on.  However, much like btrfs-convert, if a (probably new) developer 
finds this to be the particular itch he wants to scratch, puts in the 
seriously high level of effort needed to get it working, and it's all up 
to code standard, perhaps.  But it will have to overcome a pretty high 
level of skepticism, and in general it's simply not considered worth the 
incredible effort that would be necessary.  So it's going to take a 
developer with a pretty intense itch, very likely over a period of some 
years, before the code can be demonstrated theoretically correct, pass 
regression tests, and survive that skepticism well enough to be properly 
included.

IOW, not impossible, but as close as it gets.  I'd say the chances of 
seeing this in mainline (not just a series of patches carried by someone 
else) in anything under say 7 years are well under 5%, probably under 2%.  
The chances at say 15 years... maybe 15%.  (That said, if you look at ext4 
as an example, it has grown a bunch of exotic options over time, that 
most people will never use but that scratched someone's itch.  Btrfs 
could be getting similar, at 7+ years out, so it's possible, and from that 
viewpoint, some may even consider the chances near 50% at the 10 year out 
mark.  I'm skeptical, but I wouldn't have considered all those weird 
things now possible in ext4 likely to ever reach mainline ext4, either, 
so...)

But I honestly don't expect current devs to spend much time on the 
proposal, at least not within the next 7 years.

> Especially on large filesystems all these operations tend to take large
> amounts of time and may even impact the lifetime of the storage
> device(s)... so it would be clever if certain such operations could be
> kinda "merged", at least for the purposes of getting the results.
> As in the above example, if one would anyway run a full balance, the
> next scrub may be skipped because one is just doing one.
> Similar for defrag.

Well, balance definitely doesn't do defrag.  By analogy, balance is at 
the UN, nation to nation, level, while defrag is at the city precinct 
level.  They're simply out of each other's scope.

Which isn't to say that at some point in the future, there won't be some 
btrfs doitall command, that does scrub and balance and defrag and 
recompression and ... all in a single pass, taking parameters from all 
the individual functions.  But as you say, that's likely to be at least 
intermediate future, 3-5 years out, maybe 5-7 years out or more.

And like btrfs-convert, I'd consider it in the "not a core tool, but nice 
to have" category.

>> And even if balance works to verify no checksum errors, I don't believe
>> it would correct them or give you the detail on them that a scrub
>> would.
> I'd have expected that read errors are (if possible because of
> block copies) repaired as soon as they're encountered... isn't that
> the case?

(My understanding is that...) At the balance level, checksum corruption 
errors aren't going to be fixed from the other copy or from parity, 
because unlike normal file usage, the other copy isn't read.  Balance 
isn't worried about file or extent level corruption; it's simply moving 
chunks around, and any corruption it finds would be simply a byproduct of 
the normal read-time checksum verification process.  Such errors would 
thus simply cause the balance to abort, with some balance-time error that 
wouldn't necessarily even indicate that a checksum failure was the cause.

Assuming that's correct, a completed balance could be taken to mean, in 
addition, that a scrub would have completed without any errors, but a 
failed balance could have failed for any number of reasons and with any 
of various balance-level errors, with such a failure yielding little or 
no clue as to scrub status.
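
To make that inference concrete, here is a minimal Python sketch.  It only 
illustrates the reasoning above and is nothing btrfs itself provides: the 
function names and the status strings are hypothetical, and it rests on 
the same unverified assumption that balance aborts on any checksum 
failure it reads.  The only real commands it references are 
"btrfs balance start" and "btrfs scrub start -B".

```python
import subprocess

def scrub_status_from_balance(balance_returncode):
    """Say what a balance exit status implies about scrub status.

    Assumption (from the discussion, not verified): balance aborts if a
    read fails checksum verification, so only a completed balance is
    informative.
    """
    if balance_returncode == 0:
        # Everything balance read passed checksum verification.
        # nodatasum (nocow) data has no checksums, so it is not covered.
        return "no-checksum-errors-seen"
    # A failed balance may have failed for many unrelated reasons.
    return "unknown"

def next_command(balance_returncode, mountpoint):
    """Return the scrub command to run next, or None if none is needed."""
    if scrub_status_from_balance(balance_returncode) == "unknown":
        # -B keeps scrub in the foreground so its result can be inspected.
        return ["btrfs", "scrub", "start", "-B", mountpoint]
    return None

def balance_then_maybe_scrub(mountpoint):
    """Run a full balance, then a scrub only if balance failed (needs root)."""
    result = subprocess.run(["btrfs", "balance", "start", mountpoint])
    follow_up = next_command(result.returncode, mountpoint)
    if follow_up is not None:
        subprocess.run(follow_up)
```

Note that the failure branch tells you nothing by itself; it just hands 
the question to scrub, which is the tool that actually reports and 
repairs checksum errors.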

>> And if there is an error, it'd be a balance error, which might or might
>> not actually be a scrub error.
> Sure, but it shouldn't be difficult to collect e.g. scrub stats during
> balance as well.

Given that as of now they're still struggling to manage balance's memory 
requirements in order to let it scale more efficiently, and that 
scaling, particularly in the presence of large numbers of subvolumes and 
with quotas, remains the single biggest issue, the devs are extremely 
unlikely to want to be adding additional memory requirements in order 
to additionally track scrub stats.

Even once the current scaling issues are resolved, I don't see it being a 
useful option for balance itself, precisely because of scaling: by then 
the concern will be potentially embedded systems running TB-scale 
storage.  But there might indeed be some place for it in the still very 
theoretical all-in-one command you proposed and I named btrfs doitall, 
above.  Embedded-scale applications would simply not run that command, 
instead running the lower resource individual commands, while doitall 
could, say, check that it had a minimum of 16 GiB of memory or whatever to 
use, and exit with an error if not, so it could optionally be run on 
systems with the required resources.
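
For what it's worth, the memory gate itself would be the easy part.  A 
rough Python sketch, assuming a Linux system with /proc/meminfo: doitall 
remains purely hypothetical, the 16 GiB threshold is just the arbitrary 
figure from above, and the function names are made up for illustration.

```python
import sys

MIN_BYTES = 16 * 1024**3  # the 16 GiB figure floated above; arbitrary

def mem_total_bytes(meminfo_text):
    """Return MemTotal in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Format: "MemTotal:       16314380 kB" (the kernel's kB is KiB)
            return int(line.split()[1]) * 1024
    return None

def has_enough_memory(meminfo_text, min_bytes=MIN_BYTES):
    """True only if MemTotal was found and meets the threshold."""
    total = mem_total_bytes(meminfo_text)
    return total is not None and total >= min_bytes

def main():
    """Return an exit status: 1 if the machine is too small, else 0."""
    with open("/proc/meminfo") as f:
        text = f.read()
    if not has_enough_memory(text):
        print("doitall: needs at least 16 GiB of RAM; "
              "run scrub/balance/defrag individually instead",
              file=sys.stderr)
        return 1
    # The hypothetical combined pass would start here.
    return 0
```

A wrapper would call sys.exit(main()), so a small machine gets a 
non-zero exit status and can fall back to the individual commands.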

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
