linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Adam Borowski <kilobyte@angband.pl>
Cc: Zoltan <zoltan1980@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: Is it safe to use btrfs on top of different types of devices?
Date: Tue, 17 Oct 2017 07:26:24 -0400
Message-ID: <e8497dd4-7341-a708-7828-282001738a95@gmail.com>
In-Reply-To: <20171017011443.bupcsskm7joc73wb@angband.pl>

On 2017-10-16 21:14, Adam Borowski wrote:
> On Mon, Oct 16, 2017 at 01:27:40PM -0400, Austin S. Hemmelgarn wrote:
>> On 2017-10-16 12:57, Zoltan wrote:
>>> On Mon, Oct 16, 2017 at 1:53 PM, Austin S. Hemmelgarn wrote:
>> In an ideal situation, scrubbing should not be an 'only if needed' thing,
>> even for a regular array that isn't dealing with USB issues. From a
>> practical perspective, there's no way to know for certain if a scrub is
>> needed short of reading every single file in the filesystem in its
>> entirety, at which point, you're just better off running a scrub (because if
>> you _do_ need to scrub, you'll end up reading everything twice).
> 
>> [...]  There are three things to deal with here:
>> 1. Latent data corruption caused either by bit rot, or by a half-write (that
>> is, one copy got written successfully, then the other device disappeared
>> _before_ the other copy got written).
>> 2. Single chunks generated when the array is degraded.
>> 3. Half-raid1 chunks generated by newer kernels when the array is degraded.
> 
> Note that any of the above other than bit rot affect only very recent data.
> If we keep record of the last known-good generation, all of that can be
> enumerated, allowing us to make a selective scrub that checks only a small
> part of the disk.  A linear read of an 8TB disk takes 14 hours...
> 
> If we ever get auto-recovery, this is a fine candidate.
Indeed, and I think that generational filtering may be one of the
easier performance improvements here too.
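
A rough illustration of the generational filtering idea: if the last known-good generation is recorded, a selective scrub only needs to visit extents written after it. This is a toy model, not how btrfs stores or walks extents; the `Extent` class, its fields, and the sample numbers are all hypothetical. As noted above, it would not catch bit rot in older data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for an on-disk extent."""
    start: int       # byte offset (illustrative only)
    length: int      # size in bytes
    generation: int  # generation in which the extent was last written


def extents_to_scrub(extents, last_good_generation):
    """Keep only extents written after the last known-good generation."""
    return [e for e in extents if e.generation > last_good_generation]


def bytes_to_read(extents):
    """Total number of bytes a scrub over these extents would read."""
    return sum(e.length for e in extents)


# Made-up sample data: two old extents, two written during the bad window.
extents = [
    Extent(start=0, length=4096, generation=100),
    Extent(start=4096, length=8192, generation=101),
    Extent(start=12288, length=4096, generation=105),
    Extent(start=16384, length=16384, generation=106),
]

recent = extents_to_scrub(extents, last_good_generation=104)
print(bytes_to_read(recent), "of", bytes_to_read(extents), "bytes need checking")
```

The saving comes entirely from how little data is newer than the recorded generation, which is why this helps with disconnects but not with latent corruption.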
> 
>> Scrub will fix problem 1 because that's what it's designed to fix.  It will
>> also fix problem 3, since that behaves just like problem 1 from a
>> higher-level perspective.  It won't fix problem 2 though, as it doesn't look
>> at chunk types (only at whether the data in the chunk has the correct
>> number of valid copies).
> 
> Here not even tracking generations is required: a soft convert balance
> touches only bad chunks.  Again, would work well for auto-recovery, as it's
> a no-op if all is well.
However, it would require some minor differences from the current 
balance command, as newer kernels (are supposed to) generate half-raid1 
chunks instead of single chunks, though that can also be fixed by scrub.
> 
>> In contrast, the balance command you quoted won't fix issue 1 (because it
>> doesn't validate checksums or check that data has the right number of
>> copies), or issue 3 (because it's been told to only operate on non-raid1
>> chunks), but it will fix issue 2.
>>
>> In comparison to both of the above, a full balance without filters will fix
>> all three issues, although it will do so less efficiently (in terms of both
>> time and disk usage) than running a soft-conversion balance followed by a
>> scrub.
> 
> "less efficiently" is an understatement.  Scrub gets a good part of
> theoretical linear speed, while I just had a single metadata block take
> 14428 seconds to balance.
Yeah, the metadata especially can get pretty bad.
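
To keep the three cases straight, the coverage described in the quoted text can be written down as a small table and checked mechanically. This only encodes the claims made in this thread (issue numbers 1 to 3 as listed above); the operation names are labels chosen here, not btrfs command names.

```python
# Issues, numbered as in the list quoted earlier in this message:
#   1 = latent corruption (bit rot or half-write)
#   2 = single chunks created while degraded
#   3 = half-raid1 chunks created while degraded
ALL_ISSUES = {1, 2, 3}

# Which issues each operation fixes, per the discussion in this thread.
FIXES = {
    "scrub": {1, 3},
    "soft_convert_balance": {2},
    "full_balance": {1, 2, 3},
}


def covered(operations):
    """Return the set of issues fixed by running all the given operations."""
    result = set()
    for op in operations:
        result |= FIXES[op]
    return result


def missing(operations):
    """Return the issues left unfixed after the given operations."""
    return ALL_ISSUES - covered(operations)


print(missing(["scrub"]))
print(missing(["soft_convert_balance"]))
print(missing(["soft_convert_balance", "scrub"]))
```

The soft-conversion balance followed by a scrub covers the same set as a full balance, which is the point of preferring that pair given the cost difference.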
> 
>> In the case of normal usage, device disconnects are rare, so you should
>> generally be more worried about latent data corruption.
> 
> Yeah, but certain setups (like anything USB) get disconnected quite often.
> It would be nice to get them right.  MD thanks to write-intent bitmap can
> recover almost instantly, btrfs could do it better -- the code to do so
> isn't written yet.
The write intent bitmap is also exponentially easier to implement than
what would be needed for BTRFS.
> 
>> monitor the kernel log to watch for device disconnects, remount the
>> filesystem when the device reconnects, and then run the balance command
>> followed by a scrub.  With most hardware I've seen, USB disconnects tend to
>> be relatively frequent unless you're using very high quality cabling and
>> peripheral devices.  If, however, they happen less than once a day most of
>> the time, just set up the log monitor to remount, and set the balance and
>> scrub commands on the schedule I suggested above for normal usage.
> 
> A day-long recovery for an event that happens daily isn't a particularly
> enticing prospect.
I forget sometimes that people insist on storing large volumes of data 
on unreliable storage...
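
For anyone wanting to script the log-monitor approach quoted above, here is a minimal sketch. It assumes the kernel reports disconnects with lines of the form "usb 1-2: USB disconnect, device number 7", and it assumes the balance in question is a soft conversion to raid1 for both data and metadata (the exact command quoted earlier in the thread is not reproduced in this message). The function names and mount point are hypothetical, and the remount step is left out because how to do it depends on the setup.

```python
import re

# Assumed format of the kernel's USB disconnect message.
USB_DISCONNECT = re.compile(r"\busb \S+: USB disconnect, device number \d+")


def is_usb_disconnect(log_line):
    """Return True if a kernel log line looks like a USB disconnect event."""
    return USB_DISCONNECT.search(log_line) is not None


def recovery_commands(mountpoint):
    """Build (but do not run) the balance-then-scrub sequence.

    Assumes a raid1 profile for data and metadata. The 'soft' modifier
    makes balance skip chunks that already have the target profile.
    """
    return [
        ["btrfs", "balance", "start",
         "-dconvert=raid1,soft", "-mconvert=raid1,soft", mountpoint],
        # -B keeps scrub in the foreground so a caller can wait for it.
        ["btrfs", "scrub", "start", "-B", mountpoint],
    ]


sample_log = [
    "[12345.678901] usb 1-2: USB disconnect, device number 7",
    "[12346.000000] BTRFS info (device sdb1): disk space caching is enabled",
]

if any(is_usb_disconnect(line) for line in sample_log):
    # After the device is back and the filesystem remounted, run these
    # (for example with subprocess.run(cmd, check=True)).
    for cmd in recovery_commands("/mnt/data"):
        print(" ".join(cmd))
```

The commands are only printed here; actually running them needs root and a real btrfs mount.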
