From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from mail-it0-f43.google.com ([209.85.214.43]:45004 "EHLO
	mail-it0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753352AbdJQL0a (ORCPT );
	Tue, 17 Oct 2017 07:26:30 -0400
Received: by mail-it0-f43.google.com with SMTP id n195so5398847itg.1
	for ; Tue, 17 Oct 2017 04:26:29 -0700 (PDT)
Subject: Re: Is it safe to use btrfs on top of different types of devices?
To: Adam Borowski 
Cc: Zoltan , linux-btrfs@vger.kernel.org
References: <20171017011443.bupcsskm7joc73wb@angband.pl>
From: "Austin S. Hemmelgarn" 
Message-ID: 
Date: Tue, 17 Oct 2017 07:26:24 -0400
MIME-Version: 1.0
In-Reply-To: <20171017011443.bupcsskm7joc73wb@angband.pl>
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On 2017-10-16 21:14, Adam Borowski wrote:
> On Mon, Oct 16, 2017 at 01:27:40PM -0400, Austin S. Hemmelgarn wrote:
>> On 2017-10-16 12:57, Zoltan wrote:
>>> On Mon, Oct 16, 2017 at 1:53 PM, Austin S. Hemmelgarn wrote:
>> In an ideal situation, scrubbing should not be an 'only if needed' thing,
>> even for a regular array that isn't dealing with USB issues. From a
>> practical perspective, there's no way to know for certain whether a scrub
>> is needed short of reading every single file in the filesystem in its
>> entirety, at which point you're better off just running a scrub (because
>> if you _do_ need to scrub, you'll end up reading everything twice).
>
>> [...] There are three things to deal with here:
>> 1. Latent data corruption caused either by bit rot, or by a half-write
>>    (that is, one copy got written successfully, then the other device
>>    disappeared _before_ the other copy got written).
>> 2. Single chunks generated when the array is degraded.
>> 3. Half-raid1 chunks generated by newer kernels when the array is
>>    degraded.
>
> Note that any of the above other than bit rot affects only very recent
> data.
> If we keep a record of the last known-good generation, all of that can be
> enumerated, allowing us to make a selective scrub that checks only a small
> part of the disk. A linear read of an 8TB disk takes 14 hours...
>
> If we ever get auto-recovery, this is a fine candidate.
Indeed, and I think generational filtering may also be one of the easier
performance improvements here.
>
>> Scrub will fix problem 1 because that's what it's designed to fix. It
>> will also fix problem 3, since that behaves just like problem 1 from a
>> higher-level perspective. It won't fix problem 2 though, as it doesn't
>> look at chunk types (only at whether the data in a chunk has the correct
>> number of valid copies).
>
> Here not even tracking generations is required: a soft convert balance
> touches only bad chunks. Again, it would work well for auto-recovery, as
> it's a no-op if all is well.
However, it would require some minor differences from the current balance
command, as newer kernels (are supposed to) generate half-raid1 chunks
instead of single chunks, though those can also be fixed by scrub.
>
>> In contrast, the balance command you quoted won't fix issue 1 (because
>> it doesn't validate checksums or check that data has the right number of
>> copies), or issue 3 (because it's been told to only operate on non-raid1
>> chunks), but it will fix issue 2.
>>
>> In comparison to both of the above, a full balance without filters will
>> fix all three issues, although it will do so less efficiently (in terms
>> of both time and disk usage) than running a soft-conversion balance
>> followed by a scrub.
>
> "less efficiently" is an understatement. Scrub gets a good part of
> theoretical linear speed, while I just had a single metadata block take
> 14428 seconds to balance.
Yeah, the metadata especially can get pretty bad.
>
>> In the case of normal usage, device disconnects are rare, so you should
>> generally be more worried about latent data corruption.
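For concreteness, the soft-conversion-balance-then-scrub sequence being
compared above would look roughly like this (the mount point /mnt is a
placeholder, and the raid1 profiles assume a two-copy data+metadata array):

```shell
# Convert any chunks that do not already have the raid1 profile back to
# raid1.  The 'soft' modifier makes balance skip chunks that already match
# the target profile, so on a healthy array this is effectively a no-op.
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt

# Then validate checksums and rewrite any stale or corrupt copies; -B runs
# the scrub in the foreground so the commands complete in order.
btrfs scrub start -B /mnt
```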
>
> Yeah, but certain setups (like anything USB) get disconnects quite often.
> It would be nice to get them right. MD, thanks to its write-intent
> bitmap, can recover almost instantly; btrfs could do it better -- the
> code to do so just isn't written yet.
The write-intent bitmap is also much easier to implement than what would
be needed for BTRFS.
>
>> monitor the kernel log to watch for device disconnects, remount the
>> filesystem when the device reconnects, and then run the balance command
>> followed by a scrub. With most hardware I've seen, USB disconnects tend
>> to be relatively frequent unless you're using very high quality cabling
>> and peripheral devices. If, however, they happen less than once a day
>> most of the time, just set up the log monitor to remount, and set the
>> balance and scrub commands on the schedule I suggested above for normal
>> usage.
>
> A day-long recovery for an event that happens daily isn't a particularly
> enticing prospect.
I forget sometimes that people insist on storing large volumes of data on
unreliable storage...
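For anyone who does want to automate the remount-then-repair sequence
described above, a minimal sketch follows. The device path, mount point,
and log-message match are all assumptions to adapt for the actual array;
journalctl -kf simply follows the kernel log:

```shell
#!/bin/sh
# Hypothetical sketch: watch the kernel log for a USB disk dropping off,
# then kick off the recovery sequence once the device node reappears.
DEV=/dev/disk/by-label/backup   # assumption: adjust for your array
MNT=/mnt/backup                 # assumption: adjust for your mount point

journalctl -kf | while read -r line; do
    case "$line" in
        *"USB disconnect"*)
            # Wait for the device to come back, remount, then repair only
            # what the outage could have damaged: a soft convert balance
            # for wrong-profile chunks, and a scrub for stale copies.
            until [ -b "$DEV" ]; do sleep 5; done
            mount -o remount "$MNT"
            btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft "$MNT"
            btrfs scrub start -B "$MNT"
            ;;
    esac
done
```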