From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-it0-f42.google.com ([209.85.214.42]:46777 "EHLO mail-it0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933668AbeAJRBr (ORCPT ); Wed, 10 Jan 2018 12:01:47 -0500
Received: by mail-it0-f42.google.com with SMTP id c16so165222itc.5 for ; Wed, 10 Jan 2018 09:01:47 -0800 (PST)
Subject: Re: Recommendations for balancing as part of regular maintenance?
To: Tom Worster , linux-btrfs@vger.kernel.org
References: <5A539A3A.10107@gmail.com> <811ff9be-d155-dae0-8841-0c1b20c18843@cobb.uk.net> <796ad87c-852f-c6a0-7366-5e888d51fc5c@gmail.com> <01020160d7768587-50a9392c-7250-4735-9d14-66ff03a161c9-000000@eu-west-1.amazonses.com> <3eae37f6-3776-15c9-84ae-568e56abfa7e@rqc.ru> <13b5063c-a7bd-5c95-1f6e-16124d385569@gmail.com>
From: "Austin S. Hemmelgarn"
Message-ID: <3e353e79-3a13-d2cf-e098-6074a3e17918@gmail.com>
Date: Wed, 10 Jan 2018 12:01:42 -0500
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 2018-01-10 11:30, Tom Worster wrote:
> On 9 Jan 2018, at 22:49, Duncan wrote:
>
>> AFAIK, such corruption reports re balance aren't really balance, per se,
>> at all.
>>
>> Instead, what I've seen in nearly all cases is a number of filesystem
>> maintenance commands involving heavy I/O colliding, that is, being run at
>> the same time
>
> I hope there is consensus on this because it might be the key to
> resolving the contradictions that appear to me in the following
> propositions that all seem plausible/reasonable:
>
> - Depletion of unallocated space (DoUS, apologies for coining the term
> if there already is one) is a property of BTRFS even if the volume's
> capacity is more than enough for the files on it.
Strictly speaking this particular statement is only true in that there are still probably bugs in the allocator.
The goal is for this to never be a significant problem as long as you have a reasonable amount of free space (reasonable being enough for at least a couple of chunks to be allocated). Also, for future reference, the term we typically use is ENOSPC, as that's the symbolic name for the error code you get when this happens (or when your filesystem is just normally full), but I actually kind of like your name for it too; it conveys the exact condition being discussed in a way that should be a bit easier for non-technical types to understand.
>
> - To a user that isn't a BTRFS expert, DoUS can be unexpected, its
> advance can be surprisingly fast and it can become severe.
Absolutely correct, and actually true even for a number of BTRFS 'experts' (no, seriously, I know of a number of cases where this caught 'experts' (including myself) by surprise simply because they ran into a corner case they had never dealt with or found a bug in the allocator).
>
> - BTRFS does not recycle allocated but unused space to the unallocated
> pool.
Kind of. The regular BTRFS allocator will (usually) preferentially avoid using blocks of free space smaller than a given size for new allocations. Without the 'ssd' mount option set, or when using Linux kernel version 4.14 or newer, the minimum size is 64kB, so it's generally not too bad unless you are regularly dealing with lots of small files that change very frequently. With the 'ssd' mount option set on Linux kernels prior to 4.14, the minimum size is 2MB, which tends to result in really poor space utilization, though it's still mostly an issue with volumes holding lots of small files that change frequently or see lots of small changes to large files.

However, this does not mean that that space will always be unused. If space gets tight, BTRFS will use that previously allocated space to its fullest, and it will reuse it in other circumstances too.
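
To make the two thresholds above concrete, here is a minimal Python sketch of the rule exactly as described in this message. The function name and the (major, minor) version tuple are made up for illustration; this is not the kernel's allocator code, it only encodes the 64kB and 2MB figures quoted above.

```python
KIB = 1024
MIB = 1024 * KIB


def min_preferred_free_block(kernel_version, ssd_option):
    """Smallest free block the allocator usually prefers for new allocations.

    kernel_version is a hypothetical (major, minor) tuple such as (4, 13);
    ssd_option says whether the 'ssd' mount option is set.
    """
    # The 2MB threshold only applies with 'ssd' set on kernels before 4.14.
    if ssd_option and kernel_version < (4, 14):
        return 2 * MIB
    # Everything else (no 'ssd', or kernel 4.14 and newer) uses 64kB.
    return 64 * KIB
```

Free-space fragments smaller than the returned size are the ones that tend to sit unused until space gets tight, which is why the 2MB case hurts most on volumes with lots of small, frequently changing files.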
>
> - Resolving severe DoUS involves either running `btrfs balance` or
> recreating the filesystem from, e.g. backups.
In most cases, yes, though it is sometimes possible to resolve it simply by dropping snapshots (if you have a lot of them) and then deleting some files.
>
> - People have reported that `btrfs balance` sometimes causes filesystem
> corruption.
As I commented, I've not heard about this specifically, and I'm inclined to agree with Duncan's assessment that it's probably from people running multiple low-level maintenance operations concurrently (running two or more balances at the same time is known to be able to cause this type of corruption, and as a result there's locking in the kernel to prevent you from running more than one balance at a time on a filesystem).
>
> - Some experienced users say that, to resolve a problem with DoUS, they
> would rather recreate the filesystem than run balance.
This is kind of independent of BTRFS. A lot of seasoned system administrators are going to be more likely to just rebuild a broken filesystem from scratch if possible than repair it, simply because it's more reliable and generally guaranteed to fix the issue. It largely comes down to the mentality of the individual, and how confident they are that they can fix a problem in a reasonable amount of time without causing damage elsewhere.
>
> - Some experienced users say you should stop all other use of the
> filesystem while running balance.
I've never seen any evidence that this is actually needed, but it does make the balance operation finish faster. Strictly speaking, it shouldn't be needed at all (that's part of the point of having CoW semantics in the filesystem, it makes it easier to handle maintenance on-line).
>
> - Some experts recommend running balance regularly, even once a day, to
> prevent DoUS.
>
> Without some satisfactory way to resolve the contradictions, I'm not
> sure how to proceed.
> For example, I'm not willing to offload the
> workload from each filesystem once a day for prophylactic balance. And
> I'm not going to let balance run unattended if those more experienced
> than me say it's known to corrupt filesystems. The best I can do is
> monitor DoUS and respond ad hoc. Or I can use a different fs type.
It may be worth seriously looking at whether you actually _need_ BTRFS for your use case. In general, unless you need at least one of its features, and either can't get that feature with ZFS or just want to avoid using ZFS, you are likely better off for the time being using another filesystem.

In my case, for example, I _really_ want to avoid dealing with ZFS on Linux because of how it impacts what kernel versions I use and the fact that I don't trust the proprietary NVIDIA drivers to get along with it, and I need the checksumming and online transformation features (reshaping, profile conversion, device replacement, etc.) of BTRFS. If it weren't for all of that, I would not be using BTRFS at all.
>
> But if Duncan is right (which, for me, is practically the same as
> consensus on the proposition) that problems with corruption while
> running balance are associated with heavy coincident IO activity, then I
> can see a reasonable way forwards. I can even see how general
> recommendations for BTRFS maintenance might develop.
As I commented above, I would tend to believe Duncan is right in this case (both because it makes sense, and because he seems to generally be right about this type of thing). That said, I really do think that normal user I/O is probably not the issue, but low-level filesystem operations are. Even so, there is no reason that BTRFS shouldn't either:

1. Handle this just fine without causing corruption.

or:

2. Extend the mutex used to prevent concurrent balances to cover other operations that might cause issues (that is, make it so you can't scrub a filesystem while it's being balanced, or defragment it, or whatever else).
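
As a rough illustration of option 2, here is a small user-space Python sketch of one lock shared by every heavy maintenance operation on a filesystem. The class and method names are invented for this example, and it says nothing about how the kernel actually implements its balance locking; it only shows the "refuse a second operation while one is running" behaviour.

```python
import threading


class ExclusiveOps:
    """One lock shared by balance, scrub, defrag, etc. (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = None  # name of the operation currently holding the lock

    def try_start(self, name):
        # Non-blocking acquire: refuse the request instead of queueing it.
        if not self._lock.acquire(blocking=False):
            return False
        self.running = name
        return True

    def finish(self):
        # Must only be called by the operation that successfully started.
        self.running = None
        self._lock.release()


ops = ExclusiveOps()
started_balance = ops.try_start("balance")  # first operation gets the lock
started_scrub = ops.try_start("scrub")      # refused while balance is running
ops.finish()
started_scrub_later = ops.try_start("scrub")  # allowed once balance is done
```

Refusing outright (rather than blocking) mirrors what already happens if you try to start a second balance, so the user finds out immediately instead of having a scrub silently wait behind a long balance.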