Date: Sun, 1 Nov 2020 12:49:03 -0500
From: Zygo Blaxell
To: waxhead
Cc: Btrfs BTRFS
Subject: Re: Switching from spacecache v1 to v2
Message-ID: <20201101174902.GU5890@hungrycats.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Sat, Oct 31, 2020 at 01:27:57AM +0100, waxhead wrote:
> A couple of months ago I asked on IRC how to properly switch from version 1
> to version 2 of the space cache. I also asked if the space cache v2 was
> considered stable.
> I only remember what we talked about, and from what I understood it was not
> as easy to switch as the wiki may seem to indicate.
>
> We run a box with a btrfs filesystem at 19TB, 9 disks, 11 subvolumes that
> contains about 6.5 million files (and this number is growing).
>
> The filesystem has always been mounted with just the default options.
>
> Performance is slow, and it improved when I moved the bulk of the files to
> various subvolumes for some reason. The wiki states that performance on very
> large filesystems (what is considered large?) may degrade drastically.

The important number for space_cache=v1 performance is the number of block
groups in which some space was allocated or deallocated per transaction
(i.e. the number of block groups that have to be updated on disk),
divided by the speed of the drives (i.e. the number of seeks they can
perform per second).

"Large" could be 100GB if it was on a slow disk with a highly fragmented
workload and low latency requirement.

A 19TB filesystem has up to 19000 block groups and a spinning disk can do
maybe 150 seeks per second, so a worst-case commit could take a couple
of minutes.  Delete a few old snapshots, and you'll add enough
fragmentation to touch a significant portion of the block groups, and
thus see a lot of additional latency.

> I would like to try v2 of the space cache to see if that improves speed a
> bit.
>
> So is space cache v2 safe to use?!

AFAIK it has been 663 days since the last bug fix specific to free space
tree (a6d8654d885d "Btrfs: fix deadlock when using free space tree due
to block group creation" from 5.0).  That fix was backported to earlier
LTS kernels.

We switched to space_cache=v2 for all new filesystems back in 2016, and
upgraded our last legacy machine still running space_cache=v1 in 2019.

I have never considered going back to v1:  we have no machines running
v1, I don't run regression tests on new kernels with v1, and I've never
seen a filesystem fail in the field due to v2 (even with the bugs we
now know it had).
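
To make the worst-case commit estimate above concrete, here is a minimal
sketch of the arithmetic.  It assumes roughly one block group per GB (which
is what "19TB has up to 19000 block groups" implies) and one seek per
block group update.  The function name and parameters are hypothetical;
this is only an upper-bound illustration, not a model of real btrfs commit
behaviour.

```python
def worst_case_commit_seconds(fs_size_gb, seeks_per_second, gb_per_block_group=1.0):
    """Upper bound on commit time: every block group is dirty and
    each block group update costs one seek (both are assumptions)."""
    block_groups = fs_size_gb / gb_per_block_group
    return block_groups / seeks_per_second


# 19TB filesystem on a spinning disk doing about 150 seeks per second.
seconds = worst_case_commit_seconds(19000, 150)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # prints "127 s (~2.1 min)"
```

A real commit only pays for the block groups actually touched in that
transaction, so the useful input is the dirty block group count rather
than the total.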

IMHO the real question is "is v1 safe to use", given that its design is
based on letting errors happen, then detecting and recovering from them
after they occur (this is the mechanism behind the ubiquitous "failed to
load free space cache for block group %llu, rebuilding it now" message).
v2 prevents the errors from happening in the first place by using the
same btrfs metadata update mechanisms that are used for everything else
in the filesystem.

The problems in v1 may be mostly theoretical.  I've never cared enough
about v1 to try a practical experiment to see if btrfs recovers from
these problems correctly (or not).  v2 doesn't have those problems even
in theory, and it works, so I use v2 instead.

> And
> How do I make the switch properly?

Unmount the filesystem, mount it once with -o clear_cache,space_cache=v2.
It will take some time to create the tree.  After that, no mount option
is needed.

With current kernels it is not possible to upgrade while the filesystem is
online, i.e. to upgrade "/" you have to set rootflags in the bootloader
or boot from external media.  That and the long mount time to do the
conversion (which offends systemd's default mount timeout parameters)
are the two major gotchas.

There are some patches for future kernels that will take care of details
like deleting the v1 space cache inodes and other inert parts of the
space_cache=v1 infrastructure.  I would not bother with these now, and
instead let future kernels clean up automatically.
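
The conversion steps above can be sketched as a small helper that only
builds the commands and the kernel parameter, without running anything.
The device /dev/sdX, the mount point /mnt/data and the function names are
placeholders chosen for illustration; substitute your own, and expect the
one-time mount to take a while on a large filesystem.

```python
import shlex

# Mount options for the one-time conversion mount, as given in the text.
V2_OPTIONS = "clear_cache,space_cache=v2"


def conversion_commands(device, mountpoint):
    """Return argv lists: unmount first, then mount once with the v2 options."""
    return [
        ["umount", mountpoint],
        ["mount", "-o", V2_OPTIONS, device, mountpoint],
    ]


def root_kernel_parameter():
    """Kernel command-line parameter for converting "/" from the bootloader."""
    return f"rootflags={V2_OPTIONS}"


# Print the commands for review instead of executing them.
for argv in conversion_commands("/dev/sdX", "/mnt/data"):
    print(shlex.join(argv))
print(root_kernel_parameter())
```

Printing rather than executing is deliberate: the filesystem has to be
offline for the conversion, so the commands are meant to be reviewed and
run by hand (or from rescue media for "/").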