From: Filipe Manana
Date: Thu, 6 Oct 2022 10:48:33 +0100
Subject: Re: [PATCH 1/5] btrfs: 1G falloc extents
To: Qu Wenruo
Cc: Boris Burkov, linux-btrfs@vger.kernel.org, kernel-team@fb.com

On Thu, Oct 6, 2022 at 9:06 AM Qu Wenruo wrote:
>
> On 2022/10/6 03:49, Boris Burkov wrote:
> > When doing a large fallocate, btrfs will break it up into 256MiB
> > extents. Our data block groups are 1GiB, so a more natural maximum size
> > is 1GiB, so that we tend to allocate and fully use block groups rather
> > than fragmenting the file around.
> >
> > This is especially useful if large fallocates tend to be for "round"
> > amounts, which strikes me as a reasonable assumption.
> >
> > While moving to size classes reduces the value of this change, it is
> > also good to compare potential allocator algorithms against just 1G
> > extents.
>
> Btrfs extent bookkeeping is already causing a lot of wasted space, is this
> larger extent size really a good idea?
>
> E.g.
> after a lot of random writes, we may have only a very small part of
> the original 1G still being referenced.
> (The first write into the pre-allocated range will not be COWed, but the
> next one over the same range will be COWed)
>
> But the full 1G can only be freed if none of its sectors is still
> referenced. Thus this would make preallocated space much harder to free,
> and snapshots/reflinks can make it even worse.
>
> So wouldn't such an enlarged preallocated extent size cause more pressure?

I agree, increasing the max extent size here does not seem like a good
thing to do.

If an application fallocates space, then it generally expects to write to
all that space. However, future random partial writes may not rewrite the
entire extent for a very long time, therefore making us keep a 1G extent
for a very long time (or forever in the worst case).

Even for NOCOW files, it's still an issue if snapshots are used.

> In fact, the original 256M is already too large for me.
>
> Thanks,
> Qu
>
> > Signed-off-by: Boris Burkov
> > ---
> >   fs/btrfs/inode.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 45ebef8d3ea8..fd66586ae2fc 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -9884,7 +9884,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
> >   	if (trans)
> >   		own_trans = false;
> >   	while (num_bytes > 0) {
> > -		cur_bytes = min_t(u64, num_bytes, SZ_256M);
> > +		cur_bytes = min_t(u64, num_bytes, SZ_1G);
> >   		cur_bytes = max(cur_bytes, min_size);
> >   		/*
> >   		 * If we are severely fragmented we could end up with really
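For anyone who wants a quick user-space view of the resulting extent
layout, below is a minimal sketch (not part of the patch; the path
/mnt/btrfs/prealloc is only an assumption) that preallocates a 4 GiB file
and prints the extents reported by FIEMAP. Note that FIEMAP may merge
physically contiguous extents, so the number of reported extents can be
smaller than the number of allocator iterations.

/*
 * Sketch: fallocate 4 GiB and list the extents via FIEMAP, to observe how
 * close the extents come to the 256 MiB (old) vs 1 GiB (new) cap used by
 * __btrfs_prealloc_file_range().
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(void)
{
	/* Hypothetical scratch file; it must live on a btrfs mount. */
	const char *path = "/mnt/btrfs/prealloc";
	const off_t len = 4LL << 30;	/* 4 GiB */
	int fd = open(path, O_CREAT | O_RDWR, 0644);

	if (fd < 0 || fallocate(fd, 0, 0, len) < 0) {
		perror("fallocate");
		return 1;
	}

	/* Room for up to 64 extent records, plenty for a 4 GiB prealloc. */
	size_t sz = sizeof(struct fiemap) + 64 * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);

	fm->fm_start = 0;
	fm->fm_length = len;
	fm->fm_flags = FIEMAP_FLAG_SYNC;
	fm->fm_extent_count = 64;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FIEMAP");
		return 1;
	}
	for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
		printf("extent %u: logical %llu, len %llu MiB\n", i,
		       (unsigned long long)fm->fm_extents[i].fe_logical,
		       (unsigned long long)fm->fm_extents[i].fe_length >> 20);

	free(fm);
	close(fd);
	return 0;
}

Running it before and after the change (filefrag -v reports the same
information) shows whether a large fallocate ends up as ~256 MiB pieces or
as 1 GiB pieces.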