public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Alex Lyakas <alex.lyakas@zadara.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH-RFC] btrfs: for unclustered allocation don't consider ffe_ctl->empty_cluster
Date: Mon, 2 Mar 2026 11:43:59 -0800	[thread overview]
Message-ID: <20260302194359.GA1173790@zen.localdomain> (raw)
In-Reply-To: <CAOcd+r31fvMHskN93LNjbXrhpL0hF1kZd6rkVuhcEAgXHtcW-A@mail.gmail.com>

On Sun, Mar 01, 2026 at 12:14:09PM +0200, Alex Lyakas wrote:
> Hi Boris,
> 
> On Thu, Feb 26, 2026 at 8:28 PM Boris Burkov <boris@bur.io> wrote:
> >
> > On Thu, Feb 26, 2026 at 09:37:46AM -0800, Boris Burkov wrote:
> > > On Thu, Feb 26, 2026 at 01:34:19PM +0200, Alex Lyakas wrote:
> > > > I encountered an issue when performing unclustered allocation for metadata:
> > > >
> > > > free_space_ctl->free_space = 2MB
> > > > ffe_ctl->num_bytes = 4096
> > > > ffe_ctl->empty_cluster = 2MB
> > > > ffe_ctl->empty_size = 0
> > > >
> > > > So early check for free_space_ctl->free_space skips the block group, even though
> > > > it has enough space to do the allocation.
> > > > I think when doing unclustered allocation we should not look at ffe_ctl->empty_cluster.
> > >
> > > I see what you are saying, and a the level of this situation and this
> > > line of code, I think you are right.
> > >
> > > But as-is, this change does not contain enough reasoning about the
> > > semantics of the allocation algorithm to realistically be anything
> > > but exchanging one bug for two new ones.
> > >
> 
> Thank you for your review and your comments below.
> 
> However, I am not sure what is it that you are actually requesting.
> Do you expect me to describe the whole mechanics of find_free_extent()
> and how its state is tracked through ffe_ctl?

To be confident that this change does not introduce a different,
potentially more serious, bug, it is important to demonstrate why this
transformation is reasonable, safe, and fits the existing model. That
implies some level of explanation of the mechanisms of the existing
model.

> 
> > > What is empty_cluster modelling? Why is it included in this calculation?
> > > Why should it not be included? Where else is empty_cluster used and
> > > should it change under this new interpretation? Does any of this change
> > > if there is actually a cluster active for clustered allocations?
> I am not changing the meaning of any field of ffe_ctl, which tracks
> the state of every call to find_free_extent().
> 
> I found an issue where "empty_cluster" is used erroneously in my opinion.
> 

At this point, in my opinion, you have a useful insight and a good lead
for a fix, but have not done enough work to justify the change you want
to make.

> According to the code, "empty_cluster" describes the minimal amount of
> free space in the block group, which we are considering to allocate a
> cluster from. I am not changing the meaning of it.
> 
> My fix is suggesting that "empty_cluster" should not be used when
> looking at free space in a block group during unclustered allocation.
> Nothing else is changed. In the issue that I saw, this bug prevented
> us from allocating a metadata block from a block group, when the block
> group still had enough space to make a 4Kb allocation.

At a minimum level, I would like to see an analysis of why the
empty_cluster may have been included in the calculation and some
reasoning about why excluding it ought to be safe and not cause issues
in other situations. Aside from improving the behavior in the conditions
you observed.

i.e., a general demonstration of an understanding of how the change fits
into the code at a wider level than just changing the one thing you care
about right now.

> 
> I tagged the patch as RFC, because I am unable to test it on the
> latest mainline kernel, and I tested it on a stable LTS kernel 6.6.

Why are you unable to test it on the latest mainline kernel? I can run
it through fstests for you, if that would be helpful.

> 
> Thanks,
> Alex.

Thanks,
Boris

> 
> 
> 
> 
> 
> > >
> > > etc. etc.
> > >
> > > Thanks,
> > > Boris
> > >
> >
> > I missed the RFC tag in the patch, so I would like to apologize for my
> > negativity. In my experience with other bugs, the interplay between the
> > clustered algorithm and the unclustered algorithm is under-specified so
> > I think it is likely you have indeed found a bug.
> >
> > If you want to fix it, I would proceed along the lines I complained
> > about in my first response and try to define the relationships between
> > the variables in a consistent way that explains why we shouldn't count
> > that variable here.
> >
> > If you do go through with sharpening the definition of empty_cluster,
> > I would be happy to review that work and help get it in.
> >
> 
> 
> 
> > Thanks,
> > Boris
> >
> > > >
> > > > I tested this on stable kernel 6.6.
> > > >
> > > > Signed-off-by: Alex Lyakas <alex.lyakas@zadara.com>
> > > > ---
> > > >  fs/btrfs/extent-tree.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > index 03cf9f242c70..84b340a67882 100644
> > > > --- a/fs/btrfs/extent-tree.c
> > > > +++ b/fs/btrfs/extent-tree.c
> > > > @@ -3885,7 +3885,7 @@ static int find_free_extent_unclustered(struct btrfs_block_group *bg,
> > > >             free_space_ctl = bg->free_space_ctl;
> > > >             spin_lock(&free_space_ctl->tree_lock);
> > > >             if (free_space_ctl->free_space <
> > > > -               ffe_ctl->num_bytes + ffe_ctl->empty_cluster +
> > > > +               ffe_ctl->num_bytes +
> > > >                 ffe_ctl->empty_size) {
> > > >                     ffe_ctl->total_free_space = max_t(u64,
> > > >                                     ffe_ctl->total_free_space,
> > > > --
> > > > 2.43.0
> > > >

  reply	other threads:[~2026-03-02 19:43 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26 11:34 [PATCH-RFC] btrfs: for unclustered allocation don't consider ffe_ctl->empty_cluster Alex Lyakas
2026-02-26 17:37 ` Boris Burkov
2026-02-26 18:29   ` Boris Burkov
2026-03-01 10:14     ` Alex Lyakas
2026-03-02 19:43       ` Boris Burkov [this message]
2026-03-02 21:49         ` Boris Burkov
2026-03-02 21:59           ` Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260302194359.GA1173790@zen.localdomain \
    --to=boris@bur.io \
    --cc=alex.lyakas@zadara.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox