public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"kernel-team@fb.com" <kernel-team@fb.com>
Subject: Re: [PATCH 3/3] btrfs: handle active zone accounting properly
Date: Thu, 2 Mar 2023 11:07:24 -0500	[thread overview]
Message-ID: <ZADJvIYBD3K9Efkl@localhost.localdomain> (raw)
In-Reply-To: <20230302144512.cc7ofuvud5dodvxi@naota-xeon>

On Thu, Mar 02, 2023 at 02:45:13PM +0000, Naohiro Aota wrote:
> On Thu, Mar 02, 2023 at 04:01:07PM +0900, Naohiro Aota wrote:
> > On Wed, Mar 01, 2023 at 04:14:44PM -0500, Josef Bacik wrote:
> > > Running xfstests on my ZNS drives uncovered a problem where I was
> > > allocating the entire device with metadata block groups.  The root cause
> > > of this was we would get ENOSPC on mount, and while trying to satisfy
> > > tickets we'd allocate more block groups.
> > > 
> > > The reason we were getting ENOSPC was because ->bytes_zone_unusable was
> > > set to 40gib, but ->active_total_bytes was set to 8gib, which was the
> > > maximum number of active block groups we're allowed to have at one time.
> > > This was because we always update ->bytes_zone_unusable with the
> > > unusable amount from every block group, but we only update
> > > ->active_total_bytes with the active block groups.
> > >
> > > This is actually much worse however, because we could potentially have
> > > other counters potentially well above the ->active_total_bytes, which
> > > would lead to this early enospc situation.
> > > 
> > > Fix this by mirroring the counters for active block groups into their
> > > own counters.  This allows us to keep the full space_info counters
> > > consistent, which is needed for things like sysfs and the space info
> > > ioctl, and then track the actual usage for ENOSPC reasons for only the
> > > active block groups.
> > 
> > I think the mirroring the counters approach duplicates the code and
> > variables and makes them complex. Instead, we can fix the
> > "active_total_bytes" accounting maybe like this:
> > 
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index d4dd73c9a701..bf4d96d74efe 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -319,7 +319,8 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
> >  	ASSERT(found);
> >  	spin_lock(&found->lock);
> >  	found->total_bytes += block_group->length;
> > -	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
> > +	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags) ||
> > +	    btrfs_zoned_bg_is_full(block_group))
> >  		found->active_total_bytes += block_group->length;
> >  	found->disk_total += block_group->length * factor;
> >  	found->bytes_used += block_group->used;
> > 
> > Or, we can remove "active_total_bytes" and introduce something like
> > "preactivated_bytes" to count the bytes of BGs never get activated (BGs #1 below).
> 
> I got a new idea. How about counting all the region in a new block group as
> zone_unusable? Then, region [0 .. zone_capacity] will be freed for use once
> it gets activated. With this, we can drop "active_total_bytes" so the code
> will become similar between the regular and the zoned mode.
> 
> We also need to care not to reclaim the fresh BGs but it's trivial
> (alloc_offset == 0).
> 

I like this idea better than mine for sure.  Will you write it up and send it
and I'll run it through my test-rig?  Thanks,

Josef

      reply	other threads:[~2023-03-02 16:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01 21:14 [PATCH 0/3] Fix active zone accounting for zoned Josef Bacik
2023-03-01 21:14 ` [PATCH 1/3] btrfs: rename BTRFS_FS_NO_OVERCOMMIT -> BTRFS_FS_ACTIVE_ZONE_TRACKING Josef Bacik
2023-03-02  6:45   ` Anand Jain
2023-03-02 10:03   ` Johannes Thumshirn
2023-03-02 14:45   ` Naohiro Aota
2023-03-01 21:14 ` [PATCH 2/3] btrfs: clean up space_info usage in btrfs_update_block_group Josef Bacik
2023-03-02  6:46   ` Anand Jain
2023-03-02 10:04   ` Johannes Thumshirn
2023-03-02 14:47   ` Naohiro Aota
2023-03-01 21:14 ` [PATCH 3/3] btrfs: handle active zone accounting properly Josef Bacik
2023-03-02  7:01   ` Naohiro Aota
2023-03-02 14:45     ` Naohiro Aota
2023-03-02 16:07       ` Josef Bacik [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZADJvIYBD3K9Efkl@localhost.localdomain \
    --to=josef@toxicpanda.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox