From: Boris Burkov <boris@bur.io>
To: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
David Sterba <dsterba@suse.com>,
Hans Holmberg <Hans.Holmberg@wdc.com>,
Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 7/7] btrfs: zoned: add RECLAIM_ZONES and RESET_ZONES to first async reclaim loop
Date: Fri, 15 May 2026 11:38:27 -0700
Message-ID: <20260515183827.GE1197064@zen.localdomain>
In-Reply-To: <20260513123445.43197-8-johannes.thumshirn@wdc.com>
On Wed, May 13, 2026 at 02:34:45PM +0200, Johannes Thumshirn wrote:
> On zoned filesystems, when waiting for space tickets during data
> relocation, the async reclaim flush state machine may starve if
> RECLAIM_ZONES and RESET_ZONES states are not executed early in the flush
> sequence.
>
> Currently do_async_reclaim_data_space() only executes RECLAIM_ZONES and
> RESET_ZONES in later flush states (FLUSH_DELALLOC and beyond), but by
> the time these states are reached, the ticket wait may have already
> deadlocked waiting for space that can only be freed by zone reset.

This explanation is a bit confusing to me. Does your previous fix
prevent all known deadlocks? If not, can you describe the remaining
deadlock in more detail? If having these flush states in the general
flush state list causes a deadlock, we should not leave them there, even
if we add this earlier pass.

I assume the issue is that some other flusher can't make progress when
we are out of zones and also lands on a ticket, so async reclaim is
stuck on a ticket and the only way to make progress is to reset a zone
(hopefully) or reclaim a zone (painfully?).

Maybe we need some high level flushing logic like "needs zoned help now
please"? I.e., if we are low on or out of free zones, do only zone
flushing, otherwise do regular flushing (including zoned stuff if
necessary/wise?).
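
To make the "needs zoned help now" idea a bit more concrete, here is a
toy userspace sketch. Every name in it (toy_flush_mode, toy_zone_stats,
the low_watermark field) is made up for illustration and none of it is
existing btrfs API; where the threshold would come from is an open
question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flush modes; not the real enum btrfs_flush_state. */
enum toy_flush_mode { TOY_FLUSH_ZONED_ONLY, TOY_FLUSH_REGULAR };

/* Hypothetical per-filesystem zone accounting. */
struct toy_zone_stats {
	unsigned int free_zones;    /* zones currently free to allocate */
	unsigned int low_watermark; /* assumed "we are low on zones" threshold */
};

/*
 * Pick what kind of flushing to do: only zone reset/reclaim when a zoned
 * fs is at or below the watermark, regular flushing in every other case.
 */
static enum toy_flush_mode toy_pick_flush_mode(bool is_zoned,
					       const struct toy_zone_stats *zs)
{
	assert(zs);
	if (is_zoned && zs->free_zones <= zs->low_watermark)
		return TOY_FLUSH_ZONED_ONLY;
	return TOY_FLUSH_REGULAR;
}
```

With free_zones = 1 and low_watermark = 2 on a zoned fs this picks the
zoned-only mode; a non-zoned fs always gets regular flushing.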
>
> Fix this by adding RECLAIM_ZONES and RESET_ZONES to the first async
> reclaim loop (FLUSH_ALLOC) for zoned filesystems, ensuring zone reset
> happens early enough to free space for pending allocation tickets.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>
> This patch was AI assisted and I'm not sure this is the correct thing to
> do (the flushing, not the use of AI), hence the RFC tag.
>
> fs/btrfs/space-info.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index ec811a77ebb1..a1235f114f3e 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1451,6 +1451,17 @@ static void do_async_reclaim_data_space(struct btrfs_space_info *space_info)
>
>  	while (!space_info->full) {
>  		flush_space(space_info, U64_MAX, ALLOC_CHUNK_FORCE, false);
> +		/*
> +		 * For zoned filesystems, also run RECLAIM_ZONES and RESET_ZONES
> +		 * in the first loop to avoid starvation. Zoned filesystems have
> +		 * sequential write requirements, so space cannot be reused until
> +		 * zones are reset. Running these states early ensures zones are
> +		 * reclaimed and reset before we get into a starvation situation.
> +		 */
> +		if (btrfs_is_zoned(fs_info)) {
> +			flush_space(space_info, U64_MAX, RECLAIM_ZONES, false);
> +			flush_space(space_info, U64_MAX, RESET_ZONES, false);
> +		}

Just to set a common ground on the existing algorithm:

The current logic is to allocate a chunk (which may satisfy tickets),
then check if we have any tickets left. If no tickets are left, great,
we're done. Otherwise, allocate another chunk (until the space_info is
full). Finally, go into the various flushers once we can't allocate a
chunk.
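
As a toy userspace model of that loop (not the kernel code: the struct,
its fields and the tickets_per_chunk simplification are all invented
here, and locking is left out entirely):

```c
#include <assert.h>

/* Toy stand-in for a data space_info; all fields are hypothetical. */
struct toy_space_info {
	unsigned int unallocated_chunks; /* chunks we can still allocate */
	unsigned int pending_tickets;    /* waiters not yet satisfied */
	unsigned int tickets_per_chunk;  /* tickets one new chunk satisfies */
};

enum toy_result { TOY_TICKETS_DONE, TOY_NEEDS_FLUSHING };

static enum toy_result toy_reclaim_data_space(struct toy_space_info *si)
{
	assert(si);
	if (si->pending_tickets == 0)
		return TOY_TICKETS_DONE;

	/* Stands in for: while (!space_info->full) */
	while (si->unallocated_chunks > 0) {
		/* Models the ALLOC_CHUNK_FORCE step. */
		si->unallocated_chunks--;
		if (si->pending_tickets > si->tickets_per_chunk)
			si->pending_tickets -= si->tickets_per_chunk;
		else
			si->pending_tickets = 0;

		/* Tickets are re-checked after every chunk allocation. */
		if (si->pending_tickets == 0)
			return TOY_TICKETS_DONE;
	}

	/* No more chunks: the caller falls through to the flush states. */
	return TOY_NEEDS_FLUSHING;
}
```

The point of the model is only the shape: chunk allocation is retried
until either the tickets are gone or the space is full, and nothing else
runs in between.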

The way you have changed it, you tack on the two zoned specific flushes
right after allocating a chunk, regardless of the continued presence of
tickets. That feels off to me. I don't know enough about zoned to
accurately judge how you want to order it, but I think the question you
want to answer for yourself is:

  If there are totally free bgs on a zoned fs, do I want to run
  reclaim/reset zones before or after allocating them?

Given the fixed number of zones, I would assume reset zones at least
should come before grabbing a fresh bg? (Unless that fails in a free
zone aware way?)

OTOH, if you put zoned reclaim before chunk alloc, we may block data
allocations on pretty expensive reclaim work when we could just make
progress now by allocating a chunk.
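
One ordering that would be consistent with both of those guesses is
reset, then chunk alloc, then reclaim. The sketch below only spells out
that order as data; the toy_step names are invented, and this is not a
recommendation, since the zoned side may have reasons to order it
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names, loosely mirroring the flush states above. */
enum toy_step { TOY_RESET_ZONES, TOY_ALLOC_CHUNK, TOY_RECLAIM_ZONES };

/*
 * Fill steps[] with one possible per-iteration order and return how many
 * entries were written. Reset goes before chunk allocation, the expensive
 * reclaim goes after it so it does not block an allocation that could
 * succeed right away.
 */
static size_t toy_plan_steps(bool is_zoned, enum toy_step steps[3])
{
	size_t n = 0;

	assert(steps);
	if (is_zoned)
		steps[n++] = TOY_RESET_ZONES;
	steps[n++] = TOY_ALLOC_CHUNK;
	if (is_zoned)
		steps[n++] = TOY_RECLAIM_ZONES;
	return n;
}
```

On a non-zoned fs this degenerates to the single chunk allocation step.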

Long term, I am planning to refactor space flushing to try to make the
separate work less sequential and driven by the demand for the
particular type of flushing, but that is way longer term than your
immediate need. I am just saying that to hopefully make the pain of the
"ordering" aspect a bit more clear in a greater context: it's not zoned
specific. (E.g., it's bad to keep running delalloc first if we have a
bunch of ordered extents out and should instead run delayed refs or
commit a txn to unpin.)
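
Very roughly, "driven by demand" could mean something like the toy
selector below. It is a userspace sketch under made-up assumptions: the
toy_flusher enum and the per-flusher reclaimable byte counters do not
exist in btrfs, and the real refactor may look nothing like this.

```c
#include <assert.h>

/* Hypothetical flusher kinds; TOY_FLUSH_NR is just the array size. */
enum toy_flusher {
	TOY_FLUSH_DELALLOC,
	TOY_FLUSH_DELAYED_REFS,
	TOY_FLUSH_COMMIT_TRANS,
	TOY_FLUSH_NR,
};

/* Assumed estimate of bytes each flusher could free right now. */
struct toy_demand {
	unsigned long long reclaimable[TOY_FLUSH_NR];
};

/*
 * Pick the flusher with the largest estimate instead of walking a fixed
 * list. Ties keep the earlier entry, so an all-zero estimate falls back
 * to delalloc.
 */
static enum toy_flusher toy_pick_flusher(const struct toy_demand *d)
{
	enum toy_flusher best = TOY_FLUSH_DELALLOC;

	assert(d);
	for (int i = 1; i < TOY_FLUSH_NR; i++) {
		if (d->reclaimable[i] > d->reclaimable[best])
			best = (enum toy_flusher)i;
	}
	return best;
}
```

With most of the reclaimable space pinned, this picks the transaction
commit directly rather than running delalloc first.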
>  		spin_lock(&space_info->lock);
>  		if (list_empty(&space_info->tickets)) {
>  			space_info->flush = false;
> --
> 2.54.0
>