public inbox for fio@vger.kernel.org
 help / color / mirror / Atom feed
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Vincent Fu <vincentfu@gmail.com>
Subject: Re: [PATCH v2 2/8] zbd: fix write zone accounting
Date: Fri, 27 Feb 2026 12:08:59 +0000	[thread overview]
Message-ID: <aaGENdb45uNtc6C3@shinmob> (raw)
In-Reply-To: <e1c6465e-8eb5-4a4e-b323-c31bccdedad2@kernel.org>

On Feb 27, 2026 / 13:53, Damien Le Moal wrote:
> On 2/16/26 16:59, Shin'ichiro Kawasaki wrote:
> > Currently, zbd_convert_to_write_zones() calls io_u_quiesce() when the
> > number of write target zones hits one of the limits of write zones. This
> > wait by io_u_quiesce() significantly degrade the performance. While I
> > tried to remove the io_u_quiesce(), I observed that the test case 58 of
> > t/zbd/test-zbd-support failed with null_blk devices that have a
> > max_active_zones limit set.
> > 
> > The failure cause is an incorrect write target zone accounting in
> > zbd_convert_to_write_zones(). This function checks the current write
> > target zones, and selects one of them as the next write target zone.
> > After the zone selection, it locks the zone. But when the zone is
> > locked, another job might have removed the zone from the write target
> > zones array. This caused an incorrect zone accounting and the test case
> > failure.
> > 
> > To avoid the incorrect zone accounting, call zbd_write_zone_get() after
> > the selected zone gets locked. If the zone is removed from the write
> > target zones array, the function adds the zone back to the array.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > ---
> >  zbd.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/zbd.c b/zbd.c
> > index b71f842c..c511b709 100644
> > --- a/zbd.c
> > +++ b/zbd.c
> > @@ -1693,8 +1693,17 @@ retry:
> >  
> >  		zone_lock(td, f, z);
> >  		if (zbd_zone_remainder(z) >= min_bs) {
> > -			need_zone_finish = false;
> > -			goto out;
> > +			/*
> > +			 * The zone might be already removed from
> > +			 * zbdi->write_zones[] by other jobs at this moment.
> > +			 * Even if the zone has remainder, call
> > +			 * zbd_write_zone_get() to ensure that it is in the
> > +			 * array.
> > +			 */
> > +			if (zbd_write_zone_get(td, f, z)) {
> > +				need_zone_finish = false;
> > +				goto out;
> > +			}
> 
> The way I understand this is: since we do have a remainder, the zone is not
> full, so zbd_write_zone_get() cannot return false. So this looks OK, but is also
> very confusing. What about removing the if and instead use an assert checking
> that zbd_write_zone_get() returns true ?

As to the failure of the test case 58, which has one write job and one trim job,
I think your idea will work. However, I still think there is a tiny possibility
that zbd_write_zone_get() returns false due to the max_open_zone limit and what
other jobs do. As I describe below, the zone can be removed from the write
target array by the trim job. And if another write job exists and if it open
another zone in parallel, zbd_write_zone_get() may hit max_open_zones limit and
return false.

> 
> Also, it is not clear what the conditions are for a zone that is still not full
> to be removed from the array. Can you detail that ?

As to the test case 58, the condition is that there is a trim workload running
in parallel to a write workload. The trim workload can choose the zone in the
write target array, and reset the zone. After the reset, the trim worload
removes the zone from the array, then call zone_unlock(). The zone reset by the
trim zone can happen just before the zone_lock() in the hunk above. I think
write workloads with the zone_reset_threshold option can cause the same failure.

> 
> >  		}
> >  		pthread_mutex_lock(&zbdi->mutex);
> >  	}
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

  reply	other threads:[~2026-02-27 12:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-16  7:59 [PATCH v2 0/8] zbd: fix problems of random write with unaligned block size Shin'ichiro Kawasaki
2026-02-16  7:59 ` [PATCH v2 1/8] zbd: fix zone selection of random writes Shin'ichiro Kawasaki
2026-02-27  4:48   ` Damien Le Moal
2026-02-16  7:59 ` [PATCH v2 2/8] zbd: fix write zone accounting Shin'ichiro Kawasaki
2026-02-27  4:53   ` Damien Le Moal
2026-02-27 12:08     ` Shinichiro Kawasaki [this message]
2026-02-16  7:59 ` [PATCH v2 3/8] zbd: introduce write_zone_remainder option Shin'ichiro Kawasaki
2026-02-27  4:59   ` Damien Le Moal
2026-02-16  7:59 ` [PATCH v2 4/8] doc: explain the option write_zone_remainder Shin'ichiro Kawasaki
2026-02-27  5:06   ` Damien Le Moal
2026-02-16  7:59 ` [PATCH v2 5/8] t/zbd: add -m option to enable write_zone_remainder option Shin'ichiro Kawasaki
2026-02-16  7:59 ` [PATCH v2 6/8] t/zbd: avoid test case 14 failure with " Shin'ichiro Kawasaki
2026-02-16  7:59 ` [PATCH v2 7/8] t/zbd: avoid test case 33 " Shin'ichiro Kawasaki
2026-02-16  7:59 ` [PATCH v2 8/8] t/zbd: avoid test case 71 " Shin'ichiro Kawasaki
2026-02-16  9:10 ` [PATCH v2 0/8] zbd: fix problems of random write with unaligned block size fiotestbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaGENdb45uNtc6C3@shinmob \
    --to=shinichiro.kawasaki@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=fio@vger.kernel.org \
    --cc=vincentfu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox