public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 4/8] block: improve disk_zone_wplug_schedule_bio_work()
Date: Tue, 24 Feb 2026 11:03:24 +0900	[thread overview]
Message-ID: <1ce157d7-31fe-4a32-8acf-ed337dc1a52f@kernel.org> (raw)
In-Reply-To: <43213df9-9233-4d5a-8b56-77894052db67@suse.de>

On 2/23/26 8:59 PM, Hannes Reinecke wrote:
> On 2/21/26 01:44, Damien Le Moal wrote:
>> The function disk_zone_wplug_schedule_bio_work() always takes a
>> reference on the zone write plug of the BIO work being scheduled. This
>> ensure that the zone write plug cannot be freed while the BIO work is
>> being scheduled but has not run yet. However, this unconditional
>> reference taking is fragile since the reference taken is realized by the
>> BIO work blk_zone_wplug_bio_work() function, which implies that there
>> always must be a 1:1 relation between the work being scheduled and the
>> work running.
>>
>> Make this less fragile by taking the reference on the BIO work if and
>> only if the BIO work has not been already scheduled.
>>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   block/blk-zoned.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>> index c5c91f927a71..4b388ae1acaa 100644
>> --- a/block/blk-zoned.c
>> +++ b/block/blk-zoned.c
>> @@ -1205,13 +1205,15 @@ static void disk_zone_wplug_schedule_bio_work(struct
>> gendisk *disk,
>>       lockdep_assert_held(&zwplug->lock);
>>         /*
>> -     * Take a reference on the zone write plug and schedule the submission
>> -     * of the next plugged BIO. blk_zone_wplug_bio_work() will release the
>> -     * reference we take here.
>> +     * Schedule the submission of the next plugged BIO. If the zone write
>> +     * plug BIO work is not already scheduled, take a reference on the zone
>> +     * write plug to ensure that it does not go away while the work is being
>> +     * scheduled but has not run yet. blk_zone_wplug_bio_work() will release
>> +     * the reference we take here.
>>        */
>>       WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED));
>> -    refcount_inc(&zwplug->ref);
>> -    queue_work(disk->zone_wplugs_wq, &zwplug->bio_work);
>> +    if (queue_work(disk->zone_wplugs_wq, &zwplug->bio_work))
>> +        refcount_inc(&zwplug->ref);
>>   }
>>     static inline void disk_zone_wplug_add_bio(struct gendisk *disk,
> 
> Urgh. Don't.
> There is a race condition; 'bio_work' might be scheduled directly and
> complete before 'refcount_inc()' is called.
> You have to invert the statement by keeping the 'refcount_inc()'
> before calling 'queue_work()', and then drop the reference again
> if queue_work failed.
> Ugly, I know.

As Bart pointed out, since this is done with the zone write plug spinlock held
and the work takes that lock first when it runs, there is no chance that the
refcount gets bad. That is indeed not super clear though, so I will imrove the
comment to describe that.

Note: the zone write plug refcounting is a bit a mess and needs cleaning up,
which I was planning to do on top of all this.


-- 
Damien Le Moal
Western Digital Research

  parent reply	other threads:[~2026-02-24  2:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-21  0:44 [PATCH 0/8] Improve zoned (SMR) HDD write throughput Damien Le Moal
2026-02-21  0:44 ` [PATCH 1/8] block: fix zone write plug removal Damien Le Moal
2026-02-23 11:56   ` Hannes Reinecke
2026-02-23 19:30     ` Bart Van Assche
2026-02-23 20:21       ` Bart Van Assche
2026-02-24  1:57         ` Damien Le Moal
2026-02-21  0:44 ` [PATCH 2/8] block: remove BLK_ZONE_WPLUG_UNHASHED Damien Le Moal
2026-02-23 11:48   ` Hannes Reinecke
2026-02-24  2:04     ` Damien Le Moal
2026-02-21  0:44 ` [PATCH 3/8] block: remove disk_zone_is_full() Damien Le Moal
2026-02-23 11:56   ` Hannes Reinecke
2026-02-24 13:15   ` Johannes Thumshirn
2026-02-21  0:44 ` [PATCH 4/8] block: improve disk_zone_wplug_schedule_bio_work() Damien Le Moal
2026-02-23 11:59   ` Hannes Reinecke
2026-02-23 18:56     ` Bart Van Assche
2026-02-24  2:03     ` Damien Le Moal [this message]
2026-02-24 15:00       ` Hannes Reinecke
2026-02-24 15:08         ` Christoph Hellwig
2026-02-24 13:18   ` Johannes Thumshirn
2026-02-21  0:44 ` [PATCH 5/8] block: rename struct gendisk zone_wplugs_lock field Damien Le Moal
2026-02-23 12:00   ` Hannes Reinecke
2026-02-24 13:19   ` Johannes Thumshirn
2026-02-21  0:44 ` [PATCH 6/8] block: allow submitting all zone writes from a single context Damien Le Moal
2026-02-23 12:07   ` Hannes Reinecke
2026-02-24  2:00     ` Damien Le Moal
2026-02-21  0:44 ` [PATCH 7/8] block: default to QD=1 writes for blk-mq rotational zoned devices Damien Le Moal
2026-02-23 12:07   ` Hannes Reinecke
2026-02-21  0:44 ` [PATCH 8/8] Documentation: ABI: stable: document the zoned_qd1_writes attribute Damien Le Moal
2026-02-23 12:07   ` Hannes Reinecke
2026-02-23 17:03 ` [PATCH 0/8] Improve zoned (SMR) HDD write throughput Bart Van Assche
2026-02-24  1:07   ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ce157d7-31fe-4a32-8acf-ed337dc1a52f@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.de \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox