public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <dlemoal@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 06/26] block: Introduce zone write plugging
Date: Mon, 5 Feb 2024 21:43:14 +0900	[thread overview]
Message-ID: <93aa5cdd-0ab4-413c-bb18-b1109c7f4f9d@kernel.org> (raw)
In-Reply-To: <ac52010e-c97b-43cf-baa7-0566185bc410@kernel.org>

On 2/5/24 21:20, Damien Le Moal wrote:
> On 2/5/24 19:06, Ming Lei wrote:
>> On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote:
>>> On 2/5/24 11:19, Ming Lei wrote:
>>>>>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
>>>>>>> +					  struct bio *bio, unsigned int nr_segs)
>>>>>>> +{
>>>>>>> +	/*
>>>>>>> +	 * Keep a reference on the BIO request queue usage. This reference will
>>>>>>> +	 * be dropped either if the BIO is failed or after it is issued and
>>>>>>> +	 * completes.
>>>>>>> +	 */
>>>>>>> +	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
>>>>>>
>>>>>> It is fragile to get nested usage_counter, and same with grabbing/releasing it
>>>>>> from different contexts or even functions, and it could be much better to just
>>>>>> let block layer maintain it.
>>>>>>
>>>>>> From patch 23's change:
>>>>>>
>>>>>> +	 * Zoned block device information. Reads of this information must be
>>>>>> +	 * protected with blk_queue_enter() / blk_queue_exit(). Modifying this
>>>>>>
>>>>>> Anytime if there is in-flight bio, the block device is opened, so both gendisk and
>>>>>> request_queue are live, so not sure if this .q_usage_counter protection
>>>>>> is needed.
>>>>>
>>>>> Hannes also commented about this. Let me revisit this.
>>>>
>>>> I think only queue re-configuration(blk_revalidate_zone) requires the
>>>> queue usage counter. Otherwise, bdev open()/close() should work just
>>>> fine.
>>>
>>> I want to check FS case though. No clear if mounting FS that supports zone
>>> (btrfs) also uses bdev open ?
>>
>> I feel the following delta change might be cleaner and easily documented:
>>
>> - one IO takes single reference for both bio based and blk-mq,
>> - no drop & re-grab
>> - only grab extra reference for bio based
>> - two code paths share same pattern
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 9520ccab3050..118dd789beb5 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -597,6 +597,10 @@ static void __submit_bio(struct bio *bio)
>>  
>>  	if (!bio->bi_bdev->bd_has_submit_bio) {
>>  		blk_mq_submit_bio(bio);
>> +	} else if (bio_zone_write_plugging(bio)) {
>> +		struct gendisk *disk = bio->bi_bdev->bd_disk;
>> +
>> +		disk->fops->submit_bio(bio);

Actually, no, that is not correct. This would not stop BIO submission if
blk_queue_freeze() was called by another context. So we cannot do that here
without calling blk_queue_enter()...

>>  	} else if (likely(bio_queue_enter(bio) == 0)) {
>>  		struct gendisk *disk = bio->bi_bdev->bd_disk;
>>  
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index f0fc61a3ec81..fc6d792747dc 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3006,8 +3006,12 @@ void blk_mq_submit_bio(struct bio *bio)
>>  	if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
>>  		goto queue_exit;
>>  
>> +	/*
>> +	 * Grab one reference for plugged zoned write and it will be reused in
>> +	 * next real submission
>> +	 */
>>  	if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
>> -		goto queue_exit;
>> +		return;

...and this one is not correct because of the cached request: if there was a
cached request, blk_mq_submit_bio() did not call blk_queue_enter() because the
cached request already had a reference. But we cannot reuse that reference as
the next BIO may be a read or a write to a zone that is not plugged, and these
would use the cached request and so need the usage counter reference. So we
would still need to grab an extra reference in such case.

So in the end, it feels a lot simpler to keep the reference counting as it was
as it makes things a lot less messier in blk_mq_submit_bio(). I will though try
to improve the comments to make it clear how this is working.

>>  
>>  	if (!rq) {
>>  new_request:
>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>> index f6d4f511b664..87abb3f7ef30 100644
>> --- a/block/blk-zoned.c
>> +++ b/block/blk-zoned.c
>> @@ -514,7 +514,8 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
>>  	 * be dropped either if the BIO is failed or after it is issued and
>>  	 * completes.
>>  	 */
>> -	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
>> +	if (bio->bi_bdev->bd_has_submit_bio)
>> +		percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
>>  
>>  	/*
>>  	 * The BIO is being plugged and thus will have to wait for the on-going
>> @@ -760,15 +761,10 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
>>  
>>  	blk_zone_wplug_unlock(zwplug, flags);
>>  
>> -	/*
>> -	 * blk-mq devices will reuse the reference on the request queue usage
>> -	 * we took when the BIO was plugged, but the submission path for
>> -	 * BIO-based devices will not do that. So drop this reference here.
>> -	 */
>> +	submit_bio_noacct_nocheck(bio);
>> +
>>  	if (bio->bi_bdev->bd_has_submit_bio)
>>  		blk_queue_exit(bio->bi_bdev->bd_disk->queue);
> 
> Hmm... As-is, this is a potential use-after-free of the bio. But I get the idea.
> This is indeed a little better. I will integrate this.
> 
>> -
>> -	submit_bio_noacct_nocheck(bio);
>>  }
>>  
>>  static struct blk_zone_wplug *blk_zone_alloc_write_plugs(unsigned int nr_zones)
>>
>> Thanks,
>> Ming
>>
>>
> 

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2024-02-05 12:43 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02  7:30 [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-02  7:30 ` [PATCH 01/26] block: Restore sector of flush requests Damien Le Moal
2024-02-04 11:55   ` Hannes Reinecke
2024-02-05 17:22   ` Bart Van Assche
2024-02-05 23:42     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 02/26] block: Remove req_bio_endio() Damien Le Moal
2024-02-04 11:57   ` Hannes Reinecke
2024-02-05 17:28   ` Bart Van Assche
2024-02-05 23:45     ` Damien Le Moal
2024-02-09  6:53     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start() Damien Le Moal
2024-02-03  4:09   ` Bart Van Assche
2024-02-04 11:58   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 04/26] block: Introduce blk_zone_complete_request_bio() Damien Le Moal
2024-02-04 11:59   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 05/26] block: Allow using bio_attempt_back_merge() internally Damien Le Moal
2024-02-03  4:11   ` Bart Van Assche
2024-02-04 12:00   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 06/26] block: Introduce zone write plugging Damien Le Moal
2024-02-04  3:56   ` Ming Lei
2024-02-04 23:57     ` Damien Le Moal
2024-02-05  2:19       ` Ming Lei
2024-02-05  2:41         ` Damien Le Moal
2024-02-05  3:38           ` Ming Lei
2024-02-05  5:11           ` Christoph Hellwig
2024-02-05  5:37             ` Damien Le Moal
2024-02-05  5:50               ` Christoph Hellwig
2024-02-05  6:14                 ` Damien Le Moal
2024-02-05 10:06           ` Ming Lei
2024-02-05 12:20             ` Damien Le Moal
2024-02-05 12:43               ` Damien Le Moal [this message]
2024-02-04 12:14   ` Hannes Reinecke
2024-02-05 17:48   ` Bart Van Assche
2024-02-05 23:48     ` Damien Le Moal
2024-02-06  0:52       ` Bart Van Assche
2024-02-02  7:30 ` [PATCH 07/26] block: Allow zero value of max_zone_append_sectors queue limit Damien Le Moal
2024-02-04 12:15   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 08/26] block: Implement zone append emulation Damien Le Moal
2024-02-04 12:24   ` Hannes Reinecke
2024-02-05  0:10     ` Damien Le Moal
2024-02-05 17:58   ` Bart Van Assche
2024-02-05 23:57     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 09/26] block: Allow BIO-based drivers to use blk_revalidate_disk_zones() Damien Le Moal
2024-02-04 12:26   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 10/26] dm: Use the block layer zone append emulation Damien Le Moal
2024-02-03 17:58   ` Mike Snitzer
2024-02-05  5:38     ` Damien Le Moal
2024-02-05 20:33       ` Mike Snitzer
2024-02-05 23:40         ` Damien Le Moal
2024-02-06 20:41           ` Mike Snitzer
2024-02-04 12:30   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 11/26] scsi: sd: " Damien Le Moal
2024-02-04 12:29   ` Hannes Reinecke
2024-02-06  1:55   ` Martin K. Petersen
2024-02-02  7:30 ` [PATCH 12/26] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature Damien Le Moal
2024-02-04 12:31   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 13/26] null_blk: " Damien Le Moal
2024-02-04 12:31   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 14/26] null_blk: Introduce zone_append_max_sectors attribute Damien Le Moal
2024-02-04 12:32   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 15/26] null_blk: Introduce fua attribute Damien Le Moal
2024-02-04 12:33   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 16/26] nvmet: zns: Do not reference the gendisk conv_zones_bitmap Damien Le Moal
2024-02-04 12:34   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 17/26] block: Remove BLK_STS_ZONE_RESOURCE Damien Le Moal
2024-02-04 12:34   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 18/26] block: Simplify blk_revalidate_disk_zones() interface Damien Le Moal
2024-02-04 12:35   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 19/26] block: mq-deadline: Remove support for zone write locking Damien Le Moal
2024-02-04 12:36   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 20/26] block: Remove elevator required features Damien Le Moal
2024-02-04 12:36   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 21/26] block: Do not check zone type in blk_check_zone_append() Damien Le Moal
2024-02-04 12:37   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 22/26] block: Move zone related debugfs attribute to blk-zoned.c Damien Le Moal
2024-02-04 12:38   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 23/26] block: Remove zone write locking Damien Le Moal
2024-02-04 12:38   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 24/26] block: Do not special-case plugging of zone write operations Damien Le Moal
2024-02-04 12:39   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 25/26] block: Reduce zone write plugging memory usage Damien Le Moal
2024-02-04 12:42   ` Hannes Reinecke
2024-02-05 17:51     ` Bart Van Assche
2024-02-05 23:55       ` Damien Le Moal
2024-02-06 21:20         ` Bart Van Assche
2024-02-09  3:58           ` Damien Le Moal
2024-02-09 19:36             ` Bart Van Assche
2024-02-10  0:06               ` Damien Le Moal
2024-02-11  3:40                 ` Bart Van Assche
2024-02-12  1:09                   ` Damien Le Moal
2024-02-12 18:58                     ` Bart Van Assche
2024-02-12  8:23                   ` Damien Le Moal
2024-02-12  8:47                     ` Damien Le Moal
2024-02-12 18:40                       ` Bart Van Assche
2024-02-13  0:05                         ` Damien Le Moal
2024-02-02  7:31 ` [PATCH 26/26] block: Add zone_active_wplugs debugfs entry Damien Le Moal
2024-02-04 12:43   ` Hannes Reinecke
2024-02-02  7:37 ` [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-03 12:11   ` Jens Axboe
2024-02-09  5:28     ` Damien Le Moal
2024-02-05 17:21 ` Bart Van Assche
2024-02-05 23:42   ` Damien Le Moal
2024-02-06  0:57     ` Bart Van Assche
2024-02-05 18:18 ` Bart Van Assche
2024-02-06  0:07   ` Damien Le Moal
2024-02-06  1:25     ` Bart Van Assche
2024-02-09  4:03       ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=93aa5cdd-0ab4-413c-bb18-b1109c7f4f9d@kernel.org \
    --to=dlemoal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox