All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	ming.lei@redhat.com
Subject: Re: [PATCH 06/26] block: Introduce zone write plugging
Date: Sun, 4 Feb 2024 11:56:18 +0800	[thread overview]
Message-ID: <Zb8K4uSN3SNeqrPI@fedora> (raw)
In-Reply-To: <20240202073104.2418230-7-dlemoal@kernel.org>

On Fri, Feb 02, 2024 at 04:30:44PM +0900, Damien Le Moal wrote:
> Zone write plugging implements a per-zone "plug" for write operations to
> tightly control the submission and execution order of writes to
> sequential write required zones of a zoned block device. Per-zone
> plugging of writes guarantees that at any time at most one write request
> per zone is in flight. This mechanism is intended to replace zone write
> locking which is controlled at the scheduler level and implemented only
> by mq-deadline.
> 
> Unlike zone write locking which operates on requests, zone write
> plugging operates on BIOs. A zone write plug is simply a BIO list that
> is atomically manipulated using a spinlock and a kblockd submission
> work. A write BIO to a zone is "plugged" to delay its execution if a
> write BIO for the same zone was already issued, that is, if a write
> request for the same zone is being executed. The next plugged BIO is
> unplugged and issued once the write request completes.
> 
> This mechanism allows to:
>  - Untangles zone write ordering from block IO schedulers. This allows
>    removing the restriction on using only mq-deadline for zoned block
>    devices. Any block IO scheduler, including "none" can be used.
>  - Zone write plugging operates on BIOs instead of requests. Plugged
>    BIOs waiting for execution thus do not hold scheduling tags and thus
>    are not preventing other BIOs to proceed (reads or writes to other
>    zones). Depending on the workload, this can significantly improve
>    the device use and performance.
>  - Both blk-mq (request) based zoned devices and BIO-based devices (e.g.
>    device mapper) can use zone write plugging. It is mandatory for the
>    former but optional for the latter: BIO-based driver can use zone
>    write plugging to implement write ordering guarantees, or the drivers
>    can implement their own if needed.
>  - The code is less invasive in the block layer and is mostly limited to
>    blk-zoned.c with some small changes in blk-mq.c, blk-merge.c and
>    bio.c.
> 
> Zone write plugging is implemented using struct blk_zone_wplug. This
> structurei includes a spinlock, a BIO list and a work structure to
> handle the submission of plugged BIOs.
> 
> Plugging of zone write BIOs is done using the function
> blk_zone_write_plug_bio() which returns false if a BIO execution does
> not need to be delayed and true otherwise. This function is called
> from blk_mq_submit_bio() after a BIO is split to avoid large BIOs
> spanning multiple zones which would cause mishandling of zone write
> plugging. This enables by default zone write plugging for any mq
> request-based block device. BIO-based device drivers can also use zone
> write plugging by expliclty calling blk_zone_write_plug_bio() in their
> ->submit_bio method. For such devices, the driver must ensure that a
> BIO passed to blk_zone_write_plug_bio() is already split and not
> straddling zone boundaries.
> 
> Only write and write zeroes BIOs are plugged. Zone write plugging does
> not introduce any significant overhead for other operations. A BIO that
> is being handled through zone write plugging is flagged using the new
> BIO flag BIO_ZONE_WRITE_PLUGGING. A request handling a BIO flagged with
> this new flag is flagged with the new RQF_ZONE_WRITE_PLUGGING flag.
> The completion processing of BIOs and requests flagged trigger
> respectively calls to the functions blk_zone_write_plug_bio_endio() and
> blk_zone_write_plug_complete_request(). The latter function is used to
> trigger submission of the next plugged BIO using the zone plug work.
> blk_zone_write_plug_bio_endio() does the same for BIO-based devices.
> This ensures that at any time, at most one request (blk-mq devices) or
> one BIO (BIO-based devices) are being executed for any zone. The
> handling of zone write plug using a per-zone plug spinlock maximizes
> parrallelism and device usage by allowing multiple zones to be writen
> simultaneously without lock contention.
> 
> Zone write plugging ignores flush BIOs without data. Hovever, any flush
> BIO that has data is always plugged so that the write part of the flush
> sequence is serialized with other regular writes.
> 
> Given that any BIO handled through zone write plugging will be the only
> BIO in flight for the target zone when it is executed, the unplugging
> and submission of a BIO will have no chance of successfully merging with
> plugged requests or requests in the scheduler. To overcome this
> potential performance loss, blk_mq_submit_bio() calls the function
> blk_zone_write_plug_attempt_merge() to try to merge other plugged BIOs
> with the one just unplugged. Successful merging is signaled using
> blk_zone_write_plug_bio_merged(), called from bio_attempt_back_merge().
> Furthermore, to avoid recalculating the number of segments of plugged
> BIOs to attempt merging, the number of segments of a plugged BIO is
> saved using the new struct bio field __bi_nr_segments. To avoid growing
> the size of struct bio, this field is added as a union with the
> bio_cookie field. This is safe to do as polling is always disabled for
> plugged BIOs.
> 
> When BIOs are plugged in a zone write plug, the device request queue
> usage counter is always incremented. This kept and reused when the
> plugged BIO is unplugged and submitted again using
> submit_bio_noacct_nocheck(). For this case, the unplugged BIO is already
> flagged with BIO_ZONE_WRITE_PLUGGING and blk_mq_submit_bio() proceeds
> directly to allocating a new request for the BIO, re-using the usage
> reference count taken when the BIO was plugged. This extra reference
> count is dropped in blk_zone_write_plug_attempt_merge() for any plugged
> BIO that is successfully merged. Given that BIO-based devices will not
> take this path, the extra reference is dropped when a plugged BIO is
> unplugged and submitted.
> 
> To match the new data structures used for zoned disks, the function
> disk_free_zone_bitmaps() is renamed to the more generic
> disk_free_zone_resources().
> 
> This commit contains contributions from Christoph Hellwig <hch@lst.de>.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  block/bio.c               |   7 +
>  block/blk-merge.c         |  11 +
>  block/blk-mq.c            |  28 +++
>  block/blk-zoned.c         | 408 +++++++++++++++++++++++++++++++++++++-
>  block/blk.h               |  32 ++-
>  block/genhd.c             |   2 +-
>  include/linux/blk-mq.h    |   2 +
>  include/linux/blk_types.h |   8 +-
>  include/linux/blkdev.h    |   8 +
>  9 files changed, 496 insertions(+), 10 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index b9642a41f286..c8b0f7e8c713 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1581,6 +1581,13 @@ void bio_endio(struct bio *bio)
>  	if (!bio_integrity_endio(bio))
>  		return;
>  
> +	/*
> +	 * For BIOs handled through a zone write plugs, signal the end of the
> +	 * BIO to the zone write plug to submit the next plugged BIO.
> +	 */
> +	if (bio_zone_write_plugging(bio))
> +		blk_zone_write_plug_bio_endio(bio);
> +
>  	rq_qos_done_bio(bio);
>  
>  	if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index a1ef61b03e31..2b5489cd9c65 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -377,6 +377,7 @@ struct bio *__bio_split_to_limits(struct bio *bio,
>  		blkcg_bio_issue_init(split);
>  		bio_chain(split, bio);
>  		trace_block_split(split, bio->bi_iter.bi_sector);
> +		WARN_ON_ONCE(bio_zone_write_plugging(bio));
>  		submit_bio_noacct(bio);
>  		return split;
>  	}
> @@ -980,6 +981,9 @@ enum bio_merge_status bio_attempt_back_merge(struct request *req,
>  
>  	blk_update_mixed_merge(req, bio, false);
>  
> +	if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> +		blk_zone_write_plug_bio_merged(bio);
> +
>  	req->biotail->bi_next = bio;
>  	req->biotail = bio;
>  	req->__data_len += bio->bi_iter.bi_size;
> @@ -995,6 +999,13 @@ static enum bio_merge_status bio_attempt_front_merge(struct request *req,
>  {
>  	const blk_opf_t ff = bio_failfast(bio);
>  
> +	/*
> +	 * A front merge for zone writes can happen only if the user submitted
> +	 * writes out of order. Do not attempt this to let the write fail.
> +	 */
> +	if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> +		return BIO_MERGE_FAILED;
> +
>  	if (!ll_front_merge_fn(req, bio, nr_segs))
>  		return BIO_MERGE_FAILED;
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f02e486a02ae..aa49bebf1199 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -830,6 +830,9 @@ static void blk_complete_request(struct request *req)
>  		bio = next;
>  	} while (bio);
>  
> +	if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> +		blk_zone_write_plug_complete_request(req);
> +
>  	/*
>  	 * Reset counters so that the request stacking driver
>  	 * can find how many bytes remain in the request
> @@ -943,6 +946,9 @@ bool blk_update_request(struct request *req, blk_status_t error,
>  	 * completely done
>  	 */
>  	if (!req->bio) {
> +		if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> +			blk_zone_write_plug_complete_request(req);
> +
>  		/*
>  		 * Reset counters so that the request stacking driver
>  		 * can find how many bytes remain in the request
> @@ -2975,6 +2981,17 @@ void blk_mq_submit_bio(struct bio *bio)
>  	struct request *rq;
>  	blk_status_t ret;
>  
> +	/*
> +	 * A BIO that was released form a zone write plug has already been
> +	 * through the preparation in this function, already holds a reference
> +	 * on the queue usage counter, and is the only write BIO in-flight for
> +	 * the target zone. Go straight to allocating a request for it.
> +	 */
> +	if (bio_zone_write_plugging(bio)) {
> +		nr_segs = bio->__bi_nr_segments;
> +		goto new_request;
> +	}
> +
>  	bio = blk_queue_bounce(bio, q);
>  	bio_set_ioprio(bio);
>  
> @@ -3001,7 +3018,11 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
>  		goto queue_exit;
>  
> +	if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
> +		goto queue_exit;
> +
>  	if (!rq) {
> +new_request:
>  		rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
>  		if (unlikely(!rq))
>  			goto queue_exit;
> @@ -3017,8 +3038,12 @@ void blk_mq_submit_bio(struct bio *bio)
>  
>  	ret = blk_crypto_rq_get_keyslot(rq);
>  	if (ret != BLK_STS_OK) {
> +		bool zwplugging = bio_zone_write_plugging(bio);
> +
>  		bio->bi_status = ret;
>  		bio_endio(bio);
> +		if (zwplugging)
> +			blk_zone_write_plug_complete_request(rq);
>  		blk_mq_free_request(rq);
>  		return;
>  	}
> @@ -3026,6 +3051,9 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
>  		return;
>  
> +	if (bio_zone_write_plugging(bio))
> +		blk_zone_write_plug_attempt_merge(rq);
> +
>  	if (plug) {
>  		blk_add_rq_to_plug(plug, rq);
>  		return;
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index d343e5756a9c..f6d4f511b664 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -7,11 +7,11 @@
>   *
>   * Copyright (c) 2016, Damien Le Moal
>   * Copyright (c) 2016, Western Digital
> + * Copyright (c) 2024, Western Digital Corporation or its affiliates.
>   */
>  
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> -#include <linux/rbtree.h>
>  #include <linux/blkdev.h>
>  #include <linux/blk-mq.h>
>  #include <linux/mm.h>
> @@ -19,6 +19,7 @@
>  #include <linux/sched/mm.h>
>  
>  #include "blk.h"
> +#include "blk-mq-sched.h"
>  
>  #define ZONE_COND_NAME(name) [BLK_ZONE_COND_##name] = #name
>  static const char *const zone_cond_name[] = {
> @@ -33,6 +34,27 @@ static const char *const zone_cond_name[] = {
>  };
>  #undef ZONE_COND_NAME
>  
> +/*
> + * Per-zone write plug.
> + */
> +struct blk_zone_wplug {
> +	spinlock_t		lock;
> +	unsigned int		flags;
> +	struct bio_list		bio_list;
> +	struct work_struct	bio_work;
> +};
> +
> +/*
> + * Zone write plug flags bits:
> + *  - BLK_ZONE_WPLUG_CONV: Indicate that the zone is a conventional one. Writes
> + *    to these zones are never plugged.
> + *  - BLK_ZONE_WPLUG_PLUGGED: Indicate that the zone write plug is plugged,
> + *    that is, that write BIOs are being throttled due to a write BIO already
> + *    being executed or the zone write plug bio list is not empty.
> + */
> +#define BLK_ZONE_WPLUG_CONV	(1U << 0)
> +#define BLK_ZONE_WPLUG_PLUGGED	(1U << 1)

BLK_ZONE_WPLUG_PLUGGED == !bio_list_empty(&zwplug->bio_list), so looks
this flag isn't necessary.

> +
>  /**
>   * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX.
>   * @zone_cond: BLK_ZONE_COND_XXX.
> @@ -429,12 +451,374 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
>  	return ret;
>  }
>  
> -void disk_free_zone_bitmaps(struct gendisk *disk)
> +#define blk_zone_wplug_lock(zwplug, flags) \
> +	spin_lock_irqsave(&zwplug->lock, flags)
> +
> +#define blk_zone_wplug_unlock(zwplug, flags) \
> +	spin_unlock_irqrestore(&zwplug->lock, flags)
> +
> +static inline void blk_zone_wplug_bio_io_error(struct bio *bio)
> +{
> +	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +
> +	bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +	bio_io_error(bio);
> +	blk_queue_exit(q);
> +}
> +
> +static int blk_zone_wplug_abort(struct gendisk *disk,
> +				struct blk_zone_wplug *zwplug)
> +{
> +	struct bio *bio;
> +	int nr_aborted = 0;
> +
> +	while ((bio = bio_list_pop(&zwplug->bio_list))) {
> +		blk_zone_wplug_bio_io_error(bio);
> +		nr_aborted++;
> +	}
> +
> +	return nr_aborted;
> +}
> +
> +/*
> + * Return the zone write plug for sector in sequential write required zone.
> + * Given that conventional zones have no write ordering constraints, NULL is
> + * returned for sectors in conventional zones, to indicate that zone write
> + * plugging is not needed.
> + */
> +static inline struct blk_zone_wplug *
> +disk_lookup_zone_wplug(struct gendisk *disk, sector_t sector)
> +{
> +	struct blk_zone_wplug *zwplug;
> +
> +	if (WARN_ON_ONCE(!disk->zone_wplugs))
> +		return NULL;
> +
> +	zwplug = &disk->zone_wplugs[disk_zone_no(disk, sector)];
> +	if (zwplug->flags & BLK_ZONE_WPLUG_CONV)
> +		return NULL;
> +	return zwplug;
> +}
> +
> +static inline struct blk_zone_wplug *bio_lookup_zone_wplug(struct bio *bio)
> +{
> +	return disk_lookup_zone_wplug(bio->bi_bdev->bd_disk,
> +				      bio->bi_iter.bi_sector);
> +}
> +
> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
> +					  struct bio *bio, unsigned int nr_segs)
> +{
> +	/*
> +	 * Keep a reference on the BIO request queue usage. This reference will
> +	 * be dropped either if the BIO is failed or after it is issued and
> +	 * completes.
> +	 */
> +	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);

It is fragile to get nested usage_counter, and same with grabbing/releasing it
from different contexts or even functions, and it could be much better to just
let block layer maintain it.

From patch 23's change:

+	 * Zoned block device information. Reads of this information must be
+	 * protected with blk_queue_enter() / blk_queue_exit(). Modifying this

Anytime if there is in-flight bio, the block device is opened, so both gendisk and
request_queue are live, so not sure if this .q_usage_counter protection
is needed.

+	 * information is only allowed while no requests are being processed.
+	 * See also blk_mq_freeze_queue() and blk_mq_unfreeze_queue().
 	 */

> +
> +	/*
> +	 * The BIO is being plugged and thus will have to wait for the on-going
> +	 * write and for all other writes already plugged. So polling makes
> +	 * no sense.
> +	 */
> +	bio_clear_polled(bio);
> +
> +	/*
> +	 * Reuse the poll cookie field to store the number of segments when
> +	 * split to the hardware limits.
> +	 */
> +	bio->__bi_nr_segments = nr_segs;
> +
> +	/*
> +	 * We always receive BIOs after they are split and ready to be issued.
> +	 * The block layer passes the parts of a split BIO in order, and the
> +	 * user must also issue write sequentially. So simply add the new BIO
> +	 * at the tail of the list to preserve the sequential write order.
> +	 */
> +	bio_list_add(&zwplug->bio_list, bio);
> +}
> +
> +/*
> + * Called from bio_attempt_back_merge() when a BIO was merged with a request.
> + */
> +void blk_zone_write_plug_bio_merged(struct bio *bio)
> +{
> +	bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +}
> +
> +/*
> + * Attempt to merge plugged BIOs with a newly formed request of a BIO that went
> + * through zone write plugging (either a new BIO or one that was unplugged).
> + */
> +void blk_zone_write_plug_attempt_merge(struct request *req)
> +{
> +	struct blk_zone_wplug *zwplug = bio_lookup_zone_wplug(req->bio);
> +	sector_t req_back_sector = blk_rq_pos(req) + blk_rq_sectors(req);
> +	struct request_queue *q = req->q;
> +	unsigned long flags;
> +	struct bio *bio;
> +
> +	/*
> +	 * Completion of this request needs to be handled with
> +	 * blk_zone_write_complete_request().
> +	 */
> +	req->rq_flags |= RQF_ZONE_WRITE_PLUGGING;
> +
> +	if (blk_queue_nomerges(q))
> +		return;
> +
> +	/*
> +	 * Walk through the list of plugged BIOs to check if they can be merged
> +	 * into the back of the request.
> +	 */
> +	blk_zone_wplug_lock(zwplug, flags);
> +	while ((bio = bio_list_peek(&zwplug->bio_list))) {
> +		if (bio->bi_iter.bi_sector != req_back_sector ||
> +		    !blk_rq_merge_ok(req, bio))
> +			break;
> +
> +		WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> +			     !bio->__bi_nr_segments);
> +
> +		bio_list_pop(&zwplug->bio_list);
> +		if (bio_attempt_back_merge(req, bio, bio->__bi_nr_segments) !=
> +		    BIO_MERGE_OK) {
> +			bio_list_add_head(&zwplug->bio_list, bio);
> +			break;
> +		}
> +
> +		/*
> +		 * Drop the extra reference on the queue usage we got when
> +		 * plugging the BIO.
> +		 */
> +		blk_queue_exit(q);
> +
> +		req_back_sector += bio_sectors(bio);
> +	}
> +	blk_zone_wplug_unlock(zwplug, flags);
> +}
> +
> +static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
> +{
> +	struct blk_zone_wplug *zwplug;
> +	unsigned long flags;
> +
> +	/*
> +	 * BIOs must be fully contained within a zone so that we use the correct
> +	 * zone write plug for the entire BIO. For blk-mq devices, the block
> +	 * layer should already have done any splitting required to ensure this
> +	 * and this BIO should thus not be straddling zone boundaries. For
> +	 * BIO-based devices, it is the responsibility of the driver to split
> +	 * the bio before submitting it.
> +	 */
> +	if (WARN_ON_ONCE(bio_straddle_zones(bio))) {
> +		bio_io_error(bio);
> +		return true;
> +	}
> +
> +	zwplug = bio_lookup_zone_wplug(bio);
> +	if (!zwplug)
> +		return false;
> +
> +	blk_zone_wplug_lock(zwplug, flags);
> +
> +	/* Indicate that this BIO is being handled using zone write plugging. */
> +	bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +
> +	/*
> +	 * If the zone is already plugged, add the BIO to the plug BIO list.
> +	 * Otherwise, plug and let the BIO execute.
> +	 */
> +	if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) {
> +		blk_zone_wplug_add_bio(zwplug, bio, nr_segs);
> +		blk_zone_wplug_unlock(zwplug, flags);
> +		return true;
> +	}
> +
> +	zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
> +
> +	blk_zone_wplug_unlock(zwplug, flags);
> +
> +	return false;
> +}
> +
> +/**
> + * blk_zone_write_plug_bio - Handle a zone write BIO with zone write plugging
> + * @bio: The BIO being submitted
> + *
> + * Handle write and write zeroes operations using zone write plugging.
> + * Return true whenever @bio execution needs to be delayed through the zone
> + * write plug. Otherwise, return false to let the submission path process
> + * @bio normally.
> + */
> +bool blk_zone_write_plug_bio(struct bio *bio, unsigned int nr_segs)
> +{
> +	if (!bio->bi_bdev->bd_disk->zone_wplugs)
> +		return false;
> +
> +	/*
> +	 * If the BIO already has the plugging flag set, then it was already
> +	 * handled through this path and this is a submission from the zone
> +	 * plug bio submit work.
> +	 */
> +	if (bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING))
> +		return false;
> +
> +	/*
> +	 * We do not need to do anything special for empty flush BIOs, e.g
> +	 * BIOs such as issued by blkdev_issue_flush(). The is because it is
> +	 * the responsibility of the user to first wait for the completion of
> +	 * write operations for flush to have any effect on the persistence of
> +	 * the written data.
> +	 */
> +	if (op_is_flush(bio->bi_opf) && !bio_sectors(bio))
> +		return false;
> +
> +	/*
> +	 * Regular writes and write zeroes need to be handled through the target
> +	 * zone write plug. This includes writes with REQ_FUA | REQ_PREFLUSH
> +	 * which may need to go through the flush machinery depending on the
> +	 * target device capabilities. Plugging such writes is fine as the flush
> +	 * machinery operates at the request level, below the plug, and
> +	 * completion of the flush sequence will go through the regular BIO
> +	 * completion, which will handle zone write plugging.
> +	 */
> +	switch (bio_op(bio)) {
> +	case REQ_OP_WRITE:
> +	case REQ_OP_WRITE_ZEROES:
> +		return blk_zone_wplug_handle_write(bio, nr_segs);
> +	default:
> +		return false;
> +	}
> +
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(blk_zone_write_plug_bio);
> +
> +static void blk_zone_write_plug_unplug_bio(struct blk_zone_wplug *zwplug)
> +{
> +	unsigned long flags;
> +
> +	blk_zone_wplug_lock(zwplug, flags);
> +
> +	/* Schedule submission of the next plugged BIO if we have one. */
> +	if (!bio_list_empty(&zwplug->bio_list))
> +		kblockd_schedule_work(&zwplug->bio_work);
> +	else
> +		zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
> +
> +	blk_zone_wplug_unlock(zwplug, flags);
> +}
> +
> +void blk_zone_write_plug_bio_endio(struct bio *bio)
> +{
> +	/* Make sure we do not see this BIO again by clearing the plug flag. */
> +	bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +
> +	/*
> +	 * For BIO-based devices, blk_zone_write_plug_complete_request()
> +	 * is not called. So we need to schedule execution of the next
> +	 * plugged BIO here.
> +	 */
> +	if (bio->bi_bdev->bd_has_submit_bio) {
> +		struct blk_zone_wplug *zwplug = bio_lookup_zone_wplug(bio);
> +
> +		blk_zone_write_plug_unplug_bio(zwplug);
> +	}
> +}
> +
> +void blk_zone_write_plug_complete_request(struct request *req)
> +{
> +	struct gendisk *disk = req->q->disk;
> +	struct blk_zone_wplug *zwplug =
> +		disk_lookup_zone_wplug(disk, req->__sector);
> +
> +	req->rq_flags &= ~RQF_ZONE_WRITE_PLUGGING;
> +
> +	blk_zone_write_plug_unplug_bio(zwplug);
> +}
> +
> +static void blk_zone_wplug_bio_work(struct work_struct *work)
> +{
> +	struct blk_zone_wplug *zwplug =
> +		container_of(work, struct blk_zone_wplug, bio_work);
> +	unsigned long flags;
> +	struct bio *bio;
> +
> +	/*
> +	 * Unplug and submit the next plugged BIO. If we do not have any, clear
> +	 * the plugged flag.
> +	 */
> +	blk_zone_wplug_lock(zwplug, flags);
> +
> +	bio = bio_list_pop(&zwplug->bio_list);
> +	if (!bio) {
> +		zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
> +		blk_zone_wplug_unlock(zwplug, flags);
> +		return;
> +	}
> +
> +	blk_zone_wplug_unlock(zwplug, flags);
> +
> +	/*
> +	 * blk-mq devices will reuse the reference on the request queue usage
> +	 * we took when the BIO was plugged, but the submission path for
> +	 * BIO-based devices will not do that. So drop this reference here.
> +	 */
> +	if (bio->bi_bdev->bd_has_submit_bio)
> +		blk_queue_exit(bio->bi_bdev->bd_disk->queue);

But I don't see where this reference is reused for blk-mq in this patch,
care to point it out?


Thanks, 
Ming


  reply	other threads:[~2024-02-04  3:56 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02  7:30 [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-02  7:30 ` [PATCH 01/26] block: Restore sector of flush requests Damien Le Moal
2024-02-04 11:55   ` Hannes Reinecke
2024-02-05 17:22   ` Bart Van Assche
2024-02-05 23:42     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 02/26] block: Remove req_bio_endio() Damien Le Moal
2024-02-04 11:57   ` Hannes Reinecke
2024-02-05 17:28   ` Bart Van Assche
2024-02-05 23:45     ` Damien Le Moal
2024-02-09  6:53     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start() Damien Le Moal
2024-02-03  4:09   ` Bart Van Assche
2024-02-04 11:58   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 04/26] block: Introduce blk_zone_complete_request_bio() Damien Le Moal
2024-02-04 11:59   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 05/26] block: Allow using bio_attempt_back_merge() internally Damien Le Moal
2024-02-03  4:11   ` Bart Van Assche
2024-02-04 12:00   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 06/26] block: Introduce zone write plugging Damien Le Moal
2024-02-04  3:56   ` Ming Lei [this message]
2024-02-04 23:57     ` Damien Le Moal
2024-02-05  2:19       ` Ming Lei
2024-02-05  2:41         ` Damien Le Moal
2024-02-05  3:38           ` Ming Lei
2024-02-05  5:11           ` Christoph Hellwig
2024-02-05  5:37             ` Damien Le Moal
2024-02-05  5:50               ` Christoph Hellwig
2024-02-05  6:14                 ` Damien Le Moal
2024-02-05 10:06           ` Ming Lei
2024-02-05 12:20             ` Damien Le Moal
2024-02-05 12:43               ` Damien Le Moal
2024-02-04 12:14   ` Hannes Reinecke
2024-02-05 17:48   ` Bart Van Assche
2024-02-05 23:48     ` Damien Le Moal
2024-02-06  0:52       ` Bart Van Assche
2024-02-02  7:30 ` [PATCH 07/26] block: Allow zero value of max_zone_append_sectors queue limit Damien Le Moal
2024-02-04 12:15   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 08/26] block: Implement zone append emulation Damien Le Moal
2024-02-04 12:24   ` Hannes Reinecke
2024-02-05  0:10     ` Damien Le Moal
2024-02-05 17:58   ` Bart Van Assche
2024-02-05 23:57     ` Damien Le Moal
2024-02-02  7:30 ` [PATCH 09/26] block: Allow BIO-based drivers to use blk_revalidate_disk_zones() Damien Le Moal
2024-02-04 12:26   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 10/26] dm: Use the block layer zone append emulation Damien Le Moal
2024-02-03 17:58   ` Mike Snitzer
2024-02-05  5:38     ` Damien Le Moal
2024-02-05 20:33       ` Mike Snitzer
2024-02-05 23:40         ` Damien Le Moal
2024-02-06 20:41           ` Mike Snitzer
2024-02-04 12:30   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 11/26] scsi: sd: " Damien Le Moal
2024-02-04 12:29   ` Hannes Reinecke
2024-02-06  1:55   ` Martin K. Petersen
2024-02-02  7:30 ` [PATCH 12/26] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature Damien Le Moal
2024-02-04 12:31   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 13/26] null_blk: " Damien Le Moal
2024-02-04 12:31   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 14/26] null_blk: Introduce zone_append_max_sectors attribute Damien Le Moal
2024-02-04 12:32   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 15/26] null_blk: Introduce fua attribute Damien Le Moal
2024-02-04 12:33   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 16/26] nvmet: zns: Do not reference the gendisk conv_zones_bitmap Damien Le Moal
2024-02-04 12:34   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 17/26] block: Remove BLK_STS_ZONE_RESOURCE Damien Le Moal
2024-02-04 12:34   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 18/26] block: Simplify blk_revalidate_disk_zones() interface Damien Le Moal
2024-02-04 12:35   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 19/26] block: mq-deadline: Remove support for zone write locking Damien Le Moal
2024-02-04 12:36   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 20/26] block: Remove elevator required features Damien Le Moal
2024-02-04 12:36   ` Hannes Reinecke
2024-02-02  7:30 ` [PATCH 21/26] block: Do not check zone type in blk_check_zone_append() Damien Le Moal
2024-02-04 12:37   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 22/26] block: Move zone related debugfs attribute to blk-zoned.c Damien Le Moal
2024-02-04 12:38   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 23/26] block: Remove zone write locking Damien Le Moal
2024-02-04 12:38   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 24/26] block: Do not special-case plugging of zone write operations Damien Le Moal
2024-02-04 12:39   ` Hannes Reinecke
2024-02-02  7:31 ` [PATCH 25/26] block: Reduce zone write plugging memory usage Damien Le Moal
2024-02-04 12:42   ` Hannes Reinecke
2024-02-05 17:51     ` Bart Van Assche
2024-02-05 23:55       ` Damien Le Moal
2024-02-06 21:20         ` Bart Van Assche
2024-02-09  3:58           ` Damien Le Moal
2024-02-09 19:36             ` Bart Van Assche
2024-02-10  0:06               ` Damien Le Moal
2024-02-11  3:40                 ` Bart Van Assche
2024-02-12  1:09                   ` Damien Le Moal
2024-02-12 18:58                     ` Bart Van Assche
2024-02-12  8:23                   ` Damien Le Moal
2024-02-12  8:47                     ` Damien Le Moal
2024-02-12 18:40                       ` Bart Van Assche
2024-02-13  0:05                         ` Damien Le Moal
2024-02-02  7:31 ` [PATCH 26/26] block: Add zone_active_wplugs debugfs entry Damien Le Moal
2024-02-04 12:43   ` Hannes Reinecke
2024-02-02  7:37 ` [PATCH 00/26] Zone write plugging Damien Le Moal
2024-02-03 12:11   ` Jens Axboe
2024-02-09  5:28     ` Damien Le Moal
2024-02-05 17:21 ` Bart Van Assche
2024-02-05 23:42   ` Damien Le Moal
2024-02-06  0:57     ` Bart Van Assche
2024-02-05 18:18 ` Bart Van Assche
2024-02-06  0:07   ` Damien Le Moal
2024-02-06  1:25     ` Bart Van Assche
2024-02-09  4:03       ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zb8K4uSN3SNeqrPI@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.