From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-scsi@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>,
Christoph Hellwig <hch@lst.de>,
ming.lei@redhat.com
Subject: Re: [PATCH 06/26] block: Introduce zone write plugging
Date: Sun, 4 Feb 2024 11:56:18 +0800
Message-ID: <Zb8K4uSN3SNeqrPI@fedora>
In-Reply-To: <20240202073104.2418230-7-dlemoal@kernel.org>
On Fri, Feb 02, 2024 at 04:30:44PM +0900, Damien Le Moal wrote:
> Zone write plugging implements a per-zone "plug" for write operations to
> tightly control the submission and execution order of writes to
> sequential write required zones of a zoned block device. Per-zone
> plugging of writes guarantees that at any time at most one write request
> per zone is in flight. This mechanism is intended to replace zone write
> locking which is controlled at the scheduler level and implemented only
> by mq-deadline.
>
> Unlike zone write locking which operates on requests, zone write
> plugging operates on BIOs. A zone write plug is simply a BIO list that
> is atomically manipulated using a spinlock and a kblockd submission
> work. A write BIO to a zone is "plugged" to delay its execution if a
> write BIO for the same zone was already issued, that is, if a write
> request for the same zone is being executed. The next plugged BIO is
> unplugged and issued once the write request completes.
>
> This mechanism:
> - Untangles zone write ordering from block IO schedulers. This allows
> removing the restriction on using only mq-deadline for zoned block
> devices. Any block IO scheduler, including "none", can be used.
> - Operates on BIOs instead of requests. Plugged BIOs waiting for
> execution thus do not hold scheduling tags and so do not prevent
> other BIOs from proceeding (reads or writes to other zones).
> Depending on the workload, this can significantly improve device
> utilization and performance.
> - Works for both blk-mq (request) based zoned devices and BIO-based
> devices (e.g. device mapper). It is mandatory for the former but
> optional for the latter: BIO-based drivers can use zone write
> plugging to implement write ordering guarantees, or they can
> implement their own mechanism if needed.
> - The code is less invasive in the block layer and is mostly limited to
> blk-zoned.c with some small changes in blk-mq.c, blk-merge.c and
> bio.c.
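
The per-zone plugging rule described in the points above can be modeled in
plain userspace C. This is a toy sketch only; the struct and function names
below are invented for illustration, and the real kernel code (struct
blk_zone_wplug, blk_zone_wplug_handle_write()) is more involved:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a per-zone write plug: a FIFO of pending write IDs plus a
 * "plugged" state. All names are illustrative, not the kernel's.
 */
#define MAX_PENDING 16

struct toy_zone_plug {
	bool plugged;			/* a write is in flight for this zone */
	int pending[MAX_PENDING];	/* queued write IDs, FIFO order */
	int head, tail;
};

/*
 * Returns true if the write must be delayed (queued behind the in-flight
 * write), false if it may be issued immediately.
 */
static bool zone_plug_write(struct toy_zone_plug *zp, int write_id)
{
	if (zp->plugged) {
		zp->pending[zp->tail++ % MAX_PENDING] = write_id;
		return true;
	}
	zp->plugged = true;
	return false;
}

/*
 * Called when the in-flight write completes: return the next pending
 * write ID to issue, or -1 and unplug the zone if none is pending.
 */
static int zone_plug_complete(struct toy_zone_plug *zp)
{
	if (zp->head == zp->tail) {
		zp->plugged = false;
		return -1;
	}
	return zp->pending[zp->head++ % MAX_PENDING];
}
```

The model preserves the stated invariant: at most one write per zone is in
flight at any time, and plugged writes are released strictly in FIFO order.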
>
> Zone write plugging is implemented using struct blk_zone_wplug. This
> structure includes a spinlock, a BIO list and a work structure to
> handle the submission of plugged BIOs.
>
> Plugging of zone write BIOs is done using the function
> blk_zone_write_plug_bio() which returns false if a BIO execution does
> not need to be delayed and true otherwise. This function is called
> from blk_mq_submit_bio() after a BIO is split to avoid large BIOs
> spanning multiple zones which would cause mishandling of zone write
> plugging. This enables zone write plugging by default for any mq
> request-based block device. BIO-based device drivers can also use zone
> write plugging by explicitly calling blk_zone_write_plug_bio() in their
> ->submit_bio method. For such devices, the driver must ensure that a
> BIO passed to blk_zone_write_plug_bio() is already split and does not
> straddle zone boundaries.
>
> Only write and write zeroes BIOs are plugged. Zone write plugging does
> not introduce any significant overhead for other operations. A BIO that
> is being handled through zone write plugging is flagged using the new
> BIO flag BIO_ZONE_WRITE_PLUGGING. A request handling a BIO flagged with
> this new flag is flagged with the new RQF_ZONE_WRITE_PLUGGING flag.
> The completion of BIOs and requests carrying these flags triggers
> calls to blk_zone_write_plug_bio_endio() and
> blk_zone_write_plug_complete_request() respectively. The latter
> function is used to trigger submission of the next plugged BIO using
> the zone plug work. blk_zone_write_plug_bio_endio() does the same for
> BIO-based devices. This ensures that at any time, at most one request
> (blk-mq devices) or one BIO (BIO-based devices) is being executed for
> any zone. Handling zone write plugs using a per-zone spinlock
> maximizes parallelism and device usage by allowing multiple zones to
> be written simultaneously without lock contention.
>
> Zone write plugging ignores flush BIOs without data. However, any flush
> BIO that has data is always plugged so that the write part of the flush
> sequence is serialized with other regular writes.
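
The flush handling rule above amounts to a small classifier. The following
userspace sketch is illustrative only: the enum values are invented and do
not match the kernel's REQ_OP_* / REQ_PREFLUSH encoding, where a flush with
data is a write carrying a flag rather than a separate operation:

```c
#include <stdbool.h>

/* Invented operation codes for illustration, not the kernel's REQ_OP_*. */
enum toy_op { TOY_READ, TOY_WRITE, TOY_WRITE_ZEROES, TOY_FLUSH };

/*
 * Decide whether an operation goes through the zone write plug: only
 * writes and write zeroes do; a flush with no data bypasses the plug,
 * while a flush carrying data is plugged like a regular write.
 */
static bool toy_needs_zone_plug(enum toy_op op, unsigned int nr_sectors)
{
	if (op == TOY_FLUSH)
		return nr_sectors != 0;	/* empty flush: nothing to order */
	return op == TOY_WRITE || op == TOY_WRITE_ZEROES;
}
```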
>
> Given that any BIO handled through zone write plugging will be the only
> BIO in flight for the target zone when it is executed, a BIO that is
> unplugged and submitted has no chance of successfully merging with
> other plugged BIOs or with requests in the scheduler. To overcome this
> potential performance loss, blk_mq_submit_bio() calls the function
> blk_zone_write_plug_attempt_merge() to try to merge other plugged BIOs
> with the one just unplugged. Successful merging is signaled using
> blk_zone_write_plug_bio_merged(), called from bio_attempt_back_merge().
> Furthermore, to avoid recalculating the number of segments of plugged
> BIOs to attempt merging, the number of segments of a plugged BIO is
> saved using the new struct bio field __bi_nr_segments. To avoid growing
> the size of struct bio, this field is added as a union with the
> bio_cookie field. This is safe to do as polling is always disabled for
> plugged BIOs.
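
The union trick can be illustrated with a few lines of C; the type and field
names here are stand-ins for the kernel's blk_qc_t and struct bio fields:

```c
#include <stdint.h>

/* Stand-in for the kernel's poll cookie type (blk_qc_t). */
typedef uint64_t blk_qc_t_model;

/*
 * Model of the overlapping struct bio fields: the poll cookie and the
 * saved segment count share storage, so the structure does not grow.
 */
struct toy_bio_fields {
	union {
		blk_qc_t_model bi_cookie;	/* used only for polled I/O */
		unsigned int __bi_nr_segments;	/* used only for plugged BIOs */
	};
};
```

Because polling is always disabled for plugged BIOs, the two fields are never
live at the same time, so sharing storage is safe.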
>
> When BIOs are plugged in a zone write plug, the device request queue
> usage counter is always incremented. This reference is kept and reused
> when the plugged BIO is unplugged and submitted again using
> submit_bio_noacct_nocheck(). For this case, the unplugged BIO is already
> flagged with BIO_ZONE_WRITE_PLUGGING and blk_mq_submit_bio() proceeds
> directly to allocating a new request for the BIO, re-using the usage
> reference count taken when the BIO was plugged. This extra reference
> count is dropped in blk_zone_write_plug_attempt_merge() for any plugged
> BIO that is successfully merged. Since BIO-based devices do not take
> this path, for those devices the extra reference is instead dropped
> when a plugged BIO is unplugged and submitted.
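
The reference counting lifecycle described above can be sketched with a toy
counter standing in for q_usage_counter (all names here are illustrative):

```c
/*
 * Toy model of the queue usage counter lifecycle: a reference is taken
 * when a BIO is plugged and dropped exactly once later, either when the
 * BIO is merged into a request, or (for BIO-based devices) when it is
 * unplugged and resubmitted.
 */
struct toy_queue {
	int usage;	/* stands in for q_usage_counter */
};

static void toy_plug_bio(struct toy_queue *q)	{ q->usage++; }	/* percpu_ref_get() */
static void toy_queue_exit(struct toy_queue *q)	{ q->usage--; }	/* blk_queue_exit() */

/* BIO merged into an existing request: drop the extra reference now. */
static void toy_bio_merged(struct toy_queue *q)
{
	toy_queue_exit(q);
}

/*
 * BIO-based device: the submission path takes no reference of its own,
 * so drop ours when the BIO is unplugged and resubmitted.
 */
static void toy_unplug_bio_based(struct toy_queue *q)
{
	toy_queue_exit(q);
}
```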
>
> To match the new data structures used for zoned disks, the function
> disk_free_zone_bitmaps() is renamed to the more generic
> disk_free_zone_resources().
>
> This commit contains contributions from Christoph Hellwig <hch@lst.de>.
>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> block/bio.c | 7 +
> block/blk-merge.c | 11 +
> block/blk-mq.c | 28 +++
> block/blk-zoned.c | 408 +++++++++++++++++++++++++++++++++++++-
> block/blk.h | 32 ++-
> block/genhd.c | 2 +-
> include/linux/blk-mq.h | 2 +
> include/linux/blk_types.h | 8 +-
> include/linux/blkdev.h | 8 +
> 9 files changed, 496 insertions(+), 10 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index b9642a41f286..c8b0f7e8c713 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1581,6 +1581,13 @@ void bio_endio(struct bio *bio)
> if (!bio_integrity_endio(bio))
> return;
>
> + /*
> +	 * For BIOs handled through a zone write plug, signal the end of the
> + * BIO to the zone write plug to submit the next plugged BIO.
> + */
> + if (bio_zone_write_plugging(bio))
> + blk_zone_write_plug_bio_endio(bio);
> +
> rq_qos_done_bio(bio);
>
> if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index a1ef61b03e31..2b5489cd9c65 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -377,6 +377,7 @@ struct bio *__bio_split_to_limits(struct bio *bio,
> blkcg_bio_issue_init(split);
> bio_chain(split, bio);
> trace_block_split(split, bio->bi_iter.bi_sector);
> + WARN_ON_ONCE(bio_zone_write_plugging(bio));
> submit_bio_noacct(bio);
> return split;
> }
> @@ -980,6 +981,9 @@ enum bio_merge_status bio_attempt_back_merge(struct request *req,
>
> blk_update_mixed_merge(req, bio, false);
>
> + if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> + blk_zone_write_plug_bio_merged(bio);
> +
> req->biotail->bi_next = bio;
> req->biotail = bio;
> req->__data_len += bio->bi_iter.bi_size;
> @@ -995,6 +999,13 @@ static enum bio_merge_status bio_attempt_front_merge(struct request *req,
> {
> const blk_opf_t ff = bio_failfast(bio);
>
> + /*
> +	 * A front merge for zone writes can happen only if the user submitted
> +	 * writes out of order. Do not attempt the merge, to let the write fail.
> + */
> + if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> + return BIO_MERGE_FAILED;
> +
> if (!ll_front_merge_fn(req, bio, nr_segs))
> return BIO_MERGE_FAILED;
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f02e486a02ae..aa49bebf1199 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -830,6 +830,9 @@ static void blk_complete_request(struct request *req)
> bio = next;
> } while (bio);
>
> + if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> + blk_zone_write_plug_complete_request(req);
> +
> /*
> * Reset counters so that the request stacking driver
> * can find how many bytes remain in the request
> @@ -943,6 +946,9 @@ bool blk_update_request(struct request *req, blk_status_t error,
> * completely done
> */
> if (!req->bio) {
> + if (req->rq_flags & RQF_ZONE_WRITE_PLUGGING)
> + blk_zone_write_plug_complete_request(req);
> +
> /*
> * Reset counters so that the request stacking driver
> * can find how many bytes remain in the request
> @@ -2975,6 +2981,17 @@ void blk_mq_submit_bio(struct bio *bio)
> struct request *rq;
> blk_status_t ret;
>
> + /*
> +	 * A BIO that was released from a zone write plug has already been
> + * through the preparation in this function, already holds a reference
> + * on the queue usage counter, and is the only write BIO in-flight for
> + * the target zone. Go straight to allocating a request for it.
> + */
> + if (bio_zone_write_plugging(bio)) {
> + nr_segs = bio->__bi_nr_segments;
> + goto new_request;
> + }
> +
> bio = blk_queue_bounce(bio, q);
> bio_set_ioprio(bio);
>
> @@ -3001,7 +3018,11 @@ void blk_mq_submit_bio(struct bio *bio)
> if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
> goto queue_exit;
>
> + if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
> + goto queue_exit;
> +
> if (!rq) {
> +new_request:
> rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
> if (unlikely(!rq))
> goto queue_exit;
> @@ -3017,8 +3038,12 @@ void blk_mq_submit_bio(struct bio *bio)
>
> ret = blk_crypto_rq_get_keyslot(rq);
> if (ret != BLK_STS_OK) {
> + bool zwplugging = bio_zone_write_plugging(bio);
> +
> bio->bi_status = ret;
> bio_endio(bio);
> + if (zwplugging)
> + blk_zone_write_plug_complete_request(rq);
> blk_mq_free_request(rq);
> return;
> }
> @@ -3026,6 +3051,9 @@ void blk_mq_submit_bio(struct bio *bio)
> if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
> return;
>
> + if (bio_zone_write_plugging(bio))
> + blk_zone_write_plug_attempt_merge(rq);
> +
> if (plug) {
> blk_add_rq_to_plug(plug, rq);
> return;
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index d343e5756a9c..f6d4f511b664 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -7,11 +7,11 @@
> *
> * Copyright (c) 2016, Damien Le Moal
> * Copyright (c) 2016, Western Digital
> + * Copyright (c) 2024, Western Digital Corporation or its affiliates.
> */
>
> #include <linux/kernel.h>
> #include <linux/module.h>
> -#include <linux/rbtree.h>
> #include <linux/blkdev.h>
> #include <linux/blk-mq.h>
> #include <linux/mm.h>
> @@ -19,6 +19,7 @@
> #include <linux/sched/mm.h>
>
> #include "blk.h"
> +#include "blk-mq-sched.h"
>
> #define ZONE_COND_NAME(name) [BLK_ZONE_COND_##name] = #name
> static const char *const zone_cond_name[] = {
> @@ -33,6 +34,27 @@ static const char *const zone_cond_name[] = {
> };
> #undef ZONE_COND_NAME
>
> +/*
> + * Per-zone write plug.
> + */
> +struct blk_zone_wplug {
> + spinlock_t lock;
> + unsigned int flags;
> + struct bio_list bio_list;
> + struct work_struct bio_work;
> +};
> +
> +/*
> + * Zone write plug flags bits:
> + * - BLK_ZONE_WPLUG_CONV: Indicate that the zone is a conventional one. Writes
> + * to these zones are never plugged.
> + * - BLK_ZONE_WPLUG_PLUGGED: Indicate that the zone write plug is plugged,
> + * that is, that write BIOs are being throttled due to a write BIO already
> + * being executed or the zone write plug bio list is not empty.
> + */
> +#define BLK_ZONE_WPLUG_CONV (1U << 0)
> +#define BLK_ZONE_WPLUG_PLUGGED (1U << 1)
BLK_ZONE_WPLUG_PLUGGED == !bio_list_empty(&zwplug->bio_list), so it looks
like this flag isn't necessary.
> +
> /**
> * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX.
> * @zone_cond: BLK_ZONE_COND_XXX.
> @@ -429,12 +451,374 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
> return ret;
> }
>
> -void disk_free_zone_bitmaps(struct gendisk *disk)
> +#define blk_zone_wplug_lock(zwplug, flags) \
> + spin_lock_irqsave(&zwplug->lock, flags)
> +
> +#define blk_zone_wplug_unlock(zwplug, flags) \
> + spin_unlock_irqrestore(&zwplug->lock, flags)
> +
> +static inline void blk_zone_wplug_bio_io_error(struct bio *bio)
> +{
> + struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +
> + bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> + bio_io_error(bio);
> + blk_queue_exit(q);
> +}
> +
> +static int blk_zone_wplug_abort(struct gendisk *disk,
> + struct blk_zone_wplug *zwplug)
> +{
> + struct bio *bio;
> + int nr_aborted = 0;
> +
> + while ((bio = bio_list_pop(&zwplug->bio_list))) {
> + blk_zone_wplug_bio_io_error(bio);
> + nr_aborted++;
> + }
> +
> + return nr_aborted;
> +}
> +
> +/*
> + * Return the zone write plug for sector in sequential write required zone.
> + * Given that conventional zones have no write ordering constraints, NULL is
> + * returned for sectors in conventional zones, to indicate that zone write
> + * plugging is not needed.
> + */
> +static inline struct blk_zone_wplug *
> +disk_lookup_zone_wplug(struct gendisk *disk, sector_t sector)
> +{
> + struct blk_zone_wplug *zwplug;
> +
> + if (WARN_ON_ONCE(!disk->zone_wplugs))
> + return NULL;
> +
> + zwplug = &disk->zone_wplugs[disk_zone_no(disk, sector)];
> + if (zwplug->flags & BLK_ZONE_WPLUG_CONV)
> + return NULL;
> + return zwplug;
> +}
> +
> +static inline struct blk_zone_wplug *bio_lookup_zone_wplug(struct bio *bio)
> +{
> + return disk_lookup_zone_wplug(bio->bi_bdev->bd_disk,
> + bio->bi_iter.bi_sector);
> +}
> +
> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
> + struct bio *bio, unsigned int nr_segs)
> +{
> + /*
> + * Keep a reference on the BIO request queue usage. This reference will
> + * be dropped either if the BIO is failed or after it is issued and
> + * completes.
> + */
> + percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
It is fragile to take nested references on the usage counter, and likewise
to grab and release it from different contexts or even different functions;
it would be much better to just let the block layer maintain it.
From patch 23's change:
+ * Zoned block device information. Reads of this information must be
+ * protected with blk_queue_enter() / blk_queue_exit(). Modifying this
Any time there is an in-flight bio, the block device is open, so both the
gendisk and the request_queue are live; I am not sure this .q_usage_counter
protection is needed.
+ * information is only allowed while no requests are being processed.
+ * See also blk_mq_freeze_queue() and blk_mq_unfreeze_queue().
*/
> +
> + /*
> + * The BIO is being plugged and thus will have to wait for the on-going
> + * write and for all other writes already plugged. So polling makes
> + * no sense.
> + */
> + bio_clear_polled(bio);
> +
> + /*
> + * Reuse the poll cookie field to store the number of segments when
> + * split to the hardware limits.
> + */
> + bio->__bi_nr_segments = nr_segs;
> +
> + /*
> + * We always receive BIOs after they are split and ready to be issued.
> + * The block layer passes the parts of a split BIO in order, and the
> +	 * user must also issue writes sequentially. So simply add the new BIO
> + * at the tail of the list to preserve the sequential write order.
> + */
> + bio_list_add(&zwplug->bio_list, bio);
> +}
> +
> +/*
> + * Called from bio_attempt_back_merge() when a BIO was merged with a request.
> + */
> +void blk_zone_write_plug_bio_merged(struct bio *bio)
> +{
> + bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +}
> +
> +/*
> + * Attempt to merge plugged BIOs with a newly formed request of a BIO that went
> + * through zone write plugging (either a new BIO or one that was unplugged).
> + */
> +void blk_zone_write_plug_attempt_merge(struct request *req)
> +{
> + struct blk_zone_wplug *zwplug = bio_lookup_zone_wplug(req->bio);
> + sector_t req_back_sector = blk_rq_pos(req) + blk_rq_sectors(req);
> + struct request_queue *q = req->q;
> + unsigned long flags;
> + struct bio *bio;
> +
> + /*
> + * Completion of this request needs to be handled with
> +	 * blk_zone_write_plug_complete_request().
> + */
> + req->rq_flags |= RQF_ZONE_WRITE_PLUGGING;
> +
> + if (blk_queue_nomerges(q))
> + return;
> +
> + /*
> + * Walk through the list of plugged BIOs to check if they can be merged
> + * into the back of the request.
> + */
> + blk_zone_wplug_lock(zwplug, flags);
> + while ((bio = bio_list_peek(&zwplug->bio_list))) {
> + if (bio->bi_iter.bi_sector != req_back_sector ||
> + !blk_rq_merge_ok(req, bio))
> + break;
> +
> + WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE_ZEROES &&
> + !bio->__bi_nr_segments);
> +
> + bio_list_pop(&zwplug->bio_list);
> + if (bio_attempt_back_merge(req, bio, bio->__bi_nr_segments) !=
> + BIO_MERGE_OK) {
> + bio_list_add_head(&zwplug->bio_list, bio);
> + break;
> + }
> +
> + /*
> + * Drop the extra reference on the queue usage we got when
> + * plugging the BIO.
> + */
> + blk_queue_exit(q);
> +
> + req_back_sector += bio_sectors(bio);
> + }
> + blk_zone_wplug_unlock(zwplug, flags);
> +}
> +
> +static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
> +{
> + struct blk_zone_wplug *zwplug;
> + unsigned long flags;
> +
> + /*
> + * BIOs must be fully contained within a zone so that we use the correct
> + * zone write plug for the entire BIO. For blk-mq devices, the block
> + * layer should already have done any splitting required to ensure this
> + * and this BIO should thus not be straddling zone boundaries. For
> + * BIO-based devices, it is the responsibility of the driver to split
> + * the bio before submitting it.
> + */
> + if (WARN_ON_ONCE(bio_straddle_zones(bio))) {
> + bio_io_error(bio);
> + return true;
> + }
> +
> + zwplug = bio_lookup_zone_wplug(bio);
> + if (!zwplug)
> + return false;
> +
> + blk_zone_wplug_lock(zwplug, flags);
> +
> + /* Indicate that this BIO is being handled using zone write plugging. */
> + bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +
> + /*
> + * If the zone is already plugged, add the BIO to the plug BIO list.
> + * Otherwise, plug and let the BIO execute.
> + */
> + if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) {
> + blk_zone_wplug_add_bio(zwplug, bio, nr_segs);
> + blk_zone_wplug_unlock(zwplug, flags);
> + return true;
> + }
> +
> + zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
> +
> + blk_zone_wplug_unlock(zwplug, flags);
> +
> + return false;
> +}
> +
> +/**
> + * blk_zone_write_plug_bio - Handle a zone write BIO with zone write plugging
> + * @bio: The BIO being submitted
> + *
> + * Handle write and write zeroes operations using zone write plugging.
> + * Return true whenever @bio execution needs to be delayed through the zone
> + * write plug. Otherwise, return false to let the submission path process
> + * @bio normally.
> + */
> +bool blk_zone_write_plug_bio(struct bio *bio, unsigned int nr_segs)
> +{
> + if (!bio->bi_bdev->bd_disk->zone_wplugs)
> + return false;
> +
> + /*
> + * If the BIO already has the plugging flag set, then it was already
> + * handled through this path and this is a submission from the zone
> + * plug bio submit work.
> + */
> + if (bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING))
> + return false;
> +
> + /*
> +	 * We do not need to do anything special for empty flush BIOs, e.g.
> +	 * BIOs issued by blkdev_issue_flush(). This is because it is
> + * the responsibility of the user to first wait for the completion of
> + * write operations for flush to have any effect on the persistence of
> + * the written data.
> + */
> + if (op_is_flush(bio->bi_opf) && !bio_sectors(bio))
> + return false;
> +
> + /*
> + * Regular writes and write zeroes need to be handled through the target
> + * zone write plug. This includes writes with REQ_FUA | REQ_PREFLUSH
> + * which may need to go through the flush machinery depending on the
> + * target device capabilities. Plugging such writes is fine as the flush
> + * machinery operates at the request level, below the plug, and
> + * completion of the flush sequence will go through the regular BIO
> + * completion, which will handle zone write plugging.
> + */
> + switch (bio_op(bio)) {
> + case REQ_OP_WRITE:
> + case REQ_OP_WRITE_ZEROES:
> + return blk_zone_wplug_handle_write(bio, nr_segs);
> + default:
> + return false;
> + }
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(blk_zone_write_plug_bio);
> +
> +static void blk_zone_write_plug_unplug_bio(struct blk_zone_wplug *zwplug)
> +{
> + unsigned long flags;
> +
> + blk_zone_wplug_lock(zwplug, flags);
> +
> + /* Schedule submission of the next plugged BIO if we have one. */
> + if (!bio_list_empty(&zwplug->bio_list))
> + kblockd_schedule_work(&zwplug->bio_work);
> + else
> + zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
> +
> + blk_zone_wplug_unlock(zwplug, flags);
> +}
> +
> +void blk_zone_write_plug_bio_endio(struct bio *bio)
> +{
> + /* Make sure we do not see this BIO again by clearing the plug flag. */
> + bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING);
> +
> + /*
> + * For BIO-based devices, blk_zone_write_plug_complete_request()
> + * is not called. So we need to schedule execution of the next
> + * plugged BIO here.
> + */
> + if (bio->bi_bdev->bd_has_submit_bio) {
> + struct blk_zone_wplug *zwplug = bio_lookup_zone_wplug(bio);
> +
> + blk_zone_write_plug_unplug_bio(zwplug);
> + }
> +}
> +
> +void blk_zone_write_plug_complete_request(struct request *req)
> +{
> + struct gendisk *disk = req->q->disk;
> + struct blk_zone_wplug *zwplug =
> + disk_lookup_zone_wplug(disk, req->__sector);
> +
> + req->rq_flags &= ~RQF_ZONE_WRITE_PLUGGING;
> +
> + blk_zone_write_plug_unplug_bio(zwplug);
> +}
> +
> +static void blk_zone_wplug_bio_work(struct work_struct *work)
> +{
> + struct blk_zone_wplug *zwplug =
> + container_of(work, struct blk_zone_wplug, bio_work);
> + unsigned long flags;
> + struct bio *bio;
> +
> + /*
> + * Unplug and submit the next plugged BIO. If we do not have any, clear
> + * the plugged flag.
> + */
> + blk_zone_wplug_lock(zwplug, flags);
> +
> + bio = bio_list_pop(&zwplug->bio_list);
> + if (!bio) {
> + zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED;
> + blk_zone_wplug_unlock(zwplug, flags);
> + return;
> + }
> +
> + blk_zone_wplug_unlock(zwplug, flags);
> +
> + /*
> + * blk-mq devices will reuse the reference on the request queue usage
> + * we took when the BIO was plugged, but the submission path for
> + * BIO-based devices will not do that. So drop this reference here.
> + */
> + if (bio->bi_bdev->bd_has_submit_bio)
> + blk_queue_exit(bio->bi_bdev->bd_disk->queue);
But I don't see where this reference is reused for blk-mq in this patch,
care to point it out?
Thanks,
Ming