* [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:22 ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
` (6 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
Factor out a helper from blk_insert_flush that initializes the flush
machinery-related fields in struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 04698ed9bcd4a9..422a6d5446d1c5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
return RQ_END_IO_NONE;
}
+static void blk_rq_init_flush(struct request *rq)
+{
+ memset(&rq->flush, 0, sizeof(rq->flush));
+ INIT_LIST_HEAD(&rq->flush.list);
+ rq->rq_flags |= RQF_FLUSH_SEQ;
+ rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+ rq->end_io = mq_flush_data_end_io;
+}
+
/**
* blk_insert_flush - insert a new PREFLUSH/FUA request
* @rq: request to insert
@@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
* @rq should go through flush machinery. Mark it part of flush
* sequence and submit for further processing.
*/
- memset(&rq->flush, 0, sizeof(rq->flush));
- INIT_LIST_HEAD(&rq->flush.list);
- rq->rq_flags |= RQF_FLUSH_SEQ;
- rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-
- rq->end_io = mq_flush_data_end_io;
-
+ blk_rq_init_flush(rq);
spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
--
2.39.2
* Re: [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-04-17 6:22 ` Damien Le Moal
2023-04-17 6:24 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:22 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> Factor out a helper from blk_insert_flush that initializes the flush
> machinery-related fields in struct request.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-flush.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 04698ed9bcd4a9..422a6d5446d1c5 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -376,6 +376,15 @@ static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
> return RQ_END_IO_NONE;
> }
>
> +static void blk_rq_init_flush(struct request *rq)
> +{
> + memset(&rq->flush, 0, sizeof(rq->flush));
> + INIT_LIST_HEAD(&rq->flush.list);
> + rq->rq_flags |= RQF_FLUSH_SEQ;
> + rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> + rq->end_io = mq_flush_data_end_io;
> +}
struct flush is:
struct {
unsigned int seq;
struct list_head list;
rq_end_io_fn *saved_end_io;
} flush;
So given that list and saved_end_io are initialized here, we could remove the
memset() by initializing seq "manually":
+static void blk_rq_init_flush(struct request *rq)
+{
+ rq->flush.seq = 0;
+ INIT_LIST_HEAD(&rq->flush.list);
+ rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
+ rq->rq_flags |= RQF_FLUSH_SEQ;
+ rq->end_io = mq_flush_data_end_io;
+}
No?
Otherwise, looks good to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> +
> /**
> * blk_insert_flush - insert a new PREFLUSH/FUA request
> * @rq: request to insert
> @@ -437,13 +446,7 @@ void blk_insert_flush(struct request *rq)
> * @rq should go through flush machinery. Mark it part of flush
> * sequence and submit for further processing.
> */
> - memset(&rq->flush, 0, sizeof(rq->flush));
> - INIT_LIST_HEAD(&rq->flush.list);
> - rq->rq_flags |= RQF_FLUSH_SEQ;
> - rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> -
> - rq->end_io = mq_flush_data_end_io;
> -
> + blk_rq_init_flush(rq);
> spin_lock_irq(&fq->mq_flush_lock);
> blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
> spin_unlock_irq(&fq->mq_flush_lock);
* Re: [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper
2023-04-17 6:22 ` Damien Le Moal
@ 2023-04-17 6:24 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-17 6:24 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block
On Mon, Apr 17, 2023 at 03:22:21PM +0900, Damien Le Moal wrote:
> > +static void blk_rq_init_flush(struct request *rq)
> > +{
> > + memset(&rq->flush, 0, sizeof(rq->flush));
> > + INIT_LIST_HEAD(&rq->flush.list);
> > + rq->rq_flags |= RQF_FLUSH_SEQ;
> > + rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> > + rq->end_io = mq_flush_data_end_io;
> > +}
>
> struct flush is:
>
> struct {
> unsigned int seq;
> struct list_head list;
> rq_end_io_fn *saved_end_io;
> } flush;
>
> So given that list and saved_end_io are initialized here, we could remove the
> memset() by initializing seq "manually":
Yes. And seq is always initialized in the callers, and I think I can also
remove saved_end_io entirely. So maybe we can drop this patch in the
end and just do some other changes.
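A sketch of where that could end up (hypothetical, and assuming both
observations hold: seq really is always set by the callers, and
saved_end_io can in fact go away):

static void blk_rq_init_flush(struct request *rq)
{
	/* seq is assumed to be initialized by every caller */
	INIT_LIST_HEAD(&rq->flush.list);
	rq->rq_flags |= RQF_FLUSH_SEQ;
	/* saved_end_io dropped entirely, per the observation above */
	rq->end_io = mq_flush_data_end_io;
}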
* [PATCH 2/7] blk-mq: reflow blk_insert_flush
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:29 ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
` (5 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
Use a switch statement to decide on the disposition of a flush request
instead of multiple if statements, one of which does checks that are
more complex than required. Also warn on a malformed request early
on instead of doing a BUG_ON later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 53 +++++++++++++++++++++++------------------------
1 file changed, 26 insertions(+), 27 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 422a6d5446d1c5..9fcfee7394f27d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+ /* FLUSH/FUA request must never be merged */
+ WARN_ON_ONCE(rq->bio != rq->biotail);
+
/*
* @policy now records what operations need to be done. Adjust
* REQ_PREFLUSH and FUA for the driver.
@@ -417,39 +420,35 @@ void blk_insert_flush(struct request *rq)
*/
rq->cmd_flags |= REQ_SYNC;
- /*
- * An empty flush handed down from a stacking driver may
- * translate into nothing if the underlying device does not
- * advertise a write-back cache. In this case, simply
- * complete the request.
- */
- if (!policy) {
+ switch (policy) {
+ case 0:
+ /*
+ * An empty flush handed down from a stacking driver may
+ * translate into nothing if the underlying device does not
+ * advertise a write-back cache. In this case, simply
+ * complete the request.
+ */
blk_mq_end_request(rq, 0);
return;
- }
-
- BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
-
- /*
- * If there's data but flush is not necessary, the request can be
- * processed directly without going through flush machinery. Queue
- * for normal execution.
- */
- if ((policy & REQ_FSEQ_DATA) &&
- !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
+ case REQ_FSEQ_DATA:
+ /*
+ * If there's data, but no flush is necessary, the request can
+ * be processed directly without going through flush machinery.
+ * Queue for normal execution.
+ */
blk_mq_request_bypass_insert(rq, 0);
blk_mq_run_hw_queue(hctx, false);
return;
+ default:
+ /*
+ * Mark the request as part of a flush sequence and submit it
+ * for further processing to the flush state machine.
+ */
+ blk_rq_init_flush(rq);
+ spin_lock_irq(&fq->mq_flush_lock);
+ blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
+ spin_unlock_irq(&fq->mq_flush_lock);
}
-
- /*
- * @rq should go through flush machinery. Mark it part of flush
- * sequence and submit for further processing.
- */
- blk_rq_init_flush(rq);
- spin_lock_irq(&fq->mq_flush_lock);
- blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
- spin_unlock_irq(&fq->mq_flush_lock);
}
/**
--
2.39.2
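For reference when reading the switch above: the policy bits come from
blk_flush_policy(), which at this point computes them roughly as in the
sketch below (paraphrasing the existing block/blk-flush.c logic; not part
of this patch). REQ_FSEQ_ACTIONS is simply
REQ_FSEQ_PREFLUSH | REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH.

static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
{
	unsigned int policy = 0;

	/* the request carries payload data */
	if (blk_rq_sectors(rq))
		policy |= REQ_FSEQ_DATA;

	/* only devices with a volatile write cache need any flushing */
	if (fflags & (1UL << QUEUE_FLAG_WC)) {
		if (rq->cmd_flags & REQ_PREFLUSH)
			policy |= REQ_FSEQ_PREFLUSH;
		/* emulate FUA with a post-flush if the device lacks FUA */
		if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
		    (rq->cmd_flags & REQ_FUA))
			policy |= REQ_FSEQ_POSTFLUSH;
	}
	return policy;
}

So case 0 means there is nothing to flush, case REQ_FSEQ_DATA means data
with no flushing required, and everything else goes through the flush
state machine.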
* Re: [PATCH 2/7] blk-mq: reflow blk_insert_flush
2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-04-17 6:29 ` Damien Le Moal
0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:29 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> Use a switch statement to decide on the disposition of a flush request
> instead of multiple if statements, one of which does checks that are
> more complex than required. Also warn on a malformed request early
> on instead of doing a BUG_ON later.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
One nit below, but looks OK overall.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> block/blk-flush.c | 53 +++++++++++++++++++++++------------------------
> 1 file changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 422a6d5446d1c5..9fcfee7394f27d 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -402,6 +402,9 @@ void blk_insert_flush(struct request *rq)
> struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
> struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>
> + /* FLUSH/FUA request must never be merged */
> + WARN_ON_ONCE(rq->bio != rq->biotail);
This could be:
if (WARN_ON_ONCE(rq->bio != rq->biotail)) {
blk_mq_end_request(rq, BLK_STS_IOERR);
return;
}
To avoid an oops on the malformed request later on ?
* [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
2023-04-16 20:09 ` [PATCH 1/7] blk-mq: factor out a blk_rq_init_flush helper Christoph Hellwig
2023-04-16 20:09 ` [PATCH 2/7] blk-mq: reflow blk_insert_flush Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:31 ` Damien Le Moal
2023-04-17 19:48 ` Bart Van Assche
2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
` (4 subsequent siblings)
7 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
If blk_insert_flush decides that a command does not need to use the
flush state machine, return false and let blk_mq_submit_bio handle
it the normal way (including using an I/O scheduler) instead of doing
a bypass insert.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 22 ++++++++--------------
block/blk-mq.c | 8 ++++----
block/blk-mq.h | 4 ----
block/blk.h | 2 +-
4 files changed, 13 insertions(+), 23 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9fcfee7394f27d..5bc462524473b2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -385,22 +385,17 @@ static void blk_rq_init_flush(struct request *rq)
rq->end_io = mq_flush_data_end_io;
}
-/**
- * blk_insert_flush - insert a new PREFLUSH/FUA request
- * @rq: request to insert
- *
- * To be called from __elv_add_request() for %ELEVATOR_INSERT_FLUSH insertions.
- * or __blk_mq_run_hw_queue() to dispatch request.
- * @rq is being submitted. Analyze what needs to be done and put it on the
- * right queue.
+/*
+ * Insert a PREFLUSH/FUA request into the flush state machine.
+ * Returns true if the request has been consumed by the flush state machine,
+ * or false if the caller should continue to process it.
*/
-void blk_insert_flush(struct request *rq)
+bool blk_insert_flush(struct request *rq)
{
struct request_queue *q = rq->q;
unsigned long fflags = q->queue_flags; /* may change, cache */
unsigned int policy = blk_flush_policy(fflags, rq);
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
- struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
/* FLUSH/FUA request must never be merged */
WARN_ON_ONCE(rq->bio != rq->biotail);
@@ -429,16 +424,14 @@ void blk_insert_flush(struct request *rq)
* complete the request.
*/
blk_mq_end_request(rq, 0);
- return;
+ return true;
case REQ_FSEQ_DATA:
/*
* If there's data, but no flush is necessary, the request can
* be processed directly without going through flush machinery.
* Queue for normal execution.
*/
- blk_mq_request_bypass_insert(rq, 0);
- blk_mq_run_hw_queue(hctx, false);
- return;
+ return false;
default:
/*
* Mark the request as part of a flush sequence and submit it
@@ -448,6 +441,7 @@ void blk_insert_flush(struct request *rq)
spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
+ return true;
}
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2d297efe229db..452bc17cce4dcf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -45,6 +45,8 @@
static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
+static void blk_mq_request_bypass_insert(struct request *rq,
+ blk_insert_t flags);
static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
struct list_head *list);
@@ -2429,7 +2431,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
* Should only be used carefully, when the caller knows we want to
* bypass a potential IO scheduler on the target device.
*/
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
+static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
{
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
@@ -2972,10 +2974,8 @@ void blk_mq_submit_bio(struct bio *bio)
return;
}
- if (op_is_flush(bio->bi_opf)) {
- blk_insert_flush(rq);
+ if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
return;
- }
if (plug) {
blk_add_rq_to_plug(plug, rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index f882677ff106a5..62c505e6ef1e40 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -64,10 +64,6 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
struct blk_mq_tags *tags,
unsigned int hctx_idx);
-/*
- * Internal helpers for request insertion into sw queues
- */
-void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags);
/*
* CPU -> queue mappings
diff --git a/block/blk.h b/block/blk.h
index 2da83110347173..512a1c0db6ba15 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -277,7 +277,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
*/
#define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
-void blk_insert_flush(struct request *rq);
+bool blk_insert_flush(struct request *rq);
int elevator_switch(struct request_queue *q, struct elevator_type *new_e);
void elevator_disable(struct request_queue *q);
--
2.39.2
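Restating the call-site contract from the blk_mq_submit_bio hunk above,
the usage now reads:

	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
		return;		/* consumed by the flush state machine */
	/* otherwise continue down the normal path (plug or direct insert) */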
* Re: [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
@ 2023-04-17 6:31 ` Damien Le Moal
2023-04-17 19:48 ` Bart Van Assche
1 sibling, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:31 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> If blk_insert_flush decides that a command does not need to use the
> flush state machine, return false and let blk_mq_submit_bio handle
> it the normal way (including using an I/O scheduler) instead of doing
> a bypass insert.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Nice!
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
* Re: [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands
2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
2023-04-17 6:31 ` Damien Le Moal
@ 2023-04-17 19:48 ` Bart Van Assche
1 sibling, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2023-04-17 19:48 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block
On 4/16/23 13:09, Christoph Hellwig wrote:
> If blk_insert_flush decides that a command does not need to use the
> flush state machine, return false and let blk_mq_submit_bio handle
> it the normal way (including using an I/O scheduler) instead of doing
> a bypass insert.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
` (2 preceding siblings ...)
2023-04-16 20:09 ` [PATCH 3/7] blk-mq: defer to the normal submission path for non-flush flush commands Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:34 ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
` (3 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
From: Bart Van Assche <bvanassche@acm.org>
Send write requests issued by the flush state machine through the normal
I/O submission path including the I/O scheduler (if present) so that I/O
scheduler policies are applied to writes with the FUA flag set.
Separate the I/O scheduler members from the flush members in struct
request since now a request may pass through both an I/O scheduler
and the flush machinery.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 5 ++++-
block/blk-mq-sched.c | 14 ++++++++++++++
block/blk-mq-sched.h | 1 +
block/blk-mq.c | 24 ++++++------------------
include/linux/blk-mq.h | 27 +++++++++++----------------
5 files changed, 36 insertions(+), 35 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 5bc462524473b2..f62e74d9d56bc8 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -330,8 +330,11 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
* account of this driver tag
*/
flush_rq->rq_flags |= RQF_MQ_INFLIGHT;
- } else
+ } else {
flush_rq->internal_tag = first_rq->internal_tag;
+ flush_rq->rq_flags |= RQF_ELV;
+ blk_mq_sched_prepare(flush_rq);
+ }
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 67c95f31b15bb1..70087554eeb8f4 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -16,6 +16,20 @@
#include "blk-mq-sched.h"
#include "blk-wbt.h"
+/* Prepare a request for insertion into an I/O scheduler. */
+void blk_mq_sched_prepare(struct request *rq)
+{
+ struct elevator_queue *e = rq->q->elevator;
+
+ INIT_HLIST_NODE(&rq->hash);
+ RB_CLEAR_NODE(&rq->rb_node);
+
+ if (e->type->ops.prepare_request) {
+ e->type->ops.prepare_request(rq);
+ rq->rq_flags |= RQF_ELVPRIV;
+ }
+}
+
/*
* Mark a hardware queue as needing a restart.
*/
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 7c3cbad17f3052..3049e988ca5916 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -7,6 +7,7 @@
#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
+void blk_mq_sched_prepare(struct request *rq);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **merged_request);
bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 452bc17cce4dcf..50a03e1592819c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -388,18 +388,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
WRITE_ONCE(rq->deadline, 0);
req_ref_set(rq, 1);
- if (rq->rq_flags & RQF_ELV) {
- struct elevator_queue *e = data->q->elevator;
-
- INIT_HLIST_NODE(&rq->hash);
- RB_CLEAR_NODE(&rq->rb_node);
-
- if (!op_is_flush(data->cmd_flags) &&
- e->type->ops.prepare_request) {
- e->type->ops.prepare_request(rq);
- rq->rq_flags |= RQF_ELVPRIV;
- }
- }
+ if (rq->rq_flags & RQF_ELV)
+ blk_mq_sched_prepare(rq);
return rq;
}
@@ -456,13 +446,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
data->rq_flags |= RQF_ELV;
/*
- * Flush/passthrough requests are special and go directly to the
- * dispatch list. Don't include reserved tags in the
- * limiting, as it isn't useful.
+ * Do not limit the depth for passthrough requests nor for
+ * requests with a reserved tag.
*/
- if (!op_is_flush(data->cmd_flags) &&
+ if (e->type->ops.limit_depth &&
!blk_op_is_passthrough(data->cmd_flags) &&
- e->type->ops.limit_depth &&
!(data->flags & BLK_MQ_REQ_RESERVED))
e->type->ops.limit_depth(data->cmd_flags, data);
}
@@ -2496,7 +2484,7 @@ static void blk_mq_insert_request(struct request *rq, blk_insert_t flags)
* dispatch it given we prioritize requests in hctx->dispatch.
*/
blk_mq_request_bypass_insert(rq, flags);
- } else if (rq->rq_flags & RQF_FLUSH_SEQ) {
+ } else if (req_op(rq) == REQ_OP_FLUSH) {
/*
* Firstly normal IO request is inserted to scheduler queue or
* sw queue, meantime we add flush request to dispatch queue(
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1dacb2c81fdda1..a204a5f3cc9524 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,25 +169,20 @@ struct request {
void *completion_data;
};
-
/*
* Three pointers are available for the IO schedulers, if they need
- * more they have to dynamically allocate it. Flush requests are
- * never put on the IO scheduler. So let the flush fields share
- * space with the elevator data.
+ * more they have to dynamically allocate it.
*/
- union {
- struct {
- struct io_cq *icq;
- void *priv[2];
- } elv;
-
- struct {
- unsigned int seq;
- struct list_head list;
- rq_end_io_fn *saved_end_io;
- } flush;
- };
+ struct {
+ struct io_cq *icq;
+ void *priv[2];
+ } elv;
+
+ struct {
+ unsigned int seq;
+ struct list_head list;
+ rq_end_io_fn *saved_end_io;
+ } flush;
union {
struct __call_single_data csd;
--
2.39.2
* Re: [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine
2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
@ 2023-04-17 6:34 ` Damien Le Moal
0 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:34 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> From: Bart Van Assche <bvanassche@acm.org>
>
> Send write requests issued by the flush state machine through the normal
> I/O submission path including the I/O scheduler (if present) so that I/O
> scheduler policies are applied to writes with the FUA flag set.
>
> Separate the I/O scheduler members from the flush members in struct
> request since now a request may pass through both an I/O scheduler
> and the flush machinery.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> [hch: rebased]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
* [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
` (3 preceding siblings ...)
2023-04-16 20:09 ` [PATCH 4/7] blk-mq: also use the I/O scheduler for requests from the flush state machine Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:36 ` Damien Le Moal
2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
` (2 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
Requests with the FUA bit on hardware without FUA support need a post
flush before returning the caller, but they can still be sent using
the normal I/O path after initializing the flush-related fields and
end I/O handler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index f62e74d9d56bc8..9eda6d46438dba 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -435,6 +435,17 @@ bool blk_insert_flush(struct request *rq)
* Queue for normal execution.
*/
return false;
+ case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
+ /*
+ * Initialize the flush fields and completion handler to trigger
+ * the post flush, and then just pass the command on.
+ */
+ blk_rq_init_flush(rq);
+ rq->flush.seq |= REQ_FSEQ_PREFLUSH;
+ spin_lock_irq(&fq->mq_flush_lock);
+ list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
+ spin_unlock_irq(&fq->mq_flush_lock);
+ return false;
default:
/*
* Mark the request as part of a flush sequence and submit it
--
2.39.2
* Re: [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-04-17 6:36 ` Damien Le Moal
2023-04-17 6:39 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:36 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> Requests with the FUA bit on hardware without FUA support need a post
> flush before returning the caller, but they can still be sent using
s/returning/returning to
> the normal I/O path after initializing the flush-related fields and
> end I/O handler.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-flush.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index f62e74d9d56bc8..9eda6d46438dba 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -435,6 +435,17 @@ bool blk_insert_flush(struct request *rq)
> * Queue for normal execution.
> */
> return false;
> + case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
> + /*
> + * Initialize the flush fields and completion handler to trigger
> + * the post flush, and then just pass the command on.
> + */
> + blk_rq_init_flush(rq);
> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
Shouldn't this be REQ_FSEQ_POSTFLUSH ?
> + spin_lock_irq(&fq->mq_flush_lock);
> + list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> + spin_unlock_irq(&fq->mq_flush_lock);
> + return false;
> default:
> /*
> * Mark the request as part of a flush sequence and submit it
* Re: [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests
2023-04-17 6:36 ` Damien Le Moal
@ 2023-04-17 6:39 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-17 6:39 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block
On Mon, Apr 17, 2023 at 03:36:54PM +0900, Damien Le Moal wrote:
> > + case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
> > + /*
> > + * Initialize the flush fields and completion handler to trigger
> > + * the post flush, and then just pass the command on.
> > + */
> > + blk_rq_init_flush(rq);
> > + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>
> Shouldn't this be REQ_FSEQ_POSTFLUSH ?
Yes. My fault for optimizing away the complicated assignment at the
last minute.
* [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
` (4 preceding siblings ...)
2023-04-16 20:09 ` [PATCH 5/7] blk-mq: defer to the normal submission path for post-flush requests Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-17 6:38 ` Damien Le Moal
2023-04-17 19:51 ` Bart Van Assche
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
2023-05-16 0:08 ` RFC: less special casing for flush requests Bart Van Assche
7 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
blk_flush_complete_seq currently queues requests that write data after
a pre-flush from the flush state machine at the head of the queue.
This doesn't really make sense, as the original request bypassed all
queue lists by directly diverting to blk_insert_flush from
blk_mq_submit_bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9eda6d46438dba..69e9806f575455 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,7 @@ static void blk_flush_complete_seq(struct request *rq,
case REQ_FSEQ_DATA:
list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
- blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+ blk_mq_add_to_requeue_list(rq, 0);
blk_mq_kick_requeue_list(q);
break;
--
2.39.2
* Re: [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
@ 2023-04-17 6:38 ` Damien Le Moal
2023-04-17 19:51 ` Bart Van Assche
1 sibling, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:38 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> blk_flush_complete_seq currently queues requests that write data after
> a pre-flush from the flush state machine at the head of the queue.
> This doesn't really make sense, as the original request bypassed all
> queue lists by directly diverting to blk_insert_flush from
> blk_mq_submit_bio.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
* Re: [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands
2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
2023-04-17 6:38 ` Damien Le Moal
@ 2023-04-17 19:51 ` Bart Van Assche
1 sibling, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2023-04-17 19:51 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block
On 4/16/23 13:09, Christoph Hellwig wrote:
> blk_flush_complete_seq currently queues requests that write data after
> a pre-flush from the flush state machine at the head of the queue.
> This doesn't really make sense, as the original request bypassed all
> queue lists by directly diverting to blk_insert_flush from
> blk_mq_submit_bio.
insertations -> insertions
Otherwise:
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
` (5 preceding siblings ...)
2023-04-16 20:09 ` [PATCH 6/7] blk-mq: do not do head insertations post-pre-flush commands Christoph Hellwig
@ 2023-04-16 20:09 ` Christoph Hellwig
2023-04-16 21:01 ` Bart Van Assche
` (3 more replies)
2023-05-16 0:08 ` RFC: less special casing for flush requests Bart Van Assche
7 siblings, 4 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-16 20:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, Damien Le Moal, linux-block
Currently both requeues of commands that were already sent to the
driver and flush commands submitted from the flush state machine
share the same requeue_list in struct request_queue, even though
requeues use head insertions and flushes do not. Switch to using two
separate lists instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 9 +++++++--
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 36 +++++++++++++++---------------------
block/blk-mq.h | 1 -
include/linux/blk-mq.h | 4 +---
include/linux/blkdev.h | 1 +
6 files changed, 24 insertions(+), 28 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 69e9806f575455..231d3780e74ad1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
case REQ_FSEQ_DATA:
list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
- blk_mq_add_to_requeue_list(rq, 0);
+ spin_lock(&q->requeue_lock);
+ list_add_tail(&rq->queuelist, &q->flush_list);
+ spin_unlock(&q->requeue_lock);
blk_mq_kick_requeue_list(q);
break;
@@ -349,7 +351,10 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
smp_wmb();
req_ref_set(flush_rq, 1);
- blk_mq_add_to_requeue_list(flush_rq, 0);
+ spin_lock(&q->requeue_lock);
+ list_add_tail(&flush_rq->queuelist, &q->flush_list);
+ spin_unlock(&q->requeue_lock);
+
blk_mq_kick_requeue_list(q);
}
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d23a8554ec4aeb..0d2a0f0524b53e 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -241,7 +241,6 @@ static const char *const cmd_flag_name[] = {
#define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
static const char *const rqf_name[] = {
RQF_NAME(STARTED),
- RQF_NAME(SOFTBARRIER),
RQF_NAME(FLUSH_SEQ),
RQF_NAME(MIXED_MERGE),
RQF_NAME(MQ_INFLIGHT),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 50a03e1592819c..eaed84a346c9c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1403,13 +1403,16 @@ static void __blk_mq_requeue_request(struct request *rq)
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
struct request_queue *q = rq->q;
+ unsigned long flags;
__blk_mq_requeue_request(rq);
/* this request will be re-inserted to io scheduler queue */
blk_mq_sched_requeue_request(rq);
- blk_mq_add_to_requeue_list(rq, BLK_MQ_INSERT_AT_HEAD);
+ spin_lock_irqsave(&q->requeue_lock, flags);
+ list_add_tail(&rq->queuelist, &q->requeue_list);
+ spin_unlock_irqrestore(&q->requeue_lock, flags);
if (kick_requeue_list)
blk_mq_kick_requeue_list(q);
@@ -1421,13 +1424,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
struct request_queue *q =
container_of(work, struct request_queue, requeue_work.work);
LIST_HEAD(rq_list);
- struct request *rq, *next;
+ LIST_HEAD(flush_list);
+ struct request *rq;
spin_lock_irq(&q->requeue_lock);
list_splice_init(&q->requeue_list, &rq_list);
+ list_splice_init(&q->flush_list, &flush_list);
spin_unlock_irq(&q->requeue_lock);
- list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+ while (!list_empty(&rq_list)) {
+ rq = list_entry(rq_list.next, struct request, queuelist);
/*
* If RQF_DONTPREP ist set, the request has been started by the
* driver already and might have driver-specific data allocated
@@ -1435,18 +1441,16 @@ static void blk_mq_requeue_work(struct work_struct *work)
* block layer merges for the request.
*/
if (rq->rq_flags & RQF_DONTPREP) {
- rq->rq_flags &= ~RQF_SOFTBARRIER;
list_del_init(&rq->queuelist);
blk_mq_request_bypass_insert(rq, 0);
- } else if (rq->rq_flags & RQF_SOFTBARRIER) {
- rq->rq_flags &= ~RQF_SOFTBARRIER;
+ } else {
list_del_init(&rq->queuelist);
blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
}
}
- while (!list_empty(&rq_list)) {
- rq = list_entry(rq_list.next, struct request, queuelist);
+ while (!list_empty(&flush_list)) {
+ rq = list_entry(flush_list.next, struct request, queuelist);
list_del_init(&rq->queuelist);
blk_mq_insert_request(rq, 0);
}
@@ -1454,24 +1458,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
blk_mq_run_hw_queues(q, false);
}
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags)
+void blk_flush_queue_insert(struct request *rq)
{
struct request_queue *q = rq->q;
unsigned long flags;
- /*
- * We abuse this flag that is otherwise used by the I/O scheduler to
- * request head insertion from the workqueue.
- */
- BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
-
spin_lock_irqsave(&q->requeue_lock, flags);
- if (insert_flags & BLK_MQ_INSERT_AT_HEAD) {
- rq->rq_flags |= RQF_SOFTBARRIER;
- list_add(&rq->queuelist, &q->requeue_list);
- } else {
- list_add_tail(&rq->queuelist, &q->requeue_list);
- }
+ list_add_tail(&rq->queuelist, &q->flush_list);
spin_unlock_irqrestore(&q->requeue_lock, flags);
}
@@ -4222,6 +4215,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
blk_mq_update_poll_flag(q);
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+ INIT_LIST_HEAD(&q->flush_list);
INIT_LIST_HEAD(&q->requeue_list);
spin_lock_init(&q->requeue_lock);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 62c505e6ef1e40..1955e0c08d154c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -47,7 +47,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
unsigned int);
-void blk_mq_add_to_requeue_list(struct request *rq, blk_insert_t insert_flags);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a204a5f3cc9524..23fa9fdf59f3e1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -28,8 +28,6 @@ typedef __u32 __bitwise req_flags_t;
/* drive already may have started this one */
#define RQF_STARTED ((__force req_flags_t)(1 << 1))
-/* may not be passed by ioscheduler */
-#define RQF_SOFTBARRIER ((__force req_flags_t)(1 << 3))
/* request for flush sequence */
#define RQF_FLUSH_SEQ ((__force req_flags_t)(1 << 4))
/* merge of different types, fail separately */
@@ -65,7 +63,7 @@ typedef __u32 __bitwise req_flags_t;
/* flags that prevent us from merging requests: */
#define RQF_NOMERGE_FLAGS \
- (RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+ (RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
enum mq_rq_state {
MQ_RQ_IDLE = 0,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6ede578dfbc642..649274b4043b69 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -490,6 +490,7 @@ struct request_queue {
* for flush operations
*/
struct blk_flush_queue *fq;
+ struct list_head flush_list;
struct list_head requeue_list;
spinlock_t requeue_lock;
--
2.39.2
* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-04-16 21:01 ` Bart Van Assche
2023-04-17 4:26 ` Christoph Hellwig
2023-04-17 6:46 ` Damien Le Moal
` (2 subsequent siblings)
3 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2023-04-16 21:01 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block
On 4/16/23 13:09, Christoph Hellwig wrote:
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 69e9806f575455..231d3780e74ad1 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>
> case REQ_FSEQ_DATA:
> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> - blk_mq_add_to_requeue_list(rq, 0);
> + spin_lock(&q->requeue_lock);
> + list_add_tail(&rq->queuelist, &q->flush_list);
> + spin_unlock(&q->requeue_lock);
> blk_mq_kick_requeue_list(q);
> break;
At least the SCSI core can call blk_flush_complete_seq() from interrupt
context, so I don't think the above code is correct. The call chain is as
follows:
LLD interrupt handler
scsi_done()
scsi_done_internal()
blk_mq_complete_request()
scsi_complete()
scsi_finish_command()
scsi_io_completion()
scsi_end_request()
__blk_mq_end_request()
flush_end_io()
blk_flush_complete_seq()
Bart.
* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 21:01 ` Bart Van Assche
@ 2023-04-17 4:26 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-04-17 4:26 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block
On Sun, Apr 16, 2023 at 02:01:30PM -0700, Bart Van Assche wrote:
> On 4/16/23 13:09, Christoph Hellwig wrote:
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index 69e9806f575455..231d3780e74ad1 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>> case REQ_FSEQ_DATA:
>> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>> - blk_mq_add_to_requeue_list(rq, 0);
>> + spin_lock(&q->requeue_lock);
>> + list_add_tail(&rq->queuelist, &q->flush_list);
>> + spin_unlock(&q->requeue_lock);
>> blk_mq_kick_requeue_list(q);
>> break;
>
> At least the SCSI core can call blk_flush_complete_seq() from interrupt
> context so I don't think the above code is correct. The call chain is as
> follows:
All callers of blk_flush_complete_seq already disable interrupts when
taking mq_flush_lock. No need to disable interrupts again for a nested
lock then.
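Concretely, the pattern in blk_insert_flush (see patch 1/7 above) is:

	spin_lock_irq(&fq->mq_flush_lock);	/* interrupts disabled here */
	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
	spin_unlock_irq(&fq->mq_flush_lock);

so the plain spin_lock(&q->requeue_lock) nested inside
blk_flush_complete_seq runs with interrupts already off.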
* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
2023-04-16 21:01 ` Bart Van Assche
@ 2023-04-17 6:46 ` Damien Le Moal
2023-04-19 12:37 ` kernel test robot
2023-04-19 17:34 ` kernel test robot
3 siblings, 0 replies; 28+ messages in thread
From: Damien Le Moal @ 2023-04-17 6:46 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Bart Van Assche, linux-block
On 4/17/23 05:09, Christoph Hellwig wrote:
> Currently both requeues of commands that were already sent to the
> driver and flush commands submitted from the flush state machine
> share the same requeue_list in struct request_queue, even though
> requeues use head insertions and flushes do not. Switch to using two
> separate lists instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks OK.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> block/blk-flush.c | 9 +++++++--
> block/blk-mq-debugfs.c | 1 -
> block/blk-mq.c | 36 +++++++++++++++---------------------
> block/blk-mq.h | 1 -
> include/linux/blk-mq.h | 4 +---
> include/linux/blkdev.h | 1 +
> 6 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 69e9806f575455..231d3780e74ad1 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -188,7 +188,9 @@ static void blk_flush_complete_seq(struct request *rq,
>
> case REQ_FSEQ_DATA:
> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> - blk_mq_add_to_requeue_list(rq, 0);
> + spin_lock(&q->requeue_lock);
> + list_add_tail(&rq->queuelist, &q->flush_list);
> + spin_unlock(&q->requeue_lock);
> blk_mq_kick_requeue_list(q);
With this change, this function name is a little misleading... But I do not have
a better name to propose :)
* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
2023-04-16 21:01 ` Bart Van Assche
2023-04-17 6:46 ` Damien Le Moal
@ 2023-04-19 12:37 ` kernel test robot
2023-04-19 17:34 ` kernel test robot
3 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2023-04-19 12:37 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: oe-kbuild-all, Bart Van Assche, Damien Le Moal, linux-block
Hi Christoph,
kernel test robot noticed the following build warnings:
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20230418]
[cannot apply to linus/master v6.3-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20230416200930.29542-8-hch%40lst.de
patch subject: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
config: i386-randconfig-a003 (https://download.01.org/0day-ci/archive/20230419/202304192001.KzxpDQmK-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/c1197e1a4ff5b38b73d3ec923987ca857f5e2695
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
git checkout c1197e1a4ff5b38b73d3ec923987ca857f5e2695
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 olddefconfig
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304192001.KzxpDQmK-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> block/blk-mq.c:1461:6: warning: no previous prototype for 'blk_flush_queue_insert' [-Wmissing-prototypes]
1461 | void blk_flush_queue_insert(struct request *rq)
| ^~~~~~~~~~~~~~~~~~~~~~
vim +/blk_flush_queue_insert +1461 block/blk-mq.c
1460
> 1461 void blk_flush_queue_insert(struct request *rq)
1462 {
1463 struct request_queue *q = rq->q;
1464 unsigned long flags;
1465
1466 spin_lock_irqsave(&q->requeue_lock, flags);
1467 list_add_tail(&rq->queuelist, &q->flush_list);
1468 spin_unlock_irqrestore(&q->requeue_lock, flags);
1469 }
1470
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
* Re: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
` (2 preceding siblings ...)
2023-04-19 12:37 ` kernel test robot
@ 2023-04-19 17:34 ` kernel test robot
3 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2023-04-19 17:34 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: llvm, oe-kbuild-all, Bart Van Assche, Damien Le Moal, linux-block
Hi Christoph,
kernel test robot noticed the following build warnings:
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20230418]
[cannot apply to linus/master v6.3-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20230416200930.29542-8-hch%40lst.de
patch subject: [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands
config: x86_64-randconfig-a013-20230417 (https://download.01.org/0day-ci/archive/20230420/202304200122.GEFsqxFh-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/c1197e1a4ff5b38b73d3ec923987ca857f5e2695
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Christoph-Hellwig/blk-mq-reflow-blk_insert_flush/20230417-051014
git checkout c1197e1a4ff5b38b73d3ec923987ca857f5e2695
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304200122.GEFsqxFh-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> block/blk-mq.c:1461:6: warning: no previous prototype for function 'blk_flush_queue_insert' [-Wmissing-prototypes]
void blk_flush_queue_insert(struct request *rq)
^
block/blk-mq.c:1461:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void blk_flush_queue_insert(struct request *rq)
^
static
1 warning generated.
vim +/blk_flush_queue_insert +1461 block/blk-mq.c
1460
> 1461 void blk_flush_queue_insert(struct request *rq)
1462 {
1463 struct request_queue *q = rq->q;
1464 unsigned long flags;
1465
1466 spin_lock_irqsave(&q->requeue_lock, flags);
1467 list_add_tail(&rq->queuelist, &q->flush_list);
1468 spin_unlock_irqrestore(&q->requeue_lock, flags);
1469 }
1470
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
* Re: RFC: less special casing for flush requests
2023-04-16 20:09 RFC: less special casing for flush requests Christoph Hellwig
` (6 preceding siblings ...)
2023-04-16 20:09 ` [PATCH 7/7] blk-mq: don't use the requeue list to queue flush commands Christoph Hellwig
@ 2023-05-16 0:08 ` Bart Van Assche
2023-05-16 4:03 ` Christoph Hellwig
7 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2023-05-16 0:08 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Damien Le Moal, linux-block
On 4/16/23 13:09, Christoph Hellwig wrote:
> Inspired by Bart's quest for less reordering during requeues, this series
> aims to limit the scope of requeues. This is mostly done by simply
> sending more requests than we currently do down the normal submission
> path, with the help of Bart's patch to allow requests that are in the
> flush state machine to still be seen by the I/O schedulers.
Hi Christoph,
Do you have the time to continue the work on this patch series? Please
let me know in case you would prefer that I continue the work on this
patch series. I just found and fixed a bug in my patch "block: Send FUA
requests to the I/O scheduler" (not included in this series).
Thanks,
Bart.
* Re: RFC: less special casing for flush requests
2023-05-16 0:08 ` RFC: less special casing for flush requests Bart Van Assche
@ 2023-05-16 4:03 ` Christoph Hellwig
2023-05-18 5:57 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-16 4:03 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block
On Mon, May 15, 2023 at 05:08:27PM -0700, Bart Van Assche wrote:
> Do you have the time to continue the work on this patch series? Please let
> me know in case you would prefer that I continue the work on this patch
> series. I just found and fixed a bug in my patch "block: Send FUA requests
> to the I/O scheduler" (not included in this series).
I did another pass last weekend and was planning to finish it off today.
* Re: RFC: less special casing for flush requests
2023-05-16 4:03 ` Christoph Hellwig
@ 2023-05-18 5:57 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 5:57 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, Damien Le Moal, linux-block
On Tue, May 16, 2023 at 06:03:51AM +0200, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 05:08:27PM -0700, Bart Van Assche wrote:
> > Do you have the time to continue the work on this patch series? Please let
> > me know in case you would prefer that I continue the work on this patch
> > series. I just found and fixed a bug in my patch "block: Send FUA requests
> > to the I/O scheduler" (not included in this series).
>
> I did another pass last weekend and was planning to finish it off today.
Turns out this all interacts with the scheduler tags vs passthrough
discussion we had, but I now have a version of the series rebased on top
of that:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-flush
I will send it out once we've settled the tags discussion.