From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
John Garry <john.garry@huawei.com>,
Bart Van Assche <bvanassche@acm.org>,
Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH V6 3/8] blk-mq: prepare for draining IO when hctx's all CPUs are offline
Date: Tue, 7 Apr 2020 17:28:56 +0800
Message-ID: <20200407092901.314228-4-ming.lei@redhat.com>
In-Reply-To: <20200407092901.314228-1-ming.lei@redhat.com>
Most blk-mq drivers depend on managed IRQ auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:
"
That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again.
"
However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
is a cpuhp state handled after the CPU is down, so blk-mq has no chance
to quiesce the hctx during CPU hotplug.
Add a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, so that blk-mq can stop
queues and wait for completion of in-flight requests.
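The ordering problem can be sketched in userspace C. This is only an illustrative model, not kernel code: the sim_* names and the five-state enum are hypothetical stand-ins for a small slice of enum cpuhp_state. It assumes only that teardown walks the states from high to low and that the CPU stops running at one step in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced model of the cpuhp state ordering. */
enum sim_state {
	SIM_OFFLINE,
	SIM_BLK_MQ_DEAD,	/* stands in for CPUHP_BLK_MQ_DEAD */
	SIM_TEARDOWN_CPU,	/* the CPU stops running at this step */
	SIM_BLK_MQ_ONLINE,	/* stands in for CPUHP_AP_BLK_MQ_ONLINE */
	SIM_ONLINE,
	SIM_NR_STATES,
};

struct sim_cpu {
	bool alive;
	/* Whether the CPU was still alive when each state was torn down. */
	bool alive_at[SIM_NR_STATES];
};

/* Walk the states from SIM_ONLINE down to SIM_OFFLINE, as an unplug would. */
static void sim_cpu_down(struct sim_cpu *cpu)
{
	assert(cpu);
	cpu->alive = true;
	for (int st = SIM_ONLINE; st >= SIM_OFFLINE; st--) {
		if (st == SIM_TEARDOWN_CPU)
			cpu->alive = false;
		cpu->alive_at[st] = cpu->alive;
	}
}

/* A queue can only be quiesced from a callback that runs before the CPU dies. */
static bool sim_can_quiesce(const struct sim_cpu *cpu, enum sim_state st)
{
	return cpu->alive_at[st];
}
```

In this model, after sim_cpu_down() the SIM_BLK_MQ_ONLINE step has seen a live CPU, while the SIM_BLK_MQ_DEAD step has not. That is why the draining has to hook into an online state.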
The following patch will stop the hw queue and wait for completion of
in-flight requests when an hctx is becoming dead. This may cause a
deadlock for some stacking blk-mq drivers, such as dm-rq and loop.
Add the blk-mq flag BLK_MQ_F_NO_MANAGED_IRQ and set it for dm-rq and
loop, so we needn't wait for completion of in-flight requests from
dm-rq & loop, and the potential deadlock is avoided.
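A minimal userspace sketch of how the flag gates registration, mirroring the check this patch adds to blk_mq_init_hctx(). The sim_* names are hypothetical; the two bools stand in for the cpuhp_state_add_instance_nocalls() calls, and only the bit positions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions follow the enum in include/linux/blk-mq.h from this patch. */
#define SIM_F_SHOULD_MERGE	(1u << 0)
#define SIM_F_NO_MANAGED_IRQ	(1u << 2)

/* Hypothetical stand-in for the parts of struct blk_mq_hw_ctx used here. */
struct sim_hctx {
	unsigned int flags;
	bool online_registered;	/* models the CPUHP_AP_BLK_MQ_ONLINE instance */
	bool dead_registered;	/* models the CPUHP_BLK_MQ_DEAD instance */
};

static void sim_init_hctx(struct sim_hctx *hctx, unsigned int flags)
{
	assert(hctx);
	hctx->flags = flags;
	/* Stacking drivers opt out of the online (drain) callback. */
	hctx->online_registered = !(flags & SIM_F_NO_MANAGED_IRQ);
	/* The dead callback is registered unconditionally, as before. */
	hctx->dead_registered = true;
}
```

Under these assumptions, an hctx created with only SIM_F_SHOULD_MERGE gets both callbacks, while one created with the flags dm-rq and loop now use gets only the dead callback and is therefore never drained.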
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 1 +
block/blk-mq.c | 13 +++++++++++++
drivers/block/loop.c | 2 +-
drivers/md/dm-rq.c | 2 +-
include/linux/blk-mq.h | 3 +++
include/linux/cpuhotplug.h | 1 +
6 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d6de4f7f38cb..b62390918ca5 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -240,6 +240,7 @@ static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(TAG_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
+ HCTX_FLAG_NAME(NO_MANAGED_IRQ),
};
#undef HCTX_FLAG_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6f1ba3ff783..4ee8695142c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2249,6 +2249,11 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
return -ENOMEM;
}
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+ return 0;
+}
+
/*
* 'cpu' is going away. splice any existing rq_list entries from this
* software queue to the hw queue dispatch list, and ensure that it
@@ -2285,6 +2290,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
{
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
&hctx->cpuhp_dead);
}
@@ -2344,6 +2352,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
{
hctx->queue_num = hctx_idx;
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
hctx->tags = set->tags[hctx_idx];
@@ -3588,6 +3599,8 @@ static int __init blk_mq_init(void)
{
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
+ cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",
+ NULL, blk_mq_hctx_notify_online);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..651dadd9be12 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2012,7 +2012,7 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
- lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3f8577e2c13b..5f1ff70ac029 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
md->tag_set->ops = &dm_mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
- md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+ md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
md->tag_set->driver_data = md;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b669e776d4cb..ca2201435a48 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,8 @@ struct blk_mq_hw_ctx {
*/
atomic_t nr_active;
+ /** @cpuhp_online: List to store request if CPU is going to die */
+ struct hlist_node cpuhp_online;
/** @cpuhp_dead: List to store request if some CPU die. */
struct hlist_node cpuhp_dead;
/** @kobj: Kernel object for sysfs. */
@@ -388,6 +390,7 @@ struct blk_mq_ops {
enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
BLK_MQ_F_TAG_SHARED = 1 << 1,
+ BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index d37c17e68268..8bd2fea6cd59 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -151,6 +151,7 @@ enum cpuhp_state {
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
CPUHP_AP_IRQ_AFFINITY_ONLINE,
+ CPUHP_AP_BLK_MQ_ONLINE,
CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
CPUHP_AP_X86_INTEL_EPB_ONLINE,
CPUHP_AP_PERF_ONLINE,
--
2.25.2