From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
John Garry <john.garry@huawei.com>,
Bart Van Assche <bvanassche@acm.org>,
Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
Thomas Gleixner <tglx@linutronix.de>,
Keith Busch <keith.busch@intel.com>
Subject: [PATCH 2/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline
Date: Wed, 15 Jan 2020 19:44:05 +0800 [thread overview]
Message-ID: <20200115114409.28895-3-ming.lei@redhat.com> (raw)
In-Reply-To: <20200115114409.28895-1-ming.lei@redhat.com>
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup
up queue mapping. Thomas mentioned the following point[1]:
"
That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again.
"
However, current blk-mq implementation doesn't quiesce hw queue before
the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is
one cpuhp state handled after the CPU is down, so there isn't any chance
to quiesce hctx for blk-mq wrt. CPU hotplug.
Add new cpuhp state of CPUHP_AP_BLK_MQ_ONLINE for blk-mq to stop queues
and wait for completion of in-flight requests.
We will stop hw queue and wait for completion of in-flight requests
when one hctx is becoming dead in the following patch. This way may
cause dead-lock for some stacking blk-mq drivers, such as dm-rq and
loop.
Add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ and mark it for dm-rq and
loop, so we needn't to wait for completion of in-flight requests from
dm-rq & loop, then the potential dead-lock can be avoided.
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 1 +
block/blk-mq.c | 13 +++++++++++++
drivers/block/loop.c | 2 +-
drivers/md/dm-rq.c | 2 +-
include/linux/blk-mq.h | 3 +++
include/linux/cpuhotplug.h | 1 +
6 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d6de4f7f38cb..b62390918ca5 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -240,6 +240,7 @@ static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(TAG_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
+ HCTX_FLAG_NAME(NO_MANAGED_IRQ),
};
#undef HCTX_FLAG_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a12b1763508d..0f55bb92b16f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2244,6 +2244,11 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
return -ENOMEM;
}
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+ return 0;
+}
+
/*
* 'cpu' is going away. splice any existing rq_list entries from this
* software queue to the hw queue dispatch list, and ensure that it
@@ -2280,6 +2285,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
{
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
&hctx->cpuhp_dead);
}
@@ -2339,6 +2347,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
{
hctx->queue_num = hctx_idx;
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
hctx->tags = set->tags[hctx_idx];
@@ -3585,6 +3596,8 @@ static int __init blk_mq_init(void)
{
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
+ cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",
+ NULL, blk_mq_hctx_notify_online);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..651dadd9be12 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2012,7 +2012,7 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
- lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3f8577e2c13b..5f1ff70ac029 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
md->tag_set->ops = &dm_mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
- md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+ md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
md->tag_set->driver_data = md;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b669e776d4cb..ca2201435a48 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,8 @@ struct blk_mq_hw_ctx {
*/
atomic_t nr_active;
+ /** @cpuhp_online: List to store request if CPU is going to die */
+ struct hlist_node cpuhp_online;
/** @cpuhp_dead: List to store request if some CPU die. */
struct hlist_node cpuhp_dead;
/** @kobj: Kernel object for sysfs. */
@@ -388,6 +390,7 @@ struct blk_mq_ops {
enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
BLK_MQ_F_TAG_SHARED = 1 << 1,
+ BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e51ee772b9f5..48b7cb04224a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -149,6 +149,7 @@ enum cpuhp_state {
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
CPUHP_AP_IRQ_AFFINITY_ONLINE,
+ CPUHP_AP_BLK_MQ_ONLINE,
CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
CPUHP_AP_X86_INTEL_EPB_ONLINE,
CPUHP_AP_PERF_ONLINE,
--
2.20.1
next prev parent reply other threads:[~2020-01-15 11:45 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-15 11:44 [PATCH V5 0/6] blk-mq: improvement CPU hotplug Ming Lei
2020-01-15 11:44 ` [PATCH 1/6] blk-mq: add new state of BLK_MQ_S_INACTIVE Ming Lei
2020-01-15 11:44 ` Ming Lei [this message]
2020-01-15 11:44 ` [PATCH 3/6] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Ming Lei
2020-01-15 11:44 ` [PATCH 4/6] blk-mq: re-submit IO in case that hctx is inactive Ming Lei
2020-01-15 11:44 ` [PATCH 5/6] blk-mq: handle requests dispatched from IO scheduler in case of inactive hctx Ming Lei
2020-01-15 11:44 ` [PATCH 6/6] block: deactivate hctx when all its CPUs are offline when running queue Ming Lei
2020-01-15 17:00 ` [PATCH V5 0/6] blk-mq: improvement CPU hotplug John Garry
2020-01-20 13:23 ` John Garry
2020-01-31 10:04 ` Ming Lei
2020-01-31 10:24 ` John Garry
2020-01-31 10:58 ` Ming Lei
2020-01-31 17:51 ` John Garry
2020-01-31 18:02 ` John Garry
2020-02-01 1:31 ` Ming Lei
2020-02-01 11:05 ` Marc Zyngier
2020-02-01 11:31 ` Thomas Gleixner
2020-02-03 10:30 ` John Garry
2020-02-03 10:49 ` John Garry
2020-02-03 10:59 ` Ming Lei
2020-02-03 12:56 ` John Garry
2020-02-03 15:43 ` Marc Zyngier
2020-02-03 18:16 ` John Garry
2020-02-05 14:08 ` John Garry
2020-02-05 14:23 ` Marc Zyngier
2020-02-07 10:56 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200115114409.28895-3-ming.lei@redhat.com \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.com \
--cc=hch@lst.de \
--cc=john.garry@huawei.com \
--cc=keith.busch@intel.com \
--cc=linux-block@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox