From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
John Garry <john.garry@huawei.com>,
Bart Van Assche <bvanassche@acm.org>,
Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH V7 3/9] blk-mq: prepare for draining IO when hctx's all CPUs are offline
Date: Sat, 18 Apr 2020 11:09:19 +0800
Message-ID: <20200418030925.31996-4-ming.lei@redhat.com>
In-Reply-To: <20200418030925.31996-1-ming.lei@redhat.com>
Most blk-mq drivers depend on managed IRQ auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:
"
That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again.
"
However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
is a cpuhp state handled after the CPU is down, so blk-mq has no chance
to quiesce the hctx during CPU hotplug.
Add a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, so that blk-mq can stop
queues and wait for in-flight requests to complete.
A following patch will stop the hw queue and wait for completion of
in-flight requests when an hctx is about to become dead. This may
cause a deadlock for some stacking blk-mq drivers, such as dm-rq and
loop.
Add the blk-mq flag BLK_MQ_F_NO_MANAGED_IRQ and set it for dm-rq and
loop, so that we needn't wait for completion of in-flight requests from
dm-rq and loop, and the potential deadlock is avoided.
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
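The registration rule this patch adds to blk_mq_init_hctx() and
blk_mq_remove_cpuhp() can be sketched as a small userspace model. It is
only an illustration under stated assumptions: struct model_hctx and the
model_* helpers are hypothetical names that do not exist in the kernel,
and the two booleans merely stand in for membership on the cpuhp instance
lists. Only the flag values are taken from the blk-mq.h hunk below.

```c
/* Userspace model only, not kernel code. Every model_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as they appear in this patch's include/linux/blk-mq.h hunk. */
enum {
	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
	BLK_MQ_F_TAG_SHARED	= 1 << 1,
	BLK_MQ_F_NO_MANAGED_IRQ	= 1 << 2,
};

/* Hypothetical stand-in for struct blk_mq_hw_ctx. */
struct model_hctx {
	unsigned int flags;
	bool on_online_list;	/* models registration via hctx->cpuhp_online */
	bool on_dead_list;	/* models registration via hctx->cpuhp_dead */
};

/* Models the registration logic added to blk_mq_init_hctx(). */
static void model_init_hctx(struct model_hctx *hctx, unsigned int flags)
{
	assert(hctx != NULL);
	hctx->flags = flags;
	/* Only hctxs without NO_MANAGED_IRQ join the online/offline state. */
	hctx->on_online_list = !(flags & BLK_MQ_F_NO_MANAGED_IRQ);
	/* Every hctx still joins CPUHP_BLK_MQ_DEAD, as before the patch. */
	hctx->on_dead_list = true;
}

/* Models the unregistration logic added to blk_mq_remove_cpuhp(). */
static void model_remove_cpuhp(struct model_hctx *hctx)
{
	assert(hctx != NULL);
	if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
		hctx->on_online_list = false;
	hctx->on_dead_list = false;
}

/* True if the offline callback would be invoked for this hctx. */
static bool model_gets_offline_callback(const struct model_hctx *hctx)
{
	return hctx->on_online_list;
}
```

In this model, a tag set created with BLK_MQ_F_SHOULD_MERGE alone ends up
on both lists, while one that also sets BLK_MQ_F_NO_MANAGED_IRQ (as loop
and dm-rq do after this patch) is only on the dead list, so it would never
be asked to drain in-flight requests when a CPU goes offline.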
block/blk-mq-debugfs.c | 1 +
block/blk-mq.c | 19 +++++++++++++++++++
drivers/block/loop.c | 2 +-
drivers/md/dm-rq.c | 2 +-
include/linux/blk-mq.h | 3 +++
include/linux/cpuhotplug.h | 1 +
6 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..8e745826eb86 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -239,6 +239,7 @@ static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(TAG_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
+ HCTX_FLAG_NAME(NO_MANAGED_IRQ),
};
#undef HCTX_FLAG_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29ffa40db793..a28daefd7dd6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2259,6 +2259,16 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
return -ENOMEM;
}
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+ return 0;
+}
+
+static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
+{
+ return 0;
+}
+
/*
* 'cpu' is going away. splice any existing rq_list entries from this
* software queue to the hw queue dispatch list, and ensure that it
@@ -2295,6 +2305,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
{
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
&hctx->cpuhp_dead);
}
@@ -2354,6 +2367,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
{
hctx->queue_num = hctx_idx;
+ if (!(hctx->flags & BLK_MQ_F_NO_MANAGED_IRQ))
+ cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+ &hctx->cpuhp_online);
cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
hctx->tags = set->tags[hctx_idx];
@@ -3608,6 +3624,9 @@ static int __init blk_mq_init(void)
{
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
+ cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",
+ blk_mq_hctx_notify_online,
+ blk_mq_hctx_notify_offline);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..784f2e038b55 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2037,7 +2037,7 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
- lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3f8577e2c13b..5f1ff70ac029 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
md->tag_set->ops = &dm_mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
- md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+ md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
md->tag_set->driver_data = md;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f389d7c724bd..786614753d73 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,8 @@ struct blk_mq_hw_ctx {
*/
atomic_t nr_active;
+ /** @cpuhp_online: List to store request if CPU is going to die */
+ struct hlist_node cpuhp_online;
/** @cpuhp_dead: List to store request if some CPU die. */
struct hlist_node cpuhp_dead;
/** @kobj: Kernel object for sysfs. */
@@ -391,6 +393,7 @@ struct blk_mq_ops {
enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
BLK_MQ_F_TAG_SHARED = 1 << 1,
+ BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 77d70b633531..24b3a77810b6 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -152,6 +152,7 @@ enum cpuhp_state {
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
CPUHP_AP_IRQ_AFFINITY_ONLINE,
+ CPUHP_AP_BLK_MQ_ONLINE,
CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
CPUHP_AP_X86_INTEL_EPB_ONLINE,
CPUHP_AP_PERF_ONLINE,
--
2.25.2