* [PATCH V3 0/4] blk-mq: queue mapping fix & improvement
@ 2018-12-17 12:08 Ming Lei
2018-12-17 12:08 ` [PATCH V3 1/3] blk-mq: fix allocation for queue mapping table Ming Lei
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Ming Lei @ 2018-12-17 12:08 UTC
To: Jens Axboe; +Cc: linux-block, Ming Lei, Christoph Hellwig
Hi,
The 1st patch fixes the allocation size for the queue mapping table.
The 2nd patch exports hctx->type in debugfs, so that we can write a
debugfs-based test for verifying whether the queue mapping is valid.
The 3rd patch fixes dispatch from sw queue when there is either a
read queue or a poll queue.
V3:
- drop the original 2nd patch, which can be done by hch's two
patches
- add changelog on patch3
V2:
- remove hctx->type export from sysfs
- take hch's patch to fix shared queue mapping
- add the patch 3
Ming Lei (3):
blk-mq: fix allocation for queue mapping table
blk-mq: export hctx->type in debugfs instead of sysfs
blk-mq: fix dispatch from sw queue
block/blk-mq-debugfs.c | 85 ++++++++++++++++++++++++++++++++------------------
block/blk-mq-sched.c | 23 +++++++++-----
block/blk-mq-sysfs.c | 17 ----------
block/blk-mq.c | 54 +++++++++++++++++++-------------
block/blk-mq.h | 10 +++---
5 files changed, 108 insertions(+), 81 deletions(-)
Cc: Christoph Hellwig <hch@lst.de>
--
2.9.5
* [PATCH V3 1/3] blk-mq: fix allocation for queue mapping table
2018-12-17 12:08 [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Ming Lei
@ 2018-12-17 12:08 ` Ming Lei
2018-12-17 12:08 ` [PATCH V3 2/3] blk-mq: export hctx->type in debugfs instead of sysfs Ming Lei
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2018-12-17 12:08 UTC
To: Jens Axboe; +Cc: linux-block, Ming Lei, Christoph Hellwig
The type of each element in the queue mapping table is 'unsigned int',
instead of 'struct blk_mq_queue_map', so fix it.
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2d3a29eb58ca..313f28b2d079 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3019,7 +3019,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
ret = -ENOMEM;
for (i = 0; i < set->nr_maps; i++) {
set->map[i].mq_map = kcalloc_node(nr_cpu_ids,
- sizeof(struct blk_mq_queue_map),
+ sizeof(set->map[i].mq_map[0]),
GFP_KERNEL, set->numa_node);
if (!set->map[i].mq_map)
goto out_free_mq_map;
--
2.9.5
* [PATCH V3 2/3] blk-mq: export hctx->type in debugfs instead of sysfs
2018-12-17 12:08 [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Ming Lei
2018-12-17 12:08 ` [PATCH V3 1/3] blk-mq: fix allocation for queue mapping table Ming Lei
@ 2018-12-17 12:08 ` Ming Lei
2018-12-17 12:08 ` [PATCH V3 3/3] blk-mq: fix dispatch from sw queue Ming Lei
2018-12-17 12:56 ` [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Jens Axboe
3 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2018-12-17 12:08 UTC
To: Jens Axboe; +Cc: linux-block, Ming Lei, Christoph Hellwig
Currently we only export hctx->type via sysfs, and there is no such
info in the hctx entry under debugfs. We often use debugfs alone to
diagnose queue mapping issues, so add the support in debugfs.
Queue mapping becomes a bit more complicated once multiple queue maps
are supported, so we may write a blktests case to verify that the
queue mapping is valid based on blk-mq-debugfs.
Since it isn't necessary to export hctx->type twice, remove the export
from sysfs.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 16 ++++++++++++++++
block/blk-mq-sysfs.c | 17 -----------------
2 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 2793e91bc7a4..1e12033be9ea 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -447,6 +447,21 @@ static int hctx_busy_show(void *data, struct seq_file *m)
return 0;
}
+static const char *const hctx_types[] = {
+ [HCTX_TYPE_DEFAULT] = "default",
+ [HCTX_TYPE_READ] = "read",
+ [HCTX_TYPE_POLL] = "poll",
+};
+
+static int hctx_type_show(void *data, struct seq_file *m)
+{
+ struct blk_mq_hw_ctx *hctx = data;
+
+ BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
+ seq_printf(m, "%s\n", hctx_types[hctx->type]);
+ return 0;
+}
+
static int hctx_ctx_map_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
@@ -799,6 +814,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"run", 0600, hctx_run_show, hctx_run_write},
{"active", 0400, hctx_active_show},
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
+ {"type", 0400, hctx_type_show},
{},
};
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 9c2df137256a..3f9c3f4ac44c 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -173,18 +173,6 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
return ret;
}
-static const char *const hctx_types[] = {
- [HCTX_TYPE_DEFAULT] = "default",
- [HCTX_TYPE_READ] = "read",
- [HCTX_TYPE_POLL] = "poll",
-};
-
-static ssize_t blk_mq_hw_sysfs_type_show(struct blk_mq_hw_ctx *hctx, char *page)
-{
- BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
- return sprintf(page, "%s\n", hctx_types[hctx->type]);
-}
-
static struct attribute *default_ctx_attrs[] = {
NULL,
};
@@ -201,16 +189,11 @@ static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_cpus = {
.attr = {.name = "cpu_list", .mode = 0444 },
.show = blk_mq_hw_sysfs_cpus_show,
};
-static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_type = {
- .attr = {.name = "type", .mode = 0444 },
- .show = blk_mq_hw_sysfs_type_show,
-};
static struct attribute *default_hw_ctx_attrs[] = {
&blk_mq_hw_sysfs_nr_tags.attr,
&blk_mq_hw_sysfs_nr_reserved_tags.attr,
&blk_mq_hw_sysfs_cpus.attr,
- &blk_mq_hw_sysfs_type.attr,
NULL,
};
--
2.9.5
* [PATCH V3 3/3] blk-mq: fix dispatch from sw queue
2018-12-17 12:08 [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Ming Lei
2018-12-17 12:08 ` [PATCH V3 1/3] blk-mq: fix allocation for queue mapping table Ming Lei
2018-12-17 12:08 ` [PATCH V3 2/3] blk-mq: export hctx->type in debugfs instead of sysfs Ming Lei
@ 2018-12-17 12:08 ` Ming Lei
2018-12-17 12:55 ` Jens Axboe
2018-12-17 12:56 ` [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Jens Axboe
3 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2018-12-17 12:08 UTC
To: Jens Axboe; +Cc: linux-block, Ming Lei, Christoph Hellwig
When a request is added to the rq list of a sw queue (ctx), the rq may
come from a different type of hctx, especially after multiple queue
mapping is introduced.
So when dispatching a request from a sw queue via blk_mq_flush_busy_ctxs()
or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx
queue type can be dispatched to the current hctx in case a read queue or
poll queue is enabled.
This patch fixes the issue by introducing a per-queue-type list.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 69 ++++++++++++++++++++++++++++----------------------
block/blk-mq-sched.c | 23 +++++++++++------
block/blk-mq.c | 52 ++++++++++++++++++++++---------------
block/blk-mq.h | 10 +++++---
4 files changed, 91 insertions(+), 63 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 1e12033be9ea..159607be5158 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -652,36 +652,43 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
return 0;
}
-static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
- __acquires(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_lock(&ctx->lock);
- return seq_list_start(&ctx->rq_list, *pos);
-}
-
-static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- return seq_list_next(v, &ctx->rq_list, pos);
-}
+#define CTX_RQ_SEQ_OPS(name, type) \
+static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
+ __acquires(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_lock(&ctx->list[type].lock); \
+ return seq_list_start(&ctx->list[type].rq_list, *pos); \
+} \
+ \
+static void *ctx_##name##_rq_list_next(struct seq_file *m, void *v, \
+ loff_t *pos) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ return seq_list_next(v, &ctx->list[type].rq_list, pos); \
+} \
+ \
+static void ctx_##name##_rq_list_stop(struct seq_file *m, void *v) \
+ __releases(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_unlock(&ctx->list[type].lock); \
+} \
+ \
+static const struct seq_operations ctx_##name##_rq_list_seq_ops = { \
+ .start = ctx_##name##_rq_list_start, \
+ .next = ctx_##name##_rq_list_next, \
+ .stop = ctx_##name##_rq_list_stop, \
+ .show = blk_mq_debugfs_rq_show, \
+}
+
+CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
+CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
+CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
-static void ctx_rq_list_stop(struct seq_file *m, void *v)
- __releases(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_unlock(&ctx->lock);
-}
-
-static const struct seq_operations ctx_rq_list_seq_ops = {
- .start = ctx_rq_list_start,
- .next = ctx_rq_list_next,
- .stop = ctx_rq_list_stop,
- .show = blk_mq_debugfs_rq_show,
-};
static int ctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
@@ -819,7 +826,9 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
- {"rq_list", 0400, .seq_ops = &ctx_rq_list_seq_ops},
+ {"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
+ {"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
+ {"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
{"merged", 0600, ctx_merged_show, ctx_merged_write},
{"completed", 0600, ctx_completed_show, ctx_completed_write},
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 5b4d52d9cba2..09594947933e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -301,11 +301,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
* too much time checking for merges.
*/
static bool blk_mq_attempt_merge(struct request_queue *q,
+ struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio)
{
- lockdep_assert_held(&ctx->lock);
+ enum hctx_type type = hctx->type;
- if (blk_mq_bio_list_merge(q, &ctx->rq_list, bio)) {
+ lockdep_assert_held(&ctx->list[type].lock);
+
+ if (blk_mq_bio_list_merge(q, &ctx->list[type].rq_list, bio)) {
ctx->rq_merged++;
return true;
}
@@ -319,18 +322,20 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
bool ret = false;
+ enum hctx_type type;
if (e && e->type->ops.bio_merge) {
blk_mq_put_ctx(ctx);
return e->type->ops.bio_merge(hctx, bio);
}
+ type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
- !list_empty_careful(&ctx->rq_list)) {
+ !list_empty_careful(&ctx->list[type].rq_list)) {
/* default per sw-queue merge */
- spin_lock(&ctx->lock);
- ret = blk_mq_attempt_merge(q, ctx, bio);
- spin_unlock(&ctx->lock);
+ spin_lock(&ctx->list[type].lock);
+ ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
+ spin_unlock(&ctx->list[type].lock);
}
blk_mq_put_ctx(ctx);
@@ -392,9 +397,11 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
list_add(&rq->queuelist, &list);
e->type->ops.insert_requests(hctx, &list, at_head);
} else {
- spin_lock(&ctx->lock);
+ enum hctx_type type = hctx->type;
+
+ spin_lock(&ctx->list[type].lock);
__blk_mq_insert_request(hctx, rq, at_head);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
}
run:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 313f28b2d079..06213516efd0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -958,11 +958,12 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
struct flush_busy_ctx_data *flush_data = data;
struct blk_mq_hw_ctx *hctx = flush_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
- spin_lock(&ctx->lock);
- list_splice_tail_init(&ctx->rq_list, flush_data->list);
+ spin_lock(&ctx->list[type].lock);
+ list_splice_tail_init(&ctx->list[type].rq_list, flush_data->list);
sbitmap_clear_bit(sb, bitnr);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
return true;
}
@@ -992,15 +993,16 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
struct dispatch_rq_data *dispatch_data = data;
struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
- spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+ spin_lock(&ctx->list[type].lock);
+ if (!list_empty(&ctx->list[type].rq_list)) {
+ dispatch_data->rq = list_entry_rq(ctx->list[type].rq_list.next);
list_del_init(&dispatch_data->rq->queuelist);
- if (list_empty(&ctx->rq_list))
+ if (list_empty(&ctx->list[type].rq_list))
sbitmap_clear_bit(sb, bitnr);
}
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
return !dispatch_data->rq;
}
@@ -1608,15 +1610,16 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
bool at_head)
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
+ enum hctx_type type = hctx->type;
- lockdep_assert_held(&ctx->lock);
+ lockdep_assert_held(&ctx->list[type].lock);
trace_block_rq_insert(hctx->queue, rq);
if (at_head)
- list_add(&rq->queuelist, &ctx->rq_list);
+ list_add(&rq->queuelist, &ctx->list[type].rq_list);
else
- list_add_tail(&rq->queuelist, &ctx->rq_list);
+ list_add_tail(&rq->queuelist, &ctx->list[type].rq_list);
}
void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -1624,7 +1627,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
- lockdep_assert_held(&ctx->lock);
+ lockdep_assert_held(&ctx->list[hctx->type].lock);
__blk_mq_insert_req_list(hctx, rq, at_head);
blk_mq_hctx_mark_pending(hctx, ctx);
@@ -1651,6 +1654,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
{
struct request *rq;
+ enum hctx_type type = hctx->type;
/*
* preemption doesn't flush plug list, so it's possible ctx->cpu is
@@ -1661,10 +1665,10 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
trace_block_rq_insert(hctx->queue, rq);
}
- spin_lock(&ctx->lock);
- list_splice_tail_init(list, &ctx->rq_list);
+ spin_lock(&ctx->list[type].lock);
+ list_splice_tail_init(list, &ctx->list[type].rq_list);
blk_mq_hctx_mark_pending(hctx, ctx);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
}
static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -2200,16 +2204,18 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
LIST_HEAD(tmp);
+ enum hctx_type type;
hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
ctx = __blk_mq_get_ctx(hctx->queue, cpu);
+ type = hctx->type;
- spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- list_splice_init(&ctx->rq_list, &tmp);
+ spin_lock(&ctx->list[type].lock);
+ if (!list_empty(&ctx->list[type].rq_list)) {
+ list_splice_init(&ctx->list[type].rq_list, &tmp);
blk_mq_hctx_clear_pending(hctx, ctx);
}
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
if (list_empty(&tmp))
return 0;
@@ -2343,10 +2349,14 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
for_each_possible_cpu(i) {
struct blk_mq_ctx *__ctx = per_cpu_ptr(q->queue_ctx, i);
struct blk_mq_hw_ctx *hctx;
+ int k;
__ctx->cpu = i;
- spin_lock_init(&__ctx->lock);
- INIT_LIST_HEAD(&__ctx->rq_list);
+
+ for (k = HCTX_TYPE_DEFAULT; k < HCTX_MAX_TYPES; k++) {
+ spin_lock_init(&__ctx->list[k].lock);
+ INIT_LIST_HEAD(&__ctx->list[k].rq_list);
+ }
__ctx->queue = q;
/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d1ed096723fb..0973a91eb1dd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -12,14 +12,16 @@ struct blk_mq_ctxs {
struct blk_mq_ctx __percpu *queue_ctx;
};
+struct blk_mq_ctx_list {
+ spinlock_t lock;
+ struct list_head rq_list;
+} ____cacheline_aligned_in_smp;
+
/**
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
*/
struct blk_mq_ctx {
- struct {
- spinlock_t lock;
- struct list_head rq_list;
- } ____cacheline_aligned_in_smp;
+ struct blk_mq_ctx_list list[HCTX_MAX_TYPES];
unsigned int cpu;
unsigned short index_hw[HCTX_MAX_TYPES];
--
2.9.5
* Re: [PATCH V3 3/3] blk-mq: fix dispatch from sw queue
2018-12-17 12:08 ` [PATCH V3 3/3] blk-mq: fix dispatch from sw queue Ming Lei
@ 2018-12-17 12:55 ` Jens Axboe
2018-12-17 13:29 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 12:55 UTC
To: Ming Lei; +Cc: linux-block, Christoph Hellwig
On 12/17/18 5:08 AM, Ming Lei wrote:
> When a request is added to the rq list of a sw queue (ctx), the rq may
> come from a different type of hctx, especially after multiple queue
> mapping is introduced.
>
> So when dispatching a request from a sw queue via blk_mq_flush_busy_ctxs()
> or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx
> queue type can be dispatched to the current hctx in case a read queue or
> poll queue is enabled.
>
> This patch fixes the issue by introducing a per-queue-type list.
Looks good, just one comment:
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index d1ed096723fb..0973a91eb1dd 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -12,14 +12,16 @@ struct blk_mq_ctxs {
> struct blk_mq_ctx __percpu *queue_ctx;
> };
>
> +struct blk_mq_ctx_list {
> + spinlock_t lock;
> + struct list_head rq_list;
> +} ____cacheline_aligned_in_smp;
> +
> /**
> * struct blk_mq_ctx - State for a software queue facing the submitting CPUs
> */
> struct blk_mq_ctx {
> - struct {
> - spinlock_t lock;
> - struct list_head rq_list;
> - } ____cacheline_aligned_in_smp;
> + struct blk_mq_ctx_list list[HCTX_MAX_TYPES];
Let's not make that use 3 cachelines. There is no good reason to split
these across cachelines; if we have heavy traffic to multiple of these,
then we're not going very fast anyway. So just make it:
struct {
spinlock_t lock;
struct list_head rq_list[HCTX_MAX_TYPES];
} ____cacheline_aligned_in_smp;
--
Jens Axboe
* Re: [PATCH V3 0/4] blk-mq: queue mapping fix & improvement
2018-12-17 12:08 [PATCH V3 0/4] blk-mq: queue mapping fix & improvement Ming Lei
` (2 preceding siblings ...)
2018-12-17 12:08 ` [PATCH V3 3/3] blk-mq: fix dispatch from sw queue Ming Lei
@ 2018-12-17 12:56 ` Jens Axboe
3 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 12:56 UTC
To: Ming Lei; +Cc: linux-block, Christoph Hellwig
On 12/17/18 5:08 AM, Ming Lei wrote:
> Hi,
>
> The 1st patch fixes the allocation size for the queue mapping table.
>
> The 2nd patch exports hctx->type in debugfs, so that we can write a
> debugfs-based test for verifying whether the queue mapping is valid.
>
> The 3rd patch fixes dispatch from sw queue when there is either a
> read queue or a poll queue.
Applied 1-2, please respin 3/3 with the mentioned change.
--
Jens Axboe
* Re: [PATCH V3 3/3] blk-mq: fix dispatch from sw queue
2018-12-17 12:55 ` Jens Axboe
@ 2018-12-17 13:29 ` Jens Axboe
2018-12-17 15:43 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 13:29 UTC
To: Ming Lei; +Cc: linux-block, Christoph Hellwig
On 12/17/18 5:55 AM, Jens Axboe wrote:
> On 12/17/18 5:08 AM, Ming Lei wrote:
>> When a request is added to the rq list of a sw queue (ctx), the rq may
>> come from a different type of hctx, especially after multiple queue
>> mapping is introduced.
>>
>> So when dispatching a request from a sw queue via blk_mq_flush_busy_ctxs()
>> or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx
>> queue type can be dispatched to the current hctx in case a read queue or
>> poll queue is enabled.
>>
>> This patch fixes the issue by introducing a per-queue-type list.
>
> Looks good, just one comment:
>
>> diff --git a/block/blk-mq.h b/block/blk-mq.h
>> index d1ed096723fb..0973a91eb1dd 100644
>> --- a/block/blk-mq.h
>> +++ b/block/blk-mq.h
>> @@ -12,14 +12,16 @@ struct blk_mq_ctxs {
>> struct blk_mq_ctx __percpu *queue_ctx;
>> };
>>
>> +struct blk_mq_ctx_list {
>> + spinlock_t lock;
>> + struct list_head rq_list;
>> +} ____cacheline_aligned_in_smp;
>> +
>> /**
>> * struct blk_mq_ctx - State for a software queue facing the submitting CPUs
>> */
>> struct blk_mq_ctx {
>> - struct {
>> - spinlock_t lock;
>> - struct list_head rq_list;
>> - } ____cacheline_aligned_in_smp;
>> + struct blk_mq_ctx_list list[HCTX_MAX_TYPES];
>
> Let's not make that use 3 cachelines. There is no good reason to split
> these across cachelines; if we have heavy traffic to multiple of these,
> then we're not going very fast anyway. So just make it:
>
> struct {
> spinlock_t lock;
> struct list_head rq_list[HCTX_MAX_TYPES];
> } ____cacheline_aligned_in_smp;
I did it just to check; it turns out fine and is of course a smaller change.
Let me know.
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 1e12033be9ea..90d68760af08 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -652,36 +652,43 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
return 0;
}
-static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
- __acquires(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_lock(&ctx->lock);
- return seq_list_start(&ctx->rq_list, *pos);
-}
-
-static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- return seq_list_next(v, &ctx->rq_list, pos);
-}
+#define CTX_RQ_SEQ_OPS(name, type) \
+static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
+ __acquires(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_lock(&ctx->lock); \
+ return seq_list_start(&ctx->rq_lists[type], *pos); \
+} \
+ \
+static void *ctx_##name##_rq_list_next(struct seq_file *m, void *v, \
+ loff_t *pos) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ return seq_list_next(v, &ctx->rq_lists[type], pos); \
+} \
+ \
+static void ctx_##name##_rq_list_stop(struct seq_file *m, void *v) \
+ __releases(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_unlock(&ctx->lock); \
+} \
+ \
+static const struct seq_operations ctx_##name##_rq_list_seq_ops = { \
+ .start = ctx_##name##_rq_list_start, \
+ .next = ctx_##name##_rq_list_next, \
+ .stop = ctx_##name##_rq_list_stop, \
+ .show = blk_mq_debugfs_rq_show, \
+}
+
+CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
+CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
+CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
-static void ctx_rq_list_stop(struct seq_file *m, void *v)
- __releases(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_unlock(&ctx->lock);
-}
-
-static const struct seq_operations ctx_rq_list_seq_ops = {
- .start = ctx_rq_list_start,
- .next = ctx_rq_list_next,
- .stop = ctx_rq_list_stop,
- .show = blk_mq_debugfs_rq_show,
-};
static int ctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
@@ -819,7 +826,9 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
- {"rq_list", 0400, .seq_ops = &ctx_rq_list_seq_ops},
+ {"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
+ {"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
+ {"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
{"merged", 0600, ctx_merged_show, ctx_merged_write},
{"completed", 0600, ctx_completed_show, ctx_completed_write},
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1efd781fcdea..5af40bbf4fc6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -301,11 +301,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
* too much time checking for merges.
*/
static bool blk_mq_attempt_merge(struct request_queue *q,
+ struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio)
{
+ enum hctx_type type = hctx->type;
+
lockdep_assert_held(&ctx->lock);
- if (blk_mq_bio_list_merge(q, &ctx->rq_list, bio)) {
+ if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio)) {
ctx->rq_merged++;
return true;
}
@@ -319,17 +322,19 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
bool ret = false;
+ enum hctx_type type;
if (e && e->type->ops.bio_merge) {
blk_mq_put_ctx(ctx);
return e->type->ops.bio_merge(hctx, bio);
}
+ type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
- !list_empty_careful(&ctx->rq_list)) {
+ !list_empty_careful(&ctx->rq_lists[type])) {
/* default per sw-queue merge */
spin_lock(&ctx->lock);
- ret = blk_mq_attempt_merge(q, ctx, bio);
+ ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
spin_unlock(&ctx->lock);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 313f28b2d079..1546e88fe59c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -958,9 +958,10 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
struct flush_busy_ctx_data *flush_data = data;
struct blk_mq_hw_ctx *hctx = flush_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
spin_lock(&ctx->lock);
- list_splice_tail_init(&ctx->rq_list, flush_data->list);
+ list_splice_tail_init(&ctx->rq_lists[type], flush_data->list);
sbitmap_clear_bit(sb, bitnr);
spin_unlock(&ctx->lock);
return true;
@@ -992,12 +993,13 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
struct dispatch_rq_data *dispatch_data = data;
struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+ if (!list_empty(&ctx->rq_lists[type])) {
+ dispatch_data->rq = list_entry_rq(ctx->rq_lists[type].next);
list_del_init(&dispatch_data->rq->queuelist);
- if (list_empty(&ctx->rq_list))
+ if (list_empty(&ctx->rq_lists[type]))
sbitmap_clear_bit(sb, bitnr);
}
spin_unlock(&ctx->lock);
@@ -1608,15 +1610,16 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
bool at_head)
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
+ enum hctx_type type = hctx->type;
lockdep_assert_held(&ctx->lock);
trace_block_rq_insert(hctx->queue, rq);
if (at_head)
- list_add(&rq->queuelist, &ctx->rq_list);
+ list_add(&rq->queuelist, &ctx->rq_lists[type]);
else
- list_add_tail(&rq->queuelist, &ctx->rq_list);
+ list_add_tail(&rq->queuelist, &ctx->rq_lists[type]);
}
void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -1651,6 +1654,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
{
struct request *rq;
+ enum hctx_type type = hctx->type;
/*
* preemption doesn't flush plug list, so it's possible ctx->cpu is
@@ -1662,7 +1666,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
}
spin_lock(&ctx->lock);
- list_splice_tail_init(list, &ctx->rq_list);
+ list_splice_tail_init(list, &ctx->rq_lists[type]);
blk_mq_hctx_mark_pending(hctx, ctx);
spin_unlock(&ctx->lock);
}
@@ -2200,13 +2204,15 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
LIST_HEAD(tmp);
+ enum hctx_type type;
hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
ctx = __blk_mq_get_ctx(hctx->queue, cpu);
+ type = hctx->type;
spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- list_splice_init(&ctx->rq_list, &tmp);
+ if (!list_empty(&ctx->rq_lists[type])) {
+ list_splice_init(&ctx->rq_lists[type], &tmp);
blk_mq_hctx_clear_pending(hctx, ctx);
}
spin_unlock(&ctx->lock);
@@ -2343,10 +2349,14 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
for_each_possible_cpu(i) {
struct blk_mq_ctx *__ctx = per_cpu_ptr(q->queue_ctx, i);
struct blk_mq_hw_ctx *hctx;
+ int k;
__ctx->cpu = i;
+
spin_lock_init(&__ctx->lock);
- INIT_LIST_HEAD(&__ctx->rq_list);
+ for (k = HCTX_TYPE_DEFAULT; k < HCTX_MAX_TYPES; k++) {
+ INIT_LIST_HEAD(&__ctx->rq_lists[k]);
+ }
__ctx->queue = q;
/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d1ed096723fb..d943d46b0785 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -18,8 +18,8 @@ struct blk_mq_ctxs {
struct blk_mq_ctx {
struct {
spinlock_t lock;
- struct list_head rq_list;
- } ____cacheline_aligned_in_smp;
+ struct list_head rq_lists[HCTX_MAX_TYPES];
+ } ____cacheline_aligned_in_smp;
unsigned int cpu;
unsigned short index_hw[HCTX_MAX_TYPES];
--
Jens Axboe
* Re: [PATCH V3 3/3] blk-mq: fix dispatch from sw queue
2018-12-17 13:29 ` Jens Axboe
@ 2018-12-17 15:43 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 15:43 UTC
To: Ming Lei; +Cc: linux-block, Christoph Hellwig
On 12/17/18 6:29 AM, Jens Axboe wrote:
> On 12/17/18 5:55 AM, Jens Axboe wrote:
>> On 12/17/18 5:08 AM, Ming Lei wrote:
>>> When a request is added to the rq list of a sw queue (ctx), the rq may
>>> come from a different type of hctx, especially after multiple queue
>>> mapping is introduced.
>>>
>>> So when dispatching a request from a sw queue via blk_mq_flush_busy_ctxs()
>>> or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx
>>> queue type can be dispatched to the current hctx in case a read queue or
>>> poll queue is enabled.
>>>
>>> This patch fixes the issue by introducing a per-queue-type list.
>>
>> Looks good, just one comment:
>>
>>> diff --git a/block/blk-mq.h b/block/blk-mq.h
>>> index d1ed096723fb..0973a91eb1dd 100644
>>> --- a/block/blk-mq.h
>>> +++ b/block/blk-mq.h
>>> @@ -12,14 +12,16 @@ struct blk_mq_ctxs {
>>> struct blk_mq_ctx __percpu *queue_ctx;
>>> };
>>>
>>> +struct blk_mq_ctx_list {
>>> + spinlock_t lock;
>>> + struct list_head rq_list;
>>> +} ____cacheline_aligned_in_smp;
>>> +
>>> /**
>>> * struct blk_mq_ctx - State for a software queue facing the submitting CPUs
>>> */
>>> struct blk_mq_ctx {
>>> - struct {
>>> - spinlock_t lock;
>>> - struct list_head rq_list;
>>> - } ____cacheline_aligned_in_smp;
>>> + struct blk_mq_ctx_list list[HCTX_MAX_TYPES];
>>
>> Let's not make that use 3 cachelines. There is no good reason to split
>> these across cachelines; if we have heavy traffic to multiple of these,
>> then we're not going very fast anyway. So just make it:
>>
>> struct {
>> spinlock_t lock;
>> struct list_head rq_list[HCTX_MAX_TYPES];
>> } ____cacheline_aligned_in_smp;
>
> I did it just to check; it turns out fine and is of course a smaller change.
> Let me know.
Tests good for me, thanks for doing this work. Just a heads up that
I'm going to queue this up, attributed to you, with a note that I
changed the layout of that structure.
--
Jens Axboe