* [PATCH V2 0/4] blk-mq: queue mapping fix & improvement
@ 2018-12-17 10:42 Ming Lei
From: Ming Lei @ 2018-12-17 10:42 UTC
To: Jens Axboe; +Cc: linux-block, Ming Lei
Hi,
The 1st patch fixes the allocation size of the queue mapping table.
The 2nd patch fixes the shared queue mapping, which may cause an IO
hang when both write_queues and poll_queues are zero.
The 3rd patch fixes dispatch from a sw queue when there are dedicated
write or poll queues.
The 4th patch exports hctx->type in debugfs, so that we can write
debugfs-based tests for verifying that the queue mapping is valid.
V2:
- remove hctx->type export from sysfs
- take hch's patch to fix shared queue mapping
- add the patch 3
Christoph Hellwig (1):
blk-mq: fix shared queue mapping
Ming Lei (3):
blk-mq: fix allocation for queue mapping table
blk-mq: fix dispatch from sw queue
blk-mq: export hctx->type in debugfs instead of sysfs
block/blk-mq-debugfs.c | 85 ++++++++++++++++++++++++++++++++-----------------
block/blk-mq-sched.c | 23 ++++++++-----
block/blk-mq-sysfs.c | 17 ----------
block/blk-mq.c | 57 ++++++++++++++++++++-------------
block/blk-mq.h | 21 +++++++-----
drivers/nvme/host/pci.c | 6 +---
6 files changed, 119 insertions(+), 90 deletions(-)
--
2.9.5
* [PATCH V2 1/4] blk-mq: fix allocation for queue mapping table
From: Ming Lei @ 2018-12-17 10:42 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Jeff Moyer, Mike Snitzer,
Christoph Hellwig
The type of each element in the queue mapping table is 'unsigned int',
instead of 'struct blk_mq_queue_map', so fix the allocation to use the
element size.
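For illustration, here is a minimal userspace sketch of the sizing fix
(a hedged example, not the kernel code: the struct below mirrors the
kernel's struct blk_mq_queue_map, while the CPU count and calloc() stand
in for nr_cpu_ids and kcalloc_node()):

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct blk_mq_queue_map. */
struct blk_mq_queue_map {
	unsigned int *mq_map;		/* one entry per possible CPU */
	unsigned int nr_queues;
	unsigned int queue_offset;
};

int main(void)
{
	struct blk_mq_queue_map map;
	unsigned int nr_cpu_ids = 64;	/* hypothetical CPU count */

	/* Correct: allocate one 'unsigned int' per possible CPU. */
	map.mq_map = calloc(nr_cpu_ids, sizeof(map.mq_map[0]));
	if (!map.mq_map)
		return 1;

	/* The old code over-allocated by using the struct size instead. */
	printf("per-entry: %zu bytes, whole struct: %zu bytes\n",
	       sizeof(map.mq_map[0]), sizeof(struct blk_mq_queue_map));

	free(map.mq_map);
	return 0;
}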
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2d3a29eb58ca..313f28b2d079 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3019,7 +3019,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
ret = -ENOMEM;
for (i = 0; i < set->nr_maps; i++) {
set->map[i].mq_map = kcalloc_node(nr_cpu_ids,
- sizeof(struct blk_mq_queue_map),
+ sizeof(set->map[i].mq_map[0]),
GFP_KERNEL, set->numa_node);
if (!set->map[i].mq_map)
goto out_free_mq_map;
--
2.9.5
* [PATCH V2 2/4] blk-mq: fix shared queue mapping
From: Ming Lei @ 2018-12-17 10:42 UTC
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Jeff Moyer, Mike Snitzer,
Ming Lei
From: Christoph Hellwig <hch@lst.de>
Even when poll_queues is zero, nvme's mapping for HCTX_TYPE_POLL may
still be set up via blk_mq_map_queues(), which causes a different mapping
compared with HCTX_TYPE_DEFAULT's mapping built from managed IRQ affinity.
This mapping causes hctx->type to be overwritten in blk_mq_map_swqueue(),
and then the whole mapping may become broken; for example, the same ctx
can be mapped to different hctxs with the same hctx type. This bad mapping
has caused an IO hang in a simple dd test, as reported by Mike.
This patch sets map->nr_queues to zero explicitly if there are zero queues
for that queue type, and also maps to the correct hctx when .nr_queues of
the queue type is zero.
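As a rough standalone sketch of the resulting selection logic (hedged:
the helper name pick_hctx_type() is invented, and the real code lives in
blk_mq_map_queue()), a queue type is only chosen when its map actually
has queues, falling back to HCTX_TYPE_DEFAULT otherwise.

#include <stdio.h>

enum hctx_type {
	HCTX_TYPE_DEFAULT,
	HCTX_TYPE_READ,
	HCTX_TYPE_POLL,
	HCTX_MAX_TYPES,
};

/* Hedged distillation of blk_mq_map_queue()'s type choice after this
 * patch: never pick a type whose map has zero queues. */
static enum hctx_type pick_hctx_type(unsigned int nr_maps,
				     const unsigned int *nr_queues,
				     int want_poll, int is_read)
{
	if (want_poll && nr_maps > HCTX_TYPE_POLL &&
	    nr_queues[HCTX_TYPE_POLL])
		return HCTX_TYPE_POLL;
	if (is_read && nr_maps > HCTX_TYPE_READ &&
	    nr_queues[HCTX_TYPE_READ])
		return HCTX_TYPE_READ;
	return HCTX_TYPE_DEFAULT;
}

int main(void)
{
	/* Shared-map case: only the default type has queues. */
	unsigned int nr_queues[HCTX_MAX_TYPES] = { 8, 0, 0 };

	/* A high-priority read now lands on DEFAULT (0), not POLL. */
	printf("%d\n", pick_hctx_type(HCTX_MAX_TYPES, nr_queues, 1, 1));
	return 0;
}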
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
(don't handle zero .nr_queues map in blk_mq_map_swqueue())
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 3 +++
block/blk-mq.h | 11 +++++++----
drivers/nvme/host/pci.c | 6 +-----
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 313f28b2d079..e843f23843c8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2431,6 +2431,9 @@ static void blk_mq_map_swqueue(struct request_queue *q)
for (j = 0; j < set->nr_maps; j++) {
hctx = blk_mq_map_queue_type(q, j, i);
+ if (!set->map[j].nr_queues)
+ continue;
+
/*
* If the CPU is already set in the mask, then we've
* mapped this one already. This can happen if
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b63a0de8a07a..f50c73d559d7 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -105,12 +105,15 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
{
enum hctx_type type = HCTX_TYPE_DEFAULT;
- if (q->tag_set->nr_maps > HCTX_TYPE_POLL &&
- ((flags & REQ_HIPRI) && test_bit(QUEUE_FLAG_POLL, &q->queue_flags)))
+ if ((flags & REQ_HIPRI) &&
+ q->tag_set->nr_maps > HCTX_TYPE_POLL &&
+ q->tag_set->map[HCTX_TYPE_POLL].nr_queues &&
+ test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
type = HCTX_TYPE_POLL;
- else if (q->tag_set->nr_maps > HCTX_TYPE_READ &&
- ((flags & REQ_OP_MASK) == REQ_OP_READ))
+ else if (((flags & REQ_OP_MASK) == REQ_OP_READ) &&
+ q->tag_set->nr_maps > HCTX_TYPE_READ &&
+ q->tag_set->map[HCTX_TYPE_READ].nr_queues)
type = HCTX_TYPE_READ;
return blk_mq_map_queue_type(q, type, cpu);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fb9d8270f32c..698b350b38cf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -496,11 +496,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
map->nr_queues = dev->io_queues[i];
if (!map->nr_queues) {
BUG_ON(i == HCTX_TYPE_DEFAULT);
-
- /* shared set, resuse read set parameters */
- map->nr_queues = dev->io_queues[HCTX_TYPE_DEFAULT];
- qoff = 0;
- offset = queue_irq_offset(dev);
+ continue;
}
/*
--
2.9.5
* [PATCH V2 3/4] blk-mq: fix dispatch from sw queue
From: Ming Lei @ 2018-12-17 10:42 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Jeff Moyer, Mike Snitzer,
Christoph Hellwig
When a request is added to the rq list of a sw queue (ctx), the request
may come from a different hctx, now that multiple queue mapping has been
introduced.
So we have to put the request into a per-queue-type list inside the sw
queue; otherwise the request may be dispatched to the wrong hw queue.
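A minimal sketch of the idea (standalone types with invented names such
as sw_queue and rq_node, not the kernel structures): the sw queue keeps
one lock/list pair per hctx type, so dispatch for a given hw queue only
ever sees requests of its own type.

#include <pthread.h>
#include <stdio.h>

enum hctx_type {
	HCTX_TYPE_DEFAULT,
	HCTX_TYPE_READ,
	HCTX_TYPE_POLL,
	HCTX_MAX_TYPES,
};

struct rq_node {
	struct rq_node *next;
};

/* One lock/list pair per queue type, echoing the patch's
 * struct blk_mq_ctx_list. */
struct sw_queue {
	struct {
		pthread_mutex_t lock;
		struct rq_node *head;
	} list[HCTX_MAX_TYPES];
};

/* Insert a request into the list matching its hw queue's type, so
 * dispatch for that type never has to skip foreign requests. */
static void sw_queue_insert(struct sw_queue *ctx, enum hctx_type type,
			    struct rq_node *rq)
{
	pthread_mutex_lock(&ctx->list[type].lock);
	rq->next = ctx->list[type].head;
	ctx->list[type].head = rq;
	pthread_mutex_unlock(&ctx->list[type].lock);
}

int main(void)
{
	struct sw_queue ctx;
	struct rq_node rq = { NULL };
	int i;

	for (i = 0; i < HCTX_MAX_TYPES; i++) {
		pthread_mutex_init(&ctx.list[i].lock, NULL);
		ctx.list[i].head = NULL;
	}
	sw_queue_insert(&ctx, HCTX_TYPE_READ, &rq);
	printf("read list non-empty: %d\n",
	       ctx.list[HCTX_TYPE_READ].head != NULL);
	return 0;
}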
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 69 ++++++++++++++++++++++++++++----------------------
block/blk-mq-sched.c | 23 +++++++++++------
block/blk-mq.c | 52 ++++++++++++++++++++++---------------
block/blk-mq.h | 10 +++++---
4 files changed, 91 insertions(+), 63 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 2793e91bc7a4..7021d44cef6d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -637,36 +637,43 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
return 0;
}
-static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
- __acquires(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_lock(&ctx->lock);
- return seq_list_start(&ctx->rq_list, *pos);
-}
-
-static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- return seq_list_next(v, &ctx->rq_list, pos);
-}
+#define CTX_RQ_SEQ_OPS(name, type) \
+static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
+ __acquires(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_lock(&ctx->list[type].lock); \
+ return seq_list_start(&ctx->list[type].rq_list, *pos); \
+} \
+ \
+static void *ctx_##name##_rq_list_next(struct seq_file *m, void *v, \
+ loff_t *pos) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ return seq_list_next(v, &ctx->list[type].rq_list, pos); \
+} \
+ \
+static void ctx_##name##_rq_list_stop(struct seq_file *m, void *v) \
+ __releases(&ctx->lock) \
+{ \
+ struct blk_mq_ctx *ctx = m->private; \
+ \
+ spin_unlock(&ctx->list[type].lock); \
+} \
+ \
+static const struct seq_operations ctx_##name##_rq_list_seq_ops = { \
+ .start = ctx_##name##_rq_list_start, \
+ .next = ctx_##name##_rq_list_next, \
+ .stop = ctx_##name##_rq_list_stop, \
+ .show = blk_mq_debugfs_rq_show, \
+}
+
+CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
+CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
+CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
-static void ctx_rq_list_stop(struct seq_file *m, void *v)
- __releases(&ctx->lock)
-{
- struct blk_mq_ctx *ctx = m->private;
-
- spin_unlock(&ctx->lock);
-}
-
-static const struct seq_operations ctx_rq_list_seq_ops = {
- .start = ctx_rq_list_start,
- .next = ctx_rq_list_next,
- .stop = ctx_rq_list_stop,
- .show = blk_mq_debugfs_rq_show,
-};
static int ctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
@@ -803,7 +810,9 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
- {"rq_list", 0400, .seq_ops = &ctx_rq_list_seq_ops},
+ {"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
+ {"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
+ {"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
{"merged", 0600, ctx_merged_show, ctx_merged_write},
{"completed", 0600, ctx_completed_show, ctx_completed_write},
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 5b4d52d9cba2..09594947933e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -301,11 +301,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
* too much time checking for merges.
*/
static bool blk_mq_attempt_merge(struct request_queue *q,
+ struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio)
{
- lockdep_assert_held(&ctx->lock);
+ enum hctx_type type = hctx->type;
- if (blk_mq_bio_list_merge(q, &ctx->rq_list, bio)) {
+ lockdep_assert_held(&ctx->list[type].lock);
+
+ if (blk_mq_bio_list_merge(q, &ctx->list[type].rq_list, bio)) {
ctx->rq_merged++;
return true;
}
@@ -319,18 +322,20 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
bool ret = false;
+ enum hctx_type type;
if (e && e->type->ops.bio_merge) {
blk_mq_put_ctx(ctx);
return e->type->ops.bio_merge(hctx, bio);
}
+ type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
- !list_empty_careful(&ctx->rq_list)) {
+ !list_empty_careful(&ctx->list[type].rq_list)) {
/* default per sw-queue merge */
- spin_lock(&ctx->lock);
- ret = blk_mq_attempt_merge(q, ctx, bio);
- spin_unlock(&ctx->lock);
+ spin_lock(&ctx->list[type].lock);
+ ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
+ spin_unlock(&ctx->list[type].lock);
}
blk_mq_put_ctx(ctx);
@@ -392,9 +397,11 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
list_add(&rq->queuelist, &list);
e->type->ops.insert_requests(hctx, &list, at_head);
} else {
- spin_lock(&ctx->lock);
+ enum hctx_type type = hctx->type;
+
+ spin_lock(&ctx->list[type].lock);
__blk_mq_insert_request(hctx, rq, at_head);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
}
run:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e843f23843c8..27303951a752 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -958,11 +958,12 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
struct flush_busy_ctx_data *flush_data = data;
struct blk_mq_hw_ctx *hctx = flush_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
- spin_lock(&ctx->lock);
- list_splice_tail_init(&ctx->rq_list, flush_data->list);
+ spin_lock(&ctx->list[type].lock);
+ list_splice_tail_init(&ctx->list[type].rq_list, flush_data->list);
sbitmap_clear_bit(sb, bitnr);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
return true;
}
@@ -992,15 +993,16 @@ static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
struct dispatch_rq_data *dispatch_data = data;
struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+ enum hctx_type type = hctx->type;
- spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+ spin_lock(&ctx->list[type].lock);
+ if (!list_empty(&ctx->list[type].rq_list)) {
+ dispatch_data->rq = list_entry_rq(ctx->list[type].rq_list.next);
list_del_init(&dispatch_data->rq->queuelist);
- if (list_empty(&ctx->rq_list))
+ if (list_empty(&ctx->list[type].rq_list))
sbitmap_clear_bit(sb, bitnr);
}
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
return !dispatch_data->rq;
}
@@ -1608,15 +1610,16 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
bool at_head)
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
+ enum hctx_type type = hctx->type;
- lockdep_assert_held(&ctx->lock);
+ lockdep_assert_held(&ctx->list[type].lock);
trace_block_rq_insert(hctx->queue, rq);
if (at_head)
- list_add(&rq->queuelist, &ctx->rq_list);
+ list_add(&rq->queuelist, &ctx->list[type].rq_list);
else
- list_add_tail(&rq->queuelist, &ctx->rq_list);
+ list_add_tail(&rq->queuelist, &ctx->list[type].rq_list);
}
void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -1624,7 +1627,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
{
struct blk_mq_ctx *ctx = rq->mq_ctx;
- lockdep_assert_held(&ctx->lock);
+ lockdep_assert_held(&ctx->list[hctx->type].lock);
__blk_mq_insert_req_list(hctx, rq, at_head);
blk_mq_hctx_mark_pending(hctx, ctx);
@@ -1651,6 +1654,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
{
struct request *rq;
+ enum hctx_type type = hctx->type;
/*
* preemption doesn't flush plug list, so it's possible ctx->cpu is
@@ -1661,10 +1665,10 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
trace_block_rq_insert(hctx->queue, rq);
}
- spin_lock(&ctx->lock);
- list_splice_tail_init(list, &ctx->rq_list);
+ spin_lock(&ctx->list[type].lock);
+ list_splice_tail_init(list, &ctx->list[type].rq_list);
blk_mq_hctx_mark_pending(hctx, ctx);
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
}
static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -2200,16 +2204,18 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
LIST_HEAD(tmp);
+ enum hctx_type type;
hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
ctx = __blk_mq_get_ctx(hctx->queue, cpu);
+ type = hctx->type;
- spin_lock(&ctx->lock);
- if (!list_empty(&ctx->rq_list)) {
- list_splice_init(&ctx->rq_list, &tmp);
+ spin_lock(&ctx->list[type].lock);
+ if (!list_empty(&ctx->list[type].rq_list)) {
+ list_splice_init(&ctx->list[type].rq_list, &tmp);
blk_mq_hctx_clear_pending(hctx, ctx);
}
- spin_unlock(&ctx->lock);
+ spin_unlock(&ctx->list[type].lock);
if (list_empty(&tmp))
return 0;
@@ -2343,10 +2349,14 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
for_each_possible_cpu(i) {
struct blk_mq_ctx *__ctx = per_cpu_ptr(q->queue_ctx, i);
struct blk_mq_hw_ctx *hctx;
+ int k;
__ctx->cpu = i;
- spin_lock_init(&__ctx->lock);
- INIT_LIST_HEAD(&__ctx->rq_list);
+
+ for (k = HCTX_TYPE_DEFAULT; k < HCTX_MAX_TYPES; k++) {
+ spin_lock_init(&__ctx->list[k].lock);
+ INIT_LIST_HEAD(&__ctx->list[k].rq_list);
+ }
__ctx->queue = q;
/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index f50c73d559d7..39064144326e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -12,14 +12,16 @@ struct blk_mq_ctxs {
struct blk_mq_ctx __percpu *queue_ctx;
};
+struct blk_mq_ctx_list {
+ spinlock_t lock;
+ struct list_head rq_list;
+} ____cacheline_aligned_in_smp;
+
/**
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
*/
struct blk_mq_ctx {
- struct {
- spinlock_t lock;
- struct list_head rq_list;
- } ____cacheline_aligned_in_smp;
+ struct blk_mq_ctx_list list[HCTX_MAX_TYPES];
unsigned int cpu;
unsigned short index_hw[HCTX_MAX_TYPES];
--
2.9.5
* [PATCH V2 4/4] blk-mq: export hctx->type in debugfs instead of sysfs
From: Ming Lei @ 2018-12-17 10:42 UTC
To: Jens Axboe
Cc: linux-block, Ming Lei, Jeff Moyer, Mike Snitzer,
Christoph Hellwig
Currently we only export hctx->type via sysfs, and there is no such info
in the hctx entry under debugfs. We often rely on debugfs to diagnose
queue mapping issues, so add the support there.
Queue mapping becomes a bit more complicated now that multiple queue
mappings are supported; with this in place we may write blktests that use
blk-mq-debugfs to verify that the queue mapping is valid.
Given that it is not necessary to export hctx->type twice, remove the
export from sysfs.
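For example, a test could read the new attribute roughly like this
(a hedged sketch: the disk name nvme0n1 is an assumption, and debugfs
must be mounted at /sys/kernel/debug):

#include <stdio.h>

int main(void)
{
	/* Path layout: <debugfs>/block/<disk>/hctx<N>/type */
	const char *path = "/sys/kernel/debug/block/nvme0n1/hctx0/type";
	char buf[16] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;
	if (fgets(buf, sizeof(buf), f))
		printf("hctx0 type: %s", buf);	/* "default", "read" or "poll" */
	fclose(f);
	return 0;
}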
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-debugfs.c | 16 ++++++++++++++++
block/blk-mq-sysfs.c | 17 -----------------
2 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7021d44cef6d..159607be5158 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -447,6 +447,21 @@ static int hctx_busy_show(void *data, struct seq_file *m)
return 0;
}
+static const char *const hctx_types[] = {
+ [HCTX_TYPE_DEFAULT] = "default",
+ [HCTX_TYPE_READ] = "read",
+ [HCTX_TYPE_POLL] = "poll",
+};
+
+static int hctx_type_show(void *data, struct seq_file *m)
+{
+ struct blk_mq_hw_ctx *hctx = data;
+
+ BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
+ seq_printf(m, "%s\n", hctx_types[hctx->type]);
+ return 0;
+}
+
static int hctx_ctx_map_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
@@ -806,6 +821,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"run", 0600, hctx_run_show, hctx_run_write},
{"active", 0400, hctx_active_show},
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
+ {"type", 0400, hctx_type_show},
{},
};
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 9c2df137256a..3f9c3f4ac44c 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -173,18 +173,6 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
return ret;
}
-static const char *const hctx_types[] = {
- [HCTX_TYPE_DEFAULT] = "default",
- [HCTX_TYPE_READ] = "read",
- [HCTX_TYPE_POLL] = "poll",
-};
-
-static ssize_t blk_mq_hw_sysfs_type_show(struct blk_mq_hw_ctx *hctx, char *page)
-{
- BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
- return sprintf(page, "%s\n", hctx_types[hctx->type]);
-}
-
static struct attribute *default_ctx_attrs[] = {
NULL,
};
@@ -201,16 +189,11 @@ static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_cpus = {
.attr = {.name = "cpu_list", .mode = 0444 },
.show = blk_mq_hw_sysfs_cpus_show,
};
-static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_type = {
- .attr = {.name = "type", .mode = 0444 },
- .show = blk_mq_hw_sysfs_type_show,
-};
static struct attribute *default_hw_ctx_attrs[] = {
&blk_mq_hw_sysfs_nr_tags.attr,
&blk_mq_hw_sysfs_nr_reserved_tags.attr,
&blk_mq_hw_sysfs_cpus.attr,
- &blk_mq_hw_sysfs_type.attr,
NULL,
};
--
2.9.5
* Re: [PATCH V2 2/4] blk-mq: fix shared queue mapping
From: Christoph Hellwig @ 2018-12-17 10:57 UTC
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jeff Moyer,
Mike Snitzer
On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> Even when poll_queues is zero, nvme's mapping for HCTX_TYPE_POLL may
> still be set up via blk_mq_map_queues(), which causes a different mapping
> compared with HCTX_TYPE_DEFAULT's mapping built from managed IRQ affinity.
>
> This mapping causes hctx->type to be overwritten in blk_mq_map_swqueue(),
> and then the whole mapping may become broken; for example, the same ctx
> can be mapped to different hctxs with the same hctx type. This bad mapping
> has caused an IO hang in a simple dd test, as reported by Mike.
>
> This patch sets map->nr_queues to zero explicitly if there are zero queues
> for that queue type, and also maps to the correct hctx when .nr_queues of
> the queue type is zero.
>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
>
> (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
I actually think we should split this into three patches:
1) skip zero-queue maps in blk_mq_map_swqueue
2) skip zero-queue maps in blk_mq_map_queue
3) don't duplicate maps in nvme
sorry for misleading you with my quick WIP patch.
I can send patches for my two with proper changelogs, but
I'll leave blk_mq_map_swqueue to you as I don't fully understand
the rationale, even if it looks sensible to me.
* Re: [PATCH V2 4/4] blk-mq: export hctx->type in debugfs instead of sysfs
From: Christoph Hellwig @ 2018-12-17 10:57 UTC
To: Ming Lei
Cc: Jens Axboe, linux-block, Jeff Moyer, Mike Snitzer,
Christoph Hellwig
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH V2 2/4] blk-mq: fix shared queue mapping
From: Ming Lei @ 2018-12-17 11:06 UTC
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Jeff Moyer, Mike Snitzer
On Mon, Dec 17, 2018 at 11:57:25AM +0100, Christoph Hellwig wrote:
> On Mon, Dec 17, 2018 at 06:42:46PM +0800, Ming Lei wrote:
> > From: Christoph Hellwig <hch@lst.de>
> >
> > Even when poll_queues is zero, nvme's mapping for HCTX_TYPE_POLL may
> > still be set up via blk_mq_map_queues(), which causes a different mapping
> > compared with HCTX_TYPE_DEFAULT's mapping built from managed IRQ affinity.
> >
> > This mapping causes hctx->type to be overwritten in blk_mq_map_swqueue(),
> > and then the whole mapping may become broken; for example, the same ctx
> > can be mapped to different hctxs with the same hctx type. This bad mapping
> > has caused an IO hang in a simple dd test, as reported by Mike.
> >
> > This patch sets map->nr_queues to zero explicitly if there are zero queues
> > for that queue type, and also maps to the correct hctx when .nr_queues of
> > the queue type is zero.
> >
> > Cc: Jeff Moyer <jmoyer@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> >
> > (don't handle zero .nr_queues map in blk_mq_map_swqueue())
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
>
> I actually think we should split this into three patches:
>
> 1) skip zero-queue maps in blk_mq_map_swqueue
> 2) skip zero-queue maps in blk_mq_map_queue
> 3) don't duplicate maps in nvme
>
> sorry for misleading you with my quick WIP patch.
>
> I can send patches for my two with proper changelogs, but
OK, I just wanted to fix the issue quickly, :-)
> I'll leave blk_mq_map_swqueue to you as I don't fully understand
> the rationale, even if it looks sensible to me.
If maps won't be duplicated, skipping zero-queue maps in blk_mq_map_swqueue()
may not be necessary. However, it may make the code more robust/simpler.
Thanks,
Ming
* Re: [PATCH V2 3/4] blk-mq: fix dispatch from sw queue
From: Christoph Hellwig @ 2018-12-17 11:08 UTC
To: Ming Lei
Cc: Jens Axboe, linux-block, Jeff Moyer, Mike Snitzer,
Christoph Hellwig
On Mon, Dec 17, 2018 at 06:42:47PM +0800, Ming Lei wrote:
> When a request is added to the rq list of a sw queue (ctx), the request
> may come from a different hctx, now that multiple queue mapping has been
> introduced.
>
> So we have to put the request into a per-queue-type list inside the sw
> queue; otherwise the request may be dispatched to the wrong hw queue.
Hmm. For one I don't think splitting the lock makes sense even for
split lists. Second, do we really need different lists, or just take
the type into account when dispatching from rq_list? I'm not really
sure about that one, but a rationale for splitting might be worth
adding to the changelog if you want to stick to the split lists.
* Re: [PATCH V2 3/4] blk-mq: fix dispatch from sw queue
From: Ming Lei @ 2018-12-17 11:15 UTC
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Jeff Moyer, Mike Snitzer
On Mon, Dec 17, 2018 at 12:08:57PM +0100, Christoph Hellwig wrote:
>
>
> On Mon, Dec 17, 2018 at 06:42:47PM +0800, Ming Lei wrote:
> > When a request is added to the rq list of a sw queue (ctx), the request
> > may come from a different hctx, now that multiple queue mapping has been
> > introduced.
> >
> > So we have to put the request into a per-queue-type list inside the sw
> > queue; otherwise the request may be dispatched to the wrong hw queue.
>
> Hmm. For one I don't think splitting the lock makes sense even for
> split lists.
> Second, do we really need different lists, or just take
It depends on whether the driver/existing blk-mq code can work well if one
request is dispatched to the wrong hctx.
> the type into account when dispatching from rq_list? I'm not really
This way may be inefficient since the whole list has to be iterated just
to find all the requests aimed at one specific hctx.
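As a hypothetical illustration of that cost (invented names, not kernel
code): with one shared list, dispatch for a single hctx type has to scan
every request and skip the foreign ones.

#include <stdio.h>

/* Three requests of different types chained on one shared list. */
struct mixed_rq {
	int type;			/* hctx type of this request */
	struct mixed_rq *next;
};

int main(void)
{
	struct mixed_rq poll = { 2, NULL };
	struct mixed_rq read = { 1, &poll };
	struct mixed_rq def  = { 0, &read };
	struct mixed_rq *it;
	int n = 0;

	for (it = &def; it; it = it->next)	/* whole-list scan */
		if (it->type == 1)		/* keep only READ requests */
			n++;
	printf("read requests found after full scan: %d\n", n);
	return 0;
}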
Thanks,
Ming
* Re: [PATCH V2 3/4] blk-mq: fix dispatch from sw queue
From: Christoph Hellwig @ 2018-12-17 11:16 UTC
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, linux-block, Jeff Moyer,
Mike Snitzer
On Mon, Dec 17, 2018 at 07:15:21PM +0800, Ming Lei wrote:
> On Mon, Dec 17, 2018 at 12:08:57PM +0100, Christoph Hellwig wrote:
> >
> >
> > On Mon, Dec 17, 2018 at 06:42:47PM +0800, Ming Lei wrote:
> > > When a request is added to the rq list of a sw queue (ctx), the request
> > > may come from a different hctx, now that multiple queue mapping has been
> > > introduced.
> > >
> > > So we have to put the request into a per-queue-type list inside the sw
> > > queue; otherwise the request may be dispatched to the wrong hw queue.
> >
> > Hmm. For one I don't think splitting the lock makes sense even for
> > split lists.
> > Second, do we really need different lists, or just take
>
> It depends on whether the driver/existing blk-mq code can work well if one
> request is dispatched to the wrong hctx.
>
> > the type into account when dispatching from rq_list? I'm not really
>
> This way may be inefficient since the whole list has to be iterated just
> to find all the requests aimed at one specific hctx.
Then just update the changelog to make this clear, please!