* [PATCH v5 0/10] block: per-dispatch_queue flush machinery
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig
Hi,
As discussed recently, and as suggested by Christoph in particular, this
patchset implements per-dispatch_queue flush machinery, so that:

- the existing init_request and exit_request callbacks can cover the
flush request too, which fixes the buggy approach of initializing the
flush request's pdu by copying it from the original request

- flushing performance is improved in the multi hw-queue case: about
70% higher throughput is observed for sync writes over multi
dispatch-queue virtio-blk; see the commit log of patch 10/10 for
details.
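
For orientation, here is a condensed view of where the series ends up:
the flush state that currently lives directly in struct request_queue
moves into a small object instantiated once per hw queue (abridged from
the real definitions added in patches 6 and 10):

struct blk_flush_queue {
	unsigned int		flush_queue_delayed:1;
	unsigned int		flush_pending_idx:1;	/* toggled when a flush is issued */
	unsigned int		flush_running_idx:1;
	unsigned long		flush_pending_since;
	struct list_head	flush_queue[2];		/* double-buffered pending lists */
	struct list_head	flush_data_in_flight;
	struct request		*flush_rq;		/* preallocated flush request */
	spinlock_t		mq_flush_lock;
};

/* in struct blk_mq_hw_ctx, added by patch 10: */
	struct blk_flush_queue	*fq;	/* one flush queue per dispatch queue */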
This patchset can also be pulled from the tree below:
git://kernel.ubuntu.com/ming/linux.git v3.17-block-dev-flush_v5
V5:
- make the failure-path fix and cleanup patch the 1st one
- fix oops in case of a bio-based request queue (patch 6)
- pass 'node' to blk_alloc_flush_queue() instead of 'hctx' (patch 10)
- comment on init_request/exit_request about the flush request (patch 10)
V4:
- remove pdu copy from original request to flush request
- don't call blk_free_flush_queue for !q->mq_ops
V3:
- don't return a failure code from blk_alloc_flush_queue() to
avoid freeing an invalid buffer in case of allocation failure
- remove blk_init_flush() and blk_exit_flush()
- remove unnecessary WARN_ON() from blk_alloc_flush_queue()
V2:
- refactor blk_mq_init_hw_queues() and its pair; this also fixes the
failure path, so that the conversion to per-queue flush becomes simple
- allocate/initialize flush queue in blk_mq_init_hw_queues()
- add sync write tests on virtio-blk which is backed by SSD image
V1:
- commit log typo fix
- introduce blk_alloc_flush_queue() and its pair earlier, so
that patches 5 and 8 become easier to review
block/blk-core.c | 13 ++--
block/blk-flush.c | 141 +++++++++++++++++++++++++------------
block/blk-mq.c | 180 +++++++++++++++++++++++++++---------------------
block/blk-mq.h | 1 -
block/blk-sysfs.c | 4 +-
block/blk.h | 35 +++++++++-
include/linux/blk-mq.h | 6 ++
include/linux/blkdev.h | 10 +--
8 files changed, 246 insertions(+), 144 deletions(-)
thanks,
--
Ming Lei
* [PATCH v5 01/10] blk-mq: handle failure path for initializing hctx
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
Failure to initialize a single hctx isn't handled, so this patch
introduces blk_mq_init_hctx() and its teardown counterpart,
blk_mq_exit_hctx(), to handle the failure explicitly. It also makes
the code cleaner.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-mq.c | 114 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 69 insertions(+), 45 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 03f5c79..d8c7f90 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1509,6 +1509,20 @@ static int blk_mq_hctx_notify(void *data, unsigned long action,
return NOTIFY_OK;
}
+static void blk_mq_exit_hctx(struct request_queue *q,
+ struct blk_mq_tag_set *set,
+ struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
+{
+ blk_mq_tag_idle(hctx);
+
+ if (set->ops->exit_hctx)
+ set->ops->exit_hctx(hctx, hctx_idx);
+
+ blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+ kfree(hctx->ctxs);
+ blk_mq_free_bitmap(&hctx->ctx_map);
+}
+
static void blk_mq_exit_hw_queues(struct request_queue *q,
struct blk_mq_tag_set *set, int nr_queue)
{
@@ -1518,17 +1532,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
queue_for_each_hw_ctx(q, hctx, i) {
if (i == nr_queue)
break;
-
- blk_mq_tag_idle(hctx);
-
- if (set->ops->exit_hctx)
- set->ops->exit_hctx(hctx, i);
-
- blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
- kfree(hctx->ctxs);
- blk_mq_free_bitmap(&hctx->ctx_map);
+ blk_mq_exit_hctx(q, set, hctx, i);
}
-
}
static void blk_mq_free_hw_queues(struct request_queue *q,
@@ -1543,53 +1548,72 @@ static void blk_mq_free_hw_queues(struct request_queue *q,
}
}
-static int blk_mq_init_hw_queues(struct request_queue *q,
- struct blk_mq_tag_set *set)
+static int blk_mq_init_hctx(struct request_queue *q,
+ struct blk_mq_tag_set *set,
+ struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
{
- struct blk_mq_hw_ctx *hctx;
- unsigned int i;
+ int node;
+
+ node = hctx->numa_node;
+ if (node == NUMA_NO_NODE)
+ node = hctx->numa_node = set->numa_node;
+
+ INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
+ INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
+ spin_lock_init(&hctx->lock);
+ INIT_LIST_HEAD(&hctx->dispatch);
+ hctx->queue = q;
+ hctx->queue_num = hctx_idx;
+ hctx->flags = set->flags;
+ hctx->cmd_size = set->cmd_size;
+
+ blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
+ blk_mq_hctx_notify, hctx);
+ blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
+
+ hctx->tags = set->tags[hctx_idx];
/*
- * Initialize hardware queues
+ * Allocate space for all possible cpus to avoid allocation at
+ * runtime
*/
- queue_for_each_hw_ctx(q, hctx, i) {
- int node;
+ hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
+ GFP_KERNEL, node);
+ if (!hctx->ctxs)
+ goto unregister_cpu_notifier;
- node = hctx->numa_node;
- if (node == NUMA_NO_NODE)
- node = hctx->numa_node = set->numa_node;
+ if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
+ goto free_ctxs;
- INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
- INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
- spin_lock_init(&hctx->lock);
- INIT_LIST_HEAD(&hctx->dispatch);
- hctx->queue = q;
- hctx->queue_num = i;
- hctx->flags = set->flags;
- hctx->cmd_size = set->cmd_size;
+ hctx->nr_ctx = 0;
- blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
- blk_mq_hctx_notify, hctx);
- blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
+ if (set->ops->init_hctx &&
+ set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
+ goto free_bitmap;
- hctx->tags = set->tags[i];
+ return 0;
- /*
- * Allocate space for all possible cpus to avoid allocation at
- * runtime
- */
- hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
- GFP_KERNEL, node);
- if (!hctx->ctxs)
- break;
+ free_bitmap:
+ blk_mq_free_bitmap(&hctx->ctx_map);
+ free_ctxs:
+ kfree(hctx->ctxs);
+ unregister_cpu_notifier:
+ blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
- if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
- break;
+ return -1;
+}
- hctx->nr_ctx = 0;
+static int blk_mq_init_hw_queues(struct request_queue *q,
+ struct blk_mq_tag_set *set)
+{
+ struct blk_mq_hw_ctx *hctx;
+ unsigned int i;
- if (set->ops->init_hctx &&
- set->ops->init_hctx(hctx, set->driver_data, i))
+ /*
+ * Initialize hardware queues
+ */
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (blk_mq_init_hctx(q, set, hctx, i))
break;
}
--
1.7.9.5
* [PATCH v5 02/10] blk-mq: allocate flush_rq in blk_mq_init_flush()
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
It is reasonable to allocate the flush request in blk_mq_init_flush().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-flush.c | 11 ++++++++++-
block/blk-mq.c | 16 ++++++----------
block/blk-mq.h | 2 +-
3 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index c8e2576..55028a7 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -472,7 +472,16 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-void blk_mq_init_flush(struct request_queue *q)
+int blk_mq_init_flush(struct request_queue *q)
{
+ struct blk_mq_tag_set *set = q->tag_set;
+
spin_lock_init(&q->mq_flush_lock);
+
+ q->flush_rq = kzalloc(round_up(sizeof(struct request) +
+ set->cmd_size, cache_line_size()),
+ GFP_KERNEL);
+ if (!q->flush_rq)
+ return -ENOMEM;
+ return 0;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8c7f90..80c2cd2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1849,17 +1849,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
if (set->ops->complete)
blk_queue_softirq_done(q, set->ops->complete);
- blk_mq_init_flush(q);
blk_mq_init_cpu_queues(q, set->nr_hw_queues);
- q->flush_rq = kzalloc(round_up(sizeof(struct request) +
- set->cmd_size, cache_line_size()),
- GFP_KERNEL);
- if (!q->flush_rq)
- goto err_hw;
-
if (blk_mq_init_hw_queues(q, set))
- goto err_flush_rq;
+ goto err_hw;
mutex_lock(&all_q_mutex);
list_add_tail(&q->all_q_node, &all_q_list);
@@ -1867,12 +1860,15 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
blk_mq_add_queue_tag_set(set, q);
+ if (blk_mq_init_flush(q))
+ goto err_hw_queues;
+
blk_mq_map_swqueue(q);
return q;
-err_flush_rq:
- kfree(q->flush_rq);
+err_hw_queues:
+ blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
err_hw:
blk_cleanup_queue(q);
err_hctxs:
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a3c613a..ecac69c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,7 @@ struct blk_mq_ctx {
void __blk_mq_complete_request(struct request *rq);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_init_flush(struct request_queue *q);
+int blk_mq_init_flush(struct request_queue *q);
void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q);
void blk_mq_clone_flush_request(struct request *flush_rq,
--
1.7.9.5
* [PATCH v5 03/10] block: introduce blk_init_flush and its pair
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
These two temporary functions are introduced to hold flush
initialization and de-initialization, so that the 'flush queue' can be
introduced more easily in the following patches. Once the 'flush queue'
and its allocation/free functions are ready, the helpers will be
removed for the sake of code readability.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 5 ++---
block/blk-flush.c | 19 ++++++++++++++++++-
block/blk-mq.c | 2 +-
block/blk-mq.h | 1 -
block/blk-sysfs.c | 4 ++--
block/blk.h | 3 +++
6 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 5bd9a04..215490a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -705,8 +705,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
if (!q)
return NULL;
- q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
- if (!q->flush_rq)
+ if (blk_init_flush(q))
return NULL;
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@@ -742,7 +741,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
return q;
fail:
- kfree(q->flush_rq);
+ blk_exit_flush(q);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 55028a7..c72ab32 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -472,7 +472,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-int blk_mq_init_flush(struct request_queue *q)
+static int blk_mq_init_flush(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
@@ -485,3 +485,20 @@ int blk_mq_init_flush(struct request_queue *q)
return -ENOMEM;
return 0;
}
+
+int blk_init_flush(struct request_queue *q)
+{
+ if (q->mq_ops)
+ return blk_mq_init_flush(q);
+
+ q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
+ if (!q->flush_rq)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void blk_exit_flush(struct request_queue *q)
+{
+ kfree(q->flush_rq);
+}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 80c2cd2..5acdec4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1860,7 +1860,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
blk_mq_add_queue_tag_set(set, q);
- if (blk_mq_init_flush(q))
+ if (blk_init_flush(q))
goto err_hw_queues;
blk_mq_map_swqueue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ecac69c..d567d52 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ struct blk_mq_ctx {
void __blk_mq_complete_request(struct request *rq);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-int blk_mq_init_flush(struct request_queue *q);
void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q);
void blk_mq_clone_flush_request(struct request *flush_rq,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 17f5c84..9490759 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,11 +517,11 @@ static void blk_release_queue(struct kobject *kobj)
if (q->queue_tags)
__blk_queue_free_tags(q);
+ blk_exit_flush(q);
+
if (q->mq_ops)
blk_mq_free_queue(q);
- kfree(q->flush_rq);
-
blk_trace_shutdown(q);
bdi_destroy(&q->backing_dev_info);
diff --git a/block/blk.h b/block/blk.h
index e515a28..c6fa3d4 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -22,6 +22,9 @@ static inline void __blk_get_queue(struct request_queue *q)
kobject_get(&q->kobj);
}
+int blk_init_flush(struct request_queue *q);
+void blk_exit_flush(struct request_queue *q);
+
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
void blk_exit_rl(struct request_list *rl);
--
1.7.9.5
* [PATCH v5 04/10] block: move flush initialization to blk_init_flush
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
These fields are always used with the flush request, so
initialize them together.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 3 ---
block/blk-flush.c | 4 ++++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 215490a..bfb44ba 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -600,9 +600,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
#ifdef CONFIG_BLK_CGROUP
INIT_LIST_HEAD(&q->blkg_list);
#endif
- INIT_LIST_HEAD(&q->flush_queue[0]);
- INIT_LIST_HEAD(&q->flush_queue[1]);
- INIT_LIST_HEAD(&q->flush_data_in_flight);
INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
kobject_init(&q->kobj, &blk_queue_ktype);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index c72ab32..a49ffbd 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -488,6 +488,10 @@ static int blk_mq_init_flush(struct request_queue *q)
int blk_init_flush(struct request_queue *q)
{
+ INIT_LIST_HEAD(&q->flush_queue[0]);
+ INIT_LIST_HEAD(&q->flush_queue[1]);
+ INIT_LIST_HEAD(&q->flush_data_in_flight);
+
if (q->mq_ops)
return blk_mq_init_flush(q);
--
1.7.9.5
* [PATCH v5 05/10] block: avoid using q->flush_rq directly
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
This patch uses a local variable to access the flush request, so that
converting to the per-queue flush machinery becomes a bit easier.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-flush.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index a49ffbd..caf44756 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -223,7 +223,7 @@ static void flush_end_io(struct request *flush_rq, int error)
if (q->mq_ops) {
spin_lock_irqsave(&q->mq_flush_lock, flags);
- q->flush_rq->tag = -1;
+ flush_rq->tag = -1;
}
running = &q->flush_queue[q->flush_running_idx];
@@ -281,6 +281,7 @@ static bool blk_kick_flush(struct request_queue *q)
struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
struct request *first_rq =
list_first_entry(pending, struct request, flush.list);
+ struct request *flush_rq = q->flush_rq;
/* C1 described at the top of this file */
if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending))
@@ -298,16 +299,16 @@ static bool blk_kick_flush(struct request_queue *q)
*/
q->flush_pending_idx ^= 1;
- blk_rq_init(q, q->flush_rq);
+ blk_rq_init(q, flush_rq);
if (q->mq_ops)
- blk_mq_clone_flush_request(q->flush_rq, first_rq);
+ blk_mq_clone_flush_request(flush_rq, first_rq);
- q->flush_rq->cmd_type = REQ_TYPE_FS;
- q->flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
- q->flush_rq->rq_disk = first_rq->rq_disk;
- q->flush_rq->end_io = flush_end_io;
+ flush_rq->cmd_type = REQ_TYPE_FS;
+ flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
+ flush_rq->rq_disk = first_rq->rq_disk;
+ flush_rq->end_io = flush_end_io;
- return blk_flush_queue_rq(q->flush_rq, false);
+ return blk_flush_queue_rq(flush_rq, false);
}
static void flush_data_end_io(struct request *rq, int error)
--
1.7.9.5
* [PATCH v5 06/10] block: introduce blk_flush_queue to drive flush machinery
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
This patch introduces 'struct blk_flush_queue' and puts all
flush machinery related fields into this structure, so that:

- flush implementation details aren't exposed to drivers
- it is easy to convert to per-dispatch-queue flush machinery

This patch is basically a mechanical replacement.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 4 +-
block/blk-flush.c | 109 ++++++++++++++++++++++++++++++------------------
block/blk-mq.c | 10 +++--
block/blk.h | 22 +++++++++-
include/linux/blkdev.h | 10 +----
5 files changed, 99 insertions(+), 56 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index bfb44ba..e2a3f0d4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -390,11 +390,13 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
* be drained. Check all the queues and counters.
*/
if (drain_all) {
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
drain |= !list_empty(&q->queue_head);
for (i = 0; i < 2; i++) {
drain |= q->nr_rqs[i];
drain |= q->in_flight[i];
- drain |= !list_empty(&q->flush_queue[i]);
+ if (fq)
+ drain |= !list_empty(&fq->flush_queue[i]);
}
}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index caf44756..b01a86d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -28,7 +28,7 @@
*
* The actual execution of flush is double buffered. Whenever a request
* needs to execute PRE or POSTFLUSH, it queues at
- * q->flush_queue[q->flush_pending_idx]. Once certain criteria are met, a
+ * fq->flush_queue[fq->flush_pending_idx]. Once certain criteria are met, a
* flush is issued and the pending_idx is toggled. When the flush
* completes, all the requests which were pending are proceeded to the next
* step. This allows arbitrary merging of different types of FLUSH/FUA
@@ -155,7 +155,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
* completion and trigger the next step.
*
* CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
*
* RETURNS:
* %true if requests were added to the dispatch queue, %false otherwise.
@@ -164,7 +164,8 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
int error)
{
struct request_queue *q = rq->q;
- struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
bool queued = false, kicked;
BUG_ON(rq->flush.seq & seq);
@@ -180,12 +181,12 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
case REQ_FSEQ_POSTFLUSH:
/* queue for flush */
if (list_empty(pending))
- q->flush_pending_since = jiffies;
+ fq->flush_pending_since = jiffies;
list_move_tail(&rq->flush.list, pending);
break;
case REQ_FSEQ_DATA:
- list_move_tail(&rq->flush.list, &q->flush_data_in_flight);
+ list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
queued = blk_flush_queue_rq(rq, true);
break;
@@ -220,17 +221,18 @@ static void flush_end_io(struct request *flush_rq, int error)
bool queued = false;
struct request *rq, *n;
unsigned long flags = 0;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
if (q->mq_ops) {
- spin_lock_irqsave(&q->mq_flush_lock, flags);
+ spin_lock_irqsave(&fq->mq_flush_lock, flags);
flush_rq->tag = -1;
}
- running = &q->flush_queue[q->flush_running_idx];
- BUG_ON(q->flush_pending_idx == q->flush_running_idx);
+ running = &fq->flush_queue[fq->flush_running_idx];
+ BUG_ON(fq->flush_pending_idx == fq->flush_running_idx);
/* account completion of the flush request */
- q->flush_running_idx ^= 1;
+ fq->flush_running_idx ^= 1;
if (!q->mq_ops)
elv_completed_request(q, flush_rq);
@@ -254,13 +256,13 @@ static void flush_end_io(struct request *flush_rq, int error)
* directly into request_fn may confuse the driver. Always use
* kblockd.
*/
- if (queued || q->flush_queue_delayed) {
+ if (queued || fq->flush_queue_delayed) {
WARN_ON(q->mq_ops);
blk_run_queue_async(q);
}
- q->flush_queue_delayed = 0;
+ fq->flush_queue_delayed = 0;
if (q->mq_ops)
- spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+ spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
/**
@@ -271,33 +273,34 @@ static void flush_end_io(struct request *flush_rq, int error)
* Please read the comment at the top of this file for more info.
*
* CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
*
* RETURNS:
* %true if flush was issued, %false otherwise.
*/
static bool blk_kick_flush(struct request_queue *q)
{
- struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
struct request *first_rq =
list_first_entry(pending, struct request, flush.list);
- struct request *flush_rq = q->flush_rq;
+ struct request *flush_rq = fq->flush_rq;
/* C1 described at the top of this file */
- if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending))
+ if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
return false;
/* C2 and C3 */
- if (!list_empty(&q->flush_data_in_flight) &&
+ if (!list_empty(&fq->flush_data_in_flight) &&
time_before(jiffies,
- q->flush_pending_since + FLUSH_PENDING_TIMEOUT))
+ fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
return false;
/*
* Issue flush and toggle pending_idx. This makes pending_idx
* different from running_idx, which means flush is in flight.
*/
- q->flush_pending_idx ^= 1;
+ fq->flush_pending_idx ^= 1;
blk_rq_init(q, flush_rq);
if (q->mq_ops)
@@ -329,6 +332,7 @@ static void mq_flush_data_end_io(struct request *rq, int error)
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
unsigned long flags;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
ctx = rq->mq_ctx;
hctx = q->mq_ops->map_queue(q, ctx->cpu);
@@ -337,10 +341,10 @@ static void mq_flush_data_end_io(struct request *rq, int error)
* After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io().
*/
- spin_lock_irqsave(&q->mq_flush_lock, flags);
+ spin_lock_irqsave(&fq->mq_flush_lock, flags);
if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
blk_mq_run_hw_queue(hctx, true);
- spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+ spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
/**
@@ -408,11 +412,13 @@ void blk_insert_flush(struct request *rq)
rq->cmd_flags |= REQ_FLUSH_SEQ;
rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
if (q->mq_ops) {
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
+
rq->end_io = mq_flush_data_end_io;
- spin_lock_irq(&q->mq_flush_lock);
+ spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
- spin_unlock_irq(&q->mq_flush_lock);
+ spin_unlock_irq(&fq->mq_flush_lock);
return;
}
rq->end_io = flush_data_end_io;
@@ -473,31 +479,52 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-static int blk_mq_init_flush(struct request_queue *q)
+static struct blk_flush_queue *blk_alloc_flush_queue(
+ struct request_queue *q)
{
- struct blk_mq_tag_set *set = q->tag_set;
+ struct blk_flush_queue *fq;
+ int rq_sz = sizeof(struct request);
- spin_lock_init(&q->mq_flush_lock);
+ fq = kzalloc(sizeof(*fq), GFP_KERNEL);
+ if (!fq)
+ goto fail;
- q->flush_rq = kzalloc(round_up(sizeof(struct request) +
- set->cmd_size, cache_line_size()),
- GFP_KERNEL);
- if (!q->flush_rq)
- return -ENOMEM;
- return 0;
+ if (q->mq_ops) {
+ spin_lock_init(&fq->mq_flush_lock);
+ rq_sz = round_up(rq_sz + q->tag_set->cmd_size,
+ cache_line_size());
+ }
+
+ fq->flush_rq = kzalloc(rq_sz, GFP_KERNEL);
+ if (!fq->flush_rq)
+ goto fail_rq;
+
+ INIT_LIST_HEAD(&fq->flush_queue[0]);
+ INIT_LIST_HEAD(&fq->flush_queue[1]);
+ INIT_LIST_HEAD(&fq->flush_data_in_flight);
+
+ return fq;
+
+ fail_rq:
+ kfree(fq);
+ fail:
+ return NULL;
}
-int blk_init_flush(struct request_queue *q)
+static void blk_free_flush_queue(struct blk_flush_queue *fq)
{
- INIT_LIST_HEAD(&q->flush_queue[0]);
- INIT_LIST_HEAD(&q->flush_queue[1]);
- INIT_LIST_HEAD(&q->flush_data_in_flight);
+ /* bio based request queues have no flush queue */
+ if (!fq)
+ return;
- if (q->mq_ops)
- return blk_mq_init_flush(q);
+ kfree(fq->flush_rq);
+ kfree(fq);
+}
- q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
- if (!q->flush_rq)
+int blk_init_flush(struct request_queue *q)
+{
+ q->fq = blk_alloc_flush_queue(q);
+ if (!q->fq)
return -ENOMEM;
return 0;
@@ -505,5 +532,5 @@ int blk_init_flush(struct request_queue *q)
void blk_exit_flush(struct request_queue *q)
{
- kfree(q->flush_rq);
+ blk_free_flush_queue(q->fq);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5acdec4..2b9ab09 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -508,20 +508,22 @@ void blk_mq_kick_requeue_list(struct request_queue *q)
}
EXPORT_SYMBOL(blk_mq_kick_requeue_list);
-static inline bool is_flush_request(struct request *rq, unsigned int tag)
+static inline bool is_flush_request(struct request *rq,
+ struct blk_flush_queue *fq, unsigned int tag)
{
return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
- rq->q->flush_rq->tag == tag);
+ fq->flush_rq->tag == tag);
}
struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{
struct request *rq = tags->rqs[tag];
+ struct blk_flush_queue *fq = blk_get_flush_queue(rq->q);
- if (!is_flush_request(rq, tag))
+ if (!is_flush_request(rq, fq, tag))
return rq;
- return rq->q->flush_rq;
+ return fq->flush_rq;
}
EXPORT_SYMBOL(blk_mq_tag_to_rq);
diff --git a/block/blk.h b/block/blk.h
index c6fa3d4..833c4ac 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -12,11 +12,28 @@
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
+struct blk_flush_queue {
+ unsigned int flush_queue_delayed:1;
+ unsigned int flush_pending_idx:1;
+ unsigned int flush_running_idx:1;
+ unsigned long flush_pending_since;
+ struct list_head flush_queue[2];
+ struct list_head flush_data_in_flight;
+ struct request *flush_rq;
+ spinlock_t mq_flush_lock;
+};
+
extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype;
extern struct ida blk_queue_ida;
+static inline struct blk_flush_queue *blk_get_flush_queue(
+ struct request_queue *q)
+{
+ return q->fq;
+}
+
static inline void __blk_get_queue(struct request_queue *q)
{
kobject_get(&q->kobj);
@@ -89,6 +106,7 @@ void blk_insert_flush(struct request *rq);
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
while (1) {
if (!list_empty(&q->queue_head)) {
@@ -111,9 +129,9 @@ static inline struct request *__elv_next_request(struct request_queue *q)
* should be restarted later. Please see flush_end_io() for
* details.
*/
- if (q->flush_pending_idx != q->flush_running_idx &&
+ if (fq->flush_pending_idx != fq->flush_running_idx &&
!queue_flush_queueable(q)) {
- q->flush_queue_delayed = 1;
+ fq->flush_queue_delayed = 1;
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b0cbe1a..ab28dd4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -36,6 +36,7 @@ struct request;
struct sg_io_hdr;
struct bsg_job;
struct blkcg_gq;
+struct blk_flush_queue;
#define BLKDEV_MIN_RQ 4
#define BLKDEV_MAX_RQ 128 /* Default maximum */
@@ -455,14 +456,7 @@ struct request_queue {
*/
unsigned int flush_flags;
unsigned int flush_not_queueable:1;
- unsigned int flush_queue_delayed:1;
- unsigned int flush_pending_idx:1;
- unsigned int flush_running_idx:1;
- unsigned long flush_pending_since;
- struct list_head flush_queue[2];
- struct list_head flush_data_in_flight;
- struct request *flush_rq;
- spinlock_t mq_flush_lock;
+ struct blk_flush_queue *fq;
struct list_head requeue_list;
spinlock_t requeue_lock;
--
1.7.9.5
* [PATCH v5 07/10] block: remove blk_init_flush() and its pair
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
The mission of the two helpers is now over, so just call
blk_alloc_flush_queue() and blk_free_flush_queue() directly.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 5 +++--
block/blk-flush.c | 19 ++-----------------
block/blk-mq.c | 3 ++-
block/blk-sysfs.c | 2 +-
block/blk.h | 4 ++--
5 files changed, 10 insertions(+), 23 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index e2a3f0d4..10555fb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -704,7 +704,8 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
if (!q)
return NULL;
- if (blk_init_flush(q))
+ q->fq = blk_alloc_flush_queue(q);
+ if (!q->fq)
return NULL;
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@@ -740,7 +741,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
return q;
fail:
- blk_exit_flush(q);
+ blk_free_flush_queue(q->fq);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index b01a86d..d66cbf2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -479,8 +479,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-static struct blk_flush_queue *blk_alloc_flush_queue(
- struct request_queue *q)
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q)
{
struct blk_flush_queue *fq;
int rq_sz = sizeof(struct request);
@@ -511,7 +510,7 @@ static struct blk_flush_queue *blk_alloc_flush_queue(
return NULL;
}
-static void blk_free_flush_queue(struct blk_flush_queue *fq)
+void blk_free_flush_queue(struct blk_flush_queue *fq)
{
/* bio based request queues have no flush queue */
if (!fq)
@@ -520,17 +519,3 @@ static void blk_free_flush_queue(struct blk_flush_queue *fq)
kfree(fq->flush_rq);
kfree(fq);
}
-
-int blk_init_flush(struct request_queue *q)
-{
- q->fq = blk_alloc_flush_queue(q);
- if (!q->fq)
- return -ENOMEM;
-
- return 0;
-}
-
-void blk_exit_flush(struct request_queue *q)
-{
- blk_free_flush_queue(q->fq);
-}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2b9ab09..64ed7ac 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1862,7 +1862,8 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
blk_mq_add_queue_tag_set(set, q);
- if (blk_init_flush(q))
+ q->fq = blk_alloc_flush_queue(q);
+ if (!q->fq)
goto err_hw_queues;
blk_mq_map_swqueue(q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9490759..718cffc4c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,7 +517,7 @@ static void blk_release_queue(struct kobject *kobj)
if (q->queue_tags)
__blk_queue_free_tags(q);
- blk_exit_flush(q);
+ blk_free_flush_queue(q->fq);
if (q->mq_ops)
blk_mq_free_queue(q);
diff --git a/block/blk.h b/block/blk.h
index 833c4ac..9eaa6e9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -39,8 +39,8 @@ static inline void __blk_get_queue(struct request_queue *q)
kobject_get(&q->kobj);
}
-int blk_init_flush(struct request_queue *q);
-void blk_exit_flush(struct request_queue *q);
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q);
+void blk_free_flush_queue(struct blk_flush_queue *fq);
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
--
1.7.9.5
* [PATCH v5 08/10] block: flush: avoid figuring out flush queue unnecessarily
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
Figure out the flush queue only once, at the entry points that kick off
the flush machinery and in the request completion handlers, then pass
it through.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-flush.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index d66cbf2..9bc5b4f 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -91,7 +91,8 @@ enum {
FLUSH_PENDING_TIMEOUT = 5 * HZ,
};
-static bool blk_kick_flush(struct request_queue *q);
+static bool blk_kick_flush(struct request_queue *q,
+ struct blk_flush_queue *fq);
static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
{
@@ -148,6 +149,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
/**
* blk_flush_complete_seq - complete flush sequence
* @rq: FLUSH/FUA request being sequenced
+ * @fq: flush queue
* @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero)
* @error: whether an error occurred
*
@@ -160,11 +162,11 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
* RETURNS:
* %true if requests were added to the dispatch queue, %false otherwise.
*/
-static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
- int error)
+static bool blk_flush_complete_seq(struct request *rq,
+ struct blk_flush_queue *fq,
+ unsigned int seq, int error)
{
struct request_queue *q = rq->q;
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
bool queued = false, kicked;
@@ -210,7 +212,7 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
BUG();
}
- kicked = blk_kick_flush(q);
+ kicked = blk_kick_flush(q, fq);
return kicked | queued;
}
@@ -242,7 +244,7 @@ static void flush_end_io(struct request *flush_rq, int error)
unsigned int seq = blk_flush_cur_seq(rq);
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
- queued |= blk_flush_complete_seq(rq, seq, error);
+ queued |= blk_flush_complete_seq(rq, fq, seq, error);
}
/*
@@ -268,6 +270,7 @@ static void flush_end_io(struct request *flush_rq, int error)
/**
* blk_kick_flush - consider issuing flush request
* @q: request_queue being kicked
+ * @fq: flush queue
*
* Flush related states of @q have changed, consider issuing flush request.
* Please read the comment at the top of this file for more info.
@@ -278,9 +281,8 @@ static void flush_end_io(struct request *flush_rq, int error)
* RETURNS:
* %true if flush was issued, %false otherwise.
*/
-static bool blk_kick_flush(struct request_queue *q)
+static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
{
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
struct request *first_rq =
list_first_entry(pending, struct request, flush.list);
@@ -317,12 +319,13 @@ static bool blk_kick_flush(struct request_queue *q)
static void flush_data_end_io(struct request *rq, int error)
{
struct request_queue *q = rq->q;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
/*
* After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io().
*/
- if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+ if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_run_queue_async(q);
}
@@ -342,7 +345,7 @@ static void mq_flush_data_end_io(struct request *rq, int error)
* the comment in flush_end_io().
*/
spin_lock_irqsave(&fq->mq_flush_lock, flags);
- if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+ if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_mq_run_hw_queue(hctx, true);
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
@@ -364,6 +367,7 @@ void blk_insert_flush(struct request *rq)
struct request_queue *q = rq->q;
unsigned int fflags = q->flush_flags; /* may change, cache */
unsigned int policy = blk_flush_policy(fflags, rq);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q);
/*
* @policy now records what operations need to be done. Adjust
@@ -412,18 +416,16 @@ void blk_insert_flush(struct request *rq)
rq->cmd_flags |= REQ_FLUSH_SEQ;
rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
if (q->mq_ops) {
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
-
rq->end_io = mq_flush_data_end_io;
spin_lock_irq(&fq->mq_flush_lock);
- blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
+ blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
return;
}
rq->end_io = flush_data_end_io;
- blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
+ blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
}
/**
--
1.7.9.5
* [PATCH v5 09/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
This patch adds a 'blk_mq_ctx' parameter to blk_get_flush_queue(),
so that the function can find the blk_flush_queue bound to the
current mq context, since the flush queue will become per hw-queue.

For a legacy queue, the parameter can simply be 'NULL'.

For the multiqueue case, the parameter should be the context from
which the related request originated. With this context info, the hw
queue and the related flush queue can be found easily.
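
For reference, once patch 10 is applied the helper resolves the flush
queue from the passed-in context like this (abridged from the
block/blk.h hunk in patch 10):

static inline struct blk_flush_queue *blk_get_flush_queue(
		struct request_queue *q, struct blk_mq_ctx *ctx)
{
	struct blk_mq_hw_ctx *hctx;

	/* legacy path: the single flush queue hangs off the request_queue */
	if (!q->mq_ops)
		return q->fq;

	/* blk-mq path: map the sw context to its hw queue's flush queue */
	hctx = q->mq_ops->map_queue(q, ctx->cpu);

	return hctx->fq;
}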
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 2 +-
block/blk-flush.c | 11 +++++------
block/blk-mq.c | 3 ++-
block/blk.h | 4 ++--
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 10555fb..6d7ece9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -390,7 +390,7 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
* be drained. Check all the queues and counters.
*/
if (drain_all) {
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
drain |= !list_empty(&q->queue_head);
for (i = 0; i < 2; i++) {
drain |= q->nr_rqs[i];
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9bc5b4f..004d95e 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -223,7 +223,7 @@ static void flush_end_io(struct request *flush_rq, int error)
bool queued = false;
struct request *rq, *n;
unsigned long flags = 0;
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
if (q->mq_ops) {
spin_lock_irqsave(&fq->mq_flush_lock, flags);
@@ -319,7 +319,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
static void flush_data_end_io(struct request *rq, int error)
{
struct request_queue *q = rq->q;
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
/*
* After populating an empty queue, kick it to avoid stall. Read
@@ -333,11 +333,10 @@ static void mq_flush_data_end_io(struct request *rq, int error)
{
struct request_queue *q = rq->q;
struct blk_mq_hw_ctx *hctx;
- struct blk_mq_ctx *ctx;
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
unsigned long flags;
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
- ctx = rq->mq_ctx;
hctx = q->mq_ops->map_queue(q, ctx->cpu);
/*
@@ -367,7 +366,7 @@ void blk_insert_flush(struct request *rq)
struct request_queue *q = rq->q;
unsigned int fflags = q->flush_flags; /* may change, cache */
unsigned int policy = blk_flush_policy(fflags, rq);
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
/*
* @policy now records what operations need to be done. Adjust
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 64ed7ac..7c903ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -518,7 +518,8 @@ static inline bool is_flush_request(struct request *rq,
struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{
struct request *rq = tags->rqs[tag];
- struct blk_flush_queue *fq = blk_get_flush_queue(rq->q);
+ /* mq_ctx of flush rq is always cloned from the corresponding req */
+ struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
if (!is_flush_request(rq, fq, tag))
return rq;
diff --git a/block/blk.h b/block/blk.h
index 9eaa6e9..7ecdd85 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -29,7 +29,7 @@ extern struct kobj_type blk_queue_ktype;
extern struct ida blk_queue_ida;
static inline struct blk_flush_queue *blk_get_flush_queue(
- struct request_queue *q)
+ struct request_queue *q, struct blk_mq_ctx *ctx)
{
return q->fq;
}
@@ -106,7 +106,7 @@ void blk_insert_flush(struct request *rq);
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
- struct blk_flush_queue *fq = blk_get_flush_queue(q);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
while (1) {
if (!list_empty(&q->queue_head)) {
--
1.7.9.5
* [PATCH v5 10/10] blk-mq: support per-dispatch_queue flush machinery
From: Ming Lei @ 2014-09-25 15:23 UTC
To: linux-kernel; +Cc: Jens Axboe, Christoph Hellwig, Ming Lei
This patch runs one instance of the flush machinery for each blk-mq
dispatch queue, so that:

- the existing init_request and exit_request callbacks can cover the
flush request too, which fixes the buggy approach of initializing the
flush request's pdu by copying it from the original request

- flushing performance is improved in the multi hw-queue case

In a fio sync write test over virtio-blk (4 hw queues, ioengine=sync,
iodepth=64, numjobs=4, bs=4K), throughput increases considerably in my
test environment:

- throughput: +70% for virtio-blk over null_blk
- throughput: +30% for virtio-blk over an SSD image
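
For reproducibility, an fio invocation matching the parameters above
might look like the following; the target device and runtime are
assumptions, not taken from the original test setup (with ioengine=sync
the iodepth setting has little effect, it is kept only to mirror the
quoted parameters):

fio --name=syncwrite --filename=/dev/vdb --direct=1 \
	--ioengine=sync --rw=write --bs=4k --iodepth=64 \
	--numjobs=4 --runtime=60 --time_based --group_reporting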
The multi-virtqueue feature isn't merged into QEMU yet; patches for the
feature can be found in the tree below:

git://kernel.ubuntu.com/ming/qemu.git v2.1.0-mq.4

Simply passing 'num_queues=4 vectors=5' should be enough to enable the
multi-queue (quad-queue) feature for QEMU virtio-blk.
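
As a sketch, with that QEMU tree the device properties from the
sentence above would be passed roughly as follows; the image path,
drive id, and the rest of the command line are illustrative
assumptions:

qemu-system-x86_64 ... \
	-drive file=test.img,if=none,id=drive0 \
	-device virtio-blk-pci,drive=drive0,num_queues=4,vectors=5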
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/blk-core.c | 2 +-
block/blk-flush.c | 21 +++++++++++++-------
block/blk-mq.c | 50 +++++++++++++++++++++++-------------------------
block/blk-sysfs.c | 4 ++--
block/blk.h | 16 +++++++++++++---
include/linux/blk-mq.h | 6 ++++++
6 files changed, 60 insertions(+), 39 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 6d7ece9..9acfd8e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -704,7 +704,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
if (!q)
return NULL;
- q->fq = blk_alloc_flush_queue(q);
+ q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
if (!q->fq)
return NULL;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 004d95e..20badd7 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -305,8 +305,15 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
fq->flush_pending_idx ^= 1;
blk_rq_init(q, flush_rq);
- if (q->mq_ops)
- blk_mq_clone_flush_request(flush_rq, first_rq);
+
+ /*
+ * Borrow tag from the first request since they can't
+ * be in flight at the same time.
+ */
+ if (q->mq_ops) {
+ flush_rq->mq_ctx = first_rq->mq_ctx;
+ flush_rq->tag = first_rq->tag;
+ }
flush_rq->cmd_type = REQ_TYPE_FS;
flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
@@ -480,22 +487,22 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q)
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
+ int node, int cmd_size)
{
struct blk_flush_queue *fq;
int rq_sz = sizeof(struct request);
- fq = kzalloc(sizeof(*fq), GFP_KERNEL);
+ fq = kzalloc_node(sizeof(*fq), GFP_KERNEL, node);
if (!fq)
goto fail;
if (q->mq_ops) {
spin_lock_init(&fq->mq_flush_lock);
- rq_sz = round_up(rq_sz + q->tag_set->cmd_size,
- cache_line_size());
+ rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
}
- fq->flush_rq = kzalloc(rq_sz, GFP_KERNEL);
+ fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
if (!fq->flush_rq)
goto fail_rq;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7c903ae..6c00855 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -281,26 +281,6 @@ void blk_mq_free_request(struct request *rq)
__blk_mq_free_request(hctx, ctx, rq);
}
-/*
- * Clone all relevant state from a request that has been put on hold in
- * the flush state machine into the preallocated flush request that hangs
- * off the request queue.
- *
- * For a driver the flush request should be invisible, that's why we are
- * impersonating the original request here.
- */
-void blk_mq_clone_flush_request(struct request *flush_rq,
- struct request *orig_rq)
-{
- struct blk_mq_hw_ctx *hctx =
- orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
-
- flush_rq->mq_ctx = orig_rq->mq_ctx;
- flush_rq->tag = orig_rq->tag;
- memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
- hctx->cmd_size);
-}
-
inline void __blk_mq_end_request(struct request *rq, int error)
{
blk_account_io_done(rq);
@@ -1516,12 +1496,20 @@ static void blk_mq_exit_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
+ unsigned flush_start_tag = set->queue_depth;
+
blk_mq_tag_idle(hctx);
+ if (set->ops->exit_request)
+ set->ops->exit_request(set->driver_data,
+ hctx->fq->flush_rq, hctx_idx,
+ flush_start_tag + hctx_idx);
+
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+ blk_free_flush_queue(hctx->fq);
kfree(hctx->ctxs);
blk_mq_free_bitmap(&hctx->ctx_map);
}
@@ -1556,6 +1544,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
{
int node;
+ unsigned flush_start_tag = set->queue_depth;
node = hctx->numa_node;
if (node == NUMA_NO_NODE)
@@ -1594,8 +1583,23 @@ static int blk_mq_init_hctx(struct request_queue *q,
set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
goto free_bitmap;
+ hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size);
+ if (!hctx->fq)
+ goto exit_hctx;
+
+ if (set->ops->init_request &&
+ set->ops->init_request(set->driver_data,
+ hctx->fq->flush_rq, hctx_idx,
+ flush_start_tag + hctx_idx, node))
+ goto free_fq;
+
return 0;
+ free_fq:
+ kfree(hctx->fq);
+ exit_hctx:
+ if (set->ops->exit_hctx)
+ set->ops->exit_hctx(hctx, hctx_idx);
free_bitmap:
blk_mq_free_bitmap(&hctx->ctx_map);
free_ctxs:
@@ -1863,16 +1867,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
blk_mq_add_queue_tag_set(set, q);
- q->fq = blk_alloc_flush_queue(q);
- if (!q->fq)
- goto err_hw_queues;
-
blk_mq_map_swqueue(q);
return q;
-err_hw_queues:
- blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
err_hw:
blk_cleanup_queue(q);
err_hctxs:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 718cffc4c..e8f38a3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,10 +517,10 @@ static void blk_release_queue(struct kobject *kobj)
if (q->queue_tags)
__blk_queue_free_tags(q);
- blk_free_flush_queue(q->fq);
-
if (q->mq_ops)
blk_mq_free_queue(q);
+ else
+ blk_free_flush_queue(q->fq);
blk_trace_shutdown(q);
diff --git a/block/blk.h b/block/blk.h
index 7ecdd85..43b0361 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -2,6 +2,8 @@
#define BLK_INTERNAL_H
#include <linux/idr.h>
+#include <linux/blk-mq.h>
+#include "blk-mq.h"
/* Amount of time in which a process may batch requests */
#define BLK_BATCH_TIME (HZ/50UL)
@@ -31,7 +33,14 @@ extern struct ida blk_queue_ida;
static inline struct blk_flush_queue *blk_get_flush_queue(
struct request_queue *q, struct blk_mq_ctx *ctx)
{
- return q->fq;
+ struct blk_mq_hw_ctx *hctx;
+
+ if (!q->mq_ops)
+ return q->fq;
+
+ hctx = q->mq_ops->map_queue(q, ctx->cpu);
+
+ return hctx->fq;
}
static inline void __blk_get_queue(struct request_queue *q)
@@ -39,8 +48,9 @@ static inline void __blk_get_queue(struct request_queue *q)
kobject_get(&q->kobj);
}
-struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q);
-void blk_free_flush_queue(struct blk_flush_queue *fq);
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
+ int node, int cmd_size);
+void blk_free_flush_queue(struct blk_flush_queue *q);
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3253495..02c5d95 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -4,6 +4,7 @@
#include <linux/blkdev.h>
struct blk_mq_tags;
+struct blk_flush_queue;
struct blk_mq_cpu_notifier {
struct list_head list;
@@ -34,6 +35,7 @@ struct blk_mq_hw_ctx {
struct request_queue *queue;
unsigned int queue_num;
+ struct blk_flush_queue *fq;
void *driver_data;
@@ -119,6 +121,10 @@ struct blk_mq_ops {
/*
* Called for every command allocated by the block layer to allow
* the driver to set up driver specific data.
+ *
+ * Tag greater than or equal to queue_depth is for setting up
+ * flush request.
+ *
* Ditto for exit/teardown.
*/
init_request_fn *init_request;
--
1.7.9.5
* Re: [PATCH v5 0/10] block: per-dispatch_queue flush machinery
From: Christoph Hellwig @ 2014-09-25 16:30 UTC
To: Ming Lei; +Cc: linux-kernel, Jens Axboe
To review this properly I had to squash patches 2-4 and 6-9 into a
single patch so it reads nicely; the result is attached below.
It has one minor issue in __blk_drain_queue, where a newly added
indent uses 4 spaces instead of tabs, but otherwise everything in this
series looks fine to me and passes sanity testing with scsi-mq.
Reviewed-by: Christoph Hellwig <hch@lst.de>
[-- Attachment #2: 0003-blk-mq-allocate-flush_rq-in-blk_mq_init_flush.patch --]
[-- Type: text/x-patch, Size: 19349 bytes --]
From 48fb5773b19e6db3c2c26d76740e0995d0bbd136 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Thu, 25 Sep 2014 23:23:39 +0800
Subject: blk-mq: allocate flush_rq in blk_mq_init_flush()
It is reasonable to allocate the flush request in blk_mq_init_flush().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: introduce blk_init_flush and its pair
These two temporary functions are introduced to hold flush
initialization and de-initialization, so that the 'flush queue' can be
introduced more easily in the following patches. Once the 'flush queue'
and its allocation/free functions are ready, the helpers will be
removed for the sake of code readability.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: move flush initialization to blk_init_flush
These fields are always used with the flush request, so
initialize them together.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: introduce blk_flush_queue to drive flush machinery
This patch introduces 'struct blk_flush_queue' and puts all
flush machinery related fields into this structure, so that:

- flush implementation details aren't exposed to drivers
- it is easy to convert to per-dispatch-queue flush machinery

This patch is basically a mechanical replacement.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: remove blk_init_flush() and its pair
The mission of the two helpers is now over, so just call
blk_alloc_flush_queue() and blk_free_flush_queue() directly.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: flush: avoid to figure out flush queue unnecessarily
Figure out the flush queue only at the entry points, i.e. when
kicking off the flush machinery and in the request's completion
handler, then pass it through.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
This patch adds a 'blk_mq_ctx' parameter to blk_get_flush_queue(),
so that this function can find the blk_flush_queue bound to the
current mq context, since the flush queue will become per hw-queue.
For a legacy queue, the parameter can simply be 'NULL'.
For the multiqueue case, the parameter should be set to the context
from which the related request originates. With this context info,
the hw queue and the related flush queue can be found easily.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
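Once the flush queue hangs off each hw queue, the helper can
presumably resolve it through the usual map_queue path, while the
legacy case keeps returning q->fq; a sketch of that final form:

static inline struct blk_flush_queue *blk_get_flush_queue(
		struct request_queue *q, struct blk_mq_ctx *ctx)
{
	struct blk_mq_hw_ctx *hctx;

	/* legacy and bio based queues have a single flush queue */
	if (!q->mq_ops)
		return q->fq;

	/* per-hw-queue case: map the sw context to its hctx */
	hctx = q->mq_ops->map_queue(q, ctx->cpu);
	return hctx->fq;
}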
---
block/blk-core.c | 13 +++---
block/blk-flush.c | 117 +++++++++++++++++++++++++++++++++----------------
block/blk-mq.c | 28 ++++++------
block/blk-mq.h | 1 -
block/blk-sysfs.c | 4 +-
block/blk.h | 25 ++++++++++-
include/linux/blkdev.h | 10 +----
7 files changed, 127 insertions(+), 71 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 6946a42..b1dd4e0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -390,11 +390,13 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
* be drained. Check all the queues and counters.
*/
if (drain_all) {
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
drain |= !list_empty(&q->queue_head);
for (i = 0; i < 2; i++) {
drain |= q->nr_rqs[i];
drain |= q->in_flight[i];
- drain |= !list_empty(&q->flush_queue[i]);
+ if (fq)
+ drain |= !list_empty(&fq->flush_queue[i]);
}
}
@@ -600,9 +602,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
#ifdef CONFIG_BLK_CGROUP
INIT_LIST_HEAD(&q->blkg_list);
#endif
- INIT_LIST_HEAD(&q->flush_queue[0]);
- INIT_LIST_HEAD(&q->flush_queue[1]);
- INIT_LIST_HEAD(&q->flush_data_in_flight);
INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
kobject_init(&q->kobj, &blk_queue_ktype);
@@ -705,8 +704,8 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
if (!q)
return NULL;
- q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
- if (!q->flush_rq)
+ q->fq = blk_alloc_flush_queue(q);
+ if (!q->fq)
return NULL;
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@@ -742,7 +741,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
return q;
fail:
- kfree(q->flush_rq);
+ blk_free_flush_queue(q->fq);
return NULL;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 1bafbfe..004d95e 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -28,7 +28,7 @@
*
* The actual execution of flush is double buffered. Whenever a request
* needs to execute PRE or POSTFLUSH, it queues at
- * q->flush_queue[q->flush_pending_idx]. Once certain criteria are met, a
+ * fq->flush_queue[fq->flush_pending_idx]. Once certain criteria are met, a
* flush is issued and the pending_idx is toggled. When the flush
* completes, all the requests which were pending are proceeded to the next
* step. This allows arbitrary merging of different types of FLUSH/FUA
@@ -91,7 +91,8 @@ enum {
FLUSH_PENDING_TIMEOUT = 5 * HZ,
};
-static bool blk_kick_flush(struct request_queue *q);
+static bool blk_kick_flush(struct request_queue *q,
+ struct blk_flush_queue *fq);
static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
{
@@ -148,6 +149,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
/**
* blk_flush_complete_seq - complete flush sequence
* @rq: FLUSH/FUA request being sequenced
+ * @fq: flush queue
* @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero)
* @error: whether an error occurred
*
@@ -155,16 +157,17 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
* completion and trigger the next step.
*
* CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
*
* RETURNS:
* %true if requests were added to the dispatch queue, %false otherwise.
*/
-static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
- int error)
+static bool blk_flush_complete_seq(struct request *rq,
+ struct blk_flush_queue *fq,
+ unsigned int seq, int error)
{
struct request_queue *q = rq->q;
- struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+ struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
bool queued = false, kicked;
BUG_ON(rq->flush.seq & seq);
@@ -180,12 +183,12 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
case REQ_FSEQ_POSTFLUSH:
/* queue for flush */
if (list_empty(pending))
- q->flush_pending_since = jiffies;
+ fq->flush_pending_since = jiffies;
list_move_tail(&rq->flush.list, pending);
break;
case REQ_FSEQ_DATA:
- list_move_tail(&rq->flush.list, &q->flush_data_in_flight);
+ list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
queued = blk_flush_queue_rq(rq, true);
break;
@@ -209,7 +212,7 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
BUG();
}
- kicked = blk_kick_flush(q);
+ kicked = blk_kick_flush(q, fq);
return kicked | queued;
}
@@ -220,17 +223,18 @@ static void flush_end_io(struct request *flush_rq, int error)
bool queued = false;
struct request *rq, *n;
unsigned long flags = 0;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
if (q->mq_ops) {
- spin_lock_irqsave(&q->mq_flush_lock, flags);
+ spin_lock_irqsave(&fq->mq_flush_lock, flags);
flush_rq->tag = -1;
}
- running = &q->flush_queue[q->flush_running_idx];
- BUG_ON(q->flush_pending_idx == q->flush_running_idx);
+ running = &fq->flush_queue[fq->flush_running_idx];
+ BUG_ON(fq->flush_pending_idx == fq->flush_running_idx);
/* account completion of the flush request */
- q->flush_running_idx ^= 1;
+ fq->flush_running_idx ^= 1;
if (!q->mq_ops)
elv_completed_request(q, flush_rq);
@@ -240,7 +244,7 @@ static void flush_end_io(struct request *flush_rq, int error)
unsigned int seq = blk_flush_cur_seq(rq);
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
- queued |= blk_flush_complete_seq(rq, seq, error);
+ queued |= blk_flush_complete_seq(rq, fq, seq, error);
}
/*
@@ -254,50 +258,51 @@ static void flush_end_io(struct request *flush_rq, int error)
* directly into request_fn may confuse the driver. Always use
* kblockd.
*/
- if (queued || q->flush_queue_delayed) {
+ if (queued || fq->flush_queue_delayed) {
WARN_ON(q->mq_ops);
blk_run_queue_async(q);
}
- q->flush_queue_delayed = 0;
+ fq->flush_queue_delayed = 0;
if (q->mq_ops)
- spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+ spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
/**
* blk_kick_flush - consider issuing flush request
* @q: request_queue being kicked
+ * @fq: flush queue
*
* Flush related states of @q have changed, consider issuing flush request.
* Please read the comment at the top of this file for more info.
*
* CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
*
* RETURNS:
* %true if flush was issued, %false otherwise.
*/
-static bool blk_kick_flush(struct request_queue *q)
+static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
{
- struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+ struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
struct request *first_rq =
list_first_entry(pending, struct request, flush.list);
- struct request *flush_rq = q->flush_rq;
+ struct request *flush_rq = fq->flush_rq;
/* C1 described at the top of this file */
- if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending))
+ if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
return false;
/* C2 and C3 */
- if (!list_empty(&q->flush_data_in_flight) &&
+ if (!list_empty(&fq->flush_data_in_flight) &&
time_before(jiffies,
- q->flush_pending_since + FLUSH_PENDING_TIMEOUT))
+ fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
return false;
/*
* Issue flush and toggle pending_idx. This makes pending_idx
* different from running_idx, which means flush is in flight.
*/
- q->flush_pending_idx ^= 1;
+ fq->flush_pending_idx ^= 1;
blk_rq_init(q, flush_rq);
if (q->mq_ops)
@@ -314,12 +319,13 @@ static bool blk_kick_flush(struct request_queue *q)
static void flush_data_end_io(struct request *rq, int error)
{
struct request_queue *q = rq->q;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
/*
* After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io().
*/
- if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+ if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_run_queue_async(q);
}
@@ -327,20 +333,20 @@ static void mq_flush_data_end_io(struct request *rq, int error)
{
struct request_queue *q = rq->q;
struct blk_mq_hw_ctx *hctx;
- struct blk_mq_ctx *ctx;
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
unsigned long flags;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
- ctx = rq->mq_ctx;
hctx = q->mq_ops->map_queue(q, ctx->cpu);
/*
* After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io().
*/
- spin_lock_irqsave(&q->mq_flush_lock, flags);
- if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+ spin_lock_irqsave(&fq->mq_flush_lock, flags);
+ if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_mq_run_hw_queue(hctx, true);
- spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+ spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
/**
@@ -360,6 +366,7 @@ void blk_insert_flush(struct request *rq)
struct request_queue *q = rq->q;
unsigned int fflags = q->flush_flags; /* may change, cache */
unsigned int policy = blk_flush_policy(fflags, rq);
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
/*
* @policy now records what operations need to be done. Adjust
@@ -410,14 +417,14 @@ void blk_insert_flush(struct request *rq)
if (q->mq_ops) {
rq->end_io = mq_flush_data_end_io;
- spin_lock_irq(&q->mq_flush_lock);
- blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
- spin_unlock_irq(&q->mq_flush_lock);
+ spin_lock_irq(&fq->mq_flush_lock);
+ blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
+ spin_unlock_irq(&fq->mq_flush_lock);
return;
}
rq->end_io = flush_data_end_io;
- blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
+ blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
}
/**
@@ -473,7 +480,43 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
}
EXPORT_SYMBOL(blkdev_issue_flush);
-void blk_mq_init_flush(struct request_queue *q)
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q)
{
- spin_lock_init(&q->mq_flush_lock);
+ struct blk_flush_queue *fq;
+ int rq_sz = sizeof(struct request);
+
+ fq = kzalloc(sizeof(*fq), GFP_KERNEL);
+ if (!fq)
+ goto fail;
+
+ if (q->mq_ops) {
+ spin_lock_init(&fq->mq_flush_lock);
+ rq_sz = round_up(rq_sz + q->tag_set->cmd_size,
+ cache_line_size());
+ }
+
+ fq->flush_rq = kzalloc(rq_sz, GFP_KERNEL);
+ if (!fq->flush_rq)
+ goto fail_rq;
+
+ INIT_LIST_HEAD(&fq->flush_queue[0]);
+ INIT_LIST_HEAD(&fq->flush_queue[1]);
+ INIT_LIST_HEAD(&fq->flush_data_in_flight);
+
+ return fq;
+
+ fail_rq:
+ kfree(fq);
+ fail:
+ return NULL;
+}
+
+void blk_free_flush_queue(struct blk_flush_queue *fq)
+{
+ /* a bio based request queue has no flush queue */
+ if (!fq)
+ return;
+
+ kfree(fq->flush_rq);
+ kfree(fq);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 66ef1fb..53b6def1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -508,20 +508,23 @@ void blk_mq_kick_requeue_list(struct request_queue *q)
}
EXPORT_SYMBOL(blk_mq_kick_requeue_list);
-static inline bool is_flush_request(struct request *rq, unsigned int tag)
+static inline bool is_flush_request(struct request *rq,
+ struct blk_flush_queue *fq, unsigned int tag)
{
return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
- rq->q->flush_rq->tag == tag);
+ fq->flush_rq->tag == tag);
}
struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{
struct request *rq = tags->rqs[tag];
+ /* mq_ctx of flush rq is always cloned from the corresponding req */
+ struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
- if (!is_flush_request(rq, tag))
+ if (!is_flush_request(rq, fq, tag))
return rq;
- return rq->q->flush_rq;
+ return fq->flush_rq;
}
EXPORT_SYMBOL(blk_mq_tag_to_rq);
@@ -1848,17 +1851,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
if (set->ops->complete)
blk_queue_softirq_done(q, set->ops->complete);
- blk_mq_init_flush(q);
blk_mq_init_cpu_queues(q, set->nr_hw_queues);
- q->flush_rq = kzalloc(round_up(sizeof(struct request) +
- set->cmd_size, cache_line_size()),
- GFP_KERNEL);
- if (!q->flush_rq)
- goto err_hw;
-
if (blk_mq_init_hw_queues(q, set))
- goto err_flush_rq;
+ goto err_hw;
mutex_lock(&all_q_mutex);
list_add_tail(&q->all_q_node, &all_q_list);
@@ -1866,12 +1862,16 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
blk_mq_add_queue_tag_set(set, q);
+ q->fq = blk_alloc_flush_queue(q);
+ if (!q->fq)
+ goto err_hw_queues;
+
blk_mq_map_swqueue(q);
return q;
-err_flush_rq:
- kfree(q->flush_rq);
+err_hw_queues:
+ blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
err_hw:
blk_cleanup_queue(q);
err_hctxs:
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a3c613a..d567d52 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ struct blk_mq_ctx {
void __blk_mq_complete_request(struct request *rq);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_init_flush(struct request_queue *q);
void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q);
void blk_mq_clone_flush_request(struct request *flush_rq,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 17f5c84..718cffc4c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,11 +517,11 @@ static void blk_release_queue(struct kobject *kobj)
if (q->queue_tags)
__blk_queue_free_tags(q);
+ blk_free_flush_queue(q->fq);
+
if (q->mq_ops)
blk_mq_free_queue(q);
- kfree(q->flush_rq);
-
blk_trace_shutdown(q);
bdi_destroy(&q->backing_dev_info);
diff --git a/block/blk.h b/block/blk.h
index e515a28..7ecdd85 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -12,16 +12,36 @@
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
+struct blk_flush_queue {
+ unsigned int flush_queue_delayed:1;
+ unsigned int flush_pending_idx:1;
+ unsigned int flush_running_idx:1;
+ unsigned long flush_pending_since;
+ struct list_head flush_queue[2];
+ struct list_head flush_data_in_flight;
+ struct request *flush_rq;
+ spinlock_t mq_flush_lock;
+};
+
extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype;
extern struct ida blk_queue_ida;
+static inline struct blk_flush_queue *blk_get_flush_queue(
+ struct request_queue *q, struct blk_mq_ctx *ctx)
+{
+ return q->fq;
+}
+
static inline void __blk_get_queue(struct request_queue *q)
{
kobject_get(&q->kobj);
}
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q);
+void blk_free_flush_queue(struct blk_flush_queue *fq);
+
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
void blk_exit_rl(struct request_list *rl);
@@ -86,6 +106,7 @@ void blk_insert_flush(struct request *rq);
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
while (1) {
if (!list_empty(&q->queue_head)) {
@@ -108,9 +129,9 @@ static inline struct request *__elv_next_request(struct request_queue *q)
* should be restarted later. Please see flush_end_io() for
* details.
*/
- if (q->flush_pending_idx != q->flush_running_idx &&
+ if (fq->flush_pending_idx != fq->flush_running_idx &&
!queue_flush_queueable(q)) {
- q->flush_queue_delayed = 1;
+ fq->flush_queue_delayed = 1;
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e267bf0..49f3461 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -36,6 +36,7 @@ struct request;
struct sg_io_hdr;
struct bsg_job;
struct blkcg_gq;
+struct blk_flush_queue;
#define BLKDEV_MIN_RQ 4
#define BLKDEV_MAX_RQ 128 /* Default maximum */
@@ -455,14 +456,7 @@ struct request_queue {
*/
unsigned int flush_flags;
unsigned int flush_not_queueable:1;
- unsigned int flush_queue_delayed:1;
- unsigned int flush_pending_idx:1;
- unsigned int flush_running_idx:1;
- unsigned long flush_pending_since;
- struct list_head flush_queue[2];
- struct list_head flush_data_in_flight;
- struct request *flush_rq;
- spinlock_t mq_flush_lock;
+ struct blk_flush_queue *fq;
struct list_head requeue_list;
spinlock_t requeue_lock;
--
1.9.1
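The per-hw-queue conversion itself (patch 10) is not part of this
squash. Given the three-argument blk_alloc_flush_queue() prototype it
ends up with, the per-hctx setup presumably looks roughly like the
sketch below; the helper name and the error handling are assumptions.

/* assumed shape of the per-hctx flush queue setup done in patch 10 */
static int init_hctx_flush_queue(struct request_queue *q,
		struct blk_mq_tag_set *set, struct blk_mq_hw_ctx *hctx)
{
	hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node,
					 set->cmd_size);
	if (!hctx->fq)
		return -ENOMEM;

	/* tag >= queue_depth marks the flush request for init_request */
	if (set->ops->init_request &&
	    set->ops->init_request(set->driver_data, hctx->fq->flush_rq,
				   hctx->queue_num,
				   set->queue_depth + hctx->queue_num,
				   hctx->numa_node)) {
		blk_free_flush_queue(hctx->fq);
		return -ENOMEM;
	}
	return 0;
}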
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v5 0/10] block: per-distpatch_queue flush machinery
2014-09-25 16:30 ` [PATCH v5 0/10] block: " Christoph Hellwig
@ 2014-09-26 4:35 ` Ming Lei
0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2014-09-26 4:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Linux Kernel Mailing List, Jens Axboe
On Fri, Sep 26, 2014 at 12:30 AM, Christoph Hellwig <hch@lst.de> wrote:
> To review this properly I had to squash patches 2-4 and 6-9 into a
> single patch so that I could read over it nicely; the result is
> attached below.
>
> That one has one minor issue in __blk_drain_queue where a newly added
> indent uses 4 spaces instead of tabs, but otherwise everything in this
> series looks fine to me and passes sanity testing with scsi-mq.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Thanks a lot for your review.
Thanks,
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread (newest message: 2014-09-26 4:35 UTC)
Thread overview: 13+ messages
2014-09-25 15:23 [PATCH v5 0/10] block: per-distpatch_queue flush machinery Ming Lei
2014-09-25 15:23 ` [PATCH v5 01/10] blk-mq: handle failure path for initializing hctx Ming Lei
2014-09-25 15:23 ` [PATCH v5 02/10] blk-mq: allocate flush_rq in blk_mq_init_flush() Ming Lei
2014-09-25 15:23 ` [PATCH v5 03/10] block: introduce blk_init_flush and its pair Ming Lei
2014-09-25 15:23 ` [PATCH v5 04/10] block: move flush initialization to blk_flush_init Ming Lei
2014-09-25 15:23 ` [PATCH v5 05/10] block: avoid to use q->flush_rq directly Ming Lei
2014-09-25 15:23 ` [PATCH v5 06/10] block: introduce blk_flush_queue to drive flush machinery Ming Lei
2014-09-25 15:23 ` [PATCH v5 07/10] block: remove blk_init_flush() and its pair Ming Lei
2014-09-25 15:23 ` [PATCH v5 08/10] block: flush: avoid to figure out flush queue unnecessarily Ming Lei
2014-09-25 15:23 ` [PATCH v5 09/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue Ming Lei
2014-09-25 15:23 ` [PATCH v5 10/10] blk-mq: support per-distpatch_queue flush machinery Ming Lei
2014-09-25 16:30 ` [PATCH v5 0/10] block: " Christoph Hellwig
2014-09-26 4:35 ` Ming Lei