* blk-mq: allow to defer ->queue_rq invocations to workqueue
@ 2014-11-03  8:23 Christoph Hellwig
  2014-11-03  8:23 ` [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu Christoph Hellwig
  2014-11-03  8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig

From: Christoph Hellwig @ 2014-11-03  8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel

Drivers that need to perform synchronous, blocking operations to do I/O
generally want to defer all I/O to a driver-private workqueue.  Examples
are the loop driver, rbd, and the ubi block driver, and there are
probably many more that haven't been evaluated yet.
* [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu
  2014-11-03  8:23 blk-mq: allow to defer ->queue_rq invocations to workqueue Christoph Hellwig
@ 2014-11-03  8:23 ` Christoph Hellwig
  2014-11-03  8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig

From: Christoph Hellwig @ 2014-11-03  8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel

Don't duplicate the code that handles the non-CPU-bound case in the
caller; do it inside blk_mq_hctx_next_cpu instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b355b59..22e50a5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -780,10 +780,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
  */
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
-	int cpu = hctx->next_cpu;
+	if (hctx->queue->nr_hw_queues == 1)
+		return WORK_CPU_UNBOUND;
 
 	if (--hctx->next_cpu_batch <= 0) {
-		int next_cpu;
+		int cpu = hctx->next_cpu, next_cpu;
 
 		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
 		if (next_cpu >= nr_cpu_ids)
@@ -791,9 +792,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 
 		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
+
+		return cpu;
 	}
 
-	return cpu;
+	return hctx->next_cpu;
 }
 
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
@@ -801,16 +804,13 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
-	if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask))
+	if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
 		__blk_mq_run_hw_queue(hctx);
-	else if (hctx->queue->nr_hw_queues == 1)
-		kblockd_schedule_delayed_work(&hctx->run_work, 0);
-	else {
-		unsigned int cpu;
-
-		cpu = blk_mq_hctx_next_cpu(hctx);
-		kblockd_schedule_delayed_work_on(cpu, &hctx->run_work, 0);
+		return;
 	}
+
+	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+			&hctx->run_work, 0);
 }
 
 void blk_mq_run_queues(struct request_queue *q, bool async)
@@ -908,16 +908,8 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-	unsigned long tmo = msecs_to_jiffies(msecs);
-
-	if (hctx->queue->nr_hw_queues == 1)
-		kblockd_schedule_delayed_work(&hctx->delay_work, tmo);
-	else {
-		unsigned int cpu;
-
-		cpu = blk_mq_hctx_next_cpu(hctx);
-		kblockd_schedule_delayed_work_on(cpu, &hctx->delay_work, tmo);
-	}
+	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+			&hctx->delay_work, msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_queue);
-- 
1.9.1
* [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
  2014-11-03  8:23 blk-mq: allow to defer ->queue_rq invocations to workqueue Christoph Hellwig
  2014-11-03  8:23 ` [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu Christoph Hellwig
@ 2014-11-03  8:23 ` Christoph Hellwig
  2014-11-03  8:40   ` Ming Lei

From: Christoph Hellwig @ 2014-11-03  8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel

We have various block drivers that need to execute long-running,
blocking operations during I/O submission, such as file system or
network I/O.  Currently these drivers just queue up work to an internal
workqueue from their request_fn.

With blk-mq we can make sure they always get called on their own
workqueue directly for I/O submission by:

 1) adding a flag to prevent inline submission of I/O, and
 2) allowing the driver to pass in a workqueue in the tag_set that
    will be used instead of kblockd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       |  2 +-
 block/blk-mq.c         | 12 +++++++++---
 block/blk.h            |  1 +
 include/linux/blk-mq.h |  4 ++++
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0421b53..7f7249f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -61,7 +61,7 @@ struct kmem_cache *blk_requestq_cachep;
 /*
  * Controlling structure to kblockd
  */
-static struct workqueue_struct *kblockd_workqueue;
+struct workqueue_struct *kblockd_workqueue;
 
 void blk_queue_congestion_threshold(struct request_queue *q)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22e50a5..3d27d22 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -804,12 +804,13 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
-	if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
+	if (!async && !(hctx->flags & BLK_MQ_F_WORKQUEUE) &&
+	    cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
 		__blk_mq_run_hw_queue(hctx);
 		return;
 	}
 
-	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+	queue_delayed_work_on(blk_mq_hctx_next_cpu(hctx), hctx->wq,
 			&hctx->run_work, 0);
 }
 
@@ -908,7 +909,7 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+	queue_delayed_work_on(blk_mq_hctx_next_cpu(hctx), hctx->wq,
 			&hctx->delay_work, msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_queue);
@@ -1581,6 +1582,11 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	hctx->flags = set->flags;
 	hctx->cmd_size = set->cmd_size;
 
+	if (set->wq)
+		hctx->wq = set->wq;
+	else
+		hctx->wq = kblockd_workqueue;
+
 	blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
 					blk_mq_hctx_notify, hctx);
 	blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
diff --git a/block/blk.h b/block/blk.h
index 43b0361..fb46ad0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -25,6 +25,7 @@ struct blk_flush_queue {
 	spinlock_t		mq_flush_lock;
 };
 
+extern struct workqueue_struct *kblockd_workqueue;
 extern struct kmem_cache *blk_requestq_cachep;
 extern struct kmem_cache *request_cachep;
 extern struct kobj_type blk_queue_ktype;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5a901d0..ebe4699 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -37,6 +37,8 @@ struct blk_mq_hw_ctx {
 	unsigned int		queue_num;
 	struct blk_flush_queue	*fq;
 
+	struct workqueue_struct	*wq;
+
 	void			*driver_data;
 
 	struct blk_mq_ctxmap	ctx_map;
@@ -64,6 +66,7 @@ struct blk_mq_hw_ctx {
 
 struct blk_mq_tag_set {
 	struct blk_mq_ops	*ops;
+	struct workqueue_struct	*wq;
 	unsigned int		nr_hw_queues;
 	unsigned int		queue_depth;	/* max hw supported */
 	unsigned int		reserved_tags;
@@ -156,6 +159,7 @@ enum {
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
 	BLK_MQ_F_SYSFS_UP	= 1 << 3,
 	BLK_MQ_F_DEFER_ISSUE	= 1 << 4,
+	BLK_MQ_F_WORKQUEUE	= 1 << 5,
 
 	BLK_MQ_S_STOPPED	= 0,
 	BLK_MQ_S_TAG_ACTIVE	= 1,
-- 
1.9.1
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
  2014-11-03  8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig
@ 2014-11-03  8:40   ` Ming Lei
  2014-11-03 10:10     ` Christoph Hellwig

From: Ming Lei @ 2014-11-03  8:40 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Richard Weinberger, ceph-devel, Linux Kernel Mailing List

Hi Christoph,

On Mon, Nov 3, 2014 at 4:23 PM, Christoph Hellwig <hch@lst.de> wrote:
> We have various block drivers that need to execute long term blocking
> operations during I/O submission like file system or network I/O.
>
> Currently these drivers just queue up work to an internal workqueue
> from their request_fn.  With blk-mq we can make sure they always get
> called on their own workqueue directly for I/O submission by:
>
> 1) adding a flag to prevent inline submission of I/O, and
> 2) allowing the driver to pass in a workqueue in the tag_set that
>    will be used instead of kblockd.

The above two aren't enough, because the big problem is that drivers
need a per-request work structure instead of 'hctx->run_work'; otherwise
there are at most NR_CPUS concurrent submissions.

So the per-request work structure should be exposed to blk-mq too for
this kind of usage, e.g. as a .blk_mq_req_work(req) callback in the
BLK_MQ_F_WORKQUEUE case.
Thanks,

[full quote of the patch snipped]

-- 
Ming Lei
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
  2014-11-03  8:40 ` Ming Lei
@ 2014-11-03 10:10   ` Christoph Hellwig
  2014-11-03 11:54     ` Ming Lei

From: Christoph Hellwig @ 2014-11-03 10:10 UTC
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Richard Weinberger, ceph-devel,
	Linux Kernel Mailing List

On Mon, Nov 03, 2014 at 04:40:47PM +0800, Ming Lei wrote:
> The above two aren't enough because the big problem is that
> drivers need a per-request work structure instead of 'hctx->run_work',
> otherwise there are at most NR_CPUS concurrent submissions.
>
> So the per-request work structure should be exposed to blk-mq
> too for the kind of usage, such as .blk_mq_req_work(req) callback
> in case of BLK_MQ_F_WORKQUEUE.

Hmm.  Maybe a better option is to just add a flag to never defer
->queue_rq to a workqueue, and let drivers handle it?
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
  2014-11-03 10:10 ` Christoph Hellwig
@ 2014-11-03 11:54   ` Ming Lei

From: Ming Lei @ 2014-11-03 11:54 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Richard Weinberger, ceph-devel, Linux Kernel Mailing List

On Mon, Nov 3, 2014 at 6:10 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Nov 03, 2014 at 04:40:47PM +0800, Ming Lei wrote:
>> The above two aren't enough because the big problem is that
>> drivers need a per-request work structure instead of 'hctx->run_work',
>> otherwise there are at most NR_CPUS concurrent submissions.
>>
>> So the per-request work structure should be exposed to blk-mq
>> too for the kind of usage, such as .blk_mq_req_work(req) callback
>> in case of BLK_MQ_F_WORKQUEUE.
>
> Hmm. Maybe a better option is to just add a flag to never defer
> ->queue_rq to a workqueue and let drivers handle the it?

That should work, but it might lose the potential merge benefit of
deferring.

Thanks,
end of thread, other threads:[~2014-11-03 11:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-03  8:23 blk-mq: allow to defer ->queue_rq invocations to workqueue Christoph Hellwig
2014-11-03  8:23 ` [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu Christoph Hellwig
2014-11-03  8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig
2014-11-03  8:40   ` Ming Lei
2014-11-03 10:10     ` Christoph Hellwig
2014-11-03 11:54       ` Ming Lei