* [PATCH] block: mq flush: fix race between IPI handler and mq flush worker @ 2014-05-19 15:05 Ming Lei 2014-05-19 15:18 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2014-05-19 15:05 UTC (permalink / raw) To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, stable, Ming Lei In 'struct request', both 'csd' and 'requeue_work' are defined inside one union; unfortunately, the race below can be triggered: - one PREFLUSH request may be queued inside the previous POSTFLUSH request's IPI handler, and the two requests share one request instance (q->flush_rq) - the IPI for the previous POSTFLUSH request arrives, and generic_smp_call_function_single_interrupt() is called to run the remote completion handler via csd->func(); - the POSTFLUSH request's end_io callback (flush_end_io) is called - finally, inside that path, blk_flush_queue_rq() calls INIT_WORK(&rq->requeue_work,...) and schedules the work for queuing a new PREFLUSH request - csd->func(csd->info) returns, and csd_unlock(csd) is called, see generic_smp_call_function_single_interrupt() - then the just-initialized rq->requeue_work is corrupted by csd_unlock(): the lowest bit of requeue_work.func is observed to be cleared, so the function address can become unaligned on x86 - the work function mq_flush_run() is never run - fs sync hangs forever This patch fixes the problem by introducing a dedicated work_struct inside request_queue that is used only for queuing the flush request. Cc: <stable@vger.kernel.org> #3.14 Signed-off-by: Ming Lei <tom.leiming@gmail.com> --- Another simple fix is to disable ipi for flush request, but looks this one should be better. block/blk-flush.c | 43 +++++++++++++++++++++++++++++++++++++++---- include/linux/blkdev.h | 1 + 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index ec7a224..42e8a34 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,21 +130,56 @@ static void blk_flush_restore_request(struct request *rq) blk_clear_rq_complete(rq); } +static void __mq_flush_run(struct request *rq) +{ + memset(&rq->csd, 0, sizeof(rq->csd)); + blk_mq_insert_request(rq, false, true, false); +} + static void mq_flush_run(struct work_struct *work) { + struct request_queue *q; + struct request *rq; + + q = container_of(work, struct request_queue, flush_work); + rq = q->flush_rq; + + __mq_flush_run(rq); +} + +static void mq_data_flush_run(struct work_struct *work) +{ struct request *rq; rq = container_of(work, struct request, requeue_work); - memset(&rq->csd, 0, sizeof(rq->csd)); - blk_mq_insert_request(rq, false, true, false); + __mq_flush_run(rq); } static bool blk_flush_queue_rq(struct request *rq, bool add_front) { if (rq->q->mq_ops) { - INIT_WORK(&rq->requeue_work, mq_flush_run); - kblockd_schedule_work(&rq->requeue_work); + struct work_struct *wk; + work_func_t func; + + /* + * Flush request need use its own work_struct + * instead of rq->requeue_work because it + * will be changed by csd_unlock() in future + * if the request(PREFLUSH) is being queued + * from previous POSTFLUSH request's IPI + * handler context.
+ */ + if (rq == rq->q->flush_rq) { + wk = &rq->q->flush_work; + func = mq_flush_run; + } else { + wk = &rq->requeue_work; + func = mq_data_flush_run; + } + + INIT_WORK(wk, func); + kblockd_schedule_work(wk); return false; } else { if (add_front) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1f1ecc7..4c8f24c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -461,6 +461,7 @@ struct request_queue { struct list_head flush_queue[2]; struct list_head flush_data_in_flight; struct request *flush_rq; + struct work_struct flush_work; spinlock_t mq_flush_lock; struct mutex sysfs_lock; -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 21+ messages in thread
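The aliasing described in the patch above is easy to demonstrate outside the kernel. The following sketch is illustrative only: the struct layouts are hand-copied approximations of the 3.14-era definitions in include/linux/smp.h and include/linux/workqueue.h (config options and padding can change them), so treat the offsets as an assumption rather than a guarantee. On x86-64 with these layouts, csd.flags and requeue_work.func land at the same offset, which is why csd_unlock()'s clearing of CSD_FLAG_LOCK flips the low bit of the freshly installed work function pointer; it also shows the two members are close in size, which is why they were packed into one union in the first place.

/* union_overlap_demo.c - illustration only; layouts approximated from v3.14 */
#include <stdio.h>
#include <stddef.h>

struct llist_node { struct llist_node *next; };
struct list_head { struct list_head *next, *prev; };

/* approximation of struct call_single_data (include/linux/smp.h) */
struct call_single_data {
	struct llist_node llist;
	void (*func)(void *info);
	void *info;
	unsigned short flags;	/* csd_unlock() clears CSD_FLAG_LOCK (bit 0) here */
};

/* approximation of struct work_struct (include/linux/workqueue.h, !CONFIG_LOCKDEP) */
struct work_struct {
	unsigned long data;
	struct list_head entry;
	void (*func)(struct work_struct *work);
};

/* the first anonymous union inside 'struct request' */
union request_union {
	struct call_single_data csd;
	struct work_struct requeue_work;
};

int main(void)
{
	/* On x86-64 both offsets print 24, so clearing bit 0 of csd.flags
	 * corrupts the work function pointer installed by INIT_WORK(). */
	printf("sizeof(csd)=%zu sizeof(requeue_work)=%zu\n",
	       sizeof(struct call_single_data), sizeof(struct work_struct));
	printf("offsetof(csd.flags)=%zu offsetof(requeue_work.func)=%zu\n",
	       offsetof(union request_union, csd.flags),
	       offsetof(union request_union, requeue_work.func));
	return 0;
}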
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-19 15:05 [PATCH] block: mq flush: fix race between IPI handler and mq flush worker Ming Lei @ 2014-05-19 15:18 ` Christoph Hellwig 2014-05-19 15:34 ` Ming Lei 2014-05-20 3:20 ` Ming Lei 0 siblings, 2 replies; 21+ messages in thread From: Christoph Hellwig @ 2014-05-19 15:18 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-kernel, stable On Mon, May 19, 2014 at 11:05:50PM +0800, Ming Lei wrote: > Another simple fix is to disable ipi for flush request, but looks > this one should be better. I think the first thing is to bite the bullet and sort out and document the various unions in struct request for real. For example the first union has the call_single_data for the blk-mq completions, while the second one has the ipi_list that is used by the old blk-softirq code. If we get this right with a single union that contains a struct for each phase of the request we might find enough space to keep using the current way. ^ permalink raw reply [flat|nested] 21+ messages in thread
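Purely as an illustration of the direction suggested here (this does not correspond to any posted patch, and the grouping of fields is an assumption), the loose anonymous unions could be replaced by one union whose members are named per-phase structs, making any aliasing between request phases explicit:

/* Hypothetical sketch only: one union with a named struct per request phase.
 * Field selection approximates the 3.14-era 'struct request' members under
 * discussion; the names are invented for illustration. */
union request_phase_state {
	struct {				/* blk-mq IPI completion */
		struct call_single_data csd;
	} mq_complete;
	struct {				/* deferred (re)queueing via kblockd */
		struct work_struct requeue_work;
	} requeue;
	struct {				/* io scheduler / flush deadline */
		unsigned long fifo_time;
	} sched;
};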
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-19 15:18 ` Christoph Hellwig @ 2014-05-19 15:34 ` Ming Lei 2014-05-20 3:20 ` Ming Lei 1 sibling, 0 replies; 21+ messages in thread From: Ming Lei @ 2014-05-19 15:34 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List, stable On Mon, May 19, 2014 at 11:18 PM, Christoph Hellwig <hch@lst.de> wrote: > On Mon, May 19, 2014 at 11:05:50PM +0800, Ming Lei wrote: >> Another simple fix is to disable ipi for flush request, but looks >> this one should be better. > > I think the first thing is to bite the bullet and sort out and document > the various unions in struct request for real. I agree, the unions should be documented in detail. > > For example the first union has the call_single_data for the blk-mq blk-softirq needs rq->csd too, in raise_blk_irq(). > completions, while the second one has the ipi_list that is used by > the old blk-softirq code. Also, we can put some mq-specific fields and legacy fields into one union too. > > If we get this right with a single union that contains a struct for > each phase of the request we might find enough space to keep using > the current way. If we can figure it out, that should be a better solution, but changing/merging fields may affect performance too, and needs careful verification. Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-19 15:18 ` Christoph Hellwig 2014-05-19 15:34 ` Ming Lei @ 2014-05-20 3:20 ` Ming Lei 2014-05-20 15:23 ` Christoph Hellwig 1 sibling, 1 reply; 21+ messages in thread From: Ming Lei @ 2014-05-20 3:20 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List, stable On Mon, May 19, 2014 at 11:18 PM, Christoph Hellwig <hch@lst.de> wrote: > On Mon, May 19, 2014 at 11:05:50PM +0800, Ming Lei wrote: >> Another simple fix is to disable ipi for flush request, but looks >> this one should be better. > > I think the first thing is to bite the bullet and sort out and document > the various unions in struct request for real. Considering that the problem can be reproduced easily and causes data loss on kvm guests, I hope the fix can make it into 3.15. Also it should be backported to 3.14 because someone may enable ctx->ipi_redirect. Thinking about it further, the patch looks fine: - sizeof(call_single_data) and sizeof(work_struct) are very close and both are not small, so it is reasonable to put the two in one union to save request space - the conflict on the two structures just happens with flush requests because rq->requeue_work is only used to queue flush requests - in the flush sequence, the conflict won't happen on requests with data, so it only happens when completing and queueing PREFLUSH/POSTFLUSH requests. Any comments and suggestions? Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-20 3:20 ` Ming Lei @ 2014-05-20 15:23 ` Christoph Hellwig 2014-05-21 5:16 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-20 15:23 UTC (permalink / raw) To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List, stable On Tue, May 20, 2014 at 11:20:25AM +0800, Ming Lei wrote: > - the conflict on the two structures just happens with flush > requests because rq->requeue_work is only used to queue > flush requests Once we get non-trivial block drivers we'll need to be able to requeue arbitrary requests, that's why I added blk_mq_requeue_request. The scsi-mq work that I plan to submit for the next merge window is the prime example. I'd really prefer to avoid breaking that use case if we can avoid it. Note that the flush code already is very nasty for blk-mq and this just makes it worse. One fix that would also help to sort out some of the flush issues would be to add a list of requests that need requeueing to the blk_mq context, which we can add requests to from irq context. The next time we run hw contexts for the queue we'll pick them up in user context and insert them. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-20 15:23 ` Christoph Hellwig @ 2014-05-21 5:16 ` Ming Lei 2014-05-21 5:36 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2014-05-21 5:16 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List, stable On Tue, May 20, 2014 at 11:23 PM, Christoph Hellwig <hch@lst.de> wrote: > On Tue, May 20, 2014 at 11:20:25AM +0800, Ming Lei wrote: >> - the conflict on the two structures just happens with flush >> requests because rq->requeue_work is only used to queue >> flush requests > > Once we get non-trivial block drivers we'll need to be able > to requeue arbitrary requests, that's why I added blk_mq_requeue_request. I am wondering if virtio-blk is a trivial block driver, :-) And virtio-blk still supports requeuing by returning BLK_MQ_RQ_QUEUE_BUSY. > The scsi-mq work that I plan to submit for the next merge window is > the prime example. It depends if one scsi-mq req has to requeue itself with rq->requeue_work inside its own .softirq_done_fn. If yes, we can't put call_single_data and requeue_work into one union simply. From your last scsi-mq post, it looks like the request may do that if I understand correctly. But scsi_cmnd already has an 'abort_work', and I am wondering why scsi-mq requeuing doesn't share its own requeue_work with abort_work, which seems doable since requeuing and aborting belong to different stages. > > I'd really prefer to avoid breaking that use case if we can avoid it. This patch won't break the coming scsi-mq, and it is a fix. I'd like to figure out one patch to cover the scsi-mq case if we can get that before the 3.15 release, since your 'respect-affinity' patch has enabled IPI by default already. Otherwise, we still have enough time to fix the issue for scsi-mq, don't we? > > > Note that the flush code already is very nasty for blk-mq and this just > makes it worse. I think the patch is clean and simple, and it documents the special conflict case clearly too. > > One fix that would also help to sort out some of the flush issues would > be to add a list of requests that need requeueing to the blk_mq context, > which we can add requests to from irq context. The next time we run It won't be easy to introduce a requeue_list for this purpose since we need to keep the order of requests per blk-mq ctx. Also, a lockless list won't work since there are both 'add_front' and 'add_tail' requirements. > hw contexts for the queue we'll pick them up in user context and insert > them. IMO, the requests can be inserted into the ctx list directly from irq context, but at the cost of spin_lock_irqsave(&ctx->lock) everywhere. Follows current ideas: 1), this patch with scsi-mq sharing abort_work together? 2), move requeue_work out of the union inside request 3), spin_lock_irqsave(&ctx->lock) everywhere and requeue request directly to ctx without using work Any other ideas? Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-21 5:16 ` Ming Lei @ 2014-05-21 5:36 ` Christoph Hellwig 2014-05-21 6:48 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-21 5:36 UTC (permalink / raw) To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List, stable On Wed, May 21, 2014 at 01:16:14PM +0800, Ming Lei wrote: > I am wondering if virtio-blk is a trivial block driver, :-) It's about as simple as it gets. > > The scsi-mq work that I plan to submit for the next merge window is > > the prime example. > > It depends if one scsi-mq req has to requeue itself with rq->requeue_work > inside its own .softirq_done_fn. If yes, we can't put call_single_data > and requeue_work into one union simply. From your last scsi-mq post, > it looks like the request may do that if I understand correctly. Requeueing a request from the completion handler is indeed what we'll need with various more complete drivers. > I think the patch is clean and simple, and it documents the special > conflict case clearly too. While I can't say anything against the fact that it fixes the issue it's neither clean nor simple. > Follows current ideas: > 1), this patch with scsi-mq sharing abort_work together? > 2), move requeue_work out of the union inside request > 3), spin_lock_irqsave(&ctx->lock) everywhere and requeue > request directly to ctx without using work I think Jens very much wanted to avoid irq disabling in the I/O path if possible. If we have a separate requeue list with its separate lock we can avoid that unless we actually have to take requests off that requeue list. I can look into that implementation. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-21 5:36 ` Christoph Hellwig @ 2014-05-21 6:48 ` Ming Lei 2014-05-27 18:13 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2014-05-21 6:48 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List, stable On Wed, May 21, 2014 at 1:36 PM, Christoph Hellwig <hch@lst.de> wrote: > On Wed, May 21, 2014 at 01:16:14PM +0800, Ming Lei wrote: >> I am wondering if virtio-blk is a trivial block driver, :-) > > It's about as simple as it gets. > >> > The scsi-mq work that I plan to submit for the next merge window is >> > the prime example. >> >> It depends if one scsi-mq req has to requeue itself with rq->requeue_work >> inside its own .softirq_done_fn. If yes, we can't put call_single_data >> and requeue_work into one union simply. From your last scsi-mq post, >> it looks like the request may do that if I understand correctly. > > Requeueing a request from the completion handler is indeed what we'll > need with various more complete drivers. At least for scsi, the current scsi_cmnd->abort_work (reused) should be enough for requeueing the command. For other drivers, maybe they can make use of something like scsi_cmnd->abort_work too. > >> I think the patch is clean and simple, and it documents the special >> conflict case clearly too. > > While I can't say anything against the fact that it fixes the issue > it's neither clean nor simple. It just uses q->flush_rq's own work_struct for requeuing itself, that's it. I'd like to see a cleaner/simpler solution without wasting space or adding extra complexity. > >> Follows current ideas: >> 1), this patch with scsi-mq sharing abort_work together? >> 2), move requeue_work out of the union inside request >> 3), spin_lock_irqsave(&ctx->lock) everywhere and requeue >> request directly to ctx without using work > > I think Jens very much wanted to avoid irq disabling in the I/O path > if possible. If we have a separate requeue list with its separate > lock we can avoid that unless we actually have to take requests off > that requeue list. I can look into that implementation. I am wondering if you can keep requests in order per blk-mq ctx, since you introduce another list. Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-21 6:48 ` Ming Lei @ 2014-05-27 18:13 ` Christoph Hellwig 2014-05-27 18:38 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-27 18:13 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, Linux Kernel Mailing List, stable [-- Attachment #1: Type: text/plain, Size: 132 bytes --] Here is my counter proposal that requeues via two lists and a work struct in the request_queue. I've also tested it with scsi-mq. [-- Attachment #2: 0001-blk-mq-add-helper-to-insert-requests-from-irq-contex.patch --] [-- Type: text/x-patch, Size: 4648 bytes --] >From 95f15eda9a87d8545e1dff1b996d9227dfca1129 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Wed, 21 May 2014 19:37:11 +0200 Subject: blk-mq: add helper to insert requests from irq context --- block/blk-flush.c | 16 ++++---------- block/blk-mq.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blk-mq.h | 3 +++ include/linux/blkdev.h | 6 +++++- 4 files changed, 65 insertions(+), 14 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index ec7a224..ef608b3 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,21 +130,13 @@ static void blk_flush_restore_request(struct request *rq) blk_clear_rq_complete(rq); } -static void mq_flush_run(struct work_struct *work) -{ - struct request *rq; - - rq = container_of(work, struct request, requeue_work); - - memset(&rq->csd, 0, sizeof(rq->csd)); - blk_mq_insert_request(rq, false, true, false); -} - static bool blk_flush_queue_rq(struct request *rq, bool add_front) { if (rq->q->mq_ops) { - INIT_WORK(&rq->requeue_work, mq_flush_run); - kblockd_schedule_work(&rq->requeue_work); + struct request_queue *q = rq->q; + + blk_mq_add_to_requeue_list(rq, add_front); + blk_mq_kick_requeue_list(q); return false; } else { if (add_front) diff --git a/block/blk-mq.c b/block/blk-mq.c index 62082c5..926d844 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -510,10 +510,57 @@ void blk_mq_requeue_request(struct request *rq) blk_clear_rq_complete(rq); BUG_ON(blk_queued_rq(rq)); - blk_mq_insert_request(rq, true, true, false); + blk_mq_add_to_requeue_list(rq, true); } EXPORT_SYMBOL(blk_mq_requeue_request); +static void blk_mq_requeue_work(struct work_struct *work) +{ + struct request_queue *q = + container_of(work, struct request_queue, requeue_work); + LIST_HEAD(head_list); + LIST_HEAD(tail_list); + struct request *rq; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + list_splice_init(&q->requeue_head_list, &head_list); + list_splice_init(&q->requeue_list, &tail_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); + + while (!list_empty(&head_list)) { + rq = list_entry(head_list.next, struct request, queuelist); + blk_mq_insert_request(rq, true, false, false); + } + + while (!list_empty(&tail_list)) { + rq = list_entry(tail_list.next, struct request, queuelist); + blk_mq_insert_request(rq, false, false, false); + } + + blk_mq_run_queues(q, false); +} + +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head) +{ + struct request_queue *q = rq->q; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + if (at_head) + list_add(&rq->queuelist, &q->requeue_head_list); + else + list_add_tail(&rq->queuelist, &q->requeue_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); +} +EXPORT_SYMBOL(blk_mq_add_to_requeue_list); + +void blk_mq_kick_requeue_list(struct request_queue *q) +{ + 
kblockd_schedule_work(&q->requeue_work); +} +EXPORT_SYMBOL(blk_mq_kick_requeue_list); + struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) { return tags->rqs[tag]; @@ -1777,6 +1824,11 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) q->sg_reserved_size = INT_MAX; + INIT_WORK(&q->requeue_work, blk_mq_requeue_work); + INIT_LIST_HEAD(&q->requeue_head_list); + INIT_LIST_HEAD(&q->requeue_list); + spin_lock_init(&q->requeue_lock); + if (q->nr_hw_queues > 1) blk_queue_make_request(q, blk_mq_make_request); else diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f76bb18..81bb7f1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -173,6 +173,9 @@ void __blk_mq_end_io(struct request *rq, int error); void blk_mq_requeue_request(struct request *rq); +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head); +void blk_mq_kick_requeue_list(struct request_queue *q); + void blk_mq_complete_request(struct request *rq); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b0104ba..e137348 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -99,7 +99,6 @@ struct request { struct list_head queuelist; union { struct call_single_data csd; - struct work_struct requeue_work; unsigned long fifo_time; }; @@ -463,6 +462,11 @@ struct request_queue { struct request *flush_rq; spinlock_t mq_flush_lock; + struct list_head requeue_list; + struct list_head requeue_head_list; + spinlock_t requeue_lock; + struct work_struct requeue_work; + struct mutex sysfs_lock; int bypass_depth; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
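To show how a driver might consume the two new exports in the proposal above, here is a hypothetical completion-side sketch. Everything except blk_mq_requeue_request(), blk_mq_kick_requeue_list(), blk_mq_rq_to_pdu() and blk_mq_end_io() is invented for illustration (the driver name, status codes and retry policy are assumptions), and it is not part of the patch or of scsi-mq; the point is only that, with the requeue list, deferring a request for requeue is now safe from irq context.

/* Hypothetical driver softirq completion handler (illustration only). */
static void mydrv_softirq_done(struct request *rq)
{
	struct mydrv_cmd *cmd = blk_mq_rq_to_pdu(rq);	/* made-up per-request pdu */

	if (cmd->status == MYDRV_STATUS_RETRY) {
		/*
		 * Safe even in irq context with this patch: the request is
		 * only put on q->requeue_list here, and kblockd later moves
		 * it back onto a software queue from process context.
		 */
		blk_mq_requeue_request(rq);
		blk_mq_kick_requeue_list(rq->q);
		return;
	}

	blk_mq_end_io(rq, cmd->status == MYDRV_STATUS_OK ? 0 : -EIO);
}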
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 18:13 ` Christoph Hellwig @ 2014-05-27 18:38 ` Jens Axboe 2014-05-27 18:53 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-27 18:38 UTC (permalink / raw) To: Christoph Hellwig, Ming Lei; +Cc: Linux Kernel Mailing List, stable On 2014-05-27 12:13, Christoph Hellwig wrote: > Here is my counter proposal that requeues via two lists and a work struct > in the request_queue. I've also tested it with scsi-mq. > I like this, moves the state out of the request. But how about we consolidate the two requeue requests lists, and just mark the request as needing head insertion instead? Just add a cmd_flags flag, REQ_REQUEUE_HEAD or something. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 18:38 ` Jens Axboe @ 2014-05-27 18:53 ` Christoph Hellwig 2014-05-27 18:54 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-27 18:53 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, Linux Kernel Mailing List, stable On Tue, May 27, 2014 at 12:38:15PM -0600, Jens Axboe wrote: > On 2014-05-27 12:13, Christoph Hellwig wrote: >> Here is my counter proposal that requeues via two lists and a work struct >> in the request_queue. I've also tested it with scsi-mq. >> > > I like this, moves the state out of the request. But how about we > consolidate the two requeue requests lists, and just mark the request as > needing head insertion instead? Just add a cmd_flags flag, REQ_REQUEUE_HEAD > or something. That should be doable, just didn't like introducing even more flags. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 18:53 ` Christoph Hellwig @ 2014-05-27 18:54 ` Jens Axboe 2014-05-27 19:15 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-27 18:54 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Ming Lei, Linux Kernel Mailing List, stable On 2014-05-27 12:53, Christoph Hellwig wrote: > On Tue, May 27, 2014 at 12:38:15PM -0600, Jens Axboe wrote: >> On 2014-05-27 12:13, Christoph Hellwig wrote: >>> Here is my counter proposal that requeues via two lists and a work struct >>> in the request_queue. I've also tested it with scsi-mq. >>> >> >> I like this, moves the state out of the request. But how about we >> consolidate the two requeue requests lists, and just mark the request as >> needing head insertion instead? Just add a cmd_flags flag, REQ_REQUEUE_HEAD >> or something. > > That should be doable, just didn't like introducing even more flags. Space is plenty big again now... The downside is that the splice trick won't work, but if we have more than a few requests on the requeue list, we're doing it wrong anyway. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 18:54 ` Jens Axboe @ 2014-05-27 19:15 ` Christoph Hellwig 2014-05-27 19:17 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-27 19:15 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Ming Lei, Linux Kernel Mailing List, stable [-- Attachment #1: Type: text/plain, Size: 325 bytes --] On Tue, May 27, 2014 at 12:54:22PM -0600, Jens Axboe wrote: > Space is plenty big again now... The down side is that the splice trick > wont work, but if we have more than a few requests on the requeue list, > we're doing it wrong anyway. I don't see a problem with the splice. How about this version (untested so far)? [-- Attachment #2: 0001-blk-mq-add-helper-to-insert-requests-from-irq-contex.patch --] [-- Type: text/x-patch, Size: 4648 bytes --] >From 95f15eda9a87d8545e1dff1b996d9227dfca1129 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Wed, 21 May 2014 19:37:11 +0200 Subject: blk-mq: add helper to insert requests from irq context --- block/blk-flush.c | 16 ++++---------- block/blk-mq.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blk-mq.h | 3 +++ include/linux/blkdev.h | 6 +++++- 4 files changed, 65 insertions(+), 14 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index ec7a224..ef608b3 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,21 +130,13 @@ static void blk_flush_restore_request(struct request *rq) blk_clear_rq_complete(rq); } -static void mq_flush_run(struct work_struct *work) -{ - struct request *rq; - - rq = container_of(work, struct request, requeue_work); - - memset(&rq->csd, 0, sizeof(rq->csd)); - blk_mq_insert_request(rq, false, true, false); -} - static bool blk_flush_queue_rq(struct request *rq, bool add_front) { if (rq->q->mq_ops) { - INIT_WORK(&rq->requeue_work, mq_flush_run); - kblockd_schedule_work(&rq->requeue_work); + struct request_queue *q = rq->q; + + blk_mq_add_to_requeue_list(rq, add_front); + blk_mq_kick_requeue_list(q); return false; } else { if (add_front) diff --git a/block/blk-mq.c b/block/blk-mq.c index 62082c5..926d844 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -510,10 +510,57 @@ void blk_mq_requeue_request(struct request *rq) blk_clear_rq_complete(rq); BUG_ON(blk_queued_rq(rq)); - blk_mq_insert_request(rq, true, true, false); + blk_mq_add_to_requeue_list(rq, true); } EXPORT_SYMBOL(blk_mq_requeue_request); +static void blk_mq_requeue_work(struct work_struct *work) +{ + struct request_queue *q = + container_of(work, struct request_queue, requeue_work); + LIST_HEAD(head_list); + LIST_HEAD(tail_list); + struct request *rq; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + list_splice_init(&q->requeue_head_list, &head_list); + list_splice_init(&q->requeue_list, &tail_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); + + while (!list_empty(&head_list)) { + rq = list_entry(head_list.next, struct request, queuelist); + blk_mq_insert_request(rq, true, false, false); + } + + while (!list_empty(&tail_list)) { + rq = list_entry(tail_list.next, struct request, queuelist); + blk_mq_insert_request(rq, false, false, false); + } + + blk_mq_run_queues(q, false); +} + +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head) +{ + struct request_queue *q = rq->q; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + if (at_head) + list_add(&rq->queuelist, &q->requeue_head_list); + else + 
list_add_tail(&rq->queuelist, &q->requeue_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); +} +EXPORT_SYMBOL(blk_mq_add_to_requeue_list); + +void blk_mq_kick_requeue_list(struct request_queue *q) +{ + kblockd_schedule_work(&q->requeue_work); +} +EXPORT_SYMBOL(blk_mq_kick_requeue_list); + struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) { return tags->rqs[tag]; @@ -1777,6 +1824,11 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) q->sg_reserved_size = INT_MAX; + INIT_WORK(&q->requeue_work, blk_mq_requeue_work); + INIT_LIST_HEAD(&q->requeue_head_list); + INIT_LIST_HEAD(&q->requeue_list); + spin_lock_init(&q->requeue_lock); + if (q->nr_hw_queues > 1) blk_queue_make_request(q, blk_mq_make_request); else diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f76bb18..81bb7f1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -173,6 +173,9 @@ void __blk_mq_end_io(struct request *rq, int error); void blk_mq_requeue_request(struct request *rq); +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head); +void blk_mq_kick_requeue_list(struct request_queue *q); + void blk_mq_complete_request(struct request *rq); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b0104ba..e137348 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -99,7 +99,6 @@ struct request { struct list_head queuelist; union { struct call_single_data csd; - struct work_struct requeue_work; unsigned long fifo_time; }; @@ -463,6 +462,11 @@ struct request_queue { struct request *flush_rq; spinlock_t mq_flush_lock; + struct list_head requeue_list; + struct list_head requeue_head_list; + spinlock_t requeue_lock; + struct work_struct requeue_work; + struct mutex sysfs_lock; int bypass_depth; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 19:15 ` Christoph Hellwig @ 2014-05-27 19:17 ` Jens Axboe 2014-05-27 19:21 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-27 19:17 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Ming Lei, Linux Kernel Mailing List, stable On 05/27/2014 01:15 PM, Christoph Hellwig wrote: > On Tue, May 27, 2014 at 12:54:22PM -0600, Jens Axboe wrote: >> Space is plenty big again now... The down side is that the splice trick >> wont work, but if we have more than a few requests on the requeue list, >> we're doing it wrong anyway. > > I don't see a problem with the splice. How about this version (untested so > far)? True, I guess it's iterated afterwards anyway, so that's where you just check it. But I think you sent the old one again, not the new variant :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 19:17 ` Jens Axboe @ 2014-05-27 19:21 ` Christoph Hellwig 2014-05-27 19:35 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-27 19:21 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Ming Lei, Linux Kernel Mailing List, stable [-- Attachment #1: Type: text/plain, Size: 146 bytes --] On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote: > But I think you sent the old one again, not the new variant :-) Oh well, next try: [-- Attachment #2: 0001-blk-mq-add-helper-to-insert-requests-from-irq-contex.patch --] [-- Type: text/x-patch, Size: 4830 bytes --] >From ff3685b6052a31fbafc72551cdf7c123cd2b4634 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Wed, 21 May 2014 19:37:11 +0200 Subject: blk-mq: add helper to insert requests from irq context --- block/blk-flush.c | 16 +++--------- block/blk-mq.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blk-mq.h | 3 +++ include/linux/blkdev.h | 5 +++- 4 files changed, 74 insertions(+), 14 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index ec7a224..ef608b3 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,21 +130,13 @@ static void blk_flush_restore_request(struct request *rq) blk_clear_rq_complete(rq); } -static void mq_flush_run(struct work_struct *work) -{ - struct request *rq; - - rq = container_of(work, struct request, requeue_work); - - memset(&rq->csd, 0, sizeof(rq->csd)); - blk_mq_insert_request(rq, false, true, false); -} - static bool blk_flush_queue_rq(struct request *rq, bool add_front) { if (rq->q->mq_ops) { - INIT_WORK(&rq->requeue_work, mq_flush_run); - kblockd_schedule_work(&rq->requeue_work); + struct request_queue *q = rq->q; + + blk_mq_add_to_requeue_list(rq, add_front); + blk_mq_kick_requeue_list(q); return false; } else { if (add_front) diff --git a/block/blk-mq.c b/block/blk-mq.c index 62082c5..0457010 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -510,10 +510,68 @@ void blk_mq_requeue_request(struct request *rq) blk_clear_rq_complete(rq); BUG_ON(blk_queued_rq(rq)); - blk_mq_insert_request(rq, true, true, false); + blk_mq_add_to_requeue_list(rq, true); } EXPORT_SYMBOL(blk_mq_requeue_request); +static void blk_mq_requeue_work(struct work_struct *work) +{ + struct request_queue *q = + container_of(work, struct request_queue, requeue_work); + LIST_HEAD(rq_list); + struct request *rq, *next; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + list_splice_init(&q->requeue_list, &rq_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); + + list_for_each_entry_safe(rq, next, &rq_list, queuelist) { + if (!(rq->cmd_flags & REQ_SOFTBARRIER)) + continue; + + rq->cmd_flags &= ~REQ_SOFTBARRIER; + list_del_init(&rq->queuelist); + blk_mq_insert_request(rq, true, false, false); + } + + while (!list_empty(&rq_list)) { + rq = list_entry(rq_list.next, struct request, queuelist); + list_del_init(&rq->queuelist); + blk_mq_insert_request(rq, false, false, false); + } + + blk_mq_run_queues(q, false); +} + +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head) +{ + struct request_queue *q = rq->q; + unsigned long flags; + + /* + * We abuse this flag that is otherwise used by the I/O scheduler to + * request head insertation from the workqueue. 
+ */ + BUG_ON(rq->cmd_flags & REQ_SOFTBARRIER); + + spin_lock_irqsave(&q->requeue_lock, flags); + if (at_head) { + rq->cmd_flags |= REQ_SOFTBARRIER; + list_add(&rq->queuelist, &q->requeue_list); + } else { + list_add_tail(&rq->queuelist, &q->requeue_list); + } + spin_unlock_irqrestore(&q->requeue_lock, flags); +} +EXPORT_SYMBOL(blk_mq_add_to_requeue_list); + +void blk_mq_kick_requeue_list(struct request_queue *q) +{ + kblockd_schedule_work(&q->requeue_work); +} +EXPORT_SYMBOL(blk_mq_kick_requeue_list); + struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) { return tags->rqs[tag]; @@ -1777,6 +1835,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) q->sg_reserved_size = INT_MAX; + INIT_WORK(&q->requeue_work, blk_mq_requeue_work); + INIT_LIST_HEAD(&q->requeue_list); + spin_lock_init(&q->requeue_lock); + if (q->nr_hw_queues > 1) blk_queue_make_request(q, blk_mq_make_request); else diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f76bb18..81bb7f1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -173,6 +173,9 @@ void __blk_mq_end_io(struct request *rq, int error); void blk_mq_requeue_request(struct request *rq); +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head); +void blk_mq_kick_requeue_list(struct request_queue *q); + void blk_mq_complete_request(struct request *rq); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b0104ba..e90e169 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -99,7 +99,6 @@ struct request { struct list_head queuelist; union { struct call_single_data csd; - struct work_struct requeue_work; unsigned long fifo_time; }; @@ -463,6 +462,10 @@ struct request_queue { struct request *flush_rq; spinlock_t mq_flush_lock; + struct list_head requeue_list; + spinlock_t requeue_lock; + struct work_struct requeue_work; + struct mutex sysfs_lock; int bypass_depth; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 19:21 ` Christoph Hellwig @ 2014-05-27 19:35 ` Jens Axboe 2014-05-28 1:34 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-27 19:35 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Ming Lei, Linux Kernel Mailing List, stable On 05/27/2014 01:21 PM, Christoph Hellwig wrote: > On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote: >> But I think you sent the old one again, not the new variant :-) > > Oh well, next try: This looks good to me. Was trying to think of ways to reduce that to one list iteration, but I think it's cleaner to just retain the two separate ones. Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise. Let me know when you have runtime verified it. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-27 19:35 ` Jens Axboe @ 2014-05-28 1:34 ` Ming Lei 2014-05-28 2:26 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2014-05-28 1:34 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Linux Kernel Mailing List, stable On Wed, May 28, 2014 at 3:35 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 05/27/2014 01:21 PM, Christoph Hellwig wrote: >> On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote: >>> But I think you sent the old one again, not the new variant :-) >> >> Oh well, next try: > > This looks good to me. Was trying to think of ways to reduce that to one > list iteration, but I think it's cleaner to just retain the two separate > ones. > > Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise. > > Let me know when you have runtime verified it. Looks like writing over ext4 (especially sync writing) can survive with Christoph's patch now, thanks Christoph. Reported-and-tested-by: Ming Lei <tom.leiming@gmail.com> Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-28 1:34 ` Ming Lei @ 2014-05-28 2:26 ` Jens Axboe 2014-05-28 2:31 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-28 2:26 UTC (permalink / raw) To: Ming Lei; +Cc: Christoph Hellwig, Linux Kernel Mailing List, stable On 2014-05-27 19:34, Ming Lei wrote: > On Wed, May 28, 2014 at 3:35 AM, Jens Axboe <axboe@kernel.dk> wrote: >> On 05/27/2014 01:21 PM, Christoph Hellwig wrote: >>> On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote: >>>> But I think you sent the old one again, not the new variant :-) >>> >>> Oh well, next try: >> >> This looks good to me. Was trying to think of ways to reduce that to one >> list iteration, but I think it's cleaner to just retain the two separate >> ones. >> >> Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise. >> >> Let me know when you have runtime verified it. > > Looks writing over ext4(especially sync writing) can survive > with Christoph's patch now, thanks Christoph. > > Reported-and-tested-by: Ming Lei <tom.leiming@gmail.com> Great! I'll queue it up here too, then. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-28 2:26 ` Jens Axboe @ 2014-05-28 2:31 ` Jens Axboe 2014-05-28 5:22 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2014-05-28 2:31 UTC (permalink / raw) To: Ming Lei; +Cc: Christoph Hellwig, Linux Kernel Mailing List, stable On 2014-05-27 20:26, Jens Axboe wrote: > On 2014-05-27 19:34, Ming Lei wrote: >> On Wed, May 28, 2014 at 3:35 AM, Jens Axboe <axboe@kernel.dk> wrote: >>> On 05/27/2014 01:21 PM, Christoph Hellwig wrote: >>>> On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote: >>>>> But I think you sent the old one again, not the new variant :-) >>>> >>>> Oh well, next try: >>> >>> This looks good to me. Was trying to think of ways to reduce that to one >>> list iteration, but I think it's cleaner to just retain the two separate >>> ones. >>> >>> Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise. >>> >>> Let me know when you have runtime verified it. >> >> Looks writing over ext4(especially sync writing) can survive >> with Christoph's patch now, thanks Christoph. >> >> Reported-and-tested-by: Ming Lei <tom.leiming@gmail.com> > > Great! I'll queue it up here too, then. Christoph, I'll just run a few tests and then queue it up in the morning. Can you send a properly signed-off patch with a commit message as well? I was writing one up, but I still need the signed-off-by. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-28 2:31 ` Jens Axboe @ 2014-05-28 5:22 ` Christoph Hellwig 2014-05-28 14:08 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2014-05-28 5:22 UTC (permalink / raw) To: Jens Axboe; +Cc: Ming Lei, Christoph Hellwig, Linux Kernel Mailing List, stable [-- Attachment #1: Type: text/plain, Size: 284 bytes --] On Tue, May 27, 2014 at 08:31:18PM -0600, Jens Axboe wrote: > Christoph, I'll just run a few tests and then queue it up in the morning. > Can you send a properly signed-off patch with a commit message as well? I > was writing one up, but I still need the signed-off-by. Attached. [-- Attachment #2: 0001-blk-mq-add-helper-to-insert-requests-from-irq-contex.patch --] [-- Type: text/x-patch, Size: 5297 bytes --] >From 125823de325211c3e96dad884b0d1a52ec04947d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Wed, 21 May 2014 19:37:11 +0200 Subject: blk-mq: add helper to insert requests from irq context Both the cache flush state machine and the SCSI midlayer want to submit requests from irq context, and the current per-request requeue_work unfortunately causes corruption due to sharing with the csd field for flushes. Replace them with a per-request_queue list of requests to be requeued. Based on an earlier test by Ming Lei. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Ming Lei <tom.leiming@gmail.com> --- block/blk-flush.c | 16 +++--------- block/blk-mq.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blk-mq.h | 3 +++ include/linux/blkdev.h | 5 +++- 4 files changed, 74 insertions(+), 14 deletions(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index ec7a224..ef608b3 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -130,21 +130,13 @@ static void blk_flush_restore_request(struct request *rq) blk_clear_rq_complete(rq); } -static void mq_flush_run(struct work_struct *work) -{ - struct request *rq; - - rq = container_of(work, struct request, requeue_work); - - memset(&rq->csd, 0, sizeof(rq->csd)); - blk_mq_insert_request(rq, false, true, false); -} - static bool blk_flush_queue_rq(struct request *rq, bool add_front) { if (rq->q->mq_ops) { - INIT_WORK(&rq->requeue_work, mq_flush_run); - kblockd_schedule_work(&rq->requeue_work); + struct request_queue *q = rq->q; + + blk_mq_add_to_requeue_list(rq, add_front); + blk_mq_kick_requeue_list(q); return false; } else { if (add_front) diff --git a/block/blk-mq.c b/block/blk-mq.c index 62082c5..0457010 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -510,10 +510,68 @@ void blk_mq_requeue_request(struct request *rq) blk_clear_rq_complete(rq); BUG_ON(blk_queued_rq(rq)); - blk_mq_insert_request(rq, true, true, false); + blk_mq_add_to_requeue_list(rq, true); } EXPORT_SYMBOL(blk_mq_requeue_request); +static void blk_mq_requeue_work(struct work_struct *work) +{ + struct request_queue *q = + container_of(work, struct request_queue, requeue_work); + LIST_HEAD(rq_list); + struct request *rq, *next; + unsigned long flags; + + spin_lock_irqsave(&q->requeue_lock, flags); + list_splice_init(&q->requeue_list, &rq_list); + spin_unlock_irqrestore(&q->requeue_lock, flags); + + list_for_each_entry_safe(rq, next, &rq_list, queuelist) { + if (!(rq->cmd_flags & REQ_SOFTBARRIER)) + continue; + + rq->cmd_flags &= ~REQ_SOFTBARRIER; + list_del_init(&rq->queuelist); + blk_mq_insert_request(rq, true, false, false); + } + + while (!list_empty(&rq_list)) { + 
rq = list_entry(rq_list.next, struct request, queuelist); + list_del_init(&rq->queuelist); + blk_mq_insert_request(rq, false, false, false); + } + + blk_mq_run_queues(q, false); +} + +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head) +{ + struct request_queue *q = rq->q; + unsigned long flags; + + /* + * We abuse this flag that is otherwise used by the I/O scheduler to + * request head insertation from the workqueue. + */ + BUG_ON(rq->cmd_flags & REQ_SOFTBARRIER); + + spin_lock_irqsave(&q->requeue_lock, flags); + if (at_head) { + rq->cmd_flags |= REQ_SOFTBARRIER; + list_add(&rq->queuelist, &q->requeue_list); + } else { + list_add_tail(&rq->queuelist, &q->requeue_list); + } + spin_unlock_irqrestore(&q->requeue_lock, flags); +} +EXPORT_SYMBOL(blk_mq_add_to_requeue_list); + +void blk_mq_kick_requeue_list(struct request_queue *q) +{ + kblockd_schedule_work(&q->requeue_work); +} +EXPORT_SYMBOL(blk_mq_kick_requeue_list); + struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) { return tags->rqs[tag]; @@ -1777,6 +1835,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) q->sg_reserved_size = INT_MAX; + INIT_WORK(&q->requeue_work, blk_mq_requeue_work); + INIT_LIST_HEAD(&q->requeue_list); + spin_lock_init(&q->requeue_lock); + if (q->nr_hw_queues > 1) blk_queue_make_request(q, blk_mq_make_request); else diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f76bb18..81bb7f1 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -173,6 +173,9 @@ void __blk_mq_end_io(struct request *rq, int error); void blk_mq_requeue_request(struct request *rq); +void blk_mq_add_to_requeue_list(struct request *rq, bool at_head); +void blk_mq_kick_requeue_list(struct request_queue *q); + void blk_mq_complete_request(struct request *rq); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b0104ba..e90e169 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -99,7 +99,6 @@ struct request { struct list_head queuelist; union { struct call_single_data csd; - struct work_struct requeue_work; unsigned long fifo_time; }; @@ -463,6 +462,10 @@ struct request_queue { struct request *flush_rq; spinlock_t mq_flush_lock; + struct list_head requeue_list; + spinlock_t requeue_lock; + struct work_struct requeue_work; + struct mutex sysfs_lock; int bypass_depth; -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker 2014-05-28 5:22 ` Christoph Hellwig @ 2014-05-28 14:08 ` Jens Axboe 0 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2014-05-28 14:08 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Ming Lei, Linux Kernel Mailing List, stable On 2014-05-27 23:22, Christoph Hellwig wrote: > On Tue, May 27, 2014 at 08:31:18PM -0600, Jens Axboe wrote: >> Christoph, I'll just run a few tests and then queue it up in the morning. >> Can you send a properly signed-off patch with a commit message as well? I >> was writing one up, but I still need the signed-off-by. > > Attached. Thanks, added. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2014-05-28 14:08 UTC | newest] Thread overview: 21+ messages 2014-05-19 15:05 [PATCH] block: mq flush: fix race between IPI handler and mq flush worker Ming Lei 2014-05-19 15:18 ` Christoph Hellwig 2014-05-19 15:34 ` Ming Lei 2014-05-20 3:20 ` Ming Lei 2014-05-20 15:23 ` Christoph Hellwig 2014-05-21 5:16 ` Ming Lei 2014-05-21 5:36 ` Christoph Hellwig 2014-05-21 6:48 ` Ming Lei 2014-05-27 18:13 ` Christoph Hellwig 2014-05-27 18:38 ` Jens Axboe 2014-05-27 18:53 ` Christoph Hellwig 2014-05-27 18:54 ` Jens Axboe 2014-05-27 19:15 ` Christoph Hellwig 2014-05-27 19:17 ` Jens Axboe 2014-05-27 19:21 ` Christoph Hellwig 2014-05-27 19:35 ` Jens Axboe 2014-05-28 1:34 ` Ming Lei 2014-05-28 2:26 ` Jens Axboe 2014-05-28 2:31 ` Jens Axboe 2014-05-28 5:22 ` Christoph Hellwig 2014-05-28 14:08 ` Jens Axboe