From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 31 Aug 2017 00:58:53 +0800
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Christoph Hellwig,
	Bart Van Assche, Oleksandr Natalenko
Subject: Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
Message-ID: <20170830165852.GC14684@ming.t460p>
References: <20170830151935.24253-1-ming.lei@redhat.com>
 <20170830151935.24253-3-ming.lei@redhat.com>
 <567ad683-d577-1817-cf96-eff5aaf47db6@kernel.dk>
 <20170830153929.GB14684@ming.t460p>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To:
List-ID:

On Wed, Aug 30, 2017 at 09:51:31AM -0600, Jens Axboe wrote:
> On 08/30/2017 09:39 AM, Ming Lei wrote:
> > On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote:
> >> On 08/30/2017 09:19 AM, Ming Lei wrote:
> >>> It is more reasonable to add requests to ->dispatch in way
> >>> of FIFO style, instead of LIFO style.
> >>>
> >>> Also in this way, we can allow to insert request at the front
> >>> of hw queue, which function is needed to fix one bug
> >>> in blk-mq's implementation of blk_execute_rq()
> >>>
> >>> Reported-by: Oleksandr Natalenko
> >>> Tested-by: Oleksandr Natalenko
> >>> Signed-off-by: Ming Lei
> >>> ---
> >>>  block/blk-mq-sched.c | 2 +-
> >>>  block/blk-mq.c       | 2 +-
> >>>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> >>> index 4ab69435708c..8d97df40fc28 100644
> >>> --- a/block/blk-mq-sched.c
> >>> +++ b/block/blk-mq-sched.c
> >>> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >>>  	 * the dispatch list.
> >>>  	 */
> >>>  	spin_lock(&hctx->lock);
> >>> -	list_add(&rq->queuelist, &hctx->dispatch);
> >>> +	list_add_tail(&rq->queuelist, &hctx->dispatch);
> >>>  	spin_unlock(&hctx->lock);
> >>>  	return true;
> >>>  }
> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >>> index 4603b115e234..fed3d0c16266 100644
> >>> --- a/block/blk-mq.c
> >>> +++ b/block/blk-mq.c
> >>> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
> >>>  		blk_mq_put_driver_tag(rq);
> >>>
> >>>  		spin_lock(&hctx->lock);
> >>> -		list_splice_init(list, &hctx->dispatch);
> >>> +		list_splice_tail_init(list, &hctx->dispatch);
> >>>  		spin_unlock(&hctx->lock);
> >>
> >> I'm not convinced this is safe, there's actually a reason why the
> >> request is added to the front and not the back. We do have
> >> reorder_tags_to_front() as a safe guard, but I'd much rather get rid of
> >
> > reorder_tags_to_front() is for reordering the requests in current list,
> > this patch is for splicing list into hctx->dispatch, so I can't see
> > it isn't safe, or could you explain it a bit?
>
> If we can get the ordering right, then down the line we won't need to
> have the tags reordering at all. It's an ugly hack that I'd love to see
> go away.

Even if reorder_tags_to_front() isn't removed, this patch is still safe.
But blk_execute_rq_nowait() needs to add one request at the front of the
hw queue, and that conflicts with keeping the perfect ordering required
to remove reorder_tags_to_front(). So could you share your opinion on
the 2nd patch, which fixes blk_execute_rq_nowait()?

>
> >> that than make this change.
> >>
> >> What's your reasoning here? Your changelog doesn't really explain why
> >
> > Firstly the 2nd patch need to add one rq(such as RQF_PM) to the
> > front of the hw queue, the simple way is to add it to the front
> > of hctx->dispatch. Without this change, the 2nd patch can't work
> > at all.
> >
> > Secondly this way is still reasonable:
> >
> > - one rq is added to hctx->dispatch because queue is busy
> > - another rq is added to hctx->dispatch too because of same reason
> >
> > so it is reasonable to add the list into hctx->dispatch in FIFO style.
>
> Not disagreeing with the logic. But it also begs the question of why we
> don't apply the same treatment to when we splice leftovers to the
> dispatch list, currently we front splice that.
>
> All I'm saying is that you need to tread very carefully with this, and
> throw it through some careful testing to ensure that we don't introduce
> conditions that now livelock. NVMe is the easy test case, that will

Yes, ->dispatch is far away from NVMe, but it is a close friend of
SCSI-MQ.

> generally always work since we never run out of tags. The problematic
> test case is usually things like SATA with 31 tags, and especially SATA
> with flushes that don't queue. One good test case is the one where you
> end up having all tags (or almost all) consumed by flushes, and still
> ensuring that we're making forward progress.

Understood! We can even make the test more aggressive.

I just set up one virtio-scsi device with both 'can_queue' and
'cmd_per_lun' changed to 1, which means the hw queue depth is 1; the
number of hw queues is set to 1 as well.

Then I run 'dbench -t 30 -s -F 64' on ext4 on top of this virtio-scsi
device. The dbench workload (64 clients, sync writes plus fsync) just
works fine with this patch applied.

-- 
Ming