From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Fri, 19 Jan 2018 14:36:48 +0800
From: Ming Lei
To: Jens Axboe
Cc: Bart Van Assche, "dm-devel@redhat.com", "linux-block@vger.kernel.org", "snitzer@redhat.com"
Subject: Re: [PATCH] dm rq: Avoid that request processing stalls sporadically
Message-ID: <20180119063647.GA25369@ming.t460p>
References: <20180118163707.11825-1-bart.vanassche@wdc.com>
 <20180119001136.GA4712@ming.t460p>
 <1516320863.2676.104.camel@wdc.com>
 <20180119001834.GB4712@ming.t460p>
 <040c195c-732e-70e8-3a54-89c5939e815c@kernel.dk>
 <20180119003516.GD4712@ming.t460p>
 <91af852d-cb1b-f306-d899-d72de4023d8d@kernel.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <91af852d-cb1b-f306-d899-d72de4023d8d@kernel.dk>
List-ID:

On Thu, Jan 18, 2018 at 05:49:10PM -0700, Jens Axboe wrote:
> On 1/18/18 5:35 PM, Ming Lei wrote:
> > On Thu, Jan 18, 2018 at 05:24:29PM -0700, Jens Axboe wrote:
> >> On 1/18/18 5:18 PM, Ming Lei wrote:
> >>> On Fri, Jan 19, 2018 at 12:14:24AM +0000, Bart Van Assche wrote:
> >>>> On Fri, 2018-01-19 at 08:11 +0800, Ming Lei wrote:
> >>>>> On Thu, Jan 18, 2018 at 08:37:07AM -0800, Bart Van Assche wrote:
> >>>>>> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> >>>>>> index f16096af879a..c59c59cfd2a5 100644
> >>>>>> --- a/drivers/md/dm-rq.c
> >>>>>> +++ b/drivers/md/dm-rq.c
> >>>>>> @@ -761,6 +761,7 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>>>>>  		/* Undo dm_start_request() before requeuing */
> >>>>>>  		rq_end_stats(md, rq);
> >>>>>>  		rq_completed(md, rq_data_dir(rq), false);
> >>>>>> +		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
> >>>>>>  		return BLK_STS_RESOURCE;
> >>>>>>  	}
> >>>>>>
> >>>>>
> >>>>> Nak.
> >>>>
> >>>> This patch fixes a regression that was introduced by you. You should
> >>>> know that regressions are not acceptable. If you don't agree with this
> >>>> patch, please fix the root cause.
> >>>
> >>> Yesterday I sent a patch, did you test that?
> >>
> >> That patch isn't going to be of much help. It might prevent you from
> >> completely stalling, but it might take you tens of seconds to get there.
> >
> > Yeah, that is true, and that is why I marked it as RFC, but this case
> > is rare.
>
> You don't know that. If the resource is very limited, or someone else
> is gobbling up all of it, then it may trigger quite often. Granted,
> in that case, you need some way of signaling the freeing of those
> resources to make it efficient.
>
> >> On top of that, it's a rolling timer that gets added when IO is queued,
> >> and re-added if IO is pending when it fires. If the latter case is not
> >> true, then it won't get armed again. So if IO fails to queue without
> >> any other IO pending, you're pretty much in the same situation as if
> >
> > Detecting that no IO is pending may take a bit of time; we can do it
> > after BLK_STS_RESOURCE is returned and SCHED_RESTART is set. I have
> > done it this way in the following patch, what do you think of it?
> >
> > https://github.com/ming1/linux/commit/f866ce0b97b0ae22d033881da7eb07706fd458b4
>
> I think it's overly complicated. As I wrote in a lengthier email earlier
> today, just ensure that we have a unique return code from the driver for
> when it aborts queuing an IO due to a resource shortage that isn't tied
> to its own consumption of it. When we get that return, do a delayed
> queue run after X amount of time to ensure we always try again. If you
> want to get fancy (later on), you could track the frequency of such
> returns and complain if it's too high.

Suppose the newly introduced return code is BLK_STS_NO_DEV_RESOURCE.

This approach may degrade performance. For example, DM-MPATH returns
BLK_STS_NO_DEV_RESOURCE when blk_get_request() returns NULL, and blk-mq
handles it with blk_mq_delay_run_hw_queue(10ms).
blk_mq_sched_restart() is then triggered as soon as one DM-MPATH request
completes, which means one request of the underlying queue has completed
too, but dispatch still can't make progress during the 10ms delay.

That means we should only call blk_mq_delay_run_hw_queue(10ms/100ms)
when the queue is idle.

I will post the patch on GitHub for discussion.

Thanks,
Ming