Date: Thu, 12 Oct 2017 23:22:47 +0800
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe
Cc: Omar Sandoval, linux-block@vger.kernel.org, Christoph Hellwig,
	Mike Snitzer, dm-devel@redhat.com, Bart Van Assche,
	Laurence Oberman, Paolo Valente, Oleksandr Natalenko,
	Tom Nguyen, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH V6 4/5] blk-mq-sched: improve dispatching from sw queue
Message-ID: <20171012152240.GA9275@ming.t460p>
References: <20171009112424.30524-1-ming.lei@redhat.com>
	<20171009112424.30524-5-ming.lei@redhat.com>
	<20171010182345.GD30738@vader.DHCP.thefacebook.com>
	<20171012100107.GA28224@ming.t460p>
	<82efff77-6b17-8bd8-e425-c143284ebb07@kernel.dk>
In-Reply-To: <82efff77-6b17-8bd8-e425-c143284ebb07@kernel.dk>
List-ID: <linux-kernel.vger.kernel.org>

On Thu, Oct 12, 2017 at 08:52:12AM -0600, Jens Axboe wrote:
> On 10/12/2017 04:01 AM, Ming Lei wrote:
> > On Tue, Oct 10, 2017 at 11:23:45AM -0700, Omar Sandoval wrote:
> >> On Mon, Oct 09, 2017 at
07:24:23PM +0800, Ming Lei wrote:
> >>> SCSI devices use a host-wide tagset, and the shared driver tag space
> >>> is often quite big. Meanwhile there is also a per-LUN queue depth
> >>> (.cmd_per_lun), which is often small; for example, on both lpfc and
> >>> qla2xxx, .cmd_per_lun is just 3.
> >>>
> >>> So lots of requests may stay in the sw queue, and we always flush all
> >>> of those belonging to the same hw queue and dispatch them all to the
> >>> driver. Unfortunately it is easy to make the queue busy because of
> >>> the small .cmd_per_lun. Once these requests are flushed out, they
> >>> have to stay in hctx->dispatch, no bio merging can happen against
> >>> these requests, and sequential IO performance is hurt a lot.
> >>>
> >>> This patch introduces blk_mq_dequeue_from_ctx() for dequeuing
> >>> requests from the sw queue so that we can dispatch them in the
> >>> scheduler's way, and avoid dequeuing too many requests from the sw
> >>> queue when ->dispatch isn't flushed completely.
> >>>
> >>> This patch improves dispatching from the sw queue when there is a
> >>> per-request-queue queue depth, by taking requests one by one from
> >>> the sw queue, just like an IO scheduler does.
> >>
> >> This still didn't address Jens' concern about using q->queue_depth as
> >> the heuristic for whether to do the full sw queue flush or one-by-one
> >> dispatch. The EWMA approach is a bit too complex for now; can you
> >> please try the heuristic of whether the driver ever returned
> >> BLK_STS_RESOURCE?
> >
> > That can be done easily, but I am not sure it is a good idea.
> >
> > For example, inside the queue_rq path of NVMe, kmalloc(GFP_ATOMIC) is
> > often used; if kmalloc() returns NULL just once, BLK_STS_RESOURCE
> > will be returned to blk-mq, and then blk-mq will never do a full sw
> > queue flush again, even if kmalloc() always succeeds from that time
> > on.
>
> Have it be a bit more than a single bit, then. Reset it every x IOs or
> something like that; that'll be more representative of transient busy
> conditions anyway.
OK, that can be done via a simplified EWMA that considers only the
dispatch result. I will address it in V7.

--
Ming
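For illustration, a minimal userspace sketch of what such a "simplified
EWMA over dispatch results only" could look like. All names here
(hctx_stats, update_dispatch_ewma, the weight and threshold constants)
are hypothetical and not actual blk-mq API; the point is only the
integer-arithmetic moving average, where a single transient
BLK_STS_RESOURCE decays away instead of permanently disabling the full
sw queue flush:

```c
#include <assert.h>
#include <stdbool.h>

#define EWMA_WEIGHT	8	/* new sample contributes 1/8 of the average */
#define EWMA_SCALE	100	/* fixed-point scale, avoids floating point */
#define BUSY_THRESHOLD	10	/* above ~10% recent busy: dispatch one by one */

struct hctx_stats {
	unsigned int busy_ewma;		/* scaled by EWMA_SCALE */
};

/* Fold one dispatch result (busy or not) into the moving average. */
static void update_dispatch_ewma(struct hctx_stats *s, bool was_busy)
{
	unsigned int sample = was_busy ? EWMA_SCALE : 0;

	/* ewma = ewma * (w - 1) / w + sample / w, in integer arithmetic */
	s->busy_ewma = (s->busy_ewma * (EWMA_WEIGHT - 1) + sample) / EWMA_WEIGHT;
}

/* Decide between flushing the whole sw queue and one-by-one dispatch. */
static bool dispatch_one_by_one(const struct hctx_stats *s)
{
	return s->busy_ewma > BUSY_THRESHOLD;
}
```

Unlike a sticky single bit, a burst of successful dispatches pulls the
average back below the threshold, so a one-off kmalloc(GFP_ATOMIC)
failure only briefly switches the hctx to one-by-one mode.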