Date: Tue, 5 Nov 2019 10:20:46 +0800
From: Ming Lei
To: Kent Overstreet
Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org, Coly Li, Keith Busch, linux-bcache@vger.kernel.org
Subject: Re: [PATCH V4] block: optimize for small block size IO
Message-ID: <20191105022046.GF11436@ming.t460p>
In-Reply-To: <20191105021130.GB18564@moria.home.lan>
References: <20191102072911.24817-1-ming.lei@redhat.com> <20191104181403.GA8984@kmo-pixel> <20191104181541.GA21116@infradead.org> <20191104181742.GC8984@kmo-pixel> <20191104184217.GD8984@kmo-pixel> <20191105011135.GD11436@ming.t460p> <20191105021130.GB18564@moria.home.lan>

On Mon, Nov 04, 2019 at 09:11:30PM -0500, Kent Overstreet wrote:
> On Tue, Nov 05, 2019 at 09:11:35AM +0800, Ming Lei wrote:
> > On Mon, Nov 04, 2019 at 01:42:17PM -0500, Kent Overstreet wrote:
> > > On Mon, Nov 04, 2019 at 11:23:42AM -0700, Jens Axboe wrote:
> > > > On 11/4/19 11:17 AM, Kent Overstreet wrote:
> > > > > On Mon, Nov 04, 2019 at 10:15:41AM -0800, Christoph Hellwig wrote:
> > > > >> On Mon, Nov 04, 2019 at 01:14:03PM -0500, Kent Overstreet wrote:
> > > > >>> On Sat, Nov 02, 2019 at 03:29:11PM +0800, Ming Lei wrote:
> > > > >>>> __blk_queue_split() may be a bit heavy for small block size (such as
> > > > >>>> 512B or 4KB) IO, so introduce one flag to decide if this bio includes
> > > > >>>> multiple pages, and only try to split this bio when the
> > > > >>>> multiple-page flag is set.
> > > > >>>
> > > > >>> So, back in the day I had an alternative approach in mind: get rid of
> > > > >>> blk_queue_split entirely, by pushing splitting down to the request layer - when
> > > > >>> we map the bio/request to sgl, just have it map as much as will fit in the sgl
> > > > >>> and if it doesn't entirely fit bump bi_remaining and leave it on the request
> > > > >>> queue.
> > > > >>>
> > > > >>> This would mean there'd be no need for counting segments at all, and would cut a
> > > > >>> fair amount of code out of the io path.
> > > > >>
> > > > >> I thought about that too, but it will take a lot more effort. Mostly
> > > > >> because md/dm heavily rely on splitting as well. I still think it is
> > > > >> worthwhile, it will just take a significant amount of time and we
> > > > >> should have the quick improvement now.
> > > > >
> > > > > We can do it one driver at a time - driver sets a flag to disable
> > > > > blk_queue_split(). Obvious one to do first would be nvme since that's where it
> > > > > shows up the most.
> > > > >
> > > > > And md/dm do splitting internally, but I'm not so sure they need
> > > > > blk_queue_split().
> > > >
> > > > I'm a big proponent of doing something like that instead, but it is a
> > > > lot of work. I absolutely hate the splitting we're doing now, even
> > > > though the original "let's work as hard as we can at add page time to get
> > > > things right" was pretty abysmal as well.
> > >
> > > Last I looked I don't think it was going to be that bad, just needed a bit of
> > > finesse. We just need to be able to partially process a request in e.g.
> > > nvme_map_data(), and blk_rq_map_sg() needs to be modified to only map as much as
> > > will fit instead of popping an assertion.
> >
> > I think it may not be doable.
> >
> > blk_rq_map_sg() is called by drivers and has to work on a single request, however
> > more requests have to be involved if we delay the splitting to blk_rq_map_sg(),
> > because splitting means that two bios can't be submitted in a single IO request.
>
> Of course it's doable, do I have to show you how?

No, you don't have to. Could you just point out where my words above are wrong?

Thanks,
Ming