From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [PATCH V15 00/18] block: support multi-page bvec
Date: Fri, 15 Feb 2019 08:49:31 -0700
Message-ID:
References: <20190215111324.30129-1-ming.lei@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20190215111324.30129-1-ming.lei@redhat.com>
Content-Language: en-US
Sender: linux-btrfs-owner@vger.kernel.org
To: Ming Lei
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Theodore Ts'o, Omar Sandoval, Sagi Grimberg, Dave Chinner, Kent Overstreet, Mike Snitzer, dm-devel@redhat.com, Alexander Viro, linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org, David Sterba, linux-btrfs@vger.kernel.org, "Darrick J . Wong", linux-xfs@vger.kernel.org, Gao Xiang, Christoph Hellwig, linux-ext4@vger.kernel.org, Coly Li, linux-bcache@vger.kernel.org, Boaz Harrosh, Bob Peterson, clus
List-Id: linux-raid.ids

On 2/15/19 4:13 AM, Ming Lei wrote:
> Hi,
>
> This patchset brings multi-page bvec into the block layer:
>
> 1) what is multi-page bvec?
>
> Multipage bvec means that one 'struct bio_vec' can hold multiple pages
> which are physically contiguous, instead of the single page used in the
> Linux kernel for a long time.
>
> 2) why is multi-page bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As system RAM becomes much bigger than before, and huge pages, transparent
> huge pages and memory compaction are widely used, it is now fairly easy
> to see physically contiguous pages from fs in I/O. On the other hand, from
> the block layer's view, it isn't necessary to store intermediate pages in
> the bvec, and it is enough to just store the physically contiguous
> 'segment' in each io vector.
>
> Also, huge pages are being brought to filesystems and swap [2][6], so we can
> do IO on a hugepage each time[3], which requires that one bio can transfer
> at least one huge page at a time.
> It turns out it isn't flexible to simply change
> BIO_MAX_PAGES[3][5]. Multipage bvec fits this case very well.
> As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured
> much bigger, such as 512, which requires at least two 4K pages for holding
> the bvec table.
>
> With multi-page bvec:
>
> - Inside the block layer, both bio splitting and sg mapping can become more
>   efficient than before by just traversing the physically contiguous
>   'segment' instead of each page.
>
> - segment handling in the block layer can be improved much in future since
>   it should be quite easy to convert a multipage bvec into a segment. For
>   example, we might just store the segment in each bvec directly in future.
>
> - bio size can be increased, and it should improve some high-bandwidth IO
>   cases in theory[4].
>
> - there is opportunity in future to improve the memory footprint of bvecs.
>
> 3) how is multi-page bvec implemented in this patchset?
>
> Patches 1 ~ 3 prepare for supporting multi-page bvec.
>
> Patches 4 ~ 14 implement multipage bvec in the block layer:
>
> - put all tricks into bvec/bio/rq iterators, and as long as
>   drivers and fs use these standard iterators, they are happy
>   with multipage bvec
>
> - introduce bio_for_each_bvec() to iterate over multipage bvec for splitting
>   bio and mapping sg
>
> - keep current bio_for_each_segment*() to iterate over singlepage bvec and
>   make sure current users won't be broken; especially, convert to this
>   new helper prototype in single patch 21 given it is basically a mechanical
>   conversion
>
> - deal with iomap & xfs's sub-pagesize io vec in patch 13
>
> - enable multipage bvec in patch 14
>
> Patch 15 redefines BIO_MAX_PAGES as 256.
>
> Patch 16 documents usages of bio iterator helpers.
>
> Patches 17~18 kill NO_SG_MERGE.
>
> These patches can be found in the following git tree:
>
> git: https://github.com/ming1/linux.git v5.0-blk_mp_bvec_v14
                                                           ^^^
v15?

> Lots of tests (blktests, xfstests, ltp io, ...)
> have been run with this patchset,
> and no regression was seen.
>
> Thanks Christoph for reviewing the early version and providing very good
> suggestions, such as: introduce bio_init_with_vec_table(), remove other
> unnecessary helpers for cleanup and so on.
>
> Thanks Christoph and Omar for reviewing V10/V11/V12 and providing lots of
> helpful comments.

Applied, thanks Ming. Let's hope it sticks!

-- 
Jens Axboe