From: Boaz Harrosh <boaz@plexistor.com>
To: Ming Lei <tom.leiming@gmail.com>, lsf-pc@lists.linuxfoundation.org
Cc: linux-block@vger.kernel.org,
Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [LSF/MM ATTEND] block: multipage bvecs
Date: Sun, 28 Feb 2016 13:17:43 +0200
Message-ID: <56D2D757.2000204@plexistor.com>
In-Reply-To: <CACVXFVPH2z9H9ZudEV9yirg2mJuz6f7Z4LB5MQrabFEh82Xa7A@mail.gmail.com>
On 02/26/2016 06:33 PM, Ming Lei wrote:
> Hi,
>
> I'd like to participate in LSF/MM and discuss multipage bvecs.
>
> Kent posted the idea[1] before, but never pushed out.
> I have studied multipage bvecs for a while, and think
> it is a good idea to improve block subsystem.
>
> Multipage bvecs mean that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the single
> page it holds in the current kernel.
>
Hi Ming Lei

This is an interesting topic for me.

I don't know if you have ever tried it, but I have. If I take a
regular SSD disk or a PCIe flash card that I have in my machine,
stick a pointer to a page into a bvec with bv_len = PAGE_SIZE * 8,
and call submit_bio, I get 8 pages' worth of IO with a single bvec
and it all just works.
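For reference, what I tried was roughly the sketch below (against a
4.x-era kernel; 'bdev' stands for the target block device, the
order-3 allocation keeps the pages physically contiguous, and error
handling is omitted to keep the illustration short):

	/* 8 physically contiguous pages via an order-3 allocation */
	struct page *page = alloc_pages(GFP_KERNEL, 3);
	struct bio *bio = bio_alloc(GFP_KERNEL, 1);

	bio->bi_bdev = bdev;
	bio->bi_iter.bi_sector = 0;

	/* A single bvec covering all 8 pages */
	bio->bi_io_vec[0].bv_page = page;
	bio->bi_io_vec[0].bv_offset = 0;
	bio->bi_io_vec[0].bv_len = PAGE_SIZE * 8;
	bio->bi_vcnt = 1;
	bio->bi_iter.bi_size = PAGE_SIZE * 8;

	/* Synchronous submission, just for the experiment */
	submit_bio_wait(READ, bio);
	bio_put(bio);
	__free_pages(page, 3);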
Yes, yes, I know it would break a bunch of other places; probably
the single-bvec case works better. But this is just to say that the
current code is not that picky about assuming a single page size.

I would like to see an audit and test cases done in this regard,
but keeping the current API and making the change transparent. I
think all the places you mention below can be made transparent to a
"big bvec" if coded carefully, and there need not be a separate API
for multipage / singlepage bvecs. It should all just work.

I might be wrong, as I have not looked at this deeply, but my gut
feeling is that it is possible.

Thanks for bringing up the issue,
Boaz
> IMO, there are several advantages to supporting multipage bvecs:
>
> - currently one bio from bio_alloc() can only hold at most 256
> vectors, which means one bio can be used to transfer at most
> 1 Mbyte (256 * 4K). With multipage bvecs, a filesystem can submit
> a bigger chunk via a single bio, because big physically contiguous
> segments are very common.
>
> - the CPU time consumed iterating the bvec table should decrease
>
> - block merging gets simplified a lot: segments can be merged
> right inside bio_add_page() (sketched below), so singlepage bvecs
> need not be stored in the bvec table, and the segments can finally
> be split out to the driver with the proper size. blk_bio_map_sg()
> gets simplified too. Block merging has become a bit complicated
> recently, and we have seen quite a few bug reports/fixes in it.
>
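To make the merge point concrete, a minimal sketch of what merging
inside bio_add_page() could look like (illustrative only; it ignores
queue limits and uses page_to_phys() to test physical contiguity):

	int bio_add_page(struct bio *bio, struct page *page,
			 unsigned int len, unsigned int off)
	{
		struct bio_vec *bv;

		if (bio->bi_vcnt) {
			bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
			if (page_to_phys(bv->bv_page) + bv->bv_offset +
			    bv->bv_len == page_to_phys(page) + off) {
				/* physically contiguous: extend last bvec */
				bv->bv_len += len;
				goto done;
			}
		}
		bv = &bio->bi_io_vec[bio->bi_vcnt++];
		bv->bv_page = page;
		bv->bv_offset = off;
		bv->bv_len = len;
	done:
		bio->bi_iter.bi_size += len;
		return len;
	}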
> I'd like to hear opinions from fs guys about multipage-bvec-based
> bios, because this will bring some change to the bio interface (one
> bio will represent a bigger I/O than before).
>
> Also I hope to discuss the implementation with the fs, dm, md,
> bcache... guys, because this feature will bring changes to those
> subsystems. So far, I have the following ideas:
>
> 1) change to bio_for_each_segment()
>
> The bvec returned from this iterator helper needs to stay a
> singlepage vector as before, so most users of the bio iterator
> won't need any change.
>
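A minimal sketch of how that conversion could be hidden inside the
iterator (the helper name is hypothetical, and it assumes the pages
backing a multipage bvec are contiguous in the memmap, so that
bv_page arithmetic is valid):

	static inline struct bio_vec
	bvec_iter_singlepage(const struct bio_vec *bv, struct bvec_iter iter)
	{
		struct bio_vec sp = bvec_iter_bvec(bv, iter);

		/* Step to the page that holds the current offset... */
		sp.bv_page += sp.bv_offset >> PAGE_SHIFT;
		sp.bv_offset &= ~PAGE_MASK;

		/* ...and clamp the length so the chunk stays in that page */
		sp.bv_len = min_t(unsigned int, sp.bv_len,
				  PAGE_SIZE - sp.bv_offset);
		return sp;
	}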
> 2) change to bio_for_each_segment_all()
>
> bio_for_each_segment_all() has to change, because callers may
> modify the bvec and currently assume it is always singlepage.
>
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
>
> Both new helpers return a pointer to a bio_vec, as before.
>
> *_rd() is used to iterate over each vector in order to read the
> pointed-to bvec; the caller may not write through it. This helper
> can still return singlepage bvecs as before, so one extra
> local/temporary 'bio_vec' variable has to be added for the
> conversion from a multipage bvec to a singlepage one.
>
> *_wt() is used to iterate over each vector in order to change the
> bvec, and is only allowed on bios with singlepage bvecs; there are
> just a few such cases, such as bio bounce, bio_alloc_pages(),
> raid1 and raid10.
>
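Usage of the *_rd() flavor might look something like this (the macro
signature is hypothetical; 'tmp' is the extra local bio_vec holding
the singlepage view of the current multipage vector):

	struct bio_vec tmp, *bvec;
	int i;

	bio_for_each_segment_all_rd(bvec, bio, i, tmp) {
		/* bvec points at tmp: a read-only, <= PAGE_SIZE view */
		flush_dcache_page(bvec->bv_page);
	}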
> 3) change to the bvecs of a cloned bio
>
> In cases such as bio bounce and raid1, one bio is cloned from the
> incoming bio, and each bvec of the cloned bio may be updated. We
> have to introduce a singlepage version of bio_clone() so that the
> cloned bio only includes singlepage bvecs; then the bvecs can be
> updated as before.
>
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source
> bio. One simple solution is to split the source bio and make sure
> its size can't be bigger than 1 Mbyte (256 single-page vectors).
>
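A sketch of that split-then-clone idea (bio_clone_sp() is a
hypothetical name for the singlepage clone variant):

	/*
	 * Cap the source bio at 256 single pages (1M with 4K pages)
	 * so the clone's bvec table is guaranteed to fit.
	 */
	if (bio_sectors(bio) > (BIO_MAX_PAGES << (PAGE_SHIFT - 9))) {
		struct bio *split = bio_split(bio,
				BIO_MAX_PAGES << (PAGE_SHIFT - 9),
				GFP_NOIO, fs_bio_set);

		bio_chain(split, bio);
		generic_make_request(bio);	/* requeue the remainder */
		bio = split;
	}
	clone = bio_clone_sp(bio, GFP_NOIO);	/* singlepage bvecs only */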
> 4) introduce bio_for_each_mp_segment()
>
> The bvec returned from this iterator helper will be a multipage
> bvec, i.e. the actual/real segment, so drivers may switch to this
> helper if they can handle multipage segments directly, which should
> be the common case.
>
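For example, a driver that can DMA large contiguous segments could
map each multipage bvec straight into one scatterlist entry (the
iterator name is the one proposed above; the scatterlist setup is
omitted):

	struct bio_vec bvec;
	struct bvec_iter iter;

	bio_for_each_mp_segment(bvec, bio, iter) {
		/* one sg entry per real (multipage) segment */
		sg_set_page(sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
		sg = sg_next(sg);
	}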
>
> [1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2
>
> Thanks,
> Ming Lei