Re: [LSF/MM ATTEND] block: multipage bvecs

From: Boaz Harrosh <boaz@plexistor.com>
To: Ming Lei <tom.leiming@gmail.com>, lsf-pc@lists.linuxfoundation.org
Cc: linux-block@vger.kernel.org,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [LSF/MM ATTEND] block: multipage bvecs
Date: Sun, 28 Feb 2016 13:17:43 +0200
Message-ID: <56D2D757.2000204@plexistor.com>
In-Reply-To: <CACVXFVPH2z9H9ZudEV9yirg2mJuz6f7Z4LB5MQrabFEh82Xa7A@mail.gmail.com>

On 02/26/2016 06:33 PM, Ming Lei wrote:
> Hi,
> 
> I'd like to participate in LSF/MM and discuss multipage bvecs.
> 
> Kent posted the idea[1] before, but never pushed it out.
> I have studied multipage bvecs for a while, and think
> they are a good way to improve the block subsystem.
> 
> Multipage bvecs means that one 'struct bio_vec' can hold
> multiple physically contiguous pages, instead of the
> single page used in the current kernel.
> 

Hi Ming Lei

This is an interesting talk for me.

I don't know if you ever tried it, but I did. If I take a regular
SSD or a PCIe flash card that I have in my machine, point a bvec
at a page, set bv_len = PAGE_SIZE * 8 and call submit_bio(), I get
8 pages' worth of IO with a single bvec, and it all just works.

Yes, yes, I know it would break a bunch of other places, and the
single-bvec case probably works better. I just want to say that the
current code is not that picky about assuming a single page size.
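
To make the experiment above concrete, here is a minimal userspace sketch
(not kernel code) of how many pages a single vector covers for a given
offset and length. The names sketch_bvec, sketch_bvec_nr_pages and
SKETCH_PAGE_SIZE are hypothetical, the 4K page size is an assumption, and
the page is modeled as a plain index rather than a struct page pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; real kernels use PAGE_SIZE, which varies by arch. */
#define SKETCH_PAGE_SIZE 4096u

/* Simplified, hypothetical stand-in for the kernel's struct bio_vec. */
struct sketch_bvec {
	size_t first_page;   /* index of the first page (the kernel stores a page pointer) */
	unsigned int len;    /* bytes covered, in the role of bv_len */
	unsigned int offset; /* byte offset into the first page, in the role of bv_offset */
};

/* Number of pages touched by one vector, rounding up at both ends. */
static unsigned int sketch_bvec_nr_pages(const struct sketch_bvec *bv)
{
	assert(bv->len > 0);
	assert(bv->offset < SKETCH_PAGE_SIZE);
	return (bv->offset + bv->len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With len = SKETCH_PAGE_SIZE * 8 and offset = 0 the helper reports 8 pages,
which is the "big bvec" case described above; a non-zero offset with the
same length spills into a ninth page.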

I would like to see an audit and test cases done in this regard,
but keeping the current API and making this transparent. I think
that all the places you mention below can be made transparent
to "big bvecs" if coded carefully, and there need not be a separate
API for multi-page / single-page bvecs. It should all just work.
I might be wrong, as I have not looked at this deeply, but my gut
feeling is that it is possible.

Thanks for bringing up the issue
Boaz

> IMO, there are several advantages by supporting multipage bvecs:
> 
> - currently one bio from bio_alloc() can hold at most 256
> vectors, which means one bio can transfer at most
> 1 MiB (256 * 4K). With multipage bvecs a fs can submit bigger
> chunks via a single bio, because big physically contiguous
> segments are very common.
> 
> - CPU consumed in iterating bvec table should be decreased
> 
> - block merge gets simplified a lot: segments can be merged
> right inside bio_add_page(), so singlepage bvecs need not be stored
> in the bvec table, and the segment can finally be split for the
> driver at the proper size. blk_bio_map_sg() gets simplified too.
> Recently block merge has become a bit complicated, and we have seen
> quite a few bug reports/fixes in it.
> 
> I'd like to hear opinions from fs guys about multipage bvecs based bio
> because this should bring up some change to the bio interface(one bio
> will represent bigger I/O than before).
> 
> Also I hope to discuss with guys in fs, dm, md, bcache... about
> the implementation because this feature will bring changes on
> these subsystems. So far, I have the following ideas:
> 
> 1) change on bio_for_each_segment()
> 
> The bvec returned from this iterator helper needs to stay a singlepage
> vector as before, so most users of the bio iterator need no change.
> 
> 2) change on bio_for_each_segment_all()
> 
> bio_for_each_segment_all() has to be changed because callers may
> change the bvec and assume it is always singlepage now.
> 
> I think bio_for_each_segment_all() needs to be split into
> bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().
> 
> Both new helpers return a pointer to a bio_vec like before.
> 
> *_rd() is used to iterate over each vector for reading the pointed-to
> bvec, and the caller cannot write to this vector. This helper can still
> return a singlepage bvec like before, so one extra local/temp 'bio_vec'
> variable has to be added for conversion from multipage bvec to
> singlepage bvec.
> 
> *_wt() is used to iterate each vector for changing the bvec, and
> only allowed for iterating bio with singlepage bvecs, there are
> just several such cases, such as bio bounce, bio_alloc_pages(),
> raid1 and raid10.
> 
> 3) change bvecs of cloned bio
> Such as bio bounce and raid1, one bio is cloned from the incoming
> bio, and each bvec of the cloned bio may be updated. We have to
> introduce singlepage version of bio_clone() to make the cloned bio
> only include singlepage bvec, then the bvecs can be updated like
> before.
> 
> One problem is that the cloned bio may not be able to hold all the
> singlepage bvecs converted from the multipage bvecs in the source bio;
> one simple solution is to split the source bio so that its size is
> never bigger than 1 MiB (256 singlepage vectors).
> 
> 4) introduce bio_for_each_mp_segment()
> 
> bvec returned from this iterator helper will become multipage bvec
> which should be the actual/real segment, so drivers may switch to
> this helper if they can handle multipage segment directly, which
> should be common case.
> 
> 
> [1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2
> 
> Thanks,
> Ming Lei
>
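
Points 1) and 3) of the quoted proposal (handing out singlepage vectors
from a multipage one, and the 256-vector ceiling on a singlepage clone)
can be sketched in userspace as below. This is only an illustration under
stated assumptions, not the proposed kernel implementation: page_vec,
split_to_single_pages, SKETCH_PAGE_SIZE and SKETCH_MAX_VECS are
hypothetical names, pages are modeled as consecutive indices, and a 4K
page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4K pages */
#define SKETCH_MAX_VECS  256u  /* per-bio vector limit quoted in the proposal */

/* Hypothetical stand-in for a bvec; 'page' is an index, not a struct page *. */
struct page_vec {
	size_t page;
	unsigned int len;
	unsigned int offset;
};

/*
 * Split one multipage vector into singlepage vectors.
 * Returns the number of pieces written to 'out', or -1 if more than
 * 'max' pieces would be needed (the case where the source bio would
 * have to be split first).
 */
static int split_to_single_pages(const struct page_vec *mp,
				 struct page_vec *out, unsigned int max)
{
	size_t page = mp->page;
	unsigned int offset = mp->offset;
	unsigned int remaining = mp->len;
	unsigned int n = 0;

	assert(mp->offset < SKETCH_PAGE_SIZE);

	while (remaining > 0) {
		/* Bytes left in the current page, capped by what remains. */
		unsigned int room = SKETCH_PAGE_SIZE - offset;
		unsigned int chunk = remaining < room ? remaining : room;

		if (n == max)
			return -1;
		out[n].page = page;
		out[n].len = chunk;
		out[n].offset = offset;
		n++;

		remaining -= chunk;
		page++;     /* physically contiguous: next page follows directly */
		offset = 0; /* only the first piece can start mid-page */
	}
	return (int)n;
}
```

An 8-page vector yields 8 singlepage pieces, while a vector covering 257
pages does not fit in 256 slots and returns -1, which corresponds to the
situation where the source bio has to be split before cloning.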
