linux-fsdevel.vger.kernel.org archive mirror
* [LSF/MM ATTEND] block: multipage bvecs
@ 2016-02-26 16:33 Ming Lei
  2016-02-28 11:17 ` Boaz Harrosh
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Ming Lei @ 2016-02-26 16:33 UTC
  To: lsf-pc; +Cc: linux-block, Linux FS Devel

Hi,

I'd like to participate in LSF/MM and discuss multipage bvecs.

Kent posted the idea [1] before, but the work was never pushed forward.
I have studied multipage bvecs for a while, and I think they are
a good way to improve the block subsystem.

Multipage bvecs means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the single
page it holds in the current kernel.
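To make the definition concrete, here is a minimal userspace model,
assuming a page can be named by a physical frame number (PFN).
`struct model_bvec` and `model_bvec_nr_pages()` are hypothetical
stand-ins that loosely mirror the bv_len/bv_offset fields of the
kernel's struct bio_vec; this is not kernel code.

```c
#include <assert.h>

/* Userspace model, not kernel code: a page is named by a hypothetical
 * physical frame number (PFN) instead of a struct page pointer. */
#define MODEL_PAGE_SIZE 4096u

struct model_bvec {
	unsigned long bv_pfn;   /* first physical page of the segment */
	unsigned int bv_len;    /* length in bytes; may now exceed one page */
	unsigned int bv_offset; /* byte offset into the first page */
};

/* Number of physical pages touched by a (possibly multipage) bvec.
 * In the singlepage world this is always 1. */
static unsigned int model_bvec_nr_pages(const struct model_bvec *bv)
{
	unsigned int end = bv->bv_offset + bv->bv_len; /* exclusive */

	assert(bv->bv_len > 0 && bv->bv_offset < MODEL_PAGE_SIZE);
	/* bv_offset lies in page 0, so the count is ceil(end / page size). */
	return (end + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

For example, a 4K bvec starting at offset 512 touches two pages,
while a page-aligned 16K bvec touches four.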

IMO, supporting multipage bvecs has several advantages:

- currently, one bio from bio_alloc() can hold at most 256
vectors, which means one bio can transfer at most 1 MiB
(256 * 4K). With multipage bvecs, a filesystem can submit a bigger
chunk via a single bio, because large physically contiguous
segments are very common.

- the CPU time spent iterating the bvec table should decrease

- block merge gets much simpler: segments can be merged directly
inside bio_add_page(), so each page no longer needs its own entry
in the bvec table, and the segment can finally be split into
pieces of the proper size for the driver. blk_bio_map_sg() gets
simpler too. Recently, block merge has become rather complicated,
and we have seen quite a few bug reports/fixes in block merge.
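The first and third points can be illustrated with a toy model. The
sketch assumes a hypothetical `model_bio_add_page()` that merges a
page into the last bvec when it physically continues it; it is not
the real bio_add_page(), and it ignores queue limits such as the
maximum segment size, which the later split step is meant to enforce.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS 256u /* the 256-vector limit of one bio */

struct model_bvec {
	unsigned long bv_pfn;   /* hypothetical physical frame number */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	unsigned int vcnt;
	bool multipage; /* false: classic one-page-per-bvec behaviour */
};

/* Hypothetical stand-in for bio_add_page(): add a page fragment,
 * merging it into the last bvec when it continues that bvec physically.
 * Returns false when the bvec table is full. */
static bool model_bio_add_page(struct model_bio *bio, unsigned long pfn,
			       unsigned int len, unsigned int offset)
{
	assert(offset + len <= MODEL_PAGE_SIZE); /* fragment fits its page */
	if (bio->multipage && bio->vcnt > 0) {
		struct model_bvec *last = &bio->vecs[bio->vcnt - 1];
		/* Physical byte address just past the end of the last bvec. */
		unsigned long long last_end =
			(unsigned long long)last->bv_pfn * MODEL_PAGE_SIZE +
			last->bv_offset + last->bv_len;
		unsigned long long next =
			(unsigned long long)pfn * MODEL_PAGE_SIZE + offset;

		if (last_end == next) {
			last->bv_len += len; /* merged: no new table entry */
			return true;
		}
	}
	if (bio->vcnt == MODEL_MAX_VECS)
		return false;
	bio->vecs[bio->vcnt++] = (struct model_bvec){ pfn, len, offset };
	return true;
}

/* Add up to nr_pages physically contiguous full pages starting at pfn;
 * returns the number of bytes the bio accepted. */
static unsigned long long model_fill(struct model_bio *bio,
				     unsigned long pfn, unsigned int nr_pages)
{
	unsigned long long bytes = 0;

	for (unsigned int i = 0; i < nr_pages; i++) {
		if (!model_bio_add_page(bio, pfn + i, MODEL_PAGE_SIZE, 0))
			break;
		bytes += MODEL_PAGE_SIZE;
	}
	return bytes;
}
```

Filling 512 contiguous 4K pages shows the difference: the singlepage
bio stops at 256 vectors (1 MiB), while the multipage bio holds all
2 MiB in a single vector.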

I'd like to hear opinions from fs guys about bios based on multipage
bvecs, because this will bring some changes to the bio interface
(one bio will represent a bigger I/O than before).

I also hope to discuss the implementation with guys from fs, dm, md,
bcache..., because this feature will bring changes to these
subsystems. So far, I have the following ideas:

1) change to bio_for_each_segment()

The bvec returned by this iterator helper needs to stay a singlepage
vector as before, so most users of the bio iterator don't need to change.
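A rough sketch of how the iterator could keep handing out singlepage
bvecs while the table stores multipage ones. `model_sp_bvec()`,
`struct model_iter` and the `model_for_each_segment()` macro are
hypothetical names for this userspace illustration; a real helper
would presumably be built on the kernel's existing bvec iterator.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u

struct model_bvec {
	unsigned long bv_pfn;   /* first physical page (hypothetical PFN) */
	unsigned int bv_len;    /* bytes, may span several pages */
	unsigned int bv_offset; /* offset into the first page */
};

struct model_iter {
	unsigned int idx;  /* current multipage bvec in the table */
	unsigned int done; /* bytes of it already returned */
};

/* Singlepage view of multipage bvec `mp`, starting `done` bytes into it.
 * The result never crosses a page boundary. */
static struct model_bvec model_sp_bvec(const struct model_bvec *mp,
				       unsigned int done)
{
	unsigned int pos = mp->bv_offset + done;
	unsigned int in_page = pos % MODEL_PAGE_SIZE;
	unsigned int left = mp->bv_len - done;
	unsigned int room = MODEL_PAGE_SIZE - in_page;
	struct model_bvec sp;

	assert(done < mp->bv_len);
	sp.bv_pfn = mp->bv_pfn + pos / MODEL_PAGE_SIZE;
	sp.bv_len = left < room ? left : room;
	sp.bv_offset = in_page;
	return sp;
}

static void model_iter_advance(struct model_iter *it,
			       const struct model_bvec *table,
			       unsigned int len)
{
	it->done += len;
	if (it->done == table[it->idx].bv_len) { /* this bvec is finished */
		it->idx++;
		it->done = 0;
	}
}

/* Hypothetical model of the proposed bio_for_each_segment(): the table
 * holds multipage bvecs, but the loop body only sees singlepage bvecs. */
#define model_for_each_segment(sp, table, nr, it)                      \
	for ((it).idx = 0, (it).done = 0;                              \
	     (it).idx < (nr) &&                                        \
	     ((sp) = model_sp_bvec(&(table)[(it).idx], (it).done), 1); \
	     model_iter_advance(&(it), (table), (sp).bv_len))
```

A three-page bvec starting at offset 100 comes out as four singlepage
bvecs (3996 + 4096 + 4096 + 100 bytes), which is what per-page
callers already expect.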

2) change to bio_for_each_segment_all()

bio_for_each_segment_all() has to be changed because callers may
modify the bvec and currently assume it is always singlepage.

I think bio_for_each_segment_all() needs to be split into
bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().

Both new helpers return a pointer to a bio_vec, as before.

*_rd() iterates over each vector for reading the bvec it points to,
and the caller must not write to this vector. This helper can still
return a singlepage bvec as before, so one extra local/temporary
'bio_vec' variable has to be added to convert a multipage bvec
into singlepage bvecs.

*_wt() iterates over each vector for modifying the bvec, and is
only allowed on bios with singlepage bvecs. There are just a few
such cases: bio bounce, bio_alloc_pages(), raid1 and raid10.
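A sketch of the two proposed helpers in the same userspace model. All
names here are hypothetical; the point is the contract: the `_rd`
variant yields a const pointer to a temporary singlepage copy (the
extra local variable mentioned above), while the `_wt` variant yields
writable pointers straight into the table and checks that every bvec
is singlepage.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u

struct model_bvec {
	unsigned long bv_pfn;   /* hypothetical physical frame number */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int idx;
	unsigned int done;
};

/* Singlepage piece of a multipage bvec, starting `done` bytes into it. */
static struct model_bvec model_sp_bvec(const struct model_bvec *mp,
				       unsigned int done)
{
	unsigned int pos = mp->bv_offset + done;
	unsigned int room = MODEL_PAGE_SIZE - pos % MODEL_PAGE_SIZE;
	unsigned int left = mp->bv_len - done;
	struct model_bvec sp = { mp->bv_pfn + pos / MODEL_PAGE_SIZE,
				 left < room ? left : room,
				 pos % MODEL_PAGE_SIZE };
	return sp;
}

static void model_iter_advance(struct model_iter *it,
			       const struct model_bvec *table,
			       unsigned int len)
{
	it->done += len;
	if (it->done == table[it->idx].bv_len) {
		it->idx++;
		it->done = 0;
	}
}

static bool model_table_is_singlepage(const struct model_bvec *t,
				      unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		if (t[i].bv_offset + t[i].bv_len > MODEL_PAGE_SIZE)
			return false;
	return true;
}

/* _rd: `bvp` points at the local copy `tmp`, so callers read singlepage
 * bvecs but cannot modify the bio's table through the pointer. */
#define model_for_each_segment_all_rd(bvp, tmp, table, nr, it)         \
	for ((it).idx = 0, (it).done = 0;                              \
	     (it).idx < (nr) &&                                        \
	     ((tmp) = model_sp_bvec(&(table)[(it).idx], (it).done),    \
	      (bvp) = &(tmp), 1);                                      \
	     model_iter_advance(&(it), (table), (tmp).bv_len))

/* _wt: writable pointers into the table itself, which is only safe when
 * every bvec is already singlepage (bounce, raid1, raid10, ...). */
#define model_for_each_segment_all_wt(bvp, table, nr, i)               \
	for (assert(model_table_is_singlepage((table), (nr))), (i) = 0; \
	     (i) < (nr) && ((bvp) = &(table)[(i)], 1); (i)++)
```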

3) change to the bvecs of a cloned bio

In cases such as bio bounce and raid1, a bio is cloned from the
incoming bio, and each bvec of the cloned bio may be updated. We have
to introduce a singlepage version of bio_clone() so that the cloned
bio only includes singlepage bvecs; then the bvecs can be updated as
before.
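One way the singlepage clone could work, again as a userspace sketch:
`model_clone_singlepage()` is a hypothetical helper, not an existing
kernel function. It expands every multipage bvec of the source into
per-page bvecs and fails when the clone's table is too small, which is
the problem raised in the following paragraph.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS 256u

struct model_bvec {
	unsigned long bv_pfn;   /* hypothetical physical frame number */
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical singlepage variant of bio_clone(): expand each multipage
 * bvec of `src` into per-page bvecs in `dst`. Returns the number of bvecs
 * written, or -1 if they do not fit into `max` entries. */
static int model_clone_singlepage(const struct model_bvec *src,
				  unsigned int nr, struct model_bvec *dst,
				  unsigned int max)
{
	unsigned int out = 0;

	for (unsigned int i = 0; i < nr; i++) {
		unsigned int done = 0;

		assert(src[i].bv_len > 0);
		while (done < src[i].bv_len) {
			unsigned int pos = src[i].bv_offset + done;
			unsigned int room = MODEL_PAGE_SIZE - pos % MODEL_PAGE_SIZE;
			unsigned int left = src[i].bv_len - done;

			if (out == max)
				return -1; /* clone cannot hold all pieces */
			dst[out].bv_pfn = src[i].bv_pfn + pos / MODEL_PAGE_SIZE;
			dst[out].bv_offset = pos % MODEL_PAGE_SIZE;
			dst[out].bv_len = left < room ? left : room;
			done += dst[out].bv_len;
			out++;
		}
	}
	return (int)out;
}
```

Because the clone's bvecs are singlepage, they can then be rewritten
in place (for example by bounce) exactly as today.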

One problem is that the cloned bio may not be able to hold all the
singlepage bvecs converted from the multipage bvecs in the source bio.
One simple solution is to split the source bio and make sure its size
can't exceed 1 MiB (256 singlepage vectors).
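The split point can be computed by counting how many singlepage
vectors each multipage bvec would expand into. A minimal sketch with a
hypothetical helper, `model_sp_split_bytes()`: for page-aligned data
the result is the 1 MiB bound stated above, while a buffer that starts
mid-page crosses one more page boundary and so fits slightly less.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS 256u

struct model_bvec {
	unsigned long bv_pfn;   /* hypothetical physical frame number */
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical helper: number of bytes from the start of a multipage
 * bvec table that fit into at most `max_vecs` singlepage bvecs. The
 * source bio would be split at this point before being cloned. */
static unsigned long long model_sp_split_bytes(const struct model_bvec *t,
					       unsigned int nr,
					       unsigned int max_vecs)
{
	unsigned long long bytes = 0;
	unsigned int used = 0;

	assert(max_vecs > 0);
	for (unsigned int i = 0; i < nr; i++) {
		unsigned int done = 0;

		while (done < t[i].bv_len) {
			unsigned int pos = t[i].bv_offset + done;
			unsigned int room = MODEL_PAGE_SIZE - pos % MODEL_PAGE_SIZE;
			unsigned int left = t[i].bv_len - done;
			unsigned int piece = left < room ? left : room;

			if (used == max_vecs)
				return bytes; /* split here */
			used++;
			done += piece;
			bytes += piece;
		}
	}
	return bytes; /* whole bio fits, no split needed */
}
```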

4) introduce bio_for_each_mp_segment()

The bvec returned by this iterator helper will be a multipage bvec,
i.e. the actual/real segment, so drivers may switch to this helper if
they can handle multipage segments directly, which should be the
common case.
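Continuing the same userspace model, a driver that can take a
multipage segment directly would build one entry per bvec.
`struct model_sg`, `model_for_each_mp_segment()` and `model_map_sg()`
are hypothetical; a real driver would still have to respect its own
segment-size limits.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u

struct model_bvec {
	unsigned long bv_pfn;   /* hypothetical physical frame number */
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical scatter-gather entry of a driver that accepts
 * multipage segments. */
struct model_sg {
	unsigned long long addr; /* physical byte address */
	unsigned int len;
};

/* Model of the proposed bio_for_each_mp_segment(): the loop body sees
 * each bvec as stored, i.e. one real (possibly multipage) segment. */
#define model_for_each_mp_segment(bvp, table, nr, i) \
	for ((i) = 0; (i) < (nr) && ((bvp) = &(table)[(i)], 1); (i)++)

/* Map every real segment to one sg entry; returns the number of entries,
 * or -1 if `max` entries are not enough. */
static int model_map_sg(const struct model_bvec *table, unsigned int nr,
			struct model_sg *sg, unsigned int max)
{
	const struct model_bvec *bv;
	unsigned int i, n = 0;

	model_for_each_mp_segment(bv, table, nr, i) {
		assert(bv->bv_len > 0);
		if (n == max)
			return -1;
		sg[n].addr = (unsigned long long)bv->bv_pfn * MODEL_PAGE_SIZE +
			     bv->bv_offset;
		sg[n].len = bv->bv_len;
		n++;
	}
	return (int)n;
}
```

Here a 16K physically contiguous segment becomes a single sg entry,
where the singlepage iterator would have produced four.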


[1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2

Thanks,
Ming Lei


end of thread

Thread overview: 24+ messages
2016-02-26 16:33 [LSF/MM ATTEND] block: multipage bvecs Ming Lei
2016-02-28 11:17 ` Boaz Harrosh
2016-02-28 14:34   ` Ming Lei
2016-02-28 14:41     ` Ming Lei
2016-02-28 16:01   ` [Lsf-pc] " James Bottomley
2016-02-29  9:41     ` Boaz Harrosh
2016-02-28 16:08   ` Christoph Hellwig
2016-02-29 10:16     ` Boaz Harrosh
2016-02-29 15:46       ` James Bottomley
2016-02-28 16:07 ` Christoph Hellwig
2016-02-28 16:26   ` James Bottomley
2016-02-28 16:29     ` Christoph Hellwig
2016-02-28 16:45       ` James Bottomley
2016-02-28 16:59         ` Ming Lei
2016-02-28 17:09           ` James Bottomley
2016-02-28 18:49             ` Ming Lei
2016-03-03  8:58 ` Christoph Hellwig
2016-03-03 11:04   ` Ming Lei
2016-03-03 12:11     ` Christoph Hellwig
2016-03-03 23:49       ` Ming Lin
2016-03-07  8:44       ` Ming Lei
2016-03-21 15:55         ` Christoph Hellwig
2016-03-22  0:12           ` Ming Lei
2016-03-05  8:35 ` Ming Lei
