* [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad
@ 2018-01-29 21:33 Mike Snitzer
2018-02-02 6:19 ` [dm-devel] " NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2018-01-29 21:33 UTC (permalink / raw)
To: linux-block, dm-devel
I'd like to enable bio-based DM to _not_ need to clone bios. But to do
so each bio-based DM target's required per-bio-data would need to be
provided by upper layer biosets (as opposed to the bioset DM currently
creates).
So my thinking is that all system-level biosets (e.g. fs_bio_set,
blkdev_dio_pool) would redirect to a device specific variant bioset IFF
the underlying device advertises the need for a specific per-bio-data
payload to be provided.
I know this _could_ become a rathole but I'd like to avoid reverting DM
back to the days of having to worry about managing mempools for the
purpose of per-io allocations. I've grown spoiled by the performance
and elegance that comes with having the bio and per-bio-data allocated
from the same bioset.
Thoughts?
Mike
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [dm-devel] [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad
2018-01-29 21:33 [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad Mike Snitzer
@ 2018-02-02 6:19 ` NeilBrown
2018-02-02 16:08 ` Mike Snitzer
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2018-02-02 6:19 UTC (permalink / raw)
To: Mike Snitzer, linux-block, dm-devel
[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]
On Mon, Jan 29 2018, Mike Snitzer wrote:
> I'd like to enable bio-based DM to _not_ need to clone bios. But to do
> so each bio-based DM target's required per-bio-data would need to be
> provided by upper layer biosets (as opposed to the bioset DM currently
> creates).
>
> So my thinking is that all system-level biosets (e.g. fs_bio_set,
> blkdev_dio_pool) would redirect to a device specific variant bioset IFF
> the underlying device advertises the need for a specific per-bio-data
> payload to be provided.
>
> I know this _could_ become a rathole but I'd like to avoid reverting DM
> back to the days of having to worry about managing mempools for the
> purpose of per-io allocations. I've grown spoiled by the performance
> and elegance that comes with having the bio and per-bio-data allocated
> from the same bioset.
>
> Thoughts?
md/raid0 remaps each bio and passes it directly down to one of several
devices.
I think your scheme would mean that it would need to clone each bio to
make sure it is from the correctly sized pool.
I suspect it could be made to work though.
1/ have a way for the driver receiving a bio to discover how much
frontpad was allocated.
2/ require drivers to accept bios with any size of frontpad, but a
fast-path is taken if it is already big enough.
3/ allow a block device to advertise it's preferred frontpad.
4/ make sure your config-change-notification mechanism can communicate
changes to this number.
5/ gather statistics on what percentage of bios have a too-small
frontpad.
Then start modifying places that allocate bios to use the hint,
and when benchmarks show the percentage is high - use it to encourage
other people to allocate better bios.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad
2018-02-02 6:19 ` [dm-devel] " NeilBrown
@ 2018-02-02 16:08 ` Mike Snitzer
2018-02-02 22:55 ` Mike Snitzer
0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2018-02-02 16:08 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-block, dm-devel
On Fri, Feb 02 2018 at 1:19am -0500,
NeilBrown <neilb@suse.com> wrote:
> On Mon, Jan 29 2018, Mike Snitzer wrote:
>
> > I'd like to enable bio-based DM to _not_ need to clone bios. But to do
> > so each bio-based DM target's required per-bio-data would need to be
> > provided by upper layer biosets (as opposed to the bioset DM currently
> > creates).
> >
> > So my thinking is that all system-level biosets (e.g. fs_bio_set,
> > blkdev_dio_pool) would redirect to a device specific variant bioset IFF
> > the underlying device advertises the need for a specific per-bio-data
> > payload to be provided.
> >
> > I know this _could_ become a rathole but I'd like to avoid reverting DM
> > back to the days of having to worry about managing mempools for the
> > purpose of per-io allocations. I've grown spoiled by the performance
> > and elegance that comes with having the bio and per-bio-data allocated
> > from the same bioset.
> >
> > Thoughts?
>
> md/raid0 remaps each bio and passes it directly down to one of several
> devices.
> I think your scheme would mean that it would need to clone each bio to
> make sure it is from the correctly sized pool.
Not sure why the md/raid0 device would need to do anything (not unless
it wanted to take advantage of this).
The model, in my head, would be that this is _not_ intended for an
arbitrary stacked MD or DM device. So the underlying device(s) for
a DM or MD device, that wants to leverage upper layer bio_set provided
front_pad, would be a real device (e.g. NVMe, SCSI, whatever).
But if a mix of underlying drivers were used (each with unique
per-bio-data, aka front_pad, requirements) then the blk_stack_limits()
interface _could_ build that information up. And while that pretty much
gets us all we'd need to support md/raid0 on arbitrary stacked DM or MD
volumes that isn't what I'm interested in. But supporting an arbitrary
stacked device could be done as follow-on work.
I'm specifically looking to optimize DM's new DM_TYPE_NVME_BIO_BASED
device (see commits 22c11858e, 978e51ba3, cd02538445, etc) so that bios
are just remapped to the underlying NVMe device without cloning. DM
core now takes care to validate that a DM_TYPE_NVME_BIO_BASED DM device
(e.g. a DM multipath device -- via "queue_mode nvme") is _only_ stacked
directly ontop of native NVMe devices.
> I suspect it could be made to work though.
>
> 1/ have a way for the driver receiving a bio to discover how much
> frontpad was allocated.
Yes
> 2/ require drivers to accept bios with any size of frontpad, but a
> fast-path is taken if it is already big enough.
Yes and no, the driver really should be able to trust that the block
layer is sending it bios with adeuqate front_pad (if the device
registered its front_pad requirements).
That said, making it optional (via "hint") is likely safer from the
standpoint that it is less cut-throat given we'd be depending on the
various biosets to respect our wishes. I just worry that having the
code for falling back to cloning, if the front_pad isn't adequate, will
defeat much of the benefit of optimizing what is intended to be a faster
fast path.
But if the bioset enhancements are implemented properly then the kernels
N biosets shouldn't need to be in doubt. They'll all just adapt to have
N backing mempools (N being for N conflicting front_pad requirements).
> 3/ allow a block device to advertise it's preferred frontpad.
s/preferred/required/ in my mental model.
> 4/ make sure your config-change-notification mechanism can communicate
> changes to this number.
You're referring to he notifier chain idea about restacking
queue_limits? Yes, that'd be needed once that exists.
> 5/ gather statistics on what percentage of bios have a too-small
> frontpad.
>
> Then start modifying places that allocate bios to use the hint,
> and when benchmarks show the percentage is high - use it to encourage
> other people to allocate better bios.
This shouldn't be needed if we were to go the route where bio_sets
dynamically select the appropriate mempool based on the front_pad
requirements advertised by the underlying device ("3/" above).
Thanks,
Mike
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad
2018-02-02 16:08 ` Mike Snitzer
@ 2018-02-02 22:55 ` Mike Snitzer
0 siblings, 0 replies; 4+ messages in thread
From: Mike Snitzer @ 2018-02-02 22:55 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-block, dm-devel
On Fri, Feb 02 2018 at 11:08am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
>
> But if the bioset enhancements are implemented properly then the kernels
> N biosets shouldn't need to be in doubt. They'll all just adapt to have
> N backing mempools (N being for N conflicting front_pad requirements).
This should've read:
"But if the bioset enhancements are implemented properly then the kernels
N biosets ability to provide adequate front_pad shouldn't need to be in
doubt. They'll all just adapt to have M backing mempools (for M
conflicting front_pad requirements)."
What this implies is there would need to be a way for the bioset
code to maintain a global graph of all biosets in the system. And when
a device comes along with a unique bioset front_pad requirement, that
isn't already met by existing mempool, the device's driver (DM in
my case) would call into a bioset interface that would add a new backing
mempool, that accounts for the front_pad increase, to each bioset in the
system.
Not liking that (DM) device creation would potentially spawn a new
mempool within each existing bioset. It could/would easily result in
many of those mempools going completely unused.
In addition: how would a bio_alloc_bioset() call _know_ the bio was for
use on a specific block device? The entire beauty of the existing
bio_set code, especially for upper layers like filesystems, is it _is_
device agnostic.
So all this could be the worst idea ever.. not sure. I've deferred
judging it one way or the other because the details are shakey at best.
And I still need to look closer at all the existing code.
Mike
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-02-02 22:55 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-29 21:33 [LSF/MM TOPIC] block: extend generic biosets to allow per-device frontpad Mike Snitzer
2018-02-02 6:19 ` [dm-devel] " NeilBrown
2018-02-02 16:08 ` Mike Snitzer
2018-02-02 22:55 ` Mike Snitzer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).