All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Alasdair G Kergon <agk@redhat.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	linux-kernel@vger.kernel.org, Mike Snitzer <snitzer@redhat.com>,
	jens.axboe@oracle.com
Subject: Re: [dm-devel] [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks
Date: Sat, 6 Mar 2010 10:52:37 +1100	[thread overview]
Message-ID: <20100306105237.0a8621ed@notabene.brown> (raw)
In-Reply-To: <20100305222748.GT27852@agk-dp.fab.redhat.com>

On Fri, 5 Mar 2010 22:27:48 +0000
Alasdair G Kergon <agk@redhat.com> wrote:

> On Sat, Mar 06, 2010 at 08:56:51AM +1100, Neil Brown wrote:
> > My preferred end-game would be to allow a bio of any size to be submitted to
> > any device.  The device would be responsible for cutting it up if necessary.
> 
> >From the dm point of view, splitting is undesirable - memory allocations from
> separate mempools, submitting the split-off parts could reorder/delay but must
> still respect barrier constraints etc.  Splitting is the 'slow and complicated'
> path for us.  We support it, but it is simpler and more efficient if the bio is
> created a suitable size in the first place - and the merge_bvec_fn does this
> for us most of the time.

I hadn't thought about barriers in this context previously, but on reflection
I don't think splitting makes barriers noticeably more complex than they are
already.
If you have multiple devices, then you pretty much have to handle a barrier
as:
  - stall new requests
  - send zero length barrier to all devices
  - write the barrier request, either with the barrier flag set,
    or followed by a zero-length barrier

I don't think splitting adds complexity.
I guess if you have a single device, but still wont to split a request, there
might be some optimisations you have to forego...

But my first suggestion, that splitting could be made easier, still stands.
Maybe it doesn't have to be so complicated - then it wouldn't be so slow?

And do you honour merge_bvec_fn's of underlying devices?  A quick grep
suggests you do only for dm-linear and dm-crypt.  This suggests to me that it
is actually a hard interface to support completely in a stacked device, so we
might be better off without it.
In md, I check if merge_bvec_fn is defined and if it is, I just drop
max_sectors to 4K so it will never be needed.  I figure that a performance
drop is better than a correctness error.

Yes, splitting is undesirable, and if we can arrange that most requests don't
get split, then that is good.  But there will always be some requests that
have to be split - whether by bio_add_page or something lower - and I think
that it makes sense to only require that the lowest level does the
splitting, as that is where the specifics on what might be required exists.

Thanks,
NeilBrown

  reply	other threads:[~2010-03-05 23:52 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-27 17:35 [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks Dmitry Monakhov
2010-02-27 17:35 ` [PATCH 2/2] blktrace: perform cleanup after setup error Dmitry Monakhov
2010-02-28 18:46   ` Jens Axboe
2010-02-28 18:46 ` [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks Jens Axboe
2010-03-03  3:49   ` Dmitry Monakhov
2010-03-03  7:30     ` Jens Axboe
2010-03-03  8:39       ` Dmitry Monakhov
2010-03-03 12:21         ` Jens Axboe
2010-03-03 18:20     ` Mike Snitzer
2010-03-03 18:45       ` Dmitry Monakhov
2010-03-03 19:16         ` Mike Snitzer
2010-03-03 19:42           ` Dmitry Monakhov
2010-03-03 20:07             ` Jens Axboe
2010-03-04 11:47               ` [dm-devel] " Mikulas Patocka
2010-03-04 12:19                 ` Jens Axboe
2010-03-04 21:55                   ` Mikulas Patocka
2010-03-05  7:30                     ` Jens Axboe
2010-03-05  7:30                       ` [dm-devel] " Jens Axboe
2010-03-05 21:56                       ` Neil Brown
2010-03-05 22:27                         ` Alasdair G Kergon
2010-03-05 22:27                           ` Alasdair G Kergon
2010-03-05 23:52                           ` Neil Brown [this message]
2010-03-06  2:20                             ` Alasdair G Kergon
2010-03-08  9:01                       ` Mikulas Patocka
2010-03-04 17:59               ` Lars Ellenberg
2010-03-05 17:37                 ` Alasdair G Kergon
2010-03-05 19:20                   ` Lars Ellenberg
2010-03-08  5:43                   ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100306105237.0a8621ed@notabene.brown \
    --to=neilb@suse.de \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=dmonakhov@openvz.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.