All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org, Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH 1/2] blkdev: fix merge_bvec_fn return value	checks
Date: Fri, 5 Mar 2010 08:30:59 +0100	[thread overview]
Message-ID: <20100305073059.GI5768@kernel.dk> (raw)
In-Reply-To: <Pine.LNX.4.64.1003041615120.17032@hs20-bc2-1.build.redhat.com>

On Thu, Mar 04 2010, Mikulas Patocka wrote:
> > > I think it should be redesigned in this way:
> > > * the bio should be allowed to have arbitrary size, up to BIO_MAX_PAGES
> > > * the code for splitting bios should be just at one place --- in the 
> > > generic block layer and it should work with all drivers. I.e.
> > > q->make_request_fn will return that the request is too big for the given 
> > > driver and the generic code will allocate few another bios from 
> > > per-request_queue mempool and split the bio.
> > > * then, we would fix the raid1 bug and we would also simplify md and dm 
> > > --- remove the splitting code.
> > 
> > The design certainly isn't perfect, but neither is the one that you
> > suggest. For instance, what would the point of building bigger bios just
> > to split them _everytime_ be? That's not good design, and it's certainly
> > not helping performance much.
> 
> The point for building a big bio and splitting it is code size reduction.
> 
> You must realize that you need to split the request anyway. If the caller 
> needs to read/write M sectors and the device can process at most N 
> sectors, where N<M, then the i/o must be split. You can't get rid of that 
> split.

If we consider the basic (and mosted used) file system IO path, then
those 'M' sectors will usually be submitted in units that are smaller
than 'N' anyway. It would be silly to allocate a bio for 'N' sectors
(those get big), only to split it up again shortly.

The point is not to build it too big to begin with, and keep the
splitting as the slow path. That's how it was originally designed, and
it'll surely stay that way. Doing the split handling in generic code is
a good idea, it definitely should be. But I'm never going to make
splitting a normal part of IO operations just because we allow
arbitrarily large bios, sorry but that's insane.

> Now, the question is, where this splitting happens. Currently, the 
> splitting happens in the issuers of the bio (for example dm-io, which is a 
> big chunk of code written just to work around the uncertain bio size 
> limits) as well as in the consumers of the bio (md, dm and request-based 
> drivers, if their maximum transfer size < PAGE_SIZE).
> 
> What I'm proposing is to put that bio splitting just in one place and 
> remove it from both the issuers and the consumers of the bio.

Agree on that, we can make that generic (and should).

> There is just one case, where it is not desirable to create larger 
> requests than the physical transfer size because of performance --- that 
> is page cache readahead. That could use the current approach of querying 
> the queue for limits and not stuffing more data than the device can 
> handle.
> 
> In all the other (non-readahead) cases, the caller is waiting for all the 
> data anyway, there is no performance problem with creating a larger 
> requests and splitting them.

But this I still think is crazy, sorry.

> > Alasdair had an idea for doing a map/unmap like interface instead, which
> > I think is a lot better. I don't think he ever coded it up, though.
> > Perhaps talk to him, if you want to improve our situation in this area.
> 
> It would increase code size, not reduce it.

What kind of answer is that?

> BTW. see fs/bio.c:bio_split --- it uses a shared mempool. It can deadlock 
> if the devices depend on each other. You need to use per-queue mempool.

Only if you need to split the same original bio twice on the same IO
path.

-- 
Jens Axboe

WARNING: multiple messages have this Message-ID (diff)
From: Jens Axboe <jens.axboe@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org, Mike Snitzer <snitzer@redhat.com>
Subject: Re: [dm-devel] [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks
Date: Fri, 5 Mar 2010 08:30:59 +0100	[thread overview]
Message-ID: <20100305073059.GI5768@kernel.dk> (raw)
In-Reply-To: <Pine.LNX.4.64.1003041615120.17032@hs20-bc2-1.build.redhat.com>

On Thu, Mar 04 2010, Mikulas Patocka wrote:
> > > I think it should be redesigned in this way:
> > > * the bio should be allowed to have arbitrary size, up to BIO_MAX_PAGES
> > > * the code for splitting bios should be just at one place --- in the 
> > > generic block layer and it should work with all drivers. I.e.
> > > q->make_request_fn will return that the request is too big for the given 
> > > driver and the generic code will allocate few another bios from 
> > > per-request_queue mempool and split the bio.
> > > * then, we would fix the raid1 bug and we would also simplify md and dm 
> > > --- remove the splitting code.
> > 
> > The design certainly isn't perfect, but neither is the one that you
> > suggest. For instance, what would the point of building bigger bios just
> > to split them _everytime_ be? That's not good design, and it's certainly
> > not helping performance much.
> 
> The point for building a big bio and splitting it is code size reduction.
> 
> You must realize that you need to split the request anyway. If the caller 
> needs to read/write M sectors and the device can process at most N 
> sectors, where N<M, then the i/o must be split. You can't get rid of that 
> split.

If we consider the basic (and mosted used) file system IO path, then
those 'M' sectors will usually be submitted in units that are smaller
than 'N' anyway. It would be silly to allocate a bio for 'N' sectors
(those get big), only to split it up again shortly.

The point is not to build it too big to begin with, and keep the
splitting as the slow path. That's how it was originally designed, and
it'll surely stay that way. Doing the split handling in generic code is
a good idea, it definitely should be. But I'm never going to make
splitting a normal part of IO operations just because we allow
arbitrarily large bios, sorry but that's insane.

> Now, the question is, where this splitting happens. Currently, the 
> splitting happens in the issuers of the bio (for example dm-io, which is a 
> big chunk of code written just to work around the uncertain bio size 
> limits) as well as in the consumers of the bio (md, dm and request-based 
> drivers, if their maximum transfer size < PAGE_SIZE).
> 
> What I'm proposing is to put that bio splitting just in one place and 
> remove it from both the issuers and the consumers of the bio.

Agree on that, we can make that generic (and should).

> There is just one case, where it is not desirable to create larger 
> requests than the physical transfer size because of performance --- that 
> is page cache readahead. That could use the current approach of querying 
> the queue for limits and not stuffing more data than the device can 
> handle.
> 
> In all the other (non-readahead) cases, the caller is waiting for all the 
> data anyway, there is no performance problem with creating a larger 
> requests and splitting them.

But this I still think is crazy, sorry.

> > Alasdair had an idea for doing a map/unmap like interface instead, which
> > I think is a lot better. I don't think he ever coded it up, though.
> > Perhaps talk to him, if you want to improve our situation in this area.
> 
> It would increase code size, not reduce it.

What kind of answer is that?

> BTW. see fs/bio.c:bio_split --- it uses a shared mempool. It can deadlock 
> if the devices depend on each other. You need to use per-queue mempool.

Only if you need to split the same original bio twice on the same IO
path.

-- 
Jens Axboe


  reply	other threads:[~2010-03-05  7:30 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-27 17:35 [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks Dmitry Monakhov
2010-02-27 17:35 ` [PATCH 2/2] blktrace: perform cleanup after setup error Dmitry Monakhov
2010-02-28 18:46   ` Jens Axboe
2010-02-28 18:46 ` [PATCH 1/2] blkdev: fix merge_bvec_fn return value checks Jens Axboe
2010-03-03  3:49   ` Dmitry Monakhov
2010-03-03  7:30     ` Jens Axboe
2010-03-03  8:39       ` Dmitry Monakhov
2010-03-03 12:21         ` Jens Axboe
2010-03-03 18:20     ` Mike Snitzer
2010-03-03 18:45       ` Dmitry Monakhov
2010-03-03 19:16         ` Mike Snitzer
2010-03-03 19:42           ` Dmitry Monakhov
2010-03-03 20:07             ` Jens Axboe
2010-03-04 11:47               ` [dm-devel] " Mikulas Patocka
2010-03-04 12:19                 ` Jens Axboe
2010-03-04 21:55                   ` Mikulas Patocka
2010-03-05  7:30                     ` Jens Axboe [this message]
2010-03-05  7:30                       ` Jens Axboe
2010-03-05 21:56                       ` Neil Brown
2010-03-05 22:27                         ` Alasdair G Kergon
2010-03-05 22:27                           ` Alasdair G Kergon
2010-03-05 23:52                           ` Neil Brown
2010-03-06  2:20                             ` Alasdair G Kergon
2010-03-08  9:01                       ` Mikulas Patocka
2010-03-04 17:59               ` Lars Ellenberg
2010-03-05 17:37                 ` Alasdair G Kergon
2010-03-05 19:20                   ` Lars Ellenberg
2010-03-08  5:43                   ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100305073059.GI5768@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=dm-devel@redhat.com \
    --cc=dmonakhov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.