From: Avi Kivity <avi@exanet.com>
To: Joel Becker <jlbec@evilplan.org>
Cc: Yasushi Saito <ysaito@hpl.hp.com>,
linux-aio@kvack.org, linux-kernel@vger.kernel.org,
suparna@in.ibm.com, Janet Morgan <janetmor@us.ibm.com>
Subject: Re: [PATCH 1/2] aio: add vectored I/O support
Date: Sat, 16 Oct 2004 19:29:03 +0200
Message-ID: <41715A5F.2060006@exanet.com>
In-Reply-To: <20041016162836.GG17142@parcelfarce.linux.theplanet.co.uk>
Joel Becker wrote:
>On Sat, Oct 16, 2004 at 10:43:04AM +0200, Avi Kivity wrote:
>
>
>>Using IO_CMD_READ for a vector entails
>>
>>- converting the userspace structure (which might well be an iovec) to iocbs
>>
>>
>
> Why create an iov if you don't need to?
>
>
>
If you aren't writing directly to the kernel API, an iovec is very
convenient. It need not be an iovec, but surely you need _some_ vector.
>>- merging the iocbs
>>
>>
>
> I don't see how this is different than merging iovs. Whether an
>I/O range is represented by two segments of an iov or by two iocbs, the
>elevator is going to merge them. If the userspace program had the
>knowledge to merge them up front, it should have submitted one larger
>segment.
>
>
No. An iovec is already merged: adjacent segments of an iovec are known to
have adjacent file offsets, so a single IO_CMD_READV iovec can generate a
single bio without any merging.
The app did not submit a single large segment for the same reason
non-aio readv is used: app memory is paged. In my case, a userspace
filesystem has a paged cache; large, disk-contiguous reads go into many
small, noncontiguous memory pages. Or it might be a database performing
a sequential scan and reading a large block into multiple block buffers,
which are usually discontiguous.
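A minimal sketch of the paged-cache case, in C: one disk-contiguous range is read into pages that are scattered in memory, using a single vectored call. It uses the synchronous preadv(2) from <sys/uio.h> rather than any AIO interface and assumes 4K cache pages; CACHE_PAGE_SIZE, MAX_PAGES and read_range_into_pages() are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>   /* struct iovec, preadv() */
#include <unistd.h>

#define CACHE_PAGE_SIZE 4096  /* assumption: 4K cache pages */
#define MAX_PAGES 64          /* hypothetical cap for this sketch */

/*
 * Read the disk-contiguous range [offset, offset + npages * CACHE_PAGE_SIZE)
 * into npages cache pages that need not be adjacent in memory.
 * Segment i lands in pages[i], so the kernel sees one request with
 * contiguous file offsets and has nothing to merge.
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t read_range_into_pages(int fd, off_t offset, char *pages[], int npages)
{
    struct iovec iov[MAX_PAGES];

    if (npages < 1 || npages > MAX_PAGES) {
        errno = EINVAL;
        return -1;
    }
    for (int i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];       /* wherever this page lives */
        iov[i].iov_len = CACHE_PAGE_SIZE; /* filled from the next file offset */
    }
    return preadv(fd, iov, npages, offset);
}
```

The order of pages[] decides which page receives which part of the file; the memory addresses themselves play no role in how the read is issued.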
>
>
>>- coalescing the multiple completions in userspace to a single completion
>>
>>
>
> You generally have to do this anyway. In fact, it is often far
>more efficient and performant to have a pattern of:
>
> submit 10;
> reap 3; submit 3 more;
> reap 6; submit 6 more;
> repeat until you are done;
>
>than to wait on all 10 before you can submit 10 again.
>
>
If the data is physically contiguous, it will (or at least should) be
merged, and thus completed in a single event anyway: all 10 completions
will happen at the same time.
I might divide a 1M read into 4 iocbs to get the effect you mention, *if*
I wanted to do anything with partial data. I don't want to be forced into
dividing it by virtual address into 256 4K iocbs.
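Splitting along file offsets rather than virtual addresses can be sketched as follows, in C. struct vec_req and split_by_file_range() are hypothetical, and each resulting request is assumed to become one vectored iocb. With 256 pages of 4K (1M in total) and 4 requests, each request carries 64 page segments covering 256K of contiguous file offsets, instead of 256 separate 4K requests.

```c
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical descriptor for one vectored read request. */
struct vec_req {
    off_t offset;              /* file offset of the request's first byte */
    const struct iovec *iov;   /* slice of the caller's page vector */
    int iovcnt;                /* number of segments in the slice */
};

/*
 * Split a disk-contiguous read of npages segments starting at file offset
 * `base` into nreqs requests along file offsets: each request keeps a run
 * of consecutive segments, however scattered those pages are in memory.
 * Returns the number of requests filled in, or -1 on invalid arguments.
 */
int split_by_file_range(off_t base, const struct iovec *pages, int npages,
                        int nreqs, struct vec_req *out)
{
    if (npages <= 0 || nreqs <= 0 || nreqs > npages)
        return -1;

    off_t off = base;
    int first = 0;
    for (int r = 0; r < nreqs; r++) {
        /* spread any remainder over the first requests */
        int cnt = npages / nreqs + (r < npages % nreqs ? 1 : 0);

        out[r].offset = off;
        out[r].iov = &pages[first];
        out[r].iovcnt = cnt;
        for (int i = 0; i < cnt; i++)
            off += (off_t)pages[first + i].iov_len;  /* advance in the file */
        first += cnt;
    }
    return nreqs;
}
```

Only the segment boundaries in the file matter here; the number of requests is chosen by how much partial progress the application wants to observe, not by page layout.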
>>Error handling is difficult as well. One would expect that a bad sector
>>with multiple iocbs would only fail one of the requests. It seems to be
>>non-trivial to implement this correctly.
>>
>>
>
> I don't follow this. If you mean that you want all io from
>later segments in an iov to fail if one segment has a bad sector, I
>don't know that we can enforce it without running one segment at a
>time. That's terribly slow.
>
>
That's not what I meant. If you submit 16 iocbs which are merged by the
kernel, and there is an error somewhere within the seventh iocb, I would
expect to get 15 success completions and one error completion. So error
information from the merged request must be demultiplexed back into the
original iocbs. If you have a single iocb, then any error simply fails
that iocb.
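The demultiplexing step described above might look like this toy sketch in C. It assumes the file offset at which the merged request failed is known, which real block-layer error reporting may not provide; struct orig_req and demux_merged_error() are hypothetical. For 16 iocbs of 4K with the error inside the seventh, it yields 15 full successes and one error.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record of one user-submitted iocb's file range. */
struct orig_req {
    off_t offset;   /* first byte covered */
    size_t len;     /* number of bytes covered */
    long res;       /* completion to report: bytes done or -errno */
};

/*
 * After a merged request fails at file offset bad_off with error code err,
 * give each original request its own completion: the one whose range
 * contains bad_off gets -err, every other one succeeds in full.
 * Returns the number of original requests that were failed.
 */
int demux_merged_error(struct orig_req *reqs, int n, off_t bad_off, int err)
{
    int failed = 0;

    for (int i = 0; i < n; i++) {
        off_t end = reqs[i].offset + (off_t)reqs[i].len;

        if (bad_off >= reqs[i].offset && bad_off < end) {
            reqs[i].res = -err;              /* the bad sector is in here */
            failed++;
        } else {
            reqs[i].res = (long)reqs[i].len; /* unaffected: full success */
        }
    }
    return failed;
}
```

With a single vectored iocb none of this bookkeeping exists: the one completion simply carries the error.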
> Again, even if READV is a good idea, we need to fix whatever
>inefficiencies io_submit() has. copying to/from userspace just can't be
>that slow.
>
>
The inefficiencies I referred to were disk inefficiencies, not processor
ones. I think what happened was that the iocbs I submitted (64 iocbs of 4K
each) did not merge because the device queue depth was very large: no
queuing occurred, and (I imagine) merging happens only while a request is
waiting for the disk to become ready.
Decreasing the queue depth is not an option, because I might want to do
random reads of small iovecs later.
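A toy model of the hypothesis above, in C. It is not the kernel's elevator: it assumes a request goes to the device at once whenever a queue slot is free, that only requests left waiting can be back-merged with an adjacent waiting request, and that nothing completes during submission. requests_seen() is a hypothetical name. With 64 contiguous 4K requests, a depth of 64 or more leaves all 64 unmerged, while a depth of 4 yields 5 device requests (4 dispatched plus one merged 240K request).

```c
#include <sys/types.h>

/*
 * Toy model: n requests of req_len bytes at consecutive file offsets are
 * submitted back to back and nothing completes meanwhile. The device takes
 * up to `depth` requests immediately; anything beyond that waits in the
 * queue, where it is back-merged into the previous waiting request when
 * their offsets are adjacent. Returns the number of requests the device
 * ends up seeing.
 */
int requests_seen(int n, off_t req_len, int depth)
{
    int dispatched = 0;      /* went straight to the device, never merged */
    int queued = 0;          /* distinct requests left waiting */
    off_t queued_end = -1;   /* end offset of the last waiting request */

    for (int i = 0; i < n; i++) {
        off_t off = (off_t)i * req_len;

        if (dispatched < depth) {
            dispatched++;                   /* free slot: no chance to merge */
        } else if (queued > 0 && off == queued_end) {
            queued_end = off + req_len;     /* back-merge into last waiter */
        } else {
            queued++;                       /* start a new waiting request */
            queued_end = off + req_len;
        }
    }
    return dispatched + queued;
}
```

In this model a single vectored request covering the same 256K is one device request at any queue depth, which is the property argued for here.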
Of course, copying less is better than copying more, so that is an
additional win for PREADV.
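For comparison, a sketch of submitting a whole range as one vectored iocb through the raw Linux AIO syscalls. It assumes a Linux system whose <linux/aio_abi.h> defines IOCB_CMD_PREADV (an opcode in current kernel headers, not necessarily the IO_CMD_READV/PREADV naming of the patch under discussion); aio_preadv_once() and the sys_io_* wrappers are hypothetical helpers. For this opcode, aio_buf points at the iovec array and aio_nbytes holds the number of segments, so only one iocb is copied in and one completion comes back.

```c
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, IOCB_CMD_PREADV */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/* glibc has no wrappers for the raw AIO syscalls, so call them directly. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{ return syscall(SYS_io_submit, ctx, n, iocbs); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *timeout)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout); }

/*
 * Submit ONE vectored read of iovcnt segments at file offset `off` and
 * wait for its single completion. Returns the number of bytes read, -1 if
 * a syscall failed, or the negative errno carried by the completion.
 */
long aio_preadv_once(int fd, const struct iovec *iov, int iovcnt, off_t off)
{
    aio_context_t ctx = 0;                  /* must be zero before io_setup */
    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREADV;    /* vectored read */
    cb.aio_fildes = (__u32)fd;
    cb.aio_buf = (__u64)(uintptr_t)iov;     /* points at the iovec array... */
    cb.aio_nbytes = (__u64)iovcnt;          /* ...and nbytes is the segment count */
    cb.aio_offset = off;

    struct iocb *list[1] = { &cb };
    long ret = -1;
    if (sys_io_submit(ctx, 1, list) == 1) {
        struct io_event ev;
        if (sys_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
            ret = (long)ev.res;             /* bytes transferred or -errno */
    }
    sys_io_destroy(ctx);
    return ret;
}
```

Error handling is deliberately minimal: a failed submission and a failed completion are both reported as a negative return value.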
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.