All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shailabh Nagar <nagar@watson.ibm.com>
To: Stephen Hemminger <shemminger@osdl.org>
Cc: Benjamin LaHaise <bcrl@redhat.com>,
	Andrew Morton <akpm@digeo.com>,
	Alexander Viro <viro@math.psu.edu>,
	linux-aio <linux-aio@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] adding aio_readv/writev
Date: Mon, 23 Sep 2002 10:30:05 -0400	[thread overview]
Message-ID: <3D8F256D.1070107@watson.ibm.com> (raw)
In-Reply-To: 1032555981.2082.10.camel@dell_ss3.pdx.osdl.net

Stephen Hemminger wrote:

>Why not batch up multiple requests with one io_submit? It has the same
>effect, except there would be multiple responses.
>
Even though the multiple iocb's enter the kernel together, they still 
get processed individually so a fair amount of unnecessary data 
transmission and function invocation are still occurring in the submit 
code path.
Depending on how long it takes for io_submit_one to return, there might 
be a reduced probability for merging of io requests at the i/o scheduler.
Finally, the multiple responses need to be handled as you mentioned. I 
suppose the application could wait for the last request (in the 
io_submit list) and that would most probably ensure that the preceding 
ones were complete as well but its not a guarantee offered by the aio 
API, right ?
Besides, the application needs the data (represented by multiple 
requests) at one go so partial completion isn't likely to be  useful and 
will only be an overhead.

While a quantitative assessment of the above tradeoffs is possible, it 
will be difficult to make a good comparison before "true" aio 
functionality is in place for 2.5. Such an assessment is unlikely to 
happen before the feature freeze takes effect. So I'm making a case for 
putting in async vector I/O interfaces in for the following three reasons:
- the synchronous API does provide separate entry points for vector I/O. 
Extending the same to the async interfaces, especially when it doesn't 
even involve creating new syscalls, seems natural for completeness.
- underlying in-kernel infrastructure already supports it, so no major 
changes are needed.
- there exists atleast one major application class (databases) that uses 
vectored I/O heavily and benefits from async I/O. Hence async vectored 
I/O is also likely to be useful. Can anyone else with experience on 
other OS's comment on this ?

Comments, reasons for not doing async readv/writev directly welcome.

- Shailabh

>
>
>On Fri, 2002-09-20 at 13:39, Shailabh Nagar wrote:
>
>>Ben,
>>
>>Currently there is no way to initiate an aio readv/writev in 2.5. There 
>>were no aio_readv/writev calls in 2.4 either - I'm wondering if there 
>>was any particular reason for excluding readv/writev operations from aio ?
>>
>>The read/readv paths have anyway been merged for raw/O_DIRECT and 
>>regular file read/writes. So why not expose the vector read/write to the 
>>user by adding the IOCB_CMD_PREADV/IOCB_CMD_READV and 
>>IOCB_CMD_PWRITEV/IOCB_CMD_WRITEV commands to the aio set. Without that, 
>>raw/O_DIRECT readv users would need to unnecessarily cycle through their 
>>iovecs at a library level submitting them individually.
>>For larger iovecs, user/library code would needlessly deal with multiple 
>>completions. While I'm not sure of the performance impact of the absence 
>>of aio_readv/writev, it seems easy enough to provide.
>>Most of the functions are already in place. We would only
>>need a way to pass the iovec through the iocb.
>>
>>I was thinking of something like this:
>>
>>struct iocb {
>>
>>+union {
>>        __u64	aio_buf
>>+      __u64	aio_iovp
>>+}
>>+union {
>>        __u64	aio_nbytes
>>+      __u64	aio_nsegs
>>+}
>>
>>allowing the iovec * & nsegs to be passed into sys_io_submit. Some code 
>>would be added (within case handling of IOCB_CMD_READV within 
>>io_submit_one) to copy & verify the iovec pointers and then call 
>>aio_readv/aio_writev (if its defined for the fs).
>>
>>What do you think ? I wanted to get some feedback before trying to code 
>>this up.
>>
>>While we are on the topic of expanding aio operations, what about 
>>providing IOCB_CMD_READ/WRITE, distinct from their pread/pwrite 
>>counterparts ? Do you think thats needed ?
>>
>>- Shailabh
>>


  parent reply	other threads:[~2002-09-23 14:33 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-09-20 20:39 [RFC] adding aio_readv/writev Shailabh Nagar
     [not found] ` <1032555981.2082.10.camel@dell_ss3.pdx.osdl.net>
2002-09-23 14:30   ` Shailabh Nagar [this message]
2002-09-23 18:53     ` Clement T. Cole
     [not found]     ` <20020923114104.A11680@redhat.com>
2002-09-24 13:20       ` John Gardiner Myers
2002-09-24 13:52         ` Stephen C. Tweedie
2002-09-24 14:13           ` John Gardiner Myers
  -- strict thread matches above, loose matches on Subject: below --
2002-09-23 17:59 Chen, Kenneth W
     [not found] <200209231851.g8NIpea12782@igw2.watson.ibm.com>
2002-09-23 19:52 ` Shailabh Nagar
2002-09-23 20:39   ` Clement T. Cole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D8F256D.1070107@watson.ibm.com \
    --to=nagar@watson.ibm.com \
    --cc=akpm@digeo.com \
    --cc=bcrl@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shemminger@osdl.org \
    --cc=viro@math.psu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.