From: "Sébastien Dugué" <sebastien.dugue@bull.net>
To: suparna@in.ibm.com
Cc: "Ulrich Drepper" <drepper@redhat.com>,
=?iso-8859-1?Q?S=E9bastien_Dugu=E9_=3Csebastien=2Edugue=40bull=2Enet?=.=?iso-8859-1?Q?=3E?=@qubit.in.ibm.com,
"Badari Pulavarty" <pbadari@us.ibm.com>,
"Zach Brown" <zach.brown@oracle.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Evgeniy Polyakov" <johnpol@2ka.mipt.ru>,
lkml <linux-kernel@vger.kernel.org>,
"David Miller" <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>,
linux-aio@kvack.org, "Benjamin LaHaise" <bcrl@linux.intel.com>
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)
Date: Mon, 04 Sep 2006 16:28:55 +0200 [thread overview]
Message-ID: <1157380136.3895.44.camel@frecb000686> (raw)
In-Reply-To: <20060812182928.GA1989@in.ibm.com>
Hi,
just came back from vacation, sorry for the delay.
On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote:
> BTW, if anyone would like to be dropped off this growing cc list, please
> let us know.
>
> On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote:
> > Sébastien Dugué wrote:
> > > aio completion notification
> >
> > I looked over this now but I don't think I understand everything. Or I
> > don't see how it all is integrated. And no, I'm not looking at the
> > proposed glibc code since would mean being tainted.
>
> Oh, I didn't realise that.
> I'll make an attempt to clarify parts that I understand based on what I
> have gleaned from my reading of the code and intent, but hopefully Sebastien,
> Ben, Zach et al will be able to pitch in for a more accurate and complete
> picture.
>
> >
> >
> > > Details:
> > > -------
> > >
> > > A struct sigevent *aio_sigeventp is added to struct iocb in
> > > include/linux/aio_abi.h
> > >
> > > An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> > > include/linux/aio.h:
> > >
> > > - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> > > requesting thread
> > >
> > > - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> > > specifi thread.
> >
> > This has been proved to be sufficient in the timer code which basically
> > has the same problem. But why do you need separate constants? We have
> > the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use
> > these constants for the values of ki_notify.
> >
>
> I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> part of the ABI, but only internal to the kernel implementation. I think
> Zach had suggested inferring THREAD_ID notification if the pid specified
> is not zero. But, I don't see why ->sigev_notify couldn't used directly
> (just like the POSIX timers code does) thus doing away with the
> new constants altogether. Sebestian/Laurent, do you recall?
As I see it, those IO_NOTIFY_* constants are uneeded and we could use
->sigev_notify directly. I will change this so that we use the same
mechanism as the POSIX timers code.
>
> >
> > > The following fields are added to struct kiocb in include/linux/aio.h:
> > >
> > > - pid_t ki_pid: target of the signal
> > >
> > > - __u16 ki_signo: signal number
> > >
> > > - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> > > IO_NOTIFY_THREAD_ID
> > >
> > > - uid_t ki_uid, ki_euid: filled with the submitter credentials
> >
> > These two fields aren't needed for the POSIX interfaces. Where does the
> > requirement come from? I don't say they should be removed, they might
> > be useful, but if the costs are non-negligible then they could go away.
>
> I'm guessing they are being used for validation of permissions at the time
> of sending the signal, but maybe saving the task pointer in the iocb instead
> of the pid would suffice ?
IIRC, Ben added these for that exact reason. Is this really needed?
Ben?
>
> >
> >
> > > - check whether the submitting thread wants to be notified directly
> > > (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> > > to another thread.
> > > In the latter case a check is made to assert that the target thread
> > > is in the same thread group
> >
> > Is this really how it's implemented? This is not how it should be.
> > Either a signal is sent to a specific thread in the same process (this
> > is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
> > process. Sending a signal to the process means that from the kernel's
> > POV any thread which doesn't have the signal blocked can receive it.
> > The final decision is made by the kernel. There is no mechanism to send
> > the signal to another process.
>
> The code seems to be set up to call specific_send_sig_info() in the case
> of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the
> intended behaviour is as you describe it should be (__group_send_sig_info
> does the equivalent of sending a signal to the process and so any thread
> which doesn't have signals blocked can receive it, while specific_send_sig_info
> sends it to a particular thread).
>
> But, I should really leave it to Sebestian to confirm that.
That's right, but I think that part needs to be reworked to follow
the same logic as the POSIX timers.
> > > listio support
> > >
> >
> > I really don't understand the kernel interface for this feature.
>
> I'm sorry this is confusing. This probably means that we need to
> separate the external interface description more clearly and completely
> from the internals.
>
> >
> >
> > > Details:
> > > -------
> > >
> > > An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
> > >
> > > A struct lio_event is added in include/linux/aio.h
> > >
> > > A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
> >
> > So you have a pointer in the structure for the individual requests. I
> > assume you use the atomic counter to trigger the final delivery. I
> > further assume that if lio_wait is set the calling thread is suspended
> > until all requests are handled and that the final notification in this
> > case means that thread gets woken.
> >
> > This is all fine.
> >
> > But how do you pass the requests to the kernel? If you have a new
> > lio_listio-like syscall it'll be easy. But I haven't seen anything like
> > this mentioned.
> >
> > The alternative is to pass the requests one-by-one in which case I don't
> > see how you create the reference to the lio_listio control block. This
> > approach seems to be slower.
>
> The way it works (and better ideas are welcome) is that, since the io_submit()
> syscall already accepts an array of iocbs[], no new syscall was introduced.
> To implement lio_listio, one has to set up such an array, with the first iocb
> in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> specifies the sigev notification to be associated with group completion
> (a NULL value of the sigev notification pointer would imply equivalent of
> LIO_WAIT). The following iocbs in the array should correspond to the set of
> listio aiocbs. Whenever it encounters an IOCB_CMD_GROUP iocb opcode, the
> kernel would interpret all subsequent iocbs[] submitted in the same
> io_submit() call to be associated with the same lio control block.
>
> Does that clarify ?
>
> Would an example help ?
>
> >
> > If all requests are passed at once, do you have the equivalent of
> > LIO_NOP entries?
So far, LIO_NOP entries are pruned by the support library
(libposix-aio) and never sent to the kernel.
> >
>
> Good question - we do have an IOCB_CMD_NOOP defined, and I seem to even
> recall a patch that implemented it, but am wondering if it ever got merged.
> Ben/Zach ?
>
> >
> > How can we support the extension where we wait for a number of requests
> > which need not be all of them. I.e., I submit N requests and want to be
> > notified when at least M (M <= N) notified. I am not yet clear about
> > the actual semantics we should implement (e.g., do we send another
> > notification after the first one?) but it's something which IMO should
> > be taken into account in the design.
> >
>
> My thought here was that it should be possible to include M as a parameter
> to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> block ... then whatever semantics are agreed upon can be implemented.
>
> >
> > Finally, and this is very important, does you code send out the
> > individual requests notification and then in the end the lio_listio
> > completion? I think Suparna wrote this is the case but I want to make sure.
>
> Sebestian, could you confirm ?
If (and only if) the user did setup a sigevent for one or more
individual requests then those requests completion will trigger a
notification and in the end the list completion notification is sent.
Otherwise, only the list completion notification is sent.
--
-----------------------------------------------------
Sébastien Dugué BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770
mailto:sebastien.dugue@bull.net
Linux POSIX AIO: http://www.bullopensource.org/posix
http://sourceforge.net/projects/paiol
-----------------------------------------------------
next prev parent reply other threads:[~2006-09-04 14:29 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <44C66FC9.3050402@redhat.com>
2006-07-25 22:01 ` async network I/O, event channels, etc David Miller
2006-07-25 22:55 ` Nicholas Miell
2006-07-26 6:28 ` Evgeniy Polyakov
2006-07-26 9:18 ` [0/4] kevent: generic event processing subsystem Evgeniy Polyakov
2006-07-26 9:18 ` [1/4] kevent: core files Evgeniy Polyakov
2006-07-26 9:18 ` [2/4] kevent: network AIO, socket notifications Evgeniy Polyakov
2006-07-26 9:18 ` [3/4] kevent: AIO, aio_sendfile() implementation Evgeniy Polyakov
2006-07-26 9:18 ` [4/4] kevent: poll/select() notifications. Timer notifications Evgeniy Polyakov
2006-07-26 10:00 ` [3/4] kevent: AIO, aio_sendfile() implementation Christoph Hellwig
2006-07-26 10:08 ` Evgeniy Polyakov
2006-07-26 10:13 ` Christoph Hellwig
2006-07-26 10:25 ` Evgeniy Polyakov
2006-07-26 10:04 ` Christoph Hellwig
2006-07-26 10:12 ` David Miller
2006-07-26 10:15 ` Christoph Hellwig
2006-07-26 20:21 ` Phillip Susi
2006-07-26 14:14 ` Avi Kivity
2006-07-26 10:19 ` Evgeniy Polyakov
2006-07-26 10:30 ` Christoph Hellwig
2006-07-26 14:28 ` Ulrich Drepper
2006-07-26 16:22 ` Badari Pulavarty
2006-07-27 6:49 ` Sébastien Dugué
2006-07-27 15:28 ` Badari Pulavarty
2006-07-27 18:14 ` Zach Brown
2006-07-27 18:29 ` Badari Pulavarty
2006-07-27 18:44 ` Ulrich Drepper
2006-07-27 21:02 ` Badari Pulavarty
2006-07-28 7:31 ` Sébastien Dugué
2006-07-28 12:58 ` Sébastien Dugué
2006-08-11 19:45 ` Ulrich Drepper
2006-08-12 18:29 ` Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile) Suparna Bhattacharya
2006-08-12 19:10 ` Ulrich Drepper
2006-08-12 19:28 ` Jakub Jelinek
2006-09-04 14:37 ` Sébastien Dugué
2006-08-14 7:02 ` Suparna Bhattacharya
2006-08-14 16:38 ` Ulrich Drepper
2006-08-15 2:06 ` Nicholas Miell
2006-09-04 14:36 ` Sébastien Dugué
2006-09-04 14:28 ` Sébastien Dugué [this message]
2006-07-28 7:29 ` [3/4] kevent: AIO, aio_sendfile() implementation Sébastien Dugué
2006-07-31 10:11 ` Suparna Bhattacharya
2006-07-28 7:26 ` Sébastien Dugué
2006-07-26 10:31 ` [1/4] kevent: core files Andrew Morton
2006-07-26 10:37 ` Evgeniy Polyakov
2006-07-26 10:44 ` Evgeniy Polyakov
2006-07-27 6:10 ` async network I/O, event channels, etc David Miller
2006-07-27 7:49 ` Evgeniy Polyakov
2006-07-27 8:02 ` David Miller
2006-07-27 8:09 ` Jens Axboe
2006-07-27 8:11 ` Jens Axboe
2006-07-27 8:20 ` David Miller
2006-07-27 8:29 ` Jens Axboe
2006-07-27 8:37 ` David Miller
2006-07-27 8:39 ` Jens Axboe
2006-07-27 8:58 ` Evgeniy Polyakov
2006-07-27 9:31 ` David Miller
2006-07-27 9:37 ` Evgeniy Polyakov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1157380136.3895.44.camel@frecb000686 \
--to=sebastien.dugue@bull.net \
--cc==?iso-8859-1?Q?S=E9bastien_Dugu=E9_=3Csebastien=2Edugue=40bull=2Enet?=.=?iso-8859-1?Q?=3E?=@qubit.in.ibm.com \
--cc=bcrl@linux.intel.com \
--cc=davem@davemloft.net \
--cc=drepper@redhat.com \
--cc=hch@infradead.org \
--cc=johnpol@2ka.mipt.ru \
--cc=linux-aio@kvack.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pbadari@us.ibm.com \
--cc=suparna@in.ibm.com \
--cc=zach.brown@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).