* Re: [PATCH -mm 1/4][AIO] - fix aio.h includes
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
@ 2006-11-20 13:31 ` Zach Brown
0 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2006-11-20 13:31 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
Sébastien Dugué wrote:
> Fix the double inclusion of linux/uio.h in linux/aio.h
Sure seems reasonable to me.
> Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
- z
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent()
2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué
@ 2006-11-20 13:42 ` Zach Brown
2006-11-20 21:45 ` Ulrich Drepper
0 siblings, 1 reply; 26+ messages in thread
From: Zach Brown @ 2006-11-20 13:42 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
> +extern struct task_struct * good_sigevent(sigevent_t *);
I wonder if "good_sigevent()" isn't a very strong name for something
that is now up in signal.h. Maybe "sigevent_find_task()"?
- z
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent()
2006-11-20 21:45 ` Ulrich Drepper
@ 2006-11-20 14:02 ` Zach Brown
0 siblings, 0 replies; 26+ messages in thread
From: Zach Brown @ 2006-11-20 14:02 UTC (permalink / raw)
To: Ulrich Drepper
Cc: Sébastien Dugué, linux-kernel, linux-aio, Andrew Morton,
Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty,
Jean Pierre Dion
Ulrich Drepper wrote:
> Zach Brown wrote:
>> I wonder if "good_sigevent()" isn't a very strong name for something
>> that is now up in signal.h. Maybe "sigevent_find_task()"?
>
> That's not the purpose, it's just one of the checks which need to be
> done to check that the sigevent value is "good".
Hmm? It returns a task_struct *, potentially after finding it with
find_task_by_pid().
That said, I don't particularly care much. It just jumped out as I was
reading it.
- z
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH -mm 0/4][AIO] - AIO completion signal notification v2
@ 2006-11-20 14:17 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
Hi
Here is the latest rework of the AIO completion signal notification patches.
This set consists in 4 patches:
1. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h
2. export-good_sigevent: move good_sigevent into signal.c and export it
3. aio-notify-sig: the AIO completion signal notification
4. listio: adds listio support
Description are in the individual patches.
Changes from v1:
- cleanups suggested by Christoph Hellwig, Badari Pulavarty and Zach Brown
- added lisio patch
Comments are welcome as usual.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 1/4][AIO] - fix aio.h includes
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
@ 2006-11-20 14:22 ` Sébastien Dugué
2006-11-20 13:31 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué
` (2 subsequent siblings)
3 siblings, 1 reply; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
Fix the double inclusion of linux/uio.h in linux/aio.h
aio.h | 1 -
1 file changed, 1 deletion(-)
Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Index: linux-2.6.19-rc5-mm2/include/linux/aio.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17
11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h
2006-11-17 11:20:27.000000000 +0100 @@ -7,7 +7,6 @@
#include <linux/uio.h>
#include <asm/atomic.h>
-#include <linux/uio.h>
#define AIO_MAXSEGS 4
#define AIO_KIOGRP_NR_ATOMIC 8
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent()
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
@ 2006-11-20 14:22 ` Sébastien Dugué
2006-11-20 13:42 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
3 siblings, 1 reply; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
Export good_sigevent()
Move good_sigevent() from posix-timers.c to signal.c where it belongs,
and export it so that it can be used by other subsystems.
include/linux/signal.h | 2 ++
kernel/posix-timers.c | 17 -----------------
kernel/signal.c | 23 +++++++++++++++++++++++
3 files changed, 25 insertions(+), 17 deletions(-)
Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Index: linux-2.6.19-rc5-mm2/include/linux/signal.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/signal.h 2006-11-17
11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/signal.h
2006-11-17 11:20:31.000000000 +0100 @@ -241,6 +241,8 @@ extern int
sigprocmask(int, sigset_t *, struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction
*return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * good_sigevent(sigevent_t *);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SIGNAL_H */
Index: linux-2.6.19-rc5-mm2/kernel/posix-timers.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/posix-timers.c 2006-11-17
11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/posix-timers.c
2006-11-17 11:20:31.000000000 +0100 @@ -367,23 +367,6 @@ static enum
hrtimer_restart posix_timer_ return ret;
}
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
- struct task_struct *rtn = current->group_leader;
-
- if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
- (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
- rtn->tgid != current->tgid ||
- (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
- return NULL;
-
- if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
- ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
- return NULL;
-
- return rtn;
-}
-
void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
{
if ((unsigned) clock_id >= MAX_CLOCKS) {
Index: linux-2.6.19-rc5-mm2/kernel/signal.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/signal.c 2006-11-17
11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/signal.c
2006-11-17 11:20:31.000000000 +0100 @@ -1189,6 +1189,29 @@ int
group_send_sig_info(int sig, struct return ret;
}
+/***
+ * good_sigevent - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with tasklist_lock held for reading.
+ */
+struct task_struct * good_sigevent(sigevent_t * event)
+{
+ struct task_struct *rtn = current->group_leader;
+
+ if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
+ (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
+ rtn->tgid != current->tgid ||
+ (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
+ return NULL;
+
+ if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
+ ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
+ return NULL;
+
+ return rtn;
+}
+
/*
* kill_pgrp_info() sends a signal to a process group: this is what the tty
* control characters do (^C, ^Z etc)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué
@ 2006-11-20 14:22 ` Sébastien Dugué
2006-11-20 15:13 ` Zach Brown
2006-11-22 1:02 ` Andrew Morton
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
3 siblings, 2 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
AIO completion signal notification
The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.
This patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.
That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().
A struct sigevent *aio_sigeventp field is added to struct iocb in
include/linux/aio_abi.h
A struct aio_notify containing the sigevent parameters is defined in aio.h:
struct aio_notify {
struct task_struct *target;
__u16 signo;
__u16 notify;
sigval_t value;
};
A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h
In io_submit_one(), if the application provided a sigevent then
setup_sigevent() is called which does the following:
- check access to the user sigevent and make a local copy
- if the requested notification is SIGEV_NONE, then nothing to do
- fill in the kiocb->ki_notify fields (notify, signo, value)
- check sigevent consistency, get the signal target task and
save it in kiocb->ki_notify.target
- preallocate a sigqueue for this event using sigqueue_alloc()
Upon request completion, in aio_complete(), if notification is needed for
this request (iocb->ki_notify.notify != SIGEV_NONE), then aio_send_signal()
is called to signal the target task as follows:
- fill in the siginfo struct to be sent to the task
- if notify is SIGEV_THREAD_ID then send signal to specific task
using send_sigqueue()
- else send signal to task group using send_5group_sigqueue()
Notes concerning sigqueue preallocation:
To ensure reliable delivery of completion notification, the sigqueue is
preallocated in the submission path so that there is no chance it can fail
in the completion path.
Unlike the posix-timers case (currently the single other user of sigqueue
preallocation), where the sigqueue is allocated for the lifetime of the timer
and freed at timer destruction time, the aio case is a bit more tricky due to
the async nature of the whole thing.
In the aio case, the sigqueue exists for the lifetime of the request,
therefore it must be freed only once the signal for the request completion has
been delivered. This involves changing __sigqueue_free() to free the sigqueue
when the signal is collected if si_code is SI_ASYNCIO even if it was
preallocated as well as explicitly calling sigqueue_free() in submission and
completion error paths.
fs/aio.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/aio.h | 12 ++++
include/linux/aio_abi.h | 3 -
kernel/signal.c | 2
4 files changed, 127 insertions(+), 6 deletions(-)
Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Index: linux-2.6.19-rc5-mm2/fs/aio.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:08.000000000
+0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 11:20:32.000000000
+0100 @@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
+ req->ki_notify.sigq = NULL;
INIT_LIST_HEAD(&req->ki_run_list);
/* Check if the completion queue has enough free space to
@@ -463,6 +464,11 @@ static inline void really_put_req(struct
req->ki_dtor(req);
if (req->ki_iovec != &req->ki_inline_vec)
kfree(req->ki_iovec);
+
+ /* Release task ref */
+ if (req->ki_notify.notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
+ put_task_struct(req->ki_notify.target);
+
kmem_cache_free(kiocb_cachep, req);
ctx->reqs_active--;
@@ -929,6 +935,81 @@ void fastcall kick_iocb(struct kiocb *io
}
EXPORT_SYMBOL(kick_iocb);
+static int aio_send_signal(struct aio_notify *notify)
+{
+ struct sigqueue *sigq = notify->sigq;
+ struct siginfo *info = &sigq->info;
+ int ret;
+
+ memset(info, 0, sizeof(struct siginfo));
+
+ info->si_signo = notify->signo;
+ info->si_errno = 0;
+ info->si_code = SI_ASYNCIO;
+ info->si_pid = 0;
+ info->si_uid = 0;
+ info->si_value = notify->value;
+
+ if (notify->notify & SIGEV_THREAD_ID)
+ ret = send_sigqueue(notify->signo, sigq, notify->target);
+ else
+ ret = send_group_sigqueue(notify->signo, sigq, notify->target);
+
+ return ret;
+}
+
+static long aio_setup_sigevent(struct aio_notify *notify,
+ struct sigevent __user *user_event)
+{
+ sigevent_t event;
+ struct task_struct *target;
+
+ if (copy_from_user(&event, user_event, sizeof (event)))
+ return -EFAULT;
+
+ if (event.sigev_notify == SIGEV_NONE)
+ return 0;
+
+ notify->notify = event.sigev_notify;
+ notify->signo = event.sigev_signo;
+ notify->value = event.sigev_value;
+
+ read_lock(&tasklist_lock);
+ target = good_sigevent(&event);
+
+ if (unlikely(!target || (target->flags & PF_EXITING)))
+ goto out_unlock;
+
+ notify->target = target;
+
+ if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
+ /*
+ * This reference will be dropped in really_put_req() when
+ * we're done with the request.
+ */
+ get_task_struct(target);
+ }
+
+ read_unlock(&tasklist_lock);
+
+ /*
+ * NOTE: we cannot free the sigqueue in the completion path as
+ * the signal may not have been delivered to the target task.
+ * Therefore it has to be freed in __sigqueue_free() when the
+ * signal is collected if si_code is SI_ASYNCIO.
+ */
+ notify->sigq = sigqueue_alloc();
+
+ if (unlikely(!notify->sigq))
+ return -EAGAIN;
+
+ return 0;
+
+out_unlock:
+ read_unlock(&tasklist_lock);
+ return -EINVAL;
+}
+
/* aio_complete
* Called when the io request on the given iocb is complete.
* Returns true if this is the last user of the request. The
@@ -976,8 +1057,11 @@ int fastcall aio_complete(struct kiocb *
* cancelled requests don't get events, userland was given one
* when the event got cancelled.
*/
- if (kiocbIsCancelled(iocb))
+ if (kiocbIsCancelled(iocb)) {
+ if (iocb->ki_notify.sigq)
+ sigqueue_free(iocb->ki_notify.sigq);
goto put_rq;
+ }
ring = kmap_atomic(info->ring_pages[0], KM_IRQ1);
@@ -1008,6 +1092,14 @@ int fastcall aio_complete(struct kiocb *
pr_debug("added to ring %p at [%lu]\n", iocb, tail);
+ if (iocb->ki_notify.notify != SIGEV_NONE) {
+ ret = aio_send_signal(&iocb->ki_notify);
+
+ /* If signal generation failed, release the sigqueue */
+ if (ret)
+ sigqueue_free(iocb->ki_notify.sigq);
+ }
+
pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried,
iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes);
put_rq:
@@ -1549,8 +1641,7 @@ int fastcall io_submit_one(struct kioctx
ssize_t ret;
/* enforce forwards compatibility on users */
- if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2 ||
- iocb->aio_reserved3)) {
+ if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved3)) {
pr_debug("EINVAL: io_submit: reserve field set\n");
return -EINVAL;
}
@@ -1559,6 +1650,7 @@ int fastcall io_submit_one(struct kioctx
if (unlikely(
(iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
(iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
+ (iocb->aio_sigeventp != (unsigned long)iocb->aio_sigeventp) ||
((ssize_t)iocb->aio_nbytes < 0)
)) {
pr_debug("EINVAL: io_submit: overflow check\n");
@@ -1593,10 +1685,21 @@ int fastcall io_submit_one(struct kioctx
INIT_LIST_HEAD(&req->ki_wait.task_list);
req->ki_retried = 0;
+ /* handle setting up the sigevent for POSIX AIO signals */
+ req->ki_notify.notify = SIGEV_NONE;
+
+ if (iocb->aio_sigeventp) {
+ ret = aio_setup_sigevent(&req->ki_notify,
+ (struct sigevent __user *)(unsigned
long)
+ iocb->aio_sigeventp);
+ if (ret)
+ goto out_put_req;
+ }
+
ret = aio_setup_iocb(req);
if (ret)
- goto out_put_req;
+ goto out_sigqfree;
spin_lock_irq(&ctx->ctx_lock);
aio_run_iocb(req);
@@ -1609,6 +1712,11 @@ int fastcall io_submit_one(struct kioctx
aio_put_req(req); /* drop extra ref to req */
return 0;
+out_sigqfree:
+ /* Undo the sigqueue alloc if someting went bad */
+ if (req->ki_notify.sigq)
+ sigqueue_free(req->ki_notify.sigq);
+
out_put_req:
aio_put_req(req); /* drop extra ref to req */
aio_put_req(req); /* drop i/o ref to req */
Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17
11:20:08.000000000 +0100 +++
linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17
11:20:32.000000000 +0100 @@ -82,8 +82,9 @@ struct iocb { __u64
aio_nbytes; __s64 aio_offset;
+ __u64 aio_sigeventp; /* pointer to struct sigevent */
+
/* extra parameters */
- __u64 aio_reserved2; /* TODO: use this for a (struct
sigevent *) */ __u64 aio_reserved3;
}; /* 64 bytes */
Index: linux-2.6.19-rc5-mm2/include/linux/aio.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17
11:20:27.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h
2006-11-17 11:20:32.000000000 +0100 @@ -7,6 +7,7 @@
#include <linux/uio.h>
#include <asm/atomic.h>
+#include <asm/siginfo.h>
#define AIO_MAXSEGS 4
#define AIO_KIOGRP_NR_ATOMIC 8
@@ -49,6 +50,14 @@ struct kioctx;
#define kiocbIsKicked(iocb) test_bit(KIF_KICKED, &(iocb)->ki_flags)
#define kiocbIsCancelled(iocb) test_bit(KIF_CANCELLED,
&(iocb)->ki_flags)
+struct aio_notify {
+ struct task_struct *target;
+ __u16 signo;
+ __u16 notify;
+ sigval_t value;
+ struct sigqueue *sigq;
+};
+
/* is there a better place to document function pointer methods? */
/**
* ki_retry - iocb forward progress callback
@@ -118,6 +127,9 @@ struct kiocb {
struct list_head ki_list; /* the aio core uses this
* for cancellation */
+
+ /* to notify a process on I/O event */
+ struct aio_notify ki_notify;
};
#define is_sync_kiocb(iocb) ((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.19-rc5-mm2/kernel/signal.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/signal.c 2006-11-17
11:20:31.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/signal.c
2006-11-17 11:20:32.000000000 +0100 @@ -297,7 +297,7 @@ static struct sigqueue
*__sigqueue_alloc
static void __sigqueue_free(struct sigqueue *q)
{
- if (q->flags & SIGQUEUE_PREALLOC)
+ if (q->flags & SIGQUEUE_PREALLOC && q->info.si_code != SI_ASYNCIO)
return;
atomic_dec(&q->user->sigpending);
free_uid(q->user);
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
` (2 preceding siblings ...)
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
@ 2006-11-20 14:23 ` Sébastien Dugué
2006-11-21 10:35 ` Suparna Bhattacharya
2006-11-27 13:39 ` Bharata B Rao
3 siblings, 2 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:23 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
POSIX listio support
This patch adds POSIX listio completion notification support. It builds
on support provided by the aio signal notification patch and adds an
IOCB_CMD_GROUP command to io_submit().
The purpose of IOCB_CMD_GROUP is to group together the following requests in
the list up to the end of the list sumbitted to io_submit.
As io_submit already accepts an array of iocbs, as part of listio submission,
the user process prepends to a list of requests an empty special aiocb with
an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields.
An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
A struct lio_event is added in include/linux/aio.h
A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
In io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb, an
lio_event is created in lio_create() which contains the necessary information
for signaling a thread (signal number, pid, notify type and value) along with
a count of requests attached to this event.
The following depicts the lio_event structure:
struct lio_event {
atomic_t lio_users;
struct aio_notify lio_notify;
};
lio_users holds an atomic counter of the number of requests attached to this
lio. It is incremented with each request submitted and decremented at each
request completion. When the counter reaches 0, we send the notification.
Each subsequent submitted request is attached to this lio_event by setting
the request kiocb->ki_lio to that lio_event (in io_submit_one()) and
incrementing the lio_users count.
In aio_complete(), if the request is attached to an lio (ki_lio <> 0),
then lio_check() is called to decrement the lio_users count and eventually
signal the user process when all the requests in the group have completed.
The IOCB_CMD_GROUP semantic is as follows:
- if the associated sigevent is NULL then we want to group
requests for the purpose of blocking on the group completion
(LIO_WAIT sync behavior).
- if the associated sigevent is valid (not NULL) then we want to
group requests for the purpose of being notified upon that
group of requests completion (LIO_NOWAIT async behaviour).
fs/aio.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/aio.h | 13 ++++-
include/linux/aio_abi.h | 1
3 files changed, 131 insertions(+), 6 deletions(-)
Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Index: linux-2.6.19-rc5-mm2/fs/aio.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:32.000000000
+0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 16:22:00.000000000
+0100 @@ -414,6 +414,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_cancel = NULL;
req->ki_retry = NULL;
req->ki_dtor = NULL;
+ req->ki_lio = NULL;
req->private = NULL;
req->ki_iovec = NULL;
req->ki_notify.sigq = NULL;
@@ -1010,6 +1011,53 @@ out_unlock:
return -EINVAL;
}
+static inline void lio_check(struct lio_event *lio)
+{
+ int ret;
+
+ ret = atomic_dec_and_test(&lio->lio_users);
+
+ if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
+ /* last one -> notify process */
+ aio_send_signal(&lio->lio_notify);
+ kfree(lio);
+ }
+}
+
+static struct lio_event *lio_create(struct sigevent __user *user_event)
+{
+ int ret = 0;
+ struct lio_event *lio = NULL;
+
+ lio = kzalloc(sizeof(*lio), GFP_KERNEL);
+
+ if (!lio)
+ return ERR_PTR(-EAGAIN);
+
+ /*
+ * Grab an initial ref on the lio to avoid races between
+ * submission and completion.
+ */
+ atomic_set(&lio->lio_users, 1);
+
+ lio->lio_notify.notify = SIGEV_NONE;
+
+ if (user_event) {
+ /*
+ * User specified an event for this lio,
+ * he wants to be notified upon lio completion.
+ */
+ ret = aio_setup_sigevent(&lio->lio_notify, user_event);
+
+ if (ret) {
+ kfree(lio);
+ return ERR_PTR(ret);
+ }
+ }
+
+ return lio;
+}
+
/* aio_complete
* Called when the io request on the given iocb is complete.
* Returns true if this is the last user of the request. The
@@ -1058,8 +1106,12 @@ int fastcall aio_complete(struct kiocb *
* when the event got cancelled.
*/
if (kiocbIsCancelled(iocb)) {
+ if (iocb->ki_lio)
+ lio_check(iocb->ki_lio);
+
if (iocb->ki_notify.sigq)
sigqueue_free(iocb->ki_notify.sigq);
+
goto put_rq;
}
@@ -1100,6 +1152,9 @@ int fastcall aio_complete(struct kiocb *
sigqueue_free(iocb->ki_notify.sigq);
}
+ if (iocb->ki_lio)
+ lio_check(iocb->ki_lio);
+
pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried,
iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes);
put_rq:
@@ -1634,7 +1689,7 @@ static int aio_wake_function(wait_queue_
}
int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb)
+ struct iocb *iocb, struct lio_event *lio)
{
struct kiocb *req;
struct file *file;
@@ -1696,6 +1751,9 @@ int fastcall io_submit_one(struct kioctx
goto out_put_req;
}
+ /* Attach this iocb to its lio */
+ req->ki_lio = lio;
+
ret = aio_setup_iocb(req);
if (ret)
@@ -1739,6 +1797,8 @@ asmlinkage long sys_io_submit(aio_contex
struct iocb __user * __user *iocbpp)
{
struct kioctx *ctx;
+ struct lio_event *lio = NULL;
+ int lio_wait = 0;
long ret = 0;
int i;
@@ -1772,11 +1832,66 @@ asmlinkage long sys_io_submit(aio_contex
break;
}
- ret = io_submit_one(ctx, user_iocb, &tmp);
- if (ret)
- break;
+ if (tmp.aio_lio_opcode == IOCB_CMD_GROUP) {
+
+ /* this command means that all following IO commands
+ * are in the same group.
+ *
+ * Userspace either wants to be notified upon or block
until
+ * completion of all the requests in the group.
+ */
+ /*
+ * Ignore an IOCB_CMD_GROUP request if we are already
+ * processing one. This means only one listio per
+ * io_submit call.
+ */
+ if (lio)
+ continue;
+
+ lio = lio_create((struct sigevent __user *)(unsigned
long)
+ tmp.aio_sigeventp);
+
+ ret = PTR_ERR(lio);
+
+ if (IS_ERR(lio))
+ goto out_put_ctx;
+
+ if (!tmp.aio_sigeventp)
+ lio_wait = 1;
+ } else {
+ if (lio)
+ atomic_inc(&lio->lio_users);
+
+ ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+
+ if (ret) {
+ if (lio) {
+ /*
+ * If a request failed, just decrement
+ * the users count, but go on
submitting
+ * subsequent requests.
+ */
+ atomic_dec(&lio->lio_users);
+ } else
+ break;
+ }
+ }
+ }
+
+ if (lio) {
+ /*
+ * Drop extra ref on the lio now that we're done submitting
+ * requests
+ */
+ lio_check(lio);
+
+ if (lio_wait) {
+ wait_event(ctx->wait, atomic_read(&lio->lio_users)==0);
+ kfree(lio);
+ }
}
+out_put_ctx:
put_ioctx(ctx);
return i ? i : ret;
}
Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17
11:20:32.000000000 +0100 +++
linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17
11:21:31.000000000 +0100 @@ -43,6 +43,7 @@ enum { IOCB_CMD_NOOP = 6,
IOCB_CMD_PREADV = 7,
IOCB_CMD_PWRITEV = 8,
+ IOCB_CMD_GROUP = 9,
};
/* read() from /dev/aio returns these structures. */
Index: linux-2.6.19-rc5-mm2/include/linux/aio.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17
11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h
2006-11-17 11:21:31.000000000 +0100 @@ -58,6 +58,11 @@ struct aio_notify {
struct sigqueue *sigq;
};
+struct lio_event {
+ atomic_t lio_users;
+ struct aio_notify lio_notify;
+};
+
/* is there a better place to document function pointer methods? */
/**
* ki_retry - iocb forward progress callback
@@ -113,6 +118,9 @@ struct kiocb {
wait_queue_t ki_wait;
loff_t ki_pos;
+ /* lio this iocb might be attached to */
+ struct lio_event *ki_lio;
+
void *private;
/* State that we remember to be able to restart/retry */
unsigned short ki_opcode;
@@ -220,12 +228,13 @@ struct mm_struct;
extern void FASTCALL(exit_aio(struct mm_struct *mm));
extern struct kioctx *lookup_ioctx(unsigned long ctx_id);
extern int FASTCALL(io_submit_one(struct kioctx *ctx,
- struct iocb __user *user_iocb, struct iocb *iocb));
+ struct iocb __user *user_iocb, struct iocb
*iocb,
+ struct lio_event *lio));
/* semi private, but used by the 32bit emulations: */
struct kioctx *lookup_ioctx(unsigned long ctx_id);
int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb));
+ struct iocb *iocb, struct lio_event *lio));
#define get_ioctx(kioctx) do { \
BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
@ 2006-11-20 15:13 ` Zach Brown
2006-11-21 10:40 ` Sébastien Dugué
[not found] ` <20061122104055.3d1c029a@frecb000686>
2006-11-22 1:02 ` Andrew Morton
1 sibling, 2 replies; 26+ messages in thread
From: Zach Brown @ 2006-11-20 15:13 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
Sébastien Dugué wrote:
> AIO completion signal notification
This is looking a lot better, thanks for keeping at it.
> +static long aio_setup_sigevent(struct aio_notify *notify,
> + struct sigevent __user *user_event)
> +{
> + sigevent_t event;
> + struct task_struct *target;
> +
> + if (copy_from_user(&event, user_event, sizeof (event)))
> + return -EFAULT;
Last time we talked about this needing to call get_compat_sigevent(). I
think it still needs to.
I think we should avoid the examples set by the current
compat_sys_io_submit() and get_compat_sigevent() callers. They copy
translated data on to the userspace stack and pass it to the syscalls.
That will get crazy for compat_sys_io_submit() because it would have to
rewrite the iocb and the pointer to the iocb to get sys_io_submit() to
find a copied sigevent on the stack.
I think the model is compat_do_readv_writev(). Hoist some of the
syscall logic up into the compat layer so that one copying and
translating pass is made instead of trying to fool the syscall logic
into thinking that it's being called from a native word size caller.
So io_submit_one() should be given the kernel copies of the userspace
structures it needs. sys_io_submit() will pass it the copies it made
for native word size callers. compat_sys_io_submit() will pass in the
copies it made after translating from 32bit arguments. io_submit_one()
and lookup_kioctx() will have to be made available to kernel/compat.c
(via linux/aio.h, surely.). aio_setup_sigevent() will be called from
*_io_submit() and given the kernel sigevent, not a userspace pointer.
Reworking things this way should have the added benefit of making 32/64
sys_io_submit() more efficient than it is today.
- z
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent()
2006-11-20 13:42 ` Zach Brown
@ 2006-11-20 21:45 ` Ulrich Drepper
2006-11-20 14:02 ` Zach Brown
0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Drepper @ 2006-11-20 21:45 UTC (permalink / raw)
To: Zach Brown
Cc: Sébastien Dugué, linux-kernel, linux-aio, Andrew Morton,
Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty,
Jean Pierre Dion
Zach Brown wrote:
> I wonder if "good_sigevent()" isn't a very strong name for something
> that is now up in signal.h. Maybe "sigevent_find_task()"?
That's not the purpose, it's just one of the checks which need to be
done to check that the sigevent value is "good". I think the name is fine.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
@ 2006-11-21 10:35 ` Suparna Bhattacharya
2006-11-27 13:39 ` Bharata B Rao
1 sibling, 0 replies; 26+ messages in thread
From: Suparna Bhattacharya @ 2006-11-21 10:35 UTC (permalink / raw)
To: =?iso-8859-1?Q?S=E9bastien_Dugu=E9_=3Csebastien=2Edugue=40bull=2Enet?=.=?iso-8859-1?Q?=3E?=
Cc: linux-kernel, linux-aio, Andrew Morton, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
Sebestian,
Thanks for taking this forward ! I assume that you have addressed all the
comments (including missing counts in cancel path etc), except for the
interface debate (current vs syscall vs sys_io_wait_for_kiocb), is
that correct ?
Zach,
You had suggested a couple of alternative interfaces.
If we go the separate syscall way, there are a couple of things to figure
out/decide:
(1) Do we want to keep the LIO_WAIT semantics in the kernel, or
should we try to expose the lio to user-space for an explicit wait to
be issued ?
(2) We would have to address compat alternatives for the new syscall as well,
hopefully it should be possible to structure the code to reuse most of
the logic.
If we go the way of associating the array of iocbs as a parameter to
IOCB_CMD_GROUP, and make LIO_WAIT happen through a sys_io_wait for one iocb
on the group iocb, then
(1) Compat handling could get a little more complicated
(2) We have to think through the semantics of the completion path in various
situations. Will it be possible to cancel the whole group in one go ?
Regards
Suparna
On Mon, Nov 20, 2006 at 03:23:07PM +0100, Sébastien Dugué wrote:
>
> POSIX listio support
>
> This patch adds POSIX listio completion notification support. It builds
> on support provided by the aio signal notification patch and adds an
> IOCB_CMD_GROUP command to io_submit().
>
> The purpose of IOCB_CMD_GROUP is to group together the following requests in
> the list up to the end of the list sumbitted to io_submit.
>
> As io_submit already accepts an array of iocbs, as part of listio submission,
> the user process prepends to a list of requests an empty special aiocb with
> an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields.
>
>
> An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
>
> A struct lio_event is added in include/linux/aio.h
>
> A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
>
> In io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb, an
> lio_event is created in lio_create() which contains the necessary information
> for signaling a thread (signal number, pid, notify type and value) along with
> a count of requests attached to this event.
>
> The following depicts the lio_event structure:
>
> struct lio_event {
> atomic_t lio_users;
> struct aio_notify lio_notify;
> };
>
> lio_users holds an atomic counter of the number of requests attached to this
> lio. It is incremented with each request submitted and decremented at each
> request completion. When the counter reaches 0, we send the notification.
>
> Each subsequent submitted request is attached to this lio_event by setting
> the request kiocb->ki_lio to that lio_event (in io_submit_one()) and
> incrementing the lio_users count.
>
> In aio_complete(), if the request is attached to an lio (ki_lio <> 0),
> then lio_check() is called to decrement the lio_users count and eventually
> signal the user process when all the requests in the group have completed.
>
>
> The IOCB_CMD_GROUP semantic is as follows:
>
> - if the associated sigevent is NULL then we want to group
> requests for the purpose of blocking on the group completion
> (LIO_WAIT sync behavior).
>
> - if the associated sigevent is valid (not NULL) then we want to
> group requests for the purpose of being notified upon that
> group of requests completion (LIO_NOWAIT async behaviour).
>
>
>
>
> fs/aio.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++--
> include/linux/aio.h | 13 ++++-
> include/linux/aio_abi.h | 1
> 3 files changed, 131 insertions(+), 6 deletions(-)
>
>
> Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
>
> Index: linux-2.6.19-rc5-mm2/fs/aio.c
> ===================================================================
> --- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:32.000000000
> +0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 16:22:00.000000000
> +0100 @@ -414,6 +414,7 @@ static struct kiocb fastcall *__aio_get_
> req->ki_cancel = NULL;
> req->ki_retry = NULL;
> req->ki_dtor = NULL;
> + req->ki_lio = NULL;
> req->private = NULL;
> req->ki_iovec = NULL;
> req->ki_notify.sigq = NULL;
> @@ -1010,6 +1011,53 @@ out_unlock:
> return -EINVAL;
> }
>
> +static inline void lio_check(struct lio_event *lio)
> +{
> + int ret;
> +
> + ret = atomic_dec_and_test(&lio->lio_users);
> +
> + if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
> + /* last one -> notify process */
> + aio_send_signal(&lio->lio_notify);
> + kfree(lio);
> + }
> +}
> +
> +static struct lio_event *lio_create(struct sigevent __user *user_event)
> +{
> + int ret = 0;
> + struct lio_event *lio = NULL;
> +
> + lio = kzalloc(sizeof(*lio), GFP_KERNEL);
> +
> + if (!lio)
> + return ERR_PTR(-EAGAIN);
> +
> + /*
> + * Grab an initial ref on the lio to avoid races between
> + * submission and completion.
> + */
> + atomic_set(&lio->lio_users, 1);
> +
> + lio->lio_notify.notify = SIGEV_NONE;
> +
> + if (user_event) {
> + /*
> + * User specified an event for this lio,
> + * he wants to be notified upon lio completion.
> + */
> + ret = aio_setup_sigevent(&lio->lio_notify, user_event);
> +
> + if (ret) {
> + kfree(lio);
> + return ERR_PTR(ret);
> + }
> + }
> +
> + return lio;
> +}
> +
> /* aio_complete
> * Called when the io request on the given iocb is complete.
> * Returns true if this is the last user of the request. The
> @@ -1058,8 +1106,12 @@ int fastcall aio_complete(struct kiocb *
> * when the event got cancelled.
> */
> if (kiocbIsCancelled(iocb)) {
> + if (iocb->ki_lio)
> + lio_check(iocb->ki_lio);
> +
> if (iocb->ki_notify.sigq)
> sigqueue_free(iocb->ki_notify.sigq);
> +
> goto put_rq;
> }
>
> @@ -1100,6 +1152,9 @@ int fastcall aio_complete(struct kiocb *
> sigqueue_free(iocb->ki_notify.sigq);
> }
>
> + if (iocb->ki_lio)
> + lio_check(iocb->ki_lio);
> +
> pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried,
> iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes);
> put_rq:
> @@ -1634,7 +1689,7 @@ static int aio_wake_function(wait_queue_
> }
>
> int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> - struct iocb *iocb)
> + struct iocb *iocb, struct lio_event *lio)
> {
> struct kiocb *req;
> struct file *file;
> @@ -1696,6 +1751,9 @@ int fastcall io_submit_one(struct kioctx
> goto out_put_req;
> }
>
> + /* Attach this iocb to its lio */
> + req->ki_lio = lio;
> +
> ret = aio_setup_iocb(req);
>
> if (ret)
> @@ -1739,6 +1797,8 @@ asmlinkage long sys_io_submit(aio_contex
> struct iocb __user * __user *iocbpp)
> {
> struct kioctx *ctx;
> + struct lio_event *lio = NULL;
> + int lio_wait = 0;
> long ret = 0;
> int i;
>
> @@ -1772,11 +1832,66 @@ asmlinkage long sys_io_submit(aio_contex
> break;
> }
>
> - ret = io_submit_one(ctx, user_iocb, &tmp);
> - if (ret)
> - break;
> + if (tmp.aio_lio_opcode == IOCB_CMD_GROUP) {
> +
> + /* this command means that all following IO commands
> + * are in the same group.
> + *
> + * Userspace either wants to be notified upon or block
> until
> + * completion of all the requests in the group.
> + */
> + /*
> + * Ignore an IOCB_CMD_GROUP request if we are already
> + * processing one. This means only one listio per
> + * io_submit call.
> + */
> + if (lio)
> + continue;
> +
> + lio = lio_create((struct sigevent __user *)(unsigned
> long)
> + tmp.aio_sigeventp);
> +
> + ret = PTR_ERR(lio);
> +
> + if (IS_ERR(lio))
> + goto out_put_ctx;
> +
> + if (!tmp.aio_sigeventp)
> + lio_wait = 1;
> + } else {
> + if (lio)
> + atomic_inc(&lio->lio_users);
> +
> + ret = io_submit_one(ctx, user_iocb, &tmp, lio);
> +
> + if (ret) {
> + if (lio) {
> + /*
> + * If a request failed, just decrement
> + * the users count, but go on
> submitting
> + * subsequent requests.
> + */
> + atomic_dec(&lio->lio_users);
> + } else
> + break;
> + }
> + }
> + }
> +
> + if (lio) {
> + /*
> + * Drop extra ref on the lio now that we're done submitting
> + * requests
> + */
> + lio_check(lio);
> +
> + if (lio_wait) {
> + wait_event(ctx->wait, atomic_read(&lio->lio_users)==0);
> + kfree(lio);
> + }
> }
>
> +out_put_ctx:
> put_ioctx(ctx);
> return i ? i : ret;
> }
> Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h
> ===================================================================
> --- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17
> 11:20:32.000000000 +0100 +++
> linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17
> 11:21:31.000000000 +0100 @@ -43,6 +43,7 @@ enum { IOCB_CMD_NOOP = 6,
> IOCB_CMD_PREADV = 7,
> IOCB_CMD_PWRITEV = 8,
> + IOCB_CMD_GROUP = 9,
> };
>
> /* read() from /dev/aio returns these structures. */
> Index: linux-2.6.19-rc5-mm2/include/linux/aio.h
> ===================================================================
> --- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17
> 11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h
> 2006-11-17 11:21:31.000000000 +0100 @@ -58,6 +58,11 @@ struct aio_notify {
> struct sigqueue *sigq;
> };
>
> +struct lio_event {
> + atomic_t lio_users;
> + struct aio_notify lio_notify;
> +};
> +
> /* is there a better place to document function pointer methods? */
> /**
> * ki_retry - iocb forward progress callback
> @@ -113,6 +118,9 @@ struct kiocb {
> wait_queue_t ki_wait;
> loff_t ki_pos;
>
> + /* lio this iocb might be attached to */
> + struct lio_event *ki_lio;
> +
> void *private;
> /* State that we remember to be able to restart/retry */
> unsigned short ki_opcode;
> @@ -220,12 +228,13 @@ struct mm_struct;
> extern void FASTCALL(exit_aio(struct mm_struct *mm));
> extern struct kioctx *lookup_ioctx(unsigned long ctx_id);
> extern int FASTCALL(io_submit_one(struct kioctx *ctx,
> - struct iocb __user *user_iocb, struct iocb *iocb));
> + struct iocb __user *user_iocb, struct iocb
> *iocb,
> + struct lio_event *lio));
>
> /* semi private, but used by the 32bit emulations: */
> struct kioctx *lookup_ioctx(unsigned long ctx_id);
> int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> - struct iocb *iocb));
> + struct iocb *iocb, struct lio_event *lio));
>
> #define get_ioctx(kioctx) do { \
> BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-20 15:13 ` Zach Brown
@ 2006-11-21 10:40 ` Sébastien Dugué
[not found] ` <20061122104055.3d1c029a@frecb000686>
1 sibling, 0 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-21 10:40 UTC (permalink / raw)
To: Zach Brown
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
Zach Brown <zach.brown@oracle.com> wrote:
> Sébastien Dugué wrote:
> > AIO completion signal notification
>
> This is looking a lot better, thanks for keeping at it.
>
> > +static long aio_setup_sigevent(struct aio_notify *notify,
> > + struct sigevent __user *user_event)
> > +{
> > + sigevent_t event;
> > + struct task_struct *target;
> > +
> > + if (copy_from_user(&event, user_event, sizeof (event)))
> > + return -EFAULT;
>
> Last time we talked about this needing to call get_compat_sigevent(). I
> think it still needs to.
Yes, I could not find an elegant way to tackle this, so I left it
float for a while.
>
> I think we should avoid the examples set by the current
> compat_sys_io_submit() and get_compat_sigevent() callers. They copy
> translated data on to the userspace stack and pass it to the syscalls.
> That will get crazy for compat_sys_io_submit() because it would have to
> rewrite the iocb and the pointer to the iocb to get sys_io_submit() to
> find a copied sigevent on the stack.
Right, that's what I really wanted to avoid at all costs.
>
> I think the model is compat_do_readv_writev(). Hoist some of the
> syscall logic up into the compat layer so that one copying and
> translating pass is made instead of trying to fool the syscall logic
> into thinking that it's being called from a native word size caller.
Yes, but you end up duplicating a lot of code, but if it's the price to
pay, I'm all for it.
>
> So io_submit_one() should be given the kernel copies of the userspace
> structures it needs. sys_io_submit() will pass it the copies it made
> for native word size callers. compat_sys_io_submit() will pass in the
> copies it made after translating from 32bit arguments. io_submit_one()
> and lookup_kioctx() will have to be made available to kernel/compat.c
> (via linux/aio.h, surely.). aio_setup_sigevent() will be called from
> *_io_submit() and given the kernel sigevent, not a userspace pointer.
>
> Reworking things this way should have the added benefit of making 32/64
> sys_io_submit() more efficient than it is today.
I completely aggree. I will look into this.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
2006-11-20 15:13 ` Zach Brown
@ 2006-11-22 1:02 ` Andrew Morton
2006-11-23 8:28 ` Sébastien Dugué
1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-11-22 1:02 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Mon, 20 Nov 2006 15:22:52 +0100
S__bastien Dugu__ <sebastien.dugue@bull.net> wrote:
> +static long aio_setup_sigevent(struct aio_notify *notify,
> + struct sigevent __user *user_event)
> +{
> + sigevent_t event;
> + struct task_struct *target;
> +
> + if (copy_from_user(&event, user_event, sizeof (event)))
> + return -EFAULT;
> +
> + if (event.sigev_notify == SIGEV_NONE)
> + return 0;
> +
> + notify->notify = event.sigev_notify;
> + notify->signo = event.sigev_signo;
> + notify->value = event.sigev_value;
> +
> + read_lock(&tasklist_lock);
> + target = good_sigevent(&event);
> +
> + if (unlikely(!target || (target->flags & PF_EXITING)))
> + goto out_unlock;
> +
> + notify->target = target;
> +
> + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> + /*
> + * This reference will be dropped in really_put_req() when
> + * we're done with the request.
> + */
> + get_task_struct(target);
> + }
It worries me that this function can save away a task_struct* without
having taken a reference against it.
> + read_unlock(&tasklist_lock);
> +
> + /*
> + * NOTE: we cannot free the sigqueue in the completion path as
> + * the signal may not have been delivered to the target task.
> + * Therefore it has to be freed in __sigqueue_free() when the
> + * signal is collected if si_code is SI_ASYNCIO.
> + */
> + notify->sigq = sigqueue_alloc();
> +
> + if (unlikely(!notify->sigq))
> + return -EAGAIN;
> +
> + return 0;
> +
> +out_unlock:
> + read_unlock(&tasklist_lock);
> + return -EINVAL;
> +}
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
[not found] ` <20061122104055.3d1c029a@frecb000686>
@ 2006-11-22 10:22 ` Zach Brown
2006-11-23 8:24 ` Sébastien Dugué
0 siblings, 1 reply; 26+ messages in thread
From: Zach Brown @ 2006-11-22 10:22 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
> OK, looking at this, there's something bothering me: io_submit_one() needs
> a pointer to the user iocb in order to push back the iocb->ki_key to userspace,
> as well as storing the user_iocb pointer into iocb->ki_obj.
Why can't it continue to do what it does today? Both of those uses of
the user_iocb pointer involve fixed-width fields and don't need compat help.
> So I think that some of the logic in io_submit_one() must be moved up to
> sys_io_submit(), including the aio_get_req() call.
I don't see why that would be needed.
- z
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-22 10:22 ` Zach Brown
@ 2006-11-23 8:24 ` Sébastien Dugué
0 siblings, 0 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-23 8:24 UTC (permalink / raw)
To: Zach Brown
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
On Wed, 22 Nov 2006 02:22:15 -0800
Zach Brown <zach.brown@oracle.com> wrote:
> > OK, looking at this, there's something bothering me: io_submit_one() needs
> > a pointer to the user iocb in order to push back the iocb->ki_key to userspace,
> > as well as storing the user_iocb pointer into iocb->ki_obj.
>
> Why can't it continue to do what it does today? Both of those uses of
> the user_iocb pointer involve fixed-width fields and don't need compat help.
Aah, right, thanks.
>
> > So I think that some of the logic in io_submit_one() must be moved up to
> > sys_io_submit(), including the aio_get_req() call.
>
> I don't see why that would be needed.
Not anymore, indeed.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-22 1:02 ` Andrew Morton
@ 2006-11-23 8:28 ` Sébastien Dugué
2006-11-23 8:40 ` Andrew Morton
0 siblings, 1 reply; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-23 8:28 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Tue, 21 Nov 2006 17:02:28 -0800
Andrew Morton <akpm@osdl.org> wrote:
> On Mon, 20 Nov 2006 15:22:52 +0100
> S__bastien Dugu__ <sebastien.dugue@bull.net> wrote:
>
> > +static long aio_setup_sigevent(struct aio_notify *notify,
> > + struct sigevent __user *user_event)
> > +{
> > + sigevent_t event;
> > + struct task_struct *target;
> > +
> > + if (copy_from_user(&event, user_event, sizeof (event)))
> > + return -EFAULT;
> > +
> > + if (event.sigev_notify == SIGEV_NONE)
> > + return 0;
> > +
> > + notify->notify = event.sigev_notify;
> > + notify->signo = event.sigev_signo;
> > + notify->value = event.sigev_value;
> > +
> > + read_lock(&tasklist_lock);
> > + target = good_sigevent(&event);
> > +
> > + if (unlikely(!target || (target->flags & PF_EXITING)))
> > + goto out_unlock;
> > +
> > +
> > +
> > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> > + /*
> > + * This reference will be dropped in really_put_req() when
> > + * we're done with the request.
> > + */
> > + get_task_struct(target);
> > + }
>
> It worries me that this function can save away a task_struct* without
> having taken a reference against it.
>
OK. Does moving 'notify->target = target;' after the get_task_struct() will
do, or am I missing something more subtle?
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-23 8:28 ` Sébastien Dugué
@ 2006-11-23 8:40 ` Andrew Morton
2006-11-23 9:47 ` Sébastien Dugué
2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet
0 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2006-11-23 8:40 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Thu, 23 Nov 2006 09:28:05 +0100
Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> > > + /*
> > > + * This reference will be dropped in really_put_req() when
> > > + * we're done with the request.
> > > + */
> > > + get_task_struct(target);
> > > + }
> >
> > It worries me that this function can save away a task_struct* without
> > having taken a reference against it.
> >
>
> OK. Does moving 'notify->target = target;' after the get_task_struct() will
> do, or am I missing something more subtle?
Well it's your code - you tell me ;)
It is unsafe (and rather pointless) to be saving the address of some structure
which can be freed at any time.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-23 8:40 ` Andrew Morton
@ 2006-11-23 9:47 ` Sébastien Dugué
2006-11-23 10:14 ` Andrew Morton
2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet
1 sibling, 1 reply; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-23 9:47 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Thu, 23 Nov 2006 00:40:53 -0800
Andrew Morton <akpm@osdl.org> wrote:
> On Thu, 23 Nov 2006 09:28:05 +0100
> Sébastien Dugué <sebastien.dugue@bull.net> wrote:
>
> > > > + target = good_sigevent(&event);
> > > > +
> > > > + if (unlikely(!target || (target->flags & PF_EXITING)))
> > > > + goto out_unlock;
> > > > +
> > > > +
> > > > +
> > > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> > > > + /*
> > > > + * This reference will be dropped in really_put_req() when
> > > > + * we're done with the request.
> > > > + */
> > > > + get_task_struct(target);
> > > > + }
> > >
> > > It worries me that this function can save away a task_struct* without
> > > having taken a reference against it.
> > >
> >
> > OK. Does moving 'notify->target = target;' after the get_task_struct() will
> > do, or am I missing something more subtle?
>
> Well it's your code - you tell me ;)
>
> It is unsafe (and rather pointless) to be saving the address of some structure
> which can be freed at any time.
Sorry, I expressed myself quite badly. What I wanted to know is whether you
are worried with the task been freed between saving its pointer and getting a
ref on it (which is trivial to fix) or you are thinking of something deeper.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-23 9:47 ` Sébastien Dugué
@ 2006-11-23 10:14 ` Andrew Morton
2006-11-23 10:27 ` Sébastien Dugué
0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-11-23 10:14 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Thu, 23 Nov 2006 10:47:55 +0100
Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> On Thu, 23 Nov 2006 00:40:53 -0800
> Andrew Morton <akpm@osdl.org> wrote:
>
> > On Thu, 23 Nov 2006 09:28:05 +0100
> > Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> >
> > > > > + target = good_sigevent(&event);
> > > > > +
> > > > > + if (unlikely(!target || (target->flags & PF_EXITING)))
> > > > > + goto out_unlock;
> > > > > +
> > > > > +
> > > > > +
> > > > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> > > > > + /*
> > > > > + * This reference will be dropped in really_put_req() when
> > > > > + * we're done with the request.
> > > > > + */
> > > > > + get_task_struct(target);
> > > > > + }
> > > >
> > > > It worries me that this function can save away a task_struct* without
> > > > having taken a reference against it.
> > > >
> > >
> > > OK. Does moving 'notify->target = target;' after the get_task_struct() will
> > > do, or am I missing something more subtle?
> >
> > Well it's your code - you tell me ;)
> >
> > It is unsafe (and rather pointless) to be saving the address of some structure
> > which can be freed at any time.
>
> Sorry, I expressed myself quite badly. What I wanted to know is whether you
> are worried with the task been freed between saving its pointer and getting a
> ref on it (which is trivial to fix) or you are thinking of something deeper.
>
Look:
> + notify->target = target;
> +
> + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> + /*
> + * This reference will be dropped in really_put_req() when
> + * we're done with the request.
> + */
> + get_task_struct(target);
If that test fails, we've saved a pointer to the task_struct without having
taken a refreence on it.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification
2006-11-23 10:14 ` Andrew Morton
@ 2006-11-23 10:27 ` Sébastien Dugué
0 siblings, 0 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-23 10:27 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
On Thu, 23 Nov 2006 02:14:37 -0800
Andrew Morton <akpm@osdl.org> wrote:
[snip]
> Look:
>
> > + notify->target = target;
> > +
> > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) {
> > + /*
> > + * This reference will be dropped in really_put_req() when
> > + * we're done with the request.
> > + */
> > + get_task_struct(target);
>
> If that test fails, we've saved a pointer to the task_struct without having
> taken a refreence on it.
>
Aggreed, just wanted to make sure.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations
2006-11-23 8:40 ` Andrew Morton
2006-11-23 9:47 ` Sébastien Dugué
@ 2006-11-23 10:57 ` Eric Dumazet
2006-11-27 21:37 ` Andrew Morton
1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2006-11-23 10:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Daniel McNeil
[-- Attachment #1: Type: text/plain, Size: 766 bytes --]
On 32bits SMP platforms, 64bits i_size is protected by a seqcount
(i_size_seqcount).
When i_size is read or written, i_size_seqcount is read/written as well, so it
make sense to group these two fields together in the same cache line.
Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one
for i_size_seqcount). After, only one cache line is needed/ (dirtied on a
i_size change).
This patch moves i_size_seqcount next to i_size, and also moves i_version to
let offsetof(struct inode, i_size) being 0x40 instead of 0x3c (for 32bits
platforms).
For 64 bits platforms, i_size_seqcount doesnt exist, and the move of a 'long
i_version' should not introduce a new hole because of padding.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: move_i_size_seqcount.patch --]
[-- Type: text/plain, Size: 816 bytes --]
--- linux-2.6.19-rc6/include/linux/fs.h 2006-11-23 10:47:53.000000000 +0100
+++ linux-2.6.19-rc6-ed/include/linux/fs.h 2006-11-23 11:13:36.000000000 +0100
@@ -548,12 +548,15 @@ struct inode {
uid_t i_uid;
gid_t i_gid;
dev_t i_rdev;
+ unsigned long i_version;
loff_t i_size;
+#ifdef __NEED_I_SIZE_ORDERED
+ seqcount_t i_size_seqcount;
+#endif
struct timespec i_atime;
struct timespec i_mtime;
struct timespec i_ctime;
unsigned int i_blkbits;
- unsigned long i_version;
blkcnt_t i_blocks;
unsigned short i_bytes;
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
@@ -598,9 +601,6 @@ struct inode {
void *i_security;
#endif
void *i_private; /* fs or device private pointer */
-#ifdef __NEED_I_SIZE_ORDERED
- seqcount_t i_size_seqcount;
-#endif
};
/*
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
2006-11-21 10:35 ` Suparna Bhattacharya
@ 2006-11-27 13:39 ` Bharata B Rao
1 sibling, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2006-11-27 13:39 UTC (permalink / raw)
To: Sébastien Dugué
Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya,
Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion,
Ulrich Drepper
[-- Attachment #1: Type: text/plain, Size: 794 bytes --]
On 11/20/06, Sébastien Dugué <sebastien.dugue@bull.net> wrote:
>
> POSIX listio support
>
> This patch adds POSIX listio completion notification support. It builds
> on support provided by the aio signal notification patch and adds an
> IOCB_CMD_GROUP command to io_submit().
>
I have altered this patch to provide the listio support through a
separate syscall.
The code is in RFC stage at the moment and is only compile-tested on
i386. I have refrained from modifying unistd.h for now in order to
avoid touching too many files. I plan to do actual tests (which
involves user space changes to aio libraries) only if there is an
agreement with this syscall approach.
More details about the interface are in the patch itself.
Regards,
Bharata.
[-- Attachment #2: aio-listio-support.patch --]
[-- Type: text/x-patch, Size: 12055 bytes --]
This patch adds POSIX listio completion notification support. It builds
on support provided by the aio signal notification patch and adds a
system call lio_submit().
lio_submit() is similar to io_submit() except that it takes two more
argurments: mode and sigevent.
mode can be LIO_WAIT or LIO_NOWAIT. sigevent argument specificies
the sigevent notification mechanism on listio completion. lio_submit()
waits within the syscall if LIO_WAIT is specified.
A struct lio_event is added in include/linux/aio.h
A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
In lio_submit(), a lio_event is created in lio_create() which contains the
necessary information for signaling a thread (signal number, pid, notify type
and value) along with a count of requests attached to this event.
The following depicts the lio_event structure:
struct lio_event {
atomic_t lio_users;
struct aio_notify lio_notify;
};
lio_users holds an atomic counter of the number of requests attached to this
lio. It is incremented with each request submitted and decremented at each
request completion. When the counter reaches 0, we send the notification.
In aio_complete(), if the request is attached to an lio (ki_lio <> 0),
then lio_check() is called to decrement the lio_users count and eventually
signal the user process when all the requests in the group have completed.
Sebastien Dugue's listio patch has been modified to arrive at this patch.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Bharata B Rao <bharata.rao@gmail.com>
---
fs/aio.c | 181 +++++++++++++++++++++++++++++++++++++++++------
fs/compat.c | 29 +++++++
include/linux/aio.h | 13 ++-
include/linux/aio_abi.h | 5 +
include/linux/syscalls.h | 2
5 files changed, 204 insertions(+), 26 deletions(-)
diff -puN fs/aio.c~aio-listio-support fs/aio.c
--- linux-2.6.19-rc5-mm2/fs/aio.c~aio-listio-support 2006-11-23 19:05:33.000000000 +0530
+++ linux-2.6.19-rc5-mm2-bharata/fs/aio.c 2006-11-25 11:24:55.000000000 +0530
@@ -413,6 +413,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_ctx = ctx;
req->ki_cancel = NULL;
req->ki_retry = NULL;
+ req->ki_lio = NULL;
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
@@ -1010,6 +1011,59 @@ out_unlock:
return -EINVAL;
}
+static inline void lio_check(struct lio_event *lio)
+{
+ int ret;
+
+ ret = atomic_dec_and_test(&lio->lio_users);
+
+ if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
+ /* last one -> notify process */
+ if (aio_send_signal(&lio->lio_notify))
+ sigqueue_free(lio->lio_notify.sigq);
+ kfree(lio);
+ }
+}
+
+static struct lio_event *lio_create(struct sigevent __user *user_event,
+ int mode)
+{
+ int ret = 0;
+ struct lio_event *lio = NULL;
+
+ if (unlikely((mode == LIO_NOWAIT) && !user_event))
+ return lio;
+
+ lio = kzalloc(sizeof(*lio), GFP_KERNEL);
+
+ if (!lio)
+ return ERR_PTR(-EAGAIN);
+
+ /*
+ * Grab an initial ref on the lio to avoid races between
+ * submission and completion.
+ */
+ atomic_set(&lio->lio_users, 1);
+
+ lio->lio_notify.notify = SIGEV_NONE;
+
+ /* sigevent argument is ignored with LIO_WAIT */
+ if (user_event && (mode == LIO_NOWAIT)) {
+ /*
+ * User specified an event for this lio,
+ * he wants to be notified upon lio completion.
+ */
+ ret = aio_setup_sigevent(&lio->lio_notify, user_event);
+
+ if (ret) {
+ kfree(lio);
+ return ERR_PTR(ret);
+ }
+ }
+
+ return lio;
+}
+
/* aio_complete
* Called when the io request on the given iocb is complete.
* Returns true if this is the last user of the request. The
@@ -1058,6 +1112,8 @@ int fastcall aio_complete(struct kiocb *
* when the event got cancelled.
*/
if (kiocbIsCancelled(iocb)) {
+ if (iocb->ki_lio)
+ lio_check(iocb->ki_lio);
if (iocb->ki_notify.sigq)
sigqueue_free(iocb->ki_notify.sigq);
goto put_rq;
@@ -1100,6 +1156,14 @@ int fastcall aio_complete(struct kiocb *
sigqueue_free(iocb->ki_notify.sigq);
}
+ /*
+ * In case of listio, in addition to the optional per-iocb sigevent
+ * notification (as above), the listio as a whole can also generate
+ * a sigevent notification.
+ */
+ if (iocb->ki_lio)
+ lio_check(iocb->ki_lio);
+
pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried,
iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes);
put_rq:
@@ -1634,7 +1698,7 @@ static int aio_wake_function(wait_queue_
}
int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb)
+ struct iocb *iocb, struct lio_event *lio)
{
struct kiocb *req;
struct file *file;
@@ -1690,12 +1754,13 @@ int fastcall io_submit_one(struct kioctx
if (iocb->aio_sigeventp) {
ret = aio_setup_sigevent(&req->ki_notify,
- (struct sigevent __user *)(unsigned long)
- iocb->aio_sigeventp);
+ (struct sigevent __user *)(unsigned long)
+ iocb->aio_sigeventp);
if (ret)
goto out_put_req;
}
+ req->ki_lio = lio;
ret = aio_setup_iocb(req);
if (ret)
@@ -1723,6 +1788,48 @@ out_put_req:
return ret;
}
+static int io_submit_group(struct kioctx *ctx, long nr,
+ struct iocb __user * __user *iocbpp, struct lio_event *lio)
+{
+ int i;
+ long ret = 0;
+
+ /*
+ * AKPM: should this return a partial result if some of the IOs were
+ * successfully submitted?
+ */
+ for (i = 0; i < nr; i++) {
+ struct iocb __user *user_iocb;
+ struct iocb tmp;
+
+ if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (lio)
+ atomic_inc(&lio->lio_users);
+
+ ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+ if (ret) {
+ if (lio) {
+ /*
+ * In case of listio, continue with
+ * the subsequent requests
+ */
+ atomic_dec(&lio->lio_users);
+ } else
+ break;
+ }
+ }
+ return i ? i : ret;
+}
+
/* sys_io_submit:
* Queue the nr iocbs pointed to by iocbpp for processing. Returns
* the number of iocbs queued. May return -EINVAL if the aio_context
@@ -1740,7 +1847,6 @@ asmlinkage long sys_io_submit(aio_contex
{
struct kioctx *ctx;
long ret = 0;
- int i;
if (unlikely(nr < 0))
return -EINVAL;
@@ -1754,31 +1860,60 @@ asmlinkage long sys_io_submit(aio_contex
return -EINVAL;
}
- /*
- * AKPM: should this return a partial result if some of the IOs were
- * successfully submitted?
- */
- for (i=0; i<nr; i++) {
- struct iocb __user *user_iocb;
- struct iocb tmp;
+ ret = io_submit_group(ctx, nr, iocbpp, NULL);
- if (unlikely(__get_user(user_iocb, iocbpp + i))) {
- ret = -EFAULT;
- break;
- }
+ put_ioctx(ctx);
+ return ret;
+}
- if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
- ret = -EFAULT;
- break;
- }
+asmlinkage long sys_lio_submit(aio_context_t ctx_id, int mode, long nr,
+ struct iocb __user * __user *iocbpp, struct sigevent __user *event)
+{
+ struct kioctx *ctx;
+ struct lio_event *lio = NULL;
+ long ret = 0;
- ret = io_submit_one(ctx, user_iocb, &tmp);
- if (ret)
- break;
+ if (unlikely(nr < 0))
+ return -EINVAL;
+
+ if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
+ return -EFAULT;
+
+ ctx = lookup_ioctx(ctx_id);
+ if (unlikely(!ctx)) {
+ pr_debug("EINVAL: lio_submit: invalid context id\n");
+ return -EINVAL;
+ }
+
+ lio = lio_create(event, mode);
+
+ ret = PTR_ERR(lio);
+ if (IS_ERR(lio))
+ goto out_put_ctx;
+
+ ret = io_submit_group(ctx, nr, iocbpp, lio);
+
+ /* If we failed to submit even one request just return */
+ if (ret < 0 ) {
+ if (lio)
+ kfree(lio);
+ goto out_put_ctx;
+ }
+
+ /*
+ * Drop extra ref on the lio now that we're done submitting requests.
+ */
+ if (lio)
+ lio_check(lio);
+
+ if (mode == LIO_WAIT) {
+ wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0);
+ kfree(lio);
}
+out_put_ctx:
put_ioctx(ctx);
- return i ? i : ret;
+ return ret;
}
/* lookup_kiocb
diff -puN include/linux/aio_abi.h~aio-listio-support include/linux/aio_abi.h
--- linux-2.6.19-rc5-mm2/include/linux/aio_abi.h~aio-listio-support 2006-11-23 21:56:14.000000000 +0530
+++ linux-2.6.19-rc5-mm2-bharata/include/linux/aio_abi.h 2006-11-23 21:56:55.000000000 +0530
@@ -45,6 +45,11 @@ enum {
IOCB_CMD_PWRITEV = 8,
};
+enum {
+ LIO_WAIT = 0,
+ LIO_NOWAIT = 1,
+};
+
/* read() from /dev/aio returns these structures. */
struct io_event {
__u64 data; /* the data field from the iocb */
diff -puN include/linux/aio.h~aio-listio-support include/linux/aio.h
--- linux-2.6.19-rc5-mm2/include/linux/aio.h~aio-listio-support 2006-11-23 21:57:23.000000000 +0530
+++ linux-2.6.19-rc5-mm2-bharata/include/linux/aio.h 2006-11-23 22:01:11.000000000 +0530
@@ -58,6 +58,11 @@ struct aio_notify {
struct sigqueue *sigq;
};
+struct lio_event {
+ atomic_t lio_users;
+ struct aio_notify lio_notify;
+};
+
/* is there a better place to document function pointer methods? */
/**
* ki_retry - iocb forward progress callback
@@ -112,7 +117,8 @@ struct kiocb {
__u64 ki_user_data; /* user's data for completion */
wait_queue_t ki_wait;
loff_t ki_pos;
-
+ /* lio this iocb might be attached to */
+ struct lio_event *ki_lio;
void *private;
/* State that we remember to be able to restart/retry */
unsigned short ki_opcode;
@@ -220,12 +226,13 @@ struct mm_struct;
extern void FASTCALL(exit_aio(struct mm_struct *mm));
extern struct kioctx *lookup_ioctx(unsigned long ctx_id);
extern int FASTCALL(io_submit_one(struct kioctx *ctx,
- struct iocb __user *user_iocb, struct iocb *iocb));
+ struct iocb __user *user_iocb, struct iocb *iocb,
+ struct lio_event *lio));
/* semi private, but used by the 32bit emulations: */
struct kioctx *lookup_ioctx(unsigned long ctx_id);
int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb));
+ struct iocb *iocb, struct lio_event *lio));
#define get_ioctx(kioctx) do { \
BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
diff -puN fs/compat.c~aio-listio-support fs/compat.c
--- linux-2.6.19-rc5-mm2/fs/compat.c~aio-listio-support 2006-11-24 08:52:42.000000000 +0530
+++ linux-2.6.19-rc5-mm2-bharata/fs/compat.c 2006-11-24 09:53:56.000000000 +0530
@@ -678,6 +678,35 @@ compat_sys_io_submit(aio_context_t ctx_i
return ret;
}
+asmlinkage long
+compat_sys_lio_submit(aio_context_t ctx_id, int mode, int nr, u32 __user *iocb,
+ struct compat_sigevent __user *sig_user)
+{
+ struct iocb __user * __user *iocb64;
+ struct sigvent __user *event = NULL;
+ long ret;
+
+ if (unlikely(nr < 0))
+ return -EINVAL;
+
+ if (nr > MAX_AIO_SUBMITS)
+ nr = MAX_AIO_SUBMITS;
+
+ iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
+ ret = copy_iocb(nr, iocb, iocb64);
+ if (ret)
+ return ret;
+
+ if (sig_user) {
+ struct sigevent kevent;
+ event = compat_alloc_user_space(sizeof(struct sigevent));
+ if (get_compat_sigevent(&kevent, sig_user) ||
+ copy_to_user(event, &kevent, sizeof(struct sigevent)))
+ return -EFAULT;
+ }
+ return sys_lio_submit(ctx_id, mode, nr, iocb64, event);
+}
+
struct compat_ncp_mount_data {
compat_int_t version;
compat_uint_t ncp_fd;
diff -puN include/linux/syscalls.h~aio-listio-support include/linux/syscalls.h
--- linux-2.6.19-rc5-mm2/include/linux/syscalls.h~aio-listio-support 2006-11-24 09:42:44.000000000 +0530
+++ linux-2.6.19-rc5-mm2-bharata/include/linux/syscalls.h 2006-11-24 09:43:42.000000000 +0530
@@ -319,6 +319,8 @@ asmlinkage long sys_io_getevents(aio_con
struct timespec __user *timeout);
asmlinkage long sys_io_submit(aio_context_t, long,
struct iocb __user * __user *);
+asmlinkage long sys_lio_submit(aio_context_t, int, long,
+ struct iocb __user * __user *, struct sigevent __user *);
asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
struct io_event __user *result);
asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd,
_
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations
2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet
@ 2006-11-27 21:37 ` Andrew Morton
2006-11-27 21:52 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-11-27 21:37 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel, Daniel McNeil
On Thu, 23 Nov 2006 11:57:29 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:
> On 32bits SMP platforms, 64bits i_size is protected by a seqcount
> (i_size_seqcount).
>
> When i_size is read or written, i_size_seqcount is read/written as well, so it
> make sense to group these two fields together in the same cache line.
>
> Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one
> for i_size_seqcount). After, only one cache line is needed/ (dirtied on a
> i_size change).
I didn't understand that paragraph at all, really, so I took it out.
At present an i_size change will dirty one, two or three cachelines, most
likely one or two.
After your patch an i_size change will dirty one or two cachelines, most
likely one.
yes?
> This patch moves i_size_seqcount next to i_size, and also moves i_version to
> let offsetof(struct inode, i_size) being 0x40 instead of 0x3c (for 32bits
> platforms).
>
> For 64 bits platforms, i_size_seqcount doesnt exist, and the move of a 'long
> i_version' should not introduce a new hole because of padding.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations
2006-11-27 21:37 ` Andrew Morton
@ 2006-11-27 21:52 ` Eric Dumazet
2006-11-27 22:01 ` Andrew Morton
0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2006-11-27 21:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton a écrit :
> On Thu, 23 Nov 2006 11:57:29 +0100
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> On 32bits SMP platforms, 64bits i_size is protected by a seqcount
>> (i_size_seqcount).
>>
>> When i_size is read or written, i_size_seqcount is read/written as well, so it
>> make sense to group these two fields together in the same cache line.
>>
>> Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one
>> for i_size_seqcount). After, only one cache line is needed/ (dirtied on a
>> i_size change).
>
> I didn't understand that paragraph at all, really, so I took it out.
>
> At present an i_size change will dirty one, two or three cachelines, most
> likely one or two.
>
> After your patch an i_size change will dirty one or two cachelines, most
> likely one.
>
> yes?
nope
Before :
---------
offsetof(i_size) = 0x3C
i_size is 8 bytes, so i_size spans 2 cache lines (if 64 or 32 bytes cache lines)
and offsetof(i_size_seqcount) = 0x160, so a read of i_size (coupled with a
read of seqcount) needed 3 cache lines. A change of i_size dirtied 2 or 3
cache lines.
After :
--------
offsetof(i_size) = 0x40
offsetof(i_size_seqcount) = 0x48
One cache line 'only', reading or writing.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations
2006-11-27 21:52 ` Eric Dumazet
@ 2006-11-27 22:01 ` Andrew Morton
2006-11-28 20:31 ` Eric Dumazet
0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2006-11-27 22:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel
On Mon, 27 Nov 2006 22:52:04 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:
> > I didn't understand that paragraph at all, really, so I took it out.
> >
> > At present an i_size change will dirty one, two or three cachelines, most
> > likely one or two.
> >
> > After your patch an i_size change will dirty one or two cachelines, most
> > likely one.
> >
> > yes?
>
> nope
>
> Before :
> ---------
> offsetof(i_size) = 0x3C
>
> i_size is 8 bytes, so i_size spans 2 cache lines (if 64 or 32 bytes cache lines)
This all depends on the offset of the inode, and you don't know what that is.
offsetof(ext3_inode_info, vfs_inode) != offsetof(nfs_inode, vfs_inode), etc.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations
2006-11-27 22:01 ` Andrew Morton
@ 2006-11-28 20:31 ` Eric Dumazet
0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2006-11-28 20:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton a écrit :
> This all depends on the offset of the inode, and you don't know what that is.
>
> offsetof(ext3_inode_info, vfs_inode) != offsetof(nfs_inode, vfs_inode), etc.
Doh... yes you are absolutly right :) I feel dumb now :(
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2006-11-28 20:31 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
2006-11-20 13:31 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué
2006-11-20 13:42 ` Zach Brown
2006-11-20 21:45 ` Ulrich Drepper
2006-11-20 14:02 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
2006-11-20 15:13 ` Zach Brown
2006-11-21 10:40 ` Sébastien Dugué
[not found] ` <20061122104055.3d1c029a@frecb000686>
2006-11-22 10:22 ` Zach Brown
2006-11-23 8:24 ` Sébastien Dugué
2006-11-22 1:02 ` Andrew Morton
2006-11-23 8:28 ` Sébastien Dugué
2006-11-23 8:40 ` Andrew Morton
2006-11-23 9:47 ` Sébastien Dugué
2006-11-23 10:14 ` Andrew Morton
2006-11-23 10:27 ` Sébastien Dugué
2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet
2006-11-27 21:37 ` Andrew Morton
2006-11-27 21:52 ` Eric Dumazet
2006-11-27 22:01 ` Andrew Morton
2006-11-28 20:31 ` Eric Dumazet
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
2006-11-21 10:35 ` Suparna Bhattacharya
2006-11-27 13:39 ` Bharata B Rao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox