* [PATCH -mm 0/4][AIO] - AIO completion signal notification v2
@ 2006-11-20 14:17 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Sébastien Dugué @ 2006-11-20 14:17 UTC (permalink / raw)
To: linux-kernel
Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig,
Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper
Hi
Here is the latest rework of the AIO completion signal notification patches.
This set consists in 4 patches:
1. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h
2. export-good_sigevent: move good_sigevent into signal.c and export it
3. aio-notify-sig: the AIO completion signal notification
4. listio: adds listio support
Description are in the individual patches.
Changes from v1:
- cleanups suggested by Christoph Hellwig, Badari Pulavarty and Zach Brown
- added lisio patch
Comments are welcome as usual.
Thanks,
Sébastien.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH -mm 1/4][AIO] - fix aio.h includes 2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué @ 2006-11-20 14:22 ` Sébastien Dugué 2006-11-20 13:31 ` Zach Brown 2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué ` (2 subsequent siblings) 3 siblings, 1 reply; 26+ messages in thread From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw) To: linux-kernel Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Fix the double inclusion of linux/uio.h in linux/aio.h aio.h | 1 - 1 file changed, 1 deletion(-) Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Index: linux-2.6.19-rc5-mm2/include/linux/aio.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h 2006-11-17 11:20:27.000000000 +0100 @@ -7,7 +7,6 @@ #include <linux/uio.h> #include <asm/atomic.h> -#include <linux/uio.h> #define AIO_MAXSEGS 4 #define AIO_KIOGRP_NR_ATOMIC 8 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 1/4][AIO] - fix aio.h includes 2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué @ 2006-11-20 13:31 ` Zach Brown 0 siblings, 0 replies; 26+ messages in thread From: Zach Brown @ 2006-11-20 13:31 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Sébastien Dugué wrote: > Fix the double inclusion of linux/uio.h in linux/aio.h Sure seems reasonable to me. > Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Signed-off-by: Zach Brown <zach.brown@oracle.com> - z ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent() 2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué 2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué @ 2006-11-20 14:22 ` Sébastien Dugué 2006-11-20 13:42 ` Zach Brown 2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué 2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué 3 siblings, 1 reply; 26+ messages in thread From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw) To: linux-kernel Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Export good_sigevent() Move good_sigevent() from posix-timers.c to signal.c where it belongs, and export it so that it can be used by other subsystems. include/linux/signal.h | 2 ++ kernel/posix-timers.c | 17 ----------------- kernel/signal.c | 23 +++++++++++++++++++++++ 3 files changed, 25 insertions(+), 17 deletions(-) Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Index: linux-2.6.19-rc5-mm2/include/linux/signal.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/signal.h 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/signal.h 2006-11-17 11:20:31.000000000 +0100 @@ -241,6 +241,8 @@ extern int sigprocmask(int, sigset_t *, struct pt_regs; extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie); +extern struct task_struct * good_sigevent(sigevent_t *); + #endif /* __KERNEL__ */ #endif /* _LINUX_SIGNAL_H */ Index: linux-2.6.19-rc5-mm2/kernel/posix-timers.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/kernel/posix-timers.c 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/posix-timers.c 2006-11-17 11:20:31.000000000 +0100 @@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_ return ret; } -static struct task_struct * good_sigevent(sigevent_t * event) -{ - struct task_struct *rtn = current->group_leader; - - if ((event->sigev_notify & SIGEV_THREAD_ID ) && - (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) || - rtn->tgid != current->tgid || - (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)) - return NULL; - - if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) && - ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX))) - return NULL; - - return rtn; -} - void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock) { if ((unsigned) clock_id >= MAX_CLOCKS) { Index: linux-2.6.19-rc5-mm2/kernel/signal.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/kernel/signal.c 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/signal.c 2006-11-17 11:20:31.000000000 +0100 @@ -1189,6 +1189,29 @@ int group_send_sig_info(int sig, struct return ret; } +/*** + * good_sigevent - check and get target task from a sigevent. + * @event: the sigevent to be checked + * + * This function must be called with tasklist_lock held for reading. + */ +struct task_struct * good_sigevent(sigevent_t * event) +{ + struct task_struct *rtn = current->group_leader; + + if ((event->sigev_notify & SIGEV_THREAD_ID ) && + (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) || + rtn->tgid != current->tgid || + (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)) + return NULL; + + if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) && + ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX))) + return NULL; + + return rtn; +} + /* * kill_pgrp_info() sends a signal to a process group: this is what the tty * control characters do (^C, ^Z etc) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent() 2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué @ 2006-11-20 13:42 ` Zach Brown 2006-11-20 21:45 ` Ulrich Drepper 0 siblings, 1 reply; 26+ messages in thread From: Zach Brown @ 2006-11-20 13:42 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper > +extern struct task_struct * good_sigevent(sigevent_t *); I wonder if "good_sigevent()" isn't a very strong name for something that is now up in signal.h. Maybe "sigevent_find_task()"? - z ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent() 2006-11-20 13:42 ` Zach Brown @ 2006-11-20 21:45 ` Ulrich Drepper 2006-11-20 14:02 ` Zach Brown 0 siblings, 1 reply; 26+ messages in thread From: Ulrich Drepper @ 2006-11-20 21:45 UTC (permalink / raw) To: Zach Brown Cc: Sébastien Dugué, linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion Zach Brown wrote: > I wonder if "good_sigevent()" isn't a very strong name for something > that is now up in signal.h. Maybe "sigevent_find_task()"? That's not the purpose, it's just one of the checks which need to be done to check that the sigevent value is "good". I think the name is fine. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 2/4][AIO] - export good_sigevent() 2006-11-20 21:45 ` Ulrich Drepper @ 2006-11-20 14:02 ` Zach Brown 0 siblings, 0 replies; 26+ messages in thread From: Zach Brown @ 2006-11-20 14:02 UTC (permalink / raw) To: Ulrich Drepper Cc: Sébastien Dugué, linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion Ulrich Drepper wrote: > Zach Brown wrote: >> I wonder if "good_sigevent()" isn't a very strong name for something >> that is now up in signal.h. Maybe "sigevent_find_task()"? > > That's not the purpose, it's just one of the checks which need to be > done to check that the sigevent value is "good". Hmm? It returns a task_struct *, potentially after finding it with find_task_by_pid(). That said, I don't particularly care much. It just jumped out as I was reading it. - z ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué 2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué 2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué @ 2006-11-20 14:22 ` Sébastien Dugué 2006-11-20 15:13 ` Zach Brown 2006-11-22 1:02 ` Andrew Morton 2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué 3 siblings, 2 replies; 26+ messages in thread From: Sébastien Dugué @ 2006-11-20 14:22 UTC (permalink / raw) To: linux-kernel Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper AIO completion signal notification The current 2.6 kernel does not support notification of user space via an RT signal upon an asynchronous IO completion. The POSIX specification states that when an AIO request completes, a signal can be delivered to the application as notification. This patch adds a struct sigevent *aio_sigeventp to the iocb. The relevant fields (pid, signal number and value) are stored in the kiocb for use when the request completes. That sigevent structure is filled by the application as part of the AIO request preparation. Upon request completion, the kernel notifies the application using those sigevent parameters. If SIGEV_NONE has been specified, then the old behaviour is retained and the application must rely on polling the completion queue using io_getevents(). A struct sigevent *aio_sigeventp field is added to struct iocb in include/linux/aio_abi.h A struct aio_notify containing the sigevent parameters is defined in aio.h: struct aio_notify { struct task_struct *target; __u16 signo; __u16 notify; sigval_t value; }; A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h In io_submit_one(), if the application provided a sigevent then setup_sigevent() is called which does the following: - check access to the user sigevent and make a local copy - if the requested notification is SIGEV_NONE, then nothing to do - fill in the kiocb->ki_notify fields (notify, signo, value) - check sigevent consistency, get the signal target task and save it in kiocb->ki_notify.target - preallocate a sigqueue for this event using sigqueue_alloc() Upon request completion, in aio_complete(), if notification is needed for this request (iocb->ki_notify.notify != SIGEV_NONE), then aio_send_signal() is called to signal the target task as follows: - fill in the siginfo struct to be sent to the task - if notify is SIGEV_THREAD_ID then send signal to specific task using send_sigqueue() - else send signal to task group using send_5group_sigqueue() Notes concerning sigqueue preallocation: To ensure reliable delivery of completion notification, the sigqueue is preallocated in the submission path so that there is no chance it can fail in the completion path. Unlike the posix-timers case (currently the single other user of sigqueue preallocation), where the sigqueue is allocated for the lifetime of the timer and freed at timer destruction time, the aio case is a bit more tricky due to the async nature of the whole thing. In the aio case, the sigqueue exists for the lifetime of the request, therefore it must be freed only once the signal for the request completion has been delivered. This involves changing __sigqueue_free() to free the sigqueue when the signal is collected if si_code is SI_ASYNCIO even if it was preallocated as well as explicitly calling sigqueue_free() in submission and completion error paths. fs/aio.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++-- include/linux/aio.h | 12 ++++ include/linux/aio_abi.h | 3 - kernel/signal.c | 2 4 files changed, 127 insertions(+), 6 deletions(-) Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Index: linux-2.6.19-rc5-mm2/fs/aio.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 11:20:32.000000000 +0100 @@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_ req->ki_dtor = NULL; req->private = NULL; req->ki_iovec = NULL; + req->ki_notify.sigq = NULL; INIT_LIST_HEAD(&req->ki_run_list); /* Check if the completion queue has enough free space to @@ -463,6 +464,11 @@ static inline void really_put_req(struct req->ki_dtor(req); if (req->ki_iovec != &req->ki_inline_vec) kfree(req->ki_iovec); + + /* Release task ref */ + if (req->ki_notify.notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) + put_task_struct(req->ki_notify.target); + kmem_cache_free(kiocb_cachep, req); ctx->reqs_active--; @@ -929,6 +935,81 @@ void fastcall kick_iocb(struct kiocb *io } EXPORT_SYMBOL(kick_iocb); +static int aio_send_signal(struct aio_notify *notify) +{ + struct sigqueue *sigq = notify->sigq; + struct siginfo *info = &sigq->info; + int ret; + + memset(info, 0, sizeof(struct siginfo)); + + info->si_signo = notify->signo; + info->si_errno = 0; + info->si_code = SI_ASYNCIO; + info->si_pid = 0; + info->si_uid = 0; + info->si_value = notify->value; + + if (notify->notify & SIGEV_THREAD_ID) + ret = send_sigqueue(notify->signo, sigq, notify->target); + else + ret = send_group_sigqueue(notify->signo, sigq, notify->target); + + return ret; +} + +static long aio_setup_sigevent(struct aio_notify *notify, + struct sigevent __user *user_event) +{ + sigevent_t event; + struct task_struct *target; + + if (copy_from_user(&event, user_event, sizeof (event))) + return -EFAULT; + + if (event.sigev_notify == SIGEV_NONE) + return 0; + + notify->notify = event.sigev_notify; + notify->signo = event.sigev_signo; + notify->value = event.sigev_value; + + read_lock(&tasklist_lock); + target = good_sigevent(&event); + + if (unlikely(!target || (target->flags & PF_EXITING))) + goto out_unlock; + + notify->target = target; + + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { + /* + * This reference will be dropped in really_put_req() when + * we're done with the request. + */ + get_task_struct(target); + } + + read_unlock(&tasklist_lock); + + /* + * NOTE: we cannot free the sigqueue in the completion path as + * the signal may not have been delivered to the target task. + * Therefore it has to be freed in __sigqueue_free() when the + * signal is collected if si_code is SI_ASYNCIO. + */ + notify->sigq = sigqueue_alloc(); + + if (unlikely(!notify->sigq)) + return -EAGAIN; + + return 0; + +out_unlock: + read_unlock(&tasklist_lock); + return -EINVAL; +} + /* aio_complete * Called when the io request on the given iocb is complete. * Returns true if this is the last user of the request. The @@ -976,8 +1057,11 @@ int fastcall aio_complete(struct kiocb * * cancelled requests don't get events, userland was given one * when the event got cancelled. */ - if (kiocbIsCancelled(iocb)) + if (kiocbIsCancelled(iocb)) { + if (iocb->ki_notify.sigq) + sigqueue_free(iocb->ki_notify.sigq); goto put_rq; + } ring = kmap_atomic(info->ring_pages[0], KM_IRQ1); @@ -1008,6 +1092,14 @@ int fastcall aio_complete(struct kiocb * pr_debug("added to ring %p at [%lu]\n", iocb, tail); + if (iocb->ki_notify.notify != SIGEV_NONE) { + ret = aio_send_signal(&iocb->ki_notify); + + /* If signal generation failed, release the sigqueue */ + if (ret) + sigqueue_free(iocb->ki_notify.sigq); + } + pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried, iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes); put_rq: @@ -1549,8 +1641,7 @@ int fastcall io_submit_one(struct kioctx ssize_t ret; /* enforce forwards compatibility on users */ - if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2 || - iocb->aio_reserved3)) { + if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved3)) { pr_debug("EINVAL: io_submit: reserve field set\n"); return -EINVAL; } @@ -1559,6 +1650,7 @@ int fastcall io_submit_one(struct kioctx if (unlikely( (iocb->aio_buf != (unsigned long)iocb->aio_buf) || (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) || + (iocb->aio_sigeventp != (unsigned long)iocb->aio_sigeventp) || ((ssize_t)iocb->aio_nbytes < 0) )) { pr_debug("EINVAL: io_submit: overflow check\n"); @@ -1593,10 +1685,21 @@ int fastcall io_submit_one(struct kioctx INIT_LIST_HEAD(&req->ki_wait.task_list); req->ki_retried = 0; + /* handle setting up the sigevent for POSIX AIO signals */ + req->ki_notify.notify = SIGEV_NONE; + + if (iocb->aio_sigeventp) { + ret = aio_setup_sigevent(&req->ki_notify, + (struct sigevent __user *)(unsigned long) + iocb->aio_sigeventp); + if (ret) + goto out_put_req; + } + ret = aio_setup_iocb(req); if (ret) - goto out_put_req; + goto out_sigqfree; spin_lock_irq(&ctx->ctx_lock); aio_run_iocb(req); @@ -1609,6 +1712,11 @@ int fastcall io_submit_one(struct kioctx aio_put_req(req); /* drop extra ref to req */ return 0; +out_sigqfree: + /* Undo the sigqueue alloc if someting went bad */ + if (req->ki_notify.sigq) + sigqueue_free(req->ki_notify.sigq); + out_put_req: aio_put_req(req); /* drop extra ref to req */ aio_put_req(req); /* drop i/o ref to req */ Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17 11:20:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17 11:20:32.000000000 +0100 @@ -82,8 +82,9 @@ struct iocb { __u64 aio_nbytes; __s64 aio_offset; + __u64 aio_sigeventp; /* pointer to struct sigevent */ + /* extra parameters */ - __u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */ __u64 aio_reserved3; }; /* 64 bytes */ Index: linux-2.6.19-rc5-mm2/include/linux/aio.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17 11:20:27.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h 2006-11-17 11:20:32.000000000 +0100 @@ -7,6 +7,7 @@ #include <linux/uio.h> #include <asm/atomic.h> +#include <asm/siginfo.h> #define AIO_MAXSEGS 4 #define AIO_KIOGRP_NR_ATOMIC 8 @@ -49,6 +50,14 @@ struct kioctx; #define kiocbIsKicked(iocb) test_bit(KIF_KICKED, &(iocb)->ki_flags) #define kiocbIsCancelled(iocb) test_bit(KIF_CANCELLED, &(iocb)->ki_flags) +struct aio_notify { + struct task_struct *target; + __u16 signo; + __u16 notify; + sigval_t value; + struct sigqueue *sigq; +}; + /* is there a better place to document function pointer methods? */ /** * ki_retry - iocb forward progress callback @@ -118,6 +127,9 @@ struct kiocb { struct list_head ki_list; /* the aio core uses this * for cancellation */ + + /* to notify a process on I/O event */ + struct aio_notify ki_notify; }; #define is_sync_kiocb(iocb) ((iocb)->ki_key == KIOCB_SYNC_KEY) Index: linux-2.6.19-rc5-mm2/kernel/signal.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/kernel/signal.c 2006-11-17 11:20:31.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/signal.c 2006-11-17 11:20:32.000000000 +0100 @@ -297,7 +297,7 @@ static struct sigqueue *__sigqueue_alloc static void __sigqueue_free(struct sigqueue *q) { - if (q->flags & SIGQUEUE_PREALLOC) + if (q->flags & SIGQUEUE_PREALLOC && q->info.si_code != SI_ASYNCIO) return; atomic_dec(&q->user->sigpending); free_uid(q->user); ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué @ 2006-11-20 15:13 ` Zach Brown 2006-11-21 10:40 ` Sébastien Dugué [not found] ` <20061122104055.3d1c029a@frecb000686> 2006-11-22 1:02 ` Andrew Morton 1 sibling, 2 replies; 26+ messages in thread From: Zach Brown @ 2006-11-20 15:13 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Sébastien Dugué wrote: > AIO completion signal notification This is looking a lot better, thanks for keeping at it. > +static long aio_setup_sigevent(struct aio_notify *notify, > + struct sigevent __user *user_event) > +{ > + sigevent_t event; > + struct task_struct *target; > + > + if (copy_from_user(&event, user_event, sizeof (event))) > + return -EFAULT; Last time we talked about this needing to call get_compat_sigevent(). I think it still needs to. I think we should avoid the examples set by the current compat_sys_io_submit() and get_compat_sigevent() callers. They copy translated data on to the userspace stack and pass it to the syscalls. That will get crazy for compat_sys_io_submit() because it would have to rewrite the iocb and the pointer to the iocb to get sys_io_submit() to find a copied sigevent on the stack. I think the model is compat_do_readv_writev(). Hoist some of the syscall logic up into the compat layer so that one copying and translating pass is made instead of trying to fool the syscall logic into thinking that it's being called from a native word size caller. So io_submit_one() should be given the kernel copies of the userspace structures it needs. sys_io_submit() will pass it the copies it made for native word size callers. compat_sys_io_submit() will pass in the copies it made after translating from 32bit arguments. io_submit_one() and lookup_kioctx() will have to be made available to kernel/compat.c (via linux/aio.h, surely.). aio_setup_sigevent() will be called from *_io_submit() and given the kernel sigevent, not a userspace pointer. Reworking things this way should have the added benefit of making 32/64 sys_io_submit() more efficient than it is today. - z ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-20 15:13 ` Zach Brown @ 2006-11-21 10:40 ` Sébastien Dugué [not found] ` <20061122104055.3d1c029a@frecb000686> 1 sibling, 0 replies; 26+ messages in thread From: Sébastien Dugué @ 2006-11-21 10:40 UTC (permalink / raw) To: Zach Brown Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Zach Brown <zach.brown@oracle.com> wrote: > Sébastien Dugué wrote: > > AIO completion signal notification > > This is looking a lot better, thanks for keeping at it. > > > +static long aio_setup_sigevent(struct aio_notify *notify, > > + struct sigevent __user *user_event) > > +{ > > + sigevent_t event; > > + struct task_struct *target; > > + > > + if (copy_from_user(&event, user_event, sizeof (event))) > > + return -EFAULT; > > Last time we talked about this needing to call get_compat_sigevent(). I > think it still needs to. Yes, I could not find an elegant way to tackle this, so I left it float for a while. > > I think we should avoid the examples set by the current > compat_sys_io_submit() and get_compat_sigevent() callers. They copy > translated data on to the userspace stack and pass it to the syscalls. > That will get crazy for compat_sys_io_submit() because it would have to > rewrite the iocb and the pointer to the iocb to get sys_io_submit() to > find a copied sigevent on the stack. Right, that's what I really wanted to avoid at all costs. > > I think the model is compat_do_readv_writev(). Hoist some of the > syscall logic up into the compat layer so that one copying and > translating pass is made instead of trying to fool the syscall logic > into thinking that it's being called from a native word size caller. Yes, but you end up duplicating a lot of code, but if it's the price to pay, I'm all for it. > > So io_submit_one() should be given the kernel copies of the userspace > structures it needs. sys_io_submit() will pass it the copies it made > for native word size callers. compat_sys_io_submit() will pass in the > copies it made after translating from 32bit arguments. io_submit_one() > and lookup_kioctx() will have to be made available to kernel/compat.c > (via linux/aio.h, surely.). aio_setup_sigevent() will be called from > *_io_submit() and given the kernel sigevent, not a userspace pointer. > > Reworking things this way should have the added benefit of making 32/64 > sys_io_submit() more efficient than it is today. I completely aggree. I will look into this. Thanks, Sébastien. ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <20061122104055.3d1c029a@frecb000686>]
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification [not found] ` <20061122104055.3d1c029a@frecb000686> @ 2006-11-22 10:22 ` Zach Brown 2006-11-23 8:24 ` Sébastien Dugué 0 siblings, 1 reply; 26+ messages in thread From: Zach Brown @ 2006-11-22 10:22 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper > OK, looking at this, there's something bothering me: io_submit_one() needs > a pointer to the user iocb in order to push back the iocb->ki_key to userspace, > as well as storing the user_iocb pointer into iocb->ki_obj. Why can't it continue to do what it does today? Both of those uses of the user_iocb pointer involve fixed-width fields and don't need compat help. > So I think that some of the logic in io_submit_one() must be moved up to > sys_io_submit(), including the aio_get_req() call. I don't see why that would be needed. - z ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-22 10:22 ` Zach Brown @ 2006-11-23 8:24 ` Sébastien Dugué 0 siblings, 0 replies; 26+ messages in thread From: Sébastien Dugué @ 2006-11-23 8:24 UTC (permalink / raw) To: Zach Brown Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Wed, 22 Nov 2006 02:22:15 -0800 Zach Brown <zach.brown@oracle.com> wrote: > > OK, looking at this, there's something bothering me: io_submit_one() needs > > a pointer to the user iocb in order to push back the iocb->ki_key to userspace, > > as well as storing the user_iocb pointer into iocb->ki_obj. > > Why can't it continue to do what it does today? Both of those uses of > the user_iocb pointer involve fixed-width fields and don't need compat help. Aah, right, thanks. > > > So I think that some of the logic in io_submit_one() must be moved up to > > sys_io_submit(), including the aio_get_req() call. > > I don't see why that would be needed. Not anymore, indeed. Thanks, Sébastien. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué 2006-11-20 15:13 ` Zach Brown @ 2006-11-22 1:02 ` Andrew Morton 2006-11-23 8:28 ` Sébastien Dugué 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2006-11-22 1:02 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Mon, 20 Nov 2006 15:22:52 +0100 S__bastien Dugu__ <sebastien.dugue@bull.net> wrote: > +static long aio_setup_sigevent(struct aio_notify *notify, > + struct sigevent __user *user_event) > +{ > + sigevent_t event; > + struct task_struct *target; > + > + if (copy_from_user(&event, user_event, sizeof (event))) > + return -EFAULT; > + > + if (event.sigev_notify == SIGEV_NONE) > + return 0; > + > + notify->notify = event.sigev_notify; > + notify->signo = event.sigev_signo; > + notify->value = event.sigev_value; > + > + read_lock(&tasklist_lock); > + target = good_sigevent(&event); > + > + if (unlikely(!target || (target->flags & PF_EXITING))) > + goto out_unlock; > + > + notify->target = target; > + > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > + /* > + * This reference will be dropped in really_put_req() when > + * we're done with the request. > + */ > + get_task_struct(target); > + } It worries me that this function can save away a task_struct* without having taken a reference against it. > + read_unlock(&tasklist_lock); > + > + /* > + * NOTE: we cannot free the sigqueue in the completion path as > + * the signal may not have been delivered to the target task. > + * Therefore it has to be freed in __sigqueue_free() when the > + * signal is collected if si_code is SI_ASYNCIO. > + */ > + notify->sigq = sigqueue_alloc(); > + > + if (unlikely(!notify->sigq)) > + return -EAGAIN; > + > + return 0; > + > +out_unlock: > + read_unlock(&tasklist_lock); > + return -EINVAL; > +} ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-22 1:02 ` Andrew Morton @ 2006-11-23 8:28 ` Sébastien Dugué 2006-11-23 8:40 ` Andrew Morton 0 siblings, 1 reply; 26+ messages in thread From: Sébastien Dugué @ 2006-11-23 8:28 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Tue, 21 Nov 2006 17:02:28 -0800 Andrew Morton <akpm@osdl.org> wrote: > On Mon, 20 Nov 2006 15:22:52 +0100 > S__bastien Dugu__ <sebastien.dugue@bull.net> wrote: > > > +static long aio_setup_sigevent(struct aio_notify *notify, > > + struct sigevent __user *user_event) > > +{ > > + sigevent_t event; > > + struct task_struct *target; > > + > > + if (copy_from_user(&event, user_event, sizeof (event))) > > + return -EFAULT; > > + > > + if (event.sigev_notify == SIGEV_NONE) > > + return 0; > > + > > + notify->notify = event.sigev_notify; > > + notify->signo = event.sigev_signo; > > + notify->value = event.sigev_value; > > + > > + read_lock(&tasklist_lock); > > + target = good_sigevent(&event); > > + > > + if (unlikely(!target || (target->flags & PF_EXITING))) > > + goto out_unlock; > > + > > + > > + > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > > + /* > > + * This reference will be dropped in really_put_req() when > > + * we're done with the request. > > + */ > > + get_task_struct(target); > > + } > > It worries me that this function can save away a task_struct* without > having taken a reference against it. > OK. Does moving 'notify->target = target;' after the get_task_struct() will do, or am I missing something more subtle? Thanks, Sébastien. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-23 8:28 ` Sébastien Dugué @ 2006-11-23 8:40 ` Andrew Morton 2006-11-23 9:47 ` Sébastien Dugué 2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet 0 siblings, 2 replies; 26+ messages in thread From: Andrew Morton @ 2006-11-23 8:40 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Thu, 23 Nov 2006 09:28:05 +0100 Sébastien Dugué <sebastien.dugue@bull.net> wrote: > > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > > > + /* > > > + * This reference will be dropped in really_put_req() when > > > + * we're done with the request. > > > + */ > > > + get_task_struct(target); > > > + } > > > > It worries me that this function can save away a task_struct* without > > having taken a reference against it. > > > > OK. Does moving 'notify->target = target;' after the get_task_struct() will > do, or am I missing something more subtle? Well it's your code - you tell me ;) It is unsafe (and rather pointless) to be saving the address of some structure which can be freed at any time. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-23 8:40 ` Andrew Morton @ 2006-11-23 9:47 ` Sébastien Dugué 2006-11-23 10:14 ` Andrew Morton 2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet 1 sibling, 1 reply; 26+ messages in thread From: Sébastien Dugué @ 2006-11-23 9:47 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Thu, 23 Nov 2006 00:40:53 -0800 Andrew Morton <akpm@osdl.org> wrote: > On Thu, 23 Nov 2006 09:28:05 +0100 > Sébastien Dugué <sebastien.dugue@bull.net> wrote: > > > > > + target = good_sigevent(&event); > > > > + > > > > + if (unlikely(!target || (target->flags & PF_EXITING))) > > > > + goto out_unlock; > > > > + > > > > + > > > > + > > > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > > > > + /* > > > > + * This reference will be dropped in really_put_req() when > > > > + * we're done with the request. > > > > + */ > > > > + get_task_struct(target); > > > > + } > > > > > > It worries me that this function can save away a task_struct* without > > > having taken a reference against it. > > > > > > > OK. Does moving 'notify->target = target;' after the get_task_struct() will > > do, or am I missing something more subtle? > > Well it's your code - you tell me ;) > > It is unsafe (and rather pointless) to be saving the address of some structure > which can be freed at any time. Sorry, I expressed myself quite badly. What I wanted to know is whether you are worried with the task been freed between saving its pointer and getting a ref on it (which is trivial to fix) or you are thinking of something deeper. Thanks, Sébastien. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-23 9:47 ` Sébastien Dugué @ 2006-11-23 10:14 ` Andrew Morton 2006-11-23 10:27 ` Sébastien Dugué 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2006-11-23 10:14 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Thu, 23 Nov 2006 10:47:55 +0100 Sébastien Dugué <sebastien.dugue@bull.net> wrote: > On Thu, 23 Nov 2006 00:40:53 -0800 > Andrew Morton <akpm@osdl.org> wrote: > > > On Thu, 23 Nov 2006 09:28:05 +0100 > > Sébastien Dugué <sebastien.dugue@bull.net> wrote: > > > > > > > + target = good_sigevent(&event); > > > > > + > > > > > + if (unlikely(!target || (target->flags & PF_EXITING))) > > > > > + goto out_unlock; > > > > > + > > > > > + > > > > > + > > > > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > > > > > + /* > > > > > + * This reference will be dropped in really_put_req() when > > > > > + * we're done with the request. > > > > > + */ > > > > > + get_task_struct(target); > > > > > + } > > > > > > > > It worries me that this function can save away a task_struct* without > > > > having taken a reference against it. > > > > > > > > > > OK. Does moving 'notify->target = target;' after the get_task_struct() will > > > do, or am I missing something more subtle? > > > > Well it's your code - you tell me ;) > > > > It is unsafe (and rather pointless) to be saving the address of some structure > > which can be freed at any time. > > Sorry, I expressed myself quite badly. What I wanted to know is whether you > are worried with the task been freed between saving its pointer and getting a > ref on it (which is trivial to fix) or you are thinking of something deeper. > Look: > + notify->target = target; > + > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > + /* > + * This reference will be dropped in really_put_req() when > + * we're done with the request. > + */ > + get_task_struct(target); If that test fails, we've saved a pointer to the task_struct without having taken a refreence on it. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 3/4][AIO] - AIO completion signal notification 2006-11-23 10:14 ` Andrew Morton @ 2006-11-23 10:27 ` Sébastien Dugué 0 siblings, 0 replies; 26+ messages in thread From: Sébastien Dugué @ 2006-11-23 10:27 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-aio, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper On Thu, 23 Nov 2006 02:14:37 -0800 Andrew Morton <akpm@osdl.org> wrote: [snip] > Look: > > > + notify->target = target; > > + > > + if (notify->notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID)) { > > + /* > > + * This reference will be dropped in really_put_req() when > > + * we're done with the request. > > + */ > > + get_task_struct(target); > > If that test fails, we've saved a pointer to the task_struct without having > taken a refreence on it. > Aggreed, just wanted to make sure. Thanks, Sébastien. ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations 2006-11-23 8:40 ` Andrew Morton 2006-11-23 9:47 ` Sébastien Dugué @ 2006-11-23 10:57 ` Eric Dumazet 2006-11-27 21:37 ` Andrew Morton 1 sibling, 1 reply; 26+ messages in thread From: Eric Dumazet @ 2006-11-23 10:57 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Daniel McNeil [-- Attachment #1: Type: text/plain, Size: 766 bytes --] On 32bits SMP platforms, 64bits i_size is protected by a seqcount (i_size_seqcount). When i_size is read or written, i_size_seqcount is read/written as well, so it make sense to group these two fields together in the same cache line. Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one for i_size_seqcount). After, only one cache line is needed/ (dirtied on a i_size change). This patch moves i_size_seqcount next to i_size, and also moves i_version to let offsetof(struct inode, i_size) being 0x40 instead of 0x3c (for 32bits platforms). For 64 bits platforms, i_size_seqcount doesnt exist, and the move of a 'long i_version' should not introduce a new hole because of padding. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: move_i_size_seqcount.patch --] [-- Type: text/plain, Size: 816 bytes --] --- linux-2.6.19-rc6/include/linux/fs.h 2006-11-23 10:47:53.000000000 +0100 +++ linux-2.6.19-rc6-ed/include/linux/fs.h 2006-11-23 11:13:36.000000000 +0100 @@ -548,12 +548,15 @@ struct inode { uid_t i_uid; gid_t i_gid; dev_t i_rdev; + unsigned long i_version; loff_t i_size; +#ifdef __NEED_I_SIZE_ORDERED + seqcount_t i_size_seqcount; +#endif struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; unsigned int i_blkbits; - unsigned long i_version; blkcnt_t i_blocks; unsigned short i_bytes; spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ @@ -598,9 +601,6 @@ struct inode { void *i_security; #endif void *i_private; /* fs or device private pointer */ -#ifdef __NEED_I_SIZE_ORDERED - seqcount_t i_size_seqcount; -#endif }; /* ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations 2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet @ 2006-11-27 21:37 ` Andrew Morton 2006-11-27 21:52 ` Eric Dumazet 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2006-11-27 21:37 UTC (permalink / raw) To: Eric Dumazet; +Cc: linux-kernel, Daniel McNeil On Thu, 23 Nov 2006 11:57:29 +0100 Eric Dumazet <dada1@cosmosbay.com> wrote: > On 32bits SMP platforms, 64bits i_size is protected by a seqcount > (i_size_seqcount). > > When i_size is read or written, i_size_seqcount is read/written as well, so it > make sense to group these two fields together in the same cache line. > > Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one > for i_size_seqcount). After, only one cache line is needed/ (dirtied on a > i_size change). I didn't understand that paragraph at all, really, so I took it out. At present an i_size change will dirty one, two or three cachelines, most likely one or two. After your patch an i_size change will dirty one or two cachelines, most likely one. yes? > This patch moves i_size_seqcount next to i_size, and also moves i_version to > let offsetof(struct inode, i_size) being 0x40 instead of 0x3c (for 32bits > platforms). > > For 64 bits platforms, i_size_seqcount doesnt exist, and the move of a 'long > i_version' should not introduce a new hole because of padding. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations 2006-11-27 21:37 ` Andrew Morton @ 2006-11-27 21:52 ` Eric Dumazet 2006-11-27 22:01 ` Andrew Morton 0 siblings, 1 reply; 26+ messages in thread From: Eric Dumazet @ 2006-11-27 21:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton a écrit : > On Thu, 23 Nov 2006 11:57:29 +0100 > Eric Dumazet <dada1@cosmosbay.com> wrote: > >> On 32bits SMP platforms, 64bits i_size is protected by a seqcount >> (i_size_seqcount). >> >> When i_size is read or written, i_size_seqcount is read/written as well, so it >> make sense to group these two fields together in the same cache line. >> >> Before this patch, accessing i_size needed 3 cache lines (2 for i_size, one >> for i_size_seqcount). After, only one cache line is needed/ (dirtied on a >> i_size change). > > I didn't understand that paragraph at all, really, so I took it out. > > At present an i_size change will dirty one, two or three cachelines, most > likely one or two. > > After your patch an i_size change will dirty one or two cachelines, most > likely one. > > yes? nope Before : --------- offsetof(i_size) = 0x3C i_size is 8 bytes, so i_size spans 2 cache lines (if 64 or 32 bytes cache lines) and offsetof(i_size_seqcount) = 0x160, so a read of i_size (coupled with a read of seqcount) needed 3 cache lines. A change of i_size dirtied 2 or 3 cache lines. After : -------- offsetof(i_size) = 0x40 offsetof(i_size_seqcount) = 0x48 One cache line 'only', reading or writing. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations 2006-11-27 21:52 ` Eric Dumazet @ 2006-11-27 22:01 ` Andrew Morton 2006-11-28 20:31 ` Eric Dumazet 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2006-11-27 22:01 UTC (permalink / raw) To: Eric Dumazet; +Cc: linux-kernel On Mon, 27 Nov 2006 22:52:04 +0100 Eric Dumazet <dada1@cosmosbay.com> wrote: > > I didn't understand that paragraph at all, really, so I took it out. > > > > At present an i_size change will dirty one, two or three cachelines, most > > likely one or two. > > > > After your patch an i_size change will dirty one or two cachelines, most > > likely one. > > > > yes? > > nope > > Before : > --------- > offsetof(i_size) = 0x3C > > i_size is 8 bytes, so i_size spans 2 cache lines (if 64 or 32 bytes cache lines) This all depends on the offset of the inode, and you don't know what that is. offsetof(ext3_inode_info, vfs_inode) != offsetof(nfs_inode, vfs_inode), etc. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations 2006-11-27 22:01 ` Andrew Morton @ 2006-11-28 20:31 ` Eric Dumazet 0 siblings, 0 replies; 26+ messages in thread From: Eric Dumazet @ 2006-11-28 20:31 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton a écrit : > This all depends on the offset of the inode, and you don't know what that is. > > offsetof(ext3_inode_info, vfs_inode) != offsetof(nfs_inode, vfs_inode), etc. Doh... yes you are absolutly right :) I feel dumb now :( ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support 2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué ` (2 preceding siblings ...) 2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué @ 2006-11-20 14:23 ` Sébastien Dugué 2006-11-21 10:35 ` Suparna Bhattacharya 2006-11-27 13:39 ` Bharata B Rao 3 siblings, 2 replies; 26+ messages in thread From: Sébastien Dugué @ 2006-11-20 14:23 UTC (permalink / raw) To: linux-kernel Cc: linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper POSIX listio support This patch adds POSIX listio completion notification support. It builds on support provided by the aio signal notification patch and adds an IOCB_CMD_GROUP command to io_submit(). The purpose of IOCB_CMD_GROUP is to group together the following requests in the list up to the end of the list sumbitted to io_submit. As io_submit already accepts an array of iocbs, as part of listio submission, the user process prepends to a list of requests an empty special aiocb with an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields. An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h A struct lio_event is added in include/linux/aio.h A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h In io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb, an lio_event is created in lio_create() which contains the necessary information for signaling a thread (signal number, pid, notify type and value) along with a count of requests attached to this event. The following depicts the lio_event structure: struct lio_event { atomic_t lio_users; struct aio_notify lio_notify; }; lio_users holds an atomic counter of the number of requests attached to this lio. It is incremented with each request submitted and decremented at each request completion. When the counter reaches 0, we send the notification. Each subsequent submitted request is attached to this lio_event by setting the request kiocb->ki_lio to that lio_event (in io_submit_one()) and incrementing the lio_users count. In aio_complete(), if the request is attached to an lio (ki_lio <> 0), then lio_check() is called to decrement the lio_users count and eventually signal the user process when all the requests in the group have completed. The IOCB_CMD_GROUP semantic is as follows: - if the associated sigevent is NULL then we want to group requests for the purpose of blocking on the group completion (LIO_WAIT sync behavior). - if the associated sigevent is valid (not NULL) then we want to group requests for the purpose of being notified upon that group of requests completion (LIO_NOWAIT async behaviour). fs/aio.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++-- include/linux/aio.h | 13 ++++- include/linux/aio_abi.h | 1 3 files changed, 131 insertions(+), 6 deletions(-) Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Index: linux-2.6.19-rc5-mm2/fs/aio.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 16:22:00.000000000 +0100 @@ -414,6 +414,7 @@ static struct kiocb fastcall *__aio_get_ req->ki_cancel = NULL; req->ki_retry = NULL; req->ki_dtor = NULL; + req->ki_lio = NULL; req->private = NULL; req->ki_iovec = NULL; req->ki_notify.sigq = NULL; @@ -1010,6 +1011,53 @@ out_unlock: return -EINVAL; } +static inline void lio_check(struct lio_event *lio) +{ + int ret; + + ret = atomic_dec_and_test(&lio->lio_users); + + if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) { + /* last one -> notify process */ + aio_send_signal(&lio->lio_notify); + kfree(lio); + } +} + +static struct lio_event *lio_create(struct sigevent __user *user_event) +{ + int ret = 0; + struct lio_event *lio = NULL; + + lio = kzalloc(sizeof(*lio), GFP_KERNEL); + + if (!lio) + return ERR_PTR(-EAGAIN); + + /* + * Grab an initial ref on the lio to avoid races between + * submission and completion. + */ + atomic_set(&lio->lio_users, 1); + + lio->lio_notify.notify = SIGEV_NONE; + + if (user_event) { + /* + * User specified an event for this lio, + * he wants to be notified upon lio completion. + */ + ret = aio_setup_sigevent(&lio->lio_notify, user_event); + + if (ret) { + kfree(lio); + return ERR_PTR(ret); + } + } + + return lio; +} + /* aio_complete * Called when the io request on the given iocb is complete. * Returns true if this is the last user of the request. The @@ -1058,8 +1106,12 @@ int fastcall aio_complete(struct kiocb * * when the event got cancelled. */ if (kiocbIsCancelled(iocb)) { + if (iocb->ki_lio) + lio_check(iocb->ki_lio); + if (iocb->ki_notify.sigq) sigqueue_free(iocb->ki_notify.sigq); + goto put_rq; } @@ -1100,6 +1152,9 @@ int fastcall aio_complete(struct kiocb * sigqueue_free(iocb->ki_notify.sigq); } + if (iocb->ki_lio) + lio_check(iocb->ki_lio); + pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried, iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes); put_rq: @@ -1634,7 +1689,7 @@ static int aio_wake_function(wait_queue_ } int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - struct iocb *iocb) + struct iocb *iocb, struct lio_event *lio) { struct kiocb *req; struct file *file; @@ -1696,6 +1751,9 @@ int fastcall io_submit_one(struct kioctx goto out_put_req; } + /* Attach this iocb to its lio */ + req->ki_lio = lio; + ret = aio_setup_iocb(req); if (ret) @@ -1739,6 +1797,8 @@ asmlinkage long sys_io_submit(aio_contex struct iocb __user * __user *iocbpp) { struct kioctx *ctx; + struct lio_event *lio = NULL; + int lio_wait = 0; long ret = 0; int i; @@ -1772,11 +1832,66 @@ asmlinkage long sys_io_submit(aio_contex break; } - ret = io_submit_one(ctx, user_iocb, &tmp); - if (ret) - break; + if (tmp.aio_lio_opcode == IOCB_CMD_GROUP) { + + /* this command means that all following IO commands + * are in the same group. + * + * Userspace either wants to be notified upon or block until + * completion of all the requests in the group. + */ + /* + * Ignore an IOCB_CMD_GROUP request if we are already + * processing one. This means only one listio per + * io_submit call. + */ + if (lio) + continue; + + lio = lio_create((struct sigevent __user *)(unsigned long) + tmp.aio_sigeventp); + + ret = PTR_ERR(lio); + + if (IS_ERR(lio)) + goto out_put_ctx; + + if (!tmp.aio_sigeventp) + lio_wait = 1; + } else { + if (lio) + atomic_inc(&lio->lio_users); + + ret = io_submit_one(ctx, user_iocb, &tmp, lio); + + if (ret) { + if (lio) { + /* + * If a request failed, just decrement + * the users count, but go on submitting + * subsequent requests. + */ + atomic_dec(&lio->lio_users); + } else + break; + } + } + } + + if (lio) { + /* + * Drop extra ref on the lio now that we're done submitting + * requests + */ + lio_check(lio); + + if (lio_wait) { + wait_event(ctx->wait, atomic_read(&lio->lio_users)==0); + kfree(lio); + } } +out_put_ctx: put_ioctx(ctx); return i ? i : ret; } Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17 11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17 11:21:31.000000000 +0100 @@ -43,6 +43,7 @@ enum { IOCB_CMD_NOOP = 6, IOCB_CMD_PREADV = 7, IOCB_CMD_PWRITEV = 8, + IOCB_CMD_GROUP = 9, }; /* read() from /dev/aio returns these structures. */ Index: linux-2.6.19-rc5-mm2/include/linux/aio.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17 11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h 2006-11-17 11:21:31.000000000 +0100 @@ -58,6 +58,11 @@ struct aio_notify { struct sigqueue *sigq; }; +struct lio_event { + atomic_t lio_users; + struct aio_notify lio_notify; +}; + /* is there a better place to document function pointer methods? */ /** * ki_retry - iocb forward progress callback @@ -113,6 +118,9 @@ struct kiocb { wait_queue_t ki_wait; loff_t ki_pos; + /* lio this iocb might be attached to */ + struct lio_event *ki_lio; + void *private; /* State that we remember to be able to restart/retry */ unsigned short ki_opcode; @@ -220,12 +228,13 @@ struct mm_struct; extern void FASTCALL(exit_aio(struct mm_struct *mm)); extern struct kioctx *lookup_ioctx(unsigned long ctx_id); extern int FASTCALL(io_submit_one(struct kioctx *ctx, - struct iocb __user *user_iocb, struct iocb *iocb)); + struct iocb __user *user_iocb, struct iocb *iocb, + struct lio_event *lio)); /* semi private, but used by the 32bit emulations: */ struct kioctx *lookup_ioctx(unsigned long ctx_id); int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - struct iocb *iocb)); + struct iocb *iocb, struct lio_event *lio)); #define get_ioctx(kioctx) do { \ BUG_ON(atomic_read(&(kioctx)->users) <= 0); \ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support 2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué @ 2006-11-21 10:35 ` Suparna Bhattacharya 2006-11-27 13:39 ` Bharata B Rao 1 sibling, 0 replies; 26+ messages in thread From: Suparna Bhattacharya @ 2006-11-21 10:35 UTC (permalink / raw) To: =?iso-8859-1?Q?S=E9bastien_Dugu=E9_=3Csebastien=2Edugue=40bull=2Enet?=.=?iso-8859-1?Q?=3E?= Cc: linux-kernel, linux-aio, Andrew Morton, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper Sebestian, Thanks for taking this forward ! I assume that you have addressed all the comments (including missing counts in cancel path etc), except for the interface debate (current vs syscall vs sys_io_wait_for_kiocb), is that correct ? Zach, You had suggested a couple of alternative interfaces. If we go the separate syscall way, there are a couple of things to figure out/decide: (1) Do we want to keep the LIO_WAIT semantics in the kernel, or should we try to expose the lio to user-space for an explicit wait to be issued ? (2) We would have to address compat alternatives for the new syscall as well, hopefully it should be possible to structure the code to reuse most of the logic. If we go the way of associating the array of iocbs as a parameter to IOCB_CMD_GROUP, and make LIO_WAIT happen through a sys_io_wait for one iocb on the group iocb, then (1) Compat handling could get a little more complicated (2) We have to think through the semantics of the completion path in various situations. Will it be possible to cancel the whole group in one go ? Regards Suparna On Mon, Nov 20, 2006 at 03:23:07PM +0100, Sébastien Dugué wrote: > > POSIX listio support > > This patch adds POSIX listio completion notification support. It builds > on support provided by the aio signal notification patch and adds an > IOCB_CMD_GROUP command to io_submit(). > > The purpose of IOCB_CMD_GROUP is to group together the following requests in > the list up to the end of the list sumbitted to io_submit. > > As io_submit already accepts an array of iocbs, as part of listio submission, > the user process prepends to a list of requests an empty special aiocb with > an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields. > > > An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h > > A struct lio_event is added in include/linux/aio.h > > A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h > > In io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb, an > lio_event is created in lio_create() which contains the necessary information > for signaling a thread (signal number, pid, notify type and value) along with > a count of requests attached to this event. > > The following depicts the lio_event structure: > > struct lio_event { > atomic_t lio_users; > struct aio_notify lio_notify; > }; > > lio_users holds an atomic counter of the number of requests attached to this > lio. It is incremented with each request submitted and decremented at each > request completion. When the counter reaches 0, we send the notification. > > Each subsequent submitted request is attached to this lio_event by setting > the request kiocb->ki_lio to that lio_event (in io_submit_one()) and > incrementing the lio_users count. > > In aio_complete(), if the request is attached to an lio (ki_lio <> 0), > then lio_check() is called to decrement the lio_users count and eventually > signal the user process when all the requests in the group have completed. > > > The IOCB_CMD_GROUP semantic is as follows: > > - if the associated sigevent is NULL then we want to group > requests for the purpose of blocking on the group completion > (LIO_WAIT sync behavior). > > - if the associated sigevent is valid (not NULL) then we want to > group requests for the purpose of being notified upon that > group of requests completion (LIO_NOWAIT async behaviour). > > > > > fs/aio.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++-- > include/linux/aio.h | 13 ++++- > include/linux/aio_abi.h | 1 > 3 files changed, 131 insertions(+), 6 deletions(-) > > > Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net> > Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> > > Index: linux-2.6.19-rc5-mm2/fs/aio.c > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/fs/aio.c 2006-11-17 11:20:32.000000000 > +0100 +++ linux-2.6.19-rc5-mm2/fs/aio.c 2006-11-17 16:22:00.000000000 > +0100 @@ -414,6 +414,7 @@ static struct kiocb fastcall *__aio_get_ > req->ki_cancel = NULL; > req->ki_retry = NULL; > req->ki_dtor = NULL; > + req->ki_lio = NULL; > req->private = NULL; > req->ki_iovec = NULL; > req->ki_notify.sigq = NULL; > @@ -1010,6 +1011,53 @@ out_unlock: > return -EINVAL; > } > > +static inline void lio_check(struct lio_event *lio) > +{ > + int ret; > + > + ret = atomic_dec_and_test(&lio->lio_users); > + > + if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) { > + /* last one -> notify process */ > + aio_send_signal(&lio->lio_notify); > + kfree(lio); > + } > +} > + > +static struct lio_event *lio_create(struct sigevent __user *user_event) > +{ > + int ret = 0; > + struct lio_event *lio = NULL; > + > + lio = kzalloc(sizeof(*lio), GFP_KERNEL); > + > + if (!lio) > + return ERR_PTR(-EAGAIN); > + > + /* > + * Grab an initial ref on the lio to avoid races between > + * submission and completion. > + */ > + atomic_set(&lio->lio_users, 1); > + > + lio->lio_notify.notify = SIGEV_NONE; > + > + if (user_event) { > + /* > + * User specified an event for this lio, > + * he wants to be notified upon lio completion. > + */ > + ret = aio_setup_sigevent(&lio->lio_notify, user_event); > + > + if (ret) { > + kfree(lio); > + return ERR_PTR(ret); > + } > + } > + > + return lio; > +} > + > /* aio_complete > * Called when the io request on the given iocb is complete. > * Returns true if this is the last user of the request. The > @@ -1058,8 +1106,12 @@ int fastcall aio_complete(struct kiocb * > * when the event got cancelled. > */ > if (kiocbIsCancelled(iocb)) { > + if (iocb->ki_lio) > + lio_check(iocb->ki_lio); > + > if (iocb->ki_notify.sigq) > sigqueue_free(iocb->ki_notify.sigq); > + > goto put_rq; > } > > @@ -1100,6 +1152,9 @@ int fastcall aio_complete(struct kiocb * > sigqueue_free(iocb->ki_notify.sigq); > } > > + if (iocb->ki_lio) > + lio_check(iocb->ki_lio); > + > pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried, > iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes); > put_rq: > @@ -1634,7 +1689,7 @@ static int aio_wake_function(wait_queue_ > } > > int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, > - struct iocb *iocb) > + struct iocb *iocb, struct lio_event *lio) > { > struct kiocb *req; > struct file *file; > @@ -1696,6 +1751,9 @@ int fastcall io_submit_one(struct kioctx > goto out_put_req; > } > > + /* Attach this iocb to its lio */ > + req->ki_lio = lio; > + > ret = aio_setup_iocb(req); > > if (ret) > @@ -1739,6 +1797,8 @@ asmlinkage long sys_io_submit(aio_contex > struct iocb __user * __user *iocbpp) > { > struct kioctx *ctx; > + struct lio_event *lio = NULL; > + int lio_wait = 0; > long ret = 0; > int i; > > @@ -1772,11 +1832,66 @@ asmlinkage long sys_io_submit(aio_contex > break; > } > > - ret = io_submit_one(ctx, user_iocb, &tmp); > - if (ret) > - break; > + if (tmp.aio_lio_opcode == IOCB_CMD_GROUP) { > + > + /* this command means that all following IO commands > + * are in the same group. > + * > + * Userspace either wants to be notified upon or block > until > + * completion of all the requests in the group. > + */ > + /* > + * Ignore an IOCB_CMD_GROUP request if we are already > + * processing one. This means only one listio per > + * io_submit call. > + */ > + if (lio) > + continue; > + > + lio = lio_create((struct sigevent __user *)(unsigned > long) > + tmp.aio_sigeventp); > + > + ret = PTR_ERR(lio); > + > + if (IS_ERR(lio)) > + goto out_put_ctx; > + > + if (!tmp.aio_sigeventp) > + lio_wait = 1; > + } else { > + if (lio) > + atomic_inc(&lio->lio_users); > + > + ret = io_submit_one(ctx, user_iocb, &tmp, lio); > + > + if (ret) { > + if (lio) { > + /* > + * If a request failed, just decrement > + * the users count, but go on > submitting > + * subsequent requests. > + */ > + atomic_dec(&lio->lio_users); > + } else > + break; > + } > + } > + } > + > + if (lio) { > + /* > + * Drop extra ref on the lio now that we're done submitting > + * requests > + */ > + lio_check(lio); > + > + if (lio_wait) { > + wait_event(ctx->wait, atomic_read(&lio->lio_users)==0); > + kfree(lio); > + } > } > > +out_put_ctx: > put_ioctx(ctx); > return i ? i : ret; > } > Index: linux-2.6.19-rc5-mm2/include/linux/aio_abi.h > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/include/linux/aio_abi.h 2006-11-17 > 11:20:32.000000000 +0100 +++ > linux-2.6.19-rc5-mm2/include/linux/aio_abi.h 2006-11-17 > 11:21:31.000000000 +0100 @@ -43,6 +43,7 @@ enum { IOCB_CMD_NOOP = 6, > IOCB_CMD_PREADV = 7, > IOCB_CMD_PWRITEV = 8, > + IOCB_CMD_GROUP = 9, > }; > > /* read() from /dev/aio returns these structures. */ > Index: linux-2.6.19-rc5-mm2/include/linux/aio.h > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/include/linux/aio.h 2006-11-17 > 11:20:32.000000000 +0100 +++ linux-2.6.19-rc5-mm2/include/linux/aio.h > 2006-11-17 11:21:31.000000000 +0100 @@ -58,6 +58,11 @@ struct aio_notify { > struct sigqueue *sigq; > }; > > +struct lio_event { > + atomic_t lio_users; > + struct aio_notify lio_notify; > +}; > + > /* is there a better place to document function pointer methods? */ > /** > * ki_retry - iocb forward progress callback > @@ -113,6 +118,9 @@ struct kiocb { > wait_queue_t ki_wait; > loff_t ki_pos; > > + /* lio this iocb might be attached to */ > + struct lio_event *ki_lio; > + > void *private; > /* State that we remember to be able to restart/retry */ > unsigned short ki_opcode; > @@ -220,12 +228,13 @@ struct mm_struct; > extern void FASTCALL(exit_aio(struct mm_struct *mm)); > extern struct kioctx *lookup_ioctx(unsigned long ctx_id); > extern int FASTCALL(io_submit_one(struct kioctx *ctx, > - struct iocb __user *user_iocb, struct iocb *iocb)); > + struct iocb __user *user_iocb, struct iocb > *iocb, > + struct lio_event *lio)); > > /* semi private, but used by the 32bit emulations: */ > struct kioctx *lookup_ioctx(unsigned long ctx_id); > int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, > - struct iocb *iocb)); > + struct iocb *iocb, struct lio_event *lio)); > > #define get_ioctx(kioctx) do { \ > BUG_ON(atomic_read(&(kioctx)->users) <= 0); \ -- Suparna Bhattacharya (suparna@in.ibm.com) Linux Technology Center IBM Software Lab, India ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm 4/4][AIO] - Listio support 2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué 2006-11-21 10:35 ` Suparna Bhattacharya @ 2006-11-27 13:39 ` Bharata B Rao 1 sibling, 0 replies; 26+ messages in thread From: Bharata B Rao @ 2006-11-27 13:39 UTC (permalink / raw) To: Sébastien Dugué Cc: linux-kernel, linux-aio, Andrew Morton, Suparna Bhattacharya, Christoph Hellwig, Zach Brown, Badari Pulavarty, Jean Pierre Dion, Ulrich Drepper [-- Attachment #1: Type: text/plain, Size: 794 bytes --] On 11/20/06, Sébastien Dugué <sebastien.dugue@bull.net> wrote: > > POSIX listio support > > This patch adds POSIX listio completion notification support. It builds > on support provided by the aio signal notification patch and adds an > IOCB_CMD_GROUP command to io_submit(). > I have altered this patch to provide the listio support through a separate syscall. The code is in RFC stage at the moment and is only compile-tested on i386. I have refrained from modifying unistd.h for now in order to avoid touching too many files. I plan to do actual tests (which involves user space changes to aio libraries) only if there is an agreement with this syscall approach. More details about the interface are in the patch itself. Regards, Bharata. [-- Attachment #2: aio-listio-support.patch --] [-- Type: text/x-patch, Size: 12055 bytes --] This patch adds POSIX listio completion notification support. It builds on support provided by the aio signal notification patch and adds a system call lio_submit(). lio_submit() is similar to io_submit() except that it takes two more argurments: mode and sigevent. mode can be LIO_WAIT or LIO_NOWAIT. sigevent argument specificies the sigevent notification mechanism on listio completion. lio_submit() waits within the syscall if LIO_WAIT is specified. A struct lio_event is added in include/linux/aio.h A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h In lio_submit(), a lio_event is created in lio_create() which contains the necessary information for signaling a thread (signal number, pid, notify type and value) along with a count of requests attached to this event. The following depicts the lio_event structure: struct lio_event { atomic_t lio_users; struct aio_notify lio_notify; }; lio_users holds an atomic counter of the number of requests attached to this lio. It is incremented with each request submitted and decremented at each request completion. When the counter reaches 0, we send the notification. In aio_complete(), if the request is attached to an lio (ki_lio <> 0), then lio_check() is called to decrement the lio_users count and eventually signal the user process when all the requests in the group have completed. Sebastien Dugue's listio patch has been modified to arrive at this patch. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Bharata B Rao <bharata.rao@gmail.com> --- fs/aio.c | 181 +++++++++++++++++++++++++++++++++++++++++------ fs/compat.c | 29 +++++++ include/linux/aio.h | 13 ++- include/linux/aio_abi.h | 5 + include/linux/syscalls.h | 2 5 files changed, 204 insertions(+), 26 deletions(-) diff -puN fs/aio.c~aio-listio-support fs/aio.c --- linux-2.6.19-rc5-mm2/fs/aio.c~aio-listio-support 2006-11-23 19:05:33.000000000 +0530 +++ linux-2.6.19-rc5-mm2-bharata/fs/aio.c 2006-11-25 11:24:55.000000000 +0530 @@ -413,6 +413,7 @@ static struct kiocb fastcall *__aio_get_ req->ki_ctx = ctx; req->ki_cancel = NULL; req->ki_retry = NULL; + req->ki_lio = NULL; req->ki_dtor = NULL; req->private = NULL; req->ki_iovec = NULL; @@ -1010,6 +1011,59 @@ out_unlock: return -EINVAL; } +static inline void lio_check(struct lio_event *lio) +{ + int ret; + + ret = atomic_dec_and_test(&lio->lio_users); + + if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) { + /* last one -> notify process */ + if (aio_send_signal(&lio->lio_notify)) + sigqueue_free(lio->lio_notify.sigq); + kfree(lio); + } +} + +static struct lio_event *lio_create(struct sigevent __user *user_event, + int mode) +{ + int ret = 0; + struct lio_event *lio = NULL; + + if (unlikely((mode == LIO_NOWAIT) && !user_event)) + return lio; + + lio = kzalloc(sizeof(*lio), GFP_KERNEL); + + if (!lio) + return ERR_PTR(-EAGAIN); + + /* + * Grab an initial ref on the lio to avoid races between + * submission and completion. + */ + atomic_set(&lio->lio_users, 1); + + lio->lio_notify.notify = SIGEV_NONE; + + /* sigevent argument is ignored with LIO_WAIT */ + if (user_event && (mode == LIO_NOWAIT)) { + /* + * User specified an event for this lio, + * he wants to be notified upon lio completion. + */ + ret = aio_setup_sigevent(&lio->lio_notify, user_event); + + if (ret) { + kfree(lio); + return ERR_PTR(ret); + } + } + + return lio; +} + /* aio_complete * Called when the io request on the given iocb is complete. * Returns true if this is the last user of the request. The @@ -1058,6 +1112,8 @@ int fastcall aio_complete(struct kiocb * * when the event got cancelled. */ if (kiocbIsCancelled(iocb)) { + if (iocb->ki_lio) + lio_check(iocb->ki_lio); if (iocb->ki_notify.sigq) sigqueue_free(iocb->ki_notify.sigq); goto put_rq; @@ -1100,6 +1156,14 @@ int fastcall aio_complete(struct kiocb * sigqueue_free(iocb->ki_notify.sigq); } + /* + * In case of listio, in addition to the optional per-iocb sigevent + * notification (as above), the listio as a whole can also generate + * a sigevent notification. + */ + if (iocb->ki_lio) + lio_check(iocb->ki_lio); + pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried, iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes); put_rq: @@ -1634,7 +1698,7 @@ static int aio_wake_function(wait_queue_ } int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - struct iocb *iocb) + struct iocb *iocb, struct lio_event *lio) { struct kiocb *req; struct file *file; @@ -1690,12 +1754,13 @@ int fastcall io_submit_one(struct kioctx if (iocb->aio_sigeventp) { ret = aio_setup_sigevent(&req->ki_notify, - (struct sigevent __user *)(unsigned long) - iocb->aio_sigeventp); + (struct sigevent __user *)(unsigned long) + iocb->aio_sigeventp); if (ret) goto out_put_req; } + req->ki_lio = lio; ret = aio_setup_iocb(req); if (ret) @@ -1723,6 +1788,48 @@ out_put_req: return ret; } +static int io_submit_group(struct kioctx *ctx, long nr, + struct iocb __user * __user *iocbpp, struct lio_event *lio) +{ + int i; + long ret = 0; + + /* + * AKPM: should this return a partial result if some of the IOs were + * successfully submitted? + */ + for (i = 0; i < nr; i++) { + struct iocb __user *user_iocb; + struct iocb tmp; + + if (unlikely(__get_user(user_iocb, iocbpp + i))) { + ret = -EFAULT; + break; + } + + if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) { + ret = -EFAULT; + break; + } + + if (lio) + atomic_inc(&lio->lio_users); + + ret = io_submit_one(ctx, user_iocb, &tmp, lio); + if (ret) { + if (lio) { + /* + * In case of listio, continue with + * the subsequent requests + */ + atomic_dec(&lio->lio_users); + } else + break; + } + } + return i ? i : ret; +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. May return -EINVAL if the aio_context @@ -1740,7 +1847,6 @@ asmlinkage long sys_io_submit(aio_contex { struct kioctx *ctx; long ret = 0; - int i; if (unlikely(nr < 0)) return -EINVAL; @@ -1754,31 +1860,60 @@ asmlinkage long sys_io_submit(aio_contex return -EINVAL; } - /* - * AKPM: should this return a partial result if some of the IOs were - * successfully submitted? - */ - for (i=0; i<nr; i++) { - struct iocb __user *user_iocb; - struct iocb tmp; + ret = io_submit_group(ctx, nr, iocbpp, NULL); - if (unlikely(__get_user(user_iocb, iocbpp + i))) { - ret = -EFAULT; - break; - } + put_ioctx(ctx); + return ret; +} - if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) { - ret = -EFAULT; - break; - } +asmlinkage long sys_lio_submit(aio_context_t ctx_id, int mode, long nr, + struct iocb __user * __user *iocbpp, struct sigevent __user *event) +{ + struct kioctx *ctx; + struct lio_event *lio = NULL; + long ret = 0; - ret = io_submit_one(ctx, user_iocb, &tmp); - if (ret) - break; + if (unlikely(nr < 0)) + return -EINVAL; + + if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp))))) + return -EFAULT; + + ctx = lookup_ioctx(ctx_id); + if (unlikely(!ctx)) { + pr_debug("EINVAL: lio_submit: invalid context id\n"); + return -EINVAL; + } + + lio = lio_create(event, mode); + + ret = PTR_ERR(lio); + if (IS_ERR(lio)) + goto out_put_ctx; + + ret = io_submit_group(ctx, nr, iocbpp, lio); + + /* If we failed to submit even one request just return */ + if (ret < 0 ) { + if (lio) + kfree(lio); + goto out_put_ctx; + } + + /* + * Drop extra ref on the lio now that we're done submitting requests. + */ + if (lio) + lio_check(lio); + + if (mode == LIO_WAIT) { + wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0); + kfree(lio); } +out_put_ctx: put_ioctx(ctx); - return i ? i : ret; + return ret; } /* lookup_kiocb diff -puN include/linux/aio_abi.h~aio-listio-support include/linux/aio_abi.h --- linux-2.6.19-rc5-mm2/include/linux/aio_abi.h~aio-listio-support 2006-11-23 21:56:14.000000000 +0530 +++ linux-2.6.19-rc5-mm2-bharata/include/linux/aio_abi.h 2006-11-23 21:56:55.000000000 +0530 @@ -45,6 +45,11 @@ enum { IOCB_CMD_PWRITEV = 8, }; +enum { + LIO_WAIT = 0, + LIO_NOWAIT = 1, +}; + /* read() from /dev/aio returns these structures. */ struct io_event { __u64 data; /* the data field from the iocb */ diff -puN include/linux/aio.h~aio-listio-support include/linux/aio.h --- linux-2.6.19-rc5-mm2/include/linux/aio.h~aio-listio-support 2006-11-23 21:57:23.000000000 +0530 +++ linux-2.6.19-rc5-mm2-bharata/include/linux/aio.h 2006-11-23 22:01:11.000000000 +0530 @@ -58,6 +58,11 @@ struct aio_notify { struct sigqueue *sigq; }; +struct lio_event { + atomic_t lio_users; + struct aio_notify lio_notify; +}; + /* is there a better place to document function pointer methods? */ /** * ki_retry - iocb forward progress callback @@ -112,7 +117,8 @@ struct kiocb { __u64 ki_user_data; /* user's data for completion */ wait_queue_t ki_wait; loff_t ki_pos; - + /* lio this iocb might be attached to */ + struct lio_event *ki_lio; void *private; /* State that we remember to be able to restart/retry */ unsigned short ki_opcode; @@ -220,12 +226,13 @@ struct mm_struct; extern void FASTCALL(exit_aio(struct mm_struct *mm)); extern struct kioctx *lookup_ioctx(unsigned long ctx_id); extern int FASTCALL(io_submit_one(struct kioctx *ctx, - struct iocb __user *user_iocb, struct iocb *iocb)); + struct iocb __user *user_iocb, struct iocb *iocb, + struct lio_event *lio)); /* semi private, but used by the 32bit emulations: */ struct kioctx *lookup_ioctx(unsigned long ctx_id); int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - struct iocb *iocb)); + struct iocb *iocb, struct lio_event *lio)); #define get_ioctx(kioctx) do { \ BUG_ON(atomic_read(&(kioctx)->users) <= 0); \ diff -puN fs/compat.c~aio-listio-support fs/compat.c --- linux-2.6.19-rc5-mm2/fs/compat.c~aio-listio-support 2006-11-24 08:52:42.000000000 +0530 +++ linux-2.6.19-rc5-mm2-bharata/fs/compat.c 2006-11-24 09:53:56.000000000 +0530 @@ -678,6 +678,35 @@ compat_sys_io_submit(aio_context_t ctx_i return ret; } +asmlinkage long +compat_sys_lio_submit(aio_context_t ctx_id, int mode, int nr, u32 __user *iocb, + struct compat_sigevent __user *sig_user) +{ + struct iocb __user * __user *iocb64; + struct sigvent __user *event = NULL; + long ret; + + if (unlikely(nr < 0)) + return -EINVAL; + + if (nr > MAX_AIO_SUBMITS) + nr = MAX_AIO_SUBMITS; + + iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64)); + ret = copy_iocb(nr, iocb, iocb64); + if (ret) + return ret; + + if (sig_user) { + struct sigevent kevent; + event = compat_alloc_user_space(sizeof(struct sigevent)); + if (get_compat_sigevent(&kevent, sig_user) || + copy_to_user(event, &kevent, sizeof(struct sigevent))) + return -EFAULT; + } + return sys_lio_submit(ctx_id, mode, nr, iocb64, event); +} + struct compat_ncp_mount_data { compat_int_t version; compat_uint_t ncp_fd; diff -puN include/linux/syscalls.h~aio-listio-support include/linux/syscalls.h --- linux-2.6.19-rc5-mm2/include/linux/syscalls.h~aio-listio-support 2006-11-24 09:42:44.000000000 +0530 +++ linux-2.6.19-rc5-mm2-bharata/include/linux/syscalls.h 2006-11-24 09:43:42.000000000 +0530 @@ -319,6 +319,8 @@ asmlinkage long sys_io_getevents(aio_con struct timespec __user *timeout); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); +asmlinkage long sys_lio_submit(aio_context_t, int, long, + struct iocb __user * __user *, struct sigevent __user *); asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb, struct io_event __user *result); asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd, _ ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2006-11-28 20:31 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-20 14:17 [PATCH -mm 0/4][AIO] - AIO completion signal notification v2 Sébastien Dugué
2006-11-20 14:22 ` [PATCH -mm 1/4][AIO] - fix aio.h includes Sébastien Dugué
2006-11-20 13:31 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 2/4][AIO] - export good_sigevent() Sébastien Dugué
2006-11-20 13:42 ` Zach Brown
2006-11-20 21:45 ` Ulrich Drepper
2006-11-20 14:02 ` Zach Brown
2006-11-20 14:22 ` [PATCH -mm 3/4][AIO] - AIO completion signal notification Sébastien Dugué
2006-11-20 15:13 ` Zach Brown
2006-11-21 10:40 ` Sébastien Dugué
[not found] ` <20061122104055.3d1c029a@frecb000686>
2006-11-22 10:22 ` Zach Brown
2006-11-23 8:24 ` Sébastien Dugué
2006-11-22 1:02 ` Andrew Morton
2006-11-23 8:28 ` Sébastien Dugué
2006-11-23 8:40 ` Andrew Morton
2006-11-23 9:47 ` Sébastien Dugué
2006-11-23 10:14 ` Andrew Morton
2006-11-23 10:27 ` Sébastien Dugué
2006-11-23 10:57 ` [PATCH] fs : reorder some 'struct inode' fields to speedup i_size manipulations Eric Dumazet
2006-11-27 21:37 ` Andrew Morton
2006-11-27 21:52 ` Eric Dumazet
2006-11-27 22:01 ` Andrew Morton
2006-11-28 20:31 ` Eric Dumazet
2006-11-20 14:23 ` [PATCH -mm 4/4][AIO] - Listio support Sébastien Dugué
2006-11-21 10:35 ` Suparna Bhattacharya
2006-11-27 13:39 ` Bharata B Rao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox