* [PATCH] eventfd signal race in aio_complete() @ 2008-03-08 2:32 Jeff Roberson 2008-03-08 4:29 ` Davide Libenzi 0 siblings, 1 reply; 6+ messages in thread From: Jeff Roberson @ 2008-03-08 2:32 UTC (permalink / raw) To: linux-kernel; +Cc: riel [-- Attachment #1: Type: TEXT/PLAIN, Size: 1440 bytes --] Hello, I have an application that makes use of eventfd to merge socket and aio blocking with epoll in one thread. Under heavy loads the application sometimes hangs when we receive notification from epoll that the eventfd has an event ready but reading the aio completions produces no results. Further investigation revealed that the aiocb was later ready with no new event and completing it based on a timer resolved the application hang. This pointed to the eventfd being signaled prematurely and I verified that this was indeed the problem. aio_complete() calls eventfd_signal() before the event is actually placed on the completion ring. On a multi-processor system it is possible to read the event from epoll and return to userspace before aio_complete() finishes. The enclosed patch simply moves the signaling to the bottom of the function. I'm not 100% familiar with this code and it looks like it may be possible to have spurious wakeups now but there will be no missed wakeups. An application may also race the other way now and receive aio completion before the signal, thus still leaving it with a signal with no completion. signaling while the kioctx is locked would resolve this but I was hesitant to introduce further nesting of spinlocks that might have another order elsewhere. Please keep me in the cc line for any necessary replies. Thanks, Jeff Signed-off-by: Jeff Roberson <jeff@freebsd.org> [-- Attachment #2: Type: TEXT/x-diff, Size: 892 bytes --] --- aio.c.orig 2008-03-08 00:23:50.000000000 +0000 +++ aio.c 2008-03-08 00:24:32.000000000 +0000 @@ -946,14 +946,6 @@ int fastcall aio_complete(struct kiocb * return 1; } - /* - * Check if the user asked us to deliver the result through an - * eventfd. The eventfd_signal() function is safe to be called - * from IRQ context. - */ - if (!IS_ERR(iocb->ki_eventfd)) - eventfd_signal(iocb->ki_eventfd, 1); - info = &ctx->ring_info; /* add a completion event to the ring buffer. @@ -1010,6 +1002,15 @@ put_rq: wake_up(&ctx->wait); spin_unlock_irqrestore(&ctx->ctx_lock, flags); + + /* + * Check if the user asked us to deliver the result through an + * eventfd. The eventfd_signal() function is safe to be called + * from IRQ context. + */ + if (!IS_ERR(iocb->ki_eventfd)) + eventfd_signal(iocb->ki_eventfd, 1); + return ret; } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] eventfd signal race in aio_complete() 2008-03-08 2:32 [PATCH] eventfd signal race in aio_complete() Jeff Roberson @ 2008-03-08 4:29 ` Davide Libenzi 2008-03-08 15:23 ` Rik van Riel 0 siblings, 1 reply; 6+ messages in thread From: Davide Libenzi @ 2008-03-08 4:29 UTC (permalink / raw) To: Jeff Roberson; +Cc: Linux Kernel Mailing List, riel, Zach Brown [cc-ing zab] On Fri, 7 Mar 2008, Jeff Roberson wrote: > Hello, Hi! > I have an application that makes use of eventfd to merge socket and aio > blocking with epoll in one thread. Under heavy loads the application > sometimes hangs when we receive notification from epoll that the eventfd has > an event ready but reading the aio completions produces no results. Further > investigation revealed that the aiocb was later ready with no new event and > completing it based on a timer resolved the application hang. > > This pointed to the eventfd being signaled prematurely and I verified that > this was indeed the problem. aio_complete() calls eventfd_signal() before the > event is actually placed on the completion ring. On a multi-processor system > it is possible to read the event from epoll and return to userspace before > aio_complete() finishes. > > The enclosed patch simply moves the signaling to the bottom of the function. > I'm not 100% familiar with this code and it looks like it may be possible to > have spurious wakeups now but there will be no missed wakeups. An application > may also race the other way now and receive aio completion before the signal, > thus still leaving it with a signal with no completion. signaling while the > kioctx is locked would resolve this but I was hesitant to introduce further > nesting of spinlocks that might have another order elsewhere. Your patch access the iocb after the __aio_put_req() call, that can make the iocb (and the reference to the ki_eventfd) to become invalid. It also has the spurious wakeup issue (not a biggie, but if it can be avoided). There're two solutions AFAICS. The first solution/patch get a reference to the file*, and signal (if the event has really been dropped inside the ring) and release. The second solution/patch simply moves the eventfd_signal() call before the __aio_put_req() call, but after the event has beed "ringed". We should be clear to go with the shorter/nicer second solution. Those patches builds, but I'm not even signing them off till I tested them. - Davide --- fs/aio.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) Index: linux-2.6.mod/fs/aio.c =================================================================== --- linux-2.6.mod.orig/fs/aio.c 2008-03-07 19:33:44.000000000 -0800 +++ linux-2.6.mod/fs/aio.c 2008-03-07 19:45:50.000000000 -0800 @@ -916,7 +916,8 @@ struct kioctx *ctx = iocb->ki_ctx; struct aio_ring_info *info; struct aio_ring *ring; - struct io_event *event; + struct io_event *event = NULL; + struct file *file = NULL; unsigned long flags; unsigned long tail; int ret; @@ -937,12 +938,15 @@ } /* - * Check if the user asked us to deliver the result through an - * eventfd. The eventfd_signal() function is safe to be called - * from IRQ context. + * Get a reference now, but do not deliver the event until + * we're sure we actually dropped it inside the ring. We + * need to get a reference before calling __aio_put_req(), + * since the ->ki_eventfd may become invalid after such call. */ - if (!IS_ERR(iocb->ki_eventfd)) - eventfd_signal(iocb->ki_eventfd, 1); + if (!IS_ERR(iocb->ki_eventfd)) { + file = iocb->ki_eventfd; + get_file(file); + } info = &ctx->ring_info; @@ -1000,6 +1004,19 @@ wake_up(&ctx->wait); spin_unlock_irqrestore(&ctx->ctx_lock, flags); + + /* + * If the user requested us to deliver a completion event to an + * eventfd file descriptor *and* we actually delivered the event, + * signal it with eventfd_signal(). The eventfd_signal() function + * is safe to be called from IRQ context. + */ + if (file) { + if (event) + eventfd_signal(file, 1); + fput(file); + } + return ret; } --- fs/aio.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) Index: linux-2.6.mod/fs/aio.c =================================================================== --- linux-2.6.mod.orig/fs/aio.c 2008-03-07 20:14:55.000000000 -0800 +++ linux-2.6.mod/fs/aio.c 2008-03-07 20:15:24.000000000 -0800 @@ -936,14 +936,6 @@ return 1; } - /* - * Check if the user asked us to deliver the result through an - * eventfd. The eventfd_signal() function is safe to be called - * from IRQ context. - */ - if (!IS_ERR(iocb->ki_eventfd)) - eventfd_signal(iocb->ki_eventfd, 1); - info = &ctx->ring_info; /* add a completion event to the ring buffer. @@ -992,6 +984,15 @@ kunmap_atomic(ring, KM_IRQ1); pr_debug("added to ring %p at [%lu]\n", iocb, tail); + + /* + * Check if the user asked us to deliver the result through an + * eventfd. The eventfd_signal() function is safe to be called + * from IRQ context. + */ + if (!IS_ERR(iocb->ki_eventfd)) + eventfd_signal(iocb->ki_eventfd, 1); + put_rq: /* everything turned out well, dispose of the aiocb. */ ret = __aio_put_req(ctx, iocb); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] eventfd signal race in aio_complete() 2008-03-08 4:29 ` Davide Libenzi @ 2008-03-08 15:23 ` Rik van Riel 2008-03-08 20:38 ` Davide Libenzi 0 siblings, 1 reply; 6+ messages in thread From: Rik van Riel @ 2008-03-08 15:23 UTC (permalink / raw) To: Davide Libenzi; +Cc: Jeff Roberson, Linux Kernel Mailing List, Zach Brown On Fri, 7 Mar 2008 20:29:20 -0800 (PST) Davide Libenzi <davidel@xmailserver.org> wrote: > The second solution/patch simply moves the eventfd_signal() call before > the __aio_put_req() call, but after the event has beed "ringed". > We should be clear to go with the shorter/nicer second solution. Those > patches builds, but I'm not even signing them off till I tested them. If there are no spinlock ordering issues between &ctx->ctx_lock and &ctx->wqh.lock (taken inside eventfd_signal), then the second patch is indeed preferable. Jeff and I did look at that briefly last night, but were not familiar enough with the code to decide whether or not that was safe. -- All rights reversed. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] eventfd signal race in aio_complete() 2008-03-08 15:23 ` Rik van Riel @ 2008-03-08 20:38 ` Davide Libenzi 2008-03-08 21:38 ` Jeff Roberson 0 siblings, 1 reply; 6+ messages in thread From: Davide Libenzi @ 2008-03-08 20:38 UTC (permalink / raw) To: Rik van Riel; +Cc: Jeff Roberson, Linux Kernel Mailing List, Zach Brown On Sat, 8 Mar 2008, Rik van Riel wrote: > On Fri, 7 Mar 2008 20:29:20 -0800 (PST) > Davide Libenzi <davidel@xmailserver.org> wrote: > > > The second solution/patch simply moves the eventfd_signal() call before > > the __aio_put_req() call, but after the event has beed "ringed". > > We should be clear to go with the shorter/nicer second solution. Those > > patches builds, but I'm not even signing them off till I tested them. > > If there are no spinlock ordering issues between &ctx->ctx_lock > and &ctx->wqh.lock (taken inside eventfd_signal), then the second > patch is indeed preferable. > > Jeff and I did look at that briefly last night, but were not > familiar enough with the code to decide whether or not that was > safe. There's no interlocking between the two, so let's go with #2. Jeff, would you mind giving patch #2 a spin in your test suite? - Davide ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] eventfd signal race in aio_complete() 2008-03-08 20:38 ` Davide Libenzi @ 2008-03-08 21:38 ` Jeff Roberson 2008-03-08 21:55 ` Davide Libenzi 0 siblings, 1 reply; 6+ messages in thread From: Jeff Roberson @ 2008-03-08 21:38 UTC (permalink / raw) To: Davide Libenzi; +Cc: Rik van Riel, Linux Kernel Mailing List, Zach Brown On Sat, 8 Mar 2008, Davide Libenzi wrote: > On Sat, 8 Mar 2008, Rik van Riel wrote: > >> On Fri, 7 Mar 2008 20:29:20 -0800 (PST) >> Davide Libenzi <davidel@xmailserver.org> wrote: >> >>> The second solution/patch simply moves the eventfd_signal() call before >>> the __aio_put_req() call, but after the event has beed "ringed". >>> We should be clear to go with the shorter/nicer second solution. Those >>> patches builds, but I'm not even signing them off till I tested them. >> >> If there are no spinlock ordering issues between &ctx->ctx_lock >> and &ctx->wqh.lock (taken inside eventfd_signal), then the second >> patch is indeed preferable. >> >> Jeff and I did look at that briefly last night, but were not >> familiar enough with the code to decide whether or not that was >> safe. > > There's no interlocking between the two, so let's go with #2. > Jeff, would you mind giving patch #2 a spin in your test suite? I agree #2 would be best as well. It may take me a few days due to some equipment issues but I will get back to you as soon as I can. Thanks, Jeff > > > > - Davide > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] eventfd signal race in aio_complete() 2008-03-08 21:38 ` Jeff Roberson @ 2008-03-08 21:55 ` Davide Libenzi 0 siblings, 0 replies; 6+ messages in thread From: Davide Libenzi @ 2008-03-08 21:55 UTC (permalink / raw) To: Jeff Roberson; +Cc: Rik van Riel, Linux Kernel Mailing List, Zach Brown On Sat, 8 Mar 2008, Jeff Roberson wrote: > On Sat, 8 Mar 2008, Davide Libenzi wrote: > > > On Sat, 8 Mar 2008, Rik van Riel wrote: > > > > > On Fri, 7 Mar 2008 20:29:20 -0800 (PST) > > > Davide Libenzi <davidel@xmailserver.org> wrote: > > > > > > > The second solution/patch simply moves the eventfd_signal() call before > > > > the __aio_put_req() call, but after the event has beed "ringed". > > > > We should be clear to go with the shorter/nicer second solution. Those > > > > patches builds, but I'm not even signing them off till I tested them. > > > > > > If there are no spinlock ordering issues between &ctx->ctx_lock > > > and &ctx->wqh.lock (taken inside eventfd_signal), then the second > > > patch is indeed preferable. > > > > > > Jeff and I did look at that briefly last night, but were not > > > familiar enough with the code to decide whether or not that was > > > safe. > > > > There's no interlocking between the two, so let's go with #2. > > Jeff, would you mind giving patch #2 a spin in your test suite? > > I agree #2 would be best as well. It may take me a few days due to some > equipment issues but I will get back to you as soon as I can. I've just tested it here, and it's fine. My test case is not as complex as yours though, so I'll wait your feedback before posting. Just remember to ping me back, otherwise I'll forget to post ;) - Davide ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-03-08 21:55 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-03-08 2:32 [PATCH] eventfd signal race in aio_complete() Jeff Roberson 2008-03-08 4:29 ` Davide Libenzi 2008-03-08 15:23 ` Rik van Riel 2008-03-08 20:38 ` Davide Libenzi 2008-03-08 21:38 ` Jeff Roberson 2008-03-08 21:55 ` Davide Libenzi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.