From: Eric Dumazet <dada1@cosmosbay.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Benjamin LaHaise <bcrl@kvack.org>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Andrew Morton <akpm@linux-foundation.org>,
linux-aio <linux-aio@kvack.org>,
zach.brown@oracle.com
Subject: Re: [patch] eventfd - remove fput() call from possible IRQ context (2nd rev)
Date: Wed, 18 Mar 2009 15:55:39 +0100
Message-ID: <49C10B6B.3040108@cosmosbay.com>
In-Reply-To: <x49wsamhpvg.fsf@segfault.boston.devel.redhat.com>
Jeff Moyer wrote:
> Davide Libenzi <davidel@xmailserver.org> writes:
>
>> The following patch removes a possible source of fput() call from inside
>> IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call
>> from IRQ context, but conceptually the bug is there.
>
> I've attached a test program which can reproduce the fput call in
> interrupt context. It's a modified version of the eventfd test that
> Rusty wrote for the libaio test harness. I verified that fput was in
> fact being called in interrupt context by using systemtap to print out
> the "thread_indent" of fput calls, and observing a "swapper(0)" in the
> output. After applying your patch, I confirmed that __fput is no longer
> called from interrupt context. Strangely enough, I never did get any
> output from the might_sleep in __fput. I can't explain that.
>
> I have some minor comments inlined below.
>
>> This patch adds an optimization similar to the one we already do on
>> ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty
>> in general, but the alternative here would be to add a brand new delayed
>> fput() infrastructure, that I'm not sure is worth it.
>>
>> On Sun, 15 Mar 2009, Benjamin LaHaise wrote:
>>
>>> This looks reasonably sane, the only concern I have with it is that I think
>>> it logically makes more sense to use the same convention for ki_filp and
>>> ki_eventfd, as the difference in IS_ERR vs checking for NULL is a bit
>>> confusing. Aside from that, it looks like it should fix the problem
>>> correctly.
>> Makes sense.
>>
>> Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
>>
>>
>> - Davide
>>
>>
>> ---
>> fs/aio.c | 37 +++++++++++++++++++++++++++----------
>> 1 file changed, 27 insertions(+), 10 deletions(-)
>>
>> Index: linux-2.6.mod/fs/aio.c
>> ===================================================================
>> --- linux-2.6.mod.orig/fs/aio.c 2009-03-14 09:24:12.000000000 -0700
>> +++ linux-2.6.mod/fs/aio.c 2009-03-15 12:54:10.000000000 -0700
> ...
>> @@ -527,12 +528,14 @@ static void aio_fput_routine(struct work
>> */
>> static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
>> {
>> + int schedule_putreq = 0;
>> +
>> dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n",
>> req, atomic_long_read(&req->ki_filp->f_count));
>>
>> assert_spin_locked(&ctx->ctx_lock);
>>
>> - req->ki_users --;
>> + req->ki_users--;
>> BUG_ON(req->ki_users < 0);
>> if (likely(req->ki_users))
>> return 0;
>> @@ -540,10 +543,23 @@ static int __aio_put_req(struct kioctx *
>> req->ki_cancel = NULL;
>> req->ki_retry = NULL;
>>
>> - /* Must be done under the lock to serialise against cancellation.
>> - * Call this aio_fput as it duplicates fput via the fput_work.
>> + /*
>> + * Try to optimize the aio and eventfd file* puts, by avoiding to
>> + * schedule work in case it is not __fput() time. In normal cases,
>> + * we wouldn not be holding the last reference to the file*, so
> ^^^^^^^^^^
> tyop
>
>> + * this function will be executed w/out any aio kthread wakeup.
>> */
>> - if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) {
>> + if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count)))
>> + schedule_putreq++;
>> + else
>> + req->ki_filp = NULL;
>> + if (unlikely(req->ki_eventfd != NULL)) {
>> + if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count)))
>> + schedule_putreq++;
>> + else
>> + req->ki_eventfd = NULL;
>> + }
>
> I agree with Jamie that you should get rid of the unlikely.
>
> Thanks for taking care of this, Davide.
>
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> Cheers,
> Jeff
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <string.h>
> #include <sys/types.h>
> #include <errno.h>
> #include <assert.h>
> #include <sys/eventfd.h>
> #include <libaio.h>
>
> int
> main(int argc, char **argv)
> {
> #define SIZE (256*1024*1024)
> char *buf;
> struct io_event io_event;
> struct iocb iocb;
> struct iocb *iocbs[] = { &iocb };
> int rwfd, efd;
> int res;
> io_context_t io_ctx;
>
> efd = eventfd(0, 0);
> if (efd < 0) {
> perror("eventfd");
> exit(1);
> }
>
> rwfd = open("rwfile", O_RDWR|O_DIRECT); assert(rwfd != -1);
> if (posix_memalign((void **)&buf, getpagesize(), SIZE) < 0) {
> perror("posix_memalign");
> exit(1);
> }
> memset(buf, 0x42, SIZE);
>
> /* Write test. */
> res = io_queue_init(1024, &io_ctx); assert(res == 0);
> io_prep_pwrite(&iocb, rwfd, buf, SIZE, 0);
> io_set_eventfd(&iocb, efd);
> res = io_submit(io_ctx, 1, iocbs); assert(res == 1);
Yes, but io_submit() is blocking, so your close(efd) will only happen
after the release in fs/aio.c.

I suggest you start a thread just before io_submit() and give it this work:

void *thread_work(void *arg)
{
	/* Give the main thread time to enter io_submit(). */
	usleep(10000);
	/* efd must be visible here, e.g. made a global. */
	close(efd);
	return NULL;
}
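
A minimal sketch of how such a helper could be wired up with pthreads
without relying on a global efd. The struct delayed_close and both
function names are hypothetical (they are not part of Jeff's test
program), and the 10 ms delay is only a guess at what is long enough for
the main thread to be inside io_submit():

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper state: which fd to close and how long to wait. */
struct delayed_close {
	int fd;			/* descriptor the helper will close */
	unsigned int delay_us;	/* microseconds to sleep before closing */
	int result;		/* return value of close(), set by the helper */
};

/*
 * Thread body: sleep so the submitting thread has a chance to block in
 * io_submit(), then drop the userspace reference to the descriptor.
 */
static void *delayed_close_thread(void *arg)
{
	struct delayed_close *dc = arg;

	assert(dc != NULL);
	usleep(dc->delay_us);
	dc->result = close(dc->fd);
	return NULL;
}

/*
 * Start the helper; meant to be called just before the blocking submit.
 * Returns 0 on success or a pthread_create() error number.
 */
static int start_delayed_close(pthread_t *tid, struct delayed_close *dc)
{
	return pthread_create(tid, NULL, delayed_close_thread, dc);
}
```

In the test program this would be called with dc.fd = efd right before
io_submit(), the existing close(efd) after io_submit() would go away,
and pthread_join() would be called once io_submit() returns. Build with
-pthread in addition to -laio.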
>
> /* Now close the eventfd so that AIO has the last reference */
> close(efd);
>
> /* Keep this process around so that the aio subsystem does not hold
> * the last reference on the rwfd, otherwise the really_put_req will
> * be called from process context */
> res = io_getevents(io_ctx, 1, 1, &io_event, NULL);
> if (res != 1) {
> if (res < 0) {
> errno = -res;
> perror("io_getevents");
> } else
> printf("io_getevents did not return 1 event after "
> "closing eventfd\n");
> exit(1);
> }
> assert(io_event.res == SIZE);
> printf("eventfd write test [SUCCESS]\n");
>
> return 0;
> }
> /*
> * Local variables:
> * c-basic-offset: 8
> * compile-command: "gcc -o eventfd-in-irq eventfd-in-irq.c -laio -g3"
> * End:
> */
>
>