public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-aio <linux-aio@kvack.org>,
	zach.brown@oracle.com
Subject: Re: [patch] eventfd - remove fput() call from possible IRQ context (2nd rev)
Date: Wed, 18 Mar 2009 15:55:39 +0100	[thread overview]
Message-ID: <49C10B6B.3040108@cosmosbay.com> (raw)
In-Reply-To: <x49wsamhpvg.fsf@segfault.boston.devel.redhat.com>

Jeff Moyer a écrit :
> Davide Libenzi <davidel@xmailserver.org> writes:
> 
>> The following patch remove a possible source of fput() call from inside 
>> IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call 
>> from IRQ context, but conceptually the bug is there.
> 
> I've attached a test program which can reproduce the fput call in
> interrupt context.  It's a modified version of the eventfd test that
> Rusty wrote for the libaio test harness.  I verified that fput was in
> fact being called in interrupt context by using systemtap to print out
> the "thead_indent" of fput calls, and observing a "swapper(0)" in the
> output.  After applying your patch, I confirmed that __fput is no longer
> called from interrupt context.  Strangely enough, I never did get any
> output from the might_sleep in __fput.  I can't explain that.
> 
> I have some minor comments inlined below.
> 
>> This patch adds an optimization similar to the one we already do on 
>> ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty 
>> in general, but the alternative here would be to add a brand new delayed 
>> fput() infrastructure, that I'm not sure is worth it.
>>
>> On Sun, 15 Mar 2009, Benjamin LaHaise wrote:
>>
>>> This looks reasonably sane, the only concern I have with it is that I think 
>>> it logically makes more sense to use the same convention for fi_filp and 
>>> ki_eventfd, as the different in IS_ERR vs checking for NULL is a bit 
>>> confusing.  Aside from that, it looks like it should fix the problem 
>>> correctly.
>> Makes sense.
>>
>> Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
>>
>>
>> - Davide
>>
>>
>> ---
>>  fs/aio.c |   37 +++++++++++++++++++++++++++----------
>>  1 file changed, 27 insertions(+), 10 deletions(-)
>>
>> Index: linux-2.6.mod/fs/aio.c
>> ===================================================================
>> --- linux-2.6.mod.orig/fs/aio.c	2009-03-14 09:24:12.000000000 -0700
>> +++ linux-2.6.mod/fs/aio.c	2009-03-15 12:54:10.000000000 -0700
> ...
>> @@ -527,12 +528,14 @@ static void aio_fput_routine(struct work
>>   */
>>  static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
>>  {
>> +	int schedule_putreq = 0;
>> +
>>  	dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n",
>>  		req, atomic_long_read(&req->ki_filp->f_count));
>>  
>>  	assert_spin_locked(&ctx->ctx_lock);
>>  
>> -	req->ki_users --;
>> +	req->ki_users--;
>>  	BUG_ON(req->ki_users < 0);
>>  	if (likely(req->ki_users))
>>  		return 0;
>> @@ -540,10 +543,23 @@ static int __aio_put_req(struct kioctx *
>>  	req->ki_cancel = NULL;
>>  	req->ki_retry = NULL;
>>  
>> -	/* Must be done under the lock to serialise against cancellation.
>> -	 * Call this aio_fput as it duplicates fput via the fput_work.
>> +	/*
>> +	 * Try to optimize the aio and eventfd file* puts, by avoiding to
>> +	 * schedule work in case it is not __fput() time. In normal cases,
>> +	 * we wouldn not be holding the last reference to the file*, so
>               ^^^^^^^^^^
> tyop
> 
>> +	 * this function will be executed w/out any aio kthread wakeup.
>>  	 */
>> -	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) {
>> +	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count)))
>> +		schedule_putreq++;
>> +	else
>> +		req->ki_filp = NULL;
>> +	if (unlikely(req->ki_eventfd != NULL)) {
>> +		if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count)))
>> +			schedule_putreq++;
>> +		else
>> +			req->ki_eventfd = NULL;
>> +	}
> 
> I agree with Jamie that you should get rid of the unlikely.
> 
> Thanks for taking care of this, Davide.
> 
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
> 
> Cheers,
> Jeff
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <string.h>
> #include <sys/types.h>
> #include <errno.h>
> #include <assert.h>
> #include <sys/eventfd.h>
> #include <libaio.h>
> 
> int
> main(int argc, char **argv)
> {
> #define SIZE	(256*1024*1024)
> 	char *buf;
> 	struct io_event io_event;
> 	struct iocb iocb;
> 	struct iocb *iocbs[] = { &iocb };
> 	int rwfd, efd;
> 	int res;
> 	io_context_t	io_ctx;
> 
> 	efd = eventfd(0, 0);
> 	if (efd < 0) {
> 		perror("eventfd");
> 		exit(1);
> 	}
> 
> 	rwfd = open("rwfile", O_RDWR|O_DIRECT);		assert(rwfd != -1);
> 	if (posix_memalign((void **)&buf, getpagesize(), SIZE) < 0) {
> 		perror("posix_memalign");
> 		exit(1);
> 	}
> 	memset(buf, 0x42, SIZE);
> 
> 	/* Write test. */
> 	res = io_queue_init(1024, &io_ctx);		assert(res == 0);
> 	io_prep_pwrite(&iocb, rwfd, buf, SIZE, 0);
> 	io_set_eventfd(&iocb, efd);
> 	res = io_submit(io_ctx, 1, iocbs);		assert(res == 1);

yes but io_submit() is blocking. so your close(efd) will come after the release in fs/aio.c

I suggest you start a thread just before io_submit() and give it this work :

void *thread_work(void *arg)
{
	usleep(10000);
	close(efd);
	return (void *)0;
}

> 
> 	/* Now close the eventfd so that AIO has the last reference */
> 	close(efd);
> 
> 	/* Keep this process around so that the aio subsystem does not hold
> 	 * the last reference on the rwfd, otherwise the really_put_req will
> 	 * be called from process context */
> 	res = io_getevents(io_ctx, 1, 1, &io_event, NULL);
> 	if (res != 1) {
> 		if (res < 0) {
> 			errno = -res;
> 			perror("io_getevents");
> 		} else
> 			printf("io_getevents did not return 1 event after "
> 			       "closing eventfd\n");
> 		exit(1);
> 	}
> 	assert(io_event.res == SIZE);
> 	printf("eventfd write test [SUCCESS]\n");
> 
> 	return 0;
> }
> /*
>  * Local variables:
>  *   c-basic-offset: 8
>  *   compile-command: "gcc -o eventfd-in-irq eventfd-in-irq.c -laio -g3"
>  * End:
>  */
> 
> 



  parent reply	other threads:[~2009-03-18 15:00 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-09 15:49 [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Jeff Moyer
2009-03-09 15:54 ` [patch] factor out checks against the memlock rlimit Jeff Moyer
2009-03-09 15:59 ` [patch] man-pages: add documentation about the memlock implications of io_setup Jeff Moyer
2009-03-09 16:45   ` Michael Kerrisk
2009-03-09 16:48   ` Michael Kerrisk
2009-03-09 20:44     ` Jeff Moyer
2009-03-09 16:18 ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Avi Kivity
2009-03-09 17:57   ` Jeff Moyer
2009-03-09 19:45     ` Avi Kivity
2009-03-09 20:36       ` Jamie Lokier
2009-03-10  8:36         ` Avi Kivity
2009-03-09 20:31     ` Eric Dumazet
2009-03-12  2:39       ` Eric Dumazet
2009-03-12  2:44         ` Benjamin LaHaise
2009-03-12  3:24           ` Eric Dumazet
2009-03-12  3:29             ` Benjamin LaHaise
2009-03-12  3:33               ` Eric Dumazet
2009-03-12  3:36                 ` Benjamin LaHaise
2009-03-12  3:40                   ` Eric Dumazet
2009-03-12  3:09         ` Eric Dumazet
2009-03-12  5:18           ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12  5:42             ` [PATCH] aio: " Eric Dumazet
2009-03-12  5:47             ` [PATCH] fs: " Andrew Morton
2009-03-12  6:10               ` Eric Dumazet
2009-03-12  6:39                 ` Andrew Morton
2009-03-12 13:39                   ` Davide Libenzi
2009-03-13 22:34                     ` Davide Libenzi
2009-03-13 22:43                       ` Eric Dumazet
2009-03-13 23:28                     ` Trond Myklebust
2009-03-14  1:40                       ` Davide Libenzi
2009-03-14  4:02                         ` Trond Myklebust
2009-03-14 14:32                           ` Davide Libenzi
2009-03-15  1:36                             ` [patch] eventfd - remove fput() call from possible IRQ context Davide Libenzi
2009-03-15 17:44                               ` Benjamin LaHaise
2009-03-15 20:08                                 ` [patch] eventfd - remove fput() call from possible IRQ context (2nd rev) Davide Libenzi
2009-03-16 17:25                                   ` Jamie Lokier
2009-03-16 18:36                                     ` Davide Libenzi
2009-03-18 14:22                                   ` Jeff Moyer
2009-03-18 14:46                                     ` Davide Libenzi
2009-03-18 14:55                                     ` Eric Dumazet [this message]
2009-03-18 15:25                                       ` Jeff Moyer
2009-03-18 15:43                                         ` Eric Dumazet
2009-03-18 16:13                                           ` Jeff Moyer
2009-03-18 17:25                                     ` [patch] eventfd - remove fput() call from possible IRQ context (3rd rev) Davide Libenzi
2009-03-18 17:34                                       ` Jeff Moyer
2009-03-12 19:22                   ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12 20:21                     ` Andrew Morton
2009-03-09 22:36 ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Andrew Morton
2009-03-10 13:43   ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49C10B6B.3040108@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=akpm@linux-foundation.org \
    --cc=bcrl@kvack.org \
    --cc=davidel@xmailserver.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox