All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-aio <linux-aio@kvack.org>,
	zach.brown@oracle.com
Subject: Re: [patch] eventfd - remove fput() call from possible IRQ context (2nd rev)
Date: Wed, 18 Mar 2009 15:55:39 +0100	[thread overview]
Message-ID: <49C10B6B.3040108@cosmosbay.com> (raw)
In-Reply-To: <x49wsamhpvg.fsf@segfault.boston.devel.redhat.com>

Jeff Moyer a écrit :
> Davide Libenzi <davidel@xmailserver.org> writes:
> 
>> The following patch remove a possible source of fput() call from inside 
>> IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call 
>> from IRQ context, but conceptually the bug is there.
> 
> I've attached a test program which can reproduce the fput call in
> interrupt context.  It's a modified version of the eventfd test that
> Rusty wrote for the libaio test harness.  I verified that fput was in
> fact being called in interrupt context by using systemtap to print out
> the "thead_indent" of fput calls, and observing a "swapper(0)" in the
> output.  After applying your patch, I confirmed that __fput is no longer
> called from interrupt context.  Strangely enough, I never did get any
> output from the might_sleep in __fput.  I can't explain that.
> 
> I have some minor comments inlined below.
> 
>> This patch adds an optimization similar to the one we already do on 
>> ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty 
>> in general, but the alternative here would be to add a brand new delayed 
>> fput() infrastructure, that I'm not sure is worth it.
>>
>> On Sun, 15 Mar 2009, Benjamin LaHaise wrote:
>>
>>> This looks reasonably sane, the only concern I have with it is that I think 
>>> it logically makes more sense to use the same convention for fi_filp and 
>>> ki_eventfd, as the different in IS_ERR vs checking for NULL is a bit 
>>> confusing.  Aside from that, it looks like it should fix the problem 
>>> correctly.
>> Makes sense.
>>
>> Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
>>
>>
>> - Davide
>>
>>
>> ---
>>  fs/aio.c |   37 +++++++++++++++++++++++++++----------
>>  1 file changed, 27 insertions(+), 10 deletions(-)
>>
>> Index: linux-2.6.mod/fs/aio.c
>> ===================================================================
>> --- linux-2.6.mod.orig/fs/aio.c	2009-03-14 09:24:12.000000000 -0700
>> +++ linux-2.6.mod/fs/aio.c	2009-03-15 12:54:10.000000000 -0700
> ...
>> @@ -527,12 +528,14 @@ static void aio_fput_routine(struct work
>>   */
>>  static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
>>  {
>> +	int schedule_putreq = 0;
>> +
>>  	dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n",
>>  		req, atomic_long_read(&req->ki_filp->f_count));
>>  
>>  	assert_spin_locked(&ctx->ctx_lock);
>>  
>> -	req->ki_users --;
>> +	req->ki_users--;
>>  	BUG_ON(req->ki_users < 0);
>>  	if (likely(req->ki_users))
>>  		return 0;
>> @@ -540,10 +543,23 @@ static int __aio_put_req(struct kioctx *
>>  	req->ki_cancel = NULL;
>>  	req->ki_retry = NULL;
>>  
>> -	/* Must be done under the lock to serialise against cancellation.
>> -	 * Call this aio_fput as it duplicates fput via the fput_work.
>> +	/*
>> +	 * Try to optimize the aio and eventfd file* puts, by avoiding to
>> +	 * schedule work in case it is not __fput() time. In normal cases,
>> +	 * we wouldn not be holding the last reference to the file*, so
>               ^^^^^^^^^^
> tyop
> 
>> +	 * this function will be executed w/out any aio kthread wakeup.
>>  	 */
>> -	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) {
>> +	if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count)))
>> +		schedule_putreq++;
>> +	else
>> +		req->ki_filp = NULL;
>> +	if (unlikely(req->ki_eventfd != NULL)) {
>> +		if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count)))
>> +			schedule_putreq++;
>> +		else
>> +			req->ki_eventfd = NULL;
>> +	}
> 
> I agree with Jamie that you should get rid of the unlikely.
> 
> Thanks for taking care of this, Davide.
> 
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
> 
> Cheers,
> Jeff
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <string.h>
> #include <sys/types.h>
> #include <errno.h>
> #include <assert.h>
> #include <sys/eventfd.h>
> #include <libaio.h>
> 
> int
> main(int argc, char **argv)
> {
> #define SIZE	(256*1024*1024)
> 	char *buf;
> 	struct io_event io_event;
> 	struct iocb iocb;
> 	struct iocb *iocbs[] = { &iocb };
> 	int rwfd, efd;
> 	int res;
> 	io_context_t	io_ctx;
> 
> 	efd = eventfd(0, 0);
> 	if (efd < 0) {
> 		perror("eventfd");
> 		exit(1);
> 	}
> 
> 	rwfd = open("rwfile", O_RDWR|O_DIRECT);		assert(rwfd != -1);
> 	if (posix_memalign((void **)&buf, getpagesize(), SIZE) < 0) {
> 		perror("posix_memalign");
> 		exit(1);
> 	}
> 	memset(buf, 0x42, SIZE);
> 
> 	/* Write test. */
> 	res = io_queue_init(1024, &io_ctx);		assert(res == 0);
> 	io_prep_pwrite(&iocb, rwfd, buf, SIZE, 0);
> 	io_set_eventfd(&iocb, efd);
> 	res = io_submit(io_ctx, 1, iocbs);		assert(res == 1);

yes but io_submit() is blocking. so your close(efd) will come after the release in fs/aio.c

I suggest you start a thread just before io_submit() and give it this work :

void *thread_work(void *arg)
{
	usleep(10000);
	close(efd);
	return (void *)0;
}

> 
> 	/* Now close the eventfd so that AIO has the last reference */
> 	close(efd);
> 
> 	/* Keep this process around so that the aio subsystem does not hold
> 	 * the last reference on the rwfd, otherwise the really_put_req will
> 	 * be called from process context */
> 	res = io_getevents(io_ctx, 1, 1, &io_event, NULL);
> 	if (res != 1) {
> 		if (res < 0) {
> 			errno = -res;
> 			perror("io_getevents");
> 		} else
> 			printf("io_getevents did not return 1 event after "
> 			       "closing eventfd\n");
> 		exit(1);
> 	}
> 	assert(io_event.res == SIZE);
> 	printf("eventfd write test [SUCCESS]\n");
> 
> 	return 0;
> }
> /*
>  * Local variables:
>  *   c-basic-offset: 8
>  *   compile-command: "gcc -o eventfd-in-irq eventfd-in-irq.c -laio -g3"
>  * End:
>  */
> 
> 



  parent reply	other threads:[~2009-03-18 15:00 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-09 15:49 [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Jeff Moyer
2009-03-09 15:54 ` [patch] factor out checks against the memlock rlimit Jeff Moyer
2009-03-09 15:59 ` [patch] man-pages: add documentation about the memlock implications of io_setup Jeff Moyer
2009-03-09 16:45   ` Michael Kerrisk
2009-03-09 16:48   ` Michael Kerrisk
2009-03-09 20:44     ` Jeff Moyer
2009-03-09 16:18 ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Avi Kivity
2009-03-09 17:57   ` Jeff Moyer
2009-03-09 19:45     ` Avi Kivity
2009-03-09 20:36       ` Jamie Lokier
2009-03-10  8:36         ` Avi Kivity
2009-03-09 20:31     ` Eric Dumazet
2009-03-12  2:39       ` Eric Dumazet
2009-03-12  2:44         ` Benjamin LaHaise
2009-03-12  3:24           ` Eric Dumazet
2009-03-12  3:29             ` Benjamin LaHaise
2009-03-12  3:33               ` Eric Dumazet
2009-03-12  3:36                 ` Benjamin LaHaise
2009-03-12  3:40                   ` Eric Dumazet
2009-03-12  3:09         ` Eric Dumazet
2009-03-12  5:18           ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12  5:42             ` [PATCH] aio: " Eric Dumazet
2009-03-12  5:47             ` [PATCH] fs: " Andrew Morton
2009-03-12  6:10               ` Eric Dumazet
2009-03-12  6:39                 ` Andrew Morton
2009-03-12 13:39                   ` Davide Libenzi
2009-03-13 22:34                     ` Davide Libenzi
2009-03-13 22:43                       ` Eric Dumazet
2009-03-13 23:28                     ` Trond Myklebust
2009-03-14  1:40                       ` Davide Libenzi
2009-03-14  4:02                         ` Trond Myklebust
2009-03-14 14:32                           ` Davide Libenzi
2009-03-15  1:36                             ` [patch] eventfd - remove fput() call from possible IRQ context Davide Libenzi
2009-03-15 17:44                               ` Benjamin LaHaise
2009-03-15 20:08                                 ` [patch] eventfd - remove fput() call from possible IRQ context (2nd rev) Davide Libenzi
2009-03-16 17:25                                   ` Jamie Lokier
2009-03-16 18:36                                     ` Davide Libenzi
2009-03-18 14:22                                   ` Jeff Moyer
2009-03-18 14:46                                     ` Davide Libenzi
2009-03-18 14:55                                     ` Eric Dumazet [this message]
2009-03-18 15:25                                       ` Jeff Moyer
2009-03-18 15:43                                         ` Eric Dumazet
2009-03-18 16:13                                           ` Jeff Moyer
2009-03-18 17:25                                     ` [patch] eventfd - remove fput() call from possible IRQ context (3rd rev) Davide Libenzi
2009-03-18 17:34                                       ` Jeff Moyer
2009-03-12 19:22                   ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12 20:21                     ` Andrew Morton
2009-03-09 22:36 ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Andrew Morton
2009-03-10 13:43   ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49C10B6B.3040108@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=akpm@linux-foundation.org \
    --cc=bcrl@kvack.org \
    --cc=davidel@xmailserver.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.