public inbox for linux-kernel@vger.kernel.org
* [patch] aio: invalidate async directio writes
@ 2008-06-18 18:09 Jeff Moyer
  2008-06-18 18:22 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jeff Moyer @ 2008-06-18 18:09 UTC (permalink / raw)
  To: akpm; +Cc: zach.brown, linux-aio, linux-kernel

Hi, Andrew,

This is a follow-up to:

commit bdb76ef5a4bc8676a81034a443f1eda450b4babb
Author: Zach Brown <zach.brown@oracle.com>
Date:   Tue Oct 30 11:45:46 2007 -0700

    dio: fix cache invalidation after sync writes
    
    Commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate
    clean pages before dio write") introduced a bug which stopped dio from
    ever invalidating the page cache after writes.  It still invalidated it
    before writes so most users were fine.
    
    Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
    this bug when he had a buffered reader immediately reading file data
    after an O_DIRECT [writer] had written the data.  The kernel issued
    read-ahead beyond the position of the reader which overlapped with the
    O_DIRECT writer.  The failure to invalidate after writes caused the
    reader to see stale data from the read-ahead.
    
    The following patch is originally from Karl.  The following commentary
    is his:
    
        The below 3rd try takes on your suggestion of just invalidating
        no matter what the retval from the direct_IO call.  I ran it
        thru the test-case several times and it has worked every time.
        The post-invalidate is probably still too early for async-directio,
        but I don't have a testcase for that;  just sync.  And, this
        won't be any worse in the async case.
    
    I added a test to the aio-dio-regress repository which mimics Karl's IO
    pattern.  It verified the bad behaviour and that the patch fixed it.  I
    agree with Karl: this still doesn't help the case where a buffered
    reader follows an AIO O_DIRECT writer.  That will require a bit more
    work.
    
    This gives up on the idea of returning EIO to indicate to userspace that
    stale data remains if the invalidation failed.

Note the second-to-last paragraph, where it mentions that this does not fix
the AIO case.  I updated the regression test to also perform asynchronous
I/O and verified that the problem does exist.

To fix the problem, we need to invalidate the pages that were under write
I/O after the I/O completes.  Because the I/O completion handler can be called
in interrupt context, where invalidate_inode_pages2 cannot be called, this
patch opts to defer the completion to a workqueue.  That
workqueue is responsible for invalidating the page cache pages and completing
the I/O.

I verified that the test case passes with the following patch applied.
Since this affects the code path for all async writers, I had our
performance group run a patched kernel through an OLTP workload.  I'm
told that the noise level for this test is in the 5% range.  The results
showed a 2.9% degradation in performance at 80 users, and 80 users was
the peak performance for the test.  For other user counts, the patched
kernel varied from a percent better to a couple of percent worse.  So,
overall I think the patch is worth taking, given that it fixes a
potential data corrupter.  (Sorry I can't report the actual numbers,
since the tests were run on unreleased hardware.)

Comments, as always, are encouraged.

Cheers,

Jeff

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 9e81add..90fa9e2 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -131,6 +131,8 @@ struct dio {
 	int is_async;			/* is IO async ? */
 	int io_error;			/* IO error in completion path */
 	ssize_t result;                 /* IO result */
+	struct list_head done_list;	/* AIO DIO writes to be completed
+					 * in process context */
 };
 
 /*
@@ -260,6 +262,40 @@ static int dio_complete(struct dio *dio, loff_t offset, int ret)
 	return ret;
 }
 
+static void aio_complete_fn(void *data);
+static DECLARE_WORK(aio_complete_work, aio_complete_fn, NULL);
+static DEFINE_SPINLOCK(iocb_completion_list_lock);
+static LIST_HEAD(iocb_completion_list);
+
+static void aio_complete_fn(void *data)
+{
+	unsigned long flags;
+	LIST_HEAD(private);
+	struct dio *dio, *tmp;
+
+	spin_lock_irqsave(&iocb_completion_list_lock, flags);
+	list_splice_init(&iocb_completion_list, &private);
+	spin_unlock_irqrestore(&iocb_completion_list_lock, flags);
+
+	list_for_each_entry_safe(dio, tmp, &private, done_list) {
+		struct kiocb *iocb = dio->iocb;
+		loff_t offset = iocb->ki_pos;
+		struct file *file = iocb->ki_filp;
+		struct address_space *mapping = file->f_mapping;
+		int ret;
+	pgoff_t end = (offset + dio->size - 1) >> PAGE_CACHE_SHIFT;
+
+		/* and now invalidate the page cache */
+		ret = dio_complete(dio, offset, 0);
+		if (mapping->nrpages)
+			invalidate_inode_pages2_range(mapping,
+						offset >> PAGE_CACHE_SHIFT, end);
+		aio_complete(dio->iocb, ret, 0);
+		list_del(&dio->done_list);
+		kfree(dio);
+	}
+}
+
 static int dio_bio_complete(struct dio *dio, struct bio *bio);
 /*
  * Asynchronous IO callback. 
@@ -280,9 +316,22 @@ static void dio_bio_end_aio(struct bio *bio, int error)
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, ret, 0);
-		kfree(dio);
+		/* For async O_DIRECT writes, we need to invalidate the 
+		 * page cache after the write completes.  Kick off a
+		 * workqueue to do this and issue the completion in process
+		 * context.
+		 */
+		if (dio->rw == READ) {
+			int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
+			aio_complete(dio->iocb, ret, 0);
+			kfree(dio);
+		} else {
+			unsigned long flags;
+			spin_lock_irqsave(&iocb_completion_list_lock, flags);
+			list_add(&dio->done_list, &iocb_completion_list);
+			spin_unlock_irqrestore(&iocb_completion_list_lock, flags);
+			schedule_work(&aio_complete_work);
+		}
 	}
 }
 


* Re: [patch] aio: invalidate async directio writes
  2008-06-18 18:09 [patch] aio: invalidate async directio writes Jeff Moyer
@ 2008-06-18 18:22 ` Christoph Hellwig
  2008-06-18 19:45   ` Jeff Moyer
  2008-06-19  7:51 ` Peter Zijlstra
  2008-06-19 17:23 ` Zach Brown
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-06-18 18:22 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: akpm, zach.brown, linux-aio, linux-kernel

On Wed, Jun 18, 2008 at 02:09:51PM -0400, Jeff Moyer wrote:
> +		/* For async O_DIRECT writes, we need to invalidate the 
> +		 * page cache after the write completes.  Kick off a
> +		 * workqueue to do this and issue the completion in process
> +		 * context.
> +		 */
> +		if (dio->rw == READ) {
> +			int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
> +			aio_complete(dio->iocb, ret, 0);
> +			kfree(dio);
> +		} else {
> +			unsigned long flags;
> +			spin_lock_irqsave(&iocb_completion_list_lock, flags);
> +			list_add(&dio->done_list, &iocb_completion_list);
> +			spin_unlock_irqrestore(&iocb_completion_list_lock, flags);
> +			schedule_work(&aio_complete_work);
> +		}

Can we please move all these aio_complete calls to user context?  Having
AIO contexts completing from irq context is a major pain for complex
filesystems like XFS.


* Re: [patch] aio: invalidate async directio writes
  2008-06-18 18:22 ` Christoph Hellwig
@ 2008-06-18 19:45   ` Jeff Moyer
  2008-06-18 19:48     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2008-06-18 19:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: akpm, zach.brown, linux-aio, linux-kernel

Christoph Hellwig <hch@infradead.org> writes:

> On Wed, Jun 18, 2008 at 02:09:51PM -0400, Jeff Moyer wrote:
>> [quoted patch hunk trimmed]
>
> Can we please move all these aio_complete calls to user context?  Having
> AIO contexts completing from irq context is a major pain for complex
> filesystems like XFS.

Can you help me understand why this is a pain?  I'm having trouble
making the connection.

Thanks!

Jeff


* Re: [patch] aio: invalidate async directio writes
  2008-06-18 19:45   ` Jeff Moyer
@ 2008-06-18 19:48     ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2008-06-18 19:48 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Christoph Hellwig, akpm, zach.brown, linux-aio, linux-kernel

On Wed, Jun 18, 2008 at 03:45:28PM -0400, Jeff Moyer wrote:
> > Can we please move all these aio_complete calls to user context?  Having
> > AIO contexts completing from irq context is a major pain for complex
> > filesystems like XFS.
> 
> Can you help me understand why this is a pain?  I'm having trouble
> making the connection.

With your patch we complete aio dio write requests in user context, which
is great for filesystems that need to do more complex work in the
completion handler, e.g. XFS for unwritten extent conversion.  But doing
this only for the write case is of limited use; we should be doing it for
the read case, too.

See fs/xfs/linux-2.6/xfs_aops.c:xfs_end_io_direct() for what I mean.


* Re: [patch] aio: invalidate async directio writes
  2008-06-18 18:09 [patch] aio: invalidate async directio writes Jeff Moyer
  2008-06-18 18:22 ` Christoph Hellwig
@ 2008-06-19  7:51 ` Peter Zijlstra
  2008-06-19 13:50   ` Jeff Moyer
  2008-06-19 17:50   ` Zach Brown
  2008-06-19 17:23 ` Zach Brown
  2 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2008-06-19  7:51 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: akpm, zach.brown, linux-aio, linux-kernel

On Wed, 2008-06-18 at 14:09 -0400, Jeff Moyer wrote:
> Hi, Andrew,
> 
> This is a follow-up to:
> 
> commit bdb76ef5a4bc8676a81034a443f1eda450b4babb ("dio: fix cache
> invalidation after sync writes")
> [rest of quoted commit message trimmed]
> 
> Note the second-to-last paragraph, where it mentions that this does not fix
> the AIO case.  I updated the regression test to also perform asynchronous
> I/O and verified that the problem does exist.
> 
> To fix the problem, we need to invalidate the pages that were under write
> I/O after the I/O completes.  Because the I/O completion handler can be called
> in interrupt context (and invalidate_inode_pages2 cannot be called in interrupt
> context), this patch opts to defer the completion to a workqueue.  That
> workqueue is responsible for invalidating the page cache pages and completing
> the I/O.
> 
> I verified that the test case passes with the following patch applied.

I'm utterly ignorant of all things [AD]IO, but doesn't deferring the
invalidate open up/widen a race window?



* Re: [patch] aio: invalidate async directio writes
  2008-06-19  7:51 ` Peter Zijlstra
@ 2008-06-19 13:50   ` Jeff Moyer
  2008-06-19 13:58     ` Peter Zijlstra
  2008-06-19 17:50   ` Zach Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Moyer @ 2008-06-19 13:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: akpm, zach.brown, linux-aio, linux-kernel

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, 2008-06-18 at 14:09 -0400, Jeff Moyer wrote:
>> [earlier quoted material trimmed]
>> 
>> To fix the problem, we need to invalidate the pages that were under write
>> I/O after the I/O completes.  Because the I/O completion handler can be called
>> in interrupt context (and invalidate_inode_pages2 cannot be called in interrupt
>> context), this patch opts to defer the completion to a workqueue.  That
>> workqueue is responsible for invalidating the page cache pages and completing
>> the I/O.
>> 
>> I verified that the test case passes with the following patch applied.
>
> I'm utterly ignorant of all things [AD]IO, but doesn't deferring the
> invalidate open up/widen a race window?

We weren't doing the invalidate at all before this patch.  This patch
introduces the invalidation, but we can't do it in interrupt context.

Cheers,

Jeff


* Re: [patch] aio: invalidate async directio writes
  2008-06-19 13:50   ` Jeff Moyer
@ 2008-06-19 13:58     ` Peter Zijlstra
  2008-06-19 14:05       ` Jeff Moyer
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2008-06-19 13:58 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: akpm, zach.brown, linux-aio, linux-kernel

On Thu, 2008-06-19 at 09:50 -0400, Jeff Moyer wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Wed, 2008-06-18 at 14:09 -0400, Jeff Moyer wrote:
> >> [earlier quoted material trimmed]
> >> To fix the problem, we need to invalidate the pages that were under write
> >> I/O after the I/O completes.  Because the I/O completion handler can be called
> >> in interrupt context (and invalidate_inode_pages2 cannot be called in interrupt
> >> context), this patch opts to defer the completion to a workqueue.  That
> >> workqueue is responsible for invalidating the page cache pages and completing
> >> the I/O.
> >> 
> >> I verified that the test case passes with the following patch applied.
> >
> > I'm utterly ignorant of all things [AD]IO, but doesn't deferring the
> > invalidate open up/widen a race window?
> 
> We weren't doing the invalidate at all before this patch.  This patch
> introduces the invalidation, but we can't do it in interrupt context.

Sure, I understand that; so this patch goes from always wrong to
sometimes wrong.  I'm just wondering whether this non-determinism will
hurt us.



* Re: [patch] aio: invalidate async directio writes
  2008-06-19 13:58     ` Peter Zijlstra
@ 2008-06-19 14:05       ` Jeff Moyer
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Moyer @ 2008-06-19 14:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: akpm, zach.brown, linux-aio, linux-kernel

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, 2008-06-19 at 09:50 -0400, Jeff Moyer wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Wed, 2008-06-18 at 14:09 -0400, Jeff Moyer wrote:
>> >> [earlier quoted material trimmed]
>> >
>> > I'm utterly ignorant of all things [AD]IO, but doesn't deferring the
>> > invalidate open up/widen a race window?
>> 
>> We weren't doing the invalidate at all before this patch.  This patch
>> introduces the invalidation, but we can't do it in interrupt context.
>
> Sure, I understand that, so this patch goes from always wrong, to
> sometimes wrong. I'm just wondering if this non-determinism will hurt
> us.

Actually, it goes from sometimes wrong to sometimes, but less likely,
wrong.  We already have the non-determinism.  And, as I mentioned, a
test case written to expose this very problem doesn't hit it after this
patch.

I'll see if I can't come up with a more deterministic solution.  Any
ideas on the matter would be welcome.  ;)

Thanks for taking a look, Peter.

Cheers,

Jeff


* Re: [patch] aio: invalidate async directio writes
  2008-06-18 18:09 [patch] aio: invalidate async directio writes Jeff Moyer
  2008-06-18 18:22 ` Christoph Hellwig
  2008-06-19  7:51 ` Peter Zijlstra
@ 2008-06-19 17:23 ` Zach Brown
  2 siblings, 0 replies; 10+ messages in thread
From: Zach Brown @ 2008-06-19 17:23 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: akpm, linux-aio, linux-kernel


> +static DECLARE_WORK(aio_complete_work, aio_complete_fn, NULL);
> +static DEFINE_SPINLOCK(iocb_completion_list_lock);
> +static LIST_HEAD(iocb_completion_list);

It seems like a bad idea to funnel all AIO DIO completion in the system
through one cacheline.  Should we have per-cpu lists and work structs?

> +			unsigned long flags;
> +			spin_lock_irqsave(&iocb_completion_list_lock, flags);
> +			list_add(&dio->done_list, &iocb_completion_list);
> +			spin_unlock_irqrestore(&iocb_completion_list_lock, flags);
> +			schedule_work(&aio_complete_work);

And we should probably use list_add_tail() here so that we don't reverse
the order of IO completion and end_io() callbacks.

And hopefully going per-cpu could simplify the locking so that we don't
have even more per-io locking.

- z
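
Zach's list_add() vs list_add_tail() point can be checked with a small
userspace model of the kernel's list primitives (illustrative sketch only;
struct fake_dio and drain() are invented names, not kernel code):

```c
/*
 * Userspace illustration: list_add() pushes at the head of the list, so
 * draining the list walks entries in the *reverse* of the order they were
 * queued; list_add_tail() preserves FIFO order, matching the order in
 * which the IOs actually completed.
 */
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void __list_add(struct list_head *new,
                       struct list_head *prev, struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* insert right after the head: LIFO drain order */
static void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

/* insert just before the head: FIFO drain order */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

struct fake_dio {			/* stand-in for struct dio */
	int id;
	struct list_head done_list;
};

/*
 * Queue three "completed" dios (completion order 0, 1, 2) with the given
 * insert function, then walk the list the way aio_complete_fn() would,
 * recording the ids in drain order.
 */
static void drain(void (*insert)(struct list_head *, struct list_head *),
                  int out[3])
{
	struct fake_dio d[3] = { { .id = 0 }, { .id = 1 }, { .id = 2 } };
	struct list_head head, *pos;
	int i;

	INIT_LIST_HEAD(&head);
	for (i = 0; i < 3; i++)
		insert(&d[i].done_list, &head);

	for (i = 0, pos = head.next; pos != &head; pos = pos->next, i++)
		out[i] = ((struct fake_dio *)
			  ((char *)pos - offsetof(struct fake_dio, done_list)))->id;
}
```

With list_add() the list drains as {2, 1, 0}, the reverse of completion
order; with list_add_tail() it drains as {0, 1, 2}.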


* Re: [patch] aio: invalidate async directio writes
  2008-06-19  7:51 ` Peter Zijlstra
  2008-06-19 13:50   ` Jeff Moyer
@ 2008-06-19 17:50   ` Zach Brown
  1 sibling, 0 replies; 10+ messages in thread
From: Zach Brown @ 2008-06-19 17:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jeff Moyer, akpm, linux-aio, linux-kernel


> I'm utterly ignorant of all things [AD]IO, but doesn't deferring the
> invalidate open up/widen a race window?

The case we care about making consistent is buffered reads which the
user politely only issues after O_DIRECT writes have completed.

If they issue buffered reads that race with O_DIRECT writes, well, they
get to see weird versions of the data.  Just like if they issue buffered
reads that race with buffered writes.

But we must make sure that reads issued after the O_DIRECT writes are
not satisfied by cached data which was populated during a nasty racing
read.  So we invalidate cached pages in the range of the write after the
O_DIRECT write is on disk but before we tell the user that the write has
completed.
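
The invalidation range computed in aio_complete_fn() above covers every
page cache page the write touched.  The arithmetic can be sketched in
userspace C (assuming 4K pages, so PAGE_CACHE_SHIFT is modeled as 12;
first_page() and last_page() are invented helper names, not kernel API):

```c
/*
 * Page-index arithmetic from aio_complete_fn(): invalidate pages
 * offset >> PAGE_CACHE_SHIFT through (offset + size - 1) >> PAGE_CACHE_SHIFT.
 * Userspace model; assumes 4K pages.
 */
#include <assert.h>

#define PAGE_CACHE_SHIFT 12	/* 4K pages */

typedef unsigned long long u64;

/* index of the first page cache page overlapping the write */
static u64 first_page(u64 offset)
{
	return offset >> PAGE_CACHE_SHIFT;
}

/* index of the last page touched by a write of 'size' bytes at 'offset' */
static u64 last_page(u64 offset, u64 size)
{
	return (offset + size - 1) >> PAGE_CACHE_SHIFT;
}
```

A 4096-byte write at offset 0 touches only page 0; one more byte spills
onto page 1, which is why the end index subtracts one before shifting.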

This is made even worse by the observation that racing buffered reads
might be issued behind the user's back as part of read-ahead.  People
have hit this very issue in systems where they have a ring buffer in a
file, an O_DIRECT writer, and a buffered reader that reads the blocks
just after they're written.

- z

