public inbox for linux-kernel@vger.kernel.org
* Re:  [Bug 3317] New: Kernel oops in aio_complete while running AIO application
       [not found] <20040831081835.08942f70.akpm@osdl.org>
@ 2004-09-03 15:52 ` Badari Pulavarty
  2004-09-03 22:57   ` Daniel McNeil
  0 siblings, 1 reply; 3+ messages in thread
From: Badari Pulavarty @ 2004-09-03 15:52 UTC
  To: Andrew Morton; +Cc: Suparna Bhattacharya, daniel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1390 bytes --]

On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> Begin forwarded message:
> 
> Date: Tue, 31 Aug 2004 06:15:18 -0700
> From: bugme-daemon@osdl.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
> 
> 
> http://bugme.osdl.org/show_bug.cgi?id=3317
> 

Hi Andrew,

I debugged this some more. Here is what's happening:

The test program used a program text address as the buffer to READ into.
DIO get_user_pages() returned EFAULT. We called finished_one_bio()
as part of dropping the ref. to dio. It called aio_complete().
do_direct_IO() returned EFAULT to the caller. aio_run_iocb() expects
to see EIOCBQUEUED/RETRY; otherwise it calls aio_complete() with the
"ret" value. This is where the second aio_complete() is coming from.
So we clean up "req", and on the next dereference we get the oops.
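
A minimal user-space sketch of the sequence described above may help.
It is not kernel code: toy_req, toy_aio_complete(), toy_direct_io_read(),
toy_run_iocb() and TOY_EIOCBQUEUED are hypothetical stand-ins, and the
model only counts completions instead of freeing anything.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder for the kernel-internal EIOCBQUEUED code, which user-space
 * <errno.h> does not define. The value only matters inside this sketch. */
#define TOY_EIOCBQUEUED 529

/* Toy request: records how many times it was completed. */
struct toy_req {
	int completions;
	long last_result;
};

/* Stand-in for aio_complete(). The real function tears the request down,
 * so a second call would operate on a request that is already gone. */
static void toy_aio_complete(struct toy_req *req, long res)
{
	assert(req != NULL);
	req->completions++;
	req->last_result = res;
}

/* Stand-in for the direct-I/O read path when pinning the user buffer
 * fails before any bio is submitted. */
static long toy_direct_io_read(struct toy_req *req)
{
	long dio_result = 0;	/* nothing transferred, no error recorded */

	/* Behaviour before the fix: a READ always completes the iocb here. */
	toy_aio_complete(req, dio_result);

	/* The get_user_pages() failure travels back as the return value. */
	return -EFAULT;
}

/* Stand-in for aio_run_iocb(): anything other than "queued" is treated
 * as finished, so the caller completes the request itself. */
static int toy_run_iocb(void)
{
	struct toy_req req = { 0, 0 };
	long ret = toy_direct_io_read(&req);

	if (ret != -TOY_EIOCBQUEUED)
		toy_aio_complete(&req, ret);

	return req.completions;
}
```

In this model toy_run_iocb() reports two completions for a single
request, which corresponds to the double aio_complete() in the oops.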

The problem here is that finished_one_bio() shouldn't call aio_complete()
since no work has been done. I have a fix for this - can you verify it?
I am not really comfortable with this "tweaking". (I am not really
sure about IO errors like EIO etc. - whether they can lead to calling
aio_complete() twice.)


The fix is to call aio_complete() ONLY if there is something to report.
Note that we don't update dio->result with any error codes from
get_user_pages(); they are just passed as the "ret" value from
do_direct_IO().
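
The effect of the changed condition can be sketched as two predicates
over a reduced dio. The struct toy_dio, enum toy_rw and both function
names are hypothetical; only the two boolean expressions are taken from
the attached patch.

```c
#include <stdbool.h>

enum toy_rw { TOY_READ, TOY_WRITE };

/* Reduced stand-in for struct dio: only the fields the test looks at. */
struct toy_dio {
	enum toy_rw rw;
	long result;	/* bytes done or negative error; 0 if nothing recorded */
	long size;	/* bytes requested */
};

/* Condition before the patch: every READ completes the iocb here. */
static bool completes_before_patch(const struct toy_dio *dio)
{
	return dio->result == dio->size || dio->rw == TOY_READ;
}

/* Condition after the patch: a READ completes here only if dio->result
 * is non-zero, i.e. there is something to report. */
static bool completes_after_patch(const struct toy_dio *dio)
{
	return dio->result == dio->size ||
	       (dio->rw == TOY_READ && dio->result != 0);
}
```

For a READ with result 0 and size 4096 (the get_user_pages() failure
case), the first predicate is true and the second is false, so the only
completion left is the one made by the caller with the "ret" value.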

Thanks,
Badari

[-- Attachment #2: aio-dio.patch --]
[-- Type: text/plain, Size: 557 bytes --]

--- linux-2.6.9-rc1.org/fs/direct-io.c	2004-09-03 08:44:22.186328240 -0700
+++ linux-2.6.9-rc1/fs/direct-io.c	2004-09-03 08:45:48.382224472 -0700
@@ -235,7 +235,8 @@ static void finished_one_bio(struct dio 
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result == dio->size || dio->rw == READ) {
+			if (dio->result == dio->size || 
+				((dio->rw == READ) && dio->result)) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 				return;


* Re:  [Bug 3317] New: Kernel oops in aio_complete while running AIO application
  2004-09-03 15:52 ` [Bug 3317] New: Kernel oops in aio_complete while running AIO application Badari Pulavarty
@ 2004-09-03 22:57   ` Daniel McNeil
  2004-09-04 17:27     ` badari
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel McNeil @ 2004-09-03 22:57 UTC
  To: Badari Pulavarty
  Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List,
	linux-aio@kvack.org

On Fri, 2004-09-03 at 08:52, Badari Pulavarty wrote:
> On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> > Begin forwarded message:
> > 
> > Date: Tue, 31 Aug 2004 06:15:18 -0700
> > From: bugme-daemon@osdl.org
> > To: bugme-new@lists.osdl.org
> > Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
> > 
> > 
> > http://bugme.osdl.org/show_bug.cgi?id=3317
> > 
> 
> Hi Andrew,
> 
> I debugged this some more. Here is what's happening:
> 
> The test program used a program text address as the buffer to READ into.
> DIO get_user_pages() returned EFAULT. We called finished_one_bio()
> as part of dropping the ref. to dio. It called aio_complete().
> do_direct_IO() returned EFAULT to the caller. aio_run_iocb() expects
> to see EIOCBQUEUED/RETRY; otherwise it calls aio_complete() with the
> "ret" value. This is where the second aio_complete() is coming from.
> So we clean up "req", and on the next dereference we get the oops.
> 
> The problem here is that finished_one_bio() shouldn't call aio_complete()
> since no work has been done. I have a fix for this - can you verify it?
> I am not really comfortable with this "tweaking". (I am not really
> sure about IO errors like EIO etc. - whether they can lead to calling
> aio_complete() twice.)
> 
> 
> The fix is to call aio_complete() ONLY if there is something to report.
> Note that we don't update dio->result with any error codes from
> get_user_pages(); they are just passed as the "ret" value from
> do_direct_IO().
> 
> Thanks,
> Badari

Badari,

This does fix the problem when running on my system (ext3).

One question: finished_one_bio() is called in 3 places.
Are you sure the other places won't be harmed by this
change?

I'm also looking over the code and will let you know if
I see any problems.

Daniel



* Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application
  2004-09-03 22:57   ` Daniel McNeil
@ 2004-09-04 17:27     ` badari
  0 siblings, 0 replies; 3+ messages in thread
From: badari @ 2004-09-04 17:27 UTC
  To: Daniel McNeil
  Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List,
	linux-aio@kvack.org

Daniel,

aio_complete() gets called only when we are done with this dio.
Other calls to finished_one_bio() should be fine. dio->result
should have the return value we want to send back. The fix
I made is to call aio_complete() only if we have something to
report back.

One problem is that dio->result gets updated for IO errors but
doesn't get updated for errors from get_user_pages().  Things
should be fine, but I am not really comfortable returning half
the errors through aio_complete() and the other half through the
return value of do_direct_IO(). I guess it's okay, since some of
the IO errors can happen only after we submit the bio.
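
The split between the two error paths can be sketched as follows. This
is a toy model under stated assumptions, not the real do_direct_IO():
toy_dio, toy_do_direct_io() and TOY_EIOCBQUEUED are hypothetical, and the
"queued" return for the submitted case is an assumption of the sketch.

```c
#include <errno.h>

/* Placeholder; user-space <errno.h> does not define EIOCBQUEUED. */
#define TOY_EIOCBQUEUED 529

struct toy_dio {
	long result;	/* what the completion path would report */
};

/*
 * gup_err: negative errno from pinning the user pages, or 0 on success.
 * io_err:  negative errno from the submitted I/O, or 0 on success.
 * bytes:   bytes transferred when the I/O succeeds.
 */
static long toy_do_direct_io(struct toy_dio *dio, long gup_err, long io_err,
			     long bytes)
{
	if (gup_err) {
		/* Failed before submission: dio->result stays untouched and
		 * the error is visible only as the return value. */
		return gup_err;
	}

	/* After submission the outcome is recorded in dio->result and is
	 * reported through the completion path. */
	dio->result = io_err ? io_err : bytes;
	return -TOY_EIOCBQUEUED;
}
```

With gup_err = -EFAULT the caller sees -EFAULT and dio->result stays 0;
with io_err = -EIO the error shows up only in dio->result.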

Thanks,
Badari


