* Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application
From: Badari Pulavarty @ 2004-09-03 15:52 UTC
To: Andrew Morton; +Cc: Suparna Bhattacharya, daniel, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1390 bytes --]
On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> Begin forwarded message:
>
> Date: Tue, 31 Aug 2004 06:15:18 -0700
> From: bugme-daemon@osdl.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
>
>
> http://bugme.osdl.org/show_bug.cgi?id=3317
>
Hi Andrew,

I debugged this some more. Here is what's happening:

The test program used a program text address as the buffer for the READ.
In the DIO path, get_user_pages() returned EFAULT. We called
finished_one_bio() as part of dropping the reference to the dio, and it
called aio_complete(). do_direct_IO() then returned EFAULT to its caller.
aio_run_iocb() expects to see EIOCBQUEUED or EIOCBRETRY; for any other
value it calls aio_complete() with "ret". This is where the second
aio_complete() comes from. So "req" gets cleaned up, and the next
dereference of it causes the OOPS.
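
A small userspace model makes the double completion easier to see. It is only a sketch under stated assumptions: struct model_iocb, model_aio_complete(), model_direct_io_buggy(), model_run_iocb() and the MODEL_QUEUED/MODEL_RETRY sentinels are hypothetical stand-ins for the kiocb, aio_complete(), the pre-patch DIO path and aio_run_iocb(). Instead of freeing the request, the model just counts completions, so the bug shows up as a count of 2 rather than an oops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a kiocb: rather than being freed, it
 * records how many times it was completed and with what value. */
struct model_iocb {
    int completions;
    long result;
};

/* Arbitrary sentinels playing the roles of EIOCBQUEUED / EIOCBRETRY. */
#define MODEL_QUEUED (-1000L)
#define MODEL_RETRY  (-1001L)

/* Stand-in for aio_complete(). In the kernel a second call would touch
 * a request that was already cleaned up; here it only bumps a counter. */
static void model_aio_complete(struct model_iocb *iocb, long res)
{
    iocb->completions++;
    iocb->result = res;
}

/* Pre-patch DIO path for a READ whose get_user_pages() fails: nothing
 * is transferred (dio->result == 0), yet dropping the last reference in
 * finished_one_bio() completes the iocb, and the error is returned too. */
static long model_direct_io_buggy(struct model_iocb *iocb)
{
    long dio_result = 0;                  /* no bytes transferred */
    model_aio_complete(iocb, dio_result); /* completion #1 */
    return -EFAULT;                       /* error from get_user_pages() */
}

/* Caller in the style described for aio_run_iocb(): any return value
 * other than "queued" or "retry" is reported through a completion. */
static void model_run_iocb(struct model_iocb *iocb,
                           long (*op)(struct model_iocb *))
{
    assert(op != NULL);
    long ret = op(iocb);
    if (ret != MODEL_QUEUED && ret != MODEL_RETRY)
        model_aio_complete(iocb, ret);    /* completion #2 */
}
```

Starting from a zeroed struct model_iocb, model_run_iocb(&req, model_direct_io_buggy) leaves req.completions at 2 and req.result at -EFAULT: one completion from the DIO side and one from the caller.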

The problem here is that finished_one_bio() shouldn't call aio_complete(),
since no work has been done. I have a fix for this; can you verify it?
I am not really comfortable with this "tweaking". (I am also not sure
whether I/O errors such as EIO can lead to aio_complete() being called
twice.)

The fix is to call aio_complete() ONLY if there is something to report.
Note that we don't update dio->result with any error codes from
get_user_pages(); they are just passed back as the "ret" value from
do_direct_IO().

Thanks,
Badari
[-- Attachment #2: aio-dio.patch --]
[-- Type: text/plain, Size: 557 bytes --]
--- linux-2.6.9-rc1.org/fs/direct-io.c 2004-09-03 08:44:22.186328240 -0700
+++ linux-2.6.9-rc1/fs/direct-io.c 2004-09-03 08:45:48.382224472 -0700
@@ -235,7 +235,8 @@ static void finished_one_bio(struct dio
dio_complete(dio, dio->block_in_file << dio->blkbits,
dio->result);
/* Complete AIO later if falling back to buffered i/o */
- if (dio->result == dio->size || dio->rw == READ) {
+ if (dio->result == dio->size ||
+ ((dio->rw == READ) && dio->result)) {
aio_complete(dio->iocb, dio->result, 0);
kfree(dio);
return;
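
The patch only changes the condition under which finished_one_bio() completes the AIO itself. A minimal sketch, lifting the old and new conditions into standalone predicates; should_complete_old(), should_complete_new() and enum model_rw are hypothetical names, while result and size mirror dio->result and dio->size:

```c
#include <assert.h>
#include <stdbool.h>

enum model_rw { MODEL_READ, MODEL_WRITE };

/* Before the patch: complete if everything was transferred, or always
 * for reads (a short WRITE falls back to buffered I/O and is completed
 * later). */
static bool should_complete_old(long result, long size, enum model_rw rw)
{
    return result == size || rw == MODEL_READ;
}

/* After the patch: a READ is completed here only if dio->result is
 * non-zero, i.e. some bytes moved or an I/O error was recorded. */
static bool should_complete_new(long result, long size, enum model_rw rw)
{
    return result == size || (rw == MODEL_READ && result != 0);
}
```

For the failing READ in this bug (result == 0), the old predicate is true and the new one is false, so the only remaining completion is the one aio_run_iocb() issues with the EFAULT return value. A READ that recorded a non-zero dio->result, including a negative I/O error, is still completed here.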
From: Daniel McNeil @ 2004-09-03 22:57 UTC
To: Badari Pulavarty
Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List,
linux-aio@kvack.org
Badari,

This does fix the problem when running on my system (ext3).

One question: finished_one_bio() is called in three places. Are you
sure the other callers won't be harmed by this change?

I'm also looking over the code and will let you know if I see any
problems.

Daniel
From: badari @ 2004-09-04 17:27 UTC
To: Daniel McNeil
Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List,
linux-aio@kvack.org
Daniel,

aio_complete() gets called only when we are done with this dio, so the
other calls to finished_one_bio() should be fine. dio->result should
hold the return value we want to send back; the fix I made is to call
aio_complete() only if we have something to report back.
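
The claim that aio_complete() is reached only once the dio is done can be illustrated with a reference-count model. This is a sketch loosely modelled on the finished_one_bio() behaviour described in this thread: struct model_dio, its fields and model_finished_one_bio() are hypothetical, and the real function also deals with locking and synchronous I/O. Several callers may drop references, but only the final drop evaluates the patched condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dio; not the kernel's struct dio. */
struct model_dio {
    int refs;          /* references still held (bios in flight, submitter) */
    long result;       /* bytes transferred, or a negative I/O error */
    long size;         /* bytes requested */
    bool is_read;
    int completions;   /* times the iocb was completed */
};

/* Drop one reference. Only the last drop may complete the iocb, and
 * then only if there is something to report (the patched condition). */
static void model_finished_one_bio(struct model_dio *dio)
{
    assert(dio->refs > 0);          /* never drop more than was taken */
    if (--dio->refs > 0)
        return;                     /* other holders remain */
    if (dio->result == dio->size || (dio->is_read && dio->result != 0))
        dio->completions++;         /* stands in for aio_complete() */
}
```

With refs starting at 3 and a full transfer, the iocb is completed exactly once, on the third call; a READ with result == 0 is never completed here and is left to the caller.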

One problem is that dio->result gets updated for I/O errors but doesn't
get updated for errors from get_user_pages(). Things should be fine, but
I am not really comfortable returning half of the errors through
aio_complete() and the other half through the return value of
do_direct_IO(). I guess it's okay, since some of the I/O errors can
happen only after we submit the bio.
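
The split described above, with errors from get_user_pages() coming back as the return value and post-submission I/O errors carried in dio->result, can be checked end to end in the same kind of model. A sketch under stated assumptions: all names are hypothetical, the asynchronous bio completion is collapsed into the submit call, and MODEL_QUEUED stands in for EIOCBQUEUED. With the patched condition, each path completes the iocb exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_QUEUED (-1000L)   /* plays the role of EIOCBQUEUED */

/* Hypothetical iocb model that counts completions. */
struct model_iocb {
    int completions;
    long result;
};

/* Stand-in for aio_complete(). */
static void model_aio_complete(struct model_iocb *iocb, long res)
{
    iocb->completions++;
    iocb->result = res;
}

/* Patched DIO-side rule for a READ: complete only if dio->result != 0. */
static void model_dio_done(struct model_iocb *iocb, long dio_result)
{
    if (dio_result != 0)
        model_aio_complete(iocb, dio_result);
}

/* get_user_pages() fails before any bio is submitted: dio->result stays
 * 0, and the error goes back to the caller as the return value. */
static long model_read_bad_buffer(struct model_iocb *iocb)
{
    model_dio_done(iocb, 0);
    return -EFAULT;
}

/* The bio is submitted and fails later: the error is recorded in
 * dio->result and reported through the completion; the submitter only
 * sees "queued". */
static long model_read_io_error(struct model_iocb *iocb)
{
    model_dio_done(iocb, -EIO);
    return MODEL_QUEUED;
}

/* Caller in the style described for aio_run_iocb(). */
static void model_run_iocb(struct model_iocb *iocb,
                           long (*op)(struct model_iocb *))
{
    assert(op != NULL);
    long ret = op(iocb);
    if (ret != MODEL_QUEUED)
        model_aio_complete(iocb, ret);
}
```

Here model_read_bad_buffer ends with a single completion carrying -EFAULT (issued by the caller), and model_read_io_error with a single completion carrying -EIO (issued by the DIO side).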

Thanks,
Badari