* Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application
From: Badari Pulavarty @ 2004-09-03 15:52 UTC
To: Andrew Morton
Cc: Suparna Bhattacharya, daniel, linux-kernel

On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> Begin forwarded message:
>
> Date: Tue, 31 Aug 2004 06:15:18 -0700
> From: bugme-daemon@osdl.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
>
> http://bugme.osdl.org/show_bug.cgi?id=3317

Hi Andrew,

I debugged this some more. Here is what's happening:

The test program used a program text address as the buffer to READ into. DIO get_user_pages() returned EFAULT. We called finished_one_bio() as part of dropping the reference to the dio, and it called aio_complete(). do_direct_IO() then returned EFAULT to the caller. aio_run_iocb() expects to see EIOCBQUEUED/RETRY; otherwise it calls aio_complete() with the "ret" value. This is where the second aio_complete() comes from. So we clean up "req", and on the next de-ref we get the OOPS.

The problem here is that finished_one_bio() shouldn't call aio_complete(), since no work has been done. I have a fix for this; can you verify it? I am not really comfortable with this "tweaking". (I am not really sure whether I/O errors like EIO can also lead to calling aio_complete() twice.)

The fix is to call aio_complete() ONLY if there is something to report. Note that we don't update dio->result with any error codes from get_user_pages(); they are just passed back as the "ret" value from do_direct_IO().
Thanks,
Badari

[-- Attachment #2: aio-dio.patch --]

--- linux-2.6.9-rc1.org/fs/direct-io.c	2004-09-03 08:44:22.186328240 -0700
+++ linux-2.6.9-rc1/fs/direct-io.c	2004-09-03 08:45:48.382224472 -0700
@@ -235,7 +235,8 @@ static void finished_one_bio(struct dio
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result == dio->size || dio->rw == READ) {
+			if (dio->result == dio->size ||
+				((dio->rw == READ) && dio->result)) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 				return;
* Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application
From: Daniel McNeil @ 2004-09-03 22:57 UTC
To: Badari Pulavarty
Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List, linux-aio@kvack.org

On Fri, 2004-09-03 at 08:52, Badari Pulavarty wrote:
> [...]

Badari,

This does fix the problem when running on my system (ext3).

One question: finished_one_bio() is called in 3 places; are you sure the other places won't be harmed by this change?

I'm also looking over the code and will let you know if I see any problems.

Daniel
* Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application
From: badari @ 2004-09-04 17:27 UTC
To: Daniel McNeil
Cc: Andrew Morton, Suparna Bhattacharya, Linux Kernel Mailing List, linux-aio@kvack.org

Daniel,

aio_complete() gets called only when we are done with this dio, so the other calls to finished_one_bio() should be fine. dio->result should hold the return value we want to send back. The fix I made is to call aio_complete() only if we have something to report back.

One problem is that dio->result gets updated for I/O errors but doesn't get updated for errors from get_user_pages(). Things should be fine, but I am not really comfortable returning half the errors through aio_complete() and the other half through the return value of do_direct_IO(). I guess it's okay, since some of the I/O errors can happen only after we submit the bio.

Thanks,
Badari

Daniel McNeil wrote:
>On Fri, 2004-09-03 at 08:52, Badari Pulavarty wrote:
>> [...]
>
>Badari,
>
>This does fix the problem when running on my system (ext3).
>
>One question, finished_one_bio() is called in 3 places,
>are you sure the other places won't be harmed by this
>change?
>
>I'm also looking over the code and will let you know if
>I see any problems.
>
>Daniel