From: Kent Overstreet <koverstreet@google.com>
To: Valdis.Kletnieks@vt.edu
Cc: Hillf Danton <dhillf@gmail.com>,
Benjamin LaHaise <bcrl@kvack.org>,
linux-kernel@vger.kernel.org, linux-aio@kvack.org, zab@zabbo.net,
akpm@linux-foundation.org
Subject: Re: next-20130117 - kernel BUG with aio
Date: Thu, 24 Jan 2013 13:18:50 -0800 [thread overview]
Message-ID: <20130124211850.GH26407@google.com> (raw)
In-Reply-To: <13450.1359048141@turing-police.cc.vt.edu>
On Thu, Jan 24, 2013 at 12:22:21PM -0500, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 23 Jan 2013 20:10:03 +0800, Hillf Danton said:
>
> > Try again?
> > ---
> >
> > --- a/fs/aio.c Tue Jan 22 21:37:54 2013
> > +++ b/fs/aio.c Wed Jan 23 20:06:14 2013
>
> Now seeing this:
>
> [ 2941.495370] ------------[ cut here ]------------
> [ 2941.495379] WARNING: at fs/aio.c:336 put_ioctx+0x1cb/0x252()
>
> Same warn location, but different traceback?
Finally reproduced it (thanks to Zach Brown for the idea - using a
loopback device) - it's triggered when there's outstanding kiocbs when
we're trying to tear down the kioctx.
Found a couple bugs once I was able to reproduce it. Besides the null
pointer deref that Hillf posted a patch for, cancellation was fubar -
kiocb_cancel() shouldn't be marking the kiocb as cancelled if it didn't
have a cancel callback.
The other two bugs I found were both a result of the fact that
aio_run_iocb() touches the kiocb after passing it off to a method that
may call aio_complete() on it - which is something I originally missed.
Digression: When I was refactoring, I was hoping to be able to change
the kiocb refcounting so that the refcount's just initialized to one,
and the initial refcount is dropped when aio_complete() is called - and
since nothing outside of the aio code messes with the kiocb refcount, it
might be possible to get rid of the kiocb refcount entirely if I can
figure out how to deal with cancellation.
Anyways - that didn't work out, or at least it's going to take more
work. The two bugs that resulted from that were:
- The "aio: kill ki_retry" patch dropped the second kiocb ref that
io_submit_one drops when it returns. But aio_rw_vect_retry() touches
the kiocb after passing it off to f_op->aio_(read|write) which will
call aio_complete(), so this is a use after free.
- The "aio: smoosh struct kiocb" patch assumed that some of the fields
in struct kiocb aren't needed after aio_complete() has been called.
This is _almost_ true, but again aio_rw_vect_retry() looks at those
fields which ends up determining whether aio_run_iocb() calls
aio_complete() itself.
This seems _ridiculously_ sketchy to me, or at least convoluted...
but it'll take awhile to figure out how to clean that up and I'm not
in a great hurry to do so.
So, Andrew - that "smoosh struct kiocb" patch should just be dropped,
even if I fixed that issue clearly the idea is a lot less safe than I
thought. I've got patches for the other stuff I'm going to mail out
momentarily.
Ben, Zach - the cancellation stuff (both the fix and the
other changes) need more review, and we need a saner way of testing
cancellation. Maybe it'd be worth implementing cancellation for regular
block devices just so that we have a way of testing it :/
next prev parent reply other threads:[~2013-01-24 21:20 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-21 13:24 next-20130117 - kernel BUG with aio Valdis Kletnieks
2013-01-22 13:43 ` Hillf Danton
2013-01-22 21:28 ` Valdis.Kletnieks
2013-01-23 12:10 ` Hillf Danton
2013-01-24 17:22 ` Valdis.Kletnieks
2013-01-24 21:18 ` Kent Overstreet [this message]
2013-01-24 21:27 ` Andrew Morton
2013-01-24 21:27 ` Andrew Morton
2013-01-24 21:39 ` Kent Overstreet
2013-01-24 21:39 ` Kent Overstreet
2013-01-24 22:25 ` Zach Brown
2013-01-24 22:25 ` Zach Brown
2013-01-24 22:47 ` Jeff Moyer
2013-01-24 22:47 ` Jeff Moyer
2013-01-24 23:03 ` Kent Overstreet
2013-01-24 22:13 ` Kent Overstreet
2013-01-29 13:41 ` Jan Kara
2013-01-29 13:41 ` Jan Kara
2013-01-24 21:43 ` [PATCH 1/3] aio: Fix a null pointer deref in batch_complete_aio Kent Overstreet
2013-01-24 21:43 ` Kent Overstreet
2013-01-25 13:15 ` Hillf Danton
2013-01-25 13:15 ` Hillf Danton
2013-01-24 21:43 ` [PATCH 2/3] aio-kill-ki_retry-fix-fix Kent Overstreet
2013-01-24 21:43 ` Kent Overstreet
2013-01-24 21:43 ` [PATCH 3/3] aio-use-cancellation-list-lazily-fix Kent Overstreet
2013-01-24 21:43 ` Kent Overstreet
2013-01-25 13:30 ` Hillf Danton
2013-01-25 13:30 ` Hillf Danton
2013-01-25 23:12 ` Andrew Morton
2013-01-25 23:12 ` Andrew Morton
2013-01-28 17:37 ` Kent Overstreet
2013-01-28 17:37 ` Kent Overstreet
2013-01-31 21:59 ` next-20130117 - kernel BUG with aio Andrew Morton
2013-02-01 0:37 ` Kent Overstreet
2013-02-05 15:53 ` Valdis.Kletnieks
2013-02-05 17:20 ` Kent Overstreet
2013-02-05 17:48 ` Valdis.Kletnieks
2013-02-06 17:15 ` Valdis.Kletnieks
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130124211850.GH26407@google.com \
--to=koverstreet@google.com \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@linux-foundation.org \
--cc=bcrl@kvack.org \
--cc=dhillf@gmail.com \
--cc=linux-aio@kvack.org \
--cc=linux-kernel@vger.kernel.org \
--cc=zab@zabbo.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.