From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: Broken dio refcounting leads to livelock?
Date: Mon, 7 Jan 2019 17:16:05 -0800
Message-ID: <20190108011605.GM12689@magnolia>
In-Reply-To: <20190108002633.GL12689@magnolia>
On Mon, Jan 07, 2019 at 04:26:33PM -0800, Darrick J. Wong wrote:
> Hi Christoph,
>
> I have a question about refcounting in struct iomap_dio:
>
> As I mentioned late last year, I've been seeing a livelock since the
> early 4.20-rcX era in iomap_dio_rw when running generic/323. The
> relevant part of the function starts around line 1910:
>
> 	blk_finish_plug(&plug);
>
> 	if (ret < 0)
> 		iomap_dio_set_error(dio, ret);
>
> 	/*
> 	 * If all the writes we issued were FUA, we don't need to flush the
> 	 * cache on IO completion. Clear the sync flag for this case.
> 	 */
> 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
> 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
>
> 	if (!atomic_dec_and_test(&dio->ref)) {
> 		if (!dio->wait_for_completion)
> 			return -EIOCBQUEUED;
>
> 		for (;;) {
> 			set_current_state(TASK_UNINTERRUPTIBLE);
> 			if (!READ_ONCE(dio->submit.waiter))
> 				break;
>
> 			if (!(iocb->ki_flags & IOCB_HIPRI) ||
> 			    !dio->submit.last_queue ||
> 			    !blk_poll(dio->submit.last_queue,
> 					 dio->submit.cookie, true))
> 				io_schedule();	<--------- here
> 		}
> 		__set_current_state(TASK_RUNNING);
> 	}
>
> 	ret = iomap_dio_complete(dio);
>
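(Refcounting context, quoting the 4.20 code from memory: iomap_dio_rw
starts with

	atomic_set(&dio->ref, 1);	/* the submitter's ref */

for the submitting thread, iomap_dio_submit_bio does an
atomic_inc(&dio->ref) for each bio it sends down, and
iomap_dio_bio_end_io drops one ref per completed bio.  So the
atomic_dec_and_test in the snippet above is the submitter dropping its
own ref.)
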
> I finally had time to look into this, and I noticed garbage values for
> dio->submit.waiter in *dio right before we call io_schedule. I suspect
> this is a use-after-free, so I altered iomap_dio_complete to memset the
> entire structure to 0x5C just prior to kfreeing the *dio, and sure
> enough I saw 0x5C in all the dio fields right before the io_schedule
> call.
>
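(For the record, the poisoning hack was nothing fancy; roughly this at
the tail of iomap_dio_complete, sketched from memory rather than the
exact debugging hunk:

	/* debugging only: make reads through a freed dio show up as
	 * obvious 0x5C garbage instead of plausible-looking stale data */
	memset(dio, 0x5C, sizeof(*dio));
	kfree(dio);

	return ret;
)
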
> Next I instrumented all the places where we access dio->ref to see
> what's going on, and observed the following sequence:
>
> 1. ref == 2 just before the blk_finish_plug call.
>    dio->wait_for_completion == false.
> 2. ref == 2 just before the "if (!atomic_dec_and_test(...))"
> 3. ref == 1 just after the "if (!atomic_dec_and_test(...))"
>
> Next we apparently end up in iomap_dio_bio_end_io:
>
> 4. ref == 1 just before the "if (atomic_dec_and_test(...))"
> 5. ref == 0 just after the "if (atomic_dec_and_test(...))"
> 6. iomap_dio_bio_end_io calls iomap_dio_complete (directly or via
>    iomap_dio_complete_work), which poisons the dio with 0x5C as
>    described above and then frees it.
>
> Then we jump back to iomap_dio_rw, and:
>
> 7. ref == 0x5c5c5c5c just before the "if (!(iocb->ki_flags & IOCB_HIPRI)...)".
>    dio->wait_for_completion also reads back as 0x5c5c5c5c, i.e. true,
>    so instead of returning -EIOCBQUEUED we fall into the wait loop; and
>    since dio->submit.waiter is poisoned too, it never reads NULL, so we
>    call io_schedule and never wake up.
>
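(Putting steps 1-7 together, the interleaving looks like this:

    iomap_dio_rw                         iomap_dio_bio_end_io
    ------------                         --------------------
    atomic_dec_and_test: ref 2 -> 1
                                         atomic_dec_and_test: ref 1 -> 0
                                         iomap_dio_complete
                                         memset(dio, 0x5C, ...); kfree(dio)
    read dio->wait_for_completion        <-- 0x5c..., looks true
    read dio->submit.waiter              <-- never NULL, sleep forever

i.e. nothing stops the end_io side from freeing the dio while the
submitting thread is still looking at it.)
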
> KASAN confirms this diagnosis. I noticed that only ~100us elapse
> between the unplug and the bio endio function being called; I've found
> that I can reproduce this in a qemu vm with a virtio-scsi device backed
> by a tmpfs file on the host. My guess is that with a slower scsi device
> the dispatch would take long enough that iomap_dio_rw would have
> returned -EIOCBQUEUED long before the bio end_io function gets called
> and frees the dio?
>
> Anyhow, I'm not sure how to fix this. It's clear that iomap_dio_rw
> can't drop its ref to the dio until it's done using the dio fields.
> It's tempting to change the if (!atomic_dec_and_test()) to an if
> (atomic_read() > 1) and only drop the ref just before returning
> -EIOCBQUEUED or just before calling iomap_dio_complete... but that just
> pushes the livelock to kill_ioctx.
No it doesn't; silly me just had debugging crapola all over iomap.c.

> Brain scrambled, kicking to the list to see if anyone has something
> smarter to say. ;)
I have a gross brainscrambled patch that makes the regression go away.
The idea is that the submitting thread only drops its ref once it's
completely done touching the dio (either just before returning
-EIOCBQUEUED, or just before calling iomap_dio_complete), and the end_io
side wakes a sleeping submitter while the submitter's ref still holds
the dio alive (dio_ref == 1), so the dio can't be freed out from under
it.  No idea what other subtle breakage this might cause...

--D
---
fs/iomap.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index a3088fae567b..cdea2a83ef78 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1535,16 +1535,18 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	int dio_ref = atomic_dec_return(&dio->ref);
 
 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
+	if (dio_ref == 1 && dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
+
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+	} else if (dio_ref == 0) {
+		if (dio->flags & IOMAP_DIO_WRITE) {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
@@ -1916,9 +1918,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
-	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!dio->wait_for_completion)
+	if (atomic_read(&dio->ref) > 1) {
+		if (!dio->wait_for_completion) {
+			if (atomic_dec_and_test(&dio->ref))
+				return iomap_dio_complete(dio);
 			return -EIOCBQUEUED;
+		}
 
 		for (;;) {
 			set_current_state(TASK_UNINTERRUPTIBLE);
@@ -1934,6 +1939,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
+	atomic_dec(&dio->ref);
 	ret = iomap_dio_complete(dio);
 
 	return ret;