* [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
@ 2011-03-04 9:46 Lukas Czerner
2011-03-04 14:15 ` Jeff Moyer
0 siblings, 1 reply; 6+ messages in thread
From: Lukas Czerner @ 2011-03-04 9:46 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-fsdevel, lczerner, jmoyer
BZ29402
https://bugzilla.kernel.org/show_bug.cgi?id=29402
We can hit serious mis-synchronization in bio completion path of
blkdev_issue_zeroout() leading to a panic.
The problem is that when we are going to wait_for_completion() in
blkdev_issue_zeroout() we check if the bb.done equals issued (number of
submitted bios). If it does, we can skip the wait_for_completition()
and just out of the function since there is nothing to wait for.
However, there is a ordering problem because bio_batch_end_io() is
calling atomic_inc(&bb->done) before complete(), hence it might seem to
blkdev_issue_zeroout() that all bios has been completed and exit. At
this point when bio_batch_end_io() is going to call complete(bb->wait),
bb and wait does not longer exist since it was allocated on stack in
blkdev_issue_zeroout() ==> panic!
(thread 1) (thread 2)
bio_batch_end_io() blkdev_issue_zeroout()
if(bb) { ...
if (bb->end_io) ...
bb->end_io(bio, err); ...
atomic_inc(&bb->done); ...
... while (issued != atomic_read(&bb.done))
... (let issued == bb.done)
... (do the rest of the function)
... return ret;
complete(bb->wait);
^^^^^^^^
panic
We can fix this easily by simplifying bio_batch and completion counting.
We can count completion locally in blkdev_issue_zeroout() without need of
locking or atomic operation because we are the only one handling issued
variable holding the number of submitted bios. So remove atomic_t done
from struct bio_batch.
Also remove bio_end_io_t *end_io since it is not used.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Eric Whitney <eric.whitney@hp.com>
Tested-by: Eric Whitney <eric.whitney@hp.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
---
block/blk-lib.c | 12 ++----------
1 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1a320d2..dcee6e2 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -106,10 +106,8 @@ EXPORT_SYMBOL(blkdev_issue_discard);
struct bio_batch
{
- atomic_t done;
unsigned long flags;
struct completion *wait;
- bio_end_io_t *end_io;
};
static void bio_batch_end_io(struct bio *bio, int err)
@@ -122,12 +120,8 @@ static void bio_batch_end_io(struct bio *bio, int err)
else
clear_bit(BIO_UPTODATE, &bb->flags);
}
- if (bb) {
- if (bb->end_io)
- bb->end_io(bio, err);
- atomic_inc(&bb->done);
+ if (bb)
complete(bb->wait);
- }
bio_put(bio);
}
@@ -153,10 +147,8 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
unsigned int sz, issued = 0;
DECLARE_COMPLETION_ONSTACK(wait);
- atomic_set(&bb.done, 0);
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;
- bb.end_io = NULL;
submit:
ret = 0;
@@ -190,7 +182,7 @@ submit:
}
/* Wait for bios in-flight */
- while (issued != atomic_read(&bb.done))
+ while (issued--)
wait_for_completion(&wait);
if (!test_bit(BIO_UPTODATE, &bb.flags))
--
1.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
2011-03-04 9:46 [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout() Lukas Czerner
@ 2011-03-04 14:15 ` Jeff Moyer
2011-03-04 15:04 ` Lukas Czerner
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2011-03-04 14:15 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-kernel, linux-fsdevel
Lukas Czerner <lczerner@redhat.com> writes:
> BZ29402
> https://bugzilla.kernel.org/show_bug.cgi?id=29402
>
> We can hit serious mis-synchronization in bio completion path of
> blkdev_issue_zeroout() leading to a panic.
>
> The problem is that when we are going to wait_for_completion() in
> blkdev_issue_zeroout() we check if the bb.done equals issued (number of
> submitted bios). If it does, we can skip the wait_for_completition()
> and just out of the function since there is nothing to wait for.
> However, there is a ordering problem because bio_batch_end_io() is
> calling atomic_inc(&bb->done) before complete(), hence it might seem to
> blkdev_issue_zeroout() that all bios has been completed and exit. At
> this point when bio_batch_end_io() is going to call complete(bb->wait),
> bb and wait does not longer exist since it was allocated on stack in
> blkdev_issue_zeroout() ==> panic!
>
> (thread 1) (thread 2)
> bio_batch_end_io() blkdev_issue_zeroout()
> if(bb) { ...
> if (bb->end_io) ...
> bb->end_io(bio, err); ...
> atomic_inc(&bb->done); ...
> ... while (issued != atomic_read(&bb.done))
> ... (let issued == bb.done)
> ... (do the rest of the function)
> ... return ret;
> complete(bb->wait);
> ^^^^^^^^
> panic
That's a pretty tight window. The complete is immediately following the
increment. I'm surprised thread 2 has time to finish up and exit the
function before the completion is done.
> We can fix this easily by simplifying bio_batch and completion counting.
> We can count completion locally in blkdev_issue_zeroout() without need of
> locking or atomic operation because we are the only one handling issued
> variable holding the number of submitted bios. So remove atomic_t done
> from struct bio_batch.
It seems to me like it might be better to just not complete anything
until the count is zero. Why issue a wakeup for every bio?
fs/direct-io does something similar, maybe take a look at the
dio_bio_end* routines and see if that would fit well here. With your
scheme, I worry about missing a completion, maybe because the first bio
completes before you are done submitting bios. Is that possible?
> Also remove bio_end_io_t *end_io since it is not used.
Yeah, no idea why that was in there.
Cheers,
Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
2011-03-04 14:15 ` Jeff Moyer
@ 2011-03-04 15:04 ` Lukas Czerner
2011-03-04 15:15 ` Jeff Moyer
0 siblings, 1 reply; 6+ messages in thread
From: Lukas Czerner @ 2011-03-04 15:04 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Lukas Czerner, linux-kernel, linux-fsdevel
On Fri, 4 Mar 2011, Jeff Moyer wrote:
> Lukas Czerner <lczerner@redhat.com> writes:
>
> > BZ29402
> > https://bugzilla.kernel.org/show_bug.cgi?id=29402
> >
> > We can hit serious mis-synchronization in bio completion path of
> > blkdev_issue_zeroout() leading to a panic.
> >
> > The problem is that when we are going to wait_for_completion() in
> > blkdev_issue_zeroout() we check if the bb.done equals issued (number of
> > submitted bios). If it does, we can skip the wait_for_completition()
> > and just out of the function since there is nothing to wait for.
> > However, there is a ordering problem because bio_batch_end_io() is
> > calling atomic_inc(&bb->done) before complete(), hence it might seem to
> > blkdev_issue_zeroout() that all bios has been completed and exit. At
> > this point when bio_batch_end_io() is going to call complete(bb->wait),
> > bb and wait does not longer exist since it was allocated on stack in
> > blkdev_issue_zeroout() ==> panic!
> >
> > (thread 1) (thread 2)
> > bio_batch_end_io() blkdev_issue_zeroout()
> > if(bb) { ...
> > if (bb->end_io) ...
> > bb->end_io(bio, err); ...
> > atomic_inc(&bb->done); ...
> > ... while (issued != atomic_read(&bb.done))
> > ... (let issued == bb.done)
> > ... (do the rest of the function)
> > ... return ret;
> > complete(bb->wait);
> > ^^^^^^^^
> > panic
>
> That's a pretty tight window. The complete is immediately following the
> increment. I'm surprised thread 2 has time to finish up and exit the
> function before the completion is done.
Yeah, maybe that is why it is so hard to hit. Eric was able to hit it
only on 48 core box running ffsb test with 192 threads. Also the panic
backtrace bug "bad spinlock magic" pointed me to this.
>
> > We can fix this easily by simplifying bio_batch and completion counting.
> > We can count completion locally in blkdev_issue_zeroout() without need of
> > locking or atomic operation because we are the only one handling issued
> > variable holding the number of submitted bios. So remove atomic_t done
> > from struct bio_batch.
>
> It seems to me like it might be better to just not complete anything
> until the count is zero. Why issue a wakeup for every bio?
> fs/direct-io does something similar, maybe take a look at the
> dio_bio_end* routines and see if that would fit well here. With your
> scheme, I worry about missing a completion, maybe because the first bio
> completes before you are done submitting bios. Is that possible?
I do not think it is possible. For every bio submitted there is
wait_for_completion called. When bio complete()s completion->done is
incremented (under the wait->lock). In wait_for_completion() we are
waiting for single submitted bio to complete (completion->done > 0),
then completion->done is decremented. It seems like simple
synchronization.
I am not sure what wakeup you have in mind, but thanks for the tip I'll
look in fs/direct-io.
Thanks!
-Lukas
>
> > Also remove bio_end_io_t *end_io since it is not used.
>
> Yeah, no idea why that was in there.
>
> Cheers,
> Jeff
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
2011-03-04 15:04 ` Lukas Czerner
@ 2011-03-04 15:15 ` Jeff Moyer
2011-03-07 12:25 ` Lukas Czerner
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Moyer @ 2011-03-04 15:15 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-kernel, linux-fsdevel
Lukas Czerner <lczerner@redhat.com> writes:
> On Fri, 4 Mar 2011, Jeff Moyer wrote:
>> It seems to me like it might be better to just not complete anything
>> until the count is zero. Why issue a wakeup for every bio?
>> fs/direct-io does something similar, maybe take a look at the
>> dio_bio_end* routines and see if that would fit well here. With your
>> scheme, I worry about missing a completion, maybe because the first bio
>> completes before you are done submitting bios. Is that possible?
>
> I do not think it is possible. For every bio submitted there is
> wait_for_completion called. When bio complete()s completion->done is
> incremented (under the wait->lock). In wait_for_completion() we are
> waiting for single submitted bio to complete (completion->done > 0),
> then completion->done is decremented. It seems like simple
> synchronization.
>
> I am not sure what wakeup you have in mind, but thanks for the tip I'll
> look in fs/direct-io.
Let's say you have several bios to submit, and the first bio is errored
immediately in submit_bio. Since you didn't add yourself to the
waitqueue yet, you might miss the wakeup and sleep forever.
Cheers,
Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
2011-03-04 15:15 ` Jeff Moyer
@ 2011-03-07 12:25 ` Lukas Czerner
2011-03-07 14:38 ` Jeff Moyer
0 siblings, 1 reply; 6+ messages in thread
From: Lukas Czerner @ 2011-03-07 12:25 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Lukas Czerner, linux-kernel, linux-fsdevel, dmonakhov, axboe
On Fri, 4 Mar 2011, Jeff Moyer wrote:
> Lukas Czerner <lczerner@redhat.com> writes:
>
> > On Fri, 4 Mar 2011, Jeff Moyer wrote:
> >> It seems to me like it might be better to just not complete anything
> >> until the count is zero. Why issue a wakeup for every bio?
> >> fs/direct-io does something similar, maybe take a look at the
> >> dio_bio_end* routines and see if that would fit well here. With your
> >> scheme, I worry about missing a completion, maybe because the first bio
> >> completes before you are done submitting bios. Is that possible?
> >
> > I do not think it is possible. For every bio submitted there is
> > wait_for_completion called. When bio complete()s completion->done is
> > incremented (under the wait->lock). In wait_for_completion() we are
> > waiting for single submitted bio to complete (completion->done > 0),
> > then completion->done is decremented. It seems like simple
> > synchronization.
> >
> > I am not sure what wakeup you have in mind, but thanks for the tip I'll
> > look in fs/direct-io.
>
> Let's say you have several bios to submit, and the first bio is errored
> immediately in submit_bio. Since you didn't add yourself to the
> waitqueue yet, you might miss the wakeup and sleep forever.
>
> Cheers,
> Jeff
>
(Adding Dimitry and Jens into CC)
This can not happen. submit_bio does not return any value. The way how
it does notify caller about its status is via ->bio_end_io (see comment
for __generic_make_request()). Now the ->bio_end_io is in this case
always set because it is the first thing we are doing, so for every bio
submitted there will be appropriate complete() and wait_for_completition()
call.
The one thing can fail though, and it is bio_alloc() however when this
fails we are jumping out of the loop immediately without touching
"issued" at all, so if it is a first bio issued = 0, hence there is
nothing to wait for.
I do not see any problems here.
But, now I can see you point about calling wakeup() for every completed
bio, which is not strictly needed and we should call complete() only
once when the last bio is completed. So how about this ?
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 1a320d2..ccf5a40 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -109,7 +109,6 @@ struct bio_batch
atomic_t done;
unsigned long flags;
struct completion *wait;
- bio_end_io_t *end_io;
};
static void bio_batch_end_io(struct bio *bio, int err)
@@ -122,12 +121,9 @@ static void bio_batch_end_io(struct bio *bio, int err)
else
clear_bit(BIO_UPTODATE, &bb->flags);
}
- if (bb) {
- if (bb->end_io)
- bb->end_io(bio, err);
- atomic_inc(&bb->done);
- complete(bb->wait);
- }
+ if (bb)
+ if (atomic_dec_and_test(&bb->done))
+ complete(bb->wait);
bio_put(bio);
}
@@ -150,13 +146,12 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int ret;
struct bio *bio;
struct bio_batch bb;
- unsigned int sz, issued = 0;
+ unsigned int sz;
DECLARE_COMPLETION_ONSTACK(wait);
- atomic_set(&bb.done, 0);
+ atomic_set(&bb.done, 1);
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;
- bb.end_io = NULL;
submit:
ret = 0;
@@ -185,12 +180,12 @@ submit:
break;
}
ret = 0;
- issued++;
+ atomic_inc(&bb.done);
submit_bio(WRITE, bio);
}
/* Wait for bios in-flight */
- while (issued != atomic_read(&bb.done))
+ if (!atomic_dec_and_test(&bb.done))
wait_for_completion(&wait);
if (!test_bit(BIO_UPTODATE, &bb.flags))
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout()
2011-03-07 12:25 ` Lukas Czerner
@ 2011-03-07 14:38 ` Jeff Moyer
0 siblings, 0 replies; 6+ messages in thread
From: Jeff Moyer @ 2011-03-07 14:38 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-kernel, linux-fsdevel, dmonakhov, axboe
Lukas Czerner <lczerner@redhat.com> writes:
> On Fri, 4 Mar 2011, Jeff Moyer wrote:
>
>> Lukas Czerner <lczerner@redhat.com> writes:
>>
>> > On Fri, 4 Mar 2011, Jeff Moyer wrote:
>> >> It seems to me like it might be better to just not complete anything
>> >> until the count is zero. Why issue a wakeup for every bio?
>> >> fs/direct-io does something similar, maybe take a look at the
>> >> dio_bio_end* routines and see if that would fit well here. With your
>> >> scheme, I worry about missing a completion, maybe because the first bio
>> >> completes before you are done submitting bios. Is that possible?
>> >
>> > I do not think it is possible. For every bio submitted there is
>> > wait_for_completion called. When bio complete()s completion->done is
>> > incremented (under the wait->lock). In wait_for_completion() we are
>> > waiting for single submitted bio to complete (completion->done > 0),
>> > then completion->done is decremented. It seems like simple
>> > synchronization.
>> >
>> > I am not sure what wakeup you have in mind, but thanks for the tip I'll
>> > look in fs/direct-io.
>>
>> Let's say you have several bios to submit, and the first bio is errored
>> immediately in submit_bio. Since you didn't add yourself to the
>> waitqueue yet, you might miss the wakeup and sleep forever.
>>
>> Cheers,
>> Jeff
>>
>
> (Adding Dimitry and Jens into CC)
>
> This can not happen. submit_bio does not return any value. The way how
> it does notify caller about its status is via ->bio_end_io (see comment
> for __generic_make_request()). Now the ->bio_end_io is in this case
> always set because it is the first thing we are doing, so for every bio
> submitted there will be appropriate complete() and wait_for_completition()
> call.
>
> The one thing can fail though, and it is bio_alloc() however when this
> fails we are jumping out of the loop immediately without touching
> "issued" at all, so if it is a first bio issued = 0, hence there is
> nothing to wait for.
>
> I do not see any problems here.
Yeah, somehow I actually missed the whole point of completions (the done
variable). Anyway, I agree with you.
> But, now I can see you point about calling wakeup() for every completed
> bio, which is not strictly needed and we should call complete() only
> once when the last bio is completed. So how about this ?
>
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 1a320d2..ccf5a40 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -109,7 +109,6 @@ struct bio_batch
> atomic_t done;
> unsigned long flags;
> struct completion *wait;
> - bio_end_io_t *end_io;
> };
>
> static void bio_batch_end_io(struct bio *bio, int err)
> @@ -122,12 +121,9 @@ static void bio_batch_end_io(struct bio *bio, int err)
> else
> clear_bit(BIO_UPTODATE, &bb->flags);
> }
> - if (bb) {
> - if (bb->end_io)
> - bb->end_io(bio, err);
> - atomic_inc(&bb->done);
> - complete(bb->wait);
> - }
> + if (bb)
> + if (atomic_dec_and_test(&bb->done))
> + complete(bb->wait);
I don't think we actually have to check for bb != NULL, do you?
> bio_put(bio);
> }
>
> @@ -150,13 +146,12 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> int ret;
> struct bio *bio;
> struct bio_batch bb;
> - unsigned int sz, issued = 0;
> + unsigned int sz;
> DECLARE_COMPLETION_ONSTACK(wait);
>
> - atomic_set(&bb.done, 0);
> + atomic_set(&bb.done, 1);
> bb.flags = 1 << BIO_UPTODATE;
> bb.wait = &wait;
> - bb.end_io = NULL;
>
> submit:
> ret = 0;
> @@ -185,12 +180,12 @@ submit:
> break;
> }
> ret = 0;
> - issued++;
> + atomic_inc(&bb.done);
> submit_bio(WRITE, bio);
> }
>
> /* Wait for bios in-flight */
> - while (issued != atomic_read(&bb.done))
> + if (!atomic_dec_and_test(&bb.done))
> wait_for_completion(&wait);
>
> if (!test_bit(BIO_UPTODATE, &bb.flags))
Yep, this looks good to me. Thanks for fixing this up, Lukas!
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-03-07 14:38 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-04 9:46 [PATCH] block: fix mis-synchronisation in blkdev_issue_zeroout() Lukas Czerner
2011-03-04 14:15 ` Jeff Moyer
2011-03-04 15:04 ` Lukas Czerner
2011-03-04 15:15 ` Jeff Moyer
2011-03-07 12:25 ` Lukas Czerner
2011-03-07 14:38 ` Jeff Moyer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).