public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix race in get_request()
@ 2014-08-08  0:54 Jörn Engel
  2014-08-08 14:24 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jörn Engel @ 2014-08-08  0:54 UTC (permalink / raw)
  To: Jens Axboe, oren; +Cc: linux-kernel

Hello Jens!

I came across the below while investigating some other problem.
Something here doesn't seem right.  This looks like an obvious bug and
something roughly along the lines of my patch would fix it.  But I
must be in the wrong decade to find such a bug in the block layer.

Is this for real?  Or if not, what am I missing?

Jörn

--

If __get_request() returns NULL, get_request will call
prepare_to_wait_exclusive() followed by io_schedule().  Not rechecking
the sleep condition after prepare_to_wait_exclusive() leaves a race
where the condition changes before prepare_to_wait_exclusive(), but
not after and accordingly this thread never gets woken up.

The race must be exceedingly hard to hit, otherwise I cannot explain how
such a classic race could outlive the last millenium.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 block/blk-core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3275353957f0..00aa6c7abe5a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1068,6 +1068,11 @@ retry:
 
 	trace_block_sleeprq(q, bio, rw_flags & 1);
 
+	rq = __get_request(rl, rw_flags, bio, gfp_mask);
+	if (rq) {
+		finish_wait(&rl->wait[is_sync], &wait);
+		return rq;
+	}
 	spin_unlock_irq(q->queue_lock);
 	io_schedule();
 
-- 
2.0.0.rc0.1.g7b2ba98


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix race in get_request()
  2014-08-08  0:54 [PATCH] Fix race in get_request() Jörn Engel
@ 2014-08-08 14:24 ` Jens Axboe
  2014-08-08 14:28   ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2014-08-08 14:24 UTC (permalink / raw)
  To: Jörn Engel, oren; +Cc: linux-kernel

On 08/07/2014 06:54 PM, Jörn Engel wrote:
> Hello Jens!
> 
> I came across the below while investigating some other problem.
> Something here doesn't seem right.  This looks like an obvious bug and
> something roughly along the lines of my patch would fix it.  But I
> must be in the wrong decade to find such a bug in the block layer.
> 
> Is this for real?  Or if not, what am I missing?
> 
> Jörn
> 
> --
> 
> If __get_request() returns NULL, get_request will call
> prepare_to_wait_exclusive() followed by io_schedule().  Not rechecking
> the sleep condition after prepare_to_wait_exclusive() leaves a race
> where the condition changes before prepare_to_wait_exclusive(), but
> not after and accordingly this thread never gets woken up.
> 
> The race must be exceedingly hard to hit, otherwise I cannot explain how
> such a classic race could outlive the last millenium.

I think that is a genuine bug, it's just extremely hard to hit in real
life. It has probably only potentially ever triggered in the cases where
we are so out of memory that a blocking ~300b alloc fails, and Linux
generally shits itself pretty hard when it gets to that stage anyway...
And for the bug to be critical, you'd need this to happen for a device
that otherwise has no IO pending, since you'd get woken up by the next
completed request anyway.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3275353957f0..00aa6c7abe5a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1068,6 +1068,11 @@ retry:
>  
>  	trace_block_sleeprq(q, bio, rw_flags & 1);
>  
> +	rq = __get_request(rl, rw_flags, bio, gfp_mask);
> +	if (rq) {
> +		finish_wait(&rl->wait[is_sync], &wait);
> +		return rq;
> +	}

The extra __get_request() call should go before the
trace_block_sleeprq(), however. I'll rejuggle that when applying.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix race in get_request()
  2014-08-08 14:24 ` Jens Axboe
@ 2014-08-08 14:28   ` Jens Axboe
  2014-08-08 17:43     ` Jörn Engel
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2014-08-08 14:28 UTC (permalink / raw)
  To: Jörn Engel, oren; +Cc: linux-kernel

On 08/08/2014 08:24 AM, Jens Axboe wrote:
> On 08/07/2014 06:54 PM, Jörn Engel wrote:
>> Hello Jens!
>>
>> I came across the below while investigating some other problem.
>> Something here doesn't seem right.  This looks like an obvious bug and
>> something roughly along the lines of my patch would fix it.  But I
>> must be in the wrong decade to find such a bug in the block layer.
>>
>> Is this for real?  Or if not, what am I missing?
>>
>> Jörn
>>
>> --
>>
>> If __get_request() returns NULL, get_request will call
>> prepare_to_wait_exclusive() followed by io_schedule().  Not rechecking
>> the sleep condition after prepare_to_wait_exclusive() leaves a race
>> where the condition changes before prepare_to_wait_exclusive(), but
>> not after and accordingly this thread never gets woken up.
>>
>> The race must be exceedingly hard to hit, otherwise I cannot explain how
>> such a classic race could outlive the last millenium.
> 
> I think that is a genuine bug, it's just extremely hard to hit in real
> life. It has probably only potentially ever triggered in the cases where
> we are so out of memory that a blocking ~300b alloc fails, and Linux
> generally shits itself pretty hard when it gets to that stage anyway...
> And for the bug to be critical, you'd need this to happen for a device
> that otherwise has no IO pending, since you'd get woken up by the next
> completed request anyway.

Actually, this can't trigger for an empty queue, since the mempool holds
a few requests. So it should never result in a softlock, we will make
progress. Given that we also still hold the queue spinlock (that will be
held for a free as well), we should not be able to get a free of a
request until the prepare_to_wait() has been done. So not sure there is
an actual bug there, but I agree the code looks confusing that way.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix race in get_request()
  2014-08-08 14:28   ` Jens Axboe
@ 2014-08-08 17:43     ` Jörn Engel
  0 siblings, 0 replies; 4+ messages in thread
From: Jörn Engel @ 2014-08-08 17:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: oren, linux-kernel

On Fri, 8 August 2014 08:28:47 -0600, Jens Axboe wrote:
> On 08/08/2014 08:24 AM, Jens Axboe wrote:
> > On 08/07/2014 06:54 PM, Jörn Engel wrote:
> >>
> >> If __get_request() returns NULL, get_request will call
> >> prepare_to_wait_exclusive() followed by io_schedule().  Not rechecking
> >> the sleep condition after prepare_to_wait_exclusive() leaves a race
> >> where the condition changes before prepare_to_wait_exclusive(), but
> >> not after and accordingly this thread never gets woken up.
> >>
> >> The race must be exceedingly hard to hit, otherwise I cannot explain how
> >> such a classic race could outlive the last millenium.
> > 
> > I think that is a genuine bug, it's just extremely hard to hit in real
> > life. It has probably only potentially ever triggered in the cases where
> > we are so out of memory that a blocking ~300b alloc fails, and Linux
> > generally shits itself pretty hard when it gets to that stage anyway...
> > And for the bug to be critical, you'd need this to happen for a device
> > that otherwise has no IO pending, since you'd get woken up by the next
> > completed request anyway.
> 
> Actually, this can't trigger for an empty queue, since the mempool holds
> a few requests. So it should never result in a softlock, we will make
> progress. Given that we also still hold the queue spinlock (that will be
> held for a free as well), we should not be able to get a free of a
> request until the prepare_to_wait() has been done. So not sure there is
> an actual bug there, but I agree the code looks confusing that way.

The race I was afraid of is not on an empty queue, but where all IO
gets completed between the __get_request() and
prepare_to_wait_exclusive().  That may well be hundreds of requests,
so under normal circumstances they should take far longer than the
race window to complete.  So something has to enlarge the window.

With the spin_lock_irq held, that removes interrupts and a preemptible
kernel from the candidate list.  Now we would need hardware support,
either some kind of unfair SMT cpus or a hypervisor that doesn't
schedule the victim cpu's thread for a long time and at just the right
time.

Maybe the hypervisor case is at least theoretically possible.  But I
would also be happy with leaving the code as-is and at most adding a
comment for the next person.

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-08-08 17:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-08  0:54 [PATCH] Fix race in get_request() Jörn Engel
2014-08-08 14:24 ` Jens Axboe
2014-08-08 14:28   ` Jens Axboe
2014-08-08 17:43     ` Jörn Engel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox