public inbox for linux-kernel@vger.kernel.org
From: "Jörn Engel" <joern@logfs.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: oren@purestorage.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix race in get_request()
Date: Fri, 8 Aug 2014 13:43:35 -0400	[thread overview]
Message-ID: <20140808174335.GA12487@logfs.org> (raw)
In-Reply-To: <53E4DE9F.7090603@kernel.dk>

On Fri, 8 August 2014 08:28:47 -0600, Jens Axboe wrote:
> On 08/08/2014 08:24 AM, Jens Axboe wrote:
> > On 08/07/2014 06:54 PM, Jörn Engel wrote:
> >>
> >> If __get_request() returns NULL, get_request will call
> >> prepare_to_wait_exclusive() followed by io_schedule().  Not rechecking
> >> the sleep condition after prepare_to_wait_exclusive() leaves a race
> >> where the condition changes before prepare_to_wait_exclusive(), but
> >> not after and accordingly this thread never gets woken up.
> >>
> >> The race must be exceedingly hard to hit, otherwise I cannot explain how
> >> such a classic race could outlive the last millennium.
> > 
> > I think that is a genuine bug, it's just extremely hard to hit in real
> > life. It has probably only potentially ever triggered in the cases where
> > we are so out of memory that a blocking ~300b alloc fails, and Linux
> > generally shits itself pretty hard when it gets to that stage anyway...
> > And for the bug to be critical, you'd need this to happen for a device
> > that otherwise has no IO pending, since you'd get woken up by the next
> > completed request anyway.
> 
> Actually, this can't trigger for an empty queue, since the mempool holds
> a few requests. So it should never result in a softlock, we will make
> progress. Given that we also still hold the queue spinlock (that will be
> held for a free as well), we should not be able to get a free of a
> request until the prepare_to_wait() has been done. So not sure there is
> an actual bug there, but I agree the code looks confusing that way.

The race I was afraid of is not on an empty queue, but where all IO
gets completed between the __get_request() and
prepare_to_wait_exclusive().  That may well be hundreds of requests,
so under normal circumstances they should take far longer than the
race window to complete.  So something has to enlarge the window.

Holding spin_lock_irq removes interrupts and kernel preemption from
the candidate list.  Now we would need hardware support, either some
kind of unfair SMT cpus or a hypervisor that doesn't schedule the
victim cpu's thread for a long time and at just the right time.

Maybe the hypervisor case is at least theoretically possible.  But I
would also be happy with leaving the code as-is and at most adding a
comment for the next person.

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers

Thread overview: 4+ messages
2014-08-08  0:54 [PATCH] Fix race in get_request() Jörn Engel
2014-08-08 14:24 ` Jens Axboe
2014-08-08 14:28   ` Jens Axboe
2014-08-08 17:43     ` Jörn Engel [this message]
