All of lore.kernel.org
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: David Jeffery <djeffery@redhat.com>
Cc: linux-block@vger.kernel.org, John Pittman <jpittman@redhat.com>
Subject: Re: [PATCH] sbitmap: only queue kyber's wait callback if not already active
Date: Thu, 19 Dec 2019 13:41:42 -0800	[thread overview]
Message-ID: <20191219214142.GB826140@vader> (raw)
In-Reply-To: <20191217160024.GA23066@redhat>

On Tue, Dec 17, 2019 at 11:00:24AM -0500, David Jeffery wrote:
> Under heavy loads where the kyber I/O scheduler hits the token limits for
> its scheduling domains, kyber can become stuck.  When active requests
> complete, kyber may not be woken up leaving the I/O requests in kyber
> stuck.
> 
> This stuck state is due to a race condition with kyber and the sbitmap
> functions it uses to run a callback when enough requests have completed.
> The running of a sbt_wait callback can race with the attempt to insert the
> sbt_wait.  Since sbitmap_del_wait_queue removes the sbt_wait from the list
> first then sets the sbq field to NULL, kyber can see the item as not on a
> list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
> results in the sbt_wait being inserted onto the wait list but ws_active
> doesn't get incremented.  So the sbitmap queue does not know there is a
> waiter on a wait list.
> 
> Since sbitmap doesn't think there is a waiter, kyber may never be
> informed that there are domain tokens available and the I/O never advances.
> With the sbt_wait on a wait list, kyber believes it has an active waiter
> so cannot insert a new waiter when reaching the domain's full state.
> 
> This race can be fixed by only adding the sbt_wait to the queue if the
> sbq field is NULL.  If sbq is not NULL, there is already an action active
> which will trigger the re-running of kyber.  Let it run and add the
> sbt_wait to the wait list if still needing to wait.

So the race here is:

Thread 1                        Thread 2
kyber_domain_wake
  sbitmap_del_wait_queue
    list_del_init
    atomic_dec sbq->ws_active
                                kyber_get_domain_token
                                  list_empty_careful
                                  sbitmap_add_wait_queue
                                    if (!sqb_wait->sb) // false
                                    add_wait_queue
    sbq_wait->sbq = NULL

Now sbq_wait->sbq == NULL, sbq->ws_active = 0, and
!list_empty(domain_wait), so sbq_wake_ptr returns NULL and
sbitmap_queue_wake_up does nothing.

I get the feeling that sbitmap_{add,del}_wait_queue need some memory
barriers... But ignoring that, this fix seems right.

Reviewed-by: Omar Sandoval <osandov@fb.com>

P.S. s/sbt_wait/sbq_wait/g in the commit message.

      parent reply	other threads:[~2019-12-19 21:41 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-17 16:00 [PATCH] sbitmap: only queue kyber's wait callback if not already active David Jeffery
2019-12-18  1:14 ` Jens Axboe
2019-12-19 21:41 ` Omar Sandoval [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191219214142.GB826140@vader \
    --to=osandov@osandov.com \
    --cc=djeffery@redhat.com \
    --cc=jpittman@redhat.com \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.