public inbox for linux-block@vger.kernel.org
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block <linux-block@vger.kernel.org>,
	Gang Deng <gavin.dg@linux.alibaba.com>
Subject: [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs
Date: Tue, 30 Jan 2018 19:21:43 +0800
Message-ID: <5ac9562b-b20a-9f5a-90c5-2187552b6f09@linux.alibaba.com>

Hi Jens and Folks,

Recently we hit a hard LOCKUP. After investigating it, we found a race
between blk_init_queue_node and blkcg_print_blkgs.
The race is described below.

blk_init_queue_node                 blkcg_print_blkgs
  blk_alloc_queue_node (1)
    q->queue_lock = &q->__queue_lock (2)
    blkcg_init_queue(q) (3)
                                    spin_lock_irq(blkg->q->queue_lock) (4)
  q->queue_lock = lock (5)
                                    spin_unlock_irq(blkg->q->queue_lock) (6)

(1) allocate an uninitialized queue;
(2) initialize queue_lock to its default internal lock;
(3) initialize the blkcg part of the request queue, which creates a blkg
and inserts it into blkg_list;
(4) traverse blkg_list, find the newly created blkg and take its queue
lock, which at this point is the default *internal lock*;
(5) *race window*: queue_lock is now overridden with the *driver-specified
lock*;
(6) unlock the *driver-specified lock* instead of the *internal lock* that
was actually taken, so the lock/unlock pairing is broken.
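To make the interleaving concrete, here is a minimal single-threaded C model
that replays steps (1)-(6) in the order of the diagram above. It is only a
sketch, not kernel code: struct model_lock, struct model_queue and the helper
functions are hypothetical stand-ins, a "lock" is just a held flag, and the
cross-CPU interleaving is serialized by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: only records whether it is held. */
struct model_lock {
	bool held;
	int bad_unlocks;	/* releases of a lock that was not held */
};

/* Hypothetical, heavily reduced request_queue: only the lock fields. */
struct model_queue {
	struct model_lock __queue_lock;	/* default internal lock */
	struct model_lock *queue_lock;	/* lock actually used by callers */
};

void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin here forever */
	l->held = true;
}

void model_lock_release(struct model_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;	/* unbalanced unlock */
	l->held = false;
}

/* Replays steps (1)-(6) in the order shown in the diagram. */
void replay_race(struct model_queue *q, struct model_lock *driver_lock)
{
	/* (1) + (2): queue allocated, queue_lock points at the internal lock */
	q->__queue_lock = (struct model_lock){0};
	q->queue_lock = &q->__queue_lock;

	/* (3) blkcg_init_queue(): the blkg, and thus q, is now reachable */

	/* (4) blkcg_print_blkgs: spin_lock_irq(blkg->q->queue_lock) */
	model_lock_acquire(q->queue_lock);

	/* (5) blk_init_queue_node overrides the lock pointer */
	q->queue_lock = driver_lock;

	/* (6) blkcg_print_blkgs: spin_unlock_irq(blkg->q->queue_lock) */
	model_lock_release(q->queue_lock);
}
```

After replay_race() returns, the internal lock is still marked as held and
the driver lock has recorded one release of a lock that was never taken,
which is the imbalance described in (6).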

For the issue above, I think blkcg_init_queue is called too early. We
must not allow further use of the request queue before it is fully
initialized. Since blk_init_queue_node is a very common path and it
allows drivers to override the default internal lock, I'm afraid several
other places may have the same issue.
Am I missing something here?
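One way to read the suggestion above is "choose the final lock before the
queue becomes reachable by blkcg". The C sketch below models only that
ordering, under the same assumptions as a toy model: the types and
init_queue_fixed()/print_blkgs_model() are hypothetical, the "visible" flag
stands in for the blkg being on blkg_list, and memory ordering between CPUs
is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock: only records whether it is held. */
struct model_lock {
	bool held;
	int bad_unlocks;	/* releases of a lock that was not held */
};

/* Hypothetical, heavily reduced request_queue. */
struct model_queue {
	struct model_lock __queue_lock;	/* default internal lock */
	struct model_lock *queue_lock;	/* lock actually used by callers */
	bool visible;			/* models "blkg is on blkg_list" */
};

void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin here forever */
	l->held = true;
}

void model_lock_release(struct model_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;
	l->held = false;
}

/* Hypothetical reordered init: the lock pointer is final before publish. */
void init_queue_fixed(struct model_queue *q, struct model_lock *driver_lock)
{
	q->__queue_lock = (struct model_lock){0};
	q->queue_lock = driver_lock ? driver_lock : &q->__queue_lock;
	q->visible = false;
	/* ... remaining queue setup ... */
	q->visible = true;	/* only now can the reader find the queue */
}

/* Models blkcg_print_blkgs: it only ever sees published queues. */
bool print_blkgs_model(struct model_queue *q)
{
	if (!q->visible)
		return false;
	model_lock_acquire(q->queue_lock);
	/* ... walk the blkgs under the queue lock ... */
	model_lock_release(q->queue_lock);	/* same lock as acquired */
	return true;
}
```

Because queue_lock no longer changes after the queue is visible, the reader
releases exactly the lock it took. Caching the pointer inside the reader
would also keep lock/unlock balanced, but the reader would then hold the
internal lock rather than the driver lock, so the ordering fix seems the
more natural option.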

Thanks,
Joseph


Thread overview: 4+ messages
2018-01-30 11:21 Joseph Qi [this message]
2018-01-30 21:19 ` [RFC] hard LOCKUP caused by race between blk_init_queue_node and blkcg_print_blkgs Bart Van Assche
2018-01-31  1:53   ` Joseph Qi
2018-01-31 16:39     ` Bart Van Assche
