From: Ming Lei <ming.lei@redhat.com>
To: James Smart <james.smart@broadcom.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Andrew Jones <drjones@redhat.com>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	"James E . J . Bottomley" <jejb@linux.vnet.ibm.com>,
	stable <stable@vger.kernel.org>,
	"jianchao . wang" <jianchao.w.wang@oracle.com>
Subject: Re: [PATCH V2] SCSI: fix queue cleanup race before queue initialization is done
Date: Sat, 30 Mar 2019 07:22:18 +0800
Message-ID: <20190329232217.GA21943@ming.t460p>
In-Reply-To: <baba41b1-8495-8bd1-0bea-341d207852a7@broadcom.com>

Hello James,

On Fri, Mar 29, 2019 at 01:21:12PM -0700, James Smart wrote:
> 
> 
> On 11/21/2018 5:42 PM, Jens Axboe wrote:
> > On 11/21/18 6:00 PM, Ming Lei wrote:
> > > On Wed, Nov 21, 2018 at 02:47:35PM -0700, Jens Axboe wrote:
> > > > On 11/14/18 8:20 AM, Jens Axboe wrote:
> > > > > On 11/14/18 1:25 AM, Ming Lei wrote:
> > > > > > c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
> > > > > > already fixed this race, however the implied synchronize_rcu()
> > > > > > in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
> > > > > > performance regression.
> > > > > > 
> > > > > > Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
> > > > > > tried to quiesce queue for avoiding unnecessary synchronize_rcu()
> > > > > > only when queue initialization is done, because it is usual to see
> > > > > > lots of inexistent LUNs which need to be probed.
> > > > > > 
> > > > > > However, turns out it isn't safe to quiesce queue only when queue
> > > > > > initialization is done. Because when one SCSI command is completed,
> > > > > > the user of sending command can be waken up immediately, then the
> > > > > > scsi device may be removed, meantime the run queue in scsi_end_request()
> > > > > > is still in-progress, so kernel panic can be caused.
> > > > > > 
> > > > > > In Red Hat QE lab, there are several reports about this kind of kernel
> > > > > > panic triggered during kernel booting.
> > > > > > 
> > > > > > This patch tries to address the issue by grabing one queue usage
> > > > > > counter during freeing one request and the following run queue.
> > > > > Thanks applied, this bug was elusive but ever present in recent
> > > > > testing that we did internally, it's been a huge pain in the butt.
> > > > > The symptoms were usually a crash in blk_mq_get_driver_tag() with
> > > > > hctx->tags == NULL, or a crash inside deadline request insert off
> > > > > requeue.
> 
> All,
> 
> We are seeing errors with the following error:
> 
> [44492.814347] BUG: unable to handle kernel NULL pointer dereference at (null)
> [44492.814383] IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
> ...
> [44492.815634] Call Trace:
> [44492.815652]  [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
> [44492.819755]  [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
> [44492.819777]  [<ffffffff81098cb4>] process_one_work+0x154/0x410
> [44492.819781]  [<ffffffff81099896>] worker_thread+0x116/0x4a0
> [44492.819784]  [<ffffffff8109edb9>] kthread+0xc9/0xe0
> [44492.819790]  [<ffffffff81619b05>] ret_from_fork+0x55/0x80
> [44492.822798] DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
> [44492.822798]
> [44492.822799] Leftover inexact backtrace:
> 
> [44492.822802]  [<ffffffff8109ecf0>] ? kthread_park+0x50/0x50
> [44492.822818] Code: c6 44 0f 46 ce 83 c2 01 45 89 ca 4c 89 54 01 08 48 8b 4f 10 2b 74 01 08 39 57 08 77 d8 f3 c3 90 8b 4f 08 85 c9 74 1f 48 8b 57 10 <48> 83 3a 00 75 18 31 c0 eb 0a 48 83 c2 40 48 83 3a 00 75 0a 83
> [44492.822820] RIP  [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
> [44492.822821]  RSP <ffff8807219ffdd8>
> [44492.822821] CR2: 0000000000000000
> 
> It appears the queue has been freed thus the bitmap is bad.

Could you provide some background on this report, such as the
device/driver, the reproduction steps, and the kernel release?

> 
> Looking at the commit relative to this email thread:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_lib.c?id=8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a
> 
> It's interesting that the queue reference taken was released after the
> kblockd_schedule_work() call was made, and it's this work element that is
> hitting the issue. So perhaps the patch missed keeping the reference until
> the requeue_work item finished ?

blk-mq's requeue_work is supposed to be drained before the queue is freed
(see blk_sync_queue()), and SCSI's requeue_work should have been drained too.

The following change might make a difference for this issue, but it does
not look sufficient, given that SCSI's requeue work may still be scheduled
between cancel_work_sync() and blk_cleanup_queue(). I will take a closer
look at it this weekend.

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 6a9040faed00..94882f65ccf1 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1397,8 +1397,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	scsi_device_set_state(sdev, SDEV_DEL);
 	mutex_unlock(&sdev->state_mutex);
 
-	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
+	blk_cleanup_queue(sdev->request_queue);
 
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
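
To make the remaining window concrete, here is a minimal userspace sketch
of the ordering problem. It is not kernel code: every model_* name is made
up for illustration, the "queue" is just a flag, and it assumes the worst
case in which the worker runs at the earliest point where it can still see
pending work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */
struct model_sdev {
	bool queue_freed;	/* set once the modelled queue is cleaned up */
	bool requeue_pending;	/* models sdev->requeue_work being queued */
};

/* Models a late completion path scheduling the requeue work. */
static void model_schedule_requeue(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->requeue_pending = true;
}

/* Models cancel_work_sync(): no work is pending once it returns. */
static void model_cancel_work_sync(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->requeue_pending = false;
}

/* Models blk_cleanup_queue(): queue resources are gone afterwards. */
static void model_cleanup_queue(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->queue_freed = true;
}

/* Models the worker; returns true if it ran against a freed queue. */
static bool model_run_pending_work(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	if (!sdev->requeue_pending)
		return false;
	sdev->requeue_pending = false;
	return sdev->queue_freed;
}

/*
 * cancel_first: true models the order in the diff above,
 *               false models the order before the change.
 * late_requeue: a requeue is scheduled between the two teardown steps.
 * Returns true when the worker ends up touching a freed queue.
 */
static bool model_teardown_hits_freed_queue(bool cancel_first, bool late_requeue)
{
	struct model_sdev sdev = { .queue_freed = false, .requeue_pending = false };
	bool hit;

	/* Assume work is already queued when device removal starts. */
	model_schedule_requeue(&sdev);

	if (cancel_first) {
		model_cancel_work_sync(&sdev);
		if (late_requeue)
			model_schedule_requeue(&sdev);
		model_cleanup_queue(&sdev);
		return model_run_pending_work(&sdev);
	}

	model_cleanup_queue(&sdev);
	if (late_requeue)
		model_schedule_requeue(&sdev);
	hit = model_run_pending_work(&sdev);	/* worker runs before the cancel */
	model_cancel_work_sync(&sdev);
	return hit;
}
```

In this model the old order can always hit the freed queue, the new order
is safe when nothing is requeued late, and the new order with a late
requeue still hits it, which is why reordering alone does not look like a
complete fix.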

Thanks,
Ming
