stable.vger.kernel.org archive mirror
From: Keith Busch <keith.busch@intel.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Zhang Yi <yizhan@redhat.com>,
	linux-block@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 1/2] nvme: fix race between removing and reseting failure
Date: Thu, 18 May 2017 10:13:07 -0400
Message-ID: <20170518141307.GD28520@localhost.localdomain>
In-Reply-To: <20170517012729.13469-2-ming.lei@redhat.com>

On Wed, May 17, 2017 at 09:27:28AM +0800, Ming Lei wrote:
> When an NVMe PCI device is being reset and the reset fails,
> nvme_remove_dead_ctrl() is called to handle the failure: blk-mq hw queues
> are stopped first, then .remove_work is scheduled to release the driver.
> 
> Unfortunately if the driver is being released via sysfs store
> just before the .remove_work is run, del_gendisk() from
> nvme_remove() may hang forever because hw queues are stopped and
> the submitted writeback IOs from fsync_bdev() can't be completed at all.
> 
> This patch fixes the following issue[1][2] by moving nvme_kill_queues()
> into nvme_remove_dead_ctrl(). This avoids the hang because nvme_remove()
> flushes .reset_work, and it is reasonable and safe because
> nvme_dev_disable() has already started suspending queues and canceled
> requests.

I'm still not sure that moving where we kill the queues is the correct way
to fix this problem. nvme_kill_queues() already restarts all the hardware
queues to force all IO to fail, so why is this really stuck?
We should be able to make forward progress even if we kill the queues
while calling into del_gendisk(), right? That could happen with a different
sequence of events, so that also needs to work.
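
As a rough illustration of that expectation, here is a toy user-space model
of a stopped queue that is later killed. It is only a sketch: every name in
it (toy_queue, toy_run_queue, toy_kill_queue, toy_pending) is hypothetical
and none of it is the blk-mq or NVMe driver code. It assumes that "kill"
means marking the queue as dying and restarting dispatch, so that anything
still pending is failed instead of left waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the kernel's blk-mq/NVMe API. */
#define TOY_MAX_REQS 8

enum toy_status { TOY_PENDING, TOY_DONE, TOY_FAILED };

struct toy_queue {
	bool stopped;	/* dispatch is paused, e.g. after a failed reset */
	bool dying;	/* controller is gone; dispatched IO must fail */
	enum toy_status reqs[TOY_MAX_REQS];
	size_t nr_reqs;
};

/* Dispatch pending requests; returns how many were completed or failed. */
static size_t toy_run_queue(struct toy_queue *q)
{
	size_t progressed = 0;

	if (q->stopped)
		return 0;	/* a stopped queue makes no progress at all */

	for (size_t i = 0; i < q->nr_reqs; i++) {
		if (q->reqs[i] != TOY_PENDING)
			continue;
		q->reqs[i] = q->dying ? TOY_FAILED : TOY_DONE;
		progressed++;
	}
	return progressed;
}

/* Assumed "kill" step: mark dying, restart dispatch, fail what is pending. */
static void toy_kill_queue(struct toy_queue *q)
{
	q->dying = true;
	q->stopped = false;
	toy_run_queue(q);
}

/* Number of requests a flush-style waiter would still be blocked on. */
static size_t toy_pending(const struct toy_queue *q)
{
	size_t n = 0;

	assert(q->nr_reqs <= TOY_MAX_REQS);
	for (size_t i = 0; i < q->nr_reqs; i++)
		if (q->reqs[i] == TOY_PENDING)
			n++;
	return n;
}
```

In this model a waiter that polls toy_pending() is stuck while the queue is
stopped, and is released as soon as toy_kill_queue() runs, regardless of
whether the kill happens before or after the waiter starts. If the real
driver still hangs, the model suggests looking for a path where IO is
submitted after the kill and never dispatched.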


Thread overview: 17+ messages
     [not found] <20170517012729.13469-1-ming.lei@redhat.com>
2017-05-17  1:27 ` [PATCH 1/2] nvme: fix race between removing and reseting failure Ming Lei
2017-05-17  6:38   ` Johannes Thumshirn
2017-05-17  7:01     ` Ming Lei
2017-05-18 13:47   ` Christoph Hellwig
2017-05-18 15:04     ` Ming Lei
2017-05-18 14:13   ` Keith Busch [this message]
2017-05-19 12:52     ` Ming Lei
2017-05-19 15:15       ` Keith Busch
2017-05-19 14:41   ` Jens Axboe
2017-05-19 15:10     ` Ming Lei
2017-05-19 16:40     ` Ming Lei
2017-05-19 16:55       ` yizhan
2017-05-17  1:27 ` [PATCH 2/2] nvme: avoid to hang in remove disk Ming Lei
2017-05-18 13:49   ` Christoph Hellwig
2017-05-18 15:35     ` Ming Lei
2017-05-18 16:06       ` Keith Busch
2017-05-19 13:19         ` Ming Lei
