linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: Keith Busch <kbusch@kernel.org>
Cc: James Puthukattukaran <james.puthukattukaran@oracle.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Hans de Goede <hdegoede@redhat.com>,
	linux-pci@vger.kernel.org
Subject: Re: [External] : Re: sysfs interface to force power off
Date: Tue, 8 Nov 2022 21:16:53 +0100
Message-ID: <20221108201653.GA4919@wunner.de>
In-Reply-To: <Y2p//Eqa9HGRmwWW@kbusch-mbp>

On Tue, Nov 08, 2022 at 09:12:44AM -0700, Keith Busch wrote:
> On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> > 
> > There is a path to disable the controller and that code ran but did
> > not help. I checked with the nvme folks and Keith mentioned that there
> > might be an issue with the nvme queue management. Unfortunately, we
> > can't try newer kernels in the field. So, looking for a way to just
> > "shut off the device" when we have scenarios like this where we can't
> > untangle the mess. 
> 
> Well, I didn't request you try new kernels in the field. I asked if you
> could experiment with a newer one on a development machine to confirm if
> the bug was fixed by some of the significant changes in this path so
> that we could confirm a reason to port to stable. You're going to have
> to change your kernel to fix this observation, so it would be worth the
> effort to know if the changes being considered actually address the
> problem.

Current mainline still contains this problematic sequence:

  nvme_reset_work()
    nvme_wait_freeze()
      blk_mq_freeze_queue_wait()

So I'm inclined to believe that the issue still persists, but I agree
that validating that hypothesis with a contemporary kernel should be
the first step.

I think nvme_reset_work() is overly optimistic that resetting the drive
succeeded.  It just freezes and unfreezes the I/O queue without checking
for errors.

In particular, nvme_wait_freeze() should call the _timeout variant of
blk_mq_freeze_queue_wait() and cope with failure of freezing.
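
To illustrate the shape of that change, here is a rough userspace model,
not kernel code. It assumes the bounded wait follows the usual
wait_event_timeout() convention (remaining time if the queue froze, 0 on
timeout). Every model_* name below is hypothetical and exists only for
this sketch:

```c
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not the kernel struct. */
struct model_queue {
	long drain_time;	/* ticks until in-flight I/O drains; < 0 = never */
};

/*
 * Models a bounded freeze wait: returns the remaining ticks (> 0) if
 * the queue drained in time, 0 if the wait timed out.
 */
long model_freeze_wait_timeout(const struct model_queue *q, long timeout)
{
	if (q->drain_time < 0 || q->drain_time >= timeout)
		return 0;
	return timeout - q->drain_time;
}

/*
 * Waits for every queue against one shared time budget and stops at
 * the first queue that fails to freeze. Returns remaining ticks, or 0.
 */
long model_wait_freeze_timeout(const struct model_queue *qs, size_t n,
			       long timeout)
{
	for (size_t i = 0; i < n; i++) {
		timeout = model_freeze_wait_timeout(&qs[i], timeout);
		if (timeout <= 0)
			break;
	}
	return timeout;
}

/*
 * Reset path that checks the result instead of assuming success.
 * Returns 0 if all queues froze, -1 if the caller should give up
 * (e.g. by disabling the controller) rather than block forever.
 */
int model_reset(const struct model_queue *qs, size_t n, long timeout)
{
	if (model_wait_freeze_timeout(qs, n, timeout) <= 0)
		return -1;
	return 0;
}
```

The point of the model is only that a stuck queue turns into an error
return the reset path can act on, instead of an unbounded wait.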

Thanks,

Lukas

Thread overview: 8+ messages
2022-11-04 23:08 sysfs interface to force power off James Puthukattukaran
2022-11-07 20:41 ` Bjorn Helgaas
2022-11-07 21:14   ` [External] : " James Puthukattukaran
2022-11-07 21:29     ` Bjorn Helgaas
2022-11-08 16:12     ` Keith Busch
2022-11-08 20:16       ` Lukas Wunner [this message]
2022-11-08 20:37         ` Keith Busch
2022-11-08  9:53   ` Lukas Wunner