From: Lukas Wunner <lukas@wunner.de>
To: Keith Busch <kbusch@kernel.org>
Cc: James Puthukattukaran <james.puthukattukaran@oracle.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Hans de Goede <hdegoede@redhat.com>,
	linux-pci@vger.kernel.org
Subject: Re: [External] : Re: sysfs interface to force power off
Date: Tue, 8 Nov 2022 21:16:53 +0100
Message-ID: <20221108201653.GA4919@wunner.de>
In-Reply-To: <Y2p//Eqa9HGRmwWW@kbusch-mbp>

On Tue, Nov 08, 2022 at 09:12:44AM -0700, Keith Busch wrote:
> On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> > 
> > There is a path to disable the controller and that code ran but did
> > not help. I checked with the nvme folks and Keith mentioned that there
> > might be an issue with the nvme queue management. Unfortunately, we
> > can't try newer kernels in the field. So, looking for a way to just
> > "shut off the device" when we have scenarios like this where we can't
> > untangle the mess. 
> 
> Well, I didn't request you try new kernels in the field. I asked if you
> could experiment with a newer one on a development machine to confirm if
> the bug was fixed by some of the significant changes in this path so
> that we could confirm a reason to port to stable. You're going to have
> to change your kernel to fix this observation, so it would be worth the
> effort to know if the changes being considered actually address the
> problem.

Current mainline still contains this problematic sequence:

  nvme_reset_work()
    nvme_wait_freeze()
      blk_mq_freeze_queue_wait()

So I'm inclined to believe that the issue still persists, but I agree
that validating that hypothesis with a contemporary kernel should be
the first step.

I think nvme_reset_work() is overly optimistic that resetting the drive
succeeded.  It just freezes and unfreezes the I/O queue without checking
for errors.

In particular, nvme_wait_freeze() should call the _timeout variant of
blk_mq_freeze_queue_wait() and cope with failure of freezing.
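
A minimal user-space sketch of that control flow follows.  It is not
kernel code: every type and function name prefixed with model_ is
hypothetical, and the assumed contract (the wait returns the remaining
time budget on success and 0 on timeout) only roughly mirrors
blk_mq_freeze_queue_wait_timeout().  The point is merely that the reset
path checks the result and bails out instead of blocking indefinitely.

```c
#include <stddef.h>

/* User-space model only; none of these types are the kernel's. */
struct model_queue {
	long drain_ticks;	/* ticks until in-flight I/O drains; < 0 = never */
};

enum model_reset_result { MODEL_RESET_OK, MODEL_RESET_FAILED };

/*
 * Assumed contract: return the remaining budget if the queue drained
 * within it, or 0 if the budget was used up first.
 */
static long model_freeze_wait_timeout(const struct model_queue *q, long timeout)
{
	if (q->drain_ticks < 0 || q->drain_ticks >= timeout)
		return 0;
	return timeout - q->drain_ticks;
}

/* Wait for every queue to freeze, sharing one budget across all of them. */
static long model_wait_freeze_timeout(const struct model_queue *qs, size_t n,
				      long timeout)
{
	for (size_t i = 0; i < n; i++) {
		timeout = model_freeze_wait_timeout(&qs[i], timeout);
		if (timeout <= 0)
			break;
	}
	return timeout;
}

/* Reset path that copes with a failed freeze instead of hanging. */
static enum model_reset_result model_reset_work(const struct model_queue *qs,
						size_t n, long timeout)
{
	if (model_wait_freeze_timeout(qs, n, timeout) <= 0)
		return MODEL_RESET_FAILED;	/* give up on the controller */

	/* Queues are frozen: update queue state here, then unfreeze. */
	return MODEL_RESET_OK;
}
```

With one queue that never drains (drain_ticks < 0), model_reset_work()
returns MODEL_RESET_FAILED once the budget is exhausted, whereas an
untimed wait would never return.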

Thanks,

Lukas
