linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: James Puthukattukaran <james.puthukattukaran@oracle.com>,
	Hans de Goede <hdegoede@redhat.com>,
	linux-pci@vger.kernel.org
Subject: Re: sysfs interface to force power off
Date: Tue, 8 Nov 2022 10:53:02 +0100
Message-ID: <20221108095302.GA29279@wunner.de>
In-Reply-To: <20221107204129.GA417338@bhelgaas>

On Mon, Nov 07, 2022 at 02:41:29PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 04, 2022 at 07:08:34PM -0400, James Puthukattukaran wrote:
> > Looking to solve a problem where we have nvme drives that are hung
> > in the field and we are not sure of the root cause but the working
> > theory is that the controller is "bad" and not responding properly
> > to commands. The nvme driver times out on outstanding IO requests
> > and as part of recovery, attempts to reset the controller and
> > reinitialize the device. The reset controller also hangs like here
> > --   
> > 
> > kernel:info: [10419813.132341] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> > kernel:warning: [10419813.132342] Call Trace:
> > kernel:warning: [10419813.132345]  __schedule+0x2bc/0x89b
> > kernel:warning: [10419813.132348]  schedule+0x36/0x7c
> > kernel:warning: [10419813.132351]  blk_mq_freeze_queue_wait+0x4b/0xaa
> > kernel:warning: [10419813.132353]  ? remove_wait_queue+0x60/0x60
> > kernel:warning: [10419813.132359]  nvme_wait_freeze+0x33/0x50 [nvme_core]
> > kernel:warning: [10419813.132362]  nvme_reset_work+0x802/0xd84 [nvme]
> > kernel:warning: [10419813.132364]  ? __switch_to_asm+0x40/0x62
> > kernel:warning: [10419813.132365]  ? __switch_to_asm+0x34/0x62
> > kernel:warning: [10419813.132367]  ? __switch_to+0x9b/0x505
> > kernel:warning: [10419813.132368]  ? __switch_to_asm+0x40/0x62
> > kernel:warning: [10419813.132370]  ? __switch_to_asm+0x40/0x62
> > kernel:warning: [10419813.132372]  process_one_work+0x169/0x399
> > kernel:warning: [10419813.132374]  worker_thread+0x4d/0x3e5
> > kernel:warning: [10419813.132377]  kthread+0x105/0x138
> > kernel:warning: [10419813.132379]  ? rescuer_thread+0x380/0x375
> > kernel:warning: [10419813.132380]  ? kthread_bind+0x20/0x15
> > kernel:warning: [10419813.132382]  ret_from_fork+0x24/0x49
> > ...
> > 
> > So, I tried to hot power off the device via
> > "echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs
> > waiting for the nvme reset thread to finish (like so) -- 
> 
> Looks like this "power" sysfs file could use some documentation.  I
> couldn't find anything in Documentation/ABI/testing/ that seems to
> cover it.

That sysfs attribute was introduced in early 2002.  I guess we were
less diligent with documentation back then:

http://git.kernel.org/tglx/history/c/a8a2069f432c

(search for power_write_file() in the commit)


The problem here is in the NVMe / block layer, not the PCI layer.
nvme_wait_freeze() calls blk_mq_freeze_queue_wait(), but obviously
it should call blk_mq_freeze_queue_wait_timeout() instead and handle
a timeout by retiring any outstanding I/O requests to the drive and
marking it as dead.
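
To illustrate the idea, here is a rough userspace toy model of a bounded
wait with a "give up and mark dead" fallback.  It is not kernel code and
not a proposed patch: every name in it (toy_queue, toy_tick,
toy_freeze_wait_timeout, toy_reset_work) is made up for illustration, and
time is simulated in abstract ticks rather than jiffies.  The return
convention (0 on timeout, non-zero otherwise) is only meant to resemble
what a timed wait in the kernel would report.

```c
/* Userspace toy model, NOT kernel code.  All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_queue {
	int in_flight;	/* requests issued but not yet completed */
	bool dev_hung;	/* true: the device never completes anything */
	bool dead;	/* set once we give up on the device */
};

/* One tick of simulated time: a healthy device completes one request. */
static void toy_tick(struct toy_queue *q)
{
	if (!q->dev_hung && q->in_flight > 0)
		q->in_flight--;
}

/*
 * Bounded wait for the queue to drain.
 * Returns 0 on timeout, otherwise the ticks left (at least 1).
 */
static int toy_freeze_wait_timeout(struct toy_queue *q, int timeout)
{
	assert(timeout > 0);

	while (q->in_flight > 0 && timeout > 0) {
		toy_tick(q);
		timeout--;
	}
	if (q->in_flight > 0)
		return 0;			/* timed out */
	return timeout > 0 ? timeout : 1;	/* drained in time */
}

/*
 * Reset path: instead of waiting forever, handle a timeout by retiring
 * the outstanding requests and marking the device as dead.
 * Returns 0 on success, -1 if the device was given up on.
 */
static int toy_reset_work(struct toy_queue *q, int timeout)
{
	if (toy_freeze_wait_timeout(q, timeout) == 0) {
		q->in_flight = 0;	/* fail requests back to their issuers */
		q->dead = true;
		return -1;
	}
	return 0;
}
```

With a hung device, toy_reset_work() returns after the timeout instead of
blocking, so anything waiting on it (such as a driver unbind) could make
progress as well.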


> > kernel:warning: [10419813.158116]  __schedule+0x2bc/0x89b
> > kernel:warning: [10419813.158119]  schedule+0x36/0x7c
> > kernel:warning: [10419813.158122]  schedule_timeout+0x1f6/0x31f
> > kernel:warning: [10419813.158124]  ? sched_clock_cpu+0x11/0xa5
> > kernel:warning: [10419813.158126]  ? try_to_wake_up+0x59/0x505
> > kernel:warning: [10419813.158130]  wait_for_completion+0x12b/0x18a
> > kernel:warning: [10419813.158132]  ? wake_up_q+0x80/0x73
> > kernel:warning: [10419813.158134]  flush_work+0x122/0x1a7
> > kernel:warning: [10419813.158137]  ? wake_up_worker+0x30/0x2b
> > kernel:warning: [10419813.158141]  nvme_remove+0x71/0x100 [nvme]
> > kernel:warning: [10419813.158146]  pci_device_remove+0x3e/0xb6
> > kernel:warning: [10419813.158149]  device_release_driver_internal+0x134/0x1eb
> > kernel:warning: [10419813.158151]  device_release_driver+0x12/0x14
> > kernel:warning: [10419813.158155]  pci_stop_bus_device+0x7c/0x96
> > kernel:warning: [10419813.158158]  pci_stop_bus_device+0x39/0x96
> > kernel:warning: [10419813.158164]  pci_stop_and_remove_bus_device+0x12/0x1d
> > kernel:warning: [10419813.158167]  pciehp_unconfigure_device+0x7a/0x1d7
> > kernel:warning: [10419813.158169]  pciehp_disable_slot+0x52/0xca
> > kernel:warning: [10419813.158171]  pciehp_sysfs_disable_slot+0x67/0x112
> > kernel:warning: [10419813.158174]  disable_slot+0x12/0x14
> > kernel:warning: [10419813.158175]  power_write_file+0x6e/0xf8
> > kernel:warning: [10419813.158179]  pci_slot_attr_store+0x24/0x2e
> > kernel:warning: [10419813.158180]  sysfs_kf_write+0x3f/0x46
> > kernel:warning: [10419813.158182]  kernfs_fop_write+0x124/0x1a3
> > kernel:warning: [10419813.158184]  __vfs_write+0x3a/0x16d
> > kernel:warning: [10419813.158187]  ? audit_filter_syscall+0x33/0xce
> > kernel:warning: [10419813.158189]  vfs_write+0xb2/0x1a1
> > 
> > Is there a way to force power off the device instead of the
> > "graceful" approach? Obviously, we don't want to reset the system
> > and don't have physical access to the device.  
> > 
> > Would it make sense to create a "force power off" in
> > /sys/bus/pci/slots/X which basically

The power attribute in sysfs already does what you want, but when
unbinding the nvme driver from the device, the flush_work() call
waits for nvme_reset_work() to finish.  And because that's stuck,
unbinding also gets stuck.  Again, the solution is a code fix
in the NVMe / block layer, so the proper mailing lists to ask
are linux-nvme and linux-block.

Thanks,

Lukas
