From: Bjorn Helgaas <helgaas@kernel.org>
To: James Puthukattukaran <james.puthukattukaran@oracle.com>
Cc: Lukas Wunner <lukas@wunner.de>,
Hans de Goede <hdegoede@redhat.com>,
linux-pci@vger.kernel.org
Subject: Re: sysfs interface to force power off
Date: Mon, 7 Nov 2022 14:41:29 -0600
Message-ID: <20221107204129.GA417338@bhelgaas>
In-Reply-To: <e0a2c30b-7741-4a89-1f7a-aa5eb7c5e4e3@oracle.com>
[+cc Lukas, Hans]
On Fri, Nov 04, 2022 at 07:08:34PM -0400, James Puthukattukaran wrote:
> We are trying to solve a problem where nvme drives hang in the
> field. We are not sure of the root cause, but the working theory is
> that the controller is "bad" and not responding properly to
> commands. The nvme driver times out on outstanding IO requests and,
> as part of recovery, attempts to reset the controller and
> reinitialize the device. The controller reset also hangs, as shown
> here --
>
> kernel:info: [10419813.132341] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> kernel:warning: [10419813.132342] Call Trace:
> kernel:warning: [10419813.132345] __schedule+0x2bc/0x89b
> kernel:warning: [10419813.132348] schedule+0x36/0x7c
> kernel:warning: [10419813.132351] blk_mq_freeze_queue_wait+0x4b/0xaa
> kernel:warning: [10419813.132353] ? remove_wait_queue+0x60/0x60
> kernel:warning: [10419813.132359] nvme_wait_freeze+0x33/0x50 [nvme_core]
> kernel:warning: [10419813.132362] nvme_reset_work+0x802/0xd84 [nvme]
> kernel:warning: [10419813.132364] ? __switch_to_asm+0x40/0x62
> kernel:warning: [10419813.132365] ? __switch_to_asm+0x34/0x62
> kernel:warning: [10419813.132367] ? __switch_to+0x9b/0x505
> kernel:warning: [10419813.132368] ? __switch_to_asm+0x40/0x62
> kernel:warning: [10419813.132370] ? __switch_to_asm+0x40/0x62
> kernel:warning: [10419813.132372] process_one_work+0x169/0x399
> kernel:warning: [10419813.132374] worker_thread+0x4d/0x3e5
> kernel:warning: [10419813.132377] kthread+0x105/0x138
> kernel:warning: [10419813.132379] ? rescuer_thread+0x380/0x375
> kernel:warning: [10419813.132380] ? kthread_bind+0x20/0x15
> kernel:warning: [10419813.132382] ret_from_fork+0x24/0x49
> ...
>
> So, I tried to hot power off the device via
> "echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs
> waiting for the nvme reset thread to finish (like so) --

Looks like this "power" sysfs file could use some documentation. I
couldn't find anything in Documentation/ABI/testing/ that seems to
cover it.

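As a starting point for such documentation, here is a minimal Python
sketch of how the file is driven from userspace. It assumes what the
quoted power_write_file() trace below shows: writing 0 requests
power-off, and the write does not return until the driver's remove
path has finished. The helper names, the "root" parameter and the
timeout value are illustrative assumptions, not an existing tool.

```python
import os
import subprocess


def slot_power_path(slot, root="/sys/bus/pci/slots"):
    """Path of the per-slot "power" attribute (root is overridable for testing)."""
    return os.path.join(root, str(slot), "power")


def set_slot_power(slot, on, timeout=30.0, root="/sys/bus/pci/slots"):
    """Write 1 or 0 to the slot's power file from a child process.

    Returns True if the write finished successfully, False if it failed
    or did not finish within `timeout` seconds.
    """
    path = slot_power_path(slot, root)
    value = "1" if on else "0"
    # The write happens in a child so that the caller is not the task
    # that blocks if the driver's remove callback never returns.
    proc = subprocess.Popen(["sh", "-c", 'echo "$1" > "$2"', "sh", value, path])
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Assumption: we only stop waiting here. A writer stuck in
        # uninterruptible sleep cannot be killed, so it is left behind.
        return False
```

A False return after the timeout only means the caller gave up
waiting; it does not say whether the slot was eventually powered off.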
> kernel:warning: [10419813.158116] __schedule+0x2bc/0x89b
> kernel:warning: [10419813.158119] schedule+0x36/0x7c
> kernel:warning: [10419813.158122] schedule_timeout+0x1f6/0x31f
> kernel:warning: [10419813.158124] ? sched_clock_cpu+0x11/0xa5
> kernel:warning: [10419813.158126] ? try_to_wake_up+0x59/0x505
> kernel:warning: [10419813.158130] wait_for_completion+0x12b/0x18a
> kernel:warning: [10419813.158132] ? wake_up_q+0x80/0x73
> kernel:warning: [10419813.158134] flush_work+0x122/0x1a7
> kernel:warning: [10419813.158137] ? wake_up_worker+0x30/0x2b
> kernel:warning: [10419813.158141] nvme_remove+0x71/0x100 [nvme]
> kernel:warning: [10419813.158146] pci_device_remove+0x3e/0xb6
> kernel:warning: [10419813.158149] device_release_driver_internal+0x134/0x1eb
> kernel:warning: [10419813.158151] device_release_driver+0x12/0x14
> kernel:warning: [10419813.158155] pci_stop_bus_device+0x7c/0x96
> kernel:warning: [10419813.158158] pci_stop_bus_device+0x39/0x96
> kernel:warning: [10419813.158164] pci_stop_and_remove_bus_device+0x12/0x1d
> kernel:warning: [10419813.158167] pciehp_unconfigure_device+0x7a/0x1d7
> kernel:warning: [10419813.158169] pciehp_disable_slot+0x52/0xca
> kernel:warning: [10419813.158171] pciehp_sysfs_disable_slot+0x67/0x112
> kernel:warning: [10419813.158174] disable_slot+0x12/0x14
> kernel:warning: [10419813.158175] power_write_file+0x6e/0xf8
> kernel:warning: [10419813.158179] pci_slot_attr_store+0x24/0x2e
> kernel:warning: [10419813.158180] sysfs_kf_write+0x3f/0x46
> kernel:warning: [10419813.158182] kernfs_fop_write+0x124/0x1a3
> kernel:warning: [10419813.158184] __vfs_write+0x3a/0x16d
> kernel:warning: [10419813.158187] ? audit_filter_syscall+0x33/0xce
> kernel:warning: [10419813.158189] vfs_write+0xb2/0x1a1
>
> Is there a way to force power off the device instead of the
> "graceful" approach? Obviously, we don't want to reset the system
> and don't have physical access to the device.
>
> Would it make sense to create a "force power off" attribute in
> /sys/bus/pci/slots/X which basically
> a) Sets the completion timeout (CTO) mask, so that outstanding IO
> requests that time out do not cause a fatal error (not an issue
> for DPC, I would think)
> b) Powers off the slot
> c) Restores the CTO mask
> d) Unconfigures the device via pciehp_unconfigure_device
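
The ordering of the proposed steps can be sketched as follows. This is
a userspace Python model, not kernel code: FakeSlot and its methods
are hypothetical stand-ins for the AER mask register access, the slot
power control and pciehp_unconfigure_device(). The only value taken
from the kernel headers is the Completion Timeout bit
(PCI_ERR_UNC_COMP_TIME, 0x4000) of the AER Uncorrectable Error Mask
register. The model reads step c) as putting the mask back to its
previous value, which is an assumption about what was meant.

```python
PCI_ERR_UNC_COMP_TIME = 0x00004000  # Completion Timeout bit in the AER uncorrectable mask


class FakeSlot:
    """Hypothetical in-memory stand-in for a hotplug slot; records each step."""

    def __init__(self, uncor_mask=0):
        self.uncor_mask = uncor_mask
        self.powered = True
        self.configured = True
        self.log = []

    def write_uncor_mask(self, value):
        self.uncor_mask = value
        self.log.append(("mask", value))

    def power_off(self):
        self.powered = False
        self.log.append(("power_off",))

    def unconfigure(self):
        self.configured = False
        self.log.append(("unconfigure",))


def force_power_off(slot):
    """Run steps a) to d) in order against the given slot model."""
    saved = slot.uncor_mask
    # a) Mask Completion Timeout so requests still in flight do not
    #    escalate to a fatal error once the device stops answering.
    slot.write_uncor_mask(saved | PCI_ERR_UNC_COMP_TIME)
    try:
        # b) Cut slot power without waiting for the driver.
        slot.power_off()
    finally:
        # c) Put the mask back, even if powering off raised.
        slot.write_uncor_mask(saved)
    # d) Tear down the now-absent device.
    slot.unconfigure()
```

The point of the ordering is that the driver's remove callback only
runs in step d), after power is already gone, rather than before
power-off as in the existing path shown in the trace.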

So I assume the existing sysfs slot "power" interface would do what
you want except that nvme_remove() hangs?

There might be some improvement to make in nvme_remove(); maybe it
doesn't correctly detect I/O errors or something.

But maybe there's *also* a case to be made for an interface like you
suggest. Lukas, Hans, any reaction to this?

Bjorn