From: Lukas Wunner <lukas@wunner.de>
To: Tushar Dave <tdave@nvidia.com>, Bjorn Helgaas <helgaas@kernel.org>
Cc: Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org, kbusch@kernel.org,
linux-pci@vger.kernel.org
Subject: Re: nvme-pci: Disabling device after reset failure: -5 occurs while AER recovery
Date: Sat, 11 Mar 2023 09:22:20 +0100
Message-ID: <20230311082220.GA3649@wunner.de>
In-Reply-To: <20230309175321.GA1151233@bhelgaas> <843e2392-9ff0-2286-5f97-659831013c2e@nvidia.com> <20230310235306.GA1290793@bhelgaas> <4922cec7-ecc1-4971-75af-cdbaeaa6434f@nvidia.com>
On Fri, Mar 10, 2023 at 05:45:48PM -0800, Tushar Dave wrote:
> On 3/10/2023 3:53 PM, Bjorn Helgaas wrote:
> > In the log below, pciehp obviously is enabled; should I infer that in
> > the log above, it is not?
>
> pciehp is enabled all the time. In the log above and below.
> I do not have answer yet why pciehp shows-up only in some tests (due to DPC
> link down/up) and not in others like you noticed in both the logs.
Maybe some of the switch Downstream Ports are hotplug-capable and
some are not? (Check the Slot Implemented bit in the PCI Express
Capabilities Register as well as the Hot-Plug Capable bit in the
Slot Capabilities Register.)
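As a rough illustration of that check, here is a minimal C sketch that decodes the two bits from raw register values (for example as dumped from config space). The masks carry the same values as PCI_EXP_FLAGS_SLOT and PCI_EXP_SLTCAP_HPC in include/uapi/linux/pci_regs.h; the helper name port_is_hotplug_capable() is made up for this example and is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_FLAGS_SLOT	0x0100		/* Slot Implemented (PCI Express Capabilities Register) */
#define PCI_EXP_SLTCAP_HPC	0x00000040	/* Hot-Plug Capable (Slot Capabilities Register) */

/*
 * Hypothetical helper: returns true if a port with these raw register
 * values implements a slot and advertises hot-plug capability.
 */
static bool port_is_hotplug_capable(uint16_t pcie_flags, uint32_t slot_cap)
{
	/* Without Slot Implemented, the Slot Capabilities bits are not meaningful. */
	if (!(pcie_flags & PCI_EXP_FLAGS_SLOT))
		return false;
	return (slot_cap & PCI_EXP_SLTCAP_HPC) != 0;
}
```

Running this against the values of each Downstream Port would show which ports fall into which group.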
> > Generally we've avoided handling a device reset as a remove/add event
> > because upper layers can't deal well with that. But in the log below
> > it looks like pciehp *did* treat the DPC containment as a remove/add,
> > which of course involves configuring the "new" device and its MPS
> > settings.
>
> yes and that puzzled me why? especially when"Link Down/Up ignored (recovered
> by DPC)". Do we still have race somewhere, I am not sure.
You're seeing the expected behavior. pciehp ignores DLLSC events
caused by DPC, but then double-checks that DPC recovery succeeded.
If it didn't, it would be a bug not to bring down the slot.
So pciehp does exactly that. See this code snippet in
pciehp_ignore_dpc_link_change():
	/*
	 * If the link is unexpectedly down after successful recovery,
	 * the corresponding link change may have been ignored above.
	 * Synthesize it to ensure that it is acted on.
	 */
	down_read_nested(&ctrl->reset_lock, ctrl->depth);
	if (!pciehp_check_link_active(ctrl))
		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
	up_read(&ctrl->reset_lock);
So on hotplug-capable ports, pciehp is able to mop up the mess created
by fiddling with the MPS settings behind the kernel's back.
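The decision described above can be condensed into a small user-space model. This is only a sketch of the logic as explained in this mail, not the kernel code: the enum and the function model_dllsc() are invented for illustration, and locking, event masking and logging are left out.

```c
#include <stdbool.h>

/* Outcome of a Link Down/Up (DLLSC) event in this simplified model. */
enum dllsc_action {
	DLLSC_HANDLE,		/* not recovered by DPC: handle as an ordinary hotplug event */
	DLLSC_IGNORE,		/* recovered by DPC and link is up: nothing left to do */
	DLLSC_SYNTHESIZE,	/* recovered by DPC but link is down: re-raise the event */
};

/* Hypothetical model of the check; both inputs are assumed to be sampled by the caller. */
static enum dllsc_action model_dllsc(bool dpc_recovered, bool link_active)
{
	if (!dpc_recovered)
		return DLLSC_HANDLE;
	if (!link_active)
		return DLLSC_SYNTHESIZE;
	return DLLSC_IGNORE;
}
```

In this model, a "Link Down/Up ignored (recovered by DPC)" message followed by a slot bring-down corresponds to the DLLSC_SYNTHESIZE case.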
We don't have that option on non-hotplug-capable ports. If error
recovery fails, we generally let the inaccessible devices remain
in the system and user interaction is necessary to recover, either
through a reboot or by manually removing and rescanning PCI devices
via sysfs after reinstating sane MPS settings.
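For the manual route, a minimal sketch of the remove-and-rescan step in C could look like this. It relies on the standard sysfs attributes /sys/bus/pci/devices/<BDF>/remove and /sys/bus/pci/rescan; the function names are invented for this example, root privileges are required, and restoring the MPS settings beforehand is not covered.

```c
#include <stdio.h>

/* Build "/sys/bus/pci/devices/<bdf>/remove"; returns 0 on success, -1 if buf is too small. */
static int pci_remove_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/remove", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to a sysfs attribute; returns 0 on success, -1 on error. */
static int sysfs_write_one(const char *path)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fputs("1", f) >= 0);
	if (fclose(f) != 0)	/* the write may only fail when the buffer is flushed */
		ok = 0;
	return ok ? 0 : -1;
}

/* Remove the device given by bdf (e.g. "0000:01:00.0"), then rescan all PCI buses. */
static int pci_remove_and_rescan(const char *bdf)
{
	char path[128];

	if (pci_remove_path(path, sizeof(path), bdf) || sysfs_write_one(path))
		return -1;
	return sysfs_write_one("/sys/bus/pci/rescan");
}
```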
> - Switch and NVMe MPS are 512B
> - NVMe config space saved (including MPS=512B)
> - You change Switch MPS to 128B
> - NVMe does DMA with payload > 128B
> - Switch reports Malformed TLP because TLP is larger than its MPS
> - Recovery resets NVMe, which sets MPS to the default of 128B
> - nvme_slot_reset() restores NVMe config space (MPS is now 512B)
> - Subsequent NVMe DMA with payload > 128B repeats cycle
Forgive my ignorance, but if MPS is restored to 512B by nvme_slot_reset(),
shouldn't the communication with the device just work again from that
point on?
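For reference, the condition in the quoted sequence can be modeled with a few lines of C. The sketch decodes Max_Payload_Size from a raw Device Control value (bits 7:5, encoded as 128 << n) and flags a TLP whose payload exceeds the receiver's MPS. The mask matches PCI_EXP_DEVCTL_PAYLOAD in include/uapi/linux/pci_regs.h; the two function names are invented for this example, and reserved encodings are not rejected.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size field, bits 7:5 */

/* Decode Max_Payload_Size in bytes from a raw Device Control register value. */
static unsigned int devctl_mps_bytes(uint16_t devctl)
{
	return 128u << ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5);
}

/* Model: a receiver treats a TLP whose data payload exceeds its own MPS as malformed. */
static bool tlp_is_malformed(unsigned int payload_bytes, uint16_t receiver_devctl)
{
	return payload_bytes > devctl_mps_bytes(receiver_devctl);
}
```

With the switch port at 128B (field value 0) and the NVMe sending a 256-byte payload, tlp_is_malformed(256, 0x0000) is true in this model, regardless of what MPS the NVMe itself has been restored to.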
Thanks,
Lukas