From: Bjorn Helgaas <helgaas@kernel.org>
To: Ido Schimmel <idosch@nvidia.com>
Cc: netdev@vger.kernel.org, linux-pci@vger.kernel.org,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com, bhelgaas@google.com,
alex.williamson@redhat.com, lukas@wunner.de, petrm@nvidia.com,
jiri@nvidia.com, mlxsw@nvidia.com
Subject: Re: [RFC PATCH net-next 05/12] PCI: Add device-specific reset for NVIDIA Spectrum devices
Date: Wed, 18 Oct 2023 15:08:26 -0500 [thread overview]
Message-ID: <20231018200826.GA1371652@bhelgaas> (raw)
In-Reply-To: <20231017074257.3389177-6-idosch@nvidia.com>
On Tue, Oct 17, 2023 at 10:42:50AM +0300, Ido Schimmel wrote:
> The PCIe specification defines two methods to trigger a hot reset across
> a link: Bus reset and link disablement (r6.0.1, sec 7.1, sec 6.6.1). In
> the first method, the Secondary Bus Reset (SBR) bit in the Bridge
> Control Register of the Downstream Port is asserted for at least 1ms
> (r6.0.1, sec 7.5.1.3.13). In the second method, the Link Disable bit in
> the Link Control Register of the Downstream Port is asserted and then
> cleared to disable and enable the link (r6.0.1, sec 7.5.3.7).
>
> While the two methods are identical from the perspective of the
> Downstream device, they are different as far as the host is concerned.
> In the first method, the Link Training and Status State Machine (LTSSM)
> of the Downstream Port is expected to be in the Hot Reset state as long
> as the SBR bit is asserted. In the second method, the LTSSM of the
> Downstream Port is expected to be in the Disabled state as long as the
> Link Disable bit is asserted.
>
> This above difference is of importance because the specification
> requires the LTTSM to exit from the Hot Reset state to the Detect state
> within a 2ms timeout (r6.0.1, sec 4.2.7.11).
I don't read 4.2.7.11 quite that way. Here's the text (from r6.0):
• Lanes that were directed by a higher Layer to initiate Hot
Reset:
◦ All Lanes in the configured Link transmit TS1 Ordered Sets
with the Hot Reset bit asserted and the configured Link and
Lane numbers.
◦ If two consecutive TS1 Ordered Sets are received on any
Lane with the Hot Reset bit asserted and configured Link
and Lane numbers, then:
▪ LinkUp = 0b (False)
▪ If no higher Layer is directing the Physical Layer to
remain in Hot Reset, the next state is Detect
▪ Otherwise, all Lanes in the configured Link continue to
transmit TS1 Ordered Sets with the Hot Reset bit asserted
and the configured Link and Lane numbers.
◦ Otherwise, after a 2 ms timeout next state is Detect.
I assume that SBR being set constitutes a "higher Layer directing the
Physical Layer to remain in Hot Reset," so I would read this as saying
the LTSSM stays in Hot Reset as long as SBR is set. Then, *after* a
2 ms timeout (not *within* 2 ms), the next state is Detect.
> NVIDIA Spectrum devices cannot guarantee it and a host enforcing
> such a behavior might fail to communicate with the device after
> issuing a Secondary Bus Reset.
I don't quite follow this. What behavior is the host enforcing here?
I guess you're doing an SBR, and the Spectrum device doesn't respond
as expected afterwards?
It looks like pci_reset_secondary_bus() asserts SBR for at least
2 ms. Then pci_bridge_wait_for_secondary_bus() should wait before
accessing the device, but maybe we don't wait long enough?
I guess this ends up back at d3cold_delay as suggested by Lukas.
> With the link disablement method, the host can leave the link
> disabled for enough time to allow the device to undergo a hot reset
> and reach the Detect state. After enabling the link, the host will
> exit from the Disabled state to Detect state (r6.0.1, sec 4.2.7.9)
> and observe that the device is already in the Detect state.
>
> The PCI core only implements the first method, which might not work with
> NVIDIA Spectrum devices on certain hosts, as explained above. Therefore,
> implement the link disablement method as a device-specific method for
> NVIDIA Spectrum devices. Specifically, disable the link, wait for 500ms,
> enable the link and then wait for the device to become accessible.
>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> drivers/pci/quirks.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 23f6bd2184e2..a6e308bb934c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4182,6 +4182,31 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
> return 0;
> }
>
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM 0xcb84
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM2 0xcf6c
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM3 0xcf70
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM4 0xcf80
> +
> +static int reset_mlx(struct pci_dev *pdev, bool probe)
> +{
> + struct pci_dev *bridge = pdev->bus->self;
> +
> + if (probe)
> + return 0;
> +
> + /*
> + * Disable the link on the Downstream port in order to trigger a hot
> + * reset in the Downstream device. Wait for 500ms before enabling the
> + * link so that the firmware on the device will have enough time to
> + * transition the Upstream port to the Detect state.
> + */
> + pcie_capability_set_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> + msleep(500);
> + pcie_capability_clear_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> +
> + return pci_bridge_wait_for_secondary_bus(bridge, "link toggle");
> +}
> +
> static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
> reset_intel_82599_sfp_virtfn },
> @@ -4197,6 +4222,10 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> reset_chelsio_generic_dev },
> { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
> reset_hinic_vf_dev },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM2, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM3, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM4, reset_mlx },
> { 0 }
> };
>
> --
> 2.40.1
>
next prev parent reply other threads:[~2023-10-18 20:08 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-17 7:42 [RFC PATCH net-next 00/12] mlxsw: Add support for new reset flow Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 01/12] netdevsim: Block until all devices are released Ido Schimmel
2023-10-19 0:53 ` Jakub Kicinski
2023-10-17 7:42 ` [RFC PATCH net-next 02/12] devlink: Hold a reference on parent device Ido Schimmel
2023-10-17 7:56 ` Jiri Pirko
2023-10-17 8:11 ` Ido Schimmel
2023-10-17 9:01 ` Jiri Pirko
2023-10-17 7:42 ` [RFC PATCH net-next 03/12] devlink: Acquire device lock during reload Ido Schimmel
2023-10-17 8:04 ` Jiri Pirko
2023-10-17 7:42 ` [RFC PATCH net-next 04/12] PCI: Add no PM reset quirk for NVIDIA Spectrum devices Ido Schimmel
2023-10-18 19:40 ` Bjorn Helgaas
2023-10-22 8:23 ` Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 05/12] PCI: Add device-specific reset " Ido Schimmel
2023-10-17 10:00 ` Lukas Wunner
2023-10-18 20:08 ` Bjorn Helgaas [this message]
2023-10-25 11:05 ` Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 06/12] PCI: Add debug print for device ready delay Ido Schimmel
2023-10-18 19:41 ` Bjorn Helgaas
2023-10-17 7:42 ` [RFC PATCH net-next 07/12] mlxsw: Extend MRSR pack() function to support new commands Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 08/12] mlxsw: pci: Rename mlxsw_pci_sw_reset() Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 09/12] mlxsw: pci: Move software reset code to a separate function Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 10/12] mlxsw: pci: Add support for new reset flow Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 11/12] mlxsw: pci: Implement PCI reset handlers Ido Schimmel
2023-10-17 7:42 ` [RFC PATCH net-next 12/12] selftests: mlxsw: Add PCI reset test Ido Schimmel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231018200826.GA1371652@bhelgaas \
--to=helgaas@kernel.org \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=idosch@nvidia.com \
--cc=jiri@nvidia.com \
--cc=kuba@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mlxsw@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=petrm@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox