* [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
@ 2024-12-12 8:56 Lukas Wunner
2024-12-12 14:33 ` Ilpo Järvinen
2024-12-12 16:11 ` Bjorn Helgaas
0 siblings, 2 replies; 14+ messages in thread
From: Lukas Wunner @ 2024-12-12 8:56 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, Niklas Schnelle, Ilpo Jarvinen, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds. The Max Link Speed field in
the Link Capabilities Register indicates the *maximum* of those speeds.
Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
Ilpo recalls seeing this inconsistency on more devices.
pcie_get_supported_speeds() neglects to honor the Max Link Speed field
and will thus incorrectly deem higher speeds as supported. Fix it.
Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
Reported-by: Niklas Schnelle <niks@kernel.org>
Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
drivers/pci/pci.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 35dc9f2..b730560 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
+ /* Ignore speeds higher than Max Link Speed */
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
+
/* PCIe r3.0-compliant */
if (speeds)
return speeds;
- pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
-
/* Synthesize from the Max Link Speed field */
if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 8:56 [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds Lukas Wunner
@ 2024-12-12 14:33 ` Ilpo Järvinen
2024-12-12 20:10 ` Niklas Schnelle
2024-12-12 22:13 ` Lukas Wunner
2024-12-12 16:11 ` Bjorn Helgaas
1 sibling, 2 replies; 14+ messages in thread
From: Ilpo Järvinen @ 2024-12-12 14:33 UTC (permalink / raw)
To: Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Niklas Schnelle, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
[-- Attachment #1: Type: text/plain, Size: 2268 bytes --]
On Thu, 12 Dec 2024, Lukas Wunner wrote:
> The Supported Link Speeds Vector in the Link Capabilities 2 Register
> indicates the *supported* link speeds. The Max Link Speed field in
> the Link Capabilities Register indicates the *maximum* of those speeds.
>
> Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> Ilpo recalls seeing this inconsistency on more devices.
>
> pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> and will thus incorrectly deem higher speeds as supported. Fix it.
>
> Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> Reported-by: Niklas Schnelle <niks@kernel.org>
> Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
> drivers/pci/pci.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 35dc9f2..b730560 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
>
> + /* Ignore speeds higher than Max Link Speed */
> + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
Hi Lukas,
Why do you start GENMASK() from 0th position? That's the reserved bit.
(I doesn't exactly cause a misbehavior to & the never set 0th bit but
it is slightly confusing).
I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or
PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
to make it explicit where it originates from.
--
i.
> +
> /* PCIe r3.0-compliant */
> if (speeds)
> return speeds;
>
> - pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> -
> /* Synthesize from the Max Link Speed field */
> if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 8:56 [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds Lukas Wunner
2024-12-12 14:33 ` Ilpo Järvinen
@ 2024-12-12 16:11 ` Bjorn Helgaas
2024-12-12 16:58 ` Niklas Schnelle
2024-12-13 9:16 ` Lukas Wunner
1 sibling, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2024-12-12 16:11 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-pci, Niklas Schnelle, Ilpo Jarvinen, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> The Supported Link Speeds Vector in the Link Capabilities 2 Register
> indicates the *supported* link speeds. The Max Link Speed field in
> the Link Capabilities Register indicates the *maximum* of those speeds.
>
> Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> Ilpo recalls seeing this inconsistency on more devices.
>
> pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> and will thus incorrectly deem higher speeds as supported. Fix it.
>
> Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> Reported-by: Niklas Schnelle <niks@kernel.org>
> Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Looks like you want this in v6.13? Can we make commit log more
explicit as to why we need it there? Is this change enough to resolve
the boot hang Niklas reported?
> ---
> drivers/pci/pci.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 35dc9f2..b730560 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
>
> + /* Ignore speeds higher than Max Link Speed */
> + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> +
> /* PCIe r3.0-compliant */
> if (speeds)
> return speeds;
>
> - pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> -
> /* Synthesize from the Max Link Speed field */
> if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 16:11 ` Bjorn Helgaas
@ 2024-12-12 16:58 ` Niklas Schnelle
2024-12-12 19:40 ` Niklas Schnelle
2024-12-13 9:16 ` Lukas Wunner
1 sibling, 1 reply; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-12 16:58 UTC (permalink / raw)
To: Bjorn Helgaas, Lukas Wunner
Cc: linux-pci, Niklas Schnelle, Ilpo Jarvinen, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
On Thu, 2024-12-12 at 10:11 -0600, Bjorn Helgaas wrote:
> On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds. The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> >
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> >
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported. Fix it.
> >
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> Looks like you want this in v6.13? Can we make commit log more
> explicit as to why we need it there? Is this change enough to resolve
> the boot hang Niklas reported?
As for if it fixes my hang I will test this later today when I come
home. But even if it is not enough on its own, I believe that this will
be needed as a prerequisite for the fix of my hang issue in that
without this patch the dev->supported_speeds incorrectly shows multiple
speeds as supported making it impossible to suppress probing of bwctrl
for devices that only support a fixed speed.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 16:58 ` Niklas Schnelle
@ 2024-12-12 19:40 ` Niklas Schnelle
2024-12-13 9:43 ` Lukas Wunner
0 siblings, 1 reply; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-12 19:40 UTC (permalink / raw)
To: Niklas Schnelle, Bjorn Helgaas, Lukas Wunner
Cc: linux-pci, Ilpo Jarvinen, Jonathan Cameron, Mika Westerberg,
Maciej W. Rozycki
On Thu, 2024-12-12 at 17:58 +0100, Niklas Schnelle wrote:
> On Thu, 2024-12-12 at 10:11 -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > indicates the *supported* link speeds. The Max Link Speed field in
> > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > >
> > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > Ilpo recalls seeing this inconsistency on more devices.
> > >
> > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > and will thus incorrectly deem higher speeds as supported. Fix it.
> > >
> > > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > > Reported-by: Niklas Schnelle <niks@kernel.org>
> > > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >
> > Looks like you want this in v6.13? Can we make commit log more
> > explicit as to why we need it there? Is this change enough to resolve
> > the boot hang Niklas reported?
>
> As for if it fixes my hang I will test this later today when I come
> home. But even if it is not enough on its own, I believe that this will
> be needed as a prerequisite for the fix of my hang issue in that
> without this patch the dev->supported_speeds incorrectly shows multiple
> speeds as supported making it impossible to suppress probing of bwctrl
> for devices that only support a fixed speed.
Ok, gave this a test and as somewhat suspected this patch alone doesn't
fix my boot hang nor do I get more output (also tried Lukas suggestion
with early_printk).
Then I put my patch using the hweight8(dev->supported_speeds) > 1
condition suggested by Lukas on top and with both things work again. I
would now propose that once we've cleared up Ilpo's comment I sent a
series with both patches for you to easily pick together. If you prefer
I can of course also sent my patch stand alone.
Thanks,
Niklas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 14:33 ` Ilpo Järvinen
@ 2024-12-12 20:10 ` Niklas Schnelle
2024-12-12 22:13 ` Lukas Wunner
1 sibling, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-12 20:10 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Jonathan Cameron, Mika Westerberg,
Maciej W. Rozycki
On Thu, 2024-12-12 at 16:33 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
>
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds. The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> >
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> >
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported. Fix it.
> >
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> > drivers/pci/pci.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 35dc9f2..b730560 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >
> > + /* Ignore speeds higher than Max Link Speed */
> > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
>
> Hi Lukas,
>
> Why do you start GENMASK() from 0th position? That's the reserved bit.
> (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> it is slightly confusing).
>
> I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or
> PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> to make it explicit where it originates from.
>
Hi Ilpo,
I agree this is quite confusing even in the PCIe spec. According to
PCIe r6.2 the value of the Max Link Speed field references a bit in the
Supported Link Speeds Vector with a value of 0b0001 in Max Link Speeds
referring to bit 0 in Supported Link Speeds, a value of 0xb0010 to bit
1 and so on. So the value is actually shiftet left by 1 versus the
typical (i.e. non IBM ;-)) bit counting.
Then looking at the Supported Link Speeds description they refer to bit
0 as 2.5 GT/s, bit 1 as 5 GT/s up to bit 6 (RsvdP) while in the figure
the RsvdP is the right most bit. So unless I have completely confused
myself playing around with this genmask calculator[0] we actually want
GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 1) because just like the Supported
Link Speeds field the dev->supported_speeds also reserves the LSB. And
I feel this actually matches the spec wording pretty well. What do you
think?
Thanks,
Niklas
[0] https://mazdermind.de/genmask/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 14:33 ` Ilpo Järvinen
2024-12-12 20:10 ` Niklas Schnelle
@ 2024-12-12 22:13 ` Lukas Wunner
2024-12-13 10:12 ` Ilpo Järvinen
1 sibling, 1 reply; 14+ messages in thread
From: Lukas Wunner @ 2024-12-12 22:13 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Bjorn Helgaas, linux-pci, Niklas Schnelle, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >
> > + /* Ignore speeds higher than Max Link Speed */
> > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
>
> Why do you start GENMASK() from 0th position? That's the reserved bit.
> (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> it is slightly confusing).
GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
doing a GENMASK(0, 1) here, fulfilling the l > h condition.
Granted, the BUILD_BUG_ON() only triggers if l and h can be
evaluated at compile time, which isn't the case here.
Still, I felt uneasy risking any potential future breakage.
Including the reserved bit 0 is harmless because it's forced to
zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
expression.
> I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or
> PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> to make it explicit where it originates from.
Pardon me for being dense but I don't understand what you mean.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 16:11 ` Bjorn Helgaas
2024-12-12 16:58 ` Niklas Schnelle
@ 2024-12-13 9:16 ` Lukas Wunner
1 sibling, 0 replies; 14+ messages in thread
From: Lukas Wunner @ 2024-12-13 9:16 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, Niklas Schnelle, Ilpo Jarvinen, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
On Thu, Dec 12, 2024 at 10:11:03AM -0600, Bjorn Helgaas wrote:
> On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > indicates the *supported* link speeds. The Max Link Speed field in
> > the Link Capabilities Register indicates the *maximum* of those speeds.
> >
> > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > Ilpo recalls seeing this inconsistency on more devices.
> >
> > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > and will thus incorrectly deem higher speeds as supported. Fix it.
> >
> > Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> > Reported-by: Niklas Schnelle <niks@kernel.org>
> > Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> Looks like you want this in v6.13? Can we make commit log more
> explicit as to why we need it there? Is this change enough to resolve
> the boot hang Niklas reported?
This is more like a "we're not conforming to the spec" kind of fix,
but also happens to be a prerequisite to fixing Niklas' boot hang.
Another user-visible issue addressed here is that we're exposing an
incorrect value in the sysfs "max_link_speed" attribute on devices
such as the JHL7540 Thunderbolt controller mentioned in the commit
message.
Thanks,
Lukas
> > ---
> > drivers/pci/pci.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 35dc9f2..b730560 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> >
> > + /* Ignore speeds higher than Max Link Speed */
> > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > +
> > /* PCIe r3.0-compliant */
> > if (speeds)
> > return speeds;
> >
> > - pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > -
> > /* Synthesize from the Max Link Speed field */
> > if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> > speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
> > --
> > 2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 19:40 ` Niklas Schnelle
@ 2024-12-13 9:43 ` Lukas Wunner
2024-12-13 17:22 ` Niklas Schnelle
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Wunner @ 2024-12-13 9:43 UTC (permalink / raw)
To: Niklas Schnelle
Cc: Niklas Schnelle, Bjorn Helgaas, linux-pci, Ilpo Jarvinen,
Jonathan Cameron, Mika Westerberg, Maciej W. Rozycki
On Thu, Dec 12, 2024 at 08:40:07PM +0100, Niklas Schnelle wrote:
> > > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > > indicates the *supported* link speeds. The Max Link Speed field in
> > > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > > >
> > > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > > Ilpo recalls seeing this inconsistency on more devices.
> > > >
> > > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > > and will thus incorrectly deem higher speeds as supported. Fix it.
>
> Ok, gave this a test and as somewhat suspected this patch alone doesn't
> fix my boot hang nor do I get more output (also tried Lukas suggestion
> with early_printk).
Hm, that's kind of a bummer because while we know how to work around
your boot hang (by disabling bwctrl altogether), we don't really know
the root cause.
The bwctrl IRQ handler runs in hardirq context, so if it ends up in an
infinite loop for some reason or keeps waiting for a spinlock, that
might indeed cause a boot hang. Not that I'm seeing in the code where
that might occur. Nevertheless you can try adding "threadirqs" to the
kernel command line to force all IRQ handlers into threads.
Alternatively, enable CONFIG_PREEMPT_RT to also turn spinlocks into
sleeping locks. Maybe this turns the boot hang into a hung task splat
and thus helps identify the root cause.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-12 22:13 ` Lukas Wunner
@ 2024-12-13 10:12 ` Ilpo Järvinen
2024-12-13 17:21 ` Niklas Schnelle
2024-12-13 17:41 ` Niklas Schnelle
0 siblings, 2 replies; 14+ messages in thread
From: Ilpo Järvinen @ 2024-12-13 10:12 UTC (permalink / raw)
To: Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Niklas Schnelle, Jonathan Cameron,
Mika Westerberg, Maciej W. Rozycki
[-- Attachment #1: Type: text/plain, Size: 2719 bytes --]
On Thu, 12 Dec 2024, Lukas Wunner wrote:
> On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > >
> > > + /* Ignore speeds higher than Max Link Speed */
> > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> >
> > Why do you start GENMASK() from 0th position? That's the reserved bit.
> > (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> > it is slightly confusing).
>
> GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
> does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
> doing a GENMASK(0, 1) here, fulfilling the l > h condition.
>
> Granted, the BUILD_BUG_ON() only triggers if l and h can be
> evaluated at compile time, which isn't the case here.
>
> Still, I felt uneasy risking any potential future breakage.
> Including the reserved bit 0 is harmless because it's forced to
> zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
> expression.
Fair but that's quite many thoughts you didn't record into the commit
message. If you intentionally do confusing tricks like that, please be
mindful about the poor guy who has to figure this out years from now. :-)
(One of the most annoying thing in digging into commits far in the past
are those nagging "Why?" questions nobody is there to answer.)
I know it doesn't misbehave because of bit 0 is cleared by the earlier
statement. (I also thought of the GENMASK() inversion.)
Another option would be to explicitly ensure PCI_EXP_LNKCAP_SLS is at
least PCI_EXP_LNKCAP2_SLS_2_5GB which would also ensure pre-3.0 part
returns always at least one speed (which the current code doesn't
guarantee).
As in more broader terms there are other kinds of broken devices this
code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the
device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
> > I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or
> > PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> > to make it explicit where it originates from.
>
> Pardon me for being dense but I don't understand what you mean.
I meant that instead of GENMASK(..., 0) use
GENMASK(..., __ffs(PCI_EXP_LNKCAP2_SLS)). But it doesn't matter
if go with this bit 0 included into GENMASK() approach.
--
i.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-13 10:12 ` Ilpo Järvinen
@ 2024-12-13 17:21 ` Niklas Schnelle
2024-12-13 17:41 ` Niklas Schnelle
1 sibling, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-13 17:21 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Jonathan Cameron, Mika Westerberg,
Maciej W. Rozycki
On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
>
> > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > >
> > > > + /* Ignore speeds higher than Max Link Speed */
> > > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > >
> > > Why do you start GENMASK() from 0th position? That's the reserved bit.
> > > (I doesn't exactly cause a misbehavior to & the never set 0th bit but
> > > it is slightly confusing).
> >
> > GENMASK() does a BUILD_BUG_ON(l > h) and if a broken PCIe device
> > does not set any bits in the PCI_EXP_LNKCAP_SLS field, I'd risk
> > doing a GENMASK(0, 1) here, fulfilling the l > h condition.
> >
> > Granted, the BUILD_BUG_ON() only triggers if l and h can be
> > evaluated at compile time, which isn't the case here.
> >
> > Still, I felt uneasy risking any potential future breakage.
> > Including the reserved bit 0 is harmless because it's forced to
> > zero by the earlier "speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS"
> > expression.
>
> Fair but that's quite many thoughts you didn't record into the commit
> message. If you intentionally do confusing tricks like that, please be
> mindful about the poor guy who has to figure this out years from now. :-)
> (One of the most annoying thing in digging into commits far in the past
> are those nagging "Why?" questions nobody is there to answer.)
>
> I know it doesn't misbehave because of bit 0 is cleared by the earlier
> statement. (I also thought of the GENMASK() inversion.)
>
> Another option would be to explicitly ensure PCI_EXP_LNKCAP_SLS is at
> least PCI_EXP_LNKCAP2_SLS_2_5GB which would also ensure pre-3.0 part
> returns always at least one speed (which the current code doesn't
> guarantee).
>
> As in more broader terms there are other kinds of broken devices this
> code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the
> device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
>
> > > I suggest to get that either from PCI_EXP_LNKCAP2_SLS_2_5GB or
> > > PCI_EXP_LNKCAP2_SLS (e.g. with __ffs()) and do not use literal at all
> > > to make it explicit where it originates from.
> >
> > Pardon me for being dense but I don't understand what you mean.
>
> I meant that instead of GENMASK(..., 0) use
> GENMASK(..., __ffs(PCI_EXP_LNKCAP2_SLS)). But it doesn't matter
> if go with this bit 0 included into GENMASK() approach.
>
So that's essentially GENMASK(…, 1). The reason I'm not sure if it
really is more self documenting is that anyone trying to grok this is
going to look in the PCIe spec and that already shows the bit 0 as
reserved (if it still is reserved then) and the bit numbering
explaining the offset and the __ffs() is just going to add confusion.
Instead I prefer your other idea of GENMASK(…,
PCI_EXP_LNKCAP2_SLS_2_5GB) because together with that constant's
comment it clearly communicates "starting at LNKCAP2 SLS Vector bit 0".
For the issue of GENMASK() not being well defined for l == 0, I think
the code should handle lnkcap2 == 0 explizitly which is basically your
point above about other broken devices.
Thanks,
Niklas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-13 9:43 ` Lukas Wunner
@ 2024-12-13 17:22 ` Niklas Schnelle
0 siblings, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-13 17:22 UTC (permalink / raw)
To: Lukas Wunner
Cc: Niklas Schnelle, Bjorn Helgaas, linux-pci, Ilpo Jarvinen,
Jonathan Cameron, Mika Westerberg, Maciej W. Rozycki
On Fri, 2024-12-13 at 10:43 +0100, Lukas Wunner wrote:
> On Thu, Dec 12, 2024 at 08:40:07PM +0100, Niklas Schnelle wrote:
> > > > On Thu, Dec 12, 2024 at 09:56:16AM +0100, Lukas Wunner wrote:
> > > > > The Supported Link Speeds Vector in the Link Capabilities 2 Register
> > > > > indicates the *supported* link speeds. The Max Link Speed field in
> > > > > the Link Capabilities Register indicates the *maximum* of those speeds.
> > > > >
> > > > > Niklas reports that the Intel JHL7540 "Titan Ridge 2018" Thunderbolt
> > > > > controller supports 2.5-8 GT/s speeds, but indicates 2.5 GT/s as maximum.
> > > > > Ilpo recalls seeing this inconsistency on more devices.
> > > > >
> > > > > pcie_get_supported_speeds() neglects to honor the Max Link Speed field
> > > > > and will thus incorrectly deem higher speeds as supported. Fix it.
> >
> > Ok, gave this a test and as somewhat suspected this patch alone doesn't
> > fix my boot hang nor do I get more output (also tried Lukas suggestion
> > with early_printk).
>
> Hm, that's kind of a bummer because while we know how to work around
> your boot hang (by disabling bwctrl altogether), we don't really know
> the root cause.
>
> The bwctrl IRQ handler runs in hardirq context, so if it ends up in an
> infinite loop for some reason or keeps waiting for a spinlock, that
> might indeed cause a boot hang. Not that I'm seeing in the code where
> that might occur. Nevertheless you can try adding "threadirqs" to the
> kernel command line to force all IRQ handlers into threads.
> Alternatively, enable CONFIG_PREEMPT_RT to also turn spinlocks into
> sleeping locks. Maybe this turns the boot hang into a hung task splat
> and thus helps identify the root cause.
>
> Thanks,
>
> Lukas
Tried with "threadirqs" but no change. So then I tried additionally
just exiting pcie_bwnotif_irq() without doing anything if the bus
number matches my thunderbolt controller. Still same result. So I don't
think it is anything the irq handler does in software.
Thanks,
Niklas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-13 10:12 ` Ilpo Järvinen
2024-12-13 17:21 ` Niklas Schnelle
@ 2024-12-13 17:41 ` Niklas Schnelle
2024-12-13 18:49 ` Niklas Schnelle
1 sibling, 1 reply; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-13 17:41 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Jonathan Cameron, Mika Westerberg,
Maciej W. Rozycki
On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> On Thu, 12 Dec 2024, Lukas Wunner wrote:
>
> > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > >
> > > > + /* Ignore speeds higher than Max Link Speed */
> > > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > >
>
---8<---
> As in more broader terms there are other kinds of broken devices this
> code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the
> device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
On second look I don't think it will. If lnkcap2 & PCI_EXP_LNKCAP2_SLS
is 0 it will proceed to the synthesize part and rely on
PCI_EXP_LNKCAP_SLS alone. The potentially broken part I see is when
lnkcap2 has bits set but lnkcap doesn't which is also when the
GENMASK(…, 1) would become weird. Not sure what the right handling for
that is though.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds
2024-12-13 17:41 ` Niklas Schnelle
@ 2024-12-13 18:49 ` Niklas Schnelle
0 siblings, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2024-12-13 18:49 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner
Cc: Bjorn Helgaas, linux-pci, Jonathan Cameron, Mika Westerberg,
Maciej W. Rozycki
On Fri, 2024-12-13 at 18:41 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-13 at 12:12 +0200, Ilpo Järvinen wrote:
> > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> >
> > > On Thu, Dec 12, 2024 at 04:33:23PM +0200, Ilpo Järvinen wrote:
> > > > On Thu, 12 Dec 2024, Lukas Wunner wrote:
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
> > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > > > > speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
> > > > >
> > > > > + /* Ignore speeds higher than Max Link Speed */
> > > > > + pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > > > + speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> > > >
> >
>
> ---8<---
>
> > As in more broader terms there are other kinds of broken devices this
> > code doesn't handle. If PCI_EXP_LNKCAP2_SLS is empty of bits but the
> > device has >5GT/s in PCI_EXP_LNKCAP_SLS, this function will return 0.
>
> On second look I don't think it will. If lnkcap2 & PCI_EXP_LNKCAP2_SLS
> is 0 it will proceed to the synthesize part and rely on
> PCI_EXP_LNKCAP_SLS alone. The potentially broken part I see is when
> lnkcap2 has bits set but lnkcap doesn't which is also when the
> GENMASK(…, 1) would become weird. Not sure what the right handling for
> that is though.
Never mind, I missed the >5 GT/s bit. Though I think that returning 0
speeds is probably the safest bet for a broken device, no?
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-12-13 18:49 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-12 8:56 [PATCH for-linus] PCI: Honor Max Link Speed when determining supported speeds Lukas Wunner
2024-12-12 14:33 ` Ilpo Järvinen
2024-12-12 20:10 ` Niklas Schnelle
2024-12-12 22:13 ` Lukas Wunner
2024-12-13 10:12 ` Ilpo Järvinen
2024-12-13 17:21 ` Niklas Schnelle
2024-12-13 17:41 ` Niklas Schnelle
2024-12-13 18:49 ` Niklas Schnelle
2024-12-12 16:11 ` Bjorn Helgaas
2024-12-12 16:58 ` Niklas Schnelle
2024-12-12 19:40 ` Niklas Schnelle
2024-12-13 9:43 ` Lukas Wunner
2024-12-13 17:22 ` Niklas Schnelle
2024-12-13 9:16 ` Lukas Wunner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox