public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
@ 2025-04-22 11:55 Ilpo Järvinen
  2025-04-23 10:07 ` Lukas Wunner
  2025-04-23 21:04 ` Bjorn Helgaas
  0 siblings, 2 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2025-04-22 11:55 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, Ilpo Järvinen, linux-pci,
	linux-kernel

PCIe BW controller counts LBMS assertions for the purposes of the
Target Speed quirk. There was also a plan to expose the LBMS count
through sysfs to allow better diagnosis of link related issues. Lukas
Wunner suggested, however, that adding a trace event would be better
for diagnostic purposes. That leaves the Target Speed quirk as the only
user of lbms_count.

The logic in the Target Speed quirk does not require keeping count of
LBMS assertions; a simple flag, which can be placed into pci_dev's
priv_flags, is enough. The reduced complexity allows removing
pcie_bwctrl_lbms_rwsem.

As bwctrl is not yet probed when the Target Speed quirk runs during
boot, the LBMS bit in the Link Status register still has to be checked
by the quirk.

The priv_flags numbering is not continuous because hotplug code added
a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed
through different branches).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---

This will conflict with the new flags Lukas added due to the hp fixes
but it should be simple to deal with that conflict while merging the
topic branches.

v2:
- Changed flag value to 6 to make merging with the newly added HP flags easier
- Renamed flag to PCI_LINK_LBMS_SEEN to match the naming of the new HP flags

 drivers/pci/hotplug/pciehp_ctrl.c |  2 +-
 drivers/pci/pci.c                 |  2 +-
 drivers/pci/pci.h                 | 10 ++---
 drivers/pci/pcie/bwctrl.c         | 63 +++++++++----------------------
 drivers/pci/quirks.c              | 10 ++---
 5 files changed, 25 insertions(+), 62 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index d603a7aa7483..bcc938d4420f 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -131,7 +131,7 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
 			      INDICATOR_NOOP);
 
 	/* Don't carry LBMS indications across */
-	pcie_reset_lbms_count(ctrl->pcie->port);
+	pcie_reset_lbms(ctrl->pcie->port);
 }
 
 static int pciehp_enable_slot(struct controller *ctrl);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..3d94cf33c1b6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4757,7 +4757,7 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
 	 * to track link speed or width changes made by hardware itself
 	 * in attempt to correct unreliable link operation.
 	 */
-	pcie_reset_lbms_count(pdev);
+	pcie_reset_lbms(pdev);
 	return rc;
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index b81e99cd4b62..887811fbe722 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -557,6 +557,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 #define PCI_DPC_RECOVERED 1
 #define PCI_DPC_RECOVERING 2
 #define PCI_DEV_REMOVED 3
+#define PCI_LINK_LBMS_SEEN	6
 
 static inline void pci_dev_assign_added(struct pci_dev *dev)
 {
@@ -824,14 +825,9 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 #endif
 
 #ifdef CONFIG_PCIEPORTBUS
-void pcie_reset_lbms_count(struct pci_dev *port);
-int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
+void pcie_reset_lbms(struct pci_dev *port);
 #else
-static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
-static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
-{
-	return -EOPNOTSUPP;
-}
+static inline void pcie_reset_lbms(struct pci_dev *port) {}
 #endif
 
 struct pci_dev_reset_methods {
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index d8d2aa85a229..ab0387ff2de2 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -38,12 +38,10 @@
 /**
  * struct pcie_bwctrl_data - PCIe bandwidth controller
  * @set_speed_mutex:	Serializes link speed changes
- * @lbms_count:		Count for LBMS (since last reset)
  * @cdev:		Thermal cooling device associated with the port
  */
 struct pcie_bwctrl_data {
 	struct mutex set_speed_mutex;
-	atomic_t lbms_count;
 	struct thermal_cooling_device *cdev;
 };
 
@@ -202,15 +200,14 @@ int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
 
 static void pcie_bwnotif_enable(struct pcie_device *srv)
 {
-	struct pcie_bwctrl_data *data = srv->port->link_bwctrl;
 	struct pci_dev *port = srv->port;
 	u16 link_status;
 	int ret;
 
-	/* Count LBMS seen so far as one */
+	/* Note down if LBMS has been seen so far */
 	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
 	if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS)
-		atomic_inc(&data->lbms_count);
+		set_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
 
 	pcie_capability_set_word(port, PCI_EXP_LNKCTL,
 				 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
@@ -233,7 +230,6 @@ static void pcie_bwnotif_disable(struct pci_dev *port)
 static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 {
 	struct pcie_device *srv = context;
-	struct pcie_bwctrl_data *data = srv->port->link_bwctrl;
 	struct pci_dev *port = srv->port;
 	u16 link_status, events;
 	int ret;
@@ -247,7 +243,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 		return IRQ_NONE;
 
 	if (events & PCI_EXP_LNKSTA_LBMS)
-		atomic_inc(&data->lbms_count);
+		set_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
 
 	pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
 
@@ -262,31 +258,10 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
 	return IRQ_HANDLED;
 }
 
-void pcie_reset_lbms_count(struct pci_dev *port)
+void pcie_reset_lbms(struct pci_dev *port)
 {
-	struct pcie_bwctrl_data *data;
-
-	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
-	data = port->link_bwctrl;
-	if (data)
-		atomic_set(&data->lbms_count, 0);
-	else
-		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
-					   PCI_EXP_LNKSTA_LBMS);
-}
-
-int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
-{
-	struct pcie_bwctrl_data *data;
-
-	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
-	data = port->link_bwctrl;
-	if (!data)
-		return -ENOTTY;
-
-	*val = atomic_read(&data->lbms_count);
-
-	return 0;
+	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
+	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
 }
 
 static int pcie_bwnotif_probe(struct pcie_device *srv)
@@ -308,18 +283,16 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
 		return ret;
 
 	scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) {
-		scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) {
-			port->link_bwctrl = data;
+		port->link_bwctrl = data;
 
-			ret = request_irq(srv->irq, pcie_bwnotif_irq,
-					  IRQF_SHARED, "PCIe bwctrl", srv);
-			if (ret) {
-				port->link_bwctrl = NULL;
-				return ret;
-			}
-
-			pcie_bwnotif_enable(srv);
+		ret = request_irq(srv->irq, pcie_bwnotif_irq,
+				  IRQF_SHARED, "PCIe bwctrl", srv);
+		if (ret) {
+			port->link_bwctrl = NULL;
+			return ret;
 		}
+
+		pcie_bwnotif_enable(srv);
 	}
 
 	pci_dbg(port, "enabled with IRQ %d\n", srv->irq);
@@ -339,13 +312,11 @@ static void pcie_bwnotif_remove(struct pcie_device *srv)
 	pcie_cooling_device_unregister(data->cdev);
 
 	scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) {
-		scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) {
-			pcie_bwnotif_disable(srv->port);
+		pcie_bwnotif_disable(srv->port);
 
-			free_irq(srv->irq, srv);
+		free_irq(srv->irq, srv);
 
-			srv->port->link_bwctrl = NULL;
-		}
+		srv->port->link_bwctrl = NULL;
 	}
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8d610c17e0f2..64ac1ee944d3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -38,14 +38,10 @@
 
 static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
 {
-	unsigned long count;
-	int ret;
-
-	ret = pcie_lbms_count(dev, &count);
-	if (ret < 0)
-		return lnksta & PCI_EXP_LNKSTA_LBMS;
+	if (test_bit(PCI_LINK_LBMS_SEEN, &dev->priv_flags))
+		return true;
 
-	return count > 0;
+	return lnksta & PCI_EXP_LNKSTA_LBMS;
 }
 
 /*

base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
-- 
2.39.5



* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-22 11:55 [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag Ilpo Järvinen
@ 2025-04-23 10:07 ` Lukas Wunner
  2025-04-23 11:37   ` Ilpo Järvinen
  2025-04-23 21:04 ` Bjorn Helgaas
  1 sibling, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2025-04-23 10:07 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Maciej W. Rozycki

[cc += Maciej, start of thread is here:
https://lore.kernel.org/r/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com/
]

On Tue, Apr 22, 2025 at 02:55:47PM +0300, Ilpo Järvinen wrote:
> +void pcie_reset_lbms(struct pci_dev *port)
>  {
> -	struct pcie_bwctrl_data *data;
> -
> -	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
> -	data = port->link_bwctrl;
> -	if (data)
> -		atomic_set(&data->lbms_count, 0);
> -	else
> -		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
> -					   PCI_EXP_LNKSTA_LBMS);
> +	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
> +	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
>  }

Hm, previously the LBMS bit was only cleared in the Link Status register
if the bandwidth controller hadn't probed yet.  Now it's cleared
unconditionally.  I'm wondering if this changes the logic somehow?


>  static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
>  {
> -	unsigned long count;
> -	int ret;
> -
> -	ret = pcie_lbms_count(dev, &count);
> -	if (ret < 0)
> -		return lnksta & PCI_EXP_LNKSTA_LBMS;
> +	if (test_bit(PCI_LINK_LBMS_SEEN, &dev->priv_flags))
> +		return true;
>  
> -	return count > 0;
> +	return lnksta & PCI_EXP_LNKSTA_LBMS;
>  }

Another small logic change here:  Previously pcie_lbms_count()
returned a negative value if the bandwidth controller hadn't
probed yet or wasn't compiled into the kernel.  Only in those
two cases was the LBMS flag in the lnksta variable returned.

Now the LBMS flag is also returned if the bandwidth controller
is compiled into the kernel and has probed, but its irq handler
hasn't recorded a seen LBMS bit yet.

I'm guessing this can happen if the quirk races with the irq
handler and wins the race, so this safety net is needed?
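
To spell out my understanding, the quirk's view before and after this
patch is roughly:

                       before                  after
  bwctrl not probed    lnksta & LBMS           lnksta & LBMS (flag unset)
  bwctrl probed        lbms_count > 0          flag || (lnksta & LBMS)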

This is quite subtle so I thought I'd ask.  The patch otherwise
LGTM, so assuming the two subtle logic changes above are intentional
and can be explained, this is

Reviewed-by: Lukas Wunner <lukas@wunner.de>

Thanks,

Lukas


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-23 10:07 ` Lukas Wunner
@ 2025-04-23 11:37   ` Ilpo Järvinen
  2025-04-24  5:38     ` Lukas Wunner
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2025-04-23 11:37 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Wed, 23 Apr 2025, Lukas Wunner wrote:

> [cc += Maciej, start of thread is here:
> https://lore.kernel.org/r/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com/
> ]
> 
> On Tue, Apr 22, 2025 at 02:55:47PM +0300, Ilpo Järvinen wrote:
> > +void pcie_reset_lbms(struct pci_dev *port)
> >  {
> > -	struct pcie_bwctrl_data *data;
> > -
> > -	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
> > -	data = port->link_bwctrl;
> > -	if (data)
> > -		atomic_set(&data->lbms_count, 0);
> > -	else
> > -		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
> > -					   PCI_EXP_LNKSTA_LBMS);
> > +	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
> > +	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
> >  }
> 
> Hm, previously the LBMS bit was only cleared in the Link Status register
> if the bandwidth controller hadn't probed yet.  Now it's cleared
> unconditionally.  I'm wondering if this changes the logic somehow?

Hmm, that's a good question and I hadn't thought through all the
implications. I suppose leaving if (!port->link_bwctrl) there would
retain the existing behavior better, allowing bwctrl to pick up the
link speed changes more reliably.

However, I'm not entirely sure if the old code was a good idea either,
as it assumed the irq handler had read LBMS by the time lbms_count was
reset. Solving that would seemingly require locking to avoid racing
with remove, which just got removed (LOL) :-(.

Given this flag is only for the purposes of the quirk, it seems very much
out of proportion. The quirk seeing an extra LBMS doesn't seem to have a
big practical impact. In the worst case, the link speed becomes gen1 if
the quirk fails to restore the original link speed for some reason (which,
IIRC, it didn't yet attempt to do when the original LBMS reset code was
added).

So I'd prefer going with the if (!port->link_bwctrl) solution.

> >  static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
> >  {
> > -	unsigned long count;
> > -	int ret;
> > -
> > -	ret = pcie_lbms_count(dev, &count);
> > -	if (ret < 0)
> > -		return lnksta & PCI_EXP_LNKSTA_LBMS;
> > +	if (test_bit(PCI_LINK_LBMS_SEEN, &dev->priv_flags))
> > +		return true;
> >  
> > -	return count > 0;
> > +	return lnksta & PCI_EXP_LNKSTA_LBMS;
> >  }
> 
> Another small logic change here:  Previously pcie_lbms_count()
> returned a negative value if the bandwidth controller hadn't
> probed yet or wasn't compiled into the kernel.

One cannot disable bwctrl; it always comes on with PCIe.

> Only in those two cases was the LBMS flag in the lnksta variable 
> returned.
> 
> Now the LBMS flag is also returned if the bandwidth controller
> is compiled into the kernel and has probed, but its irq handler
> hasn't recorded a seen LBMS bit yet.
> 
> I'm guessing this can happen if the quirk races with the irq
> handler and wins the race, so this safety net is needed?

The main reason why this check is here is to cover boot, when bwctrl is
not yet probed at the time the quirk runs. But the check just seems
harmless, or even somewhat useful, in the case where bwctrl has already
probed: LBMS being asserted should result in PCI_LINK_LBMS_SEEN even if
the irq handler has not yet done its job of transferring it into
priv_flags.

> This is quite subtle so I thought I'd ask.

It's good that you asked! :-)

> The patch otherwise
> LGTM, so assuming the two subtle logic changes above are intentional
> and can be explained, this is
> 
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> 
> Thanks,
> 
> Lukas
> 

-- 
 i.


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-22 11:55 [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag Ilpo Järvinen
  2025-04-23 10:07 ` Lukas Wunner
@ 2025-04-23 21:04 ` Bjorn Helgaas
  1 sibling, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2025-04-23 21:04 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Lukas Wunner, Bjorn Helgaas, linux-pci, linux-kernel

On Tue, Apr 22, 2025 at 02:55:47PM +0300, Ilpo Järvinen wrote:
> PCIe BW controller counts LBMS assertions for the purposes of the
> Target Speed quirk. There was also a plan to expose the LBMS count
> through sysfs to allow better diagnosis of link related issues. Lukas
> Wunner suggested, however, that adding a trace event would be better
> for diagnostic purposes. That leaves the Target Speed quirk as the only
> user of lbms_count.
> 
> The logic in the Target Speed quirk does not require keeping count of
> LBMS assertions; a simple flag, which can be placed into pci_dev's
> priv_flags, is enough. The reduced complexity allows removing
> pcie_bwctrl_lbms_rwsem.
> 
> As bwctrl is not yet probed when the Target Speed quirk runs during
> boot, the LBMS bit in the Link Status register still has to be checked
> by the quirk.
> 
> The priv_flags numbering is not continuous because hotplug code added
> a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed
> through different branches).
> 
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
> 
> This will conflict with the new flags Lukas added due to the hp fixes
> but it should be simple to deal with that conflict while merging the
> topic branches.
> 
> v2:
> - Changed flag value to 6 to make merging with the newly added HP flags easier
> - Renamed flag to PCI_LINK_LBMS_SEEN to match the naming of the new HP flags
> 
>  drivers/pci/hotplug/pciehp_ctrl.c |  2 +-
>  drivers/pci/pci.c                 |  2 +-
>  drivers/pci/pci.h                 | 10 ++---
>  drivers/pci/pcie/bwctrl.c         | 63 +++++++++----------------------
>  drivers/pci/quirks.c              | 10 ++---
>  5 files changed, 25 insertions(+), 62 deletions(-)

Applied to pci/bwctrl for v6.16, on the assumption that everybody's
happy with this.  Thanks!



* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-23 11:37   ` Ilpo Järvinen
@ 2025-04-24  5:38     ` Lukas Wunner
  2025-04-24 12:37       ` Ilpo Järvinen
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2025-04-24  5:38 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Wed, Apr 23, 2025 at 02:37:11PM +0300, Ilpo Järvinen wrote:
> On Wed, 23 Apr 2025, Lukas Wunner wrote:
> > On Tue, Apr 22, 2025 at 02:55:47PM +0300, Ilpo Järvinen wrote:
> > > +void pcie_reset_lbms(struct pci_dev *port)
> > >  {
> > > -	struct pcie_bwctrl_data *data;
> > > -
> > > -	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
> > > -	data = port->link_bwctrl;
> > > -	if (data)
> > > -		atomic_set(&data->lbms_count, 0);
> > > -	else
> > > -		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
> > > -					   PCI_EXP_LNKSTA_LBMS);
> > > +	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
> > > +	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
> > >  }
> > 
> > Hm, previously the LBMS bit was only cleared in the Link Status register
> > if the bandwidth controller hadn't probed yet.  Now it's cleared
> > unconditionally.  I'm wondering if this changes the logic somehow?
> 
> Hmm, that's a good question and I hadn't thought through all the
> implications. I suppose leaving if (!port->link_bwctrl) there would
> retain the existing behavior better, allowing bwctrl to pick up the
> link speed changes more reliably.

I think the only potential issue with clearing the LBMS bit in the register
is that the bandwidth controller's irq handler won't see the bit and may
return with IRQ_NONE.

However, looking at the callers of pcie_reset_lbms(), that doesn't seem
to be a real issue.  There are only two of them:

- pcie_retrain_link() calls the function after the link was retrained.
  I guess the LBMS bit in the register may be set as a side-effect of
  the link retraining?  The only concern here is whether the cached
  link speed is updated.  pcie_bwctrl_change_speed() does call
  pcie_update_link_speed() after calling pcie_retrain_link(), so that
  looks fine.  But there's a second caller of pcie_retrain_link():
  pcie_aspm_configure_common_clock().  It doesn't update the cached
  link speed after calling pcie_retrain_link().  Not sure if this can
  lead to a change in link speed and therefore whether the cached link
  speed should be updated?  The Target Link Speed isn't changed, but maybe
  the link fails to retrain to the same speed for electrical reasons?

- pciehp's remove_board() calls the function after bringing down the slot
  to avoid a stale PCI_LINK_LBMS_SEEN flag.  No real harm in clearing the
  bit in the register at this point I guess.  But I do wonder, is the link
  speed updated somewhere when a new board is added?  The replacement
  device may not support the same speeds as the previous device.


> Given this flag is only for the purposes of the quirk, it seems very much 
> out of proportion.

Yes, let's try to minimize the amount of locking, flags and code to support
the quirk.  Keep it as simple as possible.  So in that sense, the solution
you've chosen is probably fine.


> > >  static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
> > >  {
> > > -	unsigned long count;
> > > -	int ret;
> > > -
> > > -	ret = pcie_lbms_count(dev, &count);
> > > -	if (ret < 0)
> > > -		return lnksta & PCI_EXP_LNKSTA_LBMS;
> > > +	if (test_bit(PCI_LINK_LBMS_SEEN, &dev->priv_flags))
> > > +		return true;
> > >  
> > > -	return count > 0;
> > > +	return lnksta & PCI_EXP_LNKSTA_LBMS;
> > >  }
> > 
> > Another small logic change here:  Previously pcie_lbms_count()
> > returned a negative value if the bandwidth controller hadn't
> > probed yet or wasn't compiled into the kernel.
> > Only in those two cases was the LBMS flag in the lnksta variable 
> > returned.
> > 
> > Now the LBMS flag is also returned if the bandwidth controller
> > is compiled into the kernel and has probed, but its irq handler
> > hasn't recorded a seen LBMS bit yet.
> > 
> > I'm guessing this can happen if the quirk races with the irq
> > handler and wins the race, so this safety net is needed?
> 
> The main reason why this check is here is to cover boot, when bwctrl is
> not yet probed at the time the quirk runs. But the check just seems
> harmless, or even somewhat useful, in the case where bwctrl has already
> probed: LBMS being asserted should result in PCI_LINK_LBMS_SEEN even if
> the irq handler has not yet done its job of transferring it into
> priv_flags.

Okay I'm convinced that the logic change in pcie_lbms_seen() is fine.

Thanks,

Lukas


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-24  5:38     ` Lukas Wunner
@ 2025-04-24 12:37       ` Ilpo Järvinen
  2025-04-25 10:12         ` Lukas Wunner
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2025-04-24 12:37 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Thu, 24 Apr 2025, Lukas Wunner wrote:

> On Wed, Apr 23, 2025 at 02:37:11PM +0300, Ilpo Järvinen wrote:
> > On Wed, 23 Apr 2025, Lukas Wunner wrote:
> > > On Tue, Apr 22, 2025 at 02:55:47PM +0300, Ilpo Järvinen wrote:
> > > > +void pcie_reset_lbms(struct pci_dev *port)
> > > >  {
> > > > -	struct pcie_bwctrl_data *data;
> > > > -
> > > > -	guard(rwsem_read)(&pcie_bwctrl_lbms_rwsem);
> > > > -	data = port->link_bwctrl;
> > > > -	if (data)
> > > > -		atomic_set(&data->lbms_count, 0);
> > > > -	else
> > > > -		pcie_capability_write_word(port, PCI_EXP_LNKSTA,
> > > > -					   PCI_EXP_LNKSTA_LBMS);
> > > > +	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
> > > > +	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
> > > >  }
> > > 
> > > Hm, previously the LBMS bit was only cleared in the Link Status register
> > > if the bandwidth controller hadn't probed yet.  Now it's cleared
> > > unconditionally.  I'm wondering if this changes the logic somehow?
> > 
> > Hmm, that's a good question and I hadn't thought through all the
> > implications. I suppose leaving if (!port->link_bwctrl) there would
> > retain the existing behavior better, allowing bwctrl to pick up the
> > link speed changes more reliably.
> 
> I think the only potential issue with clearing the LBMS bit in the register
> is that the bandwidth controller's irq handler won't see the bit and may
> return with IRQ_NONE.
> 
> However, looking at the callers of pcie_reset_lbms(), that doesn't seem
> to be a real issue.  There are only two of them:
> 
> - pcie_retrain_link() calls the function after the link was retrained.
>   I guess the LBMS bit in the register may be set as a side-effect of
>   the link retraining?

Retraining does set LBMS; whether the speed was the same before doesn't
matter. I think it's because, LTSSM-wise, retraining transitions through
Recovery.
 
(I don't know why, but in most tests I've done LBMS is actually asserted 
not only once but twice with only one Link Retraining event).

>   The only concern here is whether the cached
>   link speed is updated.  pcie_bwctrl_change_speed() does call
>   pcie_update_link_speed() after calling pcie_retrain_link(), so that
>   looks fine.  But there's a second caller of pcie_retrain_link():
>   pcie_aspm_configure_common_clock().  It doesn't update the cached
>   link speed after calling pcie_retrain_link().  Not sure if this can
>   lead to a change in link speed and therefore whether the cached link
>   speed should be updated?  The Target Link Speed isn't changed, but maybe
>   the link fails to retrain to the same speed for electrical reasons?

I've never seen that happen, but it would seem odd if that were forbidden
(as the alternative is probably that the link remains down).

Perhaps pcie_reset_lbms() should just call pcie_update_link_speed() as the 
last step, then the irq handler returning IRQ_NONE doesn't matter.
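
Something along these lines (untested sketch; I'm assuming here a
variant of pcie_update_link_speed() that takes just the port and
re-reads the Link Status register itself):

void pcie_reset_lbms(struct pci_dev *port)
{
	clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
	pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);

	/*
	 * Refresh the cached link speed in case clearing LBMS above makes
	 * the irq handler return IRQ_NONE and miss a speed change.
	 */
	pcie_update_link_speed(port);
}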

> - pciehp's remove_board() calls the function after bringing down the slot
>   to avoid a stale PCI_LINK_LBMS_SEEN flag.  No real harm in clearing the
>   bit in the register at this point I guess.  But I do wonder, is the link
>   speed updated somewhere when a new board is added?  The replacement
>   device may not support the same speeds as the previous device.

The supported speeds are always recalculated using dev->supported_speeds. 
A new board implies a new pci_dev structure with newly read supported 
speeds. Also, bringing the link up with the replacement device will
trigger LBMS, so the new Link Speed should be picked up by that.

Racing LBMS reset from remove_board() with LBMS due to the replacement 
board shouldn't result in stale Link Speed because of:

board_added()
  pciehp_check_link_status()
    __pcie_update_link_speed()

> > Given this flag is only for the purposes of the quirk, it seems very much 
> > out of proportion.
> 
> Yes, let's try to minimize the amount of locking, flags and code to support
> the quirk.  Keep it as simple as possible.  So in that sense, the solution
> you've chosen is probably fine.
> 
> 
> > > >  static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
> > > >  {
> > > > -	unsigned long count;
> > > > -	int ret;
> > > > -
> > > > -	ret = pcie_lbms_count(dev, &count);
> > > > -	if (ret < 0)
> > > > -		return lnksta & PCI_EXP_LNKSTA_LBMS;
> > > > +	if (test_bit(PCI_LINK_LBMS_SEEN, &dev->priv_flags))
> > > > +		return true;
> > > >  
> > > > -	return count > 0;
> > > > +	return lnksta & PCI_EXP_LNKSTA_LBMS;
> > > >  }
> > > 
> > > Another small logic change here:  Previously pcie_lbms_count()
> > > returned a negative value if the bandwidth controller hadn't
> > > probed yet or wasn't compiled into the kernel.
> > > Only in those two cases was the LBMS flag in the lnksta variable 
> > > returned.
> > > 
> > > Now the LBMS flag is also returned if the bandwidth controller
> > > is compiled into the kernel and has probed, but its irq handler
> > > hasn't recorded a seen LBMS bit yet.
> > > 
> > > I'm guessing this can happen if the quirk races with the irq
> > > handler and wins the race, so this safety net is needed?
> > 
> > The main reason why this check is here is to cover boot, when bwctrl is
> > not yet probed at the time the quirk runs. But the check just seems
> > harmless, or even somewhat useful, in the case where bwctrl has already
> > probed: LBMS being asserted should result in PCI_LINK_LBMS_SEEN even if
> > the irq handler has not yet done its job of transferring it into
> > priv_flags.
> 
> Okay I'm convinced that the logic change in pcie_lbms_seen() is fine.
> 
> Thanks,
> 
> Lukas
> 

-- 
 i.


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-24 12:37       ` Ilpo Järvinen
@ 2025-04-25 10:12         ` Lukas Wunner
  2025-04-25 12:24           ` Ilpo Järvinen
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2025-04-25 10:12 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Thu, Apr 24, 2025 at 03:37:38PM +0300, Ilpo Järvinen wrote:
> On Thu, 24 Apr 2025, Lukas Wunner wrote:
> >   The only concern here is whether the cached
> >   link speed is updated.  pcie_bwctrl_change_speed() does call
> >   pcie_update_link_speed() after calling pcie_retrain_link(), so that
> >   looks fine.  But there's a second caller of pcie_retrain_link():
> >   pcie_aspm_configure_common_clock().  It doesn't update the cached
> >   link speed after calling pcie_retrain_link().  Not sure if this can
> > >   lead to a change in link speed and therefore whether the cached link
> > >   speed should be updated?  The Target Link Speed isn't changed, but maybe
> >   the link fails to retrain to the same speed for electrical reasons?
> 
> I've never seen that happen, but it would seem odd if that were forbidden
> (as the alternative is probably that the link remains down).
> 
> Perhaps pcie_reset_lbms() should just call pcie_update_link_speed() as the 
> last step, then the irq handler returning IRQ_NONE doesn't matter.

Why pcie_reset_lbms()?  I was rather thinking that pcie_update_link_speed()
should be called from pcie_retrain_link().  Maybe right after the final
pcie_wait_for_link_status().
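
Roughly like this (untested sketch of the tail of pcie_retrain_link();
again assuming a form of pcie_update_link_speed() that takes the port):

	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);

	/*
	 * Retraining may settle at a different speed even if the Target
	 * Link Speed is unchanged, so refresh the cached link speed
	 * before LBMS is cleared below.
	 */
	pcie_update_link_speed(pdev);

	/*
	 * Clear LBMS after a manual retrain so that the bit can be used
	 * to track link speed or width changes made by hardware itself
	 * in attempt to correct unreliable link operation.
	 */
	pcie_reset_lbms(pdev);
	return rc;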

That would ensure that the speed is updated in case retraining from
pcie_aspm_configure_common_clock() happens to lead to a lower speed.
And the call to pcie_update_link_speed() from pcie_bwctrl_change_speed()
could then be dropped.

PCIe r6.2 sec 7.5.3.19 says the Target Link Speed "sets an upper limit
on Link operational speed", which implies that the actual negotiated
speed might be lower.


> > - pciehp's remove_board() calls the function after bringing down the slot
> >   to avoid a stale PCI_LINK_LBMS_SEEN flag.  No real harm in clearing the
> >   bit in the register at this point I guess.  But I do wonder, is the link
> >   speed updated somewhere when a new board is added?  The replacement
> >   device may not support the same speeds as the previous device.
> 
> The supported speeds are always recalculated using dev->supported_speeds. 
> A new board implies a new pci_dev structure with newly read supported 
> speeds. Also, bringing the link up with the replacement device will
> trigger LBMS, so the new Link Speed should be picked up by that.
> 
> Racing LBMS reset from remove_board() with LBMS due to the replacement 
> board shouldn't result in stale Link Speed because of:
> 
> board_added()
>   pciehp_check_link_status()
>     __pcie_update_link_speed()

Good!  That's the information I was looking for.

Thanks,

Lukas


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-25 10:12         ` Lukas Wunner
@ 2025-04-25 12:24           ` Ilpo Järvinen
  2025-04-29 10:02             ` Lukas Wunner
  0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2025-04-25 12:24 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Fri, 25 Apr 2025, Lukas Wunner wrote:
> On Thu, Apr 24, 2025 at 03:37:38PM +0300, Ilpo Järvinen wrote:
> > On Thu, 24 Apr 2025, Lukas Wunner wrote:
> > >   The only concern here is whether the cached
> > >   link speed is updated.  pcie_bwctrl_change_speed() does call
> > >   pcie_update_link_speed() after calling pcie_retrain_link(), so that
> > >   looks fine.  But there's a second caller of pcie_retrain_link():
> > >   pcie_aspm_configure_common_clock().  It doesn't update the cached
> > >   link speed after calling pcie_retrain_link().  Not sure if this can
> > >   lead to a change in link speed and therefore whether the cached link
> > >   speed should be updated?  The Target Link Speed isn't changed, but maybe
> > >   the link fails to retrain to the same speed for electrical reasons?
> > 
> > I've never seen that happen, but it would seem odd if that were forbidden
> > (as the alternative is probably that the link remains down).
> > 
> > Perhaps pcie_reset_lbms() should just call pcie_update_link_speed() as the 
> > last step, then the irq handler returning IRQ_NONE doesn't matter.
> 
> Why pcie_reset_lbms()?  I was rather thinking that pcie_update_link_speed()
> should be called from pcie_retrain_link().  Maybe right after the final
> pcie_wait_for_link_status().

My rationale for having it in pcie_reset_lbms() is that LBMS is cleared
there, which races with the irq handler reading LBMS. If LBMS is cleared
before the irq handler reads the Link Status register, the handler
returns IRQ_NONE and misses the LBMS event. So this race problem is
directly associated with the write-to-clear of LBMS.
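
I.e. the losing interleaving is:

  pcie_reset_lbms()                       pcie_bwnotif_irq()
  clear_bit(PCI_LINK_LBMS_SEEN, ...)
  LNKSTA write clears LBMS (W1C)
                                          reads LNKSTA, LBMS not set
                                          returns IRQ_NONE, event lost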

> That would ensure that the speed is updated in case retraining from
> pcie_aspm_configure_common_clock() happens to lead to a lower speed.
> And the call to pcie_update_link_speed() from pcie_bwctrl_change_speed()
> could then be dropped.
>
> PCIe r6.2 sec 7.5.3.19 says the Target Link Speed "sets an upper limit
> on Link operational speed", which implies that the actual negotiated
> speed might be lower.

While I don't disagree with that spec interpretation, in the case of ASPM
the question is more complex than that: the link was already trained to
speed x, so can the new link training (in practice) fail to train back to
x?

The funny problem here is that all 3 places have different, but good,
reasons to call pcie_update_link_speed():

1) pcie_reset_lbms() because of the LBMS race mentioned above.

2) pcie_retrain_link() because Link Speed could have changed because of 
   the link training.

3) pcie_bwctrl_change_speed() because it asked to change the link speed,
   and also due to the known HW issue that some platforms do not reliably
   send LBMS (I don't recall if it was an interrupt triggering issue or an
   issue with asserting LBMS itself; the effect is the same regardless).

In addition, in the code 3) calls 2) and 2) calls 1), which makes
pcie_reset_lbms() the function all roads eventually lead to. So if
pcie_update_link_speed() is to be placed in only one of those three
places rather than all of them, the best place seems to be
pcie_reset_lbms(), as it covers all cases including callers that call
only pcie_reset_lbms() directly.
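
That is:

  pcie_bwctrl_change_speed()      [3]
    -> pcie_retrain_link()        [2]  (also called from ASPM)
         -> pcie_reset_lbms()     [1]  (also called from pciehp)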

-- 
 i.


* Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
  2025-04-25 12:24           ` Ilpo Järvinen
@ 2025-04-29 10:02             ` Lukas Wunner
  0 siblings, 0 replies; 9+ messages in thread
From: Lukas Wunner @ 2025-04-29 10:02 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Maciej W. Rozycki

On Fri, Apr 25, 2025 at 03:24:45PM +0300, Ilpo Järvinen wrote:
> On Fri, 25 Apr 2025, Lukas Wunner wrote:
> > On Thu, Apr 24, 2025 at 03:37:38PM +0300, Ilpo Järvinen wrote:
> > > On Thu, 24 Apr 2025, Lukas Wunner wrote:
> > > >   The only concern here is whether the cached
> > > >   link speed is updated.  pcie_bwctrl_change_speed() does call
> > > >   pcie_update_link_speed() after calling pcie_retrain_link(), so that
> > > >   looks fine.  But there's a second caller of pcie_retrain_link():
> > > >   pcie_aspm_configure_common_clock().  It doesn't update the cached
> > > >   link speed after calling pcie_retrain_link().  Not sure if this can
> > > >   lead to a change in link speed and therefore whether the cached link
> > > >   speed should be updated?  The Target Link Speed isn't changed, but maybe
> > > >   the link fails to retrain to the same speed for electrical reasons?
> > > 
> > > I've never seen that happen, but it would seem odd if that were forbidden
> > > (as the alternative is probably that the link remains down).
> > > 
> > > Perhaps pcie_reset_lbms() should just call pcie_update_link_speed() as the 
> > > last step, then the irq handler returning IRQ_NONE doesn't matter.
> > 
> > Why pcie_reset_lbms()?  I was rather thinking that pcie_update_link_speed()
> > should be called from pcie_retrain_link().  Maybe right after the final
> > pcie_wait_for_link_status().
> 
> My rationale for having it in pcie_reset_lbms() is that LBMS is cleared
> there, which races with the irq handler reading LBMS. If LBMS is cleared
> before the irq handler reads the Link Status register, the handler
> returns IRQ_NONE and misses the LBMS event. So this race problem is
> directly associated with the write-to-clear of LBMS.

pcie_reset_lbms() is only called from two places:

(1) pciehp's remove_board() -- no need to update the link speed of an empty
    slot and you've proven that the speed *is* updated by board_added()
    once there is a new card in the slot.

(2) pcie_retrain_link() -- retraining could always lead to a different
    speed, e.g. due to electrical issues, so unconditionally updating
    the link speed makes sense.

It feels awkward that a function named pcie_reset_lbms() would also
update the link speed as a side effect.

> While I don't disagree with that spec interpretation, in the case of ASPM
> the question is more complex than that: the link was already trained to
> speed x, so can the new link training (in practice) fail to train back to
> x?

It's probably rare, but bad wiring or soldering issues can always cause
a lower or higher speed than before.

My recommendation would be to move the invocation of
pcie_update_link_speed() from pcie_bwctrl_change_speed()
to pcie_retrain_link().

Just to cover the case where the retraining initiated by
pcie_aspm_configure_common_clock() leads to a different speed
and pcie_reset_lbms() wins the race against the bwctrl irq handler.
It's a corner case, but if we've identified it now, might as well
fix it I guess?

Thanks,

Lukas

