linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1
@ 2023-08-26 11:10 Heiner Kallweit
  2023-10-02 15:14 ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Heiner Kallweit @ 2023-08-26 11:10 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org, Ajay Agarwal

After the referenced commit we may see L1 sub-states being active
unexpectedly. Following scenario as an example:
r8169 disables L1 because of known hardware issues on a number of
systems. Implicitly L1.1 and L1.2 are disabled too.
On my system L1 and L1.1 work fine, but L1.2 causes missed
rx packets. Therefore I write 1 to aspm_l1_1.
This removes ASPM_STATE_L1 from the disabled modes and therefore
unexpectedly enables also L1.2. So return to the old behavior.

A comment in the commit message of the referenced change correctly points
out that this behavior is inconsistent with aspm_attr_store_common().
So change aspm_attr_store_common() accordingly.

Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/pci/pcie/aspm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 3dafba0b5..6d3788257 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 	if (state & PCIE_LINK_STATE_L0S)
 		link->aspm_disable |= ASPM_STATE_L0S;
 	if (state & PCIE_LINK_STATE_L1)
-		link->aspm_disable |= ASPM_STATE_L1;
+		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
 	if (state & PCIE_LINK_STATE_L1_1)
 		link->aspm_disable |= ASPM_STATE_L1_1;
 	if (state & PCIE_LINK_STATE_L1_2)
@@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
 			link->aspm_disable &= ~ASPM_STATE_L1;
 	} else {
 		link->aspm_disable |= state;
+		if (state & ASPM_STATE_L1)
+			link->aspm_disable |= ASPM_STATE_L1SS;
 	}
 
 	pcie_config_aspm_link(link, policy_to_aspm_state(link));
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1
  2023-08-26 11:10 [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1 Heiner Kallweit
@ 2023-10-02 15:14 ` Bjorn Helgaas
  2023-10-03  3:33   ` Kuppuswamy Sathyanarayanan
  2023-10-10 20:29   ` Heiner Kallweit
  0 siblings, 2 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2023-10-02 15:14 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Ajay Agarwal,
	Kuppuswamy Sathyanarayanan, Lukas Wunner

[+cc Sathy, Lukas]

On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
> After the referenced commit we may see L1 sub-states being active
> unexpectedly. Following scenario as an example:
> r8169 disables L1 because of known hardware issues on a number of
> systems. Implicitly L1.1 and L1.2 are disabled too.
> On my system L1 and L1.1 work fine, but L1.2 causes missed
> rx packets. Therefore I write 1 to aspm_l1_1.
> This removes ASPM_STATE_L1 from the disabled modes and therefore
> unexpectedly enables also L1.2. So return to the old behavior.
> 
> A comment in the commit message of the referenced change correctly points
> out that this behavior is inconsistent with aspm_attr_store_common().
> So change aspm_attr_store_common() accordingly.

I think we should split this into a pure revert of fb097dcd5a28 with
the description of the unintended consequence, followed by another
patch to change aspm_attr_store_common().

I guess the existing aspm_attr_store_common() behavior allows a
similar unexpected behavior?  For example, in this sequence:

  - Write 0 to "l1_aspm" to disable L1
  - Write 1 to "l1_1_aspm" to enable L1.1

does L1.2 get implicitly enabled as well even though that's clearly
not what the user intended?

There's also the separate question of how the sysfs file and the
pci_disable_link_state() API should interact.  Drivers use that API
when they know about a defect in their device, but the user can
override that via syfs.

In [1], we have a similar situation with D3cold support, where we're
thinking that we should not allow a user to use sysfs to override that
driver knowledge.

Bjorn

[1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de

> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 3dafba0b5..6d3788257 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>  	if (state & PCIE_LINK_STATE_L0S)
>  		link->aspm_disable |= ASPM_STATE_L0S;
>  	if (state & PCIE_LINK_STATE_L1)
> -		link->aspm_disable |= ASPM_STATE_L1;
> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>  	if (state & PCIE_LINK_STATE_L1_1)
>  		link->aspm_disable |= ASPM_STATE_L1_1;
>  	if (state & PCIE_LINK_STATE_L1_2)
> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>  			link->aspm_disable &= ~ASPM_STATE_L1;
>  	} else {
>  		link->aspm_disable |= state;
> +		if (state & ASPM_STATE_L1)
> +			link->aspm_disable |= ASPM_STATE_L1SS;
>  	}
>  
>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
> -- 
> 2.42.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1
  2023-10-02 15:14 ` Bjorn Helgaas
@ 2023-10-03  3:33   ` Kuppuswamy Sathyanarayanan
  2023-10-10 20:33     ` Heiner Kallweit
  2023-10-10 20:29   ` Heiner Kallweit
  1 sibling, 1 reply; 5+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2023-10-03  3:33 UTC (permalink / raw)
  To: Bjorn Helgaas, Heiner Kallweit
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Ajay Agarwal,
	Lukas Wunner



On 10/2/2023 8:14 AM, Bjorn Helgaas wrote:
> [+cc Sathy, Lukas]
> 
> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>> After the referenced commit we may see L1 sub-states being active
>> unexpectedly. Following scenario as an example:
>> r8169 disables L1 because of known hardware issues on a number of
>> systems. Implicitly L1.1 and L1.2 are disabled too.
>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>> rx packets. Therefore I write 1 to aspm_l1_1.
>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>> unexpectedly enables also L1.2. So return to the old behavior.

IIUC, the above-mentioned SysFS issue will be fixed by your change to
aspm_attr_store_common(), right?

Can you elaborate why you need the following change?

>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;

>>
>> A comment in the commit message of the referenced change correctly points
>> out that this behavior is inconsistent with aspm_attr_store_common().
>> So change aspm_attr_store_common() accordingly.
> 
> I think we should split this into a pure revert of fb097dcd5a28 with
> the description of the unintended consequence, followed by another
> patch to change aspm_attr_store_common().
> 

I agree, the revert and new change should be split into two patches.

> I guess the existing aspm_attr_store_common() behavior allows a
> similar unexpected behavior?  For example, in this sequence:
> 
>   - Write 0 to "l1_aspm" to disable L1
>   - Write 1 to "l1_1_aspm" to enable L1.1
> 
> does L1.2 get implicitly enabled as well even though that's clearly
> not what the user intended?
> 
> There's also the separate question of how the sysfs file and the
> pci_disable_link_state() API should interact.  Drivers use that API
> when they know about a defect in their device, but the user can
> override that via syfs.
> 
> In [1], we have a similar situation with D3cold support, where we're
> thinking that we should not allow a user to use sysfs to override that
> driver knowledge.
> 
> Bjorn
> 
> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
> 
>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 3dafba0b5..6d3788257 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L0S)
>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>  	if (state & PCIE_LINK_STATE_L1_1)
>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>  	if (state & PCIE_LINK_STATE_L1_2)
>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>  	} else {
>>  		link->aspm_disable |= state;
>> +		if (state & ASPM_STATE_L1)
>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>  	}
>>  
>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>> -- 
>> 2.42.0
>>

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1
  2023-10-02 15:14 ` Bjorn Helgaas
  2023-10-03  3:33   ` Kuppuswamy Sathyanarayanan
@ 2023-10-10 20:29   ` Heiner Kallweit
  1 sibling, 0 replies; 5+ messages in thread
From: Heiner Kallweit @ 2023-10-10 20:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Ajay Agarwal,
	Kuppuswamy Sathyanarayanan, Lukas Wunner

On 02.10.2023 17:14, Bjorn Helgaas wrote:
> [+cc Sathy, Lukas]
> 
> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>> After the referenced commit we may see L1 sub-states being active
>> unexpectedly. Following scenario as an example:
>> r8169 disables L1 because of known hardware issues on a number of
>> systems. Implicitly L1.1 and L1.2 are disabled too.
>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>> rx packets. Therefore I write 1 to aspm_l1_1.
>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>> unexpectedly enables also L1.2. So return to the old behavior.
>>
>> A comment in the commit message of the referenced change correctly points
>> out that this behavior is inconsistent with aspm_attr_store_common().
>> So change aspm_attr_store_common() accordingly.
> 
> I think we should split this into a pure revert of fb097dcd5a28 with
> the description of the unintended consequence, followed by another
> patch to change aspm_attr_store_common().
> 
OK

> I guess the existing aspm_attr_store_common() behavior allows a
> similar unexpected behavior?  For example, in this sequence:
> 
>   - Write 0 to "l1_aspm" to disable L1
>   - Write 1 to "l1_1_aspm" to enable L1.1
> 
> does L1.2 get implicitly enabled as well even though that's clearly
> not what the user intended?
> 
Right, it's the same here. Therefore the second change in the patch.

> There's also the separate question of how the sysfs file and the
> pci_disable_link_state() API should interact.  Drivers use that API
> when they know about a defect in their device, but the user can
> override that via syfs.
> 
> In [1], we have a similar situation with D3cold support, where we're
> thinking that we should not allow a user to use sysfs to override that
> driver knowledge.
> 
In my r8169 use case ASPM works fine on one system, but causes issues
on another one with same NIC chip version. So it may be a wild mix of
NIC hw erratum, BIOS bug, mainboard chipset incompatibilities, etc.
Therefore I disable L1 per default, and allow users to re-enable
L1/L1.1/L1.2 if their system isn't affected.

> Bjorn
> 
> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
> 
>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 3dafba0b5..6d3788257 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L0S)
>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>  	if (state & PCIE_LINK_STATE_L1_1)
>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>  	if (state & PCIE_LINK_STATE_L1_2)
>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>  	} else {
>>  		link->aspm_disable |= state;
>> +		if (state & ASPM_STATE_L1)
>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>  	}
>>  
>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>> -- 
>> 2.42.0
>>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1
  2023-10-03  3:33   ` Kuppuswamy Sathyanarayanan
@ 2023-10-10 20:33     ` Heiner Kallweit
  0 siblings, 0 replies; 5+ messages in thread
From: Heiner Kallweit @ 2023-10-10 20:33 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Ajay Agarwal,
	Lukas Wunner

On 03.10.2023 05:33, Kuppuswamy Sathyanarayanan wrote:
> 
> 
> On 10/2/2023 8:14 AM, Bjorn Helgaas wrote:
>> [+cc Sathy, Lukas]
>>
>> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>>> After the referenced commit we may see L1 sub-states being active
>>> unexpectedly. Following scenario as an example:
>>> r8169 disables L1 because of known hardware issues on a number of
>>> systems. Implicitly L1.1 and L1.2 are disabled too.
>>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>>> rx packets. Therefore I write 1 to aspm_l1_1.
>>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>>> unexpectedly enables also L1.2. So return to the old behavior.
> 
> IIUC, the above-mentioned SysFS issue will be fixed by your change to
> aspm_attr_store_common(), right?
> 
In my use case the driver uses pci_disable_link_state() to disable L1
and implicitly also L1.1 and L1.2. The following change is needed so
that L1.2 remains disabled if L1 is removed from the list of disabled
states later. That's the revert we're talking about.

> Can you elaborate why you need the following change?
> 
>>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>>  	if (state & PCIE_LINK_STATE_L1)
>>> -		link->aspm_disable |= ASPM_STATE_L1;
>>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
> 
>>>
>>> A comment in the commit message of the referenced change correctly points
>>> out that this behavior is inconsistent with aspm_attr_store_common().
>>> So change aspm_attr_store_common() accordingly.
>>
>> I think we should split this into a pure revert of fb097dcd5a28 with
>> the description of the unintended consequence, followed by another
>> patch to change aspm_attr_store_common().
>>
> 
> I agree, the revert and new change should be split into two patches.
> 
>> I guess the existing aspm_attr_store_common() behavior allows a
>> similar unexpected behavior?  For example, in this sequence:
>>
>>   - Write 0 to "l1_aspm" to disable L1
>>   - Write 1 to "l1_1_aspm" to enable L1.1
>>
>> does L1.2 get implicitly enabled as well even though that's clearly
>> not what the user intended?
>>
>> There's also the separate question of how the sysfs file and the
>> pci_disable_link_state() API should interact.  Drivers use that API
>> when they know about a defect in their device, but the user can
>> override that via syfs.
>>
>> In [1], we have a similar situation with D3cold support, where we're
>> thinking that we should not allow a user to use sysfs to override that
>> driver knowledge.
>>
>> Bjorn
>>
>> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
>>
>>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>> ---
>>>  drivers/pci/pcie/aspm.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>> index 3dafba0b5..6d3788257 100644
>>> --- a/drivers/pci/pcie/aspm.c
>>> +++ b/drivers/pci/pcie/aspm.c
>>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>>  	if (state & PCIE_LINK_STATE_L0S)
>>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>>  	if (state & PCIE_LINK_STATE_L1)
>>> -		link->aspm_disable |= ASPM_STATE_L1;
>>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>>  	if (state & PCIE_LINK_STATE_L1_1)
>>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>>  	if (state & PCIE_LINK_STATE_L1_2)
>>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>>  	} else {
>>>  		link->aspm_disable |= state;
>>> +		if (state & ASPM_STATE_L1)
>>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>>  	}
>>>  
>>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>>> -- 
>>> 2.42.0
>>>
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-10-10 20:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-26 11:10 [PATCH] PCI/ASPM: fix unexpected behavior when re-enabling L1 Heiner Kallweit
2023-10-02 15:14 ` Bjorn Helgaas
2023-10-03  3:33   ` Kuppuswamy Sathyanarayanan
2023-10-10 20:33     ` Heiner Kallweit
2023-10-10 20:29   ` Heiner Kallweit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).