* [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe
@ 2024-07-19 7:52 Jian-Hong Pan
2024-07-19 7:55 ` [PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates Jian-Hong Pan
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-07-19 7:52 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, linux-pci, linux-kernel, linux,
Jian-Hong Pan
The VMD remapped PCIe Root Port and NVMe device have the PCI-PM L1
substates capability, but the substates are disabled by default.
Here is a failing example on an ASUS B1400CEAE with VMD enabled:
10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
...
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=45us LTR1.2_Threshold=101376ns
L1SubCtl2: T_PwrOn=50us
10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
...
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
According to "PCIe r6.0, sec 5.5.4", to configure the link between the PCIe
Root Port and the child device correctly:
* Ensure both devices are in D0 before enabling PCI-PM L1 PM Substates.
* Ensure the L1.2 parameters (Common_Mode_Restore_Times, T_POWER_ON and
LTR_L1.2_THRESHOLD) are programmed properly on both devices before setting
the enable bits for L1.2.
This series fixes that.
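The two requirements listed above can be sketched as a small userspace C
model. This is only an illustration, not kernel code: struct port_model,
l12_enable_allowed() and the field names are hypothetical, and "programmed
properly" is simplified here to "both ends carry the same
LTR_L1.2_THRESHOLD".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one end of a PCIe link; not a kernel structure. */
struct port_model {
	bool in_d0;			/* device is in power state D0 */
	uint32_t ltr_l12_threshold_ns;	/* programmed LTR_L1.2_THRESHOLD, in ns */
};

/*
 * Return true only if the preconditions for setting the L1.2 enable bits
 * hold in this simplified model: both ends are in D0 and both ends carry
 * the same LTR_L1.2_THRESHOLD.
 */
bool l12_enable_allowed(const struct port_model *parent,
			const struct port_model *child)
{
	if (!parent->in_d0 || !child->in_d0)
		return false;

	return parent->ltr_l12_threshold_ns == child->ltr_l12_threshold_ns;
}
```

With the values from the lspci output above (101376ns on the Root Port and
0ns on the NVMe), this check fails even when both devices are in D0, which
is the mismatch the series addresses.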
Jian-Hong Pan (4):
PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates
PCI/ASPM: Add notes about enabling PCI-PM L1SS to
pci_enable_link_state(_locked)
PCI/ASPM: Introduce aspm_get_l1ss_cap()
PCI/ASPM: Fix L1.2 parameters when enable link state
drivers/pci/controller/vmd.c | 13 +++++++----
drivers/pci/pcie/aspm.c | 44 ++++++++++++++++++++++++++++--------
2 files changed, 44 insertions(+), 13 deletions(-)
--
2.45.2
* [PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates
2024-07-19 7:52 [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe Jian-Hong Pan
@ 2024-07-19 7:55 ` Jian-Hong Pan
2024-07-19 7:57 ` [PATCH v8 2/4] PCI/ASPM: Add notes about enabling PCI-PM L1SS to pci_enable_link_state(_locked) Jian-Hong Pan
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-07-19 7:55 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, linux-pci, linux-kernel, linux,
Jian-Hong Pan
The remapped PCIe Root Port and the child device have the PCI-PM L1
substates capability, but the substates are disabled by default.
Here is a failing example on an ASUS B1400CEAE:
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Put all of the VMD remapped PCI devices into D0 before enabling PCI-PM L1
PM Substates, per "PCIe r6.0, sec 5.5.4".
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
v2:
- Power on the VMD remapped devices with pci_set_power_state_locked()
- Prepare the PCIe LTR parameters before enable L1 Substates
- Add note into the comments of both pci_enable_link_state() and
pci_enable_link_state_locked() for kernel-doc.
- The original patch set can be split as individual patches.
v3:
- Re-send for the missed version information.
- Split drivers/pci/pcie/aspm.c modification into following patches.
- Fix the comment for ensuring the PCI devices in D0.
v4:
- The same
v5:
- Tweak the commit title and message
- Change the goto label from out_enable_link_state to out_state_change
v6~8:
- The same
drivers/pci/controller/vmd.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index a726de0af011..d021129d661c 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
return 0;
- pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
-
pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
if (!pos)
- return 0;
+ goto out_state_change;
/*
* Skip if the max snoop LTR is non-zero, indicating BIOS has set it
@@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
*/
pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
- return 0;
+ goto out_state_change;
/*
* Set the default values to the maximum required by the platform to
@@ -775,6 +773,13 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
pci_info(pdev, "VMD: Default LTR value set by driver\n");
+out_state_change:
+ /*
+ * Ensure devices are in D0 before enabling PCI-PM L1 PM Substates, per
+ * PCIe r6.0, sec 5.5.4.
+ */
+ pci_set_power_state_locked(pdev, PCI_D0);
+ pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
return 0;
}
--
2.45.2
* [PATCH v8 2/4] PCI/ASPM: Add notes about enabling PCI-PM L1SS to pci_enable_link_state(_locked)
2024-07-19 7:52 [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe Jian-Hong Pan
2024-07-19 7:55 ` [PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates Jian-Hong Pan
@ 2024-07-19 7:57 ` Jian-Hong Pan
2024-07-19 7:59 ` [PATCH v8 3/4] PCI/ASPM: Introduce aspm_get_l1ss_cap() Jian-Hong Pan
2024-07-19 8:02 ` [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state Jian-Hong Pan
3 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-07-19 7:57 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, linux-pci, linux-kernel, linux,
Jian-Hong Pan, Bjorn Helgaas
According to "PCIe r6.0, sec 5.5.4", add a note about the D0 requirement to
the pci_enable_link_state() kernel-doc.
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
v3:
- Fix as readable comments
v4:
- The same
v5:
- Tweak and simplify the commit message
v6~8:
- The same
drivers/pci/pcie/aspm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8..bd0a8a05647e 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1442,6 +1442,9 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
* touch the LNKCTL register. Also note that this does not enable states
* disabled by pci_disable_link_state(). Return 0 or a negative errno.
*
+ * Note: Ensure devices are in D0 before enabling PCI-PM L1 PM Substates, per
+ * PCIe r6.0, sec 5.5.4.
+ *
* @pdev: PCI device
* @state: Mask of ASPM link states to enable
*/
@@ -1458,6 +1461,9 @@ EXPORT_SYMBOL(pci_enable_link_state);
* can't touch the LNKCTL register. Also note that this does not enable states
* disabled by pci_disable_link_state(). Return 0 or a negative errno.
*
+ * Note: Ensure devices are in D0 before enabling PCI-PM L1 PM Substates, per
+ * PCIe r6.0, sec 5.5.4.
+ *
* @pdev: PCI device
* @state: Mask of ASPM link states to enable
*
--
2.45.2
* [PATCH v8 3/4] PCI/ASPM: Introduce aspm_get_l1ss_cap()
2024-07-19 7:52 [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe Jian-Hong Pan
2024-07-19 7:55 ` [PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates Jian-Hong Pan
2024-07-19 7:57 ` [PATCH v8 2/4] PCI/ASPM: Add notes about enabling PCI-PM L1SS to pci_enable_link_state(_locked) Jian-Hong Pan
@ 2024-07-19 7:59 ` Jian-Hong Pan
2024-07-19 8:02 ` [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state Jian-Hong Pan
3 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-07-19 7:59 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, linux-pci, linux-kernel, linux,
Jian-Hong Pan
Introduce aspm_get_l1ss_cap(), extracted from aspm_l1ss_init(), to get a
PCIe device's L1SS capability. This does not change any behavior, but
aspm_get_l1ss_cap() can be reused later.
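The masking rule that the helper applies can be sketched in userspace C as
follows. This is a hedged illustration only: l1ss_cap_filter() is a
hypothetical name, it takes the raw register value as an argument instead
of reading config space, and the bit values are believed to mirror
include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* L1SS Capabilities Register bits, as believed to be defined in pci_regs.h */
#define PCI_L1SS_CAP_PCIPM_L1_2	0x00000001	/* PCI-PM L1.2 Supported */
#define PCI_L1SS_CAP_PCIPM_L1_1	0x00000002	/* PCI-PM L1.1 Supported */
#define PCI_L1SS_CAP_ASPM_L1_2	0x00000004	/* ASPM L1.2 Supported */
#define PCI_L1SS_CAP_ASPM_L1_1	0x00000008	/* ASPM L1.1 Supported */
#define PCI_L1SS_CAP_L1_PM_SS	0x00000010	/* L1 PM Substates Supported */

/*
 * Hypothetical userspace stand-in for the helper: if the device does not
 * advertise L1 PM Substates support, treat the whole capability as empty.
 */
uint32_t l1ss_cap_filter(uint32_t raw_cap)
{
	if (!(raw_cap & PCI_L1SS_CAP_L1_PM_SS))
		return 0;

	return raw_cap;
}
```

For example, a device that sets the PCI-PM L1.2 and ASPM L1.2 bits but
leaves L1_PM_Substates clear is reported as having no L1SS capability at
all, so callers only need to test the returned value.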
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
---
v6:
- Skipped
v7:
- Pick back
v8:
- The same
drivers/pci/pcie/aspm.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index bd0a8a05647e..5db1044c9895 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -611,6 +611,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
}
}
+static u32 aspm_get_l1ss_cap(struct pci_dev *pdev)
+{
+ u32 l1ss_cap;
+
+ pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CAP, &l1ss_cap);
+
+ if (!(l1ss_cap & PCI_L1SS_CAP_L1_PM_SS))
+ l1ss_cap = 0;
+
+ return l1ss_cap;
+}
+
/* Calculate L1.2 PM substate timing parameters */
static void aspm_calc_l12_info(struct pcie_link_state *link,
u32 parent_l1ss_cap, u32 child_l1ss_cap)
@@ -721,15 +733,8 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
return;
/* Setup L1 substate */
- pci_read_config_dword(parent, parent->l1ss + PCI_L1SS_CAP,
- &parent_l1ss_cap);
- pci_read_config_dword(child, child->l1ss + PCI_L1SS_CAP,
- &child_l1ss_cap);
-
- if (!(parent_l1ss_cap & PCI_L1SS_CAP_L1_PM_SS))
- parent_l1ss_cap = 0;
- if (!(child_l1ss_cap & PCI_L1SS_CAP_L1_PM_SS))
- child_l1ss_cap = 0;
+ parent_l1ss_cap = aspm_get_l1ss_cap(parent);
+ child_l1ss_cap = aspm_get_l1ss_cap(child);
/*
* If we don't have LTR for the entire path from the Root Complex
--
2.45.2
* [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-07-19 7:52 [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe Jian-Hong Pan
` (2 preceding siblings ...)
2024-07-19 7:59 ` [PATCH v8 3/4] PCI/ASPM: Introduce aspm_get_l1ss_cap() Jian-Hong Pan
@ 2024-07-19 8:02 ` Jian-Hong Pan
2024-08-02 8:24 ` Jian-Hong Pan
3 siblings, 1 reply; 22+ messages in thread
From: Jian-Hong Pan @ 2024-07-19 8:02 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
linux-kernel, linux, Jian-Hong Pan
Currently, when a caller enables a link's L1.2 features with
__pci_enable_link_state(), the link is configured directly without ensuring
that the related L1.2 parameters, such as T_POWER_ON,
Common_Mode_Restore_Time and LTR_L1.2_THRESHOLD, have been programmed.
As a result, L1.2 on the link between the PCIe Root Port and the child
device ends up with a wrong configuration when a caller tries to enable it.
Here is a failing example on an ASUS B1400CEAE with VMD enabled:
10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
...
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=45us LTR1.2_Threshold=101376ns
L1SubCtl2: T_PwrOn=50us
10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
...
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
Root Port and the child NVMe, both should be programmed with the same
LTR1.2_Threshold value. However, they have different values in this case.
Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
enabling the L1.2 bits of the L1 PM Substates Control Register in
__pci_enable_link_state().
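As a worked example, the Root Port's LTR1.2_Threshold=101376ns can be
reproduced from the advertised port times in the lspci output. The sketch
below is userspace C, not the kernel implementation. It assumes the
threshold is 2us + 4us + the larger PortCommonModeRestoreTime + the larger
PortTPowerOnTime, and that the result is rounded up into a 10-bit value
with a scale of 32^scale ns. That is a reading of aspm_calc_l12_info() in
drivers/pci/pcie/aspm.c and is not stated in this thread. The helper names
max_us(), l12_threshold_us() and decode_l12_threshold_ns() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LTR_L12_TH_VALUE_MAX 1023u	/* 10-bit LTR_L1.2_THRESHOLD_Value */

/* Pick the larger of the two ports' advertised times, in microseconds. */
uint32_t max_us(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Assumed formula: 2us T(POWER_OFF) + 4us T(L1.2) plus the link's common
 * mode restore time and power-on time.
 */
uint32_t l12_threshold_us(uint32_t t_common_mode_us, uint32_t t_power_on_us)
{
	return 2 + 4 + t_common_mode_us + t_power_on_us;
}

/*
 * Encode a threshold as (scale, value), where the unit is 32^scale ns
 * (1, 32, 1024, ... ns) and value is rounded up. The smallest scale whose
 * 10-bit range covers the threshold is chosen; scale 5 is the fallback and
 * is not range-checked in this sketch.
 */
void encode_l12_threshold(uint32_t threshold_us, uint32_t *scale,
			  uint32_t *value)
{
	uint64_t ns = (uint64_t)threshold_us * 1000;
	uint64_t unit;
	uint32_t s;

	assert(scale && value);

	for (s = 0; s < 5; s++) {
		unit = 1ull << (5 * s);
		if (ns <= unit * LTR_L12_TH_VALUE_MAX)
			break;
	}

	unit = 1ull << (5 * s);
	*scale = s;
	*value = (uint32_t)((ns + unit - 1) / unit);
}

/* Convert an encoded (scale, value) pair back to nanoseconds. */
uint64_t decode_l12_threshold_ns(uint32_t scale, uint32_t value)
{
	return (uint64_t)value << (5 * scale);
}
```

With PortCommonModeRestoreTime of 45us and 32us and PortTPowerOnTime of
50us and 10us, the threshold is 2 + 4 + 45 + 50 = 101us. Rounded up to
1024ns units this gives a value of 99, and 99 * 1024ns = 101376ns, which
matches the Root Port. The NVMe still reports 0ns, so the two ends
disagree.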
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
---
v2:
- Prepare the PCIe LTR parameters before enable L1 Substates
v3:
- Only enable supported features for the L1 Substates part
v4:
- Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
v5:
- Fix typo and commit message
- Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
aspm_get_l1ss_cap()"
v6:
- Skipped
v7:
- Pick back and rebase on the new version kernel
- Drop the link state flag check. And, always config link state's timing
parameters
v8:
- Because pcie_aspm_get_link() might return the link as NULL, move
getting the link's parent and child devices after check the link is
not NULL. This avoids NULL memory access.
drivers/pci/pcie/aspm.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 5db1044c9895..55ff1d26fcea 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
{
struct pcie_link_state *link = pcie_aspm_get_link(pdev);
+ u32 parent_l1ss_cap, child_l1ss_cap;
+ struct pci_dev *parent, *child;
if (!link)
return -EINVAL;
+
+ parent = link->pdev;
+ child = link->downstream;
+
/*
* A driver requested that ASPM be enabled on this device, but
* if we don't have permission to manage ASPM (e.g., on ACPI
@@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
if (!locked)
down_read(&pci_bus_sem);
mutex_lock(&aspm_lock);
+ /*
+ * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
+ * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
+ * L1.2, per PCIe r6.0, sec 5.5.4.
+ */
+ parent_l1ss_cap = aspm_get_l1ss_cap(parent);
+ child_l1ss_cap = aspm_get_l1ss_cap(child);
+ aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
+
link->aspm_default = pci_calc_aspm_enable_mask(state);
pcie_config_aspm_link(link, policy_to_aspm_state(link));
--
2.45.2
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-07-19 8:02 ` [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state Jian-Hong Pan
@ 2024-08-02 8:24 ` Jian-Hong Pan
2024-08-05 18:24 ` Nirmal Patel
2024-08-05 20:26 ` David E. Box
0 siblings, 2 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-08-02 8:24 UTC
To: Bjorn Helgaas
Cc: Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
linux-kernel, linux
Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
>
> Currently, when enable link's L1.2 features with __pci_enable_link_state(),
> it configs the link directly without ensuring related L1.2 parameters, such
> as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> programmed.
>
> This leads the link's L1.2 between PCIe Root Port and child device gets
> wrong configs when a caller tries to enabled it.
>
> Here is a failed example on ASUS B1400CEAE with enabled VMD:
>
> 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
> ...
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=45us LTR1.2_Threshold=101376ns
> L1SubCtl2: T_PwrOn=50us
>
> 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> ...
> Capabilities: [900 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=0ns
> L1SubCtl2: T_PwrOn=10us
>
> According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
> Root Port and the child NVMe, they should be programmed with the same
> LTR1.2_Threshold value. However, they have different values in this case.
>
> Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
> enable L1.2 bits of L1 PM Substates Control Register in
> __pci_enable_link_state().
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> ---
> v2:
> - Prepare the PCIe LTR parameters before enable L1 Substates
>
> v3:
> - Only enable supported features for the L1 Substates part
>
> v4:
> - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
>
> v5:
> - Fix typo and commit message
> - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> aspm_get_l1ss_cap()"
>
> v6:
> - Skipped
>
> v7:
> - Pick back and rebase on the new version kernel
> - Drop the link state flag check. And, always config link state's timing
> parameters
>
> v8:
> - Because pcie_aspm_get_link() might return the link as NULL, move
> getting the link's parent and child devices after check the link is
> not NULL. This avoids NULL memory access.
>
> drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 5db1044c9895..55ff1d26fcea 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> {
> struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> + u32 parent_l1ss_cap, child_l1ss_cap;
> + struct pci_dev *parent, *child;
>
> if (!link)
> return -EINVAL;
> +
> + parent = link->pdev;
> + child = link->downstream;
> +
> /*
> * A driver requested that ASPM be enabled on this device, but
> * if we don't have permission to manage ASPM (e.g., on ACPI
> @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> if (!locked)
> down_read(&pci_bus_sem);
> mutex_lock(&aspm_lock);
> + /*
> + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> + * L1.2, per PCIe r6.0, sec 5.5.4.
> + */
> + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> + child_l1ss_cap = aspm_get_l1ss_cap(child);
> + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> +
> link->aspm_default = pci_calc_aspm_enable_mask(state);
> pcie_config_aspm_link(link, policy_to_aspm_state(link));
>
> --
> 2.45.2
>
Hi Nirmal and Paul,
It would be great to have your review here.
I tried to "set the threshold value in vmd_pm_enable_quirk()"
directly as Paul said [1]. However, that still requires getting the PCIe
link from the PCIe device in order to set the threshold value.
Since pci_enable_link_state_locked() already gets the link, it makes
sense to calculate and program the L1 substates' parameters properly
there, before configuring the link's ASPM.
[1]: https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-02 8:24 ` Jian-Hong Pan
@ 2024-08-05 18:24 ` Nirmal Patel
2024-08-07 4:23 ` Jian-Hong Pan
2024-08-05 20:26 ` David E. Box
1 sibling, 1 reply; 22+ messages in thread
From: Nirmal Patel @ 2024-08-05 18:24 UTC
To: Jian-Hong Pan
Cc: Bjorn Helgaas, Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, linux-kernel,
linux
On Fri, 2 Aug 2024 16:24:18 +0800
Jian-Hong Pan <jhp@endlessos.org> wrote:
> Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
> >
> > Currently, when enable link's L1.2 features with
> > __pci_enable_link_state(), it configs the link directly without
> > ensuring related L1.2 parameters, such as T_POWER_ON,
> > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > programmed.
> >
> > This leads the link's L1.2 between PCIe Root Port and child device
> > gets wrong configs when a caller tries to enabled it.
> >
> > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> >
> > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > PCIe Controller (rev 01) (prog-if 00 [Normal decode]) ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > L1_PM_Substates+ PortCommonModeRestoreTime=45us
> > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1-
> > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > L1_PM_Substates+ PortCommonModeRestoreTime=32us
> > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1-
> > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=10us
> >
> > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on
> > the PCIe Root Port and the child NVMe, they should be programmed
> > with the same LTR1.2_Threshold value. However, they have different
> > values in this case.
> >
> > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > before enable L1.2 bits of L1 PM Substates Control Register in
> > __pci_enable_link_state().
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > ---
> > v2:
> > - Prepare the PCIe LTR parameters before enable L1 Substates
> >
> > v3:
> > - Only enable supported features for the L1 Substates part
> >
> > v4:
> > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > L1SS
> >
> > v5:
> > - Fix typo and commit message
> > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > aspm_get_l1ss_cap()"
> >
> > v6:
> > - Skipped
> >
> > v7:
> > - Pick back and rebase on the new version kernel
> > - Drop the link state flag check. And, always config link state's
> > timing parameters
> >
> > v8:
> > - Because pcie_aspm_get_link() might return the link as NULL, move
> > getting the link's parent and child devices after check the link
> > is not NULL. This avoids NULL memory access.
> >
> > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 5db1044c9895..55ff1d26fcea 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > static int __pci_enable_link_state(struct pci_dev *pdev, int
> > state, bool locked) {
> > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > + u32 parent_l1ss_cap, child_l1ss_cap;
> > + struct pci_dev *parent, *child;
> >
> > if (!link)
> > return -EINVAL;
> > +
> > + parent = link->pdev;
> > + child = link->downstream;
> > +
> > /*
> > * A driver requested that ASPM be enabled on this device,
> > but
> > * if we don't have permission to manage ASPM (e.g., on ACPI
> > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > pci_dev *pdev, int state, bool locked) if (!locked)
> > down_read(&pci_bus_sem);
> > mutex_lock(&aspm_lock);
> > + /*
> > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > T_POWER_ON and
> > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > bits for
> > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > + */
> > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > +
> > link->aspm_default = pci_calc_aspm_enable_mask(state);
> > pcie_config_aspm_link(link, policy_to_aspm_state(link));
> >
> > --
> > 2.45.2
> >
>
> Hi Nirmal and Paul,
>
> It will be great to have your review here.
>
> I had tried to "set the threshold value in vmd_pm_enable_quirk()"
> directly as Paul said [1]. However, it still needs to get the PCIe
> link from the PCIe device to set the threshold value.
> And, pci_enable_link_state_locked() gets the link. Then, it will be
> great to calculate and programm L1 sub-states' parameters properly
> before configuring the link's ASPM there.
>
> [1]:
> https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
>
> Jian-Hong Pan
Hi Jian-Hong Pan,
I am not an LTR or ASPM expert, but this part looks good to me.
Can you explain why you decided to move the pci_enable_link_state_locked()
call down to out_state_change in vmd.c? Will it cause any issue if
pci_find_ext_capability() returns 0?
Thanks
-nirmal
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-02 8:24 ` Jian-Hong Pan
2024-08-05 18:24 ` Nirmal Patel
@ 2024-08-05 20:26 ` David E. Box
2024-08-07 10:05 ` Jian-Hong Pan
1 sibling, 1 reply; 22+ messages in thread
From: David E. Box @ 2024-08-05 20:26 UTC
To: Jian-Hong Pan, Bjorn Helgaas
Cc: Johan Hovold, Ilpo Järvinen, Kuppuswamy Sathyanarayanan,
Mika Westerberg, Damien Le Moal, Nirmal Patel, Jonathan Derrick,
Paul M Stillwell Jr, linux-pci, linux-kernel, linux
Hi Jian-Hong,
On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
> >
> > Currently, when enable link's L1.2 features with __pci_enable_link_state(),
> > it configs the link directly without ensuring related L1.2 parameters, such
> > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > programmed.
> >
> > This leads the link's L1.2 between PCIe Root Port and child device gets
> > wrong configs when a caller tries to enabled it.
> >
> > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> >
> > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe
> > Controller (rev 01) (prog-if 00 [Normal decode])
> > ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > L1_PM_Substates+
> > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550
> > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > L1_PM_Substates+
> > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=10us
> >
> > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
> > Root Port and the child NVMe, they should be programmed with the same
> > LTR1.2_Threshold value. However, they have different values in this case.
> >
> > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
> > enable L1.2 bits of L1 PM Substates Control Register in
> > __pci_enable_link_state().
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > ---
> > v2:
> > - Prepare the PCIe LTR parameters before enable L1 Substates
> >
> > v3:
> > - Only enable supported features for the L1 Substates part
> >
> > v4:
> > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
> >
> > v5:
> > - Fix typo and commit message
> > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > aspm_get_l1ss_cap()"
> >
> > v6:
> > - Skipped
> >
> > v7:
> > - Pick back and rebase on the new version kernel
> > - Drop the link state flag check. And, always config link state's timing
> > parameters
> >
> > v8:
> > - Because pcie_aspm_get_link() might return the link as NULL, move
> > getting the link's parent and child devices after check the link is
> > not NULL. This avoids NULL memory access.
> >
> > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 5db1044c9895..55ff1d26fcea 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool
> > locked)
> > {
> > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > + u32 parent_l1ss_cap, child_l1ss_cap;
> > + struct pci_dev *parent, *child;
> >
> > if (!link)
> > return -EINVAL;
> > +
> > + parent = link->pdev;
> > + child = link->downstream;
> > +
> > /*
> > * A driver requested that ASPM be enabled on this device, but
> > * if we don't have permission to manage ASPM (e.g., on ACPI
> > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev
> > *pdev, int state, bool locked)
> > if (!locked)
> > down_read(&pci_bus_sem);
> > mutex_lock(&aspm_lock);
> > + /*
> > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > + */
> > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
I still don't think this is the place to recalculate the L1.2 parameters,
especially when we know the calculation was already done but was cleared by
pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
pci_bus_reset() in vmd.c?
David
> > +
> > link->aspm_default = pci_calc_aspm_enable_mask(state);
> > pcie_config_aspm_link(link, policy_to_aspm_state(link));
> >
> > --
> > 2.45.2
> >
>
> Hi Nirmal and Paul,
>
> It will be great to have your review here.
>
> I had tried to "set the threshold value in vmd_pm_enable_quirk()"
> directly as Paul said [1]. However, it still needs to get the PCIe
> link from the PCIe device to set the threshold value.
> And, pci_enable_link_state_locked() gets the link. Then, it will be
> great to calculate and program L1 sub-states' parameters properly
> before configuring the link's ASPM there.
>
> [1]:
> https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
>
> Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-05 18:24 ` Nirmal Patel
@ 2024-08-07 4:23 ` Jian-Hong Pan
0 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-08-07 4:23 UTC (permalink / raw)
To: Nirmal Patel
Cc: Bjorn Helgaas, Johan Hovold, David Box, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, linux-kernel,
linux
On Tue, Aug 6, 2024 at 2:25 AM Nirmal Patel <nirmal.patel@linux.intel.com> wrote:
>
> On Fri, 2 Aug 2024 16:24:18 +0800
> Jian-Hong Pan <jhp@endlessos.org> wrote:
>
> > On Fri, Jul 19, 2024 at 4:04 PM Jian-Hong Pan <jhp@endlessos.org> wrote:
> > >
> > > Currently, when enable link's L1.2 features with
> > > __pci_enable_link_state(), it configs the link directly without
> > > ensuring related L1.2 parameters, such as T_POWER_ON,
> > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > > programmed.
> > >
> > > This leads the link's L1.2 between PCIe Root Port and child device
> > > gets wrong configs when a caller tries to enabled it.
> > >
> > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > >
> > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > PCIe Controller (rev 01) (prog-if 00 [Normal decode]) ...
> > > Capabilities: [200 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > L1_PM_Substates+ PortCommonModeRestoreTime=45us
> > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1-
> > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > L1SubCtl2: T_PwrOn=50us
> > >
> > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) ...
> > > Capabilities: [900 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > L1_PM_Substates+ PortCommonModeRestoreTime=32us
> > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1-
> > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns
> > > L1SubCtl2: T_PwrOn=10us
> > >
> > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on
> > > the PCIe Root Port and the child NVMe, they should be programmed
> > > with the same LTR1.2_Threshold value. However, they have different
> > > values in this case.
> > >
> > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > before enable L1.2 bits of L1 PM Substates Control Register in
> > > __pci_enable_link_state().
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > ---
> > > v2:
> > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > >
> > > v3:
> > > - Only enable supported features for the L1 Substates part
> > >
> > > v4:
> > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > L1SS
> > >
> > > v5:
> > > - Fix typo and commit message
> > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > aspm_get_l1ss_cap()"
> > >
> > > v6:
> > > - Skipped
> > >
> > > v7:
> > > - Pick back and rebase on the new version kernel
> > > - Drop the link state flag check. And, always config link state's
> > > timing parameters
> > >
> > > v8:
> > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > getting the link's parent and child devices after check the link
> > > is not NULL. This avoids NULL memory access.
> > >
> > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > 1 file changed, 15 insertions(+)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 5db1044c9895..55ff1d26fcea 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > static int __pci_enable_link_state(struct pci_dev *pdev, int
> > > state, bool locked) {
> > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > + struct pci_dev *parent, *child;
> > >
> > > if (!link)
> > > return -EINVAL;
> > > +
> > > + parent = link->pdev;
> > > + child = link->downstream;
> > > +
> > > /*
> > > * A driver requested that ASPM be enabled on this device,
> > > but
> > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > pci_dev *pdev, int state, bool locked) if (!locked)
> > > down_read(&pci_bus_sem);
> > > mutex_lock(&aspm_lock);
> > > + /*
> > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > T_POWER_ON and
> > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > bits for
> > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > + */
> > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > +
> > > link->aspm_default = pci_calc_aspm_enable_mask(state);
> > > pcie_config_aspm_link(link, policy_to_aspm_state(link));
> > >
> > > --
> > > 2.45.2
> > >
> >
> > Hi Nirmal and Paul,
> >
> > It will be great to have your review here.
> >
> > I had tried to "set the threshold value in vmd_pm_enable_quirk()"
> > directly as Paul said [1]. However, it still needs to get the PCIe
> > link from the PCIe device to set the threshold value.
> > And, pci_enable_link_state_locked() gets the link. Then, it will be
> > great to calculate and program L1 sub-states' parameters properly
> > before configuring the link's ASPM there.
> >
> > [1]:
> > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
> >
> > Jian-Hong Pan
>
> Hi Jian-Hong Pan,
>
> I am not an LTR, ASPM expert, but this part looks good to me.
>
> Can you explain why you decided to move pci_enable_link_state_locked()
> call down to out_state_change in vmd.c?
The idea is to set all LTR-related parameters before enabling the ASPM feature.
> Will it cause any issue if pci_find_ext_capability returns 0?
If pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR) in
vmd_pm_enable_quirk() returns 0, then the device has no LTR capability
(for example, because it is not a PCIe device). Then, it goes to:
...
  pci_enable_link_state_locked()
    __pci_enable_link_state()
__pci_enable_link_state() uses pcie_aspm_get_link() to get the link
between the PCIe bridge and the PCIe device, and the NULL check on the
returned link acts as a guard. If pcie_aspm_get_link() does not get
the link, then the device is a PCIe bridge, or not a PCIe device.
Because the link is NULL, __pci_enable_link_state() returns -EINVAL
directly and does not configure or enable any ASPM state.
Jian-Hong Pan
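The guard described above can be sketched in userspace as follows. This
is a simplified model, not the kernel implementation: struct dev,
struct link_state, get_link() and enable_link_state() are hypothetical
names, and the model assumes the link state is owned by the upstream
bridge.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct link_state {
	int aspm_enabled;
};

struct dev {
	bool is_pcie;
	struct dev *upstream;		/* upstream bridge, NULL if none */
	struct link_state *link;	/* link state owned by a bridge */
};

/* Return the link above the device, or NULL if there is none. */
static struct link_state *get_link(const struct dev *d)
{
	if (!d->is_pcie)
		return NULL;
	if (!d->upstream || !d->upstream->is_pcie)
		return NULL;
	return d->upstream->link;
}

/* Bail out with -EINVAL before touching any ASPM state. */
static int enable_link_state(struct dev *d, int state)
{
	struct link_state *link = get_link(d);

	if (!link)
		return -EINVAL;

	link->aspm_enabled = state;
	return 0;
}
```

In this model, a device below a bridge gets its link configured, while
a device with no upstream bridge, or a non-PCIe device, gets -EINVAL
and nothing is written.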
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-05 20:26 ` David E. Box
@ 2024-08-07 10:05 ` Jian-Hong Pan
2024-08-07 11:18 ` Ilpo Järvinen
0 siblings, 1 reply; 22+ messages in thread
From: Jian-Hong Pan @ 2024-08-07 10:05 UTC (permalink / raw)
To: david.e.box
Cc: Bjorn Helgaas, Johan Hovold, Ilpo Järvinen,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
linux-kernel, linux
On Tue, Aug 6, 2024 at 4:26 AM David E. Box <david.e.box@linux.intel.com> wrote:
>
> Hi Jian-Hong,
>
> On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > On Fri, Jul 19, 2024 at 4:04 PM Jian-Hong Pan <jhp@endlessos.org> wrote:
> > >
> > > Currently, when enable link's L1.2 features with __pci_enable_link_state(),
> > > it configs the link directly without ensuring related L1.2 parameters, such
> > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > > programmed.
> > >
> > > This leads the link's L1.2 between PCIe Root Port and child device gets
> > > wrong configs when a caller tries to enabled it.
> > >
> > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > >
> > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe
> > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > ...
> > > Capabilities: [200 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > L1_PM_Substates+
> > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > L1SubCtl2: T_PwrOn=50us
> > >
> > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550
> > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > ...
> > > Capabilities: [900 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > L1_PM_Substates+
> > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > L1SubCtl2: T_PwrOn=10us
> > >
> > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
> > > Root Port and the child NVMe, they should be programmed with the same
> > > LTR1.2_Threshold value. However, they have different values in this case.
> > >
> > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
> > > enable L1.2 bits of L1 PM Substates Control Register in
> > > __pci_enable_link_state().
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > ---
> > > v2:
> > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > >
> > > v3:
> > > - Only enable supported features for the L1 Substates part
> > >
> > > v4:
> > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
> > >
> > > v5:
> > > - Fix typo and commit message
> > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > aspm_get_l1ss_cap()"
> > >
> > > v6:
> > > - Skipped
> > >
> > > v7:
> > > - Pick back and rebase on the new version kernel
> > > - Drop the link state flag check. And, always config link state's timing
> > > parameters
> > >
> > > v8:
> > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > getting the link's parent and child devices after check the link is
> > > not NULL. This avoids NULL memory access.
> > >
> > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > 1 file changed, 15 insertions(+)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 5db1044c9895..55ff1d26fcea 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool
> > > locked)
> > > {
> > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > + struct pci_dev *parent, *child;
> > >
> > > if (!link)
> > > return -EINVAL;
> > > +
> > > + parent = link->pdev;
> > > + child = link->downstream;
> > > +
> > > /*
> > > * A driver requested that ASPM be enabled on this device, but
> > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev
> > > *pdev, int state, bool locked)
> > > if (!locked)
> > > down_read(&pci_bus_sem);
> > > mutex_lock(&aspm_lock);
> > > + /*
> > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > + */
> > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
>
> I still don't think this is the place to recalculate the L1.2 parameters
> especially when we know the calculation was done but was cleared by
> pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> pci_bus_reset() in vmd.c?
I had not thought about calling pci_save/restore_state() around
pci_bus_reset() before. It is an interesting direction.
So, I prepared the modification below for testing, together with
"[PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's
L1 substates". With both applied, the PCIe bridge and the PCIe device
have the same LTR_L1.2_THRESHOLD, 101376ns, as expected.
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index bbf4a47e7b31..6b8dd4f30127 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
 	vmd_bridge->native_dpc = root_bridge->native_dpc;
 }
 
+static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
+{
+	pci_save_state(pdev);
+	return 0;
+}
+
+static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
+{
+	pci_restore_state(pdev);
+	return 0;
+}
+
 /*
  * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
  */
@@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	pci_scan_child_bus(vmd->bus);
 	vmd_domain_reset(vmd);
 
+	pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
 	/* When Intel VMD is enabled, the OS does not discover the Root Ports
 	 * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
 	 * a reset to the parent of the PCI device supplied as argument. This
@@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 			break;
 		}
 	}
+	pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
 
 	pci_assign_unassigned_bus_resources(vmd->bus);
Jian-Hong Pan
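As a cross-check of the 101376ns figure, the arithmetic can be sketched
in userspace. This is a simplified model of how I understand the
threshold to be derived and encoded (2us for T(POWER_OFF), 4us for
T(L1.2), the larger of the two Common_Mode_Restore_Times, the larger of
the two T_POWER_ON times, then rounding up to a 10-bit value with a
power-of-32 scale). All names here are hypothetical and none of this is
the kernel's code.

```c
#include <stdint.h>

/* Hypothetical container for the encoded LTR_L1.2_THRESHOLD fields. */
struct l12_threshold {
	uint32_t scale;	/* unit selector: one unit is 2^(5 * scale) ns */
	uint32_t value;	/* 10-bit count of units */
};

/* Size in ns of one unit for a given scale: 1, 32, 1024, 32768, ... */
static uint64_t l12_unit_ns(uint32_t scale)
{
	return 1ULL << (5 * scale);
}

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Threshold in us from both ports' advertised times (all in us). */
static uint32_t l12_threshold_us(uint32_t parent_cm, uint32_t child_cm,
				 uint32_t parent_pwr, uint32_t child_pwr)
{
	return 2 + 4 + max_u32(parent_cm, child_cm) +
	       max_u32(parent_pwr, child_pwr);
}

/* Pick the smallest scale whose 10-bit range covers the threshold. */
static struct l12_threshold l12_encode(uint32_t threshold_us)
{
	const uint64_t max_value = 0x3ff;
	uint64_t ns = (uint64_t)threshold_us * 1000;
	struct l12_threshold t = { 5, 0x3ff };	/* saturate by default */
	uint32_t scale;

	for (scale = 0; scale <= 5; scale++) {
		uint64_t unit = l12_unit_ns(scale);

		if (ns <= unit * max_value) {
			t.scale = scale;
			t.value = (uint32_t)((ns + unit - 1) / unit);
			break;
		}
	}
	return t;
}

static uint64_t l12_decode_ns(struct l12_threshold t)
{
	return (uint64_t)t.value * l12_unit_ns(t.scale);
}
```

With the values from the lspci output (Common_Mode_Restore_Time 45us and
32us, T_POWER_ON 50us and 10us), the threshold is 2 + 4 + 45 + 50 =
101us. That is 101000ns, which rounds up to 99 units of 1024ns, that is,
101376ns, matching the value seen on the Root Port.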
> > > +
> > > link->aspm_default = pci_calc_aspm_enable_mask(state);
> > > pcie_config_aspm_link(link, policy_to_aspm_state(link));
> > >
> > > --
> > > 2.45.2
> > >
> >
> > Hi Nirmal and Paul,
> >
> > It will be great to have your review here.
> >
> > I had tried to "set the threshold value in vmd_pm_enable_quirk()"
> > directly as Paul said [1]. However, it still needs to get the PCIe
> > link from the PCIe device to set the threshold value.
> > And, pci_enable_link_state_locked() gets the link. Then, it will be
> > great to calculate and program L1 sub-states' parameters properly
> > before configuring the link's ASPM there.
> >
> > [1]:
> > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
> >
> > Jian-Hong Pan
>
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-07 10:05 ` Jian-Hong Pan
@ 2024-08-07 11:18 ` Ilpo Järvinen
2024-08-07 23:27 ` David E. Box
0 siblings, 1 reply; 22+ messages in thread
From: Ilpo Järvinen @ 2024-08-07 11:18 UTC (permalink / raw)
To: Jian-Hong Pan
Cc: david.e.box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> On Tue, Aug 6, 2024 at 4:26 AM David E. Box <david.e.box@linux.intel.com> wrote:
> >
> > Hi Jian-Hong,
> >
> > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > On Fri, Jul 19, 2024 at 4:04 PM Jian-Hong Pan <jhp@endlessos.org> wrote:
> > > >
> > > > Currently, when enable link's L1.2 features with __pci_enable_link_state(),
> > > > it configs the link directly without ensuring related L1.2 parameters, such
> > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > > > programmed.
> > > >
> > > > This leads the link's L1.2 between PCIe Root Port and child device gets
> > > > wrong configs when a caller tries to enabled it.
> > > >
> > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > >
> > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe
> > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > ...
> > > > Capabilities: [200 v1] L1 PM Substates
> > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > L1_PM_Substates+
> > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > L1SubCtl2: T_PwrOn=50us
> > > >
> > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550
> > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > ...
> > > > Capabilities: [900 v1] L1 PM Substates
> > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > L1_PM_Substates+
> > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > L1SubCtl2: T_PwrOn=10us
> > > >
> > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
> > > > Root Port and the child NVMe, they should be programmed with the same
> > > > LTR1.2_Threshold value. However, they have different values in this case.
> > > >
> > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
> > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > __pci_enable_link_state().
> > > >
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > ---
> > > > v2:
> > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > >
> > > > v3:
> > > > - Only enable supported features for the L1 Substates part
> > > >
> > > > v4:
> > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
> > > >
> > > > v5:
> > > > - Fix typo and commit message
> > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > aspm_get_l1ss_cap()"
> > > >
> > > > v6:
> > > > - Skipped
> > > >
> > > > v7:
> > > > - Pick back and rebase on the new version kernel
> > > > - Drop the link state flag check. And, always config link state's timing
> > > > parameters
> > > >
> > > > v8:
> > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > getting the link's parent and child devices after check the link is
> > > > not NULL. This avoids NULL memory access.
> > > >
> > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > 1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > --- a/drivers/pci/pcie/aspm.c
> > > > +++ b/drivers/pci/pcie/aspm.c
> > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool
> > > > locked)
> > > > {
> > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > + struct pci_dev *parent, *child;
> > > >
> > > > if (!link)
> > > > return -EINVAL;
> > > > +
> > > > + parent = link->pdev;
> > > > + child = link->downstream;
> > > > +
> > > > /*
> > > > * A driver requested that ASPM be enabled on this device, but
> > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev
> > > > *pdev, int state, bool locked)
> > > > if (!locked)
> > > > down_read(&pci_bus_sem);
> > > > mutex_lock(&aspm_lock);
> > > > + /*
> > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > + */
> > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> >
> > I still don't think this is the place to recalculate the L1.2 parameters
> > especially when we know the calculation was done but was cleared by
> > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > pci_bus_reset() in vmd.c?
>
> I have not thought pci_save/restore_state() around pci_bus_reset()
> before. It is an interesting direction.
>
> So, I prepare modification below for test. Include "[PATCH v8 1/4]
> PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> too. Then, both the PCIe bridge and the PCIe device have the same
> LTR_L1.2_THRESHOLD 101376ns as expected.
>
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index bbf4a47e7b31..6b8dd4f30127 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> pci_host_bridge *root_bridge,
> vmd_bridge->native_dpc = root_bridge->native_dpc;
> }
>
> +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> +{
> + pci_save_state(pdev);
> + return 0;
> +}
> +
> +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> +{
> + pci_restore_state(pdev);
> + return 0;
> +}
> +
> /*
> * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> */
> @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> unsigned long features)
> pci_scan_child_bus(vmd->bus);
> vmd_domain_reset(vmd);
>
> + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> /* When Intel VMD is enabled, the OS does not discover the Root Ports
> * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
> * a reset to the parent of the PCI device supplied as argument. This
> @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> unsigned long features)
> break;
> }
> }
> + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
Why not call pci_reset_bus() (or __pci_reset_bus()) then in
vmd_enable_domain(), which preserves state, unlike pci_bus_reset()?
(Don't tell me naming of these functions is a horrible mess. :-/)
--
i.
>
> pci_assign_unassigned_bus_resources(vmd->bus);
>
>
> Jian-Hong Pan
>
> > > > +
> > > > link->aspm_default = pci_calc_aspm_enable_mask(state);
> > > > pcie_config_aspm_link(link, policy_to_aspm_state(link));
> > > >
> > > > --
> > > > 2.45.2
> > > >
> > >
> > > Hi Nirmal and Paul,
> > >
> > > It will be great to have your review here.
> > >
> > > I had tried to "set the threshold value in vmd_pm_enable_quirk()"
> > > directly as Paul said [1]. However, it still needs to get the PCIe
> > > link from the PCIe device to set the threshold value.
> > > And, pci_enable_link_state_locked() gets the link. Then, it will be
> > > great to calculate and program L1 sub-states' parameters properly
> > > before configuring the link's ASPM there.
> > >
> > > [1]:
> > > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124
> > >
> > > Jian-Hong Pan
> >
>
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-07 11:18 ` Ilpo Järvinen
@ 2024-08-07 23:27 ` David E. Box
2024-08-08 9:48 ` Ilpo Järvinen
0 siblings, 1 reply; 22+ messages in thread
From: David E. Box @ 2024-08-07 23:27 UTC (permalink / raw)
To: Ilpo Järvinen, Jian-Hong Pan
Cc: Bjorn Helgaas, Johan Hovold, Kuppuswamy Sathyanarayanan,
Mika Westerberg, Damien Le Moal, Nirmal Patel, Jonathan Derrick,
Paul M Stillwell Jr, linux-pci, LKML, linux
On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
>
> > > On Tue, Aug 6, 2024 at 4:26 AM David E. Box <david.e.box@linux.intel.com> wrote:
> > >
> > > Hi Jian-Hong,
> > >
> > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > On Fri, Jul 19, 2024 at 4:04 PM Jian-Hong Pan <jhp@endlessos.org> wrote:
> > > > >
> > > > > Currently, when enable link's L1.2 features with
> > > > > __pci_enable_link_state(),
> > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > such
> > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > been
> > > > > programmed.
> > > > >
> > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > gets
> > > > > wrong configs when a caller tries to enabled it.
> > > > >
> > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > >
> > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > PCIe
> > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > ...
> > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > L1_PM_Substates+
> > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > L1SubCtl2: T_PwrOn=50us
> > > > >
> > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > SN550
> > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > ...
> > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > L1_PM_Substates+
> > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > L1SubCtl2: T_PwrOn=10us
> > > > >
> > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > PCIe
> > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > case.
> > > > >
> > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > before
> > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > __pci_enable_link_state().
> > > > >
> > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > ---
> > > > > v2:
> > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > >
> > > > > v3:
> > > > > - Only enable supported features for the L1 Substates part
> > > > >
> > > > > v4:
> > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > L1SS
> > > > >
> > > > > v5:
> > > > > - Fix typo and commit message
> > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > aspm_get_l1ss_cap()"
> > > > >
> > > > > v6:
> > > > > - Skipped
> > > > >
> > > > > v7:
> > > > > - Pick back and rebase on the new version kernel
> > > > > - Drop the link state flag check. And, always config link state's
> > > > > timing
> > > > > parameters
> > > > >
> > > > > v8:
> > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > getting the link's parent and child devices after check the link is
> > > > > not NULL. This avoids NULL memory access.
> > > > >
> > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > 1 file changed, 15 insertions(+)
> > > > >
> > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > bool
> > > > > locked)
> > > > > {
> > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > + struct pci_dev *parent, *child;
> > > > >
> > > > > if (!link)
> > > > > return -EINVAL;
> > > > > +
> > > > > + parent = link->pdev;
> > > > > + child = link->downstream;
> > > > > +
> > > > > /*
> > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > pci_dev
> > > > > *pdev, int state, bool locked)
> > > > > if (!locked)
> > > > > down_read(&pci_bus_sem);
> > > > > mutex_lock(&aspm_lock);
> > > > > + /*
> > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > T_POWER_ON and
> > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > bits for
> > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > + */
> > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > >
> > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > especially when we know the calculation was done but was cleared by
> > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > pci_bus_reset() in vmd.c?
> >
> > I have not thought pci_save/restore_state() around pci_bus_reset()
> > before. It is an interesting direction.
> >
> > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > too. Then, both the PCIe bridge and the PCIe device have the same
> > LTR_L1.2_THRESHOLD 101376ns as expected.
> >
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index bbf4a47e7b31..6b8dd4f30127 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > pci_host_bridge *root_bridge,
> > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > }
> >
> > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > +{
> > + pci_save_state(pdev);
> > + return 0;
> > +}
> > +
> > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > +{
> > + pci_restore_state(pdev);
> > + return 0;
> > +}
> > +
> > /*
> > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > */
> > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > unsigned long features)
> > pci_scan_child_bus(vmd->bus);
> > vmd_domain_reset(vmd);
> >
> > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > /* When Intel VMD is enabled, the OS does not discover the Root
> > Ports
> > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > applies
> > * a reset to the parent of the PCI device supplied as argument.
> > This
> > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > unsigned long features)
> > break;
> > }
> > }
> > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
>
> Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> vmd_enable_domain(), which preserves state, unlike pci_bus_reset()?
>
> (Don't tell me naming of these functions is a horrible mess. :-/)
>
Hmm. So this *is* calling pci_reset_bus(). L1.2 configuration has specific
ordering requirements for changes to parent & child devices. Could be why it's
not getting restored properly.
David
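A rough userspace sketch of the ordering in question, as I understand
the rules: disable L1.2 at the child first and then at the parent,
program the timing parameters only while L1.2 is disabled on both ends,
then enable at the parent first and the child last. struct port and
both helpers are hypothetical, and the wrote_while_enabled flag exists
only so the model can detect a violation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one end of the link. */
struct port {
	bool l12_enabled;
	uint32_t ltr_threshold_ns;
	bool wrote_while_enabled;	/* set on a write with L1.2 on */
};

/* Record a violation if the parameter changes while L1.2 is enabled. */
static void set_threshold(struct port *p, uint32_t ns)
{
	if (p->l12_enabled)
		p->wrote_while_enabled = true;
	p->ltr_threshold_ns = ns;
}

static void program_l12(struct port *parent, struct port *child,
			uint32_t ns)
{
	/* Disable: child first, then parent. */
	child->l12_enabled = false;
	parent->l12_enabled = false;

	/* Program both ends while L1.2 is off everywhere. */
	set_threshold(parent, ns);
	set_threshold(child, ns);

	/* Enable: parent first, then child. */
	parent->l12_enabled = true;
	child->l12_enabled = true;
}
```

A restore that writes each device's saved registers back one device at
a time does not necessarily follow this sequence, which would be
consistent with the mismatch seen here.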
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-07 23:27 ` David E. Box
@ 2024-08-08 9:48 ` Ilpo Järvinen
2024-08-12 8:18 ` Jian-Hong Pan
0 siblings, 1 reply; 22+ messages in thread
From: Ilpo Järvinen @ 2024-08-08 9:48 UTC (permalink / raw)
To: David E. Box
Cc: Jian-Hong Pan, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
On Wed, 7 Aug 2024, David E. Box wrote:
> On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> >
> > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道:
> > > >
> > > > Hi Jian-Hong,
> > > >
> > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道:
> > > > > >
> > > > > > Currently, when enable link's L1.2 features with
> > > > > > __pci_enable_link_state(),
> > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > such
> > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > been
> > > > > > programmed.
> > > > > >
> > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > gets
> > > > > > wrong configs when a caller tries to enabled it.
> > > > > >
> > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > >
> > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > PCIe
> > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > ...
> > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > L1_PM_Substates+
> > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > >
> > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > SN550
> > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > ...
> > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > L1_PM_Substates+
> > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > >
> > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > PCIe
> > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > case.
> > > > > >
> > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > before
> > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > __pci_enable_link_state().
> > > > > >
> > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > ---
> > > > > > v2:
> > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > >
> > > > > > v3:
> > > > > > - Only enable supported features for the L1 Substates part
> > > > > >
> > > > > > v4:
> > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > L1SS
> > > > > >
> > > > > > v5:
> > > > > > - Fix typo and commit message
> > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > aspm_get_l1ss_cap()"
> > > > > >
> > > > > > v6:
> > > > > > - Skipped
> > > > > >
> > > > > > v7:
> > > > > > - Pick back and rebase on the new version kernel
> > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > timing
> > > > > > parameters
> > > > > >
> > > > > > v8:
> > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > getting the link's parent and child devices after check the link is
> > > > > > not NULL. This avoids NULL memory access.
> > > > > >
> > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > 1 file changed, 15 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > bool
> > > > > > locked)
> > > > > > {
> > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > + struct pci_dev *parent, *child;
> > > > > >
> > > > > > if (!link)
> > > > > > return -EINVAL;
> > > > > > +
> > > > > > + parent = link->pdev;
> > > > > > + child = link->downstream;
> > > > > > +
> > > > > > /*
> > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > pci_dev
> > > > > > *pdev, int state, bool locked)
> > > > > > if (!locked)
> > > > > > down_read(&pci_bus_sem);
> > > > > > mutex_lock(&aspm_lock);
> > > > > > + /*
> > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > T_POWER_ON and
> > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > bits for
> > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > + */
> > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > >
> > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > especially when know the calculation was done but was cleared by
> > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > pci_bus_reset() in vmd.c?
> > >
> > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > before. It is an interesting direction.
> > >
> > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > >
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > pci_host_bridge *root_bridge,
> > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > }
> > >
> > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > +{
> > > + pci_save_state(pdev);
> > > + return 0;
> > > +}
> > > +
> > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > +{
> > > + pci_restore_state(pdev);
> > > + return 0;
> > > +}
> > > +
> > > /*
> > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > */
> > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > unsigned long features)
> > > pci_scan_child_bus(vmd->bus);
> > > vmd_domain_reset(vmd);
> > >
> > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > Ports
> > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > applies
> > > * a reset to the parent of the PCI device supplied as argument.
> > > This
> > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > unsigned long features)
> > > break;
> > > }
> > > }
> > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> >
> > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> >
> > (Don't tell me naming of these functions is a horrible mess. :-/)
>
> Hmm. So this *is* calling pci_reset_bus().
Yeah, I managed to get confused by the names myself, I somehow
ended up thinking it calls pci_bus_reset() which is not correct...
> L1.2 configuration has specific
> ordering requirements for changes to parent & child devices. Could be why it's
> not getting restored properly.
Indeed, it has to be something else since the patch above doesn't even
restore anything because dev->state_saved should get set to false by the
first pci_restore_state() called from
__pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
think!?
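
To illustrate the state_saved point, here is a minimal userspace sketch. It
assumes pci_save_state() sets dev->state_saved, and pci_restore_state() does
nothing unless the flag is set and clears it once it has written the state
back. The struct and the model_*() helpers are made up for illustration and
are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct pci_dev fields involved. */
struct model_dev {
	uint32_t reg;		/* one config register, e.g. L1SS Control 1 */
	uint32_t saved_reg;	/* copy taken by the last save */
	bool state_saved;
};

/* Models pci_save_state(): snapshot the register and set the flag. */
static void model_save_state(struct model_dev *d)
{
	d->saved_reg = d->reg;
	d->state_saved = true;
}

/*
 * Models pci_restore_state(): no-op unless a snapshot is pending, and the
 * flag is cleared after the write-back. Returns true if it restored.
 */
static bool model_restore_state(struct model_dev *d)
{
	if (!d->state_saved)
		return false;
	d->reg = d->saved_reg;
	d->state_saved = false;
	return true;
}

/* Models the save -> reset -> restore sequence done inside the bus reset. */
static void model_reset_bus(struct model_dev *d)
{
	model_save_state(d);
	d->reg = 0;		/* the reset clears the register */
	model_restore_state(d);
}
```

For a device that is covered by the reset's own save/restore, an outer
model_save_state() before model_reset_bus() leaves nothing for an outer
model_restore_state() to do afterwards: the flag is already false.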
--
i.
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-08 9:48 ` Ilpo Järvinen
@ 2024-08-12 8:18 ` Jian-Hong Pan
2024-09-02 15:43 ` Ilpo Järvinen
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-08-12 8:18 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道:
>
> On Wed, 7 Aug 2024, David E. Box wrote:
>
> > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > >
> > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道:
> > > > >
> > > > > Hi Jian-Hong,
> > > > >
> > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道:
> > > > > > >
> > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > __pci_enable_link_state(),
> > > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > > such
> > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > > been
> > > > > > > programmed.
> > > > > > >
> > > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > > gets
> > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > >
> > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > >
> > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > > PCIe
> > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > ...
> > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > > L1_PM_Substates+
> > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > >
> > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > > SN550
> > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > ...
> > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > L1_PM_Substates+
> > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > >
> > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > > PCIe
> > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > > case.
> > > > > > >
> > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > > before
> > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > __pci_enable_link_state().
> > > > > > >
> > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > ---
> > > > > > > v2:
> > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > >
> > > > > > > v3:
> > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > >
> > > > > > > v4:
> > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > > L1SS
> > > > > > >
> > > > > > > v5:
> > > > > > > - Fix typo and commit message
> > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > aspm_get_l1ss_cap()"
> > > > > > >
> > > > > > > v6:
> > > > > > > - Skipped
> > > > > > >
> > > > > > > v7:
> > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > > timing
> > > > > > > parameters
> > > > > > >
> > > > > > > v8:
> > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > getting the link's parent and child devices after check the link is
> > > > > > > not NULL. This avoids NULL memory access.
> > > > > > >
> > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > 1 file changed, 15 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > > bool
> > > > > > > locked)
> > > > > > > {
> > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > + struct pci_dev *parent, *child;
> > > > > > >
> > > > > > > if (!link)
> > > > > > > return -EINVAL;
> > > > > > > +
> > > > > > > + parent = link->pdev;
> > > > > > > + child = link->downstream;
> > > > > > > +
> > > > > > > /*
> > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > > pci_dev
> > > > > > > *pdev, int state, bool locked)
> > > > > > > if (!locked)
> > > > > > > down_read(&pci_bus_sem);
> > > > > > > mutex_lock(&aspm_lock);
> > > > > > > + /*
> > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > > T_POWER_ON and
> > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > > bits for
> > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > + */
> > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > >
> > > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > > especially when know the calculation was done but was cleared by
> > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > pci_bus_reset() in vmd.c?
> > > >
> > > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > > before. It is an interesting direction.
> > > >
> > > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > > >
> > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > --- a/drivers/pci/controller/vmd.c
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > > pci_host_bridge *root_bridge,
> > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > }
> > > >
> > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > +{
> > > > + pci_save_state(pdev);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > +{
> > > > + pci_restore_state(pdev);
> > > > + return 0;
> > > > +}
> > > > +
> > > > /*
> > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > */
> > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > unsigned long features)
> > > > pci_scan_child_bus(vmd->bus);
> > > > vmd_domain_reset(vmd);
> > > >
> > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > > Ports
> > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > > applies
> > > > * a reset to the parent of the PCI device supplied as argument.
> > > > This
> > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > unsigned long features)
> > > > break;
> > > > }
> > > > }
> > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > >
> > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > >
> > > (Don't tell me naming of these functions is a horrible mess. :-/)
> >
> > Hmm. So this *is* calling pci_reset_bus().
>
> Yeah, I managed to get confused by the names myself, I somehow
> ended up thinking it calls pci_bus_reset() which is not correct...
>
> > L1.2 configuration has specific
> > ordering requirements for changes to parent & child devices. Could be why it's
> > not getting restored properly.
>
> Indeed, it has to be something else since the patch above doesn't even
> restore anything because dev->state_saved should get set to false by the
> first pci_restore_state() called from
> __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> think!?
Inspired by Ilpo's comment, I added some debug messages on top of
linux-next tag 'next-20240809' to trace the code path of
pci_reset_bus():
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ffaaca0978cb..3ee71374f1de 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
* races with ->remove() by the device lock, which must be held by
* the caller.
*/
- if (err_handler && err_handler->reset_prepare)
+ if (err_handler && err_handler->reset_prepare) {
+ pci_info(dev, "%s: %pF\n", __func__, err_handler->reset_prepare);
err_handler->reset_prepare(dev);
+ }
/*
* Wake-up device prior to save. PM registers default to D0 after
@@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
pci_set_power_state(dev, PCI_D0);
pci_save_state(dev);
+ pci_info(dev, "%s: PCI state_saved is %s\n", __func__, dev->state_saved ? "true" : "false");
/*
* Disable the device by clearing the Command register, except for
* INTx-disable which is set. This not only disables MMIO and I/O port
@@ -5655,6 +5658,10 @@ static void pci_bus_save_and_disable_locked(struct pci_bus *bus)
struct pci_dev *dev;
list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
+ __func__,
+ dev->state_saved ? "true" : "false",
+ dev->subordinate ? "has" : "does not have");
pci_dev_save_and_disable(dev);
if (dev->subordinate)
pci_bus_save_and_disable_locked(dev->subordinate);
@@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
struct pci_dev *dev;
list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
+ __func__,
+ dev->state_saved ? "true" : "false",
+ dev->subordinate ? "has" : "does not have");
pci_dev_restore(dev);
if (dev->subordinate)
pci_bus_restore_locked(dev->subordinate);
@@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
if (!bus->self || !pci_bus_resettable(bus))
return -ENOTTY;
- if (probe)
+ if (probe) {
+ pci_info(bus->self, "%s: probe is true. So return 0 directly", __func__);
return 0;
+ }
pci_bus_lock(bus);
@@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
int rc;
rc = pci_bus_reset(bus, PCI_RESET_PROBE);
+ pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
if (rc)
return rc;
if (pci_bus_trylock(bus)) {
+ pci_info(bus->self, "%s: pci_bus_trylock() returns true\n", __func__);
pci_bus_save_and_disable_locked(bus);
might_sleep();
rc = pci_bridge_secondary_bus_reset(bus->self);
@@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
*/
int pci_reset_bus(struct pci_dev *pdev)
{
+ pci_info(pdev, "%s: %s", __func__, !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
return (!pci_probe_reset_slot(pdev->slot)) ?
__pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
}
Here is the information for the VMD PCIe devices with the built kernel:
10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode])
...
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=0us
10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
...
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=101376ns
L1SubCtl2: T_PwrOn=50us
We can see the NVMe has the expected LTR1.2_Threshold=101376ns, but the
PCIe bridge has LTR1.2_Threshold=0ns.
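
For reference, 101376ns can be reproduced from the capability values in the
lspci output. This is a minimal sketch, assuming the threshold is computed as
2us + 4us + the larger Common_Mode_Restore_Time + the larger T_POWER_ON of the
two ends of the link, and then rounded up into the 10-bit value / 3-bit scale
encoding of LTR_L1.2_THRESHOLD. As far as I understand, this mirrors what
aspm_calc_l12_info() does, but the l12_*() helpers below are hypothetical
names for illustration, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Multipliers in ns selected by the 3-bit LTR_L1.2_THRESHOLD_Scale field. */
static const uint64_t l12_scale_ns[] = { 1, 32, 1024, 32768, 1048576, 33554432 };

#define L12_VALUE_MAX 0x3ffu	/* LTR_L1.2_THRESHOLD_Value is 10 bits wide */

struct l12_threshold {
	uint32_t scale;
	uint32_t value;
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Assumed formula: 2us + 4us + max(T_CommonMode) + max(T_PwrOn). */
static uint32_t l12_threshold_us(uint32_t parent_cm_us, uint32_t child_cm_us,
				 uint32_t parent_pwr_us, uint32_t child_pwr_us)
{
	return 2 + 4 + max_u32(parent_cm_us, child_cm_us) +
	       max_u32(parent_pwr_us, child_pwr_us);
}

/* Pick the smallest scale whose 10-bit value can hold the threshold. */
static struct l12_threshold l12_encode(uint32_t threshold_us)
{
	uint64_t ns = (uint64_t)threshold_us * 1000;
	/* Sketch-only choice: saturate if nothing fits. */
	struct l12_threshold t = { 5, L12_VALUE_MAX };
	uint32_t s;

	for (s = 0; s < 6; s++) {
		if (ns <= l12_scale_ns[s] * L12_VALUE_MAX) {
			t.scale = s;
			/* Round up to the next multiple of the scale. */
			t.value = (uint32_t)((ns + l12_scale_ns[s] - 1) /
					     l12_scale_ns[s]);
			break;
		}
	}
	return t;
}

/* What lspci would print as LTR1.2_Threshold, in ns. */
static uint64_t l12_decode_ns(struct l12_threshold t)
{
	return (uint64_t)t.value * l12_scale_ns[t.scale];
}
```

With the bridge's 45us/50us and the NVMe's 32us/10us, this gives 101us, which
encodes as value 99 at scale 1024ns, i.e. 99 * 1024ns = 101376ns. That is the
value both ends of the link are expected to carry.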
Then I checked dmesg and noticed these debug messages:
pci 10000:e0:06.0: PCI bridge to [bus e1]
pci 10000:e0:06.0: Primary bus is hard wired to 0
pci 10000:e1:00.0: pci_reset_bus: false
pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is false, and does not have subordinate
pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
Freeing initrd memory: 75236K
pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, and does not have subordinate
So, the code path is:
vmd_enable_domain()
  pci_reset_bus()
    __pci_reset_bus()
      pci_bus_reset()
      pci_bus_save_and_disable_locked()
        pci_dev_save_and_disable()
      pci_bus_restore_locked()
        pci_dev_restore()
The debug messages show that only NVMe 10000:e1:00.0 goes through
pci_save_state()/pci_restore_state(). The PCIe bridge 10000:e0:06.0 does
not, so its state is not restored correctly.
Besides, the bus being reset is NVMe 10000:e1:00.0's bus [e1], not the VMD's
bus in vmd_enable_domain().
* Bus "e1" has only NVMe 10000:e1:00.0
* VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
10000:e1:00.0 and SATA Controller 10000:e0:17.0.
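
The difference between the two buses can be sketched as below. This is a
simplified userspace model, assuming the save walk visits every device on the
given bus and recurses into dev->subordinate, as
pci_bus_save_and_disable_locked() does in the diff above. The topo_* types
and topo_bus_save() are hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct topo_bus;

/* Hypothetical stand-in for struct pci_dev. */
struct topo_dev {
	const char *name;
	bool state_saved;
	struct topo_bus *subordinate;	/* non-NULL only for bridges */
};

/* Hypothetical stand-in for struct pci_bus. */
struct topo_bus {
	struct topo_dev **devices;
	size_t ndevices;
};

/*
 * Save every device on @bus and on the buses below it. The bridge leading
 * to @bus sits on the parent bus, so it is never visited from here.
 */
static void topo_bus_save(struct topo_bus *bus)
{
	size_t i;

	for (i = 0; i < bus->ndevices; i++) {
		struct topo_dev *dev = bus->devices[i];

		dev->state_saved = true;
		if (dev->subordinate)
			topo_bus_save(dev->subordinate);
	}
}
```

Starting the walk at bus [e1] only reaches the NVMe. Starting at the VMD bus
reaches the bridge, the SATA controller and, through the bridge's subordinate
bus, the NVMe as well.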
Here is the PCI tree:
-+-[0000:00]-+-00.0 Intel Corporation Device 9a04
 |           +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
 |           +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
 |           +-06.0 Intel Corporation RST VMD Managed Controller
 |           +-07.0-[01-2b]--
 |           +-08.0 Intel Corporation GNA Scoring Accelerator module
 |           +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver
 |           +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller
 |           +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
 |           +-0e.0 Intel Corporation Volume Management Device NVMe RAID Controller
 |           +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
 |           +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
 |           +-14.3 Intel Corporation Wi-Fi 6 AX201
 |           +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
 |           +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
 |           +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface
 |           +-17.0 Intel Corporation RST VMD Managed Controller
 |           +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller
 |           +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
 |           +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
 |           +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
 |           \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
 \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
              \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
Given the findings above, to ensure the devices on the VMD bus have the
correct state, calling pci_save_state() on all the devices before
pci_reset_bus() and pci_restore_state() on all of them after
pci_reset_bus() seems to be the correct answer.
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-12 8:18 ` Jian-Hong Pan
@ 2024-09-02 15:43 ` Ilpo Järvinen
2024-09-03 10:31 ` Jian-Hong Pan
2024-09-23 8:41 ` Jian-Hong Pan
2024-09-03 15:17 ` Nirmal Patel
2024-09-20 16:03 ` Nirmal Patel
2 siblings, 2 replies; 22+ messages in thread
From: Ilpo Järvinen @ 2024-09-02 15:43 UTC (permalink / raw)
To: Jian-Hong Pan
Cc: David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
On Mon, 12 Aug 2024, Jian-Hong Pan wrote:
> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道:
> > On Wed, 7 Aug 2024, David E. Box wrote:
> > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > >
> > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道:
> > > > > >
> > > > > > Hi Jian-Hong,
> > > > > >
> > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道:
> > > > > > > >
> > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > __pci_enable_link_state(),
> > > > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > > > such
> > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > > > been
> > > > > > > > programmed.
> > > > > > > >
> > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > > > gets
> > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > >
> > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > > >
> > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > > > PCIe
> > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > ...
> > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > > > L1_PM_Substates+
> > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > > >
> > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > > > SN550
> > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > ...
> > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > L1_PM_Substates+
> > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > > >
> > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > > > PCIe
> > > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > > > case.
> > > > > > > >
> > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > > > before
> > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > __pci_enable_link_state().
> > > > > > > >
> > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > ---
> > > > > > > > v2:
> > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > > >
> > > > > > > > v3:
> > > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > > >
> > > > > > > > v4:
> > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > > > L1SS
> > > > > > > >
> > > > > > > > v5:
> > > > > > > > - Fix typo and commit message
> > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > > aspm_get_l1ss_cap()"
> > > > > > > >
> > > > > > > > v6:
> > > > > > > > - Skipped
> > > > > > > >
> > > > > > > > v7:
> > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > > > timing
> > > > > > > > parameters
> > > > > > > >
> > > > > > > > v8:
> > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > > getting the link's parent and child devices after check the link is
> > > > > > > > not NULL. This avoids NULL memory access.
> > > > > > > >
> > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > > > bool
> > > > > > > > locked)
> > > > > > > > {
> > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > >
> > > > > > > > if (!link)
> > > > > > > > return -EINVAL;
> > > > > > > > +
> > > > > > > > + parent = link->pdev;
> > > > > > > > + child = link->downstream;
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > > > pci_dev
> > > > > > > > *pdev, int state, bool locked)
> > > > > > > > if (!locked)
> > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > + /*
> > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > > > T_POWER_ON and
> > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > > > bits for
> > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > + */
> > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > >
> > > > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > > > especially when know the calculation was done but was cleared by
> > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > > pci_bus_reset() in vmd.c?
> > > > >
> > > > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > > > before. It is an interesting direction.
> > > > >
> > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > > > >
> > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > --- a/drivers/pci/controller/vmd.c
> > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > > > pci_host_bridge *root_bridge,
> > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > }
> > > > >
> > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > +{
> > > > > + pci_save_state(pdev);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > +{
> > > > > + pci_restore_state(pdev);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > /*
> > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > */
> > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > unsigned long features)
> > > > > pci_scan_child_bus(vmd->bus);
> > > > > vmd_domain_reset(vmd);
> > > > >
> > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > > > Ports
> > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > > > applies
> > > > > * a reset to the parent of the PCI device supplied as argument.
> > > > > This
> > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > unsigned long features)
> > > > > break;
> > > > > }
> > > > > }
> > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > >
> > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > > >
> > > > (Don't tell me naming of these functions is a horrible mess. :-/)
> > >
> > > Hmm. So this *is* calling pci_reset_bus().
> >
> > Yeah, I managed to get confused by the names myself, I somehow
> > ended up thinking it calls pci_bus_reset() which is not correct...
> >
> > > L1.2 configuration has specific
> > > ordering requirements for changes to parent & child devices. Could be why it's
> > > not getting restored properly.
> >
> > Indeed, it has to be something else since the patch above doesn't even
> > restore anything because dev->state_saved should get set to false by the
> > first pci_restore_state() called from
> > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> > think!?
>
> Inspired by Ilpo's comment. I add some debug messages based on
> linux-next's tag 'next-20240809' to understand the code path of
> pci_reset_bus():
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ffaaca0978cb..3ee71374f1de 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> * races with ->remove() by the device lock, which must be held by
> * the caller.
> */
> - if (err_handler && err_handler->reset_prepare)
> + if (err_handler && err_handler->reset_prepare) {
> + pci_info(dev, "%s: %pF\n", __func__,
> err_handler->reset_prepare);
> err_handler->reset_prepare(dev);
> + }
>
> /*
> * Wake-up device prior to save. PM registers default to D0 after
> @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> pci_set_power_state(dev, PCI_D0);
>
> pci_save_state(dev);
> + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> dev->state_saved ? "true" : "false");
> /*
> * Disable the device by clearing the Command register, except for
> * INTx-disable which is set. This not only disables MMIO and I/O port
> @@ -5655,6 +5658,10 @@ static void
> pci_bus_save_and_disable_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_save_and_disable(dev);
> if (dev->subordinate)
> pci_bus_save_and_disable_locked(dev->subordinate);
> @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_restore(dev);
> if (dev->subordinate)
> pci_bus_restore_locked(dev->subordinate);
> @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> if (!bus->self || !pci_bus_resettable(bus))
> return -ENOTTY;
>
> - if (probe)
> + if (probe) {
> + pci_info(bus->self, "%s: probe is true. So return 0
> directly", __func__);
> return 0;
> + }
>
> pci_bus_lock(bus);
>
> @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> int rc;
>
> rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> if (rc)
> return rc;
>
> if (pci_bus_trylock(bus)) {
> + pci_info(bus->self, "%s: pci_bus_trylock() returns
> true\n", __func__);
> pci_bus_save_and_disable_locked(bus);
> might_sleep();
> rc = pci_bridge_secondary_bus_reset(bus->self);
> @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> */
> int pci_reset_bus(struct pci_dev *pdev)
> {
> + pci_info(pdev, "%s: %s", __func__,
> !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> return (!pci_probe_reset_slot(pdev->slot)) ?
> __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> }
>
> And, have the information of VMD PCIe devices with the built kernel:
>
> 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> decode])
> ...
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=0ns
> L1SubCtl2: T_PwrOn=0us
>
> 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> ...
> Capabilities: [900 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=101376ns
> L1SubCtl2: T_PwrOn=50us
>
> We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> PCIe bridge has LTR1.2_Threshold=0ns.

This is now the other way around than in the original posting, which had
0ns for 10000:e1:00.0 ??

Is this behavior even consistent, or did you e.g. mess up some copy
pasting somewhere?

> Then, check the dmesg. I notice the debug messages:
>
> pci 10000:e0:06.0: PCI bridge to [bus e1]
> pci 10000:e0:06.0: Primary bus is hard wired to 0
> pci 10000:e1:00.0: pci_reset_bus: false
> pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> false, and does not have subordinate
> pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> Freeing initrd memory: 75236K
> pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> and does not have subordinate
>
> So, the code path is:
>
> vmd_enable_domain()
> pci_reset_bus()
> __pci_reset_bus()
> pci_bus_reset()
> pci_bus_save_and_disable_locked()
> pci_dev_save_and_disable()
> pci_bus_restore_locked()
> pci_dev_restore()
>
> And, from the debug messages, I learned only NVMe 10000:e1:00.0 does
> pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not.
> So, PCIe bridge 10000:e0:06.0 does not restore state correctly.

It should not be necessary to restore the bridge's configuration, as it
ought not to be changed by the SBR itself, PCIe6 spec 7.5.1.3.13:

"Port configuration registers must not be changed, except as required to
update Port status."

While the wording of the second part leaves some leeway, I don't think any
of these fields really fall under "Port status".

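For reference, the state_saved transitions visible in the debug log above
can be modeled in user space. This is only a sketch of how I understand
pci_save_state()/pci_restore_state() to treat dev->state_saved; struct
model_dev and the model_*() helpers are hypothetical names, and a single
u32 stands in for the whole config space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct pci_dev; not kernel code. */
struct model_dev {
	uint32_t l1ss_ctl1;	/* live "register" value */
	uint32_t saved_ctl1;	/* copy held in the save buffer */
	bool state_saved;	/* mirrors dev->state_saved */
};

/* Like pci_save_state(): copy the live value and mark the copy valid. */
static void model_save_state(struct model_dev *dev)
{
	assert(dev);
	dev->saved_ctl1 = dev->l1ss_ctl1;
	dev->state_saved = true;
}

/*
 * Like pci_restore_state(): do nothing unless a save is pending, and
 * consume the saved state so that a second restore becomes a no-op.
 * Returns true if the "register" was actually written.
 */
static bool model_restore_state(struct model_dev *dev)
{
	assert(dev);
	if (!dev->state_saved)
		return false;
	dev->l1ss_ctl1 = dev->saved_ctl1;
	dev->state_saved = false;
	return true;
}

/* Assumption: the reset clears the register of a device on the reset bus. */
static void model_bus_reset(struct model_dev *dev)
{
	assert(dev);
	dev->l1ss_ctl1 = 0;
}
```

For a device on the reset bus, the sequence outer save, inner save, reset,
inner restore, outer restore leaves the outer restore with nothing to do,
because the inner restore already cleared state_saved. A device that never
goes through the inner save/restore pair is not covered by this model.
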
> Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's
> bus in vmd_enable_domain().
> * Bus "e1" has only NVMe 10000:e1:00.0
> * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> 10000:e1:00.0 and SATA Controller 10000:e0:17.0.

...But even if those registers on the PCIe bridge were changed underneath
against the spec, it's not clear from your debug log why pci_dev_restore()
-> pci_restore_state() -> pci_restore_pcie_state() ->
pci_restore_aspm_l1ss_state() did not also restore the parent's
LTR1.2_Threshold?? I think it should attempt to do that.

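The ordering requirement David mentioned above could be sketched roughly
like this. It is a user-space model only: struct model_port,
model_program_l12() and the M_* masks are hypothetical names, the mask
values follow my reading of the L1 PM Substates Control 1 layout, and
plain struct writes stand in for config space accesses:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1 PM Substates Control 1 field masks (illustration only). */
#define M_CTL1_L1_2_MASK	0x00000005u	/* PCI-PM L1.2 | ASPM L1.2 enables */
#define M_CTL1_CM_RESTORE	0x0000ff00u	/* Common_Mode_Restore_Time */
#define M_CTL1_LTR_TH_VALUE	0x03ff0000u	/* LTR_L1.2_THRESHOLD_Value */
#define M_CTL1_LTR_TH_SCALE	0xe0000000u	/* LTR_L1.2_THRESHOLD_Scale */
#define M_CTL1_LTR_TH		(M_CTL1_LTR_TH_VALUE | M_CTL1_LTR_TH_SCALE)

/* Hypothetical stand-in for one end of the link. */
struct model_port {
	uint32_t ctl1;
	uint32_t ctl2;
};

static void model_program_l12(struct model_port *parent,
			      struct model_port *child,
			      uint32_t cm_restore_us, uint32_t th_scale,
			      uint32_t th_value, uint32_t ctl2)
{
	uint32_t th, parent_en, child_en;

	assert(parent && child);
	th = ((th_scale << 29) & M_CTL1_LTR_TH_SCALE) |
	     ((th_value << 16) & M_CTL1_LTR_TH_VALUE);
	parent_en = parent->ctl1 & M_CTL1_L1_2_MASK;
	child_en = child->ctl1 & M_CTL1_L1_2_MASK;

	/* 1. Disable L1.2 while updating: child first, then parent. */
	child->ctl1 &= ~M_CTL1_L1_2_MASK;
	parent->ctl1 &= ~M_CTL1_L1_2_MASK;

	/* 2. Program T_POWER_ON on both ends. */
	parent->ctl2 = ctl2;
	child->ctl2 = ctl2;

	/* 3. Common_Mode_Restore_Time goes into the upstream end only. */
	parent->ctl1 = (parent->ctl1 & ~M_CTL1_CM_RESTORE) |
		       ((cm_restore_us << 8) & M_CTL1_CM_RESTORE);

	/* 4. The same LTR_L1.2_THRESHOLD goes into both ends. */
	parent->ctl1 = (parent->ctl1 & ~M_CTL1_LTR_TH) | th;
	child->ctl1 = (child->ctl1 & ~M_CTL1_LTR_TH) | th;

	/* 5. Re-enable L1.2 where it was enabled: parent first, then child. */
	parent->ctl1 |= parent_en;
	child->ctl1 |= child_en;
}
```

With both ends starting at ASPM_L1.2+ only, calling this with 45us, scale
2 and value 99 leaves the threshold field identical on both ends, which is
the state the thread expects to see after the reset.
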
--
i.
> Here is the PCI tree:
>
> -+-[0000:00]-+-00.0 Intel Corporation Device 9a04
> | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
> | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning
> Processor Participant
> | +-06.0 Intel Corporation RST VMD Managed Controller
> | +-07.0-[01-2b]--
> | +-08.0 Intel Corporation GNA Scoring Accelerator module
> | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver
> | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB
> Controller
> | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
> | +-0e.0 Intel Corporation Volume Management Device NVMe
> RAID Controller
> | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1
> xHCI Host Controller
> | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
> | +-14.3 Intel Corporation Wi-Fi 6 AX201
> | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
> | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
> | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface
> | +-17.0 Intel Corporation RST VMD Managed Controller
> | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller
> | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound
> Technology Audio Controller
> | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
> | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
> | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
> \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
> \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
>
> According the findings above, to ensure the devices on the VMD bus
> have correctly states, seems pci_save_state() all the devices before
> pci_reset_bus(), and pci_restore_state() all the devices after
> pci_reset_bus() is the correct answer.
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-09-02 15:43 ` Ilpo Järvinen
@ 2024-09-03 10:31 ` Jian-Hong Pan
2024-09-23 8:41 ` Jian-Hong Pan
1 sibling, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-09-03 10:31 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Mon, Sep 2, 2024 at 11:44 PM:
>
> On Mon, 12 Aug 2024, Jian-Hong Pan wrote:
>
> > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Thu, Aug 8, 2024 at 5:49 PM:
> > > On Wed, 7 Aug 2024, David E. Box wrote:
> > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > >
> > > > > > > David E. Box <david.e.box@linux.intel.com> wrote on Tue, Aug 6, 2024 at 4:26 AM:
> > > > > > >
> > > > > > > Hi Jian-Hong,
> > > > > > >
> > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
> > > > > > > > >
> > > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > > __pci_enable_link_state(),
> > > > > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > > > > such
> > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > > > > been
> > > > > > > > > programmed.
> > > > > > > > >
> > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > > > > gets
> > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > >
> > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > > > > PCIe
> > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > > > > L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > > > > SN550
> > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > > > >
> > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > > > > PCIe
> > > > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > > > > case.
> > > > > > > > >
> > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > > > > before
> > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > __pci_enable_link_state().
> > > > > > > > >
> > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > > > > L1SS
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > - Fix typo and commit message
> > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > > > aspm_get_l1ss_cap()"
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > - Skipped
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > > > > timing
> > > > > > > > > parameters
> > > > > > > > >
> > > > > > > > > v8:
> > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > > > getting the link's parent and child devices after check the link is
> > > > > > > > > not NULL. This avoids NULL memory access.
> > > > > > > > >
> > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > > > > bool
> > > > > > > > > locked)
> > > > > > > > > {
> > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > > >
> > > > > > > > > if (!link)
> > > > > > > > > return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + parent = link->pdev;
> > > > > > > > > + child = link->downstream;
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > > > > pci_dev
> > > > > > > > > *pdev, int state, bool locked)
> > > > > > > > > if (!locked)
> > > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > > + /*
> > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > > > > T_POWER_ON and
> > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > > > > bits for
> > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > + */
> > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > > >
> > > > > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > > > > especially when know the calculation was done but was cleared by
> > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > > > pci_bus_reset() in vmd.c?
> > > > > >
> > > > > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > > > > before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > > > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > > > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > > --- a/drivers/pci/controller/vmd.c
> > > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > > > > pci_host_bridge *root_bridge,
> > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > > }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_save_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_restore_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > > */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > unsigned long features)
> > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > vmd_domain_reset(vmd);
> > > > > >
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > > > > Ports
> > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > > > > applies
> > > > > > * a reset to the parent of the PCI device supplied as argument.
> > > > > > This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > unsigned long features)
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > >
> > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > > > >
> > > > > (Don't tell me naming of these functions is a horrible mess. :-/)
> > > >
> > > > Hmm. So this *is* calling pci_reset_bus().
> > >
> > > Yeah, I managed to get confused by the names myself, I somehow
> > > ended up thinking it calls pci_bus_reset() which is not correct...
> > >
> > > > L1.2 configuration has specific
> > > > ordering requirements for changes to parent & child devices. Could be why it's
> > > > not getting restored properly.
> > >
> > > Indeed, it has to be something else since the patch above doesn't even
> > > restore anything because dev->state_saved should get set to false by the
> > > first pci_restore_state() called from
> > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> > > think!?
> >
> > Inspired by Ilpo's comment. I add some debug messages based on
> > linux-next's tag 'next-20240809' to understand the code path of
> > pci_reset_bus():
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ffaaca0978cb..3ee71374f1de 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > * races with ->remove() by the device lock, which must be held by
> > * the caller.
> > */
> > - if (err_handler && err_handler->reset_prepare)
> > + if (err_handler && err_handler->reset_prepare) {
> > + pci_info(dev, "%s: %pF\n", __func__,
> > err_handler->reset_prepare);
> > err_handler->reset_prepare(dev);
> > + }
> >
> > /*
> > * Wake-up device prior to save. PM registers default to D0 after
> > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > pci_set_power_state(dev, PCI_D0);
> >
> > pci_save_state(dev);
> > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> > dev->state_saved ? "true" : "false");
> > /*
> > * Disable the device by clearing the Command register, except for
> > * INTx-disable which is set. This not only disables MMIO and I/O port
> > @@ -5655,6 +5658,10 @@ static void
> > pci_bus_save_and_disable_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_save_and_disable(dev);
> > if (dev->subordinate)
> > pci_bus_save_and_disable_locked(dev->subordinate);
> > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_restore(dev);
> > if (dev->subordinate)
> > pci_bus_restore_locked(dev->subordinate);
> > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> > if (!bus->self || !pci_bus_resettable(bus))
> > return -ENOTTY;
> >
> > - if (probe)
> > + if (probe) {
> > + pci_info(bus->self, "%s: probe is true. So return 0
> > directly", __func__);
> > return 0;
> > + }
> >
> > pci_bus_lock(bus);
> >
> > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > int rc;
> >
> > rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> > if (rc)
> > return rc;
> >
> > if (pci_bus_trylock(bus)) {
> > + pci_info(bus->self, "%s: pci_bus_trylock() returns
> > true\n", __func__);
> > pci_bus_save_and_disable_locked(bus);
> > might_sleep();
> > rc = pci_bridge_secondary_bus_reset(bus->self);
> > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > */
> > int pci_reset_bus(struct pci_dev *pdev)
> > {
> > + pci_info(pdev, "%s: %s", __func__,
> > !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> > return (!pci_probe_reset_slot(pdev->slot)) ?
> > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > }
> >
> > And, have the information of VMD PCIe devices with the built kernel:
> >
> > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> > decode])
> > ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=0us
> >
> > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > PCIe bridge has LTR1.2_Threshold=0ns.
>
> This is now the other way around as in the original posting that had
> 0ns for 10000:e1:00.0 ??
>
> Is this behavior even consistent or did you e.g. mess up some copy
> pasting somewhere?

The original posting was made with the older 6.5 kernel. It shows:

10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
decode])
...
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=45us LTR1.2_Threshold=101376ns
L1SubCtl2: T_PwrOn=50us
...
10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
...
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
...
Full information:
https://gist.github.com/starnight/e19487a44efefff477f9ac9ed641c183

But newer kernels, for example linux-next next-20240809 and
next-20240820, which I have tried, show:

10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
decode])
...
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=0us
...
10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
...
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=101376ns
L1SubCtl2: T_PwrOn=50us
...
Full information:
https://gist.github.com/starnight/081ea4adbce40a27faf234e5e135b49a

So, according to the information above, different kernel versions show
different L1 PM Substates values on the two devices.

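For reference, the expected 101376ns can be reproduced from the L1SubCap
values in the dumps above. The sketch below is a user-space approximation
of how I understand aspm_calc_l12_info() and its threshold encoding to
work (2us + 4us + the larger Common_Mode_Restore_Time + the larger
T_POWER_ON, rounded up to the selected scale); the helper names are
hypothetical and the formula should be checked against
drivers/pci/pcie/aspm.c:

```c
#include <assert.h>
#include <stdint.h>

#define LTR_VALUE_MAX	0x3ffu	/* LTR_L1.2_THRESHOLD_Value is a 10-bit field */

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Assumed formula: T(POWER_OFF) of at most 2us, T(L1.2) of at least 4us,
 * plus the larger Common_Mode_Restore_Time and T_POWER_ON of both ends.
 */
static uint32_t l12_threshold_us(uint32_t parent_cm_us, uint32_t child_cm_us,
				 uint32_t parent_pwr_on_us,
				 uint32_t child_pwr_on_us)
{
	return 2 + 4 + max_u32(parent_cm_us, child_cm_us) +
	       max_u32(parent_pwr_on_us, child_pwr_on_us);
}

/*
 * Pick the smallest scale (units of 1ns, 32ns, 1024ns, ...) whose 10-bit
 * value can hold the threshold, rounding up to a whole unit. Returns the
 * effective threshold in ns. Inputs too large for scale 5 are not clamped.
 */
static uint64_t encode_threshold(uint32_t threshold_us, uint32_t *scale,
				 uint32_t *value)
{
	uint64_t ns = (uint64_t)threshold_us * 1000;
	uint64_t unit = 1;
	uint32_t s = 0;

	assert(scale && value);
	while (s < 5 && ns > unit * LTR_VALUE_MAX) {
		unit *= 32;
		s++;
	}
	*scale = s;
	*value = (uint32_t)((ns + unit - 1) / unit);
	return (uint64_t)*value * unit;
}
```

With the bridge's 45us/50us and the NVMe's 32us/10us this gives 101us,
which encodes as value 99 at scale 2 (1024ns units), i.e. 101376ns.
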
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-12 8:18 ` Jian-Hong Pan
2024-09-02 15:43 ` Ilpo Järvinen
@ 2024-09-03 15:17 ` Nirmal Patel
2024-09-04 3:56 ` Jian-Hong Pan
2024-09-20 16:03 ` Nirmal Patel
2 siblings, 1 reply; 22+ messages in thread
From: Nirmal Patel @ 2024-09-03 15:17 UTC (permalink / raw)
To: Jian-Hong Pan
Cc: Ilpo Järvinen, David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, LKML, linux
On Mon, 12 Aug 2024 16:18:22 +0800
Jian-Hong Pan <jhp@endlessos.org> wrote:
> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Thu, Aug 8, 2024 at 5:49 PM:
> >
> > On Wed, 7 Aug 2024, David E. Box wrote:
> >
> > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > >
> > > > > > David E. Box <david.e.box@linux.intel.com> wrote on Tue, Aug 6, 2024 at 4:26 AM:
> > > > > >
> > > > > > Hi Jian-Hong,
> > > > > >
> > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
> > > > > > > >
> > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > __pci_enable_link_state(),
> > > > > > > > it configs the link directly without ensuring related
> > > > > > > > L1.2 parameters, such
> > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and
> > > > > > > > LTR_L1.2_THRESHOLD have been
> > > > > > > > programmed.
> > > > > > > >
> > > > > > > > This leads the link's L1.2 between PCIe Root Port and
> > > > > > > > child device gets
> > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > >
> > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled
> > > > > > > > VMD:
> > > > > > > >
> > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen
> > > > > > > > Core Processor PCIe
> > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > ...
> > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > > > > > > ASPM_L1.1+ L1_PM_Substates+
> > > > > > > > PortCommonModeRestoreTime=45us
> > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us
> > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us
> > > > > > > >
> > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk
> > > > > > > > Corp WD Blue SN550
> > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > ...
> > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+
> > > > > > > > ASPM_L1.1- L1_PM_Substates+
> > > > > > > > PortCommonModeRestoreTime=32us
> > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us
> > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us
> > > > > > > >
> > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling
> > > > > > > > ASPM L1.2 on the PCIe
> > > > > > > > Root Port and the child NVMe, they should be programmed
> > > > > > > > with the same LTR1.2_Threshold value. However, they
> > > > > > > > have different values in this case.
> > > > > > > >
> > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2
> > > > > > > > parameters properly before
> > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > __pci_enable_link_state().
> > > > > > > >
> > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > ---
> > > > > > > > v2:
> > > > > > > > - Prepare the PCIe LTR parameters before enable L1
> > > > > > > > Substates
> > > > > > > >
> > > > > > > > v3:
> > > > > > > > - Only enable supported features for the L1 Substates
> > > > > > > > part
> > > > > > > >
> > > > > > > > v4:
> > > > > > > > - Focus on fixing L1.2 parameters, instead of
> > > > > > > > re-initializing whole L1SS
> > > > > > > >
> > > > > > > > v5:
> > > > > > > > - Fix typo and commit message
> > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM:
> > > > > > > > Introduce aspm_get_l1ss_cap()"
> > > > > > > >
> > > > > > > > v6:
> > > > > > > > - Skipped
> > > > > > > >
> > > > > > > > v7:
> > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > - Drop the link state flag check. And, always config
> > > > > > > > link state's timing
> > > > > > > > parameters
> > > > > > > >
> > > > > > > > v8:
> > > > > > > > - Because pcie_aspm_get_link() might return the link as
> > > > > > > > NULL, move getting the link's parent and child devices
> > > > > > > > after check the link is not NULL. This avoids NULL
> > > > > > > > memory access.
> > > > > > > >
> > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> > > > > > > > {
> > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > >
> > > > > > > > if (!link)
> > > > > > > > return -EINVAL;
> > > > > > > > +
> > > > > > > > + parent = link->pdev;
> > > > > > > > + child = link->downstream;
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> > > > > > > > if (!locked)
> > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > + /*
> > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > + */
> > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > >
> > > > > > I still don't think this is the place to recalculate the
> > > > > > L1.2 parameters, especially when we know the calculation was
> > > > > > done but was cleared by pci_bus_reset(). Can't we just do a
> > > > > > pci_save/restore_state() before/after pci_bus_reset() in
> > > > > > vmd.c?
> > > > >
> > > > > I had not thought about pci_save/restore_state() around
> > > > > pci_bus_reset() before. It is an interesting direction.
> > > > >
> > > > > So, I prepared the modification below for testing. It includes
> > > > > "[PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI
> > > > > PM's L1 substates", too. With it, both the PCIe bridge and the
> > > > > PCIe device have the same LTR_L1.2_THRESHOLD of 101376ns, as
> > > > > expected.
> > > > >
> > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > --- a/drivers/pci/controller/vmd.c
> > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > }
> > > > >
> > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > +{
> > > > > + pci_save_state(pdev);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > +{
> > > > > + pci_restore_state(pdev);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > /*
> > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > */
> > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > pci_scan_child_bus(vmd->bus);
> > > > > vmd_domain_reset(vmd);
> > > > >
> > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > /* When Intel VMD is enabled, the OS does not discover the Root Ports
> > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
> > > > > * a reset to the parent of the PCI device supplied as argument. This
> > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > break;
> > > > > }
> > > > > }
> > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > >
> > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > vmd_enable_domain(), which preserves state unlike
> > > > pci_bus_reset()?
> > > >
> > > > (Don't tell me naming of these functions is a horrible mess.
> > > > :-/)
> > >
> > > Hmm. So this *is* calling pci_reset_bus().
> >
> > Yeah, I managed to get confused by the names myself, I somehow
> > ended up thinking it calls pci_bus_reset() which is not correct...
> >
> > > L1.2 configuration has specific
> > > ordering requirements for changes to parent & child devices.
> > > Could be why it's not getting restored properly.
> >
> > Indeed, it has to be something else since the patch above doesn't
> > even restore anything because dev->state_saved should get set to
> > false by the first pci_restore_state() called from
> > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(),
> > I think!?
>
> Inspired by Ilpo's comment, I added some debug messages based on
> linux-next's tag 'next-20240809' to understand the code path of
> pci_reset_bus():
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ffaaca0978cb..3ee71374f1de 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> * races with ->remove() by the device lock, which must be held by
> * the caller.
> */
> - if (err_handler && err_handler->reset_prepare)
> + if (err_handler && err_handler->reset_prepare) {
> + pci_info(dev, "%s: %pF\n", __func__, err_handler->reset_prepare);
> err_handler->reset_prepare(dev);
> + }
>
> /*
> * Wake-up device prior to save. PM registers default to D0 after
> @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> pci_set_power_state(dev, PCI_D0);
>
> pci_save_state(dev);
> + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, dev->state_saved ? "true" : "false");
> /*
> * Disable the device by clearing the Command register, except for
> * INTx-disable which is set. This not only disables MMIO and I/O port
> @@ -5655,6 +5658,10 @@ static void pci_bus_save_and_disable_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_save_and_disable(dev);
> if (dev->subordinate)
> pci_bus_save_and_disable_locked(dev->subordinate);
> @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_restore(dev);
> if (dev->subordinate)
> pci_bus_restore_locked(dev->subordinate);
> @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> if (!bus->self || !pci_bus_resettable(bus))
> return -ENOTTY;
>
> - if (probe)
> + if (probe) {
> + pci_info(bus->self, "%s: probe is true. So return 0 directly", __func__);
> return 0;
> + }
>
> pci_bus_lock(bus);
>
> @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> int rc;
>
> rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> if (rc)
> return rc;
>
> if (pci_bus_trylock(bus)) {
> + pci_info(bus->self, "%s: pci_bus_trylock() returns true\n", __func__);
> pci_bus_save_and_disable_locked(bus);
> might_sleep();
> rc = pci_bridge_secondary_bus_reset(bus->self);
> @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> */
> int pci_reset_bus(struct pci_dev *pdev)
> {
> + pci_info(pdev, "%s: %s", __func__, !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> return (!pci_probe_reset_slot(pdev->slot)) ?
> __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> }
>
> And here is the information of the VMD PCIe devices with the built kernel:
>
> 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> decode])
> ...
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=0ns
> L1SubCtl2: T_PwrOn=0us
>
> 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> ...
> Capabilities: [900 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=101376ns
> L1SubCtl2: T_PwrOn=50us
>
> We can see the NVMe has the expected LTR1.2_Threshold=101376ns, but the
> PCIe bridge has LTR1.2_Threshold=0ns.
>
> Then, checking dmesg, I noticed these debug messages:
>
> pci 10000:e0:06.0: PCI bridge to [bus e1]
> pci 10000:e0:06.0: Primary bus is hard wired to 0
> pci 10000:e1:00.0: pci_reset_bus: false
> pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> false, and does not have subordinate
> pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> Freeing initrd memory: 75236K
> pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> and does not have subordinate
>
> So, the code path is:
>
> vmd_enable_domain()
>   pci_reset_bus()
>     __pci_reset_bus()
>       pci_bus_reset()
>       pci_bus_save_and_disable_locked()
>         pci_dev_save_and_disable()
>       pci_bus_restore_locked()
>         pci_dev_restore()
>
> And, from the debug messages, I learned that only the NVMe 10000:e1:00.0
> goes through pci_save/restore_state. The PCIe bridge 10000:e0:06.0 does
> not, so the bridge's state is not restored correctly.
>
> Besides, it is the NVMe 10000:e1:00.0's bus [e1] that gets reset, not the
> VMD's bus in vmd_enable_domain().
> * Bus "e1" has only the NVMe 10000:e1:00.0
> * The VMD's bus in vmd_enable_domain() has the PCIe bridge 10000:e0:06.0,
> NVMe 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
>
> Here is the PCI tree:
>
> -+-[0000:00]-+-00.0 Intel Corporation Device 9a04
> | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
> | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
> | +-06.0 Intel Corporation RST VMD Managed Controller
> | +-07.0-[01-2b]--
> | +-08.0 Intel Corporation GNA Scoring Accelerator module
> | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver
> | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller
> | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
> | +-0e.0 Intel Corporation Volume Management Device NVMe RAID Controller
> | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
> | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
> | +-14.3 Intel Corporation Wi-Fi 6 AX201
> | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
> | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
> | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface
> | +-17.0 Intel Corporation RST VMD Managed Controller
> | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller
> | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
> | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
> | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
> | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
> \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
> \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
>
> According to the findings above, to ensure the devices on the VMD bus
> have the correct state, it seems that calling pci_save_state() on all the
> devices before pci_reset_bus(), and pci_restore_state() on all the devices
> after pci_reset_bus(), is the correct answer.
What happens if you call pci_reset_bus() with the PCIe bridge 10000:e0:06.0
instead of the NVMe 10000:e1:00.0? I believe the current implementation in
vmd_enable_domain() finds the first child device on each root port and calls
pci_reset_bus() with it, which here is the NVMe.
-nirmal
>
> Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-09-03 15:17 ` Nirmal Patel
@ 2024-09-04 3:56 ` Jian-Hong Pan
0 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-09-04 3:56 UTC (permalink / raw)
To: Nirmal Patel
Cc: Ilpo Järvinen, David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, LKML, linux
Nirmal Patel <nirmal.patel@linux.intel.com> wrote on Tue, Sep 3, 2024 at 11:17 PM:
>
> On Mon, 12 Aug 2024 16:18:22 +0800
> Jian-Hong Pan <jhp@endlessos.org> wrote:
>
> > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四
> > 下午5:49寫道:
> > >
> > > On Wed, 7 Aug 2024, David E. Box wrote:
> > >
> > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > >
> > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日
> > > > > > 週二 上午4:26寫道:
> > > > > > >
> > > > > > > Hi Jian-Hong,
> > > > > > >
> > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五
> > > > > > > > 下午4:04寫道:
> > > > > > > > >
> > > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > > __pci_enable_link_state(),
> > > > > > > > > it configs the link directly without ensuring related
> > > > > > > > > L1.2 parameters, such
> > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and
> > > > > > > > > LTR_L1.2_THRESHOLD have been
> > > > > > > > > programmed.
> > > > > > > > >
> > > > > > > > > This leads the link's L1.2 between PCIe Root Port and
> > > > > > > > > child device gets
> > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > >
> > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled
> > > > > > > > > VMD:
> > > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen
> > > > > > > > > Core Processor PCIe
> > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > > > > > > > ASPM_L1.1+ L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=45us
> > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us
> > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk
> > > > > > > > > Corp WD Blue SN550
> > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+
> > > > > > > > > ASPM_L1.1- L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=32us
> > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us
> > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us
> > > > > > > > >
> > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling
> > > > > > > > > ASPM L1.2 on the PCIe
> > > > > > > > > Root Port and the child NVMe, they should be programmed
> > > > > > > > > with the same LTR1.2_Threshold value. However, they
> > > > > > > > > have different values in this case.
> > > > > > > > >
> > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2
> > > > > > > > > parameters properly before
> > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > __pci_enable_link_state().
> > > > > > > > >
> > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - Prepare the PCIe LTR parameters before enable L1
> > > > > > > > > Substates
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > - Only enable supported features for the L1 Substates
> > > > > > > > > part
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > - Focus on fixing L1.2 parameters, instead of
> > > > > > > > > re-initializing whole L1SS
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > - Fix typo and commit message
> > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM:
> > > > > > > > > Introduce aspm_get_l1ss_cap()"
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > - Skipped
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > - Drop the link state flag check. And, always config
> > > > > > > > > link state's timing
> > > > > > > > > parameters
> > > > > > > > >
> > > > > > > > > v8:
> > > > > > > > > - Because pcie_aspm_get_link() might return the link as
> > > > > > > > > NULL, move getting the link's parent and child devices
> > > > > > > > > after check the link is not NULL. This avoids NULL
> > > > > > > > > memory access.
> > > > > > > > >
> > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c
> > > > > > > > > b/drivers/pci/pcie/aspm.c index
> > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 ---
> > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@
> > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int
> > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int
> > > > > > > > > state, bool locked)
> > > > > > > > > {
> > > > > > > > > struct pcie_link_state *link =
> > > > > > > > > pcie_aspm_get_link(pdev);
> > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > > >
> > > > > > > > > if (!link)
> > > > > > > > > return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + parent = link->pdev;
> > > > > > > > > + child = link->downstream;
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * A driver requested that ASPM be enabled on
> > > > > > > > > this device, but
> > > > > > > > > * if we don't have permission to manage ASPM
> > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int
> > > > > > > > > __pci_enable_link_state(struct pci_dev
> > > > > > > > > *pdev, int state, bool locked)
> > > > > > > > > if (!locked)
> > > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > > + /*
> > > > > > > > > + * Ensure L1.2 parameters:
> > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and
> > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly
> > > > > > > > > before enable bits for
> > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > + */
> > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap,
> > > > > > > > > child_l1ss_cap);
> > > > > > >
> > > > > > > I still don't think this is the place to recalculate the
> > > > > > > L1.2 parameters especially when know the calculation was
> > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a
> > > > > > > pci_save/restore_state() before/after pci_bus_reset() in
> > > > > > > vmd.c?
> > > > > >
> > > > > > I have not thought pci_save/restore_state() around
> > > > > > pci_bus_reset() before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepare modification below for test. Include "[PATCH
> > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI
> > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the
> > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as
> > > > > > expected.
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c
> > > > > > b/drivers/pci/controller/vmd.c index
> > > > > > bbf4a47e7b31..6b8dd4f30127 100644 ---
> > > > > > a/drivers/pci/controller/vmd.c +++
> > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static
> > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge
> > > > > > *root_bridge, vmd_bridge->native_dpc =
> > > > > > root_bridge->native_dpc; }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void
> > > > > > *userdata) +{
> > > > > > + pci_save_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void
> > > > > > *userdata) +{
> > > > > > + pci_restore_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Enable ASPM and LTR settings on devices that aren't
> > > > > > configured by BIOS. */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct
> > > > > > vmd_dev *vmd, unsigned long features)
> > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > vmd_domain_reset(vmd);
> > > > > >
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > /* When Intel VMD is enabled, the OS does not
> > > > > > discover the Root Ports
> > > > > > * owned by Intel VMD within the MMCFG space.
> > > > > > pci_reset_bus() applies
> > > > > > * a reset to the parent of the PCI device supplied
> > > > > > as argument. This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct
> > > > > > vmd_dev *vmd, unsigned long features)
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > >
> > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > vmd_enable_domain() which preserves state unlike
> > > > > pci_reset_bus()?
> > > > >
> > > > > (Don't tell me naming of these functions is a horrible mess.
> > > > > :-/)
> > > >
> > > > Hmm. So this *is* calling pci_reset_bus().
> > >
> > > Yeah, I managed to get confused by the names myself, I somehow
> > > ended up thinking it calls pci_bus_reset() which is not correct...
> > >
> > > > L1.2 configuration has specific
> > > > ordering requirements for changes to parent & child devices.
> > > > Could be why it's not getting restored properly.
> > >
> > > Indeed, it has to be something else since the patch above doesn't
> > > even restore anything because dev->state_saved should get set to
> > > false by the first pci_restore_state() called from
> > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(),
> > > I think!?
> >
> > Inspired by Ilpo's comment. I add some debug messages based on
> > linux-next's tag 'next-20240809' to understand the code path of
> > pci_reset_bus():
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ffaaca0978cb..3ee71374f1de 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct
> > pci_dev *dev)
> > * races with ->remove() by the device lock, which must be
> > held by
> > * the caller.
> > */
> > - if (err_handler && err_handler->reset_prepare)
> > + if (err_handler && err_handler->reset_prepare) {
> > + pci_info(dev, "%s: %pF\n", __func__,
> > err_handler->reset_prepare);
> > err_handler->reset_prepare(dev);
> > + }
> >
> > /*
> > * Wake-up device prior to save. PM registers default to D0
> > after @@ -5144,6 +5146,7 @@ static void
> > pci_dev_save_and_disable(struct pci_dev *dev)
> > pci_set_power_state(dev, PCI_D0);
> >
> > pci_save_state(dev);
> > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> > dev->state_saved ? "true" : "false");
> > /*
> > * Disable the device by clearing the Command register,
> > except for
> > * INTx-disable which is set. This not only disables MMIO
> > and I/O port @@ -5655,6 +5658,10 @@ static void
> > pci_bus_save_and_disable_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s
> > subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_save_and_disable(dev);
> > if (dev->subordinate)
> > pci_bus_save_and_disable_locked(dev->subordinate);
> > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct
> > pci_bus *bus) struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s
> > subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_restore(dev);
> > if (dev->subordinate)
> > pci_bus_restore_locked(dev->subordinate);
> > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus,
> > bool probe) if (!bus->self || !pci_bus_resettable(bus))
> > return -ENOTTY;
> >
> > - if (probe)
> > + if (probe) {
> > + pci_info(bus->self, "%s: probe is true. So return 0
> > directly", __func__);
> > return 0;
> > + }
> >
> > pci_bus_lock(bus);
> >
> > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus
> > *bus) int rc;
> >
> > rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n",
> > __func__, rc); if (rc)
> > return rc;
> >
> > if (pci_bus_trylock(bus)) {
> > + pci_info(bus->self, "%s: pci_bus_trylock() returns
> > true\n", __func__);
> > pci_bus_save_and_disable_locked(bus);
> > might_sleep();
> > rc = pci_bridge_secondary_bus_reset(bus->self);
> > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > */
> > int pci_reset_bus(struct pci_dev *pdev)
> > {
> > + pci_info(pdev, "%s: %s", __func__,
> > !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> > return (!pci_probe_reset_slot(pdev->slot)) ?
> > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > }
> >
> > And, have the information of VMD PCIe devices with the built kernel:
> >
> > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> > decode])
> > ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=0us
> >
> > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > PCIe bridge has LTR1.2_Threshold=0ns.
> >
> > Then, check the dmesg. I notice the debug messages:
> >
> > pci 10000:e0:06.0: PCI bridge to [bus e1]
> > pci 10000:e0:06.0: Primary bus is hard wired to 0
> > pci 10000:e1:00.0: pci_reset_bus: false
> > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> > false, and does not have subordinate
> > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> > Freeing initrd memory: 75236K
> > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> > and does not have subordinate
> >
> > So, the code path is:
> >
> > vmd_enable_domain()
> > pci_reset_bus()
> > __pci_reset_bus()
> > pci_bus_reset()
> > pci_bus_save_and_disable_locked()
> > pci_dev_save_and_disable()
> > pci_bus_restore_locked()
> > pci_dev_restore()
> >
> > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does
> > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not.
> > So, PCIe bridge 10000:e0:06.0 does not restore state correctly.
> >
> > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's
> > bus in vmd_enable_domain().
> > * Bus "e1" has only NVMe 10000:e1:00.0
> > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> > 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
> >
> > Here is the PCI tree:
> >
> > -+-[0000:00]-+-00.0  Intel Corporation Device 9a04
> >  |           +-02.0  Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
> >  |           +-04.0  Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
> >  |           +-06.0  Intel Corporation RST VMD Managed Controller
> >  |           +-07.0-[01-2b]--
> >  |           +-08.0  Intel Corporation GNA Scoring Accelerator module
> >  |           +-0a.0  Intel Corporation Tigerlake Telemetry Aggregator Driver
> >  |           +-0d.0  Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller
> >  |           +-0d.2  Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
> >  |           +-0e.0  Intel Corporation Volume Management Device NVMe RAID Controller
> >  |           +-14.0  Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
> >  |           +-14.2  Intel Corporation Tiger Lake-LP Shared SRAM
> >  |           +-14.3  Intel Corporation Wi-Fi 6 AX201
> >  |           +-15.0  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
> >  |           +-15.1  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
> >  |           +-16.0  Intel Corporation Tiger Lake-LP Management Engine Interface
> >  |           +-17.0  Intel Corporation RST VMD Managed Controller
> >  |           +-1f.0  Intel Corporation Tiger Lake-LP LPC Controller
> >  |           +-1f.3  Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
> >  |           +-1f.4  Intel Corporation Tiger Lake-LP SMBus Controller
> >  |           +-1f.5  Intel Corporation Tiger Lake-LP SPI Controller
> >  |           \-1f.6  Intel Corporation Ethernet Connection (13) I219-V
> >  \-[10000:e0]-+-06.0-[e1]----00.0  Sandisk Corp WD Blue SN550 NVMe SSD
> >               \-17.0  Intel Corporation Tiger Lake-LP SATA Controller
> >
> > According to the findings above, to ensure the devices on the VMD bus
> > have the correct states, it seems the right answer is to call
> > pci_save_state() on all the devices before pci_reset_bus(), and
> > pci_restore_state() on all the devices after pci_reset_bus().
> What happens if you call pci_reset_bus with PCIe bridge 10000:e0:06.0
> instead of NVMe 10000:e1:00.0? I believe the current implementation in
> vmd_enable_domain finds first child device on each rootport and calls
> pci_reset_bus with NVMe.
If I call pci_reset_bus() with PCIe bridge 10000:e0:06.0 instead of NVMe
10000:e1:00.0, it shows the warning message "pci 10000:e0:06.0 can't
reset device: -25".
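
For reference, 25 is ENOTTY on Linux. The debug patch quoted in this
thread shows pci_bus_reset() returning -ENOTTY when the bus has no
upstream bridge device (bus->self is NULL) or is not resettable. The
bridge 10000:e0:06.0 sits on the VMD root bus 10000:e0, so that check is
the likely (not confirmed) source of the error. A minimal sketch of the
check, where the toy_* types and names are hypothetical stand-ins and not
the kernel's structures:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bridge device that leads to a bus. */
struct toy_bridge {
	const char *name;
};

/* Hypothetical stand-in for struct pci_bus; only what the check needs. */
struct toy_bus {
	const char *name;
	const struct toy_bridge *self;  /* NULL for a root bus */
	bool resettable;                /* assumed result of the resettable check */
};

/*
 * Mirrors the probe-time check visible in the quoted diff:
 *   if (!bus->self || !pci_bus_resettable(bus)) return -ENOTTY;
 */
static int toy_bus_reset_probe(const struct toy_bus *bus)
{
	assert(bus != NULL);
	if (!bus->self || !bus->resettable)
		return -ENOTTY;
	return 0;
}
```

With a root bus (self == NULL) the probe returns -ENOTTY, which is
printed as -25; with bus e1, whose self is the bridge, it returns 0.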
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-08-12 8:18 ` Jian-Hong Pan
2024-09-02 15:43 ` Ilpo Järvinen
2024-09-03 15:17 ` Nirmal Patel
@ 2024-09-20 16:03 ` Nirmal Patel
2024-09-23 10:57 ` Jian-Hong Pan
2 siblings, 1 reply; 22+ messages in thread
From: Nirmal Patel @ 2024-09-20 16:03 UTC (permalink / raw)
To: Jian-Hong Pan
Cc: Ilpo Järvinen, David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, LKML, linux
On Mon, 12 Aug 2024 16:18:22 +0800
Jian-Hong Pan <jhp@endlessos.org> wrote:
> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Thu, Aug 8, 2024
> at 5:49 PM:
> >
> > On Wed, 7 Aug 2024, David E. Box wrote:
> >
> > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > >
> > > > > > David E. Box <david.e.box@linux.intel.com> wrote on Tue,
> > > > > > Aug 6, 2024 at 4:26 AM:
> > > > > >
> > > > > > Hi Jian-Hong,
> > > > > >
> > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > Jian-Hong Pan <jhp@endlessos.org> wrote on Fri,
> > > > > > > > Jul 19, 2024 at 4:04 PM:
> > > > > > > >
> > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > __pci_enable_link_state(),
> > > > > > > > it configs the link directly without ensuring related
> > > > > > > > L1.2 parameters, such
> > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and
> > > > > > > > LTR_L1.2_THRESHOLD have been
> > > > > > > > programmed.
> > > > > > > >
> > > > > > > > This leads the link's L1.2 between PCIe Root Port and
> > > > > > > > child device gets
> > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > >
> > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled
> > > > > > > > VMD:
> > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > > >
> > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling
> > > > > > > > ASPM L1.2 on the PCIe
> > > > > > > > Root Port and the child NVMe, they should be programmed
> > > > > > > > with the same LTR1.2_Threshold value. However, they
> > > > > > > > have different values in this case.
> > > > > > > >
> > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2
> > > > > > > > parameters properly before
> > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > __pci_enable_link_state().
> > > > > > > >
> > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > ---
> > > > > > > > v2:
> > > > > > > > - Prepare the PCIe LTR parameters before enable L1
> > > > > > > > Substates
> > > > > > > >
> > > > > > > > v3:
> > > > > > > > - Only enable supported features for the L1 Substates
> > > > > > > > part
> > > > > > > >
> > > > > > > > v4:
> > > > > > > > - Focus on fixing L1.2 parameters, instead of
> > > > > > > > re-initializing whole L1SS
> > > > > > > >
> > > > > > > > v5:
> > > > > > > > - Fix typo and commit message
> > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM:
> > > > > > > > Introduce aspm_get_l1ss_cap()"
> > > > > > > >
> > > > > > > > v6:
> > > > > > > > - Skipped
> > > > > > > >
> > > > > > > > v7:
> > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > - Drop the link state flag check. And, always config
> > > > > > > > link state's timing
> > > > > > > > parameters
> > > > > > > >
> > > > > > > > v8:
> > > > > > > > - Because pcie_aspm_get_link() might return the link as
> > > > > > > > NULL, move getting the link's parent and child devices
> > > > > > > > after check the link is not NULL. This avoids NULL
> > > > > > > > memory access.
> > > > > > > >
> > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> > > > > > > > {
> > > > > > > > struct pcie_link_state *link =
> > > > > > > > pcie_aspm_get_link(pdev);
> > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > >
> > > > > > > > if (!link)
> > > > > > > > return -EINVAL;
> > > > > > > > +
> > > > > > > > + parent = link->pdev;
> > > > > > > > + child = link->downstream;
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * A driver requested that ASPM be enabled on
> > > > > > > > this device, but
> > > > > > > > * if we don't have permission to manage ASPM
> > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int
> > > > > > > > __pci_enable_link_state(struct pci_dev
> > > > > > > > *pdev, int state, bool locked)
> > > > > > > > if (!locked)
> > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > + /*
> > > > > > > > + * Ensure L1.2 parameters:
> > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and
> > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly
> > > > > > > > before enable bits for
> > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > + */
> > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap,
> > > > > > > > child_l1ss_cap);
> > > > > >
> > > > > > > I still don't think this is the place to recalculate the
> > > > > > > L1.2 parameters, especially when we know the calculation was
> > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a
> > > > > > > pci_save/restore_state() before/after pci_bus_reset() in
> > > > > > > vmd.c?
> > > > >
> > > > > > I had not thought of doing pci_save/restore_state() around
> > > > > > pci_bus_reset() before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepared the modification below for testing, together
> > > > > > with "[PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before
> > > > > > enable PCI PM's L1 substates". With both applied, the PCIe
> > > > > > bridge and the PCIe device have the same LTR_L1.2_THRESHOLD
> > > > > > of 101376ns, as expected.
> > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > > --- a/drivers/pci/controller/vmd.c
> > > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > > }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_save_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_restore_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > > */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > vmd_domain_reset(vmd);
> > > > > >
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > /* When Intel VMD is enabled, the OS does not discover the Root Ports
> > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
> > > > > > * a reset to the parent of the PCI device supplied as argument. This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > >
> > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > vmd_enable_domain() which preserves state unlike
> > > > pci_reset_bus()?
> > > >
> > > > (Don't tell me naming of these functions is a horrible mess.
> > > > :-/)
> > >
> > > Hmm. So this *is* calling pci_reset_bus().
> >
> > Yeah, I managed to get confused by the names myself, I somehow
> > ended up thinking it calls pci_bus_reset() which is not correct...
> >
> > > L1.2 configuration has specific
> > > ordering requirements for changes to parent & child devices.
> > > Could be why it's not getting restored properly.
> >
> > Indeed, it has to be something else since the patch above doesn't
> > even restore anything because dev->state_saved should get set to
> > false by the first pci_restore_state() called from
> > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(),
> > I think!?
>
> Inspired by Ilpo's comment. I add some debug messages based on
> linux-next's tag 'next-20240809' to understand the code path of
> pci_reset_bus():
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ffaaca0978cb..3ee71374f1de 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> * races with ->remove() by the device lock, which must be held by
> * the caller.
> */
> - if (err_handler && err_handler->reset_prepare)
> + if (err_handler && err_handler->reset_prepare) {
> + pci_info(dev, "%s: %pF\n", __func__, err_handler->reset_prepare);
> err_handler->reset_prepare(dev);
> + }
>
> /*
> * Wake-up device prior to save. PM registers default to D0 after
> @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> pci_set_power_state(dev, PCI_D0);
>
> pci_save_state(dev);
> + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, dev->state_saved ? "true" : "false");
> /*
> * Disable the device by clearing the Command register, except for
> * INTx-disable which is set. This not only disables MMIO and I/O port
> @@ -5655,6 +5658,10 @@ static void pci_bus_save_and_disable_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_save_and_disable(dev);
> if (dev->subordinate)
> pci_bus_save_and_disable_locked(dev->subordinate);
> @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> struct pci_dev *dev;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> + __func__,
> + dev->state_saved ? "true" : "false",
> + dev->subordinate ? "has" : "does not have");
> pci_dev_restore(dev);
> if (dev->subordinate)
> pci_bus_restore_locked(dev->subordinate);
> @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> if (!bus->self || !pci_bus_resettable(bus))
> return -ENOTTY;
>
> - if (probe)
> + if (probe) {
> + pci_info(bus->self, "%s: probe is true. So return 0 directly", __func__);
> return 0;
> + }
>
> pci_bus_lock(bus);
>
> @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> int rc;
>
> rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> if (rc)
> return rc;
>
> if (pci_bus_trylock(bus)) {
> + pci_info(bus->self, "%s: pci_bus_trylock() returns true\n", __func__);
> pci_bus_save_and_disable_locked(bus);
> might_sleep();
> rc = pci_bridge_secondary_bus_reset(bus->self);
> @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> */
> int pci_reset_bus(struct pci_dev *pdev)
> {
> + pci_info(pdev, "%s: %s", __func__, !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> return (!pci_probe_reset_slot(pdev->slot)) ?
> __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> }
>
> And, have the information of VMD PCIe devices with the built kernel:
>
> 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode])
> ...
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=0ns
> L1SubCtl2: T_PwrOn=0us
>
> 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> ...
> Capabilities: [900 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=101376ns
> L1SubCtl2: T_PwrOn=50us
>
> We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> PCIe bridge has LTR1.2_Threshold=0ns.
>
> Then, check the dmesg. I notice the debug messages:
>
> pci 10000:e0:06.0: PCI bridge to [bus e1]
> pci 10000:e0:06.0: Primary bus is hard wired to 0
> pci 10000:e1:00.0: pci_reset_bus: false
> pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is false, and does not have subordinate
> pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> Freeing initrd memory: 75236K
> pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, and does not have subordinate
>
> So, the code path is:
>
> vmd_enable_domain()
> pci_reset_bus()
> __pci_reset_bus()
> pci_bus_reset()
> pci_bus_save_and_disable_locked()
> pci_dev_save_and_disable()
> pci_bus_restore_locked()
> pci_dev_restore()
>
> From the debug messages, I learned that only NVMe 10000:e1:00.0 goes
> through pci_save/restore_state; the PCIe bridge 10000:e0:06.0 does not.
> So, the PCIe bridge 10000:e0:06.0 does not restore its state correctly.
>
> Besides, it is the NVMe 10000:e1:00.0's bus [e1] that is reset, not the
> VMD's bus in vmd_enable_domain().
> * Bus "e1" has only NVMe 10000:e1:00.0
> * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
>
> Here is the PCI tree:
>
> -+-[0000:00]-+-00.0  Intel Corporation Device 9a04
>  |           +-02.0  Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
>  |           +-04.0  Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
>  |           +-06.0  Intel Corporation RST VMD Managed Controller
>  |           +-07.0-[01-2b]--
>  |           +-08.0  Intel Corporation GNA Scoring Accelerator module
>  |           +-0a.0  Intel Corporation Tigerlake Telemetry Aggregator Driver
>  |           +-0d.0  Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller
>  |           +-0d.2  Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
>  |           +-0e.0  Intel Corporation Volume Management Device NVMe RAID Controller
>  |           +-14.0  Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
>  |           +-14.2  Intel Corporation Tiger Lake-LP Shared SRAM
>  |           +-14.3  Intel Corporation Wi-Fi 6 AX201
>  |           +-15.0  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
>  |           +-15.1  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
>  |           +-16.0  Intel Corporation Tiger Lake-LP Management Engine Interface
>  |           +-17.0  Intel Corporation RST VMD Managed Controller
>  |           +-1f.0  Intel Corporation Tiger Lake-LP LPC Controller
>  |           +-1f.3  Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
>  |           +-1f.4  Intel Corporation Tiger Lake-LP SMBus Controller
>  |           +-1f.5  Intel Corporation Tiger Lake-LP SPI Controller
>  |           \-1f.6  Intel Corporation Ethernet Connection (13) I219-V
>  \-[10000:e0]-+-06.0-[e1]----00.0  Sandisk Corp WD Blue SN550 NVMe SSD
>               \-17.0  Intel Corporation Tiger Lake-LP SATA Controller
>
> According to the findings above, to ensure the devices on the VMD bus
> have the correct states, it seems the right answer is to call
> pci_save_state() on all the devices before pci_reset_bus(), and
> pci_restore_state() on all the devices after pci_reset_bus().
Did you get a chance to test this theory?
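
One way to reason about the theory before testing on hardware is a toy
model of the state_saved flag. The sketch below is not kernel code:
toy_dev, toy_reset_bus() and the single reg field are hypothetical
stand-ins. It assumes, as noted earlier in the thread, that a restore is
a no-op unless state_saved is set and that it clears the flag
afterwards, and that resetting bus e1 saves and restores only the
devices on e1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only what the model needs. */
struct toy_dev {
	const char *name;
	int bus;             /* bus number the device sits on */
	unsigned int reg;    /* stand-in for one config register */
	unsigned int saved;  /* copy taken by toy_save_state() */
	bool state_saved;
	int restores;        /* how many restores actually wrote reg */
};

static void toy_save_state(struct toy_dev *d)
{
	d->saved = d->reg;
	d->state_saved = true;
}

/* No-op unless a save is pending; clears the flag after restoring. */
static void toy_restore_state(struct toy_dev *d)
{
	if (!d->state_saved)
		return;
	d->reg = d->saved;
	d->state_saved = false;
	d->restores++;
}

/* Models the reset path: save, reset, restore, for devices on 'bus' only. */
static void toy_reset_bus(struct toy_dev *devs, size_t n, int bus)
{
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++) {
		if (devs[i].bus != bus)
			continue;
		toy_save_state(&devs[i]);
		devs[i].reg = 0;              /* the reset clears the register */
		toy_restore_state(&devs[i]);
	}
}

/* The proposed wrapper: save everything, reset one bus, restore everything. */
static void toy_reset_with_outer_save_restore(struct toy_dev *devs, size_t n,
					      int bus)
{
	size_t i;

	for (i = 0; i < n; i++)
		toy_save_state(&devs[i]);
	toy_reset_bus(devs, n, bus);
	for (i = 0; i < n; i++)
		toy_restore_state(&devs[i]);
}
```

In this model the outer restore is a no-op for the NVMe, because the
inner restore already cleared its flag, while the bridge and the SATA
controller are restored only by the outer walk. Whether real hardware
behaves the same way still needs the test asked about above.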
>
> Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-09-02 15:43 ` Ilpo Järvinen
2024-09-03 10:31 ` Jian-Hong Pan
@ 2024-09-23 8:41 ` Jian-Hong Pan
2024-09-23 13:44 ` Ilpo Järvinen
1 sibling, 1 reply; 22+ messages in thread
From: Jian-Hong Pan @ 2024-09-23 8:41 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Mon, Sep 2, 2024 at 11:44 PM:
>
> On Mon, 12 Aug 2024, Jian-Hong Pan wrote:
>
> > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Thu, Aug 8, 2024 at 5:49 PM:
> > > On Wed, 7 Aug 2024, David E. Box wrote:
> > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > >
> > > > > > > David E. Box <david.e.box@linux.intel.com> wrote on Tue, Aug 6, 2024 at 4:26 AM:
> > > > > > >
> > > > > > > Hi Jian-Hong,
> > > > > > >
> > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> wrote on Fri, Jul 19, 2024 at 4:04 PM:
> > > > > > > > >
> > > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > > __pci_enable_link_state(),
> > > > > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > > > > such
> > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > > > > been
> > > > > > > > > programmed.
> > > > > > > > >
> > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > > > > gets
> > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > >
> > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > > > > PCIe
> > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > > > > L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > > > > SN550
> > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > > > >
> > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > > > > PCIe
> > > > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > > > > case.
> > > > > > > > >
> > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > > > > before
> > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > __pci_enable_link_state().
> > > > > > > > >
> > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > > > > L1SS
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > - Fix typo and commit message
> > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > > > aspm_get_l1ss_cap()"
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > - Skipped
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > > > > timing
> > > > > > > > > parameters
> > > > > > > > >
> > > > > > > > > v8:
> > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > > > getting the link's parent and child devices after check the link is
> > > > > > > > > not NULL. This avoids NULL memory access.
> > > > > > > > >
> > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > > > > bool
> > > > > > > > > locked)
> > > > > > > > > {
> > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > > >
> > > > > > > > > if (!link)
> > > > > > > > > return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + parent = link->pdev;
> > > > > > > > > + child = link->downstream;
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > > > > pci_dev
> > > > > > > > > *pdev, int state, bool locked)
> > > > > > > > > if (!locked)
> > > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > > + /*
> > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > > > > T_POWER_ON and
> > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > > > > bits for
> > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > + */
> > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > > >
> > > > > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > > > > especially when know the calculation was done but was cleared by
> > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > > > pci_bus_reset() in vmd.c?
> > > > > >
> > > > > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > > > > before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > > > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > > > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > > --- a/drivers/pci/controller/vmd.c
> > > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > > > > pci_host_bridge *root_bridge,
> > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > > }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_save_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > + pci_restore_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > > */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > unsigned long features)
> > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > vmd_domain_reset(vmd);
> > > > > >
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > > > > Ports
> > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > > > > applies
> > > > > > * a reset to the parent of the PCI device supplied as argument.
> > > > > > This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > unsigned long features)
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > >
> > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > > > >
> > > > > (Don't tell me naming of these functions is a horrible mess. :-/)
> > > >
> > > > Hmm. So this *is* calling pci_reset_bus().
> > >
> > > Yeah, I managed to get confused by the names myself, I somehow
> > > ended up thinking it calls pci_bus_reset() which is not correct...
> > >
> > > > L1.2 configuration has specific
> > > > ordering requirements for changes to parent & child devices. Could be why it's
> > > > not getting restored properly.
> > >
> > > Indeed, it has to be something else since the patch above doesn't even
> > > restore anything because dev->state_saved should get set to false by the
> > > first pci_restore_state() called from
> > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> > > think!?
> >
> > Inspired by Ilpo's comment. I add some debug messages based on
> > linux-next's tag 'next-20240809' to understand the code path of
> > pci_reset_bus():
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ffaaca0978cb..3ee71374f1de 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > * races with ->remove() by the device lock, which must be held by
> > * the caller.
> > */
> > - if (err_handler && err_handler->reset_prepare)
> > + if (err_handler && err_handler->reset_prepare) {
> > + pci_info(dev, "%s: %pF\n", __func__,
> > err_handler->reset_prepare);
> > err_handler->reset_prepare(dev);
> > + }
> >
> > /*
> > * Wake-up device prior to save. PM registers default to D0 after
> > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > pci_set_power_state(dev, PCI_D0);
> >
> > pci_save_state(dev);
> > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> > dev->state_saved ? "true" : "false");
> > /*
> > * Disable the device by clearing the Command register, except for
> > * INTx-disable which is set. This not only disables MMIO and I/O port
> > @@ -5655,6 +5658,10 @@ static void
> > pci_bus_save_and_disable_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_save_and_disable(dev);
> > if (dev->subordinate)
> > pci_bus_save_and_disable_locked(dev->subordinate);
> > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_restore(dev);
> > if (dev->subordinate)
> > pci_bus_restore_locked(dev->subordinate);
> > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> > if (!bus->self || !pci_bus_resettable(bus))
> > return -ENOTTY;
> >
> > - if (probe)
> > + if (probe) {
> > + pci_info(bus->self, "%s: probe is true. So return 0
> > directly", __func__);
> > return 0;
> > + }
> >
> > pci_bus_lock(bus);
> >
> > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > int rc;
> >
> > rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> > if (rc)
> > return rc;
> >
> > if (pci_bus_trylock(bus)) {
> > + pci_info(bus->self, "%s: pci_bus_trylock() returns
> > true\n", __func__);
> > pci_bus_save_and_disable_locked(bus);
> > might_sleep();
> > rc = pci_bridge_secondary_bus_reset(bus->self);
> > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > */
> > int pci_reset_bus(struct pci_dev *pdev)
> > {
> > + pci_info(pdev, "%s: %s", __func__,
> > !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> > return (!pci_probe_reset_slot(pdev->slot)) ?
> > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > }
> >
> > And, have the information of VMD PCIe devices with the built kernel:
> >
> > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> > decode])
> > ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=0us
> >
> > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > PCIe bridge has LTR1.2_Threshold=0ns.
>
> This is now the other way around as in the original posting that had
> 0ns for 10000:e1:00.0 ??
>
> Is this behavior even consistent or did you e.g. mess up some copy
> pasting somewhere?
>
> > Then, check the dmesg. I notice the debug messages:
> >
> > pci 10000:e0:06.0: PCI bridge to [bus e1]
> > pci 10000:e0:06.0: Primary bus is hard wired to 0
> > pci 10000:e1:00.0: pci_reset_bus: false
> > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> > false, and does not have subordinate
> > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> > Freeing initrd memory: 75236K
> > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> > and does not have subordinate
> >
> > So, the code path is:
> >
> > vmd_enable_domain()
> > pci_reset_bus()
> > __pci_reset_bus()
> > pci_bus_reset()
> > pci_bus_save_and_disable_locked()
> > pci_dev_save_and_disable()
> > pci_bus_restore_locked()
> > pci_dev_restore()
> >
> > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does
> > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not.
> > So, PCIe bridge 10000:e0:06.0 does not restore state correctly.
>
> It should not be necessary to restore the bridge's configuration as it
> ought to not be changed by the SBR itself, PCIe6 spec 7.5.1.3.13:
>
> "Port configuration registers must not be changed, except as required to
> update Port status."
>
> While the second part wording leaves some leeway, I don't think any of
> these field really fall under "Port status".
>
> > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's
> > bus in vmd_enable_domain().
> > * Bus "e1" has only NVMe 10000:e1:00.0
> > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> > 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
>
> ...But even if those registers on the PCIe bridge were changed underneath
> against the spec, it's not clear from your debug log why pci_dev_restore()
> -> pci_restore_state() -> pci_restore_pcie_state() ->
> pci_restore_aspm_l1ss_state() did not restore also parent's
> LTR1.2_Threshold?? I think it should attempt to do that.
>
> --
> i.
>
> > Here is the PCI tree:
> >
> > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04
> > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
> > | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning
> > Processor Participant
> > | +-06.0 Intel Corporation RST VMD Managed Controller
> > | +-07.0-[01-2b]--
> > | +-08.0 Intel Corporation GNA Scoring Accelerator module
> > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver
> > | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB
> > Controller
> > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
> > | +-0e.0 Intel Corporation Volume Management Device NVMe
> > RAID Controller
> > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1
> > xHCI Host Controller
> > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
> > | +-14.3 Intel Corporation Wi-Fi 6 AX201
> > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
> > | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
> > | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface
> > | +-17.0 Intel Corporation RST VMD Managed Controller
> > | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller
> > | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound
> > Technology Audio Controller
> > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
> > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
> > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
> > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
> > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
> >
> > According the findings above, to ensure the devices on the VMD bus
> > have correctly states, seems pci_save_state() all the devices before
> > pci_reset_bus(), and pci_restore_state() all the devices after
> > pci_reset_bus() is the correct answer.
I added some debug messages on top of v6.11, as follows:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8..404ce92f0152 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -109,21 +109,28 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev)
u32 cl_ctl1, cl_ctl2, cl_l1_2_enable;
u16 clnkctl, plnkctl;
+ pci_info(pdev, "%s", __func__);
/*
* In case BIOS enabled L1.2 when resuming, we need to disable it first
* on the downstream component before the upstream. So, don't attempt to
* restore either until we are at the downstream component.
*/
- if (pcie_downstream_port(pdev) || !parent)
+ if (pcie_downstream_port(pdev) || !parent) {
+ pci_info(pdev, "%s: %s", __func__, pcie_downstream_port(pdev) ? "is downstream port" : "not parent");
return;
+ }
- if (!pdev->l1ss || !parent->l1ss)
+ if (!pdev->l1ss || !parent->l1ss) {
+ pci_info(pdev, "%s: %sdoes not have l1ss", __func__, !pdev->l1ss ? "" : "parent ");
return;
+ }
cl_save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_L1SS);
pl_save_state = pci_find_saved_ext_cap(parent, PCI_EXT_CAP_ID_L1SS);
- if (!cl_save_state || !pl_save_state)
+ if (!cl_save_state || !pl_save_state) {
+ pci_info(pdev, "%s: %sdid not save ext_cap", __func__, !cl_save_state ? "" : "parent ");
return;
+ }
cap = &cl_save_state->cap.data[0];
cl_ctl2 = *cap++;
@@ -131,6 +138,7 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev)
cap = &pl_save_state->cap.data[0];
pl_ctl2 = *cap++;
pl_ctl1 = *cap;
+ pci_info(pdev, "%s: cl_ctl1: 0x%08x, cl_ctl2: 0x%08x, pl_ctl1: 0x%08x, pl_ctl2: 0x%08x", __func__, cl_ctl1, cl_ctl2, pl_ctl1, pl_ctl2);
/* Make sure L0s/L1 are disabled before updating L1SS config */
pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &clnkctl);
Here is the corresponding log:
[ 0.418931] pci 10000:e0:06.0: PCI bridge to [bus e1]
[ 0.418936] pci 10000:e0:06.0: Primary bus is hard wired to 0
[ 0.447474] Freeing initrd memory: 24700K
[ 0.670789] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state
[ 0.670807] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state: cl_ctl1: 0x40630000, cl_ctl2: 0x00000029, pl_ctl1: 0x00000000, pl_ctl2: 0x00000000
[ 0.670862] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff]: assigned
[ 0.670864] pci 10000:e0:17.0: BAR 0 [mem 0x76100000-0x76101fff]: assigned
[ 0.670881] pci 10000:e0:06.0: bridge window [io size 0x1000]: can't assign; no space
[ 0.670882] pci 10000:e0:06.0: bridge window [io size 0x1000]: failed to assign
[ 0.670884] pci 10000:e0:17.0: BAR 5 [mem 0x76102000-0x761027ff]: assigned
[ 0.670893] pci 10000:e0:17.0: BAR 1 [mem 0x76102800-0x761028ff]: assigned
[ 0.670896] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: can't assign; no space
[ 0.670897] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: failed to assign
[ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: can't assign; no space
[ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: failed to assign
[ 0.670899] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: can't assign; no space
[ 0.670900] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: failed to assign
[ 0.670901] pci 10000:e1:00.0: BAR 0 [mem 0x76000000-0x76003fff 64bit]: assigned
[ 0.670909] pci 10000:e1:00.0: BAR 4 [mem 0x76004000-0x760040ff 64bit]: assigned
[ 0.670918] pci 10000:e0:06.0: PCI bridge to [bus e1]
[ 0.670921] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff]
[ 0.670950] pci 10000:e1:00.0: VMD: Default LTR value set by driver
[ 0.670977] pcieport 10000:e0:06.0: can't derive routing for PCI INT D
[ 0.670979] pcieport 10000:e0:06.0: PCI INT D: no GSI
[ 0.671043] pcieport 10000:e0:06.0: PME: Signaling with IRQ 143
[ 0.671092] vmd 0000:00:0e.0: Bound to PCI domain 10000
Note that both pl_ctl1 and pl_ctl2 are 0x0, because the link's parent
device (PCIe bridge 10000:e0:06.0) did not save its state before the
reset. So, pci_restore_aspm_l1ss_state() restores the parent's
LTR1.2_Threshold with the wrong value 0.
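As a cross-check of the logged values, here is a minimal userspace sketch
that decodes the LTR_L1.2_THRESHOLD and T_POWER_ON fields from the raw
L1 PM Substates Control 1/2 words. The helper names are hypothetical and
this is not kernel code; the masks are assumed to match the
PCI_L1SS_CTL1_* / PCI_L1SS_CTL2_* definitions in
include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Field masks, assumed identical to the PCI_L1SS_CTL1_x / PCI_L1SS_CTL2_x
 * definitions in include/uapi/linux/pci_regs.h. */
#define L1SS_CTL1_LTR_L12_TH_VALUE 0x03ff0000u /* bits 25:16 */
#define L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000u /* bits 31:29 */
#define L1SS_CTL2_T_PWR_ON_SCALE   0x00000003u /* bits 1:0 */
#define L1SS_CTL2_T_PWR_ON_VALUE   0x000000f8u /* bits 7:3 */

/* Hypothetical helper: LTR_L1.2_THRESHOLD in ns, 0 for a reserved scale. */
static uint64_t l1ss_ltr_l12_threshold_ns(uint32_t ctl1)
{
	uint32_t value = (ctl1 & L1SS_CTL1_LTR_L12_TH_VALUE) >> 16;
	uint32_t scale = (ctl1 & L1SS_CTL1_LTR_L12_TH_SCALE) >> 29;

	if (scale > 5)
		return 0;

	/* Scale n selects units of 32^n ns: 1, 32, 1024, 32768, ... */
	return (uint64_t)value << (5 * scale);
}

/* Hypothetical helper: T_POWER_ON in us, 0 for a reserved scale. */
static uint32_t l1ss_t_pwr_on_us(uint32_t ctl2)
{
	static const uint32_t scale_us[] = { 2, 10, 100 };
	uint32_t scale = ctl2 & L1SS_CTL2_T_PWR_ON_SCALE;
	uint32_t value = (ctl2 & L1SS_CTL2_T_PWR_ON_VALUE) >> 3;

	if (scale > 2)
		return 0;

	return value * scale_us[scale];
}
```

With the values from the log, cl_ctl1 0x40630000 decodes to 99 * 1024 ns =
101376 ns and cl_ctl2 0x00000029 decodes to 5 * 10 us = 50 us, while
pl_ctl1 and pl_ctl2 (both 0x0) decode to 0.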
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-09-20 16:03 ` Nirmal Patel
@ 2024-09-23 10:57 ` Jian-Hong Pan
0 siblings, 0 replies; 22+ messages in thread
From: Jian-Hong Pan @ 2024-09-23 10:57 UTC (permalink / raw)
To: Nirmal Patel
Cc: Ilpo Järvinen, David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Jonathan Derrick, Paul M Stillwell Jr, linux-pci, LKML, linux
On Sat, Sep 21, 2024 at 12:03 AM, Nirmal Patel <nirmal.patel@linux.intel.com> wrote:
>
> On Mon, 12 Aug 2024 16:18:22 +0800
> Jian-Hong Pan <jhp@endlessos.org> wrote:
>
> > On Thu, Aug 8, 2024 at 5:49 PM, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
> > >
> > > On Wed, 7 Aug 2024, David E. Box wrote:
> > >
> > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > >
> > > > > > > On Tue, Aug 6, 2024 at 4:26 AM, David E. Box <david.e.box@linux.intel.com> wrote:
> > > > > > >
> > > > > > > Hi Jian-Hong,
> > > > > > >
> > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > > On Fri, Jul 19, 2024 at 4:04 PM, Jian-Hong Pan <jhp@endlessos.org> wrote:
> > > > > > > > >
> > > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > > __pci_enable_link_state(),
> > > > > > > > > it configs the link directly without ensuring related
> > > > > > > > > L1.2 parameters, such
> > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and
> > > > > > > > > LTR_L1.2_THRESHOLD have been
> > > > > > > > > programmed.
> > > > > > > > >
> > > > > > > > > This leads the link's L1.2 between PCIe Root Port and
> > > > > > > > > child device gets
> > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > >
> > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled
> > > > > > > > > VMD:
> > > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen
> > > > > > > > > Core Processor PCIe
> > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> > > > > > > > > ASPM_L1.1+ L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=45us
> > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us
> > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk
> > > > > > > > > Corp WD Blue SN550
> > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > ...
> > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+
> > > > > > > > > ASPM_L1.1- L1_PM_Substates+
> > > > > > > > > PortCommonModeRestoreTime=32us
> > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2-
> > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us
> > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us
> > > > > > > > >
> > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling
> > > > > > > > > ASPM L1.2 on the PCIe
> > > > > > > > > Root Port and the child NVMe, they should be programmed
> > > > > > > > > with the same LTR1.2_Threshold value. However, they
> > > > > > > > > have different values in this case.
> > > > > > > > >
> > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2
> > > > > > > > > parameters properly before
> > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > __pci_enable_link_state().
> > > > > > > > >
> > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - Prepare the PCIe LTR parameters before enable L1
> > > > > > > > > Substates
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > - Only enable supported features for the L1 Substates
> > > > > > > > > part
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > - Focus on fixing L1.2 parameters, instead of
> > > > > > > > > re-initializing whole L1SS
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > - Fix typo and commit message
> > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM:
> > > > > > > > > Introduce aspm_get_l1ss_cap()"
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > - Skipped
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > - Drop the link state flag check. And, always config
> > > > > > > > > link state's timing
> > > > > > > > > parameters
> > > > > > > > >
> > > > > > > > > v8:
> > > > > > > > > - Because pcie_aspm_get_link() might return the link as
> > > > > > > > > NULL, move getting the link's parent and child devices
> > > > > > > > > after check the link is not NULL. This avoids NULL
> > > > > > > > > memory access.
> > > > > > > > >
> > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c
> > > > > > > > > b/drivers/pci/pcie/aspm.c index
> > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 ---
> > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@
> > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int
> > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int
> > > > > > > > > state, bool locked)
> > > > > > > > > {
> > > > > > > > > struct pcie_link_state *link =
> > > > > > > > > pcie_aspm_get_link(pdev);
> > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > > >
> > > > > > > > > if (!link)
> > > > > > > > > return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + parent = link->pdev;
> > > > > > > > > + child = link->downstream;
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * A driver requested that ASPM be enabled on
> > > > > > > > > this device, but
> > > > > > > > > * if we don't have permission to manage ASPM
> > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int
> > > > > > > > > __pci_enable_link_state(struct pci_dev
> > > > > > > > > *pdev, int state, bool locked)
> > > > > > > > > if (!locked)
> > > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > > + /*
> > > > > > > > > + * Ensure L1.2 parameters:
> > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and
> > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly
> > > > > > > > > before enable bits for
> > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > + */
> > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap,
> > > > > > > > > child_l1ss_cap);
> > > > > > >
> > > > > > > I still don't think this is the place to recalculate the
> > > > > > > L1.2 parameters especially when know the calculation was
> > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a
> > > > > > > pci_save/restore_state() before/after pci_bus_reset() in
> > > > > > > vmd.c?
> > > > > >
> > > > > > I have not thought pci_save/restore_state() around
> > > > > > pci_bus_reset() before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepare modification below for test. Include "[PATCH
> > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI
> > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the
> > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as
> > > > > > expected.
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c
> > > > > > b/drivers/pci/controller/vmd.c index
> > > > > > bbf4a47e7b31..6b8dd4f30127 100644 ---
> > > > > > a/drivers/pci/controller/vmd.c +++
> > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static
> > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge
> > > > > > *root_bridge, vmd_bridge->native_dpc =
> > > > > > root_bridge->native_dpc; }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void
> > > > > > *userdata) +{
> > > > > > + pci_save_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void
> > > > > > *userdata) +{
> > > > > > + pci_restore_state(pdev);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Enable ASPM and LTR settings on devices that aren't
> > > > > > configured by BIOS. */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct
> > > > > > vmd_dev *vmd, unsigned long features)
> > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > vmd_domain_reset(vmd);
> > > > > >
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > /* When Intel VMD is enabled, the OS does not
> > > > > > discover the Root Ports
> > > > > > * owned by Intel VMD within the MMCFG space.
> > > > > > pci_reset_bus() applies
> > > > > > * a reset to the parent of the PCI device supplied
> > > > > > as argument. This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct
> > > > > > vmd_dev *vmd, unsigned long features)
> > > > > > break;
> > > > > > }
> > > > > > }
> > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > >
> > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > vmd_enable_domain() which preserves state unlike
> > > > > pci_reset_bus()?
> > > > >
> > > > > (Don't tell me naming of these functions is a horrible mess.
> > > > > :-/)
> > > >
> > > > Hmm. So this *is* calling pci_reset_bus().
> > >
> > > Yeah, I managed to get confused by the names myself, I somehow
> > > ended up thinking it calls pci_bus_reset() which is not correct...
> > >
> > > > L1.2 configuration has specific
> > > > ordering requirements for changes to parent & child devices.
> > > > Could be why it's not getting restored properly.
> > >
> > > Indeed, it has to be something else since the patch above doesn't
> > > even restore anything because dev->state_saved should get set to
> > > false by the first pci_restore_state() called from
> > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(),
> > > I think!?
> >
> > Inspired by Ilpo's comment. I add some debug messages based on
> > linux-next's tag 'next-20240809' to understand the code path of
> > pci_reset_bus():
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ffaaca0978cb..3ee71374f1de 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct
> > pci_dev *dev)
> > * races with ->remove() by the device lock, which must be
> > held by
> > * the caller.
> > */
> > - if (err_handler && err_handler->reset_prepare)
> > + if (err_handler && err_handler->reset_prepare) {
> > + pci_info(dev, "%s: %pF\n", __func__,
> > err_handler->reset_prepare);
> > err_handler->reset_prepare(dev);
> > + }
> >
> > /*
> > * Wake-up device prior to save. PM registers default to D0
> > after @@ -5144,6 +5146,7 @@ static void
> > pci_dev_save_and_disable(struct pci_dev *dev)
> > pci_set_power_state(dev, PCI_D0);
> >
> > pci_save_state(dev);
> > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> > dev->state_saved ? "true" : "false");
> > /*
> > * Disable the device by clearing the Command register,
> > except for
> > * INTx-disable which is set. This not only disables MMIO
> > and I/O port @@ -5655,6 +5658,10 @@ static void
> > pci_bus_save_and_disable_locked(struct pci_bus *bus)
> > struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s
> > subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_save_and_disable(dev);
> > if (dev->subordinate)
> > pci_bus_save_and_disable_locked(dev->subordinate);
> > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct
> > pci_bus *bus) struct pci_dev *dev;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_info(dev, "%s: PCI state_saved is %s, and %s
> > subordinate\n",
> > + __func__,
> > + dev->state_saved ? "true" : "false",
> > + dev->subordinate ? "has" : "does not have");
> > pci_dev_restore(dev);
> > if (dev->subordinate)
> > pci_bus_restore_locked(dev->subordinate);
> > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus,
> > bool probe) if (!bus->self || !pci_bus_resettable(bus))
> > return -ENOTTY;
> >
> > - if (probe)
> > + if (probe) {
> > + pci_info(bus->self, "%s: probe is true. So return 0
> > directly", __func__);
> > return 0;
> > + }
> >
> > pci_bus_lock(bus);
> >
> > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus
> > *bus) int rc;
> >
> > rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n",
> > __func__, rc); if (rc)
> > return rc;
> >
> > if (pci_bus_trylock(bus)) {
> > + pci_info(bus->self, "%s: pci_bus_trylock() returns
> > true\n", __func__);
> > pci_bus_save_and_disable_locked(bus);
> > might_sleep();
> > rc = pci_bridge_secondary_bus_reset(bus->self);
> > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > */
> > int pci_reset_bus(struct pci_dev *pdev)
> > {
> > + pci_info(pdev, "%s: %s", __func__,
> > !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> > return (!pci_probe_reset_slot(pdev->slot)) ?
> > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > }
> >
> > And, have the information of VMD PCIe devices with the built kernel:
> >
> > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> > decode])
> > ...
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=0ns
> > L1SubCtl2: T_PwrOn=0us
> >
> > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > ...
> > Capabilities: [900 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > T_CommonMode=0us LTR1.2_Threshold=101376ns
> > L1SubCtl2: T_PwrOn=50us
> >
> > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > PCIe bridge has LTR1.2_Threshold=0ns.
> >
> > Then, check the dmesg. I notice the debug messages:
> >
> > pci 10000:e0:06.0: PCI bridge to [bus e1]
> > pci 10000:e0:06.0: Primary bus is hard wired to 0
> > pci 10000:e1:00.0: pci_reset_bus: false
> > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> > false, and does not have subordinate
> > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> > Freeing initrd memory: 75236K
> > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> > and does not have subordinate
> >
> > So, the code path is:
> >
> > vmd_enable_domain()
> > pci_reset_bus()
> > __pci_reset_bus()
> > pci_bus_reset()
> > pci_bus_save_and_disable_locked()
> > pci_dev_save_and_disable()
> > pci_bus_restore_locked()
> > pci_dev_restore()
> >
> > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does
> > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not.
> > So, PCIe bridge 10000:e0:06.0 does not restore state correctly.
> >
> > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's
> > bus in vmd_enable_domain().
> > * Bus "e1" has only NVMe 10000:e1:00.0
> > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> > 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
> >
> > Here is the PCI tree:
> >
> > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04
> > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD
> > Graphics G4] | +-04.0 Intel Corporation TigerLake-LP
> > Dynamic Tuning Processor Participant
> > | +-06.0 Intel Corporation RST VMD Managed Controller
> > | +-07.0-[01-2b]--
> > | +-08.0 Intel Corporation GNA Scoring Accelerator module
> > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator
> > Driver | +-0d.0 Intel Corporation Tiger Lake-LP
> > Thunderbolt 4 USB Controller
> > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4
> > NHI #0 | +-0e.0 Intel Corporation Volume Management Device
> > NVMe RAID Controller
> > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1
> > xHCI Host Controller
> > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
> > | +-14.3 Intel Corporation Wi-Fi 6 AX201
> > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C
> > Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP
> > Serial IO I2C Controller #1 | +-16.0 Intel Corporation
> > Tiger Lake-LP Management Engine Interface | +-17.0 Intel
> > Corporation RST VMD Managed Controller | +-1f.0 Intel
> > Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel
> > Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
> > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
> > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
> > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
> > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
> > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
> >
> > According the findings above, to ensure the devices on the VMD bus
> > have correctly states, seems pci_save_state() all the devices before
> > pci_reset_bus(), and pci_restore_state() all the devices after
> > pci_reset_bus() is the correct answer.
>
> Did you get a chance to test this theory?
Yes. However, according to the findings in my previous mail (the parent's
LTR1.2_Threshold is restored with the wrong value 0), the fix can be
narrowed down to saving/restoring only the PCIe bridge's state.
I am preparing a new patch set.
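To illustrate the reasoning, here is a minimal userspace model (not kernel
code, and not the upcoming patch). All names are hypothetical. It assumes
that restoring the child rewrites both ends of the link from their save
buffers, that a never-saved buffer reads as zero (as the pl_ctl1/pl_ctl2
values in the log suggest), and that the reset clears the endpoint's
register. The register value is only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one PCI function: live L1SS Control 1 plus its save buffer. */
struct model_dev {
	uint32_t l1ss_ctl1;       /* live register value */
	uint32_t saved_l1ss_ctl1; /* save buffer, zero until the first save */
	bool state_saved;
};

static void model_save_state(struct model_dev *dev)
{
	dev->saved_l1ss_ctl1 = dev->l1ss_ctl1;
	dev->state_saved = true;
}

/* Restoring the child rewrites parent and child from their save buffers. */
static void model_restore_child(struct model_dev *child,
				struct model_dev *parent)
{
	if (!child->state_saved)
		return;

	parent->l1ss_ctl1 = parent->saved_l1ss_ctl1;
	child->l1ss_ctl1 = child->saved_l1ss_ctl1;
	child->state_saved = false;
}

/* Reset of the child's bus: only the child is saved and restored. */
static void model_reset_child_bus(struct model_dev *child,
				  struct model_dev *parent)
{
	model_save_state(child);
	child->l1ss_ctl1 = 0; /* assumed effect of the reset on the endpoint */
	model_restore_child(child, parent);
}

/* Returns the bridge's L1SS Control 1 after the modeled reset. */
static uint32_t model_parent_ctl1_after_reset(bool save_bridge_first)
{
	struct model_dev parent = { .l1ss_ctl1 = 0x40630000u };
	struct model_dev child = { .l1ss_ctl1 = 0x40630000u };

	if (save_bridge_first)
		model_save_state(&parent);

	model_reset_child_bus(&child, &parent);
	return parent.l1ss_ctl1;
}
```

In this model the bridge ends up with 0 when it was never saved, and keeps
its programmed value when it is saved before the reset, which is why
saving only the bridge should be sufficient.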
Jian-Hong Pan
* Re: [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state
2024-09-23 8:41 ` Jian-Hong Pan
@ 2024-09-23 13:44 ` Ilpo Järvinen
0 siblings, 0 replies; 22+ messages in thread
From: Ilpo Järvinen @ 2024-09-23 13:44 UTC (permalink / raw)
To: Jian-Hong Pan
Cc: David E. Box, Bjorn Helgaas, Johan Hovold,
Kuppuswamy Sathyanarayanan, Mika Westerberg, Damien Le Moal,
Nirmal Patel, Jonathan Derrick, Paul M Stillwell Jr, linux-pci,
LKML, linux
On Mon, 23 Sep 2024, Jian-Hong Pan wrote:
> On Mon, Sep 2, 2024 at 11:44 PM, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
> >
> > On Mon, 12 Aug 2024, Jian-Hong Pan wrote:
> >
> > > On Thu, Aug 8, 2024 at 5:49 PM, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
> > > > On Wed, 7 Aug 2024, David E. Box wrote:
> > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > > >
> > > > > > > > On Tue, Aug 6, 2024 at 4:26 AM, David E. Box <david.e.box@linux.intel.com> wrote:
> > > > > > > >
> > > > > > > > Hi Jian-Hong,
> > > > > > > >
> > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > > > On Fri, Jul 19, 2024 at 4:04 PM, Jian-Hong Pan <jhp@endlessos.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Currently, when enable link's L1.2 features with
> > > > > > > > > > __pci_enable_link_state(),
> > > > > > > > > > it configs the link directly without ensuring related L1.2 parameters,
> > > > > > > > > > such
> > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have
> > > > > > > > > > been
> > > > > > > > > > programmed.
> > > > > > > > > >
> > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device
> > > > > > > > > > gets
> > > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > > >
> > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > > > > >
> > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor
> > > > > > > > > > PCIe
> > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > > ...
> > > > > > > > > > Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > > > > > > > > > L1_PM_Substates+
> > > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > > > L1SubCtl2: T_PwrOn=50us
> > > > > > > > > >
> > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue
> > > > > > > > > > SN550
> > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > > ...
> > > > > > > > > > Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > > L1_PM_Substates+
> > > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > > > L1SubCtl2: T_PwrOn=10us
> > > > > > > > > >
> > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the
> > > > > > > > > > PCIe
> > > > > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > > > > LTR1.2_Threshold value. However, they have different values in this
> > > > > > > > > > case.
> > > > > > > > > >
> > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly
> > > > > > > > > > before
> > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > > __pci_enable_link_state().
> > > > > > > > > >
> > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> > > > > > > > > > ---
> > > > > > > > > > v2:
> > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > > > > >
> > > > > > > > > > v3:
> > > > > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > > > > >
> > > > > > > > > > v4:
> > > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole
> > > > > > > > > > L1SS
> > > > > > > > > >
> > > > > > > > > > v5:
> > > > > > > > > > - Fix typo and commit message
> > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > > > > aspm_get_l1ss_cap()"
> > > > > > > > > >
> > > > > > > > > > v6:
> > > > > > > > > > - Skipped
> > > > > > > > > >
> > > > > > > > > > v7:
> > > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > > - Drop the link state flag check. And, always config link state's
> > > > > > > > > > timing
> > > > > > > > > > parameters
> > > > > > > > > >
> > > > > > > > > > v8:
> > > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > > > > getting the link's parent and child devices after check the link is
> > > > > > > > > > not NULL. This avoids NULL memory access.
> > > > > > > > > >
> > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > > > 1 file changed, 15 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state,
> > > > > > > > > > bool
> > > > > > > > > > locked)
> > > > > > > > > > {
> > > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > > + struct pci_dev *parent, *child;
> > > > > > > > > >
> > > > > > > > > > if (!link)
> > > > > > > > > > return -EINVAL;
> > > > > > > > > > +
> > > > > > > > > > + parent = link->pdev;
> > > > > > > > > > + child = link->downstream;
> > > > > > > > > > +
> > > > > > > > > > /*
> > > > > > > > > > * A driver requested that ASPM be enabled on this device, but
> > > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct
> > > > > > > > > > pci_dev
> > > > > > > > > > *pdev, int state, bool locked)
> > > > > > > > > > if (!locked)
> > > > > > > > > > down_read(&pci_bus_sem);
> > > > > > > > > > mutex_lock(&aspm_lock);
> > > > > > > > > > + /*
> > > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times,
> > > > > > > > > > T_POWER_ON and
> > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable
> > > > > > > > > > bits for
> > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > > + */
> > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > > > >
> > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters,
> > > > > > > > especially when we know the calculation was done but was cleared by
> > > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > > > > pci_bus_reset() in vmd.c?
> > > > > > >
> > > > > > > I had not thought about pci_save/restore_state() around pci_bus_reset()
> > > > > > > before. It is an interesting direction.
> > > > > > >
> > > > > > > So, I prepared the modification below for testing. The test also includes
> > > > > > > "[PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1
> > > > > > > substates". With both applied, the PCIe bridge and the PCIe device have the
> > > > > > > same LTR_L1.2_THRESHOLD 101376ns as expected.
> > > > > > >
> > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > > > --- a/drivers/pci/controller/vmd.c
> > > > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct
> > > > > > > pci_host_bridge *root_bridge,
> > > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > > > }
> > > > > > >
> > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > > > +{
> > > > > > > + pci_save_state(pdev);
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > > > +{
> > > > > > > + pci_restore_state(pdev);
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > /*
> > > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > > > */
> > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > > unsigned long features)
> > > > > > > pci_scan_child_bus(vmd->bus);
> > > > > > > vmd_domain_reset(vmd);
> > > > > > >
> > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > > > /* When Intel VMD is enabled, the OS does not discover the Root
> > > > > > > Ports
> > > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus()
> > > > > > > applies
> > > > > > > * a reset to the parent of the PCI device supplied as argument.
> > > > > > > This
> > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd,
> > > > > > > unsigned long features)
> > > > > > > break;
> > > > > > > }
> > > > > > > }
> > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > > >
> > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > > > > >
> > > > > > (Don't tell me naming of these functions is a horrible mess. :-/)
> > > > >
> > > > > Hmm. So this *is* calling pci_reset_bus().
> > > >
> > > > Yeah, I managed to get confused by the names myself, I somehow
> > > > ended up thinking it calls pci_bus_reset() which is not correct...
> > > >
> > > > > L1.2 configuration has specific
> > > > > ordering requirements for changes to parent & child devices. Could be why it's
> > > > > not getting restored properly.
> > > >
> > > > Indeed, it has to be something else since the patch above doesn't even
> > > > restore anything because dev->state_saved should get set to false by the
> > > > first pci_restore_state() called from
> > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> > > > think!?
> > >
> > > Inspired by Ilpo's comment, I added some debug messages based on
> > > linux-next's tag 'next-20240809' to understand the code path of
> > > pci_reset_bus():
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index ffaaca0978cb..3ee71374f1de 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > > * races with ->remove() by the device lock, which must be held by
> > > * the caller.
> > > */
> > > - if (err_handler && err_handler->reset_prepare)
> > > + if (err_handler && err_handler->reset_prepare) {
> > > + pci_info(dev, "%s: %pF\n", __func__,
> > > err_handler->reset_prepare);
> > > err_handler->reset_prepare(dev);
> > > + }
> > >
> > > /*
> > > * Wake-up device prior to save. PM registers default to D0 after
> > > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> > > pci_set_power_state(dev, PCI_D0);
> > >
> > > pci_save_state(dev);
> > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__,
> > > dev->state_saved ? "true" : "false");
> > > /*
> > > * Disable the device by clearing the Command register, except for
> > > * INTx-disable which is set. This not only disables MMIO and I/O port
> > > @@ -5655,6 +5658,10 @@ static void
> > > pci_bus_save_and_disable_locked(struct pci_bus *bus)
> > > struct pci_dev *dev;
> > >
> > > list_for_each_entry(dev, &bus->devices, bus_list) {
> > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > > + __func__,
> > > + dev->state_saved ? "true" : "false",
> > > + dev->subordinate ? "has" : "does not have");
> > > pci_dev_save_and_disable(dev);
> > > if (dev->subordinate)
> > > pci_bus_save_and_disable_locked(dev->subordinate);
> > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> > > struct pci_dev *dev;
> > >
> > > list_for_each_entry(dev, &bus->devices, bus_list) {
> > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > > + __func__,
> > > + dev->state_saved ? "true" : "false",
> > > + dev->subordinate ? "has" : "does not have");
> > > pci_dev_restore(dev);
> > > if (dev->subordinate)
> > > pci_bus_restore_locked(dev->subordinate);
> > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> > > if (!bus->self || !pci_bus_resettable(bus))
> > > return -ENOTTY;
> > >
> > > - if (probe)
> > > + if (probe) {
> > > + pci_info(bus->self, "%s: probe is true. So return 0
> > > directly", __func__);
> > > return 0;
> > > + }
> > >
> > > pci_bus_lock(bus);
> > >
> > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > > int rc;
> > >
> > > rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> > > if (rc)
> > > return rc;
> > >
> > > if (pci_bus_trylock(bus)) {
> > > + pci_info(bus->self, "%s: pci_bus_trylock() returns
> > > true\n", __func__);
> > > pci_bus_save_and_disable_locked(bus);
> > > might_sleep();
> > > rc = pci_bridge_secondary_bus_reset(bus->self);
> > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> > > */
> > > int pci_reset_bus(struct pci_dev *pdev)
> > > {
> > > + pci_info(pdev, "%s: %s", __func__,
> > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> > > return (!pci_probe_reset_slot(pdev->slot)) ?
> > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> > > }
> > >
> > > And here is the information of the VMD PCIe devices with the built kernel:
> > >
> > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core
> > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal
> > > decode])
> > > ...
> > > Capabilities: [200 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > T_CommonMode=0us LTR1.2_Threshold=0ns
> > > L1SubCtl2: T_PwrOn=0us
> > >
> > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD
> > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > > ...
> > > Capabilities: [900 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > T_CommonMode=0us LTR1.2_Threshold=101376ns
> > > L1SubCtl2: T_PwrOn=50us
> > >
> > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > > PCIe bridge has LTR1.2_Threshold=0ns.
> >
> > This is now the other way around as in the original posting that had
> > 0ns for 10000:e1:00.0 ??
> >
> > Is this behavior even consistent or did you e.g. mess up some copy
> > pasting somewhere?
> >
> > > Then, check the dmesg. I notice the debug messages:
> > >
> > > pci 10000:e0:06.0: PCI bridge to [bus e1]
> > > pci 10000:e0:06.0: Primary bus is hard wired to 0
> > > pci 10000:e1:00.0: pci_reset_bus: false
> > > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly
> > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0
> > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true
> > > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is
> > > false, and does not have subordinate
> > > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true
> > > Freeing initrd memory: 75236K
> > > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true,
> > > and does not have subordinate
> > >
> > > So, the code path is:
> > >
> > > vmd_enable_domain()
> > > pci_reset_bus()
> > > __pci_reset_bus()
> > > pci_bus_reset()
> > > pci_bus_save_and_disable_locked()
> > > pci_dev_save_and_disable()
> > > pci_bus_restore_locked()
> > > pci_dev_restore()
> > >
> > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does
> > > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not.
> > > So, PCIe bridge 10000:e0:06.0 does not restore state correctly.
> >
> > It should not be necessary to restore the bridge's configuration as it
> > ought to not be changed by the SBR itself, PCIe6 spec 7.5.1.3.13:
> >
> > "Port configuration registers must not be changed, except as required to
> > update Port status."
> >
> > While the wording of the second part leaves some leeway, I don't think any of
> > these fields really fall under "Port status".
> >
> > > Besides, it is NVMe 10000:e1:00.0's bus [e1] that is reset, not the VMD's
> > > bus in vmd_enable_domain().
> > > * Bus "e1" has only NVMe 10000:e1:00.0
> > > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe
> > > 10000:e1:00.0 and SATA Controller 10000:e0:17.0.
> >
> > ...But even if those registers on the PCIe bridge were changed underneath
> > against the spec, it's not clear from your debug log why pci_dev_restore()
> > -> pci_restore_state() -> pci_restore_pcie_state() ->
> > pci_restore_aspm_l1ss_state() did not restore also parent's
> > LTR1.2_Threshold?? I think it should attempt to do that.
> >
> > --
> > i.
> >
> > > Here is the PCI tree:
> > >
> > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04
> > > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4]
> > > | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning
> > > Processor Participant
> > > | +-06.0 Intel Corporation RST VMD Managed Controller
> > > | +-07.0-[01-2b]--
> > > | +-08.0 Intel Corporation GNA Scoring Accelerator module
> > > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver
> > > | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB
> > > Controller
> > > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
> > > | +-0e.0 Intel Corporation Volume Management Device NVMe
> > > RAID Controller
> > > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1
> > > xHCI Host Controller
> > > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM
> > > | +-14.3 Intel Corporation Wi-Fi 6 AX201
> > > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
> > > | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
> > > | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface
> > > | +-17.0 Intel Corporation RST VMD Managed Controller
> > > | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller
> > > | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound
> > > Technology Audio Controller
> > > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller
> > > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller
> > > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V
> > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD
> > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller
> > >
> > > According to the findings above, to ensure the devices on the VMD bus
> > > have correct states, it seems that calling pci_save_state() on all the
> > > devices before pci_reset_bus(), and pci_restore_state() on all the
> > > devices after pci_reset_bus(), is the correct answer.
>
> I added some debug messages based on v6.11, as follows:
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index cee2365e54b8..404ce92f0152 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -109,21 +109,28 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev)
> u32 cl_ctl1, cl_ctl2, cl_l1_2_enable;
> u16 clnkctl, plnkctl;
>
> + pci_info(pdev, "%s", __func__);
> /*
> * In case BIOS enabled L1.2 when resuming, we need to disable it first
> * on the downstream component before the upstream. So, don't attempt to
> * restore either until we are at the downstream component.
> */
> - if (pcie_downstream_port(pdev) || !parent)
> + if (pcie_downstream_port(pdev) || !parent) {
> + pci_info(pdev, "%s: %s", __func__,
> pcie_downstream_port(pdev) ? "is downstream port" : "not parent");
> return;
> + }
>
> - if (!pdev->l1ss || !parent->l1ss)
> + if (!pdev->l1ss || !parent->l1ss) {
> + pci_info(pdev, "%s: %sdoes not have l1ss", __func__,
> !pdev->l1ss ? "" : "parent ");
> return;
> + }
>
> cl_save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_L1SS);
> pl_save_state = pci_find_saved_ext_cap(parent, PCI_EXT_CAP_ID_L1SS);
> - if (!cl_save_state || !pl_save_state)
> + if (!cl_save_state || !pl_save_state) {
> + pci_info(pdev, "%s: %sdid not save ext_cap", __func__,
> !cl_save_state ? "" : "parent ");
> return;
> + }
>
> cap = &cl_save_state->cap.data[0];
> cl_ctl2 = *cap++;
> @@ -131,6 +138,7 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev)
> cap = &pl_save_state->cap.data[0];
> pl_ctl2 = *cap++;
> pl_ctl1 = *cap;
> + pci_info(pdev, "%s: cl_ctl1: 0x%08x, cl_ctl2: 0x%08x, pl_ctl1:
> 0x%08x, pl_ctl2: 0x%08x", __func__, cl_ctl1, cl_ctl2, pl_ctl1,
> pl_ctl2);
>
> /* Make sure L0s/L1 are disabled before updating L1SS config */
> pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &clnkctl);
>
> Here is the corresponding log:
>
> [ 0.418931] pci 10000:e0:06.0: PCI bridge to [bus e1]
> [ 0.418936] pci 10000:e0:06.0: Primary bus is hard wired to 0
> [ 0.447474] Freeing initrd memory: 24700K
> [ 0.670789] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state
> [ 0.670807] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state:
> cl_ctl1: 0x40630000, cl_ctl2: 0x00000029, pl_ctl1: 0x00000000,
> pl_ctl2: 0x00000000
> [ 0.670862] pci 10000:e0:06.0: bridge window [mem
> 0x76000000-0x760fffff]: assigned
> [ 0.670864] pci 10000:e0:17.0: BAR 0 [mem 0x76100000-0x76101fff]: assigned
> [ 0.670881] pci 10000:e0:06.0: bridge window [io size 0x1000]:
> can't assign; no space
> [ 0.670882] pci 10000:e0:06.0: bridge window [io size 0x1000]:
> failed to assign
> [ 0.670884] pci 10000:e0:17.0: BAR 5 [mem 0x76102000-0x761027ff]: assigned
> [ 0.670893] pci 10000:e0:17.0: BAR 1 [mem 0x76102800-0x761028ff]: assigned
> [ 0.670896] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: can't
> assign; no space
> [ 0.670897] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: failed to assign
> [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: can't
> assign; no space
> [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: failed to assign
> [ 0.670899] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: can't
> assign; no space
> [ 0.670900] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: failed to assign
> [ 0.670901] pci 10000:e1:00.0: BAR 0 [mem 0x76000000-0x76003fff
> 64bit]: assigned
> [ 0.670909] pci 10000:e1:00.0: BAR 4 [mem 0x76004000-0x760040ff
> 64bit]: assigned
> [ 0.670918] pci 10000:e0:06.0: PCI bridge to [bus e1]
> [ 0.670921] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff]
> [ 0.670950] pci 10000:e1:00.0: VMD: Default LTR value set by driver
> [ 0.670977] pcieport 10000:e0:06.0: can't derive routing for PCI INT D
> [ 0.670979] pcieport 10000:e0:06.0: PCI INT D: no GSI
> [ 0.671043] pcieport 10000:e0:06.0: PME: Signaling with IRQ 143
> [ 0.671092] vmd 0000:00:0e.0: Bound to PCI domain 10000
>
> Note that both pl_ctl1 and pl_ctl2 are 0x0, because the link's
> parent device (PCIe bridge 10000:e0:06.0) did not save its state before
> the reset. So, pci_restore_aspm_l1ss_state() restores the parent's
> LTR1.2_Threshold with the wrong value 0.
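
For reference, the raw values in the log above can be cross-checked against
the lspci output earlier in the thread. The sketch below decodes the timing
fields using the bit layout of the L1 PM Substates Control 1/2 registers as
I understand it from the PCIe spec and the kernel's PCI_L1SS_CTL1_* and
PCI_L1SS_CTL2_* masks. The helper names are made up for illustration and
are not kernel functions.

```c
#include <stdint.h>

/*
 * Illustrative helpers only (hypothetical names, not kernel API).
 * Field positions follow the L1 PM Substates Control 1/2 layout.
 */

/* LTR_L1.2_THRESHOLD: value in bits 25:16, scale in bits 31:29 of Control 1 */
static uint64_t l1ss_ltr_l12_threshold_ns(uint32_t ctl1)
{
	uint32_t value = (ctl1 >> 16) & 0x3ff;
	uint32_t scale = (ctl1 >> 29) & 0x7;

	if (scale > 5)			/* encodings 6 and 7 are reserved */
		return 0;
	/* scale n means units of 32^n ns: 1, 32, 1024, 32768, ... */
	return (uint64_t)value << (5 * scale);
}

/* Common_Mode_Restore_Time: bits 15:8 of Control 1, in microseconds */
static uint32_t l1ss_t_common_mode_us(uint32_t ctl1)
{
	return (ctl1 >> 8) & 0xff;
}

/* T_POWER_ON: scale in bits 1:0, value in bits 7:3 of Control 2 */
static uint32_t l1ss_t_pwr_on_us(uint32_t ctl2)
{
	static const uint32_t scale_us[] = { 2, 10, 100, 0 /* reserved */ };
	uint32_t value = (ctl2 >> 3) & 0x1f;

	return value * scale_us[ctl2 & 0x3];
}
```

With cl_ctl1 = 0x40630000 the threshold value is 0x063 = 99 and the scale is
2 (1024 ns), giving 99 * 1024 = 101376 ns. With cl_ctl2 = 0x00000029 the
T_POWER_ON value is 5 and the scale is 10 us, giving 50 us. Both match the
NVMe's L1SubCtl1/L1SubCtl2 lines quoted above, while pl_ctl1 = pl_ctl2 = 0
decodes to all-zero timing parameters for the bridge.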
Okay, this is a very useful finding because it leads us to the root cause,
which is the disparity between pci_save_aspm_l1ss_state() and
pci_restore_aspm_l1ss_state(). The latter restores both the dev and its
parent, yet the save side only saves the ASPM state for the dev
itself, not for the parent.
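
To make the asymmetry concrete, here is a small stand-alone toy model. It is
not the kernel code: struct toy_dev, toy_save() and toy_restore() are
hypothetical names, and the only assumption carried over from the log above
is that a save buffer which was never filled reads back as zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a PCI device with L1SS control registers. */
struct toy_dev {
	struct toy_dev *parent;		/* upstream bridge, NULL for the bridge */
	uint32_t ctl1, ctl2;		/* "live" register contents */
	uint32_t saved_ctl1, saved_ctl2;	/* save buffer, zero until filled */
};

/* Save side: stores only the device's own registers. */
static void toy_save(struct toy_dev *dev)
{
	dev->saved_ctl1 = dev->ctl1;
	dev->saved_ctl2 = dev->ctl2;
}

/*
 * Restore side: called on the downstream device, but writes back both
 * the parent's and the device's registers from their save buffers.
 */
static void toy_restore(struct toy_dev *dev)
{
	struct toy_dev *parent = dev->parent;

	if (!parent)
		return;

	parent->ctl1 = parent->saved_ctl1;
	parent->ctl2 = parent->saved_ctl2;
	dev->ctl1 = dev->saved_ctl1;
	dev->ctl2 = dev->saved_ctl2;
}
```

If only the child is passed to toy_save() before a reset, as happens when
only bus e1 is walked, toy_restore() on the child overwrites the bridge's
still-valid registers with zeros. Calling toy_save() on the parent as well
makes the restore harmless, which is the kind of symmetry the real
save/restore pair would need.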
--
i.
end of thread, 2024-09-23 13:44 UTC
Thread overview: 22+ messages
2024-07-19 7:52 [PATCH v8 0/4] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe Jian-Hong Pan
2024-07-19 7:55 ` [PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates Jian-Hong Pan
2024-07-19 7:57 ` [PATCH v8 2/4] PCI/ASPM: Add notes about enabling PCI-PM L1SS to pci_enable_link_state(_locked) Jian-Hong Pan
2024-07-19 7:59 ` [PATCH v8 3/4] PCI/ASPM: Introduce aspm_get_l1ss_cap() Jian-Hong Pan
2024-07-19 8:02 ` [PATCH v8 4/4] PCI/ASPM: Fix L1.2 parameters when enable link state Jian-Hong Pan
2024-08-02 8:24 ` Jian-Hong Pan
2024-08-05 18:24 ` Nirmal Patel
2024-08-07 4:23 ` Jian-Hong Pan
2024-08-05 20:26 ` David E. Box
2024-08-07 10:05 ` Jian-Hong Pan
2024-08-07 11:18 ` Ilpo Järvinen
2024-08-07 23:27 ` David E. Box
2024-08-08 9:48 ` Ilpo Järvinen
2024-08-12 8:18 ` Jian-Hong Pan
2024-09-02 15:43 ` Ilpo Järvinen
2024-09-03 10:31 ` Jian-Hong Pan
2024-09-23 8:41 ` Jian-Hong Pan
2024-09-23 13:44 ` Ilpo Järvinen
2024-09-03 15:17 ` Nirmal Patel
2024-09-04 3:56 ` Jian-Hong Pan
2024-09-20 16:03 ` Nirmal Patel
2024-09-23 10:57 ` Jian-Hong Pan