Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Vidya Sagar <vidyas@nvidia.com>
To: <bhelgaas@google.com>
Cc: <vsethi@nvidia.com>, <sdonthineni@nvidia.com>,
	<kthota@nvidia.com>, <sagar.tv@gmail.com>,
	<linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Vidya Sagar <vidyas@nvidia.com>
Subject: [PATCH] PCI: Save/restore Device 3 control and clear stale 14-bit Tag enables
Date: Tue, 12 May 2026 22:00:12 +0530	[thread overview]
Message-ID: <20260512163012.2336191-1-vidyas@nvidia.com> (raw)

The Device 3 Extended Capability (DEV3_CTL) lets software enable a
device's 14-Bit Tag Requester and 14-Bit Tag Completer behavior.
Those enable bits, however, are only meaningful while the link
operates in flit mode; in non-flit mode the upper 4 tag bits are not
transmitted on the wire, so a requester that still has them enabled
emits TLPs whose completions the requester can no longer match. The
visible symptom on the first downstream TLP after such a reset is
Completion Timeout plus Unexpected Completion on the root/switch
port.

The PCI core currently does not save/restore DEV3_CTL, and nothing
re-evaluates the 14-Bit Tag enable bits when the link mode changes.
Across a Secondary Bus Reset, DPC trigger/release, slot reset, AER
bus reset, FLR, or D3cold->D0 bridge resume the link can come back
in non-flit mode while DEV3_CTL still advertises 14-bit tagging,
which breaks the very first config read the kernel issues to the
endpoint.

Address this with two complementary pieces of work in one patch:

1. Save and restore DEV3_CTL across pci_save_state() /
   pci_restore_state().  Allocate the save buffer in pci_dev3_init()
   for every PCIe device that exposes the Device 3 Extended
   Capability, save DEV3_CTL only (DEV3_STA is RW1C and not
   restored), and on restore sanitize the saved value: if the device
   advertises 14-Bit Tag support but flit mode is no longer active
   (checked via the live LNKSTA2.Flit_Mode and DEV3_STA.Segment
   Captured), drop PCI_DEV3_CTL_14BIT_TAG_REQ_EN and
   PCI_DEV3_CTL_14BIT_TAG_COMP_EN before writing DEV3_CTL back.

   This covers every path that goes through pci_dev_restore(),
   including the endpoint side of SBR/slot reset/D3cold resume.

2. Add pci_bridge_refresh_14bit_tag() and call it from the two
   universal choke points where the link mode can change but
   pci_dev_restore() does not necessarily run on the bridge itself:

     - pci_bridge_wait_for_secondary_bus() runs after every family
       of bridge-mediated reset (SBR via pci_bridge_secondary_bus_
       reset(), DPC release via dpc_reset_link(), AER bus reset via
       pci_bus_error_reset(), slot reset via pciehp_reset_slot(),
       and D3cold->D0 resume via pci_pm_bridge_power_up_actions()).
       The helper is called there immediately after the link is
       known active but before pci_dev_wait() issues the first
       downstream config read, so the requester is fixed up before
       any TLP is sent.

     - __pcie_update_link_speed() is where bus->flit_mode is
       authoritatively updated whenever the kernel observes a link
       change: initial enumeration, manual pcie_retrain_link()
       (ASPM common-clock, target-speed change), bwctrl IRQ from
       autonomous HW speed changes, and the pciehp link-status
       poll.  Calling the helper there covers retrain paths that
       never go through pci_bridge_wait_for_secondary_bus().

   The helper re-reads the bridge's own LNKSTA2.Flit_Mode, fixes the
   bridge's DEV3_CTL first (because the bridge is the requester for
   outbound config/MMIO TLPs and is what produces the post-reset
   error storm), refreshes bus->flit_mode, then walks the
   subordinate bus to fix every device that advertises 14-Bit Tag
   support.  All bridge accesses are to the bridge's own config
   space on the primary bus, so the fix runs safely before any
   potentially-broken downstream TLP.  Operation is idempotent:
   hardware is touched only when the enable bits are set and flit
   is no longer active.

   To call the helper from __pcie_update_link_speed(), move that
   function's body from a static inline in drivers/pci/pci.h to an
   ordinary out-of-line definition in drivers/pci/probe.c (pure
   refactor, no functional change for existing callers).

Also add the missing DEV3 14-Bit Tag bit names to
include/uapi/linux/pci_regs.h:
PCI_DEV3_CAP_14BIT_TAG_{COMP,REQ} and
PCI_DEV3_CTL_14BIT_TAG_{COMP,REQ}_EN.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
 drivers/pci/pci.c             | 213 ++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h             |  17 +--
 drivers/pci/probe.c           |  32 +++++
 include/uapi/linux/pci_regs.h |   4 +
 4 files changed, 254 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0be4e263bca0..ad1a282a4e63 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1682,6 +1682,199 @@ static void pci_restore_pcie_state(struct pci_dev *dev)
 	pcie_capability_write_word(dev, PCI_EXP_SLTCTL2, cap[i++]);
 }
 
+static int pci_save_dev3_state(struct pci_dev *dev)
+{
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+	int pos;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DEV3);
+	if (!pos)
+		return 0;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DEV3);
+	if (!save_state)
+		return -ENOMEM;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	pci_read_config_dword(dev, pos + PCI_DEV3_CTL, &cap[0]);
+
+	return 0;
+}
+
+static void pci_restore_dev3_state(struct pci_dev *dev)
+{
+	struct pci_cap_saved_state *save_state;
+	u32 *cap, val, dev3_cap, dev3_sta;
+	u16 lnksta2 = 0;
+	bool flit_now;
+	int pos;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DEV3);
+	if (!pos)
+		return;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DEV3);
+	if (!save_state)
+		return;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	val = cap[0];
+
+	/*
+	 * The 14-Bit Tag enable bits in DEV3_CTL are only valid in flit
+	 * mode.  On devices that advertise 14-Bit Tag support, sanitize
+	 * the saved value before writing it back, so that callers that
+	 * issue further TLPs through this device after restore see a
+	 * coherent enable state.  On other devices (and for any other
+	 * DEV3_CTL bits, including future architected or vendor-specific
+	 * ones), the saved value is written back unchanged.
+	 *
+	 * Note: bridge-side and link-event paths are handled separately by
+	 * pci_bridge_refresh_14bit_tag(), which runs from
+	 * pci_bridge_wait_for_secondary_bus() and __pcie_update_link_speed()
+	 * and clears the bits directly in hardware as soon as the link is
+	 * observed to leave flit mode.  This function's responsibility is
+	 * narrowed to the save-buffer-restore path.
+	 */
+	pci_read_config_dword(dev, pos + PCI_DEV3_CAP, &dev3_cap);
+	if (dev3_cap & (PCI_DEV3_CAP_14BIT_TAG_REQ |
+			PCI_DEV3_CAP_14BIT_TAG_COMP)) {
+		/*
+		 * Re-check link state here too: pci_restore_state() may run
+		 * on paths where the link has changed mode but
+		 * pci_bridge_refresh_14bit_tag() has not yet been called for
+		 * this device.  Check both LNKSTA2.Flit_Mode (link-level)
+		 * and DEV3_STA.Segment Captured (end-to-end); both must be
+		 * active for 14-bit tags.  Refresh bus->flit_mode and
+		 * dev->fm_enabled in lock-step.
+		 */
+		pci_read_config_dword(dev, pos + PCI_DEV3_STA, &dev3_sta);
+		dev->fm_enabled = !!(dev3_sta & PCI_DEV3_STA_SEGMENT);
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKSTA2, &lnksta2);
+		flit_now = !!(lnksta2 & PCI_EXP_LNKSTA2_FLIT);
+		if (dev->bus)
+			dev->bus->flit_mode = flit_now;
+
+		if ((!dev->fm_enabled || !flit_now) &&
+		    (val & (PCI_DEV3_CTL_14BIT_TAG_REQ_EN |
+			    PCI_DEV3_CTL_14BIT_TAG_COMP_EN))) {
+			val &= ~(PCI_DEV3_CTL_14BIT_TAG_REQ_EN |
+				 PCI_DEV3_CTL_14BIT_TAG_COMP_EN);
+			cap[0] = val;
+			pci_info(dev, "clearing 14-bit tag enable: flit mode no longer active (LNKSTA2=%#06x, DEV3_STA=%#010x)\n",
+				 lnksta2, dev3_sta);
+		}
+	}
+
+	pci_write_config_dword(dev, pos + PCI_DEV3_CTL, val);
+}
+
+/*
+ * Clear DEV3_CTL.14-Bit Tag {Req,Comp} Enable on @dev if flit mode is
+ * no longer active.  Touches only @dev's own config space, so it is safe
+ * to call on a bridge before the first downstream TLP is issued after a
+ * reset.
+ *
+ * 14-Bit Tag enables are only meaningful in flit mode.  If the link came
+ * back as non-flit (e.g. after SBR, DPC, slot reset, or D3cold resume),
+ * a requester that still has these bits set will emit TLPs the completer
+ * cannot match, producing Completion Timeout plus Unexpected Completion
+ * on the first transaction.
+ */
+static void __pci_dev_clear_stale_14bit_tag(struct pci_dev *dev, bool flit_now)
+{
+	u32 dev3_cap, dev3_ctl, dev3_sta;
+	int pos;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DEV3);
+	if (!pos)
+		return;
+
+	pci_read_config_dword(dev, pos + PCI_DEV3_CAP, &dev3_cap);
+	if (!(dev3_cap & (PCI_DEV3_CAP_14BIT_TAG_REQ |
+			  PCI_DEV3_CAP_14BIT_TAG_COMP)))
+		return;
+
+	pci_read_config_dword(dev, pos + PCI_DEV3_STA, &dev3_sta);
+	dev->fm_enabled = !!(dev3_sta & PCI_DEV3_STA_SEGMENT);
+
+	if (flit_now && dev->fm_enabled)
+		return;
+
+	pci_read_config_dword(dev, pos + PCI_DEV3_CTL, &dev3_ctl);
+	if (!(dev3_ctl & (PCI_DEV3_CTL_14BIT_TAG_REQ_EN |
+			  PCI_DEV3_CTL_14BIT_TAG_COMP_EN)))
+		return;
+
+	dev3_ctl &= ~(PCI_DEV3_CTL_14BIT_TAG_REQ_EN |
+		      PCI_DEV3_CTL_14BIT_TAG_COMP_EN);
+	pci_write_config_dword(dev, pos + PCI_DEV3_CTL, dev3_ctl);
+	pci_info(dev, "cleared 14-bit Tag enable: flit mode no longer active (DEV3_STA=%#010x)\n",
+		 dev3_sta);
+}
+
+/**
+ * pci_bridge_refresh_14bit_tag - Drop stale 14-Bit Tag enables across a link
+ * @bridge: PCIe bridge whose link may have changed mode
+ *
+ * Re-evaluate the bridge's own DEV3_CTL 14-Bit Tag enables against the
+ * live LNKSTA2.Flit_Mode, then walk the bridge's subordinate bus and do
+ * the same for every device that advertises 14-Bit Tag support.  Also
+ * refresh bus->flit_mode so the rest of the PCI core sees a consistent
+ * view of the link.
+ *
+ * Called from every kernel-visible link state change site:
+ *   - pci_bridge_wait_for_secondary_bus() (covers SBR, DPC release, slot
+ *     reset, AER bus reset, bridge D3cold->D0 resume).
+ *   - __pcie_update_link_speed() (covers manual retrain, bwctrl IRQ,
+ *     hotplug link status check, initial enumeration).
+ *
+ * Safe to call repeatedly; only writes hardware when the enable bits are
+ * set and flit mode is no longer active.
+ */
+void pci_bridge_refresh_14bit_tag(struct pci_dev *bridge)
+{
+	struct pci_bus *bus;
+	struct pci_dev *child;
+	u16 lnksta2 = 0;
+	bool flit_now;
+
+	if (!bridge || !pci_is_pcie(bridge))
+		return;
+
+	pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &lnksta2);
+	flit_now = !!(lnksta2 & PCI_EXP_LNKSTA2_FLIT);
+
+	/*
+	 * Fix the bridge itself first.  The bridge is the requester for
+	 * outbound config/MMIO TLPs, so stale 14-bit Tag enables here are
+	 * what produce the post-reset Completion Timeout / Unexpected
+	 * Completion failure.
+	 */
+	__pci_dev_clear_stale_14bit_tag(bridge, flit_now);
+
+	bus = bridge->subordinate;
+	if (!bus)
+		return;
+
+	bus->flit_mode = flit_now;
+
+	/*
+	 * Walk the secondary bus.  pci_restore_dev3_state() only fires on
+	 * paths that go through pci_dev_restore(); DPC release, hotplug
+	 * link status updates, and similar paths do not.  Fix those too.
+	 */
+	down_read(&pci_bus_sem);
+	list_for_each_entry(child, &bus->devices, bus_list)
+		__pci_dev_clear_stale_14bit_tag(child, flit_now);
+	up_read(&pci_bus_sem);
+}
+
 static int pci_save_pcix_state(struct pci_dev *dev)
 {
 	int pos;
@@ -1738,6 +1931,10 @@ int pci_save_state(struct pci_dev *dev)
 	if (i != 0)
 		return i;
 
+	i = pci_save_dev3_state(dev);
+	if (i != 0)
+		return i;
+
 	i = pci_save_pcix_state(dev);
 	if (i != 0)
 		return i;
@@ -1815,6 +2012,7 @@ static void pci_restore_config_space(struct pci_dev *pdev)
 void pci_restore_state(struct pci_dev *dev)
 {
 	pci_restore_pcie_state(dev);
+	pci_restore_dev3_state(dev);
 	pci_restore_pasid_state(dev);
 	pci_restore_pri_state(dev);
 	pci_restore_ats_state(dev);
@@ -4745,6 +4943,14 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 
+		/*
+		 * The link has had a chance to come back; refresh the
+		 * bridge's (and subtree's) DEV3_CTL 14-Bit Tag enables
+		 * against the live LNKSTA2.Flit_Mode before we issue the
+		 * first config TLP to the child.
+		 */
+		pci_bridge_refresh_14bit_tag(dev);
+
 		if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay))
 			return 0;
 
@@ -4772,6 +4978,13 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 		return -ENOTTY;
 	}
 
+	/*
+	 * Link is up.  Refresh the bridge's (and subtree's) DEV3_CTL
+	 * 14-Bit Tag enables against the live LNKSTA2.Flit_Mode before
+	 * we issue the first config TLP to the child below.
+	 */
+	pci_bridge_refresh_14bit_tag(dev);
+
 	return pci_dev_wait(child, reset_type,
 			    PCIE_RESET_READY_POLL_MS - delay);
 }
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 19660d068fb7..a47a0f3ad2e2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -633,21 +633,14 @@ enum pcie_link_change_reason {
 	PCIE_HOTPLUG,
 };
 
-static inline void __pcie_update_link_speed(struct pci_bus *bus,
-					    enum pcie_link_change_reason reason,
-					    u16 linksta, u16 linksta2)
-{
-	bus->cur_bus_speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
-	bus->flit_mode = (linksta2 & PCI_EXP_LNKSTA2_FLIT) ? 1 : 0;
-
-	trace_pcie_link_event(bus,
-			     reason,
-			     FIELD_GET(PCI_EXP_LNKSTA_NLW, linksta),
-			     linksta & PCI_EXP_LNKSTA_LINK_STATUS_MASK);
-}
+void __pcie_update_link_speed(struct pci_bus *bus,
+			      enum pcie_link_change_reason reason,
+			      u16 linksta, u16 linksta2);
 
 void pcie_update_link_speed(struct pci_bus *bus, enum pcie_link_change_reason reason);
 
+void pci_bridge_refresh_14bit_tag(struct pci_dev *bridge);
+
 /* Single Root I/O Virtualization */
 struct pci_sriov {
 	int		pos;		/* Capability position */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 748c7a198262..27eb97b22c36 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -826,6 +826,30 @@ const char *pci_speed_string(enum pci_bus_speed speed)
 }
 EXPORT_SYMBOL_GPL(pci_speed_string);
 
+void __pcie_update_link_speed(struct pci_bus *bus,
+			      enum pcie_link_change_reason reason,
+			      u16 linksta, u16 linksta2)
+{
+	bus->cur_bus_speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
+	bus->flit_mode = (linksta2 & PCI_EXP_LNKSTA2_FLIT) ? 1 : 0;
+
+	trace_pcie_link_event(bus,
+			      reason,
+			      FIELD_GET(PCI_EXP_LNKSTA_NLW, linksta),
+			      linksta & PCI_EXP_LNKSTA_LINK_STATUS_MASK);
+
+	/*
+	 * Re-evaluate DEV3_CTL 14-Bit Tag enables on this bridge and its
+	 * subordinate bus.  Any time bus->flit_mode is updated, the link
+	 * has just changed state; if flit mode is no longer active, the
+	 * bridge and downstream devices must drop their 14-Bit Tag enables
+	 * before further TLPs are issued, or the requester (the bridge)
+	 * will tag config/MMIO requests with 14-bit tags that the
+	 * completer can no longer echo back in non-flit mode.
+	 */
+	pci_bridge_refresh_14bit_tag(bus->self);
+}
+
 void pcie_update_link_speed(struct pci_bus *bus,
 			    enum pcie_link_change_reason reason)
 {
@@ -2320,11 +2344,19 @@ static void pci_dev3_init(struct pci_dev *pdev)
 {
 	u16 cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_DEV3);
 	u32 val = 0;
+	int err;
 
 	if (!cap)
 		return;
 	pci_read_config_dword(pdev, cap + PCI_DEV3_STA, &val);
 	pdev->fm_enabled = !!(val & PCI_DEV3_STA_SEGMENT);
+
+	/* Save buffer for DEV3_CTL only; DEV3_STA is RW1C and is not restored. */
+	err = pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_DEV3,
+					  sizeof(u32));
+	if (err)
+		pci_warn(pdev,
+			 "unable to preallocate Device 3 save buffer\n");
 }
 
 /**
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 14f634ab9350..6e00dbdf04eb 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1254,7 +1254,11 @@
 
 /* Device 3 Extended Capability */
 #define PCI_DEV3_CAP		0x04	/* Device 3 Capabilities Register */
+#define  PCI_DEV3_CAP_14BIT_TAG_COMP	0x00000004 /* 14-Bit Tag Completer Supported */
+#define  PCI_DEV3_CAP_14BIT_TAG_REQ	0x00000008 /* 14-Bit Tag Requester Supported */
 #define PCI_DEV3_CTL		0x08	/* Device 3 Control Register */
+#define  PCI_DEV3_CTL_14BIT_TAG_COMP_EN	0x00000002 /* 14-Bit Tag Completer Enable */
+#define  PCI_DEV3_CTL_14BIT_TAG_REQ_EN	0x00000004 /* 14-Bit Tag Requester Enable */
 #define PCI_DEV3_STA		0x0c	/* Device 3 Status Register */
 #define  PCI_DEV3_STA_SEGMENT	0x8	/* Segment Captured (end-to-end flit-mode detected) */
 
-- 
2.25.1


             reply	other threads:[~2026-05-12 16:31 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 16:30 Vidya Sagar [this message]
2026-05-12 22:49 ` [PATCH] PCI: Save/restore Device 3 control and clear stale 14-bit Tag enables Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260512163012.2336191-1-vidyas@nvidia.com \
    --to=vidyas@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=kthota@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sagar.tv@gmail.com \
    --cc=sdonthineni@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox