Linux PCI subsystem development
 help / color / mirror / Atom feed
* [PATCH] PCI: rzg3s-host: Re-enumerate the bus on PCIe link-state changes
@ 2026-06-18 19:09 John Madieu
  2026-06-18 19:26 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: John Madieu @ 2026-06-18 19:09 UTC (permalink / raw)
  To: claudiu.beznea.uj, lpieralisi, kwilczynski, mani, bhelgaas
  Cc: robh, linux-pci, linux-renesas-soc, linux-kernel, john.madieu,
	biju.das.jz, geert+renesas, John Madieu

The RZ/G3E PCIe controller does not expose the standard PCIe Slot
Capability registers, so the generic pciehp driver cannot be used. The
only link-state signal the hardware provides is the DL_UpDown bit in the
PEIS0 event status register, which is raised on every Data Link layer
up/down transition.

Enable DL_UpDown in PEIE0 and hook up an interrupt handler so the driver
can react to link-state changes: a device that trains after boot gets
enumerated, and a device that disappears on link loss is removed. This
provides hotplug-like behaviour without the PCI hotplug core, which is
unavailable for the reason above.

On a DL_UpDown event the handler acks the W1C status bit and schedules a
worker that inspects PCSTAT1.DL_DOWN_STS:

  - link up: re-run max link speed negotiation, wait for the link to
    settle and pci_rescan_bus() the root bus;
  - link down: walk the bus in reverse and
    pci_stop_and_remove_bus_device() each child.

Both paths take pci_lock_rescan_remove() to serialise against the PCI
core.

Link events are only acted upon once the controller is fully
initialised. A DL_UpDown latched while the registers are not configured,
for example when the event IRQ is used as a system wakeup source during
resume, is acknowledged but does not schedule a rescan. The
hw_initialized flag, set at the end of controller setup and cleared on
suspend, gates this.

While at it, make probe tolerant of an absent device. Previously, if the
link failed to come up during rzg3s_pcie_host_init(), probe tore the
controller back down and failed. Distinguish this case with -ENODEV,
leave the controller and refclk running, and let the link-up path
enumerate the device once it appears.

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
---
 drivers/pci/controller/pcie-rzg3s-host.c | 153 +++++++++++++++++++++--
 1 file changed, 143 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/controller/pcie-rzg3s-host.c b/drivers/pci/controller/pcie-rzg3s-host.c
index d86e7516dcc2..5a10422ced2e 100644
--- a/drivers/pci/controller/pcie-rzg3s-host.c
+++ b/drivers/pci/controller/pcie-rzg3s-host.c
@@ -34,6 +34,7 @@
 #include <linux/sizes.h>
 #include <linux/slab.h>
 #include <linux/units.h>
+#include <linux/workqueue.h>
 
 #include "../pci.h"
 
@@ -294,7 +295,12 @@ struct rzg3s_pcie_port {
  * @msi: MSI data structure
  * @port: PCIe Root Port
  * @hw_lock: lock for access to the HW resources
+ * @link_work: work for DL_UpDown link-state change handling
+ * @event_irq: PCIe event interrupt for DL_UpDown detection
  * @intx_irqs: INTx interrupts
+ * @hw_initialized: set once the controller HW is fully initialised; gates
+ *                  DL_UpDown event handling against events latched while
+ *                  the registers are not configured
  * @max_link_speed: maximum supported link speed
  */
 struct rzg3s_pcie_host {
@@ -309,7 +315,10 @@ struct rzg3s_pcie_host {
 	struct rzg3s_pcie_msi msi;
 	struct rzg3s_pcie_port port;
 	raw_spinlock_t hw_lock;
+	struct work_struct link_work;
+	int event_irq;
 	int intx_irqs[PCI_NUM_INTX];
+	bool hw_initialized;
 	int max_link_speed;
 };
 
@@ -575,6 +584,30 @@ static irqreturn_t rzg3s_pcie_msi_irq(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t rzg3s_pcie_event_irq(int irq, void *data)
+{
+	struct rzg3s_pcie_host *host = data;
+	u32 status;
+
+	status = readl_relaxed(host->axi + RZG3S_PCI_PEIS0);
+
+	if (!(status & RZG3S_PCI_PEIS0_DL_UPDOWN))
+		return IRQ_NONE;
+
+	/* Clear the DL_UpDown status (W1C) */
+	writel_relaxed(RZG3S_PCI_PEIS0_DL_UPDOWN, host->axi + RZG3S_PCI_PEIS0);
+
+	/*
+	 * Drop the event until the controller is fully initialised. The
+	 * event IRQ may act as a system wakeup source and fire during
+	 * resume before the HW registers have been reconfigured.
+	 */
+	if (READ_ONCE(host->hw_initialized))
+		schedule_work(&host->link_work);
+
+	return IRQ_HANDLED;
+}
+
 static void rzg3s_pcie_msi_irq_ack(struct irq_data *d)
 {
 	struct rzg3s_pcie_msi *msi = irq_data_get_irq_chip_data(d);
@@ -1107,6 +1140,47 @@ static int rzg3s_pcie_set_max_link_speed(struct rzg3s_pcie_host *host)
 	return ret;
 }
 
+static void rzg3s_pcie_link_work(struct work_struct *work)
+{
+	struct rzg3s_pcie_host *host =
+		container_of(work, struct rzg3s_pcie_host, link_work);
+	struct pci_host_bridge *bridge = pci_host_bridge_from_priv(host);
+	struct pci_bus *bus = bridge->bus;
+	u32 val;
+
+	val = readl_relaxed(host->axi + RZG3S_PCI_PCSTAT1);
+	if (val & RZG3S_PCI_PCSTAT1_DL_DOWN_STS) {
+		struct pci_dev *dev, *tmp;
+
+		dev_info(host->dev, "PCIe link down, removing devices\n");
+
+		pci_lock_rescan_remove();
+		list_for_each_entry_safe_reverse(dev, tmp,
+						 &bus->devices, bus_list)
+			pci_stop_and_remove_bus_device(dev);
+		pci_unlock_rescan_remove();
+	} else {
+		int ret;
+
+		dev_info(host->dev, "PCIe link up, rescanning bus\n");
+
+		/*
+		 * Attempt link speed negotiation now that the link is up.
+		 * Failure is non-fatal: the device works at the negotiated
+		 * speed.
+		 */
+		ret = rzg3s_pcie_set_max_link_speed(host);
+		if (ret)
+			dev_info(host->dev, "Failed to set max link speed\n");
+
+		msleep(PCIE_RESET_CONFIG_WAIT_MS);
+
+		pci_lock_rescan_remove();
+		pci_rescan_bus(bus);
+		pci_unlock_rescan_remove();
+	}
+}
+
 static int rzg3s_pcie_config_init(struct rzg3s_pcie_host *host)
 {
 	struct pci_host_bridge *bridge = pci_host_bridge_from_priv(host);
@@ -1217,8 +1291,8 @@ static void rzg3s_pcie_irq_init(struct rzg3s_pcie_host *host)
 		       RZG3S_PCI_PEIS0_RX_DLLP_PM_ENTER,
 		       host->axi + RZG3S_PCI_PEIS0);
 
-	/* Disable all interrupts */
-	writel_relaxed(0, host->axi + RZG3S_PCI_PEIE0);
+	/* Enable DL_UpDown interrupt for link state change detection */
+	writel_relaxed(RZG3S_PCI_PEIS0_DL_UPDOWN, host->axi + RZG3S_PCI_PEIE0);
 
 	/* Clear all parity and ecc error interrupts */
 	writel_relaxed(~0U, host->axi + RZG3S_PCI_PEIS1);
@@ -1384,16 +1458,21 @@ static int rzg3s_pcie_host_init(struct rzg3s_pcie_host *host)
 				 PCIE_LINK_WAIT_SLEEP_MS * MILLI,
 				 PCIE_LINK_WAIT_SLEEP_MS * MILLI *
 				 PCIE_LINK_WAIT_MAX_RETRIES);
-	if (ret)
-		goto config_deinit_post;
+	if (ret) {
+		/*
+		 * Link is down. Leave the controller running so the
+		 * DL_UpDown handler can enumerate a device that appears
+		 * later.
+		 */
+		dev_info(host->dev, "PCIe link down, waiting for DL_UpDown\n");
+		ret = -ENODEV;
+	}
 
 	val = readl_relaxed(host->axi + RZG3S_PCI_PCSTAT2);
 	dev_info(host->dev, "PCIe link status [0x%x]\n", val);
 
-	return 0;
+	return ret;
 
-config_deinit_post:
-	host->data->config_deinit(host);
 config_deinit_and_refclk:
 	clk_disable_unprepare(host->port.refclk);
 config_deinit:
@@ -1655,8 +1734,15 @@ rzg3s_pcie_host_setup(struct rzg3s_pcie_host *host,
 
 	ret = rzg3s_pcie_host_init(host);
 	if (ret) {
-		dev_err_probe(dev, ret, "Failed to initialize the HW!\n");
-		goto teardown_irqdomain;
+		if (ret != -ENODEV) {
+			dev_err_probe(dev, ret,
+				      "Failed to initialize the HW!\n");
+			goto teardown_irqdomain;
+		}
+
+		/* Link is down: hotplug via DL_UpDown will recover. */
+		WRITE_ONCE(host->hw_initialized, true);
+		return 0;
 	}
 
 	ret = rzg3s_pcie_set_max_link_speed(host);
@@ -1665,6 +1751,8 @@ rzg3s_pcie_host_setup(struct rzg3s_pcie_host *host,
 
 	msleep(PCIE_RESET_CONFIG_WAIT_MS);
 
+	WRITE_ONCE(host->hw_initialized, true);
+
 	return 0;
 
 teardown_irqdomain:
@@ -1682,6 +1770,7 @@ static int rzg3s_pcie_probe(struct platform_device *pdev)
 		of_parse_phandle(np, "renesas,sysc", 0);
 	struct rzg3s_pcie_host *host;
 	struct rzg3s_sysc *sysc;
+	const char *evt_name;
 	int ret;
 
 	bridge = devm_pci_alloc_host_bridge(dev, sizeof(*host));
@@ -1745,6 +1834,7 @@ static int rzg3s_pcie_probe(struct platform_device *pdev)
 		goto rpm_disable;
 
 	raw_spin_lock_init(&host->hw_lock);
+	INIT_WORK(&host->link_work, rzg3s_pcie_link_work);
 
 	ret = rzg3s_pcie_host_setup(host, rzg3s_pcie_init_irqdomain,
 				    rzg3s_pcie_teardown_irqdomain);
@@ -1758,8 +1848,39 @@ static int rzg3s_pcie_probe(struct platform_device *pdev)
 	if (ret)
 		goto host_probe_teardown;
 
+	/*
+	 * Request the PCIe event IRQ at the end of probe to avoid
+	 * spurious link-state events during controller setup and bus
+	 * enumeration. From here on, DL_UpDown events trigger the link
+	 * worker to (re)scan the bus.
+	 */
+	host->event_irq = platform_get_irq_byname(pdev, "pcie_evt");
+	if (host->event_irq < 0) {
+		ret = host->event_irq;
+		goto pci_host_remove;
+	}
+
+	evt_name = devm_kasprintf(dev, GFP_KERNEL, "%s-pcie-evt",
+				  dev_name(dev));
+	if (!evt_name) {
+		ret = -ENOMEM;
+		goto pci_host_remove;
+	}
+
+	ret = request_irq(host->event_irq, rzg3s_pcie_event_irq, 0,
+			  evt_name, host);
+	if (ret) {
+		dev_err_probe(dev, ret, "Failed to request pcie_evt IRQ\n");
+		goto pci_host_remove;
+	}
+
 	return 0;
 
+pci_host_remove:
+	pci_lock_rescan_remove();
+	pci_stop_root_bus(bridge->bus);
+	pci_remove_root_bus(bridge->bus);
+	pci_unlock_rescan_remove();
 host_probe_teardown:
 	rzg3s_pcie_teardown_irqdomain(host);
 	host->data->config_deinit(host);
@@ -1789,9 +1910,19 @@ static int rzg3s_pcie_suspend_noirq(struct device *dev)
 	struct rzg3s_sysc *sysc = host->sysc;
 	int ret;
 
+	/*
+	 * Stop accepting DL_UpDown events, then drain any worker that may
+	 * already be running, before tearing the controller down.
+	 */
+	WRITE_ONCE(host->hw_initialized, false);
+	cancel_work_sync(&host->link_work);
+
 	ret = pm_runtime_put_sync(dev);
-	if (ret)
+	if (ret) {
+		/* Suspend aborted; keep handling DL_UpDown events. */
+		WRITE_ONCE(host->hw_initialized, true);
 		return ret;
+	}
 
 	clk_disable_unprepare(port->refclk);
 
@@ -1822,6 +1953,8 @@ static int rzg3s_pcie_suspend_noirq(struct device *dev)
 refclk_restore:
 	clk_prepare_enable(port->refclk);
 	pm_runtime_resume_and_get(dev);
+	/* Controller is alive again; resume DL_UpDown handling. */
+	WRITE_ONCE(host->hw_initialized, true);
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] PCI: rzg3s-host: Re-enumerate the bus on PCIe link-state changes
  2026-06-18 19:09 [PATCH] PCI: rzg3s-host: Re-enumerate the bus on PCIe link-state changes John Madieu
@ 2026-06-18 19:26 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-18 19:26 UTC (permalink / raw)
  To: John Madieu; +Cc: wsa+renesas, linux-pci

Thank you for your contribution! Sashiko AI review found 9 potential issue(s) to consider:

New issues:
- [High] Calling cancel_work_sync() in the suspend_noirq callback can cause a system deadlock and violates atomic context rules.
- [High] The event IRQ handler accesses hardware registers unconditionally, which will crash the system if the device is suspended or unclocked.
- [High] Link state change events that occur while the system is suspended are intentionally dropped but never recovered on resume, permanently desynchronizing the driver's state.
- [High] Devices physically unplugged during system suspend are not removed from the bus, resulting in zombie devices.
- [High] Unprotected Read-Modify-Write of Root Port configuration space races with generic PCI subsystem accesses.

Pre-existing issues:
- [High] Calling pm_runtime_put_sync() in suspend_noirq unconditionally aborts system suspend.
- [High] The rzg3s_pcie_resume_noirq() callback calls sleeping functions while interrupts are disabled.
- [High] Read-modify-write on W1C (Write-1-to-Clear) interrupt status registers silently drops concurrent hardware interrupts.
- [High] The hardware IRQ index calculation assumes contiguous Linux virtual IRQ numbers, leading to unhandled IRQ loops.
--

commit 59f68b406a390ed1dd3041cecba9856005c43783
Author: John Madieu <john.madieu.xa@bp.renesas.com>

PCI: rzg3s-host: Re-enumerate the bus on PCIe link-state changes

The RZ/G3E PCIe controller does not expose the standard PCIe Slot Capability
registers, so the generic pciehp driver cannot be used. The only link-state
signal the hardware provides is the DL_UpDown bit in the PEIS0 event status
register, which is raised on every Data Link layer up/down transition.

> diff --git a/drivers/pci/controller/pcie-rzg3s-host.c b/drivers/pci/controller/pcie-rzg3s-host.c
> index d86e7516dcc29..5a10422ced2e5 100644
> --- a/drivers/pci/controller/pcie-rzg3s-host.c
> +++ b/drivers/pci/controller/pcie-rzg3s-host.c
> @@ -575,6 +584,30 @@ static irqreturn_t rzg3s_pcie_msi_irq(int irq, void *data)
>  	return IRQ_HANDLED;
>  }

[Severity: High]
This is a pre-existing issue, but does read-modify-write on a W1C register
silently drop concurrent hardware interrupts?

In rzg3s_pcie_msi_irq(), rzg3s_pcie_update_bits() is used:

    rzg3s_pcie_update_bits(host->axi, RZG3S_PCI_PINTRCVIS,
                           RZG3S_PCI_PINTRCVIS_MSI,
                           RZG3S_PCI_PINTRCVIS_MSI);

Since RZG3S_PCI_PINTRCVIS is a W1C register, does this read all pending
interrupts as 1s and write them back, clearing all interrupts simultaneously
before their handlers can process them?

> +static irqreturn_t rzg3s_pcie_event_irq(int irq, void *data)
> +{
> +	struct rzg3s_pcie_host *host = data;
> +	u32 status;
> +
> +	status = readl_relaxed(host->axi + RZG3S_PCI_PEIS0);

[Severity: High]
Can this unconditional hardware register read crash the system if the device
is suspended or unclocked?

The pcie_evt IRQ may act as a system wakeup source. If the interrupt fires
during sleep, the handler runs immediately after CPU wakeup, before the PM
core reaches resume_noirq to restore power and clocks to the controller.

Will reading the AXI register RZG3S_PCI_PEIS0 while the block is unclocked
trigger a synchronous external abort? Should the handler verify if the device
is powered before touching its registers?

> +
> +	if (!(status & RZG3S_PCI_PEIS0_DL_UPDOWN))
> +		return IRQ_NONE;
> +
> +	/* Clear the DL_UpDown status (W1C) */
> +	writel_relaxed(RZG3S_PCI_PEIS0_DL_UPDOWN, host->axi + RZG3S_PCI_PEIS0);
> +
> +	/*
> +	 * Drop the event until the controller is fully initialised. The
> +	 * event IRQ may act as a system wakeup source and fire during
> +	 * resume before the HW registers have been reconfigured.
> +	 */
> +	if (READ_ONCE(host->hw_initialized))
> +		schedule_work(&host->link_work);

[Severity: High]
Does dropping these events permanently desynchronize the driver's state?

In rzg3s_pcie_suspend_noirq(), hw_initialized is set to false, which causes
this handler to intentionally drop valid hotplug events without scheduling the
bus rescan work.

During system resume, the hw_initialized flag is restored to true, but the
driver doesn't appear to poll the link state or trigger a manual rescan. Will
devices inserted or removed during the suspend/resume cycle fail to be
enumerated or cleaned up?

[ ... ]

> @@ -1107,6 +1140,47 @@ static int rzg3s_pcie_set_max_link_speed(struct rzg3s_pcie_host *host)
>  	return ret;
>  }
>  
> +static void rzg3s_pcie_link_work(struct work_struct *work)
> +{
[ ... ]
> +		/*
> +		 * Attempt link speed negotiation now that the link is up.
> +		 * Failure is non-fatal: the device works at the negotiated
> +		 * speed.
> +		 */
> +		ret = rzg3s_pcie_set_max_link_speed(host);

[Severity: High]
Does rzg3s_pcie_set_max_link_speed() race with generic PCI subsystem accesses?

The scheduled link_work calls rzg3s_pcie_set_max_link_speed(), which uses the
rzg3s_pcie_update_bits() helper:

    rzg3s_pcie_update_bits(host->pcie, pcie_cap + PCI_EXP_LNKCTL2,
                           PCI_EXP_LNKCTL2_TLS,
                           FIELD_PREP(PCI_EXP_LNKCTL2_TLS, link_speed));

Since this natively modifies the Root Port's configuration space without
holding the PCI subsystem's pci_lock, can concurrent userspace accesses (via
sysfs setpci) or other kernel configuration accesses corrupt the configuration
state?

[ ... ]

> @@ -1655,8 +1734,15 @@ rzg3s_pcie_host_setup(struct rzg3s_pcie_host *host,
>  
>  	ret = rzg3s_pcie_host_init(host);
>  	if (ret) {
> -		dev_err_probe(dev, ret, "Failed to initialize the HW!\n");
> -		goto teardown_irqdomain;
> +		if (ret != -ENODEV) {
> +			dev_err_probe(dev, ret,
> +				      "Failed to initialize the HW!\n");
> +			goto teardown_irqdomain;
> +		}
> +
> +		/* Link is down: hotplug via DL_UpDown will recover. */
> +		WRITE_ONCE(host->hw_initialized, true);
> +		return 0;

[Severity: High]
Can this leave zombie devices on the bus if a device is physically unplugged
during system suspend?

If a device is removed while suspended, rzg3s_pcie_host_init() correctly
returns -ENODEV (link down) during resume. However, by intercepting -ENODEV
and returning 0 here without cleaning up the bus, the stale child devices are
retained.

Because there is no link state transition from up to down at this stage, the
DL_UpDown interrupt will not fire. Will the kernel indefinitely retain the
pci_dev structures for the absent devices and cause failed MMIO accesses?

[ ... ]

> @@ -1789,9 +1910,19 @@ static int rzg3s_pcie_suspend_noirq(struct device *dev)
>  	struct rzg3s_sysc *sysc = host->sysc;
>  	int ret;
>  
> +	/*
> +	 * Stop accepting DL_UpDown events, then drain any worker that may
> +	 * already be running, before tearing the controller down.
> +	 */
> +	WRITE_ONCE(host->hw_initialized, false);
> +	cancel_work_sync(&host->link_work);

[Severity: High]
Does calling cancel_work_sync() here cause a system deadlock and violate
atomic context rules?

The suspend_noirq callback executes with local CPU interrupts disabled.
Calling cancel_work_sync() can block and sleep while waiting for the work
item to finish.

Can sleeping in an atomic context with interrupts disabled lead to a
'scheduling while atomic' bug or cause deadlocks if the work item relies on
locks or hardware interrupts that cannot be serviced?

>  
>  	ret = pm_runtime_put_sync(dev);
> -	if (ret)
> +	if (ret) {
> +		/* Suspend aborted; keep handling DL_UpDown events. */
> +		WRITE_ONCE(host->hw_initialized, true);
>  		return ret;
> +	}

[Severity: High]
This isn't a bug introduced by this patch, but does calling
pm_runtime_put_sync() in suspend_noirq unconditionally abort system suspend?

In the suspend_noirq phase, the PM core has already unconditionally disabled
runtime PM by incrementing the disable_depth, so pm_runtime_put_sync() will
fail with -EACCES.

By returning this error, does the driver force the PM core to abort the entire
system suspend? Should the driver use pm_runtime_force_suspend() instead?

[ ... ]

[Severity: High]
This isn't a bug introduced by this patch, but does the
rzg3s_pcie_resume_noirq() callback call sleeping functions while interrupts
are disabled?

rzg3s_pcie_resume_noirq() executes in the NOIRQ phase with local CPU
interrupts disabled. It indirectly calls msleep() and fsleep() via:

    rzg3s_pcie_resume_noirq() -> rzg3s_pcie_host_setup() -> msleep()

Can sleeping in an atomic context violate kernel rules and lead to a
'scheduling while atomic' bug or system deadlocks?

[Severity: High]
This is a pre-existing issue, but does the hardware IRQ index calculation
assume contiguous Linux virtual IRQ numbers?

In rzg3s_pcie_intx_irq_handler(), the hardware INTx index is calculated as:

    u32 intx = irq - host->intx_irqs[0];

Linux dynamically assigns virtual IRQs without contiguity guarantees. If the
allocated IRQs are not perfectly contiguous, does this math produce
out-of-bounds indices, causing generic_handle_domain_irq() to fail and
resulting in an unhandled IRQ storm?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618190935.3018531-1-john.madieu.xa@bp.renesas.com?part=1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-18 19:26 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18 19:09 [PATCH] PCI: rzg3s-host: Re-enumerate the bus on PCIe link-state changes John Madieu
2026-06-18 19:26 ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox