The Linux Kernel Mailing List
* [PATCH] irqchip/gic-v3-its: Reconfigure ITS from software state on resume
@ 2026-05-07 18:31 Bjoern Doebel
From: Bjoern Doebel @ 2026-05-07 18:31 UTC
  Cc: Bjoern Doebel, stable, Marc Zyngier, Thomas Gleixner,
	linux-arm-kernel, linux-kernel, David Woodhouse, Ali Saidi,
	David Arinzon, Zeev Zilberman

After resume, MSI-X interrupts can be silently dropped because the ITS
hardware state does not match the software its_device state. The
in-memory tables pointed at by GITS_BASER survive suspend, but the ITS
has been reset and must be reconfigured via ITS commands, per the GICv3
ITS architecture specification §5.6.1 (Enabling an ITS). Some ITS
implementations also keep internal state that is only populated via ITS
commands rather than by reading guest/firmware memory on demand, so
restoring GITS_BASER alone is not enough.
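As a rough illustration, here is a toy userspace model of this failure
mode. It assumes an ITS that answers translations only from internal
state filled in by MAPD, never by walking the memory table. Every name
here (struct toy_its, toy_its_mapd(), toy_its_translate(),
toy_its_reset()) is hypothetical and does not correspond to a kernel or
spec interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_DEVICES 4

/* Hypothetical ITS: a memory table plus command-populated internal state. */
struct toy_its {
	uint64_t baser;				/* stands in for GITS_BASER */
	uint64_t dev_table[TOY_NR_DEVICES];	/* in-memory table, survives suspend */
	bool cached[TOY_NR_DEVICES];		/* internal state, lost on reset */
};

/* Hardware reset: registers and internal state are cleared, memory is not. */
static void toy_its_reset(struct toy_its *its)
{
	its->baser = 0;
	memset(its->cached, 0, sizeof(its->cached));
}

/* MAPD(V=1): the only way this model learns about a device. */
static void toy_its_mapd(struct toy_its *its, unsigned int dev, uint64_t itt)
{
	assert(dev < TOY_NR_DEVICES);
	its->dev_table[dev] = itt;
	its->cached[dev] = true;
}

/* Translation consults internal state only, never the memory table. */
static bool toy_its_translate(const struct toy_its *its, unsigned int dev)
{
	assert(dev < TOY_NR_DEVICES);
	return its->baser != 0 && its->cached[dev];
}
```

In this model, restoring baser after toy_its_reset() leaves
dev_table[] intact, but translation still fails until
toy_its_mapd() is replayed from the preserved software state.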

Before commit 713335b6ee29 ("irqchip/gic-v3-its: Implement
.msi_teardown() callback"), pci_free_irq_vectors() tore down the
its_device (MAPD valid=0, ITT freed) and pci_alloc_irq_vectors() rebuilt
it (MAPD valid=1). Drivers that disable and re-enable MSI-X across
suspend/resume (e.g. ENA, NVMe) therefore reprogrammed the ITS as a side
effect. That commit moved device teardown into .msi_teardown(), which
only runs when the MSI domain is removed (driver unbind). Since the MSI
domain persists across suspend/resume, MAPD is never replayed.
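The behaviour change can be sketched with a hypothetical two-path
model. struct toy_dev, the old_*/new_* helpers and its_hw_reset() are
illustrative only, not kernel functions. The driver frees its vectors on
suspend, the ITS is reset, and the driver allocates vectors again on
resume. Only the old path re-issues MAPD:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: does the (reset) ITS know about one its_device? */
struct toy_dev {
	bool its_mapped;	/* true once a MAPD(V=1) reached the ITS */
	int mapd_cmds;		/* number of MAPD commands issued */
};

/* Before 713335b6ee29: freeing vectors tore the its_device down... */
static void old_free_vectors(struct toy_dev *d)
{
	d->its_mapped = false;	/* MAPD(V=0) */
	d->mapd_cmds++;
}

/* ...and allocating them rebuilt it. */
static void old_alloc_vectors(struct toy_dev *d)
{
	assert(!d->its_mapped);	/* the free path (and the reset) unmapped it */
	d->its_mapped = true;	/* MAPD(V=1) */
	d->mapd_cmds++;
}

/* After 713335b6ee29: the its_device outlives free/alloc. */
static void new_free_vectors(struct toy_dev *d)
{
	(void)d;
}

static void new_alloc_vectors(struct toy_dev *d)
{
	(void)d;
}

/* Suspend/resume resets the ITS, wiping its view of the device. */
static void its_hw_reset(struct toy_dev *d)
{
	d->its_mapped = false;
}

/* Driver frees vectors in .suspend and reallocates them in .resume. */
static void suspend_resume_cycle(struct toy_dev *d, bool old_kernel)
{
	if (old_kernel)
		old_free_vectors(d);
	else
		new_free_vectors(d);
	its_hw_reset(d);
	if (old_kernel)
		old_alloc_vectors(d);
	else
		new_alloc_vectors(d);
}
```

Starting from a mapped device, the old path ends mapped after two extra
MAPDs, while the new path ends unmapped with no MAPD issued.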

Fix this in its_restore_enable() and its_cpu_init_collection() by
walking the preserved software state and re-issuing the ITS commands
needed to bring the hardware back in sync:

  1. For each device, issue MAPD(V=0), zero the ITT, then MAPD(V=1)
     with the same parameters. §5.3.10 makes MAPD(V=1) with a non-zero
     ITT UNPREDICTABLE, so the ITT must be zeroed first.

  2. Restore every CPU's collection (MAPC) and, only once that
     collection is mapped, replay MAPTI for the events targeting that CPU.
     The per-event replay is driven by a bool parameter to
     its_cpu_init_collection() so that every ITS on a given CPU gets
     its MAPTIs restored. For the boot CPU, which does not traverse
     its_cpu_init_collections() on resume, replay is driven directly
     from its_restore_enable() for every ITS; the HCC optimisation
     that previously skipped MAPC for memory-resident collections on
     the boot CPU is dropped, matching what secondary CPUs already do
     in their normal cpuhp startup path. For secondary CPUs, replay
     is gated by a cpumask armed by its_restore_enable() once per
     resume cycle so that normal CPU hotplug is unaffected.
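A minimal userspace sketch of the two steps above and the cpumask
gating follows. It assumes one device and four CPUs, with CPU 0 as the
boot CPU, and uses a plain unsigned int in place of a cpumask. The
command log, struct model_dev and all helper names are hypothetical. The
point is the ordering: MAPD(V=0), ITT zeroed, MAPD(V=1), then for each
CPU a MAPC followed by MAPTIs for the events targeting it, with replay
consumed exactly once per secondary CPU:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS	4
#define MODEL_NR_EVENTS	8
#define MODEL_ITT_BYTES	64
#define MODEL_LOG_MAX	64

enum model_cmd { CMD_MAPD_V0, CMD_MAPD_V1, CMD_MAPC, CMD_MAPTI };

/* Records the command sequence so the ordering can be inspected. */
struct cmd_log {
	enum model_cmd op[MODEL_LOG_MAX];
	int arg[MODEL_LOG_MAX];		/* CPU for MAPC, event for MAPTI */
	int n;
};

struct model_dev {
	uint8_t itt[MODEL_ITT_BYTES];
	int col_map[MODEL_NR_EVENTS];	/* target CPU of each event */
};

static void emit(struct cmd_log *cmds, enum model_cmd op, int arg)
{
	assert(cmds->n < MODEL_LOG_MAX);
	cmds->op[cmds->n] = op;
	cmds->arg[cmds->n] = arg;
	cmds->n++;
}

static int count_ops(const struct cmd_log *cmds, enum model_cmd op)
{
	int count = 0;

	for (int i = 0; i < cmds->n; i++)
		if (cmds->op[i] == op)
			count++;
	return count;
}

/* Step 1: unmap, zero the ITT, remap with the same parameters. */
static void restore_device(struct cmd_log *cmds, struct model_dev *dev)
{
	emit(cmds, CMD_MAPD_V0, 0);
	memset(dev->itt, 0, sizeof(dev->itt));
	/* MAPD(V=1) with a non-zero ITT is UNPREDICTABLE. */
	for (size_t i = 0; i < sizeof(dev->itt); i++)
		assert(dev->itt[i] == 0);
	emit(cmds, CMD_MAPD_V1, 0);
}

/* Step 2: MAPC this CPU's collection, then replay MAPTIs targeting it. */
static void init_collection(struct cmd_log *cmds, const struct model_dev *dev,
			    int cpu, bool replay)
{
	emit(cmds, CMD_MAPC, cpu);
	if (!replay)
		return;
	for (int ev = 0; ev < MODEL_NR_EVENTS; ev++)
		if (dev->col_map[ev] == cpu)
			emit(cmds, CMD_MAPTI, ev);
}

/* Stand-in for cpumask_test_and_clear_cpu() on a small bitmask. */
static bool test_and_clear_cpu(unsigned int *mask, int cpu)
{
	bool was_set = (*mask >> cpu) & 1u;

	*mask &= ~(1u << cpu);
	return was_set;
}

static void resume_model(struct cmd_log *cmds, struct model_dev *dev,
			 unsigned int *pending)
{
	const int boot_cpu = 0;

	/* Arm replay for every secondary CPU, never for the boot CPU. */
	*pending = ((1u << MODEL_NR_CPUS) - 1) & ~(1u << boot_cpu);

	/* Boot CPU: devices first, then MAPC + MAPTI. */
	restore_device(cmds, dev);
	init_collection(cmds, dev, boot_cpu, true);

	/* Secondary CPUs come online later and each consumes its bit once. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu == boot_cpu)
			continue;
		init_collection(cmds, dev, cpu, test_and_clear_cpu(pending, cpu));
	}
}
```

In this model, a later hotplug of a secondary CPU finds its bit already
cleared and issues only MAPC. That is the property the cpumask is meant
to guarantee for normal hotplug.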

GICv4 vLPI state is skipped here: vLPIs are hypervisor-only, replayed
through separate GICv4 VM resume paths, and not relevant to guest
kernels or to this fix.
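The per-event filter applied during replay can be modelled as follows.
This is a simplified, hypothetical mirror of the event_map fields that
its_cpu_replay_mapti() reads. A 32-bit word stands in for the lpi_map
bitmap, and a record array stands in for its_send_mapti(). The sketch
skips devices with vLPI mappings, unallocated events, and events owned
by other CPUs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirror of the event_map fields used in replay. */
struct model_event_map {
	const void *vm;		/* non-NULL: device has GICv4 vLPI mappings */
	uint32_t lpi_base;	/* LPI number of event 0 */
	uint32_t lpi_map;	/* bit e set: event e is allocated */
	int nr_lpis;		/* number of events, at most 32 here */
	const int *col_map;	/* target CPU of each event */
};

/* Records the MAPTI commands a replay would issue. */
struct mapti_record {
	uint32_t lpi[32];
	uint32_t event[32];
	int n;
};

static int model_replay_mapti(const struct model_event_map *map, int cpu,
			      struct mapti_record *out)
{
	int sent = 0;

	assert(map->nr_lpis >= 0 && map->nr_lpis <= 32);
	if (map->vm)
		return 0;	/* vLPIs are left to the GICv4 VM resume paths */
	for (int event = 0; event < map->nr_lpis; event++) {
		if (!(map->lpi_map & (1u << event)))
			continue;	/* event not allocated */
		if (map->col_map[event] != cpu)
			continue;	/* replayed when that CPU maps its collection */
		assert(out->n < 32);
		out->lpi[out->n] = map->lpi_base + (uint32_t)event;
		out->event[out->n] = (uint32_t)event;
		out->n++;
		sent++;
	}
	return sent;
}
```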

Tested on EC2 c6gn.16xlarge (ARM64 Graviton). Without the fix,
hibernation resume fails on every attempt with:

  ena 0000:00:05.0: The ena device sent a completion but the driver
  didn't receive a MSI-X interrupt (cmd 3)
  ena 0000:00:05.0: Failed to create IO CQ. error: -62

With the fix, hibernation resume works reliably.

Fixes: 713335b6ee29 ("irqchip/gic-v3-its: Implement .msi_teardown() callback")
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ali Saidi <alisaidi@amazon.com>
Co-developed-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Co-developed-by: Zeev Zilberman <zeev@amazon.com>
Signed-off-by: Zeev Zilberman <zeev@amazon.com>
Signed-off-by: Bjoern Doebel <doebel@amazon.de>
Assisted-by: Kiro:claude-opus-4.6
---
Testing: Tested hibernation using Amazon Linux 2023 and kernel 7.1-rc2
on EC2 c6gn, c7gn, and c8gn instances. Without the patch, the ENA network
device failed to come up after hibernation resume. With the patch, ENA
devices are properly re-initialized on resume.
---
 drivers/irqchip/irq-gic-v3-its.c | 124 ++++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 10 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 291d7668cc8da..0d240230037cd 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3283,7 +3283,66 @@ static void its_cpu_init_lpis(void)
 		&paddr);
 }
 
-static void its_cpu_init_collection(struct its_node *its)
+static cpumask_var_t its_restore_pending_cpus;
+
+static void its_restore_device(struct its_device *its_dev)
+{
+	/*
+	 * Bring each device back to a quiescent mapping state, as required
+	 * by GICv3 ITS architecture §5.6.1 (Enabling an ITS) after an ITS
+	 * reset: the device table entries are gone, so software must
+	 * reconfigure them with ITS commands. MAPD(V=1) with a non-zero
+	 * ITT is UNPREDICTABLE (§5.3.10, §5.2.4), so unmap first, zero the
+	 * ITT, and map again with a clean ITT. MAPTI replay is deferred to
+	 * its_cpu_init_collection() so that the collection a given event
+	 * targets is MAPC'd before MAPTI is issued for it.
+	 */
+	its_send_mapd(its_dev, 0);
+	memset(its_dev->itt, 0, its_dev->itt_sz);
+	gic_flush_dcache_to_poc(its_dev->itt, its_dev->itt_sz);
+	its_send_mapd(its_dev, 1);
+}
+
+static void its_cpu_replay_mapti(struct its_node *its)
+{
+	int cpu = smp_processor_id();
+	struct its_device *its_dev;
+	int event;
+
+	/*
+	 * Walk its_device_list without holding its->dev_alloc_lock.
+	 * Device add/remove normally requires that mutex, but this
+	 * function only runs on the resume path, from
+	 * its_cpu_init_collection() on either the boot CPU (called
+	 * directly from its_restore_enable() under its_lock) or a
+	 * secondary CPU (called from its_cpu_init_collections() under
+	 * its_lock). Concurrency with driver MSI alloc/free is excluded
+	 * by the hibernate sequence:
+	 *
+	 *   syscore_resume()            <- its_restore_enable() runs here
+	 *   pm_sleep_enable_secondary_cpus()  <- its_cpu_init() on each CPU
+	 *   dpm_resume_start() / dpm_resume()  <- driver .resume callbacks
+	 *
+	 * See kernel/power/hibernate.c:create_image(). Drivers
+	 * cannot add or remove MSI allocations until their .resume
+	 * callbacks run, which is strictly after every CPU has passed
+	 * through its_cpu_init_collection().
+	 */
+	list_for_each_entry(its_dev, &its->its_device_list, entry) {
+		if (its_dev->event_map.vm)
+			continue;
+		for_each_set_bit(event, its_dev->event_map.lpi_map,
+				 its_dev->event_map.nr_lpis) {
+			if (its_dev->event_map.col_map[event] != cpu)
+				continue;
+			its_send_mapti(its_dev,
+				       its_dev->event_map.lpi_base + event,
+				       event);
+		}
+	}
+}
+
+static void its_cpu_init_collection(struct its_node *its, bool replay)
 {
 	int cpu = smp_processor_id();
 	u64 target;
@@ -3320,17 +3379,33 @@ static void its_cpu_init_collection(struct its_node *its)
 
 	its_send_mapc(its, &its->collections[cpu], 1);
 	its_send_invall(its, &its->collections[cpu]);
+
+	/*
+	 * On resume from hibernation, its_restore_enable() has reprogrammed
+	 * the device table but deferred per-event MAPTI replay until each
+	 * target collection is MAPC'd. Now that the local collection is
+	 * mapped, replay MAPTIs for events targeting this CPU on this ITS.
+	 */
+	if (replay)
+		its_cpu_replay_mapti(its);
 }
 
 static void its_cpu_init_collections(void)
 {
 	struct its_node *its;
+	bool replay;
 
-	raw_spin_lock(&its_lock);
+	/*
+	 * On resume from hibernation, its_restore_enable() arms this cpumask
+	 * for every secondary CPU that still needs MAPTI replay. Test-and-
+	 * clear once per CPU and propagate the flag to every ITS on this CPU.
+	 */
+	replay = cpumask_test_and_clear_cpu(smp_processor_id(),
+					    its_restore_pending_cpus);
 
+	raw_spin_lock(&its_lock);
 	list_for_each_entry(its, &its_nodes, entry)
-		its_cpu_init_collection(its);
-
+		its_cpu_init_collection(its, replay);
 	raw_spin_unlock(&its_lock);
 }
 
@@ -5036,8 +5111,22 @@ static void its_restore_enable(void *data)
 	struct its_node *its;
 	int ret;
 
+	/*
+	 * Arm MAPTI replay for every secondary CPU. The boot CPU does not
+	 * go through its_cpu_init_collections() on resume, so it is handled
+	 * directly in the per-ITS loop below; exclude it here to avoid
+	 * leaving a stale bit set.
+	 *
+	 * See §5.6.1 of the GICv3 ITS architecture specification: after an
+	 * ITS reset, software must reconfigure devices, collections and
+	 * translations via ITS commands.
+	 */
+	cpumask_copy(its_restore_pending_cpus, cpu_possible_mask);
+	cpumask_clear_cpu(smp_processor_id(), its_restore_pending_cpus);
+
 	raw_spin_lock(&its_lock);
 	list_for_each_entry(its, &its_nodes, entry) {
+		struct its_device *its_dev;
 		void __iomem *base;
 		int i;
 
@@ -5080,13 +5169,23 @@ static void its_restore_enable(void *data)
 		writel_relaxed(its->ctlr_save, base + GITS_CTLR);
 
 		/*
-		 * Reinit the collection if it's stored in the ITS. This is
-		 * indicated by the col_id being less than the HCC field.
-		 * CID < HCC as specified in the GIC v3 Documentation.
+		 * Reset and remap each device on this ITS. After resume,
+		 * the ITS has no device table entries and ITT contents may
+		 * be stale; per GICv3 ITS §5.3.10, MAPD(V=1) with a non-zero
+		 * ITT is UNPREDICTABLE. Unmap first, zero the ITT, then map
+		 * again.
 		 */
-		if (its->collections[smp_processor_id()].col_id <
-		    GITS_TYPER_HCC(gic_read_typer(base + GITS_TYPER)))
-			its_cpu_init_collection(its);
+		list_for_each_entry(its_dev, &its->its_device_list, entry)
+			its_restore_device(its_dev);
+
+		/*
+		 * Unconditionally MAPC the boot CPU's collection and replay
+		 * MAPTIs for events targeting it, on every ITS. This mirrors
+		 * the unconditional MAPC that secondary CPUs do in their
+		 * cpuhp startup path, and covers both HW-resident and
+		 * memory-resident collections.
+		 */
+		its_cpu_init_collection(its, true);
 	}
 	raw_spin_unlock(&its_lock);
 }
@@ -5826,6 +5925,11 @@ int __init its_init(struct fwnode_handle *handle, struct rdists *rdists,
 	if (!itt_pool)
 		return -ENOMEM;
 
+	if (!zalloc_cpumask_var(&its_restore_pending_cpus, GFP_KERNEL)) {
+		gen_pool_destroy(itt_pool);
+		return -ENOMEM;
+	}
+
 	gic_rdists = rdists;
 
 	lpi_prop_prio = irq_prio;
-- 
2.48.2




