* [PATCH v3 1/3] genirq: Add interrupt redirection infrastructure
2025-11-28 21:20 [PATCH v3 0/3] Enable MSI affinity support for dwc PCI Radu Rendec
@ 2025-11-28 21:20 ` Radu Rendec
2025-12-15 21:34 ` [tip: irq/msi] " tip-bot2 for Radu Rendec
2025-11-28 21:20 ` [PATCH v3 2/3] PCI: dwc: Code cleanup Radu Rendec
2025-11-28 21:20 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Radu Rendec
2 siblings, 1 reply; 25+ messages in thread
From: Radu Rendec @ 2025-11-28 21:20 UTC (permalink / raw)
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel
Add infrastructure to redirect interrupt handler execution to a
different CPU when the current CPU is not part of the interrupt's CPU
affinity mask.
This is primarily aimed at (de)multiplexed interrupts, where the child
interrupt handler runs in the context of the parent interrupt handler,
and therefore CPU affinity control for the child interrupt is typically
not available.
With the new infrastructure, the child interrupt is allowed to freely
change its affinity setting, independently of the parent. If the
interrupt handler happens to be triggered on an "incompatible" CPU (a
CPU that's not part of the child interrupt's affinity mask), the handler
is redirected and runs in IRQ work context on a "compatible" CPU.
No functional change is being made to any existing irqchip driver, and
irqchip drivers must be explicitly modified to use the newly added
infrastructure to support interrupt redirection.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec <rrendec@redhat.com>
---
include/linux/irq.h | 10 +++++
include/linux/irqdesc.h | 17 +++++++-
kernel/irq/chip.c | 22 ++++++++++-
kernel/irq/irqdesc.c | 86 ++++++++++++++++++++++++++++++++++++++++-
kernel/irq/manage.c | 15 ++++++-
5 files changed, 144 insertions(+), 6 deletions(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index c67e76fbcc077..b6966747d88ca 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -459,6 +459,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
* checks against the supplied affinity mask are not
* required. This is used for CPU hotplug where the
* target CPU is not yet set in the cpu_online_mask.
+ * @irq_pre_redirect: Optional function to be invoked before redirecting
+ * an interrupt via irq_work. Called only on CONFIG_SMP.
* @irq_retrigger: resend an IRQ to the CPU
* @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
* @irq_set_wake: enable/disable power-management wake-on of an IRQ
@@ -503,6 +505,7 @@ struct irq_chip {
void (*irq_eoi)(struct irq_data *data);
int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
+ void (*irq_pre_redirect)(struct irq_data *data);
int (*irq_retrigger)(struct irq_data *data);
int (*irq_set_type)(struct irq_data *data, unsigned int flow_type);
int (*irq_set_wake)(struct irq_data *data, unsigned int on);
@@ -688,6 +691,13 @@ extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data,
extern int irq_chip_set_type_parent(struct irq_data *data, unsigned int type);
extern int irq_chip_request_resources_parent(struct irq_data *data);
extern void irq_chip_release_resources_parent(struct irq_data *data);
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data);
+#endif
+#endif
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force);
#endif
/* Disable or mask interrupts during a kernel kexec */
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index fd091c35d5721..620ddd3951751 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -2,9 +2,10 @@
#ifndef _LINUX_IRQDESC_H
#define _LINUX_IRQDESC_H
-#include <linux/rcupdate.h>
+#include <linux/irq_work.h>
#include <linux/kobject.h>
#include <linux/mutex.h>
+#include <linux/rcupdate.h>
/*
* Core internal functions to deal with irq descriptors
@@ -29,6 +30,17 @@ struct irqstat {
#endif
};
+/**
+ * struct irq_redirect - interrupt redirection metadata
+ * @work: Hard irq_work item for handler execution on a different CPU
+ * @target_cpu: CPU to run irq handler on in case the current CPU is not part
+ * of the irq affinity mask
+ */
+struct irq_redirect {
+ struct irq_work work;
+ unsigned int target_cpu;
+};
+
/**
* struct irq_desc - interrupt descriptor
* @irq_common_data: per irq and chip data passed down to chip functions
@@ -46,6 +58,7 @@ struct irqstat {
* @threads_handled: stats field for deferred spurious detection of threaded handlers
* @threads_handled_last: comparator field for deferred spurious detection of threaded handlers
* @lock: locking for SMP
+ * @redirect: Facility for redirecting interrupts via irq_work
* @affinity_hint: hint to user space for preferred irq affinity
* @affinity_notify: context for notification of affinity changes
* @pending_mask: pending rebalanced interrupts
@@ -84,6 +97,7 @@ struct irq_desc {
struct cpumask *percpu_enabled;
const struct cpumask *percpu_affinity;
#ifdef CONFIG_SMP
+ struct irq_redirect redirect;
const struct cpumask *affinity_hint;
struct irq_affinity_notify *affinity_notify;
#ifdef CONFIG_GENERIC_PENDING_IRQ
@@ -186,6 +200,7 @@ int generic_handle_irq_safe(unsigned int irq);
int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq);
int generic_handle_domain_irq_safe(struct irq_domain *domain, unsigned int hwirq);
int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq);
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq);
#endif
/* Test to see if a driver has successfully requested an irq */
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index d1917b28761a3..d5c3f6ee24cc2 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1143,7 +1143,7 @@ void irq_cpu_offline(void)
}
#endif
-#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
#ifdef CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS
/**
@@ -1215,6 +1215,15 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_mask_irq);
#endif /* CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS */
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data)
+{
+ data = data->parent_data;
+ data->chip->irq_pre_redirect(data);
+}
+EXPORT_SYMBOL_GPL(irq_chip_pre_redirect_parent);
+#endif
+
/**
* irq_chip_set_parent_state - set the state of a parent interrupt.
*
@@ -1497,6 +1506,17 @@ void irq_chip_release_resources_parent(struct irq_data *data)
data->chip->irq_release_resources(data);
}
EXPORT_SYMBOL_GPL(irq_chip_release_resources_parent);
+#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force)
+{
+ struct irq_redirect *redir = &irq_data_to_desc(data)->redirect;
+
+ WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
+ return IRQ_SET_MASK_OK;
+}
+EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
#endif
/**
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index db714d3014b5f..d3d4e7cf12937 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -78,8 +78,12 @@ static int alloc_masks(struct irq_desc *desc, int node)
return 0;
}
-static void desc_smp_init(struct irq_desc *desc, int node,
- const struct cpumask *affinity)
+static void irq_redirect_work(struct irq_work *work)
+{
+ handle_irq_desc(container_of(work, struct irq_desc, redirect.work));
+}
+
+static void desc_smp_init(struct irq_desc *desc, int node, const struct cpumask *affinity)
{
if (!affinity)
affinity = irq_default_affinity;
@@ -91,6 +95,7 @@ static void desc_smp_init(struct irq_desc *desc, int node,
#ifdef CONFIG_NUMA
desc->irq_common_data.node = node;
#endif
+ desc->redirect.work = IRQ_WORK_INIT_HARD(irq_redirect_work);
}
static void free_masks(struct irq_desc *desc)
@@ -766,6 +771,83 @@ int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq)
WARN_ON_ONCE(!in_nmi());
return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
}
+
+#ifdef CONFIG_SMP
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+ guard(raw_spinlock)(&desc->lock);
+ const struct cpumask *m = irq_data_get_effective_affinity_mask(&desc->irq_data);
+ unsigned int target_cpu = READ_ONCE(desc->redirect.target_cpu);
+
+ if (desc->irq_data.chip->irq_pre_redirect)
+ desc->irq_data.chip->irq_pre_redirect(&desc->irq_data);
+
+ /*
+ * If the interrupt handler is already running on a CPU that's included
+ * in the interrupt's affinity mask, redirection is not necessary.
+ */
+ if (cpumask_test_cpu(smp_processor_id(), m))
+ return false;
+
+ /*
+ * The desc->action check protects against IRQ shutdown: __free_irq() sets
+ * desc->action to NULL while holding desc->lock, which we also hold.
+ *
+ * Calling irq_work_queue_on() here is safe w.r.t. CPU unplugging:
+ * - takedown_cpu() schedules multi_cpu_stop() on all active CPUs,
+ * including the one that's taken down.
+ * - multi_cpu_stop() acts like a barrier, which means all active
+ * CPUs go through MULTI_STOP_DISABLE_IRQ and disable hard IRQs
+ * *before* the dying CPU runs take_cpu_down() in MULTI_STOP_RUN.
+ * - Hard IRQs are re-enabled at the end of multi_cpu_stop(), *after*
+ * the dying CPU has run take_cpu_down() in MULTI_STOP_RUN.
+ * - Since we run in hard IRQ context, we run either before or after
+ * take_cpu_down() but never concurrently.
+ * - If we run before take_cpu_down(), the dying CPU hasn't been marked
+ * offline yet (it's marked via take_cpu_down() -> __cpu_disable()),
+ * so the WARN in irq_work_queue_on() can't occur.
+ * - Furthermore, the work item we queue will be flushed later via
+ * take_cpu_down() -> cpuhp_invoke_callback_range_nofail() ->
+ * smpcfd_dying_cpu() -> irq_work_run().
+ * - If we run after take_cpu_down(), target_cpu has been already
+ * updated via take_cpu_down() -> __cpu_disable(), which eventually
+ * calls irq_do_set_affinity() during IRQ migration. So, target_cpu
+ * no longer points to the dying CPU in this case.
+ */
+ if (desc->action)
+ irq_work_queue_on(&desc->redirect.work, target_cpu);
+
+ return true;
+}
+#else /* CONFIG_SMP */
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+ return false;
+}
+#endif
+
+/**
+ * generic_handle_demux_domain_irq - Invoke the handler for a hardware interrupt
+ * of a demultiplexing domain.
+ * @domain: The domain where to perform the lookup
+ * @hwirq: The hardware interrupt number to convert to a logical one
+ *
+ * Returns: True on success, or false if lookup has failed
+ */
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq)
+{
+ struct irq_desc *desc = irq_resolve_mapping(domain, hwirq);
+
+ if (unlikely(!desc))
+ return false;
+
+ if (demux_redirect_remote(desc))
+ return true;
+
+ return !handle_irq_desc(desc);
+}
+EXPORT_SYMBOL_GPL(generic_handle_demux_domain_irq);
+
#endif
/* Dynamic interrupt handling */
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 400856abf6721..0c06f37d8a203 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -35,6 +35,16 @@ static int __init setup_forced_irqthreads(char *arg)
early_param("threadirqs", setup_forced_irqthreads);
#endif
+#ifdef CONFIG_SMP
+static inline void synchronize_irqwork(struct irq_desc *desc)
+{
+ /* Synchronize pending or on the fly redirect work */
+ irq_work_sync(&desc->redirect.work);
+}
+#else
+static inline void synchronize_irqwork(struct irq_desc *desc) { }
+#endif
+
static int __irq_get_irqchip_state(struct irq_data *d, enum irqchip_irq_state which, bool *state);
static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
@@ -107,7 +117,9 @@ EXPORT_SYMBOL(synchronize_hardirq);
static void __synchronize_irq(struct irq_desc *desc)
{
+ synchronize_irqwork(desc);
__synchronize_hardirq(desc, true);
+
/*
* We made sure that no hardirq handler is running. Now verify that no
* threaded handlers are active.
@@ -217,8 +229,7 @@ static inline void irq_validate_effective_affinity(struct irq_data *data) { }
static DEFINE_PER_CPU(struct cpumask, __tmp_mask);
-int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
- bool force)
+int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
{
struct cpumask *tmp_mask = this_cpu_ptr(&__tmp_mask);
struct irq_desc *desc = irq_data_to_desc(data);
--
2.51.1
* [tip: irq/msi] genirq: Add interrupt redirection infrastructure
2025-11-28 21:20 ` [PATCH v3 1/3] genirq: Add interrupt redirection infrastructure Radu Rendec
@ 2025-12-15 21:34 ` tip-bot2 for Radu Rendec
0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Radu Rendec @ 2025-12-15 21:34 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Radu Rendec, x86, linux-kernel
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: fcc1d0dabdb65ca069f77e5b76d3b20277be4a15
Gitweb: https://git.kernel.org/tip/fcc1d0dabdb65ca069f77e5b76d3b20277be4a15
Author: Radu Rendec <rrendec@redhat.com>
AuthorDate: Fri, 28 Nov 2025 16:20:53 -05:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
genirq: Add interrupt redirection infrastructure
Add infrastructure to redirect interrupt handler execution to a
different CPU when the current CPU is not part of the interrupt's CPU
affinity mask.
This is primarily aimed at (de)multiplexed interrupts, where the child
interrupt handler runs in the context of the parent interrupt handler,
and therefore CPU affinity control for the child interrupt is typically
not available.
With the new infrastructure, the child interrupt is allowed to freely
change its affinity setting, independently of the parent. If the
interrupt handler happens to be triggered on an "incompatible" CPU (a
CPU that's not part of the child interrupt's affinity mask), the handler
is redirected and runs in IRQ work context on a "compatible" CPU.
No functional change is being made to any existing irqchip driver, and
irqchip drivers must be explicitly modified to use the newly added
infrastructure to support interrupt redirection.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Link: https://patch.msgid.link/20251128212055.1409093-2-rrendec@redhat.com
---
include/linux/irq.h | 10 +++++-
include/linux/irqdesc.h | 17 +++++++-
kernel/irq/chip.c | 22 +++++++++-
kernel/irq/irqdesc.c | 86 +++++++++++++++++++++++++++++++++++++++-
kernel/irq/manage.c | 15 ++++++-
5 files changed, 144 insertions(+), 6 deletions(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 4a9f1d7..41d5bc5 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -459,6 +459,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
* checks against the supplied affinity mask are not
* required. This is used for CPU hotplug where the
* target CPU is not yet set in the cpu_online_mask.
+ * @irq_pre_redirect: Optional function to be invoked before redirecting
+ * an interrupt via irq_work. Called only on CONFIG_SMP.
* @irq_retrigger: resend an IRQ to the CPU
* @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
* @irq_set_wake: enable/disable power-management wake-on of an IRQ
@@ -503,6 +505,7 @@ struct irq_chip {
void (*irq_eoi)(struct irq_data *data);
int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
+ void (*irq_pre_redirect)(struct irq_data *data);
int (*irq_retrigger)(struct irq_data *data);
int (*irq_set_type)(struct irq_data *data, unsigned int flow_type);
int (*irq_set_wake)(struct irq_data *data, unsigned int on);
@@ -687,6 +690,13 @@ extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data,
extern int irq_chip_set_type_parent(struct irq_data *data, unsigned int type);
extern int irq_chip_request_resources_parent(struct irq_data *data);
extern void irq_chip_release_resources_parent(struct irq_data *data);
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data);
+#endif
+#endif
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force);
#endif
/* Disable or mask interrupts during a kernel kexec */
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 1790286..dae9a9b 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -2,9 +2,10 @@
#ifndef _LINUX_IRQDESC_H
#define _LINUX_IRQDESC_H
-#include <linux/rcupdate.h>
+#include <linux/irq_work.h>
#include <linux/kobject.h>
#include <linux/mutex.h>
+#include <linux/rcupdate.h>
/*
* Core internal functions to deal with irq descriptors
@@ -30,6 +31,17 @@ struct irqstat {
};
/**
+ * struct irq_redirect - interrupt redirection metadata
+ * @work: Hard irq_work item for handler execution on a different CPU
+ * @target_cpu: CPU to run irq handler on in case the current CPU is not part
+ * of the irq affinity mask
+ */
+struct irq_redirect {
+ struct irq_work work;
+ unsigned int target_cpu;
+};
+
+/**
* struct irq_desc - interrupt descriptor
* @irq_common_data: per irq and chip data passed down to chip functions
* @kstat_irqs: irq stats per cpu
@@ -46,6 +58,7 @@ struct irqstat {
* @threads_handled: stats field for deferred spurious detection of threaded handlers
* @threads_handled_last: comparator field for deferred spurious detection of threaded handlers
* @lock: locking for SMP
+ * @redirect: Facility for redirecting interrupts via irq_work
* @affinity_hint: hint to user space for preferred irq affinity
* @affinity_notify: context for notification of affinity changes
* @pending_mask: pending rebalanced interrupts
@@ -83,6 +96,7 @@ struct irq_desc {
raw_spinlock_t lock;
struct cpumask *percpu_enabled;
#ifdef CONFIG_SMP
+ struct irq_redirect redirect;
const struct cpumask *affinity_hint;
struct irq_affinity_notify *affinity_notify;
#ifdef CONFIG_GENERIC_PENDING_IRQ
@@ -185,6 +199,7 @@ int generic_handle_irq_safe(unsigned int irq);
int generic_handle_domain_irq(struct irq_domain *domain, irq_hw_number_t hwirq);
int generic_handle_domain_irq_safe(struct irq_domain *domain, irq_hw_number_t hwirq);
int generic_handle_domain_nmi(struct irq_domain *domain, irq_hw_number_t hwirq);
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, irq_hw_number_t hwirq);
#endif
/* Test to see if a driver has successfully requested an irq */
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 678f094..433f1dd 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1122,7 +1122,7 @@ void irq_cpu_offline(void)
}
#endif
-#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
#ifdef CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS
/**
@@ -1194,6 +1194,15 @@ EXPORT_SYMBOL_GPL(handle_fasteoi_mask_irq);
#endif /* CONFIG_IRQ_FASTEOI_HIERARCHY_HANDLERS */
+#ifdef CONFIG_SMP
+void irq_chip_pre_redirect_parent(struct irq_data *data)
+{
+ data = data->parent_data;
+ data->chip->irq_pre_redirect(data);
+}
+EXPORT_SYMBOL_GPL(irq_chip_pre_redirect_parent);
+#endif
+
/**
* irq_chip_set_parent_state - set the state of a parent interrupt.
*
@@ -1476,6 +1485,17 @@ void irq_chip_release_resources_parent(struct irq_data *data)
data->chip->irq_release_resources(data);
}
EXPORT_SYMBOL_GPL(irq_chip_release_resources_parent);
+#endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */
+
+#ifdef CONFIG_SMP
+int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force)
+{
+ struct irq_redirect *redir = &irq_data_to_desc(data)->redirect;
+
+ WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
+ return IRQ_SET_MASK_OK;
+}
+EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
#endif
/**
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index f8e4e13..501a653 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -78,8 +78,12 @@ static int alloc_masks(struct irq_desc *desc, int node)
return 0;
}
-static void desc_smp_init(struct irq_desc *desc, int node,
- const struct cpumask *affinity)
+static void irq_redirect_work(struct irq_work *work)
+{
+ handle_irq_desc(container_of(work, struct irq_desc, redirect.work));
+}
+
+static void desc_smp_init(struct irq_desc *desc, int node, const struct cpumask *affinity)
{
if (!affinity)
affinity = irq_default_affinity;
@@ -91,6 +95,7 @@ static void desc_smp_init(struct irq_desc *desc, int node,
#ifdef CONFIG_NUMA
desc->irq_common_data.node = node;
#endif
+ desc->redirect.work = IRQ_WORK_INIT_HARD(irq_redirect_work);
}
static void free_masks(struct irq_desc *desc)
@@ -766,6 +771,83 @@ int generic_handle_domain_nmi(struct irq_domain *domain, irq_hw_number_t hwirq)
WARN_ON_ONCE(!in_nmi());
return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
}
+
+#ifdef CONFIG_SMP
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+ guard(raw_spinlock)(&desc->lock);
+ const struct cpumask *m = irq_data_get_effective_affinity_mask(&desc->irq_data);
+ unsigned int target_cpu = READ_ONCE(desc->redirect.target_cpu);
+
+ if (desc->irq_data.chip->irq_pre_redirect)
+ desc->irq_data.chip->irq_pre_redirect(&desc->irq_data);
+
+ /*
+ * If the interrupt handler is already running on a CPU that's included
+ * in the interrupt's affinity mask, redirection is not necessary.
+ */
+ if (cpumask_test_cpu(smp_processor_id(), m))
+ return false;
+
+ /*
+ * The desc->action check protects against IRQ shutdown: __free_irq() sets
+ * desc->action to NULL while holding desc->lock, which we also hold.
+ *
+ * Calling irq_work_queue_on() here is safe w.r.t. CPU unplugging:
+ * - takedown_cpu() schedules multi_cpu_stop() on all active CPUs,
+ * including the one that's taken down.
+ * - multi_cpu_stop() acts like a barrier, which means all active
+ * CPUs go through MULTI_STOP_DISABLE_IRQ and disable hard IRQs
+ * *before* the dying CPU runs take_cpu_down() in MULTI_STOP_RUN.
+ * - Hard IRQs are re-enabled at the end of multi_cpu_stop(), *after*
+ * the dying CPU has run take_cpu_down() in MULTI_STOP_RUN.
+ * - Since we run in hard IRQ context, we run either before or after
+ * take_cpu_down() but never concurrently.
+ * - If we run before take_cpu_down(), the dying CPU hasn't been marked
+ * offline yet (it's marked via take_cpu_down() -> __cpu_disable()),
+ * so the WARN in irq_work_queue_on() can't occur.
+ * - Furthermore, the work item we queue will be flushed later via
+ * take_cpu_down() -> cpuhp_invoke_callback_range_nofail() ->
+ * smpcfd_dying_cpu() -> irq_work_run().
+ * - If we run after take_cpu_down(), target_cpu has been already
+ * updated via take_cpu_down() -> __cpu_disable(), which eventually
+ * calls irq_do_set_affinity() during IRQ migration. So, target_cpu
+ * no longer points to the dying CPU in this case.
+ */
+ if (desc->action)
+ irq_work_queue_on(&desc->redirect.work, target_cpu);
+
+ return true;
+}
+#else /* CONFIG_SMP */
+static bool demux_redirect_remote(struct irq_desc *desc)
+{
+ return false;
+}
+#endif
+
+/**
+ * generic_handle_demux_domain_irq - Invoke the handler for a hardware interrupt
+ * of a demultiplexing domain.
+ * @domain: The domain where to perform the lookup
+ * @hwirq: The hardware interrupt number to convert to a logical one
+ *
+ * Returns: True on success, or false if lookup has failed
+ */
+bool generic_handle_demux_domain_irq(struct irq_domain *domain, irq_hw_number_t hwirq)
+{
+ struct irq_desc *desc = irq_resolve_mapping(domain, hwirq);
+
+ if (unlikely(!desc))
+ return false;
+
+ if (demux_redirect_remote(desc))
+ return true;
+
+ return !handle_irq_desc(desc);
+}
+EXPORT_SYMBOL_GPL(generic_handle_demux_domain_irq);
+
#endif
/* Dynamic interrupt handling */
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8b1b4c8..acb4c3d 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -35,6 +35,16 @@ static int __init setup_forced_irqthreads(char *arg)
early_param("threadirqs", setup_forced_irqthreads);
#endif
+#ifdef CONFIG_SMP
+static inline void synchronize_irqwork(struct irq_desc *desc)
+{
+ /* Synchronize pending or on the fly redirect work */
+ irq_work_sync(&desc->redirect.work);
+}
+#else
+static inline void synchronize_irqwork(struct irq_desc *desc) { }
+#endif
+
static int __irq_get_irqchip_state(struct irq_data *d, enum irqchip_irq_state which, bool *state);
static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
@@ -107,7 +117,9 @@ EXPORT_SYMBOL(synchronize_hardirq);
static void __synchronize_irq(struct irq_desc *desc)
{
+ synchronize_irqwork(desc);
__synchronize_hardirq(desc, true);
+
/*
* We made sure that no hardirq handler is running. Now verify that no
* threaded handlers are active.
@@ -217,8 +229,7 @@ static inline void irq_validate_effective_affinity(struct irq_data *data) { }
static DEFINE_PER_CPU(struct cpumask, __tmp_mask);
-int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
- bool force)
+int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
{
struct cpumask *tmp_mask = this_cpu_ptr(&__tmp_mask);
struct irq_desc *desc = irq_data_to_desc(data);
* [PATCH v3 2/3] PCI: dwc: Code cleanup
2025-11-28 21:20 [PATCH v3 0/3] Enable MSI affinity support for dwc PCI Radu Rendec
2025-11-28 21:20 ` [PATCH v3 1/3] genirq: Add interrupt redirection infrastructure Radu Rendec
@ 2025-11-28 21:20 ` Radu Rendec
2025-12-15 21:34 ` [tip: irq/msi] " tip-bot2 for Radu Rendec
2025-11-28 21:20 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Radu Rendec
2 siblings, 1 reply; 25+ messages in thread
From: Radu Rendec @ 2025-11-28 21:20 UTC (permalink / raw)
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel
Code cleanup with no functional changes. These changes were originally
made by Thomas Gleixner (see Link tag below) in a patch that was never
submitted as is. Other parts of that patch were eventually submitted as
commit 8e717112caf3 ("PCI: dwc: Switch to msi_create_parent_irq_domain()")
and the remaining parts are the code cleanup changes in this patch.
Summary of changes:
- Use guard()/scoped_guard() instead of open-coded lock/unlock.
- Return void in a few functions whose return value is never used.
- Simplify dw_handle_msi_irq() by using for_each_set_bit().
One notable deviation from the original patch is that I went back to a
simple one-by-one iteration over the controllers inside
dw_handle_msi_irq(). The reason is that with the original changes, the
IRQ offset was calculated incorrectly.
This patch also prepares the ground for the next patch in the series,
which enables MSI affinity support, and was originally part of that same
series that Thomas Gleixner prepared.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec <rrendec@redhat.com>
---
.../pci/controller/dwc/pcie-designware-host.c | 98 ++++++-------------
drivers/pci/controller/dwc/pcie-designware.h | 7 +-
2 files changed, 34 insertions(+), 71 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index e92513c5bda51..aa93acaa579a5 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -46,35 +46,25 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
};
/* MSI int handler */
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
+void dw_handle_msi_irq(struct dw_pcie_rp *pp)
{
- int i, pos;
- unsigned long val;
- u32 status, num_ctrls;
- irqreturn_t ret = IRQ_NONE;
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+ unsigned int i, num_ctrls;
num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
for (i = 0; i < num_ctrls; i++) {
- status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
- (i * MSI_REG_CTRL_BLOCK_SIZE));
+ unsigned int reg_off = i * MSI_REG_CTRL_BLOCK_SIZE;
+ unsigned int irq_off = i * MAX_MSI_IRQS_PER_CTRL;
+ unsigned long status, pos;
+
+ status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS + reg_off);
if (!status)
continue;
- ret = IRQ_HANDLED;
- val = status;
- pos = 0;
- while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
- pos)) != MAX_MSI_IRQS_PER_CTRL) {
- generic_handle_domain_irq(pp->irq_domain,
- (i * MAX_MSI_IRQS_PER_CTRL) +
- pos);
- pos++;
- }
+ for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
+ generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
}
-
- return ret;
}
/* Chained MSI interrupt service routine */
@@ -95,13 +85,10 @@ static void dw_pci_setup_msi_msg(struct irq_data *d, struct msi_msg *msg)
{
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
- u64 msi_target;
-
- msi_target = (u64)pp->msi_data;
+ u64 msi_target = (u64)pp->msi_data;
msg->address_lo = lower_32_bits(msi_target);
msg->address_hi = upper_32_bits(msi_target);
-
msg->data = d->hwirq;
dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n",
@@ -113,18 +100,14 @@ static void dw_pci_bottom_mask(struct irq_data *d)
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
unsigned int res, bit, ctrl;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
+ guard(raw_spinlock)(&pp->lock);
ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
pp->irq_mask[ctrl] |= BIT(bit);
dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
}
static void dw_pci_bottom_unmask(struct irq_data *d)
@@ -132,18 +115,14 @@ static void dw_pci_bottom_unmask(struct irq_data *d)
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
unsigned int res, bit, ctrl;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
+ guard(raw_spinlock)(&pp->lock);
ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
pp->irq_mask[ctrl] &= ~BIT(bit);
dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
}
static void dw_pci_bottom_ack(struct irq_data *d)
@@ -160,54 +139,42 @@ static void dw_pci_bottom_ack(struct irq_data *d)
}
static struct irq_chip dw_pci_msi_bottom_irq_chip = {
- .name = "DWPCI-MSI",
- .irq_ack = dw_pci_bottom_ack,
- .irq_compose_msi_msg = dw_pci_setup_msi_msg,
- .irq_mask = dw_pci_bottom_mask,
- .irq_unmask = dw_pci_bottom_unmask,
+ .name = "DWPCI-MSI",
+ .irq_ack = dw_pci_bottom_ack,
+ .irq_compose_msi_msg = dw_pci_setup_msi_msg,
+ .irq_mask = dw_pci_bottom_mask,
+ .irq_unmask = dw_pci_bottom_unmask,
};
-static int dw_pcie_irq_domain_alloc(struct irq_domain *domain,
- unsigned int virq, unsigned int nr_irqs,
- void *args)
+static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *args)
{
struct dw_pcie_rp *pp = domain->host_data;
- unsigned long flags;
- u32 i;
int bit;
- raw_spin_lock_irqsave(&pp->lock, flags);
-
- bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
- order_base_2(nr_irqs));
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
+ scoped_guard (raw_spinlock_irq, &pp->lock) {
+ bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
+ order_base_2(nr_irqs));
+ }
if (bit < 0)
return -ENOSPC;
- for (i = 0; i < nr_irqs; i++)
- irq_domain_set_info(domain, virq + i, bit + i,
- pp->msi_irq_chip,
- pp, handle_edge_irq,
- NULL, NULL);
-
+ for (unsigned int i = 0; i < nr_irqs; i++) {
+ irq_domain_set_info(domain, virq + i, bit + i, pp->msi_irq_chip,
+ pp, handle_edge_irq, NULL, NULL);
+ }
return 0;
}
-static void dw_pcie_irq_domain_free(struct irq_domain *domain,
- unsigned int virq, unsigned int nr_irqs)
+static void dw_pcie_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs)
{
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
struct dw_pcie_rp *pp = domain->host_data;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
- bitmap_release_region(pp->msi_irq_in_use, d->hwirq,
- order_base_2(nr_irqs));
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
+ guard(raw_spinlock_irq)(&pp->lock);
+ bitmap_release_region(pp->msi_irq_in_use, d->hwirq, order_base_2(nr_irqs));
}
static const struct irq_domain_ops dw_pcie_msi_domain_ops = {
@@ -240,8 +207,7 @@ void dw_pcie_free_msi(struct dw_pcie_rp *pp)
for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
if (pp->msi_irq[ctrl] > 0)
- irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
- NULL, NULL);
+ irq_set_chained_handler_and_data(pp->msi_irq[ctrl], NULL, NULL);
}
irq_domain_remove(pp->irq_domain);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index e995f692a1ecd..ef212a56f60c5 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -802,7 +802,7 @@ static inline enum dw_pcie_ltssm dw_pcie_get_ltssm(struct dw_pcie *pci)
#ifdef CONFIG_PCIE_DW_HOST
int dw_pcie_suspend_noirq(struct dw_pcie *pci);
int dw_pcie_resume_noirq(struct dw_pcie *pci);
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp);
+void dw_handle_msi_irq(struct dw_pcie_rp *pp);
void dw_pcie_msi_init(struct dw_pcie_rp *pp);
int dw_pcie_msi_host_init(struct dw_pcie_rp *pp);
void dw_pcie_free_msi(struct dw_pcie_rp *pp);
@@ -823,10 +823,7 @@ static inline int dw_pcie_resume_noirq(struct dw_pcie *pci)
return 0;
}
-static inline irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
-{
- return IRQ_NONE;
-}
+static inline void dw_handle_msi_irq(struct dw_pcie_rp *pp) { }
static inline void dw_pcie_msi_init(struct dw_pcie_rp *pp)
{ }
--
2.51.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [tip: irq/msi] PCI: dwc: Code cleanup
2025-11-28 21:20 ` [PATCH v3 2/3] PCI: dwc: Code cleanup Radu Rendec
@ 2025-12-15 21:34 ` tip-bot2 for Radu Rendec
0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Radu Rendec @ 2025-12-15 21:34 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Radu Rendec, x86, linux-kernel
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: f1875091a01dd634ff5f8b6fc57ab874f755c415
Gitweb: https://git.kernel.org/tip/f1875091a01dd634ff5f8b6fc57ab874f755c415
Author: Radu Rendec <rrendec@redhat.com>
AuthorDate: Fri, 28 Nov 2025 16:20:54 -05:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
PCI: dwc: Code cleanup
Code cleanup with no functional changes. These changes were originally
made by Thomas Gleixner (see Link tag below) in a patch that was never
submitted as is. Other parts of that patch were eventually submitted as
commit 8e717112caf3 ("PCI: dwc: Switch to msi_create_parent_irq_domain()")
and the remaining parts are the code cleanup changes:
- Use guard()/scoped_guard() instead of open-coded lock/unlock.
- Return void in a few functions whose return value is never used.
- Simplify dw_handle_msi_irq() by using for_each_set_bit().
One notable deviation from the original patch is that this version reverts to
a simple one-by-one iteration over the controllers inside dw_handle_msi_irq(),
because the original changes calculated the IRQ offset incorrectly.
This prepares the ground for enabling MSI affinity support, which was
originally part of that same series that Thomas Gleixner prepared.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Link: https://patch.msgid.link/20251128212055.1409093-3-rrendec@redhat.com
---
drivers/pci/controller/dwc/pcie-designware-host.c | 98 ++++----------
drivers/pci/controller/dwc/pcie-designware.h | 7 +-
2 files changed, 34 insertions(+), 71 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 372207c..25ad1ae 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -46,35 +46,25 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
};
/* MSI int handler */
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
+void dw_handle_msi_irq(struct dw_pcie_rp *pp)
{
- int i, pos;
- unsigned long val;
- u32 status, num_ctrls;
- irqreturn_t ret = IRQ_NONE;
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+ unsigned int i, num_ctrls;
num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
for (i = 0; i < num_ctrls; i++) {
- status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
- (i * MSI_REG_CTRL_BLOCK_SIZE));
+ unsigned int reg_off = i * MSI_REG_CTRL_BLOCK_SIZE;
+ unsigned int irq_off = i * MAX_MSI_IRQS_PER_CTRL;
+ unsigned long status, pos;
+
+ status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS + reg_off);
if (!status)
continue;
- ret = IRQ_HANDLED;
- val = status;
- pos = 0;
- while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
- pos)) != MAX_MSI_IRQS_PER_CTRL) {
- generic_handle_domain_irq(pp->irq_domain,
- (i * MAX_MSI_IRQS_PER_CTRL) +
- pos);
- pos++;
- }
+ for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
+ generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
}
-
- return ret;
}
/* Chained MSI interrupt service routine */
@@ -95,13 +85,10 @@ static void dw_pci_setup_msi_msg(struct irq_data *d, struct msi_msg *msg)
{
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
- u64 msi_target;
-
- msi_target = (u64)pp->msi_data;
+ u64 msi_target = (u64)pp->msi_data;
msg->address_lo = lower_32_bits(msi_target);
msg->address_hi = upper_32_bits(msi_target);
-
msg->data = d->hwirq;
dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n",
@@ -113,18 +100,14 @@ static void dw_pci_bottom_mask(struct irq_data *d)
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
unsigned int res, bit, ctrl;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
+ guard(raw_spinlock)(&pp->lock);
ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
pp->irq_mask[ctrl] |= BIT(bit);
dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
}
static void dw_pci_bottom_unmask(struct irq_data *d)
@@ -132,18 +115,14 @@ static void dw_pci_bottom_unmask(struct irq_data *d)
struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
unsigned int res, bit, ctrl;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
+ guard(raw_spinlock)(&pp->lock);
ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
pp->irq_mask[ctrl] &= ~BIT(bit);
dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, pp->irq_mask[ctrl]);
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
}
static void dw_pci_bottom_ack(struct irq_data *d)
@@ -160,54 +139,42 @@ static void dw_pci_bottom_ack(struct irq_data *d)
}
static struct irq_chip dw_pci_msi_bottom_irq_chip = {
- .name = "DWPCI-MSI",
- .irq_ack = dw_pci_bottom_ack,
- .irq_compose_msi_msg = dw_pci_setup_msi_msg,
- .irq_mask = dw_pci_bottom_mask,
- .irq_unmask = dw_pci_bottom_unmask,
+ .name = "DWPCI-MSI",
+ .irq_ack = dw_pci_bottom_ack,
+ .irq_compose_msi_msg = dw_pci_setup_msi_msg,
+ .irq_mask = dw_pci_bottom_mask,
+ .irq_unmask = dw_pci_bottom_unmask,
};
-static int dw_pcie_irq_domain_alloc(struct irq_domain *domain,
- unsigned int virq, unsigned int nr_irqs,
- void *args)
+static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *args)
{
struct dw_pcie_rp *pp = domain->host_data;
- unsigned long flags;
- u32 i;
int bit;
- raw_spin_lock_irqsave(&pp->lock, flags);
-
- bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
- order_base_2(nr_irqs));
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
+ scoped_guard (raw_spinlock_irq, &pp->lock) {
+ bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
+ order_base_2(nr_irqs));
+ }
if (bit < 0)
return -ENOSPC;
- for (i = 0; i < nr_irqs; i++)
- irq_domain_set_info(domain, virq + i, bit + i,
- pp->msi_irq_chip,
- pp, handle_edge_irq,
- NULL, NULL);
-
+ for (unsigned int i = 0; i < nr_irqs; i++) {
+ irq_domain_set_info(domain, virq + i, bit + i, pp->msi_irq_chip,
+ pp, handle_edge_irq, NULL, NULL);
+ }
return 0;
}
-static void dw_pcie_irq_domain_free(struct irq_domain *domain,
- unsigned int virq, unsigned int nr_irqs)
+static void dw_pcie_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs)
{
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
struct dw_pcie_rp *pp = domain->host_data;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&pp->lock, flags);
- bitmap_release_region(pp->msi_irq_in_use, d->hwirq,
- order_base_2(nr_irqs));
-
- raw_spin_unlock_irqrestore(&pp->lock, flags);
+ guard(raw_spinlock_irq)(&pp->lock);
+ bitmap_release_region(pp->msi_irq_in_use, d->hwirq, order_base_2(nr_irqs));
}
static const struct irq_domain_ops dw_pcie_msi_domain_ops = {
@@ -241,8 +208,7 @@ void dw_pcie_free_msi(struct dw_pcie_rp *pp)
for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
if (pp->msi_irq[ctrl] > 0)
- irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
- NULL, NULL);
+ irq_set_chained_handler_and_data(pp->msi_irq[ctrl], NULL, NULL);
}
irq_domain_remove(pp->irq_domain);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 3168595..403f6cf 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -821,7 +821,7 @@ static inline enum dw_pcie_ltssm dw_pcie_get_ltssm(struct dw_pcie *pci)
#ifdef CONFIG_PCIE_DW_HOST
int dw_pcie_suspend_noirq(struct dw_pcie *pci);
int dw_pcie_resume_noirq(struct dw_pcie *pci);
-irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp);
+void dw_handle_msi_irq(struct dw_pcie_rp *pp);
void dw_pcie_msi_init(struct dw_pcie_rp *pp);
int dw_pcie_msi_host_init(struct dw_pcie_rp *pp);
void dw_pcie_free_msi(struct dw_pcie_rp *pp);
@@ -842,10 +842,7 @@ static inline int dw_pcie_resume_noirq(struct dw_pcie *pci)
return 0;
}
-static inline irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
-{
- return IRQ_NONE;
-}
+static inline void dw_handle_msi_irq(struct dw_pcie_rp *pp) { }
static inline void dw_pcie_msi_init(struct dw_pcie_rp *pp)
{ }
* [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2025-11-28 21:20 [PATCH v3 0/3] Enable MSI affinity support for dwc PCI Radu Rendec
2025-11-28 21:20 ` [PATCH v3 1/3] genirq: Add interrupt redirection infrastructure Radu Rendec
2025-11-28 21:20 ` [PATCH v3 2/3] PCI: dwc: Code cleanup Radu Rendec
@ 2025-11-28 21:20 ` Radu Rendec
2025-12-15 21:34 ` [tip: irq/msi] " tip-bot2 for Radu Rendec
2026-01-20 18:01 ` [PATCH v3 3/3] " Jon Hunter
2 siblings, 2 replies; 25+ messages in thread
From: Radu Rendec @ 2025-11-28 21:20 UTC (permalink / raw)
To: Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel
Leverage the interrupt redirection infrastructure to enable CPU affinity
support for MSI interrupts. Since the parent interrupt affinity cannot
be changed, affinity control for the child interrupt (MSI) is achieved
by redirecting the handler to run in IRQ work context on the target CPU.
This patch was originally prepared by Thomas Gleixner (see Link tag
below) in a patch series that was never submitted as is, and only
parts of that series have made it upstream so far.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Signed-off-by: Radu Rendec <rrendec@redhat.com>
---
.../pci/controller/dwc/pcie-designware-host.c | 33 ++++++++++++++++---
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index aa93acaa579a5..90d9cb45e7842 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -26,9 +26,27 @@ static struct pci_ops dw_pcie_ops;
static struct pci_ops dw_pcie_ecam_ops;
static struct pci_ops dw_child_pcie_ops;
+#ifdef CONFIG_SMP
+static void dw_irq_noop(struct irq_data *d) { }
+#endif
+
+static bool dw_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+ struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+ if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+ return false;
+
+#ifdef CONFIG_SMP
+ info->chip->irq_ack = dw_irq_noop;
+ info->chip->irq_pre_redirect = irq_chip_pre_redirect_parent;
+#else
+ info->chip->irq_ack = irq_chip_ack_parent;
+#endif
+ return true;
+}
+
#define DW_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
MSI_FLAG_USE_DEF_CHIP_OPS | \
- MSI_FLAG_NO_AFFINITY | \
MSI_FLAG_PCI_MSI_MASK_PARENT)
#define DW_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
MSI_FLAG_PCI_MSIX | \
@@ -40,9 +58,8 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
.required_flags = DW_PCIE_MSI_FLAGS_REQUIRED,
.supported_flags = DW_PCIE_MSI_FLAGS_SUPPORTED,
.bus_select_token = DOMAIN_BUS_PCI_MSI,
- .chip_flags = MSI_CHIP_FLAG_SET_ACK,
.prefix = "DW-",
- .init_dev_msi_info = msi_lib_init_dev_msi_info,
+ .init_dev_msi_info = dw_pcie_init_dev_msi_info,
};
/* MSI int handler */
@@ -63,7 +80,7 @@ void dw_handle_msi_irq(struct dw_pcie_rp *pp)
continue;
for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
- generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
+ generic_handle_demux_domain_irq(pp->irq_domain, irq_off + pos);
}
}
@@ -140,10 +157,16 @@ static void dw_pci_bottom_ack(struct irq_data *d)
static struct irq_chip dw_pci_msi_bottom_irq_chip = {
.name = "DWPCI-MSI",
- .irq_ack = dw_pci_bottom_ack,
.irq_compose_msi_msg = dw_pci_setup_msi_msg,
.irq_mask = dw_pci_bottom_mask,
.irq_unmask = dw_pci_bottom_unmask,
+#ifdef CONFIG_SMP
+ .irq_ack = dw_irq_noop,
+ .irq_pre_redirect = dw_pci_bottom_ack,
+ .irq_set_affinity = irq_chip_redirect_set_affinity,
+#else
+ .irq_ack = dw_pci_bottom_ack,
+#endif
};
static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
--
2.51.1
* [tip: irq/msi] PCI: dwc: Enable MSI affinity support
2025-11-28 21:20 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Radu Rendec
@ 2025-12-15 21:34 ` tip-bot2 for Radu Rendec
2026-01-06 9:53 ` Jon Hunter
2026-01-20 18:01 ` [PATCH v3 3/3] " Jon Hunter
1 sibling, 1 reply; 25+ messages in thread
From: tip-bot2 for Radu Rendec @ 2025-12-15 21:34 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Radu Rendec, x86, linux-kernel
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: eaf290c404f7c39f23292e9ce83b8b5b51ab598a
Gitweb: https://git.kernel.org/tip/eaf290c404f7c39f23292e9ce83b8b5b51ab598a
Author: Radu Rendec <rrendec@redhat.com>
AuthorDate: Fri, 28 Nov 2025 16:20:55 -05:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
PCI: dwc: Enable MSI affinity support
Leverage the interrupt redirection infrastructure to enable CPU affinity
support for MSI interrupts. Since the parent interrupt affinity cannot
be changed, affinity control for the child interrupt (MSI) is achieved
by redirecting the handler to run in IRQ work context on the target CPU.
This patch was originally prepared by Thomas Gleixner (see Link tag below)
in a patch series that was never submitted as is, and only parts of that
series have made it upstream so far.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
Link: https://patch.msgid.link/20251128212055.1409093-4-rrendec@redhat.com
---
drivers/pci/controller/dwc/pcie-designware-host.c | 33 +++++++++++---
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 25ad1ae..f116591 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -26,9 +26,27 @@ static struct pci_ops dw_pcie_ops;
static struct pci_ops dw_pcie_ecam_ops;
static struct pci_ops dw_child_pcie_ops;
+#ifdef CONFIG_SMP
+static void dw_irq_noop(struct irq_data *d) { }
+#endif
+
+static bool dw_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+ struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+ if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+ return false;
+
+#ifdef CONFIG_SMP
+ info->chip->irq_ack = dw_irq_noop;
+ info->chip->irq_pre_redirect = irq_chip_pre_redirect_parent;
+#else
+ info->chip->irq_ack = irq_chip_ack_parent;
+#endif
+ return true;
+}
+
#define DW_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
MSI_FLAG_USE_DEF_CHIP_OPS | \
- MSI_FLAG_NO_AFFINITY | \
MSI_FLAG_PCI_MSI_MASK_PARENT)
#define DW_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
MSI_FLAG_PCI_MSIX | \
@@ -40,9 +58,8 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
.required_flags = DW_PCIE_MSI_FLAGS_REQUIRED,
.supported_flags = DW_PCIE_MSI_FLAGS_SUPPORTED,
.bus_select_token = DOMAIN_BUS_PCI_MSI,
- .chip_flags = MSI_CHIP_FLAG_SET_ACK,
.prefix = "DW-",
- .init_dev_msi_info = msi_lib_init_dev_msi_info,
+ .init_dev_msi_info = dw_pcie_init_dev_msi_info,
};
/* MSI int handler */
@@ -63,7 +80,7 @@ void dw_handle_msi_irq(struct dw_pcie_rp *pp)
continue;
for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
- generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
+ generic_handle_demux_domain_irq(pp->irq_domain, irq_off + pos);
}
}
@@ -140,10 +157,16 @@ static void dw_pci_bottom_ack(struct irq_data *d)
static struct irq_chip dw_pci_msi_bottom_irq_chip = {
.name = "DWPCI-MSI",
- .irq_ack = dw_pci_bottom_ack,
.irq_compose_msi_msg = dw_pci_setup_msi_msg,
.irq_mask = dw_pci_bottom_mask,
.irq_unmask = dw_pci_bottom_unmask,
+#ifdef CONFIG_SMP
+ .irq_ack = dw_irq_noop,
+ .irq_pre_redirect = dw_pci_bottom_ack,
+ .irq_set_affinity = irq_chip_redirect_set_affinity,
+#else
+ .irq_ack = dw_pci_bottom_ack,
+#endif
};
static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [tip: irq/msi] PCI: dwc: Enable MSI affinity support
2025-12-15 21:34 ` [tip: irq/msi] " tip-bot2 for Radu Rendec
@ 2026-01-06 9:53 ` Jon Hunter
2026-01-06 15:07 ` Radu Rendec
0 siblings, 1 reply; 25+ messages in thread
From: Jon Hunter @ 2026-01-06 9:53 UTC (permalink / raw)
To: linux-kernel, linux-tip-commits, Radu Rendec
Cc: Thomas Gleixner, x86, linux-tegra@vger.kernel.org
Hi Radu,
On 15/12/2025 21:34, tip-bot2 for Radu Rendec wrote:
> The following commit has been merged into the irq/msi branch of tip:
>
> Commit-ID: eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> Gitweb: https://git.kernel.org/tip/eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> Author: Radu Rendec <rrendec@redhat.com>
> AuthorDate: Fri, 28 Nov 2025 16:20:55 -05:00
> Committer: Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
>
> PCI: dwc: Enable MSI affinity support
>
> Leverage the interrupt redirection infrastructure to enable CPU affinity
> support for MSI interrupts. Since the parent interrupt affinity cannot
> be changed, affinity control for the child interrupt (MSI) is achieved
> by redirecting the handler to run in IRQ work context on the target CPU.
>
> This patch was originally prepared by Thomas Gleixner (see Link tag below)
> in a patch series that was never submitted as is, and only parts of that
> series have made it upstream so far.
>
> Originally-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Radu Rendec <rrendec@redhat.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
> Link: https://patch.msgid.link/20251128212055.1409093-4-rrendec@redhat.com
With next-20260105 I am observing the following warning on the Tegra194
Jetson AGX platform ...
WARNING KERN genirq: irq_chip DW-PCI-MSI-0001:01:00.0 did not update
eff. affinity mask of irq 171
Bisect points to this commit. This platform is using the driver
drivers/pci/controller/dwc/pcie-tegra194.c. Is there some default
affinity that we should be setting to avoid this warning?
Thanks
Jon
--
nvpublic
* Re: [tip: irq/msi] PCI: dwc: Enable MSI affinity support
2026-01-06 9:53 ` Jon Hunter
@ 2026-01-06 15:07 ` Radu Rendec
2026-01-07 1:13 ` Radu Rendec
0 siblings, 1 reply; 25+ messages in thread
From: Radu Rendec @ 2026-01-06 15:07 UTC (permalink / raw)
To: Jon Hunter, linux-kernel, linux-tip-commits
Cc: Thomas Gleixner, x86, linux-tegra@vger.kernel.org
Hi Jon,
On Tue, 2026-01-06 at 09:53 +0000, Jon Hunter wrote:
> On 15/12/2025 21:34, tip-bot2 for Radu Rendec wrote:
> > The following commit has been merged into the irq/msi branch of tip:
> >
> > Commit-ID: eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> > Gitweb: https://git.kernel.org/tip/eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> > Author: Radu Rendec <rrendec@redhat.com>
> > AuthorDate: Fri, 28 Nov 2025 16:20:55 -05:00
> > Committer: Thomas Gleixner <tglx@linutronix.de>
> > CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
> >
> > PCI: dwc: Enable MSI affinity support
> >
> > Leverage the interrupt redirection infrastructure to enable CPU affinity
> > support for MSI interrupts. Since the parent interrupt affinity cannot
> > be changed, affinity control for the child interrupt (MSI) is achieved
> > by redirecting the handler to run in IRQ work context on the target CPU.
> >
> > This patch was originally prepared by Thomas Gleixner (see Link tag below)
> > in a patch series that was never submitted as is, and only parts of that
> > series have made it upstream so far.
> >
> > Originally-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Radu Rendec <rrendec@redhat.com>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
> > Link: https://patch.msgid.link/20251128212055.1409093-4-rrendec@redhat.com
>
>
> With next-20260105 I am observing the following warning on the Tegra194
> Jetson AGX platform ...
>
> WARNING KERN genirq: irq_chip DW-PCI-MSI-0001:01:00.0 did not update
> eff. affinity mask of irq 171
>
> Bisect points to this commit. This platform is using the driver
> drivers/pci/controller/dwc/pcie-tegra194.c. Is there some default
> affinity that we should be setting to avoid this warning?
Before that patch, affinity control wasn't even possible for PCI MSIs
exposed by the dw_pci drivers. Without having looked at the code yet,
I suspect it's just because now that affinity control is enabled,
something tries to use it.
I don't think you should set some default affinity. By default, the PCI
MSIs should be affine to all available CPUs, and that warning shouldn't
happen in the first place. Let me test on Jetson AGX and see what's
going on. I'll update the thread with my findings, hopefully later
today.
--
Thanks,
Radu
* Re: [tip: irq/msi] PCI: dwc: Enable MSI affinity support
2026-01-06 15:07 ` Radu Rendec
@ 2026-01-07 1:13 ` Radu Rendec
0 siblings, 0 replies; 25+ messages in thread
From: Radu Rendec @ 2026-01-07 1:13 UTC (permalink / raw)
To: Jon Hunter, linux-kernel, linux-tip-commits
Cc: Thomas Gleixner, x86, linux-tegra@vger.kernel.org
Hi Jon,
On Tue, 2026-01-06 at 10:07 -0500, Radu Rendec wrote:
> On Tue, 2026-01-06 at 09:53 +0000, Jon Hunter wrote:
> > On 15/12/2025 21:34, tip-bot2 for Radu Rendec wrote:
> > > The following commit has been merged into the irq/msi branch of tip:
> > >
> > > Commit-ID: eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> > > Gitweb: https://git.kernel.org/tip/eaf290c404f7c39f23292e9ce83b8b5b51ab598a
> > > Author: Radu Rendec <rrendec@redhat.com>
> > > AuthorDate: Fri, 28 Nov 2025 16:20:55 -05:00
> > > Committer: Thomas Gleixner <tglx@linutronix.de>
> > > CommitterDate: Mon, 15 Dec 2025 22:30:48 +01:00
> > >
> > > PCI: dwc: Enable MSI affinity support
> > >
> > > Leverage the interrupt redirection infrastructure to enable CPU affinity
> > > support for MSI interrupts. Since the parent interrupt affinity cannot
> > > be changed, affinity control for the child interrupt (MSI) is achieved
> > > by redirecting the handler to run in IRQ work context on the target CPU.
> > >
> > > This patch was originally prepared by Thomas Gleixner (see Link tag below)
> > > in a patch series that was never submitted as is, and only parts of that
> > > series have made it upstream so far.
> > >
> > > Originally-by: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Radu Rendec <rrendec@redhat.com>
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
> > > Link: https://patch.msgid.link/20251128212055.1409093-4-rrendec@redhat.com
> >
> >
> > With next-20260105 I am observing the following warning on the Tegra194
> > Jetson AGX platform ...
> >
> > WARNING KERN genirq: irq_chip DW-PCI-MSI-0001:01:00.0 did not update
> > eff. affinity mask of irq 171
> >
> > Bisect points to this commit. This platform is using the driver
> > drivers/pci/controller/dwc/pcie-tegra194.c. Is there some default
> > affinity that we should be setting to avoid this warning?
>
> Before that patch, affinity control wasn't even possible for PCI MSIs
> exposed by the dw_pci drivers. Without having looked at the code yet,
> I suspect it's just because now that affinity control is enabled,
> something tries to use it.
>
> I don't think you should set some default affinity. By default, the PCI
> MSIs should be affine to all available CPUs, and that warning shouldn't
> happen in the first place. Let me test on Jetson AGX and see what's
> going on. I'll update the thread with my findings, hopefully later
> today.
I looked at the code and tested, and the problem is that the effective
affinity mask is not updated for interrupt redirection. The bug is not
in this patch but in the previous one in the series [1], which adds the
interrupt redirection framework.
The warning is actually triggered when the MSI is set up. This is the
top part of the relevant stack trace:
irq_do_set_affinity+0x28c/0x300 (P)
irq_setup_affinity+0x130/0x208
irq_startup+0x118/0x170
__setup_irq+0x5b0/0x6a0
request_threaded_irq+0xb8/0x180
devm_request_threaded_irq+0x88/0x150
rtw_pci_probe+0x1e8/0x370 [rtw88_pci]
I don't immediately see an easy way to fix it for the generic case
because the affinity of the demultiplexing IRQ (the "parent" IRQ) can
change after the affinity of the demultiplexed IRQ (the "child" IRQ)
has been set up. But since dw_pcie is currently the only user of the
interrupt redirection infrastructure, and it sets up the demultiplexing
IRQ as a chained IRQ, there is no way its affinity can change other
than CPU hot(un)plug. And in this particular case, something as simple
as this will work:
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index d5c3f6ee24cc2..036641f9534ae 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1512,8 +1512,11 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resources_parent);
int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *dest, bool force)
{
struct irq_redirect *redir = &irq_data_to_desc(data)->redirect;
+ unsigned int target_cpu = cpumask_first(dest);
+
+ WRITE_ONCE(redir->target_cpu, target_cpu);
+ irq_data_update_effective_affinity(data, cpumask_of(target_cpu));
- WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
return IRQ_SET_MASK_OK;
}
EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
I will send this as a proper patch tomorrow, and it will fix the
immediate problem and buy some time for a more elaborate fix for the
generic case. Meanwhile, thanks a lot for finding/reporting this!
[1] https://lore.kernel.org/all/20251128212055.1409093-2-rrendec@redhat.com/
--
Best regards,
Radu
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2025-11-28 21:20 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Radu Rendec
2025-12-15 21:34 ` [tip: irq/msi] " tip-bot2 for Radu Rendec
@ 2026-01-20 18:01 ` Jon Hunter
2026-01-20 22:30 ` Radu Rendec
1 sibling, 1 reply; 25+ messages in thread
From: Jon Hunter @ 2026-01-20 18:01 UTC (permalink / raw)
To: Radu Rendec, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Radu,
On 28/11/2025 21:20, Radu Rendec wrote:
> Leverage the interrupt redirection infrastructure to enable CPU affinity
> support for MSI interrupts. Since the parent interrupt affinity cannot
> be changed, affinity control for the child interrupt (MSI) is achieved
> by redirecting the handler to run in IRQ work context on the target CPU.
>
> This patch was originally prepared by Thomas Gleixner (see Link tag
> below) in a patch series that was never submitted as is, and only
> parts of that series have made it upstream so far.
>
> Originally-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
> Signed-off-by: Radu Rendec <rrendec@redhat.com>
> ---
> .../pci/controller/dwc/pcie-designware-host.c | 33 ++++++++++++++++---
> 1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index aa93acaa579a5..90d9cb45e7842 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -26,9 +26,27 @@ static struct pci_ops dw_pcie_ops;
> static struct pci_ops dw_pcie_ecam_ops;
> static struct pci_ops dw_child_pcie_ops;
>
> +#ifdef CONFIG_SMP
> +static void dw_irq_noop(struct irq_data *d) { }
> +#endif
> +
> +static bool dw_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> + struct irq_domain *real_parent, struct msi_domain_info *info)
> +{
> + if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
> + return false;
> +
> +#ifdef CONFIG_SMP
> + info->chip->irq_ack = dw_irq_noop;
> + info->chip->irq_pre_redirect = irq_chip_pre_redirect_parent;
> +#else
> + info->chip->irq_ack = irq_chip_ack_parent;
> +#endif
> + return true;
> +}
> +
> #define DW_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
> MSI_FLAG_USE_DEF_CHIP_OPS | \
> - MSI_FLAG_NO_AFFINITY | \
> MSI_FLAG_PCI_MSI_MASK_PARENT)
> #define DW_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
> MSI_FLAG_PCI_MSIX | \
> @@ -40,9 +58,8 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
> .required_flags = DW_PCIE_MSI_FLAGS_REQUIRED,
> .supported_flags = DW_PCIE_MSI_FLAGS_SUPPORTED,
> .bus_select_token = DOMAIN_BUS_PCI_MSI,
> - .chip_flags = MSI_CHIP_FLAG_SET_ACK,
> .prefix = "DW-",
> - .init_dev_msi_info = msi_lib_init_dev_msi_info,
> + .init_dev_msi_info = dw_pcie_init_dev_msi_info,
> };
>
> /* MSI int handler */
> @@ -63,7 +80,7 @@ void dw_handle_msi_irq(struct dw_pcie_rp *pp)
> continue;
>
> for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
> - generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
> + generic_handle_demux_domain_irq(pp->irq_domain, irq_off + pos);
> }
> }
>
> @@ -140,10 +157,16 @@ static void dw_pci_bottom_ack(struct irq_data *d)
>
> static struct irq_chip dw_pci_msi_bottom_irq_chip = {
> .name = "DWPCI-MSI",
> - .irq_ack = dw_pci_bottom_ack,
> .irq_compose_msi_msg = dw_pci_setup_msi_msg,
> .irq_mask = dw_pci_bottom_mask,
> .irq_unmask = dw_pci_bottom_unmask,
> +#ifdef CONFIG_SMP
> + .irq_ack = dw_irq_noop,
> + .irq_pre_redirect = dw_pci_bottom_ack,
> + .irq_set_affinity = irq_chip_redirect_set_affinity,
> +#else
> + .irq_ack = dw_pci_bottom_ack,
> +#endif
> };
>
> static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
I am seeing another issue with this patch. On the Tegra194 AGX Xavier
platform suspend is failing and reverting this patch fixes the problem.
Unfortunately the logs don't tell me much. In a bad case I see ...
PM: suspend entry (deep)
Filesystems sync: 0.000 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.002 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
tegra-xusb 3610000.usb: Firmware timestamp: 2020-09-11 16:55:03 UTC
dwc-eth-dwmac 2490000.ethernet eth0: Link is Down
tegra194-pcie 14100000.pcie: Link didn't transition to L2 state
Disabling non-boot CPUs ...
It appears to hang here. In a good case I see ...
PM: suspend entry (deep)
Filesystems sync: 0.000 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.002 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
tegra-xusb 3610000.usb: Firmware timestamp: 2020-09-11 16:55:03 UTC
dwc-eth-dwmac 2490000.ethernet eth0: Link is Down
tegra194-pcie 14100000.pcie: Link didn't transition to L2 state
Disabling non-boot CPUs ...
psci: CPU7 killed (polled 0 ms)
psci: CPU6 killed (polled 4 ms)
psci: CPU5 killed (polled 0 ms)
psci: CPU4 killed (polled 4 ms)
psci: CPU3 killed (polled 4 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 0 ms)
...
Enabling non-boot CPUs ... (resume starts)
So it looks like it is hanging when disabling the non-boot CPUs. So far
it only appears to happen on Tegra194.
Let me know if you have any suggestions.
Thanks
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-20 18:01 ` [PATCH v3 3/3] " Jon Hunter
@ 2026-01-20 22:30 ` Radu Rendec
2026-01-21 14:00 ` Jon Hunter
0 siblings, 1 reply; 25+ messages in thread
From: Radu Rendec @ 2026-01-20 22:30 UTC (permalink / raw)
To: Jon Hunter, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Jon,
On Tue, 2026-01-20 at 18:01 +0000, Jon Hunter wrote:
> On 28/11/2025 21:20, Radu Rendec wrote:
> > Leverage the interrupt redirection infrastructure to enable CPU affinity
> > support for MSI interrupts. Since the parent interrupt affinity cannot
> > be changed, affinity control for the child interrupt (MSI) is achieved
> > by redirecting the handler to run in IRQ work context on the target CPU.
> >
> > This patch was originally prepared by Thomas Gleixner (see Link tag
> > below) in a patch series that was never submitted as is, and only
> > parts of that series have made it upstream so far.
> >
> > Originally-by: Thomas Gleixner <tglx@linutronix.de>
> > Link: https://lore.kernel.org/linux-pci/878qpg4o4t.ffs@tglx/
> > Signed-off-by: Radu Rendec <rrendec@redhat.com>
> > ---
> > .../pci/controller/dwc/pcie-designware-host.c | 33 ++++++++++++++++---
> > 1 file changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index aa93acaa579a5..90d9cb45e7842 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -26,9 +26,27 @@ static struct pci_ops dw_pcie_ops;
> > static struct pci_ops dw_pcie_ecam_ops;
> > static struct pci_ops dw_child_pcie_ops;
> >
> > +#ifdef CONFIG_SMP
> > +static void dw_irq_noop(struct irq_data *d) { }
> > +#endif
> > +
> > +static bool dw_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> > + struct irq_domain *real_parent, struct msi_domain_info *info)
> > +{
> > + if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
> > + return false;
> > +
> > +#ifdef CONFIG_SMP
> > + info->chip->irq_ack = dw_irq_noop;
> > + info->chip->irq_pre_redirect = irq_chip_pre_redirect_parent;
> > +#else
> > + info->chip->irq_ack = irq_chip_ack_parent;
> > +#endif
> > + return true;
> > +}
> > +
> > #define DW_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
> > MSI_FLAG_USE_DEF_CHIP_OPS | \
> > - MSI_FLAG_NO_AFFINITY | \
> > MSI_FLAG_PCI_MSI_MASK_PARENT)
> > #define DW_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
> > MSI_FLAG_PCI_MSIX | \
> > @@ -40,9 +58,8 @@ static const struct msi_parent_ops dw_pcie_msi_parent_ops = {
> > .required_flags = DW_PCIE_MSI_FLAGS_REQUIRED,
> > .supported_flags = DW_PCIE_MSI_FLAGS_SUPPORTED,
> > .bus_select_token = DOMAIN_BUS_PCI_MSI,
> > - .chip_flags = MSI_CHIP_FLAG_SET_ACK,
> > .prefix = "DW-",
> > - .init_dev_msi_info = msi_lib_init_dev_msi_info,
> > + .init_dev_msi_info = dw_pcie_init_dev_msi_info,
> > };
> >
> > /* MSI int handler */
> > @@ -63,7 +80,7 @@ void dw_handle_msi_irq(struct dw_pcie_rp *pp)
> > continue;
> >
> > for_each_set_bit(pos, &status, MAX_MSI_IRQS_PER_CTRL)
> > - generic_handle_domain_irq(pp->irq_domain, irq_off + pos);
> > + generic_handle_demux_domain_irq(pp->irq_domain, irq_off + pos);
> > }
> > }
> >
> > @@ -140,10 +157,16 @@ static void dw_pci_bottom_ack(struct irq_data *d)
> >
> > static struct irq_chip dw_pci_msi_bottom_irq_chip = {
> > .name = "DWPCI-MSI",
> > - .irq_ack = dw_pci_bottom_ack,
> > .irq_compose_msi_msg = dw_pci_setup_msi_msg,
> > .irq_mask = dw_pci_bottom_mask,
> > .irq_unmask = dw_pci_bottom_unmask,
> > +#ifdef CONFIG_SMP
> > + .irq_ack = dw_irq_noop,
> > + .irq_pre_redirect = dw_pci_bottom_ack,
> > + .irq_set_affinity = irq_chip_redirect_set_affinity,
> > +#else
> > + .irq_ack = dw_pci_bottom_ack,
> > +#endif
> > };
> >
> > static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>
>
> I am seeing another issue with this patch. On the Tegra194 AGX Xavier
> platform suspend is failing and reverting this patch fixes the problem.
>
> Unfortunately the logs don't tell me much. In a bad case I see ...
>
> PM: suspend entry (deep)
> Filesystems sync: 0.000 seconds
> Freezing user space processes
> Freezing user space processes completed (elapsed 0.002 seconds)
> OOM killer disabled.
> Freezing remaining freezable tasks
> Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> tegra-xusb 3610000.usb: Firmware timestamp: 2020-09-11 16:55:03 UTC
> dwc-eth-dwmac 2490000.ethernet eth0: Link is Down
> tegra194-pcie 14100000.pcie: Link didn't transition to L2 state
> Disabling non-boot CPUs ...
>
> It appears to hang here. In a good case I see ...
>
> PM: suspend entry (deep)
> Filesystems sync: 0.000 seconds
> Freezing user space processes
> Freezing user space processes completed (elapsed 0.002 seconds)
> OOM killer disabled.
> Freezing remaining freezable tasks
> Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> tegra-xusb 3610000.usb: Firmware timestamp: 2020-09-11 16:55:03 UTC
> dwc-eth-dwmac 2490000.ethernet eth0: Link is Down
> tegra194-pcie 14100000.pcie: Link didn't transition to L2 state
> Disabling non-boot CPUs ...
> psci: CPU7 killed (polled 0 ms)
> psci: CPU6 killed (polled 4 ms)
> psci: CPU5 killed (polled 0 ms)
> psci: CPU4 killed (polled 4 ms)
> psci: CPU3 killed (polled 4 ms)
> psci: CPU2 killed (polled 0 ms)
> psci: CPU1 killed (polled 0 ms)
> ...
> Enabling non-boot CPUs ... (resume starts)
>
> So it looks like it is hanging when disabling the non-boot CPUs. So far
> it only appears to happen on Tegra194.
>
> Let me know if you have any suggestions.
Ouch. I'm afraid this is going to be much harder to figure out than the
previous one, especially since I can't easily get access to a board to
test on. I will try to reserve a board and reproduce the bug.
Meanwhile, if you (or someone else in your team) can spare a few cycles,
could you please try to reproduce the bug again with the debug patch
below applied, and a few other changes:
* enable debug messages in kernel/irq/cpuhotplug.c;
* save the contents of /proc/interrupts to a file before suspending;
* add "no_console_suspend" to the kernel command line (although it
looks like you already have it).
It will be much more verbose during suspend but hopefully we can at
least figure out how far along it goes and how it's related to the MSI
affinity configuration.
Thanks,
Radu
---
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 84cc4bea773c0..62ae76661f26d 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1492,6 +1492,8 @@ int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *
{
struct irq_redirect *redir = &irq_data_to_desc(data)->redirect;
+ pr_info("%s: irq %u mask 0x%*pb\n", __func__, data->irq, cpumask_pr_args(dest));
+
WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
irq_data_update_effective_affinity(data, dest);
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index cd5689e383b00..d8c62547f9d06 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -59,6 +59,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
bool brokeaff = false;
int err;
+ pr_info("%s: irq %u cpu %u\n", __func__, d->irq, smp_processor_id());
+
/*
* IRQ chip might be already torn down, but the irq descriptor is
* still in the radix tree. Also if the chip has no affinity setter,
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 3fe6b0c99f3d8..94bd7ad64c9b7 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -227,6 +227,7 @@ static int multi_cpu_stop(void *data)
stop_machine_yield(cpumask);
newstate = READ_ONCE(msdata->state);
if (newstate != curstate) {
+ pr_info("%s: cpu %d entering state %d\n", __func__, cpu, newstate);
curstate = newstate;
switch (curstate) {
case MULTI_STOP_DISABLE_IRQ:
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-20 22:30 ` Radu Rendec
@ 2026-01-21 14:00 ` Jon Hunter
2026-01-22 23:31 ` Radu Rendec
0 siblings, 1 reply; 25+ messages in thread
From: Jon Hunter @ 2026-01-21 14:00 UTC (permalink / raw)
To: Radu Rendec, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
On 20/01/2026 22:30, Radu Rendec wrote:
...
>> So it looks like it is hanging when disabling the non-boot CPUs. So far
>> it only appears to happen on Tegra194.
>>
>> Let me know if you have any suggestions.
>
> Ouch. I'm afraid this is going to be much harder to figure out than the
> previous one, especially since I can't get access easily to a board to
> test on. I will try to reserve a board and reproduce the bug.
>
> Meanwhile, if you (or someone else in your team) can spare a few cycles,
> could you please try to reproduce the bug again with the debug patch
> below applied, and a few other changes:
> * enable debug messages in kernel/irq/cpuhotplug.c;
> * save the contents of /proc/interrupts to a file before suspending;
> * add "no_console_suspend" to the kernel command line (although it
> looks like you already have it).
>
> It will be much more verbose during suspend but hopefully we can at
> least figure out how far along it goes and how it's related to the MSI
> affinity configuration.
Thanks. I have dumped the boot log with the prints here:
https://pastebin.com/G8c2ssdt
And the dump of /proc/interrupts here:
https://pastebin.com/Wqzxw3r6
Looks like the last thing I see entering suspend is ...
irq_chip_redirect_set_affinity: irq 162 mask 0x7f
That appears to be a PCIe interrupt. Let me know if there are more tests
I can run.
Cheers
Jon
--
nvpublic
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-21 14:00 ` Jon Hunter
@ 2026-01-22 23:31 ` Radu Rendec
2026-01-23 13:25 ` Jon Hunter
2026-01-26 7:59 ` Thomas Gleixner
0 siblings, 2 replies; 25+ messages in thread
From: Radu Rendec @ 2026-01-22 23:31 UTC (permalink / raw)
To: Jon Hunter, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Jon,
On Wed, 2026-01-21 at 14:00 +0000, Jon Hunter wrote:
>
> On 20/01/2026 22:30, Radu Rendec wrote:
>
> ...
>
> > > So it looks like it is hanging when disabling the non-boot CPUs. So far
> > > it only appears to happen on Tegra194.
> > >
> > > Let me know if you have any suggestions.
> >
> > Ouch. I'm afraid this is going to be much harder to figure out than the
> > previous one, especially since I can't get access easily to a board to
> > test on. I will try to reserve a board and reproduce the bug.
> >
> > Meanwhile, if you (or someone else in your team) can spare a few cycles,
> > could you please try to reproduce the bug again with the debug patch
> > below applied, and a few other changes:
> > * enable debug messages in kernel/irq/cpuhotplug.c;
> > * save the contents of /proc/interrupts to a file before suspending;
> > * add "no_console_suspend" to the kernel command line (although it
> > looks like you already have it).
> >
> > It will be much more verbose during suspend but hopefully we can at
> > least figure out how far along it goes and how it's related to the MSI
> > affinity configuration.
>
>
> Thanks. I have dumped the boot log with the prints here:
>
> https://pastebin.com/G8c2ssdt
>
> And the dump of /proc/interrupts here:
>
> https://pastebin.com/Wqzxw3r6
>
> Looks like the last thing I see entering suspend is ...
>
> irq_chip_redirect_set_affinity: irq 162 mask 0x7f
>
> That appears to be a PCIe interrupt. Let me know if there are more tests
> I can run.
Thanks very much for running the test and for the logs. The good news
is good ol' printk debugging seems to be working, and the last message
in the log is indeed related to dw-pci irq affinity control, which is
what the patch touches. So we're on to something. The bad news is I
can't yet figure out what's wrong.
The CPUs are taken offline one by one, starting with CPU 7. The code in
question runs on the dying CPU, and with hardware interrupts disabled
on all CPUs. The (simplified) call stack looks like this:
irq_migrate_all_off_this_cpu
for_each_active_irq
migrate_one_irq
irq_do_set_affinity
irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
The debug patch I gave you adds:
* a printk to irq_chip_redirect_set_affinity (which is very small)
* a printk at the beginning of migrate_one_irq
Also, the call to irq_do_set_affinity is almost the last thing that
happens in migrate_one_irq, and that for_each_active_irq loop is quite
small too. So, there isn't much happening between the printk in
irq_chip_redirect_set_affinity for the msi irq (which we do see in the
log) and the printk in migrate_one_irq for the next irq (which we don't
see).
My first thought is to add more printk's between those two and narrow
down the spot where it gets stuck.
I think the fastest way to debug it is if I can test myself. I tried to
reproduce the issue on a Jetson AGX Orin, and I couldn't. By the way,
how often does it hang? e.g., out of say 10 suspend attempts, how many
fail?
I do have access to a Jetson Xavier NX (in theory) but it looks like
there's a lab issue with that board, which hopefully gets sorted out
tomorrow. If I can't get a hold of that board (or can't reproduce the
problem on it), I may ask you to try a few other things. In any case,
I'll update this thread again either tomorrow or (more likely) early
next week.
--
Thanks,
Radu
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-22 23:31 ` Radu Rendec
@ 2026-01-23 13:25 ` Jon Hunter
2026-01-26 7:59 ` Thomas Gleixner
1 sibling, 0 replies; 25+ messages in thread
From: Jon Hunter @ 2026-01-23 13:25 UTC (permalink / raw)
To: Radu Rendec, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
On 22/01/2026 23:31, Radu Rendec wrote:
...
> Thanks very much for running the test and for the logs. The good news
> is good ol' printk debugging seems to be working, and the last message
> in the log is indeed related to dw-pci irq affinity control, which is
> what the patch touches. So we're on to something. The bad news is I
> can't yet figure out what's wrong.
>
> The CPUs are taken offline one by one, starting with CPU 7. The code in
> question runs on the dying CPU, and with hardware interrupts disabled
> on all CPUs. The (simplified) call stack looks like this:
>
> irq_migrate_all_off_this_cpu
> for_each_active_irq
> migrate_one_irq
> irq_do_set_affinity
> irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
>
> The debug patch I gave you adds:
> * a printk to irq_chip_redirect_set_affinity (which is very small)
> * a printk at the beginning of migrate_one_irq
>
> Also, the call to irq_do_set_affinity is almost the last thing that
> happens in migrate_one_irq, and that for_each_active_irq loop is quite
> small too. So, there isn't much happening between the printk in
> irq_chip_redirect_set_affinity for the msi irq (which we do see in the
> log) and the printk in migrate_one_irq for the next irq (which we don't
> see).
>
> My first thought is to add more printk's between those two and narrow
> down the spot where it gets stuck.
>
> I think the fastest way to debug it is if I can test myself. I tried to
> reproduce the issue on a Jetson AGX Orin, and I couldn't. By the way,
> how often does it hang? e.g., out of say 10 suspend attempts, how many
> fail?
For Jetson AGX Xavier it fails on the first suspend attempt.
> I do have access to a Jetson Xavier NX (in theory) but it looks like
> there's a lab issue with that board, which hopefully gets sorted out
> tomorrow. If I can't get a hold of that board (or can't reproduce the
> problem on it), I may ask you to try a few other things. In any case,
> I'll update this thread again either tomorrow or (more likely) early
> next week.
Weirdly, I don't see this with Jetson Xavier NX. However, it could be
worth trying, but you may wish to revert this change [0] because it is
causing other issues for Jetson Xavier NX.
Jon
[0]
https://lore.kernel.org/linux-tegra/e32b0819-2c29-4c83-83d5-e28dc4b2b01f@nvidia.com/
--
nvpublic
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-22 23:31 ` Radu Rendec
2026-01-23 13:25 ` Jon Hunter
@ 2026-01-26 7:59 ` Thomas Gleixner
2026-01-26 22:07 ` Jon Hunter
1 sibling, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2026-01-26 7:59 UTC (permalink / raw)
To: Radu Rendec, Jon Hunter, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
On Thu, Jan 22 2026 at 18:31, Radu Rendec wrote:
> The CPUs are taken offline one by one, starting with CPU 7. The code in
> question runs on the dying CPU, and with hardware interrupts disabled
> on all CPUs. The (simplified) call stack looks like this:
>
> irq_migrate_all_off_this_cpu
> for_each_active_irq
> migrate_one_irq
> irq_do_set_affinity
> irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
>
> The debug patch I gave you adds:
> * a printk to irq_chip_redirect_set_affinity (which is very small)
> * a printk at the beginning of migrate_one_irq
>
> Also, the call to irq_do_set_affinity is almost the last thing that
> happens in migrate_one_irq, and that for_each_active_irq loop is quite
> small too. So, there isn't much happening between the printk in
> irq_chip_redirect_set_affinity for the msi irq (which we do see in the
> log) and the printk in migrate_one_irq for the next irq (which we don't
> see).
This doesn't make any sense at all. irq_chip_redirect_set_affinity() is
only accessing interrupt descriptor associated memory and the new
redirection CPU is the same as the previous one as the mask changes from
0xff to 0x7f and therefore cpumask_first() yields 0 in both cases.
According to the provided dmesg, this happens on linux-next.
Jon, can you please validate that this happens as well on
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
Thanks
tglx
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-26 7:59 ` Thomas Gleixner
@ 2026-01-26 22:07 ` Jon Hunter
2026-01-26 22:26 ` Radu Rendec
0 siblings, 1 reply; 25+ messages in thread
From: Jon Hunter @ 2026-01-26 22:07 UTC (permalink / raw)
To: Thomas Gleixner, Radu Rendec, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Thomas,
On 26/01/2026 07:59, Thomas Gleixner wrote:
> On Thu, Jan 22 2026 at 18:31, Radu Rendec wrote:
>> The CPUs are taken offline one by one, starting with CPU 7. The code in
>> question runs on the dying CPU, and with hardware interrupts disabled
>> on all CPUs. The (simplified) call stack looks like this:
>>
>> irq_migrate_all_off_this_cpu
>> for_each_active_irq
>> migrate_one_irq
>> irq_do_set_affinity
>> irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
>>
>> The debug patch I gave you adds:
>> * a printk to irq_chip_redirect_set_affinity (which is very small)
>> * a printk at the beginning of migrate_one_irq
>>
>> Also, the call to irq_do_set_affinity is almost the last thing that
>> happens in migrate_one_irq, and that for_each_active_irq loop is quite
>> small too. So, there isn't much happening between the printk in
>> irq_chip_redirect_set_affinity for the msi irq (which we do see in the
>> log) and the printk in migrate_one_irq for the next irq (which we don't
>> see).
>
> This doesn't make any sense at all. irq_chip_redirect_set_affinity() is
> only accessing interrupt descriptor associated memory and the new
> redirection CPU is the same as the previous one as the mask changes from
> 0xff to 0x7f and therefore cpumask_first() yields 0 in both cases.
>
> According to the provided dmesg, this happens on linux-next.
>
> Jon, can you please validate that this happens as well on
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
I tried this branch and I see suspend failing with that branch too. If I
revert this change on top of your branch or -next, I don't see any
problems.
Thanks
Jon
--
nvpublic
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-26 22:07 ` Jon Hunter
@ 2026-01-26 22:26 ` Radu Rendec
2026-01-27 10:30 ` Thomas Gleixner
0 siblings, 1 reply; 25+ messages in thread
From: Radu Rendec @ 2026-01-26 22:26 UTC (permalink / raw)
To: Jon Hunter, Thomas Gleixner, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Jon,
On Mon, 2026-01-26 at 22:07 +0000, Jon Hunter wrote:
> On 26/01/2026 07:59, Thomas Gleixner wrote:
> > On Thu, Jan 22 2026 at 18:31, Radu Rendec wrote:
> > > The CPUs are taken offline one by one, starting with CPU 7. The code in
> > > question runs on the dying CPU, and with hardware interrupts disabled
> > > on all CPUs. The (simplified) call stack looks like this:
> > >
> > > irq_migrate_all_off_this_cpu
> > > for_each_active_irq
> > > migrate_one_irq
> > > irq_do_set_affinity
> > > irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
> > >
> > > The debug patch I gave you adds:
> > > * a printk to irq_chip_redirect_set_affinity (which is very small)
> > > * a printk at the beginning of migrate_one_irq
> > >
> > > Also, the call to irq_do_set_affinity is almost the last thing that
> > > happens in migrate_one_irq, and that for_each_active_irq loop is quite
> > > small too. So, there isn't much happening between the printk in
> > > irq_chip_redirect_set_affinity for the msi irq (which we do see in the
> > > log) and the printk in migrate_one_irq for the next irq (which we don't
> > > see).
> >
> > This doesn't make any sense at all. irq_chip_redirect_set_affinity() is
> > only accessing interrupt descriptor associated memory and the new
> > redirection CPU is the same as the previous one as the mask changes from
> > 0xff to 0x7f and therefore cpumask_first() yields 0 in both cases.
> >
> > According to the provided dmesg, this happens on linux-next.
> >
> > Jon, can you please validate that this happens as well on
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
>
>
> I tried this branch and I see suspend failing with that branch too. If I
> revert this change on top of your branch or -next, I don't see any
> problems.
The closest hardware I have access to is Jetson Xavier NX, and you
already mentioned you couldn't reproduce the issue there (and it looks
like I can't even get a hold of that board anyway). So I'm going to ask
you to test a few more things for me.
Can you please apply the patch below on top of the previous one I sent?
The suspect is the spinlock in irq_migrate_all_off_this_cpu(),
although I can't think of any reason why it shouldn't be free. But I
don't have any better idea, and I would like to narrow down the spot
where hotplug gets stuck.
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index d8c62547f9d06..69c44da68e3a9 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -178,9 +178,11 @@ void irq_migrate_all_off_this_cpu(void)
for_each_active_irq(irq) {
bool affinity_broken;
+ pr_info("%s: irq %u\n", __func__, irq);
desc = irq_to_desc(irq);
scoped_guard(raw_spinlock, &desc->lock) {
affinity_broken = migrate_one_irq(desc);
+ pr_info("%s: migrate_one_irq -> %u\n", __func__, affinity_broken);
if (affinity_broken && desc->affinity_notify)
irq_affinity_schedule_notify_work(desc);
}
--
Thanks,
Radu
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-26 22:26 ` Radu Rendec
@ 2026-01-27 10:30 ` Thomas Gleixner
2026-01-27 13:34 ` Thomas Gleixner
0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2026-01-27 10:30 UTC (permalink / raw)
To: Radu Rendec, Jon Hunter, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
On Mon, Jan 26 2026 at 17:26, Radu Rendec wrote:
> On Mon, 2026-01-26 at 22:07 +0000, Jon Hunter wrote:
>> > Jon, can you please validate that this happens as well on
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
>>
>>
>> I tried this branch and I see suspend failing with that branch too. If I
>> revert this change on top of your branch or -next, I don't see any
>> problems.
>
> The closest hardware I have access to is Jetson Xavier NX, and you
> already mentioned you couldn't reproduce the issue there (and it looks
> like I can't even get a hold of that board anyway). So I'm going to ask
> you to test a few more things for me.
>
> Can you please apply the patch below on top of the previous one I sent?
> The suspect is the spinlock lock in irq_migrate_all_off_this_cpu(),
> although I can't think of any reason why it shouldn't be free. But I
> don't have any better idea, and I would like to narrow down the spot
> where hotplug gets stuck.
Can we please take a step back and think about what is actually
different when this change is in effect instead of hallucinating about
completely unrelated spinlocks?
Without this change the interrupt is ignored in the hotplug migration
because it has MSI_FLAG_NO_AFFINITY set.
Now with this new magic in place the following happens:
migrate_one_irq()
...
irq_do_set_affinity()
chip->irq_set_affinity() // --> msi_domain_set_affinity()
parent->chip->irq_set_affinity() // --> irq_chip_redirect_set_affinity()
update target_cpu/effective mask; // Benign
...
irq_chip_write_msi_msg() // --> pci_msi_domain_write_msg()
I'm pretty sure that this write screws things up because the
devices/busses are already frozen. It simply hangs there.
Usually this is prevented by this check in pci_msi_domain_write_msg():
if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev))
do_nothing();
else ...
As the boot log contains this:
[ 44.101151] tegra194-pcie 14100000.pcie: Link didn't transition to L2 state
[ 44.110764] Disabling non-boot CPUs ...
... I suspect that there is some weirdness going on with this PCIe
controller which subsequently screws up the check.
The below untested hack should confirm that theory.
Thanks,
tglx
---
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -672,7 +672,11 @@ int msi_domain_set_affinity(struct irq_d
if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
BUG_ON(irq_chip_compose_msi_msg(irq_data, msg));
msi_check_level(irq_data->domain, msg);
- irq_chip_write_msi_msg(irq_data, msg);
+ // Hack alert
+ struct irq_desc *desc = irq_data_to_desc(irq_data);
+
+ if (!(desc->istate & IRQS_SUSPENDED))
+ irq_chip_write_msi_msg(irq_data, msg);
}
return ret;
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-27 10:30 ` Thomas Gleixner
@ 2026-01-27 13:34 ` Thomas Gleixner
2026-01-27 17:09 ` Jon Hunter
0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2026-01-27 13:34 UTC (permalink / raw)
To: Radu Rendec, Jon Hunter, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
On Tue, Jan 27 2026 at 11:30, Thomas Gleixner wrote:
> The below untested hack should confirm that theory.
Actually, looking at it more deeply, the solution is trivial: in this
case writing the MSI message to the device is not required when the
affinity changes, because the message does not change. It is set once
via msi_domain_activate() and stays the same for the lifetime of the
interrupt.
So the below prevents the invocation of irq_chip_write_msi_msg() in
msi_domain_set_affinity(), but I would recommend to investigate the
actual underlying problem nevertheless:
It is going to roar its ugly head at some other place sooner than later
as there are tons of other places which guard against
pci_dev::current_state != PCI_D0.
Thanks,
tglx
---
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1495,7 +1495,7 @@ int irq_chip_redirect_set_affinity(struc
WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
irq_data_update_effective_affinity(data, dest);
- return IRQ_SET_MASK_OK;
+ return IRQ_SET_MASK_OK_DONE;
}
EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
#endif
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-27 13:34 ` Thomas Gleixner
@ 2026-01-27 17:09 ` Jon Hunter
2026-01-27 21:30 ` [PATCH] genirq/redirect: Prevent writing MSI message on affinity change Thomas Gleixner
2026-03-26 3:48 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Tsai Sung-Fu
0 siblings, 2 replies; 25+ messages in thread
From: Jon Hunter @ 2026-01-27 17:09 UTC (permalink / raw)
To: Thomas Gleixner, Radu Rendec, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi Thomas,
On 27/01/2026 13:34, Thomas Gleixner wrote:
> On Tue, Jan 27 2026 at 11:30, Thomas Gleixner wrote:
>> The below untested hack should confirm that theory.
>
> Actually looking at it deeper the solution is trivial because in this
> case writing the MSI message to the device is not required when the
> affinity changes because the message does not change. It is set once via
> msi_domain_activate() and stays the same for the life time of the
> interrupt.
>
> So the below prevents the invocation of irq_chip_write_msi_msg() in
> msi_domain_set_affinity(), but I would recommend to investigate the
> actual underlying problem nevertheless:
>
> It is going to roar its ugly head at some other place sooner than later
> as there are tons of other places which guard against
> pci_dev::current_state != PCI_D0.
>
> Thanks,
>
> tglx
> ---
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1495,7 +1495,7 @@ int irq_chip_redirect_set_affinity(struc
> WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
> irq_data_update_effective_affinity(data, dest);
>
> - return IRQ_SET_MASK_OK;
> + return IRQ_SET_MASK_OK_DONE;
> }
> EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
> #endif
>
Yes that does fix it!
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Thanks!
Jon
--
nvpublic
* [PATCH] genirq/redirect: Prevent writing MSI message on affinity change
2026-01-27 17:09 ` Jon Hunter
@ 2026-01-27 21:30 ` Thomas Gleixner
2026-01-29 22:51 ` [tip: irq/msi] " tip-bot2 for Thomas Gleixner
2026-03-26 3:48 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Tsai Sung-Fu
1 sibling, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2026-01-27 21:30 UTC (permalink / raw)
To: Jon Hunter, Radu Rendec, Manivannan Sadhasivam
Cc: Daniel Tsai, Marek Behún, Krishna Chaitanya Chundru,
Bjorn Helgaas, Rob Herring, Krzysztof Wilczyński,
Lorenzo Pieralisi, Jingoo Han, Brian Masney, Eric Chanudet,
Alessandro Carminati, Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
The interrupts which are handled by the redirection infrastructure provide
an irq_set_affinity() callback, which solely determines the target CPU for
redirection via irq_work and updates the effective affinity mask.
Contrary to regular MSI interrupts this affinity setting does not change
the underlying interrupt message as the message is only created at setup
time to deliver to the demultiplexing interrupt.
Therefore the message write in msi_domain_set_affinity() is a pointless
exercise. In principle the write is harmless, but a Tegra system exposes a
full system hang during suspend due to that write.
It's unclear why the check for the PCI device state PCI_D0 in
pci_msi_domain_write_msg(), which prevents the actual hardware access if
a device is in a powered-down state, fails on this particular system, but
that's a different problem which needs to be investigated by the Tegra
experts.
The irq_set_affinity() callback can advise msi_domain_set_affinity() not to
write the MSI message by returning IRQ_SET_MASK_OK_DONE instead of
IRQ_SET_MASK_OK. Do exactly that.
Just to make it clear again:
This is not a correctness issue of the redirection code as returning
IRQ_SET_MASK_OK in that context is completely correct. From the core
code point of view this is solely an optimization to avoid a redundant
hardware write.
As a byproduct it papers over the underlying problem on the Tegra platform,
which fails to put the PCIe device[s] out of PCI_D0 despite the fact that
the devices and busses have been shut down. The redirect infrastructure
just unearthed the underlying issue, which is likely to surface in quite a
few other code paths which use the PCI_D0 check to prevent hardware access
to powered-down devices.
This therefore has neither a 'Fixes:' nor a 'Closes:' tag associated as the
underlying problem, which is outside the scope of the interrupt code, is
still unresolved.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/all/4e5b349c-6599-4871-9e3b-e10352ae0ca0@nvidia.com
---
kernel/irq/chip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1495,7 +1495,7 @@ int irq_chip_redirect_set_affinity(struc
WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
irq_data_update_effective_affinity(data, dest);
- return IRQ_SET_MASK_OK;
+ return IRQ_SET_MASK_OK_DONE;
}
EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
#endif
* [tip: irq/msi] genirq/redirect: Prevent writing MSI message on affinity change
2026-01-27 21:30 ` [PATCH] genirq/redirect: Prevent writing MSI message on affinity change Thomas Gleixner
@ 2026-01-29 22:51 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2026-01-29 22:51 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Jon Hunter, Thomas Gleixner, x86, linux-kernel
The following commit has been merged into the irq/msi branch of tip:
Commit-ID: 37f9d5026cd78fbe80a124edbbadab382b26545f
Gitweb: https://git.kernel.org/tip/37f9d5026cd78fbe80a124edbbadab382b26545f
Author: Thomas Gleixner <tglx@kernel.org>
AuthorDate: Tue, 27 Jan 2026 22:30:16 +01:00
Committer: Thomas Gleixner <tglx@kernel.org>
CommitterDate: Thu, 29 Jan 2026 23:49:55 +01:00
genirq/redirect: Prevent writing MSI message on affinity change
The interrupts which are handled by the redirection infrastructure provide
an irq_set_affinity() callback, which solely determines the target CPU for
redirection via irq_work and updates the effective affinity mask.
Contrary to regular MSI interrupts this affinity setting does not change
the underlying interrupt message as the message is only created at setup
time to deliver to the demultiplexing interrupt.
Therefore the message write in msi_domain_set_affinity() is a pointless
exercise. In principle the write is harmless, but a Tegra system exposes a
full system hang during suspend due to that write.
It's unclear why the check for the PCI device state PCI_D0 in
pci_msi_domain_write_msg(), which prevents the actual hardware access if
a device is in a powered-down state, fails on this particular system, but
that's a different problem which needs to be investigated by the Tegra
experts.
The irq_set_affinity() callback can advise msi_domain_set_affinity() not to
write the MSI message by returning IRQ_SET_MASK_OK_DONE instead of
IRQ_SET_MASK_OK. Do exactly that.
Just to make it clear again:
This is not a correctness issue of the redirection code as returning
IRQ_SET_MASK_OK in that context is completely correct. From the core
code point of view this is solely an optimization to avoid a redundant
hardware write.
As a byproduct it papers over the underlying problem on the Tegra platform,
which fails to put the PCIe device[s] out of PCI_D0 despite the fact that
the devices and busses have been shut down. The redirect infrastructure
just unearthed the underlying issue, which is likely to surface in quite a
few other code paths which use the PCI_D0 check to prevent hardware access
to powered-down devices.
This therefore has neither a 'Fixes:' nor a 'Closes:' tag associated as the
underlying problem, which is outside the scope of the interrupt code, is
still unresolved.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/all/4e5b349c-6599-4871-9e3b-e10352ae0ca0@nvidia.com
Link: https://patch.msgid.link/87tsw6aglz.ffs@tglx
---
kernel/irq/chip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 35bc17b..ccdc47a 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1495,7 +1495,7 @@ int irq_chip_redirect_set_affinity(struct irq_data *data, const struct cpumask *
WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
irq_data_update_effective_affinity(data, dest);
- return IRQ_SET_MASK_OK;
+ return IRQ_SET_MASK_OK_DONE;
}
EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
#endif
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-01-27 17:09 ` Jon Hunter
2026-01-27 21:30 ` [PATCH] genirq/redirect: Prevent writing MSI message on affinity change Thomas Gleixner
@ 2026-03-26 3:48 ` Tsai Sung-Fu
2026-03-26 12:52 ` Thomas Gleixner
1 sibling, 1 reply; 25+ messages in thread
From: Tsai Sung-Fu @ 2026-03-26 3:48 UTC (permalink / raw)
To: Jon Hunter
Cc: Thomas Gleixner, Radu Rendec, Manivannan Sadhasivam,
Marek Behún, Krishna Chaitanya Chundru, Bjorn Helgaas,
Rob Herring, Krzysztof Wilczyński, Lorenzo Pieralisi,
Jingoo Han, Brian Masney, Eric Chanudet, Alessandro Carminati,
Jared Kangas, linux-pci, linux-kernel,
linux-tegra@vger.kernel.org
Hi,
Do we have a plan to land this feature upstream?
Thanks
On Wed, Jan 28, 2026 at 1:09 AM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Thomas,
>
> On 27/01/2026 13:34, Thomas Gleixner wrote:
> > On Tue, Jan 27 2026 at 11:30, Thomas Gleixner wrote:
> >> The below untested hack should confirm that theory.
> >
> > Actually looking at it deeper the solution is trivial because in this
> > case writing the MSI message to the device is not required when the
> > affinity changes because the message does not change. It is set once via
> > msi_domain_activate() and stays the same for the life time of the
> > interrupt.
> >
> > So the below prevents the invocation of irq_chip_write_msi_msg() in
> > msi_domain_set_affinity(), but I would recommend to investigate the
> > actual underlying problem nevertheless:
> >
> > It is going to roar its ugly head at some other place sooner than later
> > as there are tons of other places which guard against
> > pci_dev::current_state != PCI_D0.
> >
> > Thanks,
> >
> > tglx
> > ---
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -1495,7 +1495,7 @@ int irq_chip_redirect_set_affinity(struc
> > WRITE_ONCE(redir->target_cpu, cpumask_first(dest));
> > irq_data_update_effective_affinity(data, dest);
> >
> > - return IRQ_SET_MASK_OK;
> > + return IRQ_SET_MASK_OK_DONE;
> > }
> > EXPORT_SYMBOL_GPL(irq_chip_redirect_set_affinity);
> > #endif
> >
>
> Yes that does fix it!
>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
>
> Thanks!
> Jon
>
> --
> nvpublic
>
* Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
2026-03-26 3:48 ` [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support Tsai Sung-Fu
@ 2026-03-26 12:52 ` Thomas Gleixner
0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2026-03-26 12:52 UTC (permalink / raw)
To: Tsai Sung-Fu, Jon Hunter
Cc: Radu Rendec, Manivannan Sadhasivam, Marek Behún,
Krishna Chaitanya Chundru, Bjorn Helgaas, Rob Herring,
Krzysztof Wilczyński, Lorenzo Pieralisi, Jingoo Han,
Brian Masney, Eric Chanudet, Alessandro Carminati, Jared Kangas,
linux-pci, linux-kernel, linux-tegra@vger.kernel.org
On Thu, Mar 26 2026 at 11:48, Tsai Sung-Fu wrote:
Please do not top post and trim your replies as documented.
> Do we have plan to land this feature upstream ?
# cd linux
# git log v6.19.. drivers/pci/controller/dwc/pcie-designware-host.c
Thanks,
tglx