* [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
@ 2025-05-17 10:30 Marc Zyngier
2025-05-17 19:59 ` Thomas Gleixner
0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2025-05-17 10:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel; +Cc: Thomas Gleixner
For systems that implement interrupt masking at the interrupt
controller level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT.
It indicates that it isn't enough to only unmask the interrupt at the PCI
device level, but that the interrupt controller must also be involved.

However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both sides, always. It would be far cheaper
to unmask both at the start of times, and then only deal with the
interrupt controller mask, which is likely to be cheaper than a round-trip
to the endpoint.

Implement this by patching up the irq_chip structure associated with
the MSIs to perform the full unmask on .irq_enable(), and the full mask
on .irq_shutdown(). This asymmetry allows the preservation of the
"lazy disable" feature, which relies on the top-level irq_chip not
implementing the .irq_disable() callback. Yes, this is a terrible hack.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/irqchip/irq-msi-lib.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/irqchip/irq-msi-lib.c b/drivers/irqchip/irq-msi-lib.c
index 246c30205af40..8c62034ab8d92 100644
--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -112,6 +112,21 @@ bool msi_lib_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 	 */
 	if (!chip->irq_set_affinity && !(info->flags & MSI_FLAG_NO_AFFINITY))
 		chip->irq_set_affinity = msi_domain_set_affinity;
+
+	/*
+	 * If the parent domain insists on being in charge of masking, obey
+	 * blindly. The default mask/unmask become the shutdown/enable
+	 * callbacks, ensuring that we correctly start/stop the interrupt.
+	 * We make a point in not using the irq_disable() in order to
+	 * preserve the "lazy disable" behaviour.
+	 */
+	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
+		chip->irq_shutdown = chip->irq_mask;
+		chip->irq_enable = chip->irq_unmask;
+		chip->irq_mask = irq_chip_mask_parent;
+		chip->irq_unmask = irq_chip_unmask_parent;
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(msi_lib_init_dev_msi_info);
--
2.39.2
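
The callback shuffle in the patch above can be pictured with a small userspace model. This is only a sketch and not kernel code: `struct model_chip`, the `model_*` helpers, `MODEL_FLAG_MASK_PARENT` and the two counters are hypothetical stand-ins. It assumes that, for a parent that sets the flag, the original PCI/MSI mask/unmask callbacks touch both the endpoint and the interrupt controller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct irq_chip. */
struct model_chip {
	void (*irq_mask)(void);
	void (*irq_unmask)(void);
	void (*irq_enable)(void);
	void (*irq_shutdown)(void);
};

static int endpoint_writes;	/* accesses that go out to the PCI device */
static int parent_writes;	/* accesses to the interrupt controller */

/* Assumed pre-patch behaviour: mask/unmask hit both levels. */
static void model_pci_mask(void)      { endpoint_writes++; parent_writes++; }
static void model_pci_unmask(void)    { parent_writes++; endpoint_writes++; }
static void model_parent_mask(void)   { parent_writes++; }
static void model_parent_unmask(void) { parent_writes++; }

#define MODEL_FLAG_MASK_PARENT 0x1u

/* Same pointer shuffle as the hunk above, applied to the model chip. */
static void model_init_chip(struct model_chip *chip, unsigned int flags)
{
	if (flags & MODEL_FLAG_MASK_PARENT) {
		chip->irq_shutdown = chip->irq_mask;
		chip->irq_enable   = chip->irq_unmask;
		chip->irq_mask     = model_parent_mask;
		chip->irq_unmask   = model_parent_unmask;
	}
}

/* Run n mask/unmask cycles; return how many accesses reached the endpoint. */
static int model_cycles(struct model_chip *chip, int n)
{
	assert(chip->irq_mask && chip->irq_unmask);
	endpoint_writes = parent_writes = 0;
	for (int i = 0; i < n; i++) {
		chip->irq_mask();
		chip->irq_unmask();
	}
	return endpoint_writes;
}
```

In this model the full mask/unmask survives only behind `irq_shutdown`/`irq_enable`, so repeated mask/unmask cycles stop reaching the endpoint once the flag has been honoured. Note that the shuffle silently overwrites any `irq_enable`/`irq_shutdown` the chip already had.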
* Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
2025-05-17 10:30 [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag Marc Zyngier
@ 2025-05-17 19:59 ` Thomas Gleixner
2025-05-23 9:06 ` Marc Zyngier
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2025-05-17 19:59 UTC (permalink / raw)
To: Marc Zyngier, linux-kernel, linux-arm-kernel
On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
> +	/*
> +	 * If the parent domain insists on being in charge of masking, obey
> +	 * blindly. The default mask/unmask become the shutdown/enable
> +	 * callbacks, ensuring that we correctly start/stop the interrupt.
> +	 * We make a point in not using the irq_disable() in order to
> +	 * preserve the "lazy disable" behaviour.
> +	 */
> +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
> +		chip->irq_shutdown = chip->irq_mask;
> +		chip->irq_enable = chip->irq_unmask;

This is only correct when the chip does not have dedicated
irq_shutdown/enable callbacks. And I really hate the asymmetry of this.

> +		chip->irq_mask = irq_chip_mask_parent;
> +		chip->irq_unmask = irq_chip_unmask_parent;
> +	}

I'm still trying to understand what the actual problem is that you are
trying to solve.

MSIs are edge type interrupts, so the interrupt handling hotpath usually
does not mask at all. The only time masking happens is when it's lazy
disabled or during affinity changes, which is not the end of the world.

Thanks,
tglx
* Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
2025-05-17 19:59 ` Thomas Gleixner
@ 2025-05-23 9:06 ` Marc Zyngier
2025-06-30 8:59 ` Thomas Gleixner
0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2025-05-23 9:06 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel, linux-arm-kernel
On Sat, 17 May 2025 20:59:10 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
> > +	/*
> > +	 * If the parent domain insists on being in charge of masking, obey
> > +	 * blindly. The default mask/unmask become the shutdown/enable
> > +	 * callbacks, ensuring that we correctly start/stop the interrupt.
> > +	 * We make a point in not using the irq_disable() in order to
> > +	 * preserve the "lazy disable" behaviour.
> > +	 */
> > +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
> > +		chip->irq_shutdown = chip->irq_mask;
> > +		chip->irq_enable = chip->irq_unmask;
>
> This is only correct when the chip does not have dedicated
> irq_shutdown/enable callbacks.

The chip structure provided by the PCI MSI code doesn't provide such
callbacks, meaning that they are unused for the whole hierarchy.

> And I really hate the asymmetry of this.

So do I, but that's how the lazy disable thing currently works. Drop
the bizarre asymmetry on irq_disable, and we can make this nicely
symmetric as well.

>
> > +		chip->irq_mask = irq_chip_mask_parent;
> > +		chip->irq_unmask = irq_chip_unmask_parent;
> > +	}
>
> I'm still trying to understand what the actual problem is that you are
> trying to solve.

I'm trying to remove some overhead from machines that don't need to
suffer from this nonsense double masking. Especially in VMs, where
masking/unmasking requires *two* extremely costly exits (write +
synchronising read-back). This change reduces the overhead
significantly by only masking where it actually matters.

> MSIs are edge type interrupts, so the interrupt handling hotpath usually
> does not mask at all. The only time masking happens is when it's lazy
> disabled or during affinity changes, which is not the end of the world.

And that's part of the problem. The lazy disable ends up being way
more costly than it should when the interrupt fires during the
"disabled but not quite" phase, and in turn makes the critical section
delineated by disable_irq()/enable_irq() more expensive.

So while, as you put it, it's "not the end of the world", this seems
to me like a valuable optimisation.

Another possible improvement would be to teach the PCI code it can
still rely on masking even when the endpoint is not capable of masking
individual MSIs.

Thanks,
M.
--
Without deviation from the norm, progress is not possible.
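
The cost argument in the message above can be sketched with a toy model of lazy disable. Everything here is hypothetical and lives in userspace: `struct lazy_irq`, the `lazy_*` functions and the counters are illustration names, not kernel interfaces. The model assumes, following the description above, that each mask or unmask at the endpoint costs a write plus a read-back, and that the mask is applied only when an interrupt actually arrives while the line is software-disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one MSI; none of these names exist in the kernel. */
struct lazy_irq {
	bool disabled;	/* software state set by the disable call */
	bool masked;	/* hardware mask state */
	bool pending;	/* an interrupt arrived while disabled */
	void (*mask)(void);
	void (*unmask)(void);
};

static int endpoint_accesses;	/* trapped accesses to the PCI endpoint */
static int parent_accesses;	/* accesses to the interrupt controller */

/* Endpoint [un]masking modelled as write + read-back (assumption). */
static void mask_both(void)          { endpoint_accesses += 2; parent_accesses++; }
static void unmask_both(void)        { parent_accesses++; endpoint_accesses += 2; }
static void mask_parent_only(void)   { parent_accesses++; }
static void unmask_parent_only(void) { parent_accesses++; }

/* Lazy disable: only record the state, do not touch the hardware. */
static void lazy_disable(struct lazy_irq *irq)
{
	irq->disabled = true;
}

/* Interrupt arrival: returns true if the handler may run right away. */
static bool lazy_arrive(struct lazy_irq *irq)
{
	assert(irq->mask);
	if (!irq->disabled)
		return true;
	irq->pending = true;
	if (!irq->masked) {
		irq->mask();	/* the cost of masking is paid only now */
		irq->masked = true;
	}
	return false;
}

/* Enable: undo the mask if one was applied; returns true if a replay is due. */
static bool lazy_enable(struct lazy_irq *irq)
{
	bool replay = irq->pending;

	assert(irq->unmask);
	irq->disabled = false;
	irq->pending = false;
	if (irq->masked) {
		irq->unmask();
		irq->masked = false;
	}
	return replay;
}
```

With `mask_both`/`unmask_both`, one interrupt arriving inside a disable/enable section costs four endpoint accesses in this model; with the parent-only pair it costs none, which is the saving being argued for.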
* Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
2025-05-23 9:06 ` Marc Zyngier
@ 2025-06-30 8:59 ` Thomas Gleixner
2025-09-03 14:04 ` [patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested Thomas Gleixner
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2025-06-30 8:59 UTC (permalink / raw)
To: Marc Zyngier; +Cc: linux-kernel, linux-arm-kernel
On Fri, May 23 2025 at 10:06, Marc Zyngier wrote:
> On Sat, 17 May 2025 20:59:10 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
>> > +	/*
>> > +	 * If the parent domain insists on being in charge of masking, obey
>> > +	 * blindly. The default mask/unmask become the shutdown/enable
>> > +	 * callbacks, ensuring that we correctly start/stop the interrupt.
>> > +	 * We make a point in not using the irq_disable() in order to
>> > +	 * preserve the "lazy disable" behaviour.
>> > +	 */
>> > +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
>> > +		chip->irq_shutdown = chip->irq_mask;
>> > +		chip->irq_enable = chip->irq_unmask;
>>
>> This is only correct when the chip does not have dedicated
>> irq_shutdown/enable callbacks.
>
> The chip structure provided by the PCI MSI code doesn't provide such
> callbacks, meaning that they are unused for the whole hierarchy.

Fair enough, but it still stinks.

>> And I really hate the asymmetry of this.
>
> So do I, but that's how the lazy disable thing currently works. Drop
> the bizarre asymmetry on irq_disable, and we can make this nicely
> symmetric as well.

Well, it's not that bizarre and it has a massive performance win if the
thing does not need to go out to the hardware in some scenarios. Don't
ask about the main use case. Mentioning it is probably considered a
violation of the United Nations Convention Against Torture (UNCAT).

>> > +		chip->irq_mask = irq_chip_mask_parent;
>> > +		chip->irq_unmask = irq_chip_unmask_parent;
>> > +	}
>>
>> I'm still trying to understand what the actual problem is that you are
>> trying to solve.
>
> I'm trying to remove some overhead from machines that don't need to
> suffer from this nonsense double masking. Especially in VMs, where
> masking/unmasking requires *two* extremely costly exits (write +
> synchronising read-back). This change reduces the overhead
> significantly by only masking where it actually matters.
>
>> MSIs are edge type interrupts, so the interrupt handling hotpath usually
>> does not mask at all. The only time masking happens is when it's lazy
>> disabled or during affinity changes, which is not the end of the world.
>
> And that's part of the problem. The lazy disable ends up being way
> more costly than it should when the interrupt fires during the
> "disabled but not quite" phase, and in turn makes the critical section
> delineated by disable_irq()/enable_irq() more expensive.
>
> So while, as you put it, it's "not the end of the world", this seems
> to me like a valuable optimisation.

I understand, but this needs more thought. Doing this wholesale for all
potential PCI/MSI parent domains which require MASK_PARENT makes me more
than nervous.

> Another possible improvement would be to teach the PCI code it can
> still rely on masking even when the endpoint is not capable of masking
> individual MSIs.

Well, it relies on that today already if the underlying parent domain is
capable of masking. If not, it hopes that nothing bad happens, which is
the only option we have :(

It gets worse when the device does not support masking _and_ the parent
domain does not provide immutable MSI messages because then the MSI
message write becomes a horrorshow. For illustration see the mess in
arch/x86/kernel/apic/msi.c::msi_set_affinity(), which is a violation of
the above-mentioned convention as well. Despite the fact that this has
been known for decades, RISC-V went ahead and replicated that trainwreck
in the IMSIC IP block. Oh well....

I sat down and stared at it in the few moments where the heat wave did
not completely shut down my brain. As usual this ended in a larger
cleanup and overhaul... At the end I went and created a new pair of chip
callbacks and the corresponding logic around it. A preview of the whole
pile is at:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git irq/msi

Thanks,
tglx
* [patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested
2025-06-30 8:59 ` Thomas Gleixner
@ 2025-09-03 14:04 ` Thomas Gleixner
2025-09-03 14:04 ` [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag Thomas Gleixner
2025-09-03 14:04 ` [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic Thomas Gleixner
0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-09-03 14:04 UTC (permalink / raw)
To: LKML; +Cc: Marc Zyngier, Bjorn Helgaas
This is a follow-up to Marc's attempt at this:

    https://lore.kernel.org/lkml/20250517103011.2573288-1-maz@kernel.org

Now that the PCI/MSI side has irq_startup/shutdown() callbacks, which do
the [un]masking at the PCI level, let the MSI parent domains which insist
on being in charge of masking do so for normal operations.

That avoids going out to the PCI endpoint in the case that an interrupt has
to be masked on arrival of an interrupt in software (lazy) disabled state.

That's achieved by overwriting the irq_[un]mask() callbacks in the irq/MSI
library.

As a consequence the conditional mask/unmask logic in the regular
irq_[un]mask() callbacks of the PCI/MSI domain is no longer required.

Thanks,
tglx
---
irqchip/irq-msi-lib.c | 14 ++++++++++++++
pci/msi/irqdomain.c | 20 --------------------
2 files changed, 14 insertions(+), 20 deletions(-)
* [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
2025-09-03 14:04 ` [patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested Thomas Gleixner
@ 2025-09-03 14:04 ` Thomas Gleixner
2025-09-03 14:04 ` [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic Thomas Gleixner
1 sibling, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-09-03 14:04 UTC (permalink / raw)
To: LKML; +Cc: Marc Zyngier, Bjorn Helgaas
From: Marc Zyngier <maz@kernel.org>
For systems that implement interrupt masking at the interrupt controller
level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT. It indicates
that it isn't enough to only unmask the interrupt at the PCI device level,
but that the interrupt controller must also be involved.

However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both sides, always. It would be far cheaper to
unmask both at the start of times, and then only deal with the interrupt
controller mask, which is cheaper than a round-trip to the PCI endpoint.

Now that the PCI/MSI layer implements irq_startup() and irq_shutdown()
callbacks, which [un]mask at the PCI level and honor the request to
[un]mask the parent, this can be trivially done.

Overwrite the irq_mask/unmask() callbacks of the device domain interrupt
chip with irq_[un]mask_parent() when the parent domain asks for it.

[ tglx: Adapted to the PCI/MSI changes ]

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
drivers/irqchip/irq-msi-lib.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -112,6 +112,20 @@ bool msi_lib_init_dev_msi_info(struct de
 	 */
 	if (!chip->irq_set_affinity && !(info->flags & MSI_FLAG_NO_AFFINITY))
 		chip->irq_set_affinity = msi_domain_set_affinity;
+
+	/*
+	 * If the parent domain insists on being in charge of masking, obey
+	 * blindly. The interrupt is un-masked at the PCI level on startup
+	 * and masked on shutdown to prevent rogue interrupts after the
+	 * driver freed the interrupt. Not masking it at the PCI level
+	 * speeds up operation for disable/enable_irq() as it avoids
+	 * getting all the way out to the PCI device.
+	 */
+	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
+		chip->irq_mask = irq_chip_mask_parent;
+		chip->irq_unmask = irq_chip_unmask_parent;
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(msi_lib_init_dev_msi_info);
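
The lifetime described in the comment above (unmask at the PCI level on startup, mask on shutdown, parent-only masking in between) can be modelled in userspace as follows. This is a hedged sketch, not the kernel implementation: `struct model_chip`, the `model_*` functions, `MODEL_FLAG_MASK_PARENT` and the counters are hypothetical names. Letting the startup/shutdown stand-ins touch both levels regardless of the flag is a simplification of this model.

```c
#include <assert.h>

/* Hypothetical stand-in for the callbacks of the device domain irq_chip. */
struct model_chip {
	void (*irq_startup)(void);
	void (*irq_shutdown)(void);
	void (*irq_mask)(void);
	void (*irq_unmask)(void);
};

static int pci_level_ops;	/* [un]mask operations at the PCI device */
static int parent_ops;		/* [un]mask operations at the parent */

/* Startup/shutdown are assumed to handle both levels (simplification). */
static void model_startup(void)       { parent_ops++; pci_level_ops++; }
static void model_shutdown(void)      { pci_level_ops++; parent_ops++; }
static void model_pci_mask(void)      { pci_level_ops++; }
static void model_pci_unmask(void)    { pci_level_ops++; }
static void model_mask_parent(void)   { parent_ops++; }
static void model_unmask_parent(void) { parent_ops++; }

#define MODEL_FLAG_MASK_PARENT 0x1u

static void model_init_chip(struct model_chip *chip, unsigned int flags)
{
	chip->irq_startup  = model_startup;
	chip->irq_shutdown = model_shutdown;
	chip->irq_mask     = model_pci_mask;
	chip->irq_unmask   = model_pci_unmask;

	/* Mirrors the hunk above: only mask/unmask are overwritten. */
	if (flags & MODEL_FLAG_MASK_PARENT) {
		chip->irq_mask   = model_mask_parent;
		chip->irq_unmask = model_unmask_parent;
	}
}

/* One startup/shutdown lifetime with n mask/unmask pairs in between. */
static int model_lifetime(const struct model_chip *chip, int n)
{
	assert(n >= 0);
	pci_level_ops = parent_ops = 0;
	chip->irq_startup();
	for (int i = 0; i < n; i++) {
		chip->irq_mask();
		chip->irq_unmask();
	}
	chip->irq_shutdown();
	return pci_level_ops;
}
```

With the flag set, the model reaches the PCI device exactly twice per lifetime, once on startup and once on shutdown, however many mask/unmask pairs happen in between.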
* [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic
2025-09-03 14:04 ` [patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested Thomas Gleixner
2025-09-03 14:04 ` [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag Thomas Gleixner
@ 2025-09-03 14:04 ` Thomas Gleixner
2025-09-03 17:38 ` Bjorn Helgaas
1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2025-09-03 14:04 UTC (permalink / raw)
To: LKML; +Cc: Marc Zyngier, Bjorn Helgaas
Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
domain, the conditional [un]mask logic is obsolete.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/msi/irqdomain.c | 20 --------------------
1 file changed, 20 deletions(-)
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -170,22 +170,6 @@ static unsigned int cond_startup_parent(
 	return 0;
 }
 
-static __always_inline void cond_mask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_mask_parent(data);
-}
-
-static __always_inline void cond_unmask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_unmask_parent(data);
-}
-
 static void pci_irq_shutdown_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
@@ -208,14 +192,12 @@ static void pci_irq_mask_msi(struct irq_
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
 	pci_msi_mask(desc, BIT(data->irq - desc->irq));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
-	cond_unmask_parent(data);
 	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
 }
 
@@ -268,12 +250,10 @@ static unsigned int pci_irq_startup_msix
 static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msix(struct irq_data *data)
 {
-	cond_unmask_parent(data);
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
 
* Re: [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic
2025-09-03 14:04 ` [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic Thomas Gleixner
@ 2025-09-03 17:38 ` Bjorn Helgaas
0 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2025-09-03 17:38 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: LKML, Marc Zyngier, Bjorn Helgaas
On Wed, Sep 03, 2025 at 04:04:48PM +0200, Thomas Gleixner wrote:
> Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
> callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
> domain, the conditional [un]mask logic is obsolete.
>
> Remove it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/msi/irqdomain.c | 20 --------------------
> 1 file changed, 20 deletions(-)
>
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -170,22 +170,6 @@ static unsigned int cond_startup_parent(
>  	return 0;
>  }
>  
> -static __always_inline void cond_mask_parent(struct irq_data *data)
> -{
> -	struct msi_domain_info *info = data->domain->host_data;
> -
> -	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> -		irq_chip_mask_parent(data);
> -}
> -
> -static __always_inline void cond_unmask_parent(struct irq_data *data)
> -{
> -	struct msi_domain_info *info = data->domain->host_data;
> -
> -	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> -		irq_chip_unmask_parent(data);
> -}
> -
>  static void pci_irq_shutdown_msi(struct irq_data *data)
>  {
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
> @@ -208,14 +192,12 @@ static void pci_irq_mask_msi(struct irq_
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
>  
>  	pci_msi_mask(desc, BIT(data->irq - desc->irq));
> -	cond_mask_parent(data);
>  }
>  
>  static void pci_irq_unmask_msi(struct irq_data *data)
>  {
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
>  
> -	cond_unmask_parent(data);
>  	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
>  }
>  
> @@ -268,12 +250,10 @@ static unsigned int pci_irq_startup_msix
>  static void pci_irq_mask_msix(struct irq_data *data)
>  {
>  	pci_msix_mask(irq_data_get_msi_desc(data));
> -	cond_mask_parent(data);
>  }
>  
>  static void pci_irq_unmask_msix(struct irq_data *data)
>  {
> -	cond_unmask_parent(data);
>  	pci_msix_unmask(irq_data_get_msi_desc(data));
>  }
>