* Virtio-IOMMU interrupt remapping design
@ 2025-06-15 18:47 Demi Marie Obenour
2025-06-16 13:20 ` Jason Gunthorpe
2025-06-17 19:46 ` Michael S. Tsirkin
0 siblings, 2 replies; 9+ messages in thread
From: Demi Marie Obenour @ 2025-06-15 18:47 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross
Cc: virtualization, linux-kernel, linux-acpi, iommu, x86,
Spectrum OS Development
Virtio-IOMMU interrupt remapping turned out to be much harder than I
realized. The main problem is that interrupt remapping is set up
very early in boot. In fact, Linux calls the interrupt remapping probe
function from the APIC initialization code: x86_64_probe_apic ->
enable_IR_x2apic -> irq_remapping_prepare(). This is almost certainly
long before PCI has been initialized. Also, the order in which devices
will be initialized is not something Linux guarantees at all, which is a
problem because interrupt remapping must be initialized before drivers
start setting up interrupts. Otherwise, the interrupt remapping table
won't include entries for already-existing interrupts, and things will
either break badly, not get the benefit of interrupt remapping
security-wise, or both.
The reason I expect this doesn't cause problems for address translation
is that the IOMMU probably starts in bypass mode by default, meaning
that all DMA is permitted. If the IOMMU is only used by VFIO or
IOMMUFD, it will not be needed until userspace starts up, which is after
the IOMMU has been initialized. This isn't ideal, though, as it means
that kernel drivers operate without DMA protection.
Is a paravirtualized IOMMU with interrupt remapping something that makes
sense? Absolutely! However, the IOMMU should be considered a platform
device that must be initialized very early in boot. Using virtio-IOMMU
with MMIO transport as the interface might be a reasonable option, but
the IOMMU needs to be enumerated via ACPI, device tree, or kernel
command line argument. This allows it to be brought up before anything
capable of DMA is initialized.
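
As a rough illustration of the ordering constraint described above, here
is a minimal C sketch. Everything in it (boot_stage, iommu_discovery,
iommu_visible_at, ir_ready_at_apic_init) is a made-up name for
illustration, not a kernel symbol, and the three stages are a heavy
simplification of the real boot sequence. It only models the point that
an IOMMU found through PCI enumeration shows up too late for the
interrupt remapping probe, while one described by the platform does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical boot stages; the real kernel has many more. */
enum boot_stage {
    STAGE_APIC_INIT,    /* irq_remapping_prepare() is reached from here */
    STAGE_PCI_ENUM,     /* PCI devices become discoverable */
    STAGE_DRIVER_PROBE  /* drivers start setting up interrupts */
};

/* How the (hypothetical) IOMMU is announced to the guest. */
enum iommu_discovery {
    DISCOVER_PLATFORM,  /* ACPI, device tree, or kernel command line */
    DISCOVER_PCI        /* only found while enumerating PCI */
};

/* Earliest stage at which the IOMMU can be located (assumption). */
enum boot_stage iommu_visible_at(enum iommu_discovery how)
{
    return how == DISCOVER_PLATFORM ? STAGE_APIC_INIT : STAGE_PCI_ENUM;
}

/* True if interrupt remapping can be prepared during APIC init. */
bool ir_ready_at_apic_init(enum iommu_discovery how)
{
    return iommu_visible_at(how) <= STAGE_APIC_INIT;
}
```

In this model, ir_ready_at_apic_init(DISCOVER_PLATFORM) is true and
ir_ready_at_apic_init(DISCOVER_PCI) is false, which is the gap the
paragraphs above describe.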
Is this the right path to go down? What do others think about this?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: Virtio-IOMMU interrupt remapping design
2025-06-15 18:47 Virtio-IOMMU interrupt remapping design Demi Marie Obenour
@ 2025-06-16 13:20 ` Jason Gunthorpe
2025-06-16 16:53 ` Demi Marie Obenour
2025-06-17 19:44 ` Michael S. Tsirkin
2025-06-17 19:46 ` Michael S. Tsirkin
1 sibling, 2 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2025-06-16 13:20 UTC (permalink / raw)
To: Demi Marie Obenour
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
> Is a paravirtualized IOMMU with interrupt remapping something that makes
> sense?
IMHO linking interrupt remapping to the iommu is a poor design,
interrupt routing belongs in the irq subsystem, not in the iommu.
The fact AMD and Intel both coupled their interrupt routing to their
iommu hardware is just a weird design decision. ARM didn't do this,
for instance.
So I would not try to do this at all, you should have a
para-virtualized IRQ interface, not an extension to virtio-iommu
adding interrupt handling. :\
AFAIK hyperv shows how to build something like this.
Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-16 13:20 ` Jason Gunthorpe
@ 2025-06-16 16:53 ` Demi Marie Obenour
2025-06-16 17:33 ` Jason Gunthorpe
2025-06-17 19:44 ` Michael S. Tsirkin
1 sibling, 1 reply; 9+ messages in thread
From: Demi Marie Obenour @ 2025-06-16 16:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On 6/16/25 09:20, Jason Gunthorpe wrote:
> On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
>
>> Is a paravirtualized IOMMU with interrupt remapping something that makes
>> sense?
>
> IMHO linking interrupt remapping to the iommu is a poor design,
> interrupt routing belongs in the irq subsystem, not in the iommu.
I agree.
> The fact AMD and Intel both coupled their interrupt routing to their
> iommu hardware is just a weird design decision. ARM didn't do this,
> for instance.
Arm did the right thing here, IMO.
> So I would not try to do this at all, you should have a
> para-virtualized IRQ interface, not an extension to virtio-iommu
> adding interrupt handling. :\
I don't disagree at all.
> AFAIK hyperv shows how to build something like this.
Would this need KVM patches? I'm concerned that implementing this
in userspace would interact badly with the irqfd fast path.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: Virtio-IOMMU interrupt remapping design
2025-06-16 16:53 ` Demi Marie Obenour
@ 2025-06-16 17:33 ` Jason Gunthorpe
0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2025-06-16 17:33 UTC (permalink / raw)
To: Demi Marie Obenour
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Mon, Jun 16, 2025 at 12:53:40PM -0400, Demi Marie Obenour wrote:
> > AFAIK hyperv shows how to build something like this.
> Would this need KVM patches? I'm concerned that implementing this
> in userspace would interact badly with the irqfd fast path.
I don't know. I think you get the same issues even if you did
virtio-iommu irq handling, it shouldn't be any different.
I'm not sure there even is a fast path here, remapping happens during
initial vector setup/affinity change only. That isn't fast path. So
long as the MSI is delivered to the correct CPU vector entirely in KVM
it seems OK.
And the hyperv approach of asking the hypervisor for the addr/data
pair to achieve certain parameters will work a lot better with existing
Linux than trying to build an iommu emulation where the guest is
building its own private addr/data pairs :\
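
To make the contrast concrete, here is a minimal C sketch of the "ask
the hypervisor for the addr/data pair" idea. It is not Hyper-V's actual
interface: hv_alloc_msi, hv_retarget_msi, hv_resolve_msi, the table
size, and the placeholder address are all assumptions made up for this
example, and the "hypercalls" are plain function calls. The point it
tries to show is that the guest never composes the pair itself, and that
retargeting changes only hypervisor-side state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSI message as it would be programmed into a device. */
struct msi_msg {
    uint64_t addr;
    uint32_t data;
};

/* Where the guest wants the interrupt to land. */
struct irq_target {
    uint32_t cpu;
    uint8_t vector;
};

#define HV_MAX_ROUTES 16u
#define HV_MSI_ADDR   0xFEE00000u  /* placeholder value, an assumption */

/* Hypothetical hypervisor-side routing table, indexed by msg.data. */
static struct irq_target hv_routes[HV_MAX_ROUTES];
static uint32_t hv_nr_routes;

/* Stand-in for a hypercall: allocate a route, hand back an opaque pair. */
int hv_alloc_msi(const struct irq_target *t, struct msi_msg *out)
{
    assert(t != NULL && out != NULL);
    if (hv_nr_routes >= HV_MAX_ROUTES)
        return -1;
    hv_routes[hv_nr_routes] = *t;
    out->addr = HV_MSI_ADDR;
    out->data = hv_nr_routes++;  /* opaque handle chosen by the host */
    return 0;
}

/* Affinity change: only the host table changes, the pair stays the same. */
int hv_retarget_msi(const struct msi_msg *msg, const struct irq_target *t)
{
    assert(msg != NULL && t != NULL);
    if (msg->addr != HV_MSI_ADDR || msg->data >= hv_nr_routes)
        return -1;
    hv_routes[msg->data] = *t;
    return 0;
}

/* What the host would do when the device fires the MSI. */
int hv_resolve_msi(const struct msi_msg *msg, struct irq_target *out)
{
    assert(msg != NULL && out != NULL);
    if (msg->addr != HV_MSI_ADDR || msg->data >= hv_nr_routes)
        return -1;
    *out = hv_routes[msg->data];
    return 0;
}
```

A guest driver in this model would call hv_alloc_msi() once at vector
setup, write the returned pair into the device, and call
hv_retarget_msi() on affinity changes without touching the device again.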
Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-16 13:20 ` Jason Gunthorpe
2025-06-16 16:53 ` Demi Marie Obenour
@ 2025-06-17 19:44 ` Michael S. Tsirkin
2025-06-17 19:57 ` Jason Gunthorpe
1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2025-06-17 19:44 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Demi Marie Obenour, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Mon, Jun 16, 2025 at 10:20:31AM -0300, Jason Gunthorpe wrote:
> On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
>
> > Is a paravirtualized IOMMU with interrupt remapping something that makes
> > sense?
>
> IMHO linking interrupt remapping to the iommu is a poor design,
> interrupt routing belongs in the irq subsystem, not in the iommu.
>
> The fact AMD and Intel both coupled their interrupt routing to their
> iommu hardware is just a weird design decision. ARM didn't do this,
> for instance.
why does it matter in which device it resides?
Way I see it, there is little reason to remap interrupts
without also using an iommu, so why not a single device.
what did I miss?
> So I would not try to do this at all, you should have a
> para-virtualized IRQ interface, not an extension to virtio-iommu
> adding interrupt handling. :\
>
> AFAIK hyperv shows how to build something like this.
>
> Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-17 19:44 ` Michael S. Tsirkin
@ 2025-06-17 19:57 ` Jason Gunthorpe
2025-06-17 20:01 ` Michael S. Tsirkin
0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2025-06-17 19:57 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Demi Marie Obenour, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Tue, Jun 17, 2025 at 03:44:20PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 10:20:31AM -0300, Jason Gunthorpe wrote:
> > On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
> >
> > > Is a paravirtualized IOMMU with interrupt remapping something that makes
> > > sense?
> >
> > IMHO linking interrupt remapping to the iommu is a poor design,
> > interrupt routing belongs in the irq subsystem, not in the iommu.
> >
> > The fact AMD and Intel both coupled their interrupt routing to their
> > iommu hardware is just a weird design decision. ARM didn't do this,
> > for instance.
>
> why does it matter in which device it resides?
It would clean up the boot process if the IRQ components were available
at the same time as the IRQ drivers instead of much later when the
iommu gets plugged in.
> Way I see it, there is little reason to remap interrupts without
> also using an iommu, so why not a single device. what did I miss?
Remapping interrupts can be understood to be virtualizing the MSI
addr/data pair space so that the CPU controls where the interrupt goes
through its internal tables, not the device through the addr/data.
On x86 you also need to use remapping to exceed the max CPU count that
can be encoded in the MSI, no iommu required to need this.
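
For reference, the CPU-count limit can be seen in the non-remapped
("compatibility") MSI address layout, where the destination APIC ID
occupies an 8-bit field at bits 19:12. The C sketch below encodes only
that field. The helper name msi_compat_addr and the macro names are made
up for this example, and it deliberately ignores the redirection hint,
destination mode, and any extended destination ID scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSI_ADDR_BASE       0xFEE00000u  /* fixed upper bits of the address */
#define MSI_ADDR_DEST_SHIFT 12           /* destination ID starts at bit 12 */
#define MSI_ADDR_DEST_MASK  0xFFu        /* 8-bit destination ID field */

/*
 * Build a compatibility-format MSI address for the given APIC ID.
 * Returns false when the ID does not fit in the 8-bit field, which is
 * the case that needs remapping (or some other extension).
 */
bool msi_compat_addr(uint32_t apic_id, uint32_t *addr)
{
    assert(addr != NULL);
    if (apic_id > MSI_ADDR_DEST_MASK)
        return false;
    *addr = MSI_ADDR_BASE | (apic_id << MSI_ADDR_DEST_SHIFT);
    return true;
}
```

APIC ID 5 encodes as 0xFEE05000, while APIC ID 256 cannot be encoded in
this format at all.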
There is also some stuff related to IMS that could get improved here.
You don't need an iommu to enjoy those benefits.
Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-17 19:57 ` Jason Gunthorpe
@ 2025-06-17 20:01 ` Michael S. Tsirkin
2025-06-17 23:03 ` Jason Gunthorpe
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2025-06-17 20:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Demi Marie Obenour, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Tue, Jun 17, 2025 at 04:57:20PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 17, 2025 at 03:44:20PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jun 16, 2025 at 10:20:31AM -0300, Jason Gunthorpe wrote:
> > > On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
> > >
> > > > Is a paravirtualized IOMMU with interrupt remapping something that makes
> > > > sense?
> > >
> > > IMHO linking interrupt remapping to the iommu is a poor design,
> > > interrupt routing belongs in the irq subsystem, not in the iommu.
> > >
> > > The fact AMD and Intel both coupled their interrupt routing to their
> > > iommu hardware is just a weird design decision. ARM didn't do this,
> > > for instance.
> >
> > why does it matter in which device it resides?
>
> It would clean up the boot process if the IRQ components were available
> at the same time as the IRQ drivers instead of much later when the
> iommu gets plugged in.
>
> > Way I see it, there is little reason to remap interrupts without
> > also using an iommu, so why not a single device. what did I miss?
>
> Remapping interrupts can be understood to be virtualizing the MSI
> addr/data pair space so that the CPU controls where the interrupt goes
> through its internal tables, not the device through the addr/data.
>
> On x86 you also need to use remapping to exceed the max CPU count that
> can be encoded in the MSI, no iommu required to need this.
More of an x86 quirk though, isn't it?
> There is also some stuff related to IMS that could get improved here.
>
> You don't need an iommu to enjoy those benefits.
>
> Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-17 20:01 ` Michael S. Tsirkin
@ 2025-06-17 23:03 ` Jason Gunthorpe
0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2025-06-17 23:03 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Demi Marie Obenour, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Joerg Roedel,
Will Deacon, Robin Murphy, Alyssa Ross, virtualization,
linux-kernel, linux-acpi, iommu, x86, Spectrum OS Development
On Tue, Jun 17, 2025 at 04:01:53PM -0400, Michael S. Tsirkin wrote:
> > On x86 you also need to use remapping to exceed the max CPU count that
> > can be encoded in the MSI, no iommu required to need this.
>
> More of an x86 quirk though, isn't it?
Yes, but so is bundling IOMMU and remapping HW together <shrug>
GIC fully integrates it into the interrupt controller architecture.
Jason
* Re: Virtio-IOMMU interrupt remapping design
2025-06-15 18:47 Virtio-IOMMU interrupt remapping design Demi Marie Obenour
2025-06-16 13:20 ` Jason Gunthorpe
@ 2025-06-17 19:46 ` Michael S. Tsirkin
1 sibling, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2025-06-17 19:46 UTC (permalink / raw)
To: Demi Marie Obenour
Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Rafael J. Wysocki,
Len Brown, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Joerg Roedel, Will Deacon,
Robin Murphy, Alyssa Ross, virtualization, linux-kernel,
linux-acpi, iommu, x86, Spectrum OS Development
On Sun, Jun 15, 2025 at 02:47:15PM -0400, Demi Marie Obenour wrote:
> Virtio-IOMMU interrupt remapping turned out to be much harder than I
> realized. The main problem is that interrupt remapping is set up
> very early in boot. In fact, Linux calls the interrupt remapping probe
> function from the APIC initialization code: x86_64_probe_apic ->
> enable_IR_x2apic -> irq_remapping_prepare(). This is almost certainly
> long before PCI has been initialized. Also, the order in which devices
> will be initialized is not something Linux guarantees at all, which is a
> problem because interrupt remapping must be initialized before drivers
> start setting up interrupts. Otherwise, the interrupt remapping table
> won't include entries for already-existing interrupts, and things will
> either break badly, not get the benefit of interrupt remapping
> security-wise, or both.
>
> The reason I expect this doesn't cause problems for address translation
> is that the IOMMU probably starts in bypass mode by default, meaning
> that all DMA is permitted. If the IOMMU is only used by VFIO or
> IOMMUFD, it will not be needed until userspace starts up, which is after
> the IOMMU has been initialized. This isn't ideal, though, as it means
> that kernel drivers operate without DMA protection.
>
> Is a paravirtualized IOMMU with interrupt remapping something that makes
> sense? Absolutely! However, the IOMMU should be considered a platform
> device that must be initialized very early in boot. Using virtio-IOMMU
> with MMIO transport as the interface might be a reasonable option, but
> the IOMMU needs to be enumerated via ACPI, device tree, or kernel
> command line argument. This allows it to be brought up before anything
> capable of DMA is initialized.
>
> Is this the right path to go down? What do others think about this?
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
This discussion also belongs on virtio-comment;
this ML is for driver work.