From: David Woodhouse <dwmw2@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	 Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, quic_bqiang@quicinc.com, kvalo@kernel.org,
	 prestwoj@gmail.com, linux-wireless@vger.kernel.org,
	ath11k@lists.infradead.org,  iommu@lists.linux.dev,
	kernel@quicinc.com, johannes@sipsolutions.net,
	 jtornosm@redhat.com
Subject: Re: [PATCH RFC/RFT] vfio/pci: Create feature to disable MSI virtualization
Date: Fri, 13 Dec 2024 09:10:30 +0000
Message-ID: <03fdfde8dc05ecce1f1edececf0800d8cb919ac1.camel@infradead.org>
In-Reply-To: <87r0aspby6.ffs@tglx>

On Tue, 2024-08-13 at 19:30 +0200, Thomas Gleixner wrote:
> On Tue, Aug 13 2024 at 13:30, Jason Gunthorpe wrote:
> > On Mon, Aug 12, 2024 at 10:59:12AM -0600, Alex Williamson wrote:
> > > vfio-pci has always virtualized the MSI address and data registers as
> > > MSI programming is performed through the SET_IRQS ioctl.  Often this
> > > virtualization is not used, and in specific cases can be unhelpful.
> > > 
> > > One such case where the virtualization is a hindrance is when the
> > > device contains an onboard interrupt controller programmed by the guest
> > > driver.  Userspace VMMs have a chance to quirk this programming,
> > > injecting the host physical MSI information, but only if the userspace
> > > driver can get access to the host physical address and data registers.
> > > 
> > > This introduces a device feature which allows the userspace driver to
> > > disable virtualization of the MSI capability address and data registers
> > > in order to provide read-only access to the physical values.
> > 
> > Personally, I very much dislike this. Encouraging such hacky driver
> > use of the interrupt subsystem is not a good direction. Enabling this
> > in VMs will further complicate fixing the IRQ usages in these drivers
> > over the long run.
> > 
> > If the device has its own interrupt sources then the device needs to
> > create an irq_chip and related and hook them up properly. Not hackily
> > read the MSI-X registers and write them someplace else.
> > 
> > Thomas Gleixner has done a lot of great work recently to clean this up.
> > 
> > So if you imagine the driver is fixed, then this is not necessary.
> 
> Yes. I looked at the ath11k driver when I was reworking the PCI/MSI
> subsystem and that's a perfect candidate for a proper device specific
> interrupt domain to replace the horrible MSI hackery it has.

The ath11k hacks may be awful, but in their defence, that's because the
whole way the hardware works is awful.

Q: With PCI passthrough to a guest, how does the guest OS tell the
device where to do DMA?

A: The guest OS just hands the device a guest physical address and the
IOMMU does the rest. Nothing 'intercedes' between the guest and the
device to mess with that address.

Q: MSIs are just DMA. So with PCI passthrough to a guest, how does the
guest OS configure the device's MSIs? 

<fantasy>
A: The guest OS just hands the device a standard MSI message encoding
the target guest APIC ID and vector (etc.), and the IOMMU does the
rest. Nothing 'intercedes' between the guest and the device to mess
with that MSI message.

And thus ath11k didn't need to do *any* hacks to work around a stupid
hardware design with the VMM snooping on stuff it ideally shouldn't
have had any business touching in the first place.

Posted interrupts are almost the *default* because the IOMMU receives a
<source-id, vCPU APIC ID, vector> tuple on the bus. If receiving an
interrupt for a vCPU which isn't currently running, that's when the
IOMMU sets a bit in a table somewhere and notifies the host OS.

All that special case MSI handling and routing code that I had
nightmares about because it fell through a wormhole from a parallel
universe, doesn't exist.

And look, DPDK drivers which run in polling mode and 'abuse' MSIs by
using real memory addresses and asking the device to "write <these> 32
bits to <this> structure if you want attention" just work nicely in
virtual machines too, just as they do on real hardware.
</fantasy>

/me wakes up...

Shit.

And we have to enable this Interrupt Remapping crap even to address
more than 255 CPUs *without* virtualization? Even a *guest* has to see
a virtual IOMMU and enable Interrupt Remapping to be able to use more
than 255 vCPUs? Even though there were a metric shitload of spare bits
in the MSI message we could have used¹.

Wait, so that means we have to offer an IOMMU with *DMA* remapping to
guests, which means 2-stage translations and/or massive overhead, just
for that guest to be able to use >255 vCPUs?

Screw you all, I'm going back to bed.



¹ And *should* use, if we ever do something similar like, say, expand
  the vector# space past 8 bits. Intel and AMD take note. 
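
To make the 255-CPU limit and the "spare bits" remark concrete, here is a
minimal sketch of how an x86 MSI message is laid out when Interrupt
Remapping is not in use. It assumes the compatibility format as I understand
it from the Intel SDM: the address carries 0xFEE in bits 31:20 and an 8-bit
destination APIC ID in bits 19:12, and the data carries the vector in bits
7:0. The ext_dest_id option models the 15-bit extended destination ID scheme
that some hypervisors advertise, which, to my understanding, places seven
more ID bits in the otherwise-reserved address bits 11:5. The function and
constant names are hypothetical and are not taken from any kernel source.

```python
# Sketch only: x86 MSI message layout without Interrupt Remapping.
# Names are illustrative assumptions, not kernel identifiers.

MSI_ADDR_BASE = 0xFEE00000   # address bits 31:20 are fixed at 0xFEE
DEST_ID_SHIFT = 12           # destination APIC ID occupies bits 19:12
DEST_ID_MASK = 0xFF          # only 8 bits, hence the small CPU limit
EXT_DEST_ID_SHIFT = 5        # otherwise-reserved bits 11:5 (assumed layout)
EXT_DEST_ID_MASK = 0x7F      # 7 extra bits, giving a 15-bit destination ID


def compose_msi(apic_id, vector, ext_dest_id=False):
    """Return (address, data) for a fixed-delivery, edge-triggered MSI."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    id_bits = 15 if ext_dest_id else 8
    if not 0 <= apic_id < (1 << id_bits):
        raise ValueError(f"APIC ID {apic_id} does not fit in {id_bits} bits")

    # Low 8 bits of the APIC ID go in the architectural destination field.
    address = MSI_ADDR_BASE | ((apic_id & DEST_ID_MASK) << DEST_ID_SHIFT)
    if ext_dest_id:
        # Upper 7 bits go in the spare address bits.
        address |= ((apic_id >> 8) & EXT_DEST_ID_MASK) << EXT_DEST_ID_SHIFT

    # Fixed delivery mode and edge trigger leave the upper data bits zero.
    data = vector
    return address, data


if __name__ == "__main__":
    for apic_id, ext in ((3, False), (300, True)):
        address, data = compose_msi(apic_id, 0x41, ext_dest_id=ext)
        print(f"APIC ID {apic_id}: address={address:#010x} data={data:#06x}")
```

Without the extended field, compose_msi(300, 0x41) raises ValueError, which
is the limit complained about above: the device is simply told to write
these 32 bits of data to this address, and the address has room for only an
8-bit destination.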
