public inbox for iommu@lists.linux-foundation.org
From: Baochen Qiang <quic_bqiang@quicinc.com>
To: David Woodhouse <dwmw2@infradead.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kalle Valo <kvalo@kernel.org>
Cc: James Prestwood <prestwoj@gmail.com>,
	<linux-wireless@vger.kernel.org>, <ath11k@lists.infradead.org>,
	<iommu@lists.linux.dev>
Subject: Re: ath11k and vfio-pci support
Date: Wed, 17 Jan 2024 13:47:28 +0800	[thread overview]
Message-ID: <bea2eca9-e1c8-4220-bf3a-1feb6287c983@quicinc.com> (raw)
In-Reply-To: <57d20bd812ccf8d1a5815ad41b5dcea3925d4fe1.camel@infradead.org>



On 1/16/2024 6:41 PM, David Woodhouse wrote:
> On Tue, 2024-01-16 at 18:08 +0800, Baochen Qiang wrote:
>>
>>
>> On 1/16/2024 1:46 AM, Alex Williamson wrote:
>>> On Sun, 14 Jan 2024 16:36:02 +0200
>>> Kalle Valo <kvalo@kernel.org> wrote:
>>>
>>>> Baochen Qiang <quic_bqiang@quicinc.com> writes:
>>>>
>>>>>>> Strange that it still fails. Are you now seeing this error in your
>>>>>>> host or your Qemu? Or both?
>>>>>>> Could you share your test steps? And if you can share please be as
>>>>>>> detailed as possible since I'm not familiar with passing WLAN
>>>>>>> hardware to a VM using vfio-pci.
>>>>>>
>>>>>> Just in Qemu, the hardware works fine on my host machine.
>>>>>> I basically follow this guide to set it up; it's written in the
>>>>>> context of GPUs/libvirt but the host setup is exactly the same. By
>>>>>> no means do you need to read it all, once you set the vfio-pci.ids
>>>>>> and see your unclaimed adapter you can stop:
>>>>>> https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
>>>>>> In short you should be able to set the following host kernel options
>>>>>> and reboot (assuming your motherboard/hardware is compatible):
>>>>>> intel_iommu=on iommu=pt vfio-pci.ids=17cb:1103
>>>>>> Obviously change the device/vendor IDs to whatever ath11k hw you
>>>>>> have. Once the host is rebooted you should see your wlan adapter as
>>>>>> UNCLAIMED, showing the driver in use as vfio-pci. If not, it's likely
>>>>>> your motherboard just isn't compatible, the device has to be in its
>>>>>> own IOMMU group (you could try switching PCI ports if this is the
>>>>>> case).
>>>>>> I then build a "kvm_guest.config" kernel with the driver/firmware
>>>>>> for ath11k and boot into that with the following Qemu options:
>>>>>> -enable-kvm -device vfio-pci,host=<PCI address>
>>>>>> If it seems easier you could also utilize IWD's test-runner which
>>>>>> handles launching the Qemu kernel automatically, detects any
>>>>>> vfio-devices, passes them through, and mounts some useful host
>>>>>> folders into the VM. It's actually a very good general-purpose tool
>>>>>> for kernel testing, not just for IWD:
>>>>>> https://git.kernel.org/pub/scm/network/wireless/iwd.git/tree/doc/test-runner.txt
>>>>>> Once set up you can just run test-runner with a few flags and you'll
>>>>>> boot into a shell:
>>>>>> ./tools/test-runner -k <kernel-image> --hw --start /bin/bash
>>>>>> Please reach out if you have questions, thanks for looking into
>>>>>> this.
>>>>>
>>>>> Thanks for these details. I reproduced this issue by following your guide.
>>>>>
>>>>> Seems the root cause is that the MSI vector assigned to WCN6855 in
>>>>> qemu is different from that in the host. In my case the MSI vector in
>>>>> qemu is [Address: fee00000  Data: 0020] while in the host it is [Address:
>>>>> fee00578  Data: 0000]. So in qemu ath11k configures the MSI vector
>>>>> [Address: fee00000 Data: 0020] to WCN6855 hardware/firmware, and
>>>>> firmware uses that vector to fire interrupts to the host/qemu. However
>>>>> the host IOMMU doesn't know that vector because the real vector is
>>>>> [Address: fee00578  Data: 0000]. As a result the host blocks that
>>>>> interrupt and reports an error, see the log below:
>>>>>
>>>>> [ 1414.206069] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.206081] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.210334] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.210342] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.212496] DMAR: DRHD: handling fault status reg 2
>>>>> [ 1414.212503] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
>>>>> 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
>>>>> request
>>>>> [ 1414.214600] DMAR: DRHD: handling fault status reg 2
>>>>>
>>>>> While I don't think there is a way for qemu/ath11k to get the real MSI
>>>>> vector from the host, I will try to read the vfio code to check further.
>>>>> Before that, to unblock you, a possible hack is to hard-code the MSI
>>>>> vector in qemu to the same value as in the host, on the condition that
>>>>> the MSI vector doesn't change.
>>>>
>>>> Baochen, awesome that you were able to debug this further. Now we at
>>>> least know what's the problem.
>>>
>>> It's an interesting problem; I don't think we've seen another device
>>> where the driver reads the MSI register in order to program another
>>> hardware entity to match the MSI address and data configuration.
>>>
>>> When assigning a device, the host and guest use entirely separate
>>> address spaces for MSI interrupts.  When the guest enables MSI, the
>>> operation is trapped by the VMM and triggers an ioctl to the host to
>>> perform an equivalent configuration.  Generally the physical device
>>> will interrupt within the host where it may be directly attached to KVM
>>> to signal the interrupt, trigger through the VMM, or where
>>> virtualization hardware supports it, the interrupt can directly trigger
>>> the vCPU.   From the VM perspective, the guest address/data pair is used
>>> to signal the interrupt, which is why it makes sense to virtualize the
>>> MSI registers.
>>
>> Hi Alex, could you elaborate more? Why is MSI virtualization necessary
>> from the VM perspective?
> 
> An MSI is just a write to physical memory space. You can even use it
> like that; configure the device to just write 4 bytes to some address
> in a struct in memory to show that it needs attention, and you then
> poll that memory.
> 
> But mostly we don't (ab)use it like that, of course. We tell the device
> to write to a special range of the physical address space where the
> interrupt controller lives — the range from 0xfee00000 to 0xfeefffff.
> The low 20 bits of the address, and the 32 bits of data written to that
> address, tell the interrupt controller which CPU to interrupt, and
> which vector to raise on the CPU (as well as some other details and
> weird interrupt modes which are theoretically encodable).
> 
> So in your example, the guest writes [Address: fee00000  Data: 0020]
> which means it wants vector 0x20 on CPU#0 (well, the CPU with APICID
> 0). But that's what the *guest* wants. If we just blindly programmed
> that into the hardware, the hardware would deliver vector 0x20 to the
> host's CPU0... which would be very confused by it.
> 
> The host has a driver for that device, probably the VFIO driver. The
> host registers its own interrupt handlers for the real hardware,
> decides which *host* CPU (and vector) should be notified when something
> happens. And when that happens, the VFIO driver will raise an event on
> an eventfd, which will notify QEMU to inject the appropriate interrupt
> into the guest.
> 
> So... when the guest enables the MSI, that's trapped by QEMU which
> remembers which *guest* CPU/vector the interrupt should go to. QEMU
> tells VFIO to enable the corresponding interrupt, and what gets
> programmed into the actual hardware is up to the *host* operating
> system; nothing to do with the guest's information at all.
> 
> Then when the actual hardware raises the interrupt, the VFIO interrupt
> handler runs in the host, signals an event on the eventfd, and QEMU
> receives that and injects the event into the appropriate guest vCPU.
> 
> (In practice QEMU doesn't do it these days; there's actually a shortcut
> which improves latency by allowing the kernel to deliver the event to
> the guest directly, connecting the eventfd directly to the KVM irq
> routing table.)
> 
> 
> Interrupt remapping is probably not important here, but I'll explain it
> briefly anyway. With interrupt remapping, the IOMMU handles the
> 'memory' write from the device, just as it handles all other memory
> transactions. One of the reasons for interrupt remapping is that the
> original definitions of the bits in the MSI (the low 20 bits of the
> address and the 32 bits of what's written) only had 8 bits for the
> target CPU APICID. And we have bigger systems than that now.
> 
> So by using one of the spare bits in the MSI message, we can indicate
> that this isn't just a directly-encoded cpu/vector in "Compatibility
> Format", but is a "Remappable Format" interrupt. Instead of the
> cpu/vector it just contains an index into the IOMMU's Interrupt
> Redirection Table. Which *does* have a full 32 bits for the target APIC
> ID. That's why x2apic support (which gives us support for >254 CPUs)
> depends on interrupt remapping.
> 
> The other thing that the IOMMU can do in modern systems is *posted*
> interrupts. Where the entry in the IOMMU's IRT doesn't just specify the
> host's CPU/vector, but actually specifies a *vCPU* to deliver the
> interrupt to.
> 
> All of which is mostly irrelevant as it's just another bypass
> optimisation to improve latency. The key here is that what the guest
> writes to its emulated MSI table and what the host writes to the real
> hardware are not at all related.
> 
Thanks. A really detailed and clear explanation.

> If we had had this posted interrupt support from the beginning, perhaps
> we could have had a much simpler model — we just let the guest write
> its intended (v)CPU#/vector *directly* to the MSI table in the device,
> and let the IOMMU fix it up by having a table pointing to the
> appropriate set of vCPUs. But that isn't how it happened. The model we
> have is that the VMM has to *emulate* the config space and handle the
> interrupts as described above.
> 
> This means that whenever a device has a non-standard way of configuring
> MSIs, the VMM has to understand and intercept that. I believe we've
> even seen some Atheros devices with the MSI target in some weird MMIO
> registers instead of the standard location, so we've had to hack QEMU
> to handle those too?
> 
>> And, maybe a stupid question: is it possible for VM/KVM or vfio to
>> virtualize only the write operation to the MSI register but leave the
>> read operation un-virtualized? I am asking because that way ath11k may
>> get a chance to run in a VM after reading the real vector.
> 
> That might confuse a number of operating systems. Especially if they
> mask/unmask by reading the register, flipping the mask bit and writing
> back again.
> 
> How exactly is the content of this register then given back to the
> firmware? Is that communication snoopable by the VMM?
By programming it into an MMIO register. It is a non-standard and 
device-specific register; I'm not sure whether it is snoopable by the VMM.

> 
> 
>>>
>>> Off hand I don't have a good solution for this, the hardware is
>>> essentially imposing a unique requirement for MSI programming that the
>>> driver needs visibility of the physical MSI address and data.
>>>
> 
> Strictly, the driver doesn't need visibility into the actual values used
> by the hardware. Another way of looking at it would be to say that
> the driver programs the MSI through this non-standard method; it just
> needs the VMM to trap and handle that, just as the VMM does for the
> standard MSI table.
> 
> Which is what I thought we'd already seen on some Atheros devices.
> 
>>>    It's
>>> conceivable that device specific code could either make the physical
>>> address/data pair visible to the VM or trap the firmware programming to
>>> inject the correct physical values.  Is there somewhere other than the
>>> standard MSI capability in config space that the driver could learn the
>>> physical values, ie. somewhere that isn't virtualized?  Thanks,
>>
>> I don't think we have such capability in configuration space.
> 
> Configuration space is a complete fiction though; it's all emulated. We
> can do anything we like. Or we can have a PV hypercall which will
> report it. I don't know that we'd *want* to, but all things are
> possible.
OK, I get the point now.

> 

Thread overview: 29+ messages
2024-01-14 14:36                         ` ath11k and vfio-pci support Kalle Valo
2024-01-15 17:46                           ` Alex Williamson
2024-01-16 10:08                             ` Baochen Qiang
2024-01-16 10:41                               ` David Woodhouse
2024-01-16 15:29                                 ` Jason Gunthorpe
2024-01-16 18:28                                 ` Alex Williamson
2024-01-16 21:10                                   ` Jeff Johnson
2024-01-17  5:47                                 ` Baochen Qiang [this message]
2024-03-21 19:14                                 ` Johannes Berg
2024-08-12 16:59 ` [PATCH RFC/RFT] vfio/pci: Create feature to disable MSI virtualization Alex Williamson
2024-08-13 16:30   ` Jason Gunthorpe
2024-08-13 17:30     ` Thomas Gleixner
2024-08-13 23:39       ` Jason Gunthorpe
2024-12-13  9:10       ` David Woodhouse
2025-01-03 14:31         ` Jason Gunthorpe
2025-01-03 14:47           ` David Woodhouse
2025-01-03 15:19             ` Jason Gunthorpe
2024-08-13 21:14     ` Alex Williamson
2024-08-13 23:16       ` Jason Gunthorpe
2024-08-14 14:55         ` Alex Williamson
2024-08-14 15:20           ` Jason Gunthorpe
2024-08-12 17:00 ` [PATCH RFC/RFT] vfio/pci-quirks: Quirk for ath wireless Alex Williamson
2024-08-13 16:43   ` Jason Gunthorpe
2024-08-13 21:03     ` Alex Williamson
2024-08-13 23:37       ` Jason Gunthorpe
2024-08-15 16:59         ` Alex Williamson
2024-08-15 17:19           ` Jason Gunthorpe
2026-03-16 14:58             ` James Prestwood
2026-03-16 15:43               ` James Prestwood
