* [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
@ 2025-07-14 8:00 David Woodhouse
2025-07-14 14:28 ` Yi Liu
2025-08-02 5:38 ` Michael Tokarev
0 siblings, 2 replies; 16+ messages in thread
From: David Woodhouse @ 2025-07-14 8:00 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Le Tan, kib, jhb
Cc: Yi Liu, Clément Mathieu--Drif, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 2961 bytes --]
From: David Woodhouse <dwmw@amazon.co.uk>
FreeBSD does both, and this appears to be perfectly valid. The VT-d
spec even talks about the ordering (the status write should be done
first, unsurprisingly).
We certainly shouldn't assert() and abort QEMU if the guest asks for
both.
Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
v2:
• Only generate the interrupt once.
• Spaces around bitwise OR.
This stops QEMU crashing, but I still can't get FreeBSD to boot and use
CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
interrupt remapping, or the native 15-bit APIC ID enlightenment.
cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
hw/i386/intel_iommu.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 69d72ad35c..851c4656c5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
{
uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
+ bool ret = true;
if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
__func__, "wait")) {
@@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
uint32_t status_data = (uint32_t)(inv_desc->lo >>
VTD_INV_DESC_WAIT_DATA_SHIFT);
- assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
-
/* FIXME: need to be masked with HAW? */
dma_addr_t status_addr = inv_desc->hi;
trace_vtd_inv_desc_wait_sw(status_addr, status_data);
@@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
&status_data, sizeof(status_data),
MEMTXATTRS_UNSPECIFIED)) {
trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
- return false;
+ ret = false;
}
- } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
+ }
+
+ if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
/* Interrupt flag */
vtd_generate_completion_event(s);
- } else {
+ }
+
+ if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
" (unknown type)", __func__, inv_desc->hi,
inv_desc->lo);
return false;
}
- return true;
+ return ret;
}
static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
--
2.43.0
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-14 8:00 [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait David Woodhouse
@ 2025-07-14 14:28 ` Yi Liu
2025-07-14 16:41 ` David Woodhouse
2025-08-02 5:38 ` Michael Tokarev
1 sibling, 1 reply; 16+ messages in thread
From: Yi Liu @ 2025-07-14 14:28 UTC (permalink / raw)
To: David Woodhouse, Michael S. Tsirkin, Jason Wang, Le Tan, kib, jhb
Cc: Clément Mathieu--Drif, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, qemu-devel
Hi David,
On 2025/7/14 16:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).
interesting. Have you tried setting both flags on baremetal and the hw
gives you both the status code and an interrupt?
> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
>
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
> • Only generate the interrupt once.
> • Spaces around bitwise OR.
>
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
>
>
> hw/i386/intel_iommu.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 69d72ad35c..851c4656c5 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2822,6 +2822,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> {
> uint64_t mask[4] = {VTD_INV_DESC_WAIT_RSVD_LO, VTD_INV_DESC_WAIT_RSVD_HI,
> VTD_INV_DESC_ALL_ONE, VTD_INV_DESC_ALL_ONE};
> + bool ret = true;
>
> if (!vtd_inv_desc_reserved_check(s, inv_desc, mask, false,
> __func__, "wait")) {
> @@ -2833,8 +2834,6 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> uint32_t status_data = (uint32_t)(inv_desc->lo >>
> VTD_INV_DESC_WAIT_DATA_SHIFT);
>
> - assert(!(inv_desc->lo & VTD_INV_DESC_WAIT_IF));
> -
> /* FIXME: need to be masked with HAW? */
> dma_addr_t status_addr = inv_desc->hi;
> trace_vtd_inv_desc_wait_sw(status_addr, status_data);
> @@ -2843,18 +2842,22 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
> &status_data, sizeof(status_data),
> MEMTXATTRS_UNSPECIFIED)) {
> trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
> - return false;
> + ret = false;
> }
> - } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
> + }
> +
> + if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
> /* Interrupt flag */
> vtd_generate_completion_event(s);
> - } else {
> + }
> +
> + if (!(inv_desc->lo & (VTD_INV_DESC_WAIT_IF | VTD_INV_DESC_WAIT_SW))) {
> error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
> " (unknown type)", __func__, inv_desc->hi,
> inv_desc->lo);
> return false;
> }
I think this "if branch" can be moved just after the inv_desc non-zero
reserved bit checking. Hence you don't need a ret at all. :) btw. I'm
also asking if VT-d spec allows it or not. So let's wait for a while..
> - return true;
> + return ret;
> }
>
> static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-14 14:28 ` Yi Liu
@ 2025-07-14 16:41 ` David Woodhouse
2025-07-14 21:22 ` Konstantin Belousov via
0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2025-07-14 16:41 UTC (permalink / raw)
To: Yi Liu, Michael S. Tsirkin, Jason Wang, Le Tan, kib, jhb
Cc: Clément Mathieu--Drif, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, qemu-devel
On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>Hi David,
>
>On 2025/7/14 16:00, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>> spec even talks about the ordering (the status write should be done
>> first, unsurprisingly).
>
>interesting. Have you tried setting both flags on baremetal and the hw
>gives you both the status code and an interrupt?
I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:
Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an
invalidation wait command as follows:
• If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of
the status data to the status address.
• If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the
Invalidation Completion Status Register. An invalidation completion interrupt may be generated as
described in the following section
>I think this "if branch" can be moved just after the inv_desc non-zero
>reserved bit checking. Hence you don't need a ret at all. :)
We want to return false if the memory write fails, and the interrupt has to happen afterwards.
> btw. I'm
>also asking if VT-d spec allows it or not. So let's wait for a while..
Ok.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-14 16:41 ` David Woodhouse
@ 2025-07-14 21:22 ` Konstantin Belousov via
2025-07-15 6:11 ` CLEMENT MATHIEU--DRIF
0 siblings, 1 reply; 16+ messages in thread
From: Konstantin Belousov via @ 2025-07-14 21:22 UTC (permalink / raw)
To: David Woodhouse
Cc: Yi Liu, Michael S. Tsirkin, Jason Wang, Le Tan, jhb,
Clément Mathieu--Drif, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, qemu-devel
On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
> >Hi David,
> >
> >On 2025/7/14 16:00, David Woodhouse wrote:
> >> From: David Woodhouse <dwmw@amazon.co.uk>
> >>
> >> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> >> spec even talks about the ordering (the status write should be done
> >> first, unsurprisingly).
> >
> >interesting. Have you tried setting both flags on baremetal and the hw
> >gives you both the status code and an interrupt?
>
> I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:
>
> Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an
> invalidation wait command as follows:
> • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of
> the status data to the status address.
> • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the
> Invalidation Completion Status Register. An invalidation completion interrupt may be generated as
> described in the following section
>
Yes, and the FreeBSD DMAR code uses that, and relies on that, as was
mentioned earlier in the mail thread.
>
>
> >I think this "if branch" can be moved just after the inv_desc non-zero
> >reserved bit checking. Hence you don't need a ret at all. :)
>
> We want to return false if the memory write fails, and the interrupt has to happen afterwards.
>
> > btw. I'm
> >also asking if VT-d spec allows it or not. So let's wait for a while..
>
> Ok.
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-14 21:22 ` Konstantin Belousov via
@ 2025-07-15 6:11 ` CLEMENT MATHIEU--DRIF
2025-07-15 8:27 ` David Woodhouse
2025-07-15 12:35 ` Yi Liu
0 siblings, 2 replies; 16+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-07-15 6:11 UTC (permalink / raw)
To: Konstantin Belousov, David Woodhouse
Cc: Yi Liu, Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe.
>
>
> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>> Hi David,
>>>
>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>
>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>> spec even talks about the ordering (the status write should be done
>>>> first, unsurprisingly).
Are you talking about the ordering constraint mentioned in bullet
"Page-request Drain (PD)"?
>>>
>>> interesting. Have you tried setting both flags on baremetal and the hw
>>> gives you both the status code and an interrupt?
>>
>> I see no reason why it shouldn't. The spec (§6.5.2.8) gives no that the IF and SW bits should be mutually exclusive and even talks about ordering:
>>
>> Section 6.5.2.11 describes queued invalidation ordering considerations. Hardware completes an
>> invalidation wait command as follows:
>> • If a status write is specified in the wait descriptor (SW=1), hardware performs a coherent write of
>> the status data to the status address.
>> • If an interrupt is requested in the wait descriptor (IF=1), hardware sets the IWC field in the
>> Invalidation Completion Status Register. An invalidation completion interrupt may be generated as
>> described in the following section
>>
>
> Yes, and the FreeBSD DMAR code uses that, and relies on that, as was
> mentioned earlier in the mail thread.
>
>>
>>
>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>> reserved bit checking. Hence you don't need a ret at all. :)
>>
>> We want to return false if the memory write fails, and the interrupt has to happen afterwards.
Per spec: "Hardware behavior is undefined if the Status Address
specified is not an address route-able to memory"
Do we want to trigger the interrupt even when the DMA fails?
>>
>>> btw. I'm
>>> also asking if VT-d spec allows it or not. So let's wait for a while..
Thanks Yi
>>
>> Ok.
>>
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 6:11 ` CLEMENT MATHIEU--DRIF
@ 2025-07-15 8:27 ` David Woodhouse
2025-07-15 12:27 ` CLEMENT MATHIEU--DRIF
2025-07-15 12:35 ` Yi Liu
1 sibling, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2025-07-15 8:27 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, Konstantin Belousov
Cc: Yi Liu, Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
[-- Attachment #1: Type: text/plain, Size: 2287 bytes --]
On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>
>
> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > Caution: External email. Do not open attachments or click links,
> > unless this email comes from a known sender and you know the
> > content is safe.
> >
> >
> > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > wrote:
> > > > Hi David,
> > > >
> > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > >
> > > > > FreeBSD does both, and this appears to be perfectly valid. The VT-d
> > > > > spec even talks about the ordering (the status write should be done
> > > > > first, unsurprisingly).
>
> Are you talking about the ordering constraint mentioned in bullet
> "Page-request Drain (PD)"?
>
No, in the v4.0 spec it's just below that bullet list, at the bottom of
§6.5.2.8.
The wording in §6.5.2.9 is even clearer:
"The invalidation completion event interrupt must push any in-flight
invalidation completion status writes, including status writes that may
have originated from the same inv_wait_dsc for which the interrupt was
generated."
> > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > reserved bit checking. Hence you don't need a ret at all. :)
> > >
> > > We want to return false if the memory write fails, and the
> > > interrupt has to happen afterwards.
>
> Per spec: "Hardware behavior is undefined if the Status Address
> specified is not an address route-able to memory"
>
> Do we want to trigger the interrupt even when the DMA fails?
Yes, we do. That's a quality of implementation issue. Just because the
behaviour is 'undefined' and theoretically gives us permission to do
whatever we like to the guest, we should still be as sensible as
possible.
> > >
> > > > btw. I'm
> > > > also asking if VT-d spec allows it or not. So let's wait for a
> > > > while..
Rapidly losing interest in this conversation. QEMU currently has a
guest-triggerable crash, for crying out loud. I'd like to fix it and
move on to finding out why FreeBSD doesn't work *even* when QEMU
doesn't abort...
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 8:27 ` David Woodhouse
@ 2025-07-15 12:27 ` CLEMENT MATHIEU--DRIF
2025-07-16 4:01 ` Yi Liu
0 siblings, 1 reply; 16+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-07-15 12:27 UTC (permalink / raw)
To: David Woodhouse, Konstantin Belousov
Cc: Yi Liu, Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 15/07/2025 10:27 am, David Woodhouse wrote:
> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>
>>
>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>> Caution: External email. Do not open attachments or click links,
>>> unless this email comes from a known sender and you know the
>>> content is safe.
>>>
>>>
>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>> wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>
>>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>>> spec even talks about the ordering (the status write should be done
>>>>>> first, unsurprisingly).
>>
>> Are you talking about the ordering constraint mentioned in bullet
>> "Page-request Drain (PD)"?
>>
>
> No, in the v4.0 spec it's just below that bullet list, at the bottom of
> §6.5.2.8.
>
> The wording in §6.5.2.9 is even clearer:
>
> "The invalidation completion event interrupt must push any in-flight
> invalidation completion status writes, including status writes that may
> have originated from the same inv_wait_dsc for which the interrupt was
> generated."
Fine
>
>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>
>>>> We want to return false if the memory write fails, and the
>>>> interrupt has to happen afterwards.
>>
>> Per spec: "Hardware behavior is undefined if the Status Address
>> specified is not an address route-able to memory"
>>
>> Do we want to trigger the interrupt even when the DMA fails?
>
> Yes, we do. That's a quality of implementation issue. Just because the
> behaviour is 'undefined' and theoretically gives us permission to do
> whatever we like to the guest, we should still be as sensible as
> possible.
lgtm
>
>
>>>>
>>>>> btw. I'm
>>>>> also asking if VT-d spec allows it or not. So let's wait for a
>>>>> while..
>
> Rapidly losing interest in this conversation. QEMU currently has a
> guest-triggerable crash, for crying out loud. I'd like to fix it and
> move on to finding out why FreeBSD doesn't work *even* when QEMU
> doesn't abort...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 6:11 ` CLEMENT MATHIEU--DRIF
2025-07-15 8:27 ` David Woodhouse
@ 2025-07-15 12:35 ` Yi Liu
2025-07-15 13:59 ` CLEMENT MATHIEU--DRIF
2025-07-22 12:04 ` David Woodhouse
1 sibling, 2 replies; 16+ messages in thread
From: Yi Liu @ 2025-07-15 12:35 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, Konstantin Belousov, David Woodhouse
Cc: Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote:
>
>
> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>> Caution: External email. Do not open attachments or click links, unless this email comes from a known sender and you know the content is safe.
>>
>>
>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>>> Hi David,
>>>>
>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>
>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>> spec even talks about the ordering (the status write should be done
>>>>> first, unsurprisingly).
>
> Are you talking about the ordering constraint mentioned in bullet
> "Page-request Drain (PD)"?
David is talking about the IF and SW flags. And he is correct. Spec has
below sentence. It means a wait descriptor can have both IF and SW set
and indeed completion interrupt happens later than status write. Let's
go on refine the patch. :)
"The invalidation completion event interrupt must push any in-flight
invalidation completion status
writes, including status writes that may have originated from the same
inv_wait_dsc for which the interrupt was generated."
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 12:35 ` Yi Liu
@ 2025-07-15 13:59 ` CLEMENT MATHIEU--DRIF
2025-07-22 12:04 ` David Woodhouse
1 sibling, 0 replies; 16+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-07-15 13:59 UTC (permalink / raw)
To: Yi Liu, Konstantin Belousov, David Woodhouse
Cc: Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 15/07/2025 2:35 pm, Yi Liu wrote:
> Caution: External email. Do not open attachments or click links, unless
> this email comes from a known sender and you know the content is safe.
>
>
> On 2025/7/15 14:11, CLEMENT MATHIEU--DRIF wrote:
>>
>>
>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>> Caution: External email. Do not open attachments or click links,
>>> unless this email comes from a known sender and you know the content
>>> is safe.
>>>
>>>
>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com> wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>
>>>>>> FreeBSD does both, and this appears to be perfectly valid. The VT-d
>>>>>> spec even talks about the ordering (the status write should be done
>>>>>> first, unsurprisingly).
>>
>> Are you talking about the ordering constraint mentioned in bullet
>> "Page-request Drain (PD)"?
>
> David is talking about the IF and SW flags. And he is correct. Spec has
> below sentence. It means a wait descriptor can have both IF and SW set
> and indeed completion interrupt happens later than status write. Let's
> go on refine the patch. :)
>
> "The invalidation completion event interrupt must push any in-flight
> invalidation completion status
> writes, including status writes that may have originated from the same
> inv_wait_dsc for which the interrupt was generated."
Yep, I agree
>
>
> Regards,
> Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 12:27 ` CLEMENT MATHIEU--DRIF
@ 2025-07-16 4:01 ` Yi Liu
2025-07-16 4:05 ` Konstantin Belousov
0 siblings, 1 reply; 16+ messages in thread
From: Yi Liu @ 2025-07-16 4:01 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, David Woodhouse, Konstantin Belousov
Cc: Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
>
>
> On 15/07/2025 10:27 am, David Woodhouse wrote:
>> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>>
>>>
>>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>>>
>>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>>> wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>>
>>
>>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>>
>>>>> We want to return false if the memory write fails, and the
>>>>> interrupt has to happen afterwards.
>>>
>>> Per spec: "Hardware behavior is undefined if the Status Address
>>> specified is not an address route-able to memory"
>>>
>>> Do we want to trigger the interrupt even when the DMA fails?
>>
>> Yes, we do. That's a quality of implementation issue. Just because the
>> behaviour is 'undefined' and theoretically gives us permission to do
>> whatever we like to the guest, we should still be as sensible as
>> possible.
>
Personally, I'm fine with generating the interrupt even the status write
failed. But to avoid potential conflict, I've also raised this question to
the VT-d spec owner. Haven't got a clear answer yet. To further understand
this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
flag. What is the usage model here. Would software consider that all the QI
descriptors prior to this specific wait descriptor has succeeded when
either the interrupt got invoked or the expected status is written back?
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-16 4:01 ` Yi Liu
@ 2025-07-16 4:05 ` Konstantin Belousov
2025-07-16 9:23 ` Yi Liu
0 siblings, 1 reply; 16+ messages in thread
From: Konstantin Belousov @ 2025-07-16 4:05 UTC (permalink / raw)
To: Yi Liu
Cc: CLEMENT MATHIEU--DRIF, David Woodhouse, Konstantin Belousov,
Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
> On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
> >
> >
> > On 15/07/2025 10:27 am, David Woodhouse wrote:
> > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
> > > >
> > > >
> > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > > > >
> > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > > > > wrote:
> > > > > > > Hi David,
> > > > > > >
> > > > > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > > > >
> > >
> > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > > > > reserved bit checking. Hence you don't need a ret at all. :)
> > > > > >
> > > > > > We want to return false if the memory write fails, and the
> > > > > > interrupt has to happen afterwards.
> > > >
> > > > Per spec: "Hardware behavior is undefined if the Status Address
> > > > specified is not an address route-able to memory"
> > > >
> > > > Do we want to trigger the interrupt even when the DMA fails?
> > >
> > > Yes, we do. That's a quality of implementation issue. Just because the
> > > behaviour is 'undefined' and theoretically gives us permission to do
> > > whatever we like to the guest, we should still be as sensible as
> > > possible.
> >
>
> Personally, I'm fine with generating the interrupt even the status write
> failed. But to avoid potential conflict, I've also raised this question to
> the VT-d spec owner. Haven't got a clear answer yet. To further understand
> this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
> flag. What is the usage model here. Would software consider that all the QI
> descriptors prior to this specific wait descriptor has succeeded when
> either the interrupt got invoked or the expected status is written back?
FreeBSD queues invalidations, each invalidation has the gen number. To
know that some invalidation finished, FreeBSD waits for the interrupt,
we do not scan the invalidation sequence word otherwise. There might be
further generations of the invalidation descriptors in flight when we
get the interrupt, which means that we need to know which generation is
finished.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-16 4:05 ` Konstantin Belousov
@ 2025-07-16 9:23 ` Yi Liu
2025-07-16 9:36 ` Konstantin Belousov via
0 siblings, 1 reply; 16+ messages in thread
From: Yi Liu @ 2025-07-16 9:23 UTC (permalink / raw)
To: Konstantin Belousov
Cc: CLEMENT MATHIEU--DRIF, David Woodhouse, Konstantin Belousov,
Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
On 2025/7/16 12:05, Konstantin Belousov wrote:
> On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
>> On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
>>>
>>>
>>> On 15/07/2025 10:27 am, David Woodhouse wrote:
>>>> On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
>>>>>
>>>>>
>>>>> On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
>>>>>>
>>>>>> On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
>>>>>>> On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
>>>>>>> wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 2025/7/14 16:00, David Woodhouse wrote:
>>>>>>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>>>>>>
>>>>
>>>>>>>> I think this "if branch" can be moved just after the inv_desc non-zero
>>>>>>>> reserved bit checking. Hence you don't need a ret at all. :)
>>>>>>>
>>>>>>> We want to return false if the memory write fails, and the
>>>>>>> interrupt has to happen afterwards.
>>>>>
>>>>> Per spec: "Hardware behavior is undefined if the Status Address
>>>>> specified is not an address route-able to memory"
>>>>>
>>>>> Do we want to trigger the interrupt even when the DMA fails?
>>>>
>>>> Yes, we do. That's a quality of implementation issue. Just because the
>>>> behaviour is 'undefined' and theoretically gives us permission to do
>>>> whatever we like to the guest, we should still be as sensible as
>>>> possible.
>>>
>>
>> Personally, I'm fine with generating the interrupt even the status write
>> failed. But to avoid potential conflict, I've also raised this question to
>> the VT-d spec owner. Haven't got a clear answer yet. To further understand
>> this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
>> flag. What is the usage model here. Would software consider that all the QI
>> descriptors prior to this specific wait descriptor has succeeded when
>> either the interrupt got invoked or the expected status is written back?
>
> FreeBSD queues invalidations, each invalidation has the gen number. To
> know that some invalidation finished, FreeBSD waits for the interrupt,
> we do not scan the invalidation sequence word otherwise. There might be
> further generations of the invalidation descriptors in flight when we
> get the interrupt, which means that we need to know which generation is
> finished.
thanks for the explanation. So software still relies on checking the
written back status of the wait descriptor to identify finished
invalidation. If so might be better to generate interrupt when status write
is succeeded? Otherwise, the interrupt is meaningless to software. Does the
current software implementation rely on this interrupt even status write
failed?
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-16 9:23 ` Yi Liu
@ 2025-07-16 9:36 ` Konstantin Belousov via
0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Belousov via @ 2025-07-16 9:36 UTC (permalink / raw)
To: Yi Liu
Cc: CLEMENT MATHIEU--DRIF, David Woodhouse, Michael S. Tsirkin,
Jason Wang, Le Tan, jhb@freebsd.org, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost,
qemu-devel@nongnu.org
On Wed, Jul 16, 2025 at 05:23:57PM +0800, Yi Liu wrote:
> On 2025/7/16 12:05, Konstantin Belousov wrote:
> > On Wed, Jul 16, 2025 at 12:01:44PM +0800, Yi Liu wrote:
> > > On 2025/7/15 20:27, CLEMENT MATHIEU--DRIF wrote:
> > > >
> > > >
> > > > On 15/07/2025 10:27 am, David Woodhouse wrote:
> > > > > On Tue, 2025-07-15 at 06:11 +0000, CLEMENT MATHIEU--DRIF wrote:
> > > > > >
> > > > > >
> > > > > > On 14/07/2025 11:22 pm, Konstantin Belousov wrote:
> > > > > > >
> > > > > > > On Mon, Jul 14, 2025 at 05:41:22PM +0100, David Woodhouse wrote:
> > > > > > > > On 14 July 2025 15:28:09 GMT+01:00, Yi Liu <yi.l.liu@intel.com>
> > > > > > > > wrote:
> > > > > > > > > Hi David,
> > > > > > > > >
> > > > > > > > > On 2025/7/14 16:00, David Woodhouse wrote:
> > > > > > > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > > > > > >
> > > > >
> > > > > > > > > I think this "if branch" can be moved just after the inv_desc non-zero
> > > > > > > > > reserved bit checking. Hence you don't need a ret at all. :)
> > > > > > > >
> > > > > > > > We want to return false if the memory write fails, and the
> > > > > > > > interrupt has to happen afterwards.
> > > > > >
> > > > > > Per spec: "Hardware behavior is undefined if the Status Address
> > > > > > specified is not an address route-able to memory"
> > > > > >
> > > > > > Do we want to trigger the interrupt even when the DMA fails?
> > > > >
> > > > > Yes, we do. That's a quality of implementation issue. Just because the
> > > > > behaviour is 'undefined' and theoretically gives us permission to do
> > > > > whatever we like to the guest, we should still be as sensible as
> > > > > possible.
> > > >
> > >
> > > Personally, I'm fine with generating the interrupt even the status write
> > > failed. But to avoid potential conflict, I've also raised this question to
> > > the VT-d spec owner. Haven't got a clear answer yet. To further understand
> > > this, may I ask some dumb questions here. Why FreeBSD set both SW and IF
> > > flag. What is the usage model here. Would software consider that all the QI
> > > descriptors prior to this specific wait descriptor has succeeded when
> > > either the interrupt got invoked or the expected status is written back?
> >
> > FreeBSD queues invalidations, each invalidation has the gen number. To
> > know that some invalidation finished, FreeBSD waits for the interrupt,
> > we do not scan the invalidation sequence word otherwise. There might be
> > further generations of the invalidation descriptors in flight when we
> > get the interrupt, which means that we need to know which generation is
> > finished.
>
> thanks for the explanation. So software still relies on checking the
> written back status of the wait descriptor to identify finished
> invalidation. If so might be better to generate interrupt when status write
> is succeeded? Otherwise, the interrupt is meaningless to software. Does the
> current software implementation rely on this interrupt even status write
> failed?
Yes, if the inv sequence word is not updated past the requested value,
the interrupt is basically ignored.
OTOH, the FreeBSD DMAR driver (should) never program non-main memory
physical address into the address field of the inv descriptor. And we
use the same location for all invalidations issued from the single DMAR
unit.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-15 12:35 ` Yi Liu
2025-07-15 13:59 ` CLEMENT MATHIEU--DRIF
@ 2025-07-22 12:04 ` David Woodhouse
2025-08-01 15:09 ` Liu, Yi L
1 sibling, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2025-07-22 12:04 UTC (permalink / raw)
To: Yi Liu, CLEMENT MATHIEU--DRIF, Konstantin Belousov
Cc: Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
[-- Attachment #1: Type: text/plain, Size: 511 bytes --]
On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote:
>
> David is talking about the IF and SW flags. And he is correct. Spec has
> below sentence. It means a wait descriptor can have both IF and SW set
> and indeed completion interrupt happens later than status write. Let's
> go on refine the patch. :)
Are there any refinements you believe are necessary?
https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's
going to be clean, and I think we resolved everything we talked about.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-22 12:04 ` David Woodhouse
@ 2025-08-01 15:09 ` Liu, Yi L
0 siblings, 0 replies; 16+ messages in thread
From: Liu, Yi L @ 2025-08-01 15:09 UTC (permalink / raw)
To: David Woodhouse, CLEMENT MATHIEU--DRIF, Konstantin Belousov
Cc: Michael S. Tsirkin, Jason Wang, Le Tan, jhb@freebsd.org,
Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, qemu-devel@nongnu.org
> From: David Woodhouse <dwmw2@infradead.org>
> Sent: Tuesday, July 22, 2025 8:04 PM
>
> On Tue, 2025-07-15 at 20:35 +0800, Yi Liu wrote:
> >
> > David is talking about the IF and SW flags. And he is correct. Spec has
> > below sentence. It means a wait descriptor can have both IF and SW set
> > and indeed completion interrupt happens later than status write. Let's
> > go on refine the patch. :)
>
> Are there any refinements you believe are necessary?
No extra change I suppose. I was just waiting for a respond from VT-d arch. I've now
got a clear answer. It's ok to generate the interrupt no matter the status write failed
or not if IF is set. Sorry for the delayed respond.
> https://gitlab.com/dwmw2/qemu/-/pipelines/1941324674 looks like it's
> going to be clean, and I think we resolved everything we talked about.
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Thanks,
Yi Liu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait
2025-07-14 8:00 [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait David Woodhouse
2025-07-14 14:28 ` Yi Liu
@ 2025-08-02 5:38 ` Michael Tokarev
1 sibling, 0 replies; 16+ messages in thread
From: Michael Tokarev @ 2025-08-02 5:38 UTC (permalink / raw)
To: David Woodhouse, Michael S. Tsirkin, Jason Wang, Le Tan, kib, jhb
Cc: Yi Liu, Clément Mathieu--Drif, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost, qemu-devel,
qemu-stable
On 14.07.2025 11:00, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> FreeBSD does both, and this appears to be perfectly valid. The VT-d
> spec even talks about the ordering (the status write should be done
> first, unsurprisingly).
>
> We certainly shouldn't assert() and abort QEMU if the guest asks for
> both.
>
> Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> v2:
> • Only generate the interrupt once.
> • Spaces around bitwise OR.
>
> This stops QEMU crashing, but I still can't get FreeBSD to boot and use
> CPUs with APIC ID > 255 using *either* Intel or AMD IOMMU with
> interrupt remapping, or the native 15-bit APIC ID enlightenment.
> cf. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288122
>
>
> hw/i386/intel_iommu.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
This looks like a qemu-stable material (for 10.0).
Please let me know if it is not.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-08-02 5:39 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-14 8:00 [PATCH v2] intel_iommu: Allow both Status Write and Interrupt Flag in QI wait David Woodhouse
2025-07-14 14:28 ` Yi Liu
2025-07-14 16:41 ` David Woodhouse
2025-07-14 21:22 ` Konstantin Belousov via
2025-07-15 6:11 ` CLEMENT MATHIEU--DRIF
2025-07-15 8:27 ` David Woodhouse
2025-07-15 12:27 ` CLEMENT MATHIEU--DRIF
2025-07-16 4:01 ` Yi Liu
2025-07-16 4:05 ` Konstantin Belousov
2025-07-16 9:23 ` Yi Liu
2025-07-16 9:36 ` Konstantin Belousov via
2025-07-15 12:35 ` Yi Liu
2025-07-15 13:59 ` CLEMENT MATHIEU--DRIF
2025-07-22 12:04 ` David Woodhouse
2025-08-01 15:09 ` Liu, Yi L
2025-08-02 5:38 ` Michael Tokarev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).