All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.18] iommu/vt-d: use max supported AGAW
@ 2023-10-17 13:09 Roger Pau Monne
  2023-10-18  1:05 ` Henry Wang
  2023-10-18  7:54 ` Jan Beulich
  0 siblings, 2 replies; 5+ messages in thread
From: Roger Pau Monne @ 2023-10-17 13:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Henry Wang, Roger Pau Monne, Kevin Tian

SAGAW is a bitmap field, with bits 1 and 2 signaling support for AGAW 1 and
AGAW 2 respectively.  According to the Intel VT-d specification, an IOMMU might
support multiple AGAW values.

The AGAW value for each device is set in the device context entry, however
there's a caveat related to the value the field supports depending on the
translation type:

"When the Translation-type (T) field indicates pass-through (010b) or
guest-mode (100b or 101b), this field must be programmed to indicate the
largest AGAW value supported by hardware."

Of the translation types listed above Xen only uses pass-through (010b), and
hence we need to make sure the context entry AGAW field is set appropriately,
or else the IOMMU will report invalid context entry errors.

To do so calculate the IOMMU supported page table levels based on the last bit
set in the SAGAW field, instead of the first one.  This also allows making use
of the widest address width supported by the IOMMU, in case multiple AGAWs are
supported.

Note that 859d11b27912 claims to replace the open-coded find_first_set_bit(),
but it's actually replacing an open coded implementation to find the last set
bit.

Fixes: 859d11b27912 ('VT-d: prune SAGAW recognition')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index ceef7359e553..be60d7573dae 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1328,7 +1328,7 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
     /* Calculate number of pagetable levels: 3 or 4. */
     sagaw = cap_sagaw(iommu->cap);
     if ( sagaw & 6 )
-        agaw = find_first_set_bit(sagaw & 6);
+        agaw = fls(sagaw & 6) - 1;
     if ( !agaw )
     {
         printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw);
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.18] iommu/vt-d: use max supported AGAW
  2023-10-17 13:09 [PATCH for-4.18] iommu/vt-d: use max supported AGAW Roger Pau Monne
@ 2023-10-18  1:05 ` Henry Wang
  2023-10-18  7:54 ` Jan Beulich
  1 sibling, 0 replies; 5+ messages in thread
From: Henry Wang @ 2023-10-18  1:05 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Xen-devel, Kevin Tian

Hi Roger,

> On Oct 17, 2023, at 21:09, Roger Pau Monne <roger.pau@citrix.com> wrote:
> 
> SAGAW is a bitmap field, with bits 1 and 2 signaling support for AGAW 1 and
> AGAW 2 respectively.  According to the Intel VT-d specification, an IOMMU might
> support multiple AGAW values.
> 
> The AGAW value for each device is set in the device context entry, however
> there's a caveat related to the value the field supports depending on the
> translation type:
> 
> "When the Translation-type (T) field indicates pass-through (010b) or
> guest-mode (100b or 101b), this field must be programmed to indicate the
> largest AGAW value supported by hardware."
> 
> Of the translation types listed above Xen only uses pass-through (010b), and
> hence we need to make sure the context entry AGAW field is set appropriately,
> or else the IOMMU will report invalid context entry errors.
> 
> To do so calculate the IOMMU supported page table levels based on the last bit
> set in the SAGAW field, instead of the first one.  This also allows making use
> of the widest address width supported by the IOMMU, in case multiple AGAWs are
> supported.
> 
> Note that 859d11b27912 claims to replace the open-coded find_first_set_bit(),
> but it's actually replacing an open coded implementation to find the last set
> bit.
> 
> Fixes: 859d11b27912 ('VT-d: prune SAGAW recognition')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.18] iommu/vt-d: use max supported AGAW
  2023-10-17 13:09 [PATCH for-4.18] iommu/vt-d: use max supported AGAW Roger Pau Monne
  2023-10-18  1:05 ` Henry Wang
@ 2023-10-18  7:54 ` Jan Beulich
  2023-10-18  8:54   ` Roger Pau Monné
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2023-10-18  7:54 UTC (permalink / raw)
  To: Roger Pau Monne, Kevin Tian; +Cc: Henry Wang, xen-devel

On 17.10.2023 15:09, Roger Pau Monne wrote:
> SAGAW is a bitmap field, with bits 1 and 2 signaling support for AGAW 1 and
> AGAW 2 respectively.  According to the Intel VT-d specification, an IOMMU might
> support multiple AGAW values.
> 
> The AGAW value for each device is set in the device context entry, however
> there's a caveat related to the value the field supports depending on the
> translation type:
> 
> "When the Translation-type (T) field indicates pass-through (010b) or
> guest-mode (100b or 101b), this field must be programmed to indicate the
> largest AGAW value supported by hardware."

Considering SAGAW=3 was reserved in earlier versions, and considering SAGAW=4
and higher continue to be reserved, how is one to write forward-compatible
code? (In retrospect I think this is what mislead me to wrongly use
find_first_set_bit().)

Furthermore, which version of the spec are you looking at? The newest public
one I found is 4.1 (-016), which only mentions pass-through, and only as a
2-bit quantity. (Doesn't matter much for the purposes of the actual code
change, but still.)

> Of the translation types listed above Xen only uses pass-through (010b), and
> hence we need to make sure the context entry AGAW field is set appropriately,
> or else the IOMMU will report invalid context entry errors.
> 
> To do so calculate the IOMMU supported page table levels based on the last bit
> set in the SAGAW field, instead of the first one.  This also allows making use
> of the widest address width supported by the IOMMU, in case multiple AGAWs are
> supported.

To truly achieve that (with the 5-level spec), ...

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1328,7 +1328,7 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>      /* Calculate number of pagetable levels: 3 or 4. */
>      sagaw = cap_sagaw(iommu->cap);
>      if ( sagaw & 6 )
> -        agaw = find_first_set_bit(sagaw & 6);
> +        agaw = fls(sagaw & 6) - 1;

... the mask here needs widening to 0xe. But see my forward-compatibility
remark above: It may need widening even further. Yet I'm not sure our code
is uniformly ready to handle levels > 4. As a result I think we need to
further alter the use of context_set_address_width(): We don't necessarily
want to use the maximum value with CONTEXT_TT_{DEV_IOTLB,MULTI_LEVEL}.
Specifically I don't think we want to use levels=5 (aw=3) there, until
such time that we support 5-level page tables (which as it looks right now
may well end up being never).

Furthermore just out of context we have

    iommu->nr_pt_levels = agaw_to_level(agaw);
    if ( min_pt_levels > iommu->nr_pt_levels )
        min_pt_levels = iommu->nr_pt_levels;

With fls() instead of find_first_set_bit() this won't be correct anymore.
Yet looking at the sole use (and depending on the resolution of the other
issue) it may be a mere matter of renaming the variable to properly
reflect its purpose.

Taking together perhaps:
- nr_pt_levels needs setting to the larger of 3 and 4, depending on what
  hardware supports, for use in non-pass-through entries,
- a new max_pt_levels field needs setting to what would result from your
  code change above, for use in pass-through entries,
- min_pt_levels could then remain as is,
- for the moment we ignore the forward-compatibility aspect, until the
  underlying principle has been clarified by Intel.

A possible further complication then is that we will end up switching
context entries between different AW values. That's not an issue when
we use CMPXCHG16B or transiently clear the present bit, but our best
effort fallback would likely be of security concern then.

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.18] iommu/vt-d: use max supported AGAW
  2023-10-18  7:54 ` Jan Beulich
@ 2023-10-18  8:54   ` Roger Pau Monné
  2023-10-18  9:14     ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Roger Pau Monné @ 2023-10-18  8:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Henry Wang, xen-devel

On Wed, Oct 18, 2023 at 09:54:15AM +0200, Jan Beulich wrote:
> On 17.10.2023 15:09, Roger Pau Monne wrote:
> > SAGAW is a bitmap field, with bits 1 and 2 signaling support for AGAW 1 and
> > AGAW 2 respectively.  According to the Intel VT-d specification, an IOMMU might
> > support multiple AGAW values.
> > 
> > The AGAW value for each device is set in the device context entry, however
> > there's a caveat related to the value the field supports depending on the
> > translation type:
> > 
> > "When the Translation-type (T) field indicates pass-through (010b) or
> > guest-mode (100b or 101b), this field must be programmed to indicate the
> > largest AGAW value supported by hardware."
> 
> Considering SAGAW=3 was reserved in earlier versions, and considering SAGAW=4
> and higher continue to be reserved, how is one to write forward-compatible
> code? (In retrospect I think this is what mislead me to wrongly use
> find_first_set_bit().)

Oh, my spec copy still has SAGAW=3 reserved, only bits 1 and 2 of
SAGAW are not reserved.

> Furthermore, which version of the spec are you looking at? The newest public
> one I found is 4.1 (-016), which only mentions pass-through, and only as a
> 2-bit quantity. (Doesn't matter much for the purposes of the actual code
> change, but still.)

It's an old version, 2.4 from June 2016.  I've now picked up the last
4.1 version.  I indeed see the changes you mention, so will update the
commit message accordingly to pick the wording from the new spec
version (even if we don't care about the guest-mode.

I'm kind of confused by the removal of the guest-modes, but we didn't
use those anyway.

> > Of the translation types listed above Xen only uses pass-through (010b), and
> > hence we need to make sure the context entry AGAW field is set appropriately,
> > or else the IOMMU will report invalid context entry errors.
> > 
> > To do so calculate the IOMMU supported page table levels based on the last bit
> > set in the SAGAW field, instead of the first one.  This also allows making use
> > of the widest address width supported by the IOMMU, in case multiple AGAWs are
> > supported.
> 
> To truly achieve that (with the 5-level spec), ...
> 
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1328,7 +1328,7 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
> >      /* Calculate number of pagetable levels: 3 or 4. */
> >      sagaw = cap_sagaw(iommu->cap);
> >      if ( sagaw & 6 )
> > -        agaw = find_first_set_bit(sagaw & 6);
> > +        agaw = fls(sagaw & 6) - 1;
> 
> ... the mask here needs widening to 0xe. But see my forward-compatibility
> remark above: It may need widening even further. Yet I'm not sure our code
> is uniformly ready to handle levels > 4.

Hard to tell, I'm not sure we have a system with SAGAW bit 3 set to
test with, will have to check.

> As a result I think we need to
> further alter the use of context_set_address_width(): We don't necessarily
> want to use the maximum value with CONTEXT_TT_{DEV_IOTLB,MULTI_LEVEL}.
> Specifically I don't think we want to use levels=5 (aw=3) there, until
> such time that we support 5-level page tables (which as it looks right now
> may well end up being never).
> 
> Furthermore just out of context we have
> 
>     iommu->nr_pt_levels = agaw_to_level(agaw);
>     if ( min_pt_levels > iommu->nr_pt_levels )
>         min_pt_levels = iommu->nr_pt_levels;
> 
> With fls() instead of find_first_set_bit() this won't be correct anymore.

I think this is correct as a long as the context entry address width
field is forced to iommu->nr_pt_levels.  min_pt_levels needs to
reflect the minimal paging level used by any IOMMU on the system, even
if smaller page table levels are supported by the IOMMUs those are not
relevant given the unconditional setting of iommu->nr_pt_levels in the
context entry.

> Yet looking at the sole use (and depending on the resolution of the other
> issue) it may be a mere matter of renaming the variable to properly
> reflect its purpose.
> 
> Taking together perhaps:
> - nr_pt_levels needs setting to the larger of 3 and 4, depending on what
>   hardware supports, for use in non-pass-through entries,
> - a new max_pt_levels field needs setting to what would result from your
>   code change above, for use in pass-through entries,

It needs to be a per-IOMMU field, as I would assume IOMMUs can have
different page table level support on the same system?

> - min_pt_levels could then remain as is,
> - for the moment we ignore the forward-compatibility aspect, until the
>   underlying principle has been clarified by Intel.
> 
> A possible further complication then is that we will end up switching
> context entries between different AW values. That's not an issue when
> we use CMPXCHG16B or transiently clear the present bit, but our best
> effort fallback would likely be of security concern then.

We would need to update the AW context entry field unconditionally in
domain_context_mapping_one().

Hm, it's likely more change than what I was expecting to perform at
this point in the release, but I guess we cannot ignore the fact that
SAGAW might now have bit 3 set and hence passthrough mode is broken on
such systems.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH for-4.18] iommu/vt-d: use max supported AGAW
  2023-10-18  8:54   ` Roger Pau Monné
@ 2023-10-18  9:14     ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2023-10-18  9:14 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Kevin Tian, Henry Wang, xen-devel

On 18.10.2023 10:54, Roger Pau Monné wrote:
> On Wed, Oct 18, 2023 at 09:54:15AM +0200, Jan Beulich wrote:
>> Taking together perhaps:
>> - nr_pt_levels needs setting to the larger of 3 and 4, depending on what
>>   hardware supports, for use in non-pass-through entries,
>> - a new max_pt_levels field needs setting to what would result from your
>>   code change above, for use in pass-through entries,
> 
> It needs to be a per-IOMMU field, as I would assume IOMMUs can have
> different page table level support on the same system?

Right, hence why I also said "field".

>> - min_pt_levels could then remain as is,
>> - for the moment we ignore the forward-compatibility aspect, until the
>>   underlying principle has been clarified by Intel.
>>
>> A possible further complication then is that we will end up switching
>> context entries between different AW values. That's not an issue when
>> we use CMPXCHG16B or transiently clear the present bit, but our best
>> effort fallback would likely be of security concern then.
> 
> We would need to update the AW context entry field unconditionally in
> domain_context_mapping_one().

Plus (just so it's not missed) purge the corresponding assertion.

> Hm, it's likely more change than what I was expecting to perform at
> this point in the release, but I guess we cannot ignore the fact that
> SAGAW might now have bit 3 set and hence passthrough mode is broken on
> such systems.

Ideally yes. Of course for the immediate purpose we might go with the
smaller change, but then with the description mentioning (and
justifying) the omission. If there are 5-level IOMMUs in the wild, I'm
afraid that wouldn't be a (good) option though. Otoh people shouldn't
be using hwdom-pass-through mode in the first place ...

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-10-18  9:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-17 13:09 [PATCH for-4.18] iommu/vt-d: use max supported AGAW Roger Pau Monne
2023-10-18  1:05 ` Henry Wang
2023-10-18  7:54 ` Jan Beulich
2023-10-18  8:54   ` Roger Pau Monné
2023-10-18  9:14     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.