* IOMMU faults after S3
@ 2026-03-27 10:19 Marek Marczykowski-Górecki
  2026-03-27 10:56 ` Teddy Astie
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-03-27 10:19 UTC
  To: xen-devel; +Cc: Jan Beulich

Hi,

I noticed that on some systems there are a lot of IOMMU faults after
S3. I can also see it on a laptop with MTL, but it affects the ADL
gitlab runner too:

    https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
    (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
    (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
    (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
    (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear

Interestingly, the 0000:00:1e.6 device is not even listed by lspci.

The issue is present only on staging, not staging-4.21.

Bisect says:

5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
Author: Jan Beulich <jbeulich@suse.com>
Date:   Thu Jan 22 14:13:35 2026 +0100

    x86/HPET: drop .set_affinity hook
    
    No IRQ balancing is supposed to be happening on the broadcast IRQs. The
    only entity responsible for fiddling with the CPU affinities is
    set_channel_irq_affinity(). They shouldn't even be fiddled with when
    offlining a CPU: A CPU going down can't at the same time be idle. Some
    properties (->arch.cpu_mask in particular) may transiently reference an
    offline CPU, but that'll be adjusted as soon as a channel goes into active
    use again.
    
    Along with adjusting fixup_irqs() (in a more general way, i.e. covering all
    vectors which are marked in use globally), also adjust section placement of
    used_vectors.
    
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

 xen/arch/x86/hpet.c | 17 -----------------
 xen/arch/x86/irq.c  | 12 ++++++++----
 2 files changed, 8 insertions(+), 21 deletions(-)


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: IOMMU faults after S3
  2026-03-27 10:19 IOMMU faults after S3 Marek Marczykowski-Górecki
@ 2026-03-27 10:56 ` Teddy Astie
  2026-03-27 10:59   ` Marek Marczykowski-Górecki
  2026-03-27 12:23 ` Andrew Cooper
  2026-04-01  7:14 ` Jan Beulich
  2 siblings, 1 reply; 32+ messages in thread
From: Teddy Astie @ 2026-03-27 10:56 UTC
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Jan Beulich

Le 27/03/2026 à 11:19, Marek Marczykowski-Górecki a écrit :
> Hi,
>
> I noticed that on some systems, there are a lot of IOMMU faults after
> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> gitlab runner:
>
>      https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>      (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>      (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>      (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>      (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>
> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>
> The issue is present only on staging, not staging-4.21.
>

Is there a 1e.0 device? That could be a "phantom" PCI device.

> Bisect says:
>
> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Thu Jan 22 14:13:35 2026 +0100
>
>      x86/HPET: drop .set_affinity hook
>
>      No IRQ balancing is supposed to be happening on the broadcast IRQs. The
>      only entity responsible for fiddling with the CPU affinities is
>      set_channel_irq_affinity(). They shouldn't even be fiddled with when
>      offlining a CPU: A CPU going down can't at the same time be idle. Some
>      properties (->arch.cpu_mask in particular) may transiently reference an
>      offline CPU, but that'll be adjusted as soon as a channel goes into active
>      use again.
>
>      Along with adjusting fixup_irqs() (in a more general way, i.e. covering all
>      vectors which are marked in use globally), also adjust section placement of
>      used_vectors.
>
>      Signed-off-by: Jan Beulich <jbeulich@suse.com>
>      Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
>   xen/arch/x86/hpet.c | 17 -----------------
>   xen/arch/x86/irq.c  | 12 ++++++++----
>   2 files changed, 8 insertions(+), 21 deletions(-)

--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




* Re: IOMMU faults after S3
  2026-03-27 10:56 ` Teddy Astie
@ 2026-03-27 10:59   ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-03-27 10:59 UTC
  To: Teddy Astie; +Cc: xen-devel, Jan Beulich

On Fri, Mar 27, 2026 at 10:56:43AM +0000, Teddy Astie wrote:
> Le 27/03/2026 à 11:19, Marek Marczykowski-Górecki a écrit :
> > Hi,
> >
> > I noticed that on some systems, there are a lot of IOMMU faults after
> > S3. I can see it also on a laptop with MTL, but it affects also the ADL
> > gitlab runner:
> >
> >      https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >      (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >      (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >      (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >      (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >
> > Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >
> > The issue is present only on staging, not staging-4.21.
> >
> 
> Is there a 1e.0 device? That could be a "phantom" PCI device.

On ADL - no, there is 1c.2, and then 1f.0.
But on that MTL, yes:
00:1e.0 Communication controller [0780]: Intel Corporation Meteor Lake-P Serial IO UART Controller #0 [8086:7e25] (rev 20)

(I wish there were a connector populated on the board...)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: IOMMU faults after S3
  2026-03-27 10:19 IOMMU faults after S3 Marek Marczykowski-Górecki
  2026-03-27 10:56 ` Teddy Astie
@ 2026-03-27 12:23 ` Andrew Cooper
  2026-04-01  7:14 ` Jan Beulich
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Cooper @ 2026-03-27 12:23 UTC
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 27/03/2026 10:19 am, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I noticed that on some systems, there are a lot of IOMMU faults after
> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> gitlab runner:
>
>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>
> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.

Ah.  HPETs and IO-APICs get assigned an otherwise unused PCI address for
the purposes of interrupt remapping.

acpi$ grep -A7 HPET dmar.dsl 
[060h 0096   1]            Device Scope Type : 04 [Message-capable HPET Device]
[061h 0097   1]                 Entry Length : 08
[062h 0098   2]                     Reserved : 0000
[064h 0100   1]               Enumeration ID : 00
[065h 0101   1]               PCI Bus Number : 00

[066h 0102   2]                     PCI Path : 1E,06


This is information carried in the DMAR / IVRS ACPI tables, and it seems
to be the same on a random Intel system of mine.

~Andrew


* Re: IOMMU faults after S3
  2026-03-27 10:19 IOMMU faults after S3 Marek Marczykowski-Górecki
  2026-03-27 10:56 ` Teddy Astie
  2026-03-27 12:23 ` Andrew Cooper
@ 2026-04-01  7:14 ` Jan Beulich
  2026-04-01  7:20   ` Andrew Cooper
                     ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-01  7:14 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> I noticed that on some systems, there are a lot of IOMMU faults after
> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> gitlab runner:
> 
>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> 
> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> 
> The issue is present only on staging, not staging-4.21.
> 
> Bisect says:
> 
> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Thu Jan 22 14:13:35 2026 +0100
> 
>     x86/HPET: drop .set_affinity hook

Looking into this, I find several things I can't quite understand (yet).
First there is

(XEN) [000000456c0fe39f] Disabling HPET for being unreliable

which looks to only affect clocksource selection, but not use as
broadcast source for CPU-idle management. (This may be an independent
issue.)

Then there is

(XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)

which should only occur on ARAT-incapable systems. That should only be
older hardware. (On my much older Skylake I don't see this line, for
example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
is purely featureset based, and hence doesn't expose info about that
leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
isn't useful to look at either. It would need to be CPUID output on a
bare metal kernel.

Further I suspect the fingered commit may only have uncovered an issue
elsewhere. I don't think we clear any context table entries during
suspend or resume. Hence in

(XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
(XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear

the latter message is confusing me.

The fault address being zero may, otoh, hint at hpet_msi_write() never
having run post-resume. That may be the connection to the dropping of
hpet_msi_set_affinity(), as that did call that function. I'll continue
looking in that direction as a first step.

Jan


* Re: IOMMU faults after S3
  2026-04-01  7:14 ` Jan Beulich
@ 2026-04-01  7:20   ` Andrew Cooper
  2026-04-01  8:11     ` Jan Beulich
  2026-04-01  8:52   ` Jan Beulich
  2026-04-01  8:58   ` Jan Beulich
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2026-04-01  7:20 UTC
  To: Jan Beulich, Marek Marczykowski-Górecki; +Cc: Andrew Cooper, xen-devel

On 01/04/2026 9:14 am, Jan Beulich wrote:
> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>> I noticed that on some systems, there are a lot of IOMMU faults after
>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>> gitlab runner:
>>
>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>
>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>
>> The issue is present only on staging, not staging-4.21.
>>
>> Bisect says:
>>
>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>> Author: Jan Beulich <jbeulich@suse.com>
>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>
>>     x86/HPET: drop .set_affinity hook
> Looking into this, I find several things I can't quite understand (yet).
> First there is
>
> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>
> which looks to only affect clocksource selection, but not use as
> broadcast source for CPU-idle management. (This may be an independent
> issue.)
>
> Then there is
>
> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
>
> which should only occur on ARAT-incapable systems. That should only be
> older hardware.

I'm not sure that's a reasonable assertion to draw.  The number of HPET
channels is down to the HPET alone, not anything to do with the CPU
capabilities.

>  (On my much older Skylake I don't see this line, for
> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> is purely featureset based, and hence doesn't expose info about that
> leaf.

xen-cpuid -p

That will get you leaf 6, but there's no human-readable decode of it.

~Andrew


* Re: IOMMU faults after S3
  2026-04-01  7:20   ` Andrew Cooper
@ 2026-04-01  8:11     ` Jan Beulich
  2026-04-01 20:30       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-01  8:11 UTC
  To: Andrew Cooper, Marek Marczykowski-Górecki; +Cc: xen-devel

On 01.04.2026 09:20, Andrew Cooper wrote:
> On 01/04/2026 9:14 am, Jan Beulich wrote:
>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>>> I noticed that on some systems, there are a lot of IOMMU faults after
>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>>> gitlab runner:
>>>
>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>
>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>>
>>> The issue is present only on staging, not staging-4.21.
>>>
>>> Bisect says:
>>>
>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>>> Author: Jan Beulich <jbeulich@suse.com>
>>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>>
>>>     x86/HPET: drop .set_affinity hook
>> Looking into this, I find several things I can't quite understand (yet).
>> First there is
>>
>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>>
>> which looks to only affect clocksource selection, but not use as
>> broadcast source for CPU-idle management. (This may be an independent
>> issue.)
>>
>> Then there is
>>
>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
>>
>> which should only occur on ARAT-incapable systems. That should only be
>> older hardware.
> 
> I'm not sure that's a reasonable assertion to draw.  The number of HPET
> channels is down to the HPET alone, not anything to do with the CPU
> capabilities.

My statement was about the mere presence of that message, not the number
of channels that are reported.

>>  (On my much older Skylake I don't see this line, for
>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
>> is purely featureset based, and hence doesn't expose info about that
>> leaf.
> 
> xen-cpuid -p
> 
> That will get you leaf 6, but there's no human-readable decode of it.

Raw numbers are good enough here. How did I miss that option when looking
at --help output? Oh, simply because it isn't shown there.

Marek, that'll be better than bare metal kernel data, as it gives us both
raw and host policies.

Jan


* Re: IOMMU faults after S3
  2026-04-01  7:14 ` Jan Beulich
  2026-04-01  7:20   ` Andrew Cooper
@ 2026-04-01  8:52   ` Jan Beulich
  2026-04-01 23:17     ` Marek Marczykowski-Górecki
  2026-04-01  8:58   ` Jan Beulich
  2 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-01  8:52 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 01.04.2026 09:14, Jan Beulich wrote:
> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>> I noticed that on some systems, there are a lot of IOMMU faults after
>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>> gitlab runner:
>>
>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>
>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>
>> The issue is present only on staging, not staging-4.21.
>>
>> Bisect says:
>>
>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>> Author: Jan Beulich <jbeulich@suse.com>
>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>
>>     x86/HPET: drop .set_affinity hook
> 
> Looking into this, I find several things I can't quite understand (yet).
> First there is
> 
> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> 
> which looks to only affect clocksource selection, but not use as
> broadcast source for CPU-idle management. (This may be an independent
> issue.)
> 
> Then there is
> 
> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
> 
> which should only occur on ARAT-incapable systems. That should only be
> older hardware. (On my much older Skylake I don't see this line, for
> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> is purely featureset based, and hence doesn't expose info about that
> leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
> isn't useful to look at either. It would need to be CPUID output on a
> bare metal kernel.
> 
> Further I suspect the fingered commit may only have uncovered an issue
> elsewhere. I don't think we clear any context table entries during
> suspend or resume. Hence in
> 
> (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> 
> the latter message is confusing me.
> 
> The fault address being zero may, otoh, be a hint of hpet_msi_write()
> never having run post-resume. Which may be the connection to the
> dropping of hpet_msi_set_affinity(), as that did call that function.

There clearly is an issue with the handling of the max_cstate variable,
but I expect you don't use xenpm to limit usable C-states (there is
clearly no corresponding command line option in the log you referenced)?

From what the log has, I conclude hpet_broadcast_resume() is called.
The question is whether it does what we want it to. Could you instrument it
some, so we have confirmation that it is called, and also know whether
__hpet_setup_msi_irq() is not only called on all 8 channels, but also
succeeds there? (If it failed, I suppose we had better not set
HPET_TN_FSB and/or HPET_TN_ENABLE.) If, however, it succeeds, I couldn't
explain why the fault address would be reported as 0, as then we
definitely must have written HPET_Tn_ROUTE.

Jan


* Re: IOMMU faults after S3
  2026-04-01  7:14 ` Jan Beulich
  2026-04-01  7:20   ` Andrew Cooper
  2026-04-01  8:52   ` Jan Beulich
@ 2026-04-01  8:58   ` Jan Beulich
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-01  8:58 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 01.04.2026 09:14, Jan Beulich wrote:
> Further I suspect the fingered commit may only have uncovered an issue
> elsewhere. I don't think we clear any context table entries during
> suspend or resume. Hence in
> 
> (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> 
> the latter message is confusing me.

Actually, it makes sense. Since the address is outside the interrupt remap
MMIO window (FEExxxxx), it's subject to DMA translation. Yet the HPET has no
entry in the context table; only its IRQs have entries in the intremap table.

So it really all looks to be boiling down to missing HPET_Tn_ROUTE writes.

Jan


* Re: IOMMU faults after S3
  2026-04-01  8:11     ` Jan Beulich
@ 2026-04-01 20:30       ` Marek Marczykowski-Górecki
  2026-04-02  6:55         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-01 20:30 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Wed, Apr 01, 2026 at 10:11:12AM +0200, Jan Beulich wrote:
> On 01.04.2026 09:20, Andrew Cooper wrote:
> > On 01/04/2026 9:14 am, Jan Beulich wrote:
> >> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> >>> I noticed that on some systems, there are a lot of IOMMU faults after
> >>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> >>> gitlab runner:
> >>>
> >>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>
> >>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >>>
> >>> The issue is present only on staging, not staging-4.21.
> >>>
> >>> Bisect says:
> >>>
> >>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> >>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> >>> Author: Jan Beulich <jbeulich@suse.com>
> >>> Date:   Thu Jan 22 14:13:35 2026 +0100
> >>>
> >>>     x86/HPET: drop .set_affinity hook
> >> Looking into this, I find several things I can't quite understand (yet).
> >> First there is
> >>
> >> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> >>
> >> which looks to only affect clocksource selection, but not use as
> >> broadcast source for CPU-idle management. (This may be an independent
> >> issue.)
> >>
> >> Then there is
> >>
> >> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
> >>
> >> which should only occur on ARAT-incapable systems. That should only be
> >> older hardware.
> > 
> > I'm not sure that's a reasonable assertion to draw.  The number of HPET
> > channels is down to the HPET alone, not anything to do with the CPU
> > capabilities.
> 
> My statement was about the mere presence of that message, not the number
> of channels that are reported.
> 
> >>  (On my much older Skylake I don't see this line, for
> >> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> >> is purely featureset based, and hence doesn't expose info about that
> >> leaf.
> > 
> > xen-cpuid -p
> > 
> > That will get you leaf 6, but there's no human-readable decode of it.
> 
> Raw numbers are good enough here. How did I miss that option when looking
> at --help output? Oh, simply because it isn't shown there.
> 
> Marek, that'll be better than bare metal kernel data, as it gives us both
> raw and host policies.

Here is the output from the ADL runner:

Xen reports there are maximum 120 leaves and 2 MSRs
Raw policy: 48 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 00000020:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000005:ffffffff -> 00000040:00000040:00000003:10102020
  00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003
  00000007:00000000 -> 00000002:239c27eb:98c027ac:fc1cc410
  00000007:00000001 -> 00400810:00000000:00000000:00040000
  00000007:00000002 -> 00000000:00000000:00000000:00000017
  0000000a:ffffffff -> 07300605:00000000:00000007:00008603
  0000000b:00000000 -> 00000001:00000002:00000100:00000000
  0000000b:00000001 -> 00000007:00000010:00000201:00000000
  0000000d:00000000 -> 00000207:00000000:00000a88:00000000
  0000000d:00000001 -> 0000000f:00000000:00019900:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  0000000d:00000008 -> 00000080:00000000:00000001:00000000
  0000000d:00000009 -> 00000008:00000a80:00000000:00000000
  0000000d:0000000b -> 00000010:00000000:00000001:00000000
  0000000d:0000000c -> 00000018:00000000:00000001:00000000
  0000000d:0000000f -> 00000328:00000000:00000001:00000000
  0000000d:00000010 -> 00000008:00000000:00000001:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:2c100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 0000302e:00000000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 000000001488fd6b
Host policy: 41 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000005:ffffffff -> 00000040:00000040:00000003:10102020
  00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003
  00000007:00000000 -> 00000002:239c27eb:984027ac:fc1cc410
  00000007:00000001 -> 00000810:00000000:00000000:00040000
  00000007:00000002 -> 00000000:00000000:00000000:00000017
  0000000b:00000000 -> 00000001:00000002:00000100:00000000
  0000000b:00000001 -> 00000007:00000010:00000201:00000000
  0000000d:00000000 -> 00000207:00000000:00000a88:00000000
  0000000d:00000001 -> 0000000f:00000000:00000000:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  0000000d:00000009 -> 00000008:00000a80:00000000:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:2c100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 0000302e:00000000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 400000001488fd6b
PV Max policy: 58 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:f6f83203:1fc9cbf5
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000007:00000000 -> 00000002:218c0329:18400700:ac004410
  00000007:00000001 -> 00000810:00000000:00000000:00000000
  00000007:00000002 -> 00000000:00000000:00000000:00000001
  0000000d:00000000 -> 00000007:00000000:00000340:00000000
  0000000d:00000001 -> 00000007:00000000:00000000:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  80000000:ffffffff -> 80000021:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000123:28100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 0000302e:00001000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 40000000140ae167
HVM Max policy: 65 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:f7fa3223:1fcbfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000007:00000000 -> 00000002:219c07ab:9840070c:bc004410
  00000007:00000001 -> 00000810:00000000:00000000:00000000
  00000007:00000002 -> 00000000:00000000:00000000:00000017
  0000000d:00000000 -> 00000207:00000000:00000a88:00000000
  0000000d:00000001 -> 0000000f:00000000:00000000:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  0000000d:00000009 -> 00000008:00000a80:00000000:00000000
  80000000:ffffffff -> 80000021:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000123:2c100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 0000302e:00101000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 40000000140ae167
PV Default policy: 33 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:f6d83203:1fc9cbf5
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000007:00000000 -> 00000002:218c0329:00400700:ac004410
  00000007:00000001 -> 00000810:00000000:00000000:00000000
  00000007:00000002 -> 00000000:00000000:00000000:00000001
  0000000d:00000000 -> 00000007:00000000:00000340:00000000
  0000000d:00000001 -> 00000007:00000000:00000000:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:28100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000008:ffffffff -> 0000302e:00001000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 400000001408e163
HVM Default policy: 40 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx     
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 00090672:00800800:f7fa3203:1fcbfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
  00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
  00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
  00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
  00000007:00000000 -> 00000002:219c07ab:8040070c:bc004410
  00000007:00000001 -> 00000810:00000000:00000000:00000000
  00000007:00000002 -> 00000000:00000000:00000000:00000017
  0000000d:00000000 -> 00000207:00000000:00000a88:00000000
  0000000d:00000001 -> 0000000f:00000000:00000000:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  0000000d:00000009 -> 00000008:00000a80:00000000:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:2c100800
  80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
  80000004:ffffffff -> 4b303036:00000000:00000000:00000000
  80000006:ffffffff -> 00000000:00000000:05007040:00000000
  80000008:ffffffff -> 0000302e:00101000:00000000:00000000
 MSRs:
  index    -> value           
  000000ce -> 0000000080000000
  0000010a -> 400000001408e163


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: IOMMU faults after S3
  2026-04-01  8:52   ` Jan Beulich
@ 2026-04-01 23:17     ` Marek Marczykowski-Górecki
  2026-04-02  7:01       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-01 23:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4830 bytes --]

On Wed, Apr 01, 2026 at 10:52:37AM +0200, Jan Beulich wrote:
> On 01.04.2026 09:14, Jan Beulich wrote:
> > On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> >> I noticed that on some systems, there are a lot of IOMMU faults after
> >> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> >> gitlab runner:
> >>
> >>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>
> >> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >>
> >> The issue is present only on staging, not staging-4.21.
> >>
> >> Bisect says:
> >>
> >> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> >> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> >> Author: Jan Beulich <jbeulich@suse.com>
> >> Date:   Thu Jan 22 14:13:35 2026 +0100
> >>
> >>     x86/HPET: drop .set_affinity hook
> > 
> > Looking into this, I find several things I can't quite understand (yet).
> > First there is
> > 
> > (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> > 
> > which looks to only affect clocksource selection, but not use as
> > broadcast source for CPU-idle management. (This may be an independent
> > issue.)
> > 
> > Then there is
> > 
> > (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
> > 
> > which should only occur on ARAT-incapable systems. That should only be
> > older hardware. (On my much older Skylake I don't see this line, for
> > example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> > is purely featureset based, and hence doesn't expose info about that
> > leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
> > isn't useful to look at either. It would need to be CPUID output on a
> > bare metal kernel.
> > 
> > Further I suspect the fingered commit may only have uncovered an issue
> > elsewhere. I don't think we clear any context table entries during
> > suspend or resume. Hence in
> > 
> > (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> > (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> > 
> > the latter message is confusing me.
> > 
> > The fault address being zero may, otoh, be a hint of hpet_msi_write()
> > never having run post-resume. Which may be the connection to the
> > dropping of hpet_msi_set_affinity(), as that did call that function.
> 
> There clearly is an issue with the handling of the max_cstate variable,
> but I expect you don't use xenpm to limit usable C-states (there clearly
> is no respective command line option in the log you referenced)?

No, I don't think so.

> From what the log has, I conclude hpet_broadcast_resume() is called.

I don't think so... I applied changes as attached and got this on
resume:

(XEN) [   69.486120] Enabling non-boot CPUs  ...
(XEN) [   69.486404] mwait-idle: state C1 is disabled
(XEN) [   69.587869] mwait-idle: state C1 is disabled
(XEN) [   69.588008] mwait-idle: state C1 is disabled
(XEN) [   69.689438] mwait-idle: state C1 is disabled
(XEN) [   69.689608] mwait-idle: state C1 is disabled
(XEN) [   69.791066] mwait-idle: state C1 is disabled
(XEN) [   69.791334] mwait-idle: state C1 is disabled
(XEN) [   69.892938] mwait-idle: state C1 is disabled
(XEN) [   69.893209] mwait-idle: state C1 is disabled
(XEN) [   69.994890] mwait-idle: state C1 is disabled
(XEN) [   69.995096] mwait-idle: state C1 is disabled
(XEN) [   70.096638] mwait-idle: state C1 is disabled
(XEN) [   70.096915] mwait-idle: state C1 is disabled
(XEN) [   70.097093] mwait-idle: state C1 is disabled
(XEN) [   70.097272] mwait-idle: state C1 is disabled
(XEN) [   70.203357] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
(XEN) [   70.203363] [VT-D]DMAR: reason 02 - Present bit in context entry is clear

> Question is whether it does what we want it to. Could you instrument it
> some, so we have confirmation that it is called, and we also know whether
> __hpet_setup_msi_irq() is not only called on all 8 channels, but also
> succeeds there? (If it failed, I suppose we better wouldn't set
> HPET_TN_FSB and/or HPET_TN_ENABLE.) If, however, it succeeds, I couldn't
> explain why the fault address would be reported as 0, as then we
> definitely must have written HPET_Tn_ROUTE.
> 
> Jan

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #1.2: xen-debug.diff --]
[-- Type: text/plain, Size: 2875 bytes --]

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 1ea8ae457424..4c5bf079b728 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -658,6 +658,8 @@ void hpet_broadcast_resume(void)
     u32 cfg;
     unsigned int i, n;
 
+    printk("%s:%d: hpet_events: %p\n", __func__, __LINE__, hpet_events);
+
     if ( !hpet_events )
         return;
 
@@ -667,23 +669,30 @@ void hpet_broadcast_resume(void)
 
     if ( num_hpets_used > 0 )
     {
+        printk("%s:%d: num_hpets_used: %d\n", __func__, __LINE__, num_hpets_used);
         /* Stop HPET legacy interrupts */
         cfg &= ~HPET_CFG_LEGACY;
         n = num_hpets_used;
     }
     else if ( hpet_events->flags & HPET_EVT_DISABLE )
+    {
+        printk("%s:%d: hpet_events->flags: %#x\n", __func__, __LINE__, hpet_events->flags);
         return;
+    }
     else
     {
         /* Start HPET legacy interrupts */
+        printk("%s:%d\n", __func__, __LINE__);
         cfg |= HPET_CFG_LEGACY;
         n = 1;
     }
 
+    printk("%s:%d: cfg: %#x\n", __func__, __LINE__, cfg);
     hpet_write32(cfg, HPET_CFG);
 
     for ( i = 0; i < n; i++ )
     {
+        printk("%s:%d: i:%d, hpet_events[i].msi.irq: %d, hpet_events[i].flags: %#x\n", __func__, __LINE__, i, hpet_events[i].msi.irq, hpet_events[i].flags);
         if ( hpet_events[i].msi.irq >= 0 )
             __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
 
@@ -694,6 +703,7 @@ void hpet_broadcast_resume(void)
         if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
             cfg |= HPET_TN_FSB;
         hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
+        printk("%s:%d: i:%d, cfg: %#x\n", __func__, __LINE__, i, cfg);
 
         hpet_events[i].next_event = STIME_MAX;
     }
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index fed30a919d2c..15113ebdfb6c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2646,6 +2646,7 @@ static int _disable_pit_irq(bool init)
 {
     int ret = 1;
 
+    printk("%s:%d: using_pit: %d, cpu_has_apic: %d\n", __func__, __LINE__, using_pit, cpu_has_apic);
     if ( using_pit || !cpu_has_apic )
         return -1;
 
@@ -2655,8 +2656,10 @@ static int _disable_pit_irq(bool init)
      * XXX dom0 may rely on RTC interrupt delivery, so only enable
      * hpet_broadcast if FSB mode available or if force_hpet_broadcast.
      */
+    printk("%s:%d: cpuidle_using_deep_cstate: %d, boot_cpu_has(X86_FEATURE_XEN_ARAT): %d\n", __func__, __LINE__, cpuidle_using_deep_cstate(), boot_cpu_has(X86_FEATURE_XEN_ARAT));
     if ( cpuidle_using_deep_cstate() && !boot_cpu_has(X86_FEATURE_XEN_ARAT) )
     {
+        printk("%s:%d: init: %d\n", __func__, __LINE__, init);
         init ? hpet_broadcast_init() : hpet_broadcast_resume();
         if ( !hpet_broadcast_is_available() )
         {



* Re: IOMMU faults after S3
  2026-04-01 20:30       ` Marek Marczykowski-Górecki
@ 2026-04-02  6:55         ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-02  6:55 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: Andrew Cooper, xen-devel

On 01.04.2026 22:30, Marek Marczykowski-Górecki wrote:
> On Wed, Apr 01, 2026 at 10:11:12AM +0200, Jan Beulich wrote:
>> On 01.04.2026 09:20, Andrew Cooper wrote:
>>> On 01/04/2026 9:14 am, Jan Beulich wrote:
>>>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>>>>> I noticed that on some systems, there are a lot of IOMMU faults after
>>>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>>>>> gitlab runner:
>>>>>
>>>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>>
>>>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>>>>
>>>>> The issue is present only on staging, not staging-4.21.
>>>>>
>>>>> Bisect says:
>>>>>
>>>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>>>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>>>>
>>>>>     x86/HPET: drop .set_affinity hook
>>>> Looking into this, I find several things I can't quite understand (yet).
>>>> First there is
>>>>
>>>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>>>>
>>>> which looks to only affect clocksource selection, but not use as
>>>> broadcast source for CPU-idle management. (This may be an independent
>>>> issue.)
>>>>
>>>> Then there is
>>>>
>>>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
>>>>
>>>> which should only occur on ARAT-incapable systems. That should only be
>>>> older hardware.
>>>
>>> I'm not sure that's a reasonable assertion to draw.  The number of HPET
>>> channels is down to the HPET alone, not anything to do with the CPU
>>> capabilities.
>>
>> My statement was about the mere presence of that message, not the number
>> of channels that are reported.
>>
>>>>  (On my much older Skylake I don't see this line, for
>>>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
>>>> is purely featureset based, and hence doesn't expose info about that
>>>> leaf.
>>>
>>> xen-cpuid -p
>>>
>>> That will get you leaf 6, but there's no human-readable decode of it.
>>
>> Raw numbers is good enough here. How did I miss that option when looking
>> at --help output? Oh, simply because it isn't shown there.
>>
>> Marek, that'll be better than bare metal kernel data, as it gives us both
>> raw and host policies.
> 
> Here is the output from ADL runner:
> 
> Xen reports there are maximum 120 leaves and 2 MSRs
> Raw policy: 48 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx     
>   00000000:ffffffff -> 00000020:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>   00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
>   00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
>   00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
>   00000005:ffffffff -> 00000040:00000040:00000003:10102020
>   00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003
>   00000007:00000000 -> 00000002:239c27eb:98c027ac:fc1cc410
>   00000007:00000001 -> 00400810:00000000:00000000:00040000
>   00000007:00000002 -> 00000000:00000000:00000000:00000017
>   0000000a:ffffffff -> 07300605:00000000:00000007:00008603
>   0000000b:00000000 -> 00000001:00000002:00000100:00000000
>   0000000b:00000001 -> 00000007:00000010:00000201:00000000
>   0000000d:00000000 -> 00000207:00000000:00000a88:00000000
>   0000000d:00000001 -> 0000000f:00000000:00019900:00000000
>   0000000d:00000002 -> 00000100:00000240:00000000:00000000
>   0000000d:00000008 -> 00000080:00000000:00000001:00000000
>   0000000d:00000009 -> 00000008:00000a80:00000000:00000000
>   0000000d:0000000b -> 00000010:00000000:00000001:00000000
>   0000000d:0000000c -> 00000018:00000000:00000001:00000000
>   0000000d:0000000f -> 00000328:00000000:00000001:00000000
>   0000000d:00000010 -> 00000008:00000000:00000001:00000000
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000121:2c100800
>   80000002:ffffffff -> 68743231:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:32312d35
>   80000004:ffffffff -> 4b303036:00000000:00000000:00000000
>   80000006:ffffffff -> 00000000:00000000:05007040:00000000
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100
>   80000008:ffffffff -> 0000302e:00000000:00000000:00000000
>  MSRs:
>   index    -> value           
>   000000ce -> 0000000080000000
>   0000010a -> 000000001488fd6b
> Host policy: 41 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx     
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 00090672:00800800:77fafbff:bfebfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>   00000004:00000001 -> fc004122:01c0003f:0000003f:00000000
>   00000004:00000002 -> fc01c143:0240003f:000007ff:00000000
>   00000004:00000003 -> fc1fc163:0240003f:00007fff:00000004
>   00000005:ffffffff -> 00000040:00000040:00000003:10102020
>   00000006:ffffffff -> 00df8ff7:00000002:00000409:00000003

And everything is as expected: the ARAT bit is set.

Jan



* Re: IOMMU faults after S3
  2026-04-01 23:17     ` Marek Marczykowski-Górecki
@ 2026-04-02  7:01       ` Jan Beulich
  2026-04-02  8:08         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-02  7:01 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 02.04.2026 01:17, Marek Marczykowski-Górecki wrote:
> On Wed, Apr 01, 2026 at 10:52:37AM +0200, Jan Beulich wrote:
>> On 01.04.2026 09:14, Jan Beulich wrote:
>>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
>>>> I noticed that on some systems, there are a lot of IOMMU faults after
>>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
>>>> gitlab runner:
>>>>
>>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
>>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>>
>>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
>>>>
>>>> The issue is present only on staging, not staging-4.21.
>>>>
>>>> Bisect says:
>>>>
>>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
>>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
>>>> Author: Jan Beulich <jbeulich@suse.com>
>>>> Date:   Thu Jan 22 14:13:35 2026 +0100
>>>>
>>>>     x86/HPET: drop .set_affinity hook
>>>
>>> Looking into this, I find several things I can't quite understand (yet).
>>> First there is
>>>
>>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
>>>
>>> which looks to only affect clocksource selection, but not use as
>>> broadcast source for CPU-idle management. (This may be an independent
>>> issue.)
>>>
>>> Then there is
>>>
>>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
>>>
>>> which should only occur on ARAT-incapable systems. That should only be
>>> older hardware. (On my much older Skylake I don't see this line, for
>>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
>>> is purely featureset based, and hence doesn't expose info about that
>>> leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
>>> isn't useful to look at either. It would need to be CPUID output on a
>>> bare metal kernel.
>>>
>>> Further I suspect the fingered commit may only have uncovered an issue
>>> elsewhere. I don't think we clear any context table entries during
>>> suspend or resume. Hence in
>>>
>>> (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
>>> (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>>>
>>> the latter message is confusing me.
>>>
>>> The fault address being zero may, otoh, be a hint of hpet_msi_write()
>>> never having run post-resume. Which may be the connection to the
>>> dropping of hpet_msi_set_affinity(), as that did call that function.
>>
>> There clearly is an issue with the handling of the max_cstate variable,
>> but I expect you don't use xenpm to limit usable C-states (there clearly
>> is no respective command line option in the log you referenced)?
> 
> No, I don't think so.
> 
>> From what the log has, I conclude hpet_broadcast_resume() is called.
> 
> I don't think so... I applied changes as attached and got this on
> resume:
> 
> (XEN) [   69.486120] Enabling non-boot CPUs  ...
> (XEN) [   69.486404] mwait-idle: state C1 is disabled
> (XEN) [   69.587869] mwait-idle: state C1 is disabled
> (XEN) [   69.588008] mwait-idle: state C1 is disabled
> (XEN) [   69.689438] mwait-idle: state C1 is disabled
> (XEN) [   69.689608] mwait-idle: state C1 is disabled
> (XEN) [   69.791066] mwait-idle: state C1 is disabled
> (XEN) [   69.791334] mwait-idle: state C1 is disabled
> (XEN) [   69.892938] mwait-idle: state C1 is disabled
> (XEN) [   69.893209] mwait-idle: state C1 is disabled
> (XEN) [   69.994890] mwait-idle: state C1 is disabled
> (XEN) [   69.995096] mwait-idle: state C1 is disabled
> (XEN) [   70.096638] mwait-idle: state C1 is disabled
> (XEN) [   70.096915] mwait-idle: state C1 is disabled
> (XEN) [   70.097093] mwait-idle: state C1 is disabled
> (XEN) [   70.097272] mwait-idle: state C1 is disabled
> (XEN) [   70.203357] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> (XEN) [   70.203363] [VT-D]DMAR: reason 02 - Present bit in context entry is clear

Was that on the serial console or from xl dmesg? I ask because console_resume()
runs after time_resume(), so nothing would be expected to appear on the serial
console (I think).

Without hpet_broadcast_resume() running, I don't think I could explain how the
channels (and their FSB interrupts) would get enabled.

Jan



* Re: IOMMU faults after S3
  2026-04-02  7:01       ` Jan Beulich
@ 2026-04-02  8:08         ` Marek Marczykowski-Górecki
  2026-04-02  8:39           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02  8:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 17447 bytes --]

On Thu, Apr 02, 2026 at 09:01:12AM +0200, Jan Beulich wrote:
> On 02.04.2026 01:17, Marek Marczykowski-Górecki wrote:
> > On Wed, Apr 01, 2026 at 10:52:37AM +0200, Jan Beulich wrote:
> >> On 01.04.2026 09:14, Jan Beulich wrote:
> >>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> >>>> I noticed that on some systems, there are a lot of IOMMU faults after
> >>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> >>>> gitlab runner:
> >>>>
> >>>>     https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >>>>     (XEN) [   37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>>     (XEN) [   37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>>     (XEN) [   37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>>     (XEN) [   37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>>
> >>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >>>>
> >>>> The issue is present only on staging, not staging-4.21.
> >>>>
> >>>> Bisect says:
> >>>>
> >>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> >>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> >>>> Author: Jan Beulich <jbeulich@suse.com>
> >>>> Date:   Thu Jan 22 14:13:35 2026 +0100
> >>>>
> >>>>     x86/HPET: drop .set_affinity hook
> >>>
> >>> Looking into this, I find several things I can't quite understand (yet).
> >>> First there is
> >>>
> >>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> >>>
> >>> which looks to only affect clocksource selection, but not use as
> >>> broadcast source for CPU-idle management. (This may be an independent
> >>> issue.)
> >>>
> >>> Then there is
> >>>
> >>> (XEN) [    2.760248] HPET: 8 timers usable for broadcast (8 total)
> >>>
> >>> which should only occur on ARAT-incapable systems. That should only be
> >>> older hardware. (On my much older Skylake I don't see this line, for
> >>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> >>> is purely featureset based, and hence doesn't expose info about that
> >>> leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
> >>> isn't useful to look at either. It would need to be CPUID output on a
> >>> bare metal kernel.
> >>>
> >>> Further I suspect the fingered commit may only have uncovered an issue
> >>> elsewhere. I don't think we clear any context table entries during
> >>> suspend or resume. Hence in
> >>>
> >>> (XEN) [   20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>> (XEN) [   20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>
> >>> the latter message is confusing me.
> >>>
> >>> The fault address being zero may, otoh, be a hint of hpet_msi_write()
> >>> never having run post-resume. Which may be the connection to the
> >>> dropping of hpet_msi_set_affinity(), as that did call that function.
> >>
> >> There clearly is an issue with the handling of the max_cstate variable,
> >> but I expect you don't use xenpm to limit usable C-states (there clearly
> >> is no respective command line option in the log you referenced)?
> > 
> > No, I don't think so.
> > 
> >> From what the log has, I conclude hpet_broadcast_resume() is called.
> > 
> > I don't think so... I applied changes as attached and got this on
> > resume:
> > 
> > (XEN) [   69.486120] Enabling non-boot CPUs  ...
> > (XEN) [   69.486404] mwait-idle: state C1 is disabled
> > (XEN) [   69.587869] mwait-idle: state C1 is disabled
> > (XEN) [   69.588008] mwait-idle: state C1 is disabled
> > (XEN) [   69.689438] mwait-idle: state C1 is disabled
> > (XEN) [   69.689608] mwait-idle: state C1 is disabled
> > (XEN) [   69.791066] mwait-idle: state C1 is disabled
> > (XEN) [   69.791334] mwait-idle: state C1 is disabled
> > (XEN) [   69.892938] mwait-idle: state C1 is disabled
> > (XEN) [   69.893209] mwait-idle: state C1 is disabled
> > (XEN) [   69.994890] mwait-idle: state C1 is disabled
> > (XEN) [   69.995096] mwait-idle: state C1 is disabled
> > (XEN) [   70.096638] mwait-idle: state C1 is disabled
> > (XEN) [   70.096915] mwait-idle: state C1 is disabled
> > (XEN) [   70.097093] mwait-idle: state C1 is disabled
> > (XEN) [   70.097272] mwait-idle: state C1 is disabled
> > (XEN) [   70.203357] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> > (XEN) [   70.203363] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> 
> That was on the serial console or from xl dmesg? I ask because console_resume()
> runs after time_resume(), so nothing appearing on the serial console would be
> expected (I think).

Ah, right, that's why I don't see my messages.
The xl dmesg output (from MTL this time):

    (XEN) [  123.477511] Entering ACPI S3 state.
    (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
    (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
    (XEN) [18446743903.571866] _disable_pit_irq:2662: init: 0
    (XEN) [18446743903.571877] hpet_broadcast_resume:661: hpet_events: ffff83046bc1f080
    (XEN) [18446743903.572020] hpet_broadcast_resume:672: num_hpets_used: 8
    (XEN) [18446743903.572029] hpet_broadcast_resume:690: cfg: 0x1
    (XEN) [18446743903.572040] hpet_broadcast_resume:695: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
    (XEN) [18446743903.572081] hpet_broadcast_resume:706: i:0, cfg: 0xc134
    (XEN) [18446743903.572089] hpet_broadcast_resume:695: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
    (XEN) [18446743903.572123] hpet_broadcast_resume:706: i:1, cfg: 0xc104
    (XEN) [18446743903.572132] hpet_broadcast_resume:695: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
    (XEN) [18446743903.572167] hpet_broadcast_resume:706: i:2, cfg: 0xc104
    (XEN) [18446743903.572175] hpet_broadcast_resume:695: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
    (XEN) [18446743903.572210] hpet_broadcast_resume:706: i:3, cfg: 0xc104
    (XEN) [18446743903.572218] hpet_broadcast_resume:695: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
    (XEN) [18446743903.572252] hpet_broadcast_resume:706: i:4, cfg: 0xc104
    (XEN) [18446743903.572261] hpet_broadcast_resume:695: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
    (XEN) [18446743903.572294] hpet_broadcast_resume:706: i:5, cfg: 0xc104
    (XEN) [18446743903.572303] hpet_broadcast_resume:695: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
    (XEN) [18446743903.572338] hpet_broadcast_resume:706: i:6, cfg: 0xc104
    (XEN) [18446743903.572347] hpet_broadcast_resume:695: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
    (XEN) [18446743903.572382] hpet_broadcast_resume:706: i:7, cfg: 0xc104

And the xen-cpuid -p output from this system:

    Xen reports there are maximum 120 leaves and 2 MSRs
    Raw policy: 48 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 00000023:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000005:ffffffff -> 00000040:00000040:00000003:11112020
      00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
      00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
      00000007:00000001 -> 40400910:00000001:00000000:00040000
      00000007:00000002 -> 00000000:00000000:00000000:0000003f
      0000000a:ffffffff -> 07300805:00000000:00000007:00008603
      0000000b:00000000 -> 00000001:00000002:00000100:00000020
      0000000b:00000001 -> 00000007:00000016:00000201:00000020
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00019900:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000008 -> 00000080:00000000:00000001:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      0000000d:0000000b -> 00000010:00000000:00000001:00000000
      0000000d:0000000c -> 00000018:00000000:00000001:00000000
      0000000d:0000000f -> 00000328:00000000:00000001:00000000
      0000000d:00000010 -> 00000008:00000000:00000001:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00000000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 000000000d89fd6b
    Host policy: 41 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000005:ffffffff -> 00000040:00000040:00000003:11112020
      00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
      00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
      00000007:00000001 -> 40000910:00000001:00000000:00040000
      00000007:00000002 -> 00000000:00000000:00000000:0000003f
      0000000b:00000000 -> 00000001:00000002:00000100:00000020
      0000000b:00000001 -> 00000007:00000016:00000201:00000020
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00000000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d89fd6b
    PV Max policy: 58 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f6f83203:1fc9cbf5
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:218c0329:18400700:ac004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000021
      0000000d:00000000 -> 00000007:00000000:00000340:00000000
      0000000d:00000001 -> 00000007:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      80000000:ffffffff -> 80000021:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000123:28100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00001000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000001d0ae167
    HVM Max policy: 65 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f7fa3223:1fcbfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:219c07ab:9840070c:bc004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000037
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000021:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000123:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000007:ffffffff -> 00000000:00000000:00000000:00000100
      80000008:ffffffff -> 0000302e:00101000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000001d0ae167
    PV Default policy: 33 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f6d83203:1fc9cbf5
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:218c0329:00400700:ac004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000021
      0000000d:00000000 -> 00000007:00000000:00000340:00000000
      0000000d:00000001 -> 00000007:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:28100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000008:ffffffff -> 0000302e:00001000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d08e163
    HVM Default policy: 40 leaves, 2 MSRs
     CPUID:
      leaf     subleaf  -> eax      ebx      ecx      edx     
      00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
      00000001:ffffffff -> 000a06a4:00800800:f7fa3203:1fcbfbff
      00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
      00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
      00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
      00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
      00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
      00000007:00000000 -> 00000002:219c07ab:8040070c:bc004410
      00000007:00000001 -> 00000810:00000000:00000000:00000000
      00000007:00000002 -> 00000000:00000000:00000000:00000037
      0000000d:00000000 -> 00000207:00000000:00000a88:00000000
      0000000d:00000001 -> 0000000f:00000000:00000000:00000000
      0000000d:00000002 -> 00000100:00000240:00000000:00000000
      0000000d:00000009 -> 00000008:00000a80:00000000:00000000
      80000000:ffffffff -> 80000008:00000000:00000000:00000000
      80000001:ffffffff -> 00000000:00000000:00000121:2c100800
      80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
      80000003:ffffffff -> 6c552029:20617274:35312037:00004835
      80000006:ffffffff -> 00000000:00000000:08007040:00000000
      80000008:ffffffff -> 0000302e:00101000:00000000:00000000
     MSRs:
      index    -> value           
      000000ce -> 0000000080000000
      0000010a -> 400000000d08e163


> Without hpet_broadcast_resume() running, I don't think I could explain how the
> channels (and their FSB interrupts) would get enabled.
> 
> Jan

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: IOMMU faults after S3
  2026-04-02  8:08         ` Marek Marczykowski-Górecki
@ 2026-04-02  8:39           ` Jan Beulich
  2026-04-02  8:47             ` Jan Beulich
  2026-04-02  9:35             ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-02  8:39 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> The xl dmesg output (from MTL this time):
> 
>     (XEN) [  123.477511] Entering ACPI S3 state.
>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0

XEN_ARAT being off is the one odd aspect here. That'll want tracking down
separately. As per xen-cpuid output (below) ARAT is available.
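The ARAT reading of the xen-cpuid dump can be checked directly. ARAT ("APIC timer always running") is CPUID leaf 6, EAX bit 2, and both the Raw and Host policies report 00dfcff7 for that leaf. A minimal C sketch; the macro and helper names are hypothetical, not Xen's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:EAX bit 2, ARAT: the local APIC timer keeps running in deep C-states. */
#define CPUID6_EAX_ARAT (1u << 2)

/* eax: the EAX value reported for leaf 0x00000006 (e.g. as printed by xen-cpuid -p). */
static bool leaf6_has_arat(uint32_t eax)
{
    return (eax & CPUID6_EAX_ARAT) != 0;
}
```

With the value from the dump, leaf6_has_arat(0x00dfcff7) is true: the low byte 0xf7 has bit 2 set.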

>     (XEN) [18446743903.571866] _disable_pit_irq:2662: init: 0
>     (XEN) [18446743903.571877] hpet_broadcast_resume:661: hpet_events: ffff83046bc1f080
>     (XEN) [18446743903.572020] hpet_broadcast_resume:672: num_hpets_used: 8
>     (XEN) [18446743903.572029] hpet_broadcast_resume:690: cfg: 0x1
>     (XEN) [18446743903.572040] hpet_broadcast_resume:695: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
>     (XEN) [18446743903.572081] hpet_broadcast_resume:706: i:0, cfg: 0xc134
>     (XEN) [18446743903.572089] hpet_broadcast_resume:695: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
>     (XEN) [18446743903.572123] hpet_broadcast_resume:706: i:1, cfg: 0xc104
>     (XEN) [18446743903.572132] hpet_broadcast_resume:695: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
>     (XEN) [18446743903.572167] hpet_broadcast_resume:706: i:2, cfg: 0xc104
>     (XEN) [18446743903.572175] hpet_broadcast_resume:695: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
>     (XEN) [18446743903.572210] hpet_broadcast_resume:706: i:3, cfg: 0xc104
>     (XEN) [18446743903.572218] hpet_broadcast_resume:695: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
>     (XEN) [18446743903.572252] hpet_broadcast_resume:706: i:4, cfg: 0xc104
>     (XEN) [18446743903.572261] hpet_broadcast_resume:695: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
>     (XEN) [18446743903.572294] hpet_broadcast_resume:706: i:5, cfg: 0xc104
>     (XEN) [18446743903.572303] hpet_broadcast_resume:695: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
>     (XEN) [18446743903.572338] hpet_broadcast_resume:706: i:6, cfg: 0xc104
>     (XEN) [18446743903.572347] hpet_broadcast_resume:695: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
>     (XEN) [18446743903.572382] hpet_broadcast_resume:706: i:7, cfg: 0xc104

Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
might be a good idea, if only to double check.)

All values logged look entirely plausible, with XEN_ARAT being off.
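Why these values look plausible can be seen by decoding them against the register layout in the IA-PC HPET specification. A minimal C sketch; the macro and helper names are hypothetical (they are not the ones in Xen's hpet.h), only the bit positions come from the spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* General Configuration register bits (HPET spec). */
#define HPET_GCFG_ENABLE   (1u << 0)   /* ENABLE_CNF: main counter runs */
#define HPET_GCFG_LEGACY   (1u << 1)   /* LEG_RT_CNF: legacy replacement routing */

/* Timer N Configuration and Capability register, low 32 bits. */
#define TN_INT_ENB         (1u << 2)   /* Tn_INT_ENB_CNF: interrupt enabled */
#define TN_PERIODIC        (1u << 3)   /* Tn_TYPE_CNF: periodic, else one-shot */
#define TN_PERIODIC_CAP    (1u << 4)   /* Tn_PER_INT_CAP, read-only */
#define TN_64BIT_CAP       (1u << 5)   /* Tn_SIZE_CAP, read-only */
#define TN_32BIT_MODE      (1u << 8)   /* Tn_32MODE_CNF: force 32-bit mode */
#define TN_FSB_EN          (1u << 14)  /* Tn_FSB_EN_CNF: deliver via FSB write */
#define TN_FSB_CAP         (1u << 15)  /* Tn_FSB_INT_DEL_CAP, read-only */

/* Counter running with legacy replacement routing off. */
static bool hpet_gcfg_non_legacy(uint32_t gcfg)
{
    return (gcfg & HPET_GCFG_ENABLE) && !(gcfg & HPET_GCFG_LEGACY);
}

/* Channel armed for one-shot interrupts delivered over FSB. */
static bool tn_oneshot_fsb(uint32_t cfg)
{
    return (cfg & TN_INT_ENB) && (cfg & TN_FSB_EN) && (cfg & TN_FSB_CAP) &&
           !(cfg & TN_PERIODIC);
}
```

So cfg 0x1 means the counter is enabled with legacy routing off, and both 0xc134 and 0xc104 describe an enabled, one-shot, FSB-delivered channel. The two differ only in read-only capability bits 4 and 5: channel 0 reports periodic and 64-bit capability, the others do not.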

> And the xen-cpuid -p output from this system:
> 
>     Xen reports there are maximum 120 leaves and 2 MSRs
>     Raw policy: 48 leaves, 2 MSRs
>      CPUID:
>       leaf     subleaf  -> eax      ebx      ecx      edx     
>       00000000:ffffffff -> 00000023:756e6547:6c65746e:49656e69
>       00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
>       00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>       00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>       00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
>       00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
>       00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
>       00000005:ffffffff -> 00000040:00000040:00000003:11112020
>       00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
>       00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
>       00000007:00000001 -> 40400910:00000001:00000000:00040000
>       00000007:00000002 -> 00000000:00000000:00000000:0000003f
>       0000000a:ffffffff -> 07300805:00000000:00000007:00008603
>       0000000b:00000000 -> 00000001:00000002:00000100:00000020
>       0000000b:00000001 -> 00000007:00000016:00000201:00000020
>       0000000d:00000000 -> 00000207:00000000:00000a88:00000000
>       0000000d:00000001 -> 0000000f:00000000:00019900:00000000
>       0000000d:00000002 -> 00000100:00000240:00000000:00000000
>       0000000d:00000008 -> 00000080:00000000:00000001:00000000
>       0000000d:00000009 -> 00000008:00000a80:00000000:00000000
>       0000000d:0000000b -> 00000010:00000000:00000001:00000000
>       0000000d:0000000c -> 00000018:00000000:00000001:00000000
>       0000000d:0000000f -> 00000328:00000000:00000001:00000000
>       0000000d:00000010 -> 00000008:00000000:00000001:00000000
>       80000000:ffffffff -> 80000008:00000000:00000000:00000000
>       80000001:ffffffff -> 00000000:00000000:00000121:2c100800
>       80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
>       80000003:ffffffff -> 6c552029:20617274:35312037:00004835
>       80000006:ffffffff -> 00000000:00000000:08007040:00000000
>       80000007:ffffffff -> 00000000:00000000:00000000:00000100
>       80000008:ffffffff -> 0000302e:00000000:00000000:00000000
>      MSRs:
>       index    -> value           
>       000000ce -> 0000000080000000
>       0000010a -> 000000000d89fd6b
>     Host policy: 41 leaves, 2 MSRs
>      CPUID:
>       leaf     subleaf  -> eax      ebx      ecx      edx     
>       00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>       00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
>       00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>       00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
>       00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
>       00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
>       00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
>       00000005:ffffffff -> 00000040:00000040:00000003:11112020
>       00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003

ARAT is still shown as available here.

Jan



* Re: IOMMU faults after S3
  2026-04-02  8:39           ` Jan Beulich
@ 2026-04-02  8:47             ` Jan Beulich
  2026-04-02  9:42               ` Marek Marczykowski-Górecki
  2026-04-02  9:35             ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-02  8:47 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 02.04.2026 10:39, Jan Beulich wrote:
> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>> The xl dmesg output (from MTL this time):
>>
>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> 
> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
> separately. As per xen-cpuid output (below) ARAT is available.

For this you may want to also add logging to intel_init_arat(): Since opt_arat
can be false only due to command line option use, it can only be the function
not being called (which looks impossible on plain staging code), or cpu_has_arat
being false despite the xen-cpuid output that you supplied earlier (inexplicable
as well, at least for now).
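The two checks involved can be modelled as plain predicates. This is a hypothetical sketch: the struct and function names are made up, and only the two conditions are taken from the code touched by the debug diff (intel_init_arat() and _disable_pit_irq()). It shows that once cpu_has_arat reads as 0, using deep C-states alone is enough to route S3 resume through hpet_broadcast_resume():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; not Xen data structures. */
struct arat_inputs {
    bool opt_arat;      /* "arat" command line option, on unless overridden */
    bool cpu_has_arat;  /* feature bit as seen when intel_init_arat() runs */
    bool deep_cstates;  /* cpuidle_using_deep_cstate() */
};

/* Mirrors: if ( opt_arat && cpu_has_arat ) setup_force_cpu_cap(X86_FEATURE_XEN_ARAT); */
static bool xen_arat(const struct arat_inputs *in)
{
    return in->opt_arat && in->cpu_has_arat;
}

/* Mirrors the condition guarding hpet_broadcast_{init,resume}() in _disable_pit_irq(). */
static bool uses_hpet_broadcast(const struct arat_inputs *in)
{
    return in->deep_cstates && !xen_arat(in);
}
```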

Jan



* Re: IOMMU faults after S3
  2026-04-02  8:39           ` Jan Beulich
  2026-04-02  8:47             ` Jan Beulich
@ 2026-04-02  9:35             ` Marek Marczykowski-Górecki
  2026-04-02 10:48               ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02  9:35 UTC
  To: Jan Beulich; +Cc: xen-devel


On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> > The xl dmesg output (from MTL this time):
> > 
> >     (XEN) [  123.477511] Entering ACPI S3 state.
> >     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0

> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
> might be a good idea, if only to double check.)

Updated output:

    (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
    (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
    (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
    (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
    (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
    (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
    (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
    (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
    (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720649] hpet_broadcast_resume:701: i:0, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720665] hpet_broadcast_resume:711: i:0, cfg: 0xc134
    (XEN) [18446743899.720674] hpet_broadcast_resume:697: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
    (XEN) [18446743899.720684] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x130
    (XEN) [18446743899.720707] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720717] hpet_broadcast_resume:701: i:1, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720728] hpet_broadcast_resume:711: i:1, cfg: 0xc104
    (XEN) [18446743899.720737] hpet_broadcast_resume:697: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
    (XEN) [18446743899.720747] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x150
    (XEN) [18446743899.720771] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720781] hpet_broadcast_resume:701: i:2, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720797] hpet_broadcast_resume:711: i:2, cfg: 0xc104
    (XEN) [18446743899.720805] hpet_broadcast_resume:697: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
    (XEN) [18446743899.720816] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x170
    (XEN) [18446743899.720838] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720848] hpet_broadcast_resume:701: i:3, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720863] hpet_broadcast_resume:711: i:3, cfg: 0xc104
    (XEN) [18446743899.720872] hpet_broadcast_resume:697: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
    (XEN) [18446743899.720882] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x190
    (XEN) [18446743899.720905] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720915] hpet_broadcast_resume:701: i:4, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720931] hpet_broadcast_resume:711: i:4, cfg: 0xc104
    (XEN) [18446743899.720939] hpet_broadcast_resume:697: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
    (XEN) [18446743899.720949] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x1b0
    (XEN) [18446743899.720971] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.720981] hpet_broadcast_resume:701: i:5, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.720997] hpet_broadcast_resume:711: i:5, cfg: 0xc104
    (XEN) [18446743899.721006] hpet_broadcast_resume:697: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
    (XEN) [18446743899.721016] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x1d0
    (XEN) [18446743899.721039] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.721048] hpet_broadcast_resume:701: i:6, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.721064] hpet_broadcast_resume:711: i:6, cfg: 0xc104
    (XEN) [18446743899.721072] hpet_broadcast_resume:697: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
    (XEN) [18446743899.721082] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x1f0
    (XEN) [18446743899.721105] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743899.721115] hpet_broadcast_resume:701: i:7, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743899.721130] hpet_broadcast_resume:711: i:7, cfg: 0xc104

And the current debug diff is attached.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #1.2: xen-debug.diff --]
[-- Type: text/plain, Size: 4846 bytes --]

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 5273fe0ae435..9916afd5ed68 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -364,6 +364,7 @@ static void __init calculate_host_policy(void)
     struct cpu_policy *p = &host_cpu_policy;
     unsigned int max_extd_leaf;
 
+    printk("%s:%d\n", __func__, __LINE__);
     *p = raw_cpu_policy;
 
     p->basic.max_leaf =
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 18b3c79dc97f..51a3d1c4b5f3 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -671,6 +671,7 @@ const struct cpu_dev __initconst_cf_clobber intel_cpu_dev = {
 
 void __init intel_init_arat(void)
 {
+    printk("%s:%d: opt_arat: %d, cpu_has_arat: %d\n", __func__, __LINE__, opt_arat, cpu_has_arat);
     if ( opt_arat && cpu_has_arat )
         setup_force_cpu_cap(X86_FEATURE_XEN_ARAT);
 }
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 1ea8ae457424..7731654efa9b 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -280,9 +280,11 @@ static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg *msg)
 {
     ch->msi.msg = *msg;
 
+    printk("%s:%d: iommu_intremap: %d (iommu_intremap_off: %d), HPET_Tn_ROUTE(ch->idx): %#x\n", __func__, __LINE__, iommu_intremap, iommu_intremap_off, HPET_Tn_ROUTE(ch->idx));
     if ( iommu_intremap != iommu_intremap_off )
     {
         int rc = iommu_update_ire_from_msi(&ch->msi, msg);
+        printk("%s:%d: iommu_update_ire_from_msi rc: %d\n", __func__, __LINE__, rc);
 
         if ( rc < 0 )
             return rc;
@@ -658,6 +660,8 @@ void hpet_broadcast_resume(void)
     u32 cfg;
     unsigned int i, n;
 
+    printk("%s:%d: hpet_events: %p\n", __func__, __LINE__, hpet_events);
+
     if ( !hpet_events )
         return;
 
@@ -667,25 +671,35 @@ void hpet_broadcast_resume(void)
 
     if ( num_hpets_used > 0 )
     {
+        printk("%s:%d: num_hpets_used: %d\n", __func__, __LINE__, num_hpets_used);
         /* Stop HPET legacy interrupts */
         cfg &= ~HPET_CFG_LEGACY;
         n = num_hpets_used;
     }
     else if ( hpet_events->flags & HPET_EVT_DISABLE )
+    {
+        printk("%s:%d: hpet_events->flags: %#x\n", __func__, __LINE__, hpet_events->flags);
         return;
+    }
     else
     {
         /* Start HPET legacy interrupts */
+        printk("%s:%d\n", __func__, __LINE__);
         cfg |= HPET_CFG_LEGACY;
         n = 1;
     }
 
+    printk("%s:%d: cfg: %#x\n", __func__, __LINE__, cfg);
     hpet_write32(cfg, HPET_CFG);
 
     for ( i = 0; i < n; i++ )
     {
+        printk("%s:%d: i:%d, hpet_events[i].msi.irq: %d, hpet_events[i].flags: %#x\n", __func__, __LINE__, i, hpet_events[i].msi.irq, hpet_events[i].flags);
         if ( hpet_events[i].msi.irq >= 0 )
-            __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
+        {
+            int ret = __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
+            printk("%s:%d: i:%d, __hpet_setup_msi_irq ret: %d\n", __func__, __LINE__, i, ret);
+        }
 
         /* set HPET Tn as oneshot */
         cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
@@ -694,6 +708,7 @@ void hpet_broadcast_resume(void)
         if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
             cfg |= HPET_TN_FSB;
         hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
+        printk("%s:%d: i:%d, cfg: %#x\n", __func__, __LINE__, i, cfg);
 
         hpet_events[i].next_event = STIME_MAX;
     }
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index fed30a919d2c..15113ebdfb6c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2646,6 +2646,7 @@ static int _disable_pit_irq(bool init)
 {
     int ret = 1;
 
+    printk("%s:%d: using_pit: %d, cpu_has_apic: %d\n", __func__, __LINE__, using_pit, cpu_has_apic);
     if ( using_pit || !cpu_has_apic )
         return -1;
 
@@ -2655,8 +2656,10 @@ static int _disable_pit_irq(bool init)
      * XXX dom0 may rely on RTC interrupt delivery, so only enable
      * hpet_broadcast if FSB mode available or if force_hpet_broadcast.
      */
+    printk("%s:%d: cpuidle_using_deep_cstate: %d, boot_cpu_has(X86_FEATURE_XEN_ARAT): %d\n", __func__, __LINE__, cpuidle_using_deep_cstate(), boot_cpu_has(X86_FEATURE_XEN_ARAT));
     if ( cpuidle_using_deep_cstate() && !boot_cpu_has(X86_FEATURE_XEN_ARAT) )
     {
+        printk("%s:%d: init: %d\n", __func__, __LINE__, init);
         init ? hpet_broadcast_init() : hpet_broadcast_resume();
         if ( !hpet_broadcast_is_available() )
         {
diff --git a/xen/source b/xen/source
new file mode 120000
index 000000000000..945c9b46d684
--- /dev/null
+++ b/xen/source
@@ -0,0 +1 @@
+.
\ No newline at end of file


* Re: IOMMU faults after S3
  2026-04-02  8:47             ` Jan Beulich
@ 2026-04-02  9:42               ` Marek Marczykowski-Górecki
  2026-04-02 10:23                 ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02  9:42 UTC
  To: Jan Beulich; +Cc: xen-devel

On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
> On 02.04.2026 10:39, Jan Beulich wrote:
> > On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >> The xl dmesg output (from MTL this time):
> >>
> >>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> > 
> > XEN_ARAT being off is the one odd aspect here. That'll want tracking down
> > separately. As per xen-cpuid output (below) ARAT is available.
> 
> For this you may want to also add logging to intel_init_arat(): Since opt_arat
> can be false only due to command line option use, it can only be the function
> not being called (which looks impossible on plain staging code), or cpu_has_arat
> being false despite the xen-cpuid output that you supplied earlier (inexplicable
> as well, at least for now).

Hm, I got this:

    (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0

so cpu_has_arat=0...
The next lines are these, as a hint of when it happened in the boot process:

    (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
    (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
    (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2

Looks like calculate_host_policy() runs much later...


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: IOMMU faults after S3
  2026-04-02  9:42               ` Marek Marczykowski-Górecki
@ 2026-04-02 10:23                 ` Jan Beulich
  2026-04-02 14:02                   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-02 10:23 UTC
  To: Marek Marczykowski-Górecki, Andrew Cooper
  Cc: xen-devel, Roger Pau Monné

On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
>> On 02.04.2026 10:39, Jan Beulich wrote:
>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>>> The xl dmesg output (from MTL this time):
>>>>
>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>
>>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
>>> separately. As per xen-cpuid output (below) ARAT is available.
>>
>> For this you may want to also add logging to intel_init_arat(): Since opt_arat
>> can be false only due to command line option use, it can only be the function
>> not being called (which looks impossible on plain staging code), or cpu_has_arat
>> being false despite the xen-cpuid output that you supplied earlier (inexplicable
>> as well, at least for now).
> 
> Hm, I got this:
> 
>     (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
> 
> so, cpu_has_arat=0 ...
> next lines are those, to hint when it happened in the boot process:
> 
>     (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
>     (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
>     (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2
> 
> Looks like calculate_host_policy() runs much later...

Hmm, yes, and that's the problem. The reason I don't see this is that a newer
version of [1] has this:

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	}
 
 	/* Now the feature flags better reflect actual CPU features! */
+	if (c == &boot_cpu_data)
+		calculate_host_policy();
 
 	xstate_init(c);
 
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
     /* Was already added by probe_cpuid_faulting() */
 }
 
-static void __init calculate_host_policy(void)
+void __init calculate_host_policy(void)
 {
     struct cpu_policy *p = &host_cpu_policy;
 
@@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
 
 void __init init_guest_cpu_policies(void)
 {
+    /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
     calculate_host_policy();
 
     if ( IS_ENABLED(CONFIG_PV) )

and of course I'm doing my work (and my analysis) with that in place.
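The ordering problem reduces to a toy model: a feature query that reads a policy snapshot which is only filled in by a later initialisation step. This is a hypothetical sketch; the names are illustrative, and it does not reproduce how Xen's cpu_has_* macros are actually implemented, only the "consumer runs before the snapshot is computed" pattern discussed above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a host CPU policy snapshot. */
struct policy {
    bool arat;
};

static struct policy host_policy;          /* static storage: starts zeroed, arat == false */
static const bool hw_reports_arat = true;  /* what CPUID leaf 6 says on this machine */

/* Stand-in for calculate_host_policy(): copy hardware state into the snapshot. */
static void calc_host_policy(void)
{
    host_policy.arat = hw_reports_arat;
}

/* Stand-in for a feature test that reads the snapshot rather than CPUID. */
static bool has_arat(void)
{
    return host_policy.arat;
}

/* Returns whether the ARAT consumer (cf. intel_init_arat()) sees the feature. */
static bool consumer_sees_arat(bool calc_early)
{
    bool seen;

    host_policy = (struct policy){ 0 };    /* reset between runs */
    if (calc_early)
        calc_host_policy();                /* cf. the identify_cpu() hunk */
    seen = has_arat();
    if (!calc_early)
        calc_host_policy();                /* only computed afterwards */
    return seen;
}
```

consumer_sees_arat(false) models the current staging order and yields false despite the hardware reporting ARAT; consumer_sees_arat(true) models computing the policy from identify_cpu() first.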

I may need to break this out and submit it independently, but really the problem
here is that the containing series has been sitting largely unreviewed (and
hence not in a position to plausibly re-post) for almost 5 years. Andrew
(maybe also Roger), I'm open to suggestions on how to proceed. When your xstate
cleanup patches were helped to go in ahead of mine, you promised to help get mine
in afterwards. Yet nothing has happened (and I'm tired of re-submitting
large pieces of work just for the sake of re-submitting, i.e. without having
had [sufficient] feedback on the earlier version).

Jan

[1] https://lists.xen.org/archives/html/xen-devel/2021-04/msg01336.html



* Re: IOMMU faults after S3
  2026-04-02  9:35             ` Marek Marczykowski-Górecki
@ 2026-04-02 10:48               ` Jan Beulich
  2026-04-02 14:47                 ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-02 10:48 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>> The xl dmesg output (from MTL this time):
>>>
>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> 
>> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
>> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
>> might be a good idea, if only to double check.)
> 
> Updated output:
> 
>     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
>     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
>     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
>     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
>     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
>     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
>     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0

So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
half is, however, the address that the low half value is written to. It's hard
to imagine that it would be zero when the low half isn't, but it is about the
last thing I can think of that could explain the observed behavior. (Then again, all
of this is pretty meaningless; see below.)
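For readers following along: per the HPET specification, each timer's FSB interrupt route register (HPET_Tn_ROUTE) is 64 bits wide. The low 32 bits hold the value (the MSI data) and the high 32 bits hold the address that value is written to. The C sketch below decodes two 32-bit reads into those halves and applies a plausibility check. The struct and helper names are hypothetical, and the check assumes the usual x86 MSI address window (0xFEExxxxx), which applies to both remappable and compatibility-format messages; this is an illustration, not Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an HPET timer's FSB route register. */
struct fsb_route {
    uint32_t value;   /* low half (offset +0): data written on interrupt */
    uint32_t address; /* high half (offset +4): where the data is written */
};

/* Combine the two 32-bit reads of the route register into one view. */
static struct fsb_route decode_route(uint32_t lo, uint32_t hi)
{
    struct fsb_route r = { .value = lo, .address = hi };

    return r;
}

/*
 * Plausibility check (assumption): x86 MSI writes target the 0xFEExxxxx
 * window, so an address of 0, or anything else outside that window, would
 * send the interrupt message somewhere bogus.
 */
static bool route_address_plausible(const struct fsb_route *r)
{
    return (r->address >> 20) == 0xFEE;
}
```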

> And the current debug diff attached.

Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
to move that part of logging to the bottom of hpet_msi_write(), or maybe to
where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
making the logging overall less verbose).
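To illustrate the suggestion: a read taken before the update only shows whatever was programmed previously, whereas a read at the end of the write path reflects what the timer will actually use. The sketch below models this against a simulated register file. sim_read32() and sim_write32() are hypothetical stand-ins for hpet_read32() and hpet_write32(). The write order (value at HPET_Tn_ROUTE(n), address at HPET_Tn_ROUTE(n) + 4) is assumed from the HPET register layout.

```c
#include <stdint.h>
#include <stdio.h>

/* Route register offset for timer n, per the HPET specification layout. */
#define HPET_Tn_ROUTE(n) (0x110u + 0x20u * (n))

/* Hypothetical simulated HPET MMIO window, large enough for timers 0..7. */
static uint32_t hpet_mmio[0x400 / 4];

static uint32_t sim_read32(unsigned int off)
{
    return hpet_mmio[off / 4];
}

static void sim_write32(uint32_t val, unsigned int off)
{
    hpet_mmio[off / 4] = val;
}

/*
 * Hypothetical model of the tail of an MSI write path: program the value
 * and the address, then read both halves back and log them. Logging here,
 * rather than on entry, shows the state the timer will actually use.
 */
static void route_write_and_log(unsigned int n, uint32_t data,
                                uint32_t addr, uint32_t readback[2])
{
    sim_write32(data, HPET_Tn_ROUTE(n));
    sim_write32(addr, HPET_Tn_ROUTE(n) + 4);

    readback[0] = sim_read32(HPET_Tn_ROUTE(n));
    readback[1] = sim_read32(HPET_Tn_ROUTE(n) + 4);
    printf("timer %u: route lo %#x, hi %#x\n", n,
           (unsigned int)readback[0], (unsigned int)readback[1]);
}
```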

Jan



* Re: IOMMU faults after S3
  2026-04-02 10:23                 ` Jan Beulich
@ 2026-04-02 14:02                   ` Marek Marczykowski-Górecki
  2026-04-02 14:23                     ` Jan Beulich
  2026-04-07  6:48                     ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02 14:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 3852 bytes --]

On Thu, Apr 02, 2026 at 12:23:08PM +0200, Jan Beulich wrote:
> On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
> >> On 02.04.2026 10:39, Jan Beulich wrote:
> >>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>>> The xl dmesg output (from MTL this time):
> >>>>
> >>>>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>
> >>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
> >>> separately. As per xen-cpuid output (below) ARAT is available.
> >>
> >> For this you may want to also add logging to intel_init_arat(): Since opt_arat
> >> can be false only due to command line option use, it can only be the function
> >> not being called (which looks impossible on plain staging code), or cpu_has_arat
> >> being false despite the xen-cpuid output that you supplied earlier (inexplicable
> >> as well, at least for now).
> > 
> > Hm, I got this:
> > 
> >     (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
> > 
> > so, cpu_has_arat=0 ...
> > next lines are those, to hint when it happened in the boot process:
> > 
> >     (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
> >     (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
> >     (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2
> > 
> > Looks like calculate_host_policy() runs much later...
> 
> Hmm, yes, and that's the problem. The reason I don't see this is that a newer
> version of [1] has this
>
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	}
>  
>  	/* Now the feature flags better reflect actual CPU features! */
> +	if (c == &boot_cpu_data)
> +		calculate_host_policy();
>  
>  	xstate_init(c);
>  
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
>      /* Was already added by probe_cpuid_faulting() */
>  }
>  
> -static void __init calculate_host_policy(void)
> +void __init calculate_host_policy(void)
>  {
>      struct cpu_policy *p = &host_cpu_policy;
>  
> @@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
>  
>  void __init init_guest_cpu_policies(void)
>  {
> +    /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
>      calculate_host_policy();
>  
>      if ( IS_ENABLED(CONFIG_PV) )
> 
> and of course I'm doing my work (and my analysis) with that in place.

FWIW, with this patch applied I get:
(XEN) [18446743899.051851] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743899.051865] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 1

And no IOMMU faults anymore.
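A minimal model of the ordering problem may help here. Following the analysis quoted above, the sketch assumes that cpu_has_arat consults the host CPU policy. Under that assumption, a check made before calculate_host_policy() has run sees an all-zero policy, and XEN_ARAT is never forced. All types and functions below are hypothetical, simplified stand-ins, not Xen code.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a CPU policy. */
struct policy { bool arat; };

static struct policy raw_policy = { .arat = true }; /* what CPUID reports */
static struct policy host_policy;                   /* zero until computed */
static bool xen_arat;                 /* models X86_FEATURE_XEN_ARAT */

/* Models calculate_host_policy(): derive the host policy from raw. */
static void calc_host_policy(void)
{
    host_policy = raw_policy;
}

/* Models intel_init_arat() with opt_arat assumed true. */
static void init_arat(void)
{
    if ( host_policy.arat )
        xen_arat = true;
}

/* Returns whether XEN_ARAT ends up set for the given init ordering. */
static bool boot(bool policy_first)
{
    host_policy = (struct policy){ 0 };
    xen_arat = false;

    if ( policy_first )
        calc_host_policy();
    init_arat();
    if ( !policy_first )
        calc_host_policy();

    return xen_arat;
}
```

With the extra calculate_host_policy() call, the model corresponds to boot(true) and XEN_ARAT gets set. Plain staging corresponds to boot(false), where it does not.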

> I may need to break this out and submit independently, but really the problem
> here is that the containing series has been sitting largely unreviewed (and
> hence not in a position to plausibly re-post) for almost 5 years. Andrew,
> (maybe also Roger) - I'm open to suggestions how to proceed. When your xstate
> cleanup patches were helped to go in ahead of mine, you promised to help mine
> going in afterwards. Yet nothing has happened (and I'm tired of re-submitting
> large pieces of work just for the sake of re-submitting, i.e. without having
> has [sufficient] feedback on the earlier version).
> 
> Jan
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2021-04/msg01336.html

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: IOMMU faults after S3
  2026-04-02 14:02                   ` Marek Marczykowski-Górecki
@ 2026-04-02 14:23                     ` Jan Beulich
  2026-04-07  6:48                     ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-02 14:23 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, xen-devel, Roger Pau Monné

On 02.04.2026 16:02, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 12:23:08PM +0200, Jan Beulich wrote:
>> On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
>>> On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
>>>> On 02.04.2026 10:39, Jan Beulich wrote:
>>>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>>>>> The xl dmesg output (from MTL this time):
>>>>>>
>>>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>>>
>>>>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
>>>>> separately. As per xen-cpuid output (below) ARAT is available.
>>>>
>>>> For this you may want to also add logging to intel_init_arat(): Since opt_arat
>>>> can be false only due to command line option use, it can only be the function
>>>> not being called (which looks impossible on plain staging code), or cpu_has_arat
>>>> being false despite the xen-cpuid output that you supplied earlier (inexplicable
>>>> as well, at least for now).
>>>
>>> Hm, I got this:
>>>
>>>     (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
>>>
>>> so, cpu_has_arat=0 ...
>>> next lines are those, to hint when it happened in the boot process:
>>>
>>>     (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
>>>     (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
>>>     (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2
>>>
>>> Looks like calculate_host_policy() runs much later...
>>
>> Hmm, yes, and that's the problem. The reason I don't see this is that a newer
>> version of [1] has this
>>
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	}
>>  
>>  	/* Now the feature flags better reflect actual CPU features! */
>> +	if (c == &boot_cpu_data)
>> +		calculate_host_policy();
>>  
>>  	xstate_init(c);
>>  
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
>>      /* Was already added by probe_cpuid_faulting() */
>>  }
>>  
>> -static void __init calculate_host_policy(void)
>> +void __init calculate_host_policy(void)
>>  {
>>      struct cpu_policy *p = &host_cpu_policy;
>>  
>> @@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
>>  
>>  void __init init_guest_cpu_policies(void)
>>  {
>> +    /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
>>      calculate_host_policy();
>>  
>>      if ( IS_ENABLED(CONFIG_PV) )
>>
>> and of course I'm doing my work (and my analysis) with that in place.
> 
> FWIW, with this patch applied I get:
> (XEN) [18446743899.051851] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> (XEN) [18446743899.051865] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 1
> 
> And no IOMMU faults anymore.

Right, because then - as intended - HPET broadcast isn't used. You'd see them
again if you put "no-arat" on the command line. (And we really want to figure
out that issue, if at all possible.)
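The decision referred to here is the check `cpuidle_using_deep_cstate() && !boot_cpu_has(X86_FEATURE_XEN_ARAT)` in _disable_pit_irq(). The minimal C sketch below models it. It assumes, per intel_init_arat(), that XEN_ARAT is forced only when both opt_arat and cpu_has_arat are true, and that "no-arat" clears opt_arat. The function names are hypothetical.

```c
#include <stdbool.h>

/* Models intel_init_arat(): XEN_ARAT is forced only if both are true. */
static bool xen_arat_forced(bool opt_arat, bool cpu_has_arat)
{
    return opt_arat && cpu_has_arat;
}

/* Models the check in _disable_pit_irq() that selects HPET broadcast. */
static bool uses_hpet_broadcast(bool deep_cstate, bool xen_arat)
{
    return deep_cstate && !xen_arat;
}
```

With deep C-states in use, uses_hpet_broadcast(true, xen_arat_forced(true, true)) is false, which is the case without faults. Both xen_arat_forced(false, true) ("no-arat") and xen_arat_forced(true, false) (host policy not yet computed) lead back to HPET broadcast.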

Jan



* Re: IOMMU faults after S3
  2026-04-02 10:48               ` Jan Beulich
@ 2026-04-02 14:47                 ` Marek Marczykowski-Górecki
  2026-04-02 14:53                   ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02 14:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 7207 bytes --]

On Thu, Apr 02, 2026 at 12:48:14PM +0200, Jan Beulich wrote:
> On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
> >> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>> The xl dmesg output (from MTL this time):
> >>>
> >>>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> > 
> >> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
> >> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
> >> might be a good idea, if only to double check.)
> > 
> > Updated output:
> > 
> >     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
> >     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
> >     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
> >     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
> >     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> >     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
> >     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
> 
> So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
> half is, however, the address that the low half value is written to. It's hard
> to imagine that it would be zero when the low half isn't, but it is about the
> last thing I can think of which could explain observed behavior. (Yet then, all
> of this is pretty meaningless; see below.)
> 
> > And the current debug diff attached.
> 
> Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
> to move that part of logging to the bottom of hpet_msi_write(), or maybe to
> where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
> making the logging overall less verbose).

This test is with the updated patch (attached) + your extra
calculate_host_policy() call and "no-arat" on cmdline:

    (XEN) [18446743900.569705] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
    (XEN) [18446743900.569720] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
    (XEN) [18446743900.569730] _disable_pit_irq:2662: init: 0
    (XEN) [18446743900.569741] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
    (XEN) [18446743900.569885] hpet_broadcast_resume:673: num_hpets_used: 8
    (XEN) [18446743900.569893] hpet_broadcast_resume:691: cfg: 0x1
    (XEN) [18446743900.569905] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
    (XEN) [18446743900.569935] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.569946] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.569970] hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110
    (XEN) [18446743900.569980] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x110
    (XEN) [18446743900.569989] hpet_broadcast_resume:696: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
    (XEN) [18446743900.570012] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570022] hpet_broadcast_resume:700: i:1, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570040] hpet_broadcast_resume:710: i:1, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x130
    (XEN) [18446743900.570050] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x130
    (XEN) [18446743900.570059] hpet_broadcast_resume:696: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
    (XEN) [18446743900.570082] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570092] hpet_broadcast_resume:700: i:2, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570105] hpet_broadcast_resume:710: i:2, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x150
    (XEN) [18446743900.570114] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x150
    (XEN) [18446743900.570123] hpet_broadcast_resume:696: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
    (XEN) [18446743900.570145] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570155] hpet_broadcast_resume:700: i:3, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570172] hpet_broadcast_resume:710: i:3, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x170
    (XEN) [18446743900.570181] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x170
    (XEN) [18446743900.570191] hpet_broadcast_resume:696: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
    (XEN) [18446743900.570214] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570225] hpet_broadcast_resume:700: i:4, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570242] hpet_broadcast_resume:710: i:4, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x190
    (XEN) [18446743900.570251] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x190
    (XEN) [18446743900.570260] hpet_broadcast_resume:696: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
    (XEN) [18446743900.570282] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570292] hpet_broadcast_resume:700: i:5, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570309] hpet_broadcast_resume:710: i:5, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x1b0
    (XEN) [18446743900.570318] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x1b0
    (XEN) [18446743900.570327] hpet_broadcast_resume:696: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
    (XEN) [18446743900.570351] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570361] hpet_broadcast_resume:700: i:6, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570374] hpet_broadcast_resume:710: i:6, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x1d0
    (XEN) [18446743900.570383] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x1d0
    (XEN) [18446743900.570392] hpet_broadcast_resume:696: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
    (XEN) [18446743900.570415] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
    (XEN) [18446743900.570425] hpet_broadcast_resume:700: i:7, __hpet_setup_msi_irq ret: 0
    (XEN) [18446743900.570442] hpet_broadcast_resume:710: i:7, cfg: 0xc104, HPET_Tn_ROUTE(hpet_events[i].idx): 0x1f0
    (XEN) [18446743900.570451] hpet_broadcast_resume:713: HPET_Tn_ROUTE(hpet_events[i].idx): 0x1f0


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #1.2: 0001-DEBUG.patch --]
[-- Type: text/plain, Size: 5363 bytes --]

From 34e6a34cf0504233776337ace8ac69a92297984e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
 <marmarek@invisiblethingslab.com>
Date: Thu, 2 Apr 2026 11:09:32 +0200
Subject: [PATCH] DEBUG

---
 xen/arch/x86/cpu-policy.c |  1 +
 xen/arch/x86/cpu/intel.c  |  1 +
 xen/arch/x86/hpet.c       | 17 ++++++++++++++++-
 xen/arch/x86/time.c       |  3 +++
 xen/source                |  1 +
 5 files changed, 22 insertions(+), 1 deletion(-)
 create mode 120000 xen/source

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 5273fe0ae435..9916afd5ed68 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -364,6 +364,7 @@ static void __init calculate_host_policy(void)
     struct cpu_policy *p = &host_cpu_policy;
     unsigned int max_extd_leaf;
 
+    printk("%s:%d\n", __func__, __LINE__);
     *p = raw_cpu_policy;
 
     p->basic.max_leaf =
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 18b3c79dc97f..51a3d1c4b5f3 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -671,6 +671,7 @@ const struct cpu_dev __initconst_cf_clobber intel_cpu_dev = {
 
 void __init intel_init_arat(void)
 {
+    printk("%s:%d: opt_arat: %d, cpu_has_arat: %d\n", __func__, __LINE__, opt_arat, cpu_has_arat);
     if ( opt_arat && cpu_has_arat )
         setup_force_cpu_cap(X86_FEATURE_XEN_ARAT);
 }
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 1ea8ae457424..cef060cb18bb 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -283,6 +283,7 @@ static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg *msg)
     if ( iommu_intremap != iommu_intremap_off )
     {
         int rc = iommu_update_ire_from_msi(&ch->msi, msg);
+        printk("%s:%d: iommu_update_ire_from_msi rc: %d\n", __func__, __LINE__, rc);
 
         if ( rc < 0 )
             return rc;
@@ -658,6 +659,8 @@ void hpet_broadcast_resume(void)
     u32 cfg;
     unsigned int i, n;
 
+    printk("%s:%d: hpet_events: %p\n", __func__, __LINE__, hpet_events);
+
     if ( !hpet_events )
         return;
 
@@ -667,25 +670,35 @@ void hpet_broadcast_resume(void)
 
     if ( num_hpets_used > 0 )
     {
+        printk("%s:%d: num_hpets_used: %d\n", __func__, __LINE__, num_hpets_used);
         /* Stop HPET legacy interrupts */
         cfg &= ~HPET_CFG_LEGACY;
         n = num_hpets_used;
     }
     else if ( hpet_events->flags & HPET_EVT_DISABLE )
+    {
+        printk("%s:%d: hpet_events->flags: %#x\n", __func__, __LINE__, hpet_events->flags);
         return;
+    }
     else
     {
         /* Start HPET legacy interrupts */
+        printk("%s:%d\n", __func__, __LINE__);
         cfg |= HPET_CFG_LEGACY;
         n = 1;
     }
 
+    printk("%s:%d: cfg: %#x\n", __func__, __LINE__, cfg);
     hpet_write32(cfg, HPET_CFG);
 
     for ( i = 0; i < n; i++ )
     {
+        printk("%s:%d: i:%d, hpet_events[i].msi.irq: %d, hpet_events[i].flags: %#x\n", __func__, __LINE__, i, hpet_events[i].msi.irq, hpet_events[i].flags);
         if ( hpet_events[i].msi.irq >= 0 )
-            __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
+        {
+            int ret = __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
+            printk("%s:%d: i:%d, __hpet_setup_msi_irq ret: %d\n", __func__, __LINE__, i, ret);
+        }
 
         /* set HPET Tn as oneshot */
         cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
@@ -694,8 +707,10 @@ void hpet_broadcast_resume(void)
         if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
             cfg |= HPET_TN_FSB;
         hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
+        printk("%s:%d: i:%d, cfg: %#x, HPET_Tn_ROUTE(hpet_events[i].idx): %#x\n", __func__, __LINE__, i, cfg, HPET_Tn_ROUTE(hpet_events[i].idx));
 
         hpet_events[i].next_event = STIME_MAX;
+        printk("%s:%d: HPET_Tn_ROUTE(hpet_events[i].idx): %#x\n", __func__, __LINE__, HPET_Tn_ROUTE(hpet_events[i].idx));
     }
 }
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index fed30a919d2c..15113ebdfb6c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2646,6 +2646,7 @@ static int _disable_pit_irq(bool init)
 {
     int ret = 1;
 
+    printk("%s:%d: using_pit: %d, cpu_has_apic: %d\n", __func__, __LINE__, using_pit, cpu_has_apic);
     if ( using_pit || !cpu_has_apic )
         return -1;
 
@@ -2655,8 +2656,10 @@ static int _disable_pit_irq(bool init)
      * XXX dom0 may rely on RTC interrupt delivery, so only enable
      * hpet_broadcast if FSB mode available or if force_hpet_broadcast.
      */
+    printk("%s:%d: cpuidle_using_deep_cstate: %d, boot_cpu_has(X86_FEATURE_XEN_ARAT): %d\n", __func__, __LINE__, cpuidle_using_deep_cstate(), boot_cpu_has(X86_FEATURE_XEN_ARAT));
     if ( cpuidle_using_deep_cstate() && !boot_cpu_has(X86_FEATURE_XEN_ARAT) )
     {
+        printk("%s:%d: init: %d\n", __func__, __LINE__, init);
         init ? hpet_broadcast_init() : hpet_broadcast_resume();
         if ( !hpet_broadcast_is_available() )
         {
diff --git a/xen/source b/xen/source
new file mode 120000
index 000000000000..945c9b46d684
--- /dev/null
+++ b/xen/source
@@ -0,0 +1 @@
+.
\ No newline at end of file
-- 
2.53.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: IOMMU faults after S3
  2026-04-02 14:47                 ` Marek Marczykowski-Górecki
@ 2026-04-02 14:53                   ` Jan Beulich
  2026-04-02 23:06                     ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-02 14:53 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 02.04.2026 16:47, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 12:48:14PM +0200, Jan Beulich wrote:
>> On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
>>> On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
>>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>>>> The xl dmesg output (from MTL this time):
>>>>>
>>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>
>>>> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
>>>> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
>>>> might be a good idea, if only to double check.)
>>>
>>> Updated output:
>>>
>>>     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
>>>     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
>>>     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
>>>     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
>>>     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
>>>     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
>>>     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
>>
>> So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
>> half is, however, the address that the low half value is written to. It's hard
>> to imagine that it would be zero when the low half isn't, but it is about the
>> last thing I can think of which could explain observed behavior. (Yet then, all
>> of this is pretty meaningless; see below.)
>>
>>> And the current debug diff attached.
>>
>> Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
>> to move that part of logging to the bottom of hpet_msi_write(), or maybe to
>> where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
>> making the logging overall less verbose).
> 
> This test is with the updated patch (attached) + your extra
> calculate_host_policy() call and "no-arat" on cmdline:

And IOMMU faults are still occurring as before, I expect.

Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
the high halves at all.

Jan



* Re: IOMMU faults after S3
  2026-04-02 14:53                   ` Jan Beulich
@ 2026-04-02 23:06                     ` Marek Marczykowski-Górecki
  2026-04-07  6:29                       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-02 23:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 7287 bytes --]

On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
> On 02.04.2026 16:47, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 12:48:14PM +0200, Jan Beulich wrote:
> >> On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
> >>> On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
> >>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>>>> The xl dmesg output (from MTL this time):
> >>>>>
> >>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>
> >>>> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
> >>>> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
> >>>> might be a good idea, if only to double check.)
> >>>
> >>> Updated output:
> >>>
> >>>     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
> >>>     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
> >>>     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
> >>>     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
> >>>     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> >>>     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
> >>>     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
> >>
> >> So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
> >> half is, however, the address that the low half value is written to. It's hard
> >> to imagine that it would be zero when the low half isn't, but it is about the
> >> last thing I can think of which could explain observed behavior. (Yet then, all
> >> of this is pretty meaningless; see below.)
> >>
> >>> And the current debug diff attached.
> >>
> >> Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
> >> to move that part of logging to the bottom of hpet_msi_write(), or maybe to
> >> where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
> >> making the logging overall less verbose).
> > 
> > This test is with the updated patch (attached) + your extra
> > calculate_host_policy() call and "no-arat" on cmdline:
> 
> And IOMMU faults still occurring as before, I expect.
> 
> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
> the high halves at all.

I was missing hpet_read32 there...
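The slip is easy to make because HPET_Tn_ROUTE() evaluates to a register offset, not to a register's contents. Assuming the offset formula 0x110 + 0x20 * n from the HPET register layout, the earlier "values" (0x110, 0x130, ..., 0x1f0 for channels 0 to 7) were simply the offsets themselves. The C sketch below contrasts the two, using a hypothetical simulated register file in place of the real hpet_read32().

```c
#include <stdint.h>

/* Route register offset for timer n, per the HPET specification layout. */
#define HPET_Tn_ROUTE(n) (0x110u + 0x20u * (n))

/* Hypothetical simulated HPET register file; not Xen's MMIO accessors. */
static uint32_t regs[0x400 / 4];

static uint32_t sim_read32(unsigned int off)
{
    return regs[off / 4];
}

/* What the earlier debug patch printed: only the offset. */
static unsigned int logged_by_mistake(unsigned int n)
{
    return HPET_Tn_ROUTE(n);
}

/* What was intended: the low (value) and high (address) halves. */
static void route_halves(unsigned int n, uint32_t *lo, uint32_t *hi)
{
    *lo = sim_read32(HPET_Tn_ROUTE(n));
    *hi = sim_read32(HPET_Tn_ROUTE(n) + 4);
}
```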

Updated:
(XEN) [  116.921573] Entering ACPI S3 state.
(XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
(XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
(XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
(XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
(XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
(XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
(XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18
(XEN) [18446743895.089180] hpet_broadcast_resume:696: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
(XEN) [18446743895.089203] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089213] hpet_broadcast_resume:700: i:1, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089242] hpet_broadcast_resume:710: i:1, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf38
(XEN) [18446743895.089254] hpet_broadcast_resume:696: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
(XEN) [18446743895.089278] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089288] hpet_broadcast_resume:700: i:2, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089316] hpet_broadcast_resume:710: i:2, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf58
(XEN) [18446743895.089327] hpet_broadcast_resume:696: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
(XEN) [18446743895.089350] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089361] hpet_broadcast_resume:700: i:3, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089390] hpet_broadcast_resume:710: i:3, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf78
(XEN) [18446743895.089401] hpet_broadcast_resume:696: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
(XEN) [18446743895.089425] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089436] hpet_broadcast_resume:700: i:4, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089465] hpet_broadcast_resume:710: i:4, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf98
(XEN) [18446743895.089476] hpet_broadcast_resume:696: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
(XEN) [18446743895.089499] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089509] hpet_broadcast_resume:700: i:5, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089540] hpet_broadcast_resume:710: i:5, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfb8
(XEN) [18446743895.089551] hpet_broadcast_resume:696: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
(XEN) [18446743895.089574] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089584] hpet_broadcast_resume:700: i:6, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089622] hpet_broadcast_resume:710: i:6, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfd8
(XEN) [18446743895.089633] hpet_broadcast_resume:696: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
(XEN) [18446743895.089655] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743895.089665] hpet_broadcast_resume:700: i:7, __hpet_setup_msi_irq ret: 0
(XEN) [18446743895.089702] hpet_broadcast_resume:710: i:7, cfg: 0xc104, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xff8




-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: IOMMU faults after S3
  2026-04-02 23:06                     ` Marek Marczykowski-Górecki
@ 2026-04-07  6:29                       ` Jan Beulich
  2026-04-07 10:02                         ` Marek Marczykowski-Górecki
  2026-04-07 10:23                         ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-07  6:29 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 03.04.2026 01:06, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
>> On 02.04.2026 16:47, Marek Marczykowski-Górecki wrote:
>>> On Thu, Apr 02, 2026 at 12:48:14PM +0200, Jan Beulich wrote:
>>>> On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
>>>>> On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
>>>>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>>>>>> The xl dmesg output (from MTL this time):
>>>>>>>
>>>>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>>>
>>>>>> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
>>>>>> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
>>>>>> might be a good idea, if only to double check.)
>>>>>
>>>>> Updated output:
>>>>>
>>>>>     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>>     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>>>     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
>>>>>     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
>>>>>     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
>>>>>     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
>>>>>     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
>>>>>     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
>>>>>     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
>>>>
>>>> So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
>>>> half is, however, the address that the low half value is written to. It's hard
>>>> to imagine that it would be zero when the low half isn't, but it is about the
>>>> last thing I can think of which could explain observed behavior. (Yet then, all
>>>> of this is pretty meaningless; see below.)
>>>>
>>>>> And the current debug diff attached.
>>>>
>>>> Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
>>>> to move that part of logging to the bottom of hpet_msi_write(), or maybe to
>>>> where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
>>>> making the logging overall less verbose).
>>>
>>> This test is with the updated patch (attached) + your extra
>>> calculate_host_policy() call and "no-arat" on cmdline:
>>
>> And IOMMU faults still occurring as before, I expect.
>>
>> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
>> the high halves at all.
> 
> I was missing hpet_read32 there...
> 
> Updated:
> (XEN) [  116.921573] Entering ACPI S3 state.
> (XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> (XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> (XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
> (XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
> (XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
> (XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
> (XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> (XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
> (XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
> (XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18

Okay, this would appear to clarify that the address really isn't correct. Yet I'm
confused now by the low half values: In your earlier log there was

hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110

and the like, i.e. clearly a non-zero value. Now all low halves are zero. I'll try
to figure out how the logged values here could result, but consistent data (or an
explanation for the apparent inconsistency) would help.
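A small decoding sketch may help when reading the high halves logged above. It assumes the Intel VT-d remappable interrupt-request address layout: bits 31:20 = 0xFEE, bit 4 = interrupt format, bit 3 = SHV, and the IRTE handle in bits 19:5 plus bit 2. That layout seems likely given the earlier log showing iommu_intremap enabled. decode_route_addr() and struct msi_addr_info are hypothetical names, not Xen code.

Read this way, 0xf18 through 0xff8 carry what look like consecutive remappable handles (120 to 127, one per channel), but they lack the 0xFEE prefix. The HPET's write would therefore fall outside the interrupt window, presumably into page 0. That would be consistent with the "fault addr 0" DMA write faults in the original report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode the high half of an HPET Tn FSB route
 * register (the MSI address), assuming the VT-d remappable format.
 */
struct msi_addr_info {
    bool in_interrupt_window; /* bits 31:20 == 0xFEE */
    bool remappable;          /* bit 4: 1 = remappable format */
    bool shv;                 /* bit 3: subhandle valid */
    unsigned int handle;      /* IRTE index: bits 19:5 = handle[14:0], bit 2 = handle[15] */
};

static struct msi_addr_info decode_route_addr(uint32_t addr)
{
    struct msi_addr_info info;

    info.in_interrupt_window = (addr >> 20) == 0xfee;
    info.remappable = (addr >> 4) & 1;
    info.shv = (addr >> 3) & 1;
    info.handle = ((addr >> 5) & 0x7fff) | (((addr >> 2) & 1) << 15);
    return info;
}
```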

Jan





* Re: IOMMU faults after S3
  2026-04-02 14:02                   ` Marek Marczykowski-Górecki
  2026-04-02 14:23                     ` Jan Beulich
@ 2026-04-07  6:48                     ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2026-04-07  6:48 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, xen-devel, Roger Pau Monné

On 02.04.2026 16:02, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 02, 2026 at 12:23:08PM +0200, Jan Beulich wrote:
>> On 02.04.2026 11:42, Marek Marczykowski-Górecki wrote:
>>> On Thu, Apr 02, 2026 at 10:47:53AM +0200, Jan Beulich wrote:
>>>> On 02.04.2026 10:39, Jan Beulich wrote:
>>>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
>>>>>> The xl dmesg output (from MTL this time):
>>>>>>
>>>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
>>>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>>>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>>>>>
>>>>> XEN_ARAT being off is the one odd aspect here. That'll want tracking down
>>>>> separately. As per xen-cpuid output (below) ARAT is available.
>>>>
>>>> For this you may want to also add logging to intel_init_arat(): Since opt_arat
>>>> can be false only due to command line option use, it can only be the function
>>>> not being called (which looks impossible on plain staging code), or cpu_has_arat
>>>> being false despite the xen-cpuid output that you supplied earlier (inexplicable
>>>> as well, at least for now).
>>>
>>> Hm, I got this:
>>>
>>>     (XEN) [   11.403340] intel_init_arat:674: opt_arat: 1, cpu_has_arat: 0
>>>
>>> so, cpu_has_arat=0 ...
>>> next lines are those, to hint when it happened in the boot process:
>>>
>>>     (XEN) [   11.409754] mwait-idle: MWAIT substates: 0x11112020
>>>     (XEN) [   11.416130] mwait-idle: v0.4.1 model 0xaa
>>>     (XEN) [   11.422396] mwait-idle: lapic_timer_reliable_states 0x2
>>>
>>> Looks like calculate_host_policy() runs much later...
>>
>> Hmm, yes, and that's the problem. The reason I don't see this is that a newer
>> version of [1] has this
>>
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -628,6 +628,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	}
>>  
>>  	/* Now the feature flags better reflect actual CPU features! */
>> +	if (c == &boot_cpu_data)
>> +		calculate_host_policy();
>>  
>>  	xstate_init(c);
>>  
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -384,7 +384,7 @@ void calculate_raw_cpu_policy(void)
>>      /* Was already added by probe_cpuid_faulting() */
>>  }
>>  
>> -static void __init calculate_host_policy(void)
>> +void __init calculate_host_policy(void)
>>  {
>>      struct cpu_policy *p = &host_cpu_policy;
>>  
>> @@ -959,6 +959,7 @@ static void __init calculate_hvm_def_pol
>>  
>>  void __init init_guest_cpu_policies(void)
>>  {
>> +    /* Do this a 2nd time to account for setup_{clear,force}_cpu_cap() uses. */
>>      calculate_host_policy();
>>  
>>      if ( IS_ENABLED(CONFIG_PV) )
>>
>> and of course I'm doing my work (and my analysis) with that in place.
> 
> FWIW, with this patch applied I get:
> (XEN) [18446743899.051851] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> (XEN) [18446743899.051865] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 1
> 
> And no IOMMU faults anymore.

I've Cc-ed you on the formal patch submission; please clarify whether I may
translate the above to Tested-by:.
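The ordering problem discussed in the quoted exchange can be reduced to a toy model. It assumes, as the exchange suggests, that cpu_has_arat reads a host policy that is only filled in by calculate_host_policy(). All toy_* names are hypothetical stand-ins, not Xen's code.

Querying the policy before it has been computed returns the zero-initialised value, so the ARAT check fails even though CPUID reports the feature. Computing the policy right after feature identification, as the quoted identify_cpu() change does, lets the check see the real value.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the host CPU policy; all-false until computed. */
struct toy_policy {
    bool arat;
};

static const bool toy_cpuid_arat = true;  /* what the hardware reports */
static struct toy_policy toy_host_policy;

/* Analogue of calculate_host_policy(): derive the policy from CPUID. */
static void toy_calculate_host_policy(void)
{
    toy_host_policy.arat = toy_cpuid_arat;
}

/* Analogue of a cpu_has_arat check that reads the policy. */
static bool toy_cpu_has_arat(void)
{
    return toy_host_policy.arat;
}

/*
 * Simulated boot; returns whether XEN_ARAT would end up set.
 * early_policy = true models the extra call added to identify_cpu().
 */
static bool toy_boot(bool early_policy)
{
    bool xen_arat;

    toy_host_policy = (struct toy_policy){ .arat = false };
    if ( early_policy )
        toy_calculate_host_policy();
    xen_arat = toy_cpu_has_arat();   /* intel_init_arat() analogue */
    toy_calculate_host_policy();     /* init_guest_cpu_policies() analogue */
    return xen_arat;
}
```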

Jan



* Re: IOMMU faults after S3
  2026-04-07  6:29                       ` Jan Beulich
@ 2026-04-07 10:02                         ` Marek Marczykowski-Górecki
  2026-04-07 10:23                         ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-07 10:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Tue, Apr 07, 2026 at 08:29:48AM +0200, Jan Beulich wrote:
> On 03.04.2026 01:06, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
> >> On 02.04.2026 16:47, Marek Marczykowski-Górecki wrote:
> >>> On Thu, Apr 02, 2026 at 12:48:14PM +0200, Jan Beulich wrote:
> >>>> On 02.04.2026 11:35, Marek Marczykowski-Górecki wrote:
> >>>>> On Thu, Apr 02, 2026 at 10:39:41AM +0200, Jan Beulich wrote:
> >>>>>> On 02.04.2026 10:08, Marek Marczykowski-Górecki wrote:
> >>>>>>> The xl dmesg output (from MTL this time):
> >>>>>>>
> >>>>>>>     (XEN) [  123.477511] Entering ACPI S3 state.
> >>>>>>>     (XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>>>>>     (XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>>>
> >>>>>> Hmm, but what you didn't log is whether __hpet_setup_msi_irq() actually
> >>>>>> succeeded everywhere. (And if it did, also logging HPET_Tn_ROUTE() values
> >>>>>> might be a good idea, if only to double check.)
> >>>>>
> >>>>> Updated output:
> >>>>>
> >>>>>     (XEN) [18446743899.720395] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >>>>>     (XEN) [18446743899.720409] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >>>>>     (XEN) [18446743899.720420] _disable_pit_irq:2662: init: 0
> >>>>>     (XEN) [18446743899.720431] hpet_broadcast_resume:663: hpet_events: ffff83046bc1f080
> >>>>>     (XEN) [18446743899.720579] hpet_broadcast_resume:674: num_hpets_used: 8
> >>>>>     (XEN) [18446743899.720587] hpet_broadcast_resume:692: cfg: 0x1
> >>>>>     (XEN) [18446743899.720599] hpet_broadcast_resume:697: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> >>>>>     (XEN) [18446743899.720612] hpet_msi_write:283: iommu_intremap: 2 (iommu_intremap_off: 0), HPET_Tn_ROUTE(ch->idx): 0x110
> >>>>>     (XEN) [18446743899.720638] hpet_msi_write:287: iommu_update_ire_from_msi rc: 0
> >>>>
> >>>> So it succeeds, and the low half of HPET_Tn_ROUTE also looks plausible. The high
> >>>> half is, however, the address that the low half value is written to. It's hard
> >>>> to imagine that it would be zero when the low half isn't, but it is about the
> >>>> last thing I can think of which could explain observed behavior. (Yet then, all
> >>>> of this is pretty meaningless; see below.)
> >>>>
> >>>>> And the current debug diff attached.
> >>>>
> >>>> Hmm, you log HPET_Tn_ROUTE _before_ our update. That's not very useful. You want
> >>>> to move that part of logging to the bottom of hpet_msi_write(), or maybe to
> >>>> where you also log the per-channel cfg value in hpet_broadcast_resume() (thus
> >>>> making the logging overall less verbose).
> >>>
> >>> This test is with the updated patch (attached) + your extra
> >>> calculate_host_policy() call and "no-arat" on cmdline:
> >>
> >> And IOMMU faults still occurring as before, I expect.
> >>
> >> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
> >> the high halves at all.
> > 
> > I was missing hpet_read32 there...
> > 
> > Updated:
> > (XEN) [  116.921573] Entering ACPI S3 state.
> > (XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> > (XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> > (XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
> > (XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
> > (XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
> > (XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
> > (XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> > (XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
> > (XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
> > (XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18
> 
> Okay, this would appear to clarify that the address really isn't correct. Yet I'm
> confused now by the low half values: In your earlier log there was
> 
> hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110

My earlier logging printed the literal HPET_Tn_ROUTE() macro value, i.e. the
register offset, not hpet_read32() of it...
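The distinction is easy to illustrate. HPET_Tn_ROUTE() only computes a register offset, so printing it yields 0x110 for timer 0 no matter what the hardware holds. hpet_read32() instead returns the register contents.

The sketch below uses the HPET specification's layout, with per-timer blocks 0x20 apart starting at 0x100 and the FSB route register at +0x10. That layout matches the 0x110 seen in the earlier log. fake_hpet_mmio, fake_hpet_read32() and fake_hpet_write32() are hypothetical in-memory stand-ins for the real MMIO accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HPET register offsets: timer N's block starts at 0x100 + N * 0x20. */
#define HPET_Tn_CFG(n)   (0x100 + (n) * 0x20)
#define HPET_Tn_ROUTE(n) (0x110 + (n) * 0x20)

/* Hypothetical in-memory stand-in for the HPET MMIO window. */
static uint8_t fake_hpet_mmio[0x400];

static uint32_t fake_hpet_read32(unsigned int reg)
{
    uint32_t val;

    memcpy(&val, &fake_hpet_mmio[reg], sizeof(val));
    return val;
}

/* Argument order mirrors hpet_write32(value, offset) as used in hpet.c. */
static void fake_hpet_write32(uint32_t val, unsigned int reg)
{
    memcpy(&fake_hpet_mmio[reg], &val, sizeof(val));
}
```

Whatever value is written to the route register, HPET_Tn_ROUTE(0) stays 0x110. Only the read reflects the contents.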

> and the like, i.e. clearly a non-zero value. Now all low halves are zero. I'll try
> to figure out how the logged values here could result, but consistent data (or an
> explanation for the apparent inconsistency) would help.
> 
> Jan
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: IOMMU faults after S3
  2026-04-07  6:29                       ` Jan Beulich
  2026-04-07 10:02                         ` Marek Marczykowski-Górecki
@ 2026-04-07 10:23                         ` Jan Beulich
  2026-04-07 11:34                           ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-07 10:23 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 07.04.2026 08:29, Jan Beulich wrote:
> On 03.04.2026 01:06, Marek Marczykowski-Górecki wrote:
>> On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
>>> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
>>> the high halves at all.
>>
>> I was missing hpet_read32 there...
>>
>> Updated:
>> (XEN) [  116.921573] Entering ACPI S3 state.
>> (XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
>> (XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
>> (XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
>> (XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
>> (XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
>> (XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
>> (XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
>> (XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
>> (XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
>> (XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18
> 
> Okay, this would appear to clarify that the address really isn't correct. Yet I'm
> confused now by the low half values: In your earlier log there was
> 
> hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110
> 
> and the like, i.e. clearly a non-zero value. Now all low halves are zero. I'll try
> to figure out how the logged values here could result, but consistent data (or an
> explanation for the apparent inconsistency) would help.

Could you give the patch below a try?

Jan

x86/HPET: channel handling in hpet_broadcast_resume()

The per-channel ENABLE bit is to solely be driven by hpet_enable_channel()
and hpet_msi_{,un}mask(). It doesn't need setting immediately. Except for
the (possible) channel put in legacy mode we don't do so during boot
either.

Instead reset ->arch.cpu_mask, to avoid msi_compose_msg() yielding an
all-zero message (when the passed in CPU mask has no online CPUs). Nothing
would later call msi_compose_msg() / hpet_msi_write(), and hence nothing
would later produce a well-formed message template in
hpet_events[].msi.msg.

Fixes: 15aa6c67486c ("amd iommu: use base platform MSI implementation")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
As to the Fixes: tag: The issue for the HPET resume case is the
cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) check in
msi_compose_msg(). The earlier cpumask_empty() wasn't a problem, as
cpu_mask_to_apicid() returning a bogus (offline) value didn't have any bad
effect: Before use, a valid destination would have been put in place, but
other parts of .msg were properly set up. Furthermore we also didn't clear
the entire message prior to that change.

--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -685,12 +685,18 @@ void hpet_broadcast_resume(void)
     for ( i = 0; i < n; i++ )
     {
         if ( hpet_events[i].msi.irq >= 0 )
+        {
+            struct irq_desc *desc = irq_to_desc(hpet_events[i].msi.irq);
+
+            cpumask_copy(desc->arch.cpu_mask, cpumask_of(smp_processor_id()));
+
             __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
+        }
 
         /* set HPET Tn as oneshot */
         cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
         cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
-        cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
+        cfg |= HPET_TN_32BIT;
         if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
             cfg |= HPET_TN_FSB;
         hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
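The two changes in the patch can be modelled in isolation. The sketch below is a simplification under stated assumptions, not Xen code:

CPU masks are plain 64-bit bitmaps.
APIC IDs equal CPU numbers.
Interrupt remapping is ignored.
Only CPU 0 is online while hpet_broadcast_resume() runs.

toy_compose_msg() mirrors only the cpumask_intersects() check described in the notes. A mask with no online CPU yields an all-zero message, whereas resetting the mask to the current CPU first yields an address in the 0xFEE00000 window.

toy_resume_cfg() mirrors the HPET_Tn_CFG update, with HPET_TN_ENABLE optionally set as the code did before the patch. The bit values are taken from the HPET specification and are assumed to match Xen's macros.

```c
#include <assert.h>
#include <stdint.h>

/* HPET timer config bits (HPET specification layout). */
#define HPET_TN_LEVEL    0x0002u
#define HPET_TN_ENABLE   0x0004u
#define HPET_TN_PERIODIC 0x0008u
#define HPET_TN_32BIT    0x0100u
#define HPET_TN_FSB      0x4000u

typedef uint64_t toy_cpumask_t;          /* bit N set = CPU N */

struct toy_msi_msg {
    uint32_t address_lo;
    uint32_t data;
};

/* Lowest CPU set in a mask, or -1 if the mask is empty. */
static int toy_first_cpu(toy_cpumask_t mask)
{
    for ( int cpu = 0; cpu < 64; cpu++ )
        if ( mask & ((toy_cpumask_t)1 << cpu) )
            return cpu;
    return -1;
}

/* Models only the "no online CPU in the mask -> all-zero message" path. */
static struct toy_msi_msg toy_compose_msg(toy_cpumask_t mask,
                                          toy_cpumask_t online,
                                          uint8_t vector)
{
    struct toy_msi_msg msg = { 0, 0 };
    int cpu = toy_first_cpu(mask & online);

    if ( cpu < 0 )
        return msg;                      /* mask does not intersect online map */

    msg.address_lo = 0xfee00000u | ((uint32_t)cpu << 12); /* destination ID */
    msg.data = vector;
    return msg;
}

/* Resume path for one channel; fix = 1 applies the patch's cpu_mask reset. */
static struct toy_msi_msg toy_resume_msg(toy_cpumask_t *cpu_mask,
                                         toy_cpumask_t online,
                                         unsigned int this_cpu,
                                         int fix, uint8_t vector)
{
    if ( fix )
        *cpu_mask = (toy_cpumask_t)1 << this_cpu;
    return toy_compose_msg(*cpu_mask, online, vector);
}

/* Per-channel cfg value written on resume; set_enable = 1 is the old code. */
static uint32_t toy_resume_cfg(uint32_t cfg, int legacy, int set_enable)
{
    cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
    cfg |= HPET_TN_32BIT;
    if ( set_enable )
        cfg |= HPET_TN_ENABLE;
    if ( !legacy )
        cfg |= HPET_TN_FSB;
    return cfg;
}
```

The cfg values logged in this thread can be reproduced under one further assumption: a pre-write register value of 0x8030 for timer 0 (capability bits only) and 0x8000 for the other timers. With set_enable = 1, toy_resume_cfg() gives the pre-patch values (0xc134, 0xc104). With set_enable = 0, it gives the post-patch values (0xc130, 0xc100).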




* Re: IOMMU faults after S3
  2026-04-07 10:23                         ` Jan Beulich
@ 2026-04-07 11:34                           ` Marek Marczykowski-Górecki
  2026-04-07 11:52                             ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-07 11:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Tue, Apr 07, 2026 at 12:23:16PM +0200, Jan Beulich wrote:
> On 07.04.2026 08:29, Jan Beulich wrote:
> > On 03.04.2026 01:06, Marek Marczykowski-Górecki wrote:
> >> On Thu, Apr 02, 2026 at 04:53:31PM +0200, Jan Beulich wrote:
> >>> Sadly you now log the low halves of HPET_Tn_ROUTE twice, while you don't log
> >>> the high halves at all.
> >>
> >> I was missing hpet_read32 there...
> >>
> >> Updated:
> >> (XEN) [  116.921573] Entering ACPI S3 state.
> >> (XEN) [18446743895.088893] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
> >> (XEN) [18446743895.088907] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
> >> (XEN) [18446743895.088918] _disable_pit_irq:2662: init: 0
> >> (XEN) [18446743895.088928] hpet_broadcast_resume:662: hpet_events: ffff83046bc1f080
> >> (XEN) [18446743895.089072] hpet_broadcast_resume:673: num_hpets_used: 8
> >> (XEN) [18446743895.089081] hpet_broadcast_resume:691: cfg: 0x1
> >> (XEN) [18446743895.089092] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
> >> (XEN) [18446743895.089122] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
> >> (XEN) [18446743895.089132] hpet_broadcast_resume:700: i:0, __hpet_setup_msi_irq ret: 0
> >> (XEN) [18446743895.089168] hpet_broadcast_resume:710: i:0, cfg: 0xc134, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xf18
> > 
> > Okay, this would appear to clarify that the address really isn't correct. Yet I'm
> > confused now by the low half values: In your earlier log there was
> > 
> > hpet_broadcast_resume:710: i:0, cfg: 0xc134, HPET_Tn_ROUTE(hpet_events[i].idx): 0x110
> > 
> > and the like, i.e. clearly a non-zero value. Now all low halves are zero. I'll try
> > to figure out how the logged values here could result, but consistent data (or an
> > explanation for the apparent inconsistency) would help.
> 
> Could you give the patch below a try?
> 
> Jan
> 
> x86/HPET: channel handling in hpet_broadcast_resume()
> 
> The per-channel ENABLE bit is to solely be driven by hpet_enable_channel()
> and hpet_msi_{,un}mask(). It doesn't need setting immediately. Except for
> the (possible) channel put in legacy mode we don't do so during boot
> either.
> 
> Instead reset ->arch.cpu_mask, to avoid msi_compose_msg() yielding an
> all-zero message (when the passed in CPU mask has no online CPUs). Nothing
> would later call msi_compose_msg() / hpet_msi_write(), and hence nothing
> would later produce a well-formed message template in
> hpet_events[].msi.msg.
> 
> Fixes: 15aa6c67486c ("amd iommu: use base platform MSI implementation")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This appears to fix the IOMMU faults.
Started with no-arat, the debug output is now this:

(XEN) [18446743900.509455] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743900.509470] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
(XEN) [18446743900.509480] _disable_pit_irq:2662: init: 0
(XEN) [18446743900.509491] hpet_broadcast_resume:662: hpet_events: ffff830461b3f080
(XEN) [18446743900.509636] hpet_broadcast_resume:673: num_hpets_used: 8
(XEN) [18446743900.509644] hpet_broadcast_resume:691: cfg: 0x1
(XEN) [18446743900.509656] hpet_broadcast_resume:696: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
(XEN) [18446743900.509687] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509698] hpet_broadcast_resume:705: i:0, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509728] hpet_broadcast_resume:715: i:0, cfg: 0xc130, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f18
(XEN) [18446743900.509739] hpet_broadcast_resume:696: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
(XEN) [18446743900.509762] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509772] hpet_broadcast_resume:705: i:1, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509803] hpet_broadcast_resume:715: i:1, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f38
(XEN) [18446743900.509814] hpet_broadcast_resume:696: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
(XEN) [18446743900.509838] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509848] hpet_broadcast_resume:705: i:2, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509877] hpet_broadcast_resume:715: i:2, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f58
(XEN) [18446743900.509888] hpet_broadcast_resume:696: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
(XEN) [18446743900.509912] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509922] hpet_broadcast_resume:705: i:3, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.509952] hpet_broadcast_resume:715: i:3, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f78
(XEN) [18446743900.509963] hpet_broadcast_resume:696: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
(XEN) [18446743900.509987] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.509997] hpet_broadcast_resume:705: i:4, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510027] hpet_broadcast_resume:715: i:4, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00f98
(XEN) [18446743900.510038] hpet_broadcast_resume:696: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
(XEN) [18446743900.510062] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510072] hpet_broadcast_resume:705: i:5, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510102] hpet_broadcast_resume:715: i:5, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00fb8
(XEN) [18446743900.510113] hpet_broadcast_resume:696: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
(XEN) [18446743900.510138] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510149] hpet_broadcast_resume:705: i:6, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510179] hpet_broadcast_resume:715: i:6, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00fd8
(XEN) [18446743900.510191] hpet_broadcast_resume:696: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
(XEN) [18446743900.510214] hpet_msi_write:286: iommu_update_ire_from_msi rc: 0
(XEN) [18446743900.510224] hpet_broadcast_resume:705: i:7, __hpet_setup_msi_irq ret: 0
(XEN) [18446743900.510253] hpet_broadcast_resume:715: i:7, cfg: 0xc100, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx)): 0, hpet_read32(HPET_Tn_ROUTE(hpet_events[i].idx) + 4): 0xfee00ff8


> ---
> As to the Fixes: tag: The issue for the HPET resume case is the
> cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) check in
> msi_compose_msg(). The earlier cpumask_empty() check wasn't a problem, as
> cpu_mask_to_apicid() returning a bogus (offline) value didn't have any bad
> effect: a valid destination would have been put in place before use, and
> the other parts of .msg were properly set up. Furthermore, we also didn't
> clear the entire message prior to that change.
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -685,12 +685,18 @@ void hpet_broadcast_resume(void)
>      for ( i = 0; i < n; i++ )
>      {
>          if ( hpet_events[i].msi.irq >= 0 )
> +        {
> +            struct irq_desc *desc = irq_to_desc(hpet_events[i].msi.irq);
> +
> +            cpumask_copy(desc->arch.cpu_mask, cpumask_of(smp_processor_id()));
> +
>              __hpet_setup_msi_irq(irq_to_desc(hpet_events[i].msi.irq));
> +        }
>  
>          /* set HPET Tn as oneshot */
>          cfg = hpet_read32(HPET_Tn_CFG(hpet_events[i].idx));
>          cfg &= ~(HPET_TN_LEVEL | HPET_TN_PERIODIC);
> -        cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
> +        cfg |= HPET_TN_32BIT;
>          if ( !(hpet_events[i].flags & HPET_EVT_LEGACY) )
>              cfg |= HPET_TN_FSB;
>          hpet_write32(cfg, HPET_Tn_CFG(hpet_events[i].idx));
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: IOMMU faults after S3
  2026-04-07 11:34                           ` Marek Marczykowski-Górecki
@ 2026-04-07 11:52                             ` Jan Beulich
  2026-04-07 11:56                               ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2026-04-07 11:52 UTC
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

On 07.04.2026 13:34, Marek Marczykowski-Górecki wrote:
> On Tue, Apr 07, 2026 at 12:23:16PM +0200, Jan Beulich wrote:
>> x86/HPET: channel handling in hpet_broadcast_resume()
>>
>> The per-channel ENABLE bit is to solely be driven by hpet_enable_channel()
>> and hpet_msi_{,un}mask(). It doesn't need setting immediately. Except for
>> the (possible) channel put in legacy mode we don't do so during boot
>> either.
>>
>> Instead reset ->arch.cpu_mask, to avoid msi_compose_msg() yielding an
>> all-zero message (when the passed in CPU mask has no online CPUs). Nothing
>> would later call msi_compose_msg() / hpet_msi_write(), and hence nothing
>> would later produce a well-formed message template in
>> hpet_events[].msi.msg.
>>
>> Fixes: 15aa6c67486c ("amd iommu: use base platform MSI implementation")
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> This appears to fix the IOMMU faults.
> Started with no-arat, the debug output is now this:

Same question here: May I translate this to Tested-by:?

Jan




* Re: IOMMU faults after S3
  2026-04-07 11:52                             ` Jan Beulich
@ 2026-04-07 11:56                               ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 32+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-07 11:56 UTC
  To: Jan Beulich; +Cc: xen-devel

On Tue, Apr 07, 2026 at 01:52:18PM +0200, Jan Beulich wrote:
> On 07.04.2026 13:34, Marek Marczykowski-Górecki wrote:
> > On Tue, Apr 07, 2026 at 12:23:16PM +0200, Jan Beulich wrote:
> >> x86/HPET: channel handling in hpet_broadcast_resume()
> >>
> >> The per-channel ENABLE bit is to solely be driven by hpet_enable_channel()
> >> and hpet_msi_{,un}mask(). It doesn't need setting immediately. Except for
> >> the (possible) channel put in legacy mode we don't do so during boot
> >> either.
> >>
> >> Instead reset ->arch.cpu_mask, to avoid msi_compose_msg() yielding an
> >> all-zero message (when the passed in CPU mask has no online CPUs). Nothing
> >> would later call msi_compose_msg() / hpet_msi_write(), and hence nothing
> >> would later produce a well-formed message template in
> >> hpet_events[].msi.msg.
> >>
> >> Fixes: 15aa6c67486c ("amd iommu: use base platform MSI implementation")
> >> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > This appears to fix the IOMMU faults.
> > Started with no-arat, the debug output is now this:
> 
> Same question here: May I translate this to Tested-by:?

Yes.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Thread overview: 32+ messages
2026-03-27 10:19 IOMMU faults after S3 Marek Marczykowski-Górecki
2026-03-27 10:56 ` Teddy Astie
2026-03-27 10:59   ` Marek Marczykowski-Górecki
2026-03-27 12:23 ` Andrew Cooper
2026-04-01  7:14 ` Jan Beulich
2026-04-01  7:20   ` Andrew Cooper
2026-04-01  8:11     ` Jan Beulich
2026-04-01 20:30       ` Marek Marczykowski-Górecki
2026-04-02  6:55         ` Jan Beulich
2026-04-01  8:52   ` Jan Beulich
2026-04-01 23:17     ` Marek Marczykowski-Górecki
2026-04-02  7:01       ` Jan Beulich
2026-04-02  8:08         ` Marek Marczykowski-Górecki
2026-04-02  8:39           ` Jan Beulich
2026-04-02  8:47             ` Jan Beulich
2026-04-02  9:42               ` Marek Marczykowski-Górecki
2026-04-02 10:23                 ` Jan Beulich
2026-04-02 14:02                   ` Marek Marczykowski-Górecki
2026-04-02 14:23                     ` Jan Beulich
2026-04-07  6:48                     ` Jan Beulich
2026-04-02  9:35             ` Marek Marczykowski-Górecki
2026-04-02 10:48               ` Jan Beulich
2026-04-02 14:47                 ` Marek Marczykowski-Górecki
2026-04-02 14:53                   ` Jan Beulich
2026-04-02 23:06                     ` Marek Marczykowski-Górecki
2026-04-07  6:29                       ` Jan Beulich
2026-04-07 10:02                         ` Marek Marczykowski-Górecki
2026-04-07 10:23                         ` Jan Beulich
2026-04-07 11:34                           ` Marek Marczykowski-Górecki
2026-04-07 11:52                             ` Jan Beulich
2026-04-07 11:56                               ` Marek Marczykowski-Górecki
2026-04-01  8:58   ` Jan Beulich
