From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: IOMMU faults after S3
Date: Thu, 2 Apr 2026 10:08:42 +0200
Message-ID: <ac4kCq87SQSc6ddV@mail-itl>
In-Reply-To: <933a3e95-33d2-4e20-a4d5-2d8b20c2da7f@suse.com>
On Thu, Apr 02, 2026 at 09:01:12AM +0200, Jan Beulich wrote:
> On 02.04.2026 01:17, Marek Marczykowski-Górecki wrote:
> > On Wed, Apr 01, 2026 at 10:52:37AM +0200, Jan Beulich wrote:
> >> On 01.04.2026 09:14, Jan Beulich wrote:
> >>> On 27.03.2026 11:19, Marek Marczykowski-Górecki wrote:
> >>>> I noticed that on some systems, there are a lot of IOMMU faults after
> >>>> S3. I can see it also on a laptop with MTL, but it affects also the ADL
> >>>> gitlab runner:
> >>>>
> >>>> https://gitlab.com/xen-project/hardware/xen/-/jobs/13661033722
> >>>> (XEN) [ 37.201160] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>> (XEN) [ 37.201164] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>> (XEN) [ 37.202332] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>>> (XEN) [ 37.202339] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>>
> >>>> Interestingly, the 0000:00:1e.6 device is not even listed by lspci.
> >>>>
> >>>> The issue is present only on staging, not staging-4.21.
> >>>>
> >>>> Bisect says:
> >>>>
> >>>> 5ec93b2f19ff8873fca65d38c1164b0a56d3898b is the first bad commit
> >>>> commit 5ec93b2f19ff8873fca65d38c1164b0a56d3898b
> >>>> Author: Jan Beulich <jbeulich@suse.com>
> >>>> Date: Thu Jan 22 14:13:35 2026 +0100
> >>>>
> >>>> x86/HPET: drop .set_affinity hook
> >>>
> >>> Looking into this, I find several things I can't quite understand (yet).
> >>> First there is
> >>>
> >>> (XEN) [000000456c0fe39f] Disabling HPET for being unreliable
> >>>
> >>> which looks to only affect clocksource selection, but not use as
> >>> broadcast source for CPU-idle management. (This may be an independent
> >>> issue.)
> >>>
> >>> Then there is
> >>>
> >>> (XEN) [ 2.760248] HPET: 8 timers usable for broadcast (8 total)
> >>>
> >>> which should only occur on ARAT-incapable systems. That should only be
> >>> older hardware. (On my much older Skylake I don't see this line, for
> >>> example.) What does CPUID leaf 6 have on this system? Sadly xen-cpuid
> >>> is purely featureset based, and hence doesn't expose info about that
> >>> leaf. The leaf also isn't exposed to domains, so CPUID output in Dom0
> >>> isn't useful to look at either. It would need to be CPUID output on a
> >>> bare metal kernel.
> >>>
> >>> Further I suspect the fingered commit may only have uncovered an issue
> >>> elsewhere. I don't think we clear any context table entries during
> >>> suspend or resume. Hence in
> >>>
> >>> (XEN) [ 20.554813] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> >>> (XEN) [ 20.554819] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
> >>>
> >>> the latter message is confusing me.
> >>>
> >>> The fault address being zero may, otoh, be a hint of hpet_msi_write()
> >>> never having run post-resume. Which may be the connection to the
> >>> dropping of hpet_msi_set_affinity(), as that did call that function.
> >>
> >> There clearly is an issue with the handling of the max_cstate variable,
> >> but I expect you don't use xenpm to limit usable C-states (there clearly
> >> is no respective command line option in the log you referenced)?
> >
> > No, I don't think so.
> >
> >> From what the log has, I conclude hpet_broadcast_resume() is called.
> >
> > I don't think so... I applied changes as attached and got this on
> > resume:
> >
> > (XEN) [ 69.486120] Enabling non-boot CPUs ...
> > (XEN) [ 69.486404] mwait-idle: state C1 is disabled
> > (XEN) [ 69.587869] mwait-idle: state C1 is disabled
> > (XEN) [ 69.588008] mwait-idle: state C1 is disabled
> > (XEN) [ 69.689438] mwait-idle: state C1 is disabled
> > (XEN) [ 69.689608] mwait-idle: state C1 is disabled
> > (XEN) [ 69.791066] mwait-idle: state C1 is disabled
> > (XEN) [ 69.791334] mwait-idle: state C1 is disabled
> > (XEN) [ 69.892938] mwait-idle: state C1 is disabled
> > (XEN) [ 69.893209] mwait-idle: state C1 is disabled
> > (XEN) [ 69.994890] mwait-idle: state C1 is disabled
> > (XEN) [ 69.995096] mwait-idle: state C1 is disabled
> > (XEN) [ 70.096638] mwait-idle: state C1 is disabled
> > (XEN) [ 70.096915] mwait-idle: state C1 is disabled
> > (XEN) [ 70.097093] mwait-idle: state C1 is disabled
> > (XEN) [ 70.097272] mwait-idle: state C1 is disabled
> > (XEN) [ 70.203357] [VT-D]DMAR:[DMA Write] Request device [0000:00:1e.6] fault addr 0
> > (XEN) [ 70.203363] [VT-D]DMAR: reason 02 - Present bit in context entry is clear
>
> That was on the serial console or from xl dmesg? I ask because console_resume()
> runs after time_resume(), so nothing appearing on the serial console would be
> expected (I think).
Ah, right, that's why I don't see my messages.
The xl dmesg output (from MTL this time):
(XEN) [ 123.477511] Entering ACPI S3 state.
(XEN) [18446743903.571842] _disable_pit_irq:2649: using_pit: 0, cpu_has_apic: 1
(XEN) [18446743903.571856] _disable_pit_irq:2659: cpuidle_using_deep_cstate: 1, boot_cpu_has(X86_FEATURE_XEN_ARAT): 0
(XEN) [18446743903.571866] _disable_pit_irq:2662: init: 0
(XEN) [18446743903.571877] hpet_broadcast_resume:661: hpet_events: ffff83046bc1f080
(XEN) [18446743903.572020] hpet_broadcast_resume:672: num_hpets_used: 8
(XEN) [18446743903.572029] hpet_broadcast_resume:690: cfg: 0x1
(XEN) [18446743903.572040] hpet_broadcast_resume:695: i:0, hpet_events[i].msi.irq: 122, hpet_events[i].flags: 0
(XEN) [18446743903.572081] hpet_broadcast_resume:706: i:0, cfg: 0xc134
(XEN) [18446743903.572089] hpet_broadcast_resume:695: i:1, hpet_events[i].msi.irq: 123, hpet_events[i].flags: 0
(XEN) [18446743903.572123] hpet_broadcast_resume:706: i:1, cfg: 0xc104
(XEN) [18446743903.572132] hpet_broadcast_resume:695: i:2, hpet_events[i].msi.irq: 124, hpet_events[i].flags: 0
(XEN) [18446743903.572167] hpet_broadcast_resume:706: i:2, cfg: 0xc104
(XEN) [18446743903.572175] hpet_broadcast_resume:695: i:3, hpet_events[i].msi.irq: 125, hpet_events[i].flags: 0
(XEN) [18446743903.572210] hpet_broadcast_resume:706: i:3, cfg: 0xc104
(XEN) [18446743903.572218] hpet_broadcast_resume:695: i:4, hpet_events[i].msi.irq: 126, hpet_events[i].flags: 0
(XEN) [18446743903.572252] hpet_broadcast_resume:706: i:4, cfg: 0xc104
(XEN) [18446743903.572261] hpet_broadcast_resume:695: i:5, hpet_events[i].msi.irq: 127, hpet_events[i].flags: 0
(XEN) [18446743903.572294] hpet_broadcast_resume:706: i:5, cfg: 0xc104
(XEN) [18446743903.572303] hpet_broadcast_resume:695: i:6, hpet_events[i].msi.irq: 128, hpet_events[i].flags: 0
(XEN) [18446743903.572338] hpet_broadcast_resume:706: i:6, cfg: 0xc104
(XEN) [18446743903.572347] hpet_broadcast_resume:695: i:7, hpet_events[i].msi.irq: 129, hpet_events[i].flags: 0
(XEN) [18446743903.572382] hpet_broadcast_resume:706: i:7, cfg: 0xc104
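To make the per-channel cfg values above easier to read, here is a minimal sketch that decodes them. It assumes the per-timer configuration bit layout from the IA-PC HPET specification. The flag names and the decode_tn_cfg() helper are illustrative labels of mine, not copied from the Xen tree.

```python
# Per-timer HPET configuration bits (Tn_CFG), assumed from the IA-PC HPET spec.
# The names are illustrative labels, not identifiers taken from Xen.
HPET_TN_FLAGS = {
    0x0002: "LEVEL",
    0x0004: "ENABLE",
    0x0008: "PERIODIC",
    0x0010: "PERIODIC_CAP",
    0x0020: "64BIT_CAP",
    0x0040: "SETVAL",
    0x0100: "32BIT",
    0x4000: "FSB",
    0x8000: "FSB_CAP",
}
HPET_TN_ROUTE_MASK = 0x3E00   # I/O APIC routing field, bits 9..13
HPET_TN_ROUTE_SHIFT = 9


def decode_tn_cfg(cfg):
    """Split a Tn_CFG value into (list of set flag names, routing field)."""
    flags = [name for bit, name in sorted(HPET_TN_FLAGS.items()) if cfg & bit]
    route = (cfg & HPET_TN_ROUTE_MASK) >> HPET_TN_ROUTE_SHIFT
    return flags, route


# The two distinct values seen in the hpet_broadcast_resume lines of the log.
for value in (0xC134, 0xC104):
    flags, route = decode_tn_cfg(value)
    print(f"{value:#06x}: {' '.join(flags)} route={route}")
```

Under that reading, both values have ENABLE and FSB set, and channel 0 differs from the others only in capability bits.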
And the xen-cpuid -p output from this system:
Xen reports there are maximum 120 leaves and 2 MSRs
Raw policy: 48 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 00000023:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000005:ffffffff -> 00000040:00000040:00000003:11112020
00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
00000007:00000001 -> 40400910:00000001:00000000:00040000
00000007:00000002 -> 00000000:00000000:00000000:0000003f
0000000a:ffffffff -> 07300805:00000000:00000007:00008603
0000000b:00000000 -> 00000001:00000002:00000100:00000020
0000000b:00000001 -> 00000007:00000016:00000201:00000020
0000000d:00000000 -> 00000207:00000000:00000a88:00000000
0000000d:00000001 -> 0000000f:00000000:00019900:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
0000000d:00000008 -> 00000080:00000000:00000001:00000000
0000000d:00000009 -> 00000008:00000a80:00000000:00000000
0000000d:0000000b -> 00000010:00000000:00000001:00000000
0000000d:0000000c -> 00000018:00000000:00000001:00000000
0000000d:0000000f -> 00000328:00000000:00000001:00000000
0000000d:00000010 -> 00000008:00000000:00000001:00000000
80000000:ffffffff -> 80000008:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000121:2c100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000007:ffffffff -> 00000000:00000000:00000000:00000100
80000008:ffffffff -> 0000302e:00000000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 000000000d89fd6b
Host policy: 41 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:20800800:77fafbff:bfebfbff
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000005:ffffffff -> 00000040:00000040:00000003:11112020
00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
00000007:00000000 -> 00000002:239c27eb:994007ac:fc18c410
00000007:00000001 -> 40000910:00000001:00000000:00040000
00000007:00000002 -> 00000000:00000000:00000000:0000003f
0000000b:00000000 -> 00000001:00000002:00000100:00000020
0000000b:00000001 -> 00000007:00000016:00000201:00000020
0000000d:00000000 -> 00000207:00000000:00000a88:00000000
0000000d:00000001 -> 0000000f:00000000:00000000:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
0000000d:00000009 -> 00000008:00000a80:00000000:00000000
80000000:ffffffff -> 80000008:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000121:2c100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000007:ffffffff -> 00000000:00000000:00000000:00000100
80000008:ffffffff -> 0000302e:00000000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 400000000d89fd6b
PV Max policy: 58 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:00800800:f6f83203:1fc9cbf5
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000007:00000000 -> 00000002:218c0329:18400700:ac004410
00000007:00000001 -> 00000810:00000000:00000000:00000000
00000007:00000002 -> 00000000:00000000:00000000:00000021
0000000d:00000000 -> 00000007:00000000:00000340:00000000
0000000d:00000001 -> 00000007:00000000:00000000:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
80000000:ffffffff -> 80000021:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000123:28100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000007:ffffffff -> 00000000:00000000:00000000:00000100
80000008:ffffffff -> 0000302e:00001000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 400000001d0ae167
HVM Max policy: 65 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:00800800:f7fa3223:1fcbfbff
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000007:00000000 -> 00000002:219c07ab:9840070c:bc004410
00000007:00000001 -> 00000810:00000000:00000000:00000000
00000007:00000002 -> 00000000:00000000:00000000:00000037
0000000d:00000000 -> 00000207:00000000:00000a88:00000000
0000000d:00000001 -> 0000000f:00000000:00000000:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
0000000d:00000009 -> 00000008:00000a80:00000000:00000000
80000000:ffffffff -> 80000021:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000123:2c100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000007:ffffffff -> 00000000:00000000:00000000:00000100
80000008:ffffffff -> 0000302e:00101000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 400000001d0ae167
PV Default policy: 33 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:00800800:f6d83203:1fc9cbf5
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000007:00000000 -> 00000002:218c0329:00400700:ac004410
00000007:00000001 -> 00000810:00000000:00000000:00000000
00000007:00000002 -> 00000000:00000000:00000000:00000021
0000000d:00000000 -> 00000007:00000000:00000340:00000000
0000000d:00000001 -> 00000007:00000000:00000000:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
80000000:ffffffff -> 80000008:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000121:28100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000008:ffffffff -> 0000302e:00001000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 400000000d08e163
HVM Default policy: 40 leaves, 2 MSRs
CPUID:
leaf subleaf -> eax ebx ecx edx
00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
00000001:ffffffff -> 000a06a4:00800800:f7fa3203:1fcbfbff
00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
00000004:00000000 -> fc004121:02c0003f:0000003f:00000000
00000004:00000001 -> fc004122:03c0003f:0000003f:00000000
00000004:00000002 -> fc01c143:03c0003f:000007ff:00000000
00000004:00000003 -> fc0fc163:02c0003f:00007fff:00000004
00000007:00000000 -> 00000002:219c07ab:8040070c:bc004410
00000007:00000001 -> 00000810:00000000:00000000:00000000
00000007:00000002 -> 00000000:00000000:00000000:00000037
0000000d:00000000 -> 00000207:00000000:00000a88:00000000
0000000d:00000001 -> 0000000f:00000000:00000000:00000000
0000000d:00000002 -> 00000100:00000240:00000000:00000000
0000000d:00000009 -> 00000008:00000a80:00000000:00000000
80000000:ffffffff -> 80000008:00000000:00000000:00000000
80000001:ffffffff -> 00000000:00000000:00000121:2c100800
80000002:ffffffff -> 65746e49:2952286c:726f4320:4d542865
80000003:ffffffff -> 6c552029:20617274:35312037:00004835
80000006:ffffffff -> 00000000:00000000:08007040:00000000
80000008:ffffffff -> 0000302e:00101000:00000000:00000000
MSRs:
index -> value
000000ce -> 0000000080000000
0000010a -> 400000000d08e163
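On the leaf 6 question: ARAT is reported in CPUID.06H:EAX bit 2. Below is a small sketch that pulls the leaf 6 row out of a dump in the format above and tests that bit. The leaf_eax() helper is hypothetical and only assumes the "leaf:subleaf -> eax:ebx:ecx:edx" row layout shown in this output.

```python
ARAT_BIT = 1 << 2  # CPUID.06H:EAX[2], "Always Running APIC Timer"


def leaf_eax(dump, leaf):
    """Return EAX of the first row for `leaf` in a xen-cpuid -p style dump."""
    for line in dump.splitlines():
        if "->" not in line:
            continue
        key, regs = (part.strip() for part in line.split("->", 1))
        if ":" not in key:
            # Header rows ("leaf subleaf") and MSR rows have no leaf:subleaf key.
            continue
        if int(key.split(":")[0], 16) == leaf:
            return int(regs.split(":")[0], 16)
    return None


# Two rows copied from the raw policy above.
sample = """\
00000005:ffffffff -> 00000040:00000040:00000003:11112020
00000006:ffffffff -> 00dfcff7:00000002:00000409:00040003
"""

eax = leaf_eax(sample, 6)
print(f"leaf 6 EAX = {eax:#010x}, ARAT bit set: {bool(eax & ARAT_BIT)}")
```

This is only a reading of the raw register value (bit 2 is set in 00dfcff7). It says nothing about how Xen derives X86_FEATURE_XEN_ARAT, which the resume log above reports as 0.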
> Without hpet_broadcast_resume() running, I don't think I could explain how the
> channels (and their FSB interrupts) would get enabled.
>
> Jan
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab