From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: S3 regression related to XSA-471 patches
Date: Wed, 6 Aug 2025 12:23:56 +0200
Message-ID: <aJMtPLNqQFbGg5cs@mail-itl>
Hi,
We've got several reports that S3 reliability recently regressed. We
identified that it's definitely related to the XSA-471 patches, and
bisection points at "x86/idle: Remove broken MWAIT implementation". I
don't have reliable reproduction steps, so I'm not 100% sure whether it's
really this patch or maybe an earlier one - but it's definitely already
broken at this point in the series. Most reports are about Xen 4.17 (as
that's what the stable Qubes OS version currently uses), but I think
I've seen somebody reporting the issue on 4.19 too (though I don't have
clear evidence, in particular on whether it's the same issue).
The problem manifests as the system freezing on S3 resume. Sometimes it
manages to show the screenlocker password prompt, and sometimes one can
interact with it for a second or two. But then it freezes, the mouse
stops moving, etc. (but there is no reboot).
One time I managed to get past the screenlocker and interact with dom0
for a few minutes before it froze. Resuming domUs didn't happen (the
Qubes-specific script that resumes them hung), and no logs from this
case persisted on the disk (on disk it looked like the system never
resumed). Generally it looked like some CPUs were stuck.
The issue appears more likely to hit if some domUs are active around
suspend/resume time. While Qubes OS does suspend (not just pause) them
for the duration of the host S3, some activity before/after does appear
to matter. My test case, which has a ~30-40% reproduction rate, involves
several Firefox instances playing YouTube videos.
I've talked with Andrew about it a bit, without reaching many
conclusions. Initial reports mentioned only MTL and RPL systems, so we
focused on something related to weird topology. But just today I've got
a report of the same happening on KBL too...
Another observation (possibly invalidated by today's report...) is that
all reports were about systems running Coreboot (but not only the
Dasharo flavor - at least one was Star Labs).
Most reports are collected at https://github.com/QubesOS/qubes-issues/issues/10110
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab