* S3 regression related to XSA-471 patches
From: Marek Marczykowski-Górecki @ 2025-08-06 10:23 UTC
  To: xen-devel; +Cc: Andrew Cooper

Hi,

We've got several reports that S3 reliability recently regressed. We
identified it's definitely related to XSA-471 patches, and bisection
points at "x86/idle: Remove broken MWAIT implementation". I don't have
reliable reproduction steps, so I'm not 100% sure if it's really this
patch, or maybe an earlier one - but it's definitely already broken at
this point in the series. Most reports are about Xen 4.17 (as that's
what the stable Qubes OS version currently uses), but I think I've seen
somebody reporting the issue on 4.19 too (but I don't have clear
evidence, especially of whether it's the same issue).

The problem manifests as the system freezing on S3 resume. Sometimes it
manages to show the screenlocker password prompt, and sometimes one can
interact with it for a second or two. But then it freezes, the mouse stops
moving, etc. (but no reboot).
One time I managed to get past the screenlocker and interact with dom0
for a few minutes before it froze. Resuming domUs didn't happen (the
Qubes-specific script that resumes them hung), and also no logs
from this case persisted on the disk (on disk it looked like it never
resumed). Generally it looked like some CPUs were stuck.

The issue appears more likely to hit if some domUs are active
at suspend/resume time. While Qubes OS does suspend (not just pause)
them for the duration of host S3, some activity before/after does appear to
matter. My test case, which has a ~30-40% reproduction rate, involves several
Firefox instances playing YouTube videos.
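
As a rough aid for bisecting with a flaky reproducer like this one, the
sketch below estimates how many consecutive clean suspend/resume cycles
are needed before a commit can reasonably be called good. It assumes each
cycle is an independent trial with a fixed failure probability, which is
a simplification and not something measured here; the function name and
the 5% threshold are illustrative choices, not part of any Xen or Qubes
tooling.

```python
import math


def clean_runs_needed(repro_rate, max_false_good=0.05):
    """Return the number of consecutive clean cycles needed so that a
    broken build passes all of them with probability <= max_false_good.

    Assumes independent cycles, each failing with probability repro_rate
    on a broken build (an assumption, not a measured property).
    """
    if not 0.0 < repro_rate < 1.0:
        raise ValueError("repro_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    # A broken build survives n cycles with probability (1 - p)^n.
    # Solve (1 - p)^n <= max_false_good for the smallest integer n.
    return math.ceil(math.log(max_false_good) / math.log(1.0 - repro_rate))


if __name__ == "__main__":
    # The two ends of the reproduction rate quoted above.
    for rate in (0.30, 0.40):
        print(f"repro rate {rate:.0%}: {clean_runs_needed(rate)} clean cycles")
```

Under those assumptions, a 30% rate needs 9 clean cycles and a 40% rate
needs 6 to keep the chance of mislabeling a bad commit as good below 5%.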

I've talked with Andrew about it a bit, without reaching much of a conclusion.
Initial reports mentioned only MTL and RPL systems, so we focused on
something related to weird topology. But just today I've got a report
of the same happening on KBL too...

Another observation (possibly invalidated by today's report...) is that
all reports were about systems running Coreboot (but not only the Dasharo
flavor - at least one was Star Labs).

Most reports are collected at https://github.com/QubesOS/qubes-issues/issues/10110

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
