public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
       [not found] <20251119224731.61497-1-jason.andryuk@amd.com>
@ 2025-11-19 22:47 ` Jason Andryuk
  2025-11-25  8:20   ` Jürgen Groß
                     ` (2 more replies)
  2025-11-19 22:47 ` [PATCH 2/2] xenbus: Rename helpers to freeze/thaw/restore Jason Andryuk
  1 sibling, 3 replies; 12+ messages in thread
From: Jason Andryuk @ 2025-11-19 22:47 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Yann Sionneau, Jason Andryuk, xen-devel, linux-kernel

The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
from s3 or s2idle disconnects its PV devices during resume.  The
backends are not expecting this and do not reconnect.

b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.

.freeze/restore are used with hiberation where Linux restarts in a new
place in the future.  .suspend/resume are useful for runtime power
management for the duration of a boot.

The current behavior of the callbacks works for an xl save/restore or
live migration where the domain is restored/migrated to a new location
and connecting to a not-already-connected backend.

Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hook.  This matches the use in drivers/xen/manage.c for
save/restore and live migration.  With .suspend/resume empty, PV devices
are left connected during s2idle and s3, so PV devices are not changed
and work after resume.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe..199917b6f77c 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
 }
 
 static const struct dev_pm_ops xenbus_pm_ops = {
-	.suspend	= xenbus_dev_suspend,
-	.resume		= xenbus_frontend_dev_resume,
 	.freeze		= xenbus_dev_suspend,
 	.thaw		= xenbus_dev_cancel,
-	.restore	= xenbus_dev_resume,
+	.restore	= xenbus_frontend_dev_resume,
 };
 
 static struct xen_bus_type xenbus_frontend = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] xenbus: Rename helpers to freeze/thaw/restore
       [not found] <20251119224731.61497-1-jason.andryuk@amd.com>
  2025-11-19 22:47 ` [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices Jason Andryuk
@ 2025-11-19 22:47 ` Jason Andryuk
  2025-11-25  9:17   ` Jürgen Groß
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Andryuk @ 2025-11-19 22:47 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Yann Sionneau, Jason Andryuk, xen-devel, linux-kernel

Rename the xenbus helpers called from the .freeze, .thaw, and .restore
pm ops to have matching names.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
We could continue renaming and change struct xenbus_driver
.suspend/.resume to .freeze/.restore.  Thereis only the single callback,
and it would be more churn, so I did not do it.
---
 drivers/xen/xenbus/xenbus.h                |  6 +++---
 drivers/xen/xenbus/xenbus_probe.c          | 22 +++++++++++-----------
 drivers/xen/xenbus/xenbus_probe_frontend.c | 16 ++++++++--------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 9ac0427724a3..daba7f5e05c4 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -120,9 +120,9 @@ int xenbus_probe_devices(struct xen_bus_type *bus);
 
 void xenbus_dev_changed(const char *node, struct xen_bus_type *bus);
 
-int xenbus_dev_suspend(struct device *dev);
-int xenbus_dev_resume(struct device *dev);
-int xenbus_dev_cancel(struct device *dev);
+int xenbus_dev_freeze(struct device *dev);
+int xenbus_dev_restore(struct device *dev);
+int xenbus_dev_thaw(struct device *dev);
 
 void xenbus_otherend_changed(struct xenbus_watch *watch,
 			     const char *path, const char *token,
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 86fe6e779056..9f9011cd7447 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -668,7 +668,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_changed);
 
-int xenbus_dev_suspend(struct device *dev)
+int xenbus_dev_freeze(struct device *dev)
 {
 	int err = 0;
 	struct xenbus_driver *drv;
@@ -683,12 +683,12 @@ int xenbus_dev_suspend(struct device *dev)
 	if (drv->suspend)
 		err = drv->suspend(xdev);
 	if (err)
-		dev_warn(dev, "suspend failed: %i\n", err);
+		dev_warn(dev, "freeze failed: %i\n", err);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(xenbus_dev_suspend);
+EXPORT_SYMBOL_GPL(xenbus_dev_freeze);
 
-int xenbus_dev_resume(struct device *dev)
+int xenbus_dev_restore(struct device *dev)
 {
 	int err;
 	struct xenbus_driver *drv;
@@ -702,7 +702,7 @@ int xenbus_dev_resume(struct device *dev)
 	drv = to_xenbus_driver(dev->driver);
 	err = talk_to_otherend(xdev);
 	if (err) {
-		dev_warn(dev, "resume (talk_to_otherend) failed: %i\n", err);
+		dev_warn(dev, "restore (talk_to_otherend) failed: %i\n", err);
 		return err;
 	}
 
@@ -711,28 +711,28 @@ int xenbus_dev_resume(struct device *dev)
 	if (drv->resume) {
 		err = drv->resume(xdev);
 		if (err) {
-			dev_warn(dev, "resume failed: %i\n", err);
+			dev_warn(dev, "restore failed: %i\n", err);
 			return err;
 		}
 	}
 
 	err = watch_otherend(xdev);
 	if (err) {
-		dev_warn(dev, "resume (watch_otherend) failed: %d\n", err);
+		dev_warn(dev, "restore (watch_otherend) failed: %d\n", err);
 		return err;
 	}
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(xenbus_dev_resume);
+EXPORT_SYMBOL_GPL(xenbus_dev_restore);
 
-int xenbus_dev_cancel(struct device *dev)
+int xenbus_dev_thaw(struct device *dev)
 {
 	/* Do nothing */
-	DPRINTK("cancel");
+	DPRINTK("thaw");
 	return 0;
 }
-EXPORT_SYMBOL_GPL(xenbus_dev_cancel);
+EXPORT_SYMBOL_GPL(xenbus_dev_thaw);
 
 /* A flag to determine if xenstored is 'ready' (i.e. has started) */
 int xenstored_ready;
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 199917b6f77c..f04707d1f667 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -91,14 +91,14 @@ static void backend_changed(struct xenbus_watch *watch,
 	xenbus_otherend_changed(watch, path, token, 1);
 }
 
-static void xenbus_frontend_delayed_resume(struct work_struct *w)
+static void xenbus_frontend_delayed_restore(struct work_struct *w)
 {
 	struct xenbus_device *xdev = container_of(w, struct xenbus_device, work);
 
-	xenbus_dev_resume(&xdev->dev);
+	xenbus_dev_restore(&xdev->dev);
 }
 
-static int xenbus_frontend_dev_resume(struct device *dev)
+static int xenbus_frontend_dev_restore(struct device *dev)
 {
 	/*
 	 * If xenstored is running in this domain, we cannot access the backend
@@ -112,14 +112,14 @@ static int xenbus_frontend_dev_resume(struct device *dev)
 		return 0;
 	}
 
-	return xenbus_dev_resume(dev);
+	return xenbus_dev_restore(dev);
 }
 
 static int xenbus_frontend_dev_probe(struct device *dev)
 {
 	if (xen_store_domain_type == XS_LOCAL) {
 		struct xenbus_device *xdev = to_xenbus_device(dev);
-		INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume);
+		INIT_WORK(&xdev->work, xenbus_frontend_delayed_restore);
 	}
 
 	return xenbus_dev_probe(dev);
@@ -148,9 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
 }
 
 static const struct dev_pm_ops xenbus_pm_ops = {
-	.freeze		= xenbus_dev_suspend,
-	.thaw		= xenbus_dev_cancel,
-	.restore	= xenbus_frontend_dev_resume,
+	.freeze		= xenbus_dev_freeze,
+	.thaw		= xenbus_dev_thaw,
+	.restore	= xenbus_frontend_dev_restore,
 };
 
 static struct xen_bus_type xenbus_frontend = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-19 22:47 ` [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices Jason Andryuk
@ 2025-11-25  8:20   ` Jürgen Groß
  2025-11-25 10:47     ` Marek Marczykowski-Górecki
  2025-11-26 15:00   ` Yann Sionneau
  2025-11-30  2:03   ` Marek Marczykowski-Górecki
  2 siblings, 1 reply; 12+ messages in thread
From: Jürgen Groß @ 2025-11-25  8:20 UTC (permalink / raw)
  To: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko,
	Marek Marczykowski-Górecki
  Cc: Yann Sionneau, xen-devel, linux-kernel


[-- Attachment #1.1.1: Type: text/plain, Size: 2302 bytes --]

On 19.11.25 23:47, Jason Andryuk wrote:
> The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
> from s3 or s2idle disconnects its PV devices during resume.  The
> backends are not expecting this and do not reconnect.
> 
> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> resume/chkpt") changed xen_suspend()/do_suspend() from
> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> suspend/resume callbacks remained.
> 
> .freeze/restore are used with hiberation where Linux restarts in a new
> place in the future.  .suspend/resume are useful for runtime power
> management for the duration of a boot.
> 
> The current behavior of the callbacks works for an xl save/restore or
> live migration where the domain is restored/migrated to a new location
> and connecting to a not-already-connected backend.
> 
> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> save/restore and live migration.  With .suspend/resume empty, PV devices
> are left connected during s2idle and s3, so PV devices are not changed
> and work after resume.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Acked-by: Juergen Gross <jgross@suse.com>

Marek, could you please give this patch a try with QubesOS? I think this
patch should be verified not to break your use cases regarding suspend /
resume.


Juergen

> ---
>   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> index 6d1819269cbe..199917b6f77c 100644
> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>   }
>   
>   static const struct dev_pm_ops xenbus_pm_ops = {
> -	.suspend	= xenbus_dev_suspend,
> -	.resume		= xenbus_frontend_dev_resume,
>   	.freeze		= xenbus_dev_suspend,
>   	.thaw		= xenbus_dev_cancel,
> -	.restore	= xenbus_dev_resume,
> +	.restore	= xenbus_frontend_dev_resume,
>   };
>   
>   static struct xen_bus_type xenbus_frontend = {


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] xenbus: Rename helpers to freeze/thaw/restore
  2025-11-19 22:47 ` [PATCH 2/2] xenbus: Rename helpers to freeze/thaw/restore Jason Andryuk
@ 2025-11-25  9:17   ` Jürgen Groß
  0 siblings, 0 replies; 12+ messages in thread
From: Jürgen Groß @ 2025-11-25  9:17 UTC (permalink / raw)
  To: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Yann Sionneau, xen-devel, linux-kernel


[-- Attachment #1.1.1: Type: text/plain, Size: 270 bytes --]

On 19.11.25 23:47, Jason Andryuk wrote:
> Rename the xenbus helpers called from the .freeze, .thaw, and .restore
> pm ops to have matching names.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-25  8:20   ` Jürgen Groß
@ 2025-11-25 10:47     ` Marek Marczykowski-Górecki
  2025-11-30  2:56       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-11-25 10:47 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

On Tue, Nov 25, 2025 at 09:20:55AM +0100, Jürgen Groß wrote:
> On 19.11.25 23:47, Jason Andryuk wrote:
> > The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
> > from s3 or s2idle disconnects its PV devices during resume.  The
> > backends are not expecting this and do not reconnect.
> > 
> > b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> > resume/chkpt") changed xen_suspend()/do_suspend() from
> > PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> > suspend/resume callbacks remained.
> > 
> > .freeze/restore are used with hiberation where Linux restarts in a new
> > place in the future.  .suspend/resume are useful for runtime power
> > management for the duration of a boot.
> > 
> > The current behavior of the callbacks works for an xl save/restore or
> > live migration where the domain is restored/migrated to a new location
> > and connecting to a not-already-connected backend.
> > 
> > Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> > .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> > save/restore and live migration.  With .suspend/resume empty, PV devices
> > are left connected during s2idle and s3, so PV devices are not changed
> > and work after resume.
> > 
> > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> 
> Acked-by: Juergen Gross <jgross@suse.com>
> 
> Marek, could you please give this patch a try with QubesOS? I think this
> patch should be verified not to break your use cases regarding suspend /
> resume.

Sure, but I can't promise it will be this week, I have some deadlines to
meet...

> Juergen
> 
> > ---
> >   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
> >   1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> > index 6d1819269cbe..199917b6f77c 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> > @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
> >   }
> >   static const struct dev_pm_ops xenbus_pm_ops = {
> > -	.suspend	= xenbus_dev_suspend,
> > -	.resume		= xenbus_frontend_dev_resume,
> >   	.freeze		= xenbus_dev_suspend,
> >   	.thaw		= xenbus_dev_cancel,
> > -	.restore	= xenbus_dev_resume,
> > +	.restore	= xenbus_frontend_dev_resume,
> >   };
> >   static struct xen_bus_type xenbus_frontend = {
> 






-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-19 22:47 ` [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices Jason Andryuk
  2025-11-25  8:20   ` Jürgen Groß
@ 2025-11-26 15:00   ` Yann Sionneau
  2025-12-01 17:15     ` Jason Andryuk
  2025-11-30  2:03   ` Marek Marczykowski-Górecki
  2 siblings, 1 reply; 12+ messages in thread
From: Yann Sionneau @ 2025-11-26 15:00 UTC (permalink / raw)
  To: Jason Andryuk, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel

Hi Jason,

On 11/19/25 23:46, Jason Andryuk wrote:
> The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
> from s3 or s2idle disconnects its PV devices during resume.  The
> backends are not expecting this and do not reconnect.
> 
> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> resume/chkpt") changed xen_suspend()/do_suspend() from
> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> suspend/resume callbacks remained.
> 
> .freeze/restore are used with hiberation where Linux restarts in a new
> place in the future.  .suspend/resume are useful for runtime power
> management for the duration of a boot.
> 
> The current behavior of the callbacks works for an xl save/restore or
> live migration where the domain is restored/migrated to a new location
> and connecting to a not-already-connected backend.
> 
> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> save/restore and live migration.  With .suspend/resume empty, PV devices
> are left connected during s2idle and s3, so PV devices are not changed
> and work after resume.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
>   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> index 6d1819269cbe..199917b6f77c 100644
> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>   }
>   
>   static const struct dev_pm_ops xenbus_pm_ops = {
> -	.suspend	= xenbus_dev_suspend,
> -	.resume		= xenbus_frontend_dev_resume,
>   	.freeze		= xenbus_dev_suspend,
>   	.thaw		= xenbus_dev_cancel,
> -	.restore	= xenbus_dev_resume,
> +	.restore	= xenbus_frontend_dev_resume,
>   };
>   
>   static struct xen_bus_type xenbus_frontend = {

I've tried putting Debian 13 to sleep with your patch (echo freeze > 
/sys/power/state) and could not wake up the guest.
Isn't it also mandatory to apply this patch 
https://github.com/QubesOS/qubes-linux-kernel/blob/main/xen-events-Add-wakeup-support-to-xen-pirq.patch 
?

With both patches applied, I get the wake up to work.

Regards,

-- 


--
Yann Sionneau | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-19 22:47 ` [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices Jason Andryuk
  2025-11-25  8:20   ` Jürgen Groß
  2025-11-26 15:00   ` Yann Sionneau
@ 2025-11-30  2:03   ` Marek Marczykowski-Górecki
  2025-12-01 18:20     ` Jason Andryuk
  2 siblings, 1 reply; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-11-30  2:03 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3099 bytes --]

On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
> The goal is to fix s2idle and S3 for Xen PV devices.  

Can you give a little more context of this? We do have working S3 in
qubes with no need for such change. We trigger it via the toolstack (libxl_domain_suspend_only()).
Are you talking about guest-initiated suspend here?

We also have kinda working (host) s2idle. You may want to take a look at this
work (some/most of it was posted upstream, but not all got
committed/reviewed):
https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
changed since that PR, see the current main too).

> A domain resuming
> from s3 or s2idle disconnects its PV devices during resume.  The
> backends are not expecting this and do not reconnect.
> 
> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> resume/chkpt") changed xen_suspend()/do_suspend() from
> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> suspend/resume callbacks remained.
> 
> .freeze/restore are used with hiberation where Linux restarts in a new
> place in the future.  .suspend/resume are useful for runtime power
> management for the duration of a boot.
> 
> The current behavior of the callbacks works for an xl save/restore or
> live migration where the domain is restored/migrated to a new location
> and connecting to a not-already-connected backend.
> 
> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> save/restore and live migration.  With .suspend/resume empty, PV devices
> are left connected during s2idle and s3, so PV devices are not changed
> and work after resume.

Is that intended? While it might work for suspend by a chance(*), I'm
pretty sure not disconnecting + re-reconnecting PV devices across
save/restore/live migration will break them.

(*) and even that I'm not sure - with driver domains, depending on
suspend order this feels like might result in a deadlock...

> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
>  drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> index 6d1819269cbe..199917b6f77c 100644
> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>  }
>  
>  static const struct dev_pm_ops xenbus_pm_ops = {
> -	.suspend	= xenbus_dev_suspend,
> -	.resume		= xenbus_frontend_dev_resume,
>  	.freeze		= xenbus_dev_suspend,
>  	.thaw		= xenbus_dev_cancel,
> -	.restore	= xenbus_dev_resume,
> +	.restore	= xenbus_frontend_dev_resume,
>  };
>  
>  static struct xen_bus_type xenbus_frontend = {
> -- 
> 2.34.1
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-25 10:47     ` Marek Marczykowski-Górecki
@ 2025-11-30  2:56       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-11-30  2:56 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

On Tue, Nov 25, 2025 at 11:47:37AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Nov 25, 2025 at 09:20:55AM +0100, Jürgen Groß wrote:
> > On 19.11.25 23:47, Jason Andryuk wrote:
> > > The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
> > > from s3 or s2idle disconnects its PV devices during resume.  The
> > > backends are not expecting this and do not reconnect.
> > > 
> > > b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> > > resume/chkpt") changed xen_suspend()/do_suspend() from
> > > PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> > > suspend/resume callbacks remained.
> > > 
> > > .freeze/restore are used with hiberation where Linux restarts in a new
> > > place in the future.  .suspend/resume are useful for runtime power
> > > management for the duration of a boot.
> > > 
> > > The current behavior of the callbacks works for an xl save/restore or
> > > live migration where the domain is restored/migrated to a new location
> > > and connecting to a not-already-connected backend.
> > > 
> > > Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> > > .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> > > save/restore and live migration.  With .suspend/resume empty, PV devices
> > > are left connected during s2idle and s3, so PV devices are not changed
> > > and work after resume.
> > > 
> > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > 
> > Acked-by: Juergen Gross <jgross@suse.com>
> > 
> > Marek, could you please give this patch a try with QubesOS? I think this
> > patch should be verified not to break your use cases regarding suspend /
> > resume.
> 
> Sure, but I can't promise it will be this week, I have some deadlines to
> meet...

Regardless of my other response, those two patches appear to work fine
across domU suspend/resume (both the S3 and s2idle variants).
Note for s2idle I tested it together with other qubes patches:
https://github.com/QubesOS/qubes-linux-kernel/

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-26 15:00   ` Yann Sionneau
@ 2025-12-01 17:15     ` Jason Andryuk
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Andryuk @ 2025-12-01 17:15 UTC (permalink / raw)
  To: Yann Sionneau, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko
  Cc: xen-devel, linux-kernel

On 2025-11-26 10:00, Yann Sionneau wrote:
> Hi Jason,
> 
> On 11/19/25 23:46, Jason Andryuk wrote:
>> The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
>> from s3 or s2idle disconnects its PV devices during resume.  The
>> backends are not expecting this and do not reconnect.
>>
>> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
>> resume/chkpt") changed xen_suspend()/do_suspend() from
>> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
>> suspend/resume callbacks remained.
>>
>> .freeze/restore are used with hiberation where Linux restarts in a new
>> place in the future.  .suspend/resume are useful for runtime power
>> management for the duration of a boot.
>>
>> The current behavior of the callbacks works for an xl save/restore or
>> live migration where the domain is restored/migrated to a new location
>> and connecting to a not-already-connected backend.
>>
>> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
>> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
>> save/restore and live migration.  With .suspend/resume empty, PV devices
>> are left connected during s2idle and s3, so PV devices are not changed
>> and work after resume.
>>
>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>>    drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>>    1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> index 6d1819269cbe..199917b6f77c 100644
>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>>    }
>>    
>>    static const struct dev_pm_ops xenbus_pm_ops = {
>> -	.suspend	= xenbus_dev_suspend,
>> -	.resume		= xenbus_frontend_dev_resume,
>>    	.freeze		= xenbus_dev_suspend,
>>    	.thaw		= xenbus_dev_cancel,
>> -	.restore	= xenbus_dev_resume,
>> +	.restore	= xenbus_frontend_dev_resume,
>>    };
>>    
>>    static struct xen_bus_type xenbus_frontend = {
> 
> I've tried putting Debian 13 to sleep with your patch (echo freeze >
> /sys/power/state) and could not wake up the guest.
> Isn't it also mandatory to apply this patch
> https://github.com/QubesOS/qubes-linux-kernel/blob/main/xen-events-Add-wakeup-support-to-xen-pirq.patch
> ?
> 
> With both patches applied, I get the wake up to work.

I did not need the additional patch.  I thought it was necessary for 
dom0 s0ix wakeup which uses xen-pirq for hardware interrupts.  I did not 
think it was necessary for a domU.  I'm only looking at the domU case 
here, and this code is for frontend PV devices.

In my setup, I use `xl trigger $dom sleep` to enter (and exit) s2idle 
for a PVH domain, which has the ACPI buttons exposed with X86_EMU_PM. 
This is a non-standard configuration.

I've additionally tested with `echo mem > /sys/power/state` to enter 
s2idle.  Again waking with `xl trigger $dom sleep`.

xl console remains connected, but typing there does not wake.

What kind of guest are you testing with?  How do you wake it?  If it's 
HVM in S3, another way to attempt wake up may be `xl 
qemu-monitor-command $dom system_wakeup`.

Regards,
Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-11-30  2:03   ` Marek Marczykowski-Górecki
@ 2025-12-01 18:20     ` Jason Andryuk
  2025-12-01 22:16       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Andryuk @ 2025-12-01 18:20 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel

On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
> On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
>> The goal is to fix s2idle and S3 for Xen PV devices.
> 
> Can you give a little more context of this? We do have working S3 in
> qubes with no need for such change. We trigger it via the toolstack (libxl_domain_suspend_only()).
> Are you talking about guest-initiated suspend here?

This is intended to help domU s2idle/S3 and resume.  I guess that is 
what you mean by guest-initiated?  The domU can use 'echo mem > 
/sys/power/state' to enter s2idle/S3.  We also have the domU react to 
the ACPI sleep button from `xl trigger $dom sleep`.

AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux 
drivers/xen/manage.c:do_suspend() acts on.  `xl save/suspend/migrate` 
all use this path.

The terminology gets confusing.  Xen uses "suspend" for 
save/suspend/migrate, but the Linux power management codes uses 
freeze/thaw/restore.  AIUI, Linux's PMSG_SUSPEND/.suspend is for runtime 
power management.

When you call libxl_domain_suspend_only()/libxl_domain_resume(), you 
pass suspend_cancel==1.
  *  1. (fast=1) Resume the guest without resetting the domain
        environment.
  *     The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will
        return 1.

That ends up in Linux do_suspend() as si.cancelled = 1, which calls 
PMSG_THAW -> .thaw -> xenbus_dev_cancel() which is a no-op.  So it does 
not change the PV devices.

We needed guest user space to perform actions before entering s2idle. 
libxl_domain_suspend_only() triggers the Linux kernel path which does 
not notify user space.  The ACPI power buttons let user space perform 
actions (lock and blank the screen) before entering the idle state.

> We also have kinda working (host) s2idle. You may want to take a look at this
> work (some/most of it was posted upstream, but not all got
> committed/reviewed):
> https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
> https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
> changed since that PR, see the current main too).

This would not affect host s2idle - it changes PV frontend devices.

Do you libxl_domain_suspend_only() all domUs and then put dom0 into s0ix?

>> A domain resuming
>> from s3 or s2idle disconnects its PV devices during resume.  The
>> backends are not expecting this and do not reconnect.
>>
>> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
>> resume/chkpt") changed xen_suspend()/do_suspend() from
>> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
>> suspend/resume callbacks remained.
>>
>> .freeze/restore are used with hiberation where Linux restarts in a new
>> place in the future.  .suspend/resume are useful for runtime power
>> management for the duration of a boot.
>>
>> The current behavior of the callbacks works for an xl save/restore or
>> live migration where the domain is restored/migrated to a new location
>> and connecting to a not-already-connected backend.
>>
>> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
>> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
>> save/restore and live migration.  With .suspend/resume empty, PV devices
>> are left connected during s2idle and s3, so PV devices are not changed
>> and work after resume.
> 
> Is that intended? While it might work for suspend by a chance(*), I'm
> pretty sure not disconnecting + re-reconnecting PV devices across
> save/restore/live migration will break them.

save/restore/live migration keep using .freeze/thaw/restore, which 
disconnects and reconnects today.  Nothing changes there as 
xen_suspend()/do_suspend() call the power management code with 
PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.

This patches makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. 
When a domU goes into s2idle/S3, the backend state remains connected. 
With this patch, when the domU wakes up, the frontends do nothing and 
remain connected.

> (*) and even that I'm not sure - with driver domains, depending on
> suspend order this feels like might result in a deadlock...

I'm not sure.  I don't think this patch changes anything with respect to 
them.

Thanks for testing.

Maybe the commit messages should change to highlight this is for domU PV 
devices?  struct xen_bus_type xenbus_backend does not define dev_pm_ops.

Regards,
Jason

>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>>   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> index 6d1819269cbe..199917b6f77c 100644
>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
>> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>>   }
>>   
>>   static const struct dev_pm_ops xenbus_pm_ops = {
>> -	.suspend	= xenbus_dev_suspend,
>> -	.resume		= xenbus_frontend_dev_resume,
>>   	.freeze		= xenbus_dev_suspend,
>>   	.thaw		= xenbus_dev_cancel,
>> -	.restore	= xenbus_dev_resume,
>> +	.restore	= xenbus_frontend_dev_resume,
>>   };
>>   
>>   static struct xen_bus_type xenbus_frontend = {
>> -- 
>> 2.34.1
>>
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-12-01 18:20     ` Jason Andryuk
@ 2025-12-01 22:16       ` Marek Marczykowski-Górecki
  2025-12-03 22:33         ` Jason Andryuk
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-12-01 22:16 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6129 bytes --]

On Mon, Dec 01, 2025 at 01:20:40PM -0500, Jason Andryuk wrote:
> On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
> > On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
> > > The goal is to fix s2idle and S3 for Xen PV devices.
> > 
> > Can you give a little more context of this? We do have working S3 in
> > qubes with no need for such change. We trigger it via the toolstack (libxl_domain_suspend_only()).
> > Are you talking about guest-initiated suspend here?
> 
> This is intended to help domU s2idle/S3 and resume.  I guess that is what
> you mean by guest-initiated?  The domU can use 'echo mem > /sys/power/state'
> to enter s2idle/S3.  We also have the domU react to the ACPI sleep button
> from `xl trigger $dom sleep`.

Ok, so this is indeed a different path than we use in Qubes OS.

> AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux
> drivers/xen/manage.c:do_suspend() acts on.  `xl save/suspend/migrate` all
> use this path.
> 
> The terminology gets confusing.  Xen uses "suspend" for
> save/suspend/migrate, but the Linux power management codes uses
> freeze/thaw/restore.  AIUI, Linux's PMSG_SUSPEND/.suspend is for runtime
> power management.

Indeed it gets confusing...

> When you call libxl_domain_suspend_only()/libxl_domain_resume(), you pass
> suspend_cancel==1.
>  *  1. (fast=1) Resume the guest without resetting the domain
>        environment.
>  *     The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will
>        return 1.
> 
> That ends up in Linux do_suspend() as si.cancelled = 1, which calls
> PMSG_THAW -> .thaw -> xenbus_dev_cancel() which is a no-op.  So it does not
> change the PV devices.
> 
> We needed guest user space to perform actions before entering s2idle.
> libxl_domain_suspend_only() triggers the Linux kernel path which does not
> notify user space.  The ACPI power buttons let user space perform actions
> (lock and blank the screen) before entering the idle state.

I see. In our case, we have our own userspace hook that gets called
before (if relevant - in most cases it isn't).

> > We also have kinda working (host) s2idle. You may want to take a look at this
> > work (some/most of it was posted upstream, but not all got
> > committed/reviewed):
> > https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
> > https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
> > changed since that PR, see the current main too).
> 
> This would not affect host s2idle - it changes PV frontend devices.
> 
> Do you libxl_domain_suspend_only() all domUs and then put dom0 into s0ix?

Yes, exactly.

> > > A domain resuming
> > > from s3 or s2idle disconnects its PV devices during resume.  The
> > > backends are not expecting this and do not reconnect.
> > > 
> > > b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
> > > resume/chkpt") changed xen_suspend()/do_suspend() from
> > > PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
> > > suspend/resume callbacks remained.
> > > 
> > > .freeze/restore are used with hiberation where Linux restarts in a new
> > > place in the future.  .suspend/resume are useful for runtime power
> > > management for the duration of a boot.
> > > 
> > > The current behavior of the callbacks works for an xl save/restore or
> > > live migration where the domain is restored/migrated to a new location
> > > and connecting to a not-already-connected backend.
> > > 
> > > Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
> > > .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
> > > save/restore and live migration.  With .suspend/resume empty, PV devices
> > > are left connected during s2idle and s3, so PV devices are not changed
> > > and work after resume.
> > 
> > Is that intended? While it might work for suspend by a chance(*), I'm
> > pretty sure not disconnecting + re-reconnecting PV devices across
> > save/restore/live migration will break them.
> 
> save/restore/live migration keep using .freeze/thaw/restore, which
> disconnects and reconnects today.  Nothing changes there as
> xen_suspend()/do_suspend() call the power management code with
> PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.
> 
> This patches makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. When
> a domU goes into s2idle/S3, the backend state remains connected. With this
> patch, when the domU wakes up, the frontends do nothing and remain
> connected.

This explanation makes sense.

> > (*) and even that I'm not sure - with driver domains, depending on
> > suspend order this feels like might result in a deadlock...
> 
> I'm not sure.  I don't think this patch changes anything with respect to
> them.
> 
> Thanks for testing.
> 
> Maybe the commit messages should change to highlight this is for domU PV
> devices?  struct xen_bus_type xenbus_backend does not define dev_pm_ops.

Good idea.

> Regards,
> Jason
> 
> > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > > ---
> > >   drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
> > >   1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
> > > index 6d1819269cbe..199917b6f77c 100644
> > > --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
> > > +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
> > > @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
> > >   }
> > >   static const struct dev_pm_ops xenbus_pm_ops = {
> > > -	.suspend	= xenbus_dev_suspend,
> > > -	.resume		= xenbus_frontend_dev_resume,
> > >   	.freeze		= xenbus_dev_suspend,
> > >   	.thaw		= xenbus_dev_cancel,
> > > -	.restore	= xenbus_dev_resume,
> > > +	.restore	= xenbus_frontend_dev_resume,
> > >   };
> > >   static struct xen_bus_type xenbus_frontend = {
> > > -- 
> > > 2.34.1
> > > 
> > > 
> > 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices
  2025-12-01 22:16       ` Marek Marczykowski-Górecki
@ 2025-12-03 22:33         ` Jason Andryuk
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Andryuk @ 2025-12-03 22:33 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Yann Sionneau, xen-devel, linux-kernel, Daniel Smith,
	Christopher Clark

On 2025-12-01 17:16, Marek Marczykowski-Górecki wrote:
> On Mon, Dec 01, 2025 at 01:20:40PM -0500, Jason Andryuk wrote:
>> On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
>>> On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
>>>> The goal is to fix s2idle and S3 for Xen PV devices.
>>>
>>> Can you give a little more context of this? We do have working S3 in
>>> qubes with no need for such change. We trigger it via the toolstack (libxl_domain_suspend_only()).
>>> Are you talking about guest-initiated suspend here?
>>
>> This is intended to help domU s2idle/S3 and resume.  I guess that is what
>> you mean by guest-initiated?  The domU can use 'echo mem > /sys/power/state'
>> to enter s2idle/S3.  We also have the domU react to the ACPI sleep button
>> from `xl trigger $dom sleep`.
> 
> Ok, so this is indeed a different path than we use in Qubes OS.
> 
>> AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux
>> drivers/xen/manage.c:do_suspend() acts on.  `xl save/suspend/migrate` all
>> use this path.
>>
>> The terminology gets confusing.  Xen uses "suspend" for
>> save/suspend/migrate, but the Linux power management codes uses
>> freeze/thaw/restore.  AIUI, Linux's PMSG_SUSPEND/.suspend is for runtime
>> power management.
> 
> Indeed it gets confusing...
> 
>> When you call libxl_domain_suspend_only()/libxl_domain_resume(), you pass
>> suspend_cancel==1.
>>   *  1. (fast=1) Resume the guest without resetting the domain
>>         environment.
>>   *     The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will
>>         return 1.
>>
>> That ends up in Linux do_suspend() as si.cancelled = 1, which calls
>> PMSG_THAW -> .thaw -> xenbus_dev_cancel() which is a no-op.  So it does not
>> change the PV devices.
>>
>> We needed guest user space to perform actions before entering s2idle.
>> libxl_domain_suspend_only() triggers the Linux kernel path which does not
>> notify user space.  The ACPI power buttons let user space perform actions
>> (lock and blank the screen) before entering the idle state.
> 
> I see. In our case, we have our own userspace hook that gets called
> before (if relevant - in most cases it isn't).
> 
>>> We also have kinda working (host) s2idle. You may want to take a look at this
>>> work (some/most of it was posted upstream, but not all got
>>> committed/reviewed):
>>> https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
>>> https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
>>> changed since that PR, see the current main too).
>>
>> This would not affect host s2idle - it changes PV frontend devices.
>>
>> Do you libxl_domain_suspend_only() all domUs and then put dom0 into s0ix?
> 
> Yes, exactly.
> 
>>>> A domain resuming
>>>> from s3 or s2idle disconnects its PV devices during resume.  The
>>>> backends are not expecting this and do not reconnect.
>>>>
>>>> b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
>>>> resume/chkpt") changed xen_suspend()/do_suspend() from
>>>> PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
>>>> suspend/resume callbacks remained.
>>>>
>>>> .freeze/restore are used with hiberation where Linux restarts in a new
>>>> place in the future.  .suspend/resume are useful for runtime power
>>>> management for the duration of a boot.
>>>>
>>>> The current behavior of the callbacks works for an xl save/restore or
>>>> live migration where the domain is restored/migrated to a new location
>>>> and connecting to a not-already-connected backend.
>>>>
>>>> Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
>>>> .suspend/resume hook.  This matches the use in drivers/xen/manage.c for
>>>> save/restore and live migration.  With .suspend/resume empty, PV devices
>>>> are left connected during s2idle and s3, so PV devices are not changed
>>>> and work after resume.
>>>
>>> Is that intended? While it might work for suspend by a chance(*), I'm
>>> pretty sure not disconnecting + re-reconnecting PV devices across
>>> save/restore/live migration will break them.
>>
>> save/restore/live migration keep using .freeze/thaw/restore, which
>> disconnects and reconnects today.  Nothing changes there as
>> xen_suspend()/do_suspend() call the power management code with
>> PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.
>>
>> This patches makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. When
>> a domU goes into s2idle/S3, the backend state remains connected. With this
>> patch, when the domU wakes up, the frontends do nothing and remain
>> connected.
> 
> This explanation makes sense.
> 
>>> (*) and even that I'm not sure - with driver domains, depending on
>>> suspend order this feels like might result in a deadlock...
>>
>> I'm not sure.  I don't think this patch changes anything with respect to
>> them.
>>
>> Thanks for testing.
>>
>> Maybe the commit messages should change to highlight this is for domU PV
>> devices?  struct xen_bus_type xenbus_backend does not define dev_pm_ops.
> 
> Good idea.
> 
>> Regards,
>> Jason
>>
>>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>>> ---
>>>>    drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
>>>>    1 file changed, 1 insertion(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>> index 6d1819269cbe..199917b6f77c 100644
>>>> --- a/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>> +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
>>>> @@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
>>>>    }
>>>>    static const struct dev_pm_ops xenbus_pm_ops = {
>>>> -	.suspend	= xenbus_dev_suspend,
>>>> -	.resume		= xenbus_frontend_dev_resume,
>>>>    	.freeze		= xenbus_dev_suspend,
>>>>    	.thaw		= xenbus_dev_cancel,
>>>> -	.restore	= xenbus_dev_resume,
>>>> +	.restore	= xenbus_frontend_dev_resume,

I was double checking before sending a v2, and I have questions about 
this.  I purposely switched the .restore callback since 
xenbus_frontend_dev_resume() handles the extra case.  It was added in:

commit 2abb274629614bef4044a0b98ada42e977feadfd
Author: Aurelien Chartier <aurelien.chartier@citrix.com>
Date:   Tue May 28 18:09:56 2013 +0100

     xenbus: delay xenbus frontend resume if xenstored is not running

     If the xenbus frontend is located in a domain running xenstored, 
the device
     resume is hanging because it is happening before the process 
resume. This
     patch adds extra logic to the resume code to check if we are the domain
     running xenstored and delay the resume if needed.

It was after b3e96c0c7562, so .freeze/thaw/restore were already present 
for domU xen_suspend() handling.  So the .resume handler should have 
been used.

This is for the "domain running xenstored", so dom0.  So maybe this was 
called for a dom0 S3 suspend/resume?  But as stated above this is patch 
changes PV frontends.  Maybe the change was for Xenclient/OpenXT - 
OpenXT has netfront in dom0 connected to the network driver domain.  But 
without netback changes, I don't think that would work today?  As it is, 
S3 in OpenXT has been disabled for years as broken.

Ok, yes, dom0 S3:
https://lore.kernel.org/xen-devel/1367413042-13987-1-git-send-email-aurelien.chartier@citrix.com/

 > This patch series fixes the S3 resume of dom0 or a Xenstore stub
 > domain running a frontend over xenbus (xen-netfront in my use case).
 >
 > As device resume is happening before process resume, the xenbus
 > frontend resume is hanging if xenstored is not running, thus causing
 > a deadlock. This patch series is fixing that issue by bypassing the
 > xenbus frontend resume when we are running in dom0 or a Xenstore stub
 > domain.

I don't think setting .restore = xenbus_frontend_dev_resume will break 
anything for a domU.  It handles a case which doesn't trigger for a 
domU.  Removing .resume, a frontend in dom0 will not be touched, so that 
at least cannot hang.

Does this all sound okay?  Does anyone think I am missing anything

Regards,
Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-12-03 22:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20251119224731.61497-1-jason.andryuk@amd.com>
2025-11-19 22:47 ` [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices Jason Andryuk
2025-11-25  8:20   ` Jürgen Groß
2025-11-25 10:47     ` Marek Marczykowski-Górecki
2025-11-30  2:56       ` Marek Marczykowski-Górecki
2025-11-26 15:00   ` Yann Sionneau
2025-12-01 17:15     ` Jason Andryuk
2025-11-30  2:03   ` Marek Marczykowski-Górecki
2025-12-01 18:20     ` Jason Andryuk
2025-12-01 22:16       ` Marek Marczykowski-Górecki
2025-12-03 22:33         ` Jason Andryuk
2025-11-19 22:47 ` [PATCH 2/2] xenbus: Rename helpers to freeze/thaw/restore Jason Andryuk
2025-11-25  9:17   ` Jürgen Groß

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox