From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Jason Andryuk <jason.andryuk@amd.com>,
Juergen Gross <jgross@suse.com>, Sasha Levin <sashal@kernel.org>,
arnd@arndb.de
Subject: [PATCH AUTOSEL 6.19-5.10] xenbus: Use .freeze/.thaw to handle xenbus devices
Date: Wed, 11 Feb 2026 07:30:13 -0500
Message-ID: <20260211123112.1330287-3-sashal@kernel.org>
In-Reply-To: <20260211123112.1330287-1-sashal@kernel.org>
From: Jason Andryuk <jason.andryuk@amd.com>
[ Upstream commit e08dd1ee49838750a514e83c0aa60cd12ba6ecbb ]
The goal is to fix s2idle and S3 for Xen PV devices. A domain resuming
from s3 or s2idle disconnects its PV devices during resume. The
backends are not expecting this and do not reconnect.
b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.
.freeze/restore are used with hibernation where Linux restarts in a new
place in the future. .suspend/resume are useful for runtime power
management for the duration of a boot.
The current behavior of the callbacks works for an xl save/restore or
live migration where the domain is restored/migrated to a new location
and connecting to a not-already-connected backend.
Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hook. This matches the use in drivers/xen/manage.c for
save/restore and live migration. With .suspend/resume empty, PV devices
are left connected during s2idle and s3, so PV devices are not changed
and work after resume.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20251119224731.61497-2-jason.andryuk@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a thorough understanding. Let me summarize my complete
analysis.
---
## Complete Analysis
### 1. Commit Message Analysis
The commit explicitly states its goal: **fix s2idle and S3 for Xen PV
devices**. The problem is clearly described: when a Xen domain resumes
from S3 or s2idle, it disconnects its PV devices. The backends do not
expect this and do not reconnect, leaving the devices broken.
The commit references the 2011 commit `b3e96c0c7562` which changed
`do_suspend()` in `manage.c` to use
`PMSG_FREEZE`/`PMSG_THAW`/`PMSG_RESTORE` instead of
`PMSG_SUSPEND`/`PMSG_RESUME`, but the xenbus frontend PM callbacks were
never updated to match — they still had `.suspend`/`.resume` populated.
### 2. Code Change Analysis
The change modifies only the `xenbus_pm_ops` structure in
`xenbus_probe_frontend.c`:
**Before:**
```c
/* drivers/xen/xenbus/xenbus_probe_frontend.c */
static const struct dev_pm_ops xenbus_pm_ops = {
	.suspend = xenbus_dev_suspend,
	.resume = xenbus_frontend_dev_resume,
	.freeze = xenbus_dev_suspend,
	.thaw = xenbus_dev_cancel,
	.restore = xenbus_dev_resume,
};
```
**After (proposed):**
```c
static const struct dev_pm_ops xenbus_pm_ops = {
	.freeze = xenbus_dev_suspend,
	.thaw = xenbus_dev_cancel,
	.restore = xenbus_frontend_dev_resume,
};
```
Three changes:
1. **Remove `.suspend = xenbus_dev_suspend`**: During S3/s2idle, the PM
core dispatches `PM_EVENT_SUSPEND` to `.suspend`. By removing it, PV
devices are no longer disconnected during S3/s2idle.
2. **Remove `.resume = xenbus_frontend_dev_resume`**: No reconnection
attempt during S3/s2idle resume (nothing was disconnected, so nothing
needs reconnecting).
3. **Change `.restore` from `xenbus_dev_resume` to
`xenbus_frontend_dev_resume`**: This fixes a second bug -
`xenbus_frontend_dev_resume` properly handles the case where
xenstored runs locally (`XS_LOCAL`) by deferring the resume via a
work queue. The old `xenbus_dev_resume` would hang in that case
during restore.
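The deferral described in item 3 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (`frontend_model`, `frontend_resume_model`, `run_pending_work`, the `XS_MODEL_*` constants) is hypothetical and not kernel API, and a boolean flag stands in for the work queue.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the xenstore domain type; not the kernel enum. */
enum xenstore_type_model { XS_MODEL_REMOTE, XS_MODEL_LOCAL };

/* Hypothetical per-device state for this model only. */
struct frontend_model {
	bool resumed;      /* true once the resume work has actually run */
	bool work_pending; /* true while the resume is deferred */
};

/* Models the real resume step that talks to the backend. */
static void do_resume_model(struct frontend_model *dev)
{
	dev->resumed = true;
	dev->work_pending = false;
}

/*
 * Models the behaviour attributed to xenbus_frontend_dev_resume above:
 * with a local xenstored, resuming inline would block, so the work is
 * queued and the callback returns immediately.
 */
static int frontend_resume_model(struct frontend_model *dev,
				 enum xenstore_type_model type)
{
	if (type == XS_MODEL_LOCAL) {
		dev->work_pending = true; /* stands in for queuing work */
		return 0;
	}
	do_resume_model(dev);
	return 0;
}

/* Models the worker running later, once xenstored is reachable again. */
static void run_pending_work(struct frontend_model *dev)
{
	if (dev->work_pending)
		do_resume_model(dev);
}
```

In this model, calling `frontend_resume_model()` with `XS_MODEL_LOCAL` returns without resuming the device, and the resume completes only when `run_pending_work()` runs; with `XS_MODEL_REMOTE` the device is resumed inline.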
### 3. PM Dispatch Verification
The PM core's `pm_op()` function in `drivers/base/power/main.c` confirms
the dispatch logic:
- `PM_EVENT_SUSPEND` → `.suspend` (used for real S3/s2idle)
- `PM_EVENT_FREEZE` → `.freeze` (used by Xen save/restore via
`do_suspend()`)
- `PM_EVENT_THAW` → `.thaw` (Xen suspend was cancelled)
- `PM_EVENT_RESTORE` → `.restore` (Xen successful restore)
The Xen save/restore/migration path in `drivers/xen/manage.c`
exclusively uses `PMSG_FREEZE`/`PMSG_THAW`/`PMSG_RESTORE`:
```c
/* drivers/xen/manage.c */
	err = dpm_suspend_start(PMSG_FREEZE);
```
```c
/* drivers/xen/manage.c */
	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
```
This confirms: Xen save/restore uses `.freeze`/`.thaw`/`.restore`, NOT
`.suspend`/`.resume`. The removal of `.suspend`/`.resume` does not
affect Xen save/restore at all.
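The dispatch argument can be checked with a reduced userspace model. The sketch below is not the kernel's `pm_op()`: the `EV_*` constants, `struct pm_ops_model`, and `pm_op_model()` are hypothetical names, and the callback bodies are placeholders that only need distinct addresses. It assumes, as described above, that an empty slot means the PM core has nothing to call for that event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes standing in for PM_EVENT_*. */
enum pm_event { EV_SUSPEND, EV_RESUME, EV_FREEZE, EV_THAW, EV_RESTORE };

typedef int (*pm_callback_t)(void);

/* Reduced model of struct dev_pm_ops: only the five slots discussed. */
struct pm_ops_model {
	pm_callback_t suspend, resume, freeze, thaw, restore;
};

/* Placeholder callbacks named after the real xenbus functions. */
static int xenbus_dev_suspend(void)         { return 1; }
static int xenbus_dev_cancel(void)          { return 2; }
static int xenbus_dev_resume(void)          { return 3; }
static int xenbus_frontend_dev_resume(void) { return 4; }

/* Table as it was before the patch. */
static const struct pm_ops_model ops_before = {
	.suspend = xenbus_dev_suspend,
	.resume  = xenbus_frontend_dev_resume,
	.freeze  = xenbus_dev_suspend,
	.thaw    = xenbus_dev_cancel,
	.restore = xenbus_dev_resume,
};

/* Table after the patch: .suspend/.resume are left NULL. */
static const struct pm_ops_model ops_after = {
	.freeze  = xenbus_dev_suspend,
	.thaw    = xenbus_dev_cancel,
	.restore = xenbus_frontend_dev_resume,
};

/* Return the callback for an event, or NULL when the slot is empty. */
static pm_callback_t pm_op_model(const struct pm_ops_model *ops,
				 enum pm_event ev)
{
	assert(ops != NULL);
	switch (ev) {
	case EV_SUSPEND: return ops->suspend;
	case EV_RESUME:  return ops->resume;
	case EV_FREEZE:  return ops->freeze;
	case EV_THAW:    return ops->thaw;
	case EV_RESTORE: return ops->restore;
	}
	return NULL;
}
```

With `ops_after`, `pm_op_model()` returns NULL for `EV_SUSPEND` and `EV_RESUME`, so nothing disconnects the device on S3/s2idle, while `EV_FREEZE`, `EV_THAW`, and `EV_RESTORE` still resolve to callbacks for the save/restore path.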
### 4. Bug Mechanism
The bug is a **functional regression** dating back to the original 2011
commit `b3e96c0c7562`:
- `do_suspend()` was changed to use `PMSG_FREEZE`
- But `xenbus_pm_ops` kept `.suspend`/`.resume` populated
- When S3/s2idle is used (which triggers `PMSG_SUSPEND`→`.suspend`),
`xenbus_dev_suspend` is called, which disconnects PV devices
- On resume, `xenbus_frontend_dev_resume` tries to reconnect, but the
backend does not expect the disconnect and does not reconnect
- **Result: PV devices are broken after S3/s2idle**
### 5. Self-Containedness
Despite the Message-ID suggesting patch 2 of a series (`-2-`), this
commit is entirely self-contained:
- It modifies only one data structure
- All referenced functions already exist in the codebase
- No new code is introduced
- It does not depend on other patches
### 6. Risk Assessment
**Very low risk:**
- The change removes behavior (empties `.suspend`/`.resume`), which is
strictly less likely to cause regression than adding behavior
- During S3/s2idle, devices simply stay connected — the safest possible
behavior
- The `.restore` change to `xenbus_frontend_dev_resume` is strictly an
improvement (adds XS_LOCAL handling that was already done for
`.resume`)
- Change is only 1 insertion and 3 deletions in one structure
- Acked by Xen subsystem maintainer (Juergen Gross)
- Also signed off by Juergen Gross as the committer
### 7. Impact
- **Who is affected**: Xen domains with PV devices that attempt S3/s2idle
- **Severity**: HIGH - PV devices completely break after resume, meaning
network and block devices stop working
- **User visibility**: Very visible - domain becomes unusable after
suspend/resume
### 8. Stable Tree Applicability
- The affected code (`xenbus_pm_ops` structure) has been stable since
2011/2013
- All referenced functions exist in all maintained stable trees
- The patch applies cleanly (trivial change to a stable structure)
- This fixes a longstanding functional bug affecting real use cases
### 9. Classification
This is a **bug fix** for broken S3/s2idle on Xen PV domains, with an
additional fix for potential hang during restore with local xenstored.
It meets all stable criteria:
- Obviously correct and well-reviewed (Acked-by subsystem maintainer)
- Fixes a real, user-facing bug (broken PV devices)
- Important issue (complete loss of PV device functionality)
- Very small and contained (1 insertion, 3 deletions in one structure)
- No new features or APIs
**YES**
drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe5..199917b6f77ca 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
}
static const struct dev_pm_ops xenbus_pm_ops = {
- .suspend = xenbus_dev_suspend,
- .resume = xenbus_frontend_dev_resume,
.freeze = xenbus_dev_suspend,
.thaw = xenbus_dev_cancel,
- .restore = xenbus_dev_resume,
+ .restore = xenbus_frontend_dev_resume,
};
static struct xen_bus_type xenbus_frontend = {
--
2.51.0