public inbox for stable@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Jason Andryuk <jason.andryuk@amd.com>,
	Juergen Gross <jgross@suse.com>, Sasha Levin <sashal@kernel.org>,
	arnd@arndb.de
Subject: [PATCH AUTOSEL 6.19-5.10] xenbus: Use .freeze/.thaw to handle xenbus devices
Date: Wed, 11 Feb 2026 07:30:13 -0500
Message-ID: <20260211123112.1330287-3-sashal@kernel.org>
In-Reply-To: <20260211123112.1330287-1-sashal@kernel.org>

From: Jason Andryuk <jason.andryuk@amd.com>

[ Upstream commit e08dd1ee49838750a514e83c0aa60cd12ba6ecbb ]

The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
from S3 or s2idle disconnects its PV devices during resume.  The
backends are not expecting this and do not reconnect.

b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.

.freeze/restore are used with hibernation, where Linux restarts in a new
place in the future.  .suspend/resume are useful for runtime power
management for the duration of a boot.

The current behavior of the callbacks works for an xl save/restore or
live migration where the domain is restored/migrated to a new location
and connecting to a not-already-connected backend.

Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hooks.  This matches the use in drivers/xen/manage.c for
save/restore and live migration.  With .suspend/resume empty, PV devices
are left connected during s2idle and S3, so they are left untouched and
keep working after resume.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20251119224731.61497-2-jason.andryuk@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM-generated explanations, may be completely bogus:


---

## Complete Analysis

### 1. Commit Message Analysis

The commit explicitly states its goal: **fix s2idle and S3 for Xen PV
devices**. The problem is clearly described: when a Xen domain resumes
from S3 or s2idle, it disconnects its PV devices, and the backends, not
expecting this, do not reconnect, leaving the devices broken.

The commit references the 2011 commit `b3e96c0c7562` which changed
`do_suspend()` in `manage.c` to use
`PMSG_FREEZE`/`PMSG_THAW`/`PMSG_RESTORE` instead of
`PMSG_SUSPEND`/`PMSG_RESUME`, but the xenbus frontend PM callbacks were
never updated to match: they still had `.suspend`/`.resume` populated.

### 2. Code Change Analysis

The change modifies only the `xenbus_pm_ops` structure in
`xenbus_probe_frontend.c`:

**Before** (`drivers/xen/xenbus/xenbus_probe_frontend.c`):
```c
static const struct dev_pm_ops xenbus_pm_ops = {
        .suspend        = xenbus_dev_suspend,
        .resume         = xenbus_frontend_dev_resume,
        .freeze         = xenbus_dev_suspend,
        .thaw           = xenbus_dev_cancel,
        .restore        = xenbus_dev_resume,
};
```

**After (proposed):**
```c
static const struct dev_pm_ops xenbus_pm_ops = {
        .freeze         = xenbus_dev_suspend,
        .thaw           = xenbus_dev_cancel,
        .restore        = xenbus_frontend_dev_resume,
};
```

Three changes:
1. **Remove `.suspend = xenbus_dev_suspend`**: During S3/s2idle, the PM
   core dispatches `PM_EVENT_SUSPEND` to `.suspend`. With that slot
   empty, PV devices are no longer disconnected during S3/s2idle.
2. **Remove `.resume = xenbus_frontend_dev_resume`**: No reconnection
   attempt during S3/s2idle resume (nothing was disconnected, so nothing
   needs reconnecting).
3. **Change `.restore` from `xenbus_dev_resume` to
   `xenbus_frontend_dev_resume`**: This fixes a second bug. When
   xenstored runs in the local domain (`XS_LOCAL`),
   `xenbus_frontend_dev_resume` defers the resume to a work queue
   instead of running it inline. The old `xenbus_dev_resume` would hang
   in that case during restore.

### 3. PM Dispatch Verification

The PM core's `pm_op()` function in `drivers/base/power/main.c` confirms
the dispatch logic:
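- `PM_EVENT_SUSPEND` → `.suspend` (real S3/s2idle entry)
- `PM_EVENT_RESUME` → `.resume` (real S3/s2idle exit)
- `PM_EVENT_FREEZE` → `.freeze` (used by Xen save/restore via
  `do_suspend()`)
- `PM_EVENT_THAW` → `.thaw` (Xen suspend cancelled; the domain resumes
  where it was)
- `PM_EVENT_RESTORE` → `.restore` (Xen save/restore or migration
  completed; the domain resumes in a new place)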
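The dispatch table above can be modeled in user space as a switch from event to callback slot. This is a simplified sketch, not the kernel's `pm_op()`: the real function also handles other events (for example quiesce, recover, and hibernate), compiles its cases conditionally on the sleep configuration, and the PM core may fall back to driver-level callbacks when a bus slot is empty. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events; comments note which path delivers each one. */
enum model_pm_event {
	MODEL_EVENT_SUSPEND,	/* S3/s2idle entry */
	MODEL_EVENT_RESUME,	/* S3/s2idle exit */
	MODEL_EVENT_FREEZE,	/* Xen do_suspend() */
	MODEL_EVENT_THAW,	/* Xen suspend cancelled */
	MODEL_EVENT_RESTORE,	/* Xen save/restore or migration done */
};

typedef int (*model_cb)(void);

struct model_pm_ops {
	model_cb suspend, resume, freeze, thaw, restore;
};

/* Map an event to its callback slot; NULL means the slot is empty. */
static model_cb model_pm_op(const struct model_pm_ops *ops,
			    enum model_pm_event ev)
{
	assert(ops != NULL);
	switch (ev) {
	case MODEL_EVENT_SUSPEND: return ops->suspend;
	case MODEL_EVENT_RESUME:  return ops->resume;
	case MODEL_EVENT_FREEZE:  return ops->freeze;
	case MODEL_EVENT_THAW:    return ops->thaw;
	case MODEL_EVENT_RESTORE: return ops->restore;
	}
	return NULL;
}

/* Stand-ins for the xenbus callbacks. Distinct return values only keep
 * them distinguishable; they are not error codes. */
static int model_disconnect(void) { return 1; }
static int model_cancel(void) { return 2; }
static int model_reconnect(void) { return 3; }	/* both resume variants */

/* Slot layout before the patch. */
static const struct model_pm_ops model_xenbus_ops_before = {
	.suspend = model_disconnect,
	.resume  = model_reconnect,
	.freeze  = model_disconnect,
	.thaw    = model_cancel,
	.restore = model_reconnect,
};

/* Slot layout after the patch: .suspend/.resume left NULL. */
static const struct model_pm_ops model_xenbus_ops_after = {
	.freeze  = model_disconnect,
	.thaw    = model_cancel,
	.restore = model_reconnect,
};
```

With the post-patch layout, `model_pm_op(&model_xenbus_ops_after, MODEL_EVENT_SUSPEND)` returns `NULL`, so in this model nothing runs at S3/s2idle entry, while `MODEL_EVENT_FREEZE` still reaches the disconnect stand-in.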

The Xen save/restore/migration path in `drivers/xen/manage.c`
exclusively uses `PMSG_FREEZE`/`PMSG_THAW`/`PMSG_RESTORE`:

```c
        err = dpm_suspend_start(PMSG_FREEZE);
```

```c
        dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
```

This confirms: Xen save/restore uses `.freeze`/`.thaw`/`.restore`, NOT
`.suspend`/`.resume`. The removal of `.suspend`/`.resume` does not
affect Xen save/restore at all.
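A hypothetical model of the event sequence the two `manage.c` calls produce, checking that a Xen save/restore or migration never lands on the `.suspend`/`.resume` slots. The model collapses the kernel's phases into one suspend step and one resume step; in the real code the same `si.cancelled ? PMSG_THAW : PMSG_RESTORE` choice is also passed to `dpm_resume_end()`, which is where the plain `.thaw`/`.restore` callbacks run. The `model_*` names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_pm_event {
	MODEL_SUSPEND, MODEL_RESUME,		/* reach .suspend/.resume */
	MODEL_FREEZE, MODEL_THAW, MODEL_RESTORE	/* reach .freeze/.thaw/.restore */
};

/* Events one device sees across a Xen save/restore or migration:
 * FREEZE in the suspend phase, then THAW if the suspend was reported
 * as cancelled, else RESTORE. Writes seq[0..1] and returns 2. */
static size_t model_xen_save_events(bool cancelled,
				    enum model_pm_event seq[2])
{
	size_t n = 0;

	seq[n++] = MODEL_FREEZE;
	seq[n++] = cancelled ? MODEL_THAW : MODEL_RESTORE;
	return n;
}

/* True if any event in the sequence would reach .suspend or .resume. */
static bool model_uses_suspend_slots(const enum model_pm_event *seq,
				     size_t n)
{
	assert(seq != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (seq[i] == MODEL_SUSPEND || seq[i] == MODEL_RESUME)
			return true;
	}
	return false;
}
```

Both outcomes of `model_xen_save_events()` (cancelled or not) yield sequences for which `model_uses_suspend_slots()` is false, whereas an S3-style `{ MODEL_SUSPEND, MODEL_RESUME }` sequence makes it true.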

### 4. Bug Mechanism

The bug is a **functional regression** dating back to the original 2011
commit `b3e96c0c7562`:
- `do_suspend()` was changed to use `PMSG_FREEZE`
- But `xenbus_pm_ops` kept `.suspend`/`.resume` populated
- When S3/s2idle is used (which triggers `PMSG_SUSPEND`→`.suspend`),
  `xenbus_dev_suspend` is called, which disconnects PV devices
- On resume, `xenbus_frontend_dev_resume` tries to reconnect, but the
  backend does not expect the disconnect and does not reconnect
- **Result: PV devices are broken after S3/s2idle**
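The mechanism above can be sketched as a toy state machine comparing the old and new `xenbus_pm_ops` layouts across one S3/s2idle cycle. It rests on an assumption taken from the commit message: a backend that was not expecting the disconnect does not reconnect, modeled here by the hypothetical `backend_expects` flag. Only the `.suspend`/`.resume` slots are modeled, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PV device: link state plus whether the backend is
 * prepared to reconnect (assumed false outside Xen save/restore). */
struct model_pv_dev {
	bool connected;
	bool backend_expects;
};

typedef void (*model_cb)(struct model_pv_dev *);

/* Only the S3/s2idle slots are modeled. */
struct model_pm_ops {
	model_cb suspend;
	model_cb resume;
};

static void model_disconnect(struct model_pv_dev *d)
{
	d->connected = false;
}

/* A reconnect attempt succeeds only if the backend expects it. */
static void model_reconnect(struct model_pv_dev *d)
{
	if (d->backend_expects)
		d->connected = true;
}

/* Before the patch: both S3/s2idle slots populated. */
static const struct model_pm_ops model_ops_before = {
	.suspend = model_disconnect,
	.resume  = model_reconnect,
};

/* After the patch: both slots left empty. */
static const struct model_pm_ops model_ops_after = {
	.suspend = NULL,
	.resume  = NULL,
};

/* Run one S3/s2idle cycle on a connected device, skipping empty slots.
 * Returns whether the device is still connected afterwards. */
static bool model_s3_cycle(const struct model_pm_ops *ops)
{
	struct model_pv_dev d = { .connected = true, .backend_expects = false };

	assert(ops != NULL);
	if (ops->suspend)
		ops->suspend(&d);
	/* The guest sleeps here; the backend keeps running, unaware. */
	if (ops->resume)
		ops->resume(&d);
	return d.connected;
}
```

`model_s3_cycle(&model_ops_before)` ends with the device disconnected, while `model_s3_cycle(&model_ops_after)` leaves it connected because neither callback runs.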

### 5. Self-Containedness

Despite the Message-ID suggesting patch 2 of a series (`-2-`), this
commit is entirely self-contained:
- It modifies only one data structure
- All referenced functions already exist in the codebase
- No new code is introduced
- It does not depend on other patches

### 6. Risk Assessment

**Very low risk:**
- The change removes behavior (empties `.suspend`/`.resume`), which is
  generally less likely to cause a regression than adding behavior
- During S3/s2idle, devices simply stay connected, which is the safest
  possible behavior
- The `.restore` change to `xenbus_frontend_dev_resume` is an
  improvement (it adds the `XS_LOCAL` handling that `.resume` already
  had)
- The change is only 3 lines in one structure
- Acked by Xen subsystem maintainer (Juergen Gross)
- Also signed off by Juergen Gross as the committer

### 7. Impact

- **Who is affected**: All Xen PV guests attempting S3/s2idle
- **Severity**: HIGH - PV devices completely break after resume, meaning
  network and block devices stop working
- **User visibility**: Very visible - domain becomes unusable after
  suspend/resume

### 8. Stable Tree Applicability

- The affected code (`xenbus_pm_ops` structure) has been stable since
  2011/2013
- All referenced functions exist in all maintained stable trees
- The patch should apply cleanly (a trivial change to a long-unchanged structure)
- This fixes a longstanding functional bug affecting real use cases

### 9. Classification

This is a **bug fix** for broken S3/s2idle on Xen PV domains, with an
additional fix for potential hang during restore with local xenstored.
It meets all stable criteria:
- Obviously correct and well-reviewed (Acked-by subsystem maintainer)
- Fixes a real, user-facing bug (broken PV devices)
- Important issue (complete loss of PV device functionality)
- Very small and contained (3 lines in one structure)
- No new features or APIs

**YES**

 drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe5..199917b6f77ca 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device *_dev)
 }
 
 static const struct dev_pm_ops xenbus_pm_ops = {
-	.suspend	= xenbus_dev_suspend,
-	.resume		= xenbus_frontend_dev_resume,
 	.freeze		= xenbus_dev_suspend,
 	.thaw		= xenbus_dev_cancel,
-	.restore	= xenbus_dev_resume,
+	.restore	= xenbus_frontend_dev_resume,
 };
 
 static struct xen_bus_type xenbus_frontend = {
-- 
2.51.0


