* [PATCH v6 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap
From: Youngjun Park @ 2026-03-20 17:03 UTC
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
Currently, in the uswsusp path, only the swap type value is retrieved at
lookup time without holding a reference. If swapoff races after the type
is acquired, subsequent slot allocations operate on a stale swap device.
Additionally, grabbing and releasing the swap device reference on every
slot allocation is inefficient across the entire hibernation swap path.
This patch series addresses these issues:
- Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device
from the point it is looked up until the session completes.
- Patch 2: Removes the overhead of per-slot reference counting in alloc/free
paths and cleans up the redundant SWP_WRITEOK check.
- Patch 3: Fixes a spurious WARNING in the uswsusp GFP mask restore path
(found during uswsusp testing).
Links:
RFC v1: https://lore.kernel.org/linux-mm/20260305202413.1888499-1-usama.arif@linux.dev/T/#m3693d45180f14f441b6951984f4b4bfd90ec0c9d
RFC v2: https://lore.kernel.org/linux-mm/20260306024608.1720991-1-youngjun.park@lge.com/
RFC v3: https://lore.kernel.org/linux-mm/20260312112511.3596781-1-youngjun.park@lge.com/
v4: https://lore.kernel.org/linux-mm/abv+rjgyArqZ2uym@yjaykim-PowerEdge-T330/T/#m924fa3e58d0f0da488300653163ee8db7e870e4a
v5: https://lore.kernel.org/linux-mm/ab0YEn+Fd41q6LM7@yjaykim-PowerEdge-T330/T/#m8409d470c68cb152b0849940759bff7d7806f397
Testing:
- Hibernate/resume via sysfs
(echo reboot > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate with suspend via sysfs
(echo suspend > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate/resume via uswsusp (suspend-utils s2disk/resume on QEMU)
- Verified swap I/O works correctly after resume.
- Verified swapoff succeeds after snapshot resume completes.
- Verified pm_restore_gfp_mask() WARNING no longer triggers (Patch 3).
- Verified SNAPSHOT_FREEZE followed by snapshot_release() does not
trigger pm_restore_gfp_mask() WARNING (Patch 3).
- swapoff during active uswsusp session:
- Verified swapoff returns -EBUSY while the swap device is pinned (Patch 1).
- Verified swapoff succeeds after uswsusp process terminates.
Changelog:
v5 -> v6:
- Replaced the get/put reference approach with SWP_HIBERNATION
pinning to prevent swapoff, per Kairui's feedback. Renamed helpers
from get/find/put_hibernation_swap_type() to
pin/find/unpin_hibernation_swap_type().
- Removed swapoff waiting on hibernation reference.
swapoff now returns -EBUSY immediately when the swap device is
pinned.
- Updated function comments per Kairui's review.
- Updated commit message.
- Fixed pm_restore_gfp_mask_safe() to check saved_gfp_count
instead of saved_gfp_mask, with system_transition_mutex held
for the check. This addresses an AI review finding that SNAPSHOT_FREEZE
followed by snapshot_release() could still trigger the WARNING.
v4 -> v5:
- Rebased onto v7.0-rc4, per Rafael J. Wysocki's comment.
- No functional changes; fixed rebase conflicts.
rfc v3 -> v4:
- Introduced get/find/put_hibernation_swap_type() helpers per Kairui's
feedback. find_ for lookup-only, get/put for reference management.
- Switched to swap_type_to_info() and added type < 0 check per
Kairui's suggestion.
- Fixed the get_hibernation_swap_type() return value when ref == false (reviewed by Kairui).
- Made swapoff wait interruptible to prevent hang when uswsusp
holds a swap reference.
- Fixed spurious WARN_ON in pm_restore_gfp_mask() by introducing
pm_restore_gfp_mask_safe() (Patch 3).
- Updated commit messages and added comments for clarity.
- Rebased onto latest mm-new tree.
rfc v2 -> rfc v3:
- Split into 2 patches per Chris Li's feedback.
- Simplified by not holding reference in normal hibernation path
per Chris Li's suggestion.
- Removed redundant SWP_WRITEOK check.
- Rebased onto f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b.
rfc v1 -> rfc v2:
- Squashed into single patch per Usama Arif's feedback.
Youngjun Park (3):
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device
mm/swap: remove redundant swap device reference in alloc/free
PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
include/linux/suspend.h | 1 +
include/linux/swap.h | 5 +-
kernel/power/main.c | 18 ++++
kernel/power/swap.c | 2 +-
kernel/power/user.c | 21 +++--
mm/swapfile.c | 178 +++++++++++++++++++++++++++++++---------
6 files changed, 178 insertions(+), 47 deletions(-)
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
--
2.34.1
* [PATCH v6 1/3] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
From: Youngjun Park @ 2026-03-20 17:03 UTC
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
Hibernation via uswsusp (/dev/snapshot ioctls) has a race window:
after selecting the resume swap area but before user space is frozen,
swapoff may run and invalidate the selected swap device.
Fix this by pinning the swap device with SWP_HIBERNATION while it is
in use. The pin is exclusive, which is sufficient since
hibernate_acquire() already prevents concurrent hibernation sessions.
The kernel swsusp path (sysfs-based hibernate/resume) uses
find_hibernation_swap_type(), which looks the device up without pinning
it. That path freezes user space before touching swap, so swapoff cannot race.
Introduce dedicated helpers:
- pin_hibernation_swap_type(): Look up and pin the swap device.
Used by the uswsusp path.
- find_hibernation_swap_type(): Lookup without pinning.
Used by the kernel swsusp path.
- unpin_hibernation_swap_type(): Clear the hibernation pin.
While a swap device is pinned, swapoff fails with -EBUSY instead of proceeding.
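To make the pinning rules concrete, here is a minimal user-space model (not kernel code). It assumes a pthread mutex standing in for swap_lock and pins by swap type directly instead of looking the device up by dev_t/offset; every model_* name is hypothetical. It is meant to show the three properties the description relies on: the pin is exclusive, swapoff fails with -EBUSY while the pin is held, and unpinning an invalid (negative) type is a no-op, mirroring the type < 0 check added to swap_type_to_info().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; names mirror the patch, but this is not kernel code. */
#define MODEL_MAX_SWAPFILES   4
#define MODEL_SWP_HIBERNATION (1u << 13)

struct model_swap_info {
	bool in_use;            /* device is swapon'd */
	unsigned int flags;
};

/* Stands in for swap_lock. */
static pthread_mutex_t model_swap_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_swap_info model_swap_info[MODEL_MAX_SWAPFILES];

/* Like swap_type_to_info(): reject negative and out-of-range types. */
static struct model_swap_info *model_type_to_info(int type)
{
	if (type < 0 || type >= MODEL_MAX_SWAPFILES)
		return NULL;
	return model_swap_info[type].in_use ? &model_swap_info[type] : NULL;
}

int model_swapon(int type)
{
	assert(type >= 0 && type < MODEL_MAX_SWAPFILES);
	pthread_mutex_lock(&model_swap_lock);
	model_swap_info[type].in_use = true;
	model_swap_info[type].flags = 0;
	pthread_mutex_unlock(&model_swap_lock);
	return 0;
}

/* Exclusive pin: a second pin on the same device fails with -EBUSY. */
int model_pin(int type)
{
	struct model_swap_info *si;
	int ret = type;

	pthread_mutex_lock(&model_swap_lock);
	si = model_type_to_info(type);
	if (!si)
		ret = -ENODEV;
	else if (si->flags & MODEL_SWP_HIBERNATION)
		ret = -EBUSY;
	else
		si->flags |= MODEL_SWP_HIBERNATION;
	pthread_mutex_unlock(&model_swap_lock);
	return ret;
}

/* Clearing the pin on an invalid type (e.g. a failed lookup) does nothing. */
void model_unpin(int type)
{
	struct model_swap_info *si;

	pthread_mutex_lock(&model_swap_lock);
	si = model_type_to_info(type);
	if (si)
		si->flags &= ~MODEL_SWP_HIBERNATION;
	pthread_mutex_unlock(&model_swap_lock);
}

/* swapoff refuses to run while the device is pinned for hibernation. */
int model_swapoff(int type)
{
	struct model_swap_info *si;
	int ret = 0;

	pthread_mutex_lock(&model_swap_lock);
	si = model_type_to_info(type);
	if (!si)
		ret = -EINVAL;          /* the model's choice for "not found" */
	else if (si->flags & MODEL_SWP_HIBERNATION)
		ret = -EBUSY;
	else
		si->in_use = false;
	pthread_mutex_unlock(&model_swap_lock);
	return ret;
}
```

Because the flag is tested and set under the same lock that swapoff takes, there is no window between "pin" and "swapoff sees the pin", which is the race the patch closes.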
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
include/linux/swap.h | 5 +-
kernel/power/swap.c | 2 +-
kernel/power/user.c | 15 ++++-
mm/swapfile.c | 135 ++++++++++++++++++++++++++++++++++++++-----
4 files changed, 136 insertions(+), 21 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 62fc7499b408..82bfc965c3f8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -216,6 +216,7 @@ enum {
SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
+ SWP_HIBERNATION = (1 << 13), /* pinned for hibernation */
/* add others here before... */
};
@@ -452,7 +453,9 @@ static inline long get_nr_swap_pages(void)
extern void si_swapinfo(struct sysinfo *);
extern int add_swap_count_continuation(swp_entry_t, gfp_t);
-int swap_type_of(dev_t device, sector_t offset);
+int pin_hibernation_swap_type(dev_t device, sector_t offset);
+void unpin_hibernation_swap_type(int type);
+int find_hibernation_swap_type(dev_t device, sector_t offset);
int find_first_swap(dev_t *device);
extern unsigned int count_swap_pages(int, int);
extern sector_t swapdev_block(int, pgoff_t);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 2e64869bb5a0..cc4764149e8f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -341,7 +341,7 @@ static int swsusp_swap_check(void)
* This is called before saving the image.
*/
if (swsusp_resume_device)
- res = swap_type_of(swsusp_resume_device, swsusp_resume_block);
+ res = find_hibernation_swap_type(swsusp_resume_device, swsusp_resume_block);
else
res = find_first_swap(&swsusp_resume_device);
if (res < 0)
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4401cfe26e5c..aab9aece1009 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
memset(&data->handle, 0, sizeof(struct snapshot_handle));
if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
/* Hibernating. The image device should be accessible. */
- data->swap = swap_type_of(swsusp_resume_device, 0);
+ data->swap = pin_hibernation_swap_type(swsusp_resume_device, 0);
data->mode = O_RDONLY;
data->free_bitmaps = false;
error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
@@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
data->free_bitmaps = !error;
}
}
- if (error)
+ if (error) {
+ unpin_hibernation_swap_type(data->swap);
hibernate_release();
+ }
data->frozen = false;
data->ready = false;
@@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
data = filp->private_data;
data->dev = 0;
free_all_swap_pages(data->swap);
+ unpin_hibernation_swap_type(data->swap);
if (data->frozen) {
pm_restore_gfp_mask();
free_basic_memory_bitmaps();
@@ -235,11 +238,17 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
offset = swap_area.offset;
}
+ /*
+ * Unpin the swap device if a swap area was already
+ * set by a previous SNAPSHOT_SET_SWAP_AREA.
+ */
+ unpin_hibernation_swap_type(data->swap);
+
/*
* User space encodes device types as two-byte values,
* so we need to recode them
*/
- data->swap = swap_type_of(swdev, offset);
+ data->swap = pin_hibernation_swap_type(swdev, offset);
if (data->swap < 0)
return swdev ? -ENODEV : -EINVAL;
data->dev = swdev;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..ac1574acade7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -133,7 +133,7 @@ static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
/* May return NULL on invalid type, caller must check for NULL return */
static struct swap_info_struct *swap_type_to_info(int type)
{
- if (type >= MAX_SWAPFILES)
+ if (type < 0 || type >= MAX_SWAPFILES)
return NULL;
return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}
@@ -1972,22 +1972,15 @@ void swap_free_hibernation_slot(swp_entry_t entry)
put_swap_device(si);
}
-/*
- * Find the swap type that corresponds to given device (if any).
- *
- * @offset - number of the PAGE_SIZE-sized block of the device, starting
- * from 0, in which the swap header is expected to be located.
- *
- * This is needed for the suspend to disk (aka swsusp).
- */
-int swap_type_of(dev_t device, sector_t offset)
+static int __find_hibernation_swap_type(dev_t device, sector_t offset)
{
int type;
+ lockdep_assert_held(&swap_lock);
+
if (!device)
- return -1;
+ return -EINVAL;
- spin_lock(&swap_lock);
for (type = 0; type < nr_swapfiles; type++) {
struct swap_info_struct *sis = swap_info[type];
@@ -1997,16 +1990,118 @@ int swap_type_of(dev_t device, sector_t offset)
if (device == sis->bdev->bd_dev) {
struct swap_extent *se = first_se(sis);
- if (se->start_block == offset) {
- spin_unlock(&swap_lock);
+ if (se->start_block == offset)
return type;
- }
}
}
- spin_unlock(&swap_lock);
return -ENODEV;
}
+/**
+ * pin_hibernation_swap_type - Pin the swap device for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset identifying the swap area
+ *
+ * Locate the swap device for @device/@offset and mark it as pinned
+ * for hibernation. While pinned, swapoff() is prevented.
+ *
+ * Only one uswsusp context may pin a swap device at a time.
+ * If already pinned, this function returns -EBUSY.
+ *
+ * Return:
+ * >= 0 on success (swap type).
+ * -EINVAL if @device is invalid.
+ * -ENODEV if the swap device is not found.
+ * -EBUSY if the device is already pinned for hibernation.
+ */
+int pin_hibernation_swap_type(dev_t device, sector_t offset)
+{
+ int type;
+ struct swap_info_struct *si;
+
+ spin_lock(&swap_lock);
+
+ type = __find_hibernation_swap_type(device, offset);
+ if (type < 0) {
+ spin_unlock(&swap_lock);
+ return type;
+ }
+
+ si = swap_type_to_info(type);
+ if (WARN_ON_ONCE(!si)) {
+ spin_unlock(&swap_lock);
+ return -ENODEV;
+ }
+
+ /*
+ * hibernate_acquire() prevents concurrent hibernation sessions.
+ * This check additionally guards against double-pinning within
+ * the same session.
+ */
+ if (WARN_ON_ONCE(si->flags & SWP_HIBERNATION)) {
+ spin_unlock(&swap_lock);
+ return -EBUSY;
+ }
+
+ si->flags |= SWP_HIBERNATION;
+
+ spin_unlock(&swap_lock);
+ return type;
+}
+
+/**
+ * unpin_hibernation_swap_type - Unpin the swap device for hibernation
+ * @type: Swap type previously returned by pin_hibernation_swap_type()
+ *
+ * Clear the hibernation pin on the given swap device, allowing
+ * swapoff() to proceed normally.
+ *
+ * If @type does not refer to a valid swap device, this function
+ * does nothing.
+ */
+void unpin_hibernation_swap_type(int type)
+{
+ struct swap_info_struct *si;
+
+ spin_lock(&swap_lock);
+ si = swap_type_to_info(type);
+ if (!si) {
+ spin_unlock(&swap_lock);
+ return;
+ }
+ si->flags &= ~SWP_HIBERNATION;
+ spin_unlock(&swap_lock);
+}
+
+/**
+ * find_hibernation_swap_type - Find swap type for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset within the device identifying the swap area
+ *
+ * Locate the swap device corresponding to @device and @offset.
+ *
+ * Unlike pin_hibernation_swap_type(), this function only performs a
+ * lookup and does not mark the swap device as pinned for hibernation.
+ *
+ * This is safe in the sysfs-based hibernation path where user space
+ * is already frozen and swapoff() cannot run concurrently.
+ *
+ * Return:
+ * A non-negative swap type on success.
+ * -EINVAL if @device is invalid.
+ * -ENODEV if no matching swap device is found.
+ */
+int find_hibernation_swap_type(dev_t device, sector_t offset)
+{
+ int type;
+
+ spin_lock(&swap_lock);
+ type = __find_hibernation_swap_type(device, offset);
+ spin_unlock(&swap_lock);
+
+ return type;
+}
+
int find_first_swap(dev_t *device)
{
int type;
@@ -2803,6 +2898,14 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
spin_unlock(&swap_lock);
goto out_dput;
}
+
+ /* Refuse swapoff while the device is pinned for hibernation */
+ if (p->flags & SWP_HIBERNATION) {
+ err = -EBUSY;
+ spin_unlock(&swap_lock);
+ goto out_dput;
+ }
+
if (!security_vm_enough_memory_mm(current->mm, p->pages))
vm_unacct_memory(p->pages);
else {
--
2.34.1
* [PATCH v6 2/3] mm/swap: remove redundant swap device reference in alloc/free
From: Youngjun Park @ 2026-03-20 17:03 UTC
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
In the previous commit, uswsusp was modified to pin the swap device
when the swap type is determined, ensuring the device remains valid
throughout the hibernation I/O path.
Therefore, it is no longer necessary to repeatedly get and put the swap
device reference for each swap slot allocation and free operation.
For hibernation via the sysfs interface, user-space tasks are frozen
before swap allocation begins, so swapoff cannot race with allocation.
After resume, tasks remain frozen while swap slots are freed, so
additional reference management is not required there either.
Remove the redundant swap device get/put operations from the
hibernation swap allocation and free paths.
Also remove the SWP_WRITEOK check before allocation, as the cluster
allocation logic already validates the swap device state.
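A toy sketch of the cost argument, assuming a trivial linear slot allocator and plain counters in place of the kernel's swap device references. All toy_* names are hypothetical, and the assert() on `pinned` only stands in for the "caller must ensure the device is stable" contract. The old scheme performs one get and one put per slot; the new one performs none once the device is pinned or user space is frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the kernel's allocator or refcounting. */
#define TOY_NR_SLOTS 64

struct toy_swap_dev {
	bool slot_used[TOY_NR_SLOTS];
	long refcount;      /* models the per-device reference */
	long ref_ops;       /* how many get/put operations were performed */
	bool pinned;        /* models SWP_HIBERNATION or frozen user space */
};

static void toy_get(struct toy_swap_dev *d) { d->refcount++; d->ref_ops++; }
static void toy_put(struct toy_swap_dev *d) { d->refcount--; d->ref_ops++; }

/* Linear search; slot 0 is reserved for the header, 0 means "no slot". */
static long toy_find_free(struct toy_swap_dev *d)
{
	for (long i = 1; i < TOY_NR_SLOTS; i++) {
		if (!d->slot_used[i]) {
			d->slot_used[i] = true;
			return i;
		}
	}
	return 0;
}

/* Old scheme: take and drop a device reference around every allocation. */
long toy_alloc_old(struct toy_swap_dev *d)
{
	long off;

	toy_get(d);
	off = toy_find_free(d);
	toy_put(d);
	return off;
}

/* New scheme: stability is guaranteed once per session by the caller. */
long toy_alloc_new(struct toy_swap_dev *d)
{
	assert(d->pinned);
	return toy_find_free(d);
}
```

For an image of N slots the old path does 2N reference operations (plus the same again on free), while the new path moves that guarantee to a single pin, or to the freezer, for the whole session.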
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
mm/swapfile.c | 43 ++++++++++++++++++++-----------------------
1 file changed, 20 insertions(+), 23 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ac1574acade7..dd9631658808 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1923,7 +1923,12 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
}
#ifdef CONFIG_HIBERNATION
-/* Allocate a slot for hibernation */
+/*
+ * Allocate a slot for hibernation.
+ *
+ * Note: The caller must ensure the swap device is stable, either by
+ * pinning it for hibernation or by freezing user space before calling this.
+ */
swp_entry_t swap_alloc_hibernation_slot(int type)
{
struct swap_info_struct *si = swap_type_to_info(type);
@@ -1933,43 +1938,35 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
if (!si)
goto fail;
- /* This is called for allocating swap entry, not cache */
- if (get_swap_device_info(si)) {
- if (si->flags & SWP_WRITEOK) {
- /*
- * Grab the local lock to be compliant
- * with swap table allocation.
- */
- local_lock(&percpu_swap_cluster.lock);
- offset = cluster_alloc_swap_entry(si, NULL);
- local_unlock(&percpu_swap_cluster.lock);
- if (offset)
- entry = swp_entry(si->type, offset);
- }
- put_swap_device(si);
- }
+ /*
+ * Grab the local lock to be compliant
+ * with swap table allocation.
+ */
+ local_lock(&percpu_swap_cluster.lock);
+ offset = cluster_alloc_swap_entry(si, NULL);
+ local_unlock(&percpu_swap_cluster.lock);
+ if (offset)
+ entry = swp_entry(si->type, offset);
fail:
return entry;
}
-/* Free a slot allocated by swap_alloc_hibernation_slot */
+/*
+ * Free a slot allocated by swap_alloc_hibernation_slot.
+ * As with allocation, the caller must ensure the swap device is stable.
+ */
void swap_free_hibernation_slot(swp_entry_t entry)
{
- struct swap_info_struct *si;
+ struct swap_info_struct *si = __swap_entry_to_info(entry);
struct swap_cluster_info *ci;
pgoff_t offset = swp_offset(entry);
- si = get_swap_device(entry);
- if (WARN_ON(!si))
- return;
-
ci = swap_cluster_lock(si, offset);
swap_put_entry_locked(si, ci, offset);
swap_cluster_unlock(ci);
/* In theory readahead might add it to the swap cache by accident */
__try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
- put_swap_device(si);
}
static int __find_hibernation_swap_type(dev_t device, sector_t offset)
--
2.34.1
* [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
From: Youngjun Park @ 2026-03-20 17:03 UTC
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
Commit 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask()
stacking") introduced refcount-based GFP mask management that warns
when pm_restore_gfp_mask() is called with saved_gfp_count == 0:
WARNING: kernel/power/main.c:44 at pm_restore_gfp_mask+0xd7/0xf0
CPU: 0 UID: 0 PID: 373 Comm: s2disk
Call Trace:
snapshot_ioctl+0x964/0xbd0
__x64_sys_ioctl+0x724/0x1320
...
The uswsusp path calls pm_restore_gfp_mask() defensively in
SNAPSHOT_CREATE_IMAGE, SNAPSHOT_UNFREEZE, and snapshot_release(),
where the GFP mask may or may not be restricted depending on the
execution path.
Before the stacking change this was a silent no-op; it now triggers
a WARNING when saved_gfp_count is 0.
Introduce pm_restore_gfp_mask_safe(), which skips the call if
saved_gfp_count is 0. This avoids the warning without requiring
state tracking in snapshot_ioctl, which could otherwise leave the
GFP mask permanently restricted if mismanaged.
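A minimal model of the stacking semantics described above, assuming plain unsigned ints for GFP masks and a counter in place of WARN_ON(). It omits the system_transition_mutex check, and all model_* names are hypothetical. It shows why a defensive restore with saved_gfp_count == 0 now warns, and how a helper that checks the count first turns that case back into a silent no-op while leaving balanced restrict/restore pairs unchanged.

```c
#include <assert.h>

/* Hypothetical model of refcounted GFP-mask restriction; not kernel code. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO (1u << 0)
#define MODEL_GFP_FS (1u << 1)

static model_gfp_t model_allowed_mask = MODEL_GFP_IO | MODEL_GFP_FS | (1u << 2);
static unsigned int model_saved_count;  /* nesting depth of restrictions */
static model_gfp_t model_saved_mask;
static int model_warnings;              /* stands in for WARN_ON() firing */

void model_restrict(void)
{
	if (model_saved_count++)
		return;                 /* already restricted: just stack */
	model_saved_mask = model_allowed_mask;
	model_allowed_mask &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
}

void model_restore(void)
{
	if (!model_saved_count) {
		model_warnings++;       /* unbalanced restore */
		return;
	}
	if (--model_saved_count)
		return;                 /* an outer restriction is still active */
	model_allowed_mask = model_saved_mask;
	model_saved_mask = 0;
}

/* Defensive variant: do nothing if no restriction is in place. */
void model_restore_nowarn(void)
{
	if (!model_saved_count)
		return;
	model_restore();
}
```

Since the helper only skips the unbalanced case, a restriction that really is active is still undone exactly once, so the mask cannot be left restricted by the defensive callers.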
Fixes: 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask() stacking")
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
include/linux/suspend.h | 1 +
kernel/power/main.c | 18 ++++++++++++++++++
kernel/power/user.c | 6 +++---
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index b02876f1ae38..7777931d88a5 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -454,6 +454,7 @@ extern void pm_report_hw_sleep_time(u64 t);
extern void pm_report_max_hw_sleep(u64 t);
void pm_restrict_gfp_mask(void);
void pm_restore_gfp_mask(void);
+void pm_restore_gfp_mask_safe(void);
#define pm_notifier(fn, pri) { \
static struct notifier_block fn##_nb = \
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 5f8c9e12eaec..90e9bd56a433 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -36,6 +36,24 @@
static unsigned int saved_gfp_count;
static gfp_t saved_gfp_mask;
+/**
+ * pm_restore_gfp_mask_safe - Conditionally restore the GFP mask
+ *
+ * Call pm_restore_gfp_mask() only if a GFP restriction is active.
+ *
+ * After GFP mask stacking was introduced, calling
+ * pm_restore_gfp_mask() without a matching restriction triggers a
+ * warning. Some hibernation paths invoke restore defensively, so this
+ * helper avoids spurious warnings when no restriction is in place.
+ */
+void pm_restore_gfp_mask_safe(void)
+{
+ WARN_ON(!mutex_is_locked(&system_transition_mutex));
+ if (!saved_gfp_count)
+ return;
+ pm_restore_gfp_mask();
+}
+
void pm_restore_gfp_mask(void)
{
WARN_ON(!mutex_is_locked(&system_transition_mutex));
diff --git a/kernel/power/user.c b/kernel/power/user.c
index aab9aece1009..88de4b76a9dc 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -119,7 +119,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
free_all_swap_pages(data->swap);
unpin_hibernation_swap_type(data->swap);
if (data->frozen) {
- pm_restore_gfp_mask();
+ pm_restore_gfp_mask_safe();
free_basic_memory_bitmaps();
thaw_processes();
} else if (data->free_bitmaps) {
@@ -306,7 +306,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
case SNAPSHOT_UNFREEZE:
if (!data->frozen || data->ready)
break;
- pm_restore_gfp_mask();
+ pm_restore_gfp_mask_safe();
free_basic_memory_bitmaps();
data->free_bitmaps = false;
thaw_processes();
@@ -318,7 +318,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
error = -EPERM;
break;
}
- pm_restore_gfp_mask();
+ pm_restore_gfp_mask_safe();
error = hibernation_snapshot(data->platform_support);
if (!error) {
error = put_user(in_suspend, (int __user *)arg);
--
2.34.1
* Re: [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
From: Rafael J. Wysocki @ 2026-03-20 18:20 UTC
To: Youngjun Park
Cc: rafael, akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe,
baohua, usama.arif, linux-pm, linux-mm
Again, this patch does not appear to depend on the rest of the series,
in which case there's no need to send the other patches along with it.
Please send the next version separately unless there is a dependency
on the other two patches, but in that case that dependency should be
explained in the changelog.
On Fri, Mar 20, 2026 at 6:03 PM Youngjun Park <youngjun.park@lge.com> wrote:
>
> Commit 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask()
> stacking") introduced refcount-based GFP mask management that warns
> when pm_restore_gfp_mask() is called with saved_gfp_count == 0:
>
> WARNING: kernel/power/main.c:44 at pm_restore_gfp_mask+0xd7/0xf0
> CPU: 0 UID: 0 PID: 373 Comm: s2disk
> Call Trace:
> snapshot_ioctl+0x964/0xbd0
> __x64_sys_ioctl+0x724/0x1320
> ...
>
> The uswsusp path calls pm_restore_gfp_mask() defensively in
> SNAPSHOT_CREATE_IMAGE, SNAPSHOT_UNFREEZE, and snapshot_release(),
> where the GFP mask may or may not be restricted depending on the
> execution path.
>
> Before the stacking change this was a silent no-op; it now triggers
> a WARNING when saved_gfp_count is 0.
>
> Introduce pm_restore_gfp_mask_safe(), which skips the call if
It would be better to call this pm_restore_gfp_mask_nowarn() IMV.
> saved_gfp_count is 0. This avoids the warning without requiring
> state tracking in snapshot_ioctl, which could otherwise leave the
> GFP mask permanently restricted if mismanaged.
>
> Fixes: 35e4a69b2003f ("PM: sleep: Allow pm_restrict_gfp_mask() stacking")
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
> include/linux/suspend.h | 1 +
> kernel/power/main.c | 18 ++++++++++++++++++
> kernel/power/user.c | 6 +++---
> 3 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index b02876f1ae38..7777931d88a5 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -454,6 +454,7 @@ extern void pm_report_hw_sleep_time(u64 t);
> extern void pm_report_max_hw_sleep(u64 t);
> void pm_restrict_gfp_mask(void);
> void pm_restore_gfp_mask(void);
> +void pm_restore_gfp_mask_safe(void);
>
> #define pm_notifier(fn, pri) { \
> static struct notifier_block fn##_nb = \
> diff --git a/kernel/power/main.c b/kernel/power/main.c
> index 5f8c9e12eaec..90e9bd56a433 100644
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -36,6 +36,24 @@
> static unsigned int saved_gfp_count;
> static gfp_t saved_gfp_mask;
>
> +/**
> + * pm_restore_gfp_mask_safe - Conditionally restore the GFP mask
> + *
> + * Call pm_restore_gfp_mask() only if a GFP restriction is active.
> + *
> + * After GFP mask stacking was introduced, calling
> + * pm_restore_gfp_mask() without a matching restriction triggers a
> + * warning. Some hibernation paths invoke restore defensively, so this
> + * helper avoids spurious warnings when no restriction is in place.
> + */
> +void pm_restore_gfp_mask_safe(void)
> +{
> + WARN_ON(!mutex_is_locked(&system_transition_mutex));
Could this be changed to lockdep_assert_held()?
> + if (!saved_gfp_count)
> + return;
Please add an empty line here.
> + pm_restore_gfp_mask();
> +}
> +
> void pm_restore_gfp_mask(void)
> {
> WARN_ON(!mutex_is_locked(&system_transition_mutex));
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index aab9aece1009..88de4b76a9dc 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -119,7 +119,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
> free_all_swap_pages(data->swap);
> unpin_hibernation_swap_type(data->swap);
> if (data->frozen) {
> - pm_restore_gfp_mask();
> + pm_restore_gfp_mask_safe();
> free_basic_memory_bitmaps();
> thaw_processes();
> } else if (data->free_bitmaps) {
> @@ -306,7 +306,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
> case SNAPSHOT_UNFREEZE:
> if (!data->frozen || data->ready)
> break;
> - pm_restore_gfp_mask();
> + pm_restore_gfp_mask_safe();
> free_basic_memory_bitmaps();
> data->free_bitmaps = false;
> thaw_processes();
> @@ -318,7 +318,7 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
> error = -EPERM;
> break;
> }
> - pm_restore_gfp_mask();
> + pm_restore_gfp_mask_safe();
> error = hibernation_snapshot(data->platform_support);
> if (!error) {
> error = put_user(in_suspend, (int __user *)arg);
> --
* Re: [PATCH v6 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap
From: Andrew Morton @ 2026-03-21 1:22 UTC
To: Youngjun Park
Cc: rafael, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sat, 21 Mar 2026 02:03:10 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
> Currently, in the uswsusp path, only the swap type value is retrieved at
> lookup time without holding a reference. If swapoff races after the type
> is acquired, subsequent slot allocations operate on a stale swap device.
>
> Additionally, grabbing and releasing the swap device reference on every
> slot allocation is inefficient across the entire hibernation swap path.
AI review has a couple of questions:
https://sashiko.dev/#/patchset/20260320170313.163386-1-youngjun.park@lge.com
* Re: [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
From: YoungJun Park @ 2026-03-21 10:48 UTC
To: Rafael J. Wysocki
Cc: akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Fri, Mar 20, 2026 at 07:20:24PM +0100, Rafael J. Wysocki wrote:
>...
Hi Rafael,
Thanks for the review. Sorry for the back and forth on this one.
I'm preparing to send the uswsusp GFP mask fix separately as you
requested (with _nowarn rename and lockdep_assert_held).
Before sending, I wanted to check on the approach. I originally
found this WARNING while testing uswsusp and thought it was a
localized issue. But AI review has kept uncovering more cases —
first SNAPSHOT_FREEZE + snapshot_release(), and now
dpm_resume_end() when dpm_prepare() fails. More callers than I
expected use this defensive restore pattern.
So I'd like your thoughts on the approach:
1. Introduce pm_restore_gfp_mask_nowarn() and update each caller.
2. Remove the WARN_ON from pm_restore_gfp_mask() itself, restoring
the pre-stacking no-op behavior.
I'm leaning towards option 2. Defensive restores are an established
pattern in multiple paths, and warning against a legitimate no-op
seems counterproductive — we'd just be playing whack-a-mole with
_nowarn conversions as more callers turn up.
What do you think?
Youngjun Park
* Re: [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
From: Rafael J. Wysocki @ 2026-03-21 11:32 UTC
To: YoungJun Park
Cc: Rafael J. Wysocki, akpm, chrisl, kasong, pavel, shikemeng,
nphamcs, bhe, baohua, usama.arif, linux-pm, linux-mm
On Sat, Mar 21, 2026 at 11:48 AM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Fri, Mar 20, 2026 at 07:20:24PM +0100, Rafael J. Wysocki wrote:
> >...
>
> Hi Rafael,
>
> Thanks for the review. Sorry for the back and forth on this one.
> I'm preparing to send the uswsusp GFP mask fix separately as you
> requested (with _nowarn rename and lockdep_assert_held).
>
> Before sending, I wanted to check on the approach. I originally
> found this WARNING while testing uswsusp and thought it was a
> localized issue. But AI review has kept uncovering more cases —
> first SNAPSHOT_FREEZE + snapshot_release(), and now
> dpm_resume_end() when dpm_prepare() fails.
dpm_resume_end() should not be called after a failing dpm_prepare().
Which code path is that?
> More callers than I expected use this defensive restore pattern.
>
> So I'd like your thoughts on the approach:
>
> 1. Introduce pm_restore_gfp_mask_nowarn() and update each caller.
>
> 2. Remove the WARN_ON from pm_restore_gfp_mask() itself, restoring
> the pre-stacking no-op behavior.
>
> I'm leaning towards option 2. Defensive restores are an established
> pattern in multiple paths, and warning against a legitimate no-op
> seems counterproductive — we'd just be playing whack-a-mole with
> _nowarn conversions as more callers turn up.
>
> What do you think?
Using the _nowarn() variant would help to annotate code paths where
omitting the warning is legitimate.
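Option 1 might look roughly like the following. It is again a hypothetical userspace model rather than the actual patch. One shared helper backs two entry points: a warning variant for paths that expect a balanced restrict/restore pair, and a _nowarn variant for legitimate defensive restores. The lockdep_assert_held() requirement is modeled with assert() on a made-up transition_mutex_held flag. The exact placement of that assertion, and all model_* names, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for __GFP_IO / __GFP_FS; values are made up. */
#define MODEL_GFP_IO 0x1u
#define MODEL_GFP_FS 0x2u

static bool transition_mutex_held;   /* models system_transition_mutex */
static unsigned int gfp_allowed_mask = 0xffu;
static unsigned int saved_gfp_mask;
static unsigned int saved_gfp_count;
static unsigned int warn_count;      /* times the modeled WARN_ON() fired */

/* Models lockdep_assert_held(&system_transition_mutex). */
#define model_lockdep_assert_held() assert(transition_mutex_held)

static void model_restrict_gfp_mask(void)
{
	model_lockdep_assert_held();
	if (saved_gfp_count++)
		return;             /* an outer caller already restricted it */
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
}

/* Shared body; @warn selects whether an unbalanced restore is reported. */
static void model_restore_gfp_mask_common(bool warn)
{
	model_lockdep_assert_held();
	if (!saved_gfp_count) {
		if (warn) {
			warn_count++;
			fprintf(stderr, "WARNING: restore without restrict\n");
		}
		return;
	}
	if (--saved_gfp_count)
		return;             /* still nested inside another restrict */
	gfp_allowed_mask = saved_gfp_mask;
	saved_gfp_mask = 0;
}

/* For paths that must pair every restore with an earlier restrict. */
static void model_restore_gfp_mask(void)
{
	model_restore_gfp_mask_common(true);
}

/* For defensive restores where "nothing saved" is a legitimate no-op. */
static void model_restore_gfp_mask_nowarn(void)
{
	model_restore_gfp_mask_common(false);
}
```

With this shape, the annotation lives at the call site. Anyone reading a defensive restore path sees _nowarn and knows the no-op is intended. The cost is that every such caller has to be found and converted.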
* Re: [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
2026-03-21 11:32 ` Rafael J. Wysocki
@ 2026-03-21 11:45 ` Rafael J. Wysocki
2026-03-22 11:31 ` Rafael J. Wysocki
0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2026-03-21 11:45 UTC (permalink / raw)
To: YoungJun Park
Cc: akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sat, Mar 21, 2026 at 12:32 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, Mar 21, 2026 at 11:48 AM YoungJun Park <youngjun.park@lge.com> wrote:
> >
> > On Fri, Mar 20, 2026 at 07:20:24PM +0100, Rafael J. Wysocki wrote:
> > >...
> >
> > Hi Rafael,
> >
> > Thanks for the review. Sorry for the back and forth on this one.
> > I'm preparing to send the uswsusp GFP mask fix separately as you
> > requested (with _nowarn rename and lockdep_assert_held).
> >
> > Before sending, I wanted to check on the approach. I originally
> > found this WARNING while testing uswsusp and thought it was a
> > localized issue. But AI review has kept uncovering more cases —
> > first SNAPSHOT_FREEZE + snapshot_release(), and now
> > dpm_resume_end() when dpm_prepare() fails.
>
> dpm_resume_end() should not be called after a failing dpm_prepare().
>
> Which code path is that?
OK, I see.
Let me have a deeper look at this.
* Re: [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path
2026-03-21 11:45 ` Rafael J. Wysocki
@ 2026-03-22 11:31 ` Rafael J. Wysocki
0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2026-03-22 11:31 UTC (permalink / raw)
To: YoungJun Park
Cc: akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sat, Mar 21, 2026 at 12:45 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, Mar 21, 2026 at 12:32 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Sat, Mar 21, 2026 at 11:48 AM YoungJun Park <youngjun.park@lge.com> wrote:
> > >
> > > On Fri, Mar 20, 2026 at 07:20:24PM +0100, Rafael J. Wysocki wrote:
> > > >...
> > >
> > > Hi Rafael,
> > >
> > > Thanks for the review. Sorry for the back and forth on this one.
> > > I'm preparing to send the uswsusp GFP mask fix separately as you
> > > requested (with _nowarn rename and lockdep_assert_held).
> > >
> > > Before sending, I wanted to check on the approach. I originally
> > > found this WARNING while testing uswsusp and thought it was a
> > > localized issue. But AI review has kept uncovering more cases —
> > > first SNAPSHOT_FREEZE + snapshot_release(), and now
> > > dpm_resume_end() when dpm_prepare() fails.
> >
> > dpm_resume_end() should not be called after a failing dpm_prepare().
> >
> > Which code path is that?
>
> OK, I see.
>
> Let me have a deeper look at this.
So I agree with you that the most straightforward way to address the
spurious warnings is to remove the WARN_ON() around the
!saved_gfp_count check in pm_restore_gfp_mask() while retaining the
check itself.
Please feel free to send a patch making that change.
Thanks!
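The change agreed on here (drop the WARN_ON() but keep the !saved_gfp_count check) can be sketched in the same kind of hypothetical userspace model. The model_* names and bit values are illustrative only. Retaining the check still matters. In this model, where the counter is unsigned, an unbalanced restore without it would decrement saved_gfp_count from 0 and wrap around, breaking every later restrict/restore pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_IO / __GFP_FS; values are made up. */
#define MODEL_GFP_IO 0x1u
#define MODEL_GFP_FS 0x2u

static unsigned int gfp_allowed_mask = 0xffu;
static unsigned int saved_gfp_mask;
static unsigned int saved_gfp_count;

static void model_restrict_gfp_mask(void)
{
	if (saved_gfp_count++) {
		/* Nested call: an outer caller already restricted the mask. */
		assert((gfp_allowed_mask & (MODEL_GFP_IO | MODEL_GFP_FS)) == 0);
		return;
	}
	saved_gfp_mask = gfp_allowed_mask;
	gfp_allowed_mask &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
}

static void model_restore_gfp_mask(void)
{
	/*
	 * Nothing saved: a defensive restore is a silent no-op (the check
	 * is kept, only the warning is gone). Otherwise drop one nesting
	 * level and restore the saved mask only at the outermost level.
	 */
	if (!saved_gfp_count || --saved_gfp_count)
		return;
	gfp_allowed_mask = saved_gfp_mask;
	saved_gfp_mask = 0;
}
```

Nested restrict calls stack as before. Only the outermost restore puts the saved mask back, and any extra restore leaves both the mask and the counter unchanged.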
end of thread
Thread overview: 10+ messages
2026-03-20 17:03 [PATCH v6 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap Youngjun Park
2026-03-20 17:03 ` [PATCH v6 1/3] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
2026-03-20 17:03 ` [PATCH v6 2/3] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
2026-03-20 17:03 ` [PATCH v6 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path Youngjun Park
2026-03-20 18:20 ` Rafael J. Wysocki
2026-03-21 10:48 ` YoungJun Park
2026-03-21 11:32 ` Rafael J. Wysocki
2026-03-21 11:45 ` Rafael J. Wysocki
2026-03-22 11:31 ` Rafael J. Wysocki
2026-03-21 1:22 ` [PATCH v6 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap Andrew Morton