* [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap
@ 2026-03-21 10:33 Youngjun Park
2026-03-21 10:33 ` [PATCH v7 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Youngjun Park @ 2026-03-21 10:33 UTC (permalink / raw)
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
Apologies for the frequent revisions. Hopefully this version is close to final.
Currently, in the uswsusp path, only the swap type value is retrieved at
lookup time without holding a reference. If swapoff races after the type
is acquired, subsequent slot allocations operate on a stale swap device.
Additionally, grabbing and releasing the swap device reference on every
slot allocation is inefficient across the entire hibernation swap path.
This patch series addresses these issues:
- Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device
from the point it is looked up until the session completes.
- Patch 2: Removes the overhead of per-slot reference counting in alloc/free
paths and cleans up the redundant SWP_WRITEOK check.
This series is based on v7.0-rc4 (per Rafael's request, because of the
PM-side modifications). Happy to rebase onto mm-new if needed.
Links:
RFC v1: https://lore.kernel.org/linux-mm/20260305202413.1888499-1-usama.arif@linux.dev/T/#m3693d45180f14f441b6951984f4b4bfd90ec0c9d
RFC v2: https://lore.kernel.org/linux-mm/20260306024608.1720991-1-youngjun.park@lge.com/
RFC v3: https://lore.kernel.org/linux-mm/20260312112511.3596781-1-youngjun.park@lge.com/
v4: https://lore.kernel.org/linux-mm/abv+rjgyArqZ2uym@yjaykim-PowerEdge-T330/T/#m924fa3e58d0f0da488300653163ee8db7e870e4a
v5: https://lore.kernel.org/linux-mm/ab0YEn+Fd41q6LM7@yjaykim-PowerEdge-T330/T/#m8409d470c68cb152b0849940759bff7d7806f397
v6: https://lore.kernel.org/linux-mm/20260320182227.896f9ab62d62961b2caab5f7@linux-foundation.org/T/#m10ee3346cd8dcd052749105d9a8e2052dbf3bc80
Testing:
- Hibernate/resume via sysfs
(echo reboot > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate with suspend via sysfs
(echo suspend > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate/resume via uswsusp (suspend-utils s2disk/resume on QEMU)
- Verified swap I/O works correctly after resume.
- Verified swapoff succeeds after snapshot resume completes.
- swapoff during active uswsusp session:
- Verified swapoff returns -EBUSY while swap device is pinned (Patch 1).
- Verified swapoff succeeds after uswsusp process terminates.
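The swapoff checks in the last test item can also be scripted from C. The following is only a minimal sketch: swapoff(2) and <sys/swap.h> are the regular libc interface, while try_swapoff() and describe_swapoff_result() are hypothetical helper names made up for this illustration. The -EBUSY case assumes Patch 1 of this series is applied.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/swap.h>

/*
 * Call swapoff(2) on @path. Returns 0 on success or a negative errno
 * value on failure, mirroring the kernel's return convention.
 */
static int try_swapoff(const char *path)
{
	assert(path != NULL);

	if (swapoff(path) == 0)
		return 0;
	return -errno;
}

/* Map a try_swapoff() result to the outcome the test steps look for. */
static const char *describe_swapoff_result(int ret)
{
	switch (ret) {
	case 0:
		return "swapoff succeeded (device not pinned)";
	case -EBUSY:
		/* Expected while a uswsusp session holds the pin. */
		return "swapoff refused: device busy";
	case -EPERM:
		return "swapoff refused: CAP_SYS_ADMIN required";
	default:
		return strerror(-ret);
	}
}
```

A test program would call try_swapoff() on the resume swap device once while s2disk holds /dev/snapshot open and once after it exits, and print describe_swapoff_result() for each call.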
Changelog:
v6 -> v7:
- Dropped Patch 3 (pm_restore_gfp_mask fix) from series as it has
no dependency on Patches 1-2. Will be sent separately.
(Rafael J. Wysocki feedback)
- Andrew Morton's AI review findings applied only to Patch 3;
Patches 1-2 are unchanged. (The AI review found no problems in them.)
v5 -> v6:
- Replaced get/put reference approach with SWP_HIBERNATION
pinning to prevent swapoff, per Kairui's feedback. Renamed helpers
from get/find/put_hibernation_swap_type() to
pin/find/unpin_hibernation_swap_type().
- Renamed swap_type_of() to __find_hibernation_swap_type() since
it is now an internal helper with no external callers.
(Kairui's feedback)
- Removed swapoff waiting on hibernation reference.
swapoff now returns -EBUSY immediately when the swap device is
pinned.
- Updated function comments per Kairui's review.
- Updated commit message.
v4 -> v5:
- Rebased onto v7.0-rc4 (Rafael J. Wysocki comment)
- No functional changes; rebase conflict fixes only.
rfc v3 -> v4:
- Introduced get/find/put_hibernation_swap_type() helpers per Kairui's
feedback. find_ for lookup-only, get/put for reference management.
- Switched to swap_type_to_info() and added type < 0 check per
Kairui's suggestion.
- Fixed get_hibernation_swap_type() return when ref == false (Reviewed by Kairui)
- Made swapoff wait interruptible to prevent hang when uswsusp
holds a swap reference.
- Rebased onto latest mm-new tree.
rfc v2 -> rfc v3:
- Split into 2 patches per Chris Li's feedback.
- Simplified by not holding reference in normal hibernation path
per Chris Li's suggestion.
- Removed redundant SWP_WRITEOK check.
- Rebased onto f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b.
rfc v1 -> rfc v2:
- Squashed into single patch per Usama Arif's feedback.
Youngjun Park (2):
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device
mm/swap: remove redundant swap device reference in alloc/free
include/linux/swap.h | 5 +-
kernel/power/swap.c | 2 +-
kernel/power/user.c | 15 +++-
mm/swapfile.c | 178 +++++++++++++++++++++++++++++++++----------
4 files changed, 156 insertions(+), 44 deletions(-)
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
--
2.34.1
* [PATCH v7 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-21 10:33 [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Youngjun Park
@ 2026-03-21 10:33 ` Youngjun Park
2026-03-21 10:33 ` [PATCH v7 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
2026-03-21 17:59 ` [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Andrew Morton
2 siblings, 0 replies; 7+ messages in thread
From: Youngjun Park @ 2026-03-21 10:33 UTC (permalink / raw)
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
Hibernation via uswsusp (/dev/snapshot ioctls) has a race window:
after selecting the resume swap area but before user space is frozen,
swapoff may run and invalidate the selected swap device.
Fix this by pinning the swap device with SWP_HIBERNATION while it is
in use. The pin is exclusive, which is sufficient since
hibernate_acquire() already prevents concurrent hibernation sessions.
The kernel swsusp path (sysfs-based hibernate/resume) uses
find_hibernation_swap_type() which is not affected by the pin. It
freezes user space before touching swap, so swapoff cannot race.
Introduce dedicated helpers:
- pin_hibernation_swap_type(): Look up and pin the swap device.
Used by the uswsusp path.
- find_hibernation_swap_type(): Lookup without pinning.
Used by the kernel swsusp path.
- unpin_hibernation_swap_type(): Clear the hibernation pin.
While a swap device is pinned, swapoff is prevented from proceeding.
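The intended semantics of the three helpers can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: all model_* names and the MODEL_* constants are hypothetical, the table stands in for swap_info[], and the model is single-threaded (the real helpers do the lookup and flag update under swap_lock, and the real pin path also WARNs on a double pin).

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_SWAPFILES	4
#define MODEL_SWP_USED		(1u << 0)
#define MODEL_SWP_HIBERNATION	(1u << 13)

struct model_swap_info {
	unsigned int flags;
	unsigned long dev;		/* stands in for bdev->bd_dev */
	unsigned long start_block;	/* stands in for the first extent */
};

static struct model_swap_info model_swap[MODEL_MAX_SWAPFILES];

/* Lookup only: what the sysfs path needs once user space is frozen. */
static int model_find(unsigned long dev, unsigned long offset)
{
	if (!dev)
		return -EINVAL;
	for (int type = 0; type < MODEL_MAX_SWAPFILES; type++) {
		struct model_swap_info *si = &model_swap[type];

		if ((si->flags & MODEL_SWP_USED) && si->dev == dev &&
		    si->start_block == offset)
			return type;
	}
	return -ENODEV;
}

/* Lookup plus exclusive pin: what the uswsusp path needs. */
static int model_pin(unsigned long dev, unsigned long offset)
{
	int type = model_find(dev, offset);

	if (type < 0)
		return type;
	if (model_swap[type].flags & MODEL_SWP_HIBERNATION)
		return -EBUSY;
	model_swap[type].flags |= MODEL_SWP_HIBERNATION;
	return type;
}

/* Tolerates invalid types, so error paths can call it unconditionally. */
static void model_unpin(int type)
{
	if (type < 0 || type >= MODEL_MAX_SWAPFILES)
		return;
	model_swap[type].flags &= ~MODEL_SWP_HIBERNATION;
}

/* swapoff refuses to proceed while the pin is set. */
static int model_swapoff(int type)
{
	if (type < 0 || type >= MODEL_MAX_SWAPFILES ||
	    !(model_swap[type].flags & MODEL_SWP_USED))
		return -EINVAL;
	if (model_swap[type].flags & MODEL_SWP_HIBERNATION)
		return -EBUSY;
	model_swap[type].flags = 0;
	return 0;
}

/* One uswsusp-like session: pin, racing swapoff fails, unpin, swapoff works. */
static void model_demo(void)
{
	int type;

	model_swap[0] = (struct model_swap_info){
		.flags = MODEL_SWP_USED, .dev = 0x800002, .start_block = 0,
	};
	type = model_pin(0x800002, 0);
	assert(type == 0);
	assert(model_swapoff(type) == -EBUSY);
	model_unpin(type);
	assert(model_swapoff(type) == 0);
}
```

model_unpin() accepting a negative type mirrors why the patch adds the type < 0 check to swap_type_to_info(): snapshot_open() and snapshot_release() may call the unpin helper with a failed lookup result stored in data->swap.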
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
include/linux/swap.h | 5 +-
kernel/power/swap.c | 2 +-
kernel/power/user.c | 15 ++++-
mm/swapfile.c | 135 ++++++++++++++++++++++++++++++++++++++-----
4 files changed, 136 insertions(+), 21 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 62fc7499b408..82bfc965c3f8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -216,6 +216,7 @@ enum {
SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
+ SWP_HIBERNATION = (1 << 13), /* pinned for hibernation */
/* add others here before... */
};
@@ -452,7 +453,9 @@ static inline long get_nr_swap_pages(void)
extern void si_swapinfo(struct sysinfo *);
extern int add_swap_count_continuation(swp_entry_t, gfp_t);
-int swap_type_of(dev_t device, sector_t offset);
+int pin_hibernation_swap_type(dev_t device, sector_t offset);
+void unpin_hibernation_swap_type(int type);
+int find_hibernation_swap_type(dev_t device, sector_t offset);
int find_first_swap(dev_t *device);
extern unsigned int count_swap_pages(int, int);
extern sector_t swapdev_block(int, pgoff_t);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 2e64869bb5a0..cc4764149e8f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -341,7 +341,7 @@ static int swsusp_swap_check(void)
* This is called before saving the image.
*/
if (swsusp_resume_device)
- res = swap_type_of(swsusp_resume_device, swsusp_resume_block);
+ res = find_hibernation_swap_type(swsusp_resume_device, swsusp_resume_block);
else
res = find_first_swap(&swsusp_resume_device);
if (res < 0)
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4401cfe26e5c..aab9aece1009 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
memset(&data->handle, 0, sizeof(struct snapshot_handle));
if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
/* Hibernating. The image device should be accessible. */
- data->swap = swap_type_of(swsusp_resume_device, 0);
+ data->swap = pin_hibernation_swap_type(swsusp_resume_device, 0);
data->mode = O_RDONLY;
data->free_bitmaps = false;
error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
@@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
data->free_bitmaps = !error;
}
}
- if (error)
+ if (error) {
+ unpin_hibernation_swap_type(data->swap);
hibernate_release();
+ }
data->frozen = false;
data->ready = false;
@@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
data = filp->private_data;
data->dev = 0;
free_all_swap_pages(data->swap);
+ unpin_hibernation_swap_type(data->swap);
if (data->frozen) {
pm_restore_gfp_mask();
free_basic_memory_bitmaps();
@@ -235,11 +238,17 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
offset = swap_area.offset;
}
+ /*
+ * Unpin the swap device if a swap area was already
+ * set by SNAPSHOT_SET_SWAP_AREA.
+ */
+ unpin_hibernation_swap_type(data->swap);
+
/*
* User space encodes device types as two-byte values,
* so we need to recode them
*/
- data->swap = swap_type_of(swdev, offset);
+ data->swap = pin_hibernation_swap_type(swdev, offset);
if (data->swap < 0)
return swdev ? -ENODEV : -EINVAL;
data->dev = swdev;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..ac1574acade7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -133,7 +133,7 @@ static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
/* May return NULL on invalid type, caller must check for NULL return */
static struct swap_info_struct *swap_type_to_info(int type)
{
- if (type >= MAX_SWAPFILES)
+ if (type < 0 || type >= MAX_SWAPFILES)
return NULL;
return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}
@@ -1972,22 +1972,15 @@ void swap_free_hibernation_slot(swp_entry_t entry)
put_swap_device(si);
}
-/*
- * Find the swap type that corresponds to given device (if any).
- *
- * @offset - number of the PAGE_SIZE-sized block of the device, starting
- * from 0, in which the swap header is expected to be located.
- *
- * This is needed for the suspend to disk (aka swsusp).
- */
-int swap_type_of(dev_t device, sector_t offset)
+static int __find_hibernation_swap_type(dev_t device, sector_t offset)
{
int type;
+ lockdep_assert_held(&swap_lock);
+
if (!device)
- return -1;
+ return -EINVAL;
- spin_lock(&swap_lock);
for (type = 0; type < nr_swapfiles; type++) {
struct swap_info_struct *sis = swap_info[type];
@@ -1997,16 +1990,118 @@ int swap_type_of(dev_t device, sector_t offset)
if (device == sis->bdev->bd_dev) {
struct swap_extent *se = first_se(sis);
- if (se->start_block == offset) {
- spin_unlock(&swap_lock);
+ if (se->start_block == offset)
return type;
- }
}
}
- spin_unlock(&swap_lock);
return -ENODEV;
}
+/**
+ * pin_hibernation_swap_type - Pin the swap device for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset identifying the swap area
+ *
+ * Locate the swap device for @device/@offset and mark it as pinned
+ * for hibernation. While pinned, swapoff() is prevented.
+ *
+ * Only one uswsusp context may pin a swap device at a time.
+ * If already pinned, this function returns -EBUSY.
+ *
+ * Return:
+ * >= 0 on success (swap type).
+ * -EINVAL if @device is invalid.
+ * -ENODEV if the swap device is not found.
+ * -EBUSY if the device is already pinned for hibernation.
+ */
+int pin_hibernation_swap_type(dev_t device, sector_t offset)
+{
+ int type;
+ struct swap_info_struct *si;
+
+ spin_lock(&swap_lock);
+
+ type = __find_hibernation_swap_type(device, offset);
+ if (type < 0) {
+ spin_unlock(&swap_lock);
+ return type;
+ }
+
+ si = swap_type_to_info(type);
+ if (WARN_ON_ONCE(!si)) {
+ spin_unlock(&swap_lock);
+ return -ENODEV;
+ }
+
+ /*
+ * hibernate_acquire() prevents concurrent hibernation sessions.
+ * This check additionally guards against double-pinning within
+ * the same session.
+ */
+ if (WARN_ON_ONCE(si->flags & SWP_HIBERNATION)) {
+ spin_unlock(&swap_lock);
+ return -EBUSY;
+ }
+
+ si->flags |= SWP_HIBERNATION;
+
+ spin_unlock(&swap_lock);
+ return type;
+}
+
+/**
+ * unpin_hibernation_swap_type - Unpin the swap device for hibernation
+ * @type: Swap type previously returned by pin_hibernation_swap_type()
+ *
+ * Clear the hibernation pin on the given swap device, allowing
+ * swapoff() to proceed normally.
+ *
+ * If @type does not refer to a valid swap device, this function
+ * does nothing.
+ */
+void unpin_hibernation_swap_type(int type)
+{
+ struct swap_info_struct *si;
+
+ spin_lock(&swap_lock);
+ si = swap_type_to_info(type);
+ if (!si) {
+ spin_unlock(&swap_lock);
+ return;
+ }
+ si->flags &= ~SWP_HIBERNATION;
+ spin_unlock(&swap_lock);
+}
+
+/**
+ * find_hibernation_swap_type - Find swap type for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset within the device identifying the swap area
+ *
+ * Locate the swap device corresponding to @device and @offset.
+ *
+ * Unlike pin_hibernation_swap_type(), this function only performs a
+ * lookup and does not mark the swap device as pinned for hibernation.
+ *
+ * This is safe in the sysfs-based hibernation path where user space
+ * is already frozen and swapoff() cannot run concurrently.
+ *
+ * Return:
+ * A non-negative swap type on success.
+ * -EINVAL if @device is invalid.
+ * -ENODEV if no matching swap device is found.
+ */
+int find_hibernation_swap_type(dev_t device, sector_t offset)
+{
+ int type;
+
+ spin_lock(&swap_lock);
+ type = __find_hibernation_swap_type(device, offset);
+ spin_unlock(&swap_lock);
+
+ return type;
+}
+
int find_first_swap(dev_t *device)
{
int type;
@@ -2803,6 +2898,14 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
spin_unlock(&swap_lock);
goto out_dput;
}
+
+ /* Refuse swapoff while the device is pinned for hibernation */
+ if (p->flags & SWP_HIBERNATION) {
+ err = -EBUSY;
+ spin_unlock(&swap_lock);
+ goto out_dput;
+ }
+
if (!security_vm_enough_memory_mm(current->mm, p->pages))
vm_unacct_memory(p->pages);
else {
--
2.34.1
* [PATCH v7 2/2] mm/swap: remove redundant swap device reference in alloc/free
2026-03-21 10:33 [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Youngjun Park
2026-03-21 10:33 ` [PATCH v7 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
@ 2026-03-21 10:33 ` Youngjun Park
2026-03-21 17:59 ` [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Andrew Morton
2 siblings, 0 replies; 7+ messages in thread
From: Youngjun Park @ 2026-03-21 10:33 UTC (permalink / raw)
To: rafael, akpm
Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
youngjun.park, usama.arif, linux-pm, linux-mm
In the previous commit, uswsusp was modified to pin the swap device
when the swap type is determined, ensuring the device remains valid
throughout the hibernation I/O path.
Therefore, it is no longer necessary to repeatedly get and put the swap
device reference for each swap slot allocation and free operation.
For hibernation via the sysfs interface, user-space tasks are frozen
before swap allocation begins, so swapoff cannot race with allocation.
After resume, tasks remain frozen while swap slots are freed, so
additional reference management is not required there either.
Remove the redundant swap device get/put operations from the
hibernation swap allocation and free paths.
Also remove the SWP_WRITEOK check before allocation, as the cluster
allocation logic already validates the swap device state.
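To make the saving concrete, here is a small userspace model comparing the two allocation shapes. It is a sketch under stated assumptions: struct model_dev and every model_* function are hypothetical stand-ins, and ref_ops merely counts how often a get or put would touch the device reference.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	bool live;			/* false once swapoff has started */
	unsigned long next_offset;	/* next free slot; 0 is reserved */
	unsigned long nr_slots;
	unsigned long ref_ops;		/* number of get/put operations */
};

static bool model_get(struct model_dev *d)
{
	d->ref_ops++;
	return d->live;
}

static void model_put(struct model_dev *d)
{
	d->ref_ops++;
}

/* Returns a slot offset, or 0 when the device is full. */
static unsigned long model_cluster_alloc(struct model_dev *d)
{
	if (d->next_offset >= d->nr_slots)
		return 0;
	return d->next_offset++;
}

/* Old shape: take and drop a reference around every single slot. */
static unsigned long model_alloc_with_ref(struct model_dev *d)
{
	unsigned long offset;

	if (!model_get(d))
		return 0;
	offset = model_cluster_alloc(d);
	model_put(d);
	return offset;
}

/*
 * New shape: the caller guarantees stability (session-wide pin, or
 * frozen user space), so the allocation itself needs no reference.
 */
static unsigned long model_alloc_stable(struct model_dev *d)
{
	assert(d->live);
	return model_cluster_alloc(d);
}
```

For an image of N pages the old shape performs 2N reference operations, while the new shape performs none in the allocation path; the cost moves to a single pin and unpin per uswsusp session.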
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
mm/swapfile.c | 43 ++++++++++++++++++++-----------------------
1 file changed, 20 insertions(+), 23 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ac1574acade7..dd9631658808 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1923,7 +1923,12 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
}
#ifdef CONFIG_HIBERNATION
-/* Allocate a slot for hibernation */
+/*
+ * Allocate a slot for hibernation.
+ *
+ * Note: The caller must ensure the swap device is stable, either by
+ * holding a reference or by freezing user-space before calling this.
+ */
swp_entry_t swap_alloc_hibernation_slot(int type)
{
struct swap_info_struct *si = swap_type_to_info(type);
@@ -1933,43 +1938,35 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
if (!si)
goto fail;
- /* This is called for allocating swap entry, not cache */
- if (get_swap_device_info(si)) {
- if (si->flags & SWP_WRITEOK) {
- /*
- * Grab the local lock to be compliant
- * with swap table allocation.
- */
- local_lock(&percpu_swap_cluster.lock);
- offset = cluster_alloc_swap_entry(si, NULL);
- local_unlock(&percpu_swap_cluster.lock);
- if (offset)
- entry = swp_entry(si->type, offset);
- }
- put_swap_device(si);
- }
+ /*
+ * Grab the local lock to be compliant
+ * with swap table allocation.
+ */
+ local_lock(&percpu_swap_cluster.lock);
+ offset = cluster_alloc_swap_entry(si, NULL);
+ local_unlock(&percpu_swap_cluster.lock);
+ if (offset)
+ entry = swp_entry(si->type, offset);
fail:
return entry;
}
-/* Free a slot allocated by swap_alloc_hibernation_slot */
+/*
+ * Free a slot allocated by swap_alloc_hibernation_slot.
+ * As with allocation, the caller must ensure the swap device is stable.
+ */
void swap_free_hibernation_slot(swp_entry_t entry)
{
- struct swap_info_struct *si;
+ struct swap_info_struct *si = __swap_entry_to_info(entry);
struct swap_cluster_info *ci;
pgoff_t offset = swp_offset(entry);
- si = get_swap_device(entry);
- if (WARN_ON(!si))
- return;
-
ci = swap_cluster_lock(si, offset);
swap_put_entry_locked(si, ci, offset);
swap_cluster_unlock(ci);
/* In theory readahead might add it to the swap cache by accident */
__try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
- put_swap_device(si);
}
static int __find_hibernation_swap_type(dev_t device, sector_t offset)
--
2.34.1
* Re: [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap
2026-03-21 10:33 [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Youngjun Park
2026-03-21 10:33 ` [PATCH v7 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
2026-03-21 10:33 ` [PATCH v7 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
@ 2026-03-21 17:59 ` Andrew Morton
2026-03-22 10:31 ` YoungJun Park
2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2026-03-21 17:59 UTC (permalink / raw)
To: Youngjun Park
Cc: rafael, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sat, 21 Mar 2026 19:33:07 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
> Apologies for the frequent revisions. Hopefully this version is close to final.
>
> Currently, in the uswsusp path, only the swap type value is retrieved at
> lookup time without holding a reference. If swapoff races after the type
> is acquired, subsequent slot allocations operate on a stale swap device.
>
> Additionally, grabbing and releasing the swap device reference on every
> slot allocation is inefficient across the entire hibernation swap path.
>
> This patch series addresses these issues:
> - Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device
> from the point it is looked up until the session completes.
> - Patch 2: Removes the overhead of per-slot reference counting in alloc/free
> paths and cleans up the redundant SWP_WRITEOK check.
>
> ...
>
> v6 -> v7:
> - Dropped Patch 3 (pm_restore_gfp_mask fix) from series as it has
> no dependency on Patches 1-2. Will be sent separately.
> (Rafael J. Wysocki feedback)
> - Andrew Morton's AI review
Well. Roman, Chris, Google and others. I'm just a messenger ;)
> findings applied only to Patch 3;
> Patches 1-2 are unchanged. (no problem on AI's review)
Seems that it changed its mind!
https://sashiko.dev/#/patchset/20260321103309.439265-1-youngjun.park@lge.com
* Re: [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap
2026-03-21 17:59 ` [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap Andrew Morton
@ 2026-03-22 10:31 ` YoungJun Park
2026-03-22 16:30 ` Andrew Morton
2026-03-23 19:56 ` Andrew Morton
0 siblings, 2 replies; 7+ messages in thread
From: YoungJun Park @ 2026-03-22 10:31 UTC (permalink / raw)
To: Andrew Morton
Cc: rafael, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sat, Mar 21, 2026 at 10:59:21AM -0700, Andrew Morton wrote:
> On Sat, 21 Mar 2026 19:33:07 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
>
> > Apologies for the frequent revisions. Hopefully this version is close to final.
> >
> > Currently, in the uswsusp path, only the swap type value is retrieved at
> > lookup time without holding a reference. If swapoff races after the type
> > is acquired, subsequent slot allocations operate on a stale swap device.
> >
> > Additionally, grabbing and releasing the swap device reference on every
> > slot allocation is inefficient across the entire hibernation swap path.
> >
> > This patch series addresses these issues:
> > - Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device
> > from the point it is looked up until the session completes.
> > - Patch 2: Removes the overhead of per-slot reference counting in alloc/free
> > paths and cleans up the redundant SWP_WRITEOK check.
> >
> > ...
> >
> > v6 -> v7:
> > - Dropped Patch 3 (pm_restore_gfp_mask fix) from series as it has
> > no dependency on Patches 1-2. Will be sent separately.
> > (Rafael J. Wysocki feedback)
> > - Andrew Morton's AI review
>
> Well. Roman, Chris, Google and others. I'm just a messenger ;)
> > findings applied only to Patch 3;
> > Patches 1-2 are unchanged. (no problem on AI's review)
>
> Seems that it changed its mind!
>
> https://sashiko.dev/#/patchset/20260321103309.439265-1-youngjun.park@lge.com
Thank you Andrew. It seems sashiko has learned not to go easy
on me twice. :) Will address it.
On the v7.0-rc4 base, add_swap_count_continuation() modifies
si->flags (SWP_CONTINUED) under si->cont_lock without holding
swap_lock, so the non-atomic RMW race is a real concern.
Possible fixes (based on v7.0-rc4):
1. Grab cont_lock on the SWP_HIBERNATION set path, or grab
swap_lock in add_swap_count_continuation(). This would
serialize the race, but adds lock contention on a path
that doesn't really need it.
2. Convert si->flags to atomic ops. This would be the correct
fix, but is quite extensive and better suited as a separate
effort.
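For readers following along, the lost update and the effect of option 2 can be shown with a deterministic userspace sketch. The DEMO_* names and both functions are hypothetical, the bit positions are only illustrative, and C11 atomics stand in for whatever atomic bit operations a kernel conversion would actually use.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_SWP_CONTINUED	(1UL << 5)
#define DEMO_SWP_HIBERNATION	(1UL << 13)

/*
 * Two writers under *different* locks do a plain read-modify-write on
 * the same word. The interleaving is spelled out step by step so the
 * result is deterministic.
 */
static unsigned long demo_plain_rmw(void)
{
	unsigned long flags = 0;
	unsigned long a = flags;	/* CPU A loads (holds swap_lock) */
	unsigned long b = flags;	/* CPU B loads (holds cont_lock) */

	a |= DEMO_SWP_HIBERNATION;
	b |= DEMO_SWP_CONTINUED;
	flags = a;			/* CPU A stores */
	flags = b;			/* CPU B stores, dropping A's bit */
	return flags;
}

/* The same two updates as atomic ORs: neither bit can be lost. */
static unsigned long demo_atomic_rmw(void)
{
	atomic_ulong flags;

	atomic_init(&flags, 0);
	atomic_fetch_or(&flags, DEMO_SWP_HIBERNATION);	/* CPU A */
	atomic_fetch_or(&flags, DEMO_SWP_CONTINUED);	/* CPU B */
	return atomic_load(&flags);
}
```

In the plain version the pin bit silently disappears, which in the real code would let swapoff proceed during an active uswsusp session. Option 1 closes the same window by making both writers take a common lock instead.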
However, on mm-new, Kairui's series [1] has removed
add_swap_count_continuation() and SWP_CONTINUED entirely, so
this race path no longer exists (verified by code inspection
and AI review on mm-new).
[1] https://lore.kernel.org/linux-mm/20260128-swap-table-p3-v2-9-fe0b67ef0215@tencent.com/
I based this series on v7.0-rc4 per Rafael's request since it
depends on PM-side changes. I'm not very familiar with how
cross-subsystem dependencies are typically coordinated -- if
rebasing onto mm-new is an option, the race goes away and the
PM-side changes could be picked up separately. Would that be
a reasonable approach? I'd appreciate any guidance on this.
On a side note -- the AI review is becoming genuinely useful.
It might be worth having it gate-check patches before they hit
the mailing list, rather than reviewing after the fact.
Best regards,
Youngjun Park
* Re: [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap
2026-03-22 10:31 ` YoungJun Park
@ 2026-03-22 16:30 ` Andrew Morton
2026-03-23 19:56 ` Andrew Morton
1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2026-03-22 16:30 UTC (permalink / raw)
To: YoungJun Park
Cc: rafael, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sun, 22 Mar 2026 19:31:01 +0900 YoungJun Park <youngjun.park@lge.com> wrote:
> 1. Grab cont_lock on the SWP_HIBERNATION set path, or grab
> swap_lock in add_swap_count_continuation(). This would
> serialize the race, but adds lock contention on a path
> that doesn't really need it.
>
> 2. Convert si->flags to atomic ops. This would be the correct
> fix, but is quite extensive and better suited as a separate
> effort.
>
> However, on mm-new, Kairui's series [1] has removed
> add_swap_count_continuation() and SWP_CONTINUED entirely, so
> this race path no longer exists (verified by code inspection
> and AI review on mm-new).
>
> [1] https://lore.kernel.org/linux-mm/20260128-swap-table-p3-v2-9-fe0b67ef0215@tencent.com/
>
> I based this series on v7.0-rc4 per Rafael's request since it
> depends on PM-side changes. I'm not very familiar with how
> cross-subsystem dependencies are typically coordinated -- if
> rebasing onto mm-new is an option, the race goes away and the
> PM-side changes could be picked up separately. Would that be
> a reasonable approach? I'd appreciate any guidance on this.
The kernel/power/ changes are small. How about we prepare all this
against mm-new and ask Rafael for an ack?
> On a side note -- the AI review is becoming genuinely useful.
> It might be worth having it gate-check patches before they hit
> the mailing list, rather than reviewing after the fact.
Totally agree. Problem is, these reviews are expensive and permitting
people to check their work before publishing could be terribly abused -
anyone in the world gets free patch review. I don't think Google wants
to pay for that!
Hopefully over time, as engineers see the value in this tool they will
persuade their employers to purchase the tokens they need for such
screening work.
* Re: [PATCH v7 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap
2026-03-22 10:31 ` YoungJun Park
2026-03-22 16:30 ` Andrew Morton
@ 2026-03-23 19:56 ` Andrew Morton
1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2026-03-23 19:56 UTC (permalink / raw)
To: YoungJun Park
Cc: rafael, chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
usama.arif, linux-pm, linux-mm
On Sun, 22 Mar 2026 19:31:01 +0900 YoungJun Park <youngjun.park@lge.com> wrote:
> I based this series on v7.0-rc4 per Rafael's request since it
> depends on PM-side changes. I'm not very familiar with how
> cross-subsystem dependencies are typically coordinated -- if
> rebasing onto mm-new is an option, the race goes away and the
> PM-side changes could be picked up separately. Would that be
> a reasonable approach? I'd appreciate any guidance on this.
hm, yes, 2/2 doesn't apply to linux-next.
The easiest approach would be to park this until the next cycle. How
serious is the bug?
Or Rafael can merge the whole thing, if we can tease some review out of
the swap maintainers (please).