* [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
@ 2026-03-23 16:08 Youngjun Park
2026-03-23 16:08 ` [PATCH v8 1/2] " Youngjun Park
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Youngjun Park @ 2026-03-23 16:08 UTC (permalink / raw)
To: Rafael J . Wysocki, Andrew Morton
Cc: Chris Li, Kairui Song, Pavel Machek, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Youngjun Park, Usama Arif, linux-pm,
linux-mm
Currently, in the uswsusp path, only the swap type value is retrieved at
lookup time without holding a reference. If swapoff races after the type
is acquired, subsequent slot allocations operate on a stale swap device.
Additionally, grabbing and releasing the swap device reference on every
slot allocation is inefficient across the entire hibernation swap path.
This patch series addresses these issues:
- Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device
from the point it is looked up until the session completes.
- Patch 2: Removes the overhead of per-slot reference counting in alloc/free
paths and cleans up the redundant SWP_WRITEOK check.
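As a rough illustration of the pinning semantics described for Patch 1, the following userspace sketch models an exclusive pin guarded by a lock. It is not the kernel code: the `model_*` names, the array size, and the pthread mutex standing in for swap_lock are assumptions made for this example. Only the SWP_HIBERNATION bit value and the -EBUSY results follow the series.

```c
#include <errno.h>
#include <pthread.h>

#define MODEL_MAX_SWAPFILES	4		/* hypothetical table size */
#define MODEL_SWP_HIBERNATION	(1u << 13)	/* same bit value as in Patch 1 */

/* Userspace stand-ins for swap_lock and the per-device si->flags. */
static pthread_mutex_t model_swap_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int model_flags[MODEL_MAX_SWAPFILES];

/* Pin a device: the first pin returns the type, a second one -EBUSY. */
int model_pin(int type)
{
	int ret = type;

	if (type < 0 || type >= MODEL_MAX_SWAPFILES)
		return -ENODEV;

	pthread_mutex_lock(&model_swap_lock);
	if (model_flags[type] & MODEL_SWP_HIBERNATION)
		ret = -EBUSY;
	else
		model_flags[type] |= MODEL_SWP_HIBERNATION;
	pthread_mutex_unlock(&model_swap_lock);

	return ret;
}

/* Clear the pin; an invalid type is silently ignored. */
void model_unpin(int type)
{
	if (type < 0 || type >= MODEL_MAX_SWAPFILES)
		return;

	pthread_mutex_lock(&model_swap_lock);
	model_flags[type] &= ~MODEL_SWP_HIBERNATION;
	pthread_mutex_unlock(&model_swap_lock);
}

/* Model of the swapoff check: refuse with -EBUSY while pinned. */
int model_swapoff(int type)
{
	int ret = 0;

	if (type < 0 || type >= MODEL_MAX_SWAPFILES)
		return -EINVAL;

	pthread_mutex_lock(&model_swap_lock);
	if (model_flags[type] & MODEL_SWP_HIBERNATION)
		ret = -EBUSY;
	pthread_mutex_unlock(&model_swap_lock);

	return ret;
}
```

In this model, model_pin(1) followed by model_swapoff(1) yields -EBUSY, and model_swapoff(1) returns 0 again once model_unpin(1) has run. A single flag rather than a counter is enough here because only one hibernation session can hold the pin at a time.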
Rebased onto mm-new per Andrew's suggestion [1]. The si->flags race
flagged by AI review in v7 (between SWP_HIBERNATION and cont_lock in
add_swap_count_continuation) and the proposed fixes discussed there
(atomic ops for si->flags, or serializing with swap_lock) are all moot
on mm-new since Kairui's series removed that code path entirely.
kernel/power/ changes are small, so Andrew proposed carrying everything
through mm-new.
Rafael, could you ack the PM-side changes?
Re-tested on mm-new (c51ea78c5466) with hibernate/resume cycles and
uswsusp paths. An additional round of AI review against the rebased
version found no new issues.
[1] https://lore.kernel.org/linux-mm/20260322093038.25a7fd51f5d564b85815db7a@linux-foundation.org/
Links:
RFC v1: https://lore.kernel.org/linux-mm/20260305202413.1888499-1-usama.arif@linux.dev/T/#m3693d45180f14f441b6951984f4b4bfd90ec0c9d
RFC v2: https://lore.kernel.org/linux-mm/20260306024608.1720991-1-youngjun.park@lge.com/
RFC v3: https://lore.kernel.org/linux-mm/20260312112511.3596781-1-youngjun.park@lge.com/
v4: https://lore.kernel.org/linux-mm/abv+rjgyArqZ2uym@yjaykim-PowerEdge-T330/T/#m924fa3e58d0f0da488300653163ee8db7e870e4a
v5: https://lore.kernel.org/linux-mm/ab0YEn+Fd41q6LM7@yjaykim-PowerEdge-T330/T/#m8409d470c68cb152b0849940759bff7d7806f397
v6: https://lore.kernel.org/linux-mm/20260320182227.896f9ab62d62961b2caab5f7@linux-foundation.org/T/#m10ee3346cd8dcd052749105d9a8e2052dbf3bc80
v7: https://lore.kernel.org/linux-mm/ab/20260321103309.439265-1-youngjun.park@lge.com/
Testing:
- Hibernate/resume via sysfs
(echo reboot > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate with suspend via sysfs
(echo suspend > /sys/power/disk && echo disk > /sys/power/state)
- Hibernate/resume via uswsusp (suspend-utils s2disk/resume on QEMU)
- Verified swap I/O works correctly after resume.
- Verified swapoff succeeds after snapshot resume completes.
- swapoff during active uswsusp session:
- Verified swapoff returns -EBUSY while swap device is pinned (Patch 1).
- Verified swapoff succeeds after uswsusp process terminates.
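The two swapoff checks in the last test item could be driven from a small C helper along these lines. This is a sketch, not the test program used for the series: `try_swapoff()` and `swapoff_outcome()` are hypothetical names, and reading -EBUSY as "pinned" assumes a kernel with Patch 1 applied.

```c
#include <errno.h>
#include <string.h>
#include <sys/swap.h>

/* Hypothetical helper: call swapoff(2), return 0 or a negative errno. */
int try_swapoff(const char *path)
{
	if (swapoff(path) == 0)
		return 0;
	return -errno;
}

/* Hypothetical helper: describe a try_swapoff() result for a test log. */
const char *swapoff_outcome(int ret)
{
	if (ret == 0)
		return "swapoff succeeded";
	if (ret == -EBUSY)
		return "refused: pinned for hibernation";
	return strerror(-ret);
}
```

A test would call try_swapoff() on the hibernation swap device once while the uswsusp process still holds /dev/snapshot open, and once more after that process has exited, comparing the two outcomes.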
Changelog:
v7 -> v8:
- Rebased onto mm-new per Andrew Morton's suggestion.
- Clarified function comments (SWP_HIBERNATION pinning).
- Re-tested and AI-reviewed on mm-new; no new issues found.
v6 -> v7:
- Dropped Patch 3 (pm_restore_gfp_mask fix) from series as it has
no dependency on Patches 1-2. Will be sent separately.
(Rafael J. Wysocki feedback)
v5 -> v6:
- Replaced get/put reference approach with SWP_HIBERNATION
pinning to prevent swapoff, per Kairui's feedback. Renamed helpers
from get/find/put_hibernation_swap_type() to
pin/find/unpin_hibernation_swap_type().
- Renamed swap_type_of() to __find_hibernation_swap_type() since
it is now an internal helper with no external callers.
- Removed swapoff waiting on hibernation reference.
swapoff now returns -EBUSY immediately when the swap device is
pinned.
- Updated function comments per Kairui's review.
v4 -> v5:
- Rebased onto v7.0-rc4 (Rafael J. Wysocki comment)
- No functional changes.
rfc v3 -> v4:
- Introduced get/find/put_hibernation_swap_type() helpers per Kairui's
feedback.
- Switched to swap_type_to_info() and added type < 0 check.
- Fixed get_hibernation_swap_type() return when ref == false.
- Made swapoff wait interruptible to prevent hang when uswsusp
holds a swap reference.
rfc v2 -> rfc v3:
- Split into 2 patches per Chris Li's feedback.
- Simplified by not holding reference in normal hibernation path.
- Removed redundant SWP_WRITEOK check.
rfc v1 -> rfc v2:
- Squashed into single patch per Usama Arif's feedback.
Youngjun Park (2):
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device
mm/swap: remove redundant swap device reference in alloc/free
include/linux/swap.h | 5 +-
kernel/power/swap.c | 2 +-
kernel/power/user.c | 15 +++-
mm/swapfile.c | 203 +++++++++++++++++++++++++++++++++----------
4 files changed, 172 insertions(+), 53 deletions(-)
base-commit: c51ea78c5466be89914cbfbe2618dea67026c2b1
--
2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v8 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-23 16:08 [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
@ 2026-03-23 16:08 ` Youngjun Park
2026-03-24 5:53 ` Kairui Song
2026-03-23 16:08 ` [PATCH v8 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
2026-03-23 22:48 ` [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Andrew Morton
2 siblings, 1 reply; 9+ messages in thread
From: Youngjun Park @ 2026-03-23 16:08 UTC (permalink / raw)
To: Rafael J . Wysocki, Andrew Morton
Cc: Chris Li, Kairui Song, Pavel Machek, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Youngjun Park, Usama Arif, linux-pm,
linux-mm

Hibernation via uswsusp (/dev/snapshot ioctls) has a race window:
after selecting the resume swap area but before user space is frozen,
swapoff may run and invalidate the selected swap device.

Fix this by pinning the swap device with SWP_HIBERNATION while it is
in use. The pin is exclusive, which is sufficient since
hibernate_acquire() already prevents concurrent hibernation sessions.

The kernel swsusp path (sysfs-based hibernate/resume) uses
find_hibernation_swap_type() which is not affected by the pin. It
freezes user space before touching swap, so swapoff cannot race.

Introduce dedicated helpers:
- pin_hibernation_swap_type(): Look up and pin the swap device.
  Used by the uswsusp path.
- find_hibernation_swap_type(): Lookup without pinning.
  Used by the kernel swsusp path.
- unpin_hibernation_swap_type(): Clear the hibernation pin.

While a swap device is pinned, swapoff is prevented from proceeding.

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
include/linux/swap.h | 5 +-
kernel/power/swap.c | 2 +-
kernel/power/user.c | 15 ++++-
mm/swapfile.c | 135 ++++++++++++++++++++++++++++++++++++++-----
4 files changed, 136 insertions(+), 21 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7a09df6977a5..1930f81e6be4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -213,6 +213,7 @@ enum {
 	SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
 	SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
 	SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
+	SWP_HIBERNATION = (1 << 13), /* pinned for hibernation */
 	/* add others here before... */
 };

@@ -433,7 +434,9 @@ static inline long get_nr_swap_pages(void)
 }

 extern void si_swapinfo(struct sysinfo *);
-int swap_type_of(dev_t device, sector_t offset);
+extern int pin_hibernation_swap_type(dev_t device, sector_t offset);
+extern void unpin_hibernation_swap_type(int type);
+extern int find_hibernation_swap_type(dev_t device, sector_t offset);
 int find_first_swap(dev_t *device);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t swapdev_block(int, pgoff_t);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 2e64869bb5a0..cc4764149e8f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -341,7 +341,7 @@ static int swsusp_swap_check(void)
 	 * This is called before saving the image.
 	 */
 	if (swsusp_resume_device)
-		res = swap_type_of(swsusp_resume_device, swsusp_resume_block);
+		res = find_hibernation_swap_type(swsusp_resume_device, swsusp_resume_block);
 	else
 		res = find_first_swap(&swsusp_resume_device);
 	if (res < 0)
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4401cfe26e5c..4406f5644a56 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 	memset(&data->handle, 0, sizeof(struct snapshot_handle));
 	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
 		/* Hibernating. The image device should be accessible. */
-		data->swap = swap_type_of(swsusp_resume_device, 0);
+		data->swap = pin_hibernation_swap_type(swsusp_resume_device, 0);
 		data->mode = O_RDONLY;
 		data->free_bitmaps = false;
 		error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
@@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 			data->free_bitmaps = !error;
 		}
 	}
-	if (error)
+	if (error) {
+		unpin_hibernation_swap_type(data->swap);
 		hibernate_release();
+	}

 	data->frozen = false;
 	data->ready = false;
@@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
 	data = filp->private_data;
 	data->dev = 0;
 	free_all_swap_pages(data->swap);
+	unpin_hibernation_swap_type(data->swap);
 	if (data->frozen) {
 		pm_restore_gfp_mask();
 		free_basic_memory_bitmaps();
@@ -235,11 +238,17 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
 		offset = swap_area.offset;
 	}

+	/*
+	 * Unpin the swap device if a swap area was already
+	 * set by SNAPSHOT_SET_SWAP_AREA.
+	 */
+	unpin_hibernation_swap_type(data->swap);
+
 	/*
 	 * User space encodes device types as two-byte values,
 	 * so we need to recode them
 	 */
-	data->swap = swap_type_of(swdev, offset);
+	data->swap = pin_hibernation_swap_type(swdev, offset);
 	if (data->swap < 0)
 		return swdev ? -ENODEV : -EINVAL;
 	data->dev = swdev;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 802332850e24..c5b459a18f43 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -133,7 +133,7 @@ static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
 /* May return NULL on invalid type, caller must check for NULL return */
 static struct swap_info_struct *swap_type_to_info(int type)
 {
-	if (type >= MAX_SWAPFILES)
+	if (type < 0 || type >= MAX_SWAPFILES)
 		return NULL;
 	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
 }
@@ -2138,22 +2138,15 @@ void swap_free_hibernation_slot(swp_entry_t entry)
 	put_swap_device(si);
 }

-/*
- * Find the swap type that corresponds to given device (if any).
- *
- * @offset - number of the PAGE_SIZE-sized block of the device, starting
- * from 0, in which the swap header is expected to be located.
- *
- * This is needed for the suspend to disk (aka swsusp).
- */
-int swap_type_of(dev_t device, sector_t offset)
+static int __find_hibernation_swap_type(dev_t device, sector_t offset)
 {
 	int type;

+	lockdep_assert_held(&swap_lock);
+
 	if (!device)
-		return -1;
+		return -EINVAL;

-	spin_lock(&swap_lock);
 	for (type = 0; type < nr_swapfiles; type++) {
 		struct swap_info_struct *sis = swap_info[type];

@@ -2163,16 +2156,118 @@ int swap_type_of(dev_t device, sector_t offset)
 		if (device == sis->bdev->bd_dev) {
 			struct swap_extent *se = first_se(sis);

-			if (se->start_block == offset) {
-				spin_unlock(&swap_lock);
+			if (se->start_block == offset)
 				return type;
-			}
 		}
 	}
-	spin_unlock(&swap_lock);
 	return -ENODEV;
 }

+/**
+ * pin_hibernation_swap_type - Pin the swap device for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset identifying the swap area
+ *
+ * Locate the swap device for @device/@offset and mark it as pinned
+ * for hibernation. While pinned, swapoff() is prevented.
+ *
+ * Only one uswsusp context may pin a swap device at a time.
+ * If already pinned, this function returns -EBUSY.
+ *
+ * Return:
+ * >= 0 on success (swap type).
+ * -EINVAL if @device is invalid.
+ * -ENODEV if the swap device is not found.
+ * -EBUSY if the device is already pinned for hibernation.
+ */
+int pin_hibernation_swap_type(dev_t device, sector_t offset)
+{
+	int type;
+	struct swap_info_struct *si;
+
+	spin_lock(&swap_lock);
+
+	type = __find_hibernation_swap_type(device, offset);
+	if (type < 0) {
+		spin_unlock(&swap_lock);
+		return type;
+	}
+
+	si = swap_type_to_info(type);
+	if (WARN_ON_ONCE(!si)) {
+		spin_unlock(&swap_lock);
+		return -ENODEV;
+	}
+
+	/*
+	 * hibernate_acquire() prevents concurrent hibernation sessions.
+	 * This check additionally guards against double-pinning within
+	 * the same session.
+	 */
+	if (WARN_ON_ONCE(si->flags & SWP_HIBERNATION)) {
+		spin_unlock(&swap_lock);
+		return -EBUSY;
+	}
+
+	si->flags |= SWP_HIBERNATION;
+
+	spin_unlock(&swap_lock);
+	return type;
+}
+
+/**
+ * unpin_hibernation_swap_type - Unpin the swap device for hibernation
+ * @type: Swap type previously returned by pin_hibernation_swap_type()
+ *
+ * Clear the hibernation pin on the given swap device, allowing
+ * swapoff() to proceed normally.
+ *
+ * If @type does not refer to a valid swap device, this function
+ * does nothing.
+ */
+void unpin_hibernation_swap_type(int type)
+{
+	struct swap_info_struct *si;
+
+	spin_lock(&swap_lock);
+	si = swap_type_to_info(type);
+	if (!si) {
+		spin_unlock(&swap_lock);
+		return;
+	}
+	si->flags &= ~SWP_HIBERNATION;
+	spin_unlock(&swap_lock);
+}
+
+/**
+ * find_hibernation_swap_type - Find swap type for hibernation
+ * @device: Block device containing the resume image
+ * @offset: Offset within the device identifying the swap area
+ *
+ * Locate the swap device corresponding to @device and @offset.
+ *
+ * Unlike pin_hibernation_swap_type(), this function only performs a
+ * lookup and does not mark the swap device as pinned for hibernation.
+ *
+ * This is safe in the sysfs-based hibernation path where user space
+ * is already frozen and swapoff() cannot run concurrently.
+ *
+ * Return:
+ * A non-negative swap type on success.
+ * -EINVAL if @device is invalid.
+ * -ENODEV if no matching swap device is found.
+ */
+int find_hibernation_swap_type(dev_t device, sector_t offset)
+{
+	int type;
+
+	spin_lock(&swap_lock);
+	type = __find_hibernation_swap_type(device, offset);
+	spin_unlock(&swap_lock);
+
+	return type;
+}
+
 int find_first_swap(dev_t *device)
 {
 	int type;
@@ -2936,6 +3031,14 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
+
+	/* Refuse swapoff while the device is pinned for hibernation */
+	if (p->flags & SWP_HIBERNATION) {
+		err = -EBUSY;
+		spin_unlock(&swap_lock);
+		goto out_dput;
+	}
+
 	if (!security_vm_enough_memory_mm(current->mm, p->pages))
 		vm_unacct_memory(p->pages);
 	else {
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-23 16:08 ` [PATCH v8 1/2] " Youngjun Park
@ 2026-03-24 5:53 ` Kairui Song
2026-03-24 12:48 ` YoungJun Park
0 siblings, 1 reply; 9+ messages in thread
From: Kairui Song @ 2026-03-24 5:53 UTC (permalink / raw)
To: Youngjun Park
Cc: Rafael J . Wysocki, Andrew Morton, Chris Li, Kairui Song,
Pavel Machek, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Usama Arif, linux-pm, linux-mm

On Tue, Mar 24, 2026 at 01:08:21AM +0800, Youngjun Park wrote:
> Hibernation via uswsusp (/dev/snapshot ioctls) has a race window:
> after selecting the resume swap area but before user space is frozen,
> swapoff may run and invalidate the selected swap device.
>
> Fix this by pinning the swap device with SWP_HIBERNATION while it is
> in use. The pin is exclusive, which is sufficient since
> hibernate_acquire() already prevents concurrent hibernation sessions.
>
> The kernel swsusp path (sysfs-based hibernate/resume) uses
> find_hibernation_swap_type() which is not affected by the pin. It
> freezes user space before touching swap, so swapoff cannot race.
>
> Introduce dedicated helpers:
> - pin_hibernation_swap_type(): Look up and pin the swap device.
>   Used by the uswsusp path.
> - find_hibernation_swap_type(): Lookup without pinning.
>   Used by the kernel swsusp path.
> - unpin_hibernation_swap_type(): Clear the hibernation pin.

Looks good to me, thanks!

Reviewed-by: Kairui Song <kasong@tencent.com>

Just one trivial nitpick below.

> +/**
> + * unpin_hibernation_swap_type - Unpin the swap device for hibernation
> + * @type: Swap type previously returned by pin_hibernation_swap_type()
> + *
> + * Clear the hibernation pin on the given swap device, allowing
> + * swapoff() to proceed normally.
> + *
> + * If @type does not refer to a valid swap device, this function
> + * does nothing.
> + */
> +void unpin_hibernation_swap_type(int type)
> +{
> +	struct swap_info_struct *si;
> +
> +	spin_lock(&swap_lock);
> +	si = swap_type_to_info(type);
> +	if (!si) {
> +		spin_unlock(&swap_lock);
> +		return;
> +	}
> +	si->flags &= ~SWP_HIBERNATION;

Would the code be simpler if you just:

	if (si)
		si->flags &= ~SWP_HIBERNATION;

Just personal taste, feel free to ignore.

And as you mentioned this is on top of swap table p3 so you based
it on mm-new - but isn't p3 already in mm-unstable? Maybe we can
have it there? Not sure how many conflicts there are with PM.

The code and design look OK.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-24 5:53 ` Kairui Song
@ 2026-03-24 12:48 ` YoungJun Park
0 siblings, 0 replies; 9+ messages in thread
From: YoungJun Park @ 2026-03-24 12:48 UTC (permalink / raw)
To: Kairui Song
Cc: Rafael J . Wysocki, Andrew Morton, Chris Li, Kairui Song,
Pavel Machek, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Usama Arif, linux-pm, linux-mm

On Tue, Mar 24, 2026 at 01:53:33PM +0800, Kairui Song wrote:
> On Tue, Mar 24, 2026 at 01:08:21AM +0800, Youngjun Park wrote:
> > Hibernation via uswsusp (/dev/snapshot ioctls) has a race window:
> > after selecting the resume swap area but before user space is frozen,
> > swapoff may run and invalidate the selected swap device.
> >
> > Fix this by pinning the swap device with SWP_HIBERNATION while it is
> > in use. The pin is exclusive, which is sufficient since
> > hibernate_acquire() already prevents concurrent hibernation sessions.
> >
> > The kernel swsusp path (sysfs-based hibernate/resume) uses
> > find_hibernation_swap_type() which is not affected by the pin. It
> > freezes user space before touching swap, so swapoff cannot race.
> >
> > Introduce dedicated helpers:
> > - pin_hibernation_swap_type(): Look up and pin the swap device.
> >   Used by the uswsusp path.
> > - find_hibernation_swap_type(): Lookup without pinning.
> >   Used by the kernel swsusp path.
> > - unpin_hibernation_swap_type(): Clear the hibernation pin.
>
> Looks good to me, thanks!
>
> Reviewed-by: Kairui Song <kasong@tencent.com>

Thanks for the review, Kairui, and for all your feedback throughout the
revisions!

> Just one trivial nitpick below.
> > +/**
> > + * unpin_hibernation_swap_type - Unpin the swap device for hibernation
> > + * @type: Swap type previously returned by pin_hibernation_swap_type()
> > + *
> > + * Clear the hibernation pin on the given swap device, allowing
> > + * swapoff() to proceed normally.
> > + *
> > + * If @type does not refer to a valid swap device, this function
> > + * does nothing.
> > + */
> > +void unpin_hibernation_swap_type(int type)
> > +{
> > +	struct swap_info_struct *si;
> > +
> > +	spin_lock(&swap_lock);
> > +	si = swap_type_to_info(type);
> > +	if (!si) {
> > +		spin_unlock(&swap_lock);
> > +		return;
> > +	}
> > +	si->flags &= ~SWP_HIBERNATION;
>
> Would the code be simpler if you just:
>
> 	if (si)
> 		si->flags &= ~SWP_HIBERNATION;
>
> Just personal taste, feel free to ignore.

Noted on the style preference. I'll keep it in mind. :D

> And as you mentioned this is on top of swap table p3 so you based
> it on mm-new - but isn't p3 already in mm-unstable? Maybe we can
> have it there? Not sure how many conflicts there are with PM.
>
> The code and design look OK.

Regarding the base branch: Andrew is already aware of the potential
conflicts in linux-next, and we've discussed possibly parking this for
the next cycle depending on Rafael's input. So I think we can keep it
as-is for now and see how things go!

Best regards,
Youngjun Park
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v8 2/2] mm/swap: remove redundant swap device reference in alloc/free
2026-03-23 16:08 [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
2026-03-23 16:08 ` [PATCH v8 1/2] " Youngjun Park
@ 2026-03-23 16:08 ` Youngjun Park
2026-03-24 6:49 ` Kairui Song
2026-03-23 22:48 ` [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Andrew Morton
2 siblings, 1 reply; 9+ messages in thread
From: Youngjun Park @ 2026-03-23 16:08 UTC (permalink / raw)
To: Rafael J . Wysocki, Andrew Morton
Cc: Chris Li, Kairui Song, Pavel Machek, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Youngjun Park, Usama Arif, linux-pm,
linux-mm

In the previous commit, uswsusp was modified to pin the swap device
when the swap type is determined, ensuring the device remains valid
throughout the hibernation I/O path.

Therefore, it is no longer necessary to repeatedly get and put the swap
device reference for each swap slot allocation and free operation.

For hibernation via the sysfs interface, user-space tasks are frozen
before swap allocation begins, so swapoff cannot race with allocation.
After resume, tasks remain frozen while swap slots are freed, so
additional reference management is not required there either.

Remove the redundant swap device get/put operations from the
hibernation swap allocation and free paths.

Also remove the SWP_WRITEOK check before allocation, as the cluster
allocation logic already validates the swap device state.

Update function comments to document the caller's responsibility for
ensuring swap device stability.

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
mm/swapfile.c | 68 +++++++++++++++++++++++++++------------------------
1 file changed, 36 insertions(+), 32 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index c5b459a18f43..ff315b752afd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2077,7 +2077,16 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 }

 #ifdef CONFIG_HIBERNATION
-/* Allocate a slot for hibernation */
+/**
+ * swap_alloc_hibernation_slot() - Allocate a swap slot for hibernation.
+ * @type: swap device type index to allocate from.
+ *
+ * The caller must ensure the swap device is stable, either by pinning
+ * it (SWP_HIBERNATION) or by freezing user-space.
+ *
+ * Return: a valid swp_entry_t on success, or an empty entry (val == 0)
+ * on failure.
+ */
 swp_entry_t swap_alloc_hibernation_slot(int type)
 {
 	struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
@@ -2088,46 +2097,42 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 	if (!si)
 		goto fail;

-	/* This is called for allocating swap entry, not cache */
-	if (get_swap_device_info(si)) {
-		if (si->flags & SWP_WRITEOK) {
-			/*
-			 * Try the local cluster first if it matches the device. If
-			 * not, try grab a new cluster and override local cluster.
-			 */
-			local_lock(&percpu_swap_cluster.lock);
-			pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
-			pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
-			if (pcp_si == si && pcp_offset) {
-				ci = swap_cluster_lock(si, pcp_offset);
-				if (cluster_is_usable(ci, 0))
-					offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
-				else
-					swap_cluster_unlock(ci);
-			}
-			if (!offset)
-				offset = cluster_alloc_swap_entry(si, NULL);
-			local_unlock(&percpu_swap_cluster.lock);
-			if (offset)
-				entry = swp_entry(si->type, offset);
-		}
-		put_swap_device(si);
+	/*
+	 * Try the local cluster first if it matches the device. If
+	 * not, try grab a new cluster and override local cluster.
+	 */
+	local_lock(&percpu_swap_cluster.lock);
+	pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+	pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+	if (pcp_si == si && pcp_offset) {
+		ci = swap_cluster_lock(si, pcp_offset);
+		if (cluster_is_usable(ci, 0))
+			offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+		else
+			swap_cluster_unlock(ci);
 	}
+	if (!offset)
+		offset = cluster_alloc_swap_entry(si, NULL);
+	local_unlock(&percpu_swap_cluster.lock);
+	if (offset)
+		entry = swp_entry(si->type, offset);
+
 fail:
 	return entry;
 }

-/* Free a slot allocated by swap_alloc_hibernation_slot */
+/**
+ * swap_free_hibernation_slot() - Free a swap slot allocated for hibernation.
+ * @entry: swap entry to free.
+ *
+ * The caller must ensure the swap device is stable.
+ */
 void swap_free_hibernation_slot(swp_entry_t entry)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = __swap_entry_to_info(entry);
 	struct swap_cluster_info *ci;
 	pgoff_t offset = swp_offset(entry);

-	si = get_swap_device(entry);
-	if (WARN_ON(!si))
-		return;
-
 	ci = swap_cluster_lock(si, offset);
 	__swap_cluster_put_entry(ci, offset % SWAPFILE_CLUSTER);
 	__swap_cluster_free_entries(si, ci, offset % SWAPFILE_CLUSTER, 1);
@@ -2135,7 +2140,6 @@ void swap_free_hibernation_slot(swp_entry_t entry)

 	/* In theory readahead might add it to the swap cache by accident */
 	__try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
-	put_swap_device(si);
 }

 static int __find_hibernation_swap_type(dev_t device, sector_t offset)
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v8 2/2] mm/swap: remove redundant swap device reference in alloc/free
2026-03-23 16:08 ` [PATCH v8 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
@ 2026-03-24 6:49 ` Kairui Song
0 siblings, 0 replies; 9+ messages in thread
From: Kairui Song @ 2026-03-24 6:49 UTC (permalink / raw)
To: Youngjun Park
Cc: Rafael J . Wysocki, Andrew Morton, Chris Li, Kairui Song,
Pavel Machek, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Usama Arif, linux-pm, linux-mm

On Tue, Mar 24, 2026 at 01:08:22AM +0800, Youngjun Park wrote:
> In the previous commit, uswsusp was modified to pin the swap device
> when the swap type is determined, ensuring the device remains valid
> throughout the hibernation I/O path.
>
> Therefore, it is no longer necessary to repeatedly get and put the swap
> device reference for each swap slot allocation and free operation.
>
> For hibernation via the sysfs interface, user-space tasks are frozen
> before swap allocation begins, so swapoff cannot race with allocation.
> After resume, tasks remain frozen while swap slots are freed, so
> additional reference management is not required there either.
>
> Remove the redundant swap device get/put operations from the
> hibernation swap allocation and free paths.
>
> Also remove the SWP_WRITEOK check before allocation, as the cluster
> allocation logic already validates the swap device state.
>
> Update function comments to document the caller's responsibility for
> ensuring swap device stability.
>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
> mm/swapfile.c | 68 +++++++++++++++++++++++++++------------------------
> 1 file changed, 36 insertions(+), 32 deletions(-)

Thanks!

Reviewed-by: Kairui Song <kasong@tencent.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-23 16:08 [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Youngjun Park
2026-03-23 16:08 ` [PATCH v8 1/2] " Youngjun Park
2026-03-23 16:08 ` [PATCH v8 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
@ 2026-03-23 22:48 ` Andrew Morton
2026-03-24 2:51 ` YoungJun Park
2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2026-03-23 22:48 UTC (permalink / raw)
To: Youngjun Park
Cc: Rafael J . Wysocki, Chris Li, Kairui Song, Pavel Machek,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Usama Arif,
linux-pm, linux-mm

On Tue, 24 Mar 2026 01:08:20 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
> Rebased onto mm-new per Andrew's suggestion [1]. The si->flags race
> flagged by AI review in v7 (between SWP_HIBERNATION and cont_lock in
> add_swap_count_continuation) and the proposed fixes discussed there
> (atomic ops for si->flags, or serializing with swap_lock) are all moot
> on mm-new since Kairui's series removed that code path entirely.
> kernel/power/ changes are small, so Andrew proposed carrying everything
> through mm-new.
>
> Rafael, could you ack the PM-side changes?

Please.

We'll hit a conflict in linux-next and Mark will tell us and we can
flag that to Linus when merging into mainline, usual stuff.

Or we can park this until the next cycle, depends on how serious the
bug is. How serious is the bug?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-23 22:48 ` [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Andrew Morton
@ 2026-03-24 2:51 ` YoungJun Park
2026-03-24 3:03 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: YoungJun Park @ 2026-03-24 2:51 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J . Wysocki, Chris Li, Kairui Song, Pavel Machek,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Usama Arif,
linux-pm, linux-mm

On Mon, Mar 23, 2026 at 03:48:33PM -0700, Andrew Morton wrote:
> On Tue, 24 Mar 2026 01:08:20 +0900 Youngjun Park <youngjun.park@lge.com> wrote:
>
> > Rebased onto mm-new per Andrew's suggestion [1]. The si->flags race
> > flagged by AI review in v7 (between SWP_HIBERNATION and cont_lock in
> > add_swap_count_continuation) and the proposed fixes discussed there
> > (atomic ops for si->flags, or serializing with swap_lock) are all moot
> > on mm-new since Kairui's series removed that code path entirely.
> > kernel/power/ changes are small, so Andrew proposed carrying everything
> > through mm-new.
> >
> > Rafael, could you ack the PM-side changes?
>
> Please.
>
> We'll hit a conflict in linux-next and Mark will tell us and we can
> flag that to Linus when merging into mainline, usual stuff.
>
> Or we can park this until the next cycle, depends on how serious the
> bug is. How serious is the bug?

Hi Andrew,

In my opinion, this bug is unlikely to occur and does not appear to be
serious. It may be better to park this for the next cycle.

I verified the behavior using suspend-utils. Below is a more detailed
recap of the reproduction scenarios I mentioned earlier.

Pre-condition
- swapon /dev/sdb (intended for hibernation)

Case 1

Process 1 (test program)           Process 2
---------------------------        ----------------
ioctl(SNAPSHOT_SET_SWAP_AREA)
                                   swapoff /dev/sdb
ioctl(SNAPSHOT_ALLOC_SWAP_PAGE)

- SNAPSHOT_ALLOC_SWAP_PAGE fails with -ENOSPC.
- The race window where swapoff /dev/sdb can occur is extremely small,
  and such an intentional sequence is unlikely in practice.
- If SNAPSHOT_ALLOC_SWAP_PAGE succeeds, swapoff does not occur.

Case 2

Process 1 (test program)           Process 2
---------------------------        ----------------
ioctl(SNAPSHOT_SET_SWAP_AREA)
                                   swapoff /dev/sdb
                                   swapon /dev/sdc
freeze processes
ioctl(SNAPSHOT_ALLOC_SWAP_PAGE)
create snapshot image

- In testing, snapshot boot from /dev/sdb succeeds.
- The swap block offset may be taken from an unexpected device, but
  I/O to /dev/sdb itself succeeds.
- Since the actual allocated swap offset is not used for writing the
  snapshot image, there is a theoretical risk of corruption if I/O
  occurs at that offset on /dev/sdb during the window. However,
  processes are frozen before the snapshot image is written to
  /dev/sdb.

Therefore, while the issue is theoretically possible, the probability
of it occurring in practice appears extremely low.

Best regards,
Youngjun Park
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 0/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
2026-03-24 2:51 ` YoungJun Park
@ 2026-03-24 3:03 ` Andrew Morton
0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2026-03-24 3:03 UTC (permalink / raw)
To: YoungJun Park
Cc: Rafael J . Wysocki, Chris Li, Kairui Song, Pavel Machek,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Usama Arif,
linux-pm, linux-mm

On Tue, 24 Mar 2026 11:51:12 +0900 YoungJun Park <youngjun.park@lge.com> wrote:
> > We'll hit a conflict in linux-next and Mark will tell us and we can
> > flag that to Linus when merging into mainline, usual stuff.
> >
> > Or we can park this until the next cycle, depends on how serious the
> > bug is. How serious is the bug?
>
> Hi Andrew,
>
> In my opinion, this bug is unlikely to occur and does not appear to be serious.
> It may be better to park this for the next cycle.

Great, thanks. I'll await Rafael's input and let's see how much damage
Mark encounters.
^ permalink raw reply [flat|nested] 9+ messages in thread