public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [RESEND RFC PATCH v3 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap reference
@ 2026-03-12 11:25 Youngjun Park
  2026-03-12 11:25 ` [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting " Youngjun Park
  2026-03-12 11:25 ` [RESEND RFC PATCH v3 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
  0 siblings, 2 replies; 5+ messages in thread
From: Youngjun Park @ 2026-03-12 11:25 UTC (permalink / raw)
  To: rafael, akpm
  Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
	youngjun.park, usama.arif, linux-pm, linux-mm

Currently, in the uswsusp path, only the swap type value is retrieved at         
lookup time without holding a reference. If swapoff races after the type         
is acquired, subsequent slot allocations operate on a stale swap device.         

Additionally, grabbing and releasing the swap device reference on every          
slot allocation is inefficient across the entire hibernation swap path.          

This patch series addresses these issues:                                        
- Patch 1: Fixes the swapoff race in uswsusp by holding the swap device          
  reference from the point the swap device is looked up.                         
- Patch 2: Removes the overhead of per-slot reference counting in alloc/free     
  paths and cleans up the redundant SWP_WRITEOK check.

While this approach seems reasonable and has passed basic testing, 
I am not entirely familiar with the snapshot codebase. 
Therefore, I am submitting this as an RFC to gather your feedback. 
I plan to update it to a formal patch series based on the comments received.
                                                                                
Changelog:                                                                       
rfc v2 -> rfc v3:
 - Split the single patch into 2 patches (Patch 1: race issue, Patch 2:
   reference optimization) per Chris Li's feedback.          
 - Addressed Chris Li's concern regarding the complexity of releasing
   references on error paths in normal hibernation. Since normal hibernation
   freezes processes (making swapoff impossible), it no longer holds the     
   reference at all. Instead, uswsusp explicitly acquires the reference,        
   simplifying the overall fix.         
 - Removed the SWP_WRITEOK check before allocation, as it is already
   handled during cluster allocation.             
 - Rebased the patch series onto f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b.
 - Change the RFC title and modify commit.        

rfc v1 -> rfc v2:
 - Squashed previous commits into a single patch per Usama Arif's feedback
   regarding git bisectability.       

Links:                                                                           
RFC v1: https://lore.kernel.org/linux-mm/20260305202413.1888499-1-usama.arif@linux.dev/T/#m3693d45180f14f441b6951984f4b4bfd90ec0c9d
RFC v2: https://lore.kernel.org/linux-mm/20260306024608.1720991-1-youngjun.park@lge.com/

Thanks,                                                                          
Youngjun Park

Youngjun Park (2):
  mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap
    reference
  mm/swap: remove redundant swap device reference in alloc/free

 include/linux/swap.h |  3 +-
 kernel/power/swap.c  |  2 +-
 kernel/power/user.c  | 11 +++++--
 mm/swapfile.c        | 78 +++++++++++++++++++++++---------------------
 4 files changed, 52 insertions(+), 42 deletions(-)

-- 
base-commit: f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b (mm-new)

2.34.1



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference
  2026-03-12 11:25 [RESEND RFC PATCH v3 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap reference Youngjun Park
@ 2026-03-12 11:25 ` Youngjun Park
  2026-03-13 13:20   ` Kairui Song
  2026-03-12 11:25 ` [RESEND RFC PATCH v3 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
  1 sibling, 1 reply; 5+ messages in thread
From: Youngjun Park @ 2026-03-12 11:25 UTC (permalink / raw)
  To: rafael, akpm
  Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
	youngjun.park, usama.arif, linux-pm, linux-mm

Hibernation can be triggered either via the sysfs interface or via the
uswsusp utility using /dev/snapshot ioctls.

In the case of uswsusp, the resume device is configured either by the
boot parameter during snapshot_open() or via the SNAPSHOT_SET_SWAP_AREA
ioctl. However, a race condition exists between setting this swap area
and actually allocating a swap slot via the SNAPSHOT_ALLOC_SWAP_PAGE
ioctl. For instance, if swapoff is executed and a different swap device
is enabled during this window, an incorrect swap slot might be allocated.

Hibernation via the sysfs interface does not suffer from this race
condition because user-space processes are frozen before proceeding,
making it impossible to execute swapoff.

To resolve this race in uswsusp, modify swap_type_of() to properly
acquire a reference to the swap device using get_swap_device().

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 include/linux/swap.h |  3 ++-
 kernel/power/swap.c  |  2 +-
 kernel/power/user.c  | 11 ++++++++---
 mm/swapfile.c        | 28 +++++++++++++++++++++-------
 4 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7a09df6977a5..ecf19a581fc7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -433,8 +433,9 @@ static inline long get_nr_swap_pages(void)
 }
 
 extern void si_swapinfo(struct sysinfo *);
-int swap_type_of(dev_t device, sector_t offset);
+int swap_type_of(dev_t device, sector_t offset, bool ref);
 int find_first_swap(dev_t *device);
+void put_swap_device_by_type(int type);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int __swap_count(swp_entry_t entry);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 2e64869bb5a0..3a477914a7c4 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -341,7 +341,7 @@ static int swsusp_swap_check(void)
 	 * This is called before saving the image.
 	 */
 	if (swsusp_resume_device)
-		res = swap_type_of(swsusp_resume_device, swsusp_resume_block);
+		res = swap_type_of(swsusp_resume_device, swsusp_resume_block, false);
 	else
 		res = find_first_swap(&swsusp_resume_device);
 	if (res < 0)
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4401cfe26e5c..7ade4d0aa846 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 	memset(&data->handle, 0, sizeof(struct snapshot_handle));
 	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
 		/* Hibernating.  The image device should be accessible. */
-		data->swap = swap_type_of(swsusp_resume_device, 0);
+		data->swap = swap_type_of(swsusp_resume_device, 0, true);
 		data->mode = O_RDONLY;
 		data->free_bitmaps = false;
 		error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
@@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 			data->free_bitmaps = !error;
 		}
 	}
-	if (error)
+	if (error) {
+		put_swap_device_by_type(data->swap);
 		hibernate_release();
+	}
 
 	data->frozen = false;
 	data->ready = false;
@@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
 	data = filp->private_data;
 	data->dev = 0;
 	free_all_swap_pages(data->swap);
+	put_swap_device_by_type(data->swap);
 	if (data->frozen) {
 		pm_restore_gfp_mask();
 		free_basic_memory_bitmaps();
@@ -235,11 +238,13 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
 		offset = swap_area.offset;
 	}
 
+	put_swap_device_by_type(data->swap);
+
 	/*
 	 * User space encodes device types as two-byte values,
 	 * so we need to recode them
 	 */
-	data->swap = swap_type_of(swdev, offset);
+	data->swap = swap_type_of(swdev, offset, true);
 	if (data->swap < 0)
 		return swdev ? -ENODEV : -EINVAL;
 	data->dev = swdev;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d864866a35ea..5a3d5c1e1f81 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2149,7 +2149,7 @@ void swap_free_hibernation_slot(swp_entry_t entry)
  *
  * This is needed for the suspend to disk (aka swsusp).
  */
-int swap_type_of(dev_t device, sector_t offset)
+int swap_type_of(dev_t device, sector_t offset, bool ref)
 {
 	int type;
 
@@ -2163,13 +2163,16 @@ int swap_type_of(dev_t device, sector_t offset)
 		if (!(sis->flags & SWP_WRITEOK))
 			continue;
 
-		if (device == sis->bdev->bd_dev) {
-			struct swap_extent *se = first_se(sis);
+		if (device != sis->bdev->bd_dev)
+			continue;
 
-			if (se->start_block == offset) {
-				spin_unlock(&swap_lock);
-				return type;
-			}
+		struct swap_extent *se = first_se(sis);
+		if (se->start_block != offset)
+			continue;
+
+		if (ref && get_swap_device_info(sis)) {
+			spin_unlock(&swap_lock);
+			return type;
 		}
 	}
 	spin_unlock(&swap_lock);
@@ -2194,6 +2197,17 @@ int find_first_swap(dev_t *device)
 	return -ENODEV;
 }
 
+void put_swap_device_by_type(int type)
+{
+	struct swap_info_struct *sis;
+
+	if (type < 0 || type >= MAX_SWAPFILES)
+		return;
+
+	sis = swap_info[type];
+	put_swap_device(sis);
+}
+
 /*
  * Get the (PAGE_SIZE) block corresponding to given offset on the swapdev
  * corresponding to given index in swap_info (swap type).
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RESEND RFC PATCH v3 2/2] mm/swap: remove redundant swap device reference in alloc/free
  2026-03-12 11:25 [RESEND RFC PATCH v3 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap reference Youngjun Park
  2026-03-12 11:25 ` [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting " Youngjun Park
@ 2026-03-12 11:25 ` Youngjun Park
  1 sibling, 0 replies; 5+ messages in thread
From: Youngjun Park @ 2026-03-12 11:25 UTC (permalink / raw)
  To: rafael, akpm
  Cc: chrisl, kasong, pavel, shikemeng, nphamcs, bhe, baohua,
	youngjun.park, usama.arif, linux-pm, linux-mm

In the previous commit, uswsusp was modified to acquire the swap device
reference at the time of determining the swap type. As a result, it is
no longer necessary to repeatedly acquire and release the reference to
protect against swapoff every time a swap slot is allocated.

For hibernation via the sysfs interface, user-space processes are already
frozen, making swapoff inherently impossible. Thus, acquiring and
releasing the reference during allocation is unnecessary. Furthermore,
even after returning from suspend, processes are not yet thawed when
swap slots are freed, meaning reference management is not required at
that stage either.

Therefore, remove the redundant swap device reference acquire and release
operations from the hibernation swap allocation and free functions.

Additionally, remove the SWP_WRITEOK check before allocation. This check
is redundant because the cluster allocation logic already handles it.

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 50 ++++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5a3d5c1e1f81..8467c3511870 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2091,31 +2091,26 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 	if (!si)
 		goto fail;
 
-	/* This is called for allocating swap entry, not cache */
-	if (get_swap_device_info(si)) {
-		if (si->flags & SWP_WRITEOK) {
-			/*
-			 * Try the local cluster first if it matches the device. If
-			 * not, try grab a new cluster and override local cluster.
-			 */
-			local_lock(&percpu_swap_cluster.lock);
-			pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
-			pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
-			if (pcp_si == si && pcp_offset) {
-				ci = swap_cluster_lock(si, pcp_offset);
-				if (cluster_is_usable(ci, 0))
-					offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
-				else
-					swap_cluster_unlock(ci);
-			}
-			if (!offset)
-				offset = cluster_alloc_swap_entry(si, NULL);
-			local_unlock(&percpu_swap_cluster.lock);
-			if (offset)
-				entry = swp_entry(si->type, offset);
-		}
-		put_swap_device(si);
+	/*
+	 * Try the local cluster first if it matches the device. If
+	 * not, try grab a new cluster and override local cluster.
+	 */
+	local_lock(&percpu_swap_cluster.lock);
+	pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+	pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+	if (pcp_si == si && pcp_offset) {
+		ci = swap_cluster_lock(si, pcp_offset);
+		if (cluster_is_usable(ci, 0))
+			offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+		else
+			swap_cluster_unlock(ci);
 	}
+	if (!offset)
+		offset = cluster_alloc_swap_entry(si, NULL);
+	local_unlock(&percpu_swap_cluster.lock);
+	if (offset)
+		entry = swp_entry(si->type, offset);
+
 fail:
 	return entry;
 }
@@ -2123,14 +2118,10 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 /* Free a slot allocated by swap_alloc_hibernation_slot */
 void swap_free_hibernation_slot(swp_entry_t entry)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = __swap_entry_to_info(entry);
 	struct swap_cluster_info *ci;
 	pgoff_t offset = swp_offset(entry);
 
-	si = get_swap_device(entry);
-	if (WARN_ON(!si))
-		return;
-
 	ci = swap_cluster_lock(si, offset);
 	__swap_cluster_put_entry(ci, offset % SWAPFILE_CLUSTER);
 	__swap_cluster_free_entries(si, ci, offset % SWAPFILE_CLUSTER, 1);
@@ -2138,7 +2129,6 @@ void swap_free_hibernation_slot(swp_entry_t entry)
 
 	/* In theory readahead might add it to the swap cache by accident */
 	__try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
-	put_swap_device(si);
 }
 
 /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference
  2026-03-12 11:25 ` [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting " Youngjun Park
@ 2026-03-13 13:20   ` Kairui Song
  2026-03-14 11:48     ` YoungJun Park
  0 siblings, 1 reply; 5+ messages in thread
From: Kairui Song @ 2026-03-13 13:20 UTC (permalink / raw)
  To: Youngjun Park
  Cc: rafael, akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe,
	baohua, usama.arif, linux-pm, linux-mm

On Thu, Mar 12, 2026 at 08:25:10PM +0800, Youngjun Park wrote:
> Hibernation can be triggered either via the sysfs interface or via the
> uswsusp utility using /dev/snapshot ioctls.
> 
> In the case of uswsusp, the resume device is configured either by the
> boot parameter during snapshot_open() or via the SNAPSHOT_SET_SWAP_AREA
> ioctl. However, a race condition exists between setting this swap area
> and actually allocating a swap slot via the SNAPSHOT_ALLOC_SWAP_PAGE
> ioctl. For instance, if swapoff is executed and a different swap device
> is enabled during this window, an incorrect swap slot might be allocated.
> 
> Hibernation via the sysfs interface does not suffer from this race
> condition because user-space processes are frozen before proceeding,
> making it impossible to execute swapoff.
> 
> To resolve this race in uswsusp, modify swap_type_of() to properly
> acquire a reference to the swap device using get_swap_device().
> 
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
>  include/linux/swap.h |  3 ++-
>  kernel/power/swap.c  |  2 +-
>  kernel/power/user.c  | 11 ++++++++---
>  mm/swapfile.c        | 28 +++++++++++++++++++++-------
>  4 files changed, 32 insertions(+), 12 deletions(-)

Hi, YoungJun

Nice work! Thanks for improving the swap part of hibernation.

Some comments below:

> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7a09df6977a5..ecf19a581fc7 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -433,8 +433,9 @@ static inline long get_nr_swap_pages(void)
>  }
>  
>  extern void si_swapinfo(struct sysinfo *);
> -int swap_type_of(dev_t device, sector_t offset);
> +int swap_type_of(dev_t device, sector_t offset, bool ref);
>  int find_first_swap(dev_t *device);
> +void put_swap_device_by_type(int type);
>  extern unsigned int count_swap_pages(int, int);
>  extern sector_t swapdev_block(int, pgoff_t);
>  extern int __swap_count(swp_entry_t entry);

I think the `ref` parameter here is really confusing. Maybe at lease
add some comment that some caller will block swapoff so no ref needed.

Or could we use a new helper to grab the device? Or better, is it
possible to grab the swap device then keep allocating it, instead of
check the device every time from hibernation side?

The combination of swap_type_of & put_swap_device_by_type call pair
really doesn't look good.

> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 4401cfe26e5c..7ade4d0aa846 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>  	memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
>  		/* Hibernating.  The image device should be accessible. */
> -		data->swap = swap_type_of(swsusp_resume_device, 0);
> +		data->swap = swap_type_of(swsusp_resume_device, 0, true);
>  		data->mode = O_RDONLY;
>  		data->free_bitmaps = false;
>  		error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
> @@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>  			data->free_bitmaps = !error;
>  		}
>  	}
> -	if (error)
> +	if (error) {
> +		put_swap_device_by_type(data->swap);
>  		hibernate_release();
> +	}
>  
>  	data->frozen = false;
>  	data->ready = false;
> @@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
>  	data = filp->private_data;
>  	data->dev = 0;
>  	free_all_swap_pages(data->swap);
> +	put_swap_device_by_type(data->swap);
>  	if (data->frozen) {
>  		pm_restore_gfp_mask();
>  		free_basic_memory_bitmaps();
> @@ -235,11 +238,13 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
>  		offset = swap_area.offset;
>  	}
>  
> +	put_swap_device_by_type(data->swap);
> +
>  	/*
>  	 * User space encodes device types as two-byte values,
>  	 * so we need to recode them
>  	 */
> -	data->swap = swap_type_of(swdev, offset);
> +	data->swap = swap_type_of(swdev, offset, true);

For example this put_swap_device_by_type followed by swap_type_of
looks very strange. I guess here you only want to get the swap type
without increasing the ref.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d864866a35ea..5a3d5c1e1f81 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2149,7 +2149,7 @@ void swap_free_hibernation_slot(swp_entry_t entry)
>   *
>   * This is needed for the suspend to disk (aka swsusp).
>   */
> -int swap_type_of(dev_t device, sector_t offset)
> +int swap_type_of(dev_t device, sector_t offset, bool ref)
>  {
>  	int type;
>  
> @@ -2163,13 +2163,16 @@ int swap_type_of(dev_t device, sector_t offset)
>  		if (!(sis->flags & SWP_WRITEOK))
>  			continue;
>  
> -		if (device == sis->bdev->bd_dev) {
> -			struct swap_extent *se = first_se(sis);
> +		if (device != sis->bdev->bd_dev)
> +			continue;
>  
> -			if (se->start_block == offset) {
> -				spin_unlock(&swap_lock);
> -				return type;
> -			}
> +		struct swap_extent *se = first_se(sis);
> +		if (se->start_block != offset)
> +			continue;
> +
> +		if (ref && get_swap_device_info(sis)) {
> +			spin_unlock(&swap_lock);
> +			return type;

This part seems wrong, if ref == false, it never returns a usable
type value.

>  		}
>  	}
>  	spin_unlock(&swap_lock);
> @@ -2194,6 +2197,17 @@ int find_first_swap(dev_t *device)
>  	return -ENODEV;
>  }
>  
> +void put_swap_device_by_type(int type)
> +{
> +	struct swap_info_struct *sis;
> +
> +	if (type < 0 || type >= MAX_SWAPFILES)
> +		return;

Maybe we better have a WARN if type is invalid? Caller should
never do that IMO.

> +
> +	sis = swap_info[type];

We have __swap_type_to_info and swap_type_to_info since reading
swap_info have some RCU context implications (see comment in
__swap_type_to_info) and these helpers have better debug checks too.

Maybe you can just use swap_type_to_info and add a check for
type < 0 in swap_type_to_info, and WARN on NULL in
put_swap_device_by_type. How do you think?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference
  2026-03-13 13:20   ` Kairui Song
@ 2026-03-14 11:48     ` YoungJun Park
  0 siblings, 0 replies; 5+ messages in thread
From: YoungJun Park @ 2026-03-14 11:48 UTC (permalink / raw)
  To: Kairui Song
  Cc: rafael, akpm, chrisl, kasong, pavel, shikemeng, nphamcs, bhe,
	baohua, usama.arif, linux-pm, linux-mm

> Hi, YoungJun
>
> Nice work! Thanks for improving the swap part of hibernation.

Hello Kairui :D

> Some comments below:
>
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 7a09df6977a5..ecf19a581fc7 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -433,8 +433,9 @@ static inline long get_nr_swap_pages(void)
> >  }
> >
> >  extern void si_swapinfo(struct sysinfo *);
> > -int swap_type_of(dev_t device, sector_t offset);
> > +int swap_type_of(dev_t device, sector_t offset, bool ref);
> >  int find_first_swap(dev_t *device);
> > +void put_swap_device_by_type(int type);
> >  extern unsigned int count_swap_pages(int, int);
> >  extern sector_t swapdev_block(int, pgoff_t);
> >  extern int __swap_count(swp_entry_t entry);
>
> I think the `ref` parameter here is really confusing. Maybe at
> lease add some comment that some caller will block swapoff so
> no ref needed.
>
> Or could we use a new helper to grab the device? Or better, is
> it possible to grab the swap device then keep allocating it,
> instead of check the device every time from hibernation side?
>
> The combination of swap_type_of & put_swap_device_by_type call
> pair really doesn't look good.

Initially, I considered making the function grab the reference
by default and renaming it accordingly to better reflect its
role in acquiring a swap device for hibernation.

However, based on Chris Li's earlier review, this approach
would require adding a significant amount of error-handling
code to release the reference in the regular (non-uswsusp)
hibernation path. After further consideration, I concluded
that the regular hibernation path genuinely does not need the
reference, since user-space processes are already frozen.

So I will either introduce a separate helper function for
grabbing the reference or add comments to clarify the
distinction. I will also improve the naming of the put
function to make the pairing more intuitive.

> > diff --git a/kernel/power/user.c b/kernel/power/user.c
> > index 4401cfe26e5c..7ade4d0aa846 100644
> > --- a/kernel/power/user.c
> > +++ b/kernel/power/user.c
> > @@ -235,11 +238,13 @@ static int snapshot_set_swap_area(
> >             offset = swap_area.offset;
> >     }
> >
> > +   put_swap_device_by_type(data->swap);
> > +
> >     /*
> >      * User space encodes device types as two-byte values,
> >      * so we need to recode them
> >      */
> > -   data->swap = swap_type_of(swdev, offset);
> > +   data->swap = swap_type_of(swdev, offset, true);
>
> For example this put_swap_device_by_type followed by
> swap_type_of looks very strange. I guess here you only want to
> get the swap type without increasing the ref.

The put here is needed because the SNAPSHOT_SET_SWAP_AREA ioctl
can be called more than once. If the swap type changes between
calls, we need to release the reference from the previous one.
That said, if the type remains the same, there is no need to
call swap_type_of again. I think this part needs some
refinement.

> > +           if (ref && get_swap_device_info(sis)) {
> > +                   spin_unlock(&swap_lock);
> > +                   return type;
>
> This part seems wrong, if ref == false, it never returns a
> usable type value.

Oops, you are right. It should be something like:

	if (ref && !get_swap_device_info(sis))
		continue;

	spin_unlock(&swap_lock);
	return type;

> > +   if (type < 0 || type >= MAX_SWAPFILES)
> > +           return;
>
> Maybe we better have a WARN if type is invalid? Caller should
> never do that IMO.

Agreed, will add that.

> > +   sis = swap_info[type];
>
> We have __swap_type_to_info and swap_type_to_info since reading
> swap_info have some RCU context implications (see comment in
> __swap_type_to_info) and these helpers have better debug checks
> too.
>
> Maybe you can just use swap_type_to_info and add a check for
> type < 0 in swap_type_to_info, and WARN on NULL in
> put_swap_device_by_type. How do you think?

Thank you for the review. I will switch to swap_type_to_info
and add the WARN as you suggested in the next revision.

Best regards,
Youngjun Park


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-14 11:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 11:25 [RESEND RFC PATCH v3 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap reference Youngjun Park
2026-03-12 11:25 ` [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting " Youngjun Park
2026-03-13 13:20   ` Kairui Song
2026-03-14 11:48     ` YoungJun Park
2026-03-12 11:25 ` [RESEND RFC PATCH v3 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox