From: Kairui Song <ryncsn@gmail.com>
To: Youngjun Park <youngjun.park@lge.com>
Cc: rafael@kernel.org, akpm@linux-foundation.org, chrisl@kernel.org,
kasong@tencent.com, pavel@kernel.org, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
usama.arif@linux.dev, linux-pm@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference
Date: Fri, 13 Mar 2026 21:20:22 +0800 [thread overview]
Message-ID: <abQMiv6Y4WEKTLng@KASONG-MC4> (raw)
In-Reply-To: <20260312112511.3596781-2-youngjun.park@lge.com>
On Thu, Mar 12, 2026 at 08:25:10PM +0800, Youngjun Park wrote:
> Hibernation can be triggered either via the sysfs interface or via the
> uswsusp utility using /dev/snapshot ioctls.
>
> In the case of uswsusp, the resume device is configured either by the
> boot parameter during snapshot_open() or via the SNAPSHOT_SET_SWAP_AREA
> ioctl. However, a race condition exists between setting this swap area
> and actually allocating a swap slot via the SNAPSHOT_ALLOC_SWAP_PAGE
> ioctl. For instance, if swapoff is executed and a different swap device
> is enabled during this window, an incorrect swap slot might be allocated.
>
> Hibernation via the sysfs interface does not suffer from this race
> condition because user-space processes are frozen before proceeding,
> making it impossible to execute swapoff.
>
> To resolve this race in uswsusp, modify swap_type_of() to properly
> acquire a reference to the swap device using get_swap_device().
>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
> include/linux/swap.h | 3 ++-
> kernel/power/swap.c | 2 +-
> kernel/power/user.c | 11 ++++++++---
> mm/swapfile.c | 28 +++++++++++++++++++++-------
> 4 files changed, 32 insertions(+), 12 deletions(-)
Hi, YoungJun
Nice work! Thanks for improving the swap part of hibernation.
Some comments below:
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7a09df6977a5..ecf19a581fc7 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -433,8 +433,9 @@ static inline long get_nr_swap_pages(void)
> }
>
> extern void si_swapinfo(struct sysinfo *);
> -int swap_type_of(dev_t device, sector_t offset);
> +int swap_type_of(dev_t device, sector_t offset, bool ref);
> int find_first_swap(dev_t *device);
> +void put_swap_device_by_type(int type);
> extern unsigned int count_swap_pages(int, int);
> extern sector_t swapdev_block(int, pgoff_t);
> extern int __swap_count(swp_entry_t entry);
I think the `ref` parameter here is really confusing. Maybe at lease
add some comment that some caller will block swapoff so no ref needed.
Or could we use a new helper to grab the device? Or better, is it
possible to grab the swap device then keep allocating it, instead of
check the device every time from hibernation side?
The combination of swap_type_of & put_swap_device_by_type call pair
really doesn't look good.
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 4401cfe26e5c..7ade4d0aa846 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
> memset(&data->handle, 0, sizeof(struct snapshot_handle));
> if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
> /* Hibernating. The image device should be accessible. */
> - data->swap = swap_type_of(swsusp_resume_device, 0);
> + data->swap = swap_type_of(swsusp_resume_device, 0, true);
> data->mode = O_RDONLY;
> data->free_bitmaps = false;
> error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
> @@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp)
> data->free_bitmaps = !error;
> }
> }
> - if (error)
> + if (error) {
> + put_swap_device_by_type(data->swap);
> hibernate_release();
> + }
>
> data->frozen = false;
> data->ready = false;
> @@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
> data = filp->private_data;
> data->dev = 0;
> free_all_swap_pages(data->swap);
> + put_swap_device_by_type(data->swap);
> if (data->frozen) {
> pm_restore_gfp_mask();
> free_basic_memory_bitmaps();
> @@ -235,11 +238,13 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
> offset = swap_area.offset;
> }
>
> + put_swap_device_by_type(data->swap);
> +
> /*
> * User space encodes device types as two-byte values,
> * so we need to recode them
> */
> - data->swap = swap_type_of(swdev, offset);
> + data->swap = swap_type_of(swdev, offset, true);
For example this put_swap_device_by_type followed by swap_type_of
looks very strange. I guess here you only want to get the swap type
without increasing the ref.
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d864866a35ea..5a3d5c1e1f81 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2149,7 +2149,7 @@ void swap_free_hibernation_slot(swp_entry_t entry)
> *
> * This is needed for the suspend to disk (aka swsusp).
> */
> -int swap_type_of(dev_t device, sector_t offset)
> +int swap_type_of(dev_t device, sector_t offset, bool ref)
> {
> int type;
>
> @@ -2163,13 +2163,16 @@ int swap_type_of(dev_t device, sector_t offset)
> if (!(sis->flags & SWP_WRITEOK))
> continue;
>
> - if (device == sis->bdev->bd_dev) {
> - struct swap_extent *se = first_se(sis);
> + if (device != sis->bdev->bd_dev)
> + continue;
>
> - if (se->start_block == offset) {
> - spin_unlock(&swap_lock);
> - return type;
> - }
> + struct swap_extent *se = first_se(sis);
> + if (se->start_block != offset)
> + continue;
> +
> + if (ref && get_swap_device_info(sis)) {
> + spin_unlock(&swap_lock);
> + return type;
This part seems wrong, if ref == false, it never returns a usable
type value.
> }
> }
> spin_unlock(&swap_lock);
> @@ -2194,6 +2197,17 @@ int find_first_swap(dev_t *device)
> return -ENODEV;
> }
>
> +void put_swap_device_by_type(int type)
> +{
> + struct swap_info_struct *sis;
> +
> + if (type < 0 || type >= MAX_SWAPFILES)
> + return;
Maybe we better have a WARN if type is invalid? Caller should
never do that IMO.
> +
> + sis = swap_info[type];
We have __swap_type_to_info and swap_type_to_info since reading
swap_info have some RCU context implications (see comment in
__swap_type_to_info) and these helpers have better debug checks too.
Maybe you can just use swap_type_to_info and add a check for
type < 0 in swap_type_to_info, and WARN on NULL in
put_swap_device_by_type. How do you think?
next prev parent reply other threads:[~2026-03-13 13:20 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-12 11:25 [RESEND RFC PATCH v3 0/2] mm/swap, PM: hibernate: fix swapoff race and optimize swap reference Youngjun Park
2026-03-12 11:25 ` [RESEND RFC PATCH v3 1/2] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting " Youngjun Park
2026-03-13 13:20 ` Kairui Song [this message]
2026-03-14 11:48 ` YoungJun Park
2026-03-12 11:25 ` [RESEND RFC PATCH v3 2/2] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abQMiv6Y4WEKTLng@KASONG-MC4 \
--to=ryncsn@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=chrisl@kernel.org \
--cc=kasong@tencent.com \
--cc=linux-mm@kvack.org \
--cc=linux-pm@vger.kernel.org \
--cc=nphamcs@gmail.com \
--cc=pavel@kernel.org \
--cc=rafael@kernel.org \
--cc=shikemeng@huaweicloud.com \
--cc=usama.arif@linux.dev \
--cc=youngjun.park@lge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox