From: Baoquan He <bhe@redhat.com>
To: YoungJun Park <youngjun.park@lge.com>
Cc: akpm@linux-foundation.org, chrisl@kernel.org, kasong@tencent.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com, baohua@kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
Date: Thu, 27 Nov 2025 18:50:03 +0800
Message-ID: <aSgs2y3b/DhsMMHD@MiWiFi-R3L-srv>
In-Reply-To: <aSgrdLkaLjzVh6kv@yjaykim-PowerEdge-T330>
On 11/27/25 at 07:44pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> > On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > > even when other swap devices with pending discard entries remain
> > > > > > > available.
> > > > > > >
> > > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > > when the next node is removed during discard operations.
> > > > > > >
> > > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > > operations rather than frequent availability changes, reducing
> > > > > > > exceptional condition checks and lock contention.
> > > > > > >
> > > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > > ---
> > > > > > > mm/swapfile.c | 18 +++++++++++-------
> > > > > > > 1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > > --- a/mm/swapfile.c
> > > > > > > +++ b/mm/swapfile.c
> > > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > > > >  	bool ret = false;
> > > > > > > >  	struct swap_info_struct *si, *next;
> > > > > > > >
> > > > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > > > +	spin_lock(&swap_lock);
> > > > > > > > +start_over:
> > > > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > > > +		spin_unlock(&swap_lock);
> > > > > > > >  		if (get_swap_device_info(si)) {
> > > > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > > > >  			put_swap_device(si);
> > > > > > > >  		}
> > > > > > > >  		if (ret)
> > > > > > > > -			return true;
> > > > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > > > +			return ret;
> > > > > > > > +
> > > > > > > > +		spin_lock(&swap_lock);
> > > > > > > > +		if (plist_node_empty(&next->list))
> > > > > > > > +			goto start_over;
> > > > >
> > > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > > since the list traversal termination condition is based on whether the current
> > > > > node is the head or not.
> > > > >
> > > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > > a termination condition of plist_for_each_entry_safe.
> > > > >
> > > > > Therefore, the real reason for this patch is not:
> > > > > "swap_sync_discard() has an issue where if the next device becomes full
> > > > > and is removed from the plist during iteration, the operation fails even
> > > > > when other swap devices with pending discard entries remain available."
> > > > > but rather:
> > > > > > "When the next node is removed, the next pointer loops back to the current
> > > > > > entry, possibly causing an endless loop until it is reinserted into the list."
> > > > >
> > > > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > > > code, the patch I modified) or as a break condition
> > > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > > >
> > > > OK, I had only thought of the swap on/off case and didn't think it
> > > > through. As you analyzed, the plist_node_empty check is necessary, so
> > > > this patch looks good to me. One alternative would be to fetch the new
> > > > next instead? No strong opinion though.
> > > >
> > > > 	if (plist_node_empty(&next->list)) {
> > > > 		if (!plist_node_empty(&si->list)) {
> > > > 			next = list_next_entry(si, list.node_list);
> > > > 			continue;
> > > > 		}
> > > > 		return false;
> > > > 	}
> > >
> > > Thank you for the suggestion :D
> > > I agree it could be an improvement in some cases.
> > > Personally, I feel the current code works fine,
> > > and from a readability perspective, the current approach might be a bit clearer.
> > > It also seems that the alternative would only make a difference in very minor cases.
> > > (order 0, swapfail, and swapoff during this routine)
> >
> > Agree. Will you post v2 to update the patch log? I would like to add my
> > reviewing tag if no v2 is planned.
>
> Oops, I’ve just posted v2 to update the patch log.
> Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53
Saw it, thanks.