From: Chris Li <chrisl@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: Baoquan He <bhe@redhat.com>,
linux-mm@kvack.org, akpm@linux-foundation.org,
kasong@tencent.com, youngjun.park@lge.com, aaron.lu@intel.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com
Subject: Re: [PATCH v4 mm-new 2/2] mm/swap: select swap device with default priority round robin
Date: Tue, 14 Oct 2025 15:01:36 -0700
Message-ID: <CACePvbXmd840LQsWBrEGA1SBUQVznNC3wmT8jU6+7GaBwTTW6Q@mail.gmail.com>
In-Reply-To: <CAGsJ_4y4CLu7qeHijhJtL+NDrehfiWpu9mtsVGxmn5rBy03v0w@mail.gmail.com>

On Sun, Oct 12, 2025 at 1:41 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sun, Oct 12, 2025 at 5:14 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > Swap devices are assumed to have similar access speed if no priority
> > is specified at swapon. It is unfair and doesn't make sense for one
> > swap device to get a higher priority than another just because it was
> > swapped on first.
> >
> > Here, set all swap devices to have priority '-1' by default. With this
> > change, swap devices with default priority will be selected round robin
> > when swapping out. This can improve the swapping efficiency a lot among
> > multiple swap devices with default priority.
> >
> > Below is the swapon output taken while a high-pressure vm-scalability
> > test is running:
> >
> > 1) This is pre-commit a2468cc9bfdf: swap devices are selected one by one
> > by priority from high to low, moving on when one swap device is exhausted:
> > ------------------------------------
> > [root@hp-dl385g10-03 ~]# swapon
> > NAME       TYPE      SIZE   USED PRIO
> > /dev/zram0 partition  16G    16G   -1
> > /dev/zram1 partition  16G 966.2M   -2
> > /dev/zram2 partition  16G     0B   -3
> > /dev/zram3 partition  16G     0B   -4
> >
> > 2) This is the behaviour with commit a2468cc9bfdf: on a node that has a
> > swap device sharing its node id, that device is selected first until
> > exhausted; on a node with no swap device sharing its node id, the
> > device with the highest priority is selected until exhausted:
> > ------------------------------------
> > [root@hp-dl385g10-03 ~]# swapon
> > NAME       TYPE      SIZE  USED PRIO
> > /dev/zram0 partition  16G 15.7G   -2
> > /dev/zram1 partition  16G  3.4G   -3
> > /dev/zram2 partition  16G  3.4G   -4
> > /dev/zram3 partition  16G  2.6G   -5
> >
> > 3) After this patch is applied, swap devices with default priority are
> > selected round robin:
> > ------------------------------------
> > [root@hp-dl385g10-03 block]# swapon
> > NAME       TYPE      SIZE USED PRIO
> > /dev/zram0 partition  16G 6.6G   -1
> > /dev/zram1 partition  16G 6.6G   -1
> > /dev/zram2 partition  16G 6.6G   -1
> > /dev/zram3 partition  16G 6.6G   -1
> >
> > With the change, we can see about an 18% efficiency improvement relative
> > to the node based way as below. (Surely, the pre-commit a2468cc9bfdf way
> > is the worst.)
> >
>
> I’m not against the behavior change; but the swapon man page says:
> "
> Each swap area has a priority, either high or low. The default
> priority is low. Within the low-priority areas, newer areas are
> even lower priority than older areas.
> "
> So my question is whether users still assume that newly added swap areas
> get a lower priority than the older ones?

That is a good catch. If the per-node_id swapfile logic is reverted, the
man page should be updated to match the kernel behavior as well.
It is a good place to describe the default round-robin behavior.

> I assume the priority decrement isn’t a stable ABI, so this change won’t
> break userspace?

There is no ABI change as far as I can tell. swapon has an option
to specify the priority. A default swapon does not specify the
priority. How we arrange the default-priority swapfiles for better
out-of-the-box performance is kernel-internal tuning. If users are not
happy with that arrangement, they can always specify a priority with
the existing ABI.

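To make the "existing ABI" concrete, here is a minimal sketch of how an
explicit priority is encoded in the swapflags argument of swapon(2). The
three constants mirror the values in the kernel's include/linux/swap.h;
the helper name swapon_flags() is hypothetical and exists only for
illustration. From the command line the same thing is done with
"swapon -p <prio> <device>".

```python
from typing import Optional

# Values as defined in the kernel uapi header include/linux/swap.h.
SWAP_FLAG_PREFER = 0x8000      # "a priority is encoded in the flags"
SWAP_FLAG_PRIO_MASK = 0x7FFF   # bits that hold the priority value
SWAP_FLAG_PRIO_SHIFT = 0


def swapon_flags(priority: Optional[int]) -> int:
    """Build the swapflags value for swapon(2) (hypothetical helper).

    priority=None means "no priority specified": the kernel then picks
    the default priority itself, which is the case this thread is about.
    """
    if priority is None:
        return 0
    if not 0 <= priority <= SWAP_FLAG_PRIO_MASK:
        raise ValueError("explicit priority must be in 0..32767")
    return SWAP_FLAG_PREFER | (
        (priority << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK
    )


if __name__ == "__main__":
    print(hex(swapon_flags(None)))  # default: kernel chooses the priority
    print(hex(swapon_flags(10)))    # explicit priority 10
```

Whatever the kernel does for the priority=None case, the explicit path
is untouched, which is why the default can be retuned without an ABI
change.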
> Or if someone sets up Linux assuming that a newer swap file will only be
> used after the older one is full, then this change would break those cases?

The existing kernel implementation always fills up the high-priority
swapfile before the low-priority one, and that hasn't changed.
The negative node_id has been removed/reverted; that is a behavior
change, yes. But I fail to see how it breaks the user. If you have a
test case that breaks the user, please specify.

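For reference, a toy model of the selection policy being discussed:
higher priority is always filled first, and devices sharing a priority
are used round robin. This is only a sketch under simplifying
assumptions (one unit per allocation, no per-node or per-cluster
logic); allocate() and the device tuples are made up for illustration
and are not the code in mm/swapfile.c.

```python
from collections import deque


def allocate(devices, units):
    """Distribute `units` allocations over swap devices.

    devices: list of (name, priority, capacity) tuples.
    Returns a dict mapping device name to units used.
    """
    used = {name: 0 for name, _, _ in devices}

    # Group devices into tiers keyed by priority.
    tiers = {}
    for name, prio, cap in devices:
        tiers.setdefault(prio, deque()).append((name, cap))

    # Highest priority tier first; a lower tier is only touched once
    # every device in the tier above it is full.
    for prio in sorted(tiers, reverse=True):
        ring = tiers[prio]
        while units > 0 and ring:
            name, cap = ring.popleft()
            if used[name] < cap:
                used[name] += 1
                units -= 1
                ring.append((name, cap))  # rotate to the tail
            # a full device is simply dropped from the ring
        if units == 0:
            break
    return used


if __name__ == "__main__":
    names = ["zram0", "zram1", "zram2", "zram3"]
    # Old default: each later swapon gets a lower priority.
    old = [(n, -1 - i, 16) for i, n in enumerate(names)]
    # Proposed default: every device gets priority -1.
    new = [(n, -1, 16) for n in names]
    print(allocate(old, 20))
    print(allocate(new, 20))
```

With distinct priorities the first device fills completely before the
second is touched; with equal priorities the load spreads evenly, which
matches the shape of the swapon outputs quoted above.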
Chris