linux-mm.kvack.org archive mirror
From: Chris Li <chrisl@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: Baoquan He <bhe@redhat.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	 kasong@tencent.com, youngjun.park@lge.com, aaron.lu@intel.com,
	 shikemeng@huaweicloud.com, nphamcs@gmail.com
Subject: Re: [PATCH v4 mm-new 2/2] mm/swap: select swap device with default priority round robin
Date: Tue, 14 Oct 2025 15:01:36 -0700
Message-ID: <CACePvbXmd840LQsWBrEGA1SBUQVznNC3wmT8jU6+7GaBwTTW6Q@mail.gmail.com>
In-Reply-To: <CAGsJ_4y4CLu7qeHijhJtL+NDrehfiWpu9mtsVGxmn5rBy03v0w@mail.gmail.com>

On Sun, Oct 12, 2025 at 1:41 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sun, Oct 12, 2025 at 5:14 AM Baoquan He <bhe@redhat.com> wrote:
> >
> > Swap devices are assumed to have similar access speed if no priority
> > is specified at swapon time. It is unfair, and makes no sense, for one
> > swap device to get a higher priority than another merely because it
> > was swapped on first.
> >
> > Here, set all swap devices to have priority '-1' by default. With this
> > change, swap devices with default priority will be selected round robin
> > when swapping out. This can improve the swapping efficiency a lot among
> > multiple swap devices with default priority.
> >
> > Below is the swapon output captured while a high-pressure
> > vm-scalability test is running:
> >
> > 1) This is pre-commit a2468cc9bfdf: swap devices are selected one by one
> >    by priority from high to low, moving on when one is exhausted:
> > ------------------------------------
> > [root@hp-dl385g10-03 ~]# swapon
> > NAME       TYPE      SIZE   USED PRIO
> > /dev/zram0 partition  16G    16G   -1
> > /dev/zram1 partition  16G 966.2M   -2
> > /dev/zram2 partition  16G     0B   -3
> > /dev/zram3 partition  16G     0B   -4
> >
> > 2) This is the behaviour with commit a2468cc9bfdf: on a node, the swap
> >    device sharing the same node id is selected first until exhausted;
> >    on a node with no swap device sharing its node id, the one with the
> >    highest priority is selected until exhausted:
> > ------------------------------------
> > [root@hp-dl385g10-03 ~]# swapon
> > NAME       TYPE      SIZE  USED PRIO
> > /dev/zram0 partition  16G 15.7G   -2
> > /dev/zram1 partition  16G  3.4G   -3
> > /dev/zram2 partition  16G  3.4G   -4
> > /dev/zram3 partition  16G  2.6G   -5
> >
> > 3) After this patch is applied, swap devices with default priority are
> >    selected round robin:
> > ------------------------------------
> > [root@hp-dl385g10-03 block]# swapon
> > NAME       TYPE      SIZE USED PRIO
> > /dev/zram0 partition  16G 6.6G   -1
> > /dev/zram1 partition  16G 6.6G   -1
> > /dev/zram2 partition  16G 6.6G   -1
> > /dev/zram3 partition  16G 6.6G   -1
> >
> > With the change, we can see about 18% efficiency improvement relative to
> > the node-based way as below. (Surely, the pre-commit a2468cc9bfdf way is
> > the worst.)
> >
>
> I’m not against the behavior change; but the swapon man page says:
> "
>        Each swap area has a priority, either high or low.  The default
>        priority is low.  Within the low-priority areas, newer areas are
>        even lower priority than older areas.
> "
> So my question is whether users still assume that newly added swap areas
> get a lower priority than the older ones?

That is a good catch. If the per node_id swapfile logic is reverted, the
man page should be updated to match the kernel behavior as well.

It is a good place to describe the default round robin behavior.

> I assume the priority decrement isn’t a stable ABI, so this change won’t
> break userspace?

There is no ABI change as far as I can tell. swapon has an option to
specify the priority, and the default swapon does not specify one. How
the kernel arranges the default-priority swapfiles is internal tuning
for better performance by default. If users are not happy with that
arrangement, they can always specify a priority with the existing ABI.
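
For reference, a minimal sketch of that existing interface, using the
swapon(2) flags from glibc's <sys/swap.h>. The helper names
swap_flags_for_priority() and swapon_with_priority() are made up for
illustration; they are not part of the patch or of any library.

```c
#include <errno.h>
#include <sys/swap.h>

/*
 * Build the swapon(2) flags word that requests an explicit priority.
 * Valid priorities are 0..SWAP_FLAG_PRIO_MASK; returns -1 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
int swap_flags_for_priority(int prio)
{
    if (prio < 0 || prio > SWAP_FLAG_PRIO_MASK)
        return -1;
    return SWAP_FLAG_PREFER |
           ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);
}

/*
 * Enable a swap area with an explicit priority instead of the kernel
 * default. Needs CAP_SYS_ADMIN. Returns 0 on success, or -1 with errno
 * set. (Hypothetical helper, for illustration only.)
 */
int swapon_with_priority(const char *path, int prio)
{
    int flags = swap_flags_for_priority(prio);

    if (flags < 0) {
        errno = EINVAL;
        return -1;
    }
    return swapon(path, flags);
}
```

A user who wants one device filled before another would call, for
example, swapon_with_priority() with a higher value for the preferred
device. Without SWAP_FLAG_PREFER, the kernel picks the default priority.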

> Or if someone sets up Linux assuming that a newer swap file will only be
> used after the older one is full, then this change would break those cases?

The existing kernel implementation always fills up the high priority
swapfile before the low priority one, which hasn't changed.
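
To illustrate the ordering rule, here is a small user-space model. It is
a sketch only, not the kernel's code: struct swap_dev, pick_swap_dev()
and alloc_slot() are hypothetical names, and a plain array stands in for
whatever list the kernel keeps. The highest priority with free space
always wins, and devices of equal priority take turns.

```c
#include <stddef.h>

/* Toy model of a swap device (hypothetical, for illustration only). */
struct swap_dev {
    const char *name;
    int prio;          /* higher value is used first */
    long free_slots;   /* remaining capacity */
};

/*
 * Pick the device for the next swap-out. The scan starts just after the
 * previously used device (*cursor) and keeps the first device seen at
 * the highest priority, so equal-priority devices are used round robin.
 * Returns the chosen index, or -1 if every device is full.
 */
int pick_swap_dev(const struct swap_dev *devs, size_t n, size_t *cursor)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        size_t idx = (*cursor + 1 + i) % n;

        if (devs[idx].free_slots <= 0)
            continue;
        if (best < 0 || devs[idx].prio > devs[best].prio)
            best = (int)idx;
    }
    if (best >= 0)
        *cursor = (size_t)best;
    return best;
}

/* Take one slot from the selected device; returns its index or -1. */
int alloc_slot(struct swap_dev *devs, size_t n, size_t *cursor)
{
    int i = pick_swap_dev(devs, n, cursor);

    if (i >= 0)
        devs[i].free_slots--;
    return i;
}
```

In this model, four devices that all have priority -1 are used in
rotation, while devices with priorities -1 and -2 are used strictly in
that order: the second one is only touched once the first is full.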

The negative node_id has been removed/reverted, and that is a behavior
change. But I fail to see how it breaks users. If you have a test case
that breaks userspace, please specify.

Chris


