linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: chrisl@kernel.org
Cc: 21cnbao@gmail.com, aaron.lu@intel.com, akpm@linux-foundation.org,
	bhe@redhat.com, kasong@tencent.com, linux-mm@kvack.org,
	nphamcs@gmail.com, shikemeng@huaweicloud.com,
	youngjun.park@lge.com
Subject: Re: [PATCH v4 mm-new 1/2] mm/swap: do not choose swap device according to numa node
Date: Wed, 15 Oct 2025 16:09:25 +0800
Message-ID: <20251015080925.4008-1-21cnbao@gmail.com>
In-Reply-To: <CACePvbVDB9-r3dTTAJ8e++1swAt9=fPRK9ex_30L=FgXBe5BpQ@mail.gmail.com>

>
> >
> > On Wed, Oct 15, 2025 at 11:06 AM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > On 10/13/25 at 02:09pm, Barry Song wrote:
> > > > > -static int swap_node(struct swap_info_struct *si)
> > > > > -{
> > > > > -       struct block_device *bdev;
> > > > > -
> > > > > -       if (si->bdev)
> > > > > -               bdev = si->bdev;
> > > > > -       else
> > > > > -               bdev = si->swap_file->f_inode->i_sb->s_bdev;
> > > > > -
> > > > > -       return bdev ? bdev->bd_disk->node_id : NUMA_NO_NODE;
> > > > > -}
> > > > > -
> > > >
> > > > Looking at the code, it seems to have some hardware affinity awareness,
> > > > as it uses the swapfile’s bdev’s node_id. Are we regressing cases where
> > > > each node has a closer block device?
> > >
> > > I had talked about this with Chris before I posted v1. We don't need to
> > > worry about this because:
> > >
> > > 1) Kernel code rarely sets disk->node_id; all disks just assign
> > > NUMA_NO_NODE to it, except for these:
> > >
> > > drivers/nvdimm/pmem.c <<pmem_attach_disk>>
> > > drivers/md/dm.c <<alloc_dev>>
> > >
> > > The Intel SSD for which Aaron introduced node-based si selection
> > > should be Optane, which has been discontinued. I could be wrong; if so,
> > > I hope Intel can help test so that we can see what impact this has.
> > >
> > > 2) The gap between disk I/O and memory access
> > > Memory access is usually at the nanosecond level, while disk I/O is
> > > at the microsecond level, and HDDs can even be at the millisecond
> > > level. The nanoseconds saved by node affinity are negligible compared
> > > to the disk's own access time. This includes pmem, whose I/O latency
> > > is ten or more times that of memory access.
> >
> > I agree that it’s fine to remove the code if the related hardware is obsolete.
> > I found a paper [1] showing that accessing local Optane PMEM is much faster
> > than accessing remote Optane PMEM (see slides 4 and 5). That might explain why
> > they started the project to make swapfile NUMA-aware.
>
> Are you suggesting the swapfile is used for PMEM devices? It sounds
> very strange to back a swapfile with PMEM. I am under the impression
> that the original commit a2468cc9bfdf was introduced with the Intel SSD
> as a testing swapfile device. I just looked it up. Here is what I found
> in the commit log:
>
> ======= quote ========
>     To see the effect of the patch, a test that starts N process, each mmap
>     a region of anonymous memory and then continually write to it at random
>     position to trigger both swap in and out is used.
>
>     On a 2 node Skylake EP machine with 64GiB memory, two 170GB SSD drives
>     are used as swap devices with each attached to a different node, the
>     result is:
> ======= end quote =====
>
> > My point is that we should at least mention this in the changelog to
> > honor their past contributions. But since the hardware is no longer used,
> > we can remove the code to reduce complexity and stop maintaining it.
>
> Optane was not even supported in Skylake. Commit a2468cc9bfdf has
> nothing to do with Optane. The Optane talk in a2468cc9bfdf is just a
> red herring. I fail to see why reverting a2468cc9bfdf needs to mention
> Optane is obsolete.

Thanks for the clarification. The Optane discussion turned out to be a goof :-)

Just for the record, this paper [1] also mentions that accessing remote SSDs
can significantly decrease performance. However, it is rare to find any NUMA
machine using SSDs directly as swap files without a RAM compression frontend,
so I don’t think the performance penalty of remote access would be a problem
when choosing a direct swapfile.
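
To put rough numbers on the latency argument quoted earlier in this thread
(point 2, nanosecond-level memory access versus microsecond-level disk I/O),
here is a minimal sketch. All latency constants and function names below are
assumptions picked for illustration; they are not measurements from this
thread or from commit a2468cc9bfdf.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumed order-of-magnitude latencies in nanoseconds.
 * Hypothetical values for illustration only, not measured data.
 */
#define ASSUMED_NUMA_PENALTY_NS	100.0		/* extra cost of a remote-node access */
#define ASSUMED_SSD_IO_NS	100000.0	/* roughly 100 us per SSD I/O */
#define ASSUMED_HDD_IO_NS	5000000.0	/* roughly 5 ms per HDD I/O */

/* Remote-node penalty expressed as a percentage of one device I/O. */
static double numa_penalty_percent(double penalty_ns, double io_ns)
{
	assert(io_ns > 0.0);	/* avoid dividing by zero */
	return 100.0 * penalty_ns / io_ns;
}

/* Print the penalty share for one hypothetical device. */
static void report(const char *name, double io_ns)
{
	printf("%s: NUMA penalty is %.4f%% of one I/O\n", name,
	       numa_penalty_percent(ASSUMED_NUMA_PENALTY_NS, io_ns));
}
```

Under these assumed values, calling report("SSD", ASSUMED_SSD_IO_NS) and
report("HDD", ASSUMED_HDD_IO_NS) from a main() shows the remote-node penalty
at about 0.1% of one SSD I/O and about 0.002% of one HDD I/O. This sketch
only models memory-access latency; it says nothing about the remote-device
throughput effects discussed in [1].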

[1] https://shbakram.github.io/assets/papers/akram-caos12.pdf

Thanks
Barry

