Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Gregory Price <gregory.price@memverge.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Gregory Price <gourry.memverge@gmail.com>,
	linux-mm@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Linux API <linux-api@vger.kernel.org>,
	linux-cxl@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Arnd Bergmann <arnd@arndb.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall
Date: Tue, 19 Sep 2023 14:20:04 -0400	[thread overview]
Message-ID: <ZQnmVI0Q/Al5UKgQ@memverge.com> (raw)
In-Reply-To: <0a7e3ccc-db66-428e-8c09-66e67bfded51@app.fastmail.com>

On Tue, Sep 19, 2023 at 10:59:33AM -0700, Andy Lutomirski wrote:
> 
> I'm not complaining about the name.  I'm objecting about the semantics.
> 
> Apparently you have a system to collect usage statistics of physical addresses, but you have no idea what those pages map do (without crawling /proc or /sys, anyway).  But that means you have no idea when the logical contents of those pages *changes*.  So you fundamentally have a nasty race: anything else that swaps or migrates those pages will mess up your statistics, and you'll start trying to migrate the wrong thing.

How does this change if I use virtual address based migration?

I could do sampling based on virtual address (page faults, IBS/PEBs,
whatever), and by the time I make a decision, the kernel could have
migrated the data or even my task from Node A to Node B.  The sample I
took is now stale, and I could make a poor migration decision.

If I do move_pages(pid, some_virt_addr, some_node) and it migrates the
page from NodeA to NodeB, then the device-side collection is likewise
no longer valid.  This problem doesn't change because I used virtual
address compared to physical address.

But if i have a 512GB memory device, and i can see a wide swath of that
512GB is hot, while a good chunk of my local DRAM is not - then I
probably don't care *what* gets migrated up to DRAM, i just care that a
vast majority of that hot data does.

The goal here isn't 100% precision, you will never get there. The goal
here is broad-scope performance enhancements of the overall system
while minimizing the cost to compute the migration actions to be taken.

I don't think the contents of the page are always relevant.  The entire
concept here is to enable migration without caring about what programs
are using the memory for - just so long as the memcg's and zoning is
respected.

~Gregory

next prev parent reply	other threads:[~2023-09-19 18:20 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-07  7:54 [RFC 0/3] sys_move_phy_pages system call Gregory Price
2023-09-07  7:54 ` [RFC PATCH 1/3] mm/migrate: remove unused mm argument from do_move_pages_to_node Gregory Price
2023-09-07  7:54 ` [RFC PATCH 2/3] mm/migrate: refactor add_page_for_migration for code re-use Gregory Price
2023-09-07  7:54 ` [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall Gregory Price
2023-09-09  8:03   ` Arnd Bergmann
2023-09-08  7:46     ` Gregory Price
2023-09-09 15:18       ` Arnd Bergmann
2023-09-10 12:52         ` Gregory Price
2023-09-11 17:26           ` Arnd Bergmann
2023-09-10 15:01             ` Gregory Price
2023-09-10 20:36   ` Jonathan Corbet
2023-09-10 11:49     ` Gregory Price
2023-09-19  3:34       ` Andy Lutomirski
2023-09-19 16:31         ` Gregory Price
2023-09-19 17:59           ` Andy Lutomirski
2023-09-19 18:20             ` Gregory Price [this message]
2023-09-19 18:52               ` Andy Lutomirski
2023-09-19 19:59                 ` Gregory Price
2023-09-19  0:17   ` Thomas Gleixner
2023-09-19 15:57     ` Gregory Price
2023-09-19 16:37     ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQnmVI0Q/Al5UKgQ@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=gourry.memverge@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox