linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Andy Lutomirski" <luto@kernel.org>
To: "Gregory Price" <gregory.price@memverge.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	linux-mm@vger.kernel.org,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org,
	"Linux API" <linux-api@vger.kernel.org>,
	linux-cxl@vger.kernel.org, "Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, "Arnd Bergmann" <arnd@arndb.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"the arch/x86 maintainers" <x86@kernel.org>
Subject: Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall
Date: Tue, 19 Sep 2023 11:52:27 -0700	[thread overview]
Message-ID: <1b7ea860-55e2-48fb-86ba-ff3f9f6d8904@app.fastmail.com> (raw)
In-Reply-To: <ZQnmVI0Q/Al5UKgQ@memverge.com>



On Tue, Sep 19, 2023, at 11:20 AM, Gregory Price wrote:
> On Tue, Sep 19, 2023 at 10:59:33AM -0700, Andy Lutomirski wrote:
>> 
>> I'm not complaining about the name.  I'm objecting about the semantics.
>> 
>> Apparently you have a system to collect usage statistics of physical addresses, but you have no idea what those pages map do (without crawling /proc or /sys, anyway).  But that means you have no idea when the logical contents of those pages *changes*.  So you fundamentally have a nasty race: anything else that swaps or migrates those pages will mess up your statistics, and you'll start trying to migrate the wrong thing.
>
> How does this change if I use virtual address based migration?
>
> I could do sampling based on virtual address (page faults, IBS/PEBs,
> whatever), and by the time I make a decision, the kernel could have
> migrated the data or even my task from Node A to Node B.  The sample I
> took is now stale, and I could make a poor migration decision.

The window is a lot narrower. If you’re sampling by VA, you collect stats and associate them with the logical page (the tuple (mapping, VA), for example).  The kernel can do this without races from page faults handlers.  If you sample based on PA, you fundamentally race against anything that does migration.

>
> If I do move_pages(pid, some_virt_addr, some_node) and it migrates the
> page from NodeA to NodeB, then the device-side collection is likewise
> no longer valid.  This problem doesn't change because I used virtual
> address compared to physical address.

Sure it does, as long as you collect those samples when you migrate. And I think the kernel migrating to or from device memory (or more generally allocating and freeing device memory and possibly even regular memory) *should* be aware of whatever hotness statistics are in use.

>
> But if i have a 512GB memory device, and i can see a wide swath of that
> 512GB is hot, while a good chunk of my local DRAM is not - then I
> probably don't care *what* gets migrated up to DRAM, i just care that a
> vast majority of that hot data does.
>
> The goal here isn't 100% precision, you will never get there. The goal
> here is broad-scope performance enhancements of the overall system
> while minimizing the cost to compute the migration actions to be taken.
>
> I don't think the contents of the page are always relevant.  The entire
> concept here is to enable migration without caring about what programs
> are using the memory for - just so long as the memcg's and zoning is
> respected.
>

At the very least I think you need to be aware of page *size*.  And if you want to avoid excessive fragmentation, you probably also want to be aware of the boundaries of a logical allocation.

I think that doing this entire process by PA, blind, from userspace will end up stuck in a not-so-good solution, and the ABI will be set in stone, and it will not be a great situation for long term maintainability or performance.

  reply	other threads:[~2023-09-19 18:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-07  7:54 [RFC 0/3] sys_move_phy_pages system call Gregory Price
2023-09-07  7:54 ` [RFC PATCH 1/3] mm/migrate: remove unused mm argument from do_move_pages_to_node Gregory Price
2023-09-07  7:54 ` [RFC PATCH 2/3] mm/migrate: refactor add_page_for_migration for code re-use Gregory Price
2023-09-07  7:54 ` [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall Gregory Price
2023-09-09  8:03   ` Arnd Bergmann
2023-09-08  7:46     ` Gregory Price
2023-09-09 15:18       ` Arnd Bergmann
2023-09-10 12:52         ` Gregory Price
2023-09-11 17:26           ` Arnd Bergmann
2023-09-10 15:01             ` Gregory Price
2023-09-10 20:36   ` Jonathan Corbet
2023-09-10 11:49     ` Gregory Price
2023-09-19  3:34       ` Andy Lutomirski
2023-09-19 16:31         ` Gregory Price
2023-09-19 17:59           ` Andy Lutomirski
2023-09-19 18:20             ` Gregory Price
2023-09-19 18:52               ` Andy Lutomirski [this message]
2023-09-19 19:59                 ` Gregory Price
2023-09-19  0:17   ` Thomas Gleixner
2023-09-19 15:57     ` Gregory Price
2023-09-19 16:37     ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b7ea860-55e2-48fb-86ba-ff3f9f6d8904@app.fastmail.com \
    --to=luto@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=gourry.memverge@gmail.com \
    --cc=gregory.price@memverge.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).