From: Michal Hocko <mhocko@suse.com>
To: Frank van der Linden <fvdl@google.com>
Cc: Zhongkun He <hezhongkun.hzk@bytedance.com>,
corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
linux-doc@vger.kernel.org, wuyun.abel@bytedance.com
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()
Date: Tue, 11 Oct 2022 21:29:05 +0200
Message-ID: <Y0XEAUD9Ujcu/j8y@dhcp22.suse.cz>
In-Reply-To: <CAPTztWZZOxtzdEm=wbOiL_VDPJuCaW0XVCvsdRpCHE+ph+5eZQ@mail.gmail.com>
On Tue 11-10-22 10:22:23, Frank van der Linden wrote:
> On Tue, Oct 11, 2022 at 8:00 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 10-10-22 09:22:13, Frank van der Linden wrote:
> > > For consistency with process_madvise(), I would suggest calling it
> > > process_set_mempolicy.
> >
> > This operation has per-thread rather than per-process semantic so I do
> > not think your proposed naming is better.
>
> True. I suppose you could argue that it should have been
> pidfd_madvise() then for consistency, but that ship has sailed.
madvise commands have per-mm semantics. It is set_mempolicy which is
kinda special: it allows per-task_struct semantics even when the
actual allocation is in the same address space. To be honest, I am not
really sure why it is this way; threads are meant to share memory, so
why should they have different memory policies?
I suspect that the original thinking was that some portions that are
private to the process want to have different affinities (e.g. stacks
and dedicated per cpu heap arenas). E.g. worker pools which want to be
per-cpu local with their own allocations but operate on shared data that
requires different policies.
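
To illustrate the per-task semantics discussed here, below is a minimal
sketch (it is not part of the original message). It assumes a Linux
kernel built with NUMA support and that node 0 has memory. The function
names current_mode(), worker() and policy_is_per_thread() are made up for
the example, and the raw syscall(2) interface is used only to avoid a
libnuma dependency. A worker thread switches itself to interleave mode,
and the caller then checks that its own policy is unchanged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mode values as defined in the kernel UAPI header <linux/mempolicy.h>. */
enum { MPOL_DEFAULT = 0, MPOL_INTERLEAVE = 3 };

/* Return the calling thread's policy mode, or -1 if the syscall fails. */
static int current_mode(void)
{
    int mode = -1;

    if (syscall(SYS_get_mempolicy, &mode, NULL, 0UL, NULL, 0UL) != 0)
        return -1;
    return mode;
}

/* Switch only this thread to interleave over node 0, then report its mode. */
static void *worker(void *arg)
{
    int *mode_out = arg;
    unsigned long nodemask = 1UL; /* bit 0 = node 0 (assumed to have memory) */

    assert(mode_out != NULL);
    *mode_out = -1;
    if (syscall(SYS_set_mempolicy, (long)MPOL_INTERLEAVE, &nodemask,
                (unsigned long)(8 * sizeof(nodemask))) == 0)
        *mode_out = current_mode();
    return NULL;
}

/*
 * Returns 1 if the worker changed its own policy while the caller's policy
 * stayed as it was, 0 if the change leaked to the caller, and -1 if the
 * check is inconclusive (syscalls unavailable, or the caller was already
 * running with an interleave policy).
 */
int policy_is_per_thread(void)
{
    pthread_t thread;
    int worker_mode = -1;
    int before = current_mode();
    int after;

    if (before < 0 || before == MPOL_INTERLEAVE)
        return -1;
    if (pthread_create(&thread, NULL, worker, &worker_mode) != 0)
        return -1;
    pthread_join(thread, NULL);
    if (worker_mode != MPOL_INTERLEAVE)
        return -1;

    after = current_mode();
    return after == before ? 1 : 0;
}
```

A caller would simply check the return value of policy_is_per_thread().
The -1 case covers environments where the syscalls are filtered (e.g. by
a seccomp profile) or the kernel has no NUMA support. With glibc older
than 2.34 the program has to be linked with -pthread.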
> > > Other than that, this makes sense. To complete
> > > the set, perhaps a process_mbind() should be added as well. What do
> > > you think?
> >
> > Is there any real usecase for this interface? How is the caller supposed
> > to make per-range decisions without a very involved coordination with
> > the target process?
>
> The use case for a potential pidfd_mbind() is basically a combination
> of what is described for in the process_madvise proposal (
> https://lore.kernel.org/lkml/20200901000633.1920247-1-minchan@kernel.org/
> ), and what this proposal describes: system management software acting
> as an orchestrator that has a better overview of the system as a whole
> (NUMA nodes, memory tiering), and has knowledge of the layout of the
> processes involved.
Well, a per-address-range operation is a completely different beast, I
would say. An external tool would need to a) understand what each range
is used for (e.g. stack/heap ranges, mmapped shared files like
libraries, or private mappings) and b) be in sync with memory layout
modifications done by the application (e.g. that an mmap has been issued
to back a malloc request). That requires quite a lot of understanding of
the specific process. I would say that with such intimate knowledge it
is much better to be part of the process and make those changes from
within the process itself.
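
As a rough illustration of requirement a), the sketch below (not part of
the original message) classifies one line of /proc/<pid>/maps, which is
about the only view of the layout an external tool has. The names
range_kind, struct range and parse_maps_line() are hypothetical, and the
classification is deliberately coarse. It does nothing for requirement
b): any snapshot taken this way can be stale as soon as the target
calls mmap or munmap again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum range_kind {
    RANGE_STACK, /* "[stack]" */
    RANGE_HEAP,  /* "[heap]", the brk area only */
    RANGE_FILE,  /* file-backed mapping (libraries, data files) */
    RANGE_ANON,  /* anonymous mapping without a name */
    RANGE_OTHER  /* "[vdso]", "[vvar]" and other special entries */
};

struct range {
    unsigned long long start;
    unsigned long long end;
    enum range_kind kind;
};

/*
 * Parse a single line in /proc/<pid>/maps format:
 *   start-end perms offset dev inode [pathname]
 * Returns 0 on success and -1 if the line does not match.
 */
int parse_maps_line(const char *line, struct range *out)
{
    char perms[5] = "";
    char path[256] = "";
    int fields;

    assert(line != NULL && out != NULL);

    /* The pathname is optional, so three converted fields are enough. */
    fields = sscanf(line, "%llx-%llx %4s %*s %*s %*s %255[^\n]",
                    &out->start, &out->end, perms, path);
    if (fields < 3)
        return -1;

    if (strcmp(path, "[stack]") == 0)
        out->kind = RANGE_STACK;
    else if (strcmp(path, "[heap]") == 0)
        out->kind = RANGE_HEAP;
    else if (path[0] == '/')
        out->kind = RANGE_FILE;
    else if (path[0] == '\0')
        out->kind = RANGE_ANON;
    else
        out->kind = RANGE_OTHER;
    return 0;
}
```

Note that an arena that malloc backs with mmap shows up as RANGE_ANON,
indistinguishable from any other anonymous mapping, which is exactly the
kind of knowledge only the process itself has.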
--
Michal Hocko
SUSE Labs