public inbox for linux-doc@vger.kernel.org
From: Abel Wu <wuyun.abel@bytedance.com>
To: Michal Hocko <mhocko@suse.com>, Frank van der Linden <fvdl@google.com>
Cc: Zhongkun He <hezhongkun.hzk@bytedance.com>,
	corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()
Date: Wed, 12 Oct 2022 11:14:37 +0800
Message-ID: <85145c75-f2f6-a393-daa2-967251cc3443@bytedance.com>
In-Reply-To: <Y0XEAUD9Ujcu/j8y@dhcp22.suse.cz>

On 10/12/22 3:29 AM, Michal Hocko wrote:
> On Tue 11-10-22 10:22:23, Frank van der Linden wrote:
>> On Tue, Oct 11, 2022 at 8:00 AM Michal Hocko <mhocko@suse.com> wrote:
>>>
>>> On Mon 10-10-22 09:22:13, Frank van der Linden wrote:
>>>> For consistency with process_madvise(), I would suggest calling it
>>>> process_set_mempolicy.
>>>
>>> This operation has per-thread rather than per-process semantic so I do
>>> not think your proposed naming is better.
>>
>> True. I suppose you could argue that it should have been
>> pidfd_madvise() then for consistency, but that ship has sailed.
> 
> madvise commands have per mm semantic. It is set_mempolicy which is
> kinda special and it allows to have per task_struct semantic even when
> the actual allocation is in the same address space. To be honest I am
> not really sure why that is this way because threads aim to share memory
> so why should they have different memory policies?
> 
> I suspect that the original thinking was that some portions that are
> private to the process want to have different affinities (e.g. stacks
> and dedicated per cpu heap arenas). E.g. worker pools which want to be
> per-cpu local with their own allocations but operate on shared data that
> requires different policies.
> 
>>>> Other than that, this makes sense. To complete
>>>> the set, perhaps a process_mbind() should be added as well. What do
>>>> you think?
>>>
>>> Is there any real usecase for this interface? How is the caller supposed
>>> to make per-range decisions without a very involved coordination with
>>> the target process?
>>
>> The use case for a potential pidfd_mbind() is basically a combination
>> of what is described in the process_madvise proposal (
>> https://lore.kernel.org/lkml/20200901000633.1920247-1-minchan@kernel.org/
>> ), and what this proposal describes: system management software acting
>> as an orchestrator that has a better overview of the system as a whole
>> (NUMA nodes, memory tiering), and has knowledge of the layout of the
>> processes involved.

This is exactly why we are proposing pidfd/process_set_mempolicy().

> 
> Well, per address range operation is a completely different beast I
> would say. External tool would need to a) understand what that range is
> used for (e.g. stack/heap ranges, mmaped shared files like libraries or
> private mappings) and b) be in sync with memory layout modifications
> done by applications (e.g. that an mmap has been issued to back malloc
> request). Quite a lot of understanding about the specific process. I
> would say that with that intimate knowledge it is quite better to be
> part of the process and do those changes from within of the process
> itself.

Agreed. An orchestrator such as system management software may not know
enough about individual address ranges. I also don't think it is
appropriate for an orchestrator to overwrite the tasks' mempolicies,
since the apps set them for a purpose. So I suggested a per-mm policy
that has a lower priority than the tasks' policies.
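
To make the intended precedence concrete, here is a rough userspace model
of the lookup order I have in mind. It is only a sketch, not kernel code:
every type and function name below (mm_model, task_model, vma_model,
effective_policy) is made up for illustration, and it assumes the existing
order stays VMA policy first, then task policy, with the per-mm policy
consulted only before falling back to the system default.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects. None of these are real kernel types. */
enum mode { MODE_DEFAULT, MODE_PREFERRED, MODE_BIND, MODE_INTERLEAVE };

struct policy {
	enum mode mode;
	unsigned long nodemask;	/* one bit per NUMA node */
};

struct mm_model {
	const struct policy *mm_policy;		/* hypothetical per-mm policy */
};

struct task_model {
	const struct policy *task_policy;	/* what set_mempolicy() installs */
	const struct mm_model *mm;
};

struct vma_model {
	const struct policy *vma_policy;	/* what mbind() installs */
};

static const struct policy system_default = { MODE_DEFAULT, 0 };

/*
 * Resolve the policy for an allocation by task @t in range @v
 * (@v may be NULL when no VMA is involved).
 * Assumed order: VMA policy > task policy > per-mm policy > default.
 */
static const struct policy *effective_policy(const struct task_model *t,
					     const struct vma_model *v)
{
	assert(t != NULL);

	if (v && v->vma_policy)
		return v->vma_policy;
	if (t->task_policy)
		return t->task_policy;
	/* New step: only reached when the app set nothing itself. */
	if (t->mm && t->mm->mm_policy)
		return t->mm->mm_policy;
	return &system_default;
}
```

With this ordering, a thread that never called set_mempolicy() picks up
whatever the orchestrator installed on the mm, while a thread that set its
own policy, or a range that was mbind()ed, is left alone.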

Thanks & BR,
Abel
