linux-api.vger.kernel.org archive mirror
From: Zhongkun He <hezhongkun.hzk@bytedance.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Vinicius Petrucci <vpetrucci@gmail.com>,
	akpm@linux-foundation.org,  linux-mm@vger.kernel.org,
	linux-cxl@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org,  linux-api@vger.kernel.org,
	minchan@kernel.org, dave.hansen@linux.intel.com,  x86@kernel.org,
	Jonathan.Cameron@huawei.com, aneesh.kumar@linux.ibm.com,
	 ying.huang@intel.com, dan.j.williams@intel.com, fvdl@google.com,
	 surenb@google.com, rientjes@google.com, hannes@cmpxchg.org,
	mhocko@suse.com,  Hasan.Maruf@amd.com, jgroves@micron.com,
	ravis.opensrc@micron.com,  sthanneeru@micron.com,
	emirakhur@micron.com, vtavarespetr@micron.com
Subject: Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for external memory binding
Date: Thu, 30 Nov 2023 17:34:04 +0800
Message-ID: <CACSyD1MrCzyV-93Ov07NpV3Nm3u0fYExmD1ShE_e2tapW6a6HA@mail.gmail.com>
In-Reply-To: <ZV/HSFMmv3xwkNPL@memverge.com>

Hi Gregory, sorry for the late reply.

I tried pidfd_set_mempolicy() (suggested by Michal) about a year ago.
There is a problem with it that may need attention.

A mempolicy can be associated either with a process or with a VMA.
All VMA manipulation is somewhat protected by a down_read on
mmap_lock. In process context (in alloc_pages()) there is no locking,
because only the process itself accesses its own state.

Now we need to change the process-context mempolicy of the task
specified by the pidfd. The old mempolicy may be freed by
pidfd_set_mempolicy() while alloc_pages() is still using it,
so there is a race condition.

Say something like the following:

pidfd_set_mempolicy()        target task stack:
                               alloc_pages:
                                 mpol = p->mempolicy;
  task_lock(task);
  old = task->mempolicy;
  task->mempolicy = new;
  task_unlock(task);
  mpol_put(old);
                                 /* old mpol has been freed. */
                                 policy_node(...., mpol)
                                 __alloc_pages();

To avoid locks and atomic operations (mpol_get/put) in the hot path,
the task mempolicy is used here without taking a reference or
holding a lock.
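
As a rough illustration only (a userspace sketch, not kernel code and
not a proposed fix), the interleaving above can be replayed with the
reader pinning the policy before using it. Every name ending in
_sketch is hypothetical, and the spinlock merely stands in for
task_lock(). It shows why a pinned reference keeps the old policy
alive across the replacement; it also adds exactly the lock and atomic
cost in the allocation path that the current code avoids.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct mempolicy_sketch {
	atomic_int refcnt;
	int mode;
};

struct task_sketch {
	atomic_flag lock;			/* stands in for task_lock() */
	struct mempolicy_sketch *mempolicy;
};

/* Only touched by the single-threaded replay below. */
static int freed_count;

static void task_lock_sketch(struct task_sketch *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void task_unlock_sketch(struct task_sketch *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static struct mempolicy_sketch *mpol_new_sketch(int mode)
{
	struct mempolicy_sketch *p = malloc(sizeof(*p));

	if (!p)
		abort();
	atomic_init(&p->refcnt, 1);	/* reference owned by the task */
	p->mode = mode;
	return p;
}

/* Drop one reference; free the policy when the last one goes away. */
static void mpol_put_sketch(struct mempolicy_sketch *p)
{
	int prev = atomic_fetch_sub(&p->refcnt, 1);

	assert(prev > 0);
	if (prev == 1) {
		freed_count++;
		free(p);
	}
}

/* Reader side: pin the current policy while holding the lock. */
static struct mempolicy_sketch *task_policy_get_sketch(struct task_sketch *t)
{
	struct mempolicy_sketch *p;

	task_lock_sketch(t);
	p = t->mempolicy;
	atomic_fetch_add(&p->refcnt, 1);
	task_unlock_sketch(t);
	return p;
}

/* Writer side: the same swap-then-put sequence as in the diagram. */
static void task_policy_replace_sketch(struct task_sketch *t,
				       struct mempolicy_sketch *newpol)
{
	struct mempolicy_sketch *old;

	task_lock_sketch(t);
	old = t->mempolicy;
	t->mempolicy = newpol;
	task_unlock_sketch(t);
	mpol_put_sketch(old);
}

/* Replays the interleaving in one thread; returns 1 if the pin held. */
int replay_interleaving(void)
{
	struct task_sketch t = {
		.lock = ATOMIC_FLAG_INIT,
		.mempolicy = mpol_new_sketch(1),
	};
	int base = freed_count;

	struct mempolicy_sketch *mpol = task_policy_get_sketch(&t);
	task_policy_replace_sketch(&t, mpol_new_sketch(2));
	int freed_early = freed_count - base;	/* old policy still pinned */
	int mode = mpol->mode;			/* safe: we hold a reference */
	mpol_put_sketch(mpol);			/* last reference, freed here */
	int freed_late = freed_count - base;
	int current_mode = t.mempolicy->mode;

	mpol_put_sketch(t.mempolicy);		/* clean up the new policy */
	return freed_early == 0 && mode == 1 &&
	       freed_late == 1 && current_mode == 2;
}
```

Calling replay_interleaving() returns 1 when the old policy survived
until the reader dropped its reference. Without the get/put pair in
the reader, the mpol->mode access would be the use-after-free from
the diagram.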

It would be great if your refactoring had a good solution for this.

Thanks.

On Sat, Nov 25, 2023 at 4:09 AM Gregory Price
<gregory.price@memverge.com> wrote:
>
> On Fri, Nov 24, 2023 at 04:13:41PM +0800, Zhongkun He wrote:
> >
> > Per my understanding, the process_mbind() is implementable without
> > many difficult challenges, since it is always protected by
> > mm->mmap_lock. But task mempolicy does not acquire any lock
> > in alloc_pages().
>
> per-vma policies are protected by the mmap lock, while the task
> mempolicy is protected by the task lock on replacement, and
> task->mems_allowed (protected by task_lock).
>
> There is an update in my refactor tickets that enforces the acquisition
> of task_lock on mpol_set_nodemask, which prevents the need for
> alloc_pages to do anything else.  That's not present in this patch.
>
> Basically mems_allowed deals with the majority of situations, and
> mmap_lock deals with per-vma mempolicy changes and migrations.
>
> ~Gregory
