public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: Vinicius Petrucci <vpetrucci@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@vger.kernel.org,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
	minchan@kernel.org, dave.hansen@linux.intel.com, x86@kernel.org,
	Jonathan.Cameron@huawei.com, aneesh.kumar@linux.ibm.com,
	ying.huang@intel.com, dan.j.williams@intel.com,
	hezhongkun.hzk@bytedance.com, fvdl@google.com, surenb@google.com,
	rientjes@google.com, hannes@cmpxchg.org, mhocko@suse.com,
	Hasan.Maruf@amd.com, jgroves@micron.com,
	ravis.opensrc@micron.com, sthanneeru@micron.com,
	emirakhur@micron.com, vtavarespetr@micron.com
Subject: Re: [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for external memory binding
Date: Wed, 22 Nov 2023 18:53:42 -0500	[thread overview]
Message-ID: <ZV6Uhsg6WLBtNqU3@memverge.com> (raw)
In-Reply-To: <ZV5zGROLefrsEcHJ@r13-u19.micron.com>

On Wed, Nov 22, 2023 at 03:31:05PM -0600, Vinicius Petrucci wrote:
> From: Vinicius Tavares Petrucci <vtavarespetr@micron.com>
> 
> The proposed API is as follows:
> 
> long process_mbind(int pidfd, 
>                 const struct iovec *iovec, 
>                 unsigned long vlen, 
>                 unsigned long mode, 
>                 const unsigned long *nmask,
>                 unsigned int flags);
> 
> The `pidfd` argument is used to select the process that is identified by the PID file 
> descriptor provided in pidfd. (See pidofd_open(2) for more information)
>

This is probably a more maintainable interface than the pid_t interface
in my RFC, but I have not used pidfd extensively myself.

I can pivot my RFC to utilize pidfd as a whole and pop this on top, if
the other interfaces are generally useful.

> Please note the initial `maxnode` parameter from `mbind` was omitted 
> to ensure the API doesn't exceed 6 arguments. Instead, the constant 
> MAX_NUMNODES was utilized.
> 

I don't think this will work, users have traditionally been allowed to
shorten their nodemasks, and also for some level of portability.

We may want to consider an arg structure, rather than just chopping an
argument off.

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 10a590ee1c89..91ee300fa728 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1215,11 +1215,10 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
>  }
>  #endif
>  
> -static long do_mbind(unsigned long start, unsigned long len,
> +static long do_mbind(struct mm_struct *mm, unsigned long start, unsigned long len,
>  		     unsigned short mode, unsigned short mode_flags,
>  		     nodemask_t *nmask, unsigned long flags)
>  {
> -	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma, *prev;
>  	struct vma_iterator vmi;
>  	struct migration_mpol mmpol;
> @@ -1465,10 +1464,84 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	return 0;
>  }

This is a completely insufficient change to do_mbind.  do_mbind utilizes
`current` in a variety of places for nodemask (cpuset) validation and to
acquire the task's lock.  This will not work the way you intend it to,
you end up mixing up node masks between current and target task.

see here:
https://lore.kernel.org/all/20231122211200.31620-7-gregory.price@memverge.com/

I had to make do_mbind operate on the target task explicitly.

We may want to combine this change and with my change so that your iovec
changes can be re-used, because that is a very nice feature.

> +	unsigned long maxnode = MAX_NUMNODES;
> +	int err;
> +	nodemask_t nodes;
> +
> +	err = sanitize_mpol_flags(&lmode, &mode_flags);
> +	if (err)
> +		goto out;
> +
> +	err = get_nodes(&nodes, nmask, maxnode);

per above, userland MAX_NUMNODES may or may not be equal to kernel
MAX_NUMNODES, so i don't think you can just chop this argument off.


I think we can combine our RFCs here and get this sorted out.

~Gregory

  parent reply	other threads:[~2023-11-22 23:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-22 21:31 [RFC PATCH] mm/mbind: Introduce process_mbind() syscall for external memory binding Vinicius Petrucci
2023-11-22 21:39 ` Andrew Morton
2023-11-22 21:45   ` Gregory Price
2023-11-22 23:57   ` Vinicius Petrucci
2023-11-22 23:53 ` Gregory Price [this message]
2023-11-23 15:21   ` Vinicius Petrucci
2023-11-24  8:13 ` Zhongkun He
2023-11-23 21:42   ` Gregory Price
2023-11-30  9:34     ` Zhongkun He
2023-11-30 16:07       ` Gregory Price
2023-12-01 13:53         ` Zhongkun He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZV6Uhsg6WLBtNqU3@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=Hasan.Maruf@amd.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=emirakhur@micron.com \
    --cc=fvdl@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hezhongkun.hzk@bytedance.com \
    --cc=jgroves@micron.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=ravis.opensrc@micron.com \
    --cc=rientjes@google.com \
    --cc=sthanneeru@micron.com \
    --cc=surenb@google.com \
    --cc=vpetrucci@gmail.com \
    --cc=vtavarespetr@micron.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox