linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	akpm@linux-foundation.org, arnd@arndb.de, tglx@linutronix.de,
	luto@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mhocko@kernel.org,
	tj@kernel.org, corbet@lwn.net, rakie.kim@sk.com,
	hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
	peterz@infradead.org, jgroves@micron.com,
	ravis.opensrc@micron.com, sthanneeru@micron.com,
	emirakhur@micron.com, Hasan.Maruf@amd.com,
	seungjun.ha@samsung.com
Subject: Re: [PATCH v5 01/11] mm/mempolicy: implement the sysfs-based weighted_interleave interface
Date: Tue, 26 Dec 2023 01:48:05 -0500	[thread overview]
Message-ID: <ZYp3JbcCPQc4fUrB@memverge.com> (raw)
In-Reply-To: <877cl0f8oo.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Dec 27, 2023 at 02:42:15PM +0800, Huang, Ying wrote:
> Gregory Price <gourry.memverge@gmail.com> writes:
> 
> > +		These weights only affect new allocations, and changes at runtime
> > +		will not cause migrations on already allocated pages.
> > +
> > +		Writing an empty string resets the weight value to 1.
> 
> I still think that it's a good idea to provide some better default
> weight value with HMAT or CDAT if available.  So, better not to make "1"
> as part of ABI?
> 

That's the eventual goal, but this is just the initial mechanism.

My current thought is that the CXL driver will apply weights as the
system iterates through devices and creates numa nodes.  In the
meantime, you have to give the "possible" nodes a default value to
prevent nodes onlined after boot from showing up with 0-value.

Not allowing 0-value weights is simply easier in many respects.

> > +
> > +		Minimum weight: 1
> 
> Can weight be "0"?  Do we need a way to specify that a node don't want
> to participate weighted interleave?
> 

In this code, weight cannot be 0.  My thoguht is that removing the node
from the nodemask is the way to denote 0.

The problem with 0 is hotplug, migration, and cpusets.mems_allowed.  

Example issue:  Use set local weights to [1,0,1,0] for nodes [0-3],
and has a cpusets.mems_allowed mask of (0, 2).

Lets say the user migrates the task via cgroups from nodes (0,2) to
(1,3).

The task will instantly crash as basically OOM because weights of
[1,0,1,0] will prevent memory from being allocations.

Not allowing nodes weights of 0 is defensive.  Instead, simply removing
the node from the nodemask and/or mems_allowed is both equivalent to and
the preferred way to apply a weight of 0.

> > +		Maximum weight: 255
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 10a590ee1c89..0e77633b07a5 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -131,6 +131,8 @@ static struct mempolicy default_policy = {
> >  
> >  static struct mempolicy preferred_node_policy[MAX_NUMNODES];
> >  
> > +static char iw_table[MAX_NUMNODES];
> > +
> 
> It's kind of obscure whether "char" is "signed" or "unsigned".  Given
> the max weight is 255 above, it's better to use "u8"?
>

bah, stupid mistake.  I will switch this to u8.

> And, we may need a way to specify whether the weight has been overridden
> by the user.
> A special value (such as 255) can be used for that.  If
> so, the maximum weight should be 254 instead of 255.  As a user space
> interface, is it better to use 100 as the maximum value?
> 

There's global weights and local weights.  These are the global weights.

Local weights are stored in task->mempolicy.wil.il_weights.

(policy->mode_flags & MPOL_F_GWEIGHT) denotes the override.
This is set if (mempolicy_args->il_weights) was provided.

This simplifies the interface.

(note: local weights are not introduced until the last patch 11/11)

> > +
> > +static void sysfs_mempolicy_release(struct kobject *mempolicy_kobj)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < MAX_NUMNODES; i++)
> > +		sysfs_wi_node_release(node_attrs[i], mempolicy_kobj);
> 
> IIUC, if this is called in error path (such as, in
> add_weighted_interleave_group()), some node_attrs[] element may be
> "NULL"?
> 

The null check is present in sysfs_wi_node_release

if (!node_attr)
	return;

Is it preferable to pull this out? Seemed more defensive to put it
inside the function.

~Gregory

  reply	other threads:[~2023-12-27 15:29 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-23 18:10 [PATCH v5 00/11] mempolicy2, mbind2, and weighted interleave Gregory Price
2023-12-23 18:10 ` [PATCH v5 01/11] mm/mempolicy: implement the sysfs-based weighted_interleave interface Gregory Price
2023-12-27  6:42   ` Huang, Ying
2023-12-26  6:48     ` Gregory Price [this message]
2024-01-02  7:41       ` Huang, Ying
2024-01-02 19:45         ` Gregory Price
2024-01-03  2:45           ` Huang, Ying
2024-01-03  2:59             ` Gregory Price
2024-01-03  6:03               ` Huang, Ying
2024-01-03  2:46         ` Gregory Price
2023-12-23 18:10 ` [PATCH v5 02/11] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving Gregory Price
2023-12-27  8:32   ` Huang, Ying
2023-12-26  7:01     ` Gregory Price
2023-12-26  8:06       ` Gregory Price
2023-12-26 11:32       ` Gregory Price
2024-01-02  8:42       ` Huang, Ying
2024-01-02 20:30         ` Gregory Price
2024-01-03  5:46           ` Huang, Ying
2024-01-03 22:09             ` Gregory Price
2024-01-04  5:39               ` Huang, Ying
2024-01-04 18:59                 ` Gregory Price
2024-01-05  6:51                   ` Huang, Ying
2024-01-05  7:25                     ` Gregory Price
2024-01-08  7:08                       ` Huang, Ying
2023-12-23 18:10 ` [PATCH v5 03/11] mm/mempolicy: refactor sanitize_mpol_flags for reuse Gregory Price
2023-12-27  8:39   ` Huang, Ying
2023-12-26  7:05     ` Gregory Price
2023-12-26 11:48       ` Gregory Price
2024-01-02  9:09         ` Huang, Ying
2024-01-02 20:32           ` Gregory Price
2023-12-23 18:10 ` [PATCH v5 04/11] mm/mempolicy: create struct mempolicy_args for creating new mempolicies Gregory Price
2023-12-23 18:10 ` [PATCH v5 05/11] mm/mempolicy: refactor kernel_get_mempolicy for code re-use Gregory Price
2023-12-23 18:10 ` [PATCH v5 06/11] mm/mempolicy: allow home_node to be set by mpol_new Gregory Price
2023-12-23 18:10 ` [PATCH v5 07/11] mm/mempolicy: add userland mempolicy arg structure Gregory Price
2023-12-23 18:10 ` [PATCH v5 08/11] mm/mempolicy: add set_mempolicy2 syscall Gregory Price
2024-01-02 14:38   ` Geert Uytterhoeven
2023-12-23 18:10 ` [PATCH v5 09/11] mm/mempolicy: add get_mempolicy2 syscall Gregory Price
2024-01-02 14:46   ` Geert Uytterhoeven
2023-12-23 18:11 ` [PATCH v5 10/11] mm/mempolicy: add the mbind2 syscall Gregory Price
2024-01-02 14:47   ` Geert Uytterhoeven
2023-12-23 18:11 ` [PATCH v5 11/11] mm/mempolicy: extend set_mempolicy2 and mbind2 to support weighted interleave Gregory Price
2023-12-25  7:54 ` [PATCH v5 00/11] mempolicy2, mbind2, and " Huang, Ying
2023-12-26  7:45   ` Gregory Price
2024-01-02  4:27     ` Huang, Ying
2024-01-02 19:06       ` Gregory Price
2024-01-03  3:15         ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZYp3JbcCPQc4fUrB@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=Hasan.Maruf@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=emirakhur@micron.com \
    --cc=gourry.memverge@gmail.com \
    --cc=honggyu.kim@sk.com \
    --cc=hpa@zytor.com \
    --cc=hyeongtak.ji@sk.com \
    --cc=jgroves@micron.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rakie.kim@sk.com \
    --cc=ravis.opensrc@micron.com \
    --cc=seungjun.ha@samsung.com \
    --cc=sthanneeru@micron.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vtavarespetr@micron.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).