Linux cgroups development
 help / color / mirror / Atom feed
From: Yury Norov <ynorov@nvidia.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Farhad Alemi <farhad.alemi@berkeley.edu>,
	Waiman Long <longman@redhat.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	cgroups@vger.kernel.org
Subject: Re: [PATCH] mm: don't allow empty relative nodemask in mpol_relative_nodemask()
Date: Fri, 29 May 2026 13:47:12 -0400	[thread overview]
Message-ID: <ahnRIDBk4bQ3xX2q@yury> (raw)
In-Reply-To: <20260529152616.2308736-1-joshua.hahnjy@gmail.com>

On Fri, May 29, 2026 at 08:26:15AM -0700, Joshua Hahn wrote:
> On Thu, 28 May 2026 12:41:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 28 May 2026 15:03:37 -0400 Yury Norov <ynorov@nvidia.com> wrote:
> > 
> > > Reassigning nodes relative an empty user-provided nodemask is useless,
> > > and triggers divide-by-zero in the function.
> > > 
> > > Reported-by: Farhad Alemi <farhad.alemi@berkeley.edu>
> > > Link: https://lore.kernel.org/all/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@mail.gmail.com/
> > 
> > Thanks both.
> > 
> > It looks like this is very old code, so we'll be wanting a cc:stable in
> > this.
> > 
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -370,8 +370,13 @@ static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
> > >  static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig,
> > >  				   const nodemask_t *rel)
> > >  {
> > > +	unsigned int w = nodes_weight(*rel);
> > >  	nodemask_t tmp;
> > > -	nodes_fold(tmp, *orig, nodes_weight(*rel));
> > > +
> > > +	if (w == 0)
> > > +		return -EINVAL;
> > > +
> > > +	nodes_fold(tmp, *orig, w);
> > >  	nodes_onto(*ret, tmp, *rel);
> > >  }
> > 
> > I suspect we should address this at the mpol level - it should never
> > have got that far.  Hopefully the mempolicy maintainers can have a
> > think.
> 
> Hello Andrew, hello Yury,
> 
> I agree with Andrew here.
> mpol_relative_nodemask is called from two places, the first being
> mpol_rebind_nodemask which is the calling function seen in the bug report as
> well.
> 
> The other place is mpol_set_nodemask, which has a helpful comment that notes:
> "mpol_set_nodemask is called after mpol_new() [...snip...] mpol_new() has
> already validated the nodes parameter with respect to the policy mode and
> flags".
> 
> So it seems like we are missing the big if-else if-else if block from mpol_new
> in other places that should in fact have it, like mpol_rebind_nodemask.
> 
> The approach proposed here of just checking whether the node weight is 0
> won't work for a few cases, namely for MPOL_DEFAULT and MPOL_PREFERRED where
> empty nodemasks are actually allowed. So what should really be done here is to
> do the full policy-nodemask checking section in mpol_new and call that from
> mpol_set_nodemask as well.
> 
> Thank you for taking a shot at fixing the bug report, please let me know what
> you think! Have a great day : -)

Hi Joshua.

Indeed, quick and dirty shot.

The problem is that nodes_fold() can't work with the sz == 0. In
other words, folding to a 0-bit bitmap is an error. We don't check
that on bitmaps level because it's an internal helper, and it's a
caller's responsibility to validate the parameters.

nodes_onto(), or more specifically bitmap_onto(), is a different
story. In case of empty relmap, the function actually clears all the
bits in dst and returns.

I see 2 options to move this forward.

1. Simply disallow empty relmap in mpol_relative_nodemask(). There's
no valid cases for it, AFAIK, so the nodes_fold() limitation looks
reasonable. We can consider it as a new policy.

We've got 2 users for mpol_relative_nodemask(). In mpol_set_nodemask()
we can simply propagate the error; and in mpol_rebind_nodemask() we
can throw a warning and do nothing.

2. Follow the spirit of the nodes_onto(), and in case of empty
relmask, clean the ret mask and bail out

I'm in a favor for the 1st option, because empty relmask looks buggy
anyways.

> The approach proposed here of just checking whether the node weight is 0
> won't work for a few cases, namely for MPOL_DEFAULT and MPOL_PREFERRED where
> empty nodemasks are actually allowed.

Not sure I understand this. The mpol_relative_nodemask() is called
only if MPOL_F_RELATIVE_NODES is set. In mpol_rebind_nodemask(), if
both MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES are set, the former
wins. How would the RELATIVE mode mess with the others?

The mpol_new() code seemingly tries to disable empty nodes in case of
MPOL_DEFAILT and MPOL_PREFERRED + MPOL_F_RELATIVE_NODES, but obviously
it doesn't work very well in the rebind case.

Anyways, I'm not really deep in mempolicy domain, so please educate me if
I miss something.

Thanks,
Yury

  reply	other threads:[~2026-05-29 17:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28 19:03 [PATCH] mm: don't allow empty relative nodemask in mpol_relative_nodemask() Yury Norov
2026-05-28 19:37 ` Waiman Long
2026-05-28 19:40   ` Yury Norov
2026-05-28 19:37 ` Matthew Wilcox
2026-05-28 19:41 ` Andrew Morton
2026-05-29 15:26   ` Joshua Hahn
2026-05-29 17:47     ` Yury Norov [this message]
2026-05-29 18:40       ` Joshua Hahn
2026-05-29  8:47 ` kernel test robot
2026-05-29  8:58 ` kernel test robot
2026-05-29 12:45 ` kernel test robot
2026-05-29 12:47 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahnRIDBk4bQ3xX2q@yury \
    --to=ynorov@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=byungchul@sk.com \
    --cc=cgroups@vger.kernel.org \
    --cc=david@kernel.org \
    --cc=farhad.alemi@berkeley.edu \
    --cc=gourry@gourry.net \
    --cc=joshua.hahnjy@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=longman@redhat.com \
    --cc=matthew.brost@intel.com \
    --cc=rakie.kim@sk.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox