From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@redhat.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,  Zi Yan <ziy@nvidia.com>,
	 Matthew Brost <matthew.brost@intel.com>,
	 Rakie Kim <rakie.kim@sk.com>,  Byungchul Park <byungchul@sk.com>,
	 Gregory Price <gourry@gourry.net>,
	 Alistair Popple <apopple@nvidia.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	 kernel-team@meta.com, Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH] mempolicy: Clarify what RECLAIM_ZONE means
Date: Thu, 31 Jul 2025 09:48:54 +0800
Message-ID: <87tt2t9lkp.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250730201908.2395933-1-joshua.hahnjy@gmail.com> (Joshua Hahn's message of "Wed, 30 Jul 2025 13:19:07 -0700")

Joshua Hahn <joshua.hahnjy@gmail.com> writes:

> On Tue, 29 Jul 2025 08:58:49 +0800 "Huang, Ying" <ying.huang@linux.alibaba.com> wrote:
>
>> Joshua Hahn <joshua.hahnjy@gmail.com> writes:
>> 
>> > On Mon, 28 Jul 2025 09:44:06 +0800 "Huang, Ying" <ying.huang@linux.alibaba.com> wrote:
>> >
>> >> Hi, Joshua,
>> >> 
>> >> Joshua Hahn <joshua.hahnjy@gmail.com> writes:
>> >> 
>> >> > The zone_reclaim_mode API controls reclaim behavior when a node runs out of
>> >> > memory. Contrary to its user-facing name, it is internally referred to as
>> >> > "node_reclaim_mode". This is slightly confusing but there is not much we can
>> >> > do given that it has already been exposed to userspace (since at least 2.6).
>> >> >
>> >> > However, what we can do is make sure that the internal description of the
>> >> > bits inside zone_reclaim_mode aligns with what they do in practice.
>> >> > Setting RECLAIM_ZONE does indeed run shrink_inactive_list, but a more holistic
>> >> > description would be to explain that zone reclaim modulates whether page
>> >> > allocation (and khugepaged collapsing) prefers reclaiming & attempting to
>> >> > allocate locally or should fall back to the next node in the zonelist.
>> >> >
>> >> > Change the description to clarify what zone reclaim entails.
>> >> >
>> >> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>> >> > ---
>> >> >  include/uapi/linux/mempolicy.h | 2 +-
>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
>> >> > index 1f9bb10d1a47..24083809d920 100644
>> >> > --- a/include/uapi/linux/mempolicy.h
>> >> > +++ b/include/uapi/linux/mempolicy.h
>> >> > @@ -69,7 +69,7 @@ enum {
>> >> >   * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
>> >> >   * ABI.  New bits are OK, but existing bits can never change.
>> >> >   */
>> >> > -#define RECLAIM_ZONE	(1<<0)	/* Run shrink_inactive_list on the zone */
>> >> > +#define RECLAIM_ZONE	(1<<0)	/* Prefer reclaiming & allocating locally */
>> >> >  #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
>> >> >  #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
>> >> >  
>> >> >
>> >> > base-commit: 25fae0b93d1d7ddb25958bcb90c3c0e5e0e202bd
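The three RECLAIM_* values in the hunk above are bit positions that are OR-ed together into the single vm.zone_reclaim_mode value, so a setting of 3 means RECLAIM_ZONE plus RECLAIM_WRITE. The sketch below decodes such a value into flag names. Only the three #define values come from the patch; the helper describe_zone_reclaim_mode() is hypothetical and is not a kernel or library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit values as defined in include/uapi/linux/mempolicy.h. */
#define RECLAIM_ZONE  (1 << 0) /* zone reclaim on */
#define RECLAIM_WRITE (1 << 1) /* write out dirty pages during reclaim */
#define RECLAIM_UNMAP (1 << 2) /* unmap mapped pages during reclaim */

/*
 * Hypothetical helper (not a kernel API): render a vm.zone_reclaim_mode
 * value as "FLAG|FLAG|..." into buf and return buf. Bits not listed above
 * are appended in hex; a value of 0 is rendered as "off".
 */
static const char *describe_zone_reclaim_mode(unsigned int mode,
                                              char *buf, size_t len)
{
    static const struct {
        unsigned int bit;
        const char *name;
    } flags[] = {
        { RECLAIM_ZONE,  "RECLAIM_ZONE"  },
        { RECLAIM_WRITE, "RECLAIM_WRITE" },
        { RECLAIM_UNMAP, "RECLAIM_UNMAP" },
    };
    const unsigned int known = RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (mode == 0) {
        snprintf(buf, len, "off");
        return buf;
    }

    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(mode & flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return buf; /* truncated: buffer too small */
        used += (size_t)n;
    }

    /* Keep bits this helper does not know about visible instead of dropping them. */
    if (mode & ~known)
        snprintf(buf + used, len - used, "%s0x%x",
                 used ? "|" : "", mode & ~known);
    return buf;
}
```

For example, a value of 3 is rendered as "RECLAIM_ZONE|RECLAIM_WRITE", and a currently unassigned bit such as 8 is shown as "0x8" rather than silently ignored, since the ABI comment allows new bits to be added later.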
>> >
>> > Hi Ying, thanks for your review, as always!
>> >
>> >> Please consider the document of zone_reclaim_mode in
>> >> Documentation/admin-guide/sysctl/vm.rst too.
>> >
>> > Yes, will do. Along with SJ's comment, I think that the information in the
>> > admin-guide should be sufficient to explain what these bits do, so my
>> > patch may not be necessary.
>> >
>> >> And, IIUC, RECLAIM_ZONE doesn't mean "locally" exactly.  It's legal to
>> >> bind to some node other than "local node".
>> >
>> > You are correct, it seems you can also reclaim on non-local nodes once you
>> > go further down in the zonelist. I think my intent with the new comment was just
>> > to indicate a preference to reclaim and allocate on the *current* node, as
>> > opposed to falling back to the next node in the zonelist.
>> >
>> > With that said, I think your comment along with SJ's feedback have gotten me
>> > to understand that we probably don't need this change : -)
>> 
>> TBH, I think that it's good to make some change to the comments,
>> because IMHO the original comments are bound to specific
>> implementation details.  More general wording may be better for a
>> user space API description.
>
> Hi Ying, sorry for the late reply.
>
> I think that is a good point. In that case, maybe we can take SJ's comment
> and keep both the implementation detail (i.e. that it will run
> shrink_inactive_list on the zone) and a general description of what
> happens (that it prefers this over allocating on the next node)?

Yes.  Something like this, or

Try to reclaim in the current node/zone before allocating on the fallback.
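The suggested wording, and the 2-node behavior Joshua describes below, can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's allocator: struct toy_node, toy_alloc() and all page counts are hypothetical; watermarks, kswapd and the real reclaim path are ignored; and "reclaim" simply turns reclaimable pages into free pages on the node currently being tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-node state; the numbers are made up for illustration. */
struct toy_node {
    int id;
    long free_pages;        /* pages available right now */
    long reclaimable_pages; /* clean page cache that reclaim could free */
};

/*
 * Toy model of one allocation of nr_pages, walking nodes in zonelist
 * order (zonelist[0] is the preferred node). Returns the id of the node
 * that satisfied the request, or -1 if no node could.
 *
 * zone_reclaim_on == true:  reclaim on each node before moving to the next.
 * zone_reclaim_on == false: move to the next node as soon as one is short;
 *                           reclaim is not modeled on this path at all.
 */
static int toy_alloc(struct toy_node *zonelist, size_t nr_nodes,
                     long nr_pages, bool zone_reclaim_on)
{
    assert(zonelist != NULL && nr_pages > 0);

    for (size_t i = 0; i < nr_nodes; i++) {
        struct toy_node *n = &zonelist[i];

        if (n->free_pages < nr_pages && zone_reclaim_on) {
            /* Reclaim only what is missing, limited by what is reclaimable. */
            long want = nr_pages - n->free_pages;
            long got = want < n->reclaimable_pages ? want : n->reclaimable_pages;
            n->reclaimable_pages -= got;
            n->free_pages += got;
        }
        if (n->free_pages >= nr_pages) {
            n->free_pages -= nr_pages;
            return n->id;
        }
        /* Still short: fall back to the next node in the zonelist. */
    }
    return -1;
}
```

With node 0 holding 10 free and 100 reclaimable pages and node 1 holding 1000 free pages, a 50-page request lands on node 0 when zone reclaim is on (after reclaiming 40 pages there) and on node 1 when it is off. If node 0 cannot reclaim enough, the request still falls back to node 1.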

> On that note, one thing that I felt was slightly undercaptured in
> Documentation/admin-guide is what "zone reclaim" actually means. What it does
> is of course well captured by its name, but it misses the nuance of preferring
> reclaim over fallback allocation.
>
> Actually, the motivation behind this whole conversation is that I saw
> zone reclaim preventing allocation on the second node of a 2-node NUMA
> system, and was a bit confused until I understood the implications of
> enabling zone reclaim.

Yes.  It's good to improve the document.  If it confused you, it may
confuse others too.

> Anyway, I can probably respin the patch to explain what zone reclaim is
> in the comment block above the bits.
>
> But please feel free to correct me if you feel that the descriptions available
> in both the mempolicy.h uapi file or the Documentation/admin-guide is already
> enough.

Thanks for doing this.

---
Best Regards,
Huang, Ying

