All of lore.kernel.org
 help / color / mirror / Atom feed
From: Breno Leitao <leitao@debian.org>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>,
	 Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	usamaarif642@gmail.com, kas@kernel.org, kernel-team@meta.com
Subject: Re: [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Date: Thu, 5 Mar 2026 03:48:07 -0800	[thread overview]
Message-ID: <aalXA_nP2k2OrHhc@gmail.com> (raw)
In-Reply-To: <ec07d7f0-cad4-4d9b-8e40-d4ded8170340@lucifer.local>

On Wed, Mar 04, 2026 at 04:40:22PM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 04, 2026 at 02:22:33AM -0800, Breno Leitao wrote:
> > Writing "never" (or any other value) multiple times to
> > /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled calls
> > start_stop_khugepaged() each time, even when nothing actually changed.
> > This causes set_recommended_min_free_kbytes() to run unconditionally,
> > which is unnecessary and floods the printk buffer with "raising
> > min_free_kbytes" messages. Example:
> >
> >   # for i in $(seq 100); do
> >   #       echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >   # done
> >
> >   # dmesg | grep "min_free_kbytes is not updated" | wc -l
> >   100
> >
> > Use test_and_set_bit()/test_and_clear_bit() instead of the plain
> > variants to detect whether any bit actually flipped, and skip the
> > start_stop_khugepaged() call entirely when the configuration is
> > unchanged.
> >
> > With this patch, redoing the same operation becomes a no-op.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> General concept is sensible, but let's improve this code please.

Ack! Thanks for the suggestions.

> >  		spin_unlock(&huge_anon_orders_lock);
> >  	} else
> >  		ret = -EINVAL;
> >
> > -	if (ret > 0) {
> > +	if (ret > 0 && changed) {
> >  		int err;
> >
> >  		err = start_stop_khugepaged();
> 
> There's a caveat here as mentioned in reply to Kiryl - I'm concerned users
> might rely on the set recommended min kbytes even when things don't change.
> 
> Not sure how likely that is, but it's a user-visible change in how this behaves.

Is there any motivation that users are retouching
/sys/kernel/mm/transparent_hugepage just to trigger
set_recommended_min_free_kbytes() ? That seems weird, but, I will keep it in
the change.

> From cb2c4c8bf183ef0d10068cfd12c12d19cb17a241 Mon Sep 17 00:00:00 2001
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Wed, 4 Mar 2026 16:37:20 +0000
> Subject: [PATCH] idea
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks for the idea. Let me hack on top of it, and propose a v2.

> ---
>  mm/huge_memory.c | 74 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0df1f4a17430..97dabbeb9112 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -515,46 +515,64 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
>  	return sysfs_emit(buf, "%s\n", output);
>  }
> 
> +enum huge_mode {
> +	HUGE_ALWAYS,
> +	HUGE_INHERIT,
> +	HUGE_MADVISE,
> +	HUGE_NUM_MODES,
> +	HUGE_NEVER,
> +};
> +
> +static bool change_anon_orders(int order, enum huge_mode mode)
> +{
> +	static unsigned long *orders[] = {
> +		&huge_anon_orders_always,
> +		&huge_anon_orders_inherit,
> +		&huge_anon_orders_madvise,
> +	};
> +	bool changed = false;
> +	int i;
> +
> +	spin_lock(&huge_anon_orders_lock);
> +	for (i = 0; i < HUGE_NUM_MODES; i++) {

> +		if (i == mode)
> +			changed |= !test_and_set_bit(order, orders[mode]);
> +		else
> +			changed |= test_and_clear_bit(order, orders[mode]);

I suppose we want s/mode/i in the test_and_{clear,set}_bit() here:

		if (i == mode)
			// set for mode
			changed |= !test_and_set_bit(order, orders[i]);
		else
			// clear for !mode
			changed |= test_and_clear_bit(order, orders[i]);

For two reasons:
	* you want to unset "i" when i != mode.
	* you would have an OOB when accessing orders[HUGE_NEVER == 4]


>  static ssize_t anon_enabled_store(struct kobject *kobj,
>  				 struct kobj_attribute *attr,
>  				 const char *buf, size_t count)
>  {
>  	int order = to_thpsize(kobj)->order;
>  	ssize_t ret = count;
> +	bool changed;
> +
> +	if (sysfs_streq(buf, "always"))
> +		changed = change_anon_orders(order, HUGE_ALWAYS);
> +	else if (sysfs_streq(buf, "inherit"))
> +		changed = change_anon_orders(order, HUGE_INHERIT);
> +	else if (sysfs_streq(buf, "madvise"))
> +		changed = change_anon_orders(order, HUGE_MADVISE);
> +	else if (sysfs_streq(buf, "never"))
> +		changed = change_anon_orders(order, HUGE_NEVER);
> +	else
> +		return -EINVAL;

I think we can simplify anon_enabled_store() even more, by leveraging sysfs_match_string().
Something like:

	static const char *const anon_mode_strings[] = {
		[HUGE_ALWAYS]   = "always",
		[HUGE_INHERIT]  = "inherit",
		[HUGE_MADVISE]  = "madvise",
		[HUGE_NEVER]    = "never",
		NULL,
	};

and then

	static ssize_t anon_enabled_store(struct kobject *kobj,
					struct kobj_attribute *attr,
					const char *buf, size_t count)
	{
		int order = to_thpsize(kobj)->order;
		int mode;

		mode = sysfs_match_string(enabled_mode_strings, buf);
		if (mode < 0)
			return mode;

		if (change_anon_orders(order, mode)) {
			int err = start_stop_khugepaged();

			if (err)
				return err;
		} else {
			/* Users expect this even if unchanged. TODO: Put in header... */
			//set_recommended_min_free_kbytes();
		}
		return count;
	}



Anyway, I like this approach, thanks!. Let me hack a v2 based on it.

--breno


  reply	other threads:[~2026-03-05 11:48 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-04 10:22 [PATCH 0/2] mm: thp: reduce unnecessary start_stop_khugepaged() calls Breno Leitao
2026-03-04 10:22 ` [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store() Breno Leitao
2026-03-04 16:40   ` Lorenzo Stoakes (Oracle)
2026-03-05 11:48     ` Breno Leitao [this message]
2026-03-05 12:30       ` Lorenzo Stoakes (Oracle)
2026-03-05 12:44     ` David Hildenbrand (Arm)
2026-03-04 10:22 ` [PATCH 2/2] mm: thp: avoid calling start_stop_khugepaged() in enabled_store() Breno Leitao
2026-03-04 16:40   ` Lorenzo Stoakes (Oracle)
2026-03-04 11:18 ` [PATCH 0/2] mm: thp: reduce unnecessary start_stop_khugepaged() calls Kiryl Shutsemau
2026-03-04 11:53   ` Breno Leitao
2026-03-04 16:24   ` Lorenzo Stoakes (Oracle)
2026-03-05 12:41     ` David Hildenbrand (Arm)
2026-03-05 12:45       ` Lorenzo Stoakes (Oracle)
2026-03-05 12:46         ` Lorenzo Stoakes (Oracle)
2026-03-05 13:14         ` David Hildenbrand (Arm)
2026-03-04 16:17 ` Lorenzo Stoakes (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aalXA_nP2k2OrHhc@gmail.com \
    --to=leitao@debian.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=kas@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.