All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qi Zheng <qi.zheng@linux.dev>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, david@kernel.org,
	lorenzo.stoakes@oracle.com, ziy@nvidia.com,
	yosry.ahmed@linux.dev, imran.f.khan@oracle.com,
	kamalesh.babulal@oracle.com, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com,
	chenridong@huaweicloud.com, mkoutny@suse.com,
	akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
	apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com,
	usamaarif642@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
Date: Wed, 25 Mar 2026 11:25:06 +0800	[thread overview]
Message-ID: <5a9d9152-8e23-4438-a322-ec524fd159a6@linux.dev> (raw)
In-Reply-To: <acM93qmvDFofjWdM@hyeyoo>



On 3/25/26 9:43 AM, Harry Yoo (Oracle) wrote:
> On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>
>> The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
>> reparent non-hierarchical stats, the values passed to them might exceed
>> the upper limit of the type int, so correct the val parameter type of them
>> to long.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   include/trace/events/memcg.h | 10 +++++-----
>>   mm/memcontrol.c              |  8 ++++----
>>   2 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 7fb9cbc10dfbb..4a78550f6174e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>>   
>>   #ifdef CONFIG_MEMCG_V1
>>   static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
>> -				     enum node_stat_item idx, int val);
>> +				     enum node_stat_item idx, long val);
>>   
>>   void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
>>   				       struct mem_cgroup *parent, int idx)
>> @@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
>>    * Normalize the value passed into memcg_rstat_updated() to be in pages. Round
>>    * up non-zero sub-page updates to 1 page as zero page updates are ignored.
>>    */
>> -static int memcg_state_val_in_pages(int idx, int val)
>> +static long memcg_state_val_in_pages(int idx, long val)
>>   {
>>   	int unit = memcg_page_state_unit(idx);
> 
> Sashiko AI made an interesting argument [1] that this could lead to
> incorrectly returning a very large positive number. Let me verify that.
> 
> [1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
> 
> Sashiko wrote:
>> Does this change inadvertently break the handling of negative byte-sized
>> updates?
>> Looking at the rest of the function:
>> 	if (!val || unit == PAGE_SIZE)
>> 		return val;
>> 	else
>> 		return max(val * unit / PAGE_SIZE, 1UL);
> 
>> PAGE_SIZE is defined as an unsigned long.
> 
> Right, it's defined as 1UL << PAGE_SHIFT.
> 
>> When val is negative, such as during uncharging of byte-sized stats like
>> MEMCG_ZSWAP_B, the expression val * unit is a negative long.
> 
> Right.
> 
>> Dividing a signed long by an unsigned long causes the signed long to be
>> promoted to unsigned before division,
> 
> Right.
> 
>> resulting in a massive positive
>> number instead of a small negative one.
> 
> Let's look at an example (assuming unit is 1).
> 
> val = val * unit = -16384 (-16 KiB)
> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
> 
> Yeah, that's a massive positive number.
> 
> Hmm but how did it work when it was int?
> 
> val = val * unit = -16384 (-16KiB)
> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
> (int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
> 
> That's incorrect. It should have been -4?
> 
>> Before this change, the function returned an int, which implicitly truncated
>> the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
>> correct negative arithmetic value.
> 
> So... "accidentally yielding the correct negative arithemetic value"
> is wrong.
> 
> Sounds like it's been subtly broken even before this patch and nobody
> noticed.

Thank you for such a detailed analysis! And I think you are right.

The memcg_state_val_in_pages() is only to make @val to be in pages, so
perhaps we can avoid the above problem by taking the absolute value
first?

Thanks,
Qi

> 
>> By changing the return type to long, this implicit truncation is removed,
>> and the huge positive value is returned unaltered.
> 
> That's true.
> 
>> Could this corrupt tracepoint logs when passed to trace_mod_memcg_state?
> 
> I'm not sure if that's critical but yeah that's true.
> 
>> Also, would passing this huge positive value to memcg_rstat_updated instantly
>> exceed the charge batch threshold and trigger endless, expensive global
>> cgroup_rstat flushing, severely degrading system performance?
> 
> It would lead to more frequent flushes, at least.
> 



  reply	other threads:[~2026-03-25  3:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 11:31 [PATCH 0/3] correct the parameter type of some mm functions Qi Zheng
2026-03-24 11:31 ` [PATCH] fix: mm: memcontrol: convert objcg to be per-memcg per-node type Qi Zheng
2026-03-24 11:34   ` Qi Zheng
2026-03-24 11:31 ` [PATCH 1/3] mm: memcontrol: correct the type of stats_updates to unsigned long Qi Zheng
2026-03-24 12:20   ` Lorenzo Stoakes (Oracle)
2026-03-24 11:31 ` [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state() Qi Zheng
2026-03-24 12:21   ` Lorenzo Stoakes (Oracle)
2026-03-24 14:12     ` Matthew Wilcox
2026-03-24 14:24       ` Lorenzo Stoakes (Oracle)
2026-03-25  1:43   ` Harry Yoo (Oracle)
2026-03-25  3:25     ` Qi Zheng [this message]
2026-03-25  5:17       ` Harry Yoo (Oracle)
2026-03-25  7:26         ` Qi Zheng
2026-03-25  7:36           ` Harry Yoo (Oracle)
2026-03-25  7:39             ` Qi Zheng
2026-03-26  7:49     ` Harry Yoo (Oracle)
2026-03-24 11:31 ` [PATCH 3/3] mm: memcontrol: correct the nr_pages parameter type of mem_cgroup_update_lru_size() Qi Zheng
2026-03-24 12:28   ` Lorenzo Stoakes (Oracle)
2026-03-25  0:27     ` Harry Yoo (Oracle)
2026-03-25  3:34       ` Qi Zheng
2026-03-26  2:35   ` kernel test robot
2026-03-26  3:36     ` Andrew Morton
2026-03-24 11:57 ` [PATCH 0/3] correct the parameter type of some mm functions Lorenzo Stoakes (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a9d9152-8e23-4438-a322-ec524fd159a6@linux.dev \
    --to=qi.zheng@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=apais@linux.microsoft.com \
    --cc=axelrasmussen@google.com \
    --cc=bhe@redhat.com \
    --cc=chenridong@huaweicloud.com \
    --cc=david@kernel.org \
    --cc=hamzamahfooz@linux.microsoft.com \
    --cc=hannes@cmpxchg.org \
    --cc=harry@kernel.org \
    --cc=hughd@google.com \
    --cc=imran.f.khan@oracle.com \
    --cc=kamalesh.babulal@oracle.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=weixugc@google.com \
    --cc=yosry.ahmed@linux.dev \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.