public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Qi Zheng <qi.zheng@linux.dev>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, david@kernel.org,
	lorenzo.stoakes@oracle.com, ziy@nvidia.com,
	yosry.ahmed@linux.dev, imran.f.khan@oracle.com,
	kamalesh.babulal@oracle.com, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com,
	chenridong@huaweicloud.com, mkoutny@suse.com,
	akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
	apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com,
	usamaarif642@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
Date: Wed, 25 Mar 2026 11:25:06 +0800	[thread overview]
Message-ID: <5a9d9152-8e23-4438-a322-ec524fd159a6@linux.dev> (raw)
In-Reply-To: <acM93qmvDFofjWdM@hyeyoo>



On 3/25/26 9:43 AM, Harry Yoo (Oracle) wrote:
> On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>
>> The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
>> reparent non-hierarchical stats, the values passed to them might exceed
>> the upper limit of the type int, so correct the val parameter type of them
>> to long.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   include/trace/events/memcg.h | 10 +++++-----
>>   mm/memcontrol.c              |  8 ++++----
>>   2 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 7fb9cbc10dfbb..4a78550f6174e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>>   
>>   #ifdef CONFIG_MEMCG_V1
>>   static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
>> -				     enum node_stat_item idx, int val);
>> +				     enum node_stat_item idx, long val);
>>   
>>   void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
>>   				       struct mem_cgroup *parent, int idx)
>> @@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
>>    * Normalize the value passed into memcg_rstat_updated() to be in pages. Round
>>    * up non-zero sub-page updates to 1 page as zero page updates are ignored.
>>    */
>> -static int memcg_state_val_in_pages(int idx, int val)
>> +static long memcg_state_val_in_pages(int idx, long val)
>>   {
>>   	int unit = memcg_page_state_unit(idx);
> 
> Sashiko AI made an interesting argument [1] that this could lead to
> incorrectly returning a very large positive number. Let me verify that.
> 
> [1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
> 
> Sashiko wrote:
>> Does this change inadvertently break the handling of negative byte-sized
>> updates?
>> Looking at the rest of the function:
>> 	if (!val || unit == PAGE_SIZE)
>> 		return val;
>> 	else
>> 		return max(val * unit / PAGE_SIZE, 1UL);
> 
>> PAGE_SIZE is defined as an unsigned long.
> 
> Right, it's defined as 1UL << PAGE_SHIFT.
> 
>> When val is negative, such as during uncharging of byte-sized stats like
>> MEMCG_ZSWAP_B, the expression val * unit is a negative long.
> 
> Right.
> 
>> Dividing a signed long by an unsigned long causes the signed long to be
>> promoted to unsigned before division,
> 
> Right.
> 
>> resulting in a massive positive
>> number instead of a small negative one.
> 
> Let's look at an example (assuming unit is 1).
> 
> val = val * unit = -16384 (-16 KiB)
> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
> 
> Yeah, that's a massive positive number.
> 
> Hmm but how did it work when it was int?
> 
> val = val * unit = -16384 (-16KiB)
> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
> max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
> (int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
> 
> That's incorrect. It should have been -4?
> 
>> Before this change, the function returned an int, which implicitly truncated
>> the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
>> correct negative arithmetic value.
> 
> So... "accidentally yielding the correct negative arithemetic value"
> is wrong.
> 
> Sounds like it's been subtly broken even before this patch and nobody
> noticed.

Thank you for such a detailed analysis! And I think you are right.

The memcg_state_val_in_pages() is only to make @val to be in pages, so
perhaps we can avoid the above problem by taking the absolute value
first?

Thanks,
Qi

> 
>> By changing the return type to long, this implicit truncation is removed,
>> and the huge positive value is returned unaltered.
> 
> That's true.
> 
>> Could this corrupt tracepoint logs when passed to trace_mod_memcg_state?
> 
> I'm not sure if that's critical but yeah that's true.
> 
>> Also, would passing this huge positive value to memcg_rstat_updated instantly
>> exceed the charge batch threshold and trigger endless, expensive global
>> cgroup_rstat flushing, severely degrading system performance?
> 
> It would lead to more frequent flushes, at least.
> 



  reply	other threads:[~2026-03-25  3:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 11:31 [PATCH 0/3] correct the parameter type of some mm functions Qi Zheng
2026-03-24 11:31 ` [PATCH] fix: mm: memcontrol: convert objcg to be per-memcg per-node type Qi Zheng
2026-03-24 11:34   ` Qi Zheng
2026-03-24 11:31 ` [PATCH 1/3] mm: memcontrol: correct the type of stats_updates to unsigned long Qi Zheng
2026-03-24 12:20   ` Lorenzo Stoakes (Oracle)
2026-03-24 11:31 ` [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state() Qi Zheng
2026-03-24 12:21   ` Lorenzo Stoakes (Oracle)
2026-03-24 14:12     ` Matthew Wilcox
2026-03-24 14:24       ` Lorenzo Stoakes (Oracle)
2026-03-25  1:43   ` Harry Yoo (Oracle)
2026-03-25  3:25     ` Qi Zheng [this message]
2026-03-25  5:17       ` Harry Yoo (Oracle)
2026-03-25  7:26         ` Qi Zheng
2026-03-25  7:36           ` Harry Yoo (Oracle)
2026-03-25  7:39             ` Qi Zheng
2026-03-26  7:49     ` Harry Yoo (Oracle)
2026-03-24 11:31 ` [PATCH 3/3] mm: memcontrol: correct the nr_pages parameter type of mem_cgroup_update_lru_size() Qi Zheng
2026-03-24 12:28   ` Lorenzo Stoakes (Oracle)
2026-03-25  0:27     ` Harry Yoo (Oracle)
2026-03-25  3:34       ` Qi Zheng
2026-03-26  2:35   ` kernel test robot
2026-03-26  3:36     ` Andrew Morton
2026-03-24 11:57 ` [PATCH 0/3] correct the parameter type of some mm functions Lorenzo Stoakes (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a9d9152-8e23-4438-a322-ec524fd159a6@linux.dev \
    --to=qi.zheng@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=apais@linux.microsoft.com \
    --cc=axelrasmussen@google.com \
    --cc=bhe@redhat.com \
    --cc=chenridong@huaweicloud.com \
    --cc=david@kernel.org \
    --cc=hamzamahfooz@linux.microsoft.com \
    --cc=hannes@cmpxchg.org \
    --cc=harry@kernel.org \
    --cc=hughd@google.com \
    --cc=imran.f.khan@oracle.com \
    --cc=kamalesh.babulal@oracle.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=weixugc@google.com \
    --cc=yosry.ahmed@linux.dev \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox