From: Qi Zheng <qi.zheng@linux.dev>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, david@kernel.org,
lorenzo.stoakes@oracle.com, ziy@nvidia.com,
yosry.ahmed@linux.dev, imran.f.khan@oracle.com,
kamalesh.babulal@oracle.com, axelrasmussen@google.com,
yuanchu@google.com, weixugc@google.com,
chenridong@huaweicloud.com, mkoutny@suse.com,
akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
apais@linux.microsoft.com, lance.yang@linux.dev, bhe@redhat.com,
usamaarif642@gmail.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
Date: Wed, 25 Mar 2026 15:39:56 +0800 [thread overview]
Message-ID: <5fed0611-434c-4fd4-956c-39f23e0459a1@linux.dev> (raw)
In-Reply-To: <acOQkTDJvieu3sh-@hyeyoo>
On 3/25/26 3:36 PM, Harry Yoo (Oracle) wrote:
> On Wed, Mar 25, 2026 at 03:26:46PM +0800, Qi Zheng wrote:
>>
>>
>> On 3/25/26 1:17 PM, Harry Yoo (Oracle) wrote:
>>> On Wed, Mar 25, 2026 at 11:25:06AM +0800, Qi Zheng wrote:
>>>> On 3/25/26 9:43 AM, Harry Yoo (Oracle) wrote:
>>>>> On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
>>>>>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>>
>>>>>> The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
>>>>>> reparent non-hierarchical stats, the values passed to them might exceed
>>>>>> the upper limit of the type int, so correct the val parameter type of them
>>>>>> to long.
>>>>>>
>>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>>>> ---
>>>>>> include/trace/events/memcg.h | 10 +++++-----
>>>>>> mm/memcontrol.c | 8 ++++----
>>>>>> 2 files changed, 9 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>>> index 7fb9cbc10dfbb..4a78550f6174e 100644
>>>>>> --- a/mm/memcontrol.c
>>>>>> +++ b/mm/memcontrol.c
>>>>>> @@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>>>>>> #ifdef CONFIG_MEMCG_V1
>>>>>> static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
>>>>>> - enum node_stat_item idx, int val);
>>>>>> + enum node_stat_item idx, long val);
>>>>>> void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
>>>>>> struct mem_cgroup *parent, int idx)
>>>>>> @@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
>>>>>> * Normalize the value passed into memcg_rstat_updated() to be in pages. Round
>>>>>> * up non-zero sub-page updates to 1 page as zero page updates are ignored.
>>>>>> */
>>>>>> -static int memcg_state_val_in_pages(int idx, int val)
>>>>>> +static long memcg_state_val_in_pages(int idx, long val)
>>>>>> {
>>>>>> int unit = memcg_page_state_unit(idx);
>>>>>
>>>>> Sashiko AI made an interesting argument [1] that this could lead to
>>>>> incorrectly returning a very large positive number. Let me verify that.
>>>>>
>>>>> [1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
>>>>>
>>>>> Sashiko wrote:
>>>>>> Does this change inadvertently break the handling of negative byte-sized
>>>>>> updates?
>>>>>> Looking at the rest of the function:
>>>>>> if (!val || unit == PAGE_SIZE)
>>>>>> return val;
>>>>>> else
>>>>>> return max(val * unit / PAGE_SIZE, 1UL);
>>>>>
>>>>>> PAGE_SIZE is defined as an unsigned long.
>>>>>
>>>>> Right, it's defined as 1UL << PAGE_SHIFT.
>>>>>
>>>>>> When val is negative, such as during uncharging of byte-sized stats like
>>>>>> MEMCG_ZSWAP_B, the expression val * unit is a negative long.
>>>>>
>>>>> Right.
>>>>>
>>>>>> Dividing a signed long by an unsigned long causes the signed long to be
>>>>>> promoted to unsigned before division,
>>>>>
>>>>> Right.
>>>>>
>>>>>> resulting in a massive positive
>>>>>> number instead of a small negative one.
>>>>>
>>>>> Let's look at an example (assuming unit is 1).
>>>>>
>>>>> val = val * unit = -16384 (-16 KiB)
>>>>> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
>>>>> max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
>>>>>
>>>>> Yeah, that's a massive positive number.
>>>>>
>>>>> Hmm but how did it work when it was int?
>>>>>
>>>>> val = val * unit = -16384 (-16KiB)
>>>>> val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
>>>>> max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
>>>>> (int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
>>>>>
>>>>> That's incorrect. It should have been -4?
>>>>>
>>>>>> Before this change, the function returned an int, which implicitly truncated
>>>>>> the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
>>>>>> correct negative arithmetic value.
>>>>>
>>>>> So... "accidentally yielding the correct negative arithemetic value"
>>>>> is wrong.
>>>>>
>>>>> Sounds like it's been subtly broken even before this patch and nobody
>>>>> noticed.
>>>>
>>>> Thank you for such a detailed analysis! And I think you are right.
>>>
>>> No problem ;)
>>>
>>>> The memcg_state_val_in_pages() is only to make @val to be in pages, so
>>>> perhaps we can avoid the above problem by taking the absolute value
>>>> first?
>>
>> For more clearly, here is the diff:
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6491ca74b3398..2b34805b01476 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -787,11 +787,17 @@ static int memcg_page_state_unit(int item);
>> static long memcg_state_val_in_pages(int idx, long val)
>> {
>> int unit = memcg_page_state_unit(idx);
>> + unsigned long uval;
>> + long res;
>>
>> if (!val || unit == PAGE_SIZE)
>> return val;
>> - else
>> - return max(val * unit / PAGE_SIZE, 1UL);
>> +
>> + uval = val < 0 ? -(unsigned long)val : (unsigned long)val;
>
> We could use abs() macro in linux/math.h here :)
>
> val should not be LONG_MIN, but it should be fine
> to assume it won't reach LONG_MIN.
OK, will use abs() macro.
>
>> +
>> + res = max(mult_frac(uval, unit, PAGE_SIZE), 1UL);
>> +
>> + return val < 0 ? -res : res;
>> }
>>
>> #ifdef CONFIG_MEMCG_V1
>>
>>> That would work for memcg_rstat_updated(),
>>> but not for trace_mod_memcg_state()?
>>
>> Why? The trace_mod_memcg_state() seems to accept 'long' type,
>> am I missing something?
>
> Nevermind. Without the diff, I misunderstood what you meant.
OK, will apply this diff and send the v2.
Thanks,
Qi
>
> Thanks!
>
next prev parent reply other threads:[~2026-03-25 7:40 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-24 11:31 [PATCH 0/3] correct the parameter type of some mm functions Qi Zheng
2026-03-24 11:31 ` [PATCH] fix: mm: memcontrol: convert objcg to be per-memcg per-node type Qi Zheng
2026-03-24 11:34 ` Qi Zheng
2026-03-24 11:31 ` [PATCH 1/3] mm: memcontrol: correct the type of stats_updates to unsigned long Qi Zheng
2026-03-24 12:20 ` Lorenzo Stoakes (Oracle)
2026-03-24 11:31 ` [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state() Qi Zheng
2026-03-24 12:21 ` Lorenzo Stoakes (Oracle)
2026-03-24 14:12 ` Matthew Wilcox
2026-03-24 14:24 ` Lorenzo Stoakes (Oracle)
2026-03-25 1:43 ` Harry Yoo (Oracle)
2026-03-25 3:25 ` Qi Zheng
2026-03-25 5:17 ` Harry Yoo (Oracle)
2026-03-25 7:26 ` Qi Zheng
2026-03-25 7:36 ` Harry Yoo (Oracle)
2026-03-25 7:39 ` Qi Zheng [this message]
2026-03-26 7:49 ` Harry Yoo (Oracle)
2026-03-24 11:31 ` [PATCH 3/3] mm: memcontrol: correct the nr_pages parameter type of mem_cgroup_update_lru_size() Qi Zheng
2026-03-24 12:28 ` Lorenzo Stoakes (Oracle)
2026-03-25 0:27 ` Harry Yoo (Oracle)
2026-03-25 3:34 ` Qi Zheng
2026-03-26 2:35 ` kernel test robot
2026-03-26 3:36 ` Andrew Morton
2026-03-24 11:57 ` [PATCH 0/3] correct the parameter type of some mm functions Lorenzo Stoakes (Oracle)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5fed0611-434c-4fd4-956c-39f23e0459a1@linux.dev \
--to=qi.zheng@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=apais@linux.microsoft.com \
--cc=axelrasmussen@google.com \
--cc=bhe@redhat.com \
--cc=chenridong@huaweicloud.com \
--cc=david@kernel.org \
--cc=hamzamahfooz@linux.microsoft.com \
--cc=hannes@cmpxchg.org \
--cc=harry@kernel.org \
--cc=hughd@google.com \
--cc=imran.f.khan@oracle.com \
--cc=kamalesh.babulal@oracle.com \
--cc=lance.yang@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=usamaarif642@gmail.com \
--cc=weixugc@google.com \
--cc=yosry.ahmed@linux.dev \
--cc=yuanchu@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox