From: Harry Yoo <harry.yoo@oracle.com>
To: Dev Jain <dev.jain@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>, Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
Meta kernel team <kernel-team@meta.com>
Subject: Re: [PATCH 1/4] memcg: use mod_node_page_state to update stats
Date: Wed, 11 Feb 2026 17:53:38 +0900 [thread overview]
Message-ID: <aYxDkkDI4mk3r011@hyeyoo> (raw)
In-Reply-To: <5a6782f3-d758-4d9c-975b-5ae4b5d80d4e@arm.com>
On Wed, Feb 11, 2026 at 01:07:40PM +0530, Dev Jain wrote:
>
> On 10/02/26 9:59 pm, Shakeel Butt wrote:
> > On Tue, Feb 10, 2026 at 01:08:49PM +0530, Dev Jain wrote:
> > [...]
> >>> Oh so it is arm64 specific issue. I tested on x86-64 machine and it solves
> >>> the little regression it had before. So, on arm64 all this_cpu_ops i.e. without
> >>> double underscore, uses LL/SC instructions.
> >>>
> >>> Need more thought on this.
> >>>
> >>>>> Also can you confirm whether my analysis of the regression was correct?
> >>>>> Because if it was, then this diff looks wrong - AFAIU preempt_disable()
> >>>>> won't stop an irq handler from interrupting the execution, so this
> >>>>> will introduce a bug for code paths running in irq context.
> >>>>>
> >>>> I was worried about the correctness too, but this_cpu_add() is safe
> >>>> against IRQs and so the stat will be _eventually_ consistent?
> >>>>
> >>>> Ofc it's so confusing! Maybe I'm the one confused.
> >>> Yeah there is no issue with proposed patch as it is making the function
> >>> re-entrant safe.
> >> Ah yes, this_cpu_add() does the addition in one shot without read-modify-write.
> >>
> >> I am still puzzled whether the original patch was a bug fix or an optimization.
> > The original patch was a cleanup patch. The memcg stats update functions
> > were already irq/nmi safe without disabling irqs and that patch did the
> > same for the numa stats. Though it seems like that is causing regression
> > for arm64 as this_cpu* ops are expensive on arm64.
> >
> >> The patch description says that node stat updation uses irq unsafe interface.
> >> Therefore, we had foo() calling __foo() nested with local_irq_save/restore. But
> >> there were code paths which directly called __foo() - so, your patch fixes a bug right
> > No, those places were already disabling irqs and should be fine.
>
> Please correct me if I am missing something here. Simply putting an
> if (!irqs_disabled()) -> dump_stack() in __lruvec_stat_mod_folio, before
> calling __mod_node_page_state, reveals:
>
> [ 6.486375] Call trace:
> [ 6.486376] show_stack+0x20/0x38 (C)
> [ 6.486379] dump_stack_lvl+0x74/0x90
> [ 6.486382] dump_stack+0x18/0x28
> [ 6.486383] __lruvec_stat_mod_folio+0x160/0x180
> [ 6.486385] folio_add_file_rmap_ptes+0x128/0x480
> [ 6.486388] set_pte_range+0xe8/0x320
> [ 6.486389] finish_fault+0x260/0x508
> [ 6.486390] do_fault+0x2d0/0x598
> [ 6.486391] __handle_mm_fault+0x398/0xb60
> [ 6.486393] handle_mm_fault+0x15c/0x298
> [ 6.486394] __get_user_pages+0x204/0xb88
> [ 6.486395] populate_vma_page_range+0xbc/0x1b8
> [ 6.486396] __mm_populate+0xcc/0x1e0
> [ 6.486397] __arm64_sys_mlockall+0x1d4/0x1f8
> [ 6.486398] invoke_syscall+0x50/0x120
> [ 6.486399] el0_svc_common.constprop.0+0x48/0xf0
> [ 6.486400] do_el0_svc+0x24/0x38
> [ 6.486400] el0_svc+0x34/0xf0
> [ 6.486402] el0t_64_sync_handler+0xa0/0xe8
> [ 6.486404] el0t_64_sync+0x198/0x1a0
>
> Indeed finish_fault() takes a PTL spin lock without irq disablement.
That indeed looks incorrect to me.
I was assuming __foo() is always called with IRQs disabled!
> > I am working on adding batched stats update functionality in the hope
> > that will fix the regression.
>
> Thanks! FYI, I have zeroed in the issue on to preempt_disable(). Dropping this
> from _pcpu_protect_return solves the regression.
That's interesting, why is the cost of preempt disable/enable so high?
> Unlike x86, arm64 does a preempt_disable
> when doing this_cpu_*. On a cursory look it seems like this is unnecessary - since we
> are doing preempt_enable() immediately after reading the pointer, CPU migration is
> possible anyways, so there is nothing to be gained by reading pcpu pointer with
> preemption disabled. I am investigating whether we can simply drop this in general.
Let me quote an old email from Mark Rutland [1]:
> We also thought that initially, but there's a sbutle race that can
> occur, and so we added code to disable preemption in commit:
>
> f3eab7184ddcd486 ("arm64: percpu: Make this_cpu accessors pre-empt safe")
>
> The problem on arm64 is that our atomics take a single base register,
> and we have to generate the percpu address with separate instructions
> from the atomic itself. That means we can get preempted between address
> generation and the atomic, which is problematic for sequences like:
>
> // Thread-A // Thread-B
>
> this_cpu_add(var)
> local_irq_disable(flags)
> ...
> v = __this_cpu_read(var);
> v = some_function(v);
> __this_cpu_write(var, v);
> ...
> local_irq_restore(flags)
>
> ... which can unexpectedly race as:
>
>
> // Thread-A // Thread-B
>
> < generate CPU X addr >
> < preempted >
>
> < scheduled on CPU X >
> local_irq_disable(flags);
> v = __this_cpu_read(var);
>
> < scheduled on CPU Y >
> < add to CPU X's var >
> v = some_function(v);
> __this_cpu_write(var, v);
> local_irq_restore(flags);
>
> ... and hence we lose an update to a percpu variable.
... so, removing preempt disable _in general_ is probably not a good idea.
[1] https://lore.kernel.org/all/20190311164837.GD24275@lakrids.cambridge.arm.com
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2026-02-11 10:20 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
2025-11-11 1:39 ` Harry Yoo
2025-11-11 18:58 ` Roman Gushchin
2026-01-29 13:05 ` Dev Jain
2026-02-02 4:26 ` Shakeel Butt
2026-02-02 4:48 ` Dev Jain
2026-02-02 4:54 ` Shakeel Butt
2026-02-02 8:53 ` Dev Jain
2026-02-04 20:38 ` Shakeel Butt
2026-02-05 5:20 ` Dev Jain
2026-02-05 5:45 ` Harry Yoo
2026-02-05 5:58 ` Shakeel Butt
2026-02-10 7:38 ` Dev Jain
2026-02-10 16:29 ` Shakeel Butt
2026-02-11 7:37 ` Dev Jain
2026-02-11 8:53 ` Harry Yoo [this message]
2026-02-11 9:24 ` Shakeel Butt
2026-02-11 10:14 ` Harry Yoo
2026-02-12 5:16 ` Dev Jain
2026-02-12 5:14 ` Dev Jain
2026-02-12 1:31 ` Shakeel Butt
2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
2025-11-11 1:46 ` Harry Yoo
2025-11-11 8:23 ` Qi Zheng
2025-11-11 18:58 ` Roman Gushchin
2025-11-10 23:20 ` [PATCH 3/4] memcg: remove __mod_lruvec_state Shakeel Butt
2025-11-11 5:21 ` Harry Yoo
2025-11-11 18:58 ` Roman Gushchin
2025-11-10 23:20 ` [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio Shakeel Butt
2025-11-11 5:41 ` Harry Yoo
2025-11-11 18:59 ` Roman Gushchin
2025-11-11 0:59 ` [PATCH 0/4] memcg: cleanup the memcg stats interfaces Harry Yoo
2025-11-11 2:23 ` Qi Zheng
2025-11-11 2:39 ` Shakeel Butt
2025-11-11 2:48 ` Qi Zheng
2025-11-11 3:00 ` Shakeel Butt
2025-11-11 3:07 ` Qi Zheng
2025-11-11 3:18 ` Harry Yoo
2025-11-11 3:29 ` Qi Zheng
2025-11-11 3:05 ` Harry Yoo
2025-11-11 8:01 ` Sebastian Andrzej Siewior
2025-11-11 8:36 ` Qi Zheng
2025-11-11 16:45 ` Shakeel Butt
2025-11-12 2:11 ` Qi Zheng
2025-11-11 9:54 ` Vlastimil Babka
2025-11-11 19:01 ` Roman Gushchin
2025-11-11 19:34 ` Shakeel Butt
2025-11-15 19:27 ` Shakeel Butt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYxDkkDI4mk3r011@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=dev.jain@arm.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=qi.zheng@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.