* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
[not found] ` <97ea4728568459f501ddcab6c378c29064630bb9.1761658310.git.zhengqi.arch@bytedance.com>
@ 2025-11-07 5:11 ` Harry Yoo
2025-11-07 6:41 ` Qi Zheng
2025-11-07 7:18 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 22+ messages in thread
From: Harry Yoo @ 2025-11-07 5:11 UTC (permalink / raw)
To: Qi Zheng
Cc: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Tue, Oct 28, 2025 at 09:58:17PM +0800, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
>
> In a subsequent patch, we'll reparent the LRU folios. The folios that are
> moved to the appropriate LRU list can undergo reparenting during the
> move_folios_to_lru() process. Hence, it's incorrect for the caller to hold
> a lruvec lock. Instead, we should utilize the more general interface of
> folio_lruvec_relock_irq() to obtain the correct lruvec lock.
>
> This patch involves only code refactoring and doesn't introduce any
> functional changes.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
> mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
> 1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3a1044ce30f1e..660cd40cfddd4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2016,9 +2016,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false,
> lruvec_memcg(lruvec));
>
> - spin_lock_irq(&lruvec->lru_lock);
> - move_folios_to_lru(lruvec, &folio_list);
> + move_folios_to_lru(&folio_list);
>
> + spin_lock_irq(&lruvec->lru_lock);
> __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
> stat.nr_demoted);
Maybe I'm missing something or just confused for now, but let me ask...
How do we make sure the lruvec (and the mem_cgroup containing the
lruvec) did not disappear (due to offlining) after move_folios_to_lru()?
> __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
> @@ -2166,11 +2166,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
> /*
> * Move folios back to the lru list.
> */
> - spin_lock_irq(&lruvec->lru_lock);
> -
> - nr_activate = move_folios_to_lru(lruvec, &l_active);
> - nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
> + nr_activate = move_folios_to_lru(&l_active);
> + nr_deactivate = move_folios_to_lru(&l_inactive);
>
> + spin_lock_irq(&lruvec->lru_lock);
> __count_vm_events(PGDEACTIVATE, nr_deactivate);
> count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
>
> @@ -4735,14 +4734,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active));
> }
>
> - spin_lock_irq(&lruvec->lru_lock);
> -
> - move_folios_to_lru(lruvec, &list);
> + move_folios_to_lru(&list);
>
> + local_irq_disable();
> walk = current->reclaim_state->mm_walk;
> if (walk && walk->batched) {
> walk->lruvec = lruvec;
> + spin_lock(&lruvec->lru_lock);
> reset_batch_size(walk);
> + spin_unlock(&lruvec->lru_lock);
> }
Cc'ing RT folks as they may not want to disable IRQ on PREEMPT_RT.
IIRC there has been some effort in MM to reduce the scope of
IRQ-disabled section in MM when PREEMPT_RT config was added to the
mainline. spin_lock_irq() doesn't disable IRQ on PREEMPT_RT.
Also, this will break RT according to Documentation/locking/locktypes.rst:
> The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
> have a few implications. For example, on a non-PREEMPT_RT kernel
> the following code sequence works as expected:
>
> local_irq_disable();
> spin_lock(&lock);
>
> and is fully equivalent to:
>
> spin_lock_irq(&lock);
> Same applies to rwlock_t and the _irqsave() suffix variants.
>
> On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
> a fully preemptible context. Instead, use spin_lock_irq() or
> spin_lock_irqsave() and their unlock counterparts.
>
> In cases where the interrupt disabling and locking must remain separate,
> PREEMPT_RT offers a local_lock mechanism. Acquiring the local_lock pins
> the task to a CPU, allowing things like per-CPU interrupt disabled locks
> to be acquired. However, this approach should be used only where absolutely
> necessary.
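In code, the difference the documentation describes looks like this for the
hunk above (illustrative sketch only, not a tested patch):

```
	/* Broken on PREEMPT_RT: lru_lock is a sleeping rtmutex there, so it
	 * cannot be acquired inside a hand-rolled IRQ-disabled section: */
	local_irq_disable();
	spin_lock(&lruvec->lru_lock);

	/* Works on both PREEMPT_RT and !PREEMPT_RT: the lock primitive
	 * itself decides whether IRQs actually need to be disabled: */
	spin_lock_irq(&lruvec->lru_lock);
	reset_batch_size(walk);
	spin_unlock_irq(&lruvec->lru_lock);
```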
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-07 5:11 ` [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru() Harry Yoo
@ 2025-11-07 6:41 ` Qi Zheng
2025-11-07 13:20 ` Harry Yoo
2025-11-07 7:18 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 22+ messages in thread
From: Qi Zheng @ 2025-11-07 6:41 UTC (permalink / raw)
To: Harry Yoo
Cc: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
Hi Harry,
On 11/7/25 1:11 PM, Harry Yoo wrote:
> On Tue, Oct 28, 2025 at 09:58:17PM +0800, Qi Zheng wrote:
>> From: Muchun Song <songmuchun@bytedance.com>
>>
>> In a subsequent patch, we'll reparent the LRU folios. The folios that are
>> moved to the appropriate LRU list can undergo reparenting during the
>> move_folios_to_lru() process. Hence, it's incorrect for the caller to hold
>> a lruvec lock. Instead, we should utilize the more general interface of
>> folio_lruvec_relock_irq() to obtain the correct lruvec lock.
>>
>> This patch involves only code refactoring and doesn't introduce any
>> functional changes.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>> mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
>> 1 file changed, 23 insertions(+), 23 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 3a1044ce30f1e..660cd40cfddd4 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2016,9 +2016,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>> nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false,
>> lruvec_memcg(lruvec));
>>
>> - spin_lock_irq(&lruvec->lru_lock);
>> - move_folios_to_lru(lruvec, &folio_list);
>> + move_folios_to_lru(&folio_list);
>>
>> + spin_lock_irq(&lruvec->lru_lock);
>> __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
>> stat.nr_demoted);
>
> Maybe I'm missing something or just confused for now, but let me ask...
>
> How do we make sure the lruvec (and the mem_cgroup containing the
> lruvec) did not disappear (due to offlining) after move_folios_to_lru()?
We obtained lruvec through the following method:
memcg = mem_cgroup_iter(target_memcg, NULL, partial);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
shrink_lruvec(lruvec, sc);
--> shrink_inactive_list
} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
The mem_cgroup_iter() will hold the refcount of this memcg, so IIUC,
the memcg will not disappear at this time.
>
>> __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
>> @@ -2166,11 +2166,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
>> /*
>> * Move folios back to the lru list.
>> */
>> - spin_lock_irq(&lruvec->lru_lock);
>> -
>> - nr_activate = move_folios_to_lru(lruvec, &l_active);
>> - nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
>> + nr_activate = move_folios_to_lru(&l_active);
>> + nr_deactivate = move_folios_to_lru(&l_inactive);
>>
>> + spin_lock_irq(&lruvec->lru_lock);
>> __count_vm_events(PGDEACTIVATE, nr_deactivate);
>> count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
>>
>> @@ -4735,14 +4734,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>> set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active));
>> }
>>
>> - spin_lock_irq(&lruvec->lru_lock);
>> -
>> - move_folios_to_lru(lruvec, &list);
>> + move_folios_to_lru(&list);
>>
>> + local_irq_disable();
>> walk = current->reclaim_state->mm_walk;
>> if (walk && walk->batched) {
>> walk->lruvec = lruvec;
>> + spin_lock(&lruvec->lru_lock);
>> reset_batch_size(walk);
>> + spin_unlock(&lruvec->lru_lock);
>> }
>
> Cc'ing RT folks as they may not want to disable IRQ on PREEMPT_RT.
>
> IIRC there has been some effort in MM to reduce the scope of
> IRQ-disabled section in MM when PREEMPT_RT config was added to the
> mainline. spin_lock_irq() doesn't disable IRQ on PREEMPT_RT.
Thanks for this information.
>
> Also, this will break RT according to Documentation/locking/locktypes.rst:
>> The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
>> have a few implications. For example, on a non-PREEMPT_RT kernel
>> the following code sequence works as expected:
>>
>> local_irq_disable();
>> spin_lock(&lock);
>>
>> and is fully equivalent to:
>>
>> spin_lock_irq(&lock);
>> Same applies to rwlock_t and the _irqsave() suffix variants.
>>
>> On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
>> a fully preemptible context. Instead, use spin_lock_irq() or
>> spin_lock_irqsave() and their unlock counterparts.
>>
>> In cases where the interrupt disabling and locking must remain separate,
>> PREEMPT_RT offers a local_lock mechanism. Acquiring the local_lock pins
>> the task to a CPU, allowing things like per-CPU interrupt disabled locks
>> to be acquired. However, this approach should be used only where absolutely
>> necessary.
But how do we determine if it's necessary?
Thanks,
Qi
>
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-07 5:11 ` [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru() Harry Yoo
2025-11-07 6:41 ` Qi Zheng
@ 2025-11-07 7:18 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-07 7:18 UTC (permalink / raw)
To: Harry Yoo
Cc: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On 2025-11-07 14:11:27 [+0900], Harry Yoo wrote:
> > @@ -4735,14 +4734,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active));
> > }
> >
> > - spin_lock_irq(&lruvec->lru_lock);
> > -
> > - move_folios_to_lru(lruvec, &list);
> > + move_folios_to_lru(&list);
> >
> > + local_irq_disable();
> > walk = current->reclaim_state->mm_walk;
> > if (walk && walk->batched) {
> > walk->lruvec = lruvec;
> > + spin_lock(&lruvec->lru_lock);
> > reset_batch_size(walk);
> > + spin_unlock(&lruvec->lru_lock);
> > }
>
> Cc'ing RT folks as they may not want to disable IRQ on PREEMPT_RT.
Thank you, this is not going to work. The local_irq_disable() shouldn't
be used.
> IIRC there has been some effort in MM to reduce the scope of
> IRQ-disabled section in MM when PREEMPT_RT config was added to the
> mainline. spin_lock_irq() doesn't disable IRQ on PREEMPT_RT.
Exactly.
Sebastian
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-07 6:41 ` Qi Zheng
@ 2025-11-07 13:20 ` Harry Yoo
2025-11-08 6:32 ` Shakeel Butt
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-11-07 13:20 UTC (permalink / raw)
To: Qi Zheng
Cc: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Fri, Nov 07, 2025 at 02:41:13PM +0800, Qi Zheng wrote:
> Hi Harry,
>
> On 11/7/25 1:11 PM, Harry Yoo wrote:
> > On Tue, Oct 28, 2025 at 09:58:17PM +0800, Qi Zheng wrote:
> > > From: Muchun Song <songmuchun@bytedance.com>
> > >
> > > In a subsequent patch, we'll reparent the LRU folios. The folios that are
> > > moved to the appropriate LRU list can undergo reparenting during the
> > > move_folios_to_lru() process. Hence, it's incorrect for the caller to hold
> > > a lruvec lock. Instead, we should utilize the more general interface of
> > > folio_lruvec_relock_irq() to obtain the correct lruvec lock.
> > >
> > > This patch involves only code refactoring and doesn't introduce any
> > > functional changes.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > > ---
> > > mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
> > > 1 file changed, 23 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 3a1044ce30f1e..660cd40cfddd4 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2016,9 +2016,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> > > nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false,
> > > lruvec_memcg(lruvec));
> > > - spin_lock_irq(&lruvec->lru_lock);
> > > - move_folios_to_lru(lruvec, &folio_list);
> > > + move_folios_to_lru(&folio_list);
> > > + spin_lock_irq(&lruvec->lru_lock);
> > > __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
> > > stat.nr_demoted);
> >
> > Maybe I'm missing something or just confused for now, but let me ask...
> >
> > How do we make sure the lruvec (and the mem_cgroup containing the
> > lruvec) did not disappear (due to offlining) after move_folios_to_lru()?
>
> We obtained lruvec through the following method:
>
> memcg = mem_cgroup_iter(target_memcg, NULL, partial);
> do {
> struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
>
> shrink_lruvec(lruvec, sc);
> --> shrink_inactive_list
> } while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
>
> The mem_cgroup_iter() will hold the refcount of this memcg, so IIUC,
> the memcg will not disappear at this time.
Ah, right!
It can be offlined, but won't be released due to the refcount.
Thanks for the explanation.
> > > __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
> > > @@ -2166,11 +2166,10 @@ static void shrink_active_list(unsigned long nr_to_scan,
> > > /*
> > > * Move folios back to the lru list.
> > > */
> > > - spin_lock_irq(&lruvec->lru_lock);
> > > -
> > > - nr_activate = move_folios_to_lru(lruvec, &l_active);
> > > - nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
> > > + nr_activate = move_folios_to_lru(&l_active);
> > > + nr_deactivate = move_folios_to_lru(&l_inactive);
> > > + spin_lock_irq(&lruvec->lru_lock);
> > > __count_vm_events(PGDEACTIVATE, nr_deactivate);
> > > count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
> > > @@ -4735,14 +4734,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > > set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_active));
> > > }
> > > - spin_lock_irq(&lruvec->lru_lock);
> > > -
> > > - move_folios_to_lru(lruvec, &list);
> > > + move_folios_to_lru(&list);
> > > + local_irq_disable();
> > > walk = current->reclaim_state->mm_walk;
> > > if (walk && walk->batched) {
> > > walk->lruvec = lruvec;
> > > + spin_lock(&lruvec->lru_lock);
> > > reset_batch_size(walk);
> > > + spin_unlock(&lruvec->lru_lock);
> > > }
> >
> > Cc'ing RT folks as they may not want to disable IRQ on PREEMPT_RT.
> >
> > IIRC there has been some effort in MM to reduce the scope of
> > IRQ-disabled section in MM when PREEMPT_RT config was added to the
> > mainline. spin_lock_irq() doesn't disable IRQ on PREEMPT_RT.
>
> Thanks for this information.
>
> >
> > Also, this will break RT according to Documentation/locking/locktypes.rst:
> > > The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
> > > have a few implications. For example, on a non-PREEMPT_RT kernel
> > > the following code sequence works as expected:
> > >
> > > local_irq_disable();
> > > spin_lock(&lock);
> > >
> > > and is fully equivalent to:
> > >
> > > spin_lock_irq(&lock);
> > > Same applies to rwlock_t and the _irqsave() suffix variants.
> > >
> > > On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
> > > a fully preemptible context. Instead, use spin_lock_irq() or
> > > spin_lock_irqsave() and their unlock counterparts.
> > >
> > > In cases where the interrupt disabling and locking must remain separate,
> > > PREEMPT_RT offers a local_lock mechanism. Acquiring the local_lock pins
> > > the task to a CPU, allowing things like per-CPU interrupt disabled locks
> > > to be acquired. However, this approach should be used only where absolutely
> > > necessary.
>
> But how do we determine if it's necessary?
Although it's mentioned in the locking documentation, I'm afraid that
local_lock is not the right interface to use here. Preemption will be
disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
updated (in __mod_node_page_state()).
Here we just want to disable IRQ only on !PREEMPT_RT (to update
the stats safely).
Maybe we'll need some ifdeffery? I don't see a better way around...
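i.e. something along these lines (just a sketch of the idea, reusing the
names from the quoted hunk; not a proposal):

```
#ifndef CONFIG_PREEMPT_RT
	local_irq_disable();	/* only !RT needs IRQs off for the stats */
#endif
	walk = current->reclaim_state->mm_walk;
	if (walk && walk->batched) {
		walk->lruvec = lruvec;
		spin_lock(&lruvec->lru_lock);
		reset_batch_size(walk);
		spin_unlock(&lruvec->lru_lock);
	}
#ifndef CONFIG_PREEMPT_RT
	local_irq_enable();
#endif
```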
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-07 13:20 ` Harry Yoo
@ 2025-11-08 6:32 ` Shakeel Butt
2025-11-10 2:13 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: Shakeel Butt @ 2025-11-08 6:32 UTC (permalink / raw)
To: Harry Yoo
Cc: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Fri, Nov 07, 2025 at 10:20:57PM +0900, Harry Yoo wrote:
>
> Although it's mentioned in the locking documentation, I'm afraid that
> local_lock is not the right interface to use here. Preemption will be
> disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
> updated (in __mod_node_page_state()).
>
> Here we just want to disable IRQ only on !PREEMPT_RT (to update
> the stats safely).
I don't think there is a need to disable IRQs. There are three stats
update functions called in that hunk.
1) __mod_lruvec_state
2) __count_vm_events
3) count_memcg_events
count_memcg_events() can be called with IRQs enabled. __count_vm_events()
can be replaced with count_vm_events(). For __mod_lruvec_state, the
__mod_node_page_state() inside needs preemption disabled.
Easy way would be to just disable/enable preemption instead of IRQs.
Otherwise go a bit more fine-grained approach i.e. replace
__count_vm_events with count_vm_events and just disable preemption
across __mod_node_page_state().
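For the stats in the quoted hunks, the fine-grained option would look
roughly like this (untested sketch, same names as above):

```
	/* IRQ-safe on its own, no irq-off section required: */
	count_vm_events(PGDEACTIVATE, nr_deactivate);

	/* only the node counter RMW needs preemption disabled: */
	preempt_disable();
	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
	preempt_enable();
```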
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-08 6:32 ` Shakeel Butt
@ 2025-11-10 2:13 ` Harry Yoo
2025-11-10 4:30 ` Qi Zheng
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-11-10 2:13 UTC (permalink / raw)
To: Shakeel Butt
Cc: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Fri, Nov 07, 2025 at 10:32:52PM -0800, Shakeel Butt wrote:
> On Fri, Nov 07, 2025 at 10:20:57PM +0900, Harry Yoo wrote:
> >
> > Although it's mentioned in the locking documentation, I'm afraid that
> > local_lock is not the right interface to use here. Preemption will be
> > disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
> > updated (in __mod_node_page_state()).
> >
> > Here we just want to disable IRQ only on !PREEMPT_RT (to update
> > the stats safely).
>
> I don't think there is a need to disable IRQs. There are three stats
> update functions called in that hunk.
>
> 1) __mod_lruvec_state
> 2) __count_vm_events
> 3) count_memcg_events
>
> count_memcg_events() can be called with IRQs enabled. __count_vm_events()
> can be replaced with count_vm_events().
Right.
> For __mod_lruvec_state, the
> __mod_node_page_state() inside needs preemption disabled.
The function __mod_node_page_state() disables preemption.
And there's a comment in __mod_zone_page_state():
> /*
> * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
> * atomicity is provided by IRQs being disabled -- either explicitly
> * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
> * CPU migrations and preemption potentially corrupts a counter so
> * disable preemption.
> */
> preempt_disable_nested();
We're relying on IRQs being disabled on !PREEMPT_RT.
Maybe we could make it safe against re-entrant IRQ handlers by using
read-modify-write operations?
--
Cheers,
Harry / Hyeonggon
> Easy way would be to just disable/enable preemption instead of IRQs.
> Otherwise go a bit more fine-grained approach i.e. replace
> __count_vm_events with count_vm_events and just disable preemption
> across __mod_node_page_state().
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 2:13 ` Harry Yoo
@ 2025-11-10 4:30 ` Qi Zheng
2025-11-10 5:43 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: Qi Zheng @ 2025-11-10 4:30 UTC (permalink / raw)
To: Harry Yoo, Shakeel Butt
Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On 11/10/25 10:13 AM, Harry Yoo wrote:
> On Fri, Nov 07, 2025 at 10:32:52PM -0800, Shakeel Butt wrote:
>> On Fri, Nov 07, 2025 at 10:20:57PM +0900, Harry Yoo wrote:
>>>
>>> Although it's mentioned in the locking documentation, I'm afraid that
>>> local_lock is not the right interface to use here. Preemption will be
>>> disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
>>> updated (in __mod_node_page_state()).
>>>
>>> Here we just want to disable IRQ only on !PREEMPT_RT (to update
>>> the stats safely).
>>
>> I don't think there is a need to disable IRQs. There are three stats
>> update functions called in that hunk.
>>
>> 1) __mod_lruvec_state
>> 2) __count_vm_events
>> 3) count_memcg_events
>>
>> count_memcg_events() can be called with IRQs enabled. __count_vm_events()
>> can be replaced with count_vm_events().
>
> Right.
>
>> For __mod_lruvec_state, the
>> __mod_node_page_state() inside needs preemption disabled.
>
> The function __mod_node_page_state() disables preemption.
> And there's a comment in __mod_zone_page_state():
>
>> /*
>> * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
>> * atomicity is provided by IRQs being disabled -- either explicitly
>> * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
>> * CPU migrations and preemption potentially corrupts a counter so
>> * disable preemption.
>> */
>> preempt_disable_nested();
>
> We're relying on IRQs being disabled on !PREEMPT_RT.
So it's possible for us to update vmstat within an interrupt context,
right?
There is also a comment above __mod_zone_page_state():
/*
* For use when we know that interrupts are disabled,
* or when we know that preemption is disabled and that
* particular counter cannot be updated from interrupt context.
*/
BTW, the comment inside __mod_node_page_state() should be:
/* See __mod_zone_page_state */
instead of
/* See __mod_node_page_state */
Will fix it.
>
> Maybe we could make it safe against re-entrant IRQ handlers by using
> read-modify-write operations?
Isn't it because of the RMW operation that we need to disable IRQs to
guarantee atomicity? Or have I misunderstood something?
>
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 4:30 ` Qi Zheng
@ 2025-11-10 5:43 ` Harry Yoo
2025-11-10 6:11 ` Qi Zheng
2025-11-10 16:47 ` Shakeel Butt
0 siblings, 2 replies; 22+ messages in thread
From: Harry Yoo @ 2025-11-10 5:43 UTC (permalink / raw)
To: Qi Zheng
Cc: Shakeel Butt, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
>
>
> On 11/10/25 10:13 AM, Harry Yoo wrote:
> > On Fri, Nov 07, 2025 at 10:32:52PM -0800, Shakeel Butt wrote:
> > > On Fri, Nov 07, 2025 at 10:20:57PM +0900, Harry Yoo wrote:
> > > >
> > > > Although it's mentioned in the locking documentation, I'm afraid that
> > > > local_lock is not the right interface to use here. Preemption will be
> > > > disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
> > > > updated (in __mod_node_page_state()).
> > > >
> > > > Here we just want to disable IRQ only on !PREEMPT_RT (to update
> > > > the stats safely).
> > >
> > > I don't think there is a need to disable IRQs. There are three stats
> > > update functions called in that hunk.
> > >
> > > 1) __mod_lruvec_state
> > > 2) __count_vm_events
> > > 3) count_memcg_events
> > >
> > > count_memcg_events() can be called with IRQs enabled. __count_vm_events()
> > > can be replaced with count_vm_events().
> >
> > Right.
> >
> > > For __mod_lruvec_state, the
> > > __mod_node_page_state() inside needs preemption disabled.
> >
> > The function __mod_node_page_state() disables preemption.
> > And there's a comment in __mod_zone_page_state():
> >
> > > /*
> > > * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
> > > * atomicity is provided by IRQs being disabled -- either explicitly
> > > * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
> > > * CPU migrations and preemption potentially corrupts a counter so
> > > * disable preemption.
> > > */
> > > preempt_disable_nested();
> >
> > We're relying on IRQs being disabled on !PREEMPT_RT.
>
> So it's possible for us to update vmstat within an interrupt context,
> right?
Yes, for instance when freeing memory in an interrupt context we can
update vmstat and that's why we disable interrupts now.
> There is also a comment above __mod_zone_page_state():
>
> /*
> * For use when we know that interrupts are disabled,
> * or when we know that preemption is disabled and that
> * particular counter cannot be updated from interrupt context.
> */
Yeah we don't have to disable IRQs when we already know it's disabled.
> BTW, the comment inside __mod_node_page_state() should be:
>
> /* See __mod_zone_page_state */
>
> instead of
>
> /* See __mod_node_page_state */
>
> Will fix it.
Right :) Thanks!
> > Maybe we could make it safe against re-entrant IRQ handlers by using
> > read-modify-write operations?
>
> > Isn't it because of the RMW operation that we need to disable IRQs to
> > guarantee atomicity? Or have I misunderstood something?
I meant using atomic operations instead of disabling IRQs, like, by
using this_cpu_add() or cmpxchg() instead.
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 5:43 ` Harry Yoo
@ 2025-11-10 6:11 ` Qi Zheng
2025-11-10 16:47 ` Shakeel Butt
1 sibling, 0 replies; 22+ messages in thread
From: Qi Zheng @ 2025-11-10 6:11 UTC (permalink / raw)
To: Harry Yoo
Cc: Shakeel Butt, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On 11/10/25 1:43 PM, Harry Yoo wrote:
> On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
>>
>>
>> On 11/10/25 10:13 AM, Harry Yoo wrote:
>>> On Fri, Nov 07, 2025 at 10:32:52PM -0800, Shakeel Butt wrote:
>>>> On Fri, Nov 07, 2025 at 10:20:57PM +0900, Harry Yoo wrote:
>>>>>
>>>>> Although it's mentioned in the locking documentation, I'm afraid that
>>>>> local_lock is not the right interface to use here. Preemption will be
>>>>> disabled anyway (on both PREEMPT_RT and !PREEMPT_RT) when the stats are
>>>>> updated (in __mod_node_page_state()).
>>>>>
>>>>> Here we just want to disable IRQ only on !PREEMPT_RT (to update
>>>>> the stats safely).
>>>>
>>>> I don't think there is a need to disable IRQs. There are three stats
>>>> update functions called in that hunk.
>>>>
>>>> 1) __mod_lruvec_state
>>>> 2) __count_vm_events
>>>> 3) count_memcg_events
>>>>
>>>> count_memcg_events() can be called with IRQs enabled. __count_vm_events()
>>>> can be replaced with count_vm_events().
>>>
>>> Right.
>>>
>>>> For __mod_lruvec_state, the
>>>> __mod_node_page_state() inside needs preemption disabled.
>>>
>>> The function __mod_node_page_state() disables preemption.
>>> And there's a comment in __mod_zone_page_state():
>>>
>>>> /*
>>>> * Accurate vmstat updates require a RMW. On !PREEMPT_RT kernels,
>>>> * atomicity is provided by IRQs being disabled -- either explicitly
>>>> * or via local_lock_irq. On PREEMPT_RT, local_lock_irq only disables
>>>> * CPU migrations and preemption potentially corrupts a counter so
>>>> * disable preemption.
>>>> */
>>>> preempt_disable_nested();
>>>
>>> We're relying on IRQs being disabled on !PREEMPT_RT.
>>
>> So it's possible for us to update vmstat within an interrupt context,
>> right?
>
> Yes, for instance when freeing memory in an interrupt context we can
> update vmstat and that's why we disable interrupts now.
Got it.
>
>> There is also a comment above __mod_zone_page_state():
>>
>> /*
>> * For use when we know that interrupts are disabled,
>> * or when we know that preemption is disabled and that
>> * particular counter cannot be updated from interrupt context.
>> */
>
> Yeah we don't have to disable IRQs when we already know it's disabled.
>
>> BTW, the comment inside __mod_node_page_state() should be:
>>
>> /* See __mod_zone_page_state */
>>
>> instead of
>>
>> /* See __mod_node_page_state */
>>
>> Will fix it.
>
> Right :) Thanks!
>
>>> Maybe we could make it safe against re-entrant IRQ handlers by using
>>> read-modify-write operations?
>>
>> Isn't it because of the RMW operation that we need to disable IRQs to
>> guarantee atomicity? Or have I misunderstood something?
>
> I meant using atomic operations instead of disabling IRQs, like, by
> using this_cpu_add() or cmpxchg() instead.
Got it. I will give it a try.
Thanks,
Qi
>
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 5:43 ` Harry Yoo
2025-11-10 6:11 ` Qi Zheng
@ 2025-11-10 16:47 ` Shakeel Butt
2025-11-11 0:42 ` Harry Yoo
2025-11-11 3:04 ` Qi Zheng
1 sibling, 2 replies; 22+ messages in thread
From: Shakeel Butt @ 2025-11-10 16:47 UTC (permalink / raw)
To: Harry Yoo
Cc: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
> On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
> > > Maybe we could make it safe against re-entrant IRQ handlers by using
> > > read-modify-write operations?
> >
> > Isn't it because of the RMW operation that we need to use IRQ to
> > guarantee atomicity? Or have I misunderstood something?
>
> I meant using atomic operations instead of disabling IRQs, like, by
> using this_cpu_add() or cmpxchg() instead.
We already have mod_node_page_state() which is safe from IRQs and is
optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
includes x86 and arm64.
Let me send the patch to cleanup the memcg code which uses
__mod_node_page_state.
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 16:47 ` Shakeel Butt
@ 2025-11-11 0:42 ` Harry Yoo
2025-11-11 3:04 ` Qi Zheng
1 sibling, 0 replies; 22+ messages in thread
From: Harry Yoo @ 2025-11-11 0:42 UTC (permalink / raw)
To: Shakeel Butt
Cc: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Mon, Nov 10, 2025 at 08:47:57AM -0800, Shakeel Butt wrote:
> On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
> > On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
> > > > Maybe we could make it safe against re-entrant IRQ handlers by using
> > > > read-modify-write operations?
> > >
> > > Isn't it because of the RMW operation that we need to use IRQ to
> > > guarantee atomicity? Or have I misunderstood something?
> >
> > I meant using atomic operations instead of disabling IRQs, like, by
> > using this_cpu_add() or cmpxchg() instead.
>
> We already have mod_node_page_state() which is safe from IRQs and is
> optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
> includes x86 and arm64.
Nice observation, thanks!
> Let me send the patch to cleanup the memcg code which uses
> __mod_node_page_state.
I'll take a look.
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-10 16:47 ` Shakeel Butt
2025-11-11 0:42 ` Harry Yoo
@ 2025-11-11 3:04 ` Qi Zheng
2025-11-11 3:16 ` Harry Yoo
2025-11-11 3:17 ` Shakeel Butt
1 sibling, 2 replies; 22+ messages in thread
From: Qi Zheng @ 2025-11-11 3:04 UTC (permalink / raw)
To: Shakeel Butt, Harry Yoo
Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On 11/11/25 12:47 AM, Shakeel Butt wrote:
> On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
>> On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
>>>> Maybe we could make it safe against re-entrant IRQ handlers by using
>>>> read-modify-write operations?
>>>
>>> Isn't it because of the RMW operation that we need to use IRQ to
>>> guarantee atomicity? Or have I misunderstood something?
>>
>> I meant using atomic operations instead of disabling IRQs, like, by
>> using this_cpu_add() or cmpxchg() instead.
>
> We already have mod_node_page_state() which is safe from IRQs and is
> optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
> includes x86 and arm64.
However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
>
> Let me send the patch to cleanup the memcg code which uses
> __mod_node_page_state.
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 3:04 ` Qi Zheng
@ 2025-11-11 3:16 ` Harry Yoo
2025-11-11 3:23 ` Qi Zheng
2025-11-11 8:49 ` Sebastian Andrzej Siewior
2025-11-11 3:17 ` Shakeel Butt
1 sibling, 2 replies; 22+ messages in thread
From: Harry Yoo @ 2025-11-11 3:16 UTC (permalink / raw)
To: Qi Zheng
Cc: Shakeel Butt, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Tue, Nov 11, 2025 at 11:04:09AM +0800, Qi Zheng wrote:
>
> On 11/11/25 12:47 AM, Shakeel Butt wrote:
> > On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
> > > On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
> > > > > Maybe we could make it safe against re-entrant IRQ handlers by using
> > > > > read-modify-write operations?
> > > >
> > > > Isn't it because of the RMW operation that we need to use IRQ to
> > > > guarantee atomicity? Or have I misunderstood something?
> > >
> > > I meant using atomic operations instead of disabling IRQs, like, by
> > > using this_cpu_add() or cmpxchg() instead.
> >
> > We already have mod_node_page_state() which is safe from IRQs and is
> > optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
> > includes x86 and arm64.
>
> However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
Hmm I was going to say it's necessary, but AFAICT we don't allocate
or free memory in hardirq context on PREEMPT_RT (that's the policy)
and so I'd say it's not necessary to disable IRQs.
Sounds like we still want to disable IRQs only on !PREEMPT_RT on
such architectures?
Not sure how seriously PREEMPT_RT folks care about architectures
without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
but don't have HAVE_CMPXCHG_LOCAL.)
If they do care, this can be done as a separate patch series because
we already call local_irq_{save,restore}() in many places in mm/vmstat.c
if the architecture does not have HAVE_CMPXCHG_LOCAL.
> > Let me send the patch to cleanup the memcg code which uses
> > __mod_node_page_state.
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 3:04 ` Qi Zheng
2025-11-11 3:16 ` Harry Yoo
@ 2025-11-11 3:17 ` Shakeel Butt
2025-11-11 3:24 ` Qi Zheng
1 sibling, 1 reply; 22+ messages in thread
From: Shakeel Butt @ 2025-11-11 3:17 UTC (permalink / raw)
To: Qi Zheng
Cc: Harry Yoo, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On Tue, Nov 11, 2025 at 11:04:09AM +0800, Qi Zheng wrote:
>
> On 11/11/25 12:47 AM, Shakeel Butt wrote:
> > On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
> > > On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
> > > > > Maybe we could make it safe against re-entrant IRQ handlers by using
> > > > > read-modify-write operations?
> > > >
> > > > Isn't it because of the RMW operation that we need to use IRQ to
> > > > guarantee atomicity? Or have I misunderstood something?
> > >
> > > I meant using atomic operations instead of disabling IRQs, like, by
> > > using this_cpu_add() or cmpxchg() instead.
> >
> > We already have mod_node_page_state() which is safe from IRQs and is
> > optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
> > includes x86 and arm64.
>
> However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
>
Yes, we can disable irqs on PREEMPT_RT, but it is usually frowned upon
and is usually requested only for a short window. However, if someone
running PREEMPT_RT on an arch without HAVE_CMPXCHG_LOCAL has issues
with mod_node_page_state(), they can solve it then. I don't think we
need to fix that now.
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 3:16 ` Harry Yoo
@ 2025-11-11 3:23 ` Qi Zheng
2025-11-11 8:49 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 22+ messages in thread
From: Qi Zheng @ 2025-11-11 3:23 UTC (permalink / raw)
To: Harry Yoo
Cc: Shakeel Butt, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On 11/11/25 11:16 AM, Harry Yoo wrote:
> On Tue, Nov 11, 2025 at 11:04:09AM +0800, Qi Zheng wrote:
>>
>> On 11/11/25 12:47 AM, Shakeel Butt wrote:
>>> On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
>>>> On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
>>>>>> Maybe we could make it safe against re-entrant IRQ handlers by using
>>>>>> read-modify-write operations?
>>>>>
>>>>> Isn't it because of the RMW operation that we need to use IRQ to
>>>>> guarantee atomicity? Or have I misunderstood something?
>>>>
>>>> I meant using atomic operations instead of disabling IRQs, like, by
>>>> using this_cpu_add() or cmpxchg() instead.
>>>
>>> We already have mod_node_page_state() which is safe from IRQs and is
>>> optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
>>> includes x86 and arm64.
>>
>> However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
>> still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
>
> Hmm I was going to say it's necessary, but AFAICT we don't allocate
> or free memory in hardirq context on PREEMPT_RT (that's the policy)
> and so I'd say it's not necessary to disable IRQs.
>
> Sounds like we still want to disable IRQs only on !PREEMPT_RT on
> such architectures?
>
> Not sure how seriously do PREEMPT_RT folks care about architectures
> without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
> but doesn't have HAVE_CMPXCHG_LOCAL).
>
> If they do care, this can be done as a separate patch series because
> we already call local_irq_{save,restore}() in many places in mm/vmstat.c
> if the architecture doesn't not have HAVE_CMPXCHG_LOCAL.
Got it. I will ignore it for now.
>
>>> Let me send the patch to cleanup the memcg code which uses
>>> __mod_node_page_state.
>
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 3:17 ` Shakeel Butt
@ 2025-11-11 3:24 ` Qi Zheng
0 siblings, 0 replies; 22+ messages in thread
From: Qi Zheng @ 2025-11-11 3:24 UTC (permalink / raw)
To: Shakeel Butt
Cc: Harry Yoo, hannes, hughd, mhocko, roman.gushchin, muchun.song,
david, lorenzo.stoakes, ziy, imran.f.khan, kamalesh.babulal,
axelrasmussen, yuanchu, weixugc, akpm, linux-mm, linux-kernel,
cgroups, Muchun Song, Qi Zheng, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel
On 11/11/25 11:17 AM, Shakeel Butt wrote:
> On Tue, Nov 11, 2025 at 11:04:09AM +0800, Qi Zheng wrote:
>>
>> On 11/11/25 12:47 AM, Shakeel Butt wrote:
>>> On Mon, Nov 10, 2025 at 02:43:21PM +0900, Harry Yoo wrote:
>>>> On Mon, Nov 10, 2025 at 12:30:06PM +0800, Qi Zheng wrote:
>>>>>> Maybe we could make it safe against re-entrant IRQ handlers by using
>>>>>> read-modify-write operations?
>>>>>
>>>>> Isn't it because of the RMW operation that we need to use IRQ to
>>>>> guarantee atomicity? Or have I misunderstood something?
>>>>
>>>> I meant using atomic operations instead of disabling IRQs, like, by
>>>> using this_cpu_add() or cmpxchg() instead.
>>>
>>> We already have mod_node_page_state() which is safe from IRQs and is
>>> optimized to not disable IRQs for archs with HAVE_CMPXCHG_LOCAL which
>>> includes x86 and arm64.
>>
>> However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
>> still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
>>
>
> Yes we can disable irqs on PREEMPT_RT but it is usually frown upon and
> it is usually requested to do so only for short window. However if
Got it.
> someone running PREEMPT_RT on an arch without HAVE_CMPXCHG_LOCAL and has
> issues with mod_node_page_state() then they can solve it then. I don't
> think we need to fix that now.
OK.
>
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 3:16 ` Harry Yoo
2025-11-11 3:23 ` Qi Zheng
@ 2025-11-11 8:49 ` Sebastian Andrzej Siewior
2025-11-11 16:44 ` Shakeel Butt
1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-11 8:49 UTC (permalink / raw)
To: Harry Yoo
Cc: Qi Zheng, Shakeel Butt, hannes, hughd, mhocko, roman.gushchin,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On 2025-11-11 12:16:43 [+0900], Harry Yoo wrote:
> > However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> > still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
>
> Hmm I was going to say it's necessary, but AFAICT we don't allocate
> or free memory in hardirq context on PREEMPT_RT (that's the policy)
> and so I'd say it's not necessary to disable IRQs.
>
> Sounds like we still want to disable IRQs only on !PREEMPT_RT on
> such architectures?
>
> Not sure how seriously do PREEMPT_RT folks care about architectures
> without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
> but doesn't have HAVE_CMPXCHG_LOCAL).
We take things seriously, and you shouldn't make assumptions based on
the implementation. Either the API can be used as such or not.
In case of mod_node_page_state(), the non-IRQ off version
(__mod_node_page_state()) has a preempt_disable_nested() to ensure
atomic update on PREEMPT_RT without disabling interrupts.
Sebastian
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 8:49 ` Sebastian Andrzej Siewior
@ 2025-11-11 16:44 ` Shakeel Butt
2025-11-12 7:49 ` Sebastian Andrzej Siewior
2025-11-12 15:45 ` Steven Rostedt
0 siblings, 2 replies; 22+ messages in thread
From: Shakeel Butt @ 2025-11-11 16:44 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Harry Yoo, Qi Zheng, hannes, hughd, mhocko, roman.gushchin,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On Tue, Nov 11, 2025 at 09:49:00AM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-11-11 12:16:43 [+0900], Harry Yoo wrote:
> > > However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> > > still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
> >
> > Hmm I was going to say it's necessary, but AFAICT we don't allocate
> > or free memory in hardirq context on PREEMPT_RT (that's the policy)
> > and so I'd say it's not necessary to disable IRQs.
> >
> > Sounds like we still want to disable IRQs only on !PREEMPT_RT on
> > such architectures?
> >
> > Not sure how seriously do PREEMPT_RT folks care about architectures
> > without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
> > but doesn't have HAVE_CMPXCHG_LOCAL).
>
> We take things seriously and you shouldn't make assumption based on
> implementation. Either the API can be used as such or not.
> In case of mod_node_page_state(), the non-IRQ off version
> (__mod_node_page_state()) has a preempt_disable_nested() to ensure
> atomic update on PREEMPT_RT without disabling interrupts.
>
Harry is talking about mod_node_page_state() on
!CONFIG_HAVE_CMPXCHG_LOCAL which is disabling irqs.
void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
long delta)
{
unsigned long flags;
local_irq_save(flags);
__mod_node_page_state(pgdat, item, delta);
local_irq_restore(flags);
}
Is PREEMPT_RT fine with this?
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 16:44 ` Shakeel Butt
@ 2025-11-12 7:49 ` Sebastian Andrzej Siewior
2025-11-12 8:46 ` Harry Yoo
2025-11-12 15:45 ` Steven Rostedt
1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-12 7:49 UTC (permalink / raw)
To: Shakeel Butt
Cc: Harry Yoo, Qi Zheng, hannes, hughd, mhocko, roman.gushchin,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On 2025-11-11 08:44:14 [-0800], Shakeel Butt wrote:
> On Tue, Nov 11, 2025 at 09:49:00AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-11-11 12:16:43 [+0900], Harry Yoo wrote:
> > > > However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> > > > still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
> > >
> > > Hmm I was going to say it's necessary, but AFAICT we don't allocate
> > > or free memory in hardirq context on PREEMPT_RT (that's the policy)
> > > and so I'd say it's not necessary to disable IRQs.
> > >
> > > Sounds like we still want to disable IRQs only on !PREEMPT_RT on
> > > such architectures?
> > >
> > > Not sure how seriously do PREEMPT_RT folks care about architectures
> > > without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
> > > but doesn't have HAVE_CMPXCHG_LOCAL).
> >
> > We take things seriously and you shouldn't make assumption based on
> > implementation. Either the API can be used as such or not.
> > In case of mod_node_page_state(), the non-IRQ off version
> > (__mod_node_page_state()) has a preempt_disable_nested() to ensure
> > atomic update on PREEMPT_RT without disabling interrupts.
> >
>
> Harry is talking about mod_node_page_state() on
> !CONFIG_HAVE_CMPXCHG_LOCAL which is disabling irqs.
>
> void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
> long delta)
> {
> unsigned long flags;
>
> local_irq_save(flags);
> __mod_node_page_state(pgdat, item, delta);
> local_irq_restore(flags);
> }
>
> Is PREEMPT_RT fine with this?
Yes.
The local_irq_save() is not strictly needed but I am fine with it to
keep it simple. The inner part is just counting.
Sebastian
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-12 7:49 ` Sebastian Andrzej Siewior
@ 2025-11-12 8:46 ` Harry Yoo
2025-11-12 8:54 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-11-12 8:46 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Shakeel Butt, Qi Zheng, hannes, hughd, mhocko, roman.gushchin,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On Wed, Nov 12, 2025 at 08:49:30AM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-11-11 08:44:14 [-0800], Shakeel Butt wrote:
> > On Tue, Nov 11, 2025 at 09:49:00AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-11-11 12:16:43 [+0900], Harry Yoo wrote:
> > > > > However, in the !CONFIG_HAVE_CMPXCHG_LOCAL case, mod_node_page_state()
> > > > > still calls local_irq_save(). Is this feasible in the PREEMPT_RT kernel?
> > > >
> > > > Hmm I was going to say it's necessary, but AFAICT we don't allocate
> > > > or free memory in hardirq context on PREEMPT_RT (that's the policy)
> > > > and so I'd say it's not necessary to disable IRQs.
> > > >
> > > > Sounds like we still want to disable IRQs only on !PREEMPT_RT on
> > > > such architectures?
> > > >
> > > > Not sure how seriously do PREEMPT_RT folks care about architectures
> > > > without HAVE_CMPXCHG_LOCAL. (riscv and loongarch have ARCH_SUPPORTS_RT
> > > > but doesn't have HAVE_CMPXCHG_LOCAL).
> > >
> > > We take things seriously and you shouldn't make assumption based on
> > > implementation. Either the API can be used as such or not.
> > > In case of mod_node_page_state(), the non-IRQ off version
> > > (__mod_node_page_state()) has a preempt_disable_nested() to ensure
> > > atomic update on PREEMPT_RT without disabling interrupts.
> >
> > Harry is talking about mod_node_page_state() on
> > !CONFIG_HAVE_CMPXCHG_LOCAL which is disabling irqs.
> >
> > void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
> > long delta)
> > {
> > unsigned long flags;
> >
> > local_irq_save(flags);
> > __mod_node_page_state(pgdat, item, delta);
> > local_irq_restore(flags);
> > }
> >
> > Is PREEMPT_RT fine with this?
>
> Yes.
> The local_irq_save() is not strictly needed but I am fine with it to
> keep it simple. The inner part is just counting.
Yeah I was wondering about this... and thanks for confirming!
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-12 8:46 ` Harry Yoo
@ 2025-11-12 8:54 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-12 8:54 UTC (permalink / raw)
To: Harry Yoo
Cc: Shakeel Butt, Qi Zheng, hannes, hughd, mhocko, roman.gushchin,
muchun.song, david, lorenzo.stoakes, ziy, imran.f.khan,
kamalesh.babulal, axelrasmussen, yuanchu, weixugc, akpm, linux-mm,
linux-kernel, cgroups, Muchun Song, Qi Zheng, Clark Williams,
Steven Rostedt, linux-rt-devel
On 2025-11-12 17:46:51 [+0900], Harry Yoo wrote:
> > > Is PREEMPT_RT fine with this?
> >
> > Yes.
> > The local_irq_save() is not strictly needed but I am fine with it to
> > keep it simple. The inner part is just counting.
>
> Yeah I was wondering about this... and thanks for confirming!
The preempt_disable_nested() and the absence of in-IRQ users make it
okay, not the counting; that was meant as "it is a small section". That
might have been too short an explanation.
Sebastian
* Re: [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru()
2025-11-11 16:44 ` Shakeel Butt
2025-11-12 7:49 ` Sebastian Andrzej Siewior
@ 2025-11-12 15:45 ` Steven Rostedt
1 sibling, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2025-11-12 15:45 UTC (permalink / raw)
To: Shakeel Butt
Cc: Sebastian Andrzej Siewior, Harry Yoo, Qi Zheng, hannes, hughd,
mhocko, roman.gushchin, muchun.song, david, lorenzo.stoakes, ziy,
imran.f.khan, kamalesh.babulal, axelrasmussen, yuanchu, weixugc,
akpm, linux-mm, linux-kernel, cgroups, Muchun Song, Qi Zheng,
Clark Williams, linux-rt-devel
On Tue, 11 Nov 2025 08:44:14 -0800
Shakeel Butt <shakeel.butt@linux.dev> wrote:
> Harry is talking about mod_node_page_state() on
> !CONFIG_HAVE_CMPXCHG_LOCAL which is disabling irqs.
>
> void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
> long delta)
> {
> unsigned long flags;
>
> local_irq_save(flags);
> __mod_node_page_state(pgdat, item, delta);
> local_irq_restore(flags);
> }
>
> Is PREEMPT_RT fine with this?
But should be:
void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
long delta)
{
guard(irqsave)();
__mod_node_page_state(pgdat, item, delta);
}
-- Steve
end of thread, other threads:[~2025-11-12 15:45 UTC | newest]
Thread overview: 22+ messages
[not found] <cover.1761658310.git.zhengqi.arch@bytedance.com>
[not found] ` <97ea4728568459f501ddcab6c378c29064630bb9.1761658310.git.zhengqi.arch@bytedance.com>
2025-11-07 5:11 ` [PATCH v1 04/26] mm: vmscan: refactor move_folios_to_lru() Harry Yoo
2025-11-07 6:41 ` Qi Zheng
2025-11-07 13:20 ` Harry Yoo
2025-11-08 6:32 ` Shakeel Butt
2025-11-10 2:13 ` Harry Yoo
2025-11-10 4:30 ` Qi Zheng
2025-11-10 5:43 ` Harry Yoo
2025-11-10 6:11 ` Qi Zheng
2025-11-10 16:47 ` Shakeel Butt
2025-11-11 0:42 ` Harry Yoo
2025-11-11 3:04 ` Qi Zheng
2025-11-11 3:16 ` Harry Yoo
2025-11-11 3:23 ` Qi Zheng
2025-11-11 8:49 ` Sebastian Andrzej Siewior
2025-11-11 16:44 ` Shakeel Butt
2025-11-12 7:49 ` Sebastian Andrzej Siewior
2025-11-12 8:46 ` Harry Yoo
2025-11-12 8:54 ` Sebastian Andrzej Siewior
2025-11-12 15:45 ` Steven Rostedt
2025-11-11 3:17 ` Shakeel Butt
2025-11-11 3:24 ` Qi Zheng
2025-11-07 7:18 ` Sebastian Andrzej Siewior