* [PATCH 06/27] mm: vmstat: Prepare to protect against concurrent isolated cpuset change
[not found] <20250620152308.27492-1-frederic@kernel.org>
@ 2025-06-20 15:22 ` Frederic Weisbecker
2025-06-20 15:22 ` [PATCH 12/27] cpu: Provide lockdep check for CPU hotplug lock write-held Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2025-06-20 15:22 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Marco Crivellari,
Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner,
Vlastimil Babka, Waiman Long, linux-mm
The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at
runtime. In order to synchronize against the vmstat workqueue and make
sure that no asynchronous vmstat work is pending or executing on a newly
isolated CPU, read-lock the housekeeping rwsem while targeting and
queueing a vmstat work.
Whenever housekeeping updates the HK_TYPE_DOMAIN cpumask, a further
change will also issue a vmstat workqueue flush to make sure that no
work remains pending after a CPU has been made isolated.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
mm/vmstat.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 429ae5339bfe..53123675fe31 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2115,11 +2115,13 @@ static void vmstat_shepherd(struct work_struct *w)
* infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
* for all isolated CPUs to avoid interference with the isolated workload.
*/
- if (cpu_is_isolated(cpu))
- continue;
+ scoped_guard(housekeeping) {
+ if (cpu_is_isolated(cpu))
+ continue;
- if (!delayed_work_pending(dw) && need_update(cpu))
- queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
+ if (!delayed_work_pending(dw) && need_update(cpu))
+ queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
+ }
cond_resched();
}
--
2.48.1
* [PATCH 12/27] cpu: Provide lockdep check for CPU hotplug lock write-held
From: Frederic Weisbecker @ 2025-06-20 15:22 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Dennis Zhou,
Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo,
Thomas Gleixner, Vlastimil Babka, Waiman Long, linux-mm
cpuset modifies partitions, including isolated ones, while read-holding
the CPU hotplug lock.
This means that write-holding the CPU hotplug lock is safe for
synchronizing against housekeeping cpumask changes.
Provide a lockdep check to validate that.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/cpuhplock.h | 1 +
include/linux/percpu-rwsem.h | 1 +
kernel/cpu.c | 5 +++++
3 files changed, 7 insertions(+)
diff --git a/include/linux/cpuhplock.h b/include/linux/cpuhplock.h
index f7aa20f62b87..286b3ab92e15 100644
--- a/include/linux/cpuhplock.h
+++ b/include/linux/cpuhplock.h
@@ -13,6 +13,7 @@
struct device;
extern int lockdep_is_cpus_held(void);
+extern int lockdep_is_cpus_write_held(void);
#ifdef CONFIG_HOTPLUG_CPU
void cpus_write_lock(void);
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 288f5235649a..c8cb010d655e 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -161,6 +161,7 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
__percpu_init_rwsem(sem, #sem, &rwsem_key); \
})
+#define percpu_rwsem_is_write_held(sem) lockdep_is_held_type(sem, 0)
#define percpu_rwsem_is_held(sem) lockdep_is_held(sem)
#define percpu_rwsem_assert_held(sem) lockdep_assert_held(sem)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 069fce6c7eae..ccf11a17c7fd 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -533,6 +533,11 @@ int lockdep_is_cpus_held(void)
{
return percpu_rwsem_is_held(&cpu_hotplug_lock);
}
+
+int lockdep_is_cpus_write_held(void)
+{
+ return percpu_rwsem_is_write_held(&cpu_hotplug_lock);
+}
#endif
static void lockdep_acquire_cpus_lock(void)
--
2.48.1
* [PATCH 16/27] sched/isolation: Flush memcg workqueues on cpuset isolated partition change
From: Frederic Weisbecker @ 2025-06-20 15:22 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Johannes Weiner,
Marco Crivellari, Michal Hocko, Michal Hocko, Muchun Song,
Peter Zijlstra, Roman Gushchin, Shakeel Butt, Tejun Heo,
Thomas Gleixner, Vlastimil Babka, Waiman Long, cgroups, linux-mm
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
order to synchronize against the memcg workqueue and make sure that no
asynchronous draining is still pending or executing on a newly isolated
CPU, the housekeeping subsystem must flush the memcg workqueues.
However the memcg workqueues can't be flushed easily since they are
queued to the main per-CPU workqueue pool.
Solve this by creating a memcg-specific pool and by providing and using
the appropriate flushing API.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/memcontrol.h | 4 ++++
kernel/sched/isolation.c | 2 ++
kernel/sched/sched.h | 1 +
mm/memcontrol.c | 12 +++++++++++-
4 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 87b6688f124a..ef5036c6bf04 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1046,6 +1046,8 @@ static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
return id;
}
+void mem_cgroup_flush_workqueue(void);
+
extern int mem_cgroup_init(void);
#else /* CONFIG_MEMCG */
@@ -1451,6 +1453,8 @@ static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
return 0;
}
+static inline void mem_cgroup_flush_workqueue(void) { }
+
static inline int mem_cgroup_init(void) { return 0; }
#endif /* CONFIG_MEMCG */
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 7814d60be87e..6fb0c7956516 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -140,6 +140,8 @@ int housekeeping_update(struct cpumask *mask, enum hk_type type)
synchronize_rcu();
+ mem_cgroup_flush_workqueue();
+
kfree(old);
return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 04094567cad4..53107c021fe9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -44,6 +44,7 @@
#include <linux/lockdep_api.h>
#include <linux/lockdep.h>
#include <linux/memblock.h>
+#include <linux/memcontrol.h>
#include <linux/minmax.h>
#include <linux/mm.h>
#include <linux/module.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 29d44af6c426..928b90cdb5ba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -96,6 +96,8 @@ static bool cgroup_memory_nokmem __ro_after_init;
/* BPF memory accounting disabled? */
static bool cgroup_memory_nobpf __ro_after_init;
+static struct workqueue_struct *memcg_wq __ro_after_init;
+
static struct kmem_cache *memcg_cachep;
static struct kmem_cache *memcg_pn_cachep;
@@ -1979,7 +1981,7 @@ static void schedule_drain_work(int cpu, struct work_struct *work)
{
housekeeping_lock();
if (!cpu_is_isolated(cpu))
- schedule_work_on(cpu, work);
+ queue_work_on(cpu, memcg_wq, work);
housekeeping_unlock();
}
@@ -5140,6 +5142,11 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
refill_stock(memcg, nr_pages);
}
+void mem_cgroup_flush_workqueue(void)
+{
+ flush_workqueue(memcg_wq);
+}
+
static int __init cgroup_memory(char *s)
{
char *token;
@@ -5182,6 +5189,9 @@ int __init mem_cgroup_init(void)
cpuhp_setup_state_nocalls(CPUHP_MM_MEMCQ_DEAD, "mm/memctrl:dead", NULL,
memcg_hotplug_cpu_dead);
+ memcg_wq = alloc_workqueue("memcg", 0, 0);
+ WARN_ON(!memcg_wq);
+
for_each_possible_cpu(cpu) {
INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
drain_local_memcg_stock);
--
2.48.1
* [PATCH 17/27] sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
From: Frederic Weisbecker @ 2025-06-20 15:22 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marco Crivellari,
Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner,
Vlastimil Babka, Waiman Long, linux-mm
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime.
In order to synchronize against the vmstat workqueue and make sure
that no asynchronous vmstat work is still pending or executing on a
newly isolated CPU, the housekeeping subsystem must flush the vmstat
workqueues.
This involves flushing the whole mm_percpu_wq workqueue, which is
shared with the LRU drain, a welcome side effect.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/vmstat.h | 2 ++
kernel/sched/isolation.c | 1 +
kernel/sched/sched.h | 1 +
mm/vmstat.c | 5 +++++
4 files changed, 9 insertions(+)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index b2ccb6845595..ba7caacdf356 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -303,6 +303,7 @@ int calculate_pressure_threshold(struct zone *zone);
int calculate_normal_threshold(struct zone *zone);
void set_pgdat_percpu_threshold(pg_data_t *pgdat,
int (*calculate_pressure)(struct zone *));
+void vmstat_flush_workqueue(void);
#else /* CONFIG_SMP */
/*
@@ -403,6 +404,7 @@ static inline void __dec_node_page_state(struct page *page,
static inline void refresh_zone_stat_thresholds(void) { }
static inline void cpu_vm_stats_fold(int cpu) { }
static inline void quiet_vmstat(void) { }
+static inline void vmstat_flush_workqueue(void) { }
static inline void drain_zonestat(struct zone *zone,
struct per_cpu_zonestat *pzstats) { }
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 6fb0c7956516..0119685796be 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -141,6 +141,7 @@ int housekeeping_update(struct cpumask *mask, enum hk_type type)
synchronize_rcu();
mem_cgroup_flush_workqueue();
+ vmstat_flush_workqueue();
kfree(old);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 53107c021fe9..e2c4258cb818 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -69,6 +69,7 @@
#include <linux/types.h>
#include <linux/u64_stats_sync_api.h>
#include <linux/uaccess.h>
+#include <linux/vmstat.h>
#include <linux/wait_api.h>
#include <linux/wait_bit.h>
#include <linux/workqueue_api.h>
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 53123675fe31..5d462fe12548 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2095,6 +2095,11 @@ static void vmstat_shepherd(struct work_struct *w);
static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
+void vmstat_flush_workqueue(void)
+{
+ flush_workqueue(mm_percpu_wq);
+}
+
static void vmstat_shepherd(struct work_struct *w)
{
int cpu;
--
2.48.1
* Re: [PATCH 16/27] sched/isolation: Flush memcg workqueues on cpuset isolated partition change
From: Shakeel Butt @ 2025-06-20 19:30 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Andrew Morton, Ingo Molnar, Johannes Weiner,
Marco Crivellari, Michal Hocko, Michal Hocko, Muchun Song,
Peter Zijlstra, Roman Gushchin, Tejun Heo, Thomas Gleixner,
Vlastimil Babka, Waiman Long, cgroups, linux-mm
On Fri, Jun 20, 2025 at 05:22:57PM +0200, Frederic Weisbecker wrote:
> The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
> order to synchronize against the memcg workqueue and make sure that no
> asynchronous draining is still pending or executing on a newly isolated
> CPU, the housekeeping subsystem must flush the memcg workqueues.
>
> However the memcg workqueues can't be flushed easily since they are
> queued to the main per-CPU workqueue pool.
>
> Solve this by creating a memcg-specific pool and by providing and using
> the appropriate flushing API.
>
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>