* [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
@ 2025-05-07 11:14 Chen Yu
2025-05-07 11:17 ` [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads Chen Yu
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Chen Yu @ 2025-05-07 11:14 UTC (permalink / raw)
To: Peter Zijlstra, Andrew Morton
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Venkat Rao Bagalkote, Jain, Ayush, cgroups,
linux-doc, linux-mm, linux-kernel, Chen Yu, Chen Yu
Introducing the task migration and swap statistics in the following places:
/sys/fs/cgroup/{GROUP}/memory.stat
/proc/{PID}/sched
/proc/vmstat
These statistics facilitate a rapid evaluation of the performance and resource
utilization of the target workload.
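As a quick, hypothetical illustration (not part of this series): both
/proc/vmstat and memory.stat expose these counters as plain "name value"
lines, so they can be read with a few lines of C. The sketch below assumes
the series is applied and uses the numa_task_migrated / numa_task_swapped
names introduced in patch 2.

#include <stdio.h>
#include <string.h>

/* Print the NUMA-balancing task counters from a stat file such as
 * /proc/vmstat or /sys/fs/cgroup/<group>/memory.stat, both of which
 * consist of "name value" lines. */
static void show_numa_task_stats(const char *path)
{
	FILE *fp = fopen(path, "r");
	char line[256];

	if (!fp)
		return;
	while (fgets(line, sizeof(line), fp)) {
		if (!strncmp(line, "numa_task_migrated", 18) ||
		    !strncmp(line, "numa_task_swapped", 17))
			printf("%s: %s", path, line);
	}
	fclose(fp);
}

int main(void)
{
	show_numa_task_stats("/proc/vmstat");
	return 0;
}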
Patch 1 is a fix from Libo to avoid task swapping for kernel threads,
because NUMA balancing only cares about user pages mapped via VMAs.
Patch 2 is the major change to expose the statistics of task migration and
swapping in corresponding files.
The reason to fold patch 1 and patch 2 into one patch set is that patch 1
is necessary for patch 2 to avoid accessing a NULL mm_struct from a kernel
thread, which would cause a NULL pointer exception.
The Tested-by and Acked-by tags are preserved, because these tags were
provided on version 1, which already had the p->mm check.
Previous version:
v3:
https://lore.kernel.org/lkml/20250430103623.3349842-1-yu.c.chen@intel.com/
v2:
https://lore.kernel.org/lkml/20250408101444.192519-1-yu.c.chen@intel.com/
v1:
https://lore.kernel.org/lkml/20250402010611.3204674-1-yu.c.chen@intel.com/
Chen Yu (1):
sched/numa: add statistics of numa balance task migration
Libo Chen (1):
sched/numa: fix task swap by skipping kernel threads
Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
include/linux/sched.h | 4 ++++
include/linux/vm_event_item.h | 2 ++
kernel/sched/core.c | 9 +++++++--
kernel/sched/debug.c | 4 ++++
kernel/sched/fair.c | 3 ++-
mm/memcontrol.c | 2 ++
mm/vmstat.c | 2 ++
8 files changed, 29 insertions(+), 3 deletions(-)
--
2.25.1
* [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads
2025-05-07 11:14 [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
@ 2025-05-07 11:17 ` Chen Yu
2025-05-07 11:17 ` [PATCH v4 2/2] sched/numa: add statistics of numa balance task migration Chen Yu
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Chen Yu @ 2025-05-07 11:17 UTC (permalink / raw)
To: Peter Zijlstra, Andrew Morton
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Venkat Rao Bagalkote, Jain, Ayush, cgroups,
linux-doc, linux-mm, linux-kernel, Chen Yu, Ayush Jain, Chen Yu
From: Libo Chen <libo.chen@oracle.com>
Task swapping is triggered when there are no idle CPUs in
task A's preferred node. In this case, the NUMA load balancer
chooses a task B on A's preferred node and swaps B with A. This
helps improve NUMA locality without introducing load imbalance
between nodes.
In the current implementation, B's NUMA node preference is not
mandatory, and it aims not to increase load imbalance. That is
to say, a kernel thread might be chosen as B. However, kernel
threads are not supposed to be covered by NUMA balancing because
NUMA balancing only considers user pages via VMAs.
Fix this by not considering kernel threads as swap targets in
task_numa_compare(). This can be extended beyond kernel threads
in the future by checking if a swap candidate has a valid NUMA
preference through checking the candidate's numa_preferred_nid
and numa_faults. For now, keep the code simple.
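For illustration only (not part of this patch), a rough sketch of what a
stricter candidate check might look like, built on the existing
numa_preferred_nid and numa_faults fields; the helper name is hypothetical:

/* Hypothetical helper, not applied by this patch: require that the swap
 * candidate participates in NUMA balancing and has a recorded preference. */
static bool numa_swap_candidate_ok(struct task_struct *cur)
{
	/* Kernel threads have no user address space to balance. */
	if (!cur->mm)
		return false;
	/* No preferred node or no NUMA fault statistics collected yet. */
	if (cur->numa_preferred_nid == NUMA_NO_NODE || !cur->numa_faults)
		return false;
	return true;
}

This patch itself only adds the !cur->mm test in task_numa_compare().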
Suggested-by: Michal Koutny <mkoutny@suse.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fb9bf995a47..d1af2e084a2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2273,7 +2273,8 @@ static bool task_numa_compare(struct task_numa_env *env,
rcu_read_lock();
cur = rcu_dereference(dst_rq->curr);
- if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+ if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur) ||
+ !cur->mm))
cur = NULL;
/*
--
2.25.1
* [PATCH v4 2/2] sched/numa: add statistics of numa balance task migration
2025-05-07 11:14 [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
2025-05-07 11:17 ` [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads Chen Yu
@ 2025-05-07 11:17 ` Chen Yu
2025-05-07 14:32 ` [PATCH v4 0/2] " Venkat Rao Bagalkote
2025-05-08 10:23 ` Venkat Rao Bagalkote
3 siblings, 0 replies; 7+ messages in thread
From: Chen Yu @ 2025-05-07 11:17 UTC (permalink / raw)
To: Peter Zijlstra, Andrew Morton
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Venkat Rao Bagalkote, Jain, Ayush, cgroups,
linux-doc, linux-mm, linux-kernel, Chen Yu, Chen Yu
On systems with NUMA balancing enabled, it has been found
that tracking task activities resulting from NUMA balancing
is beneficial. NUMA balancing employs two mechanisms for task
migration: one is to migrate a task to an idle CPU within its
preferred node, and the other is to swap tasks located on
different nodes when they are on each other's preferred nodes.
The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However,
it lacks statistics regarding task migration and swapping.
Therefore, relevant counts for task migration and swapping should
be added.
The following two new fields:
numa_task_migrated
numa_task_swapped
will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched,
and /proc/vmstat.
Introducing both per-task and per-memory cgroup (memcg) NUMA
balancing statistics facilitates a rapid evaluation of the
performance and resource utilization of the target workload.
For instance, users can first identify the container with high
NUMA balancing activity and then further pinpoint a specific
task within that group, and subsequently adjust the memory policy
for that task. In short, although it is possible to iterate through
/proc/$pid/sched to locate the problematic task, the introduction
of aggregated NUMA balancing activity for tasks within each memcg
can assist users in identifying the task more efficiently through
a divide-and-conquer approach.
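As a hypothetical sketch of the per-group step (not part of this patch),
the aggregated activity of one memcg can be summed from its memory.stat;
the cgroup v2 mount point /sys/fs/cgroup and the group name "mytest" are
assumptions of the example:

#include <stdio.h>
#include <string.h>

/* Sum numa_task_migrated + numa_task_swapped for one cgroup, so the
 * busiest group can be found before drilling down into the
 * /proc/<pid>/sched files of the tasks inside it. */
static unsigned long long group_numa_task_activity(const char *group)
{
	char path[256], name[64];
	unsigned long long val, sum = 0;
	FILE *fp;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/memory.stat", group);
	fp = fopen(path, "r");
	if (!fp)
		return 0;
	while (fscanf(fp, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "numa_task_migrated") ||
		    !strcmp(name, "numa_task_swapped"))
			sum += val;
	}
	fclose(fp);
	return sum;
}

int main(void)
{
	printf("%llu\n", group_numa_task_activity("mytest"));
	return 0;
}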
As Libo Chen pointed out, the memcg events rely on the text names in
vmstat_text, and /proc/vmstat generates its items based on vmstat_text.
Thus, the task migration and swapping events newly added to vmstat_text
also need to be updated via count_vm_numa_event(); otherwise their values
stay zero in /proc/vmstat.
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
Populate /proc/vmstat, otherwise the items are all zero.
(Libo)
v2->v3:
Remove the unnecessary p->mm check because kernel threads are
not supported by NUMA balancing. (Libo Chen)
v1->v2:
Update the Documentation/admin-guide/cgroup-v2.rst. (Michal)
---
Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
include/linux/sched.h | 4 ++++
include/linux/vm_event_item.h | 2 ++
kernel/sched/core.c | 9 +++++++--
kernel/sched/debug.c | 4 ++++
mm/memcontrol.c | 2 ++
mm/vmstat.c | 2 ++
7 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..d346f3235945 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1670,6 +1670,12 @@ The following nested keys are defined.
numa_hint_faults (npn)
Number of NUMA hinting faults.
+ numa_task_migrated (npn)
+ Number of task migrations by NUMA balancing.
+
+ numa_task_swapped (npn)
+ Number of task swaps by NUMA balancing.
+
pgdemote_kswapd
Number of pages demoted by kswapd.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..1c50e30b5c01 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -549,6 +549,10 @@ struct sched_statistics {
u64 nr_failed_migrations_running;
u64 nr_failed_migrations_hot;
u64 nr_forced_migrations;
+#ifdef CONFIG_NUMA_BALANCING
+ u64 numa_task_migrated;
+ u64 numa_task_swapped;
+#endif
u64 nr_wakeups;
u64 nr_wakeups_sync;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 9e15a088ba38..91a3ce9a2687 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -66,6 +66,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
NUMA_HINT_FAULTS,
NUMA_HINT_FAULTS_LOCAL,
NUMA_PAGE_MIGRATE,
+ NUMA_TASK_MIGRATE,
+ NUMA_TASK_SWAP,
#endif
#ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba0..62b033199e9c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3352,6 +3352,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
#ifdef CONFIG_NUMA_BALANCING
static void __migrate_swap_task(struct task_struct *p, int cpu)
{
+ __schedstat_inc(p->stats.numa_task_swapped);
+ count_vm_numa_event(NUMA_TASK_SWAP);
+ count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
+
if (task_on_rq_queued(p)) {
struct rq *src_rq, *dst_rq;
struct rq_flags srf, drf;
@@ -7953,8 +7957,9 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
return -EINVAL;
- /* TODO: This is not properly updating schedstats */
-
+ __schedstat_inc(p->stats.numa_task_migrated);
+ count_vm_numa_event(NUMA_TASK_MIGRATE);
+ count_memcg_event_mm(p->mm, NUMA_TASK_MIGRATE);
trace_sched_move_numa(p, curr_cpu, target_cpu);
return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
}
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 56ae54e0ce6a..f971c2af7912 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1206,6 +1206,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
P_SCHEDSTAT(nr_failed_migrations_running);
P_SCHEDSTAT(nr_failed_migrations_hot);
P_SCHEDSTAT(nr_forced_migrations);
+#ifdef CONFIG_NUMA_BALANCING
+ P_SCHEDSTAT(numa_task_migrated);
+ P_SCHEDSTAT(numa_task_swapped);
+#endif
P_SCHEDSTAT(nr_wakeups);
P_SCHEDSTAT(nr_wakeups_sync);
P_SCHEDSTAT(nr_wakeups_migrate);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c96c1f2b9cf5..cdaab8a957f3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -463,6 +463,8 @@ static const unsigned int memcg_vm_event_stat[] = {
NUMA_PAGE_MIGRATE,
NUMA_PTE_UPDATES,
NUMA_HINT_FAULTS,
+ NUMA_TASK_MIGRATE,
+ NUMA_TASK_SWAP,
#endif
};
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4c268ce39ff2..ed08bb384ae4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1347,6 +1347,8 @@ const char * const vmstat_text[] = {
"numa_hint_faults",
"numa_hint_faults_local",
"numa_pages_migrated",
+ "numa_task_migrated",
+ "numa_task_swapped",
#endif
#ifdef CONFIG_MIGRATION
"pgmigrate_success",
--
2.25.1
* Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
2025-05-07 11:14 [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
2025-05-07 11:17 ` [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads Chen Yu
2025-05-07 11:17 ` [PATCH v4 2/2] sched/numa: add statistics of numa balance task migration Chen Yu
@ 2025-05-07 14:32 ` Venkat Rao Bagalkote
2025-05-07 14:52 ` Chen, Yu C
2025-05-08 10:23 ` Venkat Rao Bagalkote
3 siblings, 1 reply; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2025-05-07 14:32 UTC (permalink / raw)
To: Chen Yu, Peter Zijlstra, Andrew Morton
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Jain, Ayush, cgroups, linux-doc, linux-mm,
linux-kernel, Chen Yu
Hello Chenyu,
On 07/05/25 4:44 pm, Chen Yu wrote:
> Introducing the task migration and swap statistics in the following places:
> /sys/fs/cgroup/{GROUP}/memory.stat
> /proc/{PID}/sched
> /proc/vmstat
>
> These statistics facilitate a rapid evaluation of the performance and resource
> utilization of the target workload.
>
> Patch 1 is a fix from Libo to avoid task swapping for kernel threads,
> because Numa balance only cares about the user pages via VMA.
>
> Patch 2 is the major change to expose the statistics of task migration and
> swapping in corresponding files.
>
> The reason to fold patch 1 and patch 2 into 1 patch set is that patch 1 is
> necessary for patch 2 to avoid accessing a NULL mm_struct from a kernel
> thread, which causes NULL pointer exception.
>
> The Tested-by and Acked-by tags are preserved, because these tags are provided
> in version 1 which has the p->mm check.
I see the below tags from version 1 are missing. I think this contradicts
the above line. Please correct me if I am wrong.
Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Previous version:
> v3:
> https://lore.kernel.org/lkml/20250430103623.3349842-1-yu.c.chen@intel.com/
> v2:
> https://lore.kernel.org/lkml/20250408101444.192519-1-yu.c.chen@intel.com/
> v1:
> https://lore.kernel.org/lkml/20250402010611.3204674-1-yu.c.chen@intel.com/
>
> Chen Yu (1):
> sched/numa: add statistics of numa balance task migration
>
> Libo Chen (1):
> sched/numa: fix task swap by skipping kernel threads
>
> Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
> include/linux/sched.h | 4 ++++
> include/linux/vm_event_item.h | 2 ++
> kernel/sched/core.c | 9 +++++++--
> kernel/sched/debug.c | 4 ++++
> kernel/sched/fair.c | 3 ++-
> mm/memcontrol.c | 2 ++
> mm/vmstat.c | 2 ++
> 8 files changed, 29 insertions(+), 3 deletions(-)
>
For some reason, I am not able to apply this patch on top of
next-20250506. I see patch002 fails to apply. Please find the errors below.
Also, I see the tags have changed, especially the Tested-by tags.
Errors:
b4 am cover.1746611892.git.yu.c.chen@intel.com
Grabbing thread from
lore.kernel.org/all/cover.1746611892.git.yu.c.chen@intel.com/t.mbox.gz
Analyzing 3 messages in the thread
Looking for additional code-review trailers on lore.kernel.org
Analyzing 0 code-review messages
Checking attestation on all messages, may take a moment...
---
✓ [PATCH v4 1/2] sched/numa: fix task swap by skipping kernel threads
✓ [PATCH v4 2/2] sched/numa: add statistics of numa balance task
migration
---
✓ Signed: DKIM/intel.com
---
Total patches: 2
---
Cover:
./v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.cover
Link: https://lore.kernel.org/r/cover.1746611892.git.yu.c.chen@intel.com
Base: not specified
git am
./v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.mbx
# git am -i
v4_20250507_yu_c_chen_sched_numa_add_statistics_of_numa_balance_task_migration.mbx
Commit Body is:
--------------------------
sched/numa: fix task swap by skipping kernel threads
Task swapping is triggered when there are no idle CPUs in
task A's preferred node. In this case, the NUMA load balancer
chooses a task B on A's preferred node and swaps B with A. This
helps improve NUMA locality without introducing load imbalance
between nodes.
In the current implementation, B's NUMA node preference is not
mandatory, and it aims not to increase load imbalance. That is
to say, a kernel thread might be chosen as B. However, kernel
threads are not supposed to be covered by NUMA balancing because
NUMA balancing only considers user pages via VMAs.
Fix this by not considering kernel threads as swap targets in
task_numa_compare(). This can be extended beyond kernel threads
in the future by checking if a swap candidate has a valid NUMA
preference through checking the candidate's numa_preferred_nid
and numa_faults. For now, keep the code simple.
Suggested-by: Michal Koutny <mkoutny@suse.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
--------------------------
Apply? [y]es/[n]o/[e]dit/[v]iew patch/[a]ccept all: a
Applying: sched/numa: fix task swap by skipping kernel threads
Applying: sched/numa: add statistics of numa balance task migration
error: patch failed: Documentation/admin-guide/cgroup-v2.rst:1670
error: Documentation/admin-guide/cgroup-v2.rst: patch does not apply
error: patch failed: include/linux/sched.h:549
error: include/linux/sched.h: patch does not apply
error: patch failed: include/linux/vm_event_item.h:66
error: include/linux/vm_event_item.h: patch does not apply
error: patch failed: kernel/sched/core.c:3352
error: kernel/sched/core.c: patch does not apply
error: patch failed: kernel/sched/debug.c:1206
error: kernel/sched/debug.c: patch does not apply
error: patch failed: mm/memcontrol.c:463
error: mm/memcontrol.c: patch does not apply
error: patch failed: mm/vmstat.c:1347
error: mm/vmstat.c: patch does not apply
Patch failed at 0002 sched/numa: add statistics of numa balance task
migration
Am I missing anything? Please suggest.
Regards,
Venkat.
* Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
2025-05-07 14:32 ` [PATCH v4 0/2] " Venkat Rao Bagalkote
@ 2025-05-07 14:52 ` Chen, Yu C
2025-05-07 15:28 ` Venkat Rao Bagalkote
0 siblings, 1 reply; 7+ messages in thread
From: Chen, Yu C @ 2025-05-07 14:52 UTC (permalink / raw)
To: Venkat Rao Bagalkote
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Jain, Ayush, cgroups, linux-doc, linux-mm,
linux-kernel, Chen Yu, Peter Zijlstra, Andrew Morton
Hi Venkat,
On 5/7/2025 10:32 PM, Venkat Rao Bagalkote wrote:
> Hello Chenyu,
>
>
> On 07/05/25 4:44 pm, Chen Yu wrote:
>> Introducing the task migration and swap statistics in the following
>> places:
>> /sys/fs/cgroup/{GROUP}/memory.stat
>> /proc/{PID}/sched
>> /proc/vmstat
>>
>> These statistics facilitate a rapid evaluation of the performance and
>> resource
>> utilization of the target workload.
>>
>> Patch 1 is a fix from Libo to avoid task swapping for kernel threads,
>> because Numa balance only cares about the user pages via VMA.
>>
>> Patch 2 is the major change to expose the statistics of task migration
>> and
>> swapping in corresponding files.
>>
>> The reason to fold patch 1 and patch 2 into 1 patch set is that patch
>> 1 is
>> necessary for patch 2 to avoid accessing a NULL mm_struct from a kernel
>> thread, which causes NULL pointer exception.
>>
>> The Tested-by and Acked-by tags are preserved, because these tags are
>> provided
>> in version 1 which has the p->mm check.
>
> I see the below tags from version 1 are missing. I think this contradicts
> the above line. Please correct me if I am wrong.
>
>
> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
These tags are in patch 2/2, because Madadi and Prateek mainly
tested patch 2/2.
>
> For some reason, I am not able to apply this patch on top of
> next-20250506. I see patch002 fails to apply. Please find the errors below.
>
next-20250507 should be OK (I just checked on top of commit 08710e696081).
next-20250506 might still have the old patch 2/2, and next-20250507 has
reverted it.
thanks,
Chenyu
* Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
2025-05-07 14:52 ` Chen, Yu C
@ 2025-05-07 15:28 ` Venkat Rao Bagalkote
0 siblings, 0 replies; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2025-05-07 15:28 UTC (permalink / raw)
To: Chen, Yu C
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Jain, Ayush, cgroups, linux-doc, linux-mm,
linux-kernel, Chen Yu, Peter Zijlstra, Andrew Morton
On 07/05/25 8:22 pm, Chen, Yu C wrote:
> Hi Venkat,
>
> On 5/7/2025 10:32 PM, Venkat Rao Bagalkote wrote:
>> Hello Chenyu,
>>
>>
>> On 07/05/25 4:44 pm, Chen Yu wrote:
>>> Introducing the task migration and swap statistics in the following
>>> places:
>>> /sys/fs/cgroup/{GROUP}/memory.stat
>>> /proc/{PID}/sched
>>> /proc/vmstat
>>>
>>> These statistics facilitate a rapid evaluation of the performance
>>> and resource
>>> utilization of the target workload.
>>>
>>> Patch 1 is a fix from Libo to avoid task swapping for kernel threads,
>>> because Numa balance only cares about the user pages via VMA.
>>>
>>> Patch 2 is the major change to expose the statistics of task
>>> migration and
>>> swapping in corresponding files.
>>>
>>> The reason to fold patch 1 and patch 2 into 1 patch set is that
>>> patch 1 is
>>> necessary for patch 2 to avoid accessing a NULL mm_struct from a kernel
>>> thread, which causes NULL pointer exception.
>>>
>>> The Tested-by and Acked-by tags are preserved, because these tags
>>> are provided
>>> in version 1 which has the p->mm check.
>>
>> I see the below tags from version 1 are missing. I think this
>> contradicts the above line. Please correct me if I am wrong.
>>
>>
>> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>>
>
> These tags are in patch 2/2, because Madadi and Prateek mainly
> tested patch 2/2.
Understood. Thanks for clarification.
>
>>
>> For some reason, I am not able to apply this patch on top of
>> next-20250506. I see patch002 fails to apply. Please find the errors
>> below.
>>
>
> next-20250507 should be OK (I just checked on top of commit 08710e696081).
> next-20250506 might still have the old patch 2/2, and next-20250507 has
> reverted it.
>
With next-20250507, there is a build issue [1]
<https://lore.kernel.org/all/1bcc235f-b139-4423-a7bd-2dd16065e08c@linux.ibm.com/>,
so I will test this once the build issue is fixed.
Regards,
Venkat.
> thanks,
> Chenyu
* Re: [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration
2025-05-07 11:14 [PATCH v4 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
` (2 preceding siblings ...)
2025-05-07 14:32 ` [PATCH v4 0/2] " Venkat Rao Bagalkote
@ 2025-05-08 10:23 ` Venkat Rao Bagalkote
3 siblings, 0 replies; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2025-05-08 10:23 UTC (permalink / raw)
To: Chen Yu, Peter Zijlstra, Andrew Morton
Cc: mkoutny, Ingo Molnar, Tejun Heo, Johannes Weiner, Jonathan Corbet,
Mel Gorman, Michal Hocko, Muchun Song, Roman Gushchin,
Shakeel Butt, Chen, Tim C, Aubrey Li, Libo Chen, K Prateek Nayak,
Madadi Vineeth Reddy, Jain, Ayush, cgroups, linux-doc, linux-mm,
linux-kernel, Chen Yu
On 07/05/25 4:44 pm, Chen Yu wrote:
> Introducing the task migration and swap statistics in the following places:
> /sys/fs/cgroup/{GROUP}/memory.stat
> /proc/{PID}/sched
> /proc/vmstat
>
> These statistics facilitate a rapid evaluation of the performance and resource
> utilization of the target workload.
>
> Patch 1 is a fix from Libo to avoid task swapping for kernel threads,
> because Numa balance only cares about the user pages via VMA.
>
> Patch 2 is the major change to expose the statistics of task migration and
> swapping in corresponding files.
>
> The reason to fold patch 1 and patch 2 into 1 patch set is that patch 1 is
> necessary for patch 2 to avoid accessing a NULL mm_struct from a kernel
> thread, which causes NULL pointer exception.
>
> The Tested-by and Acked-by tags are preserved, because these tags are provided
> in version 1 which has the p->mm check.
>
> Previous version:
> v3:
> https://lore.kernel.org/lkml/20250430103623.3349842-1-yu.c.chen@intel.com/
> v2:
> https://lore.kernel.org/lkml/20250408101444.192519-1-yu.c.chen@intel.com/
> v1:
> https://lore.kernel.org/lkml/20250402010611.3204674-1-yu.c.chen@intel.com/
>
> Chen Yu (1):
> sched/numa: add statistics of numa balance task migration
>
> Libo Chen (1):
> sched/numa: fix task swap by skipping kernel threads
>
> Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
> include/linux/sched.h | 4 ++++
> include/linux/vm_event_item.h | 2 ++
> kernel/sched/core.c | 9 +++++++--
> kernel/sched/debug.c | 4 ++++
> kernel/sched/fair.c | 3 ++-
> mm/memcontrol.c | 2 ++
> mm/vmstat.c | 2 ++
> 8 files changed, 29 insertions(+), 3 deletions(-)
>
Hello Chenyu,
Tested this patch series by applying it on top of next-20250507, and it
fixes the NULL pointer exception on an IBM Power9 system. Hence,
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Regards,
Venkat.