From: Chen Yu <yu.c.chen@intel.com>
To: peterz@infradead.org, akpm@linux-foundation.org
Cc: mkoutny@suse.com, mingo@redhat.com, tj@kernel.org,
hannes@cmpxchg.org, corbet@lwn.net, mgorman@suse.de,
mhocko@kernel.org, muchun.song@linux.dev,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
tim.c.chen@intel.com, aubrey.li@intel.com, libo.chen@oracle.com,
kprateek.nayak@amd.com, vineethr@linux.ibm.com,
venkat88@linux.ibm.com, ayushjai@amd.com,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
yu.chen.surf@foxmail.com, Ayush Jain <Ayush.jain3@amd.com>,
Chen Yu <yu.c.chen@intel.com>
Subject: [PATCH v5 1/2] sched/numa: fix task swap by skipping kernel threads
Date: Fri, 23 May 2025 20:51:01 +0800 [thread overview]
Message-ID: <eaacc9c9bd37bac92d43a671867d85b2fdad3b06.1748002400.git.yu.c.chen@intel.com> (raw)
In-Reply-To: <cover.1748002400.git.yu.c.chen@intel.com>
From: Libo Chen <libo.chen@oracle.com>
Task swapping is triggered when there are no idle CPUs in
task A's preferred node. In this case, the NUMA load balancer
chooses a task B on A's preferred node and swaps B with A. This
helps improve NUMA locality without introducing load imbalance
between nodes. In the current implementation, B's NUMA node
preference is not mandatory. That is to say, a kernel thread
might be incorrectly chosen as B. However, kernel thread and
user space thread that does not have mm are not supposed to be
covered by NUMA balancing because NUMA balancing only considers
user pages via VMAs.
According to Peter's suggestion for fixing this issue, we use
PF_KTHREAD to skip the kernel thread. curr->mm is also checked
because it is possible that user_mode_thread() might create a
user thread without an mm. As per Prateek's analysis, after
adding the PF_KTHREAD check, there is no need to further check
the PF_IDLE flag:
"
- play_idle_precise() already ensures PF_KTHREAD is set before adding
PF_IDLE
- cpu_startup_entry() is only called from the startup thread which
should be marked with PF_KTHREAD (based on my understanding looking at
commit cff9b2332ab7 ("kernel/sched: Modify initial boot task idle
setup"))
"
In summary, the check in task_numa_compare() now aligns with
task_tick_numa().
Suggested-by: Michal Koutny <mkoutny@suse.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v4->v5:
Add PF_KTHREAD check, and remove PF_IDLE check.
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fb9bf995a47..03d9a49a68b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2273,7 +2273,8 @@ static bool task_numa_compare(struct task_numa_env *env,
rcu_read_lock();
cur = rcu_dereference(dst_rq->curr);
- if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+ if (cur && ((cur->flags & (PF_EXITING | PF_KTHREAD)) ||
+ !cur->mm))
cur = NULL;
/*
--
2.25.1
next prev parent reply other threads:[~2025-05-23 12:57 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-23 12:48 [PATCH v5 0/2] sched/numa: add statistics of numa balance task migration Chen Yu
2025-05-23 12:51 ` Chen Yu [this message]
2025-05-23 23:22 ` [PATCH v5 1/2] sched/numa: fix task swap by skipping kernel threads Shakeel Butt
2025-05-23 12:51 ` [PATCH v5 2/2] sched/numa: add statistics of numa balance task Chen Yu
2025-05-23 23:42 ` Shakeel Butt
2025-05-24 9:07 ` Chen, Yu C
2025-05-24 17:32 ` Shakeel Butt
2025-05-25 12:35 ` Chen, Yu C
2025-05-27 17:48 ` Shakeel Butt
2025-05-29 5:04 ` Chen, Yu C
2025-05-26 13:35 ` Michal Koutný
2025-05-27 9:20 ` Chen, Yu C
2025-05-27 18:15 ` Shakeel Butt
2025-06-02 16:53 ` Michal Koutný
2025-06-03 14:46 ` Chen, Yu C
2025-06-17 9:30 ` Michal Koutný
2025-06-19 13:03 ` Chen, Yu C
2025-06-19 14:06 ` Michal Koutný
2025-05-23 22:06 ` [PATCH v5 0/2] sched/numa: add statistics of numa balance task migration Andrew Morton
2025-05-23 23:52 ` Shakeel Butt
2025-05-28 0:21 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=eaacc9c9bd37bac92d43a671867d85b2fdad3b06.1748002400.git.yu.c.chen@intel.com \
--to=yu.c.chen@intel.com \
--cc=Ayush.jain3@amd.com \
--cc=akpm@linux-foundation.org \
--cc=aubrey.li@intel.com \
--cc=ayushjai@amd.com \
--cc=cgroups@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=hannes@cmpxchg.org \
--cc=kprateek.nayak@amd.com \
--cc=libo.chen@oracle.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=peterz@infradead.org \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tim.c.chen@intel.com \
--cc=tj@kernel.org \
--cc=venkat88@linux.ibm.com \
--cc=vineethr@linux.ibm.com \
--cc=yu.chen.surf@foxmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).