From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: "Frederic Weisbecker" <frederic@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Chen Ridong" <chenridong@huawei.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Gabriele Monaco" <gmonaco@redhat.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
"Marco Crivellari" <marco.crivellari@suse.com>,
"Michal Hocko" <mhocko@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Paolo Abeni" <pabeni@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Phil Auld" <pauld@redhat.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Simon Horman" <horms@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Waiman Long" <longman@redhat.com>,
"Will Deacon" <will@kernel.org>,
cgroups@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH 30/33] kthread: Honour kthreads preferred affinity after cpuset changes
Date: Wed, 24 Dec 2025 14:45:17 +0100 [thread overview]
Message-ID: <20251224134520.33231-31-frederic@kernel.org> (raw)
In-Reply-To: <20251224134520.33231-1-frederic@kernel.org>
When cpuset isolated partitions get updated, unbound kthreads get
indifferently affine to all non isolated CPUs, regardless of their
individual affinity preferences.
For example kswapd is a per-node kthread that prefers to be affine to
the node it refers to. Whenever an isolated partition is created,
updated or deleted, kswapd's node affinity is going to be broken if any
CPU in the related node is not isolated because kswapd will be affine
globally.
Fix this with letting the consolidated kthread managed affinity code do
the affinity update on behalf of cpuset.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/kthread.h | 1 +
kernel/cgroup/cpuset.c | 5 ++---
kernel/kthread.c | 41 ++++++++++++++++++++++++++++++----------
kernel/sched/isolation.c | 3 +++
4 files changed, 37 insertions(+), 13 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 8d27403888ce..c92c1149ee6e 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -100,6 +100,7 @@ void kthread_unpark(struct task_struct *k);
void kthread_parkme(void);
void kthread_exit(long result) __noreturn;
void kthread_complete_and_exit(struct completion *, long) __noreturn;
+int kthreads_update_housekeeping(void);
int kthreadd(void *unused);
extern struct task_struct *kthreadd_task;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1cc83a3c25f6..c8cfaf5cd4a1 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1208,11 +1208,10 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
if (top_cs) {
/*
+ * PF_KTHREAD tasks are handled by housekeeping.
* PF_NO_SETAFFINITY tasks are ignored.
- * All per cpu kthreads should have PF_NO_SETAFFINITY
- * flag set, see kthread_set_per_cpu().
*/
- if (task->flags & PF_NO_SETAFFINITY)
+ if (task->flags & (PF_KTHREAD | PF_NO_SETAFFINITY))
continue;
cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus);
} else {
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 968fa5868d21..03008154249c 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -891,14 +891,7 @@ int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
}
EXPORT_SYMBOL_GPL(kthread_affine_preferred);
-/*
- * Re-affine kthreads according to their preferences
- * and the newly online CPU. The CPU down part is handled
- * by select_fallback_rq() which default re-affines to
- * housekeepers from other nodes in case the preferred
- * affinity doesn't apply anymore.
- */
-static int kthreads_online_cpu(unsigned int cpu)
+static int kthreads_update_affinity(bool force)
{
cpumask_var_t affinity;
struct kthread *k;
@@ -924,7 +917,8 @@ static int kthreads_online_cpu(unsigned int cpu)
/*
* Unbound kthreads without preferred affinity are already affine
* to housekeeping, whether those CPUs are online or not. So no need
- * to handle newly online CPUs for them.
+ * to handle newly online CPUs for them. However housekeeping changes
+ * have to be applied.
*
* But kthreads with a preferred affinity or node are different:
* if none of their preferred CPUs are online and part of
@@ -932,7 +926,7 @@ static int kthreads_online_cpu(unsigned int cpu)
* But as soon as one of their preferred CPU becomes online, they must
* be affine to them.
*/
- if (k->preferred_affinity || k->node != NUMA_NO_NODE) {
+ if (force || k->preferred_affinity || k->node != NUMA_NO_NODE) {
kthread_fetch_affinity(k, affinity);
set_cpus_allowed_ptr(k->task, affinity);
}
@@ -943,6 +937,33 @@ static int kthreads_online_cpu(unsigned int cpu)
return ret;
}
+/**
+ * kthreads_update_housekeeping - Update kthreads affinity on cpuset change
+ *
+ * When cpuset changes a partition type to/from "isolated" or updates related
+ * cpumasks, propagate the housekeeping cpumask change to preferred kthreads
+ * affinity.
+ *
+ * Returns 0 if successful, -ENOMEM if temporary mask couldn't
+ * be allocated or -EINVAL in case of internal error.
+ */
+int kthreads_update_housekeeping(void)
+{
+ return kthreads_update_affinity(true);
+}
+
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+ return kthreads_update_affinity(false);
+}
+
static int kthreads_init(void)
{
return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 84a257d05918..c499474866b8 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -157,6 +157,9 @@ int housekeeping_update(struct cpumask *isol_mask, enum hk_type type)
err = tmigr_isolated_exclude_cpumask(isol_mask);
WARN_ON_ONCE(err < 0);
+ err = kthreads_update_housekeeping();
+ WARN_ON_ONCE(err < 0);
+
kfree(old);
return err;
--
2.51.1
next prev parent reply other threads:[~2025-12-24 13:49 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-24 13:44 [PATCH 00/33 v5] cpuset/isolation: Honour kthreads preferred affinity Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 01/33] PCI: Prepare to protect against concurrent isolated cpuset change Frederic Weisbecker
2025-12-29 3:23 ` Zhang Qiao
2025-12-29 3:53 ` Waiman Long
2025-12-30 22:38 ` Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 02/33] cpu: Revert "cpu/hotplug: Prevent self deadlock on CPU hot-unplug" Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change Frederic Weisbecker
2025-12-26 23:56 ` Tejun Heo
2025-12-24 13:44 ` [PATCH 04/33] mm: vmstat: " Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 05/33] sched/isolation: Save boot defined domain flags Frederic Weisbecker
2025-12-25 22:27 ` Waiman Long
2025-12-24 13:44 ` [PATCH 06/33] cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOT Frederic Weisbecker
2025-12-25 22:31 ` Waiman Long
2025-12-24 13:44 ` [PATCH 07/33] driver core: cpu: Convert /sys/devices/system/cpu/isolated " Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 08/33] net: Keep ignoring isolated cpuset change Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 09/33] block: Protect against concurrent " Frederic Weisbecker
2025-12-30 0:37 ` Jens Axboe
2025-12-24 13:44 ` [PATCH 10/33] timers/migration: Prevent from lockdep false positive warning Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 11/33] cpu: Provide lockdep check for CPU hotplug lock write-held Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 12/33] cpuset: Provide lockdep check for cpuset lock held Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 13/33] sched/isolation: Convert housekeeping cpumasks to rcu pointers Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 14/33] cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset Frederic Weisbecker
2025-12-26 2:24 ` Waiman Long
2025-12-26 3:20 ` Waiman Long
2025-12-26 8:08 ` Chen Ridong
2025-12-24 13:45 ` [PATCH 15/33] sched/isolation: Flush memcg workqueues on cpuset isolated partition change Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 16/33] sched/isolation: Flush vmstat " Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 17/33] PCI: Flush PCI probe workqueue " Frederic Weisbecker
2025-12-26 8:48 ` Chen Ridong
2025-12-24 13:45 ` [PATCH 18/33] cpuset: Propagate cpuset isolation update to workqueue through housekeeping Frederic Weisbecker
2025-12-26 20:31 ` Waiman Long
2025-12-27 0:18 ` Tejun Heo
2025-12-24 13:45 ` [PATCH 19/33] cpuset: Propagate cpuset isolation update to timers " Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 20/33] timers/migration: Remove superfluous cpuset isolation test Frederic Weisbecker
2025-12-26 20:45 ` Waiman Long
2025-12-24 13:45 ` [PATCH 21/33] cpuset: Remove cpuset_cpu_is_isolated() Frederic Weisbecker
2025-12-26 20:48 ` Waiman Long
2025-12-24 13:45 ` [PATCH 22/33] sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated() Frederic Weisbecker
2025-12-26 21:26 ` Waiman Long
2025-12-24 13:45 ` [PATCH 23/33] PCI: Remove superfluous HK_TYPE_WQ check Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 24/33] kthread: Refine naming of affinity related fields Frederic Weisbecker
2025-12-26 21:37 ` Waiman Long
2025-12-24 13:45 ` [PATCH 25/33] kthread: Include unbound kthreads in the managed affinity list Frederic Weisbecker
2025-12-26 22:11 ` Waiman Long
2025-12-24 13:45 ` [PATCH 26/33] kthread: Include kthreadd to " Frederic Weisbecker
2025-12-26 22:13 ` Waiman Long
2025-12-24 13:45 ` [PATCH 27/33] kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management Frederic Weisbecker
2025-12-26 22:16 ` Waiman Long
2025-12-24 13:45 ` [PATCH 28/33] sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN Frederic Weisbecker
2025-12-26 23:08 ` Waiman Long
2025-12-24 13:45 ` [PATCH 29/33] sched/arm64: Move fallback task " Frederic Weisbecker
2025-12-26 23:46 ` Waiman Long
2025-12-24 13:45 ` Frederic Weisbecker [this message]
2025-12-26 23:59 ` [PATCH 30/33] kthread: Honour kthreads preferred affinity after cpuset changes Waiman Long
2025-12-24 13:45 ` [PATCH 31/33] kthread: Comment on the purpose and placement of kthread_affine_node() call Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 32/33] kthread: Document kthread_affine_preferred() Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 33/33] doc: Add housekeeping documentation Frederic Weisbecker
2025-12-27 0:39 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251224134520.33231-31-frederic@kernel.org \
--to=frederic@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=bhelgaas@google.com \
--cc=catalin.marinas@arm.com \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huawei.com \
--cc=dakr@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gmonaco@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=horms@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-pci@vger.kernel.org \
--cc=longman@redhat.com \
--cc=marco.crivellari@suse.com \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).