From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
Gabriele Monaco <gmonaco@redhat.com>,
Chen Ridong <chenridong@huawei.com>,
Michal Koutny <mkoutny@suse.com>,
linux-arm-kernel@lists.infradead.org,
linux-block@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
"David S . Miller" <davem@davemloft.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Michal Hocko <mhocko@suse.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Peter Zijlstra <peterz@infradead.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Phil Auld <pauld@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Paolo Abeni <pabeni@redhat.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Will Deacon <will@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Waiman Long <longman@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
netdev@vger.kernel.org, Danilo Krummrich <dakr@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, Jens Axboe <axboe@kernel.dk>,
Marco Crivellari <marco.crivellari@suse.com>,
Tejun Heo <tj@kernel.org>, Shakeel Butt <shakeel.butt@linux.dev>,
Simon Horman <horms@kernel.org>,
cgroups@vger.kernel.org, linux-pci@vger.kernel.org
Subject: [PATCH 00/33 v5] cpuset/isolation: Honour kthreads preferred affinity
Date: Wed, 24 Dec 2025 14:44:47 +0100
Message-ID: <20251224134520.33231-1-frederic@kernel.org>
Hi,
The kthread code was enhanced lately to provide an infrastructure which
manages the preferred affinity of unbound kthreads (node or custom
cpumask) against housekeeping constraints and CPU hotplug events.
One crucial missing piece is cpuset: when an isolated partition is
created, deleted, or its CPUs updated, all the unbound kthreads in the
top cpuset are affine to _all_ the non-isolated CPUs, possibly breaking
their preferred affinity along the way.

Solve this by moving the kthreads affinity update from cpuset to the
consolidated kthread affinity code, so that preferred affinities are
honoured.
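
To illustrate what "honouring the preferred affinity" means here, the
following is a minimal userspace sketch, not the kernel code. It assumes
a toy one-bit-per-CPU mask type (cpumask_model_t) and a hypothetical
helper, effective_affinity(). The fallback to the whole housekeeping set
when the intersection is empty is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a toy stand-in for the kernel's struct cpumask. */
typedef uint64_t cpumask_model_t;

/*
 * Hypothetical helper: compute the affinity an unbound kthread ends up
 * with. 'preferred' is its node or custom cpumask, 'housekeeping' is the
 * set of CPUs currently allowed to run unbound kernel work.
 */
static cpumask_model_t effective_affinity(cpumask_model_t preferred,
					  cpumask_model_t housekeeping)
{
	cpumask_model_t mask;

	/* This model assumes at least one housekeeping CPU always exists. */
	assert(housekeeping != 0);

	/* Keep only the preferred CPUs that are not isolated. */
	mask = preferred & housekeeping;

	/* Assumed fallback: nothing left, so use every housekeeping CPU. */
	return mask ? mask : housekeeping;
}
```

With CPUs 4-7 preferred (0xF0) and CPUs 0-5 housekeeping (0x3F), the
model yields CPUs 4-5 (0x30) rather than all of CPUs 0-5, which is the
behaviour the series wants to preserve across cpuset partition updates.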
The dispatch of the new cpumasks to workqueues and kthreads is performed
by housekeeping, as per Tejun's nice suggestion.

As a welcome side effect, HK_TYPE_DOMAIN now integrates both the set
from isolcpus= and cpuset isolated partitions. Housekeeping cpumasks are
now modifiable with specific synchronization. This is a big step toward
making nohz_full= also mutable through cpuset in the future.
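
As a rough picture of the resulting HK_TYPE_DOMAIN set, here is a small
userspace model. It is only a sketch: hk_domain_model() and
cpumask_model_t are hypothetical names, and the synchronization the
series adds around updates is left out entirely.

```c
#include <stdint.h>

/* One bit per CPU; a toy stand-in for the kernel's struct cpumask. */
typedef uint64_t cpumask_model_t;

/*
 * Hypothetical model: the domain housekeeping set is every possible CPU
 * that is isolated neither at boot (isolcpus=) nor by a cpuset isolated
 * partition.
 */
static cpumask_model_t hk_domain_model(cpumask_model_t possible,
				       cpumask_model_t boot_isolated,
				       cpumask_model_t cpuset_isolated)
{
	return possible & ~(boot_isolated | cpuset_isolated);
}
```

For example, on 8 CPUs (0xFF) with CPUs 2-3 isolated at boot (0x0C) and
CPUs 4-5 in an isolated partition (0x30), the model leaves CPUs 0-1 and
6-7 (0xC3) as housekeeping.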
Changes since v4:
* Add more tags
* Rebase on v6.19-rc2 with latest cpuset changes
* Accommodate timer migration isolation
* Rename housekeeping_update() parameter from mask to isol_mask (Chen Ridong)
* Link housekeeping documentation to core-api
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
kthread/core-v5
HEAD: 3c0ee047f05f361f215521424f5e789dfffcafc1
Merry Christmas,
Frederic
---
Frederic Weisbecker (33):
PCI: Prepare to protect against concurrent isolated cpuset change
cpu: Revert "cpu/hotplug: Prevent self deadlock on CPU hot-unplug"
memcg: Prepare to protect against concurrent isolated cpuset change
mm: vmstat: Prepare to protect against concurrent isolated cpuset change
sched/isolation: Save boot defined domain flags
cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOT
driver core: cpu: Convert /sys/devices/system/cpu/isolated to use HK_TYPE_DOMAIN_BOOT
net: Keep ignoring isolated cpuset change
block: Protect against concurrent isolated cpuset change
timers/migration: Prevent from lockdep false positive warning
cpu: Provide lockdep check for CPU hotplug lock write-held
cpuset: Provide lockdep check for cpuset lock held
sched/isolation: Convert housekeeping cpumasks to rcu pointers
cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
sched/isolation: Flush memcg workqueues on cpuset isolated partition change
sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
PCI: Flush PCI probe workqueue on cpuset isolated partition change
cpuset: Propagate cpuset isolation update to workqueue through housekeeping
cpuset: Propagate cpuset isolation update to timers through housekeeping
timers/migration: Remove superfluous cpuset isolation test
cpuset: Remove cpuset_cpu_is_isolated()
sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
PCI: Remove superfluous HK_TYPE_WQ check
kthread: Refine naming of affinity related fields
kthread: Include unbound kthreads in the managed affinity list
kthread: Include kthreadd to the managed affinity list
kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN
sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
kthread: Honour kthreads preferred affinity after cpuset changes
kthread: Comment on the purpose and placement of kthread_affine_node() call
kthread: Document kthread_affine_preferred()
doc: Add housekeeping documentation
Documentation/core-api/housekeeping.rst | 111 ++++++++++++++++++++++
Documentation/core-api/index.rst | 1 +
arch/arm64/kernel/cpufeature.c | 18 +++-
block/blk-mq.c | 6 +-
drivers/base/cpu.c | 2 +-
drivers/pci/pci-driver.c | 71 ++++++++++----
include/linux/cpu.h | 4 +
include/linux/cpuhplock.h | 1 +
include/linux/cpuset.h | 8 +-
include/linux/kthread.h | 1 +
include/linux/memcontrol.h | 4 +
include/linux/mmu_context.h | 2 +-
include/linux/pci.h | 3 +
include/linux/percpu-rwsem.h | 1 +
include/linux/sched/isolation.h | 16 +++-
include/linux/vmstat.h | 2 +
include/linux/workqueue.h | 2 +-
init/Kconfig | 1 +
kernel/cgroup/cpuset.c | 68 +++++++-------
kernel/cpu.c | 42 ++++-----
kernel/kthread.c | 160 +++++++++++++++++++++-----------
kernel/sched/isolation.c | 144 +++++++++++++++++++++++-----
kernel/sched/sched.h | 4 +
kernel/time/timer_migration.c | 25 +++--
kernel/workqueue.c | 17 ++--
mm/memcontrol.c | 25 ++++-
mm/vmstat.c | 15 ++-
net/core/net-sysfs.c | 2 +-
28 files changed, 554 insertions(+), 202 deletions(-)
2025-12-24 13:44 Frederic Weisbecker [this message]
2025-12-24 13:44 ` [PATCH 01/33] PCI: Prepare to protect against concurrent isolated cpuset change Frederic Weisbecker
2025-12-29 3:23 ` Zhang Qiao
2025-12-29 3:53 ` Waiman Long
2025-12-24 13:44 ` [PATCH 02/33] cpu: Revert "cpu/hotplug: Prevent self deadlock on CPU hot-unplug" Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change Frederic Weisbecker
2025-12-26 23:56 ` Tejun Heo
2025-12-24 13:44 ` [PATCH 04/33] mm: vmstat: " Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 05/33] sched/isolation: Save boot defined domain flags Frederic Weisbecker
2025-12-25 22:27 ` Waiman Long
2025-12-24 13:44 ` [PATCH 06/33] cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOT Frederic Weisbecker
2025-12-25 22:31 ` Waiman Long
2025-12-24 13:44 ` [PATCH 07/33] driver core: cpu: Convert /sys/devices/system/cpu/isolated " Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 08/33] net: Keep ignoring isolated cpuset change Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 09/33] block: Protect against concurrent " Frederic Weisbecker
2025-12-30 0:37 ` Jens Axboe
2025-12-24 13:44 ` [PATCH 10/33] timers/migration: Prevent from lockdep false positive warning Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 11/33] cpu: Provide lockdep check for CPU hotplug lock write-held Frederic Weisbecker
2025-12-24 13:44 ` [PATCH 12/33] cpuset: Provide lockdep check for cpuset lock held Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 13/33] sched/isolation: Convert housekeeping cpumasks to rcu pointers Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 14/33] cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset Frederic Weisbecker
2025-12-26 2:24 ` Waiman Long
2025-12-26 3:20 ` Waiman Long
2025-12-26 8:08 ` Chen Ridong
2025-12-24 13:45 ` [PATCH 15/33] sched/isolation: Flush memcg workqueues on cpuset isolated partition change Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 16/33] sched/isolation: Flush vmstat " Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 17/33] PCI: Flush PCI probe workqueue " Frederic Weisbecker
2025-12-26 8:48 ` Chen Ridong
2025-12-24 13:45 ` [PATCH 18/33] cpuset: Propagate cpuset isolation update to workqueue through housekeeping Frederic Weisbecker
2025-12-26 20:31 ` Waiman Long
2025-12-27 0:18 ` Tejun Heo
2025-12-24 13:45 ` [PATCH 19/33] cpuset: Propagate cpuset isolation update to timers " Frederic Weisbecker
2025-12-26 20:40 ` Waiman Long
2025-12-24 13:45 ` [PATCH 20/33] timers/migration: Remove superfluous cpuset isolation test Frederic Weisbecker
2025-12-26 20:45 ` Waiman Long
2025-12-24 13:45 ` [PATCH 21/33] cpuset: Remove cpuset_cpu_is_isolated() Frederic Weisbecker
2025-12-26 20:48 ` Waiman Long
2025-12-24 13:45 ` [PATCH 22/33] sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated() Frederic Weisbecker
2025-12-26 21:26 ` Waiman Long
2025-12-24 13:45 ` [PATCH 23/33] PCI: Remove superfluous HK_TYPE_WQ check Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 24/33] kthread: Refine naming of affinity related fields Frederic Weisbecker
2025-12-26 21:37 ` Waiman Long
2025-12-24 13:45 ` [PATCH 25/33] kthread: Include unbound kthreads in the managed affinity list Frederic Weisbecker
2025-12-26 22:11 ` Waiman Long
2025-12-24 13:45 ` [PATCH 26/33] kthread: Include kthreadd to " Frederic Weisbecker
2025-12-26 22:13 ` Waiman Long
2025-12-24 13:45 ` [PATCH 27/33] kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management Frederic Weisbecker
2025-12-26 22:16 ` Waiman Long
2025-12-24 13:45 ` [PATCH 28/33] sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN Frederic Weisbecker
2025-12-26 23:08 ` Waiman Long
2025-12-24 13:45 ` [PATCH 29/33] sched/arm64: Move fallback task " Frederic Weisbecker
2025-12-26 23:46 ` Waiman Long
2025-12-24 13:45 ` [PATCH 30/33] kthread: Honour kthreads preferred affinity after cpuset changes Frederic Weisbecker
2025-12-26 23:59 ` Waiman Long
2025-12-24 13:45 ` [PATCH 31/33] kthread: Comment on the purpose and placement of kthread_affine_node() call Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 32/33] kthread: Document kthread_affine_preferred() Frederic Weisbecker
2025-12-24 13:45 ` [PATCH 33/33] doc: Add housekeeping documentation Frederic Weisbecker
2025-12-27 0:39 ` Waiman Long