From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Juri Lelli <juri.lelli@redhat.com>,
Waiman Long <longman@redhat.com>, Tejun Heo <tj@kernel.org>,
"Qais Yousef (Google)" <qyousef@layalina.io>
Subject: [PATCH 5.15 75/89] sched/cpuset: Bring back cpuset_mutex
Date: Mon, 28 Aug 2023 12:14:16 +0200 [thread overview]
Message-ID: <20230828101152.730868872@linuxfoundation.org> (raw)
In-Reply-To: <20230828101150.163430842@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli <juri.lelli@redhat.com>
commit 111cd11bbc54850f24191c52ff217da88a5e639b upstream.
Turns out percpu_cpuset_rwsem - commit 1243dc518c9d ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Conflict in kernel/cgroup/cpuset.c due to pulling changes in functions
or comments that don't exist on this branch. Remove a BUG_ON() for rwsem
that doesn't exist on mainline. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/cpuset.h | 8 +-
kernel/cgroup/cpuset.c | 149 ++++++++++++++++++++++++-------------------------
kernel/sched/core.c | 22 ++++---
3 files changed, 93 insertions(+), 86 deletions(-)
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -56,8 +56,8 @@ extern void cpuset_init_smp(void);
extern void cpuset_force_rebuild(void);
extern void cpuset_update_active_cpus(void);
extern void cpuset_wait_for_hotplug(void);
-extern void cpuset_read_lock(void);
-extern void cpuset_read_unlock(void);
+extern void cpuset_lock(void);
+extern void cpuset_unlock(void);
extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -179,8 +179,8 @@ static inline void cpuset_update_active_
static inline void cpuset_wait_for_hotplug(void) { }
-static inline void cpuset_read_lock(void) { }
-static inline void cpuset_read_unlock(void) { }
+static inline void cpuset_lock(void) { }
+static inline void cpuset_unlock(void) { }
static inline void cpuset_cpus_allowed(struct task_struct *p,
struct cpumask *mask)
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -312,22 +312,23 @@ static struct cpuset top_cpuset = {
if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
/*
- * There are two global locks guarding cpuset structures - cpuset_rwsem and
+ * There are two global locks guarding cpuset structures - cpuset_mutex and
* callback_lock. We also require taking task_lock() when dereferencing a
* task's cpuset pointer. See "The task_lock() exception", at the end of this
- * comment. The cpuset code uses only cpuset_rwsem write lock. Other
- * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to
- * prevent change to cpuset structures.
+ * comment. The cpuset code uses only cpuset_mutex. Other kernel subsystems
+ * can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
+ * structures. Note that cpuset_mutex needs to be a mutex as it is used in
+ * paths that rely on priority inheritance (e.g. scheduler - on RT) for
+ * correctness.
*
* A task must hold both locks to modify cpusets. If a task holds
- * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it
- * is the only task able to also acquire callback_lock and be able to
- * modify cpusets. It can perform various checks on the cpuset structure
- * first, knowing nothing will change. It can also allocate memory while
- * just holding cpuset_rwsem. While it is performing these checks, various
- * callback routines can briefly acquire callback_lock to query cpusets.
- * Once it is ready to make the changes, it takes callback_lock, blocking
- * everyone else.
+ * cpuset_mutex, it blocks others, ensuring that it is the only task able to
+ * also acquire callback_lock and be able to modify cpusets. It can perform
+ * various checks on the cpuset structure first, knowing nothing will change.
+ * It can also allocate memory while just holding cpuset_mutex. While it is
+ * performing these checks, various callback routines can briefly acquire
+ * callback_lock to query cpusets. Once it is ready to make the changes, it
+ * takes callback_lock, blocking everyone else.
*
* Calls to the kernel memory allocator can not be made while holding
* callback_lock, as that would risk double tripping on callback_lock
@@ -349,16 +350,16 @@ static struct cpuset top_cpuset = {
* guidelines for accessing subsystem state in kernel/cgroup.c
*/
-DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem);
+static DEFINE_MUTEX(cpuset_mutex);
-void cpuset_read_lock(void)
+void cpuset_lock(void)
{
- percpu_down_read(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
}
-void cpuset_read_unlock(void)
+void cpuset_unlock(void)
{
- percpu_up_read(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
}
static DEFINE_SPINLOCK(callback_lock);
@@ -396,7 +397,7 @@ static inline bool is_in_v2_mode(void)
* One way or another, we guarantee to return some non-empty subset
* of cpu_online_mask.
*
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
*/
static void guarantee_online_cpus(struct task_struct *tsk,
struct cpumask *pmask)
@@ -438,7 +439,7 @@ out_unlock:
* One way or another, we guarantee to return some non-empty subset
* of node_states[N_MEMORY].
*
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
*/
static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
{
@@ -450,7 +451,7 @@ static void guarantee_online_mems(struct
/*
* update task's spread flag if cpuset's page/slab spread flag is set
*
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
*/
static void cpuset_update_task_spread_flag(struct cpuset *cs,
struct task_struct *tsk)
@@ -471,7 +472,7 @@ static void cpuset_update_task_spread_fl
*
* One cpuset is a subset of another if all its allowed CPUs and
* Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set. Call holding cpuset_rwsem.
+ * are only set if the other's are set. Call holding cpuset_mutex.
*/
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
@@ -580,7 +581,7 @@ static inline void free_cpuset(struct cp
* If we replaced the flag and mask values of the current cpuset
* (cur) with those values in the trial cpuset (trial), would
* our various subset and exclusive rules still be valid? Presumes
- * cpuset_rwsem held.
+ * cpuset_mutex held.
*
* 'cur' is the address of an actual, in-use cpuset. Operations
* such as list traversal that depend on the actual address of the
@@ -703,7 +704,7 @@ static void update_domain_attr_tree(stru
rcu_read_unlock();
}
-/* Must be called with cpuset_rwsem held. */
+/* Must be called with cpuset_mutex held. */
static inline int nr_cpusets(void)
{
/* jump label reference count + the top-level cpuset */
@@ -729,7 +730,7 @@ static inline int nr_cpusets(void)
* domains when operating in the severe memory shortage situations
* that could cause allocation failures below.
*
- * Must be called with cpuset_rwsem held.
+ * Must be called with cpuset_mutex held.
*
* The three key local variables below are:
* cp - cpuset pointer, used (together with pos_css) to perform a
@@ -958,7 +959,7 @@ static void dl_rebuild_rd_accounting(voi
struct cpuset *cs = NULL;
struct cgroup_subsys_state *pos_css;
- percpu_rwsem_assert_held(&cpuset_rwsem);
+ lockdep_assert_held(&cpuset_mutex);
lockdep_assert_cpus_held();
lockdep_assert_held(&sched_domains_mutex);
@@ -1008,7 +1009,7 @@ partition_and_rebuild_sched_domains(int
* 'cpus' is removed, then call this routine to rebuild the
* scheduler's dynamic sched domains.
*
- * Call with cpuset_rwsem held. Takes cpus_read_lock().
+ * Call with cpuset_mutex held. Takes cpus_read_lock().
*/
static void rebuild_sched_domains_locked(void)
{
@@ -1019,7 +1020,7 @@ static void rebuild_sched_domains_locked
int ndoms;
lockdep_assert_cpus_held();
- percpu_rwsem_assert_held(&cpuset_rwsem);
+ lockdep_assert_held(&cpuset_mutex);
/*
* If we have raced with CPU hotplug, return early to avoid
@@ -1070,9 +1071,9 @@ static void rebuild_sched_domains_locked
void rebuild_sched_domains(void)
{
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
rebuild_sched_domains_locked();
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
}
@@ -1081,7 +1082,7 @@ void rebuild_sched_domains(void)
* @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
*
* Iterate through each task of @cs updating its cpus_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
* cpuset membership stays stable.
*/
static void update_tasks_cpumask(struct cpuset *cs)
@@ -1188,7 +1189,7 @@ static int update_parent_subparts_cpumas
int old_prs, new_prs;
bool part_error = false; /* Partition error? */
- percpu_rwsem_assert_held(&cpuset_rwsem);
+ lockdep_assert_held(&cpuset_mutex);
/*
* The parent must be a partition root.
@@ -1358,7 +1359,7 @@ static int update_parent_subparts_cpumas
*
* On legacy hierarchy, effective_cpus will be the same with cpu_allowed.
*
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
*/
static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
{
@@ -1521,7 +1522,7 @@ static void update_sibling_cpumasks(stru
struct cpuset *sibling;
struct cgroup_subsys_state *pos_css;
- percpu_rwsem_assert_held(&cpuset_rwsem);
+ lockdep_assert_held(&cpuset_mutex);
/*
* Check all its siblings and call update_cpumasks_hier()
@@ -1724,12 +1725,12 @@ static void *cpuset_being_rebound;
* @cs: the cpuset in which each task's mems_allowed mask needs to be changed
*
* Iterate through each task of @cs updating its mems_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
* cpuset membership stays stable.
*/
static void update_tasks_nodemask(struct cpuset *cs)
{
- static nodemask_t newmems; /* protected by cpuset_rwsem */
+ static nodemask_t newmems; /* protected by cpuset_mutex */
struct css_task_iter it;
struct task_struct *task;
@@ -1742,7 +1743,7 @@ static void update_tasks_nodemask(struct
* take while holding tasklist_lock. Forks can happen - the
* mpol_dup() cpuset_being_rebound check will catch such forks,
* and rebind their vma mempolicies too. Because we still hold
- * the global cpuset_rwsem, we know that no other rebind effort
+ * the global cpuset_mutex, we know that no other rebind effort
* will be contending for the global variable cpuset_being_rebound.
* It's ok if we rebind the same mm twice; mpol_rebind_mm()
* is idempotent. Also migrate pages in each mm to new nodes.
@@ -1788,7 +1789,7 @@ static void update_tasks_nodemask(struct
*
* On legacy hierarchy, effective_mems will be the same with mems_allowed.
*
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
*/
static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
{
@@ -1841,7 +1842,7 @@ static void update_nodemasks_hier(struct
* mempolicies and if the cpuset is marked 'memory_migrate',
* migrate the tasks pages to the new memory.
*
- * Call with cpuset_rwsem held. May take callback_lock during call.
+ * Call with cpuset_mutex held. May take callback_lock during call.
* Will take tasklist_lock, scan tasklist for tasks in cpuset cs,
* lock each such tasks mm->mmap_lock, scan its vma's and rebind
* their mempolicies to the cpusets new mems_allowed.
@@ -1931,7 +1932,7 @@ static int update_relax_domain_level(str
* @cs: the cpuset in which each task's spread flags needs to be changed
*
* Iterate through each task of @cs updating its spread flags. As this
- * function is called with cpuset_rwsem held, cpuset membership stays
+ * function is called with cpuset_mutex held, cpuset membership stays
* stable.
*/
static void update_tasks_flags(struct cpuset *cs)
@@ -1951,7 +1952,7 @@ static void update_tasks_flags(struct cp
* cs: the cpuset to update
* turning_on: whether the flag is being set or cleared
*
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
*/
static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
@@ -2000,7 +2001,7 @@ out:
* cs: the cpuset to update
* new_prs: new partition root state
*
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
*/
static int update_prstate(struct cpuset *cs, int new_prs)
{
@@ -2182,7 +2183,7 @@ static int fmeter_getrate(struct fmeter
static struct cpuset *cpuset_attach_old_cs;
-/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
+/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
static int cpuset_can_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
@@ -2194,7 +2195,7 @@ static int cpuset_can_attach(struct cgro
cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
cs = css_cs(css);
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
/* allow moving tasks into an empty cpuset if on default hierarchy */
ret = -ENOSPC;
@@ -2218,7 +2219,7 @@ static int cpuset_can_attach(struct cgro
cs->attach_in_progress++;
ret = 0;
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
return ret;
}
@@ -2230,15 +2231,15 @@ static void cpuset_cancel_attach(struct
cgroup_taskset_first(tset, &css);
cs = css_cs(css);
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
cs->attach_in_progress--;
if (!cs->attach_in_progress)
wake_up(&cpuset_attach_wq);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
}
/*
- * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach()
+ * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach()
* but we can't allocate it dynamically there. Define it global and
* allocate from cpuset_init().
*/
@@ -2246,7 +2247,7 @@ static cpumask_var_t cpus_attach;
static void cpuset_attach(struct cgroup_taskset *tset)
{
- /* static buf protected by cpuset_rwsem */
+ /* static buf protected by cpuset_mutex */
static nodemask_t cpuset_attach_nodemask_to;
struct task_struct *task;
struct task_struct *leader;
@@ -2258,7 +2259,7 @@ static void cpuset_attach(struct cgroup_
cs = css_cs(css);
lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
@@ -2310,7 +2311,7 @@ static void cpuset_attach(struct cgroup_
if (!cs->attach_in_progress)
wake_up(&cpuset_attach_wq);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
}
/* The various types of files and directories in a cpuset file system */
@@ -2342,7 +2343,7 @@ static int cpuset_write_u64(struct cgrou
int retval = 0;
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
if (!is_cpuset_online(cs)) {
retval = -ENODEV;
goto out_unlock;
@@ -2378,7 +2379,7 @@ static int cpuset_write_u64(struct cgrou
break;
}
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
return retval;
}
@@ -2391,7 +2392,7 @@ static int cpuset_write_s64(struct cgrou
int retval = -ENODEV;
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
if (!is_cpuset_online(cs))
goto out_unlock;
@@ -2404,7 +2405,7 @@ static int cpuset_write_s64(struct cgrou
break;
}
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
return retval;
}
@@ -2437,7 +2438,7 @@ static ssize_t cpuset_write_resmask(stru
* operation like this one can lead to a deadlock through kernfs
* active_ref protection. Let's break the protection. Losing the
* protection is okay as we check whether @cs is online after
- * grabbing cpuset_rwsem anyway. This only happens on the legacy
+ * grabbing cpuset_mutex anyway. This only happens on the legacy
* hierarchies.
*/
css_get(&cs->css);
@@ -2445,7 +2446,7 @@ static ssize_t cpuset_write_resmask(stru
flush_work(&cpuset_hotplug_work);
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
if (!is_cpuset_online(cs))
goto out_unlock;
@@ -2469,7 +2470,7 @@ static ssize_t cpuset_write_resmask(stru
free_cpuset(trialcs);
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
kernfs_unbreak_active_protection(of->kn);
css_put(&cs->css);
@@ -2602,13 +2603,13 @@ static ssize_t sched_partition_write(str
css_get(&cs->css);
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
if (!is_cpuset_online(cs))
goto out_unlock;
retval = update_prstate(cs, val);
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
css_put(&cs->css);
return retval ?: nbytes;
@@ -2821,7 +2822,7 @@ static int cpuset_css_online(struct cgro
return 0;
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
set_bit(CS_ONLINE, &cs->flags);
if (is_spread_page(parent))
@@ -2872,7 +2873,7 @@ static int cpuset_css_online(struct cgro
cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
spin_unlock_irq(&callback_lock);
out_unlock:
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
return 0;
}
@@ -2893,7 +2894,7 @@ static void cpuset_css_offline(struct cg
struct cpuset *cs = css_cs(css);
cpus_read_lock();
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
if (is_partition_root(cs))
update_prstate(cs, 0);
@@ -2912,7 +2913,7 @@ static void cpuset_css_offline(struct cg
cpuset_dec();
clear_bit(CS_ONLINE, &cs->flags);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
}
@@ -2925,7 +2926,7 @@ static void cpuset_css_free(struct cgrou
static void cpuset_bind(struct cgroup_subsys_state *root_css)
{
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) {
@@ -2938,7 +2939,7 @@ static void cpuset_bind(struct cgroup_su
}
spin_unlock_irq(&callback_lock);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
}
/*
@@ -2980,8 +2981,6 @@ struct cgroup_subsys cpuset_cgrp_subsys
int __init cpuset_init(void)
{
- BUG_ON(percpu_init_rwsem(&cpuset_rwsem));
-
BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL));
BUG_ON(!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL));
BUG_ON(!zalloc_cpumask_var(&top_cpuset.subparts_cpus, GFP_KERNEL));
@@ -3053,7 +3052,7 @@ hotplug_update_tasks_legacy(struct cpuse
is_empty = cpumask_empty(cs->cpus_allowed) ||
nodes_empty(cs->mems_allowed);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
/*
* Move tasks to the nearest ancestor with execution resources,
@@ -3063,7 +3062,7 @@ hotplug_update_tasks_legacy(struct cpuse
if (is_empty)
remove_tasks_in_empty_cpuset(cs);
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
}
static void
@@ -3113,14 +3112,14 @@ static void cpuset_hotplug_update_tasks(
retry:
wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
/*
* We have raced with task attaching. We wait until attaching
* is finished, so we won't attach a task to an empty cpuset.
*/
if (cs->attach_in_progress) {
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
goto retry;
}
@@ -3198,7 +3197,7 @@ update_tasks:
hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
cpus_updated, mems_updated);
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
}
/**
@@ -3228,7 +3227,7 @@ static void cpuset_hotplug_workfn(struct
if (on_dfl && !alloc_cpumasks(NULL, &tmp))
ptmp = &tmp;
- percpu_down_write(&cpuset_rwsem);
+ mutex_lock(&cpuset_mutex);
/* fetch the available cpus/mems and find out which changed how */
cpumask_copy(&new_cpus, cpu_active_mask);
@@ -3285,7 +3284,7 @@ static void cpuset_hotplug_workfn(struct
update_tasks_nodemask(&top_cpuset);
}
- percpu_up_write(&cpuset_rwsem);
+ mutex_unlock(&cpuset_mutex);
/* if cpus or mems changed, we need to propagate to descendants */
if (cpus_updated || mems_updated) {
@@ -3695,7 +3694,7 @@ void __cpuset_memory_pressure_bump(void)
* - Used for /proc/<pid>/cpuset.
* - No need to task_lock(tsk) on this tsk->cpuset reference, as it
* doesn't really matter if tsk->cpuset changes after we read it,
- * and we take cpuset_rwsem, keeping cpuset_attach() from changing it
+ * and we take cpuset_mutex, keeping cpuset_attach() from changing it
* anyway.
*/
int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7309,6 +7309,7 @@ static int __sched_setscheduler(struct t
int reset_on_fork;
int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
struct rq *rq;
+ bool cpuset_locked = false;
/* The pi code expects interrupts enabled */
BUG_ON(pi && in_interrupt());
@@ -7405,8 +7406,14 @@ recheck:
return retval;
}
- if (pi)
- cpuset_read_lock();
+ /*
+ * SCHED_DEADLINE bandwidth accounting relies on stable cpusets
+ * information.
+ */
+ if (dl_policy(policy) || dl_policy(p->policy)) {
+ cpuset_locked = true;
+ cpuset_lock();
+ }
/*
* Make sure no PI-waiters arrive (or leave) while we are
@@ -7482,8 +7489,8 @@ change:
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;
task_rq_unlock(rq, p, &rf);
- if (pi)
- cpuset_read_unlock();
+ if (cpuset_locked)
+ cpuset_unlock();
goto recheck;
}
@@ -7550,7 +7557,8 @@ change:
task_rq_unlock(rq, p, &rf);
if (pi) {
- cpuset_read_unlock();
+ if (cpuset_locked)
+ cpuset_unlock();
rt_mutex_adjust_pi(p);
}
@@ -7562,8 +7570,8 @@ change:
unlock:
task_rq_unlock(rq, p, &rf);
- if (pi)
- cpuset_read_unlock();
+ if (cpuset_locked)
+ cpuset_unlock();
return retval;
}
next prev parent reply other threads:[~2023-08-28 10:45 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-28 10:13 [PATCH 5.15 00/89] 5.15.129-rc1 review Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 01/89] objtool/x86: Fix SRSO mess Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 02/89] NFSv4.2: fix error handling in nfs42_proc_getxattr Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 03/89] NFSv4: fix out path in __nfs4_get_acl_uncached Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 04/89] xprtrdma: Remap Receive buffers after a reconnect Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 05/89] PCI: acpiphp: Reassign resources on bridge if necessary Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 06/89] dlm: improve plock logging if interrupted Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 07/89] dlm: replace usage of found with dedicated list iterator variable Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 08/89] fs: dlm: add pid to debug log Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 09/89] fs: dlm: change plock interrupted message to debug again Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 10/89] fs: dlm: use dlm_plock_info for do_unlock_close Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 11/89] fs: dlm: fix mismatch of plock results from userspace Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 12/89] MIPS: cpu-features: Enable octeon_cache by cpu_type Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 13/89] MIPS: cpu-features: Use boot_cpu_type for CPU type based features Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 14/89] fbdev: Improve performance of sys_imageblit() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 15/89] fbdev: Fix sys_imageblit() for arbitrary image widths Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 16/89] fbdev: fix potential OOB read in fast_imageblit() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 17/89] ALSA: pcm: Fix potential data race at PCM memory allocation helpers Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 18/89] jbd2: remove t_checkpoint_io_list Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 19/89] jbd2: remove journal_clean_one_cp_list() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 20/89] jbd2: fix a race when checking checkpoint buffer busy Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 21/89] can: raw: fix receiver memory leak Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 22/89] drm/amd/display: do not wait for mpc idle if tg is disabled Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 23/89] drm/amd/display: check TG is non-null before checking if enabled Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 24/89] can: raw: fix lockdep issue in raw_release() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 25/89] tracing: Fix cpu buffers unavailable due to record_disabled missed Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 26/89] tracing: Fix memleak due to race between current_tracer and trace Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 27/89] octeontx2-af: SDP: fix receive link config Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 28/89] sock: annotate data-races around prot->memory_pressure Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 29/89] dccp: annotate data-races in dccp_poll() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 30/89] ipvlan: Fix a reference count leak warning in ipvlan_ns_exit() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 31/89] net: bgmac: Fix return value check for fixed_phy_register() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 32/89] net: bcmgenet: " Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 33/89] net: validate veth and vxcan peer ifindexes Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 34/89] ice: fix receive buffer size miscalculation Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 35/89] igb: Avoid starting unnecessary workqueues Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 36/89] igc: Fix the typo in the PTM Control macro Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 37/89] net/sched: fix a qdisc modification with ambiguous command request Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 38/89] netfilter: nf_tables: flush pending destroy work before netlink notifier Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 39/89] netfilter: nf_tables: fix out of memory error handling Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 40/89] rtnetlink: return ENODEV when ifname does not exist and group is given Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 41/89] rtnetlink: Reject negative ifindexes in RTM_NEWLINK Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 42/89] net: remove bond_slave_has_mac_rcu() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 43/89] bonding: fix macvlan over alb bond support Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 44/89] net/ncsi: make one oem_gma function for all mfr id Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 45/89] net/ncsi: change from ndo_set_mac_address to dev_set_mac_address Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 46/89] Revert "KVM: x86: enable TDP MMU by default" Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 47/89] ibmveth: Use dcbf rather than dcbfl Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 48/89] NFSv4: Fix dropped lock for racing OPEN and delegation return Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 49/89] clk: Fix slab-out-of-bounds error in devm_clk_release() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 50/89] ALSA: ymfpci: Fix the missing snd_card_free() call at probe error Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 51/89] mm: add a call to flush_cache_vmap() in vmap_pfn() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 52/89] NFS: Fix a use after free in nfs_direct_join_group() Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 53/89] nfsd: Fix race to FREE_STATEID and cl_revoked Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 54/89] selinux: set next pointer before attaching to list Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 55/89] batman-adv: Trigger events for auto adjusted MTU Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 56/89] batman-adv: Dont increase MTU when set by user Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 57/89] batman-adv: Do not get eth header before batadv_check_management_packet Greg Kroah-Hartman
2023-08-28 10:13 ` [PATCH 5.15 58/89] batman-adv: Fix TT global entry leak when client roamed back Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 59/89] batman-adv: Fix batadv_v_ogm_aggr_send memory leak Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 60/89] batman-adv: Hold rtnl lock during MTU update via netlink Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 61/89] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 62/89] radix tree: remove unused variable Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 63/89] of: unittest: Fix EXPECT for parse_phandle_with_args_map() test Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 64/89] of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 65/89] media: vcodec: Fix potential array out-of-bounds in encoder queue_setup Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 66/89] PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 67/89] drm/vmwgfx: Fix shader stage validation Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 68/89] drm/display/dp: Fix the DP DSC Receiver cap size Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 69/89] x86/fpu: Invalidate FPU state correctly on exec() Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 70/89] x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4 Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 71/89] nfs: use vfs setgid helper Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 72/89] nfsd: " Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 73/89] torture: Fix hang during kthread shutdown phase Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 74/89] cgroup/cpuset: Rename functions dealing with DEADLINE accounting Greg Kroah-Hartman
2023-08-28 10:14 ` Greg Kroah-Hartman [this message]
2023-08-28 10:14 ` [PATCH 5.15 76/89] sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 77/89] cgroup/cpuset: Iterate only if DEADLINE tasks are present Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 78/89] sched/deadline: Create DL BW alloc, free & check overflow interface Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 79/89] cgroup/cpuset: Free DL BW in case can_attach() fails Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 80/89] drm/i915: Fix premature release of requests reusable memory Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 81/89] can: raw: add missing refcount for memory leak fix Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 82/89] scsi: snic: Fix double free in snic_tgt_create() Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 83/89] scsi: core: raid_class: Remove raid_component_add() Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 84/89] clk: Fix undefined reference to `clk_rate_exclusive_{get,put} Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 85/89] pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function} Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 86/89] dma-buf/sw_sync: Avoid recursive lock during fence signal Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 87/89] mm: memory-failure: kill soft_offline_free_page() Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 88/89] mm: memory-failure: fix unexpected return value in soft_offline_page() Greg Kroah-Hartman
2023-08-28 10:14 ` [PATCH 5.15 89/89] mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer Greg Kroah-Hartman
2023-08-29 1:57 ` [PATCH 5.15 00/89] 5.15.129-rc1 review SeongJae Park
2023-08-29 9:05 ` Naresh Kamboju
2023-08-29 9:36 ` Naresh Kamboju
2023-08-29 10:00 ` Harshit Mogalapalli
2023-08-29 11:50 ` Sudip Mukherjee (Codethink)
2023-08-29 14:13 ` Shuah Khan
2023-08-29 18:52 ` Florian Fainelli
2023-08-29 23:26 ` Ron Economos
2023-08-30 2:24 ` Guenter Roeck
2023-08-30 10:24 ` Jon Hunter
2023-08-30 13:19 ` Joel Fernandes
2023-08-30 16:32 ` Allen Pais
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230828101152.730868872@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=juri.lelli@redhat.com \
--cc=longman@redhat.com \
--cc=patches@lists.linux.dev \
--cc=qyousef@layalina.io \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).