* [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task()
@ 2014-09-22 18:32 Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
Combine two branches which do the same thing.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
kernel/sched/fair.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a1e6ac..420bc98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5292,24 +5292,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
if (!tsk_cache_hot)
tsk_cache_hot = migrate_degrades_locality(p, env);
- if (migrate_improves_locality(p, env)) {
-#ifdef CONFIG_SCHEDSTATS
+ if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
+ env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
if (tsk_cache_hot) {
schedstat_inc(env->sd, lb_hot_gained[env->idle]);
schedstat_inc(p, se.statistics.nr_forced_migrations);
}
-#endif
- return 1;
- }
-
- if (!tsk_cache_hot ||
- env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-
- if (tsk_cache_hot) {
- schedstat_inc(env->sd, lb_hot_gained[env->idle]);
- schedstat_inc(p, se.statistics.nr_forced_migrations);
- }
-
return 1;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
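The merge in patch 1/6 can be checked outside the kernel. The following is a minimal sketch, not kernel code: `struct stats`, `decide_old()`, `decide_new()` and `decisions_match()` are hypothetical names that mirror only the control flow of the two versions of the hunk, with one counter standing in for the schedstat increments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the schedstat counters touched by the hunk. */
struct stats {
	int forced_migrations;
};

/* Control flow before the patch: two branches with identical bodies. */
static int decide_old(bool improves, bool cache_hot, int failed, int tries,
		      struct stats *st)
{
	if (improves) {
		if (cache_hot)
			st->forced_migrations++;
		return 1;
	}

	if (!cache_hot || failed > tries) {
		if (cache_hot)
			st->forced_migrations++;
		return 1;
	}

	return 0;
}

/* Control flow after the patch: one branch with the combined condition. */
static int decide_new(bool improves, bool cache_hot, int failed, int tries,
		      struct stats *st)
{
	if (improves || !cache_hot || failed > tries) {
		if (cache_hot)
			st->forced_migrations++;
		return 1;
	}

	return 0;
}

/* Returns 1 if both versions agree on result and counter for all inputs. */
static int decisions_match(void)
{
	for (int improves = 0; improves <= 1; improves++)
		for (int hot = 0; hot <= 1; hot++)
			for (int failed = 0; failed <= 3; failed++)
				for (int tries = 0; tries <= 3; tries++) {
					struct stats a = { 0 }, b = { 0 };
					int r_old = decide_old(improves, hot, failed, tries, &a);
					int r_new = decide_new(improves, hot, failed, tries, &b);

					if (r_old != r_new ||
					    a.forced_migrations != b.forced_migrations)
						return 0;
				}
	return 1;
}
```

Calling `decisions_match()` walks every combination of the two flags and a small range of counter values, which is enough to cover each truth assignment of the three conditions.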
* [PATCH v2 2/6] sched: Do not pick a task which is switching on other cpu
2014-09-22 18:32 [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
@ 2014-09-22 18:32 ` Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
Architectures that define __ARCH_WANT_UNLOCKED_CTXSW
may pull a task while it is in the middle of schedule().
CPU1(task1 calls schedule)          CPU2
...                                 schedule()
...                                   idle_balance()
...                                     load_balance()
...                                       ...
schedule()                                ...
  prepare_lock_switch()                   ...
    raw_spin_unlock(&rq1->lock)           ...
...                                       raw_spin_lock(&rq1->lock)
...                                         detach_tasks();
...                                           can_migrate_task(task1)
...                                         attach_tasks(); <--- move task1 to rq2
...                                       raw_spin_unlock(&rq1->lock)
...                                   context_switch() <--- switch to task1's stack
...                                   ...
(using task1's stack)               (using task1's stack)
...                                 ...
context_switch()                    ...
Parallel use of a single stack is not a good idea.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # Should this go to stable?
---
kernel/sched/core.c | 11 +++--------
kernel/sched/deadline.c | 7 ++++++-
kernel/sched/fair.c | 3 +++
kernel/sched/rt.c | 7 ++++++-
kernel/sched/sched.h | 16 ++++++++++++++++
5 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a93b87..5b864e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1700,15 +1700,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
#ifdef CONFIG_SMP
/*
- * If the owning (remote) cpu is still in the middle of schedule() with
- * this task as prev, wait until its done referencing the task.
+ * Note, that p is dequeued at the moment. But it still
+ * may be "prev" in the middle of schedule() on other cpu.
*/
- while (p->on_cpu)
- cpu_relax();
- /*
- * Pairs with the smp_wmb() in finish_lock_switch().
- */
- smp_rmb();
+ cpu_relax__while_on_cpu(p);
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aaa5abb..ea0ba33 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1364,7 +1364,9 @@ static int push_dl_task(struct rq *rq)
next_task = task;
goto retry;
}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+ cpu_relax__while_on_cpu(next_task);
+#endif
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
activate_task(later_rq, next_task, 0);
@@ -1451,6 +1453,9 @@ static int pull_dl_task(struct rq *this_rq)
ret = 1;
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+ cpu_relax__while_on_cpu(p);
+#endif
deactivate_task(src_rq, p, 0);
set_task_cpu(p, this_cpu);
activate_task(this_rq, p, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 420bc98..80c5064 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5298,6 +5298,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
schedstat_inc(env->sd, lb_hot_gained[env->idle]);
schedstat_inc(p, se.statistics.nr_forced_migrations);
}
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+ cpu_relax__while_on_cpu(p);
+#endif
return 1;
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e6a774..de356b0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1734,7 +1734,9 @@ static int push_rt_task(struct rq *rq)
next_task = task;
goto retry;
}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+ cpu_relax__while_on_cpu(next_task);
+#endif
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
@@ -1823,6 +1825,9 @@ static int pull_rt_task(struct rq *this_rq)
ret = 1;
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+ cpu_relax__while_on_cpu(p);
+#endif
deactivate_task(src_rq, p, 0);
set_task_cpu(p, this_cpu);
activate_task(this_rq, p, 0);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1bc6aad..9c07d72 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1034,6 +1034,22 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
#endif /* __ARCH_WANT_UNLOCKED_CTXSW */
/*
+ * If the owning (remote) cpu is still in the middle of schedule() with
+ * this task as prev, wait until its done referencing the task.
+ */
+static inline void cpu_relax__while_on_cpu(struct task_struct *p)
+{
+#ifdef CONFIG_SMP
+ while (p->on_cpu)
+ cpu_relax();
+ /*
+ * Pairs with the smp_wmb() in finish_lock_switch().
+ */
+ smp_rmb();
+#endif
+}
+
+/*
* wake flags
*/
#define WF_SYNC 0x01 /* waker goes to sleep after wakeup */
^ permalink raw reply related [flat|nested] 7+ messages in thread
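The wait loop that patch 2/6 factors out can be approximated in userspace. This is a minimal sketch under stated assumptions: C11 atomics stand in for the kernel primitives (a release store where the kernel uses smp_wmb() before clearing on_cpu, an acquire load where it uses smp_rmb() after the loop), and `struct task`, `finish_switch()` and `wait_while_on_cpu()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct task {
	atomic_int on_cpu;	/* non-zero while the old CPU still uses the task */
	int saved_state;	/* data the old CPU writes before letting go */
};

/*
 * Old CPU: finish with the task, then publish on_cpu = 0.
 * The release store orders the write to saved_state before the flag.
 */
static void finish_switch(struct task *p, int state)
{
	p->saved_state = state;
	atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/*
 * Would-be migrator: spin until the old CPU is done with the task.
 * The acquire load guarantees saved_state is visible once the loop exits.
 */
static int wait_while_on_cpu(struct task *p)
{
	while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
		;	/* the kernel calls cpu_relax() here */

	return p->saved_state;
}
```

In a real two-thread setup, `finish_switch()` would run on one thread and `wait_while_on_cpu()` on another; called in that order on a single thread, the loop exits immediately.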
* [PATCH v2 3/6] sched: Use dl_bw_of() under RCU read lock
2014-09-22 18:32 [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
@ 2014-09-22 18:32 ` Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
dl_bw_of() dereferences rq->rd, which must be done with the RCU read lock held.
Otherwise a use-after-free is possible.
Also add a lockdep assert to dl_bw_cpus().
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
kernel/sched/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b864e9..a300fce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1989,6 +1989,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
#ifdef CONFIG_SMP
inline struct dl_bw *dl_bw_of(int i)
{
+ rcu_lockdep_assert(rcu_read_lock_sched_held(),
+ "sched RCU must be held");
return &cpu_rq(i)->rd->dl_bw;
}
@@ -1997,6 +1999,8 @@ static inline int dl_bw_cpus(int i)
struct root_domain *rd = cpu_rq(i)->rd;
int cpus = 0;
+ rcu_lockdep_assert(rcu_read_lock_sched_held(),
+ "sched RCU must be held");
for_each_cpu_and(i, rd->span, cpu_active_mask)
cpus++;
@@ -7623,6 +7627,8 @@ static int sched_dl_global_constraints(void)
int cpu, ret = 0;
unsigned long flags;
+ rcu_read_lock();
+
/*
* Here we want to check the bandwidth not being set to some
* value smaller than the currently allocated bandwidth in
@@ -7644,6 +7650,8 @@ static int sched_dl_global_constraints(void)
break;
}
+ rcu_read_unlock();
+
return ret;
}
@@ -7659,6 +7667,7 @@ static void sched_dl_do_global(void)
if (global_rt_runtime() != RUNTIME_INF)
new_bw = to_ratio(global_rt_period(), global_rt_runtime());
+ rcu_read_lock();
/*
* FIXME: As above...
*/
@@ -7669,6 +7678,7 @@ static void sched_dl_do_global(void)
dl_b->bw = new_bw;
raw_spin_unlock_irqrestore(&dl_b->lock, flags);
}
+ rcu_read_unlock();
}
static int sched_rt_global_validate(void)
^ permalink raw reply related [flat|nested] 7+ messages in thread
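The calling convention that patch 3/6 enforces (take the read lock, use the pointer, drop the lock) can be shown with a toy model. This sketch models only the assertion side of the change: a nesting counter plays the role of the lockdep check, and nothing here implements RCU grace periods. All `toy_*` names and `read_depth` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nesting counter standing in for the read-side lock state. */
static int read_depth;

static void toy_read_lock(void)   { read_depth++; }
static void toy_read_unlock(void) { read_depth--; }
static int toy_read_lock_held(void) { return read_depth > 0; }

/* Hypothetical stand-in for the data reached through rq->rd. */
struct toy_rd {
	int bw;
};

static struct toy_rd toy_root = { .bw = 7 };

/*
 * Accessor that refuses to hand out the pointer without the read lock.
 * The kernel would emit a lockdep warning instead of returning NULL.
 */
static struct toy_rd *toy_bw_of(void)
{
	if (!toy_read_lock_held())
		return NULL;

	return &toy_root;
}

/* Caller following the pattern the patch adds around dl_bw_of(). */
static int toy_read_bw(void)
{
	int bw;

	toy_read_lock();
	bw = toy_bw_of()->bw;	/* pointer is only used inside the section */
	toy_read_unlock();

	return bw;
}
```

The value is copied out before the unlock, so nothing obtained inside the read-side section is dereferenced after it ends.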
* [PATCH v2 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask
2014-09-22 18:32 [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
@ 2014-09-22 18:32 ` Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
4 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
Nothing is locked there, so the label's name only confuses the reader.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
kernel/sched/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a300fce..3b07710 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4003,14 +4003,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
rcu_read_lock();
if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
rcu_read_unlock();
- goto out_unlock;
+ goto out_free_new_mask;
}
rcu_read_unlock();
}
retval = security_task_setscheduler(p);
if (retval)
- goto out_unlock;
+ goto out_free_new_mask;
cpuset_cpus_allowed(p, cpus_allowed);
@@ -4028,7 +4028,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
retval = -EBUSY;
- goto out_unlock;
+ goto out_free_new_mask;
}
}
#endif
@@ -4047,7 +4047,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
goto again;
}
}
-out_unlock:
+out_free_new_mask:
free_cpumask_var(new_mask);
out_free_cpus_allowed:
free_cpumask_var(cpus_allowed);
^ permalink raw reply related [flat|nested] 7+ messages in thread
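The naming convention behind patch 4/6, where each cleanup label says what it releases, can be illustrated with a small sketch. `toy_setaffinity()` and its `fail_check` parameter are hypothetical; the function only mimics the shape of the exit path in sched_setaffinity(), with malloc() standing in for the cpumask allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Two allocations, released in reverse order on the way out.
 * Each label is named after the action performed at that point.
 */
static int toy_setaffinity(int fail_check)
{
	char *cpus_allowed, *new_mask;
	int retval;

	cpus_allowed = malloc(16);
	if (!cpus_allowed)
		return -ENOMEM;

	new_mask = malloc(16);
	if (!new_mask) {
		retval = -ENOMEM;
		goto out_free_cpus_allowed;
	}

	if (fail_check) {
		/* No lock is held here, only memory needs to be released. */
		retval = -EBUSY;
		goto out_free_new_mask;
	}

	retval = 0;

out_free_new_mask:
	free(new_mask);
out_free_cpus_allowed:
	free(cpus_allowed);
	return retval;
}
```

With labels named this way, a reader can verify each `goto` against the resources held at that point without scrolling to the end of the function.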
* [PATCH v2 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock
2014-09-22 18:32 [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
` (2 preceding siblings ...)
2014-09-22 18:32 ` [PATCH v2 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
@ 2014-09-22 18:32 ` Kirill Tkhai
2014-09-22 18:34 ` Kirill Tkhai
2014-09-22 18:32 ` [PATCH v2 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
4 siblings, 1 reply; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
task_rq(p)->rd and task_rq(p)->rd->span may be used-after-free here.
Probability of NULL pointer dereference isn't zero in this place.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
kernel/sched/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b07710..643ee99 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4023,13 +4023,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
* root_domain.
*/
#ifdef CONFIG_SMP
- if (task_has_dl_policy(p)) {
- const struct cpumask *span = task_rq(p)->rd->span;
-
- if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
+ if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+ rcu_read_lock();
+ if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
retval = -EBUSY;
+ rcu_read_unlock();
goto out_free_new_mask;
}
+ rcu_read_unlock();
}
#endif
again:
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 6/6] sched/rt: Use resched_curr() in task_tick_rt()
2014-09-22 18:32 [PATCH v2 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
` (3 preceding siblings ...)
2014-09-22 18:32 ` [PATCH v2 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-22 18:32 ` Kirill Tkhai
4 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:32 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
From: Kirill Tkhai <ktkhai@parallels.com>
PREEMPT_NEED_RESCHED was implemented some time ago,
so the rescheduling technique is a little more involved now.
Use resched_curr() instead of setting the flag directly.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
kernel/sched/rt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index de356b0..c322071 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2077,7 +2077,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
for_each_sched_rt_entity(rt_se) {
if (rt_se->run_list.prev != rt_se->run_list.next) {
requeue_task_rt(rq, p, 0);
- set_tsk_need_resched(p);
+ resched_curr(rq);
return;
}
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
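The difference that patch 6/6 relies on can be sketched with a toy model of two flags. This assumes the local-CPU path of resched_curr(), which sets the task's need-resched flag and also the per-CPU PREEMPT_NEED_RESCHED state; the remote-CPU path (sending a reschedule IPI) is left out. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task TIF_NEED_RESCHED flag. */
struct toy_task {
	bool tif_need_resched;
};

/* Hypothetical stand-in for a CPU with its PREEMPT_NEED_RESCHED state. */
struct toy_cpu {
	struct toy_task *curr;
	bool preempt_need_resched;
};

/* Old call in task_tick_rt(): touches only the per-task flag. */
static void toy_set_tsk_need_resched(struct toy_task *p)
{
	p->tif_need_resched = true;
}

/* New call, local-CPU case only: updates both pieces of state. */
static void toy_resched_curr(struct toy_cpu *cpu)
{
	if (cpu->curr->tif_need_resched)
		return;		/* a reschedule is already pending */

	cpu->curr->tif_need_resched = true;
	cpu->preempt_need_resched = true;
}
```

In this model, the old call leaves `preempt_need_resched` untouched, while the new one keeps the two flags in step.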
* Re: [PATCH v2 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock
2014-09-22 18:32 ` [PATCH v2 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-22 18:34 ` Kirill Tkhai
0 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:34 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai
22.09.2014, 22:32, "Kirill Tkhai" <tkhai@yandex.ru>:
> From: Kirill Tkhai <ktkhai@parallels.com>
>
> task_rq(p)->rd and task_rq(p)->rd->span may be used-after-free here.
> Probability of NULL pointer dereference isn't zero in this place.
Wrong comment, sorry. I'll resend.
>
> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> Cc: <stable@vger.kernel.org> # v3.14+
> ---
> kernel/sched/core.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3b07710..643ee99 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4023,13 +4023,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> * root_domain.
> */
> #ifdef CONFIG_SMP
> - if (task_has_dl_policy(p)) {
> - const struct cpumask *span = task_rq(p)->rd->span;
> -
> - if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
> + if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
> + rcu_read_lock();
> + if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
> retval = -EBUSY;
> + rcu_read_unlock();
> goto out_free_new_mask;
> }
> + rcu_read_unlock();
> }
> #endif
> again:
>
^ permalink raw reply [flat|nested] 7+ messages in thread