* [PATCH] sched: initialize runtime to non-zero on cfs bw set
@ 2013-02-08  7:10 Vladimir Davydov
  2013-02-08 14:46 ` Paul Turner
  2013-02-08 15:17 ` [tip:sched/urgent] sched: Initialize cfs_rq-> runtime_remaining " tip-bot for Vladimir Davydov
  0 siblings, 2 replies; 5+ messages in thread
From: Vladimir Davydov @ 2013-02-08  7:10 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Turner; +Cc: Ingo Molnar, devel, linux-kernel

If cfs_rq->runtime_remaining is <= 0 then either
- cfs_rq is throttled and waiting for quota redistribution, or
- cfs_rq is currently executing and will be throttled on
  put_prev_entity, or
- cfs_rq is not throttled and has not executed since its quota was set
  (runtime_remaining is set to 0 on cfs bandwidth reconfiguration).

The last case is an exception to the rule "runtime_remaining <= 0 iff
cfs_rq is throttled or will be throttled as soon as it finishes its
execution". Moreover, it can lead to a task hang as follows. If
put_prev_task is called immediately after the first pick_next_task
following a quota change ("immediately" meaning that rq->clock is the
same in both functions), the corresponding cfs_rq will be throttled.
Besides being unfair (the cfs_rq has not in fact executed), the quota
refill timer can be idle at that time, and it won't be activated on
put_prev_task, because update_curr calls account_cfs_rq_runtime, which
activates the timer, only if delta_exec is strictly positive. As a
result, we can end up with a task "running" inside a throttled cfs_rq
that will probably never be unthrottled.

To avoid the problem, the patch makes tg_set_cfs_bandwidth initialize
runtime_remaining of each cfs_rq to 1 instead of 0 so that the cfs_rq
will be throttled only if it has executed for some positive number of
nanoseconds.
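
The scenario above can be sketched as a small user-space model. This is
not kernel code: struct model_cfs_rq and the model_* helpers are
hypothetical names invented for illustration, no quota refill is
modelled, and timer activation is reduced to setting a flag when
accounting runs out of runtime. It only mirrors the two properties
described in the text: accounting is skipped when delta_exec is zero,
and a cfs_rq is throttled on put when runtime_remaining <= 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the relevant cfs_rq state. */
struct model_cfs_rq {
	int64_t  runtime_remaining; /* value set on bandwidth reconfiguration */
	bool     throttled;
	bool     timer_active;      /* stands in for the quota refill timer */
	uint64_t exec_start;        /* clock value at pick time */
};

/* Models update_curr: runtime is accounted only if delta_exec > 0. */
static void model_update_curr(struct model_cfs_rq *rq, uint64_t now)
{
	uint64_t delta_exec = now - rq->exec_start;

	if (!delta_exec)
		return; /* no accounting, so the timer is not touched */

	rq->exec_start = now;
	rq->runtime_remaining -= (int64_t)delta_exec;

	/* Assumption: running out of runtime here is what arms the timer. */
	if (rq->runtime_remaining <= 0)
		rq->timer_active = true;
}

/* Models the put path: account first, then throttle if out of runtime. */
static void model_put_prev(struct model_cfs_rq *rq, uint64_t now)
{
	model_update_curr(rq, now);

	if (rq->runtime_remaining <= 0 && !rq->throttled)
		rq->throttled = true;
}

/* Pick at pick_clock, put at put_clock, starting from initial_runtime. */
static struct model_cfs_rq model_run(int64_t initial_runtime,
				     uint64_t pick_clock, uint64_t put_clock)
{
	struct model_cfs_rq rq = {
		.runtime_remaining = initial_runtime,
		.throttled = false,
		.timer_active = false,
		.exec_start = pick_clock,
	};

	assert(put_clock >= pick_clock);
	model_put_prev(&rq, put_clock);
	return rq;
}
```

In this model, model_run(0, 1000, 1000) ends up throttled with the
timer inactive, which is the stuck state described above, while
model_run(1, 1000, 1000) is not throttled. With a clock that advances
by at least 1 ns, model_run(1, 1000, 1001) is throttled, but only
after accounting has armed the timer.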
--
Our customers have encountered such hangs inside a VM several times
(it seems that time accounting is wrong, or at least different, there).
Analysis of crash dumps revealed that the hung tasks were running inside
cfs_rq's that had the following state

cfs_rq->throttled=1
cfs_rq->runtime_enabled=1
cfs_rq->runtime_remaining=0
cfs_rq->tg->cfs_bandwidth.idle=1
cfs_rq->tg->cfs_bandwidth.timer_active=0

which agrees well with the explanation given above.
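
For triaging similar dumps, the combination of fields listed above can
be written down as a predicate. This is only a sketch: struct
dump_snapshot and looks_stuck_throttled are hypothetical names, the
fields are assumed to have been copied by hand from a crash dump, and a
match suggests rather than proves that the cfs_rq will never be
unthrottled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for values read out of a crash dump. */
struct dump_snapshot {
	int     throttled;         /* cfs_rq->throttled */
	int     runtime_enabled;   /* cfs_rq->runtime_enabled */
	int64_t runtime_remaining; /* cfs_rq->runtime_remaining */
	int     idle;              /* cfs_rq->tg->cfs_bandwidth.idle */
	int     timer_active;      /* cfs_rq->tg->cfs_bandwidth.timer_active */
};

/*
 * True if the snapshot shows a bandwidth-limited cfs_rq that is throttled
 * with no runtime left while the refill timer is idle and inactive.
 */
static bool looks_stuck_throttled(const struct dump_snapshot *s)
{
	return s->runtime_enabled &&
	       s->throttled &&
	       s->runtime_remaining <= 0 &&
	       s->idle &&
	       !s->timer_active;
}
```

The values from the dumps above (1, 1, 0, 1, 0 in field order) satisfy
the predicate; the same snapshot with timer_active=1 does not.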

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..c7a078f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7686,7 +7686,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 
 		raw_spin_lock_irq(&rq->lock);
 		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		cfs_rq->runtime_remaining = 1;
 
 		if (cfs_rq->throttled)
 			unthrottle_cfs_rq(cfs_rq);
-- 
1.7.1

