From: tip-bot for Vladimir Davydov <vdavydov@parallels.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
pjt@google.com, peterz@infradead.org, devel@openvz.org,
tglx@linutronix.de, vdavydov@parallels.com
Subject: [tip:sched/urgent] sched: Initialize cfs_rq->runtime_remaining to non-zero on cfs bw set
Date: Fri, 8 Feb 2013 07:17:47 -0800
Message-ID: <tip-0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b@git.kernel.org>
In-Reply-To: <1360307446-26978-1-git-send-email-vdavydov@parallels.com>
Commit-ID: 0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b
Gitweb: http://git.kernel.org/tip/0a702bb8af3c1b2dff355fb3c27e7f7d5285e30b
Author: Vladimir Davydov <vdavydov@parallels.com>
AuthorDate: Fri, 8 Feb 2013 11:10:46 +0400
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 8 Feb 2013 15:14:38 +0100
sched: Initialize cfs_rq->runtime_remaining to non-zero on cfs bw set
If cfs_rq->runtime_remaining is <= 0 then either
- cfs_rq is throttled and waiting for quota redistribution, or
- cfs_rq is currently executing and will be throttled on put_prev_entity, or
- cfs_rq is not throttled and has not executed since its quota was set
(runtime_remaining is set to 0 on cfs bandwidth reconfiguration).
The last case is an exception to the rule "runtime_remaining <= 0
iff cfs_rq is throttled or will be throttled as soon as it
finishes its execution".
Moreover, it can lead to a task hang as follows. If
put_prev_task() is called immediately after the first
pick_next_task() following a quota change, "immediately" meaning
that rq->clock is the same in both functions, then the
corresponding cfs_rq will be throttled.
Besides being unfair (the cfs_rq has not in fact executed), the
quota refill timer can be idle at that time, and it will not be
activated on put_prev_task() because update_curr() calls
account_cfs_rq_runtime(), which activates the timer, only if
delta_exec is strictly positive. As a result, we can end up with
a task "running" inside a throttled cfs_rq that will probably
never be unthrottled.
To avoid the problem, the patch makes tg_set_cfs_bandwidth
initialize runtime_remaining of each cfs_rq to 1 instead of 0 so
that the cfs_rq will be throttled only if it has executed for
some positive number of nanoseconds.
Our customers have encountered such hangs inside a VM several
times (time accounting seems to behave differently there).
Analysis of crash dumps showed that the hung tasks were running
inside cfs_rqs with the following state:
cfs_rq->throttled=1
cfs_rq->runtime_enabled=1
cfs_rq->runtime_remaining=0
cfs_rq->tg->cfs_bandwidth.idle=1
cfs_rq->tg->cfs_bandwidth.timer_active=0
which matches the explanation given above.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: <devel@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/1360307446-26978-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..c7a078f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7686,7 +7686,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 		raw_spin_lock_irq(&rq->lock);
 		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		cfs_rq->runtime_remaining = 1;
 		if (cfs_rq->throttled)
 			unthrottle_cfs_rq(cfs_rq);