From: Paul Turner <pjt@google.com>
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Bharata B Rao <bharata@linux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@gmail.com>,
Balbir Singh <bsingharora@gmail.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Ingo Molnar <mingo@elte.hu>, Pavel Emelyanov <xemul@openvz.org>,
Jason Baron <jbaron@redhat.com>
Subject: [patch 09/18] sched: add support for unthrottling group entities
Date: Thu, 21 Jul 2011 09:43:34 -0700 [thread overview]
Message-ID: <20110721184757.574628950@google.com> (raw)
In-Reply-To: 20110721164325.231521704@google.com
[-- Attachment #1: sched-bwc-unthrottle_entities.patch --]
[-- Type: text/plain, Size: 5369 bytes --]
At the start of each period we refresh the global bandwidth pool. At this time
we must also unthrottle any cfs_rq entities who are now within bandwidth once
more (as quota permits).
Unthrottled entities have their corresponding cfs_rq->throttled flag cleared
and their entities re-enqueued.
Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
kernel/sched.c | 3 +
kernel/sched_fair.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 126 insertions(+), 4 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -9008,6 +9008,9 @@ static int tg_set_cfs_bandwidth(struct t
raw_spin_lock_irq(&rq->lock);
cfs_rq->runtime_enabled = runtime_enabled;
cfs_rq->runtime_remaining = 0;
+
+ if (cfs_rq_throttled(cfs_rq))
+ unthrottle_cfs_rq(cfs_rq);
raw_spin_unlock_irq(&rq->lock);
}
out_unlock:
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1461,6 +1461,84 @@ static __used void throttle_cfs_rq(struc
raw_spin_unlock(&cfs_b->lock);
}
+static void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
+{
+ struct rq *rq = rq_of(cfs_rq);
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+ struct sched_entity *se;
+ int enqueue = 1;
+ long task_delta;
+
+ se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+
+ cfs_rq->throttled = 0;
+ raw_spin_lock(&cfs_b->lock);
+ list_del_rcu(&cfs_rq->throttled_list);
+ raw_spin_unlock(&cfs_b->lock);
+
+ if (!cfs_rq->load.weight)
+ return;
+
+ task_delta = cfs_rq->h_nr_running;
+ for_each_sched_entity(se) {
+ if (se->on_rq)
+ enqueue = 0;
+
+ cfs_rq = cfs_rq_of(se);
+ if (enqueue)
+ enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
+ cfs_rq->h_nr_running += task_delta;
+
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+ }
+
+ if (!se)
+ rq->nr_running += task_delta;
+
+ /* determine whether we need to wake up potentially idle cpu */
+ if (rq->curr == rq->idle && rq->cfs.nr_running)
+ resched_task(rq->curr);
+}
+
+static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
+ u64 remaining, u64 expires)
+{
+ struct cfs_rq *cfs_rq;
+ u64 runtime = remaining;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
+ throttled_list) {
+ struct rq *rq = rq_of(cfs_rq);
+
+ raw_spin_lock(&rq->lock);
+ if (!cfs_rq_throttled(cfs_rq))
+ goto next;
+
+ runtime = -cfs_rq->runtime_remaining + 1;
+ if (runtime > remaining)
+ runtime = remaining;
+ remaining -= runtime;
+
+ cfs_rq->runtime_remaining += runtime;
+ cfs_rq->runtime_expires = expires;
+
+ /* we check whether we're throttled above */
+ if (cfs_rq->runtime_remaining > 0)
+ unthrottle_cfs_rq(cfs_rq);
+
+next:
+ raw_spin_unlock(&rq->lock);
+
+ if (!remaining)
+ break;
+ }
+ rcu_read_unlock();
+
+ return remaining;
+}
+
/*
* Responsible for refilling a task_group's bandwidth and unthrottling its
* cfs_rqs as appropriate. If there has been no activity within the last
@@ -1469,23 +1547,64 @@ static __used void throttle_cfs_rq(struc
*/
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
{
- int idle = 1;
+ u64 runtime, runtime_expires;
+ int idle = 1, throttled;
raw_spin_lock(&cfs_b->lock);
/* no need to continue the timer with no bandwidth constraint */
if (cfs_b->quota == RUNTIME_INF)
goto out_unlock;
- idle = cfs_b->idle;
+ throttled = !list_empty(&cfs_b->throttled_cfs_rq);
+ /* idle depends on !throttled (for the case of a large deficit) */
+ idle = cfs_b->idle && !throttled;
+
/* if we're going inactive then everything else can be deferred */
if (idle)
goto out_unlock;
__refill_cfs_bandwidth_runtime(cfs_b);
+ if (!throttled) {
+ /* mark as potentially idle for the upcoming period */
+ cfs_b->idle = 1;
+ goto out_unlock;
+ }
+
+ /*
+ * There are throttled entities so we must first use the new bandwidth
+ * to unthrottle them before making it generally available. This
+ * ensures that all existing debts will be paid before a new cfs_rq is
+ * allowed to run.
+ */
+ runtime = cfs_b->runtime;
+ runtime_expires = cfs_b->runtime_expires;
+ cfs_b->runtime = 0;
+
+ /*
+ * This check is repeated as we are holding onto the new bandwidth
+ * while we unthrottle. This can potentially race with an unthrottled
+ * group trying to acquire new bandwidth from the global pool.
+ */
+ while (throttled && runtime > 0) {
+ raw_spin_unlock(&cfs_b->lock);
+ /* we can't nest cfs_b->lock while distributing bandwidth */
+ runtime = distribute_cfs_runtime(cfs_b, runtime,
+ runtime_expires);
+ raw_spin_lock(&cfs_b->lock);
+
+ throttled = !list_empty(&cfs_b->throttled_cfs_rq);
+ }
- /* mark as potentially idle for the upcoming period */
- cfs_b->idle = 1;
+ /* return (any) remaining runtime */
+ cfs_b->runtime = runtime;
+ /*
+ * While we are ensured activity in the period following an
+ * unthrottle, this also covers the case in which the new bandwidth is
+ * insufficient to cover the existing bandwidth deficit. (Forcing the
+ * timer to remain active while there are any throttled entities.)
+ */
+ cfs_b->idle = 0;
out_unlock:
if (idle)
cfs_b->timer_active = 0;
next prev parent reply other threads:[~2011-07-21 23:02 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-21 16:43 [patch 00/18] CFS Bandwidth Control v7.2 Paul Turner
2011-07-21 16:43 ` [patch 01/18] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-07-22 11:06 ` Kamalesh Babulal
2011-07-21 16:43 ` [patch 02/18] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
2011-08-14 16:15 ` [tip:sched/core] sched: Implement " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 03/18] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
2011-07-22 11:14 ` Kamalesh Babulal
2011-08-14 16:17 ` [tip:sched/core] sched: Introduce " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 04/18] sched: validate CFS quota hierarchies Paul Turner
2011-08-14 16:19 ` [tip:sched/core] sched: Validate " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 05/18] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-08-14 16:21 ` [tip:sched/core] sched: Accumulate " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 06/18] sched: add a timer to handle CFS bandwidth refresh Paul Turner
2011-08-14 16:23 ` [tip:sched/core] sched: Add " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 07/18] sched: expire invalid runtime Paul Turner
2011-08-14 16:24 ` [tip:sched/core] sched: Expire " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 08/18] sched: add support for throttling group entities Paul Turner
2011-08-08 15:46 ` Lin Ming
2011-08-08 16:00 ` Peter Zijlstra
2011-08-08 16:16 ` Paul Turner
2011-08-14 16:26 ` [tip:sched/core] sched: Add " tip-bot for Paul Turner
2011-07-21 16:43 ` Paul Turner [this message]
2011-08-14 16:27 ` [tip:sched/core] sched: Add support for unthrottling " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 10/18] sched: allow for positional tg_tree walks Paul Turner
2011-08-14 16:29 ` [tip:sched/core] sched: Allow " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 11/18] sched: prevent interactions with throttled entities Paul Turner
2011-07-22 11:26 ` Kamalesh Babulal
2011-07-22 11:37 ` Peter Zijlstra
2011-07-22 11:41 ` Kamalesh Babulal
2011-07-22 11:43 ` Peter Zijlstra
2011-07-22 18:16 ` Kamalesh Babulal
2011-08-14 16:30 ` [tip:sched/core] sched: Prevent " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 12/18] sched: prevent buddy " Paul Turner
2011-08-14 16:32 ` [tip:sched/core] sched: Prevent " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 13/18] sched: migrate throttled tasks on HOTPLUG Paul Turner
2011-08-14 16:34 ` [tip:sched/core] sched: Migrate " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 14/18] sched: throttle entities exceeding their allowed bandwidth Paul Turner
2011-08-14 16:35 ` [tip:sched/core] sched: Throttle " tip-bot for Paul Turner
2011-07-21 16:43 ` [patch 15/18] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-08-14 16:37 ` [tip:sched/core] sched: Add " tip-bot for Nikhil Rao
2011-07-21 16:43 ` [patch 16/18] sched: return unused runtime on group dequeue Paul Turner
2011-08-14 16:39 ` [tip:sched/core] sched: Return " tip-bot for Paul Turner
2011-07-21 16:43 ` [RFT][patch 17/18] sched: use jump labels to reduce overhead when bandwidth control is inactive Paul Turner
2011-07-21 16:43 ` [patch 18/18] sched: add documentation for bandwidth control Paul Turner
2011-08-14 16:41 ` [tip:sched/core] sched: Add " tip-bot for Bharata B Rao
2011-07-21 23:01 ` [patch 00/18] CFS Bandwidth Control v7.2 Paul Turner
2011-07-25 14:58 ` Peter Zijlstra
2011-07-25 15:00 ` Peter Zijlstra
2011-07-25 16:21 ` Paul E. McKenney
2011-07-25 16:28 ` Peter Zijlstra
2011-07-25 16:46 ` Paul E. McKenney
2011-07-25 17:08 ` Peter Zijlstra
2011-07-25 17:11 ` Dhaval Giani
2011-07-25 17:35 ` Peter Zijlstra
2011-07-28 2:59 ` Paul Turner
2011-09-13 12:10 ` Vladimir Davydov
2011-09-13 14:00 ` Peter Zijlstra
2011-09-16 8:06 ` Paul Turner
2011-09-19 8:22 ` Vladimir Davydov
2011-09-19 8:33 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110721184757.574628950@google.com \
--to=pjt@google.com \
--cc=a.p.zijlstra@chello.nl \
--cc=bharata@linux.vnet.ibm.com \
--cc=bsingharora@gmail.com \
--cc=dhaval.giani@gmail.com \
--cc=jbaron@redhat.com \
--cc=kamalesh@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=svaidy@linux.vnet.ibm.com \
--cc=vatsa@in.ibm.com \
--cc=xemul@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox