[RFC v3 PATCH 6/7] sched: Rebalance cfs runtimes

From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Gautham R Shenoy <ego@in.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Pavel Emelyanov <xemul@openvz.org>,
	Herbert Poetzl <herbert@13thfloor.at>,
	Avi Kivity <avi@redhat.com>, Chris Friesen <cfriesen@nortel.com>,
	Paul Menage <menage@google.com>,
	Mike Waychison <mikew@google.com>
Subject: [RFC v3 PATCH 6/7] sched: Rebalance cfs runtimes
Date: Mon, 9 Nov 2009 14:42:36 +0530
Message-ID: <20091109091236.GJ23472@in.ibm.com>
In-Reply-To: <20091109090838.GD23472@in.ibm.com>

sched: CFS runtime borrowing

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Before throttling a group on a cpu, try to borrow runtime from the group's
cfs_rqs on other cpus that have runtime to spare.

To start with, a group gets equal runtime on every cpu. If the group doesn't
have tasks on all cpus, it can get throttled on some cpus while it still has
unused runtime on other cpus where it has no tasks to consume that runtime.
Hence, before throttling, try to move runtime from such cpus/cfs_rqs to the
cpu/cfs_rq that has run out.
---
 kernel/sched_fair.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 60 insertions(+), 0 deletions(-)
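
For illustration only, here is a minimal user-space sketch of the borrowing
arithmetic described in the changelog. It is not part of the patch. The
names sim_cfs_rq, sim_balance_runtime() and sim_should_throttle() are
hypothetical, locking and the RUNTIME_INF check are left out, and it assumes
the borrower's runtime does not already exceed the period.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu cfs_rq fields used by the patch. */
struct sim_cfs_rq {
	uint64_t runtime;	/* runtime allotted for this period (cfs_runtime) */
	uint64_t time;		/* runtime consumed so far (cfs_time) */
};

/*
 * Borrow runtime for rq[self] from the other cpus' entries: take
 * 1/ncpus of each donor's unused runtime, and never raise the
 * borrower's runtime above the period.
 */
static void sim_balance_runtime(struct sim_cfs_rq *rq, int ncpus,
				int self, uint64_t period)
{
	int i;

	/* Assumption of this sketch: the borrower starts at or below period. */
	assert(ncpus > 0 && self >= 0 && self < ncpus);
	assert(rq[self].runtime <= period);

	for (i = 0; i < ncpus; i++) {
		uint64_t diff;

		if (i == self)
			continue;

		/* A donor with nothing left over has nothing to lend. */
		if (rq[i].runtime <= rq[i].time)
			continue;

		diff = (rq[i].runtime - rq[i].time) / (uint64_t)ncpus;
		if (rq[self].runtime + diff > period)
			diff = period - rq[self].runtime;

		rq[i].runtime -= diff;
		rq[self].runtime += diff;

		/* Stop once the borrower holds a full period of runtime. */
		if (rq[self].runtime == period)
			break;
	}
}

/* Throttle only if consumption still exceeds runtime after borrowing. */
static int sim_should_throttle(const struct sim_cfs_rq *rq)
{
	return rq->time > rq->runtime;
}
```

With four cpus that each start at runtime 100, a period of 500, and only
cpu 0 having consumed anything (time 101), cpu 0 borrows 25 from each of the
other three cpus. It ends at runtime 175, the donors end at 75 each, and
cpu 0 is not throttled.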

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 828d7e7..fc09109 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -214,6 +214,63 @@ static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 }
 
 /*
+ * Ran out of runtime, check if we can borrow some from others
+ * instead of getting throttled right away.
+ */
+static void do_cfs_balance_runtime(struct cfs_rq *cfs_rq)
+{
+	struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+	const struct cpumask *span = sched_bw_period_mask();
+	int i, weight;
+	u64 cfs_period;
+	struct task_group *tg = container_of(cfs_b, struct task_group,
+				cfs_bandwidth);
+
+	weight = cpumask_weight(span);
+	spin_lock(&cfs_b->cfs_runtime_lock);
+	cfs_period = ktime_to_ns(cfs_b->cfs_period);
+
+	for_each_cpu(i, span) {
+		struct cfs_rq *borrow_cfs_rq = tg->cfs_rq[i];
+		s64 diff;
+
+		if (borrow_cfs_rq == cfs_rq)
+			continue;
+
+		cfs_rq_runtime_lock(borrow_cfs_rq);
+		if (borrow_cfs_rq->cfs_runtime == RUNTIME_INF) {
+			cfs_rq_runtime_unlock(borrow_cfs_rq);
+			continue;
+		}
+
+		diff = borrow_cfs_rq->cfs_runtime - borrow_cfs_rq->cfs_time;
+		if (diff > 0) {
+			diff = div_u64((u64)diff, weight);
+			if (cfs_rq->cfs_runtime + diff > cfs_period)
+				diff = cfs_period - cfs_rq->cfs_runtime;
+			borrow_cfs_rq->cfs_runtime -= diff;
+			cfs_rq->cfs_runtime += diff;
+			if (cfs_rq->cfs_runtime == cfs_period) {
+				cfs_rq_runtime_unlock(borrow_cfs_rq);
+				break;
+			}
+		}
+		cfs_rq_runtime_unlock(borrow_cfs_rq);
+	}
+	spin_unlock(&cfs_b->cfs_runtime_lock);
+}
+
+/*
+ * Called with rq->runtime_lock held.
+ */
+static void cfs_balance_runtime(struct cfs_rq *cfs_rq)
+{
+	cfs_rq_runtime_unlock(cfs_rq);
+	do_cfs_balance_runtime(cfs_rq);
+	cfs_rq_runtime_lock(cfs_rq);
+}
+
+/*
  * Check if group entity exceeded its runtime. If so, mark the cfs_rq as
  * throttled mark the current task for reschedling.
  */
@@ -232,6 +289,9 @@ static void sched_cfs_runtime_exceeded(struct sched_entity *se,
 	if (cfs_rq_throttled(cfs_rq))
 		return;
 
+	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime)
+		cfs_balance_runtime(cfs_rq);
+
 	if (cfs_rq->cfs_time > cfs_rq->cfs_runtime) {
 		cfs_rq->cfs_throttled = 1;
 		update_stats_throttle_start(cfs_rq, se);

Thread overview: 8+ messages
2009-11-09  9:08 [RFC v3 PATCH 0/7] CFS Hard limits - v3 Bharata B Rao
2009-11-09  9:09 ` [RFC v3 PATCH 1/7] sched: Rename sched_rt_period_mask() and use it in CFS also Bharata B Rao
2009-11-09  9:10 ` [RFC v3 PATCH 2/7] sched: Bandwidth initialization for fair task groups Bharata B Rao
2009-11-09  9:10 ` [RFC v3 PATCH 3/7] sched: Enforce hard limits by throttling Bharata B Rao
2009-11-09  9:11 ` [RFC v3 PATCH 4/7] sched: Unthrottle the throttled tasks Bharata B Rao
2009-11-09  9:11 ` [RFC v3 PATCH 5/7] sched: Add throttle time statistics to /proc/sched_debug Bharata B Rao
2009-11-09  9:12 ` Bharata B Rao [this message]
2009-11-09  9:13 ` [RFC v3 PATCH 7/7] sched: Hard limits documentation Bharata B Rao
