public inbox for linux-kernel@vger.kernel.org
From: Dave Chiluk <chiluk+linux@indeed.com>
To: Ben Segall <bsegall@google.com>, Phil Auld <pauld@redhat.com>,
	Peter Oskolkov <posk@posk.io>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Brendan Gregg <bgregg@netflix.com>, Kyle Anderson <kwa@yelp.com>,
	Gabriel Munos <gmunoz@netflix.com>,
	John Hammond <jhammond@indeed.com>,
	Cong Wang <xiyou.wangcong@gmail.com>
Subject: [PATCH v4 1/1] sched/fair: Return all runtime when cfs_b has very little remaining.
Date: Mon, 24 Jun 2019 10:50:04 -0500
Message-ID: <1561391404-14450-2-git-send-email-chiluk+linux@indeed.com>
In-Reply-To: <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>

It has been observed that highly threaded, user-interactive
applications running under cpu.cfs_quota_us constraints can be
throttled in a high percentage of periods while not consuming their
allocated quota. This affects user-interactive, non-CPU-bound
applications, such as those running in Kubernetes or Mesos, when they
run on multiple cores.

The root cause is that threads are allocated per-CPU bandwidth
slices and then do not fully use those slices within the period. As
a result, min_cfs_rq_runtime is left on each per-CPU cfs_rq. At the
end of the period this leftover quota goes unused and expires. This
expiration of unused time on per-CPU runqueues causes applications
to under-utilize their quota while still being throttled.

The solution is to return all spare cfs_rq->runtime_remaining to
cfs_b once cfs_b->runtime drops below three times
sched_cfs_bandwidth_slice(); above that threshold, min_cfs_rq_runtime
is still kept on the cfs_rq. This balances the desire to prevent a
cfs_rq from constantly pulling quota against the desire to let
applications fully use their quota.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com>
---
 kernel/sched/fair.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f..4894eda 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4695,7 +4695,9 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun, u
 	return 1;
 }
 
-/* a cfs_rq won't donate quota below this amount */
+/* a cfs_rq won't donate quota below this amount unless cfs_b has very little
+ * remaining runtime.
+ */
 static const u64 min_cfs_rq_runtime = 1 * NSEC_PER_MSEC;
 /* minimum remaining period time to redistribute slack quota */
 static const u64 min_bandwidth_expiration = 2 * NSEC_PER_MSEC;
@@ -4743,16 +4745,27 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
 static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	s64 slack_runtime = cfs_rq->runtime_remaining - min_cfs_rq_runtime;
+	s64 slack_runtime = cfs_rq->runtime_remaining;
 
+	/* There is no runtime to return. */
 	if (slack_runtime <= 0)
 		return;
 
 	raw_spin_lock(&cfs_b->lock);
 	if (cfs_b->quota != RUNTIME_INF &&
 	    cfs_rq->runtime_expires == cfs_b->runtime_expires) {
-		cfs_b->runtime += slack_runtime;
+		/* As we near 0 quota remaining on cfs_b, start returning all
+		 * remaining runtime. This avoids stranding and then expiring
+		 * runtime on per-cpu cfs_rq.
+		 *
+		 * If cfs_b has plenty of runtime, leave min_cfs_rq_runtime of
+		 * runtime on this cfs_rq.
+		 */
+		if (cfs_b->runtime >= sched_cfs_bandwidth_slice() * 3 &&
+		    slack_runtime > min_cfs_rq_runtime)
+			slack_runtime -= min_cfs_rq_runtime;
 
+		cfs_b->runtime += slack_runtime;
 		/* we are under rq->lock, defer unthrottling using a timer */
 		if (cfs_b->runtime > sched_cfs_bandwidth_slice() &&
 		    !list_empty(&cfs_b->throttled_cfs_rq))
-- 
1.8.3.1



Thread overview: 40+ messages
2019-05-17 19:30 [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu slices Dave Chiluk
2019-05-23 18:44 ` [PATCH v2 0/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-05-23 18:44   ` [PATCH v2 1/1] " Dave Chiluk
2019-05-23 21:01     ` Peter Oskolkov
2019-05-24 14:32       ` Phil Auld
2019-05-24 15:14         ` Dave Chiluk
2019-05-24 15:59           ` Phil Auld
2019-05-24 16:28           ` Peter Oskolkov
2019-05-24 21:35             ` Dave Chiluk
2019-05-24 22:07               ` Peter Oskolkov
2019-05-28 22:25                 ` Dave Chiluk
2019-05-24  8:55     ` Peter Zijlstra
2019-05-29 19:08 ` [PATCH v3 0/1] " Dave Chiluk
2019-05-29 19:08   ` [PATCH v3 1/1] " Dave Chiluk
2019-05-29 19:28     ` Phil Auld
2019-05-29 19:50     ` bsegall
2019-05-29 21:05     ` bsegall
2019-05-30 17:53       ` Dave Chiluk
2019-05-30 20:44         ` bsegall
     [not found] ` <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>
2019-06-24 15:50   ` Dave Chiluk [this message]
2019-06-24 17:33     ` [PATCH v4 1/1] sched/fair: Return all runtime when cfs_b has very little remaining bsegall
2019-06-26 22:10       ` Dave Chiluk
2019-06-27 20:18         ` bsegall
2019-06-27 19:09 ` [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-06-27 19:49 ` [PATCH v5 0/1] " Dave Chiluk
2019-06-27 19:49   ` [PATCH v5 1/1] " Dave Chiluk
2019-07-01 20:15     ` bsegall
2019-07-11  9:51       ` Peter Zijlstra
2019-07-11 17:46         ` bsegall
     [not found]           ` <CAC=E7cV4sO50NpYOZ06n_BkZTcBqf1KQp83prc+oave3ircBrw@mail.gmail.com>
2019-07-12 18:01             ` bsegall
2019-07-12 22:09             ` bsegall
2019-07-15 15:44               ` Dave Chiluk
2019-07-16 19:58     ` bsegall
2019-07-23 16:44 ` [PATCH v6 0/1] " Dave Chiluk
2019-07-23 16:44   ` [PATCH v6 1/1] " Dave Chiluk
2019-07-23 17:13     ` Phil Auld
2019-07-23 22:12       ` Dave Chiluk
2019-07-23 23:26         ` Phil Auld
2019-07-26 18:14       ` Peter Zijlstra
2019-08-08 10:53     ` [tip:sched/core] " tip-bot for Dave Chiluk
