Re: [PATCH] sched: tg_set_cfs_bandwidth() causes rq->lock deadlock

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: bsegall@google.com
Cc: Roman Gushchin <klamm@yandex-team.ru>,
	linux-kernel@vger.kernel.org, pjt@google.com,
	chris.j.arges@canonical.com, gregkh@linuxfoundation.org,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] sched: tg_set_cfs_bandwidth() causes rq->lock deadlock
Date: Wed, 21 May 2014 09:58:44 +0200	[thread overview]
Message-ID: <20140521075844.GE30445@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <xm2638g42rpp.fsf@sword-of-the-dawn.mtv.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 2665 bytes --]

On Tue, May 20, 2014 at 12:08:34PM -0700, bsegall@google.com wrote:
> Yeah, I believe the whole timer_active/cancel-before-start was to not
> lose the timer while still avoiding the HRTIMER_RESTART/hrtimer_start
> race. I'm not sure what, if anything, all the other ~35 users of
> HRTIMER_RESTART do about this. At least some do hrtimer_cancel();
> hrtimer_start(); though.

Yeah, I've not yet had the courage to go look through the tree :-/

That said, I've got a growing suspicion that I'm missing something
obvious and simple here.

> So, this and the other changes will ensure that a period timer is always
> queued. However, this also means that if we race so that we take
> cfs_b->lock before the period timer, but late enough that the interrupt
> has fired and we see !queued, we will hrtimer_forward the entire period
> here.

Right you are, missed that.

> What assign_cfs_rq_runtime does is insufficient - it doesn't
> unthrottle any other cfs_rqs, and since start_bandwidth_timer
> hrtimer_forwarded the period away, neither will
> (do_)sched_cfs_period_timer. (In addition throttle doesn't even do that,
> but that's probably easy enough to fix)

So I had a closer look at the bandwidth bits because of this, and it
strikes me as odd that __refill_cfs_bandwidth_runtime() does an
unconditional fill of cfs_b->runtime.

What I would have expected to see is something like:

	cfs_b->runtime += overruns * cfs_b->quota;
	if (cfs_b->runtime > cfs_b->quota)
		cfs_b->runtime = cfs_b->quota;

Which refills the runtime based on the overruns that happened and
respects the negative runtime accrued from previous inaccuracies.

If you unconditionally top-up like currently happens its possible to
consistently use more time than specified; due to the 1 jiffy throttle
granularity.

> I suppose we could check hrtimer_active and if so record any periods
> start_bandwidth_timer used for sched_cfs_period_timer, which could then
> force a do_sched_cfs_period_timer call. Perhaps something like this
> (equally untested) patch on top of your patch.

You left the new ->stored_periods unused but the idea was clear enough.

Anyway, given that you don't actually use overruns for anything except
pointless stats we could cheat and assume any trigger of the handler is
due to at least 1 overrun, that should trigger the unthrottle code
methinks.

In any case, I think your tracking overruns like that does make sense,
and its something the rt.c code needs too, as it suffers from the same
problem, so maybe we can roll that into the generic
start_bandwidth_timer() thing.

/me goes see if he can track down his suspicion and find shiny simple
truth at the end of it.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

next prev parent reply	other threads:[~2014-05-21  7:58 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-15 16:58 [PATCH] sched: tg_set_cfs_bandwidth() causes rq->lock deadlock Roman Gushchin
2014-05-15 17:43 ` bsegall
2014-05-16  8:38   ` Roman Gushchin
2014-05-16 17:57     ` bsegall
2014-05-19 11:10       ` Roman Gushchin
2014-05-19 17:40         ` bsegall
2014-05-19 10:32     ` Peter Zijlstra
2014-05-19 11:37       ` Roman Gushchin
2014-05-19 17:35         ` bsegall
2014-05-19 17:30       ` bsegall
2014-05-20 13:15         ` Peter Zijlstra
2014-05-20 14:01           ` Peter Zijlstra
2014-05-20 14:55             ` Peter Zijlstra
2014-05-20 14:21           ` Roman Gushchin
2014-05-20 14:28             ` Peter Zijlstra
2014-05-20 14:34               ` Roman Gushchin
2014-05-20 19:08           ` bsegall
2014-05-21  7:58             ` Peter Zijlstra [this message]
2014-05-19 10:40   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140521075844.GE30445@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bsegall@google.com \
    --cc=chris.j.arges@canonical.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=klamm@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox