All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: bsegall@google.com
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	Roman Gushchin <klamm@yandex-team.ru>,
	pjt@google.com
Subject: Re: [PATCH RFC] sched/fair: fix sudden expiration of cfq quota in put_prev_task()
Date: Tue, 07 Apr 2015 18:53:38 +0300	[thread overview]
Message-ID: <5523FD82.30006@yandex-team.ru> (raw)
In-Reply-To: <xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com>

On 07.04.2015 01:45, bsegall@google.com wrote:
> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>
>> Pick_next_task_fair() must be sure that here is at least one runnable
>> task before calling put_prev_task(), but put_prev_task() can expire
>> last remains of cfs quota and throttle all currently runnable tasks.
>> As a result pick_next_task_fair() cannot find next task and crashes.
>>
>> This patch leaves 1 in ->runtime_remaining when current assignation
>> expires and tries to refill it right after that. In the worst case
>> task will be scheduled once and throttled at the end of slice.
>>
>
> I don't think expire_cfs_rq_runtime is the problem. What I believe
> happens is this:
>
> /prev/some_task is running, calls schedule() with nr_running == 2.
> pick_next's first do/while loop does update_curr(/) and picks /next, and
> the next iteration just sees check_cfs_rq_runtime(/next), and thus does
> goto simple. However, there is now only /prev/some_task runnable, and it
> hasn't checked the entire prev hierarchy for throttling, thus leading to
> the crash.
>
> This would require that check_cfs_rq_runtime(/next) return true despite
> being on_rq though, which iirc is not supposed to happen (note that we
> do not call update_curr(/next), and it would do nothing if we did,
> because /next isn't part of the current thread's hierarchy). However,
> this /can/ happen if runtime has just been (re)enabled on /next, because
> tg_set_cfs_bandwidth sets runtime_remaining to 0, not 1.

Yeah, this seems possible too.

> The idea was that each rq would grab runtime when they were scheduled
> (pick_next_task_fair didn't ever look at throttling info), so this was
> fine with the old code, but is a problem now. I think it would be
> sufficient to just initialize to 1 in tg_set_cfs_bandwidth. The arguably
> more precise option would be to only check_cfs_rq_runtime if
> cfs_rq->curr is set, but the code is slightly less pretty.

Probably this code should call something like
account_cfs_rq_runtime(cfs_rq, 0);

>
> Could you check this patch to see if it works (or the trivial
> tg_set_bandwidth runtime_remaining = 1 patch)?

I'm not sure that I'll see this bug again: we're replacing this setup 
with something different.

>
> ---8<----->8---
>
>  From f12fa8e981bf1d87cbbc30951bdf27e70c803e25 Mon Sep 17 00:00:00 2001
> From: Ben Segall <bsegall@google.com>
> Date: Mon, 6 Apr 2015 15:28:10 -0700
> Subject: [PATCH] sched: prevent throttle in early pick_next_task_fair
>
> The first call to check_cfs_rq_runtime in pick_next_task_fair is only supposed
> to trigger when cfs_rq is still an ancestor of prev. However, it was able to
> trigger on tgs that had just had bandwidth toggled, because tg_set_cfs_bandwidth
> set runtime_remaining to 0, and check_cfs_rq_runtime doesn't check the global
> pool.
>
> Fix this by only calling check_cfs_rq_runtime if we are still in prev's
> ancestry, as evidenced by cfs_rq->curr.
>
> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Signed-off-by: Ben Segall <bsegall@google.com>
> ---
>   kernel/sched/fair.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ee595ef..5cb52e9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5038,18 +5038,21 @@ again:
>   		 * entity, update_curr() will update its vruntime, otherwise
>   		 * forget we've ever seen it.
>   		 */
> -		if (curr && curr->on_rq)
> -			update_curr(cfs_rq);
> -		else
> -			curr = NULL;
> +		if (curr) {
> +			if (curr->on_rq)
> +				update_curr(cfs_rq);
> +			else
> +				curr = NULL;
>
> -		/*
> -		 * This call to check_cfs_rq_runtime() will do the throttle and
> -		 * dequeue its entity in the parent(s). Therefore the 'simple'
> -		 * nr_running test will indeed be correct.
> -		 */
> -		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
> -			goto simple;
> +			/*
> +			 * This call to check_cfs_rq_runtime() will do the
> +			 * throttle and dequeue its entity in the parent(s).
> +			 * Therefore the 'simple' nr_running test will indeed
> +			 * be correct.
> +			 */
> +			if (unlikely(check_cfs_rq_runtime(cfs_rq)))
> +				goto simple;
> +		}
>
>   		se = pick_next_entity(cfs_rq, curr);
>   		cfs_rq = group_cfs_rq(se);
>


-- 
Konstantin

  reply	other threads:[~2015-04-07 15:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-03 12:41 [PATCH RFC] sched/fair: fix sudden expiration of cfq quota in put_prev_task() Konstantin Khlebnikov
2015-04-03 12:51 ` Konstantin Khlebnikov
2015-04-07 12:52   ` Peter Zijlstra
2015-04-07 13:47     ` Peter Zijlstra
2015-04-07 15:12       ` Peter Zijlstra
2015-04-07 15:32         ` Konstantin Khlebnikov
2015-04-07 15:43           ` Peter Zijlstra
2015-04-06 22:45 ` bsegall
2015-04-07 15:53   ` Konstantin Khlebnikov [this message]
2015-06-07 17:47   ` [tip:sched/core] sched/fair: Prevent throttling in early pick_next_task_fair() tip-bot for Ben Segall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5523FD82.30006@yandex-team.ru \
    --to=khlebnikov@yandex-team.ru \
    --cc=bsegall@google.com \
    --cc=klamm@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.