Re: [PATCH 2/2] sched: update runqueue clock before migrations away

From: Peter Zijlstra <peterz@infradead.org>
To: Chris Redpath <chris.redpath@arm.com>
Cc: pjt@google.com, mingo@redhat.com, alex.shi@linaro.org,
	morten.rasmussen@arm.com, dietmar.eggemann@arm.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] sched: update runqueue clock before migrations away
Date: Tue, 10 Dec 2013 12:48:25 +0100
Message-ID: <20131210114825.GF12849@twins.programming.kicks-ass.net>
In-Reply-To: <1386593950-26475-3-git-send-email-chris.redpath@arm.com>

On Mon, Dec 09, 2013 at 12:59:10PM +0000, Chris Redpath wrote:
> If we migrate a sleeping task away from a CPU which has the
> tick stopped, then both the clock_task and decay_counter will
> be out of date for that CPU and we will not decay load correctly
> regardless of how often we update the blocked load.
> 
> This is only an issue for tasks which are not on a runqueue
> (because otherwise that CPU would be awake) and simultaneously
> the CPU the task previously ran on has had the tick stopped.

OK, so the idiot in a hurry (me) isn't quite getting the issue.

Normally we update the blocked averages from the tick; clearly when no
tick, no update. So far so good.
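
As a rough illustration of why a stopped clock matters here, the following is a toy userspace model, not kernel code. The names `toy_rq`, `toy_update_clock()` and `toy_decay_blocked()` are hypothetical, time is counted in whole decay periods, and the only assumption borrowed from per-entity load tracking is the decay factor y with y^32 = 0.5. It shows that decaying against a stale clock is a no-op no matter how often it is called:

```c
#include <stdint.h>

/* Toy model only: y is chosen so that y^32 is roughly 0.5. */
#define TOY_DECAY_Y 0.97857206

struct toy_rq {
	uint64_t clock;        /* last clock value this CPU observed (periods) */
	uint64_t last_decay;   /* clock value at the last blocked-load decay */
	double blocked_load;   /* load contributed by sleeping tasks */
};

/* Stand-in for the clock update; with the tick stopped nobody calls this. */
static void toy_update_clock(struct toy_rq *rq, uint64_t now)
{
	rq->clock = now;
}

/* Decay blocked load by however many periods the rq clock says have passed. */
static void toy_decay_blocked(struct toy_rq *rq)
{
	uint64_t periods = rq->clock - rq->last_decay;

	while (periods--)
		rq->blocked_load *= TOY_DECAY_Y;
	rq->last_decay = rq->clock;
}
```

In this model, calling `toy_decay_blocked()` repeatedly without a preceding `toy_update_clock()` leaves `blocked_load` untouched; only after the clock is brought forward does the load decay.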

Now, we also update blocked load from idle balance -- which would
include the CPUs without tick through nohz_idle_balance() -- however
this only appears to be done for CONFIG_FAIR_GROUP_SCHED.

Are you running without cgroup muck? If so should we make this
unconditional?

If you do have cgroup muck enabled, what's the problem? Don't we run
nohz_idle_balance() frequently enough to be effective for updating the
blocked load?

You also seem to have overlooked NO_HZ_FULL, which can stop the tick even
when there's a running task and makes the situation even more fun.

> @@ -4343,6 +4344,25 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  	 * be negative here since on-rq tasks have decay-count == 0.
>  	 */
>  	if (se->avg.decay_count) {
> +		/*
> +		 * If we migrate a sleeping task away from a CPU
> +		 * which has the tick stopped, then both the clock_task
> +		 * and decay_counter will be out of date for that CPU
> +		 * and we will not decay load correctly.
> +		 */
> +		if (!se->on_rq && nohz_test_cpu(task_cpu(p))) {
> +			struct rq *rq = cpu_rq(task_cpu(p));
> +			unsigned long flags;
> +			/*
> +			 * Current CPU cannot be holding rq->lock in this
> +			 * circumstance, but another might be. We must hold
> +			 * rq->lock before we go poking around in its clocks
> +			 */
> +			raw_spin_lock_irqsave(&rq->lock, flags);
> +			update_rq_clock(rq);
> +			update_cfs_rq_blocked_load(cfs_rq, 0);
> +			raw_spin_unlock_irqrestore(&rq->lock, flags);
> +		}
>  		se->avg.decay_count = -__synchronize_entity_decay(se);
>  		atomic_long_add(se->avg.load_avg_contrib,
>  						&cfs_rq->removed_load);

Right, as Ben already said, taking an rq->lock there is unfortunate at
best.

So normally we 'throttle' the expense of decaying the blocked load to
ticks. But the above does it on every (suitable) task migration, which
might happen far more often.

So ideally we'd get it all sorted through the nohz_idle_balance() path;
what exactly are the problems with that?

Thread overview: 15+ messages
2013-12-09 12:59 [PATCH 0/2] Per-task load tracking errors Chris Redpath
2013-12-09 12:59 ` [PATCH 1/2] sched: reset blocked load decay_count during synchronization Chris Redpath
2013-12-09 17:59   ` bsegall
2013-12-09 12:59 ` [PATCH 2/2] sched: update runqueue clock before migrations away Chris Redpath
2013-12-09 18:13   ` bsegall
2013-12-10 11:48   ` Peter Zijlstra [this message]
2013-12-10 13:24     ` Chris Redpath
2013-12-10 15:14       ` Peter Zijlstra
2013-12-10 15:55         ` Chris Redpath
2013-12-12 18:24           ` Peter Zijlstra
2013-12-13  8:48             ` Vincent Guittot
2013-12-17 14:09             ` Chris Redpath
2013-12-17 15:51               ` Peter Zijlstra
2013-12-17 18:03               ` bsegall
2013-12-18 10:13                 ` Chris Redpath
