sched: incorrect argument used in task_hot()

From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
To: <nickpiggin@yahoo.com.au>, "Ingo Molnar" <mingo@elte.hu>,
	"Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: <linux-kernel@vger.kernel.org>
Subject: sched: incorrect argument used in task_hot()
Date: Tue, 14 Nov 2006 15:00:13 -0800
Message-ID: <000201c70840$a4902df0$d834030a@amr.corp.intel.com>

The argument used for task_hot in can_migrate_task() looks wrong:

int can_migrate_task()
{
	...
	if (task_hot(p, rq->timestamp_last_tick, sd))
		return 0;
	return 1;
}

It does not use the current time to estimate whether a task is
cache-hot on a remote CPU; instead it uses the timestamp of the remote
CPU's last timer interrupt.  With timer interrupts staggered, this
underestimates how long a task has been off the CPU by a wide margin
compared to the actual value.  The ramification is that tasks that
should be considered cache cold are now being evaluated as cache hot.
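
To make the underestimate concrete, here is a minimal user-space sketch,
not the kernel code.  It assumes the hotness check boils down to a signed
comparison of (now - last_run) against cache_hot_time in nanoseconds.  The
helper names are hypothetical, and so are any numbers fed to them.

```c
#include <stdbool.h>

/*
 * Hypothetical model: time (ns) a task appears to have been off the CPU,
 * given whatever timestamp the caller chooses to treat as "now".
 */
static long long model_off_cpu_ns(long long now, long long last_run)
{
	return now - last_run;
}

/*
 * Hypothetical model of the hotness check: the task counts as cache hot
 * when its apparent off-CPU time is below the threshold.
 */
static bool model_task_hot(long long now, long long last_run,
			   long long cache_hot_time)
{
	return model_off_cpu_ns(now, last_run) < cache_hot_time;
}
```

For example, with last_run = 9500000, a remote last tick at 10000000, a
real current time of 13900000 and a threshold of 2500000 (all made-up
values), passing the tick timestamp reports the task as hot while passing
the real time reports it as cold.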

We've seen that the effect is especially annoying at the HT sched
domain, where load balancing (l-b) is not able to freely migrate tasks
between sibling CPUs and leaves idle time on the system.

One way to work around this misbehavior is to override
sd->cache_hot_time at boot time.  Intuitively, a sysadmin would set
that parameter to zero.  But wait, that is not correct.  It should be
set to -1 jiffy, because (rq->timestamp_last_tick - p->last_run) can be
negative.  On top of that, one has to convert jiffies to ns when
setting the parameter.  All very unintuitive and undesirable.
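
The sign problem can be illustrated with a small user-space sketch.  It
assumes the check is a signed comparison of
(timestamp_last_tick - last_run) against the threshold, in nanoseconds,
and a tick rate of HZ=250.  That HZ value and all helper names are
assumptions made for illustration only.

```c
#include <stdbool.h>

#define MODEL_HZ		250LL	/* assumed tick rate */
#define MODEL_NS_PER_JIFFY	(1000000000LL / MODEL_HZ)

/* Convert a (possibly negative) jiffy count to nanoseconds. */
static long long model_jiffies_to_ns(long long jiffies)
{
	return jiffies * MODEL_NS_PER_JIFFY;
}

/*
 * Hypothetical hotness check using the remote CPU's last tick timestamp.
 * The difference is negative when the task last ran after that tick.
 */
static bool model_hot_at_tick(long long last_tick, long long last_run,
			      long long threshold_ns)
{
	return (last_tick - last_run) < threshold_ns;
}
```

If the task last ran 1500000 ns after the remote CPU's last tick, the
difference is -1500000, so a threshold of 0 still reports the task as hot.
Only a threshold of model_jiffies_to_ns(-1) makes the check fail for every
difference within one jiffy.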

I was very tempted to do:
     now - this_rq->timestamp_last_tick + rq->timestamp_last_tick;

But that is equally flawed, because timestamp_last_tick is not
synchronized between this_rq and target_rq.  The adjustment is simply
inaccurate and not suitable for load-balance decisions at the HT (or
even core) domain.
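
A sketch of why the adjustment fails, under a simplified model that is an
assumption of this example: each CPU's clock reads true time plus a fixed
per-CPU offset, and timestamp_last_tick is taken from that CPU's own
clock.  The struct and function names are hypothetical.

```c
/*
 * Hypothetical per-CPU state, in nanoseconds: the clock's offset from
 * true time, and the true time of this CPU's last timer tick.
 */
struct model_cpu {
	long long clock_offset;
	long long last_tick_true;
};

/* What a CPU's own clock reads at a given true time. */
static long long model_clock(const struct model_cpu *cpu, long long true_time)
{
	return true_time + cpu->clock_offset;
}

/* now - this_rq->timestamp_last_tick + rq->timestamp_last_tick */
static long long model_adjusted_now(const struct model_cpu *this_cpu,
				    const struct model_cpu *remote,
				    long long true_time)
{
	long long now = model_clock(this_cpu, true_time);
	long long this_tick = model_clock(this_cpu, this_cpu->last_tick_true);
	long long remote_tick = model_clock(remote, remote->last_tick_true);

	return now - this_tick + remote_tick;
}

/* Error versus what the remote CPU's clock really reads right now. */
static long long model_adjust_error(const struct model_cpu *this_cpu,
				    const struct model_cpu *remote,
				    long long true_time)
{
	return model_adjusted_now(this_cpu, remote, true_time) -
	       model_clock(remote, true_time);
}
```

In this model the clock offsets cancel, but the result is off by exactly
the gap between the two CPUs' tick times
(remote->last_tick_true - this_cpu->last_tick_true), which with staggered
ticks can be close to a full jiffy.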

There are a number of other uses of the above adjustment, and I think
they are all inaccurate.  Most of them, though, are for interactivity
and can withstand the inaccuracy because they make decisions on a much
larger time scale.

So, back to the first observation of insufficient l-b at the HT domain
because of inaccurate time calculation: what would be the best solution
to fix this?

- Ken
