public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Christoph Lameter <cl@linux.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Gilad Ben-Yossef <gilad@benyossef.com>, Tejun Heo <tj@kernel.org>,
	John Stultz <john.stultz@linaro.org>,
	Mike Frysinger <vapier@gentoo.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Hakan Akkan <hakanakkan@gmail.com>,
	Max Krasnyansky <maxk@qti.qualcomm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [NOHZ] Remove scheduler_tick_max_deferment
Date: Mon, 10 Nov 2014 23:43:10 +0100
Message-ID: <20141110224306.GC29741@lerouge>
In-Reply-To: <alpine.DEB.2.11.1411061057470.5591@gentwo.org>

On Thu, Nov 06, 2014 at 11:24:59AM -0600, Christoph Lameter wrote:
> I thought there is already logic in there to compensate for times when the
> tick is off.
> 
> tick_do_update_jiffies64 calculates the time differential and calculates
> the number of ticks from there calling do_timer() with the number of ticks
> that have passed since the last invocation. The global load calculation
> is then also made based on the number of ticks that have passed. So it
> compensates when reenabling. And the load during the dynticks busy period
> is known because one process is monopolizing the processor during that
> time.

Jiffies accounting is handled well everywhere. But the scheduler is a
different matter.
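To make the distinction concrete: the jiffies catch-up Christoph describes boils down to dividing the elapsed time by the tick period. The following is a simplified, hypothetical model of that arithmetic; the struct and function names are made up for illustration and this is not the code of tick_do_update_jiffies64(). It only covers a counter, which is exactly why it says nothing about the per-CPU scheduler state discussed in this reply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code) of catching up jiffies after the
 * tick was stopped: count the whole tick periods since the last update.
 */
struct jiffies_model {
	uint64_t jiffies;        /* tick counter */
	uint64_t last_update_ns; /* time of the last accounted tick */
	uint64_t tick_period_ns; /* e.g. 4000000 for HZ=250 */
};

/* Returns the number of ticks accounted (what would be fed to do_timer()). */
static uint64_t model_update_jiffies(struct jiffies_model *m, uint64_t now_ns)
{
	uint64_t delta, ticks;

	assert(m->tick_period_ns > 0);
	if (now_ns < m->last_update_ns + m->tick_period_ns)
		return 0; /* less than one full tick elapsed: nothing to do */

	delta = now_ns - m->last_update_ns;
	ticks = delta / m->tick_period_ns;
	/* Advance by whole periods only, keeping the remainder for next time. */
	m->last_update_ns += ticks * m->tick_period_ns;
	m->jiffies += ticks;
	return ticks;
}
```

With a 4 ms period, an update 10 ms after the last one accounts 2 ticks and leaves 2 ms pending for the next call.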

> > It won't happen if time_delta is KTIME_MAX and the following checks
> > leave no timer armed:
> >
> >                 if (unlikely(expires.tv64 == KTIME_MAX)) {
> >                         if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
> >                                 hrtimer_cancel(&ts->sched_timer);
> >                         goto out;
> >                 }
> >
> > Which either does not arm the clockevent device (non-highres) or
> > cancels ts->sched_timer (highres).
> >
> > So in that case your timer interrupt will stop completely and therefore
> > the scheduler updates on that CPU won't happen anymore.
> 
> Why is that bad? The load is constant and the timer interrupt can be
> reenabled by the dynticks logic when a system call occurs that requires OS
> services. I thought that was already done that way by Frederic?

Yes, it is. Perf events, RCU and POSIX CPU timers are examples of things
that are well handled by this tick-on-demand system. But they are all
separate from the scheduler.
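A hypothetical model of the tick-on-demand idea: the tick is kept only while some subsystem says it still needs it, and when nothing is pending the next expiry is KTIME_MAX, so no timer gets armed. The per-subsystem flags, MODEL_KTIME_MAX and the function names are assumptions for illustration, not the kernel's interface. Note that the scheduler only contributes a runnable-task count here; nothing keeps its periodic work alive once the tick stops, which is the problem under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KTIME_MAX INT64_MAX /* stand-in for KTIME_MAX */

/* Hypothetical per-CPU state: which subsystems currently need the tick. */
struct cpu_model {
	bool perf_needs_tick;
	bool rcu_needs_tick;
	bool posix_timers_need_tick;
	int nr_running; /* runnable tasks on this CPU */
};

/* The tick may stop only if no subsystem depends on it and at most one task runs. */
static bool model_can_stop_tick(const struct cpu_model *c)
{
	return !c->perf_needs_tick && !c->rcu_needs_tick &&
	       !c->posix_timers_need_tick && c->nr_running <= 1;
}

/*
 * Expiry to program: the next periodic tick if the tick must stay,
 * otherwise the next real event, which may be MODEL_KTIME_MAX
 * (nothing armed at all, so tick-driven work stops entirely).
 */
static int64_t model_next_expiry(const struct cpu_model *c, int64_t now,
				 int64_t tick_period, int64_t next_event)
{
	assert(tick_period > 0);
	if (!model_can_stop_tick(c))
		return now + tick_period;
	return next_event;
}
```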

> 
> > > Why does the scheduler require that tick? It seems that the processor is
> > > always busy running exactly 1 process when the tick is not
> > > occurring. Anything else will switch on the tick again. So the information
> > > that the scheduler has never becomes outdated.
> >
> > Surely vruntime, load balancing data, load accounting and all the
> > other stuff which contributes to global and local state updates itself
> > magically.
> 
> There is logic in there that compensates when the tick is finally
> reenabled. Load balancing data is already not updated when the tick is
> disabled when the processor is idle, right? What is so different here?

That's completely different, because idle and busy CPUs may play different
roles in load balancing. For example, load balancing can be delegated to idle
CPUs. But the scheduler still assumes that dynticks CPUs are always idle, and
we certainly don't want to assign load balancing duty to nohz full CPUs.

That too needs some work to be fixed properly.
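A hypothetical sketch of how the idle/dynticks confusion bites when choosing a CPU to balance on behalf of others: if "tick stopped" is used as a proxy for "idle", a busy nohz full CPU can be picked. The CPU masks (limited to 64 CPUs) and function names are illustrative assumptions, not scheduler code.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest CPU number set in mask, or -1 if none. */
static int model_first_cpu(uint64_t mask, int nr_cpus)
{
	assert(nr_cpus > 0 && nr_cpus <= 64);
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Naive: treats every CPU whose tick is stopped as idle. */
static int model_pick_balancer_naive(uint64_t tick_stopped_mask, int nr_cpus)
{
	return model_first_cpu(tick_stopped_mask, nr_cpus);
}

/* Filtered: only truly idle CPUs that are not nohz full CPUs qualify. */
static int model_pick_balancer(uint64_t idle_mask, uint64_t nohz_full_mask,
			       int nr_cpus)
{
	return model_first_cpu(idle_mask & ~nohz_full_mask, nr_cpus);
}
```

With CPU 1 busy in nohz full mode (tick stopped) and CPU 2 idle (tick stopped), the naive picker returns 1 while the filtered one returns 2.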

> 
> > As I said before: It can be delegated to a housekeeper, but this needs
> > to be implemented first before we can remove that function.
> 
> We did not need a housekeeper in the dynticks idle case. What is so
> different about dynticks busy?

Because when a task runs, some things still need to move forward: timekeeping,
for example. We don't want to update jiffies and gettimeofday time from every
kernel entry (syscall) on a full nohz CPU. So another CPU has to maintain that.

The interplay between timekeeping and the vDSO probably complicates the
situation even further.
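A hypothetical policy sketch of "another CPU has to maintain that": in plain dynticks idle mode the timekeeping CPU may stop its tick and hand the duty to whichever CPU ticks next, while with full nohz the duty stays on a housekeeping CPU that keeps ticking. The struct, MODEL_NO_TIMEKEEPER and both functions are made-up names for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_TIMEKEEPER (-1)

struct tk_model {
	int timekeeper;         /* CPU that advances jiffies, or MODEL_NO_TIMEKEEPER */
	bool nohz_full_enabled; /* full nohz (busy tickless CPUs) in use */
};

/* Called when 'cpu' goes idle; returns true if its tick may be stopped. */
static bool model_idle_can_stop_tick(struct tk_model *tk, int cpu)
{
	assert(cpu >= 0);
	if (cpu == tk->timekeeper) {
		if (tk->nohz_full_enabled)
			return false; /* the housekeeper keeps ticking */
		tk->timekeeper = MODEL_NO_TIMEKEEPER; /* hand the duty off */
	}
	return true;
}

/* Called from a CPU's tick: claim the duty if nobody holds it. */
static bool model_tick(struct tk_model *tk, int cpu)
{
	assert(cpu >= 0);
	if (tk->timekeeper == MODEL_NO_TIMEKEEPER)
		tk->timekeeper = cpu;
	return tk->timekeeper == cpu; /* true: this tick advances jiffies */
}
```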

> 
> > There is a world outside of vmstat kworker, really.
> 
> Absolutely but I thought the logic is already there to compensate for
> issues like the timer interrupt not occurring.
> 
> I may not have the complete picture of the timer tick processing in my
> mind these days (it has been a lot of years since I did any work there
> after all) but as far as my arguably simplistic reading of the code goes I
> do not see why a housekeeper would be needed there. The load is constant
> and known in the dynticks busy case as it is in the dynticks idle case.

This comes from the common confusion between idle and dynticks.
There is no need for housekeeping when there is no activity at all on
a CPU (idle), so the mind takes a shortcut and assumes that dynticks
doesn't need housekeeping either.

But housekeeping is needed as long as there is activity and kernel
service, and that's the case whether the tick runs (hz) or not (nohz).

OK, I confess we moved part of that housekeeping to the syscall/exception/interrupt
entry path. We did that for cputime accounting and RCU, and it would even be
possible to do the same for timekeeping. But then kernel entry would become
extremely costly. It's worth keeping CPU 0 as a sacrificial lamb instead.
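A hypothetical model of what moving cputime accounting to the entry path means: every user/kernel transition reads the clock and charges the elapsed time to the context being left. The names are illustrative, not kernel APIs; the transition counter just makes visible that this bookkeeping is paid on every entry and exit, which is the cost the message weighs against dedicating a housekeeping CPU.

```c
#include <assert.h>
#include <stdint.h>

enum model_ctx { MODEL_USER, MODEL_KERNEL };

struct acct_model {
	enum model_ctx ctx;   /* context currently running */
	uint64_t last_ns;     /* time of the last transition */
	uint64_t user_ns;     /* accumulated user time */
	uint64_t system_ns;   /* accumulated kernel time */
	uint64_t transitions; /* each costs a clock read plus bookkeeping */
};

/* Account the time spent in the current context, then switch to 'next'. */
static void model_switch(struct acct_model *a, enum model_ctx next,
			 uint64_t now_ns)
{
	uint64_t delta;

	assert(now_ns >= a->last_ns);
	assert(next != a->ctx);
	delta = now_ns - a->last_ns;
	if (a->ctx == MODEL_USER)
		a->user_ns += delta;
	else
		a->system_ns += delta;
	a->ctx = next;
	a->last_ns = now_ns;
	a->transitions++;
}
```

For example, a task that runs in userspace until 1000 ns, makes a 300 ns syscall and returns to userspace until 5300 ns ends up with 5000 ns of user time and 300 ns of system time after three transitions.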
