public inbox for linux-kernel@vger.kernel.org
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Jan Glauber <jang@linux.vnet.ibm.com>,
	heiko.carstens@de.ibm.com, Paul Mackerras <paulus@samba.org>
Subject: Re: [accounting regression since rc1]  scheduler updates
Date: Mon, 20 Aug 2007 19:03:58 +0200
Message-ID: <1187629438.8541.40.camel@localhost>
In-Reply-To: <20070820154529.GA300@elte.hu>

On Mon, 2007-08-20 at 17:45 +0200, Ingo Molnar wrote: 
> * Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > 1. Jan could finish his sched_clock implementation for s390 and we 
> > would get close to the precise numbers. This would also let CFS make 
> > better decisions. [...]
> 
> i think this is the best option and it should give us the same /proc 
> accuracy on s390 as before, plus improved scheduler precision. (and 
> improved tracing accuracy, etc. etc.) Note that for architectures that 
> already have sched_clock() at least as precise as the stime/utime stats 
> there's no problem - and that seems to include all architectures except 
> s390.

So far we have used the TOD clock for sched_clock. This clock measures
real time with an accuracy of 1usec or better. The [us]time accounting
with CONFIG_VIRT_CPU_ACCOUNTING=y is done using the CPU timer. This
timer measures virtual time with an accuracy of 1usec or better. Without
CONFIG_VIRT_CPU_ACCOUNTING the [us]time accounting is done with HZ
ticks. This means that sched_clock() is at least as precise as [us]time
on s390 as well; the only difference is that we distinguish between real
time and virtual time if the improved accounting is used.

> could you send that precise sched_clock() patch? It should be an order 
> of magnitude simpler than the high-precision stime/utime tracking you 
> already do, and it's needed for quality scheduling anyway.

Sure, if you can explain what it should do. This is still unclear to me:
for a non-idle CPU the virtual cpu time should be used, but for an idle
CPU the real time should be used? That seems rather ill-defined to me.
On s390 we have three times to consider: real time, virtual cpu time and
steal time. For a given period we have real = virtual + steal. And if a
cpu is idle we have real = steal, virtual = 0. My best interpretation of
what you want is that sched_clock should progress with virtual cpu time
if the current process is not idle and with the real time if it is. No?
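
To make that interpretation concrete, here is a minimal user-space sketch
of it. It is not s390 kernel code: struct cpu_times, struct sketch_clock,
steal_delta() and sketch_clock_update() are hypothetical names made up for
illustration, and the nanosecond samples are assumed to come from
something like the TOD clock (real) and the CPU timer (virtual).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample of the two measured times, in nanoseconds. */
struct cpu_times {
	uint64_t real;	/* real time, e.g. read from the TOD clock */
	uint64_t virt;	/* virtual cpu time, e.g. read from the CPU timer */
};

/* Hypothetical clock state: accumulated value plus the last sample seen. */
struct sketch_clock {
	uint64_t value;
	struct cpu_times last;
};

/* Steal time over an interval, derived from real = virtual + steal. */
static uint64_t steal_delta(const struct cpu_times *prev,
			    const struct cpu_times *now)
{
	uint64_t real = now->real - prev->real;
	uint64_t virt = now->virt - prev->virt;

	assert(virt <= real);	/* virtual time cannot outrun real time */
	return real - virt;
}

/*
 * Advance the clock by the virtual cpu time delta if the current process
 * is not idle, and by the real time delta if it is.
 */
static uint64_t sketch_clock_update(struct sketch_clock *c,
				    const struct cpu_times *now, int idle)
{
	uint64_t real = now->real - c->last.real;
	uint64_t virt = now->virt - c->last.virt;

	c->value += idle ? real : virt;
	c->last = *now;
	return c->value;
}
```

With this reading the clock value is a mix of two different time bases,
which is exactly the part that looks ill-defined to me.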

> > [...] Downside: its not as precise as before as we do some math on the 
> > numbers and it will burn cycles to compute numbers we already have 
> > (utime=sum*utime/stime).
> 
> i can see no real downside to it: if all of stime, utime and 
> sum_exec_clock are precise, then the numbers we present via /proc are 
> precise too:
> 
>    sum_exec * utime / stime;
> 
> there should be no loss of precision on s390 because the 
> multiplication/division rounding is not accumulating - we keep the 
> precise sum_exec, utime and stime values untouched.

But then sched_clock() has to return the virtual cpu time only,
otherwise it will be hard to make sum_exec exact, wouldn't it?
And why should we jump through all these hoops to come up with values
that are only as good as the values we already have?
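
For reference, a minimal user-space sketch of the scaling under
discussion. It rests on an assumption: the shorthand quoted above divides
by stime, while the sketch divides by utime + stime so that the two
results add up to sum_exec again. struct split and scale_times() are
hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: sum_exec split into user and system parts. */
struct split {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Split the precise total sum_exec in the ratio given by the sampled
 * utime and stime values. Assumption: the divisor is utime + stime.
 */
static struct split scale_times(uint64_t sum_exec, uint64_t utime,
				uint64_t stime)
{
	/* With no samples at all, charge everything to user time
	 * (an arbitrary choice for this sketch). */
	struct split out = { sum_exec, 0 };
	uint64_t total = utime + stime;

	if (total != 0) {
		/* The sketch does not handle 64-bit overflow; refuse it. */
		assert(utime == 0 || sum_exec <= UINT64_MAX / utime);
		out.utime = sum_exec * utime / total;
		/* Remainder goes to stime so the sum stays exact. */
		out.stime = sum_exec - out.utime;
	}
	return out;
}
```

The rounding happens once per call and does not accumulate, but the
result can only be as exact as sum_exec itself, which is my point.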

> on x86 we dont really want to slow down every irq and syscall event with 
> precise stime/utime stats for 'top' to display. On s390 the 
> multiplication and division is indeed superfluous but it keeps the code 
> generic for arches where utime/stime is less precise and irq-sampled - 
> while the sum is always precise. It also animates architectures that 
> have an imprecise sched_clock() implementation to improve its accuracy. 
> Accessing the /proc files alone is many orders of magnitude more 
> expensive than this simple multiplication and division.

Yes, I can understand why you don't want to have the exact cpu
accounting scheme on x86 since it will slow down every context switch
quite a bit (that includes user <-> kernel, softirq <-> hardirq <->
process context, ..). On s390 the cost is acceptable, for an empty
system call it is about 40 additional cycles for the precise accounting.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.


Thread overview: 40+ messages
2007-08-12 16:32 [git pull request] scheduler updates Ingo Molnar
2007-08-14  8:37 ` [accounting regression since rc1] " Christian Borntraeger
2007-08-16  8:17   ` [PATCH][RFC] Re: accounting regression since rc1 Christian Borntraeger
2007-08-20 15:45   ` [accounting regression since rc1] scheduler updates Ingo Molnar
2007-08-20 17:03     ` Martin Schwidefsky [this message]
2007-08-20 18:08       ` Ingo Molnar
2007-08-20 18:33         ` Martin Schwidefsky
2007-08-20 19:00           ` Balbir Singh
2007-08-20 19:05           ` Ingo Molnar
2007-08-21  7:20             ` Christian Borntraeger
2007-08-20 19:12           ` Ingo Molnar
2007-08-21  7:00           ` Christian Borntraeger
2007-08-21  9:18             ` Martin Schwidefsky
2007-08-20 23:07         ` Paul Mackerras
2007-08-21  2:18         ` Andi Kleen
2007-08-21  7:09           ` Ingo Molnar
2007-08-21 10:07             ` Andi Kleen
2007-08-21 10:20               ` Ingo Molnar
2007-08-21 11:15                 ` Andi Kleen
2007-08-21 11:20                   ` Ingo Molnar
2007-08-21  8:17     ` Christian Borntraeger
2007-08-21  8:42       ` Ingo Molnar
2007-08-21  9:11         ` Martin Schwidefsky
2007-08-21  9:34           ` Ingo Molnar
2007-08-21  9:48             ` Paul Mackerras
2007-08-21 10:38             ` Martin Schwidefsky
2007-08-21 11:36               ` Ingo Molnar
2007-08-21 11:58                 ` Martin Schwidefsky
2007-08-21 10:39             ` Christian Borntraeger
2007-08-21 10:43             ` Christian Borntraeger
2007-08-21 11:15               ` Ingo Molnar
2007-08-21 11:24                 ` Christian Borntraeger
2007-08-21 11:30                   ` Ingo Molnar
2007-08-21 11:58                     ` Christian Borntraeger
2007-08-21 12:21                       ` Ingo Molnar
2007-08-21 12:57                         ` Martin Schwidefsky
2007-08-21 11:25       ` Ingo Molnar
2007-08-22  7:50         ` Christian Borntraeger
2007-08-22  7:59           ` Ingo Molnar
     [not found] ` <200708141032.47235.borntraeger@de.ibm.com>
     [not found]   ` <alpine.LFD.0.999.0708140835240.30176@woody.linux-foundation.org>
2007-08-14 18:19     ` Christian Borntraeger
