From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Jan Glauber <jang@linux.vnet.ibm.com>,
heiko.carstens@de.ibm.com, Paul Mackerras <paulus@samba.org>
Subject: Re: [accounting regression since rc1] scheduler updates
Date: Mon, 20 Aug 2007 19:03:58 +0200
Message-ID: <1187629438.8541.40.camel@localhost>
In-Reply-To: <20070820154529.GA300@elte.hu>
On Mon, 2007-08-20 at 17:45 +0200, Ingo Molnar wrote:
> * Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>
> > 1. Jan could finish his sched_clock implementation for s390 and we
> > would get close to the precise numbers. This would also let CFS make
> > better decisions. [...]
>
> i think this is the best option and it should give us the same /proc
> accuracy on s390 as before, plus improved scheduler precision. (and
> improved tracing accuracy, etc. etc.) Note that for architectures that
> already have sched_clock() at least as precise as the stime/utime stats
> there's no problem - and that seems to include all architectures except
> s390.
So far we have used the TOD clock for sched_clock. This clock measures
real time with an accuracy of 1usec or better. The [us]time accounting
with CONFIG_VIRT_CPU_ACCOUNTING=y is done using the CPU timer. This
timer measures virtual time with an accuracy of 1usec or better. Without
CONFIG_VIRT_CPU_ACCOUNTING the [us]time accounting is done with HZ
ticks. This means that sched_clock() is at least as precise as [us]time
on s390 as well, except that we distinguish between real time and virtual
time if the improved accounting is used.
> could you send that precise sched_clock() patch? It should be an order
> of magnitude simpler than the high-precision stime/utime tracking you
> already do, and it's needed for quality scheduling anyway.
Sure, if you can explain what it should do. This is still unclear to me:
for a non-idle CPU the virtual cpu time should be used, but for an idle
CPU the real time should be used? That seems rather ill-defined to me.
On s390 we have three times to consider: real time, virtual cpu time and
steal time. For a given period we have real = virtual + steal. And if a
cpu is idle we have real = steal, virtual = 0. My best interpretation of
what you want is that sched_clock should progress with virtual cpu time
if the current process is not idle and with the real time if it is. No?
> > [...] Downside: its not as precise as before as we do some math on the
> > numbers and it will burn cycles to compute numbers we already have
> > (utime=sum*utime/stime).
>
> i can see no real downside to it: if all of stime, utime and
> sum_exec_clock are precise, then the numbers we present via /proc are
> precise too:
>
> sum_exec * utime / stime;
>
> there should be no loss of precision on s390 because the
> multiplication/division rounding is not accumulating - we keep the
> precise sum_exec, utime and stime values untouched.
But then sched_clock() has to return the virtual cpu time only,
otherwise it will be hard to make sum_exec exact, wouldn't it?
And why should we jump through all these hoops to come up with values
that are only as good as the values we already have?
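For reference, the scaling under discussion can be sketched as below. This is an illustration only: the function name is hypothetical, and it assumes the divisor is meant to be the total of the sampled user and system time (utime + stime), whereas the mails above abbreviate it as "stime". The rounding error does not accumulate because each result is recomputed from the untouched raw counters rather than summed from earlier rounded results.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: split a precise execution-time sum in proportion to
 * sampled utime/stime counters. Assumes the divisor is utime + stime.
 */
static uint64_t scaled_utime(uint64_t sum_exec, uint64_t utime, uint64_t stime)
{
    uint64_t total = utime + stime;

    assert(total >= utime); /* catch wraparound of utime + stime */
    if (total == 0)
        return 0;           /* nothing sampled yet, avoid dividing by zero */

    /* Note: sum_exec * utime can overflow 64 bits for very large inputs. */
    return sum_exec * utime / total;
}
```

For example, with sum_exec = 1000, utime = 1 and stime = 2 the result is 333. After the counters have tripled (3000, 3, 6) the result is exactly 1000, not 3 * 333 = 999, since the earlier truncation is not carried forward.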
> on x86 we dont really want to slow down every irq and syscall event with
> precise stime/utime stats for 'top' to display. On s390 the
> multiplication and division is indeed superfluous but it keeps the code
> generic for arches where utime/stime is less precise and irq-sampled -
> while the sum is always precise. It also animates architectures that
> have an imprecise sched_clock() implementation to improve its accuracy.
> Accessing the /proc files alone is many orders of magnitude more
> expensive than this simple multiplication and division.
Yes, I can understand why you don't want to have the exact cpu
accounting scheme on x86 since it will slow down every context switch
quite a bit (that includes user <-> kernel, softirq <-> hardirq <->
process context, ..). On s390 the cost is acceptable, for an empty
system call it is about 40 additional cycles for the precise accounting.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.