Re: [PATCH] virtual sched_clock() for s390

From: Ingo Molnar <mingo@elte.hu>
To: Paul Mackerras <paulus@samba.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	Jan Glauber <jang@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	vatsa@linux.vnet.ibm.com, mschwid2@linux.vnet.ibm.com,
	efault@gmx.de, dmitry.adamushko@gmail.com, anton@samba.org
Subject: Re: [PATCH] virtual sched_clock() for s390
Date: Fri, 20 Jul 2007 09:22:45 +0200
Message-ID: <20070720072245.GA4020@elte.hu>
In-Reply-To: <18080.2390.42425.852087@cargo.ozlabs.ibm.com>

* Paul Mackerras <paulus@samba.org> wrote:

> PowerPC's sched_clock() currently measures real time.  On POWER5 and 
> POWER6 machines we could change it to use a register called the "PURR" 
> (for Processor Utilization of Resources Register), which only measures 
> time spent while the partition is running.  But the PURR has another 
> function as well: it measures the distribution of dispatch cycles 
> between the two hardware threads on each core when running in SMT 
> mode.  That is, the cpu dispatches instructions from one thread or the 
> other (not both) on each CPU cycle, and each thread's PURR only gets 
> incremented on cycles where the cpu dispatches instructions for that 
> thread.  So the sum of the two threads' PURRs adds up to real time.
> 
> Do you think this makes the PURR more useful for CFS, or less?  To me 
> it looks like this would mean that CFS can make a more equitable 
> distribution of CPU time if, for example, you had 3 runnable tasks on 
> a 2-core x dual-threaded machine (4 virtual CPUs).

there's one complication: sched_clock() still needs to increase while 
the CPU (or thread) is idle, so that we can have a correct measurement 
of the CPU's utilization, for SMP load-balancing. CFS constructs another 
clock from sched_clock() [the rq->fair_clock] which does stop while the 
CPU is idle.

So perhaps a combination of the PURR and real-time might work as 
sched_clock(): when a hardware thread is in cpu_idle(), it should 
advance its sched clock at _half_ the rate of real-time [so that the 
advances of all threads of a core, if they are all idle, sum to real 
time], and use the PURR when it is not idle. This would still correctly 
maintain a meaningful load-average if the physical CPU is under-utilized.
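
The hybrid clock described above could be sketched roughly as follows.
This is a minimal userspace illustration, not kernel code: the struct,
the function name and the assumption that PURR readings have already
been converted to nanoseconds are all hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two hardware threads per core, as in the SMT case discussed above. */
#define THREADS_PER_CORE 2

/* Hypothetical per-hardware-thread state; names are illustrative only. */
struct hw_thread_clock {
	uint64_t sched_ns;   /* accumulated hybrid clock value */
	uint64_t last_real;  /* real-time reading at the last update (ns) */
	uint64_t last_purr;  /* PURR reading at the last update, assumed
	                      * to be pre-converted to ns */
};

/*
 * Advance the hybrid clock: while idle, follow real time at
 * 1/THREADS_PER_CORE of its rate; otherwise follow the PURR delta.
 */
static uint64_t hybrid_clock_update(struct hw_thread_clock *c,
				    uint64_t real_ns, uint64_t purr_ns,
				    bool idle)
{
	uint64_t real_delta = real_ns - c->last_real;
	uint64_t purr_delta = purr_ns - c->last_purr;

	if (idle)
		c->sched_ns += real_delta / THREADS_PER_CORE;
	else
		c->sched_ns += purr_delta;

	c->last_real = real_ns;
	c->last_purr = purr_ns;
	return c->sched_ns;
}
```

With both threads of a core idle for the same real-time interval, the
two clocks together advance by that interval, which is the property
the bracketed remark asks for.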

If you make such a change you'll immediately see whether the approach 
is right: monitor the cpu_load[] values in /proc/sched_debug (they 
should match the intuitive 'load average' of that CPU when divided by 
1024), and check whether 'top' still works fine.
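
As a small illustration of that check, the scaling could be written as
below. The helper name is hypothetical, and the 1024 factor is simply
the one quoted in this message.

```c
/* Scale factor quoted in the message above. */
#define LOAD_SCALE 1024UL

/*
 * Convert a raw cpu_load[] value, as read from /proc/sched_debug,
 * into the intuitive "number of runnable tasks" figure.
 */
static double cpu_load_to_avg(unsigned long cpu_load)
{
	return (double)cpu_load / (double)LOAD_SCALE;
}
```

A raw value of 2048 would then correspond to an intuitive load of 2.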

> BTW, what does "time spent running during sleep" mean?  Does it mean 
> "time that other tasks are running while this task is sleeping"?

yeah. It's "the amount of fair runtime I missed out on while others were 
running".

> > still, CFS needs time measurement across idle periods as well, for 
> > another purpose: to be able to do precise task statistics for /proc. 
> > (for top, ps, etc.) So it's still true that sched_clock() should 
> > include idle periods too.
> 
> As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. 
> That affects how tsk->utime and tsk->stime are accumulated (we call 
> account_user_time and account_system_time directly rather than calling 
> update_process_times) as well as the system hardirq/softirq time, idle 
> time, and stolen time.

tsk->utime and tsk->stime are only used for a single purpose: to 
determine the 'split' factor of how to split up the precise total time 
between user and system time.

> When you say "precise task statistics for /proc", where are they 
> accumulated?  I don't see any changes to the way that tsk->utime and 
> ctime are computed.

we now use p->se.sum_exec_runtime, which measures (in nanoseconds) the 
precise amount of time spent executing (sum of system and user time) - 
and ->stime and ->utime are used to determine the 'split'. [this allows 
us to gather ->stime and ->utime via low-resolution sampling, while 
keeping the 'total' precise. Accounting at every system entry point 
would be quite expensive on most platforms.]
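
One way to picture the 'split' is the sketch below. It is an
illustration under assumptions, not the kernel's actual code: the
struct, the function name and the handling of the no-samples case are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: precise total divided into user/system. */
struct time_split {
	uint64_t user_ns;
	uint64_t system_ns;
};

/*
 * Divide the precise total runtime in proportion to the sampled
 * utime/stime tick counts. The samples only set the ratio; the sum
 * of the two results always equals sum_exec_runtime_ns.
 */
static struct time_split split_runtime(uint64_t sum_exec_runtime_ns,
				       uint64_t utime_ticks,
				       uint64_t stime_ticks)
{
	struct time_split out = { 0, 0 };
	uint64_t total_ticks = utime_ticks + stime_ticks;

	if (total_ticks == 0) {
		/* Assumption: with no samples yet, count it all as user. */
		out.user_ns = sum_exec_runtime_ns;
		return out;
	}

	/* Note: this multiply can overflow for very large inputs; a
	 * real implementation would need a wider intermediate. */
	out.user_ns = sum_exec_runtime_ns * utime_ticks / total_ticks;
	out.system_ns = sum_exec_runtime_ns - out.user_ns;
	return out;
}
```

With 3 sampled user ticks and 1 system tick, a precise total of
1000000 ns would be reported as 750000 ns user and 250000 ns system.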

	Ingo

Thread overview: 12+ messages
2007-07-19 10:57 [PATCH] virtual sched_clock() for s390 Jan Glauber
2007-07-19 15:29 ` Jeremy Fitzhardinge
2007-07-19 15:48   ` Srivatsa Vaddagiri
2007-07-19 16:00   ` Ingo Molnar
2007-07-19 19:20     ` Jan Glauber
2007-07-19 19:38       ` Ingo Molnar
2007-07-19 21:07         ` Jan Glauber
2007-07-20  1:01     ` Paul Mackerras
2007-07-20  6:03       ` Jeremy Fitzhardinge
2007-07-20  7:22       ` Ingo Molnar [this message]
2007-07-23  9:15         ` Jan Glauber
2007-07-23 13:24           ` Martin Schwidefsky