[PATCH] virtual sched_clock() for s390

public inbox for linux-kernel@vger.kernel.org

* [PATCH] virtual sched_clock() for s390
@ 2007-07-19 10:57 Jan Glauber
  2007-07-19 15:29 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Glauber @ 2007-07-19 10:57 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, vatsa, mschwid2, efault, dmitry.adamushko, paulus,
	anton

This patch introduces a cpu time clock for s390 (only ticking
if the virtual cpu is running) and bases the s390 implementation
of sched_clock() on it.

The time slice length on a virtual cpu can be anything
between the calculated time slice and zero. In reality
this doesn't seem to be a problem, since the scheduler is fair
enough to not let a single process starve, but the current
implementation can lead to inefficient short time slices.

By providing a 'virtual' sched_clock() we guarantee that a
process can get its time slice regardless of scheduling
decisions from the hypervisor.

Patch applies to 2.6.22 git and works fine with CFS.

Jan

--
 arch/s390/kernel/time.c  |   18 ++++++++++++------
 arch/s390/kernel/vtime.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/asm-s390/timer.h |    2 ++
 3 files changed, 59 insertions(+), 6 deletions(-)

--- ./include/asm-s390/timer.h.cpu_clock	2007-07-18 13:43:53.000000000 +0200
+++ ./include/asm-s390/timer.h	2007-07-18 20:41:13.000000000 +0200
@@ -48,6 +48,8 @@
 extern void init_cpu_vtimer(void);
 extern void vtime_init(void);
 
+extern unsigned long long cpu_clock(void);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_S390_TIMER_H */
--- ./arch/s390/kernel/time.c.cpu_clock	2007-07-18 13:43:35.000000000 +0200
+++ ./arch/s390/kernel/time.c	2007-07-18 21:01:07.000000000 +0200
@@ -62,21 +62,27 @@
 static u64 xtime_cc;
 
 /*
- * Scheduler clock - returns current time in nanosec units.
+ * Monotonic_clock - returns # of nanoseconds passed since time_init()
  */
-unsigned long long sched_clock(void)
+unsigned long long monotonic_clock(void)
 {
 	return ((get_clock() - jiffies_timer_cc) * 125) >> 9;
 }
+EXPORT_SYMBOL(monotonic_clock);
 
 /*
- * Monotonic_clock - returns # of nanoseconds passed since time_init()
+ * Scheduler clock - returns current time in nanosec units.
+ * Now based on virtual cpu time to only account time the guest
+ * was actually running.
  */
-unsigned long long monotonic_clock(void)
+unsigned long long sched_clock(void)
 {
-	return sched_clock();
+#ifdef CONFIG_VIRT_TIMER
+	return cpu_clock();
+#else
+	return monotonic_clock();
+#endif
 }
-EXPORT_SYMBOL(monotonic_clock);
 
 void tod_to_timeval(__u64 todval, struct timespec *xtime)
 {
--- ./arch/s390/kernel/vtime.c.cpu_clock	2007-07-18 13:43:44.000000000 +0200
+++ ./arch/s390/kernel/vtime.c	2007-07-18 20:52:14.000000000 +0200
@@ -26,6 +26,44 @@
 
 static ext_int_info_t ext_int_info_timer;
 static DEFINE_PER_CPU(struct vtimer_queue, virt_cpu_timer);
+static DEFINE_PER_CPU(struct vtimer_list, cpu_clock_timer);
+
+/*
+ * read the remaining time of a virtual timer running on the current cpu
+ */
+static unsigned long long read_cpu_timer(struct vtimer_list *timer)
+{
+	struct vtimer_queue *vt_list;
+	unsigned long flags;
+	__u64 done;
+
+	local_irq_save(flags);
+	local_irq_disable();
+
+	BUG_ON(timer->cpu != smp_processor_id());
+
+	vt_list = &per_cpu(virt_cpu_timer, timer->cpu);
+	asm volatile ("STPT %0" : "=m" (done));
+
+	done = vt_list->to_expire + vt_list->offset - done;
+	local_irq_restore(flags);
+	return done;
+}
+
+/*
+ * Cpu clock, returns cpu time in nanosec units.
+ * Must be called with preemption disabled.
+ */
+unsigned long long cpu_clock(void)
+{
+	return ((read_cpu_timer(&__get_cpu_var(cpu_clock_timer)) * 125) >> 9);
+}
+
+/* expire after 142 years ... */
+static void cpu_clock_timer_callback(unsigned long data)
+{
+	BUG();
+}
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 /*
@@ -522,6 +560,7 @@
 void init_cpu_vtimer(void)
 {
 	struct vtimer_queue *vt_list;
+	struct vtimer_list *timer;
 
 	/* kick the virtual timer */
 	S390_lowcore.exit_timer = VTIMER_MAX_SLICE;
@@ -539,6 +578,12 @@
 	vt_list->offset = 0;
 	vt_list->idle = 0;
 
+	/* add dummy timers needed for cpu_clock */
+	timer = &__get_cpu_var(cpu_clock_timer);
+	init_virt_timer(timer);
+	timer->expires = VTIMER_MAX_SLICE;
+	timer->function = cpu_clock_timer_callback;
+	add_virt_timer(timer);
 }
 
 static int vtimer_idle_notify(struct notifier_block *self,
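
Note on the "* 125) >> 9" conversion used by monotonic_clock() and cpu_clock() in the patch above: the s390 TOD clock and the CPU timer count in units of 2^-12 microseconds (bit 51 ticks once per microsecond), so one unit is 1000/4096 = 125/512 ns. The following is a minimal userspace sketch of that arithmetic only. The helper name tod_units_to_ns() is hypothetical and is not part of the patch.

```c
#include <stdint.h>

/*
 * Convert s390 TOD clock / CPU timer units to nanoseconds.
 * One unit is 2^-12 microseconds, i.e. 1000/4096 = 125/512 ns,
 * hence multiply by 125 and shift right by 9 (divide by 512).
 * tod_units_to_ns() is an illustrative name, not a kernel function.
 */
static uint64_t tod_units_to_ns(uint64_t units)
{
	return (units * 125) >> 9;
}
```

As written, the intermediate product wraps once the input exceeds 2^64/125 units, which is roughly 417 days' worth of ticks, so a long-running clock would need a wider intermediate or a split multiplication.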




* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 10:57 [PATCH] virtual sched_clock() for s390 Jan Glauber
@ 2007-07-19 15:29 ` Jeremy Fitzhardinge
  2007-07-19 15:48   ` Srivatsa Vaddagiri
  2007-07-19 16:00   ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-19 15:29 UTC (permalink / raw)
  To: Jan Glauber
  Cc: LKML, Ingo Molnar, vatsa, mschwid2, efault, dmitry.adamushko,
	paulus, anton

Jan Glauber wrote:
> This patch introduces a cpu time clock for s390 (only ticking
> if the virtual cpu is running) and bases the s390 implementation
> of sched_clock() on it.
>
> The time slice length on a virtual cpu can be anything
> between the calculated time slice and zero. In reality
> this doesn't seem to be a problem, since the scheduler is fair
> enough to not let a single process starve but the current
> implementation can lead to inefficient short time slices.
>
> By providing a 'virtual' sched_clock() we guarantee that a
> process can get its time slice regardless of scheduling
> decisions from the hypervisor.
>
> Patch applies to 2.6.22 git and works fine with CFS.
>   

The Xen sched_clock implementation is very similar, and it seems to work
well.

>  /*
> - * Monotonic_clock - returns # of nanoseconds passed since time_init()
> + * Scheduler clock - returns current time in nanosec units.
> + * Now based on virtual cpu time to only account time the guest
> + * was actually running.
>   

Runn*ing*?  Does it include time the VCPU spends idle/blocked?  If not,
then the scheduler won't be able to tell how long a process has been
asleep.  Maybe this doesn't matter (I had this problem in a version of
Xen's sched_clock, and I can't say I saw any ill effects from it).

    J


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 15:29 ` Jeremy Fitzhardinge
@ 2007-07-19 15:48   ` Srivatsa Vaddagiri
  2007-07-19 16:00   ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-19 15:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jan Glauber, LKML, Ingo Molnar, mschwid2, efault,
	dmitry.adamushko, paulus, anton

On Thu, Jul 19, 2007 at 08:29:06AM -0700, Jeremy Fitzhardinge wrote:
> > - * Monotonic_clock - returns # of nanoseconds passed since time_init()
> > + * Scheduler clock - returns current time in nanosec units.
> > + * Now based on virtual cpu time to only account time the guest
> > + * was actually running.
> >   
> 
> Runn*ing*?  Does it include time the VCPU spends idle/blocked?  If not,
> then the scheduler won't be able to tell how long a process has been
> asleep.

Good point ..

I think we need a measure of both virtual and real time here - 
virtual for accounting task-execution time and real for
accounting sleep (and perhaps rq-wait?) time.

>  Maybe this doesn't matter (I had this problem in a version of
> Xen's sched_clock, and I can't say I saw any ill effects from it).

I guess it will show up as some corner-case behaviour, which people have
yet to discover in virtual environments.

-- 
Regards,
vatsa


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 15:29 ` Jeremy Fitzhardinge
  2007-07-19 15:48   ` Srivatsa Vaddagiri
@ 2007-07-19 16:00   ` Ingo Molnar
  2007-07-19 19:20     ` Jan Glauber
  2007-07-20  1:01     ` Paul Mackerras
  1 sibling, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2007-07-19 16:00 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jan Glauber, LKML, vatsa, mschwid2, efault, dmitry.adamushko,
	paulus, anton


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> >  /*
> > - * Monotonic_clock - returns # of nanoseconds passed since time_init()
> > + * Scheduler clock - returns current time in nanosec units.
> > + * Now based on virtual cpu time to only account time the guest
> > + * was actually running.
> 
> Runn*ing*?  Does it include time the VCPU spends idle/blocked?  If 
> not, then the scheduler won't be able to tell how long a process has 
> been asleep.  Maybe this doesn't matter (I had this problem in a 
> version of Xen's sched_clock, and I can't say I saw any ill effects 
> from it).

CFS does measure time elapsed across task-sleep periods (and does 
something similar to what the old scheduler's 'sleep average' 
interactivity mechanism did), but that mechanism measures "time spent 
running during sleep", not "time spent idling".

still, CFS needs time measurement across idle periods as well, for 
another purpose: to be able to do precise task statistics for /proc. 
(for top, ps, etc.) So it's still true that sched_clock() should include 
idle periods too.

	Ingo


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 16:00   ` Ingo Molnar
@ 2007-07-19 19:20     ` Jan Glauber
  2007-07-19 19:38       ` Ingo Molnar
  2007-07-20  1:01     ` Paul Mackerras
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Glauber @ 2007-07-19 19:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, paulus, anton

On Thu, 2007-07-19 at 18:00 +0200, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
> > >  /*
> > > - * Monotonic_clock - returns # of nanoseconds passed since time_init()
> > > + * Scheduler clock - returns current time in nanosec units.
> > > + * Now based on virtual cpu time to only account time the guest
> > > + * was actually running.
> > 
> > Runn*ing*?  Does it include time the VCPU spends idle/blocked?  If 
> > not, then the scheduler won't be able to tell how long a process has 
> > been asleep.  Maybe this doesn't matter (I had this problem in a 
> > version of Xen's sched_clock, and I can't say I saw any ill effects 
> > from it).

No, it does not include idle time; when we go idle the cpu timer
gets stopped.
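
The difference between the two clocks can be sketched with a toy model: a wall clock that always advances and a down-counting cpu timer that stands still while the virtual cpu is idle or not dispatched. This is only an illustration under simplified assumptions; struct vcpu_model and its helpers are hypothetical names and do not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one virtual cpu; all names here are illustrative. */
struct vcpu_model {
	uint64_t wall;       /* real time: always advances */
	uint64_t timer_init; /* value the cpu timer was started with */
	uint64_t cpu_timer;  /* down-counter: only decrements while running */
};

static void vcpu_init(struct vcpu_model *v, uint64_t slice)
{
	v->wall = 0;
	v->timer_init = slice;
	v->cpu_timer = slice;
}

/* Advance the model by delta units; the timer is frozen when not running. */
static void vcpu_advance(struct vcpu_model *v, uint64_t delta, bool running)
{
	v->wall += delta;
	if (running)
		v->cpu_timer -= delta;
}

/* Virtual time consumed so far: start value minus what is left. */
static uint64_t vcpu_virtual_elapsed(const struct vcpu_model *v)
{
	return v->timer_init - v->cpu_timer;
}
```

After running for 100 units, idling for 300 and running for another 50, the model reports 150 units of virtual time against 450 units of wall time. The missing 300 units are the sleep time the scheduler cannot see through a purely virtual sched_clock().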

> CFS does measure time elapsed across task-sleep periods (and does 
> something similar to what the old scheduler's 'sleep average' 
> interactivity mechanism did), but that mechanism measures "time spent 
> running during sleep", not "time spent idling".
> 
> still, CFS needs time measurement across idle periods as well, for 
> another purpose: to be able to do precise task statistics for /proc. 
> (for top, ps, etc.) So it's still true that sched_clock() should include 
> idle periods too.

I'm not sure, s390 already has an implementation for precise accounting
in the architecture code, does CFS also improve accounting data? 

Jan



* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 19:20     ` Jan Glauber
@ 2007-07-19 19:38       ` Ingo Molnar
  2007-07-19 21:07         ` Jan Glauber
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-07-19 19:38 UTC (permalink / raw)
  To: Jan Glauber
  Cc: Jeremy Fitzhardinge, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, paulus, anton


* Jan Glauber <jang@linux.vnet.ibm.com> wrote:

> > still, CFS needs time measurement across idle periods as well, for 
> > another purpose: to be able to do precise task statistics for /proc. 
> > (for top, ps, etc.) So it's still true that sched_clock() should 
> > include idle periods too.
> 
> I'm not sure, s390 already has an implementation for precise accounting 
> in the architecture code, does CFS also improve accounting data?

what kind of precise accounting does s390 have in the architecture code? 
CFS changes task (and load) accounting to be sched_clock() driven in 
essence.

	Ingo


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 19:38       ` Ingo Molnar
@ 2007-07-19 21:07         ` Jan Glauber
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Glauber @ 2007-07-19 21:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, paulus, anton

On Thu, 2007-07-19 at 21:38 +0200, Ingo Molnar wrote: 
> * Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> 
> > > still, CFS needs time measurement across idle periods as well, for 
> > > another purpose: to be able to do precise task statistics for /proc. 
> > > (for top, ps, etc.) So it's still true that sched_clock() should 
> > > include idle periods too.
> > 
> > I'm not sure, s390 already has an implementation for precise accounting 
> > in the architecture code, does CFS also improve accounting data?
> 
> what kind of precise accounting does s390 have in the architecture code? 
> CFS changes task (and load) accounting to be sched_clock() driven in 
> essence.

s390 has per-process accounting that is aware of virtual cpu time,
implemented in arch/s390/kernel/time.c: account_ticks() and
arch/s390/kernel/vtime.c: account_*_vtime(). Timestamps are taken in
entry.S for system calls, interrupts and other system entries and are
accounted later; we don't call update_process_times().
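
The idea of timestamp-based accounting can be sketched as follows. This is a simplified userspace model, not the s390 code: it uses an up-counting timestamp and accounts immediately instead of deferring, and struct vtime_acct and both helper names are hypothetical.

```c
#include <stdint.h>

/* Per-task accounting state; field and function names are illustrative. */
struct vtime_acct {
	uint64_t last_stamp; /* timestamp taken at the last user/kernel transition */
	uint64_t user;       /* accumulated user time */
	uint64_t system;     /* accumulated system time */
};

/* On kernel entry, the time since the last stamp was spent in user mode. */
static void acct_enter_kernel(struct vtime_acct *a, uint64_t now)
{
	a->user += now - a->last_stamp;
	a->last_stamp = now;
}

/* On kernel exit, the time since the last stamp was spent in the kernel. */
static void acct_exit_kernel(struct vtime_acct *a, uint64_t now)
{
	a->system += now - a->last_stamp;
	a->last_stamp = now;
}
```

Because every transition is stamped, utime and stime are exact sums rather than tick-based samples, which is why no update_process_times() call is needed in this scheme.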

Jan

> 	Ingo



* Re: [PATCH] virtual sched_clock() for s390
  2007-07-19 16:00   ` Ingo Molnar
  2007-07-19 19:20     ` Jan Glauber
@ 2007-07-20  1:01     ` Paul Mackerras
  2007-07-20  6:03       ` Jeremy Fitzhardinge
  2007-07-20  7:22       ` Ingo Molnar
  1 sibling, 2 replies; 12+ messages in thread
From: Paul Mackerras @ 2007-07-20  1:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Jan Glauber, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, anton

Ingo Molnar writes:

> CFS does measure time elapsed across task-sleep periods (and does 
> something similar to what the old scheduler's 'sleep average' 
> interactivity mechanism did), but that mechanism measures "time spent 
> running during sleep", not "time spent idling".

PowerPC's sched_clock() currently measures real time.  On POWER5 and
POWER6 machines we could change it to use a register called the "PURR"
(for Processor Utilization of Resources Register), which only measures
time spent while the partition is running.  But the PURR has another
function as well: it measures the distribution of dispatch cycles
between the two hardware threads on each core when running in SMT
mode.  That is, the cpu dispatches instructions from one thread or
the other (not both) on each CPU cycle, and each thread's PURR only
gets incremented on cycles where the cpu dispatches instructions for
that thread.  So the sum of the two threads' PURRs adds up to real
time.
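
The PURR property described above (each cycle is charged to exactly one hardware thread, so the two PURRs sum to real time) can be illustrated with a small simulation. It is a sketch under the stated one-thread-per-cycle assumption; struct smt_core, smt_dispatch() and smt_run() are hypothetical names, and the fixed 4-cycle dispatch pattern is invented for the example.

```c
#include <stdint.h>

/* Toy model of a dual-threaded core; names are illustrative only. */
struct smt_core {
	uint64_t purr[2]; /* per-thread PURR: cycles dispatched for that thread */
	uint64_t cycles;  /* real time in cycles */
};

/* One cycle: instructions are dispatched from exactly one thread. */
static void smt_dispatch(struct smt_core *c, int thread)
{
	c->purr[thread & 1]++;
	c->cycles++;
}

/*
 * Run n cycles, giving thread 0 the first share0_of_4 cycles of every
 * group of 4 and thread 1 the rest (an arbitrary pattern for the demo).
 */
static void smt_run(struct smt_core *c, uint64_t n, unsigned int share0_of_4)
{
	for (uint64_t i = 0; i < n; i++)
		smt_dispatch(c, (i % 4) < share0_of_4 ? 0 : 1);
}
```

With a 3:1 split over 1000 cycles, thread 0 accumulates 750 and thread 1 accumulates 250, and the sum equals the 1000 cycles of real time.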

Do you think this makes the PURR more useful for CFS, or less?  To me
it looks like this would mean that CFS can make a more equitable
distribution of CPU time if, for example, you had 3 runnable tasks on
a 2-core x dual-threaded machine (4 virtual CPUs).

BTW, what does "time spent running during sleep" mean?  Does it mean
"time that other tasks are running while this task is sleeping"?

> still, CFS needs time measurement across idle periods as well, for 
> another purpose: to be able to do precise task statistics for /proc. 
> (for top, ps, etc.) So it's still true that sched_clock() should include 
> idle periods too.

As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING.
That affects how tsk->utime and tsk->stime are accumulated (we call
account_user_time and account_system_time directly rather than calling
update_process_times) as well as the system hardirq/softirq time, idle
time, and stolen time.

When you say "precise task statistics for /proc", where are they
accumulated?  I don't see any changes to the way that tsk->utime and
ctime are computed.

Paul.


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-20  1:01     ` Paul Mackerras
@ 2007-07-20  6:03       ` Jeremy Fitzhardinge
  2007-07-20  7:22       ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-20  6:03 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Ingo Molnar, Jan Glauber, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, anton

Paul Mackerras wrote:
> Do you think this makes the PURR more useful for CFS, or less?  To me
> it looks like this would mean that CFS can make a more equitable
> distribution of CPU time if, for example, you had 3 runnable tasks on
> a 2-core x dual-threaded machine (4 virtual CPUs).
>   

Sounds reasonable to me.  I've proposed in the past that sched_clock
should be scaled by the cpufreq frequency to achieve the same effect
(ie, measure the actual number of cpu cycles that are really available
to tasks).

But more specifically, what you've described is exactly analogous to
hypervisor stolen time, since one thread steals time from the other.

> BTW, what does "time spent running during sleep" mean?  Does it mean
> "time that other tasks are running while this task is sleeping"?
>   

That's how I interpreted it.  You're only credited for sleeping if
someone else wanted the CPU in the meantime.

    J


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-20  1:01     ` Paul Mackerras
  2007-07-20  6:03       ` Jeremy Fitzhardinge
@ 2007-07-20  7:22       ` Ingo Molnar
  2007-07-23  9:15         ` Jan Glauber
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-07-20  7:22 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Jeremy Fitzhardinge, Jan Glauber, LKML, vatsa, mschwid2, efault,
	dmitry.adamushko, anton

* Paul Mackerras <paulus@samba.org> wrote:

> PowerPC's sched_clock() currently measures real time.  On POWER5 and 
> POWER6 machines we could change it to use a register called the "PURR" 
> (for Processor Utilization of Resources Register), which only measures 
> time spent while the partition is running.  But the PURR has another 
> function as well: it measures the distribution of dispatch cycles 
> between the two hardware threads on each core when running in SMT 
> mode.  That is, the cpu dispatches instructions from one thread or the 
> other (not both) on each CPU cycle, and each thread's PURR only gets 
> incremented on cycles where the cpu dispatches instructions for that 
> thread.  So the sum of the two threads' PURRs adds up to real time.
> 
> Do you think this makes the PURR more useful for CFS, or less?  To me 
> it looks like this would mean that CFS can make a more equitable 
> distribution of CPU time if, for example, you had 3 runnable tasks on 
> a 2-core x dual-threaded machine (4 virtual CPUs).

there's one complication: sched_clock() still needs to increase while 
the CPU (or thread) is idle, so that we can have a correct measurement 
of the CPU's utilization, for SMP load-balancing. CFS constructs another 
clock from sched_clock() [the rq->fair_clock] which does stop while the 
CPU is idle.

So perhaps a combination of the PURR and real-time might work as 
sched_clock(): when a hardware thread is in cpu_idle(), it should 
advance its sched clock with _half_ the rate of real-time [so that the 
sum of advance of all threads if they are all idle is equal to real 
time], and use the PURR if they are not idle. This would still correctly 
keep a meaningful load-average if the physical CPU is under-utilized.
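
One possible reading of this suggestion, as a sketch only: while a hardware thread is idle its clock advances at real time divided by the number of threads per core, otherwise it advances by the PURR delta. The structure and function names are hypothetical, and the sketch assumes the PURR and the real-time source use the same units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-hardware-thread clock state; all names are illustrative. */
struct hw_thread_clock {
	uint64_t sched_clock; /* the combined clock value */
	uint64_t last_purr;   /* PURR reading at the previous update */
	uint64_t last_real;   /* real-time reading at the previous update */
};

/*
 * Advance the clock for the interval since the previous update.
 * idle: real-time delta divided by threads_per_core (half rate on a
 *       dual-threaded core), so all-idle threads sum to real time.
 * busy: PURR delta, i.e. the cycles actually dispatched for this thread.
 */
static void thread_clock_update(struct hw_thread_clock *t, uint64_t purr_now,
				uint64_t real_now, bool idle,
				unsigned int threads_per_core)
{
	if (idle)
		t->sched_clock += (real_now - t->last_real) / threads_per_core;
	else
		t->sched_clock += purr_now - t->last_purr;

	t->last_purr = purr_now;
	t->last_real = real_now;
}
```

With two idle threads over 1000 units of real time, each clock advances by 500 and the sum is 1000, matching the invariant that busy threads get from the PURR.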

If you do such a change you'll immediately see whether the approach is 
right: monitor the cpu_load[] values in /proc/sched_debug, they should 
match the intuitive 'load average' of that CPU (if divided by 1024), and 
check whether 'top' still works fine.

> BTW, what does "time spent running during sleep" mean?  Does it mean 
> "time that other tasks are running while this task is sleeping"?

yeah. It's "the amount of fair runtime i missed out on while others were 
running".

> > still, CFS needs time measurement across idle periods as well, for 
> > another purpose: to be able to do precise task statistics for /proc. 
> > (for top, ps, etc.) So it's still true that sched_clock() should 
> > include idle periods too.
> 
> As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. 
> That affects how tsk->utime and tsk->stime are accumulated (we call 
> account_user_time and account_system_time directly rather than calling 
> update_process_times) as well as the system hardirq/softirq time, idle 
> time, and stolen time.

tsk->utime and tsk->stime are only used for a single purpose: to 
determine the 'split' factor of how to split up the precise total time 
between user and system time.

> When you say "precise task statistics for /proc", where are they 
> accumulated?  I don't see any changes to the way that tsk->utime and 
> ctime are computed.

we now use p->se.sum_exec_runtime that measures (in nanoseconds) the 
precise amount of time spent executing (sum of system and user time) - 
and ->stime and ->utime are used to determine the 'split'. [this allows 
us to gather ->stime and ->utime via low-resolution sampling, while 
keeping the 'total' precise. Accounting at every system entry point 
would be quite expensive on most platforms.]
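
The 'split' can be sketched as plain proportional scaling: the precise total is divided between user and system time in the ratio of the sampled ->utime and ->stime. This is an illustration of the idea, not the scheduler's code; split_exec_runtime() is a hypothetical name, and charging everything to user time when no samples exist is an assumption made for the example.

```c
#include <stdint.h>

/*
 * Split a precise total runtime into user and system parts using the
 * ratio of the low-resolution utime/stime samples.
 * Illustrative helper; not a kernel function.
 */
static void split_exec_runtime(uint64_t sum_exec_runtime,
			       uint64_t utime, uint64_t stime,
			       uint64_t *user_ns, uint64_t *system_ns)
{
	uint64_t samples = utime + stime;

	if (samples == 0) {
		/* Assumption for this sketch: no samples yet, charge user. */
		*user_ns = sum_exec_runtime;
		*system_ns = 0;
		return;
	}

	/* May overflow for very large inputs; fine for a small example. */
	*user_ns = sum_exec_runtime * utime / samples;
	*system_ns = sum_exec_runtime - *user_ns;
}
```

For a total of 1000000 ns with 3 user samples and 1 system sample, the result is 750000 ns user and 250000 ns system; the two parts always add up to the precise total.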

	Ingo


* Re: [PATCH] virtual sched_clock() for s390
  2007-07-20  7:22       ` Ingo Molnar
@ 2007-07-23  9:15         ` Jan Glauber
  2007-07-23 13:24           ` Martin Schwidefsky
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Glauber @ 2007-07-23  9:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul Mackerras, Jeremy Fitzhardinge, LKML, vatsa, mschwid2,
	efault, dmitry.adamushko, anton

On Fri, 2007-07-20 at 09:22 +0200, Ingo Molnar wrote:
> * Paul Mackerras <paulus@samba.org> wrote:
> > As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. 
> > That affects how tsk->utime and tsk->stime are accumulated (we call 
> > account_user_time and account_system_time directly rather than calling 
> > update_process_times) as well as the system hardirq/softirq time, idle 
> > time, and stolen time.
> 
> tsk->utime and tsk->stime are only used for a single purpose: to 
> determine the 'split' factor of how to split up the precise total time 
> between user and system time.
> 
> > When you say "precise task statistics for /proc", where are they 
> > accumulated?  I don't see any changes to the way that tsk->utime and 
> > ctime are computed.
> 
> we now use p->se.sum_exec_runtime that measures (in nanoseconds) the 
> precise amount of time spent executing (sum of system and user time) - 
> and ->stime and ->utime are used to determine the 'split'. [this allows 
> us to gather ->stime and ->utime via low-resolution sampling, while 
> keeping the 'total' precise. Accounting at every system entry point 
> would be quite expensive on most platforms.]

Using se.sum_exec_runtime to generate ->utime and ->stime breaks
the process accounting we have on s390 (and probably on PowerPC too).
With CONFIG_VIRT_CPU_ACCOUNTING we already have precise values in
->utime and ->stime. Can we make the calculation of the CFS-based time
values conditional on CONFIG_VIRT_CPU_ACCOUNTING?

Jan

> 	Ingo



* Re: [PATCH] virtual sched_clock() for s390
  2007-07-23  9:15         ` Jan Glauber
@ 2007-07-23 13:24           ` Martin Schwidefsky
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2007-07-23 13:24 UTC (permalink / raw)
  To: Jan Glauber
  Cc: Ingo Molnar, Paul Mackerras, Jeremy Fitzhardinge, LKML, vatsa,
	mschwid2, efault, dmitry.adamushko, anton

On Mon, 2007-07-23 at 09:15 +0000, Jan Glauber wrote:
> > > As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. 
> > > That affects how tsk->utime and tsk->stime are accumulated (we call 
> > > account_user_time and account_system_time directly rather than calling 
> > > update_process_times) as well as the system hardirq/softirq time, idle 
> > > time, and stolen time.
> > 
> > tsk->utime and tsk->stime are only used for a single purpose: to 
> > determine the 'split' factor of how to split up the precise total time 
> > between user and system time.

At least for s390 and powerpc the utime and stime already contain a very
precise value of how much time was spent in the user and system context.
For s390 the granularity is a microsecond. The other values nice, idle,
iowait, irq, softirq and steal are precise as well.

> > > When you say "precise task statistics for /proc", where are they 
> > > accumulated?  I don't see any changes to the way that tsk->utime and 
> > > ctime are computed.
> > 
> > we now use p->se.sum_exec_runtime that measures (in nanoseconds) the 
> > precise amount of time spent executing (sum of system and user time) - 
> > and ->stime and ->utime is used to determine the 'split'. [this allows 
> > us to gather ->stime and ->utime via low-resolution sampling, while 
> > keeping the 'total' precise. Accounting at every system entry point 
> > would be quite expensive on most platforms.]

With the exact accounting of utime and stime that would mean that
p->se.sum_exec_runtime is utime + stime, no?
Precise accounting at every cpu context switch has some cost, but for
s390 it is not as bad as it sounds. We do two store-cpu-timer (STPT)
instructions, two 64-bit adds and two 64-bit subtracts. In terms of cycles
it is less than 30 cycles on each system entry on the latest machine.

> Using se.sum_exec_runtime to generate ->utime and ->stime breaks
> the process accounting we have on s390 (and probably on PowerPC too).
> With CONFIG_VIRT_CPU_ACCOUNTING we already have precise values in
> ->utime and ->stime. Can we make the calculation of the CFS-based time
> values conditional on CONFIG_VIRT_CPU_ACCOUNTING?

Imho, we just have to update utime and stime when the process accounting
values are requested and set se.sum_exec_runtime to the sum of utime and
stime for CONFIG_VIRT_CPU_ACCOUNTING=y.
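
A rough sketch of that proposal, assuming utime and stime are kept in microseconds (the granularity mentioned above for s390) and the scheduler total in nanoseconds. The structure, the function name and the run-time flag standing in for the config option are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-task times; names and units are assumptions. */
struct task_times {
	uint64_t utime_us;            /* precise user time, microseconds */
	uint64_t stime_us;            /* precise system time, microseconds */
	uint64_t sum_exec_runtime_ns; /* scheduler-measured total, nanoseconds */
};

/*
 * Total runtime to report. With precise architecture accounting the
 * total is simply utime + stime; otherwise use the scheduler's value.
 * The bool models CONFIG_VIRT_CPU_ACCOUNTING=y for this sketch.
 */
static uint64_t task_runtime_ns(const struct task_times *t,
				bool virt_cpu_accounting)
{
	if (virt_cpu_accounting)
		return (t->utime_us + t->stime_us) * 1000;
	return t->sum_exec_runtime_ns;
}
```

In a kernel build the choice would be made at compile time with #ifdef rather than with a flag; the flag is used here only so both paths can be exercised in one program.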

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.
