public inbox for linux-kernel@vger.kernel.org
From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
To: discuss@LessWatts.org,
	Linux-pm mailing list <linux-pm@lists.linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>, Ingo Molnar <mingo@elte.hu>,
	venkatesh.pallipadi@intel.com, tglx@linutronix.de,
	Arjan van de Ven <arjan@infradead.org>,
	suresh.b.siddha@intel.com, Gautham R Shenoy <ego@in.ibm.com>,
	Chanda Sethia <chanda.sethia@in.ibm.com>
Subject: Analysis of sched_mc_power_savings
Date: Tue, 8 Jan 2008 23:08:15 +0530
Message-ID: <20080108173815.GA7793@dirshya.in.ibm.com>

Hi,

The following experiments were conducted on a two-socket, dual-core
Intel processor-based machine in order to understand the impact of the
sched_mc_power_savings scheduler heuristics.

Kernel linux-2.6.24-rc6:

CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y

tick-sched.c has been instrumented to collect idle entry and exit time
stamps.

Instrumentation patch:

Instrument tick-sched nohz code and generate time stamp trace data.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 kernel/time/tick-sched.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc6.orig/kernel/time/tick-sched.c
+++ linux-2.6.24-rc6/kernel/time/tick-sched.c
@@ -20,6 +20,7 @@
 #include <linux/profile.h>
 #include <linux/sched.h>
 #include <linux/tick.h>
+#include <linux/ktrace.h>
 
 #include <asm/irq_regs.h>
 
@@ -200,7 +201,10 @@ void tick_nohz_stop_sched_tick(void)
 	if (ts->tick_stopped) {
 		delta = ktime_sub(now, ts->idle_entrytime);
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-	}
+		ktrace_log2(KT_FUNC_tick_nohz_stop_sched_tick, KT_EVENT_INFO1,
+					ktime_to_ns(now), ts->idle_calls);
>>>>>>>>>>>>>>>>Tracepoint A

+	} else
+		ktrace_log2(KT_FUNC_tick_nohz_stop_sched_tick,
+				KT_EVENT_FUNC_ENTER, ktime_to_ns(now), 0);
 
>>>>>>>>>>>>>>>>Tracepoint B

 	ts->idle_entrytime = now;
 	ts->idle_calls++;
@@ -391,6 +395,8 @@ void tick_nohz_restart_sched_tick(void)
 		tick_do_update_jiffies64(now);
 		now = ktime_get();
 	}
+	ktrace_log2(KT_FUNC_tick_nohz_restart_sched_tick, KT_EVENT_FUNC_EXIT,
+				ktime_to_ns(now), ts->idle_calls);
>>>>>>>>>>>>>>>>Tracepoint C
 	local_irq_enable();
 }

Each idle time sample is the time stamp at (C) minus the time stamp at
(B). This is the time interval between stopping ticks and restarting
ticks in an idle system.
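
To make the (C) minus (B) calculation concrete, here is a minimal
Python sketch. It is not the 1-sched-stats.py script linked below; the
tuple layout (cpu, kind, timestamp_ns) and the labels "B" and "C" are
assumptions made for illustration, and the sample time stamps are
made up.

```python
from collections import defaultdict

# Hypothetical labels: "B" stands for the KT_EVENT_FUNC_ENTER record at
# tracepoint B and "C" for the KT_EVENT_FUNC_EXIT record at tracepoint C.
ENTER = "B"
EXIT = "C"


def idle_intervals(events):
    """Return {cpu: [idle_ns, ...]} from (cpu, kind, timestamp_ns) tuples.

    Each interval is the time stamp at C minus the most recent
    unmatched time stamp at B on the same CPU.
    """
    pending = {}                      # cpu -> time stamp of last B event
    intervals = defaultdict(list)
    for cpu, kind, ts in sorted(events, key=lambda e: e[2]):
        if kind == ENTER:
            pending[cpu] = ts
        elif kind == EXIT and cpu in pending:
            intervals[cpu].append(ts - pending.pop(cpu))
    return dict(intervals)


# Made-up sample data, in nanoseconds.
sample = [
    (0, ENTER, 1_000), (0, EXIT, 5_000),
    (1, ENTER, 2_000), (1, EXIT, 9_500),
    (0, ENTER, 6_000), (0, EXIT, 6_400),
]
print(idle_intervals(sample))
```

A (C) record with no preceding (B) record on the same CPU is ignored
in this sketch, which avoids a bogus first sample when tracing starts
in the middle of an idle period.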

Complete patch series:
http://svaidy.googlepages.com/1-klog.patch
http://svaidy.googlepages.com/1-trace-sched.patch

Userspace program to extract trace data:
http://svaidy.googlepages.com/1-klog.c

Python script to post process binary trace data:
http://svaidy.googlepages.com/1-sched-stats.py

Gnuplot script that was used to generate the graphs:
http://svaidy.googlepages.com/1-multiplot.gp

The scheduler heuristic for multi-core systems,
/sys/devices/system/cpu/sched_mc_power_savings, should ideally extend
the tickless idle time on at least a few CPUs in an SMP machine.

However, in the experiments it was found that turning on
sched_mc_power_savings increased the idle time only marginally, and on
only some of the CPUs.
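
For reference, the tunable can be read and set from a script. The
following is a minimal sketch, assuming the sysfs file holds a plain
integer (0 or 1 as used in these experiments); the helper names are
made up for illustration, and writing to the real file needs root.

```python
from pathlib import Path

# Path as given above; only present on kernels that expose this tunable.
SCHED_MC_PATH = Path("/sys/devices/system/cpu/sched_mc_power_savings")


def get_sched_mc(path=SCHED_MC_PATH):
    """Return the current setting as an integer."""
    return int(Path(path).read_text().strip())


def set_sched_mc(value, path=SCHED_MC_PATH):
    """Write a new setting; needs root when used on the real sysfs file."""
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    # Only report the value if the file exists on this kernel.
    if SCHED_MC_PATH.exists():
        print("sched_mc_power_savings =", get_sched_mc())
```

The path argument is there so that the helpers can be tried against an
ordinary file before touching the live system.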

Experiment 1:
-------------

Setup:

* The yum-updatesd and irqbalance daemons were stopped to reduce the
  idle wakeup rate
* All IRQs were manually routed to CPU0 only (in the hope that this
  would keep the other CPUs idle)
  (http://svaidy.googlepages.com/1-irq-config.txt)
* Powertop shows around 35 wakeups per second during idle
  (http://svaidy.googlepages.com/1-powertop-screen.txt)
* The trace of idle time stamps was collected for 120 seconds with the
  system in idle state
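
As an illustration of the IRQ routing step, here is a small Python
sketch that builds the hexadecimal CPU bitmask accepted by
/proc/irq/<N>/smp_affinity. It is not the configuration from
1-irq-config.txt; the helper names and the IRQ number in the usage
comment are made up, and writing the file needs root.

```python
def cpu_mask(cpus):
    """Return the hex bitmask string for a list of CPU numbers.

    Bit n of the mask corresponds to CPU n, so CPU0 alone is "1".
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def route_irq(irq, cpus, proc_root="/proc/irq"):
    """Write the affinity mask for one IRQ (needs root on a real system)."""
    with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask(cpus))


print(cpu_mask([0]))           # CPU0 only
print(cpu_mask([0, 1, 2, 3]))  # all four CPUs
# Example use (hypothetical IRQ number): route_irq(16, [0])
```

Some IRQs cannot be moved, in which case the write fails; a real
script would need to handle that error.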

Results:

There are 4 PNG files that plot the idle time for each CPU in the
system.

Please get the graphs from the following URLs:

http://svaidy.googlepages.com/1-idle-cpu0.png
http://svaidy.googlepages.com/1-idle-cpu1.png
http://svaidy.googlepages.com/1-idle-cpu2.png
http://svaidy.googlepages.com/1-idle-cpu3.png

Each PNG file has 4 graphs, all relating to one CPU:

* The top-right plot shows the idle time samples obtained during the
  experiment
* The top-left graph is a histogram of the top-right plot
* The bottom graphs are the corresponding plots of idle times with
  sched_mc_power_savings=1
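
The histograms were generated with gnuplot. As a rough illustration
of the binning only, here is a Python sketch; the bin width and the
sample values are made up and are not taken from the experiment.

```python
def histogram(samples_ms, bin_width_ms):
    """Count samples per bin; keys are the lower edge of each bin in ms."""
    counts = {}
    for s in samples_ms:
        lower_edge = int(s // bin_width_ms) * bin_width_ms
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Made-up idle time samples in milliseconds.
idle_ms = [3, 12, 14, 27, 29, 31, 250]
print(histogram(idle_ms, 10))
```

Empty bins are simply absent from the result, so a long tail of rare,
very long idle periods does not inflate the output.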

Observations with sched_mc_power_savings=1:

* No major impact of sched_mc_power_savings on CPU0 and CPU1
* Significant idle time improvement on CPU2
* However, significant idle time reduction on CPU3

Experiment 2:
-------------

Setup:

* USB was stopped
* Most daemons, such as yum-updatesd, hal, autofs, syslog, crond,
  irqbalance, sendmail and pcscd, were stopped
* Interrupt routing was left at the default, but the irqbalance daemon
  was stopped
* Powertop shows around 4 wakeups per second during idle
  (http://svaidy.googlepages.com/2-powertop-screen.txt) 
* The trace of idle time stamps was collected for 120 seconds with the
  system in idle state

Results:

There are 4 PNG files that plot the idle time for each CPU in the
system.

http://svaidy.googlepages.com/2-idle-cpu0.png
http://svaidy.googlepages.com/2-idle-cpu1.png
http://svaidy.googlepages.com/2-idle-cpu2.png
http://svaidy.googlepages.com/2-idle-cpu3.png

The layout of the plots is the same as in the previous experiment.

Observations with sched_mc_power_savings=1:

* No major impact of sched_mc_power_savings on CPU0 and CPU1
* Good idle time improvement on CPU2 and CPU3

Please review the experiment and comment on how the effectiveness of
sched_mc_power_savings can be analysed.  
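
One possible way to put numbers on the comparison, offered only as a
sketch: summarise the idle intervals per CPU for each setting and look
at the ratio of the mean idle interval. The function names and the
sample values below are made up; they are not results from the
experiments.

```python
from statistics import mean, median


def summarize(intervals_ms):
    """Basic summary of one CPU's idle intervals (milliseconds)."""
    return {
        "count": len(intervals_ms),
        "total_ms": sum(intervals_ms),
        "mean_ms": mean(intervals_ms),
        "median_ms": median(intervals_ms),
    }


def compare(baseline, savings):
    """Per-CPU ratio of mean idle interval, savings run over baseline run.

    Both arguments map a CPU number to a non-empty list of intervals.
    A ratio above 1.0 means longer idle periods with the setting enabled.
    """
    return {
        cpu: mean(savings[cpu]) / mean(baseline[cpu])
        for cpu in baseline
        if cpu in savings
    }


# Made-up numbers, for illustration only.
baseline = {2: [10, 20, 30], 3: [40, 40]}
savings = {2: [60, 60], 3: [20, 20]}
print(summarize(baseline[2]))
print(compare(baseline, savings))
```

The mean alone hides the shape of the distribution, so it would
complement the histograms rather than replace them.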

With a very low wakeup count of ~4 per second on the 4-CPU system,
enabling sched_mc_power_savings gave good idle time results.

However, the results are not as expected even at a modest wakeup
count of ~35 per second.
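
To put the two wakeup rates in perspective, the arithmetic below
converts them to a mean gap between wakeups. It assumes the powertop
figure is a system-wide total and that wakeups are evenly spaced,
neither of which has been verified here.

```python
def mean_gap_ms(wakeups_per_second):
    """Mean time between wakeups in milliseconds."""
    return 1000.0 / wakeups_per_second


for rate in (4, 35):
    print(f"{rate:>2} wakeups/s -> {mean_gap_ms(rate):.1f} ms between wakeups")
```

Under those assumptions the gap is 250 ms at ~4 wakeups per second and
about 28.6 ms at ~35 per second. If most wakeups land on one CPU, the
other CPUs could in principle stay idle for much longer than this.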

Please let us know your comments and suggestions on the experiment or
results.  

Do we have similar analysis and data on the scheduler heuristics for
power savings?

Thanks,
Vaidy

