public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dimitri Sivanich <sivanich@sgi.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <johnstul@us.ibm.com>
Subject: Re: [PATCH] specific do_timer_cpu value for nohz off mode
Date: Tue, 23 Aug 2011 14:56:28 -0500	[thread overview]
Message-ID: <20110823195628.GB4533@sgi.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1108171819120.2807@ionos>

On Wed, Aug 17, 2011 at 06:47:43PM +0200, Thomas Gleixner wrote:
> On Wed, 17 Aug 2011, Dimitri Sivanich wrote:
> 
> > Reposting this, as this was posted 2 weeks ago with no replies.
> 
> It's still in my vacation backlog :)

Thanks for the prompt reply to this last one.

> 
> > Jiffies updates are currently done by the tick_do_timer_cpu.  This has a
> > non-deterministic value that can be any running cpu on the system.  It
> > changes dynamically in nohz mode.  When nohz mode is off, it gets set to
> > a more static, but still non-deterministic value.
> > 
> > While the nohz behavior is necessary, is there a reason why the nohz off
> > case can't have a specific value, say 0 as it was on earlier kernels?
> 
> Yes, we had troubles when switching over to highres/oneshot mode when
> the first cpu which did the switch did not take the do_timer duty. See
> changelogs.
> 
> > If the cpu is offlined, let the value change at that time (note that the
> > x86 arch disallows offlining cpu 0).
> > 
> > There are certain cases where this would be advantageous, especially where
> > timely jiffies updates may not necessarily occur on specific processors.
> 
> Huch? How about fixing those long interrupt disabled regions instead?
> And honestly jiffies update being delayed for a bit is not really a
> problem.
> 
> > The following sample patch presents one way that this could be done.
> > Processors wait for the selected cpu to enter high resolution mode before
> > they do so.
> 
> That's a horrible hack.

Hopefully the patch below is more tolerable.

> 
> > Note that this patch is not hotplug aware (however, should the
> > tick_static_do_timer_cpu be offlined, the tick_do_timer_cpu simply becomes
> > another cpu anyway).
> > 
> > Comments on this idea, or the sample patch?
> 
> I still have no idea why a random assignment is so harmful. Also if
> there is really a reason to make that assignment static, what about
> using a sysfs file, which lets you read out the assignment and update
> it in the NOHZ off case ?

While not necessarily harmful, doing jiffies updates on an application cpu
does cause some extra overhead that HPC benchmarking people notice.  They
prefer to have OS activity isolated to certain cpus.  They like reproducibility
of results, and having jiffies updates bouncing around introduces variability.

Maybe this is useful for other folks as well?  It does give an indication
of which cpu is currently doing jiffies updates.

The patch below puts the file in /proc/sys/kernel, but if you think it should be in
/sys, please let me know where you'd like to see it.


Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
 include/linux/tick.h        |    5 +++++
 kernel/sysctl.c             |    9 +++++++++
 kernel/time/tick-internal.h |    1 -
 kernel/time/tick-sched.c    |   23 +++++++++++++++++++++++
 4 files changed, 37 insertions(+), 1 deletion(-)

Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -57,6 +57,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/oom.h>
 #include <linux/kmod.h>
+#include <linux/tick.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -368,6 +369,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sched_rt_handler,
 	},
+	{
+		.procname	= "sched_jiffies_cpu",
+		.data		= &tick_do_timer_cpu,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= tick_do_timer_cpu_handler,
+		.extra1		= &zero,
+	},
 #ifdef CONFIG_SCHED_AUTOGROUP
 	{
 		.procname	= "sched_autogroup_enabled",
Index: linux/include/linux/tick.h
===================================================================
--- linux.orig/include/linux/tick.h
+++ linux/include/linux/tick.h
@@ -72,6 +72,11 @@ struct tick_sched {
 extern void __init tick_init(void);
 extern int tick_is_oneshot_available(void);
 extern struct tick_device *tick_get_device(int cpu);
+extern int tick_do_timer_cpu_handler(struct ctl_table *table, int write,
+				 void __user *buffer, size_t *lenp,
+				 loff_t *ppos);
+
+extern int tick_do_timer_cpu __read_mostly;
 
 # ifdef CONFIG_HIGH_RES_TIMERS
 extern int tick_init_highres(void);
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -815,6 +815,29 @@ void tick_cancel_sched_timer(int cpu)
 }
 #endif
 
+int tick_do_timer_cpu_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret, old_val;
+
+#ifdef CONFIG_NO_HZ
+	/* nohz mode not supported */
+	if (write && tick_nohz_enabled)
+		return -EINVAL;
+#endif
+
+	old_val = tick_do_timer_cpu;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!ret && write && !cpu_online(tick_do_timer_cpu)) {
+		tick_do_timer_cpu = old_val;
+		return -EINVAL;
+	}
+
+	return ret;
+}
+
 /**
  * Async notification about clocksource changes
  */
Index: linux/kernel/time/tick-internal.h
===================================================================
--- linux.orig/kernel/time/tick-internal.h
+++ linux/kernel/time/tick-internal.h
@@ -12,7 +12,6 @@
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 extern ktime_t tick_next_period;
 extern ktime_t tick_period;
-extern int tick_do_timer_cpu __read_mostly;
 
 extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
 extern void tick_handle_periodic(struct clock_event_device *dev);

  reply	other threads:[~2011-08-23 19:56 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-17 16:07 [PATCH] specific do_timer_cpu value for nohz off mode Dimitri Sivanich
2011-08-17 16:47 ` Thomas Gleixner
2011-08-23 19:56   ` Dimitri Sivanich [this message]
2011-09-02  8:19     ` Thomas Gleixner
2011-09-02 19:29       ` Dimitri Sivanich
2011-09-02 19:57         ` Thomas Gleixner
2011-09-02 20:39           ` Dimitri Sivanich
  -- strict thread matches above, loose matches on Subject: below --
2011-11-08 19:11 Dimitri Sivanich
2011-11-23  0:08 ` Andrew Morton
2011-11-30 15:29   ` Dimitri Sivanich
2011-12-01  0:11     ` Andrew Morton
2011-12-01  0:16       ` Andrew Morton
2011-12-01  2:07         ` Dimitri Sivanich
2011-12-01  2:13           ` Andrew Morton
2011-12-01 16:37             ` Dimitri Sivanich
2011-12-01 22:56               ` Andrew Morton
2011-12-02 20:14                 ` Dimitri Sivanich
2011-12-02 20:22                   ` Dimitri Sivanich
2011-12-02 22:42                   ` Thomas Gleixner
2011-12-01  2:06       ` Dimitri Sivanich
2011-12-01  2:12         ` Andrew Morton
2011-12-01  2:34           ` Dimitri Sivanich
2011-12-01  2:38             ` Andrew Morton
2012-01-15 13:46 ` Mike Galbraith
2012-01-15 14:04   ` Mike Galbraith
2012-01-15 14:23   ` Mike Galbraith
2012-01-25 11:27   ` Mike Galbraith
2012-02-15 14:16 ` Thomas Gleixner
2012-02-15 14:37   ` Dimitri Sivanich
2012-02-15 14:52     ` Thomas Gleixner
2012-02-15 15:34       ` Dimitri Sivanich
2012-02-15 20:36         ` Thomas Gleixner
2012-02-16 14:59           ` Dimitri Sivanich
2011-08-03 19:57 Dimitri Sivanich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110823195628.GB4533@sgi.com \
    --to=sivanich@sgi.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox