* [RFC] [PATCH] allow low HZ values?
@ 2010-10-11 20:11 Tim Pepper
2010-10-11 20:32 ` Thomas Gleixner
2010-10-11 22:30 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 8+ messages in thread
From: Tim Pepper @ 2010-10-11 20:11 UTC
To: linux-kernel
Cc: Marcio Saito, John Stultz, Jiri Slaby, x86, Ingo Molnar,
Paul Mackerras, H. Peter Anvin, Thomas Gleixner, linuxppc-dev,
Avantika Mathur
I'm not necessarily wanting to open up the age old question of "what is
a good HZ", but we were doing some testing on timer tick overheads for
HPC applications and this came up...
Below is a minimal hack at enabling lower HZ values. The kernel builds
and boots for me on x86_64 (simple laptop and kvm configs) and ppc64
(misc. IBM System p) with each of the added HZ options.
There's explicit code checking HZ down to 12, but HZ<100 wasn't a config
option. We collected some data at 10, 12 and 25. There'd been some
question of whether 10 would even work or not but it looks fine in the
relatively minimal testing we did. We tried 12 since the code seemed
to allow for it. And 25 as a "safe" lower value. The only difference
observed under load (i.e. with no "no idle HZ" in play) was the expected timer
tick happening less often. There was definitely surprise that nothing
else seemed to break anywhere, especially at 10.
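For reference, the SHIFT_HZ ladder in jiffies.h can be summarized in a
few lines. This is only a sketch under assumptions: shift_hz_for() is a
hypothetical helper, not a kernel function, and the band edges are
extrapolated from the 12..24 and 24..48 bands visible in the jiffies.h
hunk of the patch. The floor of 3 mirrors the new "#if HZ < 12" case.

```c
/*
 * Hypothetical helper (not kernel code): returns the SHIFT_HZ value the
 * preprocessor ladder would pick for a given HZ, assuming each band is
 * [3 * 2^(s-2), 3 * 2^(s-1)) as the 12..24 and 24..48 bands suggest.
 * Anything below 12 is clamped to 3, matching the patched "#if HZ < 12".
 */
static inline int shift_hz_for(unsigned int hz)
{
	int shift = 3;

	/* Move up one band while hz is at or above the band's upper edge. */
	while (shift < 31 && hz >= (3u << (shift - 1)))
		shift++;
	return shift;
}
```

Under these assumptions HZ=10 maps to 3 (2^3 = 8), HZ=25 maps to 5 and
HZ=1000 maps to 10.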
Do people feel it is reasonable to have Kconfig bits to allow some lower
HZ values?
If so, then there's the question of what breaks. It's reasonable to
think there are going to be other subtleties buried in code around
assumptions on the likely range of HZ:
- I'm not sure that what I did in inet_timewait_sock.h and jiffies.h is
reasonable.
- arch/x86/kernel/i8253.c throws a warning at line 43 (v2.6.36-rc7):
warning: large integer implicitly truncated to unsigned type
- drivers/char/cyclades.c's cy_ioctl() warns:
drivers/char/cyclades.c:2761: warning: division by zero
- drivers, drivers, drivers across all the arch's could use sanity checking
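The two warnings above look like plain integer arithmetic. A rough
sketch, under stated assumptions: the i8253 input clock is 1193182 Hz
and its counter is 16 bits wide; the reload value is computed as a
rounded division by HZ (the form of the kernel's LATCH macro, assumed
here rather than copied from the tree); and the cyclades warning is
assumed to come from a divisor of the form HZ / 100, which has not been
checked against the driver. pit_latch(), pit_latch_fits() and
jiffies_per_centisecond() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIT_TICK_RATE 1193182u	/* i8253 input clock in Hz */

/* Rounded reload value for a given tick rate (assumed LATCH-style formula). */
static inline uint32_t pit_latch(uint32_t hz)
{
	return (PIT_TICK_RATE + hz / 2) / hz;
}

/* The i8253 counter is 16 bits wide, so larger reload values get truncated. */
static inline bool pit_latch_fits(uint32_t hz)
{
	return pit_latch(hz) <= 0xffffu;
}

/* Integer jiffies per 10 ms; evaluates to 0 for any hz below 100. */
static inline uint32_t jiffies_per_centisecond(uint32_t hz)
{
	return hz / 100;
}
```

Under these assumptions HZ=10 gives a reload value of 119318, which does
not fit in 16 bits, while HZ=25 gives 47727, which does; the smallest HZ
that fits is 19. Any HZ below 100 makes HZ / 100 zero, so dividing by it
is a division by zero.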
--
Tim Pepper <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center
Not-Signed-off-by: Tim Pepper <lnxninja@us.ibm.com>
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 6811f4b..8c225b2 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -15,7 +15,9 @@
* OSF/1 kernel. The SHIFT_HZ define expresses the same value as the
* nearest power of two in order to avoid hardware multiply operations.
*/
-#if HZ >= 12 && HZ < 24
+#if HZ < 12
+# define SHIFT_HZ 3
+#elif HZ >= 12 && HZ < 24
# define SHIFT_HZ 4
#elif HZ >= 24 && HZ < 48
# define SHIFT_HZ 5
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index a066fdd..1aba305 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -39,8 +39,9 @@ struct inet_hashinfo;
* If time > 4sec, it is "slow" path, no recycling is required,
* so that we select tick to get range about 4 seconds.
*/
-#if HZ <= 16 || HZ > 4096
-# error Unsupported: HZ <= 16 or HZ > 4096
+/* HACK HACK */
+#if HZ > 4096
+# error Unsupported: HZ > 4096
#elif HZ <= 32
# define INET_TWDR_RECYCLE_TICK (5 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
#elif HZ <= 64
diff --git a/kernel/Kconfig.hz b/kernel/Kconfig.hz
index 94fabd5..37302bf 100644
--- a/kernel/Kconfig.hz
+++ b/kernel/Kconfig.hz
@@ -15,6 +15,22 @@ choice
environment leading to NR_CPUS * HZ number of timer interrupts
per second.
+ config HZ_10
+ bool "10 HZ"
+ help
+ 10 Hz is extremely aggressive and may break things.
+
+ config HZ_12
+ bool "12 HZ"
+ help
+ 12 Hz because it's less aggressive than 10?
+
+ config HZ_25
+ bool "25 HZ"
+ help
+ 25 Hz is useful for reducing HPC application jitter caused by
+ timer interrupts happening during a "fixed time quantum of work
+ then barrier" style workload.
config HZ_100
bool "100 HZ"
@@ -49,6 +65,9 @@ endchoice
config HZ
int
+ default 10 if HZ_10
+ default 12 if HZ_12
+ default 25 if HZ_25
default 100 if HZ_100
default 250 if HZ_250
default 300 if HZ_300
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 20:11 [RFC] [PATCH] allow low HZ values? Tim Pepper
@ 2010-10-11 20:32 ` Thomas Gleixner
2010-10-11 21:11 ` Tim Pepper
2010-10-12 14:31 ` Andi Kleen
2010-10-11 22:30 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2010-10-11 20:32 UTC
To: Tim Pepper
Cc: Marcio Saito, John Stultz, Jiri Slaby, Peter Zijlstra, x86, LKML,
Frederic Weisbecker, Ingo Molnar, Paul Mackerras, H. Peter Anvin,
Avantika Mathur, linuxppc-dev
On Mon, 11 Oct 2010, Tim Pepper wrote:
> I'm not necessarily wanting to open up the age old question of "what is
> a good HZ", but we were doing some testing on timer tick overheads for
> HPC applications and this came up...
Yeah. This comes always up when the timer tick overhead on HPC is
tested. And this patch is again the fundamentally wrong answer.
We have told HPC folks for years that we need a kind of "NOHZ" mode
for HPC where we can transparently switch off the tick when only one
user space bound thread is active and switch back to normal once this
thing terminates or goes into the kernel via a syscall. Sigh, nothing
happened ever except for repeating the same crap patches over and
over.
FYI, Frederic is working on that right now. He will talk about it at
the plumbers RT microconf, so you might catch him there.
Thanks,
tglx
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 20:32 ` Thomas Gleixner
@ 2010-10-11 21:11 ` Tim Pepper
2010-10-12 14:31 ` Andi Kleen
1 sibling, 0 replies; 8+ messages in thread
From: Tim Pepper @ 2010-10-11 21:11 UTC
To: Thomas Gleixner
Cc: Marcio Saito, John Stultz, Jiri Slaby, Peter Zijlstra, x86, LKML,
Tim Pepper, Frederic Weisbecker, Ingo Molnar, Paul Mackerras,
H. Peter Anvin, Avantika Mathur, linuxppc-dev
On Mon 11 Oct at 22:32:06 +0200 tglx@linutronix.de said:
> On Mon, 11 Oct 2010, Tim Pepper wrote:
>
> > I'm not necessarily wanting to open up the age old question of "what is
> > a good HZ", but we were doing some testing on timer tick overheads for
> > HPC applications and this came up...
>
> Yeah. This comes always up when the timer tick overhead on HPC is
> tested. And this patch is again the fundamentally wrong answer.
Yep. Long term, no hz is definitely the goal. I guess I'm not sufficiently
connected to the -rt space to have followed that somebody is again
looking in that direction. The RFC patch was mostly just a minimal
"is there anything simple we can do in the meantime" exercise.
> We have told HPC folks for years that we need a kind of "NOHZ" mode
> for HPC where we can transparently switch off the tick when only one
> user space bound thread is active and switch back to normal once this
> thing terminates or goes into the kernel via a syscall.
I'd not heard of this in-between NOHZ-y idea... sounds promising.
We'd talked about different non-idle no hz approaches in the past year
or so, some of which were on the veeery complicated side of the spectrum.
> Sigh, nothing
> happened ever except for repeating the same crap patches over and
> over.
I'll check out what Frederic is doing. Thanks for the pointer and
apologies for the noise.
--
Tim Pepper <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 20:32 ` Thomas Gleixner
2010-10-11 21:11 ` Tim Pepper
@ 2010-10-12 14:31 ` Andi Kleen
2010-10-12 16:56 ` Thomas Gleixner
1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2010-10-12 14:31 UTC
To: Thomas Gleixner
Cc: Marcio Saito, H. Peter Anvin, Jiri Slaby, Peter Zijlstra,
John Stultz, x86, LKML, Tim Pepper, jblunck, Ingo Molnar,
Paul Mackerras, Frederic Weisbecker, Avantika Mathur,
linuxppc-dev
Thomas Gleixner <tglx@linutronix.de> writes:
> On Mon, 11 Oct 2010, Tim Pepper wrote:
>
>> I'm not necessarily wanting to open up the age old question of "what is
>> a good HZ", but we were doing some testing on timer tick overheads for
>> HPC applications and this came up...
>
> Yeah. This comes always up when the timer tick overhead on HPC is
> tested. And this patch is again the fundamentally wrong answer.
That's an unfair description of the proposal.
> We have told HPC folks for years that we need a kind of "NOHZ" mode
> for HPC where we can transparently switch off the tick when only one
> user space bound thread is active and switch back to normal once this
> thing terminates or goes into the kernel via a syscall. Sigh, nothing
> happened ever except for repeating the same crap patches over and
> over.
Jan Blunck posted a patch for exactly this a few months ago.
Unfortunately it didn't get the accounting right, but other than
that it seemed like a reasonable starting point.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-12 14:31 ` Andi Kleen
@ 2010-10-12 16:56 ` Thomas Gleixner
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2010-10-12 16:56 UTC
To: Andi Kleen
Cc: Marcio Saito, H. Peter Anvin, Jiri Slaby, Peter Zijlstra,
John Stultz, x86, LKML, Tim Pepper, jblunck, Ingo Molnar,
Paul Mackerras, Frederic Weisbecker, Avantika Mathur,
linuxppc-dev
On Tue, 12 Oct 2010, Andi Kleen wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > We have told HPC folks for years that we need a kind of "NOHZ" mode
> > for HPC where we can transparently switch off the tick when only one
> > user space bound thread is active and switch back to normal once this
> > thing terminates or goes into the kernel via a syscall. Sigh, nothing
> > happened ever except for repeating the same crap patches over and
> > over.
>
> Jan Blunck posted a patch for exactly this a few months ago.
> Unfortunately it didn't get the accounting right, but other than
> that it seemed like a reasonable starting point.
Unfortunately it did not get a lot of other things right either.
Thanks,
tglx
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 20:11 [RFC] [PATCH] allow low HZ values? Tim Pepper
2010-10-11 20:32 ` Thomas Gleixner
@ 2010-10-11 22:30 ` Benjamin Herrenschmidt
2010-10-11 22:33 ` Thomas Gleixner
1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-11 22:30 UTC
To: Tim Pepper
Cc: Marcio Saito, Jiri Slaby, John Stultz, x86, linux-kernel,
Ingo Molnar, Paul Mackerras, H. Peter Anvin, Thomas Gleixner,
linuxppc-dev, Avantika Mathur
On Mon, 2010-10-11 at 13:11 -0700, Tim Pepper wrote:
> I'm not necessarily wanting to open up the age old question of "what is
> a good HZ", but we were doing some testing on timer tick overheads for
> HPC applications and this came up...
Note that this is also very useful when working on CPU prototypes
implemented in FPGAs and running at something like 12 MHz :-)
Cheers,
Ben.
> Below is a minimal hack at enabling lower HZ values. The kernel builds
> and boots for me on x86_64 (simple laptop and kvm configs) and ppc64
> (misc. IBM System p) with each of the added HZ options.
>
> There's explicit code checking HZ down to 12, but HZ<100 wasn't a config
> option. We collected some data at 10, 12 and 25. There'd been some
> question of whether 10 would even work or not but it looks fine in the
> relatively minimal testing we did. We tried 12 since the code seemed
> to allow for it. And 25 as a "safe" lower value. The only difference
> observed under load (i.e. with no "no idle HZ" in play) was the expected timer
> tick happening less often. There was definitely surprise that nothing
> else seemed to break anywhere, especially at 10.
>
> Do people feel it is reasonable to have Kconfig bits to allow some lower
> HZ values?
>
> If so, then there's the question of what breaks. It's reasonable to
> think there are going to be other subtleties buried in code around
> assumptions on the likely range of HZ:
>
> - I'm not sure that what I did in inet_timewait_sock.h and jiffies.h is
> reasonable.
> - arch/x86/kernel/i8253.c throws a warning at line 43 (v2.6.36-rc7):
> warning: large integer implicitly truncated to unsigned type
> - drivers/char/cyclades.c's cy_ioctl() warns:
> drivers/char/cyclades.c:2761: warning: division by zero
> - drivers, drivers, drivers across all the arch's could use sanity checking
>
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 22:30 ` Benjamin Herrenschmidt
@ 2010-10-11 22:33 ` Thomas Gleixner
2010-10-11 22:47 ` H. Peter Anvin
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2010-10-11 22:33 UTC
To: Benjamin Herrenschmidt
Cc: Marcio Saito, Jiri Slaby, John Stultz, x86, linux-kernel,
Tim Pepper, Ingo Molnar, Paul Mackerras, H. Peter Anvin,
Avantika Mathur, linuxppc-dev
On Tue, 12 Oct 2010, Benjamin Herrenschmidt wrote:
> On Mon, 2010-10-11 at 13:11 -0700, Tim Pepper wrote:
> > I'm not necessarily wanting to open up the age old question of "what is
> > a good HZ", but we were doing some testing on timer tick overheads for
> > HPC applications and this came up...
>
> Note that this is also very useful when working on CPU prototypes
> implemented in FPGAs and running at something like 12 MHz :-)
/me hands benh 0.5$ for a FPGA upgrade
* Re: [RFC] [PATCH] allow low HZ values?
2010-10-11 22:33 ` Thomas Gleixner
@ 2010-10-11 22:47 ` H. Peter Anvin
0 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2010-10-11 22:47 UTC
To: Thomas Gleixner
Cc: Marcio Saito, Jiri Slaby, x86, linux-kernel, Tim Pepper,
Ingo Molnar, Paul Mackerras, John Stultz, Avantika Mathur,
linuxppc-dev
On 10/11/2010 03:33 PM, Thomas Gleixner wrote:
> On Tue, 12 Oct 2010, Benjamin Herrenschmidt wrote:
>
>> On Mon, 2010-10-11 at 13:11 -0700, Tim Pepper wrote:
>>> I'm not necessarily wanting to open up the age old question of "what is
>>> a good HZ", but we were doing some testing on timer tick overheads for
>>> HPC applications and this came up...
>>
>> Note that this is also very useful when working on CPU prototypes
>> implemented in FPGAs and running at something like 12 MHz :-)
>
> /me hands benh 0.5$ for a FPGA upgrade
That's often not possible if the CPU cannot be mapped onto a single FPGA
(either because the core is too large, multiple cores are tested, or
because debugging logic is included). The interconnects slow
things down tremendously.
-hpa
Thread overview: 8+ messages
2010-10-11 20:11 [RFC] [PATCH] allow low HZ values? Tim Pepper
2010-10-11 20:32 ` Thomas Gleixner
2010-10-11 21:11 ` Tim Pepper
2010-10-12 14:31 ` Andi Kleen
2010-10-12 16:56 ` Thomas Gleixner
2010-10-11 22:30 ` Benjamin Herrenschmidt
2010-10-11 22:33 ` Thomas Gleixner
2010-10-11 22:47 ` H. Peter Anvin