* [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment
@ 2013-06-18 23:58 Kevin Hilman
  2013-06-18 23:58 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
  2013-06-19 18:42 ` [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment Frederic Weisbecker
  0 siblings, 2 replies; 10+ messages in thread
From: Kevin Hilman @ 2013-06-18 23:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linaro-kernel, Ingo Molnar, Peter Zijlstra, Clark Williams,
	Kevin Hilman, Tony Luck, Andrew Morton, Kees Cook, Mel Gorman,
	Rik van Riel, open list

Allow sysctl override of sched_tick_max_deferment in order to ease
finding/fixing the remaining issues with full nohz.

The value to be written is in jiffies, and -1 means the max deferment
is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 include/linux/sched/sysctl.h | 3 +++
 kernel/sched/core.c          | 6 +++++-
 kernel/sched/debug.c         | 1 +
 kernel/sysctl.c              | 9 +++++++++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index bf8086b..2ad07bb 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -57,6 +57,9 @@ extern unsigned int sysctl_sched_nr_migrate;
 extern unsigned int sysctl_sched_time_avg;
 extern unsigned int sysctl_timer_migration;
 extern unsigned int sysctl_sched_shares_window;
+#ifdef CONFIG_NO_HZ_FULL
+extern unsigned int sysctl_sched_tick_max_deferment;
+#endif
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *length,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1a27f9..b5d3f99 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2751,12 +2751,16 @@ void scheduler_tick(void)
  * balancing, etc... continue to move forward, even
  * with a very low granularity.
  */
+unsigned int sysctl_sched_tick_max_deferment = HZ;
 u64 scheduler_tick_max_deferment(void)
 {
 	struct rq *rq = this_rq();
 	unsigned long next, now = ACCESS_ONCE(jiffies);
 
-	next = rq->last_sched_tick + HZ;
+	if (sysctl_sched_tick_max_deferment == -1)
+		return KTIME_MAX;
+
+	next = rq->last_sched_tick + sysctl_sched_tick_max_deferment;
 
 	if (time_before_eq(next, now))
 		return 0;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 75024a6..f445ab9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -374,6 +374,7 @@ static void sched_debug_header(struct seq_file *m)
 	PN(sysctl_sched_wakeup_granularity);
 	P(sysctl_sched_child_runs_first);
 	P(sysctl_sched_features);
+	P(sysctl_sched_tick_max_deferment);
 #undef PN
 #undef P
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9edcf45..fb0b7d8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -393,6 +393,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif /* CONFIG_NUMA_BALANCING */
+#ifdef CONFIG_NO_HZ_FULL
+	{
+		.procname	= "sched_tick_max_deferment",
+		.data		= &sysctl_sched_tick_max_deferment,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif /* CONFIG_NO_HZ_FULL */
 #endif /* CONFIG_SCHED_DEBUG */
 	{
 		.procname	= "sched_rt_period_us",
-- 
1.8.3
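
A note on the -1 sentinel described in the changelog: the tunable is
declared unsigned int, so -1 ends up stored as UINT_MAX and the check in
scheduler_tick_max_deferment() relies on the usual arithmetic conversions.
The following is only a userspace sketch of that behaviour, not kernel
code. The names tick_max_deferment, store_as_int() and
deferment_disabled() are hypothetical, and the assumption that an
int-oriented handler such as proc_dointvec writes the value as a plain
int is mine.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for sysctl_sched_tick_max_deferment. */
static unsigned int tick_max_deferment = 100;

/*
 * Models an int-oriented handler storing a value into the variable.
 * Converting a negative int to unsigned int wraps modulo UINT_MAX + 1,
 * so -1 becomes UINT_MAX.
 */
static void store_as_int(int value)
{
	tick_max_deferment = (unsigned int)value;
}

/* Same result as the patch's "== -1" test after the usual conversions. */
static bool deferment_disabled(void)
{
	return tick_max_deferment == UINT_MAX;
}
```

Calling store_as_int(-1) makes deferment_disabled() return true, while
any ordinary jiffies value such as 250 leaves it false. A side effect of
this encoding is that UINT_MAX itself cannot be used as a real deferment.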


* [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-06-18 23:58 [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment Kevin Hilman
@ 2013-06-18 23:58 ` Kevin Hilman
  2013-06-19 19:06   ` Frederic Weisbecker
  2013-06-19 18:42 ` [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment Frederic Weisbecker
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Hilman @ 2013-06-18 23:58 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linaro-kernel, Ingo Molnar, Peter Zijlstra, open list

The conversion of the max deferment from usecs to nsecs can easily
overflow on platforms where a long is 32 bits.  To fix, cast the usecs
value to u64 before multiplying by NSEC_PER_USEC.

This was discovered on a 32-bit ARM platform when extending the max
deferment value.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5d3f99..b506722 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2765,7 +2765,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 #endif
 
-- 
1.8.3


* Re: [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment
  2013-06-18 23:58 [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment Kevin Hilman
  2013-06-18 23:58 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
@ 2013-06-19 18:42 ` Frederic Weisbecker
  2013-06-19 20:34   ` Kevin Hilman
  1 sibling, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-19 18:42 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: linaro-kernel, Ingo Molnar, Peter Zijlstra, Clark Williams,
	Tony Luck, Andrew Morton, Kees Cook, Mel Gorman, Rik van Riel,
	open list

On Tue, Jun 18, 2013 at 04:58:28PM -0700, Kevin Hilman wrote:
> Allow sysctl override of sched_tick_max_deferment in order to ease
> finding/fixing the remaining issues with full nohz.
> 
> The value to be written is in jiffies, and -1 means the max deferment
> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>

This looks like a useful thing but I wonder if a debugfs file would
be more appropriate than sysctl.

The scheduler tick max deferment is supposed to be a temporary
hack so we probably don't want to bring a real user ABI for that.

I believe sysctl is for permanent ABIs, right?

* Re: [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-06-18 23:58 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
@ 2013-06-19 19:06   ` Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-19 19:06 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linaro-kernel, Ingo Molnar, Peter Zijlstra, open list

On Tue, Jun 18, 2013 at 04:58:29PM -0700, Kevin Hilman wrote:
> The conversion of the max deferment from usecs to nsecs can easily
> overflow on platforms where a long is 32 bits.  To fix, cast the usecs
> value to u64 before multiplying by NSEC_PER_USEC.
> 
> This was discovered on a 32-bit ARM platform when extending the max
> deferment value.
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>

Right, if we make it tunable we need that patch.

Thanks!

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
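
To put rough numbers on why the fix matters once the value is tunable:
with a 32-bit multiply, the nanosecond result only fits while the
deferment stays below about 4.29 seconds, so the default of HZ jiffies
(one second) is safe but larger settings are not. The sketch below is
userspace C under stated assumptions, not the kernel implementation:
sketch_jiffies_to_usecs() and max_safe_jiffies() are hypothetical
helpers, and the conversion is simplified to the case where HZ divides
USEC_PER_SEC evenly.

```c
#include <stdint.h>

#define USEC_PER_SEC	1000000u
#define NSEC_PER_USEC	1000u

/* Simplified conversion; assumes hz divides USEC_PER_SEC evenly. */
static uint32_t sketch_jiffies_to_usecs(uint32_t j, uint32_t hz)
{
	return (USEC_PER_SEC / hz) * j;
}

/* Largest jiffies count whose nanosecond value still fits in 32 bits. */
static uint32_t max_safe_jiffies(uint32_t hz)
{
	/* 4294967 usecs, i.e. about 4.29 seconds. */
	uint32_t max_usecs = UINT32_MAX / NSEC_PER_USEC;

	return max_usecs / (USEC_PER_SEC / hz);
}
```

Under these assumptions the limit works out to 429 jiffies at HZ=100,
1073 at HZ=250 and 4294 at HZ=1000.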

> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b5d3f99..b506722 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2765,7 +2765,7 @@ u64 scheduler_tick_max_deferment(void)
>  	if (time_before_eq(next, now))
>  		return 0;
>  
> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>  }
>  #endif
>  
> -- 
> 1.8.3
> 

* Re: [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment
  2013-06-19 18:42 ` [PATCH 1/2] sched/nohz: add sysctl control over sched_tick_max_deferment Frederic Weisbecker
@ 2013-06-19 20:34   ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2013-06-19 20:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linaro-kernel, Ingo Molnar, Peter Zijlstra, Clark Williams,
	Tony Luck, Andrew Morton, Kees Cook, Mel Gorman, Rik van Riel,
	open list

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Tue, Jun 18, 2013 at 04:58:28PM -0700, Kevin Hilman wrote:
>> Allow sysctl override of sched_tick_max_deferment in order to ease
>> finding/fixing the remaining issues with full nohz.
>> 
>> The value to be written is in jiffies, and -1 means the max deferment
>> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
>> 
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Signed-off-by: Kevin Hilman <khilman@linaro.org>
>
> This looks like a useful thing but I wonder if a debugfs file would
> be more appropriate than sysctl.
>
> The scheduler tick max deferment is supposed to be a temporary
> hack so we probably don't want to bring a real user ABI for that.

I wondered about that as well, but I wasn't sure if the existing knobs
under CONFIG_SCHED_DEBUG (sched_min_granularity_ns, sched_latency_ns,
etc.) are considered permanent ABI, or optional debugging tools.

This new option is inside CONFIG_SCHED_DEBUG along with the others, but
if debugfs is preferred I can move it there.  It seems strange though to
just have this knob in debugfs and the rest in sysctl under
CONFIG_SCHED_DEBUG.

Thanks,

Kevin



* [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-09-16 22:43 [PATCH 1/2] sched/nohz: add debugfs " Kevin Hilman
@ 2013-09-16 22:43   ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2013-09-16 22:43 UTC (permalink / raw)
  To: linux-arm-kernel

The conversion of the max deferment from usecs to nsecs can easily
overflow on platforms where a long is 32 bits.  To fix, cast the usecs
value to u64 before multiplying by NSEC_PER_USEC.

This was discovered on a 32-bit ARM platform when extending the max
deferment value.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b1fe3e..3d7c80e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 
 static __init int sched_nohz_full_init_debug(void)
-- 
1.8.3

* [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
@ 2013-09-16 22:43   ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2013-09-16 22:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-arm-kernel, linaro-kernel, Paul McKenney, linux-kernel

The conversion of the max deferment from usecs to nsecs can easily
overflow on platforms where a long is 32 bits.  To fix, cast the usecs
value to u64 before multiplying by NSEC_PER_USEC.

This was discovered on a 32-bit ARM platform when extending the max
deferment value.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b1fe3e..3d7c80e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 
 static __init int sched_nohz_full_init_debug(void)
-- 
1.8.3


* [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-12-17 21:23 [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Kevin Hilman
@ 2013-12-17 21:23 ` Kevin Hilman
  2014-01-05 13:06   ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Hilman @ 2013-12-17 21:23 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner; +Cc: linux-kernel, linaro-kernel

The conversion of the max deferment from usecs to nsecs can easily
overflow on platforms where a long is 32 bits.  To fix, cast the usecs
value to u64 before multiplying by NSEC_PER_USEC.

This was discovered on a 32-bit ARM platform when extending the max
deferment value.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b1fe3e69fe4..3d7c80e1c4d9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 
 static __init int sched_nohz_full_init_debug(void)
-- 
1.8.3


* Re: [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
@ 2014-01-05 13:06   ` Frederic Weisbecker
  2014-01-06 18:27     ` Kevin Hilman
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2014-01-05 13:06 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Peter Zijlstra,
	Ingo Molnar

On Tue, Dec 17, 2013 at 01:23:08PM -0800, Kevin Hilman wrote:
> The conversion of the max deferment from usecs to nsecs can easily
> overflow on platforms where a long is 32 bits.  To fix, cast the usecs
> value to u64 before multiplying by NSEC_PER_USEC.
> 
> This was discovered on a 32-bit ARM platform when extending the max
> deferment value.
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4b1fe3e69fe4..3d7c80e1c4d9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
>  	if (time_before_eq(next, now))
>  		return 0;
>  
> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;

Just to be sure I understand the issue: the problem is that jiffies_to_usecs()
returns an unsigned int, which is then multiplied by NSEC_PER_USEC. If the result
of the multiplication is too big to be stored in an unsigned int, we overflow and
may lose the high part of the result. Right?

>  }
>  
>  static __init int sched_nohz_full_init_debug(void)
> -- 
> 1.8.3
> 

* Re: [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2014-01-05 13:06   ` Frederic Weisbecker
@ 2014-01-06 18:27     ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2014-01-06 18:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Peter Zijlstra,
	Ingo Molnar

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Tue, Dec 17, 2013 at 01:23:08PM -0800, Kevin Hilman wrote:
>> The conversion of the max deferment from usecs to nsecs can easily
>> overflow on platforms where a long is 32 bits.  To fix, cast the usecs
>> value to u64 before multiplying by NSEC_PER_USEC.
>> 
>> This was discovered on a 32-bit ARM platform when extending the max
>> deferment value.
>> 
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Signed-off-by: Kevin Hilman <khilman@linaro.org>
>> ---
>>  kernel/sched/core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 4b1fe3e69fe4..3d7c80e1c4d9 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
>>  	if (time_before_eq(next, now))
>>  		return 0;
>>  
>> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>
> Just to be sure I understand the issue: the problem is that jiffies_to_usecs()
> returns an unsigned int, which is then multiplied by NSEC_PER_USEC. If the result
> of the multiplication is too big to be stored in an unsigned int, we overflow and
> may lose the high part of the result. Right?

Correct.

Kevin
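
For reference, the truncation confirmed above can be reproduced in
userspace. This is only an illustrative sketch: it uses uint32_t to
model the 32-bit multiply on a platform where long is 32 bits, and the
helper names usecs_to_nsecs_32bit() and usecs_to_nsecs_widened() are
hypothetical, not kernel functions.

```c
#include <stdint.h>

#define NSEC_PER_USEC	1000u

/* Models the unfixed expression: the multiply is done in 32 bits. */
static uint64_t usecs_to_nsecs_32bit(uint32_t usecs)
{
	uint32_t product = usecs * NSEC_PER_USEC;	/* wraps modulo 2^32 */

	return product;
}

/* Models the fixed expression: widen first, then multiply in 64 bits. */
static uint64_t usecs_to_nsecs_widened(uint32_t usecs)
{
	return (uint64_t)usecs * NSEC_PER_USEC;
}
```

For a one-second deferment (1000000 usecs) both helpers return
1000000000. For five seconds (5000000 usecs) the 32-bit version returns
705032704 instead of 5000000000, because the high bits are dropped
before the value is widened to the u64 return type.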
