* [PATCH 4.19 v2 2/3] sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
2024-02-22 17:05 [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value Petr Vorel
@ 2024-02-22 17:05 ` Petr Vorel
2024-02-22 17:05 ` [PATCH 4.19 v2 3/3] sched/rt: Disallow writing invalid values to sched_rt_period_us Petr Vorel
2024-02-23 16:25 ` [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value Greg Kroah-Hartman
2 siblings, 0 replies; 4+ messages in thread
From: Petr Vorel @ 2024-02-22 17:05 UTC (permalink / raw)
To: stable
Cc: Cyril Hrubis, Greg Kroah-Hartman, Sasha Levin, Ingo Molnar,
Peter Zijlstra, Petr Vorel, Mel Gorman
From: Cyril Hrubis <chrubis@suse.cz>
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is
<= 0. However after reading from this file we always got the last value
written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms
$ cat /proc/sys/kernel/sched_rr_timeslice_ms
-1
Fix this by setting the variable that holds the sysctl file value to the
jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
kernel/sched/rt.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ce4594215728..2ea4da8c5f3a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2735,6 +2735,9 @@ int sched_rr_handler(struct ctl_table *table, int write,
sched_rr_timeslice =
sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE :
msecs_to_jiffies(sysctl_sched_rr_timeslice);
+
+ if (sysctl_sched_rr_timeslice <= 0)
+ sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE);
}
mutex_unlock(&mutex);
--
2.35.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 4.19 v2 3/3] sched/rt: Disallow writing invalid values to sched_rt_period_us
2024-02-22 17:05 [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value Petr Vorel
2024-02-22 17:05 ` [PATCH 4.19 v2 2/3] sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset Petr Vorel
@ 2024-02-22 17:05 ` Petr Vorel
2024-02-23 16:25 ` [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value Greg Kroah-Hartman
2 siblings, 0 replies; 4+ messages in thread
From: Petr Vorel @ 2024-02-22 17:05 UTC (permalink / raw)
To: stable
Cc: Cyril Hrubis, Greg Kroah-Hartman, Sasha Levin, Ingo Molnar,
Peter Zijlstra, Petr Vorel
From: Cyril Hrubis <chrubis@suse.cz>
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken
because:
- the sysclt_sched_rt_period is declared as unsigned int
- parsed by proc_do_intvec()
- the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a
unsigned integer that were later on interpreted as large positive
integers which did passed the check:
if (sysclt_sched_rt_period <= 0)
return EINVAL;
This commit fixes the parsing by setting explicit range for both
perid_us and runtime_us into the sched_rt_sysctls table and processes
the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the
period value we would have to split the proc_handler and use
proc_douintvec() for it however even the
Documentation/scheduller/sched-rt-group.rst describes the range as 1 to
INT_MAX.
As far as I can tell the only problem this causes is that the sysctl
file allows writing negative values which when read back may confuse
userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@suse.cz/
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz
[ pvorel: rebased for 4.19 ]
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
kernel/sched/rt.c | 5 +----
kernel/sysctl.c | 5 +++++
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2ea4da8c5f3a..deb9366e4f30 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2658,9 +2658,6 @@ static int sched_rt_global_constraints(void)
static int sched_rt_global_validate(void)
{
- if (sysctl_sched_rt_period <= 0)
- return -EINVAL;
-
if ((sysctl_sched_rt_runtime != RUNTIME_INF) &&
(sysctl_sched_rt_runtime > sysctl_sched_rt_period))
return -EINVAL;
@@ -2690,7 +2687,7 @@ int sched_rt_handler(struct ctl_table *table, int write,
old_period = sysctl_sched_rt_period;
old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos);
+ ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) {
ret = sched_rt_global_validate();
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4bb194f096ec..6ce9f10b9c7d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -127,6 +127,7 @@ static int zero;
static int __maybe_unused one = 1;
static int __maybe_unused two = 2;
static int __maybe_unused four = 4;
+static int int_max = INT_MAX;
static unsigned long zero_ul;
static unsigned long one_ul = 1;
static unsigned long long_max = LONG_MAX;
@@ -464,6 +465,8 @@ static struct ctl_table kern_table[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = sched_rt_handler,
+ .extra1 = &one,
+ .extra2 = &int_max,
},
{
.procname = "sched_rt_runtime_us",
@@ -471,6 +474,8 @@ static struct ctl_table kern_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = sched_rt_handler,
+ .extra1 = &neg_one,
+ .extra2 = &int_max,
},
{
.procname = "sched_rr_timeslice_ms",
--
2.35.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value
2024-02-22 17:05 [PATCH 4.19 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value Petr Vorel
2024-02-22 17:05 ` [PATCH 4.19 v2 2/3] sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset Petr Vorel
2024-02-22 17:05 ` [PATCH 4.19 v2 3/3] sched/rt: Disallow writing invalid values to sched_rt_period_us Petr Vorel
@ 2024-02-23 16:25 ` Greg Kroah-Hartman
2 siblings, 0 replies; 4+ messages in thread
From: Greg Kroah-Hartman @ 2024-02-23 16:25 UTC (permalink / raw)
To: Petr Vorel
Cc: stable, Cyril Hrubis, Sasha Levin, Ingo Molnar, Peter Zijlstra,
Mel Gorman
On Thu, Feb 22, 2024 at 06:05:38PM +0100, Petr Vorel wrote:
> From: Cyril Hrubis <chrubis@suse.cz>
>
> [ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
>
> There is a 10% rounding error in the intial value of the
> sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
>
> This was found with LTP test sched_rr_get_interval01:
>
> sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
> sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
> sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
> sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
> sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
> sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
>
> What this test does is to compare the return value from the
> sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
> fails if they do not match.
>
> The problem it found is the intial sysctl file value which was computed as:
>
> static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
>
> which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
> introduces 10% rounding error for CONFIG_HZ_300:
>
> (MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
>
> (1000 / 300) * (100 * 300 / 1000)
>
> 3 * 30 = 90
>
> This can be easily fixed by reversing the order of the multiplication
> and division. After this fix we get:
>
> (MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
>
> (1000 * (100 * 300 / 1000)) / 300
>
> (1000 * 30) / 300 = 100
>
> Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Petr Vorel <pvorel@suse.cz>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Tested-by: Petr Vorel <pvorel@suse.cz>
> Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
> [ pvorel: rebased for 4.19 ]
> Signed-off-by: Petr Vorel <pvorel@suse.cz>
> ---
> kernel/sched/rt.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 394c66442cff..ce4594215728 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -8,7 +8,7 @@
> #include "pelt.h"
>
> int sched_rr_timeslice = RR_TIMESLICE;
> -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
> +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
>
> static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
>
> --
> 2.35.3
>
>
All now queued up, thanks!
greg k-h
^ permalink raw reply [flat|nested] 4+ messages in thread