* [PATCH v3 0/1] hrtimer: More fixes for handling of timer slack of rt tasks
@ 2024-08-13 7:29 Felix Moessbauer
2024-08-13 7:29 ` [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks Felix Moessbauer
0 siblings, 1 reply; 5+ messages in thread
From: Felix Moessbauer @ 2024-08-13 7:29 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Frederic Weisbecker, Anna-Maria Behnsen,
jan.kiszka, Sebastian Andrzej Siewior, qyousef, Felix Moessbauer
This series fixes the (hopefully) last location of an incorrectly
handled timer slack on rt tasks in hrtimer_start_range_ns(), which was
uncovered by a userland change in glibc 2.33. It does so by removing all special
handling of timerslack on RT tasks by instead setting the timerslack to 0 when
switching to an RT scheduling policy.
Changes since v2:
- complete re-design of approach (as proposed by Thomas Gleixner): Instead of
overwriting the timerslack in the hrtimer setup path, we now set the timerslack
to 0 when switching to an RT scheduling policy.
Changes since v1:
- drop patch "hrtimer: Document, that PI boosted tasks have no timer slack", as
this behavior is incorrect and is already adressed in 20240610192018.1567075-1-qyousef@layalina.io
- use task_is_realtime() instead of rt_task()
- fix style of commit message
v1 discussion: https://lore.kernel.org/lkml/20240805124116.21394-1-felix.moessbauer@siemens.com
v2 discussion: https://lore.kernel.org/lkml/20240805140930.29462-1-felix.moessbauer@siemens.com
Best regards,
Felix Moessbauer
Siemens AG
Felix Moessbauer (1):
hrtimer: use and report correct timerslack values for realtime tasks
fs/proc/base.c | 9 +++++----
fs/select.c | 12 +++---------
kernel/sched/syscalls.c | 8 ++++++++
kernel/sys.c | 2 ++
kernel/time/hrtimer.c | 18 +++---------------
5 files changed, 21 insertions(+), 28 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks
2024-08-13 7:29 [PATCH v3 0/1] hrtimer: More fixes for handling of timer slack of rt tasks Felix Moessbauer
@ 2024-08-13 7:29 ` Felix Moessbauer
2024-08-13 9:45 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Felix Moessbauer @ 2024-08-13 7:29 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Frederic Weisbecker, Anna-Maria Behnsen,
jan.kiszka, Sebastian Andrzej Siewior, qyousef, Felix Moessbauer
The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.
This logic was inconsitently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.
This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". The special handling of timerslack on
rt tasks in downstream users is removed as well.
Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
---
fs/proc/base.c | 9 +++++----
fs/select.c | 12 +++---------
kernel/sched/syscalls.c | 8 ++++++++
kernel/sys.c | 2 ++
kernel/time/hrtimer.c | 18 +++---------------
5 files changed, 21 insertions(+), 28 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 72a1acd03675..7ff3618c1e6f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2569,10 +2569,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
}
task_lock(p);
- if (slack_ns == 0)
- p->timer_slack_ns = p->default_timer_slack_ns;
- else
- p->timer_slack_ns = slack_ns;
+ if (task_is_realtime(p))
+ slack_ns = 0;
+ else if (slack_ns == 0)
+ slack_ns = p->default_timer_slack_ns;
+ p->timer_slack_ns = slack_ns;
task_unlock(p);
out:
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..153124ed50fd 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -77,19 +77,13 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
{
u64 ret;
struct timespec64 now;
-
- /*
- * Realtime tasks get a slack of 0 for obvious reasons.
- */
-
- if (rt_task(current))
- return 0;
+ u64 slack = current->timer_slack_ns;
ktime_get_ts64(&now);
now = timespec64_sub(*tv, now);
ret = __estimate_accuracy(&now);
- if (ret < current->timer_slack_ns)
- return current->timer_slack_ns;
+ if (ret < slack || slack == 0)
+ return slack;
return ret;
}
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index ae1b42775ef9..195d2f2834a9 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -406,6 +406,14 @@ static void __setscheduler_params(struct task_struct *p,
else if (fair_policy(policy))
p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+ /* rt-policy tasks do not have a timerslack */
+ if (task_is_realtime(p)) {
+ p->timer_slack_ns = 0;
+ } else if (p->timer_slack_ns == 0) {
+ /* when switching back to non-rt policy, restore timerslack */
+ p->timer_slack_ns = p->default_timer_slack_ns;
+ }
+
/*
* __sched_setscheduler() ensures attr->sched_priority == 0 when
* !rt_policy. Always setting this ensures that things like
diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..e3c4cffb520c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2557,6 +2557,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = current->timer_slack_ns;
break;
case PR_SET_TIMERSLACK:
+ if (task_is_realtime(current))
+ break;
if (arg2 <= 0)
current->timer_slack_ns =
current->default_timer_slack_ns;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index b8ee320208d4..18eea0be706a 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2072,14 +2072,9 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
struct restart_block *restart;
struct hrtimer_sleeper t;
int ret = 0;
- u64 slack;
-
- slack = current->timer_slack_ns;
- if (rt_task(current))
- slack = 0;
hrtimer_init_sleeper_on_stack(&t, clockid, mode);
- hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
+ hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns);
ret = do_nanosleep(&t, mode);
if (ret != -ERESTART_RESTARTBLOCK)
goto out;
@@ -2249,7 +2244,7 @@ void __init hrtimers_init(void)
/**
* schedule_hrtimeout_range_clock - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
* @clock_id: timer clock to be used
*/
@@ -2276,13 +2271,6 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta,
return -EINTR;
}
- /*
- * Override any slack passed by the user if under
- * rt contraints.
- */
- if (rt_task(current))
- delta = 0;
-
hrtimer_init_sleeper_on_stack(&t, clock_id, mode);
hrtimer_set_expires_range_ns(&t.timer, *expires, delta);
hrtimer_sleeper_start_expires(&t, mode);
@@ -2302,7 +2290,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock);
/**
* schedule_hrtimeout_range - sleep until timeout
* @expires: timeout value (ktime_t)
- * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks
+ * @delta: slack in expires timeout (ktime_t)
* @mode: timer mode
*
* Make the current task sleep until the given expiry time has
--
2.39.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks
2024-08-13 7:29 ` [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks Felix Moessbauer
@ 2024-08-13 9:45 ` Thomas Gleixner
2024-08-13 10:23 ` MOESSBAUER, Felix
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2024-08-13 9:45 UTC (permalink / raw)
To: Felix Moessbauer, linux-kernel
Cc: Frederic Weisbecker, Anna-Maria Behnsen, jan.kiszka,
Sebastian Andrzej Siewior, qyousef, Felix Moessbauer
On Tue, Aug 13 2024 at 09:29, Felix Moessbauer wrote:
> @@ -2569,10 +2569,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
> }
>
> task_lock(p);
> - if (slack_ns == 0)
> - p->timer_slack_ns = p->default_timer_slack_ns;
> - else
> - p->timer_slack_ns = slack_ns;
> + if (task_is_realtime(p))
> + slack_ns = 0;
This should respect the user supplied value, i.e.
if (!task_is_realtime(p) && !slack_ns)
slack_ns = p->default_timer_slack_ns;
> + else if (slack_ns == 0)
> + slack_ns = p->default_timer_slack_ns;
> + p->timer_slack_ns = slack_ns;
> task_unlock(p);
>
> out:
> diff --git a/fs/select.c b/fs/select.c
> index 9515c3fa1a03..153124ed50fd 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -77,19 +77,13 @@ u64 select_estimate_accuracy(struct timespec64 *tv)
> {
> u64 ret;
> struct timespec64 now;
> -
> - /*
> - * Realtime tasks get a slack of 0 for obvious reasons.
> - */
> -
> - if (rt_task(current))
> - return 0;
> + u64 slack = current->timer_slack_ns;
>
> ktime_get_ts64(&now);
> now = timespec64_sub(*tv, now);
> ret = __estimate_accuracy(&now);
> - if (ret < current->timer_slack_ns)
> - return current->timer_slack_ns;
> + if (ret < slack || slack == 0)
> + return slack;
Seriously? Do all the calculations first and then discard them when
slack is 0?
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index ae1b42775ef9..195d2f2834a9 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -406,6 +406,14 @@ static void __setscheduler_params(struct task_struct *p,
> else if (fair_policy(policy))
> p->static_prio = NICE_TO_PRIO(attr->sched_nice);
>
> + /* rt-policy tasks do not have a timerslack */
> + if (task_is_realtime(p)) {
> + p->timer_slack_ns = 0;
> + } else if (p->timer_slack_ns == 0) {
> + /* when switching back to non-rt policy, restore timerslack */
> + p->timer_slack_ns = p->default_timer_slack_ns;
> + }
> +
> /*
> * __sched_setscheduler() ensures attr->sched_priority == 0 when
> * !rt_policy. Always setting this ensures that things like
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 3a2df1bd9f64..e3c4cffb520c 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2557,6 +2557,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> error = current->timer_slack_ns;
> break;
> case PR_SET_TIMERSLACK:
> + if (task_is_realtime(current))
> + break;
Why are you declaring that a RT task has to have 0 slack if we are
lifting the hard coded slack zeroing in the hrtimer functions?
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks
2024-08-13 9:45 ` Thomas Gleixner
@ 2024-08-13 10:23 ` MOESSBAUER, Felix
2024-08-14 10:42 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: MOESSBAUER, Felix @ 2024-08-13 10:23 UTC (permalink / raw)
To: tglx@linutronix.de, linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io, frederic@kernel.org, Kiszka, Jan,
bigeasy@linutronix.de, anna-maria@linutronix.de
On Tue, 2024-08-13 at 11:45 +0200, Thomas Gleixner wrote:
> On Tue, Aug 13 2024 at 09:29, Felix Moessbauer wrote:
> > @@ -2569,10 +2569,11 @@ static ssize_t timerslack_ns_write(struct
> > file *file, const char __user *buf,
> > }
> >
> > task_lock(p);
> > - if (slack_ns == 0)
> > - p->timer_slack_ns = p->default_timer_slack_ns;
> > - else
> > - p->timer_slack_ns = slack_ns;
> > + if (task_is_realtime(p))
> > + slack_ns = 0;
>
> This should respect the user supplied value, i.e.
Ok, but then we need to update the man page as well (see below).
>
> if (!task_is_realtime(p) && !slack_ns)
> slack_ns = p->default_timer_slack_ns;
>
> > + else if (slack_ns == 0)
> > + slack_ns = p->default_timer_slack_ns;
> > + p->timer_slack_ns = slack_ns;
> > task_unlock(p);
> >
> > out:
> > diff --git a/fs/select.c b/fs/select.c
> > index 9515c3fa1a03..153124ed50fd 100644
> > --- a/fs/select.c
> > +++ b/fs/select.c
> > @@ -77,19 +77,13 @@ u64 select_estimate_accuracy(struct timespec64
> > *tv)
> > {
> > u64 ret;
> > struct timespec64 now;
> > -
> > - /*
> > - * Realtime tasks get a slack of 0 for obvious reasons.
> > - */
> > -
> > - if (rt_task(current))
> > - return 0;
> > + u64 slack = current->timer_slack_ns;
> >
> > ktime_get_ts64(&now);
> > now = timespec64_sub(*tv, now);
> > ret = __estimate_accuracy(&now);
> > - if (ret < current->timer_slack_ns)
> > - return current->timer_slack_ns;
> > + if (ret < slack || slack == 0)
> > + return slack;
>
> Seriously? Do all the calculations first and then discard them when
> slack is 0?
Ok, I'll short circuit.
>
> > diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> > index ae1b42775ef9..195d2f2834a9 100644
> > --- a/kernel/sched/syscalls.c
> > +++ b/kernel/sched/syscalls.c
> > @@ -406,6 +406,14 @@ static void __setscheduler_params(struct
> > task_struct *p,
> > else if (fair_policy(policy))
> > p->static_prio = NICE_TO_PRIO(attr->sched_nice);
> >
> > + /* rt-policy tasks do not have a timerslack */
> > + if (task_is_realtime(p)) {
> > + p->timer_slack_ns = 0;
> > + } else if (p->timer_slack_ns == 0) {
> > + /* when switching back to non-rt policy, restore
> > timerslack */
> > + p->timer_slack_ns = p->default_timer_slack_ns;
> > + }
> > +
> > /*
> > * __sched_setscheduler() ensures attr->sched_priority == 0
> > when
> > * !rt_policy. Always setting this ensures that things like
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 3a2df1bd9f64..e3c4cffb520c 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2557,6 +2557,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> > long, arg2, unsigned long, arg3,
> > error = current->timer_slack_ns;
> > break;
> > case PR_SET_TIMERSLACK:
> > + if (task_is_realtime(current))
> > + break;
>
> Why are you declaring that a RT task has to have 0 slack if we are
> lifting the hard coded slack zeroing in the hrtimer functions?
This is what the manpage states [1]:
+ Timer slack is not applied to threads that are scheduled under a
+ real-time scheduling policy (see sched_setscheduler(2)).
[1] https://man7.org/linux/man-pages/man2/PR_SET_TIMERSLACK.2const.html
Best regards,
Felix
>
> Thanks,
>
> tglx
--
Siemens AG, Technology
Linux Expert Center
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks
2024-08-13 10:23 ` MOESSBAUER, Felix
@ 2024-08-14 10:42 ` Thomas Gleixner
0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2024-08-14 10:42 UTC (permalink / raw)
To: MOESSBAUER, Felix, linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io, frederic@kernel.org, Kiszka, Jan,
bigeasy@linutronix.de, anna-maria@linutronix.de
On Tue, Aug 13 2024 at 10:23, Felix MOESSBAUER wrote:
> On Tue, 2024-08-13 at 11:45 +0200, Thomas Gleixner wrote:
>> Why are you declaring that a RT task has to have 0 slack if we are
>> lifting the hard coded slack zeroing in the hrtimer functions?
>
> This is what the manpage states [1]:
>
> + Timer slack is not applied to threads that are scheduled under a
> + real-time scheduling policy (see sched_setscheduler(2)).
>
> [1] https://man7.org/linux/man-pages/man2/PR_SET_TIMERSLACK.2const.html
Fair enough. Mention it in the changelog please.
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-08-14 10:42 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 7:29 [PATCH v3 0/1] hrtimer: More fixes for handling of timer slack of rt tasks Felix Moessbauer
2024-08-13 7:29 ` [PATCH v3 1/1] hrtimer: use and report correct timerslack values for realtime tasks Felix Moessbauer
2024-08-13 9:45 ` Thomas Gleixner
2024-08-13 10:23 ` MOESSBAUER, Felix
2024-08-14 10:42 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox