* [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork
@ 2026-05-21 2:52 Guanyou.Chen
2026-05-21 3:18 ` Cunlong Li
2026-05-21 7:04 ` Peter Zijlstra
0 siblings, 2 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-21 2:52 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, chenguanyou, linqiaoting, chunhui.li,
linux-kernel, linux-mm

Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
__setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
creates child threads, the children inherit timer_slack_ns=0 from the
parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
not restore timer_slack_ns, leaving the child permanently running with
zero slack.

Additionally, init_task never initialized default_timer_slack_ns, so a
plain struct copy from init would leave every process with
default_timer_slack_ns=0. The original fork code masked this by using the
parent's timer_slack_ns (50000) as the source for the child's
default_timer_slack_ns. After ed4fb6d7ef68, RT tasks have
timer_slack_ns=0, exposing this latent bug.

This causes unnecessary timer interrupts and increased power consumption,
as NORMAL threads with slack=0 prevent timer coalescing.

Fix this by:
1. Initializing default_timer_slack_ns=50000 in init_task.
2. In copy_process(), removing the incorrect default_timer_slack_ns
   override (dup_task_struct() already copies both timer_slack_ns and
   default_timer_slack_ns correctly from the parent).
3. In sched_fork(), restoring timer_slack_ns from default_timer_slack_ns
   when resetting from RT/DL to NORMAL policy.
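
The three steps above can be sketched as a small userspace model. This is
not kernel code: struct task_model, model_fork(), rt_parent() and the
'fixed' flag are hypothetical names used only for illustration, and the
model keeps just the two slack fields plus the policy bits that matter here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_SLACK_NS 50000ULL /* 50 usec, the value used in init_task */

/* Hypothetical userspace stand-in for the relevant task_struct fields. */
struct task_model {
	bool rt;                          /* task runs under an RT/DL policy */
	bool reset_on_fork;               /* SCHED_RESET_ON_FORK is set */
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/* Model of forking one child; 'fixed' selects the patched behaviour. */
static struct task_model model_fork(const struct task_model *parent, bool fixed)
{
	struct task_model child = *parent;    /* dup_task_struct(): plain copy */

	if (!fixed)                           /* old copy_process() override */
		child.default_timer_slack_ns = parent->timer_slack_ns;

	if (parent->reset_on_fork) {          /* sched_fork() reset path */
		if (child.rt && fixed)        /* step 3: restore the slack */
			child.timer_slack_ns = child.default_timer_slack_ns;
		child.rt = false;             /* child becomes SCHED_NORMAL */
		child.reset_on_fork = false;
	}
	return child;
}

/* An RT parent whose slack was forced to 0 when it switched policy. */
static struct task_model rt_parent(void)
{
	struct task_model p = {
		.rt = true,
		.reset_on_fork = true,
		.timer_slack_ns = 0,
		.default_timer_slack_ns = DEFAULT_SLACK_NS,
	};
	return p;
}
```

In this model, model_fork(&p, false) on the parent returned by rt_parent()
leaves the child with both fields at 0, while model_fork(&p, true) leaves
both at 50000.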

Before this fix (RT parent, RESET_ON_FORK, 32 child threads usleep(1)):
  child slack=0, avg_sleep=38us, ~832K interrupts/s

After this fix:
  child slack=50000, avg_sleep=88us, ~363K interrupts/s

Fixes: 6976675d9404 ("hrtimer: create a "timer_slack" field in the task struct")
Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
---
init/init_task.c | 1 +
kernel/fork.c | 2 --
kernel/sched/core.c | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/init_task.c b/init/init_task.c
index 5c838757fc10..57ff8dae9bfb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
INIT_CPU_TIMERS(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
+ .default_timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
#ifdef CONFIG_AUDIT
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..8358df80e11d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
retval = -EAGAIN;
#endif
- p->default_timer_slack_ns = current->timer_slack_ns;
-
#ifdef CONFIG_PSI
p->psi_flags = 0;
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..b1a241810ce0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
p->policy = SCHED_NORMAL;
p->static_prio = NICE_TO_PRIO(0);
p->rt_priority = 0;
+ p->timer_slack_ns = p->default_timer_slack_ns;
} else if (PRIO_TO_NICE(p->static_prio) < 0)
p->static_prio = NICE_TO_PRIO(0);
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork
2026-05-21 2:52 [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
@ 2026-05-21 3:18 ` Cunlong Li
2026-05-21 6:23 ` Guanyou Chen
2026-05-21 6:31 ` Guanyou Chen
2026-05-21 7:04 ` Peter Zijlstra
1 sibling, 2 replies; 11+ messages in thread
From: Cunlong Li @ 2026-05-21 3:18 UTC
To: Guanyou.Chen
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton, Thomas Gleixner, Felix Moessbauer,
Dietmar Eggemann, Steven Rostedt, Kees Cook, chenguanyou,
linqiaoting, chunhui.li, linux-kernel, linux-mm
On Thu, May 21, 2026 at 10:52:50AM +0800, Guanyou.Chen wrote:
> Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
> for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
> __setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
> creates child threads, the children inherit timer_slack_ns=0 from the
> parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
> not restore timer_slack_ns, leaving the child permanently running with
> zero slack.
>
> Additionally, init_task never initialized default_timer_slack_ns, so all
> processes in the system have default_timer_slack_ns=0 inherited from init.
> The original fork code masked this by using timer_slack_ns (50000) as the
> source for default_timer_slack_ns. After ed4fb6d7ef68, RT tasks have
> timer_slack_ns=0, exposing this latent bug.
>
> This causes unnecessary timer interrupts and increased power consumption,
> as NORMAL threads with slack=0 prevent timer coalescing.
>
> Fix this by:
> 1. Initializing default_timer_slack_ns=50000 in init_task.
> 2. In copy_process(), removing the incorrect default_timer_slack_ns
> override (dup_task_struct already copies both timer_slack_ns and
> default_timer_slack_ns correctly from the parent).
> 3. In sched_fork(), restoring timer_slack_ns from default_timer_slack_ns
> when resetting from RT/DL to NORMAL policy.
>
> Before this fix (RT parent, RESET_ON_FORK, 32 child threads usleep(1)):
> child slack=0, avg_sleep=38us, ~832K interrupts/s
>
> After this fix:
> child slack=50000, avg_sleep=88us, ~363K interrupts/s
>
> Fixes: 6976675d9404 ("hrtimer: create a "timer_slack" field in the task struct")
> Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
> Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
> Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
> Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
> ---
> init/init_task.c | 1 +
> kernel/fork.c | 2 --
> kernel/sched/core.c | 1 +
> 3 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/init/init_task.c b/init/init_task.c
> index 5c838757fc10..57ff8dae9bfb 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> INIT_CPU_TIMERS(init_task)
> .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
> .timer_slack_ns = 50000, /* 50 usec default slack */
> + .default_timer_slack_ns = 50000, /* 50 usec default slack */
> .thread_pid = &init_struct_pid,
> .thread_node = LIST_HEAD_INIT(init_signals.thread_head),
> #ifdef CONFIG_AUDIT
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 65113a304518..8358df80e11d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
> retval = -EAGAIN;
> #endif
>
> - p->default_timer_slack_ns = current->timer_slack_ns;

Hi Guanyou,

This changes behavior for normal (non-RT) tasks. If a process calls
prctl(PR_SET_TIMERSLACK, 200000) and then forks, the child currently
gets default_timer_slack_ns=200000 (the parent's effective slack).
With this removal, the child would get default_timer_slack_ns=50000
(the parent's original default), so a subsequent PR_SET_TIMERSLACK(0)
in the child would reset to a different value than before.

I think the fix should be narrowed to only handle the RT parent case:

	if (rt_or_dl_task_policy(current))
		p->default_timer_slack_ns = current->default_timer_slack_ns;
	else
		p->default_timer_slack_ns = current->timer_slack_ns;

Thanks
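
To make the difference concrete, the three candidate inheritance rules
discussed here can be compared in a small userspace model. The names
(struct slack_model, enum inherit_rule, child_default_slack()) are
hypothetical and exist only for this sketch; it models nothing beyond which
parent field feeds the child's default_timer_slack_ns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which parent field becomes the child's default_timer_slack_ns. */
enum inherit_rule {
	RULE_CURRENT,   /* existing code: parent's timer_slack_ns */
	RULE_V1_PATCH,  /* override removed: parent's default (struct copy) */
	RULE_NARROWED,  /* parent's default only when the parent is RT/DL */
};

/* Hypothetical stand-in for the parent task's relevant state. */
struct slack_model {
	bool rt_or_dl;
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

static uint64_t child_default_slack(const struct slack_model *parent,
				    enum inherit_rule rule)
{
	switch (rule) {
	case RULE_V1_PATCH:
		return parent->default_timer_slack_ns;
	case RULE_NARROWED:
		return parent->rt_or_dl ? parent->default_timer_slack_ns
					: parent->timer_slack_ns;
	case RULE_CURRENT:
	default:
		return parent->timer_slack_ns;
	}
}
```

For a normal parent with timer_slack_ns=200000 and
default_timer_slack_ns=50000, only RULE_V1_PATCH changes the child's
default (to 50000). For an RT parent with timer_slack_ns=0, only
RULE_CURRENT leaves the child with a default of 0.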
> -
> #ifdef CONFIG_PSI
> p->psi_flags = 0;
> #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..b1a241810ce0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
> p->policy = SCHED_NORMAL;
> p->static_prio = NICE_TO_PRIO(0);
> p->rt_priority = 0;
> + p->timer_slack_ns = p->default_timer_slack_ns;
> } else if (PRIO_TO_NICE(p->static_prio) < 0)
> p->static_prio = NICE_TO_PRIO(0);
>
> --
> 2.34.1
>
* Re: [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork
2026-05-21 3:18 ` Cunlong Li
@ 2026-05-21 6:23 ` Guanyou Chen
2026-05-21 6:31 ` Guanyou Chen
1 sibling, 0 replies; 11+ messages in thread
From: Guanyou Chen @ 2026-05-21 6:23 UTC
To: Cunlong Li, Guanyou.Chen, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Andrew Morton, Thomas Gleixner, Felix Moessbauer,
Dietmar Eggemann, Steven Rostedt, Kees Cook, chenguanyou,
linqiaoting, chunhui.li, linux-kernel, linux-mm

Hi Cunlong,

Thanks for looking at this.

You're right that this changes the behavior for non-RT tasks - the
child's default_timer_slack_ns would come from dup_task_struct (parent's
default_timer_slack_ns) rather than parent's timer_slack_ns.

However, looking at the original commit 6976675d9404 that introduced
default_timer_slack_ns, its purpose is described as a "reset target" for
prctl(PR_SET_TIMERSLACK, 0). There's no documented requirement that
children should inherit the parent's effective slack as their default.

That said, I'm happy to narrow the fix to only handle the RT parent
case if maintainers prefer preserving the existing behavior. Will wait
for their input.

Thanks
Guanyou
* Re: [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork
2026-05-21 2:52 [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-21 3:18 ` Cunlong Li
@ 2026-05-21 7:04 ` Peter Zijlstra
2026-05-21 7:35 ` [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks Guanyou.Chen
1 sibling, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2026-05-21 7:04 UTC
To: Guanyou.Chen
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Andrew Morton,
Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, chenguanyou, linqiaoting, chunhui.li,
linux-kernel, linux-mm
On Thu, May 21, 2026 at 10:52:50AM +0800, Guanyou.Chen wrote:
> diff --git a/init/init_task.c b/init/init_task.c
> index 5c838757fc10..57ff8dae9bfb 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> INIT_CPU_TIMERS(init_task)
> .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
> .timer_slack_ns = 50000, /* 50 usec default slack */
> + .default_timer_slack_ns = 50000, /* 50 usec default slack */
> .thread_pid = &init_struct_pid,
> .thread_node = LIST_HEAD_INIT(init_signals.thread_head),
> #ifdef CONFIG_AUDIT
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 65113a304518..8358df80e11d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
> retval = -EAGAIN;
> #endif
>
> - p->default_timer_slack_ns = current->timer_slack_ns;
> -
> #ifdef CONFIG_PSI
> p->psi_flags = 0;
> #endif

Cunlong makes a good point in that this changes behaviour. That said I
do find the current behaviour 'odd'.

*IF* we want to change this (and changing behaviour is always dodgy),
then it should be a separate patch with a separate justification.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..b1a241810ce0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
> p->policy = SCHED_NORMAL;
> p->static_prio = NICE_TO_PRIO(0);
> p->rt_priority = 0;
> + p->timer_slack_ns = p->default_timer_slack_ns;
> } else if (PRIO_TO_NICE(p->static_prio) < 0)
> p->static_prio = NICE_TO_PRIO(0);

Yes, this matches __setscheduler_params(). And yes, this wants to be
done.

Anyway, while looking at all this I found that the manpages specify
RESET_ON_FORK to apply to CAP_SYS properties; which is a tad awkward,
esp if we end up allowing unpriv access to DL (or even FIFO/RR when
isolated in a bandwidth group).

Additionally, it doesn't look like PR_SET_TIMERSLACK is CAP_SYS guarded
itself, so this is all a bit of a mess.
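
For reference, the userspace side of the interface under discussion is
just two prctl(2) operations. The sketch below sets the calling thread's
slack and reads it back; set_then_get_timer_slack() is a hypothetical
helper name, and nothing here requests or checks any capability. It is
written on the assumption that the caller is a normal (non-RT) thread; on
an RT/DL thread the value read back may not match the value requested.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Set the calling thread's timer slack, then read it back.
 * Returns 0 on success or a negative errno value on failure.
 * Passing slack_ns == 0 asks the kernel to reset to the thread's default.
 */
static int set_then_get_timer_slack(unsigned long slack_ns, long *out_ns)
{
	int cur;

	if (prctl(PR_SET_TIMERSLACK, slack_ns, 0UL, 0UL, 0UL) == -1)
		return -errno;

	/* PR_GET_TIMERSLACK reports the current slack as the return value. */
	cur = prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
	if (cur == -1)
		return -errno;

	*out_ns = cur;
	return 0;
}
```

Note that the helper changes the slack of the calling thread as a side
effect; calling it again with 0 resets the slack to the thread's default.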
* [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks
2026-05-21 7:04 ` Peter Zijlstra
@ 2026-05-21 7:35 ` Guanyou.Chen
2026-05-21 7:35 ` [PATCH 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-21 7:35 ` [PATCH 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks Guanyou.Chen
1 sibling, 2 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-21 7:35 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, chenguanyou, linqiaoting, chunhui.li,
linux-kernel, linux-mm

v2:
  - Split into two patches per Peter's suggestion
  - Patch 1: pure bug fix in sched_fork() (restore timer_slack_ns on reset)
  - Patch 2: fix default_timer_slack_ns inheritance + init_task init

When an RT task with SCHED_RESET_ON_FORK creates child threads, the
children inherit timer_slack_ns=0 and can never recover a proper slack
value. In the test below this results in roughly 2.3x as many timer
interrupts as necessary (~832K vs ~363K interrupts/s).

Patch 1 fixes the immediate issue by restoring timer_slack_ns in
sched_fork() when resetting from RT to NORMAL policy.

Patch 2 fixes the root cause: copy_process() was overwriting the
child's default_timer_slack_ns with the parent's timer_slack_ns (0 for
RT), and init_task never initialized default_timer_slack_ns.

With both patches applied (RT parent, RESET_ON_FORK, 32 child threads
usleep(1)):

  Before: child slack=0, avg_sleep=38us, ~832K interrupts/s
  After:  child slack=50000, avg_sleep=88us, ~363K interrupts/s

Guanyou.Chen (2):
sched: restore timer_slack_ns when resetting RT policy on fork
fork: fix default_timer_slack_ns inheritance from RT parent
init/init_task.c | 1 +
kernel/fork.c | 2 --
kernel/sched/core.c | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)
--
2.34.1
* [PATCH 1/2] sched: restore timer_slack_ns when resetting RT policy on fork
2026-05-21 7:35 ` [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks Guanyou.Chen
@ 2026-05-21 7:35 ` Guanyou.Chen
2026-05-21 7:35 ` [PATCH 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen
1 sibling, 0 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-21 7:35 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, chenguanyou, linqiaoting, chunhui.li,
linux-kernel, linux-mm

Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
__setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
creates child threads, the children inherit timer_slack_ns=0 from the
parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
not restore timer_slack_ns, leaving the child permanently running with
zero slack.

Fix this by restoring timer_slack_ns from default_timer_slack_ns in
sched_fork() when resetting from RT/DL to NORMAL policy, matching the
existing behavior in __setscheduler_params().

Note: this fix alone requires a correct default_timer_slack_ns to be
effective. See the following patch for that fix.
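
The dependency described in the note can be illustrated with a minimal
userspace model. It assumes that the existing override in copy_process()
runs before sched_fork() is called (an assumption about call order, not
something stated in this patch), and struct fork_model and
fork_with_restore_only() are hypothetical names used only here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two slack fields of a task. */
struct fork_model {
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/*
 * Child of an RT parent with SCHED_RESET_ON_FORK, with only the
 * sched_fork() restore from this patch applied.
 */
static struct fork_model fork_with_restore_only(struct fork_model parent)
{
	struct fork_model child = parent;     /* struct copy at fork */

	/* Unchanged copy_process() override: default taken from parent's slack. */
	child.default_timer_slack_ns = parent.timer_slack_ns;

	/* New restore in sched_fork(): slack taken from the child's default. */
	child.timer_slack_ns = child.default_timer_slack_ns;

	return child;
}
```

With an RT parent holding timer_slack_ns=0 and
default_timer_slack_ns=50000, the child in this model still ends up with
both fields at 0, which is why the restore needs the default to be correct
first.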
Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..b1a241810ce0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
p->policy = SCHED_NORMAL;
p->static_prio = NICE_TO_PRIO(0);
p->rt_priority = 0;
+ p->timer_slack_ns = p->default_timer_slack_ns;
} else if (PRIO_TO_NICE(p->static_prio) < 0)
p->static_prio = NICE_TO_PRIO(0);
--
2.34.1
* [PATCH 2/2] fork: fix default_timer_slack_ns inheritance from RT parent
2026-05-21 7:35 ` [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks Guanyou.Chen
2026-05-21 7:35 ` [PATCH 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
@ 2026-05-21 7:35 ` Guanyou.Chen
1 sibling, 0 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-21 7:35 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, chenguanyou, linqiaoting, chunhui.li,
linux-kernel, linux-mm

copy_process() sets the child's default_timer_slack_ns from the parent's
timer_slack_ns. Since commit ed4fb6d7ef68 ("hrtimer: Use and report
correct timerslack values for realtime tasks"), RT tasks have
timer_slack_ns forced to 0. This corrupts the child's
default_timer_slack_ns to 0, making prctl(PR_SET_TIMERSLACK, 0) unable
to restore a meaningful default.

Additionally, init_task never initialized default_timer_slack_ns (it was
implicitly 0). The original fork code masked this by always overwriting
the child's default from timer_slack_ns. With this fix, init_task must
explicitly initialize it.

Fix this by:
1. Initializing default_timer_slack_ns=50000 in init_task.
2. Removing the default_timer_slack_ns override in copy_process()
   (dup_task_struct() already copies both fields correctly from the
   parent).

The semantics of default_timer_slack_ns are those of a "reset target" for
prctl(PR_SET_TIMERSLACK, 0). No userspace API modifies it, so it should
propagate unchanged through the process tree via struct copy.

Fixes: 6976675d9404 ("hrtimer: create a "timer_slack" field in the task struct")
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
---
init/init_task.c | 1 +
kernel/fork.c | 2 --
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/init/init_task.c b/init/init_task.c
index 5c838757fc10..57ff8dae9bfb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
INIT_CPU_TIMERS(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
+ .default_timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
#ifdef CONFIG_AUDIT
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..8358df80e11d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
retval = -EAGAIN;
#endif
- p->default_timer_slack_ns = current->timer_slack_ns;
-
#ifdef CONFIG_PSI
p->psi_flags = 0;
#endif
--
2.34.1
* [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks
2026-05-21 7:04 ` Peter Zijlstra
2026-05-21 7:35 ` [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks Guanyou.Chen
@ 2026-05-22 13:09 ` Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-22 13:10 ` [PATCH v3 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen
1 sibling, 2 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-22 13:09 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, K Prateek Nayak, chenguanyou,
linqiaoting, chunhui.li, linux-kernel, linux-mm

v3:
  - Patch 2: only fix the RT/DL parent case; non-RT behaviour is unchanged.
    Addresses concerns from Cunlong Li and K Prateek Nayak.

v2:
  - Split into two patches per Peter's suggestion.

When an RT task with SCHED_RESET_ON_FORK creates child threads, the
children inherit timer_slack_ns=0 and can never recover a proper slack
value.

Per prctl(2), "Timer slack is not applied to threads that are scheduled
under a real-time scheduling policy." The RT parent's timer_slack_ns=0
is not a user-chosen value but a forced "not applicable" state.
Inheriting this as the child's default is a bug introduced by
ed4fb6d7ef68.

Patch 1 restores timer_slack_ns in sched_fork() when resetting from RT
to NORMAL policy.

Patch 2 changes copy_process() to use default_timer_slack_ns (which
preserves the pre-RT value) instead of timer_slack_ns (forced to 0) for
RT/DL parents.

With both patches applied (RT parent, RESET_ON_FORK, 32 child threads
usleep(1)):

  Before: child slack=0, avg_sleep=38us, ~832K interrupts/s
  After:  child slack=50000, avg_sleep=88us, ~363K interrupts/s

Guanyou.Chen (2):
sched: restore timer_slack_ns when resetting RT policy on fork
fork: fix default_timer_slack_ns inheritance from RT parent
kernel/fork.c | 6 +++++-
kernel/sched/core.c | 1 +
2 files changed, 6 insertions(+), 1 deletion(-)
--
2.34.1
* [PATCH v3 1/2] sched: restore timer_slack_ns when resetting RT policy on fork
2026-05-22 13:09 ` [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks Guanyou.Chen
@ 2026-05-22 13:09 ` Guanyou.Chen
2026-05-22 13:10 ` [PATCH v3 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen
1 sibling, 0 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-22 13:09 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, K Prateek Nayak, chenguanyou,
linqiaoting, chunhui.li, linux-kernel, linux-mm
Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
__setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
creates child threads, the children inherit timer_slack_ns=0 from the
parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
not restore timer_slack_ns, leaving the child permanently running with
zero slack.
Fix this by restoring timer_slack_ns from default_timer_slack_ns in
sched_fork() when resetting from RT/DL to NORMAL policy, matching the
existing behavior in __setscheduler_params().
Note: on its own, this fix is effective only when
default_timer_slack_ns holds a correct value. The following patch fixes
that value.
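The reset described above can be illustrated with a small userspace model.
This is only a sketch: struct toy_task and toy_sched_fork() are hypothetical
stand-ins, not kernel code, and the "patched" flag simply switches the one
added assignment on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few task fields involved. */
struct toy_task {
	bool rt_or_dl;			/* true while the policy is RT/DL */
	bool reset_on_fork;		/* models SCHED_RESET_ON_FORK */
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/* Models the fork-time policy reset on the child's copy of the task. */
static void toy_sched_fork(struct toy_task *p, bool patched)
{
	assert(p);
	if (!p->reset_on_fork || !p->rt_or_dl)
		return;

	p->rt_or_dl = false;		/* policy becomes SCHED_NORMAL */
	if (patched)			/* the line this patch adds */
		p->timer_slack_ns = p->default_timer_slack_ns;
}
```

In this model a child that starts with slack 0 and default 50000 keeps slack
0 when unpatched and gets 50000 when patched. If the default is itself 0, the
patched path still yields 0, which is why the default also has to be correct.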
Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..b1a241810ce0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
p->policy = SCHED_NORMAL;
p->static_prio = NICE_TO_PRIO(0);
p->rt_priority = 0;
+ p->timer_slack_ns = p->default_timer_slack_ns;
} else if (PRIO_TO_NICE(p->static_prio) < 0)
p->static_prio = NICE_TO_PRIO(0);
--
2.34.1
* [PATCH v3 2/2] fork: fix default_timer_slack_ns inheritance from RT parent
2026-05-22 13:09 ` [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
@ 2026-05-22 13:10 ` Guanyou.Chen
1 sibling, 0 replies; 11+ messages in thread
From: Guanyou.Chen @ 2026-05-22 13:10 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Andrew Morton
Cc: Thomas Gleixner, Felix Moessbauer, Dietmar Eggemann,
Steven Rostedt, Kees Cook, K Prateek Nayak, chenguanyou,
linqiaoting, chunhui.li, linux-kernel, linux-mm
Per prctl(2), "Timer slack is not applied to threads that are scheduled
under a real-time scheduling policy." This means RT tasks' timer_slack_ns
is forcibly 0 - not a user-chosen value but a "not applicable" state.
When copy_process() sets the child's default_timer_slack_ns from the
parent's timer_slack_ns, the child of an RT parent inherits this forced
0 as its default. That corrupts the child's reset target: neither
prctl(PR_SET_TIMERSLACK, 0) nor a sched_setscheduler() switch back to
NORMAL can then restore a meaningful default.
Fix this by using default_timer_slack_ns (which preserves the pre-RT
value) when the parent is RT/DL. For non-RT parents, timer_slack_ns is
a meaningful user value and the existing behavior is preserved.
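The selection rule described above can be sketched as a small userspace
model. struct toy_parent and toy_child_default_slack() are hypothetical
names used for illustration only; the real change is the diff in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parent fields that matter at fork time. */
struct toy_parent {
	bool rt_or_dl;			/* parent runs under an RT/DL policy */
	uint64_t timer_slack_ns;	/* forced to 0 while RT/DL */
	uint64_t default_timer_slack_ns;/* keeps the pre-RT value */
};

/* Value the child would receive as its default_timer_slack_ns. */
static uint64_t toy_child_default_slack(const struct toy_parent *parent)
{
	assert(parent);
	if (parent->rt_or_dl)
		return parent->default_timer_slack_ns;
	return parent->timer_slack_ns;
}
```

In this model an RT parent with slack 0 and default 50000 hands 50000 to the
child, while a non-RT parent that chose a slack of 100000 still passes on
100000, as before.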
Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Reported-by: Qiaoting.Lin <linqiaoting@xiaomi.com>
Signed-off-by: Guanyou.Chen <chenguanyou@xiaomi.com>
Signed-off-by: Chunhui.Li <chunhui.li@mediatek.com>
---
kernel/fork.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..bc4df18bfd90 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -23,6 +23,7 @@
#include <linux/sched/task_stack.h>
#include <linux/sched/cputime.h>
#include <linux/sched/ext.h>
+#include <linux/sched/rt.h>
#include <linux/seq_file.h>
#include <linux/rtmutex.h>
#include <linux/init.h>
@@ -2133,7 +2134,10 @@ __latent_entropy struct task_struct *copy_process(
retval = -EAGAIN;
#endif
- p->default_timer_slack_ns = current->timer_slack_ns;
+ if (rt_or_dl_task_policy(current))
+ p->default_timer_slack_ns = current->default_timer_slack_ns;
+ else
+ p->default_timer_slack_ns = current->timer_slack_ns;
#ifdef CONFIG_PSI
p->psi_flags = 0;
--
2.34.1
Thread overview: 11+ messages
2026-05-21 2:52 [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-21 3:18 ` Cunlong Li
2026-05-21 6:23 ` Guanyou Chen
2026-05-21 6:31 ` Guanyou Chen
2026-05-21 7:04 ` Peter Zijlstra
2026-05-21 7:35 ` [PATCH v2 0/2] sched: fix timer_slack_ns for children of RT tasks Guanyou.Chen
2026-05-21 7:35 ` [PATCH 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-21 7:35 ` [PATCH 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 0/2] sched/fork: fix timer_slack_ns for children of RT tasks Guanyou.Chen
2026-05-22 13:09 ` [PATCH v3 1/2] sched: restore timer_slack_ns when resetting RT policy on fork Guanyou.Chen
2026-05-22 13:10 ` [PATCH v3 2/2] fork: fix default_timer_slack_ns inheritance from RT parent Guanyou.Chen