* [PATCH] sched/topology: clear freecpu bit on detach
@ 2025-04-22 19:48 Doug Berger
2025-04-29 8:15 ` Florian Fainelli
2025-07-25 22:33 ` Doug Berger
0 siblings, 2 replies; 7+ messages in thread
From: Doug Berger @ 2025-04-22 19:48 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Florian Fainelli, linux-kernel, Doug Berger
There is a hazard in the deadline scheduler where an offlined CPU
can have its free_cpus bit left set in the def_root_domain when
the schedutil cpufreq governor is used. This can allow a deadline
thread to be pushed to the runqueue of a powered-down CPU, which
breaks scheduling. The details can be found here:
https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
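For reference, the deadline push path consults this mask roughly as
follows; this is my simplified sketch of cpudl_find() from
kernel/sched/cpudeadline.c (details elided and possibly inexact), shown
only to illustrate how a stale free_cpus bit can turn a powered-down
CPU into a push target:

int cpudl_find(struct cpudl *cp, struct task_struct *p,
	       struct cpumask *later_mask)
{
	/*
	 * Any CPU that is both "free" (no deadline task queued) and
	 * allowed for p is acceptable as a push destination.
	 */
	if (later_mask &&
	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_mask))
		return 1;

	/* Otherwise fall back to the max-deadline heap (omitted here). */
	return 0;
}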
The free_cpus mask is expected to be cleared by set_rq_offline();
however, the hazard occurs before the root domain is made online
during CPU hotplug so that function is not invoked for the CPU
that is being made active.
This commit works around the issue by ensuring the free_cpus bit
for a CPU is always cleared when the CPU is removed from a
root_domain. This likely makes the call of cpudl_clear_freecpu()
in rq_offline_dl() fully redundant, but I have not removed it
here because I am not certain of all flows.
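For reference, the helper added below does nothing more than clear the
CPU's bit in the root domain's free_cpus mask under the cpudl lock; this
is a sketch from memory of kernel/sched/cpudeadline.c, and the in-tree
cpudl_clear_freecpu() may differ in detail:

void cpudl_clear_freecpu(struct cpudl *cp, int cpu)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&cp->lock, flags);
	cpumask_clear_cpu(cpu, cp->free_cpus);
	raw_spin_unlock_irqrestore(&cp->lock, flags);
}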
It seems likely that a better solution is possible from someone
more familiar with the scheduler implementation, but this
approach is minimally invasive coming from someone who is not.
Signed-off-by: Doug Berger <opendmb@gmail.com>
---
kernel/sched/topology.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index a2a38e1b6f18..c10c5385031f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -496,6 +496,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 		set_rq_offline(rq);

 		cpumask_clear_cpu(rq->cpu, old_rd->span);
+		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);

 		/*
 		 * If we don't want to free the old_rd yet then
--
2.34.1
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-04-22 19:48 [PATCH] sched/topology: clear freecpu bit on detach Doug Berger
@ 2025-04-29 8:15 ` Florian Fainelli
2025-05-02 13:02 ` Juri Lelli
2025-07-25 22:33 ` Doug Berger
1 sibling, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2025-04-29 8:15 UTC (permalink / raw)
To: Doug Berger, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Florian Fainelli, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]
On 4/22/2025 9:48 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered down CPU which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
>
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
>
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is minimally invasive from someone who is not.
>
> Signed-off-by: Doug Berger <opendmb@gmail.com>
> ---
FWIW, we were able to reproduce this with the attached hotplug.sh script,
which just randomly hot plugs/unplugs CPUs (./hotplug.sh 4). Within a
few hundred iterations you could see the lockup occur; it is unclear why
this has not been seen by more people.
Since this is not the first posting or attempt at fixing this bug [1]
and we consider it to be a serious one, can this be reviewed/commented
on/applied? Thanks!
[1]: https://lkml.org/lkml/2025/1/14/1687
--
Florian
[-- Attachment #2: hotplug.sh --]
[-- Type: text/plain, Size: 699 bytes --]
#!/bin/sh
# Hotplug test
usage() {
	echo "Usage: $0 [# cpus]"
	echo " If number of cpus is not given, defaults to 2"
	exit
}

# Default to 2 CPUs
NR_CPUS=${1:-2}
[ $NR_CPUS -lt 2 ] && usage 1>&2

MAXCPU=$((NR_CPUS-1))
MAX=`cat /sys/devices/system/cpu/kernel_max`
[ $MAXCPU -gt $MAX ] && echo "Too many CPUs" 1>&2 && usage 1>&2

cpu_path() {
	echo /sys/devices/system/cpu/cpu$1
}

checkpoint_test() {
	if [ $(($1 % 50)) -eq 0 ]; then
		echo "**** Finished test $1 ****"
	fi
}

echo '****'
echo "Testing $NR_CPUS CPUs"
echo '****'

TEST=0
while :
do
	N=$((RANDOM % MAXCPU + 1))
	ON=`cat $(cpu_path $N)/online`
	echo $((1-ON)) > $(cpu_path $N)/online
	TEST=$((TEST+1))
	checkpoint_test $TEST
done
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-04-29 8:15 ` Florian Fainelli
@ 2025-05-02 13:02 ` Juri Lelli
2025-05-23 18:14 ` Florian Fainelli
0 siblings, 1 reply; 7+ messages in thread
From: Juri Lelli @ 2025-05-02 13:02 UTC (permalink / raw)
To: Florian Fainelli
Cc: Doug Berger, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Florian Fainelli, linux-kernel
Hi,
On 29/04/25 10:15, Florian Fainelli wrote:
>
>
> On 4/22/2025 9:48 PM, Doug Berger wrote:
> > There is a hazard in the deadline scheduler where an offlined CPU
> > can have its free_cpus bit left set in the def_root_domain when
> > the schedutil cpufreq governor is used. This can allow a deadline
> > thread to be pushed to the runqueue of a powered down CPU which
> > breaks scheduling. The details can be found here:
> > https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
> >
> > The free_cpus mask is expected to be cleared by set_rq_offline();
> > however, the hazard occurs before the root domain is made online
> > during CPU hotplug so that function is not invoked for the CPU
> > that is being made active.
> >
> > This commit works around the issue by ensuring the free_cpus bit
> > for a CPU is always cleared when the CPU is removed from a
> > root_domain. This likely makes the call of cpudl_clear_freecpu()
> > in rq_offline_dl() fully redundant, but I have not removed it
> > here because I am not certain of all flows.
> >
> > It seems likely that a better solution is possible from someone
> > more familiar with the scheduler implementation, but this
> > approach is minimally invasive from someone who is not.
> >
> > Signed-off-by: Doug Berger <opendmb@gmail.com>
> > ---
>
> FWIW, we were able to reproduce this with the attached hotplug.sh script
> which would just randomly hot plug/unplug CPUs (./hotplug.sh 4). Within a
> few hundred of iterations you could see the lock up occur, it's unclear why
> this has not been seen by more people.
>
> Since this is not the first posting or attempt at fixing this bug [1] and we
> consider it to be a serious one, can this be reviewed/commented on/applied?
> Thanks!
>
> [1]: https://lkml.org/lkml/2025/1/14/1687
So, going back to the initial report, the thing that makes me a bit
uncomfortable with the suggested change is the worry that it might be
plastering over a more fundamental issue. Not against it, though, and I
really appreciate Doug's analysis and proposed fixes!
Doug wrote:
"Initially, CPU0 and CPU1 are active and CPU2 and CPU3 have been
previously offlined so their runqueues are attached to the
def_root_domain.
1) A hot plug is initiated on CPU2.
2) The cpuhp/2 thread invokes the cpufreq governor driver during
the CPUHP_AP_ONLINE_DYN step.
3) The sched util cpufreq governor creates the "sugov:2" thread to
execute on CPU2 with the deadline scheduler.
4) The deadline scheduler clears the free_cpus mask for CPU2 within
the def_root_domain when "sugov:2" is scheduled."
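For context, the kthread in question is created along these lines; this
is a simplified sketch of sugov_kthread_create() from
kernel/sched/cpufreq_schedutil.c, with the fake DL bandwidth values and
error handling omitted, so details are approximate:

static int sugov_kthread_create(struct sugov_policy *sg_policy)
{
	struct sched_attr attr = {
		.size		= sizeof(struct sched_attr),
		.sched_policy	= SCHED_DEADLINE,
		.sched_flags	= SCHED_FLAG_SUGOV,
	};
	struct cpufreq_policy *policy = sg_policy->policy;
	struct task_struct *thread;

	kthread_init_work(&sg_policy->work, sugov_work);
	kthread_init_worker(&sg_policy->worker);

	/*
	 * In the scenario above this runs while the governor is being
	 * started from CPUHP_AP_ONLINE_DYN, i.e. before the incoming
	 * CPU's runqueue has been marked online in its root domain.
	 */
	thread = kthread_create(kthread_worker_fn, &sg_policy->worker,
				"sugov:%d",
				cpumask_first(policy->related_cpus));
	if (IS_ERR(thread))
		return PTR_ERR(thread);

	/*
	 * SCHED_FLAG_SUGOV marks a special deadline task that bypasses
	 * bandwidth admission control.
	 */
	sched_setattr_nocheck(thread, &attr);

	sg_policy->thread = thread;
	kthread_bind_mask(thread, policy->related_cpus);
	wake_up_process(thread);

	return 0;
}

Since admission control is bypassed, nothing in this path appears to
care whether the CPU's root domain has been brought online yet.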
I wonder if it's OK to schedule sugov:2 on a CPU that has not yet
reached the fully online state. Peter, others, what do you think?
Thanks,
Juri
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-05-02 13:02 ` Juri Lelli
@ 2025-05-23 18:14 ` Florian Fainelli
2025-06-03 16:18 ` Florian Fainelli
0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2025-05-23 18:14 UTC (permalink / raw)
To: Juri Lelli, Florian Fainelli, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
Steven Rostedt, Valentin Schneider
Cc: Doug Berger, linux-kernel
Moving CC list to To
On 5/2/25 06:02, Juri Lelli wrote:
> Hi,
>
> On 29/04/25 10:15, Florian Fainelli wrote:
>>
>>
>> On 4/22/2025 9:48 PM, Doug Berger wrote:
>>> There is a hazard in the deadline scheduler where an offlined CPU
>>> can have its free_cpus bit left set in the def_root_domain when
>>> the schedutil cpufreq governor is used. This can allow a deadline
>>> thread to be pushed to the runqueue of a powered down CPU which
>>> breaks scheduling. The details can be found here:
>>> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>>>
>>> The free_cpus mask is expected to be cleared by set_rq_offline();
>>> however, the hazard occurs before the root domain is made online
>>> during CPU hotplug so that function is not invoked for the CPU
>>> that is being made active.
>>>
>>> This commit works around the issue by ensuring the free_cpus bit
>>> for a CPU is always cleared when the CPU is removed from a
>>> root_domain. This likely makes the call of cpudl_clear_freecpu()
>>> in rq_offline_dl() fully redundant, but I have not removed it
>>> here because I am not certain of all flows.
>>>
>>> It seems likely that a better solution is possible from someone
>>> more familiar with the scheduler implementation, but this
>>> approach is minimally invasive from someone who is not.
>>>
>>> Signed-off-by: Doug Berger <opendmb@gmail.com>
>>> ---
>>
>> FWIW, we were able to reproduce this with the attached hotplug.sh script
>> which would just randomly hot plug/unplug CPUs (./hotplug.sh 4). Within a
>> few hundred of iterations you could see the lock up occur, it's unclear why
>> this has not been seen by more people.
>>
>> Since this is not the first posting or attempt at fixing this bug [1] and we
>> consider it to be a serious one, can this be reviewed/commented on/applied?
>> Thanks!
>>
>> [1]: https://lkml.org/lkml/2025/1/14/1687
>
> So, going back to the initial report, the thing that makes me a bit
> uncomfortable with the suggested change is the worry that it might be
> plastering over a more fundamental issue. Not against it, though, and I
> really appreciate Doug's analysis and proposed fixes!
>
> Doug wrote:
>
> "Initially, CPU0 and CPU1 are active and CPU2 and CPU3 have been
> previously offlined so their runqueues are attached to the
> def_root_domain.
> 1) A hot plug is initiated on CPU2.
> 2) The cpuhp/2 thread invokes the cpufreq governor driver during
> the CPUHP_AP_ONLINE_DYN step.
> 3) The sched util cpufreq governor creates the "sugov:2" thread to
> execute on CPU2 with the deadline scheduler.
> 4) The deadline scheduler clears the free_cpus mask for CPU2 within
> the def_root_domain when "sugov:2" is scheduled."
>
> I wonder if it's OK to schedule sugov:2 on a CPU that didn't reach yet
> complete online state. Peter, others, what do you think?
Peter, can you please review this patch? Thank you
--
Florian
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-05-23 18:14 ` Florian Fainelli
@ 2025-06-03 16:18 ` Florian Fainelli
2025-06-11 20:06 ` Florian Fainelli
0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2025-06-03 16:18 UTC (permalink / raw)
To: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman, Steven Rostedt,
Valentin Schneider
Cc: Doug Berger, linux-kernel
On 5/23/25 11:14, Florian Fainelli wrote:
> Moving CC list to To
>
> On 5/2/25 06:02, Juri Lelli wrote:
>> Hi,
>>
>> On 29/04/25 10:15, Florian Fainelli wrote:
>>>
>>>
>>> On 4/22/2025 9:48 PM, Doug Berger wrote:
>>>> There is a hazard in the deadline scheduler where an offlined CPU
>>>> can have its free_cpus bit left set in the def_root_domain when
>>>> the schedutil cpufreq governor is used. This can allow a deadline
>>>> thread to be pushed to the runqueue of a powered down CPU which
>>>> breaks scheduling. The details can be found here:
>>>> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>>>>
>>>> The free_cpus mask is expected to be cleared by set_rq_offline();
>>>> however, the hazard occurs before the root domain is made online
>>>> during CPU hotplug so that function is not invoked for the CPU
>>>> that is being made active.
>>>>
>>>> This commit works around the issue by ensuring the free_cpus bit
>>>> for a CPU is always cleared when the CPU is removed from a
>>>> root_domain. This likely makes the call of cpudl_clear_freecpu()
>>>> in rq_offline_dl() fully redundant, but I have not removed it
>>>> here because I am not certain of all flows.
>>>>
>>>> It seems likely that a better solution is possible from someone
>>>> more familiar with the scheduler implementation, but this
>>>> approach is minimally invasive from someone who is not.
>>>>
>>>> Signed-off-by: Doug Berger <opendmb@gmail.com>
>>>> ---
>>>
>>> FWIW, we were able to reproduce this with the attached hotplug.sh script
>>> which would just randomly hot plug/unplug CPUs (./hotplug.sh 4).
>>> Within a
>>> few hundred of iterations you could see the lock up occur, it's
>>> unclear why
>>> this has not been seen by more people.
>>>
>>> Since this is not the first posting or attempt at fixing this bug [1]
>>> and we
>>> consider it to be a serious one, can this be reviewed/commented on/
>>> applied?
>>> Thanks!
>>>
>>> [1]: https://lkml.org/lkml/2025/1/14/1687
>>
>> So, going back to the initial report, the thing that makes me a bit
>> uncomfortable with the suggested change is the worry that it might be
>> plastering over a more fundamental issue. Not against it, though, and I
>> really appreciate Doug's analysis and proposed fixes!
>>
>> Doug wrote:
>>
>> "Initially, CPU0 and CPU1 are active and CPU2 and CPU3 have been
>> previously offlined so their runqueues are attached to the
>> def_root_domain.
>> 1) A hot plug is initiated on CPU2.
>> 2) The cpuhp/2 thread invokes the cpufreq governor driver during
>> the CPUHP_AP_ONLINE_DYN step.
>> 3) The sched util cpufreq governor creates the "sugov:2" thread to
>> execute on CPU2 with the deadline scheduler.
>> 4) The deadline scheduler clears the free_cpus mask for CPU2 within
>> the def_root_domain when "sugov:2" is scheduled."
>>
>> I wonder if it's OK to schedule sugov:2 on a CPU that didn't reach yet
>> complete online state. Peter, others, what do you think?
>
> Peter, can you please review this patch? Thank you
Ping? Can we get to some resolution one way or another here? Thanks
--
Florian
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-06-03 16:18 ` Florian Fainelli
@ 2025-06-11 20:06 ` Florian Fainelli
0 siblings, 0 replies; 7+ messages in thread
From: Florian Fainelli @ 2025-06-11 20:06 UTC (permalink / raw)
To: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman, Steven Rostedt,
Valentin Schneider
Cc: Doug Berger, linux-kernel
On 6/3/25 09:18, Florian Fainelli wrote:
> On 5/23/25 11:14, Florian Fainelli wrote:
>> Moving CC list to To
>>
>> On 5/2/25 06:02, Juri Lelli wrote:
>>> Hi,
>>>
>>> On 29/04/25 10:15, Florian Fainelli wrote:
>>>>
>>>>
>>>> On 4/22/2025 9:48 PM, Doug Berger wrote:
>>>>> There is a hazard in the deadline scheduler where an offlined CPU
>>>>> can have its free_cpus bit left set in the def_root_domain when
>>>>> the schedutil cpufreq governor is used. This can allow a deadline
>>>>> thread to be pushed to the runqueue of a powered down CPU which
>>>>> breaks scheduling. The details can be found here:
>>>>> https://lore.kernel.org/lkml/20250110233010.2339521-1-
>>>>> opendmb@gmail.com
>>>>>
>>>>> The free_cpus mask is expected to be cleared by set_rq_offline();
>>>>> however, the hazard occurs before the root domain is made online
>>>>> during CPU hotplug so that function is not invoked for the CPU
>>>>> that is being made active.
>>>>>
>>>>> This commit works around the issue by ensuring the free_cpus bit
>>>>> for a CPU is always cleared when the CPU is removed from a
>>>>> root_domain. This likely makes the call of cpudl_clear_freecpu()
>>>>> in rq_offline_dl() fully redundant, but I have not removed it
>>>>> here because I am not certain of all flows.
>>>>>
>>>>> It seems likely that a better solution is possible from someone
>>>>> more familiar with the scheduler implementation, but this
>>>>> approach is minimally invasive from someone who is not.
>>>>>
>>>>> Signed-off-by: Doug Berger <opendmb@gmail.com>
>>>>> ---
>>>>
>>>> FWIW, we were able to reproduce this with the attached hotplug.sh
>>>> script
>>>> which would just randomly hot plug/unplug CPUs (./hotplug.sh 4).
>>>> Within a
>>>> few hundred of iterations you could see the lock up occur, it's
>>>> unclear why
>>>> this has not been seen by more people.
>>>>
>>>> Since this is not the first posting or attempt at fixing this bug
>>>> [1] and we
>>>> consider it to be a serious one, can this be reviewed/commented on/
>>>> applied?
>>>> Thanks!
>>>>
>>>> [1]: https://lkml.org/lkml/2025/1/14/1687
>>>
>>> So, going back to the initial report, the thing that makes me a bit
>>> uncomfortable with the suggested change is the worry that it might be
>>> plastering over a more fundamental issue. Not against it, though, and I
>>> really appreciate Doug's analysis and proposed fixes!
>>>
>>> Doug wrote:
>>>
>>> "Initially, CPU0 and CPU1 are active and CPU2 and CPU3 have been
>>> previously offlined so their runqueues are attached to the
>>> def_root_domain.
>>> 1) A hot plug is initiated on CPU2.
>>> 2) The cpuhp/2 thread invokes the cpufreq governor driver during
>>> the CPUHP_AP_ONLINE_DYN step.
>>> 3) The sched util cpufreq governor creates the "sugov:2" thread to
>>> execute on CPU2 with the deadline scheduler.
>>> 4) The deadline scheduler clears the free_cpus mask for CPU2 within
>>> the def_root_domain when "sugov:2" is scheduled."
>>>
>>> I wonder if it's OK to schedule sugov:2 on a CPU that didn't reach yet
>>> complete online state. Peter, others, what do you think?
>>
>> Peter, can you please review this patch? Thank you
>
> Ping? Can we get to some resolution on way or another here? Thanks
Peter, can you please review this patch, or ask questions if anything
is unclear?
This is currently preventing our systems that use the schedutil cpufreq
governor from surviving more than a few hundred CPU hotplug cycles
before hanging.
Thank you!
--
Florian
* Re: [PATCH] sched/topology: clear freecpu bit on detach
2025-04-22 19:48 [PATCH] sched/topology: clear freecpu bit on detach Doug Berger
2025-04-29 8:15 ` Florian Fainelli
@ 2025-07-25 22:33 ` Doug Berger
1 sibling, 0 replies; 7+ messages in thread
From: Doug Berger @ 2025-07-25 22:33 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Florian Fainelli, linux-kernel
I have observed a separate hazard, which can occur when offlining a CPU
and which is not addressed by this workaround.
I intend to submit a more targeted solution to this issue in the near
future, so please continue to disregard this submission :).
Thanks,
Doug
On 4/22/2025 12:48 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered down CPU which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
>
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
>
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is minimally invasive from someone who is not.
>
> Signed-off-by: Doug Berger <opendmb@gmail.com>
> ---
> kernel/sched/topology.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index a2a38e1b6f18..c10c5385031f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -496,6 +496,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
>  		set_rq_offline(rq);
>
>  		cpumask_clear_cpu(rq->cpu, old_rd->span);
> +		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
>
>  		/*
>  		 * If we don't want to free the old_rd yet then