From: Florian Fainelli <f.fainelli@gmail.com>
To: Doug Berger <opendmb@gmail.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Florian Fainelli <florian.fainelli@broadcom.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/topology: clear freecpu bit on detach
Date: Tue, 29 Apr 2025 10:15:29 +0200
Message-ID: <609e6fe5-2893-4c13-8e52-e9df05146ffb@gmail.com>
In-Reply-To: <20250422194853.1636334-1-opendmb@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]



On 4/22/2025 9:48 PM, Doug Berger wrote:
> There is a hazard in the deadline scheduler where an offlined CPU
> can have its free_cpus bit left set in the def_root_domain when
> the schedutil cpufreq governor is used. This can allow a deadline
> thread to be pushed to the runqueue of a powered down CPU which
> breaks scheduling. The details can be found here:
> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
> 
> The free_cpus mask is expected to be cleared by set_rq_offline();
> however, the hazard occurs before the root domain is made online
> during CPU hotplug so that function is not invoked for the CPU
> that is being made active.
> 
> This commit works around the issue by ensuring the free_cpus bit
> for a CPU is always cleared when the CPU is removed from a
> root_domain. This likely makes the call of cpudl_clear_freecpu()
> in rq_offline_dl() fully redundant, but I have not removed it
> here because I am not certain of all flows.
> 
> It seems likely that a better solution is possible from someone
> more familiar with the scheduler implementation, but this
> approach is minimally invasive from someone who is not.
> 
> Signed-off-by: Doug Berger <opendmb@gmail.com>
> ---

FWIW, we were able to reproduce this with the attached hotplug.sh script,
which simply hot-plugs and unplugs CPUs at random (./hotplug.sh 4). Within
a few hundred iterations the lockup occurs; it is unclear why more people
have not seen it.

Since this is not the first posting or attempt at fixing this bug [1] 
and we consider it to be a serious one, can this be reviewed/commented 
on/applied? Thanks!

[1]: https://lkml.org/lkml/2025/1/14/1687
-- 
Florian

[-- Attachment #2: hotplug.sh --]
[-- Type: text/plain, Size: 699 bytes --]

#!/bin/sh
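# Hotplug test: repeatedly toggles a random CPU in 1..N-1 online/offline.
# Note: $RANDOM is not POSIX; run with a shell that provides it (e.g. bash).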

usage() {
	echo "Usage: $0 [# cpus]"
	echo "   If number of cpus is not given, defaults to 2"
	exit
}

# Default to 2 CPUs
NR_CPUS=${1:-2}

[ $NR_CPUS -lt 2 ] && usage 1>&2

MAXCPU=$((NR_CPUS-1))
MAX=`cat /sys/devices/system/cpu/kernel_max`

[ $MAXCPU -gt $MAX ] && echo "Too many CPUs" 1>&2 && usage 1>&2

cpu_path() {
	echo /sys/devices/system/cpu/cpu$1
}

checkpoint_test() {
	if [ $(($1 % 50)) -eq 0 ]; then
		echo "**** Finished test $1 ****"
	fi
}

echo '****'
echo "Testing $NR_CPUS CPUs"
echo '****'

TEST=0
while :
do
	N=$((RANDOM % MAXCPU + 1))
	ON=`cat $(cpu_path $N)/online`
	echo $((1-ON)) > $(cpu_path $N)/online
	TEST=$((TEST+1))
	checkpoint_test $TEST
done
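
For anyone wanting to run the reproducer for a fixed number of iterations, or under a shell without $RANDOM, the toggle loop above could be sketched roughly as follows. This is only an illustration, not part of the original attachment: CPU_ROOT, rand_cpu, toggle_cpu and run_hotplug are hypothetical names, and the od/urandom approach is swapped in for $RANDOM as an assumption about what is available on the target.

```shell
#!/bin/sh
# Sketch only: a bounded variant of the toggle loop from hotplug.sh.
# CPU_ROOT and all function names are illustrative assumptions.
CPU_ROOT=${CPU_ROOT:-/sys/devices/system/cpu}

# Print a pseudo-random integer in 1..$1 without relying on $RANDOM.
rand_cpu() {
	r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
	echo $((r % $1 + 1))
}

# Flip the online state of CPU $1 (0 -> 1, 1 -> 0).
toggle_cpu() {
	f="$CPU_ROOT/cpu$1/online"
	on=$(cat "$f") || return 1
	echo $((1 - on)) > "$f"
}

# Toggle random CPUs in 1..$1 for $2 iterations, then bring
# every CPU in that range back online.
run_hotplug() {
	maxcpu=$1
	iters=$2
	i=0
	while [ "$i" -lt "$iters" ]; do
		toggle_cpu "$(rand_cpu "$maxcpu")" || return 1
		i=$((i + 1))
	done
	n=1
	while [ "$n" -le "$maxcpu" ]; do
		echo 1 > "$CPU_ROOT/cpu$n/online"
		n=$((n + 1))
	done
}
```

Something like `run_hotplug 3 500` (as root) would then correspond to a few hundred iterations of `./hotplug.sh 4`. Pointing CPU_ROOT at a scratch directory allows a dry run without touching real CPUs.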
