From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 16 Jan 2025 17:06:55 -0800
From: Doug Berger
To: Juri Lelli
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
 Florian Fainelli, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/topology: clear freecpu bit on detach
References: <20250114230404.661569-1-opendmb@gmail.com>

On 1/16/2025 2:13 AM, Juri Lelli wrote:
> Hi Doug,
>
> On 14/01/25 15:04, Doug Berger wrote:
>> There is a hazard in the deadline scheduler where an offlined CPU
>> can have its free_cpus bit left set in the def_root_domain when
>> the schedutil cpufreq governor is used. This can allow a deadline
>> thread to be pushed to the runqueue of a powered down CPU which
>> breaks scheduling. The details can be found here:
>> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>>
>> The free_cpus mask is expected to be cleared by set_rq_offline();
>> however, the hazard occurs before the root domain is made online
>> during CPU hotplug, so that function is not invoked for the CPU
>> that is being made active.
>>
>> This commit works around the issue by ensuring the free_cpus bit
>> for a CPU is always cleared when the CPU is removed from a
>> root_domain. This likely makes the call of cpudl_clear_freecpu()
>> in rq_offline_dl() fully redundant, but I have not removed it
>> here because I am not certain of all flows.
>
> Hummm, I am not sure we are covering all the cases this way.
>
> Would you mind trying out the following maybe? It's based around your
> idea on the first patch you proposed.

Thanks! I'm happy to test any suggestions.

> Thanks!
> Juri
>
> ---
>  kernel/sched/cpudeadline.c | 23 ++++++-----------------
>  kernel/sched/cpudeadline.h |  3 +--
>  kernel/sched/deadline.c    | 11 ++++++-----
>  3 files changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 95baa12a1029..30419fa92c1e 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -170,7 +170,7 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
>   *
>   * Returns: (void)
>   */
> -void cpudl_clear(struct cpudl *cp, int cpu)
> +void cpudl_clear(struct cpudl *cp, int cpu, bool offline)
>  {
>  	int old_idx, new_cpu;
>  	unsigned long flags;
> @@ -181,11 +181,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
>
>  	old_idx = cp->elements[cpu].idx;
>  	if (old_idx == IDX_INVALID) {
> -		/*
> -		 * Nothing to remove if old_idx was invalid.
> -		 * This could happen if a rq_offline_dl is
> -		 * called for a CPU without -dl tasks running.
> -		 */
> +		if (offline)
> +			cpumask_clear_cpu(cpu, cp->free_cpus);
>  	} else {
>  		new_cpu = cp->elements[cp->size - 1].cpu;
>  		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
> @@ -195,7 +192,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
>  		cp->elements[cpu].idx = IDX_INVALID;
>  		cpudl_heapify(cp, old_idx);
>
> -		cpumask_set_cpu(cpu, cp->free_cpus);
> +		if (!offline)
> +			cpumask_set_cpu(cpu, cp->free_cpus);
>  	}
>  	raw_spin_unlock_irqrestore(&cp->lock, flags);
>  }
> @@ -243,19 +241,10 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
>   */
>  void cpudl_set_freecpu(struct cpudl *cp, int cpu)
>  {
> +	guard(raw_spinlock)(&cp->lock);
>  	cpumask_set_cpu(cpu, cp->free_cpus);
>  }
>
> -/*
> - * cpudl_clear_freecpu - Clear the cpudl.free_cpus
> - * @cp: the cpudl max-heap context
> - * @cpu: rd attached CPU
> - */
> -void cpudl_clear_freecpu(struct cpudl *cp, int cpu)
> -{
> -	cpumask_clear_cpu(cpu, cp->free_cpus);
> -}
> -
>  /*
>   * cpudl_init - initialize the cpudl structure
>   * @cp: the cpudl max-heap context
> diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
> index 0adeda93b5fb..cb2be7a205dd 100644
> --- a/kernel/sched/cpudeadline.h
> +++ b/kernel/sched/cpudeadline.h
> @@ -18,9 +18,8 @@ struct cpudl {
>  #ifdef CONFIG_SMP
>  int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask);
>  void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
> -void cpudl_clear(struct cpudl *cp, int cpu);
> +void cpudl_clear(struct cpudl *cp, int cpu, bool offline);
>  int cpudl_init(struct cpudl *cp);
>  void cpudl_set_freecpu(struct cpudl *cp, int cpu);
> -void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
>  void cpudl_cleanup(struct cpudl *cp);
>  #endif /* CONFIG_SMP */
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 643d101cb96a..b52234ebb991 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1870,7 +1870,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
>  	if (!dl_rq->dl_nr_running) {
>  		dl_rq->earliest_dl.curr = 0;
>  		dl_rq->earliest_dl.next = 0;
> -		cpudl_clear(&rq->rd->cpudl, rq->cpu);
> +		cpudl_clear(&rq->rd->cpudl, rq->cpu, false);
>  		cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio.curr);
>  	} else {
>  		struct rb_node *leftmost = rb_first_cached(&dl_rq->root);
> @@ -2921,9 +2921,11 @@ static void rq_online_dl(struct rq *rq)
>  	if (rq->dl.overloaded)
>  		dl_set_overload(rq);
>
> -	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
> -	if (rq->dl.dl_nr_running > 0)
> +	if (rq->dl.dl_nr_running > 0) {
>  		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
> +	} else {
> +		cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
> +	}
>  }
>
>  /* Assumes rq->lock is held */
> @@ -2932,8 +2934,7 @@ static void rq_offline_dl(struct rq *rq)
>  	if (rq->dl.overloaded)
>  		dl_clear_overload(rq);
>
> -	cpudl_clear(&rq->rd->cpudl, rq->cpu);
> -	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
> +	cpudl_clear(&rq->rd->cpudl, rq->cpu, true);
>  }
>
>  void __init init_sched_dl_class(void)

Unfortunately, this patch doesn't prevent the issue in my testing. I
believe that when the "sugov:n" thread is scheduled early while
hotplugging CPUn, it takes the dec_dl_deadline() path upon completion,
which ends up setting the free_cpus bit in the def_root_domain before
the dynamic scheduling domain is onlined (even with your patch). This
sets the trap that gets tripped later.
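To make that ordering concrete, here is a toy userspace model of the
sequence as I understand it. This is just a sketch, not kernel code:
the two bitmask variables stand in for the cpudl.free_cpus masks of
the def_root_domain and the dynamic root domain, and the variable
names are made up for illustration.

/* toy_free_cpus.c - standalone model of the stale-bit hazard */
#include <stdio.h>

#define BIT(n)	(1UL << (n))

/* stand-ins for the free_cpus masks of the two root domains */
static unsigned long def_rd_free_cpus;
static unsigned long dyn_rd_free_cpus;

int main(void)
{
	int cpu = 1;

	/*
	 * 1) CPU1 comes up; its rq is still attached to the
	 *    def_root_domain.
	 *
	 * 2) sugov:1 runs and completes early: the dec_dl_deadline()
	 *    -> cpudl_clear() path marks the CPU free in the *current*
	 *    root domain, which is still the default one.
	 */
	def_rd_free_cpus |= BIT(cpu);

	/*
	 * 3) The dynamic scheduling domain comes online and the rq is
	 *    moved over; nothing clears the bit left behind in the
	 *    def_root_domain.
	 */
	dyn_rd_free_cpus |= BIT(cpu);

	/*
	 * 4) Later, CPU1 is unplugged and other CPUs fall back to the
	 *    def_root_domain, which still advertises CPU1 as a valid
	 *    push target for deadline tasks:
	 */
	if (def_rd_free_cpus & BIT(cpu))
		printf("def_root_domain still thinks CPU%d is free\n", cpu);

	return 0;
}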
I attempted to confirm that this was in fact introduced by:

Fixes: 120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control")

However, I discovered that it doesn't appear to have been introduced
by that commit: the set_rq_offline() call from sched_cpu_dying() was
already meaningless, because the runqueue was already offline by that
point in the unplug sequence.

Thanks for your feedback,
Doug
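P.S. For anyone following along: per the commit message quoted above,
the RFC patch amounts to clearing the bit whenever a CPU is detached
from a root domain, i.e. roughly the following in rq_attach_root() in
kernel/sched/topology.c. This is a reconstructed sketch based on the
commit message, not the verbatim patch; the added comment and the
exact placement are my own.

	if (rq->rd) {
		old_rd = rq->rd;

		if (cpumask_test_cpu(rq->cpu, old_rd->online))
			set_rq_offline(rq);

		cpumask_clear_cpu(rq->cpu, old_rd->span);
		/* sketch: never leave a stale free_cpus bit behind on detach */
		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
	}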