From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 16 Jan 2025 17:06:55 -0800
From: Doug Berger
To: Juri Lelli
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
 Florian Fainelli, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/topology: clear freecpu bit on detach
References: <20250114230404.661569-1-opendmb@gmail.com>

On 1/16/2025 2:13 AM, Juri Lelli wrote:
> Hi Doug,
>
> On 14/01/25 15:04, Doug Berger wrote:
>> There is a hazard in the deadline scheduler where an offlined CPU
>> can have its free_cpus bit left set in the def_root_domain when
>> the schedutil cpufreq governor is used. This can allow a deadline
>> thread to be pushed to the runqueue of a powered down CPU which
>> breaks scheduling. The details can be found here:
>> https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
>>
>> The free_cpus mask is expected to be cleared by set_rq_offline();
>> however, the hazard occurs before the root domain is made online
>> during CPU hotplug, so that function is not invoked for the CPU
>> that is being made active.
>>
>> This commit works around the issue by ensuring the free_cpus bit
>> for a CPU is always cleared when the CPU is removed from a
>> root_domain. This likely makes the call of cpudl_clear_freecpu()
>> in rq_offline_dl() fully redundant, but I have not removed it
>> here because I am not certain of all flows.
>
> Hummm, I am not sure we are covering all the cases this way.
>
> Would you mind trying out the following maybe? It's based around your
> idea on the first patch you proposed.

Thanks! I'm happy to test any suggestions.

> Thanks!
> Juri
>
> ---
>  kernel/sched/cpudeadline.c | 23 ++++++-----------------
>  kernel/sched/cpudeadline.h |  3 +--
>  kernel/sched/deadline.c    | 11 ++++++-----
>  3 files changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 95baa12a1029..30419fa92c1e 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -170,7 +170,7 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
>   *
>   * Returns: (void)
>   */
> -void cpudl_clear(struct cpudl *cp, int cpu)
> +void cpudl_clear(struct cpudl *cp, int cpu, bool offline)
>  {
>  	int old_idx, new_cpu;
>  	unsigned long flags;
> @@ -181,11 +181,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
>
>  	old_idx = cp->elements[cpu].idx;
>  	if (old_idx == IDX_INVALID) {
> -		/*
> -		 * Nothing to remove if old_idx was invalid.
> -		 * This could happen if a rq_offline_dl is
> -		 * called for a CPU without -dl tasks running.
> -		 */
> +		if (offline)
> +			cpumask_clear_cpu(cpu, cp->free_cpus);
>  	} else {
>  		new_cpu = cp->elements[cp->size - 1].cpu;
>  		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
> @@ -195,7 +192,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
>  		cp->elements[cpu].idx = IDX_INVALID;
>  		cpudl_heapify(cp, old_idx);
>
> -		cpumask_set_cpu(cpu, cp->free_cpus);
> +		if (!offline)
> +			cpumask_set_cpu(cpu, cp->free_cpus);
>  	}
>  	raw_spin_unlock_irqrestore(&cp->lock, flags);
>  }
> @@ -243,19 +241,10 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
>   */
>  void cpudl_set_freecpu(struct cpudl *cp, int cpu)
>  {
> +	guard(raw_spinlock)(&cp->lock);
>  	cpumask_set_cpu(cpu, cp->free_cpus);
>  }
>
> -/*
> - * cpudl_clear_freecpu - Clear the cpudl.free_cpus
> - * @cp: the cpudl max-heap context
> - * @cpu: rd attached CPU
> - */
> -void cpudl_clear_freecpu(struct cpudl *cp, int cpu)
> -{
> -	cpumask_clear_cpu(cpu, cp->free_cpus);
> -}
> -
>  /*
>   * cpudl_init - initialize the cpudl structure
>   * @cp: the cpudl max-heap context
> diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
> index 0adeda93b5fb..cb2be7a205dd 100644
> --- a/kernel/sched/cpudeadline.h
> +++ b/kernel/sched/cpudeadline.h
> @@ -18,9 +18,8 @@ struct cpudl {
>  #ifdef CONFIG_SMP
>  int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask);
>  void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
> -void cpudl_clear(struct cpudl *cp, int cpu);
> +void cpudl_clear(struct cpudl *cp, int cpu, bool offline);
>  int cpudl_init(struct cpudl *cp);
>  void cpudl_set_freecpu(struct cpudl *cp, int cpu);
> -void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
>  void cpudl_cleanup(struct cpudl *cp);
>  #endif /* CONFIG_SMP */
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 643d101cb96a..b52234ebb991 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1870,7 +1870,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
>  	if (!dl_rq->dl_nr_running) {
>  		dl_rq->earliest_dl.curr = 0;
>  		dl_rq->earliest_dl.next = 0;
> -		cpudl_clear(&rq->rd->cpudl, rq->cpu);
> +		cpudl_clear(&rq->rd->cpudl, rq->cpu, false);
>  		cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio.curr);
>  	} else {
>  		struct rb_node *leftmost = rb_first_cached(&dl_rq->root);
> @@ -2921,9 +2921,11 @@ static void rq_online_dl(struct rq *rq)
>  	if (rq->dl.overloaded)
>  		dl_set_overload(rq);
>
> -	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
> -	if (rq->dl.dl_nr_running > 0)
> +	if (rq->dl.dl_nr_running > 0) {
>  		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
> +	} else {
> +		cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
> +	}
>  }
>
>  /* Assumes rq->lock is held */
> @@ -2932,8 +2934,7 @@ static void rq_offline_dl(struct rq *rq)
>  	if (rq->dl.overloaded)
>  		dl_clear_overload(rq);
>
> -	cpudl_clear(&rq->rd->cpudl, rq->cpu);
> -	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
> +	cpudl_clear(&rq->rd->cpudl, rq->cpu, true);
>  }
>
>  void __init init_sched_dl_class(void)

Unfortunately, this patch doesn't prevent the issue in my testing. I
believe that when the "sugov:n" thread is scheduled early while
hotplugging CPUn, it takes the dec_dl_deadline() path upon completion,
which ends up setting the free_cpus bit in the def_root_domain before
the dynamic scheduling domain is onlined (even with your patch). This
sets the trap that gets tripped later.
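To make that ordering concrete, here is a toy userspace model of the
sequence as I understand it. This is just a sketch, not kernel code:
the two bitmask variables stand in for the cpudl.free_cpus masks of
the def_root_domain and the dynamic root domain, and the variable
names are made up for illustration.

/* toy_free_cpus.c - standalone model of the stale-bit hazard */
#include <stdio.h>

#define BIT(n)	(1UL << (n))

/* stand-ins for the free_cpus masks of the two root domains */
static unsigned long def_rd_free_cpus;
static unsigned long dyn_rd_free_cpus;

int main(void)
{
	int cpu = 1;

	/*
	 * 1) CPU1 comes up; its rq is still attached to the
	 *    def_root_domain.
	 *
	 * 2) sugov:1 runs and completes early: the dec_dl_deadline()
	 *    -> cpudl_clear() path marks the CPU free in the *current*
	 *    root domain, which is still the default one.
	 */
	def_rd_free_cpus |= BIT(cpu);

	/*
	 * 3) The dynamic scheduling domain comes online and the rq is
	 *    moved over; nothing clears the bit left behind in the
	 *    def_root_domain.
	 */
	dyn_rd_free_cpus |= BIT(cpu);

	/*
	 * 4) Later, CPU1 is unplugged and other CPUs fall back to the
	 *    def_root_domain, which still advertises CPU1 as a valid
	 *    push target for deadline tasks:
	 */
	if (def_rd_free_cpus & BIT(cpu))
		printf("def_root_domain still thinks CPU%d is free\n", cpu);

	return 0;
}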
I attempted to confirm that this was in fact introduced by:

Fixes: 120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control")

However, I discovered that it doesn't appear to have been introduced
by that commit: the set_rq_offline() call from sched_cpu_dying() was
already meaningless, because the runqueue was already offline by that
point in the unplug sequence.

Thanks for your feedback,
Doug
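P.S. For anyone following along: per the commit message quoted above,
the RFC patch amounts to clearing the bit whenever a CPU is detached
from a root domain, i.e. roughly the following in rq_attach_root() in
kernel/sched/topology.c. This is a reconstructed sketch based on the
commit message, not the verbatim patch; the added comment and the
exact placement are my own.

	if (rq->rd) {
		old_rd = rq->rd;

		if (cpumask_test_cpu(rq->cpu, old_rd->online))
			set_rq_offline(rq);

		cpumask_clear_cpu(rq->cpu, old_rd->span);
		/* sketch: never leave a stale free_cpus bit behind on detach */
		cpudl_clear_freecpu(&old_rd->cpudl, rq->cpu);
	}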