From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EDA63C3BFE for ; Thu, 18 Jun 2026 09:18:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781774301; cv=none; b=fnnLfpgjIjSdps+4tLYmJRV08uZ5bfFT7WjpoR+EZWwmw+dCE2KVwicdqnKS67cmDpy7ir7Su+IcmQ+FmPrrvIIL4QCciZ/+s+oBHtv8z3cjJpKmay3n3RzGNbE+t2mnSqdx1N79fbDg4hluxOOUO9xKgdASbxTxR8wmh6SpQAo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781774301; c=relaxed/simple; bh=2wSIVu6FKlzH65Ov6bSLkvM5TnHHjx3yXRcdp/tdmzw=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=jKC7IfIUdRus2X22EP7bqBpCyzSbSgt7UsmPsempzhE6wDEw5qFu8AVgooWSQimpVdQvwbfOef6LKv8MDTs6Pkh61JPBdKNUK+/IWoJuyxMJK41OCKKXpe0J6nDnnCf/HlG4fqoTrX+QM1bFNADxwBxaWa0Q/Ri8Fa8KYy0ehX0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=pass smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=nMUj/xWH; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="nMUj/xWH" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=G9xjjTSIj9H+0PVMVEWsKqidvbWb6RQs5oWtTQDglR0=; b=nMUj/xWH89ZRQdWMf8toCIZu4o eBPFw5n8rbMbnEdurxXv+P4WY77gJy0XnnpgM07jnjCrChv26yI9ySm9E8xqRUvQDWqb+JVd62fX+ 0wk+os0V0F8d/ow9p4LHT7XAWHHpQSPCKQ5WLAhuyGXStew4DtgwBnj7aT3JGk2fi7PnFdRu6tCf1 wBuuAGrmLrxpUiHmn0dxk4+lH/zN+iv1rFq3nnVwgl8fjBCaqdePbjTzWKIxdhPvui/7VaLLbfyc7 ML1HRt8KYDEuGzpbjh7iGIYKLpRUxTMWEGFETmWHZdNGBGbHG0RQjnAf28VpaR4F705tyIJhXAVwN wTccLxeQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wa8sv-0000000ECJR-24Ps; Thu, 18 Jun 2026 09:18:06 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id 9FD6C300578; Thu, 18 Jun 2026 11:18:04 +0200 (CEST) Date: Thu, 18 Jun 2026 11:18:04 +0200 From: Peter Zijlstra To: Xin Zhao Cc: vschneid@redhat.com, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, kprateek.nayak@amd.com, pauld@redhat.com, aiqun.yu@oss.qualcomm.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 1/2] sched/fair: Don't trigger active lb if src_rq->curr is not on_rq Message-ID: <20260618091804.GM42921@noisy.programming.kicks-ass.net> References: <20260617072151.1173416-1-jackzxcui1989@163.com> <20260617072151.1173416-2-jackzxcui1989@163.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20260617072151.1173416-2-jackzxcui1989@163.com> On Wed, Jun 17, 2026 at 03:21:50PM +0800, Xin Zhao wrote: > Active balancing needs the help by migration threads which will interrupt > task on src_rq. It has a certain impact on overall performance. Active > balancing often fails, there is a check to determine whether the current > task(say it 'curr') on src_rq can run on dst_rq. We have observed that > even that, if curr is a CFS task and on_rq is 0, the failure rate of > active balancing is very high. Below are the test data from a certain > fillback task scenario executed on a platform with 18 CPUs over 300 > seconds: > In __schedule(), before setting curr to next, during the execution of > pick_next_task(), sched_balance_rq() is called. It will unlock and then > re-lock the rq, creating "holes" during which other CPUs may see zero > rq->curr->on_rq. try_to_block_task() sets curr->on_rq to 0, and during the > rq lock "hole" in pick_next_task(), rq->curr has not yet been assigned to > next, resulting in curr->on_rq being seen as 0. > > We do not need to perform active balancing when src_rq->curr is CFS task > but on_rq is 0, as other CFS tasks have been probably checked just before. > For cases where src_rq->curr is a non-CFS task, we retain the affinity > check for dst_rq to trigger active balancing because such task is likely > to wake-up or woken-by src_rq CFS task which has similar affinity > characteristics to migrate. Also, after executing detach_tasks(), rq lock > is released. Tasks on the rq awakened during detach_tasks() may preempt > the previous CFS task. Based on my test(though not shown above), success > rate of active balancing under the condition of !fair && on_rq is 98.4%. > This scenario does not require the use of stop work, but need to add > another path to detach attach task(s). It seems not necessary enough to > add it, Valentin and Vincent have already discussed about it, see [1]. > > Additionally, sched_class field is a bit far from on_cpu in task_struct. > The previous traversal of cfs_tasks checks on_cpu in can_migrate_task(), > so the additional check for on_rq will not incur much cpu cycle loss, due > to cache locality. > > Two reasons why not check sched_class and on_rq of busiest->curr with the > cpumask_test_cpu() check: > 1. Let the PATCH not introduce new cases that skip logic for resetting > balance_interval to min_interval. > 2. The check of whether busiest cpu has been just triggered active balance > filters a bit more cases than the check of sched_class and on_rq. > > [1]: https://lore.kernel.org/lkml/20190815145107.5318-5-valentin.schneider@arm.com/ > > Signed-off-by: Xin Zhao Perhaps because it is early and I've not had enough wake-up juice, or perhaps because it is too damn warm already and my brain is pre-emptively shutting down already, I found it very hard to read your Changelog. I asked one of these fancy AI things to rephrase things, and then did a manual edit on it and ended up with the below. Notably, the changelog talks about a sched_class check, while the actual patch has none of that. Does this work for you? --- Subject: sched/fair: Don't trigger active lb if src_rq->curr is not on_rq From: Xin Zhao Date: Wed, 17 Jun 2026 15:21:50 +0800 From: Xin Zhao Active load balancing relies on migration threads, which temporarily preempt tasks on the source runqueue (src_rq). This preemption can negatively impact overall system performance. The active balancing logic includes a check to verify whether the current task (curr) on src_rq can actually run on the destination runqueue (dst_rq). We have observed that when curr is a CFS task and its on_rq flag is 0, the active balancing failure rate is exceptionally high. The following table summarizes test data collected over 300 seconds on an 18-CPU platform under a specific fillback task scenario: fair: busiest->curr->sched_class == &fair_sched_class on_rq: busiest->curr->on_rq total: active balance count triggered of correspondent type fail: fail to migrate one task in active_load_balance_cpu_stop() fair && !on_rq !fair && !on_rq domain total fail total fail cpu0 0x00003 0 0 0 0 cpu0 0x3ffff 33 33 1 1 cpu1 0x00003 0 0 0 0 cpu1 0x3ffff 42 42 0 0 cpu2 0x0003c 4 4 0 0 cpu2 0x3ffff 12 12 0 0 cpu3 0x0003c 3 3 0 0 cpu3 0x3ffff 8 7 0 0 cpu4 0x0003c 2 2 0 0 cpu4 0x3ffff 5 4 0 0 cpu5 0x0003c 4 4 0 0 cpu5 0x3ffff 8 8 0 0 cpu6 0x003c0 60 60 0 0 cpu6 0x3ffff 28 27 0 0 cpu7 0x003c0 194 184 0 0 cpu7 0x3ffff 35 35 1 1 cpu8 0x003c0 240 228 0 0 cpu8 0x3ffff 28 28 0 0 cpu9 0x003c0 0 0 0 0 cpu9 0x3ffff 10 10 0 0 cpu10 0x03c00 52 50 0 0 cpu10 0x3ffff 0 0 0 0 cpu11 0x03c00 70 68 0 0 cpu11 0x3ffff 1 1 0 0 cpu12 0x03c00 73 72 0 0 cpu12 0x3ffff 0 0 0 0 cpu13 0x03c00 79 76 0 0 cpu13 0x3ffff 0 0 0 0 cpu14 0x3c000 0 0 0 0 cpu14 0x3ffff 57 55 1 0 cpu15 0x3c000 53 52 1 0 cpu15 0x3ffff 30 29 0 0 cpu16 0x3c000 344 341 10 6 cpu16 0x3ffff 103 100 2 1 cpu17 0x3c000 183 179 2 2 cpu17 0x3ffff 78 77 0 0 sum 1839 1791 18 11 In __schedule(), before curr is updated to next, pick_next_task() invokes sched_balance_rq(). This function temporarily unlocks and relocks the runqueue, creating a window where other CPUs may observe rq->curr->on_rq as 0. We can safely skip active balancing when src_rq->curr->on_rq == 0, as other eligible tasks have likely already been evaluated. We retain the affinity check on dst_rq to trigger active balancing, since such tasks are often woken by (or wake up) tasks on src_rq that share similar affinity constraints. Furthermore, detach_tasks() releases the runqueue lock; any tasks awakened during this window may preempt the previous CFS task. My testing (data not shown) indicates that active balancing succeeds in 98.4% of cases where !fair && on_rq. This scenario does not require a stop-work callback, but would necessitate an additional detach/attach path. As Valentin and Vincent have already discussed, this addition does not appear justified at this time (see [1]). Since can_migrate_task() already checks on_cpu during the cfs_tasks traversal, adding an on_rq check will have negligible performance overhead due to cache locality. There are two reasons for not combining the on_rq check with the cpumask_test_cpu() check: - Avoiding new scenarios that would skip the logic for resetting balance_interval to min_interval. - The existing check for whether the busiest CPU recently triggered active load balancing already filters more cases than the on_rq check. [1]: https://lore.kernel.org/lkml/20190815145107.5318-5-valentin.schneider@arm.com/ Signed-off-by: Xin Zhao Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Valentin Schneider Link: https://patch.msgid.link/20260617072151.1173416-2-jackzxcui1989@163.com --- kernel/sched/fair.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13482,12 +13482,21 @@ static int sched_balance_rq(int this_cpu * ->active_balance_work. Once set, it's cleared * only after active load balance is finished. */ - if (!busiest->active_balance) { - busiest->active_balance = 1; - busiest->push_cpu = this_cpu; - active_balance = 1; - } + if (busiest->active_balance) + goto no_active_balance; + /* + * @busiest dropped its rq_lock in the middle of + * scheduling out its ->curr task (->on_rq := 0), no + * need to forcefully punt it away with active balance. + */ + if (!busiest->curr->on_rq) + goto no_active_balance; + + busiest->active_balance = 1; + busiest->push_cpu = this_cpu; + active_balance = 1; +no_active_balance: preempt_disable(); raw_spin_rq_unlock_irqrestore(busiest, flags); if (active_balance) {