From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <65622eeb-648b-4f9a-99ff-edb8ddf9db2f@cachyos.org>
Date: Thu, 23 Apr 2026 05:48:00 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Subject: Re: [PATCH v2] sched/idle: Fix avg_idle saturation by establishing symmetric idle entry hook
To: Masahito S, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org
Cc: dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, kprateek.nayak@amd.com, linux-kernel@vger.kernel.org, christian.loehle@arm.com
References: <20260417020654.911709-1-firelzrd@gmail.com> <20260423023322.1293923-1-firelzrd@gmail.com>
From: Eric Naim
In-Reply-To: <20260423023322.1293923-1-firelzrd@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 4/23/26 10:33 AM, Masahito S wrote:
> update_rq_avg_idle(), called from put_prev_task_idle(), computes
> rq->avg_idle as rq_clock() - rq->idle_stamp. However, idle_stamp is
> only set by sched_balance_newidle() when a CPU enters CPU_NEWLY_IDLE
> through the fair class path. When the idle task is preempted without
> sched_balance_newidle() having run (boot, hotplug, sched class
> transitions), idle_stamp remains 0, producing a delta equal to
> rq_clock(), a value in the billions of nanoseconds, which saturates
> avg_idle at 2 * max_idle_balance_cost.
>
> This inflated avg_idle prevents sched_balance_newidle() from
> early-returning (fair.c: avg_idle < max_newidle_lb_cost check),
> making it overly aggressive. The resulting excess newidle migrations
> override wake-time placement decisions made by select_idle_sibling(),
> degrading the cache locality that careful placement (recent_used_cpu,
> select_idle_core, etc.) is designed to preserve.
>
> Fix this by:
>
> 1. Adding an idle_stamp validity guard to update_rq_avg_idle(), so
>    that a zero idle_stamp is never used as a timestamp.
>
> 2. Setting idle_stamp in set_next_task_idle() when it has not already
>    been set by sched_balance_newidle(). This establishes a symmetric
>    idle entry/exit contract: set_next_task_idle() marks the start of
>    the idle period, and put_prev_task_idle() measures and records it
>    via update_rq_avg_idle().
>
> The entry hook preserves idle_stamp if sched_balance_newidle() has
> already set it, maintaining the existing semantic where balance-attempt
> duration is included in the idle measurement.
>
> Signed-off-by: Masahito Suzuki

Should this have a Fixes tag?

Fixes: 4b603f1551a73 ("sched: Update rq->avg_idle when a task is moved to an idle CPU")

> ---
> Changes in v2:
> - Added missing Signed-off-by tag (no functional changes).
>   Thanks to Eric Naim and Christian Loehle for pointing this out.
>
>  kernel/sched/core.c | 3 +++
>  kernel/sched/idle.c | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 496dff740d..ec801f731c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3633,6 +3633,9 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
>
>  void update_rq_avg_idle(struct rq *rq)
>  {
> +	if (!rq->idle_stamp)
> +		return;
> +
>  	u64 delta = rq_clock(rq) - rq->idle_stamp;
>  	u64 max = 2*rq->max_idle_balance_cost;
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index a83be0c834..9ceb7e6224 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -491,6 +491,9 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
>  	schedstat_inc(rq->sched_goidle);
>  	next->se.exec_start = rq_clock_task(rq);
>
> +	if (!rq->idle_stamp)
> +		rq->idle_stamp = rq_clock(rq);
> +
>  	/*
>  	 * rq is about to be idle, check if we need to update the
>  	 * lost_idle_time of clock_pelt

-- 
Regards,
Eric