From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton,
    Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao,
    Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Adam Li, Aaron Lu,
    Josh Don, Gavin Guo, Qais Yousef, Libo Chen, Luo Gengkun,
    linux-kernel@vger.kernel.org
Subject: [Patch v4 10/16] sched/cache: Fix unpaired account_llc_enqueue/dequeue
Date: Wed, 13 May 2026 13:39:21 -0700
Message-Id: <0c8c6a1571d66792a4d2ff0103ba3cc13e059046.1778703694.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Chen Yu

There is a race condition: after a task is enqueued on a runqueue,
task_llc(p) may change due to CPU hotplug, because the llc_id is
dynamically allocated and adjusted at runtime. Checking task_llc(p) to
determine whether the task is being dequeued from its preferred LLC is
therefore unreliable and can leave nr_pref_llc_running inconsistent.

Fix this by recording at enqueue time whether p is enqueued on its
preferred LLC, so that account_llc_dequeue() can pair with
account_llc_enqueue() and keep nr_pref_llc_running consistent per
runqueue.

This bug was reported by sashiko, and the fix was suggested by Prateek.

Fixes: 46afe3af7ead ("sched/cache: Track LLC-preferred tasks per runqueue")
Suggested-by: K Prateek Nayak
Signed-off-by: Chen Yu
Co-developed-by: Tim Chen
Signed-off-by: Tim Chen
---
 include/linux/sched.h |  2 ++
 init/init_task.c      |  1 +
 kernel/sched/fair.c   | 31 ++++++++++++++++++++++++++++---
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 95729670929c..2c9e8e2edde1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1410,6 +1410,8 @@ struct task_struct {
 #ifdef CONFIG_SCHED_CACHE
 	struct callback_head		cache_work;
 	int				preferred_llc;
+	/* 1: task was enqueued to its preferred LLC, 0 otherwise */
+	int				pref_llc_queued;
 #endif
 
 	struct rseq_data		rseq;
diff --git a/init/init_task.c b/init/init_task.c
index 5d90db4ff1f8..3ecd66fbd563 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -217,6 +217,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 #endif
 #ifdef CONFIG_SCHED_CACHE
 	.preferred_llc = -1,
+	.pref_llc_queued = 0,
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	.kasan_depth = 1,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 73f185ba6e48..9e6edd40cd80 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1472,15 +1472,32 @@ static bool invalid_llc_nr(struct mm_struct *mm, struct task_struct *p,
 
 static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
 {
+	int pref_llc, pref_llc_queued;
 	struct sched_domain *sd;
-	int pref_llc;
 
 	pref_llc = p->preferred_llc;
 	if (pref_llc < 0)
 		return;
 
+	pref_llc_queued = (pref_llc == task_llc(p));
 	rq->nr_llc_running++;
-	rq->nr_pref_llc_running += (pref_llc == task_llc(p));
+	rq->nr_pref_llc_running += pref_llc_queued;
+
+	/*
+	 * Record whether p is enqueued on its preferred
+	 * LLC, in order to pair with account_llc_dequeue()
+	 * and maintain a consistent nr_pref_llc_running
+	 * per runqueue.
+	 * This is necessary because a race condition exists:
+	 * after a task is enqueued on a runqueue, task_llc(p)
+	 * may change due to CPU hotplug. Therefore, checking
+	 * task_llc(p) to determine whether the task is being
+	 * dequeued from its preferred LLC is unreliable and
+	 * can produce inconsistent values; checking
+	 * p->pref_llc_queued in account_llc_dequeue()
+	 * instead is reliable.
+	 */
+	p->pref_llc_queued = pref_llc_queued;
 
 	sd = rcu_dereference_all(rq->sd);
 	if (sd && (unsigned int)pref_llc < sd->llc_max)
@@ -1497,7 +1514,15 @@ static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 		return;
 
 	rq->nr_llc_running--;
-	rq->nr_pref_llc_running -= (pref_llc == task_llc(p));
+	if (p->pref_llc_queued) {
+		rq->nr_pref_llc_running--;
+		/*
+		 * Clear the flag so that any
+		 * later query of this state
+		 * sees a consistent value.
+		 */
+		p->pref_llc_queued = 0;
+	}
 
 	sd = rcu_dereference_all(rq->sd);
 	if (sd && (unsigned int)pref_llc < sd->llc_max) {
-- 
2.32.0