Date: Fri, 5 Dec 2025 16:24:32 +0100
From: Peter Zijlstra
To: John Stultz
Cc: LKML, K Prateek Nayak, Haiyue Wang, Johannes Weiner, Joel Fernandes,
 Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Suren Baghdasaryan, Steven Rostedt, Ben Segall,
 Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Subject: Re: [PATCH v3] sched: Fix psi_dequeue for Proxy Execution
Message-ID: <20251205152432.GC2528459@noisy.programming.kicks-ass.net>
In-Reply-To: <20251205012721.756394-1-jstultz@google.com>

On Fri, Dec 05, 2025 at 01:27:09AM +0000, John Stultz wrote:
> Currently, if the sleep flag is set, psi_dequeue() doesn't
> change any of the psi_flags.
>
> This is because psi_task_switch() will clear TSK_ONCPU as well
> as other potential flags (TSK_RUNNING), and the assumption is
> that a voluntary sleep always consists of a task being dequeued
> followed shortly thereafter by a psi_sched_switch() call.
>
> Proxy Execution changes this expectation, as mutex-blocked tasks
> that would normally sleep stay on the runqueue. But in the case
> where the mutex-owning task goes to sleep, or the owner is on a
> remote cpu, we will then deactivate the blocked task shortly
> after.
>
> In that situation, the mutex-blocked task will have had its
> TSK_ONCPU cleared when it was switched off the cpu, but it will
> stay TSK_RUNNING. Then if we later dequeue it (as currently done
> if we hit a case find_proxy_task() can't yet handle, such as the
> case of the owner being on another rq or a sleeping owner)
> psi_dequeue() won't change any state (leaving it TSK_RUNNING),
> as it incorrectly expects a psi_task_switch() call to
> immediately follow.
>
> Later on when the task gets woken/re-enqueued, and psi_flags are
> set for TSK_RUNNING, we hit an error as the task is already
> TSK_RUNNING:
>   psi: inconsistent task state! task=188:kworker/28:0 cpu=28 psi_flags=4 clear=0 set=4
>
> To resolve this, extend the logic in psi_dequeue() so that
> if the sleep flag is set, we also check if psi_flags have
> TSK_ONCPU set (meaning the psi_task_switch() call is imminent)
> before we do the shortcut return.
>
> If TSK_ONCPU is not set, that means we've already switched away,
> and this psi_dequeue() call needs to clear the flags.
>
> Fixes: be41bde4c3a8 ("sched: Add an initial sketch of the find_proxy_task() function")
> Reported-by: K Prateek Nayak
> Closes: https://lore.kernel.org/lkml/20251117185550.365156-1-kprateek.nayak@amd.com/
> Signed-off-by: John Stultz
> Tested-by: K Prateek Nayak
> Tested-by: Haiyue Wang
> Acked-by: Johannes Weiner

Stuck this on my post-rc1 pile to look at.
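The flag bookkeeping in the quoted patch description can be modeled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: model_task, model_task_change(), model_dequeue_old(), model_dequeue_fixed(), run_scenario() and run_voluntary_sleep() are invented for illustration. The TSK_* bit values are assumed to mirror include/linux/psi_types.h (TSK_RUNNING == 4 matches "psi_flags=4" in the reported warning), and the switch-out step is simplified to clearing TSK_ONCPU and TSK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's psi bit layout; illustration only. */
#define TSK_RUNNING (1u << 2)
#define TSK_ONCPU   (1u << 4)

/* Hypothetical stand-in for the psi state carried by a task. */
struct model_task {
	unsigned int psi_flags;
};

/*
 * Applies a clear/set pair and reports whether it was consistent:
 * clearing a bit that is not set, or setting a bit that is already
 * set, is the "inconsistent task state" condition.
 */
static bool model_task_change(struct model_task *p, unsigned int clear,
			      unsigned int set)
{
	bool consistent = (p->psi_flags & clear) == clear &&
			  (p->psi_flags & set) == 0;

	p->psi_flags = (p->psi_flags & ~clear) | set;
	return consistent;
}

/* Pre-patch: any sleeping dequeue defers all flag changes to the switch. */
static bool model_dequeue_old(struct model_task *p, bool sleep)
{
	if (sleep)
		return true;
	return model_task_change(p, p->psi_flags, 0);
}

/* Patched: defer only while TSK_ONCPU shows the switch is still to come. */
static bool model_dequeue_fixed(struct model_task *p, bool sleep)
{
	if (sleep && (p->psi_flags & TSK_ONCPU))
		return true;
	return model_task_change(p, p->psi_flags, 0);
}

/*
 * Proxy Execution case: the blocked task is switched off the CPU but
 * stays on the runqueue (TSK_RUNNING), is later deactivated with the
 * sleep flag, then woken and re-enqueued. Returns true if consistent.
 */
static bool run_scenario(bool (*dequeue)(struct model_task *, bool))
{
	struct model_task t = { 0 };
	bool ok = true;

	ok &= model_task_change(&t, 0, TSK_RUNNING); /* enqueue */
	ok &= model_task_change(&t, 0, TSK_ONCPU);   /* switched in */
	ok &= model_task_change(&t, TSK_ONCPU, 0);   /* switched out, still queued */
	ok &= dequeue(&t, true);                     /* deactivated, sleep flag */
	ok &= model_task_change(&t, 0, TSK_RUNNING); /* woken, re-enqueued */
	return ok;
}

/* Ordinary voluntary sleep: dequeue while on CPU, then the switch-out. */
static bool run_voluntary_sleep(bool (*dequeue)(struct model_task *, bool))
{
	struct model_task t = { 0 };
	bool ok = true;

	ok &= model_task_change(&t, 0, TSK_RUNNING);
	ok &= model_task_change(&t, 0, TSK_ONCPU);
	ok &= dequeue(&t, true);                     /* TSK_ONCPU set: shortcut */
	ok &= model_task_change(&t, TSK_ONCPU | TSK_RUNNING, 0); /* simplified switch */
	return ok;
}
```

In this model, run_scenario(model_dequeue_old) flags an inconsistency on the re-enqueue, like the reported warning, while run_scenario(model_dequeue_fixed) does not. Both variants keep the early return for an ordinary voluntary sleep, because TSK_ONCPU is still set at dequeue time.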