From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 3 Apr 2026 14:54:24 +0200
From: Peter Zijlstra
To: K Prateek Nayak
Cc: John Stultz, LKML, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com
Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
Message-ID: <20260403125424.GA2872@noisy.programming.kicks-ass.net>
References: <36e96f87-a682-436e-aefc-13e2e5810019@amd.com>
 <20260327114844.GQ2872@noisy.programming.kicks-ass.net>
 <33e60181-1809-44e1-bc4c-8ac7f79d49d6@amd.com>
 <20260327160017.GK3738010@noisy.programming.kicks-ass.net>
 <1515d405-62fc-4952-842f-b69e2bf192c0@amd.com>
 <20260402155055.GV3738010@noisy.programming.kicks-ass.net>
 <20260403095225.GY3738010@noisy.programming.kicks-ass.net>
 <1d2d4596-93d6-4d87-babc-084b8d6c2d98@amd.com>
In-Reply-To: <1d2d4596-93d6-4d87-babc-084b8d6c2d98@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 03, 2026 at 03:55:22PM +0530, K Prateek Nayak wrote:

> >> 	if (sched_proxy_exec() && p->blocked_on) {
> >
> > So I had doubts about this lockless test of ->blocked_on, I still
> > cannot convince myself it is correct.
>
> Let me give it a try: a task's "blocked_on" starts off as a valid mutex
> and can optionally be transitioned to PROXY_WAKING (!= NULL) before
> being cleared.
>
> If blocked_on is cleared directly, the PROXY_WAKING transition never
> happens even if someone does set_task_blocked_on_waking(), since we
> bail out early if !p->blocked_on.
>
> All "p->blocked_on" transitions happen with "blocked_on_lock" held.
>
> So that begs the question: when is "blocked_on" actually cleared?
>
> 1) If the task is task_on_rq_queued(), we either clear it in schedule()
>    (find_proxy_task() to be precise) or in ttwu_runnable() - both with
>    the rq_lock held.
>
> 2) *NEW* If the task is off the rq and is waking up, it means there is
>    a ttwu_state_match() and, without proxy, the task would have woken
>    up and executed on the CPU.
>    Since the task is completely off the rq, schedule() cannot clear
>    p->blocked_on. The only other remote transition possible is to
>    PROXY_WAKING (!= NULL).
>
>    So *inspecting* the p->blocked_on relation without blocked_on_lock
>    held should be fine to know whether the task has a blocked_on
>    relation.
>
>    Only the task itself can set "p->blocked_on" to a valid mutex while
>    running on the CPU, so it is out of the question that we suddenly
>    get a transition to a new mutex while we are in schedule() or in
>    the middle of waking the task.

So my consideration was:

  __mutex_lock_common()
    ...
    raw_spin_lock(&current->blocked_lock);
    __set_task_blocked_on(current, lock)
      current->blocked_on = lock;
    set_current_state(state)
      current->__state = state;
      smp_mb();

This means we have:

  LOCK
  [W] ->blocked_on = lock
  [W] ->__state = state
  MB

Then consider:

  try_to_wake_up()
    ...
    raw_spin_lock_irqsave(&p->pi_lock, flags);
    if (ttwu_state_match(p, state, &success))
      ...
    smp_rmb();
    if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
      if (sched_proxy_exec() && p->blocked_on)

This is effectively:

  ACQUIRE
  [R] ->__state
  RMB
  [R] ->blocked_on

Combined this gives:

  CPU0				CPU1

  LOCK				ACQUIRE
  [W] ->blocked_on = lock	[R] ->__state
  [W] ->__state = state		RMB
  MB				[R] ->blocked_on

And that is *NOT* properly ordered. It is possible to observe
[W] ->__state and pass ttwu_state_match(), and NOT observe
[W] ->blocked_on and see !->blocked_on (on weakly ordered machines,
obviously).

So that does a ttwu() but will 'retain' ->blocked_on -- which violates
the model.

Which is about where I got.

That said; this race, while valid, doesn't actually do harm. Because,
as you say, it means that CPU1 is in the middle of mutex_lock() and
will observe the wakeup, cancel the block, and clean up ->blocked_on
itself.

So yeah, I think we're good.