Date: Fri, 3 Apr 2026 11:18:51 +0200
From: Peter Zijlstra
To: John Stultz
Cc: K Prateek Nayak, LKML, Joel Fernandes, Qais Yousef, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon,
	Waiman Long, Boqun Feng, "Paul E.
McKenney" , Metin Kaya , Xuewen Yan , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution Message-ID: <20260403091851.GX3738010@noisy.programming.kicks-ass.net> References: <20260324191337.1841376-1-jstultz@google.com> <36e96f87-a682-436e-aefc-13e2e5810019@amd.com> <20260327114844.GQ2872@noisy.programming.kicks-ass.net> <33e60181-1809-44e1-bc4c-8ac7f79d49d6@amd.com> <20260327160017.GK3738010@noisy.programming.kicks-ass.net> <1515d405-62fc-4952-842f-b69e2bf192c0@amd.com> <20260402155055.GV3738010@noisy.programming.kicks-ass.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Thu, Apr 02, 2026 at 11:31:56AM -0700, John Stultz wrote: > So I like getting rid of proxy_force_return(), but its not clear to me > that proxy_deactivate() is what we want to do in these > find_proxy_task() edge cases. > > It feels like if we are already racing with ttwu, deactivating the > task seems like it might open more windows where we might lose the > wakeup. > > In fact, the whole reason we have proxy_force_return() is that earlier > in the proxy-exec development, when we hit those edge cases we usually > would return proxy_reschedule_idle() just to drop the rq lock and let > ttwu do its thing, but there kept on being cases where we would end up > with lost wakeups. > > But I'll give this a shot (and will integrate your ttwu_runnable > cleanups regardless) and see how it does. So the main idea is that ttwu() will be in charge of migrating back, as one an only means of doing so. This includes signals and unlock and everything. This means that there are two main cases: - ttwu() happens first and finds the task on_rq; we hit ttwu_runnable(). - schedule() happens first and hits this task without means of going forward. Lets do the second first; this is handled by doing dequeue. It must take the task off the runqueue, so it can select another task and make progress. But this had me hit those proxy_deactivate() failure cases, those must not exist. The first is that deactivate can encounter TASK_RUNNING, this must not be, because TASK_RUNNING would mean ttwu() has happened and that would then have sorted everything out. The second is that signal case, which again should not happen, because the signal ttwu() should sort it all out. We just want to take the task off the runqueue here. Now the ttwu() case. So if it is first it will hit ttwu_runnable(), but we don't want this case. So instead we dequeue the task and say: 'nope, wasn't on_rq', which proceeds into the 'normal' wakeup path which does a migration. And note, that if proxy_deactivate() happened first, we simply skip that first step and directly go into the normal wakeup path. There is no worry about ttwu() going missing, ttwu() is changed to make sure any ->TASK_RUNNING transition ensures ->blocked_on gets cleared and the task ends up on a suitable CPU. Anyway, that is the high level idea, like said I didn't get around to doing all the details (and I clearly missed a few :-).