From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6725D22A4EE for ; Fri, 3 Apr 2026 14:38:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775227116; cv=none; b=FIljQKFVzRSzYw6nrh7T5B6S87QRWu6dDVckvAYtqTIzQ7ZX+4AZXhZ+chypcSkDJptEGTzoakp/r++TIZrNYXPVYClY7LY9b/zE4vrVSKl4v+jfMt/hdY3K1lynjSIIe83Sf/iz85lcUJqSrPOWhOmxmWe60v4fsrvvxFjIFpw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775227116; c=relaxed/simple; bh=En/3xiMQ7h73eOx5Gq0638B9TSMQcZPMznHubvF6QAY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=P/+lBDpzKy+o2Jd54LuhDbcD9FTGu8diTfyxditnojbi9O0HsMkk+NRuzuFroZ4I96LEGifk+nGshLopOecHbAQxkrSLiaz7flt6TchGHKZsvjwh2k4udK5QZDODor+LDGRp30VmtxJsb4KQWFR0xbJWUwQnFD1tGfN4EfQMjvw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=jyGGq/LA; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="jyGGq/LA" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=xjv/MZtT9tLTkwg0Z7RFGLMtkV6HOWfHFz8ffCdnKmE=; b=jyGGq/LAMAkawXLqHerUMT7Rww yktDxpZSP9QuEBRseEHFzmUkmKJne1xKAfWGxAYkJjY4lcXwRblMfbGF2ILwFrIUuxJbR8mdR2HTQ upjChYOAAR5LiN0d56A6oFbIfOapMuI+VBaWwX/ARbqX50GJx6V/isfmoHhRtWi6j8Z1aWgjm/Qke 3GOpu3gyWOBT25ktsmBl0jii8RRpq8A1dA75+OnOYnd+P3ruaezqJplZhHza+FF63Q6UEuTCXif+a 8kRM1HcKiWALdRzB7aLAn/q0U8DfcT1If4BferWE44bBr4iypgWKJu1LrOfKJph9Y3yfgyWLW0yg9 blllXKfA==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1w8ffA-00000004GTa-2OiI; Fri, 03 Apr 2026 14:38:21 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id 1E7733004F8; Fri, 03 Apr 2026 16:38:19 +0200 (CEST) Date: Fri, 3 Apr 2026 16:38:19 +0200 From: Peter Zijlstra To: K Prateek Nayak Cc: John Stultz , LKML , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution Message-ID: <20260403143819.GC2872@noisy.programming.kicks-ass.net> References: <33e60181-1809-44e1-bc4c-8ac7f79d49d6@amd.com> <20260327160017.GK3738010@noisy.programming.kicks-ass.net> <1515d405-62fc-4952-842f-b69e2bf192c0@amd.com> <20260402155055.GV3738010@noisy.programming.kicks-ass.net> <20260403095225.GY3738010@noisy.programming.kicks-ass.net> <1d2d4596-93d6-4d87-babc-084b8d6c2d98@amd.com> <20260403112810.GG3738786@noisy.programming.kicks-ass.net> <5e2337e7-24d8-4cc7-a86d-b8f5a19fb3ce@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5e2337e7-24d8-4cc7-a86d-b8f5a19fb3ce@amd.com> On Fri, Apr 03, 2026 at 07:13:29PM +0530, K Prateek Nayak wrote: > Hello Peter, > > On 4/3/2026 4:58 PM, Peter Zijlstra wrote: > > On Fri, Apr 03, 2026 at 03:55:22PM +0530, K Prateek Nayak wrote: > >>>> @@ -4256,6 +4277,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) > >>>> */ > >>>> smp_cond_load_acquire(&p->on_cpu, !VAL); > >>>> > >>>> + /* > >>>> + * We never clear the blocked_on relation on proxy_deactivate. > >>>> + * If we don't clear it here, we have TASK_RUNNING + p->blocked_on > >>>> + * when waking up. Since this is a fully blocked, off CPU task > >>>> + * waking up, it should be safe to clear the blocked_on relation. > >>>> + */ > >>>> + if (task_is_blocked(p)) > >>>> + clear_task_blocked_on(p, NULL); > >>>> + > >>> > >>> Aah, yes! This is when find_proxy_task() hits deactivate() for us and we > >>> skip ttwu_runnable(). We still need to clear ->blocked_on. > > > > I wonder, should we have proxy_deactivate() do this instead? > > That is one way to tackle that, yes! OK, lets put it there. At that site we already know task_is_blocked() and we get less noise in the wakeup path. Or should we perhaps put it in block_task() itself? The moment you're off the runqueue, ->blocked_on becomes meaningless. > >>> > >>>> cpu = select_task_rq(p, p->wake_cpu, &wake_flags); > >>>> if (task_cpu(p) != cpu) { > >>>> if (p->in_iowait) { > >>>> @@ -6977,6 +7007,10 @@ static void __sched notrace __schedule(int sched_mode) > >>>> switch_count = &prev->nvcsw; > >>>> } > >>>> > >>>> + /* See: https://github.com/kudureranganath/linux/commit/0d6a01bb19db39f045d6f0f5fb4d196500091637 */ > >>>> + if (!prev_state && task_is_blocked(prev)) > >>>> + clear_task_blocked_on(prev, NULL); > >>>> + > >>> > >>> This one confuses me, ttwu() should never results in ->blocked_on being > >>> set. > >> > >> This is from the signal_pending_state() in try_to_block_task() putting > >> prev to TASK_RUNNING while still having p->blocked_on. > > > > Ooh, I misread that. I saw that set_task_blocked_on_waking(, NULL) in > > there and though it would clear. Damn, sometimes reading is so very > > hard... > > > > Anyhow, with my changes on, try_to_block_task() is only ever called from > > __schedule() on current. This means this could in fact be > > clear_task_blocked_on(), right? > > Yes, that should work too but there is also the case you mentioned on > the parallel thread - if ttwu() doesn't see p->blocked_on, it won't > clear it and if ttwu_runnable() wins we can simply go into > __schedule() with TASK_RUNNING + p->blocked_on with no guarantee that > same task will be picked again. Then the task gets preempted and stays > in an illegal state. > > Clearing of ->blocked_on in schedule also safeguards against that race: > > o Scenario 1: ttwu_runnable() wins > > CPU0 CPU1 > > LOCK ACQUIRE > [W] ->blocked_on = lock [R] ->__state > [W] ->__state = state; RMB > [R] ->blocked_on > [W] ->__state = RUNNING > > /* Synchronized by rq_lock */ > > MB > [R] if (!->__state && > [R] ->blocked_on) > [W] ->blocked_on = NULL; /* Safeguard */ > > > o Scenario 2: __schedule() wins > > CPU0 CPU1 > > LOCK ACQUIRE > [W] ->blocked_on = lock [R] ->__state > [W] ->__state = state; > > /* Synchronized by rq_lock */ > MB > [W] __block_task() > > MB > /* Full wakeup. */ > [R] if (->blocked_on) > [W] ->blocked_on = NULL > Oh Boohoo :-( Yes, you're quite right. Is find_proxy_task() that is affected, so this fixup should perhaps live in the existing sched_proxy_exec() branch? Something like so? --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2222,6 +2222,13 @@ static bool block_task(struct rq *rq, st { int flags = DEQUEUE_NOCLOCK; + /* + * We're being taken off the runqueue, cannot still be blocked_on + * anything. This also means that delay_dequeue can not have + * blocked_on. + */ + clear_task_blocked_on(p, NULL); + p->sched_contributes_to_load = (task_state & TASK_UNINTERRUPTIBLE) && !(task_state & TASK_NOLOAD) && @@ -6614,7 +6621,7 @@ static bool try_to_block_task(struct rq if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); *task_state_p = TASK_RUNNING; - set_task_blocked_on_waking(p, NULL); + clear_task_blocked_on(p, NULL); return false; } @@ -7043,6 +7050,14 @@ static void __sched notrace __schedule(i if (sched_proxy_exec()) { struct task_struct *prev_donor = rq->donor; + /* + * There is a race between ttwu() and __mutex_lock_common() + * where it is possible for the mutex code to call into + * schedule() with ->blocked_on still set. + */ + if (!prev_state && prev->blocked_on) + clear_task_blocked_on(prev, NULL); + rq_set_donor(rq, next); if (unlikely(next->blocked_on)) { next = find_proxy_task(rq, next, &rf);