Date: Mon, 24 Nov 2025 22:30:52 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20251124223111.3616950-1-jstultz@google.com>
Subject: [PATCH v24 00/11] Donor Migration for Proxy Execution (v24)
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com

Hey All,

Yet another iteration on the next chunk of the Proxy Exec
series: Donor Migration

This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock
owners.

In this portion of the series, I'm only submitting for review
and consideration the logic that allows us to do donor
(blocked waiter) migration. This requires some additional
changes to locking and extra state tracking to ensure we don't
accidentally run a migrated donor on a cpu it isn't affined to,
as well as some extra handling to deal with balance callback
state that needs to be reset when we decide to pick a different
task after doing donor migration.

In the last iteration, K Prateek provided some really great
review feedback, so I've tried to integrate all of his
suggested cleanups and improvements. Many thanks again to
K Prateek!

Additionally, in my continued efforts to make Proxy Execution
and sched_ext play well together, I realized a bug I saw with
sched_ext was actually a larger issue around the sched class
implementations' assumption that the "prev" argument passed in
from __schedule() is stable across rq lock drops. Without Proxy
Exec, "prev" is always "current" and is on the cpu, so this
assumption held. But with Proxy Exec, "prev" is "rq->donor",
and if the rq lock is dropped, the rq->donor may be woken up on
another cpu and be return-migrated away, with rq->donor being
set to idle. So I've gone through the class schedulers for both
pick_next_task() and prev_balance() and removed the prev
argument, reworking the functions to sample rq->donor,
particularly after a rq lock drop.

New in this iteration:
* Reworking pick_next_task() and prev_balance() to not pass the
  prev argument, which might go stale across rq lock drops
* Change to avoid a null ptr traversal when a task calls yield
  while rq->donor is idle
* _Lots_ of cleanups and improvements suggested by K Prateek
* Fix for an edge case where select_task_rq() chooses the
  current cpu and we don't call set_task_cpu(), which caused
  wake_cpu to go stale

I'd love to get further feedback on any place where these
patches are confusing, or could use additional clarifications.

In the full series, there are a number of fixes for issues found
while enabling and testing with sched_ext, along with another
revision of Suleiman's rwsem support. I'd appreciate any
testing or comments that folks have with the full set.

You can find the full Proxy Exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v24-6.18-rc6
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v24-6.18-rc6

Issues still to address with the full series:
* Continue working to get sched_ext to be ok with Proxy
  Execution enabled.
* I've reproduced the performance regression K Prateek Nayak
  found with the full series. I'm hoping to work to understand
  and narrow the issue down soon.
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)

Future work:
* Expand to more locking primitives: Figuring out pi-futexes
  would be good, and using proxy for Binder PI is something else
  we're exploring.
* Eventually: Work to replace rt_mutexes and get things happy
  with PREEMPT_RT

I'd really appreciate any feedback or review thoughts on the
full series as well. I'm trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I'm all ears.

Credit/Disclaimer:
--------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit:

First described in a paper[1] by Watkins, Straub, Niehaus, then
from patches from Peter Zijlstra, extended with lots of work by
Juri Lelli, Valentin Schneider, and Connor O'Brien. (And thank
you to Steven Rostedt for providing additional details here!)
Thanks also to Joel Fernandes, Dietmar Eggemann, Metin Kaya,
K Prateek Nayak and Suleiman Souhlal for their substantial
review, suggestion, and patch contributions.

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are surely
mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com

John Stultz (10):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched: Fix modifying donor->blocked on without proper locking
  sched/locking: Add special p->blocked_on==PROXY_WAKING value for
    proxy return-migration
  sched: Add assert_balance_callbacks_empty helper
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Rework pick_next_task() and prev_balance() to avoid stale
    prev references
  sched: Avoid donor->sched_class->yield_task() null traversal
  sched: Have try_to_wake_up() handle return-migration for
    PROXY_WAKING case
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        |  95 +++++---
 init/init_task.c             |   5 +
 kernel/fork.c                |   5 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  82 +++++--
 kernel/locking/mutex.h       |   6 +
 kernel/locking/ww_mutex.h    |  16 +-
 kernel/sched/core.c          | 418 +++++++++++++++++++++++++++++++----
 kernel/sched/deadline.c      |   8 +-
 kernel/sched/ext.c           |   8 +-
 kernel/sched/fair.c          |  15 +-
 kernel/sched/idle.c          |   2 +-
 kernel/sched/rt.c            |   8 +-
 kernel/sched/sched.h         |  17 +-
 kernel/sched/stop_task.c     |   2 +-
 kernel/sched/syscalls.c      |   3 +-
 16 files changed, 582 insertions(+), 112 deletions(-)

-- 
2.52.0.487.g5c8c507ade-goog
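
For anyone new to the series, here is a rough userspace sketch of the
stale "prev" problem described in the cover letter. It is not code
from the patches. Every toy_* name is hypothetical, the rq lock is
modeled as a plain flag, and the remote wakeup is simulated inline
rather than happening on another cpu. It only tries to show why
sampling rq->donor after re-taking the lock is safer than carrying a
prev pointer across the lock drop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's real definitions. */
struct toy_task {
	const char *comm;
};

struct toy_rq {
	bool locked;            /* models holding the rq lock */
	struct toy_task *donor; /* models rq->donor */
	struct toy_task idle;   /* models the per-rq idle task */
};

static void toy_rq_lock(struct toy_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/*
 * Simulates what another cpu may do while our rq lock is dropped:
 * the blocked donor is woken and return-migrated away, and
 * rq->donor is set to idle.
 */
static void toy_remote_wakeup(struct toy_rq *rq)
{
	assert(!rq->locked);
	rq->donor = &rq->idle;
}

/* Models a class hook that has to drop and re-take the rq lock. */
static void toy_drop_and_retake_lock(struct toy_rq *rq)
{
	toy_rq_unlock(rq);
	toy_remote_wakeup(rq);
	toy_rq_lock(rq);
}

/* Old pattern: the caller samples prev and passes it in. */
static struct toy_task *toy_balance_with_prev(struct toy_rq *rq,
					      struct toy_task *prev)
{
	toy_drop_and_retake_lock(rq);
	return prev; /* may no longer match rq->donor */
}

/* Reworked pattern: no prev argument, sample rq->donor after relock. */
static struct toy_task *toy_balance_resample(struct toy_rq *rq)
{
	toy_drop_and_retake_lock(rq);
	return rq->donor;
}
```

Called with the lock held and rq->donor pointing at a blocked waiter,
toy_balance_with_prev() hands back the waiter even though rq->donor
has become idle in the meantime, while toy_balance_resample() returns
whatever rq->donor is once the lock is held again.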