From: wen.yang@linux.dev
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt
Cc: Wen Yang, linux-kernel@vger.kernel.org
Subject: [PATCH 0/2] sched: expose RT throttling info to facilitate priority reversal processing
Date: Tue, 2 Dec 2025 13:51:17 +0800

From: Wen Yang

This series helps address a priority inversion in which a CFS task waits
on an rtmutex, the RT task holding the lock cannot run because of RT
throttling, and higher-priority RT tasks keep triggering RT throttling
through sustained CPU consumption.

Details:

A priority inversion can occur when a CFS task is starved due to RT
throttling. The scenario is as follows:

0. An rtmutex (e.g., softirq_ctrl.lock) is contended by both CFS tasks
   (e.g., ksoftirqd) and RT tasks (e.g., ktimer).
1. An RT task 'A' (e.g., ktimer) acquires the rtmutex.
2. A CFS task 'B' (e.g., ksoftirqd) attempts to acquire the same rtmutex
   and blocks.
3. A higher-priority RT task 'C' (e.g., stress-ng) runs for an extended
   period, preempting task 'A' and causing the RT runqueue to be
   throttled.
4. Once the RT runqueue is throttled, CFS task 'B' should run, but it
   remains blocked because the lock is still held by the non-running RT
   task 'A'. This can even lead to the CPU going idle.
5. When the throttle period ends, the high-priority RT task 'C' resumes
   execution, and the cycle repeats, leading to indefinite starvation of
   CFS task 'B'.

A typical stack trace for the blocked ksoftirqd shows it in a 'D'
(TASK_RTLOCK_WAIT) state, waiting on the lock:

  ksoftirqd/5-61 [005] d...211 58212.064160: sched_switch: prev_comm=ksoftirqd/5 prev_pid=61 prev_prio=120 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120
  ksoftirqd/5-61 [005] d...211 58212.064161:
  => __schedule
  => schedule_rtlock
  => rtlock_slowlock_locked
  => rt_spin_lock
  => __local_bh_disable_ip
  => run_ksoftirqd
  => smpboot_thread_fn
  => kthread
  => ret_from_fork

These two patches expose the TASK_RTLOCK_WAIT state and add
throttle_count to rt_rq for monitoring in /proc/sched_debug.

User-space tools like stalld can use this information to detect and
resolve the inversion, for example by boosting the lock holder or by
adjusting the priority of the blocked CFS task in the TASK_RTLOCK_WAIT
state.

Wen Yang (2):
  sched/debug: add explicit TASK_RTLOCK_WAIT printing
  sched/rt: add RT throttle statistics

 fs/proc/array.c              |  3 ++-
 include/linux/sched.h        | 21 +++++++++------------
 include/trace/events/sched.h |  1 +
 kernel/sched/debug.c         |  1 +
 kernel/sched/rt.c            |  1 +
 kernel/sched/sched.h         |  1 +
 6 files changed, 15 insertions(+), 13 deletions(-)

Cc: linux-kernel@vger.kernel.org

-- 
2.25.1
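
As an illustration only, and not part of this series: the sched_switch
event quoted in the cover letter can be checked mechanically for the
symptom described in step 4, where a non-RT task leaves the CPU in state
'D' and the CPU goes idle. The sketch below is a rough heuristic under
stated assumptions. The helper names (parse_sched_switch,
looks_like_starved_waiter) are hypothetical, the regular expression
assumes comm values without spaces, and it matches the plain 'D' state
letter seen in the trace above rather than whatever patch 1 ends up
printing for TASK_RTLOCK_WAIT.

```python
import re

# Matches the key=value payload of an ftrace sched_switch event, as in the
# trace quoted in the cover letter. Assumption: comm values have no spaces.
SWITCH_RE = re.compile(
    r"prev_comm=(?P<prev_comm>\S+) prev_pid=(?P<prev_pid>\d+) "
    r"prev_prio=(?P<prev_prio>\d+) prev_state=(?P<prev_state>\S+) ==> "
    r"next_comm=(?P<next_comm>\S+) next_pid=(?P<next_pid>\d+) "
    r"next_prio=(?P<next_prio>\d+)"
)


def parse_sched_switch(line):
    """Return the event fields as a dict, or None if the line does not match."""
    m = SWITCH_RE.search(line)
    if m is None:
        return None
    ev = m.groupdict()
    for key in ("prev_pid", "prev_prio", "next_pid", "next_prio"):
        ev[key] = int(ev[key])
    return ev


def looks_like_starved_waiter(ev):
    """Heuristic (an assumption, not the series' method): a non-RT task
    (trace prio >= 100) is switched out in state 'D' and the CPU picks
    the idle task (pid 0)."""
    return (
        ev["prev_prio"] >= 100
        and ev["prev_state"] == "D"
        and ev["next_pid"] == 0
    )


# Sample line copied from the trace in the cover letter.
SAMPLE = (
    "ksoftirqd/5-61 [005] d...211 58212.064160: sched_switch: "
    "prev_comm=ksoftirqd/5 prev_pid=61 prev_prio=120 prev_state=D ==> "
    "next_comm=swapper/5 next_pid=0 next_prio=120"
)

event = parse_sched_switch(SAMPLE)
print(event["prev_comm"], looks_like_starved_waiter(event))
```

A match on its own does not prove the inversion, since ordinary
uninterruptible sleeps also show up as 'D'. A real monitor would combine
it with the RT throttle statistics this series adds.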