Date: Tue, 21 Jan 2025 08:23:03 +0100
From: Ingo Molnar
To: Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Peter Zijlstra,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Shrikanth Hegde, Tejun Heo
Subject: [GIT PULL v2] Scheduler enhancements for v6.14
X-Mailing-List: linux-kernel@vger.kernel.org

* Mathieu Desnoyers wrote:

> On 20-Jan-2025 12:07:41 PM, Ingo Molnar wrote:
> >
> > Linus,
> >
> > Please pull the latest sched/core Git tree from:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2025-01-20
> >
> >    # HEAD: 7d9da040575b343085287686fa902a5b2d43c7ca psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
> >
> > Scheduler enhancements for v6.14:
> >
> > [...]
>
> >   - RSEQ enhancements:
> >
> >     - Validate read-only fields under DEBUG_RSEQ config
> >       (Mathieu Desnoyers)
>
> FYI, a regression introduced by this commit was reported by s390x
> glibc developers testing against linux-next:
>
> https://sourceware.org/pipermail/libc-alpha/2025-January/163993.html
>
> I've sent a fix here:
>
> https://lore.kernel.org/lkml/20250116205956.836074-1-mathieu.desnoyers@efficios.com/
>
> The commit introducing the issue is in this PR, but not the fix.

Indeed - with the bug RSEQ_FLAG_UNREGISTER would fail with an incorrect
-EFAULT return. I've applied your fix, and updated the pull request for
Linus further below.

If Linus has already pulled I'll send a fixes pull request separately,
or Linus can apply the fix from email directly:

  Acked-by: Ingo Molnar

Or he can pull the sched-core-2025-01-21 tag below safely on top of
sched-core-2025-01-20, which will result in a diffstat of:

  Mathieu Desnoyers (1):
      rseq: Fix rseq unregistration regression

   kernel/rseq.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

Since I booted the scheduler tree on generic desktops and it was tested
on other systems as well and nothing appeared to be broken, I presume
RSEQ_FLAG_UNREGISTER is used only in libc syscall-testcases and in
specific applications?

Thanks,

	Ingo

===================================>

Linus,

Please pull the latest sched/core Git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2025-01-21

   # HEAD: 40724ecafccb1fb62b66264854e8c3ad394c8f3d rseq: Fix rseq unregistration regression

Scheduler enhancements for v6.14:

 - Fair scheduler (SCHED_FAIR) enhancements:

   - Behavioral improvements:
     - Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)

   - Delayed-dequeue enhancements & fixes: (Vincent Guittot)
     - Rename h_nr_running into h_nr_queued
     - Add new cfs_rq.h_nr_runnable
     - Use the new cfs_rq.h_nr_runnable
     - Remove unused cfs_rq.h_nr_delayed
     - Rename cfs_rq.idle_h_nr_running into h_nr_idle
     - Remove unused cfs_rq.idle_nr_running
     - Rename cfs_rq.nr_running into nr_queued
     - Do not try to migrate delayed dequeue task
     - Fix variable declaration position
     - Encapsulate set custom slice in a __setparam_fair() function

   - Fixes:
     - Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
     - Fix CPU bandwidth limit bypass during CPU hotplug
       (Vishal Chourasia)

   - Cleanups:
     - Clean up in migrate_degrades_locality() to improve readability
       (Peter Zijlstra)
     - Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
     - Update comments after sched_tick() rename
       (Sebastian Andrzej Siewior)
     - Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
       (Valentin Schneider)

 - Deadline scheduler (SCHED_DL) enhancements:

   - Restore dl_server bandwidth on non-destructive root domain changes
     (Juri Lelli)
   - Correctly account for allocated bandwidth during hotplug
     (Juri Lelli)
   - Check bandwidth overflow earlier for hotplug (Juri Lelli)
   - Clean up goto label in pick_earliest_pushable_dl_task()
     (John Stultz)
   - Consolidate timer cancellation (Wander Lairson Costa)

 - Load-balancer enhancements:

   - Improve performance by prioritizing migrating eligible tasks in
     sched_balance_rq() (Hao Jia)
   - Do not compute NUMA Balancing stats unnecessarily during
     load-balancing (K Prateek Nayak)
   - Do not compute overloaded status unnecessarily during
     load-balancing (K Prateek Nayak)

 - Generic scheduling code enhancements:

   - Use READ_ONCE() in task_on_rq_queued(), to consistently use the
     WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)

 - Isolated CPUs support enhancements: (Waiman Long)

   - Make "isolcpus=nohz" equivalent to "nohz_full"
   - Consolidate housekeeping cpumasks that are always identical
   - Remove HK_TYPE_SCHED
   - Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE

 - RSEQ enhancements:

   - Validate read-only fields under DEBUG_RSEQ config
     (Mathieu Desnoyers)

 - PSI enhancements:

   - Fix race when task wakes up before psi_sched_switch() adjusts
     flags (Chengming Zhou)

 - IRQ time accounting performance enhancements: (Yafang Shao)

   - Define sched_clock_irqtime as static key
   - Don't account irq time if sched_clock_irqtime is disabled

 - Virtual machine scheduling enhancements:

   - Don't try to catch up excess steal time (Suleiman Souhlal)

 - Heterogeneous x86 CPU scheduling enhancements: (K Prateek Nayak)

   - Convert "sysctl_sched_itmt_enabled" to boolean
   - Use guard() for itmt_update_mutex
   - Move the "sched_itmt_enabled" sysctl to debugfs
   - Remove x86_smt_flags and use cpu_smt_flags directly
   - Use x86_sched_itmt_flags for PKG domain unconditionally

 - Debugging code & instrumentation enhancements:

   - Change need_resched warnings to pr_err() (David Rientjes)
   - Print domain name in /proc/schedstat (K Prateek Nayak)
   - Fix value reported by hot tasks pulled in /proc/schedstat
     (Peter Zijlstra)
   - Report the different kinds of imbalances in /proc/schedstat
     (Swapnil Sapkal)
   - Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
   - Update Schedstat version to 17 (Swapnil Sapkal)

 Thanks,

	Ingo

------------------>
Andy Shevchenko (1):
      sched/fair: Mark m*_vruntime() with __maybe_unused

Chengming Zhou (1):
      psi: Fix race when task wakes up before psi_sched_switch() adjusts flags

David Rientjes (1):
      sched/debug: Change need_resched warnings to pr_err

Hao Jia (1):
      sched/core: Prioritize migrating eligible tasks in sched_balance_rq()

Harshit Agarwal (1):
      sched: add READ_ONCE to task_on_rq_queued

John Stultz (1):
      sched: deadline: Cleanup goto label in pick_earliest_pushable_dl_task

Juri Lelli (3):
      sched/deadline: Restore dl_server bandwidth on non-destructive root domain changes
      sched/deadline: Correctly account for allocated bandwidth during hotplug
      sched/deadline: Check bandwidth overflow earlier for hotplug

K Prateek Nayak (8):
      sched/stats: Print domain name in /proc/schedstat
      x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
      x86/itmt: Use guard() for itmt_update_mutex
      x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
      x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
      x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
      sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
      sched/fair: Do not compute overloaded status unnecessarily during lb

Mathieu Desnoyers (2):
      rseq: Validate read-only fields under DEBUG_RSEQ config
      rseq: Fix rseq unregistration regression

Peter Zijlstra (3):
      sched/fair: Untangle NEXT_BUDDY and pick_next_task()
      sched/fair: Fix value reported by hot tasks pulled in /proc/schedstat
      sched/fair: Cleanup in migrate_degrades_locality() to improve readability

Sebastian Andrzej Siewior (1):
      sched/fair: Update comments after sched_tick() rename.

Suleiman Souhlal (1):
      sched: Don't try to catch up excess steal time.

Swapnil Sapkal (3):
      sched: Report the different kinds of imbalances in /proc/schedstat
      sched: Move sched domain name out of CONFIG_SCHED_DEBUG
      docs: Update Schedstat version to 17

Tianchen Ding (1):
      sched: Fix race between yield_to() and try_to_wake_up()

Valentin Schneider (1):
      sched/fair: Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()

Vincent Guittot (10):
      sched/fair: Rename h_nr_running into h_nr_queued
      sched/fair: Add new cfs_rq.h_nr_runnable
      sched/fair: Use the new cfs_rq.h_nr_runnable
      sched/fair: Removed unsued cfs_rq.h_nr_delayed
      sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
      sched/fair: Remove unused cfs_rq.idle_nr_running
      sched/fair: Rename cfs_rq.nr_running into nr_queued
      sched/fair: Do not try to migrate delayed dequeue task
      sched/fair: Fix variable declaration position
      sched/fair: Encapsulate set custom slice in a __setparam_fair() function

Vishal Chourasia (1):
      sched/fair: Fix CPU bandwidth limit bypass during CPU hotplug

Waiman Long (4):
      sched/core: Remove HK_TYPE_SCHED
      sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
      sched/isolation: Consolidate housekeeping cpumasks that are always identical
      sched: Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE

Wander Lairson Costa (1):
      sched/deadline: Consolidate Timer Cancellation

Yafang Shao (3):
      sched: Define sched_clock_irqtime as static key
      sched: Don't account irq time if sched_clock_irqtime is disabled
      sched, psi: Don't account irq time if sched_clock_irqtime is disabled


 Documentation/admin-guide/kernel-parameters.txt |   4 +-
 Documentation/scheduler/sched-stats.rst         | 126 ++++---
 arch/x86/include/asm/topology.h                 |   4 +-
 arch/x86/kernel/itmt.c                          |  81 ++---
 arch/x86/kernel/smpboot.c                       |  19 +-
 include/linux/sched.h                           |  10 +
 include/linux/sched/isolation.h                 |  21 +-
 include/linux/sched/topology.h                  |  13 +-
 kernel/rseq.c                                   |  98 ++++++
 kernel/sched/core.c                             |  94 +++--
 kernel/sched/cputime.c                          |  16 +-
 kernel/sched/deadline.c                         | 119 +++++--
 kernel/sched/debug.c                            |  25 +-
 kernel/sched/fair.c                             | 444 ++++++++++++++----------
 kernel/sched/features.h                         |   9 +
 kernel/sched/isolation.c                        |  22 +-
 kernel/sched/pelt.c                             |   4 +-
 kernel/sched/psi.c                              |   7 +-
 kernel/sched/sched.h                            |  37 +-
 kernel/sched/stats.c                            |  11 +-
 kernel/sched/stats.h                            |   4 +
 kernel/sched/syscalls.c                         |  18 +-
 kernel/sched/topology.c                         |  12 +-
 23 files changed, 720 insertions(+), 478 deletions(-)