From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 00/29] Hierarchical Constant Bandwidth Server
Date: Thu, 30 Apr 2026 23:38:04 +0200
Message-ID: <20260430213835.62217-1-yurand2000@gmail.com>

Hello,

This is v5 of the Hierarchical Constant Bandwidth Server (HCBS) patchset. HCBS aims to replace the current RT_GROUP_SCHED mechanism with something more robust and theoretically sound.

The patchset was presented at OSPM25 and OSPM26 (https://retis.sssup.it/ospm-summit/). A summary of its inner workings can be found at https://lwn.net/Articles/1021332/ .

Links to the previous versions of this patchset are at the bottom of this message. Version 1 in particular describes in more detail what this patchset is about and how it is implemented.

This v5 addresses the reviewers' comments and introduces the following notable changes:

- Update to kernel version 7.0.
- General refactoring and cleanups, with extensive use of lock guards for cleaner code.
- Add missing RCU read-side sections in the deadline.c and rt.c code.
- Include a fix for the non-deferred deadline server logic (patch 1).
- Account HCBS deadline servers along with all the active tasks while the servers are active. This ensures correct behaviour for servers that have just been replenished but have no tasks to run.
- Update and reuse __checkparam_dl to also check the HCBS servers' parameters.
- Update the default sysctl_sched_rt_runtime to 1s, the same value as sysctl_sched_rt_period. These parameters only manage the bandwidth of deadline tasks and servers, not the actual parameters of the fair (and ext) servers.
- Release cgroup resources early in unregister_rt_sched_group. This reduces from two to one the number of RCU grace periods to wait for before the reserved deadline bandwidth is released.
- Remove rt_server_try_pull, since tasks can now be pulled directly in rt_server_pick on server replenishment.
- Remove the dl_server_stop call when emptying a cgroup's runqueue. The server is stopped anyway on the next server pick (if the pull operation fails).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Summary of the patches:

  1) Replenishment logic fix for non-deferred deadline servers.
  2-5) Preparation patches, so that the RT class code can be used for both normal and cgroup scheduling.
  6-17) Implementation of HCBS, without migration and with a single-level hierarchy. The old RT_GROUP_SCHED code is removed.
  18-19) Remove cgroups v1 support in favour of v2.
  20) Add support for deeper hierarchies.
  21) Update the default bandwidth for deadline entities.
  22-26) Add support for task migration.
  27) Documentation for HCBS.
  28-29) Optional patches adding debug BUG_ONs.

Updates from v4:

- Rebase to latest tip/master.
- General rebasing/cleanup.
- Update the default sysctl_sched_rt_runtime to 1s, the same as the period.
- Fix the non-deferred deadline server replenishment logic.
- Add missing RCU read-side sections.
- Account HCBS servers along with their tasks while the servers are active.
- Release bandwidth resources early in unregister_rt_sched_group.
- Drop server_try_pull_task, as it is now redundant.
- Remove the dl_server_stop call in dequeue_task_rt.
- Reuse __checkparam_dl for deadline servers.

Updates from v3:

- Rebase to latest tip/master.
- General rebasing/cleanup.
- Add documentation.
- Define **live** and **active** groups.
- Introduce server_try_pull_task in place of the removed server_has_task.
- Introduce the RELEASE_LOCK helper macro for guard-based locking.
- Update inc/dec_dl_tasks to account for served runqueues regardless of the server type.
- Fix the computation of new bandwidth values in dl_init_tg.
- Fix the check in dl_check_tg to use capacity scaling.
- Fix wakeup_preempt_rt to check whether curr is a DEADLINE task.

Updates from v2:

- Rebase to latest tip/master.
- Remove fair servers' bandwidth reclaiming.
- Fix a check which prevented execution of the wakeup_preempt code.
- Fix a priority check in group_pull_rt_task between tasks of different groups.
- Rework the allocation/deallocation code for rt-cgroups.
- Update the signatures of some group-related migration functions.
- Add documentation for the wakeup_preempt preemption rules.

Updates from v1:

- Rebase to latest tip/master.
- Add migration code.
- Split big patches for readability.
- Refactor code to use guarded locks where applicable.
- Remove unnecessary patches from v1 that have been addressed differently by mainline updates.
- Remove unnecessary checks and general code cleanup.

Notes:

Patch 1 has already been submitted for review at:
https://lore.kernel.org/all/20260420163410.20808-1-yurand2000@gmail.com/

Patches 28-29 are completely optional and are not meant to be included in the final patchset. They only add some invasive BUG_ONs that assert the preconditions expected by certain function calls.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Testing v5:

The patchset has been tested with a test suite tailored to stress all the implemented functionality. The tests are available at https://github.com/Yurand2000/HCBS-Test-Suite . Refer to the repository's README for more details.

Follow these steps to test HCBS v5:
- Get a kernel with the HCBS patchset up and running. Any kernel/distro should work effortlessly.
- Get, compile and _install_ the tests.
- Run the `go_rt.sh` script. It sets the CPU frequency to a fixed value and disables hyperthreading and power-saving features.
- Run the `run_tests.sh full` script to run the whole test suite.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Future Work:

We think the current patchset is stable enough. Our current test suite ran on our limited hardware. It shows that the kernel does not throw warnings and that time reservations and isolation among tenants can actually be guaranteed.

We hope that the pre-migration patches (2-19) have reached a decent final form. We of course expect comments on the migration-related code (22-26) and on the other patches (1, 20-21).

The latest review comments had already been addressed, so we decided to release v5 without the multi-CPU feature presented at OSPM26. That code is not yet fully tested and cleaned up, and we hope to include it in a future v6 RFC.

Additional future work:
- Capacity-aware bandwidth reservation.
- Hotplug/hot-unplug management.

Have a nice day,
Yuri

v1: https://lore.kernel.org/all/20250605071412.139240-1-yurand2000@gmail.com/
v2: https://lore.kernel.org/all/20250731105543.40832-1-yurand2000@gmail.com/
v3: https://lore.kernel.org/all/20250929092221.10947-1-yurand2000@gmail.com/
v4: https://lore.kernel.org/all/20251201124205.11169-1-yurand2000@gmail.com/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Yuri Andriaccio (13):
  sched/deadline: Fix replenishment logic for non-deferred servers
  sched/rt: Disable RT_GROUP_SCHED
  sched/rt: Remove unnecessary runqueue pointer in struct rt_rq
  sched/rt: Implement dl-server operations for rt-cgroups
  sched/rt: Update task event callbacks for HCBS scheduling
  sched/rt: Allow zeroing the runtime of the root control group
  sched/rt: Remove support for cgroups-v1
  sched/rt: Update default bandwidth for real-time tasks to ONE
  sched/rt: Try pull task on empty server pick.
  sched/core: Execute enqueued balance callbacks after migrate_disable_switch
  Documentation: Update documentation for real-time cgroups
  sched/rt: Add debug BUG_ONs for pre-migration code
  sched/rt: Add debug BUG_ONs in migration code

luca abeni (16):
  sched/deadline: Do not access dl_se->rq directly
  sched/deadline: Distinguish between dl_rq and my_q
  sched/rt: Pass an rt_rq instead of an rq where needed
  sched/rt: Move functions from rt.c to sched.h
  sched/rt: Introduce HCBS specific structs in task_group
  sched/core: Initialize HCBS specific structures
  sched/deadline: Add dl_init_tg
  sched/rt: Add {alloc/unregister/free}_rt_sched_group
  sched/deadline: Account rt-cgroups bandwidth in deadline tasks schedulability tests.
  sched/rt: Update rt-cgroup schedulability checks
  sched/rt: Remove old RT_GROUP_SCHED data structures
  sched/core: Cgroup v2 support
  sched/deadline: Allow deeper hierarchies of RT cgroups
  sched/rt: Add rt-cgroup migration functions
  sched/rt: Hook HCBS migration functions
  sched/core: Execute enqueued balance callbacks when changing allowed CPUs

 Documentation/scheduler/sched-rt-group.rst |  504 ++-
 include/linux/rcupdate.h                   |    1 +
 include/linux/sched.h                      |   10 +-
 kernel/sched/autogroup.c                   |    4 +-
 kernel/sched/core.c                        |   74 +-
 kernel/sched/deadline.c                    |  251 +-
 kernel/sched/debug.c                       |    6 -
 kernel/sched/ext.c                         |    4 +-
 kernel/sched/fair.c                        |    4 +-
 kernel/sched/rt.c                          | 3240 ++++++++++----------
 kernel/sched/sched.h                       |  178 +-
 kernel/sched/syscalls.c                    |    9 +-
 12 files changed, 2393 insertions(+), 1892 deletions(-)

base-commit: 028ef9c96e96197026887c0f092424679298aae8
-- 
2.53.0