From: Wanpeng Li <kernellwp@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>
Subject: [PATCH v2 2/9] sched/fair: Add rate-limiting and validation helpers
Date: Fri, 19 Dec 2025 11:53:26 +0800
Message-ID: <20251219035334.39790-3-kernellwp@gmail.com>
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
From: Wanpeng Li <wanpengli@tencent.com>

Implement the core safety mechanisms for yield deboost operations.

Add yield_deboost_rate_limit() to gate high-frequency yields and avoid
excessive overhead on compute-intensive workloads. The 6ms threshold
balances responsiveness with overhead reduction.

Add yield_deboost_validate_tasks() to validate the task pair: the
yielding task and the target must both exist and be distinct, the
target must belong to fair_sched_class and sit on the same runqueue,
and both tasks must be runnable (on_rq). The yielding task (rq->donor)
is already guaranteed to be fair class by the caller.

The rate limiter prevents pathological high-frequency cases, while
validation ensures that only appropriate task pairs proceed. Both
functions are static and will be integrated in subsequent patches.

v1 -> v2:
- Remove unnecessary READ_ONCE/WRITE_ONCE for per-rq fields accessed
  under rq->lock
- Change rq->clock to rq_clock(rq) helper for consistency
- Change yield_deboost_rate_limit() signature from (rq, now_ns) to (rq),
  obtaining time internally via rq_clock()
- Remove redundant sched_class check for p_yielding (already implied by
  rq->donor being fair)
- Simplify task_rq check to only verify p_target
- Change rq->curr to rq->donor for correct EEVDF donor tracking
- Move sysctl_sched_vcpu_debooster_enabled and NULL checks to caller
  (yield_to_deboost) for early exit before update_rq_clock()
- Simplify function signature by returning p_yielding directly instead
  of using output pointer parameters
- Add documentation explaining the 6ms rate limit threshold

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
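Note (not part of the commit): the gating behaviour described above can
be tried outside the kernel with a minimal user-space sketch. Everything
prefixed rl_ is a hypothetical stand-in invented for this illustration;
struct rl_rq is not struct rq, and clock_ns merely plays the role of
rq_clock(rq). Only the comparison itself mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC	1000000ULL
#define RL_INTERVAL_NS	(6 * NSEC_PER_MSEC)	/* the 6ms threshold */

/* Hypothetical stand-in for the per-runqueue state (assumption). */
struct rl_rq {
	uint64_t clock_ns;			/* plays the role of rq_clock(rq) */
	uint64_t yield_deboost_last_time_ns;	/* 0 means "no stamp recorded" */
};

/* Returns true when the deboost should be skipped. */
static bool rl_rate_limit(struct rl_rq *rq)
{
	uint64_t now = rq->clock_ns;
	uint64_t last = rq->yield_deboost_last_time_ns;

	/* Throttle only if a stamp exists and it is at most 6ms old. */
	if (last && (now - last) <= RL_INTERVAL_NS)
		return true;

	/* Otherwise let the operation through and restart the window. */
	rq->yield_deboost_last_time_ns = now;
	return false;
}

/* Usage example: calls at the first stamp, 3ms later and 7ms later. */
void rl_example(void)
{
	struct rl_rq rq = { .clock_ns = 1 * NSEC_PER_MSEC };

	assert(!rl_rate_limit(&rq));	/* first call: allowed, stamp = 1ms */
	rq.clock_ns = 4 * NSEC_PER_MSEC;
	assert(rl_rate_limit(&rq));	/* 3ms after the stamp: throttled */
	rq.clock_ns = 8 * NSEC_PER_MSEC;
	assert(!rl_rate_limit(&rq));	/* 7ms after the stamp: allowed */
}
```

Two properties follow from the comparison as written: a gap of exactly
6ms is still throttled because the test uses <=, and a throttled call
does not move the stamp, so the window is measured from the last call
that was let through.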
kernel/sched/fair.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
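
Note (not part of the commit): the order of the task-pair checks can be
exercised with a small user-space model. All vt_ names are hypothetical
and exist only for this sketch; the structs carry just the three fields
the checks read and are not the kernel's task_struct or rq. The feature
switch and the rate limiter are deliberately left out so the sketch
covers the pair validation alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vt_class { VT_CLASS_FAIR, VT_CLASS_RT };	/* stand-in for sched_class */

struct vt_rq;

/* Hypothetical task model (assumption): only what the checks need. */
struct vt_task {
	enum vt_class sched_class;
	struct vt_rq *rq;	/* plays the role of task_rq(p) */
	bool on_rq;		/* plays the role of p->se.on_rq */
};

struct vt_rq {
	struct vt_task *donor;	/* plays the role of rq->donor */
};

/* Returns the yielding task on success, NULL if any check fails. */
static struct vt_task *
vt_validate_tasks(struct vt_rq *rq, struct vt_task *target)
{
	struct vt_task *yielding;

	if (!target)
		return NULL;

	yielding = rq->donor;
	if (!yielding || yielding == target)	/* must exist and be distinct */
		return NULL;

	if (target->sched_class != VT_CLASS_FAIR)
		return NULL;

	if (target->rq != rq)			/* must share the runqueue */
		return NULL;

	if (!target->on_rq || !yielding->on_rq)	/* both must be runnable */
		return NULL;

	return yielding;
}

/* Usage example: a valid pair, then a self-yield that is rejected. */
void vt_example(void)
{
	struct vt_rq rq = { 0 };
	struct vt_task yielder = { VT_CLASS_FAIR, &rq, true };
	struct vt_task target = { VT_CLASS_FAIR, &rq, true };

	rq.donor = &yielder;
	assert(vt_validate_tasks(&rq, &target) == &yielder);
	assert(vt_validate_tasks(&rq, &yielder) == NULL);
}
```

Returning the yielding task instead of a bool lets a caller validate and
fetch the task in one step, which matches the v2 change that dropped the
output pointer parameters.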
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 87c30db2c853..2f327882bf4d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9040,6 +9040,68 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
}
}
+/*
+ * Rate-limit yield deboost operations to prevent excessive overhead.
+ * Returns true if the operation should be skipped due to rate limiting.
+ *
+ * The 6ms threshold balances responsiveness with overhead reduction:
+ * - Short enough to allow timely yield boosting for lock contention
+ * - Long enough to prevent pathological high-frequency penalty application
+ *
+ * Called under rq->lock, so direct field access is safe.
+ */
+static bool yield_deboost_rate_limit(struct rq *rq)
+{
+	u64 now = rq_clock(rq);
+	u64 last = rq->yield_deboost_last_time_ns;
+
+	if (last && (now - last) <= 6 * NSEC_PER_MSEC)
+		return true;
+
+	rq->yield_deboost_last_time_ns = now;
+	return false;
+}
+
+/*
+ * Validate tasks for yield deboost operation.
+ * Returns the yielding task on success, NULL on validation failure.
+ *
+ * Checks: feature enabled, valid target, same runqueue, target is fair class,
+ * both on_rq. Called under rq->lock.
+ *
+ * Note: p_yielding (rq->donor) is guaranteed to be fair class by the caller
+ * (yield_to_task_fair is only called when curr->sched_class == p->sched_class).
+ */
+static struct task_struct __maybe_unused *
+yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target)
+{
+	struct task_struct *p_yielding;
+
+	if (!sysctl_sched_vcpu_debooster_enabled)
+		return NULL;
+
+	if (!p_target)
+		return NULL;
+
+	if (yield_deboost_rate_limit(rq))
+		return NULL;
+
+	p_yielding = rq->donor;
+	if (!p_yielding || p_yielding == p_target)
+		return NULL;
+
+	if (p_target->sched_class != &fair_sched_class)
+		return NULL;
+
+	if (task_rq(p_target) != rq)
+		return NULL;
+
+	if (!p_target->se.on_rq || !p_yielding->se.on_rq)
+		return NULL;
+
+	return p_yielding;
+}
+
/*
* sched_yield() is very simple
*/
--
2.43.0
Thread overview: 25+ messages
2025-12-19 3:53 [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 1/9] sched: Add vCPU debooster infrastructure Wanpeng Li
2025-12-19 3:53 ` Wanpeng Li [this message]
2025-12-22 21:12 ` [PATCH v2 2/9] sched/fair: Add rate-limiting and validation helpers kernel test robot
2026-01-04 4:09 ` Hillf Danton
2025-12-19 3:53 ` [PATCH v2 3/9] sched/fair: Add cgroup LCA finder for hierarchical yield Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 4/9] sched/fair: Add penalty calculation and application logic Wanpeng Li
2025-12-22 23:36 ` kernel test robot
2025-12-19 3:53 ` [PATCH v2 5/9] sched/fair: Wire up yield deboost in yield_to_task_fair() Wanpeng Li
2025-12-22 7:06 ` kernel test robot
2025-12-22 9:31 ` kernel test robot
2025-12-19 3:53 ` [PATCH v2 6/9] KVM: x86: Add IPI tracking infrastructure Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 7/9] KVM: x86/lapic: Integrate IPI tracking with interrupt delivery Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 8/9] KVM: Implement IPI-aware directed yield candidate selection Wanpeng Li
2025-12-19 3:53 ` [PATCH v2 9/9] KVM: Relaxed boost as safety net Wanpeng Li
2026-01-04 2:40 ` [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM Wanpeng Li
2026-01-05 6:26 ` K Prateek Nayak
2026-03-13 1:13 ` Sean Christopherson
2026-04-01 9:48 ` Wanpeng Li
2026-04-02 23:43 ` Sean Christopherson
2026-03-26 14:41 ` Christian Borntraeger
2026-04-01 9:34 ` Wanpeng Li
2026-04-08 9:35 ` Richie Buturla
2026-04-17 11:30 ` Richie Buturla
2026-05-13 12:52 ` Richie Buturla