Re: [PATCH 04/10] sched/fair: Add penalty calculation and application logic

From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Wanpeng Li <kernellwp@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH 04/10] sched/fair: Add penalty calculation and application logic
Date: Wed, 12 Nov 2025 12:55:23 +0530
Message-ID: <ef7d974e-7fcd-4e30-8a0d-8b97e00478bc@amd.com>
In-Reply-To: <20251110033232.12538-5-kernellwp@gmail.com>

Hello Wanpeng,

On 11/10/2025 9:02 AM, Wanpeng Li wrote:
> +/*
> + * Calculate penalty with debounce logic for EEVDF yield deboost.
> + * Computes vruntime penalty based on fairness gap (need) plus granularity,
> + * applies queue-size-based caps to prevent excessive penalties in small queues,
> + * and implements reverse-pair debounce (~300us) to reduce ping-pong effects.
> + * Returns 0 if no penalty needed, otherwise returns clamped penalty value.
> + */
> +static u64 __maybe_unused yield_deboost_calculate_penalty(struct rq *rq, struct sched_entity *se_y_lca,
> +				    struct sched_entity *se_t_lca, struct sched_entity *se_t,
> +				    int nr_queued)
> +{
> +	u64 gran, need, penalty, maxp;
> +	u64 gran_floor;
> +	u64 weighted_need, base;
> +
> +	gran = calc_delta_fair(sysctl_sched_base_slice, se_y_lca);
> +	/* Low-bound safeguard for gran when slice is abnormally small */
> +	gran_floor = calc_delta_fair(sysctl_sched_base_slice >> 1, se_y_lca);
> +	if (gran < gran_floor)

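Is this even possible? Both values come from calc_delta_fair() on the
same entity, and the floor is computed from half the slice, so I don't
see how gran can ever end up below gran_floor.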
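
A rough userspace sketch of why the branch looks unreachable. Note that
model_delta_fair() is a hypothetical, simplified stand-in for
calc_delta_fair() (plain delta * 1024 / weight, without the kernel's
inverse-weight fixed-point math), and MODEL_NICE_0_LOAD is an assumed
constant for this model only. Since the scaling is monotonic in delta,
the value for the full slice should never be below the value for half
the slice:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NICE_0_LOAD 1024ULL	/* assumed nice-0 weight for this model */

/* Hypothetical, simplified stand-in for calc_delta_fair(). */
static uint64_t model_delta_fair(uint64_t delta, uint64_t weight)
{
	assert(weight != 0);
	return delta * MODEL_NICE_0_LOAD / weight;
}

/* Returns 1 if the "gran < gran_floor" branch would be taken, else 0. */
static int floor_branch_taken(uint64_t base_slice, uint64_t weight)
{
	uint64_t gran = model_delta_fair(base_slice, weight);
	uint64_t gran_floor = model_delta_fair(base_slice >> 1, weight);

	return gran < gran_floor;
}
```

In this model floor_branch_taken() returns 0 for any non-zero weight,
because both values are scaled by the same factor.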

> +		gran = gran_floor;
> +
> +	need = 0;
> +	if (se_t_lca->vruntime > se_y_lca->vruntime)
> +		need = se_t_lca->vruntime - se_y_lca->vruntime;

So I'm assuming you want the yielding task's vruntime to
cross the target's vruntime simply because one task somewhere
down the hierarchy said so.

> +
> +	/* Apply 10% boost to need when positive (weighted_need = need * 1.10) */
> +	penalty = gran;

So at the very least I see the yielding entity getting a
weighted(base_slice / 2) penalty ...

> +	if (need) {
> +		/* weighted_need = need + 10% */
> +		weighted_need = need + need / 10;
> +		/* clamp to avoid overflow when adding to gran (still capped later) */
> +		if (weighted_need > U64_MAX - penalty)
> +			weighted_need = U64_MAX - penalty;
> +		penalty += weighted_need;

... if not more ...

> +	}
> +
> +	/* Apply debounce via helper to avoid ping-pong */
> +	penalty = yield_deboost_apply_debounce(rq, se_t, penalty, need, gran);

... since without debounce, the penalty remains the same.

> +
> +	/* Upper bound (cap): slightly more aggressive for mid-size queues */
> +	if (nr_queued == 2)
> +		maxp = gran * 6;		/* Strongest push for 2-task ping-pong */
> +	else if (nr_queued == 3)
> +		maxp = gran * 4;		/* 4.0 * gran */
> +	else if (nr_queued <= 6)
> +		maxp = (gran * 5) / 2;		/* 2.5 * gran */
> +	else if (nr_queued <= 8)
> +		maxp = gran * 2;		/* 2.0 * gran */
> +	else if (nr_queued <= 12)
> +		maxp = (gran * 3) / 2;		/* 1.5 * gran */
> +	else
> +		maxp = gran;			/* 1.0 * gran */

And all the nr_queued calculations are based on the entities queued
and not on "h_nr_queued", so we can have a boatload of tasks to
run above, but since one task decided to call yield_to(), let us make
them all starve a little?
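
One way to read this concern, as a userspace sketch: struct
model_cfs_rq and model_max_penalty() are hypothetical names, and the
function only mirrors the cap table from the quoted patch. The cap
depends on nr_queued alone, so two runqueues with the same number of
directly queued entities get the same cap no matter how many tasks the
hierarchy actually holds:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two counters discussed above. */
struct model_cfs_rq {
	unsigned int nr_queued;		/* entities queued directly on this level */
	unsigned int h_nr_queued;	/* tasks queued in the whole hierarchy */
};

/* Mirrors the quoted cap table; h_nr_queued is never consulted. */
static uint64_t model_max_penalty(const struct model_cfs_rq *cfs_rq, uint64_t gran)
{
	unsigned int nr;

	assert(cfs_rq);
	nr = cfs_rq->nr_queued;

	if (nr == 2)
		return gran * 6;
	else if (nr == 3)
		return gran * 4;
	else if (nr <= 6)
		return (gran * 5) / 2;
	else if (nr <= 8)
		return gran * 2;
	else if (nr <= 12)
		return (gran * 3) / 2;
	return gran;
}
```

With gran = 1000, a model_cfs_rq with nr_queued == 2 gets a cap of 6000
whether h_nr_queued is 2 or 64.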

> +
> +	if (penalty < gran)
> +		penalty = gran;
> +	if (penalty > maxp)
> +		penalty = maxp;
> +
> +	/* If no need, apply refined baseline push (low risk + mid risk combined). */
> +	if (need == 0) {
> +		/*
> +		 * Baseline multiplier for need==0:
> +		 *   2        -> 1.00 * gran
> +		 *   3        -> 0.9375 * gran
> +		 *   4–6      -> 0.625 * gran
> +		 *   7–8      -> 0.50  * gran
> +		 *   9–12     -> 0.375 * gran
> +		 *   >12      -> 0.25  * gran
> +		 */
> +		base = gran;
> +		if (nr_queued == 3)
> +			base = (gran * 15) / 16;	/* 0.9375 */
> +		else if (nr_queued >= 4 && nr_queued <= 6)
> +			base = (gran * 5) / 8;		/* 0.625 */
> +		else if (nr_queued >= 7 && nr_queued <= 8)
> +			base = gran / 2;		/* 0.5 */
> +		else if (nr_queued >= 9 && nr_queued <= 12)
> +			base = (gran * 3) / 8;		/* 0.375 */
> +		else if (nr_queued > 12)
> +			base = gran / 4;		/* 0.25 */
> +
> +		if (penalty < base)
> +			penalty = base;
> +	}
> +
> +	return penalty;
> +}
> +
> +/*
> + * Apply penalty and update EEVDF fields for scheduler consistency.
> + * Safely applies vruntime penalty with overflow protection, then updates
> + * EEVDF-specific fields (deadline, vlag) and cfs_rq min_vruntime to maintain
> + * scheduler state consistency. Returns true on successful application,
> + * false if penalty cannot be safely applied.
> + */
> +static void __maybe_unused yield_deboost_apply_penalty(struct rq *rq, struct sched_entity *se_y_lca,
> +				 struct cfs_rq *cfs_rq_common, u64 penalty)
> +{
> +	u64 new_vruntime;
> +
> +	/* Overflow protection */
> +	if (se_y_lca->vruntime > (U64_MAX - penalty))
> +		return;
> +
> +	new_vruntime = se_y_lca->vruntime + penalty;
> +
> +	/* Validity check */
> +	if (new_vruntime <= se_y_lca->vruntime)
> +		return;
> +
> +	se_y_lca->vruntime = new_vruntime;
> +	se_y_lca->deadline = se_y_lca->vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);

And with that we update vruntime to an arbitrary value simply
because one task in the hierarchy decided to call yield_to().

Since we are on the topic, you are also missing an update_curr(),
which is only done in yield_task_fair(), so you are actually
looking at a stale vruntime for the yielding entity.
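
The effect of the missing accounting step can be shown with a small
userspace sketch. Everything here is hypothetical: struct model_se,
its "pending" field, and model_update_curr() only imitate the idea
that runtime accrued since the last accounting is not yet reflected in
vruntime; they are not the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a scheduling entity. */
struct model_se {
	uint64_t vruntime;	/* last accounted virtual runtime */
	uint64_t pending;	/* weight-scaled runtime not yet accounted */
};

/* Stand-in for the accounting that update_curr() performs. */
static void model_update_curr(struct model_se *se)
{
	assert(se);
	se->vruntime += se->pending;
	se->pending = 0;
}

/* Fairness gap as in the quoted patch: how far the target is ahead. */
static uint64_t model_need(const struct model_se *yielder,
			   const struct model_se *target)
{
	if (target->vruntime > yielder->vruntime)
		return target->vruntime - yielder->vruntime;
	return 0;
}
```

With a yielder at vruntime 1000 and 400 units pending, and a target at
1200, the stale view reports a need of 200. After model_update_curr()
the yielder is at 1400 and the need is 0.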

> +	se_y_lca->vlag = avg_vruntime(cfs_rq_common) - se_y_lca->vruntime;
> +	update_min_vruntime(cfs_rq_common);
> +}
> +
>  /*
>   * sched_yield() is very simple
>   */

-- 
Thanks and Regards,
Prateek
