Re: [PATCH v2] tools/sched_ext: scx_flatcg: Fix inverted vtime delta accounting

BPF List
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: luoliang@kylinos.cn
Cc: bpf@vger.kernel.org, sched-ext@lists.linux.dev
Subject: Re: [PATCH v2] tools/sched_ext: scx_flatcg: Fix inverted vtime delta accounting
Date: Wed, 17 Jun 2026 09:17:46 +0000	[thread overview]
Message-ID: <20260617091747.0D18A1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260617090348.4131187-1-luoliang@kylinos.cn>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Casting absolute unsigned virtual time cvtime to s64 triggers a clamp to 0 when the clock exceeds 0x7FFFFFFFFFFFFFFF, causing permanent starvation.

Pre-existing issues:
- [High] Non-atomic read-and-clear of cvtime_delta in cgrp_cap_budget leads to double-counting of concurrent time updates.
- [High] Bypassed CPU time is added to cvtime_delta unscaled, massively undercharging the cgroup.
--

commit b06d8cd51f70901f471bd851ffb7fe016e3053b0
Author: Liang Luo <luoliang@kylinos.cn>
tools/sched_ext: scx_flatcg: Fix inverted vtime delta accounting

In try_pick_next_cgroup(), a cgroup is charged its full slice upfront.
fcg_dispatch() is then supposed to refund the unused time by adding
(expected_end - now) * FCG_HWEIGHT_ONE / hweight to cvtime_delta.
This patch corrects an issue where overruns and early yields were
handled with inverted logic, causing massive overcharging or underflows.

> diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
> index fec3595818269..6aa0c62b76d26 100644
> --- a/tools/sched_ext/scx_flatcg.bpf.c
> +++ b/tools/sched_ext/scx_flatcg.bpf.c
> @@ -249,15 +249,26 @@ static void cgrp_refresh_hweight(struct cgroup *cgrp, struct fcg_cgrp_ctx *cgc)
>  
>  static void cgrp_cap_budget(struct cgv_node *cgv_node, struct fcg_cgrp_ctx *cgc)
>  {
> -	u64 delta, cvtime, max_budget;
> +	s64 delta;
> +	u64 cvtime, max_budget;
>  
>  	/*
>  	 * A node which is on the rbtree can't be pointed to from elsewhere yet
>  	 * and thus can't be updated and repositioned. Instead, we collect the
>  	 * vtime deltas separately and apply it asynchronously here.
>  	 */
> -	delta = __sync_fetch_and_sub(&cgc->cvtime_delta, cgc->cvtime_delta);
> +	delta = (s64)__sync_fetch_and_sub(&cgc->cvtime_delta, cgc->cvtime_delta);

[Severity: High]
This is a pre-existing issue, but does this non-atomic read-and-clear pattern
lead to double-counting of concurrent time updates?

The __sync_fetch_and_sub() pattern evaluates the second argument
(cgc->cvtime_delta) before performing the atomic subtraction. If a concurrent
context like fcg_stopping() adds time to cvtime_delta between the read and
the subtraction, the atomic operation will subtract the old value from the
newly incremented value.

This extracts the concurrent addition into delta, but leaves the concurrently
added value in cvtime_delta in memory to be extracted again on the next
execution, skewing the cgroup's virtual time.

> +	/*
> +	 * cvtime_delta may be negative (early-yield refund from fcg_dispatch()).
> +	 * Apply it in the signed domain and clamp the resulting cvtime at zero
> +	 * so that a refund larger than the accumulated cvtime doesn't roll the
> +	 * field negative.
> +	 */
> +	s64 vtime = (s64)cgv_node->cvtime + delta;
> +
> +	if (vtime < 0)
> +		vtime = 0;
> +	cvtime = (u64)vtime;

[Severity: High]
Can casting the absolute unsigned virtual time (cvtime) to s64 trigger a clamp
to 0 when the clock exceeds 0x7FFFFFFFFFFFFFFF, permanently starving the
cgroup?

Virtual time is a monotonically increasing u64 timestamp that uses modular
arithmetic via time_before(). Because it is scaled by FCG_HWEIGHT_ONE (65536) /
hweight, a heavily nested cgroup with hweight == 1 advances its cvtime 65,536
times faster than real time, and will exceed 0x7FFFFFFFFFFFFFFF within roughly
39 hours of CPU time.

When this happens, casting cgv_node->cvtime to s64 evaluates to a negative
number, triggering the vtime < 0 check and forcefully clamping the cgroup's
cvtime to 0. Since time_before(0, 2^63 + X) evaluates to false in modular
arithmetic, 0 is treated as a time far in the future, placing the cgroup at
the end of the rbtree indefinitely.

Should underflow prevention be applied only to the relative delta, rather
than the absolute timestamp?

[ ... ]

[Severity: High]
This isn't a bug introduced by this patch, but does the time accounting for
bypassed tasks undercharge the cgroup by failing to scale the elapsed time?

Looking at fcg_stopping(), bypassed CPU time is added to cvtime_delta unscaled:

tools/sched_ext/scx_flatcg.bpf.c:fcg_stopping() {
    ...
    __sync_fetch_and_add(&cgc->cvtime_delta,
                         p->se.sum_exec_runtime - taskc->bypassed_at);
    ...
}

cgc->cvtime_delta accumulates virtual time, which is wall-clock execution time
scaled by FCG_HWEIGHT_ONE / cgc->hweight. By adding the raw, unscaled elapsed
nanoseconds directly, a cgroup with hweight == 1 will be undercharged by a
factor of 65,536 for bypassed time compared to regular execution, breaking
cgroup weight fairness.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617090348.4131187-1-luoliang@kylinos.cn?part=1

     prev parent reply	other threads:[~2026-06-17  9:17 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17  6:48 [PATCH] tools/sched_ext: scx_flatcg: Fix inverted vtime delta accounting luoliang
2026-06-17  7:02 ` sashiko-bot
2026-06-17  7:59 ` Andrea Righi
2026-06-17  9:03 ` [PATCH v2] " luoliang
2026-06-17  9:17   ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617091747.0D18A1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=luoliang@kylinos.cn \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=sched-ext@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox