From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 15 Apr 2026 21:10:53 +0000
In-Reply-To: <20260324100126.3502-1-willymontaz@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260324100126.3502-1-willymontaz@gmail.com>
X-Mailer: git-send-email 2.54.0.rc1.513.gad8abe7a5a-goog
Message-ID: <20260415211149.2658910-1-jstultz@google.com>
Subject: [PATCH 6.18] sched/fair: Revert 6d71a9c61604 ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
From: John Stultz
To: stable@vger.kernel.org, LKML
Cc: Peter Zijlstra, Zicheng Qu, Vincent Guittot, K Prateek Nayak, Shubhang Kaushik, John Stultz, Dietmar Eggemann, Xuewen Yan, William Montaz
Content-Type: text/plain; charset="UTF-8"

From: Peter Zijlstra

[ Upstream commit 101f3498b4bdfef97152a444847948de1543f692 ]

Zicheng Qu reported that, because avg_vruntime() always includes
cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.

Specifically, the lag scaling in place_entity() relies on
avg_vruntime() being the state *before* placement of the new entity.
However in this case avg_vruntime() will actually already include the
entity, which breaks things.

Also, Zicheng Qu argues that avg_vruntime should be invariant under
reweight. IOW commit 6d71a9c61604 ("sched/fair: Fix EEVDF entity
placement bug causing scheduling lag") was wrong!

The issue reported in 6d71a9c61604 could possibly be explained by
rounding artifacts -- notably the extreme weight '2' is outside of the
range of avg_vruntime/sum_w_vruntime, since that uses
scale_load_down(). By scaling vruntime by the real weight, but
accounting it in vruntime with a factor 1024 more, the average moves
significantly.

However, that is now cured.

Tested by reverting 66951e4860d3 ("sched/fair: Fix update_cfs_group()
vs DELAY_DEQUEUE") and tracing vruntime and vlag figures again.

Reported-by: Zicheng Qu
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Vincent Guittot
Tested-by: K Prateek Nayak
Tested-by: Shubhang Kaushik
Link: https://patch.msgid.link/20260219080625.066102672%40infradead.org
(cherry picked from commit 101f3498b4bdfef97152a444847948de1543f692)
[jstultz: Resolved minor collision in the revert against 6.18-stable]
Signed-off-by: John Stultz
---
Cc: Dietmar Eggemann
Cc: "Xuewen Yan"
Cc: William Montaz
---
 kernel/sched/fair.c | 148 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 124 insertions(+), 24 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d9777c81db0da..4279035367dbb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -764,17 +764,22 @@ static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
  *
  * -r_max < lag < max(r_max, q)
  */
-static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static s64 entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 avruntime)
 {
 	u64 max_slice = cfs_rq_max_slice(cfs_rq) + TICK_NSEC;
 	s64 vlag, limit;
 
-	WARN_ON_ONCE(!se->on_rq);
-
-	vlag = avg_vruntime(cfs_rq) - se->vruntime;
+	vlag = avruntime - se->vruntime;
 	limit = calc_delta_fair(max_slice, se);
 
-	se->vlag = clamp(vlag, -limit, limit);
+	return clamp(vlag, -limit, limit);
+}
+
+static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	WARN_ON_ONCE(!se->on_rq);
+
+	se->vlag = entity_lag(cfs_rq, se, avg_vruntime(cfs_rq));
 }
 
 /*
@@ -3831,23 +3836,125 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 					  cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
 }
 
-static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags);
+static void
+rescale_entity(struct sched_entity *se, unsigned long weight, bool rel_vprot)
+{
+	unsigned long old_weight = se->load.weight;
+
+	/*
+	 * VRUNTIME
+	 * --------
+	 *
+	 * COROLLARY #1: The virtual runtime of the entity needs to be
+	 * adjusted if re-weight at !0-lag point.
+	 *
+	 * Proof: For contradiction assume this is not true, so we can
+	 * re-weight without changing vruntime at !0-lag point.
+	 *
+	 *             Weight	VRuntime   Avg-VRuntime
+	 *     before    w          v            V
+	 *      after    w'         v'           V'
+	 *
+	 * Since lag needs to be preserved through re-weight:
+	 *
+	 *	lag = (V - v)*w = (V'- v')*w', where v = v'
+	 *	==>	V' = (V - v)*w/w' + v		(1)
+	 *
+	 * Let W be the total weight of the entities before reweight,
+	 * since V' is the new weighted average of entities:
+	 *
+	 *	V' = (WV + w'v - wv) / (W + w' - w)	(2)
+	 *
+	 * by using (1) & (2) we obtain:
+	 *
+	 *	(WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v
+	 *	==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v
+	 *	==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v
+	 *	==>	(V - v)*W/(W + w' - w) = (V - v)*w/w' (3)
+	 *
+	 * Since we are doing at !0-lag point which means V != v, we
+	 * can simplify (3):
+	 *
+	 *	==>	W / (W + w' - w) = w / w'
+	 *	==>	Ww' = Ww + ww' - ww
+	 *	==>	W * (w' - w) = w * (w' - w)
+	 *	==>	W = w	(re-weight indicates w' != w)
+	 *
+	 * So the cfs_rq contains only one entity, hence vruntime of
+	 * the entity @v should always equal to the cfs_rq's weighted
+	 * average vruntime @V, which means we will always re-weight
+	 * at 0-lag point, thus breach assumption. Proof completed.
+	 *
+	 *
+	 * COROLLARY #2: Re-weight does NOT affect weighted average
+	 * vruntime of all the entities.
+	 *
+	 * Proof: According to corollary #1, Eq. (1) should be:
+	 *
+	 *	(V - v)*w = (V' - v')*w'
+	 *	==>	v' = V' - (V - v)*w/w'		(4)
+	 *
+	 * According to the weighted average formula, we have:
+	 *
+	 *	V' = (WV - wv + w'v') / (W - w + w')
+	 *	   = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w')
+	 *	   = (WV - wv + w'V' - Vw + wv) / (W - w + w')
+	 *	   = (WV + w'V' - Vw) / (W - w + w')
+	 *
+	 *	==>  V'*(W - w + w') = WV + w'V' - Vw
+	 *	==>	V' * (W - w) = (W - w) * V	(5)
+	 *
+	 * If the entity is the only one in the cfs_rq, then reweight
+	 * always occurs at 0-lag point, so V won't change. Or else
+	 * there are other entities, hence W != w, then Eq. (5) turns
+	 * into V' = V. So V won't change in either case, proof done.
+	 *
+	 *
+	 * So according to corollary #1 & #2, the effect of re-weight
+	 * on vruntime should be:
+	 *
+	 *	v' = V' - (V - v) * w / w'		(4)
+	 *	   = V  - (V - v) * w / w'
+	 *	   = V  - vl * w / w'
+	 *	   = V  - vl'
+	 */
+	se->vlag = div64_long(se->vlag * old_weight, weight);
+
+	/*
+	 * DEADLINE
+	 * --------
+	 *
+	 * When the weight changes, the virtual time slope changes and
+	 * we should adjust the relative virtual deadline accordingly.
+	 *
+	 *	d' = v' + (d - v)*w/w'
+	 *	   = V' - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  + (d - V)*w/w'
+	 */
+	if (se->rel_deadline)
+		se->deadline = div64_long(se->deadline * old_weight, weight);
+
+	if (rel_vprot)
+		se->vprot = div64_long(se->vprot * old_weight, weight);
+}
 
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
 	bool curr = cfs_rq->curr == se;
 	bool rel_vprot = false;
-	u64 vprot;
+	u64 avruntime = 0;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
 		update_curr(cfs_rq);
-		update_entity_lag(cfs_rq, se);
-		se->deadline -= se->vruntime;
+		avruntime = avg_vruntime(cfs_rq);
+		se->vlag = entity_lag(cfs_rq, se, avruntime);
+		se->deadline -= avruntime;
 		se->rel_deadline = 1;
 		if (curr && protect_slice(se)) {
-			vprot = se->vprot - se->vruntime;
+			se->vprot -= avruntime;
 			rel_vprot = true;
 		}
 
@@ -3858,30 +3965,23 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	}
 	dequeue_load_avg(cfs_rq, se);
 
-	/*
-	 * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i),
-	 * we need to scale se->vlag when w_i changes.
-	 */
-	se->vlag = div_s64(se->vlag * se->load.weight, weight);
-	if (se->rel_deadline)
-		se->deadline = div_s64(se->deadline * se->load.weight, weight);
-
-	if (rel_vprot)
-		vprot = div_s64(vprot * se->load.weight, weight);
+	rescale_entity(se, weight, rel_vprot);
 
 	update_load_set(&se->load, weight);
 
 	do {
 		u32 divider = get_pelt_divider(&se->avg);
-
 		se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
 	} while (0);
 
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
-		place_entity(cfs_rq, se, 0);
 		if (rel_vprot)
-			se->vprot = se->vruntime + vprot;
+			se->vprot += avruntime;
+		se->deadline += avruntime;
+		se->rel_deadline = 0;
+		se->vruntime = avruntime - se->vlag;
+
 		update_load_add(&cfs_rq->load, se->load.weight);
 		if (!curr)
 			__enqueue_entity(cfs_rq, se);
@@ -5281,7 +5381,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	se->vruntime = vruntime - lag;
 
-	if (se->rel_deadline) {
+	if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) {
 		se->deadline += se->vruntime;
 		se->rel_deadline = 0;
 		return;
-- 
2.54.0.rc1.513.gad8abe7a5a-goog