From: Wanpeng Li
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: K Prateek Nayak, Christian Borntraeger, Steven Rostedt, Vincent Guittot, Juri Lelli, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li
Subject: [PATCH v2 5/9] sched/fair: Wire up yield deboost in yield_to_task_fair()
Date: Fri, 19 Dec 2025 11:53:29 +0800
Message-ID: <20251219035334.39790-6-kernellwp@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251219035334.39790-1-kernellwp@gmail.com>
References: <20251219035334.39790-1-kernellwp@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Wanpeng Li

Integrate yield_to_deboost() into yield_to_task_fair() to activate the
vCPU debooster mechanism.

The integration works in concert with the existing buddy mechanism:
set_next_buddy() provides immediate preference, yield_to_deboost()
applies a bounded vruntime penalty based on the fairness gap, and
yield_task_fair() completes the standard yield path, including the
EEVDF forfeit operation.

Note: yield_to_deboost() must be called BEFORE yield_task_fair() because
v6.19+ kernels perform forfeit (se->vruntime = se->deadline) in
yield_task_fair(). If deboost runs after forfeit, the fairness gap
calculation would see the already-inflated vruntime, resulting in
need=0 and only the baseline penalty being applied.

Performance testing (16 pCPUs host, 16 vCPUs/VM):

Dbench 16 clients per VM:
  2 VMs: +14.4% throughput
  3 VMs:  +9.8% throughput
  4 VMs:  +6.7% throughput

Gains stem from sustained lock holder preference reducing ping-pong
between yielding vCPUs and lock holders. Most pronounced at moderate
overcommit where contention reduction outweighs context switch cost.

v1 -> v2:
- Move sysctl_sched_vcpu_debooster_enabled check to yield_to_deboost()
  entry point for early exit before update_rq_clock()
- Restore conditional update_curr() check (se_y_lca != cfs_rq->curr)
  to avoid unnecessary accounting updates
- Keep yield_task_fair() unchanged (no for_each_sched_entity loop)
  to avoid double-penalizing the yielding task
- Move yield_to_deboost() BEFORE yield_task_fair() to preserve fairness
  gap calculation (v6.19+ forfeit would otherwise inflate vruntime
  before penalty calculation)
- Improve function documentation

Signed-off-by: Wanpeng Li
---
 kernel/sched/fair.c | 67 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8738cfc3109c..9e0991f0c618 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9066,23 +9066,19 @@ static bool yield_deboost_rate_limit(struct rq *rq)
  * Validate tasks for yield deboost operation.
  * Returns the yielding task on success, NULL on validation failure.
  *
- * Checks: feature enabled, valid target, same runqueue, target is fair class,
- * both on_rq. Called under rq->lock.
+ * Checks: valid target, same runqueue, target is fair class,
+ * both on_rq, rate limiting. Called under rq->lock.
  *
  * Note: p_yielding (rq->donor) is guaranteed to be fair class by the caller
  * (yield_to_task_fair is only called when curr->sched_class == p->sched_class).
+ * Note: sysctl_sched_vcpu_debooster_enabled is checked by caller before
+ * update_rq_clock() to avoid unnecessary clock updates.
  */
 static struct task_struct __maybe_unused *
 yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target)
 {
 	struct task_struct *p_yielding;
 
-	if (!sysctl_sched_vcpu_debooster_enabled)
-		return NULL;
-
-	if (!p_target)
-		return NULL;
-
 	if (yield_deboost_rate_limit(rq))
 		return NULL;
 
@@ -9287,6 +9283,57 @@ yield_deboost_apply_penalty(struct sched_entity *se_y_lca,
 	se_y_lca->deadline = new_vruntime + calc_delta_fair(se_y_lca->slice, se_y_lca);
 }
 
+/*
+ * yield_to_deboost - Apply vruntime penalty to favor the target task
+ * @rq: runqueue containing both tasks (rq->lock must be held)
+ * @p_target: task to favor in scheduling
+ *
+ * Cooperates with yield_to_task_fair(): set_next_buddy() provides immediate
+ * preference; this routine applies a bounded vruntime penalty at the cgroup
+ * LCA so the target maintains scheduling advantage beyond the buddy effect.
+ *
+ * Only operates on tasks resident on the same rq. Penalty is bounded by
+ * granularity and queue-size caps to prevent starvation.
+ */
+static void yield_to_deboost(struct rq *rq, struct task_struct *p_target)
+{
+	struct task_struct *p_yielding;
+	struct sched_entity *se_y, *se_t, *se_y_lca, *se_t_lca;
+	struct cfs_rq *cfs_rq_common;
+	u64 penalty;
+
+	/* Quick validation before updating clock */
+	if (!sysctl_sched_vcpu_debooster_enabled)
+		return;
+
+	if (!p_target)
+		return;
+
+	/* Update clock - rate limiting and debounce use rq_clock() */
+	update_rq_clock(rq);
+
+	/* Full validation including rate limiting */
+	p_yielding = yield_deboost_validate_tasks(rq, p_target);
+	if (!p_yielding)
+		return;
+
+	se_y = &p_yielding->se;
+	se_t = &p_target->se;
+
+	/* Find LCA in cgroup hierarchy */
+	if (!yield_deboost_find_lca(se_y, se_t, &se_y_lca, &se_t_lca, &cfs_rq_common))
+		return;
+
+	/* Update current accounting before modifying vruntime */
+	if (se_y_lca != cfs_rq_common->curr)
+		update_curr(cfs_rq_common);
+
+	/* Calculate and apply penalty */
+	penalty = yield_deboost_calculate_penalty(rq, se_y_lca, se_t_lca,
+						  p_target, cfs_rq_common->h_nr_queued);
+	yield_deboost_apply_penalty(se_y_lca, cfs_rq_common, penalty);
+}
+
 /*
  * sched_yield() is very simple
  */
@@ -9341,6 +9388,10 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 	/* Tell the scheduler that we'd really like se to run next. */
 	set_next_buddy(se);
 
+	/* Apply deboost BEFORE forfeit to preserve fairness gap calculation */
+	yield_to_deboost(rq, p);
+
+	/* Complete the standard yield path (includes forfeit in v6.19+) */
 	yield_task_fair(rq);
 
 	return true;
-- 
2.43.0