From: William Montaz
To: vincent.guittot@linaro.org
Cc: bsegall@google.com, dietmar.eggemann@arm.com, dsmythies@telus.net, juri.lelli@redhat.com, kprateek.nayak@amd.com, linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@kernel.org, peterz@infradead.org, quzicheng@huawei.com, rostedt@goodmis.org, shubhang@os.amperecomputing.com, vschneid@redhat.com, wangtao554@huawei.com
Subject: Re: [PATCH v2 6/7] sched/fair: Revert 6d71a9c61604 ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
Date: Tue, 24 Mar 2026 10:01:26 +0000
Message-ID: <20260324100126.3502-1-willymontaz@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

> Zicheng Qu reported that, because avg_vruntime() always includes
> cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.
> Specifically, the lag scaling in place_entity() relies on
> avg_vruntime() being the state *before* placement of the new entity.
> However in this case avg_vruntime() will actually already include the
> entity, which breaks things.

This has proven to be harmful on our production cluster running kernel
version 6.18.19.

We see a parent cgroup entity (/kubepods.slice in our case) whose
load_avg changes very frequently, which leads to
entity_pick->update_cfs_group->reweight_entity being called very often
(on pretty much every entity_tick call).
If a CPU-hogging task is a member of this cgroup and bound to a CPU, we
observe starvation of processes that are bound to the same CPU but are
not members of this cgroup (kworkers for ceph rbd in our production
case).

Looking at /sys/kernel/debug/sched/debug, we can indeed see that
cfs_rq[0]:/ .avg_vruntime and .zero_vruntime continuously move back in
time while .left_deadline and .left_vruntime are stuck. This is likely
due to the wrong lag calculation for the cgroup entity within the root
cgroup.

We can reproduce this in a sandboxed manner as follows:
* create a cgroup 'CG'
* run a CPU-intensive task 'offender', bound to a CPU
* move the task to cgroup 'CG'
* run a CPU-intensive task 'victim', bound to the same CPU
* to reproduce the frequent calls to reweight_entity, rapidly cycle
  CG/cpu.weight through 99, 100, 101 in a loop
* 'victim' will stop running

I use the following script to reproduce:

---
#!/bin/bash

TARGET_CPU=0
CG_PATH="/sys/fs/cgroup/test_reweight"

# 'victim': busy loop that reports its iteration count every 500 ms
cat << 'EOF' > heartbeat.c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main() {
    struct timespec last, now;
    uint64_t count = 0;
    clock_gettime(CLOCK_MONOTONIC, &last);
    while (1) {
        count++;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long delta_ms = (now.tv_sec - last.tv_sec) * 1000 +
                        (now.tv_nsec - last.tv_nsec) / 1000000;
        if (delta_ms >= 500) {
            printf("Tick: %lu iterations (delta %ld ms)\n", count, delta_ms);
            fflush(stdout);
            count = 0;
            last = now;
        }
    }
    return 0;
}
EOF

gcc -O2 heartbeat.c -o heartbeat

mkdir -p "$CG_PATH"
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control

# 'offender': CPU hog pinned to TARGET_CPU, moved into the test cgroup
taskset -c $TARGET_CPU yes > /dev/null &
PID_YES=$!
echo $PID_YES > "$CG_PATH/cgroup.procs"

# 'victim': pinned to the same CPU, left outside the test cgroup
taskset -c $TARGET_CPU ./heartbeat &
PID_HEARTBEAT=$!

echo "5 seconds observation..."
sleep 5

echo "Jittering on $CG_PATH/cpu.weight..."
trap "kill $PID_YES $PID_HEARTBEAT; rmdir $CG_PATH; rm heartbeat.c; rm heartbeat; exit" SIGINT SIGTERM

# Rapid weight changes trigger frequent reweight_entity calls
while true; do
    echo 99 > "$CG_PATH/cpu.weight"
    echo 100 > "$CG_PATH/cpu.weight"
    echo 101 > "$CG_PATH/cpu.weight"
done
---

I tested the following versions:
* LTS 5.10.252, 5.15.202, 6.1.166, 6.6.129, 6.12.77 --> no issue
* LTS 6.18.19 has the issue
* Stable 6.19.9 has the issue
* Mainline 7.0-rc5 has the issue
* Tip 7.0.0-rc5+ no issue

Finally, I applied the patch to 6.18.19 LTS, which solves the issue.
However, we do not benefit from previous patches such as
[PATCH v2 5/7] sched/fair: Increase weight bits for avg_vruntime.
I would therefore prefer to let you decide how you want to address the
backport to 6.18.

If you want, I can share my patch file; let me know.

Best regards