From: Benjamin Segall
To: K Prateek Nayak
Cc: Peter Zijlstra, Aaron Lu, Valentin Schneider, Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu, Chen Yu, Matteo Martelli, Michal Koutný, Sebastian Andrzej Siewior
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model
In-Reply-To: <14be66aa-e088-4267-ac10-d04d600b1294@amd.com> (K. Prateek Nayak's message of "Wed, 3 Sep 2025 22:42:01 +0530")
References: <20250829081120.806-1-ziqianlu@bytedance.com> <20250829081120.806-4-ziqianlu@bytedance.com> <20250903145124.GM4067720@noisy.programming.kicks-ass.net> <14be66aa-e088-4267-ac10-d04d600b1294@amd.com>
Date: Wed, 03 Sep 2025 13:46:48 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

K Prateek Nayak writes:

> Hello Peter,
>
> On 9/3/2025 8:21 PM, Peter Zijlstra wrote:
>>>  static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>>  {
>>> +	if (task_is_throttled(p)) {
>>> +		dequeue_throttled_task(p, flags);
>>> +		return true;
>>> +	}
>>> +
>>>  	if (!p->se.sched_delayed)
>>>  		util_est_dequeue(&rq->cfs, p);
>>>
>>
>> OK, so this makes it so that either a task is fully enqueued (all
>> cfs_rq's) or full not.
>> A group cfs_rq is only marked throttled when all
>> its tasks are gone, and unthrottled when a task gets added. Right?
>
> cfs_rq (and the hierarchy below) is marked throttled when the quota
> has elapsed. Tasks on the throttled hierarchies will dequeue
> themselves completely via task work added during pick. When the last
> task leaves on a cfs_rq of throttled hierarchy, PELT is frozen for
> that cfs_rq.
>
> When a new task is added on the hierarchy, the PELT is unfrozen and
> the task becomes runnable. The cfs_rq and the hierarchy is still
> marked throttled.
>
> Unthrottling of hierarchy is only done at distribution.
>
>>
>> But propagate_entity_cfs_rq() is still doing the old thing, and has a
>> if (cfs_rq_throttled(cfs_rq)) break; inside the for_each_sched_entity()
>> iteration.
>>
>> This seems somewhat inconsistent; or am I missing something ?
>
> Probably an oversight. But before that, what was the reason to have
> stopped this propagation at throttled_cfs_rq() before the changes?
>

Yeah, this was one of the things I was (slowly) looking at. With this
series we currently still abort in:

1) update_cfs_group
2) dequeue_entities's set_next_buddy
3) check_preempt_fair
4) yield_to
5) propagate_entity_cfs_rq

In the old design, on throttle we immediately remove the entire cfs_rq,
freeze time for it, and stop adjusting load. In the new design we still
pick from it, so we definitely don't want to stop time (and we don't).

I'm guessing we probably also want to adjust load for it now, but that is
arguable: all the cfs_rqs for the tg are likely to throttle at the same
time, so we might not want to mess with the shares distribution, since
when unthrottle comes around the most likely correct distribution is the
one we had at the time of throttle.

Assuming we do want to adjust load for a throttled cfs_rq, we probably
want to remove the aborts from update_cfs_group and
propagate_entity_cfs_rq.
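For illustration only, a minimal userspace sketch of the two options for propagate_entity_cfs_rq(). It models a group hierarchy as a hypothetical struct toy_cfs_rq with a parent pointer and a throttled flag; toy_propagate() and its stop_at_throttled parameter are likewise invented for this sketch and are not kernel APIs. With stop_at_throttled set it roughly mirrors the `if (cfs_rq_throttled(cfs_rq)) break;` check quoted above (the throttled level itself is updated, nothing above it is); with it cleared, the update reaches the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a group cfs_rq: a parent link, a throttled
 * flag and a counter of how often load was propagated into this level.
 * This is NOT the kernel's struct cfs_rq.
 */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;
	bool throttled;
	int updates;
};

/*
 * Walk from the starting cfs_rq up to the root, "propagating" load at
 * each level. With stop_at_throttled set, the walk updates the first
 * throttled level and then stops, so ancestors above it never see the
 * change. Returns the number of levels updated.
 */
static int toy_propagate(struct toy_cfs_rq *cfs_rq, bool stop_at_throttled)
{
	int levels = 0;

	assert(cfs_rq != NULL);

	for (; cfs_rq; cfs_rq = cfs_rq->parent) {
		cfs_rq->updates++;
		levels++;
		if (stop_at_throttled && cfs_rq->throttled)
			break;
	}
	return levels;
}
```

For a root <- mid <- leaf chain with only mid throttled, walking from leaf updates two levels when stopping at throttled cfs_rqs and three otherwise; in the first case the root never sees the update.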
I'm guessing that we need the list_add_leaf_cfs_rq calls from
propagate, but I'm not 100% sure when they actually do something in
propagate as opposed to enqueue.

The other 3 (set_next_buddy, check_preempt_fair and yield_to) are the
same sort of thing: scheduling pick heuristics, which IMO are pretty
arbitrary to keep. We can reasonably say that "the most likely thing a
task in a throttled hierarchy will do is just go throttle itself, so we
shouldn't buddy it or let it preempt", but it would also be reasonable to
let such tasks preempt/buddy normally, in case they hold locks or such.

yield_to is used by kvm and st-dma-fence-chain.c. Yielding to a
throttle-on-exit kvm cpu thread isn't useful (so there's no need to remove
the abort there). The dma code is just yielding to a just-spawned kthread,
so it should be fine either way.
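The pick-heuristic question reduces to a policy choice. The sketch below is a hypothetical model, not kernel code: struct toy_task, enum toy_throttle_policy and toy_should_preempt() are invented names, and would_preempt stands in for whatever the normal wakeup-preemption check decides. TOY_SKIP_THROTTLED corresponds to the current early abort for tasks in a throttled hierarchy; TOY_TREAT_NORMALLY is the alternative of letting them preempt (or be buddied) like any other task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy knob, for this sketch only. */
enum toy_throttle_policy {
	TOY_SKIP_THROTTLED,	/* bail out early, like the current checks */
	TOY_TREAT_NORMALLY,	/* let throttled-hierarchy tasks preempt/buddy */
};

/* Hypothetical per-task state; not the kernel's task_struct. */
struct toy_task {
	bool in_throttled_hierarchy;	/* some ancestor cfs_rq is throttled */
	bool would_preempt;		/* result of the normal preemption check */
};

/*
 * Decide whether waking task p may preempt the current task. The same
 * gate could be applied to next-buddy nomination.
 */
static bool toy_should_preempt(const struct toy_task *p,
			       enum toy_throttle_policy policy)
{
	assert(p != NULL);

	/* p will most likely just throttle itself: not worth preempting for. */
	if (policy == TOY_SKIP_THROTTLED && p->in_throttled_hierarchy)
		return false;

	/*
	 * Otherwise fall back to the normal decision. Under
	 * TOY_TREAT_NORMALLY this also covers throttled-hierarchy tasks,
	 * which may matter if they hold locks.
	 */
	return p->would_preempt;
}
```

Either policy is internally consistent; the difference only shows up for tasks that are both in a throttled hierarchy and would otherwise win the preemption check.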