Date: Wed, 11 Mar 2026 11:10:11 +0000
From: Matt Fleming
To: Tejun Heo
Cc: sched-ext@lists.linux.dev, kernel-team@cloudflare.com, arighi@nvidia.com,
 void@manifault.com, changwoo@igalia.com, peterz@infradead.org,
 linux-kernel@vger.kernel.org
Subject: Re: sched_ext: Partial mode priority and fallthrough to EEVDF
References: <20260310145213.1060649-1-matt@readmodwrite.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 10, 2026 at 08:27:00AM -1000, Tejun Heo wrote:
>
> Hmm... I have a bit of hard time following how that's different from partial
> mode. If you want the scheduler to decide whether a task should be in SCX or
> fair, you can do so from ops.init_task() by asserting p->scx.disallow. If
> you mean that you want to switch dynamically on each scheduling event, I
> don't think that's a good idea given that each hop would be full sched_class
> switch.

Oh no, I don't want to switch dynamically at runtime. Doing the
classification once at BPF program load time is fine, but AFAIU
p->scx.disallow still gives us two scheduling classes (SCHED_EXT and
SCHED_NORMAL) where tasks in the fair class get chosen first.

> As for the ordering between the two, I don't know. How are you using partial
> mode?
> No matter how you order them, the behaviors on pathological cases are
> pretty bad and I've been thinking that most would use partial mode to
> partition the system so that some CPUs are managed by SCX and others by fair
> in which case the ordering doesn't matter that much. If you're mixing the
> two classes on the same CPUs, I wonder whether this is something which can
> be better dealt with the deadline servers. Andrea, what do you think?

I want to use SCHED_EXT to schedule the most latency-critical tasks
because a custom BPF scheduler allows me to make better CPU placement and
preemption decisions. Doing it with partial mode allows me to
progressively switch services over to SCHED_EXT without needing to take
on a mass migration of 100+ services in one go (something I'm trying my
hardest to avoid :) ).

To clarify my "fallthrough to EEVDF" comment: if I could run in full
mode, use disallow to keep most tasks on EEVDF, and have SCHED_EXT tasks
scheduled with higher priority than SCHED_NORMAL, then this would tick
all the boxes.

I have experimented with isolating CPUs where all tasks running are
SCHED_EXT while other CPUs run the SCHED_NORMAL workloads, so that's a
possibility. But not all our servers are configured that way, and given
that we run heterogeneous workloads on single machines, it's a tall price
to pay capacity-wise if we can't fully utilise those isolated CPUs at all
times.

And to limit the pathological case in my experiments so far, I'm using
cpu.max to cap CPU bandwidth (thanks to scx_lavd's bandwidth support).
All our services are systemd services, so we can set limits to guard
against complete meltdowns.

Thanks for the tip on the DL server. This looks promising and might solve
my problem nicely. I'll reply in more detail to Andrea's post.
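
As an illustration of the cpu.max cap mentioned above, here is a minimal
sketch of how a cap expressed in CPUs maps to the cgroup v2 cpu.max format
("$MAX $PERIOD", in microseconds) and to the equivalent systemd CPUQuota=
setting. The helper names and the 2-CPU example are hypothetical, and the
100 ms period is the cgroup v2 default rather than a value taken from this
thread; it is not the configuration used in the experiments described.

```python
from typing import Optional

# Assumption: the cgroup v2 default period of 100 ms; nothing in the
# thread says which period or cap is actually in use.
DEFAULT_PERIOD_US = 100_000


def cpu_max_line(cpus: Optional[float], period_us: int = DEFAULT_PERIOD_US) -> str:
    """Return the "$MAX $PERIOD" string for a cgroup v2 cpu.max file.

    cpus=None means "no cap", which cgroup v2 spells as "max".
    """
    if cpus is None:
        return f"max {period_us}"
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    # Quota is the CPU time (in microseconds) allowed per period.
    return f"{round(cpus * period_us)} {period_us}"


def systemd_cpu_quota(cpus: float) -> str:
    """Return the matching systemd setting, a percentage of one CPU."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return f"CPUQuota={round(cpus * 100)}%"


# Hypothetical example: cap a service at two CPUs' worth of bandwidth.
print(cpu_max_line(2))       # 200000 100000
print(systemd_cpu_quota(2))  # CPUQuota=200%
```

With the default period, a systemd unit carrying CPUQuota=200% ends up
with "200000 100000" in its cgroup's cpu.max, so a runaway task in that
unit cannot consume more than two CPUs' worth of time per period.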