From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 2 Feb 2026 07:28:28 -0800
From: Daniel Hodges <hodgesd@meta.com>
To: Peter Zijlstra <peterz@infradead.org>
CC: Ingo Molnar <mingo@redhat.com>, Juri Lelli <juri.lelli@redhat.com>,
        Vincent Guittot <vincent.guittot@linaro.org>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>,
        Mel Gorman <mgorman@suse.de>,
        Valentin Schneider <vschneid@redhat.com>,
        <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched: Skip schedule() in sched_yield() when CPU has no
 other work
Message-ID: <aYDBoS7JPSv9qpUT@fb.com>
References: <20260202140039.1970735-1-hodgesd@meta.com>
 <20260202151402.GE1282955@noisy.programming.kicks-ass.net>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20260202151402.GE1282955@noisy.programming.kicks-ass.net>
X-FB-Internal: Safe

On Mon, Feb 02, 2026 at 04:14:02PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 06:00:38AM -0800, Daniel Hodges wrote:
> > When a task calls sched_yield() but is the only runnable task on its
> > CPU with no pending wakeups, there's nothing to yield to. In this case,
> > skip the schedule() overhead entirely and return immediately.
> > 
> > The yield_task() callback is still invoked to preserve per-class
> > semantics (e.g., SCHED_DEADLINE's dl_yielded flag for bandwidth
> > reclamation). The early exit only occurs after yield_task() completes
> > and only if nr_running == 1 and ttwu_pending is false.
> > 
> > Testing performed in a 32-CPU VM using virtme-ng:
> > 
> > stress-ng --yield 8, unpinned workers, 10s each, 30 runs:
> >      Baseline:  10.18M yields/sec
> >      Optimized: 11.58M yields/sec
> > 
> > The optimization benefits lightly-loaded systems and CPU-pinned
> > workloads where tasks are often alone on their CPUs. On loaded systems
> > where CPUs have multiple runnable tasks, the check fails and we fall
> > through to the normal schedule() path with no regression.
> 
> What is calling sched_yield() enough for this to matter? Calling
> sched_yield() outside of FIFO/DL is basically UB.

Very good question. I did some more digging through profiles, and a lot
of the sched_yield() calls come from the NCCL library:
https://github.com/search?q=repo%3ANVIDIA%2Fnccl%20sched_yield&type=code

One issue with some of the GPU workloads is that they run on large
machines that aren't always fully utilized. Would it make more sense to
optimize the training libraries instead?
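
To make the library-side option concrete, here is a minimal sketch of
what such a change could look like: busy-poll with a CPU relax hint for
a bounded number of iterations and only then fall back to sched_yield().
This is not NCCL's actual code. wait_for_flag(), cpu_relax() and
SPINS_BEFORE_YIELD are hypothetical names, and the threshold is an
arbitrary placeholder that would need tuning.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knob: polls between two sched_yield() calls. */
#define SPINS_BEFORE_YIELD 1024u

/* Userspace stand-in for a CPU relax hint (assumes GCC or Clang). */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#else
	atomic_signal_fence(memory_order_seq_cst);
#endif
}

/*
 * Wait until *flag becomes true. Spin with cpu_relax() first and call
 * sched_yield() only once per SPINS_BEFORE_YIELD polls.
 * Returns the number of sched_yield() calls made while waiting.
 */
static unsigned long wait_for_flag(atomic_bool *flag)
{
	unsigned long yields = 0;
	unsigned int spins = 0;

	assert(flag != NULL);

	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		if (++spins < SPINS_BEFORE_YIELD) {
			cpu_relax();
			continue;
		}
		spins = 0;
		sched_yield();
		yields++;
	}

	return yields;
}
```

A waiter would call wait_for_flag(&ready) where it currently loops on
sched_yield(). The tradeoff is that a lone task burns cycles in
userspace instead of entering the kernel, so this only reduces syscall
overhead; it does not make the wait itself cheaper.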