Date: Mon, 2 Feb 2026 07:28:28 -0800
From: Daniel Hodges
To: Peter Zijlstra
CC: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Subject: Re: [PATCH] sched: Skip schedule() in sched_yield() when CPU has no other work
In-Reply-To: <20260202151402.GE1282955@noisy.programming.kicks-ass.net>

On Mon, Feb 02, 2026 at 04:14:02PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 02, 2026 at 06:00:38AM -0800, Daniel Hodges wrote:
> > When a task calls sched_yield() but is the only runnable task on its
> > CPU with no pending wakeups, there's nothing to yield to. In this case,
> > skip the schedule() overhead entirely and return immediately.
> >
> > The yield_task() callback is still invoked to preserve per-class
> > semantics (e.g., SCHED_DEADLINE's dl_yielded flag for bandwidth
> > reclamation). The early exit only occurs after yield_task() completes
> > and only if nr_running == 1 and ttwu_pending is false.
> >
> > Testing performed in a 32-CPU VM using virtme-ng:
> >
> > stress-ng --yield 8, unpinned workers, 10s each, 30 runs:
> > Baseline: 10.18M yields/sec
> > Optimized: 11.58M yields/sec
> >
> > The optimization benefits lightly-loaded systems and CPU-pinned
> > workloads where tasks are often alone on their CPUs. On loaded systems
> > where CPUs have multiple runnable tasks, the check fails and we fall
> > through to the normal schedule() path with no regression.
>
> What is calling sched_yield() enough for this to matter? Calling
> sched_yield() outside of FIFO/DL is basically UB.

Very good question. I did some more digging through profiles, and a lot
of the sched_yield() calls come from the NCCL library:

https://github.com/search?q=repo%3ANVIDIA%2Fnccl%20sched_yield&type=code

One issue with some of the GPU workloads is that they run on large
machines and aren't always fully utilized. Does it make sense to
optimize the training libraries instead?
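The early-exit rule in the quoted patch description can be modeled in userspace to make the ordering explicit. This is a minimal sketch, not the kernel code: `struct rq_model`, `yield_task_model()`, `schedule_model()` and `sched_yield_model()` are hypothetical stand-ins. Only the nr_running == 1 / ttwu_pending check and its placement after yield_task() come from the patch text.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-CPU runqueue state.
 * This is NOT the kernel's struct rq; only the two fields named in
 * the patch description are modeled, plus a counter for testing. */
struct rq_model {
    unsigned int nr_running;     /* runnable tasks on this CPU, incl. current */
    bool ttwu_pending;           /* remote wakeups queued for this CPU */
    unsigned int schedule_calls; /* how many times the slow path ran */
};

/* Hypothetical stand-in for the per-class yield_task() hook
 * (e.g. SCHED_DEADLINE setting dl_yielded); it only records that it ran. */
static void yield_task_model(bool *yield_hook_ran)
{
    *yield_hook_ran = true;
}

/* Hypothetical stand-in for schedule(): just counts invocations. */
static void schedule_model(struct rq_model *rq)
{
    rq->schedule_calls++;
}

/* Returns true if the fast path (skipping schedule()) was taken. */
static bool sched_yield_model(struct rq_model *rq, bool *yield_hook_ran)
{
    /* The class hook always runs first, so per-class side effects happen. */
    yield_task_model(yield_hook_ran);

    /* Only then: alone on the CPU and no wakeup in flight -> skip schedule(). */
    if (rq->nr_running == 1 && !rq->ttwu_pending)
        return true;

    schedule_model(rq);
    return false;
}
```

The point the sketch preserves is ordering: because the class hook runs unconditionally before the fast-path check, per-class state such as the dl_yielded flag mentioned in the patch is still updated even when schedule() is skipped; a second runnable task or a pending remote wakeup sends the call down the normal path.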