Date: Thu, 19 Mar 2026 15:18:38 +0000
In-Reply-To: <20260319083518.94673-1-arighi@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260319083518.94673-1-arighi@nvidia.com>
X-Mailer: aerc 0.21.0-0-g5549850facc2
Subject: Re: [PATCH v2 sched_ext/for-7.1] sched_ext: Invalidate dispatch decisions on CPU affinity changes
From: Kuba Piecuch
To: Andrea Righi, Tejun Heo, David Vernet, Changwoo Min
Cc: Kuba Piecuch, Emil Tsalapatis, Christian Loehle, Daniel Hodges
Content-Type: text/plain; charset="UTF-8"

Hi Andrea,

On Thu Mar 19, 2026 at 8:35 AM UTC, Andrea Righi wrote:
> A BPF scheduler may rely on p->cpus_ptr from ops.dispatch() to select a
> target CPU. However, task affinity can change between the dispatch
> decision and its finalization in finish_dispatch(). When this happens,
> the scheduler may attempt to dispatch a task to a CPU that is no longer
> allowed, resulting in fatal errors such as:
>
>  EXIT: runtime error (SCX_DSQ_LOCAL[_ON] target CPU 10 not allowed for stress-ng-race-[13565])
>
> This race exists because ops.dispatch() runs without holding the task's
> run queue lock, allowing a concurrent set_cpus_allowed() to update
> p->cpus_ptr while the BPF scheduler is still using it. The dispatch is
> then finalized using stale affinity information.
>
> Example timeline:
>
>   CPU0                                      CPU1
>   ----                                      ----
>                                             task_rq_lock(p)
>   if (cpumask_test_cpu(cpu, p->cpus_ptr))
>                                             set_cpus_allowed_scx(p, new_mask)
>                                             task_rq_unlock(p)
>       scx_bpf_dsq_insert(p,
>               SCX_DSQ_LOCAL_ON | cpu, 0)
>
> With commit ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics"), BPF
> schedulers can avoid the affinity race by tracking task state and
> handling %SCX_DEQ_SCHED_CHANGE in ops.dequeue(): when a task is dequeued
> due to a property change, the scheduler can update the task state and
> skip the direct dispatch from ops.dispatch() for non-queued tasks.
>
> However, schedulers that do not implement task state tracking and
> dispatch directly to a local DSQ directly from ops.dispatch() may
> trigger the scx_error() condition when the kernel validates the
> destination in dispatch_to_local_dsq().

The two paragraphs above mention "direct dispatch from ops.dispatch()" and
"dispatch directly to a local DSQ directly from ops.dispatch()".

My understanding is that a "direct dispatch" can only happen from
ops.select_cpu() or ops.enqueue(), not from ops.dispatch(). Is this just an
unfortunate choice of words? Would "dispatch to a local DSQ" be a more
accurate phrase here?

Thanks,
Kuba