From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 22 Jan 2026 09:28:39 +0000
In-Reply-To: <20260121123118.964704-2-arighi@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260121123118.964704-1-arighi@nvidia.com> <20260121123118.964704-2-arighi@nvidia.com>
X-Mailer: aerc 0.21.0-0-g5549850facc2
Message-ID:
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
From: Kuba Piecuch
To: Andrea Righi, Tejun Heo, David Vernet, Changwoo Min
Cc: Emil Tsalapatis, Daniel Hodges
Content-Type: text/plain; charset="UTF-8"

[Resending with reply-all, messed up the first time, apologies.]

Hi Andrea,

On Wed Jan 21, 2026 at 12:25 PM UTC, Andrea Righi wrote:
> Currently, ops.dequeue() is only invoked when the sched_ext core knows
> that a task resides in BPF-managed data structures, which causes it to
> miss scheduling property change scenarios. As a result, BPF schedulers
> cannot reliably track task state.
>
> In addition, some ops.dequeue() callbacks can be skipped (e.g., during
> direct dispatch), so ops.enqueue() calls are not always paired with a
> corresponding ops.dequeue(), potentially breaking accounting logic.
>
> Fix this by guaranteeing that every ops.enqueue() is matched with a
> corresponding ops.dequeue(), and introduce the SCX_DEQ_ASYNC flag to
> distinguish dequeues triggered by scheduling property changes from those
> occurring in the normal dispatch workflow.
>
> New semantics:
> 1. ops.enqueue() is called when a task enters the BPF scheduler
> 2. ops.dequeue() is called when the task leaves the BPF scheduler,
>    because it is dispatched to a DSQ (regular workflow)
> 3. ops.dequeue(SCX_DEQ_ASYNC) is called when the task leaves the BPF
>    scheduler, because a task property is changed (sched_change)

What about the case where ops.dequeue() is called due to core-sched picking
the task through sched_core_find()? If I understand core-sched correctly, it
can happen without prior dispatch, so it doesn't fit case 2, and we're not
changing task properties, so it doesn't fit case 3 either.

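To make the gap concrete, here is a minimal user-space sketch (not kernel code) that maps dequeue reasons onto the three cases of the quoted changelog. The enum `deq_reason` and the function `changelog_case()` are hypothetical names invented for this illustration; only the case numbering and the core-sched pick scenario come from the thread.

```c
#include <assert.h>

/* Hypothetical reasons for a task leaving the BPF scheduler (illustrative only). */
enum deq_reason {
	DEQ_REASON_DISPATCH,        /* task was dispatched to a DSQ */
	DEQ_REASON_PROP_CHANGE,     /* sched_change: a task property changed */
	DEQ_REASON_CORE_SCHED_PICK, /* core-sched picked the task via sched_core_find() */
};

/*
 * Return the number of the changelog case ("New semantics" 2 or 3) that
 * covers the given reason, or 0 if none of the listed cases covers it.
 */
int changelog_case(enum deq_reason reason)
{
	switch (reason) {
	case DEQ_REASON_DISPATCH:
		return 2;
	case DEQ_REASON_PROP_CHANGE:
		return 3;
	case DEQ_REASON_CORE_SCHED_PICK:
		/* No prior dispatch and no property change: uncovered. */
		return 0;
	}
	assert(!"unknown dequeue reason");
	return 0;
}
```

Under these assumptions, `changelog_case(DEQ_REASON_CORE_SCHED_PICK)` is the only input that maps to no case, which is the scenario the question above is about.
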
> +	/*
> +	 * Set when ops.dequeue() is called after successful dispatch; used to
> +	 * distinguish dispatch dequeues from async dequeues (property changes)
> +	 * and to prevent duplicate dequeue calls.
> +	 */
> +	SCX_TASK_DISPATCH_DEQUEUED = 1 << 4,

I see this flag being set and cleared in several places, but I don't see it
actually being read, is that intentional?

> @@ -1529,6 +1553,17 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
>
>  	switch (opss & SCX_OPSS_STATE_MASK) {
>  	case SCX_OPSS_NONE:
> +		if (SCX_HAS_OP(sch, dequeue) &&
> +		    p->scx.flags & SCX_TASK_OPS_ENQUEUED) {
> +			bool is_async_dequeue =
> +				!(deq_flags & (DEQUEUE_SLEEP | SCX_DEQ_CORE_SCHED_EXEC));
> +
> +			if (is_async_dequeue)
> +				SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue, rq,
> +						 p, deq_flags | SCX_DEQ_ASYNC);
> +			p->scx.flags &= ~(SCX_TASK_OPS_ENQUEUED |
> +					  SCX_TASK_DISPATCH_DEQUEUED);
> +		}
>  		break;
>  	case SCX_OPSS_QUEUEING:
>  		/*
> @@ -1537,9 +1572,17 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
>  		 */
>  		BUG();
>  	case SCX_OPSS_QUEUED:
> -		if (SCX_HAS_OP(sch, dequeue))
> +		/*
> +		 * Task is in the enqueued state. This is a property change
> +		 * dequeue before dispatch completes. Notify the BPF scheduler
> +		 * with SCX_DEQ_ASYNC flag.
> +		 */
> +		if (SCX_HAS_OP(sch, dequeue)) {
>  			SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue, rq,
> -					 p, deq_flags);
> +					 p, deq_flags | SCX_DEQ_ASYNC);
> +			p->scx.flags &= ~(SCX_TASK_OPS_ENQUEUED |
> +					  SCX_TASK_DISPATCH_DEQUEUED);
> +		}
>

A core-sched pick of a task queued on the BPF scheduler will result in
entering the SCX_OPSS_QUEUED case, which in turn will call
ops.dequeue(SCX_DEQ_ASYNC). This seems to conflict with the is_async_dequeue
check above, which treats SCX_DEQ_CORE_SCHED_EXEC as a synchronous dequeue.

Thanks,
Kuba
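
The mismatch between the two switch arms can be reproduced with a small user-space model of the quoted hunk. This is only a sketch: the flag bit values are placeholders chosen for the example (the kernel's values differ), `struct deq_result` and `model_ops_dequeue()` are hypothetical names, and the model assumes the BPF scheduler implements ops.dequeue(), i.e. SCX_HAS_OP(sch, dequeue) is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; not the kernel's definitions. */
#define DEQUEUE_SLEEP           (1ULL << 0)
#define SCX_DEQ_CORE_SCHED_EXEC (1ULL << 1)
#define SCX_DEQ_ASYNC           (1ULL << 2)

/* Only the two ops states touched by the quoted hunk are modelled. */
enum opss_state { OPSS_NONE, OPSS_QUEUED };

struct deq_result {
	bool called;    /* would ops.dequeue() be invoked? */
	uint64_t flags; /* flags it would be invoked with */
};

/* Mirrors the decision logic of the two quoted switch arms. */
struct deq_result model_ops_dequeue(enum opss_state state, bool ops_enqueued,
				    uint64_t deq_flags)
{
	struct deq_result res = { .called = false, .flags = 0 };

	assert(state == OPSS_NONE || state == OPSS_QUEUED);

	switch (state) {
	case OPSS_NONE:
		/* Async only if neither a sleep nor a core-sched pick. */
		if (ops_enqueued &&
		    !(deq_flags & (DEQUEUE_SLEEP | SCX_DEQ_CORE_SCHED_EXEC))) {
			res.called = true;
			res.flags = deq_flags | SCX_DEQ_ASYNC;
		}
		break;
	case OPSS_QUEUED:
		/* Always tagged async, regardless of deq_flags. */
		res.called = true;
		res.flags = deq_flags | SCX_DEQ_ASYNC;
		break;
	}
	return res;
}
```

With deq_flags set to SCX_DEQ_CORE_SCHED_EXEC, the OPSS_QUEUED arm of this model reports a call carrying SCX_DEQ_ASYNC, while the OPSS_NONE arm reports no call at all, so the same dequeue reason is classified differently depending on the ops state.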