patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>, Andrea Righi <arighi@nvidia.com>,
	Sasha Levin <sashal@kernel.org>,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, sched-ext@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.17-6.16] sched_ext: Keep bypass on between enable failure and scx_disable_workfn()
Date: Mon,  6 Oct 2025 14:17:39 -0400	[thread overview]
Message-ID: <20251006181835.1919496-7-sashal@kernel.org> (raw)
In-Reply-To: <20251006181835.1919496-1-sashal@kernel.org>

From: Tejun Heo <tj@kernel.org>

[ Upstream commit 4a1d9d73aabc8f97f48c4f84f936de3b265ffd6f ]

scx_enable() turns on the bypass mode while enable is in progress. If
enabling fails, it turns off the bypass mode and then triggers scx_error().
scx_error() will trigger scx_disable_workfn() which will turn on the bypass
mode again and unload the failed scheduler.

This moves the system out of bypass mode between the enable error path and
the disable path, which is unnecessary and can be brittle - e.g. the thread
running scx_enable() may already be on the failed scheduler and can be
switched out before it triggers scx_error() leading to a stall. The watchdog
would eventually kick in, so the situation isn't critical but is still
suboptimal.

There is nothing to be gained by turning off the bypass mode between
scx_enable() failure and scx_disable_workfn(). Keep bypass on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my comprehensive analysis, here is my determination:

## **YES**

This commit should be backported to stable kernel trees.

### Detailed Analysis:

#### **Context and Timing:**
- **sched_ext introduction**: This feature was first introduced in
  kernel **v6.12** (confirmed via git tag analysis)
- **Applicability**: This commit is only relevant for kernels **v6.12
  and later** that include sched_ext
- **Current status**: The commit has **already been selected** for
  backporting to 6.17 (commit 2d013fc03f11a with Sasha Levin's sign-off)

#### **Bug Description and Impact:**
The code change addresses a **race condition in the error handling
path** during BPF scheduler enablement:

**Old behavior (buggy):**
1. Line 4580: `scx_bypass(true)` - Enable bypass mode during scheduler
   initialization
2. On error → Line 4708-4711: `scx_bypass(false)` - **Disable bypass
   mode**
3. Call `scx_error()` which triggers `scx_disable_workfn()`
4. Line 3873 in `scx_disable_workfn()`: `scx_bypass(true)` - Re-enable
   bypass mode

**Problem:** Between steps 2 and 4, the system is **out of bypass mode**
with a failed scheduler. The commit message explicitly states: *"the
thread running scx_enable() may already be on the failed scheduler and
can be switched out before it triggers scx_error() **leading to a
stall**"*

**New behavior (fixed):**
Simply **removes** the `scx_bypass(false)` call at line 4710, keeping
bypass mode continuously enabled from the failure point through the
entire disable sequence.

#### **Why This Should Be Backported:**

1. **Real Bug**: This fixes an actual stall condition (confirmed by
   author Tejun Heo and acked by Andrea Righi)

2. **User Impact**: While the watchdog eventually recovers, users
   experience **unnecessary stalls** when BPF schedulers fail to load -
   a real-world scenario

3. **Minimal Risk**:
   - **1-line change** (removal only)
   - Makes error path **more conservative** (keeps bypass on longer)
   - No new logic introduced
   - Only affects **error conditions**, not normal operation

4. **Stable Tree Criteria Met**:
   - ✅ Fixes important bug affecting users
   - ✅ Doesn't introduce new features
   - ✅ No architectural changes
   - ✅ Minimal regression risk
   - ✅ Confined to sched_ext subsystem

5. **Already Validated**: The autosel process has already selected this
   for 6.17, indicating automated analysis confirms its suitability

6. **Active Subsystem**: Multiple sched_ext fixes show this is under
   active maintenance and bug discovery

#### **Code Change Details:**
```c
// kernel/sched/ext.c, line 4708-4712
err_disable_unlock_all:
    scx_cgroup_unlock();
    percpu_up_write(&scx_fork_rwsem);
- scx_bypass(false);  // REMOVED
+   /* we'll soon enter disable path, keep bypass on */  // ADDED
COMMENT
err_disable:
```

This single-line removal prevents the problematic window where the
system exits bypass mode between error detection and cleanup,
eliminating the potential for stalls during scheduler enable failures.

 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 088ceff38c8a4..2ccc885a229d5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5795,7 +5795,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
 err_disable_unlock_all:
 	scx_cgroup_unlock();
 	percpu_up_write(&scx_fork_rwsem);
-	scx_bypass(false);
+	/* we'll soon enter disable path, keep bypass on */
 err_disable:
 	mutex_unlock(&scx_enable_mutex);
 	/*
-- 
2.51.0


  parent reply	other threads:[~2025-10-06 18:19 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06 18:17 [PATCH AUTOSEL 6.17-5.4] x86/build: Remove cc-option from stack alignment flags Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.1] btrfs: zoned: refine extent allocator hint selection Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.1] arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] btrfs: abort transaction on specific error places when walking log tree Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-5.4] btrfs: use smp_mb__after_atomic() when forcing COW in create_pending_snapshot() Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17] x86/bugs: Add attack vector controls for VMSCAPE Sasha Levin
2025-10-06 18:17 ` Sasha Levin [this message]
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] btrfs: abort transaction if we fail to update inode in log replay dir fixup Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.1] EDAC/mc_sysfs: Increase legacy channel support to 16 Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] btrfs: abort transaction in the process_one_buffer() log tree walk callback Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.6] btrfs: zoned: return error from btrfs_zone_finish_endio() Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.6] perf: Skip user unwind if the task is a kernel thread Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.1] perf: Have get_perf_callchain() return NULL if crosstask and user are set Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.16] EDAC: Fix wrong executable file modes for C source files Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.6] perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.6] btrfs: use level argument in log tree walk callback replay_one_buffer() Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] sched_ext: Make qmap dump operation non-destructive Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] seccomp: passthrough uprobe systemcall without filtering Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] cpuset: Use new excpus for nocpu error check when enabling root partition Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.12] btrfs: tree-checker: add inode extref checks Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17] sched/fair: update_cfs_group() for throttled cfs_rqs Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-5.15] btrfs: scrub: replace max_t()/min_t() with clamp() in scrub_throttle_dev_io() Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-5.4] x86/bugs: Fix reporting of LFENCE retpoline Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-5.10] btrfs: always drop log root tree reference in btrfs_replay_log() Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17-6.6] audit: record fanotify event regardless of presence of rules Sasha Levin
2025-10-06 18:17 ` [PATCH AUTOSEL 6.17] EDAC/ie31200: Add two more Intel Alder Lake-S SoCs for EDAC support Sasha Levin
2025-10-06 18:18 ` [PATCH AUTOSEL 6.17-6.6] x86/bugs: Report correct retbleed mitigation status Sasha Levin
2025-10-06 21:55 ` [PATCH AUTOSEL 6.17-5.4] x86/build: Remove cc-option from stack alignment flags Nathan Chancellor
2025-10-06 23:13   ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251006181835.1919496-7-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).