public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>,
	syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com,
	Sasha Levin <sashal@kernel.org>,
	io-uring@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.18] io_uring/timeout: annotate data race in io_flush_timeouts()
Date: Wed, 11 Feb 2026 07:30:45 -0500	[thread overview]
Message-ID: <20260211123112.1330287-35-sashal@kernel.org> (raw)
In-Reply-To: <20260211123112.1330287-1-sashal@kernel.org>

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 42b12cb5fd4554679bac06bbdd05dc8b643bcc42 ]

syzbot correctly reports this as a KCSAN race, as ctx->cached_cq_tail
should be read under ->uring_lock. This isn't immediately feasible in
io_flush_timeouts(), but as long as we read a stable value, that should
be good enough. If two io-wq threads compete on this value, then they
will both end up calling io_flush_timeouts() and at least one of them
will see the correct value.

Reported-by: syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Good - the commit `42b12cb5fd455` does contain the fix. The working tree
shows the pre-fix state because HEAD (v6.19) doesn't contain this commit
yet (it's likely in the io_uring for-next tree pending merge).

Now I have all the information needed for a comprehensive analysis.

---

## Comprehensive Analysis

### 1. COMMIT MESSAGE ANALYSIS

The commit subject explicitly says "**annotate** data race" — this is
about addressing a KCSAN-reported data race on `ctx->cached_cq_tail` in
`io_flush_timeouts()`. The author (Jens Axboe, io_uring maintainer)
acknowledges:

- syzbot correctly identifies this as a KCSAN race
- `ctx->cached_cq_tail` should be read under `->uring_lock`, which isn't
  feasible here
- The fix uses `READ_ONCE()` to ensure a stable single load
- The race is **benign**: if two io-wq threads compete, both will call
  `io_flush_timeouts()` and at least one will see the correct value

Key tags: `Reported-by: syzbot` (automated fuzzer), authored by `Jens
Axboe` (io_uring maintainer).

### 2. CODE CHANGE ANALYSIS

The change is a single-line modification in `io_flush_timeouts()`:

**Before:**
```c
seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
```

**After:**
```c
seq = READ_ONCE(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
```

**The race mechanism:**
- `cached_cq_tail` is an `unsigned int` in `struct io_ring_ctx`, in a
  `____cacheline_aligned_in_smp` section
- It is incremented in `io_get_cqe_overflow()` (io_uring.h:256,262) and
  `io_skip_cqe()` (io_uring.c:756), normally under `->completion_lock`
  or `->uring_lock`
- `io_flush_timeouts()` is called from `__io_commit_cqring_flush()` →
  `io_commit_cqring_flush()`, which runs from `__io_cq_unlock_post()`
  and `io_cq_unlock_post()` — both call it **after** releasing
  `completion_lock`
- Without `READ_ONCE()`, the compiler could theoretically generate
  multiple loads of `cached_cq_tail`, or cache a stale value, or
  experience torn reads (though 32-bit aligned reads are atomic on most
  architectures)

**What `READ_ONCE()` provides:**
1. **Compiler barrier**: Prevents the compiler from optimizing away the
   load, generating multiple loads, or reordering it
2. **KCSAN annotation**: Tells KCSAN that this is an intentional racy
   read, suppressing the warning
3. **Single stable load guarantee**: Ensures exactly one load
   instruction is generated

**Interesting precedent**: The same field is accessed in `io_timeout()`
at line 615 using `data_race()` instead:
```c
tail = data_race(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
```
This was added in commit `5498bf28d8f2b` (May 2023) for the same reason.
The choice of `READ_ONCE()` over `data_race()` is slightly stronger —
`READ_ONCE()` guarantees a stable volatile load, while `data_race()`
only suppresses the KCSAN warning without changing code generation.

### 3. CLASSIFICATION

This is a **data race fix** (KCSAN-detected), category "race condition".
While the commit message calls it an "annotation," it does address a
real C-language-level data race:
- Without `READ_ONCE()`, the code has undefined behavior per C11 memory
  model (concurrent unsynchronized read/write of the same variable)
- `READ_ONCE()` eliminates compiler-induced issues from this UB

However, the author is clear that the **observable impact is benign** —
the worst case is one thread seeing a slightly stale `cached_cq_tail`,
which just means some timeouts aren't flushed in this pass but will be
flushed in the next.

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed**: 1 (minimal)
- **Files touched**: 1 (`io_uring/timeout.c`)
- **Complexity**: Trivially low — just wrapping a read with
  `READ_ONCE()`
- **Risk of regression**: Essentially zero. `READ_ONCE()` only adds a
  volatile qualifier to the load; it cannot change functional behavior
- **Subsystem**: io_uring (widely used on modern systems)

### 5. USER IMPACT

- **Who is affected**: Any io_uring user with timeout operations running
  on multi-CPU systems
- **Severity of the bug**: Very low. The race is acknowledged as benign.
  No crash, no corruption, no security issue
- **Observable symptoms**: KCSAN noise in kernel logs when running with
  CONFIG_KCSAN. No user-visible functional issue
- **Without the fix**: Users running KCSAN-enabled kernels see a data
  race report. Theoretically, the compiler could generate suboptimal
  code, though this is unlikely in practice for a single `unsigned int`
  read

### 6. STABILITY INDICATORS

- Written by Jens Axboe (io_uring maintainer and subsystem creator)
- The pattern is well-established — the same fix was done for
  `io_timeout()` in 2023
- The code path is cold (`__cold` attribute on `io_flush_timeouts`)

### 7. DEPENDENCY CHECK

The patch context shows `raw_spin_lock_irq(&ctx->timeout_lock)`, but
stable trees (v6.12, v6.6, v6.1) use `spin_lock_irq(&ctx->timeout_lock)`
because the `raw_spinlock` conversion (`020b40f356249`) hasn't been
backported. The patch will need a trivial context adjustment (the
surrounding `spin_lock_irq` vs `raw_spin_lock_irq` line), but the actual
change (`READ_ONCE()` addition) has no dependencies.

The affected code exists in v6.12, v6.6, and v6.1 stable trees with the
same bug (bare `ctx->cached_cq_tail` read without annotation).

### 8. VERDICT REASONING

**Arguments FOR backporting:**
- Syzbot-reported KCSAN data race — these are real bugs per the C memory
  model
- Fix is trivially small (one line) with zero regression risk
- Fixes undefined behavior (concurrent unsynchronized access)
- `READ_ONCE()` ensures compiler cannot generate problematic code
- Precedent: The same annotation was done for `io_timeout()` in 2023
- io_uring is widely used; this is a commonly exercised path for timeout
  users
- Written by subsystem maintainer

**Arguments AGAINST backporting:**
- The commit message explicitly says the race is benign ("as long as we
  read a stable value, that should be good enough")
- No crash, corruption, security issue, or user-visible problem
- This is fundamentally a KCSAN annotation — it silences a sanitizer
  warning
- The `unsigned int` field is naturally atomic on all supported
  architectures (no tearing)
- The value is read once into a local variable, so compiler optimization
  concerns are minimal

**Assessment:**
While this is a legitimate data race fix and KCSAN reports should be
taken seriously, the commit author explicitly acknowledges this is a
benign race with no user-visible consequences. The fix is purely about
C-language correctness and KCSAN suppression. Stable kernels prioritize
fixes for bugs that affect real users. This data race does not cause
crashes, corruption, or any functional issue. The risk is zero, but the
benefit is also minimal — mainly cleaner KCSAN output for kernel
developers testing stable trees.

This falls in the "nice to have but not necessary" category for stable.
It's an annotation for correctness rather than a fix for a user-facing
bug.

**YES**

 io_uring/timeout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index d8fbbaf31cf35..84dda24f3eb24 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -130,7 +130,7 @@ __cold void io_flush_timeouts(struct io_ring_ctx *ctx)
 	u32 seq;
 
 	raw_spin_lock_irq(&ctx->timeout_lock);
-	seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+	seq = READ_ONCE(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
 
 	list_for_each_entry_safe(timeout, tmp, &ctx->timeout_list, list) {
 		struct io_kiocb *req = cmd_to_io_kiocb(timeout);
-- 
2.51.0


      parent reply	other threads:[~2026-02-11 12:32 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-11 12:30 [PATCH AUTOSEL 6.19-5.10] s390/perf: Disable register readout on sampling events Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] arm64: Add support for TSV110 Spectre-BHB mitigation Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] xenbus: Use .freeze/.thaw to handle xenbus devices Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] s390/purgatory: Add -Wno-default-const-init-unsafe to KBUILD_CFLAGS Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] s390/boot: " Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.1] perf/arm-cmn: Support CMN-600AE Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] ntfs: ->d_compare() must not block Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display) Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] block: decouple secure erase size limit from discard size limit Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] sparc: don't reference obsolete termio struct for TC* constants Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] EFI/CPER: don't go past the ARM processor CPER record buffer Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19] ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] cpufreq: dt-platdev: Block the driver from probing on more QC platforms Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] EFI/CPER: don't dump the entire memory region Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: battery: fix incorrect charging status when current is zero Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] rust: cpufreq: always inline functions using build_assert with arguments Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] blk-mq-sched: unify elevators checking for async requests Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] APEI/GHES: ARM processor Error: don't go past allocated memory Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] md raid: fix hang when stopping arrays with metadata through dm-raid Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] tools/power cpupower: Reset errno before strtoull() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] sparc: Synchronize user stack on fork and clone Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] rnbd-srv: Zero the rsp buffer before using it Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] alpha: fix user-space corruption during memory compaction Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] ACPICA: Abort AML bytecode execution when executing AML_FATAL_OP Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19] arm64: mte: Set TCMA1 whenever MTE is present in the kernel Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.18] tools/cpupower: Fix inverted APERF capability check Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.15] ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4() Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] ACPI: resource: Add JWIPC JVC9100 to irq1_level_low_skip_override[] Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.6] md-cluster: fix NULL pointer dereference in process_metadata_update Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-5.10] APEI/GHES: ensure that won't go past CPER allocated record Sasha Levin
2026-02-11 12:30 ` [PATCH AUTOSEL 6.19-6.12] powercap: intel_rapl: Add PL4 support for Ice Lake Sasha Levin
2026-02-11 12:30 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260211123112.1330287-35-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox