Linux io-uring development
* [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
@ 2026-05-11 22:19 ` Sasha Levin
  2026-05-12 15:47   ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2026-05-11 22:19 UTC
  To: patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Jens Axboe, Maoyi Xie, Sasha Levin,
	io-uring, linux-kernel

From: Maoyi Xie <maoyixie.tju@gmail.com>

[ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Record: Subsystem `io_uring/wait`; action verb `honour`; intent is to
make `IORING_ENTER_ABS_TIMER` interpret caller absolute times in the
caller’s time namespace.

Record: Tags present:
`Suggested-by: Pavel Begunkov`, `Suggested-by: Jens Axboe`, author
`Signed-off-by: Maoyi Xie`, `Link:
https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`,
maintainer `Signed-off-by: Jens Axboe`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Reviewed-by`, `Acked-by`, or `Cc: stable`.

Record: The commit describes a real userspace-visible bug:
`io_uring_enter()` with `IORING_ENTER_ABS_TIMER` parses `ext_arg->ts`
directly, then arms an absolute hrtimer without converting from the
caller’s time namespace to host time. The supplied reproducer in
`unshare --user --time` with a `-10s` monotonic offset returns `-ETIME`
in under 1 ms instead of about 1 second.
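For reference, the reproducer's `ts = now + 1s` value can be computed as in this minimal userspace sketch. It is not the reproducer itself, and `timespec_add()` and `monotonic_deadline_in_1s()` are hypothetical helper names. What it illustrates: inside a time namespace, `clock_gettime(CLOCK_MONOTONIC)` already includes the namespace offset, so the absolute deadline handed to `io_uring_enter()` is expressed in namespace time, not host time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: return @base advanced by @sec seconds and @nsec
 * nanoseconds, normalized so that 0 <= tv_nsec < NSEC_PER_SEC.
 * Assumes @base is normalized and 0 <= @nsec < NSEC_PER_SEC.
 */
static struct timespec timespec_add(struct timespec base, time_t sec, long nsec)
{
	struct timespec r;

	assert(nsec >= 0 && nsec < NSEC_PER_SEC);
	r.tv_sec = base.tv_sec + sec;
	r.tv_nsec = base.tv_nsec + nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {	/* carry into seconds */
		r.tv_nsec -= NSEC_PER_SEC;
		r.tv_sec++;
	}
	return r;
}

/*
 * The value the reproducer describes as "ts = now + 1s". Inside a time
 * namespace, clock_gettime() already applies the namespace offset, so
 * this deadline is expressed in namespace time.
 */
static struct timespec monotonic_deadline_in_1s(void)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return timespec_add(now, 1, 0);
}
```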

Record: This is not a disguised cleanup. It is a direct correctness fix
for absolute timeout interpretation in time namespaces.

### Phase 2: Diff Analysis
Record: One file changed, `io_uring/wait.c`, 5 insertions and 1
deletion. Function modified: `io_cqring_wait()`. Scope: single-file
surgical fix.

Record: Before, `ext_arg->ts` was converted with
`timespec64_to_ktime()`. If `IORING_ENTER_ABS_TIMER` was unset, the code
added `start_time`; if set, it used the raw caller value as a host
absolute deadline. After, the absolute branch calls
`timens_ktime_to_host(ctx->clockid, iowq.timeout)`, while the relative
branch remains unchanged.
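A minimal userspace model of the before/after branch, assuming signed 64-bit nanoseconds as a stand-in for `ktime_t` and a bare offset subtraction in place of `timens_ktime_to_host()` (no per-clock selection, no clamping). All names are hypothetical. With the reproducer's numbers (host monotonic time 100 s, namespace offset -10 s, so the caller passes namespace `now + 1s` = 91 s), the old logic arms a host deadline of 91 s, which has already passed, while the patched logic arms 101 s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's ktime_t: signed 64-bit nanoseconds. */
typedef int64_t model_ktime;

/*
 * Simplified stand-in for timens_ktime_to_host(): namespace time equals
 * host time plus the namespace offset, so subtract the offset.
 * Per-clock selection and clamping are deliberately left out.
 */
static model_ktime model_ns_to_host(model_ktime ns_time, model_ktime offset)
{
	return ns_time - offset;
}

/* Old logic: an absolute deadline is used exactly as the caller passed it. */
static model_ktime deadline_before(model_ktime ts, bool abs_timer,
				   model_ktime start_time)
{
	model_ktime timeout = ts;

	if (!abs_timer)
		timeout += start_time;	/* relative: anchor at wait start */
	return timeout;
}

/* New logic: an absolute deadline is first converted to host time. */
static model_ktime deadline_after(model_ktime ts, bool abs_timer,
				  model_ktime start_time, model_ktime offset)
{
	model_ktime timeout = ts;

	if (abs_timer)
		timeout = model_ns_to_host(timeout, offset);
	else
		timeout += start_time;	/* relative branch is unchanged */
	return timeout;
}
```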

Record: Bug category is logic/correctness in time namespace handling.
The broken mechanism is that a namespaced absolute
`CLOCK_MONOTONIC`/`CLOCK_BOOTTIME` timestamp was fed to a host hrtimer
as if it were already in host time.

Record: Fix quality is strong: minimal, local, uses existing kernel
helper, and no new API. Regression risk is very low because
`timens_ktime_to_host()` is verified as a no-op for the initial time
namespace, for unsupported clocks, and when `CONFIG_TIME_NS` is
disabled.
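The no-op cases can be illustrated with a simplified userspace model of the conversion. This is an assumption-laden sketch, not the kernel source: a `NULL` offsets pointer stands for a caller in the initial time namespace (or a kernel built without `CONFIG_TIME_NS`), only `CLOCK_MONOTONIC` and `CLOCK_BOOTTIME` are treated as namespaced, and a deadline that would map to negative host time is clamped to 0 (already expired), mirroring, as far as we understand it, the kernel helper's clamping. `struct model_timens_offsets` and `model_timens_ktime_to_host()` are hypothetical names.

```c
#define _GNU_SOURCE		/* for CLOCK_BOOTTIME in glibc's <time.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-namespace offsets: namespace time = host time + offset. */
struct model_timens_offsets {
	int64_t monotonic_ns;
	int64_t boottime_ns;
};

/*
 * Simplified model of timens_ktime_to_host(). @ns == NULL stands for a
 * caller in the initial time namespace (or CONFIG_TIME_NS=n): no-op.
 */
static int64_t model_timens_ktime_to_host(clockid_t clockid, int64_t tim,
					  const struct model_timens_offsets *ns)
{
	int64_t offset;

	if (!ns)
		return tim;		/* initial namespace: nothing to convert */

	switch (clockid) {
	case CLOCK_MONOTONIC:
		offset = ns->monotonic_ns;
		break;
	case CLOCK_BOOTTIME:
		offset = ns->boottime_ns;
		break;
	default:
		return tim;		/* e.g. CLOCK_REALTIME: not namespaced */
	}

	if (tim < offset)
		return 0;		/* negative host time: already expired */
	if (offset < 0 && tim > INT64_MAX + offset)
		return INT64_MAX;	/* saturate instead of overflowing */
	return tim - offset;
}
```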

### Phase 3: Git History Investigation
Record: `git blame` on the changed wait lines points to `0105b0562a5e`
(`io_uring: split out CQ waiting code into wait.c`) for the current file
location. The same logic predates the split; `2b8e976b9842` (`io_uring:
user registered clockid for wait timeouts`) shows this absolute-wait
path using `ctx->clockid` and is contained by `v6.12-rc1`.

Record: No `Fixes:` tag is present, so there was no tagged introducing
commit to follow. I inspected the companion parent commit instead:
`9cc6bac1bebf` fixes the same time-namespace issue for
`IORING_TIMEOUT_ABS`.

Record: Recent related history shows this is patch 2/2 after
`9cc6bac1bebf`. The candidate’s parent is exactly `9cc6bac1bebf`, but
this wait fix compiles independently as long as `timens_ktime_to_host()`
and `ctx->clockid` exist.

Record: The author's only earlier `io_uring` commit is the companion
timeout fix. Jens Axboe applied the patch; Pavel and Jens are credited
with `Suggested-by:` and took part in review.

Record: Dependencies: affected stable trees need `ctx->clockid` and
`timens_ktime_to_host()`. I verified both exist in local
`for-greg/6.12-100`; the same `IORING_ENTER_ABS_TIMER` buggy line exists
in `6.12`, `6.18`, `6.19`, and `7.0` local stable branches, but not in
`5.10`, `5.15`, `6.1`, or `6.6`.

### Phase 4: Mailing List And External Research
Record: `b4 dig -c 45d2b37a37ab...` found the original submission at
`https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`.

Record: `b4 dig -a` found only v1 of the series. The thread shows Jens
applied both patches with commit IDs `9cc6bac1bebf` and `45d2b37a37ab`.

Record: `b4 dig -w` shows the right people/lists were included: Maoyi
Xie, Jens Axboe, Pavel Begunkov, `io-uring@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`.

Record: Reviewer feedback was positive: Pavel wrote “both look good” and
requested a liburing test; Jens replied “+1” for the test and later
applied the series. No NAKs or objections found.

Record: No separate bug-report link exists beyond the patch
thread/reproducer. Stable-specific WebFetch was blocked by Anubis, and
local thread search found no stable nomination.

### Phase 5: Code Semantic Analysis
Record: Modified function: `io_cqring_wait()`.

Record: Callers: `io_uring_enter(2)` reaches `io_cqring_wait()` when
`IORING_ENTER_GETEVENTS` is set, after `io_get_ext_arg()` copies/parses
the userspace getevents argument. This is directly syscall-reachable.

Record: Key callees: `timespec64_to_ktime()`, `timens_ktime_to_host()`,
`ktime_add()`, `io_get_time()`, `io_cqring_schedule_timeout()`, and
hrtimer setup/start helpers.

Record: Call chain: userspace `io_uring_enter()` -> `io_get_ext_arg()`
-> `io_cqring_wait()` -> `io_cqring_wait_schedule()` ->
`__io_cqring_wait_schedule()` -> `io_cqring_schedule_timeout()` ->
absolute hrtimer. The buggy path is reachable from userspace with
`IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG |
IORING_ENTER_ABS_TIMER`.

Record: Similar patterns: the companion commit fixes
`io_parse_user_time()` for `IORING_TIMEOUT_ABS`; POSIX timers,
`clock_nanosleep`, alarm timers, and `timerfd` already use
`timens_ktime_to_host()` for absolute timers.
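For comparison, the established absolute-timer behavior is visible from userspace with `clock_nanosleep(2)` and `TIMER_ABSTIME`: the deadline is read from, and interpreted against, the caller's own `CLOCK_MONOTONIC`, so the same code keeps working inside a shifted time namespace. A minimal sketch; `sleep_until_monotonic_ms()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Sleep until CLOCK_MONOTONIC reaches now + @ms milliseconds (@ms >= 0),
 * using an absolute deadline in the caller's own clock view.
 * Returns 0 on success or an errno value.
 */
static int sleep_until_monotonic_ms(long ms)
{
	struct timespec deadline;
	int err;

	if (clock_gettime(CLOCK_MONOTONIC, &deadline))
		return errno;
	deadline.tv_sec += ms / 1000;
	deadline.tv_nsec += (ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= NSEC_PER_SEC) {
		deadline.tv_nsec -= NSEC_PER_SEC;
		deadline.tv_sec++;
	}

	/*
	 * clock_nanosleep() returns the error number directly. With an
	 * absolute deadline, retrying after EINTR does not drift.
	 */
	do {
		err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				      &deadline, NULL);
	} while (err == EINTR);
	return err;
}
```

Because the deadline is absolute, a signal interrupting the sleep simply leads to another wait for the same instant, which is the property absolute io_uring waits also rely on.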

### Phase 6: Stable Tree Analysis
Record: Local stable-branch grep found the buggy
`IORING_ENTER_ABS_TIMER` code in `for-greg/6.12-100`,
`for-greg/6.18-100`, `for-greg/6.19-200`, and `for-greg/7.0-100`. It was
absent from `5.10`, `5.15`, `6.1`, and `6.6`.

Record: Backport difficulty: `git apply --check` succeeds cleanly on
current `7.0.y`. `6.12`/`7.0` have `io_uring/wait.c`; `6.18`/`6.19` local
branches have the same logic in `io_uring/io_uring.c`, so those need a
path/context backport but not semantic rework.

Record: No related fix with this subject was found in the checked stable
candidate branches.

### Phase 7: Subsystem Context
Record: Subsystem is `io_uring`, a core async I/O syscall subsystem.
Criticality: IMPORTANT; not universal core code such as MM or VFS, but
directly userspace-facing and widely used.

Record: Subsystem activity is high; recent `io_uring` history has many
fixes and feature changes. This specific change is small despite the
active subsystem.

### Phase 8: Impact And Risk
Record: Affected population: users of `io_uring_enter()` absolute CQ
wait timeouts inside non-initial time namespaces, especially container-
like environments. Branch-limited to stable trees that contain
`IORING_ENTER_ABS_TIMER`.

Record: Trigger: userspace can trigger via `io_uring_enter()` with
`IORING_ENTER_ABS_TIMER` and a timespec from a shifted time namespace.
The provided reproducer uses `unshare --user --time`; whether it works
fully unprivileged depends on the system's user-namespace policy.

Record: Failure mode: incorrect timeout behavior. With the reproducer's
negative offset, the deadline already lies in the past in host time, so
the wait returns `-ETIME` immediately; with a positive offset, absolute
waits expire late by the size of the offset. Severity: MEDIUM to HIGH
user-visible correctness bug with potential application timeout/hang
behavior, but not a kernel crash, memory corruption, or security fix.

Record: Benefit is moderate/high for affected containerized users
because it restores syscall semantics. Risk is very low: one local
conditional change plus an include, using established helper semantics.

### Phase 9: Final Synthesis
Record: Evidence for backporting: real reproduced bug, syscall-
reachable, affects stable branches with the feature, tiny patch,
maintainer-applied, positive reviewer feedback, matches established
time-namespace behavior elsewhere.

Record: Evidence against backporting: not a crash/security/data-
corruption fix; affects a narrower feature combination; no explicit
stable nomination; older stable trees do not contain the affected
feature.

Record: Unresolved: I did not run the reproducer locally. Lore WebFetch
was blocked by Anubis, but `b4` successfully fetched the thread. Exact
first upstream introduction of `IORING_ENTER_ABS_TIMER` was not cleanly
reconstructed from local blame alone, but affected stable branches were
directly verified by grep.

Stable rules:
1. Obviously correct and tested: yes by code inspection, reproducer, and
   positive review; no formal `Tested-by`.
2. Fixes a real bug: yes, reproduced wrong timeout result.
3. Important issue: yes for affected users, because absolute waits can
   return immediately or at the wrong time.
4. Small and contained: yes; 5 insertions and 1 deletion, consisting of
   an `#include` plus a change confined to `io_cqring_wait()`.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current `7.0.y`; minor path adjustment
   may be needed in some branches.

No automatic exception category applies.

## Verification
- [Phase 1] Parsed `git show` commit message and tags for
  `45d2b37a37ab98484693533496395c610a2cab96`.
- [Phase 2] Verified diff is one file, `io_uring/wait.c`, 5 insertions/1
  deletion in `io_cqring_wait()`.
- [Phase 3] Ran `git blame` on the changed lines; current file location
  comes from `0105b0562a5e`.
- [Phase 3] Inspected `2b8e976b9842`; verified `ctx->clockid`,
  `io_get_time(ctx)`, and selected-clock wait timeout support.
- [Phase 3] Inspected companion commit `9cc6bac1bebf`; verified same
  class of fix for `IORING_TIMEOUT_ABS`.
- [Phase 4] Ran `b4 dig`, `b4 dig -a`, `b4 dig -w`, and `b4 mbox`;
  verified v1-only series, correct recipients, positive feedback, and
  applied notice.
- [Phase 5] Read `io_uring_enter()` and `io_get_ext_arg()` call path;
  verified direct syscall reachability.
- [Phase 5] Verified `timens_ktime_to_host()` behavior in
  `include/linux/time_namespace.h` and `kernel/time/namespace.c`.
- [Phase 5] Verified similar established conversions in `kernel/time`
  and `fs/timerfd.c`.
- [Phase 6] Ran `git grep` on local stable branches; affected: `6.12`,
  `6.18`, `6.19`, `7.0`; unaffected: `5.10`, `5.15`, `6.1`, `6.6`.
- [Phase 6] Ran `git apply --check` for the candidate patch on current
  `7.0.y`; it applies cleanly.
- [Phase 8] Verified reproducer details from commit and mailing-list
  cover letter; did not execute it locally.

This should be backported to stable trees that contain
`IORING_ENTER_ABS_TIMER`, with the companion timeout patch strongly
recommended for complete io_uring absolute-timeout time-namespace
correctness.

**YES**

 io_uring/wait.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/wait.c b/io_uring/wait.c
index 91df86ce0d18c..ec01e78a216d6 100644
--- a/io_uring/wait.c
+++ b/io_uring/wait.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 
 #include <trace/events/io_uring.h>
 
@@ -229,7 +230,10 @@ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 
 	if (ext_arg->ts_set) {
 		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
-		if (!(flags & IORING_ENTER_ABS_TIMER))
+		if (flags & IORING_ENTER_ABS_TIMER)
+			iowq.timeout = timens_ktime_to_host(ctx->clockid,
+							    iowq.timeout);
+		else
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
 
-- 
2.53.0



* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-11 22:19 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
@ 2026-05-12 15:47   ` Jens Axboe
  2026-05-15 14:04     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2026-05-12 15:47 UTC
  To: Sasha Levin, patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring, linux-kernel

On 5/11/26 4:19 PM, Sasha Levin wrote:
> From: Maoyi Xie <maoyixie.tju@gmail.com>
> 
> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]

If you auto-pick this one, please also do the other one in the
series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense
to do just one of them.

-- 
Jens Axboe



* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-12 15:47   ` Jens Axboe
@ 2026-05-15 14:04     ` Jens Axboe
  2026-05-15 14:11       ` Sasha Levin
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2026-05-15 14:04 UTC
  To: Sasha Levin, patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring, linux-kernel

On 5/12/26 9:47 AM, Jens Axboe wrote:
> On 5/11/26 4:19 PM, Sasha Levin wrote:
>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>
>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
> 
> If you auto-pick this one, please also do the other one in the
> series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense
> to do just one of them.

Hello?

-- 
Jens Axboe



* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-15 14:04     ` Jens Axboe
@ 2026-05-15 14:11       ` Sasha Levin
  0 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-15 14:11 UTC
  To: Jens Axboe
  Cc: patches, stable, Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring,
	linux-kernel

On Fri, May 15, 2026 at 08:04:57AM -0600, Jens Axboe wrote:
>On 5/12/26 9:47 AM, Jens Axboe wrote:
>> On 5/11/26 4:19 PM, Sasha Levin wrote:
>>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>>
>>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>
>> If you auto-pick this one, please also do the other one in the
>> series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense
>> to do just one of them.
>
>Hello?

Sorry, yes - I'll pick it up too.

I usually let AUTOSEL reviews soak for a bit before I go over them to catch
follow up fixes or other concerns.

-- 
Thanks,
Sasha


* [PATCH AUTOSEL 7.0-6.6] io_uring: hold uring_lock when walking link chain in io_wq_free_work()
@ 2026-05-20 11:18 ` Sasha Levin
  2026-05-20 11:18 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:18 UTC
  To: patches, stable; +Cc: Jens Axboe, Sasha Levin, io-uring, linux-kernel

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 20c39819a27646573dfa0ac0d01c38895298a6f6 ]

io_wq_free_work() calls io_req_find_next() from io-wq worker context,
which reads and clears req->link without holding any lock. This can
potentially race with other paths that mutate the same chain under
ctx->uring_lock.

Take ctx->uring_lock around the io_req_find_next() call. Only requests
with IO_REQ_LINK_FLAGS reach this path, which is not the hot path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `io_uring`; action verb `hold`; intent:
serialize linked-request chain walking in `io_wq_free_work()` with
`ctx->uring_lock`.

Step 1.2 Record: Tags present in commit
`20c39819a27646573dfa0ac0d01c38895298a6f6`:
- `Signed-off-by: Jens Axboe <axboe@kernel.dk>`
- No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`,
  `Acked-by:`, `Link:`, or `Cc: stable@vger.kernel.org` tags in the
  committed message.

Step 1.3 Record: The commit states that `io_wq_free_work()` calls
`io_req_find_next()` from io-wq worker context, and `io_req_find_next()`
reads and clears `req->link` without a lock. The stated failure mode is
a potential race with other paths mutating the same chain under
`ctx->uring_lock`. No stack trace, reproducer, affected-version
statement, or user report is in the commit message.

Step 1.4 Record: This is a hidden bug fix despite the subject not saying
“fix”: it adds missing synchronization around shared linked-request
state. The diff confirms it is not a cleanup or feature.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `io_uring/io_uring.c`, 6 insertions
and 1 deletion. Only `io_wq_free_work()` is modified. Scope: single-file
surgical locking fix.

Step 2.2 Record: Before, `io_wq_free_work()` called
`io_req_find_next(req)` directly when `IO_REQ_LINK_FLAGS` was set.
After, it stores `req->ctx`, takes `ctx->uring_lock`, calls
`io_req_find_next(req)`, and unlocks. The affected path is io-wq worker
completion/freeing of linked requests, not the normal unlinked hot path.

Step 2.3 Record: Bug category is synchronization/race condition.
`io_req_find_next()` reads `req->link` and clears it; `git grep`
verified other link-chain assignment/mutation sites in
submission/timeout paths. The fix serializes this worker-side chain walk
with the mutex used by normal chain mutation paths.
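The locking rule can be illustrated with a small userspace model using POSIX threads. It is a sketch under stated assumptions: `struct model_ctx`, `struct model_req`, and the helper functions are hypothetical stand-ins for `io_ring_ctx`, `io_kiocb`, `io_req_find_next()`, and the patched `io_wq_free_work()` branch, reduced to the `link` pointer. Because every access takes the same mutex, the read and the clear of `link` happen as one step, so exactly one caller observes the next request. For simplicity the demo races several threads through the same detach path; in the kernel the competing access would come from a different path that mutates the chain (for example timeout handling), which this model does not reproduce.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_MAX_THREADS 16

/* Hypothetical stand-in for io_ring_ctx, reduced to its lock. */
struct model_ctx {
	pthread_mutex_t uring_lock;
};

/* Hypothetical stand-in for io_kiocb, reduced to the chain link. */
struct model_req {
	struct model_ctx *ctx;
	struct model_req *link;		/* next request in the chain, or NULL */
};

/* Modeled on io_req_find_next(): read req->link, clear it, return it. */
static struct model_req *model_find_next(struct model_req *req)
{
	struct model_req *nxt = req->link;

	req->link = NULL;
	return nxt;
}

/* Modeled on the patched branch: the read-and-clear runs under the lock. */
static struct model_req *model_worker_next(struct model_req *req)
{
	struct model_ctx *ctx = req->ctx;
	struct model_req *nxt;

	pthread_mutex_lock(&ctx->uring_lock);
	nxt = model_find_next(req);
	pthread_mutex_unlock(&ctx->uring_lock);
	return nxt;
}

static void *detach_thread(void *arg)
{
	return model_worker_next(arg);
}

/* Race @nthreads threads on one request; count how many received a next. */
static int count_detaches(struct model_req *req, int nthreads)
{
	pthread_t tids[MODEL_MAX_THREADS];
	int i, hits = 0;

	assert(nthreads > 0 && nthreads <= MODEL_MAX_THREADS);
	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, detach_thread, req);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++) {
		void *res = NULL;

		pthread_join(tids[i], &res);
		if (res)
			hits++;
	}
	return hits;
}
```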

Step 2.4 Record: The fix is obviously small and locally correct: it
protects exactly the shared `req->link` read/clear. Regression risk is
low but not zero, because it adds a mutex acquisition in worker cleanup.
The commit message and code both verify the path is limited to requests
with `IO_REQ_LINK_FLAGS`.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` on the pre-fix parent showed the current
direct `io_wq_free_work()` call to `io_req_find_next()` came from
`247f97a5f19b64`, described by `git describe` as `v6.5-rc1~235^2~10`.
Older helper-based worker cleanup existed before that;
`io_wq_free_work()`/io-wq callback code is present from at least
`v5.15-rc1~185^2~41`, and stable branch checks show equivalent
vulnerable helper paths in `5.10.y`, `5.15.y`, and `6.1.y`.

Step 3.2 Record: No `Fixes:` tag is present, so there was no tagged
introducing commit to follow.

Step 3.3 Record: Recent `io_uring/io_uring.c` history includes related
io-wq/refcount work, notably `390513642ee676` / stable variants,
“io_uring: always do atomic put from iowq,” which changed the same
function and was KCSAN/syzbot-motivated. Mainline related commits
immediately after this candidate are `49ae66eb8c273` and
`a65855ec34aed`, the other two patches in the linked-request locking
series.

Step 3.4 Record: `MAINTAINERS` verifies Jens Axboe is the `IO_URING`
maintainer. `git log --author='Jens Axboe' -- io_uring` shows multiple
recent io_uring commits by him.

Step 3.5 Record: Build-wise this patch is standalone for trees with the
current direct `io_wq_free_work()` shape. For older stable trees using
`io_put_req_find_next()`, it needs a manual backport into the helper or
equivalent worker path. Semantically, it is patch 1/3 of a related
locking series; patches `49ae66eb8c273` and `a65855ec34aed` should be
considered with it to complete the linked-chain locking invariant.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 20c39819a2764` found the original submission
at `https://patch.msgid.link/20260511182217.226763-2-axboe@kernel.dk`.
`b4 dig -a` found only v1. The saved mbox shows this was `[PATCH 1/3]`.

Step 4.2 Record: `b4 dig -w` showed the patch was sent by Jens Axboe to
`io-uring@vger.kernel.org`, with Jens on Cc. No separate
reviewer/maintainer tags or replies were found in the saved matched
thread.

Step 4.3 Record: No bug-report link or `Reported-by:` tag exists. Web
search for the exact subject did not find a direct bug report.

Step 4.4 Record: The mbox cover letter says the series is “Linked
request fix” and “closing some gaps on linked requests, where iterating
a chain must hold either ->uring_lock OR ->timeout_lock, and modifying
any existing [chain] must hold both.” Patch 2 defers linked-timeout
splicing out of hrtimer context; patch 3 keeps `uring_lock` held across
`io_kill_timeouts()`.
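The invariant quoted from the cover letter (readers hold either lock, writers hold both) can be sketched in userspace as follows. This is a hypothetical model, not io_uring code: two `pthread` mutexes stand in for `->uring_lock` and `->timeout_lock`, `struct node` is a generic chain element, and the kernel's actual lock types and ordering rules are not modeled. Because a writer holds both locks, it excludes every reader regardless of which single lock that reader took.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: two locks guarding one linked chain. */
struct chain_locks {
	pthread_mutex_t uring_lock;
	pthread_mutex_t timeout_lock;
};

struct node {
	struct node *link;	/* next element, or NULL */
};

/* Reader: either lock suffices, because every writer holds both. */
static int chain_length(struct chain_locks *l, const struct node *head,
			int use_timeout_lock)
{
	pthread_mutex_t *m = use_timeout_lock ? &l->timeout_lock
					      : &l->uring_lock;
	int n = 0;

	pthread_mutex_lock(m);
	for (; head; head = head->link)
		n++;
	pthread_mutex_unlock(m);
	return n;
}

/* Writer: take both locks, always in the same order, to exclude all readers. */
static void chain_insert_after(struct chain_locks *l, struct node *pos,
			       struct node *new_node)
{
	pthread_mutex_lock(&l->uring_lock);
	pthread_mutex_lock(&l->timeout_lock);
	new_node->link = pos->link;
	pos->link = new_node;
	pthread_mutex_unlock(&l->timeout_lock);
	pthread_mutex_unlock(&l->uring_lock);
}
```

Acquiring the two locks in one fixed order in every writer avoids an ABBA deadlock between writers; readers never hold both, so they cannot participate in such a cycle.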

Step 4.5 Record: WebFetch of lore was blocked by Anubis, but `b4`
successfully retrieved the thread. Web search did not find stable-
specific discussion for this exact patch. No direct stable nomination
was verified.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Modified function: `io_wq_free_work()`.

Step 5.2 Record: Callers verified by `git grep`: `io_wq_free_work()` is
called from `io_uring/io-wq.c` after `io_wq_submit_work()` in the worker
loop and from the cancel path helper `io_run_cancel()`. This is io-wq
worker context.

Step 5.3 Record: Key callee is `io_req_find_next()`, verified to read
`req->link`, set `req->link = NULL`, and return the next linked request.
`io_wq_free_work()` then frees the current request via `io_free_req()`.

Step 5.4 Record: Reachability from userspace is verified:
`io_uring_enter()` locks `ctx->uring_lock` and calls `io_submit_sqes()`;
user SQE flags include `IOSQE_IO_LINK`, `IOSQE_IO_HARDLINK`, and
`IOSQE_ASYNC`; and async paths queue work into io-wq. The affected path
is therefore reachable by user-submitted linked async io_uring requests.

Step 5.5 Record: Similar patterns found: the normal completion/free
batching path calls `io_queue_next()`/`io_req_find_next()` while
`__io_submit_flush_completions()` and `io_free_batch_list()` require
`ctx->uring_lock`. Timeout code also mutates `req->link`, and the same
series addresses that.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: Stable branch checks verified equivalent vulnerable
code in `stable/linux-5.10.y`, `stable/linux-5.15.y`,
`stable/linux-6.1.y`, `stable/linux-6.6.y`, `stable/linux-6.12.y`,
`stable/linux-6.19.y`, and `stable/linux-7.0.y`. The exact direct hunk
exists in newer trees; older trees use `io_put_req_find_next()`.

Step 6.2 Record: `git apply --check` of the candidate patch succeeded on
the current checked-out `stable/linux-7.0.y` tree. Backport difficulty:
clean or near-clean for newer trees with the direct function body;
manual but simple for older helper-based trees.

Step 6.3 Record: Exact-subject `git log` over listed stable branches
found no existing stable copy of this fix. Related stable history
contains earlier io_uring link/refcount fixes, but not this locking fix.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is `io_uring`, a core async I/O subsystem
reachable through the `io_uring_enter` syscall. Criticality:
important/core-adjacent because it is syscall-reachable and handles
request lifetime, completion, and linked request ordering.

Step 7.2 Record: The subsystem is active: recent mainline history around
the candidate contains multiple io_uring fixes and refactors, and the
candidate came through the io_uring maintainer tree.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected users are systems using io_uring linked
requests that can complete through io-wq, especially linked async
operations. This is feature/config/user-workload specific, not
universal.

Step 8.2 Record: Trigger requires linked request chains and worker
completion/cancellation interleaving with other chain mutation/walk
paths. Unprivileged reachability depends on system policy, but the code
path is syscall-reachable through io_uring submission. No public
reproducer was verified.

Step 8.3 Record: Verified failure mode is an unsynchronized data race on
`req->link`. The precise observed symptom is unverified, but the raced
state controls request-chain lifetime/progression; plausible
consequences include lost/misordered linked request handling or memory-
safety/lifetime bugs. Severity: medium-high to high because it is a
syscall-reachable race in request lifetime code, though no crash report
was verified.

Step 8.4 Record: Benefit is high enough for stable because it removes a
real locking hole in io_uring linked-request handling. Risk is low:
6-line contained mutex protection, not on the unlinked hot path, no new
API, no behavior change except serialization.

## Phase 9: Final Synthesis
Step 9.1 Evidence for backporting:
- Verified real unsynchronized access: `io_wq_free_work()` called
  `io_req_find_next()` without `ctx->uring_lock`, while
  `io_req_find_next()` reads and clears `req->link`.
- Verified reachability from user-submitted io_uring linked async
  requests.
- Verified equivalent code exists across active stable trees from
  `5.10.y` through `7.0.y`.
- Verified fix is tiny, contained, maintainer-authored, and applies
  cleanly to `7.0.y`.
- Verified this is part of an explicit “Linked request fix” series
  closing locking gaps.

Evidence against or concerns:
- No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, stable Cc,
  stack trace, or reproducer was verified.
- It is patch 1/3 of a related series; for best stable correctness,
  patches `49ae66eb8c273` and `a65855ec34aed` should be
  evaluated/backported together.
- Older stable trees need manual adaptation to `io_put_req_find_next()`.

Step 9.2 Stable rules checklist:
1. Obviously correct and tested? Correct by inspection; external testing
   not verified.
2. Fixes a real bug? Yes, verified race on `req->link`.
3. Important issue? Yes enough for stable: syscall-reachable request-
   chain/lifetime race, though exact crash symptom is unverified.
4. Small and contained? Yes, one function, one file, 6 insertions.
5. No new features/APIs? Yes.
6. Can apply to stable? Yes for newer trees; older trees need
   straightforward backport.

Step 9.3 Exception category: None. This is not a device ID, quirk, DT,
build, or documentation-only patch.

Step 9.4 Decision: Backport. The technical merit is a small, maintainer-
authored synchronization fix for a verified io_uring linked-request race
in code present across stable trees. Treat the rest of the 3-patch
linked-request locking series as related follow-up material rather than
ignoring it.

## Verification
- [Phase 1] `git show --format=fuller --stat --patch 20c39819a2764`:
  confirmed subject, body, tags, one-file diff, and absence of
  `Fixes:`/`Reported-by:`/stable tags.
- [Phase 2] Diff inspection: confirmed
  `mutex_lock(&ctx->uring_lock)`/unlock added around
  `io_req_find_next()` only in `io_wq_free_work()`.
- [Phase 3] `git blame` on the parent: confirmed `io_wq_free_work()`
  direct call history and `req->link` read/clear history.
- [Phase 3] `git describe --contains`: confirmed relevant code history
  reaches released kernels including `v6.5-rc1`, `v5.15-rc1`, and
  earlier helper behavior.
- [Phase 3] Stable branch `git grep`: confirmed vulnerable worker/helper
  paths in `5.10.y`, `5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`, `6.19.y`, and
  `7.0.y`.
- [Phase 4] `b4 dig -c`, `-a`, `-w`, and saved mbox: confirmed original
  submission URL, v1-only series, recipients, cover-letter context, and
  3-patch series.
- [Phase 4] WebFetch: lore direct fetch was blocked by Anubis; b4 mbox
  provided the thread content.
- [Phase 5] `git grep` call tracing: confirmed `io_wq_free_work()`
  callers in io-wq worker/cancel paths, `io_uring_enter()` to
  `io_submit_sqes()` submission path, and user-visible link flags.
- [Phase 6] `git apply --check`: confirmed the patch applies cleanly to
  the current `stable/linux-7.0.y` worktree.
- [Phase 7] `MAINTAINERS` search: confirmed Jens Axboe is the `IO_URING`
  maintainer.
- [Phase 8] Verified no public reproducer or exact crash symptom in the
  commit/thread; severity assessment is based on verified race location
  and syscall reachability, not on an observed report.

**YES**

 io_uring/io_uring.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 97260bca67e7b..a72efb3a62bac 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1450,8 +1450,13 @@ struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
 	struct io_kiocb *nxt = NULL;
 
 	if (req_ref_put_and_test_atomic(req)) {
-		if (req->flags & IO_REQ_LINK_FLAGS)
+		if (req->flags & IO_REQ_LINK_FLAGS) {
+			struct io_ring_ctx *ctx = req->ctx;
+
+			mutex_lock(&ctx->uring_lock);
 			nxt = io_req_find_next(req);
+			mutex_unlock(&ctx->uring_lock);
+		}
 		io_free_req(req);
 	}
 	return nxt ? &nxt->work : NULL;
-- 
2.53.0



* [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-20 11:18 ` [PATCH AUTOSEL 7.0-6.6] io_uring: hold uring_lock when walking link chain in io_wq_free_work() Sasha Levin
@ 2026-05-20 11:18 ` Sasha Levin
  2026-05-20 11:40   ` Jens Axboe
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: defer linked-timeout chain splice out of hrtimer context Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:18 UTC
  To: patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Jens Axboe, Maoyi Xie, Sasha Levin,
	io-uring, linux-kernel

From: Maoyi Xie <maoyixie.tju@gmail.com>

[ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Record: Subsystem `io_uring/wait`; action verb `honour`; intent is to
make `IORING_ENTER_ABS_TIMER` interpret caller absolute times in the
caller’s time namespace.

Record: Tags present:
`Suggested-by: Pavel Begunkov`, `Suggested-by: Jens Axboe`, author
`Signed-off-by: Maoyi Xie`,
`Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`,
maintainer `Signed-off-by: Jens Axboe`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Reviewed-by`, `Acked-by`, or `Cc: stable`.

Record: The commit describes a real userspace-visible bug:
`io_uring_enter()` with `IORING_ENTER_ABS_TIMER` parses `ext_arg->ts`
directly, then arms an absolute hrtimer without converting from the
caller’s time namespace to host time. The supplied reproducer in
`unshare --user --time` with a `-10s` monotonic offset returns `-ETIME`
in under 1 ms instead of about 1 second.

Record: This is not hidden cleanup. It is a direct correctness fix for
absolute timeout interpretation in time namespaces.

### Phase 2: Diff Analysis
Record: One file changed, `io_uring/wait.c`, 5 insertions and 1
deletion. Function modified: `io_cqring_wait()`. Scope: single-file
surgical fix.

Record: Before, `ext_arg->ts` was converted with
`timespec64_to_ktime()`. If `IORING_ENTER_ABS_TIMER` was unset, the code
added `start_time`; if set, it used the raw caller value as a host
absolute deadline. After, the absolute branch calls
`timens_ktime_to_host(ctx->clockid, iowq.timeout)`, while the relative
branch remains unchanged.

Record: Bug category is logic/correctness in time namespace handling.
The broken mechanism is that a namespaced absolute
`CLOCK_MONOTONIC`/`CLOCK_BOOTTIME` timestamp was fed to a host hrtimer
as if it were already in host time.

Record: Fix quality is strong: minimal, local, uses an existing kernel
helper, and adds no new API. Regression risk is very low because
`timens_ktime_to_host()` was verified to be a no-op for the initial time
namespace, for unsupported clocks, and when `CONFIG_TIME_NS` is
disabled.

### Phase 3: Git History Investigation
Record: `git blame` on the changed wait lines points to `0105b0562a5e`
(`io_uring: split out CQ waiting code into wait.c`) for the current file
location. The same logic predates the split; `2b8e976b9842`
(`io_uring: user registered clockid for wait timeouts`) shows this
absolute-wait path using `ctx->clockid` and is contained by `v6.12-rc1`.

Record: No `Fixes:` tag is present, so there was no tagged introducing
commit to follow. I inspected the companion parent commit instead:
`9cc6bac1bebf` fixes the same time-namespace issue for
`IORING_TIMEOUT_ABS`.

Record: Recent related history shows this is patch 2/2 after
`9cc6bac1bebf`. The candidate’s parent is exactly `9cc6bac1bebf`, but
this wait fix compiles independently as long as `timens_ktime_to_host()`
and `ctx->clockid` exist.

Record: Before this commit, the author's only `io_uring` history was the
companion timeout fix. Jens Axboe applied the patch; Pavel and Jens are
credited with `Suggested-by:` tags and took part in review.

Record: Dependencies: affected stable trees need `ctx->clockid` and
`timens_ktime_to_host()`. I verified both exist in local
`for-greg/6.12-100`; the same buggy `IORING_ENTER_ABS_TIMER` line exists
in the `6.12`, `6.18`, `6.19`, and `7.0` local stable branches, but not
in `5.10`, `5.15`, `6.1`, or `6.6`.

### Phase 4: Mailing List And External Research
Record: `b4 dig -c 45d2b37a37ab...` found the original submission at
`https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`.

Record: `b4 dig -a` found only v1 of the series. The thread shows Jens
applied both patches with commit IDs `9cc6bac1bebf` and `45d2b37a37ab`.

Record: `b4 dig -w` shows the right people/lists were included: Maoyi
Xie, Jens Axboe, Pavel Begunkov, `io-uring@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`.

Record: Reviewer feedback was positive: Pavel wrote “both look good” and
requested a liburing test; Jens replied “+1” for the test and later
applied the series. No NAKs or objections found.

Record: No separate bug-report link exists beyond the patch
thread/reproducer. Stable-specific WebFetch was blocked by Anubis, and
local thread search found no stable nomination.

### Phase 5: Code Semantic Analysis
Record: Modified function: `io_cqring_wait()`.

Record: Callers: `io_uring_enter(2)` reaches `io_cqring_wait()` when
`IORING_ENTER_GETEVENTS` is set, after `io_get_ext_arg()` copies/parses
the userspace getevents argument. This is directly syscall-reachable.

Record: Key callees: `timespec64_to_ktime()`, `timens_ktime_to_host()`,
`ktime_add()`, `io_get_time()`, `io_cqring_schedule_timeout()`, and
hrtimer setup/start helpers.

Record: Call chain: userspace `io_uring_enter()` -> `io_get_ext_arg()`
-> `io_cqring_wait()` -> `io_cqring_wait_schedule()` ->
`__io_cqring_wait_schedule()` -> `io_cqring_schedule_timeout()` ->
absolute hrtimer. The buggy path is reachable from userspace with
`IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG | IORING_ENTER_ABS_TIMER`.

Record: Similar patterns: the companion commit fixes
`io_parse_user_time()` for `IORING_TIMEOUT_ABS`; POSIX timers,
`clock_nanosleep`, alarm timers, and `timerfd` already use
`timens_ktime_to_host()` for absolute timers.

### Phase 6: Stable Tree Analysis
Record: Local stable-branch grep found the buggy
`IORING_ENTER_ABS_TIMER` code in `for-greg/6.12-100`,
`for-greg/6.18-100`, `for-greg/6.19-200`, and `for-greg/7.0-100`. It was
absent from `5.10`, `5.15`, `6.1`, and `6.6`.

Record: Backport difficulty: the `git apply --check` on current `7.0.y`
succeeds cleanly. `7.0` has `io_uring/wait.c`; the `6.12`, `6.18`, and
`6.19` local branches carry the same logic in `io_uring/io_uring.c`, so
those need a path/context backport but not semantic rework.

Record: No related fix with this subject was found in the checked stable
candidate branches.

### Phase 7: Subsystem Context
Record: Subsystem is `io_uring`, a core async I/O syscall subsystem.
Criticality: IMPORTANT, not universal core MM/VFS, but directly
userspace-facing and widely used.

Record: Subsystem activity is high; recent `io_uring` history has many
fixes and feature changes. This specific change is small despite the
active subsystem.

### Phase 8: Impact And Risk
Record: Affected population: users of `io_uring_enter()` absolute CQ
wait timeouts inside non-initial time namespaces, especially
container-like environments. Only stable trees that contain
`IORING_ENTER_ABS_TIMER` are affected.

Record: Trigger: userspace can trigger via `io_uring_enter()` with
`IORING_ENTER_ABS_TIMER` and a timespec from a shifted time namespace.
The provided reproducer uses `unshare --user --time`; whether it can be
triggered fully unprivileged depends on the system's user-namespace policy.

Record: Failure mode: incorrect timeout behavior. With the reproduced
negative offset, the wait returns `-ETIME` immediately; with other
offsets, absolute waits can be delayed incorrectly. Severity: MEDIUM to
HIGH user-visible correctness bug, potential application timeout/hang
behavior, but not a kernel crash, memory corruption, or security fix.

Record: Benefit is moderate/high for affected containerized users
because it restores syscall semantics. Risk is very low: one local
conditional change plus an include, using established helper semantics.

### Phase 9: Final Synthesis
Record: Evidence for backporting: real reproduced bug, syscall-
reachable, affects stable branches with the feature, tiny patch,
maintainer-applied, positive reviewer feedback, matches established
time-namespace behavior elsewhere.

Record: Evidence against backporting: not a crash/security/data-
corruption fix; affects a narrower feature combination; no explicit
stable nomination; older stable trees do not contain the affected
feature.

Record: Unresolved: I did not run the reproducer locally. Lore WebFetch
was blocked by Anubis, but `b4` successfully fetched the thread. Exact
first upstream introduction of `IORING_ENTER_ABS_TIMER` was not cleanly
reconstructed from local blame alone, but affected stable branches were
directly verified by grep.

Stable rules:
1. Obviously correct and tested: yes by code inspection, reproducer, and
   positive review; no formal `Tested-by`.
2. Fixes a real bug: yes, reproduced wrong timeout result.
3. Important issue: yes for affected users, because absolute waits can
   return immediately or at the wrong time.
4. Small and contained: yes, 6-line single-function change.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current `7.0.y`; minor path adjustment
   may be needed in some branches.

No automatic exception category applies.

## Verification
- [Phase 1] Parsed `git show` commit message and tags for
  `45d2b37a37ab98484693533496395c610a2cab96`.
- [Phase 2] Verified diff is one file, `io_uring/wait.c`, 5 insertions/1
  deletion in `io_cqring_wait()`.
- [Phase 3] Ran `git blame` on the changed lines; current file location
  comes from `0105b0562a5e`.
- [Phase 3] Inspected `2b8e976b9842`; verified `ctx->clockid`,
  `io_get_time(ctx)`, and selected-clock wait timeout support.
- [Phase 3] Inspected companion commit `9cc6bac1bebf`; verified same
  class of fix for `IORING_TIMEOUT_ABS`.
- [Phase 4] Ran `b4 dig`, `b4 dig -a`, `b4 dig -w`, and `b4 mbox`;
  verified v1-only series, correct recipients, positive feedback, and
  applied notice.
- [Phase 5] Read `io_uring_enter()` and `io_get_ext_arg()` call path;
  verified direct syscall reachability.
- [Phase 5] Verified `timens_ktime_to_host()` behavior in
  `include/linux/time_namespace.h` and `kernel/time/namespace.c`.
- [Phase 5] Verified similar established conversions in `kernel/time`
  and `fs/timerfd.c`.
- [Phase 6] Ran `git grep` on local stable branches; affected: `6.12`,
  `6.18`, `6.19`, `7.0`; unaffected: `5.10`, `5.15`, `6.1`, `6.6`.
- [Phase 6] Ran `git apply --check` for the candidate patch on current
  `7.0.y`; it applies cleanly.
- [Phase 8] Verified reproducer details from commit and mailing-list
  cover letter; did not execute it locally.

This should be backported to stable trees that contain
`IORING_ENTER_ABS_TIMER`, with the companion timeout patch strongly
recommended for complete io_uring absolute-timeout time-namespace
correctness.

**YES**

 io_uring/wait.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/wait.c b/io_uring/wait.c
index 91df86ce0d18c..ec01e78a216d6 100644
--- a/io_uring/wait.c
+++ b/io_uring/wait.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 
 #include <trace/events/io_uring.h>
 
@@ -229,7 +230,10 @@ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 
 	if (ext_arg->ts_set) {
 		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
-		if (!(flags & IORING_ENTER_ABS_TIMER))
+		if (flags & IORING_ENTER_ABS_TIMER)
+			iowq.timeout = timens_ktime_to_host(ctx->clockid,
+							    iowq.timeout);
+		else
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
 
-- 
2.53.0



* [PATCH AUTOSEL 7.0] io_uring: defer linked-timeout chain splice out of hrtimer context
       [not found] <20260520111944.3424570-1-sashal@kernel.org>
  2026-05-20 11:18 ` [PATCH AUTOSEL 7.0-6.6] io_uring: hold uring_lock when walking link chain in io_wq_free_work() Sasha Levin
  2026-05-20 11:18 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
@ 2026-05-20 11:19 ` Sasha Levin
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.18] io_uring: validate user-controlled cq.head in io_cqe_cache_refill() Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:19 UTC
  To: patches, stable; +Cc: Jens Axboe, Sasha Levin, io-uring, linux-kernel

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 49ae66eb8c27375075ffa308cfd4bf25af335d41 ]

io_link_timeout_fn() is the hrtimer callback that fires when a linked
timeout expires. It currently calls io_remove_next_linked(prev) under
ctx->timeout_lock to splice the timeout request out of the link chain.
This is the only chain-mutation site that runs without ctx->uring_lock,
because hrtimer callbacks cannot take a mutex. Defer the splicing until
the task_work callback.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Record: subsystem `io_uring`; action verb `defer`; intent is to move
linked-timeout chain splicing out of `io_link_timeout_fn()` hrtimer
context and into task_work.

Record: tags found only `Signed-off-by: Jens Axboe <axboe@kernel.dk>`.
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Cc: stable`,
or external `Link:` in the upstream commit.

Record: the body describes a locking bug: `io_link_timeout_fn()` mutates
the linked request chain under `ctx->timeout_lock` but without
`ctx->uring_lock`; hrtimer callbacks cannot take the mutex, so mutation
is deferred to task_work.

Record: this is a real hidden bug fix despite not saying “fix”: it
corrects an unsynchronized linked-list mutation in an hrtimer callback.

## Phase 2: Diff Analysis
Record: one file changed, `io_uring/timeout.c`, 14 insertions and 2
deletions. Modified functions: `__io_disarm_linked_timeout()`,
`io_req_task_link_timeout()`, `io_link_timeout_fn()`. Scope is single-
file surgical locking/race fix.

Record: before, `io_link_timeout_fn()` called
`io_remove_next_linked(prev)` directly from hrtimer context under only
`timeout_lock`. After, the timer claims the timeout, stores
`timeout->prev`, and queues task_work; `io_req_task_link_timeout()` then
splices `req` out of `prev->link` if the normal completion path did not
already do so.

Record: `__io_disarm_linked_timeout()` now detects `timeout->head == NULL`,
meaning the timer already claimed the timeout, and avoids cancel/list
removal in that race.

Record: bug category is synchronization/race on linked request chain
mutation. Fix quality is good but series-sensitive: patch 3/3
(`a65855ec34aed`) is needed to keep `io_kill_timeouts()` walking chains
under `uring_lock` after this patch changes where splicing happens.

## Phase 3: Git History Investigation
Record: blame shows the relevant linked-timeout code and
`io_remove_next_linked()` originated mainly from `59915143e89f`
(“io_uring: move timeout opcodes and handling into its own file”), first
contained around `v6.0-rc1`; later timeout-lock changes include
`020b40f35624`, and `__io_disarm_linked_timeout()` changes include
`78967aabf613`, first around `v6.16-rc1`.

Record: no `Fixes:` tag exists, so there was no tagged introducer to
follow.

Record: recent history shows this is patch 2/3 in a linked-request
locking series:
- `20c39819a276` locks `io_wq_free_work()` chain walking;
- `49ae66eb8c27` defers linked-timeout splicing;
- `a65855ec34ae` keeps `uring_lock` across `io_kill_timeouts()`.

Record: Jens Axboe is listed in `MAINTAINERS` as the `IO_URING`
maintainer and authored the commit.

## Phase 4: Mailing List And External Research
Record: `b4 dig -c 49ae66eb8c27` found the lore submission at
`https://patch.msgid.link/20260511182217.226763-3-axboe@kernel.dk`.

Record: `b4 dig -a` found only v1 of the 3-patch series. `b4 dig -w`
showed recipients were Jens Axboe and `io-uring@vger.kernel.org`.

Record: the saved mbox contains the cover letter “[PATCHSET 0/3] Linked
request fix”, stating chain iteration must hold either `uring_lock` or
`timeout_lock`, and modification should be buttoned up. No replies,
NAKs, review tags, or stable nominations were present in the mbox.

Record: direct `WebFetch` of lore and stable search pages was blocked by
Anubis, so no web-side stable discussion could be verified.

## Phase 5: Code Semantic Analysis
Record: key functions are `io_link_timeout_fn()`,
`io_req_task_link_timeout()`, `__io_disarm_linked_timeout()`, and
`io_remove_next_linked()`.

Record: call/reachability tracing verified `IORING_OP_LINK_TIMEOUT` uses
`io_link_timeout_prep()` in `io_uring/opdef.c`; prep installs
`io_link_timeout_fn()` as the hrtimer callback, and linked timeouts are
queued on `ctx->ltimeout_list`.

Record: task_work runners in `io_uring/tw.c` execute callbacks while
holding `ctx->uring_lock` in normal, fallback, and local-work paths.
This verifies the deferred splice runs in a mutex-protected context.

Record: a search for similar patterns found that the hrtimer callback is
the only direct chain-mutation site this diff changes; the rest of the
series covers the other chain-walking gaps.

## Phase 6: Stable Tree Analysis
Record: `git merge-base --is-ancestor` verified the old timeout split
commit exists in `v6.19.14` and `v6.6.140`; the candidate itself is not
in `v7.0.9` or `v6.19.14`.

Record: `git show`/`rg` verified the buggy `io_link_timeout_fn()`
pattern exists in `v7.0.9`, `v6.19.14`, `v6.15`, `v6.12.90`, `v6.6.140`,
and in older `v5.15` under `fs/io_uring.c`.

Record: `git diff 49ae^..49ae | git apply --check` succeeded on the
current `v7.0.9` checkout. Older trees have API/path differences such as
task_work signature and `spin_lock` vs `raw_spin_lock`, so they need
manual backporting.

## Phase 7: Subsystem Context
Record: subsystem is `io_uring`, a core async I/O userspace API.
Criticality is IMPORTANT: not universal like MM/VFS, but reachable from
userspace and widely used.

Record: `git log origin/master --oneline -20 -- io_uring` shows high
activity, including this linked-request locking series and other recent
fixes.

## Phase 8: Impact And Risk
Record: affected users are systems using io_uring linked requests with
`IORING_OP_LINK_TIMEOUT`.

Record: trigger is a timing race between linked-timeout hrtimer expiry
and other linked-chain completion/cancel paths; this is reachable from
userspace via io_uring submissions.

Record: verified failure class is unsynchronized linked-list/request-
chain mutation. No crash report was verified, but the protected object
is request-chain state, so the stability risk is request chain
corruption, wrong cancellation/completion, or follow-on memory lifetime
bugs.

Record: benefit is high for affected io_uring users because it closes a
real locking gap in request lifetime/chain handling. Risk is low-medium:
the patch is small, but should be backported with the adjacent locking
fixes, especially `a65855ec34aed`.

## Phase 9: Final Synthesis
Record: evidence for backporting: real race fix, userspace-reachable
io_uring path, single-file 16-line patch, authored by subsystem
maintainer, applies cleanly to `v7.0.9`, and the buggy pattern exists
across active stable/LTS tags checked.

Record: evidence against/concerns: no reporter/test tag, no explicit
stable tag, no verified crash trace, and the commit is part of a 3-patch
locking series; backporting only this patch without the follow-up
cancel-path lock change can leave the locking story incomplete.

Record: stable rules: obviously correct by code inspection with the
series context; fixes a real synchronization bug; important because it
affects request-chain mutation in a userspace API; small and contained;
no new feature/API; applies cleanly to `v7.0.9`, with older trees
needing backport adjustment.

Record: no automatic exception category applies; this is not a device
ID, quirk, DT, build, or documentation fix.

## Verification
- [Phase 1] `git show -s` confirmed subject, body, author, and absence
  of tags beyond Jens’s SOB.
- [Phase 2] `git show --patch 49ae66eb8c27` confirmed
  `io_uring/timeout.c` only, 14 insertions/2 deletions.
- [Phase 3] `git blame` confirmed relevant code history;
  `git describe --contains` placed `59915143e89f` around `v6.0-rc1` and
  `78967aabf613` around `v6.16-rc1`.
- [Phase 3] `git log` confirmed related commits `20c39819a276` and
  `a65855ec34ae`.
- [Phase 4] `b4 dig` found the exact patch submission and v1 3-patch
  series; saved mbox showed no review replies or stable nomination.
- [Phase 5] `rg` and `git show` traced `IORING_OP_LINK_TIMEOUT` prep,
  hrtimer setup, task_work execution, and task_work locking.
- [Phase 6] stable tag checks verified the buggy pattern exists in
  checked stable/LTS tags; `git apply --check` succeeded on current
  `v7.0.9`.
- [Phase 7] `MAINTAINERS` verified Jens Axboe maintains `IO_URING`.
- [Phase 8] failure mode is verified as a locking/race bug; concrete
  crash symptoms are UNVERIFIED.

The commit should be backported, preferably together with the adjacent
linked-request locking series commits needed for a complete invariant.

**YES**

 io_uring/timeout.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index e3815e3465dde..4ee1c21e1b15f 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -245,6 +245,10 @@ static struct io_kiocb *__io_disarm_linked_timeout(struct io_kiocb *req,
 	struct io_timeout *timeout = io_kiocb_to_cmd(link, struct io_timeout);
 
 	io_remove_next_linked(req);
+
+	/* If this is NULL, then timer already claimed it and will complete it */
+	if (!timeout->head)
+		return NULL;
 	timeout->head = NULL;
 	if (hrtimer_try_to_cancel(&io->timer) != -1) {
 		list_del(&timeout->list);
@@ -328,6 +332,14 @@ static void io_req_task_link_timeout(struct io_tw_req tw_req, io_tw_token_t tw)
 	int ret;
 
 	if (prev) {
+		/*
+		 * splice the linked timeout out of prev's chain if the regular
+		 * completion path didn't already do it.
+		 */
+		if (prev->link == req)
+			prev->link = req->link;
+		req->link = NULL;
+
 		if (!tw.cancel) {
 			struct io_cancel_data cd = {
 				.ctx		= req->ctx,
@@ -362,10 +374,10 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 
 	/*
 	 * We don't expect the list to be empty, that will only happen if we
-	 * race with the completion of the linked work.
+	 * race with the completion of the linked work. Splice of prev is
+	 * done in io_req_task_link_timeout(), if needed.
 	 */
 	if (prev) {
-		io_remove_next_linked(prev);
 		if (!req_ref_inc_not_zero(prev))
 			prev = NULL;
 	}
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.18] io_uring: validate user-controlled cq.head in io_cqe_cache_refill()
       [not found] <20260520111944.3424570-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: defer linked-timeout chain splice out of hrtimer context Sasha Levin
@ 2026-05-20 11:19 ` Sasha Levin
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: hold uring_lock across io_kill_timeouts() in cancel path Sasha Levin
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.12] io_uring/fdinfo: translate SqThread PID through caller's pid_ns Sasha Levin
  5 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:19 UTC
  To: patches, stable; +Cc: Zizhi Wo, Jens Axboe, Sasha Levin, io-uring, linux-kernel

From: Zizhi Wo <wozizhi@huawei.com>

[ Upstream commit f44d38a31f1802b7222adaea9ee69f9d280f698a ]

A fuzzing run reproduced an unkillable io_uring task stuck at ~100% CPU:

[root@fedora io_uring_stress]# ps -ef | grep io_uring
root  1240  1  99 13:36 ?  00:01:35 [io_uring_stress] <defunct>

The task loops inside io_cqring_wait() and never returns to userspace,
and SIGKILL has no effect.

This is caused by the CQ ring exposing rings->cq.head to userspace as
writable, while the authoritative tail lives in kernel-private
ctx->cached_cq_tail. io_cqe_cache_refill() computes free space as an
unsigned subtraction:

    free = ctx->cq_entries - min(tail - head, ctx->cq_entries);

If userspace keeps head within [0, tail], the subtraction is well
defined and min() just acts as a defensive clamp. But if userspace
advances head past tail, (tail - head) wraps to a huge value, free
becomes 0, and io_cqe_cache_refill() fails. The CQE is pushed onto the
overflow list and IO_CHECK_CQ_OVERFLOW_BIT is set.

The wait loop in io_cqring_wait() relies on an invariant: refill() only
fails when the CQ is *physically* full, in which case rings->cq.tail has
been advanced to iowq->cq_tail and io_should_wake() returns true. The
tampered head breaks this: refill() fails while the ring is not full, no
OCQE is copied in, rings->cq.tail never catches up, io_should_wake()
stays false, and io_cqring_wait_schedule() keeps returning early because
IO_CHECK_CQ_OVERFLOW_BIT is still set. The result is a tight retry loop
that never returns to userspace.

Introduce io_cqring_queued() as the single point that converts the
(tail, head) pair into a trustworthy queued count. Since the real
head/tail distance is bounded by cq_entries (far below 2^31), a signed
comparison reliably detects userspace moving head past tail; in that
case treat the queue as empty so callers see the full cache as free and
forward progress is preserved.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Link: https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com
[axboe: fixup commit message, kill 'queued' var, and keep it all in
io_uring.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Record: Subsystem `io_uring`; action verb `validate`; intent is to
validate a user-controlled CQ head value used by
`io_cqe_cache_refill()`.

Record: Tags found: `Suggested-by: Jens Axboe`, `Signed-off-by: Zizhi Wo`,
`Link: https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com`,
maintainer edit note from Jens, `Signed-off-by: Jens Axboe`. No `Fixes:`,
`Reported-by:`, `Tested-by:`, `Reviewed-by`, or `Cc: stable` tags were
present.

Record: The commit describes a fuzzed, reproducible user-visible
failure: an `io_uring` task spins at about 100% CPU inside
`io_cqring_wait()`, never returns to userspace, and ignores `SIGKILL`.
Root cause is that userspace can write `rings->cq.head`; if it advances
`head` past the kernel-private `ctx->cached_cq_tail`, unsigned
subtraction wraps, `io_cqe_cache_refill()` sees no free space, overflow
stays set, and the wait loop keeps retrying.

Record: This is not hidden cleanup. It is an explicit bug fix for a
userspace-triggerable livelock/unkillable task.

## Phase 2: Diff Analysis
Record: One file changed: `io_uring/io_uring.c`, 17 insertions and 5
deletions. Modified/added functions: new `io_cqring_queued()`, modified
`io_fill_nop_cqe()`, modified `io_cqe_cache_refill()`. Scope is a
single-file surgical fix.

Record: Before, free CQ space was computed as
`ctx->cq_entries - min(__io_cqring_events(ctx), ctx->cq_entries)`, where
`__io_cqring_events()` is `cached_cq_tail - user_head`. If
`user_head > cached_cq_tail`, that unsigned subtraction wraps and is
clamped to `cq_entries`, making `free` zero.

Record: After, `io_cqring_queued()` casts the tail-head difference to
signed `int`; non-negative values are clamped to `cq_entries`, while
negative values are treated as zero queued entries. `io_fill_nop_cqe()`
uses the same trusted queued-count helper.

Record: Bug category is logic/correctness with user-controlled index
validation failure, causing an overflow-path livelock. It is not a
feature, API, refactor, or hardware enablement.

Record: Fix quality is good: for valid rings it preserves existing
behavior; for invalid `head > tail` it chooses forward progress.
Regression risk is low because the helper is local and affects only CQ
free-space calculation. The only semantic change is for corrupted user
CQ head state.

## Phase 3: Git History Investigation
Record: `git blame` shows the affected free-space calculation in
`io_cqe_cache_refill()` comes from `faf88dde060f74`
(`io_uring: don't inline __io_get_cqe()`), first contained in
`v6.0-rc1~181^2~85`. The overflow ordering guard comes from
`aa1df3a360a0c5` (`io_uring: fix CQE reordering`), first contained in
`v6.1-rc1~135^2~10`. The later `cqe32`/NOP path comes from
`e26dca67fde19`, first contained in `v6.18-rc1~137^2~45`.

Record: No `Fixes:` tag is present, so there was no tagged introducing
commit to follow.

Record: Recent file history shows multiple `io_uring` fixes around
CQ/ring handling, including `61a11cf481272` protecting lockless
`ctx->rings` accesses and `a7d755ed9ce97` fixing overflow CQE
reordering. No prerequisite specific to this helper was identified.

Record: Author Zizhi Wo has other kernel commits, but no recent local
`io_uring` commits found. Jens Axboe is the `IO_URING` maintainer in
`MAINTAINERS` and applied the final patch with edits.

Record: Dependencies: the fix depends only on existing
`ctx->cached_cq_tail`, `ctx->cq_entries`, `READ_ONCE(rings->cq.head)`,
and `min()`. It can be backported standalone, though older stable trees
need context adjustment because the exact function signature and file
layout differ.

## Phase 4: Mailing List And External Research
Record: `b4 dig -c f44d38a31f1802b7222adaea9ee69f9d280f698a` found the
original v2 submission at
`https://patch.msgid.link/20260514021847.4062782-1-wozizhi@huaweicloud.com`.

Record: `b4 dig -a` found v1 and v2. v1 was
`20260513063254.1122354-1-wozizhi@huaweicloud.com`; v2 was the submitted
version that matches the final fix concept. Jens reviewed v1 and said
snapshotting `tail` before a possible NOP fill looked wrong, and noted
the refill path had the same unsigned issue. v2 addressed this by
introducing a helper used by both paths.

Record: `b4 dig -w` showed the right recipients: Jens Axboe, Pavel
Begunkov, `io-uring@vger.kernel.org`, `linux-kernel@vger.kernel.org`,
and related Huawei contacts.

Record: The v2 mbox shows Jens applied it and then further edited it by
moving the helper into `io_uring.c`, removing the now-unused `queued`
variable, and trimming the comments/message. No NAK was found. No stable
nomination was found in the fetched thread.

Record: WebFetch access to lore search pages and git.kernel.org was
blocked by Anubis, so stable-list web search could not be verified
through WebFetch. Local `git log --grep` on sampled stable branches
found no existing exact stable commit.

## Phase 5: Code Semantic Analysis
Record: Key functions: `io_cqring_queued()`, `io_fill_nop_cqe()`,
`io_cqe_cache_refill()`.

Record: Callers: `io_cqe_cache_refill()` is called by
`io_get_cqe_overflow()` in `io_uring/io_uring.h`, which feeds normal CQE
posting, auxiliary CQEs, request completions, multishot completions,
message-ring completions, and overflow flushing. `io_cqring_wait()` is
reached from `SYSCALL_DEFINE6(io_uring_enter)` when
`IORING_ENTER_GETEVENTS` is used.

Record: Callees/side effects: the affected code reads the user-writable
CQ head, computes queue occupancy/free space, sets
`ctx->cqe_cached`/`ctx->cqe_sentinel`, and decides whether completions
go directly to the CQ ring or the overflow list.

Record: Reachability is verified from userspace through
`io_uring_enter()`. The provided reproduction ran as root; unprivileged
triggerability was not independently verified, but the affected state is
controlled by the userspace owner of the mmapped CQ ring.

Record: Similar pattern found: `__io_cqring_events()` in current code
and stable branches computes `cached_cq_tail - READ_ONCE(cq.head)`, so
the unsigned wrap condition is real in the relevant code paths.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Record: The buggy free-space logic exists in sampled stable trees:
`stable/linux-6.1.y` has it in `__io_get_cqe()`, and
`stable/linux-6.6.y`, `stable/linux-6.12.y`, `stable/linux-6.18.y`, and
`stable/linux-6.19.y` have it in `io_cqe_cache_refill()` or equivalent.
The specific min/free logic was introduced in v6.0-rc1, so v6.1+ stable
trees are affected.

Record: `stable/linux-5.15.y` has an older `io_get_cqe()` form using
`__io_cqring_events(ctx) == ctx->cq_entries`, not the same `min(tail -
head, cq_entries)` free-space calculation. I did not verify that the
exact livelock fixed here applies to 5.15, so this decision is driven by
verified v6.1+ evidence.

Record: Expected backport difficulty: low to moderate. 6.18/6.19 are
close but may lack the exact split into `wait.c`/`wait.h` seen in
current 7.0; 6.6/6.12 need a smaller adaptation because there is no
`cqe32`/NOP path; 6.1 needs the helper folded into the older
`__io_get_cqe()` path. The semantic fix is standalone.

Record: No related fix already present was found by exact subject search
in sampled stable branches.

## Phase 7: Subsystem And Maintainer Context
Record: Subsystem is `io_uring`, a core async I/O userspace API.
Criticality is IMPORTANT/CORE-adjacent because it is syscall reachable
and used by databases, storage/network software, runtimes, and fuzzers.

Record: Subsystem activity is high: recent local history shows many
`io_uring` fixes and refactors. The patch was handled by Jens Axboe,
listed maintainer for `IO_URING`.

## Phase 8: Impact And Risk Assessment
Record: Affected users are systems using `io_uring`; trigger requires a
userspace process manipulating its CQ head and waiting for completions.
The reproduction is a fuzzing/stress case with direct userspace control
of the mapped CQ ring.

Record: Trigger likelihood is not “everyday normal app behavior”, but it
is syscall/userspace reachable and can create an unkillable high-CPU
task. Unprivileged triggerability was not independently verified beyond
normal `io_uring` userspace reachability.

Record: Failure mode is HIGH severity: livelock/tight retry loop, 100%
CPU, no return to userspace, and `SIGKILL` ineffective per the commit
and mailing-list patch.

Record: Benefit is high for affected stable trees because it prevents a
userspace-triggered unkillable task. Risk is low because the change is
small, local, and only changes behavior for invalid user-controlled CQ
head state. Risk/benefit strongly favors backporting.

## Phase 9: Final Synthesis
Record: Evidence for backporting: real fuzzed bug; clear root cause;
userspace-reachable path; severe livelock/unkillable task; small local
fix; maintainer-reviewed evolution from v1 to v2; final maintainer-
applied version; verified affected code in v6.1+ stable branches.

Record: Evidence against backporting: no explicit `Cc: stable`, no
`Fixes:` tag, no `Tested-by`, and exact patch may need small branch-
specific backport adjustments. These are not enough to outweigh the
verified bug severity and small fix.

Record: Unresolved questions: exact applicability to 5.15 was not
established; exact clean-apply status on each stable branch was not
tested; unprivileged triggerability beyond ordinary userspace `io_uring`
access was not independently proven.

Stable rules checklist:
1. Obviously correct and tested: mostly yes by inspection and maintainer
   review; no explicit `Tested-by` and no local runtime test.
2. Fixes a real bug: yes, fuzzed livelock/unkillable task.
3. Important issue: yes, high-severity CPU spin and unkillable wait.
4. Small and contained: yes, one file, 17 insertions and 5 deletions.
5. No new feature/API: yes, static helper only.
6. Can apply to stable: yes with likely minor backport adjustments for
   older branches.

Exception category: none; this is a direct bug fix, not a device ID,
quirk, DT, build, or documentation exception.

Decision: backport to affected stable trees, especially v6.1+ where the
buggy free-space calculation was verified. Avoid claiming 5.15 without a
separate targeted analysis/backport.

## Verification
- [Phase 1] Parsed commit object
  `f44d38a31f1802b7222adaea9ee69f9d280f698a` with `git show`; confirmed
  subject, tags, and 17/5 diffstat.
- [Phase 2] Inspected the candidate diff with `git show`; confirmed new
  `io_cqring_queued()` and replacements in `io_fill_nop_cqe()` and
  `io_cqe_cache_refill()`.
- [Phase 3] Ran `git blame` on affected lines; confirmed key code came
  from `faf88dde060f74`, `aa1df3a360a0c5`, and `e26dca67fde19`.
- [Phase 3] Ran `git describe --contains`; confirmed first containment
  around v6.0-rc1, v6.1-rc1, and v6.18-rc1 respectively.
- [Phase 4] Ran `b4 dig -c`, `b4 dig -a`, and `b4 dig -w`; confirmed
  v1/v2 patch history, recipients, and maintainer involvement.
- [Phase 4] Fetched v1/v2 mboxes with `b4 mbox`; confirmed Jens’ v1
  concern and v2 application with edits.
- [Phase 5] Used `rg` and file reads to trace `io_cqe_cache_refill()`
  through CQE posting and `io_uring_enter()`/`IORING_ENTER_GETEVENTS`.
- [Phase 6] Checked stable branches with `git blame`; verified affected
  logic in sampled v6.1, v6.6, v6.12, v6.18, and v6.19 branches.
- [Phase 7] Checked `MAINTAINERS`; verified Jens Axboe is listed
  maintainer for `IO_URING`.
- [Phase 8] Verified failure mode from commit message and mailing-list
  patch body; did not independently run the fuzzer or reproducer.
- UNVERIFIED: exact clean apply on every stable tree, exact
  applicability to 5.15, and unprivileged triggerability.

**YES**

 io_uring/io_uring.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a72efb3a62bac..431d157e81595 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -680,13 +680,27 @@ static struct io_overflow_cqe *io_alloc_ocqe(struct io_ring_ctx *ctx,
 	return ocqe;
 }
 
+/*
+ * Compute queued CQEs for free-space calculation, clamped to cq_entries.
+ */
+static unsigned int io_cqring_queued(struct io_ring_ctx *ctx)
+{
+	struct io_rings *rings = io_get_rings(ctx);
+	int diff;
+
+	diff = (int)(ctx->cached_cq_tail - READ_ONCE(rings->cq.head));
+	if (diff >= 0)
+		return min((unsigned int)diff, ctx->cq_entries);
+	return 0;
+}
+
 /*
  * Fill an empty dummy CQE, in case alignment is off for posting a 32b CQE
  * because the ring is a single 16b entry away from wrapping.
  */
 static bool io_fill_nop_cqe(struct io_ring_ctx *ctx, unsigned int off)
 {
-	if (__io_cqring_events(ctx) < ctx->cq_entries) {
+	if (io_cqring_queued(ctx) < ctx->cq_entries) {
 		struct io_uring_cqe *cqe = &ctx->rings->cqes[off];
 
 		cqe->user_data = 0;
@@ -707,7 +721,7 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32)
 {
 	struct io_rings *rings = ctx->rings;
 	unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
-	unsigned int free, queued, len;
+	unsigned int free, len;
 
 	/*
 	 * Posting into the CQ when there are pending overflowed CQEs may break
@@ -727,9 +741,7 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32)
 		off = 0;
 	}
 
-	/* userspace may cheat modifying the tail, be safe and do min */
-	queued = min(__io_cqring_events(ctx), ctx->cq_entries);
-	free = ctx->cq_entries - queued;
+	free = ctx->cq_entries - io_cqring_queued(ctx);
 	/* we need a contiguous range, limit based on the current array offset */
 	len = min(free, ctx->cq_entries - off);
 	if (len < (cqe32 + 1))
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH AUTOSEL 7.0] io_uring: hold uring_lock across io_kill_timeouts() in cancel path
       [not found] <20260520111944.3424570-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.18] io_uring: validate user-controlled cq.head in io_cqe_cache_refill() Sasha Levin
@ 2026-05-20 11:19 ` Sasha Levin
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.12] io_uring/fdinfo: translate SqThread PID through caller's pid_ns Sasha Levin
  5 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:19 UTC (permalink / raw)
  To: patches, stable; +Cc: Jens Axboe, Sasha Levin, io-uring, linux-kernel

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit a65855ec34aed84e1e5b4aea0323cc1745f83a5c ]

io_uring_try_cancel_requests() dropped ctx->uring_lock before calling
io_kill_timeouts(), which walks each timeout's link chain via
io_match_task() to test REQ_F_INFLIGHT. With chain mutation now
serialized by ctx->uring_lock, that walk needs the lock too.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Record 1.1: Subsystem `io_uring`; action verb `hold`; intent is to keep
`ctx->uring_lock` held while `io_kill_timeouts()` walks timeout link
chains in the cancel path.

Record 1.2: Tags present: `Signed-off-by: Jens Axboe <axboe@kernel.dk>`.
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Acked-by:`,
`Link:`, or `Cc: stable@vger.kernel.org` tags were present in the commit
message I verified from upstream commit
`a65855ec34aed84e1e5b4aea0323cc1745f83a5c`.

Record 1.3: The commit body describes a locking bug:
`io_uring_try_cancel_requests()` dropped `ctx->uring_lock` before
`io_kill_timeouts()`, but `io_kill_timeouts()` calls `io_match_task()`
and walks linked requests to inspect `REQ_F_INFLIGHT`. The root cause
stated by the author is that after linked-chain mutation is serialized
by `ctx->uring_lock`, this read-side traversal also needs that lock. No
crash log, reproducer, affected kernel version, or user report is
included.

Record 1.4: This is a hidden synchronization bug fix, despite the
subject not saying “fix”. It changes lock coverage around an existing
linked-list traversal and matches a race-condition pattern.

## Phase 2: Diff Analysis
Record 2.1: One file changed: `io_uring/cancel.c`, 1 insertion and 1
deletion. One function changed: `io_uring_try_cancel_requests()`. Scope
is a single-file, single-hunk surgical locking fix.

Record 2.2: Before: `ctx->uring_lock` was unlocked after canceling
deferred files, poll, waitid, futex, and uring_cmd requests, then
`io_kill_timeouts()` ran unlocked. After: `io_kill_timeouts()` runs
before unlocking `ctx->uring_lock`. The affected path is cancellation
during io_uring task/ring teardown, including exit/exec/SQPOLL/ring-exit
paths verified in callers.

Record 2.3: Bug category is synchronization/race condition. The specific
mechanism is an unlocked traversal of a linked request chain in
`io_kill_timeouts()`/`io_match_task()` while related chain mutation is
intended to be serialized by `ctx->uring_lock`.

Record 2.4: Fix quality is high if applied with its series dependency:
it is minimal, changes no data structures or APIs, and only extends an
already-held mutex over one additional cancel helper. Regression risk is
low but not zero because it extends lock scope over code that takes
`completion_lock` and `timeout_lock`; this risk is mitigated by patch
2/3 moving linked-timeout chain splicing out of hrtimer context.

## Phase 3: Git History Investigation
Record 3.1: `git blame` on current `io_uring/cancel.c` shows the old
unlock-before-`io_kill_timeouts()` code came from `ffce324364318`
(`io_uring/cancel: move cancelation code from io_uring.c to cancel.c`),
first contained in `v6.19`. The timeout chain walk in `io_match_task()`
was introduced by `59915143e89f`, first contained in `v6.0`.

Record 3.2: No `Fixes:` tag is present, so there was no Fixes target to
follow.

Record 3.3: Recent history shows this commit follows `49ae66eb8c273`
(`io_uring: defer linked-timeout chain splice out of hrtimer context`)
and is part of the same linked-request locking series. Recent current-
branch churn in these files is low: current `HEAD` after `v7.0` has only
`93a9caab11350` touching these files.

Record 3.4: Jens Axboe is listed in `MAINTAINERS` as the `IO_URING`
maintainer and has extensive recent io_uring commits in local history.
This is maintainer-authored.

Record 3.5: Dependency found: upstream parent `49ae66eb8c273` is patch
2/3, and `20c39819a276` is patch 1/3. The candidate’s rationale
explicitly depends on patch 2/3’s serialization change. I verified the
full 3-patch series applies cleanly to the current tree.

## Phase 4: Mailing List And External Research
Record 4.1: `b4 dig -c a65855ec34ae...` found the original patch at
`https://patch.msgid.link/20260511182217.226763-4-axboe@kernel.dk`. Lore
mirror confirms it was `[PATCH 3/3]` in `[PATCHSET 0/3] Linked request
fix`. `b4 dig -a` found only v1; no newer revision was found.

Record 4.2: `b4 dig -w` showed recipients were Jens Axboe and
`io-uring@vger.kernel.org`. No separate reviewer/acked/tested tags were
found.

Record 4.3: No `Reported-by` or bug-report `Link` tag exists. I found no
syzbot, bugzilla, or user report for this exact commit.

Record 4.4: Related patches are patch 1/3 (`20c39819a276`, hold
`uring_lock` in `io_wq_free_work()`) and patch 2/3 (`49ae66eb8c273`,
defer linked-timeout splice out of hrtimer context). The series cover
letter says it closes gaps where iterating a chain must hold either
`uring_lock` or `timeout_lock`, and modifying an existing chain must
hold both.

Record 4.5: Stable-list search was limited by lore.kernel.org bot
protection, and web search did not find stable-specific discussion for
this exact commit. No stable-specific objection was found.

## Phase 5: Code Semantic Analysis
Record 5.1: Modified function: `io_uring_try_cancel_requests()`.

Record 5.2: Callers verified: `io_ring_exit_work()` calls
`io_uring_try_cancel_requests(ctx, NULL, true, false)` during ring exit;
`io_uring_cancel_generic()` calls it during task cancellation;
`sqpoll.c` calls `io_uring_cancel_generic(true, sqd)` for SQPOLL
shutdown; `fs/exec.c` reaches this via `io_uring_task_cancel()`;
`kernel/exit.c` reaches it via `io_uring_files_cancel()`.

Record 5.3: Key callees around the fix: `io_cancel_defer_files()`,
`io_poll_remove_all()`, `io_waitid_remove_all()`,
`io_futex_remove_all()`, `io_uring_try_cancel_uring_cmd()`, then
`io_kill_timeouts()`. `io_kill_timeouts()` takes `completion_lock` and
`timeout_lock`, iterates `ctx->timeout_list`, calls `io_match_task()`,
and flushes killed timeouts.

Record 5.4: Reachability is verified from userspace lifecycle
operations: io_uring rings/requests can reach cancellation via process
exit, exec, SQPOLL thread shutdown, or ring teardown. Whether
unprivileged users can create io_uring instances on a given deployment
depends on config/sysctl and was not separately verified.

Record 5.5: Similar patterns found: nearby cancel walkers such as
`io_cancel_remove_all()` and `io_poll_remove_all()` assert or run under
`ctx->uring_lock`; `io_match_task_safe()` exists to protect linked-
timeout walks, and patch 1/3 fixes another unlocked link-chain walk in
`io_wq_free_work()`.

## Phase 6: Stable Tree Analysis
Record 6.1: The exact pre-fix `io_uring/cancel.c` pattern exists in
local `v6.19`, `v7.0`, and current `HEAD`. `v6.18` does not have this
exact `io_kill_timeouts()` call in `io_uring/cancel.c`. The refactor
commit `ffce324364318` is an ancestor of `v6.19`, `v7.0`, and `HEAD`,
but not `v6.18`.

Record 6.2: Backport difficulty is low for `v6.19+` style trees: `git
apply --check` succeeded for the candidate alone and for the full
3-patch series on the current tree.

Record 6.3: No alternate stable fix for this exact locking gap was found
in local history or web search.

## Phase 7: Subsystem Context
Record 7.1: Subsystem is `io_uring`, a core async I/O subsystem
reachable through userspace syscalls when enabled. Criticality is
IMPORTANT to CORE depending on deployment, because it affects process
exit/exec and ring teardown correctness for io_uring users.

Record 7.2: Subsystem activity is high; recent local history shows many
io_uring changes by Jens Axboe and others. This patch was pulled into
Linus’ tree for `v7.1-rc4` as part of io_uring fixes.

## Phase 8: Impact And Risk
Record 8.1: Affected users are systems with `CONFIG_IO_URING` and
workloads using linked io_uring requests/timeouts, especially during
cancellation/teardown paths.

Record 8.2: Trigger conditions are linked request/timeouts plus
cancellation paths such as exit, exec, SQPOLL shutdown, or ring exit.
The exact race timing was not reproduced here.

Record 8.3: Failure mode is a locking/data-race hazard on linked
request-chain traversal. No crash report is verified, so I rate severity
as MEDIUM-HIGH rather than proven CRITICAL: cancellation races in
io_uring can lead to missed cancellation or unsafe traversal, but this
specific commit message does not document an observed oops/UAF.

Record 8.4: Benefit is high when backporting the linked-request locking
series, because it completes the lock invariant introduced by patch 2/3.
Risk is low: 1-line lock-scope adjustment, no new API, no feature, no
data structure change. Risk rises if cherry-picked without understanding
the series, so it should be queued with `20c39819a276` and
`49ae66eb8c273`.

## Phase 9: Final Synthesis
Record 9.1: Evidence for backporting: real synchronization bug;
maintainer-authored; included in an upstream fixes pull; tiny and
contained; applies cleanly; affects userspace-reachable cancellation
paths; needed to complete a 3-patch linked-chain locking invariant.
Evidence against: no reported crash/reproducer; patch is part 3/3 and
should not be treated as an isolated standalone semantic fix; older
stable trees before the `cancel.c` refactor need separate backport
analysis.

Record 9.2: Stable rules checklist: obviously correct and tested by
upstream integration: yes, with dependency caveat. Fixes a real bug:
yes, a verified locking race/gap. Important issue: yes enough for
stable, because it is a race in io_uring linked request cancellation,
though no crash is documented. Small and contained: yes, 1 insertion/1
deletion in one function. No new features/APIs: yes. Can apply to
stable: yes for current `v6.19+` style trees; full series apply-check
passed on this tree.

Record 9.3: No automatic exception category applies; this is not a
device ID, quirk, DT, build, or documentation fix.

Record 9.4: Decision: backport, but queue it with the preceding linked-
request locking patches, especially `49ae66eb8c273`, because this
commit’s locking rationale depends on that series invariant.

## Verification
- [Phase 1] Verified upstream commit
  `a65855ec34aed84e1e5b4aea0323cc1745f83a5c` message and tags via GitHub
  API and Gitiles.
- [Phase 2] Verified diff is 1 insertion/1 deletion in
  `io_uring/cancel.c`, moving `mutex_unlock(&ctx->uring_lock)` after
  `io_kill_timeouts()`.
- [Phase 3] Ran `git blame` on `io_uring/cancel.c` and
  `io_uring/timeout.c`; identified `ffce324364318`, `59915143e89f`,
  `6971253f0787`, and `a9c83a0ab66a` as relevant historical commits.
- [Phase 3] Verified containing tags: `ffce324364318` present from
  `v6.19`; candidate and series commits first contained in `v7.1-rc4`.
- [Phase 4] Ran `b4 dig -c`, `-a`, and `-w`; found the lore message ID,
  v1-only series, and original recipients.
- [Phase 4] Fetched lore.gnuweeb mirror for patch 0/3, 1/3, 2/3, and
  3/3; confirmed series context and dependency.
- [Phase 5] Used code search and file reads to trace callers from
  `kernel/exit.c`, `fs/exec.c`, `sqpoll.c`, `io_ring_exit_work()`, and
  `io_uring_cancel_generic()`.
- [Phase 6] Checked `v6.18`, `v6.19`, `v7.0`, and `HEAD` for the exact
  code pattern; verified current tree and `v6.19+` have the old unlock-
  before-timeout call.
- [Phase 6] Ran `git apply --check` for the candidate and the full
  3-patch series; both apply cleanly to the current tree.
- [Phase 7] Verified `MAINTAINERS` lists Jens Axboe as `IO_URING`
  maintainer.
- [Phase 8] Verified the failure class from actual code paths and series
  discussion; no runtime reproducer or observed crash was found.
- UNVERIFIED: Whether older pre-`v6.19` stable trees have an equivalent
  bug in the pre-refactor `io_uring.c` layout.
- UNVERIFIED: Any stable-list discussion, because lore.kernel.org/stable
  fetch was blocked and web search found no exact stable discussion.

**YES**

 io_uring/cancel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 65e04063e343b..1d8928c829b61 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -554,8 +554,8 @@ __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 	ret |= io_waitid_remove_all(ctx, tctx, cancel_all);
 	ret |= io_futex_remove_all(ctx, tctx, cancel_all);
 	ret |= io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all);
-	mutex_unlock(&ctx->uring_lock);
 	ret |= io_kill_timeouts(ctx, tctx, cancel_all);
+	mutex_unlock(&ctx->uring_lock);
 	if (tctx)
 		ret |= io_run_task_work() > 0;
 	else
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH AUTOSEL 7.0-6.12] io_uring/fdinfo: translate SqThread PID through caller's pid_ns
       [not found] <20260520111944.3424570-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: hold uring_lock across io_kill_timeouts() in cancel path Sasha Levin
@ 2026-05-20 11:19 ` Sasha Levin
  5 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-20 11:19 UTC (permalink / raw)
  To: patches, stable
  Cc: Maoyi Xie, Maoyi Xie, Jens Axboe, Sasha Levin, io-uring,
	linux-kernel

From: Maoyi Xie <maoyixie.tju@gmail.com>

[ Upstream commit 3799c2570982577551023ae035f5a786cf39a76e ]

SQPOLL stores current->pid (init_pid_ns view) in sqd->task_pid
at thread creation. fdinfo prints it raw via
seq_printf("SqThread:\t%d\n", sq_pid). A reader inside a
non-initial pid_ns sees the host PID, not the kthread's PID in
the reader's own pid_ns.

The SQPOLL kthread is created with CLONE_THREAD and no
CLONE_NEW*, so it lives in the submitter's pid_ns. An
unprivileged user_ns + pid_ns submitter can read fdinfo and
learn the host PID of a kthread whose in-namespace PID is
different.

Reproducer (mainline 7.0, KASAN): unshare CLONE_NEWUSER |
CLONE_NEWPID | CLONE_NEWNS, mount a private /proc, then have a
grandchild that is pid 1 in the new pid_ns open an io_uring
ring with IORING_SETUP_SQPOLL. /proc/self/task lists {1, 2};
the SQPOLL kthread is pid 2. Before: fdinfo prints
SqThread = <host pid>. After: SqThread = 2.

Use task_pid_nr_ns() against the proc inode's pid_ns to compute
sq_pid, instead of reading the stored sq->task_pid (which holds
the init_pid_ns view). pidfd_show_fdinfo() in kernel/pid.c
follows the same pattern.

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260510084119.457578-1-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Phase 1, commit message forensics:
- Record 1.1: subsystem `io_uring/fdinfo`; action verb `translate`;
  intent is to report `SqThread` in the proc fdinfo reader’s PID
  namespace.
- Record 1.2: tags are `Signed-off-by: Maoyi Xie
  <maoyi.xie@ntu.edu.sg>`,
  `Link: https://patch.msgid.link/20260510084119.457578-1-maoyi.xie@ntu.edu.sg`,
  and `Signed-off-by: Jens Axboe <axboe@kernel.dk>`. No `Fixes:`,
  `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Acked-by:`, or
  `Cc: stable`.
- Record 1.3: the bug is a namespace information leak: `SqThread`
  reports the init-namespace/host PID to a reader inside a non-initial
  PID namespace. The message includes a concrete reproducer using
  unprivileged user/pid/mount namespaces and an SQPOLL ring.
- Record 1.4: this is not hidden cleanup; it is an explicit namespace
  correctness and information disclosure fix.

Phase 2, diff analysis:
- Record 2.1: one file, `io_uring/fdinfo.c`, with 2 insertions and 1
  deletion in `__io_uring_show_fdinfo()`. Scope is a single-function
  surgical fix.
- Record 2.2: before, fdinfo used stored `sq->task_pid`; after, it
  computes `sq_pid = task_pid_nr_ns(tsk,
  proc_pid_ns(file_inode(m->file)->i_sb))`.
- Record 2.3: bug category is logic/security namespace translation. The
  broken value was a raw task PID; the fix translates the live SQPOLL
  task into the proc fdinfo file’s PID namespace.
- Record 2.4: fix quality is high: minimal, uses existing helpers, keeps
  the existing task lifetime protection, and follows the verified
  `pidfd_show_fdinfo()` pattern. Regression risk is very low; host/init
  namespace output remains equivalent.

Phase 3, git history:
- Record 3.1: blame shows the current `sq_pid = sq->task_pid` line last
  touched by `606559dc4fa36a`, while the semantic change to store/print
  `sq->task_pid` came from `a0d45c3f596be`, first contained around
  `v6.7-rc2`.
- Record 3.2: no `Fixes:` tag is present, so there was no tagged
  introducing commit to follow.
- Record 3.3: recent `io_uring/fdinfo.c` history includes multiple
  fdinfo correctness fixes, including SQPOLL lifetime/UAF fixes and SQE
  display fixes. No prerequisite series was found for this patch.
- Record 3.4: local history shows no other `Maoyi Xie` commits under
  `io_uring`; `Jens Axboe` is the listed `IO_URING` maintainer and
  committed/applied the patch.
- Record 3.5: dependencies `task_pid_nr_ns()` and `proc_pid_ns()` exist
  in relevant stable branches checked. The patch applies cleanly to
  `p-6.12`, `p-6.18`, `p-6.19`, and `p-7.0`.

Phase 4, mailing list research:
- Record 4.1: `b4 dig -c 3799c2570982577551023ae035f5a786cf39a76e` found
  the lore thread at the supplied patch.msgid link. `b4 dig -a` found
  only v1.
- Record 4.2: original recipients included Jens Axboe, Pavel Begunkov,
  `io-uring@vger.kernel.org`, and `linux-kernel@vger.kernel.org`.
- Record 4.3: no separate bug-report link or reporter tag was present;
  the bug evidence is the commit’s reproducer.
- Record 4.4: no multi-patch series or related required patches were
  found by b4.
- Record 4.5: no stable-specific discussion was verified. WebFetch hit
  Anubis protection; web search did not produce usable stable discussion
  for this exact patch.

Phase 5, semantic analysis:
- Record 5.1: modified function is `__io_uring_show_fdinfo()`.
- Record 5.2: caller chain is `/proc/*/fdinfo` read in `fs/proc/fd.c` ->
  `file->f_op->show_fdinfo()` -> `io_uring_show_fdinfo()` ->
  `__io_uring_show_fdinfo()`.
- Record 5.3: relevant callees are `rcu_dereference()`,
  `get_task_struct()`, `io_sq_cpu_usec()`, `task_pid_nr_ns()`,
  `proc_pid_ns()`, and `seq_printf()`.
- Record 5.4: reachable from userspace by creating an
  `IORING_SETUP_SQPOLL` ring and reading `/proc/self/fdinfo/<fd>`.
  Current code has global `io_uring_allowed()` gating, but no SQPOLL-
  specific capability check was found in the flag validation path.
- Record 5.5: similar verified pattern exists in `pidfd_show_fdinfo()`,
  which derives the namespace from `file_inode(m->file)->i_sb`.

Phase 6, stable tree analysis:
- Record 6.1: `p-6.12`, `p-6.18`, `p-6.19`, and `p-7.0` contain the
  exact `sq->task_pid` fdinfo pattern. `p-6.6` also contains
  `sq->task_pid`; `p-6.1` uses `task_pid_nr(sq->thread)`, which also
  returns `tsk->pid` in the checked header. `p-5.10` and `p-5.15` did
  not show `SqThread` fdinfo matches in checked paths.
- Record 6.2: exact patch applies cleanly to `p-6.12+` branches checked.
  Older `p-6.1`/`p-6.6` need backport adjustment due to code-shape
  differences.
- Record 6.3: no existing stable fix for this specific namespace
  translation was found in checked stable branch code.

Phase 7, subsystem context:
- Record 7.1: subsystem is `io_uring`, an important syscall-facing
  kernel I/O subsystem; affected surface is SQPOLL fdinfo, not the core
  I/O data path.
- Record 7.2: file history shows active fdinfo development and recent
  fixes. Maintainer metadata verifies Jens Axboe as `IO_URING`
  maintainer.

Phase 8, impact and risk:
- Record 8.1: affected users are container/pid-namespace users who
  create SQPOLL io_uring rings and read proc fdinfo.
- Record 8.2: trigger is straightforward where io_uring is allowed:
  create SQPOLL in non-init pid namespace and read fdinfo. It is not
  timing-dependent.
- Record 8.3: failure mode is host PID information disclosure and pid
  namespace isolation violation. Severity is medium
  security/correctness, not crash/corruption.
- Record 8.4: benefit is meaningful for namespace isolation; risk is
  very low given a 3-line, localized proc-output fix.

Phase 9, synthesis:
- Record 9.1: evidence for backporting: real reproducer, unprivileged
  namespace information leak, tiny fix, maintainer-applied, uses
  established proc namespace helper pattern, clean applies to multiple
  stable branches. Evidence against: no crash/data corruption; older
  stable branches need adapted backports.
- Record 9.2: stable rules: obviously correct yes; fixes real user-
  visible bug yes; important enough as security/isolation info leak yes;
  small and contained yes; no new feature/API yes; applies cleanly to
  checked `p-6.12+`, with older-tree rework as noted.
- Record 9.3: no automatic exception category applies.
- Record 9.4: decision is to backport.

## Verification
- Phase 1: `git show --format=fuller --patch
  3799c2570982577551023ae035f5a786cf39a76e` verified the commit message,
  trailers, and 3-line diff.
- Phase 2: local `io_uring/fdinfo.c` read verified the pre-patch
  `sq->task_pid` fdinfo output and task reference context.
- Phase 3: `git blame`, `git show a0d45c3f596be`, `git describe
  --contains`, and file logs verified history and first-release context.
- Phase 4: `b4 dig -c`, `b4 dig -a`, `b4 dig -w`, and `b4 mbox` verified
  the lore thread, v1-only submission, recipients, and Jens “Applied,
  thanks” reply with commit `3799c257...`.
- Phase 5: reads of `fs/proc/fd.c`, `io_uring/io_uring.c`,
  `io_uring/sqpoll.c`, `kernel/fork.c`, `include/linux/pid.h`,
  `kernel/pid.c`, and `fs/pidfs.c` verified reachability, helper
  semantics, SQPOLL creation flags, and the pidfd fdinfo pattern.
- Phase 6: checked `p-6.1`, `p-6.6`, `p-6.12`, `p-6.18`, `p-6.19`,
  `p-7.0`, and `stable/linux-7.0.y` code; worktree `git apply --check`
  verified clean application to `p-6.12`, `p-6.18`, `p-6.19`, and
  `p-7.0`.
- Unverified: no kernel build or runtime reproducer was run; stable-list
  discussion could not be verified because direct lore WebFetch was
  blocked and search found no usable exact stable thread.

This is stable material: it fixes a concrete namespace information leak
with a tiny, conventional, low-risk change.

**YES**

 io_uring/fdinfo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index c2d3e45544bb4..001fb542dc11a 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -190,8 +190,9 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 			get_task_struct(tsk);
 			rcu_read_unlock();
 			usec = io_sq_cpu_usec(tsk);
+			sq_pid = task_pid_nr_ns(tsk,
+						proc_pid_ns(file_inode(m->file)->i_sb));
 			put_task_struct(tsk);
-			sq_pid = sq->task_pid;
 			sq_cpu = sq->sq_cpu;
 			sq_total_time = usec;
 			sq_work_time = sq->work_time;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-20 11:18 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
@ 2026-05-20 11:40   ` Jens Axboe
  2026-05-23 14:23     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2026-05-20 11:40 UTC (permalink / raw)
  To: Sasha Levin, patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring, linux-kernel

On 5/20/26 5:18 AM, Sasha Levin wrote:
> From: Maoyi Xie <maoyixie.tju@gmail.com>
> 
> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
> 
> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
> timespec from the caller via ext_arg->ts. It arms an ABS mode
> hrtimer in __io_cqring_wait_schedule(). The conversion path in
> io_uring/wait.c parses ext_arg->ts inline rather than going
> through io_parse_user_time(). It therefore does not pick up the
> time namespace conversion added by the previous patch.

Once again - If you auto-pick this one, please also do the other one in
the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
do just one of them.

-- 
Jens Axboe


* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-20 11:40   ` Jens Axboe
@ 2026-05-23 14:23     ` Jens Axboe
  2026-05-23 14:45       ` Sasha Levin
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2026-05-23 14:23 UTC
  To: Sasha Levin, patches, stable
  Cc: Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring, linux-kernel

On 5/20/26 5:40 AM, Jens Axboe wrote:
> On 5/20/26 5:18 AM, Sasha Levin wrote:
>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>
>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>
>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>> io_uring/wait.c parses ext_arg->ts inline rather than going
>> through io_parse_user_time(). It therefore does not pick up the
>> time namespace conversion added by the previous patch.
> 
> Once again - If you auto-pick this one, please also do the other one in
> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
> do just one of them.

And once again, no reply. What is going on with stable these days?

-- 
Jens Axboe



* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-23 14:23     ` Jens Axboe
@ 2026-05-23 14:45       ` Sasha Levin
  2026-05-23 14:55         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Sasha Levin @ 2026-05-23 14:45 UTC
  To: Jens Axboe
  Cc: patches, stable, Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring,
	linux-kernel

On Sat, May 23, 2026 at 08:23:13AM -0600, Jens Axboe wrote:
>On 5/20/26 5:40 AM, Jens Axboe wrote:
>> On 5/20/26 5:18 AM, Sasha Levin wrote:
>>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>>
>>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>>
>>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>>> io_uring/wait.c parses ext_arg->ts inline rather than going
>>> through io_parse_user_time(). It therefore does not pick up the
>>> time namespace conversion added by the previous patch.
>>
>> Once again - If you auto-pick this one, please also do the other one in
>> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
>> do just one of them.
>
>And once again, no reply. What is going on with stable these days?

Jens, as I've mentioned in the previous mail, I handle the AUTOSEL mails weeks
after I originally sent them out for reviews.

The volume of mails and patches makes it really difficult to give prompt
answers here. I have no idea if 9cc6bac1bebf8310d2950d1411a91479e86d69a1
applies cleanly, whether I need to ask for a backport, or whether I should just
drop 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
commits.

If this process doesn't work well for you, I'm happy to skip all
non-stable-tagged commits for io_uring. This is supposed to be only a best
effort attempt to catch commits that slipped through the cracks.

-- 
Thanks,
Sasha


* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-23 14:45       ` Sasha Levin
@ 2026-05-23 14:55         ` Jens Axboe
  2026-05-23 15:06           ` Sasha Levin
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2026-05-23 14:55 UTC
  To: Sasha Levin
  Cc: patches, stable, Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring,
	linux-kernel

On 5/23/26 8:45 AM, Sasha Levin wrote:
> On Sat, May 23, 2026 at 08:23:13AM -0600, Jens Axboe wrote:
>> On 5/20/26 5:40 AM, Jens Axboe wrote:
>>> On 5/20/26 5:18 AM, Sasha Levin wrote:
>>>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>>>
>>>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>>>
>>>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>>>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>>>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>>>> io_uring/wait.c parses ext_arg->ts inline rather than going
>>>> through io_parse_user_time(). It therefore does not pick up the
>>>> time namespace conversion added by the previous patch.
>>>
>>> Once again - If you auto-pick this one, please also do the other one in
>>> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
>>> do just one of them.
>>
>> And once again, no reply. What is going on with stable these days?
> 
> Jens, as I've mentioned in the previous mail, I handle the AUTOSEL
> mails weeks after I originally sent them out for reviews.

And you think that's working fine? I would suggest that's a terrible
process. How are maintainers supposed to deal with that? Patches x and y
are autoselected and an email is sent out. Maintainers react to that,
either saying "no don't pick X" or "if you pick Y, please also do Z".
The expectation would then be a reply that says "ok, doing that" or
whatever might be appropriate there. Instead, it's just silence. And now
I have to follow-up MULTIPLE times to ensure the right thing is being
done. We're about 2 weeks into this particular incident, and
hilariously, I still have no idea what the state is on your end. Did it
get dropped? Did the other one I asked for get picked up? Nobody knows!

At least Greg actually promptly replies for the non-autosel stuff he
does. Which is the ONLY thing that makes Fixes tags and CC stable
actually work. The AUTOSEL stuff, it does not. When it happens to pick
the right patches, yeah all is good. But when there's a problem, the
process is terrible, as evidenced by this particular patch.

> The volume of mails and patches makes it really difficult to give
> prompt answers here. I have no idea if
> 9cc6bac1bebf8310d2950d1411a91479e86d69a1 applies cleanly, whether I
> need to ask for a backport, or whether I should just drop
> 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
> commits.

If you can't handle basic replies when running AUTOSEL, then I don't
think you should have that process in the first place.

> If this process doesn't work well for you, I'm happy to skip all
> non-stable-tagged commits for io_uring. This is supposed to be only a
> best effort attempt to catch commits that slipped through the cracks.

Please don't do AUTOSEL for any patches for any subsystem that I am a
maintainer or co-maintainer of. Until this part of the stable tree
process can be improved, it's a net negative.

-- 
Jens Axboe


* Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  2026-05-23 14:55         ` Jens Axboe
@ 2026-05-23 15:06           ` Sasha Levin
  0 siblings, 0 replies; 15+ messages in thread
From: Sasha Levin @ 2026-05-23 15:06 UTC
  To: Jens Axboe
  Cc: patches, stable, Maoyi Xie, Pavel Begunkov, Maoyi Xie, io-uring,
	linux-kernel

On Sat, May 23, 2026 at 08:55:43AM -0600, Jens Axboe wrote:
>On 5/23/26 8:45 AM, Sasha Levin wrote:
>> The volume of mails and patches makes it really difficult to give
>> prompt answers here. I have no idea if
>> 9cc6bac1bebf8310d2950d1411a91479e86d69a1 applies cleanly, whether I
>> need to ask for a backport, or whether I should just drop
>> 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
>> commits.
>
>If you can't handle basic replies when running AUTOSEL, then I don't
>think you should have that process in the first place.

You know, you're probably right. I'll just take a break from AUTOSEL for now.

-- 
Thanks,
Sasha



Thread overview: 15+ messages
     [not found] <20260520111944.3424570-1-sashal@kernel.org>
2026-05-20 11:18 ` [PATCH AUTOSEL 7.0-6.6] io_uring: hold uring_lock when walking link chain in io_wq_free_work() Sasha Levin
2026-05-20 11:18 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
2026-05-20 11:40   ` Jens Axboe
2026-05-23 14:23     ` Jens Axboe
2026-05-23 14:45       ` Sasha Levin
2026-05-23 14:55         ` Jens Axboe
2026-05-23 15:06           ` Sasha Levin
2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: defer linked-timeout chain splice out of hrtimer context Sasha Levin
2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.18] io_uring: validate user-controlled cq.head in io_cqe_cache_refill() Sasha Levin
2026-05-20 11:19 ` [PATCH AUTOSEL 7.0] io_uring: hold uring_lock across io_kill_timeouts() in cancel path Sasha Levin
2026-05-20 11:19 ` [PATCH AUTOSEL 7.0-6.12] io_uring/fdinfo: translate SqThread PID through caller's pid_ns Sasha Levin
     [not found] <20260511221931.2370053-1-sashal@kernel.org>
2026-05-11 22:19 ` [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Sasha Levin
2026-05-12 15:47   ` Jens Axboe
2026-05-15 14:04     ` Jens Axboe
2026-05-15 14:11       ` Sasha Levin
