From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Maoyi Xie <maoyixie.tju@gmail.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Jens Axboe <axboe@kernel.dk>, Maoyi Xie <maoyi.xie@ntu.edu.sg>,
	Sasha Levin <sashal@kernel.org>,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
Date: Mon, 11 May 2026 18:19:12 -0400
Message-ID: <20260511221931.2370053-13-sashal@kernel.org>
In-Reply-To: <20260511221931.2370053-1-sashal@kernel.org>

From: Maoyi Xie <maoyixie.tju@gmail.com>

[ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Record: Subsystem `io_uring/wait`; action verb `honour`; intent is to
make `IORING_ENTER_ABS_TIMER` interpret caller absolute times in the
caller’s time namespace.

Record: Tags present:
`Suggested-by: Pavel Begunkov`, `Suggested-by: Jens Axboe`, author
`Signed-off-by: Maoyi Xie`,
`Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`,
maintainer `Signed-off-by: Jens Axboe`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Reviewed-by:`, `Acked-by:`, or `Cc: stable`.

Record: The commit describes a real userspace-visible bug:
`io_uring_enter()` with `IORING_ENTER_ABS_TIMER` parses `ext_arg->ts`
directly, then arms an absolute hrtimer without converting from the
caller’s time namespace to host time. The supplied reproducer in
`unshare --user --time` with a `-10s` monotonic offset returns `-ETIME`
in under 1 ms instead of about 1 second.

Record: This is not hidden cleanup. It is a direct correctness fix for
absolute timeout interpretation in time namespaces.

### Phase 2: Diff Analysis
Record: One file changed, `io_uring/wait.c`, 5 insertions and 1
deletion. Function modified: `io_cqring_wait()`. Scope: single-file
surgical fix.

Record: Before, `ext_arg->ts` was converted with
`timespec64_to_ktime()`. If `IORING_ENTER_ABS_TIMER` was unset, the code
added `start_time`; if set, it used the raw caller value as a host
absolute deadline. After, the absolute branch calls
`timens_ktime_to_host(ctx->clockid, iowq.timeout)`, while the relative
branch remains unchanged.

Record: Bug category is logic/correctness in time namespace handling.
The broken mechanism is that a namespaced absolute
`CLOCK_MONOTONIC`/`CLOCK_BOOTTIME` timestamp was fed to a host hrtimer
as if it were already in host time.
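
The mechanism can be checked with plain arithmetic. The sketch below is
a userspace model, not kernel code: times are signed nanosecond counts,
and the helper names (`caller_deadline`, `to_host`, `ns_until_expiry`)
are hypothetical. It assumes that namespace time equals host time plus
the namespace offset.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Absolute deadline as the caller computes it: its own (namespace)
 * "now" plus a relative delay. Model assumption: namespace time is
 * host time plus the namespace offset. */
static int64_t caller_deadline(int64_t host_now, int64_t ns_offset,
			       int64_t delay)
{
	return host_now + ns_offset + delay;
}

/* Translate a namespace absolute time back to host time. */
static int64_t to_host(int64_t ns_time, int64_t ns_offset)
{
	return ns_time - ns_offset;
}

/* Time left before a host timer armed at `deadline` fires; a value
 * <= 0 means the timer has already expired. */
static int64_t ns_until_expiry(int64_t deadline, int64_t host_now)
{
	return deadline - host_now;
}
```

With a host clock at 100 s, a -10 s offset, and a 1 s delay, the caller
passes 91 s. Armed unconverted, that deadline is 9 s in the past, which
matches the immediate `-ETIME` in the reproducer. After `to_host()` the
deadline is 101 s, leaving the expected 1 s.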

Record: Fix quality is strong: minimal, local, uses existing kernel
helper, and no new API. Regression risk is very low because
`timens_ktime_to_host()` is verified as a no-op for the initial time
namespace, for unsupported clocks, and when `CONFIG_TIME_NS` is
disabled.
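
The no-op cases can be illustrated with a small userspace model. This is
a simplified sketch, not the kernel helper: `model_ktime_to_host()`,
`enum model_clock`, and `struct model_offsets` are hypothetical names,
the initial-namespace check is reduced to a boolean, and saturation and
overflow handling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel clock ids and per-namespace offsets. */
enum model_clock { MODEL_REALTIME, MODEL_MONOTONIC, MODEL_BOOTTIME };

struct model_offsets {
	int64_t monotonic_ns;
	int64_t boottime_ns;
};

/* Model of the conversion: returns `tim` unchanged for the initial
 * namespace and for clocks without a namespace offset; otherwise
 * subtracts the offset of the selected clock. */
static int64_t model_ktime_to_host(enum model_clock clock, int64_t tim,
				   bool initial_ns,
				   const struct model_offsets *off)
{
	if (initial_ns)
		return tim;

	switch (clock) {
	case MODEL_MONOTONIC:
		return tim - off->monotonic_ns;
	case MODEL_BOOTTIME:
		return tim - off->boottime_ns;
	default:
		return tim;
	}
}
```

In this model only a non-initial namespace combined with a monotonic or
boottime clock changes the value, which is why callers outside time
namespaces see no behavioural difference.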

### Phase 3: Git History Investigation
Record: `git blame` on the changed wait lines points to `0105b0562a5e`
(`io_uring: split out CQ waiting code into wait.c`) for the current file
location. The same logic predates the split;
`2b8e976b9842` (`io_uring: user registered clockid for wait timeouts`)
shows this absolute-wait path using `ctx->clockid` and is contained in
`v6.12-rc1`.

Record: No `Fixes:` tag is present, so there was no tagged introducing
commit to follow. I inspected the companion parent commit instead:
`9cc6bac1bebf` fixes the same time-namespace issue for
`IORING_TIMEOUT_ABS`.

Record: Recent related history shows this is patch 2/2 after
`9cc6bac1bebf`. The candidate’s parent is exactly `9cc6bac1bebf`, but
this wait fix compiles independently as long as `timens_ktime_to_host()`
and `ctx->clockid` exist.

Record: Author history in `io_uring` before this commit only showed the
companion timeout fix. Jens Axboe applied the patch; Pavel Begunkov and
Jens Axboe are credited with `Suggested-by` and took part in review.

Record: Dependencies: affected stable trees need `ctx->clockid` and
`timens_ktime_to_host()`. I verified both exist in local
`for-greg/6.12-100`; the same `IORING_ENTER_ABS_TIMER` buggy line exists
in `6.12`, `6.18`, `6.19`, and `7.0` local stable branches, but not in
`5.10`, `5.15`, `6.1`, or `6.6`.

### Phase 4: Mailing List And External Research
Record: `b4 dig -c 45d2b37a37ab...` found the original submission at
`https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`.

Record: `b4 dig -a` found only v1 of the series. The thread shows Jens
applied both patches with commit IDs `9cc6bac1bebf` and `45d2b37a37ab`.

Record: `b4 dig -w` shows the right people/lists were included: Maoyi
Xie, Jens Axboe, Pavel Begunkov, `io-uring@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`.

Record: Reviewer feedback was positive: Pavel wrote “both look good” and
requested a liburing test; Jens replied “+1” for the test and later
applied the series. No NAKs or objections found.

Record: No separate bug-report link exists beyond the patch
thread/reproducer. Stable-specific WebFetch was blocked by Anubis, and
local thread search found no stable nomination.

### Phase 5: Code Semantic Analysis
Record: Modified function: `io_cqring_wait()`.

Record: Callers: `io_uring_enter(2)` reaches `io_cqring_wait()` when
`IORING_ENTER_GETEVENTS` is set, after `io_get_ext_arg()` copies/parses
the userspace getevents argument. This is directly syscall-reachable.

Record: Key callees: `timespec64_to_ktime()`, `timens_ktime_to_host()`,
`ktime_add()`, `io_get_time()`, `io_cqring_schedule_timeout()`, and
hrtimer setup/start helpers.

Record: Call chain: userspace `io_uring_enter()` -> `io_get_ext_arg()`
-> `io_cqring_wait()` -> `io_cqring_wait_schedule()` ->
`__io_cqring_wait_schedule()` -> `io_cqring_schedule_timeout()` ->
absolute hrtimer. The buggy path is reachable from userspace with
`IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG |
IORING_ENTER_ABS_TIMER`.
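
From userspace, the entry point of that chain could be driven roughly as
follows. This is a hedged sketch, assuming UAPI headers recent enough to
provide `IORING_ENTER_EXT_ARG` and `struct io_uring_getevents_arg`. The
helper names are hypothetical, `ring_fd` stands for a ring already
created with `io_uring_setup()`, and the deadline is an absolute time on
the ring's clock.

```c
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>
#include <linux/time_types.h>

/* Fallback for older UAPI headers; value assumed to match current
 * <linux/io_uring.h>. */
#ifndef IORING_ENTER_ABS_TIMER
#define IORING_ENTER_ABS_TIMER (1U << 5)
#endif

/* Flag combination that reaches the absolute-deadline wait path. */
static unsigned int abs_wait_flags(void)
{
	return IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG |
	       IORING_ENTER_ABS_TIMER;
}

/* Point the extended argument at an absolute deadline; no sigmask. */
static void fill_abs_wait_arg(struct io_uring_getevents_arg *arg,
			      const struct __kernel_timespec *deadline)
{
	memset(arg, 0, sizeof(*arg));
	arg->ts = (uint64_t)(uintptr_t)deadline;
}

/* Wait for one completion on `ring_fd` until the absolute `deadline`.
 * Returns the raw syscall result: -1 with errno set on failure,
 * including ETIME when the deadline passes first. */
static long wait_one_cqe_until(int ring_fd,
			       const struct __kernel_timespec *deadline)
{
	struct io_uring_getevents_arg arg;

	fill_abs_wait_arg(&arg, deadline);
	return syscall(__NR_io_uring_enter, ring_fd, 0, 1,
		       abs_wait_flags(), &arg, sizeof(arg));
}
```

A reproducer along the lines of the commit message would read the ring's
clock with `clock_gettime()`, add one second, and call
`wait_one_cqe_until()` from inside a time namespace with a shifted
monotonic offset.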

Record: Similar patterns: the companion commit fixes
`io_parse_user_time()` for `IORING_TIMEOUT_ABS`; POSIX timers,
`clock_nanosleep`, alarm timers, and `timerfd` already use
`timens_ktime_to_host()` for absolute timers.

### Phase 6: Stable Tree Analysis
Record: Local stable-branch grep found the buggy
`IORING_ENTER_ABS_TIMER` code in `for-greg/6.12-100`,
`for-greg/6.18-100`, `for-greg/6.19-200`, and `for-greg/7.0-100`. It was
absent from `5.10`, `5.15`, `6.1`, and `6.6`.

Record: Backport difficulty: current `7.0.y` apply check succeeds
cleanly. `6.12`/`7.0` have `io_uring/wait.c`; `6.18`/`6.19` local
branches have the same logic in `io_uring/io_uring.c`, so those need a
path/context backport but not semantic rework.

Record: No related fix with this subject was found in the checked stable
candidate branches.

### Phase 7: Subsystem Context
Record: Subsystem is `io_uring`, a core async I/O syscall subsystem.
Criticality: IMPORTANT, not universal core MM/VFS, but directly
userspace-facing and widely used.

Record: Subsystem activity is high; recent `io_uring` history has many
fixes and feature changes. This specific change is small despite the
active subsystem.

### Phase 8: Impact And Risk
Record: Affected population: users of `io_uring_enter()` absolute CQ
wait timeouts inside non-initial time namespaces, especially container-
like environments. Branch-limited to stable trees that contain
`IORING_ENTER_ABS_TIMER`.

Record: Trigger: userspace can trigger via `io_uring_enter()` with
`IORING_ENTER_ABS_TIMER` and a timespec from a shifted time namespace.
The provided reproducer uses `unshare --user --time`; whether fully
unprivileged depends on system user-namespace policy.

Record: Failure mode: incorrect timeout behavior. With the reproduced
negative offset, the wait returns `-ETIME` immediately; with other
offsets, absolute waits can be delayed incorrectly. Severity: MEDIUM to
HIGH user-visible correctness bug, potential application timeout/hang
behavior, but not a kernel crash, memory corruption, or security fix.

Record: Benefit is moderate/high for affected containerized users
because it restores syscall semantics. Risk is very low: one local
conditional change plus an include, using established helper semantics.

### Phase 9: Final Synthesis
Record: Evidence for backporting: real reproduced bug, syscall-
reachable, affects stable branches with the feature, tiny patch,
maintainer-applied, positive reviewer feedback, matches established
time-namespace behavior elsewhere.

Record: Evidence against backporting: not a crash/security/data-
corruption fix; affects a narrower feature combination; no explicit
stable nomination; older stable trees do not contain the affected
feature.

Record: Unresolved: I did not run the reproducer locally. Lore WebFetch
was blocked by Anubis, but `b4` successfully fetched the thread. Exact
first upstream introduction of `IORING_ENTER_ABS_TIMER` was not cleanly
reconstructed from local blame alone, but affected stable branches were
directly verified by grep.

Stable rules:
1. Obviously correct and tested: yes by code inspection, reproducer, and
   positive review; no formal `Tested-by`.
2. Fixes a real bug: yes, reproduced wrong timeout result.
3. Important issue: yes for affected users, because absolute waits can
   return immediately or at the wrong time.
4. Small and contained: yes, 5 insertions and 1 deletion in one file
   (one conditional in `io_cqring_wait()` plus an include).
5. No new features/APIs: yes.
6. Can apply to stable: yes for current `7.0.y`; minor path adjustment
   may be needed in some branches.

No automatic exception category applies.

## Verification
- [Phase 1] Parsed `git show` commit message and tags for
  `45d2b37a37ab98484693533496395c610a2cab96`.
- [Phase 2] Verified diff is one file, `io_uring/wait.c`, 5 insertions/1
  deletion in `io_cqring_wait()`.
- [Phase 3] Ran `git blame` on the changed lines; current file location
  comes from `0105b0562a5e`.
- [Phase 3] Inspected `2b8e976b9842`; verified `ctx->clockid`,
  `io_get_time(ctx)`, and selected-clock wait timeout support.
- [Phase 3] Inspected companion commit `9cc6bac1bebf`; verified same
  class of fix for `IORING_TIMEOUT_ABS`.
- [Phase 4] Ran `b4 dig`, `b4 dig -a`, `b4 dig -w`, and `b4 mbox`;
  verified v1-only series, correct recipients, positive feedback, and
  applied notice.
- [Phase 5] Read `io_uring_enter()` and `io_get_ext_arg()` call path;
  verified direct syscall reachability.
- [Phase 5] Verified `timens_ktime_to_host()` behavior in
  `include/linux/time_namespace.h` and `kernel/time/namespace.c`.
- [Phase 5] Verified similar established conversions in `kernel/time`
  and `fs/timerfd.c`.
- [Phase 6] Ran `git grep` on local stable branches; affected: `6.12`,
  `6.18`, `6.19`, `7.0`; unaffected: `5.10`, `5.15`, `6.1`, `6.6`.
- [Phase 6] Ran `git apply --check` for the candidate patch on current
  `7.0.y`; it applies cleanly.
- [Phase 8] Verified reproducer details from commit and mailing-list
  cover letter; did not execute it locally.

This should be backported to stable trees that contain
`IORING_ENTER_ABS_TIMER`, with the companion timeout patch strongly
recommended for complete io_uring absolute-timeout time-namespace
correctness.

**YES**

 io_uring/wait.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/wait.c b/io_uring/wait.c
index 91df86ce0d18c..ec01e78a216d6 100644
--- a/io_uring/wait.c
+++ b/io_uring/wait.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 
 #include <trace/events/io_uring.h>
 
@@ -229,7 +230,10 @@ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 
 	if (ext_arg->ts_set) {
 		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
-		if (!(flags & IORING_ENTER_ABS_TIMER))
+		if (flags & IORING_ENTER_ABS_TIMER)
+			iowq.timeout = timens_ktime_to_host(ctx->clockid,
+							    iowq.timeout);
+		else
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
 
-- 
2.53.0

