* [PATCH 1/2] io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
2026-05-04 15:37 [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Maoyi Xie
@ 2026-05-04 15:37 ` Maoyi Xie
2026-05-04 15:37 ` [PATCH 2/2] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Maoyi Xie
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Maoyi Xie @ 2026-05-04 15:37 UTC
To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-kernel
io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.

A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:

  kernel/time/posix-timers.c  -- timer_settime(TIMER_ABSTIME)
  kernel/time/posix-stubs.c   -- clock_nanosleep(TIMER_ABSTIME)
  kernel/time/alarmtimer.c    -- alarm_timer_nsleep(TIMER_ABSTIME)
  fs/timerfd.c                -- timerfd_settime(TFD_TIMER_ABSTIME)

io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after <1ms instead of
the expected ~1s.
Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags-only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.

SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current->nsproxy->time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
---
io_uring/timeout.c | 35 ++++++++++++++++++++++-------------
1 file changed, 22 insertions(+), 13 deletions(-)
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 4cfdfc519..e2595cae2 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -3,6 +3,7 @@
 #include <linux/errno.h>
 #include <linux/file.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 #include <trace/events/io_uring.h>
@@ -35,6 +36,22 @@ struct io_timeout_rem {
 	bool ltimeout;
 };
+static clockid_t io_flags_to_clock(unsigned flags)
+{
+	switch (flags & IORING_TIMEOUT_CLOCK_MASK) {
+	case IORING_TIMEOUT_BOOTTIME:
+		return CLOCK_BOOTTIME;
+	case IORING_TIMEOUT_REALTIME:
+		return CLOCK_REALTIME;
+	default:
+		/* can't happen, vetted at prep time */
+		WARN_ON_ONCE(1);
+		fallthrough;
+	case 0:
+		return CLOCK_MONOTONIC;
+	}
+}
+
 static int io_parse_user_time(ktime_t *time, u64 arg, unsigned flags)
 {
 	struct timespec64 ts;
@@ -43,7 +60,7 @@ static int io_parse_user_time(ktime_t *time, u64 arg, unsigned flags)
 		*time = ns_to_ktime(arg);
 		if (*time < 0)
 			return -EINVAL;
-		return 0;
+		goto out;
 	}
 	if (get_timespec64(&ts, u64_to_user_ptr(arg)))
@@ -51,6 +68,9 @@ static int io_parse_user_time(ktime_t *time, u64 arg, unsigned flags)
 	if (ts.tv_sec < 0 || ts.tv_nsec < 0)
 		return -EINVAL;
 	*time = timespec64_to_ktime(ts);
+out:
+	if (flags & IORING_TIMEOUT_ABS)
+		*time = timens_ktime_to_host(io_flags_to_clock(flags), *time);
 	return 0;
 }
@@ -399,18 +419,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 static clockid_t io_timeout_get_clock(struct io_timeout_data *data)
 {
-	switch (data->flags & IORING_TIMEOUT_CLOCK_MASK) {
-	case IORING_TIMEOUT_BOOTTIME:
-		return CLOCK_BOOTTIME;
-	case IORING_TIMEOUT_REALTIME:
-		return CLOCK_REALTIME;
-	default:
-		/* can't happen, vetted at prep time */
-		WARN_ON_ONCE(1);
-		fallthrough;
-	case 0:
-		return CLOCK_MONOTONIC;
-	}
+	return io_flags_to_clock(data->flags);
 }
 static int io_linked_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
--
2.34.1
* [PATCH 2/2] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
2026-05-04 15:37 [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Maoyi Xie
2026-05-04 15:37 ` [PATCH 1/2] io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS Maoyi Xie
@ 2026-05-04 15:37 ` Maoyi Xie
2026-05-06 9:05 ` [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Pavel Begunkov
2026-05-06 11:01 ` Jens Axboe
3 siblings, 0 replies; 7+ messages in thread
From: Maoyi Xie @ 2026-05-04 15:37 UTC
To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-kernel
io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS-mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
---
io_uring/wait.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/io_uring/wait.c b/io_uring/wait.c
index 91df86ce0..ec01e78a2 100644
--- a/io_uring/wait.c
+++ b/io_uring/wait.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 #include <trace/events/io_uring.h>
@@ -229,7 +230,10 @@ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	if (ext_arg->ts_set) {
 		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
-		if (!(flags & IORING_ENTER_ABS_TIMER))
+		if (flags & IORING_ENTER_ABS_TIMER)
+			iowq.timeout = timens_ktime_to_host(ctx->clockid,
+							    iowq.timeout);
+		else
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
--
2.34.1
* Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
2026-05-04 15:37 [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Maoyi Xie
2026-05-04 15:37 ` [PATCH 1/2] io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS Maoyi Xie
2026-05-04 15:37 ` [PATCH 2/2] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER Maoyi Xie
@ 2026-05-06 9:05 ` Pavel Begunkov
2026-05-06 11:01 ` Maoyi Xie
2026-05-06 11:01 ` Jens Axboe
2026-05-06 11:01 ` Jens Axboe
3 siblings, 2 replies; 7+ messages in thread
From: Pavel Begunkov @ 2026-05-06 9:05 UTC
To: Maoyi Xie, Jens Axboe; +Cc: io-uring, linux-kernel
On 5/4/26 16:37, Maoyi Xie wrote:
> This series addresses two io_uring code paths that arm an ABS
> hrtimer from a timestamp supplied by the caller. Both paths skip
> the conversion from the submitter's time namespace view to host
> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
> default, or optionally CLOCK_BOOTTIME.
>
> All four other ABS timer interfaces already do this conversion:
> timer_settime(TIMER_ABSTIME), clock_nanosleep(TIMER_ABSTIME),
> alarm_timer_nsleep(TIMER_ABSTIME), and
> timerfd_settime(TFD_TIMER_ABSTIME).
>
> Patch 1/2 (io_uring/timeout) covers IORING_OP_TIMEOUT and
> IORING_OP_LINK_TIMEOUT via io_parse_user_time(). It is essentially
> the draft Pavel posted on the original thread. I rebased it on
> io_uring-7.1 and verified end to end.
>
> Patch 2/2 (io_uring/wait) covers the IORING_ENTER_ABS_TIMER path
> in io_uring_enter(). That path parses ext_arg->ts inline rather
> than going through io_parse_user_time(). Patch 1/2 therefore does
> not cover it.
>
> Per Pavel and Jens's discussion on the original thread, the two
> sites use two direct timens_ktime_to_host() call sites rather
> than a shared helper. Patch 1/2 also splits the existing
> io_timeout_get_clock() into a flags only io_flags_to_clock(), so
> io_parse_user_time() can resolve the clock without a
> struct io_timeout_data.
>
> SQPOLL is automatically covered. The SQPOLL kernel thread is
> created via create_io_thread() with CLONE_THREAD and no CLONE_NEW*
> flag. copy_namespaces() therefore shares the submitter's nsproxy
> by reference. timens_ktime_to_host() through "current" sees the
> submitter's time_ns when called from the SQPOLL kthread. PoCs for
> both paths confirm this.
At a quick glance, both look good. I think you had an isolated
reproducer, are you sending it as a liburing test? Would be
greatly appreciated.
--
Pavel Begunkov
* Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
2026-05-06 9:05 ` [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Pavel Begunkov
@ 2026-05-06 11:01 ` Maoyi Xie
2026-05-06 11:01 ` Jens Axboe
1 sibling, 0 replies; 7+ messages in thread
From: Maoyi Xie @ 2026-05-06 11:01 UTC
To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-kernel
Hi Pavel,

Thanks for the look. We will turn the reproducers into a
liburing test and send it shortly.

The current shape is two minimal C programs. Each forks into
a fresh user namespace plus time namespace with a -10s
monotonic offset. The child submits either IORING_OP_TIMEOUT
or io_uring_enter with IORING_ENTER_ABS_TIMER and a deadline
of now + 1s. The test asserts the call returns after the
expected ~1000ms rather than after <1ms.

We will reshape that into a single liburing test that
exercises both paths. The test will gate the unshare on
CLONE_NEWUSER | CLONE_NEWTIME availability so it skips
gracefully on kernels without time namespace support. It
will use the standard t_* helpers.
Maoyi
Nanyang Technological University
https://maoyixie.com/
* Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
2026-05-06 9:05 ` [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Pavel Begunkov
2026-05-06 11:01 ` Maoyi Xie
@ 2026-05-06 11:01 ` Jens Axboe
1 sibling, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2026-05-06 11:01 UTC
To: Pavel Begunkov, Maoyi Xie; +Cc: io-uring, linux-kernel
On 5/6/26 3:05 AM, Pavel Begunkov wrote:
> On 5/4/26 16:37, Maoyi Xie wrote:
>> This series addresses two io_uring code paths that arm an ABS
>> hrtimer from a timestamp supplied by the caller. Both paths skip
>> the conversion from the submitter's time namespace view to host
>> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
>> default, or optionally CLOCK_BOOTTIME.
>>
>> All four other ABS timer interfaces already do this conversion:
>> timer_settime(TIMER_ABSTIME), clock_nanosleep(TIMER_ABSTIME),
>> alarm_timer_nsleep(TIMER_ABSTIME), and
>> timerfd_settime(TFD_TIMER_ABSTIME).
>>
>> Patch 1/2 (io_uring/timeout) covers IORING_OP_TIMEOUT and
>> IORING_OP_LINK_TIMEOUT via io_parse_user_time(). It is essentially
>> the draft Pavel posted on the original thread. I rebased it on
>> io_uring-7.1 and verified end to end.
>>
>> Patch 2/2 (io_uring/wait) covers the IORING_ENTER_ABS_TIMER path
>> in io_uring_enter(). That path parses ext_arg->ts inline rather
>> than going through io_parse_user_time(). Patch 1/2 therefore does
>> not cover it.
>>
>> Per Pavel and Jens's discussion on the original thread, the two
>> sites use two direct timens_ktime_to_host() call sites rather
>> than a shared helper. Patch 1/2 also splits the existing
>> io_timeout_get_clock() into a flags only io_flags_to_clock(), so
>> io_parse_user_time() can resolve the clock without a
>> struct io_timeout_data.
>>
>> SQPOLL is automatically covered. The SQPOLL kernel thread is
>> created via create_io_thread() with CLONE_THREAD and no CLONE_NEW*
>> flag. copy_namespaces() therefore shares the submitter's nsproxy
>> by reference. timens_ktime_to_host() through "current" sees the
>> submitter's time_ns when called from the SQPOLL kthread. PoCs for
>> both paths confirm this.
>
> At a quick glance, both look good. I think you had an isolated
> reproducer, are you sending it as a liburing test? Would be
> greatly appreciated.
+1 Yes please, test case for liburing would be great!
--
Jens Axboe
* Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
2026-05-04 15:37 [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Maoyi Xie
` (2 preceding siblings ...)
2026-05-06 9:05 ` [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts Pavel Begunkov
@ 2026-05-06 11:01 ` Jens Axboe
3 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2026-05-06 11:01 UTC
To: Maoyi Xie; +Cc: Pavel Begunkov, io-uring, linux-kernel
On Mon, 04 May 2026 23:37:53 +0800, Maoyi Xie wrote:
> This series addresses two io_uring code paths that arm an ABS
> hrtimer from a timestamp supplied by the caller. Both paths skip
> the conversion from the submitter's time namespace view to host
> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
> default, or optionally CLOCK_BOOTTIME.
>
> All four other ABS timer interfaces already do this conversion:
> timer_settime(TIMER_ABSTIME), clock_nanosleep(TIMER_ABSTIME),
> alarm_timer_nsleep(TIMER_ABSTIME), and
> timerfd_settime(TFD_TIMER_ABSTIME).
>
> [...]
Applied, thanks!
[1/2] io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
commit: 9cc6bac1bebf8310d2950d1411a91479e86d69a1
[2/2] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
commit: 45d2b37a37ab98484693533496395c610a2cab96
Best regards,
--
Jens Axboe