* [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps
@ 2025-05-30 12:18 Pavel Begunkov
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Vadim Fedorenko suggested adding an alternative API for receiving
tx timestamps through io_uring. The series introduces an io_uring socket
cmd for fetching tx timestamps, which is a polled multishot request,
i.e. it internally polls the socket for POLLERR and posts timestamps
when they arrive. For the API description see Patch 5.
It reuses existing timestamp infra and takes them from the socket's
error queue. For networking people the important parts are Patch 1,
and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
It should be reasonable to take it through the io_uring tree once
we have consensus, but let me know if there are any concerns.
Pavel Begunkov (5):
net: timestamp: add helper returning skb's tx tstamp
io_uring/poll: introduce io_arm_apoll()
io_uring/cmd: allow multishot polled commands
io_uring: add mshot helper for posting CQE32
io_uring/netcmd: add tx timestamping cmd support
include/net/sock.h | 4 ++
include/uapi/linux/io_uring.h | 6 +++
io_uring/cmd_net.c | 77 +++++++++++++++++++++++++++++++++++
io_uring/io_uring.c | 40 ++++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/poll.c | 43 +++++++++++--------
io_uring/poll.h | 1 +
io_uring/uring_cmd.c | 34 ++++++++++++++++
io_uring/uring_cmd.h | 7 ++++
net/socket.c | 49 ++++++++++++++++++++++
10 files changed, 245 insertions(+), 17 deletions(-)
--
2.49.0
* [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
@ 2025-05-30 12:18 ` Pavel Begunkov
2025-05-30 18:14 ` Stanislav Fomichev
2025-06-01 13:52 ` Willem de Bruijn
2025-05-30 12:18 ` [PATCH 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
` (4 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
associated with an skb from an error queue.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/net/sock.h | 4 ++++
net/socket.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..b0493e82b6e3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb);
+bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk);
+bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+ struct timespec64 *ts);
+
static inline void
sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..d1dc8ab28e46 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
sizeof(ts_pktinfo), &ts_pktinfo);
}
+bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
+{
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+ struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+
+ if (serr->ee.ee_errno != ENOMSG ||
+ serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
+ return false;
+
+ /* software time stamp available and wanted */
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
+ return true;
+ /* hardware time stamps available and wanted */
+ return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+ skb_hwtstamps(skb)->hwtstamp;
+}
+
+bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+ struct timespec64 *ts)
+{
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+ bool false_tstamp = false;
+ ktime_t hwtstamp;
+ int if_index = 0;
+
+ if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
+ __net_timestamp(skb);
+ false_tstamp = true;
+ }
+
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+ ktime_to_timespec64_cond(skb->tstamp, ts))
+ return true;
+
+ if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
+ skb_is_swtx_tstamp(skb, false_tstamp))
+ return false;
+
+ if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
+ hwtstamp = get_timestamp(sk, skb, &if_index);
+ else
+ hwtstamp = skb_hwtstamps(skb)->hwtstamp;
+
+ if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
+ hwtstamp = ptp_convert_timestamp(&hwtstamp,
+ READ_ONCE(sk->sk_bind_phc));
+ return ktime_to_timespec64_cond(hwtstamp, ts);
+}
+
/*
* called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
*/
--
2.49.0
* [PATCH 2/5] io_uring/poll: introduce io_arm_apoll()
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-05-30 12:18 ` Pavel Begunkov
2025-05-31 10:28 ` Pavel Begunkov
2025-05-30 12:18 ` [PATCH 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
` (3 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
In preparation for allowing commands to do file polling, add a helper
that takes the desired poll event mask and arms it for polling. We won't
be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it
tries to infer the mask from the opcode data, and we can't unify it
across all commands.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/poll.c | 43 ++++++++++++++++++++++++++-----------------
io_uring/poll.h | 1 +
2 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0526062e2f81..e323221317f7 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -669,33 +669,17 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
return apoll;
}
-int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask)
{
- const struct io_issue_def *def = &io_issue_defs[req->opcode];
struct async_poll *apoll;
struct io_poll_table ipt;
- __poll_t mask = POLLPRI | POLLERR | EPOLLET;
int ret;
- if (!def->pollin && !def->pollout)
- return IO_APOLL_ABORTED;
if (!io_file_can_poll(req))
return IO_APOLL_ABORTED;
if (!(req->flags & REQ_F_APOLL_MULTISHOT))
mask |= EPOLLONESHOT;
- if (def->pollin) {
- mask |= EPOLLIN | EPOLLRDNORM;
-
- /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
- if (req->flags & REQ_F_CLEAR_POLLIN)
- mask &= ~EPOLLIN;
- } else {
- mask |= EPOLLOUT | EPOLLWRNORM;
- }
- if (def->poll_exclusive)
- mask |= EPOLLEXCLUSIVE;
-
apoll = io_req_alloc_apoll(req, issue_flags);
if (!apoll)
return IO_APOLL_ABORTED;
@@ -712,6 +696,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
return IO_APOLL_OK;
}
+int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
+ __poll_t mask = POLLPRI | POLLERR | EPOLLET;
+
+ if (!def->pollin && !def->pollout)
+ return IO_APOLL_ABORTED;
+ if (!io_file_can_poll(req))
+ return IO_APOLL_ABORTED;
+
+ if (def->pollin) {
+ mask |= EPOLLIN | EPOLLRDNORM;
+
+ /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+ if (req->flags & REQ_F_CLEAR_POLLIN)
+ mask &= ~EPOLLIN;
+ } else {
+ mask |= EPOLLOUT | EPOLLWRNORM;
+ }
+ if (def->poll_exclusive)
+ mask |= EPOLLEXCLUSIVE;
+
+ return io_arm_apoll(req, issue_flags, mask);
+}
+
/*
* Returns true if we found and killed one or more poll requests
*/
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 27e2db2ed4ae..c8438286dfa0 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -41,6 +41,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
struct io_cancel_data;
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned issue_flags);
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask);
int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
bool cancel_all);
--
2.49.0
* [PATCH 3/5] io_uring/cmd: allow multishot polled commands
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
2025-05-30 12:18 ` [PATCH 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
@ 2025-05-30 12:18 ` Pavel Begunkov
2025-05-30 12:18 ` [PATCH 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
` (2 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Some commands, like timestamping later in this series, can make use of
multishot polling, i.e. REQ_F_APOLL_MULTISHOT. Add support for that,
which is condensed in a single helper called io_cmd_poll_multishot().
The user who wants to continue with a request in a multishot mode must
call the function, and only if it returns 0 is the user free to proceed.
Apart from normal terminal errors, it can also end up with -EIOCBQUEUED,
in which case the user must forward it to the core io_uring. It's
forbidden to use task work while the request is executing in a multishot
mode.
The API is not foolproof, hence it's not exported to modules nor exposed
in public headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/uring_cmd.c | 23 +++++++++++++++++++++++
io_uring/uring_cmd.h | 3 +++
2 files changed, 26 insertions(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 929cad6ee326..2710521eec62 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -12,6 +12,7 @@
#include "alloc_cache.h"
#include "rsrc.h"
#include "uring_cmd.h"
+#include "poll.h"
void io_cmd_cache_free(const void *entry)
{
@@ -136,6 +137,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
ioucmd->task_work_cb = task_work_cb;
req->io_task_work.func = io_uring_cmd_work;
__io_req_task_work_add(req, flags);
@@ -158,6 +162,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
io_uring_cmd_del_cancelable(ioucmd, issue_flags);
if (ret < 0)
@@ -310,3 +317,19 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
io_req_queue_iowq(req);
}
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+ int ret;
+
+ if (likely(req->flags & REQ_F_APOLL_MULTISHOT))
+ return 0;
+
+ req->flags |= REQ_F_APOLL_MULTISHOT;
+ mask &= ~EPOLLONESHOT;
+
+ ret = io_arm_apoll(req, issue_flags, mask);
+ return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index e6a5142c890e..9565ca5d5cf2 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -17,3 +17,6 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
struct io_uring_task *tctx, bool cancel_all);
void io_cmd_cache_free(const void *entry);
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask);
--
2.49.0
* [PATCH 4/5] io_uring: add mshot helper for posting CQE32
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
` (2 preceding siblings ...)
2025-05-30 12:18 ` [PATCH 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
@ 2025-05-30 12:18 ` Pavel Begunkov
2025-05-30 12:18 ` [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
2025-05-30 13:30 ` [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Jens Axboe
5 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Add a helper for posting 32 byte CQEs in a multishot mode and add a cmd
helper on top. As it specifically works with requests, the helper ignores
the passed-in cqe->user_data and sets it to the one stored in the
request.
The command helper is only valid with multishot requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 40 ++++++++++++++++++++++++++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/uring_cmd.c | 11 +++++++++++
io_uring/uring_cmd.h | 4 ++++
4 files changed, 56 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c7a9cecf528e..4ca357057384 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -788,6 +788,21 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow)
return true;
}
+static bool io_fill_cqe_aux32(struct io_ring_ctx *ctx,
+ struct io_uring_cqe src_cqe[2])
+{
+ struct io_uring_cqe *cqe;
+
+ if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+ return false;
+ if (unlikely(!io_get_cqe(ctx, &cqe)))
+ return false;
+
+ memcpy(cqe, src_cqe, 2 * sizeof(*cqe));
+ trace_io_uring_complete(ctx, NULL, cqe);
+ return true;
+}
+
static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
u32 cflags)
{
@@ -899,6 +914,31 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
return posted;
}
+/*
+ * A helper for multishot requests posting additional CQEs.
+ * Should only be used from a task_work including IO_URING_F_MULTISHOT.
+ */
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool posted;
+
+ lockdep_assert(!io_wq_current_is_worker());
+ lockdep_assert_held(&ctx->uring_lock);
+
+ cqe[0].user_data = req->cqe.user_data;
+ if (!ctx->lockless_cq) {
+ spin_lock(&ctx->completion_lock);
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ }
+
+ ctx->submit_state.cq_flush = true;
+ return posted;
+}
+
static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 0ea7a435d1de..bff5580507ba 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -81,6 +81,7 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe src_cqe[2]);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
struct file *io_file_get_normal(struct io_kiocb *req, int fd);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 2710521eec62..429a3e4a6a02 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -333,3 +333,14 @@ int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
ret = io_arm_apoll(req, issue_flags, mask);
return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
}
+
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2])
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+
+ if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_MULTISHOT)))
+ return false;
+ return io_req_post_cqe32(req, cqe);
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 9565ca5d5cf2..be97407e4019 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -16,6 +16,10 @@ void io_uring_cmd_cleanup(struct io_kiocb *req);
bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
struct io_uring_task *tctx, bool cancel_all);
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2]);
+
void io_cmd_cache_free(const void *entry);
int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
--
2.49.0
* [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
` (3 preceding siblings ...)
2025-05-30 12:18 ` [PATCH 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
@ 2025-05-30 12:18 ` Pavel Begunkov
2025-05-31 8:34 ` kernel test robot
2025-05-30 13:30 ` [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Jens Axboe
5 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 12:18 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Add a new socket command which returns tx timestamps to the user. It
provides an alternative to the existing error queue recvmsg interface.
The command works in a polled multishot mode, which means io_uring will
poll the socket and keep posting timestamps until the request is
cancelled or fails in any other way (e.g. with no space in the CQ). It
reuses the net infra and grabs timestamps from the socket's error queue.
The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
time value is stored in the upper part of the extended CQE. The final
completion won't have IORING_CQE_F_MORE and will have cqe->res storing
0/error.
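As an illustration of the completion layout described above, here is a
minimal userspace sketch of decoding one such extended CQE. It is not
part of the patch. The names tx_ts_cqe32, tx_ts_event and
decode_tx_timestamp_cqe() are hypothetical, and the two SKETCH_* constants
are local copies of IORING_CQE_F_MORE and IORING_CQE_BUFFER_SHIFT so that
the snippet builds without kernel or liburing headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of uapi values so the sketch builds standalone. */
#define SKETCH_CQE_F_MORE       (1U << 1)  /* IORING_CQE_F_MORE */
#define SKETCH_CQE_BUFFER_SHIFT 16         /* IORING_CQE_BUFFER_SHIFT */

/* Hypothetical flat view of a 32-byte CQE as posted by this command. */
struct tx_ts_cqe32 {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t tv_sec;   /* struct io_timespec sits in the upper 16 bytes */
	uint64_t tv_nsec;
};

/* Hypothetical decoded form of one timestamp completion. */
struct tx_ts_event {
	uint32_t tskey;
	uint32_t tstype;
	uint64_t tv_sec;
	uint64_t tv_nsec;
};

/*
 * Returns true and fills *ev for a timestamp CQE (F_MORE set).
 * Returns false for the final CQE; *final_res then holds 0 or -errno.
 */
static bool decode_tx_timestamp_cqe(const struct tx_ts_cqe32 *cqe,
				    struct tx_ts_event *ev, int *final_res)
{
	assert(cqe != NULL && ev != NULL && final_res != NULL);

	if (!(cqe->flags & SKETCH_CQE_F_MORE)) {
		*final_res = cqe->res;
		return false;
	}

	ev->tskey = (uint32_t)cqe->res;
	ev->tstype = cqe->flags >> SKETCH_CQE_BUFFER_SHIFT;
	ev->tv_sec = cqe->tv_sec;
	ev->tv_nsec = cqe->tv_nsec;
	return true;
}
```

A consumer would call this for every CQE carrying the request's
user_data and stop tracking the request once it returns false.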
Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 6 +++
io_uring/cmd_net.c | 77 +++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cfd17e382082..0bc156eb96d4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
__u32 flags;
};
+struct io_timespec {
+ __u64 tv_sec;
+ __u64 tv_nsec;
+};
+
/*
* Argument for IORING_OP_URING_CMD when file is a socket
*/
@@ -968,6 +973,7 @@ enum io_uring_socket_op {
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
SOCKET_URING_OP_SETSOCKOPT,
+ SOCKET_URING_OP_TX_TIMESTAMP,
};
/* Zero copy receive refill queue entry */
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
index e99170c7d41a..c9e80f7e14cb 100644
--- a/io_uring/cmd_net.c
+++ b/io_uring/cmd_net.c
@@ -1,5 +1,6 @@
#include <asm/ioctls.h>
#include <linux/io_uring/net.h>
+#include <linux/errqueue.h>
#include <net/sock.h>
#include "uring_cmd.h"
@@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
optlen);
}
+static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
+ struct sk_buff *skb, unsigned issue_flags)
+{
+ struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+ struct io_uring_cqe cqe[2];
+ struct io_timespec *iots;
+ struct timespec64 ts;
+ u32 tskey;
+
+ BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
+
+ if (!skb_get_tx_timestamp(skb, sk, &ts))
+ return false;
+
+ tskey = serr->ee.ee_data;
+
+ cqe->user_data = 0;
+ cqe->res = tskey;
+ cqe->flags = IORING_CQE_F_MORE;
+ cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
+
+ iots = (struct io_timespec *)&cqe[1];
+ iots->tv_sec = ts.tv_sec;
+ iots->tv_nsec = ts.tv_nsec;
+ return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
+}
+
+static int io_uring_cmd_timestamp(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct sock *sk = sock->sk;
+ struct sk_buff_head *q = &sk->sk_error_queue;
+ struct sk_buff *skb, *tmp;
+ struct sk_buff_head list;
+ int ret;
+
+ if (!(issue_flags & IO_URING_F_CQE32))
+ return -EINVAL;
+ ret = io_cmd_poll_multishot(cmd, issue_flags, POLLERR);
+ if (unlikely(ret))
+ return ret;
+
+ if (skb_queue_empty_lockless(q))
+ return -EAGAIN;
+ __skb_queue_head_init(&list);
+
+ scoped_guard(spinlock_irq, &q->lock) {
+ skb_queue_walk_safe(q, skb, tmp) {
+ /* don't support skbs with payload */
+ if (!skb_has_tx_timestamp(skb, sk) || skb->len)
+ continue;
+ __skb_unlink(skb, q);
+ __skb_queue_tail(&list, skb);
+ }
+ }
+
+ while (1) {
+ skb = skb_peek(&list);
+ if (!skb)
+ break;
+ if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
+ break;
+ __skb_dequeue(&list);
+ consume_skb(skb);
+ }
+
+ if (unlikely(!skb_queue_empty(&list))) {
+ scoped_guard(spinlock_irqsave, &q->lock)
+ skb_queue_splice(q, &list);
+ }
+ return -EAGAIN;
+}
+
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -76,6 +151,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
case SOCKET_URING_OP_SETSOCKOPT:
return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_TX_TIMESTAMP:
+ return io_uring_cmd_timestamp(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.49.0
* Re: [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
` (4 preceding siblings ...)
2025-05-30 12:18 ` [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
@ 2025-05-30 13:30 ` Jens Axboe
5 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2025-05-30 13:30 UTC (permalink / raw)
To: Pavel Begunkov, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
On 5/30/25 6:18 AM, Pavel Begunkov wrote:
> Vadim Fedorenko suggested to add an alternative API for receiving
> tx timestamps through io_uring. The series introduces io_uring socket
> cmd for fetching tx timestamps, which is a polled multishot request,
> i.e. internally polling the socket for POLLERR and posts timestamps
> when they're arrives. For the API description see Patch 5.
>
> It reuses existing timestamp infra and takes them from the socket's
> error queue. For networking people the important parts are Patch 1,
> and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
>
> It should be reasonable to take it through the io_uring tree once
> we have consensus, but let me know if there are any concerns.
FWIW, this series looks good to me.
--
Jens Axboe
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-05-30 18:14 ` Stanislav Fomichev
2025-05-30 18:30 ` Stanislav Fomichev
2025-06-01 13:52 ` Willem de Bruijn
1 sibling, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2025-05-30 18:14 UTC (permalink / raw)
To: Pavel Begunkov
Cc: io-uring, Vadim Fedorenko, netdev, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
David S . Miller, Jakub Kicinski, Richard Cochran
On 05/30, Pavel Begunkov wrote:
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an skb from an queue queue.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> include/net/sock.h | 4 ++++
> net/socket.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 92e7c1aae3cc..b0493e82b6e3 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
> struct sk_buff *skb);
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk);
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts);
> +
> static inline void
> sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> {
> diff --git a/net/socket.c b/net/socket.c
> index 9a0e720f0859..d1dc8ab28e46 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> sizeof(ts_pktinfo), &ts_pktinfo);
> }
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
> +{
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> +
> + if (serr->ee.ee_errno != ENOMSG ||
> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> + return false;
> +
> + /* software time stamp available and wanted */
> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> + return true;
> + /* hardware time stamps available and wanted */
> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> + skb_hwtstamps(skb)->hwtstamp;
> +}
> +
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts)
> +{
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> + bool false_tstamp = false;
> + ktime_t hwtstamp;
> + int if_index = 0;
> +
[..]
> + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
> + __net_timestamp(skb);
> + false_tstamp = true;
> + }
The place it was copy-pasted from (__sock_recv_timestamp) has a comment
about a race between packet rx and enabling the timestamp. Does the same
race happen here? Worth keeping the comment?
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-05-30 18:14 ` Stanislav Fomichev
@ 2025-05-30 18:30 ` Stanislav Fomichev
2025-05-30 18:44 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2025-05-30 18:30 UTC (permalink / raw)
To: Pavel Begunkov
Cc: io-uring, Vadim Fedorenko, netdev, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
David S . Miller, Jakub Kicinski, Richard Cochran
On 05/30, Stanislav Fomichev wrote:
> On 05/30, Pavel Begunkov wrote:
> > Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> > associated with an skb from an queue queue.
> >
> > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> > ---
> > include/net/sock.h | 4 ++++
> > net/socket.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 53 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 92e7c1aae3cc..b0493e82b6e3 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> > void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
> > struct sk_buff *skb);
> >
> > +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk);
> > +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> > + struct timespec64 *ts);
> > +
> > static inline void
> > sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> > {
> > diff --git a/net/socket.c b/net/socket.c
> > index 9a0e720f0859..d1dc8ab28e46 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> > sizeof(ts_pktinfo), &ts_pktinfo);
> > }
> >
> > +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
> > +{
> > + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> > + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> > +
> > + if (serr->ee.ee_errno != ENOMSG ||
> > + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> > + return false;
> > +
> > + /* software time stamp available and wanted */
> > + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> > + return true;
> > + /* hardware time stamps available and wanted */
> > + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> > + skb_hwtstamps(skb)->hwtstamp;
> > +}
> > +
> > +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> > + struct timespec64 *ts)
> > +{
> > + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> > + bool false_tstamp = false;
> > + ktime_t hwtstamp;
> > + int if_index = 0;
> > +
>
> [..]
>
> > + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
> > + __net_timestamp(skb);
> > + false_tstamp = true;
> > + }
>
> The place it was copy-pasted from (__sock_recv_timestamp) has a comment
> about a race between packet rx and enabling the timestamp. Does the same
> race happen here? Worth keeping the comment?
Or maybe you don't need this case at all? Since you're skipping the
tstamp == 0 cases anyway down below... Pass 'false' to skb_is_swtx_tstamp
instead?
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-05-30 18:30 ` Stanislav Fomichev
@ 2025-05-30 18:44 ` Pavel Begunkov
0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-30 18:44 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: io-uring, Vadim Fedorenko, netdev, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
David S . Miller, Jakub Kicinski, Richard Cochran
On 5/30/25 19:30, Stanislav Fomichev wrote:
> On 05/30, Stanislav Fomichev wrote:
>> On 05/30, Pavel Begunkov wrote:
>>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
>>> associated with an skb from an queue queue.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>> include/net/sock.h | 4 ++++
>>> net/socket.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 53 insertions(+)
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index 92e7c1aae3cc..b0493e82b6e3 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>>> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
>>> struct sk_buff *skb);
>>>
>>> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk);
>>> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
>>> + struct timespec64 *ts);
>>> +
>>> static inline void
>>> sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
>>> {
>>> diff --git a/net/socket.c b/net/socket.c
>>> index 9a0e720f0859..d1dc8ab28e46 100644
>>> --- a/net/socket.c
>>> +++ b/net/socket.c
>>> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
>>> sizeof(ts_pktinfo), &ts_pktinfo);
>>> }
>>>
>>> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
>>> +{
>>> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
>>> +
>>> + if (serr->ee.ee_errno != ENOMSG ||
>>> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
>>> + return false;
>>> +
>>> + /* software time stamp available and wanted */
>>> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
>>> + return true;
>>> + /* hardware time stamps available and wanted */
>>> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>>> + skb_hwtstamps(skb)->hwtstamp;
>>> +}
>>> +
>>> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
>>> + struct timespec64 *ts)
>>> +{
>>> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> + bool false_tstamp = false;
>>> + ktime_t hwtstamp;
>>> + int if_index = 0;
>>> +
>>
>> [..]
>>
>>> + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
>>> + __net_timestamp(skb);
>>> + false_tstamp = true;
>>> + }
>>
>> The place it was copy-pasted from (__sock_recv_timestamp) has a comment
>> about a race between packet rx and enabling the timestamp. Does the same
>> race happen here? Worth keeping the comment?
I can add the comment
> Or maybe you don't need this case at all? Since you're skipping the
> tstamp == 0 cases anyway down below... Pass 'false' to skb_is_swtx_tstamp
> instead?
__net_timestamp() updates skb->tstamp, so I couldn't prove it's fine to
omit just from looking at the code. I don't know all the intricacies of
timestamping though, so it would be great if someone knows a way to
simplify it further.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support
2025-05-30 12:18 ` [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
@ 2025-05-31 8:34 ` kernel test robot
0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-05-31 8:34 UTC (permalink / raw)
To: Pavel Begunkov, io-uring, Vadim Fedorenko
Cc: oe-kbuild-all, asml.silence, netdev, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
David S . Miller, Jakub Kicinski, Richard Cochran
Hi Pavel,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net/main]
[also build test WARNING on net-next/main linus/master next-20250530]
[cannot apply to horms-ipvs/master v6.15]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Pavel-Begunkov/net-timestamp-add-helper-returning-skb-s-tx-tstamp/20250530-201922
base: net/main
patch link: https://lore.kernel.org/r/2308b0e2574858aeef6837f4f9897560a835e0f7.1748607147.git.asml.silence%40gmail.com
patch subject: [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support
config: riscv-randconfig-r121-20250531 (https://download.01.org/0day-ci/archive/20250531/202505311513.4gHg718O-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 10.5.0
reproduce: (https://download.01.org/0day-ci/archive/20250531/202505311513.4gHg718O-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505311513.4gHg718O-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
io_uring/cmd_net.c:59:32: sparse: sparse: array of flexible structures
io_uring/cmd_net.c: note: in included file:
io_uring/uring_cmd.h:21:59: sparse: sparse: array of flexible structures
>> io_uring/cmd_net.c:94:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected restricted __poll_t [usertype] mask @@ got int @@
io_uring/cmd_net.c:94:55: sparse: expected restricted __poll_t [usertype] mask
io_uring/cmd_net.c:94:55: sparse: got int
vim +94 io_uring/cmd_net.c
81
82 static int io_uring_cmd_timestamp(struct socket *sock,
83 struct io_uring_cmd *cmd,
84 unsigned int issue_flags)
85 {
86 struct sock *sk = sock->sk;
87 struct sk_buff_head *q = &sk->sk_error_queue;
88 struct sk_buff *skb, *tmp;
89 struct sk_buff_head list;
90 int ret;
91
92 if (!(issue_flags & IO_URING_F_CQE32))
93 return -EINVAL;
> 94 ret = io_cmd_poll_multishot(cmd, issue_flags, POLLERR);
95 if (unlikely(ret))
96 return ret;
97
98 if (skb_queue_empty_lockless(q))
99 return -EAGAIN;
100 __skb_queue_head_init(&list);
101
102 scoped_guard(spinlock_irq, &q->lock) {
103 skb_queue_walk_safe(q, skb, tmp) {
104 /* don't support skbs with payload */
105 if (!skb_has_tx_timestamp(skb, sk) || skb->len)
106 continue;
107 __skb_unlink(skb, q);
108 __skb_queue_tail(&list, skb);
109 }
110 }
111
112 while (1) {
113 skb = skb_peek(&list);
114 if (!skb)
115 break;
116 if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
117 break;
118 __skb_dequeue(&list);
119 consume_skb(skb);
120 }
121
122 if (!unlikely(skb_queue_empty(&list))) {
123 scoped_guard(spinlock_irqsave, &q->lock)
124 skb_queue_splice(q, &list);
125 }
126 return -EAGAIN;
127 }
128
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/5] io_uring/poll: introduce io_arm_apoll()
2025-05-30 12:18 ` [PATCH 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
@ 2025-05-31 10:28 ` Pavel Begunkov
0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-05-31 10:28 UTC (permalink / raw)
To: io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
On 5/30/25 13:18, Pavel Begunkov wrote:
> In preparation to allowing commands to do file polling, add a helper
> that takes the desired poll event mask and arms it for polling. We won't
> be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it
> tries to infer the mask from the opcode data, and we can't unify it
> across all commands.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
...
> - if (req->flags & REQ_F_CLEAR_POLLIN)
> - mask &= ~EPOLLIN;
> - } else {
> - mask |= EPOLLOUT | EPOLLWRNORM;
> - }
> - if (def->poll_exclusive)
> - mask |= EPOLLEXCLUSIVE;
> -
> apoll = io_req_alloc_apoll(req, issue_flags);
> if (!apoll)
> return IO_APOLL_ABORTED;
> @@ -712,6 +696,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
> return IO_APOLL_OK;
> }
>
> +int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
> +{
> + const struct io_issue_def *def = &io_issue_defs[req->opcode];
> + __poll_t mask = POLLPRI | POLLERR | EPOLLET;
> +
> + if (!def->pollin && !def->pollout)
> + return IO_APOLL_ABORTED;
> + if (!io_file_can_poll(req))
> + return IO_APOLL_ABORTED;
> +
> + if (def->pollin) {
> + mask |= EPOLLIN | EPOLLRDNORM;
> +
> + /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> + if (req->flags & REQ_F_CLEAR_POLLIN)
> + mask &= ~EPOLLIN;
FWIW, I need to fix the indentation here.
> + } else {
> + mask |= EPOLLOUT | EPOLLWRNORM;
> + }
> + if (def->poll_exclusive)
> + mask |= EPOLLEXCLUSIVE;
> +
> + return io_arm_apoll(req, issue_flags, mask);
> +}
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
2025-05-30 18:14 ` Stanislav Fomichev
@ 2025-06-01 13:52 ` Willem de Bruijn
2025-06-02 9:57 ` Pavel Begunkov
1 sibling, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2025-06-01 13:52 UTC (permalink / raw)
To: Pavel Begunkov, io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Pavel Begunkov wrote:
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an skb from an error queue.
Just curious: why a timestamp specific operation, rather than a
general error queue report?
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> include/net/sock.h | 4 ++++
> net/socket.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 92e7c1aae3cc..b0493e82b6e3 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
> struct sk_buff *skb);
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk);
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts);
> +
> static inline void
> sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> {
> diff --git a/net/socket.c b/net/socket.c
> index 9a0e720f0859..d1dc8ab28e46 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> sizeof(ts_pktinfo), &ts_pktinfo);
> }
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
Here and elsewhere: consider const pointers where possible
> +{
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> +
> + if (serr->ee.ee_errno != ENOMSG ||
> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> + return false;
> +
> + /* software time stamp available and wanted */
> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> + return true;
> + /* hardware time stamps available and wanted */
> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> + skb_hwtstamps(skb)->hwtstamp;
> +}
> +
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts)
> +{
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> + bool false_tstamp = false;
> + ktime_t hwtstamp;
> + int if_index = 0;
> +
> + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
> + __net_timestamp(skb);
> + false_tstamp = true;
> + }
This is for SO_TIMESTAMP, not SO_TIMESTAMPING, and intended in the
receive path only, where net_enable_timestamp may be too late for
initial packets.
> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
> + ktime_to_timespec64_cond(skb->tstamp, ts))
> + return true;
> +
> + if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
> + skb_is_swtx_tstamp(skb, false_tstamp))
> + return false;
> +
> + if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
> + hwtstamp = get_timestamp(sk, skb, &if_index);
> + else
> + hwtstamp = skb_hwtstamps(skb)->hwtstamp;
> +
> + if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
> + hwtstamp = ptp_convert_timestamp(&hwtstamp,
> + READ_ONCE(sk->sk_bind_phc));
> + return ktime_to_timespec64_cond(hwtstamp, ts);
This duplicates code in __sock_recv_timestamp. Perhaps worth a helper.
> +}
> +
> /*
> * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
> */
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-01 13:52 ` Willem de Bruijn
@ 2025-06-02 9:57 ` Pavel Begunkov
2025-06-02 13:31 ` Willem de Bruijn
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2025-06-02 9:57 UTC (permalink / raw)
To: Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
On 6/1/25 14:52, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
>> associated with an skb from an error queue.
>
> Just curious: why a timestamp specific operation, rather than a
> general error queue report?
Timestamps still need custom code; it's not like we can do a generic
implementation just by copying sock_extended_err to the user. It would
also be a problem to fit it into completions: it's already tight
after placing the timeval directly into the cqe, and there are only
a few bits left.
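To make the space argument concrete, here is a rough sketch of a 32-byte completion entry with a two-field 64-bit timestamp stored inline. The struct and field names are hypothetical and this is not the uapi layout from Patch 5; it only shows that a 16-byte timestamp uses up the whole extra half of a big CQE, so anything else has to squeeze into the res and flags words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an inline timestamp: two 64-bit fields, 16 bytes. */
struct model_timespec {
	uint64_t tv_sec;
	uint64_t tv_nsec;
};

/* Hypothetical model of a 32-byte completion entry; not the uapi layout. */
struct model_cqe32 {
	uint64_t user_data;		/* request cookie */
	int32_t res;			/* result code */
	uint32_t flags;			/* completion flags */
	struct model_timespec ts;	/* fills the 16 extra bytes */
};

/* Bytes left in the entry after the inline timestamp. */
static size_t model_cqe32_spare_bytes(void)
{
	return sizeof(struct model_cqe32)
		- offsetof(struct model_cqe32, ts)
		- sizeof(struct model_timespec);
}
```

Under these assumptions nothing is left over after the timestamp, which is why a full sock_extended_err would not fit alongside it.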
Either way, I guess it can be extended if there are more use cases,
or it might be better to introduce a new command to cover that and
share some of the handling.
...
>> diff --git a/net/socket.c b/net/socket.c
>> index 9a0e720f0859..d1dc8ab28e46 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
>> sizeof(ts_pktinfo), &ts_pktinfo);
>> }
>>
>> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
>
> Here and elsewhere: consider const pointers where possible
will do
>
>> +{
>> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
>> + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
>> +
>> + if (serr->ee.ee_errno != ENOMSG ||
>> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
>> + return false;
>> +
>> + /* software time stamp available and wanted */
>> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
>> + return true;
>> + /* hardware time stamps available and wanted */
>> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>> + skb_hwtstamps(skb)->hwtstamp;
>> +}
>> +
>> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
>> + struct timespec64 *ts)
>> +{
>> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
>> + bool false_tstamp = false;
>> + ktime_t hwtstamp;
>> + int if_index = 0;
>> +
>> + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
>> + __net_timestamp(skb);
>> + false_tstamp = true;
>> + }
>
> This is for SO_TIMESTAMP, not SO_TIMESTAMPING, and intended in the
> receive path only, where net_enable_timestamp may be too late for
> initial packets.
Got it, I'll drop that chunk if you think it's fine. Thanks
for the review.
>> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
>> + ktime_to_timespec64_cond(skb->tstamp, ts))
>> + return true;
>> +
>> + if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
>> + skb_is_swtx_tstamp(skb, false_tstamp))
>> + return false;
>> +
>> + if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
>> + hwtstamp = get_timestamp(sk, skb, &if_index);
>> + else
>> + hwtstamp = skb_hwtstamps(skb)->hwtstamp;
>> +
>> + if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
>> + hwtstamp = ptp_convert_timestamp(&hwtstamp,
>> + READ_ONCE(sk->sk_bind_phc));
>> + return ktime_to_timespec64_cond(hwtstamp, ts);
>
> This duplicates code in __sock_recv_timestamp. Perhaps worth a helper.
I couldn't find a good way of doing that. There are rx checks in
every if, and there is also nested pkt info handling. And
scm_timestamping_internal has 3 timestamps, so
__sock_recv_timestamp() would need to duplicate some checks to
choose the right slot for the timestamp.
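The three-slot layout mentioned above can be sketched as follows. In the userspace struct scm_timestamping, ts[0] holds the software timestamp, ts[1] is deprecated, and ts[2] holds the raw hardware timestamp; the model below mirrors that with hypothetical names (model_scm_timestamping, model_ts_slot, model_store_ts) and is not the kernel's internal structure.

```c
#include <time.h>

/* Hypothetical model of a three-slot timestamp container. */
struct model_scm_timestamping {
	struct timespec ts[3];
};

enum model_ts_slot {
	MODEL_TS_SOFTWARE = 0,	/* software timestamp */
	MODEL_TS_LEGACY = 1,	/* deprecated slot, normally left zero */
	MODEL_TS_HARDWARE = 2,	/* raw hardware timestamp */
};

/* Store one timestamp in its slot, leaving the other slots untouched. */
static void model_store_ts(struct model_scm_timestamping *tss,
			   enum model_ts_slot slot, struct timespec val)
{
	tss->ts[slot] = val;
}
```

A shared helper returning a single timestamp would still leave the caller to decide which slot to fill, which is roughly the duplication of checks described above.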
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-02 9:57 ` Pavel Begunkov
@ 2025-06-02 13:31 ` Willem de Bruijn
2025-06-04 8:51 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2025-06-02 13:31 UTC (permalink / raw)
To: Pavel Begunkov, Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Pavel Begunkov wrote:
> On 6/1/25 14:52, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> >> associated with an skb from an error queue.
> >
> > Just curious: why a timestamp specific operation, rather than a
> > general error queue report?
>
> Timestamps still need custom code; it's not like we can do a generic
> implementation just by copying sock_extended_err to the user. It would
> also be a problem to fit it into completions: it's already tight
> after placing the timeval directly into the cqe, and there are only
> a few bits left.
Ok understood.
> Either way, I guess it can be extended if there are more use cases,
> or it might be better to introduce a new command to cover that and
> share some of the handling.
Not a request from me, to be clear. Just wanted to understand the
design choice.
> ...
> >> diff --git a/net/socket.c b/net/socket.c
> >> index 9a0e720f0859..d1dc8ab28e46 100644
> >> --- a/net/socket.c
> >> +++ b/net/socket.c
> >> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> >> sizeof(ts_pktinfo), &ts_pktinfo);
> >> }
> >>
> >> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
> >
> > Here and elsewhere: consider const pointers where possible
>
> will do
>
> >
> >> +{
> >> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> >> + struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> >> +
> >> + if (serr->ee.ee_errno != ENOMSG ||
> >> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> >> + return false;
> >> +
> >> + /* software time stamp available and wanted */
> >> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> >> + return true;
> >> + /* hardware time stamps available and wanted */
> >> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> >> + skb_hwtstamps(skb)->hwtstamp;
> >> +}
> >> +
> >> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> >> + struct timespec64 *ts)
> >> +{
> >> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> >> + bool false_tstamp = false;
> >> + ktime_t hwtstamp;
> >> + int if_index = 0;
> >> +
> >> + if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
> >> + __net_timestamp(skb);
> >> + false_tstamp = true;
> >> + }
> >
> > This is for SO_TIMESTAMP, not SO_TIMESTAMPING, and intended in the
> > receive path only, where net_enable_timestamp may be too late for
> > initial packets.
>
> Got it, I'll drop that chunk if you think it's fine. Thanks
> for the review.
>
> >> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
> >> + ktime_to_timespec64_cond(skb->tstamp, ts))
> >> + return true;
> >> +
> >> + if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
> >> + skb_is_swtx_tstamp(skb, false_tstamp))
> >> + return false;
> >> +
> >> + if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
> >> + hwtstamp = get_timestamp(sk, skb, &if_index);
> >> + else
> >> + hwtstamp = skb_hwtstamps(skb)->hwtstamp;
> >> +
> >> + if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
> >> + hwtstamp = ptp_convert_timestamp(&hwtstamp,
> >> + READ_ONCE(sk->sk_bind_phc));
> >> + return ktime_to_timespec64_cond(hwtstamp, ts);
> >
> > This duplicates code in __sock_recv_timestamp. Perhaps worth a helper.
>
> I couldn't find a good way of doing that. There are rx checks in
> every if, and there is also nested pkt info handling. And
> scm_timestamping_internal has 3 timestamps, so
> __sock_recv_timestamp() would need to duplicate some checks to
> choose the right slot for the timestamp.
Ack, then let's leave as is. Thanks for taking a stab.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-02 13:31 ` Willem de Bruijn
@ 2025-06-04 8:51 ` Pavel Begunkov
2025-06-04 13:38 ` Willem de Bruijn
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2025-06-04 8:51 UTC (permalink / raw)
To: Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
On 6/2/25 14:31, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> On 6/1/25 14:52, Willem de Bruijn wrote:
>>> Pavel Begunkov wrote:
>>>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
>>>> associated with an skb from an error queue.
...
>>>> diff --git a/net/socket.c b/net/socket.c
>>>> index 9a0e720f0859..d1dc8ab28e46 100644
>>>> --- a/net/socket.c
>>>> +++ b/net/socket.c
>>>> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
>>>> sizeof(ts_pktinfo), &ts_pktinfo);
>>>> }
>>>>
>>>> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
>>>
>>> Here and elsewhere: consider const pointers where possible
>>
>> will do
I constified the sock pointer in v2 but can't do the same with skb, as
skb_hwtstamps() and other helpers don't work with const. I can follow
up on top by preparing those helpers, but to avoid cross-tree conflicts
it's probably better to leave the helpers from this patch without
const until everything is merged and pulled. Hope that works for you.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-04 8:51 ` Pavel Begunkov
@ 2025-06-04 13:38 ` Willem de Bruijn
0 siblings, 0 replies; 17+ messages in thread
From: Willem de Bruijn @ 2025-06-04 13:38 UTC (permalink / raw)
To: Pavel Begunkov, Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran
Pavel Begunkov wrote:
> On 6/2/25 14:31, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> On 6/1/25 14:52, Willem de Bruijn wrote:
> >>> Pavel Begunkov wrote:
> >>>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> >>>> associated with an skb from an error queue.
> ...
> >>>> diff --git a/net/socket.c b/net/socket.c
> >>>> index 9a0e720f0859..d1dc8ab28e46 100644
> >>>> --- a/net/socket.c
> >>>> +++ b/net/socket.c
> >>>> @@ -843,6 +843,55 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> >>>> sizeof(ts_pktinfo), &ts_pktinfo);
> >>>> }
> >>>>
> >>>> +bool skb_has_tx_timestamp(struct sk_buff *skb, struct sock *sk)
> >>>
> >>> Here and elsewhere: consider const pointers where possible
> >>
> >> will do
>
> I constified the sock pointer in v2 but can't do the same with skb, as
> skb_hwtstamps() and other helpers don't work with const. I can follow
> up on top by preparing those helpers, but to avoid cross-tree conflicts
> it's probably better to leave the helpers from this patch without
> const until everything is merged and pulled. Hope that works for you.
Ok. No need to follow up per se.
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2025-05-30 12:18 [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-05-30 12:18 ` [PATCH 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
2025-05-30 18:14 ` Stanislav Fomichev
2025-05-30 18:30 ` Stanislav Fomichev
2025-05-30 18:44 ` Pavel Begunkov
2025-06-01 13:52 ` Willem de Bruijn
2025-06-02 9:57 ` Pavel Begunkov
2025-06-02 13:31 ` Willem de Bruijn
2025-06-04 8:51 ` Pavel Begunkov
2025-06-04 13:38 ` Willem de Bruijn
2025-05-30 12:18 ` [PATCH 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
2025-05-31 10:28 ` Pavel Begunkov
2025-05-30 12:18 ` [PATCH 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
2025-05-30 12:18 ` [PATCH 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
2025-05-30 12:18 ` [PATCH 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
2025-05-31 8:34 ` kernel test robot
2025-05-30 13:30 ` [PATCH io_uring-next 0/5] io_uring cmd for tx timestamps Jens Axboe