From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Willem de Bruijn <willemb@google.com>
Cc: netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
Jason Xing <kerneljasonxing@gmail.com>,
Paolo Abeni <pabeni@redhat.com>, David Ahern <dsahern@kernel.org>
Subject: Re: [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
Date: Mon, 2 Sep 2024 22:07:44 +0100 [thread overview]
Message-ID: <932f7ff3-6087-47cf-91ec-2601da38ebec@linux.dev> (raw)
In-Reply-To: <66d5cbbba9669_6138829497@willemb.c.googlers.com.notmuch>
On 02/09/2024 15:29, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>> timestamps and packets sent via socket. Unfortunately, there is no way
>> to reliably predict socket timestamp ID value in case of error returned
>> by sendmsg. For UDP sockets it's impossible because of lockless
>> nature of UDP transmit, several threads may send packets in parallel. In
>> case of RAW sockets MSG_MORE option makes things complicated. More
>> details are in the conversation [1].
>> This patch adds new control message type to give user-space
>> software an opportunity to control the mapping between packets and
>> values by providing ID with each sendmsg. This works fine for UDP
>> sockets only, and explicit check is added to control message parser.
>>
>> [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>>
>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>> ---
>> Documentation/networking/timestamping.rst | 14 ++++++++++++++
>> arch/alpha/include/uapi/asm/socket.h | 4 +++-
>> arch/mips/include/uapi/asm/socket.h | 2 ++
>> arch/parisc/include/uapi/asm/socket.h | 2 ++
>> arch/sparc/include/uapi/asm/socket.h | 2 ++
>> include/net/inet_sock.h | 4 +++-
>> include/net/sock.h | 1 +
>> include/uapi/asm-generic/socket.h | 2 ++
>> include/uapi/linux/net_tstamp.h | 3 ++-
>> net/core/sock.c | 12 ++++++++++++
>> net/ethtool/common.c | 1 +
>> net/ipv4/ip_output.c | 16 ++++++++++++----
>> net/ipv6/ip6_output.c | 16 ++++++++++++----
>> 13 files changed, 68 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
>> index 5e93cd71f99f..93b0901e4e8e 100644
>> --- a/Documentation/networking/timestamping.rst
>> +++ b/Documentation/networking/timestamping.rst
>> @@ -193,6 +193,20 @@ SOF_TIMESTAMPING_OPT_ID:
>> among all possibly concurrently outstanding timestamp requests for
>> that socket.
>>
>> + With this option enabled user-space application can provide custom
>> + ID for each message sent via UDP socket with control message with
>> + type set to SCM_TS_OPT_ID::
>> +
>> + struct msghdr *msg;
>> + ...
>> + cmsg = CMSG_FIRSTHDR(msg);
>> + cmsg->cmsg_level = SOL_SOCKET;
>> + cmsg->cmsg_type = SO_TIMESTAMPING;
>> + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
>> + *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
>> + err = sendmsg(fd, msg, 0);
>> +
>
> Please make it clear that this CMSG is optional.
>
> The process can optionally override the default generated ID, by
> passing a specific ID with control message SCM_TS_OPT_ID:
Ok, I'll re-phrase it this way, thanks!
>> SOF_TIMESTAMPING_OPT_ID_TCP:
>> Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
>> timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index e94f621903fe..0698e6662cdf 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -10,7 +10,7 @@
>> * Note: we only bother about making the SOL_SOCKET options
>> * same as OSF/1, as that's all that "normal" programs are
>> * likely to set. We don't necessarily want to be binary
>> - * compatible with _everything_.
>> + * compatible with _everything_.
>
> Is this due to a checkpatch warning? If so, please add a brief comment
> to the commit message to show that this change is intentional. If not,
> please don't touch unrelated code.
I'll remove it, because it looks like it was some unhappy linter...
>> */
>> #define SOL_SOCKET 0xffff
>>
>> @@ -140,6 +140,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>> index 60ebaed28a4c..bb3dc8feb205 100644
>> --- a/arch/mips/include/uapi/asm/socket.h
>> +++ b/arch/mips/include/uapi/asm/socket.h
>> @@ -151,6 +151,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>> index be264c2b1a11..c3ab3b3289eb 100644
>> --- a/arch/parisc/include/uapi/asm/socket.h
>> +++ b/arch/parisc/include/uapi/asm/socket.h
>> @@ -132,6 +132,8 @@
>> #define SO_PASSPIDFD 0x404A
>> #define SO_PEERPIDFD 0x404B
>>
>> +#define SCM_TS_OPT_ID 0x404C
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64
>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>> index 682da3714686..9b40f0a57fbc 100644
>> --- a/arch/sparc/include/uapi/asm/socket.h
>> +++ b/arch/sparc/include/uapi/asm/socket.h
>> @@ -133,6 +133,8 @@
>> #define SO_PASSPIDFD 0x0055
>> #define SO_PEERPIDFD 0x0056
>>
>> +#define SCM_TS_OPT_ID 0x0057
>> +
>> #if !defined(__KERNEL__)
>>
>>
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index 394c3b66065e..2161d50cf0fd 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -174,6 +174,7 @@ struct inet_cork {
>> __s16 tos;
>> char priority;
>> __u16 gso_size;
>> + u32 ts_opt_id;
>> u64 transmit_time;
>> u32 mark;
>> };
>> @@ -241,7 +242,8 @@ struct inet_sock {
>> struct inet_cork_full cork;
>> };
>>
>> -#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
>> +#define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
>> +#define IPCORK_TS_OPT_ID 2 /* timestmap opt id has been provided in cmsg */
>
> typo: timestamp
>
> And maybe more relevant: /* ts_opt_id field is valid, overriding sk_tskey */
I'll change it
>> enum {
>> INET_FLAGS_PKTINFO = 0,
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index f51d61fab059..73e21dad5660 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1794,6 +1794,7 @@ struct sockcm_cookie {
>> u64 transmit_time;
>> u32 mark;
>> u32 tsflags;
>> + u32 ts_opt_id;
>> };
>>
>> static inline void sockcm_init(struct sockcm_cookie *sockc,
>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>> index 8ce8a39a1e5f..db3df3e74b01 100644
>> --- a/include/uapi/asm-generic/socket.h
>> +++ b/include/uapi/asm-generic/socket.h
>> @@ -135,6 +135,8 @@
>> #define SO_PASSPIDFD 76
>> #define SO_PEERPIDFD 77
>>
>> +#define SCM_TS_OPT_ID 78
>
> nit: different indentation
Hmm... that's interesting, it's ok in the code, there no spaces before
#define. I'll re-check it in the patch in v3.
>> +
>> #if !defined(__KERNEL__)
>>
>> #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index a2c66b3d7f0f..e2f145e3f3a1 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -32,8 +32,9 @@ enum {
>> SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>> SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>> SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>> + SOF_TIMESTAMPING_OPT_ID_CMSG = (1 << 17),
>>
>> - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
>> + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_CMSG,
>> SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>> SOF_TIMESTAMPING_LAST
>> };
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 468b1239606c..560b075765fa 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2859,6 +2859,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>> return -EINVAL;
>> sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
>> break;
>> + case SCM_TS_OPT_ID:
>> + /* allow this option for UDP sockets only */
>> + if (!sk_is_udp(sk))
>> + return -EINVAL;
>
> Let's relax the restriction that this is only for UDP.
>
> At least to also support SOCK_RAW. I don't think that requires any
> additional code at all?
RAW sockets use skb_setup_tx_timestamps which does atomic operation of
incrementing sk_tskey when in _sock_tx_timestamp. So I'll have to
convert all spots (can, ipv4/raw, ipv6/raw, 3 x af_packet) to use the
same logic as in udp path (sock_tx_timestamp) and add conditions.
Or change skb_setup_tx_timestamps to do the logic and take different
arguments. It may look as a big refactoring, so I would like to make
it as a follow-up series.
> Extending to TCP should be straightforward too, just a branch
> on sockc in tcp_tx_timestamp.
TCP part looks a bit easier, as you said, I have to adjust
tcp_tx_timestamp and the logic is straightforward. I still have to
provide a pointer to sock coockie instead of flags, but there is only
one caller of this function and it's much easier than with RAW sockets.
> If so, let's support all. It makes for a simpler API if it is
> supported uniformly wherever OPT_ID is.
If you think that the way I explained for RAW sockets is good enough,
I can send all of them as a single patcheset. Otherwise I would like to
add RAW sockets in follow-up series.
>> + tsflags = READ_ONCE(sk->sk_tsflags);
>> + if (!(tsflags & SOF_TIMESTAMPING_OPT_ID))
>> + return -EINVAL;
>> + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
>> + return -EINVAL;
>> + sockc->ts_opt_id = *(u32 *)CMSG_DATA(cmsg);
>> + sockc->tsflags |= SOF_TIMESTAMPING_OPT_ID_CMSG;
>> + break;
>> /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
>> case SCM_RIGHTS:
>> case SCM_CREDENTIALS:
next prev parent reply other threads:[~2024-09-02 21:07 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-02 13:09 [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-02 13:09 ` [PATCH net-next v2 2/2] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
2024-09-02 14:24 ` Jason Xing
2024-09-08 20:04 ` Vadim Fedorenko
2024-09-08 23:36 ` Jason Xing
2024-09-02 14:29 ` [PATCH net-next v2 1/2] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Willem de Bruijn
2024-09-02 21:07 ` Vadim Fedorenko [this message]
2024-09-03 20:40 ` Willem de Bruijn
2024-09-02 15:19 ` Jason Xing
2024-09-02 15:51 ` Willem de Bruijn
2024-09-02 19:40 ` Vadim Fedorenko
2024-09-02 20:59 ` Willem de Bruijn
2024-09-02 21:10 ` Vadim Fedorenko
2024-09-03 16:01 ` Vadim Fedorenko
2024-09-03 20:46 ` Willem de Bruijn
2024-09-02 18:38 ` Simon Horman
2024-09-02 19:35 ` Vadim Fedorenko
2024-09-03 15:23 ` Simon Horman
2024-09-03 8:06 ` Dan Carpenter
2024-09-03 8:16 ` Vadim Fedorenko
-- strict thread matches above, loose matches on Subject: below --
2024-09-03 4:16 kernel test robot
2024-09-04 13:25 kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=932f7ff3-6087-47cf-91ec-2601da38ebec@linux.dev \
--to=vadim.fedorenko@linux.dev \
--cc=dsahern@kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.