netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Willem de Bruijn <willemb@google.com>
Cc: netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
	Jason Xing <kerneljasonxing@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets
Date: Fri, 6 Sep 2024 18:27:00 +0100	[thread overview]
Message-ID: <1f17d828-5d0f-4050-be4b-8840feb8de76@linux.dev> (raw)
In-Reply-To: <66db1f004a0c_29a3852944d@willemb.c.googlers.com.notmuch>

On 06/09/2024 16:25, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>
>>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
>>>>> ---
>>>>>     net/ipv4/tcp.c | 13 +++++++++----
>>>>>     1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>> --- a/net/ipv4/tcp.c
>>>>> +++ b/net/ipv4/tcp.c
>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>     }
>>>>>     EXPORT_SYMBOL(tcp_init_sock);
>>>>>     
>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>     {
>>>>>     	struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>> +	u32 tsflags = sockc->tsflags;
>>>>>     
>>>>>     	if (tsflags && skb) {
>>>>>     		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>     		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>     		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>     			tcb->txstamp_ack = 1;
>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>> +				shinfo->tskey = sockc->ts_opt_id;
>>>>> +			else
>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>> +		}
>>>>>     	}
>>>>>     }
>>>>>     
>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>     
>>>>>     out:
>>>>>     	if (copied) {
>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
>>>>> +		tcp_tx_timestamp(sk, &sockc);
>>>>>     		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>     	}
>>>>>     out_nopush:
>>>>
>>>> Hi Willem,
>>>>
>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>> flow:
>>>>
>>>> tcp_skb_collapse_tstamp()
>>>> tcp_fragment_tstamp()
>>>> tcp_gso_tstamp()
>>>>
>>>> I believe the last one breaks tests, but the problem is that there is no
>>>> easy way to provide the flag of constant tskey to it. Only
>>>> shinfo::tx_flags are available at the caller side and we have already
>>>> discussed that we shouldn't use the last bit of this field.
>>>>
>>>> So, how should we deal with the problem? Or is it better to postpone
>>>> support for TCP sockets in this case?
>>>
>>> Are you sure that this is a problem. These functions pass on the
>>> skb_shinfo(skb)->ts_key from one skb to another.
>>
>> Yes, you are right, the problem is in a different place.
>>
>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>> value:
>>
>> 	if (sk_is_tcp(sk))
>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>
>> It happens because of assumption that for TCP sockets shinfo::tskey will
>> have sequence number and the logic has to recalculate it back to the
>> bytes sent. The same logic exists in all TCP TX timestamping functions
>> (mentioned in the previous mail) and may trigger some unexpected
>> behavior. To fix the issue we have to provide some kind of signal that
>> tskey value is provided from user-space and shouldn't be changed. And we
>> have to have it somewhere in skb info. Again, tx_flags looks like the
>> best candidate, but it's impossible to use. I'm thinking of using
>> special flag in tcp_skb_cb - gonna test more, but open for other
>> suggestions.
> 
> Ai, that is tricky. tx_flags is full/scarce indeed.
> 
> CB does not persist as the skb transitions between layers.
> 
> The most obvious solution would be to set the flag in sk_tsflags
> itself. But then the cmsg would no long work on a per request basis:
> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> 
> Good that we catch this now before the ABI is finalized.
> 
> If necessary TCP semantics can diverge from datagrams. So we could
> punt on this. But it's not ideal.

I have done proof of concept code which uses hwtsamp as a storage for
custom tskey in case of TCP:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..40ed49e61bf7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct 
sk_buff *skb,
         serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
         if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
                 serr->ee.ee_data = skb_shinfo(skb)->tskey;
-               if (sk_is_tcp(sk))
-                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
         }

         err = sock_queue_err_skb(sk, skb);
@@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
          * but only if the socket refcount is not zero.
          */
         if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
+               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) & 
SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
+                       skb_shinfo(skb)->tskey = 
(u32)skb_hwtstamps(skb)->hwtstamp;
                 *skb_hwtstamps(skb) = *hwtstamps;
                 __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, 
false);
                 sock_put(sk);
@@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
                 skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
         }

+       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+               if (skb_hwtstamps(orig_skb)->hwtstamp)
+                       skb_shinfo(skb)->tskey = 
(u32)skb_hwtstamps(orig_skb)->hwtstamp;
+               else
+                       skb_shinfo(skb)->tskey -= 
atomic_read(&sk->sk_tskey);
+       }
         if (hwtstamps)
                 *skb_hwtstamps(skb) = *hwtstamps;
         else
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d3decc13a99..1a161a2155b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct 
sockcm_cookie *sockc)
                         tcb->txstamp_ack = 1;
                 if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
                         if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
-                               shinfo->tskey = sockc->ts_opt_id;
-                       else
-                               shinfo->tskey = TCP_SKB_CB(skb)->seq + 
skb->len - 1;
+                               skb_hwtstamps(skb)->hwtstamp = 
sockc->ts_opt_id;
+                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
                 }
         }
  }


Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
it. netdev_data field is only used on RX timestamp side, so should be
fine to reuse. WDYT?


  parent reply	other threads:[~2024-09-06 17:27 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04 11:31 [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 1/4] net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message Vadim Fedorenko
2024-09-04 13:56   ` Jason Xing
2024-09-04 14:01     ` Vadim Fedorenko
2024-09-04 20:54   ` Willem de Bruijn
2024-09-05 10:10     ` Vadim Fedorenko
2024-09-05 13:29       ` Willem de Bruijn
2024-09-05  8:24   ` Jason Xing
2024-09-05  8:34     ` Vadim Fedorenko
2024-09-05  8:49       ` Jason Xing
2024-09-06  9:22   ` kernel test robot
2024-09-06 12:58   ` kernel test robot
2024-09-04 11:31 ` [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP sockets Vadim Fedorenko
2024-09-05 14:58   ` Vadim Fedorenko
2024-09-05 15:37     ` Jason Xing
2024-09-05 16:39     ` Willem de Bruijn
2024-09-06 12:20       ` Vadim Fedorenko
2024-09-06 15:25         ` Willem de Bruijn
2024-09-06 16:33           ` Willem de Bruijn
2024-09-06 17:27             ` Vadim Fedorenko
2024-09-06 23:50               ` Willem de Bruijn
2024-09-06 17:27           ` Vadim Fedorenko [this message]
2024-09-06 23:48             ` Willem de Bruijn
2024-09-07 21:55               ` Vadim Fedorenko
2024-09-08 19:19                 ` Willem de Bruijn
2024-09-08 19:55                   ` Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 3/4] net_tstamp: add SCM_TS_OPT_ID for RAW sockets Vadim Fedorenko
2024-09-04 11:31 ` [PATCH net-next v3 4/4] selftests: txtimestamp: add SCM_TS_OPT_ID test Vadim Fedorenko
2024-09-04 11:36 ` [PATCH net-next v3 0/4] Add option to provide OPT_ID value via cmsg Vadim Fedorenko
2024-09-04 21:04   ` Willem de Bruijn
2024-09-04 22:50     ` Vadim Fedorenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f17d828-5d0f-4050-be4b-8840feb8de76@linux.dev \
    --to=vadim.fedorenko@linux.dev \
    --cc=dsahern@kernel.org \
    --cc=horms@kernel.org \
    --cc=kerneljasonxing@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).