netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, dsahern@kernel.org,
	willemdebruijn.kernel@gmail.com, willemb@google.com,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
	ykolal@fb.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3 10/14] net-timestamp: add basic support with tskey offset
Date: Wed, 30 Oct 2024 18:17:39 -0700	[thread overview]
Message-ID: <e56f78a9-cbda-4b80-8b55-c16b36e4efb1@linux.dev> (raw)
In-Reply-To: <CAL+tcoBNiZQr=yk_fb9eoKX1_Nr4LuDaa1kkLGbdnc=8JNKnNg@mail.gmail.com>

On 10/29/24 11:50 PM, Jason Xing wrote:
> On Wed, Oct 30, 2024 at 1:42 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/28/24 4:05 AM, Jason Xing wrote:
>>> +/* Used to track the tskey for bpf extension
>>> + *
>>> + * @sk_tskey: bpf extension can use it only when no application uses.
>>> + *            Application can use it directly regardless of bpf extension.
>>> + *
>>> + * There are three strategies:
>>> + * 1) If we've already set through setsockopt() and here we're going to set
>>> + *    OPT_ID for bpf use, we will not re-initialize the @sk_tskey and will
>>> + *    keep the record of delta between the current "key" and previous key.
>>> + * 2) If we've already set through bpf_setsockopt() and here we're going to
>>> + *    set for application use, we will record the delta first and then
>>> + *    override/initialize the @sk_tskey.
>>> + * 3) other cases, which means only either of them takes effect, so initialize
>>> + *    everything simplely.
>>> + */
>>> +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type)
>>> +{
>>> +     u32 tskey;
>>> +
>>> +     if (sk_is_tcp(sk)) {
>>> +             if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
>>> +                     return -EINVAL;
>>> +
>>> +             if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
>>> +                     tskey = tcp_sk(sk)->write_seq;
>>> +             else
>>> +                     tskey = tcp_sk(sk)->snd_una;
>>> +     } else {
>>> +             tskey = 0;
>>> +     }
>>> +
>>> +     if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>>> +             sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey);
>>> +             return 0;
>>> +     } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) {
>>> +             sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey;
>>> +     } else {
>>> +             sk->sk_tskey_bpf_offset = 0;
>>> +     }
>>> +
>>> +     return tskey;
>>> +}
>>
>> Before diving into this route, the bpf prog can peek into the tcp seq no in the
>> skb. It can also look at the sk->sk_tskey for UDP socket. Can you explain why
>> those are not enough information for the bpf prog?
> 
> Well, it does make sense. It seems we don't need to implement tskey
> for this bpf feature...
> 
> Due to lack of enough knowledge of bpf, could you provide more hints
> that I can follow to write a bpf program to print more information
> from the skb? Like in the last patch of this series, in
> tools/testing/selftests/bpf/prog_tests/so_timestamping.c, do we have a
> feasible way to do that?

The bpf-prog@sendmsg() will be run to capture a timestamp for sendmsg().
When running the bpf-prog@sendmsg(), the skb can be set to the "struct 
bpf_sock_ops_kern sock_ops;" which is passed to the sockops prog. Take a look at 
bpf_skops_write_hdr_opt().

bpf prog cannot directly access the skops->skb now. It is because the sockops 
prog sees the uapi "struct bpf_sock_ops" instead of "struct 
bpf_sock_ops(_kern)". The conversion is done in sock_ops_convert_ctx_access. It 
is an old way before BTF. I don't want to extend the uapi "struct bpf_sock_ops".

Instead, use bpf_cast_to_kern_ctx((struct bpf_sock_ops *)skops_ctx) to get a 
trusted "struct bpf_sock_ops(_kern) *skops" pointer. Then it can access the 
skops->skb. afaik, the tcb->seq should be available already during sendmsg. it 
should be able to get it from TCP_SKB_CB(skb)->seq with the bpf_core_cast. Take 
a look at the existing examples of bpf_core_cast.

The same goes for the skb->data. It can use the bpf_dynptr_from_skb(). It is not 
available to skops program now but should be easy to expose.

The bpf prog wants to calculate the delay between [sendmsg, SCHED], [SCHED, 
SND], [SND, ACK]. It is why (at least in my mental model) a key is needed to 
co-relate the sendmsg, SCHED, SND, and ACK timestamp. The tcp seqno could be 
served as that key.

All that said, while looking at tcp_tx_timestamp() again, there is always 
"shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;". shinfo->tskey can be 
used directly as-is by the bpf prog. I think now I am missing why the bpf prog 
needs the sk_tskey in the sk?

In the bpf prog, when the SCHED/SND/ACK timestamp comes back, it has to find the 
earlier sendmsg timestamp. One option is to store the earlier sendmsg timestamp 
at the bpf map key-ed by seqno or the shinfo's tskey. Storing in a bpf map 
key-ed by seqno/tskey is probably what the selftest should do. In the future, we 
can consider allowing the rbtree in the bpf sk local storage for searching 
seqno. There is shinfo's hwtstamp that can be used also if there is a need.

  reply	other threads:[~2024-10-31  1:17 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-28 11:05 [PATCH net-next v3 00/14] net-timestamp: bpf extension to equip applications transparently Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 01/14] net-timestamp: reorganize in skb_tstamp_tx_output() Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 02/14] net-timestamp: allow two features to work parallelly Jason Xing
2024-10-29 23:00   ` Martin KaFai Lau
2024-10-30  1:23     ` Jason Xing
2024-10-30  1:45       ` Willem de Bruijn
2024-10-30  2:32         ` Jason Xing
2024-10-30  2:47           ` Willem de Bruijn
2024-10-30  3:04             ` Jason Xing
2024-10-30  5:37               ` Martin KaFai Lau
2024-10-30  6:42                 ` Jason Xing
2024-10-30 17:15                   ` Willem de Bruijn
2024-10-30 23:54                     ` Jason Xing
2024-10-31  0:13                       ` Jason Xing
2024-10-31  6:27                         ` Martin KaFai Lau
2024-10-31  7:04                           ` Jason Xing
2024-10-31 12:30                             ` Willem de Bruijn
2024-10-31 13:50                               ` Jason Xing
2024-10-31 23:26                                 ` Martin KaFai Lau
2024-11-01  7:47                                   ` Jason Xing
2024-11-05  1:50                                     ` Martin KaFai Lau
2024-11-05  3:13                                       ` Jason Xing
2024-11-01 13:32                                   ` Willem de Bruijn
2024-11-01 16:08                                     ` Jason Xing
2024-11-01 16:39                                       ` Willem de Bruijn
2024-11-05  2:09                                     ` Martin KaFai Lau
2024-11-05  6:22                                       ` Jason Xing
2024-11-05 19:22                                         ` Martin KaFai Lau
2024-11-06  0:17                                           ` Jason Xing
2024-11-06  1:09                                             ` Martin KaFai Lau
2024-11-06  2:51                                               ` Jason Xing
2024-11-07  1:19                                                 ` Martin KaFai Lau
2024-11-07  3:31                                                   ` Jason Xing
2024-11-07 19:05                                                     ` Martin KaFai Lau
2024-11-06  1:11                                             ` Willem de Bruijn
2024-11-06  2:37                                               ` Jason Xing
2024-11-05 14:29                                       ` Willem de Bruijn
2024-11-02 13:43   ` Simon Horman
2024-11-03  0:42     ` Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 03/14] net-timestamp: open gate for bpf_setsockopt/_getsockopt Jason Xing
2024-10-29  0:59   ` Willem de Bruijn
2024-10-29  1:18     ` Jason Xing
2024-10-30  0:32   ` Martin KaFai Lau
2024-10-30  1:15     ` Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 04/14] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
2024-10-29  0:23   ` kernel test robot
2024-10-29  1:02   ` Willem de Bruijn
2024-10-29  1:30     ` Jason Xing
2024-10-29  1:04   ` kernel test robot
2024-10-28 11:05 ` [PATCH net-next v3 05/14] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 06/14] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
2024-10-29  1:03   ` Willem de Bruijn
2024-10-29  1:19     ` Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 07/14] net-timestamp: add a new triggered point to set sk_tsflags_bpf in UDP layer Jason Xing
2024-10-29  1:07   ` Willem de Bruijn
2024-10-29  1:23     ` Jason Xing
2024-10-29  1:33       ` Willem de Bruijn
2024-10-29  3:12         ` Jason Xing
2024-10-29 15:04           ` Willem de Bruijn
2024-10-29 15:44             ` Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 08/14] net-timestamp: make bpf for tx timestamp work Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 09/14] net-timestamp: add a common helper to set tskey Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 10/14] net-timestamp: add basic support with tskey offset Jason Xing
2024-10-29  1:24   ` Willem de Bruijn
2024-10-29  2:41     ` Jason Xing
2024-10-29 15:03       ` Willem de Bruijn
2024-10-29 15:50         ` Jason Xing
2024-10-29 19:45           ` Willem de Bruijn
2024-10-30  3:27             ` Jason Xing
2024-10-30  5:42   ` Martin KaFai Lau
2024-10-30  6:50     ` Jason Xing
2024-10-31  1:17       ` Martin KaFai Lau [this message]
2024-10-31  2:41         ` Jason Xing
2024-10-31  3:27           ` Jason Xing
2024-10-31  5:52           ` Martin KaFai Lau
2024-10-31  6:16             ` Jason Xing
2024-10-31 23:50           ` Martin KaFai Lau
2024-11-01  6:33             ` Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 11/14] net-timestamp: support OPT_ID for TCP proto Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 12/14] net-timestamp: add OPT_ID for UDP proto Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 13/14] net-timestamp: use static key to control bpf extension Jason Xing
2024-10-28 11:05 ` [PATCH net-next v3 14/14] bpf: add simple bpf tests in the tx path for so_timstamping feature Jason Xing
2024-10-29  1:26   ` Willem de Bruijn
2024-10-29  1:33     ` Jason Xing
2024-10-29  1:40       ` Willem de Bruijn
2024-10-29  3:13         ` Jason Xing
2024-10-30  5:57   ` Martin KaFai Lau
2024-10-30  6:54     ` Jason Xing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e56f78a9-cbda-4b80-8b55-c16b36e4efb1@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kerneljasonxing@gmail.com \
    --cc=kernelxing@tencent.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=ykolal@fb.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).