BPF List
 help / color / mirror / Atom feed
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, dsahern@kernel.org, willemb@google.com,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
	yonghong.song@linux.dev, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
	jolsa@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
Date: Wed, 9 Oct 2024 15:59:44 +0100	[thread overview]
Message-ID: <662873cb-a897-464e-bdb3-edf01363c3b2@linux.dev> (raw)
In-Reply-To: <CAL+tcoCX4ayowenaT9pBTqGzKQ=pH9BdRPa=1QB2PiJ=+yFxSg@mail.gmail.com>

On 09/10/2024 15:35, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 9:58 PM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 09/10/2024 14:47, Jason Xing wrote:
>>> On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko
>>> <vadim.fedorenko@linux.dev> wrote:
>>>>
>>>> On 09/10/2024 12:48, Jason Xing wrote:
>>>>> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
>>>>>> <vadim.fedorenko@linux.dev> wrote:
>>>>>>>
>>>>>>> On 09/10/2024 02:05, Jason Xing wrote:
>>>>>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>>>>>>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Jason Xing wrote:
>>>>>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>>>>>
>>>>>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>>>>>>>>> tracepoint to print information (say, tstamp) so that we can
>>>>>>>>>>> transparently equip applications with this feature and require no
>>>>>>>>>>> modification in user side.
>>>>>>>>>>>
>>>>>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>>>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>>>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>>>>>>>>> better solution to extend.
>>>>>>>>>>>
>>>>>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>>>>>>>>> users only needs to pass certain flags through bpf program to make sure
>>>>>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>>>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>>>>>>>>> and wait for the moment when recvmsg() is called.
>>>>>>>>>>
>>>>>>>>>> As you mention, overall I am very supportive of having a way to add
>>>>>>>>>> timestamping by adminstrators, without having to rebuild applications.
>>>>>>>>>> BPF hooks seem to be the right place for this.
>>>>>>>>>>
>>>>>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>>>>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>>>>>>>>> set, and correlation between measurements for the same data in the
>>>>>>>>>> stream.
>>>>>>>>>>
>>>>>>>>>>> After this series, we could step by step implement more advanced
>>>>>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>>>>>>>>
>>>>>>>>>> My main implementation concern is where this API overlaps with the
>>>>>>>>>> existing user API, and how they might conflict. A few questions in the
>>>>>>>>>> patches.
>>>>>>>>>
>>>>>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
>>>>>>>>> related experts' help.
>>>>>>>>>
>>>>>>>>> How to deal with it without interfering with the existing apps in the
>>>>>>>>> right way is the key problem.
>>>>>>>>
>>>>>>>> What I try to implement is let the bpf program have the highest
>>>>>>>> precedence. It's similar to RTO min, see the commit as an example:
>>>>>>>>
>>>>>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
>>>>>>>> Author: Kevin Yang <yyd@google.com>
>>>>>>>> Date:   Mon Jun 3 21:30:54 2024 +0000
>>>>>>>>
>>>>>>>>         tcp: add sysctl_tcp_rto_min_us
>>>>>>>>
>>>>>>>>         Adding a sysctl knob to allow user to specify a default
>>>>>>>>         rto_min at socket init time, other than using the hard
>>>>>>>>         coded 200ms default rto_min.
>>>>>>>>
>>>>>>>>         Note that the rto_min route option has the highest precedence
>>>>>>>>         for configuring this setting, followed by the TCP_BPF_RTO_MIN
>>>>>>>>         socket option, followed by the tcp_rto_min_us sysctl.
>>>>>>>>
>>>>>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
>>>>>>>> The first priority can override others. It doesn't have a good
>>>>>>>> chance/point to restore the icsk_rto_min field if users want to
>>>>>>>> shutdown the bpf program because it is set in
>>>>>>>> bpf_sol_tcp_setsockopt().
>>>>>>>
>>>>>>> rto_min example is slightly different. With tcp_rto_min the doesn't
>>>>>>> expect any data to come back to user space while for timestamping the
>>>>>>> app may be confused directly by providing more data, or by not providing
>>>>>>> expected data. I believe some hint about requestor of the data is needed
>>>>>>> here. It will also help to solve the problem of populating sk_err_queue
>>>>>>> mentioned by Martin.
>>>>>>
>>>>>> Sorry, I don't fully get it. In this patch series, this bpf extension
>>>>>> feature will not rely on sk_err_queue any more to report tx timestamps
>>>>>> to userspace. Bpf program can do that printing.
>>>>>>
>>>>>> Do you mean that it could be wrong if one skb carries the tsflags that
>>>>>> are previously set due to the bpf program and then suddenly users
>>>>>> detach the program? It indeed will put a new/cloned skb into the error
>>>>>> queue. Interesting corner case. It seems I have to re-implement a
>>>>>> totally independent tsflags for bpf extension feature. Do you have a
>>>>>> better idea on this?
>>>>>
>>>>> I feel that if I could introduce bpf new flags like
>>>>> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
>>>>> series, then it will not populate skb in sk_err_queue even users
>>>>> remove the bpf program all of sudden. With this kind of specific bpf
>>>>> flags, we can also avoid conflicting with the apps using
>>>>> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
>>>>> solution shows up.
>>>>
>>>> It doesn't look great to have duplicate flags just to indicate that this
>>>> particular timestamp was asked by a bpf program, even though it looks
>>>
>>> Or introduce a new field in struct sock or struct sk_buff so that
>>> existing SOF_TIMESTAMPING_* can be reused.
>>
>> Well, I was thinking about this way. We can potentially add an array of
>> tsflags meaning the index of the array is the requestor. That will be
>> more flexible in terms of adding new requestor (like scheduler or
>> congestion control algo) if needed. But it comes with increased memory
>> usage on hot path which might be a blocker.
> 
> Is the following code snippet what you expect? But I wonder why not
> just add a u32 field instead and then use each bit of it defined in
> include/uapi/linux/net_tstamp.h?
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index b32f1424ecc5..4677f53da75a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -445,6 +445,7 @@ struct sock {
>          u32                     sk_reserved_mem;
>          int                     sk_forward_alloc;
>          u32                     sk_tsflags;
> +       u32                     new_tsflags[10];
>          __cacheline_group_end(sock_write_rxtx);
> 
>          __cacheline_group_begin(sock_write_tx);
> 
> I could be missing something. Sorry. If possible, could you show me
> some code snippets?
> 
> As for the new requestor, IIUC, do you want to add more tx timestamp
> generating points in the future?

It's more like this:

diff --git a/include/net/sock.h b/include/net/sock.h
index c58ca8dd561b..93f931dcc4cc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -234,6 +234,14 @@ struct sock_common {
  struct bpf_local_storage;
  struct sk_filter;

+enum {
+       SOCKETOPT_TS_REQUESTOR = 0,
+       CMSG_TS_REQUESTOR,
+       BPFPROG_TS_REQUESTOR,
+
+       __MAX_TS_REQUESTOR,
+};
+
  /**
    *    struct sock - network layer representation of sockets
    *    @__sk_common: shared layout with inet_timewait_sock
@@ -444,7 +452,7 @@ struct sock {
         socket_lock_t           sk_lock;
         u32                     sk_reserved_mem;
         int                     sk_forward_alloc;
-       u32                     sk_tsflags;
+       u32                     sk_tsflags[__MAX_TS_REQUESTOR];
         __cacheline_group_end(sock_write_rxtx);

         __cacheline_group_begin(sock_write_tx);


And use existing SOF_TIMESTAMPING_* for each element in the array. Not
sure that struct sock is the best place though, as some timestamping
requests may be on per-packet basis for protocols other than TCP.

Again, I'm just thinking out loud, kinda wild idea.


  reply	other threads:[~2024-10-09 14:59 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-08  9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
2024-10-08  9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
2024-10-08 18:45   ` Willem de Bruijn
2024-10-08 23:27     ` Jason Xing
2024-10-09 13:22       ` Willem de Bruijn
2024-10-09 13:57         ` Jason Xing
2024-10-09  0:58   ` Kuniyuki Iwashima
2024-10-09  8:11     ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
2024-10-08  9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
2024-10-08 19:13   ` Vadim Fedorenko
2024-10-08 23:08     ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
2024-10-08  9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
2024-10-08 18:53   ` Willem de Bruijn
2024-10-08 23:37     ` Jason Xing
2024-10-08 19:18   ` Vadim Fedorenko
2024-10-08 23:48     ` Jason Xing
2024-10-09  9:16       ` Vadim Fedorenko
2024-10-09 11:15         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
2024-10-08 18:56   ` Willem de Bruijn
2024-10-08 23:18     ` Jason Xing
2024-10-09 13:19       ` Willem de Bruijn
2024-10-09 13:52         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
2024-10-09  7:19   ` Martin KaFai Lau
2024-10-09  8:09     ` Jason Xing
2024-10-09 13:23       ` Willem de Bruijn
2024-10-09 13:48         ` Jason Xing
2024-10-08  9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
2024-10-09  0:22   ` Jakub Kicinski
2024-10-09  0:30     ` Jason Xing
2024-10-09  2:33   ` kernel test robot
2024-10-09  4:17   ` kernel test robot
2024-10-09  5:09   ` kernel test robot
2024-10-08  9:51 ` [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp Jason Xing
2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
2024-10-08 23:22   ` Jason Xing
2024-10-09  1:05     ` Jason Xing
2024-10-09  9:27       ` Vadim Fedorenko
2024-10-09 11:12         ` Jason Xing
2024-10-09 11:48           ` Jason Xing
2024-10-09 13:16             ` Vadim Fedorenko
2024-10-09 13:47               ` Jason Xing
2024-10-09 13:58                 ` Vadim Fedorenko
2024-10-09 14:35                   ` Jason Xing
2024-10-09 14:59                     ` Vadim Fedorenko [this message]
2024-10-09 15:20                       ` Jason Xing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=662873cb-a897-464e-bdb3-edf01363c3b2@linux.dev \
    --to=vadim.fedorenko@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kerneljasonxing@gmail.com \
    --cc=kernelxing@tencent.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox