From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, dsahern@kernel.org, willemb@google.com,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
yonghong.song@linux.dev, john.fastabend@gmail.com,
kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
jolsa@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently
Date: Wed, 9 Oct 2024 14:16:45 +0100 [thread overview]
Message-ID: <9c5b405c-9b3d-4c1f-b278-303fe24c7926@linux.dev> (raw)
In-Reply-To: <CAL+tcoD47VfZJFPJcQOgPsQuGA=jPfKU2548fJp2NBH14gEoHA@mail.gmail.com>
On 09/10/2024 12:48, Jason Xing wrote:
> On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
>> <vadim.fedorenko@linux.dev> wrote:
>>>
>>> On 09/10/2024 02:05, Jason Xing wrote:
>>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>>>
>>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
>>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>
>>>>>> Jason Xing wrote:
>>>>>>> From: Jason Xing <kernelxing@tencent.com>
>>>>>>>
>>>>>>> A few weeks ago, I planned to extend SO_TIMESTMAMPING feature by using
>>>>>>> tracepoint to print information (say, tstamp) so that we can
>>>>>>> transparently equip applications with this feature and require no
>>>>>>> modification in user side.
>>>>>>>
>>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
>>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
>>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
>>>>>>> better solution to extend.
>>>>>>>
>>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
>>>>>>> users only needs to pass certain flags through bpf program to make sure
>>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
>>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
>>>>>>> and wait for the moment when recvmsg() is called.
>>>>>>
>>>>>> As you mention, overall I am very supportive of having a way to add
>>>>>> timestamping by adminstrators, without having to rebuild applications.
>>>>>> BPF hooks seem to be the right place for this.
>>>>>>
>>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
>>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
>>>>>> set, and correlation between measurements for the same data in the
>>>>>> stream.
>>>>>>
>>>>>>> After this series, we could step by step implement more advanced
>>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>>>>>>
>>>>>> My main implementation concern is where this API overlaps with the
>>>>>> existing user API, and how they might conflict. A few questions in the
>>>>>> patches.
>>>>>
>>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
>>>>> related experts' help.
>>>>>
>>>>> How to deal with it without interfering with the existing apps in the
>>>>> right way is the key problem.
>>>>
>>>> What I try to implement is let the bpf program have the highest
>>>> precedence. It's similar to RTO min, see the commit as an example:
>>>>
>>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
>>>> Author: Kevin Yang <yyd@google.com>
>>>> Date: Mon Jun 3 21:30:54 2024 +0000
>>>>
>>>> tcp: add sysctl_tcp_rto_min_us
>>>>
>>>> Adding a sysctl knob to allow user to specify a default
>>>> rto_min at socket init time, other than using the hard
>>>> coded 200ms default rto_min.
>>>>
>>>> Note that the rto_min route option has the highest precedence
>>>> for configuring this setting, followed by the TCP_BPF_RTO_MIN
>>>> socket option, followed by the tcp_rto_min_us sysctl.
>>>>
>>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
>>>> The first priority can override others. It doesn't have a good
>>>> chance/point to restore the icsk_rto_min field if users want to
>>>> shutdown the bpf program because it is set in
>>>> bpf_sol_tcp_setsockopt().
>>>
>>> rto_min example is slightly different. With tcp_rto_min the doesn't
>>> expect any data to come back to user space while for timestamping the
>>> app may be confused directly by providing more data, or by not providing
>>> expected data. I believe some hint about requestor of the data is needed
>>> here. It will also help to solve the problem of populating sk_err_queue
>>> mentioned by Martin.
>>
>> Sorry, I don't fully get it. In this patch series, this bpf extension
>> feature will not rely on sk_err_queue any more to report tx timestamps
>> to userspace. Bpf program can do that printing.
>>
>> Do you mean that it could be wrong if one skb carries the tsflags that
>> are previously set due to the bpf program and then suddenly users
>> detach the program? It indeed will put a new/cloned skb into the error
>> queue. Interesting corner case. It seems I have to re-implement a
>> totally independent tsflags for bpf extension feature. Do you have a
>> better idea on this?
>
> I feel that if I could introduce bpf new flags like
> SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
> series, then it will not populate skb in sk_err_queue even users
> remove the bpf program all of sudden. With this kind of specific bpf
> flags, we can also avoid conflicting with the apps using
> SO_TIEMSTAMPING feature. Let me give it a shot unless a better
> solution shows up.
It doesn't look great to have duplicate flags just to indicate that this
particular timestamp was asked by a bpf program, even though it looks
like a straight forward solution. Sounds like we have to re-think the
interface for timestamping requests, but I don't have proper suggestion
right now.
next prev parent reply other threads:[~2024-10-09 13:16 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-08 9:51 [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Jason Xing
2024-10-08 9:51 ` [PATCH net-next 1/9] net-timestamp: add bpf infrastructure to allow exposing more information later Jason Xing
2024-10-08 18:45 ` Willem de Bruijn
2024-10-08 23:27 ` Jason Xing
2024-10-09 13:22 ` Willem de Bruijn
2024-10-09 13:57 ` Jason Xing
2024-10-09 0:58 ` Kuniyuki Iwashima
2024-10-09 8:11 ` Jason Xing
2024-10-08 9:51 ` [PATCH net-next 2/9] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp Jason Xing
2024-10-08 9:51 ` [PATCH net-next 3/9] net-timestamp: introduce TS_SW_OPT_CB to generate driver timestamp Jason Xing
2024-10-08 19:13 ` Vadim Fedorenko
2024-10-08 23:08 ` Jason Xing
2024-10-08 9:51 ` [PATCH net-next 4/9] net-timestamp: introduce TS_ACK_OPT_CB to generate tcp acked timestamp Jason Xing
2024-10-08 9:51 ` [PATCH net-next 5/9] net-timestamp: ready to turn on the button to generate tx timestamps Jason Xing
2024-10-08 18:53 ` Willem de Bruijn
2024-10-08 23:37 ` Jason Xing
2024-10-08 19:18 ` Vadim Fedorenko
2024-10-08 23:48 ` Jason Xing
2024-10-09 9:16 ` Vadim Fedorenko
2024-10-09 11:15 ` Jason Xing
2024-10-08 9:51 ` [PATCH net-next 6/9] net-timestamp: add tx OPT_ID_TCP support for bpf case Jason Xing
2024-10-08 18:56 ` Willem de Bruijn
2024-10-08 23:18 ` Jason Xing
2024-10-09 13:19 ` Willem de Bruijn
2024-10-09 13:52 ` Jason Xing
2024-10-08 9:51 ` [PATCH net-next 7/9] net-timestamp: open gate for bpf_setsockopt Jason Xing
2024-10-09 7:19 ` Martin KaFai Lau
2024-10-09 8:09 ` Jason Xing
2024-10-09 13:23 ` Willem de Bruijn
2024-10-09 13:48 ` Jason Xing
2024-10-08 9:51 ` [PATCH net-next 8/9] net-timestamp: add bpf framework for rx timestamps Jason Xing
2024-10-09 0:22 ` Jakub Kicinski
2024-10-09 0:30 ` Jason Xing
2024-10-09 2:33 ` kernel test robot
2024-10-09 4:17 ` kernel test robot
2024-10-09 5:09 ` kernel test robot
2024-10-08 9:51 ` [PATCH net-next 9/9] net-timestamp: add bpf support for rx software/hardware timestamp Jason Xing
2024-10-08 18:44 ` [PATCH net-next 0/9] net-timestamp: bpf extension to equip applications transparently Willem de Bruijn
2024-10-08 23:22 ` Jason Xing
2024-10-09 1:05 ` Jason Xing
2024-10-09 9:27 ` Vadim Fedorenko
2024-10-09 11:12 ` Jason Xing
2024-10-09 11:48 ` Jason Xing
2024-10-09 13:16 ` Vadim Fedorenko [this message]
2024-10-09 13:47 ` Jason Xing
2024-10-09 13:58 ` Vadim Fedorenko
2024-10-09 14:35 ` Jason Xing
2024-10-09 14:59 ` Vadim Fedorenko
2024-10-09 15:20 ` Jason Xing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9c5b405c-9b3d-4c1f-b278-303fe24c7926@linux.dev \
--to=vadim.fedorenko@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=kernelxing@tencent.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=willemb@google.com \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox