From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
willemb@google.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, shuah@kernel.org,
ykolal@fb.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to work parallelly
Date: Wed, 6 Nov 2024 17:19:36 -0800
Message-ID: <49ad2b87-29af-429e-8acb-2bba13e2b2aa@linux.dev>
In-Reply-To: <CAL+tcoBzces5_awOzZsyqpTWjk0moxkjj7kHjCtPcsU3kNJ4tg@mail.gmail.com>

On 11/5/24 6:51 PM, Jason Xing wrote:
> On Wed, Nov 6, 2024 at 9:09 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 11/5/24 4:17 PM, Jason Xing wrote:
>>> On Wed, Nov 6, 2024 at 3:22 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 11/4/24 10:22 PM, Jason Xing wrote:
>>>>> On Tue, Nov 5, 2024 at 10:09 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>>>
>>>>>> On 11/1/24 6:32 AM, Willem de Bruijn wrote:
>>>>>>>> In udp/raw/..., I don't know how likely is the user space having "cork->tx_flags
>>>>>>>> & SKBTX_ANY_TSTAMP" set but has neither "READ_ONCE(sk->sk_tsflags) &
>>>>>>>> SOF_TIMESTAMPING_OPT_ID" nor "cork->flags & IPCORK_TS_OPT_ID" set.
>>>>>>> This is not something to rely on. OPT_ID was added relatively recently.
>>>>>>> Older applications, or any that just use the most straightforward API,
>>>>>>> will not set this.
>>>>>>
>>>>>> Good point that the OPT_ID per cmsg is very new.
>>>>>>
>>>>>> The datagram support on SOF_TIMESTAMPING_OPT_ID in sk->sk_tsflags had
>>>>>> been there for quite some time now. Is it a safe assumption that
>>>>>> most applications doing udp tx timestamping should have
>>>>>> the SOF_TIMESTAMPING_OPT_ID set to be useful?
>>>>>>
>>>>>>>
>>>>>>>> If it is
>>>>>>>> unlikely, maybe we can just disallow bpf prog from directly setting
>>>>>>>> skb_shinfo(skb)->tskey for this particular skb.
>>>>>>>>
>>>>>>>> For all other cases, in __ip[6]_append_data, directly call a bpf prog and also
>>>>>>>> pass the kernel decided tskey to the bpf prog.
>>>>>>>>
>>>>>>>> The kernel passed tskey could be 0 (meaning the user space has not used it). The
>>>>>>>> bpf prog can give one for the kernel to use. The bpf prog can store the
>>>>>>>> sk_tskey_bpf in the bpf_sk_storage now. Meaning no need to add one to the struct
>>>>>>>> sock. The bpf prog does not have to start from 0 (e.g. start from U32_MAX
>>>>>>>> instead) if it helps.
>>>>>>>>
>>>>>>>> If the kernel passed tskey is not 0, the bpf prog can just use that one
>>>>>>>> (assuming the user space is doing something sane, like the value in
>>>>>>>> SCM_TS_OPT_ID won't be jumping back and forth between 0 and U32_MAX). I hope this
>>>>>>>> is very unlikely also (?) but the bpf prog can probably detect this and choose
>>>>>>>> to ignore this sk.
>>>>>>> If an application uses OPT_ID, it is unlikely that they will toggle
>>>>>>> the feature on and off on a per-packet basis. So in the common case
>>>>>>> the program could use the user-set counter or use its own if userspace
>>>>>>> does not enable the feature. In the rare case that an application does
>>>>>>> intermittently set an OPT_ID, the numbering would be erratic. This
>>>>>>> does mean that an actively malicious application could mess with admin
>>>>>>> measurements.
>>>>>>
>>>>>> All makes sense. It is reasonable to assume that the user space either
>>>>>> has SOF_TIMESTAMPING_OPT_ID always on or always off. When it is off, the bpf
>>>>>> prog can directly provide its own tskey to be used in shinfo->tskey. The bpf
>>>>>> prog can generate the id itself without using the sk->sk_tskey, e.g. store an
>>>>>> atomic int in the bpf_sk_storage.
>>>>>
>>>>> I wonder, how can we correlate the key with each skb in the bpf
>>>>> program for non-TCP type without implementing a bpf extension for
>>>>> SCM_TS_OPT_ID? Every time the timestamp is reported, we cannot know
>>>>> which sendmsg() the skb belongs to for non-TCP cases.
>>>>
>>>> SCM_TS_OPT_ID is eventually setting the shinfo->tskey.
>>>> If the shinfo->tskey is not set by the user space, the bpf prog can directly set
>>>> the shinfo->tskey. There is no need to use the sk->sk_tskey as the ID generator
>>>> also. The bpf prog can have its own id generator.
>>>>
>>>> If the user space has already set the shinfo->tskey (either by sk->sk_tskey or
>>>> SCM_TS_OPT_ID), the bpf prog can just use the user space one.
>>>>
>>>> If there is a weird application that flip-flops between OPT_ID on/off, the bpf
>>>> prog will get confused, which is fine. The bpf prog can detect this and choose to
>>>> ignore measuring this sk/skb. The bpf prog can also choose to be on the very
>>>> safe side and ignore all skb with SKBTX_ANY_TSTAMP set in txflags but with no
>>>> OPT_ID. The bpf prog can look into the details of the sk and skb to decide what
>>>> makes the most sense for its deployment.
>>>>
>>>> I don't know whether it makes more sense to call the bpf prog to decide the
>>>> shinfo->{tx_flags,tskey} just before the "while (length > 0)" in
>>>> __ip[6]_append_data or it is better to call the bpf prog in ip[6]_setup_cork.
>>>> I am admittedly less familiar with this code path than the tcp one.
>>>
>>> Now I feel it could be complicated for a software engineer to consider
>>> how they will handle the key if they don't read the kernel code very
>>> carefully. They are facing different situations. Being user-friendly
>>> lets this feature have more chances to get widely used. As I insisted
>>> before, I still would like to know if it is possible that we can try
>>> to introduce sk_tskey_bpf_offset (like patch 10-12) to calculate a bpf
>>> exclusive tskey for bpf use? Only exporting one key. It will be really
>>> simple and easy-to-use :)
>>
>> imo, there is no need for adding sk_tskey_bpf_offset to sk. just allow the bpf
>> prog to decide what is the tskey.
>>
>> There is no usability issue in bpf prog. It is pretty normal for a bpf prog
>> author to look at the sk details to make decision.
>>
>> Abstracting the sk/skb is not helping the bpf prog and not the right direction
>> to go. Over time, there has been case after case where the bpf prog wants to know
>> more instead of being abstracted away like running in the user space. e.g. The
>> "struct bpf_sock" abstraction in the uapi/linux/bpf.h does not scale and we have
>> stopped adding more abstraction this way. The btf (and PTR_TO_BTF_ID,
>> CO-RE...etc) has been added to allow the bpf prog to learn other details in sk
>> and skb.
>>
>> Instead, design a better bpf kfunc to help the bpf prog to set the bits/tskey in
>> the skb. I think this is more important. tcp tskey is easy. just need some care
>> on the udp tskey and need to check if the user space has already set one.
>> A well-designed bpf kfunc is all it needs.
>
> Thanks!
>
> Let me confirm again in case I'm missing something important.
> 1) For tcp, as you said before, bpf prog can extract the seq from the
> exported skb, so I don't need to export any key in this case.
> 2) For udp, if the skb has skb_shinfo(skb)->tskey set, then export the
> key, else, export zero to the bpf program.

A follow up to myself on the earlier bpf kfunc comment. Something like this:

/* ack: request ACK timestamp (tcp only)
 * req_tskey: bpf prog can request to use a particular tskey.
 *            req_tskey should always be 0 for tcp.
 * return: -ve for error. u32 for the tskey that the bpf prog should use.
 *         may be different from the req_tskey (e.g. the user space has
 *         already set one).
 */
__bpf_kfunc s64 bpf_skops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
                                           bool ack, u32 req_tskey);
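
To make the return contract above concrete, here is a minimal user-space sketch of it. This is only a model, not kernel code: struct tx_tstamp_model and model_enable_tx_tstamp() are hypothetical names, and returning -EINVAL for a non-zero req_tskey on tcp or for an ACK request on udp is an assumption made for illustration. It shows why the return type is wider than u32 (every u32 key, including 0 and U32_MAX, stays distinguishable from an error) and that a key already set by the user space wins over the requested one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet state for the model; not a kernel structure. */
struct tx_tstamp_model {
	bool is_tcp;        /* true for tcp, false for udp/raw */
	bool has_tskey;     /* a key was already chosen, e.g. by user space */
	uint32_t tskey;     /* meaningful for udp only when has_tskey is true */
	bool ack_requested; /* ACK timestamp requested (tcp only) */
};

/*
 * Models the proposed contract: a negative errno on error, otherwise the
 * tskey the caller should use (0..U32_MAX). The -EINVAL cases are
 * assumptions for this sketch and are not taken from the thread.
 */
static int64_t model_enable_tx_tstamp(struct tx_tstamp_model *st, bool ack,
				      uint32_t req_tskey)
{
	if (st->is_tcp && req_tskey != 0)
		return -EINVAL; /* req_tskey should always be 0 for tcp */
	if (!st->is_tcp && ack)
		return -EINVAL; /* ACK timestamps are tcp only */

	if (!st->is_tcp && !st->has_tskey) {
		/* Nothing set yet for udp: adopt the requested key. */
		st->tskey = req_tskey;
		st->has_tskey = true;
	}

	st->ack_requested = ack;
	/* May differ from req_tskey when a key was already set. */
	return (int64_t)st->tskey;
}
```

A caller would treat any negative return as an error and otherwise use the returned value as the key, whether or not it matches what was requested.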

/* "not sure" if this kfunc is needed. probably no. I think it is easier to pass
 * true/false in the args[0]. It seems tskey can be 0 in udp, so
 * passing tskey can't tell if the skb/cork/sockcm_cookie has the tskey.
 */
__bpf_kfunc bool bpf_skops_has_tskey(struct bpf_sock_ops_kern *skops);

For udp, I don't know whether it will be easier to set the tskey in the 'cork',
the 'sockcm_cookie', or the 'skb'. I guess it depends on where the bpf prog is
called. If it is the skb, it seems the bpf prog may be called repeatedly to do
the same thing in the while loop in __ip[6]_append_data. If it is better to set
it in the 'cork' or 'sockcm_cookie', the cork/sockcm_cookie pointer can be added
to 'struct bpf_sock_ops_kern'. sizeof(struct bpf_sock_ops_kern) is at 64 bytes,
so adding one more pointer is not ideal. It can probably be put in a union with
syn_skb, but that will need some code audit (so please check).
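
As a rough illustration of the size trade-off, the sketch below compares appending a pointer with overlaying it on syn_skb. The structs are simplified stand-ins with an assumed field list, not the real 'struct bpf_sock_ops_kern', and every name here (skops_base, skops_appended, skops_union, cork_model, sk_buff_model) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_buff_model; /* opaque, hypothetical stand-in for an skb */
struct cork_model;    /* opaque, hypothetical stand-in for cork/sockcm_cookie */

/* Simplified stand-in for the existing layout (field list is an assumption). */
struct skops_base {
	void *sk;
	uint32_t args[4];
	struct sk_buff_model *syn_skb;
	struct sk_buff_model *skb;
	void *skb_data_end;
	uint64_t temp;
	uint8_t op;
	uint8_t is_fullsock;
	uint8_t remaining_opt_len;
};

/* Option A: append a new pointer, so the struct grows. */
struct skops_appended {
	void *sk;
	uint32_t args[4];
	struct sk_buff_model *syn_skb;
	struct sk_buff_model *skb;
	void *skb_data_end;
	struct cork_model *cork;
	uint64_t temp;
	uint8_t op;
	uint8_t is_fullsock;
	uint8_t remaining_opt_len;
};

/* Option B: share storage with syn_skb, so the size stays the same. */
struct skops_union {
	void *sk;
	uint32_t args[4];
	union {
		struct sk_buff_model *syn_skb;
		struct cork_model *cork;
	};
	struct sk_buff_model *skb;
	void *skb_data_end;
	uint64_t temp;
	uint8_t op;
	uint8_t is_fullsock;
	uint8_t remaining_opt_len;
};
```

The union only works if no code path needs both members at the same time, which is what the code audit would have to confirm.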

> 3) extend SCM_TS_OPT_ID for the udp/bpf case.

I don't understand. What does it mean to extend SCM_TS_OPT_ID?

> I'm not sure if I should postpone implementing this part after the
> basic framework of this series gets merged. Anyway, I will try this :)