From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, dsahern@kernel.org, willemb@google.com,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, horms@kernel.org,
bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v8 01/12] bpf: add support for bpf_setsockopt()
Date: Wed, 5 Feb 2025 12:57:58 -0800 [thread overview]
Message-ID: <0f3159c8-d2fe-4fc0-8f09-260aa84e12bd@linux.dev> (raw)
In-Reply-To: <CAL+tcoA26mQQf-_0Ebi2qqd=qVn1soUoma-3o8NdUFGZL_-L2g@mail.gmail.com>
On 2/5/25 7:34 AM, Jason Xing wrote:
> On Wed, Feb 5, 2025 at 11:22 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>>
>> Jason Xing wrote:
>>> Users can write the following code to enable the bpf extension:
>>> int flags = SK_BPF_CB_TX_TIMESTAMPING;
>>> int opts = SK_BPF_CB_FLAGS;
>>> bpf_setsockopt(skops, SOL_SOCKET, opts, &flags, sizeof(flags));
>>>
>>> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
>>> ---
>>> include/net/sock.h | 3 +++
>>> include/uapi/linux/bpf.h | 8 ++++++++
>>> net/core/filter.c | 23 +++++++++++++++++++++++
>>> tools/include/uapi/linux/bpf.h | 1 +
>>> 4 files changed, 35 insertions(+)
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index 8036b3b79cd8..7916982343c6 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -303,6 +303,7 @@ struct sk_filter;
>>> * @sk_stamp: time stamp of last packet received
>>> * @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
>>> * @sk_tsflags: SO_TIMESTAMPING flags
>>> + * @sk_bpf_cb_flags: used in bpf_setsockopt()
>>> * @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
>>> * Sockets that can be used under memory reclaim should
>>> * set this to false.
>>> @@ -445,6 +446,8 @@ struct sock {
>>> u32 sk_reserved_mem;
>>> int sk_forward_alloc;
>>> u32 sk_tsflags;
>>> +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
>>> + u32 sk_bpf_cb_flags;
>>> __cacheline_group_end(sock_write_rxtx);
>>>
>>> __cacheline_group_begin(sock_write_tx);
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 2acf9b336371..6116eb3d1515 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -6913,6 +6913,13 @@ enum {
>>> BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
>>> };
>>>
>>> +/* Definitions for bpf_sk_cb_flags */
>>> +enum {
>>> + SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
>>> + SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
>>> + SK_BPF_CB_TX_TIMESTAMPING
>>> +};
>>> +
>>> /* List of known BPF sock_ops operators.
>>> * New entries can only be added at the end
>>> */
>>> @@ -7091,6 +7098,7 @@ enum {
>>> TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
>>> TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
>>> TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
>>> + SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
>>> };
>>>
>>> enum {
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 2ec162dd83c4..1c6c07507a78 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -5222,6 +5222,25 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
>>> .arg1_type = ARG_PTR_TO_CTX,
>>> };
>>>
>>> +static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
>>> +{
>>> + u32 sk_bpf_cb_flags;
>>> +
>>> + if (getopt) {
>>> + *(u32 *)optval = sk->sk_bpf_cb_flags;
>>> + return 0;
>>> + }
>>> +
>>> + sk_bpf_cb_flags = *(u32 *)optval;
>>> +
>>> + if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
>>> + return -EINVAL;
>>> +
>>> + sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
>>
>> I don't know BPF internals that well:
>>
>> Is there mutual exclusion between these sol_socket_sockopt calls?
Yep. There is a sock_owned_by_me() in the earlier code path of sol_socket_sockopt.
Another reader is at patch 11 tcp_tx_timestamp() which should have owned the
sock also.
>> Or do these sk field accesses need WRITE_ONCE/READ_ONCE.
> this potential data race issue, just in case bpf program doesn't use
> it as we expect, I think I will add the this annotation in v9.
Jason, it should not have data race issue in sol_socket_sockopt(). bpf program
cannot use the sol_socket_sockopt without a lock. Patch 4 was added exactly to
ensure that.
The situation is similar to the existing tcp_sk(sk)->bpf_sock_ops_cb_flags. It
is also a plain access such that it is clear all read/write is under the
sock_owned_by_me.
next prev parent reply other threads:[~2025-02-05 20:58 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-04 18:30 [PATCH bpf-next v8 00/12] net-timestamp: bpf extension to equip applications transparently Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 01/12] bpf: add support for bpf_setsockopt() Jason Xing
2025-02-05 15:22 ` Willem de Bruijn
2025-02-05 15:34 ` Jason Xing
2025-02-05 20:57 ` Martin KaFai Lau [this message]
2025-02-05 21:25 ` Willem de Bruijn
2025-02-04 18:30 ` [PATCH bpf-next v8 02/12] bpf: prepare for timestamping callbacks use Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 03/12] bpf: stop unsafely accessing TCP fields in bpf callbacks Jason Xing
2025-02-05 15:24 ` Willem de Bruijn
2025-02-05 15:35 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 04/12] bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks Jason Xing
2025-02-05 15:26 ` Willem de Bruijn
2025-02-05 15:50 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 05/12] net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING Jason Xing
2025-02-05 1:47 ` Jakub Kicinski
2025-02-05 2:40 ` Jason Xing
2025-02-05 3:14 ` Jakub Kicinski
2025-02-05 3:23 ` Jason Xing
2025-02-05 1:50 ` Jakub Kicinski
2025-02-05 15:34 ` Willem de Bruijn
2025-02-05 15:52 ` Jason Xing
2025-02-06 8:43 ` Jason Xing
2025-02-06 10:22 ` Jason Xing
2025-02-06 16:13 ` Willem de Bruijn
2025-02-07 0:22 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 06/12] bpf: support SCM_TSTAMP_SCHED " Jason Xing
2025-02-05 15:36 ` Willem de Bruijn
2025-02-05 15:55 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 07/12] bpf: support sw SCM_TSTAMP_SND " Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 08/12] bpf: support hw " Jason Xing
2025-02-05 15:45 ` Willem de Bruijn
2025-02-05 16:03 ` Jason Xing
2025-02-10 22:39 ` Martin KaFai Lau
2025-02-11 0:00 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 09/12] bpf: support SCM_TSTAMP_ACK " Jason Xing
2025-02-05 15:47 ` Willem de Bruijn
2025-02-05 16:06 ` Jason Xing
2025-02-05 21:25 ` Willem de Bruijn
2025-02-04 18:30 ` [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf extension work Jason Xing
2025-02-05 1:57 ` Jakub Kicinski
2025-02-05 2:15 ` Jason Xing
2025-02-05 21:57 ` Martin KaFai Lau
2025-02-06 0:12 ` Jason Xing
2025-02-06 0:42 ` Jason Xing
2025-02-06 0:47 ` Martin KaFai Lau
2025-02-06 1:05 ` Jason Xing
2025-02-06 2:39 ` Jason Xing
2025-02-06 2:56 ` Willem de Bruijn
2025-02-06 3:09 ` Jason Xing
2025-02-06 3:25 ` Willem de Bruijn
2025-02-06 3:41 ` Jason Xing
2025-02-06 6:12 ` Martin KaFai Lau
2025-02-06 6:56 ` Jason Xing
2025-02-07 2:07 ` Martin KaFai Lau
2025-02-07 2:18 ` Jason Xing
2025-02-07 12:07 ` Jason Xing
2025-02-08 2:11 ` Martin KaFai Lau
2025-02-08 6:53 ` Jason Xing
2025-02-07 13:34 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 11/12] bpf: add a new callback in tcp_tx_timestamp() Jason Xing
2025-02-05 5:28 ` Jason Xing
2025-02-04 18:30 ` [PATCH bpf-next v8 12/12] selftests/bpf: add simple bpf tests in the tx path for timestamping feature Jason Xing
2025-02-05 15:54 ` Willem de Bruijn
2025-02-05 16:08 ` Jason Xing
2025-02-06 1:28 ` Martin KaFai Lau
2025-02-06 2:14 ` Jason Xing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0f3159c8-d2fe-4fc0-8f09-260aa84e12bd@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=haoluo@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=willemb@google.com \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.