From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, dsahern@kernel.org, willemb@google.com,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
yonghong.song@linux.dev, john.fastabend@gmail.com,
kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
jolsa@kernel.org, shuah@kernel.org, ykolal@fb.com,
bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3 10/14] net-timestamp: add basic support with tskey offset
Date: Tue, 29 Oct 2024 15:45:39 -0400
Message-ID: <67213b62f4100_2f188c294b7@willemb.c.googlers.com.notmuch>
In-Reply-To: <CAL+tcoCDN+YSwXDocv9DcvPGW-sLhEfPHHbzcO2+1PBZFRkB0Q@mail.gmail.com>

> > > > > +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type)
> > > > > +{
> > > > > + u32 tskey;
> > > > > +
> > > > > + if (sk_is_tcp(sk)) {
> > > > > + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> > > > > + tskey = tcp_sk(sk)->write_seq;
> > > > > + else
> > > > > + tskey = tcp_sk(sk)->snd_una;
> > > > > + } else {
> > > > > + tskey = 0;
> > > > > + }
> > > > > +
> > > > > + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> > > > > + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey);
> > > > > + return 0;
> > > > > + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) {
> > > > > + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey;
> > > > > + } else {
> > > > > + sk->sk_tskey_bpf_offset = 0;
> > > > > + }
> > > > > +
> > > > > + return tskey;
> > > > > +}
> > > > > +
> > > > > int sock_set_tskey(struct sock *sk, int val, int bpf_type)
> > > > > {
> > > > > u32 tsflags = bpf_type ? sk->sk_tsflags_bpf : sk->sk_tsflags;
> > > > > @@ -901,17 +944,13 @@ int sock_set_tskey(struct sock *sk, int val, int bpf_type)
> > > > >
> > > > > if (val & SOF_TIMESTAMPING_OPT_ID &&
> > > > > !(tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> > > > > - if (sk_is_tcp(sk)) {
> > > > > - if ((1 << sk->sk_state) &
> > > > > - (TCPF_CLOSE | TCPF_LISTEN))
> > > > > - return -EINVAL;
> > > > > - if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
> > > > > - else
> > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
> > > > > - } else {
> > > > > - atomic_set(&sk->sk_tskey, 0);
> > > > > - }
> > > > > + long int ret;
> > > > > +
> > > > > + ret = sock_calculate_tskey_offset(sk, val, bpf_type);
> > > > > + if (ret <= 0)
> > > > > + return ret;
> > > > > +
> > > > > + atomic_set(&sk->sk_tskey, ret);
> > > > > }
> > > > >
> > > > > return 0;
> > > > > @@ -956,10 +995,15 @@ static int sock_set_timestamping_bpf(struct sock *sk,
> > > > > struct so_timestamping timestamping)
> > > > > {
> > > > > u32 flags = timestamping.flags;
> > > > > + int ret;
> > > > >
> > > > > if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK)
> > > > > return -EINVAL;
> > > > >
> > > > > + ret = sock_set_tskey(sk, flags, 1);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > WRITE_ONCE(sk->sk_tsflags_bpf, flags);
> > > > >
> > > > > return 0;
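For readers following the patch, the key-selection half of sock_calculate_tskey_offset() can be modelled in user space roughly as below. This is a sketch, not the kernel code: struct fake_sock, initial_tskey() and FAKE_OPT_ID_TCP are hypothetical stand-ins for struct sock, the TCP state check and SOF_TIMESTAMPING_OPT_ID_TCP. One design choice differs from the quoted version: the key is returned through an out-parameter rather than folded into a long int return value. With the quoted `ret <= 0` check, a key of 0 (always the case for UDP) takes the same early-return path as the bpf_type case and skips atomic_set(), and on 32-bit builds a u32 key above LONG_MAX converts to a negative long.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for SOF_TIMESTAMPING_OPT_ID_TCP. */
#define FAKE_OPT_ID_TCP (1u << 0)

/* Hypothetical user-space model of the fields the patch reads. */
enum fake_proto { FAKE_TCP, FAKE_UDP };
enum fake_state { FAKE_ESTABLISHED, FAKE_CLOSE, FAKE_LISTEN };

struct fake_sock {
	enum fake_proto proto;
	enum fake_state state;
	uint32_t write_seq;	/* models tcp_sk(sk)->write_seq */
	uint32_t snd_una;	/* models tcp_sk(sk)->snd_una */
};

/*
 * Pick the initial OPT_ID key the same way the quoted patch does.
 * Returns 0 on success or -EINVAL; the key itself is written to *tskey,
 * so every u32 value, including 0, is a valid result.
 */
static int initial_tskey(const struct fake_sock *sk, uint32_t flags,
			 uint32_t *tskey)
{
	if (sk->proto == FAKE_TCP) {
		/* The patch rejects closed and listening TCP sockets. */
		if (sk->state == FAKE_CLOSE || sk->state == FAKE_LISTEN)
			return -EINVAL;
		*tskey = (flags & FAKE_OPT_ID_TCP) ? sk->write_seq : sk->snd_una;
	} else {
		/* Non-TCP sockets start the key at zero. */
		*tskey = 0;
	}
	return 0;
}
```

For example, a UDP socket yields a key of 0 with a return value of 0, and an established TCP socket with FAKE_OPT_ID_TCP set yields its write_seq, even when that value is above 2^31.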
> > > >
> > > > I'm a bit hazy on when this can be called. We can assume that this new
> > > > BPF operation cannot race with the existing setsockopt nor with the
> > > > datapath that might touch the atomic fields, right?
> > >
> > > It surely can race with the existing setsockopt.
> > >
> > > 1)
> > > if (only the existing setsockopt is used) {
> > > then sk->sk_tskey is set through setsockopt, sk_tskey_bpf_offset is 0.
> > > }
> > >
> > > 2)
> > > if (only bpf_setsockopt is used) {
> > > then sk->sk_tskey is set through bpf_setsockopt,
> > > sk_tskey_bpf_offset is 0.
> > > }
> > >
> > > 3)
> > > if (the existing setsockopt was enabled first, and the bpf feature is enabled now) {
> > > then sk->sk_tskey does not change, but sk_tskey_bpf_offset
> > > is calculated.
> > > }
> > >
> > > 4)
> > > if (bpf_setsockopt was enabled first, and the application feature is enabled now) {
> > > then sk->sk_tskey is re-initialized/overridden by
> > > setsockopt, and sk_tskey_bpf_offset is calculated.
> > > }
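The four cases can be modelled in user space as a small state machine using the offset formulas from the quoted patch. This is a sketch under stated assumptions: struct ts_state and enable_opt_id() are hypothetical stand-ins for sk->sk_tskey, sk->sk_tskey_bpf_offset and the two tsflags words, each user is assumed to enable OPT_ID only once, and no packets are sent in between. The type of sk_tskey_bpf_offset is not shown in the quoted hunk, so uint32_t is assumed and all differences wrap modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the state the patch touches (uint32_t assumed). */
struct ts_state {
	uint32_t sk_tskey;	/* models atomic_read(&sk->sk_tskey) */
	uint32_t bpf_offset;	/* models sk->sk_tskey_bpf_offset */
	int app_opt_id;		/* OPT_ID set via setsockopt */
	int bpf_opt_id;		/* OPT_ID set via bpf_setsockopt */
};

/*
 * Enable OPT_ID for one user. bpf_type selects bpf_setsockopt (1) or
 * setsockopt (0); tskey is that user's initial key.
 */
static void enable_opt_id(struct ts_state *s, int bpf_type, uint32_t tskey)
{
	if (bpf_type && s->app_opt_id) {
		/* Case 3: keep sk_tskey, remember how far BPF's base is from it. */
		s->bpf_offset = tskey - s->sk_tskey;
	} else if (!bpf_type && s->bpf_opt_id) {
		/* Case 4: setsockopt overwrites sk_tskey; the offset keeps BPF's old base. */
		s->bpf_offset = s->sk_tskey - tskey;
		s->sk_tskey = tskey;
	} else {
		/* Cases 1 and 2: a single user owns sk_tskey, no offset. */
		s->bpf_offset = 0;
		s->sk_tskey = tskey;
	}

	if (bpf_type)
		s->bpf_opt_id = 1;
	else
		s->app_opt_id = 1;
}
```

With these formulas both orders end in the same state: sk_tskey holds the application's base and sk_tskey + bpf_offset equals the BPF base (mod 2^32), so the offset is always the BPF base minus the application base.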
>
> I will copy the above to the commit message next time in order to
> provide a clear design to future readers.
>
> > >
> > > The skb tskey then continues to be derived from sk->sk_tskey, as before.
> >
> > I mean a race in the sense of setsockopt, bpf setsockopt and the
> > datapath running concurrently.
> >
> > As long as both variants of setsockopt hold the socket lock, that
> > won't happen.
> >
> > The datapath is lockless for UDP, so the atomic_inc of sk_tskey can
> > race with calculating the difference. But this is a known issue. A
> > process that cares should not run setsockopt and send concurrently.
> > So this is fine too.
>
> Oh, now I see. Thanks for the detailed explanation! So do you think
> we need to take care of this in the future, I mean, after this series
> gets merged?

If there is a race condition, then that cannot be fixed up later.
But from my admittedly brief analysis, it seems that there is nothing
here that needs to be fixed: control plane operations (setsockopt)
hold the socket lock. A setsockopt that conflicts with a lockless
datapath update will have a slightly ambiguous offset. It is under the
control of the user, and up to them to avoid that if they care.
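The interleaving described here can be reproduced deterministically in user space. This sketch assumes C11 atomics as a stand-in for the kernel's atomic_t, and struct race_sock, race_sock_init(), udp_send_key() and bpf_enable_opt_id() are hypothetical names. Because setsockopt callers are serialized by the socket lock, only the lockless send path can interleave with the offset calculation, and the two possible orders give offsets that differ by one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two fields involved in the race. */
struct race_sock {
	_Atomic uint32_t sk_tskey;	/* advanced locklessly by the UDP send path */
	uint32_t bpf_offset;		/* written by setsockopt under the socket lock */
};

static void race_sock_init(struct race_sock *sk, uint32_t tskey)
{
	atomic_init(&sk->sk_tskey, tskey);
	sk->bpf_offset = 0;
}

/* Datapath: each send takes a key and advances the counter in one atomic op. */
static uint32_t udp_send_key(struct race_sock *sk)
{
	return atomic_fetch_add(&sk->sk_tskey, 1);
}

/*
 * Control plane, case 3: BPF enables OPT_ID while the application already
 * uses it. The socket lock orders this against other setsockopt calls,
 * but not against udp_send_key() running on another CPU.
 */
static void bpf_enable_opt_id(struct race_sock *sk, uint32_t bpf_base)
{
	sk->bpf_offset = bpf_base - atomic_load(&sk->sk_tskey);
}
```

On a socket whose key is 10, a send that lands just before bpf_enable_opt_id(sk, 0) yields an offset of 0 - 11, and one that lands just after yields 0 - 10 (both mod 2^32). That one-key difference is the ambiguity described above.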