From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, dsahern@kernel.org,
willemdebruijn.kernel@gmail.com, willemb@google.com,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, horms@kernel.org,
bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v5 13/15] net-timestamp: support tcp_sendmsg for bpf extension
Date: Wed, 15 Jan 2025 16:03:53 -0800
Message-ID: <5d9ba064-3288-4926-b9dc-3119bb3404c1@linux.dev>
In-Reply-To: <20250112113748.73504-14-kerneljasonxing@gmail.com>
On 1/12/25 3:37 AM, Jason Xing wrote:
> Introduce tskey_bpf to correlate tcp_sendmsg timestamp with other
> three points (SND/SW/ACK). More details can be found in the
> selftest.
>
> For TCP, tskey_bpf is used to store the initial write_seq value
> the moment tcp_sendmsg is called, so that the last skb of this
> call will have the same tskey_bpf with tcp_sendmsg bpf callback.
>
> UDP works similarly because tskey_bpf can increase by one everytime
> udp_sendmsg gets called. It will be implemented soon.
>
> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> include/linux/skbuff.h | 2 ++
> include/uapi/linux/bpf.h | 3 +++
> net/core/sock.c | 3 ++-
> net/ipv4/tcp.c | 10 ++++++++--
> tools/include/uapi/linux/bpf.h | 3 +++
> 5 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d3ef8db94a94..3b7b470d5d89 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -609,6 +609,8 @@ struct skb_shared_info {
> };
> unsigned int gso_type;
> u32 tskey;
> + /* For TCP, it records the initial write_seq when sendmsg is called */
> + u32 tskey_bpf;
I would suggest removing this tskey_bpf addition to skb_shared_info. My
understanding is that the intention is to measure the delay spent in
tcp_sendmsg_locked(). I think this can be done with bpf_sk_storage. More below.
>
> /*
> * Warning : all fields before dataref are cleared in __alloc_skb()
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a0aff1b4eb61..87420c0f2235 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7037,6 +7037,9 @@ enum {
> * feature is on. It indicates the
> * recorded timestamp.
> */
> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> + * syscall is triggered
> + */
Will UDP need this as well?
> };
>
> /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 2f54e60a50d4..e74ab0e2979d 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -958,7 +958,8 @@ void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> if (sk_is_tcp(sk) && sk_fullsock(sk))
> sock_ops.is_fullsock = 1;
> sock_ops.sk = sk;
> - bpf_skops_init_skb(&sock_ops, skb, 0);
> + if (skb)
> + bpf_skops_init_skb(&sock_ops, skb, 0);
> sock_ops.timestamp_used = 1;
> __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> }
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0a41006b10d1..b6e0db5e4ead 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -477,7 +477,7 @@ void tcp_init_sock(struct sock *sk)
> }
> EXPORT_SYMBOL(tcp_init_sock);
>
> -static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc, u32 first_write_seq)
> {
> struct sk_buff *skb = tcp_write_queue_tail(sk);
> u32 tsflags = sockc->tsflags;
> @@ -500,6 +500,7 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> tcb->txstamp_ack_bpf = 1;
> shinfo->tx_flags |= SKBTX_BPF;
> shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
Add the bpf prog callout here instead:

	bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_TCP_SND_CB);

If the bpf prog wants to figure out the delay from the very beginning of
tcp_sendmsg_locked(), a bpf prog (either one tracing tcp_sendmsg_locked() or one
run from a new callout at the beginning of tcp_sendmsg_locked(), as in this
patch) can store a bpf_ktime_get_ns() value in bpf_sk_storage. The bpf prog
running here (at tcp_tx_timestamp()) can then read that timestamp back from
bpf_sk_storage, since it holds the same sk pointer. There is no need to add a
new shinfo->tskey_bpf to measure this part of the delay.
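
The storage pattern described above can be sketched as a small userspace model. This is not kernel or BPF code: `struct sock_model`, `struct sendmsg_start`, `storage_get()`, `on_sendmsg_entry()` and `on_tx_timestamp()` are hypothetical names made up for illustration, and the fixed-size array only stands in for per-socket storage keyed by the sk pointer. Timestamps are passed in by the caller rather than read from a clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock; only its address is used as a key. */
struct sock_model {
	int id;
};

/* Hypothetical per-socket value, playing the role of a bpf_sk_storage entry. */
struct sendmsg_start {
	const struct sock_model *sk;	/* owner of this slot, NULL when free */
	uint64_t start_ns;		/* time recorded at sendmsg entry */
};

#define MAX_SOCKS 8
static struct sendmsg_start storage[MAX_SOCKS];

/* Find the slot for sk; if absent and create is nonzero, claim a free one. */
static struct sendmsg_start *storage_get(const struct sock_model *sk, int create)
{
	struct sendmsg_start *free_slot = NULL;

	if (!sk)
		return NULL;
	for (size_t i = 0; i < MAX_SOCKS; i++) {
		if (storage[i].sk == sk)
			return &storage[i];
		if (!storage[i].sk && !free_slot)
			free_slot = &storage[i];
	}
	if (!create || !free_slot)
		return NULL;
	free_slot->sk = sk;
	free_slot->start_ns = 0;
	return free_slot;
}

/* Models a prog at the start of tcp_sendmsg_locked(): remember "now". */
static int on_sendmsg_entry(const struct sock_model *sk, uint64_t now_ns)
{
	struct sendmsg_start *v = storage_get(sk, 1);

	if (!v)
		return -1;
	v->start_ns = now_ns;
	return 0;
}

/*
 * Models a prog run from tcp_tx_timestamp(): it sees the same sk, so the
 * start time is reachable without any extra field in the skb.
 */
static int on_tx_timestamp(const struct sock_model *sk, uint64_t now_ns,
			   uint64_t *delay_ns)
{
	struct sendmsg_start *v = storage_get(sk, 0);

	if (!v || now_ns < v->start_ns)
		return -1;
	*delay_ns = now_ns - v->start_ns;
	return 0;
}
```

In an actual bpf prog the array would presumably be a BPF_MAP_TYPE_SK_STORAGE map accessed with bpf_sk_storage_get(), and the two timestamps would come from bpf_ktime_get_ns(); the model only shows that keying on the sk pointer is enough to connect the two points.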
> + shinfo->tskey_bpf = first_write_seq;
> }
> }
>
> @@ -1067,10 +1068,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> int flags, err, copied = 0;
> int mss_now = 0, size_goal, copied_syn = 0;
> int process_backlog = 0;
> + u32 first_write_seq = 0;
> int zc = 0;
> long timeo;
>
> flags = msg->msg_flags;
> + if (SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING)) {
> + first_write_seq = tp->write_seq;
> + bpf_skops_tx_timestamping(sk, NULL, BPF_SOCK_OPS_TS_TCP_SND_CB);
My preference is to skip this bpf callout for now and depend on a bpf tracing
program if it is really needed.
> + }
>
> if ((flags & MSG_ZEROCOPY) && size) {
> if (msg->msg_ubuf) {
> @@ -1331,7 +1337,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
> out:
> if (copied) {
> - tcp_tx_timestamp(sk, &sockc);
> + tcp_tx_timestamp(sk, &sockc, first_write_seq);
> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> }
> out_nopush:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 0fe7d663a244..3769e38e052d 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7030,6 +7030,9 @@ enum {
> * feature is on. It indicates the
> * recorded timestamp.
> */
> + BPF_SOCK_OPS_TS_TCP_SND_CB, /* Called when every tcp_sendmsg
> + * syscall is triggered
> + */
> };
>
> /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect