From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, willemb@google.com,
	kuniyu@google.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
	memxor@gmail.com, song@kernel.org, yonghong.song@linux.dev,
	jolsa@kernel.org, john.fastabend@gmail.com, sdf@fomichev.me,
	"Simon Sundberg" <Simon.Sundberg@kau.se>,
	"Toke Høiland-Jørgensen" <toke@toke.dk>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Jason Xing" <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
Date: Mon, 18 May 2026 18:40:09 +0200
Message-ID: <2942dd24-3b6f-4e88-acb2-67d35ea8938b@kernel.org>
In-Reply-To: <CAL+tcoDRSpVsiCym+DYsGLBGrdEuim7AZqyBTHYzd-OSBki5-Q@mail.gmail.com>



On 18/05/2026 15.53, Jason Xing wrote:
> On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>
>>
>>
>> On 18/05/2026 10.23, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> Add two if statements to accurately isolate bpf timestamping and so
>>> timestamping. They can work respectively.
>>>
>>> As to so_timestamping, only add a loose condition via report flags
>>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
>>> and performance impact. If the loose condition is hit,
>>> tcp_recv_timestamp() is able to handle the exact case and doesn't
>>> hamper the existing timestamping feature.
>>>
>>> Make it work in TCP protocol.
>>>
>>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> ---
>>>    net/ipv4/tcp.c | 14 ++++++++++++--
>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 21ece4c71612..64c69bb3578a 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>>>        release_sock(sk);
>>>
>>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>>> -             if (cmsg_flags & TCP_CMSG_TS)
>>> -                     tcp_recv_timestamp(msg, sk, &tss);
>>> +             if (cmsg_flags & TCP_CMSG_TS) {
>>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> +
>>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>>> +                             bpf_skops_rx_timestamping(sk, &tss,
>>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>>
>> Does this mean I can enable timestamp reading per cgroup?
> 
> Yes, I think so, but I didn't try. One of the natures of sockopt
> feature is supporting cgroup attach.
> cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
> something that you're looking for.
> 

Sounds good.

> IIUC, you can attach the prog onto the cgroup where all the sockets
> are set using the bpf timestamping function. So the current impl is
> cleaner and has better isolation (to filter out those unmatched
> flows).
> 
>>
>> In Simon's netstacklat[1] tool we are forced process all RX timestamp
>> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
>> the cgroup IDs that we are interested in (which is a significant
>> overhead, as this is deployed at Cloudflare production scale).
> 
> I can feel the pain when filtering in this kind of relatively hot
> path, which is what I'm trying to avoid internally. What I've done in
> production (to cover those old kernels) is to just let the kernel
> print the information, that's it, and there is an agent continuously
> gathering the data, doing the match and computing latency. But it's
> overall complicated.
> 

I hope you don't mean that your internal/old approach was using printk
and then analyzing that output.

> Many thanks here, I'm always interested in hearing more useful and
> real requirements and fancy ideas on how to monitor the latency :) Now

Simon Sundberg <Simon.Sundberg@kau.se> has many more fancy ideas on how
to monitor the latency.
The netstacklat tool is part of Simon's PhD thesis:
- https://doi.org/10.59217/qklv6836

And we even got a paper accepted on netstacklat:
- https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032


> I'm still struggling to port the internal functions to bpf
> timestamping.
> 

Good luck, feel free to get inspired by our netstacklat tool.
The key point is to let BPF store latency histograms and then let
userspace periodically consume these, as heatmaps.
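
A minimal sketch of the histogram idea, in plain userspace C rather than
BPF, and not taken from the netstacklat sources. The names lat_hist,
log2_bucket(), hist_record() and hist_drain(), and the bucket count of 32,
are made up for illustration. In a real tool the record step would
presumably run in the BPF program against a BPF map, and the drain step
in the userspace agent:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HIST_NBUCKETS 32	/* hypothetical size, not from netstacklat */

/* One power-of-two latency histogram; bucket b counts values in [2^b, 2^(b+1)). */
struct lat_hist {
	uint64_t buckets[HIST_NBUCKETS];
};

/* floor(log2(v)), with v == 0 mapped to bucket 0 and large values clamped. */
static unsigned int log2_bucket(uint64_t v)
{
	unsigned int b = 0;

	while (v >>= 1)
		b++;
	return b < HIST_NBUCKETS ? b : HIST_NBUCKETS - 1;
}

/* Producer side: account one latency sample (in nanoseconds). */
static void hist_record(struct lat_hist *h, uint64_t lat_ns)
{
	h->buckets[log2_bucket(lat_ns)]++;
}

/*
 * Consumer side: copy the counters out and reset them, so each periodic
 * read yields one time slice (one column of a heatmap).
 * Returns the number of samples in the slice.
 */
static uint64_t hist_drain(struct lat_hist *h, uint64_t out[HIST_NBUCKETS])
{
	uint64_t total = 0;
	unsigned int i;

	assert(h && out);
	memcpy(out, h->buckets, sizeof(h->buckets));
	memset(h->buckets, 0, sizeof(h->buckets));
	for (i = 0; i < HIST_NBUCKETS; i++)
		total += out[i];
	return total;
}
```

Power-of-two buckets keep the per-sample cost at a short shift loop plus
one increment, and userspace only has to read a small fixed-size array
per interval instead of every individual timestamp.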

We are proposing to add 'netstacklat' as a new libbpf-tools utility
- https://github.com/iovisor/bcc/issues/5510


>>
>>
>> [1] https://github.com/xdp-project/bpf-examples/tree/main/netstacklat
>>
>> [2]
>> https://github.com/xdp-project/bpf-examples/blob/main/netstacklat/netstacklat.bpf.c#L484-L488
>>
>>
>>> +                     if (sock_flag(sk, SOCK_RCVTSTAMP) ||
>>> +                         tsflags & SOF_TIMESTAMPING_SOFTWARE ||
>>> +                         tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
>>> +                             tcp_recv_timestamp(msg, sk, &tss);
>>> +             }
>>>                if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
>>>                        msg->msg_inq = tcp_inq_hint(sk);
>>>                        if (cmsg_flags & TCP_CMSG_INQ)
>>

