From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Jason Xing <kerneljasonxing@gmail.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, willemb@google.com,
	kuniyu@google.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
	memxor@gmail.com, song@kernel.org, yonghong.song@linux.dev,
	jolsa@kernel.org, john.fastabend@gmail.com, sdf@fomichev.me,
	Simon Sundberg <Simon.Sundberg@kau.se>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
Date: Tue, 19 May 2026 11:57:26 +0200
Message-ID: <87lddfn2m1.fsf@toke.dk>
In-Reply-To: <CAL+tcoA_VBcXu_2zVXFvsWF7+U=-TZf7bCz0KzNpN=p=82tB=w@mail.gmail.com>

Jason Xing <kerneljasonxing@gmail.com> writes:

> On Tue, May 19, 2026 at 7:16 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>> >
>> >
>> >
>> > On 18/05/2026 15.53, Jason Xing wrote:
>> > > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>> > >>
>> > >>
>> > >>
>> > >> On 18/05/2026 10.23, Jason Xing wrote:
>> > >>> From: Jason Xing <kernelxing@tencent.com>
>> > >>>
>> > >>> Add two if statements to cleanly separate bpf timestamping from
>> > >>> so_timestamping, so that each can work independently.
>> > >>>
>> > >>> As to so_timestamping, only add a loose condition via report flags
>> > >>> to avoid duplicating the strict checks that are done in
>> > >>> tcp_recv_timestamp() and to avoid a performance impact. If the loose
>> > >>> condition is hit, tcp_recv_timestamp() is able to handle the exact
>> > >>> case and doesn't hamper the existing timestamping feature.
>> > >>>
>> > >>> Make it work in TCP protocol.
>> > >>>
>> > >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>> > >>> ---
>> > >>>    net/ipv4/tcp.c | 14 ++++++++++++--
>> > >>>    1 file changed, 12 insertions(+), 2 deletions(-)
>> > >>>
>> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> > >>> index 21ece4c71612..64c69bb3578a 100644
>> > >>> --- a/net/ipv4/tcp.c
>> > >>> +++ b/net/ipv4/tcp.c
>> > >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>> > >>>        release_sock(sk);
>> > >>>
>> > >>>        if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>> > >>> -             if (cmsg_flags & TCP_CMSG_TS)
>> > >>> -                     tcp_recv_timestamp(msg, sk, &tss);
>> > >>> +             if (cmsg_flags & TCP_CMSG_TS) {
>> > >>> +                     u32 tsflags = READ_ONCE(sk->sk_tsflags);
>> > >>> +
>> > >>> +                     if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>> > >>> +                         SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>> > >>> +                             bpf_skops_rx_timestamping(sk, &tss,
>> > >>> +                                                       BPF_SOCK_OPS_TSTAMP_RCV_CB);
>> > >>
>> > >> Does this mean I can enable timestamp reading per cgroup?
>> > >
>> > > Yes, I think so, but I didn't try. One of the properties of the
>> > > sockopt feature is that it supports cgroup attach.
>> > > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
>> > > something that you're looking for.
>> > >
>> >
>> > Sounds good
>> >
>> > > IIUC, you can attach the prog onto the cgroup where all the sockets
>> > > are set using the bpf timestamping function. So the current impl is
>> > > cleaner and has better isolation (to filter out those unmatched
>> > > flows).
>> > >
>> > >>
>> > >> In Simon's netstacklat[1] tool we are forced to process all RX timestamps
>> > >> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
>> > >> the cgroup IDs that we are interested in (which is a significant
>> > >> overhead, as this is deployed at Cloudflare production scale).
>> > >
>> > > I can feel the pain when filtering in this kind of relatively hot
>> > > path, which is what I'm trying to avoid internally. What I've done in
>> > > production (to cover those old kernels) is to just let the kernel
>> > > print the information, that's it, and there is an agent continuously
>> > > gathering the data, doing the match and computing latency. But it's
>> > > overall complicated.
>> > >
>> >
>> > I hope you don't mean your internal/old approach was using printk and
>> > then analyzing this data.
>>
>> Of course not :)
>>
>> The internal approach is to cover the old kernels but doesn't mean the
>> approach is old :P
>>
>> Instead, the internal kernel module is super efficient and I'm trying
>> to ship bpf with such an ability. The fact is we've already deployed
>> in production: 7x24 running, zero sampling.
>>
>> Please see page 24 where there is a brief introduction on how to deal
>> with the log part:
>> https://lpc.events/event/19/contributions/2055/#preview:3846
>> I believe this is the promising direction (ring buffer + lightweight
>> kernel + heavy agent) we're taking.
>>
>> The headache part is that I need to provide an agent written in BPF to
>> do the heavy processing.
>>
>> >
>> > > Many thanks here, I'm always interested in hearing more useful and
>> > > real requirements and fancy ideas on how to monitor the latency :) Now
>> >
>> > Simon Sundberg <Simon.Sundberg@kau.se> has many more fancy ideas on how
>> > to monitor the latency.
>> > The netstacklat tool is part of Simon's PhD thesis:
>> > - https://doi.org/10.59217/qklv6836
>> >
>> > And we even got a paper accepted on netstacklat:
>> > -
>> > https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032
>>
>> Sorry, I cannot access this link. Could you give me the title of this paper?
>
> Waiting at the Front Door - Continuous Monitoring of Latency in the
> Host Network Stack
>
> Oh, I guess it hasn't been officially published, right? That would
> explain why I have no way to see the content.

No, it's not published yet; I'll send you a copy off-list :)

-Toke
