From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Jason Xing <kerneljasonxing@gmail.com>,
Jesper Dangaard Brouer <hawk@kernel.org>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, willemb@google.com,
kuniyu@google.com, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
memxor@gmail.com, song@kernel.org, yonghong.song@linux.dev,
jolsa@kernel.org, john.fastabend@gmail.com, sdf@fomichev.me,
Simon Sundberg <Simon.Sundberg@kau.se>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
Date: Wed, 03 Jun 2026 13:07:20 +0200
Message-ID: <871pendgrb.fsf@toke.dk>
In-Reply-To: <87lddfn2m1.fsf@toke.dk>

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Jason Xing <kerneljasonxing@gmail.com> writes:
>
>> On Tue, May 19, 2026 at 7:16 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>
>>> On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>> >
>>> >
>>> >
>>> > On 18/05/2026 15.53, Jason Xing wrote:
>>> > > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>> > >>
>>> > >>
>>> > >>
>>> > >> On 18/05/2026 10.23, Jason Xing wrote:
>>> > >>> From: Jason Xing <kernelxing@tencent.com>
>>> > >>>
>>> > >>> Add two if statements to accurately isolate bpf timestamping and
>>> > >>> so_timestamping, so that each can work independently.
>>> > >>>
>>> > >>> For so_timestamping, only add a loose condition on the report flags.
>>> > >>> This avoids duplicating the strict checks that are already done in
>>> > >>> tcp_recv_timestamp(), and the performance impact of doing so. If the
>>> > >>> loose condition is hit, tcp_recv_timestamp() handles the exact case,
>>> > >>> so the existing timestamping feature is not hampered.
>>> > >>>
>>> > >>> Make it work in TCP protocol.
>>> > >>>
>>> > >>> Signed-off-by: Jason Xing <kernelxing@tencent.com>
>>> > >>> ---
>>> > >>> net/ipv4/tcp.c | 14 ++++++++++++--
>>> > >>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>> > >>>
>>> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> > >>> index 21ece4c71612..64c69bb3578a 100644
>>> > >>> --- a/net/ipv4/tcp.c
>>> > >>> +++ b/net/ipv4/tcp.c
>>> > >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>>> > >>>  	release_sock(sk);
>>> > >>>
>>> > >>>  	if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>>> > >>> -		if (cmsg_flags & TCP_CMSG_TS)
>>> > >>> -			tcp_recv_timestamp(msg, sk, &tss);
>>> > >>> +		if (cmsg_flags & TCP_CMSG_TS) {
>>> > >>> +			u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> > >>> +
>>> > >>> +			if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>>> > >>> +			    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>>> > >>> +				bpf_skops_rx_timestamping(sk, &tss,
>>> > >>> +							  BPF_SOCK_OPS_TSTAMP_RCV_CB);
>>> > >>
>>> > >> Does this mean I can enable timestamp reading per cgroup?
>>> > >
>>> > > Yes, I think so, but I didn't try. One characteristic of the sockopt
>>> > > feature is that it supports cgroup attach.
>>> > > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
>>> > > what you're looking for.
>>> > >
>>> >
>>> > Sounds good
>>> >
>>> > > IIUC, you can attach the prog onto the cgroup where all the sockets
>>> > > are set using the bpf timestamping function. So the current impl is
>>> > > cleaner and has better isolation (to filter out those unmatched
>>> > > flows).
>>> > >
>>> > >>
>>> > >> In Simon's netstacklat[1] tool we are forced to process all RX
>>> > >> timestamps (hooking fentry/tcp_recv_timestamp), and then we have a BPF
>>> > >> filter[2] on the cgroup IDs that we are interested in (which is a
>>> > >> significant overhead, as this is deployed at Cloudflare production scale).
>>> > >
>>> > > I can feel the pain when filtering in this kind of relatively hot
>>> > > path, which is what I'm trying to avoid internally. What I've done in
>>> > > production (to cover those old kernels) is to just let the kernel
>>> > > print the information, that's it, and there is an agent continuously
>>> > > gathering the data, doing the match and computing latency. But it's
>>> > > overall complicated.
>>> > >
>>> >
>>> > I hope you don't mean your internal/old approach was using printk and
>>> > then analyzing this data.
>>>
>>> Of course not :)
>>>
>>> The internal approach is to cover the old kernels but doesn't mean the
>>> approach is old :P
>>>
>>> Instead, the internal kernel module is super efficient and I'm trying
>>> to ship bpf with such an ability. The fact is we've already deployed
>>> in production: 7x24 running, zero sampling.
>>>
>>> Please see page 24 where there is a brief introduction on how to deal
>>> with the log part:
>>> https://lpc.events/event/19/contributions/2055/#preview:3846
>>> I believe this is the promising direction (ring buffer + lightweight
>>> kernel + heavy agent) we're taking.
>>>
>>> The headache part is that I need to provide an agent written in BPF to
>>> do the heavy processing.
>>>
>>> >
>>> > > Many thanks here, I'm always interested in hearing more useful and
>>> > > real requirements and fancy ideas on how to monitor the latency :) Now
>>> >
>>> > Simon Sundberg <Simon.Sundberg@kau.se> has many more fancy ideas on how
>>> > to monitor the latency.
>>> > The netstacklat tool is part of Simon's PhD thesis:
>>> > - https://doi.org/10.59217/qklv6836
>>> >
>>> > And we even got a paper accepted on netstacklat:
>>> > - https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032
>>>
>>> Sorry, I cannot access this link. Could you give me the title of this paper?
>>
>> Waiting at the Front Door - Continuous Monitoring of Latency in the
>> Host Network Stack
>>
>> Oh, I guess it hasn't been officially published, right? That would
>> explain why I have no way to see the content.
>
> No, it's not published yet; I'll send you a copy off-list :)

In case anyone else is interested, the paper is now on arXiv:

https://arxiv.org/abs/2606.02057

-Toke