From: Toke Høiland-Jørgensen
To: Jason Xing, Jesper Dangaard Brouer
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
 pabeni@redhat.com, horms@kernel.org, willemb@google.com, kuniyu@google.com,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, eddyz87@gmail.com, memxor@gmail.com, song@kernel.org,
 yonghong.song@linux.dev, jolsa@kernel.org, john.fastabend@gmail.com,
 sdf@fomichev.me, Simon Sundberg, netdev@vger.kernel.org,
 bpf@vger.kernel.org, Jason Xing
Subject: Re: [PATCH net-next 5/6] bpf: enable bpf timestamping rx in TCP layer
In-Reply-To: <87lddfn2m1.fsf@toke.dk>
References: <20260518082344.96647-1-kerneljasonxing@gmail.com>
 <20260518082344.96647-6-kerneljasonxing@gmail.com>
 <2942dd24-3b6f-4e88-acb2-67d35ea8938b@kernel.org>
 <87lddfn2m1.fsf@toke.dk>
Date: Wed, 03 Jun 2026 13:07:20 +0200
Message-ID: <871pendgrb.fsf@toke.dk>
X-Mailing-List: netdev@vger.kernel.org

Toke Høiland-Jørgensen writes:

> Jason Xing writes:
>
>> On Tue, May 19, 2026 at 7:16 AM Jason Xing wrote:
>>>
>>> On Tue, May 19, 2026 at 12:40 AM Jesper Dangaard Brouer wrote:
>>> >
>>> >
>>> >
>>> > On 18/05/2026 15.53, Jason Xing wrote:
>>> > > On Mon, May 18, 2026 at 9:01 PM Jesper Dangaard Brouer wrote:
>>> > >>
>>> > >>
>>> > >>
>>> > >> On 18/05/2026 10.23, Jason Xing wrote:
>>> > >>> From: Jason Xing
>>> > >>>
>>> > >>> Add two if statements to accurately isolate bpf timestamping and so
>>> > >>> timestamping. They can work respectively.
>>> > >>>
>>> > >>> As to so_timestamping, only add a loose condition via report flags
>>> > >>> to avoid duplicate strict checks that is done in tcp_recv_timestamp()
>>> > >>> and performance impact. If the loose condition is hit,
>>> > >>> tcp_recv_timestamp() is able to handle the exact case and doesn't
>>> > >>> hamper the existing timestamping feature.
>>> > >>>
>>> > >>> Make it work in TCP protocol.
>>> > >>>
>>> > >>> Signed-off-by: Jason Xing
>>> > >>> ---
>>> > >>>  net/ipv4/tcp.c | 14 ++++++++++++--
>>> > >>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>> > >>>
>>> > >>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> > >>> index 21ece4c71612..64c69bb3578a 100644
>>> > >>> --- a/net/ipv4/tcp.c
>>> > >>> +++ b/net/ipv4/tcp.c
>>> > >>> @@ -2949,8 +2949,18 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags)
>>> > >>>  	release_sock(sk);
>>> > >>>
>>> > >>>  	if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
>>> > >>> -		if (cmsg_flags & TCP_CMSG_TS)
>>> > >>> -			tcp_recv_timestamp(msg, sk, &tss);
>>> > >>> +		if (cmsg_flags & TCP_CMSG_TS) {
>>> > >>> +			u32 tsflags = READ_ONCE(sk->sk_tsflags);
>>> > >>> +
>>> > >>> +			if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>>> > >>> +			    SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_RX_TIMESTAMPING))
>>> > >>> +				bpf_skops_rx_timestamping(sk, &tss,
>>> > >>> +							  BPF_SOCK_OPS_TSTAMP_RCV_CB);
>>> > >>
>>> > >> Does this mean I can enable timestamp reading per cgroup?
>>> > >
>>> > > Yes, I think so, but I didn't try. One of the natures of sockopt
>>> > > feature is supporting cgroup attach.
>>> > > cgroup_bpf_prog_attach()/cgroup_bpf_link_attach() is probably
>>> > > something that you're looking for.
>>> > >
>>> >
>>> > Sound good
>>> >
>>> > > IIUC, you can attach the prog onto the cgroup where all the sockets
>>> > > are set using the bpf timestamping function. So the current impl is
>>> > > cleaner and has better isolation (to filter out those unmatched
>>> > > flows).
>>> > >
>>> > >>
>>> > >> In Simon's netstacklat[1] tool we are forced process all RX timestamp
>>> > >> (hooking fentry/tcp_recv_timestamp), and then we have a BPF filter[2] on
>>> > >> the cgroup IDs that we are interested in (which is a significant
>>> > >> overhead, as this is deployed at Cloudflare production scale).
>>> > >
>>> > > I can feel the pain when filtering in this kind of relatively hot
>>> > > path, which is what I'm trying to avoid internally. What I've done in
>>> > > production (to cover those old kernels) is to just let the kernel
>>> > > print the information, that's it, and there is an agent continuously
>>> > > gathering the data, doing the match and computing latency. But it's
>>> > > overall complicated.
>>> > >
>>> >
>>> > I hope you don't mean your internal/old approach was using printk and
>>> > then analyzing this data.
>>>
>>> Of course not :)
>>>
>>> The internal approach is to cover the old kernels but doesn't mean the
>>> approach is old :P
>>>
>>> Instead, the internal kernel module is super efficient and I'm trying
>>> to ship bpf with such an ability. The fact is we've already deployed
>>> in production: 7x24 running, zero sampling.
>>>
>>> Please see page 24 where there is a brief introduction on how to deal
>>> with the log part:
>>> https://lpc.events/event/19/contributions/2055/#preview:3846
>>> I believe this is the promising direction (ring buffer + lightweight
>>> kernel + heavy agent) we're taking.
>>>
>>> The headache part is that I need to provide an agent written in BPF to
>>> do the heavy process.
>>>
>>> >
>>> > > Many thanks here, I'm always interested in hearing more useful and
>>> > > real requirements and fancy ideas on how to monitor the latency :) Now
>>> >
>>> > Simon Sundberg have many more fancy ideas on how
>>> > to monitor the latency.
>>> > The netstacklat tool is part of Simon's PhD thesis:
>>> >   - https://doi.org/10.59217/qklv6836
>>> >
>>> > And we even gotten a paper accepted on netstacklat:
>>> >   -
>>> > https://kau.diva-portal.org/smash/record.jsf?pid=diva2%3A2034009&dswid=3032
>>>
>>> Sorry, I cannot access this link. Could you give me the title of this paper?
>>
>> Waiting at the Front Door - Continuous Monitoring of Latency in the
>> Host Network Stack
>>
>> Oh, I guess it hasn't been officially published right? This is the
>> reason why I have no way to know the content.
>
> No, it's not published yet; I'll send you a copy off-list :)

In case anyone else is interested, the paper is now on Arxiv:
https://arxiv.org/abs/2606.02057

-Toke