From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Xing <kerneljasonxing@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, willemb@google.com,
martin.lau@kernel.org, netdev@vger.kernel.org,
bpf@vger.kernel.org, Jason Xing <kernelxing@tencent.com>,
Yushan Zhou <katrinzhou@tencent.com>
Subject: Re: [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs
Date: Wed, 08 Apr 2026 11:15:09 -0400 [thread overview]
Message-ID: <willemdebruijn.kernel.257654f9a3f23@gmail.com> (raw)
In-Reply-To: <CAL+tcoBt1A5GYFvimkcRFUtmy298y13AiF-XFF8b8E8Y6fg8Xw@mail.gmail.com>
> > > > Since we're modifying the kernel, how about adding a new member to
> > > > record sendmsg time which bpf script is able to read. The whole
> > > > scenario looks like this:
> > > > 1) in tcp_sendmsg_locked(), record the sendmsg time for each skb
> > > > 2) in either tso_fragment() or tcp_gso_tstamp(), each new skb will get
> > > > a copy of its original skb
> > > > 3) in each stage, bpf script reads the skb's sendmsg time and the
> > > > current time, and then effortlessly do the math.
> > > >
> > > > At this point, what I had in mind is we have two options:
> > > > 1) only handle the skb from the view of the send syscall layer, which
> > > > is, for sure, very simple but not thorough.
> > > > 2) stick to a pure authentic packet basis, then adding a new member
> > > > seems inevitable. so the question would be where to add? The space of
> > > > the skb structure is very precious :(
> > >
> > > Finding a suitable place to put this timestamp is really hard. IIRC,
> > > we can't expand the size of struct skb_shared_info so easily since
> > > it's a global effect.
> > >
> > > I'm wondering if we can turn the per-packet mode into a non-compatible
> > > feature by reusing 'u32 tskey' to store a microsecond timestamp of
> > > sendmsg.
> >
> > Agreed that an extra field is hard. We should avoid that.
>
> Avoiding adding a new one makes the whole work extremely hard. I'm
> wondering since we have hwtstamp in shared info, why not add a
> software one for timestamping use? Then, we would support more
> different protocols in more different stages in a finer grain, which
> is a big coarse picture in my mind.
I don't understand the need to store more data in the skb for BPF.
With BPF hooks, the bpf program can record the relevant data directly
in a BPF map.
> Adding a software bit will completely reduce the whole complexity and
> be very easy to use. Would you expect to see a draft by adding such a
> bit first?
>
> Or just like I mentioned, repurposing tskey seems an alternative,
> which, however, makes the new feature incompatible.
>
> >
> > If the purpose is to group skbs by sendmsg call (e.g., to filter out
> > all but the last one), it is probably also unnecessary.
> >
> > From a process PoV, since the process knows the sendmsg len and each
> > skb has a tskey in byte offset, it can correlate the skb with a given
> > sendmsg buffer.
> >
> > The BPF program is under control of a third-party admin. So that does
> > not follow directly. But it can be passed additional metadata.
> >
> > I thought about passing the offset of the skb from the start of the
> > sendmsg buffer to identify all consecutive skbs for a sendmsg call,
> > as each new buffer will start with an skb with offset 0 ..
> >
> > .. but that won't work as there is no guarantee that a sendmsg call
> > will not append to an existing outstanding skb.
>
> Right. TCP is way too complex and we indeed see some tough issues when
> trying to deploy the feature. So my humble take is to make the design
> as simple as possible.
>
> >
> > Anyway, the general idea is to pass to the BPF program through
> > bpf_skops_tx_timestamping some relevant signal , without having to
> > expand either skb or sk itself.
> >
> > I hear you on that measuring every skb is too frequent. But is calling
> > the BPF program and letting it decide whether to measure too? BPF
> > program invocation itself should be cheap.
>
> Oh, I was clear enough. Sorry. I meant tracing per skb is definitely
> an awesome way to go. My ultimate goal is to do so. Instead of letting
> people implement various fine grained bpf progs, we can provide a very
> easy/understandable/efficient approach with more samples. It should be
> very beneficial.
>
> >
> > If per-push is preferable, with a filter ability like the above, it
> > seems more useful to me already.
>
> Push-level is a compromise plan. Packet-level is what I always pursue :)
Then why not directly implement per-packet.
If the BPF call is cheap and the BPF program can choose to selectively
track packets.
Reminder that you do not want to break (BPF) users by changing
behavior. Let alone more than once. If per-push is going to be
obsoleted, skip ip entirely.
> The current series has this ability: the bpf prog noticed it's a
> SENDMSG sock option and will selectively call
> bpf_sock_ops_enable_tx_tstamp() to do so. Only by calling
> bpf_sock_ops_enable_tx_tstamp() could the skb be tracked.
>
> Thanks,
> Jason
next prev parent reply other threads:[~2026-04-08 15:15 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-04 15:04 [PATCH net-next v2 0/4] bpf-timestamp: convert to push-level granularity Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 1/4] tcp: separate BPF timestamping from tcp_tx_timestamp Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 2/4] tcp: advance the tsflags check to save cycles Jason Xing
2026-04-06 2:23 ` Willem de Bruijn
2026-04-06 11:48 ` Jason Xing
2026-04-04 15:04 ` [PATCH net-next v2 3/4] bpf-timestamp: keep track of the skb when wait_for_space occurs Jason Xing
2026-04-06 2:28 ` Willem de Bruijn
2026-04-06 11:59 ` Jason Xing
2026-04-06 14:37 ` Willem de Bruijn
2026-04-07 3:33 ` Jason Xing
2026-04-07 7:43 ` Jason Xing
2026-04-07 21:17 ` Willem de Bruijn
2026-04-08 0:35 ` Jason Xing
2026-04-08 7:30 ` Jason Xing
2026-04-08 15:15 ` Willem de Bruijn [this message]
2026-04-08 18:12 ` Martin KaFai Lau
2026-04-04 15:04 ` [PATCH net-next v2 4/4] bpf-timestamp: complete tracing the skb from each push in sendmsg Jason Xing
2026-04-06 2:17 ` [PATCH net-next v2 0/4] bpf-timestamp: convert to push-level granularity Willem de Bruijn
2026-04-06 12:25 ` Jason Xing
2026-04-06 14:38 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=willemdebruijn.kernel.257654f9a3f23@gmail.com \
--to=willemdebruijn.kernel@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=katrinzhou@tencent.com \
--cc=kerneljasonxing@gmail.com \
--cc=kernelxing@tencent.com \
--cc=kuba@kernel.org \
--cc=martin.lau@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox