* [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
@ 2025-02-24 18:24 Breno Leitao
2025-02-24 19:03 ` Eric Dumazet
0 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-02-24 18:24 UTC (permalink / raw)
To: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, linux-trace-kernel, kernel-team,
yonghong.song, Breno Leitao
Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
the tracing of TCP messages being sent.
Meta has been using BPF programs to monitor this function for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.
The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
creating unnecessary trace event infrastructure and tracefs exports,
keeping the implementation minimal while stabilizing the API.
Given that this patch creates a rawtracepoint, you could hook into it
using regular tooling, like bpftrace, using regular rawtracepoint
infrastructure, such as:
rawtracepoint:tcp_sendmsg_tp {
....
}
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/trace/events/tcp.h | 5 +++++
net/ipv4/tcp.c | 2 ++
2 files changed, 7 insertions(+)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..7c0171d2dacdc 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@ TRACE_EVENT(tcp_retransmit_synack,
__entry->saddr_v6, __entry->daddr_v6)
);
+DECLARE_TRACE(tcp_sendmsg_tp,
+ TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size),
+ TP_ARGS(sk, msg, size)
+);
+
DECLARE_TRACE(tcp_cwnd_reduction_tp,
TP_PROTO(const struct sock *sk, int newly_acked_sacked,
int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5ef86fbd8aa85 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1362,6 +1362,8 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
int ret;
+ trace_tcp_sendmsg_tp(sk, msg, size);
+
lock_sock(sk);
ret = tcp_sendmsg_locked(sk, msg, size);
release_sock(sk);
---
base-commit: e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
change-id: 20250224-tcpsendmsg-4f0a236751d7
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 18:24 [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg() Breno Leitao
@ 2025-02-24 19:03 ` Eric Dumazet
2025-02-24 19:12 ` Yonghong Song
2025-02-24 19:16 ` David Ahern
0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2025-02-24 19:03 UTC (permalink / raw)
To: Breno Leitao
Cc: Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>
> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> the tracing of TCP messages being sent.
>
> Meta has been using BPF programs to monitor this function for years,
> indicating significant interest in observing this important
> functionality. Adding a proper tracepoint provides a stable API for all
> users who need visibility into TCP message transmission.
>
> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> creating unnecessary trace event infrastructure and tracefs exports,
> keeping the implementation minimal while stabilizing the API.
>
> Given that this patch creates a rawtracepoint, you could hook into it
> using regular tooling, like bpftrace, using regular rawtracepoint
> infrastructure, such as:
>
> rawtracepoint:tcp_sendmsg_tp {
> ....
> }
I would expect tcp_sendmsg() being stable enough ?
kprobe:tcp_sendmsg {
}
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:03 ` Eric Dumazet
@ 2025-02-24 19:12 ` Yonghong Song
2025-02-24 19:23 ` Eric Dumazet
2025-02-24 19:16 ` David Ahern
1 sibling, 1 reply; 18+ messages in thread
From: Yonghong Song @ 2025-02-24 19:12 UTC (permalink / raw)
To: Eric Dumazet, Breno Leitao
Cc: Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller, David Ahern,
kuba@kernel.org, Paolo Abeni, Simon Horman,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Kernel Team,
yonghong.song@linux.dev
> ________________________________________
>
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>> ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }
In LTO mode, tcp_sendmsg could be inlined cross files. For example,
net/ipv4/tcp.c:
int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
net/ipv4/tcp_bpf.c:
...
return tcp_sendmsg(sk, msg, size);
net/ipv6/af_inet6.c:
...
return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
And this does happen in our production environment.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:03 ` Eric Dumazet
2025-02-24 19:12 ` Yonghong Song
@ 2025-02-24 19:16 ` David Ahern
2025-02-26 16:10 ` Breno Leitao
1 sibling, 1 reply; 18+ messages in thread
From: David Ahern @ 2025-02-24 19:16 UTC (permalink / raw)
To: Eric Dumazet, Breno Leitao
Cc: Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On 2/24/25 12:03 PM, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
>> the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years,
>> indicating significant interest in observing this important
>> functionality. Adding a proper tracepoint provides a stable API for all
>> users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
>> creating unnecessary trace event infrastructure and tracefs exports,
>> keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it
>> using regular tooling, like bpftrace, using regular rawtracepoint
>> infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>> ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }
Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
more use cases (see kernel references to it).
We have a patch for a couple years now with a tracepoint inside the
while (msg_data_left(msg)) {
}
loop which is more useful than just entry to sendmsg.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:12 ` Yonghong Song
@ 2025-02-24 19:23 ` Eric Dumazet
2025-02-25 10:58 ` Breno Leitao
2025-02-26 23:46 ` Masami Hiramatsu
0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2025-02-24 19:23 UTC (permalink / raw)
To: Yonghong Song
Cc: Breno Leitao, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller, David Ahern,
kuba@kernel.org, Paolo Abeni, Simon Horman,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Kernel Team,
yonghong.song@linux.dev
On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
>
> > ________________________________________
> >
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> >> the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years,
> >> indicating significant interest in observing this important
> >> functionality. Adding a proper tracepoint provides a stable API for all
> >> users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> >> creating unnecessary trace event infrastructure and tracefs exports,
> >> keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it
> >> using regular tooling, like bpftrace, using regular rawtracepoint
> >> infrastructure, such as:
> >>
> >> rawtracepoint:tcp_sendmsg_tp {
> >> ....
> >> }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> In LTO mode, tcp_sendmsg could be inlined cross files. For example,
>
> net/ipv4/tcp.c:
> int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> net/ipv4/tcp_bpf.c:
> ...
> return tcp_sendmsg(sk, msg, size);
> net/ipv6/af_inet6.c:
> ...
> return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
>
> And this does happen in our production environment.
And we do not have a way to make the kprobe work even if LTO decided
to inline a function ?
This seems like a tracing or LTO issue, this could be addressed there
in a generic way
and avoid many other patches to work around this.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:23 ` Eric Dumazet
@ 2025-02-25 10:58 ` Breno Leitao
2025-02-26 23:46 ` Masami Hiramatsu
1 sibling, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-02-25 10:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: Yonghong Song, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller, David Ahern,
kuba@kernel.org, Paolo Abeni, Simon Horman,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Kernel Team,
yonghong.song@linux.dev
Hello Eric,
On Mon, Feb 24, 2025 at 08:23:00PM +0100, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
> >
> > > ________________________________________
> > >
> > > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> > >>
> > >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> > >> the tracing of TCP messages being sent.
> > >>
> > >> Meta has been using BPF programs to monitor this function for years,
> > >> indicating significant interest in observing this important
> > >> functionality. Adding a proper tracepoint provides a stable API for all
> > >> users who need visibility into TCP message transmission.
> > >>
> > >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> > >> creating unnecessary trace event infrastructure and tracefs exports,
> > >> keeping the implementation minimal while stabilizing the API.
> > >>
> > >> Given that this patch creates a rawtracepoint, you could hook into it
> > >> using regular tooling, like bpftrace, using regular rawtracepoint
> > >> infrastructure, such as:
> > >>
> > >> rawtracepoint:tcp_sendmsg_tp {
> > >> ....
> > >> }
> > >
> > > I would expect tcp_sendmsg() being stable enough ?
> > >
> > > kprobe:tcp_sendmsg {
> > > }
> >
> > In LTO mode, tcp_sendmsg could be inlined cross files. For example,
> >
> > net/ipv4/tcp.c:
> > int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > net/ipv4/tcp_bpf.c:
> > ...
> > return tcp_sendmsg(sk, msg, size);
> > net/ipv6/af_inet6.c:
> > ...
> > return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
> >
> > And this does happen in our production environment.
>
> And we do not have a way to make the kprobe work even if LTO decided
> to inline a function ?
>
> This seems like a tracing or LTO issue, this could be addressed there
> in a generic way
> and avoid many other patches to work around this.
I understand that the {raw}tracepoint ensures the compiler cannot
interfere with these hook points. For everything else, we rely on the
hope that the compiler behaves favorably, which is far from ideal.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:16 ` David Ahern
@ 2025-02-26 16:10 ` Breno Leitao
2025-02-26 17:12 ` David Ahern
0 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-02-26 16:10 UTC (permalink / raw)
To: David Ahern
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
Hello David,
On Mon, Feb 24, 2025 at 12:16:04PM -0700, David Ahern wrote:
> On 2/24/25 12:03 PM, Eric Dumazet wrote:
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> >> the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years,
> >> indicating significant interest in observing this important
> >> functionality. Adding a proper tracepoint provides a stable API for all
> >> users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> >> creating unnecessary trace event infrastructure and tracefs exports,
> >> keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it
> >> using regular tooling, like bpftrace, using regular rawtracepoint
> >> infrastructure, such as:
> >>
> >> rawtracepoint:tcp_sendmsg_tp {
> >> ....
> >> }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> more use cases (see kernel references to it).
Agree, this seems to provide more useful information
> We have a patch for a couple years now with a tracepoint inside the
Sorry, where do you have this patch? is it downstream?
> while (msg_data_left(msg)) {
> }
>
> loop which is more useful than just entry to sendmsg.
Do you mean something like the following?
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..23318e252d6b9 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@ TRACE_EVENT(tcp_retransmit_synack,
__entry->saddr_v6, __entry->daddr_v6)
);
+DECLARE_TRACE(tcp_sendmsg_tp,
+ TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size, ssize_t copied),
+ TP_ARGS(sk, msg, size, copied)
+);
+
DECLARE_TRACE(tcp_cwnd_reduction_tp,
TP_PROTO(const struct sock *sk, int newly_acked_sacked,
int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5fcef82275d4a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1290,6 +1290,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
sk_mem_charge(sk, copy);
}
+ trace_tcp_sendmsg_tp(sk, msg, size, copy);
+
if (!copied)
TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 16:10 ` Breno Leitao
@ 2025-02-26 17:12 ` David Ahern
2025-02-26 18:18 ` Breno Leitao
2025-02-26 22:46 ` Jason Xing
0 siblings, 2 replies; 18+ messages in thread
From: David Ahern @ 2025-02-26 17:12 UTC (permalink / raw)
To: Breno Leitao
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
[-- Attachment #1: Type: text/plain, Size: 855 bytes --]
On 2/26/25 9:10 AM, Breno Leitao wrote:
>> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
>> more use cases (see kernel references to it).
>
> Agree, this seems to provide more useful information
>
>> We have a patch for a couple years now with a tracepoint inside the
>
> Sorry, where do you have this patch? is it downstream?
company tree. Attached. Where to put tracepoints and what arguments to
supply so that it is beneficial to multiple users is always a touchy
subject :-), so I have not tried to push the patch out. sock arg should
be added to it for example.
The key is to see how tcp_sendmsg_locked breaks up the buffers, and then
another one in tcp_write_xmit to see when the actual push out happens.
At the time I was looking at latency in the stack - from sendmsg call to
driver pushing descriptors to hardware.
[-- Attachment #2: 0001-tcp-Add-tracepoints-to-tcp_write_xmit-and-tcp_sendms.patch --]
[-- Type: text/plain, Size: 3337 bytes --]
From 2298ca66c15bae6a95698abd8d029b9271fbefa3 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Fri, 24 Mar 2023 22:07:53 +0000
Subject: [PATCH] tcp: Add tracepoints to tcp_write_xmit and tcp_sendmsg_locked
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/trace/events/tcp.h | 57 ++++++++++++++++++++++++++++++++++++++
net/ipv4/tcp.c | 4 +++
net/ipv4/tcp_output.c | 1 +
3 files changed, 62 insertions(+)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 901b440238d5..6b19e1d4d79d 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -187,6 +187,63 @@ DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
TP_ARGS(sk)
);
+TRACE_EVENT(tcp_sendmsg_locked,
+
+ TP_PROTO(struct msghdr *msg, struct sk_buff *skb, int size_goal),
+
+ TP_ARGS(msg, skb, size_goal),
+
+ TP_STRUCT__entry(
+ __field(__u64, skb)
+ __field(int, skb_len)
+ __field(int, msg_left)
+ __field(int, size_goal)
+ ),
+
+ TP_fast_assign(
+ __entry->skb = (__u64)skb;
+ __entry->skb_len = skb ? skb->len : 0;
+ __entry->msg_left = msg_data_left(msg);
+ __entry->size_goal = size_goal;
+ ),
+
+ TP_printk("skb %llx len %d msg_left %d size_goal %d",
+ __entry->skb, __entry->skb_len,
+ __entry->msg_left, __entry->size_goal)
+);
+
+TRACE_EVENT(tcp_write_xmit,
+
+ TP_PROTO(struct sock *sk, unsigned int mss_now, int nonagle, u32 max_segs),
+
+ TP_ARGS(sk, mss_now, nonagle,max_segs),
+
+ TP_STRUCT__entry(
+ __field(__u64, tcp_wstamp_ns)
+ __field(__u64, tcp_clock_cache)
+ __field(unsigned int, mss_now)
+ __field(unsigned int, max_segs)
+ __field(int, nonagle)
+ __field(int, sk_pacing)
+ ),
+
+ TP_fast_assign(
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ __entry->mss_now = mss_now;
+ __entry->nonagle = nonagle;
+ __entry->max_segs = max_segs;
+ __entry->sk_pacing = tcp_needs_internal_pacing(sk);
+ __entry->tcp_wstamp_ns = tp->tcp_wstamp_ns;
+ __entry->tcp_clock_cache = tp->tcp_clock_cache;
+ ),
+
+ TP_printk("mss %u segs %u nonagle %d sk_pacing %d tcp_wstamp_ns %lld tcp_clock_cache %lld",
+ __entry->mss_now, __entry->max_segs,
+ __entry->nonagle, __entry->sk_pacing,
+ __entry->tcp_wstamp_ns, __entry->tcp_clock_cache)
+);
+
TRACE_EVENT(tcp_retransmit_synack,
TP_PROTO(const struct sock *sk, const struct request_sock *req),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6bb8eb803105..0fa2c8e035b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
#include <linux/uaccess.h>
#include <asm/ioctls.h>
#include <net/busy_poll.h>
+#include <trace/events/tcp.h>
/* Track pending CMSGs. */
enum {
@@ -1312,6 +1313,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
if (skb)
copy = size_goal - skb->len;
+ trace_tcp_sendmsg_locked(msg, skb, size_goal);
+
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
bool first_skb;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 74190518708a..c84f18cd9b7e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2623,6 +2623,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
}
max_segs = tcp_tso_segs(sk, mss_now);
+ trace_tcp_write_xmit(sk, mss_now, nonagle, max_segs);
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 17:12 ` David Ahern
@ 2025-02-26 18:18 ` Breno Leitao
2025-02-26 18:27 ` Eric Dumazet
2025-02-26 22:46 ` Jason Xing
1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-02-26 18:18 UTC (permalink / raw)
To: David Ahern
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
Hello David,
On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
> On 2/26/25 9:10 AM, Breno Leitao wrote:
> >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> >> more use cases (see kernel references to it).
> >
> > Agree, this seems to provide more useful information
> >
> >> We have a patch for a couple years now with a tracepoint inside the
> >
> > Sorry, where do you have this patch? is it downstream?
>
> company tree. Attached. Where to put tracepoints and what arguments to
> supply so that it is beneficial to multiple users is always a touchy
> subject :-)
Thanks. I would like to state that this would be useful for Meta as
well.
Right now, we (Meta) are using nasty `noinline` attribute in
tcp_sendmsg() in order to make the API stable, and this tracepoint would
solve this problem avoiding the `noinline` hack. So, at least two type
of users would benefit from it.
> so I have not tried to push the patch out. sock arg should
> be added to it for example.
True, if it becomes a tracepoint instead of a rawtracepoint, the sock
arg might be useful.
How would you recommend me proceeding in this case?
Thanks
--breno
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 18:18 ` Breno Leitao
@ 2025-02-26 18:27 ` Eric Dumazet
2025-02-26 18:31 ` David Ahern
0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2025-02-26 18:27 UTC (permalink / raw)
To: Breno Leitao
Cc: David Ahern, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Wed, Feb 26, 2025 at 7:18 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello David,
>
> On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
> > On 2/26/25 9:10 AM, Breno Leitao wrote:
> > >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> > >> more use cases (see kernel references to it).
> > >
> > > Agree, this seems to provide more useful information
> > >
> > >> We have a patch for a couple years now with a tracepoint inside the
> > >
> > > Sorry, where do you have this patch? is it downstream?
> >
> > company tree. Attached. Where to put tracepoints and what arguments to
> > supply so that it is beneficial to multiple users is always a touchy
> > subject :-)
>
> Thanks. I would like to state that this would be useful for Meta as
> well.
>
> Right now, we (Meta) are using nasty `noinline` attribute in
> tcp_sendmsg() in order to make the API stable, and this tracepoint would
> solve this problem avoiding the `noinline` hack. So, at least two type
> of users would benefit from it.
>
> > so I have not tried to push the patch out. sock arg should
> > be added to it for example.
>
> True, if it becomes a tracepoint instead of a rawtracepoint, the sock
> arg might be useful.
>
> How would you recommend me proceeding in this case?
In 2022, Menglong Dong added __fix_address
Then later , Yafang Shao added noinline_for_tracing .
Would one of them be sufficient ?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 18:27 ` Eric Dumazet
@ 2025-02-26 18:31 ` David Ahern
2025-02-27 16:26 ` Breno Leitao
2025-04-02 12:57 ` Breno Leitao
0 siblings, 2 replies; 18+ messages in thread
From: David Ahern @ 2025-02-26 18:31 UTC (permalink / raw)
To: Eric Dumazet, Breno Leitao
Cc: Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On 2/26/25 11:27 AM, Eric Dumazet wrote:
> On Wed, Feb 26, 2025 at 7:18 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Hello David,
>>
>> On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
>>> On 2/26/25 9:10 AM, Breno Leitao wrote:
>>>>> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
>>>>> more use cases (see kernel references to it).
>>>>
>>>> Agree, this seems to provide more useful information
>>>>
>>>>> We have a patch for a couple years now with a tracepoint inside the
>>>>
>>>> Sorry, where do you have this patch? is it downstream?
>>>
>>> company tree. Attached. Where to put tracepoints and what arguments to
>>> supply so that it is beneficial to multiple users is always a touchy
>>> subject :-)
>>
>> Thanks. I would like to state that this would be useful for Meta as
>> well.
>>
>> Right now, we (Meta) are using nasty `noinline` attribute in
>> tcp_sendmsg() in order to make the API stable, and this tracepoint would
>> solve this problem avoiding the `noinline` hack. So, at least two type
>> of users would benefit from it.
>>
>>> so I have not tried to push the patch out. sock arg should
>>> be added to it for example.
>>
>> True, if it becomes a tracepoint instead of a rawtracepoint, the sock
>> arg might be useful.
>>
>> How would you recommend me proceeding in this case?
>
> In 2022, Menglong Dong added __fix_address
>
> Then later , Yafang Shao added noinline_for_tracing .
>
> Would one of them be sufficient ?
tcp_sendmsg_locked should not be getting inlined; it is the
sendmsg_locked handler and directly called by several subsystems.
ie., moving the tracepoint to tcp_sendmsg_locked should solve the inline
problem. From there, the question is inside the loop or at entry to the
function. Inside the loop has been very helpful for me.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 17:12 ` David Ahern
2025-02-26 18:18 ` Breno Leitao
@ 2025-02-26 22:46 ` Jason Xing
1 sibling, 0 replies; 18+ messages in thread
From: Jason Xing @ 2025-02-26 22:46 UTC (permalink / raw)
To: David Ahern
Cc: Breno Leitao, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
netdev, linux-kernel, linux-trace-kernel, kernel-team,
yonghong.song
Hi David,
On Thu, Feb 27, 2025 at 1:14 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 2/26/25 9:10 AM, Breno Leitao wrote:
> >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> >> more use cases (see kernel references to it).
> >
> > Agree, this seems to provide more useful information
> >
> >> We have a patch for a couple years now with a tracepoint inside the
> >
> > Sorry, where do you have this patch? is it downstream?
>
> company tree. Attached. Where to put tracepoints and what arguments to
> supply so that it is beneficial to multiple users is always a touchy
Right. I am always eager to establish a standard evaluation/method
which developers have common sense in. It's really hard because I gave
it a try before. Maintainers seem not to like to see too many
tracepoints appearing in the stack.
> subject :-), so I have not tried to push the patch out. sock arg should
> be added to it for example.
>
> The key is to see how tcp_sendmsg_locked breaks up the buffers, and then
> another one in tcp_write_xmit to see when the actual push out happens.
Agreed on this point because a fine-grained BPF program can take
advantage of it. But it seems another small topic that is probably
different from what the original motivation from Breno is in this
patch: I guess, making the tcp_sendmsg_locked non-inlined can allow
the BPF program to calculate the delta between when tcp_sendmsg_locked
starts and when tcp_sendmsg_locked ends? I don't know. Probably as
Eric said, using noinline or something like this is simpler?
> At the time I was looking at latency in the stack - from sendmsg call to
> driver pushing descriptors to hardware.
So do I.
Thanks,
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-24 19:23 ` Eric Dumazet
2025-02-25 10:58 ` Breno Leitao
@ 2025-02-26 23:46 ` Masami Hiramatsu
1 sibling, 0 replies; 18+ messages in thread
From: Masami Hiramatsu @ 2025-02-26 23:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: Yonghong Song, Breno Leitao, Neal Cardwell, Kuniyuki Iwashima,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
David S. Miller, David Ahern, kuba@kernel.org, Paolo Abeni,
Simon Horman, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
Kernel Team, yonghong.song@linux.dev
On Mon, 24 Feb 2025 20:23:00 +0100
Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
> >
> > > ________________________________________
> > >
> > > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> > >>
> > >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
> > >> the tracing of TCP messages being sent.
> > >>
> > >> Meta has been using BPF programs to monitor this function for years,
> > >> indicating significant interest in observing this important
> > >> functionality. Adding a proper tracepoint provides a stable API for all
> > >> users who need visibility into TCP message transmission.
> > >>
> > >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
> > >> creating unnecessary trace event infrastructure and tracefs exports,
> > >> keeping the implementation minimal while stabilizing the API.
> > >>
> > >> Given that this patch creates a rawtracepoint, you could hook into it
> > >> using regular tooling, like bpftrace, using regular rawtracepoint
> > >> infrastructure, such as:
> > >>
> > >> rawtracepoint:tcp_sendmsg_tp {
> > >> ....
> > >> }
> > >
> > > I would expect tcp_sendmsg() being stable enough ?
> > >
> > > kprobe:tcp_sendmsg {
> > > }
> >
> > In LTO mode, tcp_sendmsg could be inlined cross files. For example,
> >
> > net/ipv4/tcp.c:
> > int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > net/ipv4/tcp_bpf.c:
> > ...
> > return tcp_sendmsg(sk, msg, size);
> > net/ipv6/af_inet6.c:
> > ...
> > return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
> >
> > And this does happen in our production environment.
>
> And we do not have a way to make the kprobe work even if LTO decided
> to inline a function ?
There is `perf probe` command to set it up based on the debuginfo. But
the function parameters sometimes optimized out and no more accessible.
(You can try.)
Thus if you needs this tracepoint not temporarily nor debugging,
I would recommend to use tracepoint (or trace event).
Thank you,
>
> This seems like a tracing or LTO issue, this could be addressed there
> in a generic way
> and avoid many other patches to work around this.
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 18:31 ` David Ahern
@ 2025-02-27 16:26 ` Breno Leitao
2025-04-02 12:57 ` Breno Leitao
1 sibling, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-02-27 16:26 UTC (permalink / raw)
To: David Ahern
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Wed, Feb 26, 2025 at 11:31:49AM -0700, David Ahern wrote:
> On 2/26/25 11:27 AM, Eric Dumazet wrote:
> > In 2022, Menglong Dong added __fix_address
> >
> > Then later , Yafang Shao added noinline_for_tracing .
> >
> > Would one of them be sufficient ?
>
> tcp_sendmsg_locked should not be getting inlined; it is the
> sendmsg_locked handler and directly called by several subsystems.
>
> ie., moving the tracepoint to tcp_sendmsg_locked should solve the inline
> problem. From there, the question is inside the loop or at entry to the
> function. Inside the loop has been very helpful for me.
Inside the loop would work for me as well.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-02-26 18:31 ` David Ahern
2025-02-27 16:26 ` Breno Leitao
@ 2025-04-02 12:57 ` Breno Leitao
2025-04-02 14:11 ` David Ahern
1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-04-02 12:57 UTC (permalink / raw)
To: David Ahern
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Wed, Feb 26, 2025 at 11:31:49AM -0700, David Ahern wrote:
> On 2/26/25 11:27 AM, Eric Dumazet wrote:
>
> ie., moving the tracepoint to tcp_sendmsg_locked should solve the inline
> problem. From there, the question is inside the loop or at entry to the
> function. Inside the loop has been very helpful for me.
I am happy to get it inside the loop. I am planning to send the
following patch when the MW opens. How does it sound?
trace: tcp: Add tracepoint for tcp_sendmsg_locked()
Add a tracepoint to monitor TCP sendmsg operations, enabling the tracing
of TCP messages being sent.
Meta has been using BPF programs to monitor tcp_sendmsg() for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.
David Ahern is using a similar functionality with a custom patch[1]. So,
this means we have more than a single use case for this request.
The implementation adopts David's approach[1] for greater flexibility
compared to the initial proposal.
Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..cd90a8c66d683 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,29 @@ TRACE_EVENT(tcp_retransmit_synack,
__entry->saddr_v6, __entry->daddr_v6)
);
+TRACE_EVENT(tcp_sendmsg_locked,
+ TP_PROTO(struct msghdr *msg, struct sk_buff *skb, int size_goal),
+
+ TP_ARGS(msg, skb, size_goal),
+
+ TP_STRUCT__entry(
+ __field(__u64, skb)
+ __field(int, skb_len)
+ __field(int, msg_left)
+ __field(int, size_goal)
+ ),
+
+ TP_fast_assign(
+ __entry->skb = (__u64)skb;
+ __entry->skb_len = skb ? skb->len : 0;
+ __entry->msg_left = msg_data_left(msg);
+ __entry->size_goal = size_goal;
+ ),
+
+ TP_printk("skb %llx skb_len %d msg_left %d size_goal %d", __entry->skb,
+ __entry->skb_len, __entry->msg_left, __entry->size_goal)
+);
+
DECLARE_TRACE(tcp_cwnd_reduction_tp,
TP_PROTO(const struct sock *sk, int newly_acked_sacked,
int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ea8de00f669d0..822cd40ce2b7f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1160,6 +1160,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
if (skb)
copy = size_goal - skb->len;
+ trace_tcp_sendmsg_locked(msg, skb, size_goal);
+
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
bool first_skb;
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-04-02 12:57 ` Breno Leitao
@ 2025-04-02 14:11 ` David Ahern
2025-04-02 16:01 ` Steven Rostedt
2025-04-03 11:38 ` Breno Leitao
0 siblings, 2 replies; 18+ messages in thread
From: David Ahern @ 2025-04-02 14:11 UTC (permalink / raw)
To: Breno Leitao
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On 4/2/25 8:57 AM, Breno Leitao wrote:
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 1a40c41ff8c30..cd90a8c66d683 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -259,6 +259,29 @@ TRACE_EVENT(tcp_retransmit_synack,
> __entry->saddr_v6, __entry->daddr_v6)
> );
>
> +TRACE_EVENT(tcp_sendmsg_locked,
> + TP_PROTO(struct msghdr *msg, struct sk_buff *skb, int size_goal),
How about passing in the sk reference here; not needed for trace
entries, but makes it directly accessible for bpf programs.
Otherwise, LGTM.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-04-02 14:11 ` David Ahern
@ 2025-04-02 16:01 ` Steven Rostedt
2025-04-03 11:38 ` Breno Leitao
1 sibling, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2025-04-02 16:01 UTC (permalink / raw)
To: David Ahern
Cc: Breno Leitao, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Wed, 2 Apr 2025 08:11:03 -0600
David Ahern <dsahern@kernel.org> wrote:
> How about passing in the sk reference here; not needed for trace
> entries, but makes it directly accessible for bpf programs.
Not to mention event probes (dynamic trace events that can be attached to
existing events and the fields can be dereferenced). One day I'll add
documentation about event probes as they have been in the kernel since 5.15! :-p
-- Steve
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg()
2025-04-02 14:11 ` David Ahern
2025-04-02 16:01 ` Steven Rostedt
@ 2025-04-03 11:38 ` Breno Leitao
1 sibling, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-04-03 11:38 UTC (permalink / raw)
To: David Ahern
Cc: Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel, kernel-team, yonghong.song
On Wed, Apr 02, 2025 at 08:11:03AM -0600, David Ahern wrote:
> On 4/2/25 8:57 AM, Breno Leitao wrote:
> > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > index 1a40c41ff8c30..cd90a8c66d683 100644
> > --- a/include/trace/events/tcp.h
> > +++ b/include/trace/events/tcp.h
> > @@ -259,6 +259,29 @@ TRACE_EVENT(tcp_retransmit_synack,
> > __entry->saddr_v6, __entry->daddr_v6)
> > );
> >
> > +TRACE_EVENT(tcp_sendmsg_locked,
> > + TP_PROTO(struct msghdr *msg, struct sk_buff *skb, int size_goal),
>
> How about passing in the sk reference here; not needed for trace
> entries, but makes it directly accessible for bpf programs.
Right, so, without the trace entries, the output of
/sys/kernel/debug/tracing/events/tcp/tcp_sendmsg_locked/format
doesn't show it, right?
field:__u64 skb; offset:8; size:8; signed:0;
field:int skb_len; offset:16; size:4; signed:1;
field:int msg_left; offset:20; size:4; signed:1;
field:int size_goal; offset:24; size:4; signed:1;
This is what I've been testing now:
trace: tcp: Add tracepoint for tcp_sendmsg_locked()
Add a tracepoint to monitor TCP sendmsg operations, enabling the tracing
of TCP messages being sent.
Meta has been using BPF programs to monitor tcp_sendmsg() for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.
David Ahern is using a similar functionality with a custom patch[1]. So,
this means we have more than a single use case for this request.
The implementation adopts David's approach[1] for greater flexibility
compared to the initial proposal.
Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..21529d5faf3b1 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,30 @@ TRACE_EVENT(tcp_retransmit_synack,
__entry->saddr_v6, __entry->daddr_v6)
);
+TRACE_EVENT(tcp_sendmsg_locked,
+ TP_PROTO(const struct sock *sk, const struct msghdr *msg,
+ const struct sk_buff *skb, int size_goal),
+
+ TP_ARGS(sk, msg, skb, size_goal),
+
+ TP_STRUCT__entry(
+ __field(__u64, skb)
+ __field(int, skb_len)
+ __field(int, msg_left)
+ __field(int, size_goal)
+ ),
+
+ TP_fast_assign(
+ __entry->skb = (__u64)skb;
+ __entry->skb_len = skb ? skb->len : 0;
+ __entry->msg_left = msg_data_left(msg);
+ __entry->size_goal = size_goal;
+ ),
+
+ TP_printk("skb %llx skb_len %d msg_left %d size_goal %d", __entry->skb,
+ __entry->skb_len, __entry->msg_left, __entry->size_goal)
+);
+
DECLARE_TRACE(tcp_cwnd_reduction_tp,
TP_PROTO(const struct sock *sk, int newly_acked_sacked,
int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ea8de00f669d0..270ce2c8c2d54 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1160,6 +1160,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
if (skb)
copy = size_goal - skb->len;
+ trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
+
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
bool first_skb;
Plus this small dependency:
net: pass const to msg_data_left()
The msg_data_left() function doesn't modify the struct msghdr parameter,
so mark it as const. This allows the function to be used with const
references, improving type safety and making the API more flexible.
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/include/linux/socket.h b/include/linux/socket.h
index c3322eb3d6865..3b262487ec060 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -168,7 +168,7 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
return __cmsg_nxthdr(__msg->msg_control, __msg->msg_controllen, __cmsg);
}
-static inline size_t msg_data_left(struct msghdr *msg)
+static inline size_t msg_data_left(const struct msghdr *msg)
{
return iov_iter_count(&msg->msg_iter);
}
Thanks for your help,
Breno
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-04-03 11:38 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-24 18:24 [PATCH net-next] trace: tcp: Add tracepoint for tcp_sendmsg() Breno Leitao
2025-02-24 19:03 ` Eric Dumazet
2025-02-24 19:12 ` Yonghong Song
2025-02-24 19:23 ` Eric Dumazet
2025-02-25 10:58 ` Breno Leitao
2025-02-26 23:46 ` Masami Hiramatsu
2025-02-24 19:16 ` David Ahern
2025-02-26 16:10 ` Breno Leitao
2025-02-26 17:12 ` David Ahern
2025-02-26 18:18 ` Breno Leitao
2025-02-26 18:27 ` Eric Dumazet
2025-02-26 18:31 ` David Ahern
2025-02-27 16:26 ` Breno Leitao
2025-04-02 12:57 ` Breno Leitao
2025-04-02 14:11 ` David Ahern
2025-04-02 16:01 ` Steven Rostedt
2025-04-03 11:38 ` Breno Leitao
2025-02-26 22:46 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).