linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cong Wang <xiyou.wangcong@gmail.com>
To: Yunhui Cui <cuiyunhui@bytedance.com>
Cc: rostedt@goodmis.org, mhiramat@kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	kuniyu@amazon.com, duanxiongchun@bytedance.com,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH v3] sock: add tracepoint for send recv length
Date: Sat, 7 Jan 2023 12:02:44 -0800	[thread overview]
Message-ID: <Y7nP5PAGxWZ+2GHN@pop-os.localdomain> (raw)
In-Reply-To: <20230107035923.363-1-cuiyunhui@bytedance.com>

On Sat, Jan 07, 2023 at 11:59:23AM +0800, Yunhui Cui wrote:
> Add 2 tracepoints to monitor the tcp/udp traffic
> of per process and per cgroup.
> 
> Regarding monitoring the tcp/udp traffic of each process, there are two
> existing solutions, the first one is https://www.atoptool.nl/netatop.php.
> The second is via kprobe/kretprobe.
> 
> Netatop solution is implemented by registering the hook function at the
> hook point provided by the netfilter framework.
> 
> These hook functions may be in the soft interrupt context and cannot
> directly obtain the pid. Some data structures are added to bind packets
> and processes. For example, struct taskinfobucket, struct taskinfo ...
> 
> Every time the process sends and receives packets it needs multiple
> hashmaps,resulting in low performance and it has the problem fo inaccurate
> tcp/udp traffic statistics(for example: multiple threads share sockets).
> 
> We can obtain the information with kretprobe, but as we know, kprobe gets
> the result by trappig in an exception, which loses performance compared
> to tracepoint.
> 
> We compared the performance of tracepoints with the above two methods, and
> the results are as follows:
> 
> ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
> without trace:
> Time per request: 39.660 [ms] (mean)
> Time per request: 0.040 [ms] (mean, across all concurrent requests)
> 
> netatop:
> Time per request: 50.717 [ms] (mean)
> Time per request: 0.051 [ms] (mean, across all concurrent requests)
> 
> kr:
> Time per request: 43.168 [ms] (mean)
> Time per request: 0.043 [ms] (mean, across all concurrent requests)
> 
> tracepoint:
> Time per request: 41.004 [ms] (mean)
> Time per request: 0.041 [ms] (mean, across all concurrent requests
> 
> It can be seen that tracepoint has better performance.
> 
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>

Thanks for the numbers, they help a lot.

Acked-by: Cong Wang <cong.wang@bytedance.com>

A few minor issues below.


> ---
>  include/trace/events/sock.h | 49 +++++++++++++++++++++++++++++++++++++
>  net/socket.c                | 20 +++++++++++++--
>  2 files changed, 67 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
> index 777ee6cbe933..16092c0406a2 100644
> --- a/include/trace/events/sock.h
> +++ b/include/trace/events/sock.h
> @@ -263,6 +263,55 @@ TRACE_EVENT(inet_sk_error_report,
>  		  __entry->error)
>  );
>  
> +/*
> + * sock send/recv msg length
> + */
> +DECLARE_EVENT_CLASS(sock_msg_length,
> +
> +	TP_PROTO(struct sock *sk, __u16 family, __u16 protocol, int ret,
> +		 int flags),
> +
> +	TP_ARGS(sk, family, protocol, ret, flags),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, sk)
> +		__field(__u16, family)
> +		__field(__u16, protocol)
> +		__field(int, length)
> +		__field(int, error)
> +		__field(int, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->sk = sk;
> +		__entry->family = sk->sk_family;
> +		__entry->protocol = sk->sk_protocol;
> +		__entry->length = ret > 0 ? ret : 0;
> +		__entry->error = ret < 0 ? ret : 0;
> +		__entry->flags = flags;
> +	),
> +
> +	TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x",
> +			__entry->sk,
> +			show_family_name(__entry->family),
> +			show_inet_protocol_name(__entry->protocol),
> +			__entry->length,
> +			__entry->error, __entry->flags)

Please align and pack those parameters properly.


> +);
> +
> +DEFINE_EVENT(sock_msg_length, sock_sendmsg_length,
> +	TP_PROTO(struct sock *sk, __u16 family, __u16 protocol, int ret,
> +		 int flags),
> +
> +	TP_ARGS(sk, family, protocol, ret, flags)
> +);
> +
> +DEFINE_EVENT(sock_msg_length, sock_recvmsg_length,
> +	TP_PROTO(struct sock *sk, __u16 family, __u16 protocol, int ret,
> +		 int flags),
> +
> +	TP_ARGS(sk, family, protocol, ret, flags)
> +);

It might be a better idea to remove "msg" from the tracepoint names, in
case of any confusion with "sendpage". So
s/sock_sendmsg_length/sock_send_length/ ?


>  #endif /* _TRACE_SOCK_H */
>  
>  /* This part must be outside protection */
> diff --git a/net/socket.c b/net/socket.c
> index 888cd618a968..37578e8c685b 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -106,6 +106,7 @@
>  #include <net/busy_poll.h>
>  #include <linux/errqueue.h>
>  #include <linux/ptp_clock_kernel.h>
> +#include <trace/events/sock.h>
>  
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  unsigned int sysctl_net_busy_read __read_mostly;
> @@ -715,6 +716,9 @@ static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
>  				     inet_sendmsg, sock, msg,
>  				     msg_data_left(msg));
>  	BUG_ON(ret == -EIOCBQUEUED);
> +
> +	trace_sock_sendmsg_length(sock->sk, sock->sk->sk_family,
> +				  sock->sk->sk_protocol, ret, 0);
>  	return ret;
>  }
>  
> @@ -992,9 +996,16 @@ INDIRECT_CALLABLE_DECLARE(int inet6_recvmsg(struct socket *, struct msghdr *,
>  static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
>  				     int flags)
>  {
> -	return INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg,
> +	int ret = INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg,
>  				  inet_recvmsg, sock, msg, msg_data_left(msg),
>  				  flags);

Please adjust the indentations properly.

Thanks.

  reply	other threads:[~2023-01-07 20:02 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-07  3:59 [PATCH v3] sock: add tracepoint for send recv length Yunhui Cui
2023-01-07 20:02 ` Cong Wang [this message]
2023-01-08  1:34   ` [External] " 运辉崔
2023-01-08 21:34 ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7nP5PAGxWZ+2GHN@pop-os.localdomain \
    --to=xiyou.wangcong@gmail.com \
    --cc=cuiyunhui@bytedance.com \
    --cc=davem@davemloft.net \
    --cc=duanxiongchun@bytedance.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@amazon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).