Re: [PATCH bpf-next] bpf: add sock_ops callbacks for data send/recv/acked events

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Philo Lu <lulie@linux.alibaba.com>
Cc: xuanzhuo@linux.alibaba.com, dust.li@linux.alibaba.com,
	alibuda@linux.alibaba.com, guwen@linux.alibaba.com,
	hengqi@linux.alibaba.com, edumazet@google.com,
	davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	song@kernel.org, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, dsahern@kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next] bpf: add sock_ops callbacks for data send/recv/acked events
Date: Thu, 30 Nov 2023 10:13:07 -0800	[thread overview]
Message-ID: <ab31efa1-d6b9-499d-a735-0852ed036a2f@linux.dev> (raw)
In-Reply-To: <1bcd4871-7403-41d9-8ae6-4df4878d9275@linux.alibaba.com>

On 11/29/23 2:05 AM, Philo Lu wrote:
> 
> On 2023/11/29 08:33, Martin KaFai Lau wrote:
>> On 11/23/23 4:37 AM, Philo Lu wrote:
>>> Sorry, I forgot to cc the maintainers.
>>>
>>> On 2023/11/23 11:07, Philo Lu wrote:
>>>> Add 3 sock_ops operators, namely BPF_SOCK_OPS_DATA_SEND_CB,
>>>> BPF_SOCK_OPS_DATA_RECV_CB, and BPF_SOCK_OPS_DATA_ACKED_CB. A flag
>>>> BPF_SOCK_OPS_DATA_EVENT_CB_FLAG is provided to minimize the performance
>>>> impact. The flag must be explicitly set to enable these callbacks.
>>>>
>>>> If the flag is enabled, bpf sock_ops program will be called every time a
>>>> tcp data packet is sent, received, and acked.
>>>> BPF_SOCK_OPS_DATA_SEND_CB: call bpf after a data packet is sent.
>>>> BPF_SOCK_OPS_DATA_RECV_CB: call bpf after a data packet is receviced.
>>>> BPF_SOCK_OPS_DATA_ACKED_CB: call bpf after a valid ack packet is
>>>> processed (some sent data are ackknowledged).
>>>>
>>>> We use these callbacks for fine-grained tcp monitoring, which collects
>>>> and analyses every tcp request/response event information. The whole
>>>> system has been described in SIGMOD'18 (see
>>>> https://dl.acm.org/doi/pdf/10.1145/3183713.3190659 for details). To
>>>> achieve this with bpf, we require hooks for data events that call
>>>> sock_ops bpf (1) when any data packet is sent/received/acked, and (2)
>>>> after critical tcp state variables have been updated (e.g., snd_una,
>>>> snd_nxt, rcv_nxt). However, existing sock_ops operators cannot meet our
>>>> requirements.
>>>>
>>>> Besides, these hooks also help to debug tcp when data send/recv/acked.
>>
>> This all sounds like a tracing use case. Why tracepoint is not used instead?
> 
> Yes, our use case is pure tracing. We add hooks to sockops because we also use
> other ops like BPF_SOCK_OPS_STATE_CB. Thus, sockops seems a natural solution
> for us.

There is also an existing trace_inet_sock_set_state() tracepoint for tracking 
the state change. There are other existing tracepoints in 
include/trace/events/tcp.h for tcp perf monitoring/analysis purpose (e.g. 
trace_tcp_retransmit_skb). All it needs is read-only access to sk and the 
purpose is for tcp perf monitoring/analysis. If a hook is needed here 
(cgroup-bpf or tracepoint), I would think it is better to supplement the 
existing tcp tracepoints which were also added to do tcp monitoring.

I suspect the fexit bpf prog may also work because the fexit bpf prog is called 
after the traced kernel function is called. However, the kernel functions may 
get inlined and the tracepoint will still be needed. May be the netdev 
maintainer can chime in here regarding the tracepoint additions.

> 
> We can also use tracepoint (with sockops) instead. So we think which to use
> depends on your opinions. Many thanks.
> 
>

     prev parent reply	other threads:[~2023-11-30 18:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20231123030732.111576-1-lulie@linux.alibaba.com>
2023-11-23 12:37 ` [PATCH bpf-next] bpf: add sock_ops callbacks for data send/recv/acked events Philo Lu
2023-11-24  9:47   ` Daniel Borkmann
2023-11-29 10:05     ` Philo Lu
2023-11-29  0:33   ` Martin KaFai Lau
2023-11-29 10:05     ` Philo Lu
2023-11-30 18:13       ` Martin KaFai Lau [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab31efa1-d6b9-499d-a735-0852ed036a2f@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alibuda@linux.alibaba.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=dust.li@linux.alibaba.com \
    --cc=edumazet@google.com \
    --cc=guwen@linux.alibaba.com \
    --cc=haoluo@google.com \
    --cc=hengqi@linux.alibaba.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=lulie@linux.alibaba.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).