BPF List
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: zijianzhang@bytedance.com, Amery Hung <amery.hung@bytedance.com>,
	bpf@vger.kernel.org, edumazet@google.com, davem@davemloft.net,
	kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	eddyz87@gmail.com, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, mykolal@fb.com,
	shuah@kernel.org, wangdongdong.6@bytedance.com,
	zhoufeng.zf@bytedance.com
Subject: Re: [PATCH bpf-next v2 1/2] bpf: tcp: prevent bpf_reserve_hdr_opt() from growing skb larger than MTU
Date: Thu, 29 Aug 2024 17:20:07 -0700	[thread overview]
Message-ID: <84cf2784-49af-481f-9b43-901f7aeeaef8@linux.dev> (raw)
In-Reply-To: <ZtCl3kQrldshCFam@pop-os.localdomain>

On 8/29/24 9:46 AM, Cong Wang wrote:
> On Wed, Aug 28, 2024 at 02:29:17PM -0700, Martin KaFai Lau wrote:
>> On 8/26/24 6:37 PM, zijianzhang@bytedance.com wrote:
>>> From: Amery Hung <amery.hung@bytedance.com>
>>>
>>> This series prevents sockops users from accidentally causing packet
>>> drops. This can happen when a BPF_SOCK_OPS_HDR_OPT_LEN_CB program
>>> reserves different option lengths in tcp_sendmsg().
>>>
>>> Initially, sockops BPF_SOCK_OPS_HDR_OPT_LEN_CB program will be called to
>>> reserve a space in tcp_send_mss(), which will return the MSS for TSO.
>>> Then, BPF_SOCK_OPS_HDR_OPT_LEN_CB will be called in __tcp_transmit_skb()
>>> again to calculate the actual tcp_option_size and skb_push() the total
>>> header size.
>>>
>>> skb->gso_size is restored from TCP_SKB_CB(skb)->tcp_gso_size, which is
>>> derived from tcp_send_mss() where we first call HDR_OPT_LEN. If the
>>> reserved opt size is smaller than the actual header size, the len of the
>>> skb can exceed the MTU. As a result, ip(6)_fragment will drop the
>>> packet if skb->ignore_df is not set.
>>>
>>> To prevent this accidental packet drop, we need to make sure the
>>> second call to the BPF_SOCK_OPS_HDR_OPT_LEN_CB program reserves space
>>> not more than the first time.
>>
>> iiuc, it is a bug in the bpf prog itself that did not reserve the same
>> header length and caused a drop. It is not the only drop case though for an
>> incorrect bpf prog. There are other cases where a bpf prog can accidentally
>> drop a packet.
> 
> But safety is the most important thing for eBPF programs, do we really
> allow this kind of bug to happen in eBPF programs?
> 
>>
>> Do you have an actual use case that the bpf prog cannot reserve the correct
>> header length for the same sk ?
> 
> You can think of it as a simple call of bpf_get_prandom_u32():
> 
> SEC("sockops")
> int bpf_sock_ops_cb(struct bpf_sock_ops *skops)
> {
>      if (skops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB) {
>          return bpf_get_prandom_u32();

It compiles but this won't even pass the verifier.

If you want to go to this extreme to consider any random bpf program, it is 
essentially deploying a fuzzer to the production traffic, right? Sure, syzbot 
has programs that are rejected by verifier or cause a packet drop. afaik, I 
don't recall syzbot reported them as a crash.

Is a drop a safety issue as you said? I don't think so. It is why this set is 
tagged as bpf-next also and I agree. What is the hurry here?

Is it something that can be improved? may be. Note that the patchwork status has 
not been changed yet. Instead of wasting time here, please allow Zijian (/Amery) 
to continue the discussion on another thread. I have asked question on the 
changes and also suggested other ways.


>      }
>      return 0;
> }
> 
> And eBPF programs are stateful anyway, at least we should not assume
> it is stateless since maps are commonly used. Therefore, different invocations
> of a same eBPF program are expected to return different values. IMHO,
> this is a situation we have to deal with in the kernel, hence stricter
> checks are reasonable and necessary.

  reply	other threads:[~2024-08-30  0:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-27  1:37 [PATCH bpf-next v2 0/2] prevent bpf_reserve_hdr_opt() from growing skb larger than MTU zijianzhang
2024-08-27  1:37 ` [PATCH bpf-next v2 1/2] bpf: tcp: " zijianzhang
2024-08-28 21:29   ` Martin KaFai Lau
2024-08-28 23:01     ` Zijian Zhang
2024-08-29  1:00       ` Martin KaFai Lau
2024-08-30 21:02         ` Zijian Zhang
2024-09-03 22:38           ` Martin KaFai Lau
2024-09-05 18:19             ` Zijian Zhang
2024-09-05 19:38               ` Martin KaFai Lau
2024-09-05 20:20                 ` Zijian Zhang
2024-09-05 21:07                   ` Martin KaFai Lau
2024-08-29 16:46     ` Cong Wang
2024-08-30  0:20       ` Martin KaFai Lau [this message]
2024-08-27  1:37 ` [PATCH bpf-next v2 2/2] bpf: selftests: reserve smaller tcp header options than the actual size zijianzhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84cf2784-49af-481f-9b43-901f7aeeaef8@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=amery.hung@bytedance.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=mykolal@fb.com \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=wangdongdong.6@bytedance.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yonghong.song@linux.dev \
    --cc=zhoufeng.zf@bytedance.com \
    --cc=zijianzhang@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox