Re: [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress.

Linux Kernel Selftest development

From: Kui-Feng Lee <sinquersw@gmail.com>
To: Yonghong Song <yhs@meta.com>, Kui-Feng Lee <thinker.li@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev,
	song@kernel.org, kernel-team@meta.com, andrii@kernel.org,
	daniel@iogearbox.net, yhs@fb.com, kpsingh@kernel.org,
	shuah@kernel.org, john.fastabend@gmail.com, sdf@google.com,
	mykolal@fb.com, linux-kselftest@vger.kernel.org,
	jolsa@kernel.org, haoluo@google.com
Cc: Kui-Feng Lee <kuifeng@meta.com>
Subject: Re: [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress.
Date: Thu, 22 Jun 2023 10:15:58 -0700
Message-ID: <2693aaa4-eb33-553c-291c-3eb555452ea6@gmail.com>
In-Reply-To: <4d46ba3a-61e9-2482-a359-7a8805f1dbc8@meta.com>



On 6/21/23 20:37, Yonghong Song wrote:
> 
> 
> On 6/20/23 10:14 AM, Kui-Feng Lee wrote:
>> Always call BPF filters if CGROUP BPF is enabled for EGRESS without
>> checking skb->sk against sk.
>>
>> The filters were called only if the skb is owned by the sock that the
>> skb is sent out through.  In other words, skb->sk should point to
>> the sock that it is sending through its egress.  However, the filters
>> would miss SYNACK skbs that are owned by a request_sock but sent
>> through the listening sock, that is, the socket listening for incoming
>> connections.  This is an unnecessary restriction.
> 
> The original patch which introduced 'sk == skb->sk' is
>    3007098494be  cgroup: add support for eBPF programs
> There is no mention in the commit message of why 'sk == skb->sk'
> is needed. So it is possible that the check just reflected the
> use cases at that time. Now there are use cases
> where 'sk != skb->sk', so removing this check can enable
> the new use case. Maybe you can add this to your commit
> message so people can understand the history of 'sk == skb->sk'.

After rechecking the code and Alexei's comment [1], I think this check
may serve a different purpose than I thought.  In another post [2],
Daniel Borkmann wrote:

     Wouldn't that mean however, when you go through stacked devices that
     you'd run the same eBPF cgroup program for skb->sk multiple times?

I read this paragraph several times.  This check ensures that the
filters are called only for the device at the top of a stack.  So I
should probably change the check to

     sk == skb_to_full_sk(skb)

instead of removing it.  If we remove the check, the egress filters
could be called multiple times for an skb, just as Daniel said.

Does that make sense?

[1] https://lore.kernel.org/all/CAADnVQKi0c=Mf3b=z43=b6n2xBVhwPw4QoV_au5+pFE29iLkaQ@mail.gmail.com/
[2] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/

> 
>>
>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>> ---
>>   include/linux/bpf-cgroup.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>> index 57e9e109257e..e656da531f9f 100644
>> --- a/include/linux/bpf-cgroup.h
>> +++ b/include/linux/bpf-cgroup.h
>> @@ -199,7 +199,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>>   #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)                   \
>>   ({                                           \
>>       int __ret = 0;                                   \
>> -    if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
>> +    if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) {               \
>>           typeof(sk) __sk = sk_to_full_sk(sk);                   \
>>           if (sk_fullsock(__sk) &&                       \
>>               cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS))           \
>

Thread overview: 13+ messages
2023-06-20 17:14 [PATCH bpf-next v3 0/2] Fix missing synack in BPF cgroup_skb filters Kui-Feng Lee
2023-06-20 17:14 ` [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress Kui-Feng Lee
2023-06-22  3:37   ` Yonghong Song
2023-06-22 15:34     ` Kui-Feng Lee
2023-06-22 17:15     ` Kui-Feng Lee [this message]
2023-06-22 18:28       ` Yonghong Song
2023-06-22 20:06         ` Daniel Borkmann
2023-06-22 23:55           ` Kui-Feng Lee
2023-06-23  8:50             ` Daniel Borkmann
2023-06-23 16:30               ` Kui-Feng Lee
2023-06-20 17:14 ` [PATCH bpf-next v3 2/2] selftests/bpf: Verify that the cgroup_skb filters receive expected packets Kui-Feng Lee
2023-06-22  4:15   ` Yonghong Song
2023-06-22 15:33     ` Kui-Feng Lee
