From: Yonghong Song <yhs@meta.com>
To: Kui-Feng Lee <sinquersw@gmail.com>,
Kui-Feng Lee <thinker.li@gmail.com>,
bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev,
song@kernel.org, kernel-team@meta.com, andrii@kernel.org,
daniel@iogearbox.net, yhs@fb.com, kpsingh@kernel.org,
shuah@kernel.org, john.fastabend@gmail.com, sdf@google.com,
mykolal@fb.com, linux-kselftest@vger.kernel.org,
jolsa@kernel.org, haoluo@google.com
Cc: Kui-Feng Lee <kuifeng@meta.com>
Subject: Re: [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress.
Date: Thu, 22 Jun 2023 11:28:05 -0700
Message-ID: <94226479-8d79-cc83-9ecf-6db0b376a7fd@meta.com>
In-Reply-To: <2693aaa4-eb33-553c-291c-3eb555452ea6@gmail.com>

On 6/22/23 10:15 AM, Kui-Feng Lee wrote:
>
>
> On 6/21/23 20:37, Yonghong Song wrote:
>>
>>
>> On 6/20/23 10:14 AM, Kui-Feng Lee wrote:
>>> Always call BPF filters if CGROUP BPF is enabled for EGRESS without
>>> checking skb->sk against sk.
>>>
>>> The filters were called only if skb is owned by the sock that the
>>> skb is sent out through. In other words, skb->sk should point to
>>> the sock that it is sending through its egress. However, the filters
>>> would miss SYNACK skbs that are owned by a request_sock but sent
>>> through the listening sock, that is, the socket listening for
>>> incoming connections. This is an unnecessary restriction.
>>
>> The original patch which introduced 'sk == skb->sk' is
>> 3007098494be cgroup: add support for eBPF programs
>> There is no mention in the commit message of why 'sk == skb->sk'
>> is needed. So it is possible that this was just a restriction
>> for the use cases at that moment. Now there are use cases
>> where 'sk != skb->sk', so removing this check can enable
>> the new use case. Maybe you can add this into your commit
>> message so people can understand the history of 'sk == skb->sk'.
>
> After checking the code and Alexei's comment[1] again, this check
> may be different from what I thought. In another post[2],
> Daniel Borkmann mentioned:
>
> Wouldn't that mean however, when you go through stacked devices that
> you'd run the same eBPF cgroup program for skb->sk multiple times?
>
> I read this paragraph several times.
> This check ensures the filters are only called for the device on
> the top of a stack. So, I probably should change the check to
>
> sk == skb_to_full_sk(skb)

I think this should work. It covers exactly your use case:
   they are owned by a request_sock but sent through
   the listening sock, that is the socket listening incoming connections
and it is still equivalent to sk == skb->sk for the non
request_sock/listening_sock case.

I originally wondered whether you could do
   sk == skb->sk || skb->sk->sk_state == TCP_NEW_SYN_RECV
but obviously your approach is better.
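
For illustration only, here is a minimal userspace sketch (not kernel code) of how the two conditions differ. The struct layouts, the rsk_listener pointer placed directly in struct sock, and the egress_check_current()/egress_check_proposed() helper names are simplifications assumed for this example. The only thing taken from the kernel is the idea behind sk_to_full_sk(): a sock in TCP_NEW_SYN_RECV state is mapped to its listener.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock-up: only the fields this example needs. */
enum { TCP_ESTABLISHED = 1, TCP_LISTEN = 10, TCP_NEW_SYN_RECV = 12 };

struct sock {
	int sk_state;
	struct sock *rsk_listener; /* assumption: set only for request socks */
};

struct sk_buff {
	struct sock *sk; /* owner of the skb */
};

/* Same idea as the kernel helper: map a request sock to its listener. */
static inline struct sock *sk_to_full_sk(struct sock *sk)
{
	if (sk && sk->sk_state == TCP_NEW_SYN_RECV)
		sk = sk->rsk_listener;
	return sk;
}

static inline struct sock *skb_to_full_sk(const struct sk_buff *skb)
{
	return sk_to_full_sk(skb->sk);
}

/* The condition as it is today: skb must be owned by the sending sock. */
static inline bool egress_check_current(struct sock *sk,
					const struct sk_buff *skb)
{
	return sk && sk == skb->sk;
}

/* The condition proposed above: compare against the full sock instead. */
static inline bool egress_check_proposed(struct sock *sk,
					 const struct sk_buff *skb)
{
	return sk && sk == skb_to_full_sk(skb);
}
```

With a listener in TCP_LISTEN and a request sock in TCP_NEW_SYN_RECV whose rsk_listener points at it, a SYNACK skb owned by the request sock and sent through the listener fails egress_check_current() but passes egress_check_proposed(). For an skb owned by an ordinary established sock, both checks agree.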
>
> instead of removing it. If we remove the check, egress filters
> could be called multiple times for a skb, just like what Daniel said.
>
> Does that make sense?
>
> [1]
> https://lore.kernel.org/all/CAADnVQKi0c=Mf3b=z43=b6n2xBVhwPw4QoV_au5+pFE29iLkaQ@mail.gmail.com/
> [2] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/
>
>>
>>>
>>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>>> ---
>>> include/linux/bpf-cgroup.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>>> index 57e9e109257e..e656da531f9f 100644
>>> --- a/include/linux/bpf-cgroup.h
>>> +++ b/include/linux/bpf-cgroup.h
>>> @@ -199,7 +199,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>>> #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb) \
>>> ({ \
>>> int __ret = 0; \
>>> - if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
>>> + if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) { \
>>> typeof(sk) __sk = sk_to_full_sk(sk); \
>>> if (sk_fullsock(__sk) && \
>>> cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS)) \
>>