From: Martin KaFai Lau <martin.lau@linux.dev>
To: Tiago Lam <tiagolam@cloudflare.com>
Cc: "David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Mykola Lysenko <mykolal@fb.com>,
Shuah Khan <shuah@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
Jakub Sitnicki <jakub@cloudflare.com>,
kernel-team@cloudflare.com
Subject: Re: [RFC PATCH 2/3] ipv6: Run a reverse sk_lookup on sendmsg.
Date: Tue, 24 Sep 2024 16:58:19 -0700
Message-ID: <0288caf4-3c9b-4eae-a2b4-f8934badc270@linux.dev>
In-Reply-To: <ZumrBKAkZX0RZrgm@GHGHG14>

On 9/17/24 6:15 PM, Tiago Lam wrote:
> On Fri, Sep 13, 2024 at 11:24:09AM -0700, Martin KaFai Lau wrote:
>> On 9/13/24 2:39 AM, Tiago Lam wrote:
>>> This follows the same rationale provided for the ipv4 counterpart, where
>>> it now runs a reverse socket lookup when source addresses and/or ports
>>> are changed, on sendmsg, to check whether egress traffic should be
>>> allowed to go through or not.
>>>
>>> As with ipv4, the ipv6 sendmsg path is also extended here to support the
>>> IPV6_ORIGDSTADDR ancillary message to be able to specify a source
>>> address/port.
>>>
>>> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
>>> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
>>> ---
>>> net/ipv6/datagram.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> net/ipv6/udp.c | 8 ++++--
>>> 2 files changed, 82 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
>>> index fff78496803d..4214dda1c320 100644
>>> --- a/net/ipv6/datagram.c
>>> +++ b/net/ipv6/datagram.c
>>> @@ -756,6 +756,27 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
>>> }
>>> EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
>>> +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
>>> + struct in6_addr *saddr, __be16 sport)
>>> +{
>>> + if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
>>> + (saddr && sport) &&
>>> + (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) || inet_sk(sk)->inet_sport != sport)) {
>>> + struct sock *sk_egress;
>>> +
>>> + bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr, fl6->fl6_dport,
>>> + saddr, ntohs(sport), 0, &sk_egress);
>>
>> iirc, in the ingress path, the sk could also be selected by a tc bpf prog
>> doing bpf_sk_assign. Then this re-run on sk_lookup may give an incorrect
>> result?
>>
>
> If it does give the incorrect result, we still fall back to the normal
> egress path.
>
>> In general, is it necessary to rerun any bpf prog if the user space has
>> specified the IP[v6]_ORIGDSTADDR?
>>
>
> More generally, wouldn't that also be the case if someone calls
> bpf_sk_assign() in both TC and sk_lookup on ingress? It can lead to some
> interference between the two.
>
> It seems like the interesting cases are:
> 1. Calling bpf_sk_assign() on both TC and sk_lookup ingress: if this
> happens sk_lookup on egress should match the correct socket when doing
> the reverse lookup;
> 2. Calling bpf_sk_assign() only on ingress TC: in this case it will
> depend on whether an sk_lookup program is attached or not:
> a. If not, there's no reverse lookup on egress either;
> b. But if yes, although the reverse sk_lookup here won't match the
> initial socket assigned at ingress TC, the packets will still fall back
> to the normal egress path;
>
> You're right in that case 2b above will continue with the same
> restrictions as before.

imo, all the cases you described above are a good signal that neither the TC
nor the BPF_PROG_TYPE_SK_LOOKUP program type is the right bpf prog to run here,
_if_ a bpf prog is indeed useful here.

I only followed some of the other discussion in v1 and v2. For now, I still
don't see how running a bpf prog is useful here to process the
IP[V6]_ORIGDSTADDR. Jakub Sitnicki and I discussed a similar point during LPC.

If a bpf prog is indeed needed to process a cmsg, it should work closer to
what Jakub Sitnicki proposed for getting the metadata during LPC (but I
believe the verdict there is also that a bpf prog is not needed). It should be a
bpf prog that can work in a more generic way to process any BPF-specific cmsg
and can do other operations in the future using kfuncs (e.g. a route lookup or
something). Saying yes/no to a particular local IP and port could be one of
the things that the bpf prog can do when processing the cmsg.
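
For readers following the quoted hunk, the condition under which the
proposed reverse_sk_lookup() would re-run the sk_lookup prog can be
modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct bound_sock, wants_reverse_lookup() and
self_check() are hypothetical names that do not exist in the kernel, the
bpf_sk_lookup_enabled static branch is reduced to a plain bool, and no
bpf prog is run. It mirrors only the guard of the quoted "if", i.e. a
source address and port were supplied and at least one of them differs
from what the socket is bound to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical userspace stand-in for the bound state of a UDP socket. */
struct bound_sock {
	struct in6_addr rcv_saddr; /* models sk->sk_v6_rcv_saddr */
	uint16_t sport_be;         /* models inet_sk(sk)->inet_sport (network order) */
};

/*
 * Models only the guard of the quoted reverse_sk_lookup(): the lookup is
 * attempted when sk_lookup is enabled, both a source address and a source
 * port were supplied, and either one differs from the socket's own.
 */
static bool wants_reverse_lookup(bool sk_lookup_enabled,
				 const struct bound_sock *sk,
				 const struct in6_addr *saddr,
				 uint16_t sport_be)
{
	if (!sk_lookup_enabled)
		return false;
	if (!saddr || !sport_be)
		return false;

	/* ipv6_addr_cmp() is a memcmp() over the 16-byte address. */
	return memcmp(&sk->rcv_saddr, saddr, sizeof(*saddr)) != 0 ||
	       sk->sport_be != sport_be;
}

/* Small usage example: same address/port skips the lookup, a new one does not. */
static void self_check(void)
{
	struct bound_sock sk = { .rcv_saddr = in6addr_loopback, .sport_be = 0x3500 };
	struct in6_addr other = in6addr_any;

	assert(!wants_reverse_lookup(true, &sk, &sk.rcv_saddr, sk.sport_be));
	assert(wants_reverse_lookup(true, &sk, &other, sk.sport_be));
}
```

Note that the guard says nothing about how the socket was selected on
ingress, which is why a socket assigned by a TC prog (case 2b in the
quoted text) can fail to match in the re-run.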