From: Jakub Sitnicki <jakub@cloudflare.com>
To: David Laight <David.Laight@ACULAB.COM>,
Martin KaFai Lau <martin.lau@linux.dev>,
'Tiago Lam' <tiagolam@cloudflare.com>,
Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"kernel-team@cloudflare.com" <kernel-team@cloudflare.com>
Subject: Re: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
Date: Mon, 23 Sep 2024 16:56:14 +0200 [thread overview]
Message-ID: <87r09a771t.fsf@cloudflare.com> (raw)
In-Reply-To: <855fc71343a149479c7da96438bf9e32@AcuMS.aculab.com> (David Laight's message of "Mon, 23 Sep 2024 13:08:36 +0000")
On Mon, Sep 23, 2024 at 01:08 PM GMT, David Laight wrote:
> From: Tiago Lam <tiagolam@cloudflare.com>
[...]
>> To limit its usage, a reverse socket lookup is performed to check if the
>> configured egress source address and/or port have any ingress sk_lookup
>> match. If it does, traffic is allowed to proceed, otherwise it falls
>> back to the regular egress path.
>
> Is that really useful/necessary?
We've been asking ourselves the same question during Plumbers with
Martin.
Unprivileges processes can already source UDP traffic from (almost) any
IP & port by binding a socket to the desired source port and passing
IP_PKTINFO. So perhaps having a reverse socket lookup is an overkill.
We should probably respect net.ipv4.ip_local_reserved_ports and
net.ipv4.ip_unprivileged_port_start system settings, though, or check
for relevant caps.
Open question if it is acceptable to disregard exclusive UDP port
ownership by sockets binding to a wildcard address without SO_REUSEADDR?
[...]
> The check (but not the commit message) implies that some 'bpf thingy'
> also needs to be enabled.
> Any check would need to include the test that the program sending the packet
> has the ability to send a packet through the ingress socket.
> Additionally a check for the sending process having (IIRC) CAP_NET_ADMIN
> (which would let the process send the message by other means) would save the
> slow path.
>
> The code we have sends a lot of UDP RTP (typically 160 bytes of audio every 20ms).
> There is actually no reason for there to be a valid matching ingress path.
> (That code would benefit from being able to bind a lot of ports to the same
> UDP socket.)
>
> David
>
>>
>> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
>> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
>> ---
>> net/ipv6/datagram.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> net/ipv6/udp.c | 8 ++++--
>> 2 files changed, 85 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
>> index fff78496803d..369c64a478ec 100644
>> --- a/net/ipv6/datagram.c
>> +++ b/net/ipv6/datagram.c
>> @@ -756,6 +756,29 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
>> }
>> EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
>>
>> +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
>> + struct in6_addr *saddr, __be16 sport)
>> +{
>> + if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
>> + (saddr && sport) &&
>> + (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) ||
>> + inet_sk(sk)->inet_sport != sport)) {
>> + struct sock *sk_egress;
>> +
>> + bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr,
>> + fl6->fl6_dport, saddr, ntohs(sport), 0,
>> + &sk_egress);
>> + if (!IS_ERR_OR_NULL(sk_egress) && sk_egress == sk)
>> + return true;
>> +
>> + net_info_ratelimited("No reverse socket lookup match for local addr %pI6:%d remote addr
>> %pI6:%d\n",
>> + &saddr, ntohs(sport), &fl6->daddr,
>> + ntohs(fl6->fl6_dport));
>> + }
>> +
>> + return false;
>> +}
>> +
>> int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>> struct msghdr *msg, struct flowi6 *fl6,
>> struct ipcm6_cookie *ipc6)
>> @@ -844,7 +867,63 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>>
>> break;
>> }
>> + case IPV6_ORIGDSTADDR:
>> + {
>> + struct sockaddr_in6 *sockaddr_in;
>> + struct net_device *dev = NULL;
>> +
>> + if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct sockaddr_in6))) {
>> + err = -EINVAL;
>> + goto exit_f;
>> + }
>> +
>> + sockaddr_in = (struct sockaddr_in6 *)CMSG_DATA(cmsg);
>> +
>> + addr_type = __ipv6_addr_type(&sockaddr_in->sin6_addr);
>> +
>> + if (addr_type & IPV6_ADDR_LINKLOCAL)
>> + return -EINVAL;
>> +
>> + /* If we're egressing with a different source address
>> + * and/or port, we perform a reverse socket lookup. The
>> + * rationale behind this is that we can allow return
>> + * UDP traffic that has ingressed through sk_lookup to
>> + * also egress correctly. In case the reverse lookup
>> + * fails, we continue with the normal path.
>> + *
>> + * The lookup is performed if either source address
>> + * and/or port changed, and neither is "0".
>> + */
>> + if (reverse_sk_lookup(fl6, sk, &sockaddr_in->sin6_addr,
>> + sockaddr_in->sin6_port)) {
>> + /* Override the source port and address to use
>> + * with the one we got in cmsg and bail early.
>> + */
>> + fl6->saddr = sockaddr_in->sin6_addr;
>> + fl6->fl6_sport = sockaddr_in->sin6_port;
>> + break;
>> + }
>>
>> + if (addr_type != IPV6_ADDR_ANY) {
>> + int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
>> +
>> + if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
>> + !ipv6_chk_addr_and_flags(net,
>> + &sockaddr_in->sin6_addr,
>> + dev, !strict, 0,
>> + IFA_F_TENTATIVE) &&
>> + !ipv6_chk_acast_addr_src(net, dev,
>> + &sockaddr_in->sin6_addr))
>> + err = -EINVAL;
>> + else
>> + fl6->saddr = sockaddr_in->sin6_addr;
>> + }
>> +
>> + if (err)
>> + goto exit_f;
>> +
>> + break;
>> + }
>> case IPV6_FLOWINFO:
>> if (cmsg->cmsg_len < CMSG_LEN(4)) {
>> err = -EINVAL;
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 6602a2e9cdb5..6121cbb71ad3 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -1476,6 +1476,12 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>
>> fl6->flowi6_uid = sk->sk_uid;
>>
>> + /* We use fl6's daddr and fl6_sport in the reverse sk_lookup done
>> + * within ip6_datagram_send_ctl() now.
>> + */
>> + fl6->daddr = *daddr;
>> + fl6->fl6_sport = inet->inet_sport;
>> +
>> if (msg->msg_controllen) {
>> opt = &opt_space;
>> memset(opt, 0, sizeof(struct ipv6_txoptions));
>> @@ -1511,10 +1517,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>
>> fl6->flowi6_proto = sk->sk_protocol;
>> fl6->flowi6_mark = ipc6.sockc.mark;
>> - fl6->daddr = *daddr;
>> if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
>> fl6->saddr = np->saddr;
>> - fl6->fl6_sport = inet->inet_sport;
>>
>> if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
>> err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
>>
>> --
>> 2.34.1
>>
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
next prev parent reply other threads:[~2024-09-23 14:56 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
2024-09-21 9:12 ` Willem de Bruijn
2024-09-22 16:23 ` ericnetdev dumazet
2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
2024-09-21 9:13 ` Willem de Bruijn
2024-09-23 13:08 ` David Laight
2024-09-23 14:56 ` Jakub Sitnicki [this message]
2024-09-23 15:45 ` David Laight
2024-09-23 16:44 ` Jakub Sitnicki
2024-09-20 17:02 ` [RFC PATCH v2 3/3] bpf: Add sk_lookup test to use ORIGDSTADDR cmsg Tiago Lam
2024-09-23 14:34 ` [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Jakub Sitnicki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r09a771t.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=David.Laight@ACULAB.COM \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kernel-team@cloudflare.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mykolal@fb.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=tiagolam@cloudflare.com \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.