From: Tiago Lam <tiagolam@cloudflare.com>
To: "David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
Jakub Sitnicki <jakub@cloudflare.com>,
Tiago Lam <tiagolam@cloudflare.com>,
kernel-team@cloudflare.com
Subject: [RFC PATCH 1/3] ipv4: Run a reverse sk_lookup on sendmsg.
Date: Fri, 13 Sep 2024 10:39:19 +0100
Message-ID: <20240913-reverse-sk-lookup-v1-1-e721ea003d4c@cloudflare.com>
In-Reply-To: <20240913-reverse-sk-lookup-v1-0-e721ea003d4c@cloudflare.com>

In order to check if egress traffic should be allowed through, we run a
reverse socket lookup (i.e. a normal socket lookup with the src/dst
addresses and ports reversed) to check if the corresponding ingress
traffic is allowed in. Thus, if the reverse sk_lookup call returns a
socket that matches the egress socket, we also let the egress traffic
through, following the principle of allowing return traffic to proceed
if ingress traffic is allowed in. The reverse lookup is only performed
if an sk_lookup eBPF program is attached and the source address and/or
port for the return traffic have been modified.
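
As an illustration only, the decision described above can be modelled in
plain userspace C. This is a sketch, not the kernel code: struct egress_req,
needs_reverse_lookup() and source_override_allowed() are hypothetical names
invented for this example, and the socket cookie comparison assumes the
lookup result is identified the same way as in the udp_sendmsg() hunk of
this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the check; these names are not kernel API. */
struct egress_req {
	bool sk_lookup_attached;	/* stands in for bpf_sk_lookup_enabled */
	bool connected;			/* socket is connected */
	uint32_t sock_saddr;		/* source address bound to the socket */
	uint16_t sock_sport;		/* source port bound to the socket */
	uint32_t cmsg_saddr;		/* source address from the cmsg, 0 if unset */
	uint16_t cmsg_sport;		/* source port from the cmsg, 0 if unset */
};

/*
 * True when a reverse lookup would be attempted: an sk_lookup program is
 * attached, the socket is unconnected, both cmsg values are non-zero, and
 * at least one of them differs from what the socket is bound to.
 */
bool needs_reverse_lookup(const struct egress_req *r)
{
	if (!r->sk_lookup_attached || r->connected)
		return false;
	if (!r->cmsg_saddr || !r->cmsg_sport)
		return false;
	return r->sock_saddr != r->cmsg_saddr ||
	       r->sock_sport != r->cmsg_sport;
}

/*
 * The source override is accepted only if the reverse lookup selected a
 * socket and that socket is the one doing the sending.
 */
bool source_override_allowed(bool found, uint64_t found_cookie,
			     uint64_t sender_cookie)
{
	return found && found_cookie == sender_cookie;
}
```

In this model, a request with cmsg_sport == 0 never triggers the lookup,
which matches the "neither is 0" condition in the code comment of the hunk.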

The src address and port can be modified by using ancillary messages.
Up until now, it was possible to specify a different source address to
sendmsg by providing it in an IP_PKTINFO ancillary message, but there
was no way to change the source port. This patch also extends the
ancillary messages supported by sendmsg to include IP_ORIGDSTADDR,
reusing the same cmsg and struct used in recvmsg, which already
supports specifying a port.
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
---
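For reference, a minimal userspace sketch of how a sender might attach the
IP_ORIGDSTADDR ancillary message described in the commit message. The helper
set_origdst_cmsg() is a hypothetical name used only for this example and is
not part of the patch; the fallback define assumes the value 20 from the
Linux UAPI header <linux/in.h>.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef IP_ORIGDSTADDR
#define IP_ORIGDSTADDR 20	/* value from the Linux UAPI header <linux/in.h> */
#endif

/*
 * Hypothetical helper (not part of the patch): attach an IP_ORIGDSTADDR
 * control message carrying the desired source address and port to msg.
 * buf must be suitably aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(struct sockaddr_in)) bytes.
 * Returns 0 on success, -1 if buf is too small.
 */
int set_origdst_cmsg(struct msghdr *msg, void *buf, size_t buflen,
		     const struct sockaddr_in *src)
{
	struct cmsghdr *cmsg;

	if (buflen < CMSG_SPACE(sizeof(*src)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(*src));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IP;	/* same value as SOL_IP on Linux */
	cmsg->cmsg_type = IP_ORIGDSTADDR;
	/* The ip_cmsg_send() hunk in this patch requires exactly this length. */
	cmsg->cmsg_len = CMSG_LEN(sizeof(*src));
	memcpy(CMSG_DATA(cmsg), src, sizeof(*src));
	return 0;
}
```

A caller would then fill in msg_name with the peer address and msg_iov with
the payload, and call sendmsg() on an unconnected UDP socket. Kernels without
this patch are not expected to accept this cmsg type on sendmsg.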
include/net/ip.h | 1 +
net/ipv4/ip_sockglue.c | 11 +++++++++++
net/ipv4/udp.c | 33 ++++++++++++++++++++++++++++++++-
3 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index c5606cadb1a5..e5753abd7247 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -75,6 +75,7 @@ static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
struct ipcm_cookie {
struct sockcm_cookie sockc;
__be32 addr;
+ __be16 port;
int oif;
struct ip_options_rcu *opt;
__u8 protocol;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index cf377377b52d..6e55bd25b5f7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -297,6 +297,17 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
ipc->addr = info->ipi_spec_dst.s_addr;
break;
}
+ case IP_ORIGDSTADDR:
+ {
+ struct sockaddr_in *dst_addr;
+
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct sockaddr_in)))
+ return -EINVAL;
+ dst_addr = (struct sockaddr_in *)CMSG_DATA(cmsg);
+ ipc->port = dst_addr->sin_port;
+ ipc->addr = dst_addr->sin_addr.s_addr;
+ break;
+ }
case IP_TTL:
if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
return -EINVAL;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 49c622e743e8..b9dc0a88b0c6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1060,6 +1060,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
struct flowi4 fl4_stack;
struct flowi4 *fl4;
+ __u8 flow_flags = inet_sk_flowi_flags(sk);
int ulen = len;
struct ipcm_cookie ipc;
struct rtable *rt = NULL;
@@ -1179,6 +1180,37 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
}
+ /* If we're egressing with a different source address and/or port, we
+ * perform a reverse socket lookup. The rationale behind this is that we
+ * can allow return UDP traffic that has ingressed through sk_lookup to
+ * also egress correctly. If the reverse lookup fails, we only log it.
+ *
+ * The lookup is performed if the source address and/or port changed, and
+ * neither is "0".
+ */
+ if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+ !connected &&
+ (ipc.port && ipc.addr) &&
+ (inet->inet_saddr != ipc.addr || inet->inet_sport != ipc.port)) {
+ struct sock *sk_egress;
+
+ bpf_sk_lookup_run_v4(sock_net(sk), IPPROTO_UDP,
+ daddr, dport, ipc.addr, ntohs(ipc.port), 1, &sk_egress);
+ if (IS_ERR_OR_NULL(sk_egress) ||
+ atomic64_read(&sk_egress->sk_cookie) != atomic64_read(&sk->sk_cookie)) {
+ net_info_ratelimited("No reverse socket lookup match for local addr %pI4:%d remote addr %pI4:%d\n",
+ &ipc.addr, ntohs(ipc.port), &daddr, ntohs(dport));
+ } else {
+ /* Override the source port with the one we got in the cmsg,
+ * and tell routing to let us use a non-local address. Otherwise
+ * route lookups will fail with non-local source address when
+ * IP_TRANSPARENT isn't set.
+ */
+ inet->inet_sport = ipc.port;
+ flow_flags |= FLOWI_FLAG_ANYSRC;
+ }
+ }
+
saddr = ipc.addr;
ipc.addr = faddr = daddr;
@@ -1223,7 +1255,6 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (!rt) {
struct net *net = sock_net(sk);
- __u8 flow_flags = inet_sk_flowi_flags(sk);
fl4 = &fl4_stack;
--
2.34.1