From: Jakub Sitnicki <jakub@cloudflare.com>
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Marek Majkowski <marek@cloudflare.com>
Subject: Re: [PATCH bpf-next v3 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
Date: Tue, 07 Jul 2020 11:21:23 +0200
Message-ID: <87lfjvadf0.fsf@cloudflare.com>
In-Reply-To: <20200702092416.11961-3-jakub@cloudflare.com>
On Thu, Jul 02, 2020 at 11:24 AM CEST, Jakub Sitnicki wrote:
> Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
> BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
> when looking up a listening socket for a new connection request for
> connection-oriented protocols, or when looking up an unconnected socket for
> a packet for connection-less protocols.
>
> When called, the SK_LOOKUP BPF program can select a socket that will receive
> the packet. This serves as a mechanism to overcome the limits of what the
> bind() API allows one to express. Two use-cases driving this work are:
>
> (1) steer packets destined to an IP range, on a fixed port, to a socket
>
> 192.0.2.0/24, port 80 -> NGINX socket
>
> (2) steer packets destined to an IP address, on any port, to a socket
>
> 198.51.100.1, any port -> L7 proxy socket
>
> In its run-time context, the program receives information about the packet
> that triggered the socket lookup, namely the IP version, L4 protocol
> identifier, and address 4-tuple. The context can be extended further to
> include the ingress interface identifier.
>
> To select a socket, the BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, and calls the bpf_sk_assign(ctx, sk, ...)
> helper to record the selection. The transport layer then uses the selected
> socket as the result of the socket lookup.
>
> This patch only enables the user to attach an SK_LOOKUP program to a
> network namespace. Subsequent patches hook it up to run on the local delivery
> path in the IPv4 and IPv6 stacks.
>
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Notes:
>     v3:
>     - Allow bpf_sk_assign helper to replace previously selected socket only
>       when BPF_SK_LOOKUP_F_REPLACE flag is set, as a precaution against
>       multiple programs running in series accidentally overriding each
>       other's verdict.
>     - Let BPF program decide that load-balancing within a reuseport socket group
>       should be skipped for the socket selected with bpf_sk_assign() by passing
>       BPF_SK_LOOKUP_F_NO_REUSEPORT flag. (Martin)
>     - Extend struct bpf_sk_lookup program context with an 'sk' field containing
>       the selected socket, with the intention of letting multiple attached
>       programs running in series see each other's choices. However, the
>       verifier currently doesn't allow checking whether the pointer is set.
>     - Use bpf-netns infra for link-based multi-program attachment. (Alexei)
>     - Get rid of macros in convert_ctx_access to make it easier to read.
>     - Disallow 1- and 2-byte access to context fields containing IP addresses.
>
>     v2:
>     - Make bpf_sk_assign reject sockets that don't use RCU freeing.
>       Update bpf_sk_assign docs accordingly. (Martin)
>     - Change bpf_sk_assign proto to take PTR_TO_SOCKET as argument. (Martin)
>     - Fix broken build when CONFIG_INET is not selected. (Martin)
>     - Rename bpf_sk_lookup{} src_/dst_* fields remote_/local_*. (Martin)
>     - Enforce BPF_SK_LOOKUP attach point on load & attach. (Martin)
>
>  include/linux/bpf-netns.h  |   3 +
>  include/linux/bpf_types.h  |   2 +
>  include/linux/filter.h     |  19 ++++
>  include/uapi/linux/bpf.h   |  74 +++++++++++++++
>  kernel/bpf/net_namespace.c |   5 +
>  kernel/bpf/syscall.c       |   9 ++
>  net/core/filter.c          | 186 +++++++++++++++++++++++++++++++++++++
>  scripts/bpf_helpers_doc.py |   9 +-
>  8 files changed, 306 insertions(+), 1 deletion(-)
>
[...]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c796e141ea8e..286f90e0c824 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9219,6 +9219,192 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
>
> const struct bpf_prog_ops sk_reuseport_prog_ops = {
> };
> +
> +BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
> +	   struct sock *, sk, u64, flags)
> +{
> +	if (unlikely(flags & ~(BPF_SK_LOOKUP_F_REPLACE |
> +			       BPF_SK_LOOKUP_F_NO_REUSEPORT)))
> +		return -EINVAL;
> +	if (unlikely(sk_is_refcounted(sk)))
> +		return -ESOCKTNOSUPPORT; /* reject non-RCU freed sockets */
> +	if (unlikely(sk->sk_state == TCP_ESTABLISHED))
> +		return -ESOCKTNOSUPPORT; /* reject connected sockets */
> +
> +	/* Check if socket is suitable for packet L3/L4 protocol */
> +	if (sk->sk_protocol != ctx->protocol)
> +		return -EPROTOTYPE;
> +	if (sk->sk_family != ctx->family &&
> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
> +		return -EAFNOSUPPORT;
> +
> +	if (ctx->selected_sk && !(flags & BPF_SK_LOOKUP_F_REPLACE))
> +		return -EEXIST;
> +
> +	/* Select socket as lookup result */
> +	ctx->selected_sk = sk;
> +	ctx->no_reuseport = flags & BPF_SK_LOOKUP_F_NO_REUSEPORT;
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
> +	.func		= bpf_sk_lookup_assign,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_PTR_TO_SOCKET,
> +	.arg3_type	= ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +	switch (func_id) {
> +	case BPF_FUNC_sk_assign:
> +		return &bpf_sk_lookup_assign_proto;
> +	case BPF_FUNC_sk_release:
> +		return &bpf_sk_release_proto;
> +	default:
> +		return bpf_base_func_proto(func_id);
> +	}
> +}
> +
> +static bool sk_lookup_is_valid_access(int off, int size,
> +				      enum bpf_access_type type,
> +				      const struct bpf_prog *prog,
> +				      struct bpf_insn_access_aux *info)
> +{
> +	if (off < 0 || off >= sizeof(struct bpf_sk_lookup))
> +		return false;
> +	if (off % size != 0)
> +		return false;
> +	if (type != BPF_READ)
> +		return false;
> +
> +	switch (off) {
> +	case bpf_ctx_range(struct bpf_sk_lookup, family):
> +	case bpf_ctx_range(struct bpf_sk_lookup, protocol):
> +	case bpf_ctx_range(struct bpf_sk_lookup, remote_ip4):
> +	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
> +	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
> +	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
> +	case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
> +	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
> +		return size == sizeof(__u32);
> +
> +	case offsetof(struct bpf_sk_lookup, sk):
> +		info->reg_type = PTR_TO_SOCKET;
There's a bug here. The bpf_sk_lookup 'sk' field is initially NULL, so
reg_type should be PTR_TO_SOCKET_OR_NULL to tell the verifier that the
pointer may be NULL. Will fix in v4.
> +		return size == sizeof(__u64);
> +
> +	default:
> +		return false;
> +	}
> +}
> +
> +static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
> +					const struct bpf_insn *si,
> +					struct bpf_insn *insn_buf,
> +					struct bpf_prog *prog,
> +					u32 *target_size)
> +{
> +	struct bpf_insn *insn = insn_buf;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	int off;
> +#endif
> +
> +	switch (si->off) {
> +	case offsetof(struct bpf_sk_lookup, family):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, family) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, family));
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, protocol):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, protocol) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, protocol));
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, remote_ip4):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, v4.saddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, v4.saddr));
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, local_ip4):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, v4.daddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, v4.daddr));
> +		break;
> +
> +	case bpf_ctx_range_till(struct bpf_sk_lookup,
> +				remote_ip6[0], remote_ip6[3]):
> +#if IS_ENABLED(CONFIG_IPV6)
> +		BUILD_BUG_ON(sizeof_field(struct in6_addr, s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_sk_lookup, remote_ip6[0]);
> +		off += offsetof(struct in6_addr, s6_addr32[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, v6.saddr));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> +#else
> +		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +#endif
> +		break;
> +
> +	case bpf_ctx_range_till(struct bpf_sk_lookup,
> +				local_ip6[0], local_ip6[3]):
> +#if IS_ENABLED(CONFIG_IPV6)
> +		BUILD_BUG_ON(sizeof_field(struct in6_addr, s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_sk_lookup, local_ip6[0]);
> +		off += offsetof(struct in6_addr, s6_addr32[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, v6.daddr));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> +#else
> +		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +#endif
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, remote_port):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, sport) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, sport));
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, local_port):
> +		BUILD_BUG_ON(sizeof_field(struct bpf_sk_lookup_kern, dport) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, dport));
> +		break;
> +
> +	case offsetof(struct bpf_sk_lookup, sk):
> +		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sk_lookup_kern, selected_sk));
> +		break;
> +	}
> +
> +	return insn - insn_buf;
> +}
> +
> +const struct bpf_prog_ops sk_lookup_prog_ops = {
> +};
> +
> +const struct bpf_verifier_ops sk_lookup_verifier_ops = {
> +	.get_func_proto		= sk_lookup_func_proto,
> +	.is_valid_access	= sk_lookup_is_valid_access,
> +	.convert_ctx_access	= sk_lookup_convert_ctx_access,
> +};
> +
> #endif /* CONFIG_INET */
>
> DEFINE_BPF_DISPATCHER(xdp)
[...]