[PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
@ 2024-03-26  2:21 Yonghong Song
  2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

One of our internal services started to use sk_msg program and currently
it used existing prog attach/detach2 as demonstrated in selftests.
But attach/detach of all other bpf programs are based on bpf_link.
Consistent attach/detach APIs for all programs will make things easy to
undersand and less error prone. So this patch added bpf_link
support for BPF_PROG_TYPE_SK_MSG. Based on comments from
previous RFC patch, I added BPF_PROG_TYPE_SK_SKB support as well
as both program types have similar treatment w.r.t. bpf_link
handling.

For the patch series, patch 1 added kernel support. Patch 2
added libbpf support. Patch 3 added bpftool support and
patches 4/5 added some new tests.

Changelogs:
  v2 -> v3:
    - consolidate link types of sk_msg and sk_skb to
      a single link type BPF_PROG_TYPE_SOCKMAP.
    - fix bpf_link lifecycle issue. in v2, after bpf_link
      is attached, a subsequent prog_attach could change
      that bpf_link. this patch makes bpf_link having
      correct behavior such that it won't go away facing
      other prog/link attach for the same map and the same
      attach type.

Yonghong Song (5):
  bpf: Add bpf_link support for sk_msg and sk_skb progs
  libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
  bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP
  selftests/bpf: Refactor out helper functions for a few tests
  selftests/bpf: Add some tests with new bpf_program__attach_sockmap()
    APIs

 include/linux/bpf.h                           |   6 +
 include/linux/skmsg.h                         |   4 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/syscall.c                          |   4 +
 net/core/sock_map.c                           | 263 +++++++++++++++++-
 tools/bpf/bpftool/link.c                      |   9 +
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/lib/bpf/libbpf.c                        |   7 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 128 ++++++++-
 .../selftests/bpf/prog_tests/sockmap_listen.c |  38 +++
 .../bpf/progs/test_skmsg_load_helpers.c       |  15 +-
 .../bpf/progs/test_sockmap_pass_prog.c        |  11 +-
 .../progs/test_sockmap_skb_verdict_attach.c   |   2 +-
 15 files changed, 473 insertions(+), 27 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
@ 2024-03-26  2:21 ` Yonghong Song
  2024-04-02 17:39   ` Eduard Zingerman
  2024-04-02 17:45   ` Andrii Nakryiko
  2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so user
space can have a uniform handling with bpf_link based libbpf
APIs. Using bpf_link based libbpf API also has a benefit which
makes system robust by decoupling prog life cycle and
attachment life cycle.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h            |   6 +
 include/linux/skmsg.h          |   4 +
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/syscall.c           |   4 +
 net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |   5 +
 6 files changed, 279 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 62762390c93d..5034c1b4ded7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2996,6 +2996,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
 int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
 int sock_map_bpf_prog_query(const union bpf_attr *attr,
 			    union bpf_attr __user *uattr);
+int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
 
 void sock_map_unhash(struct sock *sk);
 void sock_map_destroy(struct sock *sk);
@@ -3094,6 +3095,11 @@ static inline int sock_map_bpf_prog_query(const union bpf_attr *attr,
 {
 	return -EINVAL;
 }
+
+static inline int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_BPF_SYSCALL */
 #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
 
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index e65ec3fd2799..9c8dd4c01412 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -58,6 +58,10 @@ struct sk_psock_progs {
 	struct bpf_prog			*stream_parser;
 	struct bpf_prog			*stream_verdict;
 	struct bpf_prog			*skb_verdict;
+	struct bpf_link			*msg_parser_link;
+	struct bpf_link			*stream_parser_link;
+	struct bpf_link			*stream_verdict_link;
+	struct bpf_link			*skb_verdict_link;
 };
 
 enum sk_psock_state_bits {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9585f5345353..31660c3ffc01 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1135,6 +1135,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TCX = 11,
 	BPF_LINK_TYPE_UPROBE_MULTI = 12,
 	BPF_LINK_TYPE_NETKIT = 13,
+	BPF_LINK_TYPE_SOCKMAP = 14,
 	__MAX_BPF_LINK_TYPE,
 };
 
@@ -6720,6 +6721,10 @@ struct bpf_link_info {
 			__u32 ifindex;
 			__u32 attach_type;
 		} netkit;
+		struct {
+			__u32 map_id;
+			__u32 attach_type;
+		} sockmap;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e44c276e8617..7d392ec83655 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		ret = netns_bpf_link_create(attr, prog);
 		break;
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_SK_SKB:
+		ret = sock_map_link_create(attr, prog);
+		break;
 #ifdef CONFIG_NET
 	case BPF_PROG_TYPE_XDP:
 		ret = bpf_xdp_link_attach(attr, prog);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..dafc9aa6e192 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -24,8 +24,12 @@ struct bpf_stab {
 #define SOCK_CREATE_FLAG_MASK				\
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+static DEFINE_MUTEX(sockmap_prog_update_mutex);
+static DEFINE_MUTEX(sockmap_link_mutex);
+
 static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
-				struct bpf_prog *old, u32 which);
+				struct bpf_prog *old, struct bpf_link *link,
+				u32 which);
 static struct sk_psock_progs *sock_map_progs(struct bpf_map *map);
 
 static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
@@ -71,7 +75,7 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-	ret = sock_map_prog_update(map, prog, NULL, attr->attach_type);
+	ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
 	fdput(f);
 	return ret;
 }
@@ -103,7 +107,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 		goto put_prog;
 	}
 
-	ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
+	ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
 put_prog:
 	bpf_prog_put(prog);
 put_map:
@@ -1488,21 +1492,90 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
 	return 0;
 }
 
+static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
+				struct bpf_link *link, bool skip_check, u32 which)
+{
+	struct sk_psock_progs *progs = sock_map_progs(map);
+
+	switch (which) {
+	case BPF_SK_MSG_VERDICT:
+		if (!skip_check &&
+		    ((!link && progs->msg_parser_link) ||
+		     (link && link != progs->msg_parser_link)))
+			return -EBUSY;
+		*plink = &progs->msg_parser_link;
+		break;
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+	case BPF_SK_SKB_STREAM_PARSER:
+		if (!skip_check &&
+		    ((!link && progs->stream_parser_link) ||
+		     (link && link != progs->stream_parser_link)))
+			return -EBUSY;
+		*plink = &progs->stream_parser_link;
+		break;
+#endif
+	case BPF_SK_SKB_STREAM_VERDICT:
+		if (!skip_check &&
+		    ((!link && progs->stream_verdict_link) ||
+		     (link && link != progs->stream_verdict_link)))
+			return -EBUSY;
+		*plink = &progs->stream_verdict_link;
+		break;
+	case BPF_SK_SKB_VERDICT:
+		if (!skip_check &&
+		    ((!link && progs->skb_verdict_link) ||
+		     (link && link != progs->skb_verdict_link)))
+			return -EBUSY;
+		*plink = &progs->skb_verdict_link;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+/* Handle the following four cases:
+ * prog_attach: prog != NULL, old == NULL, link == NULL
+ * prog_detach: prog == NULL, old != NULL, link == NULL
+ * link_attach: prog != NULL, old == NULL, link != NULL
+ * link_detach: prog == NULL, old != NULL, link != NULL
+ */
 static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
-				struct bpf_prog *old, u32 which)
+				struct bpf_prog *old, struct bpf_link *link,
+				u32 which)
 {
 	struct bpf_prog **pprog;
+	struct bpf_link **plink;
 	int ret;
 
+	mutex_lock(&sockmap_prog_update_mutex);
+
 	ret = sock_map_prog_lookup(map, &pprog, which);
 	if (ret)
-		return ret;
+		goto out;
 
-	if (old)
-		return psock_replace_prog(pprog, prog, old);
+	if (!link || prog)
+		ret = sock_map_link_lookup(map, &plink, NULL, false, which);
+	else
+		ret = sock_map_link_lookup(map, &plink, NULL, true, which);
+	if (ret)
+		goto out;
+
+	if (old) {
+		ret = psock_replace_prog(pprog, prog, old);
+		if (!ret)
+			*plink = NULL;
+		goto out;
+	}
 
 	psock_set_prog(pprog, prog);
-	return 0;
+	if (link)
+		*plink = link;
+
+out:
+	mutex_unlock(&sockmap_prog_update_mutex);
+	return ret;
 }
 
 int sock_map_bpf_prog_query(const union bpf_attr *attr,
@@ -1657,6 +1730,180 @@ void sock_map_close(struct sock *sk, long timeout)
 }
 EXPORT_SYMBOL_GPL(sock_map_close);
 
+struct sockmap_link {
+	struct bpf_link link;
+	struct bpf_map *map;
+	enum bpf_attach_type attach_type;
+};
+
+static struct sockmap_link *get_sockmap_link(const struct bpf_link *link)
+{
+	return container_of(link, struct sockmap_link, link);
+}
+
+static void sock_map_link_release(struct bpf_link *link)
+{
+	struct sockmap_link *sockmap_link = get_sockmap_link(link);
+
+	mutex_lock(&sockmap_link_mutex);
+	if (sockmap_link->map) {
+		(void)sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
+					   sockmap_link->attach_type);
+		bpf_map_put_with_uref(sockmap_link->map);
+		sockmap_link->map = NULL;
+	}
+	mutex_unlock(&sockmap_link_mutex);
+}
+
+static int sock_map_link_detach(struct bpf_link *link)
+{
+	sock_map_link_release(link);
+	return 0;
+}
+
+static void sock_map_link_dealloc(struct bpf_link *link)
+{
+	kfree(get_sockmap_link(link));
+}
+
+/* Handle the following two cases:
+ * case 1: link != NULL, prog != NULL, old != NULL
+ * case 2: link != NULL, prog != NULL, old == NULL
+ */
+static int sock_map_link_update_prog(struct bpf_link *link,
+				     struct bpf_prog *prog,
+				     struct bpf_prog *old)
+{
+	const struct sockmap_link *sockmap_link = get_sockmap_link(link);
+	struct bpf_prog **pprog;
+	struct bpf_link **plink;
+	int ret = 0;
+
+	mutex_lock(&sockmap_prog_update_mutex);
+
+	/* If old prog not NULL, ensure old prog the same as link->prog. */
+	if (old && link->prog != old) {
+		ret = -EINVAL;
+		goto out;
+	}
+	/* Ensure link->prog has the same type/attach_type as the new prog. */
+	if (link->prog->type != prog->type ||
+	    link->prog->expected_attach_type != prog->expected_attach_type) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = sock_map_prog_lookup(sockmap_link->map, &pprog,
+				   sockmap_link->attach_type);
+	if (ret)
+		goto out;
+
+	/* Ensure the same link between the one in map and the passed-in. */
+	ret = sock_map_link_lookup(sockmap_link->map, &plink, link, false,
+				   sockmap_link->attach_type);
+	if (ret)
+		goto out;
+
+	if (old)
+		return psock_replace_prog(pprog, prog, old);
+
+	psock_set_prog(pprog, prog);
+
+out:
+	if (!ret)
+		bpf_prog_inc(prog);
+	mutex_unlock(&sockmap_prog_update_mutex);
+	return ret;
+}
+
+static u32 sock_map_link_get_map_id(const struct sockmap_link *sockmap_link)
+{
+	u32 map_id = 0;
+
+	mutex_lock(&sockmap_link_mutex);
+	if (sockmap_link->map)
+		map_id = sockmap_link->map->id;
+	mutex_unlock(&sockmap_link_mutex);
+	return map_id;
+}
+
+static int sock_map_link_fill_info(const struct bpf_link *link,
+				   struct bpf_link_info *info)
+{
+	const struct sockmap_link *sockmap_link = get_sockmap_link(link);
+	u32 map_id = sock_map_link_get_map_id(sockmap_link);
+
+	info->sockmap.map_id = map_id;
+	info->sockmap.attach_type = sockmap_link->attach_type;
+	return 0;
+}
+
+static void sock_map_link_show_fdinfo(const struct bpf_link *link,
+				      struct seq_file *seq)
+{
+	const struct sockmap_link *sockmap_link = get_sockmap_link(link);
+	u32 map_id = sock_map_link_get_map_id(sockmap_link);
+
+	seq_printf(seq, "map_id:\t%u\n", map_id);
+	seq_printf(seq, "attach_type:\t%u\n", sockmap_link->attach_type);
+}
+
+static const struct bpf_link_ops sock_map_link_ops = {
+	.release = sock_map_link_release,
+	.dealloc = sock_map_link_dealloc,
+	.detach = sock_map_link_detach,
+	.update_prog = sock_map_link_update_prog,
+	.fill_link_info = sock_map_link_fill_info,
+	.show_fdinfo = sock_map_link_show_fdinfo,
+};
+
+int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct sockmap_link *sockmap_link;
+	enum bpf_attach_type attach_type;
+	struct bpf_map *map;
+	int ret;
+
+	if (attr->link_create.flags)
+		return -EINVAL;
+
+	map = bpf_map_get_with_uref(attr->link_create.target_fd);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	sockmap_link = kzalloc(sizeof(*sockmap_link), GFP_USER);
+	if (!sockmap_link) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	attach_type = attr->link_create.attach_type;
+	bpf_link_init(&sockmap_link->link, BPF_LINK_TYPE_SOCKMAP, &sock_map_link_ops, prog);
+	sockmap_link->map = map;
+	sockmap_link->attach_type = attach_type;
+
+	ret = bpf_link_prime(&sockmap_link->link, &link_primer);
+	if (ret) {
+		kfree(sockmap_link);
+		goto out;
+	}
+
+	ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
+	if (ret) {
+		bpf_link_cleanup(&link_primer);
+		goto out;
+	}
+
+	bpf_prog_inc(prog);
+
+	return bpf_link_settle(&link_primer);
+
+out:
+	bpf_map_put_with_uref(map);
+	return ret;
+}
+
 static int sock_map_iter_attach_target(struct bpf_prog *prog,
 				       union bpf_iter_link_info *linfo,
 				       struct bpf_iter_aux_info *aux)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9585f5345353..31660c3ffc01 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1135,6 +1135,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TCX = 11,
 	BPF_LINK_TYPE_UPROBE_MULTI = 12,
 	BPF_LINK_TYPE_NETKIT = 13,
+	BPF_LINK_TYPE_SOCKMAP = 14,
 	__MAX_BPF_LINK_TYPE,
 };
 
@@ -6720,6 +6721,10 @@ struct bpf_link_info {
 			__u32 ifindex;
 			__u32 attach_type;
 		} netkit;
+		struct {
+			__u32 map_id;
+			__u32 attach_type;
+		} sockmap;
 	};
 } __attribute__((aligned(8)));
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
  2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
  2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
@ 2024-03-26  2:22 ` Yonghong Song
  2024-04-02 13:18   ` Eduard Zingerman
  2024-04-02 17:46   ` Andrii Nakryiko
  2024-03-26  2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

Introduce a libbpf API function bpf_program__attach_sockmap()
which allow user to get a bpf_link for their corresponding programs.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/lib/bpf/libbpf.c   | 7 +++++++
 tools/lib/bpf/libbpf.h   | 2 ++
 tools/lib/bpf/libbpf.map | 1 +
 3 files changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8eb2cd4ef288..e1bb9ba3d1a5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -149,6 +149,7 @@ static const char * const link_type_name[] = {
 	[BPF_LINK_TYPE_TCX]			= "tcx",
 	[BPF_LINK_TYPE_UPROBE_MULTI]		= "uprobe_multi",
 	[BPF_LINK_TYPE_NETKIT]			= "netkit",
+	[BPF_LINK_TYPE_SOCKMAP]			= "sockmap",
 };
 
 static const char * const map_type_name[] = {
@@ -12507,6 +12508,12 @@ bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
 	return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
 }
 
+struct bpf_link *
+bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd)
+{
+	return bpf_program_attach_fd(prog, map_fd, "sockmap", NULL);
+}
+
 struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
 {
 	/* target_fd/target_ifindex use the same field in LINK_CREATE */
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f88ab50c0229..4c7ada03bf4f 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -795,6 +795,8 @@ bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
 LIBBPF_API struct bpf_link *
+bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd);
+LIBBPF_API struct bpf_link *
 bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_freplace(const struct bpf_program *prog,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 51732ecb1385..2e5c3429b974 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -411,6 +411,7 @@ LIBBPF_1.3.0 {
 LIBBPF_1.4.0 {
 	global:
 		bpf_program__attach_raw_tracepoint_opts;
+		bpf_program__attach_sockmap;
 		bpf_raw_tracepoint_open_opts;
 		bpf_token_create;
 		btf__new_split;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP
  2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
  2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
  2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
@ 2024-03-26  2:22 ` Yonghong Song
  2024-03-27 11:58   ` Quentin Monnet
  2024-03-26  2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
  2024-03-26  2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

An example output looks like:
  $ bpftool link
    1776: sk_skb  prog 49730
            map_id 0  attach_type sk_skb_verdict
            pids test_progs(8424)
    1777: sk_skb  prog 49755
            map_id 0  attach_type sk_skb_stream_verdict
            pids test_progs(8424)
    1778: sk_msg  prog 49770
            map_id 8208  attach_type sk_msg_verdict
            pids test_progs(8424)

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/bpf/bpftool/link.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
index afde9d0c2ea1..5cd503b763d7 100644
--- a/tools/bpf/bpftool/link.c
+++ b/tools/bpf/bpftool/link.c
@@ -526,6 +526,10 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
 		show_link_ifindex_json(info->netkit.ifindex, json_wtr);
 		show_link_attach_type_json(info->netkit.attach_type, json_wtr);
 		break;
+	case BPF_LINK_TYPE_SOCKMAP:
+		jsonw_uint_field(json_wtr, "map_id", info->sockmap.map_id);
+		show_link_attach_type_json(info->sockmap.attach_type, json_wtr);
+		break;
 	case BPF_LINK_TYPE_XDP:
 		show_link_ifindex_json(info->xdp.ifindex, json_wtr);
 		break;
@@ -915,6 +919,11 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
 		show_link_ifindex_plain(info->netkit.ifindex);
 		show_link_attach_type_plain(info->netkit.attach_type);
 		break;
+	case BPF_LINK_TYPE_SOCKMAP:
+		printf("\n\t");
+		printf("map_id %u  ", info->sockmap.map_id);
+		show_link_attach_type_plain(info->sockmap.attach_type);
+		break;
 	case BPF_LINK_TYPE_XDP:
 		printf("\n\t");
 		show_link_ifindex_plain(info->xdp.ifindex);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests
  2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
                   ` (2 preceding siblings ...)
  2024-03-26  2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
@ 2024-03-26  2:22 ` Yonghong Song
  2024-04-02 13:18   ` Eduard Zingerman
  2024-03-26  2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

These helper functions will be used later new tests as well.
There are no functionality change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 38 +++++++++++--------
 .../bpf/progs/test_skmsg_load_helpers.c       |  9 ++++-
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 77e26ecffa9d..63fb2da7930a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -475,30 +475,19 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
-static void test_sockmap_skb_verdict_peek(void)
+static void test_sockmap_skb_verdict_peek_helper(int map)
 {
-	int err, map, verdict, s, c1, p1, zero = 0, sent, recvd, avail;
-	struct test_sockmap_pass_prog *pass;
+	int err, s, c1, p1, zero = 0, sent, recvd, avail;
 	char snd[256] = "0123456789";
 	char rcv[256] = "0";
 
-	pass = test_sockmap_pass_prog__open_and_load();
-	if (!ASSERT_OK_PTR(pass, "open_and_load"))
-		return;
-	verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
-	map = bpf_map__fd(pass->maps.sock_map_rx);
-
-	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
-	if (!ASSERT_OK(err, "bpf_prog_attach"))
-		goto out;
-
 	s = socket_loopback(AF_INET, SOCK_STREAM);
 	if (!ASSERT_GT(s, -1, "socket_loopback(s)"))
-		goto out;
+		return;
 
 	err = create_pair(s, AF_INET, SOCK_STREAM, &c1, &p1);
 	if (!ASSERT_OK(err, "create_pairs(s)"))
-		goto out;
+		return;
 
 	err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
 	if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
@@ -520,6 +509,25 @@ static void test_sockmap_skb_verdict_peek(void)
 out_close:
 	close(c1);
 	close(p1);
+}
+
+static void test_sockmap_skb_verdict_peek(void)
+{
+	struct test_sockmap_pass_prog *pass;
+	int err, map, verdict;
+
+	pass = test_sockmap_pass_prog__open_and_load();
+	if (!ASSERT_OK_PTR(pass, "open_and_load"))
+		return;
+	verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
+	map = bpf_map__fd(pass->maps.sock_map_rx);
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	test_sockmap_skb_verdict_peek_helper(map);
+
 out:
 	test_sockmap_pass_prog__destroy(pass);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c b/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
index 45e8fc75a739..b753672f04c9 100644
--- a/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
@@ -24,8 +24,7 @@ struct {
 	__type(value, __u64);
 } socket_storage SEC(".maps");
 
-SEC("sk_msg")
-int prog_msg_verdict(struct sk_msg_md *msg)
+static int prog_msg_verdict_common(struct sk_msg_md *msg)
 {
 	struct task_struct *task = (struct task_struct *)bpf_get_current_task();
 	int verdict = SK_PASS;
@@ -44,4 +43,10 @@ int prog_msg_verdict(struct sk_msg_md *msg)
 	return verdict;
 }
 
+SEC("sk_msg")
+int prog_msg_verdict(struct sk_msg_md *msg)
+{
+	return prog_msg_verdict_common(msg);
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs
  2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
                   ` (3 preceding siblings ...)
  2024-03-26  2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
@ 2024-03-26  2:22 ` Yonghong Song
  2024-04-02 13:17   ` Eduard Zingerman
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-03-26  2:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

Add a few more tests in sockmap_basic.c and sockmap_listen.c to
test bpf_link based APIs for SK_MSG and SK_SKB programs.
Link attach/detach/update are all tested.

All tests are passed.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 90 +++++++++++++++++++
 .../selftests/bpf/prog_tests/sockmap_listen.c | 38 ++++++++
 .../bpf/progs/test_skmsg_load_helpers.c       |  6 ++
 .../bpf/progs/test_sockmap_pass_prog.c        | 11 ++-
 .../progs/test_sockmap_skb_verdict_attach.c   |  2 +-
 5 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 63fb2da7930a..b4b38f92b864 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -131,6 +131,40 @@ static void test_skmsg_helpers(enum bpf_map_type map_type)
 	test_skmsg_load_helpers__destroy(skel);
 }
 
+static void test_skmsg_helpers_with_link(enum bpf_map_type map_type)
+{
+	struct test_skmsg_load_helpers *skel;
+	struct bpf_link *link, *link2;
+	struct bpf_program *prog;
+	int err, map;
+
+	skel = test_skmsg_load_helpers__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_skmsg_load_helpers__open_and_load"))
+		return;
+
+	prog = skel->progs.prog_msg_verdict;
+	map = bpf_map__fd(skel->maps.sock_map);
+
+	link = bpf_program__attach_sockmap(prog, map);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
+		goto out;
+
+	err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_ERR(err, "bpf_prog_attach"))
+		goto out;
+
+	link2 = bpf_program__attach_sockmap(skel->progs.prog_msg_verdict_clone, map);
+	if (!ASSERT_ERR_PTR(link2, "bpf_program__attach_sockmap"))
+		goto out;
+
+	err = bpf_link__update_program(link, skel->progs.prog_msg_verdict_clone);
+	ASSERT_OK(err, "bpf_link__update_program");
+
+out:
+	bpf_link__detach(link);
+	test_skmsg_load_helpers__destroy(skel);
+}
+
 static void test_sockmap_update(enum bpf_map_type map_type)
 {
 	int err, prog, src;
@@ -298,6 +332,27 @@ static void test_sockmap_skb_verdict_attach(enum bpf_attach_type first,
 	test_sockmap_skb_verdict_attach__destroy(skel);
 }
 
+static void test_sockmap_skb_verdict_attach_with_link(void)
+{
+	struct test_sockmap_skb_verdict_attach *skel;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	int map;
+
+	skel = test_sockmap_skb_verdict_attach__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+	prog = skel->progs.prog_skb_verdict;
+	map = bpf_map__fd(skel->maps.sock_map);
+	link = bpf_program__attach_sockmap(prog, map);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
+		goto out;
+
+	bpf_link__detach(link);
+out:
+	test_sockmap_skb_verdict_attach__destroy(skel);
+}
+
 static __u32 query_prog_id(int prog_fd)
 {
 	struct bpf_prog_info info = {};
@@ -532,6 +587,33 @@ static void test_sockmap_skb_verdict_peek(void)
 	test_sockmap_pass_prog__destroy(pass);
 }
 
+static void test_sockmap_skb_verdict_peek_with_link(void)
+{
+	struct test_sockmap_pass_prog *pass;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	int err, map;
+
+	pass = test_sockmap_pass_prog__open_and_load();
+	if (!ASSERT_OK_PTR(pass, "open_and_load"))
+		return;
+	prog = pass->progs.prog_skb_verdict;
+	map = bpf_map__fd(pass->maps.sock_map_rx);
+	link = bpf_program__attach_sockmap(prog, map);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
+		goto out;
+
+	err = bpf_link__update_program(link, pass->progs.prog_skb_verdict_clone);
+	if (!ASSERT_OK(err, "bpf_link__update_program"))
+		goto out;
+
+	test_sockmap_skb_verdict_peek_helper(map);
+	ASSERT_EQ(pass->bss->clone_called, 1, "clone_called");
+out:
+	bpf_link__detach(link);
+	test_sockmap_pass_prog__destroy(pass);
+}
+
 static void test_sockmap_unconnected_unix(void)
 {
 	int err, map, stream = 0, dgram = 0, zero = 0;
@@ -796,6 +878,8 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_attach(BPF_SK_SKB_STREAM_VERDICT,
 						BPF_SK_SKB_VERDICT);
 	}
+	if (test__start_subtest("sockmap skb_verdict attach_with_link"))
+		test_sockmap_skb_verdict_attach_with_link();
 	if (test__start_subtest("sockmap msg_verdict progs query"))
 		test_sockmap_progs_query(BPF_SK_MSG_VERDICT);
 	if (test__start_subtest("sockmap stream_parser progs query"))
@@ -812,6 +896,8 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_fionread(false);
 	if (test__start_subtest("sockmap skb_verdict msg_f_peek"))
 		test_sockmap_skb_verdict_peek();
+	if (test__start_subtest("sockmap skb_verdict msg_f_peek with link"))
+		test_sockmap_skb_verdict_peek_with_link();
 	if (test__start_subtest("sockmap unconnected af_unix"))
 		test_sockmap_unconnected_unix();
 	if (test__start_subtest("sockmap one socket to many map entries"))
@@ -820,4 +906,8 @@ void test_sockmap_basic(void)
 		test_sockmap_many_maps();
 	if (test__start_subtest("sockmap same socket replace"))
 		test_sockmap_same_sock();
+	if (test__start_subtest("sockmap sk_msg attach sockmap helpers with link"))
+		test_skmsg_helpers_with_link(BPF_MAP_TYPE_SOCKMAP);
+	if (test__start_subtest("sockhash sk_msg attach sockhash helpers with link"))
+		test_skmsg_helpers_with_link(BPF_MAP_TYPE_SOCKHASH);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index a92807bfcd13..54c93eee1808 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -767,6 +767,24 @@ static void test_msg_redir_to_connected(struct test_sockmap_listen *skel,
 	xbpf_prog_detach2(verdict, sock_map, BPF_SK_MSG_VERDICT);
 }
 
+static void test_msg_redir_to_connected_with_link(struct test_sockmap_listen *skel,
+						  struct bpf_map *inner_map, int family,
+						  int sotype)
+{
+	struct bpf_program *verdict = skel->progs.prog_msg_verdict;
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	struct bpf_link *link;
+
+	link = bpf_program__attach_sockmap(verdict, sock_map);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
+		return;
+
+	redir_to_connected(family, sotype, sock_map, verdict_map, REDIR_EGRESS);
+
+	bpf_link__detach(link);
+}
+
 static void redir_to_listening(int family, int sotype, int sock_mapfd,
 			       int verd_mapfd, enum redir_mode mode)
 {
@@ -869,6 +887,24 @@ static void test_msg_redir_to_listening(struct test_sockmap_listen *skel,
 	xbpf_prog_detach2(verdict, sock_map, BPF_SK_MSG_VERDICT);
 }
 
+static void test_msg_redir_to_listening_with_link(struct test_sockmap_listen *skel,
+						  struct bpf_map *inner_map, int family,
+						  int sotype)
+{
+	struct bpf_program *verdict = skel->progs.prog_msg_verdict;
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	struct bpf_link *link;
+
+	link = bpf_program__attach_sockmap(verdict, sock_map);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
+		return;
+
+	redir_to_listening(family, sotype, sock_map, verdict_map, REDIR_EGRESS);
+
+	bpf_link__detach(link);
+}
+
 static void redir_partial(int family, int sotype, int sock_map, int parser_map)
 {
 	int s, c0 = -1, c1 = -1, p0 = -1, p1 = -1;
@@ -1316,7 +1352,9 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 		TEST(test_skb_redir_to_listening),
 		TEST(test_skb_redir_partial),
 		TEST(test_msg_redir_to_connected),
+		TEST(test_msg_redir_to_connected_with_link),
 		TEST(test_msg_redir_to_listening),
+		TEST(test_msg_redir_to_listening_with_link),
 	};
 	const char *family_name, *map_name;
 	const struct redir_test *t;
diff --git a/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c b/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
index b753672f04c9..2914fbf1b05b 100644
--- a/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_skmsg_load_helpers.c
@@ -49,4 +49,10 @@ int prog_msg_verdict(struct sk_msg_md *msg)
 	return prog_msg_verdict_common(msg);
 }
 
+SEC("sk_msg")
+int prog_msg_verdict_clone(struct sk_msg_md *msg)
+{
+	return prog_msg_verdict_common(msg);
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c
index 1d86a717a290..032c3c6310d4 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c
@@ -23,10 +23,19 @@ struct {
 	__type(value, int);
 } sock_map_msg SEC(".maps");
 
-SEC("sk_skb")
+SEC("sk_skb/stream_verdict")
 int prog_skb_verdict(struct __sk_buff *skb)
 {
 	return SK_PASS;
 }
 
+int clone_called;
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict_clone(struct __sk_buff *skb)
+{
+	clone_called = 1;
+	return SK_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_skb_verdict_attach.c b/tools/testing/selftests/bpf/progs/test_sockmap_skb_verdict_attach.c
index 3c69aa971738..d25b0bb30fc0 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_skb_verdict_attach.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_skb_verdict_attach.c
@@ -9,7 +9,7 @@ struct {
 	__type(value, __u64);
 } sock_map SEC(".maps");
 
-SEC("sk_skb")
+SEC("sk_skb/verdict")
 int prog_skb_verdict(struct __sk_buff *skb)
 {
 	return SK_DROP;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP
  2024-03-26  2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
@ 2024-03-27 11:58   ` Quentin Monnet
  0 siblings, 0 replies; 26+ messages in thread
From: Quentin Monnet @ 2024-03-27 11:58 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

2024-03-26 02:22 UTC+0000 ~ Yonghong Song <yonghong.song@linux.dev>
> An example output looks like:
>   $ bpftool link
>     1776: sk_skb  prog 49730
>             map_id 0  attach_type sk_skb_verdict
>             pids test_progs(8424)
>     1777: sk_skb  prog 49755
>             map_id 0  attach_type sk_skb_stream_verdict
>             pids test_progs(8424)
>     1778: sk_msg  prog 49770
>             map_id 8208  attach_type sk_msg_verdict
>             pids test_progs(8424)
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>

Reviewed-by: Quentin Monnet <qmo@kernel.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs
  2024-03-26  2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
@ 2024-04-02 13:17   ` Eduard Zingerman
  2024-04-02 18:56     ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-04-02 13:17 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, 2024-03-25 at 19:22 -0700, Yonghong Song wrote:
> Add a few more tests in sockmap_basic.c and sockmap_listen.c to
> test bpf_link based APIs for SK_MSG and SK_SKB programs.
> Link attach/detach/update are all tested.
> 
> All tests are passed.
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

As far as I understand patch #1 there are several error conditions
that are not covered by these tests:
- update link using program with wrong expected attach type;
- bpf_program__attach_sockmap() _after_ bpf_prog_attach();
- creation of a second link between same prog and map
  (if I understand 'link && link != progs->msg_parser_link' checks in
   sock_map_link_lookup() correctly).

Are these important enough to have a dedicated test?

[...]

> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> index 63fb2da7930a..b4b38f92b864 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> @@ -131,6 +131,40 @@ static void test_skmsg_helpers(enum bpf_map_type map_type)
>  	test_skmsg_load_helpers__destroy(skel);
>  }
>  
> +static void test_skmsg_helpers_with_link(enum bpf_map_type map_type)
> +{
> +	struct test_skmsg_load_helpers *skel;
> +	struct bpf_link *link, *link2;
> +	struct bpf_program *prog;
> +	int err, map;
> +
> +	skel = test_skmsg_load_helpers__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "test_skmsg_load_helpers__open_and_load"))
> +		return;
> +
> +	prog = skel->progs.prog_msg_verdict;
> +	map = bpf_map__fd(skel->maps.sock_map);
> +
> +	link = bpf_program__attach_sockmap(prog, map);
> +	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
> +		goto out;
> +
> +	err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_MSG_VERDICT, 0);
> +	if (!ASSERT_ERR(err, "bpf_prog_attach"))
> +		goto out;
> +
> +	link2 = bpf_program__attach_sockmap(skel->progs.prog_msg_verdict_clone, map);
> +	if (!ASSERT_ERR_PTR(link2, "bpf_program__attach_sockmap"))
> +		goto out;

Nit: should bpf_link__detach(link2) be called before 'goto out' ? 

> +
> +	err = bpf_link__update_program(link, skel->progs.prog_msg_verdict_clone);
> +	ASSERT_OK(err, "bpf_link__update_program");
> +
> +out:
> +	bpf_link__detach(link);
> +	test_skmsg_load_helpers__destroy(skel);
> +}
> +
>  static void test_sockmap_update(enum bpf_map_type map_type)
>  {
>  	int err, prog, src;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests
  2024-03-26  2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
@ 2024-04-02 13:18   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-04-02 13:18 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, 2024-03-25 at 19:22 -0700, Yonghong Song wrote:
> These helper functions will be used later new tests as well.
> There are no functionality change.
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
  2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
@ 2024-04-02 13:18   ` Eduard Zingerman
  2024-04-02 17:46   ` Andrii Nakryiko
  1 sibling, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-04-02 13:18 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, 2024-03-25 at 19:22 -0700, Yonghong Song wrote:
> Introduce a libbpf API function bpf_program__attach_sockmap()
> which allow user to get a bpf_link for their corresponding programs.
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
@ 2024-04-02 17:39   ` Eduard Zingerman
  2024-04-03  0:06     ` Yonghong Song
  2024-04-02 17:45   ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-04-02 17:39 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, 2024-03-25 at 19:21 -0700, Yonghong Song wrote:
[...]

> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 27d733c0f65e..dafc9aa6e192 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c

[...]

> @@ -1488,21 +1492,90 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
>  	return 0;
>  }
>  
> +static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
> +				struct bpf_link *link, bool skip_check, u32 which)
> +{
> +	struct sk_psock_progs *progs = sock_map_progs(map);
> +
> +	switch (which) {
> +	case BPF_SK_MSG_VERDICT:
> +		if (!skip_check &&
> +		    ((!link && progs->msg_parser_link) ||
> +		     (link && link != progs->msg_parser_link)))
> +			return -EBUSY;

These checks seem a bit repetitive, maybe factor it out as a single
check at the end of the function? E.g.:

	if (!skip_check &&
	    ((!link && **plink) || (link && link != **plink)))
		return -EBUSY;

Or inline these checks at call sites for sock_map_link_lookup()?
I tried this on top of this in [1] and all tests seem to pass.

[1] https://gist.github.com/eddyz87/38d832b3f1fc74120598d3480bc16ae1

> +		*plink = &progs->msg_parser_link;
> +		break;
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +	case BPF_SK_SKB_STREAM_PARSER:
> +		if (!skip_check &&
> +		    ((!link && progs->stream_parser_link) ||
> +		     (link && link != progs->stream_parser_link)))
> +			return -EBUSY;
> +		*plink = &progs->stream_parser_link;
> +		break;
> +#endif
> +	case BPF_SK_SKB_STREAM_VERDICT:
> +		if (!skip_check &&
> +		    ((!link && progs->stream_verdict_link) ||
> +		     (link && link != progs->stream_verdict_link)))
> +			return -EBUSY;
> +		*plink = &progs->stream_verdict_link;
> +		break;
> +	case BPF_SK_SKB_VERDICT:
> +		if (!skip_check &&
> +		    ((!link && progs->skb_verdict_link) ||
> +		     (link && link != progs->skb_verdict_link)))
> +			return -EBUSY;
> +		*plink = &progs->skb_verdict_link;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}

[...]

> +/* Handle the following two cases:
> + * case 1: link != NULL, prog != NULL, old != NULL
> + * case 2: link != NULL, prog != NULL, old == NULL
> + */
> +static int sock_map_link_update_prog(struct bpf_link *link,
> +				     struct bpf_prog *prog,
> +				     struct bpf_prog *old)
> +{
> +	const struct sockmap_link *sockmap_link = get_sockmap_link(link);
> +	struct bpf_prog **pprog;
> +	struct bpf_link **plink;
> +	int ret = 0;
> +
> +	mutex_lock(&sockmap_prog_update_mutex);
> +
> +	/* If old prog not NULL, ensure old prog the same as link->prog. */
> +	if (old && link->prog != old) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	/* Ensure link->prog has the same type/attach_type as the new prog. */
> +	if (link->prog->type != prog->type ||
> +	    link->prog->expected_attach_type != prog->expected_attach_type) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = sock_map_prog_lookup(sockmap_link->map, &pprog,
> +				   sockmap_link->attach_type);
> +	if (ret)
> +		goto out;
> +
> +	/* Ensure the same link between the one in map and the passed-in. */
> +	ret = sock_map_link_lookup(sockmap_link->map, &plink, link, false,
> +				   sockmap_link->attach_type);
> +	if (ret)
> +		goto out;
> +
> +	if (old)
> +		return psock_replace_prog(pprog, prog, old);

should this be 'goto out' in order to unlock the mutex?

> +
> +	psock_set_prog(pprog, prog);
> +
> +out:
> +	if (!ret)
> +		bpf_prog_inc(prog);
> +	mutex_unlock(&sockmap_prog_update_mutex);
> +	return ret;
> +}

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
  2024-04-02 17:39   ` Eduard Zingerman
@ 2024-04-02 17:45   ` Andrii Nakryiko
  2024-04-03  1:08     ` Yonghong Song
  1 sibling, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2024-04-02 17:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Add bpf_link support for sk_msg and sk_skb programs. We have an
> internal request to support bpf_link for sk_msg programs so user
> space can have a uniform handling with bpf_link based libbpf
> APIs. Using bpf_link based libbpf API also has a benefit which
> makes system robust by decoupling prog life cycle and
> attachment life cycle.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h            |   6 +
>  include/linux/skmsg.h          |   4 +
>  include/uapi/linux/bpf.h       |   5 +
>  kernel/bpf/syscall.c           |   4 +
>  net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
>  tools/include/uapi/linux/bpf.h |   5 +
>  6 files changed, 279 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 62762390c93d..5034c1b4ded7 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2996,6 +2996,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
>  int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
>  int sock_map_bpf_prog_query(const union bpf_attr *attr,
>                             union bpf_attr __user *uattr);
> +int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
>
>  void sock_map_unhash(struct sock *sk);
>  void sock_map_destroy(struct sock *sk);
> @@ -3094,6 +3095,11 @@ static inline int sock_map_bpf_prog_query(const union bpf_attr *attr,
>  {
>         return -EINVAL;
>  }
> +
> +static inline int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +       return -EOPNOTSUPP;
> +}
>  #endif /* CONFIG_BPF_SYSCALL */
>  #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index e65ec3fd2799..9c8dd4c01412 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -58,6 +58,10 @@ struct sk_psock_progs {
>         struct bpf_prog                 *stream_parser;
>         struct bpf_prog                 *stream_verdict;
>         struct bpf_prog                 *skb_verdict;
> +       struct bpf_link                 *msg_parser_link;
> +       struct bpf_link                 *stream_parser_link;
> +       struct bpf_link                 *stream_verdict_link;
> +       struct bpf_link                 *skb_verdict_link;
>  };
>
>  enum sk_psock_state_bits {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 9585f5345353..31660c3ffc01 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
>         BPF_LINK_TYPE_TCX = 11,
>         BPF_LINK_TYPE_UPROBE_MULTI = 12,
>         BPF_LINK_TYPE_NETKIT = 13,
> +       BPF_LINK_TYPE_SOCKMAP = 14,
>         __MAX_BPF_LINK_TYPE,
>  };
>
> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
>                         __u32 ifindex;
>                         __u32 attach_type;
>                 } netkit;
> +               struct {
> +                       __u32 map_id;
> +                       __u32 attach_type;
> +               } sockmap;
>         };
>  } __attribute__((aligned(8)));
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index e44c276e8617..7d392ec83655 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>         case BPF_PROG_TYPE_SK_LOOKUP:
>                 ret = netns_bpf_link_create(attr, prog);
>                 break;
> +       case BPF_PROG_TYPE_SK_MSG:
> +       case BPF_PROG_TYPE_SK_SKB:
> +               ret = sock_map_link_create(attr, prog);
> +               break;
>  #ifdef CONFIG_NET
>         case BPF_PROG_TYPE_XDP:
>                 ret = bpf_xdp_link_attach(attr, prog);
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 27d733c0f65e..dafc9aa6e192 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -24,8 +24,12 @@ struct bpf_stab {
>  #define SOCK_CREATE_FLAG_MASK                          \
>         (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
>
> +static DEFINE_MUTEX(sockmap_prog_update_mutex);
> +static DEFINE_MUTEX(sockmap_link_mutex);
> +
>  static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
> -                               struct bpf_prog *old, u32 which);
> +                               struct bpf_prog *old, struct bpf_link *link,
> +                               u32 which);
>  static struct sk_psock_progs *sock_map_progs(struct bpf_map *map);
>
>  static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
> @@ -71,7 +75,7 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
>         map = __bpf_map_get(f);
>         if (IS_ERR(map))
>                 return PTR_ERR(map);
> -       ret = sock_map_prog_update(map, prog, NULL, attr->attach_type);
> +       ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
>         fdput(f);
>         return ret;
>  }
> @@ -103,7 +107,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>                 goto put_prog;
>         }
>
> -       ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
> +       ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
>  put_prog:
>         bpf_prog_put(prog);
>  put_map:
> @@ -1488,21 +1492,90 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
>         return 0;
>  }
>
> +static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
> +                               struct bpf_link *link, bool skip_check, u32 which)
> +{
> +       struct sk_psock_progs *progs = sock_map_progs(map);
> +
> +       switch (which) {
> +       case BPF_SK_MSG_VERDICT:
> +               if (!skip_check &&
> +                   ((!link && progs->msg_parser_link) ||
> +                    (link && link != progs->msg_parser_link)))
> +                       return -EBUSY;
> +               *plink = &progs->msg_parser_link;
> +               break;
> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> +       case BPF_SK_SKB_STREAM_PARSER:
> +               if (!skip_check &&
> +                   ((!link && progs->stream_parser_link) ||
> +                    (link && link != progs->stream_parser_link)))
> +                       return -EBUSY;
> +               *plink = &progs->stream_parser_link;
> +               break;
> +#endif
> +       case BPF_SK_SKB_STREAM_VERDICT:
> +               if (!skip_check &&
> +                   ((!link && progs->stream_verdict_link) ||
> +                    (link && link != progs->stream_verdict_link)))
> +                       return -EBUSY;
> +               *plink = &progs->stream_verdict_link;
> +               break;
> +       case BPF_SK_SKB_VERDICT:
> +               if (!skip_check &&
> +                   ((!link && progs->skb_verdict_link) ||
> +                    (link && link != progs->skb_verdict_link)))
> +                       return -EBUSY;
> +               *plink = &progs->skb_verdict_link;
> +               break;
> +       default:
> +               return -EOPNOTSUPP;
> +       }
> +

you can simplify this by

struct bpf_link *cur_link;

switch (which) {
case BPF_SK_MSG_VERDICT:
    cur_link = progs->msg_parser_link;
    break;
case ...

}

and then perform that condition validating link and cur_link just once.


> +       return 0;
> +}
> +
> +/* Handle the following four cases:
> + * prog_attach: prog != NULL, old == NULL, link == NULL
> + * prog_detach: prog == NULL, old != NULL, link == NULL
> + * link_attach: prog != NULL, old == NULL, link != NULL
> + * link_detach: prog == NULL, old != NULL, link != NULL
> + */
>  static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
> -                               struct bpf_prog *old, u32 which)
> +                               struct bpf_prog *old, struct bpf_link *link,
> +                               u32 which)
>  {
>         struct bpf_prog **pprog;
> +       struct bpf_link **plink;
>         int ret;
>
> +       mutex_lock(&sockmap_prog_update_mutex);
> +
>         ret = sock_map_prog_lookup(map, &pprog, which);
>         if (ret)
> -               return ret;
> +               goto out;
>
> -       if (old)
> -               return psock_replace_prog(pprog, prog, old);
> +       if (!link || prog)
> +               ret = sock_map_link_lookup(map, &plink, NULL, false, which);
> +       else
> +               ret = sock_map_link_lookup(map, &plink, NULL, true, which);

it feels like it would be cleaner if sock_map_link_lookup() would just
return already set link (based on which), and then you'd check update
logic with prog vs link right here

e.g., it's quite obfuscated that we validate that there is no link set
currently (this code doesn't do anything with plink).

anyways, I find it a bit hard to follow as it's written, but I'll
defer to John to make a final call on logic

> +       if (ret)
> +               goto out;
> +
> +       if (old) {
> +               ret = psock_replace_prog(pprog, prog, old);
> +               if (!ret)
> +                       *plink = NULL;
> +               goto out;
> +       }
>
>         psock_set_prog(pprog, prog);
> -       return 0;
> +       if (link)
> +               *plink = link;
> +
> +out:
> +       mutex_unlock(&sockmap_prog_update_mutex);

why this mutex is not per-sockmap?

> +       return ret;
>  }
>
>  int sock_map_bpf_prog_query(const union bpf_attr *attr,
> @@ -1657,6 +1730,180 @@ void sock_map_close(struct sock *sk, long timeout)
>  }
>  EXPORT_SYMBOL_GPL(sock_map_close);
>
> +struct sockmap_link {
> +       struct bpf_link link;
> +       struct bpf_map *map;
> +       enum bpf_attach_type attach_type;
> +};
> +
> +static struct sockmap_link *get_sockmap_link(const struct bpf_link *link)
> +{
> +       return container_of(link, struct sockmap_link, link);
> +}

nit: do you really need this helper? container_of() is a pretty
straightforward by itself, imo

> +
> +static void sock_map_link_release(struct bpf_link *link)
> +{
> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> +
> +       mutex_lock(&sockmap_link_mutex);

similar to the above, why is this mutex not sockmap-specific? And I'd
just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
case to keep it simple.

> +       if (sockmap_link->map) {
> +               (void)sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
> +                                          sockmap_link->attach_type);

WARN() if it's not meant to ever fail?

> +               bpf_map_put_with_uref(sockmap_link->map);
> +               sockmap_link->map = NULL;
> +       }
> +       mutex_unlock(&sockmap_link_mutex);
> +}
> +
> +static int sock_map_link_detach(struct bpf_link *link)
> +{
> +       sock_map_link_release(link);
> +       return 0;
> +}
> +
> +static void sock_map_link_dealloc(struct bpf_link *link)
> +{
> +       kfree(get_sockmap_link(link));
> +}
> +
> +/* Handle the following two cases:
> + * case 1: link != NULL, prog != NULL, old != NULL
> + * case 2: link != NULL, prog != NULL, old == NULL
> + */
> +static int sock_map_link_update_prog(struct bpf_link *link,
> +                                    struct bpf_prog *prog,
> +                                    struct bpf_prog *old)
> +{
> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
> +       struct bpf_prog **pprog;
> +       struct bpf_link **plink;
> +       int ret = 0;
> +
> +       mutex_lock(&sockmap_prog_update_mutex);
> +
> +       /* If old prog not NULL, ensure old prog the same as link->prog. */

typo: "is not NULL", "is the same"

> +       if (old && link->prog != old) {

hm.. even if old matches link->prog, we should unset old and set new
link (link overrides prog attachment, basically), it shouldn't matter
if old == link->prog, unless I'm missing something?

> +               ret = -EINVAL;
> +               goto out;
> +       }
> +       /* Ensure link->prog has the same type/attach_type as the new prog. */
> +       if (link->prog->type != prog->type ||
> +           link->prog->expected_attach_type != prog->expected_attach_type) {
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       ret = sock_map_prog_lookup(sockmap_link->map, &pprog,
> +                                  sockmap_link->attach_type);
> +       if (ret)
> +               goto out;
> +
> +       /* Ensure the same link between the one in map and the passed-in. */
> +       ret = sock_map_link_lookup(sockmap_link->map, &plink, link, false,
> +                                  sockmap_link->attach_type);
> +       if (ret)
> +               goto out;
> +
> +       if (old)
> +               return psock_replace_prog(pprog, prog, old);
> +
> +       psock_set_prog(pprog, prog);
> +
> +out:
> +       if (!ret)
> +               bpf_prog_inc(prog);
> +       mutex_unlock(&sockmap_prog_update_mutex);
> +       return ret;
> +}
> +
> +static u32 sock_map_link_get_map_id(const struct sockmap_link *sockmap_link)
> +{
> +       u32 map_id = 0;
> +
> +       mutex_lock(&sockmap_link_mutex);
> +       if (sockmap_link->map)
> +               map_id = sockmap_link->map->id;
> +       mutex_unlock(&sockmap_link_mutex);
> +       return map_id;
> +}
> +
> +static int sock_map_link_fill_info(const struct bpf_link *link,
> +                                  struct bpf_link_info *info)
> +{
> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
> +       u32 map_id = sock_map_link_get_map_id(sockmap_link);
> +
> +       info->sockmap.map_id = map_id;
> +       info->sockmap.attach_type = sockmap_link->attach_type;
> +       return 0;
> +}
> +
> +static void sock_map_link_show_fdinfo(const struct bpf_link *link,
> +                                     struct seq_file *seq)
> +{
> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
> +       u32 map_id = sock_map_link_get_map_id(sockmap_link);
> +
> +       seq_printf(seq, "map_id:\t%u\n", map_id);
> +       seq_printf(seq, "attach_type:\t%u\n", sockmap_link->attach_type);
> +}
> +
> +static const struct bpf_link_ops sock_map_link_ops = {
> +       .release = sock_map_link_release,
> +       .dealloc = sock_map_link_dealloc,
> +       .detach = sock_map_link_detach,
> +       .update_prog = sock_map_link_update_prog,
> +       .fill_link_info = sock_map_link_fill_info,
> +       .show_fdinfo = sock_map_link_show_fdinfo,
> +};
> +
> +int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +       struct bpf_link_primer link_primer;
> +       struct sockmap_link *sockmap_link;
> +       enum bpf_attach_type attach_type;
> +       struct bpf_map *map;
> +       int ret;
> +
> +       if (attr->link_create.flags)
> +               return -EINVAL;
> +
> +       map = bpf_map_get_with_uref(attr->link_create.target_fd);
> +       if (IS_ERR(map))
> +               return PTR_ERR(map);

check that map is SOCKMAP?

> +
> +       sockmap_link = kzalloc(sizeof(*sockmap_link), GFP_USER);
> +       if (!sockmap_link) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
> +       attach_type = attr->link_create.attach_type;
> +       bpf_link_init(&sockmap_link->link, BPF_LINK_TYPE_SOCKMAP, &sock_map_link_ops, prog);
> +       sockmap_link->map = map;
> +       sockmap_link->attach_type = attach_type;
> +
> +       ret = bpf_link_prime(&sockmap_link->link, &link_primer);
> +       if (ret) {
> +               kfree(sockmap_link);
> +               goto out;
> +       }
> +
> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> +       if (ret) {
> +               bpf_link_cleanup(&link_primer);
> +               goto out;
> +       }
> +
> +       bpf_prog_inc(prog);

if link was created successfully, it "inherits" prog's refcnt, so you
shouldn't do another bpf_prog_inc()? generic link_create() logic puts
prog only if this function returns error

> +
> +       return bpf_link_settle(&link_primer);
> +
> +out:
> +       bpf_map_put_with_uref(map);
> +       return ret;
> +}
> +
>  static int sock_map_iter_attach_target(struct bpf_prog *prog,
>                                        union bpf_iter_link_info *linfo,
>                                        struct bpf_iter_aux_info *aux)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 9585f5345353..31660c3ffc01 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
>         BPF_LINK_TYPE_TCX = 11,
>         BPF_LINK_TYPE_UPROBE_MULTI = 12,
>         BPF_LINK_TYPE_NETKIT = 13,
> +       BPF_LINK_TYPE_SOCKMAP = 14,
>         __MAX_BPF_LINK_TYPE,
>  };
>
> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
>                         __u32 ifindex;
>                         __u32 attach_type;
>                 } netkit;
> +               struct {
> +                       __u32 map_id;
> +                       __u32 attach_type;
> +               } sockmap;
>         };
>  } __attribute__((aligned(8)));
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
  2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
  2024-04-02 13:18   ` Eduard Zingerman
@ 2024-04-02 17:46   ` Andrii Nakryiko
  2024-04-03  0:07     ` Yonghong Song
  1 sibling, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2024-04-02 17:46 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Introduce a libbpf API function bpf_program__attach_sockmap()
> which allow user to get a bpf_link for their corresponding programs.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  tools/lib/bpf/libbpf.c   | 7 +++++++
>  tools/lib/bpf/libbpf.h   | 2 ++
>  tools/lib/bpf/libbpf.map | 1 +
>  3 files changed, 10 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8eb2cd4ef288..e1bb9ba3d1a5 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -149,6 +149,7 @@ static const char * const link_type_name[] = {
>         [BPF_LINK_TYPE_TCX]                     = "tcx",
>         [BPF_LINK_TYPE_UPROBE_MULTI]            = "uprobe_multi",
>         [BPF_LINK_TYPE_NETKIT]                  = "netkit",
> +       [BPF_LINK_TYPE_SOCKMAP]                 = "sockmap",
>  };
>
>  static const char * const map_type_name[] = {
> @@ -12507,6 +12508,12 @@ bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
>         return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
>  }
>
> +struct bpf_link *
> +bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd)
> +{
> +       return bpf_program_attach_fd(prog, map_fd, "sockmap", NULL);
> +}
> +
>  struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
>  {
>         /* target_fd/target_ifindex use the same field in LINK_CREATE */
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index f88ab50c0229..4c7ada03bf4f 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -795,6 +795,8 @@ bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
>  LIBBPF_API struct bpf_link *
>  bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
>  LIBBPF_API struct bpf_link *
> +bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd);
> +LIBBPF_API struct bpf_link *
>  bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
>  LIBBPF_API struct bpf_link *
>  bpf_program__attach_freplace(const struct bpf_program *prog,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 51732ecb1385..2e5c3429b974 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -411,6 +411,7 @@ LIBBPF_1.3.0 {
>  LIBBPF_1.4.0 {
>         global:
>                 bpf_program__attach_raw_tracepoint_opts;
> +               bpf_program__attach_sockmap;

we are now in 1.5 development cycle, for next revision please put it
into a new section

>                 bpf_raw_tracepoint_open_opts;
>                 bpf_token_create;
>                 btf__new_split;
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs
  2024-04-02 13:17   ` Eduard Zingerman
@ 2024-04-02 18:56     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-04-02 18:56 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau


On 4/2/24 6:17 AM, Eduard Zingerman wrote:
> On Mon, 2024-03-25 at 19:22 -0700, Yonghong Song wrote:
>> Add a few more tests in sockmap_basic.c and sockmap_listen.c to
>> test bpf_link based APIs for SK_MSG and SK_SKB programs.
>> Link attach/detach/update are all tested.
>>
>> All tests are passed.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> As far as I understand patch #1 there are several error conditions
> that are not covered by these tests:
> - update link using program with wrong expected attach type;
> - bpf_program__attach_sockmap() _after_ bpf_prog_attach();
> - creation of a second link between same prog and map
>    (if I understand 'link && link != progs->msg_parser_link' checks in
>     sock_map_link_lookup() correctly).
>
> Are these important enough to have a dedicated test?

All these are missed. I can add them in the next revision.

>
> [...]
>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>> index 63fb2da7930a..b4b38f92b864 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
>> @@ -131,6 +131,40 @@ static void test_skmsg_helpers(enum bpf_map_type map_type)
>>   	test_skmsg_load_helpers__destroy(skel);
>>   }
>>   
>> +static void test_skmsg_helpers_with_link(enum bpf_map_type map_type)
>> +{
>> +	struct test_skmsg_load_helpers *skel;
>> +	struct bpf_link *link, *link2;
>> +	struct bpf_program *prog;
>> +	int err, map;
>> +
>> +	skel = test_skmsg_load_helpers__open_and_load();
>> +	if (!ASSERT_OK_PTR(skel, "test_skmsg_load_helpers__open_and_load"))
>> +		return;
>> +
>> +	prog = skel->progs.prog_msg_verdict;
>> +	map = bpf_map__fd(skel->maps.sock_map);
>> +
>> +	link = bpf_program__attach_sockmap(prog, map);
>> +	if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap"))
>> +		goto out;
>> +
>> +	err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_MSG_VERDICT, 0);
>> +	if (!ASSERT_ERR(err, "bpf_prog_attach"))
>> +		goto out;
>> +
>> +	link2 = bpf_program__attach_sockmap(skel->progs.prog_msg_verdict_clone, map);
>> +	if (!ASSERT_ERR_PTR(link2, "bpf_program__attach_sockmap"))
>> +		goto out;
> Nit: should bpf_link__detach(link2) be called before 'goto out' ?

Yes, we should do this although it will auto detach when test_progs exists.

>
>> +
>> +	err = bpf_link__update_program(link, skel->progs.prog_msg_verdict_clone);
>> +	ASSERT_OK(err, "bpf_link__update_program");
>> +
>> +out:
>> +	bpf_link__detach(link);
>> +	test_skmsg_load_helpers__destroy(skel);
>> +}
>> +
>>   static void test_sockmap_update(enum bpf_map_type map_type)
>>   {
>>   	int err, prog, src;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-02 17:39   ` Eduard Zingerman
@ 2024-04-03  0:06     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-04-03  0:06 UTC (permalink / raw)
  To: Eduard Zingerman, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau


On 4/2/24 10:39 AM, Eduard Zingerman wrote:
> On Mon, 2024-03-25 at 19:21 -0700, Yonghong Song wrote:
> [...]
>
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 27d733c0f65e..dafc9aa6e192 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
> [...]
>
>> @@ -1488,21 +1492,90 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
>>   	return 0;
>>   }
>>   
>> +static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
>> +				struct bpf_link *link, bool skip_check, u32 which)
>> +{
>> +	struct sk_psock_progs *progs = sock_map_progs(map);
>> +
>> +	switch (which) {
>> +	case BPF_SK_MSG_VERDICT:
>> +		if (!skip_check &&
>> +		    ((!link && progs->msg_parser_link) ||
>> +		     (link && link != progs->msg_parser_link)))
>> +			return -EBUSY;
> These checks seem a bit repetitive, maybe factor it out as a single
> check at the end of the function? E.g.:
>
> 	if (!skip_check &&
> 	    ((!link && **plink) || (link && link != **plink)))
> 		return -EBUSY;
>
> Or inline these checks at call sites for sock_map_link_lookup()?
> I tried this on top of this in [1] and all tests seem to pass.

Andrii has a suggestion to do
	plink = progs->msg_parser_link;

and later plink can be used for checking. This indeed makes things easier.

>
> [1] https://gist.github.com/eddyz87/38d832b3f1fc74120598d3480bc16ae1
>
>> +		*plink = &progs->msg_parser_link;
>> +		break;
>> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> +	case BPF_SK_SKB_STREAM_PARSER:
>> +		if (!skip_check &&
>> +		    ((!link && progs->stream_parser_link) ||
>> +		     (link && link != progs->stream_parser_link)))
>> +			return -EBUSY;
>> +		*plink = &progs->stream_parser_link;
>> +		break;
>> +#endif
>> +	case BPF_SK_SKB_STREAM_VERDICT:
>> +		if (!skip_check &&
>> +		    ((!link && progs->stream_verdict_link) ||
>> +		     (link && link != progs->stream_verdict_link)))
>> +			return -EBUSY;
>> +		*plink = &progs->stream_verdict_link;
>> +		break;
>> +	case BPF_SK_SKB_VERDICT:
>> +		if (!skip_check &&
>> +		    ((!link && progs->skb_verdict_link) ||
>> +		     (link && link != progs->skb_verdict_link)))
>> +			return -EBUSY;
>> +		*plink = &progs->skb_verdict_link;
>> +		break;
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	return 0;
>> +}
> [...]
>
>> +/* Handle the following two cases:
>> + * case 1: link != NULL, prog != NULL, old != NULL
>> + * case 2: link != NULL, prog != NULL, old == NULL
>> + */
>> +static int sock_map_link_update_prog(struct bpf_link *link,
>> +				     struct bpf_prog *prog,
>> +				     struct bpf_prog *old)
>> +{
>> +	const struct sockmap_link *sockmap_link = get_sockmap_link(link);
>> +	struct bpf_prog **pprog;
>> +	struct bpf_link **plink;
>> +	int ret = 0;
>> +
>> +	mutex_lock(&sockmap_prog_update_mutex);
>> +
>> +	/* If old prog not NULL, ensure old prog the same as link->prog. */
>> +	if (old && link->prog != old) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +	/* Ensure link->prog has the same type/attach_type as the new prog. */
>> +	if (link->prog->type != prog->type ||
>> +	    link->prog->expected_attach_type != prog->expected_attach_type) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	ret = sock_map_prog_lookup(sockmap_link->map, &pprog,
>> +				   sockmap_link->attach_type);
>> +	if (ret)
>> +		goto out;
>> +
>> +	/* Ensure the same link between the one in map and the passed-in. */
>> +	ret = sock_map_link_lookup(sockmap_link->map, &plink, link, false,
>> +				   sockmap_link->attach_type);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (old)
>> +		return psock_replace_prog(pprog, prog, old);
> should this be 'goto out' in order to unlock the mutex?

Good point. I missed a test case with non-NULL old. Will add in the next revision.

>
>> +
>> +	psock_set_prog(pprog, prog);
>> +
>> +out:
>> +	if (!ret)
>> +		bpf_prog_inc(prog);
>> +	mutex_unlock(&sockmap_prog_update_mutex);
>> +	return ret;
>> +}
> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP
  2024-04-02 17:46   ` Andrii Nakryiko
@ 2024-04-03  0:07     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2024-04-03  0:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau


On 4/2/24 10:46 AM, Andrii Nakryiko wrote:
> On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Introduce a libbpf API function bpf_program__attach_sockmap()
>> which allow user to get a bpf_link for their corresponding programs.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   tools/lib/bpf/libbpf.c   | 7 +++++++
>>   tools/lib/bpf/libbpf.h   | 2 ++
>>   tools/lib/bpf/libbpf.map | 1 +
>>   3 files changed, 10 insertions(+)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 8eb2cd4ef288..e1bb9ba3d1a5 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -149,6 +149,7 @@ static const char * const link_type_name[] = {
>>          [BPF_LINK_TYPE_TCX]                     = "tcx",
>>          [BPF_LINK_TYPE_UPROBE_MULTI]            = "uprobe_multi",
>>          [BPF_LINK_TYPE_NETKIT]                  = "netkit",
>> +       [BPF_LINK_TYPE_SOCKMAP]                 = "sockmap",
>>   };
>>
>>   static const char * const map_type_name[] = {
>> @@ -12507,6 +12508,12 @@ bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
>>          return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
>>   }
>>
>> +struct bpf_link *
>> +bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd)
>> +{
>> +       return bpf_program_attach_fd(prog, map_fd, "sockmap", NULL);
>> +}
>> +
>>   struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
>>   {
>>          /* target_fd/target_ifindex use the same field in LINK_CREATE */
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index f88ab50c0229..4c7ada03bf4f 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -795,6 +795,8 @@ bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
>>   LIBBPF_API struct bpf_link *
>>   bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
>>   LIBBPF_API struct bpf_link *
>> +bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd);
>> +LIBBPF_API struct bpf_link *
>>   bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
>>   LIBBPF_API struct bpf_link *
>>   bpf_program__attach_freplace(const struct bpf_program *prog,
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index 51732ecb1385..2e5c3429b974 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -411,6 +411,7 @@ LIBBPF_1.3.0 {
>>   LIBBPF_1.4.0 {
>>          global:
>>                  bpf_program__attach_raw_tracepoint_opts;
>> +               bpf_program__attach_sockmap;
> we are now in 1.5 development cycle, for next revision please put it
> into a new section

Ack. will do.

>
>>                  bpf_raw_tracepoint_open_opts;
>>                  bpf_token_create;
>>                  btf__new_split;
>> --
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-02 17:45   ` Andrii Nakryiko
@ 2024-04-03  1:08     ` Yonghong Song
  2024-04-03 16:43       ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-04-03  1:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau


On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Add bpf_link support for sk_msg and sk_skb programs. We have an
>> internal request to support bpf_link for sk_msg programs so user
>> space can have a uniform handling with bpf_link based libbpf
>> APIs. Using bpf_link based libbpf API also has a benefit which
>> makes system robust by decoupling prog life cycle and
>> attachment life cycle.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf.h            |   6 +
>>   include/linux/skmsg.h          |   4 +
>>   include/uapi/linux/bpf.h       |   5 +
>>   kernel/bpf/syscall.c           |   4 +
>>   net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
>>   tools/include/uapi/linux/bpf.h |   5 +
>>   6 files changed, 279 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 62762390c93d..5034c1b4ded7 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2996,6 +2996,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
>>   int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
>>   int sock_map_bpf_prog_query(const union bpf_attr *attr,
>>                              union bpf_attr __user *uattr);
>> +int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
>>
>>   void sock_map_unhash(struct sock *sk);
>>   void sock_map_destroy(struct sock *sk);
>> @@ -3094,6 +3095,11 @@ static inline int sock_map_bpf_prog_query(const union bpf_attr *attr,
>>   {
>>          return -EINVAL;
>>   }
>> +
>> +static inline int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>>   #endif /* CONFIG_BPF_SYSCALL */
>>   #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
>>
>> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
>> index e65ec3fd2799..9c8dd4c01412 100644
>> --- a/include/linux/skmsg.h
>> +++ b/include/linux/skmsg.h
>> @@ -58,6 +58,10 @@ struct sk_psock_progs {
>>          struct bpf_prog                 *stream_parser;
>>          struct bpf_prog                 *stream_verdict;
>>          struct bpf_prog                 *skb_verdict;
>> +       struct bpf_link                 *msg_parser_link;
>> +       struct bpf_link                 *stream_parser_link;
>> +       struct bpf_link                 *stream_verdict_link;
>> +       struct bpf_link                 *skb_verdict_link;
>>   };
>>
>>   enum sk_psock_state_bits {
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 9585f5345353..31660c3ffc01 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
>>          BPF_LINK_TYPE_TCX = 11,
>>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
>>          BPF_LINK_TYPE_NETKIT = 13,
>> +       BPF_LINK_TYPE_SOCKMAP = 14,
>>          __MAX_BPF_LINK_TYPE,
>>   };
>>
>> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
>>                          __u32 ifindex;
>>                          __u32 attach_type;
>>                  } netkit;
>> +               struct {
>> +                       __u32 map_id;
>> +                       __u32 attach_type;
>> +               } sockmap;
>>          };
>>   } __attribute__((aligned(8)));
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index e44c276e8617..7d392ec83655 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>>          case BPF_PROG_TYPE_SK_LOOKUP:
>>                  ret = netns_bpf_link_create(attr, prog);
>>                  break;
>> +       case BPF_PROG_TYPE_SK_MSG:
>> +       case BPF_PROG_TYPE_SK_SKB:
>> +               ret = sock_map_link_create(attr, prog);
>> +               break;
>>   #ifdef CONFIG_NET
>>          case BPF_PROG_TYPE_XDP:
>>                  ret = bpf_xdp_link_attach(attr, prog);
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 27d733c0f65e..dafc9aa6e192 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -24,8 +24,12 @@ struct bpf_stab {
>>   #define SOCK_CREATE_FLAG_MASK                          \
>>          (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
>>
>> +static DEFINE_MUTEX(sockmap_prog_update_mutex);
>> +static DEFINE_MUTEX(sockmap_link_mutex);
>> +
>>   static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
>> -                               struct bpf_prog *old, u32 which);
>> +                               struct bpf_prog *old, struct bpf_link *link,
>> +                               u32 which);
>>   static struct sk_psock_progs *sock_map_progs(struct bpf_map *map);
>>
>>   static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>> @@ -71,7 +75,7 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
>>          map = __bpf_map_get(f);
>>          if (IS_ERR(map))
>>                  return PTR_ERR(map);
>> -       ret = sock_map_prog_update(map, prog, NULL, attr->attach_type);
>> +       ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
>>          fdput(f);
>>          return ret;
>>   }
>> @@ -103,7 +107,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>>                  goto put_prog;
>>          }
>>
>> -       ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
>> +       ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
>>   put_prog:
>>          bpf_prog_put(prog);
>>   put_map:
>> @@ -1488,21 +1492,90 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
>>          return 0;
>>   }
>>
>> +static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
>> +                               struct bpf_link *link, bool skip_check, u32 which)
>> +{
>> +       struct sk_psock_progs *progs = sock_map_progs(map);
>> +
>> +       switch (which) {
>> +       case BPF_SK_MSG_VERDICT:
>> +               if (!skip_check &&
>> +                   ((!link && progs->msg_parser_link) ||
>> +                    (link && link != progs->msg_parser_link)))
>> +                       return -EBUSY;
>> +               *plink = &progs->msg_parser_link;
>> +               break;
>> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> +       case BPF_SK_SKB_STREAM_PARSER:
>> +               if (!skip_check &&
>> +                   ((!link && progs->stream_parser_link) ||
>> +                    (link && link != progs->stream_parser_link)))
>> +                       return -EBUSY;
>> +               *plink = &progs->stream_parser_link;
>> +               break;
>> +#endif
>> +       case BPF_SK_SKB_STREAM_VERDICT:
>> +               if (!skip_check &&
>> +                   ((!link && progs->stream_verdict_link) ||
>> +                    (link && link != progs->stream_verdict_link)))
>> +                       return -EBUSY;
>> +               *plink = &progs->stream_verdict_link;
>> +               break;
>> +       case BPF_SK_SKB_VERDICT:
>> +               if (!skip_check &&
>> +                   ((!link && progs->skb_verdict_link) ||
>> +                    (link && link != progs->skb_verdict_link)))
>> +                       return -EBUSY;
>> +               *plink = &progs->skb_verdict_link;
>> +               break;
>> +       default:
>> +               return -EOPNOTSUPP;
>> +       }
>> +
> you can simplify this by
>
> struct bpf_link *cur_link;
>
> switch (which) {
> case BPF_SK_MSG_VERDICT:
>      cur_link = progs->msg_parser_link;
>      break;
> case ...
>
> }
>
> and then perform that condition validating link and cur_link just once.

Indeed, this sounds simpler.

>
>
>> +       return 0;
>> +}
>> +
>> +/* Handle the following four cases:
>> + * prog_attach: prog != NULL, old == NULL, link == NULL
>> + * prog_detach: prog == NULL, old != NULL, link == NULL
>> + * link_attach: prog != NULL, old == NULL, link != NULL
>> + * link_detach: prog == NULL, old != NULL, link != NULL
>> + */
>>   static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
>> -                               struct bpf_prog *old, u32 which)
>> +                               struct bpf_prog *old, struct bpf_link *link,
>> +                               u32 which)
>>   {
>>          struct bpf_prog **pprog;
>> +       struct bpf_link **plink;
>>          int ret;
>>
>> +       mutex_lock(&sockmap_prog_update_mutex);
>> +
>>          ret = sock_map_prog_lookup(map, &pprog, which);
>>          if (ret)
>> -               return ret;
>> +               goto out;
>>
>> -       if (old)
>> -               return psock_replace_prog(pprog, prog, old);
>> +       if (!link || prog)
>> +               ret = sock_map_link_lookup(map, &plink, NULL, false, which);
>> +       else
>> +               ret = sock_map_link_lookup(map, &plink, NULL, true, which);
> it feels like it would be cleaner if sock_map_link_lookup() would just
> return already set link (based on which), and then you'd check update
> logic with prog vs link right here
>
> e.g., it's quite obfuscated that we validate that there is no link set
> currently (this code doesn't do anything with plink).
>
> anyways, I find it a bit hard to follow as it's written, but I'll
> defer to John to make a final call on logic
>
>> +       if (ret)
>> +               goto out;
>> +
>> +       if (old) {
>> +               ret = psock_replace_prog(pprog, prog, old);
>> +               if (!ret)
>> +                       *plink = NULL;
>> +               goto out;
>> +       }
>>
>>          psock_set_prog(pprog, prog);
>> -       return 0;
>> +       if (link)
>> +               *plink = link;
>> +
>> +out:
>> +       mutex_unlock(&sockmap_prog_update_mutex);
> why this mutex is not per-sockmap?

My thinking is the system probably won't have lots of sockmaps and
sockmap attach/detach/update_prog should not be that frequent. But
I could be wrong.

>
>> +       return ret;
>>   }
>>
>>   int sock_map_bpf_prog_query(const union bpf_attr *attr,
>> @@ -1657,6 +1730,180 @@ void sock_map_close(struct sock *sk, long timeout)
>>   }
>>   EXPORT_SYMBOL_GPL(sock_map_close);
>>
>> +struct sockmap_link {
>> +       struct bpf_link link;
>> +       struct bpf_map *map;
>> +       enum bpf_attach_type attach_type;
>> +};
>> +
>> +static struct sockmap_link *get_sockmap_link(const struct bpf_link *link)
>> +{
>> +       return container_of(link, struct sockmap_link, link);
>> +}
> nit: do you really need this helper? container_of() is a pretty
> straightforward by itself, imo

I can do this.

>
>> +
>> +static void sock_map_link_release(struct bpf_link *link)
>> +{
>> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
>> +
>> +       mutex_lock(&sockmap_link_mutex);
> similar to the above, why is this mutex not sockmap-specific? And I'd
> just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> case to keep it simple.

This is to protect sockmap_link->map. They could share the same lock.
Let me double check...

>
>> +       if (sockmap_link->map) {
>> +               (void)sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
>> +                                          sockmap_link->attach_type);
> WARN() if it's not meant to ever fail?

Yes, this is in link_release function and should not fail.
I can add a WARN here.

>
>> +               bpf_map_put_with_uref(sockmap_link->map);
>> +               sockmap_link->map = NULL;
>> +       }
>> +       mutex_unlock(&sockmap_link_mutex);
>> +}
>> +
>> +static int sock_map_link_detach(struct bpf_link *link)
>> +{
>> +       sock_map_link_release(link);
>> +       return 0;
>> +}
>> +
>> +static void sock_map_link_dealloc(struct bpf_link *link)
>> +{
>> +       kfree(get_sockmap_link(link));
>> +}
>> +
>> +/* Handle the following two cases:
>> + * case 1: link != NULL, prog != NULL, old != NULL
>> + * case 2: link != NULL, prog != NULL, old == NULL
>> + */
>> +static int sock_map_link_update_prog(struct bpf_link *link,
>> +                                    struct bpf_prog *prog,
>> +                                    struct bpf_prog *old)
>> +{
>> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
>> +       struct bpf_prog **pprog;
>> +       struct bpf_link **plink;
>> +       int ret = 0;
>> +
>> +       mutex_lock(&sockmap_prog_update_mutex);
>> +
>> +       /* If old prog not NULL, ensure old prog the same as link->prog. */
> typo: "is not NULL", "is the same"
>
>> +       if (old && link->prog != old) {
> hm.. even if old matches link->prog, we should unset old and set new
> link (link overrides prog attachment, basically), it shouldn't matter
> if old == link->prog, unless I'm missing something?

In xdp link (net/core/dev.c), we have

         cur_prog = dev_xdp_prog(dev, mode);
         /* can't replace attached prog with link */
         if (link && cur_prog) {
                 NL_SET_ERR_MSG(extack, "Can't replace active XDP 
program with BPF link");
                 return -EBUSY;
         }
         if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
                 NL_SET_ERR_MSG(extack, "Active program does not match 
expected");
                 return -EEXIST;
         }

if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
in order to do prog update.
for sockmap prog update, in link_update (syscall.c), the only way
we can get a non-NULL old_prog is with the following:

         if (flags & BPF_F_REPLACE) {
                 old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
                 if (IS_ERR(old_prog)) {
                         ret = PTR_ERR(old_prog);
                         old_prog = NULL;
                         goto out_put_progs;
                 }
         } else if (attr->link_update.old_prog_fd) {
                 ret = -EINVAL;
                 goto out_put_progs;
         }
Basically, we have BPF_F_REPLACE here.
So similar to xdp link, I think we should check old_prog to
be equal to link->prog in order to do link update_prog.

>
>> +               ret = -EINVAL;
>> +               goto out;
>> +       }
>> +       /* Ensure link->prog has the same type/attach_type as the new prog. */
>> +       if (link->prog->type != prog->type ||
>> +           link->prog->expected_attach_type != prog->expected_attach_type) {
>> +               ret = -EINVAL;
>> +               goto out;
>> +       }
>> +
>> +       ret = sock_map_prog_lookup(sockmap_link->map, &pprog,
>> +                                  sockmap_link->attach_type);
>> +       if (ret)
>> +               goto out;
>> +
>> +       /* Ensure the same link between the one in map and the passed-in. */
>> +       ret = sock_map_link_lookup(sockmap_link->map, &plink, link, false,
>> +                                  sockmap_link->attach_type);
>> +       if (ret)
>> +               goto out;
>> +
>> +       if (old)
>> +               return psock_replace_prog(pprog, prog, old);
>> +
>> +       psock_set_prog(pprog, prog);
>> +
>> +out:
>> +       if (!ret)
>> +               bpf_prog_inc(prog);
>> +       mutex_unlock(&sockmap_prog_update_mutex);
>> +       return ret;
>> +}
>> +
>> +static u32 sock_map_link_get_map_id(const struct sockmap_link *sockmap_link)
>> +{
>> +       u32 map_id = 0;
>> +
>> +       mutex_lock(&sockmap_link_mutex);
>> +       if (sockmap_link->map)
>> +               map_id = sockmap_link->map->id;
>> +       mutex_unlock(&sockmap_link_mutex);
>> +       return map_id;
>> +}
>> +
>> +static int sock_map_link_fill_info(const struct bpf_link *link,
>> +                                  struct bpf_link_info *info)
>> +{
>> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
>> +       u32 map_id = sock_map_link_get_map_id(sockmap_link);
>> +
>> +       info->sockmap.map_id = map_id;
>> +       info->sockmap.attach_type = sockmap_link->attach_type;
>> +       return 0;
>> +}
>> +
>> +static void sock_map_link_show_fdinfo(const struct bpf_link *link,
>> +                                     struct seq_file *seq)
>> +{
>> +       const struct sockmap_link *sockmap_link = get_sockmap_link(link);
>> +       u32 map_id = sock_map_link_get_map_id(sockmap_link);
>> +
>> +       seq_printf(seq, "map_id:\t%u\n", map_id);
>> +       seq_printf(seq, "attach_type:\t%u\n", sockmap_link->attach_type);
>> +}
>> +
>> +static const struct bpf_link_ops sock_map_link_ops = {
>> +       .release = sock_map_link_release,
>> +       .dealloc = sock_map_link_dealloc,
>> +       .detach = sock_map_link_detach,
>> +       .update_prog = sock_map_link_update_prog,
>> +       .fill_link_info = sock_map_link_fill_info,
>> +       .show_fdinfo = sock_map_link_show_fdinfo,
>> +};
>> +
>> +int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
>> +{
>> +       struct bpf_link_primer link_primer;
>> +       struct sockmap_link *sockmap_link;
>> +       enum bpf_attach_type attach_type;
>> +       struct bpf_map *map;
>> +       int ret;
>> +
>> +       if (attr->link_create.flags)
>> +               return -EINVAL;
>> +
>> +       map = bpf_map_get_with_uref(attr->link_create.target_fd);
>> +       if (IS_ERR(map))
>> +               return PTR_ERR(map);
> check that map is SOCKMAP?
Good point. Will do.
>
>> +
>> +       sockmap_link = kzalloc(sizeof(*sockmap_link), GFP_USER);
>> +       if (!sockmap_link) {
>> +               ret = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>> +       attach_type = attr->link_create.attach_type;
>> +       bpf_link_init(&sockmap_link->link, BPF_LINK_TYPE_SOCKMAP, &sock_map_link_ops, prog);
>> +       sockmap_link->map = map;
>> +       sockmap_link->attach_type = attach_type;
>> +
>> +       ret = bpf_link_prime(&sockmap_link->link, &link_primer);
>> +       if (ret) {
>> +               kfree(sockmap_link);
>> +               goto out;
>> +       }
>> +
>> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
>> +       if (ret) {
>> +               bpf_link_cleanup(&link_primer);
>> +               goto out;
>> +       }
>> +
>> +       bpf_prog_inc(prog);
> if link was created successfully, it "inherits" prog's refcnt, so you
> shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> prog only if this function returns error

The reason I did this is due to

static inline void psock_set_prog(struct bpf_prog **pprog,
                                   struct bpf_prog *prog)
{
         prog = xchg(pprog, prog);
         if (prog)
                 bpf_prog_put(prog);
}

You can see when the prog is swapped due to link_update or prog_attach,
its reference count is decremented by 1. This is necessary for prog_attach,
but as you mentioned, indeed, it is not necessary for link-based approach.
Let me see whether I can refactor code to make it easy not to increase
reference count of prog here.


>
>> +
>> +       return bpf_link_settle(&link_primer);
>> +
>> +out:
>> +       bpf_map_put_with_uref(map);
>> +       return ret;
>> +}
>> +
>>   static int sock_map_iter_attach_target(struct bpf_prog *prog,
>>                                         union bpf_iter_link_info *linfo,
>>                                         struct bpf_iter_aux_info *aux)
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 9585f5345353..31660c3ffc01 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
>>          BPF_LINK_TYPE_TCX = 11,
>>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
>>          BPF_LINK_TYPE_NETKIT = 13,
>> +       BPF_LINK_TYPE_SOCKMAP = 14,
>>          __MAX_BPF_LINK_TYPE,
>>   };
>>
>> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
>>                          __u32 ifindex;
>>                          __u32 attach_type;
>>                  } netkit;
>> +               struct {
>> +                       __u32 map_id;
>> +                       __u32 attach_type;
>> +               } sockmap;
>>          };
>>   } __attribute__((aligned(8)));
>>
>> --
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-03  1:08     ` Yonghong Song
@ 2024-04-03 16:43       ` Andrii Nakryiko
  2024-04-03 17:47         ` John Fastabend
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2024-04-03 16:43 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >> internal request to support bpf_link for sk_msg programs so user
> >> space can have a uniform handling with bpf_link based libbpf
> >> APIs. Using bpf_link based libbpf API also has a benefit which
> >> makes system robust by decoupling prog life cycle and
> >> attachment life cycle.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> >> ---
> >>   include/linux/bpf.h            |   6 +
> >>   include/linux/skmsg.h          |   4 +
> >>   include/uapi/linux/bpf.h       |   5 +
> >>   kernel/bpf/syscall.c           |   4 +
> >>   net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> >>   tools/include/uapi/linux/bpf.h |   5 +
> >>   6 files changed, 279 insertions(+), 8 deletions(-)
> >>

[...]

> >>          psock_set_prog(pprog, prog);
> >> -       return 0;
> >> +       if (link)
> >> +               *plink = link;
> >> +
> >> +out:
> >> +       mutex_unlock(&sockmap_prog_update_mutex);
> > why this mutex is not per-sockmap?
>
> My thinking is the system probably won't have lots of sockmaps and
> sockmap attach/detach/update_prog should not be that frequent. But
> I could be wrong.
>

That seems like an even more of an argument to keep mutex per sockmap.
It won't add a lot of memory, but it is conceptually cleaner, as each
sockmap instance (and corresponding links) are completely independent,
even from a locking perspective.

But I can't say I feel very strongly about this.

> >
> >> +       return ret;
> >>   }
> >>

[...]

> >
> >> +
> >> +static void sock_map_link_release(struct bpf_link *link)
> >> +{
> >> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> >> +
> >> +       mutex_lock(&sockmap_link_mutex);
> > similar to the above, why is this mutex not sockmap-specific? And I'd
> > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > case to keep it simple.
>
> This is to protect sockmap_link->map. They could share the same lock.
> Let me double check...

If you keep that global sockmap_prog_update_mutex then I'd probably
reuse that one here for simplicity (and named it a bit more
generically, "sockmap_mutex" or something like that, just like we have
global "cgroup_mutex").

[...]

> >> +       if (old && link->prog != old) {
> > hm.. even if old matches link->prog, we should unset old and set new
> > link (link overrides prog attachment, basically), it shouldn't matter
> > if old == link->prog, unless I'm missing something?
>
> In xdp link (net/core/dev.c), we have
>
>          cur_prog = dev_xdp_prog(dev, mode);
>          /* can't replace attached prog with link */
>          if (link && cur_prog) {
>                  NL_SET_ERR_MSG(extack, "Can't replace active XDP
> program with BPF link");
>                  return -EBUSY;
>          }
>          if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
>                  NL_SET_ERR_MSG(extack, "Active program does not match
> expected");
>                  return -EEXIST;
>          }
>
> if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> in order to do prog update.
> for sockmap prog update, in link_update (syscall.c), the only way
> we can get a non-NULL old_prog is with the following:
>
>          if (flags & BPF_F_REPLACE) {
>                  old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
>                  if (IS_ERR(old_prog)) {
>                          ret = PTR_ERR(old_prog);
>                          old_prog = NULL;
>                          goto out_put_progs;
>                  }
>          } else if (attr->link_update.old_prog_fd) {
>                  ret = -EINVAL;
>                  goto out_put_progs;
>          }
> Basically, we have BPF_F_REPLACE here.
> So similar to xdp link, I think we should check old_prog to
> be equal to link->prog in order to do link update_prog.

ah, ok, that's BPF_F_REPLACE case. See, it's confusing that we have
this logic split between multiple places, in dev_xdp_attach() it's a
bit more centralized.

>
> >
> >> +               ret = -EINVAL;
> >> +               goto out;
> >> +       }

[...]

> >> +
> >> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> >> +       if (ret) {
> >> +               bpf_link_cleanup(&link_primer);
> >> +               goto out;
> >> +       }
> >> +
> >> +       bpf_prog_inc(prog);
> > if link was created successfully, it "inherits" prog's refcnt, so you
> > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > prog only if this function returns error
>
> The reason I did this is due to
>
> static inline void psock_set_prog(struct bpf_prog **pprog,
>                                    struct bpf_prog *prog)
> {
>          prog = xchg(pprog, prog);
>          if (prog)
>                  bpf_prog_put(prog);
> }
>
> You can see when the prog is swapped due to link_update or prog_attach,
> its reference count is decremented by 1. This is necessary for prog_attach,
> but as you mentioned, indeed, it is not necessary for link-based approach.
> Let me see whether I can refactor code to make it easy not to increase
> reference count of prog here.
>

ah, ok, its another sockmap-specific convention, np

>
> >
> >> +
> >> +       return bpf_link_settle(&link_primer);
> >> +
> >> +out:
> >> +       bpf_map_put_with_uref(map);
> >> +       return ret;
> >> +}
> >> +
> >>   static int sock_map_iter_attach_target(struct bpf_prog *prog,
> >>                                         union bpf_iter_link_info *linfo,
> >>                                         struct bpf_iter_aux_info *aux)
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 9585f5345353..31660c3ffc01 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> >>          BPF_LINK_TYPE_TCX = 11,
> >>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>          BPF_LINK_TYPE_NETKIT = 13,
> >> +       BPF_LINK_TYPE_SOCKMAP = 14,
> >>          __MAX_BPF_LINK_TYPE,
> >>   };
> >>
> >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> >>                          __u32 ifindex;
> >>                          __u32 attach_type;
> >>                  } netkit;
> >> +               struct {
> >> +                       __u32 map_id;
> >> +                       __u32 attach_type;
> >> +               } sockmap;
> >>          };
> >>   } __attribute__((aligned(8)));
> >>
> >> --
> >> 2.43.0
> >>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-03 16:43       ` Andrii Nakryiko
@ 2024-04-03 17:47         ` John Fastabend
  2024-04-03 22:09           ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
  2024-04-04  3:18           ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
  0 siblings, 2 replies; 26+ messages in thread
From: John Fastabend @ 2024-04-03 17:47 UTC (permalink / raw)
  To: Andrii Nakryiko, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, kernel-team, Martin KaFai Lau

Andrii Nakryiko wrote:
> On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> > On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> > >> internal request to support bpf_link for sk_msg programs so user
> > >> space can have a uniform handling with bpf_link based libbpf
> > >> APIs. Using bpf_link based libbpf API also has a benefit which
> > >> makes system robust by decoupling prog life cycle and
> > >> attachment life cycle.
> > >>

Thanks again for working on it.

> > >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> > >> ---
> > >>   include/linux/bpf.h            |   6 +
> > >>   include/linux/skmsg.h          |   4 +
> > >>   include/uapi/linux/bpf.h       |   5 +
> > >>   kernel/bpf/syscall.c           |   4 +
> > >>   net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> > >>   tools/include/uapi/linux/bpf.h |   5 +
> > >>   6 files changed, 279 insertions(+), 8 deletions(-)
> > >>
> 
> [...]
> 
> > >>          psock_set_prog(pprog, prog);
> > >> -       return 0;
> > >> +       if (link)
> > >> +               *plink = link;
> > >> +
> > >> +out:
> > >> +       mutex_unlock(&sockmap_prog_update_mutex);
> > > why this mutex is not per-sockmap?
> >
> > My thinking is the system probably won't have lots of sockmaps and
> > sockmap attach/detach/update_prog should not be that frequent. But
> > I could be wrong.
> >

For my use case at least we have a map per protocol we want to inspect.
So its rather small set <10 I would say. Also they are created once
when the agent starts and when config changes from operator (user decides
to remove/add a parser). Config changing is rather rare. I don't think
this would be paticularly painful in practice now to have a global
lock.

> 
> That seems like an even more of an argument to keep mutex per sockmap.
> It won't add a lot of memory, but it is conceptually cleaner, as each
> sockmap instance (and corresponding links) are completely independent,
> even from a locking perspective.
> 
> But I can't say I feel very strongly about this.
> 
> > >
> > >> +       return ret;
> > >>   }
> > >>
> 
> [...]
> 
> > >
> > >> +
> > >> +static void sock_map_link_release(struct bpf_link *link)
> > >> +{
> > >> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> > >> +
> > >> +       mutex_lock(&sockmap_link_mutex);
> > > similar to the above, why is this mutex not sockmap-specific? And I'd
> > > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > > case to keep it simple.
> >
> > This is to protect sockmap_link->map. They could share the same lock.
> > Let me double check...
> 
> If you keep that global sockmap_prog_update_mutex then I'd probably
> reuse that one here for simplicity (and named it a bit more
> generically, "sockmap_mutex" or something like that, just like we have
> global "cgroup_mutex").

I was leaning to a per map lock, but because a global lock simplifies this
part a bunch I would agree just use a single sockmap_mutex throughout.

If someone has a use case where they want to add/remove maps dynamically
maybe they can let us know what that is. For us, on my todo list, I want
to just remove the map notion and bind progs to socks directly. The
original map idea was for a L7 load balancer, but other than quick hacks
I've never built such a thing nor ran it in production. Maybe someday
I'll find the time.

> 
> [...]
> 
> > >> +       if (old && link->prog != old) {
> > > hm.. even if old matches link->prog, we should unset old and set new
> > > link (link overrides prog attachment, basically), it shouldn't matter
> > > if old == link->prog, unless I'm missing something?
> >
> > In xdp link (net/core/dev.c), we have
> >
> >          cur_prog = dev_xdp_prog(dev, mode);
> >          /* can't replace attached prog with link */
> >          if (link && cur_prog) {
> >                  NL_SET_ERR_MSG(extack, "Can't replace active XDP
> > program with BPF link");
> >                  return -EBUSY;
> >          }
> >          if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
> >                  NL_SET_ERR_MSG(extack, "Active program does not match
> > expected");
> >                  return -EEXIST;
> >          }
> >
> > if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> > in order to do prog update.
> > for sockmap prog update, in link_update (syscall.c), the only way
> > we can get a non-NULL old_prog is with the following:
> >
> >          if (flags & BPF_F_REPLACE) {
> >                  old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
> >                  if (IS_ERR(old_prog)) {
> >                          ret = PTR_ERR(old_prog);
> >                          old_prog = NULL;
> >                          goto out_put_progs;
> >                  }
> >          } else if (attr->link_update.old_prog_fd) {
> >                  ret = -EINVAL;
> >                  goto out_put_progs;
> >          }
> > Basically, we have BPF_F_REPLACE here.
> > So similar to xdp link, I think we should check old_prog to
> > be equal to link->prog in order to do link update_prog.
> 
> ah, ok, that's BPF_F_REPLACE case. See, it's confusing that we have
> this logic split between multiple places, in dev_xdp_attach() it's a
> bit more centralized.
> 
> >
> > >
> > >> +               ret = -EINVAL;
> > >> +               goto out;
> > >> +       }
> 
> [...]
> 
> > >> +
> > >> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> > >> +       if (ret) {
> > >> +               bpf_link_cleanup(&link_primer);
> > >> +               goto out;
> > >> +       }
> > >> +
> > >> +       bpf_prog_inc(prog);
> > > if link was created successfully, it "inherits" prog's refcnt, so you
> > > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > > prog only if this function returns error
> >
> > The reason I did this is due to
> >
> > static inline void psock_set_prog(struct bpf_prog **pprog,
> >                                    struct bpf_prog *prog)
> > {
> >          prog = xchg(pprog, prog);
> >          if (prog)
> >                  bpf_prog_put(prog);
> > }
> >
> > You can see when the prog is swapped due to link_update or prog_attach,
> > its reference count is decremented by 1. This is necessary for prog_attach,
> > but as you mentioned, indeed, it is not necessary for link-based approach.
> > Let me see whether I can refactor code to make it easy not to increase
> > reference count of prog here.
> >
> 
> ah, ok, its another sockmap-specific convention, np
> 
> >
> > >
> > >> +
> > >> +       return bpf_link_settle(&link_primer);
> > >> +
> > >> +out:
> > >> +       bpf_map_put_with_uref(map);
> > >> +       return ret;
> > >> +}
> > >> +
> > >>   static int sock_map_iter_attach_target(struct bpf_prog *prog,
> > >>                                         union bpf_iter_link_info *linfo,
> > >>                                         struct bpf_iter_aux_info *aux)
> > >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > >> index 9585f5345353..31660c3ffc01 100644
> > >> --- a/tools/include/uapi/linux/bpf.h
> > >> +++ b/tools/include/uapi/linux/bpf.h
> > >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> > >>          BPF_LINK_TYPE_TCX = 11,
> > >>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
> > >>          BPF_LINK_TYPE_NETKIT = 13,
> > >> +       BPF_LINK_TYPE_SOCKMAP = 14,
> > >>          __MAX_BPF_LINK_TYPE,
> > >>   };
> > >>
> > >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> > >>                          __u32 ifindex;
> > >>                          __u32 attach_type;
> > >>                  } netkit;
> > >> +               struct {
> > >> +                       __u32 map_id;
> > >> +                       __u32 attach_type;
> > >> +               } sockmap;
> > >>          };
> > >>   } __attribute__((aligned(8)));
> > >>
> > >> --
> > >> 2.43.0
> > >>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs]
  2024-04-03 17:47         ` John Fastabend
@ 2024-04-03 22:09           ` Martin KaFai Lau
  2024-04-04  1:11             ` John Fastabend
  2024-04-04  3:18           ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
  1 sibling, 1 reply; 26+ messages in thread
From: Martin KaFai Lau @ 2024-04-03 22:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau, Andrii Nakryiko,
	Yonghong Song

On 4/3/24 10:47 AM, John Fastabend wrote:
> on my todo list, I want
> to just remove the map notion and bind progs to socks directly.

Run the bpf prog without the sockmap? +1, it would be nice.

> but other than quick hacks I've never built such a thing nor ran it 
> in production. 

How do you see the interface will look like (e.g. attaching the bpf prog to a sk) ?

It will be nice if the whole function (e.g. sk->sk_data_ready or may be some of 
the sk->sk_prot) can be implemented completely in bpf. I don't have a concrete 
use case for now but I think it will be powerful.

[ It is orthogonal to what Yonghong is doing, so the title changed ]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs]
  2024-04-03 22:09           ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
@ 2024-04-04  1:11             ` John Fastabend
  2024-04-04  3:31               ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: John Fastabend @ 2024-04-04  1:11 UTC (permalink / raw)
  To: Martin KaFai Lau, John Fastabend
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau, Andrii Nakryiko,
	Yonghong Song

Martin KaFai Lau wrote:
> On 4/3/24 10:47 AM, John Fastabend wrote:
> > on my todo list, I want
> > to just remove the map notion and bind progs to socks directly.
> 
> Run the bpf prog without the sockmap? +1, it would be nice.

Part of my motivation for doing this is almost all the bugs syzbot and
others find are related to removing sockets from the map. We never
do this in any of our code. Once a socket is in the map (added at
accept time) it stays there until TCP stack closes it.

Also we have to make up some size for the map that somehow looks like
max number of concurrent sessions for the application. For many
server applicatoins (nginx, httpd, ...) we know this, but is a bit
artifically derived.

> 
> > but other than quick hacks I've never built such a thing nor ran it 
> > in production. 
> 
> How do you see the interface will look like (e.g. attaching the bpf prog to a sk) ?

I would propse doing it directly with a helper/kfunc from the sockops
programs.

  attach_sk_msg_prog(sk, sk_msg_prog)
  attach_sk_skb_prog(sk, sk_skb_prog)

> 
> It will be nice if the whole function (e.g. sk->sk_data_ready or may be some of 
> the sk->sk_prot) can be implemented completely in bpf. I don't have a concrete 
> use case for now but I think it will be powerful.

Perhaps a data_ready prog could also replace the ops?

  attach_sk_data_ready(sk, sk_msg_data_ready)

The attach_sk_data_ready could use pretty much the logic we have for
creating psocks but only replace the sk_data_ready callback.

> 
> [ It is orthogonal to what Yonghong is doing, so the title changed ]
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-03 17:47         ` John Fastabend
  2024-04-03 22:09           ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
@ 2024-04-04  3:18           ` Yonghong Song
  2024-04-05  4:42             ` John Fastabend
  1 sibling, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-04-04  3:18 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau


On 4/3/24 10:47 AM, John Fastabend wrote:
> Andrii Nakryiko wrote:
>> On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>
>>> On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
>>>> On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>> Add bpf_link support for sk_msg and sk_skb programs. We have an
>>>>> internal request to support bpf_link for sk_msg programs so user
>>>>> space can have a uniform handling with bpf_link based libbpf
>>>>> APIs. Using bpf_link based libbpf API also has a benefit which
>>>>> makes system robust by decoupling prog life cycle and
>>>>> attachment life cycle.
>>>>>
> Thanks again for working on it.
>
>>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>>>> ---
>>>>>    include/linux/bpf.h            |   6 +
>>>>>    include/linux/skmsg.h          |   4 +
>>>>>    include/uapi/linux/bpf.h       |   5 +
>>>>>    kernel/bpf/syscall.c           |   4 +
>>>>>    net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
>>>>>    tools/include/uapi/linux/bpf.h |   5 +
>>>>>    6 files changed, 279 insertions(+), 8 deletions(-)
>>>>>
>> [...]
>>
>>>>>           psock_set_prog(pprog, prog);
>>>>> -       return 0;
>>>>> +       if (link)
>>>>> +               *plink = link;
>>>>> +
>>>>> +out:
>>>>> +       mutex_unlock(&sockmap_prog_update_mutex);
>>>> why this mutex is not per-sockmap?
>>> My thinking is the system probably won't have lots of sockmaps and
>>> sockmap attach/detach/update_prog should not be that frequent. But
>>> I could be wrong.
>>>
> For my use case at least we have a map per protocol we want to inspect.
> So its rather small set <10 I would say. Also they are created once
> when the agent starts and when config changes from operator (user decides
> to remove/add a parser). Config changing is rather rare. I don't think
> this would be paticularly painful in practice now to have a global
> lock.
>
>> That seems like an even more of an argument to keep mutex per sockmap.
>> It won't add a lot of memory, but it is conceptually cleaner, as each
>> sockmap instance (and corresponding links) are completely independent,
>> even from a locking perspective.
>>
>> But I can't say I feel very strongly about this.
>>
>>>>> +       return ret;
>>>>>    }
>>>>>
>> [...]
>>
>>>>> +
>>>>> +static void sock_map_link_release(struct bpf_link *link)
>>>>> +{
>>>>> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
>>>>> +
>>>>> +       mutex_lock(&sockmap_link_mutex);
>>>> similar to the above, why is this mutex not sockmap-specific? And I'd
>>>> just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
>>>> case to keep it simple.
>>> This is to protect sockmap_link->map. They could share the same lock.
>>> Let me double check...
>> If you keep that global sockmap_prog_update_mutex then I'd probably
>> reuse that one here for simplicity (and named it a bit more
>> generically, "sockmap_mutex" or something like that, just like we have
>> global "cgroup_mutex").
> I was leaning to a per map lock, but because a global lock simplifies this
> part a bunch I would agree just use a single sockmap_mutex throughout.
>
> If someone has a use case where they want to add/remove maps dynamically
> maybe they can let us know what that is. For us, on my todo list, I want
> to just remove the map notion and bind progs to socks directly. The
> original map idea was for a L7 load balancer, but other than quick hacks
> I've never built such a thing nor ran it in production. Maybe someday
> I'll find the time.

I am using a single global lock.
   https://lore.kernel.org/bpf/20240404025305.2210999-1-yonghong.song@linux.dev/
Let us whether it makes sense or not with code.

John, it would be great if you can review the patch set. I am afraid
that I could miss something...

>
>> [...]
>>
>>>>> +       if (old && link->prog != old) {
>>>> hm.. even if old matches link->prog, we should unset old and set new
>>>> link (link overrides prog attachment, basically), it shouldn't matter
>>>> if old == link->prog, unless I'm missing something?
>>>
[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs]
  2024-04-04  1:11             ` John Fastabend
@ 2024-04-04  3:31               ` Yonghong Song
  2024-04-05  4:41                 ` John Fastabend
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2024-04-04  3:31 UTC (permalink / raw)
  To: John Fastabend, Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau, Andrii Nakryiko


On 4/3/24 6:11 PM, John Fastabend wrote:
> Martin KaFai Lau wrote:
>> On 4/3/24 10:47 AM, John Fastabend wrote:
>>> on my todo list, I want
>>> to just remove the map notion and bind progs to socks directly.
>> Run the bpf prog without the sockmap? +1, it would be nice.
> Part of my motivation for doing this is almost all the bugs syzbot and
> others find are related to removing sockets from the map. We never
> do this in any of our code. Once a socket is in the map (added at
> accept time) it stays there until TCP stack closes it.
>
> Also we have to make up some size for the map that somehow looks like
> max number of concurrent sessions for the application. For many
> server applicatoins (nginx, httpd, ...) we know this, but is a bit
> artifically derived.
>
>>> but other than quick hacks I've never built such a thing nor ran it
>>> in production.
>> How do you see the interface will look like (e.g. attaching the bpf prog to a sk) ?
> I would propse doing it directly with a helper/kfunc from the sockops
> programs.
>
>    attach_sk_msg_prog(sk, sk_msg_prog)
>    attach_sk_skb_prog(sk, sk_skb_prog)
>
>> It will be nice if the whole function (e.g. sk->sk_data_ready or may be some of
>> the sk->sk_prot) can be implemented completely in bpf. I don't have a concrete
>> use case for now but I think it will be powerful.
> Perhaps a data_ready prog could also replace the ops?
>
>    attach_sk_data_ready(sk, sk_msg_data_ready)
>
> The attach_sk_data_ready could use pretty much the logic we have for
> creating psocks but only replace the sk_data_ready callback.

sounds a good idea. Do we need to support detach function or atomic
update function as well? Can each sk has multiple sk_msg_prog programs?

>
>> [ It is orthogonal to what Yonghong is doing, so the title changed ]
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs]
  2024-04-04  3:31               ` Yonghong Song
@ 2024-04-05  4:41                 ` John Fastabend
  2024-04-06  1:10                   ` Martin KaFai Lau
  0 siblings, 1 reply; 26+ messages in thread
From: John Fastabend @ 2024-04-05  4:41 UTC (permalink / raw)
  To: Yonghong Song, John Fastabend, Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau, Andrii Nakryiko

Yonghong Song wrote:
> 
> On 4/3/24 6:11 PM, John Fastabend wrote:
> > Martin KaFai Lau wrote:
> >> On 4/3/24 10:47 AM, John Fastabend wrote:
> >>> on my todo list, I want
> >>> to just remove the map notion and bind progs to socks directly.
> >> Run the bpf prog without the sockmap? +1, it would be nice.
> > Part of my motivation for doing this is almost all the bugs syzbot and
> > others find are related to removing sockets from the map. We never
> > do this in any of our code. Once a socket is in the map (added at
> > accept time) it stays there until TCP stack closes it.
> >
> > Also we have to make up some size for the map that somehow looks like
> > max number of concurrent sessions for the application. For many
> > server applicatoins (nginx, httpd, ...) we know this, but is a bit
> > artifically derived.
> >
> >>> but other than quick hacks I've never built such a thing nor ran it
> >>> in production.
> >> How do you see the interface will look like (e.g. attaching the bpf prog to a sk) ?
> > I would propse doing it directly with a helper/kfunc from the sockops
> > programs.
> >
> >    attach_sk_msg_prog(sk, sk_msg_prog)
> >    attach_sk_skb_prog(sk, sk_skb_prog)
> >
> >> It will be nice if the whole function (e.g. sk->sk_data_ready or may be some of
> >> the sk->sk_prot) can be implemented completely in bpf. I don't have a concrete
> >> use case for now but I think it will be powerful.
> > Perhaps a data_ready prog could also replace the ops?
> >
> >    attach_sk_data_ready(sk, sk_msg_data_ready)
> >
> > The attach_sk_data_ready could use pretty much the logic we have for
> > creating psocks but only replace the sk_data_ready callback.
> 
> sounds a good idea. Do we need to support detach function or atomic
> update function as well? Can each sk has multiple sk_msg_prog programs?

I've not found any use for multiple programs, detach functions, or updating
the psock once its created to be honest. Also why syzbot finds all the bugs
in this space because we unfortunately don't stress this area much. In the
original design I had fresh in my head building hardware load balancers and the
XDP redirect bits so a map seemed natural. Also we didn't have a lot of the
machinery we have now so went with the map. As I noted above the L7 LB
hasn't really got much traction on my side at least not yet.

In reality we've been using sk_msg and sk_skb progs attaching 1:1
with protocols and observing, auditing, adding/removing fields from
data streams.

I would probably suggest for first implementation of a sk msg attach without
maps I would just make it one prog no need for multiple programs and even
skip a detach function. Maybe there is some use for multiple programs but
we just have a single agent so it hasn't come up yet. Maybe similar to cgroups
though because we only have single prog in those at the moment.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
  2024-04-04  3:18           ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
@ 2024-04-05  4:42             ` John Fastabend
  0 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2024-04-05  4:42 UTC (permalink / raw)
  To: Yonghong Song, John Fastabend, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau

Yonghong Song wrote:
> 
> On 4/3/24 10:47 AM, John Fastabend wrote:
> > Andrii Nakryiko wrote:
> >> On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>
> >>> On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> >>>> On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>>> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >>>>> internal request to support bpf_link for sk_msg programs so user
> >>>>> space can have a uniform handling with bpf_link based libbpf
> >>>>> APIs. Using bpf_link based libbpf API also has a benefit which
> >>>>> makes system robust by decoupling prog life cycle and
> >>>>> attachment life cycle.
> >>>>>
> > Thanks again for working on it.
> >
> >>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> >>>>> ---
> >>>>>    include/linux/bpf.h            |   6 +
> >>>>>    include/linux/skmsg.h          |   4 +
> >>>>>    include/uapi/linux/bpf.h       |   5 +
> >>>>>    kernel/bpf/syscall.c           |   4 +
> >>>>>    net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> >>>>>    tools/include/uapi/linux/bpf.h |   5 +
> >>>>>    6 files changed, 279 insertions(+), 8 deletions(-)
> >>>>>
> >> [...]
> >>
> >>>>>           psock_set_prog(pprog, prog);
> >>>>> -       return 0;
> >>>>> +       if (link)
> >>>>> +               *plink = link;
> >>>>> +
> >>>>> +out:
> >>>>> +       mutex_unlock(&sockmap_prog_update_mutex);
> >>>> why this mutex is not per-sockmap?
> >>> My thinking is the system probably won't have lots of sockmaps and
> >>> sockmap attach/detach/update_prog should not be that frequent. But
> >>> I could be wrong.
> >>>
> > For my use case at least we have a map per protocol we want to inspect.
> > So its rather small set <10 I would say. Also they are created once
> > when the agent starts and when config changes from operator (user decides
> > to remove/add a parser). Config changing is rather rare. I don't think
> > this would be paticularly painful in practice now to have a global
> > lock.
> >
> >> That seems like an even more of an argument to keep mutex per sockmap.
> >> It won't add a lot of memory, but it is conceptually cleaner, as each
> >> sockmap instance (and corresponding links) are completely independent,
> >> even from a locking perspective.
> >>
> >> But I can't say I feel very strongly about this.
> >>
> >>>>> +       return ret;
> >>>>>    }
> >>>>>
> >> [...]
> >>
> >>>>> +
> >>>>> +static void sock_map_link_release(struct bpf_link *link)
> >>>>> +{
> >>>>> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> >>>>> +
> >>>>> +       mutex_lock(&sockmap_link_mutex);
> >>>> similar to the above, why is this mutex not sockmap-specific? And I'd
> >>>> just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> >>>> case to keep it simple.
> >>> This is to protect sockmap_link->map. They could share the same lock.
> >>> Let me double check...
> >> If you keep that global sockmap_prog_update_mutex then I'd probably
> >> reuse that one here for simplicity (and named it a bit more
> >> generically, "sockmap_mutex" or something like that, just like we have
> >> global "cgroup_mutex").
> > I was leaning to a per map lock, but because a global lock simplifies this
> > part a bunch I would agree just use a single sockmap_mutex throughout.
> >
> > If someone has a use case where they want to add/remove maps dynamically
> > maybe they can let us know what that is. For us, on my todo list, I want
> > to just remove the map notion and bind progs to socks directly. The
> > original map idea was for a L7 load balancer, but other than quick hacks
> > I've never built such a thing nor ran it in production. Maybe someday
> > I'll find the time.
> 
> I am using a single global lock.
>    https://lore.kernel.org/bpf/20240404025305.2210999-1-yonghong.song@linux.dev/
> Let us whether it makes sense or not with code.
> 
> John, it would be great if you can review the patch set. I am afraid
> that I could miss something...

Yep I will. Hopefully tonight because I intended to do it today but worse
case top of list tomorrow. I can also drop it into our test harness which
runs some longer running stress stuff. Thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs]
  2024-04-05  4:41                 ` John Fastabend
@ 2024-04-06  1:10                   ` Martin KaFai Lau
  0 siblings, 0 replies; 26+ messages in thread
From: Martin KaFai Lau @ 2024-04-06  1:10 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Jakub Sitnicki, kernel-team, Martin KaFai Lau, Andrii Nakryiko,
	Yonghong Song

On 4/4/24 9:41 PM, John Fastabend wrote:

>>>> How do you see the interface will look like (e.g. attaching the bpf prog to a sk) ?
>>> I would propse doing it directly with a helper/kfunc from the sockops
>>> programs.
>>>
>>>     attach_sk_msg_prog(sk, sk_msg_prog)
>>>     attach_sk_skb_prog(sk, sk_skb_prog)

or the whole 'struct sk_psock_progs'

attach_sk_parser(struct sock *sk, struct sk_psock_progs *progs).

>>>
>>>> It will be nice if the whole function (e.g. sk->sk_data_ready or may be some of
>>>> the sk->sk_prot) can be implemented completely in bpf. I don't have a concrete
>>>> use case for now but I think it will be powerful.
>>> Perhaps a data_ready prog could also replace the ops?
>>>
>>>     attach_sk_data_ready(sk, sk_msg_data_ready)

Other than sk_data_ready, I am also wondering how much of the 'struct proto' can 
be written in bpf. For example, the {tcp,udp}_bpf_prots.
May be with some help of new kfunc and some of the functions can just use the 
kernel default one.

>>> The attach_sk_data_ready could use pretty much the logic we have for
>>> creating psocks but only replace the sk_data_ready callback.
>>
>> sounds a good idea. Do we need to support detach function or atomic
>> update function as well? Can each sk has multiple sk_msg_prog programs?
> 
> I've not found any use for multiple programs, detach functions, or updating
> the psock once its created to be honest. Also why syzbot finds all the bugs
> in this space because we unfortunately don't stress this area much. In the
> original design I had fresh in my head building hardware load balancers and the
> XDP redirect bits so a map seemed natural. Also we didn't have a lot of the
> machinery we have now so went with the map. As I noted above the L7 LB
> hasn't really got much traction on my side at least not yet.
> 
> In reality we've been using sk_msg and sk_skb progs attaching 1:1
> with protocols and observing, auditing, adding/removing fields from
> data streams.
> 
> I would probably suggest for first implementation of a sk msg attach without
> maps I would just make it one prog no need for multiple programs and even
> skip a detach function. Maybe there is some use for multiple programs but

I would at least keep the detach (and update) program possibility open. Is it 
still too hard to support them without a map get into the way?


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-04-06  1:10 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
2024-04-02 17:39   ` Eduard Zingerman
2024-04-03  0:06     ` Yonghong Song
2024-04-02 17:45   ` Andrii Nakryiko
2024-04-03  1:08     ` Yonghong Song
2024-04-03 16:43       ` Andrii Nakryiko
2024-04-03 17:47         ` John Fastabend
2024-04-03 22:09           ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
2024-04-04  1:11             ` John Fastabend
2024-04-04  3:31               ` Yonghong Song
2024-04-05  4:41                 ` John Fastabend
2024-04-06  1:10                   ` Martin KaFai Lau
2024-04-04  3:18           ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-04-05  4:42             ` John Fastabend
2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
2024-04-02 13:18   ` Eduard Zingerman
2024-04-02 17:46   ` Andrii Nakryiko
2024-04-03  0:07     ` Yonghong Song
2024-03-26  2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
2024-03-27 11:58   ` Quentin Monnet
2024-03-26  2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
2024-04-02 13:18   ` Eduard Zingerman
2024-03-26  2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
2024-04-02 13:17   ` Eduard Zingerman
2024-04-02 18:56     ` Yonghong Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox