netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops
@ 2024-12-10  4:03 D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 1/5] bpf: export necessary sympols for modules with struct_ops D. Wythe
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:03 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

This patches attempt to introduce BPF injection capability for SMC,
and add selftest to ensure code stability.

Since the SMC protocol is not suitable for all scenarios,
especially for short-lived. For most applications, they cannot
guarantee that there are no such scenarios at all. Therefore, apps
may need some specific strategies to decide shall we need to use SMC
or not, for example, apps can limit the scope of the SMC to a specific
IP address or port.

Based on the consideration of transparent replacement, we hope that apps
can remain transparent even if they need to formulate some specific
strategies for SMC using. That is, do not need to recompile their code.

On the other hand, we need to ensure the scalability of strategies
implementation. Although it is simple to use socket options or sysctl,
it will bring more complexity to subsequent expansion.

Fortunately, BPF can solve these concerns very well, users can write
thire own strategies in eBPF to choose whether to use SMC or not.
And it's quite easy for them to modify their strategies in the future.

v2:
  1. Rename smc_bpf_ops to smc_ops.
  2. Change the scope of smc_ops from global to per netns.
  3. Directly pass parameters to ops instead of smc_ops_ctx.
  4. Remove struct smc_ops_ctx.
  5. Remove exports that are no longer needed.

D. Wythe (5):
  bpf: export necessary sympols for modules with struct_ops
  net/smc: Introduce generic hook smc_ops
  net/smc: bpf: register smc_ops info struct_ops
  libbpf: fix error when st-prefix_ops and ops from differ btf
  bpf/selftests: add simple selftest for bpf_smc_ops

 include/net/netns/smc.h                       |   3 +
 include/net/smc.h                             |  51 ++++++
 kernel/bpf/bpf_struct_ops.c                   |   2 +
 kernel/bpf/syscall.c                          |   1 +
 net/ipv4/tcp_output.c                         |  15 +-
 net/smc/Kconfig                               |  12 ++
 net/smc/Makefile                              |   1 +
 net/smc/af_smc.c                              |  10 ++
 net/smc/smc_ops.c                             | 150 ++++++++++++++++++
 net/smc/smc_ops.h                             |  31 ++++
 net/smc/smc_sysctl.c                          |  95 +++++++++++
 tools/lib/bpf/libbpf.c                        |  34 +++-
 tools/testing/selftests/bpf/config            |   3 +
 .../selftests/bpf/prog_tests/test_bpf_smc.c   |  25 +++
 tools/testing/selftests/bpf/progs/bpf_smc.c   |  27 ++++
 15 files changed, 453 insertions(+), 7 deletions(-)
 create mode 100644 net/smc/smc_ops.c
 create mode 100644 net/smc/smc_ops.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_bpf_smc.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_smc.c

-- 
2.45.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 1/5] bpf: export necessary sympols for modules with struct_ops
  2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
@ 2024-12-10  4:04 ` D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 2/5] net/smc: Introduce generic hook smc_ops D. Wythe
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:04 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

Exports three necessary symbols for implementing struct_ops with
tristate subsystem.

To hold or release refcnt of struct_ops refcnt by inline funcs
bpf_try_module_get and bpf_module_put which use bpf_struct_ops_get(put)
conditionally.

And to copy obj name from one to the other with effective checks by
bpf_obj_name_cpy.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 kernel/bpf/bpf_struct_ops.c | 2 ++
 kernel/bpf/syscall.c        | 1 +
 2 files changed, 3 insertions(+)

diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 606efe32485a..00c212e0ad39 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -1119,6 +1119,7 @@ bool bpf_struct_ops_get(const void *kdata)
 	map = __bpf_map_inc_not_zero(&st_map->map, false);
 	return !IS_ERR(map);
 }
+EXPORT_SYMBOL_GPL(bpf_struct_ops_get);
 
 void bpf_struct_ops_put(const void *kdata)
 {
@@ -1130,6 +1131,7 @@ void bpf_struct_ops_put(const void *kdata)
 
 	bpf_map_put(&st_map->map);
 }
+EXPORT_SYMBOL_GPL(bpf_struct_ops_put);
 
 int bpf_struct_ops_supported(const struct bpf_struct_ops *st_ops, u32 moff)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5684e8ce132d..62238ec989dc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1167,6 +1167,7 @@ int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size)
 
 	return src - orig_src;
 }
+EXPORT_SYMBOL_GPL(bpf_obj_name_cpy);
 
 int map_check_no_btf(const struct bpf_map *map,
 		     const struct btf *btf,
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 2/5] net/smc: Introduce generic hook smc_ops
  2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 1/5] bpf: export necessary sympols for modules with struct_ops D. Wythe
@ 2024-12-10  4:04 ` D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 3/5] net/smc: bpf: register smc_ops info struct_ops D. Wythe
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:04 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

The introduction of IPPROTO_SMC enables eBPF programs to determine
whether to use SMC based on the context of socket creation, such as
network namespaces, PID and comm name, etc.

As a subsequent enhancement, to introduce a new generic hook that
allows decisions on whether to use SMC or not at runtime, including
but not limited to local/remote IP address or ports.

Moreover, in the future, we can achieve more complex extensions to the
protocol stack by extending this ops.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 include/net/netns/smc.h |  3 ++
 include/net/smc.h       | 51 ++++++++++++++++++++++
 net/ipv4/tcp_output.c   | 15 +++++--
 net/smc/Kconfig         | 12 ++++++
 net/smc/Makefile        |  1 +
 net/smc/smc_ops.c       | 51 ++++++++++++++++++++++
 net/smc/smc_ops.h       | 29 +++++++++++++
 net/smc/smc_sysctl.c    | 95 +++++++++++++++++++++++++++++++++++++++++
 8 files changed, 253 insertions(+), 4 deletions(-)
 create mode 100644 net/smc/smc_ops.c
 create mode 100644 net/smc/smc_ops.h

diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index fc752a50f91b..59d069f56b2d 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -17,6 +17,9 @@ struct netns_smc {
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header		*smc_hdr;
 #endif
+#if IS_ENABLED(CONFIG_SMC_OPS)
+	struct smc_ops __rcu *ops;
+#endif /* CONFIG_SMC_OPS */
 	unsigned int			sysctl_autocorking_size;
 	unsigned int			sysctl_smcr_buf_type;
 	int				sysctl_smcr_testlink_time;
diff --git a/include/net/smc.h b/include/net/smc.h
index db84e4e35080..25c762aa96fc 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -18,6 +18,8 @@
 #include "linux/ism.h"
 
 struct sock;
+struct tcp_sock;
+struct inet_request_sock;
 
 #define SMC_MAX_PNETID_LEN	16	/* Max. length of PNET id */
 
@@ -97,4 +99,53 @@ struct smcd_dev {
 	u8 going_away : 1;
 };
 
+#define  SMC_OPS_NAME_MAX 16
+
+enum {
+	/* ops can be inherit from init_net */
+	SMC_OPS_FLAG_INHERITABLE = 0x1,
+
+	SMC_OPS_ALL_FLAGS = SMC_OPS_FLAG_INHERITABLE,
+};
+
+struct smc_ops {
+	/* priavte */
+
+	struct list_head list;
+	struct module *owner;
+
+	/* public */
+
+	/* unique name */
+	char name[SMC_OPS_NAME_MAX];
+	int flags;
+
+	/* Invoked before computing SMC option for SYN packets.
+	 * We can control whether to set SMC options by returning varios value.
+	 * Return 0 to disable SMC, or return any other value to enable it.
+	 */
+	int (*set_option)(struct tcp_sock *tp);
+
+	/* Invoked before Set up SMC options for SYN-ACK packets
+	 * We can control whether to respond SMC options by returning varios value.
+	 * Return 0 to disable SMC, or return any other value to enable it.
+	 */
+	int (*set_option_cond)(const struct tcp_sock *tp, struct inet_request_sock *ireq);
+};
+
+#if IS_ENABLED(CONFIG_SMC_OPS)
+#define smc_call_retops(init_val, sk, func, ...) ({	\
+	typeof(init_val) __ret = (init_val);		\
+	struct smc_ops *ops;				\
+	rcu_read_lock();				\
+	ops = READ_ONCE(sock_net(sk)->smc.ops);		\
+	if (ops && ops->func)				\
+		__ret = ops->func(__VA_ARGS__);		\
+	rcu_read_unlock();				\
+	__ret;						\
+})
+#else
+#define smc_call_retops(init_val, ...) (init_val)
+#endif /* CONFIG_SMC_OPS */
+
 #endif	/* _SMC_H */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5485a70b5fe5..7b402167fb4d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -48,6 +48,7 @@
 #include <linux/skbuff_ref.h>
 
 #include <trace/events/tcp.h>
+#include <net/smc.h>
 
 /* Refresh clocks of a TCP socket,
  * ensuring monotically increasing values.
@@ -759,14 +760,17 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
 	mptcp_options_write(th, ptr, tp, opts);
 }
 
-static void smc_set_option(const struct tcp_sock *tp,
+static void smc_set_option(struct tcp_sock *tp,
 			   struct tcp_out_options *opts,
 			   unsigned int *remaining)
 {
 #if IS_ENABLED(CONFIG_SMC)
 	if (static_branch_unlikely(&tcp_have_smc)) {
 		if (tp->syn_smc) {
-			if (*remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
+			tp->syn_smc = smc_call_retops(1, &tp->inet_conn.icsk_inet.sk,
+						      set_option, tp);
+			/* re-check syn_smc */
+			if (tp->syn_smc && *remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
 				opts->options |= OPTION_SMC;
 				*remaining -= TCPOLEN_EXP_SMC_BASE_ALIGNED;
 			}
@@ -776,14 +780,17 @@ static void smc_set_option(const struct tcp_sock *tp,
 }
 
 static void smc_set_option_cond(const struct tcp_sock *tp,
-				const struct inet_request_sock *ireq,
+				struct inet_request_sock *ireq,
 				struct tcp_out_options *opts,
 				unsigned int *remaining)
 {
 #if IS_ENABLED(CONFIG_SMC)
 	if (static_branch_unlikely(&tcp_have_smc)) {
 		if (tp->syn_smc && ireq->smc_ok) {
-			if (*remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
+			ireq->smc_ok = smc_call_retops(1, &tp->inet_conn.icsk_inet.sk,
+						       set_option_cond, tp, ireq);
+			/* re-check smc_ok */
+			if (ireq->smc_ok && *remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
 				opts->options |= OPTION_SMC;
 				*remaining -= TCPOLEN_EXP_SMC_BASE_ALIGNED;
 			}
diff --git a/net/smc/Kconfig b/net/smc/Kconfig
index ba5e6a2dd2fd..0ee16ec8dceb 100644
--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -33,3 +33,15 @@ config SMC_LO
 	  of architecture or hardware.
 
 	  if unsure, say N.
+
+config SMC_OPS
+	bool "Generic hook for SMC subsystem"
+	depends on SMC && BPF_SYSCALL
+	default n
+	help
+	  SMC_OPS enables support to register genericfor hook via eBPF programs
+	  for SMC subsystem. eBPF programs offer much greater flexibility
+	  in modifying the behavior of the SMC protocol stack compared
+	  to a complete kernel-based approach.
+
+	  if unsure, say N.
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 60f1c87d5212..5dd706b2927a 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -7,3 +7,4 @@ smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_sta
 smc-y += smc_tracepoint.o smc_inet.o
 smc-$(CONFIG_SYSCTL) += smc_sysctl.o
 smc-$(CONFIG_SMC_LO) += smc_loopback.o
+smc-$(CONFIG_SMC_OPS) += smc_ops.o
\ No newline at end of file
diff --git a/net/smc/smc_ops.c b/net/smc/smc_ops.c
new file mode 100644
index 000000000000..0fc19cadd760
--- /dev/null
+++ b/net/smc/smc_ops.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  Generic hook for SMC subsystem.
+ *
+ *  Copyright IBM Corp. 2016
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: D. Wythe <alibuda@linux.alibaba.com>
+ */
+
+#include "smc_ops.h"
+
+static DEFINE_SPINLOCK(smc_ops_list_lock);
+static LIST_HEAD(smc_ops_list);
+
+static int smc_ops_reg(struct smc_ops *ops)
+{
+	int ret = 0;
+
+	spin_lock(&smc_ops_list_lock);
+	/* already exist or duplicate name */
+	if (smc_ops_find_by_name(ops->name))
+		ret = -EEXIST;
+	else
+		list_add_tail_rcu(&ops->list, &smc_ops_list);
+	spin_unlock(&smc_ops_list_lock);
+	return ret;
+}
+
+static void smc_ops_unreg(struct smc_ops *ops)
+{
+	spin_lock(&smc_ops_list_lock);
+	list_del_rcu(&ops->list);
+	spin_unlock(&smc_ops_list_lock);
+
+	/* Ensure that all readers to complete */
+	synchronize_rcu();
+}
+
+struct smc_ops *smc_ops_find_by_name(const char *name)
+{
+	struct smc_ops *ops;
+
+	list_for_each_entry_rcu(ops, &smc_ops_list, list) {
+		if (strcmp(ops->name, name) == 0)
+			return ops;
+	}
+	return NULL;
+}
diff --git a/net/smc/smc_ops.h b/net/smc/smc_ops.h
new file mode 100644
index 000000000000..214f4c99efd4
--- /dev/null
+++ b/net/smc/smc_ops.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  Shared Memory Communications over RDMA (SMC-R) and RoCE
+ *
+ *  Generic hook for SMC subsystem.
+ *
+ *  Copyright IBM Corp. 2016
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: D. Wythe <alibuda@linux.alibaba.com>
+ */
+
+#ifndef __SMC_OPS
+#define __SMC_OPS
+
+#include <net/smc.h>
+
+#if IS_ENABLED(CONFIG_SMC_OPS)
+/* Find ops by the target name, which required to be a c-string.
+ * Return NULL if no such ops was found,otherwise, return a valid ops.
+ *
+ * Note: Caller MUST ensure it's was invoked under rcu_read_lock.
+ */
+struct smc_ops *smc_ops_find_by_name(const char *name);
+#else
+static inline struct smc_ops *smc_ops_find_by_name(const char *name) { return NULL; }
+#endif /* CONFIG_SMC_OPS*/
+
+#endif /* __SMC_OPS */
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 2fab6456f765..2aa3f19025f4 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -18,6 +18,7 @@
 #include "smc_core.h"
 #include "smc_llc.h"
 #include "smc_sysctl.h"
+#include "smc_ops.h"
 
 static int min_sndbuf = SMC_BUF_MIN_SIZE;
 static int min_rcvbuf = SMC_BUF_MIN_SIZE;
@@ -30,6 +31,70 @@ static int links_per_lgr_max = SMC_LINKS_ADD_LNK_MAX;
 static int conns_per_lgr_min = SMC_CONN_PER_LGR_MIN;
 static int conns_per_lgr_max = SMC_CONN_PER_LGR_MAX;
 
+#if IS_ENABLED(CONFIG_SMC_OPS)
+static int smc_net_replace_smc_ops(struct net *net, const char *name)
+{
+	struct smc_ops *ops = NULL;
+
+	rcu_read_lock();
+	/* null or empty name ask to clear current ops */
+	if (name && name[0]) {
+		ops = smc_ops_find_by_name(name);
+		if (!ops) {
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+		/* no change, just return */
+		if (ops == rcu_dereference(net->smc.ops)) {
+			rcu_read_unlock();
+			return 0;
+		}
+	}
+	if (!ops || bpf_try_module_get(ops, ops->owner)) {
+		/* xhcg */
+		ops = rcu_replace_pointer(net->smc.ops, ops, true);
+		/* release old ops */
+		if (ops)
+			bpf_module_put(ops, ops->owner);
+	} else if (ops) {
+		rcu_read_unlock();
+		return -EBUSY;
+	}
+	rcu_read_unlock();
+	return 0;
+}
+
+static int proc_smc_ops(const struct ctl_table *ctl, int write,
+			void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct net *net = container_of(ctl->data, struct net,
+				       smc.ops);
+	char val[SMC_OPS_NAME_MAX];
+	struct ctl_table tbl = {
+		.data = val,
+		.maxlen = SMC_OPS_NAME_MAX,
+	};
+	struct smc_ops *ops;
+	int ret;
+
+	rcu_read_lock();
+	ops = rcu_dereference(net->smc.ops);
+	if (ops)
+		memcpy(val, ops->name, sizeof(ops->name));
+	else
+		val[0] = '\0';
+	rcu_read_unlock();
+
+	ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (write)
+		ret = smc_net_replace_smc_ops(net, val);
+	return ret;
+}
+#endif /* CONFIG_SMC_OPS */
+
 static struct ctl_table smc_table[] = {
 	{
 		.procname       = "autocorking_size",
@@ -99,6 +164,15 @@ static struct ctl_table smc_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+#if IS_ENABLED(CONFIG_SMC_OPS)
+	{
+		.procname	= "ops",
+		.data		= &init_net.smc.ops,
+		.mode		= 0644,
+		.maxlen		= SMC_OPS_NAME_MAX,
+		.proc_handler	= proc_smc_ops,
+	},
+#endif /* CONFIG_SMC_OPS */
 };
 
 int __net_init smc_sysctl_net_init(struct net *net)
@@ -109,6 +183,20 @@ int __net_init smc_sysctl_net_init(struct net *net)
 	table = smc_table;
 	if (!net_eq(net, &init_net)) {
 		int i;
+#if IS_ENABLED(CONFIG_SMC_OPS)
+		struct smc_ops *ops;
+
+		rcu_read_lock();
+		ops = rcu_dereference(init_net.smc.ops);
+		if (ops && ops->flags & SMC_OPS_FLAG_INHERITABLE) {
+			if (!bpf_try_module_get(ops, ops->owner)) {
+				rcu_read_unlock();
+				return -EBUSY;
+			}
+			rcu_assign_pointer(net->smc.ops, ops);
+		}
+		rcu_read_unlock();
+#endif /* CONFIG_SMC_OPS */
 
 		table = kmemdup(table, sizeof(smc_table), GFP_KERNEL);
 		if (!table)
@@ -139,6 +227,9 @@ int __net_init smc_sysctl_net_init(struct net *net)
 	if (!net_eq(net, &init_net))
 		kfree(table);
 err_alloc:
+#if IS_ENABLED(CONFIG_SMC_OPS)
+	smc_net_replace_smc_ops(net, NULL);
+#endif /* CONFIG_SMC_OPS */
 	return -ENOMEM;
 }
 
@@ -148,6 +239,10 @@ void __net_exit smc_sysctl_net_exit(struct net *net)
 
 	table = net->smc.smc_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->smc.smc_hdr);
+#if IS_ENABLED(CONFIG_SMC_OPS)
+	smc_net_replace_smc_ops(net, NULL);
+#endif /* CONFIG_SMC_OPS */
+
 	if (!net_eq(net, &init_net))
 		kfree(table);
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 3/5] net/smc: bpf: register smc_ops info struct_ops
  2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 1/5] bpf: export necessary sympols for modules with struct_ops D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 2/5] net/smc: Introduce generic hook smc_ops D. Wythe
@ 2024-12-10  4:04 ` D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf D. Wythe
  2024-12-10  4:04 ` [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops D. Wythe
  4 siblings, 0 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:04 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

To implement injection capability for smc via struct_ops, so that
user can make their own smc_ops to modify the behavior of smc stack.

Currently, user can write their own implememtion to choose whether to
use SMC or not before TCP 3rd handshake to be comleted. In the future,
users can implement more complex functions on smc by expanding it.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/af_smc.c  | 10 +++++
 net/smc/smc_ops.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_ops.h |  2 +
 3 files changed, 111 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 9d76e902fd77..6adedae2986d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -55,6 +55,7 @@
 #include "smc_sysctl.h"
 #include "smc_loopback.h"
 #include "smc_inet.h"
+#include "smc_ops.h"
 
 static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
 						 * creation on server
@@ -3576,8 +3577,17 @@ static int __init smc_init(void)
 		pr_err("%s: smc_inet_init fails with %d\n", __func__, rc);
 		goto out_ulp;
 	}
+
+	rc = smc_bpf_struct_ops_init();
+	if (rc) {
+		pr_err("%s: smc_bpf_struct_ops_init fails with %d\n", __func__, rc);
+		goto out_inet;
+	}
+
 	static_branch_enable(&tcp_have_smc);
 	return 0;
+out_inet:
+	smc_inet_exit();
 out_ulp:
 	tcp_unregister_ulp(&smc_ulp_ops);
 out_lo:
diff --git a/net/smc/smc_ops.c b/net/smc/smc_ops.c
index 0fc19cadd760..0f07652f4837 100644
--- a/net/smc/smc_ops.c
+++ b/net/smc/smc_ops.c
@@ -10,6 +10,10 @@
  *  Author: D. Wythe <alibuda@linux.alibaba.com>
  */
 
+#include <linux/bpf_verifier.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+
 #include "smc_ops.h"
 
 static DEFINE_SPINLOCK(smc_ops_list_lock);
@@ -49,3 +53,98 @@ struct smc_ops *smc_ops_find_by_name(const char *name)
 	}
 	return NULL;
 }
+
+static int __bpf_smc_stub_set_tcp_option(struct tcp_sock *tp) { return 1; }
+static int __bpf_smc_stub_set_tcp_option_cond(const struct tcp_sock *tp,
+					      struct inet_request_sock *ireq)
+{
+	return 1;
+}
+
+static struct smc_ops __bpf_smc_bpf_ops = {
+	.set_option		= __bpf_smc_stub_set_tcp_option,
+	.set_option_cond	= __bpf_smc_stub_set_tcp_option_cond,
+};
+
+static int smc_bpf_ops_init(struct btf *btf) { return 0; }
+
+static int smc_bpf_ops_reg(void *kdata, struct bpf_link *link)
+{
+	return smc_ops_reg(kdata);
+}
+
+static void smc_bpf_ops_unreg(void *kdata, struct bpf_link *link)
+{
+	smc_ops_unreg(kdata);
+}
+
+static int smc_bpf_ops_init_member(const struct btf_type *t,
+				   const struct btf_member *member,
+				   void *kdata, const void *udata)
+{
+	const struct smc_ops *u_ops;
+	struct smc_ops *k_ops;
+	u32 moff;
+
+	u_ops = (const struct smc_ops *)udata;
+	k_ops = (struct smc_ops *)kdata;
+
+	moff = __btf_member_bit_offset(t, member) / 8;
+	switch (moff) {
+	case offsetof(struct smc_ops, name):
+		if (bpf_obj_name_cpy(k_ops->name, u_ops->name,
+				     sizeof(u_ops->name)) <= 0)
+			return -EINVAL;
+		return 1;
+	case offsetof(struct smc_ops, flags):
+		if (u_ops->flags & ~SMC_OPS_ALL_FLAGS)
+			return -EINVAL;
+		k_ops->flags = u_ops->flags;
+		return 1;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int smc_bpf_ops_check_member(const struct btf_type *t,
+				    const struct btf_member *member,
+				    const struct bpf_prog *prog)
+{
+	u32 moff = __btf_member_bit_offset(t, member) / 8;
+
+	switch (moff) {
+	case offsetof(struct smc_ops, name):
+	case offsetof(struct smc_ops, flags):
+	case offsetof(struct smc_ops, set_option):
+	case offsetof(struct smc_ops, set_option_cond):
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct bpf_verifier_ops smc_bpf_verifier_ops = {
+	.get_func_proto		= bpf_base_func_proto,
+	.is_valid_access	= bpf_tracing_btf_ctx_access,
+};
+
+static struct bpf_struct_ops bpf_smc_bpf_ops = {
+	.name		= "smc_ops",
+	.init		= smc_bpf_ops_init,
+	.reg		= smc_bpf_ops_reg,
+	.unreg		= smc_bpf_ops_unreg,
+	.cfi_stubs	= &__bpf_smc_bpf_ops,
+	.verifier_ops	= &smc_bpf_verifier_ops,
+	.init_member	= smc_bpf_ops_init_member,
+	.check_member	= smc_bpf_ops_check_member,
+	.owner		= THIS_MODULE,
+};
+
+int smc_bpf_struct_ops_init(void)
+{
+	return register_bpf_struct_ops(&bpf_smc_bpf_ops, smc_ops);
+}
diff --git a/net/smc/smc_ops.h b/net/smc/smc_ops.h
index 214f4c99efd4..f4e50eae13f6 100644
--- a/net/smc/smc_ops.h
+++ b/net/smc/smc_ops.h
@@ -22,8 +22,10 @@
  * Note: Caller MUST ensure it's was invoked under rcu_read_lock.
  */
 struct smc_ops *smc_ops_find_by_name(const char *name);
+int smc_bpf_struct_ops_init(void);
 #else
 static inline struct smc_ops *smc_ops_find_by_name(const char *name) { return NULL; }
+static inline int smc_bpf_struct_ops_init(void) { return 0; }
 #endif /* CONFIG_SMC_OPS*/
 
 #endif /* __SMC_OPS */
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf
  2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
                   ` (2 preceding siblings ...)
  2024-12-10  4:04 ` [PATCH bpf-next v2 3/5] net/smc: bpf: register smc_ops info struct_ops D. Wythe
@ 2024-12-10  4:04 ` D. Wythe
  2024-12-12 22:24   ` Andrii Nakryiko
  2024-12-10  4:04 ` [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops D. Wythe
  4 siblings, 1 reply; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:04 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

When a struct_ops named xxx_ops was registered by a module, and
it will be used in both built-in modules and the module itself,
so that the btf_type of xxx_ops will be present in btf_vmlinux
instead of in btf_mod, which means that the btf_type of
bpf_struct_ops_xxx_ops and xxx_ops will not be in the same btf.

Here are four possible case:

+--------+-------------+-------------+---------------------------------+
|        | st_opx_xxx  | xxx         |                                 |
+--------+-------------+-------------+---------------------------------+
| case 0 | btf_vmlinux | bft_vmlinux | be used and reg only in vmlinux |
+--------+-------------+-------------+---------------------------------+
| case 1 | btf_vmlinux | bpf_mod     | INVALID			       |
+--------+-------------+-------------+---------------------------------+
| case 2 | btf_mod     | btf_vmlinux | reg in mod but be used both in  |
|        |             |             | vmlinux and mod.		       |
+--------+-------------+-------------+---------------------------------+
| case 3 | btf_mod     | btf_mod     | be used and reg only in mod     |
+--------+-------------+-------------+---------------------------------+

At present, cases 0, 1, and 3 can be correctly identified, because
st_ops_xxx is searched from the same btf with xxx. In order to
handle case 2 correctly without affecting other cases, we cannot simply
change the search method for st_ops_xxx from find_btf_by_prefix_kind()
to find_ksym_btf_id(), because in this way, case 1 will not be
recognized anymore.

To address this issue, if st_ops_xxx cannot be found in the btf with xxx
and mod_btf does not exist, do find_ksym_btf_id() again to
avoid such issue.

Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 tools/lib/bpf/libbpf.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 66173ddb5a2d..046feab4ec36 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -994,6 +994,27 @@ static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
 				   const char *name, __u32 kind);
 
+static int
+find_ksym_btf_id_by_prefix_kind(struct bpf_object *obj, const char *prefix,
+				const char *name,
+				__u16 kind, struct btf **res_btf,
+				struct module_btf **res_mod_btf)
+{
+	char btf_type_name[128];
+	int ret;
+
+	ret = snprintf(btf_type_name, sizeof(btf_type_name),
+		       "%s%s", prefix, name);
+	/* snprintf returns the number of characters written excluding the
+	 * terminating null. So, if >= BTF_MAX_NAME_SIZE are written, it
+	 * indicates truncation.
+	 */
+	if (ret < 0 || ret >= sizeof(btf_type_name))
+		return -ENAMETOOLONG;
+
+	return find_ksym_btf_id(obj, btf_type_name, kind, res_btf, res_mod_btf);
+}
+
 static int
 find_struct_ops_kern_types(struct bpf_object *obj, const char *tname_raw,
 			   struct module_btf **mod_btf,
@@ -1028,9 +1049,16 @@ find_struct_ops_kern_types(struct bpf_object *obj, const char *tname_raw,
 	kern_vtype_id = find_btf_by_prefix_kind(btf, STRUCT_OPS_VALUE_PREFIX,
 						tname, BTF_KIND_STRUCT);
 	if (kern_vtype_id < 0) {
-		pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
-			STRUCT_OPS_VALUE_PREFIX, tname);
-		return kern_vtype_id;
+		if (kern_vtype_id == -ENOENT && !*mod_btf)
+			kern_vtype_id =
+				find_ksym_btf_id_by_prefix_kind(obj, STRUCT_OPS_VALUE_PREFIX,
+								tname, BTF_KIND_STRUCT, &btf,
+								mod_btf);
+		if (kern_vtype_id < 0) {
+			pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
+				STRUCT_OPS_VALUE_PREFIX, tname);
+			return kern_vtype_id;
+		}
 	}
 	kern_vtype = btf__type_by_id(btf, kern_vtype_id);
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops
  2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
                   ` (3 preceding siblings ...)
  2024-12-10  4:04 ` [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf D. Wythe
@ 2024-12-10  4:04 ` D. Wythe
  2024-12-10 18:01   ` Alexei Starovoitov
  4 siblings, 1 reply; 10+ messages in thread
From: D. Wythe @ 2024-12-10  4:04 UTC (permalink / raw)
  To: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen
  Cc: kuba, davem, netdev, linux-s390, linux-rdma, bpf

This PATCH adds a tiny selftest for bpf_smc_ops, to verify the ability
to load and attach.

Follow the steps below to run this test.

make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc

Results shows:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 tools/testing/selftests/bpf/config            |  3 +++
 .../selftests/bpf/prog_tests/test_bpf_smc.c   | 25 +++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_smc.c   | 27 +++++++++++++++++++
 3 files changed, 55 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_bpf_smc.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_smc.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index c378d5d07e02..99f1cf10475f 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -113,3 +113,6 @@ CONFIG_XDP_SOCKETS=y
 CONFIG_XFRM_INTERFACE=y
 CONFIG_TCP_CONG_DCTCP=y
 CONFIG_TCP_CONG_BBR=y
+CONFIG_INFINIBAND=m
+CONFIG_SMC=m
+CONFIG_SMC_OPS=y
\ No newline at end of file
diff --git a/tools/testing/selftests/bpf/prog_tests/test_bpf_smc.c b/tools/testing/selftests/bpf/prog_tests/test_bpf_smc.c
new file mode 100644
index 000000000000..b24da7e8db66
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_bpf_smc.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "bpf_smc.skel.h"
+
+static void load(void)
+{
+	struct bpf_smc *skel;
+	int ret;
+
+	skel = bpf_smc__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_smc__open_and_load"))
+		return;
+
+	ret = bpf_smc__attach(skel);
+	ASSERT_EQ(ret, 0, "bpf_smc__attach");
+
+	bpf_smc__destroy(skel);
+}
+
+void test_bpf_smc(void)
+{
+	if (test__start_subtest("load"))
+		load();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_smc.c b/tools/testing/selftests/bpf/progs/bpf_smc.c
new file mode 100644
index 000000000000..32d15596f209
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_smc.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/bpf_smc_set_tcp_option_cond")
+int BPF_PROG(bpf_smc_set_tcp_option_cond, const struct tcp_sock *tp, struct inet_request_sock *ireq)
+{
+	return 0;
+}
+
+SEC("struct_ops/bpf_smc_set_tcp_option")
+int BPF_PROG(bpf_smc_set_tcp_option, struct tcp_sock *tp)
+{
+	return 1;
+}
+
+SEC(".struct_ops.link")
+struct smc_ops  sample_smc_ops = {
+	.name			= "sample",
+	.set_option		= (void *) bpf_smc_set_tcp_option,
+	.set_option_cond	= (void *) bpf_smc_set_tcp_option_cond,
+};
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops
  2024-12-10  4:04 ` [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops D. Wythe
@ 2024-12-10 18:01   ` Alexei Starovoitov
  2024-12-11  5:17     ` D. Wythe
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2024-12-10 18:01 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, wenjia, jaka, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni, Song Liu,
	Stanislav Fomichev, Hao Luo, Yonghong Song, Eric Dumazet,
	John Fastabend, KP Singh, Jiri Olsa, guwen, Jakub Kicinski,
	David S. Miller, Network Development, linux-s390, linux-rdma, bpf

On Mon, Dec 9, 2024 at 8:04 PM D. Wythe <alibuda@linux.alibaba.com> wrote:
>
> +SEC("struct_ops/bpf_smc_set_tcp_option_cond")
> +int BPF_PROG(bpf_smc_set_tcp_option_cond, const struct tcp_sock *tp, struct inet_request_sock *ireq)
> +{
> +       return 0;
> +}
> +
> +SEC("struct_ops/bpf_smc_set_tcp_option")
> +int BPF_PROG(bpf_smc_set_tcp_option, struct tcp_sock *tp)
> +{
> +       return 1;
> +}
> +
> +SEC(".struct_ops.link")
> +struct smc_ops  sample_smc_ops = {
> +       .name                   = "sample",
> +       .set_option             = (void *) bpf_smc_set_tcp_option,
> +       .set_option_cond        = (void *) bpf_smc_set_tcp_option_cond,
> +};

These stubs don't inspire confidence that smc_ops api
will be sufficient.
Please implement a real bpf prog that demonstrates the actual use case.

See how bpf_cubic was done. On the day one it was implemented
as a parity to builtin cubic cong control.
And over years we didn't need to touch tcp_congestion_ops.
To be fair that api was already solid due to in-kernel cc modules,
but bpf comes with its own limitations, so it wasn't a guarantee
that tcp_congestion_ops would be enough.
Here you're proposing a brand new smc_ops api while bpf progs
are nothing but stubs. That's not sufficient to prove that api
is viable long term.

In terms of look and feel the smc_ops look ok.
The change from v1 to v2 was a good step.

pw-bot: cr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops
  2024-12-10 18:01   ` Alexei Starovoitov
@ 2024-12-11  5:17     ` D. Wythe
  0 siblings, 0 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-11  5:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: D. Wythe, kgraul, wenjia, jaka, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Paolo Abeni,
	Song Liu, Stanislav Fomichev, Hao Luo, Yonghong Song,
	Eric Dumazet, John Fastabend, KP Singh, Jiri Olsa, guwen,
	Jakub Kicinski, David S. Miller, Network Development, linux-s390,
	linux-rdma, bpf

On Tue, Dec 10, 2024 at 10:01:38AM -0800, Alexei Starovoitov wrote:
> On Mon, Dec 9, 2024 at 8:04 PM D. Wythe <alibuda@linux.alibaba.com> wrote:
> >
> > +SEC("struct_ops/bpf_smc_set_tcp_option_cond")
> > +int BPF_PROG(bpf_smc_set_tcp_option_cond, const struct tcp_sock *tp, struct inet_request_sock *ireq)
> > +{
> > +       return 0;
> > +}
> > +
> > +SEC("struct_ops/bpf_smc_set_tcp_option")
> > +int BPF_PROG(bpf_smc_set_tcp_option, struct tcp_sock *tp)
> > +{
> > +       return 1;
> > +}
> > +
> > +SEC(".struct_ops.link")
> > +struct smc_ops  sample_smc_ops = {
> > +       .name                   = "sample",
> > +       .set_option             = (void *) bpf_smc_set_tcp_option,
> > +       .set_option_cond        = (void *) bpf_smc_set_tcp_option_cond,
> > +};
> 
> These stubs don't inspire confidence that smc_ops api
> will be sufficient.
> Please implement a real bpf prog that demonstrates the actual use case.
> 
> See how bpf_cubic was done. On the day one it was implemented
> as a parity to builtin cubic cong control.
> And over years we didn't need to touch tcp_congestion_ops.
> To be fair that api was already solid due to in-kernel cc modules,
> but bpf comes with its own limitations, so it wasn't a guarantee
> that tcp_congestion_ops would be enough.
> Here you're proposing a brand new smc_ops api while bpf progs
> are nothing but stubs. That's not sufficient to prove that api
> is viable long term.

Hi Alexei,

Thanks a lot for your advices. I will add actual cases in the
next version to prove why we need it.

> 
> In terms of look and feel the smc_ops look ok.
> The change from v1 to v2 was a good step.

I'm glad that you feel it looks okay. If you have any questions,
please let me know.

Thanks,
D. Wythe

> 
> pw-bot: cr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf
  2024-12-10  4:04 ` [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf D. Wythe
@ 2024-12-12 22:24   ` Andrii Nakryiko
  2024-12-16  2:14     ` D. Wythe
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2024-12-12 22:24 UTC (permalink / raw)
  To: D. Wythe
  Cc: kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau, pabeni,
	song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh, jolsa,
	guwen, kuba, davem, netdev, linux-s390, linux-rdma, bpf

On Mon, Dec 9, 2024 at 8:04 PM D. Wythe <alibuda@linux.alibaba.com> wrote:
>
> When a struct_ops named xxx_ops was registered by a module, and
> it will be used in both built-in modules and the module itself,
> so that the btf_type of xxx_ops will be present in btf_vmlinux
> instead of in btf_mod, which means that the btf_type of
> bpf_struct_ops_xxx_ops and xxx_ops will not be in the same btf.
>
> Here are four possible case:
>
> +--------+-------------+-------------+---------------------------------+
> |        | st_opx_xxx  | xxx         |                                 |
> +--------+-------------+-------------+---------------------------------+
> | case 0 | btf_vmlinux | bft_vmlinux | be used and reg only in vmlinux |
> +--------+-------------+-------------+---------------------------------+
> | case 1 | btf_vmlinux | bpf_mod     | INVALID                         |
> +--------+-------------+-------------+---------------------------------+
> | case 2 | btf_mod     | btf_vmlinux | reg in mod but be used both in  |
> |        |             |             | vmlinux and mod.                |
> +--------+-------------+-------------+---------------------------------+
> | case 3 | btf_mod     | btf_mod     | be used and reg only in mod     |
> +--------+-------------+-------------+---------------------------------+
>
> At present, cases 0, 1, and 3 can be correctly identified, because
> st_ops_xxx is searched from the same btf with xxx. In order to
> handle case 2 correctly without affecting other cases, we cannot simply
> change the search method for st_ops_xxx from find_btf_by_prefix_kind()
> to find_ksym_btf_id(), because in this way, case 1 will not be
> recognized anymore.
>
> To address this issue, if st_ops_xxx cannot be found in the btf with xxx
> and mod_btf does not exist, do find_ksym_btf_id() again to
> avoid such issue.
>
> Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>  tools/lib/bpf/libbpf.c | 34 +++++++++++++++++++++++++++++++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 66173ddb5a2d..046feab4ec36 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -994,6 +994,27 @@ static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
>  static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
>                                    const char *name, __u32 kind);
>
> +static int
> +find_ksym_btf_id_by_prefix_kind(struct bpf_object *obj, const char *prefix,
> +                               const char *name,
> +                               __u16 kind, struct btf **res_btf,
> +                               struct module_btf **res_mod_btf)
> +{
> +       char btf_type_name[128];
> +       int ret;
> +
> +       ret = snprintf(btf_type_name, sizeof(btf_type_name),
> +                      "%s%s", prefix, name);
> +       /* snprintf returns the number of characters written excluding the
> +        * terminating null. So, if >= BTF_MAX_NAME_SIZE are written, it
> +        * indicates truncation.
> +        */
> +       if (ret < 0 || ret >= sizeof(btf_type_name))
> +               return -ENAMETOOLONG;
> +
> +       return find_ksym_btf_id(obj, btf_type_name, kind, res_btf, res_mod_btf);
> +}

I don't think we need this helper, see below.

> +
>  static int
>  find_struct_ops_kern_types(struct bpf_object *obj, const char *tname_raw,
>                            struct module_btf **mod_btf,
> @@ -1028,9 +1049,16 @@ find_struct_ops_kern_types(struct bpf_object *obj, const char *tname_raw,
>         kern_vtype_id = find_btf_by_prefix_kind(btf, STRUCT_OPS_VALUE_PREFIX,
>                                                 tname, BTF_KIND_STRUCT);

instead of using find_btf_by_prefix_kind, let's have:

1) snprintf(STRUCT_OPS_VALUE_PREFIX, tname) right here in this
function, so we have expected type constructed and ready to be used
and reused, if necessary
2) call btf__find_by_name_kind() instead of find_btf_by_prefix_kind()
3) if (kern_vtype_id < 0 && !*mod_btf)
      kern_vtype_id = find_ksym_btf_id(...)
4) if (kern_vtype_id < 0) /* now emit error and error out */

>         if (kern_vtype_id < 0) {
> -               pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
> -                       STRUCT_OPS_VALUE_PREFIX, tname);
> -               return kern_vtype_id;
> +               if (kern_vtype_id == -ENOENT && !*mod_btf)
> +                       kern_vtype_id =
> +                               find_ksym_btf_id_by_prefix_kind(obj, STRUCT_OPS_VALUE_PREFIX,
> +                                                               tname, BTF_KIND_STRUCT, &btf,
> +                                                               mod_btf);
> +               if (kern_vtype_id < 0) {
> +                       pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
> +                               STRUCT_OPS_VALUE_PREFIX, tname);
> +                       return kern_vtype_id;
> +               }
>         }
>         kern_vtype = btf__type_by_id(btf, kern_vtype_id);
>
> --
> 2.45.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf
  2024-12-12 22:24   ` Andrii Nakryiko
@ 2024-12-16  2:14     ` D. Wythe
  0 siblings, 0 replies; 10+ messages in thread
From: D. Wythe @ 2024-12-16  2:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: D. Wythe, kgraul, wenjia, jaka, ast, daniel, andrii, martin.lau,
	pabeni, song, sdf, haoluo, yhs, edumazet, john.fastabend, kpsingh,
	jolsa, guwen, kuba, davem, netdev, linux-s390, linux-rdma, bpf

On Thu, Dec 12, 2024 at 02:24:46PM -0800, Andrii Nakryiko wrote:
> On Mon, Dec 9, 2024 at 8:04 PM D. Wythe <alibuda@linux.alibaba.com> wrote:
> >
> > When a struct_ops named xxx_ops was registered by a module, and
> > it will be used in both built-in modules and the module itself,
> > so that the btf_type of xxx_ops will be present in btf_vmlinux
> 
> instead of using find_btf_by_prefix_kind, let's have:
> 
> 1) snprintf(STRUCT_OPS_VALUE_PREFIX, tname) right here in this
> function, so we have expected type constructed and ready to be used
> and reused, if necessary
> 2) call btf__find_by_name_kind() instead of find_btf_by_prefix_kind()
> 3) if (kern_vtype_id < 0 && !*mod_btf)
>       kern_vtype_id = find_ksym_btf_id(...)
> 4) if (kern_vtype_id < 0) /* now emit error and error out */

Got it. This looks more concise, I will modify it in the next version in
this way.

Thanks,
D. Wythe

> 
> >         if (kern_vtype_id < 0) {
> > -               pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
> > -                       STRUCT_OPS_VALUE_PREFIX, tname);
> > -               return kern_vtype_id;
> > +               if (kern_vtype_id == -ENOENT && !*mod_btf)
> > +                       kern_vtype_id =
> > +                               find_ksym_btf_id_by_prefix_kind(obj, STRUCT_OPS_VALUE_PREFIX,
> > +                                                               tname, BTF_KIND_STRUCT, &btf,
> > +                                                               mod_btf);
> > +               if (kern_vtype_id < 0) {
> > +                       pr_warn("struct_ops init_kern: struct %s%s is not found in kernel BTF\n",
> > +                               STRUCT_OPS_VALUE_PREFIX, tname);
> > +                       return kern_vtype_id;
> > +               }
> >         }
> >         kern_vtype = btf__type_by_id(btf, kern_vtype_id);
> >
> > --
> > 2.45.0
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-12-16  2:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-10  4:03 [PATCH bpf-next v2 0/5] net/smc: Introduce smc_ops D. Wythe
2024-12-10  4:04 ` [PATCH bpf-next v2 1/5] bpf: export necessary sympols for modules with struct_ops D. Wythe
2024-12-10  4:04 ` [PATCH bpf-next v2 2/5] net/smc: Introduce generic hook smc_ops D. Wythe
2024-12-10  4:04 ` [PATCH bpf-next v2 3/5] net/smc: bpf: register smc_ops info struct_ops D. Wythe
2024-12-10  4:04 ` [PATCH bpf-next v2 4/5] libbpf: fix error when st-prefix_ops and ops from differ btf D. Wythe
2024-12-12 22:24   ` Andrii Nakryiko
2024-12-16  2:14     ` D. Wythe
2024-12-10  4:04 ` [PATCH bpf-next v2 5/5] bpf/selftests: add simple selftest for bpf_smc_ops D. Wythe
2024-12-10 18:01   ` Alexei Starovoitov
2024-12-11  5:17     ` D. Wythe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).