BPF List
* [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
@ 2022-08-16 20:12 Stanislav Fomichev
  2022-08-16 20:12 ` [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto Stanislav Fomichev
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-16 20:12 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa

Apparently, only a small subset of cgroup hooks actually falls
back to cgroup_base_func_proto. This leads to the unexpected result
that not all cgroup hooks expose bpf_{g,s}et_retval.

It's getting harder and harder to manage which helpers are exported
to which hooks. We now have the following call chains:

- cg_skb_func_proto
  - sk_filter_func_proto
    - bpf_sk_base_func_proto
      - bpf_base_func_proto

So just by looking at cg_skb_func_proto it's hard to tell which
helpers are actually exposed.

For cgroup helpers, I'm proposing we do the following instead:

  func_proto = cgroup_common_func_proto();
  if (func_proto) return func_proto;

  /* optional, if hook has 'current' */
  func_proto = cgroup_current_func_proto();
  if (func_proto) return func_proto;

  ...

  switch (func_id) {
  /* hook specific helpers */
  case BPF_FUNC_hook_specific_helper:
    return &xyz;
  default:
    /* always fall back to plain bpf_base_func_proto */
    return bpf_base_func_proto(func_id);
  }

If this approach proves workable, we can follow up and convert
the rest to the same pattern.

v2:
- move everything into kernel/bpf/cgroup.c instead (Martin)
- use cgroup_common_func_proto in lsm (Martin)

Stanislav Fomichev (3):
  bpf: Introduce cgroup_{common,current}_func_proto
  bpf: Use cgroup_{common,current}_func_proto in more hooks
  selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhere

 include/linux/bpf.h                           |  18 +-
 kernel/bpf/bpf_lsm.c                          |  19 +-
 kernel/bpf/cgroup.c                           | 333 +++++++++++++++++-
 kernel/bpf/helpers.c                          | 205 +----------
 net/core/filter.c                             |  92 ++---
 tools/testing/selftests/bpf/Makefile          |   1 +
 .../bpf/cgroup_getset_retval_hooks.h          |  25 ++
 .../bpf/prog_tests/cgroup_getset_retval.c     |  48 +++
 .../bpf/progs/cgroup_getset_retval_hooks.c    |  16 +
 9 files changed, 466 insertions(+), 291 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/cgroup_getset_retval_hooks.h
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_getset_retval_hooks.c

-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto
  2022-08-16 20:12 [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Stanislav Fomichev
@ 2022-08-16 20:12 ` Stanislav Fomichev
  2022-08-17 19:22   ` Martin KaFai Lau
  2022-08-16 20:12 ` [PATCH bpf-next v2 2/3] bpf: Use cgroup_{common,current}_func_proto in more hooks Stanislav Fomichev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-16 20:12 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa

Split cgroup_base_func_proto into the following:

* cgroup_common_func_proto - common helpers for all cgroup hooks
* cgroup_current_func_proto - common helpers for all cgroup hooks
  running in the process context (== have meaningful 'current').

Move bpf_{g,s}et_retval and other cgroup-related helpers into
kernel/bpf/cgroup.c so they are closer to where they are used.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h  |  18 ++-
 kernel/bpf/cgroup.c  | 301 ++++++++++++++++++++++++++++++++++++++++---
 kernel/bpf/helpers.c | 204 -----------------------------
 net/core/filter.c    |  14 +-
 4 files changed, 301 insertions(+), 236 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a627a02cf8ab..c302d2de073a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1948,6 +1948,10 @@ struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *
+cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
+const struct bpf_func_proto *
+cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
@@ -2154,6 +2158,18 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	return NULL;
 }
 
+static inline const struct bpf_func_proto *
+cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return NULL;
+}
+
+static inline const struct bpf_func_proto *
+cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return NULL;
+}
+
 static inline void bpf_task_storage_free(struct task_struct *task)
 {
 }
@@ -2369,8 +2385,6 @@ extern const struct bpf_func_proto bpf_sk_redirect_map_proto;
 extern const struct bpf_func_proto bpf_spin_lock_proto;
 extern const struct bpf_func_proto bpf_spin_unlock_proto;
 extern const struct bpf_func_proto bpf_get_local_storage_proto;
-extern const struct bpf_func_proto bpf_strtol_proto;
-extern const struct bpf_func_proto bpf_strtoul_proto;
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_jiffies64_proto;
 extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 59b7eb60d5b4..8ead5df3fafb 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -18,7 +18,9 @@
 #include <linux/bpf_verifier.h>
 #include <net/sock.h>
 #include <net/bpf_sk_storage.h>
+#include <net/cls_cgroup.h>
 
+#include "../lib/kstrtox.h"
 #include "../cgroup/cgroup-internal.h"
 
 DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
@@ -1527,6 +1529,78 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 	return ret;
 }
 
+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+	struct cgroup *cgrp;
+	u64 cgrp_id;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	cgrp_id = cgroup_id(cgrp);
+	rcu_read_unlock();
+
+	return cgrp_id;
+}
+
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+	.func		= bpf_get_current_cgroup_id,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+
+BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
+{
+	struct cgroup *cgrp;
+	struct cgroup *ancestor;
+	u64 cgrp_id;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	ancestor = cgroup_ancestor(cgrp, ancestor_level);
+	cgrp_id = ancestor ? cgroup_id(ancestor) : 0;
+	rcu_read_unlock();
+
+	return cgrp_id;
+}
+
+const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = {
+	.func		= bpf_get_current_ancestor_cgroup_id,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
+{
+	/* flags argument is not used now,
+	 * but provides an ability to extend the API.
+	 * verifier checks that its value is correct.
+	 */
+	enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
+	struct bpf_cgroup_storage *storage;
+	struct bpf_cg_run_ctx *ctx;
+	void *ptr;
+
+	/* get current cgroup storage from BPF run context */
+	ctx = container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
+	storage = ctx->prog_item->cgroup_storage[stype];
+
+	if (stype == BPF_CGROUP_STORAGE_SHARED)
+		ptr = &READ_ONCE(storage->buf)->data[0];
+	else
+		ptr = this_cpu_ptr(storage->percpu_buf);
+
+	return (unsigned long)ptr;
+}
+
+const struct bpf_func_proto bpf_get_local_storage_proto = {
+	.func		= bpf_get_local_storage,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_0(bpf_get_retval)
 {
 	struct bpf_cg_run_ctx *ctx =
@@ -1557,33 +1631,168 @@ const struct bpf_func_proto bpf_set_retval_proto = {
 	.arg1_type	= ARG_ANYTHING,
 };
 
+#define BPF_STRTOX_BASE_MASK 0x1F
+
+static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
+			  unsigned long long *res, bool *is_negative)
+{
+	unsigned int base = flags & BPF_STRTOX_BASE_MASK;
+	const char *cur_buf = buf;
+	size_t cur_len = buf_len;
+	unsigned int consumed;
+	size_t val_len;
+	char str[64];
+
+	if (!buf || !buf_len || !res || !is_negative)
+		return -EINVAL;
+
+	if (base != 0 && base != 8 && base != 10 && base != 16)
+		return -EINVAL;
+
+	if (flags & ~BPF_STRTOX_BASE_MASK)
+		return -EINVAL;
+
+	while (cur_buf < buf + buf_len && isspace(*cur_buf))
+		++cur_buf;
+
+	*is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
+	if (*is_negative)
+		++cur_buf;
+
+	consumed = cur_buf - buf;
+	cur_len -= consumed;
+	if (!cur_len)
+		return -EINVAL;
+
+	cur_len = min(cur_len, sizeof(str) - 1);
+	memcpy(str, cur_buf, cur_len);
+	str[cur_len] = '\0';
+	cur_buf = str;
+
+	cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
+	val_len = _parse_integer(cur_buf, base, res);
+
+	if (val_len & KSTRTOX_OVERFLOW)
+		return -ERANGE;
+
+	if (val_len == 0)
+		return -EINVAL;
+
+	cur_buf += val_len;
+	consumed += cur_buf - str;
+
+	return consumed;
+}
+
+static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
+			 long long *res)
+{
+	unsigned long long _res;
+	bool is_negative;
+	int err;
+
+	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
+	if (err < 0)
+		return err;
+	if (is_negative) {
+		if ((long long)-_res > 0)
+			return -ERANGE;
+		*res = -_res;
+	} else {
+		if ((long long)_res < 0)
+			return -ERANGE;
+		*res = _res;
+	}
+	return err;
+}
+
+BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
+	   long *, res)
+{
+	long long _res;
+	int err;
+
+	err = __bpf_strtoll(buf, buf_len, flags, &_res);
+	if (err < 0)
+		return err;
+	if (_res != (long)_res)
+		return -ERANGE;
+	*res = _res;
+	return err;
+}
+
+static const struct bpf_func_proto bpf_strtol_proto = {
+	.func		= bpf_strtol,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_LONG,
+};
+
+BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
+	   unsigned long *, res)
+{
+	unsigned long long _res;
+	bool is_negative;
+	int err;
+
+	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
+	if (err < 0)
+		return err;
+	if (is_negative)
+		return -EINVAL;
+	if (_res != (unsigned long)_res)
+		return -ERANGE;
+	*res = _res;
+	return err;
+}
+
+static const struct bpf_func_proto bpf_strtoul_proto = {
+	.func		= bpf_strtoul,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg2_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_LONG,
+};
+
+#ifdef CONFIG_CGROUP_NET_CLASSID
+BPF_CALL_0(bpf_get_cgroup_classid_curr)
+{
+	return __task_get_classid(current);
+}
+
+const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto = {
+	.func		= bpf_get_cgroup_classid_curr,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+#endif
+
 static const struct bpf_func_proto *
-cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
+	func_proto = cgroup_current_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
-	case BPF_FUNC_get_current_uid_gid:
-		return &bpf_get_current_uid_gid_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
-	case BPF_FUNC_get_current_cgroup_id:
-		return &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
-	case BPF_FUNC_get_retval:
-		return &bpf_get_retval_proto;
-	case BPF_FUNC_set_retval:
-		return &bpf_set_retval_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
 }
 
-static const struct bpf_func_proto *
-cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-{
-	return cgroup_base_func_proto(func_id, prog);
-}
-
 static bool cgroup_dev_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
 				       const struct bpf_prog *prog,
@@ -2096,6 +2305,16 @@ static const struct bpf_func_proto bpf_sysctl_set_new_value_proto = {
 static const struct bpf_func_proto *
 sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
+	func_proto = cgroup_current_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
 	case BPF_FUNC_strtol:
 		return &bpf_strtol_proto;
@@ -2111,8 +2330,10 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sysctl_set_new_value_proto;
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
+	case BPF_FUNC_perf_event_output:
+		return &bpf_event_output_data_proto;
 	default:
-		return cgroup_base_func_proto(func_id, prog);
+		return bpf_base_func_proto(func_id);
 	}
 }
 
@@ -2233,6 +2454,16 @@ static const struct bpf_func_proto bpf_get_netns_cookie_sockopt_proto = {
 static const struct bpf_func_proto *
 cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
+	func_proto = cgroup_current_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
 #ifdef CONFIG_NET
 	case BPF_FUNC_get_netns_cookie:
@@ -2254,8 +2485,10 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_tcp_sock:
 		return &bpf_tcp_sock_proto;
 #endif
+	case BPF_FUNC_perf_event_output:
+		return &bpf_event_output_data_proto;
 	default:
-		return cgroup_base_func_proto(func_id, prog);
+		return bpf_base_func_proto(func_id);
 	}
 }
 
@@ -2420,3 +2653,33 @@ const struct bpf_verifier_ops cg_sockopt_verifier_ops = {
 
 const struct bpf_prog_ops cg_sockopt_prog_ops = {
 };
+
+/* Common helpers for cgroup hooks. */
+const struct bpf_func_proto *
+cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_get_local_storage:
+		return &bpf_get_local_storage_proto;
+	case BPF_FUNC_get_retval:
+		return &bpf_get_retval_proto;
+	case BPF_FUNC_set_retval:
+		return &bpf_set_retval_proto;
+	default:
+		return NULL;
+	}
+}
+
+/* Common helpers for cgroup hooks with valid process context. */
+const struct bpf_func_proto *
+cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+	case BPF_FUNC_get_current_cgroup_id:
+		return &bpf_get_current_cgroup_id_proto;
+	default:
+		return NULL;
+	}
+}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3c1b9bbcf971..e20a4657cb55 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -386,210 +386,6 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
 	.ret_type	= RET_INTEGER,
 };
 
-#ifdef CONFIG_CGROUPS
-BPF_CALL_0(bpf_get_current_cgroup_id)
-{
-	struct cgroup *cgrp;
-	u64 cgrp_id;
-
-	rcu_read_lock();
-	cgrp = task_dfl_cgroup(current);
-	cgrp_id = cgroup_id(cgrp);
-	rcu_read_unlock();
-
-	return cgrp_id;
-}
-
-const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
-	.func		= bpf_get_current_cgroup_id,
-	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
-};
-
-BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
-{
-	struct cgroup *cgrp;
-	struct cgroup *ancestor;
-	u64 cgrp_id;
-
-	rcu_read_lock();
-	cgrp = task_dfl_cgroup(current);
-	ancestor = cgroup_ancestor(cgrp, ancestor_level);
-	cgrp_id = ancestor ? cgroup_id(ancestor) : 0;
-	rcu_read_unlock();
-
-	return cgrp_id;
-}
-
-const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = {
-	.func		= bpf_get_current_ancestor_cgroup_id,
-	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_ANYTHING,
-};
-
-#ifdef CONFIG_CGROUP_BPF
-
-BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
-{
-	/* flags argument is not used now,
-	 * but provides an ability to extend the API.
-	 * verifier checks that its value is correct.
-	 */
-	enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
-	struct bpf_cgroup_storage *storage;
-	struct bpf_cg_run_ctx *ctx;
-	void *ptr;
-
-	/* get current cgroup storage from BPF run context */
-	ctx = container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
-	storage = ctx->prog_item->cgroup_storage[stype];
-
-	if (stype == BPF_CGROUP_STORAGE_SHARED)
-		ptr = &READ_ONCE(storage->buf)->data[0];
-	else
-		ptr = this_cpu_ptr(storage->percpu_buf);
-
-	return (unsigned long)ptr;
-}
-
-const struct bpf_func_proto bpf_get_local_storage_proto = {
-	.func		= bpf_get_local_storage,
-	.gpl_only	= false,
-	.ret_type	= RET_PTR_TO_MAP_VALUE,
-	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_ANYTHING,
-};
-#endif
-
-#define BPF_STRTOX_BASE_MASK 0x1F
-
-static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
-			  unsigned long long *res, bool *is_negative)
-{
-	unsigned int base = flags & BPF_STRTOX_BASE_MASK;
-	const char *cur_buf = buf;
-	size_t cur_len = buf_len;
-	unsigned int consumed;
-	size_t val_len;
-	char str[64];
-
-	if (!buf || !buf_len || !res || !is_negative)
-		return -EINVAL;
-
-	if (base != 0 && base != 8 && base != 10 && base != 16)
-		return -EINVAL;
-
-	if (flags & ~BPF_STRTOX_BASE_MASK)
-		return -EINVAL;
-
-	while (cur_buf < buf + buf_len && isspace(*cur_buf))
-		++cur_buf;
-
-	*is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
-	if (*is_negative)
-		++cur_buf;
-
-	consumed = cur_buf - buf;
-	cur_len -= consumed;
-	if (!cur_len)
-		return -EINVAL;
-
-	cur_len = min(cur_len, sizeof(str) - 1);
-	memcpy(str, cur_buf, cur_len);
-	str[cur_len] = '\0';
-	cur_buf = str;
-
-	cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
-	val_len = _parse_integer(cur_buf, base, res);
-
-	if (val_len & KSTRTOX_OVERFLOW)
-		return -ERANGE;
-
-	if (val_len == 0)
-		return -EINVAL;
-
-	cur_buf += val_len;
-	consumed += cur_buf - str;
-
-	return consumed;
-}
-
-static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
-			 long long *res)
-{
-	unsigned long long _res;
-	bool is_negative;
-	int err;
-
-	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
-	if (err < 0)
-		return err;
-	if (is_negative) {
-		if ((long long)-_res > 0)
-			return -ERANGE;
-		*res = -_res;
-	} else {
-		if ((long long)_res < 0)
-			return -ERANGE;
-		*res = _res;
-	}
-	return err;
-}
-
-BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
-	   long *, res)
-{
-	long long _res;
-	int err;
-
-	err = __bpf_strtoll(buf, buf_len, flags, &_res);
-	if (err < 0)
-		return err;
-	if (_res != (long)_res)
-		return -ERANGE;
-	*res = _res;
-	return err;
-}
-
-const struct bpf_func_proto bpf_strtol_proto = {
-	.func		= bpf_strtol,
-	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
-	.arg2_type	= ARG_CONST_SIZE,
-	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_LONG,
-};
-
-BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
-	   unsigned long *, res)
-{
-	unsigned long long _res;
-	bool is_negative;
-	int err;
-
-	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
-	if (err < 0)
-		return err;
-	if (is_negative)
-		return -EINVAL;
-	if (_res != (unsigned long)_res)
-		return -ERANGE;
-	*res = _res;
-	return err;
-}
-
-const struct bpf_func_proto bpf_strtoul_proto = {
-	.func		= bpf_strtoul,
-	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
-	.arg2_type	= ARG_CONST_SIZE,
-	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_LONG,
-};
-#endif
 
 BPF_CALL_3(bpf_strncmp, const char *, s1, u32, s1_sz, const char *, s2)
 {
diff --git a/net/core/filter.c b/net/core/filter.c
index 5669248aff25..737bef7ff831 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3004,17 +3004,6 @@ static const struct bpf_func_proto bpf_msg_pop_data_proto = {
 };
 
 #ifdef CONFIG_CGROUP_NET_CLASSID
-BPF_CALL_0(bpf_get_cgroup_classid_curr)
-{
-	return __task_get_classid(current);
-}
-
-static const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto = {
-	.func		= bpf_get_cgroup_classid_curr,
-	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
-};
-
 BPF_CALL_1(bpf_skb_cgroup_classid, const struct sk_buff *, skb)
 {
 	struct sock *sk = skb_to_full_sk(skb);
@@ -8101,6 +8090,9 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 
 const struct bpf_func_proto bpf_msg_redirect_map_proto __weak;
 const struct bpf_func_proto bpf_msg_redirect_hash_proto __weak;
+const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto __weak;
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
+const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto __weak;
 
 static const struct bpf_func_proto *
 sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-- 
2.37.1.595.g718a3a8f04-goog



* [PATCH bpf-next v2 2/3] bpf: Use cgroup_{common,current}_func_proto in more hooks
  2022-08-16 20:12 [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Stanislav Fomichev
  2022-08-16 20:12 ` [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto Stanislav Fomichev
@ 2022-08-16 20:12 ` Stanislav Fomichev
  2022-08-16 20:12 ` [PATCH bpf-next v2 3/3] selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhere Stanislav Fomichev
  2022-08-17 19:07 ` [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Martin KaFai Lau
  3 siblings, 0 replies; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-16 20:12 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa

The following hooks are per-cgroup hooks but don't use
cgroup_{common,current}_func_proto; convert them:

* BPF_PROG_TYPE_CGROUP_SKB (cg_skb)
* BPF_PROG_TYPE_CGROUP_SOCK_ADDR (cg_sock_addr)
* BPF_PROG_TYPE_CGROUP_SOCK (cg_sock)
* BPF_PROG_TYPE_LSM+BPF_LSM_CGROUP

Also:

* move common func_proto's into cgroup func_proto handlers
* make sure bpf_{g,s}et_retval are not accessible from recvmsg,
  getpeername and getsockname (return/errno is ignored in these
  places)
* as a side effect, expose get_current_pid_tgid, get_current_comm,
  get_current_ancestor_cgroup_id and get_cgroup_classid to more cgroup
  hooks

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/bpf_lsm.c | 19 ++++++-----
 kernel/bpf/cgroup.c  | 36 ++++++++++++++++++--
 kernel/bpf/helpers.c |  1 +
 net/core/filter.c    | 78 ++++++++++++++++++--------------------------
 4 files changed, 77 insertions(+), 57 deletions(-)

diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index fa71d58b7ded..6eba60248e20 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -189,6 +189,16 @@ static const struct bpf_func_proto bpf_get_attach_cookie_proto = {
 static const struct bpf_func_proto *
 bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+#ifdef CONFIG_CGROUP_BPF
+	const struct bpf_func_proto *func_proto;
+
+	if (prog->expected_attach_type == BPF_LSM_CGROUP) {
+		func_proto = cgroup_common_func_proto(func_id, prog);
+		if (func_proto)
+			return func_proto;
+	}
+#endif
+
 	switch (func_id) {
 	case BPF_FUNC_inode_storage_get:
 		return &bpf_inode_storage_get_proto;
@@ -212,15 +222,6 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
 	case BPF_FUNC_get_attach_cookie:
 		return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL;
-	case BPF_FUNC_get_local_storage:
-		return prog->expected_attach_type == BPF_LSM_CGROUP ?
-			&bpf_get_local_storage_proto : NULL;
-	case BPF_FUNC_set_retval:
-		return prog->expected_attach_type == BPF_LSM_CGROUP ?
-			&bpf_set_retval_proto : NULL;
-	case BPF_FUNC_get_retval:
-		return prog->expected_attach_type == BPF_LSM_CGROUP ?
-			&bpf_get_retval_proto : NULL;
 #ifdef CONFIG_NET
 	case BPF_FUNC_setsockopt:
 		if (prog->expected_attach_type != BPF_LSM_CGROUP)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 8ead5df3fafb..7c73b334244a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2662,9 +2662,31 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_local_storage:
 		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_get_retval:
-		return &bpf_get_retval_proto;
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_SOCK_OPS:
+		case BPF_CGROUP_UDP4_RECVMSG:
+		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_INET4_GETPEERNAME:
+		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_INET4_GETSOCKNAME:
+		case BPF_CGROUP_INET6_GETSOCKNAME:
+			return NULL;
+		default:
+			return &bpf_get_retval_proto;
+		}
 	case BPF_FUNC_set_retval:
-		return &bpf_set_retval_proto;
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_SOCK_OPS:
+		case BPF_CGROUP_UDP4_RECVMSG:
+		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_INET4_GETPEERNAME:
+		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_INET4_GETSOCKNAME:
+		case BPF_CGROUP_INET6_GETSOCKNAME:
+			return NULL;
+		default:
+			return &bpf_set_retval_proto;
+		}
 	default:
 		return NULL;
 	}
@@ -2677,8 +2699,18 @@ cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	switch (func_id) {
 	case BPF_FUNC_get_current_uid_gid:
 		return &bpf_get_current_uid_gid_proto;
+	case BPF_FUNC_get_current_pid_tgid:
+		return &bpf_get_current_pid_tgid_proto;
+	case BPF_FUNC_get_current_comm:
+		return &bpf_get_current_comm_proto;
 	case BPF_FUNC_get_current_cgroup_id:
 		return &bpf_get_current_cgroup_id_proto;
+	case BPF_FUNC_get_current_ancestor_cgroup_id:
+		return &bpf_get_current_ancestor_cgroup_id_proto;
+#ifdef CONFIG_CGROUP_NET_CLASSID
+	case BPF_FUNC_get_cgroup_classid:
+		return &bpf_get_cgroup_classid_curr_proto;
+#endif
 	default:
 		return NULL;
 	}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e20a4657cb55..45ce7a26a147 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1394,6 +1394,7 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
 const struct bpf_func_proto bpf_probe_read_kernel_proto __weak;
 const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
+const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto __weak;
 
 const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
diff --git a/net/core/filter.c b/net/core/filter.c
index 737bef7ff831..699a10949bc2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7652,34 +7652,23 @@ const struct bpf_func_proto bpf_sk_storage_get_cg_sock_proto __weak;
 static const struct bpf_func_proto *
 sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
+	func_proto = cgroup_current_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
-	/* inet and inet6 sockets are created in a process
-	 * context so there is always a valid uid/gid
-	 */
-	case BPF_FUNC_get_current_uid_gid:
-		return &bpf_get_current_uid_gid_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_cookie_sock_proto;
 	case BPF_FUNC_get_netns_cookie:
 		return &bpf_get_netns_cookie_sock_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
-	case BPF_FUNC_get_current_pid_tgid:
-		return &bpf_get_current_pid_tgid_proto;
-	case BPF_FUNC_get_current_comm:
-		return &bpf_get_current_comm_proto;
-#ifdef CONFIG_CGROUPS
-	case BPF_FUNC_get_current_cgroup_id:
-		return &bpf_get_current_cgroup_id_proto;
-	case BPF_FUNC_get_current_ancestor_cgroup_id:
-		return &bpf_get_current_ancestor_cgroup_id_proto;
-#endif
-#ifdef CONFIG_CGROUP_NET_CLASSID
-	case BPF_FUNC_get_cgroup_classid:
-		return &bpf_get_cgroup_classid_curr_proto;
-#endif
 	case BPF_FUNC_sk_storage_get:
 		return &bpf_sk_storage_get_cg_sock_proto;
 	case BPF_FUNC_ktime_get_coarse_ns:
@@ -7692,12 +7681,17 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 static const struct bpf_func_proto *
 sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
+	func_proto = cgroup_current_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
-	/* inet and inet6 sockets are created in a process
-	 * context so there is always a valid uid/gid
-	 */
-	case BPF_FUNC_get_current_uid_gid:
-		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_bind:
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_CONNECT:
@@ -7710,24 +7704,8 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_cookie_sock_addr_proto;
 	case BPF_FUNC_get_netns_cookie:
 		return &bpf_get_netns_cookie_sock_addr_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
-	case BPF_FUNC_get_current_pid_tgid:
-		return &bpf_get_current_pid_tgid_proto;
-	case BPF_FUNC_get_current_comm:
-		return &bpf_get_current_comm_proto;
-#ifdef CONFIG_CGROUPS
-	case BPF_FUNC_get_current_cgroup_id:
-		return &bpf_get_current_cgroup_id_proto;
-	case BPF_FUNC_get_current_ancestor_cgroup_id:
-		return &bpf_get_current_ancestor_cgroup_id_proto;
-#endif
-#ifdef CONFIG_CGROUP_NET_CLASSID
-	case BPF_FUNC_get_cgroup_classid:
-		return &bpf_get_cgroup_classid_curr_proto;
-#endif
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_tcp:
 		return &bpf_sock_addr_sk_lookup_tcp_proto;
@@ -7808,9 +7786,13 @@ const struct bpf_func_proto bpf_sk_storage_delete_proto __weak;
 static const struct bpf_func_proto *
 cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_sk_fullsock:
 		return &bpf_sk_fullsock_proto;
 	case BPF_FUNC_sk_storage_get:
@@ -8050,6 +8032,12 @@ const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
 static const struct bpf_func_proto *
 sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = cgroup_common_func_proto(func_id, prog);
+	if (func_proto)
+		return func_proto;
+
 	switch (func_id) {
 	case BPF_FUNC_setsockopt:
 		return &bpf_sock_ops_setsockopt_proto;
@@ -8063,8 +8051,6 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sock_hash_update_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_cookie_sock_ops_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	case BPF_FUNC_sk_storage_get:
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v2 3/3] selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhere
  2022-08-16 20:12 [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Stanislav Fomichev
  2022-08-16 20:12 ` [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto Stanislav Fomichev
  2022-08-16 20:12 ` [PATCH bpf-next v2 2/3] bpf: Use cgroup_{common,current}_func_proto in more hooks Stanislav Fomichev
@ 2022-08-16 20:12 ` Stanislav Fomichev
  2022-08-17 19:07 ` [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Martin KaFai Lau
  3 siblings, 0 replies; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-16 20:12 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa

For each hook, have a simple bpf_set_retval(bpf_get_retval) program
and make sure it loads for the hooks we want. The exceptions are
the hooks which don't propagate the error to the callers:

- sockops
- recvmsg
- getpeername
- getsockname

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/Makefile          |  1 +
 .../bpf/cgroup_getset_retval_hooks.h          | 25 ++++++++++
 .../bpf/prog_tests/cgroup_getset_retval.c     | 48 +++++++++++++++++++
 .../bpf/progs/cgroup_getset_retval_hooks.c    | 16 +++++++
 4 files changed, 90 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/cgroup_getset_retval_hooks.h
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_getset_retval_hooks.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8d59ec7f4c2d..eecad99f1735 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -323,6 +323,7 @@ $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
 $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
 
 $(OUTPUT)/flow_dissector_load.o: flow_dissector_load.h
+$(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
 
 # Build BPF object using Clang
 # $1 - input .c file
diff --git a/tools/testing/selftests/bpf/cgroup_getset_retval_hooks.h b/tools/testing/selftests/bpf/cgroup_getset_retval_hooks.h
new file mode 100644
index 000000000000..4a8d2063163d
--- /dev/null
+++ b/tools/testing/selftests/bpf/cgroup_getset_retval_hooks.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+BPF_RETVAL_HOOK(ingress, "cgroup_skb/ingress", __sk_buff, 0)
+BPF_RETVAL_HOOK(egress, "cgroup_skb/egress", __sk_buff, 0)
+BPF_RETVAL_HOOK(sock_create, "cgroup/sock_create", bpf_sock, 0)
+BPF_RETVAL_HOOK(sock_ops, "sockops", bpf_sock_ops, -EINVAL)
+BPF_RETVAL_HOOK(dev, "cgroup/dev", bpf_cgroup_dev_ctx, 0)
+BPF_RETVAL_HOOK(bind4, "cgroup/bind4", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(bind6, "cgroup/bind6", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(connect4, "cgroup/connect4", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(connect6, "cgroup/connect6", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(post_bind4, "cgroup/post_bind4", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(post_bind6, "cgroup/post_bind6", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(sendmsg4, "cgroup/sendmsg4", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(sendmsg6, "cgroup/sendmsg6", bpf_sock_addr, 0)
+BPF_RETVAL_HOOK(sysctl, "cgroup/sysctl", bpf_sysctl, 0)
+BPF_RETVAL_HOOK(recvmsg4, "cgroup/recvmsg4", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(recvmsg6, "cgroup/recvmsg6", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(getsockopt, "cgroup/getsockopt", bpf_sockopt, 0)
+BPF_RETVAL_HOOK(setsockopt, "cgroup/setsockopt", bpf_sockopt, 0)
+BPF_RETVAL_HOOK(getpeername4, "cgroup/getpeername4", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(getpeername6, "cgroup/getpeername6", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(getsockname4, "cgroup/getsockname4", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(getsockname6, "cgroup/getsockname6", bpf_sock_addr, -EINVAL)
+BPF_RETVAL_HOOK(sock_release, "cgroup/sock_release", bpf_sock, 0)
diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_getset_retval.c b/tools/testing/selftests/bpf/prog_tests/cgroup_getset_retval.c
index 0b47c3c000c7..4d2fa99273d8 100644
--- a/tools/testing/selftests/bpf/prog_tests/cgroup_getset_retval.c
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_getset_retval.c
@@ -10,6 +10,7 @@
 
 #include "cgroup_getset_retval_setsockopt.skel.h"
 #include "cgroup_getset_retval_getsockopt.skel.h"
+#include "cgroup_getset_retval_hooks.skel.h"
 
 #define SOL_CUSTOM	0xdeadbeef
 
@@ -433,6 +434,50 @@ static void test_getsockopt_retval_sync(int cgroup_fd, int sock_fd)
 	cgroup_getset_retval_getsockopt__destroy(obj);
 }
 
+struct exposed_hook {
+	const char *name;
+	int expected_err;
+} exposed_hooks[] = {
+
+#define BPF_RETVAL_HOOK(NAME, SECTION, CTX, EXPECTED_ERR) \
+	{ \
+		.name = #NAME, \
+		.expected_err = EXPECTED_ERR, \
+	},
+
+#include "cgroup_getset_retval_hooks.h"
+
+#undef BPF_RETVAL_HOOK
+};
+
+static void test_exposed_hooks(int cgroup_fd, int sock_fd)
+{
+	struct cgroup_getset_retval_hooks *skel;
+	struct bpf_program *prog;
+	int err;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(exposed_hooks); i++) {
+		skel = cgroup_getset_retval_hooks__open();
+		if (!ASSERT_OK_PTR(skel, "cgroup_getset_retval_hooks__open"))
+			continue;
+
+		prog = bpf_object__find_program_by_name(skel->obj, exposed_hooks[i].name);
+		if (!ASSERT_NEQ(prog, NULL, "bpf_object__find_program_by_name"))
+			goto close_skel;
+
+		err = bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(err, "bpf_program__set_autoload"))
+			goto close_skel;
+
+		err = cgroup_getset_retval_hooks__load(skel);
+		ASSERT_EQ(err, exposed_hooks[i].expected_err, "expected_err");
+
+close_skel:
+		cgroup_getset_retval_hooks__destroy(skel);
+	}
+}
+
 void test_cgroup_getset_retval(void)
 {
 	int cgroup_fd = -1;
@@ -476,6 +521,9 @@ void test_cgroup_getset_retval(void)
 	if (test__start_subtest("getsockopt-retval_sync"))
 		test_getsockopt_retval_sync(cgroup_fd, sock_fd);
 
+	if (test__start_subtest("exposed_hooks"))
+		test_exposed_hooks(cgroup_fd, sock_fd);
+
 close_fd:
 	close(cgroup_fd);
 }
diff --git a/tools/testing/selftests/bpf/progs/cgroup_getset_retval_hooks.c b/tools/testing/selftests/bpf/progs/cgroup_getset_retval_hooks.c
new file mode 100644
index 000000000000..13dfb4bbfd28
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cgroup_getset_retval_hooks.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+#define BPF_RETVAL_HOOK(name, section, ctx, expected_err) \
+	__attribute__((__section__("?" section))) \
+	int name(struct ctx *_ctx) \
+	{ \
+		bpf_set_retval(bpf_get_retval()); \
+		return 1; \
+	}
+
+#include "cgroup_getset_retval_hooks.h"
+
+#undef BPF_RETVAL_HOOK
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-16 20:12 [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Stanislav Fomichev
                   ` (2 preceding siblings ...)
  2022-08-16 20:12 ` [PATCH bpf-next v2 3/3] selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhere Stanislav Fomichev
@ 2022-08-17 19:07 ` Martin KaFai Lau
  2022-08-17 22:41   ` Stanislav Fomichev
  3 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-17 19:07 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> Apparently, only a small subset of cgroup hooks actually falls
> back to cgroup_base_func_proto. This leads to unexpected result
> where not all cgroup helpers have bpf_{g,s}et_retval.
> 
> It's getting harder and harder to manage which helpers are exported
> to which hooks. We now have the following call chains:
> 
> - cg_skb_func_proto
>   - sk_filter_func_proto
>     - bpf_sk_base_func_proto
>       - bpf_base_func_proto
Could you explain how bpf_set_retval() will work with cgroup prog that
is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
It will be a useful doc to add to the uapi bpf.h for
the bpf_set_retval() helper.


* Re: [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto
  2022-08-16 20:12 ` [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto Stanislav Fomichev
@ 2022-08-17 19:22   ` Martin KaFai Lau
  2022-08-17 22:41     ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-17 19:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Tue, Aug 16, 2022 at 01:12:12PM -0700, Stanislav Fomichev wrote:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a627a02cf8ab..c302d2de073a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1948,6 +1948,10 @@ struct bpf_prog *bpf_prog_by_id(u32 id);
>  struct bpf_link *bpf_link_by_id(u32 id);
>  
>  const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
> +const struct bpf_func_proto *
> +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
> +const struct bpf_func_proto *
> +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
>  void bpf_task_storage_free(struct task_struct *task);
>  bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
>  const struct btf_func_model *
> @@ -2154,6 +2158,18 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>  	return NULL;
>  }
>  
> +static inline const struct bpf_func_proto *
> +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +	return NULL;
> +}
> +
> +static inline const struct bpf_func_proto *
> +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +	return NULL;
> +}
> +
These two new functions are implemented in cgroup.c, which only compiles with
CONFIG_CGROUP_BPF.  I think the change in bpf.h here should be done in
bpf-cgroup.h instead.  Otherwise, the changes in filter.c in the next patch will
have trouble resolving these functions when CONFIG_CGROUP_BPF is not set.

> -#define BPF_STRTOX_BASE_MASK 0x1F
> -
> -static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
> -			  unsigned long long *res, bool *is_negative)
> -{
> -	unsigned int base = flags & BPF_STRTOX_BASE_MASK;
> -	const char *cur_buf = buf;
> -	size_t cur_len = buf_len;
> -	unsigned int consumed;
> -	size_t val_len;
> -	char str[64];
> -
> -	if (!buf || !buf_len || !res || !is_negative)
> -		return -EINVAL;
> -
> -	if (base != 0 && base != 8 && base != 10 && base != 16)
> -		return -EINVAL;
> -
> -	if (flags & ~BPF_STRTOX_BASE_MASK)
> -		return -EINVAL;
> -
> -	while (cur_buf < buf + buf_len && isspace(*cur_buf))
> -		++cur_buf;
> -
> -	*is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
> -	if (*is_negative)
> -		++cur_buf;
> -
> -	consumed = cur_buf - buf;
> -	cur_len -= consumed;
> -	if (!cur_len)
> -		return -EINVAL;
> -
> -	cur_len = min(cur_len, sizeof(str) - 1);
> -	memcpy(str, cur_buf, cur_len);
> -	str[cur_len] = '\0';
> -	cur_buf = str;
> -
> -	cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
> -	val_len = _parse_integer(cur_buf, base, res);
> -
> -	if (val_len & KSTRTOX_OVERFLOW)
> -		return -ERANGE;
> -
> -	if (val_len == 0)
> -		return -EINVAL;
> -
> -	cur_buf += val_len;
> -	consumed += cur_buf - str;
> -
> -	return consumed;
> -}
> -
> -static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
> -			 long long *res)
> -{
> -	unsigned long long _res;
> -	bool is_negative;
> -	int err;
> -
> -	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
> -	if (err < 0)
> -		return err;
> -	if (is_negative) {
> -		if ((long long)-_res > 0)
> -			return -ERANGE;
> -		*res = -_res;
> -	} else {
> -		if ((long long)_res < 0)
> -			return -ERANGE;
> -		*res = _res;
> -	}
> -	return err;
> -}
> -
> -BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
> -	   long *, res)
> -{
> -	long long _res;
> -	int err;
> -
> -	err = __bpf_strtoll(buf, buf_len, flags, &_res);
> -	if (err < 0)
> -		return err;
> -	if (_res != (long)_res)
> -		return -ERANGE;
> -	*res = _res;
> -	return err;
> -}
> -
> -const struct bpf_func_proto bpf_strtol_proto = {
> -	.func		= bpf_strtol,
> -	.gpl_only	= false,
> -	.ret_type	= RET_INTEGER,
> -	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
> -	.arg2_type	= ARG_CONST_SIZE,
> -	.arg3_type	= ARG_ANYTHING,
> -	.arg4_type	= ARG_PTR_TO_LONG,
> -};
> -
> -BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
> -	   unsigned long *, res)
> -{
> -	unsigned long long _res;
> -	bool is_negative;
> -	int err;
> -
> -	err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
> -	if (err < 0)
> -		return err;
> -	if (is_negative)
> -		return -EINVAL;
> -	if (_res != (unsigned long)_res)
> -		return -ERANGE;
> -	*res = _res;
> -	return err;
> -}
> -
> -const struct bpf_func_proto bpf_strtoul_proto = {
This should be useful in general, beyond cgroup bpf.
It may end up moving back to helpers.c soon.
How about taking this chance to add it to bpf_base_func_proto(),
which already has another string helper, bpf_strncmp_proto?

> -	.func		= bpf_strtoul,
> -	.gpl_only	= false,
> -	.ret_type	= RET_INTEGER,
> -	.arg1_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
> -	.arg2_type	= ARG_CONST_SIZE,
> -	.arg3_type	= ARG_ANYTHING,
> -	.arg4_type	= ARG_PTR_TO_LONG,
> -};
> -#endif


* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-17 19:07 ` [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Martin KaFai Lau
@ 2022-08-17 22:41   ` Stanislav Fomichev
  2022-08-17 23:27     ` Martin KaFai Lau
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-17 22:41 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > Apparently, only a small subset of cgroup hooks actually falls
> > back to cgroup_base_func_proto. This leads to unexpected result
> > where not all cgroup helpers have bpf_{g,s}et_retval.
> >
> > It's getting harder and harder to manage which helpers are exported
> > to which hooks. We now have the following call chains:
> >
> > - cg_skb_func_proto
> >   - sk_filter_func_proto
> >     - bpf_sk_base_func_proto
> >       - bpf_base_func_proto
> Could you explain how bpf_set_retval() will work with cgroup prog that
> is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> It will be a useful doc to add to the uapi bpf.h for
> the bpf_set_retval() helper.

I think it's the same case as the case without bpf_set_retval? I don't
think the flags can be exported via bpf_set_retval, it just lets the
users override EPERM.
Let me verify and I can add a note to bpf_set_retval uapi definition
to mention that it just overrides EPERM. bpf_set_retval should
probably not talk about userspace/syscall and instead use the words
like "caller".


* Re: [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto
  2022-08-17 19:22   ` Martin KaFai Lau
@ 2022-08-17 22:41     ` Stanislav Fomichev
  0 siblings, 0 replies; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-17 22:41 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 12:22 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Aug 16, 2022 at 01:12:12PM -0700, Stanislav Fomichev wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a627a02cf8ab..c302d2de073a 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1948,6 +1948,10 @@ struct bpf_prog *bpf_prog_by_id(u32 id);
> >  struct bpf_link *bpf_link_by_id(u32 id);
> >
> >  const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
> > +const struct bpf_func_proto *
> > +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
> > +const struct bpf_func_proto *
> > +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
> >  void bpf_task_storage_free(struct task_struct *task);
> >  bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
> >  const struct btf_func_model *
> > @@ -2154,6 +2158,18 @@ bpf_base_func_proto(enum bpf_func_id func_id)
> >       return NULL;
> >  }
> >
> > +static inline const struct bpf_func_proto *
> > +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > +{
> > +     return NULL;
> > +}
> > +
> > +static inline const struct bpf_func_proto *
> > +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > +{
> > +     return NULL;
> > +}
> > +
> These two new functions are implemented in cgroup.c, which only compiles with
> CONFIG_CGROUP_BPF.  I think the change in bpf.h here should be done in
> bpf-cgroup.h instead.  Otherwise, the changes in filter.c in the next patch will
> have trouble resolving these functions when CONFIG_CGROUP_BPF is not set.

SG. Let me try. I think I have a config for CONFIG_CGROUPS=y
CONFIG_CGROUP_BPF=n, but maybe I don't, let me check.

> > -#define BPF_STRTOX_BASE_MASK 0x1F
> > -
> > -static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
> > -                       unsigned long long *res, bool *is_negative)
> > -{
> > -     unsigned int base = flags & BPF_STRTOX_BASE_MASK;
> > -     const char *cur_buf = buf;
> > -     size_t cur_len = buf_len;
> > -     unsigned int consumed;
> > -     size_t val_len;
> > -     char str[64];
> > -
> > -     if (!buf || !buf_len || !res || !is_negative)
> > -             return -EINVAL;
> > -
> > -     if (base != 0 && base != 8 && base != 10 && base != 16)
> > -             return -EINVAL;
> > -
> > -     if (flags & ~BPF_STRTOX_BASE_MASK)
> > -             return -EINVAL;
> > -
> > -     while (cur_buf < buf + buf_len && isspace(*cur_buf))
> > -             ++cur_buf;
> > -
> > -     *is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
> > -     if (*is_negative)
> > -             ++cur_buf;
> > -
> > -     consumed = cur_buf - buf;
> > -     cur_len -= consumed;
> > -     if (!cur_len)
> > -             return -EINVAL;
> > -
> > -     cur_len = min(cur_len, sizeof(str) - 1);
> > -     memcpy(str, cur_buf, cur_len);
> > -     str[cur_len] = '\0';
> > -     cur_buf = str;
> > -
> > -     cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
> > -     val_len = _parse_integer(cur_buf, base, res);
> > -
> > -     if (val_len & KSTRTOX_OVERFLOW)
> > -             return -ERANGE;
> > -
> > -     if (val_len == 0)
> > -             return -EINVAL;
> > -
> > -     cur_buf += val_len;
> > -     consumed += cur_buf - str;
> > -
> > -     return consumed;
> > -}
> > -
> > -static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
> > -                      long long *res)
> > -{
> > -     unsigned long long _res;
> > -     bool is_negative;
> > -     int err;
> > -
> > -     err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
> > -     if (err < 0)
> > -             return err;
> > -     if (is_negative) {
> > -             if ((long long)-_res > 0)
> > -                     return -ERANGE;
> > -             *res = -_res;
> > -     } else {
> > -             if ((long long)_res < 0)
> > -                     return -ERANGE;
> > -             *res = _res;
> > -     }
> > -     return err;
> > -}
> > -
> > -BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
> > -        long *, res)
> > -{
> > -     long long _res;
> > -     int err;
> > -
> > -     err = __bpf_strtoll(buf, buf_len, flags, &_res);
> > -     if (err < 0)
> > -             return err;
> > -     if (_res != (long)_res)
> > -             return -ERANGE;
> > -     *res = _res;
> > -     return err;
> > -}
> > -
> > -const struct bpf_func_proto bpf_strtol_proto = {
> > -     .func           = bpf_strtol,
> > -     .gpl_only       = false,
> > -     .ret_type       = RET_INTEGER,
> > -     .arg1_type      = ARG_PTR_TO_MEM | MEM_RDONLY,
> > -     .arg2_type      = ARG_CONST_SIZE,
> > -     .arg3_type      = ARG_ANYTHING,
> > -     .arg4_type      = ARG_PTR_TO_LONG,
> > -};
> > -
> > -BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
> > -        unsigned long *, res)
> > -{
> > -     unsigned long long _res;
> > -     bool is_negative;
> > -     int err;
> > -
> > -     err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
> > -     if (err < 0)
> > -             return err;
> > -     if (is_negative)
> > -             return -EINVAL;
> > -     if (_res != (unsigned long)_res)
> > -             return -ERANGE;
> > -     *res = _res;
> > -     return err;
> > -}
> > -
> > -const struct bpf_func_proto bpf_strtoul_proto = {
> This should be useful in general, beyond cgroup bpf.
> It may end up moving back to helpers.c soon.
> How about taking this chance to add it to bpf_base_func_proto(),
> which already has another string helper, bpf_strncmp_proto?

Sure, will do! Thanks!


> > -     .func           = bpf_strtoul,
> > -     .gpl_only       = false,
> > -     .ret_type       = RET_INTEGER,
> > -     .arg1_type      = ARG_PTR_TO_MEM | MEM_RDONLY,
> > -     .arg2_type      = ARG_CONST_SIZE,
> > -     .arg3_type      = ARG_ANYTHING,
> > -     .arg4_type      = ARG_PTR_TO_LONG,
> > -};
> > -#endif


* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-17 22:41   ` Stanislav Fomichev
@ 2022-08-17 23:27     ` Martin KaFai Lau
  2022-08-17 23:59       ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-17 23:27 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > Apparently, only a small subset of cgroup hooks actually falls
> > > back to cgroup_base_func_proto. This leads to unexpected result
> > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > >
> > > It's getting harder and harder to manage which helpers are exported
> > > to which hooks. We now have the following call chains:
> > >
> > > - cg_skb_func_proto
> > >   - sk_filter_func_proto
> > >     - bpf_sk_base_func_proto
> > >       - bpf_base_func_proto
> > Could you explain how bpf_set_retval() will work with cgroup prog that
> > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > It will be a useful doc to add to the uapi bpf.h for
> > the bpf_set_retval() helper.
> 
> I think it's the same case as the case without bpf_set_retval? I don't
> think the flags can be exported via bpf_set_retval, it just lets the
> users override EPERM.
eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
is expecting after calling bpf_set_retval(-Exxxx) ?
Thinking more about it, should __cgroup_bpf_run_filter_skb() always
return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?

> Let me verify and I can add a note to bpf_set_retval uapi definition
> to mention that it just overrides EPERM. bpf_set_retval should
> probably not talk about userspace/syscall and instead use the words
> like "caller".
yeah, it is no longer syscall return value only now.


* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-17 23:27     ` Martin KaFai Lau
@ 2022-08-17 23:59       ` Stanislav Fomichev
  2022-08-18  0:21         ` Martin KaFai Lau
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-17 23:59 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > >
> > > > It's getting harder and harder to manage which helpers are exported
> > > > to which hooks. We now have the following call chains:
> > > >
> > > > - cg_skb_func_proto
> > > >   - sk_filter_func_proto
> > > >     - bpf_sk_base_func_proto
> > > >       - bpf_base_func_proto
> > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > It will be a useful doc to add to the uapi bpf.h for
> > > the bpf_set_retval() helper.
> >
> > I think it's the same case as the case without bpf_set_retval? I don't
> > think the flags can be exported via bpf_set_retval, it just lets the
> > users override EPERM.
> eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> is expecting after calling bpf_set_retval(-Exxxx) ?
> Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?

I think we used to have "return 0/1/2/3" to indicate different
conditions but then switched to "return 1/0" + flags.
So, technically, "return 3 + bpf_set_retval" is still fundamentally a
"return 3" api-wise.
I guess we can make bpf_set_retval override that but let me start by
trying to document what we currently have.
If it turns out to be super ugly, we can try to fix it. (not sure how
much of a uapi that is)



> > Let me verify and I can add a note to bpf_set_retval uapi definition
> > to mention that it just overrides EPERM. bpf_set_retval should
> > probably not talk about userspace/syscall and instead use the words
> > like "caller".
> yeah, it is no longer syscall return value only now.


* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-17 23:59       ` Stanislav Fomichev
@ 2022-08-18  0:21         ` Martin KaFai Lau
  2022-08-18  3:42           ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-18  0:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 04:59:06PM -0700, Stanislav Fomichev wrote:
> On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > > >
> > > > > It's getting harder and harder to manage which helpers are exported
> > > > > to which hooks. We now have the following call chains:
> > > > >
> > > > > - cg_skb_func_proto
> > > > >   - sk_filter_func_proto
> > > > >     - bpf_sk_base_func_proto
> > > > >       - bpf_base_func_proto
> > > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > > It will be a useful doc to add to the uapi bpf.h for
> > > > the bpf_set_retval() helper.
> > >
> > > I think it's the same case as the case without bpf_set_retval? I don't
> > > think the flags can be exported via bpf_set_retval, it just lets the
> > > users override EPERM.
> > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> > What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> > correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> > instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> > is expecting after calling bpf_set_retval(-Exxxx) ?
> > Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?
> 
> I think we used to have "return 0/1/2/3" to indicate different
> conditions but then switched to "return 1/0" + flags.
For 'int bpf_prog_run_array_cg(..., u32 *ret_flags)'?
I think it is more like return "0 (OK)/-Exxxx" + ret_flags now.

> So, technically, "return 3 + bpf_set_retval" is still fundamentally a
> "return 3" api-wise.
hm....for the existing usecase (eg. CGROUP_SETSOCKOPT), what does
"bpf-prog-return 1 + bpf_set_retval(-EPERM)" mean?

> I guess we can make bpf_set_retval override that but let me start by
> trying to document what we currently have.
To be clear, for cg_skb case, I meant to clear the ret_flags only if
run_ctx.retval is set.

> If it turns out to be super ugly, we can try to fix it. (not sure how
> much of a uapi that is)
sgtm.

> 
> 
> 
> > > Let me verify and I can add a note to bpf_set_retval uapi definition
> > > to mention that it just overrides EPERM. bpf_set_retval should
> > > probably not talk about userspace/syscall and instead use the words
> > > like "caller".
> > yeah, it is no longer syscall return value only now.


* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-18  0:21         ` Martin KaFai Lau
@ 2022-08-18  3:42           ` Stanislav Fomichev
  2022-08-18 18:15             ` Martin KaFai Lau
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-18  3:42 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 5:21 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Aug 17, 2022 at 04:59:06PM -0700, Stanislav Fomichev wrote:
> > On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > > > >
> > > > > > It's getting harder and harder to manage which helpers are exported
> > > > > > to which hooks. We now have the following call chains:
> > > > > >
> > > > > > - cg_skb_func_proto
> > > > > >   - sk_filter_func_proto
> > > > > >     - bpf_sk_base_func_proto
> > > > > >       - bpf_base_func_proto
> > > > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > > > It will be a useful doc to add to the uapi bpf.h for
> > > > > the bpf_set_retval() helper.
> > > >
> > > > I think it's the same case as the case without bpf_set_retval? I don't
> > > > think the flags can be exported via bpf_set_retval, it just lets the
> > > > users override EPERM.
> > > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> > > What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> > > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> > > correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> > > instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> > > is expecting after calling bpf_set_retval(-Exxxx) ?
> > > Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> > > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?
> >
> > I think we used to have "return 0/1/2/3" to indicate different
> > conditions but then switched to "return 1/0" + flags.
> For 'int bpf_prog_run_array_cg(..., u32 *ret_flags)'?
> I think it is more like return "0 (OK)/-Exxxx" + ret_flags now.

Yes, right now that's the case. What I meant to say is that for the
BPF program itself, the api is still "return a set of predefined
values". We don't advertise the flags to the bpf programs. 'return 2'
is a perfectly valid return for cgroup/egress that will tx the packet
with a cn. (where bpf_prog_run_array_cg sees it as a 'return 0 + (1 <<
1)')

> > So, technically, "return 3 + bpf_set_retval" is still fundamentally a
> > "return 3" api-wise.
> hm....for the existing use case (e.g. CGROUP_SETSOCKOPT), what does
> "bpf-prog-return 1 + bpf_set_retval(-EPERM)" mean?

I think bpf_set_retval takes precedence, and in this case the bpf_prog_run
wrapper will return -EPERM to the caller.
Will try to document that as well.

> > I guess we can make bpf_set_retval override that but let me start by
> > trying to document what we currently have.
> To be clear, for cg_skb case, I meant to clear the ret_flags only if
> run_ctx.retval is set.

Are you suggesting something like the following?

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index fd113bd2f79c..c110cbe52001 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -61,6 +61,8 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
        bpf_reset_run_ctx(old_run_ctx);
        rcu_read_unlock();
        migrate_enable();
+       if (IS_ERR_VALUE((long)run_ctx.retval))
+               *ret_flags = 0;
        return run_ctx.retval;
 }

I think this will break the 'return 2' case? But is it worth doing
it more carefully like this? LMKWYT.

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index fd113bd2f79c..dcd25561f8d4 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -39,6 +39,7 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
        const struct bpf_prog_array *array;
        struct bpf_run_ctx *old_run_ctx;
        struct bpf_cg_run_ctx run_ctx;
+       bool seen_return0 = false;
        u32 func_ret;

        run_ctx.retval = retval;
@@ -54,13 +55,17 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
                        *(ret_flags) |= (func_ret >> 1);
                        func_ret &= 1;
                }
-               if (!func_ret && !IS_ERR_VALUE((long)run_ctx.retval))
+               if (!func_ret && !IS_ERR_VALUE((long)run_ctx.retval)) {
                        run_ctx.retval = -EPERM;
+                       seen_return0 = true;
+               }
                item++;
        }
        bpf_reset_run_ctx(old_run_ctx);
        rcu_read_unlock();
        migrate_enable();
+       if (IS_ERR_VALUE((long)run_ctx.retval) && !seen_return0)
+               *ret_flags = 0;
        return run_ctx.retval;
 }

> > If it turns out to be super ugly, we can try to fix it. (not sure how
> > much of a uapi that is)
> sgtm.
>
> >
> >
> >
> > > > Let me verify and I can add a note to bpf_set_retval uapi definition
> > > > to mention that it just overrides EPERM. bpf_set_retval should
> > > > probably not talk about userspace/syscall and instead use the words
> > > > like "caller".
> > > yeah, it is no longer syscall return value only now.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-18  3:42           ` Stanislav Fomichev
@ 2022-08-18 18:15             ` Martin KaFai Lau
  2022-08-18 18:30               ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-18 18:15 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Wed, Aug 17, 2022 at 08:42:54PM -0700, Stanislav Fomichev wrote:
> On Wed, Aug 17, 2022 at 5:21 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 04:59:06PM -0700, Stanislav Fomichev wrote:
> > > On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > > > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > > > > >
> > > > > > > It's getting harder and harder to manage which helpers are exported
> > > > > > > to which hooks. We now have the following call chains:
> > > > > > >
> > > > > > > - cg_skb_func_proto
> > > > > > >   - sk_filter_func_proto
> > > > > > >     - bpf_sk_base_func_proto
> > > > > > >       - bpf_base_func_proto
> > > > > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > > > > It will be a useful doc to add to the uapi bpf.h for
> > > > > > the bpf_set_retval() helper.
> > > > >
> > > > > I think it's the same case as the case without bpf_set_retval? I don't
> > > > > think the flags can be exported via bpf_set_retval, it just lets the
> > > > > users override EPERM.
> > > > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> > > > What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> > > > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> > > > correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> > > > instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> > > > is expecting after calling bpf_set_retval(-Exxxx) ?
> > > > Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> > > > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?
> > >
> > > I think we used to have "return 0/1/2/3" to indicate different
> > > conditions but then switched to "return 1/0" + flags.
> > For 'int bpf_prog_run_array_cg(..., u32 *ret_flags)'?
> > I think it is more like return "0 (OK)/-Exxxx" + ret_flags now.
> 
> Yes, right now that's the case. What I meant to say is that for the
> BPF program itself, the api is still "return a set of predefined
> values". We don't advertise the flags to the bpf programs. 'return 2'
> is a perfectly valid return for cgroup/egress that will tx the packet
> with a cn. (where bpf_prog_run_array_cg sees it as a 'return 0 + (1 <<
> 1)')
> 
> > > So, technically, "return 3 + bpf_set_retval" is still fundamentally a
> > > "return 3" api-wise.
> > hm....for the existing use case (e.g. CGROUP_SETSOCKOPT), what does
> > "bpf-prog-return 1 + bpf_set_retval(-EPERM)" mean?
> 
> I think bpf_set_retval takes precedence, and in this case the bpf_prog_run
> wrapper will return -EPERM to the caller.
> Will try to document that as well.
> 
> > > I guess we can make bpf_set_retval override that but let me start by
> > > trying to document what we currently have.
> > To be clear, for cg_skb case, I meant to clear the ret_flags only if
> > run_ctx.retval is set.
> 
> Are you suggesting something like the following?
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index fd113bd2f79c..c110cbe52001 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -61,6 +61,8 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>         bpf_reset_run_ctx(old_run_ctx);
>         rcu_read_unlock();
>         migrate_enable();
> +       if (IS_ERR_VALUE((long)run_ctx.retval))
> +               *ret_flags = 0;
>         return run_ctx.retval;
>  }
> 
> I think this will break the 'return 2' case? But is it worth doing
> it more carefully like this? LMKWYT.
The below should work. Not sure it is worth it
but before doing this...

During this discussion, I think I am still not clear on the use case
for bpf_{g,s}et_retval() in cg_skb.  Could you describe how it will be
used in your use case?  Is it for another tracing program to get
a different return value from (e.g.) sk_filter_trim_cap or ip[6]_output?

Not meaning the helper should not be exposed.  It is easier
to think with some examples.

> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index fd113bd2f79c..dcd25561f8d4 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -39,6 +39,7 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>         const struct bpf_prog_array *array;
>         struct bpf_run_ctx *old_run_ctx;
>         struct bpf_cg_run_ctx run_ctx;
> +       bool seen_return0 = false;
>         u32 func_ret;
> 
>         run_ctx.retval = retval;
> @@ -54,13 +55,17 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>                         *(ret_flags) |= (func_ret >> 1);
>                         func_ret &= 1;
>                 }
> -               if (!func_ret && !IS_ERR_VALUE((long)run_ctx.retval))
> +               if (!func_ret && !IS_ERR_VALUE((long)run_ctx.retval)) {
>                         run_ctx.retval = -EPERM;
> +                       seen_return0 = true;
> +               }
>                 item++;
>         }
>         bpf_reset_run_ctx(old_run_ctx);
>         rcu_read_unlock();
>         migrate_enable();
> +       if (IS_ERR_VALUE((long)run_ctx.retval) && !seen_return0)
> +               *ret_flags = 0;
>         return run_ctx.retval;
>  }
> 
> > > If it turns out to be super ugly, we can try to fix it. (not sure how
> > > much of a uapi that is)
> > sgtm.
> >
> > >
> > >
> > >
> > > > > Let me verify and I can add a note to bpf_set_retval uapi definition
> > > > > to mention that it just overrides EPERM. bpf_set_retval should
> > > > > probably not talk about userspace/syscall and instead use the words
> > > > > like "caller".
> > > > yeah, it is no longer syscall return value only now.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-18 18:15             ` Martin KaFai Lau
@ 2022-08-18 18:30               ` Stanislav Fomichev
  2022-08-18 19:54                 ` Martin KaFai Lau
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2022-08-18 18:30 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Thu, Aug 18, 2022 at 11:16 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Aug 17, 2022 at 08:42:54PM -0700, Stanislav Fomichev wrote:
> > On Wed, Aug 17, 2022 at 5:21 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Wed, Aug 17, 2022 at 04:59:06PM -0700, Stanislav Fomichev wrote:
> > > > On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > > > > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > > > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > > > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > > > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > > > > > >
> > > > > > > > It's getting harder and harder to manage which helpers are exported
> > > > > > > > to which hooks. We now have the following call chains:
> > > > > > > >
> > > > > > > > - cg_skb_func_proto
> > > > > > > >   - sk_filter_func_proto
> > > > > > > >     - bpf_sk_base_func_proto
> > > > > > > >       - bpf_base_func_proto
> > > > > > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > > > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > > > > > It will be a useful doc to add to the uapi bpf.h for
> > > > > > > the bpf_set_retval() helper.
> > > > > >
> > > > > > I think it's the same case as the case without bpf_set_retval? I don't
> > > > > > think the flags can be exported via bpf_set_retval, it just lets the
> > > > > > users override EPERM.
> > > > > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> > > > > What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> > > > > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> > > > > correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> > > > > instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> > > > > is expecting after calling bpf_set_retval(-Exxxx) ?
> > > > > Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> > > > > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?
> > > >
> > > > I think we used to have "return 0/1/2/3" to indicate different
> > > > conditions but then switched to "return 1/0" + flags.
> > > For 'int bpf_prog_run_array_cg(..., u32 *ret_flags)'?
> > > I think it is more like return "0 (OK)/-Exxxx" + ret_flags now.
> >
> > Yes, right now that's the case. What I meant to say is that for the
> > BPF program itself, the api is still "return a set of predefined
> > values". We don't advertise the flags to the bpf programs. 'return 2'
> > is a perfectly valid return for cgroup/egress that will tx the packet
> > with a cn. (where bpf_prog_run_array_cg sees it as a 'return 0 + (1 <<
> > 1)')
> >
> > > > So, technically, "return 3 + bpf_set_retval" is still fundamentally a
> > > > "return 3" api-wise.
> > > hm....for the existing use case (e.g. CGROUP_SETSOCKOPT), what does
> > > "bpf-prog-return 1 + bpf_set_retval(-EPERM)" mean?
> >
> > I think bpf_set_retval takes precedence, and in this case the bpf_prog_run
> > wrapper will return -EPERM to the caller.
> > Will try to document that as well.
> >
> > > > I guess we can make bpf_set_retval override that but let me start by
> > > > trying to document what we currently have.
> > > To be clear, for cg_skb case, I meant to clear the ret_flags only if
> > > run_ctx.retval is set.
> >
> > Are you suggesting something like the following?
> >
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index fd113bd2f79c..c110cbe52001 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -61,6 +61,8 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> >         bpf_reset_run_ctx(old_run_ctx);
> >         rcu_read_unlock();
> >         migrate_enable();
> > +       if (IS_ERR_VALUE((long)run_ctx.retval))
> > +               *ret_flags = 0;
> >         return run_ctx.retval;
> >  }
> >
> > I think this will break the 'return 2' case? But is it worth doing
> > it more carefully like this? LMKWYT.
> The below should work. Not sure it is worth it
> but before doing this...
>
> During this discussion, I think I am still not clear on the use case
> for bpf_{g,s}et_retval() in cg_skb.  Could you describe how it will be
> used in your use case?  Is it for another tracing program to get
> a different return value from (e.g.) sk_filter_trim_cap or ip[6]_output?
>
> Not meaning the helper should not be exposed.  It is easier
> to think with some examples.

I don't really need them in cg_skb; I want them in cg_sock so I can
return a custom errno from the socket() syscall.
You're probably right that it doesn't make sense to support them in
cg_skb. Most of the
BPF_CGROUP_RUN_PROG_INET_INGRESS/BPF_CGROUP_RUN_PROG_INET_EGRESS
callers don't seem to care about the returned error code (from my brief
grepping).
Let's maybe err on the safe side, special-case cg_skb for now (in
cgroup_common_func_proto), and not expose the retval helpers?
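For illustration, the special-casing could look roughly like this (a hypothetical, non-buildable sketch, not the actual patch; the prog->type check and the proto variable names are assumptions based on the helper names discussed in this thread):

```c
/* Hypothetical sketch of gating the retval helpers in
 * cgroup_common_func_proto(): withhold them from CGROUP_SKB programs
 * while still exposing them to the other cgroup hooks.
 */
static const struct bpf_func_proto *
cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
	switch (func_id) {
	case BPF_FUNC_get_retval:
	case BPF_FUNC_set_retval:
		/* cg_skb returns a verdict plus flag bits, so the retval
		 * semantics are unclear there; don't expose for now.
		 */
		if (prog->type == BPF_PROG_TYPE_CGROUP_SKB)
			return NULL;
		return func_id == BPF_FUNC_get_retval ?
		       &bpf_get_retval_proto : &bpf_set_retval_proto;
	default:
		return NULL;
	}
}
```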

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks
  2022-08-18 18:30               ` Stanislav Fomichev
@ 2022-08-18 19:54                 ` Martin KaFai Lau
  0 siblings, 0 replies; 15+ messages in thread
From: Martin KaFai Lau @ 2022-08-18 19:54 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, haoluo, jolsa

On Thu, Aug 18, 2022 at 11:30:21AM -0700, Stanislav Fomichev wrote:
> On Thu, Aug 18, 2022 at 11:16 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Aug 17, 2022 at 08:42:54PM -0700, Stanislav Fomichev wrote:
> > > On Wed, Aug 17, 2022 at 5:21 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Wed, Aug 17, 2022 at 04:59:06PM -0700, Stanislav Fomichev wrote:
> > > > > On Wed, Aug 17, 2022 at 4:27 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Wed, Aug 17, 2022 at 03:41:26PM -0700, Stanislav Fomichev wrote:
> > > > > > > On Wed, Aug 17, 2022 at 12:07 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Aug 16, 2022 at 01:12:11PM -0700, Stanislav Fomichev wrote:
> > > > > > > > > Apparently, only a small subset of cgroup hooks actually falls
> > > > > > > > > back to cgroup_base_func_proto. This leads to unexpected result
> > > > > > > > > where not all cgroup helpers have bpf_{g,s}et_retval.
> > > > > > > > >
> > > > > > > > > It's getting harder and harder to manage which helpers are exported
> > > > > > > > > to which hooks. We now have the following call chains:
> > > > > > > > >
> > > > > > > > > - cg_skb_func_proto
> > > > > > > > >   - sk_filter_func_proto
> > > > > > > > >     - bpf_sk_base_func_proto
> > > > > > > > >       - bpf_base_func_proto
> > > > > > > > Could you explain how bpf_set_retval() will work with cgroup prog that
> > > > > > > > is not syscall and can return flags in the higher bit (e.g. cg_skb egress).
> > > > > > > > It will be a useful doc to add to the uapi bpf.h for
> > > > > > > > the bpf_set_retval() helper.
> > > > > > >
> > > > > > > I think it's the same case as the case without bpf_set_retval? I don't
> > > > > > > think the flags can be exported via bpf_set_retval, it just lets the
> > > > > > > users override EPERM.
> > > > > > eg. Before, a cg_skb@egress prog returns 3 to mean NET_XMIT_CN.
> > > > > > What if the prog now returns 3 and also bpf_set_retval(-Exxxx).
> > > > > > If I read how __cgroup_bpf_run_filter_skb() uses bpf_prog_run_array_cg()
> > > > > > correctly,  __cgroup_bpf_run_filter_skb() will return NET_XMIT_DROP
> > > > > > instead of the -Exxxx.  The -Exxxx is probably what the bpf prog
> > > > > > is expecting after calling bpf_set_retval(-Exxxx) ?
> > > > > > Thinking more about it, should __cgroup_bpf_run_filter_skb() always
> > > > > > return -Exxxx whenever a -ve retval is set in bpf_set_retval() ?
> > > > >
> > > > > I think we used to have "return 0/1/2/3" to indicate different
> > > > > conditions but then switched to "return 1/0" + flags.
> > > > For 'int bpf_prog_run_array_cg(..., u32 *ret_flags)'?
> > > > I think it is more like return "0 (OK)/-Exxxx" + ret_flags now.
> > >
> > > Yes, right now that's the case. What I meant to say is that for the
> > > BPF program itself, the api is still "return a set of predefined
> > > values". We don't advertise the flags to the bpf programs. 'return 2'
> > > is a perfectly valid return for cgroup/egress that will tx the packet
> > > with a cn. (where bpf_prog_run_array_cg sees it as a 'return 0 + (1 <<
> > > 1)')
> > >
> > > > > So, technically, "return 3 + bpf_set_retval" is still fundamentally a
> > > > > "return 3" api-wise.
> > > > hm....for the existing use case (e.g. CGROUP_SETSOCKOPT), what does
> > > > "bpf-prog-return 1 + bpf_set_retval(-EPERM)" mean?
> > >
> > > I think bpf_set_retval takes precedence, and in this case the bpf_prog_run
> > > wrapper will return -EPERM to the caller.
> > > Will try to document that as well.
> > >
> > > > > I guess we can make bpf_set_retval override that but let me start by
> > > > > trying to document what we currently have.
> > > > To be clear, for cg_skb case, I meant to clear the ret_flags only if
> > > > run_ctx.retval is set.
> > >
> > > Are you suggesting something like the following?
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index fd113bd2f79c..c110cbe52001 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -61,6 +61,8 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> > >         bpf_reset_run_ctx(old_run_ctx);
> > >         rcu_read_unlock();
> > >         migrate_enable();
> > > +       if (IS_ERR_VALUE((long)run_ctx.retval))
> > > +               *ret_flags = 0;
> > >         return run_ctx.retval;
> > >  }
> > >
> > > I think this will break the 'return 2' case? But is it worth doing
> > > it more carefully like this? LMKWYT.
> > The below should work. Not sure it is worth it
> > but before doing this...
> >
> > During this discussion, I think I am still not clear on the use case
> > for bpf_{g,s}et_retval() in cg_skb.  Could you describe how it will be
> > used in your use case?  Is it for another tracing program to get
> > a different return value from (e.g.) sk_filter_trim_cap or ip[6]_output?
> >
> > Not meaning the helper should not be exposed.  It is easier
> > to think with some examples.
> 
> I don't really need them in cg_skb; I want them in cg_sock so I can
> return a custom errno from the socket() syscall.
bpf_{g,s}et_retval() will probably be useful for sock_addr too.
There is a BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE falling
into a similar bucket.  However, I think it should be fine
since the higher bit is only used when the bpf prog
returns OK.

> You're probably right that it doesn't make sense to support them in
> cg_skb. Most of the
> BPF_CGROUP_RUN_PROG_INET_INGRESS/BPF_CGROUP_RUN_PROG_INET_EGRESS
> callers don't seem to care about the returned error code (from my brief
> grepping).
> Let's maybe err on the safe side, special-case cg_skb for now (in
> cgroup_common_func_proto), and not expose the retval helpers?
Ah. I was under the wrong impression that you had a use case
for cg_skb because cg_skb_func_proto was in the call-chain
example above :)

Yep, let's disable the bpf_{g,s}et_retval() helpers for CGROUP_SKB
and also SOCK_OP(S) for now.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-08-18 19:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16 20:12 [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Stanislav Fomichev
2022-08-16 20:12 ` [PATCH bpf-next v2 1/3] bpf: Introduce cgroup_{common,current}_func_proto Stanislav Fomichev
2022-08-17 19:22   ` Martin KaFai Lau
2022-08-17 22:41     ` Stanislav Fomichev
2022-08-16 20:12 ` [PATCH bpf-next v2 2/3] bpf: Use cgroup_{common,current}_func_proto in more hooks Stanislav Fomichev
2022-08-16 20:12 ` [PATCH bpf-next v2 3/3] selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhere Stanislav Fomichev
2022-08-17 19:07 ` [PATCH bpf-next v2 0/3] bpf: expose bpf_{g,s}et_retval to more cgroup hooks Martin KaFai Lau
2022-08-17 22:41   ` Stanislav Fomichev
2022-08-17 23:27     ` Martin KaFai Lau
2022-08-17 23:59       ` Stanislav Fomichev
2022-08-18  0:21         ` Martin KaFai Lau
2022-08-18  3:42           ` Stanislav Fomichev
2022-08-18 18:15             ` Martin KaFai Lau
2022-08-18 18:30               ` Stanislav Fomichev
2022-08-18 19:54                 ` Martin KaFai Lau
