[PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if
@ 2016-11-29 15:53 David Ahern
  2016-11-29 15:53 ` [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
program has support for it. Even for those programs that do support it,
the API requires processes to be started as root (CAP_NET_RAW) which
is not desirable from a general security perspective.

This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.

For example:
 1. configure vrf (e.g., using ifupdown2)
        auto eth0
        iface eth0 inet dhcp
            vrf mgmt

        auto mgmt
        iface mgmt
            vrf-table auto

 2. configure cgroup
        mount -t cgroup2 none /tmp/cgroupv2
        mkdir /tmp/cgroupv2/mgmt
        test_cgrp2_sock /tmp/cgroupv2/mgmt 15

 3. set shell into cgroup (e.g., can be done at login using pam)
        echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs

At this point all commands run in the shell (e.g, apt) have sockets
automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'),
including processes not running as root.

This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.

David Ahern (3):
  bpf: Refactor cgroups code in prep for new type
  bpf: Add new cgroup attach type to enable sock modifications
  samples: bpf: add userspace example for modifying sk_bound_dev_if

 include/linux/bpf-cgroup.h     | 60 ++++++++++++++++++------------
 include/uapi/linux/bpf.h       |  6 +++
 kernel/bpf/cgroup.c            | 43 +++++++++++++++++++---
 kernel/bpf/syscall.c           | 33 ++++++++++-------
 net/core/filter.c              | 59 ++++++++++++++++++++++++++++++
 net/ipv4/af_inet.c             | 12 +++++-
 net/ipv6/af_inet6.c            |  8 ++++
 samples/bpf/Makefile           |  2 +
 samples/bpf/test_cgrp2_sock.c  | 83 ++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/test_cgrp2_sock.sh | 48 ++++++++++++++++++++++++
 10 files changed, 311 insertions(+), 43 deletions(-)
 create mode 100644 samples/bpf/test_cgrp2_sock.c
 create mode 100755 samples/bpf/test_cgrp2_sock.sh

-- 
2.1.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type
  2016-11-29 15:53 [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
@ 2016-11-29 15:53 ` David Ahern
  2016-11-29 19:41   ` Alexei Starovoitov
  2016-11-29 15:53 ` [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
  2016-11-29 15:53 ` [PATCH net-next v5 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
  2 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Code move and rename only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- no change

v4
- dropped refactor of __cgroup_bpf_run_filter and renamed it
  to __cgroup_bpf_run_filter_skb

v3
- dropped the rename

v2
- fix bpf_prog_run_clear_cb to bpf_prog_run_save_cb as caught by Daniel

- rename BPF_PROG_TYPE_CGROUP_SKB and its cg_skb functions to
  BPF_PROG_TYPE_CGROUP and cgroup

 include/linux/bpf-cgroup.h | 46 +++++++++++++++++++++++-----------------------
 kernel/bpf/cgroup.c        | 10 +++++-----
 kernel/bpf/syscall.c       | 28 +++++++++++++++-------------
 3 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index ec80d0c0953e..7f0fc635b13e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -37,31 +37,31 @@ void cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type);
 
-int __cgroup_bpf_run_filter(struct sock *sk,
-			    struct sk_buff *skb,
-			    enum bpf_attach_type type);
-
-/* Wrappers for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled. */
-#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb)			\
-({									\
-	int __ret = 0;							\
-	if (cgroup_bpf_enabled)						\
-		__ret = __cgroup_bpf_run_filter(sk, skb,		\
-						BPF_CGROUP_INET_INGRESS); \
-									\
-	__ret;								\
+int __cgroup_bpf_run_filter_skb(struct sock *sk,
+				struct sk_buff *skb,
+				enum bpf_attach_type type);
+
+/* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
+#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
+({									      \
+	int __ret = 0;							      \
+	if (cgroup_bpf_enabled)						      \
+		__ret = __cgroup_bpf_run_filter_skb(sk, skb,		      \
+						    BPF_CGROUP_INET_INGRESS); \
+									      \
+	__ret;								      \
 })
 
-#define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb)				\
-({									\
-	int __ret = 0;							\
-	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		\
-		typeof(sk) __sk = sk_to_full_sk(sk);			\
-		if (sk_fullsock(__sk))					\
-			__ret = __cgroup_bpf_run_filter(__sk, skb,	\
-						BPF_CGROUP_INET_EGRESS); \
-	}								\
-	__ret;								\
+#define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)			       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		       \
+		typeof(sk) __sk = sk_to_full_sk(sk);			       \
+		if (sk_fullsock(__sk))					       \
+			__ret = __cgroup_bpf_run_filter_skb(__sk, skb,	       \
+						      BPF_CGROUP_INET_EGRESS); \
+	}								       \
+	__ret;								       \
 })
 
 #else
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0ab43f264b0..19892973a78a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -118,7 +118,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 }
 
 /**
- * __cgroup_bpf_run_filter() - Run a program for packet filtering
+ * __cgroup_bpf_run_filter_skb() - Run a program for packet filtering
  * @sk: The socken sending or receiving traffic
  * @skb: The skb that is being sent or received
  * @type: The type of program to be exectuted
@@ -132,9 +132,9 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
  * This function will return %-EPERM if any if an attached program was found
  * and if it returned != 1 during execution. In all other cases, 0 is returned.
  */
-int __cgroup_bpf_run_filter(struct sock *sk,
-			    struct sk_buff *skb,
-			    enum bpf_attach_type type)
+int __cgroup_bpf_run_filter_skb(struct sock *sk,
+				struct sk_buff *skb,
+				enum bpf_attach_type type)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
@@ -164,4 +164,4 @@ int __cgroup_bpf_run_filter(struct sock *sk,
 
 	return ret;
 }
-EXPORT_SYMBOL(__cgroup_bpf_run_filter);
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4caa18e6860a..5518a6839ab1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -856,6 +856,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
+	enum bpf_prog_type ptype;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -866,25 +867,26 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
-		prog = bpf_prog_get_type(attr->attach_bpf_fd,
-					 BPF_PROG_TYPE_CGROUP_SKB);
-		if (IS_ERR(prog))
-			return PTR_ERR(prog);
-
-		cgrp = cgroup_get_from_fd(attr->target_fd);
-		if (IS_ERR(cgrp)) {
-			bpf_prog_put(prog);
-			return PTR_ERR(cgrp);
-		}
-
-		cgroup_bpf_update(cgrp, prog, attr->attach_type);
-		cgroup_put(cgrp);
+		ptype = BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 
 	default:
 		return -EINVAL;
 	}
 
+	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(cgrp);
+	}
+
+	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	cgroup_put(cgrp);
+
 	return 0;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-29 15:53 [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
  2016-11-29 15:53 ` [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-11-29 15:53 ` David Ahern
  2016-11-29 20:01   ` Alexei Starovoitov
  2016-11-29 15:53 ` [PATCH net-next v5 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
  2 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- no change

v4
- dropped tweak to bpf_func signature
- dropped cg_sock_func_proto in favor of sk_filter_func_proto
- new __cgroup_bpf_run_filter_sk versus overloading __cgroup_bpf_run_filter
- reverted BPF_CGROUP_INET_SOCK to BPF_CGROUP_INET_SOCK_CREATE

v3
- reverted to new prog type BPF_PROG_TYPE_CGROUP_SOCK
- dropped the subtype

v2
- dropped the bpf_sock_store_u32 helper
- dropped the new prog type BPF_PROG_TYPE_CGROUP_SOCK
- moved valid access and context conversion to use subtype
- dropped CREATE from BPF_CGROUP_INET_SOCK and related function names
- moved running of filter from sk_alloc to inet{6}_create

 include/linux/bpf-cgroup.h | 14 +++++++++++
 include/uapi/linux/bpf.h   |  6 +++++
 kernel/bpf/cgroup.c        | 33 ++++++++++++++++++++++++++
 kernel/bpf/syscall.c       |  5 +++-
 net/core/filter.c          | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/af_inet.c         | 12 +++++++++-
 net/ipv6/af_inet6.c        |  8 +++++++
 7 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 7f0fc635b13e..7de376e37c5c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -41,6 +41,9 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
 				enum bpf_attach_type type);
 
+int __cgroup_bpf_run_filter_sk(struct sock *sk,
+			       enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
 ({									      \
@@ -64,6 +67,16 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	__ret;								       \
 })
 
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)				       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled && sk) {					       \
+		__ret = __cgroup_bpf_run_filter_sk(sk,			       \
+						 BPF_CGROUP_INET_SOCK_CREATE); \
+	}								       \
+	__ret;								       \
+})
+
 #else
 
 struct cgroup_bpf {};
@@ -73,6 +86,7 @@ static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
 
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1370a9d1456f..75964e00d947 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,11 +101,13 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_XDP,
 	BPF_PROG_TYPE_PERF_EVENT,
 	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP_SOCK,
 };
 
 enum bpf_attach_type {
 	BPF_CGROUP_INET_INGRESS,
 	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK_CREATE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -537,6 +539,10 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+struct bpf_sock {
+	__u32 bound_dev_if;
+};
+
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
  * return codes are reserved for future use. Unknown return codes will result
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 19892973a78a..fe1c9ad03a36 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -165,3 +165,36 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	return ret;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
+
+/**
+ * __cgroup_bpf_run_filter_sk() - Run a program on a sock
+ * @sk: sock structure to manipulate
+ * @type: The type of program to be exectuted
+ *
+ * socket is passed is expected to be of type INET or INET6.
+ *
+ * The program type passed in via @type must be suitable for sock
+ * filtering. No further check is performed to assert that.
+ *
+ * This function will return %-EPERM if any if an attached program was found
+ * and if it returned != 1 during execution. In all other cases, 0 is returned.
+ */
+int __cgroup_bpf_run_filter_sk(struct sock *sk,
+			       enum bpf_attach_type type)
+{
+	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	struct bpf_prog *prog;
+	int ret = 0;
+
+
+	rcu_read_lock();
+
+	prog = rcu_dereference(cgrp->bpf.effective[type]);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, sk) == 1 ? 0 : -EPERM;
+
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5518a6839ab1..85af86c496cd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -869,7 +869,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_EGRESS:
 		ptype = BPF_PROG_TYPE_CGROUP_SKB;
 		break;
-
+	case BPF_CGROUP_INET_SOCK_CREATE:
+		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -905,6 +907,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK_CREATE:
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
diff --git a/net/core/filter.c b/net/core/filter.c
index 698a262b8ebb..404aaa1bfa1f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2676,6 +2676,29 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool sock_filter_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					enum bpf_reg_type *reg_type)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_sock, bound_dev_if):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	if (off < 0 || off + size > sizeof(struct bpf_sock))
+		return false;
+
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+
+	return true;
+}
+
 static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
 			       const struct bpf_prog *prog)
 {
@@ -2934,6 +2957,30 @@ static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 	return insn - insn_buf;
 }
 
+static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
+					  int dst_reg, int src_reg,
+					  int ctx_off,
+					  struct bpf_insn *insn_buf,
+					  struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (ctx_off) {
+	case offsetof(struct bpf_sock, bound_dev_if):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_bound_dev_if) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg,
+					offsetof(struct sock, sk_bound_dev_if));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+				      offsetof(struct sock, sk_bound_dev_if));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
 static u32 tc_cls_act_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					 int src_reg, int ctx_off,
 					 struct bpf_insn *insn_buf,
@@ -3007,6 +3054,12 @@ static const struct bpf_verifier_ops cg_skb_ops = {
 	.convert_ctx_access	= sk_filter_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops cg_sock_ops = {
+	.get_func_proto		= sk_filter_func_proto,
+	.is_valid_access	= sock_filter_is_valid_access,
+	.convert_ctx_access	= sock_filter_convert_ctx_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops	= &sk_filter_ops,
 	.type	= BPF_PROG_TYPE_SOCKET_FILTER,
@@ -3032,6 +3085,11 @@ static struct bpf_prog_type_list cg_skb_type __read_mostly = {
 	.type	= BPF_PROG_TYPE_CGROUP_SKB,
 };
 
+static struct bpf_prog_type_list cg_sock_type __read_mostly = {
+	.ops	= &cg_sock_ops,
+	.type	= BPF_PROG_TYPE_CGROUP_SOCK
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
@@ -3039,6 +3097,7 @@ static int __init register_sk_filter_ops(void)
 	bpf_register_prog_type(&sched_act_type);
 	bpf_register_prog_type(&xdp_type);
 	bpf_register_prog_type(&cg_skb_type);
+	bpf_register_prog_type(&cg_sock_type);
 
 	return 0;
 }
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5ddf5cda07f4..24d2550492ee 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 
 	if (sk->sk_prot->init) {
 		err = sk->sk_prot->init(sk);
-		if (err)
+		if (err) {
+			sk_common_release(sk);
+			goto out;
+		}
+	}
+
+	if (!kern) {
+		err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+		if (err) {
 			sk_common_release(sk);
+			goto out;
+		}
 	}
 out:
 	return err;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d424f3a3737a..237e654ba717 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -258,6 +258,14 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 			goto out;
 		}
 	}
+
+	if (!kern) {
+		err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+		if (err) {
+			sk_common_release(sk);
+			goto out;
+		}
+	}
 out:
 	return err;
 out_rcu_unlock:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v5 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
  2016-11-29 15:53 [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
  2016-11-29 15:53 ` [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
  2016-11-29 15:53 ` [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
@ 2016-11-29 15:53 ` David Ahern
  2 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- changed BPF_CGROUP_INET_SOCK to BPF_CGROUP_INET_SOCK_CREATE

v4
- added test_cgrp2_sock.sh for an automated test

v3
- revert to BPF_PROG_TYPE_CGROUP_SOCK prog type

v2
- removed bpf_sock_store_u32 references
- changed BPF_CGROUP_INET_SOCK_CREATE to BPF_CGROUP_INET_SOCK
- remove BPF_PROG_TYPE_CGROUP_SOCK prog type and add prog_subtype

 samples/bpf/Makefile           |  2 +
 samples/bpf/test_cgrp2_sock.c  | 83 ++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/test_cgrp2_sock.sh | 48 ++++++++++++++++++++++++
 3 files changed, 133 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_sock.c
 create mode 100755 samples/bpf/test_cgrp2_sock.sh

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 22b6407efa4f..3a404dd4bb46 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -23,6 +23,7 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += test_cgrp2_attach
+hostprogs-y += test_cgrp2_sock
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
@@ -51,6 +52,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
+test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
new file mode 100644
index 000000000000..2831e5f41f86
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -0,0 +1,83 @@
+/* eBPF example program:
+ *
+ * - Loads eBPF program
+ *
+ *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
+ *   sockets opened by processes in the cgroup.
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/bpf.h>
+
+#include "libbpf.h"
+
+static int prog_load(int idx)
+{
+	struct bpf_insn prog[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+		BPF_MOV64_IMM(BPF_REG_3, idx),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+
+	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+			     "GPL", 0);
+}
+
+static int usage(const char *argv0)
+{
+	printf("Usage: %s cg-path device-index\n", argv0);
+	return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+	int cg_fd, prog_fd, ret;
+	int idx = 0;
+
+	if (argc < 2)
+		return usage(argv[0]);
+
+	idx = atoi(argv[2]);
+	if (!idx) {
+		printf("Invalid device index\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (cg_fd < 0) {
+		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	prog_fd = prog_load(idx);
+	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
+
+	if (prog_fd < 0) {
+		printf("Failed to load prog: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+	if (ret < 0) {
+		printf("Failed to attach prog to cgroup: '%s'\n",
+		       strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	return EXIT_SUCCESS;
+}
diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
new file mode 100755
index 000000000000..35ce9e1566d8
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+
+function config_device {
+	ip netns add at_ns0
+	ip link add veth0 type veth peer name veth0b
+	ip link set veth0b up
+	ip link set veth0 netns at_ns0
+	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+	ip netns exec at_ns0 ip addr add 2401:db00::1/64 dev veth0 nodad
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip link add foo type vrf table 1234
+	ip link set foo up
+	ip addr add 172.16.1.101/24 dev veth0b
+	ip addr add 2401:db00::2/64 dev veth0b nodad
+	ip link set veth0b master foo
+}
+
+function attach_bpf {
+	idx=$(ip -o li sh dev foo | awk -F':' '{print $1}')
+	rm -rf /tmp/cgroupv2
+	mkdir -p /tmp/cgroupv2
+	mount -t cgroup2 none /tmp/cgroupv2
+	mkdir -p /tmp/cgroupv2/foo
+	test_cgrp2_sock /tmp/cgroupv2/foo $idx
+	echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
+}
+
+function cleanup {
+	set +ex
+	ip netns delete at_ns0
+	ip link del veth0
+	ip link del foo
+	umount /tmp/cgroupv2
+	rm -rf /tmp/cgroupv2
+	set -ex
+}
+
+function do_test {
+	ping -c1 -w1 172.16.1.100
+	ping6 -c1 -w1 2401:db00::1
+}
+
+cleanup 2>/dev/null
+config_device
+attach_bpf
+do_test
+cleanup
+echo "*** PASS ***"
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type
  2016-11-29 15:53 ` [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-11-29 19:41   ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2016-11-29 19:41 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On Tue, Nov 29, 2016 at 07:53:31AM -0800, David Ahern wrote:
> Code move and rename only; no functional change intended.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-29 15:53 ` [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
@ 2016-11-29 20:01   ` Alexei Starovoitov
  2016-11-29 21:01     ` David Ahern
  2016-11-30  0:43     ` David Ahern
  0 siblings, 2 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2016-11-29 20:01 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On Tue, Nov 29, 2016 at 07:53:32AM -0800, David Ahern wrote:
> Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
> BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
> any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
> Currently only sk_bound_dev_if is exported to userspace for modification
> by a bpf program.
> 
> This allows a cgroup to be configured such that AF_INET{6} sockets opened
> by processes are automatically bound to a specific device. In turn, this
> enables the running of programs that do not support SO_BINDTODEVICE in a
> specific VRF context / L3 domain.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
...
> +struct bpf_sock {
> +	__u32 bound_dev_if;
> +};

overall looks great to me.
Could you also expose sk_protcol and sk_type as read only fields?
They have user space visible values already and will make this new
BPF_PROG_TYPE_CGROUP_SOCK program type much more useful beyond vrf
use case. Like we'll be able to write a tiny bpf program to block all
raw sockets or all udp sockets for an application within a given cgroup.
If someone would want to prevent udp traffic from an application,
it can be done here within close to zero overhead for socket() syscall
instead of checking every packet at networking layer.
It can help vrf use case as well, so you can auto-bindtodevice
only udp and tcp sockets instead of all... or do it for ipv4 or ipv6 only.
Plenty of interesting opportunities with just two extra fields
that cost nothing when not in use.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-29 20:01   ` Alexei Starovoitov
@ 2016-11-29 21:01     ` David Ahern
  2016-11-30  0:43     ` David Ahern
  1 sibling, 0 replies; 12+ messages in thread
From: David Ahern @ 2016-11-29 21:01 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On 11/29/16 1:01 PM, Alexei Starovoitov wrote:
> Could you also expose sk_protcol and sk_type as read only fields?
> They have user space visible values already and will make this new
> BPF_PROG_TYPE_CGROUP_SOCK program type much more useful beyond vrf
> use case. Like we'll be able to write a tiny bpf program to block all
> raw sockets or all udp sockets for an application within a given cgroup.
> If someone would want to prevent udp traffic from an application,
> it can be done here within close to zero overhead for socket() syscall
> instead of checking every packet at networking layer.
> It can help vrf use case as well, so you can auto-bindtodevice
> only udp and tcp sockets instead of all... or do it for ipv4 or ipv6 only.
> Plenty of interesting opportunities with just two extra fields
> that cost nothing when not in use.

sure. for completeness I'll add sk_family too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-29 20:01   ` Alexei Starovoitov
  2016-11-29 21:01     ` David Ahern
@ 2016-11-30  0:43     ` David Ahern
  2016-11-30  0:59       ` Alexei Starovoitov
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2016-11-30  0:43 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On 11/29/16 1:01 PM, Alexei Starovoitov wrote:
> Could you also expose sk_protcol and sk_type as read only fields?

Those are bitfields in struct sock, so can't use offsetof or sizeof. Any existing use cases that try to load a bitfield in a bpf that I can look at?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-30  0:43     ` David Ahern
@ 2016-11-30  0:59       ` Alexei Starovoitov
  2016-11-30  1:07         ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2016-11-30  0:59 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On Tue, Nov 29, 2016 at 05:43:08PM -0700, David Ahern wrote:
> On 11/29/16 1:01 PM, Alexei Starovoitov wrote:
> > Could you also expose sk_protcol and sk_type as read only fields?
> 
> Those are bitfields in struct sock, so can't use offsetof or sizeof. Any existing use cases that try to load a bitfield in a bpf that I can look at?

pkt_type, vlan are also bitfileds in skb. Please see convert_skb_access()
There is a bit of ugliness due to __BIG_ENDIAN_BITFIELD though..

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-30  0:59       ` Alexei Starovoitov
@ 2016-11-30  1:07         ` David Ahern
  2016-11-30  5:41           ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2016-11-30  1:07 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On 11/29/16 5:59 PM, Alexei Starovoitov wrote:
> On Tue, Nov 29, 2016 at 05:43:08PM -0700, David Ahern wrote:
>> On 11/29/16 1:01 PM, Alexei Starovoitov wrote:
>>> Could you also expose sk_protcol and sk_type as read only fields?
>>
>> Those are bitfields in struct sock, so can't use offsetof or sizeof. Any existing use cases that try to load a bitfield in a bpf that I can look at?
> 
> pkt_type, vlan are also bitfileds in skb. Please see convert_skb_access()
> There is a bit of ugliness due to __BIG_ENDIAN_BITFIELD though..
> 

Given the added complexity I'd prefer to defer this second use case to a follow on patch set. This one introduces the infra for sockets and I don't see anything needing to change with it to add the read of 3 more sock elements. Agree?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-30  1:07         ` David Ahern
@ 2016-11-30  5:41           ` Alexei Starovoitov
  2016-11-30 16:24             ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2016-11-30  5:41 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On Tue, Nov 29, 2016 at 06:07:18PM -0700, David Ahern wrote:
> On 11/29/16 5:59 PM, Alexei Starovoitov wrote:
> > On Tue, Nov 29, 2016 at 05:43:08PM -0700, David Ahern wrote:
> >> On 11/29/16 1:01 PM, Alexei Starovoitov wrote:
> >>> Could you also expose sk_protcol and sk_type as read only fields?
> >>
> >> Those are bitfields in struct sock, so can't use offsetof or sizeof. Any existing use cases that try to load a bitfield in a bpf that I can look at?
> > 
> > pkt_type, vlan are also bitfileds in skb. Please see convert_skb_access()
> > There is a bit of ugliness due to __BIG_ENDIAN_BITFIELD though..
> > 
> 
> Given the added complexity I'd prefer to defer this second use case to a follow on patch set. This one introduces the infra for sockets and I don't see anything needing to change with it to add the read of 3 more sock elements. Agree?

I don't see a complexity. It was straightforward for skb bitfields,
but if there is some unforeseen issue, it's better to tackle it now
otherwise the feature may never come and this 'infra for sockets' will
stay as 'infra for vrf only' and I'm struggling to convince myself that it's ok.
So I'll try to tweak this patch to add these 3 fields...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
  2016-11-30  5:41           ` Alexei Starovoitov
@ 2016-11-30 16:24             ` David Ahern
  0 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2016-11-30 16:24 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On 11/29/16 10:41 PM, Alexei Starovoitov wrote:
> I don't see a complexity. It was straightforward for skb bitfields,
> but if there is some unforeseen issue, it's better to tackle it now
> otherwise the feature may never come and this 'infra for sockets' will
> stay as 'infra for vrf only' and I'm struggling to convince myself that it's ok.
> So I'll try to tweak this patch to add these 3 fields...
> 

regardless of whether the change is attached to this patch set or sent as a follow on, I will make 2 separate patches -- 1 that adds the fields to bpf_sock and updates the filter.c code and a second patch that adds a test case and automation for it. There is no reason to combine it into 1 large patch.

If you think the use case to fail socket create based on family/proto/type checks is valid, it is valid regardless of when the patch comes.

I take your comment to mean you believe if I don't do it now then it won't get done. Am I over reading your comment? Do you really believe it?

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-11-30 16:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-29 15:53 [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
2016-11-29 15:53 ` [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
2016-11-29 19:41   ` Alexei Starovoitov
2016-11-29 15:53 ` [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
2016-11-29 20:01   ` Alexei Starovoitov
2016-11-29 21:01     ` David Ahern
2016-11-30  0:43     ` David Ahern
2016-11-30  0:59       ` Alexei Starovoitov
2016-11-30  1:07         ` David Ahern
2016-11-30  5:41           ` Alexei Starovoitov
2016-11-30 16:24             ` David Ahern
2016-11-29 15:53 ` [PATCH net-next v5 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox