* [PATCH net-next v3 0/3] net: Add bpf support to set sk_bound_dev_if
@ 2016-11-28 15:48 David Ahern
2016-11-28 15:48 ` [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: David Ahern @ 2016-11-28 15:48 UTC (permalink / raw)
To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every IPv4/IPv6-capable
program supports it. Even for programs that do, the API requires the
process to have CAP_NET_RAW (in practice, to be started as root),
which is undesirable from a general security perspective.
This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.
For example:
1. configure vrf (e.g., using ifupdown2)
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto mgmt
iface mgmt
vrf-table auto
2. configure cgroup
mount -t cgroup2 none /tmp/cgroupv2
mkdir /tmp/cgroupv2/mgmt
test_cgrp2_sock /tmp/cgroupv2/mgmt 15
3. add the shell to the cgroup (e.g., this can be done at login using PAM)
echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs
At this point all commands run in the shell (e.g., apt) have their sockets
automatically bound to the VRF (see the output of ss -ap 'dev == <vrf>'),
including processes not running as root.
This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.
David Ahern (3):
bpf: Refactor cgroups code in prep for new type
bpf: Add new cgroup attach type to enable sock modifications
samples: bpf: add userspace example for modifying sk_bound_dev_if
include/linux/bpf-cgroup.h | 11 ++++++
include/linux/filter.h | 2 +-
include/uapi/linux/bpf.h | 6 ++++
kernel/bpf/cgroup.c | 36 ++++++++++++++++---
kernel/bpf/syscall.c | 33 +++++++++--------
net/core/filter.c | 65 +++++++++++++++++++++++++++++++++
net/ipv4/af_inet.c | 12 ++++++-
net/ipv6/af_inet6.c | 8 +++++
samples/bpf/Makefile | 2 ++
samples/bpf/test_cgrp2_sock.c | 83 +++++++++++++++++++++++++++++++++++++++++++
10 files changed, 237 insertions(+), 21 deletions(-)
create mode 100644 samples/bpf/test_cgrp2_sock.c
--
2.1.4
* [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type
2016-11-28 15:48 [PATCH net-next v3 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
@ 2016-11-28 15:48 ` David Ahern
2016-11-28 20:06 ` Alexei Starovoitov
2016-11-28 15:48 ` [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
2016-11-28 15:48 ` [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
2 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2016-11-28 15:48 UTC (permalink / raw)
To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
Code move only; no functional change intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- dropped the rename
v2
- fix bpf_prog_run_clear_cb to bpf_prog_run_save_cb as caught by Daniel
- rename BPF_PROG_TYPE_CGROUP_SKB and its cg_skb functions to
BPF_PROG_TYPE_CGROUP and cgroup
kernel/bpf/cgroup.c | 27 ++++++++++++++++++++++-----
kernel/bpf/syscall.c | 28 +++++++++++++++-------------
2 files changed, 37 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0ab43f264b0..d5746aec8f34 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,6 +117,19 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
}
}
+static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb,
+ struct bpf_prog *prog)
+{
+ unsigned int offset = skb->data - skb_network_header(skb);
+ int ret;
+
+ __skb_push(skb, offset);
+ ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
+ __skb_pull(skb, offset);
+
+ return ret;
+}
+
/**
* __cgroup_bpf_run_filter() - Run a program for packet filtering
* @sk: The socket sending or receiving traffic
@@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk,
prog = rcu_dereference(cgrp->bpf.effective[type]);
if (prog) {
- unsigned int offset = skb->data - skb_network_header(skb);
-
- __skb_push(skb, offset);
- ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
- __skb_pull(skb, offset);
+ switch (type) {
+ case BPF_CGROUP_INET_INGRESS:
+ case BPF_CGROUP_INET_EGRESS:
+ ret = __cgroup_bpf_run_filter_skb(skb, prog);
+ break;
+ /* make gcc happy else complains about missing enum value */
+ default:
+ return 0;
+ }
}
rcu_read_unlock();
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1090d16a31c1..c2bce596e842 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -843,6 +843,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
{
struct bpf_prog *prog;
struct cgroup *cgrp;
+ enum bpf_prog_type ptype;
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -853,25 +854,26 @@ static int bpf_prog_attach(const union bpf_attr *attr)
switch (attr->attach_type) {
case BPF_CGROUP_INET_INGRESS:
case BPF_CGROUP_INET_EGRESS:
- prog = bpf_prog_get_type(attr->attach_bpf_fd,
- BPF_PROG_TYPE_CGROUP_SKB);
- if (IS_ERR(prog))
- return PTR_ERR(prog);
-
- cgrp = cgroup_get_from_fd(attr->target_fd);
- if (IS_ERR(cgrp)) {
- bpf_prog_put(prog);
- return PTR_ERR(cgrp);
- }
-
- cgroup_bpf_update(cgrp, prog, attr->attach_type);
- cgroup_put(cgrp);
+ ptype = BPF_PROG_TYPE_CGROUP_SKB;
break;
default:
return -EINVAL;
}
+ prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+
+ cgrp = cgroup_get_from_fd(attr->target_fd);
+ if (IS_ERR(cgrp)) {
+ bpf_prog_put(prog);
+ return PTR_ERR(cgrp);
+ }
+
+ cgroup_bpf_update(cgrp, prog, attr->attach_type);
+ cgroup_put(cgrp);
+
return 0;
}
--
2.1.4
* [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications
2016-11-28 15:48 [PATCH net-next v3 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
2016-11-28 15:48 ` [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-11-28 15:48 ` David Ahern
2016-11-28 20:32 ` Alexei Starovoitov
2016-11-28 15:48 ` [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
2 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2016-11-28 15:48 UTC (permalink / raw)
To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
Add a new cgroup-based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB, programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.
This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- reverted to new prog type BPF_PROG_TYPE_CGROUP_SOCK
- dropped the subtype
v2
- dropped the bpf_sock_store_u32 helper
- dropped the new prog type BPF_PROG_TYPE_CGROUP_SOCK
- moved valid access and context conversion to use subtype
- dropped CREATE from BPF_CGROUP_INET_SOCK and related function names
- moved running of filter from sk_alloc to inet{6}_create
include/linux/bpf-cgroup.h | 11 ++++++++
include/linux/filter.h | 2 +-
include/uapi/linux/bpf.h | 6 +++++
kernel/bpf/cgroup.c | 9 +++++++
kernel/bpf/syscall.c | 5 +++-
net/core/filter.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/af_inet.c | 12 ++++++++-
net/ipv6/af_inet6.c | 8 ++++++
8 files changed, 115 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index ec80d0c0953e..2ca529664c8b 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -64,6 +64,16 @@ int __cgroup_bpf_run_filter(struct sock *sk,
__ret; \
})
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) \
+({ \
+ int __ret = 0; \
+ if (cgroup_bpf_enabled && sk) { \
+ __ret = __cgroup_bpf_run_filter(sk, NULL, \
+ BPF_CGROUP_INET_SOCK); \
+ } \
+ __ret; \
+})
+
#else
struct cgroup_bpf {};
@@ -73,6 +83,7 @@ static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
#define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
#endif /* CONFIG_CGROUP_BPF */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c521adfe..808e158742a2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -408,7 +408,7 @@ struct bpf_prog {
enum bpf_prog_type type; /* Type of BPF program */
struct bpf_prog_aux *aux; /* Auxiliary fields */
struct sock_fprog_kern *orig_prog; /* Original BPF program */
- unsigned int (*bpf_func)(const struct sk_buff *skb,
+ unsigned int (*bpf_func)(const void *ctx,
const struct bpf_insn *filter);
/* Instructions for interpreter */
union {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1370a9d1456f..8f410ecdaac4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,11 +101,13 @@ enum bpf_prog_type {
BPF_PROG_TYPE_XDP,
BPF_PROG_TYPE_PERF_EVENT,
BPF_PROG_TYPE_CGROUP_SKB,
+ BPF_PROG_TYPE_CGROUP_SOCK,
};
enum bpf_attach_type {
BPF_CGROUP_INET_INGRESS,
BPF_CGROUP_INET_EGRESS,
+ BPF_CGROUP_INET_SOCK,
__MAX_BPF_ATTACH_TYPE
};
@@ -537,6 +539,10 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
};
+struct bpf_sock {
+ __u32 bound_dev_if;
+};
+
/* User return codes for XDP prog type.
* A valid XDP program must return one of these defined values. All other
* return codes are reserved for future use. Unknown return codes will result
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index d5746aec8f34..796e39aa28f5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,6 +117,12 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
}
}
+static int __cgroup_bpf_run_filter_sock(struct sock *sk,
+ struct bpf_prog *prog)
+{
+ return prog->bpf_func(sk, prog->insnsi) == 1 ? 0 : -EPERM;
+}
+
static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb,
struct bpf_prog *prog)
{
@@ -171,6 +177,9 @@ int __cgroup_bpf_run_filter(struct sock *sk,
case BPF_CGROUP_INET_EGRESS:
ret = __cgroup_bpf_run_filter_skb(skb, prog);
break;
+ case BPF_CGROUP_INET_SOCK:
+ ret = __cgroup_bpf_run_filter_sock(sk, prog);
+ break;
/* make gcc happy else complains about missing enum value */
default:
return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c2bce596e842..f5247901a4cc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -856,7 +856,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
case BPF_CGROUP_INET_EGRESS:
ptype = BPF_PROG_TYPE_CGROUP_SKB;
break;
-
+ case BPF_CGROUP_INET_SOCK:
+ ptype = BPF_PROG_TYPE_CGROUP_SOCK;
+ break;
default:
return -EINVAL;
}
@@ -892,6 +894,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
switch (attr->attach_type) {
case BPF_CGROUP_INET_INGRESS:
case BPF_CGROUP_INET_EGRESS:
+ case BPF_CGROUP_INET_SOCK:
cgrp = cgroup_get_from_fd(attr->target_fd);
if (IS_ERR(cgrp))
return PTR_ERR(cgrp);
diff --git a/net/core/filter.c b/net/core/filter.c
index ea315af56511..593b6b664c0c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2645,6 +2645,12 @@ cg_skb_func_proto(enum bpf_func_id func_id)
}
}
+static const struct bpf_func_proto *
+cg_sock_func_proto(enum bpf_func_id func_id)
+{
+ return NULL;
+}
+
static bool __is_valid_access(int off, int size, enum bpf_access_type type)
{
if (off < 0 || off >= sizeof(struct __sk_buff))
@@ -2682,6 +2688,29 @@ static bool sk_filter_is_valid_access(int off, int size,
return __is_valid_access(off, size, type);
}
+static bool sock_filter_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ enum bpf_reg_type *reg_type)
+{
+ if (type == BPF_WRITE) {
+ switch (off) {
+ case offsetof(struct bpf_sock, bound_dev_if):
+ break;
+ default:
+ return false;
+ }
+ }
+
+ if (off < 0 || off + size > sizeof(struct bpf_sock))
+ return false;
+
+ /* The verifier guarantees that size > 0. */
+ if (off % size != 0)
+ return false;
+
+ return true;
+}
+
static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog)
{
@@ -2940,6 +2969,30 @@ static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
return insn - insn_buf;
}
+static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
+ int dst_reg, int src_reg,
+ int ctx_off,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog)
+{
+ struct bpf_insn *insn = insn_buf;
+
+ switch (ctx_off) {
+ case offsetof(struct bpf_sock, bound_dev_if):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_bound_dev_if) != 4);
+
+ if (type == BPF_WRITE)
+ *insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg,
+ offsetof(struct sock, sk_bound_dev_if));
+ else
+ *insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+ offsetof(struct sock, sk_bound_dev_if));
+ break;
+ }
+
+ return insn - insn_buf;
+}
+
static u32 tc_cls_act_convert_ctx_access(enum bpf_access_type type, int dst_reg,
int src_reg, int ctx_off,
struct bpf_insn *insn_buf,
@@ -3013,6 +3066,12 @@ static const struct bpf_verifier_ops cg_skb_ops = {
.convert_ctx_access = sk_filter_convert_ctx_access,
};
+static const struct bpf_verifier_ops cg_sock_ops = {
+ .get_func_proto = cg_sock_func_proto,
+ .is_valid_access = sock_filter_is_valid_access,
+ .convert_ctx_access = sock_filter_convert_ctx_access,
+};
+
static struct bpf_prog_type_list sk_filter_type __read_mostly = {
.ops = &sk_filter_ops,
.type = BPF_PROG_TYPE_SOCKET_FILTER,
@@ -3038,6 +3097,11 @@ static struct bpf_prog_type_list cg_skb_type __read_mostly = {
.type = BPF_PROG_TYPE_CGROUP_SKB,
};
+static struct bpf_prog_type_list cg_sock_type __read_mostly = {
+ .ops = &cg_sock_ops,
+ .type = BPF_PROG_TYPE_CGROUP_SOCK
+};
+
static int __init register_sk_filter_ops(void)
{
bpf_register_prog_type(&sk_filter_type);
@@ -3045,6 +3109,7 @@ static int __init register_sk_filter_ops(void)
bpf_register_prog_type(&sched_act_type);
bpf_register_prog_type(&xdp_type);
bpf_register_prog_type(&cg_skb_type);
+ bpf_register_prog_type(&cg_sock_type);
return 0;
}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5ddf5cda07f4..24d2550492ee 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
if (sk->sk_prot->init) {
err = sk->sk_prot->init(sk);
- if (err)
+ if (err) {
+ sk_common_release(sk);
+ goto out;
+ }
+ }
+
+ if (!kern) {
+ err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+ if (err) {
sk_common_release(sk);
+ goto out;
+ }
}
out:
return err;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d424f3a3737a..237e654ba717 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -258,6 +258,14 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
goto out;
}
}
+
+ if (!kern) {
+ err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+ if (err) {
+ sk_common_release(sk);
+ goto out;
+ }
+ }
out:
return err;
out_rcu_unlock:
--
2.1.4
* [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
2016-11-28 15:48 [PATCH net-next v3 0/3] net: Add bpf support to set sk_bound_dev_if David Ahern
2016-11-28 15:48 ` [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
2016-11-28 15:48 ` [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
@ 2016-11-28 15:48 ` David Ahern
2016-11-28 20:37 ` Alexei Starovoitov
2 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2016-11-28 15:48 UTC (permalink / raw)
To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v3
- revert to BPF_PROG_TYPE_CGROUP_SOCK prog type
v2
- removed bpf_sock_store_u32 references
- changed BPF_CGROUP_INET_SOCK_CREATE to BPF_CGROUP_INET_SOCK
- remove BPF_PROG_TYPE_CGROUP_SOCK prog type and add prog_subtype
samples/bpf/Makefile | 2 ++
samples/bpf/test_cgrp2_sock.c | 83 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+)
create mode 100644 samples/bpf/test_cgrp2_sock.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index fb17206ddb57..7de5cd9849c2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -23,6 +23,7 @@ hostprogs-y += map_perf_test
hostprogs-y += test_overhead
hostprogs-y += test_cgrp2_array_pin
hostprogs-y += test_cgrp2_attach
+hostprogs-y += test_cgrp2_sock
hostprogs-y += xdp1
hostprogs-y += xdp2
hostprogs-y += test_current_task_under_cgroup
@@ -51,6 +52,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
+test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
# reuse xdp1 source intentionally
xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
new file mode 100644
index 000000000000..2831e5f41f86
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -0,0 +1,83 @@
+/* eBPF example program:
+ *
+ * - Loads eBPF program
+ *
+ * The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
+ * sockets opened by processes in the cgroup.
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/bpf.h>
+
+#include "libbpf.h"
+
+static int prog_load(int idx)
+{
+ struct bpf_insn prog[] = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_MOV64_IMM(BPF_REG_3, idx),
+ BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+ BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+ BPF_EXIT_INSN(),
+ };
+
+ return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+ "GPL", 0);
+}
+
+static int usage(const char *argv0)
+{
+ printf("Usage: %s cg-path device-index\n", argv0);
+ return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+ int cg_fd, prog_fd, ret;
+ int idx = 0;
+
+ if (argc < 3)
+ return usage(argv[0]);
+
+ idx = atoi(argv[2]);
+ if (!idx) {
+ printf("Invalid device index\n");
+ return EXIT_FAILURE;
+ }
+
+ cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+ if (cg_fd < 0) {
+ printf("Failed to open cgroup path: '%s'\n", strerror(errno));
+ return EXIT_FAILURE;
+ }
+
+ prog_fd = prog_load(idx);
+ printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
+
+ if (prog_fd < 0) {
+ printf("Failed to load prog: '%s'\n", strerror(errno));
+ return EXIT_FAILURE;
+ }
+
+ ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK);
+ ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK);
+ if (ret < 0) {
+ printf("Failed to attach prog to cgroup: '%s'\n",
+ strerror(errno));
+ return EXIT_FAILURE;
+ }
+
+ return EXIT_SUCCESS;
+}
--
2.1.4
* Re: [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type
2016-11-28 15:48 ` [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-11-28 20:06 ` Alexei Starovoitov
2016-11-28 20:31 ` David Ahern
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2016-11-28 20:06 UTC (permalink / raw)
To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On Mon, Nov 28, 2016 at 07:48:48AM -0800, David Ahern wrote:
> Code move only; no functional change intended.
not quite...
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
...
> * @sk: The socket sending or receiving traffic
> @@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk,
>
> prog = rcu_dereference(cgrp->bpf.effective[type]);
> if (prog) {
> - unsigned int offset = skb->data - skb_network_header(skb);
> -
> - __skb_push(skb, offset);
> - ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
> - __skb_pull(skb, offset);
> + switch (type) {
> + case BPF_CGROUP_INET_INGRESS:
> + case BPF_CGROUP_INET_EGRESS:
> + ret = __cgroup_bpf_run_filter_skb(skb, prog);
> + break;
hmm. what's the point of the double jump table? It's only burning cycles
in the fast path. We already have
prog = rcu_dereference(cgrp->bpf.effective[type]); if (prog) {...}
Could you do a variant of __cgroup_bpf_run_filter() instead?
That doesn't take 'skb' as an argument.
It will also solve scary looking NULL skb from patch 2:
__cgroup_bpf_run_filter(sk, NULL, ...
and to avoid copy-pasting first dozen lines of current
__cgroup_bpf_run_filter can be moved into a helper that
__cgroup_bpf_run_filter_skb and
__cgroup_bpf_run_filter_sk will call.
Or some other way to rearrange that code.
* Re: [PATCH net-next v3 1/3] bpf: Refactor cgroups code in prep for new type
2016-11-28 20:06 ` Alexei Starovoitov
@ 2016-11-28 20:31 ` David Ahern
0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2016-11-28 20:31 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On 11/28/16 1:06 PM, Alexei Starovoitov wrote:
> On Mon, Nov 28, 2016 at 07:48:48AM -0800, David Ahern wrote:
>> Code move only; no functional change intended.
>
> not quite...
>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> * @sk: The socket sending or receiving traffic
>> @@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk,
>>
>> prog = rcu_dereference(cgrp->bpf.effective[type]);
>> if (prog) {
>> - unsigned int offset = skb->data - skb_network_header(skb);
>> -
>> - __skb_push(skb, offset);
>> - ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
>> - __skb_pull(skb, offset);
>> + switch (type) {
>> + case BPF_CGROUP_INET_INGRESS:
>> + case BPF_CGROUP_INET_EGRESS:
>> + ret = __cgroup_bpf_run_filter_skb(skb, prog);
>> + break;
>
> hmm. what's the point of the double jump table? It's only burning cycles
> in the fast path. We already have
> prog = rcu_dereference(cgrp->bpf.effective[type]); if (prog) {...}
> Could you do a variant of __cgroup_bpf_run_filter() instead?
> That doesn't take 'skb' as an argument.
> It will also solve scary looking NULL skb from patch 2:
> __cgroup_bpf_run_filter(sk, NULL, ...
>
> and to avoid copy-pasting first dozen lines of current
> __cgroup_bpf_run_filter can be moved into a helper that
> __cgroup_bpf_run_filter_skb and
> __cgroup_bpf_run_filter_sk will call.
> Or some other way to rearrange that code.
>
sure
1. rename the existing __cgroup_bpf_run_filter to __cgroup_bpf_run_filter_skb
2. create a new __cgroup_bpf_run_filter_sk for this new program type. The new run_filter does not need the family or full sock checks for this use case, so the common code is only the sock_cgroup_ptr and prog lookups.
* Re: [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications
2016-11-28 15:48 ` [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications David Ahern
@ 2016-11-28 20:32 ` Alexei Starovoitov
2016-11-28 20:57 ` David Ahern
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2016-11-28 20:32 UTC (permalink / raw)
To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On Mon, Nov 28, 2016 at 07:48:49AM -0800, David Ahern wrote:
> Add a new cgroup-based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
> BPF_PROG_TYPE_CGROUP_SKB, programs can be attached to a cgroup and run
> any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
> Currently only sk_bound_dev_if is exported to userspace for modification
> by a bpf program.
>
> This allows a cgroup to be configured such that AF_INET{6} sockets opened
> by processes are automatically bound to a specific device. In turn, this
> enables the running of programs that do not support SO_BINDTODEVICE in a
> specific VRF context / L3 domain.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
...
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 1f09c521adfe..808e158742a2 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -408,7 +408,7 @@ struct bpf_prog {
> enum bpf_prog_type type; /* Type of BPF program */
> struct bpf_prog_aux *aux; /* Auxiliary fields */
> struct sock_fprog_kern *orig_prog; /* Original BPF program */
> - unsigned int (*bpf_func)(const struct sk_buff *skb,
> + unsigned int (*bpf_func)(const void *ctx,
> const struct bpf_insn *filter);
Daniel already tweaked it. pls rebase.
> +static const struct bpf_func_proto *
> +cg_sock_func_proto(enum bpf_func_id func_id)
> +{
> + return NULL;
> +}
if you don't want any helpers, just don't set .get_func_proto.
See check_call() in verifier.
Though why not allow socket filter like helpers that
sk_filter_func_proto() provides?
tail call, bpf_trace_printk, maps are useful things that you get for free.
Developing programs without bpf_trace_printk is pretty hard.
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 5ddf5cda07f4..24d2550492ee 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
>
> if (sk->sk_prot->init) {
> err = sk->sk_prot->init(sk);
> - if (err)
> + if (err) {
> + sk_common_release(sk);
> + goto out;
> + }
> + }
> +
> + if (!kern) {
> + err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
i guess from vrf use case point of view this is the best place,
since so_bindtodevice can still override it,
but thinking a little bit ahead to other use cases like port binding
restrictions and port rewrites, can we move it into inet_bind?
My understanding is that nothing will be using bound_dev_if until that
time, so we can set it there?
And at that point we can extend 'struct bpf_sock' with other
fields like port and sockaddr...
and single BPF_PROG_TYPE_CGROUP_SOCK type will be used for
vrf and port binding use cases...
More users, more testing of that code path...
* Re: [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
2016-11-28 15:48 ` [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
@ 2016-11-28 20:37 ` Alexei Starovoitov
2016-11-28 20:47 ` David Ahern
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2016-11-28 20:37 UTC (permalink / raw)
To: David Ahern; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On Mon, Nov 28, 2016 at 07:48:50AM -0800, David Ahern wrote:
> Add a simple program to demonstrate the ability to attach a bpf program
> to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
> are created.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
...
> +static int prog_load(int idx)
> +{
> + struct bpf_insn prog[] = {
> + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> + BPF_MOV64_IMM(BPF_REG_3, idx),
> + BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
> + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
> + BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
> + BPF_EXIT_INSN(),
> + };
> +
> + return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
> + "GPL", 0);
> +}
the program looks trivial enough :)
Could you integrate it into iproute2 as well ?
Then the whole vrf management will be easier.
The user wouldn't even need to be aware that iproute2 sets up
this program. It will know ifindex and can delete
the prog when vrf configs change and so on.
Also please convert this sample into automated test like samples/bpf/*.sh
we're going to move all of them to tools/testing/selftests/ eventually.
* Re: [PATCH net-next v3 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
2016-11-28 20:37 ` Alexei Starovoitov
@ 2016-11-28 20:47 ` David Ahern
0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2016-11-28 20:47 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On 11/28/16 1:37 PM, Alexei Starovoitov wrote:
> On Mon, Nov 28, 2016 at 07:48:50AM -0800, David Ahern wrote:
>> Add a simple program to demonstrate the ability to attach a bpf program
>> to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
>> are created.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> +static int prog_load(int idx)
>> +{
>> + struct bpf_insn prog[] = {
>> + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
>> + BPF_MOV64_IMM(BPF_REG_3, idx),
>> + BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
>> + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
>> + BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
>> + BPF_EXIT_INSN(),
>> + };
>> +
>> + return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
>> + "GPL", 0);
>> +}
>
> the program looks trivial enough :)
>
> Could you integrate it into iproute2 as well ?
yes, that is the plan. iproute2 can be used for all things vrf. As infra goes into the kernel, support is added to iproute2.
> Then the whole vrf management will be easier.
> The user wouldn't even need to be aware that iproute2 sets up
> this program. It will know ifindex and can delete
> the prog when vrf configs change and so on.
>
> Also please convert this sample into automated test like samples/bpf/*.sh
> we're going to move all of them to tools/testing/selftests/ eventually.
>
ok
* Re: [PATCH net-next v3 2/3] bpf: Add new cgroup attach type to enable sock modifications
2016-11-28 20:32 ` Alexei Starovoitov
@ 2016-11-28 20:57 ` David Ahern
0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2016-11-28 20:57 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
On 11/28/16 1:32 PM, Alexei Starovoitov wrote:
> On Mon, Nov 28, 2016 at 07:48:49AM -0800, David Ahern wrote:
>> Add a new cgroup-based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
>> BPF_PROG_TYPE_CGROUP_SKB, programs can be attached to a cgroup and run
>> any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
>> Currently only sk_bound_dev_if is exported to userspace for modification
>> by a bpf program.
>>
>> This allows a cgroup to be configured such that AF_INET{6} sockets opened
>> by processes are automatically bound to a specific device. In turn, this
>> enables the running of programs that do not support SO_BINDTODEVICE in a
>> specific VRF context / L3 domain.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 1f09c521adfe..808e158742a2 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -408,7 +408,7 @@ struct bpf_prog {
>> enum bpf_prog_type type; /* Type of BPF program */
>> struct bpf_prog_aux *aux; /* Auxiliary fields */
>> struct sock_fprog_kern *orig_prog; /* Original BPF program */
>> - unsigned int (*bpf_func)(const struct sk_buff *skb,
>> + unsigned int (*bpf_func)(const void *ctx,
>> const struct bpf_insn *filter);
>
> Daniel already tweaked it. pls rebase.
ack
>
>> +static const struct bpf_func_proto *
>> +cg_sock_func_proto(enum bpf_func_id func_id)
>> +{
>> + return NULL;
>> +}
>
> if you don't want any helpers, just don't set .get_func_proto.
> See check_call() in verifier.
ack.
> Though why not allow socket filter like helpers that
> sk_filter_func_proto() provides?
> tail call, bpf_trace_printk, maps are useful things that you get for free.
> Developing programs without bpf_trace_printk is pretty hard.
this use case was trivial enough, but in general I get your point; I will use sk_filter_func_proto.
>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 5ddf5cda07f4..24d2550492ee 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
>>
>> if (sk->sk_prot->init) {
>> err = sk->sk_prot->init(sk);
>> - if (err)
>> + if (err) {
>> + sk_common_release(sk);
>> + goto out;
>> + }
>> + }
>> +
>> + if (!kern) {
>> + err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
>
> i guess from vrf use case point of view this is the best place,
> since so_bindtodevice can still override it,
> but thinking little bit into other use case like port binding
> restrictions and port rewrites can we move it into inet_bind ?
Deferring to inet_bind won't work for a number of use cases (e.g., udp, raw).
> My understanding nothing will be using bound_dev_if until that
> time, so we can set it there?
And yes, I do want to allow a sufficiently privileged process to override the inherited setting. For example, the shell is in the management VRF cgroup and the root user wants to run a program that sends packets out a data-plane VRF using an option built into the program. The sequence is:
1. socket - inherits sk_bound_dev_if from the bpf program attached to the mgmt cgroup
2. setsockopt( new vrf )
3. connect - lookups to the remote address use the vrf from step 2.
Thanks for the review.