RFC PATCH net-next 00/15] bpf: Add new SOCKET

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type
@ 2017-06-13 17:59 Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 01/15] net: BPF support for socket ops Lawrence Brakmo
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and setting
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCKET_OPS program can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsocketop BPF helper function.

Example operands of the first type are:
  BPF_SOCKET_OPS_TIMEOUT_INIT
  BPF_SOCKET_OPS_RWND_INIT
  BPF_SOCKET_OPS_NEEDS_ECN

Example operands of the secont type are:
  BPF_SOCKET_OPS_TCP_CONNECT_CB
  BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB

Current operands are only called during connection establishment so
there should not be any BPF overheads after connection establishment. The
main idea is to use connection information form both hosts, such as IP
addresses and ports to allow setting of per connection parameters to
optimize the connection's peformance.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
disticnt advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

Currently there is functionality to load one global BPF program of this
type but I plan to add support for loading per cgroup socket ops BPF
programs in the near future. When that is done, the global program could
be called when a cgroup has no program associated with it.

One question is whether I should add this functionality into David Ahern's
BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf type. Whereas the
current cgroup_sock type expects to be called only once during a connection's
lifetime, the new socket_ops type could be called multipe times. My preference
is to define a new cgroup BPF program type (BPF_PROG_TYPE_CGROUP_SOCKET_OPS)

This patche also includes sample BPF programs to demostrate the differnet
features.

Consists of the following patches:

[RFC PATCH net-next 01/15] net: BPF support for socket ops
[RFC PATCH net-next 02/15] bpf: program to load socketops BPF
[RFC PATCH net-next 03/15] bpf: Support for per connection
[RFC PATCH net-next 04/15] bpf: Sample bpf program to set SYN/SYN-ACK
[RFC PATCH net-next 05/15] bpf: Support for setting initial receive
[RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial
[RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf
[RFC PATCH net-next 08/15] bpf: Add TCP connection BPF callbacks
[RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer
[RFC PATCH net-next 10/15] bpf: Add support for changing congestion
[RFC PATCH net-next 11/15] bpf: Sample BPF program to set congestion
[RFC PATCH net-next 12/15] bpf: Adds support for setting initial cwnd
[RFC PATCH net-next 13/15] bpf: Sample BPF program to set initial
[RFC PATCH net-next 14/15] bpf: Adds support for setting sndcwnd
[RFC PATCH net-next 15/15] bpf: Sample bpf program to set sndcwnd

 include/linux/bpf.h           |   6 +
 include/linux/bpf_types.h     |   1 +
 include/linux/filter.h        |  10 ++
 include/net/tcp.h             |  57 +++++++++-
 include/uapi/linux/bpf.h      |  66 ++++++++++-
 kernel/bpf/syscall.c          |   3 +
 net/core/Makefile             |   3 +-
 net/core/filter.c             | 258 ++++++++++++++++++++++++++++++++++++++++++
 net/core/sock_bpfops.c        |  67 +++++++++++
 net/ipv4/tcp.c                |   2 +-
 net/ipv4/tcp_cong.c           |  15 ++-
 net/ipv4/tcp_fastopen.c       |   1 +
 net/ipv4/tcp_input.c          |  10 +-
 net/ipv4/tcp_minisocks.c      |   9 +-
 net/ipv4/tcp_output.c         |  18 ++-
 samples/bpf/Makefile          |   9 ++
 samples/bpf/bpf_helpers.h     |   3 +
 samples/bpf/bpf_load.c        |  13 ++-
 samples/bpf/tcp_bpf.c         |  81 +++++++++++++
 samples/bpf/tcp_bufs_kern.c   |  71 ++++++++++++
 samples/bpf/tcp_clamp_kern.c  |  86 ++++++++++++++
 samples/bpf/tcp_cong_kern.c   |  67 +++++++++++
 samples/bpf/tcp_iw_kern.c     |  73 ++++++++++++
 samples/bpf/tcp_rwnd_kern.c   |  54 +++++++++
 samples/bpf/tcp_synrto_kern.c |  53 +++++++++
 25 files changed, 1015 insertions(+), 21 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 01/15] net: BPF support for socket ops
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:12   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 02/15] bpf: program to load socketops BPF programs Lawrence Brakmo
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). Currently there is
functionality to load one global BPF program of this type which can be
called at appropriate times to set relevant connection parameters such
as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection
information such as IP addresses, port numbers, etc.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
disticnt advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.

I plan to add support for loading per cgroup socket ops BPF programs in
the near future. One question is whether I should add this functionality
into David Ahern's BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf
type. Whereas the current cgroup_sock type expects to be called only once
during a connection's lifetime, the new socket_ops type could be called
multipe times. For example, before sending SYN and SYN-ACKs to set an
appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel one for the user/BPF
program):

/* kernel version */
struct bpf_socket_ops_kern {
        struct sock *sk;
	__u32  is_req_sock:1;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version */
struct bpf_socket_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;
        __u32 local_ip4;
        __u32 remote_ip6[4];
        __u32 local_ip6[4];
        __u32 remote_port;
        __u32 local_port;
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/linux/bpf.h       |   6 ++
 include/linux/bpf_types.h |   1 +
 include/linux/filter.h    |  10 +++
 include/net/tcp.h         |  27 ++++++++
 include/uapi/linux/bpf.h  |  28 +++++++++
 kernel/bpf/syscall.c      |   3 +
 net/core/Makefile         |   3 +-
 net/core/filter.c         | 157 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/sock_bpfops.c    |  67 ++++++++++++++++++++
 samples/bpf/bpf_load.c    |  13 +++-
 10 files changed, 311 insertions(+), 4 deletions(-)
 create mode 100644 net/core/sock_bpfops.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fcc80ca..b5058e4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -360,4 +360,10 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
+/* socket_ops related */
+struct sock;
+struct bpf_socket_ops_kern;
+
+int bpf_socket_ops_set_prog(int fd);
+int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket);
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 03bf223..ca69d10 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -10,6 +10,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit_prog_ops)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_OPS, socket_ops_prog_ops)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a20ba40..8f69b8b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -899,4 +899,14 @@ static inline int bpf_tell_extensions(void)
 	return SKF_AD_MAX;
 }
 
+struct bpf_socket_ops_kern {
+	struct	sock *sk;
+	__u32	is_req_sock:1;
+	__u32	op;
+	union {
+		__u32 reply;
+		__u32 replylong[4];
+	};
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 82462db..bc1c92c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -46,6 +46,9 @@
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
 
+#include <linux/bpf.h>
+#include <linux/filter.h>
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern struct percpu_counter tcp_orphan_count;
@@ -1992,4 +1995,28 @@ static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+/* Call BPF_SOCKET_OPS program that returns an int. If the return value
+ * is < 0, then the BPF op failed (for example if the loaded BPF
+ * program does not support the chosen operation or there is no BPF
+ * program loaded).
+ */
+#ifdef CONFIG_BPF
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	struct bpf_socket_ops_kern socket_ops;
+
+	memset(&socket_ops, 0, sizeof(socket_ops));
+	socket_ops.sk = sk;
+	socket_ops.is_req_sock = is_req_sock ? 1 : 0;
+	socket_ops.op = op;
+
+	return bpf_socket_ops_call(&socket_ops);
+}
+#else
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	return -1;
+}
+#endif
+
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e78aece..06e7ee1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,6 +82,7 @@ enum bpf_cmd {
 	BPF_PROG_ATTACH,
 	BPF_PROG_DETACH,
 	BPF_PROG_TEST_RUN,
+	BPF_PROG_LOAD_SOCKET_OPS,
 };
 
 enum bpf_map_type {
@@ -115,6 +116,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LWT_IN,
 	BPF_PROG_TYPE_LWT_OUT,
 	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCKET_OPS,
 };
 
 enum bpf_attach_type {
@@ -673,4 +675,30 @@ struct xdp_md {
 	__u32 data_end;
 };
 
+/* User bpf_socket_ops struct to access socket values and specify request ops
+ * and their replies.
+ * New fields can only be added at the end of this structure
+ */
+struct bpf_socket_ops {
+	__u32 op;
+	union {
+		__u32 reply;
+		__u32 replylong[4];
+	};
+	__u32 family;
+	__u32 remote_ip4;
+	__u32 local_ip4;
+	__u32 remote_ip6[4];
+	__u32 local_ip6[4];
+	__u32 remote_port;
+	__u32 local_port;
+};
+
+/* List of known BPF socket_ops operators.
+ * New entries can only be added at the end
+ */
+enum {
+	BPF_SOCKET_OPS_VOID,
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 59da103..5ef8c01 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1076,6 +1076,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_TEST_RUN:
 		err = bpf_prog_test_run(&attr, uattr);
 		break;
+	case BPF_PROG_LOAD_SOCKET_OPS:
+		err = bpf_socket_ops_set_prog(attr.bpf_fd);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/net/core/Makefile b/net/core/Makefile
index 79f9479..5d711c2 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
 obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
-			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
+			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
+			sock_bpfops.o
 
 obj-$(CONFIG_XFRM) += flow.o
 obj-y += net-sysfs.o
diff --git a/net/core/filter.c b/net/core/filter.c
index 946f758..7b61690 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3053,6 +3053,37 @@ void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
+static bool __is_valid_socket_ops_access(int off, int size,
+					 enum bpf_access_type type)
+{
+	if (off < 0 || off >= sizeof(struct bpf_socket_ops))
+		return false;
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+	if (size != sizeof(__u32))
+		return false;
+
+	return true;
+}
+
+static bool socket_ops_is_valid_access(int off, int size,
+				       enum bpf_access_type type,
+				       enum bpf_reg_type *reg_type)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_socket_ops, op) ...
+		     offsetof(struct bpf_socket_ops, replylong[3]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	return __is_valid_socket_ops_access(off, size, type);
+}
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -3322,6 +3353,126 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static u32 socket_ops_convert_ctx_access(enum bpf_access_type type,
+					 const struct bpf_insn *si,
+					 struct bpf_insn *insn_buf,
+					 struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_socket_ops, op) ...
+	     offsetof(struct bpf_socket_ops, replylong[3]):
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, op);
+		off += offsetof(struct bpf_socket_ops_kern, op);
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		break;
+
+	case offsetof(struct bpf_socket_ops, family):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_family));
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_daddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_rcv_saddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_ip6[0]) ...
+		offsetof(struct bpf_socket_ops, remote_ip6[3]):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_daddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, remote_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_daddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_ip6[0]) ...
+		offsetof(struct bpf_socket_ops, local_ip6[3]):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, local_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_rcv_saddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_dport));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 16);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_num));
+		break;
+	}
+	return insn - insn_buf;
+}
+
 const struct bpf_verifier_ops sk_filter_prog_ops = {
 	.get_func_proto		= sk_filter_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
@@ -3371,6 +3522,12 @@ const struct bpf_verifier_ops cg_sock_prog_ops = {
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
 
+const struct bpf_verifier_ops socket_ops_prog_ops = {
+	.get_func_proto		= bpf_base_func_proto,
+	.is_valid_access	= socket_ops_is_valid_access,
+	.convert_ctx_access	= socket_ops_convert_ctx_access,
+};
+
 int sk_detach_filter(struct sock *sk)
 {
 	int ret = -ENOENT;
diff --git a/net/core/sock_bpfops.c b/net/core/sock_bpfops.c
new file mode 100644
index 0000000..8f8daa5
--- /dev/null
+++ b/net/core/sock_bpfops.c
@@ -0,0 +1,67 @@
+/*
+ * BPF support for sockets
+ *
+ * Copyright (c) 2016 Lawrence Brakmo <brakmo@fb.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#include <net/sock.h>
+#include <linux/skbuff.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/errno.h>
+#ifdef CONFIG_NET_NS
+#include <net/net_namespace.h>
+#include <linux/proc_ns.h>
+#endif
+
+/* Global BPF program for sockets */
+static struct bpf_prog *bpf_socket_ops_prog;
+static DEFINE_RWLOCK(bpf_socket_ops_lock);
+
+int bpf_socket_ops_set_prog(int fd)
+{
+	int err = 0;
+
+	write_lock(&bpf_socket_ops_lock);
+	if (bpf_socket_ops_prog) {
+		bpf_prog_put(bpf_socket_ops_prog);
+		bpf_socket_ops_prog = NULL;
+	}
+
+	/* fd of zero is used as a signal to remove the current
+	 * bpf_socket_ops_prog.
+	 */
+	if (fd == 0) {
+		write_unlock(&bpf_socket_ops_lock);
+		return 1;
+	}
+
+	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
+	if (IS_ERR(bpf_socket_ops_prog)) {
+		bpf_prog_put(bpf_socket_ops_prog);
+		bpf_socket_ops_prog = NULL;
+		err = 1;
+	}
+	write_unlock(&bpf_socket_ops_lock);
+	return err;
+}
+
+int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket)
+{
+	int ret;
+
+	read_lock(&bpf_socket_ops_lock);
+	if (bpf_socket_ops_prog) {
+		rcu_read_lock();
+		ret = (int)BPF_PROG_RUN(bpf_socket_ops_prog, bpf_socket);
+		rcu_read_unlock();
+	} else {
+		ret = -1;
+	}
+	read_unlock(&bpf_socket_ops_lock);
+	return ret;
+}
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index a91c57d..c18d713 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -64,6 +64,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
@@ -89,6 +90,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_CGROUP_SKB;
 	} else if (is_cgroup_sk) {
 		prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+	} else if (is_sockops) {
+		prog_type = BPF_PROG_TYPE_SOCKET_OPS;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -106,8 +109,11 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
 		return 0;
 
-	if (is_socket) {
-		event += 6;
+	if (is_socket || is_sockops) {
+		if (is_socket)
+			event += 6;
+		else
+			event += 7;
 		if (*event != '/')
 			return 0;
 		event++;
@@ -560,7 +566,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "xdp", 3) == 0 ||
 		    memcmp(shname, "perf_event", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0 ||
-		    memcmp(shname, "cgroup/", 7) == 0)
+		    memcmp(shname, "cgroup/", 7) == 0 ||
+		    memcmp(shname, "sockops", 7) == 0)
 			load_and_attach(shname, data->d_buf, data->d_size);
 	}
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 02/15] bpf: program to load socketops BPF programs
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 01/15] net: BPF support for socket ops Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs Lawrence Brakmo
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

The program tcp_bpf can be used to remove current global sockops program
and to load/replace sockops BPF programs. There is also an option to
print the bpf trace buffer (for debugging purposes).

USAGE:
  ./tcp_bpf [-r] [-l] [<pname>]
WHERE:
  -r      remove current loaded socketops BPF program
          not needed if loading a new program
  -l      print BPF trace buffer. Used when loading a new program
  <pname> name of BPF sockeops program to load
          if <pname> does not end in ".o", then "_kern.o" is appended
          example: using tcp_rto will load tcp_rto_kern.o

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile  |  3 ++
 samples/bpf/tcp_bpf.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 samples/bpf/tcp_bpf.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 6c7468e..b13436c 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -36,6 +36,7 @@ hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 hostprogs-y += test_map_in_map
 hostprogs-y += per_socket_stats_example
+hostprogs-y += tcp_bpf
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -52,6 +53,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+tcp_bpf-objs := bpf_load.o $(LIBBPF) tcp_bpf.o
 test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
 lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
@@ -130,6 +132,7 @@ HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
+HOSTLOADLIBES_tcp_bpf += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
 HOSTLOADLIBES_trace_output += -lelf -lrt
 HOSTLOADLIBES_lathist += -lelf
diff --git a/samples/bpf/tcp_bpf.c b/samples/bpf/tcp_bpf.c
new file mode 100644
index 0000000..41d2124
--- /dev/null
+++ b/samples/bpf/tcp_bpf.c
@@ -0,0 +1,81 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+#include <unistd.h>
+#include <linux/unistd.h>
+
+static void usage(char *pname)
+{
+	printf("USAGE:\n  %s [-r] [-l] <pname>\n", pname);
+	printf("WHERE:\n");
+	printf("  -r      remove current loaded socketops BPF program\n");
+	printf("          not needed if loading a new program\n");
+	printf("  -l      print out BPF log buffer\n");
+	printf("  <pname> name of BPF sockeops program to load\n");
+	printf("          if <pname> does not end in \".o\", then \"_kern.o\" "
+	       "is appended\n");
+	printf("          example: using tcp1 will load tcp1_kern.o\n");
+	printf("\n");
+	exit(1);
+}
+
+int main(int argc, char **argv)
+{
+	char fn[500];
+	int k, logFlag = 0;
+	int error = -1;
+	union bpf_attr attr;
+
+	if (argc <= 1)
+		usage(argv[0]);
+	for (k = 1; k < argc; k++) {
+		if (!strcmp(argv[k], "-r")) {
+			/* A fd of zero is used as signal to remove the
+			 * current SOCKET_OPS program
+			 */
+			attr.bpf_fd = 0;
+			syscall(__NR_bpf, BPF_PROG_LOAD_SOCKET_OPS, &attr,
+				sizeof(attr));
+		} else if (!strcmp(argv[k], "-l")) {
+			logFlag = 1;
+		} else if (!strcmp(argv[k], "-h")) {
+			usage(argv[0]);
+		} else if (argv[k][0] == '-') {
+			printf("Error, unknown flag: %s\n", argv[k]);
+			exit(2);
+		} else if (strlen(argv[k]) > 450) {
+			printf("Error, program name too long %d\n",
+			       (int) strlen(argv[k]));
+			exit(3);
+		} else {
+			if (!strcmp(argv[k]+strlen(argv[k])-2, ".o"))
+				strcpy(fn, argv[k]);
+			else
+				sprintf(fn, "%s_kern.o", argv[k]);
+			if (logFlag)
+				printf("loading bpf file:%s\n", fn);
+			if (load_bpf_file(fn)) {
+				printf("%s", bpf_log_buf);
+				return 1;
+			}
+			if (logFlag) {
+				printf("TCP BPF Loaded %s\n", fn);
+				printf("%s\n", bpf_log_buf);
+			}
+			attr.bpf_fd = prog_fd[0];
+			error = syscall(__NR_bpf, BPF_PROG_LOAD_SOCKET_OPS,
+					&attr, sizeof(attr));
+			if (error) {
+				printf("ERROR: syscall(BPF_PROG_SOCKET_OPS: %d\n",
+				       error);
+				return 2;
+			}
+			if (logFlag)
+				read_trace_pipe();
+		}
+	}
+	return 0;
+}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 03/15] bpf: Support for per connection  SYN/SYN-ACK RTOs
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 01/15] net: BPF support for socket ops Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 02/15] bpf: program to load socketops BPF programs Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

This patch adds support for setting a per connection SYN and
SYN_ACK RTOs from within a BPF_SOCKET_OPS program. For example,
to set small RTOs when it is known both hosts are within a
datacenter.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/tcp.h        | 11 +++++++++++
 include/uapi/linux/bpf.h |  3 +++
 net/ipv4/tcp_input.c     |  3 ++-
 net/ipv4/tcp_output.c    |  2 +-
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index bc1c92c..8353563 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2019,4 +2019,15 @@ static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
 }
 #endif
 
+static inline u32 tcp_timeout_init(struct sock *sk, bool is_req_sock)
+{
+	int timeout;
+
+	timeout = tcp_call_bpf(sk, is_req_sock, BPF_SOCKET_OPS_TIMEOUT_INIT);
+
+	if (timeout <= 0)
+		timeout = TCP_TIMEOUT_INIT;
+	return timeout;
+}
+
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 06e7ee1..c9da55d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -699,6 +699,9 @@ struct bpf_socket_ops {
  */
 enum {
 	BPF_SOCKET_OPS_VOID,
+	BPF_SOCKET_OPS_TIMEOUT_INIT,	/* Should return SYN-RTO value to use or
+					 * -1 if default value should be used
+					 */
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4ea8ec5..fefc062 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6405,7 +6405,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	} else {
 		tcp_rsk(req)->tfo_listener = false;
 		if (!want_cookie)
-			inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+			inet_csk_reqsk_queue_hash_add(sk, req,
+				tcp_timeout_init((struct sock *)req, true));
 		af_ops->send_synack(sk, dst, &fl, req, &foc,
 				    !want_cookie ? TCP_SYNACK_NORMAL :
 						   TCP_SYNACK_COOKIE);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e3aab1c..503e478 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3326,7 +3326,7 @@ static void tcp_connect_init(struct sock *sk)
 	tp->rcv_wup = tp->rcv_nxt;
 	tp->copied_seq = tp->rcv_nxt;
 
-	inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+	inet_csk(sk)->icsk_rto = tcp_timeout_init(sk, false);
 	inet_csk(sk)->icsk_retransmits = 0;
 	tcp_clear_retrans(tp);
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 04/15] bpf: Sample bpf program to set SYN/SYN-ACK RTOs
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (2 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:13   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile          |  1 +
 samples/bpf/tcp_synrto_kern.c | 53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 samples/bpf/tcp_synrto_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index b13436c..c5008c4 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -113,6 +113,7 @@ always += lwt_len_hist_kern.o
 always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
+always += tcp_synrto_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_synrto_kern.c b/samples/bpf/tcp_synrto_kern.c
new file mode 100644
index 0000000..1af2a34
--- /dev/null
+++ b/samples/bpf/tcp_synrto_kern.c
@@ -0,0 +1,53 @@
+/*
+ * BPF program to set SYN and SYN-ACK RTOs to 10ms when using IPv6 addresses
+ * and the first 5.5 bytes of the IPv6 addresses are the same (in this example
+ * that means both hosts are in the same datacenter.
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_synrto(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = -1;
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Check for TIMEOUT_INIT operation and IPv6 addresses
+	if (op == BPF_SOCKET_OPS_TIMEOUT_INIT &&
+		skops->family == AF_INET6) {
+
+		// If the first 5.5 bytes of the IPv6 address are the same
+		// then both hosts are in the same datacenter
+		// so use an RTO of 10ms
+		if (skops->local_ip6[0] == skops->remote_ip6[0] &&
+		    (skops->local_ip6[1] & 0xfff00000) ==
+		    (skops->remote_ip6[1] & 0xfff00000))
+			rv = 10;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (3 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:14   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

This patch adds suppport for setting the initial advertized window from
within a BPF_SOCKET_OPS program. This can be used to support larger
initial cwnd values in environments where it is known to be safe.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/tcp.h        | 10 ++++++++++
 include/uapi/linux/bpf.h |  4 ++++
 net/ipv4/tcp_minisocks.c |  9 ++++++++-
 net/ipv4/tcp_output.c    |  7 ++++++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8353563..dc20e86 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2030,4 +2030,14 @@ static inline u32 tcp_timeout_init(struct sock *sk, bool is_req_sock)
 	return timeout;
 }
 
+static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool is_req_sock)
+{
+	int rwnd;
+
+	rwnd = tcp_call_bpf(sk, is_req_sock, BPF_SOCKET_OPS_RWND_INIT);
+
+	if (rwnd < 0)
+		rwnd = 0;
+	return rwnd;
+}
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c9da55d..b2af839 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -702,6 +702,10 @@ enum {
 	BPF_SOCKET_OPS_TIMEOUT_INIT,	/* Should return SYN-RTO value to use or
 					 * -1 if default value should be used
 					 */
+	BPF_SOCKET_OPS_RWND_INIT,	/* Should return initial advertized
+					 * window (in packets) or -1 if default
+					 * value should be used
+					 */
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d0642df..a0032ea 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -352,6 +352,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
 	u32 window_clamp;
 	__u8 rcv_wscale;
 	int mss;
+	u32 rcv_wnd;
 
 	mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
 	window_clamp = READ_ONCE(tp->window_clamp);
@@ -363,6 +364,12 @@ void tcp_openreq_init_rwin(struct request_sock *req,
 	    (req->rsk_window_clamp > full_space || req->rsk_window_clamp == 0))
 		req->rsk_window_clamp = full_space;
 
+	rcv_wnd = tcp_rwnd_init_bpf((struct sock *)req, true);
+	if (rcv_wnd == 0)
+		rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+	else if (full_space < rcv_wnd * mss)
+		full_space = rcv_wnd * mss;
+
 	/* tcp_full_space because it is guaranteed to be the first packet */
 	tcp_select_initial_window(full_space,
 		mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
@@ -370,7 +377,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
 		&req->rsk_window_clamp,
 		ireq->wscale_ok,
 		&rcv_wscale,
-		dst_metric(dst, RTAX_INITRWND));
+		rcv_wnd);
 	ireq->rcv_wscale = rcv_wscale;
 }
 EXPORT_SYMBOL(tcp_openreq_init_rwin);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 503e478..972a985 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3267,6 +3267,7 @@ static void tcp_connect_init(struct sock *sk)
 	const struct dst_entry *dst = __sk_dst_get(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	__u8 rcv_wscale;
+	u32 rcv_wnd;
 
 	/* We'll fix this up when we get a response from the other end.
 	 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
@@ -3299,13 +3300,17 @@ static void tcp_connect_init(struct sock *sk)
 	    (tp->window_clamp > tcp_full_space(sk) || tp->window_clamp == 0))
 		tp->window_clamp = tcp_full_space(sk);
 
+	rcv_wnd = tcp_rwnd_init_bpf(sk, false);
+	if (rcv_wnd == 0)
+		rcv_wnd = dst_metric(dst, RTAX_INITRWND);
+
 	tcp_select_initial_window(tcp_full_space(sk),
 				  tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
 				  &tp->rcv_wnd,
 				  &tp->window_clamp,
 				  sysctl_tcp_window_scaling,
 				  &rcv_wscale,
-				  dst_metric(dst, RTAX_INITRWND));
+				  rcv_wnd);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
 	tp->rcv_ssthresh = tp->rcv_wnd;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (4 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:14   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile        |  1 +
 samples/bpf/tcp_rwnd_kern.c | 54 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 samples/bpf/tcp_rwnd_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index c5008c4..5783d38 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,6 +114,7 @@ always += xdp_tx_iptunnel_kern.o
 always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
+always += tcp_rwnd_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_rwnd_kern.c b/samples/bpf/tcp_rwnd_kern.c
new file mode 100644
index 0000000..958945f
--- /dev/null
+++ b/samples/bpf/tcp_rwnd_kern.c
@@ -0,0 +1,54 @@
+/*
+ * BPF program to set initial receive window to 40 packets when using IPv6
+ * and the first 5.5 bytes of the IPv6 addresses are not the same (in this
+ * example that means both hosts are not the same datacenter.
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_rwnd(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = -1;
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Check for RWND_INIT operation and IPv6 addresses
+	if (op == BPF_SOCKET_OPS_RWND_INIT &&
+		skops->family == AF_INET6) {
+
+		// If the first 5.5 bytes of the IPv6 address are not the same
+		// then both hosts are not in the same datacenter
+		// so use a larger initial advertized window (40 packets)
+		if (skops->local_ip6[0] != skops->remote_ip6[0] ||
+		    (skops->local_ip6[1] & 0xfffff000) !=
+		    (skops->remote_ip6[1] & 0xfffff000))
+			bpf_trace_printk(fmt2, sizeof(fmt2), -1);
+			rv = 40;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (5 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:16   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 08/15] bpf: Add TCP connection BPF callbacks Lawrence Brakmo
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCKET_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/uapi/linux/bpf.h  | 14 ++++++++-
 net/core/filter.c         | 77 ++++++++++++++++++++++++++++++++++++++++++++++-
 samples/bpf/bpf_helpers.h |  3 ++
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b2af839..e229d61 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -495,6 +495,17 @@ union bpf_attr {
  *     Get the owner uid of the socket stored inside sk_buff.
  *     @skb: pointer to skb
  *     Return: uid of the socket owner on success or overflowuid if failed.
+ *
+ * int bpf_setsockopt(sk, level, optname, optval, optlen)
+ *     Calls setsockopt. Not all opts are available, only those with
+ *     integer optvals plus TCP_CONGESTION.
+ *     Supported levels: SOL_SOCKET and IPROTO_TCP
+ *     @sk: pointer to bpf_socket
+ *     @level: SOL_SOCKET or IPROTO_TCP
+ *     @optname: option name
+ *     @optval: pointer to option value
+ *     @optlen: length of optval in byes
+ *     Return: 0 or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -544,7 +555,8 @@ union bpf_attr {
 	FN(xdp_adjust_head),		\
 	FN(probe_read_str),		\
 	FN(get_socket_cookie),		\
-	FN(get_socket_uid),
+	FN(get_socket_uid),		\
+	FN(setsockopt),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 7b61690..6bc8d57 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -54,6 +54,7 @@
 #include <net/dst.h>
 #include <net/sock_reuseport.h>
 #include <net/busy_poll.h>
+#include <net/tcp.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -2653,6 +2654,69 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
+	   int, level, int, optname, char *, optval, int, optlen)
+{
+	int val;
+	int ret = 0;
+	struct sock *sk = bpf_socket->sk;
+
+	if (bpf_socket->is_req_sock)
+		return -EINVAL;
+
+	if (level == SOL_SOCKET) {
+		// Only some socketops are supported
+		val = *((int *)optval);
+
+		switch (optname) {
+		case SO_RCVBUF:
+			sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+			sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
+			break;
+		case SO_SNDBUF:
+			sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
+			sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
+			break;
+		case SO_MAX_PACING_RATE:
+			sk->sk_max_pacing_rate = val;
+			sk->sk_pacing_rate = min(sk->sk_pacing_rate,
+						 sk->sk_max_pacing_rate);
+			break;
+		case SO_PRIORITY:
+			sk->sk_priority = val;
+			break;
+		case SO_RCVLOWAT:
+			if (val < 0)
+				val = INT_MAX;
+			sk->sk_rcvlowat = val ? : 1;
+			break;
+		case SO_MARK:
+			sk->sk_mark = val;
+			break;
+		default:
+			ret = -EINVAL;
+		}
+	} else if (level == SOL_TCP &&
+		   bpf_socket->sk->sk_prot->setsockopt == tcp_setsockopt) {
+		// Place holder
+		ret = -EINVAL;
+	} else {
+		ret = -EINVAL;
+	}
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_setsockopt_proto = {
+	.func		= bpf_setsockopt,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -2808,6 +2872,17 @@ lwt_inout_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+	socket_ops_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_setsockopt:
+		return &bpf_setsockopt_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static const struct bpf_func_proto *
 lwt_xmit_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -3523,7 +3598,7 @@ const struct bpf_verifier_ops cg_sock_prog_ops = {
 };
 
 const struct bpf_verifier_ops socket_ops_prog_ops = {
-	.get_func_proto		= bpf_base_func_proto,
+	.get_func_proto		= socket_ops_func_proto,
 	.is_valid_access	= socket_ops_is_valid_access,
 	.convert_ctx_access	= socket_ops_convert_ctx_access,
 };
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 51e567b..d7514c0 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -60,6 +60,9 @@ static unsigned long long (*bpf_get_prandom_u32)(void) =
 	(void *) BPF_FUNC_get_prandom_u32;
 static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
 	(void *) BPF_FUNC_xdp_adjust_head;
+static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
+			     int optlen) =
+	(void *) BPF_FUNC_setsockopt;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 08/15] bpf: Add TCP connection BPF callbacks
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (6 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 17:59 ` [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Added callbacks to BPF SOCKET_OPS type program before an active
connection is intialized and after a passive or active connection is
established.

The following patch demostrates how they can be used to set send and
receive buffer sizes.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/uapi/linux/bpf.h | 11 +++++++++++
 net/ipv4/tcp_fastopen.c  |  1 +
 net/ipv4/tcp_input.c     |  4 +++-
 net/ipv4/tcp_output.c    |  1 +
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e229d61..31a2641 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -718,6 +718,17 @@ enum {
 					 * window (in packets) or -1 if default
 					 * value should be used
 					 */
+	BPF_SOCKET_OPS_TCP_CONNECT_CB,	/* Calls BPF program right before an
+					 * active connection is initialized
+					 */
+	BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB,	/* Calls BPF program when an
+						 * active connection is
+						 * established
+						 */
+	BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB,	/* Calls BPF program when a
+						 * passive connection is
+						 * established
+						 */
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4af82b9..c3ec4ec 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -221,6 +221,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 	tcp_init_congestion_control(child);
 	tcp_mtup_init(child);
 	tcp_init_metrics(child);
+	tcp_call_bpf(child, false, BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB);
 	tcp_init_buffer_space(child);
 
 	tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fefc062..9cef574 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5571,7 +5571,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 	icsk->icsk_af_ops->rebuild_header(sk);
 
 	tcp_init_metrics(sk);
-
+	tcp_call_bpf(sk, false, BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB);
 	tcp_init_congestion_control(sk);
 
 	/* Prevent spurious tcp_cwnd_restart() on first data
@@ -5977,6 +5977,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		} else {
 			/* Make sure socket is routed, for correct metrics. */
 			icsk->icsk_af_ops->rebuild_header(sk);
+			tcp_call_bpf(sk, false,
+				     BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB);
 			tcp_init_congestion_control(sk);
 
 			tcp_mtup_init(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 972a985..face30c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3444,6 +3444,7 @@ int tcp_connect(struct sock *sk)
 	struct sk_buff *buff;
 	int err;
 
+	tcp_call_bpf(sk, false, BPF_SOCKET_OPS_TCP_CONNECT_CB);
 	tcp_connect_init(sk);
 
 	if (unlikely(tp->repair)) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (7 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 08/15] bpf: Add TCP connection BPF callbacks Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:17   ` David Miller
  2017-06-13 17:59 ` [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile        |  1 +
 samples/bpf/tcp_bufs_kern.c | 71 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)
 create mode 100644 samples/bpf/tcp_bufs_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 5783d38..b38cbe2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -115,6 +115,7 @@ always += test_map_in_map_kern.o
 always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
+always += tcp_bufs_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_bufs_kern.c b/samples/bpf/tcp_bufs_kern.c
new file mode 100644
index 0000000..a407b73
--- /dev/null
+++ b/samples/bpf/tcp_bufs_kern.c
@@ -0,0 +1,71 @@
+/*
+ * BPF program to set initial receive window to 40 packets and send
+ * and receive buffers to 1.5MB. This would usually be done after
+ * doing appropriate checks that indicate the hosts are far enough
+ * away (i.e. large RTT).
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_bufs(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = 0;
+	int rwnd_init = 40;
+	int bufsize = 1500000;
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Usually there would be a check to insure the hosts are far
+	// from each other so it makes sense to increase buffer sizes
+
+	switch (op) {
+	case BPF_SOCKET_OPS_RWND_INIT:
+		rv = rwnd_init;
+		break;
+	case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+		// Set sndbuf and rcvbuf of active connections
+		rv = bpf_setsockopt(skb, SOL_SOCKET, SO_SNDBUF, &bufsize,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET, SO_RCVBUF,
+					      &bufsize, sizeof(bufsize));
+		break;
+	case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+		// Nothing to do
+		break;
+	case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+		// Set sndbuf and rcvbuf of passive connections
+		rv = bpf_setsockopt(skb, SOL_SOCKET, SO_SNDBUF, &bufsize,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET, SO_RCVBUF,
+					     &bufsize, sizeof(bufsize));
+		break;
+	default:
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (8 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
@ 2017-06-13 17:59 ` Lawrence Brakmo
  2017-06-13 21:17   ` David Miller
  2017-06-13 18:00 ` [RFC PATCH net-next 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 17:59 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Added support for changing congestion control for SOCKET_OPS bps
programs through the setsockopt bpf helper function. It also adds
a new SOCKET_OPS op, BPF_SOCKET_OPS_NEEDS_ECN, that is needed for
congestion controls, like dctcp, that need to enable ECN in the
SYN packets.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/tcp.h        |  9 ++++++++-
 include/uapi/linux/bpf.h |  3 +++
 net/core/filter.c        | 11 +++++++++--
 net/ipv4/tcp.c           |  2 +-
 net/ipv4/tcp_cong.c      | 15 ++++++++++-----
 net/ipv4/tcp_input.c     |  3 ++-
 net/ipv4/tcp_output.c    |  8 +++++---
 7 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index dc20e86..24ea413 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1003,7 +1003,9 @@ void tcp_get_default_congestion_control(char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
+void tcp_reinit_congestion_control(struct sock *sk,
+				   const struct tcp_congestion_ops *ca);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
@@ -2040,4 +2042,9 @@ static inline u32 tcp_rwnd_init_bpf(struct sock *sk, bool is_req_sock)
 		rwnd = 0;
 	return rwnd;
 }
+
+static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
+{
+	return (tcp_call_bpf(sk, true, BPF_SOCKET_OPS_NEEDS_ECN) == 1);
+}
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 31a2641..b86aacc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -729,6 +729,9 @@ enum {
 						 * passive connection is
 						 * established
 						 */
+	BPF_SOCKET_OPS_NEEDS_ECN,	/* If connection's congestion control
+					 * needs ECN
+					 */
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 6bc8d57..5e0cb4e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2698,8 +2698,15 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
 		}
 	} else if (level == SOL_TCP &&
 		   bpf_socket->sk->sk_prot->setsockopt == tcp_setsockopt) {
-		// Place holder
-		ret = -EINVAL;
+		if (optname == TCP_CONGESTION) {
+			ret = tcp_set_congestion_control(sk, optval, false);
+			if (!ret && bpf_socket->op > BPF_SOCKET_OPS_NEEDS_ECN)
+				// replacing an existing ca
+				tcp_reinit_congestion_control(sk,
+					inet_csk(sk)->icsk_ca_ops);
+		} else {
+			ret = -EINVAL;
+		}
 	} else {
 		ret = -EINVAL;
 	}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f7be94f..c976871 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2457,7 +2457,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		name[val] = 0;
 
 		lock_sock(sk);
-		err = tcp_set_congestion_control(sk, name);
+		err = tcp_set_congestion_control(sk, name, true);
 		release_sock(sk);
 		return err;
 	}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 6e3c512..136f392 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -188,8 +188,8 @@ void tcp_init_congestion_control(struct sock *sk)
 		INET_ECN_dontxmit(sk);
 }
 
-static void tcp_reinit_congestion_control(struct sock *sk,
-					  const struct tcp_congestion_ops *ca)
+void tcp_reinit_congestion_control(struct sock *sk,
+				   const struct tcp_congestion_ops *ca)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -333,7 +333,7 @@ int tcp_set_allowed_congestion_control(char *val)
 }
 
 /* Change congestion control for socket */
-int tcp_set_congestion_control(struct sock *sk, const char *name)
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct tcp_congestion_ops *ca;
@@ -343,7 +343,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 		return -EPERM;
 
 	rcu_read_lock();
-	ca = __tcp_ca_find_autoload(name);
+	if (!load)
+		ca = tcp_ca_find(name);
+	else
+		ca = __tcp_ca_find_autoload(name);
 	/* No change asking for existing value */
 	if (ca == icsk->icsk_ca_ops) {
 		icsk->icsk_ca_setsockopt = 1;
@@ -351,8 +354,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	}
 	if (!ca)
 		err = -ENOENT;
+	else if (!load)
+		icsk->icsk_ca_ops = ca;
 	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
-		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
+			   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 	else if (!try_module_get(ca->owner))
 		err = -EBUSY;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9cef574..1979740 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6192,7 +6192,8 @@ static void tcp_ecn_create_request(struct request_sock *req,
 	ecn_ok = net->ipv4.sysctl_tcp_ecn || ecn_ok_dst;
 
 	if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk) ||
-	    (ecn_ok_dst & DST_FEATURE_ECN_CA))
+	    (ecn_ok_dst & DST_FEATURE_ECN_CA) ||
+	    tcp_bpf_ca_needs_ecn((struct sock *)req))
 		inet_rsk(req)->ecn_ok = 1;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index face30c..32a250a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -316,7 +316,8 @@ static void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
 	TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_CWR;
 	if (!(tp->ecn_flags & TCP_ECN_OK))
 		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
-	else if (tcp_ca_needs_ecn(sk))
+	else if (tcp_ca_needs_ecn(sk) ||
+		 tcp_bpf_ca_needs_ecn(sk))
 		INET_ECN_xmit(sk);
 }
 
@@ -324,8 +325,9 @@ static void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
 static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool bpf_needs_ecn = tcp_bpf_ca_needs_ecn(sk);
 	bool use_ecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 ||
-		       tcp_ca_needs_ecn(sk);
+		tcp_ca_needs_ecn(sk) || bpf_needs_ecn;
 
 	if (!use_ecn) {
 		const struct dst_entry *dst = __sk_dst_get(sk);
@@ -339,7 +341,7 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 	if (use_ecn) {
 		TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
 		tp->ecn_flags = TCP_ECN_OK;
-		if (tcp_ca_needs_ecn(sk))
+		if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn)
 			INET_ECN_xmit(sk);
 	}
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 11/15] bpf: Sample BPF program to set congestion control
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (9 preceding siblings ...)
  2017-06-13 17:59 ` [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
@ 2017-06-13 18:00 ` Lawrence Brakmo
  2017-06-13 21:31   ` David Miller
  2017-06-13 18:00 ` [RFC PATCH net-next 12/15] bpf: Adds support for setting initial cwnd Lawrence Brakmo
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 18:00 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile        |  1 +
 samples/bpf/tcp_cong_kern.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 samples/bpf/tcp_cong_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index b38cbe2..1066c99 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -116,6 +116,7 @@ always += cookie_uid_helper_example.o
 always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
+always += tcp_cong_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_cong_kern.c b/samples/bpf/tcp_cong_kern.c
new file mode 100644
index 0000000..c077ff0
--- /dev/null
+++ b/samples/bpf/tcp_cong_kern.c
@@ -0,0 +1,67 @@
+/*
+ * BPF program to set congestion control to dctcp when both hosts are
+ * in the same datacenter (as deteremined by IPv6 prefix).
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/tcp.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_cong(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = 0;
+	char cong[] = "dctcp";
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Check if both hosts are in the same datacenter. For this
+	// example they are if the 1st 5.5 bytes in the IPv6 address
+	// are the same.
+	if (skops->family == AF_INET6 &&
+	    skops->local_ip6[0] == skops->remote_ip6[0] &&
+	    (skops->local_ip6[1] & 0xfff00000) ==
+	    (skops->remote_ip6[1] & 0xfff00000)) {
+		switch (op) {
+		case BPF_SOCKET_OPS_NEEDS_ECN:
+			rv = 1;
+			break;
+		case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+			rv = bpf_setsockopt(skb, SOL_TCP, TCP_CONGESTION, cong,
+					    sizeof(cong));
+			break;
+		case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+			rv = bpf_setsockopt(skb, SOL_TCP, TCP_CONGESTION, cong,
+					    sizeof(cong));
+			break;
+		default:
+			rv = -1;
+		}
+	} else {
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 12/15] bpf: Adds support for setting initial cwnd
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (10 preceding siblings ...)
  2017-06-13 18:00 ` [RFC PATCH net-next 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
@ 2017-06-13 18:00 ` Lawrence Brakmo
  2017-06-13 18:00 ` [RFC PATCH net-next 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 18:00 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the
initial congestion window. This can be used when the hosts are far
apart (large RTTs) and it is safe to start with a large inital cwnd.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c        | 14 +++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b86aacc..9895672 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -734,4 +734,6 @@ enum {
 					 */
 };
 
+#define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 5e0cb4e..7b4458d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2705,7 +2705,19 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
 				tcp_reinit_congestion_control(sk,
 					inet_csk(sk)->icsk_ca_ops);
 		} else {
-			ret = -EINVAL;
+			struct tcp_sock *tp = tcp_sk(sk);
+
+			val = *((int *)optval);
+			switch (optname) {
+			case TCP_BPF_IW:
+				if (val <= 0 || tp->data_segs_out > 0)
+					ret = -EINVAL;
+				else
+					tp->snd_cwnd = val;
+				break;
+			default:
+				ret = -EINVAL;
+			}
 		}
 	} else {
 		ret = -EINVAL;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 13/15] bpf: Sample BPF program to set initial cwnd
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (11 preceding siblings ...)
  2017-06-13 18:00 ` [RFC PATCH net-next 12/15] bpf: Adds support for setting initial cwnd Lawrence Brakmo
@ 2017-06-13 18:00 ` Lawrence Brakmo
  2017-06-13 21:31   ` David Miller
  2017-06-13 18:00 ` [RFC PATCH net-next 14/15] bpf: Adds support for setting sndcwnd clamp Lawrence Brakmo
  2017-06-13 18:00 ` [RFC PATCH net-next 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 18:00 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to insure the hosts are actually
far enough away.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile      |  1 +
 samples/bpf/tcp_iw_kern.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 samples/bpf/tcp_iw_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1066c99..59ba879 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -117,6 +117,7 @@ always += tcp_synrto_kern.o
 always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
+always += tcp_iw_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_iw_kern.c b/samples/bpf/tcp_iw_kern.c
new file mode 100644
index 0000000..0279f7e
--- /dev/null
+++ b/samples/bpf/tcp_iw_kern.c
@@ -0,0 +1,73 @@
+/*
+ * BPF program to set initial congestion window and initial receive
+ * window to 40 packets and send and receive buffers to 1.5MB. This
+ * would usually be done after doing appropriate checks that indicate
+ * the hosts are far enough away (i.e. large RTT).
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_iw(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = 0;
+	int rwnd_init = 40;
+	int iw = 40;
+	int bufsize = 1500000;
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Usually there would be a check to insure the hosts are far
+	// from each other so it makes sense to increase buffer sizes
+
+	switch (op) {
+	case BPF_SOCKET_OPS_RWND_INIT:
+		rv = rwnd_init;
+		break;
+	case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+		// Set sndbuf and rcvbuf of active connections
+		rv = bpf_setsockopt(skb, SOL_SOCKET, SO_SNDBUF, &bufsize,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET, SO_RCVBUF,
+					      &bufsize, sizeof(bufsize));
+		break;
+	case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+		rv = bpf_setsockopt(skb, SOL_TCP, TCP_BPF_IW, &iw,
+				    sizeof(iw));
+		break;
+	case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+		// Set sndbuf and rcvbuf of passive connections
+		rv = bpf_setsockopt(skb, SOL_SOCKET, SO_SNDBUF, &bufsize,
+				    sizeof(bufsize));
+		rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET, SO_RCVBUF,
+					      &bufsize, sizeof(bufsize));
+		break;
+	default:
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 14/15] bpf: Adds support for setting sndcwnd clamp
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (12 preceding siblings ...)
  2017-06-13 18:00 ` [RFC PATCH net-next 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
@ 2017-06-13 18:00 ` Lawrence Brakmo
  2017-06-13 18:00 ` [RFC PATCH net-next 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
  14 siblings, 0 replies; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 18:00 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the initial congestion window. It is useful to limit the sndcwnd
when the host are close to each other (small RTT).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/uapi/linux/bpf.h | 1 +
 net/core/filter.c        | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9895672..6ca0aa2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -735,5 +735,6 @@ enum {
 };
 
 #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
+#define TCP_BPF_SNDCWND_CLAMP	1002	/* Set sndcwnd_clamp */
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/core/filter.c b/net/core/filter.c
index 7b4458d..bcd4495 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2715,6 +2715,13 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
 				else
 					tp->snd_cwnd = val;
 				break;
+			case TCP_BPF_SNDCWND_CLAMP:
+				if (val <= 0) {
+					ret = -EINVAL;
+				} else {
+					tp->snd_cwnd_clamp = val;
+					tp->snd_ssthresh = val;
+				}
 			default:
 				ret = -EINVAL;
 			}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 15/15] bpf: Sample bpf program to set sndcwnd clamp
  2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
                   ` (13 preceding siblings ...)
  2017-06-13 18:00 ` [RFC PATCH net-next 14/15] bpf: Adds support for setting sndcwnd clamp Lawrence Brakmo
@ 2017-06-13 18:00 ` Lawrence Brakmo
  2017-06-13 21:32   ` David Miller
  14 siblings, 1 reply; 26+ messages in thread
From: Lawrence Brakmo @ 2017-06-13 18:00 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 samples/bpf/Makefile         |  1 +
 samples/bpf/tcp_clamp_kern.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)
 create mode 100644 samples/bpf/tcp_clamp_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 59ba879..22f2d84 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -118,6 +118,7 @@ always += tcp_rwnd_kern.o
 always += tcp_bufs_kern.o
 always += tcp_cong_kern.o
 always += tcp_iw_kern.o
+always += tcp_clamp_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/tcp_clamp_kern.c b/samples/bpf/tcp_clamp_kern.c
new file mode 100644
index 0000000..edbdff3
--- /dev/null
+++ b/samples/bpf/tcp_clamp_kern.c
@@ -0,0 +1,86 @@
+/*
+ * Sample BPF program to set send and receive buffers to 150KB, sndcwnd clamp
+ * to 100 packets and SYN and SYN_ACK RTOs to 10ms when both hosts are within
+ * the same datacenter. For his example, we assume they are within the same
+ * datacenter when the first 5.5 bytes of their IPv6 addresses are the same.
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+
+#define DEBUG 1
+
+SEC("sockops")
+int bpf_clamp(struct __sk_buff *skb)
+{
+	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
+	char fmt1[] = "BPF command: %d\n";
+	char fmt2[] = "  Returning %d\n";
+	int op;
+	int rv = 0;
+	int bufsize = 150000;
+	int clamp = 100;
+	int to_init = 10;
+
+	// For testing purposes, only execute rest of BPF program
+	// if neither port numberis 55601
+	if (skops->remote_port != 55601 && skops->local_port != 55601)
+		return -1;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_trace_printk(fmt1, sizeof(fmt1), op);
+#endif
+
+	// Check that both hosts are within same datacenter. For this example
+	// it is the case when the first 5.5 bytes of their IPv6 addresses are
+	// the same.
+
+	if (skops->family == AF_INET6 &&
+	    skops->local_ip6[0] == skops->remote_ip6[0] &&
+	    (skops->local_ip6[1] & 0xfff00000) ==
+	    (skops->remote_ip6[1] & 0xfff00000)) {
+		switch (op) {
+		case BPF_SOCKET_OPS_TIMEOUT_INIT:
+			rv = to_init;
+			break;
+		case BPF_SOCKET_OPS_TCP_CONNECT_CB:
+			// Set sndbuf and rcvbuf of active connections
+			rv = bpf_setsockopt(skb, SOL_SOCKET, SO_SNDBUF,
+					    &bufsize, sizeof(bufsize));
+			rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET,
+						      SO_RCVBUF, &bufsize,
+						      sizeof(bufsize));
+			break;
+		case BPF_SOCKET_OPS_ACTIVE_ESTABLISHED_CB:
+			rv = bpf_setsockopt(skb, SOL_TCP, TCP_BPF_SNDCWND_CLAMP,
+					    &clamp, sizeof(clamp));
+			break;
+		case BPF_SOCKET_OPS_PASSIVE_ESTABLISHED_CB:
+			// Set sndbuf and rcvbuf of passive connections
+			rv = bpf_setsockopt(skb, SOL_TCP, TCP_BPF_SNDCWND_CLAMP,
+					    &clamp, sizeof(clamp));
+			rv = -rv*100 + bpf_setsockopt(skb, SOL_SOCKET,
+						      SO_SNDBUF, &bufsize,
+						      sizeof(bufsize));
+			rv = -rv*200 + bpf_setsockopt(skb, SOL_SOCKET,
+						      SO_RCVBUF, &bufsize,
+						      sizeof(bufsize));
+			break;
+		default:
+			rv = -1;
+		}
+	} else {
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_trace_printk(fmt2, sizeof(fmt2), rv);
+#endif
+	return rv;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 01/15] net: BPF support for socket ops
  2017-06-13 17:59 ` [RFC PATCH net-next 01/15] net: BPF support for socket ops Lawrence Brakmo
@ 2017-06-13 21:12   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:12 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:50 -0700

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a20ba40..8f69b8b 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -899,4 +899,14 @@ static inline int bpf_tell_extensions(void)
>  	return SKF_AD_MAX;
>  }
>  
> +struct bpf_socket_ops_kern {
> +	struct	sock *sk;
> +	__u32	is_req_sock:1;
> +	__u32	op;
> +	union {
> +		__u32 reply;
> +		__u32 replylong[4];
> +	};
> +};
> +
>  #endif /* __LINUX_FILTER_H__ */

In non-UAPI headers, "u32", "u16", "u8", etc. without leading underscores
shoudl be used.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 04/15] bpf: Sample bpf program to set SYN/SYN-ACK RTOs
  2017-06-13 17:59 ` [RFC PATCH net-next 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
@ 2017-06-13 21:13   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:13 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:53 -0700

> +
> +	// Check for TIMEOUT_INIT operation and IPv6 addresses
> +	if (op == BPF_SOCKET_OPS_TIMEOUT_INIT &&
> +		skops->family == AF_INET6) {
> +
> +		// If the first 5.5 bytes of the IPv6 address are the same
> +		// then both hosts are in the same datacenter
> +		// so use an RTO of 10ms
> +		if (skops->local_ip6[0] == skops->remote_ip6[0] &&
> +		    (skops->local_ip6[1] & 0xfff00000) ==
> +		    (skops->remote_ip6[1] & 0xfff00000))
> +			rv = 10;
> +	}

I know this is just a sample program, but please do not use C++ comments
in C code.

Thank you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window
  2017-06-13 17:59 ` [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
@ 2017-06-13 21:14   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:14 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:54 -0700

> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index d0642df..a0032ea 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -352,6 +352,7 @@ void tcp_openreq_init_rwin(struct request_sock *req,
>  	u32 window_clamp;
>  	__u8 rcv_wscale;
>  	int mss;
> +	u32 rcv_wnd;

Please order local variables from longest to shortest line.

 ...
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 503e478..972a985 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3267,6 +3267,7 @@ static void tcp_connect_init(struct sock *sk)
>  	const struct dst_entry *dst = __sk_dst_get(sk);
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	__u8 rcv_wscale;
> +	u32 rcv_wnd;

Likewise.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window
  2017-06-13 17:59 ` [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
@ 2017-06-13 21:14   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:14 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:55 -0700

> +SEC("sockops")
> +int bpf_rwnd(struct __sk_buff *skb)
> +{
> +	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
> +	char fmt1[] = "BPF command: %d\n";
> +	char fmt2[] = "  Returning %d\n";
> +	int op;
> +	int rv = -1;

Longest to shortest line for local variable declarations please.

> +
> +	// For testing purposes, only execute rest of BPF program
> +	// if neither port numberis 55601

Again, please use non-C++ comments.

Thank you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf
  2017-06-13 17:59 ` [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
@ 2017-06-13 21:16   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:16 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:56 -0700

> +BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
> +	   int, level, int, optname, char *, optval, int, optlen)
> +{
> +	int val;
> +	int ret = 0;
> +	struct sock *sk = bpf_socket->sk;

Longest to shortest line for variable declarations please.

Also, throughout your submission make sure you indent multi-line
function declarations and definitions properly.  It needs to be:

return_type function_name(type1 arg1, type2 arg2, type3 arg3,
			  type4 arg4, type5 arg5)

such that "type4 arg4" starts precisely at the first column after
the openning parenthesis of the first line.  You must use the
appropriate number of TAB then SPACE characters necessary to
achieve this.

Thank you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes
  2017-06-13 17:59 ` [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
@ 2017-06-13 21:17   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:17 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:58 -0700

> diff --git a/samples/bpf/tcp_bufs_kern.c b/samples/bpf/tcp_bufs_kern.c
> new file mode 100644
> index 0000000..a407b73
> --- /dev/null
> +++ b/samples/bpf/tcp_bufs_kern.c
...
> +SEC("sockops")
> +int bpf_bufs(struct __sk_buff *skb)
> +{
> +	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
> +	char fmt1[] = "BPF command: %d\n";
> +	char fmt2[] = "  Returning %d\n";
> +	int op;
> +	int rv = 0;
> +	int rwnd_init = 40;
> +	int bufsize = 1500000;
> +
> +	// For testing purposes, only execute rest of BPF program
> +	// if neither port numberis 55601

Local variable ordering and C++ comments.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control
  2017-06-13 17:59 ` [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
@ 2017-06-13 21:17   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:17 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 10:59:59 -0700

> @@ -2698,8 +2698,15 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_socket_ops_kern *, bpf_socket,
>  		}
>  	} else if (level == SOL_TCP &&
>  		   bpf_socket->sk->sk_prot->setsockopt == tcp_setsockopt) {
> -		// Place holder
> -		ret = -EINVAL;

I missed these C++ comments in an earlier patch, definitely don't use
them in kernel code.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 11/15] bpf: Sample BPF program to set congestion control
  2017-06-13 18:00 ` [RFC PATCH net-next 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
@ 2017-06-13 21:31   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:31 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 11:00:00 -0700

> +SEC("sockops")
> +int bpf_cong(struct __sk_buff *skb)
> +{
> +	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
> +	char fmt1[] = "BPF command: %d\n";
> +	char fmt2[] = "  Returning %d\n";
> +	int op;
> +	int rv = 0;
> +	char cong[] = "dctcp";
> +
> +	// For testing purposes, only execute rest of BPF program
> +	// if neither port numberis 55601

Local variable ordering and C++ comments.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 13/15] bpf: Sample BPF program to set initial cwnd
  2017-06-13 18:00 ` [RFC PATCH net-next 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
@ 2017-06-13 21:31   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:31 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 11:00:02 -0700

> +SEC("sockops")
> +int bpf_iw(struct __sk_buff *skb)
> +{
> +	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
> +	char fmt1[] = "BPF command: %d\n";
> +	char fmt2[] = "  Returning %d\n";
> +	int op;
> +	int rv = 0;
> +	int rwnd_init = 40;
> +	int iw = 40;
> +	int bufsize = 1500000;
> +
> +	// For testing purposes, only execute rest of BPF program
> +	// if neither port numberis 55601

Local variable ordering and C++ comments.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 15/15] bpf: Sample bpf program to set sndcwnd clamp
  2017-06-13 18:00 ` [RFC PATCH net-next 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
@ 2017-06-13 21:32   ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2017-06-13 21:32 UTC (permalink / raw)
  To: brakmo; +Cc: netdev, kernel-team, bmatheny, ast, daniel, dsa

From: Lawrence Brakmo <brakmo@fb.com>
Date: Tue, 13 Jun 2017 11:00:04 -0700

> +SEC("sockops")
> +int bpf_clamp(struct __sk_buff *skb)
> +{
> +	struct bpf_socket_ops *skops = (struct bpf_socket_ops *) skb;
> +	char fmt1[] = "BPF command: %d\n";
> +	char fmt2[] = "  Returning %d\n";
> +	int op;
> +	int rv = 0;
> +	int bufsize = 150000;
> +	int clamp = 100;
> +	int to_init = 10;
> +
> +	// For testing purposes, only execute rest of BPF program
> +	// if neither port numberis 55601
> +	if (skops->remote_port != 55601 && skops->local_port != 55601)
> +		return -1;

Local variable ordering and C++ comments.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-06-13 21:32 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-13 17:59 RFC PATCH net-next 00/15] bpf: Add new SOCKET_OPS program type Lawrence Brakmo
2017-06-13 17:59 ` [RFC PATCH net-next 01/15] net: BPF support for socket ops Lawrence Brakmo
2017-06-13 21:12   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 02/15] bpf: program to load socketops BPF programs Lawrence Brakmo
2017-06-13 17:59 ` [RFC PATCH net-next 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs Lawrence Brakmo
2017-06-13 17:59 ` [RFC PATCH net-next 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
2017-06-13 21:13   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
2017-06-13 21:14   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
2017-06-13 21:14   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
2017-06-13 21:16   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 08/15] bpf: Add TCP connection BPF callbacks Lawrence Brakmo
2017-06-13 17:59 ` [RFC PATCH net-next 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
2017-06-13 21:17   ` David Miller
2017-06-13 17:59 ` [RFC PATCH net-next 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
2017-06-13 21:17   ` David Miller
2017-06-13 18:00 ` [RFC PATCH net-next 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-13 21:31   ` David Miller
2017-06-13 18:00 ` [RFC PATCH net-next 12/15] bpf: Adds support for setting initial cwnd Lawrence Brakmo
2017-06-13 18:00 ` [RFC PATCH net-next 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-13 21:31   ` David Miller
2017-06-13 18:00 ` [RFC PATCH net-next 14/15] bpf: Adds support for setting sndcwnd clamp Lawrence Brakmo
2017-06-13 18:00 ` [RFC PATCH net-next 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
2017-06-13 21:32   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).