* [PATCH bpf-next v6 01/11] bpf: Make SOCK_OPS_GET_TCP size independent
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 02/11] bpf: Make SOCK_OPS_GET_TCP struct independent Lawrence Brakmo
` (9 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Make the SOCK_OPS_GET_TCP helper macro size independent (previously it
only worked with 4-byte fields).
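For context, a sock_ops BPF program reads these mirrored fields directly
from its context; a minimal sketch (skops being the program's struct
bpf_sock_ops pointer):

	__u32 cwnd = skops->snd_cwnd;
	__u32 srtt = skops->srtt_us >> 3;	/* srtt_us holds averaged RTT << 3, in usecs */

The verifier rewrites such loads via sock_ops_convert_ctx_access() into
the tcp_sock accesses the macro below emits.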
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
net/core/filter.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 30fafaa..5d6f121 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4465,9 +4465,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
break;
/* Helper macro for adding read access to tcp_sock fields. */
-#define SOCK_OPS_GET_TCP32(FIELD_NAME) \
+#define SOCK_OPS_GET_TCP(FIELD_NAME) \
do { \
- BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) != 4); \
+ BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) > \
+ FIELD_SIZEOF(struct bpf_sock_ops, FIELD_NAME)); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
is_fullsock), \
@@ -4479,16 +4480,18 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
struct bpf_sock_ops_kern, sk),\
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, sk));\
- *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, \
+ *insn++ = BPF_LDX_MEM(FIELD_SIZEOF(struct tcp_sock, \
+ FIELD_NAME), si->dst_reg, \
+ si->dst_reg, \
offsetof(struct tcp_sock, FIELD_NAME)); \
} while (0)
case offsetof(struct bpf_sock_ops, snd_cwnd):
- SOCK_OPS_GET_TCP32(snd_cwnd);
+ SOCK_OPS_GET_TCP(snd_cwnd);
break;
case offsetof(struct bpf_sock_ops, srtt_us):
- SOCK_OPS_GET_TCP32(srtt_us);
+ SOCK_OPS_GET_TCP(srtt_us);
break;
}
return insn - insn_buf;
--
2.9.5
* [PATCH bpf-next v6 02/11] bpf: Make SOCK_OPS_GET_TCP struct independent
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 01/11] bpf: Make SOCK_OPS_GET_TCP size independent Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 03/11] bpf: Add write access to tcp_sock and sock fields Lawrence Brakmo
` (8 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Changed SOCK_OPS_GET_TCP to SOCK_OPS_GET_FIELD and added two arguments
so it can also work with struct sock fields.
Previous: SOCK_OPS_GET_TCP(FIELD_NAME)
New:      SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)
where OBJ is either struct tcp_sock or struct sock (without quotation
marks), BPF_FIELD is the name of the field in the bpf_sock_ops struct,
and OBJ_FIELD is the name of the field in the OBJ struct.
Although the field names are currently the same, the kernel struct names
could change in the future and this change makes it easier to support
that.
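For example, if bpf_sock_ops exposed a struct sock field under a
different name than the kernel struct uses (hypothetical names foo and
sk_foo), the invocation would be:

	SOCK_OPS_GET_FIELD(foo, sk_foo, struct sock);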
Note that adding access to tcp_sock fields in sock_ops programs does
not preclude the tcp_sock fields from being removed as long as we are
willing to do one of the following:
1) Return a fixed value (e.g. 0 or 0xffffffff), or
2) Make the verifier fail if that field is accessed (i.e. program
fails to load) so the user will know that field is no longer
supported.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
net/core/filter.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 5d6f121..292bda8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4464,11 +4464,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
is_fullsock));
break;
-/* Helper macro for adding read access to tcp_sock fields. */
-#define SOCK_OPS_GET_TCP(FIELD_NAME) \
+/* Helper macro for adding read access to tcp_sock or sock fields. */
+#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
do { \
- BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, FIELD_NAME) > \
- FIELD_SIZEOF(struct bpf_sock_ops, FIELD_NAME)); \
+ BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
+ FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
is_fullsock), \
@@ -4480,18 +4480,18 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
struct bpf_sock_ops_kern, sk),\
si->dst_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, sk));\
- *insn++ = BPF_LDX_MEM(FIELD_SIZEOF(struct tcp_sock, \
- FIELD_NAME), si->dst_reg, \
- si->dst_reg, \
- offsetof(struct tcp_sock, FIELD_NAME)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(OBJ, \
+ OBJ_FIELD), \
+ si->dst_reg, si->dst_reg, \
+ offsetof(OBJ, OBJ_FIELD)); \
} while (0)
case offsetof(struct bpf_sock_ops, snd_cwnd):
- SOCK_OPS_GET_TCP(snd_cwnd);
+ SOCK_OPS_GET_FIELD(snd_cwnd, snd_cwnd, struct tcp_sock);
break;
case offsetof(struct bpf_sock_ops, srtt_us):
- SOCK_OPS_GET_TCP(srtt_us);
+ SOCK_OPS_GET_FIELD(srtt_us, srtt_us, struct tcp_sock);
break;
}
return insn - insn_buf;
--
2.9.5
* [PATCH bpf-next v6 03/11] bpf: Add write access to tcp_sock and sock fields
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 01/11] bpf: Make SOCK_OPS_GET_TCP size independent Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 02/11] bpf: Make SOCK_OPS_GET_TCP struct independent Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 3:54 ` Alexei Starovoitov
2018-01-20 1:45 ` [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function Lawrence Brakmo
` (7 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
struct tcp_sock or struct sock fields. This required adding a new
field "temp" to struct bpf_sock_ops_kern for temporary storage that
is used by sock_ops_convert_ctx_access. It is used to store and recover
the contents of a register, so the register can be used to store the
address of the sk. Since we cannot overwrite the dst_reg because it
contains the pointer to ctx, nor the src_reg since it contains the value
we want to store, we need an extra register to contain the address
of the sk.
Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
GET or SET macros depending on the value of the TYPE field.
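For illustration, a ctx-access case that supports both reads and writes
can then be wired up as in this sketch (patch 08 in this series does the
equivalent for sk_txhash; 'type' is the bpf_access_type argument of
sock_ops_convert_ctx_access):

	case offsetof(struct bpf_sock_ops, sk_txhash):
		SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
					  struct sock, type);
		break;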
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/linux/filter.h | 9 +++++++++
include/net/tcp.h | 2 +-
net/core/filter.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 425056c..daa5a67 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1007,6 +1007,15 @@ struct bpf_sock_ops_kern {
u32 replylong[4];
};
u32 is_fullsock;
+ u64 temp; /* temp and everything after is not
+ * initialized to 0 before calling
+ * the BPF program. New fields that
+ * should be initialized to 0 should
+ * be inserted before temp.
+ * temp is scratch storage used by
+ * sock_ops_convert_ctx_access
+ * as temporary storage of a register.
+ */
};
#endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6939e69..108d16a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2010,7 +2010,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
struct bpf_sock_ops_kern sock_ops;
int ret;
- memset(&sock_ops, 0, sizeof(sock_ops));
+ memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
if (sk_fullsock(sk)) {
sock_ops.is_fullsock = 1;
sock_owned_by_me(sk);
diff --git a/net/core/filter.c b/net/core/filter.c
index 292bda8..1ff36ca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4486,6 +4486,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
offsetof(OBJ, OBJ_FIELD)); \
} while (0)
+/* Helper macro for adding write access to tcp_sock or sock fields.
+ * The macro is called with two registers, dst_reg which contains a pointer
+ * to ctx (context) and src_reg which contains the value that should be
+ * stored. However, we need an additional register since we cannot overwrite
+ * dst_reg because it may be used later in the program.
+ * Instead we "borrow" one of the other registers. We first save its value
+ * into a new (temp) field in bpf_sock_ops_kern, use it, and then restore
+ * it at the end of the macro.
+ */
+#define SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
+ do { \
+ int reg = BPF_REG_9; \
+ BUILD_BUG_ON(FIELD_SIZEOF(OBJ, OBJ_FIELD) > \
+ FIELD_SIZEOF(struct bpf_sock_ops, BPF_FIELD)); \
+ if (si->dst_reg == reg || si->src_reg == reg) \
+ reg--; \
+ if (si->dst_reg == reg || si->src_reg == reg) \
+ reg--; \
+ *insn++ = BPF_STX_MEM(BPF_DW, si->dst_reg, reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ temp)); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, \
+ is_fullsock), \
+ reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ is_fullsock)); \
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
+ struct bpf_sock_ops_kern, sk),\
+ reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, sk));\
+ *insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD), \
+ reg, si->src_reg, \
+ offsetof(OBJ, OBJ_FIELD)); \
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg, \
+ offsetof(struct bpf_sock_ops_kern, \
+ temp)); \
+ } while (0)
+
+#define SOCK_OPS_GET_OR_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ, TYPE) \
+ do { \
+ if (TYPE == BPF_WRITE) \
+ SOCK_OPS_SET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
+ else \
+ SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \
+ } while (0)
+
case offsetof(struct bpf_sock_ops, snd_cwnd):
SOCK_OPS_GET_FIELD(snd_cwnd, snd_cwnd, struct tcp_sock);
break;
--
2.9.5
* Re: [PATCH bpf-next v6 03/11] bpf: Add write access to tcp_sock and sock fields
2018-01-20 1:45 ` [PATCH bpf-next v6 03/11] bpf: Add write access to tcp_sock and sock fields Lawrence Brakmo
@ 2018-01-20 3:54 ` Alexei Starovoitov
0 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 3:54 UTC
To: Lawrence Brakmo
Cc: netdev, Kernel Team, Blake Matheny, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Neal Cardwell, Yuchung Cheng
On Fri, Jan 19, 2018 at 05:45:40PM -0800, Lawrence Brakmo wrote:
> This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
> struct tcp_sock or struct sock fields. This required adding a new
> field "temp" to struct bpf_sock_ops_kern for temporary storage that
> is used by sock_ops_convert_ctx_access. It is used to store and recover
> the contents of a register, so the register can be used to store the
> address of the sk. Since we cannot overwrite the dst_reg because it
> contains the pointer to ctx, nor the src_reg since it contains the value
> we want to store, we need an extra register to contain the address
> of the sk.
>
> Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
> GET or SET macros depending on the value of the TYPE field.
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
that is a really clever way of doing inline writes into fields.
I suspect we will be using this approach in other places.
* [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (2 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 03/11] bpf: Add write access to tcp_sock and sock fields Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-24 1:11 ` Daniel Borkmann
2018-01-20 1:45 ` [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock Lawrence Brakmo
` (6 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds support for passing up to 4 arguments to sock_ops bpf functions. It
reuses the reply union, so the bpf_sock_ops structures are not
increased in size.
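For example, a hypothetical call site with two per-event values could
look like:

	/* seq and len are illustrative, not from this patch */
	tcp_call_bpf_2arg(sk, op, seq, len);

and the BPF program reads them back as skops->args[0] and
skops->args[1].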
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/linux/filter.h | 1 +
include/net/tcp.h | 64 ++++++++++++++++++++++++++++++++++++++++++++----
include/uapi/linux/bpf.h | 5 ++--
net/ipv4/tcp.c | 2 +-
net/ipv4/tcp_nv.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
6 files changed, 66 insertions(+), 10 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index daa5a67..20384c4 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1003,6 +1003,7 @@ struct bpf_sock_ops_kern {
struct sock *sk;
u32 op;
union {
+ u32 args[4];
u32 reply;
u32 replylong[4];
};
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 108d16a..8e9111f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2005,7 +2005,7 @@ void tcp_cleanup_ulp(struct sock *sk);
* program loaded).
*/
#ifdef CONFIG_BPF
-static inline int tcp_call_bpf(struct sock *sk, int op)
+static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
{
struct bpf_sock_ops_kern sock_ops;
int ret;
@@ -2018,6 +2018,8 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
sock_ops.sk = sk;
sock_ops.op = op;
+ if (nargs > 0)
+ memcpy(sock_ops.args, args, nargs*sizeof(u32));
ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
if (ret == 0)
@@ -2026,18 +2028,70 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
ret = -1;
return ret;
}
+
+static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
+{
+ return tcp_call_bpf(sk, op, 1, &arg);
+}
+
+static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
+{
+ u32 args[2] = {arg1, arg2};
+
+ return tcp_call_bpf(sk, op, 2, args);
+}
+
+static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+ u32 arg3)
+{
+ u32 args[3] = {arg1, arg2, arg3};
+
+ return tcp_call_bpf(sk, op, 3, args);
+}
+
+static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+ u32 arg3, u32 arg4)
+{
+ u32 args[4] = {arg1, arg2, arg3, arg4};
+
+ return tcp_call_bpf(sk, op, 4, args);
+}
+
#else
-static inline int tcp_call_bpf(struct sock *sk, int op)
+static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
{
return -EPERM;
}
+
+static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
+{
+ return -EPERM;
+}
+
+static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
+{
+ return -EPERM;
+}
+
+static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+ u32 arg3)
+{
+ return -EPERM;
+}
+
+static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
+ u32 arg3, u32 arg4)
+{
+ return -EPERM;
+}
+
#endif
static inline u32 tcp_timeout_init(struct sock *sk)
{
int timeout;
- timeout = tcp_call_bpf(sk, BPF_SOCK_OPS_TIMEOUT_INIT);
+ timeout = tcp_call_bpf(sk, BPF_SOCK_OPS_TIMEOUT_INIT, 0, NULL);
if (timeout <= 0)
timeout = TCP_TIMEOUT_INIT;
@@ -2048,7 +2102,7 @@ static inline u32 tcp_rwnd_init_bpf(struct sock *sk)
{
int rwnd;
- rwnd = tcp_call_bpf(sk, BPF_SOCK_OPS_RWND_INIT);
+ rwnd = tcp_call_bpf(sk, BPF_SOCK_OPS_RWND_INIT, 0, NULL);
if (rwnd < 0)
rwnd = 0;
@@ -2057,7 +2111,7 @@ static inline u32 tcp_rwnd_init_bpf(struct sock *sk)
static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
{
- return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN) == 1);
+ return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1);
}
#if IS_ENABLED(CONFIG_SMC)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406c19d..8d5874c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -952,8 +952,9 @@ struct bpf_map_info {
struct bpf_sock_ops {
__u32 op;
union {
- __u32 reply;
- __u32 replylong[4];
+ __u32 args[4]; /* Optionally passed to bpf program */
+ __u32 reply; /* Returned by bpf program */
+ __u32 replylong[4]; /* Optionally returned by bpf prog */
};
__u32 family;
__u32 remote_ip4; /* Stored in network byte order */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d7cf861..88b6244 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -463,7 +463,7 @@ void tcp_init_transfer(struct sock *sk, int bpf_op)
tcp_mtup_init(sk);
icsk->icsk_af_ops->rebuild_header(sk);
tcp_init_metrics(sk);
- tcp_call_bpf(sk, bpf_op);
+ tcp_call_bpf(sk, bpf_op, 0, NULL);
tcp_init_congestion_control(sk);
tcp_init_buffer_space(sk);
}
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
index 0b5a05b..ddbce73 100644
--- a/net/ipv4/tcp_nv.c
+++ b/net/ipv4/tcp_nv.c
@@ -146,7 +146,7 @@ static void tcpnv_init(struct sock *sk)
* within a datacenter, where we have reasonable estimates of
* RTTs
*/
- base_rtt = tcp_call_bpf(sk, BPF_SOCK_OPS_BASE_RTT);
+ base_rtt = tcp_call_bpf(sk, BPF_SOCK_OPS_BASE_RTT, 0, NULL);
if (base_rtt > 0) {
ca->nv_base_rtt = base_rtt;
ca->nv_lower_bound_rtt = (base_rtt * 205) >> 8; /* 80% */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95461f0..d12f7f7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3469,7 +3469,7 @@ int tcp_connect(struct sock *sk)
struct sk_buff *buff;
int err;
- tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_CONNECT_CB);
+ tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_CONNECT_CB, 0, NULL);
if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
return -EHOSTUNREACH; /* Routing failure or similar. */
--
2.9.5
* Re: [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function
2018-01-20 1:45 ` [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function Lawrence Brakmo
@ 2018-01-24 1:11 ` Daniel Borkmann
2018-01-24 1:30 ` Lawrence Brakmo
0 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2018-01-24 1:11 UTC
To: Lawrence Brakmo, netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet,
Neal Cardwell, Yuchung Cheng
On 01/20/2018 02:45 AM, Lawrence Brakmo wrote:
> Adds support for passing up to 4 arguments to sock_ops bpf functions. It
> reuses the reply union, so the bpf_sock_ops structures are not
> increased in size.
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
> include/linux/filter.h | 1 +
> include/net/tcp.h | 64 ++++++++++++++++++++++++++++++++++++++++++++----
> include/uapi/linux/bpf.h | 5 ++--
> net/ipv4/tcp.c | 2 +-
> net/ipv4/tcp_nv.c | 2 +-
> net/ipv4/tcp_output.c | 2 +-
> 6 files changed, 66 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index daa5a67..20384c4 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1003,6 +1003,7 @@ struct bpf_sock_ops_kern {
> struct sock *sk;
> u32 op;
> union {
> + u32 args[4];
> u32 reply;
> u32 replylong[4];
> };
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 108d16a..8e9111f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2005,7 +2005,7 @@ void tcp_cleanup_ulp(struct sock *sk);
> * program loaded).
> */
> #ifdef CONFIG_BPF
> -static inline int tcp_call_bpf(struct sock *sk, int op)
> +static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
> {
> struct bpf_sock_ops_kern sock_ops;
> int ret;
> @@ -2018,6 +2018,8 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
>
> sock_ops.sk = sk;
> sock_ops.op = op;
> + if (nargs > 0)
> + memcpy(sock_ops.args, args, nargs*sizeof(u32));
Small nit given respin: nargs * sizeof(*args)
> ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
> if (ret == 0)
> @@ -2026,18 +2028,70 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
> ret = -1;
> return ret;
> }
> +
> +static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
> +{
> + return tcp_call_bpf(sk, op, 1, &arg);
> +}
> +
> +static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
> +{
> + u32 args[2] = {arg1, arg2};
> +
> + return tcp_call_bpf(sk, op, 2, args);
> +}
> +
> +static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3)
> +{
> + u32 args[3] = {arg1, arg2, arg3};
> +
> + return tcp_call_bpf(sk, op, 3, args);
> +}
> +
> +static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3, u32 arg4)
> +{
> + u32 args[4] = {arg1, arg2, arg3, arg4};
> +
> + return tcp_call_bpf(sk, op, 4, args);
> +}
> +
> #else
> -static inline int tcp_call_bpf(struct sock *sk, int op)
> +static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
> {
> return -EPERM;
> }
> +
> +static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3)
indent: arg3
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3, u32 arg4)
> +{
> + return -EPERM;
> +}
> +
> #endif
tcp_call_bpf_1arg() and tcp_call_bpf_4arg() unused for the time being?
* Re: [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function
2018-01-24 1:11 ` Daniel Borkmann
@ 2018-01-24 1:30 ` Lawrence Brakmo
2018-01-24 1:34 ` Daniel Borkmann
0 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-24 1:30 UTC
To: Daniel Borkmann, netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet,
Neal Cardwell, Yuchung Cheng
On 1/23/18, 5:11 PM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
On 01/20/2018 02:45 AM, Lawrence Brakmo wrote:
> Adds support for passing up to 4 arguments to sock_ops bpf functions. It
> reuses the reply union, so the bpf_sock_ops structures are not
> increased in size.
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
> include/linux/filter.h | 1 +
> include/net/tcp.h | 64 ++++++++++++++++++++++++++++++++++++++++++++----
> include/uapi/linux/bpf.h | 5 ++--
> net/ipv4/tcp.c | 2 +-
> net/ipv4/tcp_nv.c | 2 +-
> net/ipv4/tcp_output.c | 2 +-
> 6 files changed, 66 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index daa5a67..20384c4 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1003,6 +1003,7 @@ struct bpf_sock_ops_kern {
> struct sock *sk;
> u32 op;
> union {
> + u32 args[4];
> u32 reply;
> u32 replylong[4];
> };
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 108d16a..8e9111f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2005,7 +2005,7 @@ void tcp_cleanup_ulp(struct sock *sk);
> * program loaded).
> */
> #ifdef CONFIG_BPF
> -static inline int tcp_call_bpf(struct sock *sk, int op)
> +static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
> {
> struct bpf_sock_ops_kern sock_ops;
> int ret;
> @@ -2018,6 +2018,8 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
>
> sock_ops.sk = sk;
> sock_ops.op = op;
> + if (nargs > 0)
> + memcpy(sock_ops.args, args, nargs*sizeof(u32));
Small nit given respin: nargs * sizeof(*args)
Thanks, will fix.
> ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
> if (ret == 0)
> @@ -2026,18 +2028,70 @@ static inline int tcp_call_bpf(struct sock *sk, int op)
> ret = -1;
> return ret;
> }
> +
> +static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
> +{
> + return tcp_call_bpf(sk, op, 1, &arg);
> +}
> +
> +static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
> +{
> + u32 args[2] = {arg1, arg2};
> +
> + return tcp_call_bpf(sk, op, 2, args);
> +}
> +
> +static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3)
> +{
> + u32 args[3] = {arg1, arg2, arg3};
> +
> + return tcp_call_bpf(sk, op, 3, args);
> +}
> +
> +static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3, u32 arg4)
> +{
> + u32 args[4] = {arg1, arg2, arg3, arg4};
> +
> + return tcp_call_bpf(sk, op, 4, args);
> +}
> +
> #else
> -static inline int tcp_call_bpf(struct sock *sk, int op)
> +static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
> {
> return -EPERM;
> }
> +
> +static inline int tcp_call_bpf_1arg(struct sock *sk, int op, u32 arg)
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2)
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3)
indent: arg3
OK
> +{
> + return -EPERM;
> +}
> +
> +static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> + u32 arg3, u32 arg4)
> +{
> + return -EPERM;
> +}
> +
> #endif
tcp_call_bpf_1arg() and tcp_call_bpf_4arg() unused for the time being?
Yes, I just thought I should add them for completeness. Should I remove them until
they are actually used?
* Re: [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function
2018-01-24 1:30 ` Lawrence Brakmo
@ 2018-01-24 1:34 ` Daniel Borkmann
0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2018-01-24 1:34 UTC
To: Lawrence Brakmo, netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet,
Neal Cardwell, Yuchung Cheng
On 01/24/2018 02:30 AM, Lawrence Brakmo wrote:
> On 1/23/18, 5:11 PM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
[...]
> > +{
> > + return -EPERM;
> > +}
> > +
> > +static inline int tcp_call_bpf_4arg(struct sock *sk, int op, u32 arg1, u32 arg2,
> > + u32 arg3, u32 arg4)
> > +{
> > + return -EPERM;
> > +}
> > +
> > #endif
>
> tcp_call_bpf_1arg() and tcp_call_bpf_4arg() unused for the time being?
>
> Yes, I just thought I should add them for completeness. Should I remove them until
> they are actually used?
Yeah, I think that would be the preferred way.
Thanks again,
Daniel
* [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (3 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 04/11] bpf: Support passing args to sock_ops bpf function Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 3:52 ` Alexei Starovoitov
2018-01-20 1:45 ` [PATCH bpf-next v6 06/11] bpf: Add sock_ops RTO callback Lawrence Brakmo
` (5 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
use is to determine if there should be calls to sock_ops bpf program at
various points in the TCP code. The field is initialized to zero,
disabling the calls. A sock_ops BPF program can set it, per connection and
as necessary, when the connection is established.
It also adds support for reading and writing the field within a
sock_ops BPF program. Reading is done by accessing the field directly.
However, writing is done through the helper function
bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
is trying to set a callback that is not supported in the current kernel
(i.e. running an older kernel). The helper function returns 0 if it was
able to set all of the bits set in the argument, a positive number
containing the bits that could not be set, or -EINVAL if the socket is
not a full TCP socket.
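A sock_ops program can use this both to enable callbacks and to detect
missing kernel support; a minimal sketch (BPF_SOCK_OPS_RTO_CB_FLAG is
added later in this series):

	int rv = bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RTO_CB_FLAG);

	if (rv > 0) {
		/* rv holds the requested flag bits this kernel does not support */
	} else if (rv < 0) {
		/* skops does not refer to a full TCP socket (-EINVAL) */
	}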
Examples of where one could call the bpf program:
1) When RTO fires
2) When a packet is retransmitted
3) When the connection terminates
4) When a packet is sent
5) When a packet is received
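Patch 06 in this series adds such a guarded call site for case 1, in
tcp_write_timeout():

	if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
		tcp_call_bpf_3arg(sk, BPF_SOCK_OPS_RTO_CB,
				  icsk->icsk_retransmits,
				  icsk->icsk_rto, (int)expired);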
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/linux/tcp.h | 11 +++++++++++
include/uapi/linux/bpf.h | 12 +++++++++++-
include/uapi/linux/tcp.h | 5 +++++
net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 4f93f095..8f4c549 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -335,6 +335,17 @@ struct tcp_sock {
int linger2;
+
+/* Sock_ops bpf program related variables */
+#ifdef CONFIG_BPF
+ u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
+ * values defined in uapi/linux/tcp.h
+ */
+#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
+#else
+#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
+#endif
+
/* Receiver side RTT estimation */
struct {
u32 rtt_us;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8d5874c..7573f5b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -642,6 +642,14 @@ union bpf_attr {
* @optlen: length of optval in bytes
* Return: 0 or negative error
*
+ * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
+ * Set callback flags for sock_ops
+ * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
+ * @flags: flags value
+ * Return: 0 for no error
+ * -EINVAL if there is no full tcp socket
+ * bits in flags that are not supported by current kernel
+ *
* int bpf_skb_adjust_room(skb, len_diff, mode, flags)
* Grow or shrink room in sk_buff.
* @skb: pointer to skb
@@ -748,7 +756,8 @@ union bpf_attr {
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
FN(getsockopt), \
- FN(override_return),
+ FN(override_return), \
+ FN(sock_ops_cb_flags_set),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -969,6 +978,7 @@ struct bpf_sock_ops {
*/
__u32 snd_cwnd;
__u32 srtt_us; /* Averaged RTT << 3 in usecs */
+ __u32 bpf_sock_ops_cb_flags; /* flags defined in uapi/linux/tcp.h */
};
/* List of known BPF sock_ops operators.
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b4a4f64..d1df2f6 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -268,4 +268,9 @@ struct tcp_diag_md5sig {
__u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN];
};
+/* Definitions for bpf_sock_ops_cb_flags */
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0 /* Mask of all currently
+ * supported cb flags
+ */
+
#endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index 1ff36ca..c9411dc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3324,6 +3324,33 @@ static const struct bpf_func_proto bpf_getsockopt_proto = {
.arg5_type = ARG_CONST_SIZE,
};
+BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
+ int, argval)
+{
+ struct sock *sk = bpf_sock->sk;
+ int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
+
+ if (!sk_fullsock(sk))
+ return -EINVAL;
+
+#ifdef CONFIG_INET
+ if (val)
+ tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
+
+ return argval & (~BPF_SOCK_OPS_ALL_CB_FLAGS);
+#else
+ return -EINVAL;
+#endif
+}
+
+static const struct bpf_func_proto bpf_sock_ops_cb_flags_set_proto = {
+ .func = bpf_sock_ops_cb_flags_set,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+};
+
static const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
@@ -3504,6 +3531,8 @@ static const struct bpf_func_proto *
return &bpf_setsockopt_proto;
case BPF_FUNC_getsockopt:
return &bpf_getsockopt_proto;
+ case BPF_FUNC_sock_ops_cb_flags_set:
+ return &bpf_sock_ops_cb_flags_set_proto;
case BPF_FUNC_sock_map_update:
return &bpf_sock_map_update_proto;
default:
@@ -4541,6 +4570,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
case offsetof(struct bpf_sock_ops, srtt_us):
SOCK_OPS_GET_FIELD(srtt_us, srtt_us, struct tcp_sock);
break;
+
+ case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
+ SOCK_OPS_GET_FIELD(bpf_sock_ops_cb_flags, bpf_sock_ops_cb_flags,
+ struct tcp_sock);
+ break;
}
return insn - insn_buf;
}
--
2.9.5
* Re: [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock
2018-01-20 1:45 ` [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock Lawrence Brakmo
@ 2018-01-20 3:52 ` Alexei Starovoitov
2018-01-20 7:50 ` Lawrence Brakmo
2018-01-23 17:29 ` Eric Dumazet
0 siblings, 2 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 3:52 UTC
To: Lawrence Brakmo
Cc: netdev, Kernel Team, Blake Matheny, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Neal Cardwell, Yuchung Cheng
On Fri, Jan 19, 2018 at 05:45:42PM -0800, Lawrence Brakmo wrote:
> Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
> use is to determine if there should be calls to sock_ops bpf program at
> various points in the TCP code. The field is initialized to zero,
> disabling the calls. A sock_ops BPF program can set it, per connection and
> as necessary, when the connection is established.
>
> It also adds support for reading and writing the field within a
> sock_ops BPF program. Reading is done by accessing the field directly.
> However, writing is done through the helper function
> bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
> is trying to set a callback that is not supported in the current kernel
> (i.e. running an older kernel). The helper function returns 0 if it was
> able to set all of the bits set in the argument, a positive number
> containing the bits that could not be set, or -EINVAL if the socket is
> not a full TCP socket.
...
> +/* Sock_ops bpf program related variables */
> +#ifdef CONFIG_BPF
> + u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> + * values defined in uapi/linux/tcp.h
I guess we can extend u8 into u16 or more if necessary in the future.
> + * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
> + * Set callback flags for sock_ops
> + * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
> + * @flags: flags value
> + * Return: 0 for no error
> + * -EINVAL if there is no full tcp socket
> + * bits in flags that are not supported by current kernel
...
> +BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
> + int, argval)
> +{
> + struct sock *sk = bpf_sock->sk;
> + int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
> +
> + if (!sk_fullsock(sk))
> + return -EINVAL;
> +
> +#ifdef CONFIG_INET
> + if (val)
> + tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
> +
> + return argval & (~BPF_SOCK_OPS_ALL_CB_FLAGS);
interesting idea! took me some time to realize the potential
of such semantics, but now I like it a lot.
It blends 'set good flag' with 'which flags are supported' logic
into single helper. Nice.
Thanks for adding a test for both ways.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Eric, does this approach address your concerns?
* Re: [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock
2018-01-20 3:52 ` Alexei Starovoitov
@ 2018-01-20 7:50 ` Lawrence Brakmo
2018-01-23 17:29 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 7:50 UTC
To: Alexei Starovoitov
Cc: netdev, Kernel Team, Blake Matheny, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Neal Cardwell, Yuchung Cheng
On 1/19/18, 7:52 PM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
On Fri, Jan 19, 2018 at 05:45:42PM -0800, Lawrence Brakmo wrote:
> Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
> use is to determine if there should be calls to sock_ops bpf program at
> various points in the TCP code. The field is initialized to zero,
> disabling the calls. A sock_ops BPF program can set it, per connection and
> as necessary, when the connection is established.
>
> It also adds support for reading and writing the field within a
> sock_ops BPF program. Reading is done by accessing the field directly.
> However, writing is done through the helper function
> bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
> is trying to set a callback that is not supported in the current kernel
> (i.e. running an older kernel). The helper function returns 0 if it was
> able to set all of the bits set in the argument, a positive number
> containing the bits that could not be set, or -EINVAL if the socket is
> not a full TCP socket.
...
> +/* Sock_ops bpf program related variables */
> +#ifdef CONFIG_BPF
> + u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> + * values defined in uapi/linux/tcp.h
I guess we can extend u8 into u16 or more if necessary in the future.
Yes, that was my thought.
> + * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
> + * Set callback flags for sock_ops
> + * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
> + * @flags: flags value
> + * Return: 0 for no error
> + * -EINVAL if there is no full tcp socket
> + * bits in flags that are not supported by current kernel
...
> +BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
> + int, argval)
> +{
> + struct sock *sk = bpf_sock->sk;
> + int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
> +
> + if (!sk_fullsock(sk))
> + return -EINVAL;
> +
> +#ifdef CONFIG_INET
> + if (val)
> + tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
> +
> + return argval & (~BPF_SOCK_OPS_ALL_CB_FLAGS);
interesting idea! took me some time to realize the potential
of such semantics, but now I like it a lot.
It blends 'set good flag' with 'which flags are supported' logic
into single helper. Nice.
Thanks for adding a test for both ways.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Eric, does this approach address your concerns?
* Re: [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock
2018-01-20 3:52 ` Alexei Starovoitov
2018-01-20 7:50 ` Lawrence Brakmo
@ 2018-01-23 17:29 ` Eric Dumazet
1 sibling, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2018-01-23 17:29 UTC
To: Alexei Starovoitov, Lawrence Brakmo
Cc: netdev, Kernel Team, Blake Matheny, Alexei Starovoitov,
Daniel Borkmann, Neal Cardwell, Yuchung Cheng
On Fri, 2018-01-19 at 19:52 -0800, Alexei Starovoitov wrote:
> On Fri, Jan 19, 2018 at 05:45:42PM -0800, Lawrence Brakmo wrote:
> > Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
> > use is to determine if there should be calls to sock_ops bpf program at
> > various points in the TCP code. The field is initialized to zero,
> > disabling the calls. A sock_ops BPF program can set it, per connection and
> > as necessary, when the connection is established.
> >
> > It also adds support for reading and writing the field within a
> > sock_ops BPF program. Reading is done by accessing the field directly.
> > However, writing is done through the helper function
> > bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
> > is trying to set a callback that is not supported in the current kernel
> > (i.e. running an older kernel). The helper function returns 0 if it was
> > able to set all of the bits set in the argument, a positive number
> > containing the bits that could not be set, or -EINVAL if the socket is
> > not a full TCP socket.
>
> ...
> > +/* Sock_ops bpf program related variables */
> > +#ifdef CONFIG_BPF
> > + u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> > + * values defined in uapi/linux/tcp.h
>
> I guess we can extend u8 into u16 or more if necessary in the future.
>
> > + * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
> > + * Set callback flags for sock_ops
> > + * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
> > + * @flags: flags value
> > + * Return: 0 for no error
> > + * -EINVAL if there is no full tcp socket
> > + * bits in flags that are not supported by current kernel
>
> ...
> > +BPF_CALL_2(bpf_sock_ops_cb_flags_set, struct bpf_sock_ops_kern *, bpf_sock,
> > + int, argval)
> > +{
> > + struct sock *sk = bpf_sock->sk;
> > + int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
> > +
> > + if (!sk_fullsock(sk))
> > + return -EINVAL;
> > +
> > +#ifdef CONFIG_INET
> > + if (val)
> > + tcp_sk(sk)->bpf_sock_ops_cb_flags = val;
> > +
> > + return argval & (~BPF_SOCK_OPS_ALL_CB_FLAGS);
>
> interesting idea! took me some time to realize the potential
> of such semantics, but now I like it a lot.
> It blends 'set good flag' with 'which flags are supported' logic
> into single helper. Nice.
> Thanks for adding a test for both ways.
> Acked-by: Alexei Starovoitov <ast@kernel.org>
>
> Eric, does this approach address your concerns?
Yes, this seems fine, thanks.
* [PATCH bpf-next v6 06/11] bpf: Add sock_ops RTO callback
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (4 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 05/11] bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more Lawrence Brakmo
` (4 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds an optional call to the sock_ops BPF program based on whether
BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_cb_flags.
The BPF program is passed 3 arguments: the value of icsk_retransmits,
the value of icsk_rto, and whether the RTO has expired.
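A program that has enabled BPF_SOCK_OPS_RTO_CB_FLAG can then handle the
callback roughly like this (sketch; the counter is illustrative):

	case BPF_SOCK_OPS_RTO_CB:
		/* args[0] = icsk_retransmits, args[1] = icsk_rto,
		 * args[2] = whether the RTO has expired
		 */
		if (skops->args[2])
			expired_rtos++;
		break;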
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/uapi/linux/bpf.h | 5 +++++
include/uapi/linux/tcp.h | 3 ++-
net/ipv4/tcp_timer.c | 7 +++++++
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7573f5b..2a8c40a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1014,6 +1014,11 @@ enum {
* a congestion threshold. RTTs above
* this indicate congestion
*/
+ BPF_SOCK_OPS_RTO_CB, /* Called when an RTO has triggered.
+ * Arg1: value of icsk_retransmits
+ * Arg2: value of icsk_rto
+ * Arg3: whether RTO has expired
+ */
};
#define TCP_BPF_IW 1001 /* Set TCP initial congestion window */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index d1df2f6..129032ca 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -269,7 +269,8 @@ struct tcp_diag_md5sig {
};
/* Definitions for bpf_sock_ops_cb_flags */
-#define BPF_SOCK_OPS_ALL_CB_FLAGS 0 /* Mask of all currently
+#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x1 /* Mask of all currently
* supported cb flags
*/
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 6db3124..257abdd 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -213,11 +213,18 @@ static int tcp_write_timeout(struct sock *sk)
icsk->icsk_user_timeout);
}
tcp_fastopen_active_detect_blackhole(sk, expired);
+
+ if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
+ tcp_call_bpf_3arg(sk, BPF_SOCK_OPS_RTO_CB,
+ icsk->icsk_retransmits,
+ icsk->icsk_rto, (int)expired);
+
if (expired) {
/* Has it gone just too far? */
tcp_write_err(sk);
return 1;
}
+
return 0;
}
--
2.9.5
* [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (5 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 06/11] bpf: Add sock_ops RTO callback Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-24 1:05 ` Daniel Borkmann
2018-01-20 1:45 ` [PATCH bpf-next v6 08/11] bpf: Add sock_ops R/W access to tclass & sk_txhash Lawrence Brakmo
` (3 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Add support for reading many more tcp_sock fields
state, same as sk->sk_state
rtt_min same as sk->rtt_min.s[0].v (current rtt_min)
snd_ssthresh
rcv_nxt
snd_nxt
snd_una
mss_cache
ecn_flags
rate_delivered
rate_interval_us
packets_out
retrans_out
total_retrans
segs_in
data_segs_in
segs_out
data_segs_out
bytes_received (__u64)
bytes_acked (__u64)
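In a program these are plain context loads; per the updated
sock_ops_is_valid_access(), the two __u64 fields must be read with
8-byte loads (sketch):

	__u64 acked = skops->bytes_acked;	/* 8-byte load required */
	__u32 ssthresh = skops->snd_ssthresh;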
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/uapi/linux/bpf.h | 19 +++++++
net/core/filter.c | 134 ++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 140 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2a8c40a..ff34f3c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -979,6 +979,25 @@ struct bpf_sock_ops {
__u32 snd_cwnd;
__u32 srtt_us; /* Averaged RTT << 3 in usecs */
__u32 bpf_sock_ops_cb_flags; /* flags defined in uapi/linux/tcp.h */
+ __u32 state;
+ __u32 rtt_min;
+ __u32 snd_ssthresh;
+ __u32 rcv_nxt;
+ __u32 snd_nxt;
+ __u32 snd_una;
+ __u32 mss_cache;
+ __u32 ecn_flags;
+ __u32 rate_delivered;
+ __u32 rate_interval_us;
+ __u32 packets_out;
+ __u32 retrans_out;
+ __u32 total_retrans;
+ __u32 segs_in;
+ __u32 data_segs_in;
+ __u32 segs_out;
+ __u32 data_segs_out;
+ __u64 bytes_received;
+ __u64 bytes_acked;
};
/* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index c9411dc..98665ba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3849,34 +3849,43 @@ void bpf_warn_invalid_xdp_action(u32 act)
}
EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
-static bool __is_valid_sock_ops_access(int off, int size)
+static bool sock_ops_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
{
+ const int size_default = sizeof(__u32);
+
if (off < 0 || off >= sizeof(struct bpf_sock_ops))
return false;
+
/* The verifier guarantees that size > 0. */
if (off % size != 0)
return false;
- if (size != sizeof(__u32))
- return false;
-
- return true;
-}
-static bool sock_ops_is_valid_access(int off, int size,
- enum bpf_access_type type,
- struct bpf_insn_access_aux *info)
-{
if (type == BPF_WRITE) {
switch (off) {
- case offsetof(struct bpf_sock_ops, op) ...
- offsetof(struct bpf_sock_ops, replylong[3]):
+ case bpf_ctx_range_till(struct bpf_sock_ops, op, replylong[3]):
+ if (size != size_default)
+ return false;
break;
default:
return false;
}
+ } else {
+ switch (off) {
+ case bpf_ctx_range_till(struct bpf_sock_ops, bytes_received,
+ bytes_acked):
+ if (size != sizeof(__u64))
+ return false;
+ break;
+ default:
+ if (size != size_default)
+ return false;
+ break;
+ }
}
- return __is_valid_sock_ops_access(off, size);
+ return true;
}
static int sk_skb_prologue(struct bpf_insn *insn_buf, bool direct_write,
@@ -4493,6 +4502,32 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
is_fullsock));
break;
+ case offsetof(struct bpf_sock_ops, state):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_state) != 1);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct bpf_sock_ops_kern, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern, sk));
+ *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common, skc_state));
+ break;
+
+ case offsetof(struct bpf_sock_ops, rtt_min):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct tcp_sock, rtt_min) !=
+ sizeof(struct minmax));
+ BUILD_BUG_ON(sizeof(struct minmax) <
+ sizeof(struct minmax_sample));
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct bpf_sock_ops_kern, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern, sk));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct tcp_sock, rtt_min) +
+ FIELD_SIZEOF(struct minmax_sample, t));
+ break;
+
/* Helper macro for adding read access to tcp_sock or sock fields. */
#define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ) \
do { \
@@ -4575,6 +4610,79 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
SOCK_OPS_GET_FIELD(bpf_sock_ops_cb_flags, bpf_sock_ops_cb_flags,
struct tcp_sock);
break;
+
+ case offsetof(struct bpf_sock_ops, snd_ssthresh):
+ SOCK_OPS_GET_FIELD(snd_ssthresh, snd_ssthresh, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, rcv_nxt):
+ SOCK_OPS_GET_FIELD(rcv_nxt, rcv_nxt, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, snd_nxt):
+ SOCK_OPS_GET_FIELD(snd_nxt, snd_nxt, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, snd_una):
+ SOCK_OPS_GET_FIELD(snd_una, snd_una, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, mss_cache):
+ SOCK_OPS_GET_FIELD(mss_cache, mss_cache, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, ecn_flags):
+ SOCK_OPS_GET_FIELD(ecn_flags, ecn_flags, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, rate_delivered):
+ SOCK_OPS_GET_FIELD(rate_delivered, rate_delivered,
+ struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, rate_interval_us):
+ SOCK_OPS_GET_FIELD(rate_interval_us, rate_interval_us,
+ struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, packets_out):
+ SOCK_OPS_GET_FIELD(packets_out, packets_out, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, retrans_out):
+ SOCK_OPS_GET_FIELD(retrans_out, retrans_out, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, total_retrans):
+ SOCK_OPS_GET_FIELD(total_retrans, total_retrans,
+ struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, segs_in):
+ SOCK_OPS_GET_FIELD(segs_in, segs_in, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, data_segs_in):
+ SOCK_OPS_GET_FIELD(data_segs_in, data_segs_in, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, segs_out):
+ SOCK_OPS_GET_FIELD(segs_out, segs_out, struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, data_segs_out):
+ SOCK_OPS_GET_FIELD(data_segs_out, data_segs_out,
+ struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, bytes_received):
+ SOCK_OPS_GET_FIELD(bytes_received, bytes_received,
+ struct tcp_sock);
+ break;
+
+ case offsetof(struct bpf_sock_ops, bytes_acked):
+ SOCK_OPS_GET_FIELD(bytes_acked, bytes_acked, struct tcp_sock);
+ break;
}
return insn - insn_buf;
}
--
2.9.5
* Re: [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more
2018-01-20 1:45 ` [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more Lawrence Brakmo
@ 2018-01-24 1:05 ` Daniel Borkmann
2018-01-24 1:27 ` Lawrence Brakmo
0 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2018-01-24 1:05 UTC
To: Lawrence Brakmo, netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet,
Neal Cardwell, Yuchung Cheng
On 01/20/2018 02:45 AM, Lawrence Brakmo wrote:
> Add support for reading many more tcp_sock fields
>
> state, same as sk->sk_state
> rtt_min same as sk->rtt_min.s[0].v (current rtt_min)
> snd_ssthresh
> rcv_nxt
> snd_nxt
> snd_una
> mss_cache
> ecn_flags
> rate_delivered
> rate_interval_us
> packets_out
> retrans_out
> total_retrans
> segs_in
> data_segs_in
> segs_out
> data_segs_out
> bytes_received (__u64)
> bytes_acked (__u64)
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
> include/uapi/linux/bpf.h | 19 +++++++
> net/core/filter.c | 134 ++++++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 140 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2a8c40a..ff34f3c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -979,6 +979,25 @@ struct bpf_sock_ops {
> __u32 snd_cwnd;
> __u32 srtt_us; /* Averaged RTT << 3 in usecs */
> __u32 bpf_sock_ops_cb_flags; /* flags defined in uapi/linux/tcp.h */
> + __u32 state;
> + __u32 rtt_min;
> + __u32 snd_ssthresh;
> + __u32 rcv_nxt;
> + __u32 snd_nxt;
> + __u32 snd_una;
> + __u32 mss_cache;
> + __u32 ecn_flags;
> + __u32 rate_delivered;
> + __u32 rate_interval_us;
> + __u32 packets_out;
> + __u32 retrans_out;
> + __u32 total_retrans;
> + __u32 segs_in;
> + __u32 data_segs_in;
> + __u32 segs_out;
> + __u32 data_segs_out;
Btw, this will have a 4-byte hole in here which the user can otherwise
address out of the prog. Could you add the sk_txhash from the next patch
in between here instead?
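(For illustration, not from the thread: with an odd number of leading
__u32 members, the compiler must pad before the first __u64, e.g.

	struct s {
		__u32 a;	/* offset 0 */
		__u32 b;	/* offset 4 */
		__u32 c;	/* offset 8 */
		/* 4-byte hole: __u64 must be 8-byte aligned */
		__u64 d;	/* offset 16 */
	};

so adding one more __u32 before d closes the hole.)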
> + __u64 bytes_received;
> + __u64 bytes_acked;
> };
>
> /* List of known BPF sock_ops operators.
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c9411dc..98665ba 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3849,34 +3849,43 @@ void bpf_warn_invalid_xdp_action(u32 act)
> }
> EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>
> -static bool __is_valid_sock_ops_access(int off, int size)
> +static bool sock_ops_is_valid_access(int off, int size,
> + enum bpf_access_type type,
> + struct bpf_insn_access_aux *info)
> {
> + const int size_default = sizeof(__u32);
> +
> if (off < 0 || off >= sizeof(struct bpf_sock_ops))
> return false;
> +
> /* The verifier guarantees that size > 0. */
> if (off % size != 0)
> return false;
> - if (size != sizeof(__u32))
> - return false;
> -
> - return true;
> -}
>
> -static bool sock_ops_is_valid_access(int off, int size,
> - enum bpf_access_type type,
> - struct bpf_insn_access_aux *info)
> -{
> if (type == BPF_WRITE) {
> switch (off) {
> - case offsetof(struct bpf_sock_ops, op) ...
> - offsetof(struct bpf_sock_ops, replylong[3]):
> + case bpf_ctx_range_till(struct bpf_sock_ops, op, replylong[3]):
> + if (size != size_default)
> + return false;
> break;
> default:
> return false;
> }
> + } else {
> + switch (off) {
> + case bpf_ctx_range_till(struct bpf_sock_ops, bytes_received,
> + bytes_acked):
> + if (size != sizeof(__u64))
> + return false;
> + break;
> + default:
> + if (size != size_default)
> + return false;
> + break;
> + }
> }
>
> - return __is_valid_sock_ops_access(off, size);
> + return true;
> }
* Re: [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more
2018-01-24 1:05 ` Daniel Borkmann
@ 2018-01-24 1:27 ` Lawrence Brakmo
0 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-24 1:27 UTC
To: Daniel Borkmann, netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet,
Neal Cardwell, Yuchung Cheng
On 1/23/18, 5:05 PM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
On 01/20/2018 02:45 AM, Lawrence Brakmo wrote:
> Add support for reading many more tcp_sock fields
>
> state, same as sk->sk_state
> rtt_min same as sk->rtt_min.s[0].v (current rtt_min)
> snd_ssthresh
> rcv_nxt
> snd_nxt
> snd_una
> mss_cache
> ecn_flags
> rate_delivered
> rate_interval_us
> packets_out
> retrans_out
> total_retrans
> segs_in
> data_segs_in
> segs_out
> data_segs_out
> bytes_received (__u64)
> bytes_acked (__u64)
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
> include/uapi/linux/bpf.h | 19 +++++++
> net/core/filter.c | 134 ++++++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 140 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2a8c40a..ff34f3c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -979,6 +979,25 @@ struct bpf_sock_ops {
> __u32 snd_cwnd;
> __u32 srtt_us; /* Averaged RTT << 3 in usecs */
> __u32 bpf_sock_ops_cb_flags; /* flags defined in uapi/linux/tcp.h */
> + __u32 state;
> + __u32 rtt_min;
> + __u32 snd_ssthresh;
> + __u32 rcv_nxt;
> + __u32 snd_nxt;
> + __u32 snd_una;
> + __u32 mss_cache;
> + __u32 ecn_flags;
> + __u32 rate_delivered;
> + __u32 rate_interval_us;
> + __u32 packets_out;
> + __u32 retrans_out;
> + __u32 total_retrans;
> + __u32 segs_in;
> + __u32 data_segs_in;
> + __u32 segs_out;
> + __u32 data_segs_out;
Btw, this will have a 4-byte hole in here which the user can otherwise
address out of the prog. Could you add the sk_txhash from the next patch
in between here instead?
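For illustration (offsets assume the v6 layout; __u64 fields need
8-byte alignment):

    __u32 data_segs_out;   /* offset 152, ends at 156 */
                           /* 4-byte hole (156..160)  */
    __u64 bytes_received;  /* offset 160 */

Moving a __u32 such as sk_txhash in front of bytes_received fills the
hole instead.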
Good point. Will fix in new patch. Thanks Daniel.
> + __u64 bytes_received;
> + __u64 bytes_acked;
> };
>
> /* List of known BPF sock_ops operators.
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c9411dc..98665ba 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3849,34 +3849,43 @@ void bpf_warn_invalid_xdp_action(u32 act)
> }
> EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>
> -static bool __is_valid_sock_ops_access(int off, int size)
> +static bool sock_ops_is_valid_access(int off, int size,
> + enum bpf_access_type type,
> + struct bpf_insn_access_aux *info)
> {
> + const int size_default = sizeof(__u32);
> +
> if (off < 0 || off >= sizeof(struct bpf_sock_ops))
> return false;
> +
> /* The verifier guarantees that size > 0. */
> if (off % size != 0)
> return false;
> - if (size != sizeof(__u32))
> - return false;
> -
> - return true;
> -}
>
> -static bool sock_ops_is_valid_access(int off, int size,
> - enum bpf_access_type type,
> - struct bpf_insn_access_aux *info)
> -{
> if (type == BPF_WRITE) {
> switch (off) {
> - case offsetof(struct bpf_sock_ops, op) ...
> - offsetof(struct bpf_sock_ops, replylong[3]):
> + case bpf_ctx_range_till(struct bpf_sock_ops, op, replylong[3]):
> + if (size != size_default)
> + return false;
> break;
> default:
> return false;
> }
> + } else {
> + switch (off) {
> + case bpf_ctx_range_till(struct bpf_sock_ops, bytes_received,
> + bytes_acked):
> + if (size != sizeof(__u64))
> + return false;
> + break;
> + default:
> + if (size != size_default)
> + return false;
> + break;
> + }
> }
>
> - return __is_valid_sock_ops_access(off, size);
> + return true;
> }
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH bpf-next v6 08/11] bpf: Add sock_ops R/W access to tclass & sk_txhash
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (6 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 07/11] bpf: Add support for reading sk_state and more Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 09/11] bpf: Add BPF_SOCK_OPS_RETRANS_CB Lawrence Brakmo
` (2 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC (permalink / raw)
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds direct R/W access to sk_txhash and access to tclass for IPv6 flows
through the bpf_getsockopt and bpf_setsockopt helpers. Sample usage for
tclass:
bpf_getsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v, sizeof(v))
where skops is a pointer to the ctx (struct bpf_sock_ops).
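For illustration, a minimal sock_ops program using the new access could
look like this (a sketch, not part of this patch; the program name and
the tclass value 0x80 are arbitrary examples):

    SEC("sockops")
    int set_tclass(struct bpf_sock_ops *skops)
    {
        int v = 0x80;  /* example traffic class value */

        if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)
            /* only succeeds for AF_INET6 sockets */
            bpf_setsockopt(skops, SOL_IPV6, IPV6_TCLASS,
                           &v, sizeof(v));
        skops->reply = 0;
        return 1;
    }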
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/uapi/linux/bpf.h | 1 +
net/core/filter.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ff34f3c..1c80ff4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -998,6 +998,7 @@ struct bpf_sock_ops {
__u32 data_segs_out;
__u64 bytes_received;
__u64 bytes_acked;
+ __u32 sk_txhash;
};
/* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 98665ba..e136796 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3228,6 +3228,29 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
ret = -EINVAL;
}
#ifdef CONFIG_INET
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (level == SOL_IPV6) {
+ if (optlen != sizeof(int) || sk->sk_family != AF_INET6)
+ return -EINVAL;
+
+ val = *((int *)optval);
+ /* Only some options are supported */
+ switch (optname) {
+ case IPV6_TCLASS:
+ if (val < -1 || val > 0xff) {
+ ret = -EINVAL;
+ } else {
+ struct ipv6_pinfo *np = inet6_sk(sk);
+
+ if (val == -1)
+ val = 0;
+ np->tclass = val;
+ }
+ break;
+ default:
+ ret = -EINVAL;
+ }
+#endif
} else if (level == SOL_TCP &&
sk->sk_prot->setsockopt == tcp_setsockopt) {
if (optname == TCP_CONGESTION) {
@@ -3237,7 +3260,8 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
strncpy(name, optval, min_t(long, optlen,
TCP_CA_NAME_MAX-1));
name[TCP_CA_NAME_MAX-1] = 0;
- ret = tcp_set_congestion_control(sk, name, false, reinit);
+ ret = tcp_set_congestion_control(sk, name, false,
+ reinit);
} else {
struct tcp_sock *tp = tcp_sk(sk);
@@ -3303,6 +3327,22 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
} else {
goto err_clear;
}
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (level == SOL_IPV6) {
+ struct ipv6_pinfo *np = inet6_sk(sk);
+
+ if (optlen != sizeof(int) || sk->sk_family != AF_INET6)
+ goto err_clear;
+
+ /* Only some options are supported */
+ switch (optname) {
+ case IPV6_TCLASS:
+ *((int *)optval) = (int)np->tclass;
+ break;
+ default:
+ goto err_clear;
+ }
+#endif
} else {
goto err_clear;
}
@@ -3865,6 +3905,7 @@ static bool sock_ops_is_valid_access(int off, int size,
if (type == BPF_WRITE) {
switch (off) {
case bpf_ctx_range_till(struct bpf_sock_ops, op, replylong[3]):
+ case bpf_ctx_range(struct bpf_sock_ops, sk_txhash):
if (size != size_default)
return false;
break;
@@ -4683,6 +4724,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
case offsetof(struct bpf_sock_ops, bytes_acked):
SOCK_OPS_GET_FIELD(bytes_acked, bytes_acked, struct tcp_sock);
break;
+
+ case offsetof(struct bpf_sock_ops, sk_txhash):
+ SOCK_OPS_GET_OR_SET_FIELD(sk_txhash, sk_txhash,
+ struct sock, type);
+ break;
}
return insn - insn_buf;
}
--
2.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH bpf-next v6 09/11] bpf: Add BPF_SOCK_OPS_RETRANS_CB
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (7 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 08/11] bpf: Add sock_ops R/W access to tclass & sk_txhash Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 10/11] bpf: Add BPF_SOCK_OPS_STATE_CB Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 11/11] bpf: add selftest for tcpbpf Lawrence Brakmo
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC (permalink / raw)
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds support for calling a sock_ops BPF program when there is a
retransmission. Two arguments are used: one for the sequence number and
the other for the number of segments retransmitted. Does not include
SYN-ACK retransmissions.
New op: BPF_SOCK_OPS_RETRANS_CB.
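As with the RTO callback, a program only sees these calls after opting
in with bpf_sock_ops_cb_flags_set(). A minimal sketch (the retrans_cnt
map and the function name are hypothetical, not part of this patch):

    struct bpf_map_def SEC("maps") retrans_cnt = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(__u32),
        .value_size = sizeof(__u64),
        .max_entries = 1,
    };

    SEC("sockops")
    int count_retrans(struct bpf_sock_ops *skops)
    {
        __u32 key = 0;
        __u64 *cnt;

        switch (skops->op) {
        case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
            /* the callback never fires unless this flag is set */
            bpf_sock_ops_cb_flags_set(skops,
                                      BPF_SOCK_OPS_RETRANS_CB_FLAG);
            break;
        case BPF_SOCK_OPS_RETRANS_CB:
            /* args[0] = seq of 1st byte, args[1] = # segments */
            cnt = bpf_map_lookup_elem(&retrans_cnt, &key);
            if (cnt)
                *cnt += skops->args[1]; /* not atomic; fine for a sketch */
            break;
        }
        skops->reply = 0;
        return 1;
    }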
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/uapi/linux/bpf.h | 4 ++++
include/uapi/linux/tcp.h | 3 ++-
net/ipv4/tcp_output.c | 3 +++
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1c80ff4..2f752f3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1039,6 +1039,10 @@ enum {
* Arg2: value of icsk_rto
* Arg3: whether RTO has expired
*/
+ BPF_SOCK_OPS_RETRANS_CB, /* Called when skb is retransmitted.
+ * Arg1: sequence number of 1st byte
+ * Arg2: # segments
+ */
};
#define TCP_BPF_IW 1001 /* Set TCP initial congestion window */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 129032ca..ec03a2b 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -270,7 +270,8 @@ struct tcp_diag_md5sig {
/* Definitions for bpf_sock_ops_cb_flags */
#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x1 /* Mask of all currently
+#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1<<1)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x3 /* Mask of all currently
* supported cb flags
*/
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d12f7f7..f7d34f01 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2908,6 +2908,9 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
if (likely(!err)) {
TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
trace_tcp_retransmit_skb(sk, skb);
+ if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RETRANS_CB_FLAG))
+ tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_RETRANS_CB,
+ TCP_SKB_CB(skb)->seq, segs);
} else if (err != -EBUSY) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRETRANSFAIL);
}
--
2.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH bpf-next v6 10/11] bpf: Add BPF_SOCK_OPS_STATE_CB
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (8 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 09/11] bpf: Add BPF_SOCK_OPS_RETRANS_CB Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 1:45 ` [PATCH bpf-next v6 11/11] bpf: add selftest for tcpbpf Lawrence Brakmo
10 siblings, 0 replies; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC (permalink / raw)
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Adds support for calling a sock_ops BPF program when there is a TCP
state change. Two arguments are used: one for the old state and another
for the new state.
There is a new enum in include/uapi/linux/bpf.h that exports the TCP
states, prepending BPF_ to the current TCP state names. If it ever
becomes necessary to change the internal TCP state values (other than
adding more to the end), the internal value will have to be converted
to the BPF value before calling the BPF sock_ops function. A set of
compile-time checks in tcp.c detects whether the internal and BPF
values differ so we can make the necessary fixes.
New op: BPF_SOCK_OPS_STATE_CB.
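A sketch of how a program might consume the new op (the function name
is a made-up example; the flag must be set on an earlier callback):

    SEC("sockops")
    int watch_state(struct bpf_sock_ops *skops)
    {
        switch (skops->op) {
        case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
            bpf_sock_ops_cb_flags_set(skops,
                                      BPF_SOCK_OPS_STATE_CB_FLAG);
            break;
        case BPF_SOCK_OPS_STATE_CB:
            /* args[0] = old state, args[1] = new state (BPF_TCP_*) */
            if (skops->args[1] == BPF_TCP_CLOSE) {
                /* last chance to snapshot per-flow stats */
            }
            break;
        }
        skops->reply = 0;
        return 1;
    }

The selftest in patch 11 uses this shape to record final stats when the
connection reaches BPF_TCP_CLOSE.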
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
include/uapi/linux/bpf.h | 26 ++++++++++++++++++++++++++
include/uapi/linux/tcp.h | 3 ++-
net/ipv4/tcp.c | 24 ++++++++++++++++++++++++
3 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2f752f3..0e336cf8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1043,6 +1043,32 @@ enum {
* Arg1: sequence number of 1st byte
* Arg2: # segments
*/
+ BPF_SOCK_OPS_STATE_CB, /* Called when TCP changes state.
+ * Arg1: old_state
+ * Arg2: new_state
+ */
+};
+
+/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
+ * changes between the TCP and BPF versions. Ideally this should never happen.
+ * If it does, we need to add code to convert them before calling
+ * the BPF sock_ops function.
+ */
+enum {
+ BPF_TCP_ESTABLISHED = 1,
+ BPF_TCP_SYN_SENT,
+ BPF_TCP_SYN_RECV,
+ BPF_TCP_FIN_WAIT1,
+ BPF_TCP_FIN_WAIT2,
+ BPF_TCP_TIME_WAIT,
+ BPF_TCP_CLOSE,
+ BPF_TCP_CLOSE_WAIT,
+ BPF_TCP_LAST_ACK,
+ BPF_TCP_LISTEN,
+ BPF_TCP_CLOSING, /* Now a valid state */
+ BPF_TCP_NEW_SYN_RECV,
+
+ BPF_TCP_MAX_STATES /* Leave at the end! */
};
#define TCP_BPF_IW 1001 /* Set TCP initial congestion window */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index ec03a2b..cf0b861 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -271,7 +271,8 @@ struct tcp_diag_md5sig {
/* Definitions for bpf_sock_ops_cb_flags */
#define BPF_SOCK_OPS_RTO_CB_FLAG (1<<0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1<<1)
-#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x3 /* Mask of all currently
+#define BPF_SOCK_OPS_STATE_CB_FLAG (1<<2)
+#define BPF_SOCK_OPS_ALL_CB_FLAGS 0x7 /* Mask of all currently
* supported cb flags
*/
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 88b6244..f013ddc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2042,6 +2042,30 @@ void tcp_set_state(struct sock *sk, int state)
{
int oldstate = sk->sk_state;
+ /* We defined a new enum for TCP states that are exported in BPF
+ * so as not to force the internal TCP states to be frozen. The
+ * following checks will detect if an internal state value ever
+ * differs from the BPF value. If this ever happens, then we will
+ * need to remap the internal value to the BPF value before calling
+ * tcp_call_bpf_2arg.
+ */
+ BUILD_BUG_ON((int)BPF_TCP_ESTABLISHED != (int)TCP_ESTABLISHED);
+ BUILD_BUG_ON((int)BPF_TCP_SYN_SENT != (int)TCP_SYN_SENT);
+ BUILD_BUG_ON((int)BPF_TCP_SYN_RECV != (int)TCP_SYN_RECV);
+ BUILD_BUG_ON((int)BPF_TCP_FIN_WAIT1 != (int)TCP_FIN_WAIT1);
+ BUILD_BUG_ON((int)BPF_TCP_FIN_WAIT2 != (int)TCP_FIN_WAIT2);
+ BUILD_BUG_ON((int)BPF_TCP_TIME_WAIT != (int)TCP_TIME_WAIT);
+ BUILD_BUG_ON((int)BPF_TCP_CLOSE != (int)TCP_CLOSE);
+ BUILD_BUG_ON((int)BPF_TCP_CLOSE_WAIT != (int)TCP_CLOSE_WAIT);
+ BUILD_BUG_ON((int)BPF_TCP_LAST_ACK != (int)TCP_LAST_ACK);
+ BUILD_BUG_ON((int)BPF_TCP_LISTEN != (int)TCP_LISTEN);
+ BUILD_BUG_ON((int)BPF_TCP_CLOSING != (int)TCP_CLOSING);
+ BUILD_BUG_ON((int)BPF_TCP_NEW_SYN_RECV != (int)TCP_NEW_SYN_RECV);
+ BUILD_BUG_ON((int)BPF_TCP_MAX_STATES != (int)TCP_MAX_STATES);
+
+ if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
+ tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);
+
switch (state) {
case TCP_ESTABLISHED:
if (oldstate != TCP_ESTABLISHED)
--
2.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH bpf-next v6 11/11] bpf: add selftest for tcpbpf
2018-01-20 1:45 [PATCH bpf-next v6 00/11] bpf: More sock_ops callbacks Lawrence Brakmo
` (9 preceding siblings ...)
2018-01-20 1:45 ` [PATCH bpf-next v6 10/11] bpf: Add BPF_SOCK_OPS_STATE_CB Lawrence Brakmo
@ 2018-01-20 1:45 ` Lawrence Brakmo
2018-01-20 3:59 ` Alexei Starovoitov
10 siblings, 1 reply; 22+ messages in thread
From: Lawrence Brakmo @ 2018-01-20 1:45 UTC (permalink / raw)
To: netdev
Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, Neal Cardwell, Yuchung Cheng
Added a selftest for tcpbpf (sock_ops) that checks that the appropriate
callbacks occurred, that it can access tcp_sock fields, and that their
values are correct.
Run with command: ./test_tcpbpf_user
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
tools/include/uapi/linux/bpf.h | 74 +++++++++++++-
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
tools/testing/selftests/bpf/tcp_client.py | 52 ++++++++++
tools/testing/selftests/bpf/tcp_server.py | 79 +++++++++++++++
tools/testing/selftests/bpf/test_tcpbpf.h | 16 +++
tools/testing/selftests/bpf/test_tcpbpf_kern.c | 131 +++++++++++++++++++++++++
tools/testing/selftests/bpf/test_tcpbpf_user.c | 126 ++++++++++++++++++++++++
8 files changed, 478 insertions(+), 6 deletions(-)
create mode 100755 tools/testing/selftests/bpf/tcp_client.py
create mode 100755 tools/testing/selftests/bpf/tcp_server.py
create mode 100644 tools/testing/selftests/bpf/test_tcpbpf.h
create mode 100644 tools/testing/selftests/bpf/test_tcpbpf_kern.c
create mode 100644 tools/testing/selftests/bpf/test_tcpbpf_user.c
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index af1f49a..0e336cf8 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -17,7 +17,7 @@
#define BPF_ALU64 0x07 /* alu mode in double word width */
/* ld/ldx fields */
-#define BPF_DW 0x18 /* double word */
+#define BPF_DW 0x18 /* double word (64-bit) */
#define BPF_XADD 0xc0 /* exclusive add */
/* alu/jmp fields */
@@ -642,6 +642,14 @@ union bpf_attr {
* @optlen: length of optval in bytes
* Return: 0 or negative error
*
+ * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
+ * Set callback flags for sock_ops
+ * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
+ * @flags: flags value
+ * Return: 0 for no error
+ * -EINVAL if there is no full tcp socket
+ * otherwise, the bits in flags that are not supported by the current kernel
+ *
* int bpf_skb_adjust_room(skb, len_diff, mode, flags)
* Grow or shrink room in sk_buff.
* @skb: pointer to skb
@@ -748,7 +756,8 @@ union bpf_attr {
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
FN(getsockopt), \
- FN(override_return),
+ FN(override_return), \
+ FN(sock_ops_cb_flags_set),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -952,8 +961,9 @@ struct bpf_map_info {
struct bpf_sock_ops {
__u32 op;
union {
- __u32 reply;
- __u32 replylong[4];
+ __u32 args[4]; /* Optionally passed to bpf program */
+ __u32 reply; /* Returned by bpf program */
+ __u32 replylong[4]; /* Optionally returned by bpf prog */
};
__u32 family;
__u32 remote_ip4; /* Stored in network byte order */
@@ -968,6 +978,27 @@ struct bpf_sock_ops {
*/
__u32 snd_cwnd;
__u32 srtt_us; /* Averaged RTT << 3 in usecs */
+ __u32 bpf_sock_ops_cb_flags; /* flags defined in uapi/linux/tcp.h */
+ __u32 state;
+ __u32 rtt_min;
+ __u32 snd_ssthresh;
+ __u32 rcv_nxt;
+ __u32 snd_nxt;
+ __u32 snd_una;
+ __u32 mss_cache;
+ __u32 ecn_flags;
+ __u32 rate_delivered;
+ __u32 rate_interval_us;
+ __u32 packets_out;
+ __u32 retrans_out;
+ __u32 total_retrans;
+ __u32 segs_in;
+ __u32 data_segs_in;
+ __u32 segs_out;
+ __u32 data_segs_out;
+ __u64 bytes_received;
+ __u64 bytes_acked;
+ __u32 sk_txhash;
};
/* List of known BPF sock_ops operators.
@@ -1003,6 +1034,41 @@ enum {
* a congestion threshold. RTTs above
* this indicate congestion
*/
+ BPF_SOCK_OPS_RTO_CB, /* Called when an RTO has triggered.
+ * Arg1: value of icsk_retransmits
+ * Arg2: value of icsk_rto
+ * Arg3: whether RTO has expired
+ */
+ BPF_SOCK_OPS_RETRANS_CB, /* Called when skb is retransmitted.
+ * Arg1: sequence number of 1st byte
+ * Arg2: # segments
+ */
+ BPF_SOCK_OPS_STATE_CB, /* Called when TCP changes state.
+ * Arg1: old_state
+ * Arg2: new_state
+ */
+};
+
+/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
+ * changes between the TCP and BPF versions. Ideally this should never happen.
+ * If it does, we need to add code to convert them before calling
+ * the BPF sock_ops function.
+ */
+enum {
+ BPF_TCP_ESTABLISHED = 1,
+ BPF_TCP_SYN_SENT,
+ BPF_TCP_SYN_RECV,
+ BPF_TCP_FIN_WAIT1,
+ BPF_TCP_FIN_WAIT2,
+ BPF_TCP_TIME_WAIT,
+ BPF_TCP_CLOSE,
+ BPF_TCP_CLOSE_WAIT,
+ BPF_TCP_LAST_ACK,
+ BPF_TCP_LISTEN,
+ BPF_TCP_CLOSING, /* Now a valid state */
+ BPF_TCP_NEW_SYN_RECV,
+
+ BPF_TCP_MAX_STATES /* Leave at the end! */
};
#define TCP_BPF_IW 1001 /* Set TCP initial congestion window */
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3a44b65..9868835 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -14,13 +14,13 @@ CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../i
LDLIBS += -lcap -lelf -lrt
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
- test_align test_verifier_log test_dev_cgroup
+ test_align test_verifier_log test_dev_cgroup test_tcpbpf_user
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
- sample_map_ret0.o
+ sample_map_ret0.o test_tcpbpf_kern.o
TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
test_offload.py
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 33cb00e..dde2c11 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -71,6 +71,8 @@ static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
static int (*bpf_getsockopt)(void *ctx, int level, int optname, void *optval,
int optlen) =
(void *) BPF_FUNC_getsockopt;
+static int (*bpf_sock_ops_cb_flags_set)(void *ctx, int flags) =
+ (void *) BPF_FUNC_sock_ops_cb_flags_set;
static int (*bpf_sk_redirect_map)(void *ctx, void *map, int key, int flags) =
(void *) BPF_FUNC_sk_redirect_map;
static int (*bpf_sock_map_update)(void *map, void *key, void *value,
diff --git a/tools/testing/selftests/bpf/tcp_client.py b/tools/testing/selftests/bpf/tcp_client.py
new file mode 100755
index 0000000..ac2ce32
--- /dev/null
+++ b/tools/testing/selftests/bpf/tcp_client.py
@@ -0,0 +1,52 @@
+#!/usr/local/bin/python
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+
+import sys, os, os.path, getopt
+import socket, time
+import subprocess
+import select
+
+def read(sock, n):
+ buf = ''
+ while len(buf) < n:
+ rem = n - len(buf)
+ try: s = sock.recv(rem)
+ except socket.error: return ''
+ buf += s
+ return buf
+
+def send(sock, s):
+ total = len(s)
+ count = 0
+ while count < total:
+ try: n = sock.send(s)
+ except socket.error: n = 0
+ if n == 0:
+ return count
+ count += n
+ return count
+
+
+serverPort = int(sys.argv[1])
+HostName = socket.gethostname()
+
+time.sleep(1)
+
+# create active socket
+sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+try:
+ sock.connect((HostName, serverPort))
+except socket.error as e:
+ sys.exit(1)
+
+buf = ''
+n = 0
+while n < 1000:
+ buf += '+'
+ n += 1
+
+n = send(sock, buf)
+n = read(sock, 500)
+sys.exit(0)
diff --git a/tools/testing/selftests/bpf/tcp_server.py b/tools/testing/selftests/bpf/tcp_server.py
new file mode 100755
index 0000000..9e5db0d
--- /dev/null
+++ b/tools/testing/selftests/bpf/tcp_server.py
@@ -0,0 +1,79 @@
+#!/usr/local/bin/python
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+
+import sys, os, os.path, getopt
+import socket, time
+import subprocess
+import select
+
+def read(sock, n):
+ buf = ''
+ while len(buf) < n:
+ rem = n - len(buf)
+ try: s = sock.recv(rem)
+ except socket.error: return ''
+ buf += s
+ return buf
+
+def send(sock, s):
+ total = len(s)
+ count = 0
+ while count < total:
+ try: n = sock.send(s)
+ except socket.error: n = 0
+ if n == 0:
+ return count
+ count += n
+ return count
+
+
+SERVER_PORT = 12877
+MAX_PORTS = 2
+
+serverPort = SERVER_PORT
+serverSocket = None
+
+HostName = socket.gethostname()
+
+# create passive socket
+serverSocket = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+host = socket.gethostname()
+
+while serverPort < SERVER_PORT + 5:
+ try: serverSocket.bind((host, serverPort))
+ except socket.error as msg:
+ serverPort += 1
+ continue
+ break
+
+cmdStr = ("./tcp_client.py %d &") % (serverPort)
+os.system(cmdStr)
+
+buf = ''
+n = 0
+while n < 500:
+ buf += '.'
+ n += 1
+
+serverSocket.listen(MAX_PORTS)
+readList = [serverSocket]
+
+while True:
+ readyRead, readyWrite, inError = \
+ select.select(readList, [], [], 10)
+
+ if len(readyRead) > 0:
+ waitCount = 0
+ for sock in readyRead:
+ if sock == serverSocket:
+ (clientSocket, address) = serverSocket.accept()
+ address = str(address[0])
+ readList.append(clientSocket)
+ else:
+ s = read(sock, 1000)
+ n = send(sock, buf)
+ sock.close()
+ time.sleep(1)
+ sys.exit(0)
diff --git a/tools/testing/selftests/bpf/test_tcpbpf.h b/tools/testing/selftests/bpf/test_tcpbpf.h
new file mode 100644
index 0000000..2fe4328
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpbpf.h
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef _TEST_TCPBPF_H
+#define _TEST_TCPBPF_H
+
+struct tcpbpf_globals {
+ __u32 event_map;
+ __u32 total_retrans;
+ __u32 data_segs_in;
+ __u32 data_segs_out;
+ __u32 bad_cb_test_rv;
+ __u32 good_cb_test_rv;
+ __u64 bytes_received;
+ __u64 bytes_acked;
+};
+#endif
diff --git a/tools/testing/selftests/bpf/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
new file mode 100644
index 0000000..15e97f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/in6.h>
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/tcp.h>
+#include <netinet/in.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+#include "test_tcpbpf.h"
+
+struct bpf_map_def SEC("maps") global_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct tcpbpf_globals),
+ .max_entries = 2,
+};
+
+static inline void update_event_map(int event)
+{
+ __u32 key = 0;
+ struct tcpbpf_globals g, *gp;
+
+ gp = bpf_map_lookup_elem(&global_map, &key);
+ if (gp == NULL) {
+ struct tcpbpf_globals g = {0, 0, 0, 0, 0, 0, 0, 0};
+
+ g.event_map |= (1 << event);
+ bpf_map_update_elem(&global_map, &key, &g,
+ BPF_ANY);
+ } else {
+ g = *gp;
+ g.event_map |= (1 << event);
+ bpf_map_update_elem(&global_map, &key, &g,
+ BPF_ANY);
+ }
+}
+
+int _version SEC("version") = 1;
+
+SEC("sockops")
+int bpf_testcb(struct bpf_sock_ops *skops)
+{
+ int rv = -1;
+ int bad_call_rv = 0;
+ int good_call_rv = 0;
+ int op;
+ int init_seq = 0;
+ int v = 0;
+
+ /* For testing purposes, only execute rest of BPF program
+ * if remote port number is in the range 12877..12887
+ * I.e. the active side of the connection
+ */
+ if ((bpf_ntohl(skops->remote_port) < 12877 ||
+ bpf_ntohl(skops->remote_port) >= 12887)) {
+ skops->reply = -1;
+ return 1;
+ }
+
+ op = (int) skops->op;
+
+ update_event_map(op);
+
+ switch (op) {
+ case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+ init_seq = skops->snd_nxt;
+ /* Test failure to set largest cb flag (assumes not defined) */
+ bad_call_rv = bpf_sock_ops_cb_flags_set(skops, 0x80);
+ /* Set callback */
+ good_call_rv = bpf_sock_ops_cb_flags_set(skops,
+ BPF_SOCK_OPS_STATE_CB_FLAG);
+ /* Update results */
+ {
+ __u32 key = 0;
+ struct tcpbpf_globals g, *gp;
+
+ gp = bpf_map_lookup_elem(&global_map, &key);
+ if (!gp)
+ break;
+ g = *gp;
+ g.bad_cb_test_rv = bad_call_rv;
+ g.good_cb_test_rv = good_call_rv;
+ bpf_map_update_elem(&global_map, &key, &g,
+ BPF_ANY);
+ }
+ break;
+ case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+ init_seq = skops->snd_nxt;
+ /* Set callback */
+ good_call_rv = bpf_sock_ops_cb_flags_set(skops,
+ BPF_SOCK_OPS_STATE_CB_FLAG);
+ skops->sk_txhash = 0x12345f;
+ v = 0xff;
+ rv = bpf_setsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v,
+ sizeof(v));
+ break;
+ case BPF_SOCK_OPS_RTO_CB:
+ break;
+ case BPF_SOCK_OPS_RETRANS_CB:
+ break;
+ case BPF_SOCK_OPS_STATE_CB:
+ if (skops->args[1] == BPF_TCP_CLOSE) {
+ __u32 key = 0;
+ struct tcpbpf_globals g, *gp;
+
+ gp = bpf_map_lookup_elem(&global_map, &key);
+ if (!gp)
+ break;
+ g = *gp;
+ g.total_retrans = skops->total_retrans;
+ g.data_segs_in = skops->data_segs_in;
+ g.data_segs_out = skops->data_segs_out;
+ g.bytes_received = skops->bytes_received;
+ g.bytes_acked = skops->bytes_acked;
+ bpf_map_update_elem(&global_map, &key, &g,
+ BPF_ANY);
+ }
+ break;
+ default:
+ rv = -1;
+ }
+ skops->reply = rv;
+ return 1;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_tcpbpf_user.c b/tools/testing/selftests/bpf/test_tcpbpf_user.c
new file mode 100644
index 0000000..64f238a
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpbpf_user.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <string.h>
+#include <assert.h>
+#include <linux/perf_event.h>
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include "bpf_util.h"
+#include <linux/perf_event.h>
+#include "test_tcpbpf.h"
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+ const char *name)
+{
+ struct bpf_map *map;
+
+ map = bpf_object__find_map_by_name(obj, name);
+ if (!map) {
+ printf("%s:FAIL:map '%s' not found\n", test, name);
+ return -1;
+ }
+ return bpf_map__fd(map);
+}
+
+#define SYSTEM(CMD) \
+ do { \
+ if (system(CMD)) { \
+ printf("system(%s) FAILS!\n", CMD); \
+ } \
+ } while (0)
+
+int main(int argc, char **argv)
+{
+ struct tcpbpf_globals g = {0, 0, 0, 0, 0, 0, 0, 0};
+ __u32 key = 0;
+ int rv;
+ int pid;
+ bool debug_flag = false;
+ int error = EXIT_FAILURE;
+ int cg_fd, prog_fd, map_fd;
+ char cmd[100], *dir;
+ const char *file = "test_tcpbpf_kern.o";
+ struct bpf_object *obj;
+ struct stat buffer;
+
+ if (argc > 1 && strcmp(argv[1], "-d") == 0)
+ debug_flag = true;
+
+ dir = "/tmp/cgroupv2/foo";
+
+ if (stat(dir, &buffer) != 0) {
+ SYSTEM("mkdir -p /tmp/cgroupv2");
+ SYSTEM("mount -t cgroup2 none /tmp/cgroupv2");
+ SYSTEM("mkdir -p /tmp/cgroupv2/foo");
+ }
+ pid = (int) getpid();
+ sprintf(cmd, "echo %d >> /tmp/cgroupv2/foo/cgroup.procs", pid);
+ SYSTEM(cmd);
+
+ cg_fd = open(dir, O_RDONLY | O_DIRECTORY);
+ if (bpf_prog_load(file, BPF_PROG_TYPE_SOCK_OPS, &obj, &prog_fd)) {
+ printf("FAILED: load_bpf_file failed for: %s\n", file);
+ goto err;
+ }
+
+ rv = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_SOCK_OPS, 0);
+ if (rv) {
+ printf("FAILED: bpf_prog_attach: %d (%s)\n",
+ rv, strerror(errno));
+ goto err;
+ }
+
+ SYSTEM("./tcp_server.py");
+
+ map_fd = bpf_find_map(__func__, obj, "global_map");
+ if (map_fd < 0)
+ goto err;
+
+ rv = bpf_map_lookup_elem(map_fd, &key, &g);
+ if (rv != 0) {
+ printf("FAILED: bpf_map_lookup_elem returns %d\n", rv);
+ goto err;
+ }
+
+ if (g.bytes_received != 501 || g.bytes_acked != 1002 ||
+ g.data_segs_in != 1 || g.data_segs_out != 1 ||
+ g.event_map != 0x45e || g.bad_cb_test_rv != 0x80 ||
+ g.good_cb_test_rv != 0) {
+ printf("FAILED: Wrong stats\n");
+ if (debug_flag) {
+ printf("\n");
+ printf("bytes_received: %d (expecting 501)\n",
+ (int)g.bytes_received);
+ printf("bytes_acked: %d (expecting 1002)\n",
+ (int)g.bytes_acked);
+ printf("data_segs_in: %d (expecting 1)\n",
+ g.data_segs_in);
+ printf("data_segs_out: %d (expecting 1)\n",
+ g.data_segs_out);
+ printf("event_map: 0x%x (expecting 0x45e)\n",
+ g.event_map);
+ printf("bad_cb_test_rv: 0x%x (expecting 0x80)\n",
+ g.bad_cb_test_rv);
+ printf("good_cb_test_rv:0x%x (expecting 0)\n",
+ g.good_cb_test_rv);
+ }
+ goto err;
+ }
+ printf("PASSED!\n");
+ error = 0;
+err:
+ bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS);
+ return error;
+
+}
--
2.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v6 11/11] bpf: add selftest for tcpbpf
2018-01-20 1:45 ` [PATCH bpf-next v6 11/11] bpf: add selftest for tcpbpf Lawrence Brakmo
@ 2018-01-20 3:59 ` Alexei Starovoitov
0 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2018-01-20 3:59 UTC (permalink / raw)
To: Lawrence Brakmo
Cc: netdev, Kernel Team, Blake Matheny, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, Neal Cardwell, Yuchung Cheng
On Fri, Jan 19, 2018 at 05:45:48PM -0800, Lawrence Brakmo wrote:
> Added a selftest for tcpbpf (sock_ops) that checks that the appropriate
> callbacks occurred, that it can access tcp_sock fields, and that their
> values are correct.
>
> Run with command: ./test_tcpbpf_user
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
...
> + __u32 key = 0;
> + struct tcpbpf_globals g, *gp;
> +
> + gp = bpf_map_lookup_elem(&global_map, &key);
> + if (gp == NULL) {
> + struct tcpbpf_globals g = {0, 0, 0, 0, 0, 0, 0, 0};
> +
> + g.event_map |= (1 << event);
> + bpf_map_update_elem(&global_map, &key, &g,
> + BPF_ANY);
> + } else {
> + g = *gp;
> + g.event_map |= (1 << event);
> + bpf_map_update_elem(&global_map, &key, &g,
> + BPF_ANY);
...
> + __u32 key = 0;
> + struct tcpbpf_globals g, *gp;
> +
> + gp = bpf_map_lookup_elem(&global_map, &key);
> + if (!gp)
> + break;
> + g = *gp;
> + g.bad_cb_test_rv = bad_call_rv;
> + g.good_cb_test_rv = good_call_rv;
> + bpf_map_update_elem(&global_map, &key, &g,
> + BPF_ANY);
Since 'g' is an array of one element and the tests are designed
for a single flow anyway, there is no need to use map_update_elem;
the program can directly assign into the fields like:
gp->bad_cb_test_rv = bad_call_rv;
gp->good_cb_test_rv = good_call_rv;
Probably not worth respinning just for that. Mainly FYI.
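A self-contained sketch of that pattern (same global_map as in the
selftest; the verifier still requires the NULL check):

    __u32 key = 0;
    struct tcpbpf_globals *gp;

    gp = bpf_map_lookup_elem(&global_map, &key);
    if (gp) {
        /* write through the value pointer; no copy-out + update_elem */
        gp->bad_cb_test_rv = bad_call_rv;
        gp->good_cb_test_rv = good_call_rv;
    }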
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply [flat|nested] 22+ messages in thread