* [PATCH net-next 0/5] BPF updates
@ 2014-04-23 20:56 Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 1/5] net: filter: simplify label names from jump-table Daniel Borkmann
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:56 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
I still had these cleanups in my queue from before the merge window.
The set is against the net-next tree, but with 83d5b7ef99 ("net: filter:
initialize A and X registers") applied on top of it, so a merge of net
into net-next would be required *before* applying this set.
The main objective of these updates is to make the code a bit more
readable/comprehensible and to avoid one additional instruction in the
interpreter's fast path.
Tested with Alexei's BPF test suite and the seccomp test suite; no
issues found.
Thanks!
Daniel Borkmann (5):
net: filter: simplify label names from jump-table
net: filter: misc/various cleanups
net: filter: get rid of sock_fprog_kern
net: filter: make register namings more comprehensible
net: filter: optimize BPF migration for ARG1/CTX handling
include/linux/filter.h | 60 ++++--
net/core/filter.c | 564 ++++++++++++++++++++++++-------------------------
net/core/sock_diag.c | 4 +-
3 files changed, 325 insertions(+), 303 deletions(-)
--
1.7.11.7
* [PATCH net-next 1/5] net: filter: simplify label names from jump-table
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
@ 2014-04-23 20:56 ` Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 2/5] net: filter: misc/various cleanups Daniel Borkmann
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:56 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
This patch simplifies the label naming for the BPF jump-table. When we
define labels via DL(), we just concatenate/textify the combination of
the instruction opcode, which consists of the class, subclass, word
size, target register and so on. Each time, we leave the BPF_ prefix
intact, so that e.g. the preprocessor generates a label
BPF_ALU_BPF_ADD_BPF_X for DL(BPF_ALU, BPF_ADD, BPF_X), whereas a label
name of ALU_ADD_X is much easier to grasp.
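As a rough stand-alone user-space sketch of what the reworked macro
boils down to (illustrative only, not part of the patch; DL_INDEX and
DL_LABEL are made-up helper names, and the opcode values mirror the
classic BPF header):

  #include <stdio.h>

  #define BPF_ALU 0x04  /* instruction class */
  #define BPF_ADD 0x00  /* ALU subclass */
  #define BPF_X   0x08  /* source operand */

  /* The index gets the BPF_ prefix pasted back on, the label does not. */
  #define DL_INDEX(A, B, C) (BPF_##A | BPF_##B | BPF_##C)
  #define DL_LABEL(A, B, C) #A "_" #B "_" #C

  int main(void)
  {
          /* DL(ALU, ADD, X) -> [0x0c] = &&ALU_ADD_X */
          printf("[0x%02x] = &&%s\n", DL_INDEX(ALU, ADD, X),
                 DL_LABEL(ALU, ADD, X));
          return 0;
  }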
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
---
include/linux/filter.h | 3 +
net/core/filter.c | 308 ++++++++++++++++++++++++-------------------------
2 files changed, 157 insertions(+), 154 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 759abf7..b042d1d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -37,6 +37,9 @@
#define BPF_CALL 0x80 /* function call */
#define BPF_EXIT 0x90 /* function return */
+/* Placeholder/dummy for 0 */
+#define BPF_0 0
+
/* BPF has 10 general purpose 64-bit registers and stack frame. */
#define MAX_BPF_REG 11
diff --git a/net/core/filter.c b/net/core/filter.c
index 7c4db3d..a1784e9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -156,94 +156,94 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */
-#define DL(A, B, C) [A|B|C] = &&A##_##B##_##C
- DL(BPF_ALU, BPF_ADD, BPF_X),
- DL(BPF_ALU, BPF_ADD, BPF_K),
- DL(BPF_ALU, BPF_SUB, BPF_X),
- DL(BPF_ALU, BPF_SUB, BPF_K),
- DL(BPF_ALU, BPF_AND, BPF_X),
- DL(BPF_ALU, BPF_AND, BPF_K),
- DL(BPF_ALU, BPF_OR, BPF_X),
- DL(BPF_ALU, BPF_OR, BPF_K),
- DL(BPF_ALU, BPF_LSH, BPF_X),
- DL(BPF_ALU, BPF_LSH, BPF_K),
- DL(BPF_ALU, BPF_RSH, BPF_X),
- DL(BPF_ALU, BPF_RSH, BPF_K),
- DL(BPF_ALU, BPF_XOR, BPF_X),
- DL(BPF_ALU, BPF_XOR, BPF_K),
- DL(BPF_ALU, BPF_MUL, BPF_X),
- DL(BPF_ALU, BPF_MUL, BPF_K),
- DL(BPF_ALU, BPF_MOV, BPF_X),
- DL(BPF_ALU, BPF_MOV, BPF_K),
- DL(BPF_ALU, BPF_DIV, BPF_X),
- DL(BPF_ALU, BPF_DIV, BPF_K),
- DL(BPF_ALU, BPF_MOD, BPF_X),
- DL(BPF_ALU, BPF_MOD, BPF_K),
- DL(BPF_ALU, BPF_NEG, 0),
- DL(BPF_ALU, BPF_END, BPF_TO_BE),
- DL(BPF_ALU, BPF_END, BPF_TO_LE),
- DL(BPF_ALU64, BPF_ADD, BPF_X),
- DL(BPF_ALU64, BPF_ADD, BPF_K),
- DL(BPF_ALU64, BPF_SUB, BPF_X),
- DL(BPF_ALU64, BPF_SUB, BPF_K),
- DL(BPF_ALU64, BPF_AND, BPF_X),
- DL(BPF_ALU64, BPF_AND, BPF_K),
- DL(BPF_ALU64, BPF_OR, BPF_X),
- DL(BPF_ALU64, BPF_OR, BPF_K),
- DL(BPF_ALU64, BPF_LSH, BPF_X),
- DL(BPF_ALU64, BPF_LSH, BPF_K),
- DL(BPF_ALU64, BPF_RSH, BPF_X),
- DL(BPF_ALU64, BPF_RSH, BPF_K),
- DL(BPF_ALU64, BPF_XOR, BPF_X),
- DL(BPF_ALU64, BPF_XOR, BPF_K),
- DL(BPF_ALU64, BPF_MUL, BPF_X),
- DL(BPF_ALU64, BPF_MUL, BPF_K),
- DL(BPF_ALU64, BPF_MOV, BPF_X),
- DL(BPF_ALU64, BPF_MOV, BPF_K),
- DL(BPF_ALU64, BPF_ARSH, BPF_X),
- DL(BPF_ALU64, BPF_ARSH, BPF_K),
- DL(BPF_ALU64, BPF_DIV, BPF_X),
- DL(BPF_ALU64, BPF_DIV, BPF_K),
- DL(BPF_ALU64, BPF_MOD, BPF_X),
- DL(BPF_ALU64, BPF_MOD, BPF_K),
- DL(BPF_ALU64, BPF_NEG, 0),
- DL(BPF_JMP, BPF_CALL, 0),
- DL(BPF_JMP, BPF_JA, 0),
- DL(BPF_JMP, BPF_JEQ, BPF_X),
- DL(BPF_JMP, BPF_JEQ, BPF_K),
- DL(BPF_JMP, BPF_JNE, BPF_X),
- DL(BPF_JMP, BPF_JNE, BPF_K),
- DL(BPF_JMP, BPF_JGT, BPF_X),
- DL(BPF_JMP, BPF_JGT, BPF_K),
- DL(BPF_JMP, BPF_JGE, BPF_X),
- DL(BPF_JMP, BPF_JGE, BPF_K),
- DL(BPF_JMP, BPF_JSGT, BPF_X),
- DL(BPF_JMP, BPF_JSGT, BPF_K),
- DL(BPF_JMP, BPF_JSGE, BPF_X),
- DL(BPF_JMP, BPF_JSGE, BPF_K),
- DL(BPF_JMP, BPF_JSET, BPF_X),
- DL(BPF_JMP, BPF_JSET, BPF_K),
- DL(BPF_JMP, BPF_EXIT, 0),
- DL(BPF_STX, BPF_MEM, BPF_B),
- DL(BPF_STX, BPF_MEM, BPF_H),
- DL(BPF_STX, BPF_MEM, BPF_W),
- DL(BPF_STX, BPF_MEM, BPF_DW),
- DL(BPF_STX, BPF_XADD, BPF_W),
- DL(BPF_STX, BPF_XADD, BPF_DW),
- DL(BPF_ST, BPF_MEM, BPF_B),
- DL(BPF_ST, BPF_MEM, BPF_H),
- DL(BPF_ST, BPF_MEM, BPF_W),
- DL(BPF_ST, BPF_MEM, BPF_DW),
- DL(BPF_LDX, BPF_MEM, BPF_B),
- DL(BPF_LDX, BPF_MEM, BPF_H),
- DL(BPF_LDX, BPF_MEM, BPF_W),
- DL(BPF_LDX, BPF_MEM, BPF_DW),
- DL(BPF_LD, BPF_ABS, BPF_W),
- DL(BPF_LD, BPF_ABS, BPF_H),
- DL(BPF_LD, BPF_ABS, BPF_B),
- DL(BPF_LD, BPF_IND, BPF_W),
- DL(BPF_LD, BPF_IND, BPF_H),
- DL(BPF_LD, BPF_IND, BPF_B),
+#define DL(A, B, C) [BPF_##A|BPF_##B|BPF_##C] = &&A##_##B##_##C
+ DL(ALU, ADD, X),
+ DL(ALU, ADD, K),
+ DL(ALU, SUB, X),
+ DL(ALU, SUB, K),
+ DL(ALU, AND, X),
+ DL(ALU, AND, K),
+ DL(ALU, OR, X),
+ DL(ALU, OR, K),
+ DL(ALU, LSH, X),
+ DL(ALU, LSH, K),
+ DL(ALU, RSH, X),
+ DL(ALU, RSH, K),
+ DL(ALU, XOR, X),
+ DL(ALU, XOR, K),
+ DL(ALU, MUL, X),
+ DL(ALU, MUL, K),
+ DL(ALU, MOV, X),
+ DL(ALU, MOV, K),
+ DL(ALU, DIV, X),
+ DL(ALU, DIV, K),
+ DL(ALU, MOD, X),
+ DL(ALU, MOD, K),
+ DL(ALU, NEG, 0),
+ DL(ALU, END, TO_BE),
+ DL(ALU, END, TO_LE),
+ DL(ALU64, ADD, X),
+ DL(ALU64, ADD, K),
+ DL(ALU64, SUB, X),
+ DL(ALU64, SUB, K),
+ DL(ALU64, AND, X),
+ DL(ALU64, AND, K),
+ DL(ALU64, OR, X),
+ DL(ALU64, OR, K),
+ DL(ALU64, LSH, X),
+ DL(ALU64, LSH, K),
+ DL(ALU64, RSH, X),
+ DL(ALU64, RSH, K),
+ DL(ALU64, XOR, X),
+ DL(ALU64, XOR, K),
+ DL(ALU64, MUL, X),
+ DL(ALU64, MUL, K),
+ DL(ALU64, MOV, X),
+ DL(ALU64, MOV, K),
+ DL(ALU64, ARSH, X),
+ DL(ALU64, ARSH, K),
+ DL(ALU64, DIV, X),
+ DL(ALU64, DIV, K),
+ DL(ALU64, MOD, X),
+ DL(ALU64, MOD, K),
+ DL(ALU64, NEG, 0),
+ DL(JMP, CALL, 0),
+ DL(JMP, JA, 0),
+ DL(JMP, JEQ, X),
+ DL(JMP, JEQ, K),
+ DL(JMP, JNE, X),
+ DL(JMP, JNE, K),
+ DL(JMP, JGT, X),
+ DL(JMP, JGT, K),
+ DL(JMP, JGE, X),
+ DL(JMP, JGE, K),
+ DL(JMP, JSGT, X),
+ DL(JMP, JSGT, K),
+ DL(JMP, JSGE, X),
+ DL(JMP, JSGE, K),
+ DL(JMP, JSET, X),
+ DL(JMP, JSET, K),
+ DL(JMP, EXIT, 0),
+ DL(STX, MEM, B),
+ DL(STX, MEM, H),
+ DL(STX, MEM, W),
+ DL(STX, MEM, DW),
+ DL(STX, XADD, W),
+ DL(STX, XADD, DW),
+ DL(ST, MEM, B),
+ DL(ST, MEM, H),
+ DL(ST, MEM, W),
+ DL(ST, MEM, DW),
+ DL(LDX, MEM, B),
+ DL(LDX, MEM, H),
+ DL(LDX, MEM, W),
+ DL(LDX, MEM, DW),
+ DL(LD, ABS, W),
+ DL(LD, ABS, H),
+ DL(LD, ABS, B),
+ DL(LD, IND, W),
+ DL(LD, IND, H),
+ DL(LD, IND, B),
#undef DL
};
@@ -257,93 +257,93 @@ select_insn:
/* ALU */
#define ALU(OPCODE, OP) \
- BPF_ALU64_##OPCODE##_BPF_X: \
+ ALU64_##OPCODE##_X: \
A = A OP X; \
CONT; \
- BPF_ALU_##OPCODE##_BPF_X: \
+ ALU_##OPCODE##_X: \
A = (u32) A OP (u32) X; \
CONT; \
- BPF_ALU64_##OPCODE##_BPF_K: \
+ ALU64_##OPCODE##_K: \
A = A OP K; \
CONT; \
- BPF_ALU_##OPCODE##_BPF_K: \
+ ALU_##OPCODE##_K: \
A = (u32) A OP (u32) K; \
CONT;
- ALU(BPF_ADD, +)
- ALU(BPF_SUB, -)
- ALU(BPF_AND, &)
- ALU(BPF_OR, |)
- ALU(BPF_LSH, <<)
- ALU(BPF_RSH, >>)
- ALU(BPF_XOR, ^)
- ALU(BPF_MUL, *)
+ ALU(ADD, +)
+ ALU(SUB, -)
+ ALU(AND, &)
+ ALU(OR, |)
+ ALU(LSH, <<)
+ ALU(RSH, >>)
+ ALU(XOR, ^)
+ ALU(MUL, *)
#undef ALU
- BPF_ALU_BPF_NEG_0:
+ ALU_NEG_0:
A = (u32) -A;
CONT;
- BPF_ALU64_BPF_NEG_0:
+ ALU64_NEG_0:
A = -A;
CONT;
- BPF_ALU_BPF_MOV_BPF_X:
+ ALU_MOV_X:
A = (u32) X;
CONT;
- BPF_ALU_BPF_MOV_BPF_K:
+ ALU_MOV_K:
A = (u32) K;
CONT;
- BPF_ALU64_BPF_MOV_BPF_X:
+ ALU64_MOV_X:
A = X;
CONT;
- BPF_ALU64_BPF_MOV_BPF_K:
+ ALU64_MOV_K:
A = K;
CONT;
- BPF_ALU64_BPF_ARSH_BPF_X:
+ ALU64_ARSH_X:
(*(s64 *) &A) >>= X;
CONT;
- BPF_ALU64_BPF_ARSH_BPF_K:
+ ALU64_ARSH_K:
(*(s64 *) &A) >>= K;
CONT;
- BPF_ALU64_BPF_MOD_BPF_X:
+ ALU64_MOD_X:
if (unlikely(X == 0))
return 0;
tmp = A;
A = do_div(tmp, X);
CONT;
- BPF_ALU_BPF_MOD_BPF_X:
+ ALU_MOD_X:
if (unlikely(X == 0))
return 0;
tmp = (u32) A;
A = do_div(tmp, (u32) X);
CONT;
- BPF_ALU64_BPF_MOD_BPF_K:
+ ALU64_MOD_K:
tmp = A;
A = do_div(tmp, K);
CONT;
- BPF_ALU_BPF_MOD_BPF_K:
+ ALU_MOD_K:
tmp = (u32) A;
A = do_div(tmp, (u32) K);
CONT;
- BPF_ALU64_BPF_DIV_BPF_X:
+ ALU64_DIV_X:
if (unlikely(X == 0))
return 0;
do_div(A, X);
CONT;
- BPF_ALU_BPF_DIV_BPF_X:
+ ALU_DIV_X:
if (unlikely(X == 0))
return 0;
tmp = (u32) A;
do_div(tmp, (u32) X);
A = (u32) tmp;
CONT;
- BPF_ALU64_BPF_DIV_BPF_K:
+ ALU64_DIV_K:
do_div(A, K);
CONT;
- BPF_ALU_BPF_DIV_BPF_K:
+ ALU_DIV_K:
tmp = (u32) A;
do_div(tmp, (u32) K);
A = (u32) tmp;
CONT;
- BPF_ALU_BPF_END_BPF_TO_BE:
+ ALU_END_TO_BE:
switch (K) {
case 16:
A = (__force u16) cpu_to_be16(A);
@@ -356,7 +356,7 @@ select_insn:
break;
}
CONT;
- BPF_ALU_BPF_END_BPF_TO_LE:
+ ALU_END_TO_LE:
switch (K) {
case 16:
A = (__force u16) cpu_to_le16(A);
@@ -371,7 +371,7 @@ select_insn:
CONT;
/* CALL */
- BPF_JMP_BPF_CALL_0:
+ JMP_CALL_0:
/* Function call scratches R1-R5 registers, preserves R6-R9,
* and stores return value into R0.
*/
@@ -380,122 +380,122 @@ select_insn:
CONT;
/* JMP */
- BPF_JMP_BPF_JA_0:
+ JMP_JA_0:
insn += insn->off;
CONT;
- BPF_JMP_BPF_JEQ_BPF_X:
+ JMP_JEQ_X:
if (A == X) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JEQ_BPF_K:
+ JMP_JEQ_K:
if (A == K) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JNE_BPF_X:
+ JMP_JNE_X:
if (A != X) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JNE_BPF_K:
+ JMP_JNE_K:
if (A != K) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JGT_BPF_X:
+ JMP_JGT_X:
if (A > X) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JGT_BPF_K:
+ JMP_JGT_K:
if (A > K) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JGE_BPF_X:
+ JMP_JGE_X:
if (A >= X) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JGE_BPF_K:
+ JMP_JGE_K:
if (A >= K) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSGT_BPF_X:
- if (((s64)A) > ((s64)X)) {
+ JMP_JSGT_X:
+ if (((s64) A) > ((s64) X)) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSGT_BPF_K:
- if (((s64)A) > ((s64)K)) {
+ JMP_JSGT_K:
+ if (((s64) A) > ((s64) K)) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSGE_BPF_X:
- if (((s64)A) >= ((s64)X)) {
+ JMP_JSGE_X:
+ if (((s64) A) >= ((s64) X)) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSGE_BPF_K:
- if (((s64)A) >= ((s64)K)) {
+ JMP_JSGE_K:
+ if (((s64) A) >= ((s64) K)) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSET_BPF_X:
+ JMP_JSET_X:
if (A & X) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_JSET_BPF_K:
+ JMP_JSET_K:
if (A & K) {
insn += insn->off;
CONT_JMP;
}
CONT;
- BPF_JMP_BPF_EXIT_0:
+ JMP_EXIT_0:
return R0;
/* STX and ST and LDX*/
#define LDST(SIZEOP, SIZE) \
- BPF_STX_BPF_MEM_##SIZEOP: \
+ STX_MEM_##SIZEOP: \
*(SIZE *)(unsigned long) (A + insn->off) = X; \
CONT; \
- BPF_ST_BPF_MEM_##SIZEOP: \
+ ST_MEM_##SIZEOP: \
*(SIZE *)(unsigned long) (A + insn->off) = K; \
CONT; \
- BPF_LDX_BPF_MEM_##SIZEOP: \
+ LDX_MEM_##SIZEOP: \
A = *(SIZE *)(unsigned long) (X + insn->off); \
CONT;
- LDST(BPF_B, u8)
- LDST(BPF_H, u16)
- LDST(BPF_W, u32)
- LDST(BPF_DW, u64)
+ LDST(B, u8)
+ LDST(H, u16)
+ LDST(W, u32)
+ LDST(DW, u64)
#undef LDST
- BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
+ STX_XADD_W: /* lock xadd *(u32 *)(A + insn->off) += X */
atomic_add((u32) X, (atomic_t *)(unsigned long)
(A + insn->off));
CONT;
- BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
+ STX_XADD_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
atomic64_add((u64) X, (atomic64_t *)(unsigned long)
(A + insn->off));
CONT;
- BPF_LD_BPF_ABS_BPF_W: /* R0 = ntohl(*(u32 *) (skb->data + K)) */
+ LD_ABS_W: /* R0 = ntohl(*(u32 *) (skb->data + K)) */
off = K;
load_word:
/* BPF_LD + BPD_ABS and BPF_LD + BPF_IND insns are only
@@ -524,7 +524,7 @@ load_word:
CONT;
}
return 0;
- BPF_LD_BPF_ABS_BPF_H: /* R0 = ntohs(*(u16 *) (skb->data + K)) */
+ LD_ABS_H: /* R0 = ntohs(*(u16 *) (skb->data + K)) */
off = K;
load_half:
ptr = load_pointer((struct sk_buff *) ctx, off, 2, &tmp);
@@ -533,7 +533,7 @@ load_half:
CONT;
}
return 0;
- BPF_LD_BPF_ABS_BPF_B: /* R0 = *(u8 *) (ctx + K) */
+ LD_ABS_B: /* R0 = *(u8 *) (ctx + K) */
off = K;
load_byte:
ptr = load_pointer((struct sk_buff *) ctx, off, 1, &tmp);
@@ -542,13 +542,13 @@ load_byte:
CONT;
}
return 0;
- BPF_LD_BPF_IND_BPF_W: /* R0 = ntohl(*(u32 *) (skb->data + X + K)) */
+ LD_IND_W: /* R0 = ntohl(*(u32 *) (skb->data + X + K)) */
off = K + X;
goto load_word;
- BPF_LD_BPF_IND_BPF_H: /* R0 = ntohs(*(u16 *) (skb->data + X + K)) */
+ LD_IND_H: /* R0 = ntohs(*(u16 *) (skb->data + X + K)) */
off = K + X;
goto load_half;
- BPF_LD_BPF_IND_BPF_B: /* R0 = *(u8 *) (skb->data + X + K) */
+ LD_IND_B: /* R0 = *(u8 *) (skb->data + X + K) */
off = K + X;
goto load_byte;
--
1.7.11.7
* [PATCH net-next 2/5] net: filter: misc/various cleanups
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 1/5] net: filter: simplify label names from jump-table Daniel Borkmann
@ 2014-04-23 20:56 ` Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 3/5] net: filter: get rid of sock_fprog_kern Daniel Borkmann
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:56 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
This contains only some minor misc cleanups. We can spare the extra
variable declaration in __skb_get_pay_offset(), we can mark some
unexpected conditions in the fast-path as unlikely(), the cast in
__get_random_u32() is rather unnecessary, and in __sk_migrate_realloc()
we can remove the memcpy() and do a direct struct assignment instead.
The latter was suggested by Fengguang Wu, found with coccinelle.
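As a tiny stand-alone sketch of the unlikely() and struct-assignment
bits (illustrative only; the struct and function names below are made
up, and unlikely() is spelled out via the usual __builtin_expect()):

  #include <string.h>

  #define unlikely(x) __builtin_expect(!!(x), 0)

  struct filt { unsigned int len; void *prog; };

  static void copy_filt(struct filt *dst, const struct filt *src,
                        size_t avail)
  {
          if (unlikely(avail < sizeof(*src)))  /* rare error path */
                  return;

          /* memcpy(dst, src, sizeof(*src)) becomes a plain assignment */
          *dst = *src;
  }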
Suggested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
---
net/core/filter.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index a1784e9..2fd2293 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -57,9 +57,9 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
ptr = skb_network_header(skb) + k - SKF_NET_OFF;
else if (k >= SKF_LL_OFF)
ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
-
if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
return ptr;
+
return NULL;
}
@@ -68,6 +68,7 @@ static inline void *load_pointer(const struct sk_buff *skb, int k,
{
if (k >= 0)
return skb_header_pointer(skb, k, size, buffer);
+
return bpf_internal_load_pointer_neg_helper(skb, k, size);
}
@@ -596,9 +597,7 @@ static unsigned int pkt_type_offset(void)
static u64 __skb_get_pay_offset(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
{
- struct sk_buff *skb = (struct sk_buff *)(long) ctx;
-
- return __skb_get_poff(skb);
+ return __skb_get_poff((struct sk_buff *)(long) ctx);
}
static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
@@ -609,10 +608,10 @@ static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
if (skb_is_nonlinear(skb))
return 0;
- if (skb->len < sizeof(struct nlattr))
+ if (unlikely(skb->len < sizeof(struct nlattr)))
return 0;
- if (A > skb->len - sizeof(struct nlattr))
+ if (unlikely(A > skb->len - sizeof(struct nlattr)))
return 0;
nla = nla_find((struct nlattr *) &skb->data[A], skb->len - A, X);
@@ -630,14 +629,14 @@ static u64 __skb_get_nlattr_nest(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
if (skb_is_nonlinear(skb))
return 0;
- if (skb->len < sizeof(struct nlattr))
+ if (unlikely(skb->len < sizeof(struct nlattr)))
return 0;
- if (A > skb->len - sizeof(struct nlattr))
+ if (unlikely(A > skb->len - sizeof(struct nlattr)))
return 0;
nla = (struct nlattr *) &skb->data[A];
- if (nla->nla_len > skb->len - A)
+ if (unlikely(nla->nla_len > skb->len - A))
return 0;
nla = nla_find_nested(nla, X);
@@ -655,7 +654,7 @@ static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
/* note that this only generates 32-bit random numbers */
static u64 __get_random_u32(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
{
- return (u64)prandom_u32();
+ return prandom_u32();
}
static bool convert_bpf_extensions(struct sock_filter *fp,
@@ -1472,7 +1471,7 @@ static struct sk_filter *__sk_migrate_realloc(struct sk_filter *fp,
fp_new = sock_kmalloc(sk, len, GFP_KERNEL);
if (fp_new) {
- memcpy(fp_new, fp, sizeof(struct sk_filter));
+ *fp_new = *fp;
/* As we're kepping orig_prog in fp_new along,
* we need to make sure we're not evicting it
* from the old fp.
--
1.7.11.7
* [PATCH net-next 3/5] net: filter: get rid of sock_fprog_kern
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 1/5] net: filter: simplify label names from jump-table Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 2/5] net: filter: misc/various cleanups Daniel Borkmann
@ 2014-04-23 20:56 ` Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 4/5] net: filter: make register namings more comprehensible Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling Daniel Borkmann
4 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:56 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
It is actually cleaner to just get rid of the sock_fprog_kern structure.
It's not really useful, as we can simply use the sock_fprog structure as
we do elsewhere in the kernel. This could throw some sparse false
positives, but getting rid of the structure duplication is probably the
better way to go.
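For reference, a stand-alone sketch of the macro-to-inline conversion
that comes with this change (types trimmed down for illustration; the
real uapi sock_fprog carries a __user annotation on the filter pointer,
which is where the sparse noise would come from):

  #include <stdio.h>

  struct sock_filter {
          unsigned short code;
          unsigned char  jt, jf;
          unsigned int   k;
  };

  struct sock_fprog {
          unsigned short      len;
          struct sock_filter *filter;
  };

  /* was: #define sk_filter_proglen(fprog) \
   *              (fprog->len * sizeof(fprog->filter[0]))
   */
  static inline unsigned int sk_filter_prog_size(const struct sock_fprog *fprog)
  {
          return fprog->len * sizeof(fprog->filter[0]);
  }

  int main(void)
  {
          struct sock_filter insns[4] = { };
          struct sock_fprog fprog = { .len = 4, .filter = insns };

          printf("%u bytes\n", sk_filter_prog_size(&fprog)); /* 32 */
          return 0;
  }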
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
---
include/linux/filter.h | 13 +++++--------
net/core/filter.c | 24 ++++++++++++------------
net/core/sock_diag.c | 4 ++--
3 files changed, 19 insertions(+), 22 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index b042d1d..e58b687 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -67,11 +67,6 @@ struct compat_sock_fprog {
};
#endif
-struct sock_fprog_kern {
- u16 len;
- struct sock_filter *filter;
-};
-
struct sk_buff;
struct sock;
struct seccomp_data;
@@ -80,7 +75,7 @@ struct sk_filter {
atomic_t refcnt;
u32 jited:1, /* Is our filter JIT'ed? */
len:31; /* Number of filter blocks */
- struct sock_fprog_kern *orig_prog; /* Original BPF program */
+ struct sock_fprog *orig_prog; /* Original BPF program */
struct rcu_head rcu;
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct sock_filter_int *filter);
@@ -97,8 +92,10 @@ static inline unsigned int sk_filter_size(unsigned int proglen)
offsetof(struct sk_filter, insns[proglen]));
}
-#define sk_filter_proglen(fprog) \
- (fprog->len * sizeof(fprog->filter[0]))
+static inline unsigned int sk_filter_prog_size(const struct sock_fprog *fprog)
+{
+ return fprog->len * sizeof(fprog->filter[0]);
+}
#define SK_RUN_FILTER(filter, ctx) \
(*filter->bpf_func)(ctx, filter->insnsi)
diff --git a/net/core/filter.c b/net/core/filter.c
index 2fd2293..e5e7717 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1396,17 +1396,17 @@ EXPORT_SYMBOL(sk_chk_filter);
static int sk_store_orig_filter(struct sk_filter *fp,
const struct sock_fprog *fprog)
{
- unsigned int fsize = sk_filter_proglen(fprog);
- struct sock_fprog_kern *fkprog;
+ unsigned int fsize = sk_filter_prog_size(fprog);
+ struct sock_fprog *op;
- fp->orig_prog = kmalloc(sizeof(*fkprog), GFP_KERNEL);
+ fp->orig_prog = kmalloc(sizeof(*op), GFP_KERNEL);
if (!fp->orig_prog)
return -ENOMEM;
- fkprog = fp->orig_prog;
- fkprog->len = fprog->len;
- fkprog->filter = kmemdup(fp->insns, fsize, GFP_KERNEL);
- if (!fkprog->filter) {
+ op = fp->orig_prog;
+ op->len = fprog->len;
+ op->filter = kmemdup(fp->insns, fsize, GFP_KERNEL);
+ if (!op->filter) {
kfree(fp->orig_prog);
return -ENOMEM;
}
@@ -1416,7 +1416,7 @@ static int sk_store_orig_filter(struct sk_filter *fp,
static void sk_release_orig_filter(struct sk_filter *fp)
{
- struct sock_fprog_kern *fprog = fp->orig_prog;
+ struct sock_fprog *fprog = fp->orig_prog;
if (fprog) {
kfree(fprog->filter);
@@ -1599,7 +1599,7 @@ static struct sk_filter *__sk_prepare_filter(struct sk_filter *fp,
int sk_unattached_filter_create(struct sk_filter **pfp,
struct sock_fprog *fprog)
{
- unsigned int fsize = sk_filter_proglen(fprog);
+ unsigned int fsize = sk_filter_prog_size(fprog);
struct sk_filter *fp;
/* Make sure new filter is there and in the right amounts. */
@@ -1651,7 +1651,7 @@ EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
{
struct sk_filter *fp, *old_fp;
- unsigned int fsize = sk_filter_proglen(fprog);
+ unsigned int fsize = sk_filter_prog_size(fprog);
unsigned int sk_fsize = sk_filter_size(fprog->len);
int err;
@@ -1799,7 +1799,7 @@ void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to)
int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
unsigned int len)
{
- struct sock_fprog_kern *fprog;
+ struct sock_fprog *fprog;
struct sk_filter *filter;
int ret = 0;
@@ -1824,7 +1824,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
goto out;
ret = -EFAULT;
- if (copy_to_user(ubuf, fprog->filter, sk_filter_proglen(fprog)))
+ if (copy_to_user(ubuf, fprog->filter, sk_filter_prog_size(fprog)))
goto out;
/* Instead of bytes, the API requests to return the number
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index d7af188..6669077 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -52,7 +52,7 @@ EXPORT_SYMBOL_GPL(sock_diag_put_meminfo);
int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk,
struct sk_buff *skb, int attrtype)
{
- struct sock_fprog_kern *fprog;
+ struct sock_fprog *fprog;
struct sk_filter *filter;
struct nlattr *attr;
unsigned int flen;
@@ -69,7 +69,7 @@ int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk,
goto out;
fprog = filter->orig_prog;
- flen = sk_filter_proglen(fprog);
+ flen = sk_filter_prog_size(fprog);
attr = nla_reserve(skb, attrtype, flen);
if (attr == NULL) {
--
1.7.11.7
* [PATCH net-next 4/5] net: filter: make register namings more comprehensible
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
` (2 preceding siblings ...)
2014-04-23 20:56 ` [PATCH net-next 3/5] net: filter: get rid of sock_fprog_kern Daniel Borkmann
@ 2014-04-23 20:57 ` Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling Daniel Borkmann
4 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:57 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
The current code is a bit hard to parse with regard to which registers
can be used, how they are mapped and how it all plays together. It makes
much more sense to define this a bit more clearly so that the code
becomes more intuitive. This patch cleans that up and makes the naming
more consistent throughout the code. It also allows for moving some of
the defines into the header file. The clearing of the A and X registers
in __sk_run_filter() does not get a particular register name assigned,
as they do not have an 'official' function there but rather just result
from the concrete initial mapping. Since we already use lower-case
letters for the BPF helper functions used with BPF_CALL, be consistent
here as well.
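As a small stand-alone illustration of what the named registers buy us
when emitting a single instruction (not kernel code; the struct layout
follows sock_filter_int as used in this series, and the opcode value is
the usual BPF_ALU64 | BPF_MOV | BPF_X encoding):

  #include <stdio.h>

  enum {
          BPF_REG_0 = 0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4,
          BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9,
          BPF_REG_10, __MAX_BPF_REG,
  };

  #define BPF_REG_A BPF_REG_0
  #define BPF_REG_X BPF_REG_7

  struct sock_filter_int {
          unsigned char code;        /* opcode */
          unsigned char a_reg:4;     /* dest register */
          unsigned char x_reg:4;     /* source register */
          short         off;         /* signed offset */
          int           imm;         /* signed immediate */
  };

  int main(void)
  {
          /* X = A, i.e. BPF_ALU64 | BPF_MOV | BPF_X (0x07 | 0xb0 | 0x08) */
          struct sock_filter_int insn = {
                  .code  = 0x07 | 0xb0 | 0x08,
                  .a_reg = BPF_REG_X,   /* was the bare number 7 */
                  .x_reg = BPF_REG_A,   /* was the bare number 0 */
          };

          printf("code=0x%02x a_reg=%d x_reg=%d\n",
                 insn.code, insn.a_reg, insn.x_reg);
          return 0;
  }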
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
---
include/linux/filter.h | 44 ++++++++--
net/core/filter.c | 215 +++++++++++++++++++++++++------------------------
2 files changed, 145 insertions(+), 114 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e58b687..947dc33 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -40,16 +40,47 @@
/* Placeholder/dummy for 0 */
#define BPF_0 0
+/* Register numbers */
+enum {
+ BPF_REG_0 = 0,
+ BPF_REG_1,
+ BPF_REG_2,
+ BPF_REG_3,
+ BPF_REG_4,
+ BPF_REG_5,
+ BPF_REG_6,
+ BPF_REG_7,
+ BPF_REG_8,
+ BPF_REG_9,
+ BPF_REG_10,
+ __MAX_BPF_REG,
+};
+
/* BPF has 10 general purpose 64-bit registers and stack frame. */
-#define MAX_BPF_REG 11
+#define MAX_BPF_REG __MAX_BPF_REG
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1 BPF_REG_1
+#define BPF_REG_ARG2 BPF_REG_2
+#define BPF_REG_ARG3 BPF_REG_3
+#define BPF_REG_ARG4 BPF_REG_4
+#define BPF_REG_ARG5 BPF_REG_5
+#define BPF_REG_CTX BPF_REG_6
+#define BPF_REG_FP BPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A BPF_REG_0
+#define BPF_REG_X BPF_REG_7
+#define BPF_REG_TMP BPF_REG_8
/* BPF program can access up to 512 bytes of stack space. */
#define MAX_BPF_STACK 512
-/* Arg1, context and stack frame pointer register positions. */
-#define ARG1_REG 1
-#define CTX_REG 6
-#define FP_REG 10
+/* Macro to invoke filter function. */
+#define SK_RUN_FILTER(filter, ctx) (*filter->bpf_func)(ctx, filter->insnsi)
struct sock_filter_int {
__u8 code; /* opcode */
@@ -97,9 +128,6 @@ static inline unsigned int sk_filter_prog_size(const struct sock_fprog *fprog)
return fprog->len * sizeof(fprog->filter[0]);
}
-#define SK_RUN_FILTER(filter, ctx) \
- (*filter->bpf_func)(ctx, filter->insnsi)
-
int sk_filter(struct sock *sk, struct sk_buff *skb);
u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
diff --git a/net/core/filter.c b/net/core/filter.c
index e5e7717..eada3d5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -45,6 +45,27 @@
#include <linux/seccomp.h>
#include <linux/if_vlan.h>
+/* Registers */
+#define R0 regs[BPF_REG_0]
+#define R1 regs[BPF_REG_1]
+#define R2 regs[BPF_REG_2]
+#define R3 regs[BPF_REG_3]
+#define R4 regs[BPF_REG_4]
+#define R5 regs[BPF_REG_5]
+#define R6 regs[BPF_REG_6]
+#define R7 regs[BPF_REG_7]
+#define R8 regs[BPF_REG_8]
+#define R9 regs[BPF_REG_9]
+#define R10 regs[BPF_REG_10]
+
+/* Named registers */
+#define A regs[insn->a_reg]
+#define X regs[insn->x_reg]
+#define FP regs[BPF_REG_FP]
+#define ARG1 regs[BPF_REG_ARG1]
+#define CTX regs[BPF_REG_CTX]
+#define K insn->imm
+
/* No hurry in this branch
*
* Exported for the bpf jit load helper.
@@ -123,13 +144,6 @@ noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
return 0;
}
-/* Register mappings for user programs. */
-#define A_REG 0
-#define X_REG 7
-#define TMP_REG 8
-#define ARG2_REG 2
-#define ARG3_REG 3
-
/**
* __sk_run_filter - run a filter on a given context
* @ctx: buffer to run the filter on
@@ -143,17 +157,6 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
{
u64 stack[MAX_BPF_STACK / sizeof(u64)];
u64 regs[MAX_BPF_REG], tmp;
- void *ptr;
- int off;
-
-#define K insn->imm
-#define A regs[insn->a_reg]
-#define X regs[insn->x_reg]
-#define R0 regs[0]
-
-#define CONT ({insn++; goto select_insn; })
-#define CONT_JMP ({insn++; goto select_insn; })
-
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */
@@ -247,11 +250,18 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
DL(LD, IND, B),
#undef DL
};
+ void *ptr;
+ int off;
- regs[FP_REG] = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
- regs[ARG1_REG] = (u64) (unsigned long) ctx;
- regs[A_REG] = 0;
- regs[X_REG] = 0;
+#define CONT ({ insn++; goto select_insn; })
+#define CONT_JMP ({ insn++; goto select_insn; })
+
+ FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
+ ARG1 = (u64) (unsigned long) ctx;
+
+ /* Register for user BPF programs need to be reset first. */
+ regs[BPF_REG_A] = 0;
+ regs[BPF_REG_X] = 0;
select_insn:
goto *jumptable[insn->code];
@@ -376,8 +386,7 @@ select_insn:
/* Function call scratches R1-R5 registers, preserves R6-R9,
* and stores return value into R0.
*/
- R0 = (__bpf_call_base + insn->imm)(regs[1], regs[2], regs[3],
- regs[4], regs[5]);
+ R0 = (__bpf_call_base + insn->imm)(R1, R2, R3, R4, R5);
CONT;
/* JMP */
@@ -501,7 +510,7 @@ select_insn:
load_word:
/* BPF_LD + BPD_ABS and BPF_LD + BPF_IND insns are only
* appearing in the programs where ctx == skb. All programs
- * keep 'ctx' in regs[CTX_REG] == R6, sk_convert_filter()
+ * keep 'ctx' in regs[BPF_REG_CTX] == R6, sk_convert_filter()
* saves it in R6, internal BPF verifier will check that
* R6 == ctx.
*
@@ -557,13 +566,6 @@ load_byte:
/* If we ever reach this, we have a bug somewhere. */
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
-#undef CONT_JMP
-#undef CONT
-
-#undef R0
-#undef X
-#undef A
-#undef K
}
u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
@@ -595,12 +597,12 @@ static unsigned int pkt_type_offset(void)
return -1;
}
-static u64 __skb_get_pay_offset(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
{
return __skb_get_poff((struct sk_buff *)(long) ctx);
}
-static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
{
struct sk_buff *skb = (struct sk_buff *)(long) ctx;
struct nlattr *nla;
@@ -611,17 +613,17 @@ static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
if (unlikely(skb->len < sizeof(struct nlattr)))
return 0;
- if (unlikely(A > skb->len - sizeof(struct nlattr)))
+ if (unlikely(a > skb->len - sizeof(struct nlattr)))
return 0;
- nla = nla_find((struct nlattr *) &skb->data[A], skb->len - A, X);
+ nla = nla_find((struct nlattr *) &skb->data[a], skb->len - a, x);
if (nla)
return (void *) nla - (void *) skb->data;
return 0;
}
-static u64 __skb_get_nlattr_nest(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+static u64 __skb_get_nlattr_nest(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
{
struct sk_buff *skb = (struct sk_buff *)(long) ctx;
struct nlattr *nla;
@@ -632,27 +634,27 @@ static u64 __skb_get_nlattr_nest(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
if (unlikely(skb->len < sizeof(struct nlattr)))
return 0;
- if (unlikely(A > skb->len - sizeof(struct nlattr)))
+ if (unlikely(a > skb->len - sizeof(struct nlattr)))
return 0;
- nla = (struct nlattr *) &skb->data[A];
- if (unlikely(nla->nla_len > skb->len - A))
+ nla = (struct nlattr *) &skb->data[a];
+ if (unlikely(nla->nla_len > skb->len - a))
return 0;
- nla = nla_find_nested(nla, X);
+ nla = nla_find_nested(nla, x);
if (nla)
return (void *) nla - (void *) skb->data;
return 0;
}
-static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+static u64 __get_raw_cpu_id(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
{
return raw_smp_processor_id();
}
/* note that this only generates 32-bit random numbers */
-static u64 __get_random_u32(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+static u64 __get_random_u32(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
{
return prandom_u32();
}
@@ -667,28 +669,28 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, protocol) != 2);
insn->code = BPF_LDX | BPF_MEM | BPF_H;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, protocol);
insn++;
/* A = ntohs(A) [emitting a nop or swap16] */
insn->code = BPF_ALU | BPF_END | BPF_FROM_BE;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = 16;
break;
case SKF_AD_OFF + SKF_AD_PKTTYPE:
insn->code = BPF_LDX | BPF_MEM | BPF_B;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = pkt_type_offset();
if (insn->off < 0)
return false;
insn++;
insn->code = BPF_ALU | BPF_AND | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = PKT_TYPE_MAX;
break;
@@ -698,13 +700,13 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
insn->code = BPF_LDX | BPF_MEM | BPF_DW;
else
insn->code = BPF_LDX | BPF_MEM | BPF_W;
- insn->a_reg = TMP_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_TMP;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, dev);
insn++;
insn->code = BPF_JMP | BPF_JNE | BPF_K;
- insn->a_reg = TMP_REG;
+ insn->a_reg = BPF_REG_TMP;
insn->imm = 0;
insn->off = 1;
insn++;
@@ -715,8 +717,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, ifindex) != 4);
BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, type) != 2);
- insn->a_reg = A_REG;
- insn->x_reg = TMP_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_TMP;
if (fp->k == SKF_AD_OFF + SKF_AD_IFINDEX) {
insn->code = BPF_LDX | BPF_MEM | BPF_W;
@@ -731,8 +733,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
insn->code = BPF_LDX | BPF_MEM | BPF_W;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, mark);
break;
@@ -740,8 +742,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
insn->code = BPF_LDX | BPF_MEM | BPF_W;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, hash);
break;
@@ -749,8 +751,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, queue_mapping) != 2);
insn->code = BPF_LDX | BPF_MEM | BPF_H;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, queue_mapping);
break;
@@ -759,8 +761,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
insn->code = BPF_LDX | BPF_MEM | BPF_H;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, vlan_tci);
insn++;
@@ -768,16 +770,16 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
if (fp->k == SKF_AD_OFF + SKF_AD_VLAN_TAG) {
insn->code = BPF_ALU | BPF_AND | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = ~VLAN_TAG_PRESENT;
} else {
insn->code = BPF_ALU | BPF_RSH | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = 12;
insn++;
insn->code = BPF_ALU | BPF_AND | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = 1;
}
break;
@@ -789,20 +791,20 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
case SKF_AD_OFF + SKF_AD_RANDOM:
/* arg1 = ctx */
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = ARG1_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_ARG1;
+ insn->x_reg = BPF_REG_CTX;
insn++;
/* arg2 = A */
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = ARG2_REG;
- insn->x_reg = A_REG;
+ insn->a_reg = BPF_REG_ARG2;
+ insn->x_reg = BPF_REG_A;
insn++;
/* arg3 = X */
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = ARG3_REG;
- insn->x_reg = X_REG;
+ insn->a_reg = BPF_REG_ARG3;
+ insn->x_reg = BPF_REG_X;
insn++;
/* Emit call(ctx, arg2=A, arg3=X) */
@@ -828,8 +830,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
insn->code = BPF_ALU | BPF_XOR | BPF_X;
- insn->a_reg = A_REG;
- insn->x_reg = X_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_X;
break;
default:
@@ -879,7 +881,7 @@ int sk_convert_filter(struct sock_filter *prog, int len,
u8 bpf_src;
BUILD_BUG_ON(BPF_MEMWORDS * sizeof(u32) > MAX_BPF_STACK);
- BUILD_BUG_ON(FP_REG + 1 != MAX_BPF_REG);
+ BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
if (len <= 0 || len >= BPF_MAXINSNS)
return -EINVAL;
@@ -896,8 +898,8 @@ do_pass:
if (new_insn) {
new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- new_insn->a_reg = CTX_REG;
- new_insn->x_reg = ARG1_REG;
+ new_insn->a_reg = BPF_REG_CTX;
+ new_insn->x_reg = BPF_REG_ARG1;
}
new_insn++;
@@ -947,8 +949,8 @@ do_pass:
break;
insn->code = fp->code;
- insn->a_reg = A_REG;
- insn->x_reg = X_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_X;
insn->imm = fp->k;
break;
@@ -982,16 +984,16 @@ do_pass:
* in compare insn.
*/
insn->code = BPF_ALU | BPF_MOV | BPF_K;
- insn->a_reg = TMP_REG;
+ insn->a_reg = BPF_REG_TMP;
insn->imm = fp->k;
insn++;
- insn->a_reg = A_REG;
- insn->x_reg = TMP_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_TMP;
bpf_src = BPF_X;
} else {
- insn->a_reg = A_REG;
- insn->x_reg = X_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_X;
insn->imm = fp->k;
bpf_src = BPF_SRC(fp->code);
}
@@ -1026,33 +1028,33 @@ do_pass:
/* ldxb 4 * ([14] & 0xf) is remaped into 6 insns. */
case BPF_LDX | BPF_MSH | BPF_B:
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = TMP_REG;
- insn->x_reg = A_REG;
+ insn->a_reg = BPF_REG_TMP;
+ insn->x_reg = BPF_REG_A;
insn++;
insn->code = BPF_LD | BPF_ABS | BPF_B;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = fp->k;
insn++;
insn->code = BPF_ALU | BPF_AND | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = 0xf;
insn++;
insn->code = BPF_ALU | BPF_LSH | BPF_K;
- insn->a_reg = A_REG;
+ insn->a_reg = BPF_REG_A;
insn->imm = 2;
insn++;
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = X_REG;
- insn->x_reg = A_REG;
+ insn->a_reg = BPF_REG_X;
+ insn->x_reg = BPF_REG_A;
insn++;
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = A_REG;
- insn->x_reg = TMP_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_TMP;
break;
/* RET_K, RET_A are remaped into 2 insns. */
@@ -1062,7 +1064,7 @@ do_pass:
(BPF_RVAL(fp->code) == BPF_K ?
BPF_K : BPF_X);
insn->a_reg = 0;
- insn->x_reg = A_REG;
+ insn->x_reg = BPF_REG_A;
insn->imm = fp->k;
insn++;
@@ -1073,8 +1075,9 @@ do_pass:
case BPF_ST:
case BPF_STX:
insn->code = BPF_STX | BPF_MEM | BPF_W;
- insn->a_reg = FP_REG;
- insn->x_reg = fp->code == BPF_ST ? A_REG : X_REG;
+ insn->a_reg = BPF_REG_FP;
+ insn->x_reg = fp->code == BPF_ST ?
+ BPF_REG_A : BPF_REG_X;
insn->off = -(BPF_MEMWORDS - fp->k) * 4;
break;
@@ -1083,8 +1086,8 @@ do_pass:
case BPF_LDX | BPF_MEM:
insn->code = BPF_LDX | BPF_MEM | BPF_W;
insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
- A_REG : X_REG;
- insn->x_reg = FP_REG;
+ BPF_REG_A : BPF_REG_X;
+ insn->x_reg = BPF_REG_FP;
insn->off = -(BPF_MEMWORDS - fp->k) * 4;
break;
@@ -1093,22 +1096,22 @@ do_pass:
case BPF_LDX | BPF_IMM:
insn->code = BPF_ALU | BPF_MOV | BPF_K;
insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
- A_REG : X_REG;
+ BPF_REG_A : BPF_REG_X;
insn->imm = fp->k;
break;
/* X = A */
case BPF_MISC | BPF_TAX:
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = X_REG;
- insn->x_reg = A_REG;
+ insn->a_reg = BPF_REG_X;
+ insn->x_reg = BPF_REG_A;
break;
/* A = X */
case BPF_MISC | BPF_TXA:
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- insn->a_reg = A_REG;
- insn->x_reg = X_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_X;
break;
/* A = skb->len or X = skb->len */
@@ -1116,16 +1119,16 @@ do_pass:
case BPF_LDX | BPF_W | BPF_LEN:
insn->code = BPF_LDX | BPF_MEM | BPF_W;
insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
- A_REG : X_REG;
- insn->x_reg = CTX_REG;
+ BPF_REG_A : BPF_REG_X;
+ insn->x_reg = BPF_REG_CTX;
insn->off = offsetof(struct sk_buff, len);
break;
/* access seccomp_data fields */
case BPF_LDX | BPF_ABS | BPF_W:
insn->code = BPF_LDX | BPF_MEM | BPF_W;
- insn->a_reg = A_REG;
- insn->x_reg = CTX_REG;
+ insn->a_reg = BPF_REG_A;
+ insn->x_reg = BPF_REG_CTX;
insn->off = fp->k;
break;
--
1.7.11.7
* [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
` (3 preceding siblings ...)
2014-04-23 20:57 ` [PATCH net-next 4/5] net: filter: make register namings more comprehensible Daniel Borkmann
@ 2014-04-23 20:57 ` Daniel Borkmann
2014-04-23 21:59 ` Alexei Starovoitov
4 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 20:57 UTC (permalink / raw)
To: davem; +Cc: ast, netdev
Currently, at initial setup in __sk_run_filter() we initialize the
BPF stack's frame pointer and the CTX register. However, instead of
the CTX register, we initialize the context into ARG1, and during user
filter migration we *always* emit an instruction that copies the
content from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to
set up ctx, A, X for call emission. However, we nevertheless copy CTX
over to ARG1 in these cases. So all in all, we always emit one
instruction at the beginning of the BPF program that could actually
have been avoided to spare this overhead.
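In other words (stand-alone sketch for illustration only; just the
register indices below are taken from this series, everything else is
stubbed out):

  #include <stdint.h>

  #define BPF_REG_ARG1 1
  #define BPF_REG_CTX  6

  /* before: ctx lands in R1, and a generated prologue insn (r6 = r1)
   * later copies it over for ld_abs/ld_ind
   */
  static void setup_before(uint64_t *regs, void *ctx)
  {
          regs[BPF_REG_ARG1] = (uintptr_t) ctx;
  }

  /* after: ctx is seeded directly into R6, no prologue insn emitted */
  static void setup_after(uint64_t *regs, void *ctx)
  {
          regs[BPF_REG_CTX] = (uintptr_t) ctx;
  }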
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
net/core/filter.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index eada3d5..6fed231 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -62,7 +62,6 @@
#define A regs[insn->a_reg]
#define X regs[insn->x_reg]
#define FP regs[BPF_REG_FP]
-#define ARG1 regs[BPF_REG_ARG1]
#define CTX regs[BPF_REG_CTX]
#define K insn->imm
@@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
#define CONT_JMP ({ insn++; goto select_insn; })
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
- ARG1 = (u64) (unsigned long) ctx;
+ CTX = (u64) (unsigned long) ctx;
/* Register for user BPF programs need to be reset first. */
regs[BPF_REG_A] = 0;
@@ -896,13 +895,6 @@ do_pass:
new_insn = new_prog;
fp = prog;
- if (new_insn) {
- new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- new_insn->a_reg = BPF_REG_CTX;
- new_insn->x_reg = BPF_REG_ARG1;
- }
- new_insn++;
-
for (i = 0; i < len; fp++, i++) {
struct sock_filter_int tmp_insns[6] = { };
struct sock_filter_int *insn = tmp_insns;
--
1.7.11.7
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 20:57 ` [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling Daniel Borkmann
@ 2014-04-23 21:59 ` Alexei Starovoitov
2014-04-23 22:21 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2014-04-23 21:59 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David S. Miller, Network Development
On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> Currently, at initial setup in __sk_run_filter() we initialize the
> BPF stack's frame pointer and the CTX register. However, instead of
> the CTX register, we initialize the context into ARG1, and during
> user filter migration we *always* emit an instruction that copies the
> content from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to
> set up ctx, A, X for call emission. However, we nevertheless copy CTX
> over to ARG1 in these cases. So all in all, we always emit one
> instruction at the beginning of the BPF program that could actually
> have been avoided to spare this overhead.
>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
First 4 patches look great, but this one I have to disagree.
See below.
> ---
> net/core/filter.c | 10 +---------
> 1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index eada3d5..6fed231 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -62,7 +62,6 @@
> #define A regs[insn->a_reg]
> #define X regs[insn->x_reg]
> #define FP regs[BPF_REG_FP]
> -#define ARG1 regs[BPF_REG_ARG1]
> #define CTX regs[BPF_REG_CTX]
> #define K insn->imm
>
> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
> #define CONT_JMP ({ insn++; goto select_insn; })
>
> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
> - ARG1 = (u64) (unsigned long) ctx;
> + CTX = (u64) (unsigned long) ctx;
R1 (ARG1) is the register that used to pass first argument to the function.
For seamless kernel->bpf->kernel transition we have to follow calling
convention, so 'void *ctx' has to go into R1 by design.
Storing it into R6 (CTX) will only work for classic filters converted
to extended.
all native ebpf filters will be broken.
In documentation we say:
* R1 - R5 - arguments from BPF program to in-kernel function
so llvm/gcc are following this ebpf ABI.
Calling convention is the same whether to call kernel function from ebpf
or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
By convention ld_abs/ld_ind insns are using implicit input register 'ctx' (R6),
that's why we do a copy from R1 into R6 as a first insn of the
_converted_ filter.
The cost of one insn is not zero, yes, but cost of imbalanced ABI is huge.
For example, I don't know how to teach llvm/gcc to have different ABIs
in such cases.
This dual ABI makes JITs ugly too, since now they must remember to
emit R6=R1 copy. I would like JITs to worry about instructions only
and not to deal with such discrepancies.
The whole thing becomes unclean when kernel to ebpf passes
1st argument in R6, but ebpf to kernel passes 1st argument in R1.
So I would suggest to drop this one for now.
I also want to avoid extra generated R6=R1 for converted filters.
Let's think it through first.
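To spell the convention out (rough sketch only, the helper name below
is just illustrative):

  /* kernel -> ebpf prog:  R1 = ctx, run prog, result is read from R0
   * ebpf prog -> helper:  R1..R5 = args, BPF_CALL, result lands in R0
   *
   * which is exactly the shape the in-kernel helpers already have:
   */
  typedef unsigned long long u64;

  static u64 example_helper(u64 r1_ctx, u64 r2, u64 r3, u64 r4, u64 r5)
  {
          return 0; /* becomes R0 in the calling program */
  }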
> /* Register for user BPF programs need to be reset first. */
> regs[BPF_REG_A] = 0;
> @@ -896,13 +895,6 @@ do_pass:
> new_insn = new_prog;
> fp = prog;
>
> - if (new_insn) {
> - new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> - new_insn->a_reg = BPF_REG_CTX;
> - new_insn->x_reg = BPF_REG_ARG1;
> - }
> - new_insn++;
> -
> for (i = 0; i < len; fp++, i++) {
> struct sock_filter_int tmp_insns[6] = { };
> struct sock_filter_int *insn = tmp_insns;
> --
> 1.7.11.7
>
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 21:59 ` Alexei Starovoitov
@ 2014-04-23 22:21 ` Daniel Borkmann
2014-04-23 22:37 ` Alexei Starovoitov
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 22:21 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David S. Miller, Network Development
On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> Currently, at initial setup in __sk_run_filter() we initialize the
>> BPF stack's frame pointer and the CTX register. However, instead of
>> the CTX register, we initialize the context into ARG1, and during
>> user filter migration we *always* emit an instruction that copies the
>> content from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to
>> set up ctx, A, X for call emission. However, we nevertheless copy CTX
>> over to ARG1 in these cases. So all in all, we always emit one
>> instruction at the beginning of the BPF program that could actually
>> have been avoided to spare this overhead.
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>
> First 4 patches look great, but this one I have to disagree.
> See below.
>
>> ---
>> net/core/filter.c | 10 +---------
>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index eada3d5..6fed231 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -62,7 +62,6 @@
>> #define A regs[insn->a_reg]
>> #define X regs[insn->x_reg]
>> #define FP regs[BPF_REG_FP]
>> -#define ARG1 regs[BPF_REG_ARG1]
>> #define CTX regs[BPF_REG_CTX]
>> #define K insn->imm
>>
>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
>> #define CONT_JMP ({ insn++; goto select_insn; })
>>
>> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>> - ARG1 = (u64) (unsigned long) ctx;
>> + CTX = (u64) (unsigned long) ctx;
>
> R1 (ARG1) is the register that used to pass first argument to the function.
Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
copy ctx over to arg1 for calls, i.e.:
/* arg1 = ctx */
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
insn->a_reg = ARG1_REG;
insn->x_reg = CTX_REG;
insn++;
> For seamless kernel->bpf->kernel transition we have to follow calling
> convention, so 'void *ctx' has to go into R1 by design.
> Storing it into R6 (CTX) will only work for classic filters converted
> to extended.
> all native ebpf filters will be broken.
My objection was that currently, we do _not_ have _any_ users or even kernel
API for _native_ filters, at least not in mainline tree. The _main_ users we
have are currently _all_ being converted, hence this patch. Given that these
calls have likely just a minority of use cases triggered by tcpdump et al,
the majority of users still need to do this overhead/additional work.
> In documentation we say:
> * R1 - R5 - arguments from BPF program to in-kernel function
> so llvm/gcc are following this ebpf ABI.
> Calling convention is the same whether to call kernel function from ebpf
> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
not using llvm/gcc backend here and have the internal instruction set not
exposed to user space, but even there you would need to prepare R1 - R5 to
hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
at that time just as we do with convert_bpf_extensions()?
> By convention ld_abs/ld_ind insns are using implicit input register 'ctx' (R6),
> that's why we do a copy from R1 into R6 as a first insn of the
> _converted_ filter.
Yep, and R6 stays as is here. So ld_abs/ld_ind insns are correctly using 'ctx'.
Best,
Daniel
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 22:21 ` Daniel Borkmann
@ 2014-04-23 22:37 ` Alexei Starovoitov
2014-04-23 22:57 ` Daniel Borkmann
0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2014-04-23 22:37 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David S. Miller, Network Development
On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>
>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>> BPF stack's frame pointer and the CTX register. However, instead of
>>> the CTX register, we initialize the context into ARG1, and during
>>> user filter migration we *always* emit an instruction that copies
>>> the content from ARG1 to CTX. ARG1 is needed in BPF_CALL
>>> instructions to set up ctx, A, X for call emission. However, we
>>> nevertheless copy CTX over to ARG1 in these cases. So all in all, we
>>> always emit one instruction at the beginning of the BPF program that
>>> could actually have been avoided to spare this overhead.
>>>
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>
>>
>> First 4 patches look great, but this one I have to disagree.
>> See below.
>>
>>> ---
>>> net/core/filter.c | 10 +---------
>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index eada3d5..6fed231 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -62,7 +62,6 @@
>>> #define A regs[insn->a_reg]
>>> #define X regs[insn->x_reg]
>>> #define FP regs[BPF_REG_FP]
>>> -#define ARG1 regs[BPF_REG_ARG1]
>>> #define CTX regs[BPF_REG_CTX]
>>> #define K insn->imm
>>>
>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct
>>> sock_filter_int *insn)
>>> #define CONT_JMP ({ insn++; goto select_insn; })
>>>
>>> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>> - ARG1 = (u64) (unsigned long) ctx;
>>> + CTX = (u64) (unsigned long) ctx;
>>
>>
>> R1 (ARG1) is the register that used to pass first argument to the
>> function.
>
>
> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
> copy ctx over to arg1 for calls, i.e.:
>
> /* arg1 = ctx */
>
> insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> insn->a_reg = ARG1_REG;
> insn->x_reg = CTX_REG;
> insn++;
>
>
>> For seamless kernel->bpf->kernel transition we have to follow calling
>> convention, so 'void *ctx' has to go into R1 by design.
>> Storing it into R6 (CTX) will only work for classic filters converted
>> to extended.
>> all native ebpf filters will be broken.
>
>
> My objection was that currently, we do _not_ have _any_ users or even kernel
> API for _native_ filters, at least not in mainline tree. The _main_ users we
> have are currently _all_ being converted, hence this patch. Given that these
> calls have likely just a minority of use cases triggered by tcpdump et al,
> the majority of users still need to do this overhead/additional work.
>
>
>> In documentation we say:
>> * R1 - R5 - arguments from BPF program to in-kernel function
>> so llvm/gcc are following this ebpf ABI.
>> Calling convention is the same whether to call kernel function from ebpf
>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>
>
> Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
> not using llvm/gcc backend here and have the internal instruction set not
> exposed to user space, but even there you would need to prepare R1 - R5 to
> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
> at that time just as we do with convert_bpf_extensions()?
How about then removing extra generated R6=R1 from converted and do
both in __sk_run_filter() ?
regs[ARG1_REG] = (u64) (unsigned long) ctx;
regs[CTX_REG] = (u64) (unsigned long) ctx;
Overhead problem will be solved and ABI is still clean.
>> By convention ld_abs/ld_ind insns are using implicit input register 'ctx'
>> (R6),
>> that's why we do a copy from R1 into R6 as a first insn of the
>> _converted_ filter.
>
>
> Yep, and R6 stays as is here. So ld_abs/ld_ind insns are correctly using
> 'ctx'.
>
> Best,
>
> Daniel
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 22:37 ` Alexei Starovoitov
@ 2014-04-23 22:57 ` Daniel Borkmann
2014-04-23 23:14 ` Alexei Starovoitov
0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-23 22:57 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David S. Miller, Network Development
On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>>> wrote:
>>>>
>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>> BPF stack's frame pointer and the CTX register. However, instead of
>>>> the CTX register, we initialize the context into ARG1, and during
>>>> user filter migration we *always* emit an instruction that copies
>>>> the content from ARG1 to CTX. ARG1 is needed in BPF_CALL
>>>> instructions to set up ctx, A, X for call emission. However, we
>>>> nevertheless copy CTX over to ARG1 in these cases. So all in all,
>>>> we always emit one instruction at the beginning of the BPF program
>>>> that could actually have been avoided to spare this overhead.
>>>>
>>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> First 4 patches look great, but this one I have to disagree.
>>> See below.
>>>
>>>> ---
>>>> net/core/filter.c | 10 +---------
>>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>>
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index eada3d5..6fed231 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -62,7 +62,6 @@
>>>> #define A regs[insn->a_reg]
>>>> #define X regs[insn->x_reg]
>>>> #define FP regs[BPF_REG_FP]
>>>> -#define ARG1 regs[BPF_REG_ARG1]
>>>> #define CTX regs[BPF_REG_CTX]
>>>> #define K insn->imm
>>>>
>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct
>>>> sock_filter_int *insn)
>>>> #define CONT_JMP ({ insn++; goto select_insn; })
>>>>
>>>> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>> - ARG1 = (u64) (unsigned long) ctx;
>>>> + CTX = (u64) (unsigned long) ctx;
>>>
>>>
>>> R1 (ARG1) is the register that used to pass first argument to the
>>> function.
>>
>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
>> copy ctx over to arg1 for calls, i.e.:
>>
>> /* arg1 = ctx */
>>
>> insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>> insn->a_reg = ARG1_REG;
>> insn->x_reg = CTX_REG;
>> insn++;
>>
>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>> convention, so 'void *ctx' has to go into R1 by design.
>>> Storing it into R6 (CTX) will only work for classic filters converted
>>> to extended.
>>> all native ebpf filters will be broken.
>>
>> My objection was that currently, we do _not_ have _any_ users or even kernel
>> API for _native_ filters, at least not in mainline tree. The _main_ users we
>> have are currently _all_ being converted, hence this patch. Given that these
>> calls have likely just a minority of use cases triggered by tcpdump et al,
>> the majority of users still need to do this overhead/additional work.
>>
>>> In documentation we say:
>>> * R1 - R5 - arguments from BPF program to in-kernel function
>>> so llvm/gcc are following this ebpf ABI.
>>> Calling convention is the same whether to call kernel function from ebpf
>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>
>> Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
>> not using llvm/gcc backend here and have the internal instruction set not
>> exposed to user space, but even there you would need to prepare R1 - R5 to
>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
>> at that time just as we do with convert_bpf_extensions()?
>
> How about then removing extra generated R6=R1 from converted and do
> both in __sk_run_filter() ?
> regs[ARG1_REG] = (u64) (unsigned long) ctx;
> regs[CTX_REG] = (u64) (unsigned long) ctx;
>
> Overhead problem will be solved and ABI is still clean.
Well, I just don't understand the ABI concerns here. Given that we do not
have any native BPF code to maintain in the kernel tree, and given that we
currently do not expose the instruction set to user space, we're free to do
what we want, no? Ultimately, what's in the mainline kernel is what dictates
an ABI; so far we have it only in kernel space, and the only current user of
the ABI is the conversion function from user BPF programs. Given that helper
function calls likely happen less often than executing instructions from the
rest of the code, why can't your llvm/gcc backend emit the load of ctx into
arg1? JITs don't need to treat that differently in any way, imho. Simply put,
whoever generates the BPF insns needs to emit ctx into arg1 just as they
prepare the other argX registers. Btw, since we know a priori that
__skb_get_pay_offset() and __get_raw_cpu_id() do not make use of all the
prepared registers, we could even go so far as to avoid preparing the loads
for those two.
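For illustration only, here's a rough sketch of what such an emitted
sequence could look like, modelled on the convert_bpf_extensions()
snippet above; the insn buffer handling and HELPER_IMM are placeholders
of mine, not actual kernel code:

/* Hypothetical generator-side sketch: before emitting a BPF_CALL, copy
 * the context pointer from CTX (R6) into ARG1 (R1), so the interpreter
 * no longer has to seed ARG1 itself. 'insn' points into an output
 * buffer of struct sock_filter_int; HELPER_IMM stands in for however
 * the helper gets identified.
 */

/* arg1 = ctx */
insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
insn->a_reg = ARG1_REG;
insn->x_reg = CTX_REG;
insn++;

/* call the helper */
insn->code = BPF_JMP | BPF_CALL;
insn->imm  = HELPER_IMM;
insn++;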
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 22:57 ` Daniel Borkmann
@ 2014-04-23 23:14 ` Alexei Starovoitov
2014-04-24 3:05 ` Alexei Starovoitov
2014-04-24 5:56 ` Daniel Borkmann
0 siblings, 2 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2014-04-23 23:14 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David S. Miller, Network Development
On Wed, Apr 23, 2014 at 3:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
>>
>> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>>>
>>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>>> CTX register, we initialize context to ARG1, and during user filter
>>>>> migration we emit *always* an instruction that copies the content
>>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>>> at BPF program beginning that should have actually been avoided to
>>>>> spare this overhead.
>>>>>
>>>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>>>
>>>>
>>>> First 4 patches look great, but this one I have to disagree.
>>>> See below.
>>>>
>>>>> ---
>>>>> net/core/filter.c | 10 +---------
>>>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index eada3d5..6fed231 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -62,7 +62,6 @@
>>>>> #define A regs[insn->a_reg]
>>>>> #define X regs[insn->x_reg]
>>>>> #define FP regs[BPF_REG_FP]
>>>>> -#define ARG1 regs[BPF_REG_ARG1]
>>>>> #define CTX regs[BPF_REG_CTX]
>>>>> #define K insn->imm
>>>>>
>>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const
>>>>> struct
>>>>> sock_filter_int *insn)
>>>>> #define CONT_JMP ({ insn++; goto select_insn; })
>>>>>
>>>>> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>>> - ARG1 = (u64) (unsigned long) ctx;
>>>>> + CTX = (u64) (unsigned long) ctx;
>>>>
>>>>
>>>>
>>>> R1 (ARG1) is the register that used to pass first argument to the
>>>> function.
>>>
>>>
>>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we
>>> currently
>>> copy ctx over to arg1 for calls, i.e.:
>>>
>>> /* arg1 = ctx */
>>>
>>> insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>> insn->a_reg = ARG1_REG;
>>> insn->x_reg = CTX_REG;
>>> insn++;
>>>
>>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>>> convention, so 'void *ctx' has to go into R1 by design.
>>>> Storing it into R6 (CTX) will only work for classic filters converted
>>>> to extended.
>>>> all native ebpf filters will be broken.
>>>
>>>
>>> My objection was that currently, we do _not_ have _any_ users or even
>>> kernel
>>> API for _native_ filters, at least not in mainline tree. The _main_ users
>>> we
>>> have are currently _all_ being converted, hence this patch. Given that
>>> these
>>> calls have likely just a minority of use cases triggered by tcpdump et
>>> al,
>>> the majority of users still need to do this overhead/additional work.
>>>
>>>> In documentation we say:
>>>> * R1 - R5 - arguments from BPF program to in-kernel function
>>>> so llvm/gcc are following this ebpf ABI.
>>>> Calling convention is the same whether to call kernel function from ebpf
>>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>>
>>>
>>> Yes, that's clear and convert_bpf_extensions() is doing that. So far
>>> we're
>>> not using llvm/gcc backend here and have the internal instruction set not
>>> exposed to user space, but even there you would need to prepare R1 - R5
>>> to
>>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into
>>> R1
>>> at that time just as we do with convert_bpf_extensions()?
>>
>>
>> How about then removing extra generated R6=R1 from converted and do
>> both in __sk_run_filter() ?
>> regs[ARG1_REG] = (u64) (unsigned long) ctx;
>> regs[CTX_REG] = (u64) (unsigned long) ctx;
>>
>> Overhead problem will be solved and ABI is still clean.
>
>
> Well, I just don't understand the concerns of ABI here. Given that we do not
> have any native BPF code to maintain in the kernel tree and given that we
> currently do not expose the instruction set to user space, we're free to do
> what we want, no? Eventually what's in mainline kernel is that dictates an
> ABI, so far we have it only in kernel space and the only current user of the
> ABI is the conversion function from user BPF programs. Given that helper
Well, all the patches for llvm and the rest were posted.
Obviously they cannot go in at once, but breaking this ABI right now
makes it impossible to continue working on them.
I want to make the eBPF engine useful for tracing filters and so on.
Are you saying forget it, this is a classic BPF engine only?
> function calls may happen less often than executing instructions from the
> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
> arg1? JITs don't need to treat that differently in any way, imho. Simply
I don't think we're on the same page here. The compiler, in the end, is
just a piece of software and can do anything, but an imbalanced eBPF ABI
is really outside the normal compiler workflow.
Pretty much, the compiler would somehow need to generate one way of
passing arguments into a function, while inside the function it has to
expect them in different registers?! That is practically impossible to
do in normal gcc/llvm.
Even leaving the compiler aside:
if you insist on ctx being in R6 only, then please kill the whole
concept of an ABI in the doc. It doesn't make sense to pass a value
into a function in R1 but have the function see it in R6?!
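To make that concrete, here's a purely illustrative sketch of a native
program under the symmetric convention; it assumes the kernel-internal
definitions from linux/filter.h, the field names follow this thread, and
HELPER_IMM is just a placeholder for the helper selector:

/* Illustrative only: a native eBPF program receives ctx in R1 (ARG1),
 * saves it itself into the callee-saved R6 (CTX), and moves it back
 * into R1 before every call, much like a normal C calling convention.
 */
struct sock_filter_int prog[] = {
	/* prologue: r6 = r1, keep ctx across calls */
	{ .code = BPF_ALU64 | BPF_MOV | BPF_X, .a_reg = CTX_REG, .x_reg = ARG1_REG },
	/* ... program body ... */
	/* set up the call: r1 = r6, ctx goes into the first argument */
	{ .code = BPF_ALU64 | BPF_MOV | BPF_X, .a_reg = ARG1_REG, .x_reg = CTX_REG },
	{ .code = BPF_JMP | BPF_CALL, .imm = HELPER_IMM },
	{ .code = BPF_JMP | BPF_EXIT },
};

Under that convention the compiler only has to know one rule: arguments,
including ctx, always arrive in R1 - R5.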
> the one who is generating the BPF insns needs to emit ctx into arg1 just as
> he prepares the other argX registers. Btw, since we know a priori that
> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
> prepared registers, we could even go that far and avoid preparing loads for
> the two.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 23:14 ` Alexei Starovoitov
@ 2014-04-24 3:05 ` Alexei Starovoitov
2014-04-24 8:28 ` David Laight
2014-04-24 5:56 ` Daniel Borkmann
1 sibling, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2014-04-24 3:05 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David S. Miller, Network Development
On Wed, Apr 23, 2014 at 4:14 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Wed, Apr 23, 2014 at 3:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
>>>
>>> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com>
>>> wrote:
>>>>
>>>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>>>>
>>>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>>>> CTX register, we initialize context to ARG1, and during user filter
>>>>>> migration we emit *always* an instruction that copies the content
>>>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>>>> at BPF program beginning that should have actually been avoided to
>>>>>> spare this overhead.
>>>>>>
>>>>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>>>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>>>>
>>>>>
>>>>> First 4 patches look great, but this one I have to disagree.
>>>>> See below.
>>>>>
>>>>>> ---
>>>>>> net/core/filter.c | 10 +---------
>>>>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>> index eada3d5..6fed231 100644
>>>>>> --- a/net/core/filter.c
>>>>>> +++ b/net/core/filter.c
>>>>>> @@ -62,7 +62,6 @@
>>>>>> #define A regs[insn->a_reg]
>>>>>> #define X regs[insn->x_reg]
>>>>>> #define FP regs[BPF_REG_FP]
>>>>>> -#define ARG1 regs[BPF_REG_ARG1]
>>>>>> #define CTX regs[BPF_REG_CTX]
>>>>>> #define K insn->imm
>>>>>>
>>>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const
>>>>>> struct
>>>>>> sock_filter_int *insn)
>>>>>> #define CONT_JMP ({ insn++; goto select_insn; })
>>>>>>
>>>>>> FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>>>> - ARG1 = (u64) (unsigned long) ctx;
>>>>>> + CTX = (u64) (unsigned long) ctx;
>>>>>
>>>>>
>>>>>
>>>>> R1 (ARG1) is the register that used to pass first argument to the
>>>>> function.
>>>>
>>>>
>>>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we
>>>> currently
>>>> copy ctx over to arg1 for calls, i.e.:
>>>>
>>>> /* arg1 = ctx */
>>>>
>>>> insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>>> insn->a_reg = ARG1_REG;
>>>> insn->x_reg = CTX_REG;
>>>> insn++;
>>>>
>>>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>>>> convention, so 'void *ctx' has to go into R1 by design.
>>>>> Storing it into R6 (CTX) will only work for classic filters converted
>>>>> to extended.
>>>>> all native ebpf filters will be broken.
>>>>
>>>>
>>>> My objection was that currently, we do _not_ have _any_ users or even
>>>> kernel
>>>> API for _native_ filters, at least not in mainline tree. The _main_ users
>>>> we
>>>> have are currently _all_ being converted, hence this patch. Given that
>>>> these
>>>> calls have likely just a minority of use cases triggered by tcpdump et
>>>> al,
>>>> the majority of users still need to do this overhead/additional work.
>>>>
>>>>> In documentation we say:
>>>>> * R1 - R5 - arguments from BPF program to in-kernel function
>>>>> so llvm/gcc are following this ebpf ABI.
>>>>> Calling convention is the same whether to call kernel function from ebpf
>>>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>>>
>>>>
>>>> Yes, that's clear and convert_bpf_extensions() is doing that. So far
>>>> we're
>>>> not using llvm/gcc backend here and have the internal instruction set not
>>>> exposed to user space, but even there you would need to prepare R1 - R5
>>>> to
>>>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into
>>>> R1
>>>> at that time just as we do with convert_bpf_extensions()?
>>>
>>>
>>> How about then removing extra generated R6=R1 from converted and do
>>> both in __sk_run_filter() ?
>>> regs[ARG1_REG] = (u64) (unsigned long) ctx;
>>> regs[CTX_REG] = (u64) (unsigned long) ctx;
>>>
>>> Overhead problem will be solved and ABI is still clean.
>>
>>
>> Well, I just don't understand the concerns of ABI here. Given that we do not
>> have any native BPF code to maintain in the kernel tree and given that we
>> currently do not expose the instruction set to user space, we're free to do
>> what we want, no? Eventually what's in mainline kernel is that dictates an
>> ABI, so far we have it only in kernel space and the only current user of the
>> ABI is the conversion function from user BPF programs. Given that helper
>
> well, all the patches for llvm and the rest were posted.
> Obviously they cannot go in at once, but breaking this ABI right now
> makes it impossible to continue working on them.
> I want to make ebpf engine to be useful for tracing filters and so on.
> Are you saying forget it, this is classic bpf engine only?
>
>> function calls may happen less often than executing instructions from the
>> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
>> arg1? JITs don't need to treat that differently in any way, imho. Simply
>
> I don't think we're on the same page here. compiler at the end is just
> a piece of sw and can do everything, but imbalanced ebpf ABI is really
> out of normal compiler work flow.
> Pretty much compiler somehow need to generate one way of passing
> arguments into a function, but inside the function it needs to expect
> them in different registers?! That is practically impossible to do
> in normal gcc/llvm.
Have to correct myself here.
sparc is an architecture where args are passed in %o regs
and received by the callee in %i regs, so there is a precedent.
But that's a unique sliding-register-window architecture.
The eBPF ABI was picked carefully to match the x86/arm ABI one to one,
where in/out regs are the same.
The following patch achieves the same effect:
it saves the one extra copy insn and doesn't break the ABI.
IMO a much better solution.
diff --git a/net/core/filter.c b/net/core/filter.c
index 78a636e60a0b..a12f62a64ed1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -242,6 +242,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
regs[FP_REG] = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
regs[ARG1_REG] = (u64) (unsigned long) ctx;
+ regs[CTX_REG] = regs[ARG1_REG];
select_insn:
goto *jumptable[insn->code];
@@ -893,13 +894,6 @@ do_pass:
new_insn = new_prog;
fp = prog;
- if (new_insn) {
- new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
- new_insn->a_reg = CTX_REG;
- new_insn->x_reg = ARG1_REG;
- }
- new_insn++;
-
for (i = 0; i < len; fp++, i++) {
struct sock_filter_int tmp_insns[6] = { };
struct sock_filter_int *insn = tmp_insns;
> Even without any compiler insight.
> If you insist on ctx to be in R6 only, then please kill the whole
> concept of ABI in the doc. It doesn't make sense to pass a value
> into a function in R1, but the function will see in R6?!
>
>> the one who is generating the BPF insns needs to emit ctx into arg1 just as
>> he prepares the other argX registers. Btw, since we know a priori that
>> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
>> prepared registers, we could even go that far and avoid preparing loads for
>> the two.
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-23 23:14 ` Alexei Starovoitov
2014-04-24 3:05 ` Alexei Starovoitov
@ 2014-04-24 5:56 ` Daniel Borkmann
1 sibling, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2014-04-24 5:56 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David S. Miller, Network Development
On 04/24/2014 01:14 AM, Alexei Starovoitov wrote:
...
>> Well, I just don't understand the concerns of ABI here. Given that we do not
>> have any native BPF code to maintain in the kernel tree and given that we
>> currently do not expose the instruction set to user space, we're free to do
>> what we want, no? Eventually what's in mainline kernel is that dictates an
>> ABI, so far we have it only in kernel space and the only current user of the
>> ABI is the conversion function from user BPF programs. Given that helper
>
> well, all the patches for llvm and the rest were posted.
> Obviously they cannot go in at once, but breaking this ABI right now
> makes it impossible to continue working on them.
> I want to make ebpf engine to be useful for tracing filters and so on.
> Are you saying forget it, this is classic bpf engine only?
Nope, I'm not saying that at all! Since tracing has something similar, yes,
I already stated that it would probably be a good idea, i.e. once we can
JIT compile it.
>> function calls may happen less often than executing instructions from the
>> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
>> arg1? JITs don't need to treat that differently in any way, imho. Simply
>
> I don't think we're on the same page here. compiler at the end is just
> a piece of sw and can do everything, but imbalanced ebpf ABI is really
> out of normal compiler work flow.
> Pretty much compiler somehow need to generate one way of passing
> arguments into a function, but inside the function it needs to expect
> them in different registers?! That is practically impossible to do
> in normal gcc/llvm.
> Even without any compiler insight.
> If you insist on ctx to be in R6 only, then please kill the whole
> concept of ABI in the doc. It doesn't make sense to pass a value
> into a function in R1, but the function will see in R6?!
Sorry, I don't think that this is what I claimed. I don't have a strong
opinion on this particular patch; I just wanted to clarify what we have
in terms of an ABI here, as there's no particular freeze from the code we
have in the kernel. Anyway, I'll change it to the other variant and respin
the set, as that will already spare us a bit of unnecessary work in the
fast-path.
>> the one who is generating the BPF insns needs to emit ctx into arg1 just as
>> he prepares the other argX registers. Btw, since we know a priori that
>> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
>> prepared registers, we could even go that far and avoid preparing loads for
>> the two.
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-24 3:05 ` Alexei Starovoitov
@ 2014-04-24 8:28 ` David Laight
2014-04-24 15:55 ` David Miller
0 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2014-04-24 8:28 UTC (permalink / raw)
To: 'Alexei Starovoitov', Daniel Borkmann
Cc: David S. Miller, Network Development
From: Alexei Starovoitov
...
> Have to correct myself here.
> sparc is an architecture where args are passed in %o regs
> and received by the callee in %i regs, so there is a precedent.
...
Not actually true, the parameters are still in the %o regs
in the called function.
It is the called function that rotates the register window,
not the act of making the function call.
Small leaf functions will typically use the existing register
window.
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
2014-04-24 8:28 ` David Laight
@ 2014-04-24 15:55 ` David Miller
0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2014-04-24 15:55 UTC (permalink / raw)
To: David.Laight; +Cc: ast, dborkman, netdev
From: David Laight <David.Laight@ACULAB.COM>
Date: Thu, 24 Apr 2014 08:28:56 +0000
> From: Alexei Starovoitov
> ...
>> Have to correct myself here.
>> sparc is an architecture where args are passed in %o regs
>> and received by the callee in %i regs, so there is a precedent.
> ...
>
> Not actually true, the parameters are still in the %o regs
> in the called function.
> It is the called function that rotates the register window,
> not the act of making the function call.
> Small leaf functions will typically use the existing register
> window.
Correct.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread
Thread overview: 15+ messages
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 1/5] net: filter: simplify label names from jump-table Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 2/5] net: filter: misc/various cleanups Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 3/5] net: filter: get rid of sock_fprog_kern Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 4/5] net: filter: make register namings more comprehensible Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling Daniel Borkmann
2014-04-23 21:59 ` Alexei Starovoitov
2014-04-23 22:21 ` Daniel Borkmann
2014-04-23 22:37 ` Alexei Starovoitov
2014-04-23 22:57 ` Daniel Borkmann
2014-04-23 23:14 ` Alexei Starovoitov
2014-04-24 3:05 ` Alexei Starovoitov
2014-04-24 8:28 ` David Laight
2014-04-24 15:55 ` David Miller
2014-04-24 5:56 ` Daniel Borkmann