Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next 09/12] nfp: bpf: fix return address from register-saving subroutine to callee
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
goes to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translated the caller, as we do not always know the start offset of the
callee yet.

This patch implement the required fixup step for passing the start
offset in the callee via the register used by the subroutine to hold its
return address.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index e8b03d8f54f7..74423d3e714d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3340,10 +3340,25 @@ static const instr_cb_t instr_cb[256] = {
 };
 
 /* --- Assembler logic --- */
+static int
+nfp_fixup_immed_relo(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		     struct nfp_insn_meta *jmp_dst, u32 br_idx)
+{
+	if (immed_get_value(nfp_prog->prog[br_idx + 1])) {
+		pr_err("BUG: failed to fix up callee register saving\n");
+		return -EINVAL;
+	}
+
+	immed_set_value(&nfp_prog->prog[br_idx + 1], jmp_dst->off);
+
+	return 0;
+}
+
 static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 {
 	struct nfp_insn_meta *meta, *jmp_dst;
 	u32 idx, br_idx;
+	int err;
 
 	list_for_each_entry(meta, &nfp_prog->insns, l) {
 		if (meta->skip)
@@ -3380,7 +3395,7 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 
 		/* Leave special branches for later */
 		if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
-		    RELO_BR_REL)
+		    RELO_BR_REL && !is_mbpf_pseudo_call(meta))
 			continue;
 
 		if (!meta->jmp_dst) {
@@ -3395,6 +3410,17 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 			return -ELOOP;
 		}
 
+		if (is_mbpf_pseudo_call(meta)) {
+			err = nfp_fixup_immed_relo(nfp_prog, meta,
+						   jmp_dst, br_idx);
+			if (err)
+				return err;
+		}
+
+		if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
+		    RELO_BR_REL)
+			continue;
+
 		for (idx = meta->off; idx <= br_idx; idx++) {
 			if (!nfp_is_br(nfp_prog->prog[idx]))
 				continue;
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 08/12] nfp: bpf: update fixup function for BPF-to-BPF calls support
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: netdev, oss-drivers, Quentin Monnet, Jiong Wang
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

Relocation for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function in that regard.

When checking that the last instruction of each bloc is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 25 ++++++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  2 ++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 2d2c9148bd44..e8b03d8f54f7 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3116,7 +3116,7 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 static int
 bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	u32 ret_tgt, stack_depth;
+	u32 ret_tgt, stack_depth, offset_br;
 	swreg tmp_reg;
 
 	stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN);
@@ -3160,6 +3160,7 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
 	emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
 		     RELO_BR_GO_CALL_PUSH_REGS);
+	offset_br = nfp_prog_current_offset(nfp_prog);
 	wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
 	wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL);
 
@@ -3176,6 +3177,9 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 		wrp_nops(nfp_prog, 3);
 	}
 
+	meta->num_insns_after_br = nfp_prog_current_offset(nfp_prog);
+	meta->num_insns_after_br -= offset_br;
+
 	return 0;
 }
 
@@ -3344,21 +3348,36 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 	list_for_each_entry(meta, &nfp_prog->insns, l) {
 		if (meta->skip)
 			continue;
-		if (meta->insn.code == (BPF_JMP | BPF_CALL))
-			continue;
 		if (BPF_CLASS(meta->insn.code) != BPF_JMP)
 			continue;
+		if (meta->insn.code == (BPF_JMP | BPF_EXIT) &&
+		    !nfp_is_main_function(meta))
+			continue;
+		if (is_mbpf_helper_call(meta))
+			continue;
 
 		if (list_is_last(&meta->l, &nfp_prog->insns))
 			br_idx = nfp_prog->last_bpf_off;
 		else
 			br_idx = list_next_entry(meta, l)->off - 1;
 
+		/* For BPF-to-BPF function call, a stack adjustment sequence is
+		 * generated after the return instruction. Therefore, we must
+		 * withdraw the length of this sequence to have br_idx pointing
+		 * to where the "branch" NFP instruction is expected to be.
+		 */
+		if (is_mbpf_pseudo_call(meta))
+			br_idx -= meta->num_insns_after_br;
+
 		if (!nfp_is_br(nfp_prog->prog[br_idx])) {
 			pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n",
 			       br_idx, meta->insn.code, nfp_prog->prog[br_idx]);
 			return -ELOOP;
 		}
+
+		if (meta->insn.code == (BPF_JMP | BPF_EXIT))
+			continue;
+
 		/* Leave special branches for later */
 		if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
 		    RELO_BR_REL)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index d9695bc316dd..1cef5136c198 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -283,6 +283,7 @@ struct nfp_bpf_reg_state {
  * @xadd_maybe_16bit: 16bit immediate is possible
  * @jmp_dst: destination info for jump instructions
  * @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB
+ * @num_insns_after_br: number of insns following a branch jump, used for fixup
  * @func_id: function id for call instructions
  * @arg1: arg1 for call instructions
  * @arg2: arg2 for call instructions
@@ -319,6 +320,7 @@ struct nfp_insn_meta {
 		struct {
 			struct nfp_insn_meta *jmp_dst;
 			bool jump_neg_op;
+			u32 num_insns_after_br; /* only for BPF-to-BPF calls */
 		};
 		/* function calls */
 		struct {
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 07/12] nfp: bpf: account for additional stack usage when checking stack limit
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.

In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  8 ---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 68 +++++++++++++++++++++++
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 2ebd13b29c97..49c7bead8113 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -252,17 +252,9 @@ nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net *nn,
 static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
 {
 	struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
-	unsigned int stack_size;
 	unsigned int max_instr;
 	int err;
 
-	stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
-	if (prog->aux->stack_depth > stack_size) {
-		nn_info(nn, "stack too large: program %dB > FW stack %dB\n",
-			prog->aux->stack_depth, stack_size);
-		return -EOPNOTSUPP;
-	}
-
 	max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
 	nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index cc1b2c601f4e..81a463726d55 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -34,10 +34,12 @@
 #include <linux/bpf.h>
 #include <linux/bpf_verifier.h>
 #include <linux/kernel.h>
+#include <linux/netdevice.h>
 #include <linux/pkt_cls.h>
 
 #include "../nfp_app.h"
 #include "../nfp_main.h"
+#include "../nfp_net.h"
 #include "fw.h"
 #include "main.h"
 
@@ -662,10 +664,67 @@ nfp_assign_subprog_idx(struct bpf_verifier_env *env, struct nfp_prog *nfp_prog)
 	return 0;
 }
 
+static unsigned int
+nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned int cnt)
+{
+	struct nfp_insn_meta *meta = nfp_prog_first_meta(nfp_prog);
+	unsigned int max_depth = 0, depth = 0, frame = 0;
+	struct nfp_insn_meta *ret_insn[MAX_CALL_FRAMES];
+	unsigned short frame_depths[MAX_CALL_FRAMES];
+	unsigned short ret_prog[MAX_CALL_FRAMES];
+	unsigned short idx = meta->subprog_idx;
+
+	/* Inspired from check_max_stack_depth() from kernel verifier.
+	 * Starting from main subprogram, walk all instructions and recursively
+	 * walk all callees that given subprogram can call. Since recursion is
+	 * prevented by the kernel verifier, this algorithm only needs a local
+	 * stack of MAX_CALL_FRAMES to remember callsites.
+	 */
+process_subprog:
+	frame_depths[frame] = nfp_prog->subprog[idx].stack_depth;
+	frame_depths[frame] = round_up(frame_depths[frame], STACK_FRAME_ALIGN);
+	depth += frame_depths[frame];
+	max_depth = max(max_depth, depth);
+
+continue_subprog:
+	for (; meta != nfp_prog_last_meta(nfp_prog) && meta->subprog_idx == idx;
+	     meta = nfp_meta_next(meta)) {
+		if (!is_mbpf_pseudo_call(meta))
+			continue;
+
+		/* We found a call to a subprogram. Remember instruction to
+		 * return to and subprog id.
+		 */
+		ret_insn[frame] = nfp_meta_next(meta);
+		ret_prog[frame] = idx;
+
+		/* Find the callee and start processing it. */
+		meta = nfp_bpf_goto_meta(nfp_prog, meta,
+					 meta->n + 1 + meta->insn.imm, cnt);
+		idx = meta->subprog_idx;
+		frame++;
+		goto process_subprog;
+	}
+	/* End of for() loop means the last instruction of the subprog was
+	 * reached. If we popped all stack frames, return; otherwise, go on
+	 * processing remaining instructions from the caller.
+	 */
+	if (frame == 0)
+		return max_depth;
+
+	depth -= frame_depths[frame];
+	frame--;
+	meta = ret_insn[frame];
+	idx = ret_prog[frame];
+	goto continue_subprog;
+}
+
 static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
+	unsigned int stack_size, stack_needed;
 	struct bpf_subprog_info *info;
 	struct nfp_prog *nfp_prog;
+	struct nfp_net *nn;
 	int i;
 
 	nfp_prog = env->prog->aux->offload->dev_priv;
@@ -690,6 +749,15 @@ static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 		nfp_prog->subprog[i].stack_depth += BPF_REG_SIZE * 4;
 	}
 
+	nn = netdev_priv(env->prog->aux->offload->netdev);
+	stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
+	stack_needed = nfp_bpf_get_stack_usage(nfp_prog, env->prog->len);
+	if (stack_needed > stack_size) {
+		pr_vlog(env, "stack too large: program %dB > FW stack %dB\n",
+			stack_needed, stack_size);
+		return -EOPNOTSUPP;
+	}
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 06/12] nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: netdev, oss-drivers, Quentin Monnet, Jiong Wang
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.

The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.

For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).

Return to the caller is performed with the RTN instruction newly added
to the JIT.

For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.

Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 235 +++++++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |  20 ++
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |   1 -
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  34 +++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h      |   9 +
 5 files changed, 295 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index ccb80a5ac828..2d2c9148bd44 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -267,6 +267,38 @@ emit_br_bset(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr, u8 defer)
 }
 
 static void
+__emit_br_alu(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
+	      u8 defer, bool dst_lmextn, bool src_lmextn)
+{
+	u64 insn;
+
+	insn = OP_BR_ALU_BASE |
+		FIELD_PREP(OP_BR_ALU_A_SRC, areg) |
+		FIELD_PREP(OP_BR_ALU_B_SRC, breg) |
+		FIELD_PREP(OP_BR_ALU_DEFBR, defer) |
+		FIELD_PREP(OP_BR_ALU_IMM_HI, imm_hi) |
+		FIELD_PREP(OP_BR_ALU_SRC_LMEXTN, src_lmextn) |
+		FIELD_PREP(OP_BR_ALU_DST_LMEXTN, dst_lmextn);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static void emit_rtn(struct nfp_prog *nfp_prog, swreg base, u8 defer)
+{
+	struct nfp_insn_ur_regs reg;
+	int err;
+
+	err = swreg_to_unrestricted(reg_none(), base, reg_imm(0), &reg);
+	if (err) {
+		nfp_prog->error = err;
+		return;
+	}
+
+	__emit_br_alu(nfp_prog, reg.areg, reg.breg, 0, defer, reg.dst_lmextn,
+		      reg.src_lmextn);
+}
+
+static void
 __emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
 	     enum immed_width width, bool invert,
 	     enum immed_shift shift, bool wr_both,
@@ -3081,7 +3113,73 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return wrp_test_reg(nfp_prog, meta, ALU_OP_XOR, BR_BNE);
 }
 
-static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+static int
+bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	u32 ret_tgt, stack_depth;
+	swreg tmp_reg;
+
+	stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN);
+	/* Space for saving the return address is accounted for by the callee,
+	 * so stack_depth can be zero for the main function.
+	 */
+	if (stack_depth) {
+		tmp_reg = ur_load_imm_any(nfp_prog, stack_depth,
+					  stack_imm(nfp_prog));
+		emit_alu(nfp_prog, stack_reg(nfp_prog),
+			 stack_reg(nfp_prog), ALU_OP_ADD, tmp_reg);
+		emit_csr_wr(nfp_prog, stack_reg(nfp_prog),
+			    NFP_CSR_ACT_LM_ADDR0);
+	}
+
+	/* The following steps are performed:
+	 *     1. Put the start offset of the callee into imm_b(). This will
+	 *        require a fixup step, as we do not necessarily know this
+	 *        address yet.
+	 *     2. Put the return address from the callee to the caller into
+	 *        register ret_reg().
+	 *     3. (After defer slots are consumed) Jump to the subroutine that
+	 *        pushes the registers to the stack.
+	 * The subroutine acts as a trampoline, and returns to the address in
+	 * imm_b(), i.e. jumps to the callee.
+	 *
+	 * Using ret_reg() to pass the return address to the callee is set here
+	 * as a convention. The callee can then push this address onto its
+	 * stack frame in its prologue. The advantages of passing the return
+	 * address through ret_reg(), instead of pushing it to the stack right
+	 * here, are the following:
+	 * - It looks cleaner.
+	 * - If the called function is called multiple time, we get a lower
+	 *   program size.
+	 * - We save two no-op instructions that should be added just before
+	 *   the emit_br() when stack depth is not null otherwise.
+	 * - If we ever find a register to hold the return address during whole
+	 *   execution of the callee, we will not have to push the return
+	 *   address to the stack for leaf functions.
+	 */
+	ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
+	emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
+		     RELO_BR_GO_CALL_PUSH_REGS);
+	wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
+	wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
+		return -EINVAL;
+
+	if (stack_depth) {
+		tmp_reg = ur_load_imm_any(nfp_prog, stack_depth,
+					  stack_imm(nfp_prog));
+		emit_alu(nfp_prog, stack_reg(nfp_prog),
+			 stack_reg(nfp_prog), ALU_OP_SUB, tmp_reg);
+		emit_csr_wr(nfp_prog, stack_reg(nfp_prog),
+			    NFP_CSR_ACT_LM_ADDR0);
+		wrp_nops(nfp_prog, 3);
+	}
+
+	return 0;
+}
+
+static int helper_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	switch (meta->insn.imm) {
 	case BPF_FUNC_xdp_adjust_head:
@@ -3102,6 +3200,19 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	}
 }
 
+static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	if (is_mbpf_pseudo_call(meta))
+		return bpf_to_bpf_call(nfp_prog, meta);
+	else
+		return helper_call(nfp_prog, meta);
+}
+
+static bool nfp_is_main_function(struct nfp_insn_meta *meta)
+{
+	return meta->subprog_idx == 0;
+}
+
 static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 0, RELO_BR_GO_OUT);
@@ -3109,6 +3220,30 @@ static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return 0;
 }
 
+static int
+nfp_subprog_epilogue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	/* Pop R6~R9 to the stack via related subroutine.
+	 * Pop return address for BPF-to-BPF call from the stack and load it
+	 * into ret_reg() before we jump. This means that the subroutine does
+	 * not come back here, we make it jump back to the subprogram caller
+	 * directly!
+	 */
+	emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 1,
+		     RELO_BR_GO_CALL_POP_REGS);
+	wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0));
+
+	return 0;
+}
+
+static int jmp_exit(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	if (nfp_is_main_function(meta))
+		return goto_out(nfp_prog, meta);
+	else
+		return nfp_subprog_epilogue(nfp_prog, meta);
+}
+
 static const instr_cb_t instr_cb[256] = {
 	[BPF_ALU64 | BPF_MOV | BPF_X] =	mov_reg64,
 	[BPF_ALU64 | BPF_MOV | BPF_K] =	mov_imm64,
@@ -3197,7 +3332,7 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_JMP | BPF_JSET | BPF_X] =	jset_reg,
 	[BPF_JMP | BPF_JNE | BPF_X] =	jne_reg,
 	[BPF_JMP | BPF_CALL] =		call,
-	[BPF_JMP | BPF_EXIT] =		goto_out,
+	[BPF_JMP | BPF_EXIT] =		jmp_exit,
 };
 
 /* --- Assembler logic --- */
@@ -3258,6 +3393,27 @@ static void nfp_intro(struct nfp_prog *nfp_prog)
 		 plen_reg(nfp_prog), ALU_OP_AND, pv_len(nfp_prog));
 }
 
+static void
+nfp_subprog_prologue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	/* Save return address into the stack. */
+	wrp_mov(nfp_prog, reg_lm(0, 0), ret_reg(nfp_prog));
+}
+
+static void
+nfp_start_subprog(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	unsigned int depth = nfp_prog->subprog[meta->subprog_idx].stack_depth;
+
+	nfp_prog->stack_frame_depth = round_up(depth, 4);
+	nfp_subprog_prologue(nfp_prog, meta);
+}
+
+bool nfp_is_subprog_start(struct nfp_insn_meta *meta)
+{
+	return meta->flags & FLAG_INSN_IS_SUBPROG_START;
+}
+
 static void nfp_outro_tc_da(struct nfp_prog *nfp_prog)
 {
 	/* TC direct-action mode:
@@ -3348,6 +3504,56 @@ static void nfp_outro_xdp(struct nfp_prog *nfp_prog)
 	emit_ld_field(nfp_prog, reg_a(0), 0xc, reg_b(2), SHF_SC_L_SHF, 16);
 }
 
+static void nfp_push_callee_registers(struct nfp_prog *nfp_prog)
+{
+	u8 reg;
+
+	/* Subroutine: Save all callee saved registers (R6 ~ R9).
+	 * imm_b() holds the return address.
+	 */
+	nfp_prog->tgt_call_push_regs = nfp_prog_current_offset(nfp_prog);
+	for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) {
+		u8 adj = (reg - BPF_REG_0) * 2;
+		u8 idx = (reg - BPF_REG_6) * 2;
+
+		/* The first slot in the stack frame is used to push the return
+		 * address in bpf_to_bpf_call(), start just after.
+		 */
+		wrp_mov(nfp_prog, reg_lm(0, 1 + idx), reg_b(adj));
+
+		if (reg == BPF_REG_8)
+			/* Prepare to jump back, last 3 insns use defer slots */
+			emit_rtn(nfp_prog, imm_b(nfp_prog), 3);
+
+		wrp_mov(nfp_prog, reg_lm(0, 1 + idx + 1), reg_b(adj + 1));
+	}
+}
+
+static void nfp_pop_callee_registers(struct nfp_prog *nfp_prog)
+{
+	u8 reg;
+
+	/* Subroutine: Restore all callee saved registers (R6 ~ R9).
+	 * ret_reg() holds the return address.
+	 */
+	nfp_prog->tgt_call_pop_regs = nfp_prog_current_offset(nfp_prog);
+	for (reg = BPF_REG_6; reg <= BPF_REG_9; reg++) {
+		u8 adj = (reg - BPF_REG_0) * 2;
+		u8 idx = (reg - BPF_REG_6) * 2;
+
+		/* The first slot in the stack frame holds the return address,
+		 * start popping just after that.
+		 */
+		wrp_mov(nfp_prog, reg_both(adj), reg_lm(0, 1 + idx));
+
+		if (reg == BPF_REG_8)
+			/* Prepare to jump back, last 3 insns use defer slots */
+			emit_rtn(nfp_prog, ret_reg(nfp_prog), 3);
+
+		wrp_mov(nfp_prog, reg_both(adj + 1), reg_lm(0, 1 + idx + 1));
+	}
+}
+
 static void nfp_outro(struct nfp_prog *nfp_prog)
 {
 	switch (nfp_prog->type) {
@@ -3360,13 +3566,23 @@ static void nfp_outro(struct nfp_prog *nfp_prog)
 	default:
 		WARN_ON(1);
 	}
+
+	if (nfp_prog->subprog_cnt == 1)
+		return;
+
+	nfp_push_callee_registers(nfp_prog);
+	nfp_pop_callee_registers(nfp_prog);
 }
 
 static int nfp_translate(struct nfp_prog *nfp_prog)
 {
 	struct nfp_insn_meta *meta;
+	unsigned int depth;
 	int err;
 
+	depth = nfp_prog->subprog[0].stack_depth;
+	nfp_prog->stack_frame_depth = round_up(depth, 4);
+
 	nfp_intro(nfp_prog);
 	if (nfp_prog->error)
 		return nfp_prog->error;
@@ -3376,6 +3592,12 @@ static int nfp_translate(struct nfp_prog *nfp_prog)
 
 		meta->off = nfp_prog_current_offset(nfp_prog);
 
+		if (nfp_is_subprog_start(meta)) {
+			nfp_start_subprog(nfp_prog, meta);
+			if (nfp_prog->error)
+				return nfp_prog->error;
+		}
+
 		if (meta->skip) {
 			nfp_prog->n_translated++;
 			continue;
@@ -4069,6 +4291,7 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
 	for (i = 0; i < nfp_prog->prog_len; i++) {
 		enum nfp_relo_type special;
 		u32 val;
+		u16 off;
 
 		special = FIELD_GET(OP_RELO_TYPE, prog[i]);
 		switch (special) {
@@ -4085,6 +4308,14 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
 			br_set_offset(&prog[i],
 				      nfp_prog->tgt_abort + bv->start_off);
 			break;
+		case RELO_BR_GO_CALL_PUSH_REGS:
+			off = nfp_prog->tgt_call_push_regs + bv->start_off;
+			br_set_offset(&prog[i], off);
+			break;
+		case RELO_BR_GO_CALL_POP_REGS:
+			off = nfp_prog->tgt_call_pop_regs + bv->start_off;
+			br_set_offset(&prog[i], off);
+			break;
 		case RELO_BR_NEXT_PKT:
 			br_set_offset(&prog[i], bv->tgt_done);
 			break;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 20a98ce4b345..d9695bc316dd 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -61,6 +61,8 @@ enum nfp_relo_type {
 	/* internal jumps to parts of the outro */
 	RELO_BR_GO_OUT,
 	RELO_BR_GO_ABORT,
+	RELO_BR_GO_CALL_PUSH_REGS,
+	RELO_BR_GO_CALL_POP_REGS,
 	/* external jumps to fixed addresses */
 	RELO_BR_NEXT_PKT,
 	RELO_BR_HELPER,
@@ -104,6 +106,7 @@ enum pkt_vec {
 #define imma_a(np)	reg_a(STATIC_REG_IMMA)
 #define imma_b(np)	reg_b(STATIC_REG_IMMA)
 #define imm_both(np)	reg_both(STATIC_REG_IMM)
+#define ret_reg(np)	imm_a(np)
 
 #define NFP_BPF_ABI_FLAGS	reg_imm(0)
 #define   NFP_BPF_ABI_FLAG_MARK	1
@@ -290,6 +293,7 @@ struct nfp_bpf_reg_state {
  * @off: index of first generated machine instruction (in nfp_prog.prog)
  * @n: eBPF instruction number
  * @flags: eBPF instruction extra optimization flags
+ * @subprog_idx: index of subprogram to which the instruction belongs
  * @skip: skip this instruction (optimized out)
  * @double_cb: callback for second part of the instruction
  * @l: link on nfp_prog->insns list
@@ -336,6 +340,7 @@ struct nfp_insn_meta {
 	unsigned int off;
 	unsigned short n;
 	unsigned short flags;
+	unsigned short subprog_idx;
 	bool skip;
 	instr_cb_t double_cb;
 
@@ -432,6 +437,16 @@ static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta)
 		insn.src_reg != BPF_PSEUDO_CALL;
 }
 
+static inline bool is_mbpf_pseudo_call(const struct nfp_insn_meta *meta)
+{
+	struct bpf_insn insn = meta->insn;
+
+	return insn.code == (BPF_JMP | BPF_CALL) &&
+		insn.src_reg == BPF_PSEUDO_CALL;
+}
+
+#define STACK_FRAME_ALIGN 64
+
 /**
  * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
  * @stack_depth:	maximum stack depth used by this sub-program
@@ -451,6 +466,8 @@ struct nfp_bpf_subprog_info {
  * @last_bpf_off: address of the last instruction translated from BPF
  * @tgt_out: jump target for normal exit
  * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
+ * @tgt_call_push_regs: jump target for subroutine for saving R6~R9 to stack
+ * @tgt_call_pop_regs: jump target for subroutine used for restoring R6~R9
  * @n_translated: number of successfully translated instructions (for errors)
  * @error: error code if something went wrong
  * @stack_frame_depth: max stack depth for current frame
@@ -475,6 +492,8 @@ struct nfp_prog {
 	unsigned int last_bpf_off;
 	unsigned int tgt_out;
 	unsigned int tgt_abort;
+	unsigned int tgt_call_push_regs;
+	unsigned int tgt_call_pop_regs;
 
 	unsigned int n_translated;
 	int error;
@@ -502,6 +521,7 @@ struct nfp_bpf_vnic {
 	unsigned int tgt_done;
 };
 
+bool nfp_is_subprog_start(struct nfp_insn_meta *meta);
 void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt);
 int nfp_bpf_jit(struct nfp_prog *prog);
 bool nfp_bpf_supported_opcode(u8 code);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index b683b03efd22..2ebd13b29c97 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -262,7 +262,6 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
 			prog->aux->stack_depth, stack_size);
 		return -EOPNOTSUPP;
 	}
-	nfp_prog->stack_frame_depth = round_up(prog->aux->stack_depth, 4);
 
 	max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
 	nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index c642c2c07d96..cc1b2c601f4e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -641,6 +641,27 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 	return 0;
 }
 
+static int
+nfp_assign_subprog_idx(struct bpf_verifier_env *env, struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta;
+	int index = 0;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		if (nfp_is_subprog_start(meta))
+			index++;
+		meta->subprog_idx = index;
+	}
+
+	if (index + 1 != nfp_prog->subprog_cnt) {
+		pr_vlog(env, "BUG: number of processed BPF functions is not consistent (processed %d, expected %d)\n",
+			index + 1, nfp_prog->subprog_cnt);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
 static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
 	struct bpf_subprog_info *info;
@@ -654,10 +675,21 @@ static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 	if (!nfp_prog->subprog)
 		return -ENOMEM;
 
+	nfp_assign_subprog_idx(env, nfp_prog);
+
 	info = env->subprog_info;
-	for (i = 0; i < nfp_prog->subprog_cnt; i++)
+	for (i = 0; i < nfp_prog->subprog_cnt; i++) {
 		nfp_prog->subprog[i].stack_depth = info[i].stack_depth;
 
+		if (i == 0)
+			continue;
+
+		/* Account for size of return address. */
+		nfp_prog->subprog[i].stack_depth += REG_WIDTH;
+		/* Account for size of saved registers. */
+		nfp_prog->subprog[i].stack_depth += BPF_REG_SIZE * 4;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
index fad0e62a910c..5b257c603e91 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -82,6 +82,15 @@
 #define OP_BR_BIT_ADDR_LO	OP_BR_ADDR_LO
 #define OP_BR_BIT_ADDR_HI	OP_BR_ADDR_HI
 
+#define OP_BR_ALU_BASE		0x0e800000000ULL
+#define OP_BR_ALU_BASE_MASK	0x0ff80000000ULL
+#define OP_BR_ALU_A_SRC		0x000000003ffULL
+#define OP_BR_ALU_B_SRC		0x000000ffc00ULL
+#define OP_BR_ALU_DEFBR		0x00000300000ULL
+#define OP_BR_ALU_IMM_HI	0x0007fc00000ULL
+#define OP_BR_ALU_SRC_LMEXTN	0x40000000000ULL
+#define OP_BR_ALU_DST_LMEXTN	0x80000000000ULL
+
 static inline bool nfp_is_br(u64 insn)
 {
 	return (insn & OP_BR_BASE_MASK) == OP_BR_BASE ||
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 05/12] nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.

We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 35 +++++++++++++++++++--------
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  3 ++-
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 98a94ca36bfa..ccb80a5ac828 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -4018,20 +4018,35 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, unsigned int cnt)
 
 	/* Another pass to record jump information. */
 	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		struct nfp_insn_meta *dst_meta;
 		u64 code = meta->insn.code;
+		unsigned int dst_idx;
+		bool pseudo_call;
 
-		if (BPF_CLASS(code) == BPF_JMP && BPF_OP(code) != BPF_EXIT &&
-		    BPF_OP(code) != BPF_CALL) {
-			struct nfp_insn_meta *dst_meta;
-			unsigned short dst_indx;
+		if (BPF_CLASS(code) != BPF_JMP)
+			continue;
+		if (BPF_OP(code) == BPF_EXIT)
+			continue;
+		if (is_mbpf_helper_call(meta))
+			continue;
 
-			dst_indx = meta->n + 1 + meta->insn.off;
-			dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_indx,
-						     cnt);
+		/* If opcode is BPF_CALL at this point, this can only be a
+		 * BPF-to-BPF call (a.k.a pseudo call).
+		 */
+		pseudo_call = BPF_OP(code) == BPF_CALL;
 
-			meta->jmp_dst = dst_meta;
-			dst_meta->flags |= FLAG_INSN_IS_JUMP_DST;
-		}
+		if (pseudo_call)
+			dst_idx = meta->n + 1 + meta->insn.imm;
+		else
+			dst_idx = meta->n + 1 + meta->insn.off;
+
+		dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_idx, cnt);
+
+		if (pseudo_call)
+			dst_meta->flags |= FLAG_INSN_IS_SUBPROG_START;
+
+		dst_meta->flags |= FLAG_INSN_IS_JUMP_DST;
+		meta->jmp_dst = dst_meta;
 	}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 853a5346378c..20a98ce4b345 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -262,7 +262,8 @@ struct nfp_bpf_reg_state {
 	bool var_off;
 };
 
-#define FLAG_INSN_IS_JUMP_DST	BIT(0)
+#define FLAG_INSN_IS_JUMP_DST			BIT(0)
+#define FLAG_INSN_IS_SUBPROG_START		BIT(1)
 
 /**
  * struct nfp_insn_meta - BPF instruction wrapper
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 04/12] nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF call (same instruction code, different
value in source register), so just skip the checks for such calls.

While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h     | 8 ++++++++
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 9 +++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 7f6e850e42da..853a5346378c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -423,6 +423,14 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
 	return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
 }
 
+static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta)
+{
+	struct bpf_insn insn = meta->insn;
+
+	return insn.code == (BPF_JMP | BPF_CALL) &&
+		insn.src_reg != BPF_PSEUDO_CALL;
+}
+
 /**
  * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
  * @stack_depth:	maximum stack depth used by this sub-program
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 9ef74bc1ec1d..c642c2c07d96 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -155,8 +155,9 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env,
 }
 
 static int
-nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
-		   struct nfp_insn_meta *meta)
+nfp_bpf_check_helper_call(struct nfp_prog *nfp_prog,
+			  struct bpf_verifier_env *env,
+			  struct nfp_insn_meta *meta)
 {
 	const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1;
 	const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2;
@@ -620,8 +621,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 		return -EINVAL;
 	}
 
-	if (meta->insn.code == (BPF_JMP | BPF_CALL))
-		return nfp_bpf_check_call(nfp_prog, env, meta);
+	if (is_mbpf_helper_call(meta))
+		return nfp_bpf_check_helper_call(nfp_prog, env, meta);
 	if (meta->insn.code == (BPF_JMP | BPF_EXIT))
 		return nfp_bpf_check_exit(nfp_prog, env);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 03/12] nfp: bpf: copy eBPF subprograms information from kernel verifier
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.

Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h     | 12 ++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  2 ++
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 15 +++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 7050535383b8..7f6e850e42da 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -424,6 +424,14 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
 }
 
 /**
+ * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
+ * @stack_depth:	maximum stack depth used by this sub-program
+ */
+struct nfp_bpf_subprog_info {
+	u16 stack_depth;
+};
+
+/**
  * struct nfp_prog - nfp BPF program
  * @bpf: backpointer to the bpf app priv structure
  * @prog: machine code
@@ -439,7 +447,9 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
  * @stack_frame_depth: max stack depth for current frame
  * @adjust_head_location: if program has single adjust head call - the insn no.
  * @map_records_cnt: the number of map pointers recorded for this prog
+ * @subprog_cnt: number of sub-programs, including main function
  * @map_records: the map record pointers from bpf->maps_neutral
+ * @subprog: pointer to an array of objects holding info about sub-programs
  * @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
  */
 struct nfp_prog {
@@ -464,7 +474,9 @@ struct nfp_prog {
 	unsigned int adjust_head_location;
 
 	unsigned int map_records_cnt;
+	unsigned int subprog_cnt;
 	struct nfp_bpf_neutral_map **map_records;
+	struct nfp_bpf_subprog_info *subprog;
 
 	struct list_head insns;
 };
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index c9519bb00f8a..b683b03efd22 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -208,6 +208,8 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
 {
 	struct nfp_insn_meta *meta, *tmp;
 
+	kfree(nfp_prog->subprog);
+
 	list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) {
 		list_del(&meta->l);
 		kfree(meta);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index e470489021e3..9ef74bc1ec1d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -642,6 +642,21 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 
 static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
+	struct bpf_subprog_info *info;
+	struct nfp_prog *nfp_prog;
+	int i;
+
+	nfp_prog = env->prog->aux->offload->dev_priv;
+	nfp_prog->subprog_cnt = env->subprog_cnt;
+	nfp_prog->subprog = kcalloc(nfp_prog->subprog_cnt,
+				    sizeof(nfp_prog->subprog[0]), GFP_KERNEL);
+	if (!nfp_prog->subprog)
+		return -ENOMEM;
+
+	info = env->subprog_info;
+	for (i = 0; i < nfp_prog->subprog_cnt; i++)
+		nfp_prog->subprog[i].stack_depth = info[i].stack_depth;
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 02/12] nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c     | 10 +++++-----
 drivers/net/ethernet/netronome/nfp/bpf/main.h    |  4 ++--
 drivers/net/ethernet/netronome/nfp/bpf/offload.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index eff57f7d056a..98a94ca36bfa 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1137,7 +1137,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	     unsigned int size, unsigned int ptr_off, u8 gpr, u8 ptr_gpr,
 	     bool clr_gpr, lmem_step step)
 {
-	s32 off = nfp_prog->stack_depth + meta->insn.off + ptr_off;
+	s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
 	bool first = true, last;
 	bool needs_inc = false;
 	swreg stack_off_reg;
@@ -1695,7 +1695,7 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	s64 lm_off;
 
 	/* We only have to reload LM0 if the key is not at start of stack */
-	lm_off = nfp_prog->stack_depth;
+	lm_off = nfp_prog->stack_frame_depth;
 	lm_off += meta->arg2.reg.var_off.value + meta->arg2.reg.off;
 	load_lm_ptr = meta->arg2.var_off || lm_off;
 
@@ -1808,10 +1808,10 @@ static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 		swreg stack_depth_reg;
 
 		stack_depth_reg = ur_load_imm_any(nfp_prog,
-						  nfp_prog->stack_depth,
+						  nfp_prog->stack_frame_depth,
 						  stack_imm(nfp_prog));
-		emit_alu(nfp_prog, reg_both(dst),
-			 stack_reg(nfp_prog), ALU_OP_ADD, stack_depth_reg);
+		emit_alu(nfp_prog, reg_both(dst), stack_reg(nfp_prog),
+			 ALU_OP_ADD, stack_depth_reg);
 		wrp_immed(nfp_prog, reg_both(dst + 1), 0);
 	} else {
 		wrp_reg_mov(nfp_prog, dst, src);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 792ebc4081a3..7050535383b8 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -436,7 +436,7 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
  * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
  * @n_translated: number of successfully translated instructions (for errors)
  * @error: error code if something went wrong
- * @stack_depth: max stack depth from the verifier
+ * @stack_frame_depth: max stack depth for current frame
  * @adjust_head_location: if program has single adjust head call - the insn no.
  * @map_records_cnt: the number of map pointers recorded for this prog
  * @map_records: the map record pointers from bpf->maps_neutral
@@ -460,7 +460,7 @@ struct nfp_prog {
 	unsigned int n_translated;
 	int error;
 
-	unsigned int stack_depth;
+	unsigned int stack_frame_depth;
 	unsigned int adjust_head_location;
 
 	unsigned int map_records_cnt;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 1ccd6371a15b..c9519bb00f8a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -260,7 +260,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
 			prog->aux->stack_depth, stack_size);
 		return -EOPNOTSUPP;
 	}
-	nfp_prog->stack_depth = round_up(prog->aux->stack_depth, 4);
+	nfp_prog->stack_frame_depth = round_up(prog->aux->stack_depth, 4);
 
 	max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
 	nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 01/12] bpf: add verifier callback to get stack usage info for offloaded progs
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet
In-Reply-To: <1538913418-16039-1-git-send-email-quentin.monnet@netronome.com>

In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.

Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.

This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  8 +++++++-
 drivers/net/netdevsim/bpf.c                       |  8 +++++++-
 include/linux/bpf.h                               |  1 +
 include/linux/bpf_verifier.h                      |  1 +
 kernel/bpf/offload.c                              | 18 ++++++++++++++++++
 kernel/bpf/verifier.c                             |  3 +++
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index a6e9248669e1..e470489021e3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -640,6 +640,12 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 	return 0;
 }
 
+static int nfp_bpf_finalize(struct bpf_verifier_env *env)
+{
+	return 0;
+}
+
 const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
-	.insn_hook = nfp_verify_insn,
+	.insn_hook	= nfp_verify_insn,
+	.finalize	= nfp_bpf_finalize,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 81444208b216..cb3518474f0e 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -86,8 +86,14 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
 	return 0;
 }
 
+static int nsim_bpf_finalize(struct bpf_verifier_env *env)
+{
+	return 0;
+}
+
 static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
-	.insn_hook = nsim_bpf_verify_insn,
+	.insn_hook	= nsim_bpf_verify_insn,
+	.finalize	= nsim_bpf_finalize,
 };
 
 static bool nsim_xdp_offload_active(struct netdevsim *ns)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 027697b6a22f..9b558713447f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -263,6 +263,7 @@ struct bpf_verifier_ops {
 struct bpf_prog_offload_ops {
 	int (*insn_hook)(struct bpf_verifier_env *env,
 			 int insn_idx, int prev_insn_idx);
+	int (*finalize)(struct bpf_verifier_env *env);
 };
 
 struct bpf_prog_offload {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7b6fd2ab3263..9e8056ec20fa 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -245,5 +245,6 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
 int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
 int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
 				 int insn_idx, int prev_insn_idx);
+int bpf_prog_offload_finalize(struct bpf_verifier_env *env);
 
 #endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 177a52436394..8e93c47f0779 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -172,6 +172,24 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
 	return ret;
 }
 
+int bpf_prog_offload_finalize(struct bpf_verifier_env *env)
+{
+	struct bpf_prog_offload *offload;
+	int ret = -ENODEV;
+
+	down_read(&bpf_devs_lock);
+	offload = env->prog->aux->offload;
+	if (offload) {
+		if (offload->dev_ops->finalize)
+			ret = offload->dev_ops->finalize(env);
+		else
+			ret = 0;
+	}
+	up_read(&bpf_devs_lock);
+
+	return ret;
+}
+
 static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
 {
 	struct bpf_prog_offload *offload = prog->aux->offload;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 73c81bef6ae8..a0454cb299ba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6309,6 +6309,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 		env->cur_state = NULL;
 	}
 
+	if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux))
+		ret = bpf_prog_offload_finalize(env);
+
 skip_full_check:
 	while (!pop_stack(env, NULL, NULL));
 	free_states(env);
-- 
2.7.4

^ permalink raw reply related

* [PATCH bpf-next 00/12] nfp: bpf: add support for BPF-to-BPF function calls
From: Quentin Monnet @ 2018-10-07 11:56 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers, Quentin Monnet

This patch series adds support for hardware offload of programs containing
BPF-to-BPF function calls. First, a new callback is added to the kernel
verifier, to collect information after the main part of the verification
has been performed. Then support for BPF-to-BPF calls is incrementally
added to the nfp driver, before offloading programs containing such calls
is eventually allowed by lifting the restriction in the kernel verifier, in
the last patch. Please refer to individual patches for details.

Many thanks to Jiong and Jakub for their precious help and contribution on
the main patches for the JIT-compiler, and everything related to stack
accesses.

Quentin Monnet (12):
  bpf: add verifier callback to get stack usage info for offloaded progs
  nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
  nfp: bpf: copy eBPF subprograms information from kernel verifier
  nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier
  nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT
  nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver
  nfp: bpf: account for additional stack usage when checking stack limit
  nfp: bpf: update fixup function for BPF-to-BPF calls support
  nfp: bpf: fix return address from register-saving subroutine to callee
  nfp: bpf: optimise save/restore for R6~R9 based on register usage
  nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls
  bpf: allow offload of programs with BPF-to-BPF function calls

 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 381 ++++++++++++++++++++--
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |  52 ++-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  11 +-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 141 +++++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h      |   9 +
 drivers/net/netdevsim/bpf.c                       |   8 +-
 include/linux/bpf.h                               |   1 +
 include/linux/bpf_verifier.h                      |   1 +
 kernel/bpf/offload.c                              |  18 +
 kernel/bpf/verifier.c                             |  13 +-
 10 files changed, 589 insertions(+), 46 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net 1/1] net: sched: cls_u32: fix hnode refcounting
From: Jamal Hadi Salim @ 2018-10-07 11:40 UTC (permalink / raw)
  To: davem; +Cc: xiyou.wangcong, jiri, netdev, viro, Jamal Hadi Salim

From: Al Viro <viro@zeniv.linux.org.uk>

cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
via ->hlist and via ->tp_root together.  u32_destroy() drops the former
and, in case when there had been links, leaves the sucker on the list.
As the result, there's nothing to protect it from getting freed once links
are dropped.
That also makes the "is it busy" check incapable of catching the root
hnode - it *is* busy (there's a reference from tp), but we don't see it as
something separate.  "Is it our root?" check partially covers that, but
the problem exists for others' roots as well.

AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
include oopsen, that is) would be this:
        * count tp->root and tp_c->hlist as separate references.  I.e.
have u32_init() set refcount to 2, not 1.
	* in u32_destroy() we always drop the former;
in u32_destroy_hnode() - the latter.

	That way we have *all* references contributing to refcount.  List
removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
an in u32_destroy() in case of tc_u_common going away, along with
everything reachable from it.  IOW, that way we know that
u32_destroy_key() won't free something still on the list (or pointed to by
someone's ->root).

Reproducer:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \
u32 divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \
u32 divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 100 \
handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \
plus 0 eat match ip protocol 6 ff
tc filter delete dev eth0 parent ffff: protocol ip prio 200
tc filter change dev eth0 parent ffff: protocol ip prio 100 \
handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \
eat match ip protocol 6 ff
tc filter delete dev eth0 parent ffff: protocol ip prio 100

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/cls_u32.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f218ccf1e2d9..b2c3406a2cf2 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -398,6 +398,7 @@ static int u32_init(struct tcf_proto *tp)
 	rcu_assign_pointer(tp_c->hlist, root_ht);
 	root_ht->tp_c = tp_c;
 
+	root_ht->refcnt++;
 	rcu_assign_pointer(tp->root, root_ht);
 	tp->data = tp_c;
 	return 0;
@@ -610,7 +611,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 	struct tc_u_hnode __rcu **hn;
 	struct tc_u_hnode *phn;
 
-	WARN_ON(ht->refcnt);
+	WARN_ON(--ht->refcnt);
 
 	u32_clear_hnode(tp, ht, extack);
 
@@ -649,7 +650,7 @@ static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 
 	WARN_ON(root_ht == NULL);
 
-	if (root_ht && --root_ht->refcnt == 0)
+	if (root_ht && --root_ht->refcnt == 1)
 		u32_destroy_hnode(tp, root_ht, extack);
 
 	if (--tp_c->refcnt == 0) {
@@ -698,7 +699,6 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
 	}
 
 	if (ht->refcnt == 1) {
-		ht->refcnt--;
 		u32_destroy_hnode(tp, ht, extack);
 	} else {
 		NL_SET_ERR_MSG_MOD(extack, "Can not delete in-use filter");
@@ -708,11 +708,11 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
 out:
 	*last = true;
 	if (root_ht) {
-		if (root_ht->refcnt > 1) {
+		if (root_ht->refcnt > 2) {
 			*last = false;
 			goto ret;
 		}
-		if (root_ht->refcnt == 1) {
+		if (root_ht->refcnt == 2) {
 			if (!ht_empty(root_ht)) {
 				*last = false;
 				goto ret;
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net-next v7 28/28] net: WireGuard secure network tunnel
From: Andrew Lunn @ 2018-10-07 18:42 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jason A. Donenfeld, Ji??í Pírko, LKML, Netdev,
	David Miller, Greg Kroah-Hartman, Dan Carpenter
In-Reply-To: <20181007181444.wohudtigexv56b77@wunner.de>

> Apart from Dan's clarity argument, what if you need to insert another
> step to create the interface, thereby necessitating another step in
> the error path?  Are you going to call it 4a, 4b, ...?  Because you
> don't want to inflate that future patch by renaming every single
> label.

Hi Lukas

You beat me to it.

Jason, the things to understand is, Wireguard is not finished. It has
only just started. The code will develop further, changes will be
made. We need the code to be easy to make changes to. As Lukas just
pointed out, this number construct is hard to change in a minimal way,
making a patch small, and obviously correct.

       Andrew

^ permalink raw reply

* Re: [PATCH net-next v7 28/28] net: WireGuard secure network tunnel
From: Lukas Wunner @ 2018-10-07 18:14 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Ji??í Pírko, LKML, Netdev, David Miller,
	Greg Kroah-Hartman, Dan Carpenter
In-Reply-To: <CAHmME9pKvUOaAHhe7gb3yH4jdkdTtLv+gUqdnh3bJZcN776UWQ@mail.gmail.com>

On Sun, Oct 07, 2018 at 07:26:47PM +0200, Jason A. Donenfeld wrote:
> On Sat, Oct 6, 2018 at 9:03 AM Jiri Pirko <jiri@resnulli.us> wrote:
> > >+      wg->incoming_handshakes_worker =
> > >+              wg_packet_alloc_percpu_multicore_worker(
> > >+                              wg_packet_handshake_receive_worker, wg);
> > >+      if (!wg->incoming_handshakes_worker)
> > >+              goto error_2;
> >
> >
> > Please consider renaming the label to "what went wrong". In this case,
> > it would be "err_alloc_worker".

Dan Carpenter, who has probably become the world expert on error paths
in the kernel due to his work on smatch, recommends naming the labels
not "what went wrong" but rather what the error path is going to do,
in this case "err_free_incoming_handshakes_worker" (abbreviate to
"err_free_incoming" or some such):

https://lkml.org/lkml/2016/8/22/374


> > >+      pr_debug("%s: Interface created\n", dev->name);
> > >+      return ret;
> > >+
> > >+error_9:
> > >+      wg_ratelimiter_uninit();
> > >+error_8:
> > >+      wg_packet_queue_free(&wg->decrypt_queue, true);
> > >+error_7:
> > >+      wg_packet_queue_free(&wg->encrypt_queue, true);
> > >+error_6:
> > >+      destroy_workqueue(wg->packet_crypt_wq);
> > >+error_5:
> > >+      destroy_workqueue(wg->handshake_send_wq);
> > >+error_4:
> > >+      destroy_workqueue(wg->handshake_receive_wq);
> > >+error_3:
> > >+      free_percpu(wg->incoming_handshakes_worker);
> > >+error_2:
> > >+      free_percpu(dev->tstats);
> > >+error_1:
> > >+      return ret;
> > >+}
> 
> I'll change away from using error_9,8,7,6,5,4,3,2,1 because of your
> suggestion -- and because it's the norm in the kernel to use real
> names. But, I would be interested in your opinion on the numerical
> errors' reasoning for existing in the first place. The idea was that
> with so many different failure cases that need to cascade in the
> correct order, it's much easier to visually inspect that it's been
> done right by observing up top 1,2,3,4,5,6,7,8,9 and at the bottom
> 9,8,7,6,5,4,3,2,1, rather than having to store in my brain's limited
> stack space what each name pertains to and keep track of the ordering
> and such. In light of that, do you still think that following the
> convention of textual error labels is a good match here? Again, I'm
> changing this for v8, but I am nonetheless curious about what you
> think.

Apart from Dan's clarity argument, what if you need to insert another
step to create the interface, thereby necessitating another step in
the error path?  Are you going to call it 4a, 4b, ...?  Because you
don't want to inflate that future patch by renaming every single
label.

Thanks,

Lukas

^ permalink raw reply

* Re: [PATCH net-next 17/20] net/fib_rules: Update fib_nl_dumprule for strict data checking
From: Christian Brauner @ 2018-10-07 10:55 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-18-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:52PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update fib_nl_dumprule for strict data checking. If the flag is set,
> the dump request is expected to have fib_rule_hdr struct as the header.
> All elements of the struct are expected to be 0 and no attributes can
> be appended.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Christian Brauner <christian@brauner.io>

> ---
>  net/core/fib_rules.c | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 0ff3953f64aa..e3cf50728d0a 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -1063,13 +1063,47 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
>  	return err;
>  }
>  
> +static int fib_valid_dumprule(const struct nlmsghdr *nlh,
> +			      struct netlink_ext_ack *extack)
> +{
> +	struct fib_rule_hdr *frh;
> +
> +	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
> +		return -EINVAL;
> +	}
> +
> +	frh = nlmsg_data(nlh);
> +	if (frh->dst_len || frh->src_len || frh->tos || frh->table ||
> +	    frh->res1 || frh->res2 || frh->action || frh->flags) {
> +		NL_SET_ERR_MSG(extack,
> +			       "Invalid values in header for dump request");
> +		return -EINVAL;
> +	}
> +
> +	if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*frh))) {
> +		NL_SET_ERR_MSG(extack, "Invalid data after header");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	struct fib_rules_ops *ops;
>  	int idx = 0, family;
>  
> -	family = rtnl_msg_family(cb->nlh);
> +	if (cb->strict_check) {
> +		int err = fib_valid_dumprule(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
> +	family = rtnl_msg_family(nlh);
>  	if (family != AF_UNSPEC) {
>  		/* Protocol specific dump request */
>  		ops = lookup_rules_ops(net, family);
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 18/20] net/ipv6: Update ip6addrlbl_dump for strict data checking
From: Christian Brauner @ 2018-10-07 10:54 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-19-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:53PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update ip6addrlbl_dump for strict data checking. If the flag is set,
> the dump request is expected to have an ifaddrlblmsg struct as the
> header. All elements of the struct are expected to be 0 and no
> attributes can be appended.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Christian Brauner <christian@brauner.io>

> ---
>  net/ipv6/addrlabel.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
> index 1d6ced37ad71..10556049cc44 100644
> --- a/net/ipv6/addrlabel.c
> +++ b/net/ipv6/addrlabel.c
> @@ -458,20 +458,53 @@ static int ip6addrlbl_fill(struct sk_buff *skb,
>  	return 0;
>  }
>  
> +static int ip6addrlbl_valid_dump_req(const struct nlmsghdr *nlh,
> +				     struct netlink_ext_ack *extack)
> +{
> +	struct ifaddrlblmsg *ifal;
> +
> +	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifal))) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
> +		return -EINVAL;
> +	}
> +
> +	ifal = nlmsg_data(nlh);
> +	if (ifal->__ifal_reserved || ifal->ifal_prefixlen ||
> +	    ifal->ifal_flags || ifal->ifal_index || ifal->ifal_seq) {
> +		NL_SET_ERR_MSG(extack,
> +			       "Invalid values in header for dump request");
> +		return -EINVAL;
> +	}
> +
> +	if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ifal))) {
> +		NL_SET_ERR_MSG(extack, "Invalid data after header");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int ip6addrlbl_dump(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	struct ip6addrlbl_entry *p;
>  	int idx = 0, s_idx = cb->args[0];
>  	int err;
>  
> +	if (cb->strict_check) {
> +		err = ip6addrlbl_valid_dump_req(nlh, cb->extack);
> +		if (err)
> +			return err;
> +	}
> +
>  	rcu_read_lock();
>  	hlist_for_each_entry_rcu(p, &net->ipv6.ip6addrlbl_table.head, list) {
>  		if (idx >= s_idx) {
>  			err = ip6addrlbl_fill(skb, p,
>  					      net->ipv6.ip6addrlbl_table.seq,
>  					      NETLINK_CB(cb->skb).portid,
> -					      cb->nlh->nlmsg_seq,
> +					      nlh->nlmsg_seq,
>  					      RTM_NEWADDRLABEL,
>  					      NLM_F_MULTI);
>  			if (err < 0)
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 19/20] net: Update netconf dump handlers for strict data checking
From: Christian Brauner @ 2018-10-07 10:53 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-20-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:54PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and
> mpls_netconf_dump_devconf for strict data checking. If the flag is set,
> the dump request is expected to have an netconfmsg struct as the header.
> The struct only has the family member and no attributes can be appended.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Christian Brauner <christian@brauner.io>

> ---
>  net/ipv4/devinet.c  | 22 +++++++++++++++++++---
>  net/ipv6/addrconf.c | 22 +++++++++++++++++++---
>  net/mpls/af_mpls.c  | 18 +++++++++++++++++-
>  3 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index af968d4fe4fc..595706d6b672 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -2069,6 +2069,7 @@ static int inet_netconf_get_devconf(struct sk_buff *in_skb,
>  static int inet_netconf_dump_devconf(struct sk_buff *skb,
>  				     struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	int h, s_h;
>  	int idx, s_idx;
> @@ -2076,6 +2077,21 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
>  	struct in_device *in_dev;
>  	struct hlist_head *head;
>  
> +	if (cb->strict_check) {
> +		struct netlink_ext_ack *extack = cb->extack;
> +		struct netconfmsg *ncm;
> +
> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid header");
> +			return -EINVAL;
> +		}
> +
> +		if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid data after header");
> +			return -EINVAL;
> +		}

Hm, I think this could just be one branch with !=
But if you've done this to report back a more meaningful error message
to userspace, fine. :)

> +	}
> +
>  	s_h = cb->args[0];
>  	s_idx = idx = cb->args[1];
>  
> @@ -2095,7 +2111,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
>  			if (inet_netconf_fill_devconf(skb, dev->ifindex,
>  						      &in_dev->cnf,
>  						      NETLINK_CB(cb->skb).portid,
> -						      cb->nlh->nlmsg_seq,
> +						      nlh->nlmsg_seq,
>  						      RTM_NEWNETCONF,
>  						      NLM_F_MULTI,
>  						      NETCONFA_ALL) < 0) {
> @@ -2112,7 +2128,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
>  		if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_ALL,
>  					      net->ipv4.devconf_all,
>  					      NETLINK_CB(cb->skb).portid,
> -					      cb->nlh->nlmsg_seq,
> +					      nlh->nlmsg_seq,
>  					      RTM_NEWNETCONF, NLM_F_MULTI,
>  					      NETCONFA_ALL) < 0)
>  			goto done;
> @@ -2123,7 +2139,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
>  		if (inet_netconf_fill_devconf(skb, NETCONFA_IFINDEX_DEFAULT,
>  					      net->ipv4.devconf_dflt,
>  					      NETLINK_CB(cb->skb).portid,
> -					      cb->nlh->nlmsg_seq,
> +					      nlh->nlmsg_seq,
>  					      RTM_NEWNETCONF, NLM_F_MULTI,
>  					      NETCONFA_ALL) < 0)
>  			goto done;
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 693199a29426..9dfe6c2106c1 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -666,6 +666,7 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
>  static int inet6_netconf_dump_devconf(struct sk_buff *skb,
>  				      struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	int h, s_h;
>  	int idx, s_idx;
> @@ -673,6 +674,21 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
>  	struct inet6_dev *idev;
>  	struct hlist_head *head;
>  
> +	if (cb->strict_check) {
> +		struct netlink_ext_ack *extack = cb->extack;
> +		struct netconfmsg *ncm;
> +
> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid header");
> +			return -EINVAL;
> +		}
> +
> +		if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid data after header");
> +			return -EINVAL;
> +		}
> +	}
> +
>  	s_h = cb->args[0];
>  	s_idx = idx = cb->args[1];
>  
> @@ -692,7 +708,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
>  			if (inet6_netconf_fill_devconf(skb, dev->ifindex,
>  						       &idev->cnf,
>  						       NETLINK_CB(cb->skb).portid,
> -						       cb->nlh->nlmsg_seq,
> +						       nlh->nlmsg_seq,
>  						       RTM_NEWNETCONF,
>  						       NLM_F_MULTI,
>  						       NETCONFA_ALL) < 0) {
> @@ -709,7 +725,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
>  		if (inet6_netconf_fill_devconf(skb, NETCONFA_IFINDEX_ALL,
>  					       net->ipv6.devconf_all,
>  					       NETLINK_CB(cb->skb).portid,
> -					       cb->nlh->nlmsg_seq,
> +					       nlh->nlmsg_seq,
>  					       RTM_NEWNETCONF, NLM_F_MULTI,
>  					       NETCONFA_ALL) < 0)
>  			goto done;
> @@ -720,7 +736,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
>  		if (inet6_netconf_fill_devconf(skb, NETCONFA_IFINDEX_DEFAULT,
>  					       net->ipv6.devconf_dflt,
>  					       NETLINK_CB(cb->skb).portid,
> -					       cb->nlh->nlmsg_seq,
> +					       nlh->nlmsg_seq,
>  					       RTM_NEWNETCONF, NLM_F_MULTI,
>  					       NETCONFA_ALL) < 0)
>  			goto done;
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 3e33934751b4..b80b00b55bdf 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -1263,6 +1263,7 @@ static int mpls_netconf_get_devconf(struct sk_buff *in_skb,
>  static int mpls_netconf_dump_devconf(struct sk_buff *skb,
>  				     struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	struct hlist_head *head;
>  	struct net_device *dev;
> @@ -1270,6 +1271,21 @@ static int mpls_netconf_dump_devconf(struct sk_buff *skb,
>  	int idx, s_idx;
>  	int h, s_h;
>  
> +	if (cb->strict_check) {
> +		struct netlink_ext_ack *extack = cb->extack;
> +		struct netconfmsg *ncm;
> +
> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid header");
> +			return -EINVAL;
> +		}
> +
> +		if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ncm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid data after header");
> +			return -EINVAL;
> +		}
> +	}
> +
>  	s_h = cb->args[0];
>  	s_idx = idx = cb->args[1];
>  
> @@ -1286,7 +1302,7 @@ static int mpls_netconf_dump_devconf(struct sk_buff *skb,
>  				goto cont;
>  			if (mpls_netconf_fill_devconf(skb, mdev,
>  						      NETLINK_CB(cb->skb).portid,
> -						      cb->nlh->nlmsg_seq,
> +						      nlh->nlmsg_seq,
>  						      RTM_NEWNETCONF,
>  						      NLM_F_MULTI,
>  						      NETCONFA_ALL) < 0) {
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 15/20] net/neighbor: Update neightbl_dump_info for strict data checking
From: Christian Brauner @ 2018-10-07 10:48 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-16-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:50PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update neightbl_dump_info for strict data checking. If the flag is set,
> the dump request is expected to have an ndtmsg struct as the header.
> All elements of the struct are expected to be 0 and no attributes can
> be appended.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  net/core/neighbour.c | 38 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 3130d010b7ad..8e07b92403ab 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -2164,15 +2164,47 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	return err;
>  }
>  
> +static int neightbl_valid_dump_info(const struct nlmsghdr *nlh,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct ndtmsg *ndtm;
> +
> +	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndtm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
> +		return -EINVAL;
> +	}
> +
> +	ndtm = nlmsg_data(nlh);
> +	if (ndtm->ndtm_pad1  || ndtm->ndtm_pad2) {
> +		NL_SET_ERR_MSG(extack, "Invalid values in header for dump request");
> +		return -EINVAL;
> +	}
> +
> +	if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*ndtm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid data after header");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	int family, tidx, nidx = 0;
>  	int tbl_skip = cb->args[0];
>  	int neigh_skip = cb->args[1];
>  	struct neigh_table *tbl;
>  
> -	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
> +	if (cb->strict_check) {
> +		int err = neightbl_valid_dump_info(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
> +	family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;

So this already was a problem prior to your patch: what happens when you
pass in the wrong struct? Then this case is not safe to do and might
contain all kinds of crap.

>  
>  	for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) {
>  		struct neigh_parms *p;
> @@ -2185,7 +2217,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>  			continue;
>  
>  		if (neightbl_fill_info(skb, tbl, NETLINK_CB(cb->skb).portid,
> -				       cb->nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
> +				       nlh->nlmsg_seq, RTM_NEWNEIGHTBL,
>  				       NLM_F_MULTI) < 0)
>  			break;
>  
> @@ -2200,7 +2232,7 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>  
>  			if (neightbl_fill_param_info(skb, tbl, p,
>  						     NETLINK_CB(cb->skb).portid,
> -						     cb->nlh->nlmsg_seq,
> +						     nlh->nlmsg_seq,
>  						     RTM_NEWNEIGHTBL,
>  						     NLM_F_MULTI) < 0)
>  				goto out;
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 14/20] net/neighbor: Update neigh_dump_info for strict data checking
From: Christian Brauner @ 2018-10-07 10:46 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-15-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:49PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update neigh_dump_info for strict data checking. If the flag is set,
> the dump request is expected to have an ndmsg struct as the header
> potentially followed by one or more attributes. Any data passed in the
> header or as an attribute is taken as a request to influence the data
> returned. Only values supported by the dump handler are allowed to be
> non-0 or set in the request. At the moment only the NDA_IFINDEX and
> NDA_MASTER attributes are supported.
> 
> Existing code does not fail the dump if nlmsg_parse fails. That behavior
> is kept for non-strict checking.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  net/core/neighbour.c | 59 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 48 insertions(+), 11 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index b06f794bf91e..3130d010b7ad 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -2428,13 +2428,14 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
>  
>  static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	struct netlink_ext_ack *extack = cb->extack;
>  	const struct nlmsghdr *nlh = cb->nlh;
>  	struct neigh_dump_filter filter = {};
>  	struct nlattr *tb[NDA_MAX + 1];
>  	struct neigh_table *tbl;
>  	int t, family, s_t;
>  	int proxy = 0;
> -	int err;
> +	int err, i;
>  
>  	family = ((struct rtgenmsg *)nlmsg_data(nlh))->rtgen_family;
>  
> @@ -2445,20 +2446,56 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>  	    ((struct ndmsg *)nlmsg_data(nlh))->ndm_flags == NTF_PROXY)
>  		proxy = 1;
>  
> -	err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL,
> -			  cb->extack);

So right, this seems like there was precedent in the kernel where an
nlmsg_parse() would fail and a warning as generated in the logs.
Anyway, as we have decided to not cause such warnings to be printed it
would make sense to make the whole nlmsg_parse() conditional and move it
under the if (strict) branch.

> -	if (!err) {
> -		if (tb[NDA_IFINDEX]) {
> -			if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32))
> -				return -EINVAL;
> -			filter.dev_idx = nla_get_u32(tb[NDA_IFINDEX]);
> +	if (cb->strict_check) {
> +		struct ndmsg *ndm;
> +
> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ndm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid header");
> +			return -EINVAL;
> +		}
> +
> +		ndm = nlmsg_data(nlh);
> +		if (ndm->ndm_pad1  || ndm->ndm_pad2  || ndm->ndm_ifindex ||
> +		    ndm->ndm_state || ndm->ndm_flags || ndm->ndm_type) {
> +			NL_SET_ERR_MSG(extack, "Invalid values in header for dump request");
> +			return -EINVAL;
>  		}
> -		if (tb[NDA_MASTER]) {
> -			if (nla_len(tb[NDA_MASTER]) != sizeof(u32))
> +	}
> +
> +	err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, extack);
> +	if (err < 0) {
> +		if (cb->strict_check)
> +			return -EINVAL;
> +		goto walk_entries;
> +	}
> +
> +	for (i = 0; i <= NDA_MAX; ++i) {
> +		if (!tb[i])
> +			continue;
> +		switch (i) {
> +		case NDA_IFINDEX:
> +			if (nla_len(tb[i]) != sizeof(u32)) {
> +				NL_SET_ERR_MSG(extack, "Invalid IFINDEX attribute");
>  				return -EINVAL;
> -			filter.master_idx = nla_get_u32(tb[NDA_MASTER]);
> +			}
> +			filter.dev_idx = nla_get_u32(tb[i]);
> +			break;
> +		case NDA_MASTER:
> +			if (nla_len(tb[i]) != sizeof(u32)) {
> +				NL_SET_ERR_MSG(extack, "Invalid MASTER attribute");
> +				return -EINVAL;
> +			}
> +			filter.master_idx = nla_get_u32(tb[i]);
> +			break;
> +		default:
> +			if (cb->strict_check) {
> +				NL_SET_ERR_MSG(extack, "Unsupported attribute in dump request");
> +				return -EINVAL;
> +			}
>  		}
>  	}
> +
> +walk_entries:
>  	s_t = cb->args[0];
>  
>  	for (t = 0; t < NEIGH_NR_TABLES; t++) {
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 13/20] rtnetlink: Update fib dumps for strict data checking
From: Christian Brauner @ 2018-10-07 10:43 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-14-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:48PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Add helper to check netlink message for route dumps. If the strict flag
> is set the dump request is expected to have an rtmsg struct as the header.
> All elements of the struct are expected to be 0 with the exception of
> rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
> can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
> set.
> 
> Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
> and ip6mr_rtm_dumproute to call this helper if strict data checking is
> enabled.

This also looks good to me apart from the \n after every:

+ type <err> = foo();
+ 
+ if (<err> < 0)
        /* handle error */

I haven't seen this style in the code a lot or I haven't looked close
enough. It just seems visually cleaner to have this close together
without a newline:

+ type <err> = foo();
+ if (<err> < 0)
        /* handle error */

> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/net/ip_fib.h    |  2 ++
>  net/ipv4/fib_frontend.c | 43 +++++++++++++++++++++++++++++++++++++++++--
>  net/ipv4/ipmr.c         |  9 +++++++++
>  net/ipv6/ip6_fib.c      |  8 ++++++++
>  net/ipv6/ip6mr.c        |  9 +++++++++
>  net/mpls/af_mpls.c      |  8 ++++++++
>  6 files changed, 77 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index f7c109e37298..9846b79c9ee1 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -452,4 +452,6 @@ static inline void fib_proc_exit(struct net *net)
>  
>  u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
>  
> +int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
> +			  struct netlink_ext_ack *extack);
>  #endif  /* _NET_FIB_H */
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 30e2bcc3ef2a..1583ec0a5154 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -802,8 +802,41 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	return err;
>  }
>  
> +int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
> +			  struct netlink_ext_ack *extack)
> +{
> +	struct rtmsg *rtm;
> +
> +	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
> +		return -EINVAL;
> +	}
> +
> +	rtm = nlmsg_data(nlh);
> +	if (rtm->rtm_dst_len || rtm->rtm_src_len  || rtm->rtm_tos   ||
> +	    rtm->rtm_table   || rtm->rtm_protocol || rtm->rtm_scope ||
> +	    rtm->rtm_type) {
> +		NL_SET_ERR_MSG(extack,
> +			       "Invalid values in header for dump request");
> +		return -EINVAL;
> +	}
> +
> +	if (rtm->rtm_flags & ~(RTM_F_CLONED | RTM_F_PREFIX)) {
> +		NL_SET_ERR_MSG(extack, "Invalid flags for dump request");
> +		return -EINVAL;
> +	}
> +	if (nlh->nlmsg_len != nlmsg_msg_size(sizeof(*rtm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid data after header");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req);
> +
>  static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	unsigned int h, s_h;
>  	unsigned int e = 0, s_e;
> @@ -811,8 +844,14 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  	struct hlist_head *head;
>  	int dumped = 0, err;
>  
> -	if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) &&
> -	    ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED)
> +	if (cb->strict_check) {
> +		err = ip_valid_fib_dump_req(nlh, cb->extack);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (nlmsg_len(nlh) >= sizeof(struct rtmsg) &&
> +	    ((struct rtmsg *)nlmsg_data(nlh))->rtm_flags & RTM_F_CLONED)
>  		return skb->len;
>  
>  	s_h = cb->args[0];
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index e6c48e08d53d..2a7963beecfb 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -2527,6 +2527,15 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  
>  static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
> +
> +	if (cb->strict_check) {
> +		int err = ip_valid_fib_dump_req(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
>  	return mr_rtm_dumproute(skb, cb, ipmr_mr_table_iter,
>  				_ipmr_fill_mroute, &mfc_unres_lock);
>  }
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 5516f55e214b..123786684476 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -568,6 +568,7 @@ static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
>  
>  static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	unsigned int h, s_h;
>  	unsigned int e = 0, s_e;
> @@ -577,6 +578,13 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  	struct hlist_head *head;
>  	int res = 0;
>  
> +	if (cb->strict_check) {
> +		int err = ip_valid_fib_dump_req(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
>  	s_h = cb->args[0];
>  	s_e = cb->args[1];
>  
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 6f07b8380425..8a94500c5532 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -2457,6 +2457,15 @@ static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt)
>  
>  static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
> +
> +	if (cb->strict_check) {
> +		int err = ip_valid_fib_dump_req(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
>  	return mr_rtm_dumproute(skb, cb, ip6mr_mr_table_iter,
>  				_ip6mr_fill_mroute, &mfc_unres_lock);
>  }
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 55a30ee3d820..3e33934751b4 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -2017,6 +2017,7 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
>  
>  static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
>  	struct mpls_route __rcu **platform_label;
>  	size_t platform_labels;
> @@ -2024,6 +2025,13 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
>  
>  	ASSERT_RTNL();
>  
> +	if (cb->strict_check) {
> +		int err = ip_valid_fib_dump_req(nlh, cb->extack);
> +
> +		if (err)
> +			return err;
> +	}
> +
>  	index = cb->args[0];
>  	if (index < MPLS_LABEL_FIRST_UNRESERVED)
>  		index = MPLS_LABEL_FIRST_UNRESERVED;
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 12/20] rtnetlink: Update ipmr_rtm_dumplink for strict data checking
From: Christian Brauner @ 2018-10-07 10:40 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-13-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:47PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update ipmr_rtm_dumplink for strict data checking. If the flag is set,
> the dump request is expected to have an ifinfomsg struct as the header.
> All elements of the struct are expected to be 0 and no attributes can
> be appended.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Just one really tiny nit below. :)

Acked-by: Christian Brauner <christian@brauner.io>

> ---
>  net/ipv4/ipmr.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 5660adcf7a04..e6c48e08d53d 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -2710,6 +2710,31 @@ static bool ipmr_fill_vif(struct mr_table *mrt, u32 vifid, struct sk_buff *skb)
>  	return true;
>  }
>  
> +static int ipmr_valid_dumplink(const struct nlmsghdr *nlh,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct ifinfomsg *ifm;
> +
> +	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
> +		return -EINVAL;
> +	}
> +
> +	if (nlh->nlmsg_len > nlmsg_msg_size(sizeof(*ifm))) {
> +		NL_SET_ERR_MSG(extack, "Invalid data after header");
> +		return -EINVAL;
> +	}
> +
> +	ifm = nlmsg_data(nlh);
> +	if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
> +	    ifm->ifi_change || ifm->ifi_index) {
> +		NL_SET_ERR_MSG(extack, "Invalid values in header for dump request");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
>  {
>  	struct net *net = sock_net(skb->sk);
> @@ -2718,6 +2743,13 @@ static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
>  	unsigned int e = 0, s_e;
>  	struct mr_table *mrt;
>  
> +	if (cb->strict_check) {
> +		int err = ipmr_valid_dumplink(cb->nlh, cb->extack);
> +
> +		if (err)
> +			return err;

Nit: can we remove the unnecessary \n, please.

> +	}
> +
>  	s_t = cb->args[0];
>  	s_e = cb->args[1];
>  
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 10/20] rtnetlink: Update rtnl_stats_dump for strict data checking
From: Christian Brauner @ 2018-10-07 10:38 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-11-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:45PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update rtnl_stats_dump for strict data checking. If the flag is set,
> the dump request is expected to have an if_stats_msg struct as the header.
> All elements of the struct are expected to be 0 except filter_mask which
> must be non-0 (legacy behavior). No attributes are supported.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  net/core/rtnetlink.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index d2c8d41a6fbc..6cdd9251771a 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4643,6 +4643,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	struct netlink_ext_ack *extack = cb->extack;
>  	int h, s_h, err, s_idx, s_idxattr, s_prividx;
>  	struct net *net = sock_net(skb->sk);
>  	unsigned int flags = NLM_F_MULTI;
> @@ -4659,13 +4660,32 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
>  
>  	cb->seq = net->dev_base_seq;
>  
> -	if (nlmsg_len(cb->nlh) < sizeof(*ifsm))
> +	if (nlmsg_len(cb->nlh) < sizeof(*ifsm)) {
> +		NL_SET_ERR_MSG(extack, "Invalid header");
>  		return -EINVAL;
> +	}
>  
>  	ifsm = nlmsg_data(cb->nlh);
> +
> +	/* only requests using NLM_F_DUMP_PROPER_HDR can pass data to
> +	 * influence the dump. The legacy exception is filter_mask.
> +	 */
> +	if (cb->strict_check) {
> +		if (ifsm->pad1 || ifsm->pad2 || ifsm->ifindex) {
> +			NL_SET_ERR_MSG(extack, "Invalid values in header for dump request");
> +			return -EINVAL;
> +		}
> +		if (cb->nlh->nlmsg_len > nlmsg_msg_size(sizeof(*ifsm))) {

Nit: \n appreciated :)

> +			NL_SET_ERR_MSG(extack, "Invalid attributes after header");
> +			return -EINVAL;
> +		}
> +	}
> +
>  	filter_mask = ifsm->filter_mask;
> -	if (!filter_mask)
> +	if (!filter_mask) {
> +		NL_SET_ERR_MSG(extack, "Invalid filter_mask");

Nit: probably better to have this read "Invalid filter mask".

>  		return -EINVAL;
> +	}
>  
>  	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
>  		idx = 0;
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH net-next 09/20] rtnetlink: Update rtnl_bridge_getlink for strict data checking
From: Christian Brauner @ 2018-10-07 10:36 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, jbenc, stephen, David Ahern
In-Reply-To: <20181004213355.14899-10-dsahern@kernel.org>

On Thu, Oct 04, 2018 at 02:33:44PM -0700, David Ahern wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> Update rtnl_bridge_getlink for strict data checking. If the flag is set,
> the dump request is expected to have an ifinfomsg struct as the header
> potentially followed by one or more attributes. Any data passed in the
> header or as an attribute is taken as a request to influence the data
> returned. Only values supported by the dump handler are allowed to be
> non-0 or set in the request. At the moment only the IFLA_EXT_MASK
> attribute is supported.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  net/core/rtnetlink.c | 50 ++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 4fd27b5db787..d2c8d41a6fbc 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4000,27 +4000,57 @@ EXPORT_SYMBOL_GPL(ndo_dflt_bridge_getlink);
>  
>  static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
>  {
> +	struct netlink_ext_ack *extack = cb->extack;
> +	const struct nlmsghdr *nlh = cb->nlh;
>  	struct net *net = sock_net(skb->sk);
> +	struct nlattr *tb[IFLA_MAX+1];
>  	struct net_device *dev;
>  	int idx = 0;
>  	u32 portid = NETLINK_CB(cb->skb).portid;
> -	u32 seq = cb->nlh->nlmsg_seq;
> +	u32 seq = nlh->nlmsg_seq;
>  	u32 filter_mask = 0;
> -	int err;
> +	int err, i;
>  
> -	if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
> -		struct nlattr *extfilt;
> +	if (cb->strict_check) {
> +		struct ifinfomsg *ifm;
>  
> -		extfilt = nlmsg_find_attr(cb->nlh, sizeof(struct ifinfomsg),
> -					  IFLA_EXT_MASK);
> -		if (extfilt) {
> -			if (nla_len(extfilt) < sizeof(filter_mask))
> -				return -EINVAL;
> +		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ifm))) {
> +			NL_SET_ERR_MSG(extack, "Invalid header");
> +			return -EINVAL;
> +		}
> +
> +		ifm = nlmsg_data(nlh);
> +		if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags ||
> +		    ifm->ifi_change || ifm->ifi_index) {
> +			NL_SET_ERR_MSG(extack, "Invalid values in header for dump request");
> +			return -EINVAL;
> +		}
> +	}
>  
> -			filter_mask = nla_get_u32(extfilt);
> +	err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
> +			  ifla_policy, extack);
> +	if (err < 0) {
> +		if (cb->strict_check)
> +			return -EINVAL;
> +		goto walk_entries;
> +	}

What's the point of moving this out of the
if (cb->strict_check) {} branch above? This looks like it would cause
the same parse warnings that we're trying to get rid of in inet{4,6}
dumps.
Seems to make more sense to make the nlmsg_parse() itself conditional as
well unless I'm lacking context.

> +
> +	for (i = 0; i <= IFLA_MAX; ++i) {
> +		if (!tb[i])
> +			continue;
> +		switch (i) {

I'm a fan of \n between different conditions etc. so can we please have
one after the continue. :)

> +		case IFLA_EXT_MASK:
> +			filter_mask = nla_get_u32(tb[i]);
> +			break;
> +		default:
> +			if (cb->strict_check) {
> +				NL_SET_ERR_MSG(extack, "Unsupported attribute in dump request");
> +				return -EINVAL;
> +			}
>  		}
>  	}
>  
> +walk_entries:
>  	rcu_read_lock();
>  	for_each_netdev_rcu(net, dev) {
>  		const struct net_device_ops *ops = dev->netdev_ops;
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH v2 net-next] inet: do not set backlog if listen fails
From: Eric Dumazet @ 2018-10-07 17:39 UTC (permalink / raw)
  To: Yafang Shao, edumazet, davem; +Cc: netdev, linux-kernel
In-Reply-To: <1538919405-3093-1-git-send-email-laoar.shao@gmail.com>



On 10/07/2018 06:36 AM, Yafang Shao wrote:
> We don't need to set the backlog if listen fails.
> The sk_max_ack_backlog will be set in the caller inet_listen() and
> dccp_listen_start() if inet_csk_listen_start() return without error.
> So just remove this line to avoid this unnecessary operation.
> 
> Regarding sk_ack_backlog, we have to set it before a TCP/DCCP socket is
> ready to accept new flows to avoid race, because dccp and tcp have lockless
> listeners
>

Really could you not add irrelevant stuff in the changelog ?

This patch simply removes one redundant setting, you do not have to explain
about dccp/tcp being lockless and that sk_ack_backlog is not touched by this patch.

Also the title is misleading, since we will still set the backlog even if the listen
fails.

If you really want sk_max_ack_backlog being not changed if listen() fails,
then I am afraid you will need to submit a different patch.

^ permalink raw reply

* Re: [PATCH net-next 08/20] rtnetlink: Update rtnl_dump_ifinfo for strict data checking
From: Christian Brauner @ 2018-10-07 10:29 UTC (permalink / raw)
  To: David Ahern; +Cc: David Ahern, netdev, davem, jbenc, stephen
In-Reply-To: <cc1e134e-7d02-2f46-2a6d-5c1e823f3e97@gmail.com>

On Fri, Oct 05, 2018 at 01:22:24PM -0600, David Ahern wrote:
> On 10/5/18 11:59 AM, Christian Brauner wrote:
> >> +	err = nlmsg_parse(nlh, hdrlen, tb, IFLA_MAX, ifla_policy, extack);
> >> +	if (err < 0) {
> >> +		if (cb->strict_check)
> >> +			return -EINVAL;
> >> +		goto walk_entries;
> >> +	}
> >>  
> >> -		if (master_idx || kind_ops)
> >> -			flags |= NLM_F_DUMP_FILTERED;
> >> +	for (i = 0; i <= IFLA_MAX; ++i) {
> >> +		if (!tb[i])
> >> +			continue;
> >> +		switch (i) {
> >> +		case IFLA_TARGET_NETNSID:
> >> +			netnsid = nla_get_s32(tb[i]);
> >> +			tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
> >> +			if (IS_ERR(tgt_net)) {
> >> +				NL_SET_ERR_MSG(extack, "Invalid target namespace id");
> >> +				return PTR_ERR(tgt_net);
> >> +			}
> >> +			break;
> >> +		case IFLA_EXT_MASK:
> >> +			ext_filter_mask = nla_get_u32(tb[i]);
> >> +			break;
> >> +		case IFLA_MASTER:
> >> +			master_idx = nla_get_u32(tb[i]);
> >> +			break;
> >> +		case IFLA_LINKINFO:
> >> +			kind_ops = linkinfo_to_kind_ops(tb[i]);
> >> +			break;
> >> +		default:
> >> +			if (cb->strict_check) {
> >> +				NL_SET_ERR_MSG(extack, "Unsupported attribute in dump request");
> >> +				return -EINVAL;
> >> +			}
> >> +		}
> > 
> > This might make sense to be split into two helpers for parsing:
> > <blablabla>_strict() and <blablabla>_lenient(). :)
> 
> I thought about that, but there is so much overlap - they are mostly
> common. Besides, ifinfomsg is the header for link dumps, and ifinfomsg
> is the one that has been (ab)used for other message types, so strict
> versus lenient does not really have a differentiator for this message
> type - other than checking the elements of the struct.

It's mostly about the function being extremely long and convoluted.
Having parts moved out into (a) descriptive helper(s) with whatever name
might make this way more readable than it is now especially with the new
handling we need for strict checking.

^ permalink raw reply

* Re: [PATCH net-next 11/20] rtnetlink: Update inet6_dump_ifinfo for strict data checking
From: Christian Brauner @ 2018-10-07 10:25 UTC (permalink / raw)
  To: David Ahern; +Cc: David Ahern, netdev, davem, jbenc, stephen
In-Reply-To: <30b45b7e-cbda-e348-c5f1-5b78cfb84d2f@gmail.com>

On Fri, Oct 05, 2018 at 01:25:22PM -0600, David Ahern wrote:
> On 10/5/18 11:48 AM, Christian Brauner wrote:
> > On Thu, Oct 04, 2018 at 02:33:46PM -0700, David Ahern wrote:
> >> From: David Ahern <dsahern@gmail.com>
> >>
> >> Update inet6_dump_ifinfo for strict data checking. If the flag is
> >> set, the dump request is expected to have an ifinfomsg struct as
> >> the header. All elements of the struct are expected to be 0 and no
> >> attributes can be appended.
> >>
> >> Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Christian Brauner <christian@brauner.io>

> > This is on top of current net-next? Are your patches ensuring that
> > ipv6 addr requests don't generate log messages anymore when a wrong
> > header is passed but the strict socket option is not passed? The context
> > here doesn't seem to indicate that. :)
> > 
> 
> this is an AF_INET6 GETLINK handler. Why? no idea, but I think you are
> confusing this patch with the GETADDR patch which generated the
> "netlink: 16 bytes leftover after parsing attributes in process `ip'."
> message before this set.

Yes, I realized this immediately afterwards.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox