netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/7] bpf: improve verifier scalability
@ 2019-03-30  0:16 Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 1/7] bpf: add verifier stats and log_level bit 2 Alexei Starovoitov
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

Realize two key ideas to speed up verification by ~20 times:
1. Every 'branching' instruction records all verifier states, but
   not all of them are useful for search pruning.
   Add a simple heuristic to keep states that were successful in search pruning
   and remove those that were not.
2. mark_reg_read walks the parentage chain of registers to mark parents as LIVE_READ.
   Once a register is marked there is no need to remark it again in the future,
   hence stop walking the chain once the first LIVE_READ is seen.

The 1st optimization gives a 10x speedup on large programs
and the 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%.
Combined they deliver a ~20x speedup on large programs.

Faster and bounded verification time allows increasing the insn_processed
limit to 1 million from 130k.
In the worst case it takes 1/10 of a second to process that many instructions,
and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state),
which is roughly 5 Mbyte.

Increase the per-program instruction limit for root to the insn_processed limit.

Add verification stats and stress tests for verifier scalability.

This patch set is the first step towards accepting large programs.
The verifier still suffers from its brute force algorithm, and
large programs can easily hit the 1M insn_processed limit.
A lot more work is necessary to be able to verify large programs.

Alexei Starovoitov (7):
  bpf: add verifier stats and log_level bit 2
  bpf: improve verification speed by droping states
  bpf: improve verification speed by not remarking live_read
  bpf: increase complexity limit and maximum program size
  bpf: increase verifier log limit
  libbpf: teach libbpf about log_level bit 2
  selftests/bpf: add few verifier scale tests

 include/linux/bpf.h                           |   1 +
 include/linux/bpf_verifier.h                  |  23 +++
 kernel/bpf/syscall.c                          |   3 +-
 kernel/bpf/verifier.c                         | 132 ++++++++++++++----
 tools/lib/bpf/bpf.c                           |   2 +-
 tools/lib/bpf/bpf.h                           |   2 +-
 tools/lib/bpf/libbpf.c                        |  16 ++-
 tools/lib/bpf/libbpf.h                        |   1 +
 .../bpf/prog_tests/bpf_verif_scale.c          |  49 +++++++
 .../testing/selftests/bpf/progs/test_jhash.h  |  70 ++++++++++
 .../selftests/bpf/progs/test_verif_scale1.c   |  30 ++++
 .../selftests/bpf/progs/test_verif_scale2.c   |  30 ++++
 .../selftests/bpf/progs/test_verif_scale3.c   |  30 ++++
 tools/testing/selftests/bpf/test_progs.c      |   6 +-
 tools/testing/selftests/bpf/test_progs.h      |   1 +
 15 files changed, 361 insertions(+), 35 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_jhash.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale1.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale2.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale3.c

-- 
2.20.0



* [PATCH bpf-next 1/7] bpf: add verifier stats and log_level bit 2
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 2/7] bpf: improve verification speed by droping states Alexei Starovoitov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

In order to understand the verifier bottlenecks, add various stats
and extend log_level.
log_level 1 and 2 are kept as-is:
bit 0 - level=1 - print every insn and verifier state at branch points
bit 1 - level=2 - print every insn and verifier state at every insn
bit 2 - level=4 - print verifier error and stats at the end of verification

When the verifier rejects a program, libbpf currently tries to load it twice:
once with log_level=0 (no messages, only the error code is reported to user space)
and a second time with log_level=1 to tell the user why the verifier rejected it.

With the introduction of bit 2 (level=4), libbpf can choose to always use that
level and load programs only once, since verification speed is not affected and,
in case of an error, the verbose message will be available.

Note that the verifier stats are not part of the uapi, just like all other
verbose messages. They're expected to change in the future.
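
For illustration only, a minimal raw-syscall sketch of requesting the new
stats bit (the load_with_stats() wrapper and the 1M log buffer are made up
for this example; the field names come from the existing union bpf_attr uapi):

  #include <linux/bpf.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static char log[1 << 20];

  static int load_with_stats(const struct bpf_insn *insns, __u32 insn_cnt)
  {
  	union bpf_attr attr = {};

  	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
  	attr.insns     = (__u64)(unsigned long)insns;
  	attr.insn_cnt  = insn_cnt;
  	attr.license   = (__u64)(unsigned long)"GPL";
  	attr.log_buf   = (__u64)(unsigned long)log;
  	attr.log_size  = sizeof(log);
  	attr.log_level = 4;	/* bit 2: error and stats at the end of verification */

  	/* on success the stats land at the end of 'log' */
  	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }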

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h | 21 ++++++++++
 kernel/bpf/verifier.c        | 76 ++++++++++++++++++++++++------------
 2 files changed, 73 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7d8228d1c898..f7e15eeb60bb 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -248,6 +248,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
 	return log->len_used >= log->len_total - 1;
 }
 
+#define BPF_LOG_LEVEL1	1
+#define BPF_LOG_LEVEL2	2
+#define BPF_LOG_STATS	4
+#define BPF_LOG_LEVEL	(BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
+#define BPF_LOG_MASK	(BPF_LOG_LEVEL | BPF_LOG_STATS)
+
 static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 {
 	return log->level && log->ubuf && !bpf_verifier_log_full(log);
@@ -284,6 +290,21 @@ struct bpf_verifier_env {
 	struct bpf_verifier_log log;
 	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
 	u32 subprog_cnt;
+	/* number of instructions analyzed by the verifier */
+	u32 insn_processed;
+	/* total verification time */
+	u64 verification_time;
+	/* maximum number of verifier states kept in 'branching' instructions */
+	u32 max_states_per_insn;
+	/* total number of allocated verifier states */
+	u32 total_states;
+	/* some states are freed during program analysis.
+	 * this is peak number of states. this number dominates kernel
+	 * memory consumption during verification
+	 */
+	u32 peak_states;
+	/* longest register parentage chain walked for liveness marking */
+	u32 longest_mark_read_walk;
 };
 
 __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 87221fda1321..2b5dcb2c6967 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1092,7 +1092,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	 */
 	subprog[env->subprog_cnt].start = insn_cnt;
 
-	if (env->log.level > 1)
+	if (env->log.level & BPF_LOG_LEVEL2)
 		for (i = 0; i < env->subprog_cnt; i++)
 			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
@@ -1139,6 +1139,7 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 			 struct bpf_reg_state *parent)
 {
 	bool writes = parent == state->parent; /* Observe write marks */
+	int cnt = 0;
 
 	while (parent) {
 		/* if read wasn't screened by an earlier write ... */
@@ -1155,7 +1156,11 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 		state = parent;
 		parent = state->parent;
 		writes = true;
+		cnt++;
 	}
+
+	if (env->longest_mark_read_walk < cnt)
+		env->longest_mark_read_walk = cnt;
 	return 0;
 }
 
@@ -1455,7 +1460,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	 * need to try adding each of min_value and max_value to off
 	 * to make sure our theoretical access will be safe.
 	 */
-	if (env->log.level)
+	if (env->log.level & BPF_LOG_LEVEL)
 		print_verifier_state(env, state);
 
 	/* The minimum value is only important with signed
@@ -2938,7 +2943,7 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	/* and go analyze first insn of the callee */
 	*insn_idx = target_insn;
 
-	if (env->log.level) {
+	if (env->log.level & BPF_LOG_LEVEL) {
 		verbose(env, "caller:\n");
 		print_verifier_state(env, caller);
 		verbose(env, "callee:\n");
@@ -2978,7 +2983,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 		return err;
 
 	*insn_idx = callee->callsite + 1;
-	if (env->log.level) {
+	if (env->log.level & BPF_LOG_LEVEL) {
 		verbose(env, "returning from callee:\n");
 		print_verifier_state(env, callee);
 		verbose(env, "to caller at %d:\n", *insn_idx);
@@ -5001,7 +5006,7 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 			insn->dst_reg);
 		return -EACCES;
 	}
-	if (env->log.level)
+	if (env->log.level & BPF_LOG_LEVEL)
 		print_verifier_state(env, this_branch->frame[this_branch->curframe]);
 	return 0;
 }
@@ -6181,6 +6186,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 		states_cnt++;
 	}
 
+	if (env->max_states_per_insn < states_cnt)
+		env->max_states_per_insn = states_cnt;
+
 	if (!env->allow_ptr_leaks && states_cnt > BPF_COMPLEXITY_LIMIT_STATES)
 		return 0;
 
@@ -6194,6 +6202,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 	new_sl = kzalloc(sizeof(struct bpf_verifier_state_list), GFP_KERNEL);
 	if (!new_sl)
 		return -ENOMEM;
+	env->total_states++;
+	env->peak_states++;
 
 	/* add new state to the head of linked list */
 	new = &new_sl->state;
@@ -6278,8 +6288,7 @@ static int do_check(struct bpf_verifier_env *env)
 	struct bpf_verifier_state *state;
 	struct bpf_insn *insns = env->prog->insnsi;
 	struct bpf_reg_state *regs;
-	int insn_cnt = env->prog->len, i;
-	int insn_processed = 0;
+	int insn_cnt = env->prog->len;
 	bool do_print_state = false;
 
 	env->prev_linfo = NULL;
@@ -6314,10 +6323,10 @@ static int do_check(struct bpf_verifier_env *env)
 		insn = &insns[env->insn_idx];
 		class = BPF_CLASS(insn->code);
 
-		if (++insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
+		if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
 			verbose(env,
 				"BPF program is too large. Processed %d insn\n",
-				insn_processed);
+				env->insn_processed);
 			return -E2BIG;
 		}
 
@@ -6326,7 +6335,7 @@ static int do_check(struct bpf_verifier_env *env)
 			return err;
 		if (err == 1) {
 			/* found equivalent state, can prune the search */
-			if (env->log.level) {
+			if (env->log.level & BPF_LOG_LEVEL) {
 				if (do_print_state)
 					verbose(env, "\nfrom %d to %d%s: safe\n",
 						env->prev_insn_idx, env->insn_idx,
@@ -6344,8 +6353,9 @@ static int do_check(struct bpf_verifier_env *env)
 		if (need_resched())
 			cond_resched();
 
-		if (env->log.level > 1 || (env->log.level && do_print_state)) {
-			if (env->log.level > 1)
+		if (env->log.level & BPF_LOG_LEVEL2 ||
+		    (env->log.level & BPF_LOG_LEVEL && do_print_state)) {
+			if (env->log.level & BPF_LOG_LEVEL2)
 				verbose(env, "%d:", env->insn_idx);
 			else
 				verbose(env, "\nfrom %d to %d%s:",
@@ -6356,7 +6366,7 @@ static int do_check(struct bpf_verifier_env *env)
 			do_print_state = false;
 		}
 
-		if (env->log.level) {
+		if (env->log.level & BPF_LOG_LEVEL2) {
 			const struct bpf_insn_cbs cbs = {
 				.cb_print	= verbose,
 				.private_data	= env,
@@ -6621,16 +6631,6 @@ static int do_check(struct bpf_verifier_env *env)
 		env->insn_idx++;
 	}
 
-	verbose(env, "processed %d insns (limit %d), stack depth ",
-		insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
-	for (i = 0; i < env->subprog_cnt; i++) {
-		u32 depth = env->subprog_info[i].stack_depth;
-
-		verbose(env, "%d", depth);
-		if (i + 1 < env->subprog_cnt)
-			verbose(env, "+");
-	}
-	verbose(env, "\n");
 	env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
 	return 0;
 }
@@ -7854,9 +7854,34 @@ static void free_states(struct bpf_verifier_env *env)
 	kfree(env->explored_states);
 }
 
+static void print_verification_stats(struct bpf_verifier_env *env)
+{
+	int i;
+
+	if (env->log.level & BPF_LOG_STATS) {
+		verbose(env, "verification time %lld usec\n",
+			div_u64(env->verification_time, 1000));
+		verbose(env, "stack depth ");
+		for (i = 0; i < env->subprog_cnt; i++) {
+			u32 depth = env->subprog_info[i].stack_depth;
+
+			verbose(env, "%d", depth);
+			if (i + 1 < env->subprog_cnt)
+				verbose(env, "+");
+		}
+		verbose(env, "\n");
+	}
+	verbose(env, "processed %d insns (limit %d) max_states_per_insn %d "
+		"total_states %d peak_states %d mark_read %d\n",
+		env->insn_processed, BPF_COMPLEXITY_LIMIT_INSNS,
+		env->max_states_per_insn, env->total_states,
+		env->peak_states, env->longest_mark_read_walk);
+}
+
 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	      union bpf_attr __user *uattr)
 {
+	u64 start_time = ktime_get_ns();
 	struct bpf_verifier_env *env;
 	struct bpf_verifier_log *log;
 	int i, len, ret = -EINVAL;
@@ -7899,7 +7924,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 		ret = -EINVAL;
 		/* log attributes have to be sane */
 		if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
-		    !log->level || !log->ubuf)
+		    !log->level || !log->ubuf || log->level & ~BPF_LOG_MASK)
 			goto err_unlock;
 	}
 
@@ -7980,6 +8005,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (ret == 0)
 		ret = fixup_call_args(env);
 
+	env->verification_time = ktime_get_ns() - start_time;
+	print_verification_stats(env);
+
 	if (log->level && bpf_verifier_log_full(log))
 		ret = -ENOSPC;
 	if (log->level && !log->ubuf) {
-- 
2.20.0



* [PATCH bpf-next 2/7] bpf: improve verification speed by droping states
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 1/7] bpf: add verifier stats and log_level bit 2 Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  3:12   ` Jakub Kicinski
  2019-03-30  0:16 ` [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read Alexei Starovoitov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

Branch instructions, branch targets and calls in a bpf program are
the places where the verifier remembers states that led to successful
verification of the program.
These states are used to prune brute force program analysis.
For unprivileged programs there is a limit of 64 states per such
'branching' instruction (the maximum list length is tracked by the
max_states_per_insn counter introduced in the previous patch).
Simply reducing this threshold to 32 or lower increases the insn_processed
metric to the point that small valid programs get rejected.
For root programs there is no limit, and cilium programs can have
max_states_per_insn of 100 or higher.
Walking 100+ states multiplied by the number of 'branching' insns during
verification consumes a significant amount of cpu time.
It turned out that a simple LRU-like mechanism can be used to remove states
that are unlikely to be helpful in future search pruning.
This patch introduces hit_cnt and miss_cnt counters:
hit_cnt - how many times this state successfully pruned the search
miss_cnt - how many times this state was found not equivalent to a new state
(and that new state was added to the state list)

The heuristic introduced in this patch is:
if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
  /* drop this state from future considerations */
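
For example, a state that never pruned the search (hit_cnt == 0) is dropped
on its 4th miss (4 > 0 * 3 + 3), while a state that pruned it twice
(hit_cnt == 2) is kept until miss_cnt exceeds 9.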

Higher numbers increase max_states_per_insn (allow more states to be
considered for pruning) and slow verification down, but do not meaningfully
reduce the insn_processed metric.
Lower numbers drop too many states and insn_processed increases too much.
Many different formulas were considered.
This one is simple and works well enough in practice.
(the analysis was done on selftests/progs/* and on cilium programs)

The end result is that this heuristic improves verification speed by 10 times.
Large synthetic programs that used to take a second or more now take
1/10 of a second.
In cases where max_states_per_insn used to be 100 or more, now it's ~10.

There is a slight increase in insn_processed for cilium progs:
                        before   after
bpf_lb-DLB_L3.o           1831    1838
bpf_lb-DLB_L4.o           3029    3218
bpf_lb-DUNKNOWN.o         1064    1064
bpf_lxc-DDROP_ALL.o      26309   26935
bpf_lxc-DUNKNOWN.o       33517   34439
bpf_netdev.o              9713    9721
bpf_overlay.o             6184    6184
bpf_lxc_jit.o            37335   39389
And a 2-3x improvement in verification speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  2 ++
 kernel/bpf/verifier.c        | 44 +++++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f7e15eeb60bb..fc8254d6b569 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -207,6 +207,7 @@ struct bpf_verifier_state {
 struct bpf_verifier_state_list {
 	struct bpf_verifier_state state;
 	struct bpf_verifier_state_list *next;
+	int miss_cnt, hit_cnt;
 };
 
 /* Possible states for alu_state member. */
@@ -280,6 +281,7 @@ struct bpf_verifier_env {
 	bool strict_alignment;		/* perform strict pointer alignment checks */
 	struct bpf_verifier_state *cur_state; /* current verifier state */
 	struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
+	struct bpf_verifier_state_list *free_list;
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
 	u32 used_map_cnt;		/* number of used maps */
 	u32 id_gen;			/* used to generate unique reg IDs */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2b5dcb2c6967..b18512ac205e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6152,11 +6152,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 {
 	struct bpf_verifier_state_list *new_sl;
-	struct bpf_verifier_state_list *sl;
+	struct bpf_verifier_state_list *sl, **pprev;
 	struct bpf_verifier_state *cur = env->cur_state, *new;
 	int i, j, err, states_cnt = 0;
 
-	sl = env->explored_states[insn_idx];
+	pprev = &env->explored_states[insn_idx];
+	sl = *pprev;
+
 	if (!sl)
 		/* this 'insn_idx' instruction wasn't marked, so we will not
 		 * be doing state search here
@@ -6167,6 +6169,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 
 	while (sl != STATE_LIST_MARK) {
 		if (states_equal(env, &sl->state, cur)) {
+			sl->hit_cnt++;
 			/* reached equivalent register/stack state,
 			 * prune the search.
 			 * Registers read by the continuation are read by us.
@@ -6182,8 +6185,35 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				return err;
 			return 1;
 		}
-		sl = sl->next;
 		states_cnt++;
+		sl->miss_cnt++;
+		/* heuristic to determine whether this state is beneficial
+		 * to keep checking from state equivalence point of view.
+		 * Higher numbers increase max_states_per_insn and verification time,
+		 * but do not meaningfully decrease insn_processed.
+		 */
+		if (sl->miss_cnt > sl->hit_cnt * 3 + 3) {
+			/* the state is unlikely to be useful. Remove it to
+			 * speed up verification
+			 */
+			*pprev = sl->next;
+			if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) {
+				free_verifier_state(&sl->state, false);
+				kfree(sl);
+				env->peak_states--;
+			} else {
+				/* cannot free this state, since parentage chain may
+				 * walk it later. Add it for free_list instead to
+				 * be freed at the end of verification
+				 */
+				sl->next = env->free_list;
+				env->free_list = sl;
+			}
+			sl = *pprev;
+			continue;
+		}
+		pprev = &sl->next;
+		sl = *pprev;
 	}
 
 	if (env->max_states_per_insn < states_cnt)
@@ -7836,6 +7866,14 @@ static void free_states(struct bpf_verifier_env *env)
 	struct bpf_verifier_state_list *sl, *sln;
 	int i;
 
+	sl = env->free_list;
+	while (sl) {
+		sln = sl->next;
+		free_verifier_state(&sl->state, false);
+		kfree(sl);
+		sl = sln;
+	}
+
 	if (!env->explored_states)
 		return;
 
-- 
2.20.0



* [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 1/7] bpf: add verifier stats and log_level bit 2 Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 2/7] bpf: improve verification speed by droping states Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  3:12   ` Jakub Kicinski
  2019-04-01 15:38   ` Edward Cree
  2019-03-30  0:16 ` [PATCH bpf-next 4/7] bpf: increase complexity limit and maximum program size Alexei Starovoitov
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

With the large verifier speed improvement brought by the previous patch
mark_reg_read() becomes the hottest function during verification.
On a typical program it consumes 40% of cpu.
mark_reg_read() walks the parentage chain of registers to mark parents as LIVE_READ.
Once a register is marked there is no need to remark it again in the future,
hence stop walking the chain once the first LIVE_READ is seen.
This optimization drops mark_reg_read() time from 40% of cpu to <1%
and gives an overall 2x improvement in verification speed.
For some programs the longest_mark_read_walk counter improves from ~500 to ~5.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b18512ac205e..6dfd148b58f6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1151,6 +1151,15 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 				parent->var_off.value, parent->off);
 			return -EFAULT;
 		}
+		if (parent->live & REG_LIVE_READ)
+			/* The parentage chain never changes and
+			 * this parent was already marked as LIVE_READ.
+			 * There is no need to keep walking the chain again and
+			 * keep re-marking all parents as LIVE_READ.
+			 * This case happens when the same register is read
+			 * multiple times without writes into it in-between.
+			 */
+			break;
 		/* ... then we depend on parent's value */
 		parent->live |= REG_LIVE_READ;
 		state = parent;
-- 
2.20.0



* [PATCH bpf-next 4/7] bpf: increase complexity limit and maximum program size
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2019-03-30  0:16 ` [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 5/7] bpf: increase verifier log limit Alexei Starovoitov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

Large verifier speed improvements allow increasing the
verifier complexity limit.
Now, regardless of the program's composition and size, it takes
the verifier little time to reach the insn_processed limit.
On a typical x86 machine a non-debug kernel processes 1M instructions
in 1/10 of a second
(before these speed improvements specially crafted programs
could be hitting multi-second verification times).
A full kasan kernel with debug takes ~1 second for the same 1M insns.
Hence bump the BPF_COMPLEXITY_LIMIT_INSNS limit to 1M.
Also increase the number of instructions per program
from 4k to the internal BPF_COMPLEXITY_LIMIT_INSNS limit.
The 4k limit was confusing to users, since small programs with hundreds
of insns could be hitting the BPF_COMPLEXITY_LIMIT_INSNS limit.
Sometimes adding more insns and bpf_trace_printk debug statements
would make the verifier accept the program, while removing
code would make the verifier reject it.
Some user space applications started to add a #define MAX_FOO to
their programs and do:
  MAX_FOO=100;
again:
  compile with MAX_FOO;
  try to load;
  if (fails_to_load) { reduce MAX_FOO; goto again; }
to be able to fit the maximum amount of processing into a single program.
Other users artificially split their single program into a set of programs
and use all 32 iterations of tail_calls to increase compute limits.
And the most advanced folks used the unlimited tc-bpf filter list
to execute many bpf programs.
Essentially the users managed to work around the 4k insn limit.
This patch removes the limit for root programs from uapi.
BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit
and whether a program loads successfully no longer depends on its size,
but only on the 'smartness' of the verifier.
The verifier will continue to get smarter with every kernel release.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/syscall.c  | 3 ++-
 kernel/bpf/verifier.c | 1 -
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f62897198844..a445194b5fb6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -421,6 +421,7 @@ struct bpf_array {
 	};
 };
 
+#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
 #define MAX_TAIL_CALL_CNT 32
 
 struct bpf_event_entry {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index afca36f53c49..1d65e56594db 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1557,7 +1557,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
 	is_gpl = license_is_gpl_compatible(license);
 
-	if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
+	if (attr->insn_cnt == 0 ||
+	    attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
 		return -E2BIG;
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6dfd148b58f6..76089a094147 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -176,7 +176,6 @@ struct bpf_verifier_stack_elem {
 	struct bpf_verifier_stack_elem *next;
 };
 
-#define BPF_COMPLEXITY_LIMIT_INSNS	131072
 #define BPF_COMPLEXITY_LIMIT_STACK	1024
 #define BPF_COMPLEXITY_LIMIT_STATES	64
 
-- 
2.20.0



* [PATCH bpf-next 5/7] bpf: increase verifier log limit
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2019-03-30  0:16 ` [PATCH bpf-next 4/7] bpf: increase complexity limit and maximum program size Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  3:16   ` Jakub Kicinski
  2019-03-30  0:16 ` [PATCH bpf-next 6/7] libbpf: teach libbpf about log_level bit 2 Alexei Starovoitov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

The existing 16Mbyte verifier log limit is not enough for log_level=2
even for small programs. Increase it to 1Gbyte.
Note it's not a kernel memory limit.
It's the amount of memory user space provides to store
the verifier log. The kernel populates it 1k at a time.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 76089a094147..3722fb0bc7bc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7969,7 +7969,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 
 		ret = -EINVAL;
 		/* log attributes have to be sane */
-		if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
+		if (log->len_total < 128 || log->len_total > UINT_MAX >> 2 ||
 		    !log->level || !log->ubuf || log->level & ~BPF_LOG_MASK)
 			goto err_unlock;
 	}
-- 
2.20.0



* [PATCH bpf-next 6/7] libbpf: teach libbpf about log_level bit 2
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
                   ` (4 preceding siblings ...)
  2019-03-30  0:16 ` [PATCH bpf-next 5/7] bpf: increase verifier log limit Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-03-30  0:16 ` [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests Alexei Starovoitov
  2019-03-30  3:18 ` [PATCH bpf-next 0/7] bpf: improve verifier scalability Jakub Kicinski
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

Allow bpf_prog_load_xattr() to specify log_level for program loading.

Teach libbpf to accept log_level with bit 2 set.

Increase default BPF_LOG_BUF_SIZE from 256k to 16M.
There is no downside to increasing it to the maximum allowed by old kernels.
The existing 256k limit caused ENOSPC errors, and users were not able to see
the verifier error, which is printed at the end of the verifier log.

If ENOSPC is hit, double the verifier log buffer size and try again to capture
the verifier error.
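
For illustration, a minimal usage sketch of the new field via
bpf_prog_load_xattr() (this mirrors what the selftest added later in this
series does; "prog.o" is a placeholder object file):

  struct bpf_prog_load_attr attr = {};
  struct bpf_object *obj;
  int prog_fd, err;

  attr.file = "prog.o";
  attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
  attr.log_level = 4;	/* bit 2: verifier error and stats only */
  err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);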

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/bpf.c    |  2 +-
 tools/lib/bpf/bpf.h    |  2 +-
 tools/lib/bpf/libbpf.c | 16 ++++++++++++++--
 tools/lib/bpf/libbpf.h |  1 +
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9cd015574e83..a1db869a6fda 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -223,7 +223,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 		return -EINVAL;
 
 	log_level = load_attr->log_level;
-	if (log_level > 2 || (log_level && !log_buf))
+	if (log_level > (4 | 2 | 1) || (log_level && !log_buf))
 		return -EINVAL;
 
 	name_len = load_attr->name ? strlen(load_attr->name) : 0;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6ffdd79bea89..e2c0df7b831f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -92,7 +92,7 @@ struct bpf_load_program_attr {
 #define MAPS_RELAX_COMPAT	0x01
 
 /* Recommend log buffer size */
-#define BPF_LOG_BUF_SIZE (256 * 1024)
+#define BPF_LOG_BUF_SIZE (16 * 1024 * 1024) /* verifier maximum in kernels <= 5.1 */
 LIBBPF_API int
 bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 		       char *log_buf, size_t log_buf_sz);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 11c25d9ea431..e1e4d35cf08e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -152,6 +152,7 @@ struct bpf_program {
 		};
 	} *reloc_desc;
 	int nr_reloc;
+	int log_level;
 
 	struct {
 		int nr;
@@ -1494,6 +1495,7 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 {
 	struct bpf_load_program_attr load_attr;
 	char *cp, errmsg[STRERR_BUFSIZE];
+	int log_buf_size = BPF_LOG_BUF_SIZE;
 	char *log_buf;
 	int ret;
 
@@ -1514,21 +1516,30 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	load_attr.line_info = prog->line_info;
 	load_attr.line_info_rec_size = prog->line_info_rec_size;
 	load_attr.line_info_cnt = prog->line_info_cnt;
+	load_attr.log_level = prog->log_level;
 	if (!load_attr.insns || !load_attr.insns_cnt)
 		return -EINVAL;
 
-	log_buf = malloc(BPF_LOG_BUF_SIZE);
+retry_load:
+	log_buf = malloc(log_buf_size);
 	if (!log_buf)
 		pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
 
-	ret = bpf_load_program_xattr(&load_attr, log_buf, BPF_LOG_BUF_SIZE);
+	ret = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size);
 
 	if (ret >= 0) {
+		if (load_attr.log_level)
+			pr_debug("verifier log:\n%s", log_buf);
 		*pfd = ret;
 		ret = 0;
 		goto out;
 	}
 
+	if (errno == ENOSPC) {
+		log_buf_size <<= 1;
+		free(log_buf);
+		goto retry_load;
+	}
 	ret = -LIBBPF_ERRNO__LOAD;
 	cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
 	pr_warning("load bpf program failed: %s\n", cp);
@@ -2938,6 +2949,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 		bpf_program__set_expected_attach_type(prog,
 						      expected_attach_type);
 
+		prog->log_level = attr->log_level;
 		if (!first_prog)
 			first_prog = prog;
 	}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index c70785cc8ef5..531323391d07 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -314,6 +314,7 @@ struct bpf_prog_load_attr {
 	enum bpf_prog_type prog_type;
 	enum bpf_attach_type expected_attach_type;
 	int ifindex;
+	int log_level;
 };
 
 LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
-- 
2.20.0



* [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
                   ` (5 preceding siblings ...)
  2019-03-30  0:16 ` [PATCH bpf-next 6/7] libbpf: teach libbpf about log_level bit 2 Alexei Starovoitov
@ 2019-03-30  0:16 ` Alexei Starovoitov
  2019-04-01 14:17   ` Daniel Borkmann
  2019-03-30  3:18 ` [PATCH bpf-next 0/7] bpf: improve verifier scalability Jakub Kicinski
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  0:16 UTC (permalink / raw)
  To: davem; +Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

Add 3 basic tests that stress verifier scalability.

test_verif_scale1.c calls the non-inlined jhash() function 90 times on
different positions in the packet.
This test simulates network packet parsing.
The jhash function is ~140 instructions and the main program is ~1200 insns.

test_verif_scale2.c force-inlines the jhash() function 90 times.
This program is ~15k instructions long.

test_verif_scale3.c also calls the non-inlined jhash() function 90 times,
but this time jhash has to process 32 bytes from the packet
instead of 14 bytes as in tests 1 and 2.
The jhash function is ~230 insns and the main program is ~1200 insns.

$ test_progs -s
can be used to see verifier stats.
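
With the stats bit set, the kernel appends a summary to the verifier log; the
format (from the first patch in this series; actual numbers vary per program
and machine) looks like:

  verification time <usec> usec
  stack depth <d0>+<d1>+...
  processed <n> insns (limit 1000000) max_states_per_insn <n> total_states <n> peak_states <n> mark_read <n>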

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 .../bpf/prog_tests/bpf_verif_scale.c          | 49 +++++++++++++
 .../testing/selftests/bpf/progs/test_jhash.h  | 70 +++++++++++++++++++
 .../selftests/bpf/progs/test_verif_scale1.c   | 30 ++++++++
 .../selftests/bpf/progs/test_verif_scale2.c   | 30 ++++++++
 .../selftests/bpf/progs/test_verif_scale3.c   | 30 ++++++++
 tools/testing/selftests/bpf/test_progs.c      |  6 +-
 tools/testing/selftests/bpf/test_progs.h      |  1 +
 7 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_jhash.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale1.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale2.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_verif_scale3.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
new file mode 100644
index 000000000000..23b159d95c3f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <test_progs.h>
+static int libbpf_debug_print(enum libbpf_print_level level,
+			      const char *format, va_list args)
+{
+	if (level != LIBBPF_DEBUG)
+		return 0;
+
+	if (!strstr(format, "verifier log"))
+		return 0;
+	return vfprintf(stderr, "%s", args);
+}
+
+static int check_load(const char *file)
+{
+	struct bpf_prog_load_attr attr;
+	struct bpf_object *obj;
+	int err, prog_fd;
+
+	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+	attr.file = file;
+	attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
+	attr.log_level = 4;
+	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+	bpf_object__close(obj);
+	if (err)
+		error_cnt++;
+	return err;
+}
+
+void test_bpf_verif_scale(void)
+{
+	const char *file1 = "./test_verif_scale1.o";
+	const char *file2 = "./test_verif_scale2.o";
+	const char *file3 = "./test_verif_scale3.o";
+	int err;
+
+	if (verifier_stats)
+		libbpf_set_print(libbpf_debug_print);
+
+	err = check_load(file1);
+	err |= check_load(file2);
+	err |= check_load(file3);
+	if (!err)
+		printf("test_verif_scale:OK\n");
+	else
+		printf("test_verif_scale:FAIL\n");
+}
diff --git a/tools/testing/selftests/bpf/progs/test_jhash.h b/tools/testing/selftests/bpf/progs/test_jhash.h
new file mode 100644
index 000000000000..3d12c11a8d47
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_jhash.h
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+typedef unsigned int u32;
+
+static __attribute__((always_inline)) u32 rol32(u32 word, unsigned int shift)
+{
+	return (word << shift) | (word >> ((-shift) & 31));
+}
+
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
+
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
+}
+
+#define JHASH_INITVAL		0xdeadbeef
+
+static ATTR
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c;
+	const unsigned char *k = key;
+
+	a = b = c = JHASH_INITVAL + length + initval;
+
+	while (length > 12) {
+		a += *(volatile u32 *)(k);
+		b += *(volatile u32 *)(k + 4);
+		c += *(volatile u32 *)(k + 8);
+		__jhash_mix(a, b, c);
+		length -= 12;
+		k += 12;
+	}
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 c ^= a;
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale1.c b/tools/testing/selftests/bpf/progs/test_verif_scale1.c
new file mode 100644
index 000000000000..f3236ce35f31
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_verif_scale1.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((noinline))
+#include "test_jhash.h"
+
+SEC("scale90_noinline")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	void *ptr;
+	int ret = 0, nh_off, i = 0;
+
+	nh_off = 14;
+
+	/* pragma unroll doesn't work on large loops */
+
+#define C do { \
+	ptr = data + i; \
+	if (ptr + nh_off > data_end) \
+		break; \
+	ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+	} while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+	C30;C30;C30; /* 90 calls */
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale2.c b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
new file mode 100644
index 000000000000..77830693eccb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((always_inline))
+#include "test_jhash.h"
+
+SEC("scale90_inline")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	void *ptr;
+	int ret = 0, nh_off, i = 0;
+
+	nh_off = 14;
+
+	/* pragma unroll doesn't work on large loops */
+
+#define C do { \
+	ptr = data + i; \
+	if (ptr + nh_off > data_end) \
+		break; \
+	ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+	} while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+	C30;C30;C30; /* 90 calls */
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale3.c b/tools/testing/selftests/bpf/progs/test_verif_scale3.c
new file mode 100644
index 000000000000..1848da04ea41
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_verif_scale3.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#define ATTR __attribute__((noinline))
+#include "test_jhash.h"
+
+SEC("scale90_noinline32")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	void *ptr;
+	int ret = 0, nh_off, i = 0;
+
+	nh_off = 32;
+
+	/* pragma unroll doesn't work on large loops */
+
+#define C do { \
+	ptr = data + i; \
+	if (ptr + nh_off > data_end) \
+		break; \
+	ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+	} while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+	C30;C30;C30; /* 90 calls */
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 5d10aee9e277..bf5c90998916 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -9,6 +9,7 @@
 
 int error_cnt, pass_cnt;
 bool jit_enabled;
+bool verifier_stats = false;
 
 struct ipv4_packet pkt_v4 = {
 	.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
@@ -162,12 +163,15 @@ void *spin_lock_thread(void *arg)
 #include <prog_tests/tests.h>
 #undef DECLARE
 
-int main(void)
+int main(int ac, char **av)
 {
 	srand(time(NULL));
 
 	jit_enabled = is_jit_enabled();
 
+	if (ac == 2 && strcmp(av[1], "-s") == 0)
+		verifier_stats = true;
+
 #define CALL
 #include <prog_tests/tests.h>
 #undef CALL
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 51a07367cd43..f095e1d4c657 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -40,6 +40,7 @@ typedef __u16 __sum16;
 
 extern int error_cnt, pass_cnt;
 extern bool jit_enabled;
+extern bool verifier_stats;
 
 #define MAGIC_BYTES 123
 
-- 
2.20.0



* Re: [PATCH bpf-next 2/7] bpf: improve verification speed by droping states
  2019-03-30  0:16 ` [PATCH bpf-next 2/7] bpf: improve verification speed by droping states Alexei Starovoitov
@ 2019-03-30  3:12   ` Jakub Kicinski
  2019-03-30  3:29     ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2019-03-30  3:12 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, jannh, netdev, bpf, kernel-team

On Fri, 29 Mar 2019 17:16:07 -0700, Alexei Starovoitov wrote:
> Branch instructions, branch targets and calls in a bpf program are
> the places where the verifier remembers states that led to successful
> verification of the program.
> These states are used to prune brute force program analysis.
> For unprivileged programs there is a limit of 64 states per such
> 'branching' instructions (maximum length is tracked by max_states_per_insn
> counter introduced in the previous patch).
> Simply reducing this threshold to 32 or lower increases insn_processed
> metric to the point that small valid programs get rejected.
> For root programs there is no limit and cilium programs can have
> max_states_per_insn to be 100 or higher.
> Walking 100+ states multiplied by number of 'branching' insns during
> verification consumes significant amount of cpu time.
> Turned out simple LRU-like mechanism can be used to remove states
> that unlikely will be helpful in future search pruning.
> This patch introduces hit_cnt and miss_cnt counters:
> hit_cnt - this many times this state successfully pruned the search
> miss_cnt - this many times this state was not equivalent to other states
> (and that other states were added to state list)
> 
> The heuristic introduced in this patch is:
> if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
>   /* drop this state from future considerations */
> 
> Higher numbers increase max_states_per_insn (allow more states to be
> considered for pruning) and slow verification speed, but do not meaningfully
> reduce insn_processed metric.
> Lower numbers drop too many states and insn_processed increases too much.
> Many different formulas were considered.
> This one is simple and works well enough in practice.
> (the analysis was done on selftests/progs/* and on cilium programs)
> 
> The end result is this heuristic improves verification speed by 10 times.
> Large synthetic programs that used to take a second more now take
> 1/10 of a second.
> In cases where max_states_per_insn used to be 100 or more, now it's ~10.
> 
> There is a slight increase in insn_processed for cilium progs:
>                        before   after
> bpf_lb-DLB_L3.o 	1831	1838
> bpf_lb-DLB_L4.o 	3029	3218
> bpf_lb-DUNKNOWN.o 	1064	1064
> bpf_lxc-DDROP_ALL.o	26309	26935
> bpf_lxc-DUNKNOWN.o	33517	34439
> bpf_netdev.o		9713	9721
> bpf_overlay.o		6184	6184
> bpf_lcx_jit.o		37335	39389
> And 2-3 times improvement in the verification speed.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

> @@ -6182,8 +6185,35 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>  				return err;
>  			return 1;
>  		}
> -		sl = sl->next;
>  		states_cnt++;
> +		sl->miss_cnt++;
> +		/* heuristic to determine whether this state is beneficial
> +		 * to keep checking from state equivalence point of view.
> +		 * Higher numbers increase max_states_per_insn and verification time,
> +		 * but do not meaningfully decrease insn_processed.
> +		 */
> +		if (sl->miss_cnt > sl->hit_cnt * 3 + 3) {
> +			/* the state is unlikely to be useful. Remove it to
> +			 * speed up verification
> +			 */
> +			*pprev = sl->next;
> +			if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) {
> +				free_verifier_state(&sl->state, false);
> +				kfree(sl);
> +				env->peak_states--;

nit: is peak_states always equal to number of states when verifier
     exits?

> +			} else {
> +				/* cannot free this state, since parentage chain may
> +				 * walk it later. Add it for free_list instead to
> +				 * be freed at the end of verification
> +				 */
> +				sl->next = env->free_list;
> +				env->free_list = sl;
> +			}
> +			sl = *pprev;
> +			continue;
> +		}
> +		pprev = &sl->next;
> +		sl = *pprev;
>  	}
>  
>  	if (env->max_states_per_insn < states_cnt)


* Re: [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read
  2019-03-30  0:16 ` [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read Alexei Starovoitov
@ 2019-03-30  3:12   ` Jakub Kicinski
  2019-04-01 15:38   ` Edward Cree
  1 sibling, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2019-03-30  3:12 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, jannh, netdev, bpf, kernel-team

On Fri, 29 Mar 2019 17:16:08 -0700, Alexei Starovoitov wrote:
> With large verifier speed improvement brought by the previous patch
> mark_reg_read() becomes the hottest function during verification.
> On a typical program it consumes 40% of cpu.
> mark_reg_read() walks parentage chain of registers to mark parents as LIVE_READ.
> Once the register is marked there is no need to remark it again in the future.
> Hence stop walking the chain once first LIVE_READ is seen.
> This optimization drops mark_reg_read() time from 40% of cpu to <1%
> and overall 2x improvement of verification speed.
> For some programs the longest_mark_read_walk counter improves from ~500 to ~5
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: [PATCH bpf-next 5/7] bpf: increase verifier log limit
  2019-03-30  0:16 ` [PATCH bpf-next 5/7] bpf: increase verifier log limit Alexei Starovoitov
@ 2019-03-30  3:16   ` Jakub Kicinski
  0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2019-03-30  3:16 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, jannh, netdev, bpf, kernel-team

On Fri, 29 Mar 2019 17:16:10 -0700, Alexei Starovoitov wrote:
> The existing 16Mbyte verifier log limit is not enough for log_level=2
> even for small programs. Increase it to 1Gbyte.
> Note it's not a kernel memory limit.
> It's an amount of memory user space provides to store
> the verifier log. The kernel populates it 1k at a time.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: [PATCH bpf-next 0/7] bpf: improve verifier scalability
  2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
                   ` (6 preceding siblings ...)
  2019-03-30  0:16 ` [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests Alexei Starovoitov
@ 2019-03-30  3:18 ` Jakub Kicinski
  2019-03-30  3:35   ` Alexei Starovoitov
  7 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2019-03-30  3:18 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, jannh, netdev, bpf, kernel-team

On Fri, 29 Mar 2019 17:16:05 -0700, Alexei Starovoitov wrote:
> Realize two key ideas to speed up verification speed by ~20 times
> 1. every 'branching' instructions records all verifier states.
>    not all of them are useful for search pruning.
>    add a simple heuristic to keep states that were successful in search pruning
>    and remove those that were not
> 2. mark_reg_read walks parentage chain of registers to mark parents as LIVE_READ.
>    Once the register is marked there is no need to remark it again in the future.
>    Hence stop walking the chain once first LIVE_READ is seen.
> 
> 1st optimization gives 10x speed up on large programs
> and 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%.
> Combined the deliver ~20x speedup on large programs.
> 
> Faster and bounded verification time allows to increase insn_processed
> limit to 1 million from 130k.
> 
> Worst case it takes 1/10 of a second to process that many instructions
> and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state)
> which is around ~5Mbyte.
> 
> Increase insn_per_program limit for root to insn_processed limit.
> 
> Add verification stats and stress tests for verifier scalability.
> 
> This patch set is the first step to be able to accept large programs.
> The verifier still suffers from its brute force algorithm and
> large programs can easily hit 1M insn_processed limit.
> A lot more work is necessary to be able to verify large programs.

Very nice!

Hopefully this doesn't discourage people from working on loops ;)


* Re: [PATCH bpf-next 2/7] bpf: improve verification speed by droping states
  2019-03-30  3:12   ` Jakub Kicinski
@ 2019-03-30  3:29     ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  3:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, davem, daniel, jannh, netdev, bpf,
	kernel-team

On Fri, Mar 29, 2019 at 08:12:39PM -0700, Jakub Kicinski wrote:
> On Fri, 29 Mar 2019 17:16:07 -0700, Alexei Starovoitov wrote:
> > Branch instructions, branch targets and calls in a bpf program are
> > the places where the verifier remembers states that led to successful
> > verification of the program.
> > These states are used to prune brute force program analysis.
> > For unprivileged programs there is a limit of 64 states per such
> > 'branching' instructions (maximum length is tracked by max_states_per_insn
> > counter introduced in the previous patch).
> > Simply reducing this threshold to 32 or lower increases insn_processed
> > metric to the point that small valid programs get rejected.
> > For root programs there is no limit and cilium programs can have
> > max_states_per_insn to be 100 or higher.
> > Walking 100+ states multiplied by number of 'branching' insns during
> > verification consumes significant amount of cpu time.
> > Turned out simple LRU-like mechanism can be used to remove states
> > that unlikely will be helpful in future search pruning.
> > This patch introduces hit_cnt and miss_cnt counters:
> > hit_cnt - this many times this state successfully pruned the search
> > miss_cnt - this many times this state was not equivalent to other states
> > (and that other states were added to state list)
> > 
> > The heuristic introduced in this patch is:
> > if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
> >   /* drop this state from future considerations */
> > 
> > Higher numbers increase max_states_per_insn (allow more states to be
> > considered for pruning) and slow verification speed, but do not meaningfully
> > reduce insn_processed metric.
> > Lower numbers drop too many states and insn_processed increases too much.
> > Many different formulas were considered.
> > This one is simple and works well enough in practice.
> > (the analysis was done on selftests/progs/* and on cilium programs)
> > 
> > The end result is this heuristic improves verification speed by 10 times.
> > Large synthetic programs that used to take a second more now take
> > 1/10 of a second.
> > In cases where max_states_per_insn used to be 100 or more, now it's ~10.
> > 
> > There is a slight increase in insn_processed for cilium progs:
> >                        before   after
> > bpf_lb-DLB_L3.o 	1831	1838
> > bpf_lb-DLB_L4.o 	3029	3218
> > bpf_lb-DUNKNOWN.o 	1064	1064
> > bpf_lxc-DDROP_ALL.o	26309	26935
> > bpf_lxc-DUNKNOWN.o	33517	34439
> > bpf_netdev.o		9713	9721
> > bpf_overlay.o		6184	6184
> > bpf_lcx_jit.o		37335	39389
> > And 2-3 times improvement in the verification speed.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> 
> > @@ -6182,8 +6185,35 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >  				return err;
> >  			return 1;
> >  		}
> > -		sl = sl->next;
> >  		states_cnt++;
> > +		sl->miss_cnt++;
> > +		/* heuristic to determine whether this state is beneficial
> > +		 * to keep checking from state equivalence point of view.
> > +		 * Higher numbers increase max_states_per_insn and verification time,
> > +		 * but do not meaningfully decrease insn_processed.
> > +		 */
> > +		if (sl->miss_cnt > sl->hit_cnt * 3 + 3) {
> > +			/* the state is unlikely to be useful. Remove it to
> > +			 * speed up verification
> > +			 */
> > +			*pprev = sl->next;
> > +			if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) {
> > +				free_verifier_state(&sl->state, false);
> > +				kfree(sl);
> > +				env->peak_states--;
> 
> nit: is peak_states always equal to number of states when verifier
>      exits?

yes.
I was thinking, as a follow-up, of adding peak_states-- to the bpf_verifier_env
freeing logic to check that there are no memory leaks.

A few other follow-ups on my todo list:
. write a ton more tests for scalability
. remove few leftover global variables and drop mutex for root
. account all verifier memory in memcg
. maybe introduce a limit for peak_states for unpriv
. continue exploring state merging ideas
. introduce explicit rcu_unlock + preempt_enable in the programs



* Re: [PATCH bpf-next 0/7] bpf: improve verifier scalability
  2019-03-30  3:18 ` [PATCH bpf-next 0/7] bpf: improve verifier scalability Jakub Kicinski
@ 2019-03-30  3:35   ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-30  3:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, davem, daniel, jannh, netdev, bpf,
	kernel-team

On Fri, Mar 29, 2019 at 08:18:28PM -0700, Jakub Kicinski wrote:
> On Fri, 29 Mar 2019 17:16:05 -0700, Alexei Starovoitov wrote:
> > Realize two key ideas to speed up verification speed by ~20 times
> > 1. every 'branching' instructions records all verifier states.
> >    not all of them are useful for search pruning.
> >    add a simple heuristic to keep states that were successful in search pruning
> >    and remove those that were not
> > 2. mark_reg_read walks parentage chain of registers to mark parents as LIVE_READ.
> >    Once the register is marked there is no need to remark it again in the future.
> >    Hence stop walking the chain once first LIVE_READ is seen.
> > 
> > 1st optimization gives 10x speed up on large programs
> > and 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%.
> > Combined the deliver ~20x speedup on large programs.
> > 
> > Faster and bounded verification time allows to increase insn_processed
> > limit to 1 million from 130k.
> > 
> > Worst case it takes 1/10 of a second to process that many instructions
> > and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state)
> > which is around ~5Mbyte.
> > 
> > Increase insn_per_program limit for root to insn_processed limit.
> > 
> > Add verification stats and stress tests for verifier scalability.
> > 
> > This patch set is the first step to be able to accept large programs.
> > The verifier still suffers from its brute force algorithm and
> > large programs can easily hit 1M insn_processed limit.
> > A lot more work is necessary to be able to verify large programs.
> 
> Very nice!
> 
> Hopefully this doesn't discourage people from working on loops ;)

Definitely not :) we desperately need loops.
llvm honors '#pragma unroll' only for relatively small loop counts.
Walking stack traces is still not possible.
In the test from patch 7, doing jhash() over 64 bytes will not work,
because llvm will generate a loop, ignoring the pragma unroll.
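For example, a helper shaped like this (an illustrative sketch, not the
actual selftest code; u8/u32 are assumed to be typedef'ed as in the selftest
headers) is only fully unrolled by llvm for small byte counts:

	static __attribute__((noinline))
	u32 hash_bytes(const u8 *p, const u8 *data_end, u32 seed)
	{
		u32 h = seed;
		int i;

	#pragma unroll
		for (i = 0; i < 64; i++) {
			if (p + i + 1 > data_end)
				break;
			h = h * 31 + p[i];
		}
		return h;
	}

With a bound of 14 or 32 bytes llvm unrolls the loop completely; at 64 bytes
it tends to keep a real back-edge, which today's verifier rejects.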


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-03-30  0:16 ` [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests Alexei Starovoitov
@ 2019-04-01 14:17   ` Daniel Borkmann
  2019-04-01 16:35     ` Daniel Borkmann
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2019-04-01 14:17 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: jakub.kicinski, jannh, netdev, bpf, kernel-team

On 03/30/2019 01:16 AM, Alexei Starovoitov wrote:
> Add 3 basic tests that stress verifier scalability.
> 
> test_verif_scale1.c calls non-inlined jhash() function 90 times on
> different position in the packet.
> This test simulates network packet parsing.
> jhash function is ~140 instructions and main program is ~1200 insns.
> 
> test_verif_scale2.c force inlines jhash() function 90 times.
> This program is ~15k instructions long.
> 
> test_verif_scale3.c also calls the non-inlined jhash() function 90 times
> on different positions in the packet, but this time jhash has to process
> 32 bytes from the packet instead of the 14 bytes in tests 1 and 2.
> jhash function is ~230 insns and main program is ~1200 insns.
> 
> $ test_progs -s
> can be used to see verifier stats.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Do you also have some test cases with actual 1M insns, or ones that hit
the new complexity limit, to check the timing / resources it consumes? I think
these would be good to have as part of the selftests as well.
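(For illustration, something as simple as the rough sketch below would do;
this is not actual selftest code, it only assumes uapi struct bpf_insn and
libbpf's bpf_load_program(), and needs root plus a kernel with the new limit:)

	#include <linux/bpf.h>
	#include <bpf/bpf.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	#define NR_INSNS (1000 * 1000)

	int main(void)
	{
		struct bpf_insn *insns = calloc(NR_INSNS, sizeof(*insns));
		struct timespec t0, t1;
		long ms;
		int fd, i;

		if (!insns)
			return 1;
		/* straight-line program: r0 = 2; r0 = 2; ... exit */
		for (i = 0; i < NR_INSNS - 1; i++)
			insns[i] = (struct bpf_insn){
				.code = BPF_ALU64 | BPF_MOV | BPF_K,
				.dst_reg = BPF_REG_0,
				.imm = 2,
			};
		insns[NR_INSNS - 1] = (struct bpf_insn){
			.code = BPF_JMP | BPF_EXIT,
		};

		clock_gettime(CLOCK_MONOTONIC, &t0);
		fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, insns,
				      NR_INSNS, "GPL", 0, NULL, 0);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		ms = (t1.tv_sec - t0.tv_sec) * 1000 +
		     (t1.tv_nsec - t0.tv_nsec) / 1000000;
		printf("load %s in %ld ms\n", fd >= 0 ? "ok" : "failed", ms);
		if (fd >= 0)
			close(fd);
		free(insns);
		return 0;
	}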

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read
  2019-03-30  0:16 ` [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read Alexei Starovoitov
  2019-03-30  3:12   ` Jakub Kicinski
@ 2019-04-01 15:38   ` Edward Cree
  1 sibling, 0 replies; 20+ messages in thread
From: Edward Cree @ 2019-04-01 15:38 UTC (permalink / raw)
  To: Alexei Starovoitov, davem
  Cc: daniel, jakub.kicinski, jannh, netdev, bpf, kernel-team

On 30/03/2019 00:16, Alexei Starovoitov wrote:
> With the large verifier speed improvement brought by the previous patch,
> mark_reg_read() becomes the hottest function during verification.
> On a typical program it consumes 40% of cpu.
> mark_reg_read() walks parentage chain of registers to mark parents as LIVE_READ.
> Once the register is marked there is no need to remark it again in the future.
> Hence stop walking the chain once first LIVE_READ is seen.
> This optimization drops mark_reg_read() time from 40% of cpu to <1%
> and gives an overall 2x improvement in verification speed.
> For some programs the longest_mark_read_walk counter improves from ~500 to ~5.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Edward Cree <ecree@solarflare.com>
> ---
>  kernel/bpf/verifier.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index b18512ac205e..6dfd148b58f6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1151,6 +1151,15 @@ static int mark_reg_read(struct bpf_verifier_env *env,
>  				parent->var_off.value, parent->off);
>  			return -EFAULT;
>  		}
> +		if (parent->live & REG_LIVE_READ)
> +			/* The parentage chain never changes and
> +			 * this parent was already marked as LIVE_READ.
> +			 * There is no need to keep walking the chain again and
> +			 * keep re-marking all parents as LIVE_READ.
> +			 * This case happens when the same register is read
> +			 * multiple times without writes into it in-between.
> +			 */
> +			break;
>  		/* ... then we depend on parent's value */
>  		parent->live |= REG_LIVE_READ;
>  		state = parent;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-04-01 14:17   ` Daniel Borkmann
@ 2019-04-01 16:35     ` Daniel Borkmann
  2019-04-02  2:08       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2019-04-01 16:35 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: jakub.kicinski, jannh, netdev, bpf, kernel-team

On 04/01/2019 04:17 PM, Daniel Borkmann wrote:
> On 03/30/2019 01:16 AM, Alexei Starovoitov wrote:
>> Add 3 basic tests that stress verifier scalability.
>>
>> test_verif_scale1.c calls non-inlined jhash() function 90 times on
>> different position in the packet.
>> This test simulates network packet parsing.
>> jhash function is ~140 instructions and main program is ~1200 insns.
>>
>> test_verif_scale2.c force inlines jhash() function 90 times.
>> This program is ~15k instructions long.
>>
>> test_verif_scale3.c calls non-inlined jhash() function 90 times on
>> But this time jhash has to process 32-bytes from the packet
>> instead of 14-bytes in tests 1 and 2.
>> jhash function is ~230 insns and main program is ~1200 insns.
>>
>> $ test_progs -s
>> can be used to see verifier stats.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> Do you also have some test cases with actual 1M insns resp. such that hit
> the new complexity limit to check timing / resources it consumes? I think
> these would be good to have as well as part of selftests.

(Another reason why it would be good to have such tests is to see whether
we would otherwise break long jumps again due to the offset truncation we
had in the core as well as in the JITs in the past.)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-04-01 16:35     ` Daniel Borkmann
@ 2019-04-02  2:08       ` Alexei Starovoitov
  2019-04-02 14:13         ` Daniel Borkmann
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-04-02  2:08 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, davem@davemloft.net
  Cc: jakub.kicinski@netronome.com, jannh@google.com,
	netdev@vger.kernel.org, bpf@vger.kernel.org, Kernel Team

On 4/1/19 9:35 AM, Daniel Borkmann wrote:
> On 04/01/2019 04:17 PM, Daniel Borkmann wrote:
>> On 03/30/2019 01:16 AM, Alexei Starovoitov wrote:
>>> Add 3 basic tests that stress verifier scalability.
>>>
>>> test_verif_scale1.c calls non-inlined jhash() function 90 times on
>>> different position in the packet.
>>> This test simulates network packet parsing.
>>> jhash function is ~140 instructions and main program is ~1200 insns.
>>>
>>> test_verif_scale2.c force inlines jhash() function 90 times.
>>> This program is ~15k instructions long.
>>>
>>> test_verif_scale3.c calls non-inlined jhash() function 90 times on
>>> But this time jhash has to process 32-bytes from the packet
>>> instead of 14-bytes in tests 1 and 2.
>>> jhash function is ~230 insns and main program is ~1200 insns.
>>>
>>> $ test_progs -s
>>> can be used to see verifier stats.
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>
>> Do you also have some test cases with actual 1M insns resp. such that hit
>> the new complexity limit to check timing / resources it consumes? I think
>> these would be good to have as well as part of selftests.
> 
> (Another thought on why it would be good to have such tests would be to
> see if we would otherwise break long jumps again due to offset truncation
> which we had in the core as well as in JITs in the past.)

llvm truncates jump offsets for large programs.
Unfortunately we need to introduce a JA instruction
with a 32-bit offset on the kernel side before we can fix llvm.
There are plenty of other scalability issues on the kernel side.
This patch set is the first step.
Only a non-jumpy program of 1M insns loads and works.
Realistically we're still far away from accepting large programs,
but to get there we need to bump the limit and experience these issues.
I noticed a typo in patch 1, so I will respin and will add a test.
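For context, the truncation comes from the instruction encoding itself:
the jump offset in struct bpf_insn (include/uapi/linux/bpf.h) is a signed
16-bit field, so a single jump can only reach roughly +/-32k insns. A
32-bit-offset JA would presumably have to carry its target in the 'imm'
field instead (my assumption, not something this series adds):

	struct bpf_insn {
		__u8	code;		/* opcode */
		__u8	dst_reg:4;	/* dest register */
		__u8	src_reg:4;	/* source register */
		__s16	off;		/* signed offset: the +/-32k limit */
		__s32	imm;		/* signed immediate constant */
	};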

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-04-02  2:08       ` Alexei Starovoitov
@ 2019-04-02 14:13         ` Daniel Borkmann
  2019-04-02 18:12           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2019-04-02 14:13 UTC (permalink / raw)
  To: Alexei Starovoitov, Alexei Starovoitov, davem@davemloft.net
  Cc: jakub.kicinski@netronome.com, jannh@google.com,
	netdev@vger.kernel.org, bpf@vger.kernel.org, Kernel Team

On 04/02/2019 04:08 AM, Alexei Starovoitov wrote:
> On 4/1/19 9:35 AM, Daniel Borkmann wrote:
>> On 04/01/2019 04:17 PM, Daniel Borkmann wrote:
>>> On 03/30/2019 01:16 AM, Alexei Starovoitov wrote:
>>>> Add 3 basic tests that stress verifier scalability.
>>>>
>>>> test_verif_scale1.c calls non-inlined jhash() function 90 times on
>>>> different position in the packet.
>>>> This test simulates network packet parsing.
>>>> jhash function is ~140 instructions and main program is ~1200 insns.
>>>>
>>>> test_verif_scale2.c force inlines jhash() function 90 times.
>>>> This program is ~15k instructions long.
>>>>
>>>> test_verif_scale3.c calls non-inlined jhash() function 90 times on
>>>> But this time jhash has to process 32-bytes from the packet
>>>> instead of 14-bytes in tests 1 and 2.
>>>> jhash function is ~230 insns and main program is ~1200 insns.
>>>>
>>>> $ test_progs -s
>>>> can be used to see verifier stats.
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>>
>>> Do you also have some test cases with actual 1M insns resp. such that hit
>>> the new complexity limit to check timing / resources it consumes? I think
>>> these would be good to have as well as part of selftests.
>>
>> (Another thought on why it would be good to have such tests would be to
>> see if we would otherwise break long jumps again due to offset truncation
>> which we had in the core as well as in JITs in the past.)
> 
> llvm truncates jump offsets for large programs.
> Unfortunately we need to introduce JA instruction
> with 32-bit offset on kernel side before we can fix llvm.
> There are plenty of other scalability issues on the kernel side.
> This patch set is the first step.

Right.

> Only non-jumpy program of 1m insn loads and works.
> Realistically we're still far away from accepting large programs,
> but to get there we need to bump the limit and experience these issues.

I think my main worry is that if we're realistically still far away from
accepting 1M insn programs and have potential unknown blockers in the
road wrt verifying safety, then why lift the limit for root to 1M
already today?

I think the complexity limit itself is fine, but why not be more
conservative on the number of instructions, capping it to something we
believe the verifier can realistically handle /today/ wrt worst-case
consumption under complexity attacks, and go further from there? Once it's
at 1M in UAPI it's effectively frozen, and then we'll be re-adding somewhat
similar upper limits again which cap the 1M to something much lower (though
I doubt we'll see such large programs any time soon).

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests
  2019-04-02 14:13         ` Daniel Borkmann
@ 2019-04-02 18:12           ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-04-02 18:12 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Alexei Starovoitov, davem@davemloft.net,
	jakub.kicinski@netronome.com, jannh@google.com,
	netdev@vger.kernel.org, bpf@vger.kernel.org, Kernel Team

On Tue, Apr 02, 2019 at 04:13:58PM +0200, Daniel Borkmann wrote:
> On 04/02/2019 04:08 AM, Alexei Starovoitov wrote:
> > On 4/1/19 9:35 AM, Daniel Borkmann wrote:
> >> On 04/01/2019 04:17 PM, Daniel Borkmann wrote:
> >>> On 03/30/2019 01:16 AM, Alexei Starovoitov wrote:
> >>>> Add 3 basic tests that stress verifier scalability.
> >>>>
> >>>> test_verif_scale1.c calls non-inlined jhash() function 90 times on
> >>>> different position in the packet.
> >>>> This test simulates network packet parsing.
> >>>> jhash function is ~140 instructions and main program is ~1200 insns.
> >>>>
> >>>> test_verif_scale2.c force inlines jhash() function 90 times.
> >>>> This program is ~15k instructions long.
> >>>>
> >>>> test_verif_scale3.c calls non-inlined jhash() function 90 times on
> >>>> But this time jhash has to process 32-bytes from the packet
> >>>> instead of 14-bytes in tests 1 and 2.
> >>>> jhash function is ~230 insns and main program is ~1200 insns.
> >>>>
> >>>> $ test_progs -s
> >>>> can be used to see verifier stats.
> >>>>
> >>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >>>
> >>> Do you also have some test cases with actual 1M insns resp. such that hit
> >>> the new complexity limit to check timing / resources it consumes? I think
> >>> these would be good to have as well as part of selftests.
> >>
> >> (Another thought on why it would be good to have such tests would be to
> >> see if we would otherwise break long jumps again due to offset truncation
> >> which we had in the core as well as in JITs in the past.)
> > 
> > llvm truncates jump offsets for large programs.
> > Unfortunately we need to introduce JA instruction
> > with 32-bit offset on kernel side before we can fix llvm.
> > There are plenty of other scalability issues on the kernel side.
> > This patch set is the first step.
> 
> Right.
> 
> > Only non-jumpy program of 1m insn loads and works.
> > Realistically we're still far away from accepting large programs,
> > but to get there we need to bump the limit and experience these issues.
> 
> I think my main worry is that if we're realistically still far away from
> accepting 1M insns programs and have potential unknown blockers in the
> road wrt verifying safety, then why lifting the limit for root to 1M
> today already?
> 
> I think the complexity limit itself is fine, but why not being more
> conservative on the number of instructions to something we believe
> verifier can realistically handle /today/ wrt worst case consumptions in
> case of complexity attacks, and go further from there? Once it's on 1M in
> UAPI it's effectively frozen and then we'll be re-adding slightly similar
> upper limits again which caps the 1M to something much lower (though I
> doubt we'll see such large programs any time soon).

1M is not an uapi limit.
4k sort-of was an uapi limit, but in practice it wasn't.
Could we load all programs that fit into 4k? Obviously not.
There are plenty of internal verifier limits.
Like BPF_COMPLEXITY_LIMIT_STACK. They are discoverable by user space,
but they are not uapi, meaning that we can change them in either direction.
BPF_COMPLEXITY_LIMIT_STATES is 64, but I'm thinking of decreasing it to 8.

Here is the commit log from patch 6:
"This patch removes the limit for root programs from uapi.
BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit
and whether a program loads successfully no longer depends on program size,
but only on the 'smartness' of the verifier.
The verifier will continue to get smarter with every kernel release.
"
Removing the 4k limit for root doesn't change much from the user's perspective.
Every program has a different limit depending on the kernel version.
I gave an example in patch 6 where folks compile the same program
with different #defines because verifier smartness differs
between kernels.
4k was a stated limit, but irrelevant in practice.
Am I happy with this situation? Of course not.
I've been talking about the requirement to support 1M instructions
for a year now, but most bpf developers didn't take it seriously, and
I see a creep of features into the verifier that makes it harder and
harder to support large programs.
Most folks operate under the assumption that bpf programs are tiny,
so it's ok to use inefficient algorithms. This has to change.
We have to remove the limit and experience these issues otherwise
that day of verifier being 'ready' will never come.
With the 4k limit, syzbot found a way to make the verifier spend 10 seconds
verifying a program ~300 instructions long.
Will it find something again? No doubt. Whether the limit on program
size is there or not is not going to stop fuzzing tools.
The removal of the instruction limit is changing bpf developers' mindset.
It's changing _us_. That's the key difference.
Supporting large programs is a requirement, and we need to make sure
the internals of the verifier can scale.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-04-02 18:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-30  0:16 [PATCH bpf-next 0/7] bpf: improve verifier scalability Alexei Starovoitov
2019-03-30  0:16 ` [PATCH bpf-next 1/7] bpf: add verifier stats and log_level bit 2 Alexei Starovoitov
2019-03-30  0:16 ` [PATCH bpf-next 2/7] bpf: improve verification speed by droping states Alexei Starovoitov
2019-03-30  3:12   ` Jakub Kicinski
2019-03-30  3:29     ` Alexei Starovoitov
2019-03-30  0:16 ` [PATCH bpf-next 3/7] bpf: improve verification speed by not remarking live_read Alexei Starovoitov
2019-03-30  3:12   ` Jakub Kicinski
2019-04-01 15:38   ` Edward Cree
2019-03-30  0:16 ` [PATCH bpf-next 4/7] bpf: increase complexity limit and maximum program size Alexei Starovoitov
2019-03-30  0:16 ` [PATCH bpf-next 5/7] bpf: increase verifier log limit Alexei Starovoitov
2019-03-30  3:16   ` Jakub Kicinski
2019-03-30  0:16 ` [PATCH bpf-next 6/7] libbpf: teach libbpf about log_level bit 2 Alexei Starovoitov
2019-03-30  0:16 ` [PATCH bpf-next 7/7] selftests/bpf: add few verifier scale tests Alexei Starovoitov
2019-04-01 14:17   ` Daniel Borkmann
2019-04-01 16:35     ` Daniel Borkmann
2019-04-02  2:08       ` Alexei Starovoitov
2019-04-02 14:13         ` Daniel Borkmann
2019-04-02 18:12           ` Alexei Starovoitov
2019-03-30  3:18 ` [PATCH bpf-next 0/7] bpf: improve verifier scalability Jakub Kicinski
2019-03-30  3:35   ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).