From: Eduard Zingerman <eddyz87@gmail.com>
To: Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
martin.lau@kernel.org
Cc: kernel-team@meta.com
Subject: Re: [PATCH bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default
Date: Fri, 17 Mar 2023 17:05:45 +0200 [thread overview]
Message-ID: <dacb74f46aef078e101d631ce95f03fddf17e284.camel@gmail.com> (raw)
In-Reply-To: <20230316183013.2882810-4-andrii@kernel.org>
On Thu, 2023-03-16 at 11:30 -0700, Andrii Nakryiko wrote:
> Currently, if the user-supplied log buffer to collect BPF verifier log turns
> out to be too small to contain the full log, the bpf() syscall returns -ENOSPC,
> fails BPF program verification/load, but preserves the first N-1 bytes of
> the verifier log (where N is the size of the user-supplied buffer).
>
> This is problematic in a bunch of common scenarios, especially when
> working with real-world BPF programs that tend to be pretty complex as
> far as verification goes and require big log buffers. Typically, it's
> when debugging tricky cases at log level 2 (verbose). Also, when BPF program
> is successfully validated, log level 2 is the only way to actually see
> verifier state progression and all the important details.
>
> Even with log level 1, it's possible to get -ENOSPC even if the final
> verifier log fits in the log buffer, if there is a code path that's deep
> enough to fill up the entire log, even if normally it would be reset later
> on (there is logic to chop off successfully validated portions of the BPF
> verifier log).
>
> In short, it's not always possible to pre-size log buffer. Also, in
> practice, the end of the log most often is way more important than the
> beginning.
>
> This patch switches BPF verifier log behavior to effectively behave as
> rotating log. That is, if user-supplied log buffer turns out to be too
> short, instead of failing with -ENOSPC, verifier will keep overwriting
> previously written log, effectively treating user's log buffer as a ring
> buffer.
>
> Importantly, though, to preserve good user experience and not require
> every user-space application to adapt to this new behavior, before
> exiting to user-space the verifier will rotate the log (in place) to make it
> start at the very beginning of the user buffer as a continuous
> zero-terminated string. The contents will be the last N-1 bytes of the
> full verifier log, of course.
>
> Given beginning of log is sometimes important as well, we add
> BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows
> tools like veristat to request first part of verifier log, if necessary.
>
> On the implementation side, conceptually, it's all simple. We maintain
> 64-bit logical start and end positions. If we need to truncate the log,
> start position will be adjusted accordingly to lag end position by
> N bytes. We then use those logical positions to calculate their matching
> actual positions in user buffer and handle wrap around the end of the
> buffer properly. Finally, right before returning from bpf_check(), we
> rotate user log buffer contents in-place as necessary, to make log
> contents contiguous. See comments in relevant functions for details.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
I exercised bpf_verifier_vlog() and bpf_vlog_finalize() with some
randomized testing under valgrind and everything seems to be fine.
Tbh, it seems to me that having kunit tests for things like this would
be a good idea.
Test script for reference:
/* User-space harness; assumes bpf_verifier_vlog() and bpf_vlog_finalize()
 * from kernel/bpf/log.c are built in user space with stubs for
 * copy_to_user()/put_user() and friends.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

void once(char *in, size_t out_sz)
{
	struct bpf_verifier_log log = {};
	char *out = calloc(out_sz, 1);
	size_t in_sz = strlen(in);
	size_t orig_out_sz = out_sz;

	log.ubuf = out;
	log.len_total = out_sz;
	log.level = 1 | 2 | 4;
	for (char *input = strtok(in, " "); input; input = strtok(NULL, " "))
		bpf_verifier_vlog(&log, input);
	bpf_vlog_finalize(&log);

	/* undo strtok()'s NUL-termination so we can print/compare the input */
	for (unsigned i = 0; i < in_sz; ++i)
		if (in[i] == 0)
			in[i] = ' ';

	/* compare tails of input and output, skipping separator spaces */
	out_sz = strlen(out);
	if (in_sz)
		--in_sz;
	if (out_sz)
		--out_sz;
	while (out_sz > 0 && in_sz > 0) {
		if (in[in_sz] == ' ') {
			--in_sz;
			continue;
		}
		if (in[in_sz] == out[out_sz]) {
			--in_sz;
			--out_sz;
			continue;
		}
		printf("    in: '%s'\n", in);
		printf("   out: '%s'\n", out);
		printf("err at: %zu\n", out_sz);
		printf("out_sz: %zu\n", orig_out_sz);
		break;
	}
	free(out);
}

void rnd_once(void)
{
	char in[1024] = {};
	size_t out_sz = 1 + rand() % 100;
	size_t in_sz = rand() % (sizeof(in) - 1);

	for (unsigned i = 0; i < in_sz; ++i)
		if (rand() % 100 < 7)
			in[i] = ' ';
		else
			in[i] = 'a' + (rand() % 26);
	once(in, out_sz);
}

int main(int argc, char **argv)
{
	if (argc == 3) {
		once(argv[2], atoi(argv[1]));
	} else {
		srand(clock());
		for (int i = 0; i < 100000; ++i)
			rnd_once();
	}
	return 0;
}
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
> include/linux/bpf_verifier.h | 32 ++-
> kernel/bpf/log.c | 182 +++++++++++++++++-
> kernel/bpf/verifier.c | 17 +-
> .../selftests/bpf/prog_tests/log_fixup.c | 1 +
> 4 files changed, 209 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 83dff25545ee..cff11c99ed9a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -491,25 +491,42 @@ struct bpf_insn_aux_data {
> #define BPF_VERIFIER_TMP_LOG_SIZE 1024
>
> struct bpf_verifier_log {
> - u32 level;
> - char kbuf[BPF_VERIFIER_TMP_LOG_SIZE];
> + /* Logical start and end positions of a "log window" of the verifier log.
> + * start_pos == 0 means we haven't truncated anything.
> + * Once truncation starts to happen, start_pos + len_total == end_pos,
> + * except during log reset situations, in which (end_pos - start_pos)
> + * might get smaller than len_total (see bpf_vlog_reset()).
> + * Generally, (end_pos - start_pos) gives number of useful data in
> + * user log buffer.
> + */
> + u64 start_pos;
> + u64 end_pos;
> char __user *ubuf;
> - u32 len_used;
> + u32 level;
> u32 len_total;
> + char kbuf[BPF_VERIFIER_TMP_LOG_SIZE];
> };
>
> #define BPF_LOG_LEVEL1 1
> #define BPF_LOG_LEVEL2 2
> #define BPF_LOG_STATS 4
> +#define BPF_LOG_FIXED 8
> #define BPF_LOG_LEVEL (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
> -#define BPF_LOG_MASK (BPF_LOG_LEVEL | BPF_LOG_STATS)
> +#define BPF_LOG_MASK (BPF_LOG_LEVEL | BPF_LOG_STATS | BPF_LOG_FIXED)
> #define BPF_LOG_KERNEL (BPF_LOG_MASK + 1) /* kernel internal flag */
> #define BPF_LOG_MIN_ALIGNMENT 8U
> #define BPF_LOG_ALIGNMENT 40U
>
> +static inline u32 bpf_log_used(const struct bpf_verifier_log *log)
> +{
> + return log->end_pos - log->start_pos;
> +}
> +
> static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
> {
> - return log->len_used >= log->len_total - 1;
> + if (log->level & BPF_LOG_FIXED)
> + return bpf_log_used(log) >= log->len_total - 1;
> + return false;
> }
>
> static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
> @@ -596,7 +613,7 @@ struct bpf_verifier_env {
> u32 scratched_regs;
> /* Same as scratched_regs but for stack slots */
> u64 scratched_stack_slots;
> - u32 prev_log_len, prev_insn_print_len;
> + u64 prev_log_pos, prev_insn_print_pos;
> /* buffer used in reg_type_str() to generate reg_type string */
> char type_str_buf[TYPE_STR_BUF_LEN];
> };
> @@ -608,7 +625,8 @@ __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
> const char *fmt, ...);
> __printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
> const char *fmt, ...);
> -void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos);
> +void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos);
> +void bpf_vlog_finalize(struct bpf_verifier_log *log);
>
> static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
> {
> diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
> index 1974891fc324..ec640828e84e 100644
> --- a/kernel/bpf/log.c
> +++ b/kernel/bpf/log.c
> @@ -32,26 +32,192 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
> return;
> }
>
> - n = min(log->len_total - log->len_used - 1, n);
> - log->kbuf[n] = '\0';
> - if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
> - log->len_used += n;
> - else
> - log->ubuf = NULL;
> + if (log->level & BPF_LOG_FIXED) {
> + n = min(log->len_total - bpf_log_used(log) - 1, n);
> + log->kbuf[n] = '\0';
> + n += 1;
> +
> + if (copy_to_user(log->ubuf + log->end_pos, log->kbuf, n))
> + goto fail;
> +
> + log->end_pos += n - 1; /* don't count terminating '\0' */
> + } else {
> + u64 new_end, new_start, cur_pos;
> + u32 buf_start, buf_end, new_n;
> +
> + log->kbuf[n] = '\0';
> + n += 1;
> +
> + new_end = log->end_pos + n;
> + if (new_end - log->start_pos >= log->len_total)
> + new_start = new_end - log->len_total;
> + else
> + new_start = log->start_pos;
> + new_n = min(n, log->len_total);
> + cur_pos = new_end - new_n;
> +
> + buf_start = cur_pos % log->len_total;
> + buf_end = new_end % log->len_total;
> + /* new_end and buf_end are exclusive indices, so if buf_end is
> + * exactly zero, then it actually points right to the end of
> + * ubuf and there is no wrap around
> + */
> + if (buf_end == 0)
> + buf_end = log->len_total;
> +
> + /* if buf_start > buf_end, we wrapped around;
> + * if buf_start == buf_end, then we fill ubuf completely; we
> + * can't have buf_start == buf_end to mean that there is
> + * nothing to write, because we always write at least
> + * something, even if terminal '\0'
> + */
> + if (buf_start < buf_end) {
> + /* message fits within contiguous chunk of ubuf */
> + if (copy_to_user(log->ubuf + buf_start,
> + log->kbuf + n - new_n,
> + buf_end - buf_start))
> + goto fail;
> + } else {
> + /* message wraps around the end of ubuf, copy in two chunks */
> + if (copy_to_user(log->ubuf + buf_start,
> + log->kbuf + n - new_n,
> + log->len_total - buf_start))
> + goto fail;
> + if (copy_to_user(log->ubuf,
> + log->kbuf + n - buf_end,
> + buf_end))
> + goto fail;
> + }
> +
> + log->start_pos = new_start;
> + log->end_pos = new_end - 1; /* don't count terminating '\0' */
> + }
> +
> + return;
> +fail:
> + log->ubuf = NULL;
> }
>
> -void bpf_vlog_reset(struct bpf_verifier_log *log, u32 new_pos)
> +void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos)
> {
> char zero = 0;
>
> if (!bpf_verifier_log_needed(log))
> return;
>
> - log->len_used = new_pos;
> +	/* if position to which we reset is beyond current log window, then we
> +	 * didn't preserve any useful content and should adjust start_pos
> +	 * to end up with an empty log (start_pos == end_pos)
> +	 */
> + log->end_pos = new_pos;
> + if (log->end_pos < log->start_pos)
> + log->start_pos = log->end_pos;
> +
> if (put_user(zero, log->ubuf + new_pos))
> log->ubuf = NULL;
> }
>
> +static void bpf_vlog_reverse_kbuf(char *buf, int len)
> +{
> + int i, j;
> +
> + for (i = 0, j = len - 1; i < j; i++, j--)
> + swap(buf[i], buf[j]);
> +}
> +
> +static int bpf_vlog_reverse_ubuf(struct bpf_verifier_log *log, int start, int end)
> +{
> + /* we split log->kbuf into two equal parts for both ends of array */
> + int n = sizeof(log->kbuf) / 2, nn;
> + char *lbuf = log->kbuf, *rbuf = log->kbuf + n;
> +
> + /* Read ubuf's section [start, end) two chunks at a time, from left
> + * and right side; within each chunk, swap all the bytes; after that
> + * reverse the order of lbuf and rbuf and write result back to ubuf.
> + * This way we'll end up with swapped contents of specified
> + * [start, end) ubuf segment.
> + */
> + while (end - start > 1) {
> +		nn = min(n, (end - start) / 2);
> +
> + if (copy_from_user(lbuf, log->ubuf + start, nn))
> + return -EFAULT;
> + if (copy_from_user(rbuf, log->ubuf + end - nn, nn))
> + return -EFAULT;
> +
> + bpf_vlog_reverse_kbuf(lbuf, nn);
> + bpf_vlog_reverse_kbuf(rbuf, nn);
> +
> + /* we write lbuf to the right end of ubuf, while rbuf to the
> + * left one to end up with properly reversed overall ubuf
> + */
> + if (copy_to_user(log->ubuf + start, rbuf, nn))
> + return -EFAULT;
> + if (copy_to_user(log->ubuf + end - nn, lbuf, nn))
> + return -EFAULT;
> +
> + start += nn;
> + end -= nn;
> + }
> +
> + return 0;
> +}
> +
> +void bpf_vlog_finalize(struct bpf_verifier_log *log)
> +{
> + u32 sublen;
> + int err;
> +
> + if (!log || !log->level || !log->ubuf)
> + return;
> + if ((log->level & BPF_LOG_FIXED) || log->level == BPF_LOG_KERNEL)
> + return;
> +
> + /* If we never truncated log, there is nothing to move around. */
> + if (log->start_pos == 0)
> + return;
> +
> + /* Otherwise we need to rotate log contents to make it start from the
> + * buffer beginning and be a continuous zero-terminated string. Note
> + * that if log->start_pos != 0 then we definitely filled up entire log
> + * buffer with no gaps, and we just need to shift buffer contents to
> + * the left by (log->start_pos % log->len_total) bytes.
> + *
> + * Unfortunately, user buffer could be huge and we don't want to
> + * allocate temporary kernel memory of the same size just to shift
> + * contents in a straightforward fashion. Instead, we'll be clever and
> + * do in-place array rotation. This is a leetcode-style problem, which
> + * could be solved by three rotations.
> + *
> + * Let's say we have log buffer that has to be shifted left by 7 bytes
> + * (spaces and vertical bar is just for demonstrative purposes):
> + * E F G H I J K | A B C D
> + *
> + * First, we reverse entire array:
> + * D C B A | K J I H G F E
> + *
> + * Then we rotate first 4 bytes (DCBA) and separately last 7 bytes
> + * (KJIHGFE), resulting in a properly rotated array:
> + * A B C D | E F G H I J K
> + *
> + * We'll utilize log->kbuf to read user memory chunk by chunk, swap
> + * bytes, and write them back. Doing it byte-by-byte would be
> + * unnecessarily inefficient. Altogether we are going to read and
> + * write each byte twice.
> + */
> +
> + /* length of the chopped off part that will be the beginning;
> + * len(ABCD) in the example above
> + */
> + sublen = log->len_total - (log->start_pos % log->len_total);
> +
> + err = bpf_vlog_reverse_ubuf(log, 0, log->len_total);
> + err = err ?: bpf_vlog_reverse_ubuf(log, 0, sublen);
> + err = err ?: bpf_vlog_reverse_ubuf(log, sublen, log->len_total);
> + if (err)
> + log->ubuf = NULL;
> +}
> +
> /* log_level controls verbosity level of eBPF verifier.
> * bpf_verifier_log_write() is used to dump the verification trace to the log,
> * so the user can figure out what's wrong with the program
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 203d6e644e44..f6d3d448e1b1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1435,10 +1435,10 @@ static inline u32 vlog_alignment(u32 pos)
> static void print_insn_state(struct bpf_verifier_env *env,
> const struct bpf_func_state *state)
> {
> - if (env->prev_log_len && env->prev_log_len == env->log.len_used) {
> + if (env->prev_log_pos && env->prev_log_pos == env->log.end_pos) {
> /* remove new line character */
> - bpf_vlog_reset(&env->log, env->prev_log_len - 1);
> - verbose(env, "%*c;", vlog_alignment(env->prev_insn_print_len), ' ');
> + bpf_vlog_reset(&env->log, env->prev_log_pos - 1);
> + verbose(env, "%*c;", vlog_alignment(env->prev_insn_print_pos), ' ');
> } else {
> verbose(env, "%d:", env->insn_idx);
> }
> @@ -1746,7 +1746,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
> elem->insn_idx = insn_idx;
> elem->prev_insn_idx = prev_insn_idx;
> elem->next = env->head;
> - elem->log_pos = env->log.len_used;
> + elem->log_pos = env->log.end_pos;
> env->head = elem;
> env->stack_size++;
> err = copy_verifier_state(&elem->st, cur);
> @@ -2282,7 +2282,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
> elem->insn_idx = insn_idx;
> elem->prev_insn_idx = prev_insn_idx;
> elem->next = env->head;
> - elem->log_pos = env->log.len_used;
> + elem->log_pos = env->log.end_pos;
> env->head = elem;
> env->stack_size++;
> if (env->stack_size > BPF_COMPLEXITY_LIMIT_JMP_SEQ) {
> @@ -15563,11 +15563,11 @@ static int do_check(struct bpf_verifier_env *env)
> print_insn_state(env, state->frame[state->curframe]);
>
> verbose_linfo(env, env->insn_idx, "; ");
> - env->prev_log_len = env->log.len_used;
> + env->prev_log_pos = env->log.end_pos;
> verbose(env, "%d: ", env->insn_idx);
> print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
> - env->prev_insn_print_len = env->log.len_used - env->prev_log_len;
> - env->prev_log_len = env->log.len_used;
> + env->prev_insn_print_pos = env->log.end_pos - env->prev_log_pos;
> + env->prev_log_pos = env->log.end_pos;
> }
>
> if (bpf_prog_is_offloaded(env->prog->aux)) {
> @@ -18780,6 +18780,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
> print_verification_stats(env);
> env->prog->aux->verified_insns = env->insn_processed;
>
> + bpf_vlog_finalize(log);
> if (log->level && bpf_verifier_log_full(log))
> ret = -ENOSPC;
> if (log->level && !log->ubuf) {
> diff --git a/tools/testing/selftests/bpf/prog_tests/log_fixup.c b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
> index 239e1c5753b0..bc27170bdeb0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/log_fixup.c
> +++ b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
> @@ -24,6 +24,7 @@ static void bad_core_relo(size_t log_buf_size, enum trunc_type trunc_type)
> bpf_program__set_autoload(skel->progs.bad_relo, true);
> memset(log_buf, 0, sizeof(log_buf));
> bpf_program__set_log_buf(skel->progs.bad_relo, log_buf, log_buf_size ?: sizeof(log_buf));
> + bpf_program__set_log_level(skel->progs.bad_relo, 1 | 8); /* BPF_LOG_FIXED to force truncation */
>
> err = test_log_fixup__load(skel);
> if (!ASSERT_ERR(err, "load_fail"))
Thread overview: 22+ messages
2023-03-16 18:30 [PATCH bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 1/6] bpf: split off basic BPF verifier log into separate file Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 2/6] bpf: remove minimum size restrictions on verifier log buffer Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
2023-03-17 3:54 ` Alexei Starovoitov
2023-03-17 16:44 ` Andrii Nakryiko
2023-03-17 5:07 ` kernel test robot
2023-03-17 6:39 ` kernel test robot
2023-03-17 6:59 ` kernel test robot
2023-03-17 15:05 ` Eduard Zingerman [this message]
2023-03-17 16:47 ` Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 4/6] libbpf: don't enfore verifier log levels on libbpf side Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 5/6] selftests/bpf: add more veristat control over verifier log options Andrii Nakryiko
2023-03-16 18:30 ` [PATCH bpf-next 6/6] selftests/bpf: add fixed vs rotating verifier log tests Andrii Nakryiko
-- strict thread matches above, loose matches on Subject: below --
2023-03-17 22:03 [PATCH bpf-next 0/6] BPF verifier rotating log Andrii Nakryiko
2023-03-17 22:03 ` [PATCH bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default Andrii Nakryiko
2023-03-17 22:05 ` Andrii Nakryiko
2023-03-17 23:02 ` Daniel Borkmann
2023-03-17 23:12 ` Andrii Nakryiko
2023-03-20 16:10 ` Lorenz Bauer
2023-03-20 18:55 ` Andrii Nakryiko
2023-03-21 16:18 ` Lorenz Bauer
2023-03-22 21:27 ` Andrii Nakryiko