From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0A9E6FC1 for ; Sun, 17 Sep 2023 21:37:47 +0000 (UTC) Received: from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com [IPv6:2a00:1450:4864:20::62c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D355E114 for ; Sun, 17 Sep 2023 14:37:44 -0700 (PDT) Received: by mail-ej1-x62c.google.com with SMTP id a640c23a62f3a-9ad8d47ef2fso475444866b.1 for ; Sun, 17 Sep 2023 14:37:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1694986663; x=1695591463; darn=vger.kernel.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=m/r2jvFL/PZxJ4CCjrN81Ok4M5F+6FvJOnXR1zy8NXo=; b=AFtZc1M4Lx94nghWVqSuddsgEYf+0o3IZ2OPk6rxQV1PUSwvlEfT2O2zQ636sIW2p1 OpiWcL+2Jp33rxaPzP6T/E/OXZwIlp36/zma23z6sIlRXrGqaY8AOWY4aZpTo6ICM1F7 q/gWqa1ulh5p2J/l4imh8Luky7PWFFkoIBOzV2e1uRzqxnMckhTLFBjnCTWCBDytzh3q 9uFM3+51XthGJAxdLh3JZuyjEbxzcnQbpV2NfaVuqwiGj7CeOF1jTVMOJ35PSncU4wOn 7POsd98divdly4yiXeoEjvwIM1YkDOdQADhljEpx5GVFPnPU+P5gTZpXwyOzP/7JdAp/ tYbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694986663; x=1695591463; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=m/r2jvFL/PZxJ4CCjrN81Ok4M5F+6FvJOnXR1zy8NXo=; b=qctg2pOkFDH/RJEn9k/Tg34RvinvymwJ2Yn1dB4uUAR2cINMp31GaF0b3qaG7wKnYT wWrC1ku4+xVzrOGJowOAzmgU3xg0M60/p0zB4xIrInjjWd6XIlGdS78hMvjP9WyyaZWK 6M4pKKStwNCGwtZ8SZpZo77hs0P791bzxW9NhQcn2rdM5e6G/r5kwmApddy/pxtZopgy soB9gpf81Uzl+DmLCGhvgjx0oqAlhhOrRw7EsVIU4GFBFRYWhDXUbIF1BLMY6jC8JCsi q9rx4iIbOHNf27d2R6m2iiH+73c6IBlw2dcJ31yHP5sUKVNcSlByTxZlo5wG+YgVMNAt ZfZA== X-Gm-Message-State: AOJu0YybB0yEnXl84lXxSstf9KekeDs1Obwx9O1/hDygIpnRd6fT84LD gOC1ioluJA7awogLALO4e4ewV7IY458= X-Google-Smtp-Source: AGHT+IGtR4Lblea3KGH/OO48A5KKVdq6XigX0avZxr411iRExFxQHKtvrHVRk+PWkjSFeQrpDE/OLw== X-Received: by 2002:a17:906:51d0:b0:9a1:abae:8d30 with SMTP id v16-20020a17090651d000b009a1abae8d30mr6314101ejk.47.1694986662971; Sun, 17 Sep 2023 14:37:42 -0700 (PDT) Received: from [192.168.1.95] (host-176-36-0-241.b024.la.net.ua. [176.36.0.241]) by smtp.gmail.com with ESMTPSA id kg11-20020a17090776eb00b00993004239a4sm5408488ejc.215.2023.09.17.14.37.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 17 Sep 2023 14:37:42 -0700 (PDT) Message-ID: Subject: Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...) From: Eduard Zingerman To: Andrii Nakryiko , Alexei Starovoitov Cc: Andrew Werner , bpf , Andrei Matei , Tamir Duberstein , Joanne Koong , kernel-team@dataexmachina.dev, Song Liu Date: Mon, 18 Sep 2023 00:37:41 +0300 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.44.4-0ubuntu2 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote: > On Fri, Jul 7, 2023 at 9:44=E2=80=AFAM Alexei Starovoitov > wrote: > >=20 > > On Fri, Jul 7, 2023 at 7:04=E2=80=AFAM Andrew Werner wrote: > > >=20 > > > When it comes to fixing the problem, I don't quite know where to star= t. > > > Perhaps these iteration callbacks ought to be treated more like globa= l functions > > > -- you can't always make assumptions about the state of the data in t= he context > > > pointer. Treating the context pointer as totally opaque seems bad fro= m > > > a usability > > > perspective. Maybe there's a way to attempt to verify the function > > > body of the function > > > by treating all or part of the context as read-only, and then if that > > > fails, go back and > > > assume nothing about that part of the context structure. What is the > > > right way to > > > think about plugging this hole? > >=20 > > 'treat as global' might be a way to fix it, but it will likely break > > some setups, since people passing pointers in a context and current > > global func verification doesn't support that. >=20 > yeah, the intended use case is to return something from callbacks > through context pointer. So it will definitely break real uses. >=20 > > I think the simplest fix would be to disallow writes into any pointers > > within a ctx. Writes to scalars should still be allowed. >=20 > It might be a partial mitigation, but even with SCALARs there could be > problems due to assumed SCALAR range, which will be invalid if > callback is never called or called many times. >=20 > > Much more complex fix would be to verify callbacks as > > process_iter_next_call(). See giant comment next to it. >=20 > yep, that seems like the right way forward. We need to simulate 0, 1, > 2, .... executions of callbacks until we validate and exhaust all > possible context modification permutations, just like open-coded > iterators do it >=20 > > But since we need a fix for stable I'd try the simple approach first. > > Could you try to implement that? Hi All, This issue seems stalled, so I took a look over the weekend. I have a work in progress fix, please let me know if you don't agree with direction I'm taking or if I'm stepping on anyone's toes. After some analysis I decided to go with Alexei's suggestion and implement something similar to iterators for selected set of helper functions that accept "looping" callbacks, such as: - bpf_loop - bpf_for_each_map_elem - bpf_user_ringbuf_drain - bpf_timer_set_callback (?) The sketch of the fix looks as follows: - extend struct bpf_func_state with callback iterator state information: - active/non-active flag - depth - r1-r5 state at the entry to callback; - extend __check_func_call() to setup callback iterator state when call to suitable helper function is processed; - extend BPF_EXIT processing (prepare_func_exit()) to queue new callback visit upon return from iterating callback (similar to process_iter_next_call()); - extend is_state_visited() to account for callback iterator hits in a way similar to regular iterators; - extend backtrack_insn() to correctly react to jumps from callback exit to callback entry. =20 I have a patch (at the end of this email) that correctly recognizes the bpf_loop example in this thread as unsafe. However this patch has a few deficiencies: - verif_scale_strobemeta_bpf_loop selftest is not passing, because read_map_var function invoked from the callback continuously increments 'payload' pointer used in subsequent iterations. =20 In order to handle such code I need to add an upper bound tracking for iteration depth (when such bound could be deduced). - loop detection is broken for simple callback as below: static int loop_callback_infinite(__u32 idx, __u64 *data) { for (;;) (*ctx)++; return 0; } =20 To handle such code I need to change is_state_visited() to do callback iterator loop/hit detection on exit from callback (returns are not prune points at the moment), currently it is done on entry. - the callback as bellow currently causes state explosion: static int precise1_callback(__u32 idx, struct precise1_ctx *ctx) { if (idx =3D=3D 0) ctx->a =3D 1; if (idx =3D=3D 1 && ctx->a =3D=3D 1) ctx->b =3D 1; return 0; } =20 I'm not sure yet what to do about this, there are several possibilities: - tweak the order in which states are visited (need to think about it); - check states in bpf_verifier_env::head (not explored yet) for equivalence and avoid enqueuing duplicate states. =20 I'll proceed addressing the issues above on Monday. Thanks, Eduard --- diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index a3236651ec64..5589f55e42ba 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -70,6 +70,17 @@ enum bpf_iter_state { BPF_ITER_STATE_DRAINED, }; =20 +struct bpf_iter { + /* BTF container and BTF type ID describing + * struct bpf_iter_ of an iterator state + */ + struct btf *btf; + u32 btf_id; + /* packing following two fields to fit iter state into 16 bytes */ + enum bpf_iter_state state:2; + int depth:30; +}; + struct bpf_reg_state { /* Ordering of fields matters. See states_equal() */ enum bpf_reg_type type; @@ -115,16 +126,7 @@ struct bpf_reg_state { } dynptr; =20 /* For bpf_iter stack slots */ - struct { - /* BTF container and BTF type ID describing - * struct bpf_iter_ of an iterator state - */ - struct btf *btf; - u32 btf_id; - /* packing following two fields to fit iter state into 16 bytes */ - enum bpf_iter_state state:2; - int depth:30; - } iter; + struct bpf_iter iter; =20 /* Max size from any of the above. */ struct { @@ -300,6 +302,8 @@ struct bpf_func_state { bool in_callback_fn; struct tnum callback_ret_range; bool in_async_callback_fn; + struct bpf_iter callback_iter; + struct bpf_reg_state callback_entry_state[MAX_BPF_REG]; =20 /* The following fields should be last. See copy_func_state() */ int acquired_refs; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index dbba2b806017..e79a4bec4461 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2357,6 +2357,7 @@ static void init_func_state(struct bpf_verifier_env *= env, state->callback_ret_range =3D tnum_range(0, 0); init_reg_state(env, state); mark_verifier_state_scratched(env); + state->callback_iter.state =3D BPF_ITER_STATE_INVALID; } =20 /* Similar to push_stack(), but for async callbacks */ @@ -3599,6 +3600,39 @@ static int backtrack_insn(struct bpf_verifier_env *e= nv, int idx, int subseq_idx, } } else if (opcode =3D=3D BPF_EXIT) { bool r0_precise; + int subprog; + + /* Jump from 'exit' to start of the same subprogram, + * this happens for callback iteration, e.g.: + * + * int cb_func(...) { + * start: + * ... + * return 0; // <-- BPF_EXIT processing in do_check() + * // adds a state with IP set to 'start'. + * } + * bpf_loop(..., cb_func, ...); + * + * Clear r1-r5 as in the callback case above, but do + * not change frame number. + */ + if (subseq_idx >=3D 0 && + ((subprog =3D find_subprog(env, subseq_idx)) >=3D 0) && + idx >=3D subseq_idx && + (subprog >=3D env->subprog_cnt || + idx < env->subprog_info[subprog + 1].start)) { + if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) { + verbose(env, "BUG regs %x\n", bt_reg_mask(bt)); + WARN_ONCE(1, "verifier backtracking bug"); + return -EFAULT; + } + if (bt_stack_mask(bt) !=3D 0) + return -ENOTSUPP; + /* clear r1-r5 in callback subprog's mask */ + for (i =3D BPF_REG_1; i <=3D BPF_REG_5; i++) + bt_clear_reg(bt, i); + return 0; + } =20 if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) { /* if backtracing was looking for registers R1-R5 @@ -8869,13 +8903,17 @@ static int set_callee_state(struct bpf_verifier_env= *env, struct bpf_func_state *caller, struct bpf_func_state *callee, int insn_idx); =20 +static int set_loop_callback_state(struct bpf_verifier_env *env, + struct bpf_func_state *caller, + struct bpf_func_state *callee, int insn_idx); + static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn= *insn, int *insn_idx, int subprog, set_callee_state_fn set_callee_state_cb) { struct bpf_verifier_state *state =3D env->cur_state; struct bpf_func_state *caller, *callee; - int err; + int err, i; =20 if (state->curframe + 1 >=3D MAX_CALL_FRAMES) { verbose(env, "the call stack of %d frames is too deep\n", @@ -8972,7 +9010,6 @@ static int __check_func_call(struct bpf_verifier_env = *env, struct bpf_insn *insn *insn_idx /* callsite */, state->curframe + 1 /* frameno within this callchain */, subprog /* subprog number within this prog */); - /* Transfer references to the callee */ err =3D copy_reference_state(callee, caller); if (err) @@ -8982,6 +9019,14 @@ static int __check_func_call(struct bpf_verifier_env= *env, struct bpf_insn *insn if (err) goto err_out; =20 + if (set_callee_state_cb =3D=3D set_loop_callback_state) { + callee->callback_iter.state =3D BPF_ITER_STATE_ACTIVE; + callee->callback_iter.depth =3D 0; + for (i =3D BPF_REG_0; i < MAX_BPF_REG; ++i) + copy_register_state(&callee->callback_entry_state[i], + &callee->regs[i]); + } + clear_caller_saved_regs(env, caller->regs); =20 /* only increment it after check_reg_arg() finished */ @@ -9256,7 +9301,7 @@ static int prepare_func_exit(struct bpf_verifier_env = *env, int *insn_idx) struct bpf_verifier_state *state =3D env->cur_state; struct bpf_func_state *caller, *callee; struct bpf_reg_state *r0; - int err; + int err, i; =20 callee =3D state->frame[state->curframe]; r0 =3D &callee->regs[BPF_REG_0]; @@ -9301,6 +9346,25 @@ static int prepare_func_exit(struct bpf_verifier_env= *env, int *insn_idx) return err; } =20 + if (callee->in_callback_fn && callee->callback_iter.state =3D=3D BPF_ITER= _STATE_ACTIVE) { + struct bpf_verifier_state *queued_st; + struct bpf_func_state *queued_callee; + struct bpf_iter *queued_iter; + + queued_st =3D push_stack(env, env->subprog_info[callee->subprogno].start= , + *insn_idx, false); + if (!queued_st) + return -ENOMEM; + + queued_callee =3D queued_st->frame[callee->frameno]; + queued_iter =3D &queued_callee->callback_iter; + queued_iter->depth++; + + for (i =3D BPF_REG_0; i < MAX_BPF_REG; ++i) + copy_register_state(&queued_callee->regs[i], + &queued_callee->callback_entry_state[i]); + } + *insn_idx =3D callee->callsite + 1; if (env->log.level & BPF_LOG_LEVEL) { verbose(env, "returning from callee:\n"); @@ -16112,6 +16176,15 @@ static bool is_iter_next_insn(struct bpf_verifier_= env *env, int insn_idx) return env->insn_aux_data[insn_idx].is_iter_next; } =20 +static bool is_callback_iter_entry(struct bpf_verifier_env *env, int insn_= idx) +{ + struct bpf_func_state *cur_frame =3D cur_func(env); + + return cur_frame->in_callback_fn && + env->subprog_info[cur_frame->subprogno].start =3D=3D insn_idx && + cur_frame->callback_iter.state !=3D BPF_ITER_STATE_INVALID; +} + /* is_state_visited() handles iter_next() (see process_iter_next_call() fo= r * terminology) calls specially: as opposed to bounded BPF loops, it *expe= cts* * states to match, which otherwise would look like an infinite loop. So w= hile @@ -16190,6 +16263,9 @@ static bool iter_active_depths_differ(struct bpf_ve= rifier_state *old, struct bpf if (cur_slot->iter.depth !=3D slot->iter.depth) return true; } + if (state->callback_iter.state =3D=3D BPF_ITER_STATE_ACTIVE && + state->callback_iter.depth !=3D cur->frame[fr]->callback_iter.depth) + return true; } return false; } @@ -16277,6 +16353,12 @@ static int is_state_visited(struct bpf_verifier_en= v *env, int insn_idx) } goto skip_inf_loop_check; } + if (is_callback_iter_entry(env, insn_idx)) { + if (states_equal(env, &sl->state, cur) && + cur_func(env)->callback_iter.state =3D=3D BPF_ITER_STATE_ACTIVE) + goto hit; + goto skip_inf_loop_check; + } /* attempt to detect infinite loop to avoid unnecessary doomed work */ if (states_maybe_looping(&sl->state, cur) && states_equal(env, &sl->state, cur) &&