From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDA0318C932 for ; Tue, 13 May 2025 00:07:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.169 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747094828; cv=none; b=R48A+M1b86pWp1MB5bBTXRm3qiYHWCOHPS0kXnZJ2hMRoOG/M6dPaDn/MkzXeH+wWYMqyiROU/B3810HKsexXLQnfc/wsdL9FDEIBLPvk6YamH+tznXQi2tsnxI3zCtfYBVhrkL+6tgi578lE9TTiK1aLtbJorQ/I6jd0PykUxE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747094828; c=relaxed/simple; bh=zFISzlTdFAgnzT4xezEUtPhbE/zGusoZE0fb+jq/yg4=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=pegE5ste30ynEOOtOa4ukEmiwAOqe14BXDO+FUN2qlClYW7mSZV3xL1jBeK4/7D2vJ/rBa8/wG+DdLVzntpdU/+taateUMnFVgmWCDLzP6ndbYoAzeDKmcPDKmeeZA1t4bR9X9ebtzuhGKqGFxXzKSWjca10iypMlDLV61Zi+lk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JE1uZFkY; arc=none smtp.client-ip=209.85.210.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JE1uZFkY" Received: by mail-pf1-f169.google.com with SMTP id d2e1a72fcca58-7399838db7fso4880304b3a.0 for ; Mon, 12 May 2025 17:07:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1747094826; x=1747699626; darn=vger.kernel.org; h=mime-version:user-agent:message-id:date:references:in-reply-to :subject:cc:to:from:from:to:cc:subject:date:message-id:reply-to; bh=3gkEtbigvAnIv0feEiYLTGDPb1DHH0YnnXWoAGUdhU4=; b=JE1uZFkY1OwuVgC1VYNBXsUYU3YXrEXuFrNa+muSqwnkqxos/buMy70sm2+uv8Zx4i CNix7Ah2YjEuD2pI6fnHTmFxjbIDqcp6JVF5NuSqktdYDEUJQZ4XMF9chLiZieBJpzWa TAsAp1PbiPXsl0/JZSHx6//rNoDlgy1HZkgbsJtHuacPfY6FAdwHes3/3+Bq7CrstuNu n0QTdGqhr82DpEcJWSOs0IjBrgLS3o+JmLl1szvyqtCurJ789jOWKiQBufcv4sDRvjju Y32f+8lNghBAS94iwAIPDVWrOL1shiauoGolUJWIEpeR7dwW4sPOhmO4tX/tfn0d0L+P n1Eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747094826; x=1747699626; h=mime-version:user-agent:message-id:date:references:in-reply-to :subject:cc:to:from:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=3gkEtbigvAnIv0feEiYLTGDPb1DHH0YnnXWoAGUdhU4=; b=P3eTxW9eexyBSc2RlBUHBURhFhpFffrL0K5OWYjkIccxdlq9fjHIs6O8sZkdJixoJ3 BRFtowKm0ooPxdV6zaGyKu7QMMi8JxcyKKpY7dEuED9XV6Tr8+tmaoPJ7cVJIj2+f1ww H4d/tZv054uL+Kqi3Stp3iR4F7i05l2goUrqIRmcc9cX3UCWmlLeaxZ+kEBGgKc47tXC QaxZiGnMP7rDf9yT/PgBUJeVkASuG9lBlJgEjlgwQKCH0/YFX1EvAOkraYuDLKehwVAD P6UrS/ANn8iQkxAgmAquss5qSDhxg95SUdmGbZEgsOR4TfLX7im52O3WegqiGC3h8VxC HVkA== X-Gm-Message-State: AOJu0YxyAcWlvanZzAl2nFmUxG03DuxrnmOVerVy1YPIk4Qhwlbj5tX2 cAtujyJ1UsYQGp1kHETV2ZnWD0DuOyZMnAFaFyWRD4etjDojgWa1 X-Gm-Gg: ASbGncstoO73YCrpsR62/PVHn1m+mrQKmtegudhxX6xdYMee5HX3CT/ELbVdpUCf6lN uu3M+37DVmZ0UhZALwfwU1rHoTkO5eUKbtk49+FnhR8SOryW1bsUrrSEGxHSAfIfwloE2rRzwTJ uv+tj8Z+I/aubj3eAAGxCJMbj/h0hucUBNTkNTbvBJP5jm9c/yPAzqjrtIKv6SCrmG0UfKuI486 sk6juEwSI0jPF+XD7ZwE3W4ctqTqt/tPWYMeGJHATt05Skg6FqRtDU7aiX6fIiRmfpfOFTaHPyK Inl0p/3TiQTCnN/EN0bdzAypm/xGBNNAM4CPDsYLEODd2pe/gSLlAPoAriA= X-Google-Smtp-Source: AGHT+IEF/oNrNKt4NdNrFFzQ8yQjQeOvDWsSXqT866RpAaXmXnh9fX3SnCSjdiGSku1k2/6evE5l3g== X-Received: by 2002:a05:6a00:8482:b0:736:aea8:c9b7 with SMTP id d2e1a72fcca58-74278f8ce19mr1374478b3a.2.1747094825824; Mon, 12 May 2025 17:07:05 -0700 (PDT) Received: from ezingerman-mba ([2620:10d:c090:500::6:10d]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-74237a8f674sm6619439b3a.168.2025.05.12.17.07.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 May 2025 17:07:05 -0700 (PDT) From: Eduard Zingerman To: Raj Sahu Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com, jolsa@kernel.org, djwillia@vt.edu, miloc@vt.edu, ericts@vt.edu, rahult@vt.edu, doniaghazy@vt.edu, quanzhif@vt.edu, jinghao7@illinois.edu, sidchintamaneni@gmail.com, memxor@gmail.com Subject: Re: [RFC bpf-next 3/4] bpf: Generating a stubbed version of BPF program for termination In-Reply-To: <20250420105524.2115690-4-rjsu26@gmail.com> (Raj Sahu's message of "Sun, 20 Apr 2025 06:55:21 -0400") References: <20250420105524.2115690-1-rjsu26@gmail.com> <20250420105524.2115690-4-rjsu26@gmail.com> Date: Mon, 12 May 2025 17:07:02 -0700 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Raj Sahu writes: [...] > diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c > index 380e9a7cac75..e5dceebfb302 100644 > --- a/kernel/bpf/bpf_iter.c > +++ b/kernel/bpf/bpf_iter.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include > > struct bpf_iter_target_info { > struct list_head list; > @@ -775,6 +776,70 @@ const struct bpf_func_proto bpf_loop_proto = { > .arg4_type = ARG_ANYTHING, > }; > > +BPF_CALL_4(bpf_loop_termination, u32, nr_loops, void *, callback_fn, > + void *, callback_ctx, u64, flags) > +{ > + /* Since a patched BPF program for termination will want > + * to finish as fast as possible, > + * we simply don't run any loop in here. > + */ > + > + /* Restoring the callee-saved registers and exit. > + * Hacky/ err prone way of restoring the registers. > + * We are figuring out a way to have arch independent > + * way to do this. > + */ > + > + asm volatile( > + "pop %rbx\n\t" > + "pop %rbp\n\t" > + "pop %r12\n\t" > + "pop %r13\n\t" > + ); Why do anything special here? I'd expect a bpf_loop() version simplified to a single return to work. > + > + return 0; > +} [...] > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c [...] > + > +static bool find_in_skiplist(int func_id) { Nit: "skip list" is a name of an (unrelated) data structure, maybe disambiguate the name here? > + return is_verifier_inlined_function(func_id) || > + is_debug_function(func_id) || > + is_resource_release_function(func_id); > +} > + > +static int get_replacement_helper(int func_id, enum bpf_return_type ret_type) { > + > + switch (func_id) { > + case BPF_FUNC_loop: > + return BPF_FUNC_loop_termination; > + case BPF_FUNC_for_each_map_elem: > + case BPF_FUNC_user_ringbuf_drain: > + return -ENOTSUPP; > + } > + > + switch (ret_type) { > + case RET_VOID: > + return BPF_FUNC_dummy_void; > + case RET_INTEGER: > + return BPF_FUNC_dummy_int; > + case RET_PTR_TO_MAP_VALUE_OR_NULL: > + return BPF_FUNC_dummy_ptr_to_map; > + case RET_PTR_TO_SOCKET_OR_NULL: > + case RET_PTR_TO_TCP_SOCK_OR_NULL: > + case RET_PTR_TO_SOCK_COMMON_OR_NULL: > + case RET_PTR_TO_RINGBUF_MEM_OR_NULL: > + case RET_PTR_TO_DYNPTR_MEM_OR_NULL: > + case RET_PTR_TO_BTF_ID_OR_NULL: > + case RET_PTR_TO_BTF_ID_TRUSTED: > + case RET_PTR_TO_MAP_VALUE: > + case RET_PTR_TO_SOCKET: > + case RET_PTR_TO_TCP_SOCK: > + case RET_PTR_TO_SOCK_COMMON: > + case RET_PTR_TO_MEM: > + case RET_PTR_TO_MEM_OR_BTF_ID: > + case RET_PTR_TO_BTF_ID: I'm curious what's the plan for RET_PTR_TO_BTF_ID? verifier.c:check_ptr_tp_btf_access() has the following comment: /* By default any pointer obtained from walking a trusted pointer is no * longer trusted, unless the field being accessed has explicitly been * marked as inheriting its parent's state of trust (either full or RCU). * For example: * 'cgroups' pointer is untrusted if task->cgroups dereference * happened in a sleepable program outside of bpf_rcu_read_lock() * section. In a non-sleepable program it's trusted while in RCU CS (aka MEM_RCU). * Note bpf_rcu_read_unlock() converts MEM_RCU pointers to PTR_UNTRUSTED. * * A regular RCU-protected pointer with __rcu tag can also be deemed * trusted if we are in an RCU CS. Such pointer can be NULL. */ > + default: > + return -ENOTSUPP; > + } > +} > + > +static void patch_generator(struct bpf_prog *prog) > +{ > + struct call_insn_aux{ > + int insn_idx; > + int replacement_helper; > + }; > + > + struct call_insn_aux *call_indices; > + int num_calls=0; > + call_indices = vmalloc(sizeof(call_indices) * prog->len); clang-tidy spotted a warning here, the expression should be `sizeof(*call_indices)`. > + > + /* Find all call insns */ > + for(int insn_idx =0 ;insn_idx < prog->len; insn_idx++) > + { > + struct bpf_insn *insn = &prog->insnsi[insn_idx] ; > + u8 class = BPF_CLASS(insn->code); > + if (class == BPF_JMP || class == BPF_JMP32) { > + if (BPF_OP(insn->code) == BPF_CALL){ > + if (insn->src_reg == BPF_PSEUDO_CALL) { > + continue; > + } > + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL){ /*kfunc */ > + // TODO Need to use btf for getting proto > + // If release function --> skip > + // If acquire function --> find return type and add to list > + } > + else { > + int func_id = insn->imm; > + const struct bpf_func_proto *fn = NULL; > + int new_helper_id = -1; > + > + if (find_in_skiplist(func_id)) { > + continue; > + } > + > + fn = bpf_verifier_ops[prog->type]->get_func_proto(func_id, prog); Nit: please reuse verifier.c:get_helper_proto() utility function here. > + if (!fn && !fn->func) { > + continue; > + } > + > + new_helper_id = get_replacement_helper(func_id, fn->ret_type); > + if (new_helper_id < 0) { > + continue; Nit: propagate error? > + } > + > + call_indices[num_calls].insn_idx = insn_idx; > + call_indices[num_calls].replacement_helper= new_helper_id; > + num_calls++; > + } > + } > + } > + } > + > + /* Patch all call insns */ > + for(int k =0; k < num_calls; k++){ > + prog->insnsi[call_indices[k].insn_idx].imm = call_indices[k].replacement_helper; Nit: each instruction is visited only once, so it appears that patching can be done in-place w/o need for call_indices array. > + } > +} > + > +static bool create_termination_prog(struct bpf_prog *prog, > + union bpf_attr *attr, > + bpfptr_t uattr, > + u32 uattr_size) > +{ > + if (prog->len < 10) > + return false; > + > + int err; > + struct bpf_prog *patch_prog; > + patch_prog = bpf_prog_alloc_no_stats(bpf_prog_size(prog->len), 0); > + if (!patch_prog) { > + return false; > + } > + > + patch_prog->termination_states->is_termination_prog = true; > + > + err = clone_bpf_prog(patch_prog, prog); > + if (err) > + goto free_termination_prog; > + > + patch_generator(patch_prog); > + > + err = bpf_check(&patch_prog, attr, uattr, uattr_size); There might be issues with program verification if bpf_check is called for a patched program. For example, for a full implementation you'd need to handle kfuncs, replacing loops like: while (bpf_iter_next()) {} with loops like: while (dummy_return_null()) {} won't work as verifier would assume that the loop is infinite. In general, check_helper_call, check_kfunc_call and is_state_visited have logic for specific helpers/kfuncs that modifies verifier state basing not only on the helper return value type. What do you plan to do with functions that unconditionally take resources, e.g. bpf_spin_lock()? - From verification point of view: this function is RET_VOID and is not in find_in_skiplist(), patch_generator() would replace its call with a dummy. However, a corresponding bpf_spin_unlock() would remain and thus bpf_check() will exit with error. So, you would need some special version of bpf_check, that collects all resources needed for program translation (e.g. maps), but does not perform semantic checks. Or patch_generator() has to be called for a program that is already verified. - From termination point of view: this function cannot be replaced with a dummy w/o removing corresponding unlock. Do you plan to leave these as-is? > + if (err) { > + goto free_termination_prog; > + } > + > + patch_prog = bpf_prog_select_runtime(patch_prog, &err); > + if (err) { > + goto free_termination_prog; > + } > + > + prog->termination_states->patch_prog = patch_prog; > + return true; > + > +free_termination_prog: > + free_percpu(patch_prog->stats); > + free_percpu(patch_prog->active); > + kfree(patch_prog->aux); > + return false; In case of a load failure of the main program bpf_prog_load_load() does much more cleanup. [...] > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 54c6953a8b84..57b4fd1f6a72 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c [...] > @@ -22502,7 +22506,14 @@ static struct bpf_prog *inline_bpf_loop(struct bpf_verifier_env *env, > */ > insn_buf[cnt++] = BPF_MOV64_REG(BPF_REG_1, reg_loop_cnt); > insn_buf[cnt++] = BPF_MOV64_REG(BPF_REG_2, reg_loop_ctx); > - insn_buf[cnt++] = BPF_CALL_REL(0); > + > + if (termination_states && termination_states->is_termination_prog) { > + /* In a termination BPF prog, we want to exit - set R0 = 1 */ > + insn_buf[cnt++] = BPF_MOV64_IMM(BPF_REG_0, 1); > + } else { > + insn_buf[cnt++] = BPF_CALL_REL(0); > + } > + Q: Do you need 1:1 instruction pointer correspondence between original and patched programs? Since you patch inlined bpf_loop instead of disallowing inlining in the patched program I assume you do. In this case, there is no guarantee that instructions above have the same size after jit. > /* increment loop counter */ > insn_buf[cnt++] = BPF_ALU64_IMM(BPF_ADD, reg_loop_cnt, 1); > /* jump to loop header if callback returned 0 */ [...]