From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
yangpeihao@sjtu.edu.cn, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@kernel.org, sinquersw@gmail.com, toke@redhat.com,
jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com
Subject: Re: [RFC PATCH v8 01/20] bpf: Support passing referenced kptr to struct_ops programs
Date: Fri, 17 May 2024 02:23:37 +0200 [thread overview]
Message-ID: <CAP01T74mQPMktHJiPoZ7z-UfFCRoxOexpBe_X2v3rLpE5A+WEA@mail.gmail.com> (raw)
In-Reply-To: <CAMB2axP1C1wVRsq2uDGW0r6-OM8yWvZ9LB0WwEtuSAYsU2T0fg@mail.gmail.com>
On Fri, 17 May 2024 at 02:17, Amery Hung <ameryhung@gmail.com> wrote:
>
> On Thu, May 16, 2024 at 4:59 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Fri, 10 May 2024 at 21:24, Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > This patch supports struct_ops programs that acqurie referenced kptrs
> > > throguh arguments. In Qdisc_ops, an skb is passed to ".enqueue" in the
> > > first argument. The qdisc becomes the sole owner of the skb and must
> > > enqueue or drop the skb. This matches the referenced kptr semantic
> > > in bpf. However, the existing practice of acquiring a referenced kptr via
> > > a kfunc with KF_ACQUIRE does not play well in this case. Calling kfuncs
> > > repeatedly allows the user to acquire multiple references, while there
> > > should be only one reference to a unique skb in a qdisc.
> > >
> > > The solutioin is to make a struct_ops program automatically acquire a
> > > referenced kptr through a tagged argument in the stub function. When
> > > tagged with "__ref_acquired" (suggestion for a better name?), an
> > > reference kptr (ref_obj_id > 0) will be acquired automatically when
> > > entering the program. In addition, only the first read to the arguement
> > > is allowed and it will yeild a referenced kptr.
> > >
> > > Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> > > ---
> > > include/linux/bpf.h | 3 +++
> > > kernel/bpf/bpf_struct_ops.c | 17 +++++++++++++----
> > > kernel/bpf/btf.c | 10 +++++++++-
> > > kernel/bpf/verifier.c | 16 +++++++++++++---
> > > 4 files changed, 38 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 9c6a7b8ff963..6aabca1581fe 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -914,6 +914,7 @@ struct bpf_insn_access_aux {
> > > struct {
> > > struct btf *btf;
> > > u32 btf_id;
> > > + u32 ref_obj_id;
> > > };
> > > };
> > > struct bpf_verifier_log *log; /* for verbose logs */
> > > @@ -1416,6 +1417,8 @@ struct bpf_ctx_arg_aux {
> > > enum bpf_reg_type reg_type;
> > > struct btf *btf;
> > > u32 btf_id;
> > > + u32 ref_obj_id;
> > > + bool ref_acquired;
> > > };
> > >
> > > struct btf_mod_pair {
> > > diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> > > index 86c7884abaf8..bca8e5936846 100644
> > > --- a/kernel/bpf/bpf_struct_ops.c
> > > +++ b/kernel/bpf/bpf_struct_ops.c
> > > @@ -143,6 +143,7 @@ void bpf_struct_ops_image_free(void *image)
> > > }
> > >
> > > #define MAYBE_NULL_SUFFIX "__nullable"
> > > +#define REF_ACQUIRED_SUFFIX "__ref_acquired"
> > > #define MAX_STUB_NAME 128
> > >
> > > /* Return the type info of a stub function, if it exists.
> > > @@ -204,6 +205,7 @@ static int prepare_arg_info(struct btf *btf,
> > > struct bpf_struct_ops_arg_info *arg_info)
> > > {
> > > const struct btf_type *stub_func_proto, *pointed_type;
> > > + bool is_nullable = false, is_ref_acquired = false;
> > > const struct btf_param *stub_args, *args;
> > > struct bpf_ctx_arg_aux *info, *info_buf;
> > > u32 nargs, arg_no, info_cnt = 0;
> > > @@ -240,8 +242,11 @@ static int prepare_arg_info(struct btf *btf,
> > > /* Skip arguments that is not suffixed with
> > > * "__nullable".
> > > */
> > > - if (!btf_param_match_suffix(btf, &stub_args[arg_no],
> > > - MAYBE_NULL_SUFFIX))
> > > + is_nullable = btf_param_match_suffix(btf, &stub_args[arg_no],
> > > + MAYBE_NULL_SUFFIX);
> > > + is_ref_acquired = btf_param_match_suffix(btf, &stub_args[arg_no],
> > > + REF_ACQUIRED_SUFFIX);
> > > + if (!(is_nullable || is_ref_acquired))
> > > continue;
> > >
> > > /* Should be a pointer to struct */
> > > @@ -269,11 +274,15 @@ static int prepare_arg_info(struct btf *btf,
> > > }
> > >
> > > /* Fill the information of the new argument */
> > > - info->reg_type =
> > > - PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
> > > info->btf_id = arg_btf_id;
> > > info->btf = btf;
> > > info->offset = offset;
> > > + if (is_nullable) {
> > > + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL;
> > > + } else if (is_ref_acquired) {
> > > + info->reg_type = PTR_TRUSTED | PTR_TO_BTF_ID;
> > > + info->ref_acquired = true;
> > > + }
> > >
> > > info++;
> > > info_cnt++;
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 8c95392214ed..e462fb4a4598 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -6316,7 +6316,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >
> > > /* this is a pointer to another type */
> > > for (i = 0; i < prog->aux->ctx_arg_info_size; i++) {
> > > - const struct bpf_ctx_arg_aux *ctx_arg_info = &prog->aux->ctx_arg_info[i];
> > > + struct bpf_ctx_arg_aux *ctx_arg_info =
> > > + (struct bpf_ctx_arg_aux *)&prog->aux->ctx_arg_info[i];
> > >
> > > if (ctx_arg_info->offset == off) {
> > > if (!ctx_arg_info->btf_id) {
> > > @@ -6324,9 +6325,16 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > > return false;
> > > }
> > >
> > > + if (ctx_arg_info->ref_acquired && !ctx_arg_info->ref_obj_id) {
> > > + bpf_log(log, "cannot acquire a reference to context argument offset %u\n", off);
> > > + return false;
> > > + }
> > > +
> > > info->reg_type = ctx_arg_info->reg_type;
> > > info->btf = ctx_arg_info->btf ? : btf_vmlinux;
> > > info->btf_id = ctx_arg_info->btf_id;
> > > + info->ref_obj_id = ctx_arg_info->ref_obj_id;
> > > + ctx_arg_info->ref_obj_id = 0;
> > > return true;
> >
> > I think this is fragile. What if the compiler produces two independent
> > paths in the program which read the skb pointer once?
> > Technically, the program is still reading the skb pointer once at runtime.
> > Then you will reset ref_obj_id to 0 when exploring one, and assign as
> > 0 in the other one, causing errors.
> > ctx_arg_info appears to be global for the program.
> >
> > I think the better way would be to check if ref_obj_id is still part
> > of the reference state.
> > If the ref_obj_id has already been dropped from reference_state, then
> > any loads should get ref_obj_id = 0.
> > That would happen when dropping or enqueueing the skb into qdisc,
> > which would (I presume) do release_reference_state(ref_obj_id).
> > Otherwise, all of them can share the same ref_obj_id. You won't have
> > to implement "can only read once" logic,
> > and when you enqueue stuff in the qdisc, all identical copies produced
> > from different load instructions will be invalidated.
> > Same ref_obj_id == unique ownership of the same object.
> > You can already have multiple copies through rX = rY, multiple ctx
> > loads of skb will produce a similar verifier state.
> >
> > So, on entry, assign ctx_arg_info->ref_obj_id uniquely, then on each load:
> > if reference_state.find(ctx_arg_info->ref_obj_id) == true; then
> > info->ref_obj_id = ctx_arg_info->ref_obj_id; else info->ref_obj_id =
> > 0;
> >
> > Let me know if I missed something.
>
> You are right. The current approach will falsely reject valid programs,
> and your suggestion makes sense.
Also, I wonder whether when ref_obj_id has been released, we should
mark the loaded register as unknown scalar, vs skb with ref_obj_id =
0?
Otherwise right now it will take PTR_TO_BTF_ID | PTR_TRUSTED as
reg_type, and I think verifier will permit reads even if ref_obj_id =
0.
This will surely be bad once skb is dropped/enqueued, since the
program should no longer be able to read such memory.
>
> Thanks,
> Amery
next prev parent reply other threads:[~2024-05-17 0:24 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-10 19:23 [RFC PATCH v8 00/20] bpf qdisc Amery Hung
2024-05-10 19:23 ` [RFC PATCH v8 01/20] bpf: Support passing referenced kptr to struct_ops programs Amery Hung
2024-05-16 23:59 ` Kumar Kartikeya Dwivedi
2024-05-17 0:17 ` Amery Hung
2024-05-17 0:23 ` Kumar Kartikeya Dwivedi [this message]
2024-05-17 1:22 ` Amery Hung
2024-05-17 2:00 ` Kumar Kartikeya Dwivedi
2024-05-10 19:23 ` [RFC PATCH v8 02/20] selftests/bpf: Test referenced kptr arguments of " Amery Hung
2024-05-10 21:33 ` Kui-Feng Lee
2024-05-10 22:16 ` Amery Hung
2024-05-16 23:14 ` Amery Hung
2024-05-16 23:43 ` Martin KaFai Lau
2024-05-17 0:54 ` Amery Hung
2024-05-17 1:07 ` Martin KaFai Lau
2024-05-10 19:23 ` [RFC PATCH v8 03/20] bpf: Allow struct_ops prog to return referenced kptr Amery Hung
2024-05-17 2:06 ` Amery Hung
2024-05-17 5:30 ` Martin KaFai Lau
2024-05-10 19:23 ` [RFC PATCH v8 04/20] selftests/bpf: Test returning kptr from struct_ops programs Amery Hung
2024-05-10 19:23 ` [RFC PATCH v8 05/20] bpf: Generate btf_struct_metas for kernel BTF Amery Hung
2024-05-10 19:23 ` [RFC PATCH v8 06/20] bpf: Recognize kernel types as graph values Amery Hung
2024-05-10 19:23 ` [RFC PATCH v8 07/20] bpf: Allow adding kernel objects to collections Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 08/20] selftests/bpf: Test adding kernel object to bpf graph Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 09/20] bpf: Find special BTF fields in union Amery Hung
2024-05-16 23:37 ` Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 10/20] bpf: Introduce exclusive-ownership list and rbtree nodes Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 11/20] bpf: Allow adding exclusive nodes to bpf list and rbtree Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 12/20] selftests/bpf: Modify linked_list tests to work with macro-ified removes Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 13/20] bpf: net_sched: Support implementation of Qdisc_ops in bpf Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 14/20] bpf: net_sched: Add bpf qdisc kfuncs Amery Hung
2024-05-22 23:55 ` Martin KaFai Lau
2024-05-23 1:06 ` Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 15/20] bpf: net_sched: Allow more optional methods in Qdisc_ops Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 16/20] libbpf: Support creating and destroying qdisc Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 17/20] selftests: Add a basic fifo qdisc test Amery Hung
2024-05-21 3:15 ` Stanislav Fomichev
2024-05-21 15:03 ` Amery Hung
2024-05-21 17:57 ` Stanislav Fomichev
2024-05-10 19:24 ` [RFC PATCH v8 18/20] selftests: Add a bpf fq qdisc to selftest Amery Hung
2024-05-24 6:24 ` Martin KaFai Lau
2024-05-24 7:40 ` Toke Høiland-Jørgensen
2024-05-26 1:08 ` Martin KaFai Lau
2024-05-27 10:09 ` Toke Høiland-Jørgensen
2024-05-24 19:33 ` Alexei Starovoitov
2024-05-24 20:54 ` Martin KaFai Lau
2024-05-10 19:24 ` [RFC PATCH v8 19/20] selftests: Add a bpf netem " Amery Hung
2024-05-10 19:24 ` [RFC PATCH v8 20/20] selftests: Add a prio bpf qdisc Amery Hung
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAP01T74mQPMktHJiPoZ7z-UfFCRoxOexpBe_X2v3rLpE5A+WEA@mail.gmail.com \
--to=memxor@gmail.com \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=martin.lau@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=sinquersw@gmail.com \
--cc=toke@redhat.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yangpeihao@sjtu.edu.cn \
--cc=yepeilin.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).