From: Kui-Feng Lee <sinquersw@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>,
Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
netdev@vger.kernel.org, Kui-Feng Lee <thinker.li@gmail.com>
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Tue, 30 Jan 2024 09:49:26 -0800 [thread overview]
Message-ID: <845df264-adb3-4e00-bb8e-2a0ac1d331ae@gmail.com> (raw)
In-Reply-To: <8c00bd63-2d00-401e-af6d-1b6aebac4701@linux.dev>
On 1/29/24 22:39, Martin KaFai Lau wrote:
> On 1/26/24 5:17 PM, Amery Hung wrote:
>> On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau
>> <martin.lau@linux.dev> wrote:
>>>
>>> On 1/23/24 9:22 PM, Amery Hung wrote:
>>>>> I looked at the high level of the patchset. The major ops that it
>>>>> wants to be
>>>>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and
>>>>> ".reset" in
>>>>> patch 4 and patch 5).
>>>>>
>>>>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach
>>>>> types (each for
>>>>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new
>>>>> "bpf_qdisc_ctx" in the
>>>>> uapi. It is no long an acceptable way to add new bpf extension.
>>>>>
>>>>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely
>>>>> implemented
>>>>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops
>>>>> for Qdisc_ops
>>>>> can be created. The bpf Qdisc_ops can be loaded through the
>>>>> existing struct_ops api.
>>>>>
>>>> Partially. If using struct_ops, I think we'll need another structure
>>>> like the following in bpf qdisc to be implemented with struct_ops bpf:
>>>>
>>>> struct bpf_qdisc_ops {
>>>> int (*enqueue) (struct sk_buff *skb)
>>>> void (*dequeue) (void)
>>>> void (*init) (void)
>>>> void (*reset) (void)
>>>> };
>>>>
>>>> Then, Qdisc_ops will wrap around them to handle things that cannot be
>>>> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
>>>
>>> We can see how those limitations (calling sch_tree_lock() and
>>> returning a ptr)
>>> can be addressed in bpf. This will also help other similar use cases.
>>>
>>
>> For kptr, I wonder if we can support the following semantics in bpf if
>> they make sense:
>
> I think they are useful but they are not fully supported now.
>
> Some thoughts below.
>
>> 1. Passing a referenced kptr into a bpf program, which will also need
>> to be released, or exchanged into maps or allocated objects.
>
> "enqueue" should be the one considering here:
>
> struct Qdisc_ops {
> /* ... */
> int (*enqueue)(struct sk_buff *skb,
> struct Qdisc *sch,
> struct sk_buff **to_free);
>
> };
>
> The verifier only marks the skb as a trusted kptr but does not mark its
> reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
>
> if (prog_args_trusted(prog))
> info->reg_type |= PTR_TRUSTED;
>
> The verifier does not know the skb ownership is passed into the
> ".enqueue" ops and does not know the bpf prog needs to release it or
> store it in a map.
>
> The verifier tracks the reference state when a KF_ACQUIRE kfunc is
> called (just an example, not saying we need to use KF_ACQUIRE kfunc).
> Take a look at acquire_reference_state() which is the useful one here.
>
> Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier
> can always acquire_reference_state() for the "struct sk_buff *skb"
> argument.
>
> Take a look at a recent RFC:
> https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@gmail.com/
> which is tagging the argument of an ops (e.g. ".enqueue" here). That RFC
> patch is tagging the argument could be NULL by appending "__nullable" to
> the argument name. The verifier will enforce that the bpf prog must
> check for NULL first.
>
> The similar idea can be used here but with a different tagging (for
> example, "__must_release", admittedly not a good name). While the RFC
> patch is in-progress, for now, may be hardcode for the ".enqueue" ops in
> check_struct_ops_btf_id() and always acquire_reference_state() for the
> skb. This part can be adjusted later once the RFC patch will be in shape.
>
>
> Then one more thing is to track when the struct_ops bpf prog is actually
> reading the value of the skb pointer. One thing is worth to mention
> here, e.g. a struct_ops prog for enqueue:
>
> SEC("struct_ops")
> int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
> struct sk_buff **to_free)
> {
> return bpf_qdisc_drop(skb, sch, to_free);
> }
>
> Take a look at the BPF_PROG macro, the bpf prog is getting a pointer to
> an array of __u64 as the only argument. The skb is actually in ctx[0],
> sch is in ctx[1]...etc. When ctx[0] is read to get the skb pointer (e.g.
> r1 = ctx[0]), btf_ctx_access() marks the reg_type to PTR_TRUSTED. It
> needs to also initialize the reg->ref_obj_id by the id obtained earlier
> from acquire_reference_state() during check_struct_ops_btf_id() somehow.
>
>
>> 2. Returning a kptr from a program and treating it as releasing the
>> reference.
>
> e.g. for dequeue:
>
> struct Qdisc_ops {
> /* ... */
> struct sk_buff * (*dequeue)(struct Qdisc *);
> };
>
>
> Right now the verifier should complain on check_reference_leak() if the
> struct_ops bpf prog is returning a referenced kptr.
>
> Unlike an argument, the return type of a function does not have a name
> to tag. It is the first case that a struct_ops bpf_prog returning a
We may tag the stub functions instead, right?
Is the purpose here to return a referenced pointer from a struct_ops
operator without verifier complaining?
> pointer. One idea is to assume it must be a trusted pointer
> (PTR_TRUSTED) and the verifier should check it is indeed with
> PTR_TRUSTED flag.
>
> May be release_reference_state() can be called to assume the kernel will
> release it as long as the return pointer type is PTR_TRUSTED and the
> type matches the return type of the ops. Take a look at
> check_return_code().
>
next prev parent reply other threads:[~2024-01-30 17:49 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-20 11:41 ` kernel test robot
2024-01-20 12:13 ` kernel test robot
2024-01-20 13:47 ` kernel test robot
2024-01-23 23:51 ` Martin KaFai Lau
2024-01-24 5:22 ` Amery Hung
2024-01-26 2:22 ` Martin KaFai Lau
2024-01-27 1:17 ` Amery Hung
2024-01-30 6:39 ` Martin KaFai Lau
2024-01-30 17:49 ` Kui-Feng Lee [this message]
2024-01-31 1:01 ` Martin KaFai Lau
2024-01-31 16:49 ` Kui-Feng Lee
2024-01-31 16:59 ` Amery Hung
2024-01-31 16:23 ` Amery Hung
2024-02-02 1:47 ` Martin KaFai Lau
2024-02-09 20:14 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23 0:17 ` Andrii Nakryiko
2024-01-23 19:40 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29 ` Daniel Borkmann
2024-01-26 19:49 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10 ` Daniel Borkmann
2024-01-24 12:09 ` Jamal Hadi Salim
2024-01-24 13:07 ` Daniel Borkmann
2024-01-24 14:11 ` Jamal Hadi Salim
2024-01-24 15:26 ` Daniel Borkmann
2024-01-24 21:26 ` Amery Hung
2024-01-25 11:57 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=845df264-adb3-4e00-bb8e-2a0ac1d331ae@gmail.com \
--to=sinquersw@gmail.com \
--cc=ameryhung@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=thinker.li@gmail.com \
--cc=toke@redhat.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yangpeihao@sjtu.edu.cn \
--cc=yepeilin.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.