From: Martin KaFai Lau <martin.lau@linux.dev>
To: kuifeng@meta.com
Cc: bpf@vger.kernel.org, ast@kernel.org, song@kernel.org,
kernel-team@meta.com, andrii@kernel.org,
Stanislav Fomichev <sdf@google.com>,
Kui-Feng Lee <sinquersw@gmail.com>
Subject: Re: [RFC bpf-next 1/5] bpf: enable sleepable BPF programs attached to cgroup/{get,set}sockopt.
Date: Wed, 2 Aug 2023 15:28:16 -0700 [thread overview]
Message-ID: <2cbc2699-a88d-5798-eabd-b4dafdf6d100@linux.dev> (raw)
In-Reply-To: <CAKH8qBsn9e+ROsBN9EJ9mWQ6T_1=d0adHYPQ37WwM0TVn1H9hw@mail.gmail.com>
On 8/1/23 11:08 AM, Stanislav Fomichev wrote:
> On Tue, Aug 1, 2023 at 10:31 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 7/31/23 16:35, Stanislav Fomichev wrote:
>>> On Mon, Jul 31, 2023 at 3:02 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>>
>>>> Sorry for the late reply! I just backed from a vacation.
>>>>
>>>>
>>>> On 7/24/23 11:36, Stanislav Fomichev wrote:
>>>>> On 07/21, kuifeng@meta.com wrote:
>>>>>> From: Kui-Feng Lee <kuifeng@meta.com>
>>>>>>
>>>>>> Enable sleepable cgroup/{get,set}sockopt hooks.
>>>>>>
>>>>>> The sleepable BPF programs attached to cgroup/{get,set}sockopt hooks may
>>>>>> received a pointer to the optval in user space instead of a kernel
>>>>>> copy. ctx->user_optval and ctx->user_optval_end are the pointers to the
>>>>>> begin and end of the user space buffer if receiving a user space
>>>>>> buffer. ctx->optval and ctx->optval_end will be a kernel copy if receiving
>>>>>> a kernel space buffer.
>>>>>>
>>>>>> A program receives a user space buffer if ctx->flags &
>>>>>> BPF_SOCKOPT_FLAG_OPTVAL_USER is true, otherwise it receives a kernel space
>>>>>> buffer. The BPF programs should not read/write from/to a user space buffer
>>>>>> dirrectly. It should access the buffer through bpf_copy_from_user() and
>>>>>> bpf_copy_to_user() provided in the following patches.
>>>>>>
>>>>>> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
>>>>>> ---
>>>>>> include/linux/filter.h | 3 +
>>>>>> include/uapi/linux/bpf.h | 9 ++
>>>>>> kernel/bpf/cgroup.c | 189 ++++++++++++++++++++++++++-------
>>>>>> kernel/bpf/verifier.c | 7 +-
>>>>>> tools/include/uapi/linux/bpf.h | 9 ++
>>>>>> tools/lib/bpf/libbpf.c | 2 +
>>>>>> 6 files changed, 176 insertions(+), 43 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>>>>>> index f69114083ec7..301dd1ba0de1 100644
>>>>>> --- a/include/linux/filter.h
>>>>>> +++ b/include/linux/filter.h
>>>>>> @@ -1345,6 +1345,9 @@ struct bpf_sockopt_kern {
>>>>>> s32 level;
>>>>>> s32 optname;
>>>>>> s32 optlen;
>>>>>> + u32 flags;
>>>>>> + u8 *user_optval;
>>>>>> + u8 *user_optval_end;
>>>>>> /* for retval in struct bpf_cg_run_ctx */
>>>>>> struct task_struct *current_task;
>>>>>> /* Temporary "register" for indirect stores to ppos. */
>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>> index 739c15906a65..b2f81193f97b 100644
>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>> @@ -7135,6 +7135,15 @@ struct bpf_sockopt {
>>>>>> __s32 optname;
>>>>>> __s32 optlen;
>>>>>> __s32 retval;
>>>>>> +
>>>>>> + __bpf_md_ptr(void *, user_optval);
>>>>>> + __bpf_md_ptr(void *, user_optval_end);
>>>>>
>>>>> Can we re-purpose existing optval/optval_end pointers
>>>>> for the sleepable programs? IOW, when the prog is sleepable,
>>>>> pass user pointers via optval/optval_end and require the programs
>>>>> to do copy_to/from on this buffer (even if the backing pointer might be
>>>>> in kernel memory - we can handle that in the kfuncs?).
>>>>>
>>>>> The fact that the program now needs to look at the flag
>>>>> (BPF_SOCKOPT_FLAG_OPTVAL_USER) and decide which buffer to
>>>>> use makes the handling even more complicated; and we already have a
>>>>> bunch of hairy stuff in these hooks. (or I misreading the change?)
>>>>>
>>>>> Also, regarding sleepable and non-sleepable co-existence: do we really need
>>>>> that? Can we say that all the programs have to be sleepable
>>>>> or non-sleepable? Mixing them complicates the sharing of that buffer.
>>>>
>>>> I considered this approach as well. This is an open question for me.
>>>> If we go this way, it means we can not attach a BPF program of a type
>>>> if any program of the other type has been installed.
>>>
>>> If we pass two pointers (kernel copy buffer + real user mem) to the
>>> sleepable program, we'll make it even more complicated by inheriting
>>> all existing warts of the non-sleepable version :-( >>> IOW, feels like we should try to see if we can have some
>>> copy_to/from_user kfuncs in the sleepable version that transparently
>>> support either kernel or user memory (and prohibit direct access to
>>> user_optval in the sleepable version).
From looking at patch 5 selftest, I also think exposing user_optval_* and flags
is not ideal. For example, correct me if I am wrong, in patch 3, dynptr is only
used for setsockopt to alloc. Intuitively, when developing a bpf prog, I would
expect using bpf_dynptr_write() to write a new sockopt and then done. However,
it still needs to "install" (by calling bpf_sockopt_install_optval). I think the
"install" part is leaking too much internal details.
Beside, adding both new 'ctx->user_optval + len > ctx->user_optval_end' and
dynptr usage pattern together is counter productive considering dynptr is to
avoid the length comparison. Saving an unnecessary "copy_from_user(ctx.optval,
optval,...)" is more important than being able to directly read from
ctx->user_optval. The bpf prog is usually only interested in a few optnames and
directly returns without even looking at the optval for the uninterested
optnames. The current __cgroup_bpf_run_filter_{get,set}sockopt always does a
"copy_from_user(ctx.optval, optval,...)".
>>> And then, if we have one non-sleepable program in the chain, we can
>>> fallback everything to the kernel buffer (maybe).
>>> This way seems like we can support both versions in the same chain and
>>> have a more sane api?
>>
>> Basically, you are saying to move cp_from_optval() and cp_to_optval() in
>> the testcase to kfuncs. This can cause unnecessary copy. We can add
>> an API to make a dynptr from the ctx to avoid unnecessary copies.
>
> Yeah, handle this transparently in the kfunc or via dynptr, whatever works.
> I'm not too worried about the extra copy tbh, this is a slow path; I'm
> more concerned about improving the bpf program / user experience.
+1. It will be great if all can be done in two kfunc (/dynptr_{write,read}). I
would disallow sleepable prog to use the optval if it can make things simpler.
If it goes with dynptr, need to support bpf_dynptr_slice() as well which I think
should be doable after a quick thought.
The test needs to include a cgrp->effective array that has interleaved sleepable
and non-sleepable bpf progs.
next prev parent reply other threads:[~2023-08-02 22:28 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-22 5:22 [RFC bpf-next 0/5] Sleepable BPF programs on cgroup {get,set}sockopt kuifeng
2023-07-22 5:22 ` [RFC bpf-next 1/5] bpf: enable sleepable BPF programs attached to cgroup/{get,set}sockopt kuifeng
2023-07-22 16:09 ` kernel test robot
2023-07-22 18:22 ` kernel test robot
2023-07-24 18:36 ` Stanislav Fomichev
2023-07-31 22:02 ` Kui-Feng Lee
2023-07-31 23:35 ` Stanislav Fomichev
2023-08-01 17:31 ` Kui-Feng Lee
2023-08-01 18:08 ` Stanislav Fomichev
2023-08-02 22:28 ` Martin KaFai Lau [this message]
2023-08-02 19:25 ` Martin KaFai Lau
2023-07-22 5:22 ` [RFC bpf-next 2/5] bpf: Provide bpf_copy_from_user() and bpf_copy_to_user() kuifeng
2023-07-22 6:39 ` kernel test robot
2023-07-22 6:49 ` kernel test robot
2023-07-22 7:41 ` kernel test robot
2023-07-22 20:26 ` kernel test robot
2023-08-02 19:59 ` Martin KaFai Lau
2023-07-22 5:22 ` [RFC bpf-next 3/5] bpf: Add a new dynptr type for CGRUP_SOCKOPT kuifeng
2023-07-22 5:22 ` [RFC bpf-next 4/5] bpf: Prevent BPF programs from access the buffer pointed by user_optval kuifeng
2023-07-22 5:22 ` [RFC bpf-next 5/5] bpf: Add test cases for sleepable BPF programs of the CGROUP_SOCKOPT type kuifeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2cbc2699-a88d-5798-eabd-b4dafdf6d100@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=kernel-team@meta.com \
--cc=kuifeng@meta.com \
--cc=sdf@google.com \
--cc=sinquersw@gmail.com \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.