From: Martin KaFai Lau <martin.lau@linux.dev>
To: David Vernet <void@manifault.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com,
jolsa@kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com, tj@kernel.org, clm@meta.com,
thinker.li@gmail.com, Stanislav Fomichev <sdf@google.com>
Subject: Re: [PATCH bpf-next] bpf: Support default .validate() and .update() behavior for struct_ops links
Date: Fri, 11 Aug 2023 10:35:03 -0700
Message-ID: <fe388d79-bdfc-0480-5f4b-1a40016fd53d@linux.dev>
In-Reply-To: <ZNVvfYEsLyotn+G1@google.com>
On 8/10/23 4:15 PM, Stanislav Fomichev wrote:
> On 08/10, David Vernet wrote:
>> On Thu, Aug 10, 2023 at 03:46:18PM -0700, Stanislav Fomichev wrote:
>>> On 08/10, David Vernet wrote:
>>>> Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also
>>>> define the .validate() and .update() callbacks in its corresponding
>>>> struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful
>>>> in its own right to ensure that the map is unloaded if an application
>>>> crashes. For example, with sched_ext, we want to automatically unload
>>>> the host-wide scheduler if the application crashes. We would likely
>>>> never support updating elements of a sched_ext struct_ops map, so we'd
>>>> have to implement these callbacks showing that they _can't_ support
>>>> element updates just to benefit from the basic lifetime management of
>>>> struct_ops links.
>>>>
>>>> Let's enable struct_ops maps to work with BPF_F_LINK even if they
>>>> haven't defined these callbacks, by assuming that a struct_ops map
>>>> element cannot be updated by default.
>>>
>>> Any reason this is not part of sched_ext series? As you mention,
>>> we don't seem to have such users in the tree?
>>
>> Hi Stanislav,
>>
>> The sched_ext series [0] implements these callbacks. See
>> bpf_scx_update() and bpf_scx_validate().
>>
>> [0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@kernel.org/
>>
>> We could add this into that series and remove those callbacks, but this
>> patch is fixing a UX / API issue with struct_ops links that's not really
>> relevant to sched_ext. I don't think there's any reason to couple
>> updating struct_ops map elements with allowing the kernel to manage the
>> lifetime of struct_ops maps -- just because we only have 1 (non-test)
Agreed, link-update does not need to be coupled with link-creation, so
removing the enforcement of the 'link' update function is ok. The intention was
to avoid an inconsistent struct_ops link experience (one struct_ops link
supports update and another does not), because consistency was one of the
reasons for the true kernel-backed link support that Kui-Feng did. tcp-cc is
the only struct_ops user for now and it can support update, so the enforcement
is there. I can see Stan's point that removing it now looks premature before a
struct_ops has landed in the kernel showing that supporting 'link' update does
not make sense or is very hard. However, the scx patch set has shown this
point, so I think it is good enough.
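
A minimal userspace sketch of the decoupling described above, assuming a
simplified model rather than the kernel's real struct bpf_struct_ops: link
creation never looks at the update callback, and a link update on a type
without one is refused. All names here (ops_type, ops_map, ops_link,
link_create, link_update) are hypothetical, and the -EOPNOTSUPP return value is
an assumption, not something stated in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a struct_ops type; not the kernel definition. */
struct ops_type {
	/* Optional: swap the implementation behind a live link. */
	int (*update)(void *new_data, void *old_data);
};

struct ops_map {
	const struct ops_type *type;
	void *data;
};

struct ops_link {
	struct ops_map *map;
};

/* Creating a link does not depend on the type supporting update. */
static int link_create(struct ops_link *link, struct ops_map *map)
{
	link->map = map;
	return 0;
}

/* Updating a link is refused when the type has no update callback. */
static int link_update(struct ops_link *link, struct ops_map *new_map)
{
	const struct ops_type *type = link->map->type;
	int err;

	if (new_map->type != type)
		return -EINVAL;
	if (!type->update)
		return -EOPNOTSUPP; /* assumed default for "cannot update" */

	err = type->update(new_map->data, link->map->data);
	if (err)
		return err;

	link->map = new_map;
	return 0;
}

/* Trivial update callback that always accepts the swap. */
static int demo_swap(void *new_data, void *old_data)
{
	(void)new_data;
	(void)old_data;
	return 0;
}

/* One type that supports update (like tcp-cc), one that does not. */
static const struct ops_type updatable_type = { .update = demo_swap };
static const struct ops_type fixed_type = { .update = NULL };
```

With this model, a link on fixed_type still gets the lifetime management of a
link, and only the update request fails; a link on updatable_type behaves as
before.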
For 'validate', it is not related to a 'link' update. It is for the struct_ops
'map' update. If the loaded struct_ops map is invalid, it will end up as a
useless struct_ops map and no link can be created from it. I can see some
struct_ops subsystems checking every 'ops' function for NULL before calling it
(like the FUSE RFC). I can also see some future struct_ops preferring not to
check for NULL at all and instead assuming that a subset of the ops is always
valid. Is the 'validate' enforcement blocking the scx patchset in some way? If
not, I would like to keep it for now. Once it is removed, there is no turning
back.
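
One reading of the 'validate' point, sketched as a userspace model under stated
assumptions: when a type provides validate, an invalid ops table is rejected at
map update time; when it does not, the map update succeeds and the problem only
surfaces later, at link creation, leaving a map that cannot be used. The names
(demo_ops, demo_type, demo_map, demo_map_update, demo_link_create) and the rule
that 'init' must be non-NULL are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table, for illustration only. */
struct demo_ops {
	int (*init)(void);
	void (*release)(void);
};

struct demo_type {
	/* Optional: checked when the map element is updated. */
	int (*validate)(const struct demo_ops *ops);
	/* Registration, run when a link is created from the map. */
	int (*reg)(const struct demo_ops *ops);
};

struct demo_map {
	const struct demo_type *type;
	struct demo_ops ops;
	int loaded; /* nonzero once an ops table has been stored */
};

/* Assumed subsystem rule: 'init' must always be provided. */
static int demo_check_init(const struct demo_ops *ops)
{
	return ops->init ? 0 : -EINVAL;
}

/* Map update: reject invalid ops early if the type can validate. */
static int demo_map_update(struct demo_map *map, const struct demo_ops *ops)
{
	if (map->type->validate) {
		int err = map->type->validate(ops);

		if (err)
			return err;
	}
	map->ops = *ops;
	map->loaded = 1;
	return 0;
}

/* Link creation: registration fails for ops the subsystem cannot use. */
static int demo_link_create(const struct demo_map *map)
{
	if (!map->loaded)
		return -ENOENT;
	return map->type->reg(&map->ops);
}

static int demo_init(void)
{
	return 0;
}

/* Same subsystem rule, with and without an early validate hook. */
static const struct demo_type checked_type = {
	.validate = demo_check_init,
	.reg = demo_check_init,
};
static const struct demo_type unchecked_type = {
	.validate = NULL,
	.reg = demo_check_init,
};
```

In this model the unchecked type accepts an ops table without 'init' and then
refuses every link created from it, which is the "useless map" case; the
checked type reports the same error at map update time instead.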
>> struct_ops implementation in-tree doesn't mean we shouldn't improve APIs
>> where it makes sense.
>>
>> Thanks,
>> David
>
> Ack. I guess up to you and Martin. Just trying to understand whether I'm
> missing something or the patch does indeed fix some use-case :-)