From: Yonghong Song <yonghong.song@linux.dev>
To: Alan Maguire <alan.maguire@oracle.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
David Vernet <void@manifault.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
Stephen Brennan <stephen.s.brennan@oracle.com>
Subject: Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu
Date: Thu, 3 Aug 2023 08:22:13 -0700 [thread overview]
Message-ID: <cddeb658-563e-9ff9-0ece-4509eabab663@linux.dev> (raw)
In-Reply-To: <998f8e89-fb00-820f-15d9-1d227cc09e54@oracle.com>
On 8/3/23 1:21 AM, Alan Maguire wrote:
> On 02/08/2023 17:33, Alexei Starovoitov wrote:
>> On Tue, Aug 1, 2023 at 8:30 PM David Vernet <void@manifault.com> wrote:
>>> I agree that this is the correct way to generalize this. The only thing
>>> that we'll have to figure out is how to generalize treating const struct
>>> cpumask * objects as kptrs. In sched_ext [0] we export
>>> scx_bpf_get_idle_cpumask() and scx_bpf_get_idle_smtmask() kfuncs to
>>> return trusted global cpumask kptrs that can then be "released" in
>>> scx_bpf_put_idle_cpumask(). scx_bpf_put_idle_cpumask() is empty and
>>> exists only to appease the verifier that the trusted cpumask kptrs
>>> aren't being leaked and are having their references "dropped".
>>
>> why is it KF_ACQUIRE ?
>> I think it can just return a trusted pointer without acquire.
>>
>>> [0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@kernel.org/
>>>
>>> I'd imagine that we have 2 ways forward if we want to enable progs to
>>> fetch other global cpumasks with static lifetimes (e.g.
>>> __cpu_possible_mask or nohz.idle_cpus_mask):
>>>
>>> 1. The most straightforward thing to do would be to add a new kfunc in
>>> kernel/bpf/cpumask.c that's a drop-in replacment for
>>> scx_bpf_put_idle_cpumask():
>>>
>>> void bpf_global_cpumask_drop(const struct cpumask *cpumask)
>>> {}
>>>
>>> 2. Another would be to implement something resembling what Yonghong
>>> suggested in [1], where progs can link against global allocated kptrs
>>> like:
>>>
>>> const struct cpumask *__cpu_possible_mask __ksym;
>>>
>>> [1]: https://lore.kernel.org/all/3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@linux.dev/#t
>>>
>>> In my opinion (1) is more straightforward, (2) is a better UX.
>>
>> 1 = adding few kfuncs.
>> 2 = teaching pahole to emit certain global vars.
>>
>> nm vmlinux|g -w D|g -v __SCK_|g -v __tracepoint_|wc -l
>> 1998
>>
>> imo BTF increase trade off is acceptable.
>
> Agreed, Stephen's numbers on BTF size increase were pretty modest [1].
>
> What was gating that work in my mind was previous discussion around
> splitting aspects of BTF into a "vmlinux-extra". Experiments with this
> seemed to show it's hard to support, and worse, tooling would have to
> learn about its existence. We have to come up with a CONFIG convention
> about specifying what ends up in -extra versus core vmlinux BTF, what do
> we do about modules, etc. All feels like over-complication.
>
> I think a better path would be to support BTF in a vmlinux BTF module
> (controlled by making CONFIG_DEBUG_INFO_BTF tristate). The module is
> separately loadable, but puts vmlinux in the same place for tools -
> /sys/kernel/btf/vmlinux. That solves already-existing issues of BTF size
> for embedded use cases that have come up a few times, and lessens
> concerns about BTF size for other users, while it all works with
> existing tooling. I have a basic proof-of-concept but it will take time
> to hammer into shape.
>
> Because variable-related size increases are pretty modest, so should we
> proceed with the BTF variable support anyway? We can modularize BTF
> separately later on for those concerned about BTF size.
Alan, it seems a consensus has reached that we should include
global variables (excluding special kernel made ones like
__SCK_ and __tracepoint_) in vmlinux BTF.
please go ahead and propose a patch. Thanks!
>
> [1]
> https://lore.kernel.org/bpf/20221104231103.752040-1-stephen.s.brennan@oracle.com/
next prev parent reply other threads:[~2023-08-03 15:22 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-01 14:29 [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 1/3] bpf: Add bpf_for_each_cpu helper Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 2/3] cgroup, psi: Init root cgroup psi to psi_system Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add selftest for for_each_cpu Yafang Shao
2023-08-01 17:53 ` [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yonghong Song
2023-08-02 2:33 ` Yafang Shao
2023-08-02 2:45 ` Alexei Starovoitov
2023-08-02 2:57 ` Yafang Shao
2023-08-02 3:29 ` David Vernet
2023-08-02 6:54 ` Yonghong Song
2023-08-02 15:46 ` David Vernet
2023-08-02 16:23 ` Alexei Starovoitov
2023-08-02 16:33 ` Alexei Starovoitov
2023-08-02 17:06 ` David Vernet
2023-08-02 18:13 ` Alexei Starovoitov
2023-08-03 8:21 ` Alan Maguire
2023-08-03 15:22 ` Yonghong Song [this message]
2023-08-03 16:10 ` Alan Maguire
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cddeb658-563e-9ff9-0ece-4509eabab663@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alan.maguire@oracle.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=laoar.shao@gmail.com \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=stephen.s.brennan@oracle.com \
--cc=void@manifault.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.