Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Yonghong Song <yonghong.song@linux.dev>
To: David Vernet <void@manifault.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu
Date: Tue, 1 Aug 2023 23:54:18 -0700	[thread overview]
Message-ID: <8b6b0703-4ed6-c0cb-c61a-9ebcfb5fe668@linux.dev> (raw)
In-Reply-To: <20230802032958.GB472124@maniforge>



On 8/1/23 8:29 PM, David Vernet wrote:
> On Tue, Aug 01, 2023 at 07:45:57PM -0700, Alexei Starovoitov wrote:
>> On Tue, Aug 1, 2023 at 7:34 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>>
>>>>
>>>> In kernel, we have a global variable
>>>>      nr_cpu_ids (also in kernel/bpf/helpers.c)
>>>> which is used in numerous places for per cpu data struct access.
>>>>
>>>> I am wondering whether we could have bpf code like
>>>>      int nr_cpu_ids __ksym;
> 
> I think this would be useful in general, though any __ksym variable like
> this would have to be const and mapped in .rodata, right? But yeah,
> being able to R/O map global variables like this which have static
> lifetimes would be nice.

No. There is no map here. __ksym symbol will have a ld_imm64 insn
to load the value in the bpf code. The address will be the kernel
address patched by libbpf.

> 
> It's not quite the same thing as nr_cpu_ids, but FWIW, you could
> accomplish something close to this by doing something like this in your
> BPF prog:
> 
> /* Set in user space to libbpf_num_possible_cpus() */
> const volatile __u32 nr_cpus;

This is another approach. In this case, nr_cpus will be
stored in a map.

I don't know the difference between kernel nr_cpu_ids vs.
libbpf_num_possible_cpus(). I am choosing nr_cpu_ids
because it is the one used inside the kernel. If
libbpf_num_possible_cpus() effectively nr_cpu_ids,
then happy to use libbpf_num_possible_cpus() which
is more user/libbpf friendly.

> 
> ...
> 	__u32 i;
> 
> 	bpf_for(i, 0, nr_cpus)
> 		bpf_printk("Iterating over cpu %u", i);
> 
> ...
> 
>>>>      struct bpf_iter_num it;
>>>>      int i = 0;
>>>>
>>>>      // nr_cpu_ids is special, we can give it a range [1, CONFIG_NR_CPUS].
>>>>      bpf_iter_num_new(&it, 1, nr_cpu_ids);
>>>>      while ((v = bpf_iter_num_next(&it))) {
>>>>             /* access cpu i data */
>>>>             i++;
>>>>      }
>>>>      bpf_iter_num_destroy(&it);
>>>>
>>>>   From all existing open coded iterator loops, looks like
>>>> upper bound has to be a constant. We might need to extend support
>>>> to bounded scalar upper bound if not there.
>>>
>>> Currently the upper bound is required by both the open-coded for-loop
>>> and the bpf_loop. I think we can extend it.
>>>
>>> It can't handle the cpumask case either.
>>>
>>>      for_each_cpu(cpu, mask)
>>>
>>> In the 'mask', the CPU IDs might not be continuous. In our container
>>> environment, we always use the cpuset cgroup for some critical tasks,
>>> but it is not so convenient to traverse the percpu data of this cpuset
>>> cgroup.  We have to do it as follows for this case :
>>>
>>> That's why we prefer to introduce a bpf_for_each_cpu helper. It is
>>> fine if it can be implemented as a kfunc.
>>
>> I think open-coded-iterators is the only acceptable path forward here.
>> Since existing bpf_iter_num doesn't fit due to sparse cpumask,
>> let's introduce bpf_iter_cpumask and few additional kfuncs
>> that return cpu_possible_mask and others.
> 
> I agree that this is the correct way to generalize this. The only thing
> that we'll have to figure out is how to generalize treating const struct
> cpumask * objects as kptrs. In sched_ext [0] we export
> scx_bpf_get_idle_cpumask() and scx_bpf_get_idle_smtmask() kfuncs to
> return trusted global cpumask kptrs that can then be "released" in
> scx_bpf_put_idle_cpumask(). scx_bpf_put_idle_cpumask() is empty and
> exists only to appease the verifier that the trusted cpumask kptrs
> aren't being leaked and are having their references "dropped".
> 
> [0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@kernel.org/
> 
> I'd imagine that we have 2 ways forward if we want to enable progs to
> fetch other global cpumasks with static lifetimes (e.g.
> __cpu_possible_mask or nohz.idle_cpus_mask):
> 
> 1. The most straightforward thing to do would be to add a new kfunc in
>     kernel/bpf/cpumask.c that's a drop-in replacment for
>     scx_bpf_put_idle_cpumask():
> 
> void bpf_global_cpumask_drop(const struct cpumask *cpumask)
> {}
> 
> 2. Another would be to implement something resembling what Yonghong
>     suggested in [1], where progs can link against global allocated kptrs
>     like:
> 
> const struct cpumask *__cpu_possible_mask __ksym;
> 
> [1]: https://lore.kernel.org/all/3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@linux.dev/#t
> 
> In my opinion (1) is more straightforward, (2) is a better UX.
> 
> Note again that both approaches only works for cpumasks with static
> lifetimes.  I can't think of a way to treat dynamically allocated struct
> cpumask *objects as kptrs as there's nowhere to put a reference. If
> someone wants to track a dynamically allocated cpumask, they'd have to
> create a kptr out of its container object, and then pass that object's
> cpumask as a const struct cpumask * to other BPF cpumask kfuncs
> (including e.g. the proposed iterator).
> 
>> We already have some cpumask support in kernel/bpf/cpumask.c
>> bpf_iter_cpumask will be a natural follow up.
> 
> Yes, this should be easy to add.
> 
> - David

next prev parent reply	other threads:[~2023-08-02  6:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01 14:29 [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 1/3] bpf: Add bpf_for_each_cpu helper Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 2/3] cgroup, psi: Init root cgroup psi to psi_system Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add selftest for for_each_cpu Yafang Shao
2023-08-01 17:53 ` [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yonghong Song
2023-08-02  2:33   ` Yafang Shao
2023-08-02  2:45     ` Alexei Starovoitov
2023-08-02  2:57       ` Yafang Shao
2023-08-02  3:29       ` David Vernet
2023-08-02  6:54         ` Yonghong Song [this message]
2023-08-02 15:46           ` David Vernet
2023-08-02 16:23             ` Alexei Starovoitov
2023-08-02 16:33         ` Alexei Starovoitov
2023-08-02 17:06           ` David Vernet
2023-08-02 18:13             ` Alexei Starovoitov
2023-08-03  8:21           ` Alan Maguire
2023-08-03 15:22             ` Yonghong Song
2023-08-03 16:10               ` Alan Maguire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b6b0703-4ed6-c0cb-c61a-9ebcfb5fe668@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=laoar.shao@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=void@manifault.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox