From: Yonghong Song <yonghong.song@linux.dev>
To: Yafang Shao <laoar.shao@gmail.com>,
ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
yhs@fb.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu
Date: Tue, 1 Aug 2023 10:53:29 -0700 [thread overview]
Message-ID: <3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@linux.dev> (raw)
In-Reply-To: <20230801142912.55078-1-laoar.shao@gmail.com>
On 8/1/23 7:29 AM, Yafang Shao wrote:
> Some statistics are stored in percpu pointers, but the kernel doesn't
> aggregate them into a single value; for example, the data in struct
> psi_group_cpu.
>
> Currently, we can traverse percpu data using for_loop and bpf_per_cpu_ptr:
>
> for_loop(nr_cpus, callback_fn, callback_ctx, 0)
>
> In the callback_fn, we retrieve the percpu pointer with bpf_per_cpu_ptr().
> The drawback is that 'nr_cpus' cannot be a variable; otherwise, it will be
> rejected by the verifier, hindering deployment, as servers may have
> different 'nr_cpus'. Using CONFIG_NR_CPUS is not ideal.
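
For concreteness, the pattern described above looks roughly like the sketch
below. This is illustrative only: the percpu symbol name and the callback
are placeholders (not taken from the patch), it needs vmlinux.h/BTF and a
BPF toolchain to build, and the loop count passed to bpf_loop() must be a
verifier-visible constant, which is exactly the drawback being pointed out:

```c
/* Sketch: aggregate a percpu value with bpf_loop() + bpf_per_cpu_ptr().
 * The percpu symbol name here is illustrative.
 */
extern const struct psi_group_cpu system_group_pcpu __ksym __weak;

struct sum_ctx {
	__u64 total;
};

static long cpu_cb(__u64 cpu, void *data)
{
	struct sum_ctx *c = data;
	const struct psi_group_cpu *gc;

	gc = bpf_per_cpu_ptr(&system_group_pcpu, cpu);
	if (gc)
		c->total += gc->times[0];
	return 0;
}

/* ... in the program body: */
/*
 *	struct sum_ctx c = {};
 *
 *	// nr_loops must be a constant, e.g. CONFIG_NR_CPUS;
 *	// a runtime nr_cpus is rejected by the verifier.
 *	bpf_loop(CONFIG_NR_CPUS, cpu_cb, &c, 0);
 */
```
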
>
> Alternatively, with the bpf_cpumask family, we can obtain a task's cpumask.
> However, it requires creating a bpf_cpumask, copying the cpumask from the
> task, and then parsing the CPU IDs from it, resulting in low efficiency.
> Introducing other kfuncs like bpf_cpumask_next might be necessary.
>
> A new bpf helper, bpf_for_each_cpu, is introduced to conveniently traverse
> percpu data, covering all scenarios. It includes
> for_each_{possible, present, online}_cpu. The user can also traverse CPUs
> from a specific task, such as walking the CPUs of a cpuset cgroup when the
> task is in that cgroup.
The bpf subsystem has adopted the kfunc approach, so new bpf helpers are
no longer accepted. You need a bpf_for_each_cpu kfunc instead.
But I am wondering whether we should instead use open-coded iterator loops,
added in commit 06accc8779c1 ("bpf: add support for open-coded iterator loops").
In the kernel, we have a global variable nr_cpu_ids (also used in
kernel/bpf/helpers.c), which appears in numerous places for per-cpu
data structure access.
I am wondering whether we could have bpf code like:

	extern int nr_cpu_ids __ksym;

	struct bpf_iter_num it;
	int *cpu;

	/* nr_cpu_ids is special; the verifier could give it a
	 * range of [1, CONFIG_NR_CPUS].
	 */
	bpf_iter_num_new(&it, 0, nr_cpu_ids);
	while ((cpu = bpf_iter_num_next(&it))) {
		/* access per-cpu data for cpu *cpu */
	}
	bpf_iter_num_destroy(&it);
From all existing open-coded iterator loops, it looks like the upper
bound has to be a constant. We might need to extend support to bounded
scalar upper bounds if that is not already there.
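
If bounded scalar upper bounds are supported, the pieces could be put
together roughly as follows. Again a sketch only: the percpu symbol,
section name, and field access are illustrative rather than from the
patch, and it assumes nr_cpu_ids is resolvable as a ksym with a known
[1, CONFIG_NR_CPUS] range:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

extern int nr_cpu_ids __ksym;
/* Illustrative percpu symbol; the real psi percpu data may need
 * different plumbing to be visible to bpf progs.
 */
extern const struct psi_group_cpu system_group_pcpu __ksym __weak;

SEC("syscall")
int sum_psi(void *ctx)
{
	struct bpf_iter_num it;
	__u64 total = 0;
	int *cpu;

	/* Iterate CPUs [0, nr_cpu_ids) with a runtime upper bound. */
	bpf_iter_num_new(&it, 0, nr_cpu_ids);
	while ((cpu = bpf_iter_num_next(&it))) {
		const struct psi_group_cpu *gc;

		gc = bpf_per_cpu_ptr(&system_group_pcpu, *cpu);
		if (gc)
			total += gc->times[0];
	}
	bpf_iter_num_destroy(&it);

	bpf_printk("total: %llu", total);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```
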
>
> In our use case, we utilize this new helper to traverse percpu psi data.
> This aids in understanding why CPU, Memory, and IO pressure data are high
> on a server or a container.
>
> Due to the __percpu annotation, clang-14+ and pahole-1.23+ are required.
>
> Yafang Shao (3):
> bpf: Add bpf_for_each_cpu helper
> cgroup, psi: Init root cgroup psi to psi_system
> selftests/bpf: Add selftest for for_each_cpu
>
> include/linux/bpf.h | 1 +
> include/linux/psi.h | 2 +-
> include/uapi/linux/bpf.h | 32 +++++
> kernel/bpf/bpf_iter.c | 72 +++++++++++
> kernel/bpf/helpers.c | 2 +
> kernel/bpf/verifier.c | 29 ++++-
> kernel/cgroup/cgroup.c | 5 +-
> tools/include/uapi/linux/bpf.h | 32 +++++
> .../selftests/bpf/prog_tests/for_each_cpu.c | 137 +++++++++++++++++++++
> .../selftests/bpf/progs/test_for_each_cpu.c | 63 ++++++++++
> 10 files changed, 372 insertions(+), 3 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/for_each_cpu.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_for_each_cpu.c
>
Thread overview: 18+ messages
2023-08-01 14:29 [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 1/3] bpf: Add bpf_for_each_cpu helper Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 2/3] cgroup, psi: Init root cgroup psi to psi_system Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add selftest for for_each_cpu Yafang Shao
2023-08-01 17:53 ` Yonghong Song [this message]
2023-08-02 2:33 ` [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-02 2:45 ` Alexei Starovoitov
2023-08-02 2:57 ` Yafang Shao
2023-08-02 3:29 ` David Vernet
2023-08-02 6:54 ` Yonghong Song
2023-08-02 15:46 ` David Vernet
2023-08-02 16:23 ` Alexei Starovoitov
2023-08-02 16:33 ` Alexei Starovoitov
2023-08-02 17:06 ` David Vernet
2023-08-02 18:13 ` Alexei Starovoitov
2023-08-03 8:21 ` Alan Maguire
2023-08-03 15:22 ` Yonghong Song
2023-08-03 16:10 ` Alan Maguire