From: Yonghong Song <yonghong.song@linux.dev>
To: Yafang Shao <laoar.shao@gmail.com>,
ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
yhs@fb.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu
Date: Tue, 1 Aug 2023 10:53:29 -0700 [thread overview]
Message-ID: <3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@linux.dev> (raw)
In-Reply-To: <20230801142912.55078-1-laoar.shao@gmail.com>
On 8/1/23 7:29 AM, Yafang Shao wrote:
> Some statistics are stored in percpu pointers, but the kernel doesn't
> aggregate them into a single value; for example, the data in struct
> psi_group_cpu.
>
> Currently, we can traverse percpu data using for_loop and bpf_per_cpu_ptr:
>
> for_loop(nr_cpus, callback_fn, callback_ctx, 0)
>
> In the callback_fn, we retrieve the percpu pointer with bpf_per_cpu_ptr().
> The drawback is that 'nr_cpus' cannot be a variable; otherwise, it will be
> rejected by the verifier, hindering deployment, as servers may have
> different 'nr_cpus'. Using CONFIG_NR_CPUS is not ideal.
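
For concreteness, the pattern described above looks roughly like the sketch
below. This is illustrative only: the percpu symbol name and the callback
are placeholders (not taken from the patch), it needs vmlinux.h/BTF and a
BPF toolchain to build, and the loop count passed to bpf_loop() must be a
verifier-visible constant, which is exactly the drawback being pointed out:

```c
/* Sketch: aggregate a percpu value with bpf_loop() + bpf_per_cpu_ptr().
 * The percpu symbol name here is illustrative.
 */
extern const struct psi_group_cpu system_group_pcpu __ksym __weak;

struct sum_ctx {
	__u64 total;
};

static long cpu_cb(__u64 cpu, void *data)
{
	struct sum_ctx *c = data;
	const struct psi_group_cpu *gc;

	gc = bpf_per_cpu_ptr(&system_group_pcpu, cpu);
	if (gc)
		c->total += gc->times[0];
	return 0;
}

/* ... in the program body: */
/*
 *	struct sum_ctx c = {};
 *
 *	// nr_loops must be a constant, e.g. CONFIG_NR_CPUS;
 *	// a runtime nr_cpus is rejected by the verifier.
 *	bpf_loop(CONFIG_NR_CPUS, cpu_cb, &c, 0);
 */
```
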
>
> Alternatively, with the bpf_cpumask family, we can obtain a task's cpumask.
> However, it requires creating a bpf_cpumask, copying the cpumask from the
> task, and then parsing the CPU IDs from it, resulting in low efficiency.
> Introducing other kfuncs like bpf_cpumask_next might be necessary.
>
> A new bpf helper, bpf_for_each_cpu, is introduced to conveniently traverse
> percpu data, covering all scenarios. It includes
> for_each_{possible, present, online}_cpu. The user can also traverse CPUs
> from a specific task, such as walking the CPUs of a cpuset cgroup when the
> task is in that cgroup.
The bpf subsystem has adopted the kfunc approach, so new bpf helpers are
no longer accepted. You need a bpf_for_each_cpu kfunc instead.
But I am wondering whether we should instead use open-coded iterator loops,
added in commit 06accc8779c1 ("bpf: add support for open-coded iterator loops").
In the kernel, we have a global variable nr_cpu_ids (also used in
kernel/bpf/helpers.c), which appears in numerous places for per-cpu
data structure access.
I am wondering whether we could have bpf code like:

	extern int nr_cpu_ids __ksym;

	struct bpf_iter_num it;
	int *cpu;

	/* nr_cpu_ids is special; the verifier could give it a
	 * range of [1, CONFIG_NR_CPUS].
	 */
	bpf_iter_num_new(&it, 0, nr_cpu_ids);
	while ((cpu = bpf_iter_num_next(&it))) {
		/* access per-cpu data for cpu *cpu */
	}
	bpf_iter_num_destroy(&it);
From all existing open-coded iterator loops, it looks like the upper
bound has to be a constant. We might need to extend support to bounded
scalar upper bounds if that is not already there.
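
If bounded scalar upper bounds are supported, the pieces could be put
together roughly as follows. Again a sketch only: the percpu symbol,
section name, and field access are illustrative rather than from the
patch, and it assumes nr_cpu_ids is resolvable as a ksym with a known
[1, CONFIG_NR_CPUS] range:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

extern int nr_cpu_ids __ksym;
/* Illustrative percpu symbol; the real psi percpu data may need
 * different plumbing to be visible to bpf progs.
 */
extern const struct psi_group_cpu system_group_pcpu __ksym __weak;

SEC("syscall")
int sum_psi(void *ctx)
{
	struct bpf_iter_num it;
	__u64 total = 0;
	int *cpu;

	/* Iterate CPUs [0, nr_cpu_ids) with a runtime upper bound. */
	bpf_iter_num_new(&it, 0, nr_cpu_ids);
	while ((cpu = bpf_iter_num_next(&it))) {
		const struct psi_group_cpu *gc;

		gc = bpf_per_cpu_ptr(&system_group_pcpu, *cpu);
		if (gc)
			total += gc->times[0];
	}
	bpf_iter_num_destroy(&it);

	bpf_printk("total: %llu", total);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```
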
>
> In our use case, we utilize this new helper to traverse percpu psi data.
> This aids in understanding why CPU, Memory, and IO pressure data are high
> on a server or a container.
>
> Due to the __percpu annotation, clang-14+ and pahole-1.23+ are required.
>
> Yafang Shao (3):
> bpf: Add bpf_for_each_cpu helper
> cgroup, psi: Init root cgroup psi to psi_system
> selftests/bpf: Add selftest for for_each_cpu
>
> include/linux/bpf.h | 1 +
> include/linux/psi.h | 2 +-
> include/uapi/linux/bpf.h | 32 +++++
> kernel/bpf/bpf_iter.c | 72 +++++++++++
> kernel/bpf/helpers.c | 2 +
> kernel/bpf/verifier.c | 29 ++++-
> kernel/cgroup/cgroup.c | 5 +-
> tools/include/uapi/linux/bpf.h | 32 +++++
> .../selftests/bpf/prog_tests/for_each_cpu.c | 137 +++++++++++++++++++++
> .../selftests/bpf/progs/test_for_each_cpu.c | 63 ++++++++++
> 10 files changed, 372 insertions(+), 3 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/for_each_cpu.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_for_each_cpu.c
>
Thread overview: 18+ messages
2023-08-01 14:29 [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 1/3] bpf: Add bpf_for_each_cpu helper Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 2/3] cgroup, psi: Init root cgroup psi to psi_system Yafang Shao
2023-08-01 14:29 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add selftest for for_each_cpu Yafang Shao
2023-08-01 17:53 ` Yonghong Song [this message]
2023-08-02 2:33 ` [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu Yafang Shao
2023-08-02 2:45 ` Alexei Starovoitov
2023-08-02 2:57 ` Yafang Shao
2023-08-02 3:29 ` David Vernet
2023-08-02 6:54 ` Yonghong Song
2023-08-02 15:46 ` David Vernet
2023-08-02 16:23 ` Alexei Starovoitov
2023-08-02 16:33 ` Alexei Starovoitov
2023-08-02 17:06 ` David Vernet
2023-08-02 18:13 ` Alexei Starovoitov
2023-08-03 8:21 ` Alan Maguire
2023-08-03 15:22 ` Yonghong Song
2023-08-03 16:10 ` Alan Maguire