From: Yonghong Song <yonghong.song@linux.dev>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
tj@kernel.org, andrii@kernel.org, kpsingh@kernel.org,
song@kernel.org, martin.lau@linux.dev, daniel@iogearbox.net,
ast@kernel.org, bpf@vger.kernel.org, lkp@intel.com,
john.fastabend@gmail.com, sdf@google.com, haoluo@google.com,
jolsa@kernel.org
Subject: Re: [PATCH v3 bpf-next 1/3] bpf: Add bpf_iter_cpumask kfuncs
Date: Thu, 18 Jan 2024 19:45:17 -0800
Message-ID: <26d90b2c-72ce-4c3f-8c88-08ea3e605f3a@linux.dev>
In-Reply-To: <a5c077b5-5575-f1a8-9a65-b3877af56c0d@huaweicloud.com>
On 1/18/24 4:51 PM, Hou Tao wrote:
> Hi,
>
> On 1/19/2024 6:27 AM, Yonghong Song wrote:
>> On 1/16/24 6:48 PM, Yafang Shao wrote:
>>> Add three new kfuncs for bpf_iter_cpumask.
>>> - bpf_iter_cpumask_new
>>> It is defined with KF_RCU_PROTECTED and KF_RCU.
>>> KF_RCU_PROTECTED is defined because we must use it under the
>>> protection of RCU.
>>> KF_RCU is defined because the cpumask must be a RCU trusted pointer
>>> such as task->cpus_ptr.
>> I am not sure whether we need both or not.
>>
>> KF_RCU_PROTECTED means the function call needs to be within the rcu cs.
>> KF_RCU means the argument usage needs to be within the rcu cs.
>> We only need one of them (preferably KF_RCU).
>>
>>> - bpf_iter_cpumask_next
>>> - bpf_iter_cpumask_destroy
>>>
>>> These new kfuncs facilitate the iteration of percpu data, such as
>>> runqueues, psi_cgroup_cpu, and more.
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> ---
>>> kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 69 insertions(+)
>>>
>>> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
>>> index 2e73533a3811..1840e48e6142 100644
>>> --- a/kernel/bpf/cpumask.c
>>> +++ b/kernel/bpf/cpumask.c
>>> @@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
>>> return cpumask_weight(cpumask);
>>> }
>>> +struct bpf_iter_cpumask {
>>> + __u64 __opaque[2];
>>> +} __aligned(8);
>>> +
>>> +struct bpf_iter_cpumask_kern {
>>> + const struct cpumask *mask;
>>> + int cpu;
>>> +} __aligned(8);
>>> +
>>> +/**
>>> + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
>>> + * @it: The new bpf_iter_cpumask to be created.
>>> + * @mask: The cpumask to be iterated over.
>>> + *
>>> + * This function initializes a new bpf_iter_cpumask structure for iterating over
>>> + * the specified CPU mask. It assigns the provided cpumask to the newly created
>>> + * bpf_iter_cpumask @it for subsequent iteration operations.
>>> + *
>>> + * On success, 0 is returned. On failure, ERR is returned.
>>> + */
>>> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
>>> +{
>>> + struct bpf_iter_cpumask_kern *kit = (void *)it;
>>> +
>>> + BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
>>> + BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
>>> + __alignof__(struct bpf_iter_cpumask));
>>> +
>>> + kit->mask = mask;
>>> + kit->cpu = -1;
>>> + return 0;
>>> +}
>> We have a problem here. Let us say bpf_iter_cpumask_new() is called
>> inside an rcu cs.
>> Once control leaves the rcu cs, 'mask' could be freed, right?
>> Or you would require bpf_iter_cpumask_next() to be in the same rcu cs
>> as bpf_iter_cpumask_new(). But such a requirement seems odd.
> So the case is possible when using bpf_iter_cpumask_new() and
> bpf_iter_cpumask_next() in a sleepable program and these two kfuncs are
> used in two different rcu_read_lock/rcu_read_unlock code blocks, right?
Right, or bpf_iter_cpumask_new() is called inside an rcu cs and
bpf_iter_cpumask_next() is not.
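
To make the lifetime coupling concrete, here is a minimal userspace sketch, not kernel code. It assumes a 64-bit stand-in for struct cpumask (mask64), a fixed NR_CPUS, and hypothetical borrow_iter_* helpers that mirror how the patch stores only a pointer to the mask. Clearing the mask between calls stands in for the mask going away once the rcu cs has ended; in the kernel the memory could be freed instead, which is worse.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical; the kernel compares against nr_cpu_ids */

/* Userspace stand-in for struct cpumask: one bit per CPU. */
struct mask64 {
	uint64_t bits;
};

/* Mirrors bpf_iter_cpumask_kern in the patch: borrowed pointer plus cursor. */
struct borrow_iter {
	const struct mask64 *mask;
	int cpu;
};

static void borrow_iter_new(struct borrow_iter *it, const struct mask64 *mask)
{
	assert(it && mask);
	it->mask = mask;	/* no copy: 'mask' must outlive the iterator */
	it->cpu = -1;
}

/* Returns the next set CPU after the cursor, or -1 when none is left. */
static int borrow_iter_next(struct borrow_iter *it)
{
	for (int cpu = it->cpu + 1; cpu < NR_CPUS; cpu++) {
		if (it->mask->bits & (UINT64_C(1) << cpu)) {
			it->cpu = cpu;
			return cpu;
		}
	}
	return -1;
}
```

If the caller creates the iterator over a mask with CPUs 0 and 2 set, fetches CPU 0, and the mask is then cleared, the next call reports no CPUs even though CPU 2 was set at creation time. The iterator is only as valid as the pointer it borrowed.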
>> I think we can do things similar to bpf_iter_task_vma. You can allocate
>> memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of
>> mask. This way, you do not need to worry about a potential use-after-free
>> issue. The memory can be freed with bpf_iter_cpumask_destroy().
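
The copy-based approach suggested above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: mask64, NR_CPUS and the copy_iter_* helpers are hypothetical names, and malloc()/free() stand in for bpf_mem_alloc() and its matching free routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 64	/* hypothetical; the kernel compares against nr_cpu_ids */

/* Userspace stand-in for struct cpumask: one bit per CPU. */
struct mask64 {
	uint64_t bits;
};

struct copy_iter {
	struct mask64 *mask;	/* private copy owned by the iterator */
	int cpu;
};

/* Returns 0 on success, -1 if the copy could not be allocated. */
static int copy_iter_new(struct copy_iter *it, const struct mask64 *src)
{
	assert(it && src);
	it->cpu = -1;
	it->mask = malloc(sizeof(*it->mask));	/* stand-in for bpf_mem_alloc() */
	if (!it->mask)
		return -1;
	*it->mask = *src;	/* snapshot taken while 'src' is known to be valid */
	return 0;
}

/* Returns the next set CPU in the snapshot, or -1 when none is left. */
static int copy_iter_next(struct copy_iter *it)
{
	if (!it->mask)
		return -1;
	for (int cpu = it->cpu + 1; cpu < NR_CPUS; cpu++) {
		if (it->mask->bits & (UINT64_C(1) << cpu)) {
			it->cpu = cpu;
			return cpu;
		}
	}
	return -1;
}

/* Releases the snapshot; safe to call after a failed copy_iter_new(). */
static void copy_iter_destroy(struct copy_iter *it)
{
	free(it->mask);
	it->mask = NULL;
}
```

Because the snapshot is taken in the new step, later changes to the source mask (or its release) no longer affect iteration. The cost is one allocation per iterator and a view of the mask that may be stale by the time it is walked.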
>>
>>> +
>>> +/**
>>> + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
>>> + * @it: The bpf_iter_cpumask
>>> + *
>>> + * This function retrieves a pointer to the number of the next CPU within the
>>> + * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
>>> + * cpumask. If there are no further CPUs available, it returns NULL.
>>> + *
>>> + * Returns a pointer to the number of the next CPU in the cpumask or NULL if no
>>> + * further CPUs.
>>> + */
>>> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
>>> +{
>>> + struct bpf_iter_cpumask_kern *kit = (void *)it;
>>> + const struct cpumask *mask = kit->mask;
>>> + int cpu;
>>> +
>>> + cpu = cpumask_next(kit->cpu, mask);
>>> + if (cpu >= nr_cpu_ids)
>>> + return NULL;
>>> +
>>> + kit->cpu = cpu;
>>> + return &kit->cpu;
>>> +}
>>> +
>>> +/**
>>> + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
>>> + * @it: The bpf_iter_cpumask to be destroyed.
>>> + */
>>> +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
>>> +{
>>> +}
>>> +
>>> __bpf_kfunc_end_defs();
>>> BTF_SET8_START(cpumask_kfunc_btf_ids)
>>> @@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
>>> BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
>>> BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
>>> BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
>>> +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU_PROTECTED | KF_RCU)
>>> +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
>>> +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
>>> BTF_SET8_END(cpumask_kfunc_btf_ids)
>>> static const struct btf_kfunc_id_set cpumask_kfunc_set = {
>> .