Re: [PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Leon Hwang <leon.hwang@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net, yonghong.song@linux.dev, song@kernel.org,
	eddyz87@gmail.com, dxu@dxuuu.xyz, deso@posteo.net,
	kernel-patches-bot@fb.com
Subject: Re: [PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map
Date: Tue, 22 Jul 2025 01:08:38 +0800	[thread overview]
Message-ID: <d8d4cd89-2953-45d1-9f81-ab633aa3e4cd@linux.dev> (raw)
In-Reply-To: <CAEf4BzY74tbyzD-4iF1Em9EmKX=2fAN4dTp_k8o+MuN2T3CVqQ@mail.gmail.com>



On 2025/7/18 23:52, Andrii Nakryiko wrote:
> On Thu, Jul 17, 2025 at 12:38 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
>> to allow updating or looking up values for specified CPUs or for all CPUs
>> with a single value.
>>
>> This enhancement enables:
>>
>> * Efficient update of all CPUs using a single value when cpu == (u32)~0.
>> * Targeted update or lookup for a specified CPU otherwise.
>>
>> The flag is passed via:
>>
>> * map_flags in bpf_percpu_array_update() along with embedded cpu field.
>> * elem_flags in generic_map_update_batch() along with separated cpu field.
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  include/linux/bpf.h            |  3 +-
>>  include/uapi/linux/bpf.h       |  7 +++++
>>  kernel/bpf/arraymap.c          | 54 ++++++++++++++++++++++++++--------
>>  kernel/bpf/syscall.c           | 52 ++++++++++++++++++++------------
>>  tools/include/uapi/linux/bpf.h |  7 +++++
>>  5 files changed, 90 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index f9cd2164ed23..faee5710e913 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2671,7 +2671,8 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
>>                                    struct bpf_func_state *callee);
>>
>>  int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
>> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
>> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value,
>> +                         u64 flags);
>>  int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
>>                            u64 flags);
>>  int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 233de8677382..4cad3de6899d 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1372,6 +1372,12 @@ enum {
>>         BPF_NOEXIST     = 1, /* create new element if it didn't exist */
>>         BPF_EXIST       = 2, /* update existing element */
>>         BPF_F_LOCK      = 4, /* spin_lock-ed map_lookup/map_update */
>> +       BPF_F_CPU       = 8, /* map_update for percpu_array */
>> +};
>> +
>> +enum {
>> +       /* indicate updating value across all CPUs for percpu maps. */
>> +       BPF_ALL_CPUS    = (__u32)~0,
>>  };
>>
>>  /* flags for BPF_MAP_CREATE command */
>> @@ -1549,6 +1555,7 @@ union bpf_attr {
>>                 __u32           map_fd;
>>                 __u64           elem_flags;
>>                 __u64           flags;
>> +               __u32           cpu;
>>         } batch;
> 
> So you use flags to pass cpu for singular lookup/delete operations,
> but separate cpu field for batch. We need to be consistent here. I
> think I initially suggested a separate cpu field, but given how much
> churn it's causing in API and usage, I guess I'm leaning towards just
> passing it through flags.
> 
> But if someone else has strong preferences, I can be convinced otherwise.
> 

Sure, they should be consistent.

Passing the CPU info through flags was actually my original idea, but I
wasn’t sure how to convince you initially.

Let’s go ahead and pass the CPU through flags to minimize churn.

> Either way, it has to be consistent between batched and non-batched API.
> 
> Other than that and minor formatting needs below, LGTM.
> 
> pw-bot: cr
> 
>>
>>         struct { /* anonymous struct used by BPF_PROG_LOAD command */
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 3d080916faf9..d333663cbe71 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -295,17 +295,24 @@ static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key,
>>         return per_cpu_ptr(array->pptrs[index & array->index_mask], cpu);
>>  }
>>
>> -int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
>> +int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value, u64 flags)
>>  {
>>         struct bpf_array *array = container_of(map, struct bpf_array, map);
>>         u32 index = *(u32 *)key;
>>         void __percpu *pptr;
>> -       int cpu, off = 0;
>> -       u32 size;
>> +       u32 size, cpu;
>> +       int off = 0;
>>
>>         if (unlikely(index >= array->map.max_entries))
>>                 return -ENOENT;
>>
>> +       cpu = (u32)(flags >> 32);
>> +       flags &= (u32)~0;
>> +       if (unlikely(flags > BPF_F_CPU))
>> +               return -EINVAL;
>> +       if (unlikely((flags & BPF_F_CPU) && cpu >= num_possible_cpus()))
>> +               return -E2BIG;
> 
> nit: I'd probably do -ERANGE for this one
> 

Ack.

>> +
>>         /* per_cpu areas are zero-filled and bpf programs can only
>>          * access 'value_size' of them, so copying rounded areas
>>          * will not leak any kernel data
> 
> [...]
> 
>> @@ -1941,19 +1945,25 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>>  {
>>         void __user *values = u64_to_user_ptr(attr->batch.values);
>>         void __user *keys = u64_to_user_ptr(attr->batch.keys);
>> -       u32 value_size, cp, max_count;
>> +       u32 value_size, cp, max_count, cpu = attr->batch.cpu;
>> +       u64 elem_flags = attr->batch.elem_flags;
>>         void *key, *value;
>>         int err = 0;
>>
>> -       if (attr->batch.elem_flags & ~BPF_F_LOCK)
>> +       if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
>>                 return -EINVAL;
>>
>> -       if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>> +       if ((elem_flags & BPF_F_LOCK) &&
>>             !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
>>                 return -EINVAL;
>>         }
>>
>> -       value_size = bpf_map_value_size(map);
>> +       if ((elem_flags & BPF_F_CPU) &&
>> +               map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> 
> nit: keep on the single line
> 

Ack.

Btw, how many chars of one line do we allow? 100 chars?

>> +               return -EINVAL;
>> +
>> +       value_size = bpf_map_value_size(map, elem_flags);
>> +       elem_flags |= ((u64)cpu) << 32;
>>
>>         max_count = attr->batch.count;
>>         if (!max_count)
> 
> [...]
> 
>> -       if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>> +       if ((elem_flags & BPF_F_LOCK) &&
>>             !btf_record_has_field(map->record, BPF_SPIN_LOCK))
>>                 return -EINVAL;
>>
>> -       value_size = bpf_map_value_size(map);
>> +       if ((elem_flags & BPF_F_CPU) &&
>> +               map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
>> +               return -EINVAL;
> 
> same, formatting is off, but best to keep it single line
> 

Ack.

[...]

Thanks,
Leon

next prev parent reply	other threads:[~2025-07-21 17:08 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-17 19:37 [PATCH bpf-next 0/3] bpf: Introduce BPF_F_CPU flag for percpu_array map Leon Hwang
2025-07-17 19:37 ` [PATCH bpf-next 1/3] " Leon Hwang
2025-07-18 15:52   ` Andrii Nakryiko
2025-07-21 17:08     ` Leon Hwang [this message]
2025-07-21 17:11       ` Eduard Zingerman
2025-07-22 14:29         ` Leon Hwang
2025-07-17 19:37 ` [PATCH bpf-next 2/3] bpf, libbpf: Support BPF_F_CPU " Leon Hwang
2025-07-18 15:57   ` Andrii Nakryiko
2025-07-21 17:13     ` Leon Hwang
2025-07-17 19:37 ` [PATCH bpf-next 3/3] selftests/bpf: Add case to test BPF_F_CPU Leon Hwang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d8d4cd89-2953-45d1-9f81-ab633aa3e4cd@linux.dev \
    --to=leon.hwang@linux.dev \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=deso@posteo.net \
    --cc=dxu@dxuuu.xyz \
    --cc=eddyz87@gmail.com \
    --cc=kernel-patches-bot@fb.com \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.