All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
Date: Sat, 19 Aug 2023 21:04:50 -0700	[thread overview]
Message-ID: <5b54fb06-3ce5-4219-a591-1a2d8ee77656@linux.dev> (raw)
In-Reply-To: <CAP01T756RSWSveq_SqfhFWJguT+gpwYU7iRtMGCgSFNf-x+JLQ@mail.gmail.com>



On 8/18/23 6:24 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 14 Aug 2023 at 22:59, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> Add two new kfunc's, bpf_percpu_obj_new_impl() and
>> bpf_percpu_obj_drop_impl(), to allocate a percpu obj.
>> Two functions are very similar to bpf_obj_new_impl()
>> and bpf_obj_drop_impl(). The major difference is related
>> to percpu handling.
>>
>>      bpf_rcu_read_lock()
>>      struct val_t __percpu *v = map_val->percpu_data;
>>      ...
>>      bpf_rcu_read_unlock()
>>
>> For a percpu data map_val like above 'v', the reg->type
>> is set as
>>          PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU
>> if inside rcu critical section.
>>
>> MEM_RCU marking here is similar to NON_OWN_REF as 'v'
>> is not a owning referenace. But NON_OWN_REF is
> 
> typo: reference

Ack.

> 
>> trusted and typically inside the spinlock while
>> MEM_RCU is under rcu read lock. RCU is preferred here
>> since percpu data structures mean potential concurrent
>> access into its contents.
>>
>> Also, bpf_percpu_obj_new_impl() is restricted to only accept
>> scalar struct which means nested kptr's are not allowed
>> but some other special field, e.g., bpf_list_head, bpf_spin_lock, etc.
>> could be nested (nested 'struct'). Later patch will improve verifier to
>> handle such nested special fields.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/linux/bpf.h   |  3 +-
>>   kernel/bpf/helpers.c  | 49 +++++++++++++++++++++++
>>   kernel/bpf/syscall.c  | 21 +++++++---
>>   kernel/bpf/verifier.c | 90 ++++++++++++++++++++++++++++++++++---------
>>   4 files changed, 137 insertions(+), 26 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index e6348fd0a785..a2cb380c43c7 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -197,7 +197,8 @@ struct btf_field_kptr {
>>          struct btf *btf;
>>          struct module *module;
>>          /* dtor used if btf_is_kernel(btf), otherwise the type is
>> -        * program-allocated, dtor is NULL,  and __bpf_obj_drop_impl is used
>> +        * program-allocated, dtor is NULL,  and __bpf_obj_drop_impl
>> +        * or __bpf_percpu_drop_impl is used
>>           */
>>          btf_dtor_kfunc_t dtor;
>>          u32 btf_id;
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index eb91cae0612a..dd14cb7da4af 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1900,6 +1900,29 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
>>          return p;
>>   }
>>
>> +__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
>> +{
>> +       struct btf_struct_meta *meta = meta__ign;
>> +       const struct btf_record *rec;
>> +       u64 size = local_type_id__k;
>> +       void __percpu *pptr;
>> +       void *p;
>> +       int cpu;
>> +
>> +       p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
>> +       if (!p)
>> +               return NULL;
>> +       if (meta) {
>> +               pptr = *((void __percpu **)p);
>> +               rec = meta->record;
>> +               for_each_possible_cpu(cpu) {
>> +                       bpf_obj_init(rec, per_cpu_ptr(pptr, cpu));
>> +               }
>> +       }
>> +
>> +       return p;
>> +}
>> +
>>   /* Must be called under migrate_disable(), as required by bpf_mem_free */
>>   void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>   {
>> @@ -1924,6 +1947,30 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
>>          __bpf_obj_drop_impl(p, meta ? meta->record : NULL);
>>   }
>>
>> +/* Must be called under migrate_disable(), as required by bpf_mem_free_rcu */
>> +void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec)
>> +{
>> +       void __percpu *pptr;
>> +       int cpu;
>> +
>> +       if (rec) {
>> +               pptr = *((void __percpu **)p);
>> +               for_each_possible_cpu(cpu) {
>> +                       bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
> 
> Should this loop be done after we have waited for the RCU grace period?
> Otherwise any other CPU can reinitialize a field after this is done,
> move objects into lists/rbtree, and leak memory.
> Please correct me if I'm mistaken.

Thanks for spotting this. I think you are correct. The above scenario is
indeed possible. one cpu takes a direct reference of __percpu_kptr and
do a bunch of stuff, and the other cpu is doing a bpf_kptr_xchg to
get the __percpu_kptr and drops it. We should really drop the 
__percpu_kptr itself and the fields in its record after a rcu
grace period so the exist direct reference operation won't be
affected.

Will fix it in the v2.

> 
>> +               }
>> +       }
>> +
>> +       bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
>> +}
>> +
>> +__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
>> +{
>> +       struct btf_struct_meta *meta = meta__ign;
>> +       void *p = p__alloc;
>> +
>> +       __bpf_percpu_obj_drop_impl(p, meta ? meta->record : NULL);
>> +}
>> +
>>   __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
>>   {
>>          struct btf_struct_meta *meta = meta__ign;
>> @@ -2436,7 +2483,9 @@ BTF_SET8_START(generic_btf_ids)
>>   BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>>   #endif
>>   BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
>> +BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
>>   BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_list_push_front_impl)
>>   BTF_ID_FLAGS(func, bpf_list_push_back_impl)
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 1c30b6ee84d4..9ceb6fd9a0e2 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -627,6 +627,7 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
>>   }
>>
>>   extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
>> +extern void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec);
>>
>>   void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>>   {
>> @@ -660,13 +661,21 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>>                          if (!btf_is_kernel(field->kptr.btf)) {
>>                                  pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
>>                                                                             field->kptr.btf_id);
>> -                               if (field->type != BPF_KPTR_PERCPU_REF)
>> +
>> +                               if (field->type == BPF_KPTR_PERCPU_REF) {
>> +                                       migrate_disable();
>> +                                       __bpf_percpu_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> +                                                                               pointee_struct_meta->record :
>> +                                                                               NULL);
>> +                                       migrate_enable();
>> +                               } else {
>>                                          WARN_ON_ONCE(!pointee_struct_meta);
>> -                               migrate_disable();
>> -                               __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> -                                                                pointee_struct_meta->record :
>> -                                                                NULL);
>> -                               migrate_enable();
>> +                                       migrate_disable();
>> +                                       __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> +                                                                        pointee_struct_meta->record :
>> +                                                                        NULL);
>> +                                       migrate_enable();
>> +                               }
>>                          } else {
>>                                  field->kptr.dtor(xchgd_field);
>>                          }
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4ccca1f6c998..a985fbf18a11 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
>>          /* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
>>           * generally to pass info about user-defined local kptr types to later
>>           * verification logic
>> -        *   bpf_obj_drop
>> +        *   bpf_obj_drop/bpf_percpu_obj_drop
>>           *     Record the local kptr type to be drop'd
>>           *   bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
>>           *     Record the local kptr type to be refcount_incr'd and use
>> @@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>>          if (kptr_field->type == BPF_KPTR_UNREF)
>>                  perm_flags |= PTR_UNTRUSTED;
>>
>> +       if (kptr_field->type == BPF_KPTR_PERCPU_REF)
>> +               perm_flags |= MEM_PERCPU | MEM_ALLOC;
>> +
> 
> I think just this would permit PTR_TO_BTF_ID | MEM_ALLOC for percpu kptr?
> It would probably be good to include negative selftests for kptr_xchg
> type matching with percpu_kptr to prevent things like these.
> 
> Alexei already said map_kptr_match_type is not being invoked for
> MEM_ALLOC kptr_xchg, so that is also an existing bug.

I will fix that bug first and this part of change probably not
needed any more.

> 
>>          if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
>>                  goto bad_type;
>>
>> [...]
>>          /* We need to verify reg->type and reg->btf, before accessing reg->btf */
>>          reg_name = btf_type_name(reg->btf, reg->btf_id);
>>
>> @@ -5084,7 +5091,17 @@ static bool rcu_safe_kptr(const struct btf_field *field)
>>   {
>>          const struct btf_field_kptr *kptr = &field->kptr;
>>
>> -       return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
>> +       return field->type == BPF_KPTR_PERCPU_REF ||
>> +              (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
>> +}
>> +
>> +static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
>> +{
>> +       if (!rcu_safe_kptr(kptr_field) || !in_rcu_cs(env))
>> +               return PTR_MAYBE_NULL | PTR_UNTRUSTED;
>> +       if (kptr_field->type != BPF_KPTR_PERCPU_REF)
>> +               return PTR_MAYBE_NULL | MEM_RCU;
>> +       return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
> 
> The inverted conditions are a bit hard to follow. Maybe better to
> explicitly check for both RCU cases, and default to untrusted
> otherwise?

Okay. Will do.

> 
>>   }
>>
>> [...]
>>

  reply	other threads:[~2023-08-20  4:07 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type Yonghong Song
2023-08-18 18:37   ` David Marchevsky
2023-08-18 23:24     ` Alexei Starovoitov
2023-08-20  3:46       ` Yonghong Song
2023-08-20  3:45     ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
2023-08-19  0:29   ` Alexei Starovoitov
2023-08-20  3:47     ` Yonghong Song
2023-08-19  1:24   ` Kumar Kartikeya Dwivedi
2023-08-20  4:04     ` Yonghong Song [this message]
2023-08-14 17:28 ` [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
2023-08-19  1:01   ` Alexei Starovoitov
2023-08-20  4:16     ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 05/15] selftests/bpf: Update error message in negative linked_list test Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 06/15] libbpf: Add __percpu macro definition Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 07/15] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 08/15] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
2023-08-19  1:44   ` Kumar Kartikeya Dwivedi
2023-08-20  4:19     ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 10/15] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 11/15] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 12/15] bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data structure Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 13/15] selftests/bpf: Add tests for percpu struct with bpf list head Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 14/15] selftests/bpf: Add some negative tests Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
2023-08-18 15:54   ` Daniel Borkmann
2023-08-18 17:17     ` Yonghong Song
2023-08-18 18:26   ` Zvi Effron
2023-08-18 18:58     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5b54fb06-3ce5-4219-a591-1a2d8ee77656@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.