Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
Date: Wed, 23 Aug 2023 13:29:11 -0700	[thread overview]
Message-ID: <71152843-d35d-4165-6410-0aa30a4c0f74@linux.dev> (raw)
In-Reply-To: <CAADnVQK-6A08+OCtOK20yRebBP_N1hKgfmHxtMgokM67LZrcEQ@mail.gmail.com>



On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>> This is the final fix for the use-after-free scenario described in
>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
>>> non-owning refs"). That commit, by virtue of changing
>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
>>> the "refcount incr on 0" splat. The not_zero check in
>>> refcount_inc_not_zero, though, still occurs on memory that could have
>>> been free'd and reused, so the commit didn't properly fix the root
>>> cause.
>>>
>>> This patch actually fixes the issue by free'ing using the recently-added
>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
>>> RCU grace period has elapsed. If that has happened then
>>> there are no non-owning references alive that point to the
>>> recently-free'd memory, so it can be safely reused.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>> ---
>>>    kernel/bpf/helpers.c | 6 +++++-
>>>    1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index eb91cae0612a..945a85e25ac5 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>>
>>>        if (rec)
>>>                bpf_obj_free_fields(rec, p);
>>
>> During reviewing my percpu kptr patch with link
>>
>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
>> Kumar mentioned although percpu memory itself is freed under rcu.
>> But its record fields are freed immediately. This will cause
>> the problem since there may be some active uses of these fields
>> within rcu cs and after bpf_obj_free_fields(), some fields may
>> be re-initialized with new memory but they do not have chances
>> to free any more.
>>
>> Do we have problem here as well?
> 
> I think it's not an issue here or in your percpu patch,
> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> call bpf_mem_free_rcu() (after this patch set lands).

The following is my understanding.

void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
         const struct btf_field *fields;
         int i;

         if (IS_ERR_OR_NULL(rec))
                 return;
         fields = rec->fields;
         for (i = 0; i < rec->cnt; i++) {
                 struct btf_struct_meta *pointee_struct_meta;
                 const struct btf_field *field = &fields[i];
                 void *field_ptr = obj + field->offset;
                 void *xchgd_field;

                 switch (fields[i].type) {
                 case BPF_SPIN_LOCK:
                         break;
                 case BPF_TIMER:
                         bpf_timer_cancel_and_free(field_ptr);
                         break;
                 case BPF_KPTR_UNREF:
                         WRITE_ONCE(*(u64 *)field_ptr, 0);
                         break;
                 case BPF_KPTR_REF:
			......
                         break;
                 case BPF_LIST_HEAD:
                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
                                 continue;
                         bpf_list_head_free(field, field_ptr, obj + 
rec->spin_lock_off);
                         break;
                 case BPF_RB_ROOT:
                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
                                 continue;
                         bpf_rb_root_free(field, field_ptr, obj + 
rec->spin_lock_off);
                         break;
                 case BPF_LIST_NODE:
                 case BPF_RB_NODE:
                 case BPF_REFCOUNT:
                         break;
                 default:
                         WARN_ON_ONCE(1);
                         continue;
                 }
         }
}

For percpu kptr, the remaining possible actiionable fields are
	BPF_LIST_HEAD and BPF_RB_ROOT

So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
list/rb nodes to unlink them from the list_head/rb_root.

So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
Depending on whether the correspondingrec
with rb_node/list_node has ref count or not,
it may call bpf_mem_free() or bpf_mem_free_rcu(). If
bpf_mem_free() is called, then the field is immediately freed
but it may be used by some bpf prog (under rcu) concurrently,
could this be an issue? Changing bpf_mem_free() in
__bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.

Another thing is related to list_head/rb_root.
During bpf_obj_free_fields(), is it possible that another cpu
may allocate a list_node/rb_node and add to list_head/rb_root?
If this is true, then we might have a memory leak.
But I don't whether this is possible or not.

I think local kptr has the issue as percpu kptr.

> 
> In other words all bpf pointers are either properly life time
> checked through the verifier and freed via immediate bpf_mem_free()
> or they're bpf_mem_free_rcu().
> 
> 
>>
>> I am thinking whether I could create another flavor of bpf_mem_free_rcu
>> with a pre_free_callback function, something like
>>     bpf_mem_free_rcu_cb2(struct bpf_mem_alloc *ma, void *ptr,
>>         void (*cb)(void *, void *), void *arg1, void *arg2)
>>
>> The cb(arg1, arg2) will be called right before the real free of "ptr".
>>
>> For example, for this patch, the callback function can be
>>
>> static bpf_obj_free_fields_cb(void *rec, void *p)
>> {
>>          if (rec)
>>                  bpf_obj_free_fields(rec, p);
>>                  /* we need to ensure recursive freeing fields free
>>                   * needs to be done immediately, which means we will
>>                   * add a parameter to __bpf_obj_drop_impl() to
>>                   * indicate whether bpf_mem_free or bpf_mem_free_rcu
>>                   * should be called.
>>                   */
>> }
>>
>> bpf_mem_free_rcu_cb2(&bpf_global_ma, p, bpf_obj_free_fields_cb, rec, p);
> 
> It sounds nice in theory, but will be difficult to implement.
> bpf_ma would need to add extra two 8 byte pointers for every object,
> but there is no room at present. So to free after rcu gp with a callback
> it would need to allocate 16 byte (at least) which might fail and so on.
> bpf_mem_free_rcu_cb2() would be the last resort.

I don't like this either. I hope we can find better solutions than this.

next prev parent reply	other threads:[~2023-08-23 20:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-22  1:52   ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-23  6:26   ` Yonghong Song
2023-08-23 16:20     ` Alexei Starovoitov
2023-08-23 20:29       ` Yonghong Song [this message]
2023-08-24  1:38         ` Alexei Starovoitov
2023-08-24  2:09           ` Alexei Starovoitov
2023-08-24  4:01             ` Yonghong Song
2023-08-24  3:52           ` Yonghong Song
2023-08-24 22:03             ` Alexei Starovoitov
2023-08-24 22:25               ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-22  2:37   ` Yonghong Song
2023-08-22  3:19     ` Yonghong Song
2023-08-22  5:47     ` David Marchevsky
2023-08-22 16:02       ` Yonghong Song
2023-08-22 23:45       ` Alexei Starovoitov
2023-08-23  0:18         ` Yonghong Song
2023-08-23  0:21           ` Alexei Starovoitov
2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
2023-08-22  2:53   ` Yonghong Song
2023-08-22 19:46     ` Alexei Starovoitov
2023-08-22 19:53       ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
2023-08-22  3:18   ` Yonghong Song
2023-08-22  5:21     ` David Marchevsky
2023-08-22 15:00       ` Yonghong Song
2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=71152843-d35d-4165-6410-0aa30a4c0f74@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.