All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Marchevsky <david.marchevsky@linux.dev>
To: yonghong.song@linux.dev, Dave Marchevsky <davemarchevsky@fb.com>,
	bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
Date: Tue, 22 Aug 2023 01:47:01 -0400	[thread overview]
Message-ID: <21f00803-d20f-e584-6512-67e5107e3865@linux.dev> (raw)
In-Reply-To: <fafb9664-2473-1993-ea0d-4e4228f32c7b@linux.dev>

On 8/21/23 10:37 PM, Yonghong Song wrote:
> 
> 
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>> An earlier patch in the series ensures that the underlying memory of
>> nodes with bpf_refcount - which can have multiple owners - is not reused
>> until RCU grace period has elapsed. This prevents
>> use-after-free with non-owning references that may point to
>> recently-freed memory. While RCU read lock is held, it's safe to
>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>> elapsed and therefore underlying memory couldn't have been reused.
>>
>>  From the perspective of verifier "trustedness" non-owning refs to
>> refcounted nodes are now trusted only in RCU CS and therefore should no
>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>> MEM_RCU in order to reflect this new state.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   include/linux/bpf.h   |  3 ++-
>>   kernel/bpf/verifier.c | 13 ++++++++++++-
>>   2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index eced6400f778..12596af59c00 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>         /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>> -     * Currently only valid for linked-list and rbtree nodes.
>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>        */
>>       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>   diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 8db0afa5985c..55607ab30522 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>       case PTR_TO_BTF_ID | MEM_RCU:
>>       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>            * its fixed offset must be 0. In the other cases, fixed offset
>>            * can be non-zero. This was already checked above. So pass
>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>   {
>>       struct bpf_verifier_state *state = env->cur_state;
>> +    struct btf_record *rec = reg_btf_record(reg);
>>         if (!state->active_lock.ptr) {
>>           verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>       }
>>         reg->type |= NON_OWN_REF;
>> +    if (rec->refcount_off >= 0)
>> +        reg->type |= MEM_RCU;
> 
> Should the above MEM_RCU marking be done unless reg access is in
> rcu critical section?

I think it is fine, since non-owning references currently exist only within
spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
disabled + spin_lock CS should imply RCU CS.

  [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/

> 
> I think we still have issues for state resetting
> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> will try to convert the reg state to PTR_UNTRUSTED.
> 
> Let us say reg state is
>   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> 
> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> the reg state, the state should become
>   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> some additional code might be needed so we wont have
> verifier complaints about ref_obj_id == 0.
> 
> (2). If hitting bpf_rcu_read_unlock(), the state should become
>   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> since register access still in bpf_spin_lock() region.

I agree w/ your comment in side reply stating that this
case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
is currently not allowed.

> 
> Does this make sense?
> 


IIUC the specific reg state flow you're recommending is based on the convos
we've had over the past few weeks re: getting rid of special non-owning ref
lifetime rules, instead using RCU as much as possible. Specifically, this
recommended change would remove non-owning ref clobbering, instead just removing
NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
to collection kfuncs (refcount_acquire, etc).

I agree that in general we'll be able to loosen the lifetime logic for
non-owning refs, and your specific suggestion sounds reasonable. IMO it's
better to ship that as a separate series, though, as this series was meant
to be the minimum changes necessary to re-enable bpf_refcount_acquire, and
it's expanded a bit past that already. Easier to reason about the rest
of this series' changes without having to account for clobbering changes.

>> +
>>       return 0;
>>   }
>>   @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct bpf_func_state *state;
>>           struct bpf_reg_state *reg;
>>   +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>> +            verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
>> +            return -EACCES;
>> +        }
>> +
>>           if (rcu_lock) {
>>               verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>               return -EINVAL;
>> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
>>                       return -EINVAL;
>>                   }
>>   -                if (env->cur_state->active_rcu_lock) {
>> +                if (env->cur_state->active_rcu_lock &&
>> +                    !in_rbtree_lock_required_cb(env)) {
>>                       verbose(env, "bpf_rcu_read_unlock is missing\n");
>>                       return -EINVAL;
>>                   }

  parent reply	other threads:[~2023-08-22  5:47 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-22  1:52   ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-23  6:26   ` Yonghong Song
2023-08-23 16:20     ` Alexei Starovoitov
2023-08-23 20:29       ` Yonghong Song
2023-08-24  1:38         ` Alexei Starovoitov
2023-08-24  2:09           ` Alexei Starovoitov
2023-08-24  4:01             ` Yonghong Song
2023-08-24  3:52           ` Yonghong Song
2023-08-24 22:03             ` Alexei Starovoitov
2023-08-24 22:25               ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-22  2:37   ` Yonghong Song
2023-08-22  3:19     ` Yonghong Song
2023-08-22  5:47     ` David Marchevsky [this message]
2023-08-22 16:02       ` Yonghong Song
2023-08-22 23:45       ` Alexei Starovoitov
2023-08-23  0:18         ` Yonghong Song
2023-08-23  0:21           ` Alexei Starovoitov
2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
2023-08-22  2:53   ` Yonghong Song
2023-08-22 19:46     ` Alexei Starovoitov
2023-08-22 19:53       ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
2023-08-22  3:18   ` Yonghong Song
2023-08-22  5:21     ` David Marchevsky
2023-08-22 15:00       ` Yonghong Song
2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=21f00803-d20f-e584-6512-67e5107e3865@linux.dev \
    --to=david.marchevsky@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.