From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
David Marchevsky <david.marchevsky@linux.dev>
Cc: Dave Marchevsky <davemarchevsky@fb.com>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
Date: Tue, 22 Aug 2023 17:18:40 -0700 [thread overview]
Message-ID: <4a601c15-25d5-ec97-849a-97e54ace8f0d@linux.dev> (raw)
In-Reply-To: <20230822234529.z6ogvsptbivobdmg@MacBook-Pro-8.local>
On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
>> On 8/21/23 10:37 PM, Yonghong Song wrote:
>>>
>>>
>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>> An earlier patch in the series ensures that the underlying memory of
>>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>>> until RCU grace period has elapsed. This prevents
>>>> use-after-free with non-owning references that may point to
>>>> recently-freed memory. While RCU read lock is held, it's safe to
>>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>>> elapsed and therefore underlying memory couldn't have been reused.
>>>>
>>>> From the perspective of verifier "trustedness" non-owning refs to
>>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>>> MEM_RCU in order to reflect this new state.
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>> ---
>>>> include/linux/bpf.h | 3 ++-
>>>> kernel/bpf/verifier.c | 13 ++++++++++++-
>>>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index eced6400f778..12596af59c00 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>> MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS),
>>>> /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>>> - * Currently only valid for linked-list and rbtree nodes.
>>>> + * Currently only valid for linked-list and rbtree nodes. If the nodes
>>>> + * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>>> */
>>>> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS),
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 8db0afa5985c..55607ab30522 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>> case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>> case PTR_TO_BTF_ID | MEM_RCU:
>>>> case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>>> + case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>> /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>> * its fixed offset must be 0. In the other cases, fixed offset
>>>> * can be non-zero. This was already checked above. So pass
>>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>> static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>> {
>>>> struct bpf_verifier_state *state = env->cur_state;
>>>> + struct btf_record *rec = reg_btf_record(reg);
>>>> if (!state->active_lock.ptr) {
>>>> verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>> }
>>>> reg->type |= NON_OWN_REF;
>>>> + if (rec->refcount_off >= 0)
>>>> + reg->type |= MEM_RCU;
>>>
>>> Should the above MEM_RCU marking be done unless reg access is in
>>> rcu critical section?
>>
>> I think it is fine, since non-owning references currently exist only within
>> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
>> disabled + spin_lock CS should imply RCU CS.
>>
>> [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
>>
>>>
>>> I think we still have issues for state resetting
>>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
>>> will try to convert the reg state to PTR_UNTRUSTED.
>>>
>>> Let us say reg state is
>>> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
>>>
>>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
>>> the reg state, the state should become
>>> PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
>>> some additional code might be needed so we wont have
>>> verifier complaints about ref_obj_id == 0.
>>>
>>> (2). If hitting bpf_rcu_read_unlock(), the state should become
>>> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
>>> since register access still in bpf_spin_lock() region.
>>
>> I agree w/ your comment in side reply stating that this
>> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
>> is currently not allowed.
>>
>>>
>>> Does this make sense?
>>>
>>
>>
>> IIUC the specific reg state flow you're recommending is based on the convos
>> we've had over the past few weeks re: getting rid of special non-owning ref
>> lifetime rules, instead using RCU as much as possible. Specifically, this
>> recommended change would remove non-owning ref clobbering, instead just removing
>> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
>> to collection kfuncs (refcount_acquire, etc).
>
> Overall the patch set makes sense to me, but I want to clarify above.
> My understanding that after the patch set applied bpf_spin_unlock()
> will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
> is not correct.
> Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().
I said it 'should become ...', but you are right. right now, it will do
mark_reg_invalid(). So it is correct just MAYBE a little conservative.
>
> Re: (2) even if/when bpf_rcu_read_unlock() will allowed inside spinlocked region
> it will convert PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU to
> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
> which is a buggy combination which we would need to address if rcu_unlock is allowed eventually.
>
> Did I get it right?
> If so I think the whole set is good to do.
>
next prev parent reply other threads:[~2023-08-23 0:18 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-22 1:52 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-23 6:26 ` Yonghong Song
2023-08-23 16:20 ` Alexei Starovoitov
2023-08-23 20:29 ` Yonghong Song
2023-08-24 1:38 ` Alexei Starovoitov
2023-08-24 2:09 ` Alexei Starovoitov
2023-08-24 4:01 ` Yonghong Song
2023-08-24 3:52 ` Yonghong Song
2023-08-24 22:03 ` Alexei Starovoitov
2023-08-24 22:25 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-22 2:37 ` Yonghong Song
2023-08-22 3:19 ` Yonghong Song
2023-08-22 5:47 ` David Marchevsky
2023-08-22 16:02 ` Yonghong Song
2023-08-22 23:45 ` Alexei Starovoitov
2023-08-23 0:18 ` Yonghong Song [this message]
2023-08-23 0:21 ` Alexei Starovoitov
2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
2023-08-22 2:53 ` Yonghong Song
2023-08-22 19:46 ` Alexei Starovoitov
2023-08-22 19:53 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
2023-08-22 3:18 ` Yonghong Song
2023-08-22 5:21 ` David Marchevsky
2023-08-22 15:00 ` Yonghong Song
2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a601c15-25d5-ec97-849a-97e54ace8f0d@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davemarchevsky@fb.com \
--cc=david.marchevsky@linux.dev \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.