From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
David Marchevsky <david.marchevsky@linux.dev>
Cc: Dave Marchevsky <davemarchevsky@fb.com>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
Date: Tue, 22 Aug 2023 17:18:40 -0700 [thread overview]
Message-ID: <4a601c15-25d5-ec97-849a-97e54ace8f0d@linux.dev> (raw)
In-Reply-To: <20230822234529.z6ogvsptbivobdmg@MacBook-Pro-8.local>
On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
>> On 8/21/23 10:37 PM, Yonghong Song wrote:
>>>
>>>
>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>> An earlier patch in the series ensures that the underlying memory of
>>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>>> until RCU grace period has elapsed. This prevents
>>>> use-after-free with non-owning references that may point to
>>>> recently-freed memory. While RCU read lock is held, it's safe to
>>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>>> elapsed and therefore underlying memory couldn't have been reused.
>>>>
>>>> From the perspective of verifier "trustedness" non-owning refs to
>>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>>> MEM_RCU in order to reflect this new state.
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>> ---
>>>> include/linux/bpf.h | 3 ++-
>>>> kernel/bpf/verifier.c | 13 ++++++++++++-
>>>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index eced6400f778..12596af59c00 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>> MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS),
>>>> /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>>> - * Currently only valid for linked-list and rbtree nodes.
>>>> + * Currently only valid for linked-list and rbtree nodes. If the nodes
>>>> + * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>>> */
>>>> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS),
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 8db0afa5985c..55607ab30522 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>> case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>> case PTR_TO_BTF_ID | MEM_RCU:
>>>> case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>>> + case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>> /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>> * its fixed offset must be 0. In the other cases, fixed offset
>>>> * can be non-zero. This was already checked above. So pass
>>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>> static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>> {
>>>> struct bpf_verifier_state *state = env->cur_state;
>>>> + struct btf_record *rec = reg_btf_record(reg);
>>>> if (!state->active_lock.ptr) {
>>>> verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>> }
>>>> reg->type |= NON_OWN_REF;
>>>> + if (rec->refcount_off >= 0)
>>>> + reg->type |= MEM_RCU;
>>>
>>> Should the above MEM_RCU marking be done unless reg access is in
>>> rcu critical section?
>>
>> I think it is fine, since non-owning references currently exist only within
>> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
>> disabled + spin_lock CS should imply RCU CS.
>>
>> [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
>>
>>>
>>> I think we still have issues for state resetting
>>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
>>> will try to convert the reg state to PTR_UNTRUSTED.
>>>
>>> Let us say reg state is
>>> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
>>>
>>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
>>> the reg state, the state should become
>>> PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
>>> some additional code might be needed so we wont have
>>> verifier complaints about ref_obj_id == 0.
>>>
>>> (2). If hitting bpf_rcu_read_unlock(), the state should become
>>> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
>>> since register access still in bpf_spin_lock() region.
>>
>> I agree w/ your comment in side reply stating that this
>> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
>> is currently not allowed.
>>
>>>
>>> Does this make sense?
>>>
>>
>>
>> IIUC the specific reg state flow you're recommending is based on the convos
>> we've had over the past few weeks re: getting rid of special non-owning ref
>> lifetime rules, instead using RCU as much as possible. Specifically, this
>> recommended change would remove non-owning ref clobbering, instead just removing
>> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
>> to collection kfuncs (refcount_acquire, etc).
>
> Overall the patch set makes sense to me, but I want to clarify above.
> My understanding that after the patch set applied bpf_spin_unlock()
> will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
> is not correct.
> Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().
I said it 'should become ...', but you are right. right now, it will do
mark_reg_invalid(). So it is correct just MAYBE a little conservative.
>
> Re: (2) even if/when bpf_rcu_read_unlock() will allowed inside spinlocked region
> it will convert PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU to
> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
> which is a buggy combination which we would need to address if rcu_unlock is allowed eventually.
>
> Did I get it right?
> If so I think the whole set is good to do.
>
next prev parent reply other threads:[~2023-08-23 0:18 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-22 1:52 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-23 6:26 ` Yonghong Song
2023-08-23 16:20 ` Alexei Starovoitov
2023-08-23 20:29 ` Yonghong Song
2023-08-24 1:38 ` Alexei Starovoitov
2023-08-24 2:09 ` Alexei Starovoitov
2023-08-24 4:01 ` Yonghong Song
2023-08-24 3:52 ` Yonghong Song
2023-08-24 22:03 ` Alexei Starovoitov
2023-08-24 22:25 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-22 2:37 ` Yonghong Song
2023-08-22 3:19 ` Yonghong Song
2023-08-22 5:47 ` David Marchevsky
2023-08-22 16:02 ` Yonghong Song
2023-08-22 23:45 ` Alexei Starovoitov
2023-08-23 0:18 ` Yonghong Song [this message]
2023-08-23 0:21 ` Alexei Starovoitov
2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
2023-08-22 2:53 ` Yonghong Song
2023-08-22 19:46 ` Alexei Starovoitov
2023-08-22 19:53 ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
2023-08-22 3:18 ` Yonghong Song
2023-08-22 5:21 ` David Marchevsky
2023-08-22 15:00 ` Yonghong Song
2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a601c15-25d5-ec97-849a-97e54ace8f0d@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davemarchevsky@fb.com \
--cc=david.marchevsky@linux.dev \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox