public inbox for bpf@vger.kernel.org
From: Yonghong Song <yonghong.song@linux.dev>
To: David Marchevsky <david.marchevsky@linux.dev>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
Date: Fri, 4 Aug 2023 08:43:16 -0700
Message-ID: <ad78a828-6c83-06c3-0154-ca53f22b03a9@linux.dev>
In-Reply-To: <e040f58a-4505-9333-2250-57df8ab7290e@linux.dev>



On 8/3/23 11:47 PM, David Marchevsky wrote:
> On 8/2/23 1:59 AM, Yonghong Song wrote:
>>
>>
>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>> The previous patch in the series ensures that the underlying memory of
>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>> until RCU Tasks Trace grace period has elapsed. This prevents
>>> use-after-free with non-owning references that may point to
>>> recently-freed memory. While RCU read lock is held, it's safe to
>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>> elapsed and therefore underlying memory couldn't have been reused.
>>>
>>> From the perspective of verifier "trustedness" non-owning refs to
>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>> MEM_RCU in order to reflect this new state.
>>>
>>> Similarly to bpf_spin_unlock being a non-owning ref invalidation point,
>>> where non-owning ref reg states are clobbered so that they cannot be
>>> used outside of the critical section, currently all MEM_RCU regs are
>>> marked untrusted after bpf_rcu_read_unlock. This patch makes
>>> bpf_rcu_read_unlock a non-owning ref invalidation point as well,
>>> clobbering the non-owning refs instead of marking untrusted. In the
>>> future we may want to allow untrusted non-owning refs in which case we
>>> can remove this custom logic without breaking BPF programs as it's more
>>> restrictive than the default. That's a big change in semantics, though,
>>> and this series is focused on fixing the use-after-free in the most
>>> straightforward way.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    include/linux/bpf.h   |  3 ++-
>>>    kernel/bpf/verifier.c | 17 +++++++++++++++--
>>>    2 files changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index ceaa8c23287f..37fba01b061a 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>>
>>>        /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>> -     * Currently only valid for linked-list and rbtree nodes.
>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>
>> What does 'must' here mean?
>>
> 
> Meaning that if there's any NON_OWN_REF-flagged
> PTR_TO_BTF_ID which points to a struct with a bpf_refcount field,
> it should also be flagged with MEM_RCU. If it isn't, it's a
> verifier error.
> 
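To make the rule above concrete, here is a rough userspace toy model, not
verifier code. All toy_* names and the TOY_* flag values are hypothetical
stand-ins; only the idea (a NON_OWN_REF reg pointing to a struct with a
bpf_refcount field must also carry MEM_RCU) comes from the discussion.

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_RCU     (1u << 0)  /* stands in for MEM_RCU */
#define TOY_NON_OWN_REF (1u << 1)  /* stands in for NON_OWN_REF */

struct toy_reg {
	uint32_t type_flags;
	int refcount_off;  /* < 0 means the pointee has no bpf_refcount field */
};

/* Mirrors the shape of the patched ref_set_non_owning(): tag the reg as
 * non-owning, and additionally as RCU-protected if the node is refcounted.
 */
static void toy_ref_set_non_owning(struct toy_reg *reg)
{
	assert(reg != NULL);
	reg->type_flags |= TOY_NON_OWN_REF;
	if (reg->refcount_off >= 0)
		reg->type_flags |= TOY_MEM_RCU;
}

/* The invariant: NON_OWN_REF + refcounted pointee implies MEM_RCU. */
static bool toy_non_own_ref_is_consistent(const struct toy_reg *reg)
{
	if (!(reg->type_flags & TOY_NON_OWN_REF))
		return true;
	if (reg->refcount_off < 0)
		return true;
	return (reg->type_flags & TOY_MEM_RCU) != 0;
}
```

In this model a reg that is tagged TOY_NON_OWN_REF by hand, without going
through toy_ref_set_non_owning(), fails the consistency check when its
pointee is refcounted, which is the situation described as a verifier error.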
>>>         */
>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 9014b469dd9d..4bda365000d3 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -469,7 +469,8 @@ static bool type_is_ptr_alloc_obj(u32 type)
>>>
>>>    static bool type_is_non_owning_ref(u32 type)
>>>    {
>>> -    return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
>>> +    return type_is_ptr_alloc_obj(type) &&
>>> +        type_flag(type) & NON_OWN_REF;
>>
>> There is no code change here.
>>
> 
> Yep, will undo in v2.
> 
>>>    }
>>>
>>>    static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>> @@ -8012,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>        case PTR_TO_BTF_ID | MEM_RCU:
>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>             * its fixed offset must be 0. In the other cases, fixed offset
>>>             * can be non-zero. This was already checked above. So pass
>>> @@ -10478,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>    {
>>>        struct bpf_verifier_state *state = env->cur_state;
>>> +    struct btf_record *rec = reg_btf_record(reg);
>>>
>>>        if (!state->active_lock.ptr) {
>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>> @@ -10490,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>        }
>>>
>>>        reg->type |= NON_OWN_REF;
>>> +    if (rec->refcount_off >= 0)
>>> +        reg->type |= MEM_RCU;
>>
>> Should we check whether the state is in rcu cs before marking MEM_RCU?
>>
> 
> I think this is implicitly being enforced.
> Rbtree/list kfuncs must be called under bpf_spin_lock,
> and this series requires bpf_spin_{lock,unlock} helpers
> to be called in RCU CS if the BPF prog is sleepable.

I see.

Alexei mentioned earlier that there is no need to put
bpf_spin_lock inside an RCU CS if, for sleepable programs,
we do preempt_disable() before the real arch_spin_lock().
This is similar to the regular spin_lock/raw_spin_lock,
which do preempt_disable() so that the
spin_lock/spin_unlock region becomes an atomic
region in which blocking is not allowed.

Not sure whether you want to implement this in this
patch set or in a later one.
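
As a rough sketch of that idea (a userspace toy model, not the kernel
implementation; struct toy_cpu and all toy_* functions are hypothetical
names), disabling preemption before taking the lock makes the whole
locked region non-sleepable:

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	int preempt_count;  /* > 0 means preemption is disabled */
	bool lock_held;     /* stands in for the arch spinlock state */
};

static void toy_preempt_disable(struct toy_cpu *c)
{
	c->preempt_count++;
}

static void toy_preempt_enable(struct toy_cpu *c)
{
	assert(c->preempt_count > 0);
	c->preempt_count--;
}

/* Disable preemption first, then take the "real" lock. */
static void toy_bpf_spin_lock(struct toy_cpu *c)
{
	toy_preempt_disable(c);
	assert(!c->lock_held);
	c->lock_held = true;  /* stands in for arch_spin_lock() */
}

/* Release in the reverse order. */
static void toy_bpf_spin_unlock(struct toy_cpu *c)
{
	assert(c->lock_held);
	c->lock_held = false;  /* stands in for arch_spin_unlock() */
	toy_preempt_enable(c);
}

/* Blocking is only allowed while preemption is enabled. */
static bool toy_may_sleep(const struct toy_cpu *c)
{
	return c->preempt_count == 0;
}
```

In this model toy_may_sleep() is false between lock and unlock regardless
of whether an RCU read-side critical section is open, which is the property
the suggestion relies on.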

> 
>>> +
>>>        return 0;
>>>    }
>>>
>>> @@ -11327,10 +11333,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>            struct bpf_func_state *state;
>>>            struct bpf_reg_state *reg;
>>>
>>> +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>>> +            verbose(env, "can't rcu read {lock,unlock} in rbtree cb\n");
>>> +            return -EACCES;
>>> +        }
>>> +
>>>            if (rcu_lock) {
>>>                verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>>                return -EINVAL;
>>>            } else if (rcu_unlock) {
>>> +            invalidate_non_owning_refs(env);
>>
>> If we have both spin lock and rcu like
>>
>>       bpf_rcu_read_lock()
>>       ...
>>       bpf_spin_lock()
>>       ...
>>       bpf_spin_unlock()  <=== invalidate all non_owning_refs
>>       ...                <=== MEM_RCU type is gone
>>       bpf_rcu_read_unlock()
>>
>> Maybe we could fine tune here to preserve MEM_RCU after bpf_spin_unlock()?
>>
> 
> IIUC, you're saying that we should no longer
> have non-owning refs get clobbered after bpf_spin_unlock,
> and instead just have rcu_read_unlock do its default
> "MEM_RCU refs become PTR_UNTRUSTED" logic.
> 
> In the cover letter I mention that this is probably
> the direction we want to go in the long term, in
> the comments on patch 3:
> 
>    This might
>    allow custom non-owning ref lifetime + invalidation logic to be
>    entirely subsumed by MEM_RCU handling.
> 
> But I'm hesitant to do that in this fixes series
> as I'd like to minimize changes that could introduce
> additional bugs. This series' current changes keep the
> clobbering rules effectively unchanged - can always
> loosen them in the future. Also, I think we should
> make this change for _all_ non-owning refs, (w/ and w/o
> bpf_refcount field). Otherwise the verifier lifetime
> of non-owning refs would change if BPF program writer
> adds bpf_refcount field to their struct, or removes it.
>   
> 
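For reference, the clobbering rules being discussed can be modelled in
userspace roughly as follows. This is a toy sketch, not verifier code: the
toy_* names and TOY_* flags are hypothetical, and the model assumes only what
the patch description states (non-owning refs are clobbered at both
bpf_spin_unlock and bpf_rcu_read_unlock; other MEM_RCU regs become untrusted
at bpf_rcu_read_unlock).

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_RCU       (1u << 0)
#define TOY_NON_OWN_REF   (1u << 1)
#define TOY_PTR_UNTRUSTED (1u << 2)

struct toy_reg {
	int valid;       /* 0 once the reg has been clobbered */
	uint32_t flags;
};

/* Clobber every non-owning ref so it cannot be used afterwards. */
static void toy_invalidate_non_owning_refs(struct toy_reg *regs, size_t n)
{
	assert(regs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (regs[i].valid && (regs[i].flags & TOY_NON_OWN_REF)) {
			regs[i].valid = 0;
			regs[i].flags = 0;
		}
	}
}

/* bpf_spin_unlock: non-owning refs die here, even inside an RCU CS. */
static void toy_spin_unlock(struct toy_reg *regs, size_t n)
{
	toy_invalidate_non_owning_refs(regs, n);
}

/* bpf_rcu_read_unlock: clobber non-owning refs first, then downgrade
 * the remaining MEM_RCU regs to untrusted.
 */
static void toy_rcu_read_unlock(struct toy_reg *regs, size_t n)
{
	toy_invalidate_non_owning_refs(regs, n);
	for (size_t i = 0; i < n; i++) {
		if (regs[i].valid && (regs[i].flags & TOY_MEM_RCU)) {
			regs[i].flags &= ~TOY_MEM_RCU;
			regs[i].flags |= TOY_PTR_UNTRUSTED;
		}
	}
}
```

With the nesting from the example above, the non-owning ref is already gone
after toy_spin_unlock(), so its MEM_RCU tag never gets a chance to matter at
toy_rcu_read_unlock(); an ordinary MEM_RCU reg survives until then.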
>>>                bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>>                    if (reg->type & MEM_RCU) {
>>>                        reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
>>> @@ -16679,7 +16691,8 @@ static int do_check(struct bpf_verifier_env *env)
>>>                        return -EINVAL;
>>>                    }
>>>
>>> -                if (env->cur_state->active_rcu_lock) {
>>> +                if (env->cur_state->active_rcu_lock &&
>>> +                    !in_rbtree_lock_required_cb(env)) {
>>>                        verbose(env, "bpf_rcu_read_unlock is missing\n");
>>>                        return -EINVAL;
>>>                    }
