From: Yonghong Song <yhs@meta.com>
To: John Fastabend <john.fastabend@gmail.com>,
Martin KaFai Lau <martin.lau@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
Date: Mon, 21 Nov 2022 20:16:13 -0800 [thread overview]
Message-ID: <88ffa4fe-948c-d3be-cb54-e3cf34cf3b06@meta.com> (raw)
In-Reply-To: <637c2dfe3277f_18ed920828@john.notmuch>
On 11/21/22 6:03 PM, John Fastabend wrote:
> Yonghong Song wrote:
>>
>>
>> On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
>>> On 11/21/22 12:01 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
>>>>> On 11/21/22 9:05 AM, Yonghong Song wrote:
>>>>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct
>>>>>> bpf_verifier_env *env,
>>>>>> return -EACCES;
>>>>>> }
>>>>>> + /* Access rcu protected memory */
>>>>>> + if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>>>>>> + !env->cur_state->active_rcu_lock) {
>>>>>> + verbose(env,
>>>>>> + "R%d is ptr_%s access rcu-protected memory with off=%d,
>>>>>> not rcu protected\n",
>>>>>> + regno, tname, off);
>>>>>> + return -EACCES;
>>>>>> + }
>>>>>> +
>>>>>> if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>>>>>> if (!btf_is_kernel(reg->btf)) {
>>>>>> verbose(env, "verifier internal error: reg->btf must
>>>>>> be kernel btf\n");
>>>>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct
>>>>>> bpf_verifier_env *env,
>>>>>> if (ret < 0)
>>>>>> return ret;
>>>>>> + /* The value is a rcu pointer. The load needs to be in a rcu
>>>>>> lock region,
>>>>>> + * similar to rcu_dereference().
>>>>>> + */
>>>>>> + if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
>>>>>> !env->cur_state->active_rcu_lock) {
>>>>>> + verbose(env,
>>>>>> + "R%d is rcu dereference ptr_%s with off=%d, not in
>>>>>> rcu_read_lock region\n",
>>>>>> + regno, tname, off);
>>>>>> + return -EACCES;
>>>>>> + }
>>>>>
>>>>> Would this make the existing rdonly use case fail?
>>>>>
>>>>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>>>> int task_real_parent(void *ctx)
>>>>> {
>>>>> struct task_struct *task, *real_parent;
>>>>>
>>>>> task = bpf_get_current_task_btf();
>>>>> real_parent = task->real_parent;
>>>>> bpf_printk("pid %u\n", real_parent->pid);
>>>>> return 0;
>>>>> }
>>>>
>>>> Right, it will fail. To fix the issue, user can do
>>>> bpf_rcu_read_lock();
>>>> real_parent = task->real_parent;
>>>> bpf_printk("pid %u\n", real_parent->pid);
>>>> bpf_rcu_read_unlock();
>>>>
>>>> But this raised a good question. How do we deal with
>>>> legacy sleepable programs with newly-added rcu tagging
>>>> capabilities.
>>>>
>>>> My current option is to error out if rcu usage is not right.
>>>> But this might break existing sleepable programs.
>>>>
>>>> Another option intends to not break existing, like above,
>>>> codes. In this case, MEM_RCU will not tagged if it is
>>>> not inside bpf_rcu_read_lock() region.
>>>
>>> hmm.... it is to make MEM_RCU to mean a reg is protected by the current
>>> active_rcu_lock or not?
>>
>> Yes, for example, in 'real_parent = task->real_parent' where
>> 'real_parent' in task_struct is tagged with __rcu in the struct
>> definition. So the 'real_parent' variable in the above assignment
>> will be tagged with MEM_RCU.
>>
>>>
>>>> In this case, the above non-rcu-protected code should work. And the
>>>> following should work as well although it is a little
>>>> bit awkward.
>>>> real_parent = task->real_parent; // real_parent not tagged with rcu
>>>> bpf_rcu_read_lock();
>>>> bpf_printk("pid %u\n", real_parent->pid);
>>>> bpf_rcu_read_unlock();
>>>
>>> I think it should be fine. bpf_rcu_read_lock() just not useful in this
>>> example but nothing break or crash. Also, after bpf_rcu_read_unlock(),
>>> real_parent will continue to be readable because the MEM_RCU is not set?
>>
>> That is correct. the variable real_parent is not tagged with MEM_RCU
>> and it will stay that way for the rest of its life cycle.
>>
>> With new PTR_TRUSTED mechanism, real_parent will be marked as normal
>> PTR_TO_BTF_ID and it is not marked as PTR_UNTRUSTED for backward
>> compatibility. So in the above code, real_parent->pid is just a normal
>> load (not related to rcu/trusted/untrusted). People may think it
>> is okay, but actually it does not okay. Verifier could add more state
>> to issue proper warnings, but I am not sure whether it is worthwhile
>> or not. As you mentioned, nothing breaks. It is just the current
>> existing way. So we should be able to live with this.
>>
>>>
>>> On top of the active_rcu_lock, should MEM_RCU be set only when it is
>>> dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
>>
>> I didn't consider PTR_TRUSTED because it is just introduced yesterday...
>>
>> My current implementation inherits the old ptr_to_btf_id way where by
>> default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
>> we should be able to use it for a stronger guarantee.
>>
>>> I am thinking about the following more common case:
>>>
>>> /* bpf_get_current_task_btf() may need to be changed
>>> * to set PTR_TRUSTED at the retval?
>>> */
>>> /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
>>> task = bpf_get_current_task_btf();
>>>
>>> bpf_rcu_read_lock();
>>>
>>> /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
>>> real_parent = task->real_parent;
>>>
>>> /* bpf_task_acquire() needs to change to use
>>> refcount_inc_not_zero */
>>> real_parent = bpf_task_acquire(real_parent);
>>>
>>> bpf_rcu_read_unlock();
>>>
>>> /* real_parent is accessible here (after checking NULL) and
>>> * can be passed to kfunc
>>> */
>>>
>>
>> Yes, the above is a typical use case. Or alternatively after
>> real_parent = task->real_parent;
>> /* use real_parent inside the bpf_rcu_read_lock() region */
>>
>> I will try to utilize PTR_TRUSTED concept in the next revision.
>
> Also perhaps interesting is when task is read out of a map
> with reference already pinned. I think you should clear
> the MEM_RCU tag on all referenced objects?
The register tagged with MEM_RCU will not be a referenced obj.
MEM_RCU tag only appears to a register inside the rcu read lock
region as the rcu_reference() result. So the obj tagged with
MEM_RCU is protected with rcu read lock and it is valid and
trusted and there is no need to acquire additional reference.
If user calls another kfunc to acquire a reference, then
the resulted ptr will not have MEM_RCU tag but with non-zero
ref_obj_id.
The MEM_RCU reg will be invalidated when seeing bpf_rcu_read_unlock()
to prevent rcu-protected ptr to leak out of the rcu read lock region.
next prev parent reply other threads:[~2022-11-22 4:16 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
2022-11-22 17:06 ` KP Singh
2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
2022-11-21 19:41 ` Martin KaFai Lau
2022-11-21 20:01 ` Yonghong Song
2022-11-21 22:56 ` Martin KaFai Lau
2022-11-21 23:42 ` Yonghong Song
2022-11-22 2:03 ` John Fastabend
2022-11-22 4:16 ` Yonghong Song [this message]
2022-11-22 5:48 ` Alexei Starovoitov
2022-11-22 6:32 ` Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
2022-11-22 1:59 ` John Fastabend
2022-11-22 4:09 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=88ffa4fe-948c-d3be-cb54-e3cf34cf3b06@meta.com \
--to=yhs@meta.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=john.fastabend@gmail.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=martin.lau@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox