From: John Fastabend <john.fastabend@gmail.com>
To: Yonghong Song <yhs@meta.com>, Martin KaFai Lau <martin.lau@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
Date: Mon, 21 Nov 2022 18:03:42 -0800 [thread overview]
Message-ID: <637c2dfe3277f_18ed920828@john.notmuch> (raw)
In-Reply-To: <8166d67a-de10-7c6a-c0c5-976fbac37a55@meta.com>
Yonghong Song wrote:
>
>
> On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
> > On 11/21/22 12:01 PM, Yonghong Song wrote:
> >>
> >>
> >> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
> >>> On 11/21/22 9:05 AM, Yonghong Song wrote:
> >>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct
> >>>> bpf_verifier_env *env,
> >>>> return -EACCES;
> >>>> }
> >>>> + /* Access rcu protected memory */
> >>>> + if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> >>>> + !env->cur_state->active_rcu_lock) {
> >>>> + verbose(env,
> >>>> + "R%d is ptr_%s access rcu-protected memory with off=%d,
> >>>> not rcu protected\n",
> >>>> + regno, tname, off);
> >>>> + return -EACCES;
> >>>> + }
> >>>> +
> >>>> if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
> >>>> if (!btf_is_kernel(reg->btf)) {
> >>>> verbose(env, "verifier internal error: reg->btf must
> >>>> be kernel btf\n");
> >>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct
> >>>> bpf_verifier_env *env,
> >>>> if (ret < 0)
> >>>> return ret;
> >>>> + /* The value is a rcu pointer. The load needs to be in a rcu
> >>>> lock region,
> >>>> + * similar to rcu_dereference().
> >>>> + */
> >>>> + if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
> >>>> !env->cur_state->active_rcu_lock) {
> >>>> + verbose(env,
> >>>> + "R%d is rcu dereference ptr_%s with off=%d, not in
> >>>> rcu_read_lock region\n",
> >>>> + regno, tname, off);
> >>>> + return -EACCES;
> >>>> + }
> >>>
> >>> Would this make the existing rdonly use case fail?
> >>>
> >>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int task_real_parent(void *ctx)
> >>> {
> >>> struct task_struct *task, *real_parent;
> >>>
> >>> task = bpf_get_current_task_btf();
> >>> real_parent = task->real_parent;
> >>> bpf_printk("pid %u\n", real_parent->pid);
> >>> return 0;
> >>> }
> >>
> >> Right, it will fail. To fix the issue, user can do
> >> bpf_rcu_read_lock();
> >> real_parent = task->real_parent;
> >> bpf_printk("pid %u\n", real_parent->pid);
> >> bpf_rcu_read_unlock();
> >>
> >> But this raised a good question. How do we deal with
> >> legacy sleepable programs with newly-added rcu tagging
> >> capabilities.
> >>
> >> My current option is to error out if rcu usage is not right.
> >> But this might break existing sleepable programs.
> >>
> >> Another option intends to not break existing, like above,
> >> codes. In this case, MEM_RCU will not tagged if it is
> >> not inside bpf_rcu_read_lock() region.
> >
> > hmm.... it is to make MEM_RCU to mean a reg is protected by the current
> > active_rcu_lock or not?
>
> Yes, for example, in 'real_parent = task->real_parent' where
> 'real_parent' in task_struct is tagged with __rcu in the struct
> definition. So the 'real_parent' variable in the above assignment
> will be tagged with MEM_RCU.
>
> >
> >> In this case, the above non-rcu-protected code should work. And the
> >> following should work as well although it is a little
> >> bit awkward.
> >> real_parent = task->real_parent; // real_parent not tagged with rcu
> >> bpf_rcu_read_lock();
> >> bpf_printk("pid %u\n", real_parent->pid);
> >> bpf_rcu_read_unlock();
> >
> > I think it should be fine. bpf_rcu_read_lock() just not useful in this
> > example but nothing break or crash. Also, after bpf_rcu_read_unlock(),
> > real_parent will continue to be readable because the MEM_RCU is not set?
>
> That is correct. the variable real_parent is not tagged with MEM_RCU
> and it will stay that way for the rest of its life cycle.
>
> With new PTR_TRUSTED mechanism, real_parent will be marked as normal
> PTR_TO_BTF_ID and it is not marked as PTR_UNTRUSTED for backward
> compatibility. So in the above code, real_parent->pid is just a normal
> load (not related to rcu/trusted/untrusted). People may think it
> is okay, but actually it does not okay. Verifier could add more state
> to issue proper warnings, but I am not sure whether it is worthwhile
> or not. As you mentioned, nothing breaks. It is just the current
> existing way. So we should be able to live with this.
>
> >
> > On top of the active_rcu_lock, should MEM_RCU be set only when it is
> > dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
>
> I didn't consider PTR_TRUSTED because it is just introduced yesterday...
>
> My current implementation inherits the old ptr_to_btf_id way where by
> default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
> we should be able to use it for a stronger guarantee.
>
> > I am thinking about the following more common case:
> >
> > /* bpf_get_current_task_btf() may need to be changed
> > * to set PTR_TRUSTED at the retval?
> > */
> > /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
> > task = bpf_get_current_task_btf();
> >
> > bpf_rcu_read_lock();
> >
> > /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
> > real_parent = task->real_parent;
> >
> > /* bpf_task_acquire() needs to change to use
> > refcount_inc_not_zero */
> > real_parent = bpf_task_acquire(real_parent);
> >
> > bpf_rcu_read_unlock();
> >
> > /* real_parent is accessible here (after checking NULL) and
> > * can be passed to kfunc
> > */
> >
>
> Yes, the above is a typical use case. Or alternatively after
> real_parent = task->real_parent;
> /* use real_parent inside the bpf_rcu_read_lock() region */
>
> I will try to utilize PTR_TRUSTED concept in the next revision.
Also perhaps interesting is when task is read out of a map
with reference already pinned. I think you should clear
the MEM_RCU tag on all referenced objects?
next prev parent reply other threads:[~2022-11-22 2:03 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
2022-11-22 17:06 ` KP Singh
2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
2022-11-21 19:41 ` Martin KaFai Lau
2022-11-21 20:01 ` Yonghong Song
2022-11-21 22:56 ` Martin KaFai Lau
2022-11-21 23:42 ` Yonghong Song
2022-11-22 2:03 ` John Fastabend [this message]
2022-11-22 4:16 ` Yonghong Song
2022-11-22 5:48 ` Alexei Starovoitov
2022-11-22 6:32 ` Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
2022-11-22 1:59 ` John Fastabend
2022-11-22 4:09 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=637c2dfe3277f_18ed920828@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=martin.lau@linux.dev \
--cc=yhs@meta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.