BPF List
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@meta.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Yonghong Song <yhs@fb.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v2 5/8] bpf: Add bpf_rcu_read_lock() verifier support
Date: Tue, 8 Nov 2022 12:40:42 -0800	[thread overview]
Message-ID: <bb6a4598-2e28-9b95-bd23-ac6ed3b87260@meta.com> (raw)
In-Reply-To: <20221108201938.byemttanmpbh3gn4@apollo>



On 11/8/22 12:19 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Nov 09, 2022 at 01:33:04AM IST, Yonghong Song wrote:
>>
>>
>> On 11/8/22 9:04 AM, Kumar Kartikeya Dwivedi wrote:
>>> On Tue, Nov 08, 2022 at 01:11:14PM IST, Yonghong Song wrote:
>>>> To simplify the design and support the common practice, no
>>>> nested bpf_rcu_read_lock() is allowed. During verification,
>>>> each paired bpf_rcu_read_lock()/unlock() has a unique
>>>> region id, starting from 1. Each rcu ptr register also
>>>> remembers the region id when the ptr reg is initialized.
>>>> The following is a simple example to illustrate the
>>>> rcu lock regions and usage of rcu ptr's.
>>>>
>>>>        ...                    <=== rcu lock region 0
>>>>        bpf_rcu_read_lock()    <=== rcu lock region 1
>>>>        rcu_ptr1 = ...         <=== rcu_ptr1 with region 1
>>>>        ... using rcu_ptr1 ...
>>>>        bpf_rcu_read_unlock()
>>>>        ...                    <=== rcu lock region -1
>>>>        bpf_rcu_read_lock()    <=== rcu lock region 2
>>>>        rcu_ptr2 = ...         <=== rcu_ptr2 with region 2
>>>>        ... using rcu_ptr2 ...
>>>>        ... using rcu_ptr1 ... <=== wrong, region 1 rcu_ptr in region 2
>>>>        bpf_rcu_read_unlock()
>>>>
>>>> Outside the rcu lock region, the rcu lock region id is 0 or negative of
>>>> previous valid rcu lock region id, so the next valid rcu lock region
>>>> id can be easily computed.
>>>>
>>>> Note that rcu protection is not needed for non-sleepable program. But
>>>> it is supported to make cross-sleepable/nonsleepable development easier.
>>>> For non-sleepable program, the following insns can be inside the rcu
>>>> lock region:
>>>>     - any non call insns except BPF_ABS/BPF_IND
>>>>     - non sleepable helpers or kfuncs
>>>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
>>>> allocation flag) should be GFP_ATOMIC.
>>>>
>>>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
>>>> this pointer and the load which gets this pointer needs to be
>>>> protected by bpf_rcu_read_lock(). The following shows a couple
>>>> of examples:
>>>>     struct task_struct {
>>>>           ...
>>>>           struct task_struct __rcu        *real_parent;
>>>>           struct css_set __rcu            *cgroups;
>>>>           ...
>>>>     };
>>>>     struct css_set {
>>>>           ...
>>>>           struct cgroup *dfl_cgrp;
>>>>           ...
>>>>     }
>>>>     ...
>>>>     task = bpf_get_current_task_btf();
>>>>     cgroups = task->cgroups;
>>>>     dfl_cgroup = cgroups->dfl_cgrp;
>>>>     ... using dfl_cgroup ...
>>>>
>>>> The bpf_rcu_read_lock/unlock() should be added like below to
>>>> avoid verification failures.
>>>>     task = bpf_get_current_task_btf();
>>>>     bpf_rcu_read_lock();
>>>>     cgroups = task->cgroups;
>>>>     dfl_cgroup = cgroups->dfl_cgrp;
>>>>     bpf_rcu_read_unlock();
>>>>     ... using dfl_cgroup ...
>>>>
>>>> The following is another example for task->real_parent.
>>>>     task = bpf_get_current_task_btf();
>>>>     bpf_rcu_read_lock();
>>>>     real_parent = task->real_parent;
>>>>     ... bpf_task_storage_get(&map, real_parent, 0, 0);
>>>>     bpf_rcu_read_unlock();
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    include/linux/bpf.h          |  1 +
>>>>    include/linux/bpf_verifier.h |  7 +++
>>>>    kernel/bpf/btf.c             | 32 ++++++++++++-
>>>>    kernel/bpf/verifier.c        | 92 +++++++++++++++++++++++++++++++-----
>>>>    4 files changed, 120 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index b4bbcafd1c9b..98af0c9ec721 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -761,6 +761,7 @@ struct bpf_prog_ops {
>>>>    struct btf_struct_access_info {
>>>>    	u32 next_btf_id;
>>>>    	enum bpf_type_flag flag;
>>>> +	bool is_rcu;
>>>>    };
>>>>
>>>>    struct bpf_verifier_ops {
>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>> index 1a32baa78ce2..5d703637bb12 100644
>>>> --- a/include/linux/bpf_verifier.h
>>>> +++ b/include/linux/bpf_verifier.h
>>>> @@ -179,6 +179,10 @@ struct bpf_reg_state {
>>>>    	 */
>>>>    	s32 subreg_def;
>>>>    	enum bpf_reg_liveness live;
>>>> +	/* 0: not rcu ptr; > 0: rcu ptr, id of the rcu read lock region where
>>>> +	 * the rcu ptr reg is initialized.
>>>> +	 */
>>>> +	int active_rcu_lock;
>>>>    	/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
>>>>    	bool precise;
>>>>    };
>>>> @@ -324,6 +328,8 @@ struct bpf_verifier_state {
>>>>    	u32 insn_idx;
>>>>    	u32 curframe;
>>>>    	u32 active_spin_lock;
>>>> +	/* <= 0: not in rcu read lock region; > 0: the rcu lock region id */
>>>> +	int active_rcu_lock;
>>>>    	bool speculative;
>>>>
>>>>    	/* first and last insn idx of this verifier state */
>>>> @@ -424,6 +430,7 @@ struct bpf_insn_aux_data {
>>>>    	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
>>>>    	bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
>>>>    	bool zext_dst; /* this insn zero extends dst reg */
>>>> +	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
>>>>    	u8 alu_state; /* used in combination with alu_limit */
>>>>
>>>>    	/* below fields are initialized once */
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index d2ee1669a2f3..c5a9569f2ae0 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -5831,6 +5831,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>>>>    		if (btf_type_is_ptr(mtype)) {
>>>>    			const struct btf_type *stype, *t;
>>>>    			enum bpf_type_flag tmp_flag = 0;
>>>> +			bool is_rcu = false;
>>>>    			u32 id;
>>>>
>>>>    			if (msize != size || off != moff) {
>>>> @@ -5850,12 +5851,16 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>>>>    				/* check __percpu tag */
>>>>    				if (strcmp(tag_value, "percpu") == 0)
>>>>    					tmp_flag = MEM_PERCPU;
>>>> +				/* check __rcu tag */
>>>> +				if (strcmp(tag_value, "rcu") == 0)
>>>> +					is_rcu = true;
>>>>    			}
>>>>
>>>>    			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
>>>>    			if (btf_type_is_struct(stype)) {
>>>>    				info->next_btf_id = id;
>>>>    				info->flag = tmp_flag;
>>>> +				info->is_rcu = is_rcu;
>>>>    				return WALK_PTR;
>>>>    			}
>>>>    		}
>>>> @@ -6317,7 +6322,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>>>>    {
>>>>    	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>>>>    	bool rel = false, kptr_get = false, trusted_args = false;
>>>> -	bool sleepable = false;
>>>> +	bool sleepable = false, rcu_lock = false, rcu_unlock = false;
>>>>    	struct bpf_verifier_log *log = &env->log;
>>>>    	u32 i, nargs, ref_id, ref_obj_id = 0;
>>>>    	bool is_kfunc = btf_is_kernel(btf);
>>>> @@ -6356,6 +6361,31 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>>>>    		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
>>>>    		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
>>>>    		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
>>>> +		rcu_lock = kfunc_meta->flags & KF_RCU_LOCK;
>>>> +		rcu_unlock = kfunc_meta->flags & KF_RCU_UNLOCK;
>>>> +	}
>>>> +
>>>> +	/* checking rcu read lock/unlock */
>>>> +	if (env->cur_state->active_rcu_lock > 0) {
>>>> +		if (rcu_lock) {
>>>> +			bpf_log(log, "nested rcu read lock (kernel function %s)\n", func_name);
>>>> +			return -EINVAL;
>>>> +		} else if (rcu_unlock) {
>>>> +			/* change active_rcu_lock to its corresponding negative value to
>>>> +			 * preserve the previous lock region id.
>>>> +			 */
>>>> +			env->cur_state->active_rcu_lock = -env->cur_state->active_rcu_lock;
>>>> +		} else if (sleepable) {
>>>> +			bpf_log(log, "kernel func %s is sleepable within rcu_read_lock region\n",
>>>> +				func_name);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +	} else if (rcu_lock) {
>>>> +		/* a new lock region started, increase the region id. */
>>>> +		env->cur_state->active_rcu_lock = (-env->cur_state->active_rcu_lock) + 1;
>>>> +	} else if (rcu_unlock) {
>>>> +		bpf_log(log, "unmatched rcu read unlock (kernel function %s)\n", func_name);
>>>> +		return -EINVAL;
>>>>    	}
>>>>
>>>
>>> Can you provide more context on why having ids is better than simply
>>> invalidating the registers when the section ends, and making active_rcu_lock a
>>> boolean instead? You can use bpf_for_each_reg_in_vstate to find every reg having
>>> MEM_RCU and mark it unknown.
>>
>> I think we also need to invalidate rcu-ptr related states as well in spills.
>>
>> I also tried to support cases like:
>> 	bpf_rcu_read_lock();
>> 	rcu_ptr = ...
>> 	   ... rcu_ptr ...
>> 	bpf_rcu_read_unlock();
>> 	... rcu_ptr ... /* no load, just use the rcu_ptr somehow */
>>
>> In the above case, outside the rcu read lock region, there is no
>> load with rcu_ptr but it can still be used for other purposes
>> with a property of a pointer.
>>
>> But for a second thought, it should be okay to invalidate
>> rcu_ptr during bpf_rcu_read_unlock() as a scalar. This should
>> satisfy almost all (if not all) cases.
>>
>>>
>>> You won't have to match the id in btf_struct_access as such registers won't ever
>>> reach that function (if marked unknown on invalidation, they become scalars).
>>> The reg state won't need another active_rcu_lock member either, it is simply
>>> part of reg->type.
>>
>> Right, if I don't maintain region id's, no need to have reg->active_rcu_lock
>> and using MEM_RCU should be enough.
>>
>>>
>>> It seems to that simply invalidating registers when rcu_read_unlock is called is
>>> both less code to write and simpler to understand.
>>
>> invalidating rcu_ptr in registers and spills.
>>
> 
> If you use bpf_for_each_reg_in_vstate, it should cover both.

Just checked the macro implementation. Yes, it covers both reg and 
spills. Thanks for mentioning bpf_for_each_reg_in_vstate which I
am not aware of.

  reply	other threads:[~2022-11-08 20:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08  7:40 [PATCH bpf-next v2 0/8] bpf: Add bpf_rcu_read_lock() support Yonghong Song
2022-11-08  7:40 ` [PATCH bpf-next v2 1/8] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
2022-11-08  7:40 ` [PATCH bpf-next v2 2/8] bpf: Refactor btf_struct_access callback interface Yonghong Song
2022-11-08  7:41 ` [PATCH bpf-next v2 3/8] bpf: Abstract out functions to check sleepable helpers Yonghong Song
2022-11-08 10:43   ` kernel test robot
2022-11-08 14:15   ` kernel test robot
2022-11-08  7:41 ` [PATCH bpf-next v2 4/8] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
2022-11-08 16:56   ` Alexei Starovoitov
2022-11-08 19:09     ` Yonghong Song
2022-11-08 17:09   ` Kumar Kartikeya Dwivedi
2022-11-08 19:08     ` Yonghong Song
2022-11-08  7:41 ` [PATCH bpf-next v2 5/8] bpf: Add bpf_rcu_read_lock() verifier support Yonghong Song
2022-11-08 17:04   ` Kumar Kartikeya Dwivedi
2022-11-08 20:03     ` Yonghong Song
2022-11-08 20:19       ` Kumar Kartikeya Dwivedi
2022-11-08 20:40         ` Yonghong Song [this message]
2022-11-08  7:41 ` [PATCH bpf-next v2 6/8] bpf: Enable sleeptable support for cgrp local storage Yonghong Song
2022-11-08  7:41 ` [PATCH bpf-next v2 7/8] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
2022-11-08  7:41 ` [PATCH bpf-next v2 8/8] selftests/bpf: Add rcu_read_lock test to s390x deny list Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb6a4598-2e28-9b95-bd23-ac6ed3b87260@meta.com \
    --to=yhs@meta.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox