From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-184.mta0.migadu.com (out-184.mta0.migadu.com [91.218.175.184]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FC191E764B for ; Tue, 5 Nov 2024 21:26:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.184 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730841987; cv=none; b=kbnnQhArNNstRB6CBhgpk26Tz31i8FqVPipbdIdbh5QaCChflFyFGxclKbIfoapTmHSjJmFZuecJv+KAHPBoO8uSV/tvo64dX25hCvoZJj0REjU9ZIHzms0wsVY1I2uekHruH+mvFwqAr43grJQdbxax/xBV1uAsh3gApNQ24wg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730841987; c=relaxed/simple; bh=nGO8UXE/fPO8ZIe6/YysgMEr/3M5HgGLL3kATjFDtqI=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=axvUGldQiIidp6/P4/puO/2guBTToMnjVw9sKNP7YN8Zq7l8W7x9aRlpJ+GRwTyUQwZneXVE6d7ti/Ro6la1KAoMColJ3plhwQ4iYb8V0S3Qpb47/XAhfrT1IX511itVnLviv3FKez0P0ZY6yABY7YRurFa2Xv2UP/Jvz92tSb4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=enGuFvZg; arc=none smtp.client-ip=91.218.175.184 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="enGuFvZg" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1730841982; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dkvWl+gXWFVRUzF+WhKtdAw2kYUpomMZvBF7n6HCqXw=; b=enGuFvZgB0Wm7+xKSMsI0LGFw0ykKTMa8GuNasn62E94bzvV79vHInMku4+eaY/3FSLWhc 0f1CKou7o0nXhkcaga9QkzqXWcJew61IkD2hFfvOP96D5/zlT7rkJ7wXBdyDyTX/+aVvuH zJa3lvOD0J7IAM4BKUn04b90xHi42+k= Date: Tue, 5 Nov 2024 13:26:14 -0800 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback Content-Language: en-GB To: Alexei Starovoitov Cc: bpf , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Kernel Team , Martin KaFai Lau , Tejun Heo References: <20241104193455.3241859-1-yonghong.song@linux.dev> <20241104193515.3243315-1-yonghong.song@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 11/5/24 12:26 PM, Alexei Starovoitov wrote: > On Mon, Nov 4, 2024 at 7:37 PM Yonghong Song wrote: >> >> On 11/4/24 6:51 PM, Alexei Starovoitov wrote: >>> On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song wrote: >>>> In previous patch, tracing progs are enabled for private stack since >>>> recursion checking ensures there exists no nested same bpf prog run on >>>> the same cpu. >>>> >>>> But it is still possible for nested bpf subprog run on the same cpu >>>> if the same subprog is called in both main prog and async callback, >>>> or in different async callbacks. For example, >>>> main_prog >>>> bpf_timer_set_callback(timer, timer_cb); >>>> call sub1 >>>> sub1 >>>> ... >>>> time_cb >>>> call sub1 >>>> >>>> In the above case, nested subprog run for sub1 is possible with one in >>>> process context and the other in softirq context. If this is the case, >>>> the verifier will disable private stack for this bpf prog. >>>> >>>> Signed-off-by: Yonghong Song >>>> --- >>>> include/linux/bpf_verifier.h | 2 ++ >>>> kernel/bpf/verifier.c | 42 +++++++++++++++++++++++++++++++----- >>>> 2 files changed, 39 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h >>>> index 0622c11a7e19..e921589abc72 100644 >>>> --- a/include/linux/bpf_verifier.h >>>> +++ b/include/linux/bpf_verifier.h >>>> @@ -669,6 +669,8 @@ struct bpf_subprog_info { >>>> /* true if bpf_fastcall stack region is used by functions that can't be inlined */ >>>> bool keep_fastcall_stack: 1; >>>> bool use_priv_stack: 1; >>>> + bool visited_with_priv_stack_accum: 1; >>>> + bool visited_with_priv_stack: 1; >>>> >>>> u8 arg_cnt; >>>> struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >>>> index 406195c433ea..e01b3f0fd314 100644 >>>> --- a/kernel/bpf/verifier.c >>>> +++ b/kernel/bpf/verifier.c >>>> @@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx, >>>> idx, subprog_depth); >>>> return -EACCES; >>>> } >>>> - if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) >>>> + if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) { >>>> subprog[idx].use_priv_stack = true; >>>> + subprog[idx].visited_with_priv_stack = true; >>>> + } >>>> + } else { >>>> + subprog[idx].visited_with_priv_stack = true; >>> See suggestion for patch 3. >>> It's cleaner to rewrite with a single visited_with_priv_stack = true; statement. >> Ack. >> >>>> } >>>> } >>>> continue_func: >>>> @@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx, >>>> static int check_max_stack_depth(struct bpf_verifier_env *env) >>>> { >>>> struct bpf_subprog_info *si = env->subprog_info; >>>> + enum priv_stack_mode orig_priv_stack_supported; >>>> enum priv_stack_mode priv_stack_supported; >>>> int ret, subtree_depth = 0, depth_frame; >>>> >>>> priv_stack_supported = bpf_enable_priv_stack(env->prog); >>>> + orig_priv_stack_supported = priv_stack_supported; >>>> >>>> if (priv_stack_supported != NO_PRIV_STACK) { >>>> for (int i = 0; i < env->subprog_cnt; i++) { >>>> @@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env) >>>> priv_stack_supported); >>>> if (ret < 0) >>>> return ret; >>>> + >>>> + if (priv_stack_supported != NO_PRIV_STACK) { >>>> + for (int j = 0; j < env->subprog_cnt; j++) { >>>> + if (si[j].visited_with_priv_stack_accum && >>>> + si[j].visited_with_priv_stack) { >>>> + /* si[j] is visited by both main/async subprog >>>> + * and another async subprog. >>>> + */ >>>> + priv_stack_supported = NO_PRIV_STACK; >>>> + break; >>>> + } >>>> + if (!si[j].visited_with_priv_stack_accum) >>>> + si[j].visited_with_priv_stack_accum = >>>> + si[j].visited_with_priv_stack; >>>> + } >>>> + } >>>> + if (priv_stack_supported != NO_PRIV_STACK) { >>>> + for (int j = 0; j < env->subprog_cnt; j++) >>>> + si[j].visited_with_priv_stack = false; >>>> + } >>> I cannot understand what this algorithm is doing. >>> What is the meaning of visited_with_priv_stack_accum ? >> The following is an example to show how the algorithm works. >> Let us say we have prog like >> main_prog0 si[0] >> sub1 si[1] >> sub2 si[2] >> async1 si[3] >> sub4 si[4] >> sub2 si[2] >> async2 si[5] >> sub4 si[4] >> sub5 si[6] >> >> >> Total 9 subprograms. >> >> after iteration 1 (main_prog0) >> visited_with_priv_stack_accum: si[i] = false for i = 0 ... 9 >> visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false >> >> for all i, visited_with_priv_stack_accum[i] and visited_with_priv_stack[i] >> is false, so main_prog0 can use priv stack. >> >> visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false >> visited_with_priv_stack cleared with false. >> >> after iteration 2 (async1) >> visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false >> visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false >> >> Here, si[2] appears in both visited_with_priv_stack_accum and >> visited_with_priv_stack, so async1 cannot have priv stack. >> >> In my algorithm, I flipped the whole thing to no_priv_stack, which is >> too conservative. We should just skip async1 and continues. >> >> Let us say, we say async1 not having priv stack while main_prog0 has. >> >> /* the same as end of iteration 1 */ >> visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false >> visited_with_priv_stack cleared with false. >> >> after iteration 3 (async2) >> visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false >> visited_with_priv_stack: si[4] = si[5] = si[6] = true; >> >> there are no conflict, so async2 can use private stack. >> >> >> If we only have one bit in bpf_subprog_info, for a async tree, >> if marking a subprog to be true and later we found there is a conflict in >> async tree and we need make the whole async subprogs not eligible for priv stack, >> then it will be hard to undo previous markings. >> >> So visited_with_priv_stack_accum is to accumulate "true" results from >> main_prog/async's. > I see. I think it works, but feels complicated. > It feels it should be possible to do without extra flags. Like > check_max_stack_depth_subprog() will know whether it was called > to verify async_cb or not. > So it's just a matter of adding single 'if' to it: > if (subprog[idx].use_priv_stack && checking_async_cb) > /* reset to false due to potential recursion */ > subprog[idx].use_priv_stack = false; > > check_max_stack_depth() starts with i==0, > so reachable and eligible subprogs will be marked with use_priv_stack. > Then check_max_stack_depth_subprog() will be called again > to verify async. If it sees the mark it's a bad case. > what am I missing? First I think we still want to mark some subprogs in async tree to use private stack, right? If this is the case, then let us see the following examle: main_prog: sub1: use_priv_stack = true sub2" use_priv_stack = true async: /* calling sub1 twice */ sub1 <=== we do if (subprog[idx].use_priv_stack && checking_async_cb) subprog[idx].use_priv_stack = false; sub1 <=== here we have subprog[idx].use_priv_stack = false; we could mark use_priv_stack = true again here since logic didn't keep track of sub1 has been visited before. To solve the above issue, we need one visited bit in bpf_subprog_info. After finishing async tree, if for any subprog, visited_bit && subprog[idx].use_priv_stack is true, we can mark subprog[idx].use_priv_stack = false So one visited bit is enough. More complicated case is two asyncs. For example: main_prog: sub1 sub2 async1: sub3 async2: sub3 If async1/sub3 and async2/sub3 can be nested, then we will need two visited bits as I have above. If async1/sub3 and async2/sub3 cannot be nested, then one visited bit should be enough, since we can traverse async1/async2 with 'visited' marking and then compare against main prog. So the question would be: 1. Is it possible that two async call backs may nest with each other? I actually do not know the answer. 2. Do we want to allow subprogs in async tree to use private stacks?