Date: Thu, 23 Jan 2025 16:47:41 -0800
From: Martin KaFai Lau
Subject: Re: [PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter
To: "Matthieu Baerts (NGI0)", Geliang Tang
Cc: mptcp@lists.linux.dev, Mat Martineau, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
 Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
 Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Martin KaFai Lau
X-Mailing-List: netdev@vger.kernel.org
References: <20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-0-ae244d3cdbbc@kernel.org>
 <20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-4-ae244d3cdbbc@kernel.org>
In-Reply-To: <20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-4-ae244d3cdbbc@kernel.org>

On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
>
>         mptcp_for_each_subflow(msk, subflow)
>                 kfunc(subflow);
>
> But in the MPTCP BPF program, this has not yet been implemented.
> As Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
>
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> in BPF program like this:
>
>         bpf_for_each(mptcp_subflow, subflow, msk)
>                 kfunc(subflow);
>
> Suggested-by: Martin KaFai Lau
> Signed-off-by: Geliang Tang
> Reviewed-by: Mat Martineau
> Signed-off-by: Matthieu Baerts (NGI0)
> ---
> Notes:
>  - v2:
>    - Add BUILD_BUG_ON() checks, similar to the ones done with other
>      bpf_iter_(...) helpers.
>    - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
>      !spin_is_locked() (Martin).
>
> A few versions of this single patch have been previously posted to the
> BPF mailing list by Geliang, before continuing to the MPTCP mailing list
> only, with other patches of this series.
> The version of the whole series has been reset to 1, but here is the
> ChangeLog for the previous ones:
>  - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
>        drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
>  - v3: drop bpf_iter__mptcp_subflow
>  - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
>        it in _next() (Andrii)
>  - v5: use list_is_last() instead of list_entry_is_head() add
>        KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
>  - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
> ---
>  net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
>          .set = &bpf_mptcp_fmodret_ids,
>  };
>
> +struct bpf_iter_mptcp_subflow {
> +        __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_mptcp_subflow_kern {
> +        struct mptcp_sock *msk;
> +        struct list_head *pos;
> +} __aligned(8);
> +
>  __bpf_kfunc_start_defs();
>
>  __bpf_kfunc static struct mptcp_subflow_context *
> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
>          return NULL;
>  }
>
> +__bpf_kfunc static int
> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> +                           struct mptcp_sock *msk)
> +{
> +        struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +        struct sock *sk = (struct sock *)msk;
> +
> +        BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
> +                     sizeof(struct bpf_iter_mptcp_subflow));
> +        BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
> +                     __alignof__(struct bpf_iter_mptcp_subflow));
> +
> +        kit->msk = msk;
> +        if (!msk)

The NULL check is not needed. The verifier should already have rejected a
NULL msk because of KF_TRUSTED_ARGS.

> +                return -EINVAL;
> +
> +        if (!sock_owned_by_user_nocheck(sk) &&
> +            !spin_is_locked(&sk->sk_lock.slock))

I could have missed something. If this is meant to catch a bug, should it be
sock_owned_by_me(), which has the lockdep splat? For the cg get/setsockopt
hook here, the lock should already have been taken earlier in the kernel.

This set only shows the cg sockopt bpf prog and is missing the major
struct_ops piece, so it is hard to comment. I assume the lock situation is
the same for struct_ops, i.e. the lock will be held before calling the
struct_ops prog?

> +                return -EINVAL;
> +
> +        kit->pos = &msk->conn_list;
> +        return 0;
> +}
> +
> +__bpf_kfunc static struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> +        struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> +        if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> +                return NULL;
> +
> +        kit->pos = kit->pos->next;
> +        return list_entry(kit->pos, struct mptcp_subflow_context, node);
> +}
> +
> +__bpf_kfunc static void
> +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
>  BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
>  BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
>
>  static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
>
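
For readers following the _new()/_next()/_destroy() flow, here is a minimal
userspace sketch of the same walk. It is only a model, not the kernel code:
struct msk, struct subflow, the subflow_iter_*() functions and
sum_subflow_ids() are hypothetical stand-ins, the list helpers are
simplified re-implementations, and the locking check is left out. It shows
how starting pos on the list head and testing list_is_last() before each
step visits every entry once, and roughly the new/next/destroy sequence a
bpf_for_each() loop goes through.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry && head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* True when 'pos' is the final element before wrapping back to 'head'. */
static int list_is_last(const struct list_head *pos,
			const struct list_head *head)
{
	return pos->next == head;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, stripped-down stand-ins for the MPTCP structures. */
struct subflow {
	int id;
	struct list_head node;
};

struct msk {
	struct list_head conn_list;
};

/* Models struct bpf_iter_mptcp_subflow_kern from the patch. */
struct subflow_iter {
	struct msk *msk;
	struct list_head *pos;
};

static int subflow_iter_new(struct subflow_iter *it, struct msk *msk)
{
	it->msk = msk;
	if (!msk)
		return -1;
	it->pos = &msk->conn_list;	/* start on the head itself */
	return 0;
}

static struct subflow *subflow_iter_next(struct subflow_iter *it)
{
	/* A failed _new() or an exhausted (or empty) list ends the loop. */
	if (!it->msk || list_is_last(it->pos, &it->msk->conn_list))
		return NULL;
	it->pos = it->pos->next;
	return container_of(it->pos, struct subflow, node);
}

static void subflow_iter_destroy(struct subflow_iter *it)
{
	(void)it;	/* nothing to release, as in the patch */
}

/* Roughly the call sequence behind bpf_for_each(mptcp_subflow, ...). */
static int sum_subflow_ids(struct msk *msk)
{
	struct subflow_iter it;
	struct subflow *sf;
	int sum = 0;

	subflow_iter_new(&it, msk);
	while ((sf = subflow_iter_next(&it)))
		sum += sf->id;
	subflow_iter_destroy(&it);
	return sum;
}
```

In this model an empty conn_list and a NULL msk both make the first
subflow_iter_next() call return NULL, so the loop body never runs.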