From: Martin KaFai Lau
To: "Matthieu Baerts (NGI0)"
Cc: mptcp@lists.linux.dev, Mat Martineau, Geliang Tang, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
    Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
    Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next/net v3 2/5] bpf: Add mptcp_subflow bpf_iter
Date: Fri, 16 May 2025 15:34:26 -0700
Message-ID: <60092fac-2c8a-4076-9130-8c3e41cba040@linux.dev>
In-Reply-To: <20250320-bpf-next-net-mptcp-bpf_iter-subflows-v3-2-9abd22c2a7fd@kernel.org>
References: <20250320-bpf-next-net-mptcp-bpf_iter-subflows-v3-0-9abd22c2a7fd@kernel.org>
    <20250320-bpf-next-net-mptcp-bpf_iter-subflows-v3-2-9abd22c2a7fd@kernel.org>
X-Mailing-List: bpf@vger.kernel.org

On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
>
>         mptcp_for_each_subflow(msk, subflow)
>                 kfunc(subflow);
>
> But in the MPTCP BPF program, this has not yet been implemented.
> As Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
>
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> in BPF program like this:
>
>         bpf_for_each(mptcp_subflow, subflow, msk)
>                 kfunc(subflow);
>
> Suggested-by: Martin KaFai Lau
> Signed-off-by: Geliang Tang
> Reviewed-by: Mat Martineau
> Signed-off-by: Matthieu Baerts (NGI0)
> ---
> Notes:
>  - v2:
>    - Add BUILD_BUG_ON() checks, similar to the ones done with other
>      bpf_iter_(...) helpers.
>    - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
>      !spin_is_locked() (Martin).
>  - v3:
>    - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
>    - Remove unneeded !msk check (Martin)
>    - Remove locks checks, add msk_owned_by_me for lockdep (Martin)
>    - The following note and 2 questions have been added below.
>
> This new bpf_iter will be used by our future BPF packet schedulers and
> path managers. To see how we are going to use them, please check our
> export branch [1], especially these two commits:
>
>  - "bpf: Add mptcp packet scheduler struct_ops": introduce a new
>    struct_ops.
>  - "selftests/bpf: Add bpf_burst scheduler & test": new test showing
>    how the new struct_ops and bpf_iter are being used.
>
> [1] https://github.com/multipath-tcp/mptcp_net-next/commits/export
>
> @BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter
> to be used with struct_ops, but only with the two new ones we are going
> to introduce that are specific to MPTCP, and with not others struct_ops
> (TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do
> you have examples or doc you could point to us to have this restriction
> in place, please?

The bpf_qdisc.c has done that.
Take a look at the "bpf_qdisc_kfunc_filter()". It is in net-next and
bpf-next/net.

>
> Also, for one of the two future MPTCP struct_ops, not all callbacks
> should be allowed to use this new bpf_iter, because they are called from
> different contexts. How can we ensure such callbacks from a struct_ops
> cannot call mptcp_subflow bpf_iter without adding new dedicated checks
> looking if some locks are held for all callbacks? We understood that
> they wanted to have something similar with sched_ext, but we are not
> sure if this code is ready nor if it is going to be accepted.

Same. Take a look at "bpf_qdisc_kfunc_filter()".
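
To make the shape of such a filter concrete, below is a rough user-space
model of the two restrictions asked about above: only one struct_ops type
may use the iterator, and only some of its callbacks. It is a sketch under
stated assumptions, not code from bpf_qdisc.c and not kernel code. Every
name in it (model_prog, model_sched_ops, get_send, init, the OPS_* and
KFUNC_* constants) is made up for illustration. In the kernel, the
equivalent decision would be taken by a filter callback attached to the
kfunc id set, using the program's attach type and the struct_ops member it
implements.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPTCP struct_ops; callback names are invented. */
struct model_sched_ops {
	int (*get_send)(void *msk);	/* assumed to run with the msk lock held */
	void (*init)(void *msk);	/* assumed to run in a different context */
};

/* Hypothetical identifiers for struct_ops types and kfuncs. */
enum { OPS_MPTCP_SCHED = 1, OPS_TCP_CC = 2 };
enum { KFUNC_SUBFLOW_ITER_NEW = 100, KFUNC_UNRELATED = 200 };

/* Hypothetical summary of what is known about a program at load time. */
struct model_prog {
	int ops_type;		/* struct_ops type the program attaches to */
	size_t member_off;	/* offset of the callback it implements */
};

/* Is this kfunc one of the ids the filter is responsible for? */
static int is_subflow_iter_kfunc(unsigned int kfunc_id)
{
	return kfunc_id == KFUNC_SUBFLOW_ITER_NEW;
}

/* Return 0 to allow the call, -EACCES to reject it. */
int model_kfunc_filter(const struct model_prog *prog, unsigned int kfunc_id)
{
	/* Kfuncs outside our set are not restricted by this filter. */
	if (!is_subflow_iter_kfunc(kfunc_id))
		return 0;

	/* Restriction 1: only the MPTCP struct_ops may use the iterator. */
	if (prog->ops_type != OPS_MPTCP_SCHED)
		return -EACCES;

	/* Restriction 2: only callbacks known to run in a safe context. */
	if (prog->member_off == offsetof(struct model_sched_ops, get_send))
		return 0;

	return -EACCES;
}
```

With this model, a program implementing get_send of the MPTCP struct_ops is
allowed to call the iterator, while the init callback of the same
struct_ops and any TCP CC program are rejected with -EACCES. The per-callback
decision is made once from the member offset, so no lock-state check is
needed at each call.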