From: Martin KaFai Lau <martin.lau@linux.dev>
To: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
Geliang Tang <geliang@kernel.org>
Cc: mptcp@lists.linux.dev, Mat Martineau <martineau@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Mykola Lysenko <mykolal@fb.com>,
Shuah Khan <shuah@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter
Date: Thu, 23 Jan 2025 16:47:41 -0800
Message-ID: <fdf0ddbe-e007-4a5f-bbdf-9a144e8fbe35@linux.dev>
In-Reply-To: <20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-4-ae244d3cdbbc@kernel.org>

On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
>
>   mptcp_for_each_subflow(msk, subflow)
>           kfunc(subflow);
>
> But in the MPTCP BPF program, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
>
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> in BPF program like this:
>
>   bpf_for_each(mptcp_subflow, subflow, msk)
>           kfunc(subflow);
>
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> - v2:
> - Add BUILD_BUG_ON() checks, similar to the ones done with other
> bpf_iter_(...) helpers.
> - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
> !spin_is_locked() (Martin).
>
> A few versions of this single patch have been previously posted to the
> BPF mailing list by Geliang, before continuing to the MPTCP mailing list
> only, with other patches of this series. The version of the whole series
> has been reset to 1, but here is the ChangeLog for the previous ones:
> - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
> - v3: drop bpf_iter__mptcp_subflow
> - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> it in _next() (Andrii)
> - v5: use list_is_last() instead of list_entry_is_head() add
> KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
> - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
> ---
> net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
> .set = &bpf_mptcp_fmodret_ids,
> };
>
> +struct bpf_iter_mptcp_subflow {
> +	__u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_mptcp_subflow_kern {
> +	struct mptcp_sock *msk;
> +	struct list_head *pos;
> +} __aligned(8);
> +
> __bpf_kfunc_start_defs();
>
> __bpf_kfunc static struct mptcp_subflow_context *
> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
> return NULL;
> }
>
> +__bpf_kfunc static int
> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> +			   struct mptcp_sock *msk)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +	struct sock *sk = (struct sock *)msk;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
> +		     sizeof(struct bpf_iter_mptcp_subflow));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
> +		     __alignof__(struct bpf_iter_mptcp_subflow));
> +
> +	kit->msk = msk;
> +	if (!msk)

The NULL check is not needed. The verifier should have rejected a NULL msk
because of KF_TRUSTED_ARGS.

> +		return -EINVAL;
> +
> +	if (!sock_owned_by_user_nocheck(sk) &&
> +	    !spin_is_locked(&sk->sk_lock.slock))

I could have missed something. If this check is meant to catch a bug, should it
be sock_owned_by_me(), which has the lockdep splat? For the cg get/setsockopt
hook here, the lock should already have been taken earlier in the kernel.

This set only shows the cg sockopt bpf prog and is missing the major struct_ops
piece, so it is hard to comment. I assume the lock situation is the same for
struct_ops, i.e. the lock will be held before the struct_ops prog is called?

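
To make the two comments above concrete, here is a rough userspace sketch of the shape _new() could take with both applied. It is only a model under stated assumptions: the struct definitions, the `owned` flag, the `lock_warnings` counter and the name `sketch_iter_new()` are hypothetical stand-ins, and the stubbed sock_owned_by_me() merely counts would-be warnings where, as far as I recall, the kernel one triggers a lockdep warning and returns nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration; NOT the kernel definitions. */
struct list_head { struct list_head *next, *prev; };

struct sock { int owned; };	/* hypothetical: models "socket lock held" */

struct mptcp_sock {
	struct sock sk;		/* first member, so the cast below is valid */
	struct list_head conn_list;
};

struct bpf_iter_mptcp_subflow_kern {
	struct mptcp_sock *msk;
	struct list_head *pos;
};

static int lock_warnings;	/* hypothetical: counts would-be lockdep splats */

/* Stub: warn when the lock is not held, but do not fail the call. */
static void sock_owned_by_me(const struct sock *sk)
{
	if (!sk->owned)
		lock_warnings++;
}

static int sketch_iter_new(struct bpf_iter_mptcp_subflow_kern *kit,
			   struct mptcp_sock *msk)
{
	/* No NULL check: KF_TRUSTED_ARGS is assumed to make the
	 * verifier reject a NULL msk before the kfunc is reached.
	 */
	sock_owned_by_me((struct sock *)msk);

	kit->msk = msk;
	kit->pos = &msk->conn_list;
	return 0;
}
```

The point of the sketch is that a missing lock becomes a loud debugging signal rather than a silent -EINVAL that the bpf prog may ignore.
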
> +		return -EINVAL;
> +
> +	kit->pos = &msk->conn_list;
> +	return 0;
> +}
> +
> +__bpf_kfunc static struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> +	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> +		return NULL;
> +
> +	kit->pos = kit->pos->next;
> +	return list_entry(kit->pos, struct mptcp_subflow_context, node);
> +}
> +
> +__bpf_kfunc static void
> +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +}
> +
> __bpf_kfunc_end_defs();
>
> BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
> BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
> BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
>
> static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
>
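
For anyone following the traversal in the quoted _new()/_next() pair, a minimal userspace model may help. It assumes simplified re-implementations of the list helpers, and `struct subflow`, `struct iter`, `iter_new()`, `iter_next()` and `sum_ids()` are hypothetical names used only for illustration. It shows the same pattern as the patch: pos starts at the list head, and each next() call stops once pos is the last entry, otherwise advances and returns the containing object.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on the kernel's. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* True when pos is the final entry; also true for an empty list
 * when pos is the head itself.
 */
static int list_is_last(const struct list_head *pos,
			const struct list_head *head)
{
	return pos->next == head;
}

struct subflow {		/* hypothetical stand-in for a subflow context */
	int id;
	struct list_head node;
};

struct iter {			/* hypothetical stand-in for the iterator state */
	struct list_head *head;
	struct list_head *pos;
};

static void iter_new(struct iter *it, struct list_head *head)
{
	it->head = head;
	it->pos = head;		/* start at the head, before the first entry */
}

static struct subflow *iter_next(struct iter *it)
{
	if (list_is_last(it->pos, it->head))
		return NULL;

	it->pos = it->pos->next;
	/* Equivalent of list_entry(): step back from the member to the struct. */
	return (struct subflow *)((char *)it->pos -
				  offsetof(struct subflow, node));
}

/* Walk every entry once and add up the ids. */
static int sum_ids(struct list_head *head)
{
	struct iter it;
	struct subflow *sf;
	int sum = 0;

	iter_new(&it, head);
	while ((sf = iter_next(&it)))
		sum += sf->id;
	return sum;
}
```

Because the state is only the head and the current position, the walk is only safe while nothing else modifies the list, which is why the locking question above matters.
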