From: Martin KaFai Lau <martin.lau@linux.dev>
To: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
Geliang Tang <geliang@kernel.org>
Cc: mptcp@lists.linux.dev, Mat Martineau <martineau@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Mykola Lysenko <mykolal@fb.com>,
Shuah Khan <shuah@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter
Date: Thu, 23 Jan 2025 16:47:41 -0800 [thread overview]
Message-ID: <fdf0ddbe-e007-4a5f-bbdf-9a144e8fbe35@linux.dev> (raw)
In-Reply-To: <20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-4-ae244d3cdbbc@kernel.org>
On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
>
> mptcp_for_each_subflow(msk, subflow)
> kfunc(subflow);
>
> But in the MPTCP BPF program, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
>
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> in BPF program like this:
>
> bpf_for_each(mptcp_subflow, subflow, msk)
> kfunc(subflow);
>
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> - v2:
> - Add BUILD_BUG_ON() checks, similar to the ones done with other
> bpf_iter_(...) helpers.
> - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
> !spin_is_locked() (Martin).
>
> A few versions of this single patch have been previously posted to the
> BPF mailing list by Geliang, before continuing to the MPTCP mailing list
> only, with other patches of this series. The version of the whole series
> has been reset to 1, but here is the ChangeLog for the previous ones:
> - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
> - v3: drop bpf_iter__mptcp_subflow
> - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> it in _next() (Andrii)
> - v5: use list_is_last() instead of list_entry_is_head() add
> KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
> - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
> ---
> net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
> .set = &bpf_mptcp_fmodret_ids,
> };
>
> +struct bpf_iter_mptcp_subflow {
> + __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_mptcp_subflow_kern {
> + struct mptcp_sock *msk;
> + struct list_head *pos;
> +} __aligned(8);
> +
> __bpf_kfunc_start_defs();
>
> __bpf_kfunc static struct mptcp_subflow_context *
> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
> return NULL;
> }
>
> +__bpf_kfunc static int
> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> + struct mptcp_sock *msk)
> +{
> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> + struct sock *sk = (struct sock *)msk;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
> + sizeof(struct bpf_iter_mptcp_subflow));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
> + __alignof__(struct bpf_iter_mptcp_subflow));
> +
> + kit->msk = msk;
> + if (!msk)
NULL check is not needed. verifier should have rejected it for KF_TRUSTED_ARGS.
> + return -EINVAL;
> +
> + if (!sock_owned_by_user_nocheck(sk) &&
> + !spin_is_locked(&sk->sk_lock.slock))
I could have missed something. If it is to catch bug, should it be
sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt hook
here, the lock should have already been held earlier in the kernel.
This set is only showing the cg sockopt bpf prog but missing the major
struct_ops piece. It is hard to comment. I assumed the lock situation is the
same for the struct_ops where the lock will be held before calling the
struct_ops prog?
> + return -EINVAL;
> +
> + kit->pos = &msk->conn_list;
> + return 0;
> +}
> +
> +__bpf_kfunc static struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> + if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> + return NULL;
> +
> + kit->pos = kit->pos->next;
> + return list_entry(kit->pos, struct mptcp_subflow_context, node);
> +}
> +
> +__bpf_kfunc static void
> +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +}
> +
> __bpf_kfunc_end_defs();
>
> BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
> BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
> BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
>
> static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
>
next prev parent reply other threads:[~2025-01-24 0:47 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-19 15:46 [PATCH bpf-next/net v2 0/7] bpf: Add mptcp_subflow bpf_iter support Matthieu Baerts (NGI0)
2024-12-19 15:46 ` [PATCH bpf-next/net v2 1/7] bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock Matthieu Baerts (NGI0)
2024-12-19 15:46 ` [PATCH bpf-next/net v2 2/7] bpf: Allow use of skc_to_mptcp_sock in cg_sockopt Matthieu Baerts (NGI0)
2024-12-19 15:46 ` [PATCH bpf-next/net v2 3/7] bpf: Register mptcp common kfunc set Matthieu Baerts (NGI0)
2024-12-19 15:46 ` [PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter Matthieu Baerts (NGI0)
2025-01-24 0:47 ` Martin KaFai Lau [this message]
2025-01-24 9:10 ` Matthieu Baerts
2025-01-30 12:05 ` Matthieu Baerts
2024-12-19 15:46 ` [PATCH bpf-next/net v2 5/7] bpf: Acquire and release mptcp socket Matthieu Baerts (NGI0)
2025-01-24 1:26 ` Martin KaFai Lau
2024-12-19 15:46 ` [PATCH bpf-next/net v2 6/7] selftests/bpf: More endpoints for endpoint_init Matthieu Baerts (NGI0)
2024-12-19 15:46 ` [PATCH bpf-next/net v2 7/7] selftests/bpf: Add mptcp_subflow bpf_iter subtest Matthieu Baerts (NGI0)
2025-01-24 1:38 ` Martin KaFai Lau
2025-01-24 9:13 ` Matthieu Baerts
2025-01-15 9:39 ` [PATCH bpf-next/net v2 0/7] bpf: Add mptcp_subflow bpf_iter support Matthieu Baerts
2025-01-17 0:06 ` Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fdf0ddbe-e007-4a5f-bbdf-9a144e8fbe35@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=geliang@kernel.org \
--cc=haoluo@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@kernel.org \
--cc=martineau@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=mykolal@fb.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.