BPF List
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Geliang Tang <geliang@kernel.org>, mptcp@lists.linux.dev
Cc: Geliang Tang <tanggeliang@kylinos.cn>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH mptcp-next 1/4] bpf: Add mptcp_subflow bpf_iter
Date: Thu, 5 Sep 2024 11:24:58 -0700	[thread overview]
Message-ID: <288ad1c2-501a-4319-bc1e-e7a7e276ff63@linux.dev> (raw)
In-Reply-To: <a75fc3e8df7141ce582448d3f092871a4943fbf4.1725544210.git.tanggeliang@kylinos.cn>

On 9/5/24 6:52 AM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
> 
>   mptcp_for_each_subflow(msk, subflow)
>           kfunc(subflow);
> 
> But in the MPTCP BPF program, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
> 
> This patch adds a new bpf_iter type named "mptcp_subflow" to do this.
> 
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>   kernel/bpf/helpers.c |  3 +++
>   net/mptcp/bpf.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 60 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b5f0adae8293..2340ba967444 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3023,6 +3023,9 @@ BTF_ID_FLAGS(func, bpf_preempt_enable)
>   BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>   BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
>   BTF_KFUNCS_END(common_btf_ids)
>   
>   static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index 9672a70c24b0..cda09bbfd617 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -204,6 +204,63 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
>   	.set   = &bpf_mptcp_fmodret_ids,
>   };
>   
> +struct bpf_iter__mptcp_subflow {
> +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> +	__bpf_md_ptr(struct mptcp_sock *, msk);
> +	__bpf_md_ptr(struct list_head *, pos);
> +};
> +
> +DEFINE_BPF_ITER_FUNC(mptcp_subflow, struct bpf_iter_meta *meta,
> +		     struct mptcp_sock *msk, struct list_head *pos)
> +
> +struct bpf_iter_mptcp_subflow {
> +	__u64 __opaque[3];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_mptcp_subflow_kern {
> +	struct mptcp_sock *msk;
> +	struct list_head *pos;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> +					   struct mptcp_sock *msk)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> +	kit->msk = msk;
> +	kit->pos = &msk->conn_list;
> +	spin_lock_bh(&msk->pm.lock);

I don't think spin_lock here without unlock can be used. e.g. What if 
bpf_iter_mptcp_subflow_new() is called twice back-to-back.

I haven't looked at the mptcp details, some questions:
The list is protected by msk->pm.lock?
What happen to the sk_lock of the msk?
Can this be rcu-ify? or it needs some cares when walking the established TCP 
subflow?


[ Please cc the bpf list. Helping to review patches is a good way to contribute 
back to the mailing list. ]

> +
> +	return 0;
> +}
> +
> +__bpf_kfunc struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +	struct mptcp_subflow_context *subflow;
> +	struct mptcp_sock *msk = kit->msk;
> +
> +	subflow = list_entry((kit->pos)->next, struct mptcp_subflow_context, node);
> +	if (list_entry_is_head(subflow, &msk->conn_list, node))
> +		return NULL;
> +
> +	kit->pos = &subflow->node;
> +	return subflow;
> +}
> +
> +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +	struct mptcp_sock *msk = kit->msk;
> +
> +	spin_unlock_bh(&msk->pm.lock);
> +}
> +
> +__bpf_kfunc_end_defs();
> +
>   __diag_push();
>   __diag_ignore_all("-Wmissing-prototypes",
>   		  "kfuncs which will be used in BPF programs");


       reply	other threads:[~2024-09-05 18:25 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1725544210.git.tanggeliang@kylinos.cn>
     [not found] ` <a75fc3e8df7141ce582448d3f092871a4943fbf4.1725544210.git.tanggeliang@kylinos.cn>
2024-09-05 18:24   ` Martin KaFai Lau [this message]
2024-09-06 10:02     ` [PATCH mptcp-next 1/4] bpf: Add mptcp_subflow bpf_iter Geliang Tang
2024-09-06 21:29     ` Andrii Nakryiko
2024-09-09  1:12       ` Geliang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=288ad1c2-501a-4319-bc1e-e7a7e276ff63@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=bpf@vger.kernel.org \
    --cc=geliang@kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=tanggeliang@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox