From: Geliang Tang <geliang@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: mptcp@lists.linux.dev, Geliang Tang <tanggeliang@kylinos.cn>,
bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
Date: Fri, 13 Sep 2024 12:04:16 +0800 [thread overview]
Message-ID: <a9bd9aa00c702f98d86f5d7acd305cc477a4c91b.camel@kernel.org> (raw)
In-Reply-To: <CAEf4BzaVzVhoqhzpq-FD5GGJT1wW5=LbZ4ADs2+NdLO5rcJMMw@mail.gmail.com>
Hi Andrii,
On Thu, 2024-09-12 at 11:24 -0700, Andrii Nakryiko wrote:
> On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@kernel.org>
> wrote:
> >
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> >
> > It's necessary to traverse all subflows on the conn_list of an
> > MPTCP
> > socket and then call kfunc to modify the fields of each subflow. In
> > kernel space, mptcp_for_each_subflow() helper is used for this:
> >
> > mptcp_for_each_subflow(msk, subflow)
> > kfunc(subflow);
> >
> > But in the MPTCP BPF program, this has not yet been implemented. As
> > Martin suggested recently, this conn_list walking + modify-by-kfunc
> > usage fits the bpf_iter use case. So this patch adds a new bpf_iter
> > type named "mptcp_subflow" to do this and implements its helpers
> > bpf_iter_mptcp_subflow_new()/_next()/_destroy().
> >
> > Since these bpf_iter mptcp_subflow helpers are invoked in its
> > selftest
> > in a ftrace hook for mptcp_sched_get_send(), it's necessary to
> > register
> > them into a BPF_PROG_TYPE_TRACING type kfunc set together with
> > other
> > two used kfuncs mptcp_subflow_active() and
> > mptcp_subflow_set_scheduled().
> >
> > Then bpf_for_each() for mptcp_subflow can be used in BPF program
> > like
> > this:
> >
> > i = 0;
> > bpf_rcu_read_lock();
> > bpf_for_each(mptcp_subflow, subflow, msk) {
> > if (i++ >= MPTCP_SUBFLOWS_MAX)
> > break;
> > kfunc(subflow);
> > }
> > bpf_rcu_read_unlock();
> >
> > v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> > drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2]
> > (Andrii)
> > v3: drop bpf_iter__mptcp_subflow
> > v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> > it in
> > _next() (Andrii)
> >
> > Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-
> > ----
> > 1 file changed, 52 insertions(+), 5 deletions(-)
> >
>
> Looks ok from setting up open-coded iterator things, but I can't
> speak
> for other aspects I mentioned below.
>
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 6414824402e6..fec18e7e5e4a 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set
> > bpf_mptcp_fmodret_set = {
> > .set = &bpf_mptcp_fmodret_ids,
> > };
> >
> > -__diag_push();
> > -__diag_ignore_all("-Wmissing-prototypes",
> > - "kfuncs which will be used in BPF programs");
> > +struct bpf_iter_mptcp_subflow {
> > + __u64 __opaque[2];
> > +} __attribute__((aligned(8)));
> > +
> > +struct bpf_iter_mptcp_subflow_kern {
> > + struct mptcp_sock *msk;
> > + struct list_head *pos;
> > +} __attribute__((aligned(8)));
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,
> > + struct mptcp_sock *msk)
> > +{
> > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +
> > + kit->msk = msk;
> > + if (!msk)
> > + return -EINVAL;
> > +
> > + kit->pos = &msk->conn_list;
> > + return 0;
> > +}
> > +
> > +__bpf_kfunc struct mptcp_subflow_context *
> > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > +{
> > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > + struct mptcp_subflow_context *subflow;
> > + struct mptcp_sock *msk = kit->msk;
> > +
> > + if (!msk)
> > + return NULL;
> > +
> > + subflow = list_entry(kit->pos->next, struct
> > mptcp_subflow_context, node);
> > + if (!subflow || list_entry_is_head(subflow, &msk-
> > >conn_list, node))
>
> it's a bit weird that you need both !subflow and
> list_entry_is_head().
> Can you have NULL subflow at all? But this is the question to
> msk->conn_list semantics.
Do you mean to return subflow regardless of whether subflow is NULL? If
so, we need to use list_is_last() instead of list_entry_is_head():
{
struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
return NULL;
kit->pos = kit->pos->next;
return list_entry(kit->pos, struct mptcp_subflow_context, node);
}
Hope this is a better version.
>
> > + return NULL;
> > +
> > + kit->pos = &subflow->node;
> > + return subflow;
> > +}
> > +
> > +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct
> > bpf_iter_mptcp_subflow *it)
> > +{
> > +}
> >
> > __bpf_kfunc struct mptcp_subflow_context *
> > bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data,
> > unsigned int pos)
> > @@ -218,12 +260,15 @@ __bpf_kfunc bool
> > bpf_mptcp_subflow_queues_empty(struct sock *sk)
> > return tcp_rtx_queue_empty(sk);
> > }
> >
> > -__diag_pop();
> > +__bpf_kfunc_end_defs();
> >
> > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
>
> I'm not 100% sure, but I suspect you might need to specify
> KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a
KF_TRUSTED_ARGS breaks the selftest in patch 2 [1]:
'''
libbpf: prog 'trace_mptcp_sched_get_send': BPF program load failed:
Invalid argument
libbpf: prog 'trace_mptcp_sched_get_send': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk) @
mptcp_bpf_iter.c:13
0: (79) r6 = *(u64 *)(r1 +0)
func 'mptcp_sched_get_send' arg0 has btf_id 28019 type STRUCT
'mptcp_sock'
1: R1=ctx() R6_w=ptr_mptcp_sock()
; if (bpf_get_current_pid_tgid() >> 32 != pid) @ mptcp_bpf_iter.c:17
1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
2: (77) r0 >>= 32 ;
R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
3: (18) r1 = 0xffffb766c1684004 ;
R1_w=map_value(map=mptcp_bp.bss,ks=4,vs=8,off=4)
5: (61) r1 = *(u32 *)(r1 +0) ;
R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
6: (67) r1 <<= 32 ;
R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,sm
ax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
7: (c7) r1 s>>= 32 ;
R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
8: (5d) if r0 != r1 goto pc+39 ;
R0_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
R1_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
; iter = 0; @ mptcp_bpf_iter.c:20
9: (18) r8 = 0xffffb766c1684000 ;
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
11: (b4) w1 = 0 ; R1_w=0
12: (63) *(u32 *)(r8 +0) = r1 ; R1_w=0
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
; bpf_rcu_read_lock(); @ mptcp_bpf_iter.c:22
13: (85) call bpf_rcu_read_lock#84967 ;
14: (bf) r7 = r10 ; R7_w=fp0 R10=fp0
; iter = 0; @ mptcp_bpf_iter.c:20
15: (07) r7 += -16 ; R7_w=fp-16
; bpf_for_each(mptcp_subflow, subflow, msk) { @ mptcp_bpf_iter.c:23
16: (bf) r1 = r7 ; R1_w=fp-16 R7_w=fp-16
17: (bf) r2 = r6 ; R2_w=ptr_mptcp_sock()
R6=ptr_mptcp_sock()
18: (85) call bpf_iter_mptcp_subflow_new#62694
R2 must be referenced or trusted
processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 1
peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'trace_mptcp_sched_get_send': failed to load: -22
libbpf: failed to load object 'mptcp_bpf_iter'
libbpf: failed to load BPF skeleton 'mptcp_bpf_iter': -22
test_bpf_iter:FAIL:skel_open_load: mptcp_iter unexpected error: -22
#169/4 mptcp/bpf_iter:FAIL
'''
I don't know how to fix it yet.
> valid owned kernel object. Other BPF folks might help to clarify
> this.
>
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
> > +BTF_ID_FLAGS(func, mptcp_subflow_active)
But we do need to add KF_ITER_NEW/NEXT/DESTROY flags here:
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT |
KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
With these flags, we can drop this additional test in loop:
if (i++ >= MPTCP_SUBFLOWS_MAX)
break;
This problem has been bothering me for a while. I got "infinite loop
detected" errors when walking the list with
bpf_for_each(mptcp_subflow). So I had to use this additional test to
break it.
With these KF_ITER_NEW/NEXT/DESTROY flags, no need to use this
additional test anymore.
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/patch/4467ab3af0f1a6c378778ca6be72753b16a1532c.1726132802.git.tanggeliang@kylinos.cn/
> > BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> > BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
> > -BTF_ID_FLAGS(func, mptcp_subflow_active)
> > BTF_ID_FLAGS(func, mptcp_set_timeout)
> > BTF_ID_FLAGS(func, mptcp_wnd_end)
> > BTF_ID_FLAGS(func, tcp_stream_memory_free)
> > @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
> > int ret;
> >
> > ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > + ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > +
> > &bpf_mptcp_sched_kfunc_set);
> > ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> >
> > &bpf_mptcp_sched_kfunc_set);
> > #ifdef CONFIG_BPF_JIT
> > --
> > 2.43.0
> >
> >
next prev parent reply other threads:[~2024-09-13 4:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1726132802.git.tanggeliang@kylinos.cn>
2024-09-12 9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 18:24 ` Andrii Nakryiko
2024-09-13 4:04 ` Geliang Tang [this message]
2024-09-13 20:57 ` Andrii Nakryiko
2024-09-14 0:41 ` Martin KaFai Lau
2024-09-14 8:40 ` Geliang Tang
2024-09-14 10:12 ` Geliang Tang
2024-09-28 1:34 ` Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a9bd9aa00c702f98d86f5d7acd305cc477a4c91b.camel@kernel.org \
--to=geliang@kernel.org \
--cc=andrii.nakryiko@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=martin.lau@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=tanggeliang@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox