BPF List
 help / color / mirror / Atom feed
From: Geliang Tang <geliang@kernel.org>
To: Martin KaFai Lau <martin.lau@linux.dev>,
	Matthieu Baerts <matttbe@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	mptcp@lists.linux.dev,  Geliang Tang <tanggeliang@kylinos.cn>,
	bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
Date: Sat, 14 Sep 2024 18:12:36 +0800	[thread overview]
Message-ID: <b623745d19dd1d9743674def8d565ef779ef0952.camel@kernel.org> (raw)
In-Reply-To: <766062c8fd8920dcc51e7ab2c097541d96bb8ab8.camel@kernel.org>

On Sat, 2024-09-14 at 16:40 +0800, Geliang Tang wrote:
> Hi Martin, Andrii, Matt,
> 
> On Fri, 2024-09-13 at 17:41 -0700, Martin KaFai Lau wrote:
> > On 9/13/24 1:57 PM, Andrii Nakryiko wrote:
> > > > > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > > > > > bpf_iter_mptcp_subflow *it,
> > > > > > +                                          struct
> > > > > > mptcp_sock
> > > > > > *msk)
> > > > > > +{
> > > > > > +       struct bpf_iter_mptcp_subflow_kern *kit = (void
> > > > > > *)it;
> > > > > > +
> > > > > > +       kit->msk = msk;
> > > > > > +       if (!msk)
> > > > > > +               return -EINVAL;
> > > > > > +
> > > > > > +       kit->pos = &msk->conn_list;
> > > > > > +       return 0;
> > > > > > +}
> > 
> > [ ... ]
> > 
> > > > > >   BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > > > > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> > > > > 
> > > > > I'm not 100% sure, but I suspect you might need to specify
> > > > > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk`
> > > > > is
> > > > > a
> > 
> > +1
> 
> So we must add KF_TRUSTED_ARGS flag, right?
> 
> > 
> > > > > > @@ -241,6 +286,8 @@ static int __init
> > > > > > bpf_mptcp_kfunc_init(void)
> > > > > >          int ret;
> > > > > > 
> > > > > >          ret =
> > > > > > register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > > > > > +       ret = ret ?:
> > > > > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > > > > > +
> > > > > > &bpf_mptcp_sched_kfunc_set);
> > 
> > This cannot be used in tracing.
> 
> Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
> 
> We plan to use it in MPTCP BPF packet schedulers, which are not
> tracing, but "struct_ops" types. And they work well with
> KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
> 
> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
> KF_TRUSTED_ARGS);
> 
> An example of the scheduler is:
> 
> SEC("struct_ops")
> int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
>              struct mptcp_sched_data *data)
> {
>         struct mptcp_subflow_context *subflow;
> 
>         bpf_rcu_read_lock();
>         bpf_for_each(mptcp_subflow, subflow, msk) {
>                 mptcp_subflow_set_scheduled(subflow, true);
>                 break;
>         }
>         bpf_rcu_read_unlock();
> 
>         return 0;
> }
> 
> SEC(".struct_ops")
> struct mptcp_sched_ops first = { 
>         .init           = (void *)mptcp_sched_first_init,
>         .release        = (void *)mptcp_sched_first_release,
>         .get_subflow    = (void *)bpf_first_get_subflow,
>         .name           = "bpf_first",
> };
> 
> But BPF mptcp_sched_ops code has not been merged into bpf-next yet,
> so
> I simply test this bpf_for_each(mptcp_subflow) in tracing since I
> noticed other bpf_iter selftests are using tracing too:
> 
> progs/iters_task.c
> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> 
> progs/iters_css.c
> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> 
> If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
> will try to move the selftest into a struct_ops.
> 
> > 
> > Going back to my earlier question in v1. How is the msk->conn_list
> > protected?
> > 
> 
> msk->conn_list is protected by msk socket lock. (@Matt, am I right?)
> We
> use this in kernel code:
> 
> 	struct sock *sk = (struct sock *)msk;
> 
> 	lock_sock(sk);
> 	kfunc(&msk->conn_list);
> 	release_sock(sk);
> 
> If so, should we also use lock_sock/release_sock in
> bpf_iter_mptcp_subflow_next()?

bpf_for_each(mptcp_subflow) is expected to be used in the get_subflow()
interface of mptcp_sched_ops to traverse all subflows and then pick one
subflow to send data. This interface is invoked in
mptcp_sched_get_send(), here the msk socket lock is hold already:

int mptcp_sched_get_send(struct mptcp_sock *msk)
{
        struct mptcp_subflow_context *subflow;
        struct mptcp_sched_data data;

        msk_owned_by_me(msk);

	... ...

        mptcp_for_each_subflow(msk, subflow) {
                if (READ_ONCE(subflow->scheduled))
                        return 0;
        }

        data.reinject = false;
        if (msk->sched == &mptcp_sched_default || !msk->sched)
                return mptcp_sched_default_get_subflow(msk, &data);
        return msk->sched->get_subflow(msk, &data);
}

So no need to hold msk socket lock again in BPF schedulers. This means
we can walk the conn_list without any lock. bpf_rcu_read_lock() and
bpf_rcu_read_unlock() can be dropped in BPF schedulers too. Am I right?

Thanks,
-Geliang


  reply	other threads:[~2024-09-14 10:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1726132802.git.tanggeliang@kylinos.cn>
2024-09-12  9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 18:24   ` Andrii Nakryiko
2024-09-13  4:04     ` Geliang Tang
2024-09-13 20:57       ` Andrii Nakryiko
2024-09-14  0:41         ` Martin KaFai Lau
2024-09-14  8:40           ` Geliang Tang
2024-09-14 10:12             ` Geliang Tang [this message]
2024-09-28  1:34               ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b623745d19dd1d9743674def8d565ef779ef0952.camel@kernel.org \
    --to=geliang@kernel.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=matttbe@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=tanggeliang@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox