BPF List
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Geliang Tang <geliang@kernel.org>
Cc: Matthieu Baerts <matttbe@kernel.org>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	mptcp@lists.linux.dev, Geliang Tang <tanggeliang@kylinos.cn>,
	bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter
Date: Fri, 27 Sep 2024 18:34:52 -0700	[thread overview]
Message-ID: <7829ef57-60e2-4648-b9cf-2d756fd85533@linux.dev> (raw)
In-Reply-To: <b623745d19dd1d9743674def8d565ef779ef0952.camel@kernel.org>

On 9/14/24 12:12 PM, Geliang Tang wrote:
>>>>>>> @@ -241,6 +286,8 @@ static int __init
>>>>>>> bpf_mptcp_kfunc_init(void)
>>>>>>>           int ret;
>>>>>>>
>>>>>>>           ret =
>>>>>>> register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
>>>>>>> +       ret = ret ?:
>>>>>>> register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
>>>>>>> +
>>>>>>> &bpf_mptcp_sched_kfunc_set);
>>>
>>> This cannot be used in tracing.
>>
>> Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
>>
>> We plan to use it in MPTCP BPF packet schedulers, which are not
>> tracing, but "struct_ops" types. And they work well with
>> KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
>>
>> BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
>> KF_TRUSTED_ARGS);
>>
>> An example of the scheduler is:
>>
>> SEC("struct_ops")
>> int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
>>               struct mptcp_sched_data *data)
>> {
>>          struct mptcp_subflow_context *subflow;
>>
>>          bpf_rcu_read_lock();
>>          bpf_for_each(mptcp_subflow, subflow, msk) {
>>                  mptcp_subflow_set_scheduled(subflow, true);
>>                  break;
>>          }
>>          bpf_rcu_read_unlock();
>>
>>          return 0;
>> }
>>
>> SEC(".struct_ops")
>> struct mptcp_sched_ops first = {
>>          .init           = (void *)mptcp_sched_first_init,
>>          .release        = (void *)mptcp_sched_first_release,
>>          .get_subflow    = (void *)bpf_first_get_subflow,
>>          .name           = "bpf_first",
>> };
>>
>> But BPF mptcp_sched_ops code has not been merged into bpf-next yet,
>> so
>> I simply test this bpf_for_each(mptcp_subflow) in tracing since I
>> noticed other bpf_iter selftests are using tracing too:
>>
>> progs/iters_task.c
>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>
>> progs/iters_css.c
>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>
>> If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
>> will try to move the selftest into a struct_ops.
>>
>>>
>>> Going back to my earlier question in v1. How is the msk->conn_list
>>> protected?
>>>
>>
>> msk->conn_list is protected by msk socket lock. (@Matt, am I right?)
>> We use this in kernel code:

A KF_TRUSTED_ARGS msk does not mean its sock locked. That said, only the tp_btf 
should have trusted msk args but still it is hard to audit all existing and 
future tracepoints which may or may not have msk lock held.

If the subflow list is rcu-ify, it will be easier to reason for tracing use 
case. Regardless, the use case is not for tracing, so this discussion is 
probably moot now.

>>
>> 	struct sock *sk = (struct sock *)msk;
>>
>> 	lock_sock(sk);
>> 	kfunc(&msk->conn_list);
>> 	release_sock(sk);
>>
>> If so, should we also use lock_sock/release_sock in
>> bpf_iter_mptcp_subflow_next()?
> 
> bpf_for_each(mptcp_subflow) is expected to be used in the get_subflow()
> interface of mptcp_sched_ops to traverse all subflows and then pick one
> subflow to send data. This interface is invoked in
> mptcp_sched_get_send(), here the msk socket lock is hold already:
> 
> int mptcp_sched_get_send(struct mptcp_sock *msk)
> {
>          struct mptcp_subflow_context *subflow;
>          struct mptcp_sched_data data;
> 
>          msk_owned_by_me(msk);
> 
> 	... ...
> 
>          mptcp_for_each_subflow(msk, subflow) {
>                  if (READ_ONCE(subflow->scheduled))
>                          return 0;
>          }
> 
>          data.reinject = false;
>          if (msk->sched == &mptcp_sched_default || !msk->sched)
>                  return mptcp_sched_default_get_subflow(msk, &data);
>          return msk->sched->get_subflow(msk, &data);
> }
> 
> So no need to hold msk socket lock again in BPF schedulers. This means
> we can walk the conn_list without any lock. bpf_rcu_read_lock() and
> bpf_rcu_read_unlock() can be dropped in BPF schedulers too. Am I right?

hmm... don't know how bpf_rcu_read_lock() comes into picture considering the msk 
lock has already been held in the struct_ops that you are planning to add.

Something you may need to consider on the subflow locking situation. Not sure 
exactly what the bpf prog needs to do on a subflow. e.g. If it needs to change 
some fields in the subflow, does it need to lock the subflow or the msk lock is 
good enough.

      reply	other threads:[~2024-09-28  1:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1726132802.git.tanggeliang@kylinos.cn>
2024-09-12  9:25 ` [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter Geliang Tang
2024-09-12 18:24   ` Andrii Nakryiko
2024-09-13  4:04     ` Geliang Tang
2024-09-13 20:57       ` Andrii Nakryiko
2024-09-14  0:41         ` Martin KaFai Lau
2024-09-14  8:40           ` Geliang Tang
2024-09-14 10:12             ` Geliang Tang
2024-09-28  1:34               ` Martin KaFai Lau [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7829ef57-60e2-4648-b9cf-2d756fd85533@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=geliang@kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=tanggeliang@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox