From: Geliang Tang <geliang@kernel.org>
To: Mat Martineau <martineau@kernel.org>
Cc: mptcp@lists.linux.dev, Geliang Tang <tanggeliang@kylinos.cn>
Subject: Re: [PATCH mptcp-next v7 4/5] Squash to "selftests/bpf: Add bpf_burst scheduler & test"
Date: Wed, 23 Oct 2024 16:57:48 +0800 [thread overview]
Message-ID: <155fd0bca02ef0619bbd5dd0f1efa4dfdf8a903b.camel@kernel.org> (raw)
In-Reply-To: <2bc8e804-3fca-086b-5050-b178eda1066a@kernel.org>
On Tue, 2024-10-22 at 17:03 -0700, Mat Martineau wrote:
> On Tue, 22 Oct 2024, Geliang Tang wrote:
>
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> >
> > Use the newly added bpf_for_each() helper to walk the conn_list.
> >
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > .../selftests/bpf/progs/mptcp_bpf_burst.c | 79 ++++++++++------
> > ---
> > 1 file changed, 40 insertions(+), 39 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> > b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> > index eb21119aa8f7..e7df5f048aa4 100644
> > --- a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> > +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
> > @@ -11,6 +11,10 @@ char _license[] SEC("license") = "GPL";
> >
> > #define min(a, b) ((a) < (b) ? (a) : (b))
> >
> > +#define SSK_MODE_ACTIVE 0
> > +#define SSK_MODE_BACKUP 1
> > +#define SSK_MODE_MAX 2
> > +
>
> Hi Geliang -
>
> > struct bpf_subflow_send_info {
> > __u8 subflow_id;
>
> If you store a subflow pointer here instead of an index, there's no
> need
> to add the mptcp_lookup_subflow_id() helper function, an extra
> iteration over conn_list is eliminated, and the code is more like
> mptcp_subflow_get_send().
I thought so too. Two changes were needed to achieve this.
1. Move sk_stream_memory_free check inside bpf_for_each() loop.
2. Implement mptcp_subflow_set_scheduled helper in BPF.
>
> - Mat
>
> > __u64 linger_time;
> > @@ -23,10 +27,6 @@ extern bool tcp_stream_memory_free(const struct
> > sock *sk, int wake) __ksym;
> > extern bool bpf_mptcp_subflow_queues_empty(struct sock *sk) __ksym;
> > extern void mptcp_pm_subflow_chk_stale(const struct mptcp_sock
> > *msk, struct sock *ssk) __ksym;
> >
> > -#define SSK_MODE_ACTIVE 0
> > -#define SSK_MODE_BACKUP 1
> > -#define SSK_MODE_MAX 2
> > -
> > static __always_inline __u64 div_u64(__u64 dividend, __u32 divisor)
> > {
> > return dividend / divisor;
> > @@ -57,6 +57,19 @@ static __always_inline bool
> > sk_stream_memory_free(const struct sock *sk)
> > return __sk_stream_memory_free(sk, 0);
> > }
> >
> > +static struct mptcp_subflow_context *
> > +mptcp_lookup_subflow_by_id(struct mptcp_sock *msk, unsigned int
> > id)
> > +{
> > + struct mptcp_subflow_context *subflow;
> > +
> > + bpf_for_each(mptcp_subflow, subflow, msk) {
> > + if (subflow->subflow_id == id)
> > + return subflow;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > SEC("struct_ops")
> > void BPF_PROG(mptcp_sched_burst_init, struct mptcp_sock *msk)
> > {
> > @@ -67,8 +80,7 @@ void BPF_PROG(mptcp_sched_burst_release, struct
> > mptcp_sock *msk)
> > {
> > }
> >
> > -static int bpf_burst_get_send(struct mptcp_sock *msk,
> > - struct mptcp_sched_data *data)
> > +static int bpf_burst_get_send(struct mptcp_sock *msk)
> > {
> > struct bpf_subflow_send_info send_info[SSK_MODE_MAX];
> > struct mptcp_subflow_context *subflow;
> > @@ -84,16 +96,10 @@ static int bpf_burst_get_send(struct mptcp_sock
> > *msk,
> > send_info[i].linger_time = -1;
> > }
> >
> > - for (i = 0; i < data->subflows && i < MPTCP_SUBFLOWS_MAX;
> > i++) {
> > - bool backup;
> > + bpf_for_each(mptcp_subflow, subflow, msk) {
> > + bool backup = subflow->backup || subflow-
> > >request_bkup;
> >
> > - subflow = bpf_mptcp_subflow_ctx_by_pos(data, i);
> > - if (!subflow)
> > - break;
> > -
> > - backup = subflow->backup || subflow->request_bkup;
> > -
> > - ssk = mptcp_subflow_tcp_sock(subflow);
> > + ssk = bpf_mptcp_subflow_tcp_sock(subflow);
> > if (!mptcp_subflow_active(subflow))
> > continue;
> >
> > @@ -109,7 +115,7 @@ static int bpf_burst_get_send(struct mptcp_sock
> > *msk,
> >
> > linger_time = div_u64((__u64)ssk->sk_wmem_queued
> > << 32, pace);
> > if (linger_time < send_info[backup].linger_time) {
> > - send_info[backup].subflow_id = i;
> > + send_info[backup].subflow_id = subflow-
> > >subflow_id;
> > send_info[backup].linger_time =
> > linger_time;
> > }
> > }
> > @@ -119,10 +125,10 @@ static int bpf_burst_get_send(struct
> > mptcp_sock *msk,
> > if (!nr_active)
> > send_info[SSK_MODE_ACTIVE].subflow_id =
> > send_info[SSK_MODE_BACKUP].subflow_id;
> >
> > - subflow = bpf_mptcp_subflow_ctx_by_pos(data,
> > send_info[SSK_MODE_ACTIVE].subflow_id);
> > + subflow = mptcp_lookup_subflow_by_id(msk,
> > send_info[SSK_MODE_ACTIVE].subflow_id);
BPF does not allow access to subflow in this way:
subflow = send_info[SSK_MODE_ACTIVE].subflow;
So I have to use bpf_core_cast() here as:
subflow = bpf_core_cast(send_info[SSK_MODE_ACTIVE].subflow,
struct mptcp_subflow_context);
> > if (!subflow)
> > return -1;
> > - ssk = mptcp_subflow_tcp_sock(subflow);
> > + ssk = bpf_mptcp_subflow_tcp_sock(subflow);
Here BPF doesn't allow passing a cast pointer (subflow) to a kfunc
(bpf_mptcp_subflow_tcp_sock). Fortunately we can use
mptcp_subflow_tcp_sock instead, which is a BPF helper, not a kfunc:
ssk = mptcp_subflow_tcp_sock(subflow);
> > if (!ssk || !sk_stream_memory_free(ssk))
Again, BPF does not allow passing a cast pointer to a kfunc,
sk_stream_memory_free(ssk) fails.
It's not possible to implement a sk_stream_memory_free function in BPF,
it's too complicated. So the approach I took in v8 was to move this
sk_stream_memory_free check forward to the position of
mptcp_subflow_active in bpf_for_each() loop. I think this doesn't
change the logic of the burst scheduler, but I'd like to hear your
opinion.
After this, mptcp_subflow_set_scheduled(subflow, true) is not allowed
too. So I have to implement a mptcp_subflow_set_scheduled helper in
BPF. It's easy since WRITE_ONCE() is defined in progs/map_kptr.c.
> > return -1;
> >
> > @@ -141,23 +147,18 @@ static int bpf_burst_get_send(struct
> > mptcp_sock *msk,
> > return 0;
> > }
> >
> > -static int bpf_burst_get_retrans(struct mptcp_sock *msk,
> > - struct mptcp_sched_data *data)
> > +static int bpf_burst_get_retrans(struct mptcp_sock *msk)
> > {
> > - int backup = MPTCP_SUBFLOWS_MAX, pick =
> > MPTCP_SUBFLOWS_MAX, subflow_id;
> > + struct sock *backup = NULL, *pick = NULL;
> > struct mptcp_subflow_context *subflow;
> > int min_stale_count = INT_MAX;
> > - struct sock *ssk;
> >
> > - for (int i = 0; i < data->subflows && i <
> > MPTCP_SUBFLOWS_MAX; i++) {
> > - subflow = bpf_mptcp_subflow_ctx_by_pos(data, i);
> > - if (!subflow)
> > - break;
> > + bpf_for_each(mptcp_subflow, subflow, msk) {
> > + struct sock *ssk =
> > bpf_mptcp_subflow_tcp_sock(subflow);
> >
> > if (!mptcp_subflow_active(subflow))
> > continue;
> >
> > - ssk = mptcp_subflow_tcp_sock(subflow);
> > /* still data outstanding at TCP level? skip this
> > */
> > if (!tcp_rtx_and_write_queues_empty(ssk)) {
> > mptcp_pm_subflow_chk_stale(msk, ssk);
> > @@ -166,23 +167,23 @@ static int bpf_burst_get_retrans(struct
> > mptcp_sock *msk,
> > }
> >
> > if (subflow->backup || subflow->request_bkup) {
> > - if (backup == MPTCP_SUBFLOWS_MAX)
> > - backup = i;
> > + if (!backup)
> > + backup = ssk;
> > continue;
> > }
> >
> > - if (pick == MPTCP_SUBFLOWS_MAX)
> > - pick = i;
> > + if (!pick)
> > + pick = ssk;
> > }
> >
> > - if (pick < MPTCP_SUBFLOWS_MAX) {
> > - subflow_id = pick;
> > + if (pick)
> > goto out;
> > - }
> > - subflow_id = min_stale_count > 1 ? backup :
> > MPTCP_SUBFLOWS_MAX;
> > + pick = min_stale_count > 1 ? backup : NULL;
> >
> > out:
> > - subflow = bpf_mptcp_subflow_ctx_by_pos(data, subflow_id);
> > + if (!pick)
> > + return -1;
> > + subflow = bpf_mptcp_subflow_ctx(pick);
> > if (!subflow)
> > return -1;
> > mptcp_subflow_set_scheduled(subflow, true);
> > @@ -194,11 +195,11 @@ int BPF_PROG(bpf_burst_get_subflow, struct
> > mptcp_sock *msk,
> > struct mptcp_sched_data *data)
> > {
> > if (data->reinject)
> > - return bpf_burst_get_retrans(msk, data);
> > - return bpf_burst_get_send(msk, data);
> > + return bpf_burst_get_retrans(msk);
> > + return bpf_burst_get_send(msk);
> > }
> >
> > -SEC(".struct_ops")
> > +SEC(".struct_ops.link")
> > struct mptcp_sched_ops burst = {
> > .init = (void *)mptcp_sched_burst_init,
> > .release = (void *)mptcp_sched_burst_release,
> > --
> > 2.45.2
> >
> >
> >
next prev parent reply other threads:[~2024-10-23 8:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-22 7:52 [PATCH mptcp-next v7 0/5] use bpf_iter in bpf schedulers Geliang Tang
2024-10-22 7:52 ` [PATCH mptcp-next v7 1/5] Squash to "selftests/bpf: Add bpf_bkup scheduler & test" Geliang Tang
2024-10-22 7:52 ` [PATCH mptcp-next v7 2/5] Squash to "selftests/bpf: Add bpf_rr " Geliang Tang
2024-10-22 23:52 ` Mat Martineau
2024-10-23 8:32 ` Geliang Tang
2024-10-22 7:52 ` [PATCH mptcp-next v7 3/5] Squash to "selftests/bpf: Add bpf_red " Geliang Tang
2024-10-22 7:52 ` [PATCH mptcp-next v7 4/5] Squash to "selftests/bpf: Add bpf_burst " Geliang Tang
2024-10-23 0:03 ` Mat Martineau
2024-10-23 8:57 ` Geliang Tang [this message]
2024-10-22 7:52 ` [PATCH mptcp-next v7 5/5] Squash to "selftests/bpf: Add bpf_first " Geliang Tang
2024-10-23 0:10 ` Mat Martineau
2024-10-23 9:00 ` Geliang Tang
2024-10-22 9:04 ` [PATCH mptcp-next v7 0/5] use bpf_iter in bpf schedulers MPTCP CI
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=155fd0bca02ef0619bbd5dd0f1efa4dfdf8a903b.camel@kernel.org \
--to=geliang@kernel.org \
--cc=martineau@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=tanggeliang@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.