From: Geliang Tang <geliang.tang@suse.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler
Date: Mon, 12 Jun 2023 21:29:34 +0800 [thread overview]
Message-ID: <20230612132934.GA9248@localhost> (raw)
In-Reply-To: <5336efd2153d4667f9d0aea52b6d67ed2685ddbf.camel@redhat.com>
On Mon, Jun 12, 2023 at 01:05:30PM +0200, Paolo Abeni wrote:
> On Sat, 2023-06-10 at 09:45 +0800, Geliang Tang wrote:
> > On Fri, Jun 09, 2023 at 04:40:01PM +0200, Paolo Abeni wrote:
> > > On Fri, 2023-06-09 at 21:32 +0800, Geliang Tang wrote:
> > > > On Fri, Jun 09, 2023 at 11:57:53AM +0200, Paolo Abeni wrote:
> > > >
> > > > > May I guess you get a similar error if you do:
> > > > >
> > > > > subflow = mptcp_subflow_ctx(ssk)
> > > > >
> > > > > ? (just out of sheer ignorance and curiosity)
> > > >
> > > > Yes. It seems that accessing 'send_info[SSK_MODE_ACTIVE].ssk' is
> > > > considered unsafe in BPF context. So here we pass 'send_info[SSK_MODE_ACTIVE].ssk'
> > > > into bpf_mptcp_subflow_ctx() to find the related subflow. Then we access this
> > > > subflow instead of 'send_info[SSK_MODE_ACTIVE].ssk' below. This can make BPF happy.
> > > >
> > > > >
> > > > > > + * then use mptcp_subflow_tcp_sock() to get the ssk.
> > > > > > + */
> > > > > > + subflow = mptcp_subflow_tcp_sock(send_info[SSK_MODE_ACTIVE].ssk, data);
> > > > > > + ssk = mptcp_subflow_tcp_sock(subflow);
> > > > >
> > > > > What if you store the 'subflow' pointer in 'send_info'? Will the
> > > > > verifier splat with that? and what if we store the corresponding
> > > > > context index 'i' instead?
> > > >
> > > > Storing the 'subflow' or index 'i' don't work too.
> > >
> > > Uhmm... I'm very surprised about 'i'. Specifically what I mean is:
> > >
> > > struct subflow_send_info {
> > > unsigned int subflow_id;
> > > __u64 linger_time;
> > > };
> > >
> > > // ...
> > > for (i = 0; i < SSK_MODE_MAX; ++i) {
> > > send_info[i].ssk = MPTCP_SUBFLOWS_MAX;
> > > send_info[i].linger_time = -1;
> > > }
> > > // ...
> > >
> > > if (linger_time < send_info[subflow->backup].linger_time) {
> > > send_info[subflow->backup].subflow_id = i;
> > > send_info[subflow->backup].linger_time = linger_time;
> > > }
> > >
> > > // ...
> > > if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
> > > send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
> > >
> > > if (send_info[SSK_MODE_ACTIVE].subflow_id < MPTCP_SUBFLOWS_MAX)
> > > subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];
> > >
> > > The last assignment should be equivalent to the already used 'subflow =
> > > data->contexts[i];'. What kind of errors do you see here?!? Could you
> > > please report them verbatim?
> >
> > This line "subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];" will get a error:
> >
> > R2 is ptr_mptcp_sched_data invalid variable offset: off=16, var_off=(0x0; 0x38)
>
> I see. This looks like a verifier-imposed artificial constraint.
>
> Basically any access to:
>
> btf_type.array_field[valid_and_validated_variable_index]
>
> is not allowed.
>
> The main point of having sched_data.context exposed to the pluggable
> scheduler via mptcp_sched_data is to avoid looping through the subflows
> to fetch a given one, I think. But the above verifier constraint
> basically prevents such usage.
>
> I think we should add an helper into the core implementing the position
> (number) to subflow (context) mapping. e.g.:
>
> struct mptcp_subflow_context *mptcp_subflow_ctx_by_pos(struct mptcp_sk *msk,
> unsigned int pos)
> {
> if (pos >= MPTCP_SUBFLOWS_MAX)
> return NULL;
>
> return msk->sched_data.context[pos];
> }
>
> And use such helper here instead of bpf_mptcp_subflow_ctx().
"return msk->sched_data.contexts[pos];" will get the same error:
R3 is ptr_mptcp_sock invalid variable offset: off=1880, var_off=(0x0; 0x38)
Here's the patch:
'''
diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index 3e8df90951e9..5f2d8acc1a84 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -268,6 +268,7 @@ struct mptcp_sock {
__u64 snd_nxt;
__u32 token;
struct sock *first;
+ struct mptcp_sched_data sched_data;
char ca_name[TCP_CA_NAME_MAX];
} __attribute__((preserve_access_index));
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
index 2f97ffb707ac..308c254d8aba 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
@@ -10,7 +10,7 @@ char _license[] SEC("license") = "GPL";
#define MPTCP_SEND_BURST_SIZE 65428
struct subflow_send_info {
- struct sock *ssk;
+ unsigned int subflow_id;
__u64 linger_time;
};
@@ -32,6 +32,14 @@ bpf_mptcp_subflow_ctx(const struct sock *ssk, const struct mptcp_sched_data *dat
return data->contexts[nr];
}
+static struct mptcp_subflow_context *
+mptcp_subflow_ctx_by_pos(const struct mptcp_sock *msk, unsigned int pos)
+{
+ if (pos >= MPTCP_SUBFLOWS_MAX)
+ return NULL;
+ return msk->sched_data.contexts[pos];
+}
+
static inline __u64 div_u64_rem(__u64 dividend, __u32 divisor, __u32 *remainder)
{
*remainder = dividend % divisor;
@@ -79,13 +87,13 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
struct mptcp_subflow_context *subflow;
struct sock *sk = (struct sock *)msk;
__u32 pace, burst, wmem;
- int i, nr_active = 0;
__u64 linger_time;
struct sock *ssk;
+ int i;
/* pick the subflow with the lower wmem/wspace ratio */
for (i = 0; i < SSK_MODE_MAX; ++i) {
- send_info[i].ssk = NULL;
+ send_info[i].subflow_id = MPTCP_SUBFLOWS_MAX;
send_info[i].linger_time = -1;
}
@@ -98,7 +106,6 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
if (!mptcp_subflow_active(subflow))
continue;
- nr_active += !subflow->backup;
pace = subflow->avg_pacing_rate;
if (!pace) {
/* init pacing rate from socket */
@@ -110,15 +117,15 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
linger_time = div_u64((__u64)ssk->sk_wmem_queued << 32, pace);
if (linger_time < send_info[subflow->backup].linger_time) {
- send_info[subflow->backup].ssk = ssk;
+ send_info[subflow->backup].subflow_id = i;
send_info[subflow->backup].linger_time = linger_time;
}
}
mptcp_set_timeout(sk);
/* pick the best backup if no other subflow is active */
- if (!nr_active)
- send_info[SSK_MODE_ACTIVE].ssk = send_info[SSK_MODE_BACKUP].ssk;
+ if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
+ send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
/* Pass "send_info[SSK_MODE_ACTIVE].ssk" directly to bpf_sk_stream_memory_free()
* will get an error:
@@ -127,7 +134,7 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
* then use mptcp_subflow_tcp_sock() to get the ssk,
* and pass the ssk to bpf_sk_stream_memory_free().
*/
- subflow = bpf_mptcp_subflow_ctx(send_info[SSK_MODE_ACTIVE].ssk, data);
+ subflow = mptcp_subflow_ctx_by_pos(msk, send_info[SSK_MODE_ACTIVE].subflow_id);
ssk = mptcp_subflow_tcp_sock(subflow);
if (!ssk || !bpf_sk_stream_memory_free(ssk))
return -1;
'''
>
> There are a number of possible follow-ups to the above, not strictly
> related to this series, but IMHO needed before upstreaming this code:
>
> -we could remove the 'context' array from the data directly visible to
> the ebpf code.
> - Instead we could export to the scheduler the number of subflows
> currently present into the msk socket, so the scheduler itself will not
> have to always ask for all the MPTCP_SUBFLOWS_MAX possible subflows.
This for loop in BPF will get the "invalid variable offset" error too:
for (int i = 0; i < number_of_subflows; i++)
contexts[i];
So we have to loop from 0 to MAX like this:
for (i = 0; i < SSK_MODE_MAX; i++)
contexts[i];
Thanks,
-Geliang
> - we could drop entirely the data_init() scheduler operation: all the
> data should be make available/prepared by the core itself, no need to
> add an indirect call to always do the same operation.
>
> /P
>
next prev parent reply other threads:[~2023-06-12 13:29 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-08 5:46 [PATCH mptcp-next v8 00/17] BPF packet scheduler updates Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 01/17] Squash to "mptcp: drop last_snd and MPTCP_RESET_SCHEDULER" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 02/17] Squash to "mptcp: add struct mptcp_sched_ops" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 03/17] Squash to "mptcp: add scheduler wrappers" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 04/17] mptcp: add last_snd in sched_data Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 05/17] mptcp: add snd_burst " Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 06/17] mptcp: register default scheduler Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 07/17] Squash to "bpf: Add bpf_mptcp_sched_ops" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 08/17] Squash to "selftests/bpf: Add mptcp sched structs" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 09/17] Squash to "selftests/bpf: Add bpf_rr scheduler" Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 10/17] mptcp: add two wrappers needed by bpf_burst Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 11/17] bpf: Add bpf_burst write accesses Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 12/17] bpf: Export more bpf_burst related functions Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler Geliang Tang
2023-06-09 9:57 ` Paolo Abeni
2023-06-09 13:32 ` Geliang Tang
2023-06-09 14:40 ` Paolo Abeni
2023-06-10 1:45 ` Geliang Tang
2023-06-12 11:05 ` Paolo Abeni
2023-06-12 13:29 ` Geliang Tang [this message]
2023-06-12 14:22 ` Paolo Abeni
2023-06-13 5:36 ` Geliang Tang
2023-06-13 9:35 ` Paolo Abeni
2023-06-13 12:32 ` Geliang Tang
2023-06-13 13:36 ` Paolo Abeni
2023-06-13 14:03 ` Geliang Tang
2023-06-13 14:18 ` Paolo Abeni
2023-06-08 5:46 ` [PATCH mptcp-next v8 14/17] selftests/bpf: Add bpf_burst test Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 15/17] bpf: Add subflow bit flags write accesses Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 16/17] selftests/bpf: Add bpf_stale scheduler Geliang Tang
2023-06-08 5:46 ` [PATCH mptcp-next v8 17/17] selftests/bpf: Add bpf_stale test Geliang Tang
2023-06-08 6:29 ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-08 6:56 ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI
2023-06-13 10:08 ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-13 10:57 ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230612132934.GA9248@localhost \
--to=geliang.tang@suse.com \
--cc=mptcp@lists.linux.dev \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox