MPTCP Linux Development
 help / color / mirror / Atom feed
From: Geliang Tang <geliang.tang@suse.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler
Date: Mon, 12 Jun 2023 21:29:34 +0800	[thread overview]
Message-ID: <20230612132934.GA9248@localhost> (raw)
In-Reply-To: <5336efd2153d4667f9d0aea52b6d67ed2685ddbf.camel@redhat.com>

On Mon, Jun 12, 2023 at 01:05:30PM +0200, Paolo Abeni wrote:
> On Sat, 2023-06-10 at 09:45 +0800, Geliang Tang wrote:
> > On Fri, Jun 09, 2023 at 04:40:01PM +0200, Paolo Abeni wrote:
> > > On Fri, 2023-06-09 at 21:32 +0800, Geliang Tang wrote:
> > > > On Fri, Jun 09, 2023 at 11:57:53AM +0200, Paolo Abeni wrote:
> > > > 
> > > > > May I guess you get a similar error if you do:
> > > > > 
> > > > > 	subflow = mptcp_subflow_ctx(ssk)
> > > > > 
> > > > > ? (just out of sheer ignorance and curiosity)
> > > > 
> > > > Yes. It seems that accessing 'send_info[SSK_MODE_ACTIVE].ssk' is
> > > > considered unsafe in BPF context. So here we pass 'send_info[SSK_MODE_ACTIVE].ssk'
> > > > into bpf_mptcp_subflow_ctx() to find the related subflow. Then we access this
> > > > subflow instead of 'send_info[SSK_MODE_ACTIVE].ssk' below. This can make BPF happy.
> > > > 
> > > > > 
> > > > > > +	 * then use mptcp_subflow_tcp_sock() to get the ssk.
> > > > > > +	 */
> > > > > > +	subflow = mptcp_subflow_tcp_sock(send_info[SSK_MODE_ACTIVE].ssk, data);
> > > > > > +	ssk = mptcp_subflow_tcp_sock(subflow);
> > > > > 
> > > > > What if you store the 'subflow' pointer in 'send_info'? Will the
> > > > > verifier splat with that? and what if we store the corresponding
> > > > > context index 'i' instead?
> > > > 
> > > > Storing the 'subflow' or index 'i' don't work too.
> > > 
> > > Uhmm... I'm very surprised about 'i'. Specifically what I mean is:
> > > 
> > > struct subflow_send_info {
> > > 	unsigned int subflow_id;
> > > 	__u64 linger_time;
> > > };
> > > 
> > > // ...
> > > 	for (i = 0; i < SSK_MODE_MAX; ++i) {
> > > 		send_info[i].ssk = MPTCP_SUBFLOWS_MAX;
> > > 		send_info[i].linger_time = -1;
> > > 	}
> > > // ...
> > > 
> > > 		if (linger_time < send_info[subflow->backup].linger_time) {
> > > 			send_info[subflow->backup].subflow_id = i;
> > > 			send_info[subflow->backup].linger_time = linger_time;
> > > 		}
> > > 
> > > // ...
> > > 	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
> > > 		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
> > > 
> > > 	if (send_info[SSK_MODE_ACTIVE].subflow_id < MPTCP_SUBFLOWS_MAX)
> > > 		subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];
> > > 
> > > The last assignment should be equivalent to the already used 'subflow =
> > > data->contexts[i];'. What kind of errors do you see here?!? Could you
> > > please report them verbatim?
> > 
> > This line "subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];" will get a error:
> > 
> >   R2 is ptr_mptcp_sched_data invalid variable offset: off=16, var_off=(0x0; 0x38)
> 
> I see. This looks like a verifier-imposed artificial constraint.
> 
> Basically any access to:
> 
> btf_type.array_field[valid_and_validated_variable_index]
> 
> is not allowed.
> 
> The main point of having sched_data.context exposed to the pluggable
> scheduler via mptcp_sched_data is to avoid looping through the subflows
> to fetch a given one, I think. But the above verifier constraint
> basically prevents such usage.
> 
> I think we should add an helper into the core implementing the position
> (number) to subflow (context) mapping. e.g.:
> 
> struct mptcp_subflow_context *mptcp_subflow_ctx_by_pos(struct mptcp_sk *msk,
> 							unsigned int pos)
> {
> 	if (pos >= MPTCP_SUBFLOWS_MAX)
> 		return NULL;
> 
> 	return msk->sched_data.context[pos];
> }
> 
> And use such helper here instead of bpf_mptcp_subflow_ctx(). 

"return msk->sched_data.contexts[pos];" will get the same error:

R3 is ptr_mptcp_sock invalid variable offset: off=1880, var_off=(0x0; 0x38)

Here's the patch:

'''
diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index 3e8df90951e9..5f2d8acc1a84 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -268,6 +268,7 @@ struct mptcp_sock {
 	__u64		snd_nxt;
 	__u32		token;
 	struct sock	*first;
+	struct mptcp_sched_data sched_data;
 	char		ca_name[TCP_CA_NAME_MAX];
 } __attribute__((preserve_access_index));
 
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
index 2f97ffb707ac..308c254d8aba 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
@@ -10,7 +10,7 @@ char _license[] SEC("license") = "GPL";
 #define MPTCP_SEND_BURST_SIZE	65428
 
 struct subflow_send_info {
-	struct sock *ssk;
+	unsigned int subflow_id;
 	__u64 linger_time;
 };
 
@@ -32,6 +32,14 @@ bpf_mptcp_subflow_ctx(const struct sock *ssk, const struct mptcp_sched_data *dat
 	return data->contexts[nr];
 }
 
+static struct mptcp_subflow_context *
+mptcp_subflow_ctx_by_pos(const struct mptcp_sock *msk, unsigned int pos)
+{
+	if (pos >= MPTCP_SUBFLOWS_MAX)
+		return NULL;
+	return msk->sched_data.contexts[pos];
+}
+
 static inline __u64 div_u64_rem(__u64 dividend, __u32 divisor, __u32 *remainder)
 {
 	*remainder = dividend % divisor;
@@ -79,13 +87,13 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 	__u32 pace, burst, wmem;
-	int i, nr_active = 0;
 	__u64 linger_time;
 	struct sock *ssk;
+	int i;
 
 	/* pick the subflow with the lower wmem/wspace ratio */
 	for (i = 0; i < SSK_MODE_MAX; ++i) {
-		send_info[i].ssk = NULL;
+		send_info[i].subflow_id = MPTCP_SUBFLOWS_MAX;
 		send_info[i].linger_time = -1;
 	}
 
@@ -98,7 +106,6 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 		if (!mptcp_subflow_active(subflow))
 			continue;
 
-		nr_active += !subflow->backup;
 		pace = subflow->avg_pacing_rate;
 		if (!pace) {
 			/* init pacing rate from socket */
@@ -110,15 +117,15 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 
 		linger_time = div_u64((__u64)ssk->sk_wmem_queued << 32, pace);
 		if (linger_time < send_info[subflow->backup].linger_time) {
-			send_info[subflow->backup].ssk = ssk;
+			send_info[subflow->backup].subflow_id = i;
 			send_info[subflow->backup].linger_time = linger_time;
 		}
 	}
 	mptcp_set_timeout(sk);
 
 	/* pick the best backup if no other subflow is active */
-	if (!nr_active)
-		send_info[SSK_MODE_ACTIVE].ssk = send_info[SSK_MODE_BACKUP].ssk;
+	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
+		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
 
 	/* Pass "send_info[SSK_MODE_ACTIVE].ssk" directly to bpf_sk_stream_memory_free()
 	 * will get an error:
@@ -127,7 +134,7 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 	 * then use mptcp_subflow_tcp_sock() to get the ssk,
 	 * and pass the ssk to bpf_sk_stream_memory_free().
 	 */
-	subflow = bpf_mptcp_subflow_ctx(send_info[SSK_MODE_ACTIVE].ssk, data);
+	subflow = mptcp_subflow_ctx_by_pos(msk, send_info[SSK_MODE_ACTIVE].subflow_id);
 	ssk = mptcp_subflow_tcp_sock(subflow);
 	if (!ssk || !bpf_sk_stream_memory_free(ssk))
 		return -1;
'''

> 
> There are a number of possible follow-ups to the above, not strictly
> related to this series, but IMHO needed before upstreaming this code:
> 
> -we could remove the 'context' array from the data directly visible to
> the ebpf code. 
> - Instead we could export to the scheduler the number of subflows
> currently present into the msk socket, so the scheduler itself will not
> have to always ask for all the MPTCP_SUBFLOWS_MAX possible subflows. 

This for loop in BPF will get the "invalid variable offset" error too:

for (int i = 0; i < number_of_subflows; i++)
	contexts[i];

So we have to loop from 0 to MAX like this:

for (i = 0; i < SSK_MODE_MAX; i++)
	contexts[i];

Thanks,
-Geliang

> - we could drop entirely the data_init() scheduler operation: all the
> data should be make available/prepared by the core itself, no need to
> add an indirect call to always do the same operation.
> 
> /P
> 

  reply	other threads:[~2023-06-12 13:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-08  5:46 [PATCH mptcp-next v8 00/17] BPF packet scheduler updates Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 01/17] Squash to "mptcp: drop last_snd and MPTCP_RESET_SCHEDULER" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 02/17] Squash to "mptcp: add struct mptcp_sched_ops" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 03/17] Squash to "mptcp: add scheduler wrappers" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 04/17] mptcp: add last_snd in sched_data Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 05/17] mptcp: add snd_burst " Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 06/17] mptcp: register default scheduler Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 07/17] Squash to "bpf: Add bpf_mptcp_sched_ops" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 08/17] Squash to "selftests/bpf: Add mptcp sched structs" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 09/17] Squash to "selftests/bpf: Add bpf_rr scheduler" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 10/17] mptcp: add two wrappers needed by bpf_burst Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 11/17] bpf: Add bpf_burst write accesses Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 12/17] bpf: Export more bpf_burst related functions Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler Geliang Tang
2023-06-09  9:57   ` Paolo Abeni
2023-06-09 13:32     ` Geliang Tang
2023-06-09 14:40       ` Paolo Abeni
2023-06-10  1:45         ` Geliang Tang
2023-06-12 11:05           ` Paolo Abeni
2023-06-12 13:29             ` Geliang Tang [this message]
2023-06-12 14:22               ` Paolo Abeni
2023-06-13  5:36                 ` Geliang Tang
2023-06-13  9:35                   ` Paolo Abeni
2023-06-13 12:32                 ` Geliang Tang
2023-06-13 13:36                   ` Paolo Abeni
2023-06-13 14:03                     ` Geliang Tang
2023-06-13 14:18                       ` Paolo Abeni
2023-06-08  5:46 ` [PATCH mptcp-next v8 14/17] selftests/bpf: Add bpf_burst test Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 15/17] bpf: Add subflow bit flags write accesses Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 16/17] selftests/bpf: Add bpf_stale scheduler Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 17/17] selftests/bpf: Add bpf_stale test Geliang Tang
2023-06-08  6:29   ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-08  6:56   ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI
2023-06-13 10:08   ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-13 10:57   ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230612132934.GA9248@localhost \
    --to=geliang.tang@suse.com \
    --cc=mptcp@lists.linux.dev \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox