All of lore.kernel.org
 help / color / mirror / Atom feed
From: Geliang Tang <geliang.tang@suse.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler
Date: Mon, 12 Jun 2023 21:29:34 +0800	[thread overview]
Message-ID: <20230612132934.GA9248@localhost> (raw)
In-Reply-To: <5336efd2153d4667f9d0aea52b6d67ed2685ddbf.camel@redhat.com>

On Mon, Jun 12, 2023 at 01:05:30PM +0200, Paolo Abeni wrote:
> On Sat, 2023-06-10 at 09:45 +0800, Geliang Tang wrote:
> > On Fri, Jun 09, 2023 at 04:40:01PM +0200, Paolo Abeni wrote:
> > > On Fri, 2023-06-09 at 21:32 +0800, Geliang Tang wrote:
> > > > On Fri, Jun 09, 2023 at 11:57:53AM +0200, Paolo Abeni wrote:
> > > > 
> > > > > May I guess you get a similar error if you do:
> > > > > 
> > > > > 	subflow = mptcp_subflow_ctx(ssk)
> > > > > 
> > > > > ? (just out of sheer ignorance and curiosity)
> > > > 
> > > > Yes. It seems that accessing 'send_info[SSK_MODE_ACTIVE].ssk' is
> > > > considered unsafe in BPF context. So here we pass 'send_info[SSK_MODE_ACTIVE].ssk'
> > > > into bpf_mptcp_subflow_ctx() to find the related subflow. Then we access this
> > > > subflow instead of 'send_info[SSK_MODE_ACTIVE].ssk' below. This can make BPF happy.
> > > > 
> > > > > 
> > > > > > +	 * then use mptcp_subflow_tcp_sock() to get the ssk.
> > > > > > +	 */
> > > > > > +	subflow = mptcp_subflow_tcp_sock(send_info[SSK_MODE_ACTIVE].ssk, data);
> > > > > > +	ssk = mptcp_subflow_tcp_sock(subflow);
> > > > > 
> > > > > What if you store the 'subflow' pointer in 'send_info'? Will the
> > > > > verifier splat with that? and what if we store the corresponding
> > > > > context index 'i' instead?
> > > > 
> > > > Storing the 'subflow' or index 'i' don't work too.
> > > 
> > > Uhmm... I'm very surprised about 'i'. Specifically what I mean is:
> > > 
> > > struct subflow_send_info {
> > > 	unsigned int subflow_id;
> > > 	__u64 linger_time;
> > > };
> > > 
> > > // ...
> > > 	for (i = 0; i < SSK_MODE_MAX; ++i) {
> > > 		send_info[i].ssk = MPTCP_SUBFLOWS_MAX;
> > > 		send_info[i].linger_time = -1;
> > > 	}
> > > // ...
> > > 
> > > 		if (linger_time < send_info[subflow->backup].linger_time) {
> > > 			send_info[subflow->backup].subflow_id = i;
> > > 			send_info[subflow->backup].linger_time = linger_time;
> > > 		}
> > > 
> > > // ...
> > > 	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
> > > 		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
> > > 
> > > 	if (send_info[SSK_MODE_ACTIVE].subflow_id < MPTCP_SUBFLOWS_MAX)
> > > 		subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];
> > > 
> > > The last assignment should be equivalent to the already used 'subflow =
> > > data->contexts[i];'. What kind of errors do you see here?!? Could you
> > > please report them verbatim?
> > 
> > This line "subflow = data->context[send_info[SSK_MODE_ACTIVE].subflow_id];" will get a error:
> > 
> >   R2 is ptr_mptcp_sched_data invalid variable offset: off=16, var_off=(0x0; 0x38)
> 
> I see. This looks like a verifier-imposed artificial constraint.
> 
> Basically any access to:
> 
> btf_type.array_field[valid_and_validated_variable_index]
> 
> is not allowed.
> 
> The main point of having sched_data.context exposed to the pluggable
> scheduler via mptcp_sched_data is to avoid looping through the subflows
> to fetch a given one, I think. But the above verifier constraint
> basically prevents such usage.
> 
> I think we should add an helper into the core implementing the position
> (number) to subflow (context) mapping. e.g.:
> 
> struct mptcp_subflow_context *mptcp_subflow_ctx_by_pos(struct mptcp_sk *msk,
> 							unsigned int pos)
> {
> 	if (pos >= MPTCP_SUBFLOWS_MAX)
> 		return NULL;
> 
> 	return msk->sched_data.context[pos];
> }
> 
> And use such helper here instead of bpf_mptcp_subflow_ctx(). 

"return msk->sched_data.contexts[pos];" will get the same error:

R3 is ptr_mptcp_sock invalid variable offset: off=1880, var_off=(0x0; 0x38)

Here's the patch:

'''
diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index 3e8df90951e9..5f2d8acc1a84 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -268,6 +268,7 @@ struct mptcp_sock {
 	__u64		snd_nxt;
 	__u32		token;
 	struct sock	*first;
+	struct mptcp_sched_data sched_data;
 	char		ca_name[TCP_CA_NAME_MAX];
 } __attribute__((preserve_access_index));
 
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
index 2f97ffb707ac..308c254d8aba 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_burst.c
@@ -10,7 +10,7 @@ char _license[] SEC("license") = "GPL";
 #define MPTCP_SEND_BURST_SIZE	65428
 
 struct subflow_send_info {
-	struct sock *ssk;
+	unsigned int subflow_id;
 	__u64 linger_time;
 };
 
@@ -32,6 +32,14 @@ bpf_mptcp_subflow_ctx(const struct sock *ssk, const struct mptcp_sched_data *dat
 	return data->contexts[nr];
 }
 
+static struct mptcp_subflow_context *
+mptcp_subflow_ctx_by_pos(const struct mptcp_sock *msk, unsigned int pos)
+{
+	if (pos >= MPTCP_SUBFLOWS_MAX)
+		return NULL;
+	return msk->sched_data.contexts[pos];
+}
+
 static inline __u64 div_u64_rem(__u64 dividend, __u32 divisor, __u32 *remainder)
 {
 	*remainder = dividend % divisor;
@@ -79,13 +87,13 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
 	__u32 pace, burst, wmem;
-	int i, nr_active = 0;
 	__u64 linger_time;
 	struct sock *ssk;
+	int i;
 
 	/* pick the subflow with the lower wmem/wspace ratio */
 	for (i = 0; i < SSK_MODE_MAX; ++i) {
-		send_info[i].ssk = NULL;
+		send_info[i].subflow_id = MPTCP_SUBFLOWS_MAX;
 		send_info[i].linger_time = -1;
 	}
 
@@ -98,7 +106,6 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 		if (!mptcp_subflow_active(subflow))
 			continue;
 
-		nr_active += !subflow->backup;
 		pace = subflow->avg_pacing_rate;
 		if (!pace) {
 			/* init pacing rate from socket */
@@ -110,15 +117,15 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 
 		linger_time = div_u64((__u64)ssk->sk_wmem_queued << 32, pace);
 		if (linger_time < send_info[subflow->backup].linger_time) {
-			send_info[subflow->backup].ssk = ssk;
+			send_info[subflow->backup].subflow_id = i;
 			send_info[subflow->backup].linger_time = linger_time;
 		}
 	}
 	mptcp_set_timeout(sk);
 
 	/* pick the best backup if no other subflow is active */
-	if (!nr_active)
-		send_info[SSK_MODE_ACTIVE].ssk = send_info[SSK_MODE_BACKUP].ssk;
+	if (send_info[SSK_MODE_ACTIVE].subflow_id == MPTCP_SUBFLOWS_MAX)
+		send_info[SSK_MODE_ACTIVE].subflow_id = send_info[SSK_MODE_BACKUP].subflow_id;
 
 	/* Pass "send_info[SSK_MODE_ACTIVE].ssk" directly to bpf_sk_stream_memory_free()
 	 * will get an error:
@@ -127,7 +134,7 @@ static int bpf_burst_get_send(const struct mptcp_sock *msk,
 	 * then use mptcp_subflow_tcp_sock() to get the ssk,
 	 * and pass the ssk to bpf_sk_stream_memory_free().
 	 */
-	subflow = bpf_mptcp_subflow_ctx(send_info[SSK_MODE_ACTIVE].ssk, data);
+	subflow = mptcp_subflow_ctx_by_pos(msk, send_info[SSK_MODE_ACTIVE].subflow_id);
 	ssk = mptcp_subflow_tcp_sock(subflow);
 	if (!ssk || !bpf_sk_stream_memory_free(ssk))
 		return -1;
'''

> 
> There are a number of possible follow-ups to the above, not strictly
> related to this series, but IMHO needed before upstreaming this code:
> 
> -we could remove the 'context' array from the data directly visible to
> the ebpf code. 
> - Instead we could export to the scheduler the number of subflows
> currently present into the msk socket, so the scheduler itself will not
> have to always ask for all the MPTCP_SUBFLOWS_MAX possible subflows. 

This for loop in BPF will get the "invalid variable offset" error too:

for (int i = 0; i < number_of_subflows; i++)
	contexts[i];

So we have to loop from 0 to MAX like this:

for (i = 0; i < SSK_MODE_MAX; i++)
	contexts[i];

Thanks,
-Geliang

> - we could drop entirely the data_init() scheduler operation: all the
> data should be make available/prepared by the core itself, no need to
> add an indirect call to always do the same operation.
> 
> /P
> 

  reply	other threads:[~2023-06-12 13:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-08  5:46 [PATCH mptcp-next v8 00/17] BPF packet scheduler updates Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 01/17] Squash to "mptcp: drop last_snd and MPTCP_RESET_SCHEDULER" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 02/17] Squash to "mptcp: add struct mptcp_sched_ops" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 03/17] Squash to "mptcp: add scheduler wrappers" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 04/17] mptcp: add last_snd in sched_data Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 05/17] mptcp: add snd_burst " Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 06/17] mptcp: register default scheduler Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 07/17] Squash to "bpf: Add bpf_mptcp_sched_ops" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 08/17] Squash to "selftests/bpf: Add mptcp sched structs" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 09/17] Squash to "selftests/bpf: Add bpf_rr scheduler" Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 10/17] mptcp: add two wrappers needed by bpf_burst Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 11/17] bpf: Add bpf_burst write accesses Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 12/17] bpf: Export more bpf_burst related functions Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 13/17] selftests/bpf: Add bpf_burst scheduler Geliang Tang
2023-06-09  9:57   ` Paolo Abeni
2023-06-09 13:32     ` Geliang Tang
2023-06-09 14:40       ` Paolo Abeni
2023-06-10  1:45         ` Geliang Tang
2023-06-12 11:05           ` Paolo Abeni
2023-06-12 13:29             ` Geliang Tang [this message]
2023-06-12 14:22               ` Paolo Abeni
2023-06-13  5:36                 ` Geliang Tang
2023-06-13  9:35                   ` Paolo Abeni
2023-06-13 12:32                 ` Geliang Tang
2023-06-13 13:36                   ` Paolo Abeni
2023-06-13 14:03                     ` Geliang Tang
2023-06-13 14:18                       ` Paolo Abeni
2023-06-08  5:46 ` [PATCH mptcp-next v8 14/17] selftests/bpf: Add bpf_burst test Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 15/17] bpf: Add subflow bit flags write accesses Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 16/17] selftests/bpf: Add bpf_stale scheduler Geliang Tang
2023-06-08  5:46 ` [PATCH mptcp-next v8 17/17] selftests/bpf: Add bpf_stale test Geliang Tang
2023-06-08  6:29   ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-08  6:56   ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI
2023-06-13 10:08   ` selftests/bpf: Add bpf_stale test: Build Failure MPTCP CI
2023-06-13 10:57   ` selftests/bpf: Add bpf_stale test: Tests Results MPTCP CI

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230612132934.GA9248@localhost \
    --to=geliang.tang@suse.com \
    --cc=mptcp@lists.linux.dev \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.