* [PATCH net-next 01/10] mptcp: refactor push_pending logic
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 02/10] mptcp: drop last_snd and MPTCP_RESET_SCHEDULER Mat Martineau
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
To support redundant package schedulers more easily, this patch refactors
__mptcp_push_pending() logic from:
For each dfrag:
While sends succeed:
Call the scheduler (selects subflow and msk->snd_burst)
Update subflow locks (push/release/acquire as needed)
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Push/release on final subflow
->
While first_pending isn't empty:
Call the scheduler (selects subflow and msk->snd_burst)
Update subflow locks (push/release/acquire as needed)
For each pending dfrag:
While sends succeed:
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Break if required by msk->snd_burst / etc
Push/release on final subflow
Refactors __mptcp_subflow_push_pending logic from:
For each dfrag:
While sends succeed:
Call the scheduler (selects subflow and msk->snd_burst)
Send the dfrag data with mptcp_subflow_delegate(), break
Send the dfrag data with mptcp_sendmsg_frag()
Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
Update msk->first_pending
->
While first_pending isn't empty:
Call the scheduler (selects subflow and msk->snd_burst)
Send the dfrag data with mptcp_subflow_delegate(), break
Send the dfrag data with mptcp_sendmsg_frag()
For each pending dfrag:
While sends succeed:
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Break if required by msk->snd_burst / etc
Move the duplicate code from __mptcp_push_pending() and
__mptcp_subflow_push_pending() into a new helper function, named
__subflow_push_pending(). Simplify __mptcp_push_pending() and
__mptcp_subflow_push_pending() by invoking this helper.
Also move the burst check conditions out of the function
mptcp_subflow_get_send(), check them in __subflow_push_pending() in
the inner "for each pending dfrag" loop.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 153 +++++++++++++++++++++++++++------------------------
1 file changed, 81 insertions(+), 72 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6019a3cf1625..29c662ffcd05 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1386,14 +1386,6 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
sk_stream_memory_free(msk->first) ? msk->first : NULL;
}
- /* re-use last subflow, if the burst allow that */
- if (msk->last_snd && msk->snd_burst > 0 &&
- sk_stream_memory_free(msk->last_snd) &&
- mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
- mptcp_set_timeout(sk);
- return msk->last_snd;
- }
-
/* pick the subflow with the lower wmem/wspace ratio */
for (i = 0; i < SSK_MODE_MAX; ++i) {
send_info[i].ssk = NULL;
@@ -1499,57 +1491,86 @@ void mptcp_check_and_set_pending(struct sock *sk)
mptcp_sk(sk)->push_pending |= BIT(MPTCP_PUSH_PENDING);
}
-void __mptcp_push_pending(struct sock *sk, unsigned int flags)
+static int __subflow_push_pending(struct sock *sk, struct sock *ssk,
+ struct mptcp_sendmsg_info *info)
{
- struct sock *prev_ssk = NULL, *ssk = NULL;
struct mptcp_sock *msk = mptcp_sk(sk);
- struct mptcp_sendmsg_info info = {
- .flags = flags,
- };
- bool do_check_data_fin = false;
struct mptcp_data_frag *dfrag;
- int len;
+ int len, copied = 0, err = 0;
while ((dfrag = mptcp_send_head(sk))) {
- info.sent = dfrag->already_sent;
- info.limit = dfrag->data_len;
+ info->sent = dfrag->already_sent;
+ info->limit = dfrag->data_len;
len = dfrag->data_len - dfrag->already_sent;
while (len > 0) {
int ret = 0;
- prev_ssk = ssk;
- ssk = mptcp_subflow_get_send(msk);
-
- /* First check. If the ssk has changed since
- * the last round, release prev_ssk
- */
- if (ssk != prev_ssk && prev_ssk)
- mptcp_push_release(prev_ssk, &info);
- if (!ssk)
- goto out;
-
- /* Need to lock the new subflow only if different
- * from the previous one, otherwise we are still
- * helding the relevant lock
- */
- if (ssk != prev_ssk)
- lock_sock(ssk);
-
- ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
+ ret = mptcp_sendmsg_frag(sk, ssk, dfrag, info);
if (ret <= 0) {
- if (ret == -EAGAIN)
- continue;
- mptcp_push_release(ssk, &info);
+ err = copied ? : ret;
goto out;
}
- do_check_data_fin = true;
- info.sent += ret;
+ info->sent += ret;
+ copied += ret;
len -= ret;
mptcp_update_post_push(msk, dfrag, ret);
}
WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
+
+ if (msk->snd_burst <= 0 ||
+ !sk_stream_memory_free(ssk) ||
+ !mptcp_subflow_active(mptcp_subflow_ctx(ssk))) {
+ err = copied;
+ goto out;
+ }
+ mptcp_set_timeout(sk);
+ }
+ err = copied;
+
+out:
+ return err;
+}
+
+void __mptcp_push_pending(struct sock *sk, unsigned int flags)
+{
+ struct sock *prev_ssk = NULL, *ssk = NULL;
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct mptcp_sendmsg_info info = {
+ .flags = flags,
+ };
+ bool do_check_data_fin = false;
+
+ while (mptcp_send_head(sk)) {
+ int ret = 0;
+
+ prev_ssk = ssk;
+ ssk = mptcp_subflow_get_send(msk);
+
+ /* First check. If the ssk has changed since
+ * the last round, release prev_ssk
+ */
+ if (ssk != prev_ssk && prev_ssk)
+ mptcp_push_release(prev_ssk, &info);
+ if (!ssk)
+ goto out;
+
+ /* Need to lock the new subflow only if different
+ * from the previous one, otherwise we are still
+ * helding the relevant lock
+ */
+ if (ssk != prev_ssk)
+ lock_sock(ssk);
+
+ ret = __subflow_push_pending(sk, ssk, &info);
+ if (ret <= 0) {
+ if (ret == -EAGAIN)
+ continue;
+ mptcp_push_release(ssk, &info);
+ goto out;
+ }
+ do_check_data_fin = true;
}
/* at this point we held the socket lock for the last subflow we used */
@@ -1570,42 +1591,30 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
struct mptcp_sendmsg_info info = {
.data_lock_held = true,
};
- struct mptcp_data_frag *dfrag;
struct sock *xmit_ssk;
- int len, copied = 0;
+ int copied = 0;
info.flags = 0;
- while ((dfrag = mptcp_send_head(sk))) {
- info.sent = dfrag->already_sent;
- info.limit = dfrag->data_len;
- len = dfrag->data_len - dfrag->already_sent;
- while (len > 0) {
- int ret = 0;
-
- /* check for a different subflow usage only after
- * spooling the first chunk of data
- */
- xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
- if (!xmit_ssk)
- goto out;
- if (xmit_ssk != ssk) {
- mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
- MPTCP_DELEGATE_SEND);
- goto out;
- }
-
- ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
- if (ret <= 0)
- goto out;
+ while (mptcp_send_head(sk)) {
+ int ret = 0;
- info.sent += ret;
- copied += ret;
- len -= ret;
- first = false;
-
- mptcp_update_post_push(msk, dfrag, ret);
+ /* check for a different subflow usage only after
+ * spooling the first chunk of data
+ */
+ xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
+ if (!xmit_ssk)
+ goto out;
+ if (xmit_ssk != ssk) {
+ mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
+ MPTCP_DELEGATE_SEND);
+ goto out;
}
- WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
+
+ ret = __subflow_push_pending(sk, ssk, &info);
+ first = false;
+ if (ret <= 0)
+ break;
+ copied += ret;
}
out:
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 02/10] mptcp: drop last_snd and MPTCP_RESET_SCHEDULER
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 01/10] mptcp: refactor push_pending logic Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 03/10] mptcp: add struct mptcp_sched_ops Mat Martineau
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
Since the burst check conditions have moved out of the function
mptcp_subflow_get_send(), it makes all msk->last_snd useless.
This patch drops them as well as the macro MPTCP_RESET_SCHEDULER.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/pm.c | 9 +--------
net/mptcp/pm_netlink.c | 3 ---
net/mptcp/protocol.c | 11 +----------
net/mptcp/protocol.h | 2 --
4 files changed, 2 insertions(+), 23 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 7dbbad1e4f55..d8da5374d9e1 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -299,15 +299,8 @@ void mptcp_pm_mp_prio_received(struct sock *ssk, u8 bkup)
pr_debug("subflow->backup=%d, bkup=%d\n", subflow->backup, bkup);
msk = mptcp_sk(sk);
- if (subflow->backup != bkup) {
+ if (subflow->backup != bkup)
subflow->backup = bkup;
- mptcp_data_lock(sk);
- if (!sock_owned_by_user(sk))
- msk->last_snd = NULL;
- else
- __set_bit(MPTCP_RESET_SCHEDULER, &msk->cb_flags);
- mptcp_data_unlock(sk);
- }
mptcp_event(MPTCP_EVENT_SUB_PRIORITY, msk, ssk, GFP_ATOMIC);
}
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index c75d9d88a053..9661f3812682 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -472,9 +472,6 @@ static void __mptcp_pm_send_ack(struct mptcp_sock *msk, struct mptcp_subflow_con
slow = lock_sock_fast(ssk);
if (prio) {
- if (subflow->backup != backup)
- msk->last_snd = NULL;
-
subflow->send_mp_prio = 1;
subflow->backup = backup;
subflow->request_bkup = backup;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 29c662ffcd05..f15ff80be30f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1438,16 +1438,13 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
burst = min_t(int, MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
wmem = READ_ONCE(ssk->sk_wmem_queued);
- if (!burst) {
- msk->last_snd = NULL;
+ if (!burst)
return ssk;
- }
subflow = mptcp_subflow_ctx(ssk);
subflow->avg_pacing_rate = div_u64((u64)subflow->avg_pacing_rate * wmem +
READ_ONCE(ssk->sk_pacing_rate) * burst,
burst + wmem);
- msk->last_snd = ssk;
msk->snd_burst = burst;
return ssk;
}
@@ -2379,9 +2376,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
WRITE_ONCE(msk->first, NULL);
out:
- if (ssk == msk->last_snd)
- msk->last_snd = NULL;
-
if (need_push)
__mptcp_push_pending(sk, 0);
}
@@ -3046,7 +3040,6 @@ static int mptcp_disconnect(struct sock *sk, int flags)
* subflow
*/
mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
- msk->last_snd = NULL;
WRITE_ONCE(msk->flags, 0);
msk->cb_flags = 0;
msk->push_pending = 0;
@@ -3316,8 +3309,6 @@ static void mptcp_release_cb(struct sock *sk)
__mptcp_set_connected(sk);
if (__test_and_clear_bit(MPTCP_ERROR_REPORT, &msk->cb_flags))
__mptcp_error_report(sk);
- if (__test_and_clear_bit(MPTCP_RESET_SCHEDULER, &msk->cb_flags))
- msk->last_snd = NULL;
}
__mptcp_update_rmem(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 38c7ea013361..cbf9a9e176b2 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -123,7 +123,6 @@
#define MPTCP_RETRANSMIT 4
#define MPTCP_FLUSH_JOIN_LIST 5
#define MPTCP_CONNECTED 6
-#define MPTCP_RESET_SCHEDULER 7
struct mptcp_skb_cb {
u64 map_seq;
@@ -269,7 +268,6 @@ struct mptcp_sock {
u64 rcv_data_fin_seq;
u64 bytes_retrans;
int rmem_fwd_alloc;
- struct sock *last_snd;
int snd_burst;
int old_wspace;
u64 recovery_snd_nxt; /* in recovery mode accept up to this seq;
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 03/10] mptcp: add struct mptcp_sched_ops
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 01/10] mptcp: refactor push_pending logic Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 02/10] mptcp: drop last_snd and MPTCP_RESET_SCHEDULER Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 04/10] mptcp: add a new sysctl scheduler Mat Martineau
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch defines struct mptcp_sched_ops, which has three struct members,
name, owner and list, and four function pointers: init(), release() and
get_subflow().
The scheduler function get_subflow() have a struct mptcp_sched_data
parameter, which contains a reinject flag for retrans or not, a subflows
number and a mptcp_subflow_context array.
Add the scheduler registering, unregistering and finding functions to add,
delete and find a packet scheduler on the global list mptcp_sched_list.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
include/net/mptcp.h | 21 ++++++++++++++++++++
net/mptcp/Makefile | 2 +-
net/mptcp/protocol.h | 3 +++
net/mptcp/sched.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 81 insertions(+), 1 deletion(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 3c5c68618fcc..fb996124b3d5 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -96,6 +96,27 @@ struct mptcp_out_options {
#endif
};
+#define MPTCP_SCHED_NAME_MAX 16
+#define MPTCP_SUBFLOWS_MAX 8
+
+struct mptcp_sched_data {
+ bool reinject;
+ u8 subflows;
+ struct mptcp_subflow_context *contexts[MPTCP_SUBFLOWS_MAX];
+};
+
+struct mptcp_sched_ops {
+ int (*get_subflow)(struct mptcp_sock *msk,
+ struct mptcp_sched_data *data);
+
+ char name[MPTCP_SCHED_NAME_MAX];
+ struct module *owner;
+ struct list_head list;
+
+ void (*init)(struct mptcp_sock *msk);
+ void (*release)(struct mptcp_sock *msk);
+} ____cacheline_aligned_in_smp;
+
#ifdef CONFIG_MPTCP
void mptcp_init(void);
diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile
index a3829ce548f9..84e531f86b82 100644
--- a/net/mptcp/Makefile
+++ b/net/mptcp/Makefile
@@ -2,7 +2,7 @@
obj-$(CONFIG_MPTCP) += mptcp.o
mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o pm.o diag.o \
- mib.o pm_netlink.o sockopt.o pm_userspace.o fastopen.o
+ mib.o pm_netlink.o sockopt.o pm_userspace.o fastopen.o sched.o
obj-$(CONFIG_SYN_COOKIES) += syncookies.o
obj-$(CONFIG_INET_MPTCP_DIAG) += mptcp_diag.o
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index cbf9a9e176b2..985e8f86668d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -655,6 +655,9 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
void mptcp_info2sockaddr(const struct mptcp_addr_info *info,
struct sockaddr_storage *addr,
unsigned short family);
+struct mptcp_sched_ops *mptcp_sched_find(const char *name);
+int mptcp_register_scheduler(struct mptcp_sched_ops *sched);
+void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched);
static inline bool __tcp_can_send(const struct sock *ssk)
{
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
new file mode 100644
index 000000000000..c5d3bbafba71
--- /dev/null
+++ b/net/mptcp/sched.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Multipath TCP
+ *
+ * Copyright (c) 2022, SUSE.
+ */
+
+#define pr_fmt(fmt) "MPTCP: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/rculist.h>
+#include <linux/spinlock.h>
+#include "protocol.h"
+
+static DEFINE_SPINLOCK(mptcp_sched_list_lock);
+static LIST_HEAD(mptcp_sched_list);
+
+/* Must be called with rcu read lock held */
+struct mptcp_sched_ops *mptcp_sched_find(const char *name)
+{
+ struct mptcp_sched_ops *sched, *ret = NULL;
+
+ list_for_each_entry_rcu(sched, &mptcp_sched_list, list) {
+ if (!strcmp(sched->name, name)) {
+ ret = sched;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+int mptcp_register_scheduler(struct mptcp_sched_ops *sched)
+{
+ if (!sched->get_subflow)
+ return -EINVAL;
+
+ spin_lock(&mptcp_sched_list_lock);
+ if (mptcp_sched_find(sched->name)) {
+ spin_unlock(&mptcp_sched_list_lock);
+ return -EEXIST;
+ }
+ list_add_tail_rcu(&sched->list, &mptcp_sched_list);
+ spin_unlock(&mptcp_sched_list_lock);
+
+ pr_debug("%s registered", sched->name);
+ return 0;
+}
+
+void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched)
+{
+ spin_lock(&mptcp_sched_list_lock);
+ list_del_rcu(&sched->list);
+ spin_unlock(&mptcp_sched_list_lock);
+}
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 04/10] mptcp: add a new sysctl scheduler
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (2 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 03/10] mptcp: add struct mptcp_sched_ops Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 05/10] mptcp: add sched in mptcp_sock Mat Martineau
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch adds a new sysctl, named scheduler, to support for selection
of different schedulers. Export mptcp_get_scheduler helper to get this
sysctl.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
Documentation/networking/mptcp-sysctl.rst | 8 ++++++++
net/mptcp/ctrl.c | 14 ++++++++++++++
net/mptcp/protocol.h | 1 +
3 files changed, 23 insertions(+)
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 213510698014..15f1919d640c 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -74,3 +74,11 @@ stale_loss_cnt - INTEGER
This is a per-namespace sysctl.
Default: 4
+
+scheduler - STRING
+ Select the scheduler of your choice.
+
+ Support for selection of different schedulers. This is a per-namespace
+ sysctl.
+
+ Default: "default"
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index ae20b7d92e28..c46c22a84d23 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -32,6 +32,7 @@ struct mptcp_pernet {
u8 checksum_enabled;
u8 allow_join_initial_addr_port;
u8 pm_type;
+ char scheduler[MPTCP_SCHED_NAME_MAX];
};
static struct mptcp_pernet *mptcp_get_pernet(const struct net *net)
@@ -69,6 +70,11 @@ int mptcp_get_pm_type(const struct net *net)
return mptcp_get_pernet(net)->pm_type;
}
+const char *mptcp_get_scheduler(const struct net *net)
+{
+ return mptcp_get_pernet(net)->scheduler;
+}
+
static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
{
pernet->mptcp_enabled = 1;
@@ -77,6 +83,7 @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
pernet->allow_join_initial_addr_port = 1;
pernet->stale_loss_cnt = 4;
pernet->pm_type = MPTCP_PM_TYPE_KERNEL;
+ strcpy(pernet->scheduler, "default");
}
#ifdef CONFIG_SYSCTL
@@ -128,6 +135,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = &mptcp_pm_type_max
},
+ {
+ .procname = "scheduler",
+ .maxlen = MPTCP_SCHED_NAME_MAX,
+ .mode = 0644,
+ .proc_handler = proc_dostring,
+ },
{}
};
@@ -149,6 +162,7 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
table[3].data = &pernet->allow_join_initial_addr_port;
table[4].data = &pernet->stale_loss_cnt;
table[5].data = &pernet->pm_type;
+ table[6].data = &pernet->scheduler;
hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table);
if (!hdr)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 985e8f86668d..bfa13a50f276 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -623,6 +623,7 @@ int mptcp_is_checksum_enabled(const struct net *net);
int mptcp_allow_join_id0(const struct net *net);
unsigned int mptcp_stale_loss_cnt(const struct net *net);
int mptcp_get_pm_type(const struct net *net);
+const char *mptcp_get_scheduler(const struct net *net);
void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
const struct mptcp_options_received *mp_opt);
bool __mptcp_retransmit_pending_data(struct sock *sk);
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 05/10] mptcp: add sched in mptcp_sock
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (3 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 04/10] mptcp: add a new sysctl scheduler Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 06/10] mptcp: add scheduled in mptcp_subflow_context Mat Martineau
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch adds a new struct member sched in struct mptcp_sock.
And two helpers mptcp_init_sched() and mptcp_release_sched() to
init and release it.
Init it with the sysctl scheduler in mptcp_init_sock(), copy the
scheduler from the parent in mptcp_sk_clone(), and release it in
__mptcp_destroy_sock().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 8 ++++++++
net/mptcp/protocol.h | 4 ++++
net/mptcp/sched.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f15ff80be30f..54a3eccfa731 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2697,6 +2697,7 @@ static void mptcp_ca_reset(struct sock *sk)
static int mptcp_init_sock(struct sock *sk)
{
struct net *net = sock_net(sk);
+ int ret;
__mptcp_init_sock(sk);
@@ -2706,6 +2707,11 @@ static int mptcp_init_sock(struct sock *sk)
if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net))
return -ENOMEM;
+ ret = mptcp_init_sched(mptcp_sk(sk),
+ mptcp_sched_find(mptcp_get_scheduler(net)));
+ if (ret)
+ return ret;
+
set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
/* fetch the ca name; do it outside __mptcp_init_sock(), so that clone will
@@ -2851,6 +2857,7 @@ static void __mptcp_destroy_sock(struct sock *sk)
mptcp_stop_timer(sk);
sk_stop_timer(sk, &sk->sk_timer);
msk->pm.status = 0;
+ mptcp_release_sched(msk);
sk->sk_prot->destroy(sk);
@@ -3105,6 +3112,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
msk->snd_una = msk->write_seq;
msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd;
msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
+ mptcp_init_sched(msk, mptcp_sk(sk)->sched);
/* passive msk is created after the first/MPC subflow */
msk->subflow_id = 2;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index bfa13a50f276..548c302a757e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -312,6 +312,7 @@ struct mptcp_sock {
* lock as such sock is freed after close().
*/
struct mptcp_pm_data pm;
+ struct mptcp_sched_ops *sched;
struct {
u32 space; /* bytes copied in last measurement window */
u32 copied; /* bytes copied in this measurement window */
@@ -659,6 +660,9 @@ void mptcp_info2sockaddr(const struct mptcp_addr_info *info,
struct mptcp_sched_ops *mptcp_sched_find(const char *name);
int mptcp_register_scheduler(struct mptcp_sched_ops *sched);
void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched);
+int mptcp_init_sched(struct mptcp_sock *msk,
+ struct mptcp_sched_ops *sched);
+void mptcp_release_sched(struct mptcp_sock *msk);
static inline bool __tcp_can_send(const struct sock *ssk)
{
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index c5d3bbafba71..53773668b5ee 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -54,3 +54,36 @@ void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched)
list_del_rcu(&sched->list);
spin_unlock(&mptcp_sched_list_lock);
}
+
+int mptcp_init_sched(struct mptcp_sock *msk,
+ struct mptcp_sched_ops *sched)
+{
+ if (!sched)
+ goto out;
+
+ if (!bpf_try_module_get(sched, sched->owner))
+ return -EBUSY;
+
+ msk->sched = sched;
+ if (msk->sched->init)
+ msk->sched->init(msk);
+
+ pr_debug("sched=%s", msk->sched->name);
+
+out:
+ return 0;
+}
+
+void mptcp_release_sched(struct mptcp_sock *msk)
+{
+ struct mptcp_sched_ops *sched = msk->sched;
+
+ if (!sched)
+ return;
+
+ msk->sched = NULL;
+ if (sched->release)
+ sched->release(msk);
+
+ bpf_module_put(sched, sched->owner);
+}
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 06/10] mptcp: add scheduled in mptcp_subflow_context
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (4 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 05/10] mptcp: add sched in mptcp_sock Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 07/10] mptcp: add scheduler wrappers Mat Martineau
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch adds a new member scheduled in struct mptcp_subflow_context,
which will be set in the MPTCP scheduler context when the scheduler
picks this subflow to send data.
Add a new helper mptcp_subflow_set_scheduled() to set this flag using
WRITE_ONCE().
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.h | 3 +++
net/mptcp/sched.c | 6 ++++++
2 files changed, 9 insertions(+)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 548c302a757e..e7523a40132f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -491,6 +491,7 @@ struct mptcp_subflow_context {
is_mptfo : 1, /* subflow is doing TFO */
__unused : 9;
enum mptcp_data_avail data_avail;
+ bool scheduled;
u32 remote_nonce;
u64 thmac;
u32 local_nonce;
@@ -663,6 +664,8 @@ void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched);
int mptcp_init_sched(struct mptcp_sock *msk,
struct mptcp_sched_ops *sched);
void mptcp_release_sched(struct mptcp_sock *msk);
+void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
+ bool scheduled);
static inline bool __tcp_can_send(const struct sock *ssk)
{
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 53773668b5ee..d295b92a5789 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -87,3 +87,9 @@ void mptcp_release_sched(struct mptcp_sock *msk)
bpf_module_put(sched, sched->owner);
}
+
+void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
+ bool scheduled)
+{
+ WRITE_ONCE(subflow->scheduled, scheduled);
+}
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 07/10] mptcp: add scheduler wrappers
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (5 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 06/10] mptcp: add scheduled in mptcp_subflow_context Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 08/10] mptcp: use get_send wrapper Mat Martineau
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch defines two packet scheduler wrappers mptcp_sched_get_send()
and mptcp_sched_get_retrans(), invoke get_subflow() of msk->sched in
them.
Set data->reinject to true in mptcp_sched_get_retrans(), set it false in
mptcp_sched_get_send().
If msk->sched is NULL, use default functions mptcp_subflow_get_send()
and mptcp_subflow_get_retrans() to send data.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 4 ++--
net/mptcp/protocol.h | 4 ++++
net/mptcp/sched.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 54a3eccfa731..9cd172d2c8d6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1366,7 +1366,7 @@ bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
* returns the subflow that will transmit the next DSS
* additionally updates the rtx timeout
*/
-static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
+struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
{
struct subflow_send_info send_info[SSK_MODE_MAX];
struct mptcp_subflow_context *subflow;
@@ -2204,7 +2204,7 @@ static void mptcp_timeout_timer(struct timer_list *t)
*
* A backup subflow is returned only if that is the only kind available.
*/
-static struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
+struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
{
struct sock *backup = NULL, *pick = NULL;
struct mptcp_subflow_context *subflow;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e7523a40132f..78562f695c46 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -666,6 +666,10 @@ int mptcp_init_sched(struct mptcp_sock *msk,
void mptcp_release_sched(struct mptcp_sock *msk);
void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
bool scheduled);
+struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk);
+struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk);
+int mptcp_sched_get_send(struct mptcp_sock *msk);
+int mptcp_sched_get_retrans(struct mptcp_sock *msk);
static inline bool __tcp_can_send(const struct sock *ssk)
{
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index d295b92a5789..884606686cfe 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -93,3 +93,51 @@ void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
{
WRITE_ONCE(subflow->scheduled, scheduled);
}
+
+int mptcp_sched_get_send(struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sched_data data;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled))
+ return 0;
+ }
+
+ if (!msk->sched) {
+ struct sock *ssk;
+
+ ssk = mptcp_subflow_get_send(msk);
+ if (!ssk)
+ return -EINVAL;
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+ return 0;
+ }
+
+ data.reinject = false;
+ return msk->sched->get_subflow(msk, &data);
+}
+
+int mptcp_sched_get_retrans(struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sched_data data;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled))
+ return 0;
+ }
+
+ if (!msk->sched) {
+ struct sock *ssk;
+
+ ssk = mptcp_subflow_get_retrans(msk);
+ if (!ssk)
+ return -EINVAL;
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+ return 0;
+ }
+
+ data.reinject = true;
+ return msk->sched->get_subflow(msk, &data);
+}
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 08/10] mptcp: use get_send wrapper
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (6 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 07/10] mptcp: add scheduler wrappers Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 09/10] mptcp: use get_retrans wrapper Mat Martineau
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch adds the multiple subflows support for __mptcp_push_pending
and __mptcp_subflow_push_pending. Use get_send() wrapper instead of
mptcp_subflow_get_send() in them.
Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, use them to send data.
Move msk_owned_by_me() and fallback checks into get_send() wrapper from
mptcp_subflow_get_send().
This commit allows the scheduler to set the subflow->scheduled bit in
multiple subflows, but it does not allow for sending redundant data.
Multiple scheduled subflows will send sequential data on each subflow.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 113 +++++++++++++++++++++++++++++++--------------------
net/mptcp/sched.c | 13 ++++++
2 files changed, 81 insertions(+), 45 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9cd172d2c8d6..77e94ee82859 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1377,15 +1377,6 @@ struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
u64 linger_time;
long tout = 0;
- msk_owned_by_me(msk);
-
- if (__mptcp_check_fallback(msk)) {
- if (!msk->first)
- return NULL;
- return __tcp_can_send(msk->first) &&
- sk_stream_memory_free(msk->first) ? msk->first : NULL;
- }
-
/* pick the subflow with the lower wmem/wspace ratio */
for (i = 0; i < SSK_MODE_MAX; ++i) {
send_info[i].ssk = NULL;
@@ -1538,43 +1529,56 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
.flags = flags,
};
bool do_check_data_fin = false;
+ int push_count = 1;
- while (mptcp_send_head(sk)) {
+ while (mptcp_send_head(sk) && (push_count > 0)) {
+ struct mptcp_subflow_context *subflow;
int ret = 0;
- prev_ssk = ssk;
- ssk = mptcp_subflow_get_send(msk);
+ if (mptcp_sched_get_send(msk))
+ break;
- /* First check. If the ssk has changed since
- * the last round, release prev_ssk
- */
- if (ssk != prev_ssk && prev_ssk)
- mptcp_push_release(prev_ssk, &info);
- if (!ssk)
- goto out;
+ push_count = 0;
- /* Need to lock the new subflow only if different
- * from the previous one, otherwise we are still
- * helding the relevant lock
- */
- if (ssk != prev_ssk)
- lock_sock(ssk);
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ mptcp_subflow_set_scheduled(subflow, false);
- ret = __subflow_push_pending(sk, ssk, &info);
- if (ret <= 0) {
- if (ret == -EAGAIN)
- continue;
- mptcp_push_release(ssk, &info);
- goto out;
+ prev_ssk = ssk;
+ ssk = mptcp_subflow_tcp_sock(subflow);
+ if (ssk != prev_ssk) {
+ /* First check. If the ssk has changed since
+ * the last round, release prev_ssk
+ */
+ if (prev_ssk)
+ mptcp_push_release(prev_ssk, &info);
+
+ /* Need to lock the new subflow only if different
+ * from the previous one, otherwise we are still
+ * helding the relevant lock
+ */
+ lock_sock(ssk);
+ }
+
+ push_count++;
+
+ ret = __subflow_push_pending(sk, ssk, &info);
+ if (ret <= 0) {
+ if (ret != -EAGAIN ||
+ (1 << ssk->sk_state) &
+ (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSE))
+ push_count--;
+ continue;
+ }
+ do_check_data_fin = true;
+ }
}
- do_check_data_fin = true;
}
/* at this point we held the socket lock for the last subflow we used */
if (ssk)
mptcp_push_release(ssk, &info);
-out:
/* ensure the rtx timer is running */
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
@@ -1588,30 +1592,49 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
struct mptcp_sendmsg_info info = {
.data_lock_held = true,
};
+ bool keep_pushing = true;
struct sock *xmit_ssk;
int copied = 0;
info.flags = 0;
- while (mptcp_send_head(sk)) {
+ while (mptcp_send_head(sk) && keep_pushing) {
+ struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
int ret = 0;
/* check for a different subflow usage only after
* spooling the first chunk of data
*/
- xmit_ssk = first ? ssk : mptcp_subflow_get_send(msk);
- if (!xmit_ssk)
- goto out;
- if (xmit_ssk != ssk) {
- mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
- MPTCP_DELEGATE_SEND);
+ if (first) {
+ mptcp_subflow_set_scheduled(subflow, false);
+ ret = __subflow_push_pending(sk, ssk, &info);
+ first = false;
+ if (ret <= 0)
+ break;
+ copied += ret;
+ continue;
+ }
+
+ if (mptcp_sched_get_send(msk))
goto out;
+
+ if (READ_ONCE(subflow->scheduled)) {
+ mptcp_subflow_set_scheduled(subflow, false);
+ ret = __subflow_push_pending(sk, ssk, &info);
+ if (ret <= 0)
+ keep_pushing = false;
+ copied += ret;
}
- ret = __subflow_push_pending(sk, ssk, &info);
- first = false;
- if (ret <= 0)
- break;
- copied += ret;
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ xmit_ssk = mptcp_subflow_tcp_sock(subflow);
+ if (xmit_ssk != ssk) {
+ mptcp_subflow_delegate(subflow,
+ MPTCP_DELEGATE_SEND);
+ keep_pushing = false;
+ }
+ }
+ }
}
out:
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 884606686cfe..078b5d44978d 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -99,6 +99,19 @@ int mptcp_sched_get_send(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
+ msk_owned_by_me(msk);
+
+ /* the following check is moved out of mptcp_subflow_get_send */
+ if (__mptcp_check_fallback(msk)) {
+ if (msk->first &&
+ __tcp_can_send(msk->first) &&
+ sk_stream_memory_free(msk->first)) {
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(msk->first), true);
+ return 0;
+ }
+ return -EINVAL;
+ }
+
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 09/10] mptcp: use get_retrans wrapper
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (7 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 08/10] mptcp: use get_send wrapper Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-21 22:25 ` [PATCH net-next 10/10] mptcp: register default scheduler Mat Martineau
2023-08-23 1:40 ` [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension patchwork-bot+netdevbpf
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch adds the multiple subflows support for __mptcp_retrans(). Use
get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it.
Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, use them to send data.
Move msk_owned_by_me() and fallback checks into get_retrans() wrapper
from mptcp_subflow_get_retrans().
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 65 ++++++++++++++++++++++++++++++----------------------
net/mptcp/sched.c | 6 +++++
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 77e94ee82859..61590ff2b9ee 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2233,11 +2233,6 @@ struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
int min_stale_count = INT_MAX;
- msk_owned_by_me(msk);
-
- if (__mptcp_check_fallback(msk))
- return NULL;
-
mptcp_for_each_subflow(msk, subflow) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -2515,16 +2510,17 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
static void __mptcp_retrans(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ struct mptcp_subflow_context *subflow;
struct mptcp_sendmsg_info info = {};
struct mptcp_data_frag *dfrag;
- size_t copied = 0;
struct sock *ssk;
- int ret;
+ int ret, err;
+ u16 len = 0;
mptcp_clean_una_wakeup(sk);
/* first check ssk: need to kick "stale" logic */
- ssk = mptcp_subflow_get_retrans(msk);
+ err = mptcp_sched_get_retrans(msk);
dfrag = mptcp_rtx_head(sk);
if (!dfrag) {
if (mptcp_data_fin_enabled(msk)) {
@@ -2543,32 +2539,45 @@ static void __mptcp_retrans(struct sock *sk)
goto reset_timer;
}
- if (!ssk)
+ if (err)
goto reset_timer;
- lock_sock(ssk);
+ mptcp_for_each_subflow(msk, subflow) {
+ if (READ_ONCE(subflow->scheduled)) {
+ u16 copied = 0;
- /* limit retransmission to the bytes already sent on some subflows */
- info.sent = 0;
- info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len : dfrag->already_sent;
- while (info.sent < info.limit) {
- ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
- if (ret <= 0)
- break;
+ mptcp_subflow_set_scheduled(subflow, false);
- MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
- copied += ret;
- info.sent += ret;
- }
- if (copied) {
- dfrag->already_sent = max(dfrag->already_sent, info.sent);
- msk->bytes_retrans += copied;
- tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
- info.size_goal);
- WRITE_ONCE(msk->allow_infinite_fallback, false);
+ ssk = mptcp_subflow_tcp_sock(subflow);
+
+ lock_sock(ssk);
+
+ /* limit retransmission to the bytes already sent on some subflows */
+ info.sent = 0;
+ info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
+ dfrag->already_sent;
+ while (info.sent < info.limit) {
+ ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
+ if (ret <= 0)
+ break;
+
+ MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS);
+ copied += ret;
+ info.sent += ret;
+ }
+ if (copied) {
+ len = max(copied, len);
+ tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
+ info.size_goal);
+ WRITE_ONCE(msk->allow_infinite_fallback, false);
+ }
+
+ release_sock(ssk);
+ }
}
- release_sock(ssk);
+ msk->bytes_retrans += len;
+ dfrag->already_sent = max(dfrag->already_sent, len);
reset_timer:
mptcp_check_and_set_pending(sk);
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 078b5d44978d..cac1cc1fa3b0 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -136,6 +136,12 @@ int mptcp_sched_get_retrans(struct mptcp_sock *msk)
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
+ msk_owned_by_me(msk);
+
+ /* the following check is moved out of mptcp_subflow_get_retrans */
+ if (__mptcp_check_fallback(msk))
+ return -EINVAL;
+
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next 10/10] mptcp: register default scheduler
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (8 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 09/10] mptcp: use get_retrans wrapper Mat Martineau
@ 2023-08-21 22:25 ` Mat Martineau
2023-08-23 1:40 ` [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension patchwork-bot+netdevbpf
10 siblings, 0 replies; 12+ messages in thread
From: Mat Martineau @ 2023-08-21 22:25 UTC (permalink / raw)
To: Matthieu Baerts, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, mptcp, Geliang Tang, Mat Martineau
From: Geliang Tang <geliang.tang@suse.com>
This patch defines the default packet scheduler mptcp_sched_default.
Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init().
Skip deleting this default scheduler in mptcp_unregister_scheduler().
Set msk->sched to the default scheduler when the input parameter of
mptcp_init_sched() is NULL.
Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans()
if the defaut scheduler is set or msk->sched is NULL.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
---
net/mptcp/protocol.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/sched.c | 55 +++++++++++++++++++++++++++++++---------------------
3 files changed, 35 insertions(+), 22 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 61590ff2b9ee..933b257eee02 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3965,6 +3965,7 @@ void __init mptcp_proto_init(void)
mptcp_subflow_init();
mptcp_pm_init();
+ mptcp_sched_init();
mptcp_token_init();
if (proto_register(&mptcp_prot, 1) != 0)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 78562f695c46..7254b3562575 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -661,6 +661,7 @@ void mptcp_info2sockaddr(const struct mptcp_addr_info *info,
struct mptcp_sched_ops *mptcp_sched_find(const char *name);
int mptcp_register_scheduler(struct mptcp_sched_ops *sched);
void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched);
+void mptcp_sched_init(void);
int mptcp_init_sched(struct mptcp_sock *msk,
struct mptcp_sched_ops *sched);
void mptcp_release_sched(struct mptcp_sock *msk);
diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index cac1cc1fa3b0..4ab0693c069c 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -16,6 +16,26 @@
static DEFINE_SPINLOCK(mptcp_sched_list_lock);
static LIST_HEAD(mptcp_sched_list);
+static int mptcp_sched_default_get_subflow(struct mptcp_sock *msk,
+ struct mptcp_sched_data *data)
+{
+ struct sock *ssk;
+
+ ssk = data->reinject ? mptcp_subflow_get_retrans(msk) :
+ mptcp_subflow_get_send(msk);
+ if (!ssk)
+ return -EINVAL;
+
+ mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
+ return 0;
+}
+
+static struct mptcp_sched_ops mptcp_sched_default = {
+ .get_subflow = mptcp_sched_default_get_subflow,
+ .name = "default",
+ .owner = THIS_MODULE,
+};
+
/* Must be called with rcu read lock held */
struct mptcp_sched_ops *mptcp_sched_find(const char *name)
{
@@ -50,16 +70,24 @@ int mptcp_register_scheduler(struct mptcp_sched_ops *sched)
void mptcp_unregister_scheduler(struct mptcp_sched_ops *sched)
{
+ if (sched == &mptcp_sched_default)
+ return;
+
spin_lock(&mptcp_sched_list_lock);
list_del_rcu(&sched->list);
spin_unlock(&mptcp_sched_list_lock);
}
+void mptcp_sched_init(void)
+{
+ mptcp_register_scheduler(&mptcp_sched_default);
+}
+
int mptcp_init_sched(struct mptcp_sock *msk,
struct mptcp_sched_ops *sched)
{
if (!sched)
- goto out;
+ sched = &mptcp_sched_default;
if (!bpf_try_module_get(sched, sched->owner))
return -EBUSY;
@@ -70,7 +98,6 @@ int mptcp_init_sched(struct mptcp_sock *msk,
pr_debug("sched=%s", msk->sched->name);
-out:
return 0;
}
@@ -117,17 +144,9 @@ int mptcp_sched_get_send(struct mptcp_sock *msk)
return 0;
}
- if (!msk->sched) {
- struct sock *ssk;
-
- ssk = mptcp_subflow_get_send(msk);
- if (!ssk)
- return -EINVAL;
- mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
- return 0;
- }
-
data.reinject = false;
+ if (msk->sched == &mptcp_sched_default || !msk->sched)
+ return mptcp_sched_default_get_subflow(msk, &data);
return msk->sched->get_subflow(msk, &data);
}
@@ -147,16 +166,8 @@ int mptcp_sched_get_retrans(struct mptcp_sock *msk)
return 0;
}
- if (!msk->sched) {
- struct sock *ssk;
-
- ssk = mptcp_subflow_get_retrans(msk);
- if (!ssk)
- return -EINVAL;
- mptcp_subflow_set_scheduled(mptcp_subflow_ctx(ssk), true);
- return 0;
- }
-
data.reinject = true;
+ if (msk->sched == &mptcp_sched_default || !msk->sched)
+ return mptcp_sched_default_get_subflow(msk, &data);
return msk->sched->get_subflow(msk, &data);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension
2023-08-21 22:25 [PATCH net-next 00/10] mptcp: Prepare MPTCP packet scheduler for BPF extension Mat Martineau
` (9 preceding siblings ...)
2023-08-21 22:25 ` [PATCH net-next 10/10] mptcp: register default scheduler Mat Martineau
@ 2023-08-23 1:40 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-23 1:40 UTC (permalink / raw)
To: Mat Martineau
Cc: matthieu.baerts, davem, edumazet, kuba, pabeni, netdev, mptcp,
geliang.tang
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 21 Aug 2023 15:25:11 -0700 you wrote:
> The kernel's MPTCP packet scheduler has, to date, been a one-size-fits
> all algorithm that is hard-coded. It attempts to balance latency and
> throughput when transmitting data across multiple TCP subflows, and has
> some limited tunability through sysctls. It has been a long-term goal of
> the Linux MPTCP community to support customizable packet schedulers for
> use cases that need to make different trade-offs regarding latency,
> throughput, redundancy, and other metrics. BPF is well-suited for
> configuring customized, per-packet scheduling decisions without having
> to modify the kernel or manage out-of-tree kernel modules.
>
> [...]
Here is the summary with links:
- [net-next,01/10] mptcp: refactor push_pending logic
https://git.kernel.org/netdev/net-next/c/c5b4297dee91
- [net-next,02/10] mptcp: drop last_snd and MPTCP_RESET_SCHEDULER
https://git.kernel.org/netdev/net-next/c/ebc1e08f01eb
- [net-next,03/10] mptcp: add struct mptcp_sched_ops
https://git.kernel.org/netdev/net-next/c/740ebe35bd3f
- [net-next,04/10] mptcp: add a new sysctl scheduler
https://git.kernel.org/netdev/net-next/c/e3b2870b6d22
- [net-next,05/10] mptcp: add sched in mptcp_sock
https://git.kernel.org/netdev/net-next/c/1730b2b2c5a5
- [net-next,06/10] mptcp: add scheduled in mptcp_subflow_context
https://git.kernel.org/netdev/net-next/c/fce68b03086f
- [net-next,07/10] mptcp: add scheduler wrappers
https://git.kernel.org/netdev/net-next/c/07336a87fe87
- [net-next,08/10] mptcp: use get_send wrapper
https://git.kernel.org/netdev/net-next/c/0fa1b3783a17
- [net-next,09/10] mptcp: use get_retrans wrapper
https://git.kernel.org/netdev/net-next/c/ee2708aedad0
- [net-next,10/10] mptcp: register default scheduler
https://git.kernel.org/netdev/net-next/c/ed1ad86b8527
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread