All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support
@ 2024-02-16 11:28 Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 1/7] mptcp: push at DSS boundaries Paolo Abeni
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

Patch 6/7 does the magic, all the others are minor cleanup and fix of
buglet exposed by such feature. I'll push a paired pktdrill test.

Note that this relies on the existing accounting for snd_nxt. As I
stated such accounting is not 110% accurate as it tracks the most recent
sequence number queued to any subflow, and not the actual sequence
number sent on the wire.

I experimented a lot trying to implement the latter and in the end it
proved to be both "too complex" and "not necessary".

The complexity raises from the need for additional lock and a lot of
refactory to introduce such protection without adding significant
overhead. Additionally, snd_nxt is currenly used and exposed with the
current semantic both eBPF and the internal packet scheduling.
Introducing a different tracking will still require us to keep the old
one.

More interesting, a more accurate tracking could be not strictly
necessary: as the MPTCP protocol enqueues data to the subflows only up
the available send window, any enqueue data is sent on the wire
instantly, without any blocking operation short of a drop in the tx path
at the nft or TC layer.

The individual patches changelog carry the gory details.

Still sending a single series to outline the functional requirements,
even if the first 3 patches could land on mptcp-net directly.

v2 -> v3:
  - typo in patch 1/7
  - dropped unused code in patch 3/7
  - dropped duplicate code in patch 6/7

v1 -> v2:
  - clarifiy commit message in patch 1/7
  - fix possible wake-up bug in patch 6/7

*** BLURB HERE ***

Paolo Abeni (7):
  mptcp: push at DSS boundaries
  mptcp: fix snd_wnd initialization for passive socket
  mptcp: fix potential wake-up event loss
  mptcp: cleanup writer wake-up
  mptcp: avoid some duplicate code in socket option handling
  mptcp: implement TCP_NOTSENT_LOWAT support.
  mptcp: cleanup SOL_TCP handling

 net/mptcp/protocol.c | 57 ++++++++++++++++++++++++-----------
 net/mptcp/protocol.h | 51 ++++++++++++++++++++++---------
 net/mptcp/sockopt.c  | 71 ++++++++++++++++++++------------------------
 3 files changed, 108 insertions(+), 71 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 1/7] mptcp: push at DSS boundaries
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

when inserting not contiguous data in the subflow write queue,
the protocol creates a new skb and prevent the TCP stack from
merging it later with already queued skbs by setting the EOR marker.

Still no push flag is explicitly set at the end of previous GSO
packet, making the aggregation on the receiver side sub-optimal -
and packetdrill self-tests less predictable.

Explicitly mark the end of not contiguous DSS with the push flag.

Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - fixed subj typo (Mat)
---
 net/mptcp/protocol.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3017b01ac488..21b3729c65ac 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1265,6 +1265,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		mpext = mptcp_get_ext(skb);
 		if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
 			TCP_SKB_CB(skb)->eor = 1;
+			tcp_mark_push(tcp_sk(ssk), skb);
 			goto alloc_skb;
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 2/7] mptcp: fix snd_wnd initialization for passive socket
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 1/7] mptcp: push at DSS boundaries Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

Such value should be inherited from the first subflow, but
passive sockets always used 'rsk_rcv_wnd'.

Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21b3729c65ac..7c3cb97214b8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3216,7 +3216,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	WRITE_ONCE(msk->write_seq, subflow_req->idsn + 1);
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 	WRITE_ONCE(msk->snd_una, msk->write_seq);
-	WRITE_ONCE(msk->wnd_end, msk->snd_nxt + req->rsk_rcv_wnd);
+	WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd);
 	msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
 	mptcp_init_sched(msk, mptcp_sk(sk)->sched);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 3/7] mptcp: fix potential wake-up event loss
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 1/7] mptcp: push at DSS boundaries Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 4/7] mptcp: cleanup writer wake-up Paolo Abeni
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

After the blamed commit below, the send buffer auto-tuning can
happen after that the mptcp_propagate_sndbuf() completes - via
the delegated action infrastructure.

We must check for write space even after such change or we risk
missing the wake-up event.

Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - drop unused __mptcp_sync_sndnxt() - Mat
---
 net/mptcp/protocol.h | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 486fff865803..738705207652 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -797,6 +797,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
 }
 
+static inline void mptcp_write_space(struct sock *sk)
+{
+	if (sk_stream_is_writeable(sk)) {
+		/* pairs with memory barrier in mptcp_poll */
+		smp_mb();
+		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
+			sk_stream_write_space(sk);
+	}
+}
+
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -815,6 +825,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
 
 	/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
 	WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
+	mptcp_write_space(sk);
 }
 
 /* The called held both the msk socket and the subflow socket locks,
@@ -845,16 +856,6 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
 	local_bh_enable();
 }
 
-static inline void mptcp_write_space(struct sock *sk)
-{
-	if (sk_stream_is_writeable(sk)) {
-		/* pairs with memory barrier in mptcp_poll */
-		smp_mb();
-		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
-			sk_stream_write_space(sk);
-	}
-}
-
 void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
 
 #define MPTCP_TOKEN_MAX_RETRIES	4
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 4/7] mptcp: cleanup writer wake-up
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (2 preceding siblings ...)
  2024-02-16 11:28 ` [PATCH v3 net-next 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

After commit 5cf92bbadc58 ("mptcp: re-enable sndbuf autotune"), the
MPTCP_NOSPACE bit is redundant: it is always set and cleared together with
SOCK_NOSPACE.

Let's drop the first and always relay on the latter, dropping a bunch
of useless code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 15 +++------------
 net/mptcp/protocol.h | 16 ++++++----------
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7c3cb97214b8..def01e030121 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1692,15 +1692,6 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
 	}
 }
 
-static void mptcp_set_nospace(struct sock *sk)
-{
-	/* enable autotune */
-	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-
-	/* will be cleared on avail space */
-	set_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags);
-}
-
 static int mptcp_disconnect(struct sock *sk, int flags);
 
 static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
@@ -1874,7 +1865,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		continue;
 
 wait_for_memory:
-		mptcp_set_nospace(sk);
+		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		__mptcp_push_pending(sk, msg->msg_flags);
 		ret = sk_stream_wait_memory(sk, &timeo);
 		if (ret)
@@ -3896,8 +3887,8 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
 	if (sk_stream_is_writeable(sk))
 		return EPOLLOUT | EPOLLWRNORM;
 
-	mptcp_set_nospace(sk);
-	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
+	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+	smp_mb__after_atomic(); /* NOSPACE is changed by mptcp_write_space() */
 	if (sk_stream_is_writeable(sk))
 		return EPOLLOUT | EPOLLWRNORM;
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 738705207652..3414e7705835 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -113,10 +113,9 @@
 #define MPTCP_RST_TRANSIENT	BIT(0)
 
 /* MPTCP socket atomic flags */
-#define MPTCP_NOSPACE		1
-#define MPTCP_WORK_RTX		2
-#define MPTCP_FALLBACK_DONE	4
-#define MPTCP_WORK_CLOSE_SUBFLOW 5
+#define MPTCP_WORK_RTX		1
+#define MPTCP_FALLBACK_DONE	2
+#define MPTCP_WORK_CLOSE_SUBFLOW 3
 
 /* MPTCP socket release cb flags */
 #define MPTCP_PUSH_PENDING	1
@@ -799,12 +798,9 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 
 static inline void mptcp_write_space(struct sock *sk)
 {
-	if (sk_stream_is_writeable(sk)) {
-		/* pairs with memory barrier in mptcp_poll */
-		smp_mb();
-		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
-			sk_stream_write_space(sk);
-	}
+	/* pairs with memory barrier in mptcp_poll */
+	smp_mb();
+	sk_stream_write_space(sk);
 }
 
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 5/7] mptcp: avoid some duplicate code in socket option handling
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (3 preceding siblings ...)
  2024-02-16 11:28 ` [PATCH v3 net-next 4/7] mptcp: cleanup writer wake-up Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

The mptcp_get_int_option() helper is needless open-coded in a
couple of places, replace the duplicate code with the helper
call.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/sockopt.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index da37e4541a5d..ac37f6c5e2ed 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -629,13 +629,11 @@ static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optva
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val;
-
-	if (optlen < sizeof(int))
-		return -EINVAL;
+	int val, ret;
 
-	if (copy_from_sockptr(&val, optval, sizeof(val)))
-		return -EFAULT;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
 
 	lock_sock(sk);
 	sockopt_seq_inc(msk);
@@ -659,13 +657,11 @@ static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t op
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val;
-
-	if (optlen < sizeof(int))
-		return -EINVAL;
+	int val, ret;
 
-	if (copy_from_sockptr(&val, optval, sizeof(val)))
-		return -EFAULT;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
 
 	lock_sock(sk);
 	sockopt_seq_inc(msk);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 6/7] mptcp: implement TCP_NOTSENT_LOWAT support.
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (4 preceding siblings ...)
  2024-02-16 11:28 ` [PATCH v3 net-next 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

Add support for such socket option storing the user-space provided
value in a new msk field, and using such data to implement the
_mptcp_stream_memory_free() helper, similar to the TCP one.

To avoid adding more indirect calls in the fast path, open-code
a variant of sk_stream_memory_free() in mptcp_sendmsg() and add
direct calls to the mptcp stream memory free helper where possible.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
 - drop duplicate mptcp_stream_memory_free() impl - Mat
---
 net/mptcp/protocol.c | 39 ++++++++++++++++++++++++++++++++++-----
 net/mptcp/protocol.h | 28 +++++++++++++++++++++++++++-
 net/mptcp/sockopt.c  | 12 ++++++++++++
 3 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index def01e030121..50dcba41b6ef 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1762,6 +1762,30 @@ static int do_copy_data_nocache(struct sock *sk, int copy,
 	return 0;
 }
 
+/* open-code sk_stream_memory_free() plus sent limit computation to
+ * avoid indirect calls in fast-path.
+ * Called under the msk socket lock, so we can avoid a bunch of ONCE
+ * annotations.
+ */
+static u32 mptcp_send_limit(const struct sock *sk)
+{
+	const struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 limit, not_sent;
+
+	if (sk->sk_wmem_queued >= READ_ONCE(sk->sk_sndbuf))
+		return 0;
+
+	limit = mptcp_notsent_lowat(sk);
+	if (limit == UINT_MAX)
+		return UINT_MAX;
+
+	not_sent = msk->write_seq - msk->snd_nxt;
+	if (not_sent >= limit)
+		return 0;
+
+	return limit - not_sent;
+}
+
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1806,6 +1830,12 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		struct mptcp_data_frag *dfrag;
 		bool dfrag_collapsed;
 		size_t psize, offset;
+		u32 copy_limit;
+
+		/* ensure fitting the notsent_lowat() constraint */
+		copy_limit = mptcp_send_limit(sk);
+		if (!copy_limit)
+			goto wait_for_memory;
 
 		/* reuse tail pfrag, if possible, or carve a new one from the
 		 * page allocator
@@ -1813,9 +1843,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		dfrag = mptcp_pending_tail(sk);
 		dfrag_collapsed = mptcp_frag_can_collapse_to(msk, pfrag, dfrag);
 		if (!dfrag_collapsed) {
-			if (!sk_stream_memory_free(sk))
-				goto wait_for_memory;
-
 			if (!mptcp_page_frag_refill(sk, pfrag))
 				goto wait_for_memory;
 
@@ -1830,6 +1857,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		offset = dfrag->offset + dfrag->data_len;
 		psize = pfrag->size - offset;
 		psize = min_t(size_t, psize, msg_data_left(msg));
+		psize = min_t(size_t, psize, copy_limit);
 		total_ts = psize + frag_truesize;
 
 		if (!sk_wmem_schedule(sk, total_ts))
@@ -3711,6 +3739,7 @@ static struct proto mptcp_prot = {
 	.unhash		= mptcp_unhash,
 	.get_port	= mptcp_get_port,
 	.forward_alloc_get	= mptcp_forward_alloc_get,
+	.stream_memory_free	= mptcp_stream_memory_free,
 	.sockets_allocated	= &mptcp_sockets_allocated,
 
 	.memory_allocated	= &tcp_memory_allocated,
@@ -3884,12 +3913,12 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
 {
 	struct sock *sk = (struct sock *)msk;
 
-	if (sk_stream_is_writeable(sk))
+	if (__mptcp_stream_is_writeable(sk, 1))
 		return EPOLLOUT | EPOLLWRNORM;
 
 	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 	smp_mb__after_atomic(); /* NOSPACE is changed by mptcp_write_space() */
-	if (sk_stream_is_writeable(sk))
+	if (__mptcp_stream_is_writeable(sk, 1))
 		return EPOLLOUT | EPOLLWRNORM;
 
 	return 0;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 3414e7705835..9e4bb9d36dbc 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -307,6 +307,7 @@ struct mptcp_sock {
 			in_accept_queue:1,
 			free_first:1,
 			rcvspace_init:1;
+	u32		notsent_lowat;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
@@ -796,11 +797,36 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
 }
 
+static inline u32 mptcp_notsent_lowat(const struct sock *sk)
+{
+	struct net *net = sock_net(sk);
+	u32 val;
+
+	val = READ_ONCE(mptcp_sk(sk)->notsent_lowat);
+	return val ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat);
+}
+
+static inline bool mptcp_stream_memory_free(const struct sock *sk, int wake)
+{
+	const struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 notsent_bytes;
+
+	notsent_bytes = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_nxt);
+	return (notsent_bytes << wake) < mptcp_notsent_lowat(sk);
+}
+
+static inline bool __mptcp_stream_is_writeable(const struct sock *sk, int wake)
+{
+	return mptcp_stream_memory_free(sk, wake) &&
+	       __sk_stream_is_writeable(sk, wake);
+}
+
 static inline void mptcp_write_space(struct sock *sk)
 {
 	/* pairs with memory barrier in mptcp_poll */
 	smp_mb();
-	sk_stream_write_space(sk);
+	if (mptcp_stream_memory_free(sk, 1))
+		sk_stream_write_space(sk);
 }
 
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index ac37f6c5e2ed..1b38dac70719 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -812,6 +812,16 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return 0;
 	case TCP_ULP:
 		return -EOPNOTSUPP;
+	case TCP_NOTSENT_LOWAT:
+		ret = mptcp_get_int_option(msk, optval, optlen, &val);
+		if (ret)
+			return ret;
+
+		lock_sock(sk);
+		WRITE_ONCE(msk->notsent_lowat, val);
+		mptcp_write_space(sk);
+		release_sock(sk);
+		return 0;
 	case TCP_CONGESTION:
 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
 	case TCP_CORK:
@@ -1345,6 +1355,8 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return mptcp_put_int_option(msk, optval, optlen, msk->cork);
 	case TCP_NODELAY:
 		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
+	case TCP_NOTSENT_LOWAT:
+		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
 	}
 	return -EOPNOTSUPP;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (5 preceding siblings ...)
  2024-02-16 11:28 ` [PATCH v3 net-next 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
@ 2024-02-16 11:28 ` Paolo Abeni
  2024-02-16 12:18   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
                     ` (2 more replies)
  2024-02-16 19:48 ` [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Mat Martineau
  2024-02-19  9:44 ` Matthieu Baerts
  8 siblings, 3 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-16 11:28 UTC (permalink / raw)
  To: mptcp

Most TCP-level socket options get an integer from user space, and
set the corresponding field under the msk-level socket lock.

Reduce the code duplication moving such operations in the common code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/sockopt.c | 75 ++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 45 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 1b38dac70719..dcd1c76d2a3b 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -624,18 +624,11 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	return ret;
 }
 
-static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optval,
-					 unsigned int optlen)
+static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val, ret;
 
-	ret = mptcp_get_int_option(msk, optval, optlen, &val);
-	if (ret)
-		return ret;
-
-	lock_sock(sk);
 	sockopt_seq_inc(msk);
 	msk->cork = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -647,23 +640,15 @@ static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optva
 	}
 	if (!val)
 		mptcp_check_and_set_pending(sk);
-	release_sock(sk);
 
 	return 0;
 }
 
-static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t optval,
-					    unsigned int optlen)
+static int __mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, int val)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val, ret;
 
-	ret = mptcp_get_int_option(msk, optval, optlen, &val);
-	if (ret)
-		return ret;
-
-	lock_sock(sk);
 	sockopt_seq_inc(msk);
 	msk->nodelay = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -675,8 +660,6 @@ static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t op
 	}
 	if (val)
 		mptcp_check_and_set_pending(sk);
-	release_sock(sk);
-
 	return 0;
 }
 
@@ -799,35 +782,10 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 	int ret, val;
 
 	switch (optname) {
-	case TCP_INQ:
-		ret = mptcp_get_int_option(msk, optval, optlen, &val);
-		if (ret)
-			return ret;
-		if (val < 0 || val > 1)
-			return -EINVAL;
-
-		lock_sock(sk);
-		msk->recvmsg_inq = !!val;
-		release_sock(sk);
-		return 0;
 	case TCP_ULP:
 		return -EOPNOTSUPP;
-	case TCP_NOTSENT_LOWAT:
-		ret = mptcp_get_int_option(msk, optval, optlen, &val);
-		if (ret)
-			return ret;
-
-		lock_sock(sk);
-		WRITE_ONCE(msk->notsent_lowat, val);
-		mptcp_write_space(sk);
-		release_sock(sk);
-		return 0;
 	case TCP_CONGESTION:
 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
-	case TCP_CORK:
-		return mptcp_setsockopt_sol_tcp_cork(msk, optval, optlen);
-	case TCP_NODELAY:
-		return mptcp_setsockopt_sol_tcp_nodelay(msk, optval, optlen);
 	case TCP_DEFER_ACCEPT:
 		/* See tcp.c: TCP_DEFER_ACCEPT does not fail */
 		mptcp_setsockopt_first_sf_only(msk, SOL_TCP, optname, optval, optlen);
@@ -840,7 +798,34 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 						      optval, optlen);
 	}
 
-	return -EOPNOTSUPP;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	switch (optname) {
+	case TCP_INQ:
+		if (val < 0 || val > 1)
+			ret = -EINVAL;
+		else
+			msk->recvmsg_inq = !!val;
+		break;
+	case TCP_NOTSENT_LOWAT:
+		WRITE_ONCE(msk->notsent_lowat, val);
+		mptcp_write_space(sk);
+		break;
+	case TCP_CORK:
+		ret = __mptcp_setsockopt_sol_tcp_cork(msk, val);
+		break;
+	case TCP_NODELAY:
+		ret = __mptcp_setsockopt_sol_tcp_nodelay(msk, val);
+		break;
+	default:
+		ret = -ENOPROTOOPT;
+	}
+
+	release_sock(sk);
+	return ret;
 }
 
 int mptcp_setsockopt(struct sock *sk, int level, int optname,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
@ 2024-02-16 12:18   ` MPTCP CI
  2024-02-16 12:41   ` MPTCP CI
  2024-02-16 20:34   ` MPTCP CI
  2 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2024-02-16 12:18 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7930042401

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/4601bea9a9c5


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  2024-02-16 12:18   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
@ 2024-02-16 12:41   ` MPTCP CI
  2024-02-16 20:34   ` MPTCP CI
  2 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2024-02-16 12:41 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5138880970620928
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5138880970620928/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6264780877463552
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6264780877463552/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/4601bea9a9c5


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (6 preceding siblings ...)
  2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
@ 2024-02-16 19:48 ` Mat Martineau
  2024-02-19  9:44 ` Matthieu Baerts
  8 siblings, 0 replies; 13+ messages in thread
From: Mat Martineau @ 2024-02-16 19:48 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 16 Feb 2024, Paolo Abeni wrote:

> Patch 6/7 does the magic, all the others are minor cleanup and fix of
> buglet exposed by such feature. I'll push a paired pktdrill test.
>
> Note that this relies on the existing accounting for snd_nxt. As I
> stated such accounting is not 110% accurate as it tracks the most recent
> sequence number queued to any subflow, and not the actual sequence
> number sent on the wire.
>
> I experimented a lot trying to implement the latter and in the end it
> proved to be both "too complex" and "not necessary".
>
> The complexity raises from the need for additional lock and a lot of
> refactory to introduce such protection without adding significant
> overhead. Additionally, snd_nxt is currenly used and exposed with the
> current semantic both eBPF and the internal packet scheduling.
> Introducing a different tracking will still require us to keep the old
> one.
>
> More interesting, a more accurate tracking could be not strictly
> necessary: as the MPTCP protocol enqueues data to the subflows only up
> the available send window, any enqueue data is sent on the wire
> instantly, without any blocking operation short of a drop in the tx path
> at the nft or TC layer.
>
> The individual patches changelog carry the gory details.
>

v3 of the series LGTM, thanks Paolo.

Matthieu note that this series is split between mptcp-net and mptcp-next:

> Still sending a single series to outline the functional requirements,
> even if the first 3 patches could land on mptcp-net directly.
>

Reviewed-by: Mat Martineau <martineau@kernel.org>


> v2 -> v3:
>  - typo in patch 1/7
>  - dropped unused code in patch 3/7
>  - dropped duplicate code in patch 6/7
>
> v1 -> v2:
>  - clarifiy commit message in patch 1/7
>  - fix possible wake-up bug in patch 6/7
>
> *** BLURB HERE ***
>
> Paolo Abeni (7):
>  mptcp: push at DSS boundaries
>  mptcp: fix snd_wnd initialization for passive socket
>  mptcp: fix potential wake-up event loss
>  mptcp: cleanup writer wake-up
>  mptcp: avoid some duplicate code in socket option handling
>  mptcp: implement TCP_NOTSENT_LOWAT support.
>  mptcp: cleanup SOL_TCP handling
>
> net/mptcp/protocol.c | 57 ++++++++++++++++++++++++-----------
> net/mptcp/protocol.h | 51 ++++++++++++++++++++++---------
> net/mptcp/sockopt.c  | 71 ++++++++++++++++++++------------------------
> 3 files changed, 108 insertions(+), 71 deletions(-)
>
> -- 
> 2.43.0
>
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  2024-02-16 12:18   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
  2024-02-16 12:41   ` MPTCP CI
@ 2024-02-16 20:34   ` MPTCP CI
  2 siblings, 0 replies; 13+ messages in thread
From: MPTCP CI @ 2024-02-16 20:34 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7935569600

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/758a441415a6


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support
  2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (7 preceding siblings ...)
  2024-02-16 19:48 ` [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Mat Martineau
@ 2024-02-19  9:44 ` Matthieu Baerts
  8 siblings, 0 replies; 13+ messages in thread
From: Matthieu Baerts @ 2024-02-19  9:44 UTC (permalink / raw)
  To: Paolo Abeni, Mat Martineau; +Cc: mptcp

Hi Paolo, Mat,

On 16/02/2024 12:28, Paolo Abeni wrote:
> Patch 6/7 does the magic, all the others are minor cleanup and fix of
> buglet exposed by such feature. I'll push a paired pktdrill test.
> 
> Note that this relies on the existing accounting for snd_nxt. As I
> stated such accounting is not 110% accurate as it tracks the most recent
> sequence number queued to any subflow, and not the actual sequence
> number sent on the wire.
> 
> I experimented a lot trying to implement the latter and in the end it
> proved to be both "too complex" and "not necessary".
> 
> The complexity raises from the need for additional lock and a lot of
> refactory to introduce such protection without adding significant
> overhead. Additionally, snd_nxt is currenly used and exposed with the
> current semantic both eBPF and the internal packet scheduling.
> Introducing a different tracking will still require us to keep the old
> one.
> 
> More interesting, a more accurate tracking could be not strictly
> necessary: as the MPTCP protocol enqueues data to the subflows only up
> the available send window, any enqueue data is sent on the wire
> instantly, without any blocking operation short of a drop in the tx path
> at the nft or TC layer.
> 
> The individual patches changelog carry the gory details.
> 
> Still sending a single series to outline the functional requirements,
> even if the first 3 patches could land on mptcp-net directly.

Thank you for the patches, and the reviews!

Now in our tree (fixes for -net, and feat. for net-next):


New patches for t/upstream-net and t/upstream:
- d9b01cb9adc7: mptcp: push at DSS boundaries
- 1ea124e2ca57: mptcp: fix snd_wnd initialization for passive socket
- c64351cfe29e: mptcp: fix potential wake-up event loss
- Results: 086d253b7038..9dcf86462a59 (export-net)

- efe1cf277159: conflict in top-bases/t/DO-NOT-MERGE-git-markup-net-next
- Results: 8de48203098c..4f7ce06a0f9e (export)

New patches for t/upstream (only):
- 232c91fcaf38: mptcp: cleanup writer wake-up
- 1e762225d9b2: mptcp: avoid some duplicate code in socket option handling
- 51913f3f330f: mptcp: implement TCP_NOTSENT_LOWAT support
- eb5db737fdac: mptcp: cleanup SOL_TCP handling
- Results: 4f7ce06a0f9e..f2fb9bec2195 (export)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-02-19  9:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-16 11:28 [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 1/7] mptcp: push at DSS boundaries Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 4/7] mptcp: cleanup writer wake-up Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
2024-02-16 11:28 ` [PATCH v3 net-next 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
2024-02-16 12:18   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
2024-02-16 12:41   ` MPTCP CI
2024-02-16 20:34   ` MPTCP CI
2024-02-16 19:48 ` [PATCH v3 net-next 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Mat Martineau
2024-02-19  9:44 ` Matthieu Baerts

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.