All of lore.kernel.org
 help / color / mirror / Atom feed
* [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support
@ 2024-02-12 15:18 Paolo Abeni
  2024-02-12 15:18 ` [mptcp-next v2 1/7] mptcp: push ad DSS boundaries Paolo Abeni
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:18 UTC (permalink / raw)
  To: mptcp

Patch 6/7 does the magic, all the others are minor cleanup and fix of
buglet exposed by such feature. I'll push a paired pktdrill test.

Note that this relies on the existing accounting for snd_nxt. As I
stated such accounting is not 110% accurate as it tracks the most recent
sequence number queued to any subflow, and not the actual sequence
number sent on the wire.

I experimented a lot trying to implement the latter and in the end it
proved to be both "too complex" and "not necessary".

The complexity raises from the need for additional lock and a lot of
refactory to introduce such protection without adding significant
overhead. Additionally, snd_nxt is currenly used and exposed with the
current semantic both eBPF and the internal packet scheduling.
Introducing a different tracking will still require us to keep the old
one.

More interesting, a more accurate tracking could be not strictly
necessary: as the MPTCP protocol enqueues data to the subflows only up
the available send window, any enqueue data is sent on the wire
instantly, without any blocking operation short of a drop in the tx path
at the nft or TC layer.

The individual patches changelog carry the gory details.

Still sending a single series to outline the functional requirements,
even if the first 3 patches could land on mptcp-net directly.

v1 -> v2:
  - clarifiy commit message in patch 1/7
  - fix possible wake-up bug in patch 6/7

Paolo Abeni (7):
  mptcp: push ad DSS boundaries
  mptcp: fix snd_wnd initialization for passive socket
  mptcp: fix potential wake-up event loss
  mptcp: cleanup writer wake-up
  mptcp: avoid some duplicate code in socket option handling
  mptcp: implement TCP_NOTSENT_LOWAT support.
  mptcp: cleanup SOL_TCP handling

 net/mptcp/protocol.c | 66 +++++++++++++++++++++++++++++-----------
 net/mptcp/protocol.h | 57 ++++++++++++++++++++++++++++-------
 net/mptcp/sockopt.c  | 71 ++++++++++++++++++++------------------------
 3 files changed, 126 insertions(+), 68 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [mptcp-next v2 1/7] mptcp: push ad DSS boundaries
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
@ 2024-02-12 15:18 ` Paolo Abeni
  2024-02-15 22:33   ` Mat Martineau
  2024-02-12 15:18 ` [mptcp-next v2 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:18 UTC (permalink / raw)
  To: mptcp

when inserting not contiguous data in the subflow write queue,
the protocol creates a new skb and prevent the TCP stack from
merging it later with already queued skbs by setting the EOR marker.

Still no push flag is explicitly set at the end of previous GSO
packet, making the aggregation on the receiver side sub-optimal -
and packetdrill self-tests less predictable.

Explicitly mark the end of not contiguous DSS with the push flag.

Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3017b01ac488..21b3729c65ac 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1265,6 +1265,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		mpext = mptcp_get_ext(skb);
 		if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
 			TCP_SKB_CB(skb)->eor = 1;
+			tcp_mark_push(tcp_sk(ssk), skb);
 			goto alloc_skb;
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 2/7] mptcp: fix snd_wnd initialization for passive socket
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
  2024-02-12 15:18 ` [mptcp-next v2 1/7] mptcp: push ad DSS boundaries Paolo Abeni
@ 2024-02-12 15:18 ` Paolo Abeni
  2024-02-12 15:18 ` [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:18 UTC (permalink / raw)
  To: mptcp

Such value should be inherited from the first subflow, but
passive sockets always used 'rsk_rcv_wnd'.

Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21b3729c65ac..7c3cb97214b8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3216,7 +3216,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	WRITE_ONCE(msk->write_seq, subflow_req->idsn + 1);
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 	WRITE_ONCE(msk->snd_una, msk->write_seq);
-	WRITE_ONCE(msk->wnd_end, msk->snd_nxt + req->rsk_rcv_wnd);
+	WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd);
 	msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
 	mptcp_init_sched(msk, mptcp_sk(sk)->sched);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
  2024-02-12 15:18 ` [mptcp-next v2 1/7] mptcp: push ad DSS boundaries Paolo Abeni
  2024-02-12 15:18 ` [mptcp-next v2 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
@ 2024-02-12 15:18 ` Paolo Abeni
  2024-02-15 22:20   ` Mat Martineau
  2024-02-12 15:18 ` [mptcp-next v2 4/7] mptcp: cleanup writer wake-up Paolo Abeni
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:18 UTC (permalink / raw)
  To: mptcp

After the blamed commit below, the send buffer auto-tuning can
happen after that the mptcp_propagate_sndbuf() completes - via
the delegated action infrastructure.

We must check for write space even after such change or we risk
missing the wake-up event.

Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.h | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 486fff865803..2e197262a42e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -797,6 +797,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
 }
 
+static inline void mptcp_write_space(struct sock *sk)
+{
+	if (sk_stream_is_writeable(sk)) {
+		/* pairs with memory barrier in mptcp_poll */
+		smp_mb();
+		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
+			sk_stream_write_space(sk);
+	}
+}
+
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -815,6 +825,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
 
 	/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
 	WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
+	mptcp_write_space(sk);
 }
 
 /* The called held both the msk socket and the subflow socket locks,
@@ -845,14 +856,16 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
 	local_bh_enable();
 }
 
-static inline void mptcp_write_space(struct sock *sk)
+static inline void __mptcp_sync_sndnxt(struct sock *sk, u64 new_snd_nxt)
 {
-	if (sk_stream_is_writeable(sk)) {
-		/* pairs with memory barrier in mptcp_poll */
-		smp_mb();
-		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
-			sk_stream_write_space(sk);
-	}
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	if (!after64(new_snd_nxt, msk->snd_nxt))
+		return;
+
+	msk->bytes_sent += new_snd_nxt - msk->snd_nxt;
+	WRITE_ONCE(msk->snd_nxt, new_snd_nxt);
+	mptcp_write_space(sk);
 }
 
 void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 4/7] mptcp: cleanup writer wake-up
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (2 preceding siblings ...)
  2024-02-12 15:18 ` [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
@ 2024-02-12 15:18 ` Paolo Abeni
  2024-02-12 15:19 ` [mptcp-next v2 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:18 UTC (permalink / raw)
  To: mptcp

After commit 5cf92bbadc58 ("mptcp: re-enable sndbuf autotune"), the
MPTCP_NOSPACE bit is redundant: it is always set and cleared together with
SOCK_NOSPACE.

Let's drop the first and always relay on the latter, dropping a bunch
of useless code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 15 +++------------
 net/mptcp/protocol.h | 16 ++++++----------
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7c3cb97214b8..def01e030121 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1692,15 +1692,6 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool
 	}
 }
 
-static void mptcp_set_nospace(struct sock *sk)
-{
-	/* enable autotune */
-	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-
-	/* will be cleared on avail space */
-	set_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags);
-}
-
 static int mptcp_disconnect(struct sock *sk, int flags);
 
 static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
@@ -1874,7 +1865,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		continue;
 
 wait_for_memory:
-		mptcp_set_nospace(sk);
+		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		__mptcp_push_pending(sk, msg->msg_flags);
 		ret = sk_stream_wait_memory(sk, &timeo);
 		if (ret)
@@ -3896,8 +3887,8 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
 	if (sk_stream_is_writeable(sk))
 		return EPOLLOUT | EPOLLWRNORM;
 
-	mptcp_set_nospace(sk);
-	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
+	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+	smp_mb__after_atomic(); /* NOSPACE is changed by mptcp_write_space() */
 	if (sk_stream_is_writeable(sk))
 		return EPOLLOUT | EPOLLWRNORM;
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 2e197262a42e..0aa77c03643d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -113,10 +113,9 @@
 #define MPTCP_RST_TRANSIENT	BIT(0)
 
 /* MPTCP socket atomic flags */
-#define MPTCP_NOSPACE		1
-#define MPTCP_WORK_RTX		2
-#define MPTCP_FALLBACK_DONE	4
-#define MPTCP_WORK_CLOSE_SUBFLOW 5
+#define MPTCP_WORK_RTX		1
+#define MPTCP_FALLBACK_DONE	2
+#define MPTCP_WORK_CLOSE_SUBFLOW 3
 
 /* MPTCP socket release cb flags */
 #define MPTCP_PUSH_PENDING	1
@@ -799,12 +798,9 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 
 static inline void mptcp_write_space(struct sock *sk)
 {
-	if (sk_stream_is_writeable(sk)) {
-		/* pairs with memory barrier in mptcp_poll */
-		smp_mb();
-		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
-			sk_stream_write_space(sk);
-	}
+	/* pairs with memory barrier in mptcp_poll */
+	smp_mb();
+	sk_stream_write_space(sk);
 }
 
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 5/7] mptcp: avoid some duplicate code in socket option handling
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (3 preceding siblings ...)
  2024-02-12 15:18 ` [mptcp-next v2 4/7] mptcp: cleanup writer wake-up Paolo Abeni
@ 2024-02-12 15:19 ` Paolo Abeni
  2024-02-12 15:19 ` [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
  2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  6 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:19 UTC (permalink / raw)
  To: mptcp

The mptcp_get_int_option() helper is needless open-coded in a
couple of places, replace the duplicate code with the helper
call.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/sockopt.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index da37e4541a5d..ac37f6c5e2ed 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -629,13 +629,11 @@ static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optva
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val;
-
-	if (optlen < sizeof(int))
-		return -EINVAL;
+	int val, ret;
 
-	if (copy_from_sockptr(&val, optval, sizeof(val)))
-		return -EFAULT;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
 
 	lock_sock(sk);
 	sockopt_seq_inc(msk);
@@ -659,13 +657,11 @@ static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t op
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val;
-
-	if (optlen < sizeof(int))
-		return -EINVAL;
+	int val, ret;
 
-	if (copy_from_sockptr(&val, optval, sizeof(val)))
-		return -EFAULT;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
 
 	lock_sock(sk);
 	sockopt_seq_inc(msk);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support.
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (4 preceding siblings ...)
  2024-02-12 15:19 ` [mptcp-next v2 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
@ 2024-02-12 15:19 ` Paolo Abeni
  2024-02-15 22:31   ` Mat Martineau
  2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  6 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:19 UTC (permalink / raw)
  To: mptcp

Add support for such socket option storing the user-space provided
value in a new msk field, and using such data to implement the
_mptcp_stream_memory_free() helper, similar to the TCP one.

To avoid adding more indirect calls in the fast path, open-code
a variant of sk_stream_memory_free() in mptcp_sendmsg() and add
direct calls to the mptcp stream memory free helper where possible.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 48 +++++++++++++++++++++++++++++++++++++++-----
 net/mptcp/protocol.h | 28 +++++++++++++++++++++++++-
 net/mptcp/sockopt.c  | 12 +++++++++++
 3 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index def01e030121..cc5680c4c5b3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1762,6 +1762,39 @@ static int do_copy_data_nocache(struct sock *sk, int copy,
 	return 0;
 }
 
+static bool mptcp_stream_memory_free(const struct sock *sk, int wake)
+{
+	const struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 notsent_bytes = READ_ONCE(msk->write_seq) -
+			    READ_ONCE(msk->snd_nxt);
+
+	return (notsent_bytes << wake) < mptcp_notsent_lowat(sk);
+}
+
+/* open-code sk_stream_memory_free() plus sent limit computation to
+ * avoid indirect calls in fast-path.
+ * Called under the msk socket lock, so we can avoid a bunch of ONCE
+ * annotations.
+ */
+static u32 mptcp_send_limit(const struct sock *sk)
+{
+	const struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 limit, not_sent;
+
+	if (sk->sk_wmem_queued >= READ_ONCE(sk->sk_sndbuf))
+		return 0;
+
+	limit = mptcp_notsent_lowat(sk);
+	if (limit == UINT_MAX)
+		return UINT_MAX;
+
+	not_sent = msk->write_seq - msk->snd_nxt;
+	if (not_sent >= limit)
+		return 0;
+
+	return limit - not_sent;
+}
+
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1806,6 +1839,12 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		struct mptcp_data_frag *dfrag;
 		bool dfrag_collapsed;
 		size_t psize, offset;
+		u32 copy_limit;
+
+		/* ensure fitting the notsent_lowat() constraint */
+		copy_limit = mptcp_send_limit(sk);
+		if (!copy_limit)
+			goto wait_for_memory;
 
 		/* reuse tail pfrag, if possible, or carve a new one from the
 		 * page allocator
@@ -1813,9 +1852,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		dfrag = mptcp_pending_tail(sk);
 		dfrag_collapsed = mptcp_frag_can_collapse_to(msk, pfrag, dfrag);
 		if (!dfrag_collapsed) {
-			if (!sk_stream_memory_free(sk))
-				goto wait_for_memory;
-
 			if (!mptcp_page_frag_refill(sk, pfrag))
 				goto wait_for_memory;
 
@@ -1830,6 +1866,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		offset = dfrag->offset + dfrag->data_len;
 		psize = pfrag->size - offset;
 		psize = min_t(size_t, psize, msg_data_left(msg));
+		psize = min_t(size_t, psize, copy_limit);
 		total_ts = psize + frag_truesize;
 
 		if (!sk_wmem_schedule(sk, total_ts))
@@ -3711,6 +3748,7 @@ static struct proto mptcp_prot = {
 	.unhash		= mptcp_unhash,
 	.get_port	= mptcp_get_port,
 	.forward_alloc_get	= mptcp_forward_alloc_get,
+	.stream_memory_free	= mptcp_stream_memory_free,
 	.sockets_allocated	= &mptcp_sockets_allocated,
 
 	.memory_allocated	= &tcp_memory_allocated,
@@ -3884,12 +3922,12 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
 {
 	struct sock *sk = (struct sock *)msk;
 
-	if (sk_stream_is_writeable(sk))
+	if (__mptcp_stream_is_writeable(sk, 1))
 		return EPOLLOUT | EPOLLWRNORM;
 
 	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 	smp_mb__after_atomic(); /* NOSPACE is changed by mptcp_write_space() */
-	if (sk_stream_is_writeable(sk))
+	if (__mptcp_stream_is_writeable(sk, 1))
 		return EPOLLOUT | EPOLLWRNORM;
 
 	return 0;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 0aa77c03643d..62b84cc6f35e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -307,6 +307,7 @@ struct mptcp_sock {
 			in_accept_queue:1,
 			free_first:1,
 			rcvspace_init:1;
+	u32		notsent_lowat;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
@@ -796,11 +797,36 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
 }
 
+static inline u32 mptcp_notsent_lowat(const struct sock *sk)
+{
+	struct net *net = sock_net(sk);
+	u32 val;
+
+	val = READ_ONCE(mptcp_sk(sk)->notsent_lowat);
+	return val ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat);
+}
+
+static inline bool __mptcp_stream_memory_free(const struct sock *sk, int wake)
+{
+	const struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 notsent_bytes;
+
+	notsent_bytes = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_nxt);
+	return (notsent_bytes << wake) < mptcp_notsent_lowat(sk);
+}
+
+static inline bool __mptcp_stream_is_writeable(const struct sock *sk, int wake)
+{
+	return __mptcp_stream_memory_free(sk, wake) &&
+	       __sk_stream_is_writeable(sk, wake);
+}
+
 static inline void mptcp_write_space(struct sock *sk)
 {
 	/* pairs with memory barrier in mptcp_poll */
 	smp_mb();
-	sk_stream_write_space(sk);
+	if (__mptcp_stream_memory_free(sk, 1))
+		sk_stream_write_space(sk);
 }
 
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index ac37f6c5e2ed..1b38dac70719 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -812,6 +812,16 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return 0;
 	case TCP_ULP:
 		return -EOPNOTSUPP;
+	case TCP_NOTSENT_LOWAT:
+		ret = mptcp_get_int_option(msk, optval, optlen, &val);
+		if (ret)
+			return ret;
+
+		lock_sock(sk);
+		WRITE_ONCE(msk->notsent_lowat, val);
+		mptcp_write_space(sk);
+		release_sock(sk);
+		return 0;
 	case TCP_CONGESTION:
 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
 	case TCP_CORK:
@@ -1345,6 +1355,8 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 		return mptcp_put_int_option(msk, optval, optlen, msk->cork);
 	case TCP_NODELAY:
 		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
+	case TCP_NOTSENT_LOWAT:
+		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
 	}
 	return -EOPNOTSUPP;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling
  2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
                   ` (5 preceding siblings ...)
  2024-02-12 15:19 ` [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
@ 2024-02-12 15:19 ` Paolo Abeni
  2024-02-12 16:10   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
                     ` (2 more replies)
  6 siblings, 3 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-12 15:19 UTC (permalink / raw)
  To: mptcp

Most TCP-level socket options get an integer from user space, and
set the corresponding field under the msk-level socket lock.

Reduce the code duplication moving such operations in the common code.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/sockopt.c | 75 ++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 45 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 1b38dac70719..dcd1c76d2a3b 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -624,18 +624,11 @@ static int mptcp_setsockopt_sol_tcp_congestion(struct mptcp_sock *msk, sockptr_t
 	return ret;
 }
 
-static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optval,
-					 unsigned int optlen)
+static int __mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, int val)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val, ret;
 
-	ret = mptcp_get_int_option(msk, optval, optlen, &val);
-	if (ret)
-		return ret;
-
-	lock_sock(sk);
 	sockopt_seq_inc(msk);
 	msk->cork = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -647,23 +640,15 @@ static int mptcp_setsockopt_sol_tcp_cork(struct mptcp_sock *msk, sockptr_t optva
 	}
 	if (!val)
 		mptcp_check_and_set_pending(sk);
-	release_sock(sk);
 
 	return 0;
 }
 
-static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t optval,
-					    unsigned int optlen)
+static int __mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, int val)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int val, ret;
 
-	ret = mptcp_get_int_option(msk, optval, optlen, &val);
-	if (ret)
-		return ret;
-
-	lock_sock(sk);
 	sockopt_seq_inc(msk);
 	msk->nodelay = !!val;
 	mptcp_for_each_subflow(msk, subflow) {
@@ -675,8 +660,6 @@ static int mptcp_setsockopt_sol_tcp_nodelay(struct mptcp_sock *msk, sockptr_t op
 	}
 	if (val)
 		mptcp_check_and_set_pending(sk);
-	release_sock(sk);
-
 	return 0;
 }
 
@@ -799,35 +782,10 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 	int ret, val;
 
 	switch (optname) {
-	case TCP_INQ:
-		ret = mptcp_get_int_option(msk, optval, optlen, &val);
-		if (ret)
-			return ret;
-		if (val < 0 || val > 1)
-			return -EINVAL;
-
-		lock_sock(sk);
-		msk->recvmsg_inq = !!val;
-		release_sock(sk);
-		return 0;
 	case TCP_ULP:
 		return -EOPNOTSUPP;
-	case TCP_NOTSENT_LOWAT:
-		ret = mptcp_get_int_option(msk, optval, optlen, &val);
-		if (ret)
-			return ret;
-
-		lock_sock(sk);
-		WRITE_ONCE(msk->notsent_lowat, val);
-		mptcp_write_space(sk);
-		release_sock(sk);
-		return 0;
 	case TCP_CONGESTION:
 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
-	case TCP_CORK:
-		return mptcp_setsockopt_sol_tcp_cork(msk, optval, optlen);
-	case TCP_NODELAY:
-		return mptcp_setsockopt_sol_tcp_nodelay(msk, optval, optlen);
 	case TCP_DEFER_ACCEPT:
 		/* See tcp.c: TCP_DEFER_ACCEPT does not fail */
 		mptcp_setsockopt_first_sf_only(msk, SOL_TCP, optname, optval, optlen);
@@ -840,7 +798,34 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 						      optval, optlen);
 	}
 
-	return -EOPNOTSUPP;
+	ret = mptcp_get_int_option(msk, optval, optlen, &val);
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	switch (optname) {
+	case TCP_INQ:
+		if (val < 0 || val > 1)
+			ret = -EINVAL;
+		else
+			msk->recvmsg_inq = !!val;
+		break;
+	case TCP_NOTSENT_LOWAT:
+		WRITE_ONCE(msk->notsent_lowat, val);
+		mptcp_write_space(sk);
+		break;
+	case TCP_CORK:
+		ret = __mptcp_setsockopt_sol_tcp_cork(msk, val);
+		break;
+	case TCP_NODELAY:
+		ret = __mptcp_setsockopt_sol_tcp_nodelay(msk, val);
+		break;
+	default:
+		ret = -ENOPROTOOPT;
+	}
+
+	release_sock(sk);
+	return ret;
 }
 
 int mptcp_setsockopt(struct sock *sk, int level, int optname,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
@ 2024-02-12 16:10   ` MPTCP CI
  2024-02-12 16:28   ` MPTCP CI
  2024-02-13  9:53   ` MPTCP CI
  2 siblings, 0 replies; 15+ messages in thread
From: MPTCP CI @ 2024-02-12 16:10 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7874079031

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/11a409accb04


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  2024-02-12 16:10   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
@ 2024-02-12 16:28   ` MPTCP CI
  2024-02-13  9:53   ` MPTCP CI
  2 siblings, 0 replies; 15+ messages in thread
From: MPTCP CI @ 2024-02-12 16:28 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Script error! ❓:
  - Task: https://cirrus-ci.com/task/4764402235211776
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4764402235211776/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5890302142054400
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5890302142054400/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/11a409accb04


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mptcp: cleanup SOL_TCP handling: Tests Results
  2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
  2024-02-12 16:10   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
  2024-02-12 16:28   ` MPTCP CI
@ 2024-02-13  9:53   ` MPTCP CI
  2 siblings, 0 replies; 15+ messages in thread
From: MPTCP CI @ 2024-02-13  9:53 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5917362851938304
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5917362851938304/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5890302142054400
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5890302142054400/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/11a409accb04


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss
  2024-02-12 15:18 ` [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
@ 2024-02-15 22:20   ` Mat Martineau
  2024-02-16  9:39     ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Mat Martineau @ 2024-02-15 22:20 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 12 Feb 2024, Paolo Abeni wrote:

> After the blamed commit below, the send buffer auto-tuning can
> happen after that the mptcp_propagate_sndbuf() completes - via
> the delegated action infrastructure.
>
> We must check for write space even after such change or we risk
> missing the wake-up event.
>
> Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.h | 27 ++++++++++++++++++++-------
> 1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 486fff865803..2e197262a42e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -797,6 +797,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
> 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
> }
>
> +static inline void mptcp_write_space(struct sock *sk)
> +{
> +	if (sk_stream_is_writeable(sk)) {
> +		/* pairs with memory barrier in mptcp_poll */
> +		smp_mb();
> +		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
> +			sk_stream_write_space(sk);
> +	}
> +}
> +
> static inline void __mptcp_sync_sndbuf(struct sock *sk)
> {
> 	struct mptcp_subflow_context *subflow;
> @@ -815,6 +825,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
>
> 	/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
> 	WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
> +	mptcp_write_space(sk);
> }
>
> /* The called held both the msk socket and the subflow socket locks,
> @@ -845,14 +856,16 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
> 	local_bh_enable();
> }
>
> -static inline void mptcp_write_space(struct sock *sk)
> +static inline void __mptcp_sync_sndnxt(struct sock *sk, u64 new_snd_nxt)

Hi Paolo -

I think you missed my comment on this in v1, __mptcp_sync_sndnxt() isn't 
called anywhere.

- Mat

> {
> -	if (sk_stream_is_writeable(sk)) {
> -		/* pairs with memory barrier in mptcp_poll */
> -		smp_mb();
> -		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
> -			sk_stream_write_space(sk);
> -	}
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> +
> +	if (!after64(new_snd_nxt, msk->snd_nxt))
> +		return;
> +
> +	msk->bytes_sent += new_snd_nxt - msk->snd_nxt;
> +	WRITE_ONCE(msk->snd_nxt, new_snd_nxt);
> +	mptcp_write_space(sk);
> }
>
> void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
> -- 
> 2.43.0
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support.
  2024-02-12 15:19 ` [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
@ 2024-02-15 22:31   ` Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2024-02-15 22:31 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 12 Feb 2024, Paolo Abeni wrote:

> Add support for such socket option storing the user-space provided
> value in a new msk field, and using such data to implement the
> _mptcp_stream_memory_free() helper, similar to the TCP one.
>
> To avoid adding more indirect calls in the fast path, open-code
> a variant of sk_stream_memory_free() in mptcp_sendmsg() and add
> direct calls to the mptcp stream memory free helper where possible.
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 48 +++++++++++++++++++++++++++++++++++++++-----
> net/mptcp/protocol.h | 28 +++++++++++++++++++++++++-
> net/mptcp/sockopt.c  | 12 +++++++++++
> 3 files changed, 82 insertions(+), 6 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index def01e030121..cc5680c4c5b3 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1762,6 +1762,39 @@ static int do_copy_data_nocache(struct sock *sk, int copy,
> 	return 0;
> }
>
> +static bool mptcp_stream_memory_free(const struct sock *sk, int wake)
> +{
> +	const struct mptcp_sock *msk = mptcp_sk(sk);
> +	u32 notsent_bytes = READ_ONCE(msk->write_seq) -
> +			    READ_ONCE(msk->snd_nxt);
> +
> +	return (notsent_bytes << wake) < mptcp_notsent_lowat(sk);

Hi Paolo -

This is identical to __mptcp_stream_memory_free() (other than 'inline' and 
combining notsent_bytes assignment w/ declaration). Prefer to avoid the 
code duplication!


- Mat


> +}
> +
> +/* open-code sk_stream_memory_free() plus sent limit computation to
> + * avoid indirect calls in fast-path.
> + * Called under the msk socket lock, so we can avoid a bunch of ONCE
> + * annotations.
> + */
> +static u32 mptcp_send_limit(const struct sock *sk)
> +{
> +	const struct mptcp_sock *msk = mptcp_sk(sk);
> +	u32 limit, not_sent;
> +
> +	if (sk->sk_wmem_queued >= READ_ONCE(sk->sk_sndbuf))
> +		return 0;
> +
> +	limit = mptcp_notsent_lowat(sk);
> +	if (limit == UINT_MAX)
> +		return UINT_MAX;
> +
> +	not_sent = msk->write_seq - msk->snd_nxt;
> +	if (not_sent >= limit)
> +		return 0;
> +
> +	return limit - not_sent;
> +}
> +
> static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -1806,6 +1839,12 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 		struct mptcp_data_frag *dfrag;
> 		bool dfrag_collapsed;
> 		size_t psize, offset;
> +		u32 copy_limit;
> +
> +		/* ensure fitting the notsent_lowat() constraint */
> +		copy_limit = mptcp_send_limit(sk);
> +		if (!copy_limit)
> +			goto wait_for_memory;
>
> 		/* reuse tail pfrag, if possible, or carve a new one from the
> 		 * page allocator
> @@ -1813,9 +1852,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 		dfrag = mptcp_pending_tail(sk);
> 		dfrag_collapsed = mptcp_frag_can_collapse_to(msk, pfrag, dfrag);
> 		if (!dfrag_collapsed) {
> -			if (!sk_stream_memory_free(sk))
> -				goto wait_for_memory;
> -
> 			if (!mptcp_page_frag_refill(sk, pfrag))
> 				goto wait_for_memory;
>
> @@ -1830,6 +1866,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 		offset = dfrag->offset + dfrag->data_len;
> 		psize = pfrag->size - offset;
> 		psize = min_t(size_t, psize, msg_data_left(msg));
> +		psize = min_t(size_t, psize, copy_limit);
> 		total_ts = psize + frag_truesize;
>
> 		if (!sk_wmem_schedule(sk, total_ts))
> @@ -3711,6 +3748,7 @@ static struct proto mptcp_prot = {
> 	.unhash		= mptcp_unhash,
> 	.get_port	= mptcp_get_port,
> 	.forward_alloc_get	= mptcp_forward_alloc_get,
> +	.stream_memory_free	= mptcp_stream_memory_free,
> 	.sockets_allocated	= &mptcp_sockets_allocated,
>
> 	.memory_allocated	= &tcp_memory_allocated,
> @@ -3884,12 +3922,12 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk)
> {
> 	struct sock *sk = (struct sock *)msk;
>
> -	if (sk_stream_is_writeable(sk))
> +	if (__mptcp_stream_is_writeable(sk, 1))
> 		return EPOLLOUT | EPOLLWRNORM;
>
> 	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> 	smp_mb__after_atomic(); /* NOSPACE is changed by mptcp_write_space() */
> -	if (sk_stream_is_writeable(sk))
> +	if (__mptcp_stream_is_writeable(sk, 1))
> 		return EPOLLOUT | EPOLLWRNORM;
>
> 	return 0;
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 0aa77c03643d..62b84cc6f35e 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -307,6 +307,7 @@ struct mptcp_sock {
> 			in_accept_queue:1,
> 			free_first:1,
> 			rcvspace_init:1;
> +	u32		notsent_lowat;
> 	struct work_struct work;
> 	struct sk_buff  *ooo_last_skb;
> 	struct rb_root  out_of_order_queue;
> @@ -796,11 +797,36 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
> 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
> }
>
> +static inline u32 mptcp_notsent_lowat(const struct sock *sk)
> +{
> +	struct net *net = sock_net(sk);
> +	u32 val;
> +
> +	val = READ_ONCE(mptcp_sk(sk)->notsent_lowat);
> +	return val ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat);
> +}
> +
> +static inline bool __mptcp_stream_memory_free(const struct sock *sk, int wake)
> +{
> +	const struct mptcp_sock *msk = mptcp_sk(sk);
> +	u32 notsent_bytes;
> +
> +	notsent_bytes = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_nxt);
> +	return (notsent_bytes << wake) < mptcp_notsent_lowat(sk);
> +}
> +
> +static inline bool __mptcp_stream_is_writeable(const struct sock *sk, int wake)
> +{
> +	return __mptcp_stream_memory_free(sk, wake) &&
> +	       __sk_stream_is_writeable(sk, wake);
> +}
> +
> static inline void mptcp_write_space(struct sock *sk)
> {
> 	/* pairs with memory barrier in mptcp_poll */
> 	smp_mb();
> -	sk_stream_write_space(sk);
> +	if (__mptcp_stream_memory_free(sk, 1))
> +		sk_stream_write_space(sk);
> }
>
> static inline void __mptcp_sync_sndbuf(struct sock *sk)
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index ac37f6c5e2ed..1b38dac70719 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -812,6 +812,16 @@ static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
> 		return 0;
> 	case TCP_ULP:
> 		return -EOPNOTSUPP;
> +	case TCP_NOTSENT_LOWAT:
> +		ret = mptcp_get_int_option(msk, optval, optlen, &val);
> +		if (ret)
> +			return ret;
> +
> +		lock_sock(sk);
> +		WRITE_ONCE(msk->notsent_lowat, val);
> +		mptcp_write_space(sk);
> +		release_sock(sk);
> +		return 0;
> 	case TCP_CONGESTION:
> 		return mptcp_setsockopt_sol_tcp_congestion(msk, optval, optlen);
> 	case TCP_CORK:
> @@ -1345,6 +1355,8 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
> 		return mptcp_put_int_option(msk, optval, optlen, msk->cork);
> 	case TCP_NODELAY:
> 		return mptcp_put_int_option(msk, optval, optlen, msk->nodelay);
> +	case TCP_NOTSENT_LOWAT:
> +		return mptcp_put_int_option(msk, optval, optlen, msk->notsent_lowat);
> 	}
> 	return -EOPNOTSUPP;
> }
> -- 
> 2.43.0
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [mptcp-next v2 1/7] mptcp: push ad DSS boundaries
  2024-02-12 15:18 ` [mptcp-next v2 1/7] mptcp: push ad DSS boundaries Paolo Abeni
@ 2024-02-15 22:33   ` Mat Martineau
  0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2024-02-15 22:33 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Mon, 12 Feb 2024, Paolo Abeni wrote:

> when inserting not contiguous data in the subflow write queue,
> the protocol creates a new skb and prevent the TCP stack from
> merging it later with already queued skbs by setting the EOR marker.
>
> Still no push flag is explicitly set at the end of previous GSO
> packet, making the aggregation on the receiver side sub-optimal -
> and packetdrill self-tests less predictable.
>
> Explicitly mark the end of not contiguous DSS with the push flag.
>
> Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

As long as I'm asking for other changes that require a v3 spin, please 
also change this commit title to "push at..."

Thanks!

Mat


> net/mptcp/protocol.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3017b01ac488..21b3729c65ac 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1265,6 +1265,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
> 		mpext = mptcp_get_ext(skb);
> 		if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
> 			TCP_SKB_CB(skb)->eor = 1;
> +			tcp_mark_push(tcp_sk(ssk), skb);
> 			goto alloc_skb;
> 		}
>
> -- 
> 2.43.0
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss
  2024-02-15 22:20   ` Mat Martineau
@ 2024-02-16  9:39     ` Paolo Abeni
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Abeni @ 2024-02-16  9:39 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Thu, 2024-02-15 at 14:20 -0800, Mat Martineau wrote:
> On Mon, 12 Feb 2024, Paolo Abeni wrote:
> 
> > After the blamed commit below, the send buffer auto-tuning can
> > happen after that the mptcp_propagate_sndbuf() completes - via
> > the delegated action infrastructure.
> > 
> > We must check for write space even after such change or we risk
> > missing the wake-up event.
> > 
> > Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/mptcp/protocol.h | 27 ++++++++++++++++++++-------
> > 1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> > index 486fff865803..2e197262a42e 100644
> > --- a/net/mptcp/protocol.h
> > +++ b/net/mptcp/protocol.h
> > @@ -797,6 +797,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
> > 	       READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
> > }
> > 
> > +static inline void mptcp_write_space(struct sock *sk)
> > +{
> > +	if (sk_stream_is_writeable(sk)) {
> > +		/* pairs with memory barrier in mptcp_poll */
> > +		smp_mb();
> > +		if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
> > +			sk_stream_write_space(sk);
> > +	}
> > +}
> > +
> > static inline void __mptcp_sync_sndbuf(struct sock *sk)
> > {
> > 	struct mptcp_subflow_context *subflow;
> > @@ -815,6 +825,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
> > 
> > 	/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
> > 	WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
> > +	mptcp_write_space(sk);
> > }
> > 
> > /* The called held both the msk socket and the subflow socket locks,
> > @@ -845,14 +856,16 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
> > 	local_bh_enable();
> > }
> > 
> > -static inline void mptcp_write_space(struct sock *sk)
> > +static inline void __mptcp_sync_sndnxt(struct sock *sk, u64 new_snd_nxt)
> 
> Hi Paolo -
> 
> I think you missed my comment on this in v1, __mptcp_sync_sndnxt() isn't 
> called anywhere.

Indeed! sorry for missing it, and thank you for persisting. Yes, I must
drop the function definition. FTR, it's a left-over of a previous
implementation attempt.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-02-16  9:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-12 15:18 [mptcp-next v2 0/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
2024-02-12 15:18 ` [mptcp-next v2 1/7] mptcp: push ad DSS boundaries Paolo Abeni
2024-02-15 22:33   ` Mat Martineau
2024-02-12 15:18 ` [mptcp-next v2 2/7] mptcp: fix snd_wnd initialization for passive socket Paolo Abeni
2024-02-12 15:18 ` [mptcp-next v2 3/7] mptcp: fix potential wake-up event loss Paolo Abeni
2024-02-15 22:20   ` Mat Martineau
2024-02-16  9:39     ` Paolo Abeni
2024-02-12 15:18 ` [mptcp-next v2 4/7] mptcp: cleanup writer wake-up Paolo Abeni
2024-02-12 15:19 ` [mptcp-next v2 5/7] mptcp: avoid some duplicate code in socket option handling Paolo Abeni
2024-02-12 15:19 ` [mptcp-next v2 6/7] mptcp: implement TCP_NOTSENT_LOWAT support Paolo Abeni
2024-02-15 22:31   ` Mat Martineau
2024-02-12 15:19 ` [mptcp-next v2 7/7] mptcp: cleanup SOL_TCP handling Paolo Abeni
2024-02-12 16:10   ` mptcp: cleanup SOL_TCP handling: Tests Results MPTCP CI
2024-02-12 16:28   ` MPTCP CI
2024-02-13  9:53   ` MPTCP CI

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.