netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] mptcp: fix 3rd ack rtx timer
@ 2021-11-19 14:27 Paolo Abeni
  2021-11-19 14:27 ` [PATCH net 1/2] mptcp: fix delack timer Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-11-19 14:27 UTC (permalink / raw)
  To: netdev; +Cc: mptcp, Eric Dumazet

Eric noted that the MPTCP code do the wrong thing to schedule
the MPJ 3rd ack timer. He also provided a patch to address the
issues (patch 1/2).

To fix for good the MPJ 3rd ack retransmission timer, we additionally
need to set it after the current ack is transmitted (patch 2/2)

Note that the bug went unnotice so far because all the related
tests required some running data transfer, and that causes
MPTCP-level ack even on the opening MPJ subflow. We now have
explicit packet drill coverage for this code path.

Eric Dumazet (1):
  mptcp: fix delack timer

Paolo Abeni (1):
  mptcp: use delegate action to schedule 3rd ack retrans

 net/mptcp/options.c  | 32 ++++++++-------------------
 net/mptcp/protocol.c | 51 ++++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h | 17 ++++++++-------
 3 files changed, 60 insertions(+), 40 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net 1/2] mptcp: fix delack timer
  2021-11-19 14:27 [PATCH net 0/2] mptcp: fix 3rd ack rtx timer Paolo Abeni
@ 2021-11-19 14:27 ` Paolo Abeni
  2021-11-19 14:27 ` [PATCH net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-11-19 14:27 UTC (permalink / raw)
  To: netdev; +Cc: mptcp, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

To compute the rtx timeout schedule_3rdack_retransmission() does multiple
things in the wrong way: srtt_us is measured in usec/8 and the timeout
itself is an absolute value.

Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/mptcp/options.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7c3420afb1a0..2e9b73eeeeb5 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -434,9 +434,10 @@ static void schedule_3rdack_retransmission(struct sock *sk)
 
 	/* reschedule with a timeout above RTT, as we must look only for drop */
 	if (tp->srtt_us)
-		timeout = tp->srtt_us << 1;
+		timeout = usecs_to_jiffies(tp->srtt_us >> (3 - 1));
 	else
 		timeout = TCP_TIMEOUT_INIT;
+	timeout += jiffies;
 
 	WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
 	icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net 2/2] mptcp: use delegate action to schedule 3rd ack retrans
  2021-11-19 14:27 [PATCH net 0/2] mptcp: fix 3rd ack rtx timer Paolo Abeni
  2021-11-19 14:27 ` [PATCH net 1/2] mptcp: fix delack timer Paolo Abeni
@ 2021-11-19 14:27 ` Paolo Abeni
  2021-11-19 15:46 ` [PATCH net 0/2] mptcp: fix 3rd ack rtx timer David Miller
  2021-11-20 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-11-19 14:27 UTC (permalink / raw)
  To: netdev; +Cc: mptcp, Eric Dumazet

Scheduling a delack in mptcp_established_options_mp() is
not a good idea: such function is called by tcp_send_ack() and
the pending delayed ack will be cleared shortly after by the
tcp_event_ack_sent() call in __tcp_transmit_skb().

Instead use the mptcp delegated action infrastructure to
schedule the delayed ack after the current bh processing completes.

Additionally moves the schedule_3rdack_retransmission() helper
into protocol.c to avoid making it visible in a different compilation
unit.

Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Reviewed-by: Mat Martineau <mathew.j.martineau>@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c  | 33 ++++++++--------------------
 net/mptcp/protocol.c | 51 ++++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h | 17 ++++++++-------
 3 files changed, 60 insertions(+), 41 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 2e9b73eeeeb5..fe98e4f475ba 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
 	return false;
 }
 
-/* MP_JOIN client subflow must wait for 4th ack before sending any data:
- * TCP can't schedule delack timer before the subflow is fully established.
- * MPTCP uses the delack timer to do 3rd ack retransmissions
- */
-static void schedule_3rdack_retransmission(struct sock *sk)
-{
-	struct inet_connection_sock *icsk = inet_csk(sk);
-	struct tcp_sock *tp = tcp_sk(sk);
-	unsigned long timeout;
-
-	/* reschedule with a timeout above RTT, as we must look only for drop */
-	if (tp->srtt_us)
-		timeout = usecs_to_jiffies(tp->srtt_us >> (3 - 1));
-	else
-		timeout = TCP_TIMEOUT_INIT;
-	timeout += jiffies;
-
-	WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
-	icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
-	icsk->icsk_ack.timeout = timeout;
-	sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout);
-}
-
 static void clear_3rdack_retransmission(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
 		*size = TCPOLEN_MPTCP_MPJ_ACK;
 		pr_debug("subflow=%p", subflow);
 
-		schedule_3rdack_retransmission(sk);
+		/* we can use the full delegate action helper only from BH context
+		 * If we are in process context - sk is flushing the backlog at
+		 * socket lock release time - just set the appropriate flag, will
+		 * be handled by the release callback
+		 */
+		if (sock_owned_by_user(sk))
+			set_bit(MPTCP_DELEGATE_ACK, &subflow->delegated_status);
+		else
+			mptcp_subflow_delegate(subflow, MPTCP_DELEGATE_ACK);
 		return true;
 	}
 	return false;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b7e32e316738..c82a76d2d0bf 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1596,7 +1596,8 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 			if (!xmit_ssk)
 				goto out;
 			if (xmit_ssk != ssk) {
-				mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
+				mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk),
+						       MPTCP_DELEGATE_SEND);
 				goto out;
 			}
 
@@ -2943,7 +2944,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 		if (xmit_ssk == ssk)
 			__mptcp_subflow_push_pending(sk, ssk);
 		else if (xmit_ssk)
-			mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
+			mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk), MPTCP_DELEGATE_SEND);
 	} else {
 		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
 	}
@@ -2993,18 +2994,50 @@ static void mptcp_release_cb(struct sock *sk)
 	__mptcp_update_rmem(sk);
 }
 
+/* MP_JOIN client subflow must wait for 4th ack before sending any data:
+ * TCP can't schedule delack timer before the subflow is fully established.
+ * MPTCP uses the delack timer to do 3rd ack retransmissions
+ */
+static void schedule_3rdack_retransmission(struct sock *ssk)
+{
+	struct inet_connection_sock *icsk = inet_csk(ssk);
+	struct tcp_sock *tp = tcp_sk(ssk);
+	unsigned long timeout;
+
+	if (mptcp_subflow_ctx(ssk)->fully_established)
+		return;
+
+	/* reschedule with a timeout above RTT, as we must look only for drop */
+	if (tp->srtt_us)
+		timeout = usecs_to_jiffies(tp->srtt_us >> (3 - 1));
+	else
+		timeout = TCP_TIMEOUT_INIT;
+	timeout += jiffies;
+
+	WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
+	icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
+	icsk->icsk_ack.timeout = timeout;
+	sk_reset_timer(ssk, &icsk->icsk_delack_timer, timeout);
+}
+
 void mptcp_subflow_process_delegated(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = subflow->conn;
 
-	mptcp_data_lock(sk);
-	if (!sock_owned_by_user(sk))
-		__mptcp_subflow_push_pending(sk, ssk);
-	else
-		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
-	mptcp_data_unlock(sk);
-	mptcp_subflow_delegated_done(subflow);
+	if (test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) {
+		mptcp_data_lock(sk);
+		if (!sock_owned_by_user(sk))
+			__mptcp_subflow_push_pending(sk, ssk);
+		else
+			set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+		mptcp_data_unlock(sk);
+		mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_SEND);
+	}
+	if (test_bit(MPTCP_DELEGATE_ACK, &subflow->delegated_status)) {
+		schedule_3rdack_retransmission(ssk);
+		mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_ACK);
+	}
 }
 
 static int mptcp_hash(struct sock *sk)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 67a61ac48b20..d87cc040352e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -387,6 +387,7 @@ struct mptcp_delegated_action {
 DECLARE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
 
 #define MPTCP_DELEGATE_SEND		0
+#define MPTCP_DELEGATE_ACK		1
 
 /* MPTCP subflow context */
 struct mptcp_subflow_context {
@@ -492,23 +493,23 @@ static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
 
 void mptcp_subflow_process_delegated(struct sock *ssk);
 
-static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow)
+static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow, int action)
 {
 	struct mptcp_delegated_action *delegated;
 	bool schedule;
 
+	/* the caller held the subflow bh socket lock */
+	lockdep_assert_in_softirq();
+
 	/* The implied barrier pairs with mptcp_subflow_delegated_done(), and
 	 * ensures the below list check sees list updates done prior to status
 	 * bit changes
 	 */
-	if (!test_and_set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) {
+	if (!test_and_set_bit(action, &subflow->delegated_status)) {
 		/* still on delegated list from previous scheduling */
 		if (!list_empty(&subflow->delegated_node))
 			return;
 
-		/* the caller held the subflow bh socket lock */
-		lockdep_assert_in_softirq();
-
 		delegated = this_cpu_ptr(&mptcp_delegated_actions);
 		schedule = list_empty(&delegated->head);
 		list_add_tail(&subflow->delegated_node, &delegated->head);
@@ -533,16 +534,16 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
 
 static inline bool mptcp_subflow_has_delegated_action(const struct mptcp_subflow_context *subflow)
 {
-	return test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
+	return !!READ_ONCE(subflow->delegated_status);
 }
 
-static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow)
+static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow, int action)
 {
 	/* pairs with mptcp_subflow_delegate, ensures delegate_node is updated before
 	 * touching the status bit
 	 */
 	smp_wmb();
-	clear_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
+	clear_bit(action, &subflow->delegated_status);
 }
 
 int mptcp_is_enabled(const struct net *net);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net 0/2] mptcp: fix 3rd ack rtx timer
  2021-11-19 14:27 [PATCH net 0/2] mptcp: fix 3rd ack rtx timer Paolo Abeni
  2021-11-19 14:27 ` [PATCH net 1/2] mptcp: fix delack timer Paolo Abeni
  2021-11-19 14:27 ` [PATCH net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
@ 2021-11-19 15:46 ` David Miller
  2021-11-20 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2021-11-19 15:46 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, mptcp, edumazet

From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 19 Nov 2021 15:27:53 +0100

> Eric noted that the MPTCP code do the wrong thing to schedule
> the MPJ 3rd ack timer. He also provided a patch to address the
> issues (patch 1/2).
> 
> To fix for good the MPJ 3rd ack retransmission timer, we additionally
> need to set it after the current ack is transmitted (patch 2/2)
> 
> Note that the bug went unnotice so far because all the related
> tests required some running data transfer, and that causes
> MPTCP-level ack even on the opening MPJ subflow. We now have
> explicit packet drill coverage for this code path.

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net 0/2] mptcp: fix 3rd ack rtx timer
  2021-11-19 14:27 [PATCH net 0/2] mptcp: fix 3rd ack rtx timer Paolo Abeni
                   ` (2 preceding siblings ...)
  2021-11-19 15:46 ` [PATCH net 0/2] mptcp: fix 3rd ack rtx timer David Miller
@ 2021-11-20 14:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-20 14:30 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, mptcp, edumazet

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 19 Nov 2021 15:27:53 +0100 you wrote:
> Eric noted that the MPTCP code do the wrong thing to schedule
> the MPJ 3rd ack timer. He also provided a patch to address the
> issues (patch 1/2).
> 
> To fix for good the MPJ 3rd ack retransmission timer, we additionally
> need to set it after the current ack is transmitted (patch 2/2)
> 
> [...]

Here is the summary with links:
  - [net,1/2] mptcp: fix delack timer
    https://git.kernel.org/netdev/net/c/ee50e67ba0e1
  - [net,2/2] mptcp: use delegate action to schedule 3rd ack retrans
    https://git.kernel.org/netdev/net/c/bcd97734318d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-20 14:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-19 14:27 [PATCH net 0/2] mptcp: fix 3rd ack rtx timer Paolo Abeni
2021-11-19 14:27 ` [PATCH net 1/2] mptcp: fix delack timer Paolo Abeni
2021-11-19 14:27 ` [PATCH net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
2021-11-19 15:46 ` [PATCH net 0/2] mptcp: fix 3rd ack rtx timer David Miller
2021-11-20 14:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).