* [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions @ 2021-10-28 13:08 Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni 0 siblings, 2 replies; 5+ messages in thread From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw) To: mptcp The first patch is from Eric, thus is correct ;) The 2nd one is much more doubtful, starting from the author ;) The individual commit messages should be descriptive enough. With these 2 patches the related, pending pktdrill test: https://github.com/multipath-tcp/packetdrill/pull/70 now complete successfully Eric Dumazet (1): mptcp: fix delack timer Paolo Abeni (1): mptcp: use delegate action to schedule 3rd ack retrans net/mptcp/options.c | 32 ++++++++-------------------- net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++-------- net/mptcp/protocol.h | 17 ++++++++------- 3 files changed, 59 insertions(+), 40 deletions(-) -- 2.26.3 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH mptcp-net 1/2] mptcp: fix delack timer 2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni @ 2021-10-28 13:08 ` Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni 1 sibling, 0 replies; 5+ messages in thread From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw) To: mptcp From: Eric Dumazet <edumazet@google.com> To compute the rtx timeout schedule_3rdack_retransmission() does multiple things in the wrong way: srtt_us is measures in usec/8 and the timeout itself is an absolute vale. Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") Signed-off-by: Eric Dumazet <edumazet@google.com> --- commit message typos added by me --- net/mptcp/options.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 422f4acfb3e6..465d4143d555 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -434,9 +434,10 @@ static void schedule_3rdack_retransmission(struct sock *sk) /* reschedule with a timeout above RTT, as we must look only for drop */ if (tp->srtt_us) - timeout = tp->srtt_us << 1; + timeout = usecs_to_jiffies(tp->srtt_us >> (3-1)); else timeout = TCP_TIMEOUT_INIT; + timeout += jiffies; WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER); icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER; -- 2.26.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans 2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni @ 2021-10-28 13:08 ` Paolo Abeni 2021-10-29 23:36 ` Mat Martineau 1 sibling, 1 reply; 5+ messages in thread From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw) To: mptcp Scheduling a delack in mptcp_established_options_mp() is not a good idea: such function is called by tcp_send_ack() and the pending delayed ack will be cleared shortly after by the tcp_event_ack_sent() call in __tcp_transmit_skb(). Instead use the mptcp delegated action infrastructure to schedule the delayed ack after the current bh processing completes. Additionally moves the schedule_3rdack_retransmission() helper into protocol.c to avoid making it visible in a different compilation unit. Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- side note: Why we already see 3rd ack retransimssion in pcap traces even if the delack timer was broken sice the beginning?!? I guess that is due to pure ack generated by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only checks for the TCP status and not the MPTCP one. When landing on MPJ subflows not already fully established, such data-ack will be converted in MPJ 3rd ack by the MPTCP output path/out options building, as the MPJ option has higher priority than DSS --- net/mptcp/options.c | 33 ++++++++--------------------- net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++-------- net/mptcp/protocol.h | 17 ++++++++------- 3 files changed, 59 insertions(+), 41 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 465d4143d555..12ffee025373 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, return false; } -/* MP_JOIN client subflow must wait for 4th ack before sending any data: - * TCP can't schedule delack timer before the subflow is fully established. - * MPTCP uses the delack timer to do 3rd ack retransmissions - */ -static void schedule_3rdack_retransmission(struct sock *sk) -{ - struct inet_connection_sock *icsk = inet_csk(sk); - struct tcp_sock *tp = tcp_sk(sk); - unsigned long timeout; - - /* reschedule with a timeout above RTT, as we must look only for drop */ - if (tp->srtt_us) - timeout = usecs_to_jiffies(tp->srtt_us >> (3-1)); - else - timeout = TCP_TIMEOUT_INIT; - timeout += jiffies; - - WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER); - icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER; - icsk->icsk_ack.timeout = timeout; - sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout); -} - static void clear_3rdack_retransmission(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); @@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb, *size = TCPOLEN_MPTCP_MPJ_ACK; pr_debug("subflow=%p", subflow); - schedule_3rdack_retransmission(sk); + /* we can use the full delegate action helper only from BH context + * If we are in process context - sk is flushing the backlog at + * socket lock release time - just set the appropriate flag, will + * be handled by the release callback + */ + if (sock_owned_by_user(sk)) + set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status); + else + mptcp_subflow_delegate(subflow, MPTCP_DELEGATE_ACK); return true; } return false; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 582f5f88d6ef..cd00f8998375 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1619,7 +1619,7 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk) if (!xmit_ssk) goto out; if (xmit_ssk != ssk) { - mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk)); + mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk), MPTCP_DELEGATE_SEND); goto out; } @@ -2972,7 +2972,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk) if (xmit_ssk == ssk) __mptcp_subflow_push_pending(sk, ssk); else if (xmit_ssk) - mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk)); + mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk), MPTCP_DELEGATE_SEND); } else { set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags); } @@ -3022,18 +3022,50 @@ static void mptcp_release_cb(struct sock *sk) __mptcp_update_rmem(sk); } +/* MP_JOIN client subflow must wait for 4th ack before sending any data: + * TCP can't schedule delack timer before the subflow is fully established. + * MPTCP uses the delack timer to do 3rd ack retransmissions + */ +static void schedule_3rdack_retransmission(struct sock *ssk) +{ + struct inet_connection_sock *icsk = inet_csk(ssk); + struct tcp_sock *tp = tcp_sk(ssk); + unsigned long timeout; + + if (mptcp_subflow_ctx(ssk)->fully_established) + return; + + /* reschedule with a timeout above RTT, as we must look only for drop */ + if (tp->srtt_us) + timeout = usecs_to_jiffies(tp->srtt_us >> (3-1)); + else + timeout = TCP_TIMEOUT_INIT; + timeout += jiffies; + + WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER); + icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER; + icsk->icsk_ack.timeout = timeout; + sk_reset_timer(ssk, &icsk->icsk_delack_timer, timeout); +} + void mptcp_subflow_process_delegated(struct sock *ssk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct sock *sk = subflow->conn; - mptcp_data_lock(sk); - if (!sock_owned_by_user(sk)) - __mptcp_subflow_push_pending(sk, ssk); - else - set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags); - mptcp_data_unlock(sk); - mptcp_subflow_delegated_done(subflow); + if (test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) { + mptcp_data_lock(sk); + if (!sock_owned_by_user(sk)) + __mptcp_subflow_push_pending(sk, ssk); + else + set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags); + mptcp_data_unlock(sk); + mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_SEND); + } + if (test_bit(MPTCP_DELEGATE_ACK, &subflow->delegated_status)) { + schedule_3rdack_retransmission(ssk); + mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_ACK); + } } static int mptcp_hash(struct sock *sk) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 46691acdea24..f6f4054e19ee 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -387,6 +387,7 @@ struct mptcp_delegated_action { DECLARE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions); #define MPTCP_DELEGATE_SEND 0 +#define MPTCP_DELEGATE_ACK 1 /* MPTCP subflow context */ struct mptcp_subflow_context { @@ -493,23 +494,23 @@ static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk, void mptcp_subflow_process_delegated(struct sock *ssk); -static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow) +static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow, int action) { struct mptcp_delegated_action *delegated; bool schedule; + /* the caller held the subflow bh socket lock */ + lockdep_assert_in_softirq(); + /* The implied barrier pairs with mptcp_subflow_delegated_done(), and * ensures the below list check sees list updates done prior to status * bit changes */ - if (!test_and_set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) { + if (!test_and_set_bit(action, &subflow->delegated_status)) { /* still on delegated list from previous scheduling */ if (!list_empty(&subflow->delegated_node)) return; - /* the caller held the subflow bh socket lock */ - lockdep_assert_in_softirq(); - delegated = this_cpu_ptr(&mptcp_delegated_actions); schedule = list_empty(&delegated->head); list_add_tail(&subflow->delegated_node, &delegated->head); @@ -534,16 +535,16 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated) static inline bool mptcp_subflow_has_delegated_action(const struct mptcp_subflow_context *subflow) { - return test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status); + return !!READ_ONCE(subflow->delegated_status); } -static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow) +static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow, int action) { /* pairs with mptcp_subflow_delegate, ensures delegate_node is updated before * touching the status bit */ smp_wmb(); - clear_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status); + clear_bit(action, &subflow->delegated_status); } int mptcp_is_enabled(const struct net *net); -- 2.26.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans 2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni @ 2021-10-29 23:36 ` Mat Martineau 2021-11-02 17:56 ` Paolo Abeni 0 siblings, 1 reply; 5+ messages in thread From: Mat Martineau @ 2021-10-29 23:36 UTC (permalink / raw) To: Paolo Abeni; +Cc: mptcp On Thu, 28 Oct 2021, Paolo Abeni wrote: > Scheduling a delack in mptcp_established_options_mp() is > not a good idea: such function is called by tcp_send_ack() and > the pending delayed ack will be cleared shortly after by the > tcp_event_ack_sent() call in __tcp_transmit_skb(). > > Instead use the mptcp delegated action infrastructure to > schedule the delayed ack after the current bh processing completes. > > Additionally moves the schedule_3rdack_retransmission() helper > into protocol.c to avoid making it visible in a different compilation > unit. > > Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > side note: > Why we already see 3rd ack retransimssion in pcap traces even if the delack timer > was broken sice the beginning?!? I guess that is due to pure ack generated > by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only > checks for the TCP status and not the MPTCP one. When landing on MPJ > subflows not already fully established, such data-ack will be converted in MPJ > 3rd ack by the MPTCP output path/out options building, as the MPJ option > has higher priority than DSS > --- > net/mptcp/options.c | 33 ++++++++--------------------- > net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++-------- > net/mptcp/protocol.h | 17 ++++++++------- > 3 files changed, 59 insertions(+), 41 deletions(-) > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c > index 465d4143d555..12ffee025373 100644 > --- a/net/mptcp/options.c > +++ b/net/mptcp/options.c > @@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, > return false; > } > > -/* MP_JOIN client subflow must wait for 4th ack before sending any data: > - * TCP can't schedule delack timer before the subflow is fully established. > - * MPTCP uses the delack timer to do 3rd ack retransmissions > - */ > -static void schedule_3rdack_retransmission(struct sock *sk) > -{ > - struct inet_connection_sock *icsk = inet_csk(sk); > - struct tcp_sock *tp = tcp_sk(sk); > - unsigned long timeout; > - > - /* reschedule with a timeout above RTT, as we must look only for drop */ > - if (tp->srtt_us) > - timeout = usecs_to_jiffies(tp->srtt_us >> (3-1)); > - else > - timeout = TCP_TIMEOUT_INIT; > - timeout += jiffies; > - > - WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER); > - icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER; > - icsk->icsk_ack.timeout = timeout; > - sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout); > -} > - > static void clear_3rdack_retransmission(struct sock *sk) > { > struct inet_connection_sock *icsk = inet_csk(sk); > @@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb, > *size = TCPOLEN_MPTCP_MPJ_ACK; > pr_debug("subflow=%p", subflow); > > - schedule_3rdack_retransmission(sk); > + /* we can use the full delegate action helper only from BH context > + * If we are in process context - sk is flushing the backlog at > + * socket lock release time - just set the appropriate flag, will > + * be handled by the release callback > + */ > + if (sock_owned_by_user(sk)) > + set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status); I believe you intended MPTCP_DELEGATE_ACK here > + else > + mptcp_subflow_delegate(subflow, MPTCP_DELEGATE_ACK); > return true; > } > return false; The rest looks good. -- Mat Martineau Intel ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans 2021-10-29 23:36 ` Mat Martineau @ 2021-11-02 17:56 ` Paolo Abeni 0 siblings, 0 replies; 5+ messages in thread From: Paolo Abeni @ 2021-11-02 17:56 UTC (permalink / raw) To: Mat Martineau; +Cc: mptcp On Fri, 2021-10-29 at 16:36 -0700, Mat Martineau wrote: > On Thu, 28 Oct 2021, Paolo Abeni wrote: > > > Scheduling a delack in mptcp_established_options_mp() is > > not a good idea: such function is called by tcp_send_ack() and > > the pending delayed ack will be cleared shortly after by the > > tcp_event_ack_sent() call in __tcp_transmit_skb(). > > > > Instead use the mptcp delegated action infrastructure to > > schedule the delayed ack after the current bh processing completes. > > > > Additionally moves the schedule_3rdack_retransmission() helper > > into protocol.c to avoid making it visible in a different compilation > > unit. > > > > Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > side note: > > Why we already see 3rd ack retransimssion in pcap traces even if the delack timer > > was broken sice the beginning?!? I guess that is due to pure ack generated > > by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only > > checks for the TCP status and not the MPTCP one. When landing on MPJ > > subflows not already fully established, such data-ack will be converted in MPJ > > 3rd ack by the MPTCP output path/out options building, as the MPJ option > > has higher priority than DSS > > --- > > net/mptcp/options.c | 33 ++++++++--------------------- > > net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++-------- > > net/mptcp/protocol.h | 17 ++++++++------- > > 3 files changed, 59 insertions(+), 41 deletions(-) > > > > diff --git a/net/mptcp/options.c b/net/mptcp/options.c > > index 465d4143d555..12ffee025373 100644 > > --- a/net/mptcp/options.c > > +++ b/net/mptcp/options.c > > @@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, > > return false; > > } > > > > -/* MP_JOIN client subflow must wait for 4th ack before sending any data: > > - * TCP can't schedule delack timer before the subflow is fully established. > > - * MPTCP uses the delack timer to do 3rd ack retransmissions > > - */ > > -static void schedule_3rdack_retransmission(struct sock *sk) > > -{ > > - struct inet_connection_sock *icsk = inet_csk(sk); > > - struct tcp_sock *tp = tcp_sk(sk); > > - unsigned long timeout; > > - > > - /* reschedule with a timeout above RTT, as we must look only for drop */ > > - if (tp->srtt_us) > > - timeout = usecs_to_jiffies(tp->srtt_us >> (3-1)); > > - else > > - timeout = TCP_TIMEOUT_INIT; > > - timeout += jiffies; > > - > > - WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER); > > - icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER; > > - icsk->icsk_ack.timeout = timeout; > > - sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout); > > -} > > - > > static void clear_3rdack_retransmission(struct sock *sk) > > { > > struct inet_connection_sock *icsk = inet_csk(sk); > > @@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb, > > *size = TCPOLEN_MPTCP_MPJ_ACK; > > pr_debug("subflow=%p", subflow); > > > > - schedule_3rdack_retransmission(sk); > > + /* we can use the full delegate action helper only from BH context > > + * If we are in process context - sk is flushing the backlog at > > + * socket lock release time - just set the appropriate flag, will > > + * be handled by the release callback > > + */ > > + if (sock_owned_by_user(sk)) > > + set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status); > > I believe you intended MPTCP_DELEGATE_ACK here Indeed! self-test did not catch the above because the mpj ssh is almost never owned by user before being fully-acked (and generally speaking every subflow has a very small probability of being owned by user) I'll send a new version, thanks! Paolo ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-11-02 17:56 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni 2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni 2021-10-29 23:36 ` Mat Martineau 2021-11-02 17:56 ` Paolo Abeni
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.