* [PATCH mptcp-net 1/3] mptcp: be sure to send ack when mptcp-level window re-opens
2025-01-10 16:43 [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes Paolo Abeni
@ 2025-01-10 16:43 ` Paolo Abeni
2025-01-10 16:43 ` [PATCH mptcp-net 2/3] mptcp: fix spurious wake-up on under memory pressure Paolo Abeni
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2025-01-10 16:43 UTC (permalink / raw)
To: mptcp
mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.
After commit ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/options.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index a62bc874bf1e..123f3f297284 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -607,7 +607,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
}
opts->ext_copy.use_ack = 1;
opts->suboptions = OPTION_MPTCP_DSS;
- WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
/* Add kind/length/subtype/flag overhead if mapping is not populated */
if (dss_size == 0)
@@ -1288,7 +1287,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
}
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT);
}
- return;
+ goto update_wspace;
}
if (rcv_wnd_new != rcv_wnd_old) {
@@ -1313,6 +1312,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
th->window = htons(new_win);
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED);
}
+
+update_wspace:
+ WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
}
__sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
--
2.47.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH mptcp-net 2/3] mptcp: fix spurious wake-up on under memory pressure.
2025-01-10 16:43 [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes Paolo Abeni
2025-01-10 16:43 ` [PATCH mptcp-net 1/3] mptcp: be sure to send ack when mptcp-level window re-opens Paolo Abeni
@ 2025-01-10 16:43 ` Paolo Abeni
2025-01-10 16:43 ` [PATCH mptcp-net 3/3] selftests: mptcp: avoid spurious errors on disconnect Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2025-01-10 16:43 UTC (permalink / raw)
To: mptcp
The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a93e661ef5c4..73526f1d768f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -760,10 +760,15 @@ static inline u64 mptcp_data_avail(const struct mptcp_sock *msk)
static inline bool mptcp_epollin_ready(const struct sock *sk)
{
+ u64 data_avail = mptcp_data_avail(mptcp_sk(sk));
+
+ if (!data_avail)
+ return false;
+
/* mptcp doesn't have to deal with small skbs in the receive queue,
- * at it can always coalesce them
+ * as it can always coalesce them
*/
- return (mptcp_data_avail(mptcp_sk(sk)) >= sk->sk_rcvlowat) ||
+ return (data_avail >= sk->sk_rcvlowat) ||
(mem_cgroup_sockets_enabled && sk->sk_memcg &&
mem_cgroup_under_socket_pressure(sk->sk_memcg)) ||
READ_ONCE(tcp_memory_pressure);
--
2.47.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH mptcp-net 3/3] selftests: mptcp: avoid spurious errors on disconnect
2025-01-10 16:43 [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes Paolo Abeni
2025-01-10 16:43 ` [PATCH mptcp-net 1/3] mptcp: be sure to send ack when mptcp-level window re-opens Paolo Abeni
2025-01-10 16:43 ` [PATCH mptcp-net 2/3] mptcp: fix spurious wake-up on under memory pressure Paolo Abeni
@ 2025-01-10 16:43 ` Paolo Abeni
2025-01-11 11:41 ` Matthieu Baerts
2025-01-10 17:55 ` [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes MPTCP CI
2025-01-11 11:39 ` Matthieu Baerts
4 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2025-01-10 16:43 UTC (permalink / raw)
To: mptcp
The disconnect test-case generates spurious errors:
NFO: disconnect
INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
(duration 140ms) [FAIL] file received by server does not match (in, out):
-rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
Trailing bytes are:
��\����R���!8��u2��5N%w------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
Trailing bytes are:
��\����R���!8��u2��5N%2 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ]
04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ]
[FAIL] Tests of the full disconnection have failed
Time: 2 seconds
The root cause is actually in the user-space bits: the test program
currently disconnects as soon as all the pending data has been spooled,
generating an FASTCLOSE. If such option reaches the peer before that
the latter has reached the closed status, the msk socket will report an
error to the user-space, as per protocol specification, causing the above
failure.
Address the issue explicitly waiting for all the relevant sockets to reach
a closed status before performing the disconnect.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
.../selftests/net/mptcp/mptcp_connect.c | 44 ++++++++++++++-----
1 file changed, 33 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 4209b9569039..5cfea1760431 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -25,6 +25,8 @@
#include <sys/types.h>
#include <sys/mman.h>
+#include <arpa/inet.h>
+
#include <netdb.h>
#include <netinet/in.h>
@@ -1211,23 +1213,43 @@ static void parse_setsock_options(const char *name)
exit(1);
}
-void xdisconnect(int fd, int addrlen)
+void xdisconnect(int fd)
{
- struct sockaddr_storage empty;
+ socklen_t addrlen = sizeof(struct sockaddr_storage);
+ struct sockaddr_storage addr, empty;
int msec_sleep = 10;
- int queued = 1;
- int i;
+ void *raw_addr;
+ int i, cmdlen;
+ char cmd[128];
+
+ /* get the local address and convert it to string */
+ if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0)
+ xerror("getsockname");
+
+ if (addr.ss_family == AF_INET)
+ raw_addr= &(((struct sockaddr_in *)&addr)->sin_addr);
+ else if (addr.ss_family == AF_INET6)
+ raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr);
+ else
+ xerror("bad family");
+
+ strcpy(cmd, "ss -M | grep ");
+ cmdlen = strlen(cmd);
+ if (!inet_ntop(addr.ss_family , raw_addr, &cmd[cmdlen],
+ sizeof(cmd) - cmdlen))
+ xerror("inet_ntop");
+ strcat(cmd, " >/dev/null");
shutdown(fd, SHUT_WR);
- /* while until the pending data is completely flushed, the later
+ /*
+ * wait until the pending data is completely flushed and all
+ * the MPTCP sockets reached the closed status.
* disconnect will bypass/ignore/drop any pending data.
*/
for (i = 0; ; i += msec_sleep) {
- if (ioctl(fd, SIOCOUTQ, &queued) < 0)
- xerror("can't query out socket queue: %d", errno);
-
- if (!queued)
+ /* closed socket are not listed by 'ss' */
+ if (system(cmd))
break;
if (i > poll_timeout)
@@ -1281,9 +1303,9 @@ int main_loop(void)
return ret;
if (cfg_truncate > 0) {
- xdisconnect(fd, peer->ai_addrlen);
+ xdisconnect(fd);
} else if (--cfg_repeat > 0) {
- xdisconnect(fd, peer->ai_addrlen);
+ xdisconnect(fd);
/* the socket could be unblocking at this point, we need the
* connect to be blocking
--
2.47.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH mptcp-net 3/3] selftests: mptcp: avoid spurious errors on disconnect
2025-01-10 16:43 ` [PATCH mptcp-net 3/3] selftests: mptcp: avoid spurious errors on disconnect Paolo Abeni
@ 2025-01-11 11:41 ` Matthieu Baerts
0 siblings, 0 replies; 9+ messages in thread
From: Matthieu Baerts @ 2025-01-11 11:41 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 10/01/2025 17:43, Paolo Abeni wrote:
> The disconnect test-case generates spurious errors:
>
> NFO: disconnect
> INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
> 01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
> (duration 140ms) [FAIL] file received by server does not match (in, out):
> -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
> Trailing bytes are:
> ��\����R���!8��u2��5N%w------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
> Trailing bytes are:
> ��\����R���!8��u2��5N%2 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
> 03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ]
> 04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ]
> [FAIL] Tests of the full disconnection have failed
> Time: 2 seconds
>
> The root cause is actually in the user-space bits: the test program
> currently disconnects as soon as all the pending data has been spooled,
> generating an FASTCLOSE. If such option reaches the peer before that
> the latter has reached the closed status, the msk socket will report an
> error to the user-space, as per protocol specification, causing the above
> failure.
Good catch!
> Address the issue explicitly waiting for all the relevant sockets to reach
> a closed status before performing the disconnect.
>
> Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> .../selftests/net/mptcp/mptcp_connect.c | 44 ++++++++++++++-----
> 1 file changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> index 4209b9569039..5cfea1760431 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> @@ -25,6 +25,8 @@
> #include <sys/types.h>
> #include <sys/mman.h>
>
> +#include <arpa/inet.h>
> +
> #include <netdb.h>
> #include <netinet/in.h>
>
> @@ -1211,23 +1213,43 @@ static void parse_setsock_options(const char *name)
> exit(1);
> }
>
> -void xdisconnect(int fd, int addrlen)
> +void xdisconnect(int fd)
> {
> - struct sockaddr_storage empty;
> + socklen_t addrlen = sizeof(struct sockaddr_storage);
> + struct sockaddr_storage addr, empty;
> int msec_sleep = 10;
> - int queued = 1;
> - int i;
> + void *raw_addr;
> + int i, cmdlen;
> + char cmd[128];
> +
> + /* get the local address and convert it to string */
> + if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0)
> + xerror("getsockname");
> +
> + if (addr.ss_family == AF_INET)
> + raw_addr= &(((struct sockaddr_in *)&addr)->sin_addr);
> + else if (addr.ss_family == AF_INET6)
> + raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr);
> + else
> + xerror("bad family");
> +
> + strcpy(cmd, "ss -M | grep ");
> + cmdlen = strlen(cmd);
> + if (!inet_ntop(addr.ss_family , raw_addr, &cmd[cmdlen],
> + sizeof(cmd) - cmdlen))
> + xerror("inet_ntop");
> + strcat(cmd, " >/dev/null");
Detail: we can use 'grep -q', so no need to add ">/dev/null".
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes
2025-01-10 16:43 [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes Paolo Abeni
` (2 preceding siblings ...)
2025-01-10 16:43 ` [PATCH mptcp-net 3/3] selftests: mptcp: avoid spurious errors on disconnect Paolo Abeni
@ 2025-01-10 17:55 ` MPTCP CI
2025-01-11 11:39 ` Matthieu Baerts
4 siblings, 0 replies; 9+ messages in thread
From: MPTCP CI @ 2025-01-10 17:55 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12713850045
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/69da18b21696
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=924334
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes
2025-01-10 16:43 [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes Paolo Abeni
` (3 preceding siblings ...)
2025-01-10 17:55 ` [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes MPTCP CI
@ 2025-01-11 11:39 ` Matthieu Baerts
2025-01-11 12:52 ` Matthieu Baerts
2025-01-13 8:35 ` Paolo Abeni
4 siblings, 2 replies; 9+ messages in thread
From: Matthieu Baerts @ 2025-01-11 11:39 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 10/01/2025 17:43, Paolo Abeni wrote:
> still under test locally. Sharing early for larger coverage.
> Among other things, still to evaluate if patch 1/3 increases the ack
> frequency "too much".
>
> Paolo Abeni (3):
> mptcp: be sure to send ack when mptcp-level window re-opens
> mptcp: fix spurious wake-up on under memory pressure.
> selftests: mptcp: avoid spurious errors on disconnect
Thank you very much for these fixes!
They look good to me and the CI, and the mptcp_connect.sh has
successfully run 1250 times on my side:
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes
2025-01-11 11:39 ` Matthieu Baerts
@ 2025-01-11 12:52 ` Matthieu Baerts
2025-01-13 8:35 ` Paolo Abeni
1 sibling, 0 replies; 9+ messages in thread
From: Matthieu Baerts @ 2025-01-11 12:52 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 11/01/2025 12:39, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 10/01/2025 17:43, Paolo Abeni wrote:
>> still under test locally. Sharing early for larger coverage.
>> Among other things, still to evaluate if patch 1/3 increases the ack
>> frequency "too much".
>>
>> Paolo Abeni (3):
>> mptcp: be sure to send ack when mptcp-level window re-opens
>> mptcp: fix spurious wake-up on under memory pressure.
>> selftests: mptcp: avoid spurious errors on disconnect
>
> Thank you very much for these fixes!
>
> They look good to me and the CI, and the mptcp_connect.sh has
> successfully run 1250 times on my side:
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Now in our tree (fixes for -net) with the small modification I suggested:
New patches for t/upstream-net and t/upstream:
- 6c4d6102b5d2: mptcp: be sure to send ack when mptcp-level window re-opens
- 7114896353d0: mptcp: fix spurious wake-up on under memory pressure
- f1255ad62bbd: selftests: mptcp: avoid spurious errors on disconnect
- Results: 33abddb5214c..17f91cba3085 (export-net)
- Results: dcd1bfd53763..17aabfa75062 (export)
Tests are now in progress:
- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/ce2b29666dcb17454f9c500eb2aa2c61889f4b01/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/b1747a5cf769c578348f5371b105ddb41d4f8b6f/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH mptcp-net 0/3] mptcp: a bunch of fixes for ST flakes
2025-01-11 11:39 ` Matthieu Baerts
2025-01-11 12:52 ` Matthieu Baerts
@ 2025-01-13 8:35 ` Paolo Abeni
1 sibling, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2025-01-13 8:35 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 1/11/25 12:39 PM, Matthieu Baerts wrote:
> On 10/01/2025 17:43, Paolo Abeni wrote:
>> still under test locally. Sharing early for larger coverage.
>> Among other things, still to evaluate if patch 1/3 increases the ack
>> frequency "too much".
>>
>> Paolo Abeni (3):
>> mptcp: be sure to send ack when mptcp-level window re-opens
>> mptcp: fix spurious wake-up on under memory pressure.
>> selftests: mptcp: avoid spurious errors on disconnect
>
> Thank you very much for these fixes!
>
> They look good to me and the CI, and the mptcp_connect.sh has
> successfully run 1250 times on my side:
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Great! thank you for the additional testing!
I suggest to send them to netdev soon, to hit -rc8 and possibly spend
some more time on upstream PW before.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread