* [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking
@ 2022-04-11 10:40 Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 1/4] mptcp: really share subflow snd_wnd Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 10:40 UTC (permalink / raw)
To: mptcp
I've been chasing bad/unstable performances with multiple subflows
on very high speed links.
It looks like the root cause is due to the current mptcp-level
congestion window handling. There are apparently a few different
sub-issues:
- the rcv_wnd is not effectively shared on the tx side, as each
subflow takes in account only the value received by the underlaying
TCP connection. This is addressed in patch 1/4
- The mptcp-level offered wnd right edge is currently allowed to shrink.
Reading section 3.3.4.:
"""
The receive window is relative to the DATA_ACK. As in TCP, a
receiver MUST NOT shrink the right edge of the receive window (i.e.,
DATA_ACK + receive window). The receiver will use the data sequence
number to tell if a packet should be accepted at the connection
level.
"""
I read the above as we need to reflect window right-edge tracking
on the wire, see patch 3/4.
- The offered window right edge tracking can happen concurrently on
multiple subflows, but there is no mutex protection. We need an
additional atomic operation - still patch 3/4
This series additionally bump a few new MIBs to track all the above
(ensure/observe that the suspected races actually take place).
With this series tput in the critical scenario raises from ~26 Gbps
(ranging in 4-30 Gbps) to ~43 Gbps (with min > 33 Gbps)
I guess patch 3/4 is the most debatable - expecially for RFC compliance
Any feedback more then welcome!
Note: still in patch 3/4, I'm unsure that the th->window update is
strictly necessary from functional perspective (e.g. possibly the atomic
operation is enough), I'll try to test that, too.
Paolo Abeni (4):
mptcp: really share subflow snd_wnd
mptcp: add mib for xmit window sharing
mptcp: never shrink offered window
mptcp: add more offered MIBs counter.
include/net/mptcp.h | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/mptcp/mib.c | 4 +++
net/mptcp/mib.h | 6 +++++
net/mptcp/options.c | 61 +++++++++++++++++++++++++++++++++++++------
net/mptcp/protocol.c | 24 +++++++++++------
6 files changed, 81 insertions(+), 18 deletions(-)
--
2.35.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 1/4] mptcp: really share subflow snd_wnd
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
@ 2022-04-11 10:40 ` Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 2/4] mptcp: add mib for xmit window sharing Paolo Abeni
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 10:40 UTC (permalink / raw)
To: mptcp
As per RFC, mptcp subflows use a "shared" snd_wnd: the effective
window is the maximum among the current values received on all
subflows. Without such feature a data transfer using multiple
subflows could block.
Window sharing is currently implemented in the RX side:
__tcp_select_window uses the mptcp-level receive buffer to compute
the announced window.
That is not enough: the TCP stack will stick to the window size
received on the given subflow; we need to propagate the msk window
value on each subflow at xmit time.
Change the packet scheduler to ignore the subflow level window
and and use instead the msk level one
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b2c654992de0..9eb33681d00d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1141,19 +1141,20 @@ struct mptcp_sendmsg_info {
bool data_lock_held;
};
-static int mptcp_check_allowed_size(struct mptcp_sock *msk, u64 data_seq,
- int avail_size)
+static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *ssk,
+ u64 data_seq, int avail_size)
{
u64 window_end = mptcp_wnd_end(msk);
+ u64 mptcp_snd_wnd;
if (__mptcp_check_fallback(msk))
return avail_size;
- if (!before64(data_seq + avail_size, window_end)) {
- u64 allowed_size = window_end - data_seq;
+ mptcp_snd_wnd = window_end - data_seq;
+ avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
- return min_t(unsigned int, allowed_size, avail_size);
- }
+ if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+ tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
return avail_size;
}
@@ -1305,7 +1306,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
}
/* Zero window and all data acked? Probe. */
- copy = mptcp_check_allowed_size(msk, data_seq, copy);
+ copy = mptcp_check_allowed_size(msk, ssk, data_seq, copy);
if (copy == 0) {
u64 snd_una = READ_ONCE(msk->snd_una);
@@ -1498,11 +1499,16 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
* to check that subflow has a non empty cwin.
*/
ssk = send_info[SSK_MODE_ACTIVE].ssk;
- if (!ssk || !sk_stream_memory_free(ssk) || !tcp_sk(ssk)->snd_wnd)
+ if (!ssk || !sk_stream_memory_free(ssk))
return NULL;
- burst = min_t(int, MPTCP_SEND_BURST_SIZE, tcp_sk(ssk)->snd_wnd);
+ burst = min_t(int, MPTCP_SEND_BURST_SIZE, mptcp_wnd_end(msk) - msk->snd_nxt);
wmem = READ_ONCE(ssk->sk_wmem_queued);
+ if (!burst) {
+ msk->last_snd = NULL;
+ return ssk;
+ }
+
subflow = mptcp_subflow_ctx(ssk);
subflow->avg_pacing_rate = div_u64((u64)subflow->avg_pacing_rate * wmem +
READ_ONCE(ssk->sk_pacing_rate) * burst,
--
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 2/4] mptcp: add mib for xmit window sharing
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 1/4] mptcp: really share subflow snd_wnd Paolo Abeni
@ 2022-04-11 10:40 ` Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 10:40 UTC (permalink / raw)
To: mptcp
Bump a counter for counter when snd_wnd is shared among subflow,
for observability's sake.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/protocol.c | 4 +++-
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index d93a8c9996fd..6a6f8151375a 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -56,6 +56,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
+ SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
SNMP_MIB_SENTINEL
};
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 529d07af9e14..2411510bef66 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -49,6 +49,7 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_RCVPRUNED, /* Incoming packet dropped due to memory limit */
MPTCP_MIB_SUBFLOWSTALE, /* Subflows entered 'stale' status */
MPTCP_MIB_SUBFLOWRECOVER, /* Subflows returned to active status after being stale */
+ MPTCP_MIB_SNDWNDSHARED, /* Subflow snd wnd is overridden by msk's one */
__MPTCP_MIB_MAX
};
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9eb33681d00d..6ef530271845 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1153,8 +1153,10 @@ static int mptcp_check_allowed_size(const struct mptcp_sock *msk, struct sock *s
mptcp_snd_wnd = window_end - data_seq;
avail_size = min_t(unsigned int, mptcp_snd_wnd, avail_size);
- if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd))
+ if (unlikely(tcp_sk(ssk)->snd_wnd < mptcp_snd_wnd)) {
tcp_sk(ssk)->snd_wnd = min_t(u64, U32_MAX, mptcp_snd_wnd);
+ MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_SNDWNDSHARED);
+ }
return avail_size;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 3/4] mptcp: never shrink offered window
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 1/4] mptcp: really share subflow snd_wnd Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 2/4] mptcp: add mib for xmit window sharing Paolo Abeni
@ 2022-04-11 10:40 ` Paolo Abeni
2022-04-11 18:23 ` kernel test robot
` (2 more replies)
2022-04-11 10:40 ` [RFC PATCH 4/4] mptcp: add more offered MIBs counter Paolo Abeni
2022-04-11 13:49 ` [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
4 siblings, 3 replies; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 10:40 UTC (permalink / raw)
To: mptcp
As per RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the
above constraint on the wire.
Additionally, concurrent xmit on different subflows can end-up in
erroneous right edge update.
Address the above explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/net/mptcp.h | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/mptcp/options.c | 57 +++++++++++++++++++++++++++++++++++++------
3 files changed, 51 insertions(+), 10 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 877077b53200..5706d132713e 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
struct mptcp_out_options *opts);
bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(__be32 *ptr, struct tcp_sock *tp,
struct mptcp_out_options *opts);
void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c221f3bce975..edad33b9db11 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -444,7 +444,7 @@ struct tcp_out_options {
struct mptcp_out_options mptcp;
};
-static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+static void mptcp_options_write(__be32 *ptr, struct tcp_sock *tp,
struct tcp_out_options *opts)
{
#if IS_ENABLED(CONFIG_MPTCP)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index e05d9458a025..ffc0292da041 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1224,20 +1224,61 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
return true;
}
-static void mptcp_set_rwin(const struct tcp_sock *tp)
+static void mptcp_set_rwin(struct tcp_sock *tp, void *opt)
{
const struct sock *ssk = (const struct sock *)tp;
- const struct mptcp_subflow_context *subflow;
+ struct mptcp_subflow_context *subflow;
+ u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
struct mptcp_sock *msk;
- u64 ack_seq;
+ struct tcphdr *th;
+ u32 new_win;
+ u64 win;
subflow = mptcp_subflow_ctx(ssk);
msk = mptcp_sk(subflow->conn);
- ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+ ack_seq = READ_ONCE(msk->ack_seq);
+ rcv_wnd_new = ack_seq + tp->rcv_wnd;
+
+ rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
+ if (after64(rcv_wnd_new, rcv_wnd_old)) {
+ u64 rcv_wnd;
- if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
- WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+ for (;;) {
+ rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
+
+ if (rcv_wnd == rcv_wnd_old)
+ break;
+ if (before64(rcv_wnd_new, rcv_wnd))
+ goto raise_win;
+ rcv_wnd_old = rcv_wnd;
+ };
+ return;
+ }
+
+ if (rcv_wnd_new != rcv_wnd_old) {
+
+raise_win:
+ th = opt - sizeof(struct tcphdr);
+ win = rcv_wnd_old - ack_seq;
+ tp->rcv_wnd = min_t(u64, win, U32_MAX);
+ new_win = tp->rcv_wnd;
+
+ /* Make sure we do not exceed the maximum possible
+ * scaled window.
+ */
+ if (unlikely(th->syn))
+ new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
+ if (!tp->rx_opt.rcv_wscale &&
+ sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
+ new_win = min(new_win, MAX_TCP_WINDOW);
+ else
+ new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
+
+ /* RFC1323 scaling applied */
+ new_win >>= tp->rx_opt.rcv_wscale;
+ th->window = htons(new_win);
+ }
}
u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
@@ -1265,7 +1306,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
~csum_unfold(mpext->csum));
}
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(__be32 *ptr, struct tcp_sock *tp,
struct mptcp_out_options *opts)
{
const struct sock *ssk = (const struct sock *)tp;
@@ -1554,7 +1595,7 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
}
if (tp)
- mptcp_set_rwin(tp);
+ mptcp_set_rwin(tp, opts);
}
__be32 mptcp_get_reset_option(const struct sk_buff *skb)
--
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 4/4] mptcp: add more offered MIBs counter.
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
` (2 preceding siblings ...)
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
@ 2022-04-11 10:40 ` Paolo Abeni
2022-04-11 11:58 ` mptcp: add more offered MIBs counter.: Tests Results MPTCP CI
2022-04-11 13:49 ` [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 10:40 UTC (permalink / raw)
To: mptcp
Track the exceptional handling of MPTCP-level offered window
with a few more counters for observability.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/mib.c | 3 +++
net/mptcp/mib.h | 5 +++++
net/mptcp/options.c | 6 +++++-
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 6a6f8151375a..0dac2863c6e1 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -57,6 +57,9 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE),
SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER),
SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED),
+ SNMP_MIB_ITEM("RcvWndShared", MPTCP_MIB_RCVWNDSHARED),
+ SNMP_MIB_ITEM("RcvWndConflictUpdate", MPTCP_MIB_RCVWNDCONFLICTUPDATE),
+ SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
SNMP_MIB_SENTINEL
};
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 2411510bef66..e05428838005 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -50,6 +50,11 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_SUBFLOWSTALE, /* Subflows entered 'stale' status */
MPTCP_MIB_SUBFLOWRECOVER, /* Subflows returned to active status after being stale */
MPTCP_MIB_SNDWNDSHARED, /* Subflow snd wnd is overridden by msk's one */
+ MPTCP_MIB_RCVWNDSHARED, /* Subflow rcv wnd is overridden by msk's one */
+ MPTCP_MIB_RCVWNDCONFLICTUPDATE, /* subflow rcv wnd is overridden by msk's one due to
+ * conflict with another subflow while updating msk rcv wnd
+ */
+ MPTCP_MIB_RCVWNDCONFLICT, /* Conflict with another subflow while updating msk rcv wnd */
__MPTCP_MIB_MAX
};
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index ffc0292da041..dc84f39ff9f9 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1249,8 +1249,11 @@ static void mptcp_set_rwin(struct tcp_sock *tp, void *opt)
if (rcv_wnd == rcv_wnd_old)
break;
- if (before64(rcv_wnd_new, rcv_wnd))
+ if (before64(rcv_wnd_new, rcv_wnd)) {
+ MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICTUPDATE);
goto raise_win;
+ }
+ MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT);
rcv_wnd_old = rcv_wnd;
};
return;
@@ -1278,6 +1281,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, void *opt)
/* RFC1323 scaling applied */
new_win >>= tp->rx_opt.rcv_wscale;
th->window = htons(new_win);
+ MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED);
}
}
--
2.35.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: mptcp: add more offered MIBs counter.: Tests Results
2022-04-11 10:40 ` [RFC PATCH 4/4] mptcp: add more offered MIBs counter Paolo Abeni
@ 2022-04-11 11:58 ` MPTCP CI
0 siblings, 0 replies; 10+ messages in thread
From: MPTCP CI @ 2022-04-11 11:58 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal:
- Success! ✅:
- Task: https://cirrus-ci.com/task/6597949374332928
- Summary: https://api.cirrus-ci.com/v1/artifact/task/6597949374332928/summary/summary.txt
- KVM Validation: debug:
- Critical: KMemLeak ❌:
- Task: https://cirrus-ci.com/task/4768362025713664
- Summary: https://api.cirrus-ci.com/v1/artifact/task/4768362025713664/summary/summary.txt
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/3b2bbb4557d7
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-debug
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
` (3 preceding siblings ...)
2022-04-11 10:40 ` [RFC PATCH 4/4] mptcp: add more offered MIBs counter Paolo Abeni
@ 2022-04-11 13:49 ` Paolo Abeni
4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2022-04-11 13:49 UTC (permalink / raw)
To: mptcp
On Mon, 2022-04-11 at 12:40 +0200, Paolo Abeni wrote:
> Note: still in patch 3/4, I'm unsure that the th->window update is
> strictly necessary from functional perspective (e.g. possibly the atomic
> operation is enough), I'll try to test that, too.
A couple more notes:
- it looks like the th->window update is necessary, a test run without
the relevant chunk produced significantly less good results.
- this is still (supriese, surprise!) not perfect, I can observe
low_but_not_zero count of TcpExtTCPZeroWindowDrop, which instead should
be 0 without evil middle-box (and is 0 with plain TCP)
- I'm wondering if we should additionally add another mib counter for
MPTCPWANTZEROWINDOWADV - quite similar to TCPWANTZEROWINDOWADV
Cheers,
Paolo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/4] mptcp: never shrink offered window
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
@ 2022-04-11 18:23 ` kernel test robot
2022-04-11 19:46 ` kernel test robot
2022-04-15 0:39 ` Mat Martineau
2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2022-04-11 18:23 UTC (permalink / raw)
To: Paolo Abeni; +Cc: llvm, kbuild-all
Hi Paolo,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on mptcp/export]
[also build test ERROR on linus/master v5.18-rc2 next-20220411]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-improve-mptcp-level-window-tracking/20220411-184144
base: https://github.com/multipath-tcp/mptcp_net-next.git export
config: hexagon-defconfig (https://download.01.org/0day-ci/archive/20220412/202204120247.mC6ZORgL-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project c6e83f560f06cdfe8aa47b248d8bdc58f947274b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/4cd25cbca4acc18a428817e9b1adc5ddd131422b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Paolo-Abeni/mptcp-improve-mptcp-level-window-tracking/20220411-184144
git checkout 4cd25cbca4acc18a428817e9b1adc5ddd131422b
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash net/mptcp/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
net/mptcp/options.c:554:21: warning: parameter 'remaining' set but not used [-Wunused-but-set-parameter]
unsigned int remaining,
^
>> net/mptcp/options.c:1248:14: error: implicit declaration of function 'arch_cmpxchg64' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
^
include/linux/atomic/atomic-instrumented.h:1946:2: note: expanded from macro 'cmpxchg64'
arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
^
1 warning and 1 error generated.
vim +/arch_cmpxchg64 +1248 net/mptcp/options.c
1226
1227 static void mptcp_set_rwin(struct tcp_sock *tp, void *opt)
1228 {
1229 const struct sock *ssk = (const struct sock *)tp;
1230 struct mptcp_subflow_context *subflow;
1231 u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
1232 struct mptcp_sock *msk;
1233 struct tcphdr *th;
1234 u32 new_win;
1235 u64 win;
1236
1237 subflow = mptcp_subflow_ctx(ssk);
1238 msk = mptcp_sk(subflow->conn);
1239
1240 ack_seq = READ_ONCE(msk->ack_seq);
1241 rcv_wnd_new = ack_seq + tp->rcv_wnd;
1242
1243 rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
1244 if (after64(rcv_wnd_new, rcv_wnd_old)) {
1245 u64 rcv_wnd;
1246
1247 for (;;) {
> 1248 rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
1249
1250 if (rcv_wnd == rcv_wnd_old)
1251 break;
1252 if (before64(rcv_wnd_new, rcv_wnd))
1253 goto raise_win;
1254 rcv_wnd_old = rcv_wnd;
1255 };
1256 return;
1257 }
1258
1259 if (rcv_wnd_new != rcv_wnd_old) {
1260
1261 raise_win:
1262 th = opt - sizeof(struct tcphdr);
1263 win = rcv_wnd_old - ack_seq;
1264 tp->rcv_wnd = min_t(u64, win, U32_MAX);
1265 new_win = tp->rcv_wnd;
1266
1267 /* Make sure we do not exceed the maximum possible
1268 * scaled window.
1269 */
1270 if (unlikely(th->syn))
1271 new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
1272 if (!tp->rx_opt.rcv_wscale &&
1273 sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
1274 new_win = min(new_win, MAX_TCP_WINDOW);
1275 else
1276 new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
1277
1278 /* RFC1323 scaling applied */
1279 new_win >>= tp->rx_opt.rcv_wscale;
1280 th->window = htons(new_win);
1281 }
1282 }
1283
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/4] mptcp: never shrink offered window
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
2022-04-11 18:23 ` kernel test robot
@ 2022-04-11 19:46 ` kernel test robot
2022-04-15 0:39 ` Mat Martineau
2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2022-04-11 19:46 UTC (permalink / raw)
To: Paolo Abeni; +Cc: llvm, kbuild-all
Hi Paolo,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on mptcp/export]
[also build test ERROR on linus/master v5.18-rc2 next-20220411]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-improve-mptcp-level-window-tracking/20220411-184144
base: https://github.com/multipath-tcp/mptcp_net-next.git export
config: arm-buildonly-randconfig-r006-20220411 (https://download.01.org/0day-ci/archive/20220412/202204120334.WqEBThyK-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project c6e83f560f06cdfe8aa47b248d8bdc58f947274b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/intel-lab-lkp/linux/commit/4cd25cbca4acc18a428817e9b1adc5ddd131422b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Paolo-Abeni/mptcp-improve-mptcp-level-window-tracking/20220411-184144
git checkout 4cd25cbca4acc18a428817e9b1adc5ddd131422b
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash net/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
net/mptcp/options.c:554:21: warning: parameter 'remaining' set but not used [-Wunused-but-set-parameter]
unsigned int remaining,
^
In file included from net/mptcp/options.c:9:
In file included from include/linux/kernel.h:22:
In file included from include/linux/bitops.h:33:
In file included from arch/arm/include/asm/bitops.h:243:
In file included from include/asm-generic/bitops/lock.h:5:
In file included from include/linux/atomic.h:7:
In file included from arch/arm/include/asm/atomic.h:16:
>> arch/arm/include/asm/cmpxchg.h:254:1: error: instruction requires: !armv*m
"1: ldrexd %1, %H1, [%3]\n"
^
<inline asm>:1:5: note: instantiated into assembly here
1: ldrexd r8, r9, [r11]
^
In file included from net/mptcp/options.c:9:
In file included from include/linux/kernel.h:22:
In file included from include/linux/bitops.h:33:
In file included from arch/arm/include/asm/bitops.h:243:
In file included from include/asm-generic/bitops/lock.h:5:
In file included from include/linux/atomic.h:7:
In file included from arch/arm/include/asm/atomic.h:16:
arch/arm/include/asm/cmpxchg.h:258:2: error: instruction requires: !armv*m
" strexd %0, %5, %H5, [%3]\n"
^
<inline asm>:5:2: note: instantiated into assembly here
strexd r0, r4, r5, [r11]
^
1 warning and 2 errors generated.
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for DRM_GEM_SHMEM_HELPER
Depends on HAS_IOMEM && DRM && MMU
Selected by
- DRM_SSD130X && HAS_IOMEM && DRM
vim +254 arch/arm/include/asm/cmpxchg.h
e001bbae7147b1 Russell King 2015-05-26 243
2523c67bb6962f Will Deacon 2013-10-09 244 static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
2523c67bb6962f Will Deacon 2013-10-09 245 unsigned long long old,
2523c67bb6962f Will Deacon 2013-10-09 246 unsigned long long new)
2523c67bb6962f Will Deacon 2013-10-09 247 {
2523c67bb6962f Will Deacon 2013-10-09 248 unsigned long long oldval;
2523c67bb6962f Will Deacon 2013-10-09 249 unsigned long res;
2523c67bb6962f Will Deacon 2013-10-09 250
c32ffce0f66e5d Will Deacon 2014-02-21 251 prefetchw(ptr);
c32ffce0f66e5d Will Deacon 2014-02-21 252
2523c67bb6962f Will Deacon 2013-10-09 253 __asm__ __volatile__(
2523c67bb6962f Will Deacon 2013-10-09 @254 "1: ldrexd %1, %H1, [%3]\n"
2523c67bb6962f Will Deacon 2013-10-09 255 " teq %1, %4\n"
2523c67bb6962f Will Deacon 2013-10-09 256 " teqeq %H1, %H4\n"
2523c67bb6962f Will Deacon 2013-10-09 257 " bne 2f\n"
2523c67bb6962f Will Deacon 2013-10-09 258 " strexd %0, %5, %H5, [%3]\n"
2523c67bb6962f Will Deacon 2013-10-09 259 " teq %0, #0\n"
2523c67bb6962f Will Deacon 2013-10-09 260 " bne 1b\n"
2523c67bb6962f Will Deacon 2013-10-09 261 "2:"
2523c67bb6962f Will Deacon 2013-10-09 262 : "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
2523c67bb6962f Will Deacon 2013-10-09 263 : "r" (ptr), "r" (old), "r" (new)
2523c67bb6962f Will Deacon 2013-10-09 264 : "cc");
2523c67bb6962f Will Deacon 2013-10-09 265
2523c67bb6962f Will Deacon 2013-10-09 266 return oldval;
2523c67bb6962f Will Deacon 2013-10-09 267 }
2523c67bb6962f Will Deacon 2013-10-09 268
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/4] mptcp: never shrink offered window
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
2022-04-11 18:23 ` kernel test robot
2022-04-11 19:46 ` kernel test robot
@ 2022-04-15 0:39 ` Mat Martineau
2 siblings, 0 replies; 10+ messages in thread
From: Mat Martineau @ 2022-04-15 0:39 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
On Mon, 11 Apr 2022, Paolo Abeni wrote:
> As per RFC, the offered MPTCP-level window should never shrink.
> While we currently track the right edge, we don't enforce the
> above constraint on the wire.
> Additionally, concurrent xmit on different subflows can end-up in
> erroneous right edge update.
> Address the above explicitly updating the announced window and
> protecting the update with an additional atomic operation (sic)
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/net/mptcp.h | 2 +-
> net/ipv4/tcp_output.c | 2 +-
> net/mptcp/options.c | 57 +++++++++++++++++++++++++++++++++++++------
> 3 files changed, 51 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 877077b53200..5706d132713e 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -125,7 +125,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> struct mptcp_out_options *opts);
> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>
> -void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> +void mptcp_write_options(__be32 *ptr, struct tcp_sock *tp,
> struct mptcp_out_options *opts);
>
> void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index c221f3bce975..edad33b9db11 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -444,7 +444,7 @@ struct tcp_out_options {
> struct mptcp_out_options mptcp;
> };
>
> -static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
> +static void mptcp_options_write(__be32 *ptr, struct tcp_sock *tp,
> struct tcp_out_options *opts)
> {
> #if IS_ENABLED(CONFIG_MPTCP)
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index e05d9458a025..ffc0292da041 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -1224,20 +1224,61 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
> return true;
> }
>
> -static void mptcp_set_rwin(const struct tcp_sock *tp)
> +static void mptcp_set_rwin(struct tcp_sock *tp, void *opt)
> {
> const struct sock *ssk = (const struct sock *)tp;
> - const struct mptcp_subflow_context *subflow;
> + struct mptcp_subflow_context *subflow;
> + u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
> struct mptcp_sock *msk;
> - u64 ack_seq;
> + struct tcphdr *th;
> + u32 new_win;
> + u64 win;
>
> subflow = mptcp_subflow_ctx(ssk);
> msk = mptcp_sk(subflow->conn);
>
> - ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
> + ack_seq = READ_ONCE(msk->ack_seq);
> + rcv_wnd_new = ack_seq + tp->rcv_wnd;
> +
> + rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
> + if (after64(rcv_wnd_new, rcv_wnd_old)) {
> + u64 rcv_wnd;
>
> - if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
> - WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
> + for (;;) {
> + rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
> +
> + if (rcv_wnd == rcv_wnd_old)
> + break;
> + if (before64(rcv_wnd_new, rcv_wnd))
> + goto raise_win;
> + rcv_wnd_old = rcv_wnd;
> + };
> + return;
> + }
> +
> + if (rcv_wnd_new != rcv_wnd_old) {
> +
> +raise_win:
> + th = opt - sizeof(struct tcphdr);
'opt' is a pointer to a struct mptcp_out_options, not a pointer to the
option bytes in a tcp header... so this ends up writing somewhere
unexpected on the stack.
'ptr' (in mptcp_write_options()) is somewhere in the TCP header, but we
don't know how much option space was consumed before mptcp_write_options()
was called. Looks like a copy of the original ptr passed in to
tcp_options_write() (or, even better, an actual tcphdr pointer!) needs to
be plumbed in to this function.
- Mat
> + win = rcv_wnd_old - ack_seq;
> + tp->rcv_wnd = min_t(u64, win, U32_MAX);
> + new_win = tp->rcv_wnd;
> +
> + /* Make sure we do not exceed the maximum possible
> + * scaled window.
> + */
> + if (unlikely(th->syn))
> + new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
> + if (!tp->rx_opt.rcv_wscale &&
> + sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
> + new_win = min(new_win, MAX_TCP_WINDOW);
> + else
> + new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
> +
> + /* RFC1323 scaling applied */
> + new_win >>= tp->rx_opt.rcv_wscale;
> + th->window = htons(new_win);
> + }
> }
>
> u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
> @@ -1265,7 +1306,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
> ~csum_unfold(mpext->csum));
> }
>
> -void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> +void mptcp_write_options(__be32 *ptr, struct tcp_sock *tp,
> struct mptcp_out_options *opts)
> {
> const struct sock *ssk = (const struct sock *)tp;
> @@ -1554,7 +1595,7 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> }
>
> if (tp)
> - mptcp_set_rwin(tp);
> + mptcp_set_rwin(tp, opts);
> }
>
> __be32 mptcp_get_reset_option(const struct sk_buff *skb)
> --
> 2.35.1
>
>
>
--
Mat Martineau
Intel
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-04-15 0:39 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-11 10:40 [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 1/4] mptcp: really share subflow snd_wnd Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 2/4] mptcp: add mib for xmit window sharing Paolo Abeni
2022-04-11 10:40 ` [RFC PATCH 3/4] mptcp: never shrink offered window Paolo Abeni
2022-04-11 18:23 ` kernel test robot
2022-04-11 19:46 ` kernel test robot
2022-04-15 0:39 ` Mat Martineau
2022-04-11 10:40 ` [RFC PATCH 4/4] mptcp: add more offered MIBs counter Paolo Abeni
2022-04-11 11:58 ` mptcp: add more offered MIBs counter.: Tests Results MPTCP CI
2022-04-11 13:49 ` [RFC PATCH 0/4] mptcp: improve mptcp-level window tracking Paolo Abeni
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.