* [PATCH mptcp-next 0/4] mptcp: add some more diag info
@ 2023-05-04 16:39 Paolo Abeni
2023-05-04 16:39 ` [PATCH mptcp-next 1/4] mptcp: add subflow unique id Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Paolo Abeni @ 2023-05-04 16:39 UTC (permalink / raw)
To: mptcp
This is a follow up to the topic discussed in recent pubblic mtg.
Introduces unique id for accurate subflow stats tracking and
aggregate mptcp counters, plus some minimal self-tests.
The tests themself do not take in account support for running on
older kernel.
This is on top of "selftests: mptcp: centralize stats dumping" -
that is should land not on top of the current export branch head.
That in turn will cause some non trivial conflicts with:
"mptcp: use get_retrans wrapper"
the resolution should like the following (for brevity only reporting
the lines affected by the conflict resolution)
---
@@ -2574,16 +2569,17 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
static void __mptcp_retrans(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ struct mptcp_subflow_context *subflow;
struct mptcp_sendmsg_info info = {};
struct mptcp_data_frag *dfrag;
- size_t copied = 0;
+ u16 already_sent, len = 0;
struct sock *ssk;
- int ret;
+ int ret, err;
// ...
@@ -2602,32 +2598,48 @@ static void __mptcp_retrans(struct sock *sk)
// ...
- release_sock(ssk);
+ already_send = max(dfrag->already_sent, len);
+ msk->bytes_retrans += already_sent - dfrag->alread_sent;
+ dfrag->already_sent = already_sent;
// ...
Paolo Abeni (4):
mptcp: add subflow unique id
mptcp: move snd_una update earlier for fallback socket.
mptcp: track some aggregate data counters.
selftests: mptcp: explicitly tests aggregate countes
include/uapi/linux/mptcp.h | 8 +++++++
net/mptcp/options.c | 14 ++++++++++-
net/mptcp/protocol.c | 20 ++++++++++------
net/mptcp/protocol.h | 9 +++++++-
net/mptcp/sockopt.c | 23 +++++++++++++++----
.../selftests/net/mptcp/mptcp_sockopt.c | 20 ++++++++++++++++
6 files changed, 80 insertions(+), 14 deletions(-)
--
2.40.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH mptcp-next 1/4] mptcp: add subflow unique id
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
@ 2023-05-04 16:39 ` Paolo Abeni
2023-05-17 14:25 ` Matthieu Baerts
2023-05-04 16:40 ` [PATCH mptcp-next 2/4] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2023-05-04 16:39 UTC (permalink / raw)
To: mptcp
The user-space need to preperly account the data received/sent by
individual subflows. When additional subflows are created and/or
closed during the MPTCP socket lifetime, the information currently
exposed via MPTCP_TCPINFO are not enough: subflows are identifed only
by the sequential position inside the info dumps, and that will change
with the above mentioned events.
To solve the above problem, this patch introduces a new subflow identifier
that is unique inside the given mptcp socket scope.
The initial subflow get the id 1 and the other subflows get incremental
values at join time.
The identifier is exposed to user-space overloading tcpi_fackets in
MPTCP_TCPINFO.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/uapi/linux/mptcp.h | 3 +++
net/mptcp/protocol.c | 3 +++
net/mptcp/protocol.h | 5 ++++-
net/mptcp/sockopt.c | 1 +
4 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index 32af2d278cb4..49bfbf85cb80 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -249,4 +249,7 @@ struct mptcp_subflow_addrs {
#define MPTCP_TCPINFO 2
#define MPTCP_SUBFLOW_ADDRS 3
+/* MPTCP_TCPINFO overload obsoleted tcp fields with MPTCP-specific info */
+#define tcpi_subflow_id tcpi_fackets
+
#endif /* _UAPI_MPTCP_H */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2d331b2d62b7..7f5c89315aad 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -96,6 +96,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
list_add(&subflow->node, &msk->conn_list);
sock_hold(ssock->sk);
subflow->request_mptcp = 1;
+ subflow->subflow_id = msk->subflow_id++;
/* This is the first subflow, always with id 0 */
subflow->local_id_valid = 1;
@@ -838,6 +839,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
if (sk->sk_socket && !ssk->sk_socket)
mptcp_sock_graft(ssk, sk->sk_socket);
+ mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
return true;
}
@@ -2726,6 +2728,7 @@ static int __mptcp_init_sock(struct sock *sk)
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
WRITE_ONCE(msk->allow_infinite_fallback, true);
msk->recovery = false;
+ msk->subflow_id = 1;
mptcp_pm_data_init(msk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 2d7b2c80a164..babf1230c84d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -316,7 +316,8 @@ struct mptcp_sock {
u64 rtt_us; /* last maximum rtt of subflows */
} rcvq_space;
- u32 setsockopt_seq;
+ u32 subflow_id;
+ u32 setsockopt_seq;
char ca_name[TCP_CA_NAME_MAX];
struct mptcp_sock *dl_next;
};
@@ -497,6 +498,8 @@ struct mptcp_subflow_context {
u8 reset_reason:4;
u8 stale_count;
+ u32 subflow_id;
+
long delegated_status;
unsigned long fail_tout;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index d4258869ac48..d2aa02e28917 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1032,6 +1032,7 @@ static int mptcp_getsockopt_tcpinfo(struct mptcp_sock *msk, char __user *optval,
struct tcp_info info;
tcp_get_info(ssk, &info);
+ info.tcpi_subflow_id = subflow->subflow_id;
if (copy_to_user(infoptr, &info, sfd.size_user)) {
release_sock(sk);
--
2.40.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH mptcp-next 2/4] mptcp: move snd_una update earlier for fallback socket.
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
2023-05-04 16:39 ` [PATCH mptcp-next 1/4] mptcp: add subflow unique id Paolo Abeni
@ 2023-05-04 16:40 ` Paolo Abeni
2023-05-04 16:40 ` [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2023-05-04 16:40 UTC (permalink / raw)
To: mptcp
That will avoid an unneeded conditional in both the fast-path
and in the fallback case and will simplify a bit the next
patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/options.c | 6 ++++++
net/mptcp/protocol.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a8083207be4..4bdcd2b326bd 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1119,6 +1119,12 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
mptcp_data_lock(subflow->conn);
if (sk_stream_memory_free(sk))
__mptcp_check_push(subflow->conn, sk);
+
+ /* on fallback we just need to ignore the msk-level snd_una, as
+ * this is really plain TCP
+ */
+ msk->snd_una = READ_ONCE(msk->snd_nxt);
+
__mptcp_data_acked(subflow->conn);
mptcp_data_unlock(subflow->conn);
return true;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7f5c89315aad..8d674aef2f1e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -998,12 +998,6 @@ static void __mptcp_clean_una(struct sock *sk)
struct mptcp_data_frag *dtmp, *dfrag;
u64 snd_una;
- /* on fallback we just need to ignore snd_una, as this is really
- * plain TCP
- */
- if (__mptcp_check_fallback(msk))
- msk->snd_una = READ_ONCE(msk->snd_nxt);
-
snd_una = msk->snd_una;
list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) {
if (after64(dfrag->data_seq + dfrag->data_len, snd_una))
--
2.40.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters.
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
2023-05-04 16:39 ` [PATCH mptcp-next 1/4] mptcp: add subflow unique id Paolo Abeni
2023-05-04 16:40 ` [PATCH mptcp-next 2/4] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
@ 2023-05-04 16:40 ` Paolo Abeni
2023-05-17 14:26 ` Matthieu Baerts
2023-05-04 16:40 ` [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes Paolo Abeni
2023-05-17 14:24 ` [PATCH mptcp-next 0/4] mptcp: add some more diag info Matthieu Baerts
4 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2023-05-04 16:40 UTC (permalink / raw)
To: mptcp
Currently there are no data transfer counters accounting for all
the subflows used by a given MPTCP socket. The user-space can compute
such figures aggregating the subflow info, but that is inaccurate
if any subflow is closed before the MPTCP socket itself.
Add the new counters in the MPTCP socket itself and expose them
via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
acquire the relevant locks before fetching the msk data, to ensure
better data consistency
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/uapi/linux/mptcp.h | 5 +++++
net/mptcp/options.c | 10 ++++++++--
net/mptcp/protocol.c | 11 ++++++++++-
net/mptcp/protocol.h | 4 ++++
net/mptcp/sockopt.c | 22 +++++++++++++++++-----
5 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index 49bfbf85cb80..868a77434ec5 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -123,6 +123,11 @@ struct mptcp_info {
__u8 mptcpi_local_addr_used;
__u8 mptcpi_local_addr_max;
__u8 mptcpi_csum_enabled;
+ __u32 mptcpi_retransmits;
+ __u64 mptcpi_bytes_retrans;
+ __u64 mptcpi_bytes_sent;
+ __u64 mptcpi_bytes_received;
+ __u64 mptcpi_bytes_acked;
};
/*
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4bdcd2b326bd..c254accb14de 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1026,6 +1026,12 @@ u64 __mptcp_expand_seq(u64 old_seq, u64 cur_seq)
return cur_seq;
}
+static void __mptcp_snd_una_update(struct mptcp_sock *msk, u64 new_snd_una)
+{
+ msk->bytes_acked += new_snd_una - msk->snd_una;
+ msk->snd_una = new_snd_una;
+}
+
static void ack_update_msk(struct mptcp_sock *msk,
struct sock *ssk,
struct mptcp_options_received *mp_opt)
@@ -1057,7 +1063,7 @@ static void ack_update_msk(struct mptcp_sock *msk,
__mptcp_check_push(sk, ssk);
if (after64(new_snd_una, old_snd_una)) {
- msk->snd_una = new_snd_una;
+ __mptcp_snd_una_update(msk, new_snd_una);
__mptcp_data_acked(sk);
}
mptcp_data_unlock(sk);
@@ -1123,7 +1129,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
/* on fallback we just need to ignore the msk-level snd_una, as
* this is really plain TCP
*/
- msk->snd_una = READ_ONCE(msk->snd_nxt);
+ __mptcp_snd_una_update(msk, READ_ONCE(msk->snd_nxt));
__mptcp_data_acked(subflow->conn);
mptcp_data_unlock(subflow->conn);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8d674aef2f1e..312bc8e32e93 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -378,6 +378,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
/* in sequence */
+ msk->bytes_received += copy_len;
WRITE_ONCE(msk->ack_seq, msk->ack_seq + copy_len);
tail = skb_peek_tail(&sk->sk_receive_queue);
if (tail && mptcp_try_coalesce(sk, tail, skb))
@@ -761,6 +762,7 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
MPTCP_SKB_CB(skb)->map_seq += delta;
__skb_queue_tail(&sk->sk_receive_queue, skb);
}
+ msk->bytes_received += end_seq - msk->ack_seq;
msk->ack_seq = end_seq;
moved = true;
}
@@ -1525,8 +1527,10 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
* that has been handed to the subflow for transmission
* and skip update in case it was old dfrag.
*/
- if (likely(after64(snd_nxt_new, msk->snd_nxt)))
+ if (likely(after64(snd_nxt_new, msk->snd_nxt))) {
+ msk->bytes_sent += snd_nxt_new - msk->snd_nxt;
msk->snd_nxt = snd_nxt_new;
+ }
}
void mptcp_check_and_set_pending(struct sock *sk)
@@ -2585,6 +2589,7 @@ static void __mptcp_retrans(struct sock *sk)
}
if (copied) {
dfrag->already_sent = max(dfrag->already_sent, info.sent);
+ msk->bytes_retrans += copied;
tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
info.size_goal);
WRITE_ONCE(msk->allow_infinite_fallback, false);
@@ -3098,6 +3103,10 @@ static int mptcp_disconnect(struct sock *sk, int flags)
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
mptcp_pm_data_reset(msk);
mptcp_ca_reset(sk);
+ msk->bytes_acked = 0;
+ msk->bytes_received = 0;
+ msk->bytes_sent = 0;
+ msk->bytes_retrans = 0;
sk->sk_shutdown = 0;
sk_error_report(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index babf1230c84d..8559d4f81654 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -262,10 +262,13 @@ struct mptcp_sock {
u64 local_key;
u64 remote_key;
u64 write_seq;
+ u64 bytes_sent;
u64 snd_nxt;
+ u64 bytes_received;
u64 ack_seq;
atomic64_t rcv_wnd_sent;
u64 rcv_data_fin_seq;
+ u64 bytes_retrans;
int rmem_fwd_alloc;
struct sock *last_snd;
int snd_burst;
@@ -274,6 +277,7 @@ struct mptcp_sock {
* recovery related fields are under data_lock
* protection
*/
+ u64 bytes_acked;
u64 snd_una;
u64 wnd_end;
unsigned long timer_ival;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index d2aa02e28917..2265387a5db2 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -888,7 +888,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
{
+ struct sock *sk = (struct sock *)msk;
u32 flags = 0;
+ bool slow;
memset(info, 0, sizeof(*info));
@@ -914,11 +916,21 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
if (READ_ONCE(msk->can_ack))
flags |= MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED;
info->mptcpi_flags = flags;
- info->mptcpi_token = READ_ONCE(msk->token);
- info->mptcpi_write_seq = READ_ONCE(msk->write_seq);
- info->mptcpi_snd_una = READ_ONCE(msk->snd_una);
- info->mptcpi_rcv_nxt = READ_ONCE(msk->ack_seq);
- info->mptcpi_csum_enabled = READ_ONCE(msk->csum_enabled);
+ mptcp_data_lock(sk);
+ info->mptcpi_snd_una = msk->snd_una;
+ info->mptcpi_rcv_nxt = msk->ack_seq;
+ info->mptcpi_bytes_acked = msk->bytes_acked;
+ mptcp_data_unlock(sk);
+
+ slow = lock_sock_fast(sk);
+ info->mptcpi_csum_enabled = msk->csum_enabled;
+ info->mptcpi_token = msk->token;
+ info->mptcpi_write_seq = msk->write_seq;
+ info->mptcpi_retransmits = inet_csk(sk)->icsk_retransmits;
+ info->mptcpi_bytes_sent = msk->bytes_sent;
+ info->mptcpi_bytes_received = msk->bytes_received;
+ info->mptcpi_bytes_retrans = msk->bytes_retrans;
+ unlock_sock_fast(sk, slow);
}
EXPORT_SYMBOL_GPL(mptcp_diag_fill_info);
--
2.40.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
` (2 preceding siblings ...)
2023-05-04 16:40 ` [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters Paolo Abeni
@ 2023-05-04 16:40 ` Paolo Abeni
2023-05-17 14:27 ` Matthieu Baerts
2023-05-17 14:24 ` [PATCH mptcp-next 0/4] mptcp: add some more diag info Matthieu Baerts
4 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2023-05-04 16:40 UTC (permalink / raw)
To: mptcp
Update the existing sockopt test-case to do some some
basic checks on the newly added counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
.../selftests/net/mptcp/mptcp_sockopt.c | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index ae61f39556ca..0c58de3ef339 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -51,6 +51,11 @@ struct mptcp_info {
__u8 mptcpi_local_addr_used;
__u8 mptcpi_local_addr_max;
__u8 mptcpi_csum_enabled;
+ __u32 mptcpi_retransmits;
+ __u64 mptcpi_bytes_retrans;
+ __u64 mptcpi_bytes_sent;
+ __u64 mptcpi_bytes_received;
+ __u64 mptcpi_bytes_acked;
};
struct mptcp_subflow_data {
@@ -83,6 +88,7 @@ struct mptcp_subflow_addrs {
struct so_state {
struct mptcp_info mi;
+ struct mptcp_info last_sample;
uint64_t mptcpi_rcv_delta;
uint64_t tcpi_rcv_delta;
};
@@ -320,6 +326,7 @@ static void do_getsockopt_mptcp_info(struct so_state *s, int fd, size_t w)
assert(olen == sizeof(i));
+ s->last_sample = i;
if (s->mi.mptcpi_write_seq == 0)
s->mi = i;
@@ -556,6 +563,19 @@ static void process_one_client(int fd, int pipefd)
do_getsockopts(&s, fd, ret, ret2);
if (s.mptcpi_rcv_delta != (uint64_t)ret + 1)
xerror("mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64, s.mptcpi_rcv_delta, ret + 1, s.mptcpi_rcv_delta - ret);
+ if (s.last_sample.mptcpi_bytes_sent != ret2)
+ xerror("mptcpi_bytes_sent %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_sent, ret2,
+ s.last_sample.mptcpi_bytes_sent - ret2);
+ if (s.last_sample.mptcpi_bytes_received != ret)
+ xerror("mptcpi_bytes_received %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_received, ret,
+ s.last_sample.mptcpi_bytes_received - ret);
+ if (s.last_sample.mptcpi_bytes_acked != ret)
+ xerror("mptcpi_bytes_acked %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_acked, ret2,
+ s.last_sample.mptcpi_bytes_acked - ret2);
+
close(fd);
}
--
2.40.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 0/4] mptcp: add some more diag info
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
` (3 preceding siblings ...)
2023-05-04 16:40 ` [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes Paolo Abeni
@ 2023-05-17 14:24 ` Matthieu Baerts
4 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2023-05-17 14:24 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 04/05/2023 18:39, Paolo Abeni wrote:
> This is a follow up to the topic discussed in recent pubblic mtg.
>
> Introduces unique id for accurate subflow stats tracking and
> aggregate mptcp counters, plus some minimal self-tests.
Thank you for the patches and sorry for the delay! I have some questions
if you don't mind but I think they are not blocking. So the patches look
good to me!
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
> The tests themself do not take in account support for running on
> older kernel.
I can take care of that after having applied the series.
> This is on top of "selftests: mptcp: centralize stats dumping" -
> that is should land not on top of the current export branch head.
>
> That in turn will cause some non trivial conflicts with:
>
> "mptcp: use get_retrans wrapper"
>
> the resolution should like the following (for brevity only reporting
> the lines affected by the conflict resolution)
Thank you for having checked and shared that!
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 1/4] mptcp: add subflow unique id
2023-05-04 16:39 ` [PATCH mptcp-next 1/4] mptcp: add subflow unique id Paolo Abeni
@ 2023-05-17 14:25 ` Matthieu Baerts
2023-05-18 14:25 ` Paolo Abeni
0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Baerts @ 2023-05-17 14:25 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 04/05/2023 18:39, Paolo Abeni wrote:
> The user-space need to preperly account the data received/sent by
> individual subflows. When additional subflows are created and/or
> closed during the MPTCP socket lifetime, the information currently
> exposed via MPTCP_TCPINFO are not enough: subflows are identifed only
> by the sequential position inside the info dumps, and that will change
> with the above mentioned events.
>
> To solve the above problem, this patch introduces a new subflow identifier
> that is unique inside the given mptcp socket scope.
>
> The initial subflow get the id 1 and the other subflows get incremental
> values at join time.
>
> The identifier is exposed to user-space overloading tcpi_fackets in
> MPTCP_TCPINFO.
This smart idea solves the issue with MPTCP_TCPINFO but not with
MPTCP_SUBFLOW_ADDRS. But I guess for the second one, we can easily add a
new field in "struct mptcp_subflow_addrs", right? If yes, we can do that
in a new dedicated commit, that's not blocking this series I think.
We can add:
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/388
(But not the "Closes" tag then)
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/uapi/linux/mptcp.h | 3 +++
> net/mptcp/protocol.c | 3 +++
> net/mptcp/protocol.h | 5 ++++-
> net/mptcp/sockopt.c | 1 +
> 4 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index 32af2d278cb4..49bfbf85cb80 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -249,4 +249,7 @@ struct mptcp_subflow_addrs {
> #define MPTCP_TCPINFO 2
> #define MPTCP_SUBFLOW_ADDRS 3
>
> +/* MPTCP_TCPINFO overload obsoleted tcp fields with MPTCP-specific info */
> +#define tcpi_subflow_id tcpi_fackets
Should we eventually add a comment next to the declaration of
"tcpi_fackets" in include/uapi/linux/tcp.h to say that this field is
being used by MPTCP_TCPINFO? Or this is not needed because this is field
is only overridden when using MPTCP_TCPINFO?
I guess it will never be removed anyway. But could it be overridden by
another field at some points later?
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters.
2023-05-04 16:40 ` [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters Paolo Abeni
@ 2023-05-17 14:26 ` Matthieu Baerts
2023-05-18 14:17 ` Paolo Abeni
0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Baerts @ 2023-05-17 14:26 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 04/05/2023 18:40, Paolo Abeni wrote:
> Currently there are no data transfer counters accounting for all
> the subflows used by a given MPTCP socket. The user-space can compute
> such figures aggregating the subflow info, but that is inaccurate
> if any subflow is closed before the MPTCP socket itself.
>
> Add the new counters in the MPTCP socket itself and expose them
> via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
> acquire the relevant locks before fetching the msk data, to ensure
> better data consistency
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(...)
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index d2aa02e28917..2265387a5db2 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -888,7 +888,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>
> void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
> {
> + struct sock *sk = (struct sock *)msk;
> u32 flags = 0;
> + bool slow;
>
> memset(info, 0, sizeof(*info));
>
> @@ -914,11 +916,21 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
> if (READ_ONCE(msk->can_ack))
> flags |= MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED;
> info->mptcpi_flags = flags;
> - info->mptcpi_token = READ_ONCE(msk->token);
> - info->mptcpi_write_seq = READ_ONCE(msk->write_seq);
> - info->mptcpi_snd_una = READ_ONCE(msk->snd_una);
> - info->mptcpi_rcv_nxt = READ_ONCE(msk->ack_seq);
> - info->mptcpi_csum_enabled = READ_ONCE(msk->csum_enabled);
> + mptcp_data_lock(sk);
> + info->mptcpi_snd_una = msk->snd_una;
> + info->mptcpi_rcv_nxt = msk->ack_seq;
> + info->mptcpi_bytes_acked = msk->bytes_acked;
> + mptcp_data_unlock(sk);
Do you know if this could have a "bad" impact on the performances?
I mean mptcp_diag_fill_info() is also used on when dumping all the
connections via Netlink. I guess the lock_sock_fast() should be fine
because it is also used in tcp_get_info() but what about the data lock?
Also OK because that's a spin lock and we only copy fields?
> +
> + slow = lock_sock_fast(sk);
> + info->mptcpi_csum_enabled = msk->csum_enabled;
> + info->mptcpi_token = msk->token;
> + info->mptcpi_write_seq = msk->write_seq;
> + info->mptcpi_retransmits = inet_csk(sk)->icsk_retransmits;
> + info->mptcpi_bytes_sent = msk->bytes_sent;
> + info->mptcpi_bytes_received = msk->bytes_received;
> + info->mptcpi_bytes_retrans = msk->bytes_retrans;
> + unlock_sock_fast(sk, slow);
> }
> EXPORT_SYMBOL_GPL(mptcp_diag_fill_info);
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes
2023-05-04 16:40 ` [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes Paolo Abeni
@ 2023-05-17 14:27 ` Matthieu Baerts
0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2023-05-17 14:27 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 04/05/2023 18:40, Paolo Abeni wrote:
> Update the existing sockopt test-case to do some some
> basic checks on the newly added counters.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> .../selftests/net/mptcp/mptcp_sockopt.c | 20 +++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
> index ae61f39556ca..0c58de3ef339 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
> @@ -51,6 +51,11 @@ struct mptcp_info {
> __u8 mptcpi_local_addr_used;
> __u8 mptcpi_local_addr_max;
> __u8 mptcpi_csum_enabled;
> + __u32 mptcpi_retransmits;
> + __u64 mptcpi_bytes_retrans;
> + __u64 mptcpi_bytes_sent;
> + __u64 mptcpi_bytes_received;
> + __u64 mptcpi_bytes_acked;
> };
>
> struct mptcp_subflow_data {
> @@ -83,6 +88,7 @@ struct mptcp_subflow_addrs {
>
> struct so_state {
> struct mptcp_info mi;
> + struct mptcp_info last_sample;
> uint64_t mptcpi_rcv_delta;
> uint64_t tcpi_rcv_delta;
> };
> @@ -320,6 +326,7 @@ static void do_getsockopt_mptcp_info(struct so_state *s, int fd, size_t w)
>
> assert(olen == sizeof(i));
>
> + s->last_sample = i;
> if (s->mi.mptcpi_write_seq == 0)
> s->mi = i;
>
> @@ -556,6 +563,19 @@ static void process_one_client(int fd, int pipefd)
> do_getsockopts(&s, fd, ret, ret2);
> if (s.mptcpi_rcv_delta != (uint64_t)ret + 1)
> xerror("mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64, s.mptcpi_rcv_delta, ret + 1, s.mptcpi_rcv_delta - ret);
To support old kernels, I guess the best is to store the "length" of the
getsockopt(MPTCP_INFO) and only check these new following conditions if
the "length" received from the kernel is bigger than a certain size (by
looking at the offset of the mptcpi_retransmits field in the mptcp_info
structure I suppose)?
> + if (s.last_sample.mptcpi_bytes_sent != ret2)
> + xerror("mptcpi_bytes_sent %" PRIu64 ", expect %" PRIu64,
> + s.last_sample.mptcpi_bytes_sent, ret2,
> + s.last_sample.mptcpi_bytes_sent - ret2);
> + if (s.last_sample.mptcpi_bytes_received != ret)
> + xerror("mptcpi_bytes_received %" PRIu64 ", expect %" PRIu64,
> + s.last_sample.mptcpi_bytes_received, ret,
> + s.last_sample.mptcpi_bytes_received - ret);
> + if (s.last_sample.mptcpi_bytes_acked != ret)
> + xerror("mptcpi_bytes_acked %" PRIu64 ", expect %" PRIu64,
> + s.last_sample.mptcpi_bytes_acked, ret2,
> + s.last_sample.mptcpi_bytes_acked - ret2);
> +
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters.
2023-05-17 14:26 ` Matthieu Baerts
@ 2023-05-18 14:17 ` Paolo Abeni
2023-05-19 9:18 ` Matthieu Baerts
0 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2023-05-18 14:17 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On Wed, 2023-05-17 at 16:26 +0200, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 04/05/2023 18:40, Paolo Abeni wrote:
> > Currently there are no data transfer counters accounting for all
> > the subflows used by a given MPTCP socket. The user-space can compute
> > such figures aggregating the subflow info, but that is inaccurate
> > if any subflow is closed before the MPTCP socket itself.
> >
> > Add the new counters in the MPTCP socket itself and expose them
> > via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
> > acquire the relevant locks before fetching the msk data, to ensure
> > better data consistency
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> (...)
>
> > diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> > index d2aa02e28917..2265387a5db2 100644
> > --- a/net/mptcp/sockopt.c
> > +++ b/net/mptcp/sockopt.c
> > @@ -888,7 +888,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
> >
> > void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
> > {
> > + struct sock *sk = (struct sock *)msk;
> > u32 flags = 0;
> > + bool slow;
> >
> > memset(info, 0, sizeof(*info));
> >
> > @@ -914,11 +916,21 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
> > if (READ_ONCE(msk->can_ack))
> > flags |= MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED;
> > info->mptcpi_flags = flags;
> > - info->mptcpi_token = READ_ONCE(msk->token);
> > - info->mptcpi_write_seq = READ_ONCE(msk->write_seq);
> > - info->mptcpi_snd_una = READ_ONCE(msk->snd_una);
> > - info->mptcpi_rcv_nxt = READ_ONCE(msk->ack_seq);
> > - info->mptcpi_csum_enabled = READ_ONCE(msk->csum_enabled);
> > + mptcp_data_lock(sk);
> > + info->mptcpi_snd_una = msk->snd_una;
> > + info->mptcpi_rcv_nxt = msk->ack_seq;
> > + info->mptcpi_bytes_acked = msk->bytes_acked;
> > + mptcp_data_unlock(sk);
>
> Do you know if this could have a "bad" impact on the performances?
It depends ;)
If you do something alike:
for CPU in $(seq $(nproc)); do while ss -M; do :; done & done
it will impact mptcp socket performances ;)
For a more reasonable usage, nothing measurable is expected: once every
a few minutes? hours? much longer interval? an additional process will
compete for the mptcp data and socket lock. It's not expected to matter
much.
The fact we need to acquire independently the data lock and socket lock
is a bit not nice - for the user-space requesting the dump. It can
possibly be avoided implementing some helper to upgrade the mptcp data
lock to the (fast) socket lock, but at this point I think it would be a
bit overkill.
> I mean mptcp_diag_fill_info() is also used on when dumping all the
> connections via Netlink. I guess the lock_sock_fast() should be fine
> because it is also used in tcp_get_info() but what about the data lock?
> Also OK because that's a spin lock and we only copy fields?
The main impact we have on the data path is the dirtying of the msk
socket spin lock cacheline, which will possibly cause a cache miss on
the next data path lock operation, if the dumping happens on a
different core. Not relevant until we get many of them per socket per
seconds.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 1/4] mptcp: add subflow unique id
2023-05-17 14:25 ` Matthieu Baerts
@ 2023-05-18 14:25 ` Paolo Abeni
0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2023-05-18 14:25 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On Wed, 2023-05-17 at 16:25 +0200, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 04/05/2023 18:39, Paolo Abeni wrote:
> > The user-space need to preperly account the data received/sent by
> > individual subflows. When additional subflows are created and/or
> > closed during the MPTCP socket lifetime, the information currently
> > exposed via MPTCP_TCPINFO are not enough: subflows are identifed only
> > by the sequential position inside the info dumps, and that will change
> > with the above mentioned events.
> >
> > To solve the above problem, this patch introduces a new subflow identifier
> > that is unique inside the given mptcp socket scope.
> >
> > The initial subflow get the id 1 and the other subflows get incremental
> > values at join time.
> >
> > The identifier is exposed to user-space overloading tcpi_fackets in
> > MPTCP_TCPINFO.
>
> This smart idea solves the issue with MPTCP_TCPINFO but not with
> MPTCP_SUBFLOW_ADDRS. But I guess for the second one, we can easily add a
> new field in "struct mptcp_subflow_addrs", right? If yes, we can do that
> in a new dedicated commit, that's not blocking this series I think.
>
> We can add:
>
> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/388
Right, the above is not completed yet.
Possibly the right solution would be implementing a new SOCKOPT dumping
all the info alltogether.
BTW I just noticed this patch has a bug: we need to explicitly set the
subflow_id for active MPJ subflow - that will not land into
__mptcp_finish_join as they are already present into the conn_list when
mptcp_finish_join() is invoked.
>
> (But not the "Closes" tag then)
>
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > include/uapi/linux/mptcp.h | 3 +++
> > net/mptcp/protocol.c | 3 +++
> > net/mptcp/protocol.h | 5 ++++-
> > net/mptcp/sockopt.c | 1 +
> > 4 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > index 32af2d278cb4..49bfbf85cb80 100644
> > --- a/include/uapi/linux/mptcp.h
> > +++ b/include/uapi/linux/mptcp.h
> > @@ -249,4 +249,7 @@ struct mptcp_subflow_addrs {
> > #define MPTCP_TCPINFO 2
> > #define MPTCP_SUBFLOW_ADDRS 3
> >
> > +/* MPTCP_TCPINFO overload obsoleted tcp fields with MPTCP-specific info */
> > +#define tcpi_subflow_id tcpi_fackets
>
> Should we eventually add a comment next to the declaration of
> "tcpi_fackets" in include/uapi/linux/tcp.h to say that this field is
> being used by MPTCP_TCPINFO? Or this is not needed because this is field
> is only overridden when using MPTCP_TCPINFO?
>
> I guess it will never be removed anyway. But could it be overridden by
> another field at some points later?
>
> Cheers,
> Matt
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters.
2023-05-18 14:17 ` Paolo Abeni
@ 2023-05-19 9:18 ` Matthieu Baerts
0 siblings, 0 replies; 12+ messages in thread
From: Matthieu Baerts @ 2023-05-19 9:18 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 18/05/2023 16:17, Paolo Abeni wrote:
> On Wed, 2023-05-17 at 16:26 +0200, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 04/05/2023 18:40, Paolo Abeni wrote:
>>> Currently there are no data transfer counters accounting for all
>>> the subflows used by a given MPTCP socket. The user-space can compute
>>> such figures aggregating the subflow info, but that is inaccurate
>>> if any subflow is closed before the MPTCP socket itself.
>>>
>>> Add the new counters in the MPTCP socket itself and expose them
>>> via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
>>> acquire the relevant locks before fetching the msk data, to ensure
>>> better data consistency
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>
>> (...)
>>
>>> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
>>> index d2aa02e28917..2265387a5db2 100644
>>> --- a/net/mptcp/sockopt.c
>>> +++ b/net/mptcp/sockopt.c
>>> @@ -888,7 +888,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
>>>
>>> void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
>>> {
>>> + struct sock *sk = (struct sock *)msk;
>>> u32 flags = 0;
>>> + bool slow;
>>>
>>> memset(info, 0, sizeof(*info));
>>>
>>> @@ -914,11 +916,21 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
>>> if (READ_ONCE(msk->can_ack))
>>> flags |= MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED;
>>> info->mptcpi_flags = flags;
>>> - info->mptcpi_token = READ_ONCE(msk->token);
>>> - info->mptcpi_write_seq = READ_ONCE(msk->write_seq);
>>> - info->mptcpi_snd_una = READ_ONCE(msk->snd_una);
>>> - info->mptcpi_rcv_nxt = READ_ONCE(msk->ack_seq);
>>> - info->mptcpi_csum_enabled = READ_ONCE(msk->csum_enabled);
>>> + mptcp_data_lock(sk);
>>> + info->mptcpi_snd_una = msk->snd_una;
>>> + info->mptcpi_rcv_nxt = msk->ack_seq;
>>> + info->mptcpi_bytes_acked = msk->bytes_acked;
>>> + mptcp_data_unlock(sk);
>>
>> Do you know if this could have a "bad" impact on the performances?
>
> It depends ;)
>
> If you do something alike:
>
> for CPU in $(seq $(nproc)); do while ss -M; do :; done & done
>
> it will impact mptcp socket performances ;)
>
> For a more reasonable usage, nothing measurable is expected: once every
> a few minutes? hours? much longer interval? an additional process will
> compete for the mptcp data and socket lock. It's not expected to matter
> much.
In case of debugging, we might want to take these info very regularly.
But then it is a debugging session and it is understandable it can have
an impact. Taking more CPU waiting to get the info is fine in this case,
more than slowing the connections.
> The fact we need to acquire independently the data lock and socket lock
> is a bit not nice - for the user-space requesting the dump. It can
> possibly be avoided implementing some helper to upgrade the mptcp data
> lock to the (fast) socket lock, but at this point I think it would be a
> bit overkill.
Indeed.
>> I mean mptcp_diag_fill_info() is also used on when dumping all the
>> connections via Netlink. I guess the lock_sock_fast() should be fine
>> because it is also used in tcp_get_info() but what about the data lock?
>> Also OK because that's a spin lock and we only copy fields?
>
> The main impact we have on the data path is the dirtying of the msk
> socket spin lock cacheline, which will possibly cause a cache miss on
> the next data path lock operation, if the dumping happens on a
> different core. Not relevant until we get many of them per socket per
> seconds.
Good point.
Thank you for having shared your point of view!
I'm fine with this!
I guess we don't need to backport the locks you added then, can be seen
as a new feature.
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-05-19 9:18 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-04 16:39 [PATCH mptcp-next 0/4] mptcp: add some more diag info Paolo Abeni
2023-05-04 16:39 ` [PATCH mptcp-next 1/4] mptcp: add subflow unique id Paolo Abeni
2023-05-17 14:25 ` Matthieu Baerts
2023-05-18 14:25 ` Paolo Abeni
2023-05-04 16:40 ` [PATCH mptcp-next 2/4] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
2023-05-04 16:40 ` [PATCH mptcp-next 3/4] mptcp: track some aggregate data counters Paolo Abeni
2023-05-17 14:26 ` Matthieu Baerts
2023-05-18 14:17 ` Paolo Abeni
2023-05-19 9:18 ` Matthieu Baerts
2023-05-04 16:40 ` [PATCH mptcp-next 4/4] selftests: mptcp: explicitly tests aggregate countes Paolo Abeni
2023-05-17 14:27 ` Matthieu Baerts
2023-05-17 14:24 ` [PATCH mptcp-next 0/4] mptcp: add some more diag info Matthieu Baerts
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.