* [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info
@ 2023-05-23 17:37 Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 1/6] mptcp: add subflow unique id Paolo Abeni
` (5 more replies)
0 siblings, 6 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
This is a follow up to the topic discussed in recent pubblic mtg.
Introduces unique id for accurate subflow stats tracking and
aggregate mptcp counters, plus some minimal self-tests.
The tests themself do not take in account support for running on
older kernel.
This is on top of "mptcp: a bunch of data race fixes".
There should be non trivial conflicts with:
"mptcp: use get_retrans wrapper".
v2 -> v3:
- address Matttbe comments on patch 1, 2 and 5, see the indivdual
patches changelog for the details
v1 -> v2:
- introduce MPTCP_FULL_INFO instead of overloading a tcp_info field
- add related self-tests
- fix a couple of subflow_id initialization bugs
Paolo Abeni (6):
mptcp: add subflow unique id
mptcp: introduce MPTCP_FULL_INFO getsockopt
mptcp: move snd_una update earlier for fallback socket.
mptcp: track some aggregate data counters.
selftests: mptcp: explicitly tests aggregate counters
selftests: mptcp: add MPTCP_FULL_INFO testcase
include/uapi/linux/mptcp.h | 21 ++++
net/mptcp/options.c | 14 ++-
net/mptcp/protocol.c | 24 ++--
net/mptcp/protocol.h | 9 +-
net/mptcp/sockopt.c | 97 ++++++++++++++++-
net/mptcp/subflow.c | 2 +
.../selftests/net/mptcp/mptcp_sockopt.c | 103 +++++++++++++++++-
7 files changed, 253 insertions(+), 17 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 1/6] mptcp: add subflow unique id
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt Paolo Abeni
` (4 subsequent siblings)
5 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
The user-space need to preperly account the data received/sent by
individual subflows. When additional subflows are created and/or
closed during the MPTCP socket lifetime, the information currently
exposed via MPTCP_TCPINFO are not enough: subflows are identifed only
by the sequential position inside the info dumps, and that will change
with the above mentioned events.
To solve the above problem, this patch introduces a new subflow identifier
that is unique inside the given mptcp socket scope.
The initial subflow get the id 1 and the other subflows get incremental
values at join time.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
- fix msk subflow_id init (Matttbe)
v1 -> v2:
- properly set subflow_id for the first passive subflow and active subflows, too
- drop the tcpi_fackets overload
---
net/mptcp/protocol.c | 6 ++++++
net/mptcp/protocol.h | 5 ++++-
net/mptcp/subflow.c | 2 ++
3 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 28da6a9fe8fd..9998b2dd150e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -96,6 +96,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
list_add(&subflow->node, &msk->conn_list);
sock_hold(ssock->sk);
subflow->request_mptcp = 1;
+ subflow->subflow_id = msk->subflow_id++;
/* This is the first subflow, always with id 0 */
subflow->local_id_valid = 1;
@@ -845,6 +846,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
if (sk->sk_socket && !ssk->sk_socket)
mptcp_sock_graft(ssk, sk->sk_socket);
+ mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
mptcp_subflow_joined(msk, ssk);
return true;
@@ -2775,6 +2777,7 @@ static int __mptcp_init_sock(struct sock *sk)
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
WRITE_ONCE(msk->allow_infinite_fallback, true);
msk->recovery = false;
+ msk->subflow_id = 1;
mptcp_pm_data_init(msk);
@@ -3206,6 +3209,9 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
mptcp_init_sched(msk, mptcp_sk(sk)->sched);
+ /* passive msk is created after the first/MPC subflow */
+ msk->subflow_id = 2;
+
sock_reset_flag(nsk, SOCK_RCU_FREE);
security_inet_csk_clone(nsk, req);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index de94c01746dc..f9180ecce5e4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -319,7 +319,8 @@ struct mptcp_sock {
u64 rtt_us; /* last maximum rtt of subflows */
} rcvq_space;
- u32 setsockopt_seq;
+ u32 subflow_id;
+ u32 setsockopt_seq;
char ca_name[TCP_CA_NAME_MAX];
struct mptcp_sock *dl_next;
};
@@ -501,6 +502,8 @@ struct mptcp_subflow_context {
u8 reset_reason:4;
u8 stale_count;
+ u32 subflow_id;
+
long delegated_status;
unsigned long fail_tout;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 63ac4dc621d4..c7001a23550a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -819,6 +819,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
if (!ctx->conn)
goto fallback;
+ ctx->subflow_id = 1;
owner = mptcp_sk(ctx->conn);
mptcp_pm_new_connection(owner, child, 1);
@@ -1574,6 +1575,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
subflow->remote_id = remote_id;
subflow->request_join = 1;
subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
+ subflow->subflow_id = msk->subflow_id++;
mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
sock_hold(ssk);
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 1/6] mptcp: add subflow unique id Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
2023-05-23 18:25 ` Florian Westphal
2023-05-23 17:37 ` [PATCH v3 mptcp-next 3/6] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
` (3 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
Some user-space applications want to monitor the subflows utilization.
Dumping the per subflow tcp_info is not enough, as the PM could close
and re-create the subflows under-the-hood, fooling the accounting.
Even checking the src/dst addresses used by each subflow could not
be enough, because new subflows could re-use the same address/port of
the just closed one.
This patch introduces a new socket option, allow dumping all the relevant
information all-at-once (everything, everywhere...), in a consistent manner.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
- added missing changelog (oops)
---
include/uapi/linux/mptcp.h | 16 ++++++++
net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index 32af2d278cb4..f4f42d88e58b 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -12,6 +12,7 @@
#include <linux/in.h> /* for sockaddr_in */
#include <linux/in6.h> /* for sockaddr_in6 */
#include <linux/socket.h> /* for sockaddr_storage and sa_family */
+#include <linux/tcp.h> /* for tcp_info */
#define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
#define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
@@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
};
};
+struct mptcp_subflow_info {
+ __u32 id;
+ struct mptcp_subflow_addrs addrs;
+};
+
+/* struct subflow_info is not supposed nor allowed to grow in
+ * future versions.
+ * If need will arise, a new socket option should be added.
+ */
+struct mptcp_subflow_full_info {
+ struct mptcp_subflow_info subflow_info;
+ struct tcp_info tcp_info;
+};
+
/* MPTCP socket options */
#define MPTCP_INFO 1
#define MPTCP_TCPINFO 2
#define MPTCP_SUBFLOW_ADDRS 3
+#define MPTCP_FULL_INFO 4
#endif /* _UAPI_MPTCP_H */
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index d4258869ac48..b4d04d5dc1f7 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1146,6 +1146,79 @@ static int mptcp_getsockopt_subflow_addrs(struct mptcp_sock *msk, char __user *o
return 0;
}
+static int mptcp_getsockopt_full_info(struct mptcp_sock *msk, char __user *optval,
+ int __user *optlen)
+{
+ struct mptcp_subflow_context *subflow;
+ struct sock *sk = (struct sock *)msk;
+ unsigned int sfcount = 0, copied = 0;
+ int mptcp_info_len, len, size_user;
+ struct mptcp_subflow_data sfd;
+ char __user *infoptr;
+
+ len = mptcp_get_subflow_data(&sfd, optval, optlen);
+ if (len < 0)
+ return len;
+
+ /* don't bother filling the mptcp info if there is not enough
+ * user-space-provided storage
+ */
+ infoptr = optval + sfd.size_subflow_data;
+ if (len > sizeof(struct mptcp_subflow_data)) {
+ struct mptcp_info mptcp_info;
+
+ memset(&mptcp_info, 0, sizeof(mptcp_info));
+ mptcp_info_len = min_t(unsigned int, len, sizeof(struct mptcp_info));
+
+ mptcp_diag_fill_info(msk, &mptcp_info);
+
+ if (copy_to_user(infoptr, &mptcp_info, mptcp_info_len))
+ return -EFAULT;
+
+ len -= mptcp_info_len;
+ copied += mptcp_info_len;
+ infoptr += mptcp_info_len;
+ }
+
+ sfd.size_kernel = sizeof(struct mptcp_subflow_full_info);
+ sfd.size_user = min_t(unsigned int, sfd.size_user,
+ sizeof(struct mptcp_subflow_full_info));
+ lock_sock(sk);
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+ ++sfcount;
+ if (len >= size_user) {
+ struct mptcp_subflow_full_info sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+ sinfo.subflow_info.id = subflow->subflow_id;
+ mptcp_get_sub_addrs(ssk, &sinfo.subflow_info.addrs);
+
+ /* don't bother filling the tcp info if the storage is too small */
+ if (sfd.size_user > sizeof(struct mptcp_subflow_info))
+ tcp_get_info(ssk, &sinfo.tcp_info);
+
+ if (copy_to_user(infoptr, &sinfo, sfd.size_user)) {
+ release_sock(sk);
+ return -EFAULT;
+ }
+
+ infoptr += sfd.size_user;
+ copied += sfd.size_user;
+ len -= sfd.size_user;
+ }
+ }
+ release_sock(sk);
+
+ sfd.num_subflows = sfcount;
+ if (mptcp_put_subflow_data(&sfd, optval, copied, optlen))
+ return -EFAULT;
+
+ return 0;
+}
+
static int mptcp_put_int_option(struct mptcp_sock *msk, char __user *optval,
int __user *optlen, int val)
{
@@ -1219,6 +1292,8 @@ static int mptcp_getsockopt_sol_mptcp(struct mptcp_sock *msk, int optname,
switch (optname) {
case MPTCP_INFO:
return mptcp_getsockopt_info(msk, optval, optlen);
+ case MPTCP_FULL_INFO:
+ return mptcp_getsockopt_full_info(msk, optval, optlen);
case MPTCP_TCPINFO:
return mptcp_getsockopt_tcpinfo(msk, optval, optlen);
case MPTCP_SUBFLOW_ADDRS:
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 3/6] mptcp: move snd_una update earlier for fallback socket.
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 1/6] mptcp: add subflow unique id Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters Paolo Abeni
` (2 subsequent siblings)
5 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
That will avoid an unneeded conditional in both the fast-path
and in the fallback case and will simplify a bit the next
patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/options.c | 6 ++++++
net/mptcp/protocol.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a8083207be4..4bdcd2b326bd 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1119,6 +1119,12 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
mptcp_data_lock(subflow->conn);
if (sk_stream_memory_free(sk))
__mptcp_check_push(subflow->conn, sk);
+
+ /* on fallback we just need to ignore the msk-level snd_una, as
+ * this is really plain TCP
+ */
+ msk->snd_una = READ_ONCE(msk->snd_nxt);
+
__mptcp_data_acked(subflow->conn);
mptcp_data_unlock(subflow->conn);
return true;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9998b2dd150e..89fee2ac84e2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1006,12 +1006,6 @@ static void __mptcp_clean_una(struct sock *sk)
struct mptcp_data_frag *dtmp, *dfrag;
u64 snd_una;
- /* on fallback we just need to ignore snd_una, as this is really
- * plain TCP
- */
- if (__mptcp_check_fallback(msk))
- msk->snd_una = READ_ONCE(msk->snd_nxt);
-
snd_una = msk->snd_una;
list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) {
if (after64(dfrag->data_seq + dfrag->data_len, snd_una))
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters.
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
` (2 preceding siblings ...)
2023-05-23 17:37 ` [PATCH v3 mptcp-next 3/6] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
2023-05-23 18:49 ` Florian Westphal
2023-05-23 17:37 ` [PATCH v3 mptcp-next 5/6] selftests: mptcp: explicitly tests aggregate counters Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 6/6] selftests: mptcp: add MPTCP_FULL_INFO testcase Paolo Abeni
5 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
Currently there are no data transfer counters accounting for all
the subflows used by a given MPTCP socket. The user-space can compute
such figures aggregating the subflow info, but that is inaccurate
if any subflow is closed before the MPTCP socket itself.
Add the new counters in the MPTCP socket itself and expose them
via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
acquire the relevant locks before fetching the msk data, to ensure
better data consistency
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/uapi/linux/mptcp.h | 5 +++++
net/mptcp/options.c | 10 ++++++++--
net/mptcp/protocol.c | 12 +++++++++++-
net/mptcp/protocol.h | 4 ++++
net/mptcp/sockopt.c | 22 +++++++++++++++++-----
5 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
index f4f42d88e58b..ac469625affd 100644
--- a/include/uapi/linux/mptcp.h
+++ b/include/uapi/linux/mptcp.h
@@ -124,6 +124,11 @@ struct mptcp_info {
__u8 mptcpi_local_addr_used;
__u8 mptcpi_local_addr_max;
__u8 mptcpi_csum_enabled;
+ __u32 mptcpi_retransmits;
+ __u64 mptcpi_bytes_retrans;
+ __u64 mptcpi_bytes_sent;
+ __u64 mptcpi_bytes_received;
+ __u64 mptcpi_bytes_acked;
};
/*
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4bdcd2b326bd..c254accb14de 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1026,6 +1026,12 @@ u64 __mptcp_expand_seq(u64 old_seq, u64 cur_seq)
return cur_seq;
}
+static void __mptcp_snd_una_update(struct mptcp_sock *msk, u64 new_snd_una)
+{
+ msk->bytes_acked += new_snd_una - msk->snd_una;
+ msk->snd_una = new_snd_una;
+}
+
static void ack_update_msk(struct mptcp_sock *msk,
struct sock *ssk,
struct mptcp_options_received *mp_opt)
@@ -1057,7 +1063,7 @@ static void ack_update_msk(struct mptcp_sock *msk,
__mptcp_check_push(sk, ssk);
if (after64(new_snd_una, old_snd_una)) {
- msk->snd_una = new_snd_una;
+ __mptcp_snd_una_update(msk, new_snd_una);
__mptcp_data_acked(sk);
}
mptcp_data_unlock(sk);
@@ -1123,7 +1129,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
/* on fallback we just need to ignore the msk-level snd_una, as
* this is really plain TCP
*/
- msk->snd_una = READ_ONCE(msk->snd_nxt);
+ __mptcp_snd_una_update(msk, READ_ONCE(msk->snd_nxt));
__mptcp_data_acked(subflow->conn);
mptcp_data_unlock(subflow->conn);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 89fee2ac84e2..adf26b991c1e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -378,6 +378,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
/* in sequence */
+ msk->bytes_received += copy_len;
WRITE_ONCE(msk->ack_seq, msk->ack_seq + copy_len);
tail = skb_peek_tail(&sk->sk_receive_queue);
if (tail && mptcp_try_coalesce(sk, tail, skb))
@@ -761,6 +762,7 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
MPTCP_SKB_CB(skb)->map_seq += delta;
__skb_queue_tail(&sk->sk_receive_queue, skb);
}
+ msk->bytes_received += end_seq - msk->ack_seq;
msk->ack_seq = end_seq;
moved = true;
}
@@ -1513,8 +1515,10 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
* that has been handed to the subflow for transmission
* and skip update in case it was old dfrag.
*/
- if (likely(after64(snd_nxt_new, msk->snd_nxt)))
+ if (likely(after64(snd_nxt_new, msk->snd_nxt))) {
+ msk->bytes_sent += snd_nxt_new - msk->snd_nxt;
msk->snd_nxt = snd_nxt_new;
+ }
}
void mptcp_check_and_set_pending(struct sock *sk)
@@ -2639,6 +2643,8 @@ static void __mptcp_retrans(struct sock *sk)
msk->last_snd = ssk;
}
}
+
+ msk->bytes_retrans += len;
dfrag->already_sent = max(dfrag->already_sent, len);
reset_timer:
@@ -3153,6 +3159,10 @@ static int mptcp_disconnect(struct sock *sk, int flags)
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
mptcp_pm_data_reset(msk);
mptcp_ca_reset(sk);
+ msk->bytes_acked = 0;
+ msk->bytes_received = 0;
+ msk->bytes_sent = 0;
+ msk->bytes_retrans = 0;
WRITE_ONCE(sk->sk_shutdown, 0);
sk_error_report(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f9180ecce5e4..0283383a09f4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -261,10 +261,13 @@ struct mptcp_sock {
u64 local_key;
u64 remote_key;
u64 write_seq;
+ u64 bytes_sent;
u64 snd_nxt;
+ u64 bytes_received;
u64 ack_seq;
atomic64_t rcv_wnd_sent;
u64 rcv_data_fin_seq;
+ u64 bytes_retrans;
int rmem_fwd_alloc;
struct sock *last_snd;
int snd_burst;
@@ -273,6 +276,7 @@ struct mptcp_sock {
* recovery related fields are under data_lock
* protection
*/
+ u64 bytes_acked;
u64 snd_una;
u64 wnd_end;
unsigned long timer_ival;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index b4d04d5dc1f7..2464100ef820 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -888,7 +888,9 @@ static int mptcp_getsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
{
+ struct sock *sk = (struct sock *)msk;
u32 flags = 0;
+ bool slow;
memset(info, 0, sizeof(*info));
@@ -914,11 +916,21 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info)
if (READ_ONCE(msk->can_ack))
flags |= MPTCP_INFO_FLAG_REMOTE_KEY_RECEIVED;
info->mptcpi_flags = flags;
- info->mptcpi_token = READ_ONCE(msk->token);
- info->mptcpi_write_seq = READ_ONCE(msk->write_seq);
- info->mptcpi_snd_una = READ_ONCE(msk->snd_una);
- info->mptcpi_rcv_nxt = READ_ONCE(msk->ack_seq);
- info->mptcpi_csum_enabled = READ_ONCE(msk->csum_enabled);
+ mptcp_data_lock(sk);
+ info->mptcpi_snd_una = msk->snd_una;
+ info->mptcpi_rcv_nxt = msk->ack_seq;
+ info->mptcpi_bytes_acked = msk->bytes_acked;
+ mptcp_data_unlock(sk);
+
+ slow = lock_sock_fast(sk);
+ info->mptcpi_csum_enabled = msk->csum_enabled;
+ info->mptcpi_token = msk->token;
+ info->mptcpi_write_seq = msk->write_seq;
+ info->mptcpi_retransmits = inet_csk(sk)->icsk_retransmits;
+ info->mptcpi_bytes_sent = msk->bytes_sent;
+ info->mptcpi_bytes_received = msk->bytes_received;
+ info->mptcpi_bytes_retrans = msk->bytes_retrans;
+ unlock_sock_fast(sk, slow);
}
EXPORT_SYMBOL_GPL(mptcp_diag_fill_info);
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 5/6] selftests: mptcp: explicitly tests aggregate counters
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
` (3 preceding siblings ...)
2023-05-23 17:37 ` [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 6/6] selftests: mptcp: add MPTCP_FULL_INFO testcase Paolo Abeni
5 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
Update the existing sockopt test-case to do some basic checks
on the newly added counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v2 -> v3:
- be more kind with older kernel (Matttbe)
---
.../selftests/net/mptcp/mptcp_sockopt.c | 27 ++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index ae61f39556ca..ff8fcdfccf76 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -51,6 +51,11 @@ struct mptcp_info {
__u8 mptcpi_local_addr_used;
__u8 mptcpi_local_addr_max;
__u8 mptcpi_csum_enabled;
+ __u32 mptcpi_retransmits;
+ __u64 mptcpi_bytes_retrans;
+ __u64 mptcpi_bytes_sent;
+ __u64 mptcpi_bytes_received;
+ __u64 mptcpi_bytes_acked;
};
struct mptcp_subflow_data {
@@ -83,8 +88,10 @@ struct mptcp_subflow_addrs {
struct so_state {
struct mptcp_info mi;
+ struct mptcp_info last_sample;
uint64_t mptcpi_rcv_delta;
uint64_t tcpi_rcv_delta;
+ bool pkt_stats_avail;
};
static void die_perror(const char *msg)
@@ -318,8 +325,9 @@ static void do_getsockopt_mptcp_info(struct so_state *s, int fd, size_t w)
if (ret < 0)
die_perror("getsockopt MPTCP_INFO");
- assert(olen == sizeof(i));
+ s->pkt_stats_avail = olen >= sizeof(i);
+ s->last_sample = i;
if (s->mi.mptcpi_write_seq == 0)
s->mi = i;
@@ -556,6 +564,23 @@ static void process_one_client(int fd, int pipefd)
do_getsockopts(&s, fd, ret, ret2);
if (s.mptcpi_rcv_delta != (uint64_t)ret + 1)
xerror("mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64, s.mptcpi_rcv_delta, ret + 1, s.mptcpi_rcv_delta - ret);
+
+ /* be nice when running on top of older kernel */
+ if (s.pkt_stats_avail) {
+ if (s.last_sample.mptcpi_bytes_sent != ret2)
+ xerror("mptcpi_bytes_sent %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_sent, ret2,
+ s.last_sample.mptcpi_bytes_sent - ret2);
+ if (s.last_sample.mptcpi_bytes_received != ret)
+ xerror("mptcpi_bytes_received %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_received, ret,
+ s.last_sample.mptcpi_bytes_received - ret);
+ if (s.last_sample.mptcpi_bytes_acked != ret)
+ xerror("mptcpi_bytes_acked %" PRIu64 ", expect %" PRIu64,
+ s.last_sample.mptcpi_bytes_acked, ret2,
+ s.last_sample.mptcpi_bytes_acked - ret2);
+ }
+
close(fd);
}
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 mptcp-next 6/6] selftests: mptcp: add MPTCP_FULL_INFO testcase
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
` (4 preceding siblings ...)
2023-05-23 17:37 ` [PATCH v3 mptcp-next 5/6] selftests: mptcp: explicitly tests aggregate counters Paolo Abeni
@ 2023-05-23 17:37 ` Paolo Abeni
5 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-23 17:37 UTC (permalink / raw)
To: mptcp
Add a testcase explicitly triggering the newly introduce
MPTCP_FULL_INFO getsockopt.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
.../selftests/net/mptcp/mptcp_sockopt.c | 76 ++++++++++++++++++-
1 file changed, 74 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index ff8fcdfccf76..e4cdf4af75c3 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -86,9 +86,25 @@ struct mptcp_subflow_addrs {
#define MPTCP_SUBFLOW_ADDRS 3
#endif
+#ifndef MPTCP_FULL_INFO
+struct mptcp_subflow_info {
+ __u32 id;
+ struct mptcp_subflow_addrs addrs;
+};
+
+struct mptcp_subflow_full_info {
+ struct mptcp_subflow_info subflow_info;
+ struct tcp_info tcp_info;
+};
+
+#define MPTCP_FULL_INFO 4
+#endif
+
struct so_state {
struct mptcp_info mi;
struct mptcp_info last_sample;
+ struct tcp_info tcp_info;
+ struct mptcp_subflow_addrs addrs;
uint64_t mptcpi_rcv_delta;
uint64_t tcpi_rcv_delta;
bool pkt_stats_avail;
@@ -365,6 +381,8 @@ static void do_getsockopt_tcp_info(struct so_state *s, int fd, size_t r, size_t
olen -= sizeof(struct mptcp_subflow_data);
assert(olen == sizeof(struct tcp_info));
+ s->tcp_info = ti.ti[0];
+
if (ti.ti[0].tcpi_bytes_sent == w &&
ti.ti[0].tcpi_bytes_received == r)
goto done;
@@ -386,7 +404,7 @@ static void do_getsockopt_tcp_info(struct so_state *s, int fd, size_t r, size_t
do_getsockopt_bogus_sf_data(fd, MPTCP_TCPINFO);
}
-static void do_getsockopt_subflow_addrs(int fd)
+static void do_getsockopt_subflow_addrs(struct so_state *s, int fd)
{
struct sockaddr_storage remote, local;
socklen_t olen, rlen, llen;
@@ -433,6 +451,7 @@ static void do_getsockopt_subflow_addrs(int fd)
assert(memcmp(&local, &addrs.addr[0].ss_local, sizeof(local)) == 0);
assert(memcmp(&remote, &addrs.addr[0].ss_remote, sizeof(remote)) == 0);
+ s->addrs = addrs.addr[0];
memset(&addrs, 0, sizeof(addrs));
@@ -453,13 +472,66 @@ static void do_getsockopt_subflow_addrs(int fd)
do_getsockopt_bogus_sf_data(fd, MPTCP_SUBFLOW_ADDRS);
}
+static void do_getsockopt_mptcp_full_info(struct so_state *s, int fd)
+{
+ size_t data_size = 2 * sizeof(struct mptcp_subflow_full_info) +
+ sizeof(struct mptcp_subflow_data) +
+ sizeof(struct mptcp_info);
+ struct mptcp_subflow_full_info *sfi;
+ struct mptcp_subflow_data *sfd;
+ void *data = alloca(data_size);
+ struct mptcp_info *mi;
+ socklen_t olen;
+ int ret;
+
+ sfd = data;
+ memset(data, 0, data_size);
+
+ sfd->size_subflow_data = sizeof(struct mptcp_subflow_data);
+ sfd->size_user = sizeof(struct mptcp_subflow_full_info);
+ olen = data_size;
+
+ ret = getsockopt(fd, SOL_MPTCP, MPTCP_FULL_INFO, data, &olen);
+ if (ret < 0) {
+ if (errno == EOPNOTSUPP) {
+ fprintf(stderr, "\tMPTCP_FULL_INFO test skipped due to lack of kernel support\n");
+ return;
+ }
+ xerror("getsockopt MPTCP_FULL_INFO");
+ }
+
+ assert(olen <= data_size);
+ assert(sfd->size_user == sfd->size_kernel);
+ assert(sfd->size_user == sizeof(struct mptcp_subflow_full_info));
+ assert(sfd->num_subflows == 1);
+
+ assert(olen > (socklen_t)sizeof(struct mptcp_subflow_data));
+ olen -= sizeof(struct mptcp_subflow_data);
+ assert(olen > (socklen_t)sizeof(struct mptcp_info));
+ mi = data + sizeof(struct mptcp_subflow_data);
+ assert(mi->mptcpi_subflows == 0);
+ assert(mi->mptcpi_bytes_sent == s->last_sample.mptcpi_bytes_sent);
+ assert(mi->mptcpi_bytes_received == s->last_sample.mptcpi_bytes_received);
+
+ olen -= sizeof(struct mptcp_info);
+ assert(olen == sizeof(struct mptcp_subflow_full_info));
+ sfi = data + sizeof(struct mptcp_subflow_data) + sizeof(struct mptcp_info);
+ assert(sfi->subflow_info.id == 1);
+ assert(sfi->tcp_info.tcpi_bytes_sent == s->tcp_info.tcpi_bytes_sent);
+ assert(sfi->tcp_info.tcpi_bytes_received == s->tcp_info.tcpi_bytes_received);
+ assert(!memcmp(&sfi->subflow_info.addrs, &s->addrs, sizeof(struct mptcp_subflow_addrs)));
+}
+
static void do_getsockopts(struct so_state *s, int fd, size_t r, size_t w)
{
do_getsockopt_mptcp_info(s, fd, w);
do_getsockopt_tcp_info(s, fd, r, w);
- do_getsockopt_subflow_addrs(fd);
+ do_getsockopt_subflow_addrs(s, fd);
+
+ if (r)
+ do_getsockopt_mptcp_full_info(s, fd);
}
static void connect_one_server(int fd, int pipefd)
--
2.40.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-23 17:37 ` [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt Paolo Abeni
@ 2023-05-23 18:25 ` Florian Westphal
2023-05-24 6:34 ` Paolo Abeni
2023-05-24 8:13 ` Matthieu Baerts
0 siblings, 2 replies; 25+ messages in thread
From: Florian Westphal @ 2023-05-23 18:25 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> Some user-space applications want to monitor the subflows utilization.
>
> Dumping the per subflow tcp_info is not enough, as the PM could close
> and re-create the subflows under-the-hood, fooling the accounting.
> Even checking the src/dst addresses used by each subflow could not
> be enough, because new subflows could re-use the same address/port of
> the just closed one.
>
> This patch introduces a new socket option, allow dumping all the relevant
> information all-at-once (everything, everywhere...), in a consistent manner.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v2 -> v3:
> - added missing changelog (oops)
> ---
> include/uapi/linux/mptcp.h | 16 ++++++++
> net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
>
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index 32af2d278cb4..f4f42d88e58b 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -12,6 +12,7 @@
> #include <linux/in.h> /* for sockaddr_in */
> #include <linux/in6.h> /* for sockaddr_in6 */
> #include <linux/socket.h> /* for sockaddr_storage and sa_family */
> +#include <linux/tcp.h> /* for tcp_info */
>
> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
> #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
> @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
> };
> };
>
> +struct mptcp_subflow_info {
> + __u32 id;
> + struct mptcp_subflow_addrs addrs;
> +};
> +
> +/* struct subflow_info is not supposed nor allowed to grow in
> + * future versions.
> + * If need will arise, a new socket option should be added.
> + */
> +struct mptcp_subflow_full_info {
> + struct mptcp_subflow_info subflow_info;
> + struct tcp_info tcp_info;
> +};
I dislike this constraint.
Why not do something like this:
struct mptcp_subflow_full_info {
__u32 size_subflow_full_info; /* size of this structure in userspace */
__u32 num_subflows_user; /* max subflows that userspace is interested in */
__u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
__u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
__u32 size_sfinfo_kernel; /* must be 0, set by kernel */
__u32 size_sfinfo_user;
__u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
__u32 size_tcpinfo_user;
__aligned_u64 subflow_info_addr;
__aligned_u64 tcp_info_addr;
};
userspace does:
x = calloc(sizeof(struct mptcp_subflow_info), 42);
y = calloc(sizeof(struct tcp_info), 42);
struct mptcp_subflow_full_info {
.size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
.num_subflows_user = 42,
.size_sfinfo_user = sizeof(struct mptcp_subflow_info),
.size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
.subflow_info_addr = (u64)x,
.tcp_info_addr = (u64)y,
};
kernel does:
1. put_user() the real sizes uses by the kernel
2. put_user() the real subflow count
3. copy subflow_info_addr and tcp_info_addr addresses
4. treat as userspace pointers
for each subflow, copy tcpinfo and subflow info to the
two arrays provided by userspace.
Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
that new kernel won't populate fields that don't exist in userspace.
Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
whatever comes first and updates num_subflows_copied to the real copied
value.
This allows userspace to discover when kernel had more subflows
but could not place the data due to lack of space in the
userspace-provided arrays.
This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
etc. in the future.
What do you think?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters.
2023-05-23 17:37 ` [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters Paolo Abeni
@ 2023-05-23 18:49 ` Florian Westphal
2023-05-24 8:15 ` Paolo Abeni
0 siblings, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2023-05-23 18:49 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> Currently there are no data transfer counters accounting for all
> the subflows used by a given MPTCP socket. The user-space can compute
> such figures aggregating the subflow info, but that is inaccurate
> if any subflow is closed before the MPTCP socket itself.
>
> Add the new counters in the MPTCP socket itself and expose them
> via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
> acquire the relevant locks before fetching the msk data, to ensure
> better data consistency
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/uapi/linux/mptcp.h | 5 +++++
> net/mptcp/options.c | 10 ++++++++--
> net/mptcp/protocol.c | 12 +++++++++++-
> net/mptcp/protocol.h | 4 ++++
> net/mptcp/sockopt.c | 22 +++++++++++++++++-----
> 5 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> index f4f42d88e58b..ac469625affd 100644
> --- a/include/uapi/linux/mptcp.h
> +++ b/include/uapi/linux/mptcp.h
> @@ -124,6 +124,11 @@ struct mptcp_info {
> __u8 mptcpi_local_addr_used;
> __u8 mptcpi_local_addr_max;
> __u8 mptcpi_csum_enabled;
> + __u32 mptcpi_retransmits;
> + __u64 mptcpi_bytes_retrans;
> + __u64 mptcpi_bytes_sent;
> + __u64 mptcpi_bytes_received;
> + __u64 mptcpi_bytes_acked;
__aligned_u64 is preferred for uapi.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-23 18:25 ` Florian Westphal
@ 2023-05-24 6:34 ` Paolo Abeni
2023-05-24 6:36 ` Paolo Abeni
2023-05-24 9:56 ` Florian Westphal
2023-05-24 8:13 ` Matthieu Baerts
1 sibling, 2 replies; 25+ messages in thread
From: Paolo Abeni @ 2023-05-24 6:34 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Tue, 2023-05-23 at 20:25 +0200, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > Some user-space applications want to monitor the subflows utilization.
> >
> > Dumping the per subflow tcp_info is not enough, as the PM could close
> > and re-create the subflows under-the-hood, fooling the accounting.
> > Even checking the src/dst addresses used by each subflow could not
> > be enough, because new subflows could re-use the same address/port of
> > the just closed one.
> >
> > This patch introduces a new socket option, allow dumping all the relevant
> > information all-at-once (everything, everywhere...), in a consistent manner.
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v2 -> v3:
> > - added missing changelog (oops)
> > ---
> > include/uapi/linux/mptcp.h | 16 ++++++++
> > net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 91 insertions(+)
> >
> > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > index 32af2d278cb4..f4f42d88e58b 100644
> > --- a/include/uapi/linux/mptcp.h
> > +++ b/include/uapi/linux/mptcp.h
> > @@ -12,6 +12,7 @@
> > #include <linux/in.h> /* for sockaddr_in */
> > #include <linux/in6.h> /* for sockaddr_in6 */
> > #include <linux/socket.h> /* for sockaddr_storage and sa_family */
> > +#include <linux/tcp.h> /* for tcp_info */
> >
> > #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
> > #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
> > @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
> > };
> > };
> >
> > +struct mptcp_subflow_info {
> > + __u32 id;
> > + struct mptcp_subflow_addrs addrs;
> > +};
> > +
> > +/* struct subflow_info is not supposed nor allowed to grow in
> > + * future versions.
> > + * If need will arise, a new socket option should be added.
> > + */
> > +struct mptcp_subflow_full_info {
> > + struct mptcp_subflow_info subflow_info;
> > + struct tcp_info tcp_info;
> > +};
>
> I dislike this constraint.
>
> Why not do something like this:
>
> struct mptcp_subflow_full_info {
> __u32 size_subflow_full_info; /* size of this structure in userspace */
> __u32 num_subflows_user; /* max subflows that userspace is interested in */
> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> __u32 size_sfinfo_user;
> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> __u32 size_tcpinfo_user;
> __aligned_u64 subflow_info_addr;
> __aligned_u64 tcp_info_addr;
> };
>
> userspace does:
>
> x = calloc(sizeof(struct mptcp_subflow_info), 42);
> y = calloc(sizeof(struct tcp_info), 42);
>
> struct mptcp_subflow_full_info {
> .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
> .num_subflows_user = 42,
> .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
> .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
> .subflow_info_addr = (u64)x,
> .tcp_info_addr = (u64)y,
> };
>
> kernel does:
> 1. put_user() the real sizes uses by the kernel
> 2. put_user() the real subflow count
> 3. copy subflow_info_addr and tcp_info_addr addresses
> 4. treat as userspace pointers
>
> for each subflow, copy tcpinfo and subflow info to the
> two arrays provided by userspace.
>
> Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
> that new kernel won't populate fields that don't exist in userspace.
>
> Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
> whatever comes first and updates num_subflows_copied to the real copied
> value.
>
> This allows userspace to discover when kernel had more subflows
> but could not place the data due to lack of space in the
> userspace-provided arrays.
>
> This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
> etc. in the future.
>
> What do you think?
I ended-up with the current code trying to re-use as much as possible
from the existing infra - specifically the
mptcp_get_subflow_data()/mptcp_put_subflow_data() helpers.
I agree the constraint on mptcp_subflow_full_info is not nice,
especially if we want to include subflow level indormation alike
local/remote address id, flags, etc...
Isn't num_subflows_copied always implied? == min(num_subflows_user,
num_subflows_kern).
Otherwise LGTM, I'll try to have a spin.
Thanks!
Paolo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 6:34 ` Paolo Abeni
@ 2023-05-24 6:36 ` Paolo Abeni
2023-05-24 7:36 ` Matthieu Baerts
2023-05-24 9:56 ` Florian Westphal
1 sibling, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-24 6:36 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Wed, 2023-05-24 at 08:34 +0200, Paolo Abeni wrote:
> On Tue, 2023-05-23 at 20:25 +0200, Florian Westphal wrote:
> > Paolo Abeni <pabeni@redhat.com> wrote:
> > > Some user-space applications want to monitor the subflows utilization.
> > >
> > > Dumping the per subflow tcp_info is not enough, as the PM could close
> > > and re-create the subflows under-the-hood, fooling the accounting.
> > > Even checking the src/dst addresses used by each subflow could not
> > > be enough, because new subflows could re-use the same address/port of
> > > the just closed one.
> > >
> > > This patch introduces a new socket option, allow dumping all the relevant
> > > information all-at-once (everything, everywhere...), in a consistent manner.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > v2 -> v3:
> > > - added missing changelog (oops)
> > > ---
> > > include/uapi/linux/mptcp.h | 16 ++++++++
> > > net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 91 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > > index 32af2d278cb4..f4f42d88e58b 100644
> > > --- a/include/uapi/linux/mptcp.h
> > > +++ b/include/uapi/linux/mptcp.h
> > > @@ -12,6 +12,7 @@
> > > #include <linux/in.h> /* for sockaddr_in */
> > > #include <linux/in6.h> /* for sockaddr_in6 */
> > > #include <linux/socket.h> /* for sockaddr_storage and sa_family */
> > > +#include <linux/tcp.h> /* for tcp_info */
> > >
> > > #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
> > > #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
> > > @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
> > > };
> > > };
> > >
> > > +struct mptcp_subflow_info {
> > > + __u32 id;
> > > + struct mptcp_subflow_addrs addrs;
> > > +};
> > > +
> > > +/* struct subflow_info is not supposed nor allowed to grow in
> > > + * future versions.
> > > + * If need will arise, a new socket option should be added.
> > > + */
> > > +struct mptcp_subflow_full_info {
> > > + struct mptcp_subflow_info subflow_info;
> > > + struct tcp_info tcp_info;
> > > +};
> >
> > I dislike this constraint.
> >
> > Why not do something like this:
> >
> > struct mptcp_subflow_full_info {
> > __u32 size_subflow_full_info; /* size of this structure in userspace */
> > __u32 num_subflows_user; /* max subflows that userspace is interested in */
> > __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> > __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
> > __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_sfinfo_user;
> > __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_tcpinfo_user;
> > __aligned_u64 subflow_info_addr;
> > __aligned_u64 tcp_info_addr;
> > };
> >
> > userspace does:
> >
> > x = calloc(sizeof(struct mptcp_subflow_info), 42);
> > y = calloc(sizeof(struct tcp_info), 42);
> >
> > struct mptcp_subflow_full_info {
> > .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
> > .num_subflows_user = 42,
> > .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
> > .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
> > .subflow_info_addr = (u64)x,
> > .tcp_info_addr = (u64)y,
> > };
> >
> > kernel does:
> > 1. put_user() the real sizes uses by the kernel
> > 2. put_user() the real subflow count
> > 3. copy subflow_info_addr and tcp_info_addr addresses
> > 4. treat as userspace pointers
> >
> > for each subflow, copy tcpinfo and subflow info to the
> > two arrays provided by userspace.
> >
> > Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
> > that new kernel won't populate fields that don't exist in userspace.
> >
> > Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
> > whatever comes first and updates num_subflows_copied to the real copied
> > value.
> >
> > This allows userspace to discover when kernel had more subflows
> > but could not place the data due to lack of space in the
> > userspace-provided arrays.
> >
> > This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
> > etc. in the future.
> >
> > What do you think?
>
> I ended-up with the current code trying to re-use as much as possible
> from the existing infra - specifically the
> mptcp_get_subflow_data()/mptcp_put_subflow_data() helpers.
>
> I agree the constraint on mptcp_subflow_full_info is not nice,
> especially if we want to include subflow level indormation alike
> local/remote address id, flags, etc...
>
> Isn't num_subflows_copied always implied? == min(num_subflows_user,
> num_subflows_kern).
>
> Otherwise LGTM, I'll try to have a spin.
Oops, I forgot... should mptcp_subflow_full_info contain mptcp_info,
too? Overall layout:
struct mptcp_subflow_full_info {
__u32 size_subflow_full_info; /* size of this structure in userspace */
__u32 num_subflows_user; /* max subflows that userspace is interested in */
__u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
__u32 size_sfinfo_kernel; /* must be 0, set by kernel */
__u32 size_sfinfo_user;
__u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
__u32 size_tcpinfo_user;
__aligned_u64 subflow_info_addr;
__aligned_u64 tcp_info_addr;
struct mptcp_info mptcp_indo;
};
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 6:36 ` Paolo Abeni
@ 2023-05-24 7:36 ` Matthieu Baerts
2023-05-24 8:11 ` Paolo Abeni
0 siblings, 1 reply; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 7:36 UTC (permalink / raw)
To: Paolo Abeni, Florian Westphal; +Cc: mptcp
Hi Paolo, Florian,
On 24/05/2023 08:36, Paolo Abeni wrote:
> On Wed, 2023-05-24 at 08:34 +0200, Paolo Abeni wrote:
>> On Tue, 2023-05-23 at 20:25 +0200, Florian Westphal wrote:
>>> Paolo Abeni <pabeni@redhat.com> wrote:
>>>> Some user-space applications want to monitor the subflows utilization.
>>>>
>>>> Dumping the per subflow tcp_info is not enough, as the PM could close
>>>> and re-create the subflows under-the-hood, fooling the accounting.
>>>> Even checking the src/dst addresses used by each subflow could not
>>>> be enough, because new subflows could re-use the same address/port of
>>>> the just closed one.
>>>>
>>>> This patch introduces a new socket option, allow dumping all the relevant
>>>> information all-at-once (everything, everywhere...), in a consistent manner.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>> v2 -> v3:
>>>> - added missing changelog (oops)
>>>> ---
>>>> include/uapi/linux/mptcp.h | 16 ++++++++
>>>> net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 91 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
>>>> index 32af2d278cb4..f4f42d88e58b 100644
>>>> --- a/include/uapi/linux/mptcp.h
>>>> +++ b/include/uapi/linux/mptcp.h
>>>> @@ -12,6 +12,7 @@
>>>> #include <linux/in.h> /* for sockaddr_in */
>>>> #include <linux/in6.h> /* for sockaddr_in6 */
>>>> #include <linux/socket.h> /* for sockaddr_storage and sa_family */
>>>> +#include <linux/tcp.h> /* for tcp_info */
>>>>
>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
>>>> @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
>>>> };
>>>> };
>>>>
>>>> +struct mptcp_subflow_info {
>>>> + __u32 id;
>>>> + struct mptcp_subflow_addrs addrs;
>>>> +};
>>>> +
>>>> +/* struct subflow_info is not supposed nor allowed to grow in
>>>> + * future versions.
>>>> + * If need will arise, a new socket option should be added.
>>>> + */
>>>> +struct mptcp_subflow_full_info {
>>>> + struct mptcp_subflow_info subflow_info;
>>>> + struct tcp_info tcp_info;
>>>> +};
>>>
>>> I dislike this constraint.
>>>
>>> Why not do something like this:
>>>
>>> struct mptcp_subflow_full_info {
>>> __u32 size_subflow_full_info; /* size of this structure in userspace */
>>> __u32 num_subflows_user; /* max subflows that userspace is interested in */
>>> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
>>> __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
>>> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_sfinfo_user;
>>> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_tcpinfo_user;
>>> __aligned_u64 subflow_info_addr;
>>> __aligned_u64 tcp_info_addr;
>>> };
>>>
>>> userspace does:
>>>
>>> x = calloc(sizeof(struct mptcp_subflow_info), 42);
>>> y = calloc(sizeof(struct tcp_info), 42);
>>>
>>> struct mptcp_subflow_full_info {
>>> .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
>>> .num_subflows_user = 42,
>>> .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
>>> .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
>>> .subflow_info_addr = (u64)x,
>>> .tcp_info_addr = (u64)y,
>>> };
>>>
>>> kernel does:
>>> 1. put_user() the real sizes uses by the kernel
>>> 2. put_user() the real subflow count
>>> 3. copy subflow_info_addr and tcp_info_addr addresses
>>> 4. treat as userspace pointers
>>>
>>> for each subflow, copy tcpinfo and subflow info to the
>>> two arrays provided by userspace.
>>>
>>> Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
>>> that new kernel won't populate fields that don't exist in userspace.
>>>
>>> Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
>>> whatever comes first and updates num_subflows_copied to the real copied
>>> value.
>>>
>>> This allows userspace to discover when kernel had more subflows
>>> but could not place the data due to lack of space in the
>>> userspace-provided arrays.
>>>
>>> This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
>>> etc. in the future.
>>>
>>> What do you think?
>>
>> I ended-up with the current code trying to re-use as much as possible
>> from the existing infra - specifically the
>> mptcp_get_subflow_data()/mptcp_put_subflow_data() helpers.
>>
>> I agree the constraint on mptcp_subflow_full_info is not nice,
>> especially if we want to include subflow level indormation alike
>> local/remote address id, flags, etc...
>>
>> Isn't num_subflows_copied always implied? == min(num_subflows_user,
>> num_subflows_kern).
>>
>> Otherwise LGTM, I'll try to have a spin.
>
> Oops, I forgot... should mptcp_subflow_full_info contain mptcp_info,
> too?
Good point. Indeed I guess it is very likely someone who want info about
MPTCP subflow will also want info about the MPTCP connection.
> Overall layout:
>
> struct mptcp_subflow_full_info {
> __u32 size_subflow_full_info; /* size of this structure in userspace */
> __u32 num_subflows_user; /* max subflows that userspace is interested in */
> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> __u32 size_sfinfo_user;
> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> __u32 size_tcpinfo_user;
> __aligned_u64 subflow_info_addr;
> __aligned_u64 tcp_info_addr;
> struct mptcp_info mptcp_indo;
I guess you will need a size for the mptcp_info structure, no?
Maybe this should come before the previous ones?
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 7:36 ` Matthieu Baerts
@ 2023-05-24 8:11 ` Paolo Abeni
2023-05-24 8:16 ` Matthieu Baerts
0 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-24 8:11 UTC (permalink / raw)
To: Matthieu Baerts, Florian Westphal; +Cc: mptcp
On Wed, 2023-05-24 at 09:36 +0200, Matthieu Baerts wrote:
> Hi Paolo, Florian,
>
> On 24/05/2023 08:36, Paolo Abeni wrote:
> > On Wed, 2023-05-24 at 08:34 +0200, Paolo Abeni wrote:
> > > On Tue, 2023-05-23 at 20:25 +0200, Florian Westphal wrote:
> > > > Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > Some user-space applications want to monitor the subflows utilization.
> > > > >
> > > > > Dumping the per subflow tcp_info is not enough, as the PM could close
> > > > > and re-create the subflows under-the-hood, fooling the accounting.
> > > > > Even checking the src/dst addresses used by each subflow could not
> > > > > be enough, because new subflows could re-use the same address/port of
> > > > > the just closed one.
> > > > >
> > > > > This patch introduces a new socket option, allow dumping all the relevant
> > > > > information all-at-once (everything, everywhere...), in a consistent manner.
> > > > >
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > ---
> > > > > v2 -> v3:
> > > > > - added missing changelog (oops)
> > > > > ---
> > > > > include/uapi/linux/mptcp.h | 16 ++++++++
> > > > > net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
> > > > > 2 files changed, 91 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > > > > index 32af2d278cb4..f4f42d88e58b 100644
> > > > > --- a/include/uapi/linux/mptcp.h
> > > > > +++ b/include/uapi/linux/mptcp.h
> > > > > @@ -12,6 +12,7 @@
> > > > > #include <linux/in.h> /* for sockaddr_in */
> > > > > #include <linux/in6.h> /* for sockaddr_in6 */
> > > > > #include <linux/socket.h> /* for sockaddr_storage and sa_family */
> > > > > +#include <linux/tcp.h> /* for tcp_info */
> > > > >
> > > > > #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
> > > > > #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
> > > > > @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
> > > > > };
> > > > > };
> > > > >
> > > > > +struct mptcp_subflow_info {
> > > > > + __u32 id;
> > > > > + struct mptcp_subflow_addrs addrs;
> > > > > +};
> > > > > +
> > > > > +/* struct subflow_info is not supposed nor allowed to grow in
> > > > > + * future versions.
> > > > > + * If need will arise, a new socket option should be added.
> > > > > + */
> > > > > +struct mptcp_subflow_full_info {
> > > > > + struct mptcp_subflow_info subflow_info;
> > > > > + struct tcp_info tcp_info;
> > > > > +};
> > > >
> > > > I dislike this constraint.
> > > >
> > > > Why not do something like this:
> > > >
> > > > struct mptcp_subflow_full_info {
> > > > __u32 size_subflow_full_info; /* size of this structure in userspace */
> > > > __u32 num_subflows_user; /* max subflows that userspace is interested in */
> > > > __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> > > > __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
> > > > __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> > > > __u32 size_sfinfo_user;
> > > > __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> > > > __u32 size_tcpinfo_user;
> > > > __aligned_u64 subflow_info_addr;
> > > > __aligned_u64 tcp_info_addr;
> > > > };
> > > >
> > > > userspace does:
> > > >
> > > > x = calloc(sizeof(struct mptcp_subflow_info), 42);
> > > > y = calloc(sizeof(struct tcp_info), 42);
> > > >
> > > > struct mptcp_subflow_full_info {
> > > > .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
> > > > .num_subflows_user = 42,
> > > > .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
> > > > .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
> > > > .subflow_info_addr = (u64)x,
> > > > .tcp_info_addr = (u64)y,
> > > > };
> > > >
> > > > kernel does:
> > > > 1. put_user() the real sizes uses by the kernel
> > > > 2. put_user() the real subflow count
> > > > 3. copy subflow_info_addr and tcp_info_addr addresses
> > > > 4. treat as userspace pointers
> > > >
> > > > for each subflow, copy tcpinfo and subflow info to the
> > > > two arrays provided by userspace.
> > > >
> > > > Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
> > > > that new kernel won't populate fields that don't exist in userspace.
> > > >
> > > > Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
> > > > whatever comes first and updates num_subflows_copied to the real copied
> > > > value.
> > > >
> > > > This allows userspace to discover when kernel had more subflows
> > > > but could not place the data due to lack of space in the
> > > > userspace-provided arrays.
> > > >
> > > > This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
> > > > etc. in the future.
> > > >
> > > > What do you think?
> > >
> > > I ended-up with the current code trying to re-use as much as possible
> > > from the existing infra - specifically the
> > > mptcp_get_subflow_data()/mptcp_put_subflow_data() helpers.
> > >
> > > I agree the constraint on mptcp_subflow_full_info is not nice,
> > > especially if we want to include subflow level indormation alike
> > > local/remote address id, flags, etc...
> > >
> > > Isn't num_subflows_copied always implied? == min(num_subflows_user,
> > > num_subflows_kern).
> > >
> > > Otherwise LGTM, I'll try to have a spin.
> >
> > Oops, I forgot... should mptcp_subflow_full_info contain mptcp_info,
> > too?
>
> Good point. Indeed I guess it is very likely someone who want info about
> MPTCP subflow will also want info about the MPTCP connection.
>
> > Overall layout:
> >
> > struct mptcp_subflow_full_info {
> > __u32 size_subflow_full_info; /* size of this structure in userspace */
> > __u32 num_subflows_user; /* max subflows that userspace is interested in */
> > __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> > __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_sfinfo_user;
> > __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_tcpinfo_user;
> > __aligned_u64 subflow_info_addr;
> > __aligned_u64 tcp_info_addr;
> > struct mptcp_info mptcp_indo;
>
> I guess you will need a size for the mptcp_info structure, no?
>
> Maybe this should come before the previous ones?
I'm coding the thing right now: it turns out we are better of not
including mptcp_info inside mptcp_subflow_full_info, so the
manipulation schema will look alike the exiting
mptcp_{get,put}_subflow_data().
Also the user mptcp_info size is implied, since is always equal to
*optlen - mptcp_subflow_full_info.mptcp_subflow_full_info.
/P
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-23 18:25 ` Florian Westphal
2023-05-24 6:34 ` Paolo Abeni
@ 2023-05-24 8:13 ` Matthieu Baerts
2023-05-24 8:53 ` Paolo Abeni
2023-05-24 10:04 ` Florian Westphal
1 sibling, 2 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 8:13 UTC (permalink / raw)
To: Florian Westphal, Paolo Abeni; +Cc: mptcp
Hi Florian,
Thank you for your nice review!
On 23/05/2023 20:25, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> Some user-space applications want to monitor the subflows utilization.
>>
>> Dumping the per subflow tcp_info is not enough, as the PM could close
>> and re-create the subflows under-the-hood, fooling the accounting.
>> Even checking the src/dst addresses used by each subflow could not
>> be enough, because new subflows could re-use the same address/port of
>> the just closed one.
>>
>> This patch introduces a new socket option, allow dumping all the relevant
>> information all-at-once (everything, everywhere...), in a consistent manner.
>>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> v2 -> v3:
>> - added missing changelog (oops)
>> ---
>> include/uapi/linux/mptcp.h | 16 ++++++++
>> net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 91 insertions(+)
>>
>> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
>> index 32af2d278cb4..f4f42d88e58b 100644
>> --- a/include/uapi/linux/mptcp.h
>> +++ b/include/uapi/linux/mptcp.h
>> @@ -12,6 +12,7 @@
>> #include <linux/in.h> /* for sockaddr_in */
>> #include <linux/in6.h> /* for sockaddr_in6 */
>> #include <linux/socket.h> /* for sockaddr_storage and sa_family */
>> +#include <linux/tcp.h> /* for tcp_info */
>>
>> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
>> #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
>> @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
>> };
>> };
>>
>> +struct mptcp_subflow_info {
>> + __u32 id;
>> + struct mptcp_subflow_addrs addrs;
>> +};
>> +
>> +/* struct subflow_info is not supposed nor allowed to grow in
>> + * future versions.
>> + * If need will arise, a new socket option should be added.
>> + */
>> +struct mptcp_subflow_full_info {
>> + struct mptcp_subflow_info subflow_info;
>> + struct tcp_info tcp_info;
>> +};
>
> I dislike this constraint.
>
> Why not do something like this:
>
> struct mptcp_subflow_full_info {
(If we add mptcp_info (cf Paolo's email), better to strip 'subflow')
> __u32 size_subflow_full_info; /* size of this structure in userspace */
Why do we need this field? Can we not use the option length (int __user
*optlen) provided by the userspace?
> __u32 num_subflows_user; /* max subflows that userspace is interested in */
> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> __u32 size_sfinfo_user;
> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> __u32 size_tcpinfo_user;
Do we need to have the kernel and user size in different fields?
Is it not enough have:
num_subflows; /* will be set by the kernel */
num_subflows_copied; /* max for the user, kernel put what it did */
size_sfinfo; /* user set it, kernel put what it used: min */
size_tcpinfo; /* same */
> __aligned_u64 subflow_info_addr;
> __aligned_u64 tcp_info_addr;
> };
>
> userspace does:
>
> x = calloc(sizeof(struct mptcp_subflow_info), 42);
> y = calloc(sizeof(struct tcp_info), 42);
>
> struct mptcp_subflow_full_info {
> .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
> .num_subflows_user = 42,
> .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
> .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
> .subflow_info_addr = (u64)x,
> .tcp_info_addr = (u64)y,
> };
>
> kernel does:
> 1. put_user() the real sizes uses by the kernel
> 2. put_user() the real subflow count
> 3. copy subflow_info_addr and tcp_info_addr addresses
> 4. treat as userspace pointers
>
> for each subflow, copy tcpinfo and subflow info to the
> two arrays provided by userspace.
>
> Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
> that new kernel won't populate fields that don't exist in userspace.
>
> Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
> whatever comes first and updates num_subflows_copied to the real copied
> value.
>
> This allows userspace to discover when kernel had more subflows
> but could not place the data due to lack of space in the
> userspace-provided arrays.
>
> This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
> etc. in the future.
If we add mptcp_info structure as mentioned by Paolo, we are getting
very close the the getsockopt(MPTCP_INFO) from the fork kernel, see the
section "Get detailed information of your subflows":
https://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP
And the kernel header:
https://github.com/multipath-tcp/mptcp/blob/mptcp_trunk/include/uapi/linux/tcp.h#L321-L366
> What do you think?
If we want to be even more future prove, why not having something more
flexible where the userspace puts flags of what it wants and then there
is a kind of TLV structure? Maybe it would be too complex to read/write?
struct mptcp_tlv_info {
__u32 type;
__u32 length;
__aligned_u64 value;
};
struct mptcp_full_info {
__u32 flags;
__u32 num_subflows;
__u32 num_subflows_copied;
struct mptcp_tlv_info[]; /* number has to be linked to flags */
};
Maybe it would be good to add a size for the subflow info structure: if
we use such TLV structure, for info around subflows, we will need to set
at least the ID for each subflow (maybe more later). We might then need:
struct mptcp_full_info {
(...)
__u32 size_mptcp_subflow_info;
(...)
};
struct mptcp_subflow_info {
__u32 id;
};
struct mptcp_subflow_addrs {
struct mptcp_subflow_info info;
struct mptcp_subflow_addrs addrs;
};
struct mptcp_subflow_tcp_info {
struct mptcp_subflow_info info;
struct tcp_info tcp_info;
};
Again, just a quick idea, I didn't investigate more. With that, we will
likely not need more getsockopt() to get info about the MPTCP connection
and its subflows, just an all in one getsockopt().
I'm just trying to reacting quickly before going into a direction :)
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters.
2023-05-23 18:49 ` Florian Westphal
@ 2023-05-24 8:15 ` Paolo Abeni
2023-05-24 10:05 ` Florian Westphal
0 siblings, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-24 8:15 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Tue, 2023-05-23 at 20:49 +0200, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > Currently there are no data transfer counters accounting for all
> > the subflows used by a given MPTCP socket. The user-space can compute
> > such figures aggregating the subflow info, but that is inaccurate
> > if any subflow is closed before the MPTCP socket itself.
> >
> > Add the new counters in the MPTCP socket itself and expose them
> > via the existing diag and sockopt. While touching mptcp_diag_fill_info(),
> > acquire the relevant locks before fetching the msk data, to ensure
> > better data consistency
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > include/uapi/linux/mptcp.h | 5 +++++
> > net/mptcp/options.c | 10 ++++++++--
> > net/mptcp/protocol.c | 12 +++++++++++-
> > net/mptcp/protocol.h | 4 ++++
> > net/mptcp/sockopt.c | 22 +++++++++++++++++-----
> > 5 files changed, 45 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > index f4f42d88e58b..ac469625affd 100644
> > --- a/include/uapi/linux/mptcp.h
> > +++ b/include/uapi/linux/mptcp.h
> > @@ -124,6 +124,11 @@ struct mptcp_info {
> > __u8 mptcpi_local_addr_used;
> > __u8 mptcpi_local_addr_max;
> > __u8 mptcpi_csum_enabled;
> > + __u32 mptcpi_retransmits;
> > + __u64 mptcpi_bytes_retrans;
> > + __u64 mptcpi_bytes_sent;
> > + __u64 mptcpi_bytes_received;
> > + __u64 mptcpi_bytes_acked;
>
> __aligned_u64 is preferred for uapi.
Ouch! We already have a few __u64 in the same struct.
Should we mix __u64 and __aligned ones? By pure luck, the both existing
__u64 and the new one should be aligned, even without the annotation.
/P
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 8:11 ` Paolo Abeni
@ 2023-05-24 8:16 ` Matthieu Baerts
0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 8:16 UTC (permalink / raw)
To: Paolo Abeni, Florian Westphal; +Cc: mptcp
Hi Paolo,
On 24/05/2023 10:11, Paolo Abeni wrote:
> On Wed, 2023-05-24 at 09:36 +0200, Matthieu Baerts wrote:
>> Hi Paolo, Florian,
>>
>> On 24/05/2023 08:36, Paolo Abeni wrote:
>>> On Wed, 2023-05-24 at 08:34 +0200, Paolo Abeni wrote:
>>>> On Tue, 2023-05-23 at 20:25 +0200, Florian Westphal wrote:
>>>>> Paolo Abeni <pabeni@redhat.com> wrote:
>>>>>> Some user-space applications want to monitor the subflows utilization.
>>>>>>
>>>>>> Dumping the per subflow tcp_info is not enough, as the PM could close
>>>>>> and re-create the subflows under-the-hood, fooling the accounting.
>>>>>> Even checking the src/dst addresses used by each subflow could not
>>>>>> be enough, because new subflows could re-use the same address/port of
>>>>>> the just closed one.
>>>>>>
>>>>>> This patch introduces a new socket option, allow dumping all the relevant
>>>>>> information all-at-once (everything, everywhere...), in a consistent manner.
>>>>>>
>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>> ---
>>>>>> v2 -> v3:
>>>>>> - added missing changelog (oops)
>>>>>> ---
>>>>>> include/uapi/linux/mptcp.h | 16 ++++++++
>>>>>> net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 91 insertions(+)
>>>>>>
>>>>>> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
>>>>>> index 32af2d278cb4..f4f42d88e58b 100644
>>>>>> --- a/include/uapi/linux/mptcp.h
>>>>>> +++ b/include/uapi/linux/mptcp.h
>>>>>> @@ -12,6 +12,7 @@
>>>>>> #include <linux/in.h> /* for sockaddr_in */
>>>>>> #include <linux/in6.h> /* for sockaddr_in6 */
>>>>>> #include <linux/socket.h> /* for sockaddr_storage and sa_family */
>>>>>> +#include <linux/tcp.h> /* for tcp_info */
>>>>>>
>>>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
>>>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
>>>>>> @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
>>>>>> };
>>>>>> };
>>>>>>
>>>>>> +struct mptcp_subflow_info {
>>>>>> + __u32 id;
>>>>>> + struct mptcp_subflow_addrs addrs;
>>>>>> +};
>>>>>> +
>>>>>> +/* struct subflow_info is not supposed nor allowed to grow in
>>>>>> + * future versions.
>>>>>> + * If need will arise, a new socket option should be added.
>>>>>> + */
>>>>>> +struct mptcp_subflow_full_info {
>>>>>> + struct mptcp_subflow_info subflow_info;
>>>>>> + struct tcp_info tcp_info;
>>>>>> +};
>>>>>
>>>>> I dislike this constraint.
>>>>>
>>>>> Why not do something like this:
>>>>>
>>>>> struct mptcp_subflow_full_info {
>>>>> __u32 size_subflow_full_info; /* size of this structure in userspace */
>>>>> __u32 num_subflows_user; /* max subflows that userspace is interested in */
>>>>> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
>>>>> __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
>>>>> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
>>>>> __u32 size_sfinfo_user;
>>>>> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
>>>>> __u32 size_tcpinfo_user;
>>>>> __aligned_u64 subflow_info_addr;
>>>>> __aligned_u64 tcp_info_addr;
>>>>> };
>>>>>
>>>>> userspace does:
>>>>>
>>>>> x = calloc(sizeof(struct mptcp_subflow_info), 42);
>>>>> y = calloc(sizeof(struct tcp_info), 42);
>>>>>
>>>>> struct mptcp_subflow_full_info {
>>>>> .size_subflow_full_info = sizeof(struct mptcp_subflow_full_info),
>>>>> .num_subflows_user = 42,
>>>>> .size_sfinfo_user = sizeof(struct mptcp_subflow_info),
>>>>> .size_tcpinfo_user = sizeof(struct mptcp_subflow_info),
>>>>> .subflow_info_addr = (u64)x,
>>>>> .tcp_info_addr = (u64)y,
>>>>> };
>>>>>
>>>>> kernel does:
>>>>> 1. put_user() the real sizes uses by the kernel
>>>>> 2. put_user() the real subflow count
>>>>> 3. copy subflow_info_addr and tcp_info_addr addresses
>>>>> 4. treat as userspace pointers
>>>>>
>>>>> for each subflow, copy tcpinfo and subflow info to the
>>>>> two arrays provided by userspace.
>>>>>
>>>>> Kernel truncates copied data via size_sfinfo_user/size_tcpinfo_user so
>>>>> that new kernel won't populate fields that don't exist in userspace.
>>>>>
>>>>> Kernel stops copying after 'num_subflows_user' or 'num_subflows_kern',
>>>>> whatever comes first and updates num_subflows_copied to the real copied
>>>>> value.
>>>>>
>>>>> This allows userspace to discover when kernel had more subflows
>>>>> but could not place the data due to lack of space in the
>>>>> userspace-provided arrays.
>>>>>
>>>>> This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
>>>>> etc. in the future.
>>>>>
>>>>> What do you think?
>>>>
>>>> I ended-up with the current code trying to re-use as much as possible
>>>> from the existing infra - specifically the
>>>> mptcp_get_subflow_data()/mptcp_put_subflow_data() helpers.
>>>>
>>>> I agree the constraint on mptcp_subflow_full_info is not nice,
>>>> especially if we want to include subflow level indormation alike
>>>> local/remote address id, flags, etc...
>>>>
>>>> Isn't num_subflows_copied always implied? == min(num_subflows_user,
>>>> num_subflows_kern).
>>>>
>>>> Otherwise LGTM, I'll try to have a spin.
>>>
>>> Oops, I forgot... should mptcp_subflow_full_info contain mptcp_info,
>>> too?
>>
>> Good point. Indeed I guess it is very likely someone who want info about
>> MPTCP subflow will also want info about the MPTCP connection.
>>
>>> Overall layout:
>>>
>>> struct mptcp_subflow_full_info {
>>> __u32 size_subflow_full_info; /* size of this structure in userspace */
>>> __u32 num_subflows_user; /* max subflows that userspace is interested in */
>>> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
>>> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_sfinfo_user;
>>> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_tcpinfo_user;
>>> __aligned_u64 subflow_info_addr;
>>> __aligned_u64 tcp_info_addr;
>>> struct mptcp_info mptcp_indo;
>>
>> I guess you will need a size for the mptcp_info structure, no?
>>
>> Maybe this should come before the previous ones?
>
> I'm coding the thing right now: it turns out we are better of not
> including mptcp_info inside mptcp_subflow_full_info, so the
> manipulation schema will look alike the exiting
> mptcp_{get,put}_subflow_data().
(if you are coding this right now, and also linked to my previous email,
we can also continue the discussion on IRC if it is easier for you :) )
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 8:13 ` Matthieu Baerts
@ 2023-05-24 8:53 ` Paolo Abeni
2023-05-24 9:12 ` Matthieu Baerts
2023-05-24 10:04 ` Florian Westphal
1 sibling, 1 reply; 25+ messages in thread
From: Paolo Abeni @ 2023-05-24 8:53 UTC (permalink / raw)
To: Matthieu Baerts, Florian Westphal; +Cc: mptcp
On Wed, 2023-05-24 at 10:13 +0200, Matthieu Baerts wrote:
> Hi Florian,
>
> Thank you for your nice review!
>
> On 23/05/2023 20:25, Florian Westphal wrote:
> > Paolo Abeni <pabeni@redhat.com> wrote:
> > > Some user-space applications want to monitor the subflows utilization.
> > >
> > > Dumping the per subflow tcp_info is not enough, as the PM could close
> > > and re-create the subflows under-the-hood, fooling the accounting.
> > > Even checking the src/dst addresses used by each subflow could not
> > > be enough, because new subflows could re-use the same address/port of
> > > the just closed one.
> > >
> > > This patch introduces a new socket option, allow dumping all the relevant
> > > information all-at-once (everything, everywhere...), in a consistent manner.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > v2 -> v3:
> > > - added missing changelog (oops)
> > > ---
> > > include/uapi/linux/mptcp.h | 16 ++++++++
> > > net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 91 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
> > > index 32af2d278cb4..f4f42d88e58b 100644
> > > --- a/include/uapi/linux/mptcp.h
> > > +++ b/include/uapi/linux/mptcp.h
> > > @@ -12,6 +12,7 @@
> > > #include <linux/in.h> /* for sockaddr_in */
> > > #include <linux/in6.h> /* for sockaddr_in6 */
> > > #include <linux/socket.h> /* for sockaddr_storage and sa_family */
> > > +#include <linux/tcp.h> /* for tcp_info */
> > >
> > > #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
> > > #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
> > > @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
> > > };
> > > };
> > >
> > > +struct mptcp_subflow_info {
> > > + __u32 id;
> > > + struct mptcp_subflow_addrs addrs;
> > > +};
> > > +
> > > +/* struct subflow_info is not supposed nor allowed to grow in
> > > + * future versions.
> > > + * If need will arise, a new socket option should be added.
> > > + */
> > > +struct mptcp_subflow_full_info {
> > > + struct mptcp_subflow_info subflow_info;
> > > + struct tcp_info tcp_info;
> > > +};
> >
> > I dislike this constraint.
> >
> > Why not do something like this:
> >
> > struct mptcp_subflow_full_info {
>
> (If we add mptcp_info (cf Paolo's email), better to strip 'subflow')
>
> > __u32 size_subflow_full_info; /* size of this structure in userspace */
>
> Why do we need this field? Can we not use the option length (int __user
> *optlen) provided by the userspace?
With this we can append arbitrary/variable size mptcp_info at the end.
>
> > __u32 num_subflows_user; /* max subflows that userspace is interested in */
> > __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
> > __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
> > __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_sfinfo_user;
> > __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_tcpinfo_user;
>
> Do we need to have the kernel and user size in different fields?
If we keep, we can re-use the existing code to manipulate this struct.
Also is consistent with other socket option.
> > This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
> > etc. in the future.
>
> If we add mptcp_info structure as mentioned by Paolo, we are getting
> very close the the getsockopt(MPTCP_INFO) from the fork kernel, see the
> section "Get detailed information of your subflows":
>
> https://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP
>
> And the kernel header:
>
> https://github.com/multipath-tcp/mptcp/blob/mptcp_trunk/include/uapi/linux/tcp.h#L321-L366
>
> > What do you think?
>
> If we want to be even more future prove, why not having something more
> flexible where the userspace puts flags of what it wants and then there
> is a kind of TLV structure? Maybe it would be too complex to read/write?
>
>
> struct mptcp_tlv_info {
> __u32 type;
> __u32 length;
> __aligned_u64 value;
> };
>
> struct mptcp_full_info {
> __u32 flags;
> __u32 num_subflows;
> __u32 num_subflows_copied;
> struct mptcp_tlv_info[]; /* number has to be linked to flags */
> };
>
> Maybe it would be good to add a size for the subflow info structure: if
> we use such TLV structure, for info around subflows, we will need to set
> at least the ID for each subflow (maybe more later). We might then need:
>
> struct mptcp_full_info {
> (...)
> __u32 size_mptcp_subflow_info;
> (...)
> };
>
> struct mptcp_subflow_info {
> __u32 id;
> };
>
> struct mptcp_subflow_addrs {
> struct mptcp_subflow_info info;
> struct mptcp_subflow_addrs addrs;
> };
>
> struct mptcp_subflow_tcp_info {
> struct mptcp_subflow_info info;
> struct tcp_info tcp_info;
> };
>
> Again, just a quick idea, I didn't investigate more. With that, we will
> likely not need more getsockopt() to get info about the MPTCP connection
> and its subflows, just an all in one getsockopt().
>
> I'm just trying to reacting quickly before going into a direction :)
Uhm... I think TLV are not in the right direction: away from the
current convention for variable size/changing struct. If use TLV we
could as well move to NL, which will be again inconsistent to fetch
such kind of info IMHO.
I think it's preferable to keep the sockopt API self-consistent then be
aligned with out-of-tree.
The layout proposed by Florian fits what we already have.
With some little field re-order I can avoid code duplication to
parse/store the initial header.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 8:53 ` Paolo Abeni
@ 2023-05-24 9:12 ` Matthieu Baerts
0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 9:12 UTC (permalink / raw)
To: Paolo Abeni, Florian Westphal; +Cc: mptcp
Hi Paolo,
On 24/05/2023 10:53, Paolo Abeni wrote:
> On Wed, 2023-05-24 at 10:13 +0200, Matthieu Baerts wrote:
>> Hi Florian,
>>
>> Thank you for your nice review!
>>
>> On 23/05/2023 20:25, Florian Westphal wrote:
>>> Paolo Abeni <pabeni@redhat.com> wrote:
>>>> Some user-space applications want to monitor the subflows utilization.
>>>>
>>>> Dumping the per subflow tcp_info is not enough, as the PM could close
>>>> and re-create the subflows under-the-hood, fooling the accounting.
>>>> Even checking the src/dst addresses used by each subflow could not
>>>> be enough, because new subflows could re-use the same address/port of
>>>> the just closed one.
>>>>
>>>> This patch introduces a new socket option, allow dumping all the relevant
>>>> information all-at-once (everything, everywhere...), in a consistent manner.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>> v2 -> v3:
>>>> - added missing changelog (oops)
>>>> ---
>>>> include/uapi/linux/mptcp.h | 16 ++++++++
>>>> net/mptcp/sockopt.c | 75 ++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 91 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h
>>>> index 32af2d278cb4..f4f42d88e58b 100644
>>>> --- a/include/uapi/linux/mptcp.h
>>>> +++ b/include/uapi/linux/mptcp.h
>>>> @@ -12,6 +12,7 @@
>>>> #include <linux/in.h> /* for sockaddr_in */
>>>> #include <linux/in6.h> /* for sockaddr_in6 */
>>>> #include <linux/socket.h> /* for sockaddr_storage and sa_family */
>>>> +#include <linux/tcp.h> /* for tcp_info */
>>>>
>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0)
>>>> #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1)
>>>> @@ -244,9 +245,24 @@ struct mptcp_subflow_addrs {
>>>> };
>>>> };
>>>>
>>>> +struct mptcp_subflow_info {
>>>> + __u32 id;
>>>> + struct mptcp_subflow_addrs addrs;
>>>> +};
>>>> +
>>>> +/* struct subflow_info is not supposed nor allowed to grow in
>>>> + * future versions.
>>>> + * If need will arise, a new socket option should be added.
>>>> + */
>>>> +struct mptcp_subflow_full_info {
>>>> + struct mptcp_subflow_info subflow_info;
>>>> + struct tcp_info tcp_info;
>>>> +};
>>>
>>> I dislike this constraint.
>>>
>>> Why not do something like this:
>>>
>>> struct mptcp_subflow_full_info {
>>
>> (If we add mptcp_info (cf Paolo's email), better to strip 'subflow')
>>
>>> __u32 size_subflow_full_info; /* size of this structure in userspace */
>>
>> Why do we need this field? Can we not use the option length (int __user
>> *optlen) provided by the userspace?
>
> With this we can append arbitrary/variable size mptcp_info at the end.
I guess either we use a dedicated field for mptcp_info or we infer its
value by looking at size_full_info or optlen. It looks clearer to me
from the userspace point of view to have a dedicated size for mptcp_info
than giving twice sizeof(mptcp_subflow_tcp_info) but I'm probably
missing something :)
>>> __u32 num_subflows_user; /* max subflows that userspace is interested in */
>>> __u32 num_subflows_kern; /* must be 0, set by kernel (real subflow count) */
>>> __u32 num_subflows_copied; /* must be 0, set by kernel (number of subflow infos copied back)
>>> __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_sfinfo_user;
>>> __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
>>> __u32 size_tcpinfo_user;
>>
>> Do we need to have the kernel and user size in different fields?
>
> If we keep, we can re-use the existing code to manipulate this struct.
> Also is consistent with other socket option.
If we don't need them, we don't need to be consistent, especially if we
plan to deprecate the MPTCP_TCP_INFO and MPTCP_SUBFLOW_ADDRS sockopt at
some points :)
But if it allows to re-use the existing code and not deprecate the two
sockopt, I understand we want to do that!
>>> This is more complicated but no need to add new MPTCP_FULL_INFO_V2/V3/V4
>>> etc. in the future.
>>
>> If we add mptcp_info structure as mentioned by Paolo, we are getting
>> very close the the getsockopt(MPTCP_INFO) from the fork kernel, see the
>> section "Get detailed information of your subflows":
>>
>> https://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP
>>
>> And the kernel header:
>>
>> https://github.com/multipath-tcp/mptcp/blob/mptcp_trunk/include/uapi/linux/tcp.h#L321-L366
>>
>>> What do you think?
>>
>> If we want to be even more future prove, why not having something more
>> flexible where the userspace puts flags of what it wants and then there
>> is a kind of TLV structure? Maybe it would be too complex to read/write?
>>
>>
>> struct mptcp_tlv_info {
>> __u32 type;
>> __u32 length;
>> __aligned_u64 value;
>> };
>>
>> struct mptcp_full_info {
>> __u32 flags;
>> __u32 num_subflows;
>> __u32 num_subflows_copied;
>> struct mptcp_tlv_info[]; /* number has to be linked to flags */
>> };
>>
>> Maybe it would be good to add a size for the subflow info structure: if
>> we use such TLV structure, for info around subflows, we will need to set
>> at least the ID for each subflow (maybe more later). We might then need:
>>
>> struct mptcp_full_info {
>> (...)
>> __u32 size_mptcp_subflow_info;
>> (...)
>> };
>>
>> struct mptcp_subflow_info {
>> __u32 id;
>> };
>>
>> struct mptcp_subflow_addrs {
>> struct mptcp_subflow_info info;
>> struct mptcp_subflow_addrs addrs;
>> };
>>
>> struct mptcp_subflow_tcp_info {
>> struct mptcp_subflow_info info;
>> struct tcp_info tcp_info;
>> };
>>
>> Again, just a quick idea, I didn't investigate more. With that, we will
>> likely not need more getsockopt() to get info about the MPTCP connection
>> and its subflows, just an all in one getsockopt().
>>
>> I'm just trying to reacting quickly before going into a direction :)
>
> Uhm... I think TLV are not in the right direction: away from the
> current convention for variable size/changing struct. If use TLV we
> could as well move to NL, which will be again inconsistent to fetch
> such kind of info IMHO.
>
> I think it's preferable to keep the sockopt API self-consistent then be
> aligned with out-of-tree.
>
> The layout proposed by Florian fits what we already have.
>
> With some little field re-order I can avoid code duplication to
> parse/store the initial header.
I see, thank you for the explanation!
I was suggesting the TLV structure because maybe the userspace doesn't
need everything, e.g. the different addresses, maybe just the ID per
subflow is enough? But because we only copy info and we don't compute
stuff, it is fine to receive everything.
Also the other advantage is to be able to get more info but I suppose we
can:
- add more stuff in mptcp_info for the MPTCP connection
- add more stuff in mptcp_subflow_info for per subflow info.
So as long as we keep a "generic" subflow info structure like you did in
your patch v3 2/6 and not specific to the addresses only, I think
indeed, we don't need TLVs.
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 6:34 ` Paolo Abeni
2023-05-24 6:36 ` Paolo Abeni
@ 2023-05-24 9:56 ` Florian Westphal
2023-05-24 12:04 ` Matthieu Baerts
1 sibling, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2023-05-24 9:56 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Florian Westphal, mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> Isn't num_subflows_copied always implied? == min(num_subflows_user,
> num_subflows_kern).
Right, userspace can infer the correct number, i.e.
num_subflows_kern would always be the "actual" used number.
num_subflows_user > num_subflows_kern:
Userspace can safely access "num_subflows_kern" entries
(contain no garbage).
num_subflows_user < num_subflows_kern:
Userspace can safely access "num_subflows_user" entries
(info arrays did not have enough space for all subflows).
I added the "num_subflows_copied" for ease-of-use on userspace
side but I agree that its not required and userspace has
enough info to figure out which upper cap to use.
> Otherwise LGTM, I'll try to have a spin.
Thanks Paolo.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 8:13 ` Matthieu Baerts
2023-05-24 8:53 ` Paolo Abeni
@ 2023-05-24 10:04 ` Florian Westphal
2023-05-24 12:08 ` Matthieu Baerts
1 sibling, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2023-05-24 10:04 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: Florian Westphal, Paolo Abeni, mptcp
Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
> On 23/05/2023 20:25, Florian Westphal wrote:
> > __u32 size_subflow_full_info; /* size of this structure in userspace */
>
> Why do we need this field? Can we not use the option length (int __user
> *optlen) provided by the userspace?
Right, yes, we can (and should) use __user *optlen instead
of this explicit member.
> > __u32 size_sfinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_sfinfo_user;
> > __u32 size_tcpinfo_kernel; /* must be 0, set by kernel */
> > __u32 size_tcpinfo_user;
>
> Do we need to have the kernel and user size in different fields?
> Is it not enough have:
>
> num_subflows; /* will be set by the kernel */
> num_subflows_copied; /* max for the user, kernel put what it did */
> size_sfinfo; /* user set it, kernel put what it used: min */
> size_tcpinfo; /* same */
Yes, that would work too: kernel updates value to what its using.
I have no preference. I used extra fields to better document
that this is filled in by the kernel.
I'll let Paolo make the call.
> If we add mptcp_info structure as mentioned by Paolo, we are getting
> very close the the getsockopt(MPTCP_INFO) from the fork kernel, see the
> section "Get detailed information of your subflows":
>
> https://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP
>
> And the kernel header:
>
> https://github.com/multipath-tcp/mptcp/blob/mptcp_trunk/include/uapi/linux/tcp.h#L321-L366
Right, this is similar.
> If we want to be even more future prove, why not having something more
> flexible where the userspace puts flags of what it wants and then there
> is a kind of TLV structure? Maybe it would be too complex to read/write?
>
>
> struct mptcp_tlv_info {
> __u32 type;
> __u32 length;
> __aligned_u64 value;
> };
>
> struct mptcp_full_info {
> __u32 flags;
> __u32 num_subflows;
> __u32 num_subflows_copied;
> struct mptcp_tlv_info[]; /* number has to be linked to flags */
> };
Hmm, I think it would be better to reuse an existing scheme
and e.g. return a netlink attribute blob in such a case,
this would also allow sharing with the diag interface/functions.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters.
2023-05-24 8:15 ` Paolo Abeni
@ 2023-05-24 10:05 ` Florian Westphal
0 siblings, 0 replies; 25+ messages in thread
From: Florian Westphal @ 2023-05-24 10:05 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Florian Westphal, mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> > __aligned_u64 is preferred for uapi.
>
> Ouch! We already have a few __u64 in the same struct.
>
> Should we mix __u64 and __aligned ones? By pure luck, the both existing
> __u64 and the new one should be aligned, even without the annotation.
I'd leave the existing fields as-is, changing this now doesn't buy
us anything.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 9:56 ` Florian Westphal
@ 2023-05-24 12:04 ` Matthieu Baerts
2023-05-24 12:22 ` Florian Westphal
0 siblings, 1 reply; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 12:04 UTC (permalink / raw)
To: Florian Westphal, Paolo Abeni; +Cc: mptcp
Hi Florian, Paolo,
On 24/05/2023 11:56, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> Isn't num_subflows_copied always implied? == min(num_subflows_user,
>> num_subflows_kern).
>
> Right, userspace can infer the correct number, i.e.
> num_subflows_kern would always be the "actual" used number.
>
> num_subflows_user > num_subflows_kern:
>
> Userspace can safely access "num_subflows_kern" entries
> (contain no garbage).
>
> num_subflows_user < num_subflows_kern:
> Userspace can safely access "num_subflows_user" entries
> (info arrays did not have enough space for all subflows).
>
> I added the "num_subflows_copied" for ease-of-use on userspace
> side but I agree that its not required and userspace has
> enough info to figure out which upper cap to use.
If I understand properly, it means that num_subflows_user and _kern will
only be used to know the size of the arrays:
- num_subflows_user: size of the arrays in the buffer
- num_subflows_kern: what has been written by the kernel, max _user
I think it is still important to have somewhere a num_subflows field to
know how many subflows are attached to an MPTCP connection (also for the
userspace to know if it missed some info because it gave a too small
buffer).
"struct mptcp_info" has a "subflows" field but only counting the
additional subflows so you never know the actual number of subflows :)
Or we add a num_subflows in "struct mptcp_info"?
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 10:04 ` Florian Westphal
@ 2023-05-24 12:08 ` Matthieu Baerts
0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 12:08 UTC (permalink / raw)
To: Florian Westphal; +Cc: Paolo Abeni, mptcp
Hi Florian,
Thank you for your replies!
On 24/05/2023 12:04, Florian Westphal wrote:
> Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
>> If we want to be even more future prove, why not having something more
>> flexible where the userspace puts flags of what it wants and then there
>> is a kind of TLV structure? Maybe it would be too complex to read/write?
>>
>>
>> struct mptcp_tlv_info {
>> __u32 type;
>> __u32 length;
>> __aligned_u64 value;
>> };
>>
>> struct mptcp_full_info {
>> __u32 flags;
>> __u32 num_subflows;
>> __u32 num_subflows_copied;
>> struct mptcp_tlv_info[]; /* number has to be linked to flags */
>> };
>
> Hmm, I think it would be better to reuse an existing scheme
> and e.g. return a netlink attribute blob in such a case,
> this would also allow sharing with the diag interface/functions.
Ah yes, good idea, I didn't think about that.
But I don't know what's best for the userspace: parsing netlink blob or
a specific structure with a size that can be expended. Netlink blob
sounds more future prove and probably easier to maintain (if it is OK to
use that with getsockopt()).
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 12:04 ` Matthieu Baerts
@ 2023-05-24 12:22 ` Florian Westphal
2023-05-24 12:37 ` Matthieu Baerts
0 siblings, 1 reply; 25+ messages in thread
From: Florian Westphal @ 2023-05-24 12:22 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: Florian Westphal, Paolo Abeni, mptcp
Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
> > I added the "num_subflows_copied" for ease-of-use on userspace
> > side but I agree that its not required and userspace has
> > enough info to figure out which upper cap to use.
>
> If I understand properly, it means that num_subflows_user and _kern will
> only be used to know the size of the arrays:
>
> - num_subflows_user: size of the arrays in the buffer
> - num_subflows_kern: what has been written by the kernel, max _user
This is why I had the '_copied' in my proposal :-)
With only _user and _kern:
_user: size of the arrays in the buffer
_kern: subflow count of the kernel, i.e. can be larger than _user.
> I think it is still important to have somewhere a num_subflows field to
> know how many subflows are attached to an MPTCP connection (also for the
> userspace to know if it missed some info because it gave a too small
> buffer).
Yes, userspace must be able to infer that it only got partial
information due to lack of space.
E.g. if userspace expects a maximum amount of 16 subflows but connection
has 240 subflows then num_subflows_kern is 240 and _user is 16.
Userspace can only access up to 16 (since thats what it
provided/allocated).
The _copied proposal would have made this easier on userspace side
(_user is the allocated space, _kern what the kernel would like (real
count), and _copied what was written out).
But as Paolo said userspace can infer the valid count by checking
if _kern > _user, so it does:
bool truncated = kern > user;
subflows_copied = truncated ? user : kern;
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt
2023-05-24 12:22 ` Florian Westphal
@ 2023-05-24 12:37 ` Matthieu Baerts
0 siblings, 0 replies; 25+ messages in thread
From: Matthieu Baerts @ 2023-05-24 12:37 UTC (permalink / raw)
To: Florian Westphal; +Cc: Paolo Abeni, mptcp
Hi Florian,
On 24/05/2023 14:22, Florian Westphal wrote:
> Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
>>> I added the "num_subflows_copied" for ease-of-use on userspace
>>> side but I agree that its not required and userspace has
>>> enough info to figure out which upper cap to use.
>>
>> If I understand properly, it means that num_subflows_user and _kern will
>> only be used to know the size of the arrays:
>>
>> - num_subflows_user: size of the arrays in the buffer
>> - num_subflows_kern: what has been written by the kernel, max _user
>
> This is why I had the '_copied' in my proposal :-)
>
> With only _user and _kern:
>
> _user: size of the arrays in the buffer
> _kern: subflow count of the kernel, i.e. can be larger than _user.
Thank you, that's clearer, so _kern would be the subflow count, not just
what has been copied.
>> I think it is still important to have somewhere a num_subflows field to
>> know how many subflows are attached to an MPTCP connection (also for the
>> userspace to know if it missed some info because it gave a too small
>> buffer).
>
> Yes, userspace must be able to infer that it only got partial
> information due to lack of space.
>
> E.g. if userspace expects a maximum amount of 16 subflows but connection
> has 240 subflows then num_subflows_kern is 240 and _user is 16.
>
> Userspace can only access up to 16 (since thats what it
> provided/allocated).
>
> The _copied proposal would have made this easier on userspace side
> (_user is the allocated space, _kern what the kernel would like (real
> count), and _copied what was written out).
>
> But as Paolo said userspace can infer the valid count by checking
> if _kern > _user, so it does:
>
> bool truncated = kern > user;
> subflows_copied = truncated ? user : kern;
OK then it is fine if we can use min(_user, _kern) to know the number of
subflows written by the kernel and '_kern' for the actual number of
subflows attached to an MPTCP connection.
Maybe we should rename 'num_subflows_kern' to 'num_subflows'? and even
'num_subflows_user' to 'num_subflows_buffer' or 'size_array(s)'?
Or maybe enough with the comments from the example you provided?
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2023-05-24 12:37 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-23 17:37 [PATCH v3 mptcp-next 0/6] mptcp: add some more diag info Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 1/6] mptcp: add subflow unique id Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 2/6] mptcp: introduce MPTCP_FULL_INFO getsockopt Paolo Abeni
2023-05-23 18:25 ` Florian Westphal
2023-05-24 6:34 ` Paolo Abeni
2023-05-24 6:36 ` Paolo Abeni
2023-05-24 7:36 ` Matthieu Baerts
2023-05-24 8:11 ` Paolo Abeni
2023-05-24 8:16 ` Matthieu Baerts
2023-05-24 9:56 ` Florian Westphal
2023-05-24 12:04 ` Matthieu Baerts
2023-05-24 12:22 ` Florian Westphal
2023-05-24 12:37 ` Matthieu Baerts
2023-05-24 8:13 ` Matthieu Baerts
2023-05-24 8:53 ` Paolo Abeni
2023-05-24 9:12 ` Matthieu Baerts
2023-05-24 10:04 ` Florian Westphal
2023-05-24 12:08 ` Matthieu Baerts
2023-05-23 17:37 ` [PATCH v3 mptcp-next 3/6] mptcp: move snd_una update earlier for fallback socket Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 4/6] mptcp: track some aggregate data counters Paolo Abeni
2023-05-23 18:49 ` Florian Westphal
2023-05-24 8:15 ` Paolo Abeni
2023-05-24 10:05 ` Florian Westphal
2023-05-23 17:37 ` [PATCH v3 mptcp-next 5/6] selftests: mptcp: explicitly tests aggregate counters Paolo Abeni
2023-05-23 17:37 ` [PATCH v3 mptcp-next 6/6] selftests: mptcp: add MPTCP_FULL_INFO testcase Paolo Abeni
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.