* [PATCH mptcp-next v14 0/8] implement mptcp read_sock
@ 2025-11-24 9:12 Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 1/8] mptcp: add eat_recv_skb helper Geliang Tang
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
v14:
- Patch 2, new helper __mptcp_read_sock() with noack parameter,
this makes it more similar to __tcp_read_sock() and also prepares
for the use of mptcp_read_sock_noack() in MPTCP KTLS support. Also
invoke msk_owned_by_me() in it to make sure socket was locked.
- Patch 5, export tcp_splice_data_recv() as Paolo suggested in v7.
- Patch 6, drop mptcp_splice_data_recv().
v13:
- rebase on "mptcp: introduce backlog processing" v6
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1761198660.git.geliang@kernel.org/
v12:
- rebase on "mptcp: receive path improvement" 1-7.
- some cleanups.
v11:
- drop "tcp: drop release and lock again in splice_read", and add this
release and lock again in mptcp_splice_read too. (Thanks Mat, I didn't
understand the intent of this code before.)
- call mptcp_rps_record_subflows() in mptcp_splice_read as Mat
suggested.
v10:
- add an offset parameter for mptcp_recv_skb and make it more like
tcp_recv_skb.
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1756780274.git.tanggeliang@kylinos.cn/
v9:
- merge the squash-to patches.
- a new patch "drop release and lock again in splice_read".
- Link: https://patchwork.kernel.org/project/mptcp/cover/cover.1752399660.git.tanggeliang@kylinos.cn/
v8:
- export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h.
- add a new helper mptcp_recv_should_stop.
- add mptcp_connect_splice.sh.
- update commit logs.
v7:
- only patch 1 and patch 2 changed.
- add a new helper mptcp_eat_recv_skb.
- invoke skb_peek in mptcp_recv_skb().
- use while ((skb = mptcp_recv_skb(sk)) != NULL) instead of
skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp).
v6:
- address Paolo's comments for v4, v5 (thanks)
v5:
- extract the common code of __mptcp_recvmsg_mskq() and mptcp_read_sock()
into a new helper __mptcp_recvmsg_desc() to reduce duplication code.
v4:
- v3 doesn't work for MPTCP fallback tests in mptcp_connect.sh, this
set fix it.
- invoke __mptcp_move_skbs in mptcp_read_sock.
- use INDIRECT_CALL_INET_1 in __tcp_splice_read.
v3:
- merge the two squash-to patches.
- use sk->sk_rcvbuf instead of INT_MAX as the max len in
mptcp_read_sock().
- add splice io mode for mptcp_connect and drop mptcp_splice.c test.
- the splice test for packetdrill is also added here:
https://github.com/multipath-tcp/packetdrill/pull/162
v2:
- set splice_read of mptcp
- add a splice selftest.
I have good news! I recently added MPTCP support to "NVME over TCP".
And my RFC patches are under review by NVME maintainer Hannes.
Replacing "NVME over TCP" with MPTCP is very simple. I used IPPROTO_MPTCP
instead of IPPROTO_TCP to create MPTCP sockets on both target and host
sides, these sockets are created in Kernel space.
nvmet_tcp_add_port:
ret = sock_create(port->addr.ss_family, SOCK_STREAM,
IPPROTO_MPTCP, &port->sock);
nvme_tcp_alloc_queue:
ret = sock_create_kern(current->nsproxy->net_ns,
ctrl->addr.ss_family, SOCK_STREAM,
IPPROTO_MPTCP, &queue->sock);
nvme_tcp_try_recv() needs to call .read_sock interface of struct
proto_ops, but it is not implemented in MPTCP. So I implemented it
with reference to __mptcp_recvmsg_mskq().
Since the NVME part patches are still under reviewing, I only send the
MPTCP part patches in this set to MPTCP ML for your opinions.
Geliang Tang (8):
mptcp: add eat_recv_skb helper
mptcp: implement .read_sock
tcp: add recv_should_stop helper
mptcp: use recv_should_stop helper
tcp: export tcp_splice_state
mptcp: implement .splice_read
selftests: mptcp: add splice io mode
selftests: mptcp: connect: cover splice mode
include/net/tcp.h | 34 +++
net/ipv4/tcp.c | 86 ++-----
net/mptcp/protocol.c | 236 +++++++++++++++---
tools/testing/selftests/net/mptcp/Makefile | 1 +
.../selftests/net/mptcp/mptcp_connect.c | 63 ++++-
.../net/mptcp/mptcp_connect_splice.sh | 5 +
6 files changed, 315 insertions(+), 110 deletions(-)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 1/8] mptcp: add eat_recv_skb helper
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock Geliang Tang
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch extracts the free skb related code in __mptcp_recvmsg_mskq()
into a new helper mptcp_eat_recv_skb().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ad20b087e9a9..b29790a7cea8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1995,6 +1995,17 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied);
+static void mptcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
+{
+ /* avoid the indirect call, we know the destructor is sock_rfree */
+ skb->destructor = NULL;
+ skb->sk = NULL;
+ atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
+ sk_mem_uncharge(sk, skb->truesize);
+ __skb_unlink(skb, &sk->sk_receive_queue);
+ skb_attempt_defer_free(skb);
+}
+
static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int copied_total,
struct scm_timestamping_internal *tss,
@@ -2049,13 +2060,7 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
break;
}
- /* avoid the indirect call, we know the destructor is sock_rfree */
- skb->destructor = NULL;
- skb->sk = NULL;
- atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
- sk_mem_uncharge(sk, skb->truesize);
- __skb_unlink(skb, &sk->sk_receive_queue);
- skb_attempt_defer_free(skb);
+ mptcp_eat_recv_skb(sk, skb);
}
if (copied >= len)
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 1/8] mptcp: add eat_recv_skb helper Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-12-06 0:17 ` Mat Martineau
2025-11-24 9:12 ` [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper Geliang Tang
` (6 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Hannes Reinecke
From: Geliang Tang <tanggeliang@kylinos.cn>
nvme_tcp_try_recv() needs to call .read_sock interface of struct
proto_ops, but it's not implemented in MPTCP.
This patch implements it with reference to __tcp_read_sock() and
__mptcp_recvmsg_mskq().
Corresponding to tcp_recv_skb(), a new helper for MPTCP named
mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.
Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
sk->sk_rcvbuf as the max read length. The LISTEN status is checked
before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
are invoked after the loop. In the loop, all flags checks for
__mptcp_recvmsg_mskq() are removed.
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b29790a7cea8..75f90f5fc2a3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -4296,6 +4296,90 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
return mask;
}
+static struct sk_buff *mptcp_recv_skb(struct sock *sk, u32 *off)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sk_buff *skb;
+ u32 offset;
+
+ if (!list_empty(&msk->backlog_list))
+ mptcp_move_skbs(sk);
+
+ while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
+ offset = MPTCP_SKB_CB(skb)->offset;
+ if (offset < skb->len) {
+ *off = offset;
+ return skb;
+ }
+ mptcp_eat_recv_skb(sk, skb);
+ }
+ return NULL;
+}
+
+/*
+ * Note:
+ * - It is assumed that the socket was locked by the caller.
+ */
+static int __mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor, bool noack)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ size_t len = sk->sk_rcvbuf;
+ struct sk_buff *skb;
+ int copied = 0;
+ u32 offset;
+
+ msk_owned_by_me(msk);
+
+ if (sk->sk_state == TCP_LISTEN)
+ return -ENOTCONN;
+ while ((skb = mptcp_recv_skb(sk, &offset)) != NULL) {
+ u32 data_len = skb->len - offset;
+ int count;
+ u32 size;
+
+ size = min_t(size_t, len - copied, data_len);
+ count = recv_actor(desc, skb, offset, size);
+ if (count <= 0) {
+ if (!copied)
+ copied = count;
+ break;
+ }
+
+ copied += count;
+
+ msk->bytes_consumed += count;
+ if (count < data_len) {
+ MPTCP_SKB_CB(skb)->offset += count;
+ MPTCP_SKB_CB(skb)->map_seq += count;
+ break;
+ }
+
+ mptcp_eat_recv_skb(sk, skb);
+
+ if (copied >= len)
+ break;
+ }
+
+ if (noack)
+ goto out;
+
+ mptcp_rcv_space_adjust(msk, copied);
+
+ if (copied > 0) {
+ mptcp_recv_skb(sk, &offset);
+ mptcp_cleanup_rbuf(msk, copied);
+ }
+out:
+ return copied;
+}
+
+static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor)
+{
+ return __mptcp_read_sock(sk, desc, recv_actor, false);
+}
+
static const struct proto_ops mptcp_stream_ops = {
.family = PF_INET,
.owner = THIS_MODULE,
@@ -4316,6 +4400,7 @@ static const struct proto_ops mptcp_stream_ops = {
.recvmsg = inet_recvmsg,
.mmap = sock_no_mmap,
.set_rcvlowat = mptcp_set_rcvlowat,
+ .read_sock = mptcp_read_sock,
};
static struct inet_protosw mptcp_protosw = {
@@ -4420,6 +4505,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
.compat_ioctl = inet6_compat_ioctl,
#endif
.set_rcvlowat = mptcp_set_rcvlowat,
+ .read_sock = mptcp_read_sock,
};
static struct proto mptcp_v6_prot;
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 1/8] mptcp: add eat_recv_skb helper Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-12-06 0:20 ` Mat Martineau
2025-12-10 1:30 ` Mat Martineau
2025-11-24 9:12 ` [PATCH mptcp-next v14 4/8] mptcp: use " Geliang Tang
` (5 subsequent siblings)
8 siblings, 2 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Paolo Abeni
From: Geliang Tang <tanggeliang@kylinos.cn>
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
and tcp_splice_read() to check whether to stop receiving. It will be used
for MPTCP too.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
include/net/tcp.h | 23 +++++++++++++++
net/ipv4/tcp.c | 73 ++++++++---------------------------------------
2 files changed, 35 insertions(+), 61 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0deb5e9dd911..94a50a9daf65 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2942,4 +2942,27 @@ enum skb_drop_reason tcp_inbound_hash(struct sock *sk,
const void *saddr, const void *daddr,
int family, int dif, int sdif);
+static inline int tcp_recv_should_stop(struct sock *sk, long timeo)
+{
+ if (sock_flag(sk, SOCK_DONE))
+ return -ENETDOWN;
+
+ if (sk->sk_err)
+ return sock_error(sk);
+
+ if (sk->sk_shutdown & RCV_SHUTDOWN)
+ return -ESHUTDOWN;
+
+ if (sk->sk_state == TCP_CLOSE)
+ return -ENOTCONN;
+
+ if (!timeo)
+ return -EAGAIN;
+
+ if (signal_pending(current))
+ return sock_intr_errno(timeo);
+
+ return 0;
+}
+
#endif /* _TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5265a9453814..8b4917a7f262 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -842,26 +842,14 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
if (ret < 0)
break;
else if (!ret) {
+ int err;
+
if (spliced)
break;
- if (sock_flag(sk, SOCK_DONE))
- break;
- if (sk->sk_err) {
- ret = sock_error(sk);
- break;
- }
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
- if (sk->sk_state == TCP_CLOSE) {
- /*
- * This occurs when user tries to read
- * from never connected socket.
- */
- ret = -ENOTCONN;
- break;
- }
- if (!timeo) {
- ret = -EAGAIN;
+ err = tcp_recv_should_stop(sk, timeo);
+ if (err < 0) {
+ if (err != -ENETDOWN && err != -ESHUTDOWN)
+ ret = err;
break;
}
/* if __tcp_splice_read() got nothing while we have
@@ -873,10 +861,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
ret = sk_wait_data(sk, &timeo, NULL);
if (ret < 0)
break;
- if (signal_pending(current)) {
- ret = sock_intr_errno(timeo);
- break;
- }
continue;
}
tss.len -= ret;
@@ -887,9 +871,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
release_sock(sk);
lock_sock(sk);
- if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- signal_pending(current))
+ if (tcp_recv_should_stop(sk, timeo))
break;
}
@@ -2732,42 +2714,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
break;
- if (copied) {
- if (!timeo ||
- sk->sk_err ||
- sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- signal_pending(current))
- break;
- } else {
- if (sock_flag(sk, SOCK_DONE))
- break;
-
- if (sk->sk_err) {
- copied = sock_error(sk);
- break;
- }
-
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
-
- if (sk->sk_state == TCP_CLOSE) {
- /* This occurs when user tries to read
- * from never connected socket.
- */
- copied = -ENOTCONN;
- break;
- }
-
- if (!timeo) {
- copied = -EAGAIN;
- break;
- }
-
- if (signal_pending(current)) {
- copied = sock_intr_errno(timeo);
- break;
- }
+ err = tcp_recv_should_stop(sk, timeo);
+ if (err < 0) {
+ if (!copied && err != -ENETDOWN && err != -ESHUTDOWN)
+ copied = err;
+ break;
}
if (copied >= target) {
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 4/8] mptcp: use recv_should_stop helper
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (2 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 5/8] tcp: export tcp_splice_state Geliang Tang
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
Use the newly added tcp_recv_should_stop() helper in mptcp_recvmsg() to
check whether to stop receiving.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.c | 35 +++++------------------------------
1 file changed, 5 insertions(+), 30 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 75f90f5fc2a3..9302b372910e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2296,36 +2296,11 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
if (copied >= target)
break;
- if (copied) {
- if (sk->sk_err ||
- sk->sk_state == TCP_CLOSE ||
- (sk->sk_shutdown & RCV_SHUTDOWN) ||
- !timeo ||
- signal_pending(current))
- break;
- } else {
- if (sk->sk_err) {
- copied = sock_error(sk);
- break;
- }
-
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- break;
-
- if (sk->sk_state == TCP_CLOSE) {
- copied = -ENOTCONN;
- break;
- }
-
- if (!timeo) {
- copied = -EAGAIN;
- break;
- }
-
- if (signal_pending(current)) {
- copied = sock_intr_errno(timeo);
- break;
- }
+ err = tcp_recv_should_stop(sk, timeo);
+ if (err < 0) {
+ if (!copied && err != -ESHUTDOWN)
+ copied = err;
+ break;
}
pr_debug("block timeout %ld\n", timeo);
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 5/8] tcp: export tcp_splice_state
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (3 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 4/8] mptcp: use " Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read Geliang Tang
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Paolo Abeni
From: Geliang Tang <tanggeliang@kylinos.cn>
Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h so
that they can be used by MPTCP.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
include/net/tcp.h | 11 +++++++++++
net/ipv4/tcp.c | 13 ++-----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 94a50a9daf65..a5a30b65e0ed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -347,6 +347,15 @@ extern struct proto tcp_prot;
#define TCP_DEC_STATS(net, field) SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
#define TCP_ADD_STATS(net, field, val) SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
+/*
+ * TCP splice context
+ */
+struct tcp_splice_state {
+ struct pipe_inode_info *pipe;
+ size_t len;
+ unsigned int flags;
+};
+
void tcp_tsq_work_init(void);
int tcp_v4_err(struct sk_buff *skb, u32);
@@ -378,6 +387,8 @@ void tcp_rcv_space_adjust(struct sock *sk);
int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
void tcp_twsk_destructor(struct sock *sk);
void tcp_twsk_purge(struct list_head *net_exit_list);
+int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
+ unsigned int offset, size_t len);
ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8b4917a7f262..c96f8fdf271a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -318,15 +318,6 @@ EXPORT_SYMBOL(tcp_have_smc);
struct percpu_counter tcp_sockets_allocated ____cacheline_aligned_in_smp;
EXPORT_IPV6_MOD(tcp_sockets_allocated);
-/*
- * TCP splice context
- */
-struct tcp_splice_state {
- struct pipe_inode_info *pipe;
- size_t len;
- unsigned int flags;
-};
-
/*
* Pressure flag: try to collapse.
* Technical note: it is used by multiple contexts non atomically.
@@ -775,8 +766,8 @@ void tcp_push(struct sock *sk, int flags, int mss_now,
__tcp_push_pending_frames(sk, mss_now, nonagle);
}
-static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
- unsigned int offset, size_t len)
+int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
+ unsigned int offset, size_t len)
{
struct tcp_splice_state *tss = rd_desc->arg.data;
int ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (4 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 5/8] tcp: export tcp_splice_state Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-12-06 0:22 ` Mat Martineau
2025-11-24 9:12 ` [PATCH mptcp-next v14 7/8] selftests: mptcp: add splice io mode Geliang Tang
` (2 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch implements .splice_read interface of mptcp struct proto_ops
as mptcp_splice_read() with reference to tcp_splice_read().
Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined,
invoking mptcp_read_sock() instead of tcp_read_sock().
mptcp_splice_read() is almost the same as tcp_splice_read(), except for
sock_rps_record_flow().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/protocol.c | 96 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9302b372910e..abd0f44ad63f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -4355,6 +4355,100 @@ static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
return __mptcp_read_sock(sk, desc, recv_actor, false);
}
+static int __mptcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
+{
+ /* Store TCP splice context information in read_descriptor_t. */
+ read_descriptor_t rd_desc = {
+ .arg.data = tss,
+ .count = tss->len,
+ };
+
+ return mptcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
+}
+
+/**
+ * mptcp_splice_read - splice data from MPTCP socket to a pipe
+ * @sock: socket to splice from
+ * @ppos: position (not valid)
+ * @pipe: pipe to splice to
+ * @len: number of bytes to splice
+ * @flags: splice modifier flags
+ *
+ * Description:
+ * Will read pages from given socket and fill them into a pipe.
+ *
+ **/
+static ssize_t mptcp_splice_read(struct socket *sock, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ struct tcp_splice_state tss = {
+ .pipe = pipe,
+ .len = len,
+ .flags = flags,
+ };
+ struct sock *sk = sock->sk;
+ ssize_t spliced = 0;
+ int ret = 0;
+ long timeo;
+
+ /*
+ * We can't seek on a socket input
+ */
+ if (unlikely(*ppos))
+ return -ESPIPE;
+
+ lock_sock(sk);
+
+ mptcp_rps_record_subflows(mptcp_sk(sk));
+
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
+ while (tss.len) {
+ ret = __mptcp_splice_read(sk, &tss);
+ if (ret < 0) {
+ break;
+ } else if (!ret) {
+ int err;
+
+ if (spliced)
+ break;
+ err = tcp_recv_should_stop(sk, timeo);
+ if (err < 0) {
+ if (err != -ESHUTDOWN)
+ ret = err;
+ break;
+ }
+ /* if __mptcp_splice_read() got nothing while we have
+ * an skb in receive queue, we do not want to loop.
+ * This might happen with URG data.
+ */
+ if (!skb_queue_empty(&sk->sk_receive_queue))
+ break;
+ ret = sk_wait_data(sk, &timeo, NULL);
+ if (ret < 0)
+ break;
+ continue;
+ }
+ tss.len -= ret;
+ spliced += ret;
+
+ if (!tss.len || !timeo)
+ break;
+ release_sock(sk);
+ lock_sock(sk);
+
+ if (tcp_recv_should_stop(sk, timeo))
+ break;
+ }
+
+ release_sock(sk);
+
+ if (spliced)
+ return spliced;
+
+ return ret;
+}
+
static const struct proto_ops mptcp_stream_ops = {
.family = PF_INET,
.owner = THIS_MODULE,
@@ -4376,6 +4470,7 @@ static const struct proto_ops mptcp_stream_ops = {
.mmap = sock_no_mmap,
.set_rcvlowat = mptcp_set_rcvlowat,
.read_sock = mptcp_read_sock,
+ .splice_read = mptcp_splice_read,
};
static struct inet_protosw mptcp_protosw = {
@@ -4481,6 +4576,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
#endif
.set_rcvlowat = mptcp_set_rcvlowat,
.read_sock = mptcp_read_sock,
+ .splice_read = mptcp_splice_read,
};
static struct proto mptcp_v6_prot;
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 7/8] selftests: mptcp: add splice io mode
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (5 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 8/8] selftests: mptcp: connect: cover splice mode Geliang Tang
2025-11-24 10:39 ` [PATCH mptcp-next v14 0/8] implement mptcp read_sock MPTCP CI
8 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Usage:
./mptcp_connect.sh -m splice
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
.../selftests/net/mptcp/mptcp_connect.c | 63 ++++++++++++++++++-
1 file changed, 62 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 55bbdcf19222..2e50d6f93640 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -51,6 +51,7 @@ enum cfg_mode {
CFG_MODE_POLL,
CFG_MODE_MMAP,
CFG_MODE_SENDFILE,
+ CFG_MODE_SPLICE,
};
enum cfg_peek {
@@ -123,7 +124,7 @@ static void die_usage(void)
fprintf(stderr, "\t-j -- add additional sleep at connection start and tear down "
"-- for MPJ tests\n");
fprintf(stderr, "\t-l -- listens mode, accepts incoming connection\n");
- fprintf(stderr, "\t-m [poll|mmap|sendfile] -- use poll(default)/mmap+write/sendfile\n");
+ fprintf(stderr, "\t-m [poll|mmap|sendfile|splice] -- use poll(default)/mmap+write/sendfile/splice\n");
fprintf(stderr, "\t-M mark -- set socket packet mark\n");
fprintf(stderr, "\t-o option -- test sockopt <option>\n");
fprintf(stderr, "\t-p num -- use port num\n");
@@ -934,6 +935,55 @@ static int copyfd_io_sendfile(int infd, int peerfd, int outfd,
return err;
}
+static int do_splice(const int infd, const int outfd, const size_t len,
+ struct wstate *winfo)
+{
+ int pipefd[2];
+ ssize_t bytes;
+ int err;
+
+ err = pipe(pipefd);
+ if (err)
+ return err;
+
+ while ((bytes = splice(infd, NULL, pipefd[1], NULL,
+ len - winfo->total_len,
+ SPLICE_F_MOVE | SPLICE_F_MORE)) > 0) {
+ splice(pipefd[0], NULL, outfd, NULL, bytes,
+ SPLICE_F_MOVE | SPLICE_F_MORE);
+ }
+
+ close(pipefd[0]);
+ close(pipefd[1]);
+
+ return 0;
+}
+
+static int copyfd_io_splice(int infd, int peerfd, int outfd, unsigned int size,
+ bool *in_closed_after_out, struct wstate *winfo)
+{
+ int err;
+
+ if (listen_mode) {
+ err = do_splice(peerfd, outfd, size, winfo);
+ if (err)
+ return err;
+
+ err = do_splice(infd, peerfd, size, winfo);
+ } else {
+ err = do_splice(infd, peerfd, size, winfo);
+ if (err)
+ return err;
+
+ shut_wr(peerfd);
+
+ err = do_splice(peerfd, outfd, size, winfo);
+ *in_closed_after_out = true;
+ }
+
+ return err;
+}
+
static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct wstate *winfo)
{
bool in_closed_after_out = false;
@@ -966,6 +1016,14 @@ static int copyfd_io(int infd, int peerfd, int outfd, bool close_peerfd, struct
&in_closed_after_out, winfo);
break;
+ case CFG_MODE_SPLICE:
+ file_size = get_infd_size(infd);
+ if (file_size < 0)
+ return file_size;
+ ret = copyfd_io_splice(infd, peerfd, outfd, file_size,
+ &in_closed_after_out, winfo);
+ break;
+
default:
fprintf(stderr, "Invalid mode %d\n", cfg_mode);
@@ -1389,12 +1447,15 @@ int parse_mode(const char *mode)
return CFG_MODE_MMAP;
if (!strcasecmp(mode, "sendfile"))
return CFG_MODE_SENDFILE;
+ if (!strcasecmp(mode, "splice"))
+ return CFG_MODE_SPLICE;
fprintf(stderr, "Unknown test mode: %s\n", mode);
fprintf(stderr, "Supported modes are:\n");
fprintf(stderr, "\t\t\"poll\" - interleaved read/write using poll()\n");
fprintf(stderr, "\t\t\"mmap\" - send entire input file (mmap+write), then read response (-l will read input first)\n");
fprintf(stderr, "\t\t\"sendfile\" - send entire input file (sendfile), then read response (-l will read input first)\n");
+ fprintf(stderr, "\t\t\"splice\" - send entire input file (splice), then read response (-l will read input first)\n");
die_usage();
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH mptcp-next v14 8/8] selftests: mptcp: connect: cover splice mode
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (6 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 7/8] selftests: mptcp: add splice io mode Geliang Tang
@ 2025-11-24 9:12 ` Geliang Tang
2025-11-24 10:39 ` [PATCH mptcp-next v14 0/8] implement mptcp read_sock MPTCP CI
8 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-11-24 9:12 UTC (permalink / raw)
To: mptcp; +Cc: Geliang Tang, Matthieu Baerts
From: Geliang Tang <tanggeliang@kylinos.cn>
The "splice" alternate mode for mptcp_connect.sh/.c is available now,
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
tools/testing/selftests/net/mptcp/Makefile | 1 +
tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh | 5 +++++
2 files changed, 6 insertions(+)
create mode 100755 tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh
diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 15d144a25d82..30d9022119fa 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -10,6 +10,7 @@ TEST_PROGS := \
mptcp_connect_checksum.sh \
mptcp_connect_mmap.sh \
mptcp_connect_sendfile.sh \
+ mptcp_connect_splice.sh \
mptcp_join.sh \
mptcp_sockopt.sh \
pm_netlink.sh \
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh b/tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh
new file mode 100755
index 000000000000..241254a966c9
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect_splice.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+MPTCP_LIB_KSFT_TEST="$(basename "${0}" .sh)" \
+ "$(dirname "${0}")/mptcp_connect.sh" -m splice "${@}"
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 0/8] implement mptcp read_sock
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
` (7 preceding siblings ...)
2025-11-24 9:12 ` [PATCH mptcp-next v14 8/8] selftests: mptcp: connect: cover splice mode Geliang Tang
@ 2025-11-24 10:39 ` MPTCP CI
8 siblings, 0 replies; 15+ messages in thread
From: MPTCP CI @ 2025-11-24 10:39 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19629593335
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/600285bd8e5c
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1026854
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock
2025-11-24 9:12 ` [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock Geliang Tang
@ 2025-12-06 0:17 ` Mat Martineau
2025-12-08 2:14 ` Geliang Tang
0 siblings, 1 reply; 15+ messages in thread
From: Mat Martineau @ 2025-12-06 0:17 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang, Hannes Reinecke
On Mon, 24 Nov 2025, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> nvme_tcp_try_recv() needs to call .read_sock interface of struct
> proto_ops, but it's not implemented in MPTCP.
>
> This patch implements it with reference to __tcp_read_sock() and
> __mptcp_recvmsg_mskq().
>
> Corresponding to tcp_recv_skb(), a new helper for MPTCP named
> mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.
>
> Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
> sk->sk_rcvbuf as the max read length. The LISTEN status is checked
> before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
> are invoked after the loop. In the loop, all flags checks for
> __mptcp_recvmsg_mskq() are removed.
>
> Reviewed-by: Hannes Reinecke <hare@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> net/mptcp/protocol.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index b29790a7cea8..75f90f5fc2a3 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -4296,6 +4296,90 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
> return mask;
> }
>
> +static struct sk_buff *mptcp_recv_skb(struct sock *sk, u32 *off)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sk);
> + struct sk_buff *skb;
> + u32 offset;
> +
> + if (!list_empty(&msk->backlog_list))
> + mptcp_move_skbs(sk);
> +
> + while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> + offset = MPTCP_SKB_CB(skb)->offset;
> + if (offset < skb->len) {
> + *off = offset;
> + return skb;
> + }
> + mptcp_eat_recv_skb(sk, skb);
> + }
> + return NULL;
> +}
> +
> +/*
> + * Note:
> + * - It is assumed that the socket was locked by the caller.
> + */
> +static int __mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t recv_actor, bool noack)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sk);
> + size_t len = sk->sk_rcvbuf;
Hi Geliang!
Why limit the maximum length here? The socket lock is held already so no
new skbs will be added to sk_receive_queue.
- Mat
> + struct sk_buff *skb;
> + int copied = 0;
> + u32 offset;
> +
> + msk_owned_by_me(msk);
> +
> + if (sk->sk_state == TCP_LISTEN)
> + return -ENOTCONN;
> + while ((skb = mptcp_recv_skb(sk, &offset)) != NULL) {
> + u32 data_len = skb->len - offset;
> + int count;
> + u32 size;
> +
> + size = min_t(size_t, len - copied, data_len);
> + count = recv_actor(desc, skb, offset, size);
> + if (count <= 0) {
> + if (!copied)
> + copied = count;
> + break;
> + }
> +
> + copied += count;
> +
> + msk->bytes_consumed += count;
> + if (count < data_len) {
> + MPTCP_SKB_CB(skb)->offset += count;
> + MPTCP_SKB_CB(skb)->map_seq += count;
> + break;
> + }
> +
> + mptcp_eat_recv_skb(sk, skb);
> +
> + if (copied >= len)
> + break;
> + }
> +
> + if (noack)
> + goto out;
> +
> + mptcp_rcv_space_adjust(msk, copied);
> +
> + if (copied > 0) {
> + mptcp_recv_skb(sk, &offset);
> + mptcp_cleanup_rbuf(msk, copied);
> + }
> +out:
> + return copied;
> +}
> +
> +static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t recv_actor)
> +{
> + return __mptcp_read_sock(sk, desc, recv_actor, false);
> +}
> +
> static const struct proto_ops mptcp_stream_ops = {
> .family = PF_INET,
> .owner = THIS_MODULE,
> @@ -4316,6 +4400,7 @@ static const struct proto_ops mptcp_stream_ops = {
> .recvmsg = inet_recvmsg,
> .mmap = sock_no_mmap,
> .set_rcvlowat = mptcp_set_rcvlowat,
> + .read_sock = mptcp_read_sock,
> };
>
> static struct inet_protosw mptcp_protosw = {
> @@ -4420,6 +4505,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
> .compat_ioctl = inet6_compat_ioctl,
> #endif
> .set_rcvlowat = mptcp_set_rcvlowat,
> + .read_sock = mptcp_read_sock,
> };
>
> static struct proto mptcp_v6_prot;
> --
> 2.43.0
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper
2025-11-24 9:12 ` [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper Geliang Tang
@ 2025-12-06 0:20 ` Mat Martineau
2025-12-10 1:30 ` Mat Martineau
1 sibling, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2025-12-06 0:20 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang, Paolo Abeni
On Mon, 24 Nov 2025, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
> and tcp_splice_read() to check whether to stop receiving. It will be used
> for MPTCP too.
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> include/net/tcp.h | 23 +++++++++++++++
> net/ipv4/tcp.c | 73 ++++++++---------------------------------------
> 2 files changed, 35 insertions(+), 61 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 0deb5e9dd911..94a50a9daf65 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2942,4 +2942,27 @@ enum skb_drop_reason tcp_inbound_hash(struct sock *sk,
> const void *saddr, const void *daddr,
> int family, int dif, int sdif);
>
> +static inline int tcp_recv_should_stop(struct sock *sk, long timeo)
> +{
> + if (sock_flag(sk, SOCK_DONE))
> + return -ENETDOWN;
> +
> + if (sk->sk_err)
> + return sock_error(sk);
Hi Geliang -
sock_error() will clear sk_err, and this introduces some changes in
certain code paths that are refactored below (especially where data was
read successfully before the error was encountered).
I suggest either handling sk_err outside of this helper so the exact
behavior of the existing code can be maintained, or maybe defer this
helper for a separate patch series.
- Mat
> +
> + if (sk->sk_shutdown & RCV_SHUTDOWN)
> + return -ESHUTDOWN;
> +
> + if (sk->sk_state == TCP_CLOSE)
> + return -ENOTCONN;
> +
> + if (!timeo)
> + return -EAGAIN;
> +
> + if (signal_pending(current))
> + return sock_intr_errno(timeo);
> +
> + return 0;
> +}
> +
> #endif /* _TCP_H */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5265a9453814..8b4917a7f262 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -842,26 +842,14 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> if (ret < 0)
> break;
> else if (!ret) {
> + int err;
> +
> if (spliced)
> break;
> - if (sock_flag(sk, SOCK_DONE))
> - break;
> - if (sk->sk_err) {
> - ret = sock_error(sk);
> - break;
> - }
> - if (sk->sk_shutdown & RCV_SHUTDOWN)
> - break;
> - if (sk->sk_state == TCP_CLOSE) {
> - /*
> - * This occurs when user tries to read
> - * from never connected socket.
> - */
> - ret = -ENOTCONN;
> - break;
> - }
> - if (!timeo) {
> - ret = -EAGAIN;
> + err = tcp_recv_should_stop(sk, timeo);
> + if (err < 0) {
> + if (err != -ENETDOWN && err != -ESHUTDOWN)
> + ret = err;
> break;
> }
> /* if __tcp_splice_read() got nothing while we have
> @@ -873,10 +861,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> ret = sk_wait_data(sk, &timeo, NULL);
> if (ret < 0)
> break;
> - if (signal_pending(current)) {
> - ret = sock_intr_errno(timeo);
> - break;
> - }
> continue;
> }
> tss.len -= ret;
> @@ -887,9 +871,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> release_sock(sk);
> lock_sock(sk);
>
> - if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
> - (sk->sk_shutdown & RCV_SHUTDOWN) ||
> - signal_pending(current))
> + if (tcp_recv_should_stop(sk, timeo))
> break;
> }
>
> @@ -2732,42 +2714,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
> break;
>
> - if (copied) {
> - if (!timeo ||
> - sk->sk_err ||
> - sk->sk_state == TCP_CLOSE ||
> - (sk->sk_shutdown & RCV_SHUTDOWN) ||
> - signal_pending(current))
> - break;
> - } else {
> - if (sock_flag(sk, SOCK_DONE))
> - break;
> -
> - if (sk->sk_err) {
> - copied = sock_error(sk);
> - break;
> - }
> -
> - if (sk->sk_shutdown & RCV_SHUTDOWN)
> - break;
> -
> - if (sk->sk_state == TCP_CLOSE) {
> - /* This occurs when user tries to read
> - * from never connected socket.
> - */
> - copied = -ENOTCONN;
> - break;
> - }
> -
> - if (!timeo) {
> - copied = -EAGAIN;
> - break;
> - }
> -
> - if (signal_pending(current)) {
> - copied = sock_intr_errno(timeo);
> - break;
> - }
> + err = tcp_recv_should_stop(sk, timeo);
> + if (err < 0) {
> + if (!copied && err != -ENETDOWN && err != -ESHUTDOWN)
> + copied = err;
> + break;
> }
>
> if (copied >= target) {
> --
> 2.43.0
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read
2025-11-24 9:12 ` [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read Geliang Tang
@ 2025-12-06 0:22 ` Mat Martineau
0 siblings, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2025-12-06 0:22 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang
On Mon, 24 Nov 2025, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> This patch implements .splice_read interface of mptcp struct proto_ops
> as mptcp_splice_read() with reference to tcp_splice_read().
>
> Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined,
> invoking mptcp_read_sock() instead of tcp_read_sock().
>
> mptcp_splice_read() is almost the same as tcp_splice_read(), except for
> sock_rps_record_flow().
>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> net/mptcp/protocol.c | 96 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 96 insertions(+)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 9302b372910e..abd0f44ad63f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -4355,6 +4355,100 @@ static int mptcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> return __mptcp_read_sock(sk, desc, recv_actor, false);
> }
>
> +static int __mptcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
> +{
> + /* Store TCP splice context information in read_descriptor_t. */
> + read_descriptor_t rd_desc = {
> + .arg.data = tss,
> + .count = tss->len,
> + };
> +
> + return mptcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
> +}
> +
> +/**
> + * mptcp_splice_read - splice data from MPTCP socket to a pipe
> + * @sock: socket to splice from
> + * @ppos: position (not valid)
> + * @pipe: pipe to splice to
> + * @len: number of bytes to splice
> + * @flags: splice modifier flags
> + *
> + * Description:
> + * Will read pages from given socket and fill them into a pipe.
> + *
> + **/
> +static ssize_t mptcp_splice_read(struct socket *sock, loff_t *ppos,
> + struct pipe_inode_info *pipe, size_t len,
> + unsigned int flags)
> +{
> + struct tcp_splice_state tss = {
> + .pipe = pipe,
> + .len = len,
> + .flags = flags,
> + };
> + struct sock *sk = sock->sk;
> + ssize_t spliced = 0;
> + int ret = 0;
> + long timeo;
> +
> + /*
> + * We can't seek on a socket input
> + */
> + if (unlikely(*ppos))
> + return -ESPIPE;
> +
> + lock_sock(sk);
> +
> + mptcp_rps_record_subflows(mptcp_sk(sk));
> +
> + timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
> + while (tss.len) {
> + ret = __mptcp_splice_read(sk, &tss);
> + if (ret < 0) {
> + break;
> + } else if (!ret) {
> + int err;
> +
> + if (spliced)
> + break;
> + err = tcp_recv_should_stop(sk, timeo);
> + if (err < 0) {
> + if (err != -ESHUTDOWN)
> + ret = err;
> + break;
> + }
> + /* if __mptcp_splice_read() got nothing while we have
> + * an skb in receive queue, we do not want to loop.
> + * This might happen with URG data.
> + */
> + if (!skb_queue_empty(&sk->sk_receive_queue))
> + break;
Hi Geliang -
This queue could also be non-empty due to the length limit in patch 2.
Maybe another reason to remove that length limit.
- Mat
> + ret = sk_wait_data(sk, &timeo, NULL);
> + if (ret < 0)
> + break;
> + continue;
> + }
> + tss.len -= ret;
> + spliced += ret;
> +
> + if (!tss.len || !timeo)
> + break;
> + release_sock(sk);
> + lock_sock(sk);
> +
> + if (tcp_recv_should_stop(sk, timeo))
> + break;
> + }
> +
> + release_sock(sk);
> +
> + if (spliced)
> + return spliced;
> +
> + return ret;
> +}
> +
> static const struct proto_ops mptcp_stream_ops = {
> .family = PF_INET,
> .owner = THIS_MODULE,
> @@ -4376,6 +4470,7 @@ static const struct proto_ops mptcp_stream_ops = {
> .mmap = sock_no_mmap,
> .set_rcvlowat = mptcp_set_rcvlowat,
> .read_sock = mptcp_read_sock,
> + .splice_read = mptcp_splice_read,
> };
>
> static struct inet_protosw mptcp_protosw = {
> @@ -4481,6 +4576,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
> #endif
> .set_rcvlowat = mptcp_set_rcvlowat,
> .read_sock = mptcp_read_sock,
> + .splice_read = mptcp_splice_read,
> };
>
> static struct proto mptcp_v6_prot;
> --
> 2.43.0
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock
2025-12-06 0:17 ` Mat Martineau
@ 2025-12-08 2:14 ` Geliang Tang
0 siblings, 0 replies; 15+ messages in thread
From: Geliang Tang @ 2025-12-08 2:14 UTC (permalink / raw)
To: Mat Martineau; +Cc: mptcp, Geliang Tang, Hannes Reinecke
Hi Mat,
Thank you for your review.
On Fri, 2025-12-05 at 16:17 -0800, Mat Martineau wrote:
> On Mon, 24 Nov 2025, Geliang Tang wrote:
>
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> >
> > nvme_tcp_try_recv() needs to call .read_sock interface of struct
> > proto_ops, but it's not implemented in MPTCP.
> >
> > This patch implements it with reference to __tcp_read_sock() and
> > __mptcp_recvmsg_mskq().
> >
> > Corresponding to tcp_recv_skb(), a new helper for MPTCP named
> > mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.
> >
> > Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
> > sk->sk_rcvbuf as the max read length. The LISTEN status is checked
> > before the while loop, and mptcp_recv_skb() and
> > mptcp_cleanup_rbuf()
> > are invoked after the loop. In the loop, all flags checks for
> > __mptcp_recvmsg_mskq() are removed.
> >
> > Reviewed-by: Hannes Reinecke <hare@kernel.org>
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/protocol.c | 86
> > ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 86 insertions(+)
> >
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index b29790a7cea8..75f90f5fc2a3 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -4296,6 +4296,90 @@ static __poll_t mptcp_poll(struct file
> > *file, struct socket *sock,
> > return mask;
> > }
> >
> > +static struct sk_buff *mptcp_recv_skb(struct sock *sk, u32 *off)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sk);
> > + struct sk_buff *skb;
> > + u32 offset;
> > +
> > + if (!list_empty(&msk->backlog_list))
> > + mptcp_move_skbs(sk);
> > +
> > + while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
> > + offset = MPTCP_SKB_CB(skb)->offset;
> > + if (offset < skb->len) {
> > + *off = offset;
> > + return skb;
> > + }
> > + mptcp_eat_recv_skb(sk, skb);
> > + }
> > + return NULL;
> > +}
> > +
> > +/*
> > + * Note:
> > + * - It is assumed that the socket was locked by the caller.
> > + */
> > +static int __mptcp_read_sock(struct sock *sk, read_descriptor_t
> > *desc,
> > + sk_read_actor_t recv_actor, bool
> > noack)
> > +{
> > + struct mptcp_sock *msk = mptcp_sk(sk);
> > + size_t len = sk->sk_rcvbuf;
>
> Hi Geliang!
>
> Why limit the maximum length here? The socket lock is held already so
> no
> new skbs will be added to sk_receive_queue.
This is indeed a point where I'm not entirely certain. In v1 and v2
[1], I did set this len to INT_MAX, meaning there is no restriction on
the maximum length.
I revised it to INT_MAX again in v15 and also removed
tcp_recv_should_stop() helper related patches (patch 3, patch 4 in v14)
from this series as you suggested.
With these changes, everything is working fine, and all tests pass.
Please review v15 for me.
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/patch/32cc946892d634a14cdf4373c1b9d47126162e5a.1749286212.git.tanggeliang@kylinos.cn/
>
> - Mat
>
>
> > + struct sk_buff *skb;
> > + int copied = 0;
> > + u32 offset;
> > +
> > + msk_owned_by_me(msk);
> > +
> > + if (sk->sk_state == TCP_LISTEN)
> > + return -ENOTCONN;
> > + while ((skb = mptcp_recv_skb(sk, &offset)) != NULL) {
> > + u32 data_len = skb->len - offset;
> > + int count;
> > + u32 size;
> > +
> > + size = min_t(size_t, len - copied, data_len);
> > + count = recv_actor(desc, skb, offset, size);
> > + if (count <= 0) {
> > + if (!copied)
> > + copied = count;
> > + break;
> > + }
> > +
> > + copied += count;
> > +
> > + msk->bytes_consumed += count;
> > + if (count < data_len) {
> > + MPTCP_SKB_CB(skb)->offset += count;
> > + MPTCP_SKB_CB(skb)->map_seq += count;
> > + break;
> > + }
> > +
> > + mptcp_eat_recv_skb(sk, skb);
> > +
> > + if (copied >= len)
> > + break;
> > + }
> > +
> > + if (noack)
> > + goto out;
> > +
> > + mptcp_rcv_space_adjust(msk, copied);
> > +
> > + if (copied > 0) {
> > + mptcp_recv_skb(sk, &offset);
> > + mptcp_cleanup_rbuf(msk, copied);
> > + }
> > +out:
> > + return copied;
> > +}
> > +
> > +static int mptcp_read_sock(struct sock *sk, read_descriptor_t
> > *desc,
> > + sk_read_actor_t recv_actor)
> > +{
> > + return __mptcp_read_sock(sk, desc, recv_actor, false);
> > +}
> > +
> > static const struct proto_ops mptcp_stream_ops = {
> > .family = PF_INET,
> > .owner = THIS_MODULE,
> > @@ -4316,6 +4400,7 @@ static const struct proto_ops
> > mptcp_stream_ops = {
> > .recvmsg = inet_recvmsg,
> > .mmap = sock_no_mmap,
> > .set_rcvlowat = mptcp_set_rcvlowat,
> > + .read_sock = mptcp_read_sock,
> > };
> >
> > static struct inet_protosw mptcp_protosw = {
> > @@ -4420,6 +4505,7 @@ static const struct proto_ops
> > mptcp_v6_stream_ops = {
> > .compat_ioctl = inet6_compat_ioctl,
> > #endif
> > .set_rcvlowat = mptcp_set_rcvlowat,
> > + .read_sock = mptcp_read_sock,
> > };
> >
> > static struct proto mptcp_v6_prot;
> > --
> > 2.43.0
> >
> >
> >
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper
2025-11-24 9:12 ` [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper Geliang Tang
2025-12-06 0:20 ` Mat Martineau
@ 2025-12-10 1:30 ` Mat Martineau
1 sibling, 0 replies; 15+ messages in thread
From: Mat Martineau @ 2025-12-10 1:30 UTC (permalink / raw)
To: Geliang Tang; +Cc: mptcp, Geliang Tang, Paolo Abeni
On Mon, 24 Nov 2025, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
> and tcp_splice_read() to check whether to stop receiving. It will be used
> for MPTCP too.
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> include/net/tcp.h | 23 +++++++++++++++
> net/ipv4/tcp.c | 73 ++++++++---------------------------------------
> 2 files changed, 35 insertions(+), 61 deletions(-)
>
Acked-by: Mat Martineau <martineau@kernel.org>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 0deb5e9dd911..94a50a9daf65 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2942,4 +2942,27 @@ enum skb_drop_reason tcp_inbound_hash(struct sock *sk,
> const void *saddr, const void *daddr,
> int family, int dif, int sdif);
>
> +static inline int tcp_recv_should_stop(struct sock *sk, long timeo)
> +{
> + if (sock_flag(sk, SOCK_DONE))
> + return -ENETDOWN;
> +
> + if (sk->sk_err)
> + return sock_error(sk);
> +
> + if (sk->sk_shutdown & RCV_SHUTDOWN)
> + return -ESHUTDOWN;
> +
> + if (sk->sk_state == TCP_CLOSE)
> + return -ENOTCONN;
> +
> + if (!timeo)
> + return -EAGAIN;
> +
> + if (signal_pending(current))
> + return sock_intr_errno(timeo);
> +
> + return 0;
> +}
> +
> #endif /* _TCP_H */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5265a9453814..8b4917a7f262 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -842,26 +842,14 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> if (ret < 0)
> break;
> else if (!ret) {
> + int err;
> +
> if (spliced)
> break;
> - if (sock_flag(sk, SOCK_DONE))
> - break;
> - if (sk->sk_err) {
> - ret = sock_error(sk);
> - break;
> - }
> - if (sk->sk_shutdown & RCV_SHUTDOWN)
> - break;
> - if (sk->sk_state == TCP_CLOSE) {
> - /*
> - * This occurs when user tries to read
> - * from never connected socket.
> - */
> - ret = -ENOTCONN;
> - break;
> - }
> - if (!timeo) {
> - ret = -EAGAIN;
> + err = tcp_recv_should_stop(sk, timeo);
> + if (err < 0) {
> + if (err != -ENETDOWN && err != -ESHUTDOWN)
> + ret = err;
> break;
> }
> /* if __tcp_splice_read() got nothing while we have
> @@ -873,10 +861,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> ret = sk_wait_data(sk, &timeo, NULL);
> if (ret < 0)
> break;
> - if (signal_pending(current)) {
> - ret = sock_intr_errno(timeo);
> - break;
> - }
> continue;
> }
> tss.len -= ret;
> @@ -887,9 +871,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
> release_sock(sk);
> lock_sock(sk);
>
> - if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
> - (sk->sk_shutdown & RCV_SHUTDOWN) ||
> - signal_pending(current))
> + if (tcp_recv_should_stop(sk, timeo))
> break;
> }
>
> @@ -2732,42 +2714,11 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> if (copied >= target && !READ_ONCE(sk->sk_backlog.tail))
> break;
>
> - if (copied) {
> - if (!timeo ||
> - sk->sk_err ||
> - sk->sk_state == TCP_CLOSE ||
> - (sk->sk_shutdown & RCV_SHUTDOWN) ||
> - signal_pending(current))
> - break;
> - } else {
> - if (sock_flag(sk, SOCK_DONE))
> - break;
> -
> - if (sk->sk_err) {
> - copied = sock_error(sk);
> - break;
> - }
> -
> - if (sk->sk_shutdown & RCV_SHUTDOWN)
> - break;
> -
> - if (sk->sk_state == TCP_CLOSE) {
> - /* This occurs when user tries to read
> - * from never connected socket.
> - */
> - copied = -ENOTCONN;
> - break;
> - }
> -
> - if (!timeo) {
> - copied = -EAGAIN;
> - break;
> - }
> -
> - if (signal_pending(current)) {
> - copied = sock_intr_errno(timeo);
> - break;
> - }
> + err = tcp_recv_should_stop(sk, timeo);
> + if (err < 0) {
> + if (!copied && err != -ENETDOWN && err != -ESHUTDOWN)
> + copied = err;
> + break;
> }
>
> if (copied >= target) {
> --
> 2.43.0
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-12-10 1:30 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-24 9:12 [PATCH mptcp-next v14 0/8] implement mptcp read_sock Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 1/8] mptcp: add eat_recv_skb helper Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 2/8] mptcp: implement .read_sock Geliang Tang
2025-12-06 0:17 ` Mat Martineau
2025-12-08 2:14 ` Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 3/8] tcp: add recv_should_stop helper Geliang Tang
2025-12-06 0:20 ` Mat Martineau
2025-12-10 1:30 ` Mat Martineau
2025-11-24 9:12 ` [PATCH mptcp-next v14 4/8] mptcp: use " Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 5/8] tcp: export tcp_splice_state Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 6/8] mptcp: implement .splice_read Geliang Tang
2025-12-06 0:22 ` Mat Martineau
2025-11-24 9:12 ` [PATCH mptcp-next v14 7/8] selftests: mptcp: add splice io mode Geliang Tang
2025-11-24 9:12 ` [PATCH mptcp-next v14 8/8] selftests: mptcp: connect: cover splice mode Geliang Tang
2025-11-24 10:39 ` [PATCH mptcp-next v14 0/8] implement mptcp read_sock MPTCP CI
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox