* [PATCH mptcp-next v11 1/4] mptcp: sockopt: factor inet_flags propagation into a mask
2026-05-31 14:59 [PATCH mptcp-next v11 0/4] mptcp: MSG_ERRQUEUE support on the parent socket David Carlier
@ 2026-05-31 14:59 ` David Carlier
2026-05-31 14:59 ` [PATCH mptcp-next v11 2/4] mptcp: propagate RECVERR sockopts to subflows David Carlier
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: David Carlier @ 2026-05-31 14:59 UTC
To: mptcp; +Cc: matttbe, martineau, geliang, pabeni, David Carlier
Introduce MPTCP_INET_FLAGS_MASK and replace the per-flag
inet_assign_bit() calls in sync_socket_options() with a loop driven
by the mask that calls assign_bit() per set bit, preserving the
per-bit atomicity of the original. Further flags propagated by MPTCP
can be added by extending the mask rather than touching the call
site.
No functional change.
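
As a rough illustration of the mask-driven loop described above, here is a
minimal userspace sketch. It is not the kernel code: the demo_* names and bit
positions are hypothetical, and plain (non-atomic) bit operations stand in for
assign_bit().

```c
/* Hypothetical flag positions; the kernel's INET_FLAGS_* enum differs. */
enum demo_flag {
	DEMO_TRANSPARENT = 3,
	DEMO_FREEBIND = 5,
	DEMO_BIND_ADDRESS_NO_PORT = 9,
	DEMO_UNRELATED = 12,	/* deliberately left out of the mask */
};

#define DEMO_BIT(n)	(1UL << (n))
#define DEMO_FLAGS_MASK \
	(DEMO_BIT(DEMO_TRANSPARENT) | \
	 DEMO_BIT(DEMO_FREEBIND) | \
	 DEMO_BIT(DEMO_BIND_ADDRESS_NO_PORT))

/* Copy every bit selected by mask from src into *dst, one bit at a time.
 * Bits outside the mask keep whatever value they already had in *dst.
 */
static void demo_sync_flags(unsigned long *dst, unsigned long src,
			    unsigned long mask)
{
	unsigned int b;

	for (b = 0; b < sizeof(mask) * 8; b++) {
		if (!(mask & DEMO_BIT(b)))
			continue;
		if (src & DEMO_BIT(b))
			*dst |= DEMO_BIT(b);
		else
			*dst &= ~DEMO_BIT(b);
	}
}
```

Adding a flag to the propagated set then only means OR-ing one more bit into
DEMO_FLAGS_MASK; the loop itself stays unchanged.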
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/sockopt.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index fcf6feb2a9eb..7be9a46cbdbe 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -16,6 +16,10 @@
#define MIN_INFO_OPTLEN_SIZE 16
#define MIN_FULL_INFO_OPTLEN_SIZE 40
+#define MPTCP_INET_FLAGS_MASK \
+ (BIT(INET_FLAGS_TRANSPARENT) | \
+ BIT(INET_FLAGS_FREEBIND) | \
+ BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
{
@@ -1551,6 +1555,9 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
{
static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
struct sock *sk = (struct sock *)msk;
+ unsigned long mask = MPTCP_INET_FLAGS_MASK;
+ unsigned long src;
+ int b;
bool keep_open;
keep_open = sock_flag(sk, SOCK_KEEPOPEN);
@@ -1597,9 +1604,11 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
tcp_sock_set_maxseg(ssk, msk->maxseg);
- inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
- inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
- inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
+ src = READ_ONCE(inet_sk(sk)->inet_flags);
+
+ for_each_set_bit(b, &mask, BITS_PER_LONG)
+ assign_bit(b, &inet_sk(ssk)->inet_flags, src & BIT(b));
+
WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
}
--
2.53.0
* [PATCH mptcp-next v11 2/4] mptcp: propagate RECVERR sockopts to subflows
From: David Carlier @ 2026-05-31 14:59 UTC
To: mptcp; +Cc: matttbe, martineau, geliang, pabeni, David Carlier
Propagate IP_RECVERR/IP_RECVERR_RFC4884 and
IPV6_RECVERR/IPV6_RECVERR_RFC4884 from the MPTCP socket to existing
and future subflows.
mptcp_setsockopt_recverr() applies optval to the parent socket via
ip_setsockopt() / ipv6_setsockopt(), which also validate optval and
optlen, then re-reads the resulting bit under lock_sock() into a local
int and forwards it to every subflow via mptcp_setsockopt_all_sf(),
which bumps msk->setsockopt_seq on success. Newly-joining subflows pick
up the four RECVERR bits through sync_socket_options() now that
MPTCP_INET_FLAGS_MASK covers them.
mptcp_setsockopt_all_sf() skips IPv4 subflows when called with
SOL_IPV6: ipv6_setsockopt() on a sock with sk_family != AF_INET6
returns an error, which would abort the loop and leave the remaining
subflows desynchronised. This branch was unreachable before this
patch (the only caller was TCP_MAXSEG, family-agnostic); it becomes
live with the new IPV6_RECVERR / IPV6_RECVERR_RFC4884 caller and the
v4-subflow-on-AF_INET6-msk case (v4 MP_JOIN, or userspace PM grafting
a v4 subflow onto a v6 msk).
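
The family check can be sketched in userspace as below. This is only an
illustration under stated assumptions: struct demo_subflow, demo_set_one() and
demo_set_all() are hypothetical stand-ins for the subflow list and
tcp_setsockopt(), and IPPROTO_IPV6 is used for the level because it has the
same numeric value as SOL_IPV6 on Linux.

```c
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */
#include <netinet/in.h>	/* IPPROTO_IPV6 */

/* Hypothetical model of one subflow. */
struct demo_subflow {
	int family;	/* AF_INET or AF_INET6 */
	int fail_with;	/* 0, or a negative value the setter should return */
	int applied;	/* set to 1 once the option has been applied */
};

/* Stand-in for the per-subflow setsockopt call. */
static int demo_set_one(struct demo_subflow *sf)
{
	if (sf->fail_with)
		return sf->fail_with;
	sf->applied = 1;
	return 0;
}

/* Apply an option to every subflow, skipping v4 subflows for IPv6-level
 * options, and report the first error seen while still visiting the rest.
 */
static int demo_set_all(struct demo_subflow *sf, size_t n, int level)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int err;

		if (level == IPPROTO_IPV6 && sf[i].family != AF_INET6)
			continue;

		err = demo_set_one(&sf[i]);
		if (err < 0 && ret == 0)
			ret = err;
	}

	return ret;
}
```

With a mixed set of subflows, an IPv6-level option leaves the v4 entry
untouched and the call still returns 0 as long as every v6 entry succeeds.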
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/sockopt.c | 140 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 117 insertions(+), 23 deletions(-)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 7be9a46cbdbe..a2a980304660 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -8,6 +8,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
+#include <net/ipv6.h>
#include <net/sock.h>
#include <net/protocol.h>
#include <net/tcp.h>
@@ -19,7 +20,11 @@
#define MPTCP_INET_FLAGS_MASK \
(BIT(INET_FLAGS_TRANSPARENT) | \
BIT(INET_FLAGS_FREEBIND) | \
- BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
+ BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT) | \
+ BIT(INET_FLAGS_RECVERR) | \
+ BIT(INET_FLAGS_RECVERR_RFC4884) | \
+ BIT(INET_FLAGS_RECVERR6) | \
+ BIT(INET_FLAGS_RECVERR6_RFC4884))
static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
{
@@ -398,6 +403,86 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
return -EOPNOTSUPP;
}
+static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
+ int optname, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct mptcp_subflow_context *subflow;
+ int ret = 0;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ int err;
+
+ /* SOL_IPV6 options on a v4 subflow (v4 MP_JOIN, or userspace PM
+ * grafting a v4 subflow onto an AF_INET6 msk) would otherwise
+ * abort the loop with -EAFNOSUPPORT from ipv6_setsockopt().
+ */
+ if (level == SOL_IPV6 && ssk->sk_family != AF_INET6)
+ continue;
+
+ err = tcp_setsockopt(ssk, level, optname, optval, optlen);
+ if (err < 0 && ret == 0)
+ ret = err;
+ }
+
+ if (!ret)
+ sockopt_seq_inc(msk);
+
+ return ret;
+}
+
+static int mptcp_setsockopt_recverr(struct mptcp_sock *msk, int level,
+ int optname, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct sock *sk = (struct sock *)msk;
+ int val = 0, ret;
+
+ /* Let ip_setsockopt() / ipv6_setsockopt() validate optval and optlen
+ * (so 1-byte boolean writes keep the same ABI as plain TCP) and update
+ * the parent's RECVERR bit. Re-read that bit under lock_sock() and
+ * push it to the subflows: concurrent setsockopt callers cannot leave
+ * parent and subflows desynchronized this way.
+ */
+ if (level == SOL_IP)
+ ret = ip_setsockopt(sk, level, optname, optval, optlen);
+#if IS_ENABLED(CONFIG_IPV6)
+ else if (level == SOL_IPV6) {
+ if (sk->sk_family != AF_INET6)
+ return -ENOPROTOOPT;
+ ret = ipv6_setsockopt(sk, level, optname, optval, optlen);
+ }
+#endif
+ else
+ return -EOPNOTSUPP;
+ if (ret)
+ return ret;
+
+ lock_sock(sk);
+ switch (optname) {
+ case IP_RECVERR:
+ val = inet_test_bit(RECVERR, sk);
+ break;
+ case IP_RECVERR_RFC4884:
+ val = inet_test_bit(RECVERR_RFC4884, sk);
+ break;
+#if IS_ENABLED(CONFIG_IPV6)
+ case IPV6_RECVERR:
+ val = inet6_test_bit(RECVERR6, sk);
+ break;
+ case IPV6_RECVERR_RFC4884:
+ val = inet6_test_bit(RECVERR6_RFC4884, sk);
+ break;
+#endif
+ }
+
+ ret = mptcp_setsockopt_all_sf(msk, level, optname,
+ KERNEL_SOCKPTR(&val), sizeof(val));
+ release_sock(sk);
+ return ret;
+}
+
static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -440,6 +525,10 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
release_sock(sk);
break;
+ case IPV6_RECVERR:
+ case IPV6_RECVERR_RFC4884:
+ ret = mptcp_setsockopt_recverr(msk, SOL_IPV6, optname, optval, optlen);
+ break;
}
return ret;
@@ -785,6 +874,9 @@ static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
return mptcp_setsockopt_sol_ip_set(msk, optname, optval, optlen);
case IP_TOS:
return mptcp_setsockopt_v4_set_tos(msk, optname, optval, optlen);
+ case IP_RECVERR:
+ case IP_RECVERR_RFC4884:
+ return mptcp_setsockopt_recverr(msk, SOL_IP, optname, optval, optlen);
}
return -EOPNOTSUPP;
@@ -812,28 +904,6 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
return ret;
}
-static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
- int optname, sockptr_t optval,
- unsigned int optlen)
-{
- struct mptcp_subflow_context *subflow;
- int ret = 0;
-
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- int err;
-
- err = tcp_setsockopt(ssk, level, optname, optval, optlen);
- if (err < 0 && ret == 0)
- ret = err;
- }
-
- if (!ret)
- sockopt_seq_inc(msk);
-
- return ret;
-}
-
static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -1478,6 +1548,12 @@ static int mptcp_getsockopt_v4(struct mptcp_sock *msk, int optname,
case IP_LOCAL_PORT_RANGE:
return mptcp_put_int_option(msk, optval, optlen,
READ_ONCE(inet_sk(sk)->local_port_range));
+ case IP_RECVERR:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet_test_bit(RECVERR, sk));
+ case IP_RECVERR_RFC4884:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet_test_bit(RECVERR_RFC4884, sk));
}
return -EOPNOTSUPP;
@@ -1498,6 +1574,16 @@ static int mptcp_getsockopt_v6(struct mptcp_sock *msk, int optname,
case IPV6_FREEBIND:
return mptcp_put_int_option(msk, optval, optlen,
inet_test_bit(FREEBIND, sk));
+ case IPV6_RECVERR:
+ if (sk->sk_family != AF_INET6)
+ return -ENOPROTOOPT;
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet6_test_bit(RECVERR6, sk));
+ case IPV6_RECVERR_RFC4884:
+ if (sk->sk_family != AF_INET6)
+ return -ENOPROTOOPT;
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet6_test_bit(RECVERR6_RFC4884, sk));
}
return -EOPNOTSUPP;
@@ -1606,6 +1692,14 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
src = READ_ONCE(inet_sk(sk)->inet_flags);
+ /* RECVERR6 bits are only read on AF_INET6 sockets; copying them onto a
+ * v4 subflow is dead state and diverges from the SOL_IPV6 skip in
+ * mptcp_setsockopt_all_sf().
+ */
+ if (ssk->sk_family != AF_INET6)
+ mask &= ~(BIT(INET_FLAGS_RECVERR6) |
+ BIT(INET_FLAGS_RECVERR6_RFC4884));
+
for_each_set_bit(b, &mask, BITS_PER_LONG)
assign_bit(b, &inet_sk(ssk)->inet_flags, src & BIT(b));
--
2.53.0
* [PATCH mptcp-next v11 3/4] mptcp: support MSG_ERRQUEUE on the parent socket
From: David Carlier @ 2026-05-31 14:59 UTC
To: mptcp; +Cc: matttbe, martineau, geliang, pabeni, David Carlier
Splice pending err skbs from each subflow's error queue onto the parent
msk's error queue at error-report time, so poll() and recvmsg(MSG_ERRQUEUE)
on the parent socket observe TX timestamps and MSG_ZEROCOPY completion
notifications through the standard inet ABI.
The splice filters by SO_EE_ORIGIN: TIMESTAMPING / ZEROCOPY / LOCAL
events forward to the parent because they are tied to user-handed data,
not to a specific path; subflow-level ICMP errors are dropped because
the legacy RECVERR ABI cannot meaningfully convey their per-subflow peer
identity to single-path-aware userspace. Such events will be carried by
a future MPTCP_RECERR channel.
Forwarded events all go through sock_queue_err_skb(), which re-homes
skb->sk onto the parent and charges sk_rmem_alloc, so the parent's error
queue stays bounded by sk_rcvbuf and events are dropped under rmem
pressure (sk_rmem_alloc + truesize >= sk_rcvbuf), matching tcp's
sk_rcvbuf-gated tx-timestamp path and ip_icmp_error() /
ipv6_icmp_error(). MPTCP itself
never originates MSG_ZEROCOPY or OPT_ID tx-timestamp completions -- its
data path copies into msk-owned pages and bypasses tcp_sendmsg_locked()
-- so no subflow-relative ee_data sequence is ever forwarded to the
parent. The MSG_ERRQUEUE branch of mptcp_recvmsg() forwards to
inet_recv_error() directly, and poll() advertises EPOLLERR purely on the
parent's sk_err / sk_error_queue, matching tcp_poll().
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/protocol.c | 63 +++++++++++++++++++++++++++++++++++++-------
1 file changed, 54 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f1d74d4b28cf..42a355311c81 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -11,6 +11,7 @@
#include <linux/netdevice.h>
#include <linux/sched/signal.h>
#include <linux/atomic.h>
+#include <linux/errqueue.h>
#include <net/aligned_data.h>
#include <net/rps.h>
#include <net/sock.h>
@@ -894,21 +895,61 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
return moved;
}
+static bool mptcp_errqueue_skb_forwardable(const struct sk_buff *skb)
+{
+ u8 origin = SKB_EXT_ERR(skb)->ee.ee_origin;
+
+ return origin == SO_EE_ORIGIN_TIMESTAMPING ||
+ origin == SO_EE_ORIGIN_ZEROCOPY ||
+ origin == SO_EE_ORIGIN_LOCAL;
+}
+
+static bool __mptcp_subflow_splice_errqueue(struct sock *sk, struct sock *ssk)
+{
+ struct sk_buff *skb;
+ bool moved = false;
+
+ while ((skb = skb_dequeue(&ssk->sk_error_queue))) {
+ if (!mptcp_errqueue_skb_forwardable(skb)) {
+ kfree_skb(skb); /* path-specific (ICMP) — belongs in MPTCP_RECERR */
+ continue;
+ }
+ /* sock_queue_err_skb() re-homes skb->sk onto the parent and
+ * charges its sk_rmem_alloc, so the error queue stays bounded by
+ * sk_rcvbuf; drop on overflow, matching tcp's tx-timestamp path.
+ * MPTCP never originates MSG_ZEROCOPY or OPT_ID tx-timestamp
+ * completions (the data path copies and bypasses
+ * tcp_sendmsg_locked()), so no subflow-relative ee_data sequence
+ * is ever forwarded.
+ */
+ if (sock_queue_err_skb(sk, skb)) {
+ kfree_skb(skb);
+ continue;
+ }
+ moved = true;
+ }
+
+ return moved;
+}
+
static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
{
+ bool propagated = false;
int ssk_state;
+ bool report;
int err;
+ report = __mptcp_subflow_splice_errqueue(sk, ssk);
+
/* only propagate errors on fallen-back sockets or
* on MPC connect
*/
if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(mptcp_sk(sk)))
- return false;
+ goto out;
err = sock_error(ssk);
if (!err)
- return false;
-
+ goto out;
/* We need to propagate only transition to CLOSE state.
* Orphaned socket will see such state change via
* subflow_sched_work_if_closed() and that path will properly
@@ -918,11 +959,15 @@ static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
mptcp_set_state(sk, ssk_state);
WRITE_ONCE(sk->sk_err, -err);
+ report = propagated = true;
- /* This barrier is coupled with smp_rmb() in mptcp_poll() */
- smp_wmb();
- sk_error_report(sk);
- return true;
+out:
+ if (report) {
+ /* This barrier is coupled with smp_rmb() in mptcp_poll() */
+ smp_wmb();
+ sk_error_report(sk);
+ }
+ return propagated;
}
void __mptcp_error_report(struct sock *sk)
@@ -2363,7 +2408,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int target;
long timeo;
- /* MSG_ERRQUEUE is really a no-op till we support IP_RECVERR */
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len);
@@ -4413,7 +4457,8 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
/* This barrier is coupled with smp_wmb() in __mptcp_error_report() */
smp_rmb();
- if (READ_ONCE(sk->sk_err))
+ if (READ_ONCE(sk->sk_err) ||
+ !skb_queue_empty_lockless(&sk->sk_error_queue))
mask |= EPOLLERR;
return mask;
--
2.53.0
* [PATCH mptcp-next v11 4/4] selftests: mptcp: cover IP_RECVERR sockopt propagation
From: David Carlier @ 2026-05-31 14:59 UTC
To: mptcp; +Cc: matttbe, martineau, geliang, pabeni, David Carlier
Exercise setsockopt/getsockopt of IP_RECVERR and IPV6_RECVERR on the
MPTCP parent socket, including the empty-errqueue EAGAIN contract on
MSG_ERRQUEUE|MSG_DONTWAIT.
End-to-end errqueue delivery (ICMP, TX timestamps, zerocopy) depends on
subflow-side producers that are out of scope for this series and will be
covered by follow-up work.
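
For reference, the empty-errqueue contract can be checked with a small
helper like the one below. This is a sketch, not the selftest code:
demo_errqueue_is_empty() is a hypothetical name, and the helper works on any
inet socket file descriptor (plain TCP included) since it only relies on
recvmsg() with MSG_ERRQUEUE | MSG_DONTWAIT.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Return 1 if the error queue of fd is empty (recvmsg() fails with
 * EAGAIN), 0 if an event was dequeued, and -1 on any other failure.
 */
static int demo_errqueue_is_empty(int fd)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t r;

	r = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
	if (r >= 0)
		return 0;

	return errno == EAGAIN ? 1 : -1;
}
```

On a freshly created socket with nothing queued, the helper is expected to
return 1, which is the same condition the selftest asserts on the MPTCP
parent socket.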
Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
.../selftests/net/mptcp/mptcp_sockopt.c | 55 +++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index b6e58d936ebe..95bb2cc8e2ff 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -769,6 +769,60 @@ static void test_ip_tos_sockopt(int fd)
xerror("expect socklen_t == -1");
}
+static void test_ip_recverr_sockopt(int fd)
+{
+ struct iovec iov = {
+ .iov_base = &(char){ 0 },
+ .iov_len = 1,
+ };
+ struct msghdr msg = {
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ };
+ int one = 1, zero = 0, val = -1;
+ socklen_t s = sizeof(val);
+ int level, optname, r;
+
+ switch (pf) {
+ case AF_INET:
+ level = SOL_IP;
+ optname = IP_RECVERR;
+ break;
+ case AF_INET6:
+ level = SOL_IPV6;
+ optname = IPV6_RECVERR;
+ break;
+ default:
+ xerror("Unknown pf %d\n", pf);
+ }
+
+ r = setsockopt(fd, level, optname, &one, sizeof(one));
+ if (r)
+ die_perror("setsockopt recverr on");
+
+ r = getsockopt(fd, level, optname, &val, &s);
+ if (r)
+ die_perror("getsockopt recverr on");
+ if (s != sizeof(val) || val != one)
+ xerror("recverr on mismatch val=%d len=%u", val, s);
+
+ r = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
+ if (r != -1 || errno != EAGAIN)
+ xerror("expected empty errqueue to return EAGAIN, ret=%d errno=%d", r, errno);
+
+ r = setsockopt(fd, level, optname, &zero, sizeof(zero));
+ if (r)
+ die_perror("setsockopt recverr off");
+
+ val = -1;
+ s = sizeof(val);
+ r = getsockopt(fd, level, optname, &val, &s);
+ if (r)
+ die_perror("getsockopt recverr off");
+ if (s != sizeof(val) || val != zero)
+ xerror("recverr off mismatch val=%d len=%u", val, s);
+}
+
static int client(int pipefd)
{
int fd = -1;
@@ -787,6 +841,7 @@ static int client(int pipefd)
}
test_ip_tos_sockopt(fd);
+ test_ip_recverr_sockopt(fd);
connect_one_server(fd, pipefd);
--
2.53.0
* Re: [PATCH mptcp-next v11 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
From: MPTCP CI @ 2026-05-31 16:11 UTC
To: David Carlier; +Cc: mptcp
Hi David,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_add_addr ⚠️
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_connect_checksum ⚠️
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26716494041
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/5dd33dfffc0d
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103609
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to keep the test
suite stable when executed on a public CI like this one, some reported
issues may not be due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
* Re: [PATCH mptcp-next v11 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
From: David CARLIER @ 2026-06-11 20:22 UTC
To: mptcp; +Cc: matttbe, martineau, geliang, pabeni
Hi Matthieu,
The lone Medium (patch 2) is self-resolving: sashiko itself notes
it's fixed by patch 3 within the same series:
> Propagating IP_RECVERR to subflows queues ICMP errors into
> sk_error_queue, but MPTCP recvmsg() doesn't implement MSG_ERRQUEUE…
> Note: This regression is fixed later in the patch series by
> ("mptcp: support MSG_ERRQUEUE on the parent socket"), which splices
> and drains the subflow error queues.
So I think it is OK?
Cheers.
On Sun, 31 May 2026 at 15:59, David Carlier <devnexen@gmail.com> wrote:
>
> This series lets MPTCP applications use poll(EPOLLERR) and
> recvmsg(MSG_ERRQUEUE) on the parent socket to drain TX timestamps,
> MSG_ZEROCOPY completion notifications and SO_EE_ORIGIN_LOCAL events
> through the standard inet ABI, the same way they would on a plain TCP
> socket. ICMP-derived errors are dropped rather than forwarded: the
> legacy RECVERR ABI cannot convey their per-subflow peer identity, and
> they are intended for a future MPTCP_RECERR channel.
>
> Patch 1 factors the existing inet_flags subflow-propagation hard-coded
> list into a mask, so subsequent patches can extend it without churn.
>
> Patch 2 makes IP_RECVERR / IPV6_RECVERR (and the RFC4884 variants)
> propagate to the subflows. The parent stores the bit so MPTCP-aware
> helpers can branch on it.
>
> Patch 3 splices subflow err-skbs onto the parent's sk_error_queue at
> error-report time. All forwarded events go through sock_queue_err_skb(),
> which re-homes skb->sk onto the parent and charges sk_rmem_alloc, so the
> parent's error queue stays bounded by sk_rcvbuf and events are dropped
> under rmem pressure, matching tcp's tx-timestamp path and ip_icmp_error() /
> ipv6_icmp_error(). MPTCP never originates MSG_ZEROCOPY or OPT_ID
> tx-timestamp completions -- its data path copies into msk-owned pages and
> bypasses tcp_sendmsg_locked() -- so no subflow-relative ee_data sequence
> is ever forwarded. mptcp_recvmsg(MSG_ERRQUEUE) forwards directly to
> inet_recv_error(), and mptcp_poll() advertises EPOLLERR purely on the
> parent's sk_err / sk_error_queue, matching tcp_poll().
>
> Patch 4 is a selftest covering the propagation path.
>
> Changes in v11 (addresses sashiko v10 review,
> https://sashiko.dev/#/patchset/20260529174524.260199-1-devnexen@gmail.com):
> - patch 3/4: route MSG_ZEROCOPY completions through sock_queue_err_skb()
> like every other forwarded event, rather than orphaning them and
> queueing to the parent by hand. The hand-rolled path ran the subflow
> destructor (refunding its memory charge) but never charged the parent,
> so completions could pile up unbounded on the parent err queue and
> exhaust memory (OOM). The "never drop or we leak pinned pages" premise
> was also wrong: __msg_zerocopy_callback() calls
> mm_unaccount_pinned_pages() before queueing, so a dropped notification
> loses only the notification, not the pages. (sashiko v10, High)
> - no functional change for the ee_data concern: MPTCP originates neither
> MSG_ZEROCOPY nor OPT_ID tx-timestamp completions, so no subflow-relative
> sequence is ever spliced to the parent. (sashiko v10, High)
> - patch 2/4: initialise val in mptcp_setsockopt_recverr() to silence a
> latent -Wmaybe-uninitialized on the switch without a default case.
>
> v10: https://lore.kernel.org/mptcp/20260529174524.260199-1-devnexen@gmail.com/
> v9: https://lore.kernel.org/mptcp/20260528055459.55133-1-devnexen@gmail.com/
>
> David Carlier (4):
> mptcp: sockopt: factor inet_flags propagation into a mask
> mptcp: propagate RECVERR sockopts to subflows
> mptcp: support MSG_ERRQUEUE on the parent socket
> selftests: mptcp: cover IP_RECVERR sockopt propagation
>
> net/mptcp/protocol.c | 63 ++++++--
> net/mptcp/sockopt.c | 153 +++++++++++++++---
> .../selftests/net/mptcp/mptcp_sockopt.c | 55 +++++++
> 3 files changed, 237 insertions(+), 34 deletions(-)
>
>
> base-commit: e05cbdb611ff815528cdf90e29a96663b9af48c6
> --
> 2.53.0
>