* [PATCH 5.15.y 1/6] mptcp: track and update contiguous data status
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 2/6] mptcp: handle consistently DSS corruption Matthieu Baerts (NGI0)
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Geliang Tang, sashal, Paolo Abeni, Mat Martineau,
David S . Miller, Matthieu Baerts
From: Geliang Tang <geliang.tang@suse.com>
commit 0530020a7c8f2204e784f0dbdc882bbd961fdbde upstream.
This patch adds a new member allow_infinite_fallback in mptcp_sock,
which is initialized to 'true' when the connection begins and is set
to 'false' on any retransmit or successful MP_JOIN. Only do infinite
mapping fallback if there is a single subflow AND there have been no
retransmissions AND there have never been any MP_JOINs.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: e32d262c89e2 ("mptcp: handle consistently DSS corruption")
[ Conflicts in protocol.c, because commit 3e5014909b56 ("mptcp: cleanup
MPJ subflow list handling") is not in this version. This commit is
linked to a new feature, changing the context around. The new line
can still be added at the same place.
Conflicts in protocol.h, because commit 4f6e14bd19d6 ("mptcp: support
TCP_CORK and TCP_NODELAY") is not in this version. This commit is
linked to a new feature, changing the context around. The new line can
still be added at the same place.
Conflicts in subflow.c, because commit 0348c690ed37 ("mptcp: add the
fallback check") is not in this version. This commit is linked to a
new feature, changing the context around. The new line can still be
added at the same place. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/protocol.c | 3 +++
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 4 +++-
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index da2a1a150bc6..73a0b0d15382 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2472,6 +2472,7 @@ static void __mptcp_retrans(struct sock *sk)
dfrag->already_sent = max(dfrag->already_sent, info.sent);
tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
info.size_goal);
+ WRITE_ONCE(msk->allow_infinite_fallback, false);
}
release_sock(ssk);
@@ -2549,6 +2550,7 @@ static int __mptcp_init_sock(struct sock *sk)
msk->first = NULL;
inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
+ WRITE_ONCE(msk->allow_infinite_fallback, true);
msk->recovery = false;
mptcp_pm_data_init(msk);
@@ -3299,6 +3301,7 @@ bool mptcp_finish_join(struct sock *ssk)
if (parent_sock && !ssk->sk_socket)
mptcp_sock_graft(ssk, parent_sock);
subflow->map_seq = READ_ONCE(msk->ack_seq);
+ WRITE_ONCE(msk->allow_infinite_fallback, false);
out:
mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC);
return true;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9e0a5591d4e1..5d458c3161cd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -249,6 +249,7 @@ struct mptcp_sock {
bool rcv_fastclose;
bool use_64bit_ack; /* Set when we received a 64-bit DSN */
bool csum_enabled;
+ bool allow_infinite_fallback;
spinlock_t join_list_lock;
int keepalive_cnt;
int keepalive_idle;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e71082dd6484..412823af2c1d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1219,7 +1219,8 @@ static bool subflow_check_data_avail(struct sock *ssk)
fallback:
/* RFC 8684 section 3.7. */
if (subflow->send_mp_fail) {
- if (mptcp_has_another_subflow(ssk)) {
+ if (mptcp_has_another_subflow(ssk) ||
+ !READ_ONCE(msk->allow_infinite_fallback)) {
while ((skb = skb_peek(&ssk->sk_receive_queue)))
sk_eat_skb(ssk, skb);
}
@@ -1481,6 +1482,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
/* discard the subflow socket */
mptcp_sock_graft(ssk, sk->sk_socket);
iput(SOCK_INODE(sf));
+ WRITE_ONCE(msk->allow_infinite_fallback, false);
return err;
failed_unlink:
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 5.15.y 2/6] mptcp: handle consistently DSS corruption
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 1/6] mptcp: track and update contiguous data status Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 3/6] tcp: fix mptcp DSS corruption due to large pmtu xmit Matthieu Baerts (NGI0)
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Paolo Abeni, sashal, Matthieu Baerts, Jakub Kicinski
From: Paolo Abeni <pabeni@redhat.com>
commit e32d262c89e2b22cb0640223f953b548617ed8a6 upstream.
Bugged peer implementation can send corrupted DSS options, consistently
hitting a few warning in the data path. Use DEBUG_NET assertions, to
avoid the splat on some builds and handle consistently the error, dumping
related MIBs and performing fallback and/or reset according to the
subflow type.
Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in mib.[ch], because commit 104125b82e5c ("mptcp: add mib
for infinite map sending") is linked to a new feature, not available
in this version. Resolving the conflicts is easy, simply adding the
new lines declaring the new "DSS corruptions" MIB entries.
Also removed in protocol.c and subflow.c all DEBUG_NET_WARN_ON_ONCE
because they are not defined in this version: enough with the MIB
counters that have been added in this commit. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/mib.c | 2 ++
net/mptcp/mib.h | 2 ++
net/mptcp/protocol.c | 20 +++++++++++++++++---
net/mptcp/subflow.c | 2 +-
4 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index c2fadfcfd6d6..08f82e1ca2f7 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -26,6 +26,8 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("MPJoinAckRx", MPTCP_MIB_JOINACKRX),
SNMP_MIB_ITEM("MPJoinAckHMacFailure", MPTCP_MIB_JOINACKMAC),
SNMP_MIB_ITEM("DSSNotMatching", MPTCP_MIB_DSSNOMATCH),
+ SNMP_MIB_ITEM("DSSCorruptionFallback", MPTCP_MIB_DSSCORRUPTIONFALLBACK),
+ SNMP_MIB_ITEM("DSSCorruptionReset", MPTCP_MIB_DSSCORRUPTIONRESET),
SNMP_MIB_ITEM("InfiniteMapRx", MPTCP_MIB_INFINITEMAPRX),
SNMP_MIB_ITEM("DSSNoMatchTCP", MPTCP_MIB_DSSTCPMISMATCH),
SNMP_MIB_ITEM("DataCsumErr", MPTCP_MIB_DATACSUMERR),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 90025acdcf72..1b7f6d24904b 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -19,6 +19,8 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_JOINACKRX, /* Received an ACK + MP_JOIN */
MPTCP_MIB_JOINACKMAC, /* HMAC was wrong on ACK + MP_JOIN */
MPTCP_MIB_DSSNOMATCH, /* Received a new mapping that did not match the previous one */
+ MPTCP_MIB_DSSCORRUPTIONFALLBACK,/* DSS corruption detected, fallback */
+ MPTCP_MIB_DSSCORRUPTIONRESET, /* DSS corruption detected, MPJ subflow reset */
MPTCP_MIB_INFINITEMAPRX, /* Received an infinite mapping */
MPTCP_MIB_DSSTCPMISMATCH, /* DSS-mapping did not map with TCP's sequence numbers */
MPTCP_MIB_DATACSUMERR, /* The data checksum fail */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 73a0b0d15382..34c98596350e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -554,6 +554,18 @@ static bool mptcp_check_data_fin(struct sock *sk)
return ret;
}
+static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
+{
+ if (READ_ONCE(msk->allow_infinite_fallback)) {
+ MPTCP_INC_STATS(sock_net(ssk),
+ MPTCP_MIB_DSSCORRUPTIONFALLBACK);
+ mptcp_do_fallback(ssk);
+ } else {
+ MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
+ mptcp_subflow_reset(ssk);
+ }
+}
+
static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
struct sock *ssk,
unsigned int *bytes)
@@ -626,10 +638,12 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
moved += len;
seq += len;
- if (WARN_ON_ONCE(map_remaining < len))
- break;
+ if (unlikely(map_remaining < len))
+ mptcp_dss_corruption(msk, ssk);
} else {
- WARN_ON_ONCE(!fin);
+ if (unlikely(!fin))
+ mptcp_dss_corruption(msk, ssk);
+
sk_eat_skb(ssk, skb);
done = true;
}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 412823af2c1d..7eff961267d0 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -847,7 +847,7 @@ static bool skb_is_fully_mapped(struct sock *ssk, struct sk_buff *skb)
unsigned int skb_consumed;
skb_consumed = tcp_sk(ssk)->copied_seq - TCP_SKB_CB(skb)->seq;
- if (WARN_ON_ONCE(skb_consumed >= skb->len))
+ if (unlikely(skb_consumed >= skb->len))
return true;
return skb->len - skb_consumed <= subflow->map_data_len -
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 5.15.y 3/6] tcp: fix mptcp DSS corruption due to large pmtu xmit
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 1/6] mptcp: track and update contiguous data status Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 2/6] mptcp: handle consistently DSS corruption Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 4/6] mptcp: fallback when MPTCP opts are dropped after 1st data Matthieu Baerts (NGI0)
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Paolo Abeni, sashal, syzbot+d1bff73460e33101f0e7, Matthieu Baerts,
Jakub Kicinski
From: Paolo Abeni <pabeni@redhat.com>
commit 4dabcdf581217e60690467a37c956a5b8dbc6bd9 upstream.
Syzkaller was able to trigger a DSS corruption:
TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
Modules linked in:
CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
FS: 000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
__netif_receive_skb_one_core net/core/dev.c:5662 [inline]
__netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
process_backlog+0x662/0x15b0 net/core/dev.c:6107
__napi_poll+0xcb/0x490 net/core/dev.c:6771
napi_poll net/core/dev.c:6840 [inline]
net_rx_action+0x89b/0x1240 net/core/dev.c:6962
handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
do_softirq+0x11b/0x1e0 kernel/softirq.c:455
</IRQ>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
dev_queue_xmit include/linux/netdevice.h:3094 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
ip_local_out net/ipv4/ip_output.c:130 [inline]
__ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
__tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
__tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
tcp_push_pending_frames include/net/tcp.h:2107 [inline]
tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
sk_backlog_rcv include/net/sock.h:1113 [inline]
__release_sock+0x214/0x350 net/core/sock.c:3072
release_sock+0x61/0x1f0 net/core/sock.c:3626
mptcp_push_release net/mptcp/protocol.c:1486 [inline]
__mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:745
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
___sys_sendmsg net/socket.c:2657 [inline]
__sys_sendmsg+0x2aa/0x390 net/socket.c:2686
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb06e9317f9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
</TASK>
Additionally syzkaller provided a nice reproducer. The repro enables
pmtu on the loopback device, leading to tcp_mtu_probe() generating
very large probe packets.
tcp_can_coalesce_send_queue_head() currently does not check for
mptcp-level invariants, and allowed the creation of cross-DSS probes,
leading to the mentioned corruption.
Address the issue teaching tcp_can_coalesce_send_queue_head() about
mptcp using the tcp_skb_can_collapse(), also reducing the code
duplication.
Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Cc: stable@vger.kernel.org
Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflict in tcp_output.c, because commit 65249feb6b3d ("net: add
support for skbs with unreadable frags"), and commit 9b65b17db723
("net: avoid double accounting for pure zerocopy skbs") are not in
this version. These commits are linked to new features and introduce
new conditions which cause the conflicts. Resolving this is easy: we
can ignore the missing new condition, and use tcp_skb_can_collapse()
like in the original patch. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/ipv4/tcp_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2c9670c83202..44eedae43eaa 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2308,7 +2308,7 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len)
if (len <= skb->len)
break;
- if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb))
+ if (tcp_has_tx_tstamp(skb) || !tcp_skb_can_collapse(skb, next))
return false;
len -= skb->len;
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 5.15.y 4/6] mptcp: fallback when MPTCP opts are dropped after 1st data
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
` (2 preceding siblings ...)
2024-10-19 9:30 ` [PATCH 5.15.y 3/6] tcp: fix mptcp DSS corruption due to large pmtu xmit Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 5/6] mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow Matthieu Baerts (NGI0)
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Matthieu Baerts (NGI0), sashal, Christoph Paasch, Paolo Abeni,
Jakub Kicinski
commit 119d51e225febc8152476340a880f5415a01e99e upstream.
As reported by Christoph [1], before this patch, an MPTCP connection was
wrongly reset when a host received a first data packet with MPTCP
options after the 3wHS, but got the next ones without.
According to the MPTCP v1 specs [2], a fallback should happen in this
case, because the host didn't receive a DATA_ACK from the other peer,
nor receive data for more than the initial window which implies a
DATA_ACK being received by the other peer.
The patch here re-uses the same logic as the one used in other places:
by looking at allow_infinite_fallback, which is disabled at the creation
of an additional subflow. It's not looking at the first DATA_ACK (or
implying one received from the other side) as suggested by the RFC, but
it is in continuation with what was already done, which is safer, and it
fixes the reported issue. The next step, looking at this first DATA_ACK,
is tracked in [4].
This patch has been validated using the following Packetdrill script:
0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// 3WHS is OK
+0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
+0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
+0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
+0 accept(3, ..., ...) = 4
// Data from the client with valid MPTCP options (no DATA_ACK: normal)
+0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
// From here, the MPTCP options will be dropped by a middlebox
+0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs>
+0.1 read(4, ..., 500) = 500
+0 write(4, ..., 100) = 100
// The server replies with data, still thinking MPTCP is being used
+0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
// But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
+0.1 < . 501:501(0) ack 101 win 2048
+0.0 < P. 501:601(100) ack 101 win 2048
// The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
+0.0 > . 101:101(0) ack 601
Note that this script requires Packetdrill with MPTCP support, see [3].
Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
Link: https://github.com/multipath-tcp/packetdrill [3]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/subflow.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 7eff961267d0..feb146a62f97 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1152,7 +1152,7 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
else if (READ_ONCE(msk->csum_enabled))
return !subflow->valid_csum_seen;
else
- return !subflow->fully_established;
+ return READ_ONCE(msk->allow_infinite_fallback);
}
static bool subflow_check_data_avail(struct sock *ssk)
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 5.15.y 5/6] mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
` (3 preceding siblings ...)
2024-10-19 9:30 ` [PATCH 5.15.y 4/6] mptcp: fallback when MPTCP opts are dropped after 1st data Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-19 9:30 ` [PATCH 5.15.y 6/6] mptcp: prevent MPC handshake on port-based signal endpoints Matthieu Baerts (NGI0)
2024-10-21 9:41 ` [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Greg KH
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Matthieu Baerts (NGI0), sashal, syzbot+3c8b7a8e7df6a2a226ca,
Paolo Abeni
commit 7decd1f5904a489d3ccdcf131972f94645681689 upstream.
Syzkaller reported this splat:
==================================================================
BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662
CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf7fe4579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Allocated by task 5387:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
__tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
__mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Freed by task 113:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x14f/0x4b0 mm/slub.c:4727
kvfree+0x47/0x50 mm/util.c:701
kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
shrink_many mm/vmscan.c:4879 [inline]
lru_gen_shrink_node mm/vmscan.c:4957 [inline]
shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
kswapd_shrink_node mm/vmscan.c:6765 [inline]
balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Last potentially related work creation:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
sk_backlog_rcv include/net/sock.h:1115 [inline]
__release_sock+0x31b/0x400 net/core/sock.c:3072
__tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
__mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
The buggy address belongs to the object at ffff8880569ac800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 88 bytes inside of
freed 512-byte region [ffff8880569ac800, ffff8880569aca00)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
__alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab mm/slub.c:2578 [inline]
new_slab+0x2ba/0x3f0 mm/slub.c:2631
___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
mld_add_delrec net/ipv6/mcast.c:743 [inline]
igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
__ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
__ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
call_netdevice_notifiers net/core/dev.c:2048 [inline]
dev_close_many+0x333/0x6a0 net/core/dev.c:1589
page last free pid 13136 tgid 13136 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:319 [inline]
__kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
device_del+0x623/0x9f0 drivers/base/core.c:3882
snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
device_remove drivers/base/dd.c:569 [inline]
device_remove+0x122/0x170 drivers/base/dd.c:561
That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.
Fixes: 1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: stable@vger.kernel.org
Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/670d7337.050a0220.4cbc0.004f.GAE@google.com
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ Conflicts in pm_netlink.c, because commit a88c9e496937 ("mptcp: do not
block subflows creation on errors") is linked to a new feature, not
available in this version. This commit modifies the context. Resolving
the conflicts is easy, simply moving the lines the same way it was
done in the original patch, ignoring the comment that is not in this
version. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index e524171291bc..133c5f2b3ba6 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -793,10 +793,10 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
i, rm_list->ids[i], subflow->local_id, subflow->remote_id);
spin_unlock_bh(&msk->pm.lock);
mptcp_subflow_shutdown(sk, ssk, how);
+ removed |= subflow->request_join;
mptcp_close_ssk(sk, ssk, subflow);
spin_lock_bh(&msk->pm.lock);
- removed |= subflow->request_join;
msk->pm.subflows--;
if (rm_type == MPTCP_MIB_RMSUBFLOW)
__MPTCP_INC_STATS(sock_net(sk), rm_type);
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 5.15.y 6/6] mptcp: prevent MPC handshake on port-based signal endpoints
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
` (4 preceding siblings ...)
2024-10-19 9:30 ` [PATCH 5.15.y 5/6] mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow Matthieu Baerts (NGI0)
@ 2024-10-19 9:30 ` Matthieu Baerts (NGI0)
2024-10-21 9:41 ` [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Greg KH
6 siblings, 0 replies; 8+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-10-19 9:30 UTC (permalink / raw)
To: mptcp, stable, gregkh
Cc: Paolo Abeni, sashal, syzbot+f4aacdfef2c6a6529c3e, Cong Wang,
Matthieu Baerts, Mat Martineau, Jakub Kicinski
From: Paolo Abeni <pabeni@redhat.com>
commit 3d041393ea8c815f773020fb4a995331a69c0139 upstream.
Syzkaller reported a lockdep splat:
============================================
WARNING: possible recursive locking detected
6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
--------------------------------------------
syz-executor364/5113 is trying to acquire lock:
ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
but task is already holding lock:
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(k-slock-AF_INET);
lock(k-slock-AF_INET);
*** DEADLOCK ***
May be due to missing lock nesting notation
7 locks held by syz-executor364/5113:
#0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
#0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
#1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
#1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
#2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
#3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
#4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
#4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
#5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
#5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
#6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
stack backtrace:
CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
check_deadlock kernel/locking/lockdep.c:3061 [inline]
validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
__lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
__netif_receive_skb_one_core net/core/dev.c:5661 [inline]
__netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
process_backlog+0x662/0x15b0 net/core/dev.c:6108
__napi_poll+0xcb/0x490 net/core/dev.c:6772
napi_poll net/core/dev.c:6841 [inline]
net_rx_action+0x89b/0x1240 net/core/dev.c:6963
handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
do_softirq+0x11b/0x1e0 kernel/softirq.c:455
</IRQ>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
__dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
dev_queue_xmit include/linux/netdevice.h:3105 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
ip_local_out net/ipv4/ip_output.c:129 [inline]
__ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
__tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
sk_backlog_rcv include/net/sock.h:1111 [inline]
__release_sock+0x214/0x350 net/core/sock.c:3004
release_sock+0x61/0x1f0 net/core/sock.c:3558
mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
__do_sys_sendmmsg net/socket.c:2766 [inline]
__se_sys_sendmmsg net/socket.c:2763 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f04fb13a6b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9
RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004
RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240
R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300
</TASK>
As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.
Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.
Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.
Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in mib.[ch], because commit 6982826fe5e5 ("mptcp: fallback
to TCP after SYN+MPC drops"), and commit 27069e7cb3d1 ("mptcp: disable
active MPTCP in case of blackhole") are linked to new features, not
available in this version. Resolving the conflicts is easy, simply
adding the new lines declaring the new "endpoint attempt" MIB entry.
Also a conflict in protocol.h, because commit fce68b03086f ("mptcp:
add scheduled in mptcp_subflow_context") is not in this version, and
changes the context by introducing 'scheduled' variable just before.
Also a conflict in pm_netlink.c, because commit 3aa362494170 ("mptcp:
avoid ssock usage in mptcp_pm_nl_create_listen_socket()") is not in
this version, and refactor the function: that's fine, we can still set
pm_listener before doing the 'listen()', taking 'ssock->sk' as 'ssk'
is not defined before this refactoring. There is also a conflict
because the context has been changed later in commit 69925a346acb
("mptcp: ensure listener is unhashed before updating the sk status"). ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 11 +++++++++++
5 files changed, 15 insertions(+)
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 08f82e1ca2f7..3e773259fa83 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -15,6 +15,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("MPCapableACKRX", MPTCP_MIB_MPCAPABLEPASSIVEACK),
SNMP_MIB_ITEM("MPCapableFallbackACK", MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK),
SNMP_MIB_ITEM("MPCapableFallbackSYNACK", MPTCP_MIB_MPCAPABLEACTIVEFALLBACK),
+ SNMP_MIB_ITEM("MPCapableEndpAttempt", MPTCP_MIB_MPCAPABLEENDPATTEMPT),
SNMP_MIB_ITEM("MPFallbackTokenInit", MPTCP_MIB_TOKENFALLBACKINIT),
SNMP_MIB_ITEM("MPTCPRetrans", MPTCP_MIB_RETRANSSEGS),
SNMP_MIB_ITEM("MPJoinNoTokenFound", MPTCP_MIB_JOINNOTOKEN),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 1b7f6d24904b..0690db18fc95 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -8,6 +8,7 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_MPCAPABLEPASSIVEACK, /* Received third ACK with MP_CAPABLE */
MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK,/* Server-side fallback during 3-way handshake */
MPTCP_MIB_MPCAPABLEACTIVEFALLBACK, /* Client-side fallback during 3-way handshake */
+ MPTCP_MIB_MPCAPABLEENDPATTEMPT, /* Prohibited MPC to port-based endp */
MPTCP_MIB_TOKENFALLBACKINIT, /* Could not init/allocate token */
MPTCP_MIB_RETRANSSEGS, /* Segments retransmitted at the MPTCP-level */
MPTCP_MIB_JOINNOTOKEN, /* Received MP_JOIN but the token was not found */
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 133c5f2b3ba6..a7a46d99d5a3 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -991,6 +991,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
goto out;
}
+ WRITE_ONCE(mptcp_subflow_ctx(ssock->sk)->pm_listener, true);
err = kernel_listen(ssock, backlog);
if (err) {
pr_warn("kernel_listen error, err=%d", err);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 5d458c3161cd..8f5e5a66babf 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -446,6 +446,7 @@ struct mptcp_subflow_context {
close_event_done : 1, /* has done the post-closed part */
__unused : 11;
enum mptcp_data_avail data_avail;
+ bool pm_listener; /* a listener managed by the kernel PM? */
u32 remote_nonce;
u64 thmac;
u32 local_nonce;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index feb146a62f97..d8b33e10750b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -129,6 +129,13 @@ static void subflow_add_reset_reason(struct sk_buff *skb, u8 reason)
}
}
+static int subflow_reset_req_endp(struct request_sock *req, struct sk_buff *skb)
+{
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEENDPATTEMPT);
+ subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
+ return -EPERM;
+}
+
/* Init mptcp request socket.
*
* Returns an error code if a JOIN has failed and a TCP reset
@@ -160,6 +167,8 @@ static int subflow_check_req(struct request_sock *req,
if (opt_mp_capable) {
SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVE);
+ if (unlikely(listener->pm_listener))
+ return subflow_reset_req_endp(req, skb);
if (opt_mp_join)
return 0;
} else if (opt_mp_join) {
@@ -167,6 +176,8 @@ static int subflow_check_req(struct request_sock *req,
if (mp_opt.backup)
SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINSYNBACKUPRX);
+ } else if (unlikely(listener->pm_listener)) {
+ return subflow_reset_req_endp(req, skb);
}
if (opt_mp_capable && listener->request_mptcp) {
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 5.15.y 0/6] mptcp: fix recent failed backports
2024-10-19 9:30 [PATCH 5.15.y 0/6] mptcp: fix recent failed backports Matthieu Baerts (NGI0)
` (5 preceding siblings ...)
2024-10-19 9:30 ` [PATCH 5.15.y 6/6] mptcp: prevent MPC handshake on port-based signal endpoints Matthieu Baerts (NGI0)
@ 2024-10-21 9:41 ` Greg KH
6 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2024-10-21 9:41 UTC (permalink / raw)
To: Matthieu Baerts (NGI0); +Cc: mptcp, stable, sashal
On Sat, Oct 19, 2024 at 11:30:46AM +0200, Matthieu Baerts (NGI0) wrote:
> Greg recently reported 6 patches that could not be applied without
> conflicts in v5.15:
>
> - e32d262c89e2 ("mptcp: handle consistently DSS corruption")
> - 4dabcdf58121 ("tcp: fix mptcp DSS corruption due to large pmtu xmit")
> - 119d51e225fe ("mptcp: fallback when MPTCP opts are dropped after 1st
> data")
> - 7decd1f5904a ("mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow")
> - 3d041393ea8c ("mptcp: prevent MPC handshake on port-based signal
> endpoints")
> - 5afca7e996c4 ("selftests: mptcp: join: test for prohibited MPC to
> port-based endp")
>
> Conflicts have been resolved for the 5 first ones, and documented in
> each patch.
Now queued up, thanks!
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread