From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting
Date: Fri, 5 Dec 2025 14:47:52 +0100
Message-ID: <01a8fa17-6223-410c-921f-985d02ff880b@kernel.org>
In-Reply-To: <ffcda047-0f64-4348-a65b-d3c2d139e6e4@redhat.com>

Hi Paolo,

On 05/12/2025 09:06, Paolo Abeni wrote:
> On 12/4/25 6:44 PM, Matthieu Baerts wrote:
>> On 03/12/2025 19:55, Paolo Abeni wrote:
>>> Jakub reported an MPTCP deadlock at fallback time:
>>>
>>> WARNING: possible recursive locking detected
>>> 6.18.0-rc7-virtme #1 Not tainted
>>> --------------------------------------------
>>> mptcp_connect/20858 is trying to acquire lock:
>>> ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
>>>
>>> but task is already holding lock:
>>> ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>
>>> other info that might help us debug this:
>>> Possible unsafe locking scenario:
>>>
>>> CPU0
>>> ----
>>> lock(&msk->fallback_lock);
>>> lock(&msk->fallback_lock);
>>>
>>> *** DEADLOCK ***
>>>
>>> May be due to missing lock nesting notation
>>>
>>> 3 locks held by mptcp_connect/20858:
>>> #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
>>> #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
>>> #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>
>>> stack backtrace:
>>> CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
>>> Hardware name: Bochs, BIOS Bochs 01/01/2011
>>> Call Trace:
>>> <TASK>
>>> dump_stack_lvl+0x6f/0xa0
>>> print_deadlock_bug.cold+0xc0/0xcd
>>> validate_chain+0x2ff/0x5f0
>>> __lock_acquire+0x34c/0x740
>>> lock_acquire.part.0+0xbc/0x260
>>> _raw_spin_lock_bh+0x38/0x50
>>> __mptcp_try_fallback+0xd8/0x280
>>> mptcp_sendmsg_frag+0x16c2/0x3050
>>> __mptcp_retrans+0x421/0xaa0
>>> mptcp_release_cb+0x5aa/0xa70
>>> release_sock+0xab/0x1d0
>>> mptcp_sendmsg+0xd5b/0x1bc0
>>> sock_write_iter+0x281/0x4d0
>>> new_sync_write+0x3c5/0x6f0
>>> vfs_write+0x65e/0xbb0
>>> ksys_write+0x17e/0x200
>>> do_syscall_64+0xbb/0xfd0
>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>> RIP: 0033:0x7fa5627cbc5e
>>> Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
>>> RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e
>>> RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005
>>> RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920
>>> R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c
>>>
>>> The packet scheduler could attempt a reinjection after receiving an
>>> MP_FAIL and before the infinite map has been transmitted, causing a
>>> deadlock since MPTCP needs to do the reinjection atomically WRT
>>> fallback.
>>>
>>> Address the issue by explicitly avoiding the reinjection in the critical
>>> scenario. Note that this is the only fallback critical section that
>>> could potentially send packets and hit the double-lock.
>>
>> Thank you for the fix!
>>
>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>
>> Out of curiosity: any idea why we only see it now while the Fixes tag is
>> from July? :)
>
> The deadlock is deterministic once the relevant preconditions are
> met, but those preconditions are very unlikely:
>
> - the peer sends an MP_FAIL [1]
> - the ssk/pm/msk tries to send an ack reply with infinite mapping
> - allocation of such skb fails [2]
> - the scheduler kicks an mptcp-level retransmission before any other
>   later transmit [3]
>
> Each of [1], [2] and [3] is very unlikely, and we need all of them
> with suitably strict timing.

Thank you for your reply!

[1] is expected in this selftest, and I can understand [3], but not [2]. Or
could it be an issue with the memory size allocated per VM in the new NIPA LF
machines?
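
For anyone reading along, the double-lock in the trace can be sketched in
userspace roughly as below. This is only a single-threaded toy model, not the
kernel code and not the actual patch: struct toy_msk, toy_lock and all the
toy_* helpers are hypothetical names, the toy lock returns EDEADLK where a
real spinlock would spin forever, and the skip_when_pending guard is just an
assumption about the general shape of "avoid the reinjection while the
infinite map is still pending".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock (hypothetical). A real spinlock would spin forever
 * on re-acquire; this one reports EDEADLK so the problem is observable. */
struct toy_lock {
	bool held;
};

int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return EDEADLK;
	l->held = true;
	return 0;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for the MPTCP socket state relevant here. */
struct toy_msk {
	struct toy_lock fallback_lock;	/* models msk->fallback_lock */
	bool infinite_map_pending;	/* MP_FAIL seen, infinite map not sent yet */
	bool fell_back;
};

void toy_msk_init(struct toy_msk *msk, bool infinite_map_pending)
{
	msk->fallback_lock.held = false;
	msk->infinite_map_pending = infinite_map_pending;
	msk->fell_back = false;
}

/* Models the fallback helper: it takes the fallback lock on its own. */
int toy_try_fallback(struct toy_msk *msk)
{
	int err = toy_lock_acquire(&msk->fallback_lock);

	if (err)
		return err;
	msk->fell_back = true;
	toy_lock_release(&msk->fallback_lock);
	return 0;
}

/* Models the transmit path: a pending infinite map triggers the fallback. */
int toy_sendmsg_frag(struct toy_msk *msk)
{
	if (msk->infinite_map_pending)
		return toy_try_fallback(msk);
	return 0;
}

/* Models the retransmit path, which transmits while holding the lock.
 * With skip_when_pending set, no reinjection is attempted in the
 * critical scenario, so the lock is never taken twice. */
int toy_retrans(struct toy_msk *msk, bool skip_when_pending)
{
	int err;

	if (skip_when_pending && msk->infinite_map_pending)
		return 0;

	err = toy_lock_acquire(&msk->fallback_lock);
	if (err)
		return err;
	err = toy_sendmsg_frag(msk);
	toy_lock_release(&msk->fallback_lock);
	return err;
}
```

In this model, toy_retrans(&msk, false) on a socket with a pending infinite
map returns EDEADLK, mirroring the two acquisitions of fallback_lock in the
lockdep report, while toy_retrans(&msk, true) returns 0 without touching the
lock.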

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.