From: Paolo Abeni <pabeni@redhat.com>
To: Matthieu Baerts <matttbe@kernel.org>
Cc: MPTCP Linux <mptcp@lists.linux.dev>
Subject: Re: [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting
Date: Fri, 5 Dec 2025 14:53:52 +0100
Message-ID: <2dbd0bbb-3e40-45dd-a5ff-e6c52c44cf2d@redhat.com>
In-Reply-To: <01a8fa17-6223-410c-921f-985d02ff880b@kernel.org>
On 12/5/25 2:47 PM, Matthieu Baerts wrote:
> On 05/12/2025 09:06, Paolo Abeni wrote:
>> On 12/4/25 6:44 PM, Matthieu Baerts wrote:
>>> On 03/12/2025 19:55, Paolo Abeni wrote:
>>>> Jakub reported an MPTCP deadlock at fallback time:
>>>>
>>>> WARNING: possible recursive locking detected
>>>> 6.18.0-rc7-virtme #1 Not tainted
>>>> --------------------------------------------
>>>> mptcp_connect/20858 is trying to acquire lock:
>>>> ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
>>>>
>>>> but task is already holding lock:
>>>> ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>> other info that might help us debug this:
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0
>>>> ----
>>>> lock(&msk->fallback_lock);
>>>> lock(&msk->fallback_lock);
>>>>
>>>> *** DEADLOCK ***
>>>>
>>>> May be due to missing lock nesting notation
>>>>
>>>> 3 locks held by mptcp_connect/20858:
>>>> #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
>>>> #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
>>>> #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>> stack backtrace:
>>>> CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
>>>> Hardware name: Bochs, BIOS Bochs 01/01/2011
>>>> Call Trace:
>>>> <TASK>
>>>> dump_stack_lvl+0x6f/0xa0
>>>> print_deadlock_bug.cold+0xc0/0xcd
>>>> validate_chain+0x2ff/0x5f0
>>>> __lock_acquire+0x34c/0x740
>>>> lock_acquire.part.0+0xbc/0x260
>>>> _raw_spin_lock_bh+0x38/0x50
>>>> __mptcp_try_fallback+0xd8/0x280
>>>> mptcp_sendmsg_frag+0x16c2/0x3050
>>>> __mptcp_retrans+0x421/0xaa0
>>>> mptcp_release_cb+0x5aa/0xa70
>>>> release_sock+0xab/0x1d0
>>>> mptcp_sendmsg+0xd5b/0x1bc0
>>>> sock_write_iter+0x281/0x4d0
>>>> new_sync_write+0x3c5/0x6f0
>>>> vfs_write+0x65e/0xbb0
>>>> ksys_write+0x17e/0x200
>>>> do_syscall_64+0xbb/0xfd0
>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>> RIP: 0033:0x7fa5627cbc5e
>>>> Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
>>>> RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e
>>>> RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005
>>>> RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920
>>>> R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c
>>>>
>>>> The packet scheduler could attempt a reinjection after receiving an
>>>> MP_FAIL and before the infinite map has been transmitted, causing a
>>>> deadlock, since MPTCP needs to do the reinjection atomically with
>>>> respect to fallback.
>>>>
>>>> Address the issue by explicitly avoiding the reinjection in the critical
>>>> scenario. Note that this is the only fallback critical section that
>>>> could potentially send packets and hit the double lock.
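
A minimal user-space sketch of the double lock described above, under a simplified model. All names (model_msk, model_retrans() and so on) are hypothetical stand-ins for msk->fallback_lock, __mptcp_retrans(), mptcp_sendmsg_frag() and __mptcp_try_fallback() as they appear in the backtrace. The rule that the send path falls back while the infinite map is pending, and the "skip reinjection after MP_FAIL while the infinite map is pending" check, are assumptions illustrating the idea, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; names are illustrative, not kernel APIs. */
struct model_msk {
	bool fallback_lock_held;    /* models the non-recursive msk->fallback_lock */
	bool mp_fail_received;      /* peer sent MP_FAIL */
	bool infinite_map_pending;  /* infinite mapping not yet transmitted */
	bool deadlocked;            /* recorded instead of spinning forever */
};

/* A real spinlock taken twice by the same context spins forever; the model
 * records the recursive acquisition instead (lockdep flags the same pattern). */
static bool model_lock(struct model_msk *msk)
{
	if (msk->fallback_lock_held) {
		msk->deadlocked = true;
		return false;
	}
	msk->fallback_lock_held = true;
	return true;
}

static void model_unlock(struct model_msk *msk)
{
	assert(msk->fallback_lock_held);
	msk->fallback_lock_held = false;
}

/* Stands in for __mptcp_try_fallback(): it needs fallback_lock. */
static void model_try_fallback(struct model_msk *msk)
{
	if (!model_lock(msk))
		return;
	msk->infinite_map_pending = false;
	model_unlock(msk);
}

/* Stands in for mptcp_sendmsg_frag(); assumption: while the infinite map is
 * pending, building a data packet triggers the fallback. */
static void model_sendmsg_frag(struct model_msk *msk)
{
	if (msk->infinite_map_pending)
		model_try_fallback(msk);
}

/* Stands in for __mptcp_retrans(): reinjection runs under fallback_lock.
 * With apply_fix, reinjection is skipped in the critical window. */
static void model_retrans(struct model_msk *msk, bool apply_fix)
{
	if (!model_lock(msk))
		return;
	if (!(apply_fix && msk->mp_fail_received && msk->infinite_map_pending))
		model_sendmsg_frag(msk);
	model_unlock(msk);
}

/* Runs the critical scenario; returns true if the lock was taken recursively. */
bool model_run(bool apply_fix)
{
	struct model_msk msk = {
		.mp_fail_received = true,
		.infinite_map_pending = true,
	};

	model_retrans(&msk, apply_fix);
	return msk.deadlocked;
}
```

In this model, model_run(false) reports the recursive acquisition, while model_run(true) skips the reinjection and never touches the lock a second time.
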
>>>
>>> Thank you for the fix!
>>>
>>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>
>>> Out of curiosity: any idea why we only see it now while the Fixes tag is
>>> from July? :)
>>
>> The deadlock is deterministic once the relevant preconditions are
>> reached, but those preconditions range from quite to very unlikely:
>>
>> - the peer sends an MP_FAIL [1]
>> - the ssk/pm/msk tries to send an ACK reply with the infinite mapping
>> - the allocation of that skb fails [2]
>> - the scheduler kicks an MPTCP-level retransmission before any other
>> later transmission [3]
>>
>> Each of [1], [2] and [3] is quite or very unlikely, and we need all of
>> them together, with suitable, strict timing.
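
The preconditions above can be modelled as an event trace: the retransmission only deadlocks when it lands after the MP_FAIL and the failed allocation, but before any later transmission carries the pending infinite map. This is a hypothetical sketch; the event names and the rule that any later transmit clears the pending infinite map are assumptions drawn from the list, not from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event names modelling the preconditions listed above. */
enum model_event {
	EV_MP_FAIL_RX,      /* [1] peer sends MP_FAIL */
	EV_ACK_ALLOC_OK,    /* infinite-mapping ACK allocated and sent */
	EV_ACK_ALLOC_FAIL,  /* [2] allocation of that ACK skb fails */
	EV_OTHER_TX,        /* any later transmit; assumed to carry the infinite map */
	EV_MPTCP_RETRANS,   /* [3] scheduler kicks an MPTCP-level retransmission */
};

/* Walks an ordered event trace and reports whether the retransmission falls
 * inside the window where the infinite map is still pending. */
bool model_trace_deadlocks(const enum model_event *ev, size_t n)
{
	bool mp_fail = false;
	bool map_pending = false;

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (ev[i]) {
		case EV_MP_FAIL_RX:
			mp_fail = true;
			break;
		case EV_ACK_ALLOC_OK:
			map_pending = false;  /* infinite map went out immediately */
			break;
		case EV_ACK_ALLOC_FAIL:
			if (mp_fail)
				map_pending = true;  /* infinite map still to be sent */
			break;
		case EV_OTHER_TX:
			map_pending = false;  /* later transmit carried the map */
			break;
		case EV_MPTCP_RETRANS:
			if (mp_fail && map_pending)
				return true;  /* reinjection hits the double lock */
			break;
		}
	}
	return false;
}
```

For example, the trace {EV_MP_FAIL_RX, EV_ACK_ALLOC_FAIL, EV_MPTCP_RETRANS} deadlocks in this model, while inserting EV_OTHER_TX before the retransmission, or a successful allocation instead of the failed one, does not.
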
>
> Thank you for your reply!
>
> [1] is expected in this selftest, [3] I can understand, but not [2]. Or
> is it an issue with the memory size allocated per VM in the new NIPA LF machines?

Allocations can always fail, and this one is GFP_ATOMIC, so a failure is
even more likely (note that the allocation includes __GFP_NOWARN, so a
failure is not reported in the logs).
Possibly the NIPA VMs are provisioned with a limited amount of memory, but
I don't know.
/P