public inbox for mptcp@lists.linux.dev
From: Paolo Abeni <pabeni@redhat.com>
To: Matthieu Baerts <matttbe@kernel.org>
Cc: MPTCP Linux <mptcp@lists.linux.dev>
Subject: Re: [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting
Date: Fri, 5 Dec 2025 14:53:52 +0100
Message-ID: <2dbd0bbb-3e40-45dd-a5ff-e6c52c44cf2d@redhat.com>
In-Reply-To: <01a8fa17-6223-410c-921f-985d02ff880b@kernel.org>

On 12/5/25 2:47 PM, Matthieu Baerts wrote:
> On 05/12/2025 09:06, Paolo Abeni wrote:
>> On 12/4/25 6:44 PM, Matthieu Baerts wrote:
>>> On 03/12/2025 19:55, Paolo Abeni wrote:
>>>> Jakub reported an MPTCP deadlock at fallback time:
>>>>
>>>>  WARNING: possible recursive locking detected
>>>>  6.18.0-rc7-virtme #1 Not tainted
>>>>  --------------------------------------------
>>>>  mptcp_connect/20858 is trying to acquire lock:
>>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
>>>>
>>>>  but task is already holding lock:
>>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>>  other info that might help us debug this:
>>>>   Possible unsafe locking scenario:
>>>>
>>>>         CPU0
>>>>         ----
>>>>    lock(&msk->fallback_lock);
>>>>    lock(&msk->fallback_lock);
>>>>
>>>>   *** DEADLOCK ***
>>>>
>>>>   May be due to missing lock nesting notation
>>>>
>>>>  3 locks held by mptcp_connect/20858:
>>>>   #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
>>>>   #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
>>>>   #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>>  stack backtrace:
>>>>  CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
>>>>  Hardware name: Bochs, BIOS Bochs 01/01/2011
>>>>  Call Trace:
>>>>   <TASK>
>>>>   dump_stack_lvl+0x6f/0xa0
>>>>   print_deadlock_bug.cold+0xc0/0xcd
>>>>   validate_chain+0x2ff/0x5f0
>>>>   __lock_acquire+0x34c/0x740
>>>>   lock_acquire.part.0+0xbc/0x260
>>>>   _raw_spin_lock_bh+0x38/0x50
>>>>   __mptcp_try_fallback+0xd8/0x280
>>>>   mptcp_sendmsg_frag+0x16c2/0x3050
>>>>   __mptcp_retrans+0x421/0xaa0
>>>>   mptcp_release_cb+0x5aa/0xa70
>>>>   release_sock+0xab/0x1d0
>>>>   mptcp_sendmsg+0xd5b/0x1bc0
>>>>   sock_write_iter+0x281/0x4d0
>>>>   new_sync_write+0x3c5/0x6f0
>>>>   vfs_write+0x65e/0xbb0
>>>>   ksys_write+0x17e/0x200
>>>>   do_syscall_64+0xbb/0xfd0
>>>>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>  RIP: 0033:0x7fa5627cbc5e
>>>>  Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
>>>>  RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>>  RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e
>>>>  RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005
>>>>  RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000
>>>>  R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920
>>>>  R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c
>>>>
>>>> The packet scheduler could attempt a reinjection after receiving an
>>>> MP_FAIL and before the infinite map has been transmitted, causing a
>>>> deadlock since MPTCP needs to do the reinjection atomically WRT
>>>> fallback.
>>>>
>>>> Address the issue by explicitly avoiding the reinjection in the critical
>>>> scenario. Note that this is the only fallback critical section that
>>>> could potentially send packets and hit the double-lock.
>>>
>>> Thank you for the fix!
>>>
>>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>
>>> Out-of-curiosity: any idea why we only see it now while the fix tag is
>>> from July? :)
>>
>> The deadlock is deterministic once the relevant preconditions are
>> met, but such preconditions are quite/very unlikely:
>>
>> - the peer sends an MP_FAIL [1]
>> - the ssk/pm/msk tries to send an ack reply with infinite mapping
>> - allocation of such skb fails [2]
>> - the scheduler kicks an mptcp-level retransmission before any other
>>   later transmit [3]
>>
>> Each of [1], [2] and [3] is quite/very unlikely, and we need all of
>> them with suitable/strict timing.
> 
> Thank you for your reply!
> 
> [1] is expected in this selftest, [3] I can understand, but not [2]. Or
> is it an issue with the memory size allocated per VM in the new NIPA LF
> machines?

Allocations can always fail, and this one is GFP_ATOMIC, so failure is
even more likely. (Note that the allocation includes __GFP_NOWARN, so a
failure leaves no warning in the logs.)

Possibly NIPA VMs are provisioned with a limited amount of memory (IDK).

/P



Thread overview: 7+ messages
2025-12-03 18:55 [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting Paolo Abeni
2025-12-04 10:20 ` MPTCP CI
2025-12-04 17:44 ` Matthieu Baerts
2025-12-04 18:27   ` Matthieu Baerts
2025-12-05  8:06   ` Paolo Abeni
2025-12-05 13:47     ` Matthieu Baerts
2025-12-05 13:53       ` Paolo Abeni [this message]
