From: Paolo Abeni <pabeni@redhat.com>
To: Matthieu Baerts <matttbe@kernel.org>
Cc: MPTCP Linux <mptcp@lists.linux.dev>
Subject: Re: [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting
Date: Fri, 5 Dec 2025 14:53:52 +0100
Message-ID: <2dbd0bbb-3e40-45dd-a5ff-e6c52c44cf2d@redhat.com>
In-Reply-To: <01a8fa17-6223-410c-921f-985d02ff880b@kernel.org>

On 12/5/25 2:47 PM, Matthieu Baerts wrote:
> On 05/12/2025 09:06, Paolo Abeni wrote:
>> On 12/4/25 6:44 PM, Matthieu Baerts wrote:
>>> On 03/12/2025 19:55, Paolo Abeni wrote:
>>>> Jakub reported an MPTCP deadlock at fallback time:
>>>>
>>>>  WARNING: possible recursive locking detected
>>>>  6.18.0-rc7-virtme #1 Not tainted
>>>>  --------------------------------------------
>>>>  mptcp_connect/20858 is trying to acquire lock:
>>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
>>>>
>>>>  but task is already holding lock:
>>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>>  other info that might help us debug this:
>>>>   Possible unsafe locking scenario:
>>>>
>>>>         CPU0
>>>>         ----
>>>>    lock(&msk->fallback_lock);
>>>>    lock(&msk->fallback_lock);
>>>>
>>>>   *** DEADLOCK ***
>>>>
>>>>   May be due to missing lock nesting notation
>>>>
>>>>  3 locks held by mptcp_connect/20858:
>>>>   #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
>>>>   #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
>>>>   #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>>
>>>>  stack backtrace:
>>>>  CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
>>>>  Hardware name: Bochs, BIOS Bochs 01/01/2011
>>>>  Call Trace:
>>>>   <TASK>
>>>>   dump_stack_lvl+0x6f/0xa0
>>>>   print_deadlock_bug.cold+0xc0/0xcd
>>>>   validate_chain+0x2ff/0x5f0
>>>>   __lock_acquire+0x34c/0x740
>>>>   lock_acquire.part.0+0xbc/0x260
>>>>   _raw_spin_lock_bh+0x38/0x50
>>>>   __mptcp_try_fallback+0xd8/0x280
>>>>   mptcp_sendmsg_frag+0x16c2/0x3050
>>>>   __mptcp_retrans+0x421/0xaa0
>>>>   mptcp_release_cb+0x5aa/0xa70
>>>>   release_sock+0xab/0x1d0
>>>>   mptcp_sendmsg+0xd5b/0x1bc0
>>>>   sock_write_iter+0x281/0x4d0
>>>>   new_sync_write+0x3c5/0x6f0
>>>>   vfs_write+0x65e/0xbb0
>>>>   ksys_write+0x17e/0x200
>>>>   do_syscall_64+0xbb/0xfd0
>>>>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>  RIP: 0033:0x7fa5627cbc5e
>>>>  Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
>>>>  RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>>  RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e
>>>>  RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005
>>>>  RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000
>>>>  R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920
>>>>  R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c
>>>>
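>>>> The packet scheduler could attempt a reinjection after receiving an
>>>> MP_FAIL and before the infinite map has been transmitted, causing a
>>>> deadlock: MPTCP performs the reinjection atomically with respect to
>>>> fallback, i.e. while holding msk->fallback_lock, and the transmit path
>>>> then tries to acquire the same lock in __mptcp_try_fallback().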
>>>>
>>>> Address the issue by explicitly avoiding the reinjection in the critical
>>>> scenario. Note that this is the only fallback critical section that
>>>> could potentially send packets and hit the double lock.
>>>
>>> Thank you for the fix!
>>>
>>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>
>>> Out of curiosity: any idea why we only see it now while the Fixes tag
>>> points to a commit from July? :)
>>
>> The deadlock is deterministic once the relevant preconditions are
>> reached, but those preconditions are very unlikely:
>>
>> - the peer sends an MP_FAIL [1]
>> - the ssk/pm/msk tries to send an ack reply with the infinite mapping
>> - the allocation of such skb fails [2]
>> - the scheduler kicks an MPTCP-level retransmission before any other
>>   later transmit [3]
>>
>> Each of [1], [2] and [3] is very unlikely, and we need all of them
>> with suitable/strict timing.
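The "we need all of them" argument can be written as a bound. As a sketch, let E_1 be the peer sending an MP_FAIL, E_2 the failed allocation of the infinite-mapping ack, and E_3 the scheduler kicking an MPTCP-level retransmission before any later transmit. Because the deadlock requires all three, its probability cannot exceed that of the least likely event; the product form holds only under an independence assumption, which the timing requirement may well break. In a selftest that deliberately triggers MP_FAIL, P(E_1) is close to 1, so the bound is driven by E_2 and E_3.

```latex
P(\text{deadlock}) \le P(E_1 \cap E_2 \cap E_3) \le \min_{i \in \{1,2,3\}} P(E_i),
\qquad
P(E_1 \cap E_2 \cap E_3) = \prod_{i=1}^{3} P(E_i) \ \text{if the } E_i \text{ are independent.}
```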
> 
> Thank you for your reply!
> 
> [1] is expected in this selftest and I can understand [3], but not [2].
> Could it be an issue with the memory size allocated per VM on the new
> NIPA LF machines?

Allocations can always fail, and this one uses GFP_ATOMIC, so failure is
even more likely (note that the allocation also includes __GFP_NOWARN, so
a failure does not print any warning).

Possibly the NIPA VMs are provisioned with a limited amount of memory
(I don't know).
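To illustrate why [2] opens the window, here is a hypothetical userspace model, not kernel code: model_alloc_atomic() stands in for an atomic, no-warn skb allocation and the model_fail_next_alloc flag simulates memory pressure. When the allocation fails, the caller returns quietly and the infinite mapping stays pending until a later transmit, which is the interval in which a reinjection could hit the double lock.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fault-injection knob simulating memory pressure. */
static bool model_fail_next_alloc;

/* Stand-in for an atomic, no-warn allocation: may fail, never complains. */
static void *model_alloc_atomic(size_t size)
{
	if (model_fail_next_alloc) {
		model_fail_next_alloc = false;
		return NULL; /* silent failure, as with __GFP_NOWARN */
	}
	return malloc(size);
}

struct model_conn {
	bool infinite_map_pending; /* set once MP_FAIL is received */
};

/* Try to send the ack carrying the infinite mapping; true on success. */
static bool model_send_infinite_map_ack(struct model_conn *conn)
{
	void *skb = model_alloc_atomic(64);

	if (!skb)
		return false; /* still pending: a later transmit must carry it */
	conn->infinite_map_pending = false;
	free(skb); /* the model does not actually transmit anything */
	return true;
}
```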

/P


Thread overview: 7+ messages
2025-12-03 18:55 [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting Paolo Abeni
2025-12-04 10:20 ` MPTCP CI
2025-12-04 17:44 ` Matthieu Baerts
2025-12-04 18:27   ` Matthieu Baerts
2025-12-05  8:06   ` Paolo Abeni
2025-12-05 13:47     ` Matthieu Baerts
2025-12-05 13:53       ` Paolo Abeni [this message]
