public inbox for mptcp@lists.linux.dev
From: Matthieu Baerts <matttbe@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting
Date: Fri, 5 Dec 2025 14:47:52 +0100
Message-ID: <01a8fa17-6223-410c-921f-985d02ff880b@kernel.org>
In-Reply-To: <ffcda047-0f64-4348-a65b-d3c2d139e6e4@redhat.com>

Hi Paolo,

On 05/12/2025 09:06, Paolo Abeni wrote:
> On 12/4/25 6:44 PM, Matthieu Baerts wrote:
>> On 03/12/2025 19:55, Paolo Abeni wrote:
>>> Jakub reported an MPTCP deadlock at fallback time:
>>>
>>>  WARNING: possible recursive locking detected
>>>  6.18.0-rc7-virtme #1 Not tainted
>>>  --------------------------------------------
>>>  mptcp_connect/20858 is trying to acquire lock:
>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
>>>
>>>  but task is already holding lock:
>>>  ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>
>>>  other info that might help us debug this:
>>>   Possible unsafe locking scenario:
>>>
>>>         CPU0
>>>         ----
>>>    lock(&msk->fallback_lock);
>>>    lock(&msk->fallback_lock);
>>>
>>>   *** DEADLOCK ***
>>>
>>>   May be due to missing lock nesting notation
>>>
>>>  3 locks held by mptcp_connect/20858:
>>>   #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
>>>   #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
>>>   #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
>>>
>>>  stack backtrace:
>>>  CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
>>>  Hardware name: Bochs, BIOS Bochs 01/01/2011
>>>  Call Trace:
>>>   <TASK>
>>>   dump_stack_lvl+0x6f/0xa0
>>>   print_deadlock_bug.cold+0xc0/0xcd
>>>   validate_chain+0x2ff/0x5f0
>>>   __lock_acquire+0x34c/0x740
>>>   lock_acquire.part.0+0xbc/0x260
>>>   _raw_spin_lock_bh+0x38/0x50
>>>   __mptcp_try_fallback+0xd8/0x280
>>>   mptcp_sendmsg_frag+0x16c2/0x3050
>>>   __mptcp_retrans+0x421/0xaa0
>>>   mptcp_release_cb+0x5aa/0xa70
>>>   release_sock+0xab/0x1d0
>>>   mptcp_sendmsg+0xd5b/0x1bc0
>>>   sock_write_iter+0x281/0x4d0
>>>   new_sync_write+0x3c5/0x6f0
>>>   vfs_write+0x65e/0xbb0
>>>   ksys_write+0x17e/0x200
>>>   do_syscall_64+0xbb/0xfd0
>>>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>  RIP: 0033:0x7fa5627cbc5e
>>>  Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
>>>  RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>  RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e
>>>  RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005
>>>  RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000
>>>  R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920
>>>  R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c
>>>
>>> The packet scheduler could attempt a reinjection after receiving an
>>> MP_FAIL and before the infinite map has been transmitted, causing a
>>> deadlock, since MPTCP needs to do the reinjection atomically with
>>> respect to fallback.
>>>
>>> Address the issue by explicitly avoiding the reinjection in the
>>> critical scenario. Note that this is the only fallback critical
>>> section that could potentially send packets and hit the double lock.
>>
>> Thank you for the fix!
>>
>> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>
>> Out of curiosity: any idea why we only see it now, while the Fixes tag
>> is from July? :)
> 
> The deadlock is deterministic once the relevant preconditions are
> reached, but those preconditions are quite/very unlikely:
> 
> - the peer sends an MP_FAIL [1]
> - the ssk/pm/msk tries to send an ACK reply with the infinite mapping
> - allocation of that skb fails [2]
> - the scheduler kicks an MPTCP-level retransmission before any other
>   later transmission [3]
> 
> Each of [1], [2] and [3] is quite/very unlikely, and we need all of
> them to happen with suitable/strict timing.

Thank you for your reply!

[1] is expected in this selftest, and I can understand [3], but not [2].
Could it be an issue with the memory size allocated per VM on the new
NIPA LF machines?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


Thread overview: 7+ messages
2025-12-03 18:55 [PATCH mptcp-net] mptcp: avoid deadlock on fallback while reinjecting Paolo Abeni
2025-12-04 10:20 ` MPTCP CI
2025-12-04 17:44 ` Matthieu Baerts
2025-12-04 18:27   ` Matthieu Baerts
2025-12-05  8:06   ` Paolo Abeni
2025-12-05 13:47     ` Matthieu Baerts [this message]
2025-12-05 13:53       ` Paolo Abeni
