netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
       [not found] ` <CAH2r5mtoz+4RSDLJijNFD6dRiLTWKou8m3M2mTp2cy7oPsP=Qg@mail.gmail.com>
@ 2024-12-19  0:30   ` Steve French
  0 siblings, 0 replies; 2+ messages in thread
From: Steve French @ 2024-12-19  0:30 UTC (permalink / raw)
  To: Network Development; +Cc: Enzo Matsumiya

[-- Attachment #1: Type: text/plain, Size: 4394 bytes --]

Correcting email for network mailing list

---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Wed, Dec 18, 2024 at 6:28 PM
Subject: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
<linux-net@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: Enzo Matsumiya <ematsumiya@suse.de>


Adding fsdevel and networking in case any thoughts on this fix for
network/namespaces refcount issue (that causes rmmod UAF).

Any opinions on Enzo's proposed Fix?

---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Tue, Dec 17, 2024 at 9:24 PM
Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
To: CIFS <linux-cifs@vger.kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>, Enzo Matsumiya <ematsumiya@suse.de>


Enzo had an interesting patch, that seems to fix an important problem.

Here was his repro scenario:

     tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
//someserver/target1 /mnt/test
     tw:~ # ls /mnt/test
     abc  dir1  dir3  target1_file.txt  tsub
     tw:~ # iptables -A INPUT -s someserver -j DROP

Trigger reconnect and wait for 3*echo_interval:

     tw:~ # cat /mnt/test/target1_file.txt
     cat: /mnt/test/target1_file.txt: Host is down

Then umount and rmmod.  Note that rmmod might take several iterations
until it properly tears down everything, so make sure you see the "not
loaded" message before proceeding:

     tw:~ # umount /mnt/*; rmmod cifs
     umount: /mnt/az: not mounted.
     umount: /mnt/dfs: not mounted.
     umount: /mnt/local: not mounted.
     umount: /mnt/scratch: not mounted.
     rmmod: ERROR: Module cifs is in use
     ...
     tw:~ # rmmod cifs
     rmmod: ERROR: Module cifs is not currently loaded

Then kickoff the TCP internals:
     tw:~ # iptables -F

Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
later on.


Any thoughts on his patch?  See below (and attached)

    Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
    fixed a netns UAF by manually enabled socket refcounting
    (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).

    The reason the patch worked for that bug was because we now hold
    references to the netns (get_net_track() gets a ref internally)
    and they're properly released (internally, on __sk_destruct()),
    but only because sk->sk_net_refcnt was set.

    Problem:
    (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
    if init_net or other)

    Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
    only out of cifs scope, but also technically wrong -- it's set conditionally
    based on user (=1) vs kernel (=0) sockets.  And net/ implementations
    seem to base their user vs kernel space operations on it.

    e.g. upon TCP socket close, the TCP timers are not cleared because
    sk->sk_net_refcnt=1:
    (cf. commit 151c9c724d05 ("tcp: properly terminate timers for
kernel sockets"))

    net/ipv4/tcp.c:
        void tcp_close(struct sock *sk, long timeout)
        {
            lock_sock(sk);
            __tcp_close(sk, timeout);
            release_sock(sk);
            if (!sk->sk_net_refcnt)
                    inet_csk_clear_xmit_timers_sync(sk);
            sock_put(sk);
        }

    Which will throw a lockdep warning and then, as expected, deadlock on
    tcp_write_timer().

    A way to reproduce this is by running the reproducer from ef7134c7fc48
    and then 'rmmod cifs'.  A few seconds later, the deadlock/lockdep
    warning shows up.

    Fix:
    We shouldn't mess with socket internals ourselves, so do not set
    sk_net_refcnt manually.

    Also change __sock_create() to sock_create_kern() for explicitness.

    As for non-init_net network namespaces, we deal with it the best way
    we can -- hold an extra netns reference for server->ssocket and drop it
    when it's released.  This ensures that the netns still exists whenever
    we need to create/destroy server->ssocket, but is not directly tied to
    it.

    Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")


--
Thanks,

Steve


--
Thanks,

Steve


-- 
Thanks,

Steve

[-- Attachment #2: 0001-smb-client-fix-TCP-timers-deadlock-after-rmmod.patch --]
[-- Type: application/x-patch, Size: 5831 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
       [not found] <CAH2r5msqxcvHcbDt0x_eNpbdPxUhgFoOAPchZ16EBZeFhCdAKA@mail.gmail.com>
       [not found] ` <CAH2r5mtoz+4RSDLJijNFD6dRiLTWKou8m3M2mTp2cy7oPsP=Qg@mail.gmail.com>
@ 2024-12-19  8:41 ` Kuniyuki Iwashima
  1 sibling, 0 replies; 2+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-19  8:41 UTC (permalink / raw)
  To: smfrench; +Cc: ematsumiya, kuniyu, linux-cifs, netdev

Hi,

sorry for the late response.

From: Steve French <smfrench@gmail.com>
Date: Tue, 17 Dec 2024 21:24:12 -0600
> Enzo had an interesting patch, that seems to fix an important problem.
> 
> Here was his repro scenario:
> 
>      tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
> //someserver/target1 /mnt/test
>      tw:~ # ls /mnt/test
>      abc  dir1  dir3  target1_file.txt  tsub
>      tw:~ # iptables -A INPUT -s someserver -j DROP
> 
> Trigger reconnect and wait for 3*echo_interval:
> 
>      tw:~ # cat /mnt/test/target1_file.txt
>      cat: /mnt/test/target1_file.txt: Host is down
> 
> Then umount and rmmod.  Note that rmmod might take several iterations
> until it properly tears down everything, so make sure you see the "not
> loaded" message before proceeding:
> 
>      tw:~ # umount /mnt/*; rmmod cifs
>      umount: /mnt/az: not mounted.
>      umount: /mnt/dfs: not mounted.
>      umount: /mnt/local: not mounted.
>      umount: /mnt/scratch: not mounted.
>      rmmod: ERROR: Module cifs is in use
>      ...
>      tw:~ # rmmod cifs
>      rmmod: ERROR: Module cifs is not currently loaded
> 
> Then kickoff the TCP internals:
>      tw:~ # iptables -F
> 
> Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
> later on.

I tried the repro and confirmed it triggers null deref.

It happens in LOCKDEP internal, so for me it looks like a problem in
LOCKDEP rather than CIFS or TCP.

I think LOCKDEP should hold a module reference and prevent related
modules from being unloaded.

[  242.144225][    C0] ------------[ cut here ]------------
[  242.144562][    C0] DEBUG_LOCKS_WARN_ON(1)
[  242.144575][    C0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:232 hlock_class+0xf9/0x130
[  242.145435][    C0] Modules linked in: [last unloaded: cifs]
[  242.145820][    C0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-01014-g835f27aa21c9 #25
[  242.146426][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  242.147197][    C0] RIP: 0010:hlock_class+0xf9/0x130
[  242.147529][    C0] Code: b6 14 11 38 d0 7c 04 84 d2 75 47 8b 05 54 a5 6b 04 85 c0 75 19 90 48 c7 c6 e0 e1 46 84 48 c7 c7 c0 dd 46 84 e8 88 6d eb ff 90 <0f> 0b 90 90 90 31 c0 5b c3 cc cc cc cc e8 05 91 4d 00 e9 19 ff ff
[  242.148779][    C0] RSP: 0018:ff1100006c009138 EFLAGS: 00010082
[  242.149168][    C0] RAX: 0000000000000000 RBX: 00000000000004e4 RCX: 0000000000000027
[  242.149673][    C0] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff1100006c1e8a08
[  242.150180][    C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[  242.150688][    C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[  242.151199][    C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[  242.151706][    C0] FS:  0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[  242.152275][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  242.152693][    C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[  242.153197][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  242.153698][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  242.154200][    C0] PKRU: 55555554
[  242.154426][    C0] Call Trace:
[  242.154636][    C0]  <IRQ>
[  242.154817][    C0]  ? __warn+0xcc/0x2e0
[  242.155086][    C0]  ? hlock_class+0xf9/0x130
[  242.155378][    C0]  ? report_bug+0x28c/0x2d0
[  242.155678][    C0]  ? handle_bug+0x54/0xa0
[  242.155954][    C0]  ? exc_invalid_op+0x18/0x50
[  242.156250][    C0]  ? asm_exc_invalid_op+0x1a/0x20
[  242.156579][    C0]  ? hlock_class+0xf9/0x130
[  242.156867][    C0]  ? hlock_class+0xf8/0x130
[  242.157154][    C0]  __lock_acquire+0x4d8/0x3c30
[  242.157459][    C0]  ? find_held_lock+0x33/0x120
[  242.157766][    C0]  ? __pfx___lock_acquire+0x10/0x10
[  242.158095][    C0]  ? __pfx_lock_release+0x10/0x10
[  242.158414][    C0]  lock_acquire+0x196/0x520
[  242.158699][    C0]  ? tcp_v4_rcv+0x2695/0x3b70
[  242.158999][    C0]  ? __pfx_lock_acquire+0x10/0x10
[  242.159325][    C0]  ? __pfx_tcp_inbound_hash+0x10/0x10
[  242.159673][    C0]  ? __pfx_sk_filter_trim_cap+0x10/0x10
[  242.160039][    C0]  ? kasan_quarantine_put+0x84/0x1d0
[  242.160382][    C0]  _raw_spin_lock_nested+0x2e/0x70
[  242.160707][    C0]  ? tcp_v4_rcv+0x2695/0x3b70
[  242.161004][    C0]  tcp_v4_rcv+0x2695/0x3b70
[  242.161292][    C0]  ? __pfx_tcp_v4_rcv+0x10/0x10
[  242.161598][    C0]  ? ip_local_deliver_finish+0x204/0x490
[  242.161960][    C0]  ? __pfx_raw_local_deliver+0x10/0x10
[  242.162308][    C0]  ? hlock_class+0x4e/0x130
[  242.162598][    C0]  ? lock_release+0x5d5/0xb00
[  242.162898][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.163223][    C0]  ip_protocol_deliver_rcu+0x8e/0x330
[  242.163568][    C0]  ip_local_deliver_finish+0x2bd/0x490
[  242.163922][    C0]  ip_local_deliver+0x198/0x450
[  242.164236][    C0]  ? __pfx_ip_local_deliver+0x10/0x10
[  242.164579][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.164896][    C0]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[  242.165273][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.165594][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.165921][    C0]  ip_sublist_rcv_finish+0x1e7/0x410
[  242.166260][    C0]  ip_sublist_rcv+0x380/0x760
[  242.166564][    C0]  ? __pfx_ip_sublist_rcv+0x10/0x10
[  242.166898][    C0]  ? hlock_class+0x4e/0x130
[  242.167187][    C0]  ? __lock_acquire+0x15e6/0x3c30
[  242.167506][    C0]  ? ip_fast_csum+0xc/0x20
[  242.167789][    C0]  ? ip_rcv_core+0x5dd/0xd30
[  242.168084][    C0]  ip_list_rcv+0x26e/0x380
[  242.168367][    C0]  ? __pfx_ip_list_rcv+0x10/0x10
[  242.168684][    C0]  __netif_receive_skb_list_core+0x5c2/0x830
[  242.169063][    C0]  ? __pfx___netif_receive_skb_list_core+0x10/0x10
[  242.169470][    C0]  ? lockdep_hardirqs_on_prepare+0x12b/0x3f0
[  242.169844][    C0]  ? kvm_clock_get_cycles+0x18/0x30
[  242.170174][    C0]  ? ktime_get_with_offset+0xe3/0x240
[  242.170520][    C0]  netif_receive_skb_list_internal+0x595/0xc70
[  242.170905][    C0]  ? __pfx_netif_receive_skb_list_internal+0x10/0x10
[  242.171321][    C0]  ? __kasan_mempool_unpoison_object+0x11e/0x1e0
[  242.171721][    C0]  ? napi_gro_receive+0x792/0x9d0
[  242.172040][    C0]  napi_complete_done+0x19d/0x700
[  242.172359][    C0]  ? __pfx_napi_complete_done+0x10/0x10
[  242.172709][    C0]  ? __pfx_e1000_clean_rx_irq+0x10/0x10
[  242.173061][    C0]  e1000_clean+0x85f/0x23e0
[  242.173347][    C0]  ? hlock_class+0x4e/0x130
[  242.173636][    C0]  ? __pfx_e1000_clean+0x10/0x10
[  242.173953][    C0]  __napi_poll.constprop.0+0x9b/0x430
[  242.174296][    C0]  net_rx_action+0x8d4/0xc90
[  242.174594][    C0]  ? __pfx_net_rx_action+0x10/0x10
[  242.174919][    C0]  ? handle_irq_event+0x10d/0x1c0
[  242.175246][    C0]  ? find_held_lock+0x33/0x120
[  242.175556][    C0]  ? hlock_class+0x4e/0x130
[  242.175855][    C0]  ? lock_release+0x5d5/0xb00
[  242.176157][    C0]  ? __pfx_lock_release+0x10/0x10
[  242.176478][    C0]  ? kvm_guest_apic_eoi_write+0x1e/0x40
[  242.176831][    C0]  handle_softirqs+0x1ae/0x770
[  242.177137][    C0]  irq_exit_rcu+0x94/0xc0
[  242.177413][    C0]  common_interrupt+0x7e/0x90
[  242.177716][    C0]  </IRQ>
[  242.177904][    C0]  <TASK>
[  242.178093][    C0]  asm_common_interrupt+0x26/0x40
[  242.178415][    C0] RIP: 0010:default_idle+0xf/0x20
[  242.178738][    C0] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 df 32 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[  242.179958][    C0] RSP: 0018:ffffffff85007e30 EFLAGS: 00000206
[  242.180341][    C0] RAX: 000000000007b609 RBX: 0000000000000000 RCX: ffffffff840f7d25
[  242.180849][    C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8125f83d
[  242.181357][    C0] RBP: dffffc0000000000 R08: 0000000000000001 R09: ffe21c000d83ecb8
[  242.181864][    C0] R10: ff1100006c1f65c3 R11: 0000000000000000 R12: ffffffff8595b388
[  242.182371][    C0] R13: 1ffffffff0a00fcb R14: 0000000000000000 R15: 0000000000000000
[  242.182877][    C0]  ? ct_kernel_exit.constprop.0+0xc5/0xf0
[  242.183240][    C0]  ? do_idle+0x2fd/0x3b0
[  242.183508][    C0]  default_idle_call+0x6d/0xb0
[  242.183811][    C0]  do_idle+0x2fd/0x3b0
[  242.184072][    C0]  ? __pfx_do_idle+0x10/0x10
[  242.184368][    C0]  cpu_startup_entry+0x4f/0x60
[  242.184673][    C0]  rest_init+0x139/0x200
[  242.184945][    C0]  ? acpi_subsystem_init+0x50/0x140
[  242.185280][    C0]  start_kernel+0x374/0x3f0
[  242.185573][    C0]  x86_64_start_reservations+0x18/0x30
[  242.185925][    C0]  x86_64_start_kernel+0xcf/0xe0
[  242.186236][    C0]  common_startup_64+0x12c/0x138
[  242.186551][    C0]  </TASK>
[  242.186740][    C0] irq event stamp: 505362
[  242.187010][    C0] hardirqs last  enabled at (505362): [<ffffffff81171ea1>] __local_bh_enable_ip+0xa1/0x110
[  242.187632][    C0] hardirqs last disabled at (505361): [<ffffffff81171eca>] __local_bh_enable_ip+0xca/0x110
[  242.188252][    C0] softirqs last  enabled at (505348): [<ffffffff81171a8c>] handle_softirqs+0x50c/0x770
[  242.188851][    C0] softirqs last disabled at (505355): [<ffffffff81172174>] irq_exit_rcu+0x94/0xc0
[  242.189424][    C0] ---[ end trace 0000000000000000 ]---
[  242.189778][    C0] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000018: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  242.190575][    C0] KASAN: null-ptr-deref in range [0x00000000000000c0-0x00000000000000c7]
[  242.191097][    C0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.12.0-rc4-01014-g835f27aa21c9 #25
[  242.191782][    C0] Tainted: [W]=WARN
[  242.192023][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  242.192788][    C0] RIP: 0010:__lock_acquire+0x4f0/0x3c30
[  242.193137][    C0] Code: 84 24 40 01 00 00 4c 89 f7 41 89 46 34 e8 e8 27 ff ff 48 ba 00 00 00 00 00 fc ff df 48 8d b8 c4 00 00 00 48 89 f9 48 c1 e9 03 <0f> b6 14 11 48 89 f9 83 e1 07 38 ca 7f 08 84 d2 0f 85 b6 29 00 00
[  242.194860][    C0] RSP: 0018:ff1100006c009148 EFLAGS: 00010007
[  242.195251][    C0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
[  242.195758][    C0] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 00000000000000c4
[  242.196253][    C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[  242.196750][    C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[  242.197245][    C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[  242.197742][    C0] FS:  0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[  242.198301][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  242.198715][    C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[  242.199212][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  242.199703][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  242.200200][    C0] PKRU: 55555554
[  242.200424][    C0] Call Trace:
[  242.200635][    C0]  <IRQ>
[  242.200816][    C0]  ? die_addr+0x3c/0xa0
[  242.201084][    C0]  ? exc_general_protection+0x14d/0x230
[  242.201433][    C0]  ? asm_exc_general_protection+0x26/0x30
[  242.201793][    C0]  ? __lock_acquire+0x4f0/0x3c30
[  242.202103][    C0]  ? __lock_acquire+0x4d8/0x3c30
[  242.202414][    C0]  ? find_held_lock+0x33/0x120
[  242.202715][    C0]  ? __pfx___lock_acquire+0x10/0x10
[  242.203040][    C0]  ? __pfx_lock_release+0x10/0x10
[  242.203355][    C0]  lock_acquire+0x196/0x520
[  242.203639][    C0]  ? tcp_v4_rcv+0x2695/0x3b70
[  242.203934][    C0]  ? __pfx_lock_acquire+0x10/0x10
[  242.204246][    C0]  ? __pfx_tcp_inbound_hash+0x10/0x10
[  242.204577][    C0]  ? __pfx_sk_filter_trim_cap+0x10/0x10
[  242.204923][    C0]  ? kasan_quarantine_put+0x84/0x1d0
[  242.205257][    C0]  _raw_spin_lock_nested+0x2e/0x70
[  242.205586][    C0]  ? tcp_v4_rcv+0x2695/0x3b70
[  242.205883][    C0]  tcp_v4_rcv+0x2695/0x3b70
[  242.206170][    C0]  ? __pfx_tcp_v4_rcv+0x10/0x10
[  242.206475][    C0]  ? ip_local_deliver_finish+0x204/0x490
[  242.206828][    C0]  ? __pfx_raw_local_deliver+0x10/0x10
[  242.207166][    C0]  ? hlock_class+0x4e/0x130
[  242.207446][    C0]  ? lock_release+0x5d5/0xb00
[  242.207736][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.208047][    C0]  ip_protocol_deliver_rcu+0x8e/0x330
[  242.208380][    C0]  ip_local_deliver_finish+0x2bd/0x490
[  242.208719][    C0]  ip_local_deliver+0x198/0x450
[  242.209021][    C0]  ? __pfx_ip_local_deliver+0x10/0x10
[  242.209351][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.209661][    C0]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[  242.210030][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.210342][    C0]  ? lock_is_held_type+0xa0/0x120
[  242.210655][    C0]  ip_sublist_rcv_finish+0x1e7/0x410
[  242.210982][    C0]  ip_sublist_rcv+0x380/0x760
[  242.211274][    C0]  ? __pfx_ip_sublist_rcv+0x10/0x10
[  242.211594][    C0]  ? hlock_class+0x4e/0x130
[  242.211873][    C0]  ? __lock_acquire+0x15e6/0x3c30
[  242.212186][    C0]  ? ip_fast_csum+0xc/0x20
[  242.212460][    C0]  ? ip_rcv_core+0x5dd/0xd30
[  242.212746][    C0]  ip_list_rcv+0x26e/0x380
[  242.213021][    C0]  ? __pfx_ip_list_rcv+0x10/0x10
[  242.213328][    C0]  __netif_receive_skb_list_core+0x5c2/0x830
[  242.213699][    C0]  ? __pfx___netif_receive_skb_list_core+0x10/0x10
[  242.214103][    C0]  ? lockdep_hardirqs_on_prepare+0x12b/0x3f0
[  242.214477][    C0]  ? kvm_clock_get_cycles+0x18/0x30
[  242.214807][    C0]  ? ktime_get_with_offset+0xe3/0x240
[  242.215139][    C0]  netif_receive_skb_list_internal+0x595/0xc70
[  242.215520][    C0]  ? __pfx_netif_receive_skb_list_internal+0x10/0x10
[  242.215934][    C0]  ? __kasan_mempool_unpoison_object+0x11e/0x1e0
[  242.216324][    C0]  ? napi_gro_receive+0x792/0x9d0
[  242.216633][    C0]  napi_complete_done+0x19d/0x700
[  242.216941][    C0]  ? __pfx_napi_complete_done+0x10/0x10
[  242.217280][    C0]  ? __pfx_e1000_clean_rx_irq+0x10/0x10
[  242.217618][    C0]  e1000_clean+0x85f/0x23e0
[  242.217897][    C0]  ? hlock_class+0x4e/0x130
[  242.218173][    C0]  ? __pfx_e1000_clean+0x10/0x10
[  242.218477][    C0]  __napi_poll.constprop.0+0x9b/0x430
[  242.218807][    C0]  net_rx_action+0x8d4/0xc90
[  242.219092][    C0]  ? __pfx_net_rx_action+0x10/0x10
[  242.219406][    C0]  ? handle_irq_event+0x10d/0x1c0
[  242.219719][    C0]  ? find_held_lock+0x33/0x120
[  242.220014][    C0]  ? hlock_class+0x4e/0x130
[  242.220293][    C0]  ? lock_release+0x5d5/0xb00
[  242.220581][    C0]  ? __pfx_lock_release+0x10/0x10
[  242.220887][    C0]  ? kvm_guest_apic_eoi_write+0x1e/0x40
[  242.221227][    C0]  handle_softirqs+0x1ae/0x770
[  242.221527][    C0]  irq_exit_rcu+0x94/0xc0
[  242.221793][    C0]  common_interrupt+0x7e/0x90
[  242.222081][    C0]  </IRQ>
[  242.222262][    C0]  <TASK>
[  242.222443][    C0]  asm_common_interrupt+0x26/0x40
[  242.222754][    C0] RIP: 0010:default_idle+0xf/0x20
[  242.223063][    C0] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 df 32 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[  242.224245][    C0] RSP: 0018:ffffffff85007e30 EFLAGS: 00000206
[  242.224619][    C0] RAX: 000000000007b609 RBX: 0000000000000000 RCX: ffffffff840f7d25
[  242.225113][    C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8125f83d
[  242.225596][    C0] RBP: dffffc0000000000 R08: 0000000000000001 R09: ffe21c000d83ecb8
[  242.226077][    C0] R10: ff1100006c1f65c3 R11: 0000000000000000 R12: ffffffff8595b388
[  242.226560][    C0] R13: 1ffffffff0a00fcb R14: 0000000000000000 R15: 0000000000000000
[  242.227044][    C0]  ? ct_kernel_exit.constprop.0+0xc5/0xf0
[  242.227393][    C0]  ? do_idle+0x2fd/0x3b0
[  242.227655][    C0]  default_idle_call+0x6d/0xb0
[  242.227948][    C0]  do_idle+0x2fd/0x3b0
[  242.228201][    C0]  ? __pfx_do_idle+0x10/0x10
[  242.228486][    C0]  cpu_startup_entry+0x4f/0x60
[  242.228781][    C0]  rest_init+0x139/0x200
[  242.229043][    C0]  ? acpi_subsystem_init+0x50/0x140
[  242.229361][    C0]  start_kernel+0x374/0x3f0
[  242.229637][    C0]  x86_64_start_reservations+0x18/0x30
[  242.229969][    C0]  x86_64_start_kernel+0xcf/0xe0
[  242.230268][    C0]  common_startup_64+0x12c/0x138
[  242.230570][    C0]  </TASK>
[  242.230757][    C0] Modules linked in: [last unloaded: cifs]
[  242.231113][    C0] ---[ end trace 0000000000000000 ]---
[  242.231443][    C0] RIP: 0010:__lock_acquire+0x4f0/0x3c30
[  242.231778][    C0] Code: 84 24 40 01 00 00 4c 89 f7 41 89 46 34 e8 e8 27 ff ff 48 ba 00 00 00 00 00 fc ff df 48 8d b8 c4 00 00 00 48 89 f9 48 c1 e9 03 <0f> b6 14 11 48 89 f9 83 e1 07 38 ca 7f 08 84 d2 0f 85 b6 29 00 00
[  242.232952][    C0] RSP: 0018:ff1100006c009148 EFLAGS: 00010007
[  242.233322][    C0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
[  242.233800][    C0] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 00000000000000c4
[  242.234277][    C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[  242.234756][    C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[  242.235239][    C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[  242.235721][    C0] FS:  0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[  242.236257][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  242.236655][    C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[  242.237140][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  242.237621][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  242.238101][    C0] PKRU: 55555554
[  242.238317][    C0] Kernel panic - not syncing: Fatal exception in interrupt
[  242.239006][    C0] Kernel Offset: disabled
[  242.239320][    C0] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---



> 
> 
> Any thoughts on his patch?  See below (and attached)
> 
>     Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
>     fixed a netns UAF by manually enabled socket refcounting
>     (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
> 
>     The reason the patch worked for that bug was because we now hold
>     references to the netns (get_net_track() gets a ref internally)
>     and they're properly released (internally, on __sk_destruct()),
>     but only because sk->sk_net_refcnt was set.
> 
>     Problem:
>     (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
>     if init_net or other)
> 
>     Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
>     only out of cifs scope, but also technically wrong -- it's set conditionally
>     based on user (=1) vs kernel (=0) sockets.  And net/ implementations
>     seem to base their user vs kernel space operations on it.
> 
>     e.g. upon TCP socket close, the TCP timers are not cleared because
>     sk->sk_net_refcnt=1:
>     (cf. commit 151c9c724d05 ("tcp: properly terminate timers for
> kernel sockets"))
> 
>     net/ipv4/tcp.c:
>         void tcp_close(struct sock *sk, long timeout)
>         {
>             lock_sock(sk);
>             __tcp_close(sk, timeout);
>             release_sock(sk);
>             if (!sk->sk_net_refcnt)
>                     inet_csk_clear_xmit_timers_sync(sk);
>             sock_put(sk);
>         }
> 
>     Which will throw a lockdep warning and then, as expected, deadlock on

I wondered how the deadlock happens when I read this, but it seems this
'deadlock' means literally 'dead' (freed) lock, not something like ABBA
deadlock.


>     tcp_write_timer().
> 
>     A way to reproduce this is by running the reproducer from ef7134c7fc48
>     and then 'rmmod cifs'.  A few seconds later, the deadlock/lockdep
>     warning shows up.
> 
>     Fix:
>     We shouldn't mess with socket internals ourselves, so do not set
>     sk_net_refcnt manually.
> 
>     Also change __sock_create() to sock_create_kern() for explicitness.
> 
>     As for non-init_net network namespaces, we deal with it the best way
>     we can -- hold an extra netns reference for server->ssocket and drop it
>     when it's released.  This ensures that the netns still exists whenever
>     we need to create/destroy server->ssocket, but is not directly tied to
>     it.
> 
>     Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
> 

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-12-19  8:41 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAH2r5msqxcvHcbDt0x_eNpbdPxUhgFoOAPchZ16EBZeFhCdAKA@mail.gmail.com>
     [not found] ` <CAH2r5mtoz+4RSDLJijNFD6dRiLTWKou8m3M2mTp2cy7oPsP=Qg@mail.gmail.com>
2024-12-19  0:30   ` Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod Steve French
2024-12-19  8:41 ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).