netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Simon Horman <horms@kernel.org>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steve French <sfrench@samba.org>,
	Enzo Matsumiya <ematsumiya@suse.de>,
	Wang Zhaolong <wangzhaolong1@huawei.com>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	netdev@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2 net] net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.
Date: Mon, 7 Apr 2025 11:45:59 +0100	[thread overview]
Message-ID: <20250407104559.GB395307@horms.kernel.org> (raw)
In-Reply-To: <20250403203619.36648-1-kuniyu@amazon.com>

On Thu, Apr 03, 2025 at 01:34:31PM -0700, Kuniyuki Iwashima wrote:
> When I ran the repro [0] and waited a few seconds, I observed two
> LOCKDEP splats: a warning immediately followed by a null-ptr-deref. [1]
> 
> Reproduction Steps:
> 
>   1) Mount CIFS
>   2) Add an iptables rule to drop incoming FIN packets for CIFS
>   3) Unmount CIFS
>   4) Unload the CIFS module
>   5) Remove the iptables rule
> 
> At step 3), the CIFS module calls sock_release() for the underlying
> TCP socket, and it returns quickly.  However, the socket remains in
> FIN_WAIT_1 because incoming FIN packets are dropped.
> 
> At this point, the module's refcnt is 0 while the socket is still
> alive, so the following rmmod command succeeds.
> 
>   # ss -tan
>   State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
>   FIN-WAIT-1 0      477        10.0.2.15:51062   10.0.0.137:445
> 
>   # lsmod | grep cifs
>   cifs                 1159168  0
> 
> This highlights a discrepancy between the lifetime of the CIFS module
> and the underlying TCP socket.  Even after CIFS calls sock_release()
> and it returns, the TCP socket does not die immediately in order to
> close the connection gracefully.
> 
> While this is generally fine, it causes an issue with LOCKDEP because
> CIFS assigns a different lock class to the TCP socket's sk->sk_lock
> using sock_lock_init_class_and_name().
> 
> Once an incoming packet is processed for the socket or a timer fires,
> sk->sk_lock is acquired.
> 
> Then, LOCKDEP checks the lock context in check_wait_context(), where
> hlock_class() is called to retrieve the lock class.  However, since
> the module has already been unloaded, hlock_class() logs a warning
> and returns NULL, triggering the null-ptr-deref.
> 
> If LOCKDEP is enabled, we must ensure that a module calling
> sock_lock_init_class_and_name() (CIFS, NFS, etc) cannot be unloaded
> while such a socket is still alive to prevent this issue.
> 
> Let's hold the module reference in sock_lock_init_class_and_name()
> and release it when the socket is freed in __sk_destruct().
> 
> Note that sock_lock_init() clears sk->sk_owner for svc_create_socket()
> that calls sock_lock_init_class_and_name() for a listening socket,
> which clones a socket by sk_clone_lock() without GFP_ZERO.
> 
> [0]:
> CIFS_SERVER="10.0.0.137"
> CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST"
> DEV="enp0s3"
> CRED="/root/WindowsCredential.txt"
> 
> MNT=$(mktemp -d /tmp/XXXXXX)
> mount -t cifs ${CIFS_PATH} ${MNT} -o vers=3.0,credentials=${CRED},cache=none,echo_interval=1
> 
> iptables -A INPUT -s ${CIFS_SERVER} -j DROP
> 
> for i in $(seq 10);
> do
>     umount ${MNT}
>     rmmod cifs
>     sleep 1
> done
> 
> rm -r ${MNT}
> 
> iptables -D INPUT -s ${CIFS_SERVER} -j DROP
> 
> [1]:
> DEBUG_LOCKS_WARN_ON(1)
> WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
> CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
> ...
> Call Trace:
>  <IRQ>
>  __lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178)
>  lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
>  _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
>  tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
> ...
> 
> BUG: kernel NULL pointer dereference, address: 00000000000000c4
>  PF: supervisor read access in kernel mode
>  PF: error_code(0x0000) - not-present page
> PGD 0
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G        W          6.14.0 #36
> Tainted: [W]=WARN
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178)
> Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44
> RSP: 0018:ffa0000000468a10 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
> RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88
> RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
> R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
> R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
> FS:  0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <IRQ>
>  lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
>  _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
>  tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
>  ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
>  ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234)
>  ip_sublist_rcv_finish (net/ipv4/ip_input.c:576)
>  ip_list_rcv_finish (net/ipv4/ip_input.c:628)
>  ip_list_rcv (net/ipv4/ip_input.c:670)
>  __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986)
>  netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129)
>  napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496)
>  e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815)
>  __napi_poll.constprop.0 (net/core/dev.c:7191)
>  net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382)
>  handle_softirqs (kernel/softirq.c:561)
>  __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662)
>  irq_exit_rcu (kernel/softirq.c:680)
>  common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14))
>   </IRQ>
>  <TASK>
>  asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
> RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744)
> Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
> RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202
> RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5
> RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
>  default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
>  do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
>  cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1))
>  start_secondary (arch/x86/kernel/smpboot.c:315)
>  common_startup_64 (arch/x86/kernel/head_64.S:421)
>  </TASK>
> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
> CR2: 00000000000000c4
> 
> Fixes: ed07536ed673 ("[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: stable@vger.kernel.org
> ---
> v2:
>   * Clear sk_owner in sock_lock_init()
>   * Define helper under the same #if guard
>   * Remove redundant null check before module_put()
> 
> v1: https://lore.kernel.org/netdev/20250403020837.51664-1-kuniyu@amazon.com/
> ---
>  include/net/sock.h | 38 ++++++++++++++++++++++++++++++++++++--
>  net/core/sock.c    |  4 ++++
>  2 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8daf1b3b12c6..4216d7d86150 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -547,6 +547,10 @@ struct sock {
>  	struct rcu_head		sk_rcu;
>  	netns_tracker		ns_tracker;
>  	struct xarray		sk_user_frags;
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> +	struct module		*sk_owner;
> +#endif

Not a proper review, but FWIIW, sk_owner should be added to the Kernel doc
for struct sock.

>  };
>  
>  struct sock_bh_locked {

...

  reply	other threads:[~2025-04-07 10:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-03 20:34 [PATCH v2 net] net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod Kuniyuki Iwashima
2025-04-07 10:45 ` Simon Horman [this message]
2025-04-07 15:46   ` Kuniyuki Iwashima
2025-04-07 11:21 ` Eric Dumazet
2025-04-07 15:52   ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250407104559.GB395307@horms.kernel.org \
    --to=horms@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=ematsumiya@suse.de \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=kuniyu@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sfrench@samba.org \
    --cc=stable@vger.kernel.org \
    --cc=wangzhaolong1@huawei.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).