* kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-25 20:44 Salvatore Bonaccorso
2025-01-25 22:55 ` Jeff Layton
0 siblings, 1 reply; 5+ messages in thread
From: Salvatore Bonaccorso @ 2025-01-25 20:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
Tom Talpey
Cc: linux-nfs, linux-kernel
Hi Chuck, Jeff, NFSD maintainers,
In Debian we got a report from a user who triggered an issue during
package updates where a restart of nfs-kernel-server was involved, which
then hung; the report included a kernel trace of a NULL pointer dereference.
The full report is at:
https://bugs.debian.org/1093734
While I was not able to trigger the issue, the provided log is as
follows:
2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
2025-01-21T12:07:18.015563+01:00 $HOST kernel: #PF: supervisor read access in kernel mode
2025-01-21T12:07:18.015566+01:00 $HOST kernel: #PF: error_code(0x0000) - not-present page
2025-01-21T12:07:18.015567+01:00 $HOST kernel: PGD 14b3d9067 P4D 14b3d9067 PUD 14b3da067 PMD 0
2025-01-21T12:07:18.015568+01:00 $HOST kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
2025-01-21T12:07:18.015569+01:00 $HOST kernel: CPU: 8 UID: 0 PID: 231280 Comm: kworker/u67:2 Tainted: G W 6.12.9-amd64 #1 Debian 6.12.9-1
2025-01-21T12:07:18.015570+01:00 $HOST kernel: Tainted: [W]=WARN
2025-01-21T12:07:18.015572+01:00 $HOST kernel: Hardware name: Supermicro AS -2014S-TR/H12SSL-i, BIOS 2.9 05/28/2024
2025-01-21T12:07:18.015573+01:00 $HOST kernel: Workqueue: events_unbound nfsd_file_gc_worker [nfsd]
2025-01-21T12:07:18.015573+01:00 $HOST kernel: RIP: 0010:svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015574+01:00 $HOST kernel: Code: e1 bd ea 0f 0b e9 73 ff ff ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b bf 90 00 00 00 f0 80 8f b8 00 00 00 01 e9 63 aa fe ff 0f 1f
2025-01-21T12:07:18.015575+01:00 $HOST kernel: RSP: 0018:ffffa9b9690abde8 EFLAGS: 00010286
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RAX: 0000000000000001 RBX: ffff9d03f84f6c58 RCX: ffffa9b9690abe30
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RDX: ffff9d034a5aa2a8 RSI: ffff9d034a5aa2a8 RDI: 0000000000000000
2025-01-21T12:07:18.015577+01:00 $HOST kernel: RBP: ffff9d034a5aa2a0 R08: ffff9d034a5aa2a8 R09: ffffa9b9690abe28
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R10: ffff9d0451cff780 R11: 000000000000000f R12: ffffa9b9690abe30
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R13: ffff9d034a5aa2a8 R14: ffff9d035451a000 R15: ffff9d034a5aa2a8
2025-01-21T12:07:18.015579+01:00 $HOST kernel: FS: 0000000000000000(0000) GS:ffff9d228ec00000(0000) knlGS:0000000000000000
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CR2: 0000000000000090 CR3: 0000000106e24003 CR4: 0000000000f70ef0
2025-01-21T12:07:18.015581+01:00 $HOST kernel: PKRU: 55555554
2025-01-21T12:07:18.015582+01:00 $HOST kernel: Call Trace:
2025-01-21T12:07:18.015582+01:00 $HOST kernel: <TASK>
2025-01-21T12:07:18.015583+01:00 $HOST kernel: ? __die_body.cold+0x19/0x27
2025-01-21T12:07:18.015584+01:00 $HOST kernel: ? page_fault_oops+0x15a/0x2d0
2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? exc_page_fault+0x7e/0x180
2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? asm_exc_page_fault+0x26/0x30
2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? srso_alias_return_thunk+0x5/0xfbef5
2025-01-21T12:07:18.015587+01:00 $HOST kernel: nfsd_file_dispose_list_delayed+0xa7/0xd0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel: nfsd_file_gc_worker+0x190/0x2c0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel: process_one_work+0x177/0x330
2025-01-21T12:07:18.015589+01:00 $HOST kernel: worker_thread+0x252/0x390
2025-01-21T12:07:18.015590+01:00 $HOST kernel: ? __pfx_worker_thread+0x10/0x10
2025-01-21T12:07:18.015611+01:00 $HOST kernel: kthread+0xd2/0x100
2025-01-21T12:07:18.015612+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015613+01:00 $HOST kernel: ret_from_fork+0x34/0x50
2025-01-21T12:07:18.015615+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015616+01:00 $HOST kernel: ret_from_fork_asm+0x1a/0x30
2025-01-21T12:07:18.015618+01:00 $HOST kernel: </TASK>
2025-01-21T12:07:18.015619+01:00 $HOST kernel: Modules linked in: dm_mod tls cpufreq_conservative msr binfmt_misc quota_v2 quota_tree nls_ascii nls_cp437 vfat fat ipmi_ssif rpcrdma rdma_ucm ib_iser nf_conntrack_ftp nf_log_syslog ib_umad nft_log amd_atl intel_rapl_msr intel_rapl_common rdma_cm ib_ipoib amd64_edac iw_cm libiscsi edac_mce_amd nft_limit scsi_transport_iscsi ib_cm kvm_amd nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject kvm crct10dif_pclmul ghash_clmulni_intel nft_ct ast sha512_ssse3 sha256_ssse3 jc42 drm_shmem_helper sha1_ssse3 aesni_intel gf128mul crypto_simd drm_kms_helper cryptd wmi_bmof ee1004 rapl acpi_cpufreq pcspkr i2c_algo_bit ccp acpi_ipmi sp5100_tco k10temp watchdog button nft_masq ipmi_si ipmi_devintf ipmi_msghandler evdev joydev sg nfsd nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 auth_rpcgss nfs_acl lockd grace nf_tables sunrpc drm configfs efi_pstore nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 efivarfs raid10 raid0 hid_generic usbhid hid raid456 async_raid6_recov async_memcpy
2025-01-21T12:07:18.015622+01:00 $HOST kernel: async_pq async_xor async_tx xor rndis_host cdc_ether usbnet mii raid6_pq libcrc32c crc32c_generic mlx5_ib ib_uverbs ib_core raid1 md_mod ses enclosure scsi_transport_sas sd_mod mlx5_core ahci libahci xhci_pci libata xhci_hcd megaraid_sas tg3 crc32_pclmul scsi_mod crc32c_intel mlxfw usbcore libphy pci_hyperv_intf scsi_common i2c_piix4 i2c_smbus usb_common wmi
2025-01-21T12:07:18.015624+01:00 $HOST kernel: CR2: 0000000000000090
2025-01-21T12:07:18.015625+01:00 $HOST kernel: ---[ end trace 0000000000000000 ]---
The user's kernel is based on 6.12.9.
Does this ring a bell? Might 8e6e2ffa6569 ("nfsd: add list_head nf_gc
to struct nfsd_file") be related?
Regards,
Salvatore
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
2025-01-25 20:44 kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20 Salvatore Bonaccorso
@ 2025-01-25 22:55 ` Jeff Layton
2025-01-26 7:57 ` Salvatore Bonaccorso
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-01-25 22:55 UTC (permalink / raw)
To: Salvatore Bonaccorso, Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel

On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090

Thanks for the bug report. It's getting late here, so I can only take a
quick look. svc_wake_up is pretty small:

void svc_wake_up(struct svc_serv *serv)
{
        struct svc_pool *pool = &serv->sv_pools[0];

        set_bit(SP_TASK_PENDING, &pool->sp_flags);
        svc_pool_wake_idle_thread(pool);
}

pahole on my machine says that struct svc_serv has this at offset 0x90:

        struct svc_pool * sv_pools; /* 0x90 0x8 */

So it looks like nn->nfsd_serv was a NULL pointer. That only
happens when we shut down the server, so this looks like a race between
filecache garbage collection and shutdown.

The filecache gets shut down in nfsd_shutdown_net, which gets called
_after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
at whether we can reorder the NULL pointer setting to later, or work
around this some other way.

Could I trouble you to open a bug for this at bugzilla.kernel.org?
-- 
Jeff Layton <jlayton@kernel.org>
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
2025-01-25 22:55 ` Jeff Layton
@ 2025-01-26 7:57 ` Salvatore Bonaccorso
2025-01-26 12:06 ` Jeff Layton
0 siblings, 1 reply; 5+ messages in thread
From: Salvatore Bonaccorso @ 2025-01-26 7:57 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel

Hi Jeff,

On Sat, Jan 25, 2025 at 05:55:50PM -0500, Jeff Layton wrote:
> Could I trouble you to open a bug for this at bugzilla.kernel.org?

Thanks a lot for your quick response and the analysis.

Sure, I can file a bug in bugzilla.kernel.org. I see you submitted a
patch already; do you still want me to do it?

If so, I will try to reference all follow-ups as well, so that the
information is not spread around threads.

Thanks a lot for your work!

Regards,
Salvatore
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
2025-01-26 7:57 ` Salvatore Bonaccorso
@ 2025-01-26 12:06 ` Jeff Layton
2025-01-26 12:54 ` Salvatore Bonaccorso
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-01-26 12:06 UTC (permalink / raw)
To: Salvatore Bonaccorso
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel

On Sun, 2025-01-26 at 08:57 +0100, Salvatore Bonaccorso wrote:
> Sure, I can file a bug in bugzilla.kernel.org. I see you submitted a
> patch already; do you still want me to do it?

I think you can skip the BZ for now. Thanks again for the bug report!
-- 
Jeff Layton <jlayton@kernel.org>
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
2025-01-26 12:06 ` Jeff Layton
@ 2025-01-26 12:54 ` Salvatore Bonaccorso
0 siblings, 0 replies; 5+ messages in thread
From: Salvatore Bonaccorso @ 2025-01-26 12:54 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel

Hi Jeff,

On Sun, Jan 26, 2025 at 07:06:09AM -0500, Jeff Layton wrote:
> I think you can skip the BZ for now.

OK, then I will skip filing the bugzilla bug. Thanks again for your
hard work on the NFS front!

Regards,
Salvatore