* kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-25 20:44 Salvatore Bonaccorso
2025-01-25 22:55 ` Jeff Layton
0 siblings, 1 reply; 5+ messages in thread
From: Salvatore Bonaccorso @ 2025-01-25 20:44 UTC
To: Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
Tom Talpey
Cc: linux-nfs, linux-kernel
Hi Chuck, Jeff, NFSD maintainers,
In Debian we got a report from a user who hit an issue during package
updates where a restart of nfs-kernel-server was involved. The restart
then hung, and the report included a kernel trace of a NULL pointer
dereference.
The full report is at:
https://bugs.debian.org/1093734
While I was not able to trigger the issue, the provided log is as
follows:
2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
2025-01-21T12:07:18.015563+01:00 $HOST kernel: #PF: supervisor read access in kernel mode
2025-01-21T12:07:18.015566+01:00 $HOST kernel: #PF: error_code(0x0000) - not-present page
2025-01-21T12:07:18.015567+01:00 $HOST kernel: PGD 14b3d9067 P4D 14b3d9067 PUD 14b3da067 PMD 0
2025-01-21T12:07:18.015568+01:00 $HOST kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
2025-01-21T12:07:18.015569+01:00 $HOST kernel: CPU: 8 UID: 0 PID: 231280 Comm: kworker/u67:2 Tainted: G W 6.12.9-amd64 #1 Debian 6.12.9-1
2025-01-21T12:07:18.015570+01:00 $HOST kernel: Tainted: [W]=WARN
2025-01-21T12:07:18.015572+01:00 $HOST kernel: Hardware name: Supermicro AS -2014S-TR/H12SSL-i, BIOS 2.9 05/28/2024
2025-01-21T12:07:18.015573+01:00 $HOST kernel: Workqueue: events_unbound nfsd_file_gc_worker [nfsd]
2025-01-21T12:07:18.015573+01:00 $HOST kernel: RIP: 0010:svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015574+01:00 $HOST kernel: Code: e1 bd ea 0f 0b e9 73 ff ff ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b bf 90 00 00 00 f0 80 8f b8 00 00 00 01 e9 63 aa fe ff 0f 1f
2025-01-21T12:07:18.015575+01:00 $HOST kernel: RSP: 0018:ffffa9b9690abde8 EFLAGS: 00010286
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RAX: 0000000000000001 RBX: ffff9d03f84f6c58 RCX: ffffa9b9690abe30
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RDX: ffff9d034a5aa2a8 RSI: ffff9d034a5aa2a8 RDI: 0000000000000000
2025-01-21T12:07:18.015577+01:00 $HOST kernel: RBP: ffff9d034a5aa2a0 R08: ffff9d034a5aa2a8 R09: ffffa9b9690abe28
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R10: ffff9d0451cff780 R11: 000000000000000f R12: ffffa9b9690abe30
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R13: ffff9d034a5aa2a8 R14: ffff9d035451a000 R15: ffff9d034a5aa2a8
2025-01-21T12:07:18.015579+01:00 $HOST kernel: FS: 0000000000000000(0000) GS:ffff9d228ec00000(0000) knlGS:0000000000000000
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CR2: 0000000000000090 CR3: 0000000106e24003 CR4: 0000000000f70ef0
2025-01-21T12:07:18.015581+01:00 $HOST kernel: PKRU: 55555554
2025-01-21T12:07:18.015582+01:00 $HOST kernel: Call Trace:
2025-01-21T12:07:18.015582+01:00 $HOST kernel: <TASK>
2025-01-21T12:07:18.015583+01:00 $HOST kernel: ? __die_body.cold+0x19/0x27
2025-01-21T12:07:18.015584+01:00 $HOST kernel: ? page_fault_oops+0x15a/0x2d0
2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? exc_page_fault+0x7e/0x180
2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? asm_exc_page_fault+0x26/0x30
2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? srso_alias_return_thunk+0x5/0xfbef5
2025-01-21T12:07:18.015587+01:00 $HOST kernel: nfsd_file_dispose_list_delayed+0xa7/0xd0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel: nfsd_file_gc_worker+0x190/0x2c0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel: process_one_work+0x177/0x330
2025-01-21T12:07:18.015589+01:00 $HOST kernel: worker_thread+0x252/0x390
2025-01-21T12:07:18.015590+01:00 $HOST kernel: ? __pfx_worker_thread+0x10/0x10
2025-01-21T12:07:18.015611+01:00 $HOST kernel: kthread+0xd2/0x100
2025-01-21T12:07:18.015612+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015613+01:00 $HOST kernel: ret_from_fork+0x34/0x50
2025-01-21T12:07:18.015615+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015616+01:00 $HOST kernel: ret_from_fork_asm+0x1a/0x30
2025-01-21T12:07:18.015618+01:00 $HOST kernel: </TASK>
2025-01-21T12:07:18.015619+01:00 $HOST kernel: Modules linked in: dm_mod tls cpufreq_conservative msr binfmt_misc quota_v2 quota_tree nls_ascii nls_cp437 vfat fat ipmi_ssif rpcrdma rdma_ucm ib_iser nf_conntrack_ftp nf_log_syslog ib_umad nft_log amd_atl intel_rapl_msr intel_rapl_common rdma_cm ib_ipoib amd64_edac iw_cm libiscsi edac_mce_amd nft_limit scsi_transport_iscsi ib_cm kvm_amd nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject kvm crct10dif_pclmul ghash_clmulni_intel nft_ct ast sha512_ssse3 sha256_ssse3 jc42 drm_shmem_helper sha1_ssse3 aesni_intel gf128mul crypto_simd drm_kms_helper cryptd wmi_bmof ee1004 rapl acpi_cpufreq pcspkr i2c_algo_bit ccp acpi_ipmi sp5100_tco k10temp watchdog button nft_masq ipmi_si ipmi_devintf ipmi_msghandler evdev joydev sg nfsd nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 auth_rpcgss nfs_acl lockd grace nf_tables sunrpc drm configfs efi_pstore nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 efivarfs raid10 raid0 hid_generic usbhid hid raid456 async_raid6_recov async_memcpy
2025-01-21T12:07:18.015622+01:00 $HOST kernel: async_pq async_xor async_tx xor rndis_host cdc_ether usbnet mii raid6_pq libcrc32c crc32c_generic mlx5_ib ib_uverbs ib_core raid1 md_mod ses enclosure scsi_transport_sas sd_mod mlx5_core ahci libahci xhci_pci libata xhci_hcd megaraid_sas tg3 crc32_pclmul scsi_mod crc32c_intel mlxfw usbcore libphy pci_hyperv_intf scsi_common i2c_piix4 i2c_smbus usb_common wmi
2025-01-21T12:07:18.015624+01:00 $HOST kernel: CR2: 0000000000000090
2025-01-21T12:07:18.015625+01:00 $HOST kernel: ---[ end trace 0000000000000000 ]---
The user's kernel is based on 6.12.9.
Does this ring a bell? Might 8e6e2ffa6569 ("nfsd: add list_head nf_gc
to struct nfsd_file") be related?
Regards,
Salvatore
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
From: Jeff Layton @ 2025-01-25 22:55 UTC
To: Salvatore Bonaccorso, Chuck Lever, Neil Brown, Olga Kornievskaia,
Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel
On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
Thanks for the bug report. It's getting late here, so I can only take a
quick look. svc_wake_up is pretty small:
void svc_wake_up(struct svc_serv *serv)
{
	struct svc_pool *pool = &serv->sv_pools[0];

	set_bit(SP_TASK_PENDING, &pool->sp_flags);
	svc_pool_wake_idle_thread(pool);
}
pahole on my machine says that struct svc_serv has this at offset 0x90:
struct svc_pool * sv_pools; /* 0x90 0x8 */
So it looks like nn->nfsd_serv was a NULL pointer. That only
happens when we shut down the server, so this looks like a race between
filecache garbage collection and shutdown.
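To make the offset reasoning concrete, here is a small user-space sketch.
The mock_* types and sv_pools_load_address() are hypothetical stand-ins,
not the kernel's definitions; the padding exists only to place sv_pools at
offset 0x90, as pahole reported. In the oops, the bytes at the faulting
instruction (48 8b bf 90 00 00 00) appear to decode to
"mov 0x90(%rdi),%rdi", and RDI is 0 in the register dump, which is
consistent with a NULL serv pointer and CR2 = 0x90.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct svc_pool; only the type name matters. */
struct mock_svc_pool {
	unsigned long sp_flags;
};

/*
 * Hypothetical mock of struct svc_serv. The real layout differs; the
 * padding only places sv_pools at offset 0x90, the value pahole gave.
 */
struct mock_svc_serv {
	unsigned char other_fields[0x90];
	struct mock_svc_pool *sv_pools;
};

/*
 * Address the CPU reads when loading serv->sv_pools: the base pointer
 * plus the member offset. Computed arithmetically, so nothing is
 * dereferenced and a zero base is safe to pass in.
 */
uintptr_t sv_pools_load_address(uintptr_t serv_base)
{
	return serv_base + offsetof(struct mock_svc_serv, sv_pools);
}
```

Calling sv_pools_load_address(0) yields 0x90, the same value as the fault
address in the report.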
The filecache gets shut down in nfsd_shutdown_net, which gets called
_after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
at whether we can reorder the NULL pointer setting to later, or work
around this some other way.
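As an illustration of the "work around" option only, one conceivable shape
is to read the pointer once and skip the wakeup when it is already NULL.
The user-space sketch below uses hypothetical mock_* names and C11 atomics
rather than kernel primitives, and it assumes the pointed-to object stays
allocated until the worker has been stopped. It is not the patch that was
actually submitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_serv. */
struct mock_serv {
	int wakeups;	/* counts calls, standing in for waking an idle thread */
};

/* Hypothetical stand-in for the per-net state holding nn->nfsd_serv. */
struct mock_net {
	_Atomic(struct mock_serv *) serv;
};

static void mock_wake_up(struct mock_serv *serv)
{
	serv->wakeups++;
}

/*
 * Load the shared pointer exactly once, then test the local copy.
 * Returns true if a wakeup was issued, false if shutdown had already
 * cleared the pointer.
 */
bool mock_wake_if_running(struct mock_net *nn)
{
	struct mock_serv *serv = atomic_load(&nn->serv);

	if (!serv)
		return false;
	mock_wake_up(serv);
	return true;
}
```

Testing a local copy avoids a second read of the shared pointer between
the check and the use. It does nothing about object lifetime, which is why
the assumption above matters.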
Could I trouble you to open a bug for this at bugzilla.kernel.org?
--
Jeff Layton <jlayton@kernel.org>
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
From: Salvatore Bonaccorso @ 2025-01-26 7:57 UTC
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs, linux-kernel
Hi Jeff,
On Sat, Jan 25, 2025 at 05:55:50PM -0500, Jeff Layton wrote:
> Could I trouble you to open a bug for this at bugzilla.kernel.org?
Thanks a lot for your quick response on it and the analysis.
Sure, I can file a bug at bugzilla.kernel.org. I see you have already
submitted a patch; do you still want me to do it?
If so, I will try to reference all follow-ups as well, so that the
information is not spread across threads.
Thanks a lot for your work!
Regards,
Salvatore
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
From: Jeff Layton @ 2025-01-26 12:06 UTC
To: Salvatore Bonaccorso
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs, linux-kernel
On Sun, 2025-01-26 at 08:57 +0100, Salvatore Bonaccorso wrote:
> Hi Jeff,
>
> Thanks a lot for your quick response on it and the analysis.
>
> Sure, I can file a bug at bugzilla.kernel.org. I see you have already
> submitted a patch; do you still want me to do it?
>
> If so, I will try to reference all follow-ups as well, so that the
> information is not spread across threads.
>
> Thanks a lot for your work!
>
I think you can skip the BZ for now.
Thanks again for the bug report!
--
Jeff Layton <jlayton@kernel.org>
* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
From: Salvatore Bonaccorso @ 2025-01-26 12:54 UTC
To: Jeff Layton
Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs, linux-kernel
Hi Jeff,
On Sun, Jan 26, 2025 at 07:06:09AM -0500, Jeff Layton wrote:
> I think you can skip the BZ for now.
OK, then I will skip filing the bugzilla bug.
Thanks again for your hard work on the NFS front!
Regards,
Salvatore