public inbox for linux-kernel@vger.kernel.org
* kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-25 20:44 Salvatore Bonaccorso
From: Salvatore Bonaccorso @ 2025-01-25 20:44 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey
  Cc: linux-nfs, linux-kernel

Hi Chuck, Jeff, NFSD maintainers,

In Debian we got a report from a user who hit an issue during package
updates that involved restarting nfs-kernel-server: the restart hung,
and the report included a kernel trace of a NULL pointer dereference.

The full report is at:
https://bugs.debian.org/1093734

While I was not able to trigger the issue, the provided log is as
follows:

2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
2025-01-21T12:07:18.015563+01:00 $HOST kernel: #PF: supervisor read access in kernel mode
2025-01-21T12:07:18.015566+01:00 $HOST kernel: #PF: error_code(0x0000) - not-present page
2025-01-21T12:07:18.015567+01:00 $HOST kernel: PGD 14b3d9067 P4D 14b3d9067 PUD 14b3da067 PMD 0 
2025-01-21T12:07:18.015568+01:00 $HOST kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
2025-01-21T12:07:18.015569+01:00 $HOST kernel: CPU: 8 UID: 0 PID: 231280 Comm: kworker/u67:2 Tainted: G        W          6.12.9-amd64 #1  Debian 6.12.9-1
2025-01-21T12:07:18.015570+01:00 $HOST kernel: Tainted: [W]=WARN
2025-01-21T12:07:18.015572+01:00 $HOST kernel: Hardware name: Supermicro AS -2014S-TR/H12SSL-i, BIOS 2.9 05/28/2024
2025-01-21T12:07:18.015573+01:00 $HOST kernel: Workqueue: events_unbound nfsd_file_gc_worker [nfsd]
2025-01-21T12:07:18.015573+01:00 $HOST kernel: RIP: 0010:svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015574+01:00 $HOST kernel: Code: e1 bd ea 0f 0b e9 73 ff ff ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b bf 90 00 00 00 f0 80 8f b8 00 00 00 01 e9 63 aa fe ff 0f 1f
2025-01-21T12:07:18.015575+01:00 $HOST kernel: RSP: 0018:ffffa9b9690abde8 EFLAGS: 00010286
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RAX: 0000000000000001 RBX: ffff9d03f84f6c58 RCX: ffffa9b9690abe30
2025-01-21T12:07:18.015576+01:00 $HOST kernel: RDX: ffff9d034a5aa2a8 RSI: ffff9d034a5aa2a8 RDI: 0000000000000000
2025-01-21T12:07:18.015577+01:00 $HOST kernel: RBP: ffff9d034a5aa2a0 R08: ffff9d034a5aa2a8 R09: ffffa9b9690abe28
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R10: ffff9d0451cff780 R11: 000000000000000f R12: ffffa9b9690abe30
2025-01-21T12:07:18.015578+01:00 $HOST kernel: R13: ffff9d034a5aa2a8 R14: ffff9d035451a000 R15: ffff9d034a5aa2a8
2025-01-21T12:07:18.015579+01:00 $HOST kernel: FS:  0000000000000000(0000) GS:ffff9d228ec00000(0000) knlGS:0000000000000000
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2025-01-21T12:07:18.015580+01:00 $HOST kernel: CR2: 0000000000000090 CR3: 0000000106e24003 CR4: 0000000000f70ef0
2025-01-21T12:07:18.015581+01:00 $HOST kernel: PKRU: 55555554
2025-01-21T12:07:18.015582+01:00 $HOST kernel: Call Trace:
2025-01-21T12:07:18.015582+01:00 $HOST kernel:  <TASK>
2025-01-21T12:07:18.015583+01:00 $HOST kernel:  ? __die_body.cold+0x19/0x27
2025-01-21T12:07:18.015584+01:00 $HOST kernel:  ? page_fault_oops+0x15a/0x2d0
2025-01-21T12:07:18.015585+01:00 $HOST kernel:  ? exc_page_fault+0x7e/0x180
2025-01-21T12:07:18.015585+01:00 $HOST kernel:  ? asm_exc_page_fault+0x26/0x30
2025-01-21T12:07:18.015586+01:00 $HOST kernel:  ? svc_wake_up+0x9/0x20 [sunrpc]
2025-01-21T12:07:18.015586+01:00 $HOST kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
2025-01-21T12:07:18.015587+01:00 $HOST kernel:  nfsd_file_dispose_list_delayed+0xa7/0xd0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel:  nfsd_file_gc_worker+0x190/0x2c0 [nfsd]
2025-01-21T12:07:18.015588+01:00 $HOST kernel:  process_one_work+0x177/0x330
2025-01-21T12:07:18.015589+01:00 $HOST kernel:  worker_thread+0x252/0x390
2025-01-21T12:07:18.015590+01:00 $HOST kernel:  ? __pfx_worker_thread+0x10/0x10
2025-01-21T12:07:18.015611+01:00 $HOST kernel:  kthread+0xd2/0x100
2025-01-21T12:07:18.015612+01:00 $HOST kernel:  ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015613+01:00 $HOST kernel:  ret_from_fork+0x34/0x50
2025-01-21T12:07:18.015615+01:00 $HOST kernel:  ? __pfx_kthread+0x10/0x10
2025-01-21T12:07:18.015616+01:00 $HOST kernel:  ret_from_fork_asm+0x1a/0x30
2025-01-21T12:07:18.015618+01:00 $HOST kernel:  </TASK>
2025-01-21T12:07:18.015619+01:00 $HOST kernel: Modules linked in: dm_mod tls cpufreq_conservative msr binfmt_misc quota_v2 quota_tree nls_ascii nls_cp437 vfat fat ipmi_ssif rpcrdma rdma_ucm ib_iser nf_conntrack_ftp nf_log_syslog ib_umad nft_log amd_atl intel_rapl_msr intel_rapl_common rdma_cm ib_ipoib amd64_edac iw_cm libiscsi edac_mce_amd nft_limit scsi_transport_iscsi ib_cm kvm_amd nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject kvm crct10dif_pclmul ghash_clmulni_intel nft_ct ast sha512_ssse3 sha256_ssse3 jc42 drm_shmem_helper sha1_ssse3 aesni_intel gf128mul crypto_simd drm_kms_helper cryptd wmi_bmof ee1004 rapl acpi_cpufreq pcspkr i2c_algo_bit ccp acpi_ipmi sp5100_tco k10temp watchdog button nft_masq ipmi_si ipmi_devintf ipmi_msghandler evdev joydev sg nfsd nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 auth_rpcgss nfs_acl lockd grace nf_tables sunrpc drm configfs efi_pstore nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 efivarfs raid10 raid0 hid_generic usbhid hid raid456 async_raid6_recov async_memcpy
2025-01-21T12:07:18.015622+01:00 $HOST kernel:  async_pq async_xor async_tx xor rndis_host cdc_ether usbnet mii raid6_pq libcrc32c crc32c_generic mlx5_ib ib_uverbs ib_core raid1 md_mod ses enclosure scsi_transport_sas sd_mod mlx5_core ahci libahci xhci_pci libata xhci_hcd megaraid_sas tg3 crc32_pclmul scsi_mod crc32c_intel mlxfw usbcore libphy pci_hyperv_intf scsi_common i2c_piix4 i2c_smbus usb_common wmi
2025-01-21T12:07:18.015624+01:00 $HOST kernel: CR2: 0000000000000090
2025-01-21T12:07:18.015625+01:00 $HOST kernel: ---[ end trace 0000000000000000 ]---

The user's kernel is based on 6.12.9.

Does this ring a bell? Might 8e6e2ffa6569 ("nfsd: add list_head nf_gc
to struct nfsd_file") be related?

Regards,
Salvatore

* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-25 22:55 Jeff Layton
From: Jeff Layton @ 2025-01-25 22:55 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Chuck Lever, Neil Brown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-kernel

On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> Hi Chuck, Jeff, NFSD maintainers,
> 
> In Debian we got a report from a user who hit an issue during package
> updates that involved restarting nfs-kernel-server: the restart hung,
> and the report included a kernel trace of a NULL pointer dereference.
> 
> The full report is at:
> https://bugs.debian.org/1093734
> 
> While I was not able to trigger the issue, the provided log is as
> follows:
> 
> 2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> 2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
> 2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
> 2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
> 2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
> 2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
> 2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
> 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090

Thanks for the bug report. It's getting late here, so I can only take a
quick look. svc_wake_up is pretty small:

void svc_wake_up(struct svc_serv *serv)
{
        struct svc_pool *pool = &serv->sv_pools[0];

        set_bit(SP_TASK_PENDING, &pool->sp_flags);
        svc_pool_wake_idle_thread(pool);
}

pahole on my machine says that struct svc_serv has this at offset 0x90:

	struct svc_pool *          sv_pools;             /*  0x90   0x8 */

So it looks like nn->nfsd_serv was NULL. That only happens when we
shut down the server, so this looks like a race between filecache
garbage collection and shutdown.

The filecache gets shut down in nfsd_shutdown_net, which gets called
_after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
at whether we can move the point where the pointer is cleared later,
or work around this some other way.

Could I trouble you to open a bug for this at bugzilla.kernel.org?

-- 
Jeff Layton <jlayton@kernel.org>

* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-26  7:57 Salvatore Bonaccorso
From: Salvatore Bonaccorso @ 2025-01-26  7:57 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

Hi Jeff,

On Sat, Jan 25, 2025 at 05:55:50PM -0500, Jeff Layton wrote:
> Thanks for the bug report. It's getting late here, so I can only take a
> quick look. svc_wake_up is pretty small:
> 
> void svc_wake_up(struct svc_serv *serv)
> {
>         struct svc_pool *pool = &serv->sv_pools[0];
> 
>         set_bit(SP_TASK_PENDING, &pool->sp_flags);
>         svc_pool_wake_idle_thread(pool);
> }
> 
> pahole on my machine says that struct svc_serv has this at offset 0x90:
> 
> 	struct svc_pool *          sv_pools;             /*  0x90   0x8 */
> 
> So it looks like nn->nfsd_serv was NULL. That only happens when we
> shut down the server, so this looks like a race between filecache
> garbage collection and shutdown.
> 
> The filecache gets shut down in nfsd_shutdown_net, which gets called
> _after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
> at whether we can move the point where the pointer is cleared later,
> or work around this some other way.
> 
> Could I trouble you to open a bug for this at bugzilla.kernel.org?

Thanks a lot for your quick response on it and the analysis.

Sure, I can file a bug at bugzilla.kernel.org. I see you have already
submitted a patch, though; do you still want me to do it?

If so, I will try to reference all the follow-ups as well, so that the
information is not spread across threads.

Thanks a lot for your work!

Regards,
Salvatore

* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-26 12:06 Jeff Layton
From: Jeff Layton @ 2025-01-26 12:06 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On Sun, 2025-01-26 at 08:57 +0100, Salvatore Bonaccorso wrote:
> Hi Jeff,
> 
> Thanks a lot for your quick response on it and the analysis.
> 
> Sure, I can file a bug at bugzilla.kernel.org. I see you have already
> submitted a patch, though; do you still want me to do it?
>
> If so, I will try to reference all the follow-ups as well, so that the
> information is not spread across threads.
> 
> Thanks a lot for your work!
> 

I think you can skip the BZ for now.

Thanks again for the bug report!
-- 
Jeff Layton <jlayton@kernel.org>

* Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20
@ 2025-01-26 12:54 Salvatore Bonaccorso
From: Salvatore Bonaccorso @ 2025-01-26 12:54 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

Hi Jeff,

On Sun, Jan 26, 2025 at 07:06:09AM -0500, Jeff Layton wrote:
> I think you can skip the BZ for now.

OK, then I will skip filing the bug in bugzilla.

Thanks again for your hard work on the NFS front!

Regards,
Salvatore
