* [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
@ 2025-11-13 18:29 Salvatore Bonaccorso
2025-11-14 6:03 ` Naman Jain
0 siblings, 1 reply; 8+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-13 18:29 UTC
To: Naman Jain, John Starks, Michael Kelley, Long Li, Tianyu Lan,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman, Peter Morrow
Cc: 1120602, linux-hyperv, linux-kernel, regressions, stable
Peter Morrow reported a regression in Debian, tracked in
https://bugs.debian.org/1120602 . The regression was seen after
updating to 6.12.57-1 in Debian; details on the offending commit
follow below.
His report was as follows:
> Dear Maintainer,
>
> I'm seeing a kernel crash quite soon after boot on a debian trixie based
> system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
> I can access the system to gather more information. Thus I'll provide details
> of the system using a previously known good version. The panic is happening
> 100% of the time unfortunately. I have access to the serial console however
> so can enable any required verbose logging during boot if necessary.
>
> Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
> same userspace. We had pinned to that version until very recently in order
> to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
>
> I'm running a dpdk application here (VPP) on Azure, VM form factor is a
> "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
>
> The only relevant upstream commit in this area (as far as I can see) is:
>
> https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
>
> The comment regarding avoiding races at start adds a bit more weight behind this
> hunch, though it's only a hunch as I am most definitely nowhere near an expert
> in this area.
>
> -- Package-specific info:
>
> [ 19.625535] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [ 19.628874] #PF: supervisor read access in kernel mode
> [ 19.630841] #PF: error_code(0x0000) - not-present page
> [ 19.632788] PGD 0 P4D 0
> [ 19.633905] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [ 19.635586] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
> [ 19.640216] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
> [ 19.644514] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> [ 19.646994] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> [ 19.654377] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> [ 19.656385] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> [ 19.659240] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> [ 19.662168] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> [ 19.665239] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> [ 19.668193] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> [ 19.671106] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> [ 19.674281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 19.676533] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> [ 19.679385] Call Trace:
> [ 19.680361] <IRQ>
> [ 19.681181] vmbus_isr+0x1a5/0x210 [hv_vmbus]
> [ 19.682916] __sysvec_hyperv_callback+0x32/0x60
> [ 19.684991] sysvec_hyperv_callback+0x6c/0x90
> [ 19.686665] </IRQ>
> [ 19.687509] <TASK>
> [ 19.688366] asm_sysvec_hyperv_callback+0x1a/0x20
> [ 19.690262] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [ 19.692067] Code: 09 e9 c5 08 01 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d e5 3b 31 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> [ 19.699119] RSP: 0018:ffffb15ac0103ed8 EFLAGS: 00000246
> [ 19.701412] RAX: 0000000000000003 RBX: ffff8ff5403b1fc0 RCX: ffff8ff54c64ce30
> [ 19.704328] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000001f894
> [ 19.706910] RBP: 0000000000000003 R08: 000000000bb760d9 R09: 00fca75150b080e9
> [ 19.709762] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000000
> [ 19.712510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 19.715173] default_idle+0x9/0x20
> [ 19.716846] default_idle_call+0x29/0x100
> [ 19.718623] do_idle+0x1fe/0x240
> [ 19.720045] cpu_startup_entry+0x29/0x30
> [ 19.721595] start_secondary+0x11e/0x140
> [ 19.723080] common_startup_64+0x13e/0x141
> [ 19.725222] </TASK>
> [ 19.726387] Modules linked in: isofs cdrom uio_hv_generic uio binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common rpcrdma skx_edac_common nfit sunrpc libnvdimm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 rdma_ucm ib_iser sha1_ssse3 rdma_cm aesni_intel iw_cm gf128mul crypto_simd libiscsi cryptd ib_umad ib_ipoib scsi_transport_iscsi ib_cm rapl sg hv_utils hv_balloon evdev pcspkr joydev mpls_router ip_tunnel ramoops configfs pstore_blk efi_pstore pstore_zone nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs ip_tables x_tables autofs4 overlay squashfs dm_verity dm_bufio reed_solomon dm_mod loop ext4 crc16 mbcache jbd2 crc32c_generic mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper sd_mod drm_kms_helper hv_storvsc scsi_transport_fc drm scsi_mod hid_generic hid_hyperv hid serio_raw hv_netvsc hyperv_keyboard scsi_common hv_vmbus
> [ 19.726466] crc32_pclmul crc32c_intel
> [ 19.765771] CR2: 00000000000000a0
> [ 19.767524] ---[ end trace 0000000000000000 ]---
> [ 19.800433] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> [ 19.803170] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> [ 19.811041] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> [ 19.813466] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> [ 19.816504] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> [ 19.819484] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> [ 19.822625] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> [ 19.825569] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> [ 19.828804] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> [ 19.832214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 19.834709] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> [ 19.837976] Kernel panic - not syncing: Fatal exception in interrupt
> [ 19.841825] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 19.896620] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
>
> lspci output:
>
> Collected from a system that is not crashing (6.12.41+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.41-1 (2025-08-12) x86_64 GNU/Linux)...
>
> $ sudo lspci -v
> 2f22:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
> Subsystem: Mellanox Technologies Device 0190
> Physical Slot: 3
> Flags: bus master, fast devsel, latency 0, NUMA node 0
> Memory at fe0100000 (64-bit, prefetchable) [size=1M]
> Capabilities: [60] Express Endpoint, IntMsgNum 0
> Capabilities: [9c] MSI-X: Enable+ Count=8 Masked-
> Kernel driver in use: mlx5_core
> Kernel modules: mlx5_core
>
> 52f7:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
> Subsystem: Mellanox Technologies Device 0190
> Physical Slot: 2
> Flags: bus master, fast devsel, latency 0, NUMA node 0
> Memory at fe0000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [60] Express Endpoint, IntMsgNum 0
> Capabilities: [9c] MSI-X: Enable+ Count=8 Masked-
> Kernel driver in use: mlx5_core
> Kernel modules: mlx5_core
>
> 7852:00:02.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] (rev 80)
> Subsystem: Mellanox Technologies Device 0190
> Physical Slot: 4
> Flags: bus master, fast devsel, latency 0, NUMA node 0
> Memory at fe0200000 (64-bit, prefetchable) [size=1M]
> Capabilities: [60] Express Endpoint, IntMsgNum 0
> Capabilities: [9c] MSI-X: Enable+ Count=8 Masked-
> Kernel driver in use: mlx5_core
> Kernel modules: mlx5_core
>
> dmidecode output:
>
> $ sudo dmidecode
> # dmidecode 3.6
> Getting SMBIOS data from sysfs.
> SMBIOS 3.1.0 present.
> Table at 0x3FF82000.
>
> Handle 0x0000, DMI type 0, 26 bytes
> BIOS Information
> Vendor: Microsoft Corporation
> Version: Hyper-V UEFI Release v4.1
> Release Date: 05/13/2024
> ROM Size: 64 kB
> Characteristics:
> BIOS characteristics not supported
> ACPI is supported
> Targeted content distribution is supported
> UEFI is supported
> System is a virtual machine
> BIOS Revision: 4.1
>
> Handle 0x0001, DMI type 1, 27 bytes
> System Information
> Manufacturer: Microsoft Corporation
> Product Name: Virtual Machine
> Version: Hyper-V UEFI Release v4.1
> Serial Number: 0000-0010-5437-9499-5225-4477-46
> UUID: 925315af-4af4-4d42-915a-1516b5a1fe5c
> Wake-up Type: Power Switch
> SKU Number: None
> Family: Virtual Machine
>
> Handle 0x0002, DMI type 3, 24 bytes
> Chassis Information
> Manufacturer: Microsoft Corporation
> Type: Desktop
> Lock: Not Present
> Version: Hyper-V UEFI Release v4.1
> Serial Number: 2466-9316-1078-9783-6078-7718-80
> Asset Tag: 7783-7084-3265-9085-8269-3286-77
> Boot-up State: Safe
> Power Supply State: Safe
> Thermal State: Safe
> Security Status: Unknown
> OEM Information: 0x00000000
> Height: Unspecified
> Number Of Power Cords: Unspecified
> Contained Elements: 0
> SKU Number: Virtual Machine
>
> Handle 0x0003, DMI type 2, 17 bytes
> Base Board Information
> Manufacturer: Microsoft Corporation
> Product Name: Virtual Machine
> Version: Hyper-V UEFI Release v4.1
> Serial Number: 0000-0010-4737-0707-0684-2660-76
> Asset Tag: None
> Features:
> Board is a hosting board
> Location In Chassis: Virtual Machine
> Chassis Handle: 0x0002
> Type: Motherboard
> Contained Object Handles: 0
>
> Handle 0x0004, DMI type 4, 48 bytes
> Processor Information
> Socket Designation: None
> Type: Central Processor
> Family: Unknown
> Manufacturer: None
> ID: 00 00 00 00 00 00 00 00
> Version: None
> Voltage: Unknown
> External Clock: Unknown
> Max Speed: Unknown
> Current Speed: Unknown
> Status: Unpopulated
> Upgrade: Other
> L1 Cache Handle: Not Provided
> L2 Cache Handle: Not Provided
> L3 Cache Handle: Not Provided
> Serial Number: None
> Asset Tag: None
> Part Number: None
> Core Count: 4
> Core Enabled: 4
> Thread Count: 1
> Characteristics: None
>
> Handle 0x0005, DMI type 11, 5 bytes
> OEM Strings
> String 1: [MS_VM_CERT/SHA1/9b80ca0d5dd061ec9da4e494f4c3fd1196270c22]
> String 2: None
> String 3: To be filled by OEM
>
> Handle 0x0006, DMI type 16, 23 bytes
> Physical Memory Array
> Location: System Board Or Motherboard
> Use: System Memory
> Error Correction Type: None
> Maximum Capacity: 0 bytes
> Error Information Handle: Not Provided
> Number Of Devices: 3
>
> Handle 0x0007, DMI type 17, 92 bytes
> Memory Device
> Array Handle: 0x0006
> Error Information Handle: Not Provided
> Total Width: Unknown
> Data Width: Unknown
> Size: 26 MB
> Form Factor: Unknown
> Set: None
> Locator: M0001
> Bank Locator: None
> Type: Unknown
> Type Detail: Unknown
> Speed: Unknown
> Manufacturer: Microsoft Corporation
> Serial Number: None
> Asset Tag: None
> Part Number: None
> Rank: Unknown
> Configured Memory Speed: Unknown
> Minimum Voltage: Unknown
> Maximum Voltage: Unknown
> Configured Voltage: Unknown
> Memory Technology: <OUT OF SPEC>
> Memory Operating Mode Capability: None
> Firmware Version: Not Specified
> Module Manufacturer ID: Unknown
> Module Product ID: Unknown
> Memory Subsystem Controller Manufacturer ID: Unknown
> Memory Subsystem Controller Product ID: Unknown
> Non-Volatile Size: None
> Volatile Size: None
> Cache Size: None
> Logical Size: None
>
> Handle 0x0008, DMI type 19, 31 bytes
> Memory Array Mapped Address
> Starting Address: 0x00000000000
> Ending Address: 0x00001A003FF
> Range Size: 26625 kB
> Physical Array Handle: 0x0006
> Partition Width: 0
>
> Handle 0x0009, DMI type 20, 35 bytes
> Memory Device Mapped Address
> Starting Address: 0x00000000000
> Ending Address: 0x00001A003FF
> Range Size: 26625 kB
> Physical Device Handle: 0x0007
> Memory Array Mapped Address Handle: 0x0008
> Partition Row Position: Unknown
>
> Handle 0x000A, DMI type 17, 92 bytes
> Memory Device
> Array Handle: 0x0006
> Error Information Handle: Not Provided
> Total Width: Unknown
> Data Width: Unknown
> Size: 948 MB
> Form Factor: Unknown
> Set: None
> Locator: M0002
> Bank Locator: None
> Type: Unknown
> Type Detail: Unknown
> Speed: Unknown
> Manufacturer: Microsoft Corporation
> Serial Number: None
> Asset Tag: None
> Part Number: None
> Rank: Unknown
> Configured Memory Speed: Unknown
> Minimum Voltage: Unknown
> Maximum Voltage: Unknown
> Configured Voltage: Unknown
> Memory Technology: <OUT OF SPEC>
> Memory Operating Mode Capability: None
> Firmware Version: Not Specified
> Module Manufacturer ID: Unknown
> Module Product ID: Unknown
> Memory Subsystem Controller Manufacturer ID: Unknown
> Memory Subsystem Controller Product ID: Unknown
> Non-Volatile Size: None
> Volatile Size: None
> Cache Size: None
> Logical Size: None
>
> Handle 0x000B, DMI type 19, 31 bytes
> Memory Array Mapped Address
> Starting Address: 0x00004C00000
> Ending Address: 0x000400003FF
> Range Size: 970753 kB
> Physical Array Handle: 0x0006
> Partition Width: 0
>
> Handle 0x000C, DMI type 20, 35 bytes
> Memory Device Mapped Address
> Starting Address: 0x00004C00000
> Ending Address: 0x000400003FF
> Range Size: 970753 kB
> Physical Device Handle: 0x000A
> Memory Array Mapped Address Handle: 0x000B
> Partition Row Position: Unknown
>
> Handle 0x000D, DMI type 17, 92 bytes
> Memory Device
> Array Handle: 0x0006
> Error Information Handle: Not Provided
> Total Width: Unknown
> Data Width: Unknown
> Size: 13 GB
> Form Factor: Unknown
> Set: None
> Locator: M0003
> Bank Locator: None
> Type: Unknown
> Type Detail: Unknown
> Speed: Unknown
> Manufacturer: Microsoft Corporation
> Serial Number: None
> Asset Tag: None
> Part Number: None
> Rank: Unknown
> Configured Memory Speed: Unknown
> Minimum Voltage: Unknown
> Maximum Voltage: Unknown
> Configured Voltage: Unknown
> Memory Technology: <OUT OF SPEC>
> Memory Operating Mode Capability: None
> Firmware Version: Not Specified
> Module Manufacturer ID: Unknown
> Module Product ID: Unknown
> Memory Subsystem Controller Manufacturer ID: Unknown
> Memory Subsystem Controller Product ID: Unknown
> Non-Volatile Size: None
> Volatile Size: None
> Cache Size: None
> Logical Size: None
>
> Handle 0x000E, DMI type 19, 31 bytes
> Memory Array Mapped Address
> Starting Address: 0x00100000000
> Ending Address: 0x004400003FF
> Range Size: 13 GB
> Physical Array Handle: 0x0006
> Partition Width: 0
>
> Handle 0x000F, DMI type 20, 35 bytes
> Memory Device Mapped Address
> Starting Address: 0x00100000000
> Ending Address: 0x004400003FF
> Range Size: 13 GB
> Physical Device Handle: 0x000D
> Memory Array Mapped Address Handle: 0x000E
> Partition Row Position: Unknown
>
> Handle 0x0010, DMI type 32, 11 bytes
> System Boot Information
> Status: No errors detected
>
> Handle 0xFEFF, DMI type 127, 4 bytes
> End Of Table
The offending commit appears to be the backport of b15b7d2a1b09
("uio_hv_generic: Let userspace take care of interrupt mask") for
6.12.y.
Peter confirmed that reverting this commit on top of 6.12.57-1 as
packaged in Debian indeed resolves the issue. Interestingly, the issue
is *not* seen with the 6.17.7-based kernel in Debian.
#regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
#regzbot monitor: https://bugs.debian.org/1120602
Thank you already!
Regards,
Salvatore
* Re: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-13 18:29 [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic] Salvatore Bonaccorso
@ 2025-11-14 6:03 ` Naman Jain
2025-11-14 11:49 ` Peter Morrow
0 siblings, 1 reply; 8+ messages in thread
From: Naman Jain @ 2025-11-14 6:03 UTC
To: Salvatore Bonaccorso, Peter Morrow, Long Li
Cc: 1120602, linux-hyperv, linux-kernel, regressions, stable,
John Starks, Michael Kelley, Long Li, Tianyu Lan,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman
On 11/13/2025 11:59 PM, Salvatore Bonaccorso wrote:
> Peter Morrow reported in Debian a regression, reported in
> https://bugs.debian.org/1120602 . The regression was seen after
> updating, to 6.12.57-1 in Debian, but details on the offending commit
> follows.
>
> His report was as follows:
>
>> Dear Maintainer,
>>
>> I'm seeing a kernel crash quite soon after boot on a debian trixie based
>> system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
>> I can access the system to gather more information. Thus I'll provide details
>> of the system using a previously known good version. The panic is happening
>> 100% of the time unfortunately. I have access to the serial console however
>> so can enable any required verbose logging during boot if necessary.
>>
>> Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
>> same userspace. We had pinned to that version until very recently in order
>> to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
>>
>> I'm running a dpdk application here (VPP) on Azure, VM form factor is a
>> "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
>>
>> The only relevant upstream commit in this area (as far as I can see) is:
>>
>> https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
>>
>> The comment regarding avoiding races at start adds a bit more weight behind this
>> hunch, though it's only a hunch as I am most definitely nowhere near an expert
>> in this area.
>>
>> -- Package-specific info:
>>
>>
<snip>
> The offending commit appears to be the backport of b15b7d2a1b09
> ("uio_hv_generic: Let userspace take care of interrupt mask") for
> 6.12.y.
>
> Peter confirmed that reverting this commit on top of 6.12.57-1 as
> packaged in Debian resolves indeed the issue. Interestingly the issue
> is *not* seen with 6.17.7 based kernel in Debian.
>
> #regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
> #regzbot monitor: https://bugs.debian.org/1120602
>
> Thank you already!
>
> Regards,
> Salvatore
Hi Peter, Salvatore,
Thanks for reporting this crash, and sorry for the trouble. Here is my
analysis.
On 6.17.7, where commit d062463edf17 ("uio_hv_generic: Set event for all
channels on the device") is present, hv_uio_irqcontrol() supports
setting the interrupt mask from userspace for sub-channels as well.
This aligns with commit e29587c07537 ("uio_hv_generic: Let userspace
take care of interrupt mask"), which relies on userspace to manage the
interrupt mask and therefore safely removes the interrupt mask
management logic from the driver.
However, 6.12.57 has the second commit but not the first, so userspace
has no way to control the interrupt mask for sub-channels:
interrupt_mask stays 0, which means interrupts are not masked. As a
result, an interrupt callback may be handled for a sub-channel, where
we do not expect one. This may be causing the issue.
This would have led to a crash in hv_uio_channel_cb() for sub-channels:
struct hv_device *hv_dev = chan->device_obj;
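As a rough illustration of the analysis above: the Code: bytes in the oops decode to "mov 0x10(%rdi),%rax" followed by the faulting "mov 0xa0(%rax),%rdi", and RAX is 0. That is consistent with chan->device_obj being NULL for a sub-channel, with the next load reading offset 0xa0 from that NULL pointer (matching CR2). The sketch below is only a user-space model under that assumption. The names dev_model, chan_model, lookup_old, lookup_new and demo_lookup are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hv_device and struct vmbus_channel;
 * only the fields relevant to the lookup are modelled. */
struct dev_model {
	void *drvdata;
};

struct chan_model {
	struct dev_model *device_obj;       /* assumed NULL on sub-channels */
	struct chan_model *primary_channel; /* NULL on the primary channel */
};

/* Lookup as done before the fix: yields NULL for a sub-channel. */
struct dev_model *lookup_old(const struct chan_model *chan)
{
	return chan->device_obj;
}

/* Lookup as done by the patch: fall back to the primary channel. */
struct dev_model *lookup_new(const struct chan_model *chan)
{
	return chan->primary_channel ?
		chan->primary_channel->device_obj : chan->device_obj;
}

/* Exercise both lookups on one primary channel with one sub-channel. */
void demo_lookup(void)
{
	struct dev_model dev = { .drvdata = NULL };
	struct chan_model primary = { .device_obj = &dev, .primary_channel = NULL };
	struct chan_model sub = { .device_obj = NULL, .primary_channel = &primary };

	assert(lookup_old(&primary) == &dev);
	assert(lookup_old(&sub) == NULL);   /* the pointer the callback went on to use */
	assert(lookup_new(&primary) == &dev);
	assert(lookup_new(&sub) == &dev);
}
```

Calling demo_lookup() from a small main() exercises both variants. In the model, only the old lookup returns NULL for the sub-channel.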
I have ported commit d062463edf17 ("uio_hv_generic: Set event for all
channels on the device") to 6.12.57 and resolved some merge conflicts.
Could you please help test whether this works for you?
Hi Long,
If this works, do you have any concerns if I backport your patch to
older kernels (6.12 and prior)?
Regards,
Naman
--------------
Patch:
From 2f14d48d2bde3f86b153b9f756a9cd688cda3453 Mon Sep 17 00:00:00 2001
From: Long Li <longli@microsoft.com>
Date: Mon, 10 Mar 2025 15:12:01 -0700
Subject: [PATCH] uio_hv_generic: Set event for all channels on the device
Hyper-V may offer a non latency sensitive device with subchannels without
monitor bit enabled. The decision is entirely on the Hyper-V host not
configurable within guest.
When a device has subchannels, also signal events for the subchannel
if its monitor bit is disabled.
This patch also removes the memory barrier when monitor bit is enabled
as it is not necessary. The memory barrier is only needed between
setting up interrupt mask and calling vmbus_set_event() when monitor
bit is disabled.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
drivers/uio/uio_hv_generic.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0b414d1168dd..9f3b124a5e09 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -65,6 +65,16 @@ struct hv_uio_private_data {
char send_name[32];
};
+static void set_event(struct vmbus_channel *channel, s32 irq_state)
+{
+ channel->inbound.ring_buffer->interrupt_mask = !irq_state;
+ if (!channel->offermsg.monitor_allocated && irq_state) {
+ /* MB is needed for host to see the interrupt mask first */
+ virt_mb();
+ vmbus_set_event(channel);
+ }
+}
+
/*
* This is the irqcontrol callback to be registered to uio_info.
* It can be used to disable/enable interrupt from user space processes.
@@ -79,12 +89,15 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
{
struct hv_uio_private_data *pdata = info->priv;
struct hv_device *dev = pdata->device;
+ struct vmbus_channel *primary, *sc;
- dev->channel->inbound.ring_buffer->interrupt_mask = !irq_state;
- virt_mb();
+ primary = dev->channel;
+ set_event(primary, irq_state);
- if (!dev->channel->offermsg.monitor_allocated && irq_state)
- vmbus_setevent(dev->channel);
+ mutex_lock(&vmbus_connection.channel_mutex);
+ list_for_each_entry(sc, &primary->sc_list, sc_list)
+ set_event(sc, irq_state);
+ mutex_unlock(&vmbus_connection.channel_mutex);
return 0;
}
@@ -95,11 +108,18 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
static void hv_uio_channel_cb(void *context)
{
struct vmbus_channel *chan = context;
- struct hv_device *hv_dev = chan->device_obj;
- struct hv_uio_private_data *pdata = hv_get_drvdata(hv_dev);
+ struct hv_device *hv_dev;
+ struct hv_uio_private_data *pdata;
virt_mb();
+ /*
+ * The callback may come from a subchannel, in which case look
+ * for the hv device in the primary channel
+ */
+ hv_dev = chan->primary_channel ?
+ chan->primary_channel->device_obj : chan->device_obj;
+ pdata = hv_get_drvdata(hv_dev);
uio_event_notify(&pdata->info);
}
--
2.43.0
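For readers following the irqcontrol side of the patch, here is a minimal user-space sketch of the behaviour it describes: the mask is written for the primary channel and for every sub-channel, and an event is signalled only when interrupts are being enabled and the monitor bit is not allocated. The type irq_chan_model, its signalled counter, the singly linked next_sc list and the function names are assumptions made for illustration. The real driver walks sc_list under channel_mutex and calls virt_mb() and vmbus_set_event().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-channel state touched by set_event(). */
struct irq_chan_model {
	unsigned int interrupt_mask;    /* models inbound.ring_buffer->interrupt_mask */
	bool monitor_allocated;         /* models offermsg.monitor_allocated */
	unsigned int signalled;         /* counts modelled vmbus_set_event() calls */
	struct irq_chan_model *next_sc; /* simplified stand-in for sc_list */
};

/* Mirrors the patch's set_event(): write the mask, then signal if needed. */
void set_event_model(struct irq_chan_model *c, int irq_state)
{
	c->interrupt_mask = !irq_state;
	if (!c->monitor_allocated && irq_state)
		c->signalled++; /* real driver: virt_mb(); vmbus_set_event(); */
}

/* Mirrors the patched hv_uio_irqcontrol(): primary first, then sub-channels. */
void irqcontrol_model(struct irq_chan_model *primary, int irq_state)
{
	set_event_model(primary, irq_state);
	for (struct irq_chan_model *sc = primary->next_sc; sc; sc = sc->next_sc)
		set_event_model(sc, irq_state);
}

/* One primary with the monitor bit and one sub-channel without it. */
void demo_irqcontrol(void)
{
	struct irq_chan_model sub = { 0, false, 0, NULL };
	struct irq_chan_model primary = { 0, true, 0, &sub };

	irqcontrol_model(&primary, 0); /* disable: both channels masked */
	assert(primary.interrupt_mask == 1 && sub.interrupt_mask == 1);
	assert(primary.signalled == 0 && sub.signalled == 0);

	irqcontrol_model(&primary, 1); /* enable: unmask, signal sub only */
	assert(primary.interrupt_mask == 0 && sub.interrupt_mask == 0);
	assert(primary.signalled == 0 && sub.signalled == 1);
}
```

Without the sub-channel loop, the model's sub.interrupt_mask would never change from its initial 0, which corresponds to the unmasked state described in the analysis.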
* Re: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-14 6:03 ` Naman Jain
@ 2025-11-14 11:49 ` Peter Morrow
2025-11-14 14:35 ` Naman Jain
0 siblings, 1 reply; 8+ messages in thread
From: Peter Morrow @ 2025-11-14 11:49 UTC
To: Naman Jain
Cc: Salvatore Bonaccorso, Long Li, 1120602, linux-hyperv,
linux-kernel, regressions, stable, John Starks, Michael Kelley,
Tianyu Lan, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman
Hi Naman,
On Fri, 14 Nov 2025 at 06:03, Naman Jain <namjain@linux.microsoft.com> wrote:
>
>
>
> On 11/13/2025 11:59 PM, Salvatore Bonaccorso wrote:
> > Peter Morrow reported in Debian a regression, reported in
> > https://bugs.debian.org/1120602 . The regression was seen after
> > updating, to 6.12.57-1 in Debian, but details on the offending commit
> > follows.
> >
> > His report was as follows:
> >
> >> Dear Maintainer,
> >>
> >> I'm seeing a kernel crash quite soon after boot on a debian trixie based
> >> system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
> >> I can access the system to gather more information. Thus I'll provide details
> >> of the system using a previously known good version. The panic is happening
> >> 100% of the time unfortunately. I have access to the serial console however
> >> so can enable any required verbose logging during boot if necessary.
> >>
> >> Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
> >> same userspace. We had pinned to that version until very recently in order
> >> to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
> >>
> >> I'm running a dpdk application here (VPP) on Azure, VM form factor is a
> >> "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
> >>
> >> The only relevant upstream commit in this area (as far as I can see) is:
> >>
> >> https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
> >>
> >> The comment regarding avoiding races at start adds a bit more weight behind this
> >> hunch, though it's only a hunch as I am most definitely nowhere near an expert
> >> in this area.
> >>
> >> -- Package-specific info:
> >>
> >> [ 19.625535] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> >> [ 19.628874] #PF: supervisor read access in kernel mode
> >> [ 19.630841] #PF: error_code(0x0000) - not-present page
> >> [ 19.632788] PGD 0 P4D 0
> >> [ 19.633905] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> >> [ 19.635586] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
> >> [ 19.640216] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
> >> [ 19.644514] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> >> [ 19.646994] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> >> [ 19.654377] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> >> [ 19.656385] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> >> [ 19.659240] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> >> [ 19.662168] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> >> [ 19.665239] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> >> [ 19.668193] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> >> [ 19.671106] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> >> [ 19.674281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 19.676533] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> >> [ 19.679385] Call Trace:
> >> [ 19.680361] <IRQ>
> >> [ 19.681181] vmbus_isr+0x1a5/0x210 [hv_vmbus]
> >> [ 19.682916] __sysvec_hyperv_callback+0x32/0x60
> >> [ 19.684991] sysvec_hyperv_callback+0x6c/0x90
> >> [ 19.686665] </IRQ>
> >> [ 19.687509] <TASK>
> >> [ 19.688366] asm_sysvec_hyperv_callback+0x1a/0x20
> >> [ 19.690262] RIP: 0010:pv_native_safe_halt+0xf/0x20
> >> [ 19.692067] Code: 09 e9 c5 08 01 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d e5 3b 31 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> >> [ 19.699119] RSP: 0018:ffffb15ac0103ed8 EFLAGS: 00000246
> >> [ 19.701412] RAX: 0000000000000003 RBX: ffff8ff5403b1fc0 RCX: ffff8ff54c64ce30
> >> [ 19.704328] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000001f894
> >> [ 19.706910] RBP: 0000000000000003 R08: 000000000bb760d9 R09: 00fca75150b080e9
> >> [ 19.709762] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000000
> >> [ 19.712510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >> [ 19.715173] default_idle+0x9/0x20
> >> [ 19.716846] default_idle_call+0x29/0x100
> >> [ 19.718623] do_idle+0x1fe/0x240
> >> [ 19.720045] cpu_startup_entry+0x29/0x30
> >> [ 19.721595] start_secondary+0x11e/0x140
> >> [ 19.723080] common_startup_64+0x13e/0x141
> >> [ 19.725222] </TASK>
> >> [ 19.726387] Modules linked in: isofs cdrom uio_hv_generic uio binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common rpcrdma skx_edac_common nfit sunrpc libnvdimm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 rdma_ucm ib_iser sha1_ssse3 rdma_cm aesni_intel iw_cm gf128mul crypto_simd libiscsi cryptd ib_umad ib_ipoib scsi_transport_iscsi ib_cm rapl sg hv_utils hv_balloon evdev pcspkr joydev mpls_router ip_tunnel ramoops configfs pstore_blk efi_pstore pstore_zone nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs ip_tables x_tables autofs4 overlay squashfs dm_verity dm_bufio reed_solomon dm_mod loop ext4 crc16 mbcache jbd2 crc32c_generic mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper sd_mod drm_kms_helper hv_storvsc scsi_transport_fc drm scsi_mod hid_generic hid_hyperv hid serio_raw hv_netvsc hyperv_keyboard scsi_common hv_vmbus
> >> [ 19.726466] crc32_pclmul crc32c_intel
> >> [ 19.765771] CR2: 00000000000000a0
> >> [ 19.767524] ---[ end trace 0000000000000000 ]---
> >> [ 19.800433] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> >> [ 19.803170] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> >> [ 19.811041] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> >> [ 19.813466] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> >> [ 19.816504] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> >> [ 19.819484] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> >> [ 19.822625] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> >> [ 19.825569] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> >> [ 19.828804] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> >> [ 19.832214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 19.834709] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> >> [ 19.837976] Kernel panic - not syncing: Fatal exception in interrupt
> >> [ 19.841825] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >> [ 19.896620] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> >>
>
> <snip>
>
> > > The offending commit appears to be the backport of b15b7d2a1b09
> > ("uio_hv_generic: Let userspace take care of interrupt mask") for
> > 6.12.y.
> >
> > Peter confirmed that reverting this commit on top of 6.12.57-1 as
> > packaged in Debian resolves indeed the issue. Interestingly the issue
> > is *not* seen with 6.17.7 based kernel in Debian.
> >
> > #regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
> > #regzbot monitor: https://bugs.debian.org/1120602
> >
> > Thank you already!
> >
> > Regards,
> > Salvatore
>
> Hi Peter, Salvatore,
> Thanks for reporting this crash, and sorry for the trouble. Here is my
> analysis.
>
> On 6.17.7, where commit d062463edf17 ("uio_hv_generic: Set event for all
> channels on the device") is present, hv_uio_irqcontrol() supports
> setting of interrupt mask from userspace for sub-channels as well.
>
> This aligns with commit e29587c07537 ("uio_hv_generic: Let userspace
> take care of interrupt mask") which relies on userspace to manage
> interrupt mask, so it safely removes the interrupt mask management logic
> in the driver.
>
> However, in 6.12.57, the first commit is not present, but the second one
> is, so there is no way to disable interrupt mask for sub-channels and
> interrupt_mask stays 0, which means interrupts are not masked. So we may
> be having an interrupt callback being handled for a sub-channel, where
> we do not expect it to come. This may be causing this issue.
>
> This would have led to a crash in hv_uio_channel_cb() for sub-channels:
> struct hv_device *hv_dev = chan->device_obj;
>
>
> I have ported commit d062463edf17 ("uio_hv_generic: Set event for all
> channels on the device") on 6.12.57, and resolved some merge conflicts.
> Could you please help with testing this, if it works for you.

Applying the patch against the debian 6.12.57 kernel worked, I am no
longer seeing that panic on boot:

gnos@vEdge:~$ uname -a
Linux vEdge 6.12+unreleased-amd64 #1 SMP PREEMPT_DYNAMIC Debian
6.12.57-1a~test (2025-11-14) x86_64 GNU/Linux
gnos@vEdge:~$ uptime
 11:46:33 up 4 min, 1 user, load average: 3.31, 2.07, 0.89
gnos@vEdge:~$ sudo dmidecode -t system
# dmidecode 3.6
Getting SMBIOS data from sysfs.
SMBIOS 3.1.0 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: Microsoft Corporation
	Product Name: Virtual Machine
	Version: Hyper-V UEFI Release v4.1
	Serial Number: 0000-0002-8036-1108-7588-3134-50
	UUID: 26e86d6e-140c-496a-862c-a3b3bbcd16ad
	Wake-up Type: Power Switch
	SKU Number: None
	Family: Virtual Machine

Handle 0x0010, DMI type 32, 11 bytes
System Boot Information
	Status: No errors detected

gnos@vEdge:~$

Thanks a lot for the quick analysis!

Peter.

>
> Hi Long,
> If this works, do you see any concerns if I back-port your patch on
> older kernels (6.12 and prior)?
>
> Regards,
> Naman
>
> --------------
> Patch:
>
> From 2f14d48d2bde3f86b153b9f756a9cd688cda3453 Mon Sep 17 00:00:00 2001
> From: Long Li <longli@microsoft.com>
> Date: Mon, 10 Mar 2025 15:12:01 -0700
> Subject: [PATCH] uio_hv_generic: Set event for all channels on the device
>
> Hyper-V may offer a non latency sensitive device with subchannels without
> monitor bit enabled. The decision is entirely on the Hyper-V host not
> configurable within guest.
>
> When a device has subchannels, also signal events for the subchannel
> if its monitor bit is disabled.
>
> This patch also removes the memory barrier when monitor bit is enabled
> as it is not necessary. The memory barrier is only needed between
> setting up interrupt mask and calling vmbus_set_event() when monitor
> bit is disabled.
>
> Signed-off-by: Long Li <longli@microsoft.com>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
> drivers/uio/uio_hv_generic.c | 32 ++++++++++++++++++++++++++------
> 1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
> index 0b414d1168dd..9f3b124a5e09 100644
> --- a/drivers/uio/uio_hv_generic.c
> +++ b/drivers/uio/uio_hv_generic.c
> @@ -65,6 +65,16 @@ struct hv_uio_private_data {
>  	char send_name[32];
>  };
>
> +static void set_event(struct vmbus_channel *channel, s32 irq_state)
> +{
> +	channel->inbound.ring_buffer->interrupt_mask = !irq_state;
> +	if (!channel->offermsg.monitor_allocated && irq_state) {
> +		/* MB is needed for host to see the interrupt mask first */
> +		virt_mb();
> +		vmbus_set_event(channel);
> +	}
> +}
> +
>  /*
>   * This is the irqcontrol callback to be registered to uio_info.
>   * It can be used to disable/enable interrupt from user space processes.
> @@ -79,12 +89,15 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
>  {
>  	struct hv_uio_private_data *pdata = info->priv;
>  	struct hv_device *dev = pdata->device;
> +	struct vmbus_channel *primary, *sc;
>
> -	dev->channel->inbound.ring_buffer->interrupt_mask = !irq_state;
> -	virt_mb();
> +	primary = dev->channel;
> +	set_event(primary, irq_state);
>
> -	if (!dev->channel->offermsg.monitor_allocated && irq_state)
> -		vmbus_setevent(dev->channel);
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +	list_for_each_entry(sc, &primary->sc_list, sc_list)
> +		set_event(sc, irq_state);
> +	mutex_unlock(&vmbus_connection.channel_mutex);
>
>  	return 0;
>  }
> @@ -95,11 +108,18 @@ hv_uio_irqcontrol(struct uio_info *info, s32 irq_state)
>  static void hv_uio_channel_cb(void *context)
>  {
>  	struct vmbus_channel *chan = context;
> -	struct hv_device *hv_dev = chan->device_obj;
> -	struct hv_uio_private_data *pdata = hv_get_drvdata(hv_dev);
> +	struct hv_device *hv_dev;
> +	struct hv_uio_private_data *pdata;
>
>  	virt_mb();
>
> +	/*
> +	 * The callback may come from a subchannel, in which case look
> +	 * for the hv device in the primary channel
> +	 */
> +	hv_dev = chan->primary_channel ?
> +		chan->primary_channel->device_obj : chan->device_obj;
> +	pdata = hv_get_drvdata(hv_dev);
>  	uio_event_notify(&pdata->info);
>  }
>
> --
> 2.43.0
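
For readers following the thread, the device lookup that the quoted patch
changes in hv_uio_channel_cb() can be modelled in user space. This is only a
sketch under stated assumptions: the mock_* structs and both lookup helpers
are hypothetical stand-ins rather than the kernel's definitions, and it
assumes (as the analysis in this thread suggests) that a sub-channel has a
NULL device_obj while its primary_channel pointer is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hv_device and struct vmbus_channel. */
struct mock_hv_device {
	void *drvdata;
};

struct mock_channel {
	struct mock_hv_device *device_obj;    /* assumed NULL on a sub-channel */
	struct mock_channel *primary_channel; /* NULL on the primary channel */
};

/* Lookup without the fix: always uses the channel's own device_obj. */
static struct mock_hv_device *lookup_before(const struct mock_channel *chan)
{
	return chan->device_obj;
}

/* Lookup with the ported patch: a sub-channel borrows the primary's device. */
static struct mock_hv_device *lookup_after(const struct mock_channel *chan)
{
	return chan->primary_channel ?
		chan->primary_channel->device_obj : chan->device_obj;
}
```

In this model lookup_before() returns NULL for a sub-channel, so any later
read of driver data through that pointer would fault, whereas lookup_after()
returns the primary channel's device for both channel kinds.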
* Re: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-14 11:49 ` Peter Morrow
@ 2025-11-14 14:35 ` Naman Jain
2025-11-14 21:44 ` Bug#1120602: " Salvatore Bonaccorso
0 siblings, 1 reply; 8+ messages in thread
From: Naman Jain @ 2025-11-14 14:35 UTC (permalink / raw)
To: Peter Morrow
Cc: Salvatore Bonaccorso, Long Li, 1120602, linux-hyperv,
linux-kernel, regressions, stable, John Starks, Michael Kelley,
Tianyu Lan, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman
On 11/14/2025 5:19 PM, Peter Morrow wrote:
> Hi Naman,
>
> On Fri, 14 Nov 2025 at 06:03, Naman Jain <namjain@linux.microsoft.com> wrote:
>>
>>
>>
>> On 11/13/2025 11:59 PM, Salvatore Bonaccorso wrote:
>>> Peter Morrow reported in Debian a regression, reported in
>>> https://bugs.debian.org/1120602 . The regression was seen after
>>> updating, to 6.12.57-1 in Debian, but details on the offending commit
>>> follows.
>>>
>>> His report was as follows:
>>>
>>>> Dear Maintainer,
>>>>
>>>> I'm seeing a kernel crash quite soon after boot on a debian trixie based
>>>> system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
>>>> I can access the system to gather more information. Thus I'll provide details
>>>> of the system using a previously known good version. The panic is happening
>>>> 100% of the time unfortunately. I have access to the serial console however
>>>> so can enable any required verbose logging during boot if necessary.
>>>>
>>>> Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
>>>> same userspace. We had pinned to that version until very recently in order
>>>> to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
>>>>
>>>> I'm running a dpdk application here (VPP) on Azure, VM form factor is a
>>>> "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
>>>>
>>>> The only relevant upstream commit in this area (as far as I can see) is:
>>>>
>>>> https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
>>>>
>>>> The comment regarding avoiding races at start adds a bit more weight behind this
>>>> hunch, though it's only a hunch as I am most definitely nowhere near an expert
>>>> in this area.
>>>>
>>>> -- Package-specific info:
>>>>
>>>> [ 19.625535] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>>>> [ 19.628874] #PF: supervisor read access in kernel mode
>>>> [ 19.630841] #PF: error_code(0x0000) - not-present page
>>>> [ 19.632788] PGD 0 P4D 0
>>>> [ 19.633905] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>>>> [ 19.635586] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
>>>> [ 19.640216] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
>>>> [ 19.644514] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
>>>> [ 19.646994] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
>>>> [ 19.654377] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
>>>> [ 19.656385] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
>>>> [ 19.659240] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
>>>> [ 19.662168] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
>>>> [ 19.665239] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
>>>> [ 19.668193] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
>>>> [ 19.671106] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
>>>> [ 19.674281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 19.676533] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
>>>> [ 19.679385] Call Trace:
>>>> [ 19.680361] <IRQ>
>>>> [ 19.681181] vmbus_isr+0x1a5/0x210 [hv_vmbus]
>>>> [ 19.682916] __sysvec_hyperv_callback+0x32/0x60
>>>> [ 19.684991] sysvec_hyperv_callback+0x6c/0x90
>>>> [ 19.686665] </IRQ>
>>>> [ 19.687509] <TASK>
>>>> [ 19.688366] asm_sysvec_hyperv_callback+0x1a/0x20
>>>> [ 19.690262] RIP: 0010:pv_native_safe_halt+0xf/0x20
>>>> [ 19.692067] Code: 09 e9 c5 08 01 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d e5 3b 31 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>>>> [ 19.699119] RSP: 0018:ffffb15ac0103ed8 EFLAGS: 00000246
>>>> [ 19.701412] RAX: 0000000000000003 RBX: ffff8ff5403b1fc0 RCX: ffff8ff54c64ce30
>>>> [ 19.704328] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000001f894
>>>> [ 19.706910] RBP: 0000000000000003 R08: 000000000bb760d9 R09: 00fca75150b080e9
>>>> [ 19.709762] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000000
>>>> [ 19.712510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [ 19.715173] default_idle+0x9/0x20
>>>> [ 19.716846] default_idle_call+0x29/0x100
>>>> [ 19.718623] do_idle+0x1fe/0x240
>>>> [ 19.720045] cpu_startup_entry+0x29/0x30
>>>> [ 19.721595] start_secondary+0x11e/0x140
>>>> [ 19.723080] common_startup_64+0x13e/0x141
>>>> [ 19.725222] </TASK>
>>>> [ 19.726387] Modules linked in: isofs cdrom uio_hv_generic uio binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common rpcrdma skx_edac_common nfit sunrpc libnvdimm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 rdma_ucm ib_iser sha1_ssse3 rdma_cm aesni_intel iw_cm gf128mul crypto_simd libiscsi cryptd ib_umad ib_ipoib scsi_transport_iscsi ib_cm rapl sg hv_utils hv_balloon evdev pcspkr joydev mpls_router ip_tunnel ramoops configfs pstore_blk efi_pstore pstore_zone nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs ip_tables x_tables autofs4 overlay squashfs dm_verity dm_bufio reed_solomon dm_mod loop ext4 crc16 mbcache jbd2 crc32c_generic mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper sd_mod drm_kms_helper hv_storvsc scsi_transport_fc drm scsi_mod hid_generic hid_hyperv hid serio_raw hv_netvsc hyperv_keyboard scsi_common hv_vmbus
>>>> [ 19.726466] crc32_pclmul crc32c_intel
>>>> [ 19.765771] CR2: 00000000000000a0
>>>> [ 19.767524] ---[ end trace 0000000000000000 ]---
>>>> [ 19.800433] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
>>>> [ 19.803170] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
>>>> [ 19.811041] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
>>>> [ 19.813466] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
>>>> [ 19.816504] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
>>>> [ 19.819484] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
>>>> [ 19.822625] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
>>>> [ 19.825569] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
>>>> [ 19.828804] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
>>>> [ 19.832214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 19.834709] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
>>>> [ 19.837976] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [ 19.841825] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> [ 19.896620] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>
>>
>> <snip>
>>
>>> The offending commit appears to be the backport of b15b7d2a1b09
>>> ("uio_hv_generic: Let userspace take care of interrupt mask") for
>>> 6.12.y.
>>>
>>> Peter confirmed that reverting this commit on top of 6.12.57-1 as
>>> packaged in Debian resolves indeed the issue. Interestingly the issue
>>> is *not* seen with 6.17.7 based kernel in Debian.
>>>
>>> #regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
>>> #regzbot monitor: https://bugs.debian.org/1120602
>>>
>>> Thank you already!
>>>
>>> Regards,
>>> Salvatore
>>
>> Hi Peter, Salvatore,
>> Thanks for reporting this crash, and sorry for the trouble. Here is my
>> analysis.
>>
>> On 6.17.7, where commit d062463edf17 ("uio_hv_generic: Set event for all
>> channels on the device") is present, hv_uio_irqcontrol() supports
>> setting of interrupt mask from userspace for sub-channels as well.
>>
>> This aligns with commit e29587c07537 ("uio_hv_generic: Let userspace
>> take care of interrupt mask") which relies on userspace to manage
>> interrupt mask, so it safely removes the interrupt mask management logic
>> in the driver.
>>
>> However, in 6.12.57, the first commit is not present, but the second one
>> is, so there is no way to disable interrupt mask for sub-channels and
>> interrupt_mask stays 0, which means interrupts are not masked. So we may
>> be having an interrupt callback being handled for a sub-channel, where
>> we do not expect it to come. This may be causing this issue.
>>
>> This would have led to a crash in hv_uio_channel_cb() for sub-channels:
>> struct hv_device *hv_dev = chan->device_obj;
>>
>>
>> I have ported commit d062463edf17 ("uio_hv_generic: Set event for all
>> channels on the device") on 6.12.57, and resolved some merge conflicts.
>> Could you please help with testing this, if it works for you.
>
> Applying the patch against the debian 6.12.57 kernel worked, I am no
> longer seeing that panic on boot:
>
> gnos@vEdge:~$ uname -a
> Linux vEdge 6.12+unreleased-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> 6.12.57-1a~test (2025-11-14) x86_64 GNU/Linux
> gnos@vEdge:~$ uptime
> 11:46:33 up 4 min, 1 user, load average: 3.31, 2.07, 0.89
> gnos@vEdge:~$ sudo dmidecode -t system
> # dmidecode 3.6
> Getting SMBIOS data from sysfs.
> SMBIOS 3.1.0 present.
>
> Handle 0x0001, DMI type 1, 27 bytes
> System Information
> Manufacturer: Microsoft Corporation
> Product Name: Virtual Machine
> Version: Hyper-V UEFI Release v4.1
> Serial Number: 0000-0002-8036-1108-7588-3134-50
> UUID: 26e86d6e-140c-496a-862c-a3b3bbcd16ad
> Wake-up Type: Power Switch
> SKU Number: None
> Family: Virtual Machine
>
> Handle 0x0010, DMI type 32, 11 bytes
> System Boot Information
> Status: No errors detected
>
> gnos@vEdge:~$
>
> Thanks a lot for the quick analysis!
>
> Peter.

Hi Peter,

Thanks for confirming. I am discussing this with Long Li, to hear his
thoughts on this, and have kept the patch ready.
Porting the same on 6.6 and older kernels would be a little different
since we don't have commit 547fa4ffd799 ("uio_hv_generic: Enable
interrupt for low speed VMBus devices") on these kernels and this would
lead to merge conflicts, which need to be handled separately.

Meanwhile, if I should be including any tags in the fix patch for the
Debian bug, please let me know.

Regards,
Naman
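
The register dump in the quoted trace appears consistent with this analysis.
Decoding the bytes around the marked instruction in the Code: line gives
"mov 0x10(%rdi),%rax" (48 8b 47 10) followed by the faulting
"mov 0xa0(%rax),%rdi" (48 8b b8 a0 00 00 00). With RAX = 0, the second load
reads address 0xa0, which matches CR2. The sketch below only reproduces that
arithmetic; mapping 0x10 to chan->device_obj and 0xa0 to the driver data
pointer inside struct hv_device is an assumption about the 6.12 structure
layout, and the macro and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Displacements read from the quoted Code: bytes. */
#define DEVICE_OBJ_DISP 0x10u /* assumed: offset of chan->device_obj */
#define DRVDATA_DISP    0xa0u /* assumed: offset of drvdata in hv_device */

/* Address read by the first load, given the channel pointer in RDI. */
static uint64_t first_load_address(uint64_t chan)
{
	return chan + DEVICE_OBJ_DISP;
}

/* Address read by the second load, given what the first load returned. */
static uint64_t faulting_address(uint64_t device_obj)
{
	return device_obj + DRVDATA_DISP;
}
```

With RDI = ffff8ff69c759400 from the trace, the first load reads
ffff8ff69c759410; a zero result there makes the second load touch 0xa0.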
* Re: Bug#1120602: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-14 14:35 ` Naman Jain
@ 2025-11-14 21:44 ` Salvatore Bonaccorso
2025-11-15 9:03 ` Naman Jain
0 siblings, 1 reply; 8+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-14 21:44 UTC (permalink / raw)
To: Naman Jain, 1120602
Cc: Peter Morrow, Long Li, linux-hyperv, linux-kernel, regressions,
stable, John Starks, Michael Kelley, Tianyu Lan, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Greg Kroah-Hartman

Hi,

On Fri, Nov 14, 2025 at 08:05:55PM +0530, Naman Jain wrote:
>
>
> On 11/14/2025 5:19 PM, Peter Morrow wrote:
> > Hi Naman,
> >
> > On Fri, 14 Nov 2025 at 06:03, Naman Jain <namjain@linux.microsoft.com> wrote:
> > >
> > >
> > >
> > > On 11/13/2025 11:59 PM, Salvatore Bonaccorso wrote:
> > > > Peter Morrow reported in Debian a regression, reported in
> > > > https://bugs.debian.org/1120602 . The regression was seen after
> > > > updating, to 6.12.57-1 in Debian, but details on the offending commit
> > > > follows.
> > > >
> > > > His report was as follows:
> > > >
> > > > > Dear Maintainer,
> > > > >
> > > > > I'm seeing a kernel crash quite soon after boot on a debian trixie based
> > > > > system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
> > > > > I can access the system to gather more information. Thus I'll provide details
> > > > > of the system using a previously known good version. The panic is happening
> > > > > 100% of the time unfortunately. I have access to the serial console however
> > > > > so can enable any required verbose logging during boot if necessary.
> > > > >
> > > > > Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
> > > > > > same userspace. We had pinned to that version until very recently in order
> > > > > to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
> > > > >
> > > > > I'm running a dpdk application here (VPP) on Azure, VM form factor is a
> > > > > "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
> > > > >
> > > > > The only relevant upstream commit in this area (as far as I can see) is:
> > > > >
> > > > > https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
> > > > >
> > > > > The comment regarding avoiding races at start adds a bit more weight behind this
> > > > > hunch, though it's only a hunch as I am most definitely nowhere near an expert
> > > > > in this area.
> > > > >
> > > > > -- Package-specific info:
> > > > >
> > > > > [ 19.625535] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> > > > > [ 19.628874] #PF: supervisor read access in kernel mode
> > > > > [ 19.630841] #PF: error_code(0x0000) - not-present page
> > > > > [ 19.632788] PGD 0 P4D 0
> > > > > [ 19.633905] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [ 19.635586] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
> > > > > [ 19.640216] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
> > > > > [ 19.644514] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> > > > > [ 19.646994] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> > > > > [ 19.654377] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> > > > > [ 19.656385] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> > > > > [ 19.659240] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> > > > > [ 19.662168] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> > > > > [ 19.665239] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> > > > > [ 19.668193] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> > > > > [ 19.671106] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> > > > > [ 19.674281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 19.676533] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> > > > > [ 19.679385] Call Trace:
> > > > > [ 19.680361] <IRQ>
> > > > > [ 19.681181] vmbus_isr+0x1a5/0x210 [hv_vmbus]
> > > > > [ 19.682916] __sysvec_hyperv_callback+0x32/0x60
> > > > > [ 19.684991] sysvec_hyperv_callback+0x6c/0x90
> > > > > [ 19.686665] </IRQ>
> > > > > [ 19.687509] <TASK>
> > > > > [ 19.688366] asm_sysvec_hyperv_callback+0x1a/0x20
> > > > > [ 19.690262] RIP: 0010:pv_native_safe_halt+0xf/0x20
> > > > > [ 19.692067] Code: 09 e9 c5 08 01 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d e5 3b 31 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> > > > > [ 19.699119] RSP: 0018:ffffb15ac0103ed8 EFLAGS: 00000246
> > > > > [ 19.701412] RAX: 0000000000000003 RBX: ffff8ff5403b1fc0 RCX: ffff8ff54c64ce30
> > > > > [ 19.704328] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000001f894
> > > > > [ 19.706910] RBP: 0000000000000003 R08: 000000000bb760d9 R09: 00fca75150b080e9
> > > > > [ 19.709762] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000000
> > > > > [ 19.712510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > > [ 19.715173] default_idle+0x9/0x20
> > > > > [ 19.716846] default_idle_call+0x29/0x100
> > > > > [ 19.718623] do_idle+0x1fe/0x240
> > > > > [ 19.720045] cpu_startup_entry+0x29/0x30
> > > > > [ 19.721595] start_secondary+0x11e/0x140
> > > > > [ 19.723080] common_startup_64+0x13e/0x141
> > > > > [ 19.725222] </TASK>
> > > > > [ 19.726387] Modules linked in: isofs cdrom uio_hv_generic uio binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common rpcrdma skx_edac_common nfit sunrpc libnvdimm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 rdma_ucm ib_iser sha1_ssse3 rdma_cm aesni_intel iw_cm gf128mul crypto_simd libiscsi cryptd ib_umad ib_ipoib scsi_transport_iscsi ib_cm rapl sg hv_utils hv_balloon evdev pcspkr joydev mpls_router ip_tunnel ramoops configfs pstore_blk efi_pstore pstore_zone nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs ip_tables x_tables autofs4 overlay squashfs dm_verity dm_bufio reed_solomon dm_mod loop ext4 crc16 mbcache jbd2 crc32c_generic mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper sd_mod drm_kms_helper hv_storvsc scsi_transport_fc drm scsi_mod hid_generic hid_hyperv hid serio_raw hv_netvsc hyperv_keyboard scsi_common hv_vmbus
> > > > > [ 19.726466] crc32_pclmul crc32c_intel
> > > > > [ 19.765771] CR2: 00000000000000a0
> > > > > [ 19.767524] ---[ end trace 0000000000000000 ]---
> > > > > [ 19.800433] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
> > > > > [ 19.803170] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
> > > > > [ 19.811041] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
> > > > > [ 19.813466] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
> > > > > [ 19.816504] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
> > > > > [ 19.819484] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
> > > > > [ 19.822625] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
> > > > > [ 19.825569] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
> > > > > [ 19.828804] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
> > > > > [ 19.832214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 19.834709] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
> > > > > [ 19.837976] Kernel panic - not syncing: Fatal exception in interrupt
> > > > > [ 19.841825] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > [ 19.896620] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > > > >
> > >
> > > <snip>
> > >
> > > > The offending commit appears to be the backport of b15b7d2a1b09
> > > > ("uio_hv_generic: Let userspace take care of interrupt mask") for
> > > > 6.12.y.
> > > >
> > > > Peter confirmed that reverting this commit on top of 6.12.57-1 as
> > > > packaged in Debian resolves indeed the issue. Interestingly the issue
> > > > is *not* seen with 6.17.7 based kernel in Debian.
> > > >
> > > > #regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
> > > > #regzbot monitor: https://bugs.debian.org/1120602
> > > >
> > > > Thank you already!
> > > >
> > > > Regards,
> > > > Salvatore
> > >
> > > Hi Peter, Salvatore,
> > > Thanks for reporting this crash, and sorry for the trouble. Here is my
> > > analysis.
> > >
> > > On 6.17.7, where commit d062463edf17 ("uio_hv_generic: Set event for all
> > > channels on the device") is present, hv_uio_irqcontrol() supports
> > > setting of interrupt mask from userspace for sub-channels as well.
> > >
> > > This aligns with commit e29587c07537 ("uio_hv_generic: Let userspace
> > > take care of interrupt mask") which relies on userspace to manage
> > > interrupt mask, so it safely removes the interrupt mask management logic
> > > in the driver.
> > >
> > > However, in 6.12.57, the first commit is not present, but the second one
> > > is, so there is no way to disable interrupt mask for sub-channels and
> > > interrupt_mask stays 0, which means interrupts are not masked. So we may
> > > be having an interrupt callback being handled for a sub-channel, where
> > > we do not expect it to come. This may be causing this issue.
> > >
> > > This would have led to a crash in hv_uio_channel_cb() for sub-channels:
> > > struct hv_device *hv_dev = chan->device_obj;
> > >
> > >
> > > I have ported commit d062463edf17 ("uio_hv_generic: Set event for all
> > > channels on the device") on 6.12.57, and resolved some merge conflicts.
> > > Could you please help with testing this, if it works for you.
> >
> > Applying the patch against the debian 6.12.57 kernel worked, I am no
> > longer seeing that panic on boot:
> >
> > gnos@vEdge:~$ uname -a
> > Linux vEdge 6.12+unreleased-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> > 6.12.57-1a~test (2025-11-14) x86_64 GNU/Linux
> > gnos@vEdge:~$ uptime
> > 11:46:33 up 4 min, 1 user, load average: 3.31, 2.07, 0.89
> > gnos@vEdge:~$ sudo dmidecode -t system
> > # dmidecode 3.6
> > Getting SMBIOS data from sysfs.
> > SMBIOS 3.1.0 present.
> >
> > Handle 0x0001, DMI type 1, 27 bytes
> > System Information
> > Manufacturer: Microsoft Corporation
> > Product Name: Virtual Machine
> > Version: Hyper-V UEFI Release v4.1
> > Serial Number: 0000-0002-8036-1108-7588-3134-50
> > UUID: 26e86d6e-140c-496a-862c-a3b3bbcd16ad
> > Wake-up Type: Power Switch
> > SKU Number: None
> > Family: Virtual Machine
> >
> > Handle 0x0010, DMI type 32, 11 bytes
> > System Boot Information
> > Status: No errors detected
> >
> > gnos@vEdge:~$
> >
> > Thanks a lot for the quick analysis!
> >
> > Peter.
>
> Hi Peter,
>
> Thanks for confirming. I am discussing this with Long Li, to hear his
> thoughts on this, and have kept the patch ready.
> Porting the same on 6.6 and older kernels would be a little different since
> we don't have commit 547fa4ffd799 ("uio_hv_generic: Enable interrupt for low
> speed VMBus devices") on these kernels and this would lead to merge
> conflicts, which needs to be handled separately.
>
> Meanwhile, if I should be including any tags in the fix patch for debian
> bug, please let me know.

Thank you very much for the quick analysis and fix.

If you can add a Closes: https://bugs.debian.org/1120602 that would
make our tracking for the fixes easier. But not sure if this is
allowed for proposing the backport for a stable series, as it did not
affect the upper releases.

In any case your work is much appreciated!

Regards,
Salvatore
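
The failure mode described in the quoted analysis can be modeled in a few
lines of user-space C. This is only a sketch under stated assumptions: the
names model_device, model_channel, model_host_signal, model_channel_cb and
model_deliver are hypothetical stand-ins, not the uio_hv_generic or hv_vmbus
code, and the model assumes that a sub-channel carries no device object of
its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's hv_device or vmbus_channel. */
struct model_device {
	int events;				/* events delivered to userspace */
};

struct model_channel {
	unsigned int interrupt_mask;		/* 0: host may interrupt, 1: masked */
	struct model_device *device_obj;	/* assumed NULL on a sub-channel */
};

/* The host raises an interrupt only while the mask is clear. */
bool model_host_signal(const struct model_channel *chan)
{
	assert(chan != NULL);
	return chan->interrupt_mask == 0;
}

/*
 * Callback in the shape of the crashing one: it reads device_obj and then
 * uses it. The NULL check exists only so the model can report the failure
 * instead of faulting.
 */
int model_channel_cb(struct model_channel *chan)
{
	struct model_device *dev = chan->device_obj;

	if (dev == NULL)
		return -1;	/* an unguarded callback would dereference NULL here */
	dev->events++;
	return 0;
}

/* Returns 1 if the event was delivered, 0 if masked, -1 on the modeled crash. */
int model_deliver(struct model_channel *chan)
{
	if (!model_host_signal(chan))
		return 0;
	return model_channel_cb(chan) == 0 ? 1 : -1;
}
```

In this model a sub-channel with interrupt_mask left at 0 reaches the
callback and hits the NULL device_obj, while the same sub-channel with
interrupt_mask set to 1 never reaches it.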

* Re: Bug#1120602: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-14 21:44 ` Bug#1120602: " Salvatore Bonaccorso
@ 2025-11-15 9:03 ` Naman Jain
2025-11-21 10:04 ` Peter Morrow
0 siblings, 1 reply; 8+ messages in thread
From: Naman Jain @ 2025-11-15 9:03 UTC (permalink / raw)
To: Salvatore Bonaccorso, 1120602
Cc: Peter Morrow, Long Li, linux-hyperv, linux-kernel, regressions,
stable, John Starks, Michael Kelley, Tianyu Lan, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Greg Kroah-Hartman
On 11/15/2025 3:14 AM, Salvatore Bonaccorso wrote:
> Hi,
>
> On Fri, Nov 14, 2025 at 08:05:55PM +0530, Naman Jain wrote:
>>
>>
>> On 11/14/2025 5:19 PM, Peter Morrow wrote:
>>> Hi Naman,
>>>
>>> On Fri, 14 Nov 2025 at 06:03, Naman Jain <namjain@linux.microsoft.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/13/2025 11:59 PM, Salvatore Bonaccorso wrote:
>>>>> Peter Morrow reported in Debian a regression, reported in
>>>>> https://bugs.debian.org/1120602 . The regression was seen after
>>>>> updating to 6.12.57-1 in Debian, but details on the offending commit
>>>>> follow.
>>>>>
>>>>> His report was as follows:
>>>>>
>>>>>> Dear Maintainer,
>>>>>>
>>>>>> I'm seeing a kernel crash quite soon after boot on a debian trixie based
>>>>>> system running 6.12.57+deb13-amd64, unfortunately the kernel panics before
>>>>>> I can access the system to gather more information. Thus I'll provide details
>>>>>> of the system using a previously known good version. The panic is happening
>>>>>> 100% of the time unfortunately. I have access to the serial console however
>>>>>> so can enable any required verbose logging during boot if necessary.
>>>>>>
>>>>>> Crucially the crash is not seen with kernel version 6.12.41+deb13-amd64 with the
>>>>>> same userspace. We had pinned to that version until very recently in order
>>>>>> to work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109676
>>>>>>
>>>>>> I'm running a dpdk application here (VPP) on Azure, VM form factor is a
>>>>>> "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
>>>>>>
>>>>>> The only relevant upstream commit in this area (as far as I can see) is:
>>>>>>
>>>>>> https://lore.kernel.org/linux-hyperv/1bb599ee-fe28-409d-b430-2fc086268936@linux.microsoft.com/
>>>>>>
>>>>>> The comment regarding avoiding races at start adds a bit more weight behind this
>>>>>> hunch, though it's only a hunch as I am most definitely nowhere near an expert
>>>>>> in this area.
>>>>>>
>>>>>> -- Package-specific info:
>>>>>>
>>>>>> [ 19.625535] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>>>>>> [ 19.628874] #PF: supervisor read access in kernel mode
>>>>>> [ 19.630841] #PF: error_code(0x0000) - not-present page
>>>>>> [ 19.632788] PGD 0 P4D 0
>>>>>> [ 19.633905] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>> [ 19.635586] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
>>>>>> [ 19.640216] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
>>>>>> [ 19.644514] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
>>>>>> [ 19.646994] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
>>>>>> [ 19.654377] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
>>>>>> [ 19.656385] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
>>>>>> [ 19.659240] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
>>>>>> [ 19.662168] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
>>>>>> [ 19.665239] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
>>>>>> [ 19.668193] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
>>>>>> [ 19.671106] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
>>>>>> [ 19.674281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 19.676533] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
>>>>>> [ 19.679385] Call Trace:
>>>>>> [ 19.680361] <IRQ>
>>>>>> [ 19.681181] vmbus_isr+0x1a5/0x210 [hv_vmbus]
>>>>>> [ 19.682916] __sysvec_hyperv_callback+0x32/0x60
>>>>>> [ 19.684991] sysvec_hyperv_callback+0x6c/0x90
>>>>>> [ 19.686665] </IRQ>
>>>>>> [ 19.687509] <TASK>
>>>>>> [ 19.688366] asm_sysvec_hyperv_callback+0x1a/0x20
>>>>>> [ 19.690262] RIP: 0010:pv_native_safe_halt+0xf/0x20
>>>>>> [ 19.692067] Code: 09 e9 c5 08 01 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d e5 3b 31 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>>>>>> [ 19.699119] RSP: 0018:ffffb15ac0103ed8 EFLAGS: 00000246
>>>>>> [ 19.701412] RAX: 0000000000000003 RBX: ffff8ff5403b1fc0 RCX: ffff8ff54c64ce30
>>>>>> [ 19.704328] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 000000000001f894
>>>>>> [ 19.706910] RBP: 0000000000000003 R08: 000000000bb760d9 R09: 00fca75150b080e9
>>>>>> [ 19.709762] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000000
>>>>>> [ 19.712510] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>>>> [ 19.715173] default_idle+0x9/0x20
>>>>>> [ 19.716846] default_idle_call+0x29/0x100
>>>>>> [ 19.718623] do_idle+0x1fe/0x240
>>>>>> [ 19.720045] cpu_startup_entry+0x29/0x30
>>>>>> [ 19.721595] start_secondary+0x11e/0x140
>>>>>> [ 19.723080] common_startup_64+0x13e/0x141
>>>>>> [ 19.725222] </TASK>
>>>>>> [ 19.726387] Modules linked in: isofs cdrom uio_hv_generic uio binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_mbox_msr isst_if_common rpcrdma skx_edac_common nfit sunrpc libnvdimm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 rdma_ucm ib_iser sha1_ssse3 rdma_cm aesni_intel iw_cm gf128mul crypto_simd libiscsi cryptd ib_umad ib_ipoib scsi_transport_iscsi ib_cm rapl sg hv_utils hv_balloon evdev pcspkr joydev mpls_router ip_tunnel ramoops configfs pstore_blk efi_pstore pstore_zone nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs ip_tables x_tables autofs4 overlay squashfs dm_verity dm_bufio reed_solomon dm_mod loop ext4 crc16 mbcache jbd2 crc32c_generic mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv pci_hyperv_intf hyperv_drm drm_shmem_helper sd_mod drm_kms_helper hv_storvsc scsi_transport_fc drm scsi_mod hid_generic hid_hyperv hid serio_raw hv_netvsc hyperv_keyboard scsi_common hv_vmbus
>>>>>> [ 19.726466] crc32_pclmul crc32c_intel
>>>>>> [ 19.765771] CR2: 00000000000000a0
>>>>>> [ 19.767524] ---[ end trace 0000000000000000 ]---
>>>>>> [ 19.800433] RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
>>>>>> [ 19.803170] Code: 02 00 00 5b 5d e9 53 98 69 e9 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 10 <48> 8b b8 a0 00 00 00 f0 83 44 24 fc 00 e9 51 6f fa ff 90 90 90 90
>>>>>> [ 19.811041] RSP: 0018:ffffb15ac01a4fa8 EFLAGS: 00010046
>>>>>> [ 19.813466] RAX: 0000000000000000 RBX: 0000000000000015 RCX: 0000000000000015
>>>>>> [ 19.816504] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff8ff69c759400
>>>>>> [ 19.819484] RBP: ffff8ff548790200 R08: ffff8ff548790200 R09: 00fca75150b080e9
>>>>>> [ 19.822625] R10: 0000000000000000 R11: ffffb15ac01a4ff8 R12: ffff8ff871dc1480
>>>>>> [ 19.825569] R13: ffff8ff69c759400 R14: ffff8ff69c7596a0 R15: ffffffffc106e160
>>>>>> [ 19.828804] FS: 0000000000000000(0000) GS:ffff8ff871d80000(0000) knlGS:0000000000000000
>>>>>> [ 19.832214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 19.834709] CR2: 00000000000000a0 CR3: 0000000100ba6003 CR4: 00000000003706f0
>>>>>> [ 19.837976] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>> [ 19.841825] Kernel Offset: 0x28a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>> [ 19.896620] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>>>>>>
>>>>
>>>> <snip>
>>>>
>>>>> The offending commit appears to be the backport of b15b7d2a1b09
>>>>> ("uio_hv_generic: Let userspace take care of interrupt mask") for
>>>>> 6.12.y.
>>>>>
>>>>> Peter confirmed that reverting this commit on top of 6.12.57-1 as
>>>>> packaged in Debian resolves indeed the issue. Interestingly the issue
>>>>> is *not* seen with 6.17.7 based kernel in Debian.
>>>>>
>>>>> #regzbot introduced: 37bd91f22794dc05436130d6983302cb90ecfe7e
>>>>> #regzbot monitor: https://bugs.debian.org/1120602
>>>>>
>>>>> Thank you already!
>>>>>
>>>>> Regards,
>>>>> Salvatore
>>>>
>>>> Hi Peter, Salvatore,
>>>> Thanks for reporting this crash, and sorry for the trouble. Here is my
>>>> analysis.
>>>>
>>>> On 6.17.7, where commit d062463edf17 ("uio_hv_generic: Set event for all
>>>> channels on the device") is present, hv_uio_irqcontrol() supports
>>>> setting of interrupt mask from userspace for sub-channels as well.
>>>>
>>>> This aligns with commit e29587c07537 ("uio_hv_generic: Let userspace
>>>> take care of interrupt mask") which relies on userspace to manage
>>>> interrupt mask, so it safely removes the interrupt mask management logic
>>>> in the driver.
>>>>
>>>> However, in 6.12.57, the first commit is not present, but the second one
>>>> is, so there is no way to disable interrupt mask for sub-channels and
>>>> interrupt_mask stays 0, which means interrupts are not masked. So we may
>>>> be having an interrupt callback being handled for a sub-channel, where
>>>> we do not expect it to come. This may be causing this issue.
>>>>
>>>> This would have led to a crash in hv_uio_channel_cb() for sub-channels:
>>>> struct hv_device *hv_dev = chan->device_obj;
>>>>
>>>>
>>>> I have ported commit d062463edf17 ("uio_hv_generic: Set event for all
>>>> channels on the device") on 6.12.57, and resolved some merge conflicts.
>>>> Could you please help with testing this, if it works for you.
>>>
>>> Applying the patch against the debian 6.12.57 kernel worked, I am no
>>> longer seeing that panic on boot:
>>>
>>> gnos@vEdge:~$ uname -a
>>> Linux vEdge 6.12+unreleased-amd64 #1 SMP PREEMPT_DYNAMIC Debian
>>> 6.12.57-1a~test (2025-11-14) x86_64 GNU/Linux
>>> gnos@vEdge:~$ uptime
>>> 11:46:33 up 4 min, 1 user, load average: 3.31, 2.07, 0.89
>>> gnos@vEdge:~$ sudo dmidecode -t system
>>> # dmidecode 3.6
>>> Getting SMBIOS data from sysfs.
>>> SMBIOS 3.1.0 present.
>>>
>>> Handle 0x0001, DMI type 1, 27 bytes
>>> System Information
>>> Manufacturer: Microsoft Corporation
>>> Product Name: Virtual Machine
>>> Version: Hyper-V UEFI Release v4.1
>>> Serial Number: 0000-0002-8036-1108-7588-3134-50
>>> UUID: 26e86d6e-140c-496a-862c-a3b3bbcd16ad
>>> Wake-up Type: Power Switch
>>> SKU Number: None
>>> Family: Virtual Machine
>>>
>>> Handle 0x0010, DMI type 32, 11 bytes
>>> System Boot Information
>>> Status: No errors detected
>>>
>>> gnos@vEdge:~$
>>>
>>> Thanks a lot for the quick analysis!
>>>
>>> Peter.
>>
>> Hi Peter,
>>
>> Thanks for confirming. I am discussing this with Long Li, to hear his
>> thoughts on this, and have kept the patch ready.
>> Porting the same on 6.6 and older kernels would be a little different since
>> we don't have commit 547fa4ffd799 ("uio_hv_generic: Enable interrupt for low
>> speed VMBus devices") on these kernels and this would lead to merge
>> conflicts, which needs to be handled separately.
>>
>> Meanwhile, if I should be including any tags in the fix patch for debian
>> bug, please let me know.
>
> Thank you very much for the quick analysis and fix.
>
> If you can add a Closes: https://bugs.debian.org/1120602 that would
> make our tracking for the fixes easier. But not sure if this is
> allowed for proposing the backport for a stable series, as it did not
> affect the upper releases.
>
> In any case your work is much appreciated!
>
> Regards,
> Salvatore

Hi,
I have sent the patches now to the list. Please consider adding your
tested-by if you find it alright.

Thanks.

Regards,
Naman
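
The register dump in the quoted oops is consistent with a NULL
chan->device_obj. The bytes just before the marked instruction, 48 8b 47 10,
decode to a load from 0x10(%rdi) into %rax, and the marked instruction,
48 8b b8 a0 00 00 00, is a load from 0xa0(%rax). With RAX = 0 that second
load touches address 0xa0, which matches CR2. The sketch below reproduces
only this arithmetic. The struct layouts and the names sketch_device,
sketch_channel and sketch_fault_address are hypothetical, padded just to put
fields at those two offsets; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: padding chosen only to match the offsets in the oops. */
struct sketch_device {
	unsigned char pad[0xa0];
	void *drvdata;				/* field read by the faulting load */
};

struct sketch_channel {
	unsigned char pad[0x10];
	struct sketch_device *device_obj;	/* loaded into RAX by the first load */
};

/* Compile-time check that the sketch really places device_obj at 0x10. */
_Static_assert(offsetof(struct sketch_channel, device_obj) == 0x10,
	       "device_obj must sit at offset 0x10 in this sketch");

/* Address touched when reading the drvdata field through the given base. */
uintptr_t sketch_fault_address(uintptr_t dev_base)
{
	return dev_base + offsetof(struct sketch_device, drvdata);
}
```

Calling sketch_fault_address(0) models the load through a NULL device
pointer and yields 0xa0, the address reported in the "BUG: kernel NULL
pointer dereference" line.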

* Re: Bug#1120602: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-15 9:03 ` Naman Jain
@ 2025-11-21 10:04 ` Peter Morrow
2025-11-26 6:15 ` Naman Jain
0 siblings, 1 reply; 8+ messages in thread
From: Peter Morrow @ 2025-11-21 10:04 UTC (permalink / raw)
To: Naman Jain
Cc: Salvatore Bonaccorso, 1120602, Long Li, linux-hyperv,
linux-kernel, regressions, stable, John Starks, Michael Kelley,
Tianyu Lan, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman

Hi Naman/Salvatore,

Is it possible to get this fixed in the 6.1 LTS series too? I just ran
into this crash when moving from bookworm based Debian kernel
6.1.153-1 to 6.1.158-1. I saw that "uio_hv_generic: Let userspace take
care of interrupt mask" appeared in 6.1.156.

Thanks,
Peter.

* Re: Bug#1120602: [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic]
2025-11-21 10:04 ` Peter Morrow
@ 2025-11-26 6:15 ` Naman Jain
0 siblings, 0 replies; 8+ messages in thread
From: Naman Jain @ 2025-11-26 6:15 UTC (permalink / raw)
To: Peter Morrow
Cc: Salvatore Bonaccorso, 1120602, Long Li, linux-hyperv,
linux-kernel, regressions, stable, John Starks, Michael Kelley,
Tianyu Lan, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Greg Kroah-Hartman
On 11/21/2025 3:34 PM, Peter Morrow wrote:
> Hi Naman/Salvatore,
>
> Is it possible to get this fixed in the 6.1 LTS series too? I just ran
> into this crash when moving from bookworm based Debian kernel
> 6.1.153-1 to 6.1.158-1. I saw that "uio_hv_generic: Let userspace take
> care of interrupt mask" appeared in 6.1.156.
>
> Thanks,
> Peter.
>

Hi Peter,
Yes, I have sent a patch for older kernel versions as well.
I am working to fix the review comments and send new revisions.
Here is the link:
https://lore.kernel.org/all/20251115085937.2237-1-namjain@linux.microsoft.com/

Regards,
Naman

Thread overview: 8+ messages
2025-11-13 18:29 [REGRESSION 6.12.y] hyper-v: BUG: kernel NULL pointer dereference, address: 00000000000000a0: RIP: 0010:hv_uio_channel_cb+0xd/0x20 [uio_hv_generic] Salvatore Bonaccorso
2025-11-14 6:03 ` Naman Jain
2025-11-14 11:49 ` Peter Morrow
2025-11-14 14:35 ` Naman Jain
2025-11-14 21:44 ` Bug#1120602: " Salvatore Bonaccorso
2025-11-15 9:03 ` Naman Jain
2025-11-21 10:04 ` Peter Morrow
2025-11-26 6:15 ` Naman Jain