public inbox for linux-next@vger.kernel.org
* Re: [syzbot] [pci?] linux-next test error: general protection fault in msix_capability_init
@ 2025-03-25 22:37 Michael Roth
  2025-03-26  9:25 ` Roger Pau Monné
From: Michael Roth @ 2025-03-25 22:37 UTC
  To: Aithal, Srikanth
  Cc: linux-pci, linux-kernel, bhelgaas, sfr, syzkaller-bugs,
	linux-next, Roger Pau Monne, Juergen Gross

I can also reproduce this trace on every boot with a basic KVM guest on an
EPYC Milan system, using next-20250325 for both host and guest.

A bisect of commits to drivers/pci/msi seems to indicate the following commit
is the source of the regression:

  commit d9f2164238d814d119e8c979a3579d1199e271bb
  Author: Roger Pau Monne <roger.pau@citrix.com>
  Date:   Wed Feb 19 10:20:57 2025 +0100
  
      PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
      
      Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
      MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
      Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
      event channels, to prevent PCI code from attempting to toggle the maskbit,
      as it's Xen that controls the bit.
      
      However, the pci_msi_ignore_mask being global will affect devices that use
      MSI interrupts but are not routing those interrupts over event channels
      (not using the Xen pIRQ chip).  One example is devices behind a VMD PCI
      bridge.  In that scenario the VMD bridge configures MSI(-X) using the
      normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
      bridge configure the MSI entries using indexes into the VMD bridge MSI
      table.  The VMD bridge then demultiplexes such interrupts and delivers to
      the destination device(s).  Having pci_msi_ignore_mask set in that scenario
      prevents (un)masking of MSI entries for devices behind the VMD bridge.
      
      Move the signaling of no entry masking into the MSI domain flags, as that
      allows setting it on a per-domain basis.  Set it for the Xen MSI domain
      that uses the pIRQ chip, while leaving it unset for the rest of the
      cases.
      
      Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
      with Xen dropping usage the variable is unneeded.
      
      This fixes using devices behind a VMD bridge on Xen PV hardware domains.
      
      Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
      Linux cannot use them.  By inhibiting the usage of
      VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
      bodge devices behind a VMD bridge do work fine when use from a Linux Xen
      hardware domain.  That's the whole point of the series.
      
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Juergen Gross <jgross@suse.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Message-ID: <20250219092059.90850-4-roger.pau@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>

Thanks,

Mike

* [syzbot] [pci?] linux-next test error: general protection fault in msix_capability_init
@ 2025-03-24 23:58 syzbot
  2025-03-25  3:55 ` Aithal, Srikanth
From: syzbot @ 2025-03-24 23:58 UTC
  To: bhelgaas, linux-kernel, linux-next, linux-pci, sfr,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    882a18c2c14f Add linux-next specific files for 20250324
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17d24804580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=30e7faf61be4d27e
dashboard link: https://syzkaller.appspot.com/bug?extid=d33642573545e529ab61
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ea720fb0d677/disk-882a18c2.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/723a320ec217/vmlinux-882a18c2.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4f23b2e1eb2c/bzImage-882a18c2.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d33642573545e529ab61@syzkaller.appspotmail.com

ntfs3: Enabled Linux POSIX ACLs support
ntfs3: Read-only LZX/Xpress compression included
efs: 1.0a - http://aeschi.ch.eu.org/efs/
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
romfs: ROMFS MTD (C) 2007 Red Hat, Inc.
QNX4 filesystem 0.2.3 registered.
qnx6: QNX6 filesystem 1.0.0 registered.
fuse: init (API version 7.43)
orangefs_debugfs_init: called with debug mask: :none: :0:
orangefs_init: module version upstream loaded
JFS: nTxBlock = 8192, nTxLock = 65536
SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
9p: Installing v9fs 9p2000 file system support
NILFS version 2 loaded
befs: version: 0.9.3
ocfs2: Registered cluster interface o2cb
ocfs2: Registered cluster interface user
OCFS2 User DLM kernel interface loaded
gfs2: GFS2 installed
ceph: loaded (mds proto 32)
NET: Registered PF_ALG protocol family
xor: automatically using best checksumming function   avx       
async_tx: api initialized (async)
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Asymmetric key parser 'pkcs8' registered
Key type pkcs7_test registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 238)
io scheduler mq-deadline registered
io scheduler kyber registered
io scheduler bfq registered
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: button: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
ACPI: button: Sleep Button [SLPF]
ioatdma: Intel(R) QuickData Technology Driver 5.00
ACPI: \_SB_.LNKC: Enabled at IRQ 11
virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKD: Enabled at IRQ 10
virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
ACPI: \_SB_.LNKB: Enabled at IRQ 10
virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
N_HDLC line discipline registered with maxframe=4096
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
Non-volatile memory driver v1.3
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc7-next-20250324-syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:msix_prepare_msi_desc drivers/pci/msi/msi.c:616 [inline]
RIP: 0010:msix_setup_msi_descs drivers/pci/msi/msi.c:640 [inline]
RIP: 0010:msix_setup_interrupts drivers/pci/msi/msi.c:680 [inline]
RIP: 0010:msix_capability_init+0x7a9/0x1550 drivers/pci/msi/msi.c:743
Code: 10 00 74 0f e8 28 9f de fc 48 ba 00 00 00 00 00 fc ff df 48 89 9c 24 d0 00 00 00 48 89 9c 24 98 01 00 00 4c 89 f0 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 86 02 00 00 41 8b 1e be 00 00 40 00 21 de
RSP: 0000:ffffc90000066ee0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000009e008 RCX: ffff8881412b8000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc90000067078
RBP: ffffc90000067138 R08: ffffffff854ea585 R09: 0000000000000000
R10: ffffc90000067020 R11: fffff5200000ce10 R12: 0000000000000000
R13: 0000000000000101 R14: 0000000000000000 R15: 1ffff9200000ce0d
FS:  0000000000000000(0000) GS:ffff888124fc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000eb38000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __pci_enable_msix_range+0x5c7/0x710 drivers/pci/msi/msi.c:851
 pci_alloc_irq_vectors_affinity+0x10e/0x2b0 drivers/pci/msi/api.c:270
 vp_request_msix_vectors drivers/virtio/virtio_pci_common.c:160 [inline]
 vp_find_vqs_msix+0x5da/0xeb0 drivers/virtio/virtio_pci_common.c:417
 vp_find_vqs+0xa0/0x7e0 drivers/virtio/virtio_pci_common.c:525
 virtio_find_vqs include/linux/virtio_config.h:226 [inline]
 virtio_find_single_vq include/linux/virtio_config.h:237 [inline]
 probe_common+0x37b/0x6b0 drivers/char/hw_random/virtio-rng.c:155
 virtio_dev_probe+0x931/0xc80 drivers/virtio/virtio.c:341
 really_probe+0x2b9/0xad0 drivers/base/dd.c:658
 __driver_probe_device+0x1a2/0x390 drivers/base/dd.c:800
 driver_probe_device+0x50/0x430 drivers/base/dd.c:830
 __driver_attach+0x45f/0x710 drivers/base/dd.c:1216
 bus_for_each_dev+0x23e/0x2b0 drivers/base/bus.c:370
 bus_add_driver+0x346/0x670 drivers/base/bus.c:678
 driver_register+0x23a/0x320 drivers/base/driver.c:249
 do_one_initcall+0x24a/0x940 init/main.c:1257
 do_initcall_level+0x157/0x210 init/main.c:1319
 do_initcalls+0x71/0xd0 init/main.c:1335
 kernel_init_freeable+0x432/0x5d0 init/main.c:1567
 kernel_init+0x1d/0x2b0 init/main.c:1457
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:msix_prepare_msi_desc drivers/pci/msi/msi.c:616 [inline]
RIP: 0010:msix_setup_msi_descs drivers/pci/msi/msi.c:640 [inline]
RIP: 0010:msix_setup_interrupts drivers/pci/msi/msi.c:680 [inline]
RIP: 0010:msix_capability_init+0x7a9/0x1550 drivers/pci/msi/msi.c:743
Code: 10 00 74 0f e8 28 9f de fc 48 ba 00 00 00 00 00 fc ff df 48 89 9c 24 d0 00 00 00 48 89 9c 24 98 01 00 00 4c 89 f0 48 c1 e8 03 <0f> b6 04 10 84 c0 0f 85 86 02 00 00 41 8b 1e be 00 00 40 00 21 de
RSP: 0000:ffffc90000066ee0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000009e008 RCX: ffff8881412b8000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc90000067078
RBP: ffffc90000067138 R08: ffffffff854ea585 R09: 0000000000000000
R10: ffffc90000067020 R11: fffff5200000ce10 R12: 0000000000000000
R13: 0000000000000101 R14: 0000000000000000 R15: 1ffff9200000ce0d
FS:  0000000000000000(0000) GS:ffff8881250c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000eb38000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	10 00                	adc    %al,(%rax)
   2:	74 0f                	je     0x13
   4:	e8 28 9f de fc       	call   0xfcde9f31
   9:	48 ba 00 00 00 00 00 	movabs $0xdffffc0000000000,%rdx
  10:	fc ff df
  13:	48 89 9c 24 d0 00 00 	mov    %rbx,0xd0(%rsp)
  1a:	00
  1b:	48 89 9c 24 98 01 00 	mov    %rbx,0x198(%rsp)
  22:	00
  23:	4c 89 f0             	mov    %r14,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	0f b6 04 10          	movzbl (%rax,%rdx,1),%eax <-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	0f 85 86 02 00 00    	jne    0x2bc
  36:	41 8b 1e             	mov    (%r14),%ebx
  39:	be 00 00 40 00       	mov    $0x400000,%esi
  3e:	21 de                	and    %ebx,%esi
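
The disassembly above can be read as a KASAN shadow-memory check that runs
just before a 4-byte load through %r14. As a rough sketch (not part of the
syzbot report), the arithmetic can be reproduced as follows. The shadow offset
and the shift are taken from the `movabs` and `shr` instructions in the dump;
the function names are made up for illustration, and 48-bit (4-level paging)
virtual addresses are an assumption.

```python
# Sketch of the address arithmetic seen in the dump. Helper names are
# hypothetical; constants are copied from the disassembly.
MASK64 = (1 << 64) - 1
KASAN_SHADOW_OFFSET = 0xdffffc0000000000  # movabs $0xdffffc0000000000,%rdx
KASAN_SHADOW_SCALE_SHIFT = 3              # shr $0x3,%rax


def kasan_shadow_addr(addr):
    """Shadow byte address for addr: (addr >> 3) + offset."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def shadow_to_range(shadow):
    """Inverse mapping: the 8-byte range covered by one shadow byte."""
    start = (((shadow - KASAN_SHADOW_OFFSET) & MASK64)
             << KASAN_SHADOW_SCALE_SHIFT) & MASK64
    return start, start + 7


def is_canonical_48(addr):
    """Assumes 48-bit virtual addresses: bits 63..47 must all be equal."""
    return (addr >> 47) in (0, 0x1ffff)


r14 = 0x0  # R14 in the register dump: the pointer about to be dereferenced
shadow = kasan_shadow_addr(r14)

# 0x400000 is the mask applied to the loaded word (mov $0x400000,%esi);
# it selects a single bit, bit 22.
FLAG_MASK = 0x400000
FLAG_BIT = FLAG_MASK.bit_length() - 1

print(hex(shadow), is_canonical_48(shadow), shadow_to_range(shadow), FLAG_BIT)
```

With %r14 equal to zero, the shadow address is the bare offset, which is not a
canonical address, so the shadow load itself faults. That is consistent with
the "non-canonical address 0xdffffc0000000000" and the "null-ptr-deref in
range [0x0000000000000000-0x0000000000000007]" lines in the oops. The report
does not say which flag bit 22 corresponds to.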


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

