From: "Michael S. Tsirkin" <mst@redhat.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: qemu-devel@nongnu.org, Juro Bystricky <juro.bystricky@intel.com>
Subject: Re: [RFC PATCH] docs: Enhance documentation for iommu bypass
Date: Wed, 22 May 2024 05:28:50 -0400
Message-ID: <20240522051403-mutt-send-email-mst@kernel.org>
In-Reply-To: <20240522074008.GA171222@ziqianlu-desk2>

On Wed, May 22, 2024 at 03:40:08PM +0800, Aaron Lu wrote:
> When Intel vIOMMU is used and irq remapping is enabled, using
> bypass_iommu causes the following two call stacks to be dumped during
> kernel boot, and all PCI devices attached to the root bridge lose their
> MSI capabilities and fall back to using the IOAPIC:
> 
> [    0.960262] ------------[ cut here ]------------
> [    0.961245] WARNING: CPU: 3 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x27/0x40
> [    0.963070] Modules linked in:
> [    0.963695] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc7-00056-g45db3ab70092 #1
> [    0.965225] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [    0.967382] RIP: 0010:pci_msi_setup_msi_irqs+0x27/0x40
> [    0.968378] Code: 90 90 90 0f 1f 44 00 00 48 8b 87 30 03 00 00 89 f2 48 85 c0 74 14 f6 40 28 01 74 0e 48 81 c7 c0 00 00 00 31 f6 e9 29 42 9e ff <0f> 0b b8 ed ff ff ff c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00
> [    0.971756] RSP: 0000:ffffc90000017988 EFLAGS: 00010246
> [    0.972669] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    0.973901] RDX: 0000000000000005 RSI: 0000000000000005 RDI: ffff888100ee1000
> [    0.975391] RBP: 0000000000000005 R08: ffff888101f44d90 R09: 0000000000000228
> [    0.976629] R10: 0000000000000001 R11: 0000000000008d3f R12: ffffc90000017b80
> [    0.977864] R13: ffff888102312000 R14: ffff888100ee1000 R15: 0000000000000005
> [    0.979092] FS:  0000000000000000(0000) GS:ffff88817bd80000(0000) knlGS:0000000000000000
> [    0.980473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.981464] CR2: 0000000000000000 CR3: 000000000302e001 CR4: 0000000000770ef0
> [    0.982687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.983919] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    0.985143] PKRU: 55555554
> [    0.985625] Call Trace:
> [    0.986056]  <TASK>
> [    0.986440]  ? __warn+0x80/0x130
> [    0.987014]  ? pci_msi_setup_msi_irqs+0x27/0x40
> [    0.987810]  ? report_bug+0x18d/0x1c0
> [    0.988443]  ? handle_bug+0x3a/0x70
> [    0.989026]  ? exc_invalid_op+0x13/0x60
> [    0.989672]  ? asm_exc_invalid_op+0x16/0x20
> [    0.990374]  ? pci_msi_setup_msi_irqs+0x27/0x40
> [    0.991118]  __pci_enable_msix_range+0x325/0x5b0
> [    0.991883]  pci_alloc_irq_vectors_affinity+0xa9/0x110
> [    0.992698]  vp_find_vqs_msix+0x1a8/0x4c0
> [    0.993332]  vp_find_vqs+0x3a/0x1a0
> [    0.993893]  vp_modern_find_vqs+0x17/0x70
> [    0.994531]  init_vq+0x3ad/0x410
> [    0.995051]  ? __pfx_default_calc_sets+0x10/0x10
> [    0.995789]  virtblk_probe+0xeb/0xbc0
> [    0.996362]  ? up_write+0x74/0x160
> [    0.996900]  ? down_write+0x4d/0x80
> [    0.997450]  virtio_dev_probe+0x1bc/0x270
> [    0.998059]  really_probe+0xc1/0x390
> [    0.998626]  ? __pfx___driver_attach+0x10/0x10
> [    0.999288]  __driver_probe_device+0x78/0x150
> [    0.999924]  driver_probe_device+0x1f/0x90
> [    1.000506]  __driver_attach+0xce/0x1c0
> [    1.001073]  bus_for_each_dev+0x70/0xc0
> [    1.001638]  bus_add_driver+0x112/0x210
> [    1.002191]  driver_register+0x55/0x100
> [    1.002760]  virtio_blk_init+0x4c/0x90
> [    1.003332]  ? __pfx_virtio_blk_init+0x10/0x10
> [    1.003974]  do_one_initcall+0x41/0x240
> [    1.004510]  ? kernel_init_freeable+0x240/0x4a0
> [    1.005142]  kernel_init_freeable+0x321/0x4a0
> [    1.005749]  ? __pfx_kernel_init+0x10/0x10
> [    1.006311]  kernel_init+0x16/0x1c0
> [    1.006798]  ret_from_fork+0x2d/0x50
> [    1.007303]  ? __pfx_kernel_init+0x10/0x10
> [    1.007883]  ret_from_fork_asm+0x1a/0x30
> [    1.008431]  </TASK>
> [    1.008748] ---[ end trace 0000000000000000 ]---
> 
> The other call stack is triggered in pci_msi_teardown_msi_irqs().
> 
> Actually every PCI device will trigger these two paths. There are only
> two callstack dumps because the two places use WARN_ON_ONCE().
> 
> What happens is this: when irq remapping is enabled, the kernel expects
> every PCI device (or one of its parent bridges) to appear in the device
> scope list of some DMA Remapping Hardware unit Definition (DRHD). If it
> does not, the device's irq domain is NULL, and enabling MSI for that
> device fails.
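
To make the lookup described above concrete, here is a toy Python model.
It is only a sketch of the description in this thread, not the kernel's
implementation: PciDevice, find_irq_domain, setup_msi, the DRHD name and
the device addresses are all made-up names used for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class PciDevice:
    """Hypothetical stand-in for a PCI device and its upstream bridge."""
    name: str
    parent: Optional["PciDevice"] = None


def find_irq_domain(dev: PciDevice,
                    drhd_scopes: Dict[str, Set[str]]) -> Optional[str]:
    """Walk from the device up through its parent bridges.

    Return the name of the first DRHD whose device scope lists the device
    or one of its parents, or None if no DRHD covers it.
    """
    node = dev
    while node is not None:
        for drhd, scope in drhd_scopes.items():
            if node.name in scope:
                return drhd
        node = node.parent
    return None


def setup_msi(dev: PciDevice, drhd_scopes: Dict[str, Set[str]]) -> bool:
    """Return True if MSI can be enabled, False if there is no irq domain."""
    return find_irq_domain(dev, drhd_scopes) is not None


# Assumed topology: one DRHD whose scope lists a single bridge.
scopes = {"drhd0": {"00:02.0"}}
bridge = PciDevice("00:02.0")
covered = PciDevice("01:00.0", parent=bridge)   # behind the listed bridge
bypassed = PciDevice("00:03.0")                 # not in any DRHD scope

for dev in (covered, bypassed):
    print(dev.name, "MSI ok" if setup_msi(dev, scopes) else "no irq domain")
```

In this model the bypassed device is the one that ends up without an irq
domain, which corresponds to the MSI setup failure behind the warning
above.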
> 
> Per my understanding, only a virtualized system can have such a setup:
> irq remapping enabled while not all PCI/PCIe devices appear in a DRHD's
> device scope.
> 
> Enhance the document by mentioning what could happen when bypass_iommu
> is used.
> 
> For detailed qemu cmdline and guest kernel dmesg, please see:
> https://lore.kernel.org/qemu-devel/20240510072519.GA39314@ziqianlu-desk2/
> 
> Reported-by: Juro Bystricky <juro.bystricky@intel.com>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>

Is this issue specific to Linux?

> ---
>  docs/bypass-iommu.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/docs/bypass-iommu.txt b/docs/bypass-iommu.txt
> index e6677bddd3..8226f79104 100644
> --- a/docs/bypass-iommu.txt
> +++ b/docs/bypass-iommu.txt
> @@ -68,6 +68,11 @@ devices might send malicious dma request to virtual machine if there is no
>  iommu isolation. So it would be necessary to only bypass iommu for trusted
>  device.
>  
> +When Intel IOMMU is virtualized, if irq remapping is enabled, PCI and PCIe
> +devices that bypassed vIOMMU will have their MSI/MSI-x functionalities disabled

functionality

> +and fall back to IOAPIC. If this is not desired, disable irq remapping:
> +qemu -device intel-iommu,intremap=off
> +
>  Implementation
>  ==============
>  The bypass iommu feature includes:
> -- 
> 2.45.0
