public inbox for stable@vger.kernel.org
* [PATCH] amd/iommu: do not split domain flushes when flushing the entire range
@ 2026-03-04 21:30 Josef Bacik
  2026-03-12 13:40 ` Jason Gunthorpe
  2026-03-24 20:14 ` Josef Bacik
  0 siblings, 2 replies; 5+ messages in thread
From: Josef Bacik @ 2026-03-04 21:30 UTC
  To: joro, iommu, linux-kernel; +Cc: stable

We are hitting the following soft lockup in production on v6.6 and
v6.12, but the bug exists in every version since the commit named in
the Fixes tag:

watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
Call Trace:
 <TASK>
 amd_iommu_attach_device+0x69/0x450
 __iommu_device_set_domain+0x7b/0x190
 __iommu_group_set_core_domain+0x61/0xd0
 iommu_detach_group+0x27/0x40
 vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
 vfio_group_detach_container+0x59/0x160 [vfio]
 vfio_group_fops_release+0x4d/0x90 [vfio]
 __fput+0x95/0x2a0
 task_work_run+0x93/0xc0
 do_exit+0x321/0x950
 do_group_exit+0x7f/0xa0
 get_signal+0x77d/0x780
 </TASK>

This happens because we're running in a VM (amd_iommu_np_cache is set),
so amd_iommu_domain_flush_pages() splits the
CMD_INV_IOMMU_ALL_PAGES_ADDRESS size that
amd_iommu_domain_flush_tlb_pde() passes in into a series of smaller,
naturally aligned flushes. Each of these traps into the host, and we do
all of it while holding the domain lock with IRQs disabled.

Fix this by not splitting this special size and instead sending the
whole command at once, so that perhaps the host will decide to be
gracious and not spend 7 business years on a flush.
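To make the cost above concrete, here is a rough userspace model in C
(not kernel code) that counts how many flush commands the guest ends up
issuing for one range. It assumes ALL_PAGES_SIZE mirrors the value of
CMD_INV_IOMMU_ALL_PAGES_ADDRESS, re-implements the splitting rule from
a270be1b3fdf as we understand it, and uses the hypothetical helpers
lowest_set_bit(), highest_set_bit() and flush_commands(). It only counts
commands; it does not model how long the host takes to handle each one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors CMD_INV_IOMMU_ALL_PAGES_ADDRESS in amd_iommu_types.h. */
#define ALL_PAGES_SIZE 0x7fffffffffffffffULL

/* Index of the lowest set bit (like the kernel's __ffs()); x must be non-zero. */
static int lowest_set_bit(uint64_t x)
{
	assert(x != 0);
	return __builtin_ctzll(x);
}

/* Index of the highest set bit (like the kernel's __fls()); x must be non-zero. */
static int highest_set_bit(uint64_t x)
{
	assert(x != 0);
	return 63 - __builtin_clzll(x);
}

/*
 * Hypothetical model: number of flush commands issued for
 * [address, address + size).
 *   np_cache == false: one command covers the whole range.
 *   np_cache == true:  greedy split into naturally aligned chunks, each the
 *                      smaller of the address's alignment and the largest
 *                      power of two that fits in the remaining size.
 *   patched  == true:  the full-range size skips the split, as in this patch.
 */
static unsigned flush_commands(uint64_t address, uint64_t size,
			       bool np_cache, bool patched)
{
	unsigned commands = 0;

	if (!np_cache || (patched && size == ALL_PAGES_SIZE))
		return 1; /* a single __domain_flush_pages() call */

	while (size != 0) {
		int align = highest_set_bit(size);
		uint64_t chunk;

		/* address 0 is aligned to everything, so only size limits it */
		if (address != 0 && lowest_set_bit(address) < align)
			align = lowest_set_bit(address);

		chunk = 1ULL << align;
		address += chunk;
		size -= chunk;
		commands++; /* in a guest, each chunk traps to the host */
	}
	return commands;
}
```

In this model, the full-range flush that amd_iommu_domain_flush_tlb_pde()
requests, flush_commands(0, ALL_PAGES_SIZE, true, false), becomes 63
separate commands, while the patched path returns 1. Ordinary small
ranges are unaffected: 12 KiB at 0x1000 still splits into 2 commands.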

Cc: stable@vger.kernel.org
Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 drivers/iommu/amd/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 81c4d7733872..f0d3e06734ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1769,7 +1769,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 {
 	lockdep_assert_held(&domain->lock);
 
-	if (likely(!amd_iommu_np_cache)) {
+	if (likely(!amd_iommu_np_cache) ||
+	    size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS) {
 		__domain_flush_pages(domain, address, size);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
-- 
2.53.0




Thread overview: 5+ messages
2026-03-04 21:30 [PATCH] amd/iommu: do not split domain flushes when flushing the entire range Josef Bacik
2026-03-12 13:40 ` Jason Gunthorpe
2026-03-14 18:24   ` Josef Bacik
2026-03-26 22:05     ` Jason Gunthorpe
2026-03-24 20:14 ` Josef Bacik
