From: Weinan Liu <wnliu@google.com>
To: jgg@ziepe.ca
Cc: iommu@lists.linux.dev, joro@8bytes.org, josef@toxicpanda.com,
	 linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	kpsingh@kernel.org
Subject: Re: [PATCH] amd/iommu: do not split domain flushes when flushing the entire range
Date: Thu,  9 Apr 2026 08:12:25 +0000	[thread overview]
Message-ID: <20260409081227.2149181-1-wnliu@google.com> (raw)
In-Reply-To: <20260326220512.GA245789@ziepe.ca>

> On Thu, Mar 26, 2026 19:05:12 -0300 Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> > On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > > We are hitting the following soft lockup in production on v6.6 and
> > > > v6.12, but the bug exists in all versions
> > > >
> > > > watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
> > > > CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> > > > RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
> > > > Call Trace:
> > > >  <TASK>
> > > >  amd_iommu_attach_device+0x69/0x450
> > > >  __iommu_device_set_domain+0x7b/0x190
> > > >  __iommu_group_set_core_domain+0x61/0xd0
> > > >  iommu_detach_group+0x27/0x40
> > > >  vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
> > > >  vfio_group_detach_container+0x59/0x160 [vfio]
> > > >  vfio_group_fops_release+0x4d/0x90 [vfio]
> > > >  __fput+0x95/0x2a0
> > > >  task_work_run+0x93/0xc0
> > > >  do_exit+0x321/0x950
> > > >  do_group_exit+0x7f/0xa0
> > > >  get_signal+0x77d/0x780
> > > >  </TASK>
> > > >
> > > > This occurs because we're a VM and we're splitting up the size
> > > > CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> > > > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
> > >
> > > This function doesn't exist in the upstream kernel anymore, and the
> > > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > > all, AFAIK.
> > 
> > This was based on linus/master as of March 4th, and we get here via
> > amd_iommu_flush_tlb_all, which definitely still exists, so what
> > specifically are you talking about? Thanks,
> 
> $ git grep amd_iommu_domain_flush_tlb_pde | wc -l
> 0
> 
> The entire page table logic was rewritten. The stuff that caused these
> issues is gone and the new stuff doesn't appear to have this bug of
> passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.
> 
> If it does please explain it in terms of the new stuff without
> referencing deleted functions.
> 
> I don't know how you get something like this into -stable.

I believe the function Josef is referring to on linus/master is amd_iommu_domain_flush_all().
https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1820

The potential call sequence appears to be:
```
blocked_domain_attach_device() or amd_iommu_attach_device()
  -> detach_device()
    -> amd_iommu_domain_flush_all()
      -> amd_iommu_domain_flush_pages(...,
             CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
```

Based on the code in build_inv_address()[1], it makes no sense to split a flush of
the entire address space into many smaller flushes: any chunk size larger than
1 << 51 already amounts to a full flush, so a single command suffices.

[1] https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1289


Thread overview: 7+ messages
2026-03-04 21:30 [PATCH] amd/iommu: do not split domain flushes when flushing the entire range Josef Bacik
2026-03-12 13:40 ` Jason Gunthorpe
2026-03-14 18:24   ` Josef Bacik
2026-03-26 22:05     ` Jason Gunthorpe
2026-04-09  8:12       ` Weinan Liu [this message]
2026-04-09 13:17         ` Jason Gunthorpe
2026-03-24 20:14 ` Josef Bacik
