From: Jason Gunthorpe <jgg@ziepe.ca>
To: Weinan Liu <wnliu@google.com>
Cc: iommu@lists.linux.dev, joro@8bytes.org, josef@toxicpanda.com,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
kpsingh@kernel.org
Subject: Re: [PATCH] amd/iommu: do not split domain flushes when flushing the entire range
Date: Thu, 9 Apr 2026 10:17:21 -0300 [thread overview]
Message-ID: <20260409131721.GQ2551565@ziepe.ca> (raw)
In-Reply-To: <20260409081227.2149181-1-wnliu@google.com>
On Thu, Apr 09, 2026 at 08:12:25AM +0000, Weinan Liu wrote:
> On Thu, Mar 26, 2026 19:05:12 -0300 Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> > > On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > > > We are hitting the following soft lockup in production on v6.6 and
> > > > > v6.12, but the bug exists in all versions
> > > > >
> > > > > watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
> > > > > CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> > > > > RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
> > > > > Call Trace:
> > > > > <TASK>
> > > > > amd_iommu_attach_device+0x69/0x450
> > > > > __iommu_device_set_domain+0x7b/0x190
> > > > > __iommu_group_set_core_domain+0x61/0xd0
> > > > > iommu_detach_group+0x27/0x40
> > > > > vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
> > > > > vfio_group_detach_container+0x59/0x160 [vfio]
> > > > > vfio_group_fops_release+0x4d/0x90 [vfio]
> > > > > __fput+0x95/0x2a0
> > > > > task_work_run+0x93/0xc0
> > > > > do_exit+0x321/0x950
> > > > > do_group_exit+0x7f/0xa0
> > > > > get_signal+0x77d/0x780
> > > > > </TASK>
> > > > >
> > > > > This occurs because we're in a VM: the flush of size
> > > > > CMD_INV_IOMMU_ALL_PAGES_ADDRESS issued by
> > > > > amd_iommu_domain_flush_tlb_pde() gets split into a bunch of
> > > > > smaller flushes.
> > > >
> > > > This function doesn't exist in the upstream kernel anymore, and the
> > > > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > > > all, AFAIK.
> > >
> > > This was based on linus/master as of March 4th, and we get here via
> > > amd_iommu_flush_tlb_all, which definitely still exists, so what
> > > specifically are you talking about? Thanks,
> >
> > $ git grep amd_iommu_domain_flush_tlb_pde | wc -l
> > 0
> >
> > The entire page table logic was rewritten. The stuff that caused these
> > issues is gone and the new stuff doesn't appear to have this bug of
> > passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.
> >
> > If it does please explain it in terms of the new stuff without
> > referencing deleted functions.
> >
> > I don't know how you get something like this into -stable.
>
> I believe the function Josef is referring to on linus/master is amd_iommu_domain_flush_all().
> https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1820
That does seem to be an issue, but it is not going to be triggered by a
VFIO trace like the one Josef is showing. I've already fixed this
properly in my series:
my series:
https://lore.kernel.org/all/3-v2-90ddd19c0894+13561-iommupt_inv_amd_jgg@nvidia.com/
+	if (likely(!amd_iommu_np_cache) ||
+	    unlikely(address == 0 && last == U64_MAX)) {
+		__domain_flush_pages(domain, address, last);
By fully getting rid of the wrong use of
CMD_INV_IOMMU_ALL_PAGES_ADDRESS as a size in the callers.
So there is a small window in which this patch could land, with a commit
message reworded to reference amd_iommu_domain_flush_all(), and be
backported before it all gets reworked and backporting becomes hard.
Respin it quickly?
Jason
Thread overview: 7+ messages
2026-03-04 21:30 [PATCH] amd/iommu: do not split domain flushes when flushing the entire range Josef Bacik
2026-03-12 13:40 ` Jason Gunthorpe
2026-03-14 18:24 ` Josef Bacik
2026-03-26 22:05 ` Jason Gunthorpe
2026-04-09 8:12 ` Weinan Liu
2026-04-09 13:17 ` Jason Gunthorpe [this message]
2026-03-24 20:14 ` Josef Bacik