From: Alex Mastro <amastro@fb.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
Jason Gunthorpe <jgg@ziepe.ca>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit
Date: Fri, 17 Oct 2025 09:29:26 -0700 [thread overview]
Message-ID: <aPJu5sXw6v3DI8w8@devgpu012.nha5.facebook.com> (raw)
In-Reply-To: <20251016160138.374c8cfb@shazbot.org>
On Thu, Oct 16, 2025 at 04:01:38PM -0600, Alex Williamson wrote:
> The legacy vfio container represents a single IOMMU context, which is
> typically managed by a single domain. The replay comes into play when
> groups are under IOMMUs with different properties that prevent us from
> re-using the domain. The case that most comes to mind for this is
> Intel platforms with integrated graphics where there's a separate IOMMU
> for the GPU, which iirc has different coherency settings.
Thanks, this context is helpful and makes sense.
> That mechanism for triggering replay requires a specific hardware
> configuration, but we can easily trigger it through code
> instrumentation, ex:
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5167bec14e36..2cb19ddbb524 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2368,7 +2368,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> d->enforce_cache_coherency ==
> domain->enforce_cache_coherency) {
> iommu_detach_group(domain->domain, group->iommu_group);
> - if (!iommu_attach_group(d->domain,
> + if (0 && !iommu_attach_group(d->domain,
> group->iommu_group)) {
> list_add(&group->next, &d->group_list);
> iommu_domain_free(domain->domain);
>
> We might consider whether it's useful for testing purposes to expose a
> mechanism to toggle this. For a unit test, if we create a container,
> add a group, and build up some suspect mappings, if we then add another
> group to the container with the above bypass we should trigger the
> replay.
Thanks for the tip. I did this, and validated via bpftrace-ing iommu_map that
the container's mappings (one of which lies at the end of address space) are
replayed correctly. Without the fix, the loop body
while (iova < dma->iova + dma->size) { ... iommu_map() ... }
would never be entered for the end of address space mapping due to
dma->iova + dma->size == 0
$ sudo bpftrace -e 'kprobe:iommu_map { printf("pid=%d comm=%s domain=%p iova=%p paddr=%p size=%p prot=%p gfp=%p\n", pid, comm, (void*)arg0, (void*)arg1, (void*)arg2, (void*)arg3, (void*)arg4, (void*)arg5); }'
Attached 1 probe
# original mappings
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
# replayed mapping
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
> In general though the replay shouldn't have a mechanism to trigger
> overflows, we're simply iterating the current set of mappings that have
> already been validated and applying them to a new domain.
Agree. Overflow means that some other invariant has broken, and nonsensical
vfio_dma have infiltrated iommu->dma_list. The combination of iommu->lock
serialization + overflow checks elsewhere should have prevented that.
> In any case, we can all take a second look at the changes there.
Thanks!
Alex
next prev parent reply other threads:[~2025-10-17 16:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 5:32 [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit Alex Mastro
2025-10-13 5:32 ` [PATCH v4 1/3] vfio/type1: sanitize for overflow using check_*_overflow Alex Mastro
2025-10-13 5:32 ` [PATCH v4 2/3] vfio/type1: move iova increment to unmap_unpin_* caller Alex Mastro
2025-10-13 5:32 ` [PATCH v4 3/3] vfio/type1: handle DMA map/unmap up to the addressable limit Alex Mastro
2025-10-21 22:18 ` Alejandro Jimenez
2025-10-22 14:24 ` Alex Mastro
2025-10-15 19:24 ` [PATCH v4 0/3] vfio: " Alex Williamson
2025-10-15 21:25 ` Alejandro Jimenez
2025-10-16 21:19 ` Alex Mastro
2025-10-16 22:01 ` Alex Williamson
2025-10-17 16:29 ` Alex Mastro [this message]
2025-10-20 21:36 ` Alex Williamson
2025-10-21 16:25 ` Alex Mastro
2025-10-21 16:31 ` David Matlack
2025-10-21 19:13 ` Alex Mastro
2025-10-22 0:38 ` David Matlack
2025-10-22 14:55 ` Alex Mastro
2025-10-23 20:52 ` Alex Mastro
2025-10-23 22:33 ` Alex Williamson
2025-10-27 16:02 ` Alex Mastro
2025-10-28 1:57 ` Alex Williamson
2025-10-28 15:29 ` Alex Mastro
2025-10-21 12:36 ` Jason Gunthorpe
2025-10-21 22:21 ` Alejandro Jimenez
2025-10-25 18:11 ` Alex Mastro
2025-10-27 13:39 ` Jason Gunthorpe
2025-10-28 18:42 ` Alex Mastro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPJu5sXw6v3DI8w8@devgpu012.nha5.facebook.com \
--to=amastro@fb.com \
--cc=alejandro.j.jimenez@oracle.com \
--cc=alex@shazbot.org \
--cc=jgg@ziepe.ca \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox