public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Alex Mastro <amastro@fb.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit
Date: Fri, 17 Oct 2025 09:29:26 -0700	[thread overview]
Message-ID: <aPJu5sXw6v3DI8w8@devgpu012.nha5.facebook.com> (raw)
In-Reply-To: <20251016160138.374c8cfb@shazbot.org>

On Thu, Oct 16, 2025 at 04:01:38PM -0600, Alex Williamson wrote:
> The legacy vfio container represents a single IOMMU context, which is
> typically managed by a single domain.  The replay comes into play when
> groups are under IOMMUs with different properties that prevent us from
> re-using the domain.  The case that most comes to mind for this is
> Intel platforms with integrated graphics where there's a separate IOMMU
> for the GPU, which iirc has different coherency settings.

Thanks, this context is helpful and makes sense.

> That mechanism for triggering replay requires a specific hardware
> configuration, but we can easily trigger it through code
> instrumentation, ex:
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5167bec14e36..2cb19ddbb524 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2368,7 +2368,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>                     d->enforce_cache_coherency ==
>                             domain->enforce_cache_coherency) {
>                         iommu_detach_group(domain->domain, group->iommu_group);
> -                       if (!iommu_attach_group(d->domain,
> +                       if (0 && !iommu_attach_group(d->domain,
>                                                 group->iommu_group)) {
>                                 list_add(&group->next, &d->group_list);
>                                 iommu_domain_free(domain->domain);
> 
> We might consider whether it's useful for testing purposes to expose a
> mechanism to toggle this.  For a unit test, if we create a container,
> add a group, and build up some suspect mappings, if we then add another
> group to the container with the above bypass we should trigger the
> replay.

Thanks for the tip. I did this, and validated via bpftrace-ing iommu_map that
the container's mappings (one of which lies at the end of address space) are
replayed correctly. Without the fix, the loop body

while (iova < dma->iova + dma->size) { ... iommu_map() ... }

would never be entered for the end of address space mapping due to

dma->iova + dma->size == 0

$ sudo bpftrace -e 'kprobe:iommu_map { printf("pid=%d comm=%s domain=%p iova=%p paddr=%p size=%p prot=%p gfp=%p\n", pid, comm, (void*)arg0, (void*)arg1, (void*)arg2, (void*)arg3, (void*)arg4, (void*)arg5); }'
Attached 1 probe
# original mappings
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
# replayed mapping
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0

> In general though the replay shouldn't have a mechanism to trigger
> overflows, we're simply iterating the current set of mappings that have
> already been validated and applying them to a new domain.

Agree. Overflow means that some other invariant has broken, and nonsensical
vfio_dma have infiltrated iommu->dma_list. The combination of iommu->lock
serialization + overflow checks elsewhere should have prevented that.

> In any case, we can all take a second look at the changes there.
Thanks!

Alex

  reply	other threads:[~2025-10-17 16:29 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-13  5:32 [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit Alex Mastro
2025-10-13  5:32 ` [PATCH v4 1/3] vfio/type1: sanitize for overflow using check_*_overflow Alex Mastro
2025-10-13  5:32 ` [PATCH v4 2/3] vfio/type1: move iova increment to unmap_unpin_* caller Alex Mastro
2025-10-13  5:32 ` [PATCH v4 3/3] vfio/type1: handle DMA map/unmap up to the addressable limit Alex Mastro
2025-10-21 22:18   ` Alejandro Jimenez
2025-10-22 14:24     ` Alex Mastro
2025-10-15 19:24 ` [PATCH v4 0/3] vfio: " Alex Williamson
2025-10-15 21:25   ` Alejandro Jimenez
2025-10-16 21:19     ` Alex Mastro
2025-10-16 22:01       ` Alex Williamson
2025-10-17 16:29         ` Alex Mastro [this message]
2025-10-20 21:36           ` Alex Williamson
2025-10-21 16:25             ` Alex Mastro
2025-10-21 16:31               ` David Matlack
2025-10-21 19:13                 ` Alex Mastro
2025-10-22  0:38                   ` David Matlack
2025-10-22 14:55                     ` Alex Mastro
2025-10-23 20:52               ` Alex Mastro
2025-10-23 22:33                 ` Alex Williamson
2025-10-27 16:02               ` Alex Mastro
2025-10-28  1:57                 ` Alex Williamson
2025-10-28 15:29                   ` Alex Mastro
2025-10-21 12:36 ` Jason Gunthorpe
2025-10-21 22:21 ` Alejandro Jimenez
2025-10-25 18:11 ` Alex Mastro
2025-10-27 13:39   ` Jason Gunthorpe
2025-10-28 18:42     ` Alex Mastro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPJu5sXw6v3DI8w8@devgpu012.nha5.facebook.com \
    --to=amastro@fb.com \
    --cc=alejandro.j.jimenez@oracle.com \
    --cc=alex@shazbot.org \
    --cc=jgg@ziepe.ca \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox