From: Will Deacon <will.deacon@arm.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@linaro.org>,
eric.auger@st.com, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
suravee.suthikulpanit@amd.com, christoffer.dall@linaro.org,
linux-kernel@vger.kernel.org, patches@linaro.org
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
Date: Wed, 28 Oct 2015 17:14:11 +0000
Message-ID: <20151028171410.GK18966@arm.com>
In-Reply-To: <1446049648.8018.397.camel@redhat.com>

On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 57d8c37..13fb974 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> > {
> > struct vfio_domain *domain;
> > - unsigned long bitmap = PAGE_MASK;
> > + unsigned long bitmap = ULONG_MAX;
>
> Isn't this and removing the WARN_ON()s the only real change in this
> patch? The rest looks like conversion to use IS_ALIGNED and the
> following test, that I don't really understand...
>
> >
> > mutex_lock(&iommu->lock);
> > list_for_each_entry(domain, &iommu->domain_list, next)
> > @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> > static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> > struct vfio_iommu_type1_dma_unmap *unmap)
> > {
> > - uint64_t mask;
> > struct vfio_dma *dma;
> > size_t unmapped = 0;
> > int ret = 0;
> > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> > + PAGE_SIZE : min_pagesz;
>
> This one. If we're going to support sub-PAGE_SIZE mappings, why do we
> care to cap alignment at PAGE_SIZE?

Eric can clarify, but I think the intention here is to have VFIO continue
doing things in PAGE_SIZE chunks precisely so that we don't have to rework
all of the pinning code etc. The IOMMU API can then deal with the smaller
page size.

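
As an illustration of the alignment cap discussed above, here is a minimal user-space sketch, not the patch itself. It assumes a 64K host page size (the hypothetical `SKETCH_PAGE_SIZE`) and uses made-up `sketch_*` helper names. Since `__ffs()` returns a bit index, the sketch converts the lowest set bit into a byte size before comparing it with the page size; that conversion is a reading of the intent, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE on a 64K-page host. */
#define SKETCH_PAGE_SIZE 0x10000UL

/*
 * Smallest page size advertised in an IOMMU page-size bitmap, where
 * bit n set means that pages of 2^n bytes are supported.
 */
static uint64_t sketch_min_pagesz(uint64_t pgsize_bitmap)
{
    assert(pgsize_bitmap != 0);
    /* Isolate the lowest set bit; the result is a size in bytes. */
    return pgsize_bitmap & (~pgsize_bitmap + 1);
}

/* Never require less than the host page size, so pinning stays page-based. */
static uint64_t sketch_requested_alignment(uint64_t pgsize_bitmap)
{
    uint64_t min_pagesz = sketch_min_pagesz(pgsize_bitmap);

    return min_pagesz < SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : min_pagesz;
}

/* Same check as IS_ALIGNED() for a power-of-two alignment. */
static int sketch_is_aligned(uint64_t value, uint64_t alignment)
{
    return (value & (alignment - 1)) == 0;
}
```

With an IOMMU that only advertises 4K and 2M pages, `sketch_requested_alignment(0x1000 | 0x200000)` yields 64K under these assumptions, so a 4K-aligned but not 64K-aligned request would still be rejected.
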
> > - mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > -
> > - if (unmap->iova & mask)
> > + if (!IS_ALIGNED(unmap->iova, requested_alignment))
> > return -EINVAL;
> > - if (!unmap->size || unmap->size & mask)
> > + if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
> > return -EINVAL;
> >
> > - WARN_ON(mask & PAGE_MASK);
> > -
> > mutex_lock(&iommu->lock);
> >
> > /*
> > @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > size_t size = map->size;
> > long npage;
> > int ret = 0, prot = 0;
> > - uint64_t mask;
> > struct vfio_dma *dma;
> > unsigned long pfn;
> > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> > + PAGE_SIZE : min_pagesz;
> >
> > /* Verify that none of our __u64 fields overflow */
> > if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> > return -EINVAL;
> >
> > - mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > -
> > - WARN_ON(mask & PAGE_MASK);
> > -
> > /* READ/WRITE from device perspective */
> > if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
> > prot |= IOMMU_WRITE;
> > if (map->flags & VFIO_DMA_MAP_FLAG_READ)
> > prot |= IOMMU_READ;
> >
> > - if (!prot || !size || (size | iova | vaddr) & mask)
> > + if (!prot || !size ||
> > + !IS_ALIGNED(size | iova | vaddr, requested_alignment))
> > return -EINVAL;
> >
> > /* Don't allow IOVA or virtual address wrap */
>
> This is mostly ignoring the problems with sub-PAGE_SIZE mappings. For
> instance, we can only pin on PAGE_SIZE and therefore we only do
> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
> page, that page gets pinned and accounted 16 times. Are we going to
> tell users that their locked memory limit needs to be 16x now? The rest
> of the code would need an audit as well to see what other sub-page bugs
> might be hiding. Thanks,

I don't see that. The pinning all happens the same in VFIO, which can
then happily pass a 64k region to iommu_map. iommu_map will then call
->map in 4k chunks on the IOMMU driver ops.

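
The splitting described above can be sketched in user space as follows. This is a simplified model rather than the kernel's `iommu_map()`: it looks only at the IOVA (the real code also considers the physical address), and the `sketch_*` names are hypothetical. It counts how many driver-level map calls a region would need for a given page-size bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the largest supported page size that fits in the remaining size
 * and matches the alignment of addr. Returns 0 if none fits.
 */
static uint64_t sketch_pick_pgsize(uint64_t pgsize_bitmap, uint64_t addr,
                                   uint64_t size)
{
    uint64_t best = 0;

    for (unsigned int bit = 0; bit < 64; bit++) {
        uint64_t pgsize = (uint64_t)1 << bit;

        if (!(pgsize_bitmap & pgsize))
            continue;           /* size not supported by the IOMMU */
        if (pgsize > size)
            break;              /* larger sizes cannot fit either */
        if (addr & (pgsize - 1))
            continue;           /* addr not aligned to this size */
        best = pgsize;
    }
    return best;
}

/* Count the per-chunk map calls needed to cover [iova, iova + size). */
static size_t sketch_count_map_calls(uint64_t pgsize_bitmap, uint64_t iova,
                                     uint64_t size)
{
    size_t calls = 0;

    while (size) {
        uint64_t pgsize = sketch_pick_pgsize(pgsize_bitmap, iova, size);

        assert(pgsize != 0);    /* region must be coverable */
        iova += pgsize;
        size -= pgsize;
        calls++;
    }
    return calls;
}
```

For a single 64K region on an IOMMU that only supports 4K pages, `sketch_count_map_calls(0x1000, 0, 0x10000)` returns 16: one pin of the 64K host page in this model, followed by sixteen 4K mappings at the IOMMU level.
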
Will