Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Auger <eric.auger@linaro.org>
To: Will Deacon <will.deacon@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: eric.auger@st.com, kvm@vger.kernel.org, patches@linaro.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
Date: Wed, 28 Oct 2015 18:17:53 +0100	[thread overview]
Message-ID: <56310341.7010307@linaro.org> (raw)
In-Reply-To: <20151028171410.GK18966@arm.com>

Hi Will,
On 10/28/2015 06:14 PM, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 57d8c37..13fb974 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  {
>>>  	struct vfio_domain *domain;
>>> -	unsigned long bitmap = PAGE_MASK;
>>> +	unsigned long bitmap = ULONG_MAX;
>>
>> Isn't this and removing the WARN_ON()s the only real change in this
>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>> following test, that I don't really understand...
>>
>>>  
>>>  	mutex_lock(&iommu->lock);
>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>  {
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	size_t unmapped = 0;
>>>  	int ret = 0;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>
>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>> care to cap alignment at PAGE_SIZE?
> 
> Eric can clarify, but I think the intention here is to have VFIO continue
> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
> all of the pinning code etc.
That's my intention indeed ;-)

Thanks

Eric
 The IOMMU API can then deal with the smaller
> page size.

> 
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	if (unmap->iova & mask)
>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>  		return -EINVAL;
>>> -	if (!unmap->size || unmap->size & mask)
>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	mutex_lock(&iommu->lock);
>>>  
>>>  	/*
>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	size_t size = map->size;
>>>  	long npage;
>>>  	int ret = 0, prot = 0;
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	unsigned long pfn;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>>  
>>>  	/* Verify that none of our __u64 fields overflow */
>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>  		return -EINVAL;
>>>  
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	/* READ/WRITE from device perspective */
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>  		prot |= IOMMU_WRITE;
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>  		prot |= IOMMU_READ;
>>>  
>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>> +	if (!prot || !size ||
>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>>  	/* Don't allow IOVA or virtual address wrap */
>>
>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>> instance, we can only pin on PAGE_SIZE and therefore we only do
>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>> page, that page gets pinned and accounted 16 times.  Are we going to
>> tell users that their locked memory limit needs to be 16x now?  The rest
>> of the code would need an audit as well to see what other sub-page bugs
>> might be hiding.  Thanks,
> 
> I don't see that. The pinning all happens the same in VFIO, which can
> then happily pass a 64k region to iommu_map. iommu_map will then call
> ->map in 4k chunks on the IOMMU driver ops.
> 
> Will
>

WARNING: multiple messages have this Message-ID (diff)

From: eric.auger@linaro.org (Eric Auger)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
Date: Wed, 28 Oct 2015 18:17:53 +0100	[thread overview]
Message-ID: <56310341.7010307@linaro.org> (raw)
In-Reply-To: <20151028171410.GK18966@arm.com>

Hi Will,
On 10/28/2015 06:14 PM, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 57d8c37..13fb974 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  {
>>>  	struct vfio_domain *domain;
>>> -	unsigned long bitmap = PAGE_MASK;
>>> +	unsigned long bitmap = ULONG_MAX;
>>
>> Isn't this and removing the WARN_ON()s the only real change in this
>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>> following test, that I don't really understand...
>>
>>>  
>>>  	mutex_lock(&iommu->lock);
>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>  {
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	size_t unmapped = 0;
>>>  	int ret = 0;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>
>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>> care to cap alignment at PAGE_SIZE?
> 
> Eric can clarify, but I think the intention here is to have VFIO continue
> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
> all of the pinning code etc.
That's my intention indeed ;-)

Thanks

Eric
 The IOMMU API can then deal with the smaller
> page size.

> 
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	if (unmap->iova & mask)
>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>  		return -EINVAL;
>>> -	if (!unmap->size || unmap->size & mask)
>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	mutex_lock(&iommu->lock);
>>>  
>>>  	/*
>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	size_t size = map->size;
>>>  	long npage;
>>>  	int ret = 0, prot = 0;
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	unsigned long pfn;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>>  
>>>  	/* Verify that none of our __u64 fields overflow */
>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>  		return -EINVAL;
>>>  
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	/* READ/WRITE from device perspective */
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>  		prot |= IOMMU_WRITE;
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>  		prot |= IOMMU_READ;
>>>  
>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>> +	if (!prot || !size ||
>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>>  	/* Don't allow IOVA or virtual address wrap */
>>
>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>> instance, we can only pin on PAGE_SIZE and therefore we only do
>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>> page, that page gets pinned and accounted 16 times.  Are we going to
>> tell users that their locked memory limit needs to be 16x now?  The rest
>> of the code would need an audit as well to see what other sub-page bugs
>> might be hiding.  Thanks,
> 
> I don't see that. The pinning all happens the same in VFIO, which can
> then happily pass a 64k region to iommu_map. iommu_map will then call
> ->map in 4k chunks on the IOMMU driver ops.
> 
> Will
>

WARNING: multiple messages have this Message-ID (diff)

From: Eric Auger <eric.auger@linaro.org>
To: Will Deacon <will.deacon@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: eric.auger@st.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	suravee.suthikulpanit@amd.com, christoffer.dall@linaro.org,
	linux-kernel@vger.kernel.org, patches@linaro.org
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
Date: Wed, 28 Oct 2015 18:17:53 +0100	[thread overview]
Message-ID: <56310341.7010307@linaro.org> (raw)
In-Reply-To: <20151028171410.GK18966@arm.com>

Hi Will,
On 10/28/2015 06:14 PM, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 57d8c37..13fb974 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  {
>>>  	struct vfio_domain *domain;
>>> -	unsigned long bitmap = PAGE_MASK;
>>> +	unsigned long bitmap = ULONG_MAX;
>>
>> Isn't this and removing the WARN_ON()s the only real change in this
>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>> following test, that I don't really understand...
>>
>>>  
>>>  	mutex_lock(&iommu->lock);
>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>  {
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	size_t unmapped = 0;
>>>  	int ret = 0;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>
>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>> care to cap alignment at PAGE_SIZE?
> 
> Eric can clarify, but I think the intention here is to have VFIO continue
> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
> all of the pinning code etc.
That's my intention indeed ;-)

Thanks

Eric
 The IOMMU API can then deal with the smaller
> page size.

> 
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	if (unmap->iova & mask)
>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>  		return -EINVAL;
>>> -	if (!unmap->size || unmap->size & mask)
>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	mutex_lock(&iommu->lock);
>>>  
>>>  	/*
>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	size_t size = map->size;
>>>  	long npage;
>>>  	int ret = 0, prot = 0;
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	unsigned long pfn;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>>  
>>>  	/* Verify that none of our __u64 fields overflow */
>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>  		return -EINVAL;
>>>  
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	/* READ/WRITE from device perspective */
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>  		prot |= IOMMU_WRITE;
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>  		prot |= IOMMU_READ;
>>>  
>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>> +	if (!prot || !size ||
>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>>  	/* Don't allow IOVA or virtual address wrap */
>>
>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>> instance, we can only pin on PAGE_SIZE and therefore we only do
>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>> page, that page gets pinned and accounted 16 times.  Are we going to
>> tell users that their locked memory limit needs to be 16x now?  The rest
>> of the code would need an audit as well to see what other sub-page bugs
>> might be hiding.  Thanks,
> 
> I don't see that. The pinning all happens the same in VFIO, which can
> then happily pass a 64k region to iommu_map. iommu_map will then call
> ->map in 4k chunks on the IOMMU driver ops.
> 
> Will
>

next prev parent reply	other threads:[~2015-10-28 17:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-28 13:12 [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size Eric Auger
2015-10-28 13:12 ` Eric Auger
2015-10-28 15:37 ` Will Deacon
2015-10-28 15:37   ` Will Deacon
2015-10-28 15:37   ` Will Deacon
2015-10-28 16:27 ` Alex Williamson
2015-10-28 16:27   ` Alex Williamson
2015-10-28 17:10   ` Eric Auger
2015-10-28 17:10     ` Eric Auger
2015-10-28 17:37     ` Alex Williamson
2015-10-28 17:37       ` Alex Williamson
2015-10-28 17:37       ` Alex Williamson
2015-10-28 17:48       ` Eric Auger
2015-10-28 17:48         ` Eric Auger
2015-10-28 17:48         ` Eric Auger
2015-10-28 17:55         ` Will Deacon
2015-10-28 17:55           ` Will Deacon
2015-10-28 18:00           ` Eric Auger
2015-10-28 18:00             ` Eric Auger
2015-10-28 18:00             ` Eric Auger
2015-10-28 18:15             ` Alex Williamson
2015-10-28 18:15               ` Alex Williamson
2015-10-28 18:15               ` Alex Williamson
2015-10-28 17:14   ` Will Deacon
2015-10-28 17:14     ` Will Deacon
2015-10-28 17:17     ` Eric Auger [this message]
2015-10-28 17:17       ` Eric Auger
2015-10-28 17:17       ` Eric Auger
2015-10-28 17:28     ` Alex Williamson
2015-10-28 17:28       ` Alex Williamson
2015-10-28 17:28       ` Alex Williamson
2015-10-28 17:41       ` Eric Auger
2015-10-28 17:41         ` Eric Auger
2015-10-28 17:41         ` Eric Auger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56310341.7010307@linaro.org \
    --to=eric.auger@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=eric.auger@st.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patches@linaro.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.