From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Auger <eric.auger@linaro.org>
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support
 PAGE_SIZE size
Date: Wed, 28 Oct 2015 18:41:51 +0100
Message-ID: <563108DF.6070406@linaro.org>
References: <1446037965-2341-1-git-send-email-eric.auger@linaro.org>
 <1446049648.8018.397.camel@redhat.com> <20151028171410.GK18966@arm.com>
 <1446053322.8018.402.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <kvmarm-bounces@lists.cs.columbia.edu>
Received: from localhost (localhost [127.0.0.1])
 by mm01.cs.columbia.edu (Postfix) with ESMTP id 0FAF641022
 for <kvmarm@lists.cs.columbia.edu>; Wed, 28 Oct 2015 13:39:05 -0400 (EDT)
Received: from mm01.cs.columbia.edu ([127.0.0.1])
 by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id J22QOxGMPeFR for <kvmarm@lists.cs.columbia.edu>;
 Wed, 28 Oct 2015 13:39:04 -0400 (EDT)
Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com
 [209.85.212.179])
 by mm01.cs.columbia.edu (Postfix) with ESMTPS id C6B3B40FA2
 for <kvmarm@lists.cs.columbia.edu>; Wed, 28 Oct 2015 13:39:03 -0400 (EDT)
Received: by wikq8 with SMTP id q8so262169028wik.1
 for <kvmarm@lists.cs.columbia.edu>; Wed, 28 Oct 2015 10:41:57 -0700 (PDT)
In-Reply-To: <1446053322.8018.402.camel@redhat.com>
List-Unsubscribe: <https://lists.cs.columbia.edu/mailman/options/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=unsubscribe>
List-Archive: <https://lists.cs.columbia.edu/pipermail/kvmarm>
List-Post: <mailto:kvmarm@lists.cs.columbia.edu>
List-Help: <mailto:kvmarm-request@lists.cs.columbia.edu?subject=help>
List-Subscribe: <https://lists.cs.columbia.edu/mailman/listinfo/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=subscribe>
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu
To: Alex Williamson <alex.williamson@redhat.com>, Will Deacon <will.deacon@arm.com>
Cc: eric.auger@st.com, kvm@vger.kernel.org, patches@linaro.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
List-Id: kvmarm@lists.cs.columbia.edu

Alex,
On 10/28/2015 06:28 PM, Alex Williamson wrote:
> On Wed, 2015-10-28 at 17:14 +0000, Will Deacon wrote:
>> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 57d8c37..13fb974 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  {
>>>>  	struct vfio_domain *domain;
>>>> -	unsigned long bitmap = PAGE_MASK;
>>>> +	unsigned long bitmap = ULONG_MAX;
>>>
>>> Isn't this and removing the WARN_ON()s the only real change in this
>>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>>> following test, that I don't really understand...
>>>
>>>>  
>>>>  	mutex_lock(&iommu->lock);
>>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>>  {
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	size_t unmapped = 0;
>>>>  	int ret = 0;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>
>>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>>> care to cap alignment at PAGE_SIZE?
>>
>> Eric can clarify, but I think the intention here is to have VFIO continue
>> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
>> all of the pinning code etc. The IOMMU API can then deal with the smaller
>> page size.
> 
> Gak, I read this wrong.  So really we're just artificially adding
> PAGE_SIZE as a supported IOMMU size so long as the IOMMU support
> something smaller than PAGE_SIZE, where PAGE_SIZE is obviously a
> multiple of that smaller size.  Ok, but should we just do this once in
> vfio_pgsize_bitmap()?  This is exactly why VT-d just reports ~(4k - 1)
> for the iommu bitmap.
Yes I can do this in vfio_pgsize_bitmap if you prefer.

Thanks

Eric

> 
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	if (unmap->iova & mask)
>>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>>  		return -EINVAL;
>>>> -	if (!unmap->size || unmap->size & mask)
>>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	mutex_lock(&iommu->lock);
>>>>  
>>>>  	/*
>>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	size_t size = map->size;
>>>>  	long npage;
>>>>  	int ret = 0, prot = 0;
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	unsigned long pfn;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>>  
>>>>  	/* Verify that none of our __u64 fields overflow */
>>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>>  		return -EINVAL;
>>>>  
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	/* READ/WRITE from device perspective */
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>>  		prot |= IOMMU_WRITE;
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>>  		prot |= IOMMU_READ;
>>>>  
>>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>>> +	if (!prot || !size ||
>>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>>  	/* Don't allow IOVA or virtual address wrap */
>>>
>>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>>> instance, we can only pin on PAGE_SIZE and therefore we only do
>>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>>> page, that page gets pinned and accounted 16 times.  Are we going to
>>> tell users that their locked memory limit needs to be 16x now?  The rest
>>> of the code would need an audit as well to see what other sub-page bugs
>>> might be hiding.  Thanks,
>>
>> I don't see that. The pinning all happens the same in VFIO, which can
>> then happily pass a 64k region to iommu_map. iommu_map will then call
>> ->map in 4k chunks on the IOMMU driver ops.
> 
> Yep, I see now that this isn't doing sub-page mappings.  Thanks,
> 
> Alex
> 

From mboxrd@z Thu Jan  1 00:00:00 1970
From: eric.auger@linaro.org (Eric Auger)
Date: Wed, 28 Oct 2015 18:41:51 +0100
Subject: [RFC] vfio/type1: handle case where IOMMU does not support
 PAGE_SIZE size
In-Reply-To: <1446053322.8018.402.camel@redhat.com>
References: <1446037965-2341-1-git-send-email-eric.auger@linaro.org>
 <1446049648.8018.397.camel@redhat.com> <20151028171410.GK18966@arm.com>
 <1446053322.8018.402.camel@redhat.com>
Message-ID: <563108DF.6070406@linaro.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Alex,
On 10/28/2015 06:28 PM, Alex Williamson wrote:
> On Wed, 2015-10-28 at 17:14 +0000, Will Deacon wrote:
>> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 57d8c37..13fb974 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  {
>>>>  	struct vfio_domain *domain;
>>>> -	unsigned long bitmap = PAGE_MASK;
>>>> +	unsigned long bitmap = ULONG_MAX;
>>>
>>> Isn't this and removing the WARN_ON()s the only real change in this
>>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>>> following test, that I don't really understand...
>>>
>>>>  
>>>>  	mutex_lock(&iommu->lock);
>>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>>  {
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	size_t unmapped = 0;
>>>>  	int ret = 0;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>
>>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>>> care to cap alignment at PAGE_SIZE?
>>
>> Eric can clarify, but I think the intention here is to have VFIO continue
>> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
>> all of the pinning code etc. The IOMMU API can then deal with the smaller
>> page size.
> 
> Gak, I read this wrong.  So really we're just artificially adding
> PAGE_SIZE as a supported IOMMU size so long as the IOMMU support
> something smaller than PAGE_SIZE, where PAGE_SIZE is obviously a
> multiple of that smaller size.  Ok, but should we just do this once in
> vfio_pgsize_bitmap()?  This is exactly why VT-d just reports ~(4k - 1)
> for the iommu bitmap.
Yes I can do this in vfio_pgsize_bitmap if you prefer.

Thanks

Eric

> 
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	if (unmap->iova & mask)
>>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>>  		return -EINVAL;
>>>> -	if (!unmap->size || unmap->size & mask)
>>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	mutex_lock(&iommu->lock);
>>>>  
>>>>  	/*
>>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	size_t size = map->size;
>>>>  	long npage;
>>>>  	int ret = 0, prot = 0;
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	unsigned long pfn;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>>  
>>>>  	/* Verify that none of our __u64 fields overflow */
>>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>>  		return -EINVAL;
>>>>  
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	/* READ/WRITE from device perspective */
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>>  		prot |= IOMMU_WRITE;
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>>  		prot |= IOMMU_READ;
>>>>  
>>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>>> +	if (!prot || !size ||
>>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>>  	/* Don't allow IOVA or virtual address wrap */
>>>
>>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>>> instance, we can only pin on PAGE_SIZE and therefore we only do
>>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>>> page, that page gets pinned and accounted 16 times.  Are we going to
>>> tell users that their locked memory limit needs to be 16x now?  The rest
>>> of the code would need an audit as well to see what other sub-page bugs
>>> might be hiding.  Thanks,
>>
>> I don't see that. The pinning all happens the same in VFIO, which can
>> then happily pass a 64k region to iommu_map. iommu_map will then call
>> ->map in 4k chunks on the IOMMU driver ops.
> 
> Yep, I see now that this isn't doing sub-page mappings.  Thanks,
> 
> Alex
> 

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755624AbbJ1RmC (ORCPT <rfc822;w@1wt.eu>);
	Wed, 28 Oct 2015 13:42:02 -0400
Received: from mail-wi0-f171.google.com ([209.85.212.171]:37789 "EHLO
	mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755291AbbJ1Rl7 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 28 Oct 2015 13:41:59 -0400
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support
 PAGE_SIZE size
To: Alex Williamson <alex.williamson@redhat.com>,
        Will Deacon <will.deacon@arm.com>
References: <1446037965-2341-1-git-send-email-eric.auger@linaro.org>
 <1446049648.8018.397.camel@redhat.com> <20151028171410.GK18966@arm.com>
 <1446053322.8018.402.camel@redhat.com>
Cc: eric.auger@st.com, linux-arm-kernel@lists.infradead.org,
        kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
        suravee.suthikulpanit@amd.com, christoffer.dall@linaro.org,
        linux-kernel@vger.kernel.org, patches@linaro.org
From: Eric Auger <eric.auger@linaro.org>
Message-ID: <563108DF.6070406@linaro.org>
Date: Wed, 28 Oct 2015 18:41:51 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <1446053322.8018.402.camel@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Alex,
On 10/28/2015 06:28 PM, Alex Williamson wrote:
> On Wed, 2015-10-28 at 17:14 +0000, Will Deacon wrote:
>> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 57d8c37..13fb974 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  {
>>>>  	struct vfio_domain *domain;
>>>> -	unsigned long bitmap = PAGE_MASK;
>>>> +	unsigned long bitmap = ULONG_MAX;
>>>
>>> Isn't this and removing the WARN_ON()s the only real change in this
>>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>>> following test, that I don't really understand...
>>>
>>>>  
>>>>  	mutex_lock(&iommu->lock);
>>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>>  {
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	size_t unmapped = 0;
>>>>  	int ret = 0;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>
>>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>>> care to cap alignment at PAGE_SIZE?
>>
>> Eric can clarify, but I think the intention here is to have VFIO continue
>> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
>> all of the pinning code etc. The IOMMU API can then deal with the smaller
>> page size.
> 
> Gak, I read this wrong.  So really we're just artificially adding
> PAGE_SIZE as a supported IOMMU size so long as the IOMMU support
> something smaller than PAGE_SIZE, where PAGE_SIZE is obviously a
> multiple of that smaller size.  Ok, but should we just do this once in
> vfio_pgsize_bitmap()?  This is exactly why VT-d just reports ~(4k - 1)
> for the iommu bitmap.
Yes I can do this in vfio_pgsize_bitmap if you prefer.

Thanks

Eric

> 
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	if (unmap->iova & mask)
>>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>>  		return -EINVAL;
>>>> -	if (!unmap->size || unmap->size & mask)
>>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	mutex_lock(&iommu->lock);
>>>>  
>>>>  	/*
>>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	size_t size = map->size;
>>>>  	long npage;
>>>>  	int ret = 0, prot = 0;
>>>> -	uint64_t mask;
>>>>  	struct vfio_dma *dma;
>>>>  	unsigned long pfn;
>>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>>> +						PAGE_SIZE : min_pagesz;
>>>>  
>>>>  	/* Verify that none of our __u64 fields overflow */
>>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>>  		return -EINVAL;
>>>>  
>>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>>> -
>>>> -	WARN_ON(mask & PAGE_MASK);
>>>> -
>>>>  	/* READ/WRITE from device perspective */
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>>  		prot |= IOMMU_WRITE;
>>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>>  		prot |= IOMMU_READ;
>>>>  
>>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>>> +	if (!prot || !size ||
>>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>>  		return -EINVAL;
>>>>  
>>>>  	/* Don't allow IOVA or virtual address wrap */
>>>
>>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>>> instance, we can only pin on PAGE_SIZE and therefore we only do
>>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>>> page, that page gets pinned and accounted 16 times.  Are we going to
>>> tell users that their locked memory limit needs to be 16x now?  The rest
>>> of the code would need an audit as well to see what other sub-page bugs
>>> might be hiding.  Thanks,
>>
>> I don't see that. The pinning all happens the same in VFIO, which can
>> then happily pass a 64k region to iommu_map. iommu_map will then call
>> ->map in 4k chunks on the IOMMU driver ops.
> 
> Yep, I see now that this isn't doing sub-page mappings.  Thanks,
> 
> Alex
>