public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow
Date: Tue, 13 Dec 2022 13:21:15 -0500	[thread overview]
Message-ID: <43ad3256-e485-b358-6445-35645d943b7b@oracle.com> (raw)
In-Reply-To: <69e68902-eed9-748a-887a-549c717ebe01@oracle.com>

On 12/13/2022 1:17 PM, Steven Sistare wrote:
> On 12/13/2022 1:02 PM, Alex Williamson wrote:
>> On Tue, 13 Dec 2022 07:46:56 -0800
>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>
>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>
>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>> the rlimit.  Now that mediated devices are excluded when using
>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>> mapping.
>>>
>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
>>
>> Where is this enforced?
> 
> In vfio_dma_do_unmap:
>         if (invalidate_vaddr) {
>                 if (dma->vaddr_invalid) {
>                         ...
>                         ret = -EINVAL;

My bad, this is a different case, and my comment in the commit message is
incorrect.  I should test mm != dma->mm during unmap as well, and suppress
the locked_vm deduction there.

- Steve

>>> Underflow may still occur in a buggy application that fails to invalidate
>>> all before exec.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index f81e925..e5a02f8 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>  	struct task_struct	*task;
>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>  	unsigned long		*bitmap;
>>> +	struct mm_struct	*mm;
>>>  };
>>>  
>>>  struct vfio_batch {
>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>  	vfio_unlink_dma(iommu, dma);
>>>  	put_task_struct(dma->task);
>>> +	mmdrop(dma->mm);
>>>  	vfio_dma_bitmap_free(dma);
>>>  	if (dma->vaddr_invalid) {
>>>  		iommu->vaddr_invalid_count--;
>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  			dma->vaddr = vaddr;
>>>  			dma->vaddr_invalid = false;
>>>  			iommu->vaddr_invalid_count--;
>>> +			if (current->mm != dma->mm) {
>>> +				mmdrop(dma->mm);
>>> +				dma->mm = current->mm;
>>> +				mmgrab(dma->mm);
>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>> +						     0);
>>
>> What does it actually mean if this fails?  The pages are still pinned.
>> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,
> 
> If this fails, the user has locked additional memory after exec and before making
> this call -- more than was locked before exec -- and the rlimit is exceeded.
> A misbehaving application, which will only hurt itself.
> 
> However, I should reorder these, and check ret before changing the other state.
> 
> - Steve
> 
>>> +			}
>>>  			wake_up_all(&iommu->vaddr_wait);
>>>  		}
>>>  		goto out_unlock;
>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	get_task_struct(current->group_leader);
>>>  	dma->task = current->group_leader;
>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>> +	dma->mm = dma->task->mm;
>>> +	mmgrab(dma->mm);
>>>  
>>>  	dma->pfn_list = RB_ROOT;
>>>  
>>

  reply	other threads:[~2022-12-13 18:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13 15:46 [PATCH V1 0/2] fixes for virtual address update Steve Sistare
2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
2022-12-13 16:26   ` Alex Williamson
2022-12-13 16:54     ` Steven Sistare
2022-12-13 17:31       ` Alex Williamson
2022-12-13 17:42         ` Steven Sistare
2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare
2022-12-13 18:02   ` Alex Williamson
2022-12-13 18:17     ` Steven Sistare
2022-12-13 18:21       ` Steven Sistare [this message]
2022-12-13 19:29         ` Alex Williamson
2022-12-13 19:40           ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43ad3256-e485-b358-6445-35645d943b7b@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox