public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow
Date: Tue, 13 Dec 2022 14:40:24 -0500	[thread overview]
Message-ID: <e8037fea-d4e6-4f51-165c-bb23d90e221f@oracle.com> (raw)
In-Reply-To: <20221213122918.024f43c5.alex.williamson@redhat.com>

On 12/13/2022 2:29 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 13:21:15 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 12/13/2022 1:17 PM, Steven Sistare wrote:
>>> On 12/13/2022 1:02 PM, Alex Williamson wrote:  
>>>> On Tue, 13 Dec 2022 07:46:56 -0800
>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>  
>>>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>>>
>>>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>>>> the rlimit.  Now that mediated devices are excluded when using
>>>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>>>> mapping.
>>>>>
>>>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.  
>>>>
>>>> Where is this enforced?  
>>>
>>> In vfio_dma_do_unmap:
>>>         if (invalidate_vaddr) {
>>>                 if (dma->vaddr_invalid) {
>>>                         ...
>>>                         ret = -EINVAL;  
>>
>> My bad, this is a different case, and my comment in the commit message is
>> incorrect.  I should test mm != dma->mm during unmap as well, and suppress
>> the locked_vm deduction there.
> 
> I'm getting confused how this patch actually does anything.  We grab
> the mm of the task doing mappings, and we swap that grab when updating
> the vaddr, but vfio_lock_acct() uses the original dma->task mm for
> accounting.  Therefore how can an underflow occur?  It seems we're
> simply failing to adjust locked_vm for the new mm at all.

The old code saves dma->task, but not dma->task->mm.  The task's mm changes 
across exec.

>>>>> Underflow may still occur in a buggy application that fails to invalidate
>>>>> all before exec.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>> index f81e925..e5a02f8 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>>  	struct task_struct	*task;
>>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>>  	unsigned long		*bitmap;
>>>>> +	struct mm_struct	*mm;
>>>>>  };
>>>>>  
>>>>>  struct vfio_batch {
>>>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>>>  	vfio_unlink_dma(iommu, dma);
>>>>>  	put_task_struct(dma->task);
>>>>> +	mmdrop(dma->mm);
>>>>>  	vfio_dma_bitmap_free(dma);
>>>>>  	if (dma->vaddr_invalid) {
>>>>>  		iommu->vaddr_invalid_count--;
>>>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  			dma->vaddr = vaddr;
>>>>>  			dma->vaddr_invalid = false;
>>>>>  			iommu->vaddr_invalid_count--;
>>>>> +			if (current->mm != dma->mm) {
>>>>> +				mmdrop(dma->mm);
>>>>> +				dma->mm = current->mm;
>>>>> +				mmgrab(dma->mm);
>>>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>>>> +						     0);  
>>>>
>>>> What does it actually mean if this fails?  The pages are still pinned.
>>>> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,  
>>>
>>> If this fails, the user has locked additional memory after exec and before making
>>> this call -- more than was locked before exec -- and the rlimit is exceeded.
>>> A misbehaving application, which will only hurt itself.
>>>
>>> However, I should reorder these, and check ret before changing the other state.
> 
> The result would then be that the mapping remains with vaddr_invalid on
> error?  Thanks,

Correct.  In theory the app could recover by releasing the extra locked memory that
it grabbed, or increase its rlimit, and then try map_flag_vaddr again.

- Steve

>>>>> +			}
>>>>>  			wake_up_all(&iommu->vaddr_wait);
>>>>>  		}
>>>>>  		goto out_unlock;
>>>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  	get_task_struct(current->group_leader);
>>>>>  	dma->task = current->group_leader;
>>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>>> +	dma->mm = dma->task->mm;
>>>>> +	mmgrab(dma->mm);
>>>>>  
>>>>>  	dma->pfn_list = RB_ROOT;
>>>>>    
>>>>  
>>
> 

      reply	other threads:[~2022-12-13 19:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13 15:46 [PATCH V1 0/2] fixes for virtual address update Steve Sistare
2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
2022-12-13 16:26   ` Alex Williamson
2022-12-13 16:54     ` Steven Sistare
2022-12-13 17:31       ` Alex Williamson
2022-12-13 17:42         ` Steven Sistare
2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare
2022-12-13 18:02   ` Alex Williamson
2022-12-13 18:17     ` Steven Sistare
2022-12-13 18:21       ` Steven Sistare
2022-12-13 19:29         ` Alex Williamson
2022-12-13 19:40           ` Steven Sistare [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8037fea-d4e6-4f51-165c-bb23d90e221f@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox