From: si-wei liu <si-wei.liu@oracle.com>
To: Jason Wang <jasowang@redhat.com>, mst@redhat.com, lingshan.zhu@intel.com
Cc: virtualization@lists.linux-foundation.org,
boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] vhost-vdpa: fix page pinning leakage in error path (rework)
Date: Tue, 03 Nov 2020 17:06:15 -0800 [thread overview]
Message-ID: <5FA1FE87.4050909@oracle.com> (raw)
In-Reply-To: <42fe6ef3-90f6-ddb9-f206-e60c1e98c301@redhat.com>
On 11/3/2020 5:00 AM, Jason Wang wrote:
>
> On 2020/10/30 下午3:45, Si-Wei Liu wrote:
>> Pinned pages are not properly accounted particularly when
>> mapping error occurs on IOTLB update. Clean up dangling
>> pinned pages for the error path.
>>
>> The memory usage for bookkeeping pinned pages is reverted
>> to what it was before: only one single free page is needed.
>> This helps reduce the host memory demand for VM with a large
>> amount of memory, or in the situation where host is running
>> short of free memory.
>>
>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>> drivers/vhost/vdpa.c | 64
>> +++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 46 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index b6d9016..8da8558 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>> if (r)
>> vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>> + else
>> + atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>> return r;
>> }
>> @@ -591,14 +593,16 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>> unsigned int gup_flags = FOLL_LONGTERM;
>> unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>> - unsigned long locked, lock_limit, pinned, i;
>> + unsigned long lock_limit, sz2pin, nchunks, i;
>> u64 iova = msg->iova;
>> + long pinned;
>> int ret = 0;
>> if (vhost_iotlb_itree_first(iotlb, msg->iova,
>> msg->iova + msg->size - 1))
>> return -EEXIST;
>> + /* Limit the use of memory for bookkeeping */
>> page_list = (struct page **) __get_free_page(GFP_KERNEL);
>> if (!page_list)
>> return -ENOMEM;
>> @@ -607,52 +611,64 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> gup_flags |= FOLL_WRITE;
>> npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >>
>> PAGE_SHIFT;
>> - if (!npages)
>> - return -EINVAL;
>> + if (!npages) {
>> + ret = -EINVAL;
>> + goto free;
>> + }
>> mmap_read_lock(dev->mm);
>> - locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>> -
>> - if (locked > lock_limit) {
>> + if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>> ret = -ENOMEM;
>> - goto out;
>> + goto unlock;
>> }
>> cur_base = msg->uaddr & PAGE_MASK;
>> iova &= PAGE_MASK;
>> + nchunks = 0;
>> while (npages) {
>> - pinned = min_t(unsigned long, npages, list_size);
>> - ret = pin_user_pages(cur_base, pinned,
>> - gup_flags, page_list, NULL);
>> - if (ret != pinned)
>> + sz2pin = min_t(unsigned long, npages, list_size);
>> + pinned = pin_user_pages(cur_base, sz2pin,
>> + gup_flags, page_list, NULL);
>> + if (sz2pin != pinned) {
>> + if (pinned < 0) {
>> + ret = pinned;
>> + } else {
>> + unpin_user_pages(page_list, pinned);
>> + ret = -ENOMEM;
>> + }
>> goto out;
>> + }
>> + nchunks++;
>> if (!last_pfn)
>> map_pfn = page_to_pfn(page_list[0]);
>> - for (i = 0; i < ret; i++) {
>> + for (i = 0; i < pinned; i++) {
>> unsigned long this_pfn = page_to_pfn(page_list[i]);
>> u64 csize;
>> if (last_pfn && (this_pfn != last_pfn + 1)) {
>> /* Pin a contiguous chunk of memory */
>> csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>> - if (vhost_vdpa_map(v, iova, csize,
>> - map_pfn << PAGE_SHIFT,
>> - msg->perm))
>> + ret = vhost_vdpa_map(v, iova, csize,
>> + map_pfn << PAGE_SHIFT,
>> + msg->perm);
>> + if (ret)
>> goto out;
>> +
>> map_pfn = this_pfn;
>> iova += csize;
>> + nchunks = 0;
>> }
>> last_pfn = this_pfn;
>> }
>> - cur_base += ret << PAGE_SHIFT;
>> - npages -= ret;
>> + cur_base += pinned << PAGE_SHIFT;
>> + npages -= pinned;
>> }
>> /* Pin the rest chunk */
>> @@ -660,10 +676,22 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> map_pfn << PAGE_SHIFT, msg->perm);
>> out:
>> if (ret) {
>> + if (nchunks && last_pfn) {
>> + unsigned long pfn;
>> +
>> + /*
>> + * Unpin the outstanding pages which are unmapped.
>> + * Mapped pages are accounted in vdpa_map(), thus
>> + * will be handled by vdpa_unmap().
>> + */
>> + for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>> + unpin_user_page(pfn_to_page(pfn));
>> + }
>> vhost_vdpa_unmap(v, msg->iova, msg->size);
>
>
> I want to know what's wrong with current code.
>
> We call vhost_vdpa_unmap() on error which calls
> vhost_vdpa_iotlb_unmap() that will unpin and reduce the pinned_vm.
Think about the case where vhost_vdpa_map() fails in the middle after
making a few successful ones. In the current code, the
vhost_vdpa_iotlb_unmap() unpins what had been mapped, but does not unpin
those that have not yet been mapped. These outstanding pinned pages will
be leaked after leaving the vhost_vdpa_map() function.
Also, the subtraction accounting at the end of the function is incorrect
in that case: @npages is deduced by @pinned in each iteration. That's
why I moved the accounting to vhost_vdpa_map() to be symmetric with
vhost_vdpa_unmap().
-Siwei
>
> Thanks
>
>
>> - atomic64_sub(npages, &dev->mm->pinned_vm);
>> }
>> +unlock:
>> mmap_read_unlock(dev->mm);
>> +free:
>> free_page((unsigned long)page_list);
>> return ret;
>> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
WARNING: multiple messages have this Message-ID (diff)
From: si-wei liu <si-wei.liu@oracle.com>
To: Jason Wang <jasowang@redhat.com>, mst@redhat.com, lingshan.zhu@intel.com
Cc: joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] vhost-vdpa: fix page pinning leakage in error path (rework)
Date: Tue, 03 Nov 2020 17:06:15 -0800 [thread overview]
Message-ID: <5FA1FE87.4050909@oracle.com> (raw)
In-Reply-To: <42fe6ef3-90f6-ddb9-f206-e60c1e98c301@redhat.com>
On 11/3/2020 5:00 AM, Jason Wang wrote:
>
> On 2020/10/30 下午3:45, Si-Wei Liu wrote:
>> Pinned pages are not properly accounted particularly when
>> mapping error occurs on IOTLB update. Clean up dangling
>> pinned pages for the error path.
>>
>> The memory usage for bookkeeping pinned pages is reverted
>> to what it was before: only one single free page is needed.
>> This helps reduce the host memory demand for VM with a large
>> amount of memory, or in the situation where host is running
>> short of free memory.
>>
>> Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> ---
>> drivers/vhost/vdpa.c | 64
>> +++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 46 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index b6d9016..8da8558 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -560,6 +560,8 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
>> if (r)
>> vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1);
>> + else
>> + atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm);
>> return r;
>> }
>> @@ -591,14 +593,16 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>> unsigned int gup_flags = FOLL_LONGTERM;
>> unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>> - unsigned long locked, lock_limit, pinned, i;
>> + unsigned long lock_limit, sz2pin, nchunks, i;
>> u64 iova = msg->iova;
>> + long pinned;
>> int ret = 0;
>> if (vhost_iotlb_itree_first(iotlb, msg->iova,
>> msg->iova + msg->size - 1))
>> return -EEXIST;
>> + /* Limit the use of memory for bookkeeping */
>> page_list = (struct page **) __get_free_page(GFP_KERNEL);
>> if (!page_list)
>> return -ENOMEM;
>> @@ -607,52 +611,64 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> gup_flags |= FOLL_WRITE;
>> npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >>
>> PAGE_SHIFT;
>> - if (!npages)
>> - return -EINVAL;
>> + if (!npages) {
>> + ret = -EINVAL;
>> + goto free;
>> + }
>> mmap_read_lock(dev->mm);
>> - locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
>> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>> -
>> - if (locked > lock_limit) {
>> + if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>> ret = -ENOMEM;
>> - goto out;
>> + goto unlock;
>> }
>> cur_base = msg->uaddr & PAGE_MASK;
>> iova &= PAGE_MASK;
>> + nchunks = 0;
>> while (npages) {
>> - pinned = min_t(unsigned long, npages, list_size);
>> - ret = pin_user_pages(cur_base, pinned,
>> - gup_flags, page_list, NULL);
>> - if (ret != pinned)
>> + sz2pin = min_t(unsigned long, npages, list_size);
>> + pinned = pin_user_pages(cur_base, sz2pin,
>> + gup_flags, page_list, NULL);
>> + if (sz2pin != pinned) {
>> + if (pinned < 0) {
>> + ret = pinned;
>> + } else {
>> + unpin_user_pages(page_list, pinned);
>> + ret = -ENOMEM;
>> + }
>> goto out;
>> + }
>> + nchunks++;
>> if (!last_pfn)
>> map_pfn = page_to_pfn(page_list[0]);
>> - for (i = 0; i < ret; i++) {
>> + for (i = 0; i < pinned; i++) {
>> unsigned long this_pfn = page_to_pfn(page_list[i]);
>> u64 csize;
>> if (last_pfn && (this_pfn != last_pfn + 1)) {
>> /* Pin a contiguous chunk of memory */
>> csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
>> - if (vhost_vdpa_map(v, iova, csize,
>> - map_pfn << PAGE_SHIFT,
>> - msg->perm))
>> + ret = vhost_vdpa_map(v, iova, csize,
>> + map_pfn << PAGE_SHIFT,
>> + msg->perm);
>> + if (ret)
>> goto out;
>> +
>> map_pfn = this_pfn;
>> iova += csize;
>> + nchunks = 0;
>> }
>> last_pfn = this_pfn;
>> }
>> - cur_base += ret << PAGE_SHIFT;
>> - npages -= ret;
>> + cur_base += pinned << PAGE_SHIFT;
>> + npages -= pinned;
>> }
>> /* Pin the rest chunk */
>> @@ -660,10 +676,22 @@ static int
>> vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
>> map_pfn << PAGE_SHIFT, msg->perm);
>> out:
>> if (ret) {
>> + if (nchunks && last_pfn) {
>> + unsigned long pfn;
>> +
>> + /*
>> + * Unpin the outstanding pages which are unmapped.
>> + * Mapped pages are accounted in vdpa_map(), thus
>> + * will be handled by vdpa_unmap().
>> + */
>> + for (pfn = map_pfn; pfn <= last_pfn; pfn++)
>> + unpin_user_page(pfn_to_page(pfn));
>> + }
>> vhost_vdpa_unmap(v, msg->iova, msg->size);
>
>
> I want to know what's wrong with current code.
>
> We call vhost_vdpa_unmap() on error which calls
> vhost_vdpa_iotlb_unmap() that will unpin and reduce the pinned_vm.
Think about the case where vhost_vdpa_map() fails in the middle after
making a few successful ones. In the current code, the
vhost_vdpa_iotlb_unmap() unpins what had been mapped, but does not unpin
those that have not yet been mapped. These outstanding pinned pages will
be leaked after leaving the vhost_vdpa_map() function.
Also, the subtraction accounting at the end of the function is incorrect
in that case: @npages is deduced by @pinned in each iteration. That's
why I moved the accounting to vhost_vdpa_map() to be symmetric with
vhost_vdpa_unmap().
-Siwei
>
> Thanks
>
>
>> - atomic64_sub(npages, &dev->mm->pinned_vm);
>> }
>> +unlock:
>> mmap_read_unlock(dev->mm);
>> +free:
>> free_page((unsigned long)page_list);
>> return ret;
>> }
>
next prev parent reply other threads:[~2020-11-04 1:06 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-30 7:45 [PATCH 1/2] Revert "vhost-vdpa: fix page pinning leakage in error path" Si-Wei Liu
2020-10-30 7:45 ` Si-Wei Liu
2020-10-30 7:45 ` [PATCH 2/2] vhost-vdpa: fix page pinning leakage in error path (rework) Si-Wei Liu
2020-10-30 7:45 ` Si-Wei Liu
2020-11-03 13:00 ` Jason Wang
2020-11-03 13:00 ` Jason Wang
2020-11-04 1:06 ` si-wei liu [this message]
2020-11-04 1:06 ` si-wei liu
2020-11-04 1:08 ` si-wei liu
2020-11-04 1:08 ` si-wei liu
2020-11-04 1:58 ` Jason Wang
2020-11-04 1:58 ` Jason Wang
2020-11-04 2:14 ` si-wei liu
2020-11-04 2:14 ` si-wei liu
2020-11-04 2:42 ` Jason Wang
2020-11-04 2:42 ` Jason Wang
2020-11-04 23:40 ` si-wei liu
2020-11-04 23:40 ` si-wei liu
2020-11-05 3:12 ` Jason Wang
2020-11-05 3:12 ` Jason Wang
2020-11-05 22:40 ` si-wei liu
2020-11-05 22:40 ` si-wei liu
2020-11-03 8:53 ` [PATCH 1/2] Revert "vhost-vdpa: fix page pinning leakage in error path" Jason Wang
2020-11-03 8:53 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5FA1FE87.4050909@oracle.com \
--to=si-wei.liu@oracle.com \
--cc=boris.ostrovsky@oracle.com \
--cc=jasowang@redhat.com \
--cc=lingshan.zhu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.