Kernel KVM virtualization development
 help / color / mirror / Atom feed
* [PATCH] vhost/vdpa: reject overflowing PA map page counts
@ 2026-06-24 19:06 Yousef Alhouseen
  2026-06-24 19:53 ` Michael S. Tsirkin
  2026-06-25 19:07 ` sashiko-bot
  0 siblings, 2 replies; 4+ messages in thread
From: Yousef Alhouseen @ 2026-06-24 19:06 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Pérez
  Cc: kvm, virtualization, netdev, linux-kernel, Yousef Alhouseen

vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
size before computing the number of pages to pin. If that addition wraps,
the code can pin and map fewer pages than the requested IOTLB range.

Reject sizes that overflow the page-count calculation. Also make the
memlock check subtraction-based so a large page count cannot wrap the
pinned page total.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 drivers/vhost/vdpa.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ac55275fa..090cb8693 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	unsigned int gup_flags = FOLL_LONGTERM;
 	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
 	unsigned long lock_limit, sz2pin, nchunks, i;
+	unsigned long page_offset;
+	u64 pinned_vm;
 	u64 start = iova;
 	long pinned;
 	int ret = 0;
@@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	if (perm & VHOST_ACCESS_WO)
 		gup_flags |= FOLL_WRITE;
 
-	npages = PFN_UP(size + (iova & ~PAGE_MASK));
+	page_offset = iova & ~PAGE_MASK;
+	if (size > ULONG_MAX - page_offset) {
+		ret = -EINVAL;
+		goto free;
+	}
+	npages = PFN_UP(size + page_offset);
 	if (!npages) {
 		ret = -EINVAL;
 		goto free;
@@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	mmap_read_lock(dev->mm);
 
 	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
-	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
+	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
+	if (npages > lock_limit || pinned_vm > lock_limit - npages) {
 		ret = -ENOMEM;
 		goto unlock;
 	}
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
  2026-06-24 19:06 [PATCH] vhost/vdpa: reject overflowing PA map page counts Yousef Alhouseen
@ 2026-06-24 19:53 ` Michael S. Tsirkin
  2026-06-24 21:47   ` Yousef Alhouseen
  2026-06-25 19:07 ` sashiko-bot
  1 sibling, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2026-06-24 19:53 UTC (permalink / raw)
  To: Yousef Alhouseen
  Cc: Jason Wang, Eugenio Pérez, kvm, virtualization, netdev,
	linux-kernel

On Wed, Jun 24, 2026 at 09:06:53PM +0200, Yousef Alhouseen wrote:
> vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
> size before computing the number of pages to pin. If that addition wraps,
> the code can pin and map fewer pages than the requested IOTLB range.
> 
> Reject sizes that overflow the page-count calculation.

You should add "on 32 bit systems" - I do not see how it can
overflow on 64 bit.

> Also make the
> memlock check subtraction-based so a large page count cannot wrap the
> pinned page total.

I don't see how this can happen at all - pinned_vm is in units of pages.

> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
>  drivers/vhost/vdpa.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index ac55275fa..090cb8693 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	unsigned int gup_flags = FOLL_LONGTERM;
>  	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>  	unsigned long lock_limit, sz2pin, nchunks, i;
> +	unsigned long page_offset;
> +	u64 pinned_vm;
>  	u64 start = iova;
>  	long pinned;
>  	int ret = 0;
> @@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	if (perm & VHOST_ACCESS_WO)
>  		gup_flags |= FOLL_WRITE;
>  
> -	npages = PFN_UP(size + (iova & ~PAGE_MASK));
> +	page_offset = iova & ~PAGE_MASK;
> +	if (size > ULONG_MAX - page_offset) {
> +		ret = -EINVAL;
> +		goto free;
> +	}
> +	npages = PFN_UP(size + page_offset);
>  	if (!npages) {
>  		ret = -EINVAL;
>  		goto free;
> @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	mmap_read_lock(dev->mm);
>  
>  	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> -	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> +	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> +	if (npages > lock_limit || pinned_vm > lock_limit - npages) {
>  		ret = -ENOMEM;
>  		goto unlock;
>  	}
> -- 
> 2.54.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
  2026-06-24 19:53 ` Michael S. Tsirkin
@ 2026-06-24 21:47   ` Yousef Alhouseen
  0 siblings, 0 replies; 4+ messages in thread
From: Yousef Alhouseen @ 2026-06-24 21:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Eugenio Pérez, kvm, virtualization, netdev,
	linux-kernel

On Wed, Jun 24, 2026 at 01:53:38PM -0400, Michael S. Tsirkin wrote:
> You should add "on 32 bit systems" - I do not see how it can
> overflow on 64 bit.

Right, the overflow I was trying to cover is the unsigned long
page-count calculation on 32-bit systems, where size can be wider than
unsigned long and the page offset is added before PFN_UP(). I should
have made that scope explicit in the changelog.

> I don't see how this can happen at all - pinned_vm is in units of pages.

Agreed, that part is not needed for this fix. I'll drop the memlock
check change and send a v2 with the changelog clarified to say this is
for 32-bit systems.

Thanks,
Yousef


On Wed, 24 Jun 2026 15:53:38 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jun 24, 2026 at 09:06:53PM +0200, Yousef Alhouseen wrote:
> > vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
> > size before computing the number of pages to pin. If that addition wraps,
> > the code can pin and map fewer pages than the requested IOTLB range.
> >
> > Reject sizes that overflow the page-count calculation.
>
> You should add "on 32 bit systems" - I do not see how it can
> overflow on 64 bit.
>
> > Also make the
> > memlock check subtraction-based so a large page count cannot wrap the
> > pinned page total.
>
> I don't see how this can happen at all - pinned_vm is in units of pages.
>
> > Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> > ---
> > drivers/vhost/vdpa.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index ac55275fa..090cb8693 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > unsigned int gup_flags = FOLL_LONGTERM;
> > unsigned long npages, cur_base, map_pfn, last_pfn = 0;
> > unsigned long lock_limit, sz2pin, nchunks, i;
> > + unsigned long page_offset;
> > + u64 pinned_vm;
> > u64 start = iova;
> > long pinned;
> > int ret = 0;
> > @@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > if (perm & VHOST_ACCESS_WO)
> > gup_flags |= FOLL_WRITE;
> >
> > - npages = PFN_UP(size + (iova & ~PAGE_MASK));
> > + page_offset = iova & ~PAGE_MASK;
> > + if (size > ULONG_MAX - page_offset) {
> > + ret = -EINVAL;
> > + goto free;
> > + }
> > + npages = PFN_UP(size + page_offset);
> > if (!npages) {
> > ret = -EINVAL;
> > goto free;
> > @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > mmap_read_lock(dev->mm);
> >
> > lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> > - if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> > + pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> > + if (npages > lock_limit || pinned_vm > lock_limit - npages) {
> > ret = -ENOMEM;
> > goto unlock;
> > }
> > --
> > 2.54.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
  2026-06-24 19:06 [PATCH] vhost/vdpa: reject overflowing PA map page counts Yousef Alhouseen
  2026-06-24 19:53 ` Michael S. Tsirkin
@ 2026-06-25 19:07 ` sashiko-bot
  1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-25 19:07 UTC (permalink / raw)
  To: Yousef Alhouseen; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The variable `last_pfn` uses `0` as a sentinel value, which breaks the physical discontinuity check if PFN 0 is pinned, allowing an attacker to map arbitrary physical memory into the device IOMMU.
- [High] Permanent leak of pinned memory pages in the error handling path due to flawed state tracking with the `nchunks` variable.
- [High] TOCTOU race in RLIMIT_MEMLOCK check allows users to bypass the locked memory limit by concurrently mapping pages across multiple vhost-vdpa devices.
--

commit b62fefdce516922a639203f628dde15b26a4b097
Author: Yousef Alhouseen <alhouseenyousef@gmail.com>

vhost/vdpa: reject overflowing PA map page counts

This commit adds a check in vhost_vdpa_pa_map to prevent integer overflows
when adding the size and page offset. It also converts the pinned memory
limit check to use subtraction to prevent wrapping.

> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index ac55275fa0d0a..090cb8693b7ce 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	unsigned int gup_flags = FOLL_LONGTERM;
>  	unsigned long npages, cur_base, map_pfn, last_pfn = 0;

[Severity: Critical]
This is a pre-existing issue, but does using 0 as a sentinel for last_pfn
break the discontinuity check if PFN 0 is pinned?

If PFN 0 is pinned, last_pfn becomes 0, and this check is silently bypassed
for all subsequent pages:

vhost_vdpa_pa_map() {
    ...
    if (last_pfn && (this_pfn != last_pfn + 1)) {
    ...
}

Could this allow discontiguous physical pages to be incorrectly grouped
together and map arbitrary physical memory into the device IOMMU?

[ ... ]

> @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	mmap_read_lock(dev->mm);
>  
>  	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> -	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> +	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> +	if (npages > lock_limit || pinned_vm > lock_limit - npages) {

[Severity: High]
This isn't a bug introduced by this patch, but does this limit check contain
a race condition?

Because pinned_vm is read without global serialization across the shared
mm_struct, multiple threads could concurrently issue ioctls to update the
IOTLB. Each thread takes its own device mutex and a shared mmap_read_lock.

If all threads concurrently read pinned_vm here, couldn't they all pass the
limit check, pin memory, and increment pinned_vm later, effectively bypassing
the limit?

>  		ret = -ENOMEM;
>  		goto unlock;
>  	}

[Severity: High]
This is a pre-existing issue, but is there a leak of pinned memory pages in
the error handling path of this function?

When a discontiguous page is encountered, the previous chunk is mapped and
nchunks is cleared to 0:

vhost_vdpa_pa_map() {
    ...
    if (last_pfn && (this_pfn != last_pfn + 1)) {
        ...
        ret = vhost_vdpa_map(...)
        ...
        nchunks = 0;
    }
    ...
}

However, the inner loop immediately continues processing the new page, making
it the start of a new outstanding chunk. If a subsequent pin_user_pages()
fails in the next iteration of the outer loop, the code jumps to the out
label:

vhost_vdpa_pa_map() {
    ...
out:
    if (ret) {
        if (nchunks) {
            ...
            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
                unpin_user_page(pfn_to_page(pfn));
        }
        ...
}

Because nchunks was left at 0, doesn't it skip unpinning the outstanding
pages?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624190653.2893-1-alhouseenyousef@gmail.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-25 19:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 19:06 [PATCH] vhost/vdpa: reject overflowing PA map page counts Yousef Alhouseen
2026-06-24 19:53 ` Michael S. Tsirkin
2026-06-24 21:47   ` Yousef Alhouseen
2026-06-25 19:07 ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox