qemu-devel.nongnu.org archive mirror
* [PATCH 0/4] linux-user: fix several mremap bugs
@ 2025-10-11 20:03 Matthew Lugg
  2025-10-11 20:03 ` [PATCH 1/4] linux-user: fix mremap unmapping adjacent region Matthew Lugg
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable, Matthew Lugg

I was recently debugging a strange crash in a downstream project which turned
out to be a QEMU bug in the linux-user `mremap` implementation. In practice,
this bug can lead to arbitrary memory regions being unmapped when a 32-bit
guest, running on a 64-bit host, uses `mremap` to shrink a memory mapping.
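
For illustration, a minimal sketch of the scenario (assuming 4 KiB pages, and
relying on the two anonymous mappings ending up adjacent, which is typical
when the guest address space is pre-reserved):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    int main(void) {
        char *a = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *b = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* Shrink `a` by one page. With the bug, QEMU unmaps the page following
           a's *original* end -- which, if the mappings are adjacent, is the
           first page of `b`. */
        mremap(a, 8192, 4096, 0);
        memset(b, 0, 8192); /* may fault under the buggy behavior */
        return 0;
    }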

The first patch in this set resolves that bug. Since the patch is very simple
and the bug is quite likely to be hit, I suspect it is a good candidate for
qemu-stable.

The following two patches resolve two more bugs I became aware of whilst
working on this code; I believe their commit messages contain all the
necessary context. They are less critical and the fixes are more complex, so
they are probably not suitable for backporting to qemu-stable.

The final commit adds tcg tests for the fixed `mremap` behavior. The third
fix is unfortunately difficult to test programmatically, but I have confirmed
that it behaves as expected by observing the output of `strace qemu-i386
repro`, where `repro` is the following C program:

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>
    int main(void) {
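        /* The non-page-aligned 4097-byte sizes are deliberate: they exercise
           how QEMU rounds the old range it re-reserves after the move (the
           bug addressed by the third fix). */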
        char *a = mmap(NULL, 4097, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *b = mmap(NULL, 4097, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mremap(b, 4097, 4097, MREMAP_FIXED | MREMAP_MAYMOVE, a);
        // QEMU has now leaked a page of its memory reservation!
        return 0;
    }

Prior to the patch, as the comment says, QEMU leaks a page of its address space
reservation (i.e. the page becomes unmapped). After the patch, QEMU correctly
reclaims that page with `mmap`.

Matthew Lugg (4):
  linux-user: fix mremap unmapping adjacent region
  linux-user: fix mremap errors for invalid ranges
  linux-user: fix reserved_va page leak in do_munmap
  tests: add tcg coverage for fixed mremap bugs

 linux-user/mmap.c               | 75 +++++++++++++--------------------
 tests/tcg/multiarch/test-mmap.c | 47 ++++++++++++++++++---
 2 files changed, 71 insertions(+), 51 deletions(-)

-- 
2.51.0




* [PATCH 1/4] linux-user: fix mremap unmapping adjacent region
  2025-10-11 20:03 [PATCH 0/4] linux-user: fix several mremap bugs Matthew Lugg
@ 2025-10-11 20:03 ` Matthew Lugg
  2025-10-20 15:08   ` Peter Maydell
  2025-10-21 19:44   ` Richard Henderson
  2025-10-11 20:03 ` [PATCH 2/4] linux-user: fix mremap errors for invalid ranges Matthew Lugg
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable, Matthew Lugg

This typo meant that calls to `mremap` which shrink a mapping by some N
bytes would, when the virtual address space was pre-reserved (e.g.
32-bit guest on 64-bit host), unmap the N bytes following the *original*
mapping.

Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
---
 linux-user/mmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 847092a28a..ec8392b35b 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1164,7 +1164,8 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
                     errno = ENOMEM;
                     host_addr = MAP_FAILED;
                 } else if (reserved_va && old_size > new_size) {
-                    mmap_reserve_or_unmap(old_addr + old_size,
+                    /* Re-reserve pages we just shrunk out of the mapping */
+                    mmap_reserve_or_unmap(old_addr + new_size,
                                           old_size - new_size);
                 }
             }
-- 
2.51.0




* [PATCH 2/4] linux-user: fix mremap errors for invalid ranges
  2025-10-11 20:03 [PATCH 0/4] linux-user: fix several mremap bugs Matthew Lugg
  2025-10-11 20:03 ` [PATCH 1/4] linux-user: fix mremap unmapping adjacent region Matthew Lugg
@ 2025-10-11 20:03 ` Matthew Lugg
  2025-10-20 15:10   ` Peter Maydell
  2025-10-21 19:50   ` Richard Henderson
  2025-10-11 20:03 ` [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap Matthew Lugg
  2025-10-11 20:03 ` [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs Matthew Lugg
  3 siblings, 2 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable, Matthew Lugg

If an address range given to `mremap` is invalid (exceeds addressing
bounds on the guest), we were previously returning `ENOMEM`, which is
not correct. The manpage and the Linux kernel implementation both agree
that if `old_addr`/`old_size` refer to an invalid address, `EFAULT` is
returned, and if `new_addr`/`new_size` refer to an invalid address,
`EINVAL` is returned.
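
Concretely, with this patch the two failure modes look as follows (a sketch
assuming a 32-bit guest with 4 KiB pages; the addresses and sizes are
arbitrary, chosen only so that the ranges exceed the guest address space):

    #define _GNU_SOURCE
    #include <assert.h>
    #include <errno.h>
    #include <sys/mman.h>
    int main(void) {
        /* Old range extends past the top of the address space -> EFAULT. */
        void *p = mremap((void *)0xf0000000u, 0x20000000u, 0x1000, 0);
        assert(p == MAP_FAILED && errno == EFAULT);

        /* Valid old mapping, but growing it in place would extend past the
           top of the address space and MREMAP_MAYMOVE is not set -> EINVAL. */
        void *old = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        p = mremap(old, 4096, 0xfffff000u, 0);
        assert(p == MAP_FAILED && errno == EINVAL);
        return 0;
    }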

Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
---
 linux-user/mmap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index ec8392b35b..4c5fe832ad 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1103,12 +1103,15 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
     int prot;
     void *host_addr;
 
-    if (!guest_range_valid_untagged(old_addr, old_size) ||
-        ((flags & MREMAP_FIXED) &&
+    if (!guest_range_valid_untagged(old_addr, old_size)) {
+        errno = EFAULT;
+        return -1;
+    }
+    if (((flags & MREMAP_FIXED) &&
          !guest_range_valid_untagged(new_addr, new_size)) ||
         ((flags & MREMAP_MAYMOVE) == 0 &&
          !guest_range_valid_untagged(old_addr, new_size))) {
-        errno = ENOMEM;
+        errno = EINVAL;
         return -1;
     }
 
-- 
2.51.0




* [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap
  2025-10-11 20:03 [PATCH 0/4] linux-user: fix several mremap bugs Matthew Lugg
  2025-10-11 20:03 ` [PATCH 1/4] linux-user: fix mremap unmapping adjacent region Matthew Lugg
  2025-10-11 20:03 ` [PATCH 2/4] linux-user: fix mremap errors for invalid ranges Matthew Lugg
@ 2025-10-11 20:03 ` Matthew Lugg
  2025-10-20 16:00   ` Peter Maydell
  2025-10-21 19:57   ` Richard Henderson
  2025-10-11 20:03 ` [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs Matthew Lugg
  3 siblings, 2 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable, Matthew Lugg

The previous logic here had an off-by-one error: assuming 4k pages on
host and guest, if `len == 4097` (indicating to unmap 2 pages), then
`last = start + 4096`, so `real_last = start + 4095`, so ultimately
`real_len = 4096`. I do not believe this could cause any observable bugs
in guests, because `target_munmap` page-aligns the length it passes in.
However, calls to this function in `target_mremap` do not page-align the
length, so those calls could "drop" pages, leading to a part of the
reserved region becoming unmapped. At worst, a host allocation could get
mapped into that hole, then clobbered by a new guest mapping.

A simple fix didn't feel ideal here, because I think this function was
not written as well as it could be. Instead, the logic is simpler if we
use `end = start + len` instead of `last = start + len - 1` (overflow
does not cause any problem here), and use offsets in the loops (avoiding
overflows since the offset is never larger than the host page size).
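
For the example above, the new formulation gives
`end = ROUND_UP(start + 4097, TARGET_PAGE_SIZE) = start + 8192`, so with a
page-aligned `start` (and no partially-covered host pages to retain) we get
`real_end - real_start = 8192`, and both pages are re-reserved or unmapped as
expected.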

Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
---
 linux-user/mmap.c | 63 ++++++++++++++++-------------------------------
 1 file changed, 21 insertions(+), 42 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 4c5fe832ad..e1ed9085c7 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1014,59 +1014,38 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 static int mmap_reserve_or_unmap(abi_ulong start, abi_ulong len)
 {
     int host_page_size = qemu_real_host_page_size();
+    abi_ulong end;
     abi_ulong real_start;
-    abi_ulong real_last;
-    abi_ulong real_len;
-    abi_ulong last;
-    abi_ulong a;
+    abi_ulong real_end;
+    abi_ulong off;
     void *host_start;
     int prot;
 
-    last = start + len - 1;
+    end = ROUND_UP(start + len, TARGET_PAGE_SIZE);
+
     real_start = start & -host_page_size;
-    real_last = ROUND_UP(last, host_page_size) - 1;
+    real_end = ROUND_UP(end, host_page_size);
 
-    /*
-     * If guest pages remain on the first or last host pages,
-     * adjust the deallocation to retain those guest pages.
-     * The single page special case is required for the last page,
-     * lest real_start overflow to zero.
-     */
-    if (real_last - real_start < host_page_size) {
-        prot = 0;
-        for (a = real_start; a < start; a += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(a);
-        }
-        for (a = last; a < real_last; a += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(a + 1);
-        }
-        if (prot != 0) {
-            return 0;
-        }
-    } else {
-        for (prot = 0, a = real_start; a < start; a += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(a);
-        }
-        if (prot != 0) {
-            real_start += host_page_size;
-        }
+    /* end or real_end may have overflowed to 0, but that's okay. */
 
-        for (prot = 0, a = last; a < real_last; a += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(a + 1);
-        }
-        if (prot != 0) {
-            real_last -= host_page_size;
-        }
+    /* If [real_start,start) contains a mapped guest page, retain the first page. */
+    for (prot = 0, off = 0; off < start - real_start; off += TARGET_PAGE_SIZE) {
+        prot |= page_get_flags(real_start + off);
+    }
+    if (prot != 0) {
+        real_start += host_page_size;
+    }
 
-        if (real_last < real_start) {
-            return 0;
-        }
+    /* If [end,real_end) contains a mapped guest page, retain the last page. */
+    for (prot = 0, off = 0; off < real_end - end; off += TARGET_PAGE_SIZE) {
+        prot |= page_get_flags(end + off);
+    }
+    if (prot != 0) {
+        real_end -= host_page_size;
     }
 
-    real_len = real_last - real_start + 1;
     host_start = g2h_untagged(real_start);
-
-    return do_munmap(host_start, real_len);
+    return do_munmap(host_start, real_end - real_start);
 }
 
 int target_munmap(abi_ulong start, abi_ulong len)
-- 
2.51.0




* [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs
  2025-10-11 20:03 [PATCH 0/4] linux-user: fix several mremap bugs Matthew Lugg
                   ` (2 preceding siblings ...)
  2025-10-11 20:03 ` [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap Matthew Lugg
@ 2025-10-11 20:03 ` Matthew Lugg
  2025-10-11 20:15   ` Matthew Lugg
  2025-10-20 15:26   ` Peter Maydell
  3 siblings, 2 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable, Matthew Lugg

These tests cover the first two fixes in this patch series. The third fix
is not covered because the bug it fixes is not easily observable by the
guest.

Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
---
 tests/tcg/multiarch/test-mmap.c | 47 +++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/tests/tcg/multiarch/test-mmap.c b/tests/tcg/multiarch/test-mmap.c
index 96257f8ebe..64df694d1a 100644
--- a/tests/tcg/multiarch/test-mmap.c
+++ b/tests/tcg/multiarch/test-mmap.c
@@ -22,6 +22,7 @@
  * along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdint.h>
@@ -36,12 +37,12 @@
 do                                                             \
 {                                                              \
   if (!(x)) {                                                  \
-    fprintf(stderr, "FAILED at %s:%d\n", __FILE__, __LINE__); \
+    fprintf(stderr, " FAILED at %s:%d\n", __FILE__, __LINE__); \
     exit (EXIT_FAILURE);                                       \
   }                                                            \
 } while (0)
 
-unsigned char *dummybuf;
+unsigned char *dummybuf; /* length is 2*pagesize */
 static unsigned int pagesize;
 static unsigned int pagemask;
 int test_fd;
@@ -439,21 +440,56 @@ void check_invalid_mmaps(void)
 {
     unsigned char *addr;
 
+    fprintf(stdout, "%s", __func__);
+
     /* Attempt to map a zero length page.  */
     addr = mmap(NULL, 0, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
     fail_unless(addr == MAP_FAILED);
     fail_unless(errno == EINVAL);
 
     /* Attempt to map a over length page.  */
     addr = mmap(NULL, -4, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
     fail_unless(addr == MAP_FAILED);
     fail_unless(errno == ENOMEM);
 
+    /* Attempt to remap a region which exceeds the bounds of memory. */
+    addr = mremap((void *)((uintptr_t)pagesize * 10), SIZE_MAX & ~(size_t)pagemask, pagesize, 0);
+    fail_unless(addr == MAP_FAILED);
+    fail_unless(errno == EFAULT);
+
     fprintf(stdout, " passed\n");
 }
 
+void check_shrink_mmaps(void)
+{
+    unsigned char *a, *b, *c;
+    a = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    b = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    c = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+    fail_unless(a != MAP_FAILED);
+    fail_unless(b != MAP_FAILED);
+    fail_unless(c != MAP_FAILED);
+
+    /* Ensure we can read the full mappings */
+    memcpy(dummybuf, a, 2 * pagesize);
+    memcpy(dummybuf, b, 2 * pagesize);
+    memcpy(dummybuf, c, 2 * pagesize);
+
+    /* Shrink the middle mapping in-place; the others should be unaffected */
+    b = mremap(b, pagesize * 2, pagesize, 0);
+    fail_unless(b != MAP_FAILED);
+
+    /* Ensure we can still access all valid mappings */
+    memcpy(dummybuf, a, 2 * pagesize);
+    memcpy(dummybuf, b, pagesize);
+    memcpy(dummybuf, c, 2 * pagesize);
+
+    munmap(a, 2 * pagesize);
+    munmap(b, pagesize);
+    munmap(c, 2 * pagesize);
+}
+
 int main(int argc, char **argv)
 {
 	char tempname[] = "/tmp/.cmmapXXXXXX";
@@ -468,7 +504,7 @@ int main(int argc, char **argv)
 
 	/* Assume pagesize is a power of two.  */
 	pagemask = pagesize - 1;
-	dummybuf = malloc (pagesize);
+	dummybuf = malloc (pagesize * 2);
 	printf ("pagesize=%u pagemask=%x\n", pagesize, pagemask);
 
 	test_fd = mkstemp(tempname);
@@ -496,6 +532,7 @@ int main(int argc, char **argv)
 	check_file_fixed_eof_mmaps();
 	check_file_unfixed_eof_mmaps();
 	check_invalid_mmaps();
+    check_shrink_mmaps();
 
 	/* Fails at the moment.  */
 	/* check_aligned_anonymous_fixed_mmaps_collide_with_host(); */
-- 
2.51.0




* Re: [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs
  2025-10-11 20:03 ` [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs Matthew Lugg
@ 2025-10-11 20:15   ` Matthew Lugg
  2025-10-20 15:26   ` Peter Maydell
  1 sibling, 0 replies; 13+ messages in thread
From: Matthew Lugg @ 2025-10-11 20:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent, qemu-stable

I only just noticed I accidentally changed a line in a macro at the top of
this file ("FAILED" -> " FAILED"); that was unintentional. I'll omit that
change from future versions of this patch.

On 10/11/25 21:03, Matthew Lugg wrote:
> These tests cover the first two fixes in this patch series. The third fix
> is not covered because the bug it fixes is not easily observable by the
> guest.
>
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---
>   tests/tcg/multiarch/test-mmap.c | 47 +++++++++++++++++++++++++++++----
>   1 file changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/tests/tcg/multiarch/test-mmap.c b/tests/tcg/multiarch/test-mmap.c
> index 96257f8ebe..64df694d1a 100644
> --- a/tests/tcg/multiarch/test-mmap.c
> +++ b/tests/tcg/multiarch/test-mmap.c
> @@ -22,6 +22,7 @@
>    * along with this program; if not, see <http://www.gnu.org/licenses/>.
>    */
>   
> +#define _GNU_SOURCE
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <stdint.h>
> @@ -36,12 +37,12 @@
>   do                                                             \
>   {                                                              \
>     if (!(x)) {                                                  \
> -    fprintf(stderr, "FAILED at %s:%d\n", __FILE__, __LINE__); \
> +    fprintf(stderr, " FAILED at %s:%d\n", __FILE__, __LINE__); \
>       exit (EXIT_FAILURE);                                       \
>     }                                                            \
>   } while (0)
>   
> -unsigned char *dummybuf;
> +unsigned char *dummybuf; /* length is 2*pagesize */
>   static unsigned int pagesize;
>   static unsigned int pagemask;
>   int test_fd;
> @@ -439,21 +440,56 @@ void check_invalid_mmaps(void)
>   {
>       unsigned char *addr;
>   
> +    fprintf(stdout, "%s", __func__);
> +
>       /* Attempt to map a zero length page.  */
>       addr = mmap(NULL, 0, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
>       fail_unless(addr == MAP_FAILED);
>       fail_unless(errno == EINVAL);
>   
>       /* Attempt to map a over length page.  */
>       addr = mmap(NULL, -4, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
>       fail_unless(addr == MAP_FAILED);
>       fail_unless(errno == ENOMEM);
>   
> +    /* Attempt to remap a region which exceeds the bounds of memory. */
> +    addr = mremap((void *)((uintptr_t)pagesize * 10), SIZE_MAX & ~(size_t)pagemask, pagesize, 0);
> +    fail_unless(addr == MAP_FAILED);
> +    fail_unless(errno == EFAULT);
> +
>       fprintf(stdout, " passed\n");
>   }
>   
> +void check_shrink_mmaps(void)
> +{
> +    unsigned char *a, *b, *c;
> +    a = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +    b = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +    c = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +
> +    fail_unless(a != MAP_FAILED);
> +    fail_unless(b != MAP_FAILED);
> +    fail_unless(c != MAP_FAILED);
> +
> +    /* Ensure we can read the full mappings */
> +    memcpy(dummybuf, a, 2 * pagesize);
> +    memcpy(dummybuf, b, 2 * pagesize);
> +    memcpy(dummybuf, c, 2 * pagesize);
> +
> +    /* Shrink the middle mapping in-place; the others should be unaffected */
> +    b = mremap(b, pagesize * 2, pagesize, 0);
> +    fail_unless(b != MAP_FAILED);
> +
> +    /* Ensure we can still access all valid mappings */
> +    memcpy(dummybuf, a, 2 * pagesize);
> +    memcpy(dummybuf, b, pagesize);
> +    memcpy(dummybuf, c, 2 * pagesize);
> +
> +    munmap(a, 2 * pagesize);
> +    munmap(b, pagesize);
> +    munmap(c, 2 * pagesize);
> +}
> +
>   int main(int argc, char **argv)
>   {
>   	char tempname[] = "/tmp/.cmmapXXXXXX";
> @@ -468,7 +504,7 @@ int main(int argc, char **argv)
>   
>   	/* Assume pagesize is a power of two.  */
>   	pagemask = pagesize - 1;
> -	dummybuf = malloc (pagesize);
> +	dummybuf = malloc (pagesize * 2);
>   	printf ("pagesize=%u pagemask=%x\n", pagesize, pagemask);
>   
>   	test_fd = mkstemp(tempname);
> @@ -496,6 +532,7 @@ int main(int argc, char **argv)
>   	check_file_fixed_eof_mmaps();
>   	check_file_unfixed_eof_mmaps();
>   	check_invalid_mmaps();
> +    check_shrink_mmaps();
>   
>   	/* Fails at the moment.  */
>   	/* check_aligned_anonymous_fixed_mmaps_collide_with_host(); */



* Re: [PATCH 1/4] linux-user: fix mremap unmapping adjacent region
  2025-10-11 20:03 ` [PATCH 1/4] linux-user: fix mremap unmapping adjacent region Matthew Lugg
@ 2025-10-20 15:08   ` Peter Maydell
  2025-10-21 19:44   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2025-10-20 15:08 UTC (permalink / raw)
  To: Matthew Lugg; +Cc: qemu-devel, laurent, qemu-stable

On Sat, 11 Oct 2025 at 21:20, Matthew Lugg <mlugg@mlugg.co.uk> wrote:
>
> This typo meant that calls to `mremap` which shrink a mapping by some N
> bytes would, when the virtual address space was pre-reserved (e.g.
> 32-bit guest on 64-bit host), unmap the N bytes following the *original*
> mapping.
>
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---
>  linux-user/mmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index 847092a28a..ec8392b35b 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -1164,7 +1164,8 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
>                      errno = ENOMEM;
>                      host_addr = MAP_FAILED;
>                  } else if (reserved_va && old_size > new_size) {
> -                    mmap_reserve_or_unmap(old_addr + old_size,
> +                    /* Re-reserve pages we just shrunk out of the mapping */
> +                    mmap_reserve_or_unmap(old_addr + new_size,
>                                            old_size - new_size);
>                  }
>              }
> --

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I agree with your cover letter:

Cc: qemu-stable@nongnu.org

I think this has been broken for a long time, right back to
the introduction of pre-allocated guest address space in
commit 68a1c8168, where the code was clearly confused between
old_size and new_size:

+            if (host_addr != MAP_FAILED && reserved_va && old_size > new_size) {
+                mmap_reserve(old_addr + old_size, new_size - old_size);
+            }

In 2020 commit 257a7e212d5e5 fixed half of this problem
(swapping the two sizes in the second argument to
mmap_reserve()) but we didn't notice that the first
argument was also wrong.

thanks
-- PMM



* Re: [PATCH 2/4] linux-user: fix mremap errors for invalid ranges
  2025-10-11 20:03 ` [PATCH 2/4] linux-user: fix mremap errors for invalid ranges Matthew Lugg
@ 2025-10-20 15:10   ` Peter Maydell
  2025-10-21 19:50   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2025-10-20 15:10 UTC (permalink / raw)
  To: Matthew Lugg; +Cc: qemu-devel, laurent, qemu-stable

On Sat, 11 Oct 2025 at 21:21, Matthew Lugg <mlugg@mlugg.co.uk> wrote:
>
> If an address range given to `mremap` is invalid (exceeds addressing
> bounds on the guest), we were previously returning `ENOMEM`, which is
> not correct. The manpage and the Linux kernel implementation both agree
> that if `old_addr`/`old_size` refer to an invalid address, `EFAULT` is
> returned, and if `new_addr`/`new_size` refer to an invalid address,
> `EINVAL` is returned.
>
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs
  2025-10-11 20:03 ` [PATCH 4/4] tests: add tcg coverage for fixed mremap bugs Matthew Lugg
  2025-10-11 20:15   ` Matthew Lugg
@ 2025-10-20 15:26   ` Peter Maydell
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2025-10-20 15:26 UTC (permalink / raw)
  To: Matthew Lugg; +Cc: qemu-devel, laurent, qemu-stable

On Sat, 11 Oct 2025 at 21:20, Matthew Lugg <mlugg@mlugg.co.uk> wrote:
>
> These tests cover the first two fixes in this patch series. The third fix
> is not covered because the bug it fixes is not easily observable by the
> guest.
>
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---
>  tests/tcg/multiarch/test-mmap.c | 47 +++++++++++++++++++++++++++++----
>  1 file changed, 42 insertions(+), 5 deletions(-)

The test case itself looks good, so my review comments
below are just minor nits.

> diff --git a/tests/tcg/multiarch/test-mmap.c b/tests/tcg/multiarch/test-mmap.c
> index 96257f8ebe..64df694d1a 100644
> --- a/tests/tcg/multiarch/test-mmap.c
> +++ b/tests/tcg/multiarch/test-mmap.c
> @@ -22,6 +22,7 @@
>   * along with this program; if not, see <http://www.gnu.org/licenses/>.
>   */
>
> +#define _GNU_SOURCE

Why do we need to define this now ?

>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <stdint.h>
> @@ -36,12 +37,12 @@
>  do                                                             \
>  {                                                              \
>    if (!(x)) {                                                  \
> -    fprintf(stderr, "FAILED at %s:%d\n", __FILE__, __LINE__); \
> +    fprintf(stderr, " FAILED at %s:%d\n", __FILE__, __LINE__); \

I think that this is trying to fix a cosmetic bug in
the printing of error messages: the tests each print
some line without a newline, like:
  fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
and then for the passing case we add a space and complete the line:
  fprintf(stdout, " passed\n");

but this fail_unless() macro is not adding the space, so
presumably we print something awkward like
check_invalid_mmaps addr=0x12435468FAILED at ...

But we should separate out this trivial cleanup from
the patch adding a new test case.

>      exit (EXIT_FAILURE);                                       \
>    }                                                            \
>  } while (0)
>
> -unsigned char *dummybuf;
> +unsigned char *dummybuf; /* length is 2*pagesize */
>  static unsigned int pagesize;
>  static unsigned int pagemask;
>  int test_fd;
> @@ -439,21 +440,56 @@ void check_invalid_mmaps(void)
>  {
>      unsigned char *addr;
>
> +    fprintf(stdout, "%s", __func__);
> +
>      /* Attempt to map a zero length page.  */
>      addr = mmap(NULL, 0, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
>      fail_unless(addr == MAP_FAILED);
>      fail_unless(errno == EINVAL);
>
>      /* Attempt to map a over length page.  */
>      addr = mmap(NULL, -4, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -    fprintf(stdout, "%s addr=%p", __func__, (void *)addr);
>      fail_unless(addr == MAP_FAILED);
>      fail_unless(errno == ENOMEM);

Can we leave the printfs for the existing test cases alone?
You can add a new one for your new subcase:
       fprintf(stdout, "%s mremap addr=%p", __func__, addr);

> +    /* Attempt to remap a region which exceeds the bounds of memory. */
> +    addr = mremap((void *)((uintptr_t)pagesize * 10), SIZE_MAX & ~(size_t)pagemask, pagesize, 0);
> +    fail_unless(addr == MAP_FAILED);
> +    fail_unless(errno == EFAULT);
> +
>      fprintf(stdout, " passed\n");
>  }
>
> +void check_shrink_mmaps(void)
> +{
> +    unsigned char *a, *b, *c;
> +    a = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +    b = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +    c = mmap(NULL, pagesize * 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +
> +    fail_unless(a != MAP_FAILED);
> +    fail_unless(b != MAP_FAILED);
> +    fail_unless(c != MAP_FAILED);
> +
> +    /* Ensure we can read the full mappings */
> +    memcpy(dummybuf, a, 2 * pagesize);
> +    memcpy(dummybuf, b, 2 * pagesize);
> +    memcpy(dummybuf, c, 2 * pagesize);
> +
> +    /* Shrink the middle mapping in-place; the others should be unaffected */
> +    b = mremap(b, pagesize * 2, pagesize, 0);
> +    fail_unless(b != MAP_FAILED);
> +
> +    /* Ensure we can still access all valid mappings */
> +    memcpy(dummybuf, a, 2 * pagesize);
> +    memcpy(dummybuf, b, pagesize);
> +    memcpy(dummybuf, c, 2 * pagesize);
> +
> +    munmap(a, 2 * pagesize);
> +    munmap(b, pagesize);
> +    munmap(c, 2 * pagesize);
> +}
> +
>  int main(int argc, char **argv)
>  {
>         char tempname[] = "/tmp/.cmmapXXXXXX";
> @@ -468,7 +504,7 @@ int main(int argc, char **argv)
>
>         /* Assume pagesize is a power of two.  */
>         pagemask = pagesize - 1;
> -       dummybuf = malloc (pagesize);
> +       dummybuf = malloc (pagesize * 2);
>         printf ("pagesize=%u pagemask=%x\n", pagesize, pagemask);
>
>         test_fd = mkstemp(tempname);
> @@ -496,6 +532,7 @@ int main(int argc, char **argv)
>         check_file_fixed_eof_mmaps();
>         check_file_unfixed_eof_mmaps();
>         check_invalid_mmaps();
> +    check_shrink_mmaps();

I was going to complain about indent on this line, but the
problem seems to be that the file is incorrectly
indented with hardcoded tabs for parts of it.

>         /* Fails at the moment.  */
>         /* check_aligned_anonymous_fixed_mmaps_collide_with_host(); */

thanks
-- PMM



* Re: [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap
  2025-10-11 20:03 ` [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap Matthew Lugg
@ 2025-10-20 16:00   ` Peter Maydell
  2025-10-21 19:57   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2025-10-20 16:00 UTC (permalink / raw)
  To: Matthew Lugg; +Cc: qemu-devel, laurent, qemu-stable

On Sat, 11 Oct 2025 at 21:20, Matthew Lugg <mlugg@mlugg.co.uk> wrote:
>
> The previous logic here had an off-by-one error: assuming 4k pages on
> host and guest, if `len == 4097` (indicating to unmap 2 pages), then
> `last = start + 4096`, so `real_last = start + 4095`, so ultimately
> `real_len = 4096`. I do not believe this could cause any observable bugs
> in guests, because `target_munmap` page-aligns the length it passes in.
> However, calls to this function in `target_mremap` do not page-align the
> length, so those calls could "drop" pages, leading to a part of the
> reserved region becoming unmapped. At worst, a host allocation could get
> mapped into that hole, then clobbered by a new guest mapping.
>
> A simple fix didn't feel ideal here, because I think this function was
> not written as well as it could be. Instead, the logic is simpler if we
> use `end = start + len` instead of `last = start + len - 1` (overflow
> does not cause any problem here), and use offsets in the loops (avoiding
> overflows since the offset is never larger than the host page size).

I don't really understand this code, I'm just looking at
it fresh, so my comment below might be wrong.

> -    /*
> -     * If guest pages remain on the first or last host pages,
> -     * adjust the deallocation to retain those guest pages.
> -     * The single page special case is required for the last page,
> -     * lest real_start overflow to zero.
> -     */

This comment says we need the special case for
"real_last - real_start < host_page_size" to avoid an overflow.

> -    if (real_last - real_start < host_page_size) {
> -        prot = 0;

We delete the special case...

> -        for (a = real_start; a < start; a += TARGET_PAGE_SIZE) {
> -            prot |= page_get_flags(a);
> -        }
> -        for (a = last; a < real_last; a += TARGET_PAGE_SIZE) {
> -            prot |= page_get_flags(a + 1);
> -        }
> -        if (prot != 0) {
> -            return 0;
> -        }
> -    } else {
> -        for (prot = 0, a = real_start; a < start; a += TARGET_PAGE_SIZE) {
> -            prot |= page_get_flags(a);
> -        }
> -        if (prot != 0) {
> -            real_start += host_page_size;
> -        }
> +    /* end or real_end may have overflowed to 0, but that's okay. */
>
> -        for (prot = 0, a = last; a < real_last; a += TARGET_PAGE_SIZE) {
> -            prot |= page_get_flags(a + 1);
> -        }
> -        if (prot != 0) {
> -            real_last -= host_page_size;
> -        }
> +    /* If [real_start,start) contains a mapped guest page, retain the first page. */
> +    for (prot = 0, off = 0; off < start - real_start; off += TARGET_PAGE_SIZE) {
> +        prot |= page_get_flags(real_start + off);
> +    }
> +    if (prot != 0) {
> +        real_start += host_page_size;

...and now if real_start was the last page in the
address space, this addition will overflow it to zero.

> +    }
>
> -        if (real_last < real_start) {
> -            return 0;
> -        }
> +    /* If [end,real_end) contains a mapped guest page, retain the last page. */
> +    for (prot = 0, off = 0; off < real_end - end; off += TARGET_PAGE_SIZE) {
> +        prot |= page_get_flags(end + off);
> +    }
> +    if (prot != 0) {
> +        real_end -= host_page_size;
>      }
>
> -    real_len = real_last - real_start + 1;
>      host_start = g2h_untagged(real_start);
> -
> -    return do_munmap(host_start, real_len);
> +    return do_munmap(host_start, real_end - real_start);
>  }
>
>  int target_munmap(abi_ulong start, abi_ulong len)

thanks
-- PMM



* Re: [PATCH 1/4] linux-user: fix mremap unmapping adjacent region
  2025-10-11 20:03 ` [PATCH 1/4] linux-user: fix mremap unmapping adjacent region Matthew Lugg
  2025-10-20 15:08   ` Peter Maydell
@ 2025-10-21 19:44   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2025-10-21 19:44 UTC (permalink / raw)
  To: qemu-devel

On 10/11/25 15:03, Matthew Lugg wrote:
> This typo meant that calls to `mremap` which shrink a mapping by some N
> bytes would, when the virtual address space was pre-reserved (e.g.
> 32-bit guest on 64-bit host), unmap the N bytes following the *original*
> mapping.
> 
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---
>   linux-user/mmap.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index 847092a28a..ec8392b35b 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -1164,7 +1164,8 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
>                       errno = ENOMEM;
>                       host_addr = MAP_FAILED;
>                   } else if (reserved_va && old_size > new_size) {
> -                    mmap_reserve_or_unmap(old_addr + old_size,
> +                    /* Re-reserve pages we just shrunk out of the mapping */
> +                    mmap_reserve_or_unmap(old_addr + new_size,
>                                             old_size - new_size);
>                   }
>               }

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 2/4] linux-user: fix mremap errors for invalid ranges
  2025-10-11 20:03 ` [PATCH 2/4] linux-user: fix mremap errors for invalid ranges Matthew Lugg
  2025-10-20 15:10   ` Peter Maydell
@ 2025-10-21 19:50   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2025-10-21 19:50 UTC (permalink / raw)
  To: qemu-devel

On 10/11/25 15:03, Matthew Lugg wrote:
> If an address range given to `mremap` is invalid (exceeds addressing
> bounds on the guest), we were previously returning `ENOMEM`, which is
> not correct. The manpage and the Linux kernel implementation both agree
> that if `old_addr`/`old_size` refer to an invalid address, `EFAULT` is
> returned, and if `new_addr`/`new_size` refer to an invalid address,
> `EINVAL` is returned.
> 
> Signed-off-by: Matthew Lugg <mlugg@mlugg.co.uk>
> ---
>   linux-user/mmap.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index ec8392b35b..4c5fe832ad 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -1103,12 +1103,15 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
>       int prot;
>       void *host_addr;
>   
> -    if (!guest_range_valid_untagged(old_addr, old_size) ||
> -        ((flags & MREMAP_FIXED) &&
> +    if (!guest_range_valid_untagged(old_addr, old_size)) {
> +        errno = EFAULT;
> +        return -1;
> +    }
> +    if (((flags & MREMAP_FIXED) &&
>            !guest_range_valid_untagged(new_addr, new_size)) ||
>           ((flags & MREMAP_MAYMOVE) == 0 &&
>            !guest_range_valid_untagged(old_addr, new_size))) {
> -        errno = ENOMEM;
> +        errno = EINVAL;
>           return -1;
>       }
>   

The order of the checks here is wrong.  We should match do_remap and check_mremap_params. 
In particular, it appears as if all of the EINVAL checks come first.


r~



* Re: [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap
  2025-10-11 20:03 ` [PATCH 3/4] linux-user: fix reserved_va page leak in do_munmap Matthew Lugg
  2025-10-20 16:00   ` Peter Maydell
@ 2025-10-21 19:57   ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2025-10-21 19:57 UTC (permalink / raw)
  To: qemu-devel

On 10/11/25 15:03, Matthew Lugg wrote:
> The previous logic here had an off-by-one error: assuming 4k pages on
> host and guest, if `len == 4097` (indicating to unmap 2 pages), then
> `last = start + 4096`, so `real_last = start + 4095`, so ultimately
> `real_len = 4096`. I do not believe this could cause any observable bugs
> in guests, because `target_munmap` page-aligns the length it passes in.
> However, calls to this function in `target_mremap` do not page-align the
> length, so those calls could "drop" pages, leading to a part of the
> reserved region becoming unmapped. At worst, a host allocation could get
> mapped into that hole, then clobbered by a new guest mapping.
> 
> A simple fix didn't feel ideal here, because I think this function was
> not written as well as it could be. Instead, the logic is simpler if we
> use `end = start + len` instead of `last = start + len - 1` (overflow
> does not cause any problem here), and use offsets in the loops (avoiding
> overflows since the offset is never larger than the host page size).

No, it is not simpler with 'end', because end == 0 is 'valid' in the sense that the range 
goes from [start, (abi_ptr)-1].  But having end <= start is awkward in the extreme.

Thus we prefer the inclusive range [start, last] to the exclusive range [start, end).
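
As a concrete example, consider a 32-bit guest unmapping the final page of
its address space:

  start = 0xfffff000, len = 0x1000
  end   = start + len      -> wraps to 0 as an abi_ulong
  last  = start + len - 1   = 0xffffffff, still ordered after start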

Not everything has been converted away from 'end', but we certainly should not regress 
existing code.


r~


