qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/8] linux-user: brk fixes
@ 2023-08-01 23:27 Helge Deller
  2023-08-01 23:27 ` [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host Helge Deller
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

This patch series is a fix-up for some current problems
regarding heap memory / brk handling in qemu which happens
on some 32-bit platforms, e.g. problems loading static
binaries.

This series includes the 5 patches from Akihiko Odaki
with some additional fixes and cleanups by me.

Akihiko Odaki (5):
  linux-user: Unset MAP_FIXED_NOREPLACE for host
  linux-user: Do not call get_errno() in do_brk()
  linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
  linux-user: Do nothing if too small brk is specified
  linux-user: Do not align brk with host page size

Helge Deller (3):
  linux-user: Show heap address in /proc/pid/maps
  linux-user: Optimize memory layout for static and dynamic executables
  linux-user: Load pie executables at upper memory

 include/exec/cpu_ldst.h |  4 +--
 linux-user/elfload.c    | 59 ++++++++++--------------------
 linux-user/loader.h     | 12 +++++++
 linux-user/main.c       |  2 ++
 linux-user/mmap.c       | 35 ++++++++++--------
 linux-user/qemu.h       |  4 +--
 linux-user/syscall.c    | 80 ++++++++++++-----------------------------
 7 files changed, 79 insertions(+), 117 deletions(-)

--
2.41.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-01 23:40   ` Richard Henderson
  2023-08-01 23:27 ` [PATCH v6 2/8] linux-user: Do not call get_errno() in do_brk() Helge Deller
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Passing MAP_FIXED_NOREPLACE to host will fail if the virtual
address space is reserved with mmap. Replace it with MAP_FIXED.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/mmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index a5dfb56545..2f26cbaf5d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -610,6 +610,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
             goto fail;
         }

+        flags = (flags & ~MAP_FIXED_NOREPLACE) | MAP_FIXED;
+
         /*
          * worst case: we cannot map the file because the offset is not
          * aligned, so we read it
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 2/8] linux-user: Do not call get_errno() in do_brk()
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
  2023-08-01 23:27 ` [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-01 23:27 ` [PATCH v6 3/8] linux-user: Use MAP_FIXED_NOREPLACE for do_brk() Helge Deller
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Later the returned value is compared with -1, and negated errno is not
expected.

Fixes: 00faf08c95 ("linux-user: Don't use MAP_FIXED in do_brk()")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 95727a816a..b9d2ec02f9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -862,9 +862,9 @@ abi_long do_brk(abi_ulong brk_val)
      */
     if (new_host_brk_page > brk_page) {
         new_alloc_size = new_host_brk_page - brk_page;
-        mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
-                                        PROT_READ|PROT_WRITE,
-                                        MAP_ANON|MAP_PRIVATE, 0, 0));
+        mapped_addr = target_mmap(brk_page, new_alloc_size,
+                                  PROT_READ|PROT_WRITE,
+                                  MAP_ANON|MAP_PRIVATE, 0, 0);
     } else {
         new_alloc_size = 0;
         mapped_addr = brk_page;
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 3/8] linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
  2023-08-01 23:27 ` [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host Helge Deller
  2023-08-01 23:27 ` [PATCH v6 2/8] linux-user: Do not call get_errno() in do_brk() Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-01 23:27 ` [PATCH v6 4/8] linux-user: Do nothing if too small brk is specified Helge Deller
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

From: Akihiko Odaki <akihiko.odaki@daynix.com>

MAP_FIXED_NOREPLACE can ensure the mapped address is fixed without
concerning that the new mapping overwrites something else.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/syscall.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b9d2ec02f9..ac429a185a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -854,17 +854,12 @@ abi_long do_brk(abi_ulong brk_val)
         return target_brk;
     }

-    /* We need to allocate more memory after the brk... Note that
-     * we don't use MAP_FIXED because that will map over the top of
-     * any existing mapping (like the one with the host libc or qemu
-     * itself); instead we treat "mapped but at wrong address" as
-     * a failure and unmap again.
-     */
     if (new_host_brk_page > brk_page) {
         new_alloc_size = new_host_brk_page - brk_page;
         mapped_addr = target_mmap(brk_page, new_alloc_size,
-                                  PROT_READ|PROT_WRITE,
-                                  MAP_ANON|MAP_PRIVATE, 0, 0);
+                                  PROT_READ | PROT_WRITE,
+                                  MAP_FIXED_NOREPLACE | MAP_ANON | MAP_PRIVATE,
+                                  0, 0);
     } else {
         new_alloc_size = 0;
         mapped_addr = brk_page;
@@ -883,12 +878,6 @@ abi_long do_brk(abi_ulong brk_val)
         target_brk = brk_val;
         brk_page = new_host_brk_page;
         return target_brk;
-    } else if (mapped_addr != -1) {
-        /* Mapped but at wrong address, meaning there wasn't actually
-         * enough space for this brk.
-         */
-        target_munmap(mapped_addr, new_alloc_size);
-        mapped_addr = -1;
     }

 #if defined(TARGET_ALPHA)
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 4/8] linux-user: Do nothing if too small brk is specified
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (2 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 3/8] linux-user: Use MAP_FIXED_NOREPLACE for do_brk() Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-01 23:27 ` [PATCH v6 5/8] linux-user: Do not align brk with host page size Helge Deller
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Linux 6.4.7 does nothing when a value smaller than the initial brk is
specified.

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ac429a185a..ebdc8c144c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -820,14 +820,14 @@ abi_long do_brk(abi_ulong brk_val)

     /* brk pointers are always untagged */

-    /* return old brk value if brk_val unchanged or zero */
-    if (!brk_val || brk_val == target_brk) {
+    /* return old brk value if brk_val unchanged */
+    if (brk_val == target_brk) {
         return target_brk;
     }

     /* do not allow to shrink below initial brk value */
     if (brk_val < initial_target_brk) {
-        brk_val = initial_target_brk;
+        return target_brk;
     }

     new_brk = TARGET_PAGE_ALIGN(brk_val);
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 5/8] linux-user: Do not align brk with host page size
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (3 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 4/8] linux-user: Do nothing if too small brk is specified Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-01 23:27 ` [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps Helge Deller
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

From: Akihiko Odaki <akihiko.odaki@daynix.com>

do_brk() minimizes calls into target_mmap() by aligning the address
with host page size, which is potentially larger than the target page
size. However, the current implementation of this optimization has two
bugs:

- The start of brk is rounded up with the host page size while brk
  advertises an address aligned with the target page size as the
  beginning of brk. This makes the beginning of brk unmapped.
- Content clearing after mapping is flawed. The size to clear is
  specified as HOST_PAGE_ALIGN(brk_page) - brk_page, but brk_page is
  aligned with the host page size so it is always zero.

This optimization actually has no practical benefit. It makes difference
when brk() is called multiple times with values in a range of the host
page size. However, sophisticated memory allocators try to avoid to
make such frequent brk() calls. For example, glibc 2.37 calls brk() to
shrink the heap only when there is a room more than 128 KiB. It is
rare to have a page size larger than 128 KiB if it happens.

Let's remove the optimization to fix the bugs and make the code simpler.

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1616
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/elfload.c |  4 ++--
 linux-user/syscall.c | 54 ++++++++++----------------------------------
 2 files changed, 14 insertions(+), 44 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 861ec07abc..2aee2298ec 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3678,8 +3678,8 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
      * to mmap pages in this space.
      */
     if (info->reserve_brk) {
-        abi_ulong start_brk = HOST_PAGE_ALIGN(info->brk);
-        abi_ulong end_brk = HOST_PAGE_ALIGN(info->brk + info->reserve_brk);
+        abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
+        abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + info->reserve_brk);
         target_munmap(start_brk, end_brk - start_brk);
     }

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ebdc8c144c..475260b7ce 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -802,81 +802,51 @@ static inline int host_to_target_sock_type(int host_type)
 }

 static abi_ulong target_brk, initial_target_brk;
-static abi_ulong brk_page;

 void target_set_brk(abi_ulong new_brk)
 {
     target_brk = TARGET_PAGE_ALIGN(new_brk);
     initial_target_brk = target_brk;
-    brk_page = HOST_PAGE_ALIGN(target_brk);
 }

 /* do_brk() must return target values and target errnos. */
 abi_long do_brk(abi_ulong brk_val)
 {
     abi_long mapped_addr;
-    abi_ulong new_alloc_size;
-    abi_ulong new_brk, new_host_brk_page;
+    abi_ulong new_brk;
+    abi_ulong old_brk;

     /* brk pointers are always untagged */

-    /* return old brk value if brk_val unchanged */
-    if (brk_val == target_brk) {
-        return target_brk;
-    }
-
     /* do not allow to shrink below initial brk value */
     if (brk_val < initial_target_brk) {
         return target_brk;
     }

     new_brk = TARGET_PAGE_ALIGN(brk_val);
-    new_host_brk_page = HOST_PAGE_ALIGN(brk_val);
+    old_brk = TARGET_PAGE_ALIGN(target_brk);

-    /* brk_val and old target_brk might be on the same page */
-    if (new_brk == TARGET_PAGE_ALIGN(target_brk)) {
-        /* empty remaining bytes in (possibly larger) host page */
-        memset(g2h_untagged(new_brk), 0, new_host_brk_page - new_brk);
+    /* new and old target_brk might be on the same page */
+    if (new_brk == old_brk) {
         target_brk = brk_val;
         return target_brk;
     }

     /* Release heap if necesary */
-    if (new_brk < target_brk) {
-        /* empty remaining bytes in (possibly larger) host page */
-        memset(g2h_untagged(new_brk), 0, new_host_brk_page - new_brk);
-
-        /* free unused host pages and set new brk_page */
-        target_munmap(new_host_brk_page, brk_page - new_host_brk_page);
-        brk_page = new_host_brk_page;
+    if (new_brk < old_brk) {
+        target_munmap(new_brk, old_brk - new_brk);

         target_brk = brk_val;
         return target_brk;
     }

-    if (new_host_brk_page > brk_page) {
-        new_alloc_size = new_host_brk_page - brk_page;
-        mapped_addr = target_mmap(brk_page, new_alloc_size,
-                                  PROT_READ | PROT_WRITE,
-                                  MAP_FIXED_NOREPLACE | MAP_ANON | MAP_PRIVATE,
-                                  0, 0);
-    } else {
-        new_alloc_size = 0;
-        mapped_addr = brk_page;
-    }
-
-    if (mapped_addr == brk_page) {
-        /* Heap contents are initialized to zero, as for anonymous
-         * mapped pages.  Technically the new pages are already
-         * initialized to zero since they *are* anonymous mapped
-         * pages, however we have to take care with the contents that
-         * come from the remaining part of the previous page: it may
-         * contains garbage data due to a previous heap usage (grown
-         * then shrunken).  */
-        memset(g2h_untagged(brk_page), 0, HOST_PAGE_ALIGN(brk_page) - brk_page);
+    mapped_addr = target_mmap(old_brk, new_brk - old_brk,
+                              PROT_READ | PROT_WRITE,
+                              MAP_FIXED_NOREPLACE | MAP_ANON | MAP_PRIVATE,
+                              0, 0);

+    if (mapped_addr == old_brk) {
         target_brk = brk_val;
-        brk_page = new_host_brk_page;
         return target_brk;
     }

--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (4 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 5/8] linux-user: Do not align brk with host page size Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-02  5:41   ` Philippe Mathieu-Daudé
  2023-08-01 23:27 ` [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables Helge Deller
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

Show the memory location of the heap in the /proc/pid/maps file inside
the guest. Store the heap address in ts->heap_base, which requires to
make that variable accessible for all guest architectures, not just
architectures for semihosted binaries (arm, m68k, riscv).

Note that /proc/pid/maps in the guest needs to show target-aligned
addresses. This is fixed in this patch, so now the heap and stack
address for architectures like sparc64 and alpha now show up in that
output as well.

Show 32- and 64-bit pointers with 8 digits and leading zeros (%08x/%08lx).
For 64-bit we could use %16lx, but we mimic the Linux kernel, which shows
even 64-bit addresses with %08lx.

Example:

user@machine:/# uname -a
Linux paq 5.15.88+ #47 SMP Sun Jan 15 12:53:11 CET 2023 aarch64 GNU/Linux

user@machine:/# cat /proc/self/maps
Linux p100 6.4.4-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jul 19 16:32:49 UTC 2023 aarch64 GNU/Linux
5500000000-5500009000 r-xp 00000000 fd:00 570430                         /usr/bin/cat
5500009000-550001f000 ---p 00000000 00:00 0
550001f000-5500020000 r--p 0000f000 fd:00 570430                         /usr/bin/cat
5500020000-5500021000 rw-p 00010000 fd:00 570430                         /usr/bin/cat
5500021000-5500042000 rw-p 00000000 00:00 0                              [heap]
7000000000-7000001000 ---p 00000000 00:00 0
7000001000-7000801000 rw-p 00000000 00:00 0                              [stack]
7000801000-7000827000 r-xp 00000000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
7000827000-700083f000 ---p 00000000 00:00 0
700083f000-7000841000 r--p 0002e000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
7000841000-7000843000 rw-p 00030000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
7000843000-7000844000 r-xp 00000000 00:00 0
7000844000-7000846000 rw-p 00000000 00:00 0
7000850000-70009d7000 r-xp 00000000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
70009d7000-70009ed000 ---p 00187000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
70009ed000-70009f0000 r--p 0018d000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
70009f0000-70009f2000 rw-p 00190000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6

Signed-off-by: Helge Deller <deller@gmx.de>
---
 include/exec/cpu_ldst.h |  4 ++--
 linux-user/main.c       |  2 ++
 linux-user/qemu.h       |  4 ++--
 linux-user/syscall.c    | 13 +++++++++----
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 645476f0e5..f1e6f31e88 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -72,10 +72,10 @@
  */
 #if TARGET_VIRT_ADDR_SPACE_BITS <= 32
 typedef uint32_t abi_ptr;
-#define TARGET_ABI_FMT_ptr "%x"
+#define TARGET_ABI_FMT_ptr "%08x"
 #else
 typedef uint64_t abi_ptr;
-#define TARGET_ABI_FMT_ptr "%"PRIx64
+#define TARGET_ABI_FMT_ptr "%08"PRIx64
 #endif

 #ifndef TARGET_TAGGED_ADDRESSES
diff --git a/linux-user/main.c b/linux-user/main.c
index dba67ffa36..fa6e47510f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -946,6 +946,7 @@ int main(int argc, char **argv, char **envp)
         }
     }

+    info->brk = TARGET_PAGE_ALIGN(info->brk);
     target_set_brk(info->brk);
     syscall_init();
     signal_init();
@@ -955,6 +956,7 @@ int main(int argc, char **argv, char **envp)
        the real value of GUEST_BASE into account.  */
     tcg_prologue_init(tcg_ctx);

+    ts->heap_base = info->brk;
     target_cpu_copy_regs(env, regs);

     if (gdbstub) {
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 802794db63..7a6adac637 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -121,11 +121,11 @@ typedef struct TaskState {
 #ifdef TARGET_M68K
     abi_ulong tp_value;
 #endif
-#if defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_RISCV)
+
     /* Extra fields for semihosted binaries.  */
     abi_ulong heap_base;
     abi_ulong heap_limit;
-#endif
+
     abi_ulong stack_base;
     int used; /* non zero if used */
     struct image_info *info;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 475260b7ce..dc8266c073 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8078,8 +8078,9 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
         MapInfo *e = (MapInfo *) s->data;

         if (h2g_valid(e->start)) {
-            unsigned long min = e->start;
-            unsigned long max = e->end;
+            /* show page granularity of guest in /proc/pid/maps */
+            unsigned long min = TARGET_PAGE_ALIGN(e->start);
+            unsigned long max = TARGET_PAGE_ALIGN(e->end);
             int flags = page_get_flags(h2g(min));
             const char *path;

@@ -8090,14 +8091,18 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
                 continue;
             }

+            path = e->path;
+
+            if (ts->heap_base && h2g(min) == ts->heap_base) {
+                path = "[heap]";
+            }
+
 #ifdef TARGET_HPPA
             if (h2g(max) == ts->info->stack_limit) {
 #else
             if (h2g(min) == ts->info->stack_limit) {
 #endif
                 path = "[stack]";
-            } else {
-                path = e->path;
             }

             count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (5 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-02 18:25   ` Richard Henderson
  2023-08-01 23:27 ` [PATCH v6 8/8] linux-user: Load pie executables at upper memory Helge Deller
  2023-08-02  2:21 ` [PATCH v6 0/8] linux-user: brk fixes Joel Stanley
  8 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

Reorganize the guest memory layout to get as much memory as possible for
heap for the guest application.

This patch optimizes the memory layout by loading pie executables
into lower memory and shared libs into higher memory (at
TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap
space which will be located directly after the executable.
Up to now, pie executable and shared libs were loaded directly behind
each other in the area at TASK_UNMAPPED_BASE, which leaves very little
space for heap.

I tested this patchset with chroots of alpha, arm, armel, arm64, hppa, m68k,
mips64el, mipsel, powerpc, ppc64, ppc64el, s390x, sh4 and sparc64 on a x86-64
host, and with a static armhf binary (which fails to run without this patch).

This patch temporarily breaks the Thread Sanitizer (TSan) application
which expects specific boundary definitions for memory mappings on
different platforms [1], see commit aab613fb9597 ("linux-user: Update
TASK_UNMAPPED_BASE for aarch64") for aarch64. The follow-up patch fixes it
again.

[1] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h

Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/elfload.c | 55 +++++++++++++-------------------------------
 linux-user/mmap.c    |  8 ++++---
 2 files changed, 21 insertions(+), 42 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 2aee2298ec..47a118e430 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int image_fd,
     abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
     int i, retval, prot_exec;
     Error *err = NULL;
+    bool is_main_executable;

     /* First of all, some simple consistency checks */
     if (!elf_check_ident(ehdr)) {
@@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd,
         }
     }

-    if (pinterp_name != NULL) {
-        /*
-         * This is the main executable.
-         *
-         * Reserve extra space for brk.
-         * We hold on to this space while placing the interpreter
-         * and the stack, lest they be placed immediately after
-         * the data segment and block allocation from the brk.
-         *
-         * 16MB is chosen as "large enough" without being so large as
-         * to allow the result to not fit with a 32-bit guest on a
-         * 32-bit host. However some 64 bit guests (e.g. s390x)
-         * attempt to place their heap further ahead and currently
-         * nothing stops them smashing into QEMUs address space.
-         */
-#if TARGET_LONG_BITS == 64
-        info->reserve_brk = 32 * MiB;
-#else
-        info->reserve_brk = 16 * MiB;
-#endif
-        hiaddr += info->reserve_brk;
-
+    is_main_executable = (pinterp_name != NULL);
+    if (is_main_executable) {
         if (ehdr->e_type == ET_EXEC) {
             /*
              * Make sure that the low address does not conflict with
@@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int image_fd,
             probe_guest_base(image_name, loaddr, hiaddr);
         } else {
             /*
-             * The binary is dynamic, but we still need to
+             * The binary is dynamic (pie-executabe), but we still need to
              * select guest_base.  In this case we pass a size.
              */
             probe_guest_base(image_name, 0, hiaddr - loaddr);
@@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
      */
     load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
                             MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+                            (is_main_executable ? MAP_FIXED : 0),
                             -1, 0);
     if (load_addr == -1) {
         goto exit_mmap;
@@ -3194,7 +3175,8 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->end_code = 0;
     info->start_data = -1;
     info->end_data = 0;
-    info->brk = 0;
+    /* possible start for brk is behind all sections of this ELF file. */
+    info->brk = TARGET_PAGE_ALIGN(hiaddr);
     info->elf_flags = ehdr->e_flags;

     prot_exec = PROT_EXEC;
@@ -3288,9 +3270,6 @@ static void load_elf_image(const char *image_name, int image_fd,
                     info->end_data = vaddr_ef;
                 }
             }
-            if (vaddr_em > info->brk) {
-                info->brk = vaddr_em;
-            }
 #ifdef TARGET_MIPS
         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
             Mips_elf_abiflags_v0 abiflags;
@@ -3618,6 +3597,15 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)

     if (elf_interpreter) {
         load_elf_interp(elf_interpreter, &interp_info, bprm->buf);
+        /*
+	 * Use brk address of interpreter if it was loaded above the
+	 * executable and leaves less than 16 MB for heap.
+	 * This happens e.g. with static binaries on armhf.
+         */
+        if (interp_info.brk > info->brk &&
+            interp_info.load_bias - info->brk < 16 * MiB)  {
+            info->brk = interp_info.brk;
+        }

         /* If the program interpreter is one of these two, then assume
            an iBCS2 image.  Otherwise assume a native linux image.  */
@@ -3672,17 +3660,6 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
     bprm->core_dump = &elf_core_dump;
 #endif

-    /*
-     * If we reserved extra space for brk, release it now.
-     * The implementation of do_brk in syscalls.c expects to be able
-     * to mmap pages in this space.
-     */
-    if (info->reserve_brk) {
-        abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
-        abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + info->reserve_brk);
-        target_munmap(start_brk, end_brk - start_brk);
-    }
-
     return 0;
 }

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 2f26cbaf5d..c624feead0 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -299,14 +299,16 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
 #ifdef TARGET_AARCH64
 # define TASK_UNMAPPED_BASE  0x5500000000
 #else
-# define TASK_UNMAPPED_BASE  (1ul << 38)
+# define TASK_UNMAPPED_BASE  0x4000000000
 #endif
-#else
+#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
 #ifdef TARGET_HPPA
 # define TASK_UNMAPPED_BASE  0xfa000000
 #else
-# define TASK_UNMAPPED_BASE  0x40000000
+# define TASK_UNMAPPED_BASE  0xe0000000
 #endif
+#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
+# define TASK_UNMAPPED_BASE  0x40000000
 #endif
 abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;

--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (6 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables Helge Deller
@ 2023-08-01 23:27 ` Helge Deller
  2023-08-02  7:49   ` Akihiko Odaki
  2023-08-02 18:36   ` Richard Henderson
  2023-08-02  2:21 ` [PATCH v6 0/8] linux-user: brk fixes Joel Stanley
  8 siblings, 2 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-01 23:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki, Helge Deller

Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address for all
32-bit architectures, based on the GUEST_ADDR_MAX constant.

Additionally modify the elf loader to load dynamic pie executables at
around:
~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
- 0x00300000    for 32-bit guest binaries on 64-bit host, and
- 0x00000000    for 32-bit guest binaries on 32-bit host.

With this patch the Thread Sanitizer (TSan) application will work again,
as in commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for
aarch64").

Signed-off-by: Helge Deller <deller@gmx.de>
---
 linux-user/elfload.c |  6 ++++--
 linux-user/loader.h  | 12 ++++++++++++
 linux-user/mmap.c    | 35 ++++++++++++++++++-----------------
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 47a118e430..8f5a79b537 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3021,6 +3021,7 @@ static void load_elf_image(const char *image_name, int image_fd,
     struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
     struct elf_phdr *phdr;
     abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
+    unsigned long load_offset = 0;
     int i, retval, prot_exec;
     Error *err = NULL;
     bool is_main_executable;
@@ -3121,6 +3122,7 @@ static void load_elf_image(const char *image_name, int image_fd,
              * select guest_base.  In this case we pass a size.
              */
             probe_guest_base(image_name, 0, hiaddr - loaddr);
+            load_offset = TASK_UNMAPPED_BASE_PIE;
         }
     }

@@ -3138,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
      * In both cases, we will overwrite pages in this range with mappings
      * from the executable.
      */
-    load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
+    load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - loaddr + 1, PROT_NONE,
                             MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
                             (is_main_executable ? MAP_FIXED : 0),
                             -1, 0);
@@ -3176,7 +3178,7 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->start_data = -1;
     info->end_data = 0;
     /* possible start for brk is behind all sections of this ELF file. */
-    info->brk = TARGET_PAGE_ALIGN(hiaddr);
+    info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
     info->elf_flags = ehdr->e_flags;

     prot_exec = PROT_EXEC;
diff --git a/linux-user/loader.h b/linux-user/loader.h
index 59cbeacf24..3bbfc108eb 100644
--- a/linux-user/loader.h
+++ b/linux-user/loader.h
@@ -18,6 +18,18 @@
 #ifndef LINUX_USER_LOADER_H
 #define LINUX_USER_LOADER_H

+/* where to map binaries? */
+#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
+# define TASK_UNMAPPED_BASE_PIE 0x5500000000
+# define TASK_UNMAPPED_BASE	0x7000000000
+#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
+# define TASK_UNMAPPED_BASE_PIE	0x00300000
+# define TASK_UNMAPPED_BASE	(GUEST_ADDR_MAX - 0x20000000 + 1)
+#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
+# define TASK_UNMAPPED_BASE_PIE	0x00000000
+# define TASK_UNMAPPED_BASE	0x40000000
+#endif
+
 /*
  * Read a good amount of data initially, to hopefully get all the
  * program headers loaded.
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c624feead0..3441198e21 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -23,6 +23,7 @@
 #include "user-internals.h"
 #include "user-mmap.h"
 #include "target_mman.h"
+#include "loader.h"

 static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
 static __thread int mmap_lock_count;
@@ -295,23 +296,6 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
     return true;
 }

-#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
-#ifdef TARGET_AARCH64
-# define TASK_UNMAPPED_BASE  0x5500000000
-#else
-# define TASK_UNMAPPED_BASE  0x4000000000
-#endif
-#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
-#ifdef TARGET_HPPA
-# define TASK_UNMAPPED_BASE  0xfa000000
-#else
-# define TASK_UNMAPPED_BASE  0xe0000000
-#endif
-#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
-# define TASK_UNMAPPED_BASE  0x40000000
-#endif
-abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
-
 unsigned long last_brk;

 /*
@@ -344,6 +328,23 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
     abi_ulong addr;
     int wrapped, repeat;

+    static abi_ulong mmap_next_start;
+
+    /* initialize mmap_next_start if necessary */
+    if (!mmap_next_start) {
+        mmap_next_start = TASK_UNMAPPED_BASE;
+
+        /* do sanity checks on guest memory layout */
+        if (mmap_next_start >= GUEST_ADDR_MAX) {
+            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;
+        }
+
+        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
+            fprintf(stderr, "Memory too small for PIE executables.\n");
+            exit(EXIT_FAILURE);
+        }
+    }
+
     align = MAX(align, qemu_host_page_size);

     /* If 'start' == 0, then a default start address is used. */
--
2.41.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host
  2023-08-01 23:27 ` [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host Helge Deller
@ 2023-08-01 23:40   ` Richard Henderson
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2023-08-01 23:40 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/1/23 16:27, Helge Deller wrote:
> From: Akihiko Odaki <akihiko.odaki@daynix.com>
> 
> Passing MAP_FIXED_NOREPLACE to host will fail if the virtual
> address space is reserved with mmap. Replace it with MAP_FIXED.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
>   linux-user/mmap.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index a5dfb56545..2f26cbaf5d 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -610,6 +610,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
>               goto fail;
>           }
> 
> +        flags = (flags & ~MAP_FIXED_NOREPLACE) | MAP_FIXED;


Again, this must be restricted to reserved_va == 0 or 64-bit guests will fail.


r~

> +
>           /*
>            * worst case: we cannot map the file because the offset is not
>            * aligned, so we read it
> --
> 2.41.0
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 0/8] linux-user: brk fixes
  2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
                   ` (7 preceding siblings ...)
  2023-08-01 23:27 ` [PATCH v6 8/8] linux-user: Load pie executables at upper memory Helge Deller
@ 2023-08-02  2:21 ` Joel Stanley
  2023-08-02  6:10   ` Helge Deller
  8 siblings, 1 reply; 26+ messages in thread
From: Joel Stanley @ 2023-08-02  2:21 UTC (permalink / raw)
  To: Helge Deller
  Cc: qemu-devel, Richard Henderson, Laurent Vivier, Paolo Bonzini,
	Akihiko Odaki

On Tue, 1 Aug 2023 at 23:28, Helge Deller <deller@gmx.de> wrote:
>
> This patch series is a fix-up for some current problems
> regarding heap memory / brk handling in qemu which happens
> on some 32-bit platforms, e.g. problems loading static
> binaries.
>
> This series includes the 5 patches from Akihiko Odaki
> with some additional fixes and cleanups by me.

This has the same segfault as the branch that I previously tested,
when running on a ppc64le host..

As a reminder, the ppc64le machine (normally, and does in this case)
uses a 64K page size. I think this is a detail that is missing from
your chroot testing.


>
> Akihiko Odaki (5):
>   linux-user: Unset MAP_FIXED_NOREPLACE for host
>   linux-user: Do not call get_errno() in do_brk()
>   linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
>   linux-user: Do nothing if too small brk is specified
>   linux-user: Do not align brk with host page size
>
> Helge Deller (3):
>   linux-user: Show heap address in /proc/pid/maps
>   linux-user: Optimize memory layout for static and dynamic executables
>   linux-user: Load pie executables at upper memory
>
>  include/exec/cpu_ldst.h |  4 +--
>  linux-user/elfload.c    | 59 ++++++++++--------------------
>  linux-user/loader.h     | 12 +++++++
>  linux-user/main.c       |  2 ++
>  linux-user/mmap.c       | 35 ++++++++++--------
>  linux-user/qemu.h       |  4 +--
>  linux-user/syscall.c    | 80 ++++++++++++-----------------------------
>  7 files changed, 79 insertions(+), 117 deletions(-)
>
> --
> 2.41.0
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps
  2023-08-01 23:27 ` [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps Helge Deller
@ 2023-08-02  5:41   ` Philippe Mathieu-Daudé
  2023-08-02  6:07     ` Helge Deller
  0 siblings, 1 reply; 26+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-08-02  5:41 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki

Hi Helge,

On 2/8/23 01:27, Helge Deller wrote:
> Show the memory location of the heap in the /proc/pid/maps file inside
> the guest. Store the heap address in ts->heap_base, which requires to
> make that variable accessible for all guest architectures, not just
> architectures for semihosted binaries (arm, m68k, riscv).
> 
> Note that /proc/pid/maps in the guest needs to show target-aligned
> addresses. This is fixed in this patch, so now the heap and stack
> address for architectures like sparc64 and alpha now show up in that
> output as well.
> 
> Show 32- and 64-bit pointers with 8 digits and leading zeros (%08x/%08lx).
> For 64-bit we could use %16lx, but we mimic the Linux kernel, which shows
> even 64-bit addresses with %08lx.

You are describing 3 changes, do you mind splitting in 3 patches?
Otherwise LGTM.

> Example:
> 
> user@machine:/# uname -a
> Linux paq 5.15.88+ #47 SMP Sun Jan 15 12:53:11 CET 2023 aarch64 GNU/Linux
> 
> user@machine:/# cat /proc/self/maps
> Linux p100 6.4.4-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jul 19 16:32:49 UTC 2023 aarch64 GNU/Linux
> 5500000000-5500009000 r-xp 00000000 fd:00 570430                         /usr/bin/cat
> 5500009000-550001f000 ---p 00000000 00:00 0
> 550001f000-5500020000 r--p 0000f000 fd:00 570430                         /usr/bin/cat
> 5500020000-5500021000 rw-p 00010000 fd:00 570430                         /usr/bin/cat
> 5500021000-5500042000 rw-p 00000000 00:00 0                              [heap]
> 7000000000-7000001000 ---p 00000000 00:00 0
> 7000001000-7000801000 rw-p 00000000 00:00 0                              [stack]
> 7000801000-7000827000 r-xp 00000000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
> 7000827000-700083f000 ---p 00000000 00:00 0
> 700083f000-7000841000 r--p 0002e000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
> 7000841000-7000843000 rw-p 00030000 fd:00 571555                         /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
> 7000843000-7000844000 r-xp 00000000 00:00 0
> 7000844000-7000846000 rw-p 00000000 00:00 0
> 7000850000-70009d7000 r-xp 00000000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
> 70009d7000-70009ed000 ---p 00187000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
> 70009ed000-70009f0000 r--p 0018d000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
> 70009f0000-70009f2000 rw-p 00190000 fd:00 571558                         /usr/lib/aarch64-linux-gnu/libc.so.6
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
>   include/exec/cpu_ldst.h |  4 ++--
>   linux-user/main.c       |  2 ++
>   linux-user/qemu.h       |  4 ++--
>   linux-user/syscall.c    | 13 +++++++++----
>   4 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
> index 645476f0e5..f1e6f31e88 100644
> --- a/include/exec/cpu_ldst.h
> +++ b/include/exec/cpu_ldst.h
> @@ -72,10 +72,10 @@
>    */
>   #if TARGET_VIRT_ADDR_SPACE_BITS <= 32
>   typedef uint32_t abi_ptr;
> -#define TARGET_ABI_FMT_ptr "%x"
> +#define TARGET_ABI_FMT_ptr "%08x"
>   #else
>   typedef uint64_t abi_ptr;
> -#define TARGET_ABI_FMT_ptr "%"PRIx64
> +#define TARGET_ABI_FMT_ptr "%08"PRIx64
>   #endif

This is patch #1,

> 
>   #ifndef TARGET_TAGGED_ADDRESSES
> diff --git a/linux-user/main.c b/linux-user/main.c
> index dba67ffa36..fa6e47510f 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -946,6 +946,7 @@ int main(int argc, char **argv, char **envp)
>           }
>       }
> 
> +    info->brk = TARGET_PAGE_ALIGN(info->brk);

Patch #3,

>       target_set_brk(info->brk);
>       syscall_init();
>       signal_init();
> @@ -955,6 +956,7 @@ int main(int argc, char **argv, char **envp)
>          the real value of GUEST_BASE into account.  */
>       tcg_prologue_init(tcg_ctx);
> 
> +    ts->heap_base = info->brk;

Patch #3,

>       target_cpu_copy_regs(env, regs);
> 
>       if (gdbstub) {
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index 802794db63..7a6adac637 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -121,11 +121,11 @@ typedef struct TaskState {
>   #ifdef TARGET_M68K
>       abi_ulong tp_value;
>   #endif
> -#if defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_RISCV)
> +

Patch #3,

>       /* Extra fields for semihosted binaries.  */
>       abi_ulong heap_base;
>       abi_ulong heap_limit;
> -#endif
> +
>       abi_ulong stack_base;
>       int used; /* non zero if used */
>       struct image_info *info;
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 475260b7ce..dc8266c073 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8078,8 +8078,9 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
>           MapInfo *e = (MapInfo *) s->data;
> 
>           if (h2g_valid(e->start)) {
> -            unsigned long min = e->start;
> -            unsigned long max = e->end;
> +            /* show page granularity of guest in /proc/pid/maps */
> +            unsigned long min = TARGET_PAGE_ALIGN(e->start);
> +            unsigned long max = TARGET_PAGE_ALIGN(e->end);

Patch #2,

>               int flags = page_get_flags(h2g(min));
>               const char *path;
> 
> @@ -8090,14 +8091,18 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
>                   continue;
>               }
> 
> +            path = e->path;
> +
> +            if (ts->heap_base && h2g(min) == ts->heap_base) {
> +                path = "[heap]";
> +            }

Patch #3 but see below,

>   #ifdef TARGET_HPPA
>               if (h2g(max) == ts->info->stack_limit) {
>   #else
>               if (h2g(min) == ts->info->stack_limit) {
>   #endif
>                   path = "[stack]";

                } else if (ts->heap_base && h2g(min) == ts->heap_base) {
                     path = "[heap]";

> -            } else {
> -                path = e->path;
>               }
> 
>               count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
> --
> 2.41.0
> 
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps
  2023-08-02  5:41   ` Philippe Mathieu-Daudé
@ 2023-08-02  6:07     ` Helge Deller
  0 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-02  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley,
	Akihiko Odaki

Hi Philiplle,

On 8/2/23 07:41, Philippe Mathieu-Daudé wrote:
> On 2/8/23 01:27, Helge Deller wrote:
..
>> Show 32- and 64-bit pointers with 8 digits and leading zeros (%08x/%08lx).
>> For 64-bit we could use %16lx, but we mimic the Linux kernel, which shows
>> even 64-bit addresses with %08lx.
> 
> You are describing 3 changes, do you mind splitting in 3 patches?

Yes, will do...

>> @@ -8090,14 +8091,18 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
>>                   continue;
>>               }
>>
>> +            path = e->path;
>> +
>> +            if (ts->heap_base && h2g(min) == ts->heap_base) {
>> +                path = "[heap]";
>> +            }
> 
> Patch #3 but see below,

Ok...but (see below)...

  
>>   #ifdef TARGET_HPPA
>>               if (h2g(max) == ts->info->stack_limit) {
>>   #else
>>               if (h2g(min) == ts->info->stack_limit) {
>>   #endif
>>                   path = "[stack]";
> 
>                 } else if (ts->heap_base && h2g(min) == ts->heap_base) {
>                      path = "[heap]";
> 

You can't put it into the "else" part here... then heap will never show up
(as heap and stack often share the same region).

Since "heap" is logically at smaller address than stack, I moved the
hunk above the "stack" part.
  
>> -            } else {
>> -                path = e->path;
>>               }
>>
>>               count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr

Thanks for the review!
Helge

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 0/8] linux-user: brk fixes
  2023-08-02  2:21 ` [PATCH v6 0/8] linux-user: brk fixes Joel Stanley
@ 2023-08-02  6:10   ` Helge Deller
  0 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-02  6:10 UTC (permalink / raw)
  To: Joel Stanley, Laurent Vivier, Richard Henderson, qemu-devel
  Cc: Helge Deller, Paolo Bonzini, Akihiko Odaki

* Joel Stanley <joel@jms.id.au>:
> On Tue, 1 Aug 2023 at 23:28, Helge Deller <deller@gmx.de> wrote:
> >
> > This patch series is a fix-up for some current problems
> > regarding heap memory / brk handling in qemu which happens
> > on some 32-bit platforms, e.g. problems loading static
> > binaries.
> >
> > This series includes the 5 patches from Akihiko Odaki
> > with some additional fixes and cleanups by me.
>
> This has the same segfault as the branch that I previously tested,
> when running on a ppc64le host..
>
> As a reminder, the ppc64le machine (normally, and does in this case)
> uses a 64K page size. I think this is a detail that is missing from
> your chroot testing.

Could you try with this hunk on top of the patch series ?

Helge

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 8f5a79b537..a61e3d1080 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3178,7 +3178,7 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->start_data = -1;
     info->end_data = 0;
     /* possible start for brk is behind all sections of this ELF file. */
-    info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
+    info->brk = HOST_PAGE_ALIGN(load_offset + hiaddr);
     info->elf_flags = ehdr->e_flags;

     prot_exec = PROT_EXEC;


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-01 23:27 ` [PATCH v6 8/8] linux-user: Load pie executables at upper memory Helge Deller
@ 2023-08-02  7:49   ` Akihiko Odaki
  2023-08-02  8:42     ` Helge Deller
  2023-08-02 18:36   ` Richard Henderson
  1 sibling, 1 reply; 26+ messages in thread
From: Akihiko Odaki @ 2023-08-02  7:49 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 2023/08/02 8:27, Helge Deller wrote:
> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address for all
> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
> 
> Additionally modify the elf loader to load dynamic pie executables at
> around:
> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
> - 0x00000000    for 32-bit guest binaries on 32-bit host.

Why do you change guest addresses depending on the host?

> 
> With this patch the Thread Sanitizer (TSan) application will work again,
> as in commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for
> aarch64").
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
>   linux-user/elfload.c |  6 ++++--
>   linux-user/loader.h  | 12 ++++++++++++
>   linux-user/mmap.c    | 35 ++++++++++++++++++-----------------
>   3 files changed, 34 insertions(+), 19 deletions(-)
> 
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 47a118e430..8f5a79b537 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -3021,6 +3021,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>       struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
>       struct elf_phdr *phdr;
>       abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
> +    unsigned long load_offset = 0;
>       int i, retval, prot_exec;
>       Error *err = NULL;
>       bool is_main_executable;
> @@ -3121,6 +3122,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>                * select guest_base.  In this case we pass a size.
>                */
>               probe_guest_base(image_name, 0, hiaddr - loaddr);
> +            load_offset = TASK_UNMAPPED_BASE_PIE;
>           }
>       }
> 
> @@ -3138,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>        * In both cases, we will overwrite pages in this range with mappings
>        * from the executable.
>        */
> -    load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
> +    load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>                               (is_main_executable ? MAP_FIXED : 0),
>                               -1, 0);
> @@ -3176,7 +3178,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>       info->start_data = -1;
>       info->end_data = 0;
>       /* possible start for brk is behind all sections of this ELF file. */
> -    info->brk = TARGET_PAGE_ALIGN(hiaddr);
> +    info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
>       info->elf_flags = ehdr->e_flags;
> 
>       prot_exec = PROT_EXEC;
> diff --git a/linux-user/loader.h b/linux-user/loader.h
> index 59cbeacf24..3bbfc108eb 100644
> --- a/linux-user/loader.h
> +++ b/linux-user/loader.h
> @@ -18,6 +18,18 @@
>   #ifndef LINUX_USER_LOADER_H
>   #define LINUX_USER_LOADER_H
> 
> +/* where to map binaries? */
> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> +# define TASK_UNMAPPED_BASE_PIE 0x5500000000
> +# define TASK_UNMAPPED_BASE	0x7000000000
> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
> +# define TASK_UNMAPPED_BASE_PIE	0x00300000
> +# define TASK_UNMAPPED_BASE	(GUEST_ADDR_MAX - 0x20000000 + 1)
> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
> +# define TASK_UNMAPPED_BASE_PIE	0x00000000
> +# define TASK_UNMAPPED_BASE	0x40000000
> +#endif
> +
>   /*
>    * Read a good amount of data initially, to hopefully get all the
>    * program headers loaded.
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index c624feead0..3441198e21 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -23,6 +23,7 @@
>   #include "user-internals.h"
>   #include "user-mmap.h"
>   #include "target_mman.h"
> +#include "loader.h"
> 
>   static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
>   static __thread int mmap_lock_count;
> @@ -295,23 +296,6 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
>       return true;
>   }
> 
> -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> -#ifdef TARGET_AARCH64
> -# define TASK_UNMAPPED_BASE  0x5500000000
> -#else
> -# define TASK_UNMAPPED_BASE  0x4000000000
> -#endif
> -#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
> -#ifdef TARGET_HPPA
> -# define TASK_UNMAPPED_BASE  0xfa000000
> -#else
> -# define TASK_UNMAPPED_BASE  0xe0000000
> -#endif
> -#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
> -# define TASK_UNMAPPED_BASE  0x40000000
> -#endif
> -abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
> -
>   unsigned long last_brk;
> 
>   /*
> @@ -344,6 +328,23 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
>       abi_ulong addr;
>       int wrapped, repeat;
> 
> +    static abi_ulong mmap_next_start;
> +
> +    /* initialize mmap_next_start if necessary */
> +    if (!mmap_next_start) {
> +        mmap_next_start = TASK_UNMAPPED_BASE;
> +
> +        /* do sanity checks on guest memory layout */
> +        if (mmap_next_start >= GUEST_ADDR_MAX) {
> +            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;

What if GUEST_ADDR_MAX < 0x1000000000? I think you can just return a 
hard error when mmap_next_start >= GUEST_ADDR_MAX.

> +        }
> +
> +        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
> +            fprintf(stderr, "Memory too small for PIE executables.\n");

Perhaps it's better to use error_report() for new code.

> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +
>       align = MAX(align, qemu_host_page_size);
> 
>       /* If 'start' == 0, then a default start address is used. */
> --
> 2.41.0
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02  7:49   ` Akihiko Odaki
@ 2023-08-02  8:42     ` Helge Deller
  2023-08-02  8:44       ` Akihiko Odaki
  0 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-02  8:42 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 8/2/23 09:49, Akihiko Odaki wrote:
> On 2023/08/02 8:27, Helge Deller wrote:
>> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address for all
>> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
>>
>> Additionally modify the elf loader to load dynamic pie executables at
>> around:
>> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
>> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
>> - 0x00000000    for 32-bit guest binaries on 32-bit host.
>
> Why do you change guest addresses depending on the host?

The addresses are guest-addresses.
A 32-bit guest PIE can't be loaded at e.g. 0x5500000000,
while a 64-bit guest PIE needs to be loaded at 0x5500000000.

>> With this patch the Thread Sanitizer (TSan) application will work again,
>> as in commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for
>> aarch64").
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> ---
>>   linux-user/elfload.c |  6 ++++--
>>   linux-user/loader.h  | 12 ++++++++++++
>>   linux-user/mmap.c    | 35 ++++++++++++++++++-----------------
>>   3 files changed, 34 insertions(+), 19 deletions(-)
>>
>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>> index 47a118e430..8f5a79b537 100644
>> --- a/linux-user/elfload.c
>> +++ b/linux-user/elfload.c
>> @@ -3021,6 +3021,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>       struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
>>       struct elf_phdr *phdr;
>>       abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
>> +    unsigned long load_offset = 0;
>>       int i, retval, prot_exec;
>>       Error *err = NULL;
>>       bool is_main_executable;
>> @@ -3121,6 +3122,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>                * select guest_base.  In this case we pass a size.
>>                */
>>               probe_guest_base(image_name, 0, hiaddr - loaddr);
>> +            load_offset = TASK_UNMAPPED_BASE_PIE;
>>           }
>>       }
>>
>> @@ -3138,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>        * In both cases, we will overwrite pages in this range with mappings
>>        * from the executable.
>>        */
>> -    load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>> +    load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>                               (is_main_executable ? MAP_FIXED : 0),
>>                               -1, 0);
>> @@ -3176,7 +3178,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>       info->start_data = -1;
>>       info->end_data = 0;
>>       /* possible start for brk is behind all sections of this ELF file. */
>> -    info->brk = TARGET_PAGE_ALIGN(hiaddr);
>> +    info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
>>       info->elf_flags = ehdr->e_flags;
>>
>>       prot_exec = PROT_EXEC;
>> diff --git a/linux-user/loader.h b/linux-user/loader.h
>> index 59cbeacf24..3bbfc108eb 100644
>> --- a/linux-user/loader.h
>> +++ b/linux-user/loader.h
>> @@ -18,6 +18,18 @@
>>   #ifndef LINUX_USER_LOADER_H
>>   #define LINUX_USER_LOADER_H
>>
>> +/* where to map binaries? */
>> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
>> +# define TASK_UNMAPPED_BASE_PIE 0x5500000000
>> +# define TASK_UNMAPPED_BASE    0x7000000000
>> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>> +# define TASK_UNMAPPED_BASE_PIE    0x00300000
>> +# define TASK_UNMAPPED_BASE    (GUEST_ADDR_MAX - 0x20000000 + 1)
>> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>> +# define TASK_UNMAPPED_BASE_PIE    0x00000000
>> +# define TASK_UNMAPPED_BASE    0x40000000
>> +#endif
>> +
>>   /*
>>    * Read a good amount of data initially, to hopefully get all the
>>    * program headers loaded.
>> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
>> index c624feead0..3441198e21 100644
>> --- a/linux-user/mmap.c
>> +++ b/linux-user/mmap.c
>> @@ -23,6 +23,7 @@
>>   #include "user-internals.h"
>>   #include "user-mmap.h"
>>   #include "target_mman.h"
>> +#include "loader.h"
>>
>>   static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
>>   static __thread int mmap_lock_count;
>> @@ -295,23 +296,6 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
>>       return true;
>>   }
>>
>> -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
>> -#ifdef TARGET_AARCH64
>> -# define TASK_UNMAPPED_BASE  0x5500000000
>> -#else
>> -# define TASK_UNMAPPED_BASE  0x4000000000
>> -#endif
>> -#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>> -#ifdef TARGET_HPPA
>> -# define TASK_UNMAPPED_BASE  0xfa000000
>> -#else
>> -# define TASK_UNMAPPED_BASE  0xe0000000
>> -#endif
>> -#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>> -# define TASK_UNMAPPED_BASE  0x40000000
>> -#endif
>> -abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
>> -
>>   unsigned long last_brk;
>>
>>   /*
>> @@ -344,6 +328,23 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
>>       abi_ulong addr;
>>       int wrapped, repeat;
>>
>> +    static abi_ulong mmap_next_start;
>> +
>> +    /* initialize mmap_next_start if necessary */
>> +    if (!mmap_next_start) {
>> +        mmap_next_start = TASK_UNMAPPED_BASE;
>> +
>> +        /* do sanity checks on guest memory layout */
>> +        if (mmap_next_start >= GUEST_ADDR_MAX) {
>> +            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;
>
> What if GUEST_ADDR_MAX < 0x1000000000?

this check affects 64-bit executables only where GUEST_ADDR_MAX is bigger
than that number. But I agree it's not directly visible from the code.
32-bit ones are taken care of where TASK_UNMAPPED_BASE is defined.

> I think you can just return a hard error when mmap_next_start >= GUEST_ADDR_MAX.

Can't happen, but I will rewrite it.

>> +        }
>> +
>> +        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
>> +            fprintf(stderr, "Memory too small for PIE executables.\n");
>
> Perhaps it's better to use error_report() for new code.

Ok.

Helge


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02  8:42     ` Helge Deller
@ 2023-08-02  8:44       ` Akihiko Odaki
  2023-08-02  9:34         ` Helge Deller
  0 siblings, 1 reply; 26+ messages in thread
From: Akihiko Odaki @ 2023-08-02  8:44 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 2023/08/02 17:42, Helge Deller wrote:
> On 8/2/23 09:49, Akihiko Odaki wrote:
>> On 2023/08/02 8:27, Helge Deller wrote:
>>> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address 
>>> for all
>>> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
>>>
>>> Additionally modify the elf loader to load dynamic pie executables at
>>> around:
>>> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
>>> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
>>> - 0x00000000    for 32-bit guest binaries on 32-bit host.
>>
>> Why do you change guest addresses depending on the host?
> 
> The addresses are guest-addresses.
> A 32-bit guest PIE can't be loaded at e.g. 0x5500000000,
> while a 64-bit guest PIE needs to be loaded at 0x5500000000.

I mean, why do you use address 0x00000000 for 32-bit guest binaries on 
32-bit host while you use address 0x00300000 on 64-bit host?

> 
>>> With this patch the Thread Sanitizer (TSan) application will work again,
>>> as in commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for
>>> aarch64").
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> ---
>>>   linux-user/elfload.c |  6 ++++--
>>>   linux-user/loader.h  | 12 ++++++++++++
>>>   linux-user/mmap.c    | 35 ++++++++++++++++++-----------------
>>>   3 files changed, 34 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>> index 47a118e430..8f5a79b537 100644
>>> --- a/linux-user/elfload.c
>>> +++ b/linux-user/elfload.c
>>> @@ -3021,6 +3021,7 @@ static void load_elf_image(const char 
>>> *image_name, int image_fd,
>>>       struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
>>>       struct elf_phdr *phdr;
>>>       abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
>>> +    unsigned long load_offset = 0;
>>>       int i, retval, prot_exec;
>>>       Error *err = NULL;
>>>       bool is_main_executable;
>>> @@ -3121,6 +3122,7 @@ static void load_elf_image(const char 
>>> *image_name, int image_fd,
>>>                * select guest_base.  In this case we pass a size.
>>>                */
>>>               probe_guest_base(image_name, 0, hiaddr - loaddr);
>>> +            load_offset = TASK_UNMAPPED_BASE_PIE;
>>>           }
>>>       }
>>>
>>> @@ -3138,7 +3140,7 @@ static void load_elf_image(const char 
>>> *image_name, int image_fd,
>>>        * In both cases, we will overwrite pages in this range with 
>>> mappings
>>>        * from the executable.
>>>        */
>>> -    load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, 
>>> PROT_NONE,
>>> +    load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - 
>>> loaddr + 1, PROT_NONE,
>>>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>>                               (is_main_executable ? MAP_FIXED : 0),
>>>                               -1, 0);
>>> @@ -3176,7 +3178,7 @@ static void load_elf_image(const char 
>>> *image_name, int image_fd,
>>>       info->start_data = -1;
>>>       info->end_data = 0;
>>>       /* possible start for brk is behind all sections of this ELF 
>>> file. */
>>> -    info->brk = TARGET_PAGE_ALIGN(hiaddr);
>>> +    info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
>>>       info->elf_flags = ehdr->e_flags;
>>>
>>>       prot_exec = PROT_EXEC;
>>> diff --git a/linux-user/loader.h b/linux-user/loader.h
>>> index 59cbeacf24..3bbfc108eb 100644
>>> --- a/linux-user/loader.h
>>> +++ b/linux-user/loader.h
>>> @@ -18,6 +18,18 @@
>>>   #ifndef LINUX_USER_LOADER_H
>>>   #define LINUX_USER_LOADER_H
>>>
>>> +/* where to map binaries? */
>>> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
>>> +# define TASK_UNMAPPED_BASE_PIE 0x5500000000
>>> +# define TASK_UNMAPPED_BASE    0x7000000000
>>> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>>> +# define TASK_UNMAPPED_BASE_PIE    0x00300000
>>> +# define TASK_UNMAPPED_BASE    (GUEST_ADDR_MAX - 0x20000000 + 1)
>>> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>>> +# define TASK_UNMAPPED_BASE_PIE    0x00000000
>>> +# define TASK_UNMAPPED_BASE    0x40000000
>>> +#endif
>>> +
>>>   /*
>>>    * Read a good amount of data initially, to hopefully get all the
>>>    * program headers loaded.
>>> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
>>> index c624feead0..3441198e21 100644
>>> --- a/linux-user/mmap.c
>>> +++ b/linux-user/mmap.c
>>> @@ -23,6 +23,7 @@
>>>   #include "user-internals.h"
>>>   #include "user-mmap.h"
>>>   #include "target_mman.h"
>>> +#include "loader.h"
>>>
>>>   static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>   static __thread int mmap_lock_count;
>>> @@ -295,23 +296,6 @@ static bool mmap_frag(abi_ulong real_start, 
>>> abi_ulong start, abi_ulong last,
>>>       return true;
>>>   }
>>>
>>> -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
>>> -#ifdef TARGET_AARCH64
>>> -# define TASK_UNMAPPED_BASE  0x5500000000
>>> -#else
>>> -# define TASK_UNMAPPED_BASE  0x4000000000
>>> -#endif
>>> -#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>>> -#ifdef TARGET_HPPA
>>> -# define TASK_UNMAPPED_BASE  0xfa000000
>>> -#else
>>> -# define TASK_UNMAPPED_BASE  0xe0000000
>>> -#endif
>>> -#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>>> -# define TASK_UNMAPPED_BASE  0x40000000
>>> -#endif
>>> -abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
>>> -
>>>   unsigned long last_brk;
>>>
>>>   /*
>>> @@ -344,6 +328,23 @@ abi_ulong mmap_find_vma(abi_ulong start, 
>>> abi_ulong size, abi_ulong align)
>>>       abi_ulong addr;
>>>       int wrapped, repeat;
>>>
>>> +    static abi_ulong mmap_next_start;
>>> +
>>> +    /* initialize mmap_next_start if necessary */
>>> +    if (!mmap_next_start) {
>>> +        mmap_next_start = TASK_UNMAPPED_BASE;
>>> +
>>> +        /* do sanity checks on guest memory layout */
>>> +        if (mmap_next_start >= GUEST_ADDR_MAX) {
>>> +            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;
>>
>> What if GUEST_ADDR_MAX < 0x1000000000?
> 
> this check affects 64-bit executables only where GUEST_ADDR_MAX is bigger
> than that number. But I agree it's not directly visible from the code.
> 32-bit ones are taken care of where TASK_UNMAPPED_BASE is defined.
> 
>> I think you can just return a hard error when mmap_next_start >= 
>> GUEST_ADDR_MAX.
> 
> Can't happen, but I will rewrite it.
> 
>>> +        }
>>> +
>>> +        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
>>> +            fprintf(stderr, "Memory too small for PIE executables.\n");
>>
>> Perhaps it's better to use error_report() for new code.
> 
> Ok.
> 
> Helge


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02  8:44       ` Akihiko Odaki
@ 2023-08-02  9:34         ` Helge Deller
  2023-08-02  9:58           ` Akihiko Odaki
  0 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-02  9:34 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 8/2/23 10:44, Akihiko Odaki wrote:
> On 2023/08/02 17:42, Helge Deller wrote:
>> On 8/2/23 09:49, Akihiko Odaki wrote:
>>> On 2023/08/02 8:27, Helge Deller wrote:
>>>> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address for all
>>>> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
>>>>
>>>> Additionally modify the elf loader to load dynamic pie executables at
>>>> around:
>>>> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
>>>> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
>>>> - 0x00000000    for 32-bit guest binaries on 32-bit host.
>>>
>>> Why do you change guest addresses depending on the host?
>>
>> The addresses are guest-addresses.
>> A 32-bit guest PIE can't be loaded at e.g. 0x5500000000,
>> while a 64-bit guest PIE needs to be loaded at 0x5500000000.
>
> I mean, why do you use address 0x00000000 for 32-bit guest binaries on 32-bit host while you use address 0x00300000 on 64-bit host?

To keep the memory pressure for the 32-bit qemu binary minimal.
On 64-bit host we have the full 32-bit address space for the guest.

Helge



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02  9:34         ` Helge Deller
@ 2023-08-02  9:58           ` Akihiko Odaki
  2023-08-02 10:35             ` Helge Deller
  0 siblings, 1 reply; 26+ messages in thread
From: Akihiko Odaki @ 2023-08-02  9:58 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 2023/08/02 18:34, Helge Deller wrote:
> On 8/2/23 10:44, Akihiko Odaki wrote:
>> On 2023/08/02 17:42, Helge Deller wrote:
>>> On 8/2/23 09:49, Akihiko Odaki wrote:
>>>> On 2023/08/02 8:27, Helge Deller wrote:
>>>>> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address 
>>>>> for all
>>>>> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
>>>>>
>>>>> Additionally modify the elf loader to load dynamic pie executables at
>>>>> around:
>>>>> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
>>>>> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
>>>>> - 0x00000000    for 32-bit guest binaries on 32-bit host.
>>>>
>>>> Why do you change guest addresses depending on the host?
>>>
>>> The addresses are guest-addresses.
>>> A 32-bit guest PIE can't be loaded at e.g. 0x5500000000,
>>> while a 64-bit guest PIE needs to be loaded at 0x5500000000.
>>
>> I mean, why do you use address 0x00000000 for 32-bit guest binaries on 
>> 32-bit host while you use address 0x00300000 on 64-bit host?
> 
> To keep the memory pressure for the 32-bit qemu binary minimal.
> On 64-bit host we have the full 32-bit address space for the guest.
> 
> Helge
> 

That makes sense. I'm worried that using 0x00000000 may break NULL 
checks on the guest though.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02  9:58           ` Akihiko Odaki
@ 2023-08-02 10:35             ` Helge Deller
  0 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-02 10:35 UTC (permalink / raw)
  To: Akihiko Odaki, qemu-devel
  Cc: Richard Henderson, Laurent Vivier, Paolo Bonzini, Joel Stanley

On 8/2/23 11:58, Akihiko Odaki wrote:
> On 2023/08/02 18:34, Helge Deller wrote:
>> On 8/2/23 10:44, Akihiko Odaki wrote:
>>> On 2023/08/02 17:42, Helge Deller wrote:
>>>> On 8/2/23 09:49, Akihiko Odaki wrote:
>>>>> On 2023/08/02 8:27, Helge Deller wrote:
>>>>>> Fix the elf loader to calculate a valid TASK_UNMAPPED_BASE address for all
>>>>>> 32-bit architectures, based on the GUEST_ADDR_MAX constant.
>>>>>>
>>>>>> Additionally modify the elf loader to load dynamic pie executables at
>>>>>> around:
>>>>>> ~ 0x5500000000  for 64-bit guest binaries on 64-bit host,
>>>>>> - 0x00300000    for 32-bit guest binaries on 64-bit host, and
>>>>>> - 0x00000000    for 32-bit guest binaries on 32-bit host.
>>>>>
>>>>> Why do you change guest addresses depending on the host?
>>>>
>>>> The addresses are guest-addresses.
>>>> A 32-bit guest PIE can't be loaded at e.g. 0x5500000000,
>>>> while a 64-bit guest PIE needs to be loaded at 0x5500000000.
>>>
>>> I mean, why do you use address 0x00000000 for 32-bit guest binaries on 32-bit host while you use address 0x00300000 on 64-bit host?
>>
>> To keep the memory pressure for the 32-bit qemu binary minimal.
>> On 64-bit host we have the full 32-bit address space for the guest.
>>
>> Helge
>>
>
> That makes sense. I'm worried that using 0x00000000 may break NULL checks on the guest though.

probably not, because 0 means to search any free space.
It's not fixed, it will be searched from there.

Helge


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables
  2023-08-01 23:27 ` [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables Helge Deller
@ 2023-08-02 18:25   ` Richard Henderson
  2023-08-02 19:51     ` Helge Deller
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2023-08-02 18:25 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/1/23 16:27, Helge Deller wrote:
> Reorganize the guest memory layout to get as much memory as possible for
> heap for the guest application.
> 
> This patch optimizes the memory layout by loading pie executables
> into lower memory and shared libs into higher memory (at
> TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap
> space which will be located directly after the executable.
> Up to now, pie executable and shared libs were loaded directly behind
> each other in the area at TASK_UNMAPPED_BASE, which leaves very little
> space for heap.
> 
> I tested this patchset with chroots of alpha, arm, armel, arm64, hppa, m68k,
> mips64el, mipsel, powerpc, ppc64, ppc64el, s390x, sh4 and sparc64 on a x86-64
> host, and with a static armhf binary (which fails to run without this patch).
> 
> This patch temporarily breaks the Thread Sanitizer (TSan) application
> which expects specific boundary definitions for memory mappings on
> different platforms [1], see commit aab613fb9597 ("linux-user: Update
> TASK_UNMAPPED_BASE for aarch64") for aarch64. The follow-up patch fixes it
> again.
> 
> [1] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> ---
>   linux-user/elfload.c | 55 +++++++++++++-------------------------------
>   linux-user/mmap.c    |  8 ++++---
>   2 files changed, 21 insertions(+), 42 deletions(-)
> 
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 2aee2298ec..47a118e430 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>       abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
>       int i, retval, prot_exec;
>       Error *err = NULL;
> +    bool is_main_executable;
> 
>       /* First of all, some simple consistency checks */
>       if (!elf_check_ident(ehdr)) {
> @@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd,
>           }
>       }
> 
> -    if (pinterp_name != NULL) {
> -        /*
> -         * This is the main executable.
> -         *
> -         * Reserve extra space for brk.
> -         * We hold on to this space while placing the interpreter
> -         * and the stack, lest they be placed immediately after
> -         * the data segment and block allocation from the brk.
> -         *
> -         * 16MB is chosen as "large enough" without being so large as
> -         * to allow the result to not fit with a 32-bit guest on a
> -         * 32-bit host. However some 64 bit guests (e.g. s390x)
> -         * attempt to place their heap further ahead and currently
> -         * nothing stops them smashing into QEMUs address space.
> -         */
> -#if TARGET_LONG_BITS == 64
> -        info->reserve_brk = 32 * MiB;
> -#else
> -        info->reserve_brk = 16 * MiB;
> -#endif
> -        hiaddr += info->reserve_brk;
> -
> +    is_main_executable = (pinterp_name != NULL);

This will be false for static main executables.


> +    if (is_main_executable) {
>           if (ehdr->e_type == ET_EXEC) {
>               /*
>                * Make sure that the low address does not conflict with
> @@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>               probe_guest_base(image_name, loaddr, hiaddr);
>           } else {
>               /*
> -             * The binary is dynamic, but we still need to
> +             * The binary is dynamic (pie-executabe), but we still need to
>                * select guest_base.  In this case we pass a size.
>                */
>               probe_guest_base(image_name, 0, hiaddr - loaddr);
> @@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>        */
>       load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> -                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
> +                            (is_main_executable ? MAP_FIXED : 0),

This is definitely wrong, as all ET_EXEC require FIXED.
(In addition to static, it is possible to write an ET_EXEC interpreter.)

> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -299,14 +299,16 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
>   #ifdef TARGET_AARCH64
>   # define TASK_UNMAPPED_BASE  0x5500000000
>   #else
> -# define TASK_UNMAPPED_BASE  (1ul << 38)
> +# define TASK_UNMAPPED_BASE  0x4000000000
>   #endif
> -#else
> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>   #ifdef TARGET_HPPA
>   # define TASK_UNMAPPED_BASE  0xfa000000
>   #else
> -# define TASK_UNMAPPED_BASE  0x40000000
> +# define TASK_UNMAPPED_BASE  0xe0000000
>   #endif
> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
> +# define TASK_UNMAPPED_BASE  0x40000000
>   #endif
>   abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;

This should be a separate change.

While we're at it, we should move this to e.g. linux-user/$GUEST/target_mman.h, and make 
this match each guest kernel's TASK_UNMAPPED_BASE.  It really shouldn't depend on the host 
at all.


r~



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-01 23:27 ` [PATCH v6 8/8] linux-user: Load pie executables at upper memory Helge Deller
  2023-08-02  7:49   ` Akihiko Odaki
@ 2023-08-02 18:36   ` Richard Henderson
  2023-08-02 19:57     ` Helge Deller
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2023-08-02 18:36 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/1/23 16:27, Helge Deller wrote:
> +/* where to map binaries? */
> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> +# define TASK_UNMAPPED_BASE_PIE 0x5500000000
> +# define TASK_UNMAPPED_BASE	0x7000000000
> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
> +# define TASK_UNMAPPED_BASE_PIE	0x00300000
> +# define TASK_UNMAPPED_BASE	(GUEST_ADDR_MAX - 0x20000000 + 1)
> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
> +# define TASK_UNMAPPED_BASE_PIE	0x00000000
> +# define TASK_UNMAPPED_BASE	0x40000000
> +#endif

It would probably be easier to follow if you use the kernel's name: ELF_ET_DYN_BASE. 
Should be in linux-user/$GUEST/target_mmap.h with TASK_UNMAPPED_BASE, per my comment vs 
patch 7.


> +        /* do sanity checks on guest memory layout */
> +        if (mmap_next_start >= GUEST_ADDR_MAX) {
> +            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;
> +        }

What in the world is that random number?  It certainly won't compile on 32-bit host, and 
certainly isn't relevant to a 32-bit guest.

> +        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
> +            fprintf(stderr, "Memory too small for PIE executables.\n");
> +            exit(EXIT_FAILURE);
> +        }

Really bad timing for this diagnostic.  What if a PIE executable isn't even being used?

Both TASK_UNMAPPED_BASE and ELF_ET_DYN_BASE could be computed in main (or a subroutine), 
when reserved_va is assigned.


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables
  2023-08-02 18:25   ` Richard Henderson
@ 2023-08-02 19:51     ` Helge Deller
  2023-08-02 19:57       ` Richard Henderson
  0 siblings, 1 reply; 26+ messages in thread
From: Helge Deller @ 2023-08-02 19:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/2/23 20:25, Richard Henderson wrote:
> On 8/1/23 16:27, Helge Deller wrote:
>> Reorganize the guest memory layout to get as much memory as possible for
>> heap for the guest application.
>>
>> This patch optimizes the memory layout by loading pie executables
>> into lower memory and shared libs into higher memory (at
>> TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap
>> space which will be located directly after the executable.
>> Up to now, pie executable and shared libs were loaded directly behind
>> each other in the area at TASK_UNMAPPED_BASE, which leaves very little
>> space for heap.
>>
>> I tested this patchset with chroots of alpha, arm, armel, arm64, hppa, m68k,
>> mips64el, mipsel, powerpc, ppc64, ppc64el, s390x, sh4 and sparc64 on a x86-64
>> host, and with a static armhf binary (which fails to run without this patch).
>>
>> This patch temporarily breaks the Thread Sanitizer (TSan) application
>> which expects specific boundary definitions for memory mappings on
>> different platforms [1], see commit aab613fb9597 ("linux-user: Update
>> TASK_UNMAPPED_BASE for aarch64") for aarch64. The follow-up patch fixes it
>> again.
>>
>> [1] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> ---
>>   linux-user/elfload.c | 55 +++++++++++++-------------------------------
>>   linux-user/mmap.c    |  8 ++++---
>>   2 files changed, 21 insertions(+), 42 deletions(-)
>>
>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>> index 2aee2298ec..47a118e430 100644
>> --- a/linux-user/elfload.c
>> +++ b/linux-user/elfload.c
>> @@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>       abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
>>       int i, retval, prot_exec;
>>       Error *err = NULL;
>> +    bool is_main_executable;
>>
>>       /* First of all, some simple consistency checks */
>>       if (!elf_check_ident(ehdr)) {
>> @@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd,
>>           }
>>       }
>>
>> -    if (pinterp_name != NULL) {
>> -        /*
>> -         * This is the main executable.
>> -         *
>> -         * Reserve extra space for brk.
>> -         * We hold on to this space while placing the interpreter
>> -         * and the stack, lest they be placed immediately after
>> -         * the data segment and block allocation from the brk.
>> -         *
>> -         * 16MB is chosen as "large enough" without being so large as
>> -         * to allow the result to not fit with a 32-bit guest on a
>> -         * 32-bit host. However some 64 bit guests (e.g. s390x)
>> -         * attempt to place their heap further ahead and currently
>> -         * nothing stops them smashing into QEMUs address space.
>> -         */
>> -#if TARGET_LONG_BITS == 64
>> -        info->reserve_brk = 32 * MiB;
>> -#else
>> -        info->reserve_brk = 16 * MiB;
>> -#endif
>> -        hiaddr += info->reserve_brk;
>> -
>> +    is_main_executable = (pinterp_name != NULL);
>
> This will be false for static main executables.

[deller@p100 qemu-helge-user-armhf]$ file fstype
fstype: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, interpreter /lib/klibc-PupSAGgtpafMlSLXOLgje1kXFo8.so, BuildID[sha1]=45aac32edcd204fd6fa06febf3abff274ab39b26, stripped

And for real static binaries (without interpreter), the "loadaddr" is set in the mmap below,
and MAP_FIX isn't needed then.

>> +    if (is_main_executable) {
>>           if (ehdr->e_type == ET_EXEC) {
>>               /*
>>                * Make sure that the low address does not conflict with
>> @@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>               probe_guest_base(image_name, loaddr, hiaddr);
>>           } else {
>>               /*
>> -             * The binary is dynamic, but we still need to
>> +             * The binary is dynamic (pie-executabe), but we still need to
>>                * select guest_base.  In this case we pass a size.
>>                */
>>               probe_guest_base(image_name, 0, hiaddr - loaddr);
>> @@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>        */
>>       load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>> -                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>> +                            (is_main_executable ? MAP_FIXED : 0),
>
> This is definitely wrong, as all ET_EXEC require FIXED.

Not if the PIE flag is set too and even here the loadaddr defines where
it will be loaded then.

> (In addition to static, it is possible to write an ET_EXEC interpreter.)
>
>> --- a/linux-user/mmap.c
>> +++ b/linux-user/mmap.c
>> @@ -299,14 +299,16 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
>>   #ifdef TARGET_AARCH64
>>   # define TASK_UNMAPPED_BASE  0x5500000000
>>   #else
>> -# define TASK_UNMAPPED_BASE  (1ul << 38)
>> +# define TASK_UNMAPPED_BASE  0x4000000000
>>   #endif
>> -#else
>> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>>   #ifdef TARGET_HPPA
>>   # define TASK_UNMAPPED_BASE  0xfa000000
>>   #else
>> -# define TASK_UNMAPPED_BASE  0x40000000
>> +# define TASK_UNMAPPED_BASE  0xe0000000
>>   #endif
>> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>> +# define TASK_UNMAPPED_BASE  0x40000000
>>   #endif
>>   abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
>
> This should be a separate change.
>
> While we're at it, we should move this to e.g.
> linux-user/$GUEST/target_mman.h, and make this match each guest
> kernel's TASK_UNMAPPED_BASE.  It really shouldn't depend on the host
> at all.

I think it matters if you want to run a 32-bit guest on 32-bit host?
Those TASK_UNMAPPED_BASE are actually TARGET_TASK_UNMAPPED_BASE values.

Helge


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables
  2023-08-02 19:51     ` Helge Deller
@ 2023-08-02 19:57       ` Richard Henderson
  2023-08-02 20:06         ` Helge Deller
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2023-08-02 19:57 UTC (permalink / raw)
  To: Helge Deller, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/2/23 12:51, Helge Deller wrote:
>>> @@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>        */
>>>       load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>> -                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>> +                            (is_main_executable ? MAP_FIXED : 0),
>>
>> This is definitely wrong, as all ET_EXEC require FIXED.
> 
> Not if the PIE flag is set too...

What in the world are you talking about?
There is no "PIE flag" in ELF.


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 8/8] linux-user: Load pie executables at upper memory
  2023-08-02 18:36   ` Richard Henderson
@ 2023-08-02 19:57     ` Helge Deller
  0 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-02 19:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/2/23 20:36, Richard Henderson wrote:
> On 8/1/23 16:27, Helge Deller wrote:
>> +/* where to map binaries? */
>> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
>> +# define TASK_UNMAPPED_BASE_PIE 0x5500000000
>> +# define TASK_UNMAPPED_BASE    0x7000000000
>> +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
>> +# define TASK_UNMAPPED_BASE_PIE    0x00300000
>> +# define TASK_UNMAPPED_BASE    (GUEST_ADDR_MAX - 0x20000000 + 1)
>> +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
>> +# define TASK_UNMAPPED_BASE_PIE    0x00000000
>> +# define TASK_UNMAPPED_BASE    0x40000000
>> +#endif
>
> It would probably be easier to follow if you use the kernel's name:
> ELF_ET_DYN_BASE.

Agreed.

> Should be in linux-user/$GUEST/target_mmap.h with
> TASK_UNMAPPED_BASE, per my comment vs patch 7.

Maybe, but see my comments/answers there...

>> +        /* do sanity checks on guest memory layout */
>> +        if (mmap_next_start >= GUEST_ADDR_MAX) {
>> +            mmap_next_start = GUEST_ADDR_MAX - 0x1000000000 + 1;
>> +        }
>
> What in the world is that random number?

It's random. Idea is to keep enough space for shared libs below GUEST_ADDR_MAX.

> It certainly won't compile
> on 32-bit host, and certainly isn't relevant to a 32-bit guest.

That code will never be compiled/executed on 32-bit guests & hosts.
The defines for TASK_UNMAPPED_BASE take already care of that.
But I agree it's not directly visible (and probably not nice that way).

>> +        if (TASK_UNMAPPED_BASE_PIE >= mmap_next_start) {
>> +            fprintf(stderr, "Memory too small for PIE executables.\n");
>> +            exit(EXIT_FAILURE);
>> +        }
>
> Really bad timing for this diagnostic.  What if a PIE executable isn't even being used?
>
> Both TASK_UNMAPPED_BASE and ELF_ET_DYN_BASE could be computed in main (or a subroutine), when reserved_va is assigned.

Helge



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables
  2023-08-02 19:57       ` Richard Henderson
@ 2023-08-02 20:06         ` Helge Deller
  0 siblings, 0 replies; 26+ messages in thread
From: Helge Deller @ 2023-08-02 20:06 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Laurent Vivier, Paolo Bonzini, Joel Stanley, Akihiko Odaki

On 8/2/23 21:57, Richard Henderson wrote:
> On 8/2/23 12:51, Helge Deller wrote:
>>>> @@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>>        */
>>>>       load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>>>                               MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>>> -                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>>> +                            (is_main_executable ? MAP_FIXED : 0),
>>>
>>> This is definitely wrong, as all ET_EXEC require FIXED.
>>
>> Not if the PIE flag is set too...
> 
> What in the world are you talking about?
> There is no "PIE flag" in ELF.

I probably used the wrong wording (and meant the semi-static binaries
which still have e.g. klibc as interpreter)-
Anyway, loadaddr defines the address already.

Helge

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-08-02 20:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-01 23:27 [PATCH v6 0/8] linux-user: brk fixes Helge Deller
2023-08-01 23:27 ` [PATCH v6 1/8] linux-user: Unset MAP_FIXED_NOREPLACE for host Helge Deller
2023-08-01 23:40   ` Richard Henderson
2023-08-01 23:27 ` [PATCH v6 2/8] linux-user: Do not call get_errno() in do_brk() Helge Deller
2023-08-01 23:27 ` [PATCH v6 3/8] linux-user: Use MAP_FIXED_NOREPLACE for do_brk() Helge Deller
2023-08-01 23:27 ` [PATCH v6 4/8] linux-user: Do nothing if too small brk is specified Helge Deller
2023-08-01 23:27 ` [PATCH v6 5/8] linux-user: Do not align brk with host page size Helge Deller
2023-08-01 23:27 ` [PATCH v6 6/8] linux-user: Show heap address in /proc/pid/maps Helge Deller
2023-08-02  5:41   ` Philippe Mathieu-Daudé
2023-08-02  6:07     ` Helge Deller
2023-08-01 23:27 ` [PATCH v6 7/8] linux-user: Optimize memory layout for static and dynamic executables Helge Deller
2023-08-02 18:25   ` Richard Henderson
2023-08-02 19:51     ` Helge Deller
2023-08-02 19:57       ` Richard Henderson
2023-08-02 20:06         ` Helge Deller
2023-08-01 23:27 ` [PATCH v6 8/8] linux-user: Load pie executables at upper memory Helge Deller
2023-08-02  7:49   ` Akihiko Odaki
2023-08-02  8:42     ` Helge Deller
2023-08-02  8:44       ` Akihiko Odaki
2023-08-02  9:34         ` Helge Deller
2023-08-02  9:58           ` Akihiko Odaki
2023-08-02 10:35             ` Helge Deller
2023-08-02 18:36   ` Richard Henderson
2023-08-02 19:57     ` Helge Deller
2023-08-02  2:21 ` [PATCH v6 0/8] linux-user: brk fixes Joel Stanley
2023-08-02  6:10   ` Helge Deller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).