* [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes
@ 2023-08-07 16:36 Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
` (14 more replies)
0 siblings, 15 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
This is the second half of
https://patchew.org/QEMU/20230804220032.295411-1-richard.henderson@linaro.org/
which I held back because of regressions with s390x testing.
It turns out that patch 4, "Use MAP_FIXED_NOREPLACE for initial image mmap"
actually triggered EEXIST, which meant that probe_guest_base did not
do its job to select unused host virtual memory. It's a mystery why
we have not seen larger problems because of this.
As I kept digging, I found quite a number of problems within
probe_guest_base and its subroutines. I have rewritten it completely.
Hopefully it is much easier to understand in its new form.
Testing this has been difficult, because it is most visible with
non-PIE executables, and most modern distros default to PIE, and
our current implementation of --disable-pie does not work.
r~
Helge Deller (1):
linux-user: Adjust initial brk when interpreter is close to executable
Richard Henderson (13):
linux-user: Adjust task_unmapped_base for reserved_va
linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
linux-user: Do not adjust image mapping for host page size
linux-user: Do not adjust zero_bss for host page size
linux-user: Use zero_bss for PT_LOAD with no file contents too
util/selfmap: Rewrite using qemu/interval-tree.h
linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base
linux-user: Consolidate guest bounds check in probe_guest_base
linux-user: Rewrite fixed probe_guest_base
linux-user: Rewrite non-fixed probe_guest_base
include/qemu/selfmap.h | 20 +-
linux-user/aarch64/target_mman.h | 13 +
linux-user/alpha/target_mman.h | 11 +
linux-user/arm/target_mman.h | 11 +
linux-user/cris/target_mman.h | 12 +
linux-user/hexagon/target_mman.h | 13 +
linux-user/hppa/target_mman.h | 6 +
linux-user/i386/target_mman.h | 16 +
linux-user/loongarch64/target_mman.h | 11 +
linux-user/m68k/target_mman.h | 5 +
linux-user/microblaze/target_mman.h | 11 +
linux-user/mips/target_mman.h | 10 +
linux-user/nios2/target_mman.h | 10 +
linux-user/openrisc/target_mman.h | 10 +
linux-user/ppc/target_mman.h | 20 +
linux-user/qemu.h | 1 -
linux-user/riscv/target_mman.h | 10 +
linux-user/s390x/target_mman.h | 20 +
linux-user/sh4/target_mman.h | 7 +
linux-user/sparc/target_mman.h | 25 +
linux-user/user-mmap.h | 5 +-
linux-user/x86_64/target_mman.h | 15 +
linux-user/xtensa/target_mman.h | 10 +
linux-user/elfload.c | 788 +++++++++++++--------------
linux-user/main.c | 43 ++
linux-user/mmap.c | 19 +-
linux-user/syscall.c | 15 +-
util/selfmap.c | 114 ++--
28 files changed, 777 insertions(+), 474 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 9:10 ` Alex Bennée
2023-08-08 15:35 ` Helge Deller
2023-08-07 16:36 ` [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h Richard Henderson
` (13 subsequent siblings)
14 siblings, 2 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Ensure that the chosen values for mmap_next_start and
task_unmapped_base are within the guest address space.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/user-mmap.h | 18 +++++++++++++++++-
linux-user/main.c | 28 ++++++++++++++++++++++++++++
linux-user/mmap.c | 18 +++---------------
3 files changed, 48 insertions(+), 16 deletions(-)
diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index 7265c2c116..fd456e024e 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -18,6 +18,23 @@
#ifndef LINUX_USER_USER_MMAP_H
#define LINUX_USER_USER_MMAP_H
+#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
+#ifdef TARGET_AARCH64
+# define TASK_UNMAPPED_BASE 0x5500000000
+#else
+# define TASK_UNMAPPED_BASE (1ul << 38)
+#endif
+#else
+#ifdef TARGET_HPPA
+# define TASK_UNMAPPED_BASE 0xfa000000
+#else
+# define TASK_UNMAPPED_BASE 0x40000000
+#endif
+#endif
+
+extern abi_ulong task_unmapped_base;
+extern abi_ulong mmap_next_start;
+
int target_mprotect(abi_ulong start, abi_ulong len, int prot);
abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
int flags, int fd, off_t offset);
@@ -26,7 +43,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
abi_ulong new_size, unsigned long flags,
abi_ulong new_addr);
abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice);
-extern abi_ulong mmap_next_start;
abi_ulong mmap_find_vma(abi_ulong, abi_ulong, abi_ulong);
void mmap_fork_start(void);
void mmap_fork_end(int child);
diff --git a/linux-user/main.c b/linux-user/main.c
index 556956c363..be621dc792 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -821,6 +821,34 @@ int main(int argc, char **argv, char **envp)
reserved_va = max_reserved_va;
}
+ /*
+ * Temporarily disable
+ * "comparison is always false due to limited range of data type"
+ * due to comparison between (possible) uint64_t and uintptr_t.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtype-limits"
+
+ /*
+ * Select an initial value for task_unmapped_base that is in range.
+ */
+ if (reserved_va) {
+ if (TASK_UNMAPPED_BASE < reserved_va) {
+ task_unmapped_base = TASK_UNMAPPED_BASE;
+ } else {
+ /* The most common default formula is TASK_SIZE / 3. */
+ task_unmapped_base = TARGET_PAGE_ALIGN(reserved_va / 3);
+ }
+ } else if (TASK_UNMAPPED_BASE < UINTPTR_MAX) {
+ task_unmapped_base = TASK_UNMAPPED_BASE;
+ } else {
+ /* 32-bit host: pick something medium size. */
+ task_unmapped_base = 0x10000000;
+ }
+ mmap_next_start = task_unmapped_base;
+
+#pragma GCC diagnostic pop
+
{
Error *err = NULL;
if (seed_optarg != NULL) {
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index eb04fab8ab..84436d45c8 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -299,20 +299,8 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
return true;
}
-#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
-#ifdef TARGET_AARCH64
-# define TASK_UNMAPPED_BASE 0x5500000000
-#else
-# define TASK_UNMAPPED_BASE (1ul << 38)
-#endif
-#else
-#ifdef TARGET_HPPA
-# define TASK_UNMAPPED_BASE 0xfa000000
-#else
-# define TASK_UNMAPPED_BASE 0x40000000
-#endif
-#endif
-abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
+abi_ulong task_unmapped_base;
+abi_ulong mmap_next_start;
/*
* Subroutine of mmap_find_vma, used when we have pre-allocated
@@ -391,7 +379,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
if ((addr & (align - 1)) == 0) {
/* Success. */
- if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
+ if (start == mmap_next_start && addr >= task_unmapped_base) {
mmap_next_start = addr + size;
}
return addr;
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 9:19 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 03/14] linux-user: Define ELF_ET_DYN_BASE " Richard Henderson
` (12 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Provide default values that are as close as possible to the
values used by the guest's kernel.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/aarch64/target_mman.h | 10 ++++++++++
linux-user/alpha/target_mman.h | 8 ++++++++
linux-user/arm/target_mman.h | 8 ++++++++
linux-user/cris/target_mman.h | 9 +++++++++
linux-user/hexagon/target_mman.h | 10 ++++++++++
linux-user/hppa/target_mman.h | 3 +++
linux-user/i386/target_mman.h | 13 +++++++++++++
linux-user/loongarch64/target_mman.h | 8 ++++++++
linux-user/m68k/target_mman.h | 3 +++
linux-user/microblaze/target_mman.h | 8 ++++++++
linux-user/mips/target_mman.h | 7 +++++++
linux-user/nios2/target_mman.h | 7 +++++++
linux-user/openrisc/target_mman.h | 7 +++++++
linux-user/ppc/target_mman.h | 13 +++++++++++++
linux-user/riscv/target_mman.h | 7 +++++++
linux-user/s390x/target_mman.h | 10 ++++++++++
linux-user/sh4/target_mman.h | 4 ++++
linux-user/sparc/target_mman.h | 14 ++++++++++++++
linux-user/user-mmap.h | 14 --------------
linux-user/x86_64/target_mman.h | 12 ++++++++++++
linux-user/xtensa/target_mman.h | 6 ++++++
21 files changed, 167 insertions(+), 14 deletions(-)
diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index f721295fe1..4d3eecfb26 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -4,6 +4,16 @@
#define TARGET_PROT_BTI 0x10
#define TARGET_PROT_MTE 0x20
+/*
+ * arch/arm64/include/asm/processor.h:
+ *
+ * TASK_UNMAPPED_BASE DEFAULT_MAP_WINDOW / 4
+ * DEFAULT_MAP_WINDOW DEFAULT_MAP_WINDOW_64
+ * DEFAULT_MAP_WINDOW_64 UL(1) << VA_BITS_MIN
+ * VA_BITS_MIN 48 (unless explicitly configured smaller)
+ */
+#define TASK_UNMAPPED_BASE (1ull << (48 - 2))
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h
index 6bb03e7336..c90b493711 100644
--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -20,6 +20,14 @@
#define TARGET_MS_SYNC 2
#define TARGET_MS_INVALIDATE 4
+/*
+ * arch/alpha/include/asm/processor.h:
+ *
+ * TASK_UNMAPPED_BASE TASK_SIZE / 2
+ * TASK_SIZE 0x40000000000UL
+ */
+#define TASK_UNMAPPED_BASE 0x20000000000ull
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/arm/target_mman.h b/linux-user/arm/target_mman.h
index e7ba6070fe..76275b2c7e 100644
--- a/linux-user/arm/target_mman.h
+++ b/linux-user/arm/target_mman.h
@@ -1 +1,9 @@
+/*
+ * arch/arm/include/asm/memory.h
+ * TASK_UNMAPPED_BASE ALIGN(TASK_SIZE / 3, SZ_16M)
+ * TASK_SIZE CONFIG_PAGE_OFFSET
+ * CONFIG_PAGE_OFFSET 0xC0000000 (default in Kconfig)
+ */
+#define TASK_UNMAPPED_BASE 0x40000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/cris/target_mman.h b/linux-user/cris/target_mman.h
index e7ba6070fe..9df7b1eda5 100644
--- a/linux-user/cris/target_mman.h
+++ b/linux-user/cris/target_mman.h
@@ -1 +1,10 @@
+/*
+ * arch/cris/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+ *
+ * arch/cris/include/arch-v32/arch/processor.h
+ * TASK_SIZE 0xb0000000
+ */
+#define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0xb0000000 / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/hexagon/target_mman.h b/linux-user/hexagon/target_mman.h
index e7ba6070fe..c5ae336e07 100644
--- a/linux-user/hexagon/target_mman.h
+++ b/linux-user/hexagon/target_mman.h
@@ -1 +1,11 @@
+/*
+ * arch/hexgon/include/asm/processor.h
+ * TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
+ *
+ * arch/hexagon/include/asm/mem-layout.h
+ * TASK_SIZE PAGE_OFFSET
+ * PAGE_OFFSET 0xc0000000
+ */
+#define TASK_UNMAPPED_BASE 0x40000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h
index 97f87d042a..6459e7dbdd 100644
--- a/linux-user/hppa/target_mman.h
+++ b/linux-user/hppa/target_mman.h
@@ -24,6 +24,9 @@
#define TARGET_MS_ASYNC 2
#define TARGET_MS_INVALIDATE 4
+/* arch/parisc/include/asm/processor.h: DEFAULT_MAP_BASE32 */
+#define TASK_UNMAPPED_BASE 0x40000000
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/i386/target_mman.h b/linux-user/i386/target_mman.h
index e7ba6070fe..cc3382007f 100644
--- a/linux-user/i386/target_mman.h
+++ b/linux-user/i386/target_mman.h
@@ -1 +1,14 @@
+/*
+ * arch/x86/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE_LOW)
+ * __TASK_UNMAPPED_BASE(S) PAGE_ALIGN(S / 3)
+ *
+ * arch/x86/include/asm/page_32_types.h:
+ * TASK_SIZE_LOW TASK_SIZE
+ * TASK_SIZE __PAGE_OFFSET
+ * __PAGE_OFFSET CONFIG_PAGE_OFFSET
+ * CONFIG_PAGE_OFFSET 0xc0000000 (default in Kconfig)
+ */
+#define TASK_UNMAPPED_BASE 0x40000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/loongarch64/target_mman.h b/linux-user/loongarch64/target_mman.h
index e7ba6070fe..d70e44d44c 100644
--- a/linux-user/loongarch64/target_mman.h
+++ b/linux-user/loongarch64/target_mman.h
@@ -1 +1,9 @@
+/*
+ * arch/loongarch/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
+ * TASK_SIZE64 0x1UL << (... ? VA_BITS : ...)
+ */
+#define TASK_UNMAPPED_BASE \
+ TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/m68k/target_mman.h b/linux-user/m68k/target_mman.h
index e7ba6070fe..d3eceb663b 100644
--- a/linux-user/m68k/target_mman.h
+++ b/linux-user/m68k/target_mman.h
@@ -1 +1,4 @@
+/* arch/m68k/include/asm/processor.h */
+#define TASK_UNMAPPED_BASE 0xC0000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/microblaze/target_mman.h b/linux-user/microblaze/target_mman.h
index e7ba6070fe..ffee869db4 100644
--- a/linux-user/microblaze/target_mman.h
+++ b/linux-user/microblaze/target_mman.h
@@ -1 +1,9 @@
+/*
+ * arch/microblaze/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE (TASK_SIZE / 8 * 3)
+ * TASK_SIZE CONFIG_KERNEL_START
+ * CONFIG_KERNEL_START 0xc0000000 (default in Kconfig)
+ */
+#define TASK_UNMAPPED_BASE 0x48000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h
index e97694aa4e..fe1eec2d0b 100644
--- a/linux-user/mips/target_mman.h
+++ b/linux-user/mips/target_mman.h
@@ -14,6 +14,13 @@
#define TARGET_MAP_STACK 0x40000
#define TARGET_MAP_HUGETLB 0x80000
+/*
+ * arch/mips/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
+ */
+#define TASK_UNMAPPED_BASE \
+ TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/nios2/target_mman.h b/linux-user/nios2/target_mman.h
index e7ba6070fe..ce18f4f871 100644
--- a/linux-user/nios2/target_mman.h
+++ b/linux-user/nios2/target_mman.h
@@ -1 +1,8 @@
+/*
+ * arch/nios2/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
+ * TASK_SIZE 0x7FFF0000UL
+ */
+#define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0x7FFF0000 / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/openrisc/target_mman.h b/linux-user/openrisc/target_mman.h
index e7ba6070fe..f1aaad809d 100644
--- a/linux-user/openrisc/target_mman.h
+++ b/linux-user/openrisc/target_mman.h
@@ -1 +1,8 @@
+/*
+ * arch/openrisc/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE (TASK_SIZE / 8 * 3)
+ * TASK_SIZE (0x80000000UL)
+ */
+#define TASK_UNMAPPED_BASE 0x30000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/ppc/target_mman.h b/linux-user/ppc/target_mman.h
index 67cc218f2e..04f99c6077 100644
--- a/linux-user/ppc/target_mman.h
+++ b/linux-user/ppc/target_mman.h
@@ -4,6 +4,19 @@
#define TARGET_MAP_NORESERVE 0x40
#define TARGET_MAP_LOCKED 0x80
+/*
+ * arch/powerpc/include/asm/task_size_64.h
+ * TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4))
+ * TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(DEFAULT_MAP_WINDOW_USER64 / 4))
+ * TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
+ * DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB (with 4k pages)
+ */
+#ifdef TARGET_PPC64
+#define TASK_UNMAPPED_BASE 0x0000100000000000ull
+#else
+#define TASK_UNMAPPED_BASE 0x40000000
+#endif
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/riscv/target_mman.h b/linux-user/riscv/target_mman.h
index e7ba6070fe..0f06dadbd4 100644
--- a/linux-user/riscv/target_mman.h
+++ b/linux-user/riscv/target_mman.h
@@ -1 +1,8 @@
+/*
+ * arch/loongarch/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
+ */
+#define TASK_UNMAPPED_BASE \
+ TARGET_PAGE_ALIGN((1ull << (TARGET_VIRT_ADDR_SPACE_BITS - 1)) / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/s390x/target_mman.h b/linux-user/s390x/target_mman.h
index e7ba6070fe..40d149b329 100644
--- a/linux-user/s390x/target_mman.h
+++ b/linux-user/s390x/target_mman.h
@@ -1 +1,11 @@
+/*
+ * arch/s390/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE (... : (_REGION2_SIZE >> 1))
+ *
+ * arch/s390/include/asm/pgtable.h:
+ * _REGION2_SIZE (1UL << _REGION2_SHIFT)
+ * _REGION2_SHIFT 42
+ */
+#define TASK_UNMAPPED_BASE (1ull << 41)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/sh4/target_mman.h b/linux-user/sh4/target_mman.h
index e7ba6070fe..bbbc223398 100644
--- a/linux-user/sh4/target_mman.h
+++ b/linux-user/sh4/target_mman.h
@@ -1 +1,5 @@
+/* arch/sh/include/asm/processor_32.h */
+#define TASK_UNMAPPED_BASE \
+ TARGET_PAGE_ALIGN((1u << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/sparc/target_mman.h b/linux-user/sparc/target_mman.h
index 9bad99c852..692ebf9dd7 100644
--- a/linux-user/sparc/target_mman.h
+++ b/linux-user/sparc/target_mman.h
@@ -5,6 +5,20 @@
#define TARGET_MAP_LOCKED 0x100
#define TARGET_MAP_GROWSDOWN 0x0200
+/*
+ * arch/sparc/include/asm/page_64.h:
+ * TASK_UNMAPPED_BASE (test_thread_flag(TIF_32BIT) ? \
+ * _AC(0x0000000070000000,UL) : \
+ * VA_EXCLUDE_END)
+ * But VA_EXCLUDE_END is > 0xffff800000000000UL which doesn't work
+ * in userland emulation.
+ */
+#ifdef TARGET_ABI32
+#define TASK_UNMAPPED_BASE 0x70000000
+#else
+#define TASK_UNMAPPED_BASE (1ull << (TARGET_VIRT_ADDR_SPACE_BITS - 2))
+#endif
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index fd456e024e..bae49059e0 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -18,20 +18,6 @@
#ifndef LINUX_USER_USER_MMAP_H
#define LINUX_USER_USER_MMAP_H
-#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
-#ifdef TARGET_AARCH64
-# define TASK_UNMAPPED_BASE 0x5500000000
-#else
-# define TASK_UNMAPPED_BASE (1ul << 38)
-#endif
-#else
-#ifdef TARGET_HPPA
-# define TASK_UNMAPPED_BASE 0xfa000000
-#else
-# define TASK_UNMAPPED_BASE 0x40000000
-#endif
-#endif
-
extern abi_ulong task_unmapped_base;
extern abi_ulong mmap_next_start;
diff --git a/linux-user/x86_64/target_mman.h b/linux-user/x86_64/target_mman.h
index e7ba6070fe..f9ff652b37 100644
--- a/linux-user/x86_64/target_mman.h
+++ b/linux-user/x86_64/target_mman.h
@@ -1 +1,13 @@
+/*
+ * arch/x86/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE_LOW)
+ * __TASK_UNMAPPED_BASE(S) PAGE_ALIGN(S / 3)
+ *
+ * arch/x86/include/asm/page_64_types.h:
+ * TASK_SIZE_LOW DEFAULT_MAP_WINDOW
+ * DEFAULT_MAP_WINDOW ((1UL << 47) - PAGE_SIZE)
+ */
+#define TASK_UNMAPPED_BASE \
+ TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/xtensa/target_mman.h b/linux-user/xtensa/target_mman.h
index 3933771b5b..c4f671adb7 100644
--- a/linux-user/xtensa/target_mman.h
+++ b/linux-user/xtensa/target_mman.h
@@ -14,6 +14,12 @@
#define TARGET_MAP_STACK 0x40000
#define TARGET_MAP_HUGETLB 0x80000
+/*
+ * arch/xtensa/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE (TASK_SIZE / 2)
+ */
+#define TASK_UNMAPPED_BASE (1u << (TARGET_VIRT_ADDR_SPACE_BITS - 1))
+
#include "../generic/target_mman.h"
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 03/14] linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap Richard Henderson
` (11 subsequent siblings)
14 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
Copy each guest kernel's default value, then bound it
against reserved_va or the host address space.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/aarch64/target_mman.h | 3 +++
linux-user/alpha/target_mman.h | 3 +++
linux-user/arm/target_mman.h | 3 +++
linux-user/cris/target_mman.h | 3 +++
linux-user/hexagon/target_mman.h | 3 +++
linux-user/hppa/target_mman.h | 3 +++
linux-user/i386/target_mman.h | 3 +++
linux-user/loongarch64/target_mman.h | 3 +++
linux-user/m68k/target_mman.h | 2 ++
linux-user/microblaze/target_mman.h | 3 +++
linux-user/mips/target_mman.h | 3 +++
linux-user/nios2/target_mman.h | 3 +++
linux-user/openrisc/target_mman.h | 3 +++
linux-user/ppc/target_mman.h | 7 +++++++
linux-user/riscv/target_mman.h | 3 +++
linux-user/s390x/target_mman.h | 10 ++++++++++
linux-user/sh4/target_mman.h | 3 +++
linux-user/sparc/target_mman.h | 11 +++++++++++
linux-user/user-mmap.h | 1 +
linux-user/x86_64/target_mman.h | 3 +++
linux-user/xtensa/target_mman.h | 4 ++++
linux-user/main.c | 15 +++++++++++++++
linux-user/mmap.c | 1 +
23 files changed, 96 insertions(+)
diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index 4d3eecfb26..69ec5d5739 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -14,6 +14,9 @@
*/
#define TASK_UNMAPPED_BASE (1ull << (48 - 2))
+/* arch/arm64/include/asm/elf.h */
+#define ELF_ET_DYN_BASE TARGET_PAGE_ALIGN((1ull << 48) / 3 * 2)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h
index c90b493711..8edfe2b88c 100644
--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -28,6 +28,9 @@
*/
#define TASK_UNMAPPED_BASE 0x20000000000ull
+/* arch/alpha/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/arm/target_mman.h b/linux-user/arm/target_mman.h
index 76275b2c7e..51005da869 100644
--- a/linux-user/arm/target_mman.h
+++ b/linux-user/arm/target_mman.h
@@ -6,4 +6,7 @@
*/
#define TASK_UNMAPPED_BASE 0x40000000
+/* arch/arm/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0x00400000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/cris/target_mman.h b/linux-user/cris/target_mman.h
index 9df7b1eda5..9ace8ac292 100644
--- a/linux-user/cris/target_mman.h
+++ b/linux-user/cris/target_mman.h
@@ -7,4 +7,7 @@
*/
#define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0xb0000000 / 3)
+/* arch/cris/include/uapi/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/hexagon/target_mman.h b/linux-user/hexagon/target_mman.h
index c5ae336e07..e6b5e2ca36 100644
--- a/linux-user/hexagon/target_mman.h
+++ b/linux-user/hexagon/target_mman.h
@@ -8,4 +8,7 @@
*/
#define TASK_UNMAPPED_BASE 0x40000000
+/* arch/hexagon/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0x08000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h
index 6459e7dbdd..ccda46e842 100644
--- a/linux-user/hppa/target_mman.h
+++ b/linux-user/hppa/target_mman.h
@@ -27,6 +27,9 @@
/* arch/parisc/include/asm/processor.h: DEFAULT_MAP_BASE32 */
#define TASK_UNMAPPED_BASE 0x40000000
+/* arch/parisc/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x01000000)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/i386/target_mman.h b/linux-user/i386/target_mman.h
index cc3382007f..e3b8e1eaa6 100644
--- a/linux-user/i386/target_mman.h
+++ b/linux-user/i386/target_mman.h
@@ -11,4 +11,7 @@
*/
#define TASK_UNMAPPED_BASE 0x40000000
+/* arch/x86/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0x00400000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/loongarch64/target_mman.h b/linux-user/loongarch64/target_mman.h
index d70e44d44c..8c2a3d5596 100644
--- a/linux-user/loongarch64/target_mman.h
+++ b/linux-user/loongarch64/target_mman.h
@@ -6,4 +6,7 @@
#define TASK_UNMAPPED_BASE \
TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+/* arch/loongarch/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/m68k/target_mman.h b/linux-user/m68k/target_mman.h
index d3eceb663b..20cfe750c5 100644
--- a/linux-user/m68k/target_mman.h
+++ b/linux-user/m68k/target_mman.h
@@ -1,4 +1,6 @@
/* arch/m68k/include/asm/processor.h */
#define TASK_UNMAPPED_BASE 0xC0000000
+/* arch/m68k/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0xD0000000
#include "../generic/target_mman.h"
diff --git a/linux-user/microblaze/target_mman.h b/linux-user/microblaze/target_mman.h
index ffee869db4..6b3dd54f89 100644
--- a/linux-user/microblaze/target_mman.h
+++ b/linux-user/microblaze/target_mman.h
@@ -6,4 +6,7 @@
*/
#define TASK_UNMAPPED_BASE 0x48000000
+/* arch/microblaze/include/uapi/asm/elf.h */
+#define ELF_ET_DYN_BASE 0x08000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h
index fe1eec2d0b..b84fe1e8a8 100644
--- a/linux-user/mips/target_mman.h
+++ b/linux-user/mips/target_mman.h
@@ -21,6 +21,9 @@
#define TASK_UNMAPPED_BASE \
TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+/* arch/mips/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/nios2/target_mman.h b/linux-user/nios2/target_mman.h
index ce18f4f871..ab16ad4f03 100644
--- a/linux-user/nios2/target_mman.h
+++ b/linux-user/nios2/target_mman.h
@@ -5,4 +5,7 @@
*/
#define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0x7FFF0000 / 3)
+/* arch/nios2/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0xD0000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/openrisc/target_mman.h b/linux-user/openrisc/target_mman.h
index f1aaad809d..243c1d5f26 100644
--- a/linux-user/openrisc/target_mman.h
+++ b/linux-user/openrisc/target_mman.h
@@ -5,4 +5,7 @@
*/
#define TASK_UNMAPPED_BASE 0x30000000
+/* arch/openrisc/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0x08000000
+
#include "../generic/target_mman.h"
diff --git a/linux-user/ppc/target_mman.h b/linux-user/ppc/target_mman.h
index 04f99c6077..646d1ccae7 100644
--- a/linux-user/ppc/target_mman.h
+++ b/linux-user/ppc/target_mman.h
@@ -17,6 +17,13 @@
#define TASK_UNMAPPED_BASE 0x40000000
#endif
+/* arch/powerpc/include/asm/elf.h */
+#ifdef TARGET_PPC64
+#define ELF_ET_DYN_BASE 0x100000000ull
+#else
+#define ELF_ET_DYN_BASE 0x000400000
+#endif
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/riscv/target_mman.h b/linux-user/riscv/target_mman.h
index 0f06dadbd4..3049bcc67d 100644
--- a/linux-user/riscv/target_mman.h
+++ b/linux-user/riscv/target_mman.h
@@ -5,4 +5,7 @@
#define TASK_UNMAPPED_BASE \
TARGET_PAGE_ALIGN((1ull << (TARGET_VIRT_ADDR_SPACE_BITS - 1)) / 3)
+/* arch/riscv/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/s390x/target_mman.h b/linux-user/s390x/target_mman.h
index 40d149b329..c82435e381 100644
--- a/linux-user/s390x/target_mman.h
+++ b/linux-user/s390x/target_mman.h
@@ -8,4 +8,14 @@
*/
#define TASK_UNMAPPED_BASE (1ull << 41)
+/*
+ * arch/s390/include/asm/elf.h:
+ * ELF_ET_DYN_BASE (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)
+ *
+ * arch/s390/include/asm/processor.h:
+ * STACK_TOP VDSO_LIMIT - VDSO_SIZE - PAGE_SIZE
+ * VDSO_LIMIT _REGION2_SIZE
+ */
+#define ELF_ET_DYN_BASE (((1ull << 42) / 3 * 2) & ~0xffffffffull)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/sh4/target_mman.h b/linux-user/sh4/target_mman.h
index bbbc223398..dd9016081e 100644
--- a/linux-user/sh4/target_mman.h
+++ b/linux-user/sh4/target_mman.h
@@ -2,4 +2,7 @@
#define TASK_UNMAPPED_BASE \
TARGET_PAGE_ALIGN((1u << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+/* arch/sh/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/sparc/target_mman.h b/linux-user/sparc/target_mman.h
index 692ebf9dd7..696ca73fe4 100644
--- a/linux-user/sparc/target_mman.h
+++ b/linux-user/sparc/target_mman.h
@@ -19,6 +19,17 @@
#define TASK_UNMAPPED_BASE (1ull << (TARGET_VIRT_ADDR_SPACE_BITS - 2))
#endif
+/*
+ * arch/sparc/include/asm/elf_64.h
+ * Except that COMPAT_ELF_ET_DYN_BASE exactly matches TASK_UNMAPPED_BASE,
+ * so move it up a bit.
+ */
+#ifdef TARGET_ABI32
+#define ELF_ET_DYN_BASE 0x78000000
+#else
+#define ELF_ET_DYN_BASE 0x0000010000000000ull
+#endif
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index bae49059e0..5dd48a458d 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -20,6 +20,7 @@
extern abi_ulong task_unmapped_base;
extern abi_ulong mmap_next_start;
+extern abi_ulong elf_et_dyn_base;
int target_mprotect(abi_ulong start, abi_ulong len, int prot);
abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
diff --git a/linux-user/x86_64/target_mman.h b/linux-user/x86_64/target_mman.h
index f9ff652b37..48fbf20b42 100644
--- a/linux-user/x86_64/target_mman.h
+++ b/linux-user/x86_64/target_mman.h
@@ -10,4 +10,7 @@
#define TASK_UNMAPPED_BASE \
TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+/* arch/x86/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE * 2)
+
#include "../generic/target_mman.h"
diff --git a/linux-user/xtensa/target_mman.h b/linux-user/xtensa/target_mman.h
index c4f671adb7..8fa6337a97 100644
--- a/linux-user/xtensa/target_mman.h
+++ b/linux-user/xtensa/target_mman.h
@@ -20,6 +20,10 @@
*/
#define TASK_UNMAPPED_BASE (1u << (TARGET_VIRT_ADDR_SPACE_BITS - 1))
+/* arch/xtensa/include/asm/elf.h */
+#define ELF_ET_DYN_BASE \
+ TARGET_PAGE_ALIGN((1u << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
+
#include "../generic/target_mman.h"
#endif
diff --git a/linux-user/main.c b/linux-user/main.c
index be621dc792..96be354897 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -847,6 +847,21 @@ int main(int argc, char **argv, char **envp)
}
mmap_next_start = task_unmapped_base;
+ /* Similarly for elf_et_dyn_base. */
+ if (reserved_va) {
+ if (ELF_ET_DYN_BASE < reserved_va) {
+ elf_et_dyn_base = ELF_ET_DYN_BASE;
+ } else {
+ /* The most common default formula is TASK_SIZE / 3 * 2. */
+ elf_et_dyn_base = TARGET_PAGE_ALIGN(reserved_va / 3) * 2;
+ }
+ } else if (ELF_ET_DYN_BASE < UINTPTR_MAX) {
+ elf_et_dyn_base = ELF_ET_DYN_BASE;
+ } else {
+ /* 32-bit host: pick something medium size. */
+ elf_et_dyn_base = 0x18000000;
+ }
+
#pragma GCC diagnostic pop
{
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 84436d45c8..949c4090f3 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -301,6 +301,7 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
abi_ulong task_unmapped_base;
abi_ulong mmap_next_start;
+abi_ulong elf_et_dyn_base;
/*
* Subroutine of mmap_find_vma, used when we have pre-allocated
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (2 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 03/14] linux-user: Define ELF_ET_DYN_BASE " Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 9:43 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter Richard Henderson
` (10 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Use this as extra protection for the guest mapping over
any qemu host mappings.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..1b4bb2d5af 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
/*
* Reserve address space for all of this.
*
- * In the case of ET_EXEC, we supply MAP_FIXED so that we get
- * exactly the address range that is required.
+ * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
+ * exactly the address range that is required. Without reserved_va,
+ * the guest address space is not isolated. We have attempted to avoid
+ * conflict with the host program itself via probe_guest_base, but using
+ * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
*
* Otherwise this is ET_DYN, and we are searching for a location
* that can hold the memory space required. If the image is
@@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
*/
load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
- (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+ (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
-1, 0);
if (load_addr == -1) {
goto exit_mmap;
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (3 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 9:49 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable Richard Henderson
` (9 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Follow the lead of the linux kernel in fs/binfmt_elf.c,
in which an ET_DYN executable which uses an interpreter
(usually a PIE executable) is loaded away from where the
interpreter itself will be loaded.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1b4bb2d5af..d1b278d799 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3107,6 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd,
}
}
+ load_addr = loaddr;
+
if (pinterp_name != NULL) {
/*
* This is the main executable.
@@ -3136,11 +3138,32 @@ static void load_elf_image(const char *image_name, int image_fd,
*/
probe_guest_base(image_name, loaddr, hiaddr);
} else {
+ abi_ulong align;
+
/*
* The binary is dynamic, but we still need to
* select guest_base. In this case we pass a size.
*/
probe_guest_base(image_name, 0, hiaddr - loaddr);
+
+ /*
+ * Avoid collision with the loader by providing a different
+ * default load address.
+ */
+ load_addr += elf_et_dyn_base;
+
+ /*
+ * TODO: Better support for mmap alignment is desirable.
+ * Since we do not have complete control over the guest
+ * address space, we prefer the kernel to choose some address
+ * rather than force the use of LOAD_ADDR via MAP_FIXED.
+ * But without MAP_FIXED we cannot guarantee alignment,
+ * only suggest it.
+ */
+ align = pow2ceil(info->alignment);
+ if (align) {
+ load_addr &= -align;
+ }
}
}
@@ -3155,13 +3178,13 @@ static void load_elf_image(const char *image_name, int image_fd,
*
* Otherwise this is ET_DYN, and we are searching for a location
* that can hold the memory space required. If the image is
- * pre-linked, LOADDR will be non-zero, and the kernel should
+ * pre-linked, LOAD_ADDR will be non-zero, and the kernel should
* honor that address if it happens to be free.
*
* In both cases, we will overwrite pages in this range with mappings
* from the executable.
*/
- load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
+ load_addr = target_mmap(load_addr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
-1, 0);
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (4 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 10:54 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size Richard Henderson
` (8 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
From: Helge Deller <deller@gmx.de>
While we attempt to load a ET_DYN executable far away from
TASK_UNMAPPED_BASE, we are not completely in control of the
address space layout. If the interpreter lands close to
the executable, leaving insufficient heap space, move brk.
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
[rth: Re-order after ELF_ET_DYN_BASE patch so that we do not
"temporarily break" tsan, and also to minimize the changes required.
Remove image_info.reserve_brk as unused.]
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/qemu.h | 1 -
linux-user/elfload.c | 51 +++++++++++++-------------------------------
2 files changed, 15 insertions(+), 37 deletions(-)
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 2046a23037..4f8b55e2fb 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -30,7 +30,6 @@ struct image_info {
abi_ulong start_data;
abi_ulong end_data;
abi_ulong brk;
- abi_ulong reserve_brk;
abi_ulong start_mmap;
abi_ulong start_stack;
abi_ulong stack_limit;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d1b278d799..3553a3eaef 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3110,27 +3110,6 @@ static void load_elf_image(const char *image_name, int image_fd,
load_addr = loaddr;
if (pinterp_name != NULL) {
- /*
- * This is the main executable.
- *
- * Reserve extra space for brk.
- * We hold on to this space while placing the interpreter
- * and the stack, lest they be placed immediately after
- * the data segment and block allocation from the brk.
- *
- * 16MB is chosen as "large enough" without being so large as
- * to allow the result to not fit with a 32-bit guest on a
- * 32-bit host. However some 64 bit guests (e.g. s390x)
- * attempt to place their heap further ahead and currently
- * nothing stops them smashing into QEMUs address space.
- */
-#if TARGET_LONG_BITS == 64
- info->reserve_brk = 32 * MiB;
-#else
- info->reserve_brk = 16 * MiB;
-#endif
- hiaddr += info->reserve_brk;
-
if (ehdr->e_type == ET_EXEC) {
/*
* Make sure that the low address does not conflict with
@@ -3221,7 +3200,8 @@ static void load_elf_image(const char *image_name, int image_fd,
info->end_code = 0;
info->start_data = -1;
info->end_data = 0;
- info->brk = 0;
+ /* Usual start for brk is after all sections of the main executable. */
+ info->brk = TARGET_PAGE_ALIGN(hiaddr);
info->elf_flags = ehdr->e_flags;
prot_exec = PROT_EXEC;
@@ -3315,9 +3295,6 @@ static void load_elf_image(const char *image_name, int image_fd,
info->end_data = vaddr_ef;
}
}
- if (vaddr_em > info->brk) {
- info->brk = vaddr_em;
- }
#ifdef TARGET_MIPS
} else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
Mips_elf_abiflags_v0 abiflags;
@@ -3646,6 +3623,19 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
if (elf_interpreter) {
load_elf_interp(elf_interpreter, &interp_info, bprm->buf);
+ /*
+ * While unusual because of ELF_ET_DYN_BASE, if we are unlucky
+ * with the mappings the interpreter can be loaded above but
+ * near the main executable, which can leave very little room
+ * for the heap.
+ * If the current brk has less than 16MB, use the end of the
+ * interpreter.
+ */
+ if (interp_info.brk > info->brk &&
+ interp_info.load_bias - info->brk < 16 * MiB) {
+ info->brk = interp_info.brk;
+ }
+
/* If the program interpreter is one of these two, then assume
an iBCS2 image. Otherwise assume a native linux image. */
@@ -3699,17 +3689,6 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
bprm->core_dump = &elf_core_dump;
#endif
- /*
- * If we reserved extra space for brk, release it now.
- * The implementation of do_brk in syscalls.c expects to be able
- * to mmap pages in this space.
- */
- if (info->reserve_brk) {
- abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
- abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + info->reserve_brk);
- target_munmap(start_brk, end_brk - start_brk);
- }
-
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (5 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 10:59 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss " Richard Henderson
` (7 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Remove TARGET_ELF_EXEC_PAGESIZE, and 3 other TARGET_ELF_PAGE* macros
based off of that. Rely on target_mmap to handle guest vs host page
size mismatch.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3553a3eaef..964b21f997 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1960,15 +1960,6 @@ struct exec
#define ZMAGIC 0413
#define QMAGIC 0314
-/* Necessary parameters */
-#define TARGET_ELF_EXEC_PAGESIZE \
- (((eppnt->p_align & ~qemu_host_page_mask) != 0) ? \
- TARGET_PAGE_SIZE : MAX(qemu_host_page_size, TARGET_PAGE_SIZE))
-#define TARGET_ELF_PAGELENGTH(_v) ROUND_UP((_v), TARGET_ELF_EXEC_PAGESIZE)
-#define TARGET_ELF_PAGESTART(_v) ((_v) & \
- ~(abi_ulong)(TARGET_ELF_EXEC_PAGESIZE-1))
-#define TARGET_ELF_PAGEOFFSET(_v) ((_v) & (TARGET_ELF_EXEC_PAGESIZE-1))
-
#define DLINFO_ITEMS 16
static inline void memcpy_fromfs(void * to, const void * from, unsigned long n)
@@ -3241,8 +3232,8 @@ static void load_elf_image(const char *image_name, int image_fd,
}
vaddr = load_bias + eppnt->p_vaddr;
- vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
- vaddr_ps = TARGET_ELF_PAGESTART(vaddr);
+ vaddr_po = vaddr & ~TARGET_PAGE_MASK;
+ vaddr_ps = vaddr & TARGET_PAGE_MASK;
vaddr_ef = vaddr + eppnt->p_filesz;
vaddr_em = vaddr + eppnt->p_memsz;
@@ -3252,7 +3243,7 @@ static void load_elf_image(const char *image_name, int image_fd,
* but no backing file segment.
*/
if (eppnt->p_filesz != 0) {
- vaddr_len = TARGET_ELF_PAGELENGTH(eppnt->p_filesz + vaddr_po);
+ vaddr_len = eppnt->p_filesz + vaddr_po;
error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
MAP_PRIVATE | MAP_FIXED,
image_fd, eppnt->p_offset - vaddr_po);
@@ -3268,7 +3259,7 @@ static void load_elf_image(const char *image_name, int image_fd,
zero_bss(vaddr_ef, vaddr_em, elf_prot);
}
} else if (eppnt->p_memsz != 0) {
- vaddr_len = TARGET_ELF_PAGELENGTH(eppnt->p_memsz + vaddr_po);
+ vaddr_len = eppnt->p_memsz + vaddr_po;
error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
-1, 0);
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss for host page size
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (6 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size Richard Henderson
@ 2023-08-07 16:36 ` Richard Henderson
2023-08-08 11:38 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too Richard Henderson
` (6 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:36 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
Rely on target_mmap to handle guest vs host page size mismatch.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 54 +++++++++++++++++++-------------------------
1 file changed, 23 insertions(+), 31 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 964b21f997..6c28cb70ef 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2213,44 +2213,36 @@ static abi_ulong setup_arg_pages(struct linux_binprm *bprm,
/* Map and zero the bss. We need to explicitly zero any fractional pages
after the data section (i.e. bss). */
-static void zero_bss(abi_ulong elf_bss, abi_ulong last_bss, int prot)
+static void zero_bss(abi_ulong start_bss, abi_ulong end_bss, int prot)
{
- uintptr_t host_start, host_map_start, host_end;
+ abi_ulong align_bss;
- last_bss = TARGET_PAGE_ALIGN(last_bss);
+ align_bss = TARGET_PAGE_ALIGN(start_bss);
+ end_bss = TARGET_PAGE_ALIGN(end_bss);
- /* ??? There is confusion between qemu_real_host_page_size and
- qemu_host_page_size here and elsewhere in target_mmap, which
- may lead to the end of the data section mapping from the file
- not being mapped. At least there was an explicit test and
- comment for that here, suggesting that "the file size must
- be known". The comment probably pre-dates the introduction
- of the fstat system call in target_mmap which does in fact
- find out the size. What isn't clear is if the workaround
- here is still actually needed. For now, continue with it,
- but merge it with the "normal" mmap that would allocate the bss. */
+ if (start_bss < align_bss) {
+ int flags = page_get_flags(start_bss);
- host_start = (uintptr_t) g2h_untagged(elf_bss);
- host_end = (uintptr_t) g2h_untagged(last_bss);
- host_map_start = REAL_HOST_PAGE_ALIGN(host_start);
-
- if (host_map_start < host_end) {
- void *p = mmap((void *)host_map_start, host_end - host_map_start,
- prot, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
- if (p == MAP_FAILED) {
- perror("cannot mmap brk");
- exit(-1);
+ if (!(flags & PAGE_VALID)) {
+ /* Map the start of the bss. */
+ align_bss -= TARGET_PAGE_SIZE;
+ } else if (flags & PAGE_WRITE) {
+ /* The page is already mapped writable. */
+ memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
+ } else {
+ /* Read-only zeros? */
+ g_assert_not_reached();
}
}
- /* Ensure that the bss page(s) are valid */
- if ((page_get_flags(last_bss-1) & prot) != prot) {
- page_set_flags(elf_bss & TARGET_PAGE_MASK, last_bss - 1,
- prot | PAGE_VALID);
- }
-
- if (host_start < host_map_start) {
- memset((void *)host_start, 0, host_map_start - host_start);
+ if (align_bss < end_bss) {
+ abi_long err = target_mmap(align_bss, end_bss - align_bss, prot,
+ MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
+ -1, 0);
+ if (err == -1) {
+ perror("cannot mmap brk");
+ exit(-1);
+ }
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (7 preceding siblings ...)
2023-08-07 16:36 ` [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss " Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-08 11:43 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Richard Henderson
` (5 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki
If p_filesz == 0, then vaddr_ef == vaddr. We can reuse the
code in zero_bss rather than incompletely duplicating it in
load_elf_image.
Tested-by: Helge Deller <deller@gmx.de>
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 27 +++++++--------------------
1 file changed, 7 insertions(+), 20 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 6c28cb70ef..c9e176a9f6 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3210,7 +3210,7 @@ static void load_elf_image(const char *image_name, int image_fd,
for (i = 0; i < ehdr->e_phnum; i++) {
struct elf_phdr *eppnt = phdr + i;
if (eppnt->p_type == PT_LOAD) {
- abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len;
+ abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em;
int elf_prot = 0;
if (eppnt->p_flags & PF_R) {
@@ -3235,30 +3235,17 @@ static void load_elf_image(const char *image_name, int image_fd,
* but no backing file segment.
*/
if (eppnt->p_filesz != 0) {
- vaddr_len = eppnt->p_filesz + vaddr_po;
- error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
- MAP_PRIVATE | MAP_FIXED,
+ error = target_mmap(vaddr_ps, eppnt->p_filesz + vaddr_po,
+ elf_prot, MAP_PRIVATE | MAP_FIXED,
image_fd, eppnt->p_offset - vaddr_po);
-
if (error == -1) {
goto exit_mmap;
}
+ }
- /*
- * If the load segment requests extra zeros (e.g. bss), map it.
- */
- if (eppnt->p_filesz < eppnt->p_memsz) {
- zero_bss(vaddr_ef, vaddr_em, elf_prot);
- }
- } else if (eppnt->p_memsz != 0) {
- vaddr_len = eppnt->p_memsz + vaddr_po;
- error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
- MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
- -1, 0);
-
- if (error == -1) {
- goto exit_mmap;
- }
+ /* If the load segment requests extra zeros (e.g. bss), map it. */
+ if (vaddr_ef < vaddr_em) {
+ zero_bss(vaddr_ef, vaddr_em, elf_prot);
}
/* Find the full program boundaries. */
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (8 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-07 18:17 ` Richard Henderson
2023-08-08 6:15 ` Michael Tokarev
2023-08-07 16:37 ` [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base Richard Henderson
` (4 subsequent siblings)
14 siblings, 2 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
We will want to be able to search the set of mappings.
For this patch, the two users iterate the tree in order.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/selfmap.h | 20 ++++----
linux-user/elfload.c | 14 +++--
linux-user/syscall.c | 15 +++---
util/selfmap.c | 114 +++++++++++++++++++++++++----------------
4 files changed, 96 insertions(+), 67 deletions(-)
diff --git a/include/qemu/selfmap.h b/include/qemu/selfmap.h
index 3479a2a618..aacd6ae0a0 100644
--- a/include/qemu/selfmap.h
+++ b/include/qemu/selfmap.h
@@ -9,9 +9,10 @@
#ifndef SELFMAP_H
#define SELFMAP_H
+#include "qemu/interval-tree.h"
+
typedef struct {
- unsigned long start;
- unsigned long end;
+ IntervalTreeNode itree;
/* flags */
bool is_read;
@@ -19,26 +20,25 @@ typedef struct {
bool is_exec;
bool is_priv;
- unsigned long offset;
- gchar *dev;
+ uint64_t offset;
uint64_t inode;
- gchar *path;
+ const char *path;
+ char dev[];
} MapInfo;
-
/**
* read_self_maps:
*
* Read /proc/self/maps and return a list of MapInfo structures.
*/
-GSList *read_self_maps(void);
+IntervalTreeRoot *read_self_maps(void);
/**
* free_self_maps:
- * @info: a GSlist
+ * @info: an interval tree
*
- * Free a list of MapInfo structures.
+ * Free a tree of MapInfo structures.
*/
-void free_self_maps(GSList *info);
+void free_self_maps(IntervalTreeRoot *root);
#endif /* SELFMAP_H */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index c9e176a9f6..f497286abe 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2622,7 +2622,8 @@ static uintptr_t pgd_find_hole_fallback(uintptr_t guest_size, uintptr_t brk,
static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
long align, uintptr_t offset)
{
- GSList *maps, *iter;
+ IntervalTreeRoot *maps;
+ IntervalTreeNode *iter;
uintptr_t this_start, this_end, next_start, brk;
intptr_t ret = -1;
@@ -2640,12 +2641,15 @@ static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
/* The first hole is before the first map entry. */
this_start = mmap_min_addr;
- for (iter = maps; iter;
- this_start = next_start, iter = g_slist_next(iter)) {
+ for (iter = interval_tree_iter_first(maps, 0, -1);
+ iter;
+ this_start = next_start,
+ iter = interval_tree_iter_next(iter, 0, -1)) {
+ MapInfo *info = container_of(iter, MapInfo, itree);
uintptr_t align_start, hole_size;
- this_end = ((MapInfo *)iter->data)->start;
- next_start = ((MapInfo *)iter->data)->end;
+ this_end = info->itree.start;
+ next_start = info->itree.last + 1;
align_start = ROUND_UP(this_start + offset, align);
/* Skip holes that are too small. */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7c2c2f6e2f..a15bce2be2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8070,16 +8070,17 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
{
CPUState *cpu = env_cpu(cpu_env);
TaskState *ts = cpu->opaque;
- GSList *map_info = read_self_maps();
- GSList *s;
+ IntervalTreeRoot *map_info = read_self_maps();
+ IntervalTreeNode *s;
int count;
- for (s = map_info; s; s = g_slist_next(s)) {
- MapInfo *e = (MapInfo *) s->data;
+ for (s = interval_tree_iter_first(map_info, 0, -1); s;
+ s = interval_tree_iter_next(s, 0, -1)) {
+ MapInfo *e = container_of(s, MapInfo, itree);
- if (h2g_valid(e->start)) {
- unsigned long min = e->start;
- unsigned long max = e->end;
+ if (h2g_valid(e->itree.start)) {
+ unsigned long min = e->itree.start;
+ unsigned long max = e->itree.last + 1;
int flags = page_get_flags(h2g(min));
const char *path;
diff --git a/util/selfmap.c b/util/selfmap.c
index 2c14f019ce..4db5b42651 100644
--- a/util/selfmap.c
+++ b/util/selfmap.c
@@ -10,74 +10,98 @@
#include "qemu/cutils.h"
#include "qemu/selfmap.h"
-GSList *read_self_maps(void)
+IntervalTreeRoot *read_self_maps(void)
{
- gchar *maps;
- GSList *map_info = NULL;
+ IntervalTreeRoot *root;
+ gchar *maps, **lines;
+ guint i, nlines;
- if (g_file_get_contents("/proc/self/maps", &maps, NULL, NULL)) {
- gchar **lines = g_strsplit(maps, "\n", 0);
- int i, entries = g_strv_length(lines);
+ if (!g_file_get_contents("/proc/self/maps", &maps, NULL, NULL)) {
+ return NULL;
+ }
- for (i = 0; i < entries; i++) {
- gchar **fields = g_strsplit(lines[i], " ", 6);
- if (g_strv_length(fields) > 4) {
- MapInfo *e = g_new0(MapInfo, 1);
- int errors = 0;
- const char *end;
+ root = g_new0(IntervalTreeRoot, 1);
+ lines = g_strsplit(maps, "\n", 0);
+ nlines = g_strv_length(lines);
- errors |= qemu_strtoul(fields[0], &end, 16, &e->start);
- errors |= qemu_strtoul(end + 1, NULL, 16, &e->end);
+ for (i = 0; i < nlines; i++) {
+ gchar **fields = g_strsplit(lines[i], " ", 6);
+ guint nfields = g_strv_length(fields);
+
+ if (nfields > 4) {
+ uint64_t start, end, offset, inode;
+ int errors = 0;
+ const char *p;
+
+ errors |= qemu_strtou64(fields[0], &p, 16, &start);
+ errors |= qemu_strtou64(p + 1, NULL, 16, &end);
+ errors |= qemu_strtou64(fields[2], NULL, 16, &offset);
+ errors |= qemu_strtou64(fields[4], NULL, 10, &inode);
+
+ if (!errors) {
+ size_t dev_len, path_len;
+ MapInfo *e;
+
+ dev_len = strlen(fields[3]) + 1;
+ if (nfields == 6) {
+ p = fields[5];
+ p += strspn(p, " ");
+ path_len = strlen(p) + 1;
+ } else {
+ p = NULL;
+ path_len = 0;
+ }
+
+ e = g_malloc0(sizeof(*e) + dev_len + path_len);
+
+ e->itree.start = start;
+ e->itree.last = end - 1;
+ e->offset = offset;
+ e->inode = inode;
e->is_read = fields[1][0] == 'r';
e->is_write = fields[1][1] == 'w';
e->is_exec = fields[1][2] == 'x';
e->is_priv = fields[1][3] == 'p';
- errors |= qemu_strtoul(fields[2], NULL, 16, &e->offset);
- e->dev = g_strdup(fields[3]);
- errors |= qemu_strtou64(fields[4], NULL, 10, &e->inode);
-
- if (!errors) {
- /*
- * The last field may have leading spaces which we
- * need to strip.
- */
- if (g_strv_length(fields) == 6) {
- e->path = g_strdup(g_strchug(fields[5]));
- }
- map_info = g_slist_prepend(map_info, e);
- } else {
- g_free(e->dev);
- g_free(e);
+ memcpy(e->dev, fields[3], dev_len);
+ if (path_len) {
+ e->path = memcpy(e->dev + dev_len, p, path_len);
}
+
+ interval_tree_insert(&e->itree, root);
}
-
- g_strfreev(fields);
}
- g_strfreev(lines);
- g_free(maps);
+ g_strfreev(fields);
}
+ g_strfreev(lines);
+ g_free(maps);
- /* ensure the map data is in the same order we collected it */
- return g_slist_reverse(map_info);
+ return root;
}
/**
* free_self_maps:
- * @info: a GSlist
+ * @root: an interval tree
*
- * Free a list of MapInfo structures.
+ * Free a tree of MapInfo structures.
+ * Since we allocated each MapInfo in one chunk, we need not consider the
+ * contents and can simply free each RBNode.
*/
-static void free_info(gpointer data)
+
+static void free_rbnode(RBNode *n)
{
- MapInfo *e = (MapInfo *) data;
- g_free(e->dev);
- g_free(e->path);
- g_free(e);
+ if (n) {
+ free_rbnode(n->rb_left);
+ free_rbnode(n->rb_right);
+ g_free(n);
+ }
}
-void free_self_maps(GSList *info)
+void free_self_maps(IntervalTreeRoot *root)
{
- g_slist_free_full(info, &free_info);
+ if (root) {
+ free_rbnode(root->rb_root.rb_node);
+ g_free(root);
+ }
}
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (9 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-08 11:45 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base Richard Henderson
` (3 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
The proper logging for probe_guest_base is in the main function.
There is no need to duplicate that in the subroutines.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f497286abe..400af4a4c0 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2562,9 +2562,6 @@ static void pgb_have_guest_base(const char *image_name, abi_ulong guest_loaddr,
if (test != addr) {
pgb_fail_in_use(image_name);
}
- qemu_log_mask(CPU_LOG_PAGE,
- "%s: base @ %p for %" PRIu64 " bytes\n",
- __func__, addr, (uint64_t)guest_hiaddr - guest_loaddr + 1);
}
/**
@@ -2607,9 +2604,6 @@ static uintptr_t pgd_find_hole_fallback(uintptr_t guest_size, uintptr_t brk,
if (mmap_start != MAP_FAILED) {
munmap(mmap_start, guest_size);
if (mmap_start == (void *) align_start) {
- qemu_log_mask(CPU_LOG_PAGE,
- "%s: base @ %p for %" PRIdPTR" bytes\n",
- __func__, mmap_start + offset, guest_size);
return (uintptr_t) mmap_start + offset;
}
}
@@ -2691,13 +2685,6 @@ static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
}
}
free_self_maps(maps);
-
- if (ret != -1) {
- qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %" PRIxPTR
- " for %" PRIuPTR " bytes\n",
- __func__, ret, guest_size);
- }
-
return ret;
}
@@ -2749,9 +2736,6 @@ static void pgb_static(const char *image_name, abi_ulong orig_loaddr,
}
guest_base = addr;
-
- qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %"PRIxPTR" for %" PRIuPTR" bytes\n",
- __func__, addr, hiaddr - loaddr);
}
static void pgb_dynamic(const char *image_name, long align)
@@ -2809,9 +2793,6 @@ static void pgb_reserved_va(const char *image_name, abi_ulong guest_loaddr,
reserved_va + 1, test, strerror(errno));
exit(EXIT_FAILURE);
}
-
- qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %p for %lu bytes\n",
- __func__, addr, reserved_va + 1);
}
void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (10 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-08 11:46 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base Richard Henderson
` (2 subsequent siblings)
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
The three sets of checks are identical, logically.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 50 +++++++++++++++-----------------------------
1 file changed, 17 insertions(+), 33 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 400af4a4c0..484ab7131a 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2527,25 +2527,6 @@ static void pgb_have_guest_base(const char *image_name, abi_ulong guest_loaddr,
exit(EXIT_FAILURE);
}
- /* Sanity check the guest binary. */
- if (reserved_va) {
- if (guest_hiaddr > reserved_va) {
- error_report("%s: requires more than reserved virtual "
- "address space (0x%" PRIx64 " > 0x%lx)",
- image_name, (uint64_t)guest_hiaddr, reserved_va);
- exit(EXIT_FAILURE);
- }
- } else {
-#if HOST_LONG_BITS < TARGET_ABI_BITS
- if ((guest_hiaddr - guest_base) > ~(uintptr_t)0) {
- error_report("%s: requires more virtual address space "
- "than the host can provide (0x%" PRIx64 ")",
- image_name, (uint64_t)guest_hiaddr + 1 - guest_base);
- exit(EXIT_FAILURE);
- }
-#endif
- }
-
/*
* Expand the allocation to the entire reserved_va.
* Exclude the mmap_min_addr hole.
@@ -2696,13 +2677,6 @@ static void pgb_static(const char *image_name, abi_ulong orig_loaddr,
uintptr_t offset = 0;
uintptr_t addr;
- if (hiaddr != orig_hiaddr) {
- error_report("%s: requires virtual address space that the "
- "host cannot provide (0x%" PRIx64 ")",
- image_name, (uint64_t)orig_hiaddr + 1);
- exit(EXIT_FAILURE);
- }
-
loaddr &= -align;
if (HI_COMMPAGE) {
/*
@@ -2768,13 +2742,6 @@ static void pgb_reserved_va(const char *image_name, abi_ulong guest_loaddr,
int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
void *addr, *test;
- if (guest_hiaddr > reserved_va) {
- error_report("%s: requires more than reserved virtual "
- "address space (0x%" PRIx64 " > 0x%lx)",
- image_name, (uint64_t)guest_hiaddr, reserved_va);
- exit(EXIT_FAILURE);
- }
-
/* Widen the "image" to the entire reserved address space. */
pgb_static(image_name, 0, reserved_va, align);
@@ -2801,6 +2768,23 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
/* In order to use host shmat, we must be able to honor SHMLBA. */
uintptr_t align = MAX(SHMLBA, qemu_host_page_size);
+ /* Sanity check the guest binary. */
+ if (reserved_va) {
+ if (guest_hiaddr > reserved_va) {
+ error_report("%s: requires more than reserved virtual "
+ "address space (0x%" PRIx64 " > 0x%lx)",
+ image_name, (uint64_t)guest_hiaddr, reserved_va);
+ exit(EXIT_FAILURE);
+ }
+ } else {
+ if (guest_hiaddr != (uintptr_t)guest_hiaddr) {
+ error_report("%s: requires more virtual address space "
+ "than the host can provide (0x%" PRIx64 ")",
+ image_name, (uint64_t)guest_hiaddr + 1);
+ exit(EXIT_FAILURE);
+ }
+ }
+
if (have_guest_base) {
pgb_have_guest_base(image_name, guest_loaddr, guest_hiaddr, align);
} else if (reserved_va) {
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (11 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-08 16:39 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base Richard Henderson
2023-08-08 17:00 ` [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Alex Bennée
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
Create a set of subroutines to collect a set of guest addresses,
all of which must be mappable on the host. Use this within the
renamed pgb_fixed subroutine to validate the user's choice of
guest_base specified by the -B command-line option.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 189 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 162 insertions(+), 27 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 484ab7131a..33c74be3af 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2506,6 +2506,158 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
#endif
#endif
+/**
+ * pgb_try_mmap:
+ * @addr: host start address
+ * @addr_last: host last address
+ * @keep: do not unmap the probe region
+ *
+ * Return 1 if [@addr, @addr_last] is not mapped in the host,
+ * return 0 if it is not available to map, and -1 on mmap error.
+ * If @keep, the region is left mapped on success, otherwise unmapped.
+ */
+static int pgb_try_mmap(uintptr_t addr, uintptr_t addr_last, bool keep)
+{
+ size_t size = addr_last - addr + 1;
+ void *p = mmap((void *)addr, size, PROT_NONE,
+ MAP_ANONYMOUS | MAP_PRIVATE |
+ MAP_NORESERVE | MAP_FIXED_NOREPLACE, -1, 0);
+ int ret;
+
+ if (p == MAP_FAILED) {
+ return errno == EEXIST ? 0 : -1;
+ }
+ ret = p == (void *)addr;
+ if (!keep || !ret) {
+ munmap(p, size);
+ }
+ return ret;
+}
+
+/**
+ * pgb_try_mmap_skip_brk(uintptr_t addr, uintptr_t size, uintptr_t brk)
+ * @addr: host address
+ * @size: size
+ * @brk: host brk
+ *
+ * Like pgb_try_mmap, but additionally reserve some memory following brk.
+ */
+static int pgb_try_mmap_skip_brk(uintptr_t addr, uintptr_t addr_last,
+ uintptr_t brk, bool keep)
+{
+ uintptr_t brk_last = brk + 16 * MiB - 1;
+
+ /* Do not map anything close to the host brk. */
+ if (addr <= brk_last && brk <= addr_last) {
+ return 0;
+ }
+ return pgb_try_mmap(addr, addr_last, keep);
+}
+
+/**
+ * pgb_try_mmap_set:
+ * @ga: set of guest addrs
+ * @base: guest_base
+ * @brk: host brk
+ *
+ * Return true if all @ga can be mapped by the host at @base.
+ * On success, retain the mapping at index 0 for reserved_va.
+ */
+
+typedef struct PGBAddrs {
+ uintptr_t bounds[3][2]; /* start/last pairs */
+ int nbounds;
+ bool with_null_page;
+} PGBAddrs;
+
+static bool pgb_try_mmap_set(const PGBAddrs *ga, uintptr_t base, uintptr_t brk)
+{
+ for (int i = ga->nbounds - 1; i >= 0; --i) {
+ if (pgb_try_mmap_skip_brk(ga->bounds[i][0] + base,
+ ga->bounds[i][1] + base,
+ brk, i == 0 && reserved_va) <= 0) {
+ return false;
+ }
+ }
+ return true;
+}
+
+/**
+ * pgb_addr_set:
+ * @ga: output set of guest addrs
+ * @guest_loaddr: guest image low address
+ * @guest_loaddr: guest image high address
+ * @identity: create for identity mapping
+ *
+ * Fill in @ga with the image, COMMPAGE and NULL page.
+ */
+static bool pgb_addr_set(PGBAddrs *ga, abi_ulong guest_loaddr,
+ abi_ulong guest_hiaddr, bool try_identity)
+{
+ int n;
+
+ /*
+ * With a low commpage, or a guest mapped very low,
+ * we may not be able to use the identity map.
+ */
+ if (try_identity) {
+ if (LO_COMMPAGE != -1 && LO_COMMPAGE < mmap_min_addr) {
+ return false;
+ }
+ if (guest_loaddr != 0 && guest_loaddr < mmap_min_addr) {
+ return false;
+ }
+ }
+
+ memset(ga, 0, sizeof(*ga));
+ n = 0;
+
+ if (reserved_va) {
+ ga->bounds[n][0] = try_identity ? mmap_min_addr : 0;
+ ga->bounds[n][1] = reserved_va;
+ n++;
+ /* LO_COMMPAGE and NULL handled by reserving from 0. */
+ } else {
+ /* Add any LO_COMMPAGE or NULL page. */
+ if (LO_COMMPAGE != -1) {
+ ga->bounds[n][0] = 0;
+ ga->bounds[n][1] = LO_COMMPAGE + TARGET_PAGE_SIZE - 1;
+ n++;
+ } else if (!try_identity) {
+ ga->bounds[n][0] = 0;
+ ga->bounds[n][1] = TARGET_PAGE_SIZE - 1;
+ n++;
+ }
+
+ /* Add the guest image for ET_EXEC. */
+ if (guest_loaddr) {
+ ga->bounds[n][0] = guest_loaddr;
+ ga->bounds[n][1] = guest_hiaddr;
+ n++;
+ }
+ }
+
+ /*
+ * Temporarily disable
+ * "comparison is always false due to limited range of data type"
+ * due to comparison between unsigned and (possible) 0.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtype-limits"
+
+ /* Add any HI_COMMPAGE not covered by reserved_va. */
+ if (reserved_va < HI_COMMPAGE) {
+ ga->bounds[n][0] = HI_COMMPAGE & qemu_host_page_mask;
+ ga->bounds[n][1] = HI_COMMPAGE + TARGET_PAGE_SIZE - 1;
+ n++;
+ }
+
+#pragma GCC diagnostic pop
+
+ ga->nbounds = n;
+ return true;
+}
+
static void pgb_fail_in_use(const char *image_name)
{
error_report("%s: requires virtual address space that is in use "
@@ -2514,33 +2666,21 @@ static void pgb_fail_in_use(const char *image_name)
exit(EXIT_FAILURE);
}
-static void pgb_have_guest_base(const char *image_name, abi_ulong guest_loaddr,
- abi_ulong guest_hiaddr, long align)
+static void pgb_fixed(const char *image_name, uintptr_t guest_loaddr,
+ uintptr_t guest_hiaddr, uintptr_t align)
{
- const int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
- void *addr, *test;
+ PGBAddrs ga;
+ uintptr_t brk = (uintptr_t)sbrk(0);
if (!QEMU_IS_ALIGNED(guest_base, align)) {
fprintf(stderr, "Requested guest base %p does not satisfy "
- "host minimum alignment (0x%lx)\n",
+ "host minimum alignment (0x%" PRIxPTR ")\n",
(void *)guest_base, align);
exit(EXIT_FAILURE);
}
- /*
- * Expand the allocation to the entire reserved_va.
- * Exclude the mmap_min_addr hole.
- */
- if (reserved_va) {
- guest_loaddr = (guest_base >= mmap_min_addr ? 0
- : mmap_min_addr - guest_base);
- guest_hiaddr = reserved_va;
- }
-
- /* Reserve the address space for the binary, or reserved_va. */
- test = g2h_untagged(guest_loaddr);
- addr = mmap(test, guest_hiaddr - guest_loaddr + 1, PROT_NONE, flags, -1, 0);
- if (test != addr) {
+ if (!pgb_addr_set(&ga, guest_loaddr, guest_hiaddr, !guest_base)
+ || !pgb_try_mmap_set(&ga, guest_base, brk)) {
pgb_fail_in_use(image_name);
}
}
@@ -2786,7 +2926,7 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
}
if (have_guest_base) {
- pgb_have_guest_base(image_name, guest_loaddr, guest_hiaddr, align);
+ pgb_fixed(image_name, guest_loaddr, guest_hiaddr, align);
} else if (reserved_va) {
pgb_reserved_va(image_name, guest_loaddr, guest_hiaddr, align);
} else if (guest_loaddr) {
@@ -2797,13 +2937,8 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
/* Reserve and initialize the commpage. */
if (!init_guest_commpage()) {
- /*
- * With have_guest_base, the user has selected the address and
- * we are trying to work with that. Otherwise, we have selected
- * free space and init_guest_commpage must succeeded.
- */
- assert(have_guest_base);
- pgb_fail_in_use(image_name);
+ /* We have already probed for the commpage being free. */
+ g_assert_not_reached();
}
assert(QEMU_IS_ALIGNED(guest_base, align));
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (12 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base Richard Henderson
@ 2023-08-07 16:37 ` Richard Henderson
2023-08-08 16:58 ` Alex Bennée
2023-08-08 17:00 ` [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Alex Bennée
14 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 16:37 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller
Use pgb_addr_set to probe for all of the guest addresses,
not just the main executable. Handle the identity map
specially and separately from the search.
If /proc/self/maps is available, utilize the full power
of the interval tree search, rather than a linear search
through the address list.
If /proc/self/maps is not available, increase the skip
between probes so that we do not probe every single page
of the host address space. Choose 1 MiB for 32-bit hosts
(max 4k probes) and 1 GiB for 64-bit hosts (possibly a
large number of probes, but the large step makes it more
likely to find empty space quicker).
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 311 ++++++++++++++++---------------------------
1 file changed, 115 insertions(+), 196 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 33c74be3af..ffea900308 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2686,220 +2686,143 @@ static void pgb_fixed(const char *image_name, uintptr_t guest_loaddr,
}
/**
- * pgd_find_hole_fallback: potential mmap address
- * @guest_size: size of available space
- * @brk: location of break
- * @align: memory alignment
+ * pgb_find_fallback:
*
- * This is a fallback method for finding a hole in the host address
- * space if we don't have the benefit of being able to access
- * /proc/self/map. It can potentially take a very long time as we can
- * only dumbly iterate up the host address space seeing if the
- * allocation would work.
+ * This is a fallback method for finding holes in the host address space
+ * if we don't have the benefit of being able to access /proc/self/map.
+ * It can potentially take a very long time as we can only dumbly iterate
+ * up the host address space seeing if the allocation would work.
*/
-static uintptr_t pgd_find_hole_fallback(uintptr_t guest_size, uintptr_t brk,
- long align, uintptr_t offset)
+static uintptr_t pgb_find_fallback(const PGBAddrs *ga, uintptr_t align,
+ uintptr_t brk)
{
- uintptr_t base;
+ /* TODO: come up with a better estimate of how much to skip. */
+ uintptr_t skip = sizeof(uintptr_t) == 4 ? MiB : GiB;
- /* Start (aligned) at the bottom and work our way up */
- base = ROUND_UP(mmap_min_addr, align);
-
- while (true) {
- uintptr_t align_start, end;
- align_start = ROUND_UP(base, align);
- end = align_start + guest_size + offset;
-
- /* if brk is anywhere in the range give ourselves some room to grow. */
- if (align_start <= brk && brk < end) {
- base = brk + (16 * MiB);
- continue;
- } else if (align_start + guest_size < align_start) {
- /* we have run out of space */
+ for (uintptr_t base = skip; ; base += skip) {
+ base = ROUND_UP(base, align);
+ if (pgb_try_mmap_set(ga, base, brk)) {
+ return base;
+ }
+ if (base >= -skip) {
return -1;
- } else {
- int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE |
- MAP_FIXED_NOREPLACE;
- void * mmap_start = mmap((void *) align_start, guest_size,
- PROT_NONE, flags, -1, 0);
- if (mmap_start != MAP_FAILED) {
- munmap(mmap_start, guest_size);
- if (mmap_start == (void *) align_start) {
- return (uintptr_t) mmap_start + offset;
- }
- }
- base += qemu_host_page_size;
}
}
}
-/* Return value for guest_base, or -1 if no hole found. */
-static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
- long align, uintptr_t offset)
+static uintptr_t pgb_try_itree(const PGBAddrs *ga, uintptr_t base,
+ IntervalTreeRoot *root)
{
- IntervalTreeRoot *maps;
- IntervalTreeNode *iter;
- uintptr_t this_start, this_end, next_start, brk;
- intptr_t ret = -1;
+ for (int i = ga->nbounds - 1; i >= 0; --i) {
+ uintptr_t s = base + ga->bounds[i][0];
+ uintptr_t l = base + ga->bounds[i][1];
+ IntervalTreeNode *n;
+
+ if (l < s) {
+ /* Wraparound. Skip to advance S to 0. */
+ return -s;
+ }
+
+ n = interval_tree_iter_first(root, s, l);
+ if (n != NULL) {
+ /* Conflict. Skip to advance S to LAST + 1. */
+ return n->last - s + 1;
+ }
+ }
+ return 0; /* success */
+}
+
+static uintptr_t pgb_find_itree(const PGBAddrs *ga, IntervalTreeRoot *root,
+ uintptr_t align, uintptr_t brk)
+{
+ uintptr_t last = mmap_min_addr;
+ uintptr_t base, skip;
+
+ while (true) {
+ base = ROUND_UP(last, align);
+ if (base < last) {
+ return -1;
+ }
+
+ skip = pgb_try_itree(ga, base, root);
+ if (skip == 0) {
+ break;
+ }
+
+ last = base + skip;
+ if (last < base) {
+ return -1;
+ }
+ }
+
+ /*
+ * We've chosen 'base' based on holes in the interval tree,
+ * but we don't yet know if it is a valid host address.
+ * Because it is the first matching hole, if the host addresses
+ * are invalid we know there are no further matches.
+ */
+ return pgb_try_mmap_set(ga, base, brk) ? base : -1;
+}
+
+static void pgb_dynamic(const char *image_name, uintptr_t guest_loaddr,
+ uintptr_t guest_hiaddr, uintptr_t align)
+{
+ IntervalTreeRoot *root;
+ uintptr_t brk, ret;
+ PGBAddrs ga;
assert(QEMU_IS_ALIGNED(guest_loaddr, align));
- maps = read_self_maps();
+ /* Try the identity map first. */
+ if (pgb_addr_set(&ga, guest_loaddr, guest_hiaddr, true)) {
+ brk = (uintptr_t)sbrk(0);
+ if (pgb_try_mmap_set(&ga, 0, brk)) {
+ guest_base = 0;
+ return;
+ }
+ }
+
+ /*
+ * Rebuild the address set for non-identity map.
+ * This differs in the mapping of the guest NULL page.
+ */
+ pgb_addr_set(&ga, guest_loaddr, guest_hiaddr, false);
+
+ root = read_self_maps();
/* Read brk after we've read the maps, which will malloc. */
brk = (uintptr_t)sbrk(0);
- if (!maps) {
- return pgd_find_hole_fallback(guest_size, brk, align, offset);
- }
-
- /* The first hole is before the first map entry. */
- this_start = mmap_min_addr;
-
- for (iter = interval_tree_iter_first(maps, 0, -1);
- iter;
- this_start = next_start,
- iter = interval_tree_iter_next(iter, 0, -1)) {
- MapInfo *info = container_of(iter, MapInfo, itree);
- uintptr_t align_start, hole_size;
-
- this_end = info->itree.start;
- next_start = info->itree.last + 1;
- align_start = ROUND_UP(this_start + offset, align);
-
- /* Skip holes that are too small. */
- if (align_start >= this_end) {
- continue;
- }
- hole_size = this_end - align_start;
- if (hole_size < guest_size) {
- continue;
- }
-
- /* If this hole contains brk, give ourselves some room to grow. */
- if (this_start <= brk && brk < this_end) {
- hole_size -= guest_size;
- if (sizeof(uintptr_t) == 8 && hole_size >= 1 * GiB) {
- align_start += 1 * GiB;
- } else if (hole_size >= 16 * MiB) {
- align_start += 16 * MiB;
- } else {
- align_start = (this_end - guest_size) & -align;
- if (align_start < this_start) {
- continue;
- }
- }
- }
-
- /* Record the lowest successful match. */
- if (ret < 0) {
- ret = align_start;
- }
- /* If this hole contains the identity map, select it. */
- if (align_start <= guest_loaddr &&
- guest_loaddr + guest_size <= this_end) {
- ret = 0;
- }
- /* If this hole ends above the identity map, stop looking. */
- if (this_end >= guest_loaddr) {
- break;
- }
- }
- free_self_maps(maps);
- return ret;
-}
-
-static void pgb_static(const char *image_name, abi_ulong orig_loaddr,
- abi_ulong orig_hiaddr, long align)
-{
- uintptr_t loaddr = orig_loaddr;
- uintptr_t hiaddr = orig_hiaddr;
- uintptr_t offset = 0;
- uintptr_t addr;
-
- loaddr &= -align;
- if (HI_COMMPAGE) {
+ if (!root) {
+ ret = pgb_find_fallback(&ga, align, brk);
+ } else {
/*
- * Extend the allocation to include the commpage.
- * For a 64-bit host, this is just 4GiB; for a 32-bit host we
- * need to ensure there is space bellow the guest_base so we
- * can map the commpage in the place needed when the address
- * arithmetic wraps around.
+ * Reserve the area close to the host brk.
+ * This will be freed with the rest of the tree.
*/
- if (sizeof(uintptr_t) == 8 || loaddr >= 0x80000000u) {
- hiaddr = UINT32_MAX;
- } else {
- offset = -(HI_COMMPAGE & -align);
- }
- } else if (LO_COMMPAGE != -1) {
- loaddr = MIN(loaddr, LO_COMMPAGE & -align);
+ IntervalTreeNode *b = g_new0(IntervalTreeNode, 1);
+ b->start = brk;
+ b->last = brk + 16 * MiB - 1;
+ interval_tree_insert(b, root);
+
+ ret = pgb_find_itree(&ga, root, align, brk);
+ free_self_maps(root);
}
- addr = pgb_find_hole(loaddr, hiaddr - loaddr + 1, align, offset);
- if (addr == -1) {
- /*
- * If HI_COMMPAGE, there *might* be a non-consecutive allocation
- * that can satisfy both. But as the normal arm32 link base address
- * is ~32k, and we extend down to include the commpage, making the
- * overhead only ~96k, this is unlikely.
- */
- error_report("%s: Unable to allocate %#zx bytes of "
- "virtual address space", image_name,
- (size_t)(hiaddr - loaddr));
- exit(EXIT_FAILURE);
- }
-
- guest_base = addr;
-}
-
-static void pgb_dynamic(const char *image_name, long align)
-{
- /*
- * The executable is dynamic and does not require a fixed address.
- * All we need is a commpage that satisfies align.
- * If we do not need a commpage, leave guest_base == 0.
- */
- if (HI_COMMPAGE) {
- uintptr_t addr, commpage;
-
- /* 64-bit hosts should have used reserved_va. */
- assert(sizeof(uintptr_t) == 4);
-
- /*
- * By putting the commpage at the first hole, that puts guest_base
- * just above that, and maximises the positive guest addresses.
- */
- commpage = HI_COMMPAGE & -align;
- addr = pgb_find_hole(commpage, -commpage, align, 0);
- assert(addr != -1);
- guest_base = addr;
- }
-}
-
-static void pgb_reserved_va(const char *image_name, abi_ulong guest_loaddr,
- abi_ulong guest_hiaddr, long align)
-{
- int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
- void *addr, *test;
-
- /* Widen the "image" to the entire reserved address space. */
- pgb_static(image_name, 0, reserved_va, align);
-
- /* osdep.h defines this as 0 if it's missing */
- flags |= MAP_FIXED_NOREPLACE;
-
- /* Reserve the memory on the host. */
- assert(guest_base != 0);
- test = g2h_untagged(0);
- addr = mmap(test, reserved_va + 1, PROT_NONE, flags, -1, 0);
- if (addr == MAP_FAILED || addr != test) {
- error_report("Unable to reserve 0x%lx bytes of virtual address "
- "space at %p (%s) for use as guest address space (check your "
- "virtual memory ulimit setting, mmap_min_addr or reserve less "
- "using qemu-user's -R option)",
- reserved_va + 1, test, strerror(errno));
+ if (ret == -1) {
+ int w = TARGET_LONG_BITS / 4;
+
+ error_report("%s: Unable to find a guest_base to satisfy all "
+ "guest address mapping requirements", image_name);
+
+ for (int i = 0; i < ga.nbounds; ++i) {
+ error_printf(" %0*" PRIx64 "-%0*" PRIx64 "\n",
+ w, (uint64_t)ga.bounds[i][0],
+ w, (uint64_t)ga.bounds[i][1]);
+ }
exit(EXIT_FAILURE);
}
+ guest_base = ret;
}
void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
@@ -2927,12 +2850,8 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
if (have_guest_base) {
pgb_fixed(image_name, guest_loaddr, guest_hiaddr, align);
- } else if (reserved_va) {
- pgb_reserved_va(image_name, guest_loaddr, guest_hiaddr, align);
- } else if (guest_loaddr) {
- pgb_static(image_name, guest_loaddr, guest_hiaddr, align);
} else {
- pgb_dynamic(image_name, align);
+ pgb_dynamic(image_name, guest_loaddr, guest_hiaddr, align);
}
/* Reserve and initialize the commpage. */
--
2.34.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h
2023-08-07 16:37 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Richard Henderson
@ 2023-08-07 18:17 ` Richard Henderson
2023-08-09 15:11 ` Fix interval_tree_iter_first() to check root node value Helge Deller
2023-08-10 21:31 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Ilya Leoshkevich
2023-08-08 6:15 ` Michael Tokarev
1 sibling, 2 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-07 18:17 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, philmd, laurent, deller, Ilya Leoshkevich
On 8/7/23 09:37, Richard Henderson wrote:
> We will want to be able to search the set of mappings.
> For this patch, the two users iterate the tree in order.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/selfmap.h | 20 ++++----
> linux-user/elfload.c | 14 +++--
> linux-user/syscall.c | 15 +++---
> util/selfmap.c | 114 +++++++++++++++++++++++++----------------
> 4 files changed, 96 insertions(+), 67 deletions(-)
I should note that, for 8.2, this will enable a rewrite of open_self_maps_1 so that it
does not require page-by-page checking of page_get_flags.
My idea is that open_self_maps_1 would use walk_memory_regions to see all guest memory
regions. The per-region callback would cross-check with the host-region interval tree to
find the dev+inode+path.
Cc Ilya and Helge, since there are two outstanding changes to open_self_maps.
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h
2023-08-07 16:37 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Richard Henderson
2023-08-07 18:17 ` Richard Henderson
@ 2023-08-08 6:15 ` Michael Tokarev
1 sibling, 0 replies; 45+ messages in thread
From: Michael Tokarev @ 2023-08-08 6:15 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, philmd, laurent, deller
07.08.2023 19:37, Richard Henderson wrote:
> We will want to be able to search the set of mappings.
> For this patch, the two users iterate the tree in order.
> diff --git a/include/qemu/selfmap.h b/include/qemu/selfmap.h
> /**
> * read_self_maps:
> *
> * Read /proc/self/maps and return a list of MapInfo structures.
Nitpick: this comment still says "a list", while in all other places
it has been changed to "a tree".
> */
> -GSList *read_self_maps(void);
> +IntervalTreeRoot *read_self_maps(void);
/mjt
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
@ 2023-08-08 9:10 ` Alex Bennée
2023-08-08 15:16 ` Richard Henderson
2023-08-08 15:35 ` Helge Deller
1 sibling, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 9:10 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Ensure that the chosen values for mmap_next_start and
> task_unmapped_base are within the guest address space.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> linux-user/user-mmap.h | 18 +++++++++++++++++-
> linux-user/main.c | 28 ++++++++++++++++++++++++++++
> linux-user/mmap.c | 18 +++---------------
> 3 files changed, 48 insertions(+), 16 deletions(-)
>
> diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
> index 7265c2c116..fd456e024e 100644
> --- a/linux-user/user-mmap.h
> +++ b/linux-user/user-mmap.h
> @@ -18,6 +18,23 @@
> #ifndef LINUX_USER_USER_MMAP_H
> #define LINUX_USER_USER_MMAP_H
>
> +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> +#ifdef TARGET_AARCH64
> +# define TASK_UNMAPPED_BASE 0x5500000000
> +#else
> +# define TASK_UNMAPPED_BASE (1ul << 38)
> +#endif
> +#else
> +#ifdef TARGET_HPPA
> +# define TASK_UNMAPPED_BASE 0xfa000000
> +#else
> +# define TASK_UNMAPPED_BASE 0x40000000
> +#endif
> +#endif
> +
> +extern abi_ulong task_unmapped_base;
> +extern abi_ulong mmap_next_start;
> +
> int target_mprotect(abi_ulong start, abi_ulong len, int prot);
> abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
> int flags, int fd, off_t offset);
> @@ -26,7 +43,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
> abi_ulong new_size, unsigned long flags,
> abi_ulong new_addr);
> abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice);
> -extern abi_ulong mmap_next_start;
> abi_ulong mmap_find_vma(abi_ulong, abi_ulong, abi_ulong);
> void mmap_fork_start(void);
> void mmap_fork_end(int child);
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 556956c363..be621dc792 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -821,6 +821,34 @@ int main(int argc, char **argv, char **envp)
> reserved_va = max_reserved_va;
> }
>
> + /*
> + * Temporarily disable
> + * "comparison is always false due to limited range of data type"
> + * due to comparison between (possible) uint64_t and uintptr_t.
> + */
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wtype-limits"
> +
> + /*
> + * Select an initial value for task_unmapped_base that is in range.
> + */
> + if (reserved_va) {
> + if (TASK_UNMAPPED_BASE < reserved_va) {
> + task_unmapped_base = TASK_UNMAPPED_BASE;
> + } else {
> + /* The most common default formula is TASK_SIZE / 3. */
> + task_unmapped_base = TARGET_PAGE_ALIGN(reserved_va / 3);
> + }
> + } else if (TASK_UNMAPPED_BASE < UINTPTR_MAX) {
> + task_unmapped_base = TASK_UNMAPPED_BASE;
> + } else {
> + /* 32-bit host: pick something medium size. */
> + task_unmapped_base = 0x10000000;
> + }
> + mmap_next_start = task_unmapped_base;
> +
> +#pragma GCC diagnostic pop
> +
> {
> Error *err = NULL;
> if (seed_optarg != NULL) {
> diff --git a/linux-user/mmap.c b/linux-user/mmap.c
> index eb04fab8ab..84436d45c8 100644
> --- a/linux-user/mmap.c
> +++ b/linux-user/mmap.c
> @@ -299,20 +299,8 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
> return true;
> }
>
> -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
> -#ifdef TARGET_AARCH64
> -# define TASK_UNMAPPED_BASE 0x5500000000
> -#else
> -# define TASK_UNMAPPED_BASE (1ul << 38)
> -#endif
> -#else
> -#ifdef TARGET_HPPA
> -# define TASK_UNMAPPED_BASE 0xfa000000
> -#else
> -# define TASK_UNMAPPED_BASE 0x40000000
> -#endif
> -#endif
> -abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
> +abi_ulong task_unmapped_base;
> +abi_ulong mmap_next_start;
I feel we could help ourselves a bit more by documenting these globals
and what they mean:
task_unmapped_base represents the start of unmapped memory in the
guests programs address space. It is generally a function of the size
of the address space and it defined at the start of execution.
mmap_next_start is the base address for the next anonymous mmap and is
increased after each successful map, starting at task_unmapped_base.
One thing I'm slightly confused by is the ELF_ET_DYN_BASE can be above
this (or sometimes the same). Should the mapping of ELF segments be
handled with mmap_next_start? I assume once mmap_next_start meets the
mappings for the ELF segments we skip over until we get to more free
space after the program code?
>
> /*
> * Subroutine of mmap_find_vma, used when we have pre-allocated
> @@ -391,7 +379,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
>
> if ((addr & (align - 1)) == 0) {
> /* Success. */
> - if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
> + if (start == mmap_next_start && addr >= task_unmapped_base) {
> mmap_next_start = addr + size;
> }
> return addr;
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
2023-08-07 16:36 ` [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h Richard Henderson
@ 2023-08-08 9:19 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 9:19 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Provide default values that are as close as possible to the
> values used by the guest's kernel.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-07 16:36 ` [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap Richard Henderson
@ 2023-08-08 9:43 ` Alex Bennée
2023-08-08 11:57 ` Akihiko Odaki
0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 9:43 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Use this as extra protection for the guest mapping over
> any qemu host mappings.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> linux-user/elfload.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 36e4026f05..1b4bb2d5af 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
> /*
> * Reserve address space for all of this.
> *
> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
> - * exactly the address range that is required.
> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
> + * exactly the address range that is required. Without reserved_va,
> + * the guest address space is not isolated. We have attempted to avoid
> + * conflict with the host program itself via probe_guest_base, but using
> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
> *
> * Otherwise this is ET_DYN, and we are searching for a location
> * that can hold the memory space required. If the image is
> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
> */
> load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> - (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
> + (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
> -1, 0);
We should probably also check the result == load_addr for the places
where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0
#endif
See 2667e069e7 (linux-user: don't use MAP_FIXED in pgd_find_hole_fallback)
> if (load_addr == -1) {
> goto exit_mmap;
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
2023-08-07 16:36 ` [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter Richard Henderson
@ 2023-08-08 9:49 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 9:49 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Follow the lead of the linux kernel in fs/binfmt_elf.c,
> in which an ET_DYN executable which uses an interpreter
> (usually a PIE executable) is loaded away from where the
> interpreter itself will be loaded.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
<snip>
> @@ -3155,13 +3178,13 @@ static void load_elf_image(const char *image_name, int image_fd,
> *
> * Otherwise this is ET_DYN, and we are searching for a location
> * that can hold the memory space required. If the image is
> - * pre-linked, LOADDR will be non-zero, and the kernel should
> + * pre-linked, LOAD_ADDR will be non-zero, and the kernel should
> * honor that address if it happens to be free.
> *
> * In both cases, we will overwrite pages in this range with mappings
> * from the executable.
> */
> - load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
> + load_addr = target_mmap(load_addr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
> -1, 0);
See previous comment about verifying address.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable
2023-08-07 16:36 ` [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable Richard Henderson
@ 2023-08-08 10:54 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 10:54 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> From: Helge Deller <deller@gmx.de>
>
> While we attempt to load a ET_DYN executable far away from
> TASK_UNMAPPED_BASE, we are not completely in control of the
> address space layout. If the interpreter lands close to
> the executable, leaving insufficient heap space, move brk.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Helge Deller <deller@gmx.de>
> [rth: Re-order after ELF_ET_DYN_BASE patch so that we do not
> "temporarily break" tsan, and also to minimize the changes required.
> Remove image_info.reserve_brk as unused.]
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size
2023-08-07 16:36 ` [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size Richard Henderson
@ 2023-08-08 10:59 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 10:59 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Remove TARGET_ELF_EXEC_PAGESIZE, and 3 other TARGET_ELF_PAGE* macros
> based off of that. Rely on target_mmap to handle guest vs host page
> size mismatch.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss for host page size
2023-08-07 16:36 ` [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss " Richard Henderson
@ 2023-08-08 11:38 ` Alex Bennée
2023-08-08 15:56 ` Richard Henderson
0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 11:38 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Rely on target_mmap to handle guest vs host page size mismatch.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> linux-user/elfload.c | 54 +++++++++++++++++++-------------------------
> 1 file changed, 23 insertions(+), 31 deletions(-)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 964b21f997..6c28cb70ef 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -2213,44 +2213,36 @@ static abi_ulong setup_arg_pages(struct linux_binprm *bprm,
>
> /* Map and zero the bss. We need to explicitly zero any fractional pages
> after the data section (i.e. bss). */
> -static void zero_bss(abi_ulong elf_bss, abi_ulong last_bss, int prot)
> +static void zero_bss(abi_ulong start_bss, abi_ulong end_bss, int prot)
> {
> - uintptr_t host_start, host_map_start, host_end;
> + abi_ulong align_bss;
>
> - last_bss = TARGET_PAGE_ALIGN(last_bss);
> + align_bss = TARGET_PAGE_ALIGN(start_bss);
> + end_bss = TARGET_PAGE_ALIGN(end_bss);
>
> - /* ??? There is confusion between qemu_real_host_page_size and
> - qemu_host_page_size here and elsewhere in target_mmap, which
> - may lead to the end of the data section mapping from the file
> - not being mapped. At least there was an explicit test and
> - comment for that here, suggesting that "the file size must
> - be known". The comment probably pre-dates the introduction
> - of the fstat system call in target_mmap which does in fact
> - find out the size. What isn't clear is if the workaround
> - here is still actually needed. For now, continue with it,
> - but merge it with the "normal" mmap that would allocate the bss. */
> + if (start_bss < align_bss) {
> + int flags = page_get_flags(start_bss);
>
> - host_start = (uintptr_t) g2h_untagged(elf_bss);
> - host_end = (uintptr_t) g2h_untagged(last_bss);
> - host_map_start = REAL_HOST_PAGE_ALIGN(host_start);
> -
> - if (host_map_start < host_end) {
> - void *p = mmap((void *)host_map_start, host_end - host_map_start,
> - prot, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> - if (p == MAP_FAILED) {
> - perror("cannot mmap brk");
> - exit(-1);
> + if (!(flags & PAGE_VALID)) {
> + /* Map the start of the bss. */
> + align_bss -= TARGET_PAGE_SIZE;
> + } else if (flags & PAGE_WRITE) {
> + /* The page is already mapped writable. */
> + memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
> + } else {
> + /* Read-only zeros? */
> + g_assert_not_reached();
> }
> }
>
> - /* Ensure that the bss page(s) are valid */
> - if ((page_get_flags(last_bss-1) & prot) != prot) {
> - page_set_flags(elf_bss & TARGET_PAGE_MASK, last_bss - 1,
> - prot | PAGE_VALID);
> - }
> -
> - if (host_start < host_map_start) {
> - memset((void *)host_start, 0, host_map_start - host_start);
> + if (align_bss < end_bss) {
> + abi_long err = target_mmap(align_bss, end_bss - align_bss, prot,
> + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
> + -1, 0);
> + if (err == -1) {
> + perror("cannot mmap brk");
> + exit(-1);
brk != bss even if brk generally comes after the bss section.
> + }
> }
> }
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too
2023-08-07 16:37 ` [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too Richard Henderson
@ 2023-08-08 11:43 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 11:43 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> If p_filesz == 0, then vaddr_ef == vaddr. We can reuse the
> code in zero_bss rather than incompletely duplicating it in
> load_elf_image.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base
2023-08-07 16:37 ` [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base Richard Henderson
@ 2023-08-08 11:45 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 11:45 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, philmd, laurent, deller, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> The proper logging for probe_guest_base is in the main function.
> There is no need to duplicate that in the subroutines.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base
2023-08-07 16:37 ` [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base Richard Henderson
@ 2023-08-08 11:46 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 11:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, philmd, laurent, deller, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> The three sets of checks are identical, logically.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-08 9:43 ` Alex Bennée
@ 2023-08-08 11:57 ` Akihiko Odaki
2023-08-08 13:48 ` Alex Bennée
0 siblings, 1 reply; 45+ messages in thread
From: Akihiko Odaki @ 2023-08-08 11:57 UTC (permalink / raw)
To: Alex Bennée, Richard Henderson
Cc: pbonzini, philmd, laurent, deller, qemu-devel
On 2023/08/08 18:43, Alex Bennée wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> Use this as extra protection for the guest mapping over
>> any qemu host mappings.
>>
>> Tested-by: Helge Deller <deller@gmx.de>
>> Reviewed-by: Helge Deller <deller@gmx.de>
>> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> linux-user/elfload.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>> index 36e4026f05..1b4bb2d5af 100644
>> --- a/linux-user/elfload.c
>> +++ b/linux-user/elfload.c
>> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
>> /*
>> * Reserve address space for all of this.
>> *
>> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
>> - * exactly the address range that is required.
>> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
>> + * exactly the address range that is required. Without reserved_va,
>> + * the guest address space is not isolated. We have attempted to avoid
>> + * conflict with the host program itself via probe_guest_base, but using
>> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>> *
>> * Otherwise this is ET_DYN, and we are searching for a location
>> * that can hold the memory space required. If the image is
>> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>> */
>> load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>> - (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>> + (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
>> -1, 0);
>
> We should probably also check the result == load_addr for the places
> where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
>
> #ifndef MAP_FIXED_NOREPLACE
> #define MAP_FIXED_NOREPLACE 0
> #endif
>
> See 2667e069e7 (linux-user: don't use MAP_FIXED in pgd_find_hole_fallback)
It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host does
not support it as commit e69e032d1a ("linux-user: Use
MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
https://patchew.org/QEMU/20230808115242.73025-1-akihiko.odaki@daynix.com/
>
>> if (load_addr == -1) {
>> goto exit_mmap;
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-08 11:57 ` Akihiko Odaki
@ 2023-08-08 13:48 ` Alex Bennée
2023-08-08 14:08 ` Akihiko Odaki
0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 13:48 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Richard Henderson, pbonzini, philmd, laurent, deller, qemu-devel
Akihiko Odaki <akihiko.odaki@daynix.com> writes:
> On 2023/08/08 18:43, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> Use this as extra protection for the guest mapping over
>>> any qemu host mappings.
>>>
>>> Tested-by: Helge Deller <deller@gmx.de>
>>> Reviewed-by: Helge Deller <deller@gmx.de>
>>> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>> linux-user/elfload.c | 9 ++++++---
>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>> index 36e4026f05..1b4bb2d5af 100644
>>> --- a/linux-user/elfload.c
>>> +++ b/linux-user/elfload.c
>>> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
>>> /*
>>> * Reserve address space for all of this.
>>> *
>>> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
>>> - * exactly the address range that is required.
>>> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
>>> + * exactly the address range that is required. Without reserved_va,
>>> + * the guest address space is not isolated. We have attempted to avoid
>>> + * conflict with the host program itself via probe_guest_base, but using
>>> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>>> *
>>> * Otherwise this is ET_DYN, and we are searching for a location
>>> * that can hold the memory space required. If the image is
>>> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>> */
>>> load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>> - (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>> + (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
>>> -1, 0);
>> We should probably also check the result == load_addr for the places
>> where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
>> #ifndef MAP_FIXED_NOREPLACE
>> #define MAP_FIXED_NOREPLACE 0
>> #endif
>> See 2667e069e7 (linux-user: don't use MAP_FIXED in
>> pgd_find_hole_fallback)
>
> It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
> does not support it as commit e69e032d1a ("linux-user: Use
> MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
> MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
> https://patchew.org/QEMU/20230808115242.73025-1-akihiko.odaki@daynix.com/
Hmm doesn't that push the problem to real mmap() calls to a host system
that doesn't support MAP_FIXED_NOREPLACE?
I wonder if we need an internal flag rather than overloading the host
flags?
>
>>
>>> if (load_addr == -1) {
>>> goto exit_mmap;
>>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-08 13:48 ` Alex Bennée
@ 2023-08-08 14:08 ` Akihiko Odaki
2023-08-08 14:20 ` Alex Bennée
0 siblings, 1 reply; 45+ messages in thread
From: Akihiko Odaki @ 2023-08-08 14:08 UTC (permalink / raw)
To: Alex Bennée
Cc: Richard Henderson, pbonzini, philmd, laurent, deller, qemu-devel
On 2023/08/08 22:48, Alex Bennée wrote:
>
> Akihiko Odaki <akihiko.odaki@daynix.com> writes:
>
>> On 2023/08/08 18:43, Alex Bennée wrote:
>>> Richard Henderson <richard.henderson@linaro.org> writes:
>>>
>>>> Use this as extra protection for the guest mapping over
>>>> any qemu host mappings.
>>>>
>>>> Tested-by: Helge Deller <deller@gmx.de>
>>>> Reviewed-by: Helge Deller <deller@gmx.de>
>>>> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>>> ---
>>>> linux-user/elfload.c | 9 ++++++---
>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>>> index 36e4026f05..1b4bb2d5af 100644
>>>> --- a/linux-user/elfload.c
>>>> +++ b/linux-user/elfload.c
>>>> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>> /*
>>>> * Reserve address space for all of this.
>>>> *
>>>> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
>>>> - * exactly the address range that is required.
>>>> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
>>>> + * exactly the address range that is required. Without reserved_va,
>>>> + * the guest address space is not isolated. We have attempted to avoid
>>>> + * conflict with the host program itself via probe_guest_base, but using
>>>> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>>>> *
>>>> * Otherwise this is ET_DYN, and we are searching for a location
>>>> * that can hold the memory space required. If the image is
>>>> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>> */
>>>> load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>>> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>>> - (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>>> + (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
>>>> -1, 0);
>>> We should probably also check the result == load_addr for the places
>>> where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
>>> #ifndef MAP_FIXED_NOREPLACE
>>> #define MAP_FIXED_NOREPLACE 0
>>> #endif
>>> See 2667e069e7 (linux-user: don't use MAP_FIXED in
>>> pgd_find_hole_fallback)
>>
>> It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
>> does not support it as commit e69e032d1a ("linux-user: Use
>> MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
>> MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
>> https://patchew.org/QEMU/20230808115242.73025-1-akihiko.odaki@daynix.com/
>
> Hmm doesn't that push the problem to real mmap() calls to a host system
> that doesn't support MAP_FIXED_NOREPLACE?
That can happen even without that patch if you run QEMU built with a new
libc on old Linux so we should have prepared for that kind of situation.
The man page also says:
> Note that older kernels which do not recognize the MAP_FIXED_NOREPLACE
> flag will typically (upon detecting a collision with a preexisting
> mapping) fall back to a “non-MAP_FIXED” type of behavior: they will
> return an address that is different from the requested address.
> Therefore, backward-compatible software should check the returned
> address against the requested address.
https://man7.org/linux/man-pages/man2/mmap.2.html
It basically means MAP_FIXED_NOREPLACE has no effect on a host that
doesn't support it, and the existing code checking the returned address
should continue to work.
>
> I wonder if we need an internal flag rather than overloading the host
> flags?
>
>>
>>>
>>>> if (load_addr == -1) {
>>>> goto exit_mmap;
>>>
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
2023-08-08 14:08 ` Akihiko Odaki
@ 2023-08-08 14:20 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 14:20 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Richard Henderson, pbonzini, philmd, laurent, deller, qemu-devel
Akihiko Odaki <akihiko.odaki@daynix.com> writes:
> On 2023/08/08 22:48, Alex Bennée wrote:
>> Akihiko Odaki <akihiko.odaki@daynix.com> writes:
>>
>>> On 2023/08/08 18:43, Alex Bennée wrote:
>>>> Richard Henderson <richard.henderson@linaro.org> writes:
>>>>
>>>>> Use this as extra protection for the guest mapping over
>>>>> any qemu host mappings.
>>>>>
>>>>> Tested-by: Helge Deller <deller@gmx.de>
>>>>> Reviewed-by: Helge Deller <deller@gmx.de>
>>>>> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>>>> ---
>>>>> linux-user/elfload.c | 9 ++++++---
>>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>>>> index 36e4026f05..1b4bb2d5af 100644
>>>>> --- a/linux-user/elfload.c
>>>>> +++ b/linux-user/elfload.c
>>>>> @@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>>> /*
>>>>> * Reserve address space for all of this.
>>>>> *
>>>>> - * In the case of ET_EXEC, we supply MAP_FIXED so that we get
>>>>> - * exactly the address range that is required.
>>>>> + * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
>>>>> + * exactly the address range that is required. Without reserved_va,
>>>>> + * the guest address space is not isolated. We have attempted to avoid
>>>>> + * conflict with the host program itself via probe_guest_base, but using
>>>>> + * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
>>>>> *
>>>>> * Otherwise this is ET_DYN, and we are searching for a location
>>>>> * that can hold the memory space required. If the image is
>>>>> @@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int image_fd,
>>>>> */
>>>>> load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
>>>>> MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
>>>>> - (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
>>>>> + (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 0),
>>>>> -1, 0);
>>>> We should probably also check the result == load_addr for the places
>>>> where MAP_FIXED_NOREPLACE isn't supported as we have this in osdep.h:
>>>> #ifndef MAP_FIXED_NOREPLACE
>>>> #define MAP_FIXED_NOREPLACE 0
>>>> #endif
>>>> See 2667e069e7 (linux-user: don't use MAP_FIXED in
>>>> pgd_find_hole_fallback)
>>>
>>> It assumes target_mmap() emulates MAP_FIXED_NOREPLACE when the host
>>> does not support it as commit e69e032d1a ("linux-user: Use
>>> MAP_FIXED_NOREPLACE for do_brk()") already does, but defining
>>> MAP_FIXED_NOREPLACE zero breaks such emulation. I wrote a fix:
>>> https://patchew.org/QEMU/20230808115242.73025-1-akihiko.odaki@daynix.com/
>> Hmm doesn't that push the problem to real mmap() calls to a host
>> system
>> that doesn't support MAP_FIXED_NOREPLACE?
>
> That can happen even without that patch if you run QEMU built with a
> new libc on old Linux so we should have prepared for that kind of
> situation.
>
> The man page also says:
>> Note that older kernels which do not recognize the MAP_FIXED_NOREPLACE
>> flag will typically (upon detecting a collision with a preexisting
>> mapping) fall back to a “non-MAP_FIXED” type of behavior: they will
>> return an address that is different from the requested address.
>> Therefore, backward-compatible software should check the returned
>> address against the requested address.
> https://man7.org/linux/man-pages/man2/mmap.2.html
>
> It basically means MAP_FIXED_NOREPLACE has no effect on a host that
> doesn't support it, and the existing code checking the returned
> address should continue to work.
OK - as long as it doesn't barf on unknown flags.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-08 9:10 ` Alex Bennée
@ 2023-08-08 15:16 ` Richard Henderson
2023-08-08 16:59 ` Alex Bennée
0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-08 15:16 UTC (permalink / raw)
To: Alex Bennée
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
On 8/8/23 02:10, Alex Bennée wrote:
> One thing I'm slightly confused by is the ELF_ET_DYN_BASE can be above
> this (or sometimes the same). Should the mapping of ELF segments be
> handled with mmap_next_start? I assume once mmap_next_start meets the
> mappings for the ELF segments we skip over until we get to more free
> space after the program code?
ELF_ET_DYN_BASE is a hack imported from the kernel to put separation between an ET_DYN
main binary and TASK_UNMAPPED_BASE, so that the brk can follow the binary and have space
to grow.
All of this is part of the "legacy" memory layout, for which there is a personality flag.
For 8.2, I think we should work on implementing the "new" memory layout, which places
everything top-down. But most importantly it completely separates brk from the binary.
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
2023-08-08 9:10 ` Alex Bennée
@ 2023-08-08 15:35 ` Helge Deller
1 sibling, 0 replies; 45+ messages in thread
From: Helge Deller @ 2023-08-08 15:35 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, philmd, laurent, Akihiko Odaki
Hi Richard,
On 8/7/23 18:36, Richard Henderson wrote:
> Ensure that the chosen values for mmap_next_start and
> task_unmapped_base are within the guest address space.
>
> Tested-by: Helge Deller <deller@gmx.de>
> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
I've tested this whole series for quite some chroots.
Good thing is, that all targets do run and can execute a static hello-world
program.
So, overall that's good and I think the patch series should go in for 8.1.
Looking at the target's memmap I do see non-optimal heap-addresses for arm and
powerpc.
I know it's not directly related to your patches, but we should later
try to move the heap behind the executables where they can grow bigger.
Helge
armel-chroot and armhf-chroot:
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 2023 armv7l GNU/Linux
00021000-00042000 rw-p 00000000 00:00 0 [heap]
00400000-00408000 r-xp 00000000 fd:00 801471 /usr/bin/cat
00408000-0041f000 ---p 00000000 00:00 0
0041f000-00420000 r--p 0000f000 fd:00 801471 /usr/bin/cat
00420000-00421000 rw-p 00010000 fd:00 801471 /usr/bin/cat
40000000-40001000 ---p 00000000 00:00 0
40001000-40801000 rw-p 00000000 00:00 0 [stack]
40801000-40827000 r-xp 00000000 fd:00 839152 /usr/lib/arm-linux-gnueabi/ld-linux.so.3
40827000-40828000 r--p 00026000 fd:00 839152 /usr/lib/arm-linux-gnueabi/ld-linux.so.3
40828000-40829000 rw-p 00027000 fd:00 839152 /usr/lib/arm-linux-gnueabi/ld-linux.so.3
...
powerpc-chroot
Linux p100 6.4.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 20:51:12 UTC 2023 ppc GNU/Linux
00021000-00043000 rw-p 00000000 00:00 0 [heap]
001c0000-003d5000 r-xp 00000000 fd:00 577250 /usr/lib/powerpc-linux-gnu/libc.so.6
003d5000-003eb000 ---p 00215000 fd:00 577250 /usr/lib/powerpc-linux-gnu/libc.so.6
003eb000-003f0000 r--p 0021b000 fd:00 577250 /usr/lib/powerpc-linux-gnu/libc.so.6
003f0000-003f1000 rw-p 00220000 fd:00 577250 /usr/lib/powerpc-linux-gnu/libc.so.6
003f1000-003fb000 rw-p 00000000 00:00 0
00400000-0040b000 r-xp 00000000 fd:00 535994 /usr/bin/cat
0040b000-0041f000 ---p 00000000 00:00 0
0041f000-00420000 r--p 0000f000 fd:00 535994 /usr/bin/cat
00420000-00421000 rw-p 00010000 fd:00 535994 /usr/bin/cat
40000000-40001000 ---p 00000000 00:00 0
40001000-40801000 rw-p 00000000 00:00 0 [stack]
40801000-40834000 r-xp 00000000 fd:00 577246 /usr/lib/powerpc-linux-gnu/ld.so.1
40834000-4084f000 ---p 00000000 00:00 0
4084f000-40851000 r--p 0003e000 fd:00 577246 /usr/lib/powerpc-linux-gnu/ld.so.1
40851000-40852000 rw-p 00040000 fd:00 577246 /usr/lib/powerpc-linux-gnu/ld.so.1
40852000-40853000 r-xp 00000000 00:00 0
40853000-40855000 rw-p 00000000 00:00 0
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss for host page size
2023-08-08 11:38 ` Alex Bennée
@ 2023-08-08 15:56 ` Richard Henderson
0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-08 15:56 UTC (permalink / raw)
To: Alex Bennée
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
On 8/8/23 04:38, Alex Bennée wrote:
>> - if (host_start < host_map_start) {
>> - memset((void *)host_start, 0, host_map_start - host_start);
>> + if (align_bss < end_bss) {
>> + abi_long err = target_mmap(align_bss, end_bss - align_bss, prot,
>> + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
>> + -1, 0);
>> + if (err == -1) {
>> + perror("cannot mmap brk");
>> + exit(-1);
>
> brk != bss even if brk generally comes after the bss section.
I've removed this error report entirely, returning failure to the caller, which will then
handle it like any other mmap failure of the image.
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base
2023-08-07 16:37 ` [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base Richard Henderson
@ 2023-08-08 16:39 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 16:39 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, philmd, laurent, deller, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Create a set of subroutines to collect a set of guest addresses,
> all of which must be mappable on the host. Use this within the
> renamed pgb_fixed subroutine to validate the user's choice of
> guest_base specified by the -B command-line option.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base
2023-08-07 16:37 ` [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base Richard Henderson
@ 2023-08-08 16:58 ` Alex Bennée
0 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 16:58 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, philmd, laurent, deller, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Use pgb_addr_set to probe for all of the guest addresses,
> not just the main executable. Handle the identity map
> specially and separately from the search.
>
> If /proc/self/maps is available, utilize the full power
> of the interval tree search, rather than a linear search
> through the address list.
>
> If /proc/self/maps is not available, increase the skip
> between probes so that we do not probe every single page
> of the host address space. Choose 1 MiB for 32-bit hosts
> (max 4k probes) and 1 GiB for 64-bit hosts (possibly a
> large number of probes, but the large step makes it more
> likely to find empty space quicker).
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-08 15:16 ` Richard Henderson
@ 2023-08-08 16:59 ` Alex Bennée
2023-08-08 17:40 ` Richard Henderson
0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 16:59 UTC (permalink / raw)
To: Richard Henderson
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> On 8/8/23 02:10, Alex Bennée wrote:
>> One thing I'm slightly confused by is the ELF_ET_DYN_BASE can be above
>> this (or sometimes the same). Should the mapping of ELF segments be
>> handled with mmap_next_start? I assume once mmap_next_start meets the
>> mappings for the ELF segments we skip over until we get to more free
>> space after the program code?
>
> ELF_ET_DYN_BASE is a hack imported from the kernel to put separation
> between an ET_DYN main binary and TASK_UNMAPPED_BASE, so that the brk
> can follow the binary and have space to grow.
yeach :-/
>
> All of this is part of the "legacy" memory layout, for which there is a personality flag.
>
> For 8.2, I think we should work on implementing the "new" memory
> layout, which places everything top-down. But most importantly it
> completely separates brk from the binary.
The QEMU brk? The guest will have one emulated for it?
>
>
> r~
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
` (13 preceding siblings ...)
2023-08-07 16:37 ` [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base Richard Henderson
@ 2023-08-08 17:00 ` Alex Bennée
14 siblings, 0 replies; 45+ messages in thread
From: Alex Bennée @ 2023-08-08 17:00 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, philmd, laurent, deller, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> This is the second half of
>
> https://patchew.org/QEMU/20230804220032.295411-1-richard.henderson@linaro.org/
>
> which I held back because of regressions with s390x testing.
>
> It turns out that patch 4, "Use MAP_FIXED_NOREPLACE for initial image mmap"
> actually triggered EEXIST, which meant that probe_guest_base did not
> do its job to select unused host virtual memory. It's a mystery why
> we have not seen larger problems because of this.
>
> As I kept digging, I found quite a number of problems within
> probe_guest_base and its subroutines. I have rewritten it completely.
> Hopefully it is much easier to understand in its new form.
>
> Testing this has been difficult, because it is most visible with
> non-PIE executables, and most modern distros default to PIE, and
> our current implementation of --disable-pie does not work.
I should say:
[alex@aarch64:~/l/q/b/ci.all.linux.static] review/linux-user-mapping-v4|… + retry.py -n 100 -c -- make check-tcg
Gives:
Results summary:
0: 100 times (100.00%), avg time 150.644 (14.47 varience/3.80 deviation)
Ran command 100 times, 100 passes
which I think has squashed an intermittent bug plaguing CI so have a:
Tested-by: Alex Bennée <alex.bennee@linaro.org>
for the series.
>
>
> r~
>
>
> Helge Deller (1):
> linux-user: Adjust initial brk when interpreter is close to executable
>
> Richard Henderson (13):
> linux-user: Adjust task_unmapped_base for reserved_va
> linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
> linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
> linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
> linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
> linux-user: Do not adjust image mapping for host page size
> linux-user: Do not adjust zero_bss for host page size
> linux-user: Use zero_bss for PT_LOAD with no file contents too
> util/selfmap: Rewrite using qemu/interval-tree.h
> linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base
> linux-user: Consolidate guest bounds check in probe_guest_base
> linux-user: Rewrite fixed probe_guest_base
> linux-user: Rewrite non-fixed probe_guest_base
>
> include/qemu/selfmap.h | 20 +-
> linux-user/aarch64/target_mman.h | 13 +
> linux-user/alpha/target_mman.h | 11 +
> linux-user/arm/target_mman.h | 11 +
> linux-user/cris/target_mman.h | 12 +
> linux-user/hexagon/target_mman.h | 13 +
> linux-user/hppa/target_mman.h | 6 +
> linux-user/i386/target_mman.h | 16 +
> linux-user/loongarch64/target_mman.h | 11 +
> linux-user/m68k/target_mman.h | 5 +
> linux-user/microblaze/target_mman.h | 11 +
> linux-user/mips/target_mman.h | 10 +
> linux-user/nios2/target_mman.h | 10 +
> linux-user/openrisc/target_mman.h | 10 +
> linux-user/ppc/target_mman.h | 20 +
> linux-user/qemu.h | 1 -
> linux-user/riscv/target_mman.h | 10 +
> linux-user/s390x/target_mman.h | 20 +
> linux-user/sh4/target_mman.h | 7 +
> linux-user/sparc/target_mman.h | 25 +
> linux-user/user-mmap.h | 5 +-
> linux-user/x86_64/target_mman.h | 15 +
> linux-user/xtensa/target_mman.h | 10 +
> linux-user/elfload.c | 788 +++++++++++++--------------
> linux-user/main.c | 43 ++
> linux-user/mmap.c | 19 +-
> linux-user/syscall.c | 15 +-
> util/selfmap.c | 114 ++--
> 28 files changed, 777 insertions(+), 474 deletions(-)
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va
2023-08-08 16:59 ` Alex Bennée
@ 2023-08-08 17:40 ` Richard Henderson
0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-08 17:40 UTC (permalink / raw)
To: Alex Bennée
Cc: pbonzini, philmd, laurent, deller, Akihiko Odaki, qemu-devel
On 8/8/23 09:59, Alex Bennée wrote:
>> All of this is part of the "legacy" memory layout, for which there is a personality flag.
>>
>> For 8.2, I think we should work on implementing the "new" memory
>> layout, which places everything top-down. But most importantly it
>> completely separates brk from the binary.
>
> The QEMU brk? The guest will have one emulated for it?
In this context we're talking about guest memory layout.
So there is one brk: the guest one.
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Fix interval_tree_iter_first() to check root node value
2023-08-07 18:17 ` Richard Henderson
@ 2023-08-09 15:11 ` Helge Deller
2023-08-09 15:23 ` Richard Henderson
2023-08-10 21:31 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Ilya Leoshkevich
1 sibling, 1 reply; 45+ messages in thread
From: Helge Deller @ 2023-08-09 15:11 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Fix a crash in qemu-user when running
cat /proc/self/maps
in a chroot, where /proc isn't mounted.
The problem was introduced by commit 3ce3dd8ca965 ("util/selfmap:
Rewrite using qemu/interval-tree.h") where in open_self_maps_1() the
function read_self_maps() is called and which returns NULL if it can't
read the hosts /proc/self/maps file. Afterwards that NULL is fed into
interval_tree_iter_first() which doesn't check if the root node is NULL.
Fix it by adding a check if root is NULL and return NULL in that case.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 3ce3dd8ca965 ("util/selfmap: Rewrite using qemu/interval-tree.h")
diff --git a/util/interval-tree.c b/util/interval-tree.c
index f2866aa7d3..53465182e6 100644
--- a/util/interval-tree.c
+++ b/util/interval-tree.c
@@ -797,7 +797,7 @@ IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
{
IntervalTreeNode *node, *leftmost;
- if (!root->rb_root.rb_node) {
+ if (!root || !root->rb_root.rb_node) {
return NULL;
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: Fix interval_tree_iter_first() to check root node value
2023-08-09 15:11 ` Fix interval_tree_iter_first() to check root node value Helge Deller
@ 2023-08-09 15:23 ` Richard Henderson
2023-08-09 15:53 ` Helge Deller
0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2023-08-09 15:23 UTC (permalink / raw)
To: Helge Deller, qemu-devel
On 8/9/23 08:11, Helge Deller wrote:
> Fix a crash in qemu-user when running
>
> cat /proc/self/maps
>
> in a chroot, where /proc isn't mounted.
>
> The problem was introduced by commit 3ce3dd8ca965 ("util/selfmap:
> Rewrite using qemu/interval-tree.h") where in open_self_maps_1() the
> function read_self_maps() is called and which returns NULL if it can't
> read the hosts /proc/self/maps file. Afterwards that NULL is fed into
> interval_tree_iter_first() which doesn't check if the root node is NULL.
>
> Fix it by adding a check if root is NULL and return NULL in that case.
>
> Signed-off-by: Helge Deller <deller@gmx.de>
> Fixes: 3ce3dd8ca965 ("util/selfmap: Rewrite using qemu/interval-tree.h")
>
> diff --git a/util/interval-tree.c b/util/interval-tree.c
> index f2866aa7d3..53465182e6 100644
> --- a/util/interval-tree.c
> +++ b/util/interval-tree.c
> @@ -797,7 +797,7 @@ IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
> {
> IntervalTreeNode *node, *leftmost;
>
> - if (!root->rb_root.rb_node) {
> + if (!root || !root->rb_root.rb_node) {
I guess this is good enough for 8.1. Before the conversion to interval-tree we would also
emit nothing.
I've already done a rewrite for 8.2, and I noticed this problem. There I emit what
mapping information that I have, which is everything except for the device+path data.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Fix interval_tree_iter_first() to check root node value
2023-08-09 15:23 ` Richard Henderson
@ 2023-08-09 15:53 ` Helge Deller
2023-08-09 16:33 ` Richard Henderson
0 siblings, 1 reply; 45+ messages in thread
From: Helge Deller @ 2023-08-09 15:53 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 8/9/23 17:23, Richard Henderson wrote:
> On 8/9/23 08:11, Helge Deller wrote:
>> Fix a crash in qemu-user when running
>>
>> cat /proc/self/maps
>>
>> in a chroot, where /proc isn't mounted.
>>
>> The problem was introduced by commit 3ce3dd8ca965 ("util/selfmap:
>> Rewrite using qemu/interval-tree.h") where in open_self_maps_1() the
>> function read_self_maps() is called and which returns NULL if it can't
>> read the hosts /proc/self/maps file. Afterwards that NULL is fed into
>> interval_tree_iter_first() which doesn't check if the root node is NULL.
>>
>> Fix it by adding a check if root is NULL and return NULL in that case.
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> Fixes: 3ce3dd8ca965 ("util/selfmap: Rewrite using qemu/interval-tree.h")
>>
>> diff --git a/util/interval-tree.c b/util/interval-tree.c
>> index f2866aa7d3..53465182e6 100644
>> --- a/util/interval-tree.c
>> +++ b/util/interval-tree.c
>> @@ -797,7 +797,7 @@ IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
>> {
>> IntervalTreeNode *node, *leftmost;
>>
>> - if (!root->rb_root.rb_node) {
>> + if (!root || !root->rb_root.rb_node) {
>
>
> I guess this is good enough for 8.1. Before the conversion to interval-tree we would also emit nothing.
Yes and yes.
> I've already done a rewrite for 8.2, and I noticed this problem.
> There I emit what mapping information that I have, which is
> everything except for the device+path data.
nice.
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Shall I send a pull request?
If so, is it OK that I include this patch in the pull-request as well?
linux-user: Fix openat() emulation to correctly detect accesses to /proc
https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg00165.html
which already has been R-b: Daniel P. Berrangé
Helge
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Fix interval_tree_iter_first() to check root node value
2023-08-09 15:53 ` Helge Deller
@ 2023-08-09 16:33 ` Richard Henderson
0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2023-08-09 16:33 UTC (permalink / raw)
To: Helge Deller, qemu-devel
On 8/9/23 08:53, Helge Deller wrote:
> On 8/9/23 17:23, Richard Henderson wrote:
>> On 8/9/23 08:11, Helge Deller wrote:
>>> Fix a crash in qemu-user when running
>>>
>>> cat /proc/self/maps
>>>
>>> in a chroot, where /proc isn't mounted.
>>>
>>> The problem was introduced by commit 3ce3dd8ca965 ("util/selfmap:
>>> Rewrite using qemu/interval-tree.h") where in open_self_maps_1() the
>>> function read_self_maps() is called and which returns NULL if it can't
>>> read the hosts /proc/self/maps file. Afterwards that NULL is fed into
>>> interval_tree_iter_first() which doesn't check if the root node is NULL.
>>>
>>> Fix it by adding a check if root is NULL and return NULL in that case.
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> Fixes: 3ce3dd8ca965 ("util/selfmap: Rewrite using qemu/interval-tree.h")
>>>
>>> diff --git a/util/interval-tree.c b/util/interval-tree.c
>>> index f2866aa7d3..53465182e6 100644
>>> --- a/util/interval-tree.c
>>> +++ b/util/interval-tree.c
>>> @@ -797,7 +797,7 @@ IntervalTreeNode *interval_tree_iter_first(IntervalTreeRoot *root,
>>> {
>>> IntervalTreeNode *node, *leftmost;
>>>
>>> - if (!root->rb_root.rb_node) {
>>> + if (!root || !root->rb_root.rb_node) {
>>
>>
>> I guess this is good enough for 8.1. Before the conversion to interval-tree we would
>> also emit nothing.
>
> Yes and yes.
>
>> I've already done a rewrite for 8.2, and I noticed this problem.
>> There I emit what mapping information that I have, which is
>> everything except for the device+path data.
>
> nice.
>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> Shall I send a pull request?
> If so, is it OK that I include this patch in the pull-request as well?
> linux-user: Fix openat() emulation to correctly detect accesses to /proc
> https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg00165.html
> which already has been R-b: Daniel P. Berrangé
I can pick them both up -- I have other linux-user patches to send.
r~
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h
2023-08-07 18:17 ` Richard Henderson
2023-08-09 15:11 ` Fix interval_tree_iter_first() to check root node value Helge Deller
@ 2023-08-10 21:31 ` Ilya Leoshkevich
2023-08-10 22:06 ` Helge Deller
1 sibling, 1 reply; 45+ messages in thread
From: Ilya Leoshkevich @ 2023-08-10 21:31 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: pbonzini, philmd, laurent, deller
On Mon, 2023-08-07 at 11:17 -0700, Richard Henderson wrote:
> On 8/7/23 09:37, Richard Henderson wrote:
> > We will want to be able to search the set of mappings.
> > For this patch, the two users iterate the tree in order.
> >
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> > include/qemu/selfmap.h | 20 ++++----
> > linux-user/elfload.c | 14 +++--
> > linux-user/syscall.c | 15 +++---
> > util/selfmap.c | 114 +++++++++++++++++++++++++-----------
> > -----
> > 4 files changed, 96 insertions(+), 67 deletions(-)
>
> I should note that, for 8.2, this will enable a rewrite of
> open_self_maps_1 so that it
> does not require page-by-page checking of page_get_flags.
>
> My idea is that open_self_maps_1 would use walk_memory_regions to see
> all guest memory
> regions. The per-region callback would cross-check with the host-
> region interval tree to
> find the dev+inode+path.
>
> Cc Ilya and Helge, since there are two outstanding changes to
> open_self_maps.
>
>
> r~
My outstanding change should not be sensitive to this; it should be
possible to put it in both before or after the rewrite.
I really like this idea though, since I looked into ppc64le and there
printing maps is quite broken: it's not just that QEMU can't determine
the names of the mapped files, but also a number of regions are simply
missing. This also affects core dumps generated by GDB attached to
gdbstub.
For example, cat /proc/self/maps has the following internal page
layout:
start end size prot
0000000010000000-000000001000d000 000000000000d000 r-x
000000001000d000-0000000010010000 0000000000003000 ---
0000000010010000-000000001001f000 000000000000f000 r--
000000001001f000-0000000010020000 0000000000001000 r--
0000000010020000-0000000010021000 0000000000001000 rw-
0000100000000000-0000100000010000 0000000000010000 ---
0000100000010000-0000100000810000 0000000000800000 rw-
0000100000810000-0000100000830000 0000000000020000 r-x
0000100000830000-000010000083d000 000000000000d000 r-x
000010000083d000-0000100000840000 0000000000003000 ---
0000100000840000-000010000084f000 000000000000f000 r--
000010000084f000-0000100000850000 0000000000001000 r--
0000100000850000-0000100000851000 0000000000001000 rw-
0000100000851000-0000100000852000 0000000000001000 rw-
0000100000860000-0000100000861000 0000000000001000 r-x
0000100000880000-0000100000a50000 00000000001d0000 r-x
0000100000a50000-0000100000a60000 0000000000010000 r--
0000100000a60000-0000100000a70000 0000000000010000 rw-
0000100000a70000-0000100000b70000 0000000000100000 rw-
0000100000b70000-000010000742d000 00000000068bd000 r--
00007fffb22b0000-00007fffb22e0000 0000000000030000 rw-
but prints only:
100000000000-100000010000 ---p 00000000 00:00 0
100000010000-100000810000 rw-p 00000000 00:00 0
[stack]
100000810000-100000830000 r-xp 00000000 fd:00 3049136
/usr/lib64/ld-2.17.so
100000880000-100000a50000 r-xp 00000000 fd:00 3017372
/usr/lib64/libc-2.17.so
100000a50000-100000a60000 r--p 001c0000 fd:00 3017372
/usr/lib64/libc-2.17.so
100000a60000-100000a70000 rw-p 001d0000 fd:00 3017372
/usr/lib64/libc-2.17.so
100000a70000-100000b70000 rw-p 00000000 00:00 0
7fffb22b0000-7fffb22e0000 rw-p 00000000 00:00 0
I don't see a good way to prevent page_check_range() from rejecting
most of the mappings with the current code structure, but I think that
after the proposed rewrite it should begin to just work.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h
2023-08-10 21:31 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Ilya Leoshkevich
@ 2023-08-10 22:06 ` Helge Deller
0 siblings, 0 replies; 45+ messages in thread
From: Helge Deller @ 2023-08-10 22:06 UTC (permalink / raw)
To: Ilya Leoshkevich, Richard Henderson, qemu-devel; +Cc: pbonzini, philmd, laurent
On 8/10/23 23:31, Ilya Leoshkevich wrote:
> On Mon, 2023-08-07 at 11:17 -0700, Richard Henderson wrote:
>> On 8/7/23 09:37, Richard Henderson wrote:
>>> We will want to be able to search the set of mappings.
>>> For this patch, the two users iterate the tree in order.
>>>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>> include/qemu/selfmap.h | 20 ++++----
>>> linux-user/elfload.c | 14 +++--
>>> linux-user/syscall.c | 15 +++---
>>> util/selfmap.c | 114 +++++++++++++++++++++++++-----------
>>> -----
>>> 4 files changed, 96 insertions(+), 67 deletions(-)
>>
>> I should note that, for 8.2, this will enable a rewrite of
>> open_self_maps_1 so that it
>> does not require page-by-page checking of page_get_flags.
>>
>> My idea is that open_self_maps_1 would use walk_memory_regions to see
>> all guest memory
>> regions. The per-region callback would cross-check with the host-
>> region interval tree to
>> find the dev+inode+path.
>>
>> Cc Ilya and Helge, since there are two outstanding changes to
>> open_self_maps.
I think the rewrite is good.
My patches regarding the map aren't important, I can adjust them
afterwards and resend (if necessary).
Helge
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2023-08-10 22:07 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-07 16:36 [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 01/14] linux-user: Adjust task_unmapped_base for reserved_va Richard Henderson
2023-08-08 9:10 ` Alex Bennée
2023-08-08 15:16 ` Richard Henderson
2023-08-08 16:59 ` Alex Bennée
2023-08-08 17:40 ` Richard Henderson
2023-08-08 15:35 ` Helge Deller
2023-08-07 16:36 ` [PATCH for-8.1 v10 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h Richard Henderson
2023-08-08 9:19 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 03/14] linux-user: Define ELF_ET_DYN_BASE " Richard Henderson
2023-08-07 16:36 ` [PATCH for-8.1 v10 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap Richard Henderson
2023-08-08 9:43 ` Alex Bennée
2023-08-08 11:57 ` Akihiko Odaki
2023-08-08 13:48 ` Alex Bennée
2023-08-08 14:08 ` Akihiko Odaki
2023-08-08 14:20 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter Richard Henderson
2023-08-08 9:49 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 06/14] linux-user: Adjust initial brk when interpreter is close to executable Richard Henderson
2023-08-08 10:54 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 07/14] linux-user: Do not adjust image mapping for host page size Richard Henderson
2023-08-08 10:59 ` Alex Bennée
2023-08-07 16:36 ` [PATCH for-8.1 v10 08/14] linux-user: Do not adjust zero_bss " Richard Henderson
2023-08-08 11:38 ` Alex Bennée
2023-08-08 15:56 ` Richard Henderson
2023-08-07 16:37 ` [PATCH for-8.1 v10 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too Richard Henderson
2023-08-08 11:43 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Richard Henderson
2023-08-07 18:17 ` Richard Henderson
2023-08-09 15:11 ` Fix interval_tree_iter_first() to check root node value Helge Deller
2023-08-09 15:23 ` Richard Henderson
2023-08-09 15:53 ` Helge Deller
2023-08-09 16:33 ` Richard Henderson
2023-08-10 21:31 ` [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h Ilya Leoshkevich
2023-08-10 22:06 ` Helge Deller
2023-08-08 6:15 ` Michael Tokarev
2023-08-07 16:37 ` [PATCH for-8.1 v10 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base Richard Henderson
2023-08-08 11:45 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 12/14] linux-user: Consolidate guest bounds check in probe_guest_base Richard Henderson
2023-08-08 11:46 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 13/14] linux-user: Rewrite fixed probe_guest_base Richard Henderson
2023-08-08 16:39 ` Alex Bennée
2023-08-07 16:37 ` [PATCH for-8.1 v10 14/14] linux-user: Rewrite non-fixed probe_guest_base Richard Henderson
2023-08-08 16:58 ` Alex Bennée
2023-08-08 17:00 ` [PATCH for-8.1 v10 00/14] linux-user: image mapping fixes Alex Bennée
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).