From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Richard Henderson" <richard.henderson@linaro.org>
Cc: qemu-devel@nongnu.org, laurent@vivier.eu
Subject: qemu-x86_64, buster /sbin/ldconfig and setup_arg_pages (a mind dump)
Date: Fri, 17 Jan 2020 17:33:20 +0000
Message-ID: <874kwukmxr.fsf@linaro.org>


Hi Richard,

While I was attempting to test the new vsyscall patches for x86 I
discovered I couldn't debootstrap an x86_64 buster image on my ARM box.
After digging further into it I discovered it was because executing
/sbin/ldconfig crashes and aborts the bootstrap.

This is helpfully reproducible on my main development system which is
also running buster:

  ./x86_64-linux-user/qemu-x86_64 /sbin/ldconfig
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> do_brk: allocating 8192 => 00007fb2dace5000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  fish: Job 2, “./x86_64-linux-user/qemu-x86_64…” terminated by signal SIGSEGV (Address boundary error)

The failure is in the second do_brk, during the early setup of the
binary's TLS data area. However, for some reason this isn't always the
case. For example, with testthread, which also uses TLS:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread
  setup_arg_pages: 0000004000000000
  target_set_brk: new_brk=00000000004c8558
  do_brk(0000000000000000) -> 00000000004c9000 (!new_brk)
  do_brk(00000000004ca1c0) -> do_brk: allocating 8192 => 00000000004c9000
  00000000004ca1c0 (mapped_addr == brk_page)
  do_brk(00000000004eb1c0) -> do_brk: allocating 135168 => 00000000004cb000
  00000000004eb1c0 (mapped_addr == brk_page)
  do_brk(00000000004ec000) -> 00000000004ec000 (new_brk <= brk_page)
  thread1: 0 hello1
  thread2: 0 hello2
  thread1: 1 hello1

Ultimately the failure is down to setup_arg_pages allocating too low in
the address space in the ldconfig case, which leaves the second brk
unable to expand its region of memory. Turn on -d page and you can
see the region forming:

  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000df000 0000000000007000 rw-
  00000040000df000-00000040000e0000 0000000000001000 ---
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040008e1000 0000000000809000 rw-
  setup_arg_pages: 00000040000e0000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e1000 0000000000800000 rw-
  start_brk   0x0000000000000000
  end_code    0x00000040000ad971
  start_code  0x0000004000009000
  start_data  0x00000040000d8778
  end_data    0x00000040000de510
  start_stack 0x00000040008e02d0
  brk         0x00000040000dfdf8
  entry       0x000000400000a370
  argv_start  0x00000040008e02d8
  env_start   0x00000040008e02e8
  auxv_start  0x00000040008e0428
  target_set_brk: new_brk=00000040000dfdf8
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e2000 0000000000801000 rw-
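
As a sanity check on the numbers above, here is a minimal sketch of the
arithmetic in C. It is not QEMU's do_brk: the helper names
(page_align_up, ranges_overlap, brk_can_grow) are made up for
illustration, and it assumes 4 KiB target pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB target pages, matching the page-sized steps in the log. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Round addr up to the next page boundary. */
static uint64_t page_align_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* True if the half-open ranges [a_start, a_end) and [b_start, b_end) overlap. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/*
 * Hypothetical helper: can the heap grow from cur_brk to new_brk without
 * running into an existing mapping [map_start, map_end)?
 */
static bool brk_can_grow(uint64_t cur_brk, uint64_t new_brk,
                         uint64_t map_start, uint64_t map_end)
{
    assert(new_brk >= cur_brk);

    uint64_t grow_start = page_align_up(cur_brk);  /* first page not yet backed */
    uint64_t grow_end = page_align_up(new_brk);    /* end of the pages needed */

    if (grow_end <= grow_start) {
        return true;  /* request fits in the pages we already have */
    }
    return !ranges_overlap(grow_start, grow_end, map_start, map_end);
}
```

With the ldconfig values (brk 0x40000dfdf8, request 0x40000e11c0, guard
plus stack at 0x40000e0000-0x40008e1000) brk_can_grow() returns false.
With the static testthread values (brk 0x4c8558, request 0x4ca1c0, stack
mapping at 0x4000000000-0x4000801000) it returns true.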

So it looks like setup_arg_pages just creates a segment right in the
middle of a previously allocated block of storage. This is odd because
the loader basically just leaves it to mmap to pick a region:

    error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

AFAICT this just depends on where we allocated last. In the
testthread case we already have a high mapping to splat:

  page layout changed following target_mmap
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000801000 0000000000801000 rw-
  setup_arg_pages: 0000004000000000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000001000 0000000000001000 ---
  0000004000001000-0000004000801000 0000000000800000 rw-

Comparing ldconfig to a "normal" case, we can see that the problem is
that all of ldconfig has been allocated in the TASK_UNMAPPED_BASE
region. This is due to ldconfig having a DYNAMIC region without a load
address, which causes mmap_find_vma to get called to find space for it
and then for all the subsequent anonymous regions that are needed:

  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000009000
  load_elf_image: mapping un-backed region: 0000004000009000:00000000000a5000
  load_elf_image: mapping un-backed region: 00000040000ae000:000000000002a000
  load_elf_image: mapping un-backed region: 00000040000d8000:0000000000007000
  mmap_find_vma: 00000040000e0000
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  mmap_find_vma: 00000040008e1000
  mmap_find_vma: 00000040008e2000
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> mmap_find_vma: 00000040000e0000
  do_brk: allocating 8192 => 00007fb999e49000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
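
To illustrate why the stack lands directly above the image, here is a
simplified first-fit search in C. This is a model only, not QEMU's
mmap_find_vma: find_free_first_fit and struct region are hypothetical
names, and the search base 0x4000000000 is simply the first address
mmap_find_vma reports in the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped range, half-open: [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical first-fit search: return the lowest address >= base where
 * `size` bytes fit between the existing mappings. `maps` must be sorted by
 * start address and non-overlapping.
 */
static uint64_t find_free_first_fit(const struct region *maps, size_t n,
                                    uint64_t base, uint64_t size)
{
    uint64_t candidate = base;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || maps[i - 1].end <= maps[i].start);

        if (maps[i].end <= candidate) {
            continue;  /* mapping lies entirely below the candidate */
        }
        if (maps[i].start >= candidate &&
            maps[i].start - candidate >= size) {
            break;     /* the gap before this mapping is big enough */
        }
        candidate = maps[i].end;  /* skip past this mapping and retry */
    }
    return candidate;
}
```

Given a single mapping 0x4000000000-0x40000e0000 (the ldconfig image) and
a request for 0x801000 bytes (stack plus guard page), this returns
0x40000e0000, the same address setup_arg_pages reports. The brk sits just
below that address, so it has nowhere to grow.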

But no, actually this all seems to be normal for dynamically linked
things - but still something must be different:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread.dyn
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000001000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000002000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000003000:0000000000002000
  mmap_find_vma: 0000004000005000
  setup_arg_pages: 0000004000005000
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000806000
  load_elf_image: mapping un-backed region: 0000004000806000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000807000:000000000001e000
  load_elf_image: mapping un-backed region: 0000004000825000:0000000000008000
  load_elf_image: mapping un-backed region: 000000400082d000:0000000000002000
  target_set_brk: new_brk=0000004000004070
  mmap_find_vma: 0000004000830000
  mmap_find_vma: 0000004000831000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  mmap_find_vma: 0000004000832000
  mmap_find_vma: 0000004000857000
  mmap_find_vma: 0000004000878000
  mmap_find_vma: 000000400087a000
  mmap_find_vma: 0000004000a3b000
  mmap_find_vma: 0000004000a3e000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  do_brk(0000004000026000) -> mmap_find_vma: 0000004000005000
  do_brk: allocating 135168 => 00007fa00659b000
  0000004000005000 (mapped_addr != -1 or brk_page)
  mmap_find_vma: 000000400123f000
  mmap_find_vma: 000000400923f000

Recompiled as a dynamic executable, testthread runs fine, leaving
itself enough space to expand the brk region at least once.

So what do we take away from this?

 * we need testcases to exercise the memory layout of dynamic binaries
 * "special" dynamic binaries can break our careful memory layout
 * I feel as though I've trodden on a nest of vipers

Does any of this track with you? What is different about ldconfig that
breaks our memory placement?

-- 
Alex Bennée

