qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Helge Deller <deller@gmx.de>
Cc: qemu-devel@nongnu.org, Laurent Vivier <laurent@vivier.eu>
Subject: Re: [PATCH 5/5] linux-user: Do not align brk with host page size
Date: Tue, 1 Aug 2023 07:43:54 +0900	[thread overview]
Message-ID: <aa7e87f0-ae72-0147-3e8e-4d06b736b28c@daynix.com> (raw)
In-Reply-To: <7b55e607-45f9-a574-fb00-b65508e57b10@gmx.de>

On 2023/08/01 0:51, Helge Deller wrote:
> On 7/31/23 10:03, Akihiko Odaki wrote:
>> do_brk() minimizes calls into target_mmap() by aligning the address
>> with host page size, which is potentially larger than the target page
>> size.
> 
> Keep in mind, that host page size can be smaller than target page size too.
> That's the reason why host brk (brk_page) and target brk (target_brk)
> needs to be tracked individually.
> So, it's not an optimization, but required.
> Btw, I think we have been there before with the idea of
> just keeping track of host pages...

It does not matter even if the host page size is smaller or larger. 
do_brk() solely relies on target_mmap(), which works with target page size.

> 
>> However, the current implementation of this optimization has two
>> bugs:
>>
>> - The start of brk is rounded up with the host page size while brk
>>    advertises an address aligned with the target page size as the
>>    beginning of brk. This makes the beginning of brk unmapped.
>> - Content clearing after mapping is flawed. The size to clear is
>>    specified as HOST_PAGE_ALIGN(brk_page) - brk_page, but brk_page is
>>    aligned with the host page size so it is always zero.
>>
>> This optimization actually has no practical benefit. It makes difference
>> when brk() is called multiple times with values in a range of the host
>> page size. However, sophisticated memory allocators try to avoid to
>> make such frequent brk() calls. For example, glibc 2.37 calls brk() to
>> shrink the heap only when there is a room more than 128 KiB. It is
>> rare to have a page size larger than 128 KiB if it happens.
>>
>> Let's remove the optimization to fix the bugs and make the code simpler.
>>
>> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1616
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> ---
>>   linux-user/elfload.c |  4 ++--
>>   linux-user/syscall.c | 54 ++++++++++----------------------------------
>>   2 files changed, 14 insertions(+), 44 deletions(-)
>>
>> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>> index 861ec07abc..2aee2298ec 100644
>> --- a/linux-user/elfload.c
>> +++ b/linux-user/elfload.c
>> @@ -3678,8 +3678,8 @@ int load_elf_binary(struct linux_binprm *bprm, 
>> struct image_info *info)
>>        * to mmap pages in this space.
>>        */
>>       if (info->reserve_brk) {
>> -        abi_ulong start_brk = HOST_PAGE_ALIGN(info->brk);
>> -        abi_ulong end_brk = HOST_PAGE_ALIGN(info->brk + 
>> info->reserve_brk);
>> +        abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
>> +        abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + 
>> info->reserve_brk);
> 
> In my patchset I removed the reserve_brk stuff...
> 
>>           target_munmap(start_brk, end_brk - start_brk);
>>       }
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index ebdc8c144c..475260b7ce 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -802,81 +802,51 @@ static inline int host_to_target_sock_type(int 
>> host_type)
>>   }
>>
>>   static abi_ulong target_brk, initial_target_brk;
>> -static abi_ulong brk_page;
> 
> with that you loose the ability to track the brk address on host.

brk_page actually doesn't track the address on the host. It's just an 
address on the target but aligned with host page size. And the alignment 
is purely for optimization.

> 
> 
>>   void target_set_brk(abi_ulong new_brk)
>>   {
>>       target_brk = TARGET_PAGE_ALIGN(new_brk);
>>       initial_target_brk = target_brk;
>> -    brk_page = HOST_PAGE_ALIGN(target_brk);
>>   }
>>
>>   /* do_brk() must return target values and target errnos. */
>>   abi_long do_brk(abi_ulong brk_val)
>>   {
>>       abi_long mapped_addr;
>> -    abi_ulong new_alloc_size;
>> -    abi_ulong new_brk, new_host_brk_page;
>> +    abi_ulong new_brk;
>> +    abi_ulong old_brk;
>>
>>       /* brk pointers are always untagged */
>>
>> -    /* return old brk value if brk_val unchanged */
>> -    if (brk_val == target_brk) {
>> -        return target_brk;
>> -    }
>> -
>>       /* do not allow to shrink below initial brk value */
>>       if (brk_val < initial_target_brk) {
>>           return target_brk;
>>       }
>>
>>       new_brk = TARGET_PAGE_ALIGN(brk_val);
>> -    new_host_brk_page = HOST_PAGE_ALIGN(brk_val);
>> +    old_brk = TARGET_PAGE_ALIGN(target_brk);
>>
>> -    /* brk_val and old target_brk might be on the same page */
>> -    if (new_brk == TARGET_PAGE_ALIGN(target_brk)) {
>> -        /* empty remaining bytes in (possibly larger) host page */
>> -        memset(g2h_untagged(new_brk), 0, new_host_brk_page - new_brk);
> 
> you can't drop that.
> guest could have clobbered the upper parts, e.g. when it first mapped
> all (e.g. 8k), wrote to it, then released upper half of the host page 
> (4k) and
> then remaps the upper 4k again.
> In that case we need to ensure that the upper 4k gets cleaned. On a 
> physical
> machine the kernel would have requested a new page, but here we are stuck
> with the bigger host page which we can't simply release.

Such a case should also be handled with target_mmap().

Regards,
Akihiko Odaki


  reply	other threads:[~2023-07-31 22:44 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31  8:03 [PATCH 0/5] linux-user: brk/mmap fixes Akihiko Odaki
2023-07-31  8:03 ` [PATCH 1/5] linux-user: Unset MAP_FIXED_NOREPLACE for host Akihiko Odaki
2023-07-31 15:43   ` Richard Henderson
2023-07-31  8:03 ` [PATCH 2/5] linux-user: Do not call get_errno() in do_brk() Akihiko Odaki
2023-07-31 15:13   ` Helge Deller
2023-07-31  8:03 ` [PATCH 3/5] linux-user: Use MAP_FIXED_NOREPLACE for do_brk() Akihiko Odaki
2023-07-31 11:44   ` Peter Maydell
2023-07-31 13:32     ` Akihiko Odaki
2023-07-31 13:45       ` Peter Maydell
2023-07-31 15:44       ` Richard Henderson
2023-07-31  8:03 ` [PATCH 4/5] linux-user: Do nothing if too small brk is specified Akihiko Odaki
2023-07-31 15:16   ` Helge Deller
2023-07-31  8:03 ` [PATCH 5/5] linux-user: Do not align brk with host page size Akihiko Odaki
2023-07-31 15:51   ` Helge Deller
2023-07-31 22:43     ` Akihiko Odaki [this message]
2023-07-31  9:31 ` [PATCH 0/5] linux-user: brk/mmap fixes Michael Tokarev
2023-07-31 10:17   ` Joel Stanley
2023-07-31 10:23     ` Akihiko Odaki
2023-07-31 16:47       ` Helge Deller
2023-07-31 17:43         ` Helge Deller
2023-07-31 18:24           ` Helge Deller
2023-08-01  4:49             ` Joel Stanley
2023-08-01 10:43               ` Helge Deller
2023-08-02  7:26                 ` Akihiko Odaki
2023-08-02 10:37                   ` Helge Deller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa7e87f0-ae72-0147-3e8e-4d06b736b28c@daynix.com \
    --to=akihiko.odaki@daynix.com \
    --cc=deller@gmx.de \
    --cc=laurent@vivier.eu \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).