From: Steven Sistare <steven.sistare@oracle.com>
To: David Hildenbrand <david@redhat.com>, qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
Igor Mammedov <imammedo@redhat.com>,
Joao Martins <joao.m.martins@oracle.com>,
Gavin Shan <gshan@redhat.com>
Subject: Re: [PATCH] util/mmap: optimize qemu_ram_mmap() alignment
Date: Tue, 11 Apr 2023 10:39:53 -0400 [thread overview]
Message-ID: <c3793343-8fc7-7399-3629-43ed3456feac@oracle.com> (raw)
In-Reply-To: <5657e1e4-ab20-f0e9-a9d5-7b91aece1459@redhat.com>
On 4/11/2023 3:57 AM, David Hildenbrand wrote:
> On 10.04.23 17:46, Steve Sistare wrote:
>> Guest RAM created with memory-backend-memfd is aligned to a
>> QEMU_VMALLOC_ALIGN=2M boundary, and memory-backend-memfd does not support
>> the "align" parameter to change the default. This is sub-optimal on
>> aarch64 kernels with a 64 KB page size and 512 MB huge page size, as the
>> range will not be backed by huge pages. Moreover, any shared allocation
>> using qemu_ram_mmap() will be sub-optimal on such a system if the align
>> parameter is less than 512 MB.
>>
>> The kernel is smart enough to return a hugely aligned pointer for MAP_SHARED
>> mappings when /sys/kernel/mm/transparent_hugepage/shmem_enabled allows it.
>> However, the qemu function qemu_ram_mmap() mmap's twice to perform its own
>> alignment:
>>
>> guardptr = mmap(0, total, PROT_NONE, flags, ...
>> flags |= shared ? MAP_SHARED : MAP_PRIVATE;
>> ptr = mmap(guardptr + offset, size, prot, flags | map_sync_flags, ...
>>
>> On the first call, flags has MAP_PRIVATE, hence the kernel does not apply
>> its shared memory policy, and returns a non-huge-aligned guardptr.
>>
>> To fix, for shared mappings, pass MAP_SHARED to both mmap calls.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> util/mmap-alloc.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>> index 5ed7d29..37a0d1e 100644
>> --- a/util/mmap-alloc.c
>> +++ b/util/mmap-alloc.c
>> @@ -121,7 +121,7 @@ static bool map_noreserve_effective(int fd, uint32_t qemu_map_flags)
>> * Reserve a new memory region of the requested size to be used for mapping
>> * from the given fd (if any).
>> */
>> -static void *mmap_reserve(size_t size, int fd)
>> +static void *mmap_reserve(size_t size, int fd, int final_flags)
>> {
>> int flags = MAP_PRIVATE;
>> @@ -144,6 +144,7 @@ static void *mmap_reserve(size_t size, int fd)
>> #else
>> fd = -1;
>> flags |= MAP_ANONYMOUS;
>> + flags |= final_flags & MAP_SHARED;
>
> Setting both, MAP_PRIVATE and MAP_SHARED sure sounds very wrong.
Yes, thanks. I introduced that mistake when I ported the fix from an earlier qemu that did not
have mmap_reserve. Should be:
fd = -1;
flags = MAP_ANONYMOUS;
flags |= final_flags & (MAP_SHARED | MAP_PRIVATE);
> The traditional way to handle that is via QEMU_VMALLOC_ALIGN.
>
> I think you'd actually want to turn QEMU_VMALLOC_ALIGN into a function that is able to special-case like hw/virtio/virtio-mem.c:virtio_mem_default_thp_size() does for "qemu_real_host_page_size() == 64 * KiB".
>
> If you set QEMU_VMALLOC_ALIGN properly, you'll also have any DIMMs get that alignment in guest physical address space.
If we increase QEMU_VMALLOC_ALIGN, then all allocations will be 512MB aligned. If we make many small
allocations, we could conceivably run out of VA space. Further, the processor may use low order
address bits in internal hashes, and now offset X in every allocated range will have the same value for
the low 29 bits, possibly causing more collisions and reducing performance. Further, the huge alignment
is not even needed if huge pages for shmem have been disabled in sysconfig.
We could avoid that by adding logic to also consider allocation size when choosing alignment, and
checking sysconfig tunables. Or, we can just let the kernel do so by telling it the truth about
memory flavor when calling mmap.
- Steve
next prev parent reply other threads:[~2023-04-11 14:40 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-10 15:46 [PATCH] util/mmap: optimize qemu_ram_mmap() alignment Steve Sistare
2023-04-11 7:57 ` David Hildenbrand
2023-04-11 14:39 ` Steven Sistare [this message]
2023-04-11 18:31 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c3793343-8fc7-7399-3629-43ed3456feac@oracle.com \
--to=steven.sistare@oracle.com \
--cc=david@redhat.com \
--cc=gshan@redhat.com \
--cc=imammedo@redhat.com \
--cc=joao.m.martins@oracle.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).