From: David Hildenbrand <david@redhat.com>
To: Michal Privoznik <mprivozn@redhat.com>, qemu-devel@nongnu.org
Cc: imammedo@redhat.com
Subject: Re: [PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete()
Date: Mon, 27 Nov 2023 14:55:15 +0100 [thread overview]
Message-ID: <b9c8b4da-cb14-4a67-b98d-655ed7348bef@redhat.com> (raw)
In-Reply-To: <9b8a2863-1264-4058-b367-0b61a75921ac@redhat.com>
On 27.11.23 14:37, David Hildenbrand wrote:
> On 27.11.23 13:32, Michal Privoznik wrote:
>> Simple reproducer:
>> qemu.git $ ./build/qemu-system-x86_64 \
>> -m size=8389632k,slots=16,maxmem=25600000k \
>> -object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/hugepages2M/","prealloc":true,"size":8590983168,"host-nodes":[0],"policy":"bind"}' \
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0
>>
>> With current master I get:
>>
>> qemu-system-x86_64: cannot bind memory to host NUMA nodes: Invalid argument
>>
>> The problem is that memory size (8193MiB) is not an integer
>> multiple of underlying pagesize (2MiB) which triggers a check
>> inside of madvise(), since we can't really set a madvise() policy
>> just to a fraction of a page.
>
> I thought we would just always fail create something that doesn't really
> make any sense.
>
> Why would we want to support that case?
>
> Let me dig, I thought we would have had some check there at some point
> that would make that fail (especially: RAM block not aligned to the
> pagesize).
At least memory-backend-memfd properly fails for that case:
$ ./build/qemu-system-x86_64 -object memory-backend-memfd,hugetlb=on,size=3m,id=tmp
qemu-system-x86_64: failed to resize memfd to 3145728: Invalid argument
memory-backend-file ends up creating a new file:
$ ./build/qemu-system-x86_64 -object memory-backend-file,share=on,mem-path=/dev/hugepages/tmp,size=3m,id=tmp
$ stat /dev/hugepages/tmp
File: /dev/hugepages/tmp
Size: 4194304 Blocks: 0 IO Block: 2097152 regular file
... and ends up sizing it properly aligned to the huge page size.
Seems to be due to:
if (memory < block->page_size) {
error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
"or larger than page size 0x%zx",
memory, block->page_size);
return NULL;
}
memory = ROUND_UP(memory, block->page_size);
/*
* ftruncate is not supported by hugetlbfs in older
* hosts, so don't bother bailing out on errors.
* If anything goes wrong with it under other filesystems,
* mmap will fail.
*
* Do not truncate the non-empty backend file to avoid corrupting
* the existing data in the file. Disabling shrinking is not
* enough. For example, the current vNVDIMM implementation stores
* the guest NVDIMM labels at the end of the backend file. If the
* backend file is later extended, QEMU will not be able to find
* those labels. Therefore, extending the non-empty backend file
* is disabled as well.
*/
if (truncate && ftruncate(fd, offset + memory)) {
perror("ftruncate");
}
So we create a bigger file and map the bigger file and also have a
RAMBlock that is bigger. So we'll also consume more memory.
... but the memory region is smaller and we tell the VM that it has
less memory. Lot of work with no obvious benefit, and only some
memory waste :)
We better should have just rejected such memory backends right from
the start. But now it's likely too late.
I suspect other things like
* qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
* qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
fail, but we don't care for hugetlb at least regarding merging
and don't even log an error.
But QEMU_MADV_DONTDUMP might also be broken, because that
qemu_madvise() call will just fail.
Your fix would be correct. But I do wonder if we want to just let that
case fail and warn users that they are doing something that doesn't
make too much sense.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2023-11-27 13:55 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-27 12:32 [PATCH] hostmem: Round up memory size for qemu_madvise() in host_memory_backend_memory_complete() Michal Privoznik
2023-11-27 13:37 ` David Hildenbrand
2023-11-27 13:55 ` David Hildenbrand [this message]
2023-12-01 9:07 ` Michal Prívozník
2023-12-04 9:21 ` David Hildenbrand
2023-12-04 14:12 ` Michal Prívozník
2023-12-04 18:38 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b9c8b4da-cb14-4a67-b98d-655ed7348bef@redhat.com \
--to=david@redhat.com \
--cc=imammedo@redhat.com \
--cc=mprivozn@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).