From: David Hildenbrand <david@redhat.com>
To: Ani Sinha <ani@anisinha.ca>
Cc: Igor Mammedov <imammedo@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: 9 TiB vm memory creation
Date: Tue, 15 Feb 2022 10:44:04 +0100
Message-ID: <86b5c589-c1d2-bd2b-12e4-9bec25d3a9ef@redhat.com>
In-Reply-To: <CAARzgwzd-p-GLOQ-VtBC2_-fd1=fg2rZU7t9XhVA1QSUe1vT0A@mail.gmail.com>

On 15.02.22 10:40, Ani Sinha wrote:
> On Tue, Feb 15, 2022 at 2:08 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 15.02.22 09:12, Ani Sinha wrote:
>>> On Tue, Feb 15, 2022 at 1:25 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 15.02.22 08:00, Ani Sinha wrote:
>>>>>
>>>>>
>>>>> On Mon, 14 Feb 2022, David Hildenbrand wrote:
>>>>>
>>>>>> On 14.02.22 13:36, Igor Mammedov wrote:
>>>>>>> On Mon, 14 Feb 2022 10:54:22 +0530 (IST)
>>>>>>> Ani Sinha <ani@anisinha.ca> wrote:
>>>>>>>
>>>>>>>> Hi Igor:
>>>>>>>>
>>>>>>>> I failed to spawn a 9 TiB VM. The max I could do was a 2 TiB VM on my
>>>>>>>> system with the following command line before either the system
>>>>>>>> destabilized or the OOM killer killed qemu:
>>>>>>>>
>>>>>>>> -m 2T,maxmem=9T,slots=1 \
>>>>>>>> -object memory-backend-file,id=mem0,size=2T,mem-path=/data/temp/memfile,prealloc=off \
>>>>>>>> -machine memory-backend=mem0 \
>>>>>>>> -chardev file,path=/tmp/debugcon2.txt,id=debugcon \
>>>>>>>> -device isa-debugcon,iobase=0x402,chardev=debugcon \
>>>>>>>>
>>>>>>>> I have attached the debugcon output from the 2 TiB VM.
>>>>>>>> Are there any other command-line parameters or options I should try?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> ani
>>>>>>>
>>>>>>> $ truncate -s 9T 9tb_sparse_disk.img
>>>>>>> $ qemu-system-x86_64 -m 9T \
>>>>>>>   -object memory-backend-file,id=mem0,size=9T,mem-path=9tb_sparse_disk.img,prealloc=off,share=on \
>>>>>>>   -machine memory-backend=mem0
>>>>>>>
>>>>>>> works for me up to the GRUB menu; with sufficient guest kernel
>>>>>>> persuasion (i.e. limiting the RAM size to something reasonable on the
>>>>>>> kernel command line) you can boot a Linux guest on it and inspect the
>>>>>>> SMBIOS tables comfortably.
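>>>>>>>
>>>>>>> (prealloc=off plus the sparse backing file means host memory is only
>>>>>>> allocated on first touch, which is why a 9 TiB guest can start on a
>>>>>>> host with far less RAM.)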
>>>>>>>
>>>>>>>
>>>>>>> With KVM enabled it bails out with:
>>>>>>>    qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=1, start=0x100000000, size=0x8ff40000000: Invalid argument
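>>>>>>> (That slot is the above-4G RAM region: 9 TiB minus the 3 GiB of RAM
>>>>>>> placed below 4 GiB, i.e. 0x90000000000 - 0xC0000000 == 0x8ff40000000,
>>>>>>> which exceeds the per-memslot maximum.)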
>>>>>>>
>>>>>
>>>>> I have seen this on my system, but not always. Maybe I should have dug
>>>>> deeper into why I don't see it every time.
>>>>>
>>>>>>> all of that on a host with 32 GiB of RAM and no swap.
>>>>>>>
>>>>>
>>>>> My system has 16 GiB of main memory, no swap.
>>>>>
>>>>>>
>>>>>> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
>>>>>>
>>>>>> ~8 TiB (7.999999...)
>>>>>
>>>>> That's not 8 TiB, that's 2 GiB. But yes, 0x8ff40000000 is certainly
>>>>> greater than 2 Gi * 4K (assuming 4K-sized pages).
>>>>
>>>> "pages" don't carry the unit "GiB/TiB", so I was talking about the
>>>> actual size with 4k pages (your setup, I assume)
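>>>>
>>>> Spelled out: (2^31 - 1) pages * 4096 bytes/page == 8 TiB - 4 KiB, so a
>>>> single KVM memslot tops out just under 8 TiB with 4k pages, which is
>>>> why the 9 TiB slot above is rejected.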
>>>
>>> yes I got that after reading your email again.
>>> The interesting question now is: how is Red Hat QE running a 9 TiB VM
>>> with KVM?
>>
>> As I already indicated when discussing s390x, which has only a single
>> large NUMA node, x86 usually uses multiple NUMA nodes with this much
>> memory. And QE seems to be using virtual NUMA nodes:
>>
>> Each of the 32 virtual NUMA nodes receives a:
>>
>>   -object memory-backend-ram,id=ram-node20,size=309237645312,host-nodes=0-31,policy=bind
>>
>> which results in a dedicated KVM memslot (just like each DIMM would)
>>
>>
>> 32 * 309237645312 == 9 TiB :)
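>>
>> (309237645312 bytes == 288 GiB per node, and 32 * 288 GiB == 9216 GiB ==
>> 9 TiB; at 288 GiB, each memslot stays comfortably below the ~8 TiB limit.)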
> 
> Ah, I should have looked more closely at the other command lines before
> shooting off the email. Yes, the limitation is per memslot, and they
> have 32 slots, one per node.
> OK, so should we do
> kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
> from i386's kvm_arch_init()?


As I said, I'm not a fan of these workarounds in user space.

Assume you have one KVM memslot left and you hotplug a huge DIMM that
needs more than one KVM memslot: you're in trouble, because the hotplug
will succeed but creating the second memslot will fail. So you would
need additional logic in the memory device code to special-case these
scenarios.

We should try increasing the limit in KVM and handle it gracefully in
QEMU. But that's just my 2 cents.
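
For reference, the user-space workaround under discussion would look
roughly like the existing s390x precedent. A minimal, hypothetical
sketch (the i386 constant below is my assumption, derived from
KVM_MEM_MAX_NR_PAGES and 4k pages; it is not an existing QEMU define):

  /* Hypothetical sketch for target/i386/kvm/kvm.c, mirroring what
   * target/s390x does: cap each memslot below the kernel's
   * KVM_MEM_MAX_NR_PAGES limit so that QEMU transparently splits larger
   * memory regions across multiple memslots.
   * (2^31 - 1) pages * 4 KiB/page == 8 TiB - 4 KiB with 4k pages.
   * TiB/KiB come from include/qemu/units.h. */
  #define KVM_SLOT_MAX_BYTES (8 * TiB - 4 * KiB)

  int kvm_arch_init(MachineState *ms, KVMState *s)
  {
      kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
      /* ... existing i386 initialization ... */
      return 0;
  }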

-- 
Thanks,

David / dhildenb


