qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Ani Sinha <anisinha@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Xiao Guangrong" <xiaoguangrong.eric@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [PATCH] mem/x86: add processor address space check for VM memory
Date: Tue, 12 Sep 2023 17:34:42 +0200	[thread overview]
Message-ID: <30f0ddfb-6eb7-84a5-04a0-e11905451733@redhat.com> (raw)
In-Reply-To: <3A287C52-F547-4494-B803-8CFC50CBA175@redhat.com>

[...]

>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 54838c0c41..d187890675 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
>> {
>>      X86CPU *cpu = X86_CPU(first_cpu);
>>
>> -    /* 32-bit systems don't have hole64 thus return max CPU address */
>> -    if (cpu->phys_bits <= 32) {
>> -        return ((hwaddr)1 << cpu->phys_bits) - 1;
>> +    /*
>> +     * 32-bit systems don't have hole64, but we might have a region for
>> +     * memory hotplug.
>> +     */
>> +    if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
>> +        return pc_pci_hole64_start() - 1;
> 
> Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I have a few questions …
> (a) pc_get_device_memory_range() returns the size of the device memory as the difference between ram_size and maxram_size. But from what I understand, ram_size is the actual size of the ram present and maxram_size is the max size of ram *after* hot plugging additional memory. How can we assume that the additional available space is already occupied by hot plugged memory?

Let's take a look at an example:

$ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
   -object memory-backend-ram,id=mem0,size=1g \
   -device pc-dimm,memdev=mem0 \
   -nodefaults -nographic -S -monitor stdio

(qemu) info mtree
...
memory-region: system
   0000000000000000-ffffffffffffffff (prio 0, i/o): system
     0000000000000000-00000000bfffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-00000000bfffffff
     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
       00000000000e0000-00000000000fffff (prio 1, rom): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff
     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff
     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff
     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff
     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff
     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff
     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff
     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff
     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff
     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff
     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff
     00000000fec00000-00000000fec00fff (prio 0, i/o): ioapic
     00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
     00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi
     0000000100000000-000000023fffffff (prio 0, ram): alias ram-above-4g @pc.ram 00000000c0000000-00000001ffffffff
     0000000240000000-000000047fffffff (prio 0, i/o): device-memory
       0000000240000000-000000027fffffff (prio 0, ram): mem0


We requested 8G of boot memory, which is split between "<4G" memory and ">=4G" memory.

We only place exactly 3G (0x0->0xbfffffff) under 4G, starting at address 0.

We leave the remainder (1G) of the <4G addresses available for I/O devices (32bit PCI hole).

So we end up with 5G (0x100000000->0x23fffffff) of memory starting exactly at address 4G.

"maxram_size - ram_size"=8G is the maximum amount of memory you can hotplug. We use it to size the
"device-memory" region:

0x47fffffff - 0x240000000+1 = 0x240000000
-> 9 GiB

We requested a to hotplug a maximum of "8 GiB", and sized the area slightly larger to allow for some flexibility
when it comes to placing DIMMs in that "device-memory" area.

We place that area for memory devices after the RAM. So it starts after the 5G of ">=4G" boot memory.


Long story short, based on the initial RAM size and the maximum RAM size, you
can construct the layout above and exactly know
a) How much memory is below 4G, starting at address 0 -> leaving 1G for the 32bit PCI hole
b) How much memory is above 4G, starting at address 4g.
c) Where the region for memory devices starts (aligned after b) ) and how big it is.
d) Where the 64bit PCI hole is (after c) )

> (b) Another question is, in pc_pci_hole64_start(), why are we adding this size to the start address?
> 
> } else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
> 	pc_get_device_memory_range(pcms, &hole64_start, &size);
>          if (!pcmc->broken_reserved_end) {
>              hole64_start += size;

The 64bit PCI hole starts after "device-memory" above.

Apparently, we have to take care of some layout issues before QEMU 2.5. You can assume that nowadays,
"pcmc->broken_reserved_end" is never set. So the PCI64 hole is always after the device-memory region.

> 
> I think this is trying to put the hole after the device memory. But if the ram size is <=maxram_size then the hole is after the above_4G memory? Why?

I didn't quit get what the concern is, can you elaborate?

> 
> (c) in your above change, what does long mode have anything to do with all of this?

According to my understanding, 32bit (i386) doesn't have a 64bit hole. And 32bit vs.
64bit (i386 vs. x86_64) is decided based on LM, not on the address bits (as we learned, PSE36, and PAE).

But really, I just did what x86_cpu_realizefn() does to decide 32bit vs. 64bit ;)

     /* For 64bit systems think about the number of physical bits to present.
      * ideally this should be the same as the host; anything other than matching
      * the host can cause incorrect guest behaviour.
      * QEMU used to pick the magic value of 40 bits that corresponds to
      * consumer AMD devices but nothing else.
      *
      * Note that this code assumes features expansion has already been done
      * (as it checks for CPUID_EXT2_LM), and also assumes that potential
      * phys_bits adjustments to match the host have been already done in
      * accel-specific code in cpu_exec_realizefn.
      */
     if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
     ...
     } else {
         /* For 32 bit systems don't use the user set value, but keep
          * phys_bits consistent with what we tell the guest.
          */
     ...


But that was just my quick attempt at fixing pc_max_used_gpa().

*Maybe* there is a 64bit PCI hole on 32bit i386 with 36bit addresses?

I'm the wrong person to ask, but I kind-of doubt it. :)


-- 
Cheers,

David / dhildenb



  reply	other threads:[~2023-09-12 15:36 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-08  9:50 [PATCH] mem/x86: add processor address space check for VM memory Ani Sinha
2023-09-08 10:28 ` David Hildenbrand
2023-09-08 14:12   ` Ani Sinha
2023-09-08 14:16     ` David Hildenbrand
2023-09-08 15:13       ` Ani Sinha
2023-09-08 16:02         ` David Hildenbrand
2023-09-08 16:04           ` David Hildenbrand
2023-09-12 10:41           ` Ani Sinha
2023-09-12 15:34             ` David Hildenbrand [this message]
2023-09-14  5:53               ` Ani Sinha
2023-09-14  8:37                 ` David Hildenbrand
2023-09-14 11:21                   ` Ani Sinha
2023-09-14 11:49                     ` David Hildenbrand
2023-09-15 10:38                       ` Ani Sinha
2023-09-18  9:33                         ` David Hildenbrand
2023-09-18 10:07                           ` Ani Sinha
2023-09-18 10:09                             ` David Hildenbrand
2023-09-18 10:11                               ` Ani Sinha
2023-09-18 10:14                                 ` Ani Sinha
2023-09-18 10:19                                 ` David Hildenbrand
2023-09-18 10:54                                   ` Ani Sinha
2023-09-18 10:58                                     ` David Hildenbrand
2023-09-18 11:00                                       ` Ani Sinha
2023-09-18 11:02                                         ` Ani Sinha
2023-09-18 11:02                                         ` David Hildenbrand
2023-09-18 11:04                                           ` Ani Sinha
2023-09-14 17:11                   ` Ani Sinha
2023-09-16  5:17                   ` Ani Sinha
2023-09-08 16:04         ` Philippe Mathieu-Daudé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=30f0ddfb-6eb7-84a5-04a0-e11905451733@redhat.com \
    --to=david@redhat.com \
    --cc=anisinha@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=imammedo@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=wangyanan55@huawei.com \
    --cc=xiaoguangrong.eric@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).