From: Igor Mammedov <imammedo@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: qemu-devel@nongnu.org, Eduardo Habkost <eduardo@habkost.net>,
"Michael S. Tsirkin" <mst@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
David Edmondson <david.edmondson@oracle.com>,
Alex Williamson <alex.williamson@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Ani Sinha <ani@anisinha.ca>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Subject: Re: [PATCH v5 4/5] i386/pc: relocate 4g start to 1T where applicable
Date: Tue, 28 Jun 2022 14:38:03 +0200
Message-ID: <20220628143803.538bfe74@redhat.com>
In-Reply-To: <5a094bd6-ebc1-c512-e97e-c1edba94f41a@oracle.com>
On Mon, 20 Jun 2022 19:13:46 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:
> On 6/20/22 17:36, Joao Martins wrote:
> > On 6/20/22 15:27, Igor Mammedov wrote:
> >> On Fri, 17 Jun 2022 14:33:02 +0100
> >> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>> On 6/17/22 13:32, Igor Mammedov wrote:
> >>>> On Fri, 17 Jun 2022 13:18:38 +0100
> >>>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>> On 6/16/22 15:23, Igor Mammedov wrote:
> >>>>>> On Fri, 20 May 2022 11:45:31 +0100
> >>>>>> Joao Martins <joao.m.martins@oracle.com> wrote:
> >>>>>>> + hwaddr above_4g_mem_start,
> >>>>>>> + uint64_t pci_hole64_size)
> >>>>>>> +{
> >>>>>>> + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> >>>>>>> + X86MachineState *x86ms = X86_MACHINE(pcms);
> >>>>>>> + MachineState *machine = MACHINE(pcms);
> >>>>>>> + ram_addr_t device_mem_size = 0;
> >>>>>>> + hwaddr base;
> >>>>>>> +
> >>>>>>> + if (!x86ms->above_4g_mem_size) {
> >>>>>>> + /*
> >>>>>>> + * 32-bit pci hole goes from
> >>>>>>> + * end-of-low-ram (@below_4g_mem_size) to IOAPIC.
> >>>>>>> + */
> >>>>>>> + return IO_APIC_DEFAULT_ADDRESS - 1;
> >>>>>>
> >>>>>> lack of above_4g_mem, doesn't mean absence of device_mem_size or anything else
> >>>>>> that's located above it.
> >>>>>>
> >>>>>
> >>>>> True. But the intent is to fix 32-bit boundaries as one of the qtests was failing
> >>>>> otherwise. We won't hit the 1T hole, hence a nop.
> >>>>
> >>>> I don't get the reasoning, can you clarify it pls?
> >>>>
> >>>
> >>> I was trying to say that what led me here was a couple of qtests failures (from v3->v4).
> >>>
> >>> I was doing this before based on pci_hole64. phys-bits=32 was for example one
> >>> of the test failures, and pci-hole64 sits above what 32-bit can reference.
> >>
> >> if user sets phys-bits=32, then nothing above 4Gb should work (be usable)
> >> (including above-4g-ram, hotplug region or pci64 hole or sgx or cxl)
> >>
> >> and this doesn't look to me as AMD specific issue
> >>
> >> perhaps do a phys-bits check as a separate patch
> >> that will error out if max_used_gpa is above phys-bits limit
> >> (maybe at machine_done time)
> >> (i.e. defining max_gpa and checking if compatible with configured cpu
> >> are 2 different things)
> >>
> >> (it might be possible that tests need to be fixed too to account for it)
> >>
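
A minimal sketch of the separate check suggested above, assuming the maximum
used GPA has already been computed elsewhere. The helper name
max_gpa_fits_phys_bits() is hypothetical and is not an existing QEMU function;
it only illustrates comparing an address against the limit implied by
phys-bits, kept apart from the code that defines the address itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): returns true when the highest
 * guest physical address in use can be expressed with phys_bits address bits.
 */
static bool max_gpa_fits_phys_bits(uint64_t max_used_gpa, unsigned phys_bits)
{
    if (phys_bits >= 64) {
        /* Every 64-bit address fits; also avoids an undefined shift by 64. */
        return true;
    }
    /* Highest addressable byte is 2^phys_bits - 1. */
    return max_used_gpa <= (UINT64_C(1) << phys_bits) - 1;
}
```

With phys-bits=32 the last usable address is 0xFFFFFFFF, so anything placed at
or above 4 GiB would be rejected by a check of this shape.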
> >
> > My old notes (from v3) tell me with such a check these tests were exiting early thanks to
> > that error:
> >
> > 1/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/qom-test ERROR 0.07s
> > killed by signal 6 SIGABRT
> > 4/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/test-hmp ERROR 0.07s
> > killed by signal 6 SIGABRT
> > 7/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/boot-serial-test ERROR 0.07s
> > killed by signal 6 SIGABRT
> > 44/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/test-x86-cpuid-compat ERROR 0.09s
> > killed by signal 6 SIGABRT
> > 45/56 qemu:qtest+qtest-x86_64 / qtest-x86_64/numa-test ERROR 0.17s
> > killed by signal 6 SIGABRT
> >
> > But the real reason these fail is not at all related to CPU phys bits,
> > but because we just don't handle the case where no pci_hole64 is supposed to exist (which
> > is what that other check is trying to do) e.g. A VM with -m 1G would
> > observe the same thing i.e. the computations after that conditional are all for the pci
> > hole64, which accounts for SGX/CXL/hotplug etc., which consequently means it's *erroneously*
> > bigger than phys-bits=32 (by definition). So the error_report is just telling me that
> > pc_max_used_gpa() is just incorrect without the !x86ms->above_4g_mem_size check.
> >
> > If you're not fond of:
> >
> > + if (!x86ms->above_4g_mem_size) {
> > + /*
> > + * 32-bit pci hole goes from
> > + * end-of-low-ram (@below_4g_mem_size) to IOAPIC.
> > + */
> > + return IO_APIC_DEFAULT_ADDRESS - 1;
> > + }
> >
> > Then what should I use instead of the above?
> >
> > 'IO_APIC_DEFAULT_ADDRESS - 1' is the size of the 32-bit PCI hole, which is
> > also what is used for i440fx/q35 code. I could move it to a macro (e.g.
> > PCI_HOST_HOLE32_SIZE) to make it a bit readable and less hardcoded. Or
> > perhaps your problem is on !x86ms->above_4g_mem_size and maybe I should check
> > in addition for hotplug/CXL/etc existence?
> >
> >>>>> Unless we plan on using
> >>>>> pc_max_used_gpa() for something else other than this.
> >>>>
> >>>> Even if '!above_4g_mem_size', we can still have hotpluggable memory region
> >>>> present and that can hit 1Tb. The same goes for pci64_hole if it's configured
> >>>> large enough on CLI.
> >>>>
> >>> So hotpluggable memory seems to assume it sits above 4g mem.
> >>>
> >>> pci_hole64 likewise as it uses similar computations as hotplug.
> >>>
> >>> Unless I am misunderstanding something here.
> >>>
> >>>> Looks like guesstimate we could use is taking pci64_hole_end as max used GPA
> >>>>
> >>> I think this was what I had before (v3[0]) and did not work.
> >>
> >> that had been tied to host's phys-bits directly, all in one patch
> >> and duplicating existing pc_pci_hole64_start().
> >>
> >
> > Duplicating was sort of my bad attempt in this patch for pc_max_used_gpa()
> >
> > I was sort of thinking of something like extracting calls to start + size "tuple" into
> > functions -- e.g. for hotplug it is pc_get_device_memory_range() and for CXL it would be
> > maybe pc_get_cxl_range()) -- rather than assuming those values are already initialized on
> > the memory-region @base and its size.
> >
> > See snippet below. Note I am missing CXL handling, but gives you the idea.
> >
> > But it is slightly more complex than what I had in this version :( and would require
> > anyone doing changes in pc_memory_init() and pc_pci_hole64_start() to make sure it follows
> > the similar logic.
> >
>
> Ignore previous snippet, here's a slightly cleaner version:
let's go with this version
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 8eaa32ee2106..1d97c77a5eac 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -803,6 +803,43 @@ void xen_load_linux(PCMachineState *pcms)
> #define PC_ROM_ALIGN 0x800
> #define PC_ROM_SIZE (PC_ROM_MAX - PC_ROM_MIN_VGA)
>
> +static void pc_get_device_memory_range(PCMachineState *pcms,
> +                                       hwaddr *base,
> +                                       hwaddr *device_mem_size)
> +{
> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
> +    MachineState *machine = MACHINE(pcms);
> +    hwaddr addr, size;
> +
> +    if (pcmc->has_reserved_memory &&
> +        machine->device_memory && machine->device_memory->base) {
> +        addr = machine->device_memory->base;
> +        size = memory_region_size(&machine->device_memory->mr);
> +        goto out;
> +    }
> +
> +    /* uninitialized memory region */
> +    size = machine->maxram_size - machine->ram_size;
> +
> +    if (pcms->sgx_epc.size != 0) {
> +        addr = sgx_epc_above_4g_end(&pcms->sgx_epc);
> +    } else {
> +        addr = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
> +    }
> +
> +    if (pcmc->enforce_aligned_dimm) {
> +        /* size device region assuming 1G page max alignment per slot */
> +        size += (1 * GiB) * machine->ram_slots;
> +    }
> +
> +out:
> +    if (base)
> +        *base = addr;
> +    if (device_mem_size)
> +        *device_mem_size = size;
> +}
> +
>  void pc_memory_init(PCMachineState *pcms,
>                      MemoryRegion *system_memory,
>                      MemoryRegion *rom_memory,
> @@ -864,7 +901,7 @@ void pc_memory_init(PCMachineState *pcms,
>      /* initialize device memory address space */
>      if (pcmc->has_reserved_memory &&
>          (machine->ram_size < machine->maxram_size)) {
> -        ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
> +        ram_addr_t device_mem_size;
>
>          if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
>              error_report("unsupported amount of memory slots: %"PRIu64,
> @@ -879,20 +916,7 @@ void pc_memory_init(PCMachineState *pcms,
>              exit(EXIT_FAILURE);
>          }
>
> -        if (pcms->sgx_epc.size != 0) {
> -            machine->device_memory->base = sgx_epc_above_4g_end(&pcms->sgx_epc);
> -        } else {
> -            machine->device_memory->base =
> -                x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
> -        }
> -
> -        machine->device_memory->base =
> -            ROUND_UP(machine->device_memory->base, 1 * GiB);
> -
> -        if (pcmc->enforce_aligned_dimm) {
> -            /* size device region assuming 1G page max alignment per slot */
> -            device_mem_size += (1 * GiB) * machine->ram_slots;
> -        }
> +        pc_get_device_memory_range(pcms, &machine->device_memory->base, &device_mem_size);
>
>          if ((machine->device_memory->base + device_mem_size) <
>              device_mem_size) {
> @@ -965,12 +989,13 @@ uint64_t pc_pci_hole64_start(void)
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>      MachineState *ms = MACHINE(pcms);
>      X86MachineState *x86ms = X86_MACHINE(pcms);
> -    uint64_t hole64_start = 0;
> +    uint64_t hole64_start = 0, size = 0;
>
> -    if (pcmc->has_reserved_memory && ms->device_memory->base) {
> -        hole64_start = ms->device_memory->base;
> +    if (pcmc->has_reserved_memory &&
> +        (ms->ram_size < ms->maxram_size)) {
> +        pc_get_device_memory_range(pcms, &hole64_start, &size);
>          if (!pcmc->broken_reserved_end) {
> -            hole64_start += memory_region_size(&ms->device_memory->mr);
> +            hole64_start += size;
>          }
>      } else if (pcms->sgx_epc.size != 0) {
>          hole64_start = sgx_epc_above_4g_end(&pcms->sgx_epc);
>
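
For readers following the arithmetic, here is a standalone sketch of the
base/size computation that the quoted helper performs for a not-yet-initialized
device memory region. It is not QEMU code: struct mem_layout, round_up() and
device_memory_range() are illustrative stand-ins for the machine state fields
read in the diff. The lines removed from pc_memory_init() rounded the base up
to 1 GiB, and that step does not appear in the quoted helper; the sketch keeps
the rounding on the assumption that it is still wanted.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Illustrative stand-in for the machine fields used by the quoted patch. */
struct mem_layout {
    uint64_t above_4g_mem_start;
    uint64_t above_4g_mem_size;
    uint64_t sgx_epc_end;          /* 0 when no SGX EPC is configured */
    uint64_t ram_size;
    uint64_t maxram_size;
    uint64_t ram_slots;
    int enforce_aligned_dimm;
};

/* Round value up to a power-of-two alignment. */
static uint64_t round_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Compute where hotpluggable memory would start and how large it would be. */
static void device_memory_range(const struct mem_layout *m,
                                uint64_t *base, uint64_t *size)
{
    /* Device memory follows the SGX EPC if present, else above-4G RAM. */
    uint64_t addr = m->sgx_epc_end
                        ? m->sgx_epc_end
                        : m->above_4g_mem_start + m->above_4g_mem_size;
    uint64_t sz = m->maxram_size - m->ram_size;

    /* Assumption: keep the 1 GiB base alignment of the pre-refactor code. */
    addr = round_up(addr, GiB);

    if (m->enforce_aligned_dimm) {
        /* Reserve room for 1 GiB alignment padding per slot. */
        sz += GiB * m->ram_slots;
    }

    if (base) {
        *base = addr;
    }
    if (size) {
        *size = sz;
    }
}
```

Either output pointer may be NULL, mirroring the quoted helper, so a caller
that only needs the end of the region can add the two values itself.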
Thread overview: 32+ messages
2022-05-20 10:45 [PATCH v5 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU Joao Martins
2022-05-20 10:45 ` [PATCH v5 1/5] hw/i386: add 4g boundary start to X86MachineState Joao Martins
2022-06-16 13:05 ` Igor Mammedov
2022-06-17 10:57 ` Joao Martins
2022-05-20 10:45 ` [PATCH v5 2/5] i386/pc: create pci-host qdev prior to pc_memory_init() Joao Martins
2022-06-16 13:21 ` Reviewed-by: Igor Mammedov
2022-06-17 11:03 ` Joao Martins
2022-06-20 7:12 ` Mark Cave-Ayland
2022-05-20 10:45 ` [PATCH v5 3/5] i386/pc: pass pci_hole64_size " Joao Martins
2022-06-16 13:30 ` Igor Mammedov
2022-06-16 14:16 ` Michael S. Tsirkin
2022-06-17 11:13 ` Joao Martins
2022-06-17 11:58 ` Igor Mammedov
2022-05-20 10:45 ` [PATCH v5 4/5] i386/pc: relocate 4g start to 1T where applicable Joao Martins
2022-06-16 14:23 ` Igor Mammedov
2022-06-17 12:18 ` Joao Martins
2022-06-17 12:32 ` Igor Mammedov
2022-06-17 13:33 ` Joao Martins
2022-06-20 14:27 ` Igor Mammedov
2022-06-20 16:36 ` Joao Martins
2022-06-20 18:13 ` Joao Martins
2022-06-28 12:38 ` Igor Mammedov [this message]
2022-06-28 15:27 ` Joao Martins
2022-06-17 16:12 ` Joao Martins
2022-05-20 10:45 ` [PATCH v5 5/5] i386/pc: restrict AMD only enforcing of valid IOVAs to new machine type Joao Martins
2022-06-16 14:27 ` Igor Mammedov
2022-06-17 13:36 ` Joao Martins
2022-06-08 10:37 ` [PATCH v5 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU Joao Martins
2022-06-22 22:37 ` Alex Williamson
2022-06-22 23:18 ` Joao Martins
2022-06-23 16:03 ` Alex Williamson
2022-06-23 17:13 ` Joao Martins