All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: qemu-devel@nongnu.org, Eduardo Habkost <eduardo@habkost.net>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Ani Sinha <ani@anisinha.ca>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Subject: Re: [PATCH v6 09/10] i386/pc: relocate 4g start to 1T where applicable
Date: Thu, 14 Jul 2022 11:28:20 +0200	[thread overview]
Message-ID: <20220714112820.2cf526ea@redhat.com> (raw)
In-Reply-To: <addb6d9f-5d04-0280-a0b4-79edd888faeb@oracle.com>

On Tue, 12 Jul 2022 12:35:49 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:

> On 7/12/22 11:01, Joao Martins wrote:
> > On 7/12/22 10:06, Igor Mammedov wrote:  
> >> On Mon, 11 Jul 2022 21:03:28 +0100
> >> Joao Martins <joao.m.martins@oracle.com> wrote:  
> >>> On 7/11/22 16:31, Joao Martins wrote:  
> >>>> On 7/11/22 15:52, Joao Martins wrote:    
> >>>>> On 7/11/22 13:56, Igor Mammedov wrote:    
> >>>>>> On Fri,  1 Jul 2022 17:10:13 +0100
> >>>>>> Joao Martins <joao.m.martins@oracle.com> wrote:  
> >>>  void pc_memory_init(PCMachineState *pcms,
> >>>                      MemoryRegion *system_memory,
> >>>                      MemoryRegion *rom_memory,
> >>> @@ -897,6 +953,7 @@ void pc_memory_init(PCMachineState *pcms,
> >>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> >>>      X86MachineState *x86ms = X86_MACHINE(pcms);
> >>>      hwaddr cxl_base, cxl_resv_end = 0;
> >>> +    X86CPU *cpu = X86_CPU(first_cpu);
> >>>
> >>>      assert(machine->ram_size == x86ms->below_4g_mem_size +
> >>>                                  x86ms->above_4g_mem_size);
> >>> @@ -904,6 +961,29 @@ void pc_memory_init(PCMachineState *pcms,
> >>>      linux_boot = (machine->kernel_filename != NULL);
> >>>
> >>>      /*
> >>> +     * The HyperTransport range close to the 1T boundary is unique to AMD
> >>> +     * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
> >>> +     * to above 1T to AMD vCPUs only.
> >>> +     */
> >>> +    if (IS_AMD_CPU(&cpu->env) && x86ms->above_4g_mem_size) {  
> >>
> >> it has the same issue as pc_max_used_gpa(), i.e.
> >>   x86ms->above_4g_mem_size != 0
> >> doesn't mean that there isn't any memory above 4Gb nor that there aren't
> >> any MMIO (sgx/cxl/pci64hole), that's was the reason we were are considering
> >> max_used_gpa
> >> I'd prefer to keep pc_max_used_gpa(),
> >> idea but make it work for above cases and be more generic (i.e. not to be
> >> tied to AMD only) since 'pc_max_used_gpa() < physbits'
> >> applies to equally
> >> to AMD and Intel (and to trip it, one just have to configure small enough
> >> physbits or large enough hotpluggable RAM/CXL/PCI64HOLE)
> >>  
> > I can reproduce the issue you're thinking with basic memory hotplug.   
> 
> I was mislead by a bug that only existed in v6. Which I fixed now.
> So any bug possibility with hotplug, SGX and CXL, or pcihole64 is simply covered with:
> 
> 	pc_pci_hole64_start() + pci_hole64_size;
> 
> which is what pc_max_used_gpa() does. This works fine /without/ above_4g_mem_size != 0
> check even without above_4g_mem_size (e.g. mem=2G,maxmem=1024G).
> 
> And as a reminder: SGX, hotplug, CXL and pci-hole64 *require* memory above 4G[*]. And part
> of the point of us moving to pc_pci_hole64_start() was to make these all work in a generic
> way.
> 
> So I've removed the x86ms->above_4g_mem_size != 0 check. Current patch diff pasted at the end.
> 
> [*] As reiterated here:
> 
> > Let me see
> > what I can come up in pc_max_used_gpa() to cover this one. I'll respond here with a proposal.
> >   
> 
> I was over-complicating things here. It turns out nothing else is needed aside in the
> context of 1T hole.
> 
> This is because I only need to check address space limits (as consequence of
> pc_set_amd_above_4g_mem_start()) when pc_max_used_gpa() surprasses HT_START. Which
> requires fundamentally a value closer to 1T well beyond what 32-bit can cover. So on
> 32-bit guests this is never true and thus it things don't change behaviour from current
> default for these guests. And thus I won't break qtests and things fail correctly in the
> right places.
> 
> Now I should say that pc_max_used_gpa() is still not returning the accurate 32-bit guest
> max used GPA value, given that I return pci hole64 end (essentially). Do you still that
> addressed out of correctness even if it doesn't matter much for the 64-bit 1T case?
> 
> If so, our only option seems to be to check phys_bits <= 32 and return max CPU
> boundary there? Unless you have something enterily different in mind?
> 
> > I would really love to have v7.1.0 with this issue fixed but I am not very
> > confident it is going to make it :(
> > 
> > Meanwhile, let me know if you have thoughts on this one:
> > 
> > https://lore.kernel.org/qemu-devel/1b2fa957-74f6-b5a9-3fc1-65c5d68300ce@oracle.com/
> > 
> > I am going to assume that if no comments on the above that I'll keep things as is.
> > 
> > And also, whether I can retain your ack with Bernhard's suggestion here:
> > 
> > https://lore.kernel.org/qemu-devel/0eefb382-4ac6-4335-ca61-035babb95a88@oracle.com/
> >   
> 
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 668e15c8f2a6..45433cc53b5b 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -881,6 +881,67 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
>      return start;
>  }
> 
> +static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
> +{
> +    return pc_pci_hole64_start() + pci_hole64_size;
> +}
> +
> +/*
> + * AMD systems with an IOMMU have an additional hole close to the
> + * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
> + * on kernel version, VFIO may or may not let you DMA map those ranges.
> + * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
> + * with certain memory sizes. It's also wrong to use those IOVA ranges
> + * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
> + * The ranges reserved for Hyper-Transport are:
> + *
> + * FD_0000_0000h - FF_FFFF_FFFFh
> + *
> + * The ranges represent the following:
> + *
> + * Base Address   Top Address  Use
> + *
> + * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
> + * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
> + * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
> + * FD_F910_0000h FD_F91F_FFFFh System Management
> + * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
> + * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
> + * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
> + * FD_FE00_0000h FD_FFFF_FFFFh Configuration
> + * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
> + * FE_2000_0000h FF_FFFF_FFFFh Reserved
> + *
> + * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
> + * Table 3: Special Address Controls (GPA) for more information.
> + */
> +#define AMD_HT_START         0xfd00000000UL
> +#define AMD_HT_END           0xffffffffffUL
> +#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
> +#define AMD_HT_SIZE          (AMD_ABOVE_1TB_START - AMD_HT_START)
> +
> +static void pc_set_amd_above_4g_mem_start(PCMachineState *pcms,
> +                                          uint64_t pci_hole64_size)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
> +    hwaddr maxphysaddr, maxusedaddr;
> +
> +    /*
> +     * Relocating ram-above-4G requires more than TCG_PHYS_ADDR_BITS (40).
> +     * So make sure phys-bits is required to be appropriately sized in order
> +     * to proceed with the above-4g-region relocation and thus boot.
> +     */
> +    x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
> +    maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size);
> +    maxphysaddr = ((hwaddr)1 << X86_CPU(first_cpu)->phys_bits) - 1;
> +    if (maxphysaddr < maxusedaddr) {
> +        error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
> +                     " phys-bits too low (%u) cannot avoid AMD HT range",
> +                     maxphysaddr, maxusedaddr, X86_CPU(first_cpu)->phys_bits);
> +        exit(EXIT_FAILURE);
> +    }
> +}
> +
>  void pc_memory_init(PCMachineState *pcms,
>                      MemoryRegion *system_memory,
>                      MemoryRegion *rom_memory,
> @@ -896,6 +957,7 @@ void pc_memory_init(PCMachineState *pcms,
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>      X86MachineState *x86ms = X86_MACHINE(pcms);
>      hwaddr cxl_base, cxl_resv_end = 0;
> +    X86CPU *cpu = X86_CPU(first_cpu);
> 
>      assert(machine->ram_size == x86ms->below_4g_mem_size +
>                                  x86ms->above_4g_mem_size);
> @@ -903,6 +965,27 @@ void pc_memory_init(PCMachineState *pcms,
>      linux_boot = (machine->kernel_filename != NULL);
> 
>      /*
> +     * The HyperTransport range close to the 1T boundary is unique to AMD
> +     * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
> +     * to above 1T to AMD vCPUs only.
> +     */
> +    if (IS_AMD_CPU(&cpu->env)) {
> +        /* Bail out if max possible address does not cross HT range */
> +        if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
> +            pc_set_amd_above_4g_mem_start(pcms, pci_hole64_size);

I'd replace call with 
   x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;

> +        }
> +
> +        /*
> +         * Advertise the HT region if address space covers the reserved
> +         * region or if we relocate.
> +         */
> +        if (x86ms->above_4g_mem_start == AMD_ABOVE_1TB_START ||
> +            cpu->phys_bits >= 40) {
> +            e820_add_entry(AMD_HT_START, AMD_HT_SIZE, E820_RESERVED);
> +        }
> +    }

and then here check that pc_max_used_gpa() fits into phys_bits
which should cover AMD case and case where pci64_hole goes beyond 
supported address range even without 1TB hole

> +
> +    /*
>       * Split single memory region and use aliases to address portions of it,
>       * done for backwards compatibility with older qemus.
>       */
> 



  reply	other threads:[~2022-07-14  9:43 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-01 16:10 [PATCH v6 00/10] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU Joao Martins
2022-07-01 16:10 ` [PATCH v6 01/10] hw/i386: add 4g boundary start to X86MachineState Joao Martins
2022-07-01 16:10 ` [PATCH v6 02/10] i386/pc: create pci-host qdev prior to pc_memory_init() Joao Martins
2022-07-01 16:10 ` [PATCH v6 03/10] i386/pc: pass pci_hole64_size " Joao Martins
2022-07-09 20:51   ` B
2022-07-11 10:01     ` Joao Martins
2022-07-11 22:17       ` B
2022-07-12  9:27         ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 04/10] i386/pc: factor out above-4g end to an helper Joao Martins
2022-07-07 12:42   ` Igor Mammedov
2022-07-07 15:14     ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 05/10] i386/pc: factor out cxl range end to helper Joao Martins
2022-07-07 12:57   ` Igor Mammedov
2022-07-07 15:17     ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 06/10] i386/pc: factor out cxl range start " Joao Martins
2022-07-07 13:00   ` Igor Mammedov
2022-07-07 15:18     ` Joao Martins
2022-07-11 12:47       ` Igor Mammedov
2022-07-11 14:28         ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 07/10] i386/pc: handle unitialized mr in pc_get_cxl_range_end() Joao Martins
2022-07-07 13:05   ` Igor Mammedov
2022-07-07 15:21     ` Joao Martins
2022-07-11 12:58       ` Igor Mammedov
2022-07-11 14:32         ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 08/10] i386/pc: factor out device_memory base/size to helper Joao Martins
2022-07-07 13:15   ` Igor Mammedov
2022-07-07 15:23     ` Joao Martins
2022-07-01 16:10 ` [PATCH v6 09/10] i386/pc: relocate 4g start to 1T where applicable Joao Martins
2022-07-07 15:53   ` Joao Martins
2022-07-11 12:56   ` Igor Mammedov
2022-07-11 14:52     ` Joao Martins
2022-07-11 15:31       ` Joao Martins
2022-07-11 20:03         ` Joao Martins
2022-07-12  9:06           ` Igor Mammedov
2022-07-12 10:01             ` Joao Martins
2022-07-12 10:21               ` Joao Martins
2022-07-12 11:35               ` Joao Martins
2022-07-14  9:28                 ` Igor Mammedov [this message]
2022-07-14  9:54                   ` Joao Martins
2022-07-14 10:47                     ` Joao Martins
2022-07-14 11:50                       ` Igor Mammedov
2022-07-14 15:39                         ` Joao Martins
2022-07-14  9:30               ` Igor Mammedov
2022-07-01 16:10 ` [PATCH v6 10/10] i386/pc: restrict AMD only enforcing of valid IOVAs to new machine type Joao Martins
2022-07-04 14:27   ` Dr. David Alan Gilbert
2022-07-05  8:48     ` Joao Martins
2022-07-11 13:03   ` Igor Mammedov
2022-07-11 14:56     ` Joao Martins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220714112820.2cf526ea@redhat.com \
    --to=imammedo@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=ani@anisinha.ca \
    --cc=dgilbert@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=joao.m.martins@oracle.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=suravee.suthikulpanit@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.