From: Joao Martins <joao.m.martins@oracle.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
qemu-devel@nongnu.org, Daniel Jordan <daniel.m.jordan@oracle.com>,
David Edmondson <david.edmondson@oracle.com>,
Alex Williamson <alex.williamson@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Ani Sinha <ani@anisinha.ca>,
Igor Mammedov <imammedo@redhat.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Subject: Re: [PATCH v3 4/6] i386/pc: relocate 4g start to 1T where applicable
Date: Wed, 23 Feb 2022 23:35:56 +0000
Message-ID: <5fee0e05-e4d1-712b-9ad1-f009aba431ea@oracle.com>
In-Reply-To: <20220223161744-mutt-send-email-mst@kernel.org>
On 2/23/22 21:22, Michael S. Tsirkin wrote:
> On Wed, Feb 23, 2022 at 06:44:53PM +0000, Joao Martins wrote:
>> It is assumed that the whole GPA space is available to be DMA
>> addressable, within a given address space limit, except for a
>> tiny region before the 4G. Since Linux v5.4, VFIO validates
>> whether the selected GPA is indeed valid i.e. not reserved by
>> IOMMU on behalf of some specific devices or platform-defined
>> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
>> -EINVAL.
>>
>> AMD systems with an IOMMU are examples of such platforms and
>> particularly may only have these ranges as allowed:
>>
>> 0000000000000000 - 00000000fedfffff (0 .. 3.982G)
>> 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>> 0000010000000000 - ffffffffffffffff (1Tb .. 16Pb[*])
>>
>> We already account for the 4G hole, albeit if the guest is big
>> enough we will fail to allocate a guest with >1010G due to the
>> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).
>
> Could you point me to which driver then reserves the
> other regions on Linux for AMD platforms?
>
There are only two regions. The first is the 4G hole, which is used the same way on
AMD[0] and Intel[1]; part of that hole is the IOMMU MSI reserved range. The second is
the 1T hole, which is reserved for HyperTransport[2]. This is hardware behaviour, so the
drivers just mark both ranges as reserved and avoid using them at all.
[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd/iommu.c#n2203
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/intel/iommu.c#n5328
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd/iommu.c#n2210
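To make the effect of the 1T hole concrete, here is a minimal sketch of the overlap check, assuming the HyperTransport window from the patch (FD_0000_0000h to FF_FFFF_FFFFh). The helper names gpa_range_hits_ht_hole() and pick_above_4g_start() are hypothetical and only illustrate the idea; they are not the functions the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HyperTransport reserved window, values taken from the patch. */
#define AMD_HT_START        0xfd00000000ULL
#define AMD_HT_END          0xffffffffffULL
#define AMD_ABOVE_1TB_START (AMD_HT_END + 1)

/* Hypothetical helper: does the inclusive GPA range [start, end]
 * intersect the HyperTransport window? */
static bool gpa_range_hits_ht_hole(uint64_t start, uint64_t end)
{
    assert(start <= end);
    return start <= AMD_HT_END && end >= AMD_HT_START;
}

/* Hypothetical helper: keep the default above-4G start unless the
 * region of the given size would run into the HT window, in which
 * case start right after it (at 1T). */
static uint64_t pick_above_4g_start(uint64_t default_start, uint64_t size)
{
    uint64_t end;

    if (size == 0) {
        return default_start;
    }
    end = default_start + size - 1;
    if (gpa_range_hits_ht_hole(default_start, end)) {
        return AMD_ABOVE_1TB_START;
    }
    return default_start;
}
```

With a default start of 4G, a 1G region stays where it is, while a 1T region would cross FD_0000_0000h and so gets moved to start at 1T.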
As for the 256T on AMD, it isn't reserved anywhere, and the only code reference that I can
give you is the KVM selftests, which had issues with it before[4] that Paolo fixed. The
errata also gives a glimpse[3].
[3] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
[4]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c8cc43c1eae2910ac96daa4216e0fb3391ad0504
>> +/*
>> + * AMD systems with an IOMMU have an additional hole close to the
>> + * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
>> + * on kernel version, VFIO may or may not let you DMA map those ranges.
>> + * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
>> + * with certain memory sizes. It's also wrong to use those IOVA ranges
>> + * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
>> + * The ranges reserved for Hyper-Transport are:
>> + *
>> + * FD_0000_0000h - FF_FFFF_FFFFh
>> + *
>> + * The ranges represent the following:
>> + *
>> + * Base Address   Top Address    Use
>> + *
>> + * FD_0000_0000h  FD_F7FF_FFFFh  Reserved interrupt address space
>> + * FD_F800_0000h  FD_F8FF_FFFFh  Interrupt/EOI IntCtl
>> + * FD_F900_0000h  FD_F90F_FFFFh  Legacy PIC IACK
>> + * FD_F910_0000h  FD_F91F_FFFFh  System Management
>> + * FD_F920_0000h  FD_FAFF_FFFFh  Reserved Page Tables
>> + * FD_FB00_0000h  FD_FBFF_FFFFh  Address Translation
>> + * FD_FC00_0000h  FD_FDFF_FFFFh  I/O Space
>> + * FD_FE00_0000h  FD_FFFF_FFFFh  Configuration
>> + * FE_0000_0000h  FE_1FFF_FFFFh  Extended Configuration/Device Messages
>> + * FE_2000_0000h  FF_FFFF_FFFFh  Reserved
>> + *
>> + * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
>> + * Table 3: Special Address Controls (GPA) for more information.
>> + */
>> +#define AMD_HT_START 0xfd00000000UL
>> +#define AMD_HT_END 0xffffffffffUL
>> +#define AMD_ABOVE_1TB_START (AMD_HT_END + 1)
>> +#define AMD_HT_SIZE (AMD_ABOVE_1TB_START - AMD_HT_START)
>> +
>> +static hwaddr x86_max_phys_addr(PCMachineState *pcms,
>> +                                uint64_t pci_hole64_size)
>> +{
>> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>> +    MachineState *machine = MACHINE(pcms);
>> +    ram_addr_t device_mem_size = 0;
>> +    hwaddr base;
>> +
>> +    if (pcmc->has_reserved_memory &&
>> +        (machine->ram_size < machine->maxram_size)) {
>> +        device_mem_size = machine->maxram_size - machine->ram_size;
>> +    }
>> +
>> +    base = ROUND_UP(x86ms->above_4g_mem_start + x86ms->above_4g_mem_size +
>> +                    pcms->sgx_epc.size, 1 * GiB);
>> +
>> +    return base + device_mem_size + pci_hole64_size;
>> +}
>> +
>> +static void x86_update_above_4g_mem_start(PCMachineState *pcms,
>> +                                          uint64_t pci_hole64_size)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>> +    uint32_t eax, vendor[3];
>> +
>> +    host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
>> +    if (!IS_AMD_VENDOR(vendor)) {
>> +        return;
>> +    }
>
> Wait a sec, should this actually be tying things to the host CPU ID?
> It's really about what we present to the guest though,
> isn't it?
>
Using cpuid was the easiest catch-all that avoided going into
Linux UAPI specifics. But it doesn't have to be tied to that; this is only
needed on systems with an IOMMU present.
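For reference, the cpuid check in the hunk above works because CPUID leaf 0 returns the vendor string split across EBX, EDX and ECX, in that order, which is why the patch stores ECX into vendor[2] and EDX into vendor[1]. A minimal standalone sketch of that comparison follows; cpuid_vendor_is_amd() is a hypothetical name and this is not QEMU's IS_AMD_VENDOR() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Unpack one 32-bit register into four ASCII bytes, lowest byte first. */
static void unpack_reg(uint32_t reg, char *out)
{
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((reg >> (8 * i)) & 0xff);
    }
}

/* Hypothetical helper: rebuild the 12-character vendor string from the
 * CPUID leaf 0 registers (EBX, then EDX, then ECX) and compare it. */
static bool cpuid_vendor_is_amd(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char vendor[13];

    unpack_reg(ebx, vendor);
    unpack_reg(edx, vendor + 4);
    unpack_reg(ecx, vendor + 8);
    vendor[12] = '\0';

    return strcmp(vendor, "AuthenticAMD") == 0;
}
```

The register values themselves would come from the host cpuid instruction; the sketch takes them as parameters so that it does not depend on the build host being x86.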
> Also, can't we tie this to whether the AMD IOMMU is present?
>
I think so, I can add that. Something like an amd_iommu_exists() helper
in util/vfio-helpers.c that checks whether there are any sysfs child entries
starting with ivhd in /sys/class/iommu/. Given that this HT region has been
hardcoded in the iommu reserved regions since 4.11 (and still is in the latest
kernel), I don't think it's even worth checking that the range exists in:
/sys/kernel/iommu_groups/0/reserved_regions
(Also, that sysfs ABI is >= 4.11 only.)
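A rough sketch of what such a helper could look like, assuming plain POSIX directory iteration rather than any QEMU utility API. The split into name_is_ivhd() and iommu_class_has_ivhd() is hypothetical and only there so the directory can be swapped out; amd_iommu_exists() is the name suggested above, not existing code.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* AMD IOMMU instances are expected to show up as ivhd<N> entries. */
static bool name_is_ivhd(const char *name)
{
    return strncmp(name, "ivhd", 4) == 0;
}

/* Hypothetical helper: true if class_dir has any entry starting with
 * "ivhd". A missing or unreadable directory counts as "no AMD IOMMU". */
static bool iommu_class_has_ivhd(const char *class_dir)
{
    struct dirent *entry;
    bool found = false;
    DIR *dir;

    assert(class_dir != NULL);

    dir = opendir(class_dir);
    if (!dir) {
        return false;
    }

    while ((entry = readdir(dir)) != NULL) {
        if (name_is_ivhd(entry->d_name)) {
            found = true;
            break;
        }
    }

    closedir(dir);
    return found;
}

/* The check proposed above, against the real sysfs location. */
static bool amd_iommu_exists(void)
{
    return iommu_class_has_ivhd("/sys/class/iommu");
}
```

Treating an unreadable /sys/class/iommu as "not present" keeps the current memory layout on hosts where the check cannot be made.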