From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9A808C433EF for ; Thu, 24 Feb 2022 18:31:36 +0000 (UTC) Received: from localhost ([::1]:46482 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nNItr-0006R8-Kr for qemu-devel@archiver.kernel.org; Thu, 24 Feb 2022 13:31:35 -0500 Received: from eggs.gnu.org ([209.51.188.92]:50424) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nNIsw-0005Re-Rt for qemu-devel@nongnu.org; Thu, 24 Feb 2022 13:30:38 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:30217) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nNIsr-00022D-Lj for qemu-devel@nongnu.org; Thu, 24 Feb 2022 13:30:35 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1645727432; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=mm+6SUt5WNuzv49pj/mIF6KmZzVIi+BL+BSl2ghB+Pg=; b=F8nb0AI+9a8Z34UV3S96jUSY9413MMZHWFdKXVq3WUsvU8DvABWfgS1cQzZybERKBHhGQC hHHLn7upNbv3Kj8WDoZw1QPg37QG5M8kGWsp1pFVE5wWjitB/Xzjmm6O9O81sTKpJtTHRN NKlP2HFrp1T+2psLpMI/hwM2LjeZPUg= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-327-r7SvSLBsOIWuXgUWQfZYVA-1; Thu, 24 Feb 2022 13:30:31 -0500 X-MC-Unique: r7SvSLBsOIWuXgUWQfZYVA-1 Received: by mail-wr1-f71.google.com with SMTP id j8-20020adfc688000000b001e3322ced69so242037wrg.13 for ; Thu, 24 Feb 2022 10:30:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=mm+6SUt5WNuzv49pj/mIF6KmZzVIi+BL+BSl2ghB+Pg=; b=qHWZJrzY2pbWbiEq+6hPlm6pMVei5p6mkUCx/KarzGjh++Ihy/TQ+rTnUSw8HpFDlZ JMHmMLmWiq93v27g3zLzOJNfH8aGn/V8jzTRQ4wXtMkn2oPQJttsXmMmval1j325NJiV Brs0KK3fIJcfCXwOU7dxnBd3R9kFPqpYm0SU/4W83JgnleaYfYvLcLBgYzYZ2CQh+RTX OmRqWSlcDlGSo1usY2Wyp9CKE39OexiBejp5LBI1f0CbEe+FnrvJD3vForsm5lrmB159 3yt0v7VrYkV6YMtma7OfeRL15ZsWKR/y2mS6AwpFhaG419HjxDM84g18DocIxljvlFwE gXWg== X-Gm-Message-State: AOAM533v6cXHZm110ubwZD9y2ggdXNPvBwwmJqGwCX4ROa0ebqgl46AH RWlfbLXuNFNugc7srCClb2WNe3jfXA9COe9WqVyRNHpjZGQtZkw/oSwwsX1HX4L1ud8G1UiE43W j0iQsb2UtmqQ8mLc= X-Received: by 2002:a5d:4391:0:b0:1ee:df66:265e with SMTP id i17-20020a5d4391000000b001eedf66265emr1774256wrq.300.1645727430528; Thu, 24 Feb 2022 10:30:30 -0800 (PST) X-Google-Smtp-Source: ABdhPJx0BWKgrjPR3zlLd1KNYESOyMBAVjeIgT0SLe/3MxAsX0GzmYwKvQ2d0iisCDqi+UAjHxoY2w== X-Received: by 2002:a5d:4391:0:b0:1ee:df66:265e with SMTP id i17-20020a5d4391000000b001eedf66265emr1774225wrq.300.1645727430248; Thu, 24 Feb 2022 10:30:30 -0800 (PST) Received: from redhat.com ([2.55.145.157]) by smtp.gmail.com with ESMTPSA id s189-20020a1ca9c6000000b0037ee648fbc4sm118008wme.45.2022.02.24.10.30.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 24 Feb 2022 10:30:29 -0800 (PST) Date: Thu, 24 Feb 2022 13:30:24 -0500 From: "Michael S. Tsirkin" To: Joao Martins Subject: Re: [PATCH v3 4/6] i386/pc: relocate 4g start to 1T where applicable Message-ID: <20220224131607-mutt-send-email-mst@kernel.org> References: <20220223184455.9057-1-joao.m.martins@oracle.com> <20220223184455.9057-5-joao.m.martins@oracle.com> <20220223161744-mutt-send-email-mst@kernel.org> <5fee0e05-e4d1-712b-9ad1-f009aba431ea@oracle.com> <20220224122146-mutt-send-email-mst@kernel.org> <7afb8caf-5c98-d6db-d3e5-6e08b2832d57@oracle.com> MIME-Version: 1.0 In-Reply-To: <7afb8caf-5c98-d6db-d3e5-6e08b2832d57@oracle.com> Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=mst@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Received-SPF: pass client-ip=170.10.129.124; envelope-from=mst@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.082, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eduardo Habkost , Jason Wang , Richard Henderson , qemu-devel@nongnu.org, Daniel Jordan , David Edmondson , Alex Williamson , Paolo Bonzini , Ani Sinha , Igor Mammedov , Suravee Suthikulpanit Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On Thu, Feb 24, 2022 at 05:54:58PM +0000, Joao Martins wrote: > On 2/24/22 17:23, Michael S. Tsirkin wrote: > > On Thu, Feb 24, 2022 at 04:07:22PM +0000, Joao Martins wrote: > >> On 2/23/22 23:35, Joao Martins wrote: > >>> On 2/23/22 21:22, Michael S. Tsirkin wrote: > >>>>> +static void x86_update_above_4g_mem_start(PCMachineState *pcms, > >>>>> + uint64_t pci_hole64_size) > >>>>> +{ > >>>>> + X86MachineState *x86ms = X86_MACHINE(pcms); > >>>>> + uint32_t eax, vendor[3]; > >>>>> + > >>>>> + host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]); > >>>>> + if (!IS_AMD_VENDOR(vendor)) { > >>>>> + return; > >>>>> + } > >>>> > >>>> Wait a sec, should this actually be tying things to the host CPU ID? > >>>> It's really about what we present to the guest though, > >>>> isn't it? > >>> > >>> It was the easier catch all to use cpuid without going into > >>> Linux UAPI specifics. But it doesn't have to tie in there, it is only > >>> for systems with an IOMMU present. > >>> > >>>> Also, can't we tie this to whether the AMD IOMMU is present? > >>>> > >>> I think so, I can add that. Something like a amd_iommu_exists() helper > >>> in util/vfio-helpers.c which checks if there's any sysfs child entries > >>> that start with ivhd in /sys/class/iommu/. Given that this HT region is > >>> hardcoded in iommu reserved regions since >=4.11 (to latest) I don't think it's > >>> even worth checking the range exists in: > >>> > >>> /sys/kernel/iommu_groups/0/reserved_regions > >>> > >>> (Also that sysfs ABI is >= 4.11 only) > >> > >> Here's what I have staged in local tree, to address your comment. > >> > >> Naturally the first chunk is what's affected by this patch the rest is a > >> precedessor patch to introduce qemu_amd_iommu_is_present(). Seems to pass > >> all the tests and what not. > >> > >> I am not entirely sure this is the right place to put such a helper, open > >> to suggestions. wrt to the naming of the helper, I tried to follow the rest > >> of the file's style. > >> > >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c > >> index a9be5d33a291..2ea4430d5dcc 100644 > >> --- a/hw/i386/pc.c > >> +++ b/hw/i386/pc.c > >> @@ -868,10 +868,8 @@ static void x86_update_above_4g_mem_start(PCMachineState *pcms, > >> uint64_t pci_hole64_size) > >> { > >> X86MachineState *x86ms = X86_MACHINE(pcms); > >> - uint32_t eax, vendor[3]; > >> > >> - host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]); > >> - if (!IS_AMD_VENDOR(vendor)) { > >> + if (!qemu_amd_iommu_is_present()) { > >> return; > >> } > >> > >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h > >> index 7bcce3bceb0f..eb4ea071ecec 100644 > >> --- a/include/qemu/osdep.h > >> +++ b/include/qemu/osdep.h > >> @@ -637,6 +637,15 @@ char *qemu_get_host_name(Error **errp); > >> */ > >> size_t qemu_get_host_physmem(void); > >> > >> +/** > >> + * qemu_amd_iommu_is_present: > >> + * > >> + * Operating system agnostic way of querying if an AMD IOMMU > >> + * is present. > >> + * > >> + */ > >> +bool qemu_amd_iommu_is_present(void); > >> + > >> /* > >> * Toggle write/execute on the pages marked MAP_JIT > >> * for the current thread. > >> diff --git a/util/oslib-posix.c b/util/oslib-posix.c > >> index f2be7321c59f..54cef21217c4 100644 > >> --- a/util/oslib-posix.c > >> +++ b/util/oslib-posix.c > >> @@ -982,3 +982,32 @@ size_t qemu_get_host_physmem(void) > >> #endif > >> return 0; > >> } > >> + > >> +bool qemu_amd_iommu_is_present(void) > >> +{ > >> + bool found = false; > >> +#ifdef CONFIG_LINUX > >> + struct dirent *entry; > >> + char *path; > >> + DIR *dir; > >> + > >> + path = g_strdup_printf("/sys/class/iommu"); > >> + dir = opendir(path); > >> + if (!dir) { > >> + g_free(path); > >> + return found; > >> + } > >> + > >> + do { > >> + entry = readdir(dir); > >> + if (entry && !strncmp(entry->d_name, "ivhd", 4)) { > >> + found = true; > >> + break; > >> + } > >> + } while (entry); > >> + > >> + g_free(path); > >> + closedir(dir); > >> +#endif > >> + return found; > >> +} > > > > why are we checking whether an AMD IOMMU is present > > on the host? > > Isn't what we care about whether it's > > present in the VM? What we are reserving after all > > is a range of GPAs, not actual host PA's ... > > > Right. > > But any DMA map done by VFIO/IOMMU using those GPA ranges will fail, > and so we need to not map that portion of address space entirely > and use some other portion of the GPA-space. This has to > do with host IOMMU which is where the DMA mapping of guest PA takes > place. So, if you do not have an host IOMMU, you can't > service guest DMA/PCIe services via VFIO through the host IOMMU, therefore you > don't need this problem. > > Whether one uses a guest IOMMU or not does not affect the result, > it would be more of a case of mimicking real hardware not fixing > the issue of this series. Hmm okay ... my first reaction was to say let's put it under VFIO then. And ideally move the logic reporting reserved ranges there too. However, I think vdpa has the same issue too. CC Jason for an opinion. Also, some concerns - I think this changes memory layout for working VMs not using VFIO. Better preserve the old layout for old machine types? - You mention the phys bits issue very briefly, and it's pretty concerning. Do we maybe want to also disable the work around if phys bits is too small? Also, we'd need to test a bunch of old guests to see what happens. Which guests were tested? -- MST