From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 3 Oct 2018 15:46:42 +0100
From: "Dr. David Alan Gilbert"
To: Auger Eric
Cc: peter.maydell@linaro.org, drjones@redhat.com, david@redhat.com,
 Ard Biesheuvel, qemu-devel@nongnu.org,
 shameerali.kolothum.thodi@huawei.com, agraf@suse.de, qemu-arm@nongnu.org,
 imammedo@redhat.com, david@gibson.dropbear.id.au, Laszlo Ersek,
 eric.auger.pro@gmail.com
Subject: Re: [Qemu-arm] [RFC v3 00/15] ARM virt: PCDIMM/NVDIMM at 2TB
Message-ID: <20181003144641.GB2208@work-vm>
References: <1530602398-16127-1-git-send-email-eric.auger@redhat.com>
 <43a03645-fa17-3274-9a66-502acc27b2ee@redhat.com>
 <20181003141346.GA2208@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)

* Auger Eric (eric.auger@redhat.com) wrote:
> Hi Dave,
>
> On 10/3/18 4:13 PM, Dr. David Alan Gilbert wrote:
> > * Auger Eric (eric.auger@redhat.com) wrote:
> >> Hi,
> >>
> >> On 7/3/18 9:19 AM, Eric Auger wrote:
> >>> This series aims at supporting PCDIMM/NVDIMM instantiation in
> >>> machvirt at 2TB guest physical address.
> >>>
> >>> This is achieved in 3 steps:
> >>> 1) support more than 40b IPA/GPA
> >>> 2) support PCDIMM instantiation
> >>> 3) support NVDIMM instantiation
> >>
> >> While respinning this series I have some general questions that come up
> >> when thinking about extending the RAM on mach-virt:
> >>
> >> At the moment mach-virt offers 255GB max initial RAM starting at 1GB
> >> ("-m " option).
> >>
> >> This series does not touch this initial RAM and only targets to add
> >> device memory (usable for PCDIMM, NVDIMM, virtio-mem, virtio-pmem) in
> >> 3.1 machine, located at 2TB. 3.0 address map top currently is at 1TB
> >> (legacy aarch32 LPAE limit) so it would leave 1TB for IO or PCI. Is it OK?
> >
> > Is there a reason not to make this configurable?
> > It sounds a perfectly reasonable number, but you wouldn't be too
> > surprised if someone came along with a pile of huge GPUs.
>
> GPUs consume PCI MMIO region right? (we have a high mem PCI MMIO region
> [512GB, 1TB]).

Yeh I think so.

> you mean having an option to define the base address of the device
> memory? Well it was just a matter of not having too many knobs.

What's wrong with lots of knobs!

> >
> >> - Putting device memory at 2TB means only ARMv8/aarch64 would get
> >> benefit of it. Is it an issue? ie. no device memory for ARMv7 or
> >> ARMv8/aarch32. Do we need to put effort supporting more memory and
> >> memory devices for those configs? there is less than 256GB free in the
> >> existing 1TB mach-virt memory map anyway.
> >
> > They can always explicitly specify an address on a pc-dimm's addr
> > property can't they?
>
> If an address is passed it must be within [2TB, 4TB]. This is checked in
> memory_device_get_free_addr(). So no way.

OK.

Dave

> >> - is it OK to rely only on device memory to extend the existing 255 GB
> >> RAM or would we need additional initial memory? device memory usage
> >> induces a more complex command line so this puts a constraint on upper
> >> layers. Is it acceptable though?
> >
> > Check with a libvirt person?
> definitely ;-)
> >
> >> - I revisited the series so that the max IPA size shift would get
> >> automatically computed according to the top address reached by the
> >> device memory, ie. 2TB + (maxram_size - ramsize). So we would not need
> >> any additional kvm-type or explicit vm-phys-shift option to select the
> >> correct max IPA shift (or any CPU phys-bits as suggested by Dave). This
> >> also assumes we don't put anything beyond the device memory. Is it OK?
> >
> > Generically that probably sounds OK; be careful about how complex that
> > calculation gets, otherwise it might turn into a complex thing you have
> > to be careful of the effect of changing it (and eg if changing it causes
> > migration issues).
>
> the function that does this computation would be a class function that
> can be changed per virt version.
> >
> >> - Igor told me he was concerned about the split-memory RAM model as it
> >> caused a lot of trouble regarding compat/migration on PC machine. After
> >> having studied the pc machine code I now wonder if we can compare the PC
> >> compat issues with the ones we could encounter on ARM with the proposed
> >> split memory model.
> >>
> >> On PC there are many knobs to tune the RAM layout
> >> - max_ram_below_4g option tunes how much RAM we want below 4G
> >> - gigabyte_align to force 3GB versus 3.5GB lowmem limit if ram_size >
> >> max_ram_below_4g
> >> - plus the usual ram_size which affects the rest of the initial ram
> >> - plus the maxram_size, slots which affect the size of the device memory
> >> - the device memory is just behind the initial RAM, aligned to 1GB
> >>
> >> Note the initial RAM and the device memory may be disjoint due to
> >> misalignment of the initial ram size against 1GB
> >>
> >> On ARM, we would have 3.0 virt machine supporting only initial RAM from
> >> 1GB to 256 GB. 3.1 (or beyond ;-)) virt machine would support the same
> >> initial RAM + device memory from 2TB to 4TB.
> >>
> >> With that memory split and the different machine type, I don't see any
> >> major hurdle with respect to migration. Do I miss something?
> >
> > A lot of those knobs are there to keep migration compatibility due to
> > keeping behaviour the same for migration.
> OK
>
> Thank you for your inputs.
>
> Eric
> >
> > Dave
> >
> >> Alternative to have a split model is having a floating RAM base for a
> >> contiguous initial + device memory (contiguity actually depends on
> >> initial RAM size alignment too). This requires significant changes in FW
> >> and also potentially impacts the legacy virt address map as we need to
> >> pass the RAM floating base address in some way (using an SRAM at 1GB) or
> >> using fw_cfg. Is it worth the effort? Also, Peter/Laszlo mentioned their
> >> reluctance to move the RAM earlier
> >> (https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03172.html).
> >>
> >> Your feedback on those points is really welcome!
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>>
> >>> This series reuses/rebases patches initially submitted by Shameer in [1]
> >>> and Kwangwoo in [2].
> >>>
> >>> I put all parts all together for consistency and due to dependencies
> >>> however as soon as the kernel dependency is resolved we can consider
> >>> upstreaming them separately.
> >>>
> >>> Support more than 40b IPA/GPA [ patches 1 - 5 ]
> >>> -----------------------------------------------
> >>> was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> >>>
> >>> At the moment the guest physical address space is limited to 40b
> >>> due to KVM limitations. [0] bumps this limitation and allows to
> >>> create a VM with up to 52b GPA address space.
> >>>
> >>> With this series, QEMU creates a virt VM with the max IPA range
> >>> reported by the host kernel or 40b by default.
> >>>
> >>> This choice can be overridden by using the -machine kvm-type=
> >>> option with bits within [40, 52]. If the requested bits are not
> >>> supported by the host, the legacy 40b value is used.
> >>>
> >>> Currently the EDK2 FW also hardcodes the max number of GPA bits to
> >>> 40. This will need to be fixed.
> >>>
> >>> PCDIMM Support [ patches 6 - 11 ]
> >>> ---------------------------------
> >>> was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> >>>
> >>> We instantiate the device_memory at 2TB. Using it obviously requires
> >>> at least 42b of IPA/GPA. While its max capacity is currently limited
> >>> to 2TB, the actual size depends on the initial guest RAM size and
> >>> maxmem parameter.
> >>>
> >>> Actual hot-plug and hot-unplug of PC-DIMM is not supported due to lack
> >>> of support of those features in baremetal.
> >>>
> >>> NVDIMM support [ patches 12 - 15 ]
> >>> ----------------------------------
> >>>
> >>> Once the memory hotplug framework is in place it is fairly
> >>> straightforward to add support for NVDIMM. The machine "nvdimm" option
> >>> turns the capability on.
> >>>
> >>> Best Regards
> >>>
> >>> Eric
> >>>
> >>> References:
> >>>
> >>> [0] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
> >>> https://www.spinics.net/lists/kernel/msg2841735.html
> >>>
> >>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >>> http://patchwork.ozlabs.org/cover/914694/
> >>>
> >>> [2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> >>>
> >>> Tests:
> >>> - On Cavium Gigabyte, a 48b VM was created.
> >>> - Migration tests were performed between kernel supporting the
> >>> feature and destination kernel not supporting it
> >>> - test with ACPI: to overcome the limitation of EDK2 FW, virt
> >>> memory map was hacked to move the device memory below 1TB.
> >>>
> >>> This series can be found at:
> >>> https://github.com/eauger/qemu/tree/v2.12.0-dimm-2tb-v3
> >>>
> >>> History:
> >>>
> >>> v2 -> v3:
> >>> - fix pc_q35 and pc_piix compilation error
> >>> - kwangwoo's email being not valid anymore, remove his address
> >>>
> >>> v1 -> v2:
> >>> - kvm_get_max_vm_phys_shift moved in arch specific file
> >>> - addition of NVDIMM part
> >>> - single series
> >>> - rebase on David's refactoring
> >>>
> >>> v1:
> >>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
> >>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
> >>>
> >>> Best Regards
> >>>
> >>> Eric
> >>>
> >>>
> >>> Eric Auger (9):
> >>>   linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> >>>   hw/boards: Add a MachineState parameter to kvm_type callback
> >>>   kvm: add kvm_arm_get_max_vm_phys_shift
> >>>   hw/arm/virt: support kvm_type property
> >>>   hw/arm/virt: handle max_vm_phys_shift conflicts on migration
> >>>   hw/arm/virt: Allocate device_memory
> >>>   acpi: move build_srat_hotpluggable_memory to generic ACPI source
> >>>   hw/arm/boot: Expose the pmem nodes in the DT
> >>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
> >>>
> >>> Kwangwoo Lee (2):
> >>>   nvdimm: use configurable ACPI IO base and size
> >>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
> >>>
> >>> Shameer Kolothum (4):
> >>>   hw/arm/virt: Add memory hotplug framework
> >>>   hw/arm/boot: introduce fdt_add_memory_node helper
> >>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
> >>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
> >>>
> >>>  accel/kvm/kvm-all.c                            |   2 +-
> >>>  default-configs/arm-softmmu.mak                |   4 +
> >>>  hw/acpi/aml-build.c                            |  51 ++++
> >>>  hw/acpi/nvdimm.c                               |  28 ++-
> >>>  hw/arm/boot.c                                  | 123 +++++++--
> >>>  hw/arm/virt-acpi-build.c                       |  10 +
> >>>  hw/arm/virt.c                                  | 330 ++++++++++++++++++++++---
> >>>  hw/i386/acpi-build.c                           |  49 ----
> >>>  hw/i386/pc_piix.c                              |   8 +-
> >>>  hw/i386/pc_q35.c                               |   8 +-
> >>>  hw/ppc/mac_newworld.c                          |   2 +-
> >>>  hw/ppc/mac_oldworld.c                          |   2 +-
> >>>  hw/ppc/spapr.c                                 |   2 +-
> >>>  include/hw/acpi/aml-build.h                    |   3 +
> >>>  include/hw/arm/arm.h                           |   2 +
> >>>  include/hw/arm/virt.h                          |   7 +
> >>>  include/hw/boards.h                            |   2 +-
> >>>  include/hw/mem/nvdimm.h                        |  12 +
> >>>  include/standard-headers/linux/virtio_config.h |  16 +-
> >>>  linux-headers/asm-mips/unistd.h                |  18 +-
> >>>  linux-headers/asm-powerpc/kvm.h                |   1 +
> >>>  linux-headers/linux/kvm.h                      |  16 ++
> >>>  target/arm/kvm.c                               |   9 +
> >>>  target/arm/kvm_arm.h                           |  16 ++
> >>>  24 files changed, 597 insertions(+), 124 deletions(-)
> >>>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
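
P.S. To make the IPA shift computation discussed above concrete, here is a
rough sketch of deriving the shift from the top of device memory, i.e.
2TB + (maxram_size - ramsize). It is only an illustration under stated
assumptions: the helper name required_ipa_shift() and the macro names are
hypothetical and do not come from the series; the 2TB base, the 40b legacy
default and the 52b ceiling are the values quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the thread; the macro names are hypothetical. */
#define DEVICE_MEM_BASE  (2ULL << 40)  /* device memory starts at 2TB */
#define LEGACY_IPA_SHIFT 40            /* legacy 40b default */
#define MAX_IPA_SHIFT    52            /* largest GPA size mentioned */

/*
 * Hypothetical helper: return the smallest shift in [40, 52] such that
 * (1 << shift) covers the top of the device memory region, or -1 if the
 * region would not fit in 52 bits.
 */
static int required_ipa_shift(uint64_t ram_size, uint64_t maxram_size)
{
    uint64_t device_size, top;
    int shift = LEGACY_IPA_SHIFT;

    assert(maxram_size >= ram_size);
    device_size = maxram_size - ram_size;

    /* No device memory requested: the legacy map fits in 40 bits. */
    if (device_size == 0) {
        return LEGACY_IPA_SHIFT;
    }

    /* Reject sizes whose top address would exceed 2^52. */
    if (device_size > (1ULL << MAX_IPA_SHIFT) - DEVICE_MEM_BASE) {
        return -1;
    }

    top = DEVICE_MEM_BASE + device_size;
    while ((1ULL << shift) < top) {
        shift++;
    }
    return shift;
}
```

With any non-empty device memory of up to 2TB the helper returns 42, which
matches the "at least 42b of IPA/GPA" remark in the cover letter. Keeping the
calculation this small is one way to limit the risk of it changing between
machine versions.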